Well, we all need MMX and the other SIMD instruction sets, but who says we can't mimic them in C? The advantage of MMX is that you can perform the same operation on multiple data elements at the same time. If we pack two 16 bit pixels into a single 32 bit value and operate on them as a unit, we can eliminate at least 3 multiplications for every 2 pixels!

In order to pull this off we will need to modify our alpha blending equation. We cannot afford to have any negative values at any time during the calculation. Our 2 packed pixels are stored in an unsigned 32 bit value, and if a destination color channel is greater than the matching source channel the subtraction goes negative and borrows from the neighboring channel, which will cause strange things to happen. Negative intermediate values are acceptable if you are blending one pixel at a time with signed arithmetic, because the final result of equations 1, 2, and 3 is never negative.

We need to add a bias to the source color channel so that when we subtract the destination we are assured of a positive value. For 16 bit color this bias is 64: the green channel can have 6 bits, so the largest channel value is 111111 binary, or 63 in decimal, and adding 64 keeps the difference at 1 or more. Of course we need to subtract the bias back out later to maintain the integrity of the equation. Note that this trick only works for the 16 bit color modes because we are stuffing two 16 bit pixels into one 32 bit value.

We will start with equation 3.

result = ( ALPHA * ( s - d ) ) / 256 + d

Add 64 to the source. This adds something extra to the result, and what that is will become clear in a second.

result = ( ALPHA * ( ( s + 64 ) - d ) ) / 256 + d

Expanding the equation shows that the extra term is ( ALPHA * 64 ) / 256.

result = ( ALPHA * s ) / 256 + ( ALPHA * 64 ) / 256 - ( ALPHA * d ) / 256 + d

To cancel the extra term we subtract the same amount at the end.

result = ( ALPHA * s ) / 256 + ( ALPHA * 64 ) / 256 - ( ALPHA * d ) / 256 + d - ( ALPHA * 64 ) / 256

Now rearrange the equation for efficiency and you get Eq 4.

Listing 3 puts everything together. Take a look at it and then I will walk you through the code.
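Before turning to the listing, the biased form of the equation can be checked on a single color channel. The following is a minimal sketch, not the code from Listing 3: the function names blend_plain, blend_biased, and max_bias_error are made up for illustration, and it assumes ALPHA runs from 0 to 256 and channel values from 0 to 63.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Equation 3 on one channel. Signed arithmetic, so (s - d) may go negative. */
int blend_plain(int alpha, int s, int d)
{
    return (alpha * (s - d)) / 256 + d;
}

/* Biased form: add 64 before subtracting d, then remove (alpha * 64) / 256
   at the end. Assumes alpha is 0..256 and s, d are 0..63 (hypothetical ranges). */
uint32_t blend_biased(uint32_t alpha, uint32_t s, uint32_t d)
{
    assert(alpha <= 256 && s <= 63 && d <= 63);
    uint32_t diff = (s + 64u) - d;   /* always 1..127, never negative */
    return (alpha * diff) / 256u + d - (alpha * 64u) / 256u;
}

/* Largest difference between the two forms over every input in range. */
int max_bias_error(void)
{
    int worst = 0;
    for (int alpha = 0; alpha <= 256; ++alpha) {
        for (int s = 0; s <= 63; ++s) {
            for (int d = 0; d <= 63; ++d) {
                int a = blend_plain(alpha, s, d);
                int b = (int)blend_biased((uint32_t)alpha, (uint32_t)s, (uint32_t)d);
                int err = abs(a - b);
                if (err > worst) {
                    worst = err;
                }
            }
        }
    }
    return worst;
}
```

Calling max_bias_error() compares the two forms over every combination of inputs. In this sketch they can disagree by one unit, because C integer division truncates toward zero on the negative values that only the signed form produces, but the biased form never needs a negative intermediate value.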