I coded many second attempts to improve upon the original. All of them improved the speed at which I processed the pixels, but at a price. I tried using lookup tables with some success. I believe this resulted in a 40%-45% improvement in speed, but you have the overhead and sloppiness of the lookup tables to contend with. Also, the lookup tables for 16-bit color are huge, and I have heard they can cause cache thrashing on higher-end machines.

If your goal is a fullscreen fade, you can realize a two-fold increase in speed by fading only every other line per frame. This worked extremely well and produces an interesting effect. However, it does not look so hot if you are trying to alpha blend an explosion with your background tiles.

If you can live with a single grade of alpha blending (50/50), then there is an extremely fast method to achieve this. For both the source and destination color values, you mask off the low bit in each color channel, shift right by one, and then add the two results together, which gives you a 50% faded look. Masking first keeps the low bit of one channel from shifting into the top of its neighbor. Listing 2 is the inner loop of a routine that shows how to do this for 16bpp 5-6-5 buffers.

The code above is designed for a 16bpp surface with a format of 5-6-5. It also checks whether the srcPixel is equal to the ColorKey, in which case it skips drawing that pixel. This gives us transparency as well as alpha blending.

This is your basic alpha blending function. We will use the performance of this function as our baseline and will compare subsequent attempts against the numbers below to see how much improvement we have achieved.

Performance of the 50/50 single shade alpha blending function:
If you look at who the big cycle-eaters are in the first attempt, you see that we do 3 multiplies per pixel. At 10 cycles apiece, we know it will take more than 30 cycles per pixel. I got a tip from someone on the former www.Angelic-Coders.com about how to eliminate one of the multiplications, which was great, but it had the side effect of limiting your ALPHA gradations to only 32. Another suggestion I received was to always read and write 32 bits at a time. This is excellent advice and you'll see it in use later on. One last tip was that you should never read from video memory! This is what my original 17-second fade function was doing. All of these ideas are good and they all have their time and place, but they were not giving me what I was looking for: a lightning-fast 16bpp inline alpha blending routine.
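The tip itself is not reproduced here, so the following is only a sketch of one well-known way to get from three multiplies to two, and it may not be exactly what was suggested. It assumes a 5-6-5 pixel and an alpha value from 0 to 32. Red and blue are blended together in a single multiply, which works because a 5-bit alpha leaves enough headroom between the two fields. The function name `blend_565_alpha32` is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not from the original article.
 * Blends src over dst for 16bpp 5-6-5 pixels.
 * alpha must be in the range 0..32 (0 = all dst, 32 = all src). */
static uint16_t blend_565_alpha32(uint16_t src, uint16_t dst, uint32_t alpha)
{
    /* Red (bits 11-15) and blue (bits 0-4) packed together. */
    uint32_t s_rb = src & 0xF81Fu;
    uint32_t d_rb = dst & 0xF81Fu;
    /* Green (bits 5-10) on its own. */
    uint32_t s_g = src & 0x07E0u;
    uint32_t d_g = dst & 0x07E0u;

    /* dst + (src - dst) * alpha / 32, done on both fields at once.
     * The subtraction may wrap around in unsigned arithmetic; the final
     * mask discards the wrapped high bits and the fractional bits. */
    uint32_t rb = (d_rb + (((s_rb - d_rb) * alpha) >> 5)) & 0xF81Fu;
    uint32_t g  = (d_g  + (((s_g  - d_g)  * alpha) >> 5)) & 0x07E0u;

    return (uint16_t)(rb | g);
}
```

The cost of the trick is visible in the shift: with only 5 bits of room between the red and blue fields, alpha cannot use more than 32 steps without one channel bleeding into the other.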
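For readers who want to experiment with the 50/50 technique described above, here is a minimal sketch of it in C. It is not the article's Listing 2. The names `blend_half_565`, `blend_half_row_565`, and `color_key` are assumptions made for this example, and the buffers are assumed to live in system memory rather than video memory.

```c
#include <stddef.h>
#include <stdint.h>

/* 0xF7DE clears the low bit of each 5-6-5 channel:
 * red bit 11, green bit 5, blue bit 0. */
#define HALF_MASK_565 0xF7DEu

/* Average two 5-6-5 pixels: mask the low bits, halve each, then add. */
static uint16_t blend_half_565(uint16_t src, uint16_t dst)
{
    return (uint16_t)(((src & HALF_MASK_565) >> 1) +
                      ((dst & HALF_MASK_565) >> 1));
}

/* Blend one row of pixels, skipping any source pixel that matches
 * the color key so that it stays fully transparent. */
static void blend_half_row_565(const uint16_t *src, uint16_t *dst,
                               size_t count, uint16_t color_key)
{
    for (size_t i = 0; i < count; ++i) {
        if (src[i] != color_key) {
            dst[i] = blend_half_565(src[i], dst[i]);
        }
    }
}
```

Because each channel is halved before the add, the sum can never overflow into the neighboring channel. The price is one bit of precision per channel, so blending a pixel with itself can come out one step darker.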