There are two alpha blending examples using MMX on Intel's website. This is what I used to learn the MMX instruction set. Once I was comfortable with the ins and outs of the packed data instructions I put together my own MMX version of alpha blending. You can find the code in the zip file. It is well commented but the interlacing of the instructions makes it difficult to follow. The logic is the same as Listing 3 but instead of working on 2 pixels at a time it operates on 4 at a time! The code is very fast, here are the comparisons. Performance of the Fourth Attempt:
OPTIMIZATIONS: None of the functions I have designed take into consideration that we may be reading or writing to non-aligned addresses. I believe you pay a 3-cycle penalty for this. We need to minimize the number of non-aligned reads and writes. Another optimization that applies only to the MMX function is to test if your four source pixels are all equal to the colorkey value before running through all the calculations. For bitmaps with 50% or more set to the colorkey value this would significantly increase the drawing speed. I'm sure there are countless other optimizations as well.
|
|||||||||||||||||