Contents

Introduction
Basic Blending
Get it Working
Tried them all!
Who needs MMX?
Who needs MMX? (cont)
MMX Version
Conclusion

Basic Alpha Blending

eq. 1
result = ALPHA * srcPixel + ( 1 - ALPHA ) * destPixel

Where:
ALPHA     - ranges from 0.0 to 1.0
result    - the alpha-blended color
srcPixel  - the foreground pixel
destPixel - the background pixel

This equation says that the greater the ALPHA value, the more the result will resemble srcPixel. At the extremes, when ALPHA equals 0.0 the result equals destPixel, and when ALPHA equals 1.0 the result equals srcPixel.
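As a minimal sketch, eq. 1 can be written directly in C for a single color channel. The function name blend_channel_float, the 0..255 channel range, and the round-to-nearest step are assumptions made for illustration; they are not taken from the article's own routines.

```c
#include <assert.h>

/* Eq. 1 applied to one color channel (assumed range 0..255).
   blend_channel_float is a hypothetical name used only for this sketch. */
unsigned char blend_channel_float(unsigned char src, unsigned char dest, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);

    /* result = ALPHA * srcPixel + ( 1 - ALPHA ) * destPixel */
    float result = alpha * (float)src + (1.0f - alpha) * (float)dest;

    /* Round to the nearest integer (an assumption; truncation also works). */
    return (unsigned char)(result + 0.5f);
}
```

With src = 200 and dest = 100, an ALPHA of 0.0 returns 100, an ALPHA of 1.0 returns 200, and an ALPHA of 0.5 returns 150. Note the two multiplications and the floating-point math.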

Before we wrap this equation in code, we should see if we can optimize it. The first thing that should stand out is the pair of multiplications. Multiplication is costly, so let's see if we can do anything about it. Expanding the ( 1 - ALPHA ) term and then factoring out ALPHA produces an equation with a single multiplication.

eq. 2
result = ALPHA * srcPixel + destPixel - ALPHA * destPixel
result = ALPHA * ( srcPixel - destPixel ) + destPixel

There is one more optimization we can make to the equation. Currently ALPHA is a floating-point value, and it would be best if all of our calculations were integer only. What if we pick an ALPHA range of 0 to 256 with a granularity of 1? This gives us a broad range of ALPHA values, and we can represent ALPHA in our code as an integer. Effectively we have multiplied our original floating-point ALPHA range by 256, which means we need to divide by 256 at some point to keep the results of the equation correct. In our code, a divide by 256 can be accomplished by simply shifting to the right by 8, which costs only one cycle. Below is our new equation.

eq. 3
result = ( ALPHA * ( srcPixel - destPixel ) ) / 256 + destPixel

Where:
ALPHA - ranges from 0 to 256
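Eq. 3 might be sketched in C as follows. This is an illustration under stated assumptions, not the routine the article goes on to develop: the helper names blend_channel and blend_565 are hypothetical, the blend is applied to each color channel separately, and the 5-6-5 bit layout (red in the top 5 bits, green in the middle 6, blue in the bottom 5) is assumed. One further choice is made here: destPixel * 256 is folded into the numerator before the shift so that the shifted value is never negative, because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <assert.h>

/* Eq. 3 for one color channel, integer only.
   alpha ranges from 0 to 256; src and dest are channel values.
   blend_channel is a hypothetical helper name. */
int blend_channel(int src, int dest, int alpha)
{
    assert(alpha >= 0 && alpha <= 256);

    /* ( ALPHA * ( src - dest ) ) / 256 + dest, with dest * 256 added
       before the shift so the operand of >> is always non-negative. */
    return (alpha * (src - dest) + (dest << 8)) >> 8;
}

/* Blend two 16bpp pixels, assuming a 5-6-5 layout:
   red in bits 15..11, green in bits 10..5, blue in bits 4..0. */
unsigned short blend_565(unsigned short src, unsigned short dest, int alpha)
{
    int r = blend_channel((src >> 11) & 0x1F, (dest >> 11) & 0x1F, alpha);
    int g = blend_channel((src >> 5) & 0x3F, (dest >> 5) & 0x3F, alpha);
    int b = blend_channel(src & 0x1F, dest & 0x1F, alpha);

    return (unsigned short)((r << 11) | (g << 5) | b);
}
```

An ALPHA of 0 returns the destination pixel unchanged and an ALPHA of 256 returns the source pixel, matching the behavior of eq. 1 at 0.0 and 1.0. Intermediate values are rounded down by the shift.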

A WORD ON TESTING:

I will state millisecond and cycles-per-pixel timings throughout this article, so it is important to know the exact conditions under which these timings were taken. First, here are the specs of the PC that was used for all the timings:

Processor  : Intel Pentium II - 350 MHz
RAM        : 128 MB
Video Card : Viper 550 16 MB
Video Mode : 640x480 16bpp (5-6-5)

All of the functions I will be describing read from two buffers, manipulate the data, and then write the result out to a buffer. For these timings, all of the buffers that were read from or written to resided in system memory, NOT video memory.

Even though these routines can handle transparency, the timings were taken using a 320x240 pixel bitmap in which none of the pixels were set to the transparent (or colorkey) value. Doing this levels the playing field: we want to take our timings on the worst-case scenario, which for us is an image where every pixel must be checked for transparency but none can be skipped, because none of them match the colorkey value.

There are many more factors that come into play when determining the "speed" of an algorithm. What is important is the difference in performance from one attempt to the next while holding these factors constant.
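The relationship between the two kinds of timing can be sketched as a small conversion. This assumes a millisecond figure covers one pass over the whole 320x240 bitmap and that the processor runs at its nominal 350 MHz; the helper name cycles_per_pixel and both constants' names are hypothetical.

```c
#include <assert.h>

#define CPU_HZ        350000000.0   /* nominal clock of the 350 MHz test machine */
#define BITMAP_PIXELS (320 * 240)   /* size of the test bitmap */

/* Convert the time for one full-bitmap pass into cycles per pixel.
   cycles_per_pixel is a hypothetical helper used only for this sketch. */
double cycles_per_pixel(double milliseconds)
{
    assert(milliseconds >= 0.0);

    double cycles = milliseconds / 1000.0 * CPU_HZ;  /* total cycles elapsed */
    return cycles / BITMAP_PIXELS;                   /* spread over every pixel */
}
```

As a purely illustrative input, not a measurement from this article, a pass that took 1 ms would work out to roughly 4.56 cycles per pixel (350,000 cycles over 76,800 pixels).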
