Upcoming Events
Unite 2010
11/10 - 11/12 @ Montréal, Canada

GDC China
12/5 - 12/7 @ Shanghai, China

Asia Game Show 2010
12/24 - 12/27  

GDC 2011
2/28 - 3/4 @ San Francisco, CA

More events...
Quick Stats
67 people currently visiting GDNet.
2406 articles in the reference section.

Help us fight cancer!
Join SETI Team GDNet!
Link to us Events 4 Gamers
Intel sponsors gamedev.net search:

Contents
 Page 1
 Page 2

 Printable version
 Discuss this article

How it works

I used lookup tables for this formula, but I used them in a way which I’ve never seen used before. I created a standard 256*256 element __int8 (Byte for you VB people) lookup table, but I created it in a way as to allow you to create a pointer to each group of 256 elements… this will probably make more sense later. I also came up with an extremely fast way of eliminating a large number of branching instructions in my code (I checked the assembly myself) by using lots of tricky boolean operations.

Let’s get to some code.

struct LookupLevel {
  unsigned __int8 Values[256];
};

struct LookupTable {
  LookupLevel Levels[256];
} AlphaTable;

__inline unsigned __int8 clipByte(int value) {
  value = (0 & (-(int)(value < 0))) | (value & (-(int)!(value < 0)));
  value = (255 & (-(int)(value > 255))) | (value & (-(int)!(value > 255)));
  return value;
}

int InitTable() {
float fValue,
fAlpha;
int iValue, iAlpha;
  for (iAlpha = 0; iAlpha < 256; iAlpha++) {
    fAlpha = ((float)iAlpha) / 255;
    for (iValue = 0; iValue < 256; iValue++) {
      fValue = ((float)iValue) / 255;
      AlphaTable.Levels[iAlpha].Values[iValue] = clipByte((int)((fValue * fAlpha) * 255));
    }
  }
  return true;
}

LookupLevel * LookupPointer(int alpha) {
  return &AlphaTable.Levels[clipByte(alpha)];
}

Explanation

Well, now that we have that (very small) piece of example code, I’m going to explain it in detail.

The two structs are layered in a way to allow you to easily retrieve a pointer to a single 256 element array, for a single alpha value. This is very efficient for static-alpha blits, in which the alpha values for the image are the same the entire time. Blitting images with alpha channels is somewhat slower, but still very fast. I’ve been using this technique for months, and have done some extensive optimization.

Now, first, we start with the clipByte function. That function uses some cryptic-looking bit manipulation to clip a value to a byte. When optimized by Visual C++ 6.0’s compiler, it results in exactly 15 assembly instructions, none of which involve any sort of branching (or cache-thrashing). A variation of this can also be used for masking and other techniques. Here’s what the assembly for clipByte looks like:

mov    ecx, DWORD PTR _value$[esp-4]
xor    eax, eax
test   ecx, ecx
setge  al
neg    eax
and    ecx, eax
cmp    ecx, 255
setle  al
neg    al
and    al, cl
cmp    ecx, 255
setg   cl
neg    cl
or     al, cl
ret    4

Pretty small, isn’t it? It’s fast, too.

Now, onto the InitTable function. Please excuse my horrible naming convention, I’m one of the world’s worst coders. InitTable is pretty simple, it calculates a value from 0-255 for each element of the lookup table. It’s moderately fast, and probably could be made faster if I ever needed to speed it up. (It currently takes my P3-750 between 0 and 10 milliseconds to run the initialization routine. More than fast enough for me.)

Alright, hopefully you understand it now. Let’s move on to the actual use of the lookup tables: Alpha blending.

I’m not going to demonstrate an alpha blended blit routine here, because it would take far too much time. Perhaps in another article. For now I’m going to demonstrate 3 kinds of color blending. Note that I use the 32-bit BGRA colorspace in all my code, but it is easy to adapt it to other colorspaces.

struct ColorChannels
{
  unsigned __int8 Blue;
  unsigned __int8 Green;
  unsigned __int8 Red;
  unsigned __int8 Alpha;
};

union Pixel32 {
  ColorChannels Channels;
  __int32 Value;
};

__int32 AlphaBlend(Pixel32 DestColor, Pixel32 SourceColor, __int32 Alpha) {
Pixel32 NewColor;
LookupLevel *SourceTable;
LookupLevel *DestTable;

  SourceTable = LookupPointer(Alpha);
  DestTable = LookupPointer(255 - Alpha);

  // The clipByte's aren't really necessary, but they're there to remove the
  // chance for an occasional overflow. They only eat a couple clock cycles anyway.
  NewColor.Channels.Red = clipByte(SourceTable->Values[SourceColor.Channels.Red] +
                          DestTable->Values[DestColor.Channels.Red]);
  NewColor.Channels.Green = clipByte(SourceTable->Values[SourceColor.Channels.Green] +
                            DestTable->Values[DestColor.Channels.Green]);
  NewColor.Channels.Blue = clipByte(SourceTable->Values[SourceColor.Channels.Blue] +
                           DestTable->Values[DestColor.Channels.Blue]);
  return NewColor.Value;
}

Explanation

Alright, there we have it. A fully functional alpha blending routine that will blend two colors. Pretty small, isn’t it? Now, here’s how it works:

It gets two pointers: One for the destination pixels, one for the source pixels. Each is a pointer to a 256-element __int8 (Byte for you VB people) lookup table. Then, it uses those tables to blend each color channel of the source and destination pixels. It’s quite fast, even in its unoptimized state. I have the clipByte() around the values, just in case one of the calculations comes out to 256 or something. You could remove the clipByte and probably not have any problems, but it’s there anyway.

When writing a blitter, I suggest doing the calculations inside the blitter itself. It would be much faster, as you’d only have to get those two pointers once per blit, and you could avoid the whole temporary color variable and save your results directly to the destination pixel.

Note that if you want to do this with 15-bit or 16-bit colors, you won’t be able to use a pixel struct, but the lookup table will work just fine with absolutely no modification whatsoever, as long as your alpha values are 0-255. You could also optimize it a bit more by removing the clipByte from the LookupPointer function, as well.

Now that you hopefully understand the alpha blend function, I’m going to show you an additive blend function, and a subtractive blend function. Quite handy functions to have.

__int32 SubtractiveBlend(Pixel32 DestColor, Pixel32 SourceColor, __int32 Alpha) {
Pixel32 NewColor;
LookupLevel *
SourceTable;

  SourceTable = LookupPointer(Alpha);

  // These clipByte's ARE necessary. Don't remove them.
  NewColor.Channels.Red = clipByte(DestColor.Channels.Red -
                          SourceTable->Values[SourceColor.Channels.Red]);
  NewColor.Channels.Green = clipByte(DestColor.Channels.Green -
                            SourceTable->Values[SourceColor.Channels.Green]);
  NewColor.Channels.Blue = clipByte(DestColor.Channels.Blue -
                           SourceTable->Values[SourceColor.Channels.Blue]);
  return NewColor.Value;
}

__int32 AdditiveBlend(Pixel32 DestColor, Pixel32 SourceColor, __int32 Alpha) {
Pixel32 NewColor;
LookupLevel *
SourceTable;

  SourceTable = LookupPointer(Alpha);

  // These clipByte's ARE necessary. Don't remove them.
  NewColor.Channels.Red = clipByte(SourceTable->Values[SourceColor.Channels.Red] +
                          DestColor.Channels.Red);
  NewColor.Channels.Green = clipByte(SourceTable->Values[SourceColor.Channels.Green] +
                            DestColor.Channels.Green);
  NewColor.Channels.Blue = clipByte(SourceTable->Values[SourceColor.Channels.Blue] +
                           DestColor.Channels.Blue);
  return NewColor.Value;
}

Explanation

Alright, there you have it. Additive Blend and Subtractive Blend functions. They’re actually faster than the alpha blend function, believe it or not. You COULD remove the clipByte call, but then you’d have tons of overflow problems, so just leave it there. I may have screwed up the algorithm a bit, as these aren’t the actual functions I use, I rewrote these from scratch.

Hopefully this article has shown you a few ways to optimize your code, and maybe even given you insipriation for how to optimize one of your graphics routines. If you like the article, have any questions, or would like to point something out to me, feel free to email me at janusfury@citlink.net.

Also, I might be writing another article soon on blitters and more graphics operations, if I get enough response to this article.