Contents

Number
Representation
Bitwise Combinational
Operators
Bitwise Shifts
Uses For Bitwise
Operators

Printable version
Discuss this article
in the forums

Uses For Bitwise Operators

Bitwise operators have two main applications. The first is using them to combine several values into a single variable. Suppose you have a series of flag variables which will always have only one of two values: 0 or 1 (this could also be true or false). The smallest unit of memory you can allocate to a variable is a byte, which is eight bits. But why assign each of your flags eight bits, when each one only needs one bit? Using bitwise operators allows you to combine data in this way. The Win32 and DirectX both make liberal use of this. The second application for bitwise operators is that you can use them to accomplish certain arithmetic operations. We'll take a look at both.

Extracting and Clearing Values

A perfect example for when you'd want to combine multiple values into a single variable is if you're doing graphics work with 32-bit color. In a 32-bit color dword, there are four distinct values. The low byte (bits 0 through 7) is the value for blue. The next most significant byte is a value for green, then a byte for blue, and finally, the high byte is an alpha (transparency) value. So the color dword looks like this in memory:

AAAA AAAA RRRR RRRR GGGG GGGG BBBB BBBB

Now, suppose you have a 32-bit integer called dwColor, and you want to extract the value for green. How would you do it? What you need is a way to eliminate the other three bytes, and leave the red byte untouched. Recall the truth table for the bitwise AND. If you remember, ANDing any bit with 0 yields 0, and ANDing any bit with 1 yields the original bit. So what you do here is define a value called a mask, which has 0s where you want to erase information, and 1s where you want to save information. Since you want to extract the red byte, your mask would look like this:

0000 0000 1111 1111 0000 0000 0000 0000

Of course, you can't write binary numbers directly into C code, so you have to convert it into hexadecimal. But that's easy, remember? In this case, the hex equivalent is 0x00FF0000. If we use the bitwise AND on dwColor and our mask, we get the following result:

dwColor:   AAAA AAAA RRRR RRRR GGGG GGGG BBBB BBBB
mask:    & 0000 0000 1111 1111 0000 0000 0000 0000
         -----------------------------------------
result:    0000 0000 RRRR RRRR 0000 0000 0000 0000

That's great, but there's just one problem. To use the red byte by itself like we want, it would have to be the low byte. But it's not -- it's 16 bits up in the dword. So, what do you think we learned those shift operators for? :) All we need now is a shift right by 16 places, and we're all set:

Previous:  0000 0000 RRRR RRRR 0000 0000 0000 0000 
Shift:                                       >> 16
           ---------------------------------------
Result:    0000 0000 0000 0000 0000 0000 RRRR RRRR  ==  RRRR RRRR

We've done exactly what we wanted to do: we extracted the red byte from the full color dword. This example can be applied to virtually anything that uses bit fields to store a number of values in a single variable. All you have to do is make sure your masks are set correctly, and you shift by the correct number of places. The mask should contain a 1 wherever you want to keep information, and a 0 wherever you want to clear it. As one more example, I'll show you how you would come up with the masks for a 16-bit color word.

Color word:  RRRR RGGG GGGB BBBB
Red mask:    1111 1000 0000 0000  ==  0xF800
Green mask:  0000 0111 1110 0000  ==  0x07E0
Blue mask:   0000 0000 0001 1111  ==  0x001F

Note that instead of clearing all the fields but one so you can use that by itself, you can also use AND to leave all the fields except one as they are, clearing only that one. For example, you could alter a 32-bit color dword by clearing its green byte. But, then what happens if you want to set that value to something else without altering the rest of the color word? You could accomplish that with the bitwise AND operator and a number of shifts, but it wouldn't be the most efficient way. To easily splice new values into a larger variable, we need to employ the bitwise OR.

Inserting and Combining Values

Now our situation is the reverse to what we were just doing. Instead of extracting a byte from a color dword, suppose we want to reset one. Maybe we have a color dword that represents the color (214, 53, 240), and we want to change it to (214, 166, 240). The first step is to clear the green byte to 0, which we learned how to do in the last section. To see how to rewrite that byte, consider the truth table for the bitwise OR. Remember that any value ORed with 0 is that value. So we must create a new mask to use. It will have zeroes wherever the color dword is already defined, and it will have an actual color value wherever the color dword has a 0. If you didn't follow that, this should clear it up:

dwColor:   AAAA AAAA RRRR RRRR 0000 0000 BBBB BBBB
mask:    | 0000 0000 0000 0000 GGGG GGGG 0000 0000
         -----------------------------------------
result:    AAAA AAAA RRRR RRRR GGGG GGGG BBBB BBBB

So in this case, the mask is the green byte, located at the appropriate position so that it merges correctly with the color dword. As before, we can use a bitwise shift to shift the green byte into the position we want it in. In the example above, the green byte is located eight bits above the low byte, so the shift operation you'd use would look like this:

Previous:  0000 0000 0000 0000 0000 0000 GGGG GGGG  ==  GGGG GGGG
Shift:                                        << 8
           ---------------------------------------
Result:    0000 0000 0000 0000 GGGG GGGG 0000 0000

And that's it! What we've done so far with the bitwise AND and OR operations is enough for you to fully manipulate any values divided into bit fields, such as color words. The other place this comes in handy is when you're designing a function that can take a number of flags as arguments, and you want to combine all of those flags into a single parameter to the function. The only thing you need to do is make sure you define the various flags so that their binary values don't have any 1s in common. Then you can OR them together to create a unique combination of flags. For example, take a look at the following function call:

Animate(lpdds, 8, ANIM_LOOP | ANIM_MAXSPEED | ANIM_LINK);

This is an example of a function call you might use to start an animation in a game engine. Note that the last parameter to the function has three constants combined using the bitwise OR operator. The definition of those constants might look like this:

#define ANIM_LOOP        1    // (0000 0001)
#define ANIM_ONCE        2    // (0000 0010)
#define ANIM_MAXSPEED    4    // (0000 0100)
#define ANIM_MINSPEED    8    // (0000 1000)
#define ANIM_CUSTSPEED  16    // (0001 0000)
#define ANIM_LINK       32    // (0010 0000)
#define ANIM_LINKALL    64    // (0100 0000)

Note that the constants are defined as successive powers of 2, so that each has only one bit set to 1. This is so any combination of these values will yield a unique result. If you defined them as consecutive integers, you would get repeated values. For example, 1 OR 2 is 3, but 1 OR 3, 2 OR 3, and just 3 by itself all come out to 3 as well. So the function receives a value of 3, it won't know what combination you mean. To test to see if one of the flags is present, you just try to extract it with an AND. If the result is nonzero, the flag is set. So the body of the function might look like this:

int Animate(LPDIRECTDRAWSURFACE lpdds, int nFrames, DWORD dwFlags)
{
    // test for looping
    if ((dwFlags & ANIM_LOOP) > 0)
        anim.bLoop = TRUE;

    // test for maximum speed
    if ((dwFlags & ANIM_MAXSPEED) > 0)
        anim.nSpeed = MAX_SPEED;

    // ...and so on
}

As I mentioned before, the Win32 API and DirectX both have a great number of functions that let you combine flags in this way, and this is how they accomplish it. If you look through your Windows headers, you'll find those flags defined as powers of 2 for that very reason. Very useful stuff.

Swapping Variables

If you're ever writing some code that needs to fit in a very tight amount of memory (such as a piece of assembly language), you might come across a situation where you want to swap two variables without having to use a third. The XOR operator provides a very cool way to do this. This works based on two facts. First, the XOR operation is commutative. That is, X ^ Y is the same thing as Y ^ X. Second, as we saw earlier, anything XORed with itself yields 0. With that in mind, suppose we have two constants, called CONST_A and CONST B. Take a look at the following code fragment:

                    //  Value of x                         Value of y
                    // -----------------------------------------------
int x, y;           //  0                                  0
x = CONST_A;        //  CONST_A                            0
y = CONST_B;        //  CONST_A                            CONST_B
x = x ^ y;          //  CONST_A ^ CONST_B                  CONST_B
y = x ^ y;          //  CONST_A ^ CONST_B                  CONST_A ^ CONST_B ^ CONST_B == CONST_A ^ 0 == CONST_A
x = x ^ y;          //  CONST_A ^ CONST_A ^ CONST_B        CONST_A
                    //  == 0 ^ CONST_B == CONST_B          CONST_A
                    //  CONST_B                            CONST_A

Look this over for a minute so you can see exactly what's going on. The two variables begin as CONST_A and CONST_B, respectively. With two XOR operations, y becomes CONST_A ^ CONST_B ^ CONST_B, but CONST_B ^ CONST_B is of course 0, and CONST_A ^ 0 is simply CONST_A. Another XOR operation does the reverse to x, and in three quick statements, you have swapped the values of two variables, without using a third. Is that cool or what? :) Also, note that there is a ^= operator similar to the += and -= operators in C, so the code for swapping x and y comes down to this:

x ^= y;
y ^= x;
x ^= y;

Bitwise operations are extremely fast for the processor to handle, so this is nice and quick, as well as a way to rid yourself of a temporary variable, which can be very nice. All right, we're almost done. The last thing we can do is replace a few arithmetic operations with bitwise ones.

Replacing Arithmetic Operations

What happens when you take a binary number and multiply it by two? Let's write it out the long way to see exactly what goes on. First, choose a random binary number, say 0010 1101. Then write it as a sum of powers of two, like we saw at the beginning of this article:

0010 1101 = (1 * 2⁵) + (1 * 2³) + (1 * 2²) + (1 * 2⁰)

I have eliminated the cases where the bit is 0, because they just multiply out to 0 anyway so they don't affect the sum. Now, suppose we take this sum of products and multiply it by two. Here's what we get, again in the expanded notation:

0010 1101 * 2 = 2(1 * 2⁵) + 2(1 * 2³) + 2(1 * 2²) + 2(1 * 2⁰)
0010 1101 * 2 = (1 * 2⁶) + (1 * 2⁴) + (1 * 2³) + (1 * 2¹)

Notice what our result is -- since the original number was written as a sum of powers of two times the bits of the number, all that happens when we multiply the whole sum by two is that the exponents on the 2s increase by one each. See what I'm getting at? The result of the multiplication is still a sum of powers of two, and as such, it can be directly reduced to a binary number. Let's see what it is.

0010 1101 * 2 = (1 * 2⁶) + (1 * 2⁴) + (1 * 2³) + (1 * 2¹) = 0101 1010

Notice anything odd about the result? That's right -- it's the original number, shifted left by one position. This is because a shift left increases the weight on each bit by one place, which is the same as increasing the exponents on the 2s in expanded form, which is the same as multiplying by two. Thus, you can multiply by any power of two by shifting left the same number of places. For example, if you wanted to multiply a number by 16, which is 2⁴, you could simply shift it left four places. The following pairs of statements are all equivalent to one another:

x = y * 8;
x = y << 3;

x = y * 64;
x = y << 6;

x = y * 32768;
x = y << 15;

Pretty neat, isn't it? This is a fast way to accomplish multiplication by powers of two, using only a bitwise shift. If you've ever done any circuit design, you know that a bitwise shift is extremely fast, and so this can help you out. Newer processors' multiplication speeds are getting faster and faster, so this is less of a help than it used to be, but it's still an interesting thing to know.

Now, you might be asking, if shift left is equivalent to multiplication by two, is shift right equivalent to division by 2? The answer is yes! Shifting right simply drops the weight on each bit, meaning that the exponents on the 2s all decrease by one. Note that this is integer division only; you can't get a fractional value or a remainder out of this. Division in hardware is not the fastest thing in the world, though it is getting better, so this is again a good trick to know. These pairs of statements are equivalent to one another:

x = y / 4;
x = y >> 2;

x = y / 32;
x = y >> 5;

Finally, there is one more arithmetic operation that we can replace. Let's take our last example number, 0010 1101, and suppose we extract the last three bits, so the number is in two parts:

Part 1:    0010 1000
Part 2:  | 0000 0101
         -----------
Sum:       0010 1101

Part 1 of this number has had its lower three bits extracted, which means they are all 0s. So the smallest power of two in the longhand form of Part 1 is 8 = 2³, because bit 3 is the lowest bit set to 1. That means that Part 1 is evenly divisible by 8. Part 2 is of course less than eight, because only the three lowest bits have been extracted into it, so it can have a maximum value of 2³ - 1 = 7. Thus if you add Parts 1 and 2 together, to get our original number, and divide that number by eight, the result would be the same as if you were to simply divide Part 1 by eight. Have you figured out what Part 2 is yet? It's the remainder when you divide the sum by eight. That means that you can use an AND to replace a modulus operation if you're using a power of two.

You might want to re-read that to let it sink in a bit, because this one isn't quite as obvious as the last two. Consider this. You have a number you want to divide by 32. If you were to shift the number right by five places, this accomplishes the division, and the five least significant bits are thrown away. That means that those five bits (which we'll call x) had no effect whatsoever on the result of the division. All they represented was how close that original number was to reaching the next multiple of 32.

In summary, to accomplish a modulus operation on a number of the form 2ⁿ, simply mask off the lowest n bits using a bitwise AND. To do this, you use 1s for all n of those bits. But notice that we have the following identity:

2ⁿ⁺¹ - 1 = 2^k

In other words, the number we need to use as our mask in order to perform a modulus operation on 2ⁿ is simply 2ⁿ - 1. To illustrate this, the following pairs of statements are all equivalent to one another:

x = y % 8;
x = y & 7;

x = y % 32;
x = y & 31;

x = y % 256;
x = y & 255;

Conclusion

All right, we're all finished. We've accomplished quite a bit with this article. You've now been through a brief introduction to the various number systems and how they are used. You've also seen a number of useful applications of this knowledge, such as inserting and extracting values within a larger variable, performing fast arithmetic operations, and a neat trick to swap two variables quickly. If you have any comments on this article, or questions about anything in it, feel free to send me an E-mail at ironblayde@aeon-software.com. Happy coding!