IntroductionAfter writing the Game Programming Genesis series, I got a lot of E-mail from people asking me to clarify the bitwise operations I used in the sample programs. I use them all over the place in my code, enough that I don't really think about what I'm writing anymore, and so I neglected the fact that a lot of people don't see them used too often. This article is meant to be a complete introduction to bitwise manipulations using the C programming language. If you don't know C, no big deal. The ideas presented here are very general. The only things that will differ from one language to another are the characters used to represent the various operators. I will go over the combinational operators AND, OR, and XOR, and show you some of the things they are commonly used for. I will also go over bitwise shifts, and their applications. But before that, in case you're very new to this, I'm going to run through a discussion of how numbers are represented in binary form, and some basic terminology. If you've gone that far already, which many of you probably have, feel free to skip over this first part. Number RepresentationA number is a very abstract concept. Unlike physical objects, which are easily recognizable on sight, a number can be represented in an infinite number of ways. The representation we are used to is called decimal, or base 10. The first time is pretty familiar, but if you're reading this section, the second term may be new to you. To see why we use the term "base 10," let's take a look at an arbitrary number, say 5346. Read aloud, this is five thousand, three hundred, forty-six. We hear numbers like that so often that it's not immediately obvious, but this sounds a lot like a formula. Specifically: 5346 = (5 * 1000) + (3 * 100) + (4 * 10) + (6 * 1) Or, if we write it another way, we see that a decimal number is actually the sum of its digits multiplied by successive powers of 10: 5346 = (5 * 103) + (3 * 102) + (4 * 101) + (6 * 100) Do you see why the term "base 10" is used to describe the way we normally write numbers? Any integer can be written this way. Suppose we call the least significant digit (the digit in the ones place) "digit 0," and each successively higher digit is digit 1, digit 2, and so on. Then the full number is represented by digit 0 times 100, plus digit 1 times 101, and so on, up through digit n times 10n. The shorthand for the general case of an n-digit number is: dndn-1...d1d0 = dk * 10k This formula makes it clear why we call decimal representation "base 10" -- because 10 is the number used to weight the digits in order to give the number its value. Note also that there are 10 unique digits in the decimal number system. This is not coincidence. This is the way things must be to ensure that every integer has a representation, and that representation is unique. For example, consider if we had only the nine digits 0 through 8. There would be no way of representing the concept of ninety in the decimal system. The largest two-digit number would be 88, and the smallest three-digit number would be 100. Clearly this is not acceptable. Conversely, suppose we had eleven digits: 0 through 9, and A, where A has a value of 10. Then the number ten could be written as "10" or as "A." Since we want a unique representation for each number, this is not acceptable either. The obvious question to ask now is, "Why do we have to use 10 as the base?" Quite simply, we don't! Any positive integer greater than two can be used as the base. (Note that if we tried to use a "base 1" number system, the only thing we could write would be strings of zeroes.) A more interesting question is to ask why base 10 happens to be the number system in common use. That, I'm not sure of. But it wasn't always that way. The ancient Babylonians used to use a base 60 number system. Sound crazy? Well, some remnants of that system can still be seen today. Why do you think we divide an hour into sixty minutes, and a minute into sixty seconds? :) Anyway, if we take our last formula, and replace the 10 with a generic base B, then we have the representation for a number in any base. A base is also called a radix, by the way, so you'll know what I mean if I use that term. Now, we've already seen the decimal system. Let's take a look at some other important number systems. BinaryThe binary, or base 2, number system is the one that computers use to represent numbers in memory. Since we're dealing with base 2, there are only two digits available, namely 0 and 1. Each digit is multiplied by successive powers of two to obtain the number's overall value. For example, consider the binary number 100101. The value of this number is: 100101 = (1 * 25) + (0 * 24) + (0 * 23) + (1 * 22) + (0 * 21) + (1 * 20) = 32 + 4 + 1 = 37 So, why do computers use binary instead of just using decimal like we do? In computer hardware, a binary digit is represented by the presence of electric current. If the level of electricity is above a certain level, the digit is regarded as a 1. Otherwise, the digit is a 0. Obviously, building hardware that can differentiate between ten levels of current would be considerably more difficult and more expensive than this simple two-digit model. Hence, computers store everything in binary. A single binary digit is usually called a bit, and eight bits make a byte, which is the fundamental unit of storage in memory devices. Because a byte has eight bits, and each bit can take one of two values (0 or 1), the number of distinct values a byte can hold is 28 = 256. Two bytes together are called a word, which is a 16-bit value. A word can have 216 = 65,536 distinct values. Finally, two words together are called a double word or dword, which is a 32-bit value. Most processors in common use today are 32-bit, meaning they are built to normally operate on 32-bit numbers. This is why the int data type currently defaults to 32 bits in Win32 compilers, whereas an int is 16 bits in a realmode DOS compiler. Sometime in the not-too-distant future, we'll be writing 64-bit code instead, where the default size for an integer will be a quad word or qword. But for now, the dword is the standard. To see how a dword is broken up, take a look at the following diagram. Bits are labeled 0 through 31, starting with the rightmost (least significant) bit. The least significant bit (often abbreviated LSB) is also called the low bit. The grouping of bits 0 through 7 is called the low byte, and bits 0 through 15 are called the low word. This terminology indicates that the bit, byte, or word in question is at the least significant end of the dword. Similarly, the most significant bit (MSB) is called the high bit, bits 24 to 31 are called the high byte, and bits 16 to 31 are called the high word. These terms are used often, so remember them! Now that you have an idea of what binary numbers are, let's take a look at one other system that's used often in computer programming, hexadecimal. HexadecimalThe hexadecimal number system uses base 16, which means that there are sixteen distinct digits. Obviously we're only used to having ten, 0 through 9. In hexadecimal, the character A has a value of 10, B has a value of 11, and so on through F, which has a value of 15. Let's see an example of determining the value of a hexadecimal number: 3FC = (3 * 162) + (F * 161) + (C * 160) = (3 * 162) + (15 * 161) + (12 * 160) = 768 + 240 + 12 = 1020 The reason hexadecimal is so frequently used in programming is that it's very easy to translate between hexadecimal and binary, whereas converting from decimal to binary and back is kind of a pain. The reason it's so easy to convert between base 16 and base 2 is that 16 is a power of two. Specifically, 16 = 24. Why is this significant? Well, a group of four binary digits can take exactly 24 values, which means that each hexadecimal digit corresponds to exactly four binary digits. Since ten is not a power of two, the conversion is not so easy in decimal. Thus, when programmers want to use a specific binary number, they write it in hexadecimal. In C, you can recognize a hexadecimal number because it is always prefixed by "0x." For example, this C statement assigns the value 1020 to a variable, by using its hexadecimal equivalent: nValue = 0x3FC; There is no such prefix that will allow you to write a binary number directly, which is why hexadecimal is used is used. The following table shows the binary equivalents for each of the sixteen hex digits. You should try to internalize this. At first you'll have to think about what you're doing as you convert binary to hex and back, but after awhile it will become second nature.
We can use this table to show a quick example, finding the binary equivalent of the hex number 3FC. Simply convert each hex digit to its binary equivalent, and you're all done: 3FC = 0011 1111 1100 = 1111111100 Note that we can drop the leading zeroes, because they don't affect the value of the number, just as the decimal numbers 345 and 000345 are the same thing. All right, now you're all caught up on binary and hexadecimal, and you know why they're used. Let's get into the focus of this article, bitwise operations. |