Upcoming Events
Unite 2010
11/10 - 11/12 @ Montréal, Canada

GDC China
12/5 - 12/7 @ Shanghai, China

Asia Game Show 2010
12/24 - 12/27  

GDC 2011
2/28 - 3/4 @ San Francisco, CA

More events...
Quick Stats
97 people currently visiting GDNet.
2406 articles in the reference section.

Help us fight cancer!
Join SETI Team GDNet!
Link to us Events 4 Gamers
Intel sponsors gamedev.net search:

Preface

In this article I intend to address the basics, and the slightly less basic aspects, of the CRC32 algorithm. This is a relatively straightforward article, most of all the source is provided - demos for both Borland C++ and MSVC++ .NET (which should be easily ported to MSVC++6) are available with this article as well.

A slight comprehension of math (polynomial division) would be helpful, but I am writing this article so that it is not required. You will however need to understand C++ - this, of course, is a given.

Why use CRC32?

Ok, so as one of my smaller projects I have decided to write an online application updater that simply generates a unique signature for a set of files, compares these signature values with a master list on the server, and updates any files that do not match.

Now there are a couple of ways that I can check the integrity of these files and decide as to whether or not they require an update from the master server. File time/date stamps, file size, and CRC32 come to mind. There are some obviously more complex ways such as databasing the file entries; however there is no need to be overly elaborate. The same goes for most uses of CRC32 in gaming. Besides, most of the more complex algorithms take a lot more computation than CRC32.

CRC32, of all the choices present, is most feasible since you do not have to worry about files that are of the same size when modified (very common with bitmap images, and a few other common game file types), and we do not have to worry about inaccurate file time/date stamps due to modified files. Though CRC32 is not a perfect algorithm and there is always a chance that 2 files will have the same CRC32 value it is rare enough that CRC32 makes a perfect candidate for file signature generation.

A CRC32 value is a 32bit (4byte) number (also known as a signature) that uniquely represents a specified chunk of data. The advantage with CRC32 for signature checking is that the algorithm is designed so that a slight change in the file (be it a change, a delete, or an insert) should cause a large change in the CRC32 signature value. CRC32 has 2^32 possible values; a total of 4.294967E+09.

CRC32 is very useful in gaming for these reasons as it allows us to do many things. They can be used heck files to see if they have been modified to help prevent game hacking, or they can be used to check files or data that is sent over an internet connection during game play such as a map file.

How does CRC32 work?

CRC32 is a somewhat involved process, and relies on a lot of mathematical reasoning. But this is generally how it goes:

First we are going to take a polynomial value, and with this value we generate a CRC32 lookup. I will, towards the end of this article, demonstrate the proper way to generate a lookup table with a given polynomial. What we will do next is take the data we want to generate a CRC32 signature for and using a CRC32 lookup function generate an updated CRC32 value for each byte in the file, string, or structure we want to verify. This is done for every byte within the string, structure, or file. When complete the most updated value becomes the signature.

The heart of the CRC32 algorithm is simply a system that takes the present byte to compute from the data set, and the present CRC32 value that we have and performs a lookup into our lookup table. If there is no present CRC32 value available we will use 0xFFFFFFFF. With a little bit of math, more specifically a polynomial division, the function will return an updated CRC32 value.

The implementation is very simple.

The Class

This is what our class is going to look like.

class ECRC32 {
public:
      static DWORD CRC32ByString( LPCTSTR, DWORD& );
      static DWORD CRC32ByFile( LPCTSTR, DWORD& );
      static DWORD CRC32ByStruct( BYTE*, DWORD, DWORD& );
 
private:
      static bool GetFileSize( HANDLE, DWORD& );
      static inline void GetCRC32( BYTE, DWORD& );
 
      static DWORD CRC32Table[256];
};

Simply we have a function for each type of data type we plan on dealing with that we will generate a CRC32 signature. We also have two private functions, one the calculate the size of the file - needed internally for the algorithm to work properly, and the other to perform the math and update the CRC32 value for the specific byte.

There is also one static array, this is simply for us to store the CRC32 lookup table. You do not need to actually store this table permanently if you want. With a constructor and destructor you can generate and cleanup a table only when you plan on using it. For our purposes it is more sensible to use the table since this can greatly reduce the processing time.

Generating the lookup table

To generate a lookup table we need to specify a polynomial value to use in which will give us the values we use in our lookup table. The polynomial value you use will be a value that should theoretically reduce the likeliness of duplicate signatures in your data; however the math behind one of these values is overly complex and not within the scope of this article. This is ok though since we have a value already picked out for us. The value that we will use to generate our lookup table will be 0x04c11db7.

Consequently this polynomial is also used by many other applications and systems, such as WinZip, PkZip, and Ethernet.

Note: You will need to generate the lookup table at least once even if you are using a hard coded lookup table. Unless of course you use the lookup table provided with my demo.

Before we continue, CRC standard states that we have to reflect the values in our lookup table; this can be done by also reflecting the polynomial. In this case 0x04c11db7 becomes 0xedb88320. Though some equations will reflect the table values as they are generating I am going to reflect the polynomial. It all ends up being the same in the end anyway, this just means a few less lines of code.

Note: Reflecting is a simple process of reversing the binary pattern of a given binary segment. For example 10100111 would become 11100101 when reflected.

DWORD dwPolynomial = 0xEDB88320;
DWORD* CRC32Table = NULL;
 
CRC32Table = new DWORD[256];
 
DWORD dwCRC;
for(int i = 0; i < 256; i++)
{
      dwCRC = i;
      
for(int j = 8; j > 0; j--)
      {
            if(dwCRC & 1)
                  dwCRC = (dwCRC >> 1) ^ dwPolynomial;
            else
                  dwCRC >>= 1;
      }
      
CRC32Table[i] = dwCRC;
}

Simply cycle through all 256 entries and generate a lookup value based on the polynomial lookup table position.

Depending on your situation you may or may not want to spend CPU cycles to keep regenerating the lookup table every time you want to generate a signature. If this is the case simply write this code into a separate application, dump the table to a file, to your screen, or where ever you find is most convenient, and copy and paste the values right into the table. Though you have the cost of 256*4 bytes of memory (which really is not a whole lot) you will drastically improve calculation time without the need to constantly recalculate the lookup table.

For those of you who are attempting to generate a lookup table using the polynomial listed above the first 14 values should look like this:

0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA,
0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3,
0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988,
0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91




Page 2

Contents
  Page 1
  Page 2

  Source code
  Printable version
  Discuss this article