In Memory Data Compression and Decompression
Part One - Compression
This Sweet Snippet will show you how easy it can be to perform compression/decompression between data buffers in memory using the zlib library. We will take the easy route and build a simple example application that reads the contents of a file into memory, compresses that data, and then writes it back out to a file. In the second part we will take the output from part one, decompress it, and write it back out to disk so we can check the results.
What you will need:
Zlib provides two different functions for in-memory buffer-to-buffer compression, so let's have a look at them. Located in zlib.h at line 876 (the exact line number varies between zlib versions) you will find the following declaration:
int ZEXPORT compress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
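dest - Pointer to the destination buffer which will receive the compressed data.
destLen - On entry, this must hold the total size in bytes of the destination buffer. After the function has returned, it holds the actual size in bytes of the compressed data.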
source - Pointer to the source buffer which contains the data to be compressed.
sourceLen - Length of the source data in bytes.
This function is pretty simple to use - pass it pointers to two memory buffers, one containing the source data and one empty buffer for the compressed data, along with their sizes. But what if you'd like a little more control over exactly how the data is compressed? For that you would need to use the following function instead, which is at line 891 in zlib.h:
int ZEXPORT compress2(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen, int level);
The parameters are identical to compress except for the addition of a new one: level. The value of this parameter determines how the data is compressed, allowing you to trade off speed against compression ratio. The level can be any integer from 0 (no compression) to 9 (maximum compression), or one of the following named constants:
Z_BEST_SPEED (1) - sacrifices compression ratio for improved speed.
Z_BEST_COMPRESSION (9) - gains improved compression ratios but at a cost in execution speed.
Z_DEFAULT_COMPRESSION (-1) - a compromise between compression ratio and speed of execution.
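Both of these functions return Z_OK on success. On failure they return an error code that says a little more about why the call failed: Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if the destination buffer was too small, and, for compress2 only, Z_STREAM_ERROR if the level parameter is invalid.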
Now that you know what functions we can use, let's go through a simple example:
//requires <stdio.h>, <stdlib.h> and <zlib.h>

//input and output files
FILE *FileIn = fopen("FileIn.bmp", "rb");
FILE *FileOut = fopen("FileOut.dat", "wb");

//get the file size of the input file
fseek(FileIn, 0, SEEK_END);
unsigned long FileInSize = ftell(FileIn);

//buffers for the raw and compressed data
void *RawDataBuff = malloc(FileInSize);
void *CompDataBuff = NULL;

//zlib states that the destination buffer must be at least 0.1%
//larger than the source buffer plus 12 bytes to cope with the
//overhead of zlib data streams; allowing 10% here is more than enough
uLongf CompBuffSize = (uLongf)(FileInSize + (FileInSize * 0.1) + 12);
CompDataBuff = malloc((size_t)(CompBuffSize));

//read in the contents of the file into the source buffer
fseek(FileIn, 0, SEEK_SET);
fread(RawDataBuff, FileInSize, 1, FileIn);

//now compress the data; DestBuffSize must hold the size of the
//destination buffer on entry and receives the compressed size on return
uLongf DestBuffSize = CompBuffSize;
compress2((Bytef*)CompDataBuff, &DestBuffSize, (const Bytef*)RawDataBuff, (uLong)FileInSize, Z_BEST_COMPRESSION);

//write the compressed data to disk
fwrite(CompDataBuff, DestBuffSize, 1, FileOut);

//close the files and release the buffers
fclose(FileIn);
fclose(FileOut);
free(RawDataBuff);
free(CompDataBuff);
I've not included any error checking in the above code for reasons of clarity; this is something you would obviously want to include in your own applications.
Part Two - Decompression
Having compressed data is of no use to anyone without a way of decompressing it back to the original form. Fortunately zlib provides the following utility function to decompress a data buffer in memory:
int ZEXPORT uncompress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
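dest - Pointer to the destination buffer which will receive the decompressed data.
destLen - On entry, this must hold the total size in bytes of the destination buffer. After the function has returned, it holds the actual size in bytes of the decompressed data.
source - Pointer to the source buffer which contains the data to be decompressed.
sourceLen - Length of the compressed data buffer in bytes.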
Unlike its compression counterpart, there is only a single version of the decompression function since there is not much customisation you can apply to decompression - you generally want the function to operate as fast as possible. The uncompress function returns Z_OK on success, and Z_MEM_ERROR or Z_BUF_ERROR under the same conditions as its compression counterparts. It can also return Z_DATA_ERROR if the input data is corrupted or incomplete.
Now let's move on to an example of how to use the above function to decompress the data from the file in part one before writing the original contents back out to disk:
//requires <stdio.h>, <stdlib.h> and <zlib.h>

//the input file, this is the output file from part one
FILE *FileIn = fopen("FileOut.dat", "rb");

//output file
FILE *FileOut = fopen("OrigFile.bmp", "wb");

//get the file size of the input file
fseek(FileIn, 0, SEEK_END);
unsigned long FileInSize = ftell(FileIn);

//buffers for the raw and uncompressed data
void *RawDataBuff = malloc(FileInSize);
void *UnCompDataBuff = NULL;

//read in the contents of the file into the source buffer
fseek(FileIn, 0, SEEK_SET);
fread(RawDataBuff, FileInSize, 1, FileIn);

//allocate a buffer big enough to hold the uncompressed data, we can cheat here
//because we know the file size of the original
uLongf UnCompSize = 482000;
UnCompDataBuff = malloc(UnCompSize);

//all data we require is ready so decompress it into the destination buffer,
//the exact size will be stored in UnCompSize
uncompress((Bytef*)UnCompDataBuff, &UnCompSize, (const Bytef*)RawDataBuff, (uLong)FileInSize);

//write the decompressed data to disk
fwrite(UnCompDataBuff, UnCompSize, 1, FileOut);

//close the files and release the buffers
fclose(FileIn);
fclose(FileOut);
free(RawDataBuff);
free(UnCompDataBuff);
Again error checking has been removed for this example; we also use a fixed file size for the uncompressed data since we know how big the original file is. Ideally you would want to store the size of the original uncompressed data along with the actual data itself for use when decompressing it.
That sums up compression between buffers in memory. The code for compression/decompression is ideally suited to being wrapped up in utility functions that hide away the details of buffer allocation, checking return values and so on.