Run Length Encoding compressor program, 16 bit header version
by Shaun Case 1991 in Borland C++ 2.0 with sizeof (int) == 2
This program and its source code are Public Domain. This program should be portable to any machine with 2 byte short ints and 8 bit bytes.
What is run length encoding?
Run Length Encoding, also known as RLE, is a method of compressing data that has a lot of "runs" of bytes (or bits) in it. A "run" is a series of bytes that are all the same. For instance, the string "THIS IS A VEEEEEEEEEEEEEEEEEEEEEEEERY INTERESTING SENTENCE" has a run of 23 'E's in it. This could be compressed in the following manner:
THIS IS A V23ERY INTERESTING SENTENCE
resulting in a savings of 20 characters. A further savings of one character can be realized if the sequence "23" is replaced by a single byte with the value 23.
However, if the text to be encoded is arbitrary, then it may contain numbers as well as letters, and bytes of all possible values. For this reason, there must be some way to let the decoder know when a compressed run is encountered, and when a sequence to be passed straight through is encountered. For this reason, the following file format was used:
========= tech info =========
16 bit header version.
File format:
13 bytes : original filename, followed by:
[ 16 bit header + data ][ 16 bit header + data ][16 bit header + data ]
etc..
header:
[lo byte][hi byte] ==> turn into 16 bit int ==>
bit 15 : 1 if following byte is a run
bit 14 - 0 : length of run (max 32767, min 4)
data: 1 byte : which character run consists of
*** OR ***
header:
[lo byte][hi byte] ==> turn into 16 bit int ==>
bit 15 : 0 if following bytes are sequence
bit 14 - 0 : length of sequence (max 32767)
data: (header & 0x7FFF) bytes of data
: data bytes copied to output stream unchanged
===============================
bugs:
None known
Nasty features :
- In 8 bit version, when encoder reaches max run length, it is written out correctly, but is followed by a 1 length run of the next byte. Odd. Reason unknown. This version probably does it too, but I haven't tested it.
- Better compression could be achieved by having min compression length and sequence length understood to be 4. This would allow an "understood" multiplication of the seq_len or run_len by 4, since 1 is never (should never be) used, allowing sequences of 131068 bytes. Implementing this requires fixing 1 above, too.
- This 16 bit "improvement" is very unlikley to give better compresssion than the 8 bit version for data consisting of executable code, English or or other natural language text, netnews, BBS messages, or just about anything else other than large simple graphics images or other types of data that consist almost entirely of very long runs of characters. In most cases, this program will actually INCREASE the size of the data file slightly, due to the overhead needed for sequences between runs. It also runs about 10% slower than the 8 bit version due to the use of shifts and logical operators. Mainly it was an experiment that yielded disappointing results. However, here it is, for you to use and abuse.
- 0Portability -- this program is "pretty portable." The stuff dealing with dos filenames should be removed, but that should be about it, unless you want to make the datafiles portable too, which is another story.
- There is a lack of error checking when bytes are written out to the output file. Bytes are written in lots of places, and I didn't get around to doing error checking on any of them, since my application was going to run only in memory, and all the fputc() calls were going to be changed to pokes or pointer-directed stuffing. If you are going to use this program in a disk based application, you should really put EOF checks after each fputc().
Author: atman%ecst.csuchico.edu@RELAY.CS.NET (internet)
1@9651 (WWIVnet)
atman of 1:119/666.0 (fidonet)
Tell me hi if you use this program!
Discuss this article in the forums
Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)
See Also:
Compression Algorithms
© 1999-2011 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
Comments? Questions? Feedback? Click here!
|