8--bit RLE Compression

What is run length encoding?

Run Length Encoding, also known as RLE, is a method of compressing data that has a lot of "runs" of bytes (or bits) in it. A "run" is a series of bytes that are all the same. For instance, the string "THIS IS A VEEEEEEEEEEEEEEEEEEEEEEEERY INTERESTING SENTENCE" has a run of 23 'E's in it. This could be compressed in the following manner:

THIS IS A V23ERY INTERESTING SENTENCE

resulting in a savings of 20 characters. A further savings of one character can be realized if the sequence "23" is replaced by a single byte with the value 23.

However, if the text to be encoded is arbitrary, then it may contain numbers as well as letters, and bytes of all possible values. For this reason, there must be some way to let the decoder know when a compressed run is encountered, and when a sequence to be passed straight through is encountered. For this reason, the following file format was used:

  
========= tech info =========  
  
8 bit header version.  
  
File format:  
  
13 byte original filename, followed by  
  
[ 8 bit header + data ][ 8 bit header + data ][ 8 bit header + data ]  
etc..  
  
header:  
  
  bit 7         : 1 if following byte is a run  
  bit 6 - 0     : legnth of run (max 127, min 3)  
  
data: 1 byte : which character run consists of  
  
*** OR ***  
  
header:  
  
  bit 7         : 0 if following bytes are sequence  
  bit 6 - 0     : legnth of sequence (max 127)  
  
data:  (header AND 0x7F) bytes of data  
                : data bytes copied to output stream unchanged  
  
===============================

bugs:

None known

Nasty features :

When encoder reaches max run length, it is written out correctly, but is followed by a 1 length run of the next byte. Odd. Reason unknown.

Better compression could be achieved by having min compression length and sequence length understood to be 2. This would allow an "understood" multiplication of the seq_len or run_len by 2, since 1 is never used, allowing sequences of 254 bytes. This is not likely to give much better compression in most cases, and is left as an exercise for the reader. Implementing this requires fixing 1 above, too.

Author: atman%ecst.csuchico.edu@RELAY.CS.NET (internet) 1@9651 (WWIVnet) atman of 1:119/666.0 (fidonet)

Tell me hi if you use this program!

Date this article was posted to GameDev.net: 7/15/1999
(Note that this date does not necessarily correspond to the date the article was written)