Simple Compression using an LZ buffer
Part 3 Revision 1.d: An introduction to compression on the Amiga
by Adisak Pochanayon

Freely Distributable as long as reproduced completely.
Copyright 1993 Adisak Pochanayon

-------------------------------------------------------------------------

In Part 1 (now obsolete) and Part 2 of the Amiga Compression series, I covered a new and extremely fast RLE (run-length encoding) method for graphics. This article delves further into the world of compression on the Amiga by covering my latest experiment in compression techniques: a simple LZ history compression algorithm.

In the second part of this article, I describe a Compressor (and DeCompressor) program in C. It is not written for optimal speed, simply because I wanted the code to be readable; the place where speed really matters is decompression, which occurs in the super-fast assembly routine. Using this code along with unpacker.asm will let you easily include any raw compressed data in your own programs.

Important Notes and Disclaimers

NOTE 1: This is a "beta" document. I welcome your suggestions, but before you employ the methods mentioned in this document, please realize that I hope to provide a completely new compressor within a week or two that should be an order of magnitude faster. I may also end up slightly changing the compression technique to provide better compression. I welcome suggestions to improve the "readability" of the article as well.*

*Both of these features have now been added: hashing with collision detection in the compressor, the capability to alter the TAGs easily, and a new unpacking algorithm.

NOTE 2: A very common comment I have received is "Why not use the XPK or PowerPacker libraries?" I respond with several reasons. (I also removed a paragraph that some compression-library programmers found insulting -- please understand that compression libraries are good for the Amiga, but they are not always exactly what a programmer wants.)

The first is simply that I wish to provide an alternative method. Although I have been "urged" to let professionals handle compression, I would argue that any "amateur" can help the field. My compression algorithm has consistently beaten ZOO 2.1 and ARC 5.0 for compression ratios, and decompression from RAM: to RAM: is realistically more than twice as fast as any ZOO encoding. When SLZ is used within a program and compared to ZOO used on the RAM: file system (an unfair test, I know), SLZ beats ZOO by a speed factor of approximately 15. So if you feel that a technique that achieves better compression than the widely used ZOO and decompresses twice as fast is for amateurs, then you might be right. Then again, you might be wrong. I believe this is a viable "alternative" for Amiga programmers.

The second involves distribution: PowerPacker cannot be distributed with commercial products, and modules of XPK may likewise be restricted at the discretion of their authors. This algorithm is being released as Freely Distributable without restrictions on commercial applications, although I retain the copyright. Also, many programmers would rather have direct control over their data and not go through the overhead of a library, regardless of how small that overhead is.

Another factor, of course, is speed. SLZ has one of the fastest LZ unpacking algorithms available. This is in part due to the very little overhead imposed on the routine.
The original asm encoding, plus the enhancements made by Jesse Michael, plays no small role in the speed factor though. Combine that with the simple yet elegant data format and you get some awesome speed. To quote U.D. Mueller, "Good implementations of the algorithm used by ZOO are about an estimated half as fast as yours (check xpkBLZW)." So if you want TWICE the speed, drop ZOO -- go SLZ.

The final reason is that even if you decide not to use SLZ, I hope this article provides an educational benefit to Amiga programmers. I have certainly learned a lot from my experiments in data compression. There is very little "public" information on data compression available, even if you read the compression newsgroups, and I have seen several postings on the c.s.a.p. newsgroup recently requesting further information on compression algorithms. Given the large positive response to the RLE compression article earlier this year, I thought an LZ article might also be received positively.

NOTE 3: I have been told by several people that the compression algorithm I created is almost identical to the LZRW algorithm (partially patented). One person even sent me the specs of LZRW by e-mail. Indeed, the two algorithms are similar. However, there are some distinct differences. SLZ attains better compression than LZRW by using exhaustive history searching. SLZ also now uses 16-collision HASHING for compression, compared to the single HASH entry used by LZRW. Furthermore, SLZ codes compressed strings slightly differently, both in the END-TAGs and in the compressed string length. By doing so, SLZ again allows better compression (with a 4096-byte history, SLZ allows string lengths to vary between 3 and 17, while LZRW has a maximum of 15). Due to the data representation differences and the differences in the actual compression method, SLZ will *ALWAYS* obtain better compression ratios than LZRW. Yet another difference is the ability to alter the TAG in SLZ so that a 2048-byte history and a 33-byte maxstring can be achieved. Finally, SLZ has a (currently) faster decompression algorithm than LZRW. Because of all these differences, I maintain that SLZ is a new algorithm that is simply in the same family as LZRW (after all, I did say it is an LZ compression routine). I don't believe I have violated any patents with SLZ or broken any copyrights, as I genuinely came up with this idea on my own. If I had access to other good compression source code, I wouldn't have written SLZ to begin with. However, if I am violating any such ownership of intellectual property, I will remove SLZ from Freely Distributable status, ask anyone who has a copy of this article to destroy it, and forbid my code from being used for any purposes on the Amiga.

Comments, questions, spelling errors, or grammar police? Let me know :)

Improvements from 3.0a: I have more than doubled the speed of compression on text and many other forms of data by modifying my HASH code to allow up to 16 HASH collisions to be detected before brute-force exhaustive searching occurs. Users who do not use the hash option will not notice the speedup in the C routines. I have also made the files more flexible to modify the TAG/CS bit-orderings. This allows users to customize the files for greater compression by trading off the maximum length of compressed strings against the length of the compression history. I have found that a history of 2048 instead of 4096 works much better for many graphics images, which have lots of "line"-oriented repeats, because it allows longer repeat strings to be compressed with a single CS tag.
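As a rough illustration of that tradeoff (the #define names here are hypothetical; the actual constants appear later as the CS_MASK/CS_LSL EQUs in the asm), the two quantities are tied to how the bits of the first CS byte are split:

    /* Hypothetical sketch of the TAG/CS tradeoff -- not the actual
       defines from SLZ.c.  CS_LSL is how many offset bits live in the
       first CS byte; the remaining bits hold the match length.       */
    #define CS_LSL   4                                /* 4 -> H=4096, 3 -> H=2048  */
    #define HISTORY  (1L << (8 + CS_LSL))             /* offset bits: 8 + CS_LSL   */
    #define MAXSTR   (((1 << (8 - CS_LSL)) - 1) + 2)  /* length bits: 8 - CS_LSL,
                                                         stored value plus two     */

With CS_LSL=4 this gives History=4096 and MaxString=17; with CS_LSL=3 it gives History=2048 and MaxString=33, exactly the two configurations discussed in this article.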
Many thanks to Jesse Michael of Haze of Epsilon (jdm@key880.rain.com) for rewriting the assembly. He unrolled my loops, freed a register, restructured my branching, and optimized literal copying, making the copying of literal strings up to three times faster (on a 68000). Basically, he built a Porsche out of VW parts (joke - the original Porsches were built with VW parts :). The asm code is similar to the original (same excellent comments) and still easy to read. The asm takes a little more space, but the emphasis of this compressor has always been the speed of decompression.

Thanks also go out to Urban Dominik Mueller (UDM) for pointing out some mistakes I made in my previous article and allowing me to correct them, and for pointing out some obvious optimizations that I had not implemented, such as unrolled loops. Finally, thanks to the 7-8 other people who responded with interesting suggestions, and to the person who sent me the SPECs on the similar LZRW algorithm.

Disclaimer: Neither Adisak Pochanayon, nor SilverFox SoftWare, warrants the use of this article for any means. The author holds no responsibility for the accuracy of the description of the routines or any adverse side effects they may cause when used on any machine.

-------------------------------------------------------------------------

Adisak Pochanayon
2525 University Avenue Apt #J
Madison WI 53706
608-238-2463
pochanay@cae.wisc.edu

So you've plugged in your RLE compressor from Parts 1 and 2 and found that including fast compressed graphics in programs is really easy to do. But you want to include help text or other data that isn't easily compressed by RLE methods. You don't want to give up the incredible speed of straight RLE, yet you want better compression.

One simple solution might be to use one of the fine compression libraries available on the Amiga. However, this isn't always the best solution. Some libraries restrict commercial distribution, other libraries don't provide source code, and still others require that you allocate complicated buffers and handles before using them. None of them is guaranteed to work if you have shut down the OS or multi-tasking, or are using nonstandard programming methods (not that I would ever advocate OS shutdowns). Even libraries that are simple to use are not always tolerant of "braindead" installation. If a library is missing, or is the wrong revision, your program won't run -- or worse, it could crash the machine.

For a beginning or intermediate programmer, the "best" solution might be the simplest solution: a very simple, compact routine that can easily be linked or added to programs and used without worrying about opening libraries, looking for include files, allocating buffers, or any other overhead. However, to be viable, the routine would have to feature very fast decompression and come with full source for any custom modifications that a more experienced programmer might wish to make.

Before deciding on an exact format for the compression, a programmer might want to examine the available algorithms. RLE, or run-length encoding, is great for graphics where you have many repeating bytes of data. That's why it was chosen as part of the IFF standard. However, RLE's compression of this article would be feeble at best, and the performance would be noticeably poorer than ZOO, ARC, or LHA. RLE's biggest advantage is its extremely fast speed (described in Part 2 of this series).
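For reference, the core of a byte-run decoder is only a few lines, which is why RLE unpacking is so fast. This is a generic sketch, not the exact format used in Parts 1 and 2:

    /* Generic RLE decode sketch -- NOT the format from Parts 1 and 2.
       A count byte n is followed either by one byte to repeat (high
       bit set) or by n literal bytes to copy straight through.       */
    void unpack_rle(const unsigned char *src, unsigned char *dst, long packed_len)
    {
        const unsigned char *end = src + packed_len;
        while (src < end) {
            int n = *src++;
            if (n & 0x80) {                 /* run: repeat next byte   */
                int count = n & 0x7F;
                unsigned char b = *src++;
                while (count--)
                    *dst++ = b;
            } else {                        /* literal: copy n bytes   */
                while (n--)
                    *dst++ = *src++;
            }
        }
    }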
A better method might be to reduce the "size" of the codes for ASCII characters that appear often (e, r, o, s, t, l, etc.) and increase the "size" of the codes for characters that do not appear often (z, q, x, etc.). Think of it as making the size of the code for each ASCII character related to its SCRABBLE (tm) score. This type of compression, known as HUFFMAN coding, is much better at compressing text than RLE methods. Unfortunately, Huffman coding and its variants can often be too slow for games or other programs that require extremely fast decompression. HUFFMAN coding also requires the creation and maintenance of structures to describe the codes, which can become quite confusing. By varying the codes as one traverses through a file, the compression can be improved even more, at the cost of greater file complexity. This is known as ADAPTIVE HUFFMAN compression.

Another method of compression is what is often called LZ (Lempel-Ziv) compression. This involves a "history" of some sort representing the last X characters read, and a pointer to a current character buffer that has yet to be compressed. The compression program looks for strings in the history matching the current string and replaces matches with a data TAG which describes the offset and length of the data in the history buffer. This is the type of compression that will be discussed in this article. Important note from UDM: this actually describes the LZ77 group of LZs. "There are other algorithms in the LZ family (like LZ78/LZW as used in 'compress') that work totally different."

Slightly more complicated is a compression method called LZH, or LZ-Huffman compression, which combines LZ compression with HUFFMAN encoding methods. LZH (or LH) often outperforms plain LZ. Adaptive LZH is even more complicated, but again can lead to better compression. Adaptive LZH is not always better, though: LhA uses static Huffman codes and usually outperforms LhArc, which uses dynamic adaptive encoding (UDM). However, these methods of compression are outside the scope of this article, which is directed mainly towards intermediate programmers.
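To make the LZ77 idea concrete before going further, here is a minimal brute-force matcher in C. It is only a sketch of the concept (the names and limits are hypothetical, not SLZ's actual search, which is described later):

    /* Minimal LZ77 illustration: find the longest match for the string
       at buf[pos] anywhere in the preceding history window.  A pure
       concept sketch -- no hashing, no output coding.                 */
    #define WINDOW   4096L    /* how far back we may look              */
    #define MAX_LEN  17       /* longest match we may encode           */

    static int longest_match(const unsigned char *buf, long pos, long buf_len,
                             long *match_pos)
    {
        long start = (pos > WINDOW) ? pos - WINDOW : 0;
        long i;
        int best = 0;

        for (i = start; i < pos; i++) {
            int len = 0;
            while (pos + len < buf_len && len < MAX_LEN &&
                   buf[i + len] == buf[pos + len])
                len++;
            if (len > best) {
                best = len;
                *match_pos = i;
            }
        }
        return best;    /* the caller emits an (offset,length) TAG if best >= 3 */
    }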
So which method is best? Well, I planned on including text, graphics, and other files in programs, with an emphasis on EXTREMELY FAST unpacking of compressed data. After examining all the methods available to me, I decided that the most flexible algorithm with enough speed would be a byte-oriented LZ compression method. I decided to eliminate CRCs (checks on the integrity of the data) since the data would be "included" within a program, and data integrity is most often a problem during telecommunications, where a program would normally be transferred as part of an "LZH" or "ZOO" archive. I also eliminated bounds checking since, from a programming perspective, I will always know the bounds of the data, or I can include a simple structure to describe the size of the data without adding to the complexity of the compressed data itself. These ideas led to my creation of a simple LZ compression method that I call SLZ (version 0).

After writing programs to implement my new compression technique, I found that compression was comparable to ZOO (v2.1) and ARC (v5.0). As a matter of fact, on the Fred Fish Library Contents-870, a 1,451,420-byte text, SLZ-0 outperformed ZOO by over 34K and ARC by over 40K !!!! Compression times for the SLZ-0 program are considerably longer than similar compression times with ZOO, however (the code is not optimized at all and features exhaustive searching in C, which is excruciatingly slow). On the other hand, using the asm decompression routines from within a program yielded decompression at about 15 times the speed of ZOO from the RAM: disk!!! (Please see Note 2 for more on this claim.)

Part of this incredible decompression speed comes from the simplicity of the encoding method. SLZ-0 works on a TAG-data format. First, an 8-bit TAG is read which describes the next 8 chunks of data to follow. For each chunk in the data, a bit in the TAG will be set or cleared. A set bit represents compressed data and a cleared bit represents uncompressed data. Each time a cleared bit is encountered, the unpacker simply copies a byte from source to destination and continues on to the next chunk. When a set bit is encountered in the TAG, the unpacker reads a byte from the source. If this byte is zero, the routine exits. Otherwise, a second byte is read. The combination of these two bytes yields the location and length of the data to copy, in the following form (O is offset, L is length):

    Byte 1:  O11 O10 O9 O8 L3 L2 L1 L0
    Byte 2:  O7  O6  O5 O4 O3 O2 O1 O0        (History = 4096)

The location of the data is determined by subtracting the offset from the current output buffer pointer (as compressed bytes are copied from output to output and not input to output). The length is simply the value represented plus two.

The compressor and decompression routine allow you to easily modify the TAG format by changing #defines/EQUs, giving a choice of History=4096 bytes/MaxString=17 or History=2048 bytes/MaxString=33. The normal default (I found it to be generally better on everything but bitmapped graphics, which have long repeat strings) is History=4096. Changing to History=2048 results in the following CS TAG:

    Byte 1:  O10 O9 O8 L4 L3 L2 L1 L0
    Byte 2:  O7  O6  O5 O4 O3 O2 O1 O0        (History = 2048)

Again, the location of the data is determined by subtracting the offset from the current output buffer pointer, and the length is simply the value represented plus two. However, if you change the compression, be careful to call the correct decompression routine.
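As a worked example with the default H=4096 format, consider the CS byte pair $23 $47 (values chosen arbitrarily for illustration):

    /* Worked example (H=4096): decoding the CS byte pair $23 $47.     */
    unsigned b1  = 0x23, b2 = 0x47;
    unsigned len = (b1 & 0x0F) + 2;           /* 3 + 2 = 5 bytes       */
    unsigned off = ((b1 >> 4) << 8) | b2;     /* $247 = 583 bytes back */
    /* The unpacker copies 5 bytes starting at (output pointer - 583)
       to the output pointer.                                          */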
Believe it or not, this simple compression scheme yields compression ratios similar to ZOO and decompression speeds that are several times faster.

A simple asm implementation of the unpacker is provided for use in programs which include SLZ-0 compressed data. To include compressed data in a file, simply compress it with the compressor and use a directive such as INCBIN from the CAPE 68K assembler to include the data. Then call this routine from C or ASM with the source and destination pointers. It's as easy as falling off a log :) Here's the asm (highly documented for non-asm programmers).

---------------------------BEGIN ASM----------------------------------------
            CODE, PUBLIC

***** VOID __regargs UnPackSLZ(UBYTE *from, UBYTE *to)
***** calling from assembly: a0 , a1
*****
***** This code takes a SLZ packed buffer and unpacks it to
***** another buffer.  Note: for maximum speed, no checking of any
***** sort is performed for boundaries.  Also, there is no CRC or
***** error correction/detection.  It is assumed that your data is
***** valid.
*****
***** Uncomment the following for use with C without __regargs:
*****
*****           XDEF    _UnPackSLZ
***** _UnPackSLZ:
*****           movea.l 4(SP),a0
*****           movea.l 8(SP),a1
*****
***** CS_MASK=(2^(8-CS_LSL))-1
***** CS_LSL increases with history size (4->H=4096)

CS_MASK     EQU     15
CS_LSL      EQU     4

            XDEF    @UnPackSLZ
@UnPackSLZ:
            movem.l d2-d3/a2,-(SP)      ; Save Registers
            bra.b   slz_start           ; Skip to entry point

slz_literal:
            move.b  (a0)+,(a1)+         ; Copy 8-byte literal string FAST!
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+
            move.b  (a0)+,(a1)+

slz_start:
            move.b  (a0)+,d0            ; Load compression TAG
            beq.b   slz_literal         ; 8-byte literal string?
            moveq   #7,d1               ; Loop thru 8 bits
slz_nxtloop:
            add.b   d0,d0               ; Set flags for this compression TAG
            bcs.b   slz_comp            ; If bit is set, expand compressed data
            move.b  (a0)+,(a1)+         ; Otherwise copy a literal byte
            dbf     d1,slz_nxtloop      ; Check and loop through 8 iterations
            bra.b   slz_start           ; Get next TAG

slz_comp:
            moveq   #0,d2               ; Clear offset register
            move.b  (a0)+,d2            ; Load compression specifier (cs) into d2
            beq.b   slz_exit            ; If cs is 0, exit (decompression finished)
            moveq   #CS_MASK,d3         ; Copy cs into number reg and mask off bits
            and.w   d2,d3               ; num = ( cs & CS_MASK ) [+ 2]  {at least 3}
            lsl.w   #CS_LSL,d2          ; Multiply cs_or by (2^CS_LSL)
            move.b  (a0)+,d2            ; and replace lsb with rest of cs
            movea.l a1,a2               ; Now compute the offset from the current
            suba.w  d2,a2               ; output pointer (d2 auto-extends)
            add.w   d3,d3               ; Compute the unroll offset and begin
            neg.w   d3                  ; unrolled compressed data expansion
            jmp     slz_unroll(pc,d3.w)

slz_exit:
            movem.l (SP)+,d2-d3/a2      ; Restore Registers
            rts                         ; EXIT routine

****
**** The following is an unrolled loop for compressed data copying.
**** The first part of the loop (lines ;33 - ;18) is not necessary
**** unless you have specified a history of size 2048 and made the
**** appropriate changes to SLZ.c as well.
****
**** For when H=2048:
            move.b  (a2)+,(a1)+         ; 33
            move.b  (a2)+,(a1)+         ; 32
            move.b  (a2)+,(a1)+         ; 31
            move.b  (a2)+,(a1)+         ; 30
            move.b  (a2)+,(a1)+         ; 29
            move.b  (a2)+,(a1)+         ; 28
            move.b  (a2)+,(a1)+         ; 27
            move.b  (a2)+,(a1)+         ; 26
            move.b  (a2)+,(a1)+         ; 25
            move.b  (a2)+,(a1)+         ; 24
            move.b  (a2)+,(a1)+         ; 23
            move.b  (a2)+,(a1)+         ; 22
            move.b  (a2)+,(a1)+         ; 21
            move.b  (a2)+,(a1)+         ; 20
            move.b  (a2)+,(a1)+         ; 19
            move.b  (a2)+,(a1)+         ; 18
**** For when H=4096 do not include the above (lines ;33 - ;18)
            move.b  (a2)+,(a1)+         ; 17
            move.b  (a2)+,(a1)+         ; 16
            move.b  (a2)+,(a1)+         ; 15
            move.b  (a2)+,(a1)+         ; 14
            move.b  (a2)+,(a1)+         ; 13
            move.b  (a2)+,(a1)+         ; 12
            move.b  (a2)+,(a1)+         ; 11
            move.b  (a2)+,(a1)+         ; 10
            move.b  (a2)+,(a1)+         ; 9
            move.b  (a2)+,(a1)+         ; 8
            move.b  (a2)+,(a1)+         ; 7
            move.b  (a2)+,(a1)+         ; 6
            move.b  (a2)+,(a1)+         ; 5
            move.b  (a2)+,(a1)+         ; 4
            move.b  (a2)+,(a1)+         ; 3
slz_unroll:
            move.b  (a2)+,(a1)+         ; 2
            move.b  (a2)+,(a1)+         ; 1
            dbf     d1,slz_nxtloop      ; Check and loop through 8 iterations
            bra.s   slz_start           ; Process Next TAG

            END
---------------------------END ASM------------------------------------------

The compressor/decompressor program in C is mainly a utility to compress data for use by this high-speed assembly unpacker. In general, the program applies a brute-force method for matching strings within the history and provides extremely slow compression. Decompression, however, is rather fast even within the C environment and using a file system. Even though the exhaustive searching is what leads to compression on par with ZOO (often better), it's painful in C. However, I didn't want to write this in asm, as the C code is very portable from machine to machine, and asm is much harder for intermediate programmers to read.
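For readers who would rather follow C than 68000 assembly, here is a minimal C rendering of the same unpacking loop (default H=4096 format). This is an illustrative sketch, not the file-oriented decoder from SLZ.c; like the asm, it assumes valid data and checks no boundaries:

    /* Minimal C version of the unpacking loop above (H=4096 format).
       Like the asm, it assumes valid data and does no bounds checking. */
    void unpack_slz_c(const unsigned char *src, unsigned char *dst)
    {
        const unsigned char *from;
        unsigned tag, b1, len, off;
        int bit;

        for (;;) {
            tag = *src++;                       /* TAG covers 8 chunks       */
            for (bit = 0; bit < 8; bit++, tag <<= 1) {
                if (tag & 0x80) {               /* set bit: compressed data  */
                    b1 = *src++;
                    if (b1 == 0)                /* zero CS byte: all done    */
                        return;
                    len = (b1 & 0x0F) + 2;
                    off = ((b1 >> 4) << 8) | *src++;
                    from = dst - off;           /* copy from output history  */
                    while (len--)
                        *dst++ = *from++;
                } else {                        /* clear bit: literal byte   */
                    *dst++ = *src++;
                }
            }
        }
    }

Note that a TAG of zero simply falls through the inner loop as 8 literal copies, which is exactly what the asm's slz_literal fast path does.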
I have made some improvements to the C utility to improve compression speed, though. By defining the symbol "hashing_on", the program will allocate a 256K HASH table that is used to quickly find matching strings, performing up to 16 collision checks based on the first two bytes of the current string. After these checks have been made, the program resorts to exhaustively scanning the rest of the history buffer. Even so, the hash table more than doubles the speed of compression. Even with hashing on, compression can take a while, but you'll be rewarded with the high speed at which your data can be decompressed.

The compressor works quite simply. First, it examines the string in the input file that is currently being pointed to. If the hashing option is present, the program checks hashed pointers for matching strings. It then searches the remainder of the history buffer exhaustively for matches of length greater than two. When it finds a string longer than the current best match, the current best match is updated. If a string of the maximum compressed string length is found, the searching is halted. As long as a string of length greater than two is found, the compressor sets a bit in the TAG and creates a CS specifier which contains the offset and length of the compressed string for the output file. Otherwise, no bit in the TAG is set and a single literal byte is copied to the output. The program continues until the input is exhausted. When the input is exhausted, the program sets a bit in the TAG and writes a zero (byte) to the output, signifying it has finished compression.

An interesting aspect of this compression technique is how much data remains "intact" from input file to output file. When looking at the output of most compression programs, a human will see nothing but gibberish. The output of a text file compressed with SLZ will still look partially intelligible to a human and can be hand-decoded much more easily.

The decompression is simply a C version of the asm decompressor (the asm was actually written first) that is file-oriented in its output. It is robust but lacks optimization. This is OK, though, because it is still rather fast (extremely fast compared to compression). If anyone writes a faster version of this program (very easy to do, since I have made few optimizations and the program is more "robust" in error handling than "efficient" in speed), please let me know. I plan on writing a faster version myself when I have the time, but finding the time to write a ~40K article is hard enough.

The program is used with the following syntax:

    SLZ FLAG SOURCE DESTINATION

    FLAG can be  U - unpack
                 P - pack
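Before the listing itself, here is a rough sketch of the hash-assisted search just described. The names and the exact table layout are guesses for illustration only; SLZ.c's real arrangement may differ:

    /* Hash-assisted match search sketch: the two lead bytes of the
       current string index a table of recent positions, and up to 16
       candidates are checked before falling back to the exhaustive
       history scan.  Hypothetical layout, not the actual SLZ.c table. */
    #define HSIZE      65536L       /* one entry per two-byte prefix
                                       (65536 longs = the 256K table)  */
    #define MAX_PROBE  16           /* collision checks before brute force */
    #define HISTORY    4096L
    #define MAXSTR     17

    static long hash_head[HSIZE];   /* latest position per prefix, -1 = none */
    static long hash_prev[HISTORY]; /* chains older positions, same prefix   */

    static int hashed_match(const unsigned char *buf, long pos, long buf_len,
                            long *best_pos)
    {
        unsigned idx = ((unsigned)buf[pos] << 8) | buf[pos + 1];
        long cand = hash_head[idx];
        int probes = 0, best = 0, len;

        while (cand >= 0 && pos - cand <= HISTORY && probes++ < MAX_PROBE) {
            for (len = 0; pos + len < buf_len && len < MAXSTR &&
                          buf[cand + len] == buf[pos + len]; len++)
                ;
            if (len > best) { best = len; *best_pos = cand; }
            cand = hash_prev[cand % HISTORY];    /* next collision candidate */
        }
        /* ...if best < MAXSTR, fall back to exhaustively scanning the
           rest of the history, as the article describes.  The tables
           are initialised to -1 and updated as input is consumed.     */
        return best;                /* only matches of length >= 3 are used  */
    }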
---------------------------BEGIN C------------------------------------------
;/* SLZ.c - SLZ (un)packer -- Source for Lattice/SAS C version 5.10
LC -rr -b1 -ccist -v -O -L SLZ.c
quit */

/*****************************************************************************

  SLZ - SLZ (un)packer.   C. 1993 SilverFox SoftWare.
  Written by Adisak Pochanayon.  All Rights Reserved.

  This program is a simple example of using SLZ compression upon a file.
  It will allow you to both pack and unpack a file using SLZ compression.

  The syntax for calling this program from the CLI is:

      SLZ FLAG SOURCE DESTINATION

      FLAG can be  U - unpack
                   P - pack

*****************************************************************************/

// Comment out the following line to save 256K of memory, but doing so can
// slow down compression (immensely :( ).
#define hashing_on

#include
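Finally, to show how the pieces fit together inside a program, here is a hypothetical usage sketch. The label, buffer size, and function name below are invented for illustration; only UnPackSLZ comes from the asm listing above:

    /* Hypothetical usage: 'packed_help' is SLZ-packed data pulled in at
       assembly time (e.g. INCBIN "help.slz" under that label), and the
       destination buffer is sized by the programmer, since UnPackSLZ
       performs no bounds checking of its own.                          */
    #include <exec/types.h>

    extern UBYTE packed_help[];                 /* INCBIN'd SLZ-0 data           */
    extern VOID __regargs UnPackSLZ(UBYTE *from, UBYTE *to);

    #define HELP_RAW_SIZE 12000                 /* known unpacked size (example) */
    static UBYTE help_text[HELP_RAW_SIZE];

    void load_help(void)
    {
        UnPackSLZ(packed_help, help_text);      /* no libraries, no OS calls     */
    }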