Data Compression Implementation
by Simon Southwell
10th September 2004
- 1 Introduction
- 2 Dictionary
- 2.1 Bounding the Dictionary
- 2.2 Resetting Choices
- 2.3 Implementing a Dictionary
- 3 Packing and Unpacking
- 3.1 Dynamic Packing
- 3.2 Implementation
- 4 Compression
- 5 Decompression
- 6 Conclusions
- 7 Downloadable Code and Useful Links
1 Introduction
In this article, as a working example, we'll look at implementing
a simple LZW algorithm (slzw). In doing this we can highlight the
limitations and boundaries that exist when turning an algorithm
into a working implementation. LZW is chosen as it follows naturally
on from the previous article, and because
it forms the basis of many practical implementations, including
GIF graphics files, hardware compression in tape drives and many
other applications.
In section 3 of the previous article
the LZW algorithm was discussed in general terms and idealistically.
In order to realise a practical implementation, whether hardware
or software, some boundaries must be defined and choices made. In the
next few sections a simple software implementation is described,
laying out where choices are made, and what alternatives might
exist, as well as alluding to the architecture of a hardware
solution. It may seem that the discussion is ordered the 'wrong'
way round, in that we'll look at the actual compression and
decompression code last, but this is because those functions
are built on the features of the ancillary functions. The
compressor matches and builds entries in the dictionary and
sends codewords on to the packer. The decompressor gets
codewords from the unpacker and reads and builds entries in the
dictionary. We therefore need to understand these functions first,
so let's look at the dictionary.
2 Dictionary
In the LZW discussion of the
previous article, there was an
unspoken assumption that is actually impractical—the dictionary
is infinitely large. In this section we are going to look at
some bounding of the dictionary to make it practical, and how
a dictionary is realistically implemented. The goals are twofold.
Firstly, what sensible limit on dictionary size is acceptable, and
secondly, how can we construct it so that it is reasonably fast
to use.
2.1 Bounding the Dictionary
The actual size (in bytes) of a dictionary is determined by two
things: the size of each entry, and the number of entries. Let's
bound the latter of these first (why not!). Making the dictionary
as large as possible more nearly approximates the ideal of infinite,
but you'll need much more space to store it, and (perhaps) a very
long time to search it for an entry. Making it very small (and it
must have at least 256 + 1 entries, or we're not making
a compressor!) is not likely to yield a useful implementation. A
sensible approach is to choose a power of 2 size from 2^9 upwards.
Each increment yields a dictionary which is double in size, and
requires double the resources, so this quickly becomes a limiting
factor. In all practical LZW based implementations I have seen
(both software and hardware) dictionaries are between 1K and 64K bytes.
Any smaller and compression figures are severely hit; any larger,
and the gain in performance is negated by the resources required.
So we will choose a midway point of 4K, which is used in the DCLZ
algorithm embedded in DAT tape drives. By making this choice, we have
significantly bounded the dictionary. It has 4096 entries, and this
limits the maximum codeword width to 12 bits. Knowing this defines
the entry width to 12 + 8 = 20 bits, as the dictionary entry pointers
are the same size as the codewords (indeed they are synonymous). As
a minimum then, we need (4096 x 20)/8 = 10K bytes of resource
for a dictionary.
2.2 Resetting Choices
Now we've bound our dictionary, there is the problem of what to do
when it's full. This question opens up a whole can of worms, as
there are many possible choices, each of which may be advantageous in
some cases whilst being quite disadvantageous in others. The issue
is, when the dictionary is full, how relevant is it to current input
data? If the data remains similar in character (say we're compressing
a large source of English text), then it may be very relevant.
On the other hand, if we're compressing, say, an archive file
which contains an eclectic mix of small files of different types,
then the dictionary will become irrelevant when the file type
switches after it's become full. The simplest thing to do is reset the
dictionary when it's full (or at least when we want to build a new
entry after it's full). This is great, but there is a penalty in that
compression ratios suffer after a reset, when there are fewer
entries to match against. An advantage is that the compression and
decompression resets don't have to be explicitly synchronised, since
the dictionary is built up in the same manner in both cases, and
reaches full in the same place (well, almost—see below). A more
elaborate method is to somehow measure the 'effectiveness' of the
dictionary, and only reset it when it becomes 'ineffective'.
How to do this is the subject of much research and patent filing,
and without going into details, measuring the compression
ratio over a set number of input bytes can achieve this.
An even more complex enhancement is to only
reset some of the entries which are less 'relevant'. But
that is beyond the scope of this article. Suffice it to say that the
common factor with these methods is that the resetting of the
dictionary is done externally to it. Therefore we need to
synchronise between compression resets and decompression resets.
This is usually done by inserting a marker in the compression
stream. Reserving one of the codewords used for a dictionary
entry is one method. In DCLZ this is the Reset codeword (value
0x001). Whenever the dictionary is reset, the compressor inserts
a Reset codeword. On decompression, if a Reset is encountered
then the dictionary is reset before continuing. The advantage of
the special codeword is that decompression is immune to the method
used to determine reset on compression. So it can be made completely
configurable, dynamic, or whatever, during compression without
precluding the decompressor from decoding the data—even a
different implementation by a third party. Having said all that,
we're going straight back to square one, and choosing to implement
a 'reset on full' approach.
2.3 Implementing a Dictionary
We may have defined a dictionary's boundaries, but this doesn't tell
us how to use it. There are two main things we must be able to
do with the dictionary: add new entries, and determine whether
a given value is in the dictionary (is 'matched'). Building
an entry is a trifle more involved than simply writing a
value (but not much more!). The address used when writing
the entry is managed by the dictionary itself. An incrementing
counter is kept, initially set to the lowest dynamic entry,
and incremented as each new entry is written. This counter is
managed by the dictionary, but is going to be needed externally
and so must be exported (preferably behind a 'data hiding function',
but our example won't go this far for the sake of clarity). An
example function is shown:
static unsigned int build_entry(codeword_type codeword, byte_type byte,
                                boolean compress_mode)
{
    static unsigned int codeword_len = MINCWLEN;

    /* Reset the dictionary if full, discarding the entry to be built */
    if (next_available_codeword == DICTFULL) {
        next_available_codeword = FIRSTCW;
        codeword_len = MINCWLEN;
        return codeword_len;
    }

    /* Write the codeword/byte pair at the next free location */
    dictionary[next_available_codeword].pointer = codeword;
    dictionary[next_available_codeword].byte = byte;

    /* When compressing, record the new entry's address for fast matching */
    if (compress_mode)
        indirection_table[codeword][byte] = next_available_codeword;

    /* Widen the codeword length at each power-of-2 boundary. When
       decompressing, widen (and, at the top, reset) one entry early,
       since dictionary building lags compression by one codeword. */
    if (codeword_len < MAXCWLEN) {
        if (next_available_codeword ==
                (1U << codeword_len) - (!compress_mode ? 1U : 0U))
            codeword_len++;
    } else if (!compress_mode && next_available_codeword == (DICTFULL-1))
        codeword_len = MINCWLEN;

    ++next_available_codeword;

    return codeword_len;
}
This code assumes a pre-existing global array (dictionary) each entry
of which is a structure with a pointer and byte field. A codeword
and byte to be written is passed in, and a flag to indicate whether
we're compressing (as opposed to decompressing—the dictionary routines
are common to both). The first lines test for a full dictionary and
reset it if it is, before returning. The entry to build in that case is
effectively discarded. If the dictionary isn't full the codeword and
byte values are written to the dictionary at the address indicated
by the next_available_codeword counter, which is incremented before
returning. Before returning, though, some additional processing is done.
An indirection table is updated. More on this below, but it is simply
a method of quickly locating an entry during a match search, which could
be done in many other ways, such as a hash table or even a linear search
if speed is not important. The next bit of code determines whether the
current dictionary entry (next_available_codeword) has reached the limit
of the current packing/unpacking width (see section 3 below). If it
has, codeword_len is incremented to increase the packing/unpacking width.
Finally, when decompressing, the entries being built lag behind
compression by one codeword (the famous KωK exception case is
caused by this), so we have to reset the codeword_len one entry early
to compensate.
Searching the dictionary to see whether a given data value matches one
already built is more difficult. As mentioned above, one solution is
to linearly search from the first dictionary entry to the last built
entry, but this is very slow. A hash table is a much better
idea. For those unfamiliar with this technique, basically instead of
storing the entry linearly against an address, an address is generated
from the data, trying to make it pseudo-random. So usually, bits
are jumbled up and/or inverted, etc. At this data-dependent location
is stored the 'codeword' (next_available_codeword) that would have been
the address of the linear implementation, along with the data itself
(or some unique part of it). When searching for an entry, the hashed
address can be regenerated using the same algorithm, and that entry
inspected for the data we're trying to match. If it's the same, then
the stored codeword is retrieved. This sounds great, and is very fast,
but has some drawbacks. For an effective hashing algorithm, the
number of entries must be greater than for a linear dictionary. This
is because of another drawback, different values can generate the same
address (whatever hashing algorithm you come up with). This is
alleviated somewhat by making the hashing dictionary bigger (at the
cost of additional resources) and making 'collisions' less likely, but
even so, these collisions must be dealt with, and extra complexity
is added to detect a collision and end up at a unique location for
a given entry. This method can, however, be used effectively in both
software and hardware implementations, and some early DCLZ hardware
in DAT tape drives used it. More recent hardware implementations
use a type of memory called content-addressable-memory (CAM). CAMs
can act like ordinary SRAM, but in addition allow a parallel search
of a value in all its locations at once, returning a 'match' status
and an address of the matched data (if any). We can't use this same
technique in software, but we can avoid the hashing penalties at the
expense of more memory—but memory is much more plentiful
on a computer than on a silicon chip anyway.
The technique makes use of an indirection table, which is nothing new,
but its use with respect to an LZW based dictionary is, as far as
I know, novel. We can take advantage of the fact that each entry
in the dictionary is unique (if the dictionary is a valid one!), and
that it is contiguous from the first entry to the top; i.e. has a valid
entry, without holes, up to next_available_codeword. The table is
a two dimensional array which is 256 wide by 'dictionary size' long
(4K in our case). It could have been simply a linear 256*4K array, but
the important point is that there is an entry for every single
codeword/byte pair that can exist. When a new entry is built in the
dictionary, the address of the new entry is stored in the indirection
table at indirection_table[codeword][byte]; i.e. the unique location
defined by the data. This is what's happening in the build_entry()
function above. When we want to match a particular value (a codeword
and byte pair) in the dictionary, the dictionary address is fetched from the
indirection table and the dictionary entry compared with the match data.
If it is the same then we have a match, and the address from the table is
the codeword we need; otherwise we haven't matched, and the data is not in the
dictionary. Let's look at the code:
static codeword_type entry_match(codeword_type pointer, byte_type byte)
{
    codeword_type addr;

    /* Fetch the candidate dictionary address for this codeword/byte pair */
    addr = indirection_table[pointer][byte];

    /* An address outside the valid built range cannot be a match */
    if (addr < FIRSTCW || addr >= next_available_codeword)
        return NOMATCH;

    /* Confirm the dictionary entry really holds the match data */
    if ((dictionary[addr].byte == byte) &&
        (dictionary[addr].pointer == pointer))
        return addr;

    return NOMATCH;
}
Hopefully it is clear that the above code implements the described operation.
The only extra bit is a check that the retrieved address is within the
valid dictionary range, returning NOMATCH if it isn't. By doing this
we avoid having to initialise the table (which can be quite large), and
random invalid values cause a mismatch. Of course a random value may lie
within the valid dictionary range, but this still doesn't matter. If the
codeword/byte pair has never been written to the dictionary, then the
entry at that address can't contain it, since all dictionary
entries are unique, and we mismatch. If the value has been written to
the dictionary, then the indirection table won't have a random value in it
at that location, and we get a match.
These two functions constitute the two operations needed by the compression
and decompression code and, as stated before, the next_available_codeword
counter, defining the top of the valid dictionary, is exported (in our
case a simple global) to enable
inspection of the reset state. One last operation is needed from the
dictionary, and that is to read a particular location in decompression, where
incoming codewords define this address. In the example code this is acheived
by making the dictionary global and reading it directly. Again this is
purely for clarity—keeping the number of lines of code to a minimum
is essential in a document, but ideally the dictionary would have
a data hiding function which returned the pointer and byte value
of a given entry, leaving the dictionary array completely local to
the dictionary routines.
So we have a dictionary. Before moving on to compression and decompression
we still need some more functions. To get the most performance from our
implementation, the steady stream of codewords needs to be 'packed' as
tightly as possible, and subsequently 'unpacked' before decompressing.
We'll look at this in the next section.
3 Packing and Unpacking
The pure LZW algorithm discussed in the previous article
will generate a stream of
codewords. By bounding the size of the dictionary (e.g. to 4K bytes)
that stream of codewords will be fixed (to 12 bits for a 4K dictionary).
Since the codewords might not exactly fit into bytes, it is useful to
'pack' the codewords so as to avoid wasted padding bits to make the
codewords align to byte boundaries. Packing simply means appending
the next codeword at the bit position after where the current output
codeword finishes. Unpacking is simply the reverse. Since the codeword
size is known, enough bytes are read to have a whole codeword, which
can then be passed into the decompression processing. Any remaining
bits are held over and used as the beginning of the next codeword.
3.1 Dynamic Packing
The above scheme is all very well, and it avoids the insertion of
redundant bits (not good in a compression algorithm). However, from
the previous article, you may have noticed that initially, when only
a few dictionary entries have been built, all the codeword values
are small. As the dictionary is filled, higher value codewords become
possible. So, when packing, we can take advantage of this and pack
the codewords more compactly by treating them as being narrower than
the maximum width. For example, for a 4K dictionary with 12 bit
codewords, initially 9 bits are all that are required to pack until
the dictionary has an entry in 0x200 (512) when 10 bits are needed,
and so on until entry number 0x800 (2048) is built, when 12 bits
(the maximum) are needed. There's a choice about whether to pack
at the new width when the entry is built, or when a wider codeword
must be packed (since there may be a delay between the two). We'll
discuss this later, but suffice it to say that we can inform the
packer what the current required width is, and adjust the packing
accordingly. This gives us the best packing of codewords we
can achieve.
One other arbitrary choice we have is whether to pack codewords least-
or most-significant byte first. It doesn't matter which, but we have
to choose the same for packing and unpacking! I prefer least
significant first, which is what the DCLZ format uses; another
algorithm, ALDC, uses most significant. Using lsb-first, let's have an example.
Suppose we're packing 9 bit codewords:

    codewords (hex):        10b       109       020       061
    9-bit binary:     100001011 100001001 000100000 001100001
    8-bit binary:    1000  01011100  00100100  01000000  01100001
    bytes (hex):                5c        24        40        61
In the above sequence, time is flowing from right to left, so that
the codeword sequence (in hex) is 061, 020, 109,
10b, with the binary values
underneath. Hopefully you can see that the second row of binary values
are the same bit sequence as the first, but have been split at every
eighth bit instead of ninth. Converting back to hexadecimal values
now gives a byte sequence of 61, 40, 24,
5c. We have 4 bits left over,
which will form part of the next byte, or if this is the last byte,
would be flushed to a byte boundary to give 08. Unpacking is the reverse,
with the bytes read and the correct number of bits for a whole
codeword pulled off to form a stream of unpacked codewords.
Now to determine when to update the packing width. The simplest approach
is to set the width to match the last built dictionary entry. Since we can only
output codewords that are built in the dictionary, we'll never find
ourselves in the position of trying to pack, say, a 12 bit codeword
when we're only packing at 11 bits. It is also easy to synchronise
between packing and unpacking as the dictionary is built in the same
manner between compression and decompression. There are two issues with
this however; a small and a not so small one. The small issue is that
the packing/unpacking has to have access to some internal dictionary state;
i.e. the highest built entry. Not a disaster, but it would be better
if these mostly disassociated functions remained decoupled.
The second, more relevant, issue is that when an entry is built at one of
the bit boundaries, it cannot be used straight away. At least one
(necessarily smaller) codeword will be output first, and more likely
several codewords. These codewords will be packed more loosely than
is strictly necessary, wasting bits. We can solve this by waiting
to increase the packing width until a codeword is ready to be output that
requires the wider width. The packer can determine this by inspecting
the upper bits of the codewords and if any are set higher than the
current packing width, it can increase as necessary. There is also
no need to inspect any dictionary state, and its input is all that's
required. Unfortunately, during unpacking it is not possible to know
in advance when the packer would have increased the width. So some sort
of indicator must be inserted into the data stream to inform the
unpacker to increase its width. This is usually done by reserving
one of the dictionary codewords to have a special meaning. It makes
sense if we choose one of the lowest numbered codewords (e.g. 0x100)
as this will be nine bits, and so most efficient. In DCLZ, this
codeword is called a 'Grow' codeword, and (for strange historical
reasons I won't go into) has a value 0x002. Whenever the packer
receives a codeword which is too big, it inserts a Grow codeword
(at the current packing width) and increments its current width.
When the unpacker receives a Grow codeword, it increments its
current width, and then discards the codeword as the decompressor
knows nothing of this type of command. It should be noted that it is
possible to jump more than one width (from say 10 to 12), and the
packer must insert as many Grow codewords as necessary to get to the
required width. The advantage of this method is that it decouples
the dictionary from packing and doesn't waste bits in the packed
codewords. However, bits are wasted in outputting a Grow codeword,
and a dictionary entry has been wasted to reserve a number for the
Grow. In addition it is more complex to implement and the benefit
may be dubious. Although there may be a significant lag
between a dictionary entry being built and its codeword being output,
usually this is not the case as much repetition is localised in
common data (though there are no hard and fast rules on this). My own
feeling is that it is an optimisation too far, and that dictionary
synchronised packing width is simpler and of no significant degradation
in performance.
One final thing before we look at some code implementation. If the dictionary
ever gets reset, the packing/unpacking width must also revert to its minimum
value. This may mean that the packer/unpacker must interpret another
special codeword (Reset), or have further tendrils into the dictionary
to determine when it is reset.
3.2 Implementation
Let's see some code for a packer:
static void pack(codeword_type ip_codeword, unsigned int codelen,
                 boolean flush, FILE *ofp)
{
    static unsigned long barrel = 0;
    static int residue = 0;

    /* Append the new codeword above any bits left on the shifter */
    barrel |= ((ip_codeword & CODEWORDMASK) << residue);
    residue += codelen;

    /* Output whole bytes; when flushing, empty the shifter completely */
    while (residue >= (flush ? BITSIZE : BYTESIZE)) {
        putc((barrel & BYTEMASK), ofp);
        barrel >>= BYTESIZE;
        residue -= BYTESIZE;
    }
}
This function has four inputs. The codewords arrive as ip_codeword,
and the packing width has been determined from dictionary externally, and is
input as codelen. The flush flag indicates the end of data case mentioned in
the above example, where the final bits must be padded to a byte boundary.
Finally *ofp is simply a file pointer to the destination output stream.
The code works by appending the ip_codeword onto barrel (a barrel shifter)
above any bits that remain on the shifter, as indicated by residue. For
every ip_codeword added, residue is incremented by the current width
(codelen). Then, for as many whole bytes as there are on the
barrel shifter (as indicated by residue), the bottom eight bits are
output to the output file, the barrel shifted right 8 bits and
residue decremented by 8. The exception to this is if flush is set, when
bytes are output until there are no residue bits (residue is 0).
An important point to note is that the maximum number of bits
on the barrel shifter is never more than 7 plus the maximum codeword
width (e.g. 19 for a 4K dictionary). So the type of barrel in a
C program (or the width of a register vector in a hardware implementation)
needs to take this into consideration. Unpacking is the reverse of packing.
static int unpack(codeword_type *codeword, unsigned int codelen,
                  FILE *ifp)
{
    static unsigned int residue = 0, barrel = 0;
    short int ipbyte = 0;

    /* Read bytes until there are enough bits for a whole codeword */
    while (residue < codelen) {
        if ((ipbyte = getc(ifp)) == EOF)
            return EOF;
        barrel |= (ipbyte & BYTEMASK) << residue;
        residue += BYTESIZE;
    }

    /* Return the bottom codelen bits and remove them from the shifter */
    *codeword = (barrel & ((1 << codelen) - 1));
    residue -= codelen;
    barrel >>= codelen;

    return ipbyte;
}
The arguments are similar to pack(), but *codeword is a pointer where
unpack() will output a codeword. The codelen input is the same, and
*ifp is the input stream file pointer. So here, the unpacker appends
bytes to the barrel shifter, adding 8 to residue, until enough bits
for a codeword are read. The bottom codelen bits are then returned
in *codeword, the barrel shifted right to delete those bits, and
residue adjusted accordingly. One special case is when end-of-file
is reached (EOF). Here no processing is done, but EOF is returned
to tell the decompressor that no more data is coming. So, normally
we must return something other than EOF, which is what the last line
is about.
So now, with a dictionary and packer/unpacker, we have enough to look
at compression and decompression (at last!). This is discussed in the
next section.
4 Compression
Having (wisely) defined the dictionary and packing functions,
compression becomes a breeze. We simply keep pulling bytes in,
matching the byte and the previously matched codeword (if any) in
the dictionary until there is no match. Then we output the most
recent matched codeword and build a new entry for this codeword
and byte that we mismatched. The code is shown below:
static void compress(FILE *ifp, FILE *ofp)
{
    codeword_type previous_codeword = NULLCW;
    unsigned int code_size = MINCWLEN;
    codeword_type match_addr;
    int ipbyte;

    while ((ipbyte = getc(ifp)) != EOF) {
        /* The very first byte implies a match on its root codeword */
        if (previous_codeword == NULLCW)
            previous_codeword = ipbyte;
        /* On a mismatch, output the best match so far and build an entry */
        else if ((match_addr =
                  entry_match(previous_codeword, ipbyte)) == NOMATCH) {
            pack(previous_codeword, code_size, FALSE, ofp);
            code_size = build_entry(previous_codeword, ipbyte, TRUE);
            previous_codeword = ipbyte;
        }
        /* On a match, the matched entry becomes the best match so far */
        else
            previous_codeword = match_addr;
    }

    /* Flush the final best match, padding to a byte boundary */
    pack(previous_codeword, code_size, TRUE, ofp);
}
So, the compress() function has two arguments, which are the input
and output file pointers. The heart of the function is the while
loop pulling bytes into ipbyte, until the end-of-file. The special
case of the very first byte is caught when previous_codeword is NULLCW.
The previous_codeword variable holds the best match we have so far,
but at the beginning we haven't done a match. So in this case the
ipbyte value is simply copied to previous_codeword, as we now
have an implied match on the 'root' codeword for the input byte. In
our scheme root codewords run from 0 to 255 matching the bytes—a
sensible choice I trust you'll agree. However, not all schemes follow
this obvious choice and DCLZ, for example, has root codewords which run
from 8 to 263! (It has 8 reserved codewords from 0 to 7).
Normally, we pull a byte in and do a match by calling the dictionary
function entry_match(). If we have a match, then the returned match
address becomes the previous_codeword, and we go round the loop again.
If we didn't match, then the best match we had so far is output by sending
to the packer (see section 3) and a new entry built
for the failing match. We have output previous_codeword, but the byte in
ipbyte is not yet represented in the output, so we must copy it
to previous_codeword as it is now our best match (or its root codeword
value is, which is the same thing for us). When end-of-file is reached,
and we drop out of the loop,
the output packer is called once more, as we will have a best match
still in previous_codeword, even if it is just the last byte's root
codeword. The pack() function is normally called with its third argument as
FALSE, but the last call has it set to TRUE. This informs the packer
that no more data is coming and to 'flush' to a byte boundary.
Also the code_size variable defines the 'packing width' to
use. It is based on the highest dictionary entry built, and is
updated on each call to build_entry(), which returns the current
codeword length.
5 Decompression
Decompression is only slightly more involved than compression. Here we
pull in a stream of codewords, and traverse the dictionary stuffing
the byte values of the entries onto a stack. When a root codeword is
reached, the stack is flushed to the output, reversing the order
of the bytes read from the dictionary. Let's look at some code:
static void decompress(FILE *ifp, FILE *ofp)
{
    static byte_type stack[MAXDICTSIZE];
    static unsigned int stack_pointer = 0;
    codeword_type ip_codeword, prev_codeword = NULLCW;
    unsigned int code_size = MINCWLEN;
    byte_type byte;
    codeword_type pointer;

    while (unpack(&ip_codeword, code_size, ifp) != EOF) {
        if (ip_codeword <= next_available_codeword) {
            /* Traverse the dictionary, stacking bytes, until a root */
            pointer = ip_codeword;
            while (pointer != NULLCW) {
                if (pointer >= FIRSTCW) {
                    /* KwK exception: the entry isn't built yet, so
                       reconstruct it from the previous codeword */
                    if (pointer == next_available_codeword &&
                        (prev_codeword != NULLCW))
                        pointer = prev_codeword;
                    else {
                        byte = dictionary[pointer].byte;
                        pointer = dictionary[pointer].pointer;
                    }
                } else {
                    /* A root codeword: the byte is the codeword itself */
                    byte = pointer;
                    pointer = NULLCW;
                }
                stack[stack_pointer++] = byte;
            }

            /* Flush the stack, reversing the byte order */
            while (stack_pointer != 0)
                putc(stack[--stack_pointer], ofp);

            /* Build a new entry (except on the very first codeword) */
            if (prev_codeword != NULLCW)
                code_size = build_entry(prev_codeword, byte, FALSE);

            prev_codeword = ip_codeword;
        } else {
            fprintf(stderr,
                    "***decompress: Error --- UNKNOWN CODEWORD (0x%03x)\n",
                    ip_codeword);
            exit(DECOMPRESSION_ERROR);
        }
    }
}
Like compression, a 'while' loop forms the heart of the function.
Codewords are fetched by calling the unpacker (unpack()) which
returns a codeword into ip_codeword. A check that the codeword is
valid is done before proceeding (it can happen!) and an error
printed if something's wrong. Otherwise we traverse the dictionary,
which is what the inner 'while' loop is doing. When pointer (which
holds the current dictionary address whilst we're traversing) is
a root codeword (less than FIRSTCW) then we construct a root entry, making
the byte value simply the pointer (in our example) and setting
pointer to NULLCW to terminate the loop. If it's a dynamic codeword then we get
the new byte and pointer value directly from the dictionary. However,
we have to detect and deal with the KωK case. This is detected
when the pointer is equal to the dictionary's next_available_codeword
value (and this is not the first time through the loop—prev_codeword
is not NULL). In that case we reconstruct the pointer from the prev_codeword
of the last time round the loop, and the byte value is also correct
from last time (i.e. the last output byte that terminated the previous
string). In all cases, the byte value we determined gets added to the
stack. When we reach a root codeword, the inner loop terminates
and the stack gets flushed to the output, reversing the bytes. A
new entry in the dictionary is then built (except first time round)
using the prev_codeword value and the last byte pushed on the
stack. The input codeword we just processed then becomes the
previous codeword for the next loop.
As in the packer during compression, the unpacker is informed of the current codeword
width based on the dictionary's highest entry, and code_size is
updated on every call to the dictionary routine build_entry().
6 Conclusions
So that's pretty much all the code we need. A top level main()
function can define the input and output streams (*ifp and *ofp)
and select between compression and decompression. Actually I have
done this for you and the entire code can be accessed on github by
clicking here. This is a single, self
contained file (slzw.c) which should compile on just about
any platform (I've tried Solaris, Linux and Windows—you'll need
getopt() for windows; try
here),
and has all the functions already mentioned and everything else
that's required. The whole thing is less than 250 lines of
code, and still achieves a satisfactory performance in both
speed and compression ratio. Feel free to try enhancing the code
for larger dictionaries, different reset strategies, or any
other enhancement you can think of. The code is copyright free
(though this text isn't) and the patent for the LZW algorithm,
held by Unisys, has recently expired in all countries. So you are also free to
incorporate this code into your own products without violating
the law.
7 Downloadable Code and Useful Data Compression Links
More detail on lossless compression and real implementations of data compression
can be found via the following links.
Last updated 10th March 2017
© 2004-2017 Simon Southwell. All rights reserved.
simon@anita-simulators.org.uk