Data compression schemes fall into two categories: lossless and lossy. Lossless schemes do not lose information in the compression process, whereas lossy schemes may. Lossy techniques often provide more compression than lossless ones and are therefore popular in settings in which minor errors can be tolerated, as in the case of images and audio. In cases where the data being compressed consist of long sequences of the same value, the lossless technique called run-length encoding is popular.
Run-length encoding is the process of replacing sequences of identical data elements with a code indicating the element that is repeated and the number of times it occurs in the sequence. For example, less space is required to indicate that a bit pattern consists of 253 ones, followed by 118 zeros, followed by 87 ones than to list all 458 bits. Another lossless data compression technique is frequency-dependent encoding, a system in which the length of the bit pattern used to represent a data item is inversely related to the frequency of the item’s use.
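The run-length example above can be made concrete with a minimal sketch. The function names `rle_encode` and `rle_decode` are illustrative assumptions, and the bit pattern is held in a text string purely for readability; a practical system would pack the symbols and counts into bits.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original sequence."""
    return "".join(symbol * count for symbol, count in pairs)


# The pattern from the text: 253 ones, 118 zeros, 87 ones (458 bits in all).
pattern = "1" * 253 + "0" * 118 + "1" * 87
encoded = rle_encode(pattern)
print(encoded)  # [('1', 253), ('0', 118), ('1', 87)]

# Decoding restores every bit, which is what makes the method lossless.
assert rle_decode(encoded) == pattern
```

Run-length encoding pays off only when runs are long. Frequency-dependent encoding, introduced just above, takes a different approach.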
Frequency-dependent codes are examples of variable-length codes, meaning that items are represented by patterns of different lengths, as opposed to fixed-length codes such as UTF-32, the form of Unicode in which every symbol is represented by 32 bits. David Huffman is credited with discovering an algorithm that is commonly used for developing frequency-dependent codes, and it is common practice to refer to codes developed in this manner as Huffman codes. In turn, most frequency-dependent codes in use today are Huffman codes. In some cases, the stream of data to be compressed consists of units, each of which differs only slightly from the preceding one.
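Before turning to streams of slowly changing units, the Huffman construction mentioned above can be illustrated with a minimal sketch. It repeatedly merges the two least frequent groups of symbols, prefixing one group's patterns with 0 and the other's with 1. The name `huffman_code`, the sample text, and the tie-breaking rule are assumptions made for illustration; other tie-breaking choices give different bit patterns with the same total length.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix code in which frequent symbols get shorter bit patterns."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a one-bit pattern.
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, unique tie-breaker, {symbol: pattern so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent group
        f2, _, right = heapq.heappop(heap)  # next least frequent group
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaabbc"  # 'a' occurs 4 times, 'b' twice, 'c' once
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print({sym: len(bits) for sym, bits in code.items()})
print(len(encoded))  # 10 bits, versus 14 with a fixed 2 bits per symbol
```

Here the most frequent symbol receives a 1-bit pattern and the two rarer symbols receive 2-bit patterns. Huffman codes exploit symbol frequencies; streams whose consecutive units differ only slightly, as noted above, call for a different idea.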
An example would be consecutive frames of a motion picture. In these cases, techniques using relative encoding, also known as differential encoding, are helpful. These techniques record the differences between consecutive data units rather than entire units; that is, each unit is encoded in terms of its relationship to the previous unit. Relative encoding can be implemented in either lossless or lossy form depending on whether the differences between consecutive data units are encoded precisely or approximated. Still other popular compression systems are based on dictionary encoding techniques. Here the term dictionary refers to a collection of building blocks from which the message being compressed is constructed, and the message itself is encoded as a sequence of references to the dictionary. We normally think of dictionary encoding systems as lossless systems, but as we will see in our discussion of image compression, there are times when the entries in the dictionary are only approximations of the correct data elements, resulting in a lossy compression system.
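The relative encoding described above, before the discussion turned to dictionaries, can be sketched in its lossless form as follows. The names `delta_encode` and `delta_decode` and the sample values are assumptions for illustration; each data unit is reduced to a single integer, whereas a real system would compare whole frames or blocks.

```python
from itertools import accumulate


def delta_encode(units: list[int]) -> list[int]:
    """Keep the first unit as is, then store each later unit as its
    difference from the unit before it."""
    if not units:
        return []
    return [units[0]] + [cur - prev for prev, cur in zip(units, units[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the units by accumulating the stored differences."""
    return list(accumulate(deltas))


samples = [1000, 1002, 1001, 1005, 1005]
deltas = delta_encode(samples)
print(deltas)  # [1000, 2, -1, 4, 0]

# The differences are stored exactly, so decoding is lossless.
assert delta_decode(deltas) == samples
```

The differences are small numbers that need fewer bits than the original values. A lossy variant would approximate each difference, for instance by rounding it, instead of storing it exactly.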
Dictionary encoding can be used by word processors to compress text documents because the dictionaries already contained in these processors for the purpose of spell checking make excellent compression dictionaries. In particular, an entire word can be encoded as a single reference to this dictionary rather than as a sequence of individual characters encoded using a system such as ASCII or Unicode.
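The word-level scheme just described can be sketched as follows. The tiny word list stands in for a spell-checking dictionary, and the names `dictionary_encode` and `dictionary_decode` are assumptions for illustration. The sketch assumes that every word of the message appears in the dictionary and that words are separated by single spaces.

```python
def dictionary_encode(text: str, dictionary: list[str]) -> list[int]:
    """Replace each word with its position in the shared dictionary."""
    index = {word: i for i, word in enumerate(dictionary)}
    # Raises KeyError for a word missing from the dictionary; a real system
    # would need a fallback such as spelling the word out character by character.
    return [index[word] for word in text.split()]


def dictionary_decode(refs: list[int], dictionary: list[str]) -> str:
    """Look up each reference to recover the original words."""
    return " ".join(dictionary[i] for i in refs)


dictionary = ["compression", "data", "is", "lossless", "useful"]
message = "lossless data compression is useful"

refs = dictionary_encode(message, dictionary)
print(refs)  # [3, 1, 0, 2, 4]
assert dictionary_decode(refs, dictionary) == message
```

Each word shrinks to one small integer regardless of its length, which is where the saving over character-by-character encoding comes from. Both the encoder and the decoder must hold the same dictionary.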