Say one wishes to describe some set X using symbols Σ. A code is a one to one function c:X→Σ* that matches every element of X to a word using the letters of Σ. Call the set C=c(X) of all words used to describe X the set of codewords. c is prefix-free if no codeword is a prefix of another codeword.
Examples and non-examples
- ASCII, or any fixed-length encoding is prefix-free, since different words of the same length cannot be prefixes of one another.
- Morse code is not prefix-free! For instance, "." (the code for 'e') is a prefix of "..." (the code for 's'). Indeed, when transmitting Morse code three symbols are used: dot, dash, and a short silence between letters.
- Any code that has a distinct stop symbol is prefix-free. Morse code considered with the 3 symbols above is one example; null terminated strings are another.
- Consider X={a,b,c,d,e}. The code
a ⇒ 01, b ⇒ 00, c ⇒ 100, d ⇒ 1011, e ⇒ 1111
is prefix-free, but if we reverse the encoding of each letter the code is no longer prefix-free.
Uses
A prefix-free encoding is useful because it is self-delimiting: if we have any string of symbols, there cannot be two ways to decode it into a sequence of elements of X. This has obvious advantages for transmission, but prefix-free encodings are also useful elsewhere:
- Huffman codes are prefix-free (even though their reverses often aren't, as above); being prefix-free has obvious advantages in compression, and indeed...
- Kolmogorov-Chaitin complexity is defined using a prefix-free encoding of Turing machines.
- If we ignore comments, XML documents are prefix-free: There is one top-level tag, and upon closing it the document cannot be extended.
Yay! Another long short for bq2K6!