Examples of Codes
Definition
A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$, the range of $X$, to $\mathcal{D}^*$, the set of finite-length strings of symbols from a $D$-ary alphabet. Let $C(x)$ denote the codeword corresponding to $x$ and let $l(x)$ denote the length of $C(x)$.
For example, $C(\text{red}) = 00$, $C(\text{blue}) = 11$ is a source code for $\mathcal{X} = \{\text{red}, \text{blue}\}$ with alphabet $\mathcal{D} = \{0, 1\}$.
Length
The expected length $L(C)$ of a source code $C$ for a random variable $X$ with PMF $p(x)$ is given by
$$L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x),$$
where $l(x)$ is the length of the codeword associated with $x$.
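As a quick illustration, here is a minimal Python sketch that computes the expected length of a code; the codebook and PMF are made-up values for demonstration.

```python
# Expected length L(C) = sum_x p(x) * l(x)
def expected_length(pmf, codebook):
    return sum(p * len(codebook[x]) for x, p in pmf.items())

# Hypothetical example: a binary code for a 3-symbol source
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
codebook = {"a": "0", "b": "10", "c": "11"}
print(expected_length(pmf, codebook))  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
```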
Singularity
A code is said to be non-singular if every element of the range of $X$ maps into a different string in $\mathcal{D}^*$ (i.e., $C$ is an injection); that is,
$$x \neq x' \implies C(x) \neq C(x').$$
Extension
The extension $C^*$ of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$, defined by
$$C(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n),$$
where the right-hand side denotes concatenation of the corresponding codewords.
Example
If $C(x_1) = 00$ and $C(x_2) = 11$, then $C(x_1 x_2) = 0011$.
Uniquely Decodable
A code is called uniquely decodable if its extension is non-singular.
Prefix Code
A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.
Note
Instantaneous codes $\subset$ Uniquely decodable codes $\subset$ Nonsingular codes $\subset$ All codes
Summary
X | Singular | Nonsingular, But Not Uniquely Decodable | Uniquely Decodable, But Not Instantaneous | Instantaneous
---|---|---|---|---
1 | 0 | 0 | 10 | 0
2 | 0 | 010 | 00 | 10
3 | 0 | 01 | 11 | 110
4 | 0 | 10 | 110 | 111
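To make the table's distinctions concrete, here is a minimal Python sketch (function names are my own) testing the two mechanical properties involved:

```python
def is_nonsingular(codebook):
    # Non-singular: distinct symbols map to distinct strings
    words = list(codebook.values())
    return len(set(words)) == len(words)

def is_prefix_code(codebook):
    # Instantaneous: no codeword is a prefix of another
    words = list(codebook.values())
    return not any(a != b and b.startswith(a)
                   for a in words for b in words)

ud_code   = {1: "10", 2: "00", 3: "11", 4: "110"}  # uniquely decodable column
inst_code = {1: "0", 2: "10", 3: "110", 4: "111"}  # instantaneous column
print(is_prefix_code(ud_code))    # False: "11" is a prefix of "110"
print(is_prefix_code(inst_code))  # True
```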
Tip
The crucial difference between non-instantaneous codes and instantaneous ones lies in the delay and complexity of decoding. While both can be decoded eventually, a non-instantaneous code incurs additional costs:
- For a non-instantaneous code: every encoded bitstream has only one possible original sequence of symbols, but you might have to look ahead in the stream before you can definitively decode a symbol. This requires buffering bits and delaying the output.
- For an instantaneous code: you can decode each symbol immediately as you read its last bit; no lookahead is needed (see the decoder sketch below).
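A minimal Python sketch of such an immediate decoder, using the instantaneous code from the table above (function and variable names are my own):

```python
def decode_instantaneous(bits, codebook):
    inv = {w: s for s, w in codebook.items()}  # codeword -> symbol
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:            # a full codeword is recognized at once
            out.append(inv[buf])
            buf = ""
    return out

code = {1: "0", 2: "10", 3: "110", 4: "111"}   # instantaneous code from the table
print(decode_instantaneous("0101100", code))   # [1, 2, 3, 1]
```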
Kraft Inequality
Motivation
We want to construct instantaneous codes of minimum expected length to describe a given source.
Definition
For any instantaneous code (prefix code) over an alphabet of size $D$, the codeword lengths $l_1, l_2, \ldots, l_m$ must satisfy the inequality
$$\sum_{i=1}^{m} D^{-l_i} \leq 1.$$
Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
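A direct numerical check of the inequality, as a minimal Python sketch (names are mine):

```python
def kraft_sum(lengths, D=2):
    # Sum of D^{-l_i}; a prefix code with these lengths exists iff <= 1
    return sum(D ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0   -> achievable (and complete)
print(kraft_sum([1, 1, 2]))     # 1.25  -> no prefix code possible
print(kraft_sum([1, 1], D=3))   # ~0.67 -> fine for a ternary alphabet
```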
Intuition
For an alphabet of size $D$, the code tree is $D$-ary (each node has $D$ children). The "space" taken by a codeword of length $l_i$ is $D^{-l_i}$ (since it occupies one of the $D^{l_i}$ possible nodes at depth $l_i$), and the Kraft inequality ensures that the total "space" occupied by all codewords does not exceed the total available space (1).
Proof
Necessity: If a prefix code exists, then Kraft holds.
- Consider the full $D$-ary tree of depth $l_{\max}$ (the longest codeword length)
- A codeword of length $l_i$ blocks $D^{l_{\max} - l_i}$ leaves (all its descendants at depth $l_{\max}$)
- Since no two codewords share a leaf, the total number of blocked leaves satisfies $\sum_i D^{l_{\max} - l_i} \leq D^{l_{\max}}$
- Dividing both sides by $D^{l_{\max}}$ gives the inequality $\sum_i D^{-l_i} \leq 1$
Sufficiency: If Kraft holds, a prefix code exists.
- Sort the lengths $l_1 \leq l_2 \leq \cdots \leq l_m$
- For each length $l_i$, pick any unused node at depth $l_i$ and block all its descendants (a constructive sketch follows this list)
- The Kraft inequality ensures that there is always enough "space" to assign the next codeword without violating the prefix-free property
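The sufficiency argument is constructive. A minimal sketch, assuming a binary alphabet (names my own), that assigns codewords by walking the tree left to right:

```python
def prefix_code_from_lengths(lengths):
    # Build a binary prefix code for sorted lengths satisfying Kraft
    lengths = sorted(lengths)
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    words, code, prev = [], 0, lengths[0]
    for l in lengths:
        code <<= (l - prev)              # descend to the required depth
        words.append(format(code, f"0{l}b"))
        code += 1                        # next unused node at this depth
        prev = l
    return words

print(prefix_code_from_lengths([2, 2, 2, 3, 3]))
# ['00', '01', '10', '110', '111']
```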
Extended Kraft Inequality
For any countably infinite set of codewords that form a prefix code, the codeword lengths satisfy the extended Kraft inequality
$$\sum_{i=1}^{\infty} D^{-l_i} \leq 1.$$
Optimal Codes
Minimization via Lagrange multipliers
We want to minimize
$$L = \sum_i p_i l_i$$
over all integers $l_1, l_2, \ldots, l_m$ satisfying
$$\sum_i D^{-l_i} \leq 1.$$
We neglect the integer constraint on $l_i$ and assume equality in the constraint. Hence, we can write the constrained minimization using a Lagrange multiplier as the minimization of
$$J = \sum_i p_i l_i + \lambda \left( \sum_i D^{-l_i} \right).$$
Setting $\partial J / \partial l_i = 0$, we get $D^{-l_i} = p_i / (\lambda \ln D)$, and substituting into the constraint $\sum_i D^{-l_i} = 1$ yields $\lambda = 1 / \ln D$, hence $p_i = D^{-l_i}$ and
$$l_i^* = -\log_D p_i.$$
This non-integer choice of codeword lengths yields expected codeword length
$$L^* = \sum_i p_i l_i^* = -\sum_i p_i \log_D p_i = H_D(X).$$
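A quick numeric sanity check with a made-up dyadic PMF, where the optimal lengths happen to be integers:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]            # dyadic PMF, D = 2
lengths = [-math.log2(pi) for pi in p]   # l_i* = -log_D p_i
L_star = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)
print(lengths)          # [1.0, 2.0, 3.0, 3.0]
print(L_star, H)        # 1.75 1.75 -> expected length equals entropy
```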
Lower bound of expected length
The expected length $L$ of any instantaneous $D$-ary code for a random variable $X$ is greater than or equal to the entropy $H_D(X)$; that is,
$$L \geq H_D(X),$$
with equality if and only if $D^{-l_i} = p_i$.
Bounds on the Optimal Code Length
Let $l_1^*, l_2^*, \ldots, l_m^*$ be optimal codeword lengths for a source distribution $\mathbf{p}$ and a $D$-ary alphabet, and let $L^*$ be the associated expected length of an optimal code ($L^* = \sum_i p_i l_i^*$). Then
$$H_D(X) \leq L^* < H_D(X) + 1.$$
Therefore, the minimum expected codeword length per symbol satisfies
$$\frac{H(X_1, X_2, \ldots, X_n)}{n} \leq L_n^* < \frac{H(X_1, X_2, \ldots, X_n)}{n} + \frac{1}{n}.$$
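One standard way to see the upper bound is via Shannon code lengths $l_i = \lceil \log_D \frac{1}{p_i} \rceil$ (used in the next subsection). A quick Python check with a made-up binary-alphabet PMF:

```python
import math

p = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(-math.log2(pi)) for pi in p]   # Shannon lengths
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)
print(sum(2 ** -l for l in lengths) <= 1)  # True: Kraft holds
print(H <= L < H + 1)                      # True: H <= L < H + 1
```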
Wrong code
We cannot always precisely estimate the real distribution of the data. Consider the Shannon code assignment $l(x) = \lceil \log \frac{1}{q(x)} \rceil$ designed for the probability mass function $q(x)$, and suppose that the true probability mass function is $p(x)$. Then we will not achieve expected length $L \approx H(p)$.
We now show that the increase in expected length due to the incorrect distribution is the relative entropy $D(p \| q)$:
$$H(p) + D(p \| q) \leq E_p[l(X)] < H(p) + D(p \| q) + 1.$$
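A numeric illustration with made-up distributions $p$ and $q$:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]   # true PMF
q = [0.25, 0.25, 0.25, 0.25]    # assumed PMF the code is designed for
lengths = [math.ceil(-math.log2(qi)) for qi in q]   # Shannon code for q
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)
KL = sum(pi * math.log2(pi / qi) for pi in p)
print(L)                          # 2.0  (expected length under p)
print(H, KL, H + KL)              # 1.75 0.25 2.0
print(H + KL <= L < H + KL + 1)   # True: the bound holds
```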
Kraft Inequality for Uniquely Decodable Codes
The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality
$$\sum_i D^{-l_i} \leq 1.$$
Conversely, given a set of codeword lengths that satisfy this inequality, it is possible to construct a uniquely decodable code with these codeword lengths.
Corollary A uniquely decodable code for an infinite source alphabet also satisfies the Kraft inequality.
Huffman Codes
Algorithm
An optimal prefix code for a given distribution can be constructed by a simple algorithm discovered by Huffman.
- List the symbols and their probabilities in ascending order of probability (ties may be broken arbitrarily).
- Repeat until a single node remains:
  - Find the two symbols (or merged nodes) with the lowest probabilities
  - Combine these two into a new node, with a probability equal to the sum of their probabilities
  - Assign '0' to the branch leading to the first node and '1' to the branch leading to the second (a Python sketch follows this list)
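A minimal binary Huffman sketch in Python (function and variable names are mine; ties are broken arbitrarily, so the resulting codewords are one of several equally optimal choices):

```python
import heapq
from itertools import count

def huffman(pmf):
    """Binary Huffman code; returns {symbol: codeword}."""
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two lowest-probability nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}   # '0' branch
        merged.update({s: "1" + w for s, w in c1.items()})  # '1' branch
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]
```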
Example
Consider a random variable $X$ taking values in the set $\{1, 2, 3, 4, 5\}$ with probabilities $0.25$, $0.25$, $0.2$, $0.15$, $0.15$.
Now we get
X | Codeword | Codeword Length
---|---|---
1 | 11 | 2
2 | 00 | 2
3 | 01 | 2
4 | 100 | 3
5 | 101 | 3
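For reference, running the Huffman sketch above on this distribution reproduces the optimal lengths, though the exact 0/1 labels may differ since Huffman codes are not unique:

```python
pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman(pmf)
print(sorted((s, len(w)) for s, w in code.items()))
# [(1, 2), (2, 2), (3, 2), (4, 3), (5, 3)] -> expected length 2.3
```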
Tip
If $D > 2$, we may not have a sufficient number of symbols so that we can always combine $D$ of them at a time. In such a case, we add dummy symbols to the end of the set of symbols. The dummy symbols have probability $0$ and are inserted to fill the tree.
Since at each stage of the reduction the number of symbols is reduced by $D - 1$, we want the total number of symbols to be $1 + k(D - 1)$, where $k$ is the number of merges.
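A small helper (hypothetical naming) to compute the number of dummy symbols:

```python
def num_dummies(m, D):
    # Smallest d >= 0 such that (m + d) = 1 + k*(D - 1) for some integer k,
    # i.e. (m + d) congruent to 1 modulo D - 1
    return (1 - m) % (D - 1)

print(num_dummies(6, 3))   # 1: six symbols, ternary -> pad to 7 = 1 + 3*2
print(num_dummies(5, 2))   # 0: binary Huffman never needs dummies
```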