Examples of Codes

Definition

A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$, the range of $X$, to $\mathcal{D}^*$, the set of finite-length strings of symbols from a $D$-ary alphabet $\mathcal{D}$. Let $C(x)$ denote the codeword corresponding to $x$ and let $l(x)$ denote the length of $C(x)$.

For example, $C(\text{red}) = 00$, $C(\text{blue}) = 11$ is a source code for $X \in \{\text{red}, \text{blue}\}$ with alphabet $\mathcal{D} = \{0, 1\}$.

Length

The expected length $L(C)$ of a source code $C$ for a random variable $X$ with PMF $p(x)$ is given by

$$L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x),$$

where $l(x)$ is the length of the codeword associated with $x$.
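For concreteness, here is a minimal Python sketch of this computation; the PMF and codewords below are made-up placeholders, not from the source:

```python
# Minimal sketch: expected length L(C) = sum over x of p(x) * l(x).
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}    # hypothetical PMF
code = {"a": "0", "b": "10", "c": "11"}   # hypothetical binary code

expected_length = sum(p * len(code[x]) for x, p in pmf.items())
print(expected_length)   # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
```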

Singularity

A code is said to be non-singular if every element of the range of $X$ maps into a different string in $\mathcal{D}^*$ (i.e., the mapping is an injection); that is,

$$x \ne x' \;\Rightarrow\; C(x) \ne C(x').$$

Extension

The extension $C^*$ of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$, defined by

$$C(x_1 x_2 \cdots x_n) = C(x_1)\, C(x_2) \cdots C(x_n),$$

where the right-hand side denotes the concatenation of the corresponding codewords.

Example

If $C(x_1) = 00$ and $C(x_2) = 11$, then $C(x_1 x_2) = 0011$.

Uniquely Decodable

A code is called uniquely decodable if its extension is non-singular.

Prefix Code

A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.

Note

Instantaneous codes $\subset$ Uniquely decodable codes $\subset$ Non-singular codes $\subset$ All codes.

Summary

| X | Singular | Nonsingular, But Not Uniquely Decodable | Uniquely Decodable, But Not Instantaneous | Instantaneous |
|---|----------|-----------------------------------------|-------------------------------------------|---------------|
| 1 | 0 | 0 | 10 | 0 |
| 2 | 0 | 010 | 00 | 10 |
| 3 | 0 | 01 | 11 | 110 |
| 4 | 0 | 10 | 110 | 111 |
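As a sanity check on the table, here is a small Python sketch (the helper names are my own choices) that tests the non-singular and prefix-free properties of each of the four example codes:

```python
def is_nonsingular(code):
    # Non-singular: distinct symbols map to distinct codewords.
    return len(set(code.values())) == len(code)

def is_prefix_free(code):
    # Instantaneous (prefix) code: no codeword is a prefix of any other codeword.
    words = list(code.values())
    return all(
        not wj.startswith(wi)
        for i, wi in enumerate(words)
        for j, wj in enumerate(words)
        if i != j
    )

codes = {
    "singular":                      {1: "0", 2: "0", 3: "0", 4: "0"},
    "nonsingular, not UD":           {1: "0", 2: "010", 3: "01", 4: "10"},
    "uniquely decodable, not inst.": {1: "10", 2: "00", 3: "11", 4: "110"},
    "instantaneous":                 {1: "0", 2: "10", 3: "110", 4: "111"},
}
for name, code in codes.items():
    print(name, is_nonsingular(code), is_prefix_free(code))
```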

Tip

The crucial difference between non-instantaneous codes and instantaneous ones lies in the delay and complexity of decoding. While both can be decoded eventually, a non-instantaneous code incurs additional costs:

  • For a non-instantaneous (but uniquely decodable) code: every encoded bitstream has only one possible original sequence of symbols, but you might have to look ahead in the stream before you can definitively decode a symbol. This requires buffering bits and delaying the output.
  • For an instantaneous code: you can decode each symbol immediately as its codeword is read; no lookahead is needed (see the decoder sketch below).
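The following Python sketch (a toy decoder written for illustration) shows the instantaneous case: a symbol is emitted the moment the buffered bits match a codeword, which is safe precisely because no codeword is a prefix of another:

```python
def decode_prefix_code(bits, code):
    """Decode a bit string with a prefix code given as {symbol: codeword}."""
    inverse = {w: s for s, w in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:            # a complete codeword has been read
            symbols.append(inverse[buffer])
            buffer = ""                  # emit immediately, no lookahead
    return symbols

code = {1: "0", 2: "10", 3: "110", 4: "111"}   # the instantaneous code from the table
print(decode_prefix_code("010110111", code))    # [1, 2, 3, 4]
```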

Kraft Inequality

Motivation

We want to construct instantaneous codes of minimum expected length to describe a given source.

Definition

For any instantaneous code (prefix code) over an alphabet of size $D$, the codeword lengths $l_1, l_2, \ldots, l_m$ must satisfy the inequality

$$\sum_{i=1}^{m} D^{-l_i} \le 1.$$

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
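In code, the check is one line; the length sets below are arbitrary examples chosen here:

```python
# Sketch: evaluate the Kraft sum  sum_i D^(-l_i)  for a list of codeword lengths.
def kraft_sum(lengths, D=2):
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> a binary prefix code with these lengths exists
print(kraft_sum([1, 1, 2]))      # 1.25 -> no binary prefix code can have these lengths
```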

Intuition

For an alphabet of size $D$, the code tree is $D$-ary (each node has $D$ children). The "space" taken by a codeword of length $l_i$ is $D^{-l_i}$, since it occupies one of the $D^{l_i}$ possible leaves at depth $l_i$. The Kraft inequality ensures that the total "space" occupied by all codewords does not exceed the total available space (1).

Proof

Necessity: If a prefix code exists, then Kraft holds.

  • Consider the full $D$-ary tree of depth $l_{\max}$, the length of the longest codeword.
  • A codeword of length $l_i$ blocks $D^{\,l_{\max} - l_i}$ leaves (all of its descendants at depth $l_{\max}$).
  • Since no two codewords share a leaf (the prefix condition), the total number of blocked leaves satisfies $\sum_i D^{\,l_{\max} - l_i} \le D^{\,l_{\max}}$.
  • Dividing both sides by $D^{\,l_{\max}}$ gives the inequality $\sum_i D^{-l_i} \le 1$.

Sufficiency: If Kraft holds, a prefix code exists.

  • Sort the lengths so that $l_1 \le l_2 \le \cdots \le l_m$.
  • For each length $l_i$ in turn, pick any unused node at depth $l_i$ as the codeword and block all of its descendants.
  • The Kraft inequality ensures that there is always enough "space" left to assign the next codeword without violating the prefix-free property (see the sketch below).
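Here is a hedged Python sketch of this sufficiency construction for the binary case ($D = 2$), assuming the given lengths already satisfy the Kraft inequality; the cumulative-sum bookkeeping plays the role of picking the leftmost unused node at each depth:

```python
# Sketch: build a binary prefix code from lengths with sum(2**-l) <= 1.
def prefix_code_from_lengths(lengths):
    lengths = sorted(lengths)
    codewords, cumulative = [], 0.0
    for l in lengths:
        # Codeword = first l bits of the binary expansion of the running Kraft sum.
        codewords.append(format(int(cumulative * 2 ** l), f"0{l}b"))
        cumulative += 2 ** (-l)
    return codewords

print(prefix_code_from_lengths([2, 1, 3, 3]))   # ['0', '10', '110', '111']
```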

Extended Kraft Inequality

For any countably infinite set of codewords that form a prefix code, the codeword lengths satisfy the extended Kraft inequality

$$\sum_{i=1}^{\infty} D^{-l_i} \le 1.$$

Optimal Codes

Minimization via Lagrange multipliers

We want to minimize

$$L = \sum_i p_i\, l_i$$

over all integers $l_1, l_2, \ldots, l_m$ satisfying

$$\sum_i D^{-l_i} \le 1.$$

We neglect the integer constraint on $l_i$ and assume equality in the constraint. Hence, we can write the constrained minimization using a Lagrange multiplier as the minimization of

$$J = \sum_i p_i\, l_i + \lambda \left( \sum_i D^{-l_i} \right).$$

Setting $\partial J / \partial l_i = p_i - \lambda D^{-l_i} \ln D = 0$, we get $D^{-l_i} = p_i / (\lambda \ln D)$; substituting into the constraint $\sum_i D^{-l_i} = 1$ gives $\lambda = 1/\ln D$, hence $p_i = D^{-l_i}$ and

$$l_i^* = -\log_D p_i.$$

This non-integer choice of codeword lengths yields expected codeword length

$$L^* = \sum_i p_i\, l_i^* = -\sum_i p_i \log_D p_i = H_D(X).$$
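A small numerical sketch (with an arbitrary example distribution) of the ideal lengths $l_i^* = -\log_D p_i$ and the fact that their expectation equals the entropy:

```python
import math

pmf = [0.5, 0.25, 0.125, 0.125]   # example distribution with dyadic probabilities

ideal_lengths = [-math.log2(p) for p in pmf]            # l_i* = -log_2 p_i (D = 2)
expected_length = sum(p * l for p, l in zip(pmf, ideal_lengths))
entropy = -sum(p * math.log2(p) for p in pmf)

print(ideal_lengths)              # [1.0, 2.0, 3.0, 3.0]
print(expected_length, entropy)   # both equal 1.75 = H(X) in bits
```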

Lower bound of expected length

The expected length $L$ of any instantaneous $D$-ary code for a random variable $X$ is greater than or equal to the entropy $H_D(X)$; that is,

$$L \ge H_D(X),$$

with equality if and only if $D^{-l_i} = p_i$ (i.e., all probabilities are negative integer powers of $D$).

Bounds on the Optimal Code Length

Let $l_1^*, l_2^*, \ldots, l_m^*$ be optimal codeword lengths for a source distribution $p$ and a $D$-ary alphabet, and let $L^*$ be the associated expected length of an optimal code ($L^* = \sum_i p_i l_i^*$). Then

$$H_D(X) \le L^* < H_D(X) + 1.$$

Therefore, encoding blocks of $n$ i.i.d. symbols jointly, the minimum expected codeword length per symbol satisfies

$$H_D(X) \le L_n^* < H_D(X) + \frac{1}{n}.$$
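To see the one-extra-symbol slack numerically, here is a sketch using Shannon lengths $\lceil -\log_2 p_i \rceil$, which satisfy the Kraft inequality and therefore upper-bound the optimal expected length:

```python
import math

pmf = [0.4, 0.3, 0.2, 0.1]                           # example distribution

lengths = [math.ceil(-math.log2(p)) for p in pmf]    # Shannon code lengths
L = sum(p * l for p, l in zip(pmf, lengths))
H = -sum(p * math.log2(p) for p in pmf)

print(sum(2 ** (-l) for l in lengths) <= 1)          # True: Kraft holds, so a prefix code exists
print(H, L, H + 1)                                   # H <= L* <= L < H + 1
```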

Wrong code

We cannot always estimate the true distribution of the data precisely. Consider the Shannon code assignment $l(x) = \left\lceil \log \frac{1}{q(x)} \right\rceil$ designed for the probability mass function $q(x)$, while the true probability mass function is $p(x)$. Thus, we will not achieve expected length $L \approx H(p)$.

We now show that the increase in expected length due to the incorrect distribution is the relative entropy $D(p \| q)$:

$$H(p) + D(p \| q) \le E_p\,[l(X)] < H(p) + D(p \| q) + 1.$$
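A numerical sketch of this penalty, with two example distributions chosen here:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]   # true distribution (example)
q = [0.25, 0.25, 0.25, 0.25]    # assumed distribution used to design the code (example)

lengths_q = [math.ceil(-math.log2(qi)) for qi in q]   # Shannon code designed for q
E_p_length = sum(pi * l for pi, l in zip(p, lengths_q))

H_p = -sum(pi * math.log2(pi) for pi in p)
D_pq = sum(pi * math.log2(pi / qi) for pi in p)

print(E_p_length)        # 2.0 bits when the data really follow p
print(H_p, H_p + D_pq)   # 1.75 and 2.0: the D(p||q) = 0.25 bit penalty shows up
```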

Kraft Inequality for Uniquely Decodable Codes

The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality

$$\sum_i D^{-l_i} \le 1.$$

Conversely, given a set of codeword lengths that satisfy this inequality, it is possible to construct a uniquely decodable code with these codeword lengths.

Corollary: A uniquely decodable code for an infinite source alphabet also satisfies the Kraft inequality.

Huffman Codes

Algorithm

An optimal prefix code for a given distribution can be constructed by a simple algorithm discovered by Huffman.

  1. List the symbols and their probabilities in ascending order of probabilities (ties may be broken arbitrarily).
  2. For every iteration:
    1. Find the two symbols (or merged nodes) with the lowest probabilities.
    2. Combine these two into a new node, with a probability equal to the sum of their probabilities.
    3. Assign '0' to the branch leading to the first symbol and '1' to the branch leading to the second symbol of the combined node.
  3. Repeat until a single node with probability 1 remains; each symbol's codeword is the sequence of branch labels on the path from the root down to that symbol (a Python sketch follows below).
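A compact Python sketch of this binary Huffman procedure (the heap-based bookkeeping and variable names are illustrative choices, not from the source):

```python
import heapq

def huffman_code(pmf):
    """Build a binary Huffman code for a dict mapping symbol -> probability."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # lowest probability
        p1, _, code1 = heapq.heappop(heap)   # second lowest
        # Prepend '0' to one subtree's codewords and '1' to the other's.
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
print(huffman_code(pmf))   # a prefix code with codeword lengths 2, 2, 2, 3, 3
```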

Example

Consider a random variable $X$ taking values in the set $\{1, 2, 3, 4, 5\}$ with probabilities $0.25$, $0.25$, $0.2$, $0.15$, $0.15$, respectively.

Now we get

| Codeword Length | Codeword | X |
|-----------------|----------|---|
| 2 | 11 | 1 |
| 2 | 00 | 2 |
| 2 | 01 | 3 |
| 3 | 100 | 4 |
| 3 | 101 | 5 |
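Assuming the probabilities above, this code has expected length $2(0.25 + 0.25 + 0.2) + 3(0.15 + 0.15) = 2.3$ bits, close to the entropy; a quick check:

```python
import math

pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = {1: "11", 2: "00", 3: "01", 4: "100", 5: "101"}

L = sum(p * len(code[x]) for x, p in pmf.items())
H = -sum(p * math.log2(p) for p in pmf.values())

print(L, H)   # 2.3 and roughly 2.285, so H <= L < H + 1
```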

Tip

If $D > 2$, we may not have a sufficient number of symbols to combine $D$ of them at each step. In such a case, we add dummy symbols to the end of the set of symbols. The dummy symbols have probability $0$ and are inserted to fill the tree.

Since at each stage of the reduction the number of symbols is reduced by $D - 1$, we want the total number of symbols to be $1 + k(D - 1)$, where $k$ is the number of merges.
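A tiny sketch of the resulting dummy-symbol count, assuming this $1 + k(D - 1)$ condition:

```python
# Sketch: how many probability-0 dummy symbols to add before D-ary Huffman coding,
# so that the padded number of symbols has the form 1 + k * (D - 1).
def num_dummies(m, D):
    remainder = (m - 1) % (D - 1)
    return 0 if remainder == 0 else (D - 1) - remainder

print(num_dummies(6, 3))   # 1: pad 6 symbols to 7 = 1 + 3 * (3 - 1)
print(num_dummies(5, 2))   # 0: binary Huffman never needs dummy symbols
```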