Encoding: Concatenate the codewords representing each character of the file
Example: Encode file "abc" using the codewords above
Note: "." denotes the concatenation operation. It is just for illustration purposes, and does not exist in the encoded string.
Convenient representation for the prefix code:
Binary codeword for a character is the path from the
root to that character in the binary tree
"" means "go to the left child"
"" means "go to the right child"
Weight of an internal node: sum of weights of the leaves in its subtree
The binary tree corresponding to the optimal variable-length code
An optimal code for a file is always represented by a full binary tree
Consider an FBT corresponding to an optimal prefix code
It has leaves (external nodes)
One for each letter of the alphabet where is the alphabet from which the characters are drawn
Lemma: An FBT with external nodes has exactly internal nodes
Lemma: Let each internal node i is labeled with
the sum of the weight of the leaves in its subtree
Then
where is the set of internal nodes of
Proof: Consider a leaf node with &
Problem Formulation: For a given character set C, construct an optimal prefix code with the minimum total cost
Huffman invented a greedy algorithm that constructs an optimal prefix code called a Huffman code
The greedy algorithm
A priority queue , keyed on , is used
to identify the two least-frequent objects to merge
The result of the merger of two objects is a new object
We need to prove:
What is the greedy step in Huffman’s algorithm?
We will first prove the greedy choice property
Outline of the proof:
Proof: Let be an arbitrary optimal solution where:
Reminder:
The difference in cost between and :
We can similarly show that
Since is optimal is also optimal
Note: contains our greedy choice:
Hence, the proof for the greedy choice property is complete
Lemma 1 implies that
We have already proved that , that is,
At each step Huffman chooses the merger that incurs the least cost
Reminder:
Try to express in terms of .
Note: All characters in have the same depth in and .
Reminder:
Contradiction! Proof complete