CLR cover

Problem 99: Cost of a Huffman Code

Huffman compression method applied to a text $x\in A^*$ assigns a binary codeword to each letter of $x$ in order to produce a shortest encoded text. Its principle is that the most frequent letters are given the shortest codewords while the least frequent symbols correspond to the longest codewords.

Codewords form a prefix code (prefix-free set) naturally associated with a binary tree in which the links from a node to its left and right children are labelled by $\sa{0}$ and $\sa{1}$ respectively. Leaves correspond to original letters and labels of branches are their codewords. In the present method codes are complete: internal nodes of the tree all have exactly two children.

The cost of a Huffman code is the sum $\sum_{a\in A} \freq(a) \times |\code(a)|$, where $\code(a)$ is the binary codeword of letter $a$. It is the smallest length of a binary text compressed by the method from a word $x$ in which $\freq(a)=|x|_a$ for each letter $a\in\alph(x)$. Let us consider the following algorithm applied to frequencies (weights).

HuffmanCost$(S \textrm{ list of positive weights})$
    \begin{algorithmic}
  \STATE $result\leftarrow 0$
  \WHILE{$|S|\gt 1$}
    \STATE $p\leftarrow$ MinDelete$(S)$
    \STATE $q\leftarrow$ MinDelete$(S)$
    \STATE add $p+q$ to $S$
    \STATE $result\leftarrow result+p+q$
  \ENDWHILE
  \RETURN{result}
    \end{algorithmic}

Prove that Algorithm HuffmanCost$(S)$ computes the smallest cost of a Huffman code from a list $S$ of item weights.
Consider the Huffman tree associated with the code/

Show how to implement algorithm HuffmanCost$(S)$ so that it runs in linear time when the list $S$ is in increasing order.
Use a queue for inserting the new values (corresponding to internal nodes of the tree).

References

  • D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. I.R.E., 40:1098-1101, 1951.
  • J. van Leeuwen. On the construction of Huffman trees. In ICALP, pages 382-410, 1976.