CLR cover

Problem 100: Length-Limited Huffman Coding

Given the frequencies of alphabet letters, the Huffman algorithm builds an optimal prefix code to encode the letters in such a way that encodings are as short as possible. In the general case there is no constraint on the length of the codewords. But sometimes one may want to bound the codeword length. Building a code satisfying such a constraint is the subject of this problem.

The coin collector's problem is an example of where the constraint is used. Collectors have coins with two independent properties: denominations (currency values) and numismatic values (collector values). Their goal is to collect a sum $N$ while minimising the total numismatic value.

Let denominations be integer powers of $2$: $2^{-i}$ with $1\leq i \leq L$. Coins are organised as follows: there is a list for each denomination in which coins are sorted in increasing order of their numismatic values.

The method consists in grouping adjacent coins two by two in the list of smaller denominations, dropping the last coin if their number is odd. The numismatic value of a package is the sum of numismatic values of the two coins. Newly formed packages are associated with the coins of the next smallest denomination (sorted in increasing numismatic value). The process is repeated until the list of coins of denomination $2^{-1}$ is processed.

Design an algorithm that computes for a list of $n$ frequencies an optimal length-limited Huffman code in which no codeword is longer than $L$ and that runs in time $O(nL)$.
Reduce the problem to the binary coin collector's problem.

Algorithm PackageMerge$(S,L)$ implements the strategy described in the example for a set $S$ of coins with denominations between $2^{-L}$ and $2^{-1}$. Package$(S)$ groups two by two consecutive items of $S$ and Merge$(S,P)$ merges two sorted lists.

Eventually, the first $N$ items of the list PackageMerge$(S,L)$ have the lowest numismatic values and are selected to form the solution.

PackageMerge$(S \textrm{ set of coins}, L)$
    \begin{algorithmic}
  \FOR{$d\leftarrow 1$ \TO $L$}
    \STATE $S_d\leftarrow \mbox{list of coins of $S$ with denomination $2^{-}$ $^d$}$
    \STATE $\qquad$ sorted by increasing numismatic value
  \ENDFOR
  \FOR{$d\leftarrow L$ \DOWNTO $1$}
    \STATE $P\leftarrow$ Package$(S_d)$
    \STATE $S_{d-1}\leftarrow$ Merge$(S_{d-1},P)$
  \ENDFOR
  \RETURN{$S_0$}
    \end{algorithmic}

Both Package$(S')$ and Merge$(S',P')$ run in linear time according to $n=|S|$. Thus, provided that the lists of coins are already sorted, the algorithm PackageMerge$(S,L)$ runs in time and space $O(nL)$.

References

  • J. Katajainen, A. Moffat, and A. Turpin. A fast and space-economical algorithm for length-limited coding. In J. Staples, P. Eades, N. Katoh, and A. Moffat, editors, Algorithms and Computation, 6th International Symposium, ISAAC '95, Cairns, Australia, December 4-6, 1995, Proceedings, volume 1004 of Lecture Notes in Computer Science, pages 12-21. Springer, 1995.
  • L. L. Larmore and D. S. Hirschberg. A fast algorithm for optimal lengthlimited Huffman codes. J. ACM, 37(3):464-473, 1990.
  • B. Schieber. Computing a minimum weight-link path in graphs with the concave Monge property. J. Algorithms, 29(2):204-222, 1998.