CLR cover

Problem 101: Online Huffman Coding

\( \def\weight{\mathit{weight}} \)

The two main drawbacks of the static Huffman compression method arethat first, if the frequencies of letters in the source text are not known a priori, the source text has to be scanned twice and second, the Huffman coding tree must be included in the compressed file. The problem shows a solution that avoids these drawbacks.

The solution is based on a dynamic method in which the coding tree is updated each time a symbol is read from the source text. The current Huffman tree relates to the part of the text that has already been processed and evolves exactly in the same way during the decoding process.

Design a Huffman compression method that reads only once the source text and does not need to store the coding tree in the compressed text.
Huffman trees are characterised by the siblings property.

Siblings property

Let $T$ be a Huffman tree with $n$ leaves (a complete binary weighted tree in which all leaves have positive weights). Nodes of $T$ can be arranged in a list $(t_0,t_1,\ldots,t_{2n-2})$ that satisfies

References

  • G. V. Cormack and R. N. Horspool. Algorithms for adaptive Huffman codes. Inf. Process. Lett., 18(3):159-165, 1984.
  • N. Faller. An adaptive system for data compression. In Record of the 7th Asilomar Conference on Circuits, Systems, and Computers, pages 593-597, 1973.
  • A. Fruchtman, Y. Gross, S. T. Klein, and D. Shapira. Weighted adaptive Huffman coding. In A. Bilgin, M. W. Marcellin, J. Serra-Sagristà, and J. A. Storer, editors, Data Compression Conference, DCC 2020, Snowbird, UT, USA, March 24-27, 2020, pages 368-385. IEEE, 2020. http://arxiv.org/abs/2005.08232v1.
  • R. G. Gallager. Variations on a theme by Huffman. IEEE Trans. Information Theory, 24(6):668-674, 1978.
  • D. E. Knuth. Dynamic Huffman coding. J. Algorithms, 6(2):163-180, 1985.
  • J. S. Vitter. Design and analysis of dynamic Huffman codes. J. ACM, 34(4):825-845, 1987.