CLR cover

Problem 98: Lempel-Ziv-Welch Decoding

\( \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \newcommand{\finput}{\textit{input}}% \)

The Lempel-Ziv-Welch compression method is based on a type of Lempel-Ziv factorisation. It consists in encoding repeating factors of the input text by their code in a dictionary $D$ of words. The dictionary, initialised with all the letters of the alphabet $A$, is prefix-closed: every prefix of a word in the dictionary is in it.

Here is the encoding algorithm in which $\textit{code}_D(w)$ is the index of the factor $w$ in the dictionary $D$.

LZW-encoder$(input \textrm{ non-empty word})$
    \begin{algorithmic}
  \STATE $D\leftarrow A$
  \STATE $w\leftarrow \mbox{first letter of } input$
  \WHILE{$\mbox{not end of } input$}
    \STATE $a\leftarrow \mbox{next letter of } input$
    \IF{$wa\in D$} 
      \STATE $w\leftarrow wa$
    \ELSE
      \STATE write $\textit{code}_D(w)$
      \STATE $D\leftarrow D\cup\{wa\}$
      \STATE $w\leftarrow a$
    \ENDIF
  \ENDWHILE
  \STATE write $\textit{code}_D(w)$
    \end{algorithmic}

The decompression algorithm reads the sequence of codes produced by the encoder and updates the dictionary similarly to the way the encoder does.

LZW-decoder$(input \textrm{ non-empty word})$
    \begin{algorithmic}
  \STATE $D\leftarrow A$
  \WHILE{$\mbox{not end of } input$}
    \STATE $i\leftarrow \mbox{next code of } input$
    \STATE $w\leftarrow \mbox{factor of code $i$ in $D$}$
    \STATE write $w$
    \STATE $a\leftarrow \mbox{first letter of next decoded factor}$
    \STATE $D\leftarrow D\cup\{wa\}$
  \ENDWHILE
    \end{algorithmic}

Show that during the decoding step Algorithm LZW-decoder can read a code $i$ that does not belong yet to the dictionary $D$ if and only if index $i$ corresponds to the code of $aua$, where $au$ is the previous decoded factor, $a\in A$ and $u\in A^*$.

The question highlights the only critical situation encountered by the decoder. The property provides the element to ensure it can correctly decode its input.

References

  • T. A. Welch. A technique for high-performance data compression. IEEE Computer, 17(6):8-19, 1984.
  • J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977.