# Problem 98: Lempel-Ziv-Welch Decoding

$$\def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \newcommand{\finput}{\textit{input}}%$$

The Lempel-Ziv-Welch compression method is based on a type of Lempel-Ziv factorisation. It consists in encoding repeating factors of the input text by their code in a dictionary $D$ of words. The dictionary, initialised with all the letters of the alphabet $A$, is prefix-closed: every prefix of a word in the dictionary is in it.

Here is the encoding algorithm in which $\textit{code}_D(w)$ is the index of the factor $w$ in the dictionary $D$.

LZW-encoder$(input \textrm{ non-empty word})$
    \begin{algorithmic}
\STATE $D\leftarrow A$
\STATE $w\leftarrow \mbox{first letter of } input$
\WHILE{$\mbox{not end of } input$}
\STATE $a\leftarrow \mbox{next letter of } input$
\IF{$wa\in D$}
\STATE $w\leftarrow wa$
\ELSE
\STATE write $\textit{code}_D(w)$
\STATE $D\leftarrow D\cup\{wa\}$
\STATE $w\leftarrow a$
\ENDIF
\ENDWHILE
\STATE write $\textit{code}_D(w)$
\end{algorithmic}


The decompression algorithm reads the sequence of codes produced by the encoder and updates the dictionary similarly to the way the encoder does.

LZW-decoder$(input \textrm{ non-empty word})$
    \begin{algorithmic}
\STATE $D\leftarrow A$
\WHILE{$\mbox{not end of } input$}
\STATE $i\leftarrow \mbox{next code of } input$
\STATE $w\leftarrow \mbox{factor of code$i$in$D$}$
\STATE write $w$
\STATE $a\leftarrow \mbox{first letter of next decoded factor}$
\STATE $D\leftarrow D\cup\{wa\}$
\ENDWHILE
\end{algorithmic}


Show that during the decoding step Algorithm LZW-decoder can read a code $i$ that does not belong yet to the dictionary $D$ if and only if index $i$ corresponds to the code of $aua$, where $au$ is the previous decoded factor, $a\in A$ and $u\in A^*$.

The question highlights the only critical situation encountered by the decoder. The property provides the element to ensure it can correctly decode its input.

## References

• T. A. Welch. A technique for high-performance data compression. IEEE Computer, 17(6):8-19, 1984.
• J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977.