Problem 98: Lempel-Ziv-Welch Decoding |
The Lempel-Ziv-Welch compression method is based on a type of Lempel-Ziv factorisation. It consists in encoding repeating factors of the input text by their code in a dictionary $D$ of words. The dictionary, initialised with all the letters of the alphabet $A$, is prefix-closed: every prefix of a word in the dictionary is in it.
Here is the encoding algorithm in which $\textit{code}_D(w)$ is the index of the factor $w$ in the dictionary $D$.
\begin{algorithmic} \STATE $D\leftarrow A$ \STATE $w\leftarrow \mbox{first letter of } input$ \WHILE{$\mbox{not end of } input$} \STATE $a\leftarrow \mbox{next letter of } input$ \IF{$wa\in D$} \STATE $w\leftarrow wa$ \ELSE \STATE write $\textit{code}_D(w)$ \STATE $D\leftarrow D\cup\{wa\}$ \STATE $w\leftarrow a$ \ENDIF \ENDWHILE \STATE write $\textit{code}_D(w)$ \end{algorithmic}
The decompression algorithm reads the sequence of codes produced by the encoder and updates the dictionary similarly to the way the encoder does.
\begin{algorithmic} \STATE $D\leftarrow A$ \WHILE{$\mbox{not end of } input$} \STATE $i\leftarrow \mbox{next code of } input$ \STATE $w\leftarrow \mbox{factor of code } i \mbox{ in } D$ \STATE write $w$ \STATE $a\leftarrow \mbox{first letter of next decoded factor}$ \STATE $D\leftarrow D\cup\{wa\}$ \ENDWHILE \end{algorithmic}
The question highlights the only critical situation encountered by the decoder. The property provides the element to ensure it can correctly decode its input.