CLR cover

Additional Problem 4: Shrinking a Text by Pairing Adjacent Symbols

\( \newcommand{\Comp}{\mathit{Compress}} \)

One of the most powerfull compression techniques is recompression. In this technique there are two crucial operations: shrinking unary runs and pairing letters.

A unary run is a maximal occurrence of a factor of length at least 2 that is a repetition of the same letter. The first phase of the recompression technique consists in shrinking each unary run into a single letter.

The second phase it to apply the operation $\Comp(x,L,R)$, where $(L,R)$ is a partition of the alphabet $\Sigma$ of letters, $L\cup R=\Sigma$ and $L\cap R=\emptyset$. The compressed word $\Comp(x,L,R)$ results from $x$ by substituting a single letter (identifier of a pair of letters) for each of its 2-letter factor $ab$, whenever $a\in L$ and $b\in R$.

The Pairing problem consists in computing a partition $(L,R)$ of the alphabet of $x$ for which $|\Comp(x,L,R)|\le \frac{3}{4}|x|$.

Let $x$, $|x|\geq 2$, be a word over an integer alphabet containing no unary runs. Show how to compute in linear time a partition $(L,R)$ of the alphabet of $x$ for which $|\Comp(x,L,R)|\le \frac{3}{4}|x|$.

References