CLR cover

Problem 107: Compression Ratio of Greedy Superstrings

\( \newcommand{\Greedy}{\mathit{Greedy}}% \newcommand{\GreedyHam}{\mathit{GreedyHam}}% \)

The problem considers Algorithm GreedySCS (presented under different forms in Problem 61) that computes a superstring $\Greedy(X)$ for a set $X$ of words of total length $n$. The superstring can be viewed as a compressed text representing all words in $X$ and from this point of view it is interesting to quantify the gain of representing $X$ by a superstring.

Let $\mathit{GrCompr}(X) = n-|\Greedy(X)|$ denote the compression achieved by the greedy algorithm. Similarly define $\mathit{OptCompr}(X)=n-|\mathit{OPT}(X)|$ where $\mathit{OPT}$ is an optimal (unknown) superstring for $X$.

The fraction $\frac{\mathit{GrCompr}(X)}{\mathit{OptCompr}(X)}$ is called the compression ratio of Algorithm GreedySCS.

Show the compression ratio of Algorithm GreedySCS is at least $1/2$.
Consider the overlap graph of the input set.

References

  • J. Tarhio and E. Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci., 57:131-145, 1988.