CLR cover

Problem 61: Greedy Superstring

A superstring of a set of words can be used to store the set in a compact way. Formally, a common superstring of a set $X$ of words is a word $z$ in which all elements of $X$ occur as factors, that is, $X\subseteq\Fact(z)$. The shortest common superstring of $X$ is denoted by $\SCS(X)$.

The greedy paradigm leads to a simple algorithm that produces a fairly good approximation of $\SCS(X)$. The goal of the problem is to show an efficient implementation of the method.

For two words $u$ and $v$, $\Overlap(u,v)$ is the longest suffix of $u$ that is a prefix of $v$. If $w=\Overlap(u,v)$, $u=u'w$ and $v=wv'$, $u\otimes v$ is defined as the word $u'wv'$. Note that $\SCS(u,v)$ is either $u\otimes v$ or $v\otimes u$. Also note that a word in $X$ factor of another word in $X$ can be discarded without changing $\SCS(X)$. Then $X$ is supposed to be factor free.

GreedySCS$(X \textrm{ non-empty factor-free set of words})$
    \begin{algorithmic}
  \IF{$|X|=1$}
    \RETURN{$x \in X$}
  \ELSE
    \STATE let $x,y\in X, x\neq y$, with $|Overlap(x,y)|$ maximal
    \RETURN{GreedySCS$(X \setminus \{x,y\} \cup \{x\otimes y\}$}
  \ENDIF 
    \end{algorithmic}

For a set $X$ of words drawn from the alphabet $\{\sa{1}, \sa{2}, \dots,\sa{n}\}$ show how to implement the algorithm so that GreedySCS$(X)$ runs in time $O(\Sigma\{|x| \mid x\in X\})$.

References

  • J. Tarhio and E. Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci., 57:131-145, 1988.