CLR cover

Problem 55: Bare Suffix Tree

\( \newcommand{\depth}{\mathit{depth}}% \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \)

Suffix trees provide a data structure for indexing texts. Optimal-time constructions of them suffer from a rather high memory requirement, larger than for Suffix arrays with the same usage. The problem deals with a moderately efficient but not completely naive and very simple construction of Suffix trees.

The Suffix tree $T$ of a word $x$ ending with a unique end marker is the compacted trie of suffixes of $x$. A leaf corresponds to a suffix and an internal node to a factor having at least two occurrences followed by different letters. Each edge is labelled by a factor $x[i\dd j]$ of $x$, represented by the pair $(i,j)$. Its word-length is $|x[i\dd j]|=j-i+1$. The word-length of a path in $T$ is the sum of word-lengths of its edges, while the length of the path is its number of edges. Let $\depth(T)$ be the maximum length of a path in $T$ from the root to a leaf. Let $l_i$ be the leaf ending the branch labelled by $x[i\dd n-1]$.

Design a construction of the Suffix tree $T$ of a word $x$ using no additional array and running in time $O(|x| \depth(T))$ on a fixed-size alphabet.

References

  • E. M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976.
  • E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.