CLR cover

Suffix Structures

Suffix structures that store the suffixes of a word are important data structures used to produce efficient indexes. Tries can be used as such but their size can be quadratic. One solution to cope with that is to compact the trie, resulting in the Suffix tree of the word. It consists in eliminating non-terminal nodes with only one outgoing edge and in labelling arcs by factors of the word accordingly. Eliminated nodes are sometimes called implicit nodes of the Suffix tree and remaining nodes are called explicit nodes.

A second solution to reduce the size of the Suffix trie is to minimise it, which means considering the minimal deterministic automaton accepting the suffixes of the word, its Suffix automaton.

It is known that $\aSuff(x)$ possesses fewer than $2|x|$ states and fewer than $3|x|$ arcs, for a total size $O(|x|)$, that is, linear in $|x|$. The Factor automaton $\aFact(x)$ of the word, minimal deterministic automaton accepting its factors, can even be smaller because all its states are terminal.