CLR cover

Problem 49: Ternary Search Trie

Ternary search tries provide an efficient data structure to store and search a set of words. It figures a clever implementation of the trie of the set in the same way as the Suffix array does for the set of suffixes of a word.

Searching a trie for a pattern starts at the initial state (the root) and proceeds down following the matching arcs until the end of the pattern is met or until no arc matches the current letter. When the alphabet is large, representing arcs outgoing a state can lead to either a waste of space because many arcs have no target, or a waste of time if linear lists are used. The goal of Ternary search tries is to represent them by binary search trees on the outgoing letters.

To do so, each node of the trie has three outgoing arcs: left and right (up and down in the picture) for the binary search tree at the current trie node, and a middle arc to the next trie node.

Describe the data structure to implement a Ternary search trie storing a set of $n$ words and show how to search it for a word of length $m$. Analyse the running time.
Note the analogy with searching a Suffix array.

Notice on the above example that the binary search tree corresponding to the arcs outgoing the initial node of the trie has its root labelled $\sa{s}$ and not the middle letter $\sa{p}$. Indeed, to make the search more efficient binary search trees are weight balanced. The weight corresponds to the number of elements in the subtree. This is why the $\sa{s}$ starting letter of the majority of words is chosen for the root of the binary search tree.

References

  • J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In M. E. Saks, editor, Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 5-7 January 1997, New Orleans, Louisiana., pages 360-369. ACM/SIAM, 1997.
  • J. Clément, P. Flajolet, and B. Vallée. The analysis of hybrid trie structures. In H. J. Karloff, editor, Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 25-27 January 1998, San Francisco, California., pages 531-539. ACM/SIAM, 1998.