Below are the Ternary search trie (left) and the trie (right) of the set \[\{\sa{large},\sa{long},\sa{pattern},\sa{sequence},\sa{short},\sa{string}\}.\]
Problem 49: Ternary Search Trie |
Ternary search tries provide an efficient data structure to store and search a set of words. It figures a clever implementation of the trie of the set in the same way as the Suffix array does for the set of suffixes of a word.
Searching a trie for a pattern starts at the initial state (the root) and proceeds down following the matching arcs until the end of the pattern is met or until no arc matches the current letter. When the alphabet is large, representing arcs outgoing a state can lead to either a waste of space because many arcs have no target, or a waste of time if linear lists are used. The goal of Ternary search tries is to represent them by binary search trees on the outgoing letters.
To do so, each node of the trie has three outgoing arcs: left and right (up and down in the picture) for the binary search tree at the current trie node, and a middle arc to the next trie node.
Notice on the above example that the binary search tree corresponding to the arcs outgoing the initial node of the trie has its root labelled $\sa{s}$ and not the middle letter $\sa{p}$. Indeed, to make the search more efficient binary search trees are weight balanced. The weight corresponds to the number of elements in the subtree. This is why the $\sa{s}$ starting letter of the majority of words is chosen for the root of the binary search tree.