CLR cover

Problem 27: Sparse Matching Automaton

The most standard method to locate given patterns in a text processed sequentially is to use a pattern-matching automaton. Border tables used in Algorithm KMP may be viewed as implementations of such an automaton. The aim of the problem is to show another implementation technique that eventually improves searches. Searches using this implementation run at least as fast as Algorithm KMP, then run in linear time executing no more letter comparisons than twice the text length.

The pattern-matching or string-matching automaton $\aSMA(x)$ of a word $x$ drawn from the alphabet $A$ is the minimal deterministic automaton accepting words ending with $x$. It accepts the language $A^*x$ and has $|x|+1$ states, $0, 1, \dots, |x|$, the initial state $0$ and its only accepting state $|x|$. Its transition table $\delta$ is defined, for a state $i$ and a letter $a$, by: $$\delta(i,a) = \max\{s+1 \mid -1\leq s\leq i \mbox{ and } x[0\dd s] \mbox{ suffix of } x[0\dd i-1]\cdot a\}.$$ Observe that the size of the table is $\Omega(|x|^2)$ when the alphabet of $x$ is as large as its length. But the table is very sparse, since most of its values are null.

Show that the table $\delta$ associated with the string-matching automaton of a word of length $n$ has at most $2n$ non-zero entries and that the bound is tight.

References

  • M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.
  • M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, Hong-Kong, 2002. 310 pages.
  • D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.
  • C. Hancart. On Simon's string searching algorithm. Inf. Process. Lett., 47(2):95-99, 1993.
  • I. Simon. String matching algorithms and automata. In R. Baeza-Yates and N. Ziviani, editors, Proceedings of the 1st South American Workshop on String Processing, pages 151-157, Belo Horizonte, Brasil, 1993. Universidade Federal de Minas Gerais.