CLR cover

Problem 28: Comparison-Effective String Matching

When the access to textual data is through equality comparisons between symbols only, it is appropriate to reduce their number during searches for patterns. For example, Algorithm KMP (in Problem 26) executes in the worst case $2|y|-1$ letter comparisons to find occurrences of a preprocessed pattern $x$ in a text $y$. The same result holds when the search uses a string-matching automaton with an appropriate implementation.

The problem shows that the number of letter comparisons can easily be reduced to only $|y|$ comparisons for simple patterns. A similar approach, sketched at the end of the solution, leads to a maximum of $\frac{3}{2}|y|$ letter comparisons for general patterns.

Design an algorithm searching a text $y$ for all occurrences of a two-letter pattern $x$ and using at most $|y|$ comparisons in the equality model.
Distinguish whether the two letters are identical or not.

References

  • A. Apostolico and M. Crochemore. Optimal canonization of all substrings of a string. Inf. Comput., 95(1):76-95, 1991.
  • D. Breslauer, L. Colussi, and L. Toniolo. Tight comparison bounds for the string prefix-matching problem. Inf. Process. Lett., 47(1):51-57, 1993.
  • R. Cole and R. Hariharan. Tighter upper bounds on the exact complexity of string matching. SIAM J. Comput., 26(3):803-856, 1997.
  • Z. Galil and R. Giancarlo. On the exact complexity of string matching: Upper bounds. SIAM J. Comput., 21(3):407-437, 1992.
  • C. Hancart. On Simon's string searching algorithm. Inf. Process. Lett., 47(2):95-99, 1993.