|
Problem 28: Comparison-Effective String Matching
|
|
When the access to textual data is through equality comparisons between
symbols only, it is appropriate to reduce their number during searches for
patterns.
For example, Algorithm KMP (in Problem 26) executes
in the worst case $2|y|-1$ letter comparisons to find occurrences of a
preprocessed pattern $x$ in a text $y$.
The same result holds when the search
uses a string-matching automaton with an appropriate implementation.
The problem shows that the number of letter comparisons can easily be reduced
to only $|y|$ comparisons for simple patterns.
A similar approach, sketched
at the end of the solution, leads to a maximum of $\frac{3}{2}|y|$ letter
comparisons for general patterns.
Design an algorithm searching a text $y$ for all occurrences of a two-letter
pattern $x$ and using at most $|y|$ comparisons in the equality model.
Distinguish whether the two letters are identical or not.
References
A. Apostolico and M. Crochemore. Optimal canonization of all substrings
of a string. Inf. Comput., 95(1):76-95, 1991.
D. Breslauer, L. Colussi, and L. Toniolo. Tight comparison bounds for
the string prefix-matching problem. Inf. Process. Lett., 47(1):51-57,
1993.
R. Cole and R. Hariharan. Tighter upper bounds on the exact complexity
of string matching. SIAM J. Comput., 26(3):803-856, 1997.
Z. Galil and R. Giancarlo. On the exact complexity of string matching:
Upper bounds. SIAM J. Comput., 21(3):407-437, 1992.
C. Hancart. On Simon's string searching algorithm. Inf. Process. Lett.,
47(2):95-99, 1993.