Problem 34: Worst Case of the Boyer-Moore Algorithm

$ \def\lcs{\mathit{lcs}} \def\per{\mathit{per}} \def\tbsuff{\mathit{good\text{-}suff}} \def\dd{\dot{}\dot{}} $

a Boyer-Moore string matching is based on a technique that leads to the fastest searching algorithms for fixed patterns. Its main feature is to scan the pattern backward when aligned with a factor of the searched text. A typical pattern preprocessing is shown in Problem 33.

When locating a fixed pattern $x$ of length $m$ in a text $y$ of length $n$, in the generic situation $x$ is aligned with a factor (the window) of $y$ ending at position $j$ (see picture). The algorithm computes the longest common suffix ($\lcs$) between $x$ and the factor of $y$; after possibly reporting an occurrence, it slides the window towards the end of $y$ based on the preprocessing and on information collected during the scan, without missing an occurrence of $x$. Algorithm BM implements the method with table $\tbsuff$ of Problem 33.

BM$(x,y \mbox{ non-empty words}, m,n \mbox{ their lengths})$

    \begin{algorithmic}
  \STATE $j\leftarrow m-1$
  \WHILE{$j \lt n$}
    \STATE $i\leftarrow m-1-|lcs(x,y[j-m+1..j])|$
    \IF{$i \lt 0$}
      \STATE report an occurrence of $x$ at position $j-m+1$ on $y$
      \STATE $j\leftarrow j+per(x)$
      \COMMENT{$per(x)=good\text{-}suff[0]$}
    \ELSE
      \STATE $j\leftarrow j+good\text{-}suff[i]$
    \ENDIF
  \ENDWHILE
    \end{algorithmic}

After position $j$ on $y$ is treated, if an occurrence of $x$ is found the algorithm slides naturally the window at distance $\per(x)$. If no occurrence is found, the distance $\tbsuff[i]$ depends on the factor $bu$ of $y$ (it depends on $au$ of $x$ in Problem 33). Value $\per(x)$ and array $\tbsuff$ are preprocessed before the search.

Give examples of a non-periodic pattern and of a text $y$ for which Algorithm BM performs close to $3|y|$ letter comparisons at line 3 for computing the longest common suffix.

References

R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM., 20(10):762-772, 1977.

R. Cole. Tight bounds on the complexity of the Boyer-Moore string matching algorithm. SIAM J. Comput., 23(5):1075-1091, 1994.

M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.

M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994. 412 pages.

M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, Hong-Kong, 2002. 310 pages.

D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.

D. E. Knuth, J. H. Morris Jr., and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977.

B. Smyth. Computing Patterns in Strings. Pearson Education Limited, Harlow, England, 2003. 423 pages.

125 Problems in Text Algorithms

Problem 34: Worst Case of the Boyer-Moore Algorithm

References