Problem 34: Worst Case of the Boyer-Moore Algorithm |
a Boyer-Moore string matching is based on a technique that leads to the fastest searching algorithms for fixed patterns. Its main feature is to scan the pattern backward when aligned with a factor of the searched text. A typical pattern preprocessing is shown in Problem 33.
When locating a fixed pattern $x$ of length $m$ in a text $y$ of length $n$, in the generic situation $x$ is aligned with a factor (the window) of $y$ ending at position $j$ (see picture). The algorithm computes the longest common suffix ($\lcs$) between $x$ and the factor of $y$; after possibly reporting an occurrence, it slides the window towards the end of $y$ based on the preprocessing and on information collected during the scan, without missing an occurrence of $x$. Algorithm BM implements the method with table $\tbsuff$ of Problem 33.
\begin{algorithmic} \STATE $j\leftarrow m-1$ \WHILE{$j \lt n$} \STATE $i\leftarrow m-1-|lcs(x,y[j-m+1..j])|$ \IF{$i \lt 0$} \STATE report an occurrence of $x$ at position $j-m+1$ on $y$ \STATE $j\leftarrow j+per(x)$ \COMMENT{$per(x)=good\text{-}suff[0]$} \ELSE \STATE $j\leftarrow j+good\text{-}suff[i]$ \ENDIF \ENDWHILE \end{algorithmic}
After position $j$ on $y$ is treated, if an occurrence of $x$ is found the algorithm slides naturally the window at distance $\per(x)$. If no occurrence is found, the distance $\tbsuff[i]$ depends on the factor $bu$ of $y$ (it depends on $au$ of $x$ in Problem 33). Value $\per(x)$ and array $\tbsuff$ are preprocessed before the search.