Problem 35: Turbo-BM algorithm |
The problem shows how a very light modification of Boyer-Moore string matching produces a much faster algorithm when the method is used to locate all the pattern occurrences in a text, like Algorithm BM in Problem 34 does. The initial design of the Boyer-Moore algorithm was to find the first pattern occurrence. It is known that for that goal it runs in linear time with respect to the searched text length. But it has a quadratic worst-case running time to report all pattern occurrences, especially for periodic patterns. This is due to its amnesic aspect when it slides the window to the next position.
The goal of Algorithm Turbo-BM is to get a linear-time search by adapting the Boyer-Moore search algorithm, Algorithm BM, without changing its pattern preprocessing, table $\tbsuff$. Only constant extra space is added and used during the search.
Algorithm Turbo-BM uses the lengths $\memo=|u|$ of the previous suffix match $u$ and $\ell=|v|$ of the current suffix match $v$ to compute $\max\{\tbsuff[i],|u|-|v|\}$.
In the example below, the first match $u=\sa{bababa}$ of length $6$ leads to a slide of length $4=\tbsuff[4]$. After sliding the window, the new match $v=\sa{a}$ alone would give a slide of length $1=\tbsuff[9]$. But the turbo-shift applies and produces a slide of length $\turb=6-1=5$.
\begin{algorithmic} \COMMENT{jumping over memory if $mem \gt 0$} \STATE $(j,mem) \leftarrow (m-1,0)$ \WHILE{$j \lt n$} \STATE $i \leftarrow m-1-|lcs(x,y[j-m+1..j])|$ \IF{$i \lt 0$} \STATE report an occurrence of $x$ at position $j-m+1$ on $y$ \STATE $(shift,\ell) \leftarrow (per(x),m-per(x))$ \ELSE \STATE $(shift,\ell) \leftarrow (good\text{-}suff[i],m-i-1)$ \ENDIF \STATE $turbo \leftarrow mem-(m-i-1)$ \IF{$turbo\gt shift$} \STATE $(j,mem) \leftarrow (j+turbo,0)$ \ELSE \STATE $(j,mem) \leftarrow (j+shift,\ell)$ \ENDIF \ENDWHILE \end{algorithmic}