Problem 33: Good-Suffix Table |
The Boyer-Moore algorithm (BM in Problem 34) applies the sliding window strategy on the text to locate occurrences of a pattern. It requires a pattern preprocessing to accelerate the search.
At a given step, the algorithm compares the pattern and a window on the text by computing their longest common suffix $u$. If $u=x$ a match occurs. Otherwise, in the generic situation, pattern $x[0\dd m-1]$ is aligned with the window, factor $y[j-m+1\dd j]$ of the text, $au$ is a suffix of $x$ and $bu$ a suffix of the window, for different letters $a$ and $b$.
To continue the search, Algorithm BM slides the window according to the period of $x$ in case of a match or otherwise to the factor $bu$ of the text to avoid positions of the window where no occurrence of $x$ is possible. To do so it uses the good-suffix table $\tbsuff$ defined for a position $i$ on $x$ and $u=x[i+1\dd m-1]$ by $$\tbsuff[i]=\min\{|v| \mid x \mbox{ suffix of } uv \mbox{ or } cuv \mbox{ suffix of } x, c\neq x[i]\}.$$
The condition "$cuv \mbox{ suffix of } x$" with $c\neq a=x[i]$ ensures that when letter $c$ is aligned with letter $b=y[j-m+1+i]$ after sliding the window, the same mismatch does not re-occur immediately. From the definition note that $\tbsuff[0]=\per(x)$.