Problem 22: Prefix Table |
The prefix table, like the border table of Problem 19, is a basic tool for building efficient algorithms on words. It is used mostly when searching texts for various types of given patterns.
Let $x$ be a non-empty string. The prefix table of $x$ is defined on its positions $i$, $i=0, \dots, |x|-1$, by: $\tpref[i]$ is the length of the longest prefix of $x$ starting at position $i$. Obviously $\tpref[0]=|x|$.
Here is the prefix table of the word $\sa{abaababaaba}$:
$i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
$x[i]$ | $\sa a$ | $\sa b$ | $\sa a$ | $\sa a$ | $\sa b$ | $\sa a$ | $\sa b$ | $\sa a$ | $\sa a$ | $\sa b$ | $\sa a$ |
$\tpref[i]$ | 11 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 3 | 0 | 1 |
\begin{algorithmic} \STATE $pref[0]\leftarrow |x|$ \STATE $g\leftarrow 0$ \FOR{$i\leftarrow 1$ \TO $|x|-1$} \IF{$i\lt g \mbox{ and } pref[i-f]\ne g-i$} \STATE $pref[i]\leftarrow \min\{pref[i-f],g-i\}$ \ELSE \STATE $(f,g)\leftarrow (i,\max\{g,i\})$ \WHILE{$g\lt |x|$ and $x[g]=x[g-f]$} \STATE $g\leftarrow g+1$ \ENDWHILE \STATE $pref[f]\leftarrow g-f$ \ENDIF \ENDFOR \RETURN{$pref$} \end{algorithmic}
The key idea in Algorithm Prefixes that computes the table sequentially from left to right is to benefit from what has been computed before the current position.
When $v=x[f\dd g-1]$ is a prefix of $x$ and position $i$ is between $f$ and $g$ (see picture), the first step for computing $\tpref[i]$ is to check whether or not its value can be deduced from the work done on the prefix occurrence of $v$, that is, at position $i-f$ on $x$. This saves enough work to get a linear-time algorithm.