CLR cover

Problem 22: Prefix Table

\( \def\sa#1{\tt{#1}} \def\tpref{\mathit{pref}} \def\dd{\dot{}\dot{}} \)

The prefix table, like the border table of Problem 19, is a basic tool for building efficient algorithms on words. It is used mostly when searching texts for various types of given patterns.

Let $x$ be a non-empty string. The prefix table of $x$ is defined on its positions $i$, $i=0, \dots, |x|-1$, by: $\tpref[i]$ is the length of the longest prefix of $x$ starting at position $i$. Obviously $\tpref[0]=|x|$.

Here is the prefix table of the word $\sa{abaababaaba}$:
$i$ 012345678910
$x[i]$ $\sa a$$\sa b$$\sa a$$\sa a$$\sa b$$\sa a$$\sa b$$\sa a$$\sa a$$\sa b$$\sa a$
$\tpref[i]$ 110130601301

Show that Algorithm Prefixes computes in linear time the prefix table of its input word $x$ and executes no more than $2|x|-2$ letter comparisons.

Prefixes$(x \textrm{ non-empty word})$
    \begin{algorithmic}
    \STATE $pref[0]\leftarrow |x|$
    \STATE $g\leftarrow 0$
    \FOR{$i\leftarrow 1$ \TO $|x|-1$}
        \IF{$i\lt g \mbox{ and } pref[i-f]\ne g-i$}
            \STATE $pref[i]\leftarrow \min\{pref[i-f],g-i\}$
        \ELSE
            \STATE $(f,g)\leftarrow (i,\max\{g,i\})$
            \WHILE{$g\lt |x|$ and $x[g]=x[g-f]$}
                \STATE $g\leftarrow g+1$
            \ENDWHILE
            \STATE $pref[f]\leftarrow g-f$
        \ENDIF
    \ENDFOR
    \RETURN{$pref$}
    \end{algorithmic}

The key idea in Algorithm Prefixes that computes the table sequentially from left to right is to benefit from what has been computed before the current position.

Prefix Table

When $v=x[f\dd g-1]$ is a prefix of $x$ and position $i$ is between $f$ and $g$ (see picture), the first step for computing $\tpref[i]$ is to check whether or not its value can be deduced from the work done on the prefix occurrence of $v$, that is, at position $i-f$ on $x$. This saves enough work to get a linear-time algorithm.

References

  • M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.
  • M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994. 412 pages.
  • D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.
  • G. Navarro and M. Raffinot. Flexible pattern matching in strings - practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002.
  • B. Smyth. Computing Patterns in Strings. Pearson Education Limited, Harlow, England, 2003. 423 pages.