Problem 22: Prefix Table

$ \def\sa#1{\tt{#1}} \def\tpref{\mathit{pref}} \def\dd{\dot{}\dot{}} $

The prefix table, like the border table of Problem 19, is a basic tool for building efficient algorithms on words. It is used mostly when searching texts for various types of given patterns.

Let $x$ be a non-empty string. The prefix table of $x$ is defined on its positions $i$, $i=0, \dots, |x|-1$, by: $\tpref[i]$ is the length of the longest prefix of $x$ starting at position $i$. Obviously $\tpref[0]=|x|$.

Here is the prefix table of the word $\sa{abaababaaba}$:

$i$ 0 1 2 3 4 5 6 7 8 9 10

$x[i]$ $\sa a$ $\sa b$ $\sa a$ $\sa a$ $\sa b$ $\sa a$ $\sa b$ $\sa a$ $\sa a$ $\sa b$ $\sa a$

$\tpref[i]$ 11 0 1 3 0 6 0 1 3 0 1

Show that Algorithm Prefixes computes in linear time the prefix table of its input word $x$ and executes no more than $2|x|-2$ letter comparisons.

Prefixes$(x \textrm{ non-empty word})$

    \begin{algorithmic}
    \STATE $pref[0]\leftarrow |x|$
    \STATE $g\leftarrow 0$
    \FOR{$i\leftarrow 1$ \TO $|x|-1$}
        \IF{$i\lt g \mbox{ and } pref[i-f]\ne g-i$}
            \STATE $pref[i]\leftarrow \min\{pref[i-f],g-i\}$
        \ELSE
            \STATE $(f,g)\leftarrow (i,\max\{g,i\})$
            \WHILE{$g\lt |x|$ and $x[g]=x[g-f]$}
                \STATE $g\leftarrow g+1$
            \ENDWHILE
            \STATE $pref[f]\leftarrow g-f$
        \ENDIF
    \ENDFOR
    \RETURN{$pref$}
    \end{algorithmic}

The key idea in Algorithm Prefixes that computes the table sequentially from left to right is to benefit from what has been computed before the current position.

When $v=x[f\dd g-1]$ is a prefix of $x$ and position $i$ is between $f$ and $g$ (see picture), the first step for computing $\tpref[i]$ is to check whether or not its value can be deduced from the work done on the prefix occurrence of $v$, that is, at position $i-f$ on $x$. This saves enough work to get a linear-time algorithm.

References

M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.

M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994. 412 pages.

D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.

G. Navarro and M. Raffinot. Flexible pattern matching in strings - practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002.

B. Smyth. Computing Patterns in Strings. Pearson Education Limited, Harlow, England, 2003. 423 pages.

125 Problems in Text Algorithms

Example

Prefix Table

Problem 22: Prefix Table

References

$i$	0	1	2	3	4	5	6	7	8	9	10
$x[i]$	$\sa a$	$\sa b$	$\sa a$	$\sa a$	$\sa b$	$\sa a$	$\sa b$	$\sa a$	$\sa a$	$\sa b$	$\sa a$
$\tpref[i]$	11	0	1	3	0	6	0	1	3	0	1