Problem 46: Computing Longest Common Prefixes

$ \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\SA{\mathrm{SA}} \def\LCP{\mathrm{LCP}} $

The Suffix array of a non-empty word $y$ is a light and efficient solution for text indexing. It consists in using a binary search procedure to locate patterns inside $y$. To do so the suffixes of $y$ are first sorted in lexicographic order, producing a table $\SA$ that lists the starting positions of the sorted suffixes.

But this standard technique is not sufficient to get a powerful search method. This is why the table $\SA$ is adjoined a second table $\LCP$ that gives the length of longest common prefixes between consecutive suffixes in the sorted list (some more values easy to deduce are also needed). Using both tables, searching $y$ for a word $x$ is then achieved in time $O(|x|+\log |y|)$ instead of a straightforward $O(|x|\log |y|)$ time without the table $\LCP$. Here is the Suffix array of $\sa{abaabababbabbb}$:

$j$ 0 1 2 3 4 5 6 7 8 9 10 11 12 13

$y[j]$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{b}$ $\sa{b}$

Rank $r$ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

$\SA[r]$ 2 0 3 5 7 10 13 1 4 6 9 12 8 11

$\LCP[r]$ 0 1 3 4 2 3 0 1 2 3 4 1 2 2 0

where $\LCP[r] = |\mbox{lcp}(y[\SA[r-1]\dd |y|-1],y[\SA[r]\dd |y|-1])|$.

Given the table $\SA$ for the word $y$, show that Algorithm Lcp computes the associated table $\LCP$ in linear time.

Lcp$(y \textrm{ non-empty word})$

    \begin{algorithmic}
  \FOR{$r \leftarrow 0$ \TO $|y|-1$}
    \STATE $\mathrm{Rank}[\mathrm{SA}[r]] \leftarrow r$
  \ENDFOR 
  \STATE $\ell \leftarrow 0$
  \FOR{$j \leftarrow 0$ \TO $|y|-1$}
    \STATE $\ell \leftarrow \max\{0,\ell-1\}$
    \IF{$\mathrm{Rank}[j] \gt 0$}
      \WHILE{$\max\{j+\ell,\mathrm{SA}[\mathrm{Rank}[j]-1]+\ell\}\lt |y|$ \AND
        $y[j+\ell]=y[\mathrm{SA}[\mathrm{Rank}[j]-1]+\ell]$}
        \STATE $\ell \leftarrow \ell+1$
      \ENDWHILE 
    \ELSE
      \STATE $\ell \leftarrow 0$
    \ENDIF 
    \STATE $\mathrm{LCP}[\mathrm{Rank}[j]] \leftarrow \ell$
  \ENDFOR 
  \STATE $\mathrm{LCP}[|y|] \leftarrow 0$
\RETURN{$\mathrm{LCP}$}
    \end{algorithmic}

Note the solution is counterintuitive, since it looks natural to compute the values $\LCP[r]$ sequentially, that is, by processing suffixes in the increasing order of their ranks. But this does not readily produce a linear-time algorithm. Instead, Algorithm Lcp processes the suffixes from the longest to the shortest, which is its key feature and is more efficient.

References

M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.

T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In A. Amir and G. M. Landau, editors, Combinatorial Pattern Matching, 12th Annual Symposium, CPM 2001 Jerusalem, Israel, July 1-4, 2001 Proceedings, volume 2089 of Lecture Notes in Computer Science, pages 181-192. Springer, 2001.

125 Problems in Text Algorithms

Example

Computing longest common prefixes

Problem 46: Computing Longest Common Prefixes

References

$j$	0	1	2	3	4	5	6	7	8	9	10	11	12	13
$y[j]$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{b}$	$\sa{b}$
Rank $r$	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14
$\SA[r]$	2	0	3	5	7	10	13	1	4	6	9	12	8	11
$\LCP[r]$	0	1	3	4	2	3	0	1	2	3	4	1	2	2	0