# Problem 46: Computing Longest Common Prefixes

$$\def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\SA{\mathrm{SA}} \def\LCP{\mathrm{LCP}}$$

The Suffix array of a non-empty word $y$ is a light and efficient solution for text indexing. It consists in using a binary search procedure to locate patterns inside $y$. To do so the suffixes of $y$ are first sorted in lexicographic order, producing a table $\SA$ that lists the starting positions of the sorted suffixes.

But this standard technique is not sufficient to get a powerful search method. This is why the table $\SA$ is adjoined a second table $\LCP$ that gives the length of longest common prefixes between consecutive suffixes in the sorted list (some more values easy to deduce are also needed). Using both tables, searching $y$ for a word $x$ is then achieved in time $O(|x|+\log |y|)$ instead of a straightforward $O(|x|\log |y|)$ time without the table $\LCP$. Here is the Suffix array of $\sa{abaabababbabbb}$:
 $j$ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 $y[j]$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{b}$ $\sa{b}$ Rank $r$ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 $\SA[r]$ 2 0 3 5 7 10 13 1 4 6 9 12 8 11 $\LCP[r]$ 0 1 3 4 2 3 0 1 2 3 4 1 2 2 0
where $\LCP[r] = |\mbox{lcp}(y[\SA[r-1]\dd |y|-1],y[\SA[r]\dd |y|-1])|$.

Given the table $\SA$ for the word $y$, show that Algorithm Lcp computes the associated table $\LCP$ in linear time.

Lcp$(y \textrm{ non-empty word})$
    \begin{algorithmic}
\FOR{$r \leftarrow 0$ \TO $|y|-1$}
\STATE $\mathrm{Rank}[\mathrm{SA}[r]] \leftarrow r$
\ENDFOR
\STATE $\ell \leftarrow 0$
\FOR{$j \leftarrow 0$ \TO $|y|-1$}
\STATE $\ell \leftarrow \max\{0,\ell-1\}$
\IF{$\mathrm{Rank}[j] \gt 0$}
\WHILE{$\max\{j+\ell,\mathrm{SA}[\mathrm{Rank}[j]-1]+\ell\}\lt |y|$ \AND
$y[j+\ell]=y[\mathrm{SA}[\mathrm{Rank}[j]-1]+\ell]$}
\STATE $\ell \leftarrow \ell+1$
\ENDWHILE
\ELSE
\STATE $\ell \leftarrow 0$
\ENDIF
\STATE $\mathrm{LCP}[\mathrm{Rank}[j]] \leftarrow \ell$
\ENDFOR
\STATE $\mathrm{LCP}[|y|] \leftarrow 0$
\RETURN{$\mathrm{LCP}$}
\end{algorithmic}


Note the solution is counterintuitive, since it looks natural to compute the values $\LCP[r]$ sequentially, that is, by processing suffixes in the increasing order of their ranks. But this does not readily produce a linear-time algorithm. Instead, Algorithm Lcp processes the suffixes from the longest to the shortest, which is its key feature and is more efficient.

## References

• M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.
• T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In A. Amir and G. M. Landau, editors, Combinatorial Pattern Matching, 12th Annual Symposium, CPM 2001 Jerusalem, Israel, July 1-4, 2001 Proceedings, volume 2089 of Lecture Notes in Computer Science, pages 181-192. Springer, 2001.