Example

$ \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\LCF{\mathit{LCF}} \def\endmarker{\sharp} $

Below is the common Suffix tree of $x=\sa{aabaa}$ and $y=\sa{babab}$. Grey (resp. white) doubly circled nodes are non-empty suffixes of $x$ (resp. $y$). The node $\sa{aba}$ gives $\LCF(\sa{aabaa},\sa{babab}) = |\sa{aba}| = 3$.

$i$	0	1	2	3	4
$x[i]$	$\sa{a}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{a}$

$j$	0	1	2	3	4
$y[j]$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{b}$

Problem 50: Longest Common Factor of Two Words

The problem deals with common factors of two words. It serves as a basis to compare texts and extends to applications such as bio-sequence alignment or plagiarism detection.

Let $\LCF(x,y)$ denote the maximal length of factors that appear in two given words $x$ and $y$ drawn from the alphabet $A$. A straightforward solution to compute it is to build the common Suffix tree of $x$ and $y$. Nodes are prefixes of their suffixes. A deepest node whose subtree contains both suffixes of $x$ and suffixes of $y$ gives the answer, its depth. This can also be done with the Suffix tree of $x\endmarker y$, where $\endmarker$ is a letter that does not appear in $x$ nor in $y$.

The time to compute the tree is $O(|xy|\log|A|)$, or $O(|xy|)$ on linearly sortable alphabets (see Problem 47), and the required space is $O(|xy|)$.

The goal of the problem is to reduce the size of the data structure to that of only one word, contrary to the above solution.

Design an algorithm to compute $\LCF(x,y)$ using the Suffix automaton (or the Suffix tree) of only one word. Analyse the time and space complexity of the whole computation.

Use the indexing structure as a search machine.

References

M. Crochemore. Longest common factor of two words. In H. Ehrig, R. A. Kowalski, G. Levi, and U. Montanari, editors, TAPSOFT'87: Proceedings of the International Joint Conference on Theory and Practice of Software Development, Pisa, Italy, March 23-27, 1987, Volume 1: Advanced Seminar on Foundations of Innovative Software Development I and Colloquium on Trees in Algebra and Programming (CAAP'87), volume 249 of Lecture Notes in Computer Science, pages 26-36. Springer, 1987.

M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.

A. Hartman and M. Rodeh. Optimal parsing of strings. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume 12 of NATO ASI Series F: Computer and System Sciences, pages 155-167, Berlin, 1985. Springer.

125 Problems in Text Algorithms

Example

Problem 50: Longest Common Factor of Two Words

References