Below is the common Suffix tree of $x=\sa{aabaa}$ and $y=\sa{babab}$. Grey (resp. white) doubly circled nodes are non-empty suffixes of $x$ (resp. $y$). The node $\sa{aba}$ gives $\LCF(\sa{aabaa},\sa{babab}) = |\sa{aba}| = 3$.
$i$ | 0 | 1 | 2 | 3 | 4 |
$x[i]$ | $\sa{a}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{a}$ |
$j$ | 0 | 1 | 2 | 3 | 4 |
$y[j]$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ |
Problem 50: Longest Common Factor of Two Words |
The problem deals with common factors of two words. It serves as a basis to compare texts and extends to applications such as bio-sequence alignment or plagiarism detection.
Let $\LCF(x,y)$ denote the maximal length of factors that appear in two given words $x$ and $y$ drawn from the alphabet $A$. A straightforward solution to compute it is to build the common Suffix tree of $x$ and $y$. Nodes are prefixes of their suffixes. A deepest node whose subtree contains both suffixes of $x$ and suffixes of $y$ gives the answer, its depth. This can also be done with the Suffix tree of $x\endmarker y$, where $\endmarker$ is a letter that does not appear in $x$ nor in $y$.
The time to compute the tree is $O(|xy|\log|A|)$, or $O(|xy|)$ on linearly sortable alphabets (see Problem 47), and the required space is $O(|xy|)$.
The goal of the problem is to reduce the size of the data structure to that of only one word, contrary to the above solution.