CLR cover

Problem 50: Longest Common Factor of Two Words

The problem deals with common factors of two words. It serves as a basis to compare texts and extends to applications such as bio-sequence alignment or plagiarism detection.

Let $\LCF(x,y)$ denote the maximal length of factors that appear in two given words $x$ and $y$ drawn from the alphabet $A$. A straightforward solution to compute it is to build the common Suffix tree of $x$ and $y$. Nodes are prefixes of their suffixes. A deepest node whose subtree contains both suffixes of $x$ and suffixes of $y$ gives the answer, its depth. This can also be done with the Suffix tree of $x\endmarker y$, where $\endmarker$ is a letter that does not appear in $x$ nor in $y$.

The time to compute the tree is $O(|xy|\log|A|)$, or $O(|xy|)$ on linearly sortable alphabets (see Problem 47), and the required space is $O(|xy|)$.

The goal of the problem is to reduce the size of the data structure to that of only one word, contrary to the above solution.

Design an algorithm to compute $\LCF(x,y)$ using the Suffix automaton (or the Suffix tree) of only one word. Analyse the time and space complexity of the whole computation.
Use the indexing structure as a search machine.

References

  • M. Crochemore. Longest common factor of two words. In H. Ehrig, R. A. Kowalski, G. Levi, and U. Montanari, editors, TAPSOFT'87: Proceedings of the International Joint Conference on Theory and Practice of Software Development, Pisa, Italy, March 23-27, 1987, Volume 1: Advanced Seminar on Foundations of Innovative Software Development I and Colloquium on Trees in Algebra and Programming (CAAP'87), volume 249 of Lecture Notes in Computer Science, pages 26-36. Springer, 1987.
  • M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.
  • A. Hartman and M. Rodeh. Optimal parsing of strings. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume 12 of NATO ASI Series F: Computer and System Sciences, pages 155-167, Berlin, 1985. Springer.