CLR cover

Problem 74: Computing Runs on General Alphabets

\( \newcommand{\offset}{\mathit{offset}}% \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \)

The goal of the problem is to design an algorithm for computing runs in a word without any extra assumption on the alphabet. To say it differently, the algorithm should use the equality model on letters, that is, use only $=$/$\neq$ letter comparisons when necessary.

Problem 87 deals with computing runs on linear-sortable alphabets, a necessary condition to obtain a linear-time algorithm.

A run in a word $x$ is a maximal periodicity or a maximal occurrence of a periodic factor of $x$. Formally, it is an interval of positions $[i\dd j]$ for which the (smallest) period $p$ of $x[i\dd j]$ satisfies $2p\leq j-i+1$, and both $x[i-1]\neq x[i+p-1]$ and $x[j+1]\neq x[j-p+1]$ when the inequalities make sense. The centre of run $[i\dd j]$ is the position $i+p$.

To avoid reporting the same run twice, they can be filtered out according to their centre. To do so, we say that a run is right centred in the product $uv$ of words if it starts at a position on $u$ and has its centre on $v$. And we say it is left centred in $uv$ if its centre is on $u$ and it ends in $v$.

Design a linear-time algorithm that finds right-centred runs occurring in a product $uv$ of two words.
Use prefix tables, see Problem 22.

Design an algorithm that computes all the runs in a word of length $n$ in time $O(n \log n)$ in the equality model.
Use a divide-and-conquer approach.

References

  • M. Crochemore, C. S. Iliopoulos, M. Kubica, J. Radoszewski, W. Rytter, K. Stencel, and T. Walen. New simple efficient algorithms computing powers and runs in strings. Discrete Applied Mathematics, 163:258-267, 2014.
  • M. G. Main and R. J. Lorentz. An $O(n \log n)$ algorithm for recognizing repetition. Report CS-79-056, Washington State University, Pullman, 1979.