Problem 74: Computing Runs on General Alphabets

$ \newcommand{\offset}{\mathit{offset}}% \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} $

The goal of the problem is to design an algorithm for computing runs in a word without any extra assumption on the alphabet. To say it differently, the algorithm should use the equality model on letters, that is, use only $=$/$\neq$ letter comparisons when necessary.

Problem 87 deals with computing runs on linear-sortable alphabets, a necessary condition to obtain a linear-time algorithm.

A run in a word $x$ is a maximal periodicity or a maximal occurrence of a periodic factor of $x$. Formally, it is an interval of positions $[i\dd j]$ for which the (smallest) period $p$ of $x[i\dd j]$ satisfies $2p\leq j-i+1$, and both $x[i-1]\neq x[i+p-1]$ and $x[j+1]\neq x[j-p+1]$ when the inequalities make sense. The centre of run $[i\dd j]$ is the position $i+p$.

To avoid reporting the same run twice, they can be filtered out according to their centre. To do so, we say that a run is right centred in the product $uv$ of words if it starts at a position on $u$ and has its centre on $v$. And we say it is left centred in $uv$ if its centre is on $u$ and it ends in $v$.

Design a linear-time algorithm that finds right-centred runs occurring in a product $uv$ of two words.

Use prefix tables, see Problem 22.

Design an algorithm that computes all the runs in a word of length $n$ in time $O(n \log n)$ in the equality model.

Use a divide-and-conquer approach.

References

M. Crochemore, C. S. Iliopoulos, M. Kubica, J. Radoszewski, W. Rytter, K. Stencel, and T. Walen. New simple efficient algorithms computing powers and runs in strings. Discrete Applied Mathematics, 163:258-267, 2014.

M. G. Main and R. J. Lorentz. An $O(n \log n)$ algorithm for recognizing repetition. Report CS-79-056, Washington State University, Pullman, 1979.

125 Problems in Text Algorithms

Problem 74: Computing Runs on General Alphabets

References