CLR cover

Problem 24: Periodicity Test

\( \def\sa#1{\tt{#1}} \def\per{\mathit{per}} \def\ms{\mathit{ms}} \def\dd{\dot{}\dot{}} \)

The detection of periodicities in words is an essential question to cope with when designing text searching or text compression methods. The goal is to test the periodicity of a word in a space-economical way.

A word $x$ is said to be periodic if its (smallest) period is no more than half its length, $\per(x) \leq |x|/2$. Equivalently, $x$ is periodic if it is of the form $u^kv$ for two words $u$ and $v$ and an integer $k$, $u$ non-empty, $v$ a proper prefix of $u$ and $k\geq 2$.

Checking the property is straightforward from the border or prefix tables of $x$. The goal here is to do it still in linear time but with only constant extra space (in addition to input). The idea is to use the function MaxSuffixPos of Problem 40 that gives the position and period of the maximal suffix of $x$.

The examples below illustrate variations of the period of a word according to its maximal suffix. The maximal suffix starts at position $\ms$ and is in the form $u^kv$ (here $k=1$), where $u=x[\ms\dd\ms+p-1]$, $p$ is its period and $v$ is a proper prefix of $u$.

First, $x=\sa{ababbaababbaab}$. The prefix $\sa{aba}$ of $x$ preceding the maximal suffix $uv$ is a suffix of $u$. Then $x$ is periodic with period $6$.

Periodicity Test

Second, $x=\sa{ababbaaabbaababbaa}$. The prefix of $x$ preceding $uv$ is longer than $u$. The period of $x$ is $11\gt |uv|$ and $x$ is not periodic.

Periodicity Test

Third, $x=\sa{baabbaababbaab}$. The prefix of $x$ preceding $uv$ is shorter than $u$ but not a suffix of it. The period of $x$ is $10\gt |x|-|v|$ and $x$ is not periodic.

Periodicity Test

Show that the periodicity of a word $x$ can be tested with less than $|x|/2$ letter comparisons if the starting position and the period of its maximal suffix are given.

References

  • M. Crochemore. String-matching on ordered alphabets. Theor. Comput. Sci., 92(1):33-47, 1992.
  • M. Crochemore and W. Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13(5):405-425, 1995.
  • Z. Galil and J. I. Seiferas. Time-space-optimal string matching. J. Comput. Syst. Sci., 26(3):280-294, 1983.