CLR cover

Problem 20: Shortest covers

\( \def\sa#1{\tt{#1}} \def\pos{\mathit{pos}} \def\MaxGap{\mathit{MaxGap}} \def\tcover{\mathit{cover}} \def\dd{\dot{}\dot{}} \)

The notion of cover tries to capture the regularity of a word. It goes beyond the possible periodicity of the word by considering an a priori shorter factor that covers the whole word. Period words are specific covers that occur consecutively in the word while general covers may have overlapping occurrences. In that sense the notion of cover generalises the notion of period. More accurately, a cover $u$ of a word $x$ is a border of $x$ whose consecutive occurrence positions are at maximum distance $|u|$.

Example

The word $u=\sa{aba}$ is a cover of the word $x=\sa{abaababa}$. Indeed, the sorted list of starting positions of occurrences of $u$ in $x$ is $\pos(\sa{aba},\sa{abaababa}) = (0,3,5)$ and the maximum distance consecutive positions of the list is MaxGap$(\{0,3,5\})=3\leq |u|$. The word $u$ is the shortest cover of $x$.

The shortest cover of $\sa{abaababaaaba}$ is the whole word, which is always a trivial cover of itself. When this condition holds the word is said to be super-primitive. It is also primitive.

The cover table, denoted by $\tcover$, associated with a word $x$ is defined on length $\ell$ of its prefixes as follows: $\tcover[\ell]$ is the smallest length of covers of $x[0\dd \ell-1]$. Here is the cover table of the word $\sa{abababaaba}$.
$i$ 0123456789
$x[i]$ $\sa{a}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{a}$$\sa{a}$$\sa{b}$$\sa{a}$
$\ell$ 012345678910
$\tcover[\ell]$ 01232323893

Design a linear-time algorithm computing the shortest cover of each prefix of a word.

References

  • D. Breslauer. An on-line string superprimitivity test. Inf. Process. Lett., 44(6):345-347, 1992.