CLR cover

Problem 112: Factoring with Border-Free Prefixes

\( \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\Bord{\mathit{Border}} \def\per{\mathit{per}} \)

Searching for a border-free pattern in texts is done very efficiently without any sophisticated solution by BM algorithm (see Problem 33) because two occurrences of the pattern cannot overlap. When a pattern is not border free, its factorisation into border-free words may lead to efficient searching methods.

We say that a non-empty word $u$ is border free if none of its proper non-empty prefix is also a suffix, that is, $\Bord(u) = \varepsilon$, or equivalently, if its smallest period is its length, that is, $\per(u) = |u|$.

The aim of the problem is to show how a word factorises into its border-free prefixes.

Consider, for example, the word $\sa{aababaaabaababaa}$. Its set of border-free prefixes is $\{\sa{a},\sa{aab},\sa{aabab}\}$ and its factorisation on the set is

Factoring with Border-Free Prefixes

Show that a non-empty word $x$ uniquely factorises into $x_k x_{k-1} \cdots x_1$, where each $x_i$ is a border-free prefix of $x$ and $x_1$ is the shortest border-free prefix of $x$.

The factorisation of $x$ can be represented by the list of its factor lengths. On the preceding example it is $(5,1,3,5,1,1)$ and the list of factors is $x[0\dd 4]$, $x[5\dd 5]$, $x[6\dd 8]$, $x[9\dd 13]$, $x[14\dd 14]$, $x[15\dd 15]$.

Design a linear-time algorithm for computing the border-free prefix factorisation of a word, namely the list of factor lengths.