Problem 21: Short borders |
The problem deals with a special type of border table of a word. It is adapted to search texts for Zimin patterns containing word variables (see Problem 43). It shows the notion of border table is powerful when tuned for searching online for various types of patterns.
A border of a non-empty word $x$ is any word that is both a proper prefix and a suffix of it. A border is said to be short if its length is smaller than $|x|/2$. The notations $\Bord(x)$ and $\SB(x)$ stand for the longest border of $x$ and for its longest short border, respectively. Any of these borders can be the empty word.
For example, borders of $x=\sa{abababab}$ are $\mv$, $\sa{ab}$, $\sa{abab}$ and $\sa{ababab}$. Only $\mv$ and $\sa{ab}$ are short, $\Bord(x)=\sa{ababab}$ and $\SB(x)=\sa{ab}$.
In some algorithms the notion of a short border is more appropriate when known for all the prefixes of a word. Let $\tSB$ denote the short-border table of prefixes of a non-empty word $x$. It is defined on prefix lengths $\ell$, $0\lt \ell\leq |x|$, by $$\tSB[\ell]=|\SB(x[0\dd \ell-1])|$$ (by convention $\tSB[0]=-1$). Below are the tables for the word $\sa{abaababaaba}$. They differ at positions $\ell=6,10,11$.
$i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
$x[i]$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | |
$\ell$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
$\tbord[\ell]$ | -1 | 0 | 0 | 1 | 1 | 2 | 3 | 2 | 3 | 4 | 5 | 6 |
$\tSB[\ell]$ | -1 | 0 | 0 | 1 | 1 | 2 | 1 | 2 | 3 | 4 | 2 | 3 |