Example
\(
\def\sa#1{\tt{#1}}
\def\dd{\dot{}\dot{}}
\def\LZ{\mathrm{LZ}}
\def\LPF{\mathrm{LPF}}
\)
The factorisation of $\sa{abaabababbabbb}$ is
$\sa{a}\cdot\sa{b}\cdot\sa{a}\cdot\sa{aba}\cdot\sa{bab}\cdot\sa{babb}\cdot\sa{b}$,
which gives the array $\LZ=[0,1,2,3,6,9,13,14]$.
Below is the $\LPF$ array of $\sa{abaabababbabbb}$.
$i$ | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
$w[i]$ | | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{b}$ | $\sa{b}$ |
$\LPF[i]$ | | 0 | 0 | 1 | 3 | 2 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 2 | 1 |
|
Problem 97: Lempel-Ziv Factorisation
|
|
The problem deals with a Lempel-Ziv factorisation of words.
The factorisation considered here is the decomposition of a word $w$ into the
product $w_0w_1\cdots w_k$ where each $w_i$ is the longest prefix of
$w_iw_{i+1}\cdots w_k$ that occurs in $w$ before the present position
$|w_0w_1\cdots w_{i-1}|$.
If there is no previous occurrence, $w_i$ is the
first letter of $w_iw_{i+1}\cdots w_k$.
The factorisation is stored in the array $\LZ$:
$\LZ[0] = 0$ and, for $1\le i \le k$, $\LZ[i] = |w_0w_1\cdots w_{i-1}|$.
Show how to compute the array $\LZ$ of a word in linear time assuming a
fixed-size alphabet.
The same running time can be achieved when the alphabet is linearly sortable,
which is a weaker condition than the above one.
This is done from the longest
previous array ($\LPF$) array of the word that computes in linear time under
this condition (see Problem 53).
The $\LPF$ array of a word $w$ is defined, for each position $i$ on $w$, by:
$\LPF[i]$ is the length of the longest factor of $w$ that starts both at
position $i$ and at a smaller position.
Design an algorithm that computes in linear time the Lempel-Ziv
factorisation of a word given its $\LPF$ array.
References
H. Bannai, T. I, S. Inenaga, Y. Nakashima, M. Takeda, and K. Tsuruta.
The "runs" theorem. SIAM J. Comput., 46(5):1501-1514, 2017.
J. Berstel and A. Savelli. Crochemore factorization of sturmian and other
infinite words. In Mathematical Foundations of Computer Science 2006,
31st International Symposium, MFCS 2006, Stará Lesná, Slovakia, August
28-September 1, 2006, Proceedings, volume 4162 of Lecture Notes
in Computer Science, pages 157-166. Springer, 2006.
M. Crochemore. Transducers and repetitions. Theor. Comput. Sci.,
45(1):63-86, 1986.
M. Crochemore and L. Ilie. Computing longest previous factors in linear
time and applications. Inf. Process. Lett., 106(2):75-80, 2008.
M. Crochemore, G. M. Landau, and M. Ziv-Ukelson. A subquadratic
sequence alignment algorithm for unrestricted scoring matrices. SIAM
J. Comput., 32(6):1654-1673, 2003.
R. M. Kolpakov and G. Kucherov. Finding maximal repetitions in a word
in linear time. In 40th Annual Symposium on Foundations of Computer
Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages
596-604. IEEE Computer Society, 1999.
R. M. Kolpakov and G. Kucherov. Finding approximate repetitions
under Hamming distance. In F. Meyer auf der Heide, editor, Algorithms -
ESA 2001, 9th Annual European Symposium, Aarhus, Denmark, August
28-31, 2001, Proceedings, volume 2161 of Lecture Notes in Computer
Science, pages 170-181. Springer, 2001.
J. Ziv and A. Lempel. A universal algorithm for sequential data compression.
IEEE Trans. Information Theory, 23(3):337-343, 1977.