|
Problem 87: Computing Runs on Ordered Alphabets
|
|
\(
\def\sa#1{\tt{#1}}
\def\dd{\dot{}\dot{}}
\def\Lyn{\mathit{Lyn}}
\)
The aim of the problem is to show that all runs (maximal
periodicities) in a word drawn from a linearly sortable alphabet can be
computed in linear time.
The solution is based on the results of Problem 86,
where it is shown that a run possesses a special position from which starts
a longest Lyndon factor of the whole word.
Tracking the longest Lyndon factors has to be done
according to the alphabet ordering $\lt$ but also to its inverse $\lt^{-1}$.
When a longest Lyndon factor is located, simple extensions from two positions to
the right and to the left (like in Problem 74)
can confirm the starting position of the Lyndon factor is a special position
of a run whose period is the length of the Lyndon factor.
To do so we first define the Lyndon table $\Lyn$ of a
(non-empty) word $x$.
For a position $i$ on $x$, $i=0,\dots,|x|-1$, $\Lyn[i]$
is the length of the longest Lyndon factor starting at $i$:
\[\Lyn[i] = \max\{\ell \mid x[i\dd i+\ell-1] \mbox{ is a Lyndon word}\}.\]
$i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
$x[i]$ | $\sa{a}$ | $\sa{b}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{b}$ | $\sa{a}$ | $\sa{b}$ | $\sa{a}$ |
$\Lyn[i]$ | 3 | 1 | 1 | 2 | 1 | 2 | 1 | 8 | 5 | 1 | 3 | 1 | 1 | 2 | 1 | 1 |
Show that Algorithm LongestLyndon correctly computes the table $\Lyn$.
LongestLyndon$(x \textrm{ non-empty word of length } n)$
\begin{algorithmic}
\FOR{$i\leftarrow n-1$ \DOWNTO $0$}
\STATE $(Lyn[i],j)\leftarrow (1,i+1)$
\WHILE{$j\lt n \mbox{ and } x[i..j-1] \lt x[j..j+Lyn[j]-1]$}
\STATE $(Lyn[i],j)\leftarrow (Lyn[i]+Lyn[j],j+Lyn[j])$
\ENDWHILE
\ENDFOR
\RETURN{Lyn}
\end{algorithmic}
Extend Algorithm LongestLyndon to compute all runs occurring in a word.
Use the longest common extensions like in
Problem 74.
What is the total running time of the algorithm if a comparison of two
factors is done with the help of the ranks of their associated suffixes in
the alphabetic order and if the longest common extension techniques are used?
References
M. Crochemore, C. S. Iliopoulos, T. Kociumaka, R. Kundu, S. P. Pissis,
J. Radoszewski, W. Rytter, and T. Walen. Near-optimal computation of
runs over general alphabet via non-crossing LCE queries. In S. Inenaga,
K. Sadakane, and T. Sakai, editors, String Processing and Information
Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan,
October 18-20, 2016, Proceedings, volume 9954 of Lecture Notes in Computer
Science, pages 22-34, 2016.
J. Fischer and V. Heun. Theoretical and practical improvements on
the RMQ-problem, with applications to LCA and LCE. In M. Lewenstein
and G. Valiente, editors, Combinatorial Pattern Matching, 17th
Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings,
volume 4009 of Lecture Notes in Computer Science, pages 36-48. Springer, 2006.
F. Franek, A. S. M. S. Islam, M. S. Rahman, and W. F. Smyth. Algorithms
to compute the Lyndon array. CoRR, abs/1605.08935, 2016.
P. Gawrychowski, T. Kociumaka, W. Rytter, and T. Walen. Faster
longest common extension queries in strings over general alphabets. In
R. Grossi and M. Lewenstein, editors, 27th Annual Symposium on Combinatorial
Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv,
Israel, volume 54 of LIPIcs, pages 5:1-5:13. Schloss Dagstuhl - Leibniz-
Zentrum fuer Informatik, 2016.
C. Hohlweg and C. Reutenauer. Lyndon words, permutations and trees.
Theor. Comput. Sci., 307(1):173-178, 2003.
M. Lothaire. Combinatorics on Words. Addison-Wesley, 1983. Reprinted
in 1997.
S. Mantaci, A. Restivo, G. Rosone, and M. Sciortino. Suffix array and
Lyndon factorization of a text. J. Discrete Algorithms, 28:2-8, 2014.