CLR cover

Problem 87: Computing Runs on Ordered Alphabets

\( \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\Lyn{\mathit{Lyn}} \)

The aim of the problem is to show that all runs (maximal periodicities) in a word drawn from a linearly sortable alphabet can be computed in linear time.

The solution is based on the results of Problem 86, where it is shown that a run possesses a special position from which starts a longest Lyndon factor of the whole word. Tracking the longest Lyndon factors has to be done according to the alphabet ordering $\lt$ but also to its inverse $\lt^{-1}$. When a longest Lyndon factor is located, simple extensions from two positions to the right and to the left (like in Problem 74) can confirm the starting position of the Lyndon factor is a special position of a run whose period is the length of the Lyndon factor.

To do so we first define the Lyndon table $\Lyn$ of a (non-empty) word $x$. For a position $i$ on $x$, $i=0,\dots,|x|-1$, $\Lyn[i]$ is the length of the longest Lyndon factor starting at $i$: \[\Lyn[i] = \max\{\ell \mid x[i\dd i+\ell-1] \mbox{ is a Lyndon word}\}.\]

$i$0123456789101112131415
$x[i]$$\sa{a}$$\sa{b}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{a}$$\sa{a}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{b}$$\sa{a}$$\sa{b}$$\sa{a}$
$\Lyn[i]$3112121851311211

Show that Algorithm LongestLyndon correctly computes the table $\Lyn$.

LongestLyndon$(x \textrm{ non-empty word of length } n)$
    \begin{algorithmic}
  \FOR{$i\leftarrow n-1$ \DOWNTO $0$}
    \STATE $(Lyn[i],j)\leftarrow (1,i+1)$
    \WHILE{$j\lt n \mbox{ and } x[i..j-1] \lt x[j..j+Lyn[j]-1]$}
      \STATE $(Lyn[i],j)\leftarrow (Lyn[i]+Lyn[j],j+Lyn[j])$
    \ENDWHILE
  \ENDFOR
  \RETURN{Lyn}
    \end{algorithmic}

Extend Algorithm LongestLyndon to compute all runs occurring in a word.
Use the longest common extensions like in Problem 74.

What is the total running time of the algorithm if a comparison of two factors is done with the help of the ranks of their associated suffixes in the alphabetic order and if the longest common extension techniques are used?

References

  • M. Crochemore, C. S. Iliopoulos, T. Kociumaka, R. Kundu, S. P. Pissis, J. Radoszewski, W. Rytter, and T. Walen. Near-optimal computation of runs over general alphabet via non-crossing LCE queries. In S. Inenaga, K. Sadakane, and T. Sakai, editors, String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18-20, 2016, Proceedings, volume 9954 of Lecture Notes in Computer Science, pages 22-34, 2016.
  • J. Fischer and V. Heun. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In M. Lewenstein and G. Valiente, editors, Combinatorial Pattern Matching, 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings, volume 4009 of Lecture Notes in Computer Science, pages 36-48. Springer, 2006.
  • F. Franek, A. S. M. S. Islam, M. S. Rahman, and W. F. Smyth. Algorithms to compute the Lyndon array. CoRR, abs/1605.08935, 2016.
  • P. Gawrychowski, T. Kociumaka, W. Rytter, and T. Walen. Faster longest common extension queries in strings over general alphabets. In R. Grossi and M. Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 5:1-5:13. Schloss Dagstuhl - Leibniz- Zentrum fuer Informatik, 2016.
  • C. Hohlweg and C. Reutenauer. Lyndon words, permutations and trees. Theor. Comput. Sci., 307(1):173-178, 2003.
  • M. Lothaire. Combinatorics on Words. Addison-Wesley, 1983. Reprinted in 1997.
  • S. Mantaci, A. Restivo, G. Rosone, and M. Sciortino. Suffix array and Lyndon factorization of a text. J. Discrete Algorithms, 28:2-8, 2014.