CLR cover

Problem 103: A Compact Factor Automaton

\( \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \def\fib{\mathit{fib}} \def\gib{\mathit{g}} \)

A Factor automaton is a minimal (deterministic) automaton accepting all the factors of a word. It is also called a directed acyclic word graph (DAWG). All its states are terminal and its edges are labelled by single letters. For certain well-structured words the automaton can be highly compressed by removing nodes having exactly one parent and one child and labelling edges by factors of the word accordingly. The resulting DAWG is called a compact DAWG (CDAWG) or compact Suffix automaton (CSA) if nodes corresponding to suffixes are marked as terminal.

The problem considers words whose CDAWGs are extremely small, namely Fibonacci words $\fib_n$ and their shortened version $\gib_n$. The word $\gib_n$ is $\fib_n$ with the last two letters deleted, that is, $\gib_n = \fib_n\{\sa{a},\sa{b}\}^{-2}$.

Describe the structure of CDAWGs of Fibonacci words $\fib_n$ and of their trimmed versions $\gib_n$. Using this structure compute the number of distinct factors occurring in the words.

References

  • P. Baturo, M. Piatkowski, and W. Rytter. Usefulness of directed acyclic subword graphs in problems related to standard sturmian words. Int. J. Found. Comput. Sci., 20(6):1005-1023, 2009.
  • W. Rytter. The structure of subword graphs and suffix trees of Fibonacci words. Theor. Comput. Sci., 363(2):211-223, 2006.