CLR cover

Problem 58: Avoiding a Set of Words

For a finite set $F$ of words on a finite alphabet $A$, $F\subseteq A^*$, let $L(F)\subseteq A^*$ be the language of words that avoids $F$, that is, no word in $F$ appears in words of $L(F)$. The aim is to build an automaton accepting $L(F)$.

Note that if $w$ avoids $u$ it also avoids any word $u$ is a factor of. Thus we can consider that $F$ is an anti-factorial (factor code) language: no word in $F$ is a proper factor of another word in $F$. On the contrary, $F(L)$ is a factorial language: any factor of a word in $F(L)$ is in $F(L)$.

Show that $L(F)$ is recognised by a finite automaton and design an algorithm to build, from the trie of $F$, a deterministic automaton that accepts it.
Use the Aho-Corasick technique.

The set $F$ is said to be unavoidable if no infinite word avoids it (see Problem 57).

Show how to test if the set $F$ is unavoidable.

References

  • A. V. Aho and M. J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975.