Remarkable Words |
Besides Lyndon words, three sets of words have remarkable properties and are often used in examples. They are Thue-Morse words, Fibonacci words and de Bruijn words. The first two are prefixes of (one-way) infinite words. Formally an infinite word on the alphabet $A$ is a mapping from natural numbers to $A$. Their set is denoted by $A^\infty$.
The notion of (monoid) morphism is central to defining some infinite sets of words or an associate infinite word. A morphism from $A^*$ to itself (or another free monoid) is a mapping $h : A^* \mapsto A^*$ satisfying $h(uv)=h(u)h(v)$ for all words $u$ and $v$. Consequently, a morphism is entirely defined by the images $h(a)$ of letters $a\in A$.
The Thue-Morse word is produced by iterating the Thue-Morse morphism $\mu$ from $\{\sa{a}, \sa{b}\}^*$ to itself, defined by $$\cases{ \mu(\sa{a}) = \sa{ab}, &\cr \mu(\sa{b}) = \sa{ba}. &\cr } $$ Iterating the morphism from letter $\sa{a}$ gives the list of Thue-Morse words $\mu^k(\sa{a})$, $k\geq 0$, that starts with $$\begin{array}{lcl} \tau_0=\mu^0(\sa{a}) & = & \sa{a} \\ \tau_1=\mu^1(\sa{a}) & = & \sa{ab} \\ \tau_2=\mu^2(\sa{a}) & = & \sa{abba} \\ \tau_3=\mu^3(\sa{a}) & = & \sa{abbabaab} \\ \tau_4=\mu^4(\sa{a}) & = & \sa{abbabaabbaababba} \\ \tau_5=\mu^5(\sa{a}) & = & \sa{abbabaabbaababbabaababbaabbabaab} \end{array} $$ and eventually produces its infinite associate: $$\mathbf{t} = \lim_{k \rightarrow \infty}\mu^k(\sa{a}) = \sa{abbabaabbaababbabaababbaabbabaab}\cdots. $$
An equivalent definition of Thue-Morse words is provided by the following recurrence: $$\cases{ \tau_0 = \sa{a}, \cr \tau_{k+1} = \tau_k\overline{\tau_k}, & for $k \geq 0$, }$$ where the bar morphism is defined by $\overline{\sa{a}}=\sa{b}$ and $\overline{\sa{b}}=\sa{a}$. Note the length of the $k$th Thue-Morse word is $|\tau_k|=2^k$.
A direct definition of $\mathbf{t}$ is as follows: the letter $\mathbf{t}[n]$ is $\sa{b}$ if the number of occurrences of digit $\sa{1}$ in the binary representation of $n$ is odd, and is $\sa{a}$ otherwise.
The infinite Thue-Morse word is known to contain no overlap (factor of the form $auaua$ for a letter $a$ and a word $u$), that is, no factor of exponent larger than $2$. It is said to be overlap-free.
The Fibonacci word is similarly produced by iterating a morphism, the Fibonacci morphism $\phi$ from $\{\sa{a}, \sa{b}\}^*$ to itself, defined by $$\cases{ \phi(\sa{a}) = \sa{ab}, &\cr \phi(\sa{b}) = \sa{a}. & } $$ Iterating the morphism from letter $\sa{a}$ gives the list of Fibonacci words $\phi^k(\sa{a})$, $k\geq 0$, that starts with $$\begin{array}{lcl} \fib_0=\phi^0(\sa{a}) & = & \sa{a} \\ \fib_1=\phi^1(\sa{a}) & = & \sa{ab} \\ \fib_2=\phi^2(\sa{a}) & = & \sa{aba} \\ \fib_3=\phi^3(\sa{a}) & = & \sa{abaab} \\ \fib_4=\phi^4(\sa{a}) & = & \sa{abaababa} \\ \fib_5=\phi^5(\sa{a}) & = & \sa{abaababaabaab} \\ \fib_6=\phi^6(\sa{a}) & = & \sa{abaababaabaababaababa} \end{array} $$ and eventually its infinite associate: $$\mathbf{f} = \lim_{k \rightarrow \infty}\phi^k(\sa{a}) = \sa{abaababaabaababaababaabaababaabaab}\cdots.$$
An equivalent definition of Fibonacci words comes from the recurrence relation: $$\cases{ \fib_0 = \sa{a}, \cr \fib_1 = \sa{ab}, \cr \fib_{k+1} = \fib_{k}\fib_{k-1}, & for $k \geq 1$. }$$
The sequence of lengths of these words is the sequence of Fibonacci numbers, that is, $|\fib_k|=F_{k+2}$. Recall that Fibonacci numbers are defined by the recurrence $$\cases{ F_0 = 0, \cr F_1 = 1, \cr F_{k+1} = F_k+F_{k-1}, & for $k \geq 1$. }$$
Among many properties they satisfy are
The interest in Fibonacci words comes from the combinatorial properties they satisfy and the large number of repeats they contain. However, the infinite Fibonacci word contains no factor of exponent larger than $\Phi^2+1=3.61803\cdots$.
De Bruijn words are defined here on the alphabet $A=\{\sa a,\sa b\}$ and are parameterised by a positive integer $k$. A word $x\in A^+$ is a de Bruijn word of order $k$ if each word of $A^k$ occurs exactly once in $x$. As a first example, $\sa{ab}$ and $\sa{ba}$ are the only two de Bruijn words of order $1$. As a second example, the word $\sa{aaababbbaa}$ is a de Bruijn word of order $3$, since its eight factors of length 3 are the eight words of $A^3$, that is, $\sa{aaa}$, $\sa{aab}$, $\sa{aba}$, $\sa{abb}$, $\sa{baa}$, $\sa{bab}$, $\sa{bba}$ and $\sa{bbb}$.
The existence of a de Bruijn word of order $k\geq2$ can be verified with the help of the de Bruijn automaton defined by
It can also be verified that the number of de Bruijn words of order $k$ is exponential in $k$.
De Bruijn words can be defined on larger alphabets and are often used as examples of limit cases because they contain all the factors of a given length.