CLR cover
Previous page

Words

Next page

An alphabet is a non-empty set whose elements are called letters or symbols. We typically use alphabets $\mathbf{A}=\{\sa{a},\sa{b},\sa{c},\dots\}$, $\mathbf{B}=\{\sa{0},\sa{1}\}$ and natural numbers. A word (mot, in French) or string on an alphabet $A$ is a sequence of elements of $A$.

Rue Mot

The zero letter sequence is called the empty word and is denoted by $\mv$. The set of all finite words on an alphabet $A$ is denoted by $A^*$, and $A^+=A^*\setminus\{\mv\}$.

The length of a word $x$, length of the sequence, is denoted by $|x|$. We denote by $x[i]$, for $i=0,1,\dots,|x|-1$, the letter at position or index $i$ on a non-empty word $x$. Then $x=x[0] x[1] \cdots x[|x|-1]$ also denoted by $x[0\dd |x|-1]$. The set of letters that occur in the word $x$ is denoted by $\alph(x)$.

The product or concatenation of two words $x$ and $y$ is the word composed of the letters of $x$ followed by the letters of $y$. It is denoted by $xy$, or by $x\cdot y$ to emphasise the decomposition of the resulting word. The neutral element for the product is $\mv$ and we denote respectively by $zy^{-1}$ and $x^{-1}z$ the words $x$ and $y$ when $z=xy$.

A conjugate, rotation or cyclic shift of a word $x$ is any word $y$ that factorises into $vu$, where $uv=x$. This makes sense because the product of words is obviously non-commutative.

A word $x$ is a factor (sometimes called substring) of a word $y$ if $y=uxv$ for two words $u$ and $v$. When $u=\mv$, $x$ is a prefix of $y$, and when $v=\mv$, $x$ is a suffix of $y$. Sets $\Fact(x)$, $\Pref(x)$ and $\Suff(x)$ denote the sets of factors, prefixes and suffixes of $x$ respectively.

When $x$ is a non-empty factor of $y=y[0\dd n-1]$ it is of the form $y[i\dd i+|x|-1]$ for some $i$. An occurrence of $x$ in $y$ is an interval $[i\dd i+|x|-1]$ for which $x=y[i\dd i+|x|-1]$. We say that $i$ is the starting position (or left position) on $y$ of this occurrence, and that $i+|x|-1$ is its ending position (or right position). An occurrence of $x$ in $y$ can also be defined as a triple $(u,x,v)$ such that $y=uxv$. Then the starting position of the occurrence is $|u|$.

For words $x$ and $y$, $|y|_x$ denotes the number of occurrences of $x$ in $y$. Then, for instance, $|y|=\Sigma\{|y|_a \mid a\in\alph(y)\}$.

The word $x$ is a subsequence or subword of $y$ if the latter decomposes into $w_0 x[0] w_1 x[1]\dots x[|x|-1] w_{|x|}$ for words $w_0$, $w_1$, \dots, $w_{|x|}$.

A factor or a subsequence $x$ of a word $y$ is said to be proper if $x\ne y$.