Example

$ \def\sa#1{\tt{#1}} \def\alph{\mathit{alph}} \def\mv{\varepsilon} \def\Fact{\mathit{Fact}} \def\Pref{\mathit{Pref}} \def\Suff{\mathit{Suff}} \def\dd{\dot{}\dot{}} $

For $x=\sa{abaaab}$ we have $|x|=6$ and $\alph(x)=\{\sa{a},\sa{b}\}$.

The set of conjugates of $\sa{abba}$, its conjugacy class because conjugacy is an equivalence relation, is $\{\sa{aabb},\sa{abba},\sa{baab},\sa{bbaa}\}$ and that of $\sa{abab}$ is $\{\sa{abab},\sa{baba}\}$.

The starting and ending positions of $x=\sa{aba}$ on $y=\sa{babaababa}$ are:

$i$ 0 1 2 3 4 5 6 7 8

$y[i]$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{a}$ $\sa{b}$ $\sa{a}$ $\sa{b}$ $\sa{a}$

starting positions 1 4 6

ending positions 3 6 8

Words

An alphabet is a non-empty set whose elements are called letters or symbols. We typically use alphabets $\mathbf{A}=\{\sa{a},\sa{b},\sa{c},\dots\}$, $\mathbf{B}=\{\sa{0},\sa{1}\}$ and natural numbers. A word (mot, in French) or string on an alphabet $A$ is a sequence of elements of $A$.

The zero letter sequence is called the empty word and is denoted by $\mv$. The set of all finite words on an alphabet $A$ is denoted by $A^*$, and $A^+=A^*\setminus\{\mv\}$.

The length of a word $x$, length of the sequence, is denoted by $|x|$. We denote by $x[i]$, for $i=0,1,\dots,|x|-1$, the letter at position or index $i$ on a non-empty word $x$. Then $x=x[0] x[1] \cdots x[|x|-1]$ also denoted by $x[0\dd |x|-1]$. The set of letters that occur in the word $x$ is denoted by $\alph(x)$.

The product or concatenation of two words $x$ and $y$ is the word composed of the letters of $x$ followed by the letters of $y$. It is denoted by $xy$, or by $x\cdot y$ to emphasise the decomposition of the resulting word. The neutral element for the product is $\mv$ and we denote respectively by $zy^{-1}$ and $x^{-1}z$ the words $x$ and $y$ when $z=xy$.

A conjugate, rotation or cyclic shift of a word $x$ is any word $y$ that factorises into $vu$, where $uv=x$. This makes sense because the product of words is obviously non-commutative.

A word $x$ is a factor (sometimes called substring) of a word $y$ if $y=uxv$ for two words $u$ and $v$. When $u=\mv$, $x$ is a prefix of $y$, and when $v=\mv$, $x$ is a suffix of $y$. Sets $\Fact(x)$, $\Pref(x)$ and $\Suff(x)$ denote the sets of factors, prefixes and suffixes of $x$ respectively.

When $x$ is a non-empty factor of $y=y[0\dd n-1]$ it is of the form $y[i\dd i+|x|-1]$ for some $i$. An occurrence of $x$ in $y$ is an interval $[i\dd i+|x|-1]$ for which $x=y[i\dd i+|x|-1]$. We say that $i$ is the starting position (or left position) on $y$ of this occurrence, and that $i+|x|-1$ is its ending position (or right position). An occurrence of $x$ in $y$ can also be defined as a triple $(u,x,v)$ such that $y=uxv$. Then the starting position of the occurrence is $|u|$.

For words $x$ and $y$, $|y|_x$ denotes the number of occurrences of $x$ in $y$. Then, for instance, $|y|=\Sigma\{|y|_a \mid a\in\alph(y)\}$.

The word $x$ is a subsequence or subword of $y$ if the latter decomposes into $w_0 x[0] w_1 x[1]\dots x[|x|-1] w_{|x|}$ for words $w_0$, $w_1$, \dots, $w_{|x|}$.

A factor or a subsequence $x$ of a word $y$ is said to be proper if $x\ne y$.

$i$	0	1	2	3	4	5	6	7	8
$y[i]$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{a}$	$\sa{b}$	$\sa{a}$	$\sa{b}$	$\sa{a}$
starting positions		1			4		6
ending positions				3			6		8

125 Problems in Text Algorithms

Example

Words