The Factor oracle is an indexing structure similar to the Factor or Suffix automaton (or DAWG) of a word \(x\). It is a deterministic automaton with \(|x|+1\) states, the minimum number of states the Suffix automaton of \(x\) can have. This makes it a well-suited data structure in many applications that require a simple indexing structure and leads both to a space-economical data structure and to an efficient online construction. The drawback is that the oracle of \(x\) accepts slightly more words than the factors of \(x\).

For a factor \(v\) of \(y\), let \(pocc(v,y)\) be the position on \(y\) following the first occurrence of \(v\) in \(y\), that is, \(pocc(v,y) = \min \{|z| \mid z = wv \mbox{ prefix of } y\}\). The following algorithm may be viewed as a definition of the Factor oracle \({\cal O}(x)\) of a word \(x\). It computes the automaton in which \(Q\) is the set of states and \(E\) the set of labelled edges.

Oracle\((x \mbox{ non-empty word})\)

\begin{algorithmic} \STATE $(Q,E)\leftarrow (\{0,1,\dots,|x|\},\emptyset)$ \FOR{$i \leftarrow 0$ \TO $|x| - 1$} \STATE $u \leftarrow $ shortest word recognised in state $i$ \ENDFOR \FOR{$a\in A$} \IF{ $ua \in Fact(x[i-|u|..|x|-1])$} \STATE $E \leftarrow E \cup\{(i,a,pocc(ua,x[i-|u|..|x|-1]))\}$ \ENDIF \ENDFOR \RETURN $(Q,E)$ \end{algorithmic}

Actually the structure has several interesting properties. Its \(|x|+1\) states are all terminal states. Every edge whose target is \(i+1\) is labelled by \(x[i]\). There are \(|x|\) edges of the form \((i,x[i],i+1)\), called internal edges. Other edges, of the form \((j,x[i],i+1)\) with \(j\le i\), are called external edges. The oracle can thus be represented by \(x\) and its set of external edges without their labels.

Show that the Factor oracle of a word $x$ has between $|x|$ and $2|x|-1$
edges.

Design an online construction of the Factor oracle of a word $x$ running in
linear time on a fixed alphabet with linear space.

Use suffix links.

Show the Factor oracle ${\cal O}(x)$ can be used for locating all the
occurrences of $x$ in a text, despite the oracle may accept words that are
not factors of $x$.

The only word of length $|x|$ recognised by ${\cal O}(x)$ is $x$ itself.