CLR cover

Problem 26: Delay of Sequential String Matching

\( \def\sa#1{\tt{#1}} \)

Algorithm KMP encapsulates a key feature (the use of a border table or the like) for the design of string-matching algorithms processing the text sequentially. Its concept is used for various types of patterns after appropriate preprocessing.

Algorithm KMP searches a text for occurrences of a pattern $x$. After preprocessing $x$ it treats the text in an online manner and outputs found occurrences. It runs in linear time executing no more letter comparisons than twice the text length. In the version below, it just outputs "1" each time an occurrence of $x$ is found in the text and outputs "0" otherwise.

The algorithm runs in linear time but not in real time due to the internal while loop of the algorithm dealing with a symbol of the text and that can take some time. This is called the delay of the algorithm. The goal of the problem is to bound the delay, precisely defined as the maximum number of comparisons executed at line 4 on the letter $a$ of the input $text$.

KMP$(x, text \mbox{ non-empty words})$
    \begin{algorithmic}
  \STATE $stbord \leftarrow $ \CALL{StrictBorders}{$x$}
  \STATE $i \leftarrow 0$
  \FOR{each letter $a$ of $text$ sequentially}
    \WHILE{$(i=|x|)$ \OR $(i\geq 0$ \AND $a \neq x[i])$}
      \STATE $i \leftarrow stbord[i]$
    \ENDWHILE 
    \STATE $i \leftarrow i+1$
    \STATE \textbf{if} $i=|x|$ \textbf{then} output 1 \textbf{else} output 0
  \ENDFOR 
    \end{algorithmic}

If the table $border$ (see Problem 19) of $x$ is used in the algorithm instead of its table $stbord$ (see Problem 25), the delay is $|x|$ in the worst-case. For example, if $x=\sa{a}^m$ is aligned with a factor $\sa{a}^{m-1}\sa{b}$ of the text, the letter $\sa{b}$ is compared to all the letters of $x$. But with the table $stbord$ the delay becomes logarithmic.

Show the delay of Algorithm KMP is $\Theta(\log |x|)$ when searching for a word $x$.
Consider interlocked periods and apply the Periodicity Lemma.

References