CLR cover

Problem 30: Words with Singleton Variables

\( \def\sa#1{\tt{#1}} \def\Al{\tt{A}} \def\alph{\mathit{alph}} \)

The problem shows the flexibility of the border table notion and of the fast algorithm computing this table. Having such a table is a valuable element to design efficient pattern matching, searching here for patterns with a variable.

We consider words over the alphabet $\Al=\{\sa{a},\sa{b},\dots\}$ in which letters are considered as singleton variables, that is, each letter represents a distinct unknown letter of the alphabet.

Two words $u$ and $v$ are said to be equivalent, denoted as $u \equiv v$, if there is a bijective letter-to-letter morphism $h:\alph(u)^*\rightarrow \alph(v)^*$ for which $h(u)=v$. For example, $\sa{aacbaba}\equiv\sa{bbdabab}$ through the morphism $h$ on $\Al^*$ to itself defined by $h(\sa{a})=\sa{b}$, $h(\sa{b})=\sa{a}$, $h(\sa{c})=\sa{d}$ and $h(\sa{d})=\sa{c}$. When $\alph(u)=\alph(v)$ and $u \equiv v$ the words become equal after permuting thei r letters.

The pattern-matching problem is naturally redefined as follows: given a pattern word $x$ and a text $y$, check if a factor $z$ of $y$ is equivalent to $x$: $z \equiv x$. For example, the pattern $x=\sa{aacbaba}$ occurs in $y=\sa{babbdababbacb}$ because its factor $z=\sa{bbdabab}$ is equivalent to $x$.

Assume the alphabet is sortable in linear time. Design a linear-time algorithm that searches a text for a pattern with singleton variables.
Design a notion of border table adequate to the problem.

References

  • B. S. Baker. Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci., 52(1):28-42, 1996.