CLR cover

Problem 131: Idempotent equivalence

We consider a relation between words of $A^*$, for a finite alphabet $A$, that identifies a square $uu$ to its root $u$. More precisely, any factor $u$ occurring in a word $x\in A^*$ can be replaced by $uu$, and any occurrence of $uu$ replaced by $u$. Such replacements define an equivalence relation $\approx$ between words of $A^*$.

For a given alphabet the number of equivalence classes is finite, but grows considerably fast. For alphabet sizes 1,2,3,4,5 the number of equivalence classes are respectively 1,2,7,160,332381.

The goal of the problem is to design an efficient algorithm for testing the $\approx$-equivalence of two words. To do so, with each $x\in A^*$ is associated a (characteristic) quadruple $\Psi(x)=(p,a,b,q)$, where $a,b\in A$, $pa$ is a shortest prefix and $bq$ is a shortest suffix of $x$ for which $\alph(pa)=\alph(bq)=\alph(x)$ (Recall that $\alph(u)$ is the set of letters occurring in $u$).

The sought algorithm is based on the following result (see References).

Lemma (Equivalence Criterion)

Let $x,y\in A^*$ be two words and their quadruples $\Psi(x)=(p,a,b,q)$ and $\Psi(y)=(p',a',b',q')$. Then, $x \approx y$ iff $p \approx p'$, $a=a'$, $b=b'$ and $q \approx q'$.

Assuming $A$ is an integer alphabet (sortable in linear time), show how to check if $x\approx y$ in $(n\cdot |A|)$ time, where $n=|x|+|y|$.

References