CLR cover

Problem 59: Minimal Unique Factors

\( \newcommand{\MINUNIQ}{\mathit{MinUniq}}% \def\sa#1{\tt{#1}} \def\dd{\dot{}\dot{}} \)

The topic of this problem has the same flavour as the notion of minimal absent words. It can also be used to identify, to filter or to distinguish files. But the corresponding algorithms and the underlying combinatorial properties are simpler.

A minimal unique factor of a word $x$ is a factor that occurs once in $x$ and whose proper factors are repeats, that is, have at least two occurrences in $x$. A minimal unique factor $x[i\dd j]$ is stored in the table $\MINUNIQ$ by setting $\MINUNIQ[j]=i$.

Example

Minimal unique factors of $x=\sa{abaabba}$ are $\sa{aba}$$=$$x[0\dd 2]$, $\sa{aa}$$=$$x[2\dd 3]$ and $\sa{bb}$$=$$x[4\dd 5]$, and $\MINUNIQ[2]=0$, $\MINUNIQ[3]=2$ and $\MINUNIQ[5]=4$ (other values are set to $-1$).
Minimal Unique Factors

Algorithm MinUnique applies to a singleton-free word (each letter appear at least twice in it).

MinUnique$(x \textrm{ non-empty singleton-freeword})$
    \begin{algorithmic}
  \STATE $(\mathrm{SA},\mathrm{LCP})\leftarrow \mbox{Suffix array of } x$
  \FOR{$i\leftarrow 0$ \TO $|x|$}
     \STATE $MinUniq[i]\leftarrow -1$
  \ENDFOR
  \FOR{$r\leftarrow 0$ \TO $|x|-1$}
     \STATE $\ell\leftarrow \max\{\mathrm{LCP}[r],\mathrm{LCP}[r+1]\}$
     \STATE $MinUniq[\mathrm{SA}[r]+\ell]\leftarrow \max\{MinUniq[\mathrm{SA}[r]+\ell],\mathrm{SA}[r]\}$
  \ENDFOR
  \RETURN{$MinUniq[0..|x|-1]$}
    \end{algorithmic}

Show that the algorithm MinUnique computes the table $\MINUNIQ$ associated with its singleton-free input word $x$.

There is a duality between minimal unique factors and maximal occurrences of repeats in the word $x$.

Show that a minimal unique factor induces two maximal occurrences of repeats in a singleton-free word.

How many minimal unique factors are there in a de Bruijn word of order $k$?

References

  • M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, Hong-Kong, 2002. 310 pages.
  • L. Ilie and W. F. Smyth. Minimum unique substrings and maximum repeats. Fundam. Inform., 110(1-4):183-195, 2011.
  • T. Mieno, Y. Kuhara, T. Akagi, Y. Fujishige, Y. Nakashima, S. Inenaga, H. Bannai, and M. Takeda. Minimal unique substrings and minimal absent words in a sliding window. In A. Chatzigeorgiou, R. Dondi, H. Herodotou, C. A. Kapoutsis, Y. Manolopoulos, G. A. Papadopoulos, and F. Sikora, editors, SOFSEM 2020: Theory and Practice of Computer Science - 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20-24, 2020, Proceedings, volume 12011 of Lecture Notes in Computer Science, pages 148-160. Springer, 2020.
  • K. Tsuruta, S. Inenaga, H. Bannai, and M. Takeda. Shortest unique substrings queries in optimal time. In V. Geffert, B. Preneel, B. Rovan, J. Stuller, and A. M. Tjoa, editors, SOFSEM 2014: Theory and Practice of Computer Science - 40th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 26-29, 2014, Proceedings, volume 8327 of Lecture Notes in Computer Science, pages 503-513. Springer, 2014.