|
Problem 59: Minimal Unique Factors
|
|
\(
\newcommand{\MINUNIQ}{\mathit{MinUniq}}%
\def\sa#1{\tt{#1}}
\def\dd{\dot{}\dot{}}
\)
The topic of this problem has the same flavour as the notion of minimal
absent words.
It can also be used to identify, to filter or to distinguish files.
But the corresponding algorithms and the underlying combinatorial
properties are simpler.
A minimal unique factor of a word $x$ is a factor that occurs once in
$x$ and whose proper factors are repeats, that is, have at least two
occurrences in $x$.
A minimal unique factor $x[i\dd j]$ is stored in the
table $\MINUNIQ$ by setting $\MINUNIQ[j]=i$.
Example
Minimal unique factors of $x=\sa{abaabba}$ are $\sa{aba}$$=$$x[0\dd 2]$,
$\sa{aa}$$=$$x[2\dd 3]$ and $\sa{bb}$$=$$x[4\dd 5]$, and $\MINUNIQ[2]=0$,
$\MINUNIQ[3]=2$ and $\MINUNIQ[5]=4$ (other values are set to $-1$).
Algorithm MinUnique applies to a singleton-free word (each letter
appear at least twice in it).
MinUnique$(x \textrm{ non-empty singleton-freeword})$
\begin{algorithmic}
\STATE $(\mathrm{SA},\mathrm{LCP})\leftarrow \mbox{Suffix array of } x$
\FOR{$i\leftarrow 0$ \TO $|x|$}
\STATE $MinUniq[i]\leftarrow -1$
\ENDFOR
\FOR{$r\leftarrow 0$ \TO $|x|-1$}
\STATE $\ell\leftarrow \max\{\mathrm{LCP}[r],\mathrm{LCP}[r+1]\}$
\STATE $MinUniq[\mathrm{SA}[r]+\ell]\leftarrow \max\{MinUniq[\mathrm{SA}[r]+\ell],\mathrm{SA}[r]\}$
\ENDFOR
\RETURN{$MinUniq[0..|x|-1]$}
\end{algorithmic}
Show that the algorithm MinUnique computes the table $\MINUNIQ$
associated with its singleton-free input word $x$.
There is a duality between minimal unique factors and maximal occurrences
of repeats in the word $x$.
Show that a minimal unique factor induces two maximal occurrences of repeats
in a singleton-free word.
How many minimal unique factors are there in a de Bruijn word of order $k$?
References
M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific
Publishing, Hong-Kong, 2002. 310 pages.
L. Ilie and W. F. Smyth. Minimum unique substrings and maximum
repeats. Fundam. Inform., 110(1-4):183-195, 2011.
T. Mieno, Y. Kuhara, T. Akagi, Y. Fujishige, Y. Nakashima, S. Inenaga,
H. Bannai, and M. Takeda. Minimal unique substrings and minimal
absent words in a sliding window. In A. Chatzigeorgiou, R. Dondi,
H. Herodotou, C. A. Kapoutsis, Y. Manolopoulos, G. A. Papadopoulos,
and F. Sikora, editors, SOFSEM 2020: Theory and Practice of Computer
Science - 46th International Conference on Current Trends in Theory
and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January
20-24, 2020, Proceedings, volume 12011 of Lecture Notes in Computer
Science, pages 148-160. Springer, 2020.
K. Tsuruta, S. Inenaga, H. Bannai, and M. Takeda. Shortest unique
substrings queries in optimal time. In V. Geffert, B. Preneel, B. Rovan,
J. Stuller, and A. M. Tjoa, editors, SOFSEM 2014: Theory and Practice
of Computer Science - 40th International Conference on Current Trends
in Theory and Practice of Computer Science, Novy Smokovec, Slovakia,
January 26-29, 2014, Proceedings, volume 8327 of Lecture Notes in Computer
Science, pages 503-513. Springer, 2014.