Abstract
The set of all \(q\)-ary strings that do not contain repeated substrings of length \({\le\! 3}\) (i.e., that do not contain substrings of the form \(a a\), \(a b a b\), or \(a b c a b c\)) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length \({\le\! 3}\). In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length \({\le\! 3}\). We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the “root-uniqueness” property.
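The channel model and the decoding idea behind the abstract can be illustrated with a minimal Python sketch. The function names `tandem_duplicate` and `root` are hypothetical and not taken from the paper; strings are assumed to be Python `str` objects over digit symbols. The sketch relies on the root-uniqueness property mentioned above, under which the order of deduplication steps should not affect the final result for duplications of length \({\le\! 3}\).

```python
def tandem_duplicate(s, i, k):
    """Insert a copy of s[i:i+k] immediately after itself.

    This models one tandem duplication of length k (assumed 1 <= k <= 3).
    """
    if not (1 <= k <= 3 and 0 <= i and i + k <= len(s)):
        raise ValueError("invalid duplication position or length")
    return s[:i + k] + s[i:i + k] + s[i + k:]


def root(s, max_len=3):
    """Remove one copy of a tandem repeat of length <= max_len until none remain.

    Illustrative greedy deduplication (shortest repeat, leftmost first);
    not an optimized decoder.
    """
    changed = True
    while changed:
        changed = False
        for k in range(1, max_len + 1):
            for i in range(len(s) - 2 * k + 1):
                if s[i:i + k] == s[i + k:i + 2 * k]:
                    # Drop the second copy of the repeated block.
                    s = s[:i + k] + s[i + 2 * k:]
                    changed = True
                    break
            if changed:
                break
        # Restart the scan from the shortest repeat length after every removal.
    return s


x = "0120"                       # contains no aa, abab, or abcabc
y = tandem_duplicate(x, 1, 2)    # duplicates the block "12"
print(y, root(y))                # 012120 0120
```

In this picture, a codeword is a string that is its own root, and decoding amounts to computing the root of the received string.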
Notes
The first code constructions for these models (with \(\ell\in\{4, 5,\ldots\}\)) have been reported in [5].
Irreducible strings are an instance of pattern-avoiding strings, or constrained strings [10], the set of forbidden patterns being \(\{a\,a,a\,b\,a\,b, a\,b\,c\,a\,b\,c:\: a, b, c\in\mathcal{A}_q\}\).
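As a rough illustration of this pattern-avoidance view, the following Python sketch tests whether a string avoids the three forbidden patterns and counts such strings by brute force. The names `is_irreducible` and `count_irreducible` are hypothetical, the alphabet is assumed to be the digits \(0,\ldots,q-1\) with \(q\le 10\), and the exhaustive enumeration is only practical for small lengths.

```python
from itertools import product


def is_irreducible(s, max_len=3):
    """Return True if s has no substring uu with 1 <= len(u) <= max_len."""
    return not any(
        s[i:i + k] == s[i + k:i + 2 * k]
        for k in range(1, max_len + 1)
        for i in range(len(s) - 2 * k + 1)
    )


def count_irreducible(q, n):
    """Count irreducible strings of length n over a q-ary alphabet (q <= 10 assumed)."""
    alphabet = "0123456789"[:q]
    return sum(is_irreducible("".join(w)) for w in product(alphabet, repeat=n))


print(is_irreducible("0120"))     # True
print(is_irreducible("012120"))   # False: contains 1 2 1 2
print(count_irreducible(3, 3))    # 12
```

For \(q = 2\) the count drops to zero from length 4 onward, since the only binary strings without \(a\,a\) are alternating and therefore contain \(a\,b\,a\,b\).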
For \(\boldsymbol{x}\) in (3.2a), the substrings of length \(\le\!3\) that partially overlap with the substring \({\color{blue} 1\,\color{blue}2\,\color{blue}0}\) are: \({\color{blue}1}\), \({\color{blue} 2}\), \({\color{blue} 0}\), \(1\,{\color{blue} 1}\), \({\color{blue} 1\,\color{blue}2}\), \({\color{blue} 2\,\color{blue}0}\), \({\color{blue} 0}\,2\), \(0\,1\,{\color{blue}1}\), \(1\,{\color{blue} 1\,\color{blue}2}\), \({\color{blue} 2\,\color{blue}0}\,2\), and \({\color{blue}0}\,2\,1\).
References
Jain, S., Farnoud, F., Schwartz, M., and Bruck, J., Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms, IEEE Trans. Inform. Theory, 2017, vol. 63, no. 8, pp. 4996–5010. https://doi.org/10.1109/TIT.2017.2688361
Kovačević, M. and Tan, V.Y.F., Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems, IEEE Commun. Lett., 2018, vol. 22, no. 11, pp. 2194–2197. https://doi.org/10.1109/LCOMM.2018.2868666
Lenz, A., Jünger, N., and Wachter-Zeh, A., Bounds and Constructions for Multi-Symbol Duplication Error Correcting Codes, https://arXiv.org/abs/1807.02874v3 [cs.IT], 2018.
Kovačević, M., Zero-Error Capacity of Duplication Channels, IEEE Trans. Commun., 2019, vol. 67, no. 10, pp. 6735–6742. https://doi.org/10.1109/TCOMM.2019.2931342
Chee, Y.M., Chrisnata, J., Kiah, H.M., and Nguyen, T.T., Efficient Encoding/Decoding of GC-Balanced Codes Correcting Tandem Duplications, IEEE Trans. Inform. Theory, 2020, vol. 66, no. 8, pp. 4892–4903. https://doi.org/10.1109/TIT.2020.2981069
Farnoud, F., Schwartz, M., and Bruck, J., The Capacity of String-Duplication Systems, IEEE Trans. Inform. Theory, 2016, vol. 62, no. 2, pp. 811–824. https://doi.org/10.1109/TIT.2015.2505735
Jain, S., Farnoud, F., and Bruck, J., Capacity and Expressiveness of Genomic Tandem Duplication, IEEE Trans. Inform. Theory, 2017, vol. 63, no. 10, pp. 6129–6138. https://doi.org/10.1109/TIT.2017.2728079
Leupold, P., Martín-Vide, C., and Mitrana, V., Uniformly Bounded Duplication Languages, Discrete Appl. Math., 2005, vol. 146, no. 3, pp. 301–310. https://doi.org/10.1016/j.dam.2004.10.003
Shannon, C.E., The Zero Error Capacity of a Noisy Channel, IRE Trans. Inform. Theory, 1956, vol. 2, no. 3, pp. 8–19. https://doi.org/10.1109/TIT.1956.1056798
Marcus, B.H., Roth, R.M., and Siegel, P.H., An Introduction to Coding for Constrained Systems (unpublished manuscript), 5th ed., 2001. Available online at http://www.math.ubc.ca/~marcus/Handbook.
Chee, Y.M., Chrisnata, J., Kiah, H.M., and Nguyen, T.T., Deciding the Confusability of Words under Tandem Repeats in Linear Time, ACM Trans. Algorithms, 2019, vol. 15, no. 3, Art. 42 (22 pp.). https://doi.org/10.1145/3338514
Funding
This work was supported by the European Union's Horizon 2020 research and innovation programme under Grant Agreement no. 856967, and by the Secretariat for Higher Education and Scientific Research of the Autonomous Province of Vojvodina through project no. 142-451-2686/2021.
Additional information
Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 2, pp. 12–23 https://doi.org/10.31857/S0555292322020028.
About this article
Cite this article
Kovačević, M. On the Maximum Number of Non-Confusable Strings Evolving under Short Tandem Duplications. Probl Inf Transm 58, 111–121 (2022). https://doi.org/10.1134/S0032946022020028
DOI: https://doi.org/10.1134/S0032946022020028