On the Maximum Number of Non-Confusable Strings Evolving under Short Tandem Duplications

  • CODING THEORY
Problems of Information Transmission

Abstract

The set of all \(q\)-ary strings that do not contain repeated substrings of length \({\le\! 3}\) (i.e., that do not contain substrings of the form \(a a\), \(a b a b\), or \(a b c a b c\)) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length \({\le\! 3}\). In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length \({\le\! 3}\). We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the “root-uniqueness” property.
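
As an illustration of the code described in the abstract, the following minimal Python sketch tests whether a string avoids the patterns \(a a\), \(a b a b\), \(a b c a b c\), computes a root by repeatedly undoing tandem duplications of length \({\le\! 3}\), and counts irreducible strings by brute force. It is not taken from the paper: the function names (`find_tandem_repeat`, `is_irreducible`, `root`, `count_irreducible`) are hypothetical, the alphabet is assumed to be the digits \(0, \ldots, q-1\) with \(q \le 10\), and the exhaustive count is only practical for small \(n\).

```python
from itertools import product


def find_tandem_repeat(s, max_len=3):
    """Return (i, k) for the first substring s[i:i+2k] of the form xx
    with |x| = k <= max_len, or None if there is no such substring."""
    for i in range(len(s)):
        for k in range(1, max_len + 1):
            if i + 2 * k <= len(s) and s[i:i + k] == s[i + k:i + 2 * k]:
                return i, k
    return None


def is_irreducible(s, max_len=3):
    """True if s contains no tandem repeat xx with |x| <= max_len."""
    return find_tandem_repeat(s, max_len) is None


def root(s, max_len=3):
    """Undo tandem duplications (xx -> x) until none remain.
    For max_len <= 3 the result does not depend on the order of removals
    (the root-uniqueness property mentioned in the abstract)."""
    while (hit := find_tandem_repeat(s, max_len)) is not None:
        i, k = hit
        s = s[:i + k] + s[i + 2 * k:]
    return s


def count_irreducible(q, n, max_len=3):
    """Brute-force count of irreducible q-ary strings of length n.
    Assumes q <= 10 so that single digits can serve as symbols."""
    alphabet = "0123456789"[:q]
    return sum(
        is_irreducible("".join(w), max_len)
        for w in product(alphabet, repeat=n)
    )


if __name__ == "__main__":
    print(is_irreducible("0120"))   # contains no aa, abab, or abcabc
    print(root("0120122"))          # undoes 012012 -> 012, then 22 -> 2
    print(count_irreducible(3, 4))  # ternary irreducible strings of length 4
```

Under root-uniqueness, two strings with a common descendant must share the same root, so strings with different roots cannot be confused. Every irreducible string is its own root, which is why the set of irreducible strings forms a valid code.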

Notes

  1. The first code constructions for these models (with \(\ell\in\{4, 5,\ldots\}\)) have been reported in [5].

  2. Irreducible strings are an instance of pattern-avoiding strings, or constrained strings [10], the set of forbidden patterns being \(\{a\,a,a\,b\,a\,b, a\,b\,c\,a\,b\,c:\: a, b, c\in\mathcal{A}_q\}\).

  3. For \(\boldsymbol{x}\) in (3.2a), the substrings of length \(\le\!3\) that partially overlap with the substring \({\color{blue} 1\,\color{blue}2\,\color{blue}0}\) are: \({\color{blue}1}\), \({\color{blue} 2}\), \({\color{blue} 0}\), \(1\,{\color{blue} 1}\), \({\color{blue} 1\,\color{blue}2}\), \({\color{blue} 2\,\color{blue}0}\), \({\color{blue} 0}\,2\), \(0\,1\,{\color{blue}1}\), \(1\,{\color{blue} 1\,\color{blue}2}\), \({\color{blue} 2\,\color{blue}0}\,2\), and \({\color{blue}0}\,2\,1\).

References

  1. Jain, S., Farnoud, F., Schwartz, M., and Bruck, J., Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms, IEEE Trans. Inform. Theory, 2017, vol. 63, no. 8, pp. 4996–5010. https://doi.org/10.1109/TIT.2017.2688361

  2. Kovačević, M. and Tan, V.Y.F., Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems, IEEE Commun. Lett., 2018, vol. 22, no. 11, pp. 2194–2197. https://doi.org/10.1109/LCOMM.2018.2868666

  3. Lenz, A., Jünger, N., and Wachter-Zeh, A., Bounds and Constructions for Multi-Symbol Duplication Error Correcting Codes, https://arXiv.org/abs/1807.02874v3 [cs.IT], 2018.

  4. Kovačević, M., Zero-Error Capacity of Duplication Channels, IEEE Trans. Commun., 2019, vol. 67, no. 10, pp. 6735–6742. https://doi.org/10.1109/TCOMM.2019.2931342

  5. Chee, Y.M., Chrisnata, J., Kiah, H.M., and Nguyen, T.T., Efficient Encoding/Decoding of GC-Balanced Codes Correcting Tandem Duplications, IEEE Trans. Inform. Theory, 2020, vol. 66, no. 8, pp. 4892–4903. https://doi.org/10.1109/TIT.2020.2981069

  6. Farnoud, F., Schwartz, M., and Bruck, J., The Capacity of String-Duplication Systems, IEEE Trans. Inform. Theory, 2016, vol. 62, no. 2, pp. 811–824. https://doi.org/10.1109/TIT.2015.2505735

  7. Jain, S., Farnoud, F., and Bruck, J., Capacity and Expressiveness of Genomic Tandem Duplication, IEEE Trans. Inform. Theory, 2017, vol. 63, no. 10, pp. 6129–6138. https://doi.org/10.1109/TIT.2017.2728079

  8. Leupold, P., Martín-Vide, C., and Mitrana, V., Uniformly Bounded Duplication Languages, Discrete Appl. Math., 2005, vol. 146, no. 3, pp. 301–310. https://doi.org/10.1016/j.dam.2004.10.003

  9. Shannon, C.E., The Zero Error Capacity of a Noisy Channel, IRE Trans. Inform. Theory, 1956, vol. 2, no. 3, pp. 8–19. https://doi.org/10.1109/TIT.1956.1056798

  10. Marcus, B.H., Roth, R.M., and Siegel, P.H., An Introduction to Coding for Constrained Systems (unpublished manuscript), 5th ed., 2001. Available online at http://www.math.ubc.ca/~marcus/Handbook.

  11. Chee, Y.M., Chrisnata, J., Kiah, H.M., and Nguyen, T.T., Deciding the Confusability of Words under Tandem Repeats in Linear Time, ACM Trans. Algorithms, 2019, vol. 15, no. 3, Art. 42 (22 pp.). https://doi.org/10.1145/3338514

Funding

This work was supported by the European Union's Horizon 2020 research and innovation programme under Grant Agreement no. 856967, and by the Secretariat for Higher Education and Scientific Research of the Autonomous Province of Vojvodina through the project no. 142-451-2686/2021.

Additional information

Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 2, pp. 12–23 https://doi.org/10.31857/S0555292322020028.

Cite this article

Kovačević, M. On the Maximum Number of Non-Confusable Strings Evolving under Short Tandem Duplications. Probl Inf Transm 58, 111–121 (2022). https://doi.org/10.1134/S0032946022020028

  • DOI: https://doi.org/10.1134/S0032946022020028