On the size distribution of the fixed-length Levenshtein balls with radius one

Designs, Codes and Cryptography

Abstract

The fixed-length Levenshtein (FLL) distance between two words \(\varvec{x}, \varvec{y}\in \mathbb {Z}_m^n\) is the smallest integer t such that \(\varvec{x}\) can be transformed into \(\varvec{y}\) by t insertions and t deletions. Determining the size of a ball in the FLL metric is a fundamental yet challenging problem. Very recently, Bar-Lev, Etzion, and Yaakobi explicitly determined the minimum, maximum, and average sizes of the FLL balls with radius one. In this paper, building on these results, we use Azuma’s inequality to prove that the size of the FLL balls with radius one is highly concentrated around its mean.
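To make the definition concrete, the following is a minimal Python sketch, not taken from the paper. It relies on the standard fact that, for two words of equal length n, the smallest such t equals n minus the length of a longest common subsequence. The function names are hypothetical, words are represented as tuples over \(\{0, \dots , m-1\}\), and the exhaustive ball computation is only feasible for very small n.

```python
from itertools import product


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fll_distance(x, y):
    """FLL distance of two equal-length words: n minus the LCS length."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return len(x) - lcs_length(x, y)


def ball_size_radius_one(x, m):
    """|L_1(x)| by exhaustive search over Z_m^n (tiny n only)."""
    return sum(
        1
        for y in product(range(m), repeat=len(x))
        if fll_distance(x, y) <= 1
    )


# Example: 0101 and 1010 share the subsequence 010, so one deletion
# and one insertion suffice to turn one into the other.
print(fll_distance((0, 1, 0, 1), (1, 0, 1, 0)))
print(ball_size_radius_one((0, 1, 0), 2))
```

The exhaustive search costs on the order of \(m^n n^2\) operations, so it serves only as a sanity check on small examples.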

Notes

  1. In [2, p. 2334, left column], in passing from the fourth equation to the fifth, the last term of the fifth equation should be \(\frac{2q}{q-1}\) rather than \(\frac{2}{q-1}\). This error propagates through the subsequent calculations therein.

  2. It can be verified by taking \(i=1\) in Eq. (9). Alternatively, it also follows from the symmetry induced by the equivalence relation defined in the proof of Lemma 1.

  3. Note that \(f_{m,n}(x_i) = mn - n - 1\); we then express \(f_{m,n}(\varvec{x}_{[1,i]})\) in terms of \(f_{m,n}(\varvec{x}_{[1,i-1]})\) and \(f_{m,n}(x_{i})\).

References

  1. Alon N., Spencer J.H.: The Probabilistic Method. Wiley Series in Discrete Mathematics and Optimization, 4th edn. Wiley, Hoboken (2016).

  2. Bar-Lev D., Etzion T., Yaakobi E.: On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric. IEEE Trans. Inf. Theory 69(4), 2324–2340 (2023).

  3. Calabi L., Hartnett W.E.: Some general results of coding theory with applications to the study of codes for the correction of synchronization errors. Inf. Control 15, 235–249 (1969).

  4. He L., Ye M.: The size of Levenshtein ball with radius 2: expectation and concentration bound. In: 2023 IEEE International Symposium on Information Theory (ISIT), pp. 850–855. IEEE (2023).

  5. Hirschberg D.S., Regnier M.: Tight bounds on the number of string subsequences. J. Discret. Algorithms (Oxf.) 1(1), 123–132 (2000).

  6. Levenshtein V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl. 10, 707–710 (1965).

  7. Mitzenmacher M.: A survey of results for deletion channels and related synchronization channels. Probab. Surv. 6, 1–33 (2009).

  8. Sala F., Dolecek L.: Counting sequences obtained from the synchronization channel. In: 2013 IEEE International Symposium on Information Theory (ISIT 2013), pp. 2925–2929. IEEE (2013).

Acknowledgements

The authors are very grateful to the editor and the anonymous reviewer for comments that improved the quality of this paper. In particular, the authors sincerely thank the Editor-in-Chief, Professor Dieter Jungnickel, for his great efforts in overseeing the review process of this paper.

Author information

Corresponding author

Correspondence to Qi Wang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest that are relevant to the content of this article.

Additional information

Communicated by D. Jungnickel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The paper was presented (in part) at The Twelfth International Workshop on Coding and Cryptography (WCC), Rostock, Germany, March 7–11, 2022. Qi Wang was supported in part by the National Natural Science Foundation of China (Grant Nos. 12371522, 11931005). Geyang Wang was supported by NSF-BSF grant CCF2110113.

Appendices

A Proof of Eq. (18)

We follow the notation in the proof of Theorem 4.

  • Case \(i = 1\): By symmetry, we have \(|Z_1 - Z_0| = 0\).

  • Case \(1< i <n\):

    By Eq. (16), we have

    $$\begin{aligned} Z_i - Z_{i-1}&\ge f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) - n + \frac{n}{m} + \frac{1}{m}\right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) ,\\ Z_i - Z_{i-1}&\le f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) - n + \frac{n}{m} + \frac{1}{m} \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) . \end{aligned}$$

    By Lemma 5, we have (see Note 3)

    $$\begin{aligned} 0 \le f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) \le mn - n - 1, \end{aligned}$$

    and

    $$\begin{aligned} g_{m,n}(i) - g_{m,n}(i-1) = 1 - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}}. \end{aligned}$$

    Therefore, we have

    $$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\ge 1 - n \left( m + \frac{1}{m} - 2 \right) - \frac{1}{m^{n-i}} + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&> - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - (n-1) \cdot 2 \\&> - n\left( m + \frac{1}{m}\right) + 1, \end{aligned} \end{aligned}$$

    and also

    $$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\le mn - n - 1 + 1 - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} \\&\quad + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&= n\left( 1 - \frac{1}{m}\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&< n\left( 1 - \frac{1}{m}\right) + 2n = n\left( 3 - \frac{1}{m}\right) . \end{aligned} \end{aligned}$$

    Note that \(|-n(m + \frac{1}{m}) + 1| \le n(m + \frac{1}{m})\) and \(|n(3 - \frac{1}{m})| \le n(m + \frac{1}{m})\). Hence, we have \(|Z_i - Z_{i-1}| \le n(m + \frac{1}{m})\) for \(1< i < n\).

  • Case \(i = n\):

    By Eq. (16) and Lemma 5, we have

    $$\begin{aligned} 0 \le f_{m,n}(\varvec{x}) - f_{m,n}(\varvec{x}_{[1,n-1]}) \le mn -n - 1, \end{aligned}$$

    and

    $$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\le f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m} \right) \left( mn -n -t(\varvec{x}_{[1,n-1]})\right) \right] \\&\le mn - n - 1 - \left( 1 - \frac{1}{m} \right) (mn - n -t(\varvec{x}_{[1,n-1]})) \\&= t(\varvec{x}_{[1,n-1]}) \left( 1 - \frac{1}{m} \right) - \frac{n}{m} \\&< n \left( 1 - \frac{1}{m} \right) , \end{aligned} \end{aligned}$$

    and also,

    $$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\ge f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m}\right) (mn - n -1) \right] \\&\ge -\left( 1 - \frac{1}{m}\right) (mn - n - 1)\\&= -n\left( m + \frac{1}{m} - 2 \right) - \frac{1}{m} + 1 \\&> -n\left( m + \frac{1}{m} \right) . \end{aligned} \end{aligned}$$

    Thus, we have \(|Z_n - Z_{n-1}| \le n(m + \frac{1}{m})\).
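
As a sketch of how bounded differences of this kind are typically used (the exact statement and constants of Theorem 4 are not reproduced in this appendix and may differ), recall the standard form of Azuma's inequality, see e.g. [1]: if \(Z_0, Z_1, \dots , Z_n\) is a martingale with \(|Z_i - Z_{i-1}| \le c_i\) for all i, then for every \(\lambda > 0\),

$$\begin{aligned} \Pr \left[ |Z_n - Z_0| \ge \lambda \right] \le 2 \exp \left( - \frac{\lambda ^2}{2 \sum _{i=1}^{n} c_i^2} \right) . \end{aligned}$$

Assuming the three cases above are applied with \(c_1 = 0\) and \(c_i = n(m + \frac{1}{m})\) for \(2 \le i \le n\), the sum in the exponent is

$$\begin{aligned} \sum _{i=1}^{n} c_i^2 = (n-1)\, n^2 \left( m + \frac{1}{m}\right) ^2 , \end{aligned}$$

so that under this assumption

$$\begin{aligned} \Pr \left[ |Z_n - Z_0| \ge \lambda \right] \le 2 \exp \left( - \frac{\lambda ^2}{2 (n-1)\, n^2 \left( m + \frac{1}{m}\right) ^2} \right) . \end{aligned}$$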

B Simulation results

We independently pick \(\varvec{x} \in \mathbb {Z}_m^n\) uniformly at random and record the value \(|L_1(\varvec{x})|\). The distribution of \(|L_1(\varvec{x})|\) is then reflected by the frequency with which \(|L_1(\varvec{x})|\) falls in different intervals. We also compare it with the expected frequency given by the bounds in Theorems 3 and 4. For instance, by Eq. (7), the expected frequency of the event \(|L_1(\varvec{x})| > \tau \) is

$$\begin{aligned} N e^{-2 \left( \frac{\tau - \mathbb {E}\left[ |L_1(\varvec{x})| \right] }{n \sqrt{n-1}}\right) ^2}, \end{aligned}$$

where N is the sample size. The simulation results for \(n = 100\), \(m = 2,3,4,5\) are depicted in Fig. 1.
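
The experiment described above could be sketched in Python as follows. This is an illustrative reconstruction, not the authors' simulation code: the function names are hypothetical, the ball is enumerated directly as all words obtained by one deletion followed by one insertion, and the empirical sample mean is used as a stand-in for \(\mathbb {E}\left[ |L_1(\varvec{x})| \right] \), whose closed form is not restated in this appendix. The parameters in the sample run are much smaller than \(n = 100\) to keep it fast.

```python
import math
import random


def ball_size_radius_one(x, m):
    """Count the words reachable from x by one deletion and one insertion.

    x is a tuple over {0, ..., m-1}; x itself is counted (delete a symbol
    and put it back), so the result is |L_1(x)|.
    """
    n = len(x)
    ball = set()
    for i in range(n):                      # position of the deleted symbol
        shortened = x[:i] + x[i + 1:]
        for j in range(n):                  # insertion position in the shorter word
            for a in range(m):              # inserted symbol
                ball.add(shortened[:j] + (a,) + shortened[j:])
    return len(ball)


def tail_count_bound(tau, mean, n, sample_size):
    """N * exp(-2 * ((tau - mean) / (n * sqrt(n - 1)))**2), as displayed above."""
    return sample_size * math.exp(-2 * ((tau - mean) / (n * math.sqrt(n - 1))) ** 2)


def simulate(n, m, sample_size, seed=0):
    """Sample words uniformly from Z_m^n and return the list of ball sizes."""
    rng = random.Random(seed)
    return [
        ball_size_radius_one(tuple(rng.randrange(m) for _ in range(n)), m)
        for _ in range(sample_size)
    ]


# Small illustrative run (hypothetical parameters, far smaller than n = 100).
sizes = simulate(n=12, m=3, sample_size=200)
mean = sum(sizes) / len(sizes)              # empirical stand-in for E[|L_1(x)|]
tau = mean + 20
observed = sum(s > tau for s in sizes)
print(observed, tail_count_bound(tau, mean, n=12, sample_size=200))
```

The bound is only informative for \(\tau \) above the mean. Each ball enumeration touches about \(n^2 m\) candidate words, so the full experiment with \(n = 100\) and \(N = 5000\) is feasible but noticeably slower in pure Python.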

Fig. 1: \(N = 5000\) for each experiment (figure not reproduced)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Wang, G., Wang, Q. On the size distribution of the fixed-length Levenshtein balls with radius one. Des. Codes Cryptogr. (2024). https://doi.org/10.1007/s10623-024-01382-1

  • DOI: https://doi.org/10.1007/s10623-024-01382-1
