Abstract
The fixed-length Levenshtein (FLL) distance between two words \(\varvec{x}, \varvec{y}\in \mathbb {Z}_m^n\) is the smallest integer \(t\) such that \(\varvec{x}\) can be transformed into \(\varvec{y}\) by \(t\) insertions and \(t\) deletions. Determining the size of a ball in the FLL metric is a fundamental yet challenging problem. Very recently, Bar-Lev, Etzion, and Yaakobi explicitly determined the minimum, maximum, and average sizes of the FLL balls with radius one. In this paper, building on these results, we use Azuma’s inequality to prove that the size of the FLL balls with radius one is highly concentrated around its mean.
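As an illustration of the definition, the following minimal Python sketch computes the FLL distance between two words of equal length. It relies on the standard observation that \(\varvec{x}\) can be turned into \(\varvec{y}\) by \(t\) deletions and \(t\) insertions exactly when the two words share a common subsequence of length \(n - t\), so the distance equals \(n\) minus the length of a longest common subsequence. The function names `lcs_length` and `fll_distance` are illustrative choices and do not come from the paper.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fll_distance(x, y):
    """FLL distance between two equal-length words (assumed: n - LCS(x, y))."""
    if len(x) != len(y):
        raise ValueError("the FLL distance is defined for words of equal length")
    return len(x) - lcs_length(x, y)


if __name__ == "__main__":
    # 0101 and 1010 share the subsequence 101, so one deletion and one
    # insertion suffice.
    print(fll_distance((0, 1, 0, 1), (1, 0, 1, 0)))
```

Under this sketch, the radius-one ball around \(\varvec{x}\) is the set of all \(\varvec{y}\) with `fll_distance(x, y) <= 1`.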
Notes
In [2, p. 2334, left column], in passing from the fourth equation to the fifth, the last term of the fifth equation should be \(\frac{2q}{q-1}\) rather than \(\frac{2}{q-1}\). This error propagates through the subsequent calculations therein.
Note that \(f_{m,n}(x_i) = mn - n - 1\), and we then express \(f_{m,n}(\varvec{x}_{[1,i]})\) by \(f_{m,n}(\varvec{x}_{[1,i-1]})\) and \(f_{m,n}(x_{i})\).
References
Alon N., Spencer J.H.: The Probabilistic Method. Wiley Series in Discrete Mathematics and Optimization, 4th edn. Wiley, Hoboken (2016).
Bar-Lev D., Etzion T., Yaakobi E.: On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric. IEEE Trans. Inf. Theory 69(4), 2324–2340 (2023).
Calabi L., Hartnett W.E.: Some general results of coding theory with applications to the study of codes for the correction of synchronization errors. Inf. Control 15, 235–249 (1969).
He L., Ye M.: The size of Levenshtein ball with radius 2: expectation and concentration bound. In: 2023 IEEE International Symposium on Information Theory (ISIT), pp. 850–855. IEEE (2023).
Hirschberg D.S., Regnier M.: Tight bounds on the number of string subsequences. J. Discret. Algorithms (Oxf.) 1(1), 123–132 (2000).
Levenshtein V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl. 10, 707–710 (1965).
Mitzenmacher M.: A survey of results for deletion channels and related synchronization channels. Probab. Surv. 6, 1–33 (2009).
Sala F., Dolecek L.: Counting sequences obtained from the synchronization channel. In: 2013 IEEE International Symposium on Information Theory (ISIT 2013), pp. 2925–2929. IEEE (2013).
Acknowledgements
The authors are very grateful to the editor and the anonymous reviewer for comments that improved the quality of this paper. In particular, the authors sincerely thank the Editor-in-Chief, Professor Dieter Jungnickel, for his great efforts in overseeing the review process of this paper.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest that are relevant to the content of this article.
Additional information
Communicated by D. Jungnickel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The paper was presented (in part) at The Twelfth International Workshop on Coding and Cryptography (WCC), Rostock, Germany, March 7–11, 2022. Qi Wang was supported in part by the National Natural Science Foundation of China (Grant Nos. 12371522, 11931005). Geyang Wang was supported by NSF-BSF grant CCF2110113.
Appendices
A Proof of Eq. (18)
We follow the notations in the proof of Theorem 4.
- Case \(i = 1\): By symmetry, we have \(|Z_1 - Z_0| = 0\).
- Case \(1< i <n\):
By Eq. (16), we have
$$\begin{aligned} Z_i - Z_{i-1}&\ge f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) - n + \frac{n}{m} + \frac{1}{m}\right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) ,\\ Z_i - Z_{i-1}&\le f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) - n + \frac{n}{m} + \frac{1}{m} \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) . \end{aligned}$$
By Lemma 5 (see Footnote 3), we have
$$\begin{aligned} 0 \le f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) \le mn - n - 1, \end{aligned}$$
and
$$\begin{aligned} g_{m,n}(i) - g_{m,n}(i-1) = 1 - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}}. \end{aligned}$$
Therefore, we have
$$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\ge 1 - n \left( m + \frac{1}{m} - 2 \right) - \frac{1}{m^{n-i}} + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&> - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - (n-1) \cdot 2 \\&> - n\left( m + \frac{1}{m}\right) + 1, \end{aligned} \end{aligned}$$
and also
$$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\le mn - n - 1 + 1 - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} \\&\quad + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&= n\left( 1 - \frac{1}{m}\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&< n\left( 1 - \frac{1}{m}\right) + 2n = n\left( 3 - \frac{1}{m}\right) . \end{aligned} \end{aligned}$$
Note that \(|-n(m + \frac{1}{m}) + 1| \le n(m + \frac{1}{m})\) and \(|n(3 - \frac{1}{m})| \le n(m + \frac{1}{m})\). Hence, we have \(|Z_i - Z_{i-1}| \le n(m + \frac{1}{m})\) for \(1< i < n\).
- Case \(i = n\):
By Eq. (16) and Lemma 5, we have
$$\begin{aligned} 0 \le f_{m,n}(\varvec{x}) - f_{m,n}(\varvec{x}_{[1,n-1]}) \le mn -n - 1, \end{aligned}$$
and
$$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\le f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m} \right) \left( mn -n -t(\varvec{x}_{[1,n-1]})\right) \right] \\&\le mn - n - 1 - \left( 1 - \frac{1}{m} \right) (mn - n -t(\varvec{x}_{[1,n-1]})) \\&= t(\varvec{x}_{[1,n-1]}) \left( 1 - \frac{1}{m} \right) + n\left( 1 - \frac{1}{m} \right) - 1 \\&< 2n \left( 1 - \frac{1}{m} \right) , \end{aligned} \end{aligned}$$
and also,
$$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\ge f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m}\right) (mn - n -1) \right] \\&\ge -\left( 1 - \frac{1}{m}\right) (mn - n - 1)\\&= -n\left( m + \frac{1}{m} - 2 \right) - \frac{1}{m} + 1 \\&> -n\left( m + \frac{1}{m} \right) . \end{aligned} \end{aligned}$$
Since \(2n(1 - \frac{1}{m}) \le n(m + \frac{1}{m})\), we have \(|Z_n - Z_{n-1}| \le n(m + \frac{1}{m})\).
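To indicate how the three cases are used, the following is a sketch based on the standard two-sided form of Azuma's inequality (see, e.g., [1]). It assumes a Doob-type martingale with \(Z_0 = \mathbb {E}[|L_1(\varvec{x})|]\) and \(Z_n = |L_1(\varvec{x})|\), and it takes the uniform bound \(c_i = n(m + \frac{1}{m})\) for every \(i\). The constants in the paper's theorems may be sharper, for instance by using \(|Z_1 - Z_0| = 0\). For a martingale with \(|Z_i - Z_{i-1}| \le c_i\) and any \(\lambda > 0\),
$$\begin{aligned} \Pr \left( |Z_n - Z_0| \ge \lambda \right) \le 2\exp \left( -\frac{\lambda ^2}{2\sum _{i=1}^{n} c_i^2}\right) \le 2\exp \left( -\frac{\lambda ^2}{2n^3\left( m + \frac{1}{m}\right) ^2}\right) , \end{aligned}$$
where the second inequality uses \(\sum _{i=1}^{n} c_i^2 \le n \cdot n^2(m + \frac{1}{m})^2\).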
B Simulation results
We independently pick \(\varvec{x} \in \mathbb {Z}_m^n\) uniformly at random and record the value \(|L_1(\varvec{x})|\). The distribution of \(|L_1(\varvec{x})|\) is then reflected by the frequency with which \(|L_1(\varvec{x})|\) lies in different intervals. We also compare it with the expected frequency given by the bounds in Theorems 3 and 4. For instance, the expected frequency of the event \(|L_1(\varvec{x})| > \tau \) by Eq. (7) is
where N is the sample size. The simulation results for \(n = 100\), \(m = 2,3,4,5\) are depicted in Fig. 1.
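The sampling procedure described above can be sketched in Python as follows. This is a brute-force illustration, not the authors' simulation code: the function names, the seed, and the small default parameters are assumptions made for the example, and the ball is enumerated directly as the set of words obtained by one deletion followed by one insertion. Enumeration at \(n = 100\) is feasible but slower, so the demonstration uses a shorter length.

```python
import random
from collections import Counter


def fll_ball_size(x, m):
    """Size of the radius-one FLL ball around x, a tuple over {0, ..., m-1}."""
    n = len(x)
    ball = set()
    for i in range(n):                    # delete the symbol at position i
        shorter = x[:i] + x[i + 1:]
        for j in range(n):                # n insertion slots in a word of length n-1
            for a in range(m):
                ball.add(shorter[:j] + (a,) + shorter[j:])
    return len(ball)


def sample_ball_sizes(n, m, num_samples, seed=0):
    """Draw words uniformly from Z_m^n and record the size of each ball."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_samples):
        x = tuple(rng.randrange(m) for _ in range(n))
        sizes.append(fll_ball_size(x, m))
    return sizes


def tail_frequency(sizes, tau):
    """Empirical frequency of the event |L_1(x)| > tau."""
    return sum(1 for s in sizes if s > tau) / len(sizes)


if __name__ == "__main__":
    sizes = sample_ball_sizes(n=10, m=2, num_samples=200)
    mean = sum(sizes) / len(sizes)
    print("empirical mean:", mean)
    print("histogram:", sorted(Counter(sizes).items()))
    print("frequency above the mean:", tail_frequency(sizes, mean))
```

The empirical tail frequencies returned by `tail_frequency` are the quantities one would compare against the theoretical bounds, after multiplying by the sample size \(N\) if counts rather than frequencies are wanted.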
About this article
Cite this article
Wang, G., Wang, Q. On the size distribution of the fixed-length Levenshtein balls with radius one. Des. Codes Cryptogr. (2024). https://doi.org/10.1007/s10623-024-01382-1
DOI: https://doi.org/10.1007/s10623-024-01382-1