On the size distribution of the fixed-length Levenshtein balls with radius one

Wang, Geyang; Wang, Qi

doi:10.1007/s10623-024-01382-1

Published: 05 April 2024

Designs, Codes and Cryptography (2024)

Abstract

The fixed-length Levenshtein (FLL) distance between two words $\varvec{x}, \varvec{y}\in \mathbb {Z}_m^n$ is the smallest integer $t$ such that $\varvec{x}$ can be transformed to $\varvec{y}$ by $t$ insertions and $t$ deletions. Determining the size of a ball in the FLL metric is a fundamental yet challenging problem. Very recently, Bar-Lev, Etzion, and Yaakobi explicitly determined the minimum, maximum and average sizes of the FLL balls with radius one. In this paper, based on these results, we further prove, using Azuma’s inequality, that the size of the FLL balls with radius one is highly concentrated around its mean.
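To make the definition concrete, the following is a minimal brute-force sketch, not the authors' method. It assumes words are represented as tuples over $\{0,\dots ,m-1\}$, builds the radius-one ball as the set of words reachable by one deletion followed by one insertion, and computes the FLL distance of two equal-length words as $n$ minus the length of a longest common subsequence. The function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def lcs_length(x, y):
    """Length of a longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fll_distance(x, y):
    """FLL distance of two equal-length words: n minus the LCS length."""
    assert len(x) == len(y)
    return len(x) - lcs_length(x, y)


def fll_ball_radius_one(x, m):
    """All words reachable from x by one deletion followed by one insertion."""
    x = tuple(x)
    n = len(x)
    ball = set()
    for i in range(n):                 # position of the deleted symbol
        shorter = x[:i] + x[i + 1:]
        for j in range(n):             # position of the inserted symbol
            for s in range(m):         # inserted symbol
                ball.add(shorter[:j] + (s,) + shorter[j:])
    return ball


def ball_by_distance(x, m):
    """Same ball, obtained by scanning all m^n words (small n only)."""
    x = tuple(x)
    return {y for y in product(range(m), repeat=len(x)) if fll_distance(x, y) <= 1}
```

For example, over $\mathbb {Z}_2$ the constant word `(0, 0, 0)` has a ball of size 4, while the alternating word `(0, 1, 0)` has a ball of size 7, which already shows that the ball size depends on the center.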

Notes

In [2, p. 2334], left column, from the fourth equation to the fifth equation, the last term of the fifth equation should not be $\frac{2}{q-1}$ but $\frac{2q}{q-1}$. This error spreads out in the following calculations therein.
It can be verified by taking $i=1$ in Eq. (9). Alternatively, it also follows from the symmetry induced by the equivalence relation defined in the proof of Lemma 1.
Note that $f_{m,n}(x_i) = mn - n - 1$, and we then express $f_{m,n}(\varvec{x}_{[1,i]})$ by $f_{m,n}(\varvec{x}_{[1,i-1]})$ and $f_{m,n}(x_{i})$.

References

[1] Alon N., Spencer J.H.: The Probabilistic Method. Wiley Series in Discrete Mathematics and Optimization, 4th edn. Wiley, Hoboken (2016).
[2] Bar-Lev D., Etzion T., Yaakobi E.: On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric. IEEE Trans. Inf. Theory 69(4), 2324–2340 (2023).
[3] Calabi L., Hartnett W.E.: Some general results of coding theory with applications to the study of codes for the correction of synchronization errors. Inf. Control 15, 235–249 (1969).
[4] He L., Ye M.: The size of Levenshtein ball with radius 2: expectation and concentration bound. In: 2023 IEEE International Symposium on Information Theory (ISIT), pp. 850–855. IEEE (2023).
[5] Hirschberg D.S., Regnier M.: Tight bounds on the number of string subsequences. J. Discret. Algorithms (Oxf.) 1(1), 123–132 (2000).
[6] Levenshtein V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Dokl. 10, 707–710 (1965).
[7] Mitzenmacher M.: A survey of results for deletion channels and related synchronization channels. Probab. Surv. 6, 1–33 (2009).
[8] Sala F., Dolecek L.: Counting sequences obtained from the synchronization channel. In: 2013 IEEE International Symposium on Information Theory (ISIT 2013), pp. 2925–2929. IEEE (2013).

Acknowledgements

The authors are very grateful to the editor and the anonymous reviewer for comments that improve the quality of this paper. In particular, the authors sincerely thank the Editor-in-Chief, Professor Dieter Jungnickel for his great efforts in overseeing the review process of this paper.

Author information

Authors and Affiliations

Department of Electronic and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, MD, 20742, USA
Geyang Wang
Department of Computer Science and Engineering & National Center for Applied Mathematics Shenzhen, Southern University of Science and Technology, Nanshan District, Shenzhen, 518055, Guangdong, China
Qi Wang

Corresponding author

Correspondence to Qi Wang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest that are relevant to the content of this article.

Additional information

Communicated by D. Jungnickel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The paper was presented (in part) at the Twelfth International Workshop on Coding and Cryptography (WCC), Rostock, Germany, March 7–11, 2022. Qi Wang was supported in part by the National Natural Science Foundation of China (Grant Nos. 12371522, 11931005). Geyang Wang was supported by NSF-BSF grant CCF2110113.

Appendices

A Proof of Eq. (18)

We follow the notations in the proof of Theorem 4.

Case $i = 1$: By symmetry, we have $|Z_1 - Z_0| = 0$.
Case $1< i <n$:

By Eq. (16), we have
$$\begin{aligned} Z_i - Z_{i-1}&\ge f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) - n + \frac{n}{m} + \frac{1}{m}\right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) ,\\ Z_i - Z_{i-1}&\le f_{m,n}(\varvec{x}_{[1,i]}) + g_{m,n}(i) - n + \frac{n}{m} + \frac{1}{m} \\&\quad - \left[ f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i-1) + 1 - n + \frac{n}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \right] \\&= f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) + g_{m,n}(i) - g_{m,n}(i-1) \\&\quad - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) . \end{aligned}$$
By Lemma 5, we have (see the third entry under Notes)
$$\begin{aligned} 0 \le f_{m,n}(\varvec{x}_{[1,i]}) - f_{m,n}(\varvec{x}_{[1,i-1]}) \le mn - n - 1, \end{aligned}$$
and
$$\begin{aligned} g_{m,n}(i) - g_{m,n}(i-1) = 1 - n(m + \frac{1}{m} - 2) - \frac{1}{m^{n-i}}. \end{aligned}$$
Therefore, we have
$$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\ge 1 - n \left( m + \frac{1}{m} - 2 \right) - \frac{1}{m^{n-i}} + \frac{m-1}{m} - \frac{m-1}{m}t(\varvec{x}_{[1,i]})\left( 2 - \frac{1}{m^{n-i-1}}\right) \\&> - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - (n-1) \cdot 2 \\&> - n\left( m + \frac{1}{m}\right) + 1, \end{aligned} \end{aligned}$$
and also
$$\begin{aligned} \begin{aligned} Z_i - Z_{i-1}&\le mn - n - 1 + 1 - n\left( m + \frac{1}{m} - 2\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} \\&\quad + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&= n\left( 1 - \frac{1}{m}\right) - \frac{1}{m^{n-i}} - \frac{m-1}{m} + \frac{m-1}{m}t(\varvec{x}_{[1,i-1]})\left( 2 - \frac{1}{m^{n-i}}\right) \\&< n\left( 1 - \frac{1}{m}\right) + 2n = n\left( 3 - \frac{1}{m}\right) . \end{aligned} \end{aligned}$$
Note that $|-n(m + \frac{1}{m}) + 1| \le n(m + \frac{1}{m})$ and $|n(3 - \frac{1}{m})| \le n(m + \frac{1}{m})$. Hence, we have $|Z_i - Z_{i-1}| \le n(m + \frac{1}{m})$ for $1< i < n$.
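The second comparison can be seen directly from $m + \frac{1}{m} - \left( 3 - \frac{1}{m}\right) = \frac{(m-1)(m-2)}{m} \ge 0$ for every integer $m \ge 2$, with equality at $m = 2$. As a quick sanity check of both comparisons, here is a small sketch in exact rational arithmetic; the function name is hypothetical and the tested parameter ranges are arbitrary.

```python
from fractions import Fraction


def increment_bounds_hold(m, n):
    """Check that both one-sided bounds are at most n(m + 1/m) in absolute value."""
    c = n * (m + Fraction(1, m))            # claimed bound on |Z_i - Z_{i-1}|
    lower = -n * (m + Fraction(1, m)) + 1   # lower bound on Z_i - Z_{i-1}
    upper = n * (3 - Fraction(1, m))        # upper bound on Z_i - Z_{i-1}
    return abs(lower) <= c and abs(upper) <= c


if __name__ == "__main__":
    # Exhaustive check over a small, arbitrarily chosen grid of (m, n).
    print(all(increment_bounds_hold(m, n) for m in range(2, 30) for n in range(2, 200)))
```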
Case $i = n$:

By Eq. (16) and Lemma 5, we have
$$\begin{aligned} 0 \le f_{m,n}(\varvec{x}) - f_{m,n}(\varvec{x}_{[1,n-1]}) \le mn -n - 1, \end{aligned}$$
and
$$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\le f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m} \right) \left( mn -n -t(\varvec{x}_{[1,n-1]})\right) \right] \\&\le mn - n - 1 - \left( 1 - \frac{1}{m} \right) (mn - n -t(\varvec{x}_{[1,n-1]})) \\&= \left( n + t(\varvec{x}_{[1,n-1]})\right) \left( 1 - \frac{1}{m} \right) - 1 \\&< 2n \left( 1 - \frac{1}{m} \right) , \end{aligned} \end{aligned}$$
and also,
$$\begin{aligned} \begin{aligned} Z_n - Z_{n-1}&\ge f_{m,n}(\varvec{x}) - \left[ f_{m,n}(\varvec{x}_{[1,n-1]}) + \left( 1 - \frac{1}{m}\right) (mn - n -1) \right] \\&\ge -\left( 1 - \frac{1}{m}\right) (mn - n - 1)\\&= -n\left( m + \frac{1}{m} - 2 \right) - \frac{1}{m} + 1 \\&> -n\left( m + \frac{1}{m} \right) - \frac{1}{m} + 1. \end{aligned} \end{aligned}$$
Since $2n(1 - \frac{1}{m}) \le n(m + \frac{1}{m})$ and $|-n(m + \frac{1}{m}) - \frac{1}{m} + 1| \le n(m + \frac{1}{m})$, we have $|Z_n - Z_{n-1}| \le n(m + \frac{1}{m})$.
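
For orientation, the following sketches how increment bounds of this kind enter a concentration estimate. It assumes that $Z_0, \dots , Z_n$ is the martingale from the proof of Theorem 4 and uses the standard two-sided Azuma-Hoeffding inequality with $c_1 = 0$ and $c_i = n(m + \frac{1}{m})$ for $2 \le i \le n$; the symbol $\lambda > 0$ is introduced here only for illustration, and the constants in the statement of Theorem 4 may be arranged differently.

$$\begin{aligned} \Pr \left[ |Z_n - Z_0| \ge \lambda \right] \le 2\exp \left( -\frac{\lambda ^2}{2\sum _{i=1}^{n} c_i^2}\right) = 2\exp \left( -\frac{\lambda ^2}{2(n-1)n^2\left( m + \frac{1}{m}\right) ^2}\right) . \end{aligned}$$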

B Simulation results

We independently pick words $\varvec{x} \in \mathbb {Z}_m^n$ uniformly at random and record the value $|L_1(\varvec{x})|$ for each. The distribution of $|L_1(\varvec{x})|$ is then reflected by the frequency with which $|L_1(\varvec{x})|$ falls into different intervals. We also compare it with the expected frequency given by the bounds in Theorems 3 and 4. For instance, the expected frequency of the event $|L_1(\varvec{x})| > \tau $ by Eq. (7) is

$$\begin{aligned} N e^{-2 \left( \frac{\tau - \mathbb {E}\left[ |L_1(\varvec{x})| \right] }{n \sqrt{n-1}}\right) ^2}, \end{aligned}$$

where $N$ is the sample size. The simulation results for $n = 100$, $m = 2,3,4,5$ are depicted in Fig. 1.
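
The experiment described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code. It assumes a brute-force computation of $|L_1(\varvec{x})|$ (one deletion followed by one insertion) and, since the closed form of $\mathbb {E}[|L_1(\varvec{x})|]$ is not reproduced in this appendix, it substitutes the sample mean for the expectation. All function names are hypothetical.

```python
import math
import random


def ball_size(x, m):
    """|L_1(x)|: number of words reachable by one deletion then one insertion."""
    n = len(x)
    ball = set()
    for i in range(n):
        shorter = x[:i] + x[i + 1:]
        for j in range(n):
            for s in range(m):
                ball.add(shorter[:j] + (s,) + shorter[j:])
    return len(ball)


def sample_ball_sizes(n, m, samples, seed=0):
    """Draw words uniformly from Z_m^n and record their radius-one ball sizes."""
    rng = random.Random(seed)
    return [ball_size(tuple(rng.randrange(m) for _ in range(n)), m)
            for _ in range(samples)]


def tail_comparison(sizes, n, tau):
    """Observed count of sizes above tau versus the N * exp(...) expression."""
    N = len(sizes)
    mean = sum(sizes) / N  # ASSUMPTION: sample mean stands in for E[|L_1(x)|]
    observed = sum(1 for s in sizes if s > tau)
    # Upper-tail expression, intended for tau at or above the mean.
    bound = N * math.exp(-2 * ((tau - mean) / (n * math.sqrt(n - 1))) ** 2)
    return observed, bound


if __name__ == "__main__":
    sizes = sample_ball_sizes(n=20, m=2, samples=200)
    tau = sum(sizes) / len(sizes) + 20
    print(tail_comparison(sizes, n=20, tau=tau))
```

The parameters in the usage lines are deliberately smaller than the $n = 100$ used for Fig. 1, because the brute-force enumeration touches about $n^2 m$ candidate words per sample.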

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, G., Wang, Q. On the size distribution of the fixed-length Levenshtein balls with radius one. Des. Codes Cryptogr. (2024). https://doi.org/10.1007/s10623-024-01382-1

Received: 05 August 2023
Revised: 19 February 2024
Accepted: 22 February 2024
Published: 05 April 2024
DOI: https://doi.org/10.1007/s10623-024-01382-1
