Bit catastrophes for the Burrows-Wheeler Transform
arXiv - CS - Discrete Mathematics, Pub Date: 2024-04-16, DOI: arxiv-2404.10426
Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

A bit catastrophe, loosely defined, occurs when changing just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of today's most popular compressors and aligners. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted $r$. We exhibit infinite families of strings in which the insertion, deletion, or substitution of a single character (depending on the family) increases $r$ from constant to $\Theta(\log n)$, where $n$ is the length of the string. These strings can be interpreted as examples of an increase by either a multiplicative or an additive $\Theta(\log n)$ factor. As regards the multiplicative factor, they attain the upper bound of $O(\log n \log r)$ given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023], since here $r=O(1)$ and the bound reduces to $O(\log n)$. We then give examples of strings in which the insertion, deletion, or substitution of a single character increases $r$ by an additive $\Theta(\sqrt{n})$ factor. These strings significantly improve the best known lower bound of $\Omega(\log n)$ for the additive factor [Giuliani et al., SOFSEM 2021].
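The parameter $r$ and the notion of a single-character edit can be made concrete with a small brute-force sketch. It assumes the rotation-based BWT (sort all cyclic rotations of the string and read off their last characters); some works instead append an end-of-string sentinel, which can change $r$ slightly. All helper names (`bwt`, `bwt_runs`, `single_edits`, `max_r_after_edit`, `fibonacci_word`) are hypothetical, and the quadratic construction is for illustration only, since practical tools build the BWT from a suffix array. Fibonacci words are used here only as a convenient low-$r$ input whose rotation-based BWT is known to consist of exactly two runs; they are not claimed to be the families constructed in the paper.

```python
from itertools import groupby


def bwt(s: str) -> str:
    """Rotation-based BWT (assumed variant): sort all cyclic rotations of s
    and concatenate their last characters. Naive, for illustration only."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def runs(t: str) -> int:
    """Number of maximal equal-letter runs in t (0 for the empty string)."""
    return sum(1 for _ in groupby(t))


def bwt_runs(s: str) -> int:
    """The parameter r: number of equal-letter runs of the BWT of s."""
    return runs(bwt(s))


def single_edits(s: str, alphabet: str):
    """Yield every string obtained from s by one insertion, deletion,
    or substitution of a character from the given alphabet."""
    n = len(s)
    for i in range(n + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]          # insert c before position i
    for i in range(n):
        yield s[:i] + s[i + 1:]              # delete position i
        for c in alphabet:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]  # substitute position i by c


def max_r_after_edit(s: str, alphabet: str) -> int:
    """Largest r reachable from s by a single edit (exhaustive search)."""
    return max(bwt_runs(t) for t in single_edits(s, alphabet))


def fibonacci_word(k: int) -> str:
    """Fibonacci words for k >= 1: f1 = 'b', f2 = 'a', f_k = f_{k-1} f_{k-2}."""
    prev, cur = "b", "a"
    for _ in range(k - 2):
        prev, cur = cur, cur + prev
    return prev if k == 1 else cur


if __name__ == "__main__":
    print(bwt("banana"), bwt_runs("banana"))  # nnbaaa 3
    for k in range(5, 13):
        f = fibonacci_word(k)
        # length n, r of the unedited word, worst r after one edit
        print(len(f), bwt_runs(f), max_r_after_edit(f, "ab"))
```

Comparing `bwt_runs(f)` with `max_r_after_edit(f, "ab")` over growing inputs is only a crude experimental probe for bit catastrophes; the $\Theta(\log n)$ and $\Theta(\sqrt{n})$ results described above come from explicit constructions of string families, not from an exhaustive search like this one.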

Updated: 2024-04-17