Random Access in Persistent Strings and Segment Selection

Bille, Philip; Gørtz, Inge Li

doi:10.1007/s00224-022-10109-5

Published: 17 December 2022

Theory of Computing Systems, volume 67, pages 694–713 (2023)

Abstract

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where each edge is labeled by an edit operation (inserting, deleting, or replacing a character) and each node represents the string obtained by applying the sequence of edit operations on the path from the root to that node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly repetitive collections. Given a tree with \(n\) nodes, we show how to represent the corresponding collection in \(O(n)\) space with \(O(\log n/ \log \log n)\) query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower bound proving that the query time is optimal for any solution using near-linear space. To achieve our bounds for random access in persistent strings, we show how to reduce the problem to the following natural geometric selection problem on line segments. Consider a set of horizontal line segments in the plane. Given parameters \(i\) and \(j\), a segment selection query returns the \(j\)th smallest segment (the segment with the \(j\)th smallest y-coordinate) among the segments crossing the vertical line through x-coordinate \(i\). The segment selection problem is to preprocess a set of horizontal line segments into a compact data structure that supports fast segment selection queries. We present a solution that uses \(O(n)\) space and supports segment selection queries in \(O(\log n/ \log \log n)\) time, where \(n\) is the number of segments. Furthermore, we prove that this query time is also optimal for any solution using near-linear space.
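To make the two query types in the abstract concrete, the following is a brute-force sketch of their semantics. It is not the data structure from the paper: it uses time linear in the path length or the number of segments per query, and it does not show the reduction between the two problems. The names `apply_edit`, `node_string`, `access`, and `segment_select`, the 0-indexed positions, the empty root string, and the closed-interval convention for "crossing" are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: an edit is (operation, position, character),
# with 0-indexed positions. The character is ignored for deletions.
Edit = Tuple[str, int, str]

def apply_edit(s: str, edit: Edit) -> str:
    """Apply a single insert, delete, or replace operation to s."""
    op, pos, ch = edit
    if op == "insert":
        return s[:pos] + ch + s[pos:]
    if op == "delete":
        return s[:pos] + s[pos + 1:]
    if op == "replace":
        return s[:pos] + ch + s[pos + 1:]
    raise ValueError(f"unknown operation: {op}")

def node_string(parent: List[Optional[int]], edit: List[Optional[Edit]],
                v: int, root_string: str = "") -> str:
    """Replay the edits on the root-to-v path (assumes the root has parent None)."""
    path = []
    while parent[v] is not None:
        path.append(edit[v])  # edit[v] labels the edge (parent[v], v)
        v = parent[v]
    s = root_string
    for e in reversed(path):  # edits were collected bottom-up
        s = apply_edit(s, e)
    return s

def access(parent: List[Optional[int]], edit: List[Optional[Edit]],
           v: int, i: int) -> str:
    """Naive random access: character i of the string at node v."""
    return node_string(parent, edit, v)[i]

# A horizontal segment is (x_left, x_right, y); it is assumed to cross the
# vertical line at x = i when x_left <= i <= x_right.
Segment = Tuple[int, int, int]

def segment_select(segments: List[Segment], i: int, j: int) -> Optional[int]:
    """Return the j-th smallest y (1-indexed) among segments crossing x = i."""
    crossing = sorted(y for (xl, xr, y) in segments if xl <= i <= xr)
    return crossing[j - 1] if 1 <= j <= len(crossing) else None

# Example tree: node 0 is the root; edit[v] labels the edge into node v.
parent = [None, 0, 1, 1, 2]
edit = [None, ("insert", 0, "a"), ("insert", 1, "b"),
        ("insert", 0, "c"), ("replace", 0, "x")]
print(node_string(parent, edit, 4))  # string at node 4
print(segment_select([(0, 5, 3), (2, 8, 1), (4, 6, 2), (7, 9, 0)], 5, 2))
```

A baseline like this is mainly useful as a correctness oracle when testing a compact representation: both should return the same answer for every node, position, and selection query.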

References

[1] Agarwal, P.K., Arge, L., Kaplan, H., Molad, E., Tarjan, R.E., Yi, K.: An optimal dynamic data structure for stabbing-semigroup queries. SIAM J. Comput. 41(1), 104–127 (2012)
[2] Barbay, J., Claude, F., Gagie, T., Navarro, G., Nekrich, Y.: Efficient fully-compressed sequence representations. Algorithmica 69(1), 232–268 (2014)
[3] Barbay, J., He, M., Munro, J.I., Rao, S.S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: Proc. 18th SODA, pp. 680–689 (2007)
[4] Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y.: Access, rank, and select in grammar-compressed strings. In: Proc. 23rd ESA, pp. 142–154 (2015)
[5] Belazzougui, D., Navarro, G.: Optimal lower and upper bounds for representing sequences. ACM Trans. Algorithms 11(4), 1–21 (2015)
[6] Bille, P., Christiansen, A.R., Cording, P.H., Gørtz, I.L., Skjoldjensen, F.R., Vildhøj, H.W., Vind, S.: Dynamic relative compression, dynamic partial sums, and substring concatenation. Algorithmica 80(11), 3207–3224 (2018). Announced at ISAAC 2016
[7] Bille, P., Christiansen, A.R., Prezza, N., Skjoldjensen, F.R.: Succinct partial sums and Fenwick trees. In: Proc. 24th SPIRE, pp. 91–96 (2017)
[8] Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time–space trade-offs for Lempel–Ziv compressed indexing. Theoret. Comput. Sci. 713, 66–77 (2018)
[9] Bille, P., Gørtz, I.L.: Random access in persistent strings. In: Proc. 31st ISAAC (2020)
[10] Bille, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Tree compression with top trees. Inform. Comput. 243, 166–177 (2015)
[11] Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). Announced at SODA 2011
[12] Chan, T.M.: Persistent predecessor search and orthogonal point location on the word RAM. ACM Trans. Algorithms 9(3), 1–22 (2013)
[13] Chan, T.M., Pătraşcu, M.: Transdichotomous results in computational geometry, I: Point location in sublogarithmic time. SIAM J. Comput. 39(2), 703–729 (2009)
[14] Chan, T.M., Tsakalidis, K.: Dynamic planar orthogonal point location in sublogarithmic time. In: Proc. 34th SoCG (2018)
[15] Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inform. Theory 51(7), 2554–2576 (2005)
[16] Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)
[17] Chern, B., Ochoa, I., Manolakos, A., No, A., Venkat, K., Weissman, T.: Reference based genome compression. In: Proc. 12th ITW, pp. 427–431 (2012)
[18] De Berg, M., van Kreveld, M., Snoeyink, J.: Two-dimensional and three-dimensional point location in rectangular subdivisions. J. Algorithms 18(2), 256–277 (1995)
[19] Dietz, P.F.: Fully persistent arrays (extended abstract). In: Proceedings of the Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science, vol. 382, pp. 67–74 (1989)
[20] Dietz, P.F.: Optimal algorithms for list indexing and subset rank. In: Proc. 1st WADS, pp. 39–46 (1989)
[21] Do, H.H., Jansson, J., Sadakane, K., Sung, W.K.: Fast relative Lempel–Ziv self-index for similar sequences. Theoret. Comput. Sci. 532, 14–30 (2014)
[22] Driscoll, J., Sarnak, N., Sleator, D., Tarjan, R.: Making data structures persistent. J. Comput. System Sci. 38, 86–124 (1989)
[23] Fenwick, P.M.: A new data structure for cumulative frequency tables. Software: Pract. Exper. 24(3), 327–336 (1994)
[24] Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)
[25] Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theoret. Comput. Sci. 372(1), 115–121 (2007)
[26] Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proc. 21st STOC, pp. 345–354 (1989)
[27] Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. System Sci. 47(3), 424–436 (1993)
[28] Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci. 48(3), 533–551 (1994)
[29] Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proc. 6th LATA, pp. 240–251 (2012)
[30] Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proc. 14th LATIN, pp. 731–742 (2014)
[31] Gagie, T., Gawrychowski, P., Puglisi, S.J.: Approximate pattern matching in LZ77-compressed texts. J. Discrete Algorithms 32, 64–68 (2015)
[32] Gagie, T., Karhu, K., Navarro, G., Puglisi, S.J., Sirén, J.: Document listing on repetitive collections. In: Proc. 24th CPM, pp. 107–119 (2013)
[33] Ganardi, M., Jez, A., Lohrey, M.: Balancing straight-line programs. In: Proc. 60th FOCS, pp. 1169–1183 (2019)
[34] Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: Proc. 17th SODA, pp. 368–373 (2006)
[35] Golynski, A., Raman, R., Rao, S.S.: On the redundancy of succinct data structures. In: Proc. 11th SWAT, pp. 148–159 (2008)
[36] Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)
[37] Grossi, R., Raman, R., Rao, S.S., Venturini, R.: Dynamic compressed strings with random access. In: Proc. 40th ICALP, pp. 504–515 (2013)
[38] Hon, W.K., Sadakane, K., Sung, W.K.: Succinct data structures for searchable partial sums with optimal worst-case performance. Theoret. Comput. Sci. 412(39), 5176–5186 (2011)
[39] Hoobin, C., Puglisi, S.J., Zobel, J.: Relative Lempel–Ziv factorization for efficient storage and retrieval of web collections. Proc. VLDB Endowment 5(3), 265–273 (2011)
[40] Jørgensen, A.G., Larsen, K.G.: Range selection and median: tight cell probe lower bounds and adaptive data structures. In: Proc. 22nd SODA, pp. 805–813 (2011)
[41] Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proc. 50th STOC, pp. 827–840 (2018)
[42] Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel–Ziv compression of genomes for large-scale storage and retrieval. In: Proc. 17th SPIRE, pp. 201–206 (2010)
[43] Kuruppu, S., Puglisi, S.J., Zobel, J.: Optimized relative Lempel–Ziv compression of genomes. In: Proc. 34th ACSC, pp. 91–98 (2011)
[44] Liao, S.Y., Devadas, S., Keutzer, K.: A text-compression-based method for code size minimization in embedded systems. Trans. Design Autom. Electr. Syst. 4(1), 12–38 (1999)
[45] Liao, S.Y., Devadas, S., Keutzer, K., Tjiang, S.W.K., Wang, A.: Code optimization techniques in embedded DSP microprocessors. Design. Autom. Emb. Sys. 3(1), 59–73 (1998)
[46] Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)
[47] Munro, J.I., Nekrich, Y.: Compressed data structures for dynamic sequences. In: Proc. 23rd ESA, pp. 891–902 (2015)
[48] Navarro, G.: Indexing highly repetitive collections. In: Proc. 23rd IWOCA, pp. 274–279 (2012)
[49] Navarro, G.: Document listing on repetitive collections with guaranteed performance. Theoret. Comput. Sci. 772, 58–72 (2019)
[50] Nekrich, Y.: A dynamic stabbing-max data structure with sub-logarithmic query time. In: Proc. 22nd ISAAC, pp. 170–179 (2011)
[51] Pătraşcu, M., Thorup, M.: Dynamic integer sets with optimal rank, select, and predecessor search. In: Proc. 55th FOCS, pp. 166–175 (2014)
[52] Pătraşcu, M., Demaine, E.D.: Logarithmic lower bounds in the cell-probe model. SIAM J. Comput. 35(4), 932–963 (2006). Announced at SODA 2004
[53] Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Proc. 7th WADS, pp. 426–437 (2001)
[54] Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1-3), 211–222 (2003)
[55] Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. 17th SODA, pp. 1230–1239 (2006)
[56] Sarnak, N., Tarjan, R.E.: Planar point location using persistent search trees. Commun. ACM 29(7), 669–679 (1986)
[57] Storer, J.A., Szymanski, T.G.: The macro model for data compression. In: Proc. 10th STOC, pp. 30–39 (1978)
[58] Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)
[59] Tarjan, R.E., Vishkin, U.: Finding biconnected components and computing tree functions in logarithmic parallel time. In: Proc. 25th FOCS, pp. 12–20 (1984)
[60] Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proc. 24th CPM, pp. 247–258 (2013)
[61] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23(3), 337–343 (1977)

Acknowledgements

We thank Jesper Jansson for pointing out segment selection as a problem of independent interest and the anonymous reviewers for their helpful comments that improved the presentation of earlier versions of this paper.

Author information

Authors and Affiliations

DTU Compute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
Philip Bille & Inge Li Gørtz

Corresponding author

Correspondence to Philip Bille.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An extended abstract appeared at the 31st International Symposium on Algorithms and Computation [9].

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bille, P., Gørtz, I.L. Random Access in Persistent Strings and Segment Selection. Theory Comput Syst 67, 694–713 (2023). https://doi.org/10.1007/s00224-022-10109-5

Accepted: 09 November 2022
Published: 17 December 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00224-022-10109-5
