Random Access in Persistent Strings and Segment Selection

Theory of Computing Systems

Abstract

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where each edge is labeled by an edit operation (inserting, deleting, or replacing a character) and each node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly repetitive collections. Given a tree with n nodes, we show how to represent the corresponding collection in O(n) space with \(O(\log n / \log \log n)\) query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower bound proving that the query time is optimal for any solution using near-linear space. To achieve our bounds for random access in persistent strings, we show how to reduce the problem to the following natural geometric selection problem on line segments. Consider a set of horizontal line segments in the plane. Given parameters i and j, a segment selection query returns the jth smallest segment (the segment with the jth smallest y-coordinate) among the segments crossing the vertical line through x-coordinate i. The segment selection problem is to preprocess a set of horizontal line segments into a compact data structure that supports fast segment selection queries. We present a solution that uses O(n) space and supports segment selection queries in \(O(\log n / \log \log n)\) time, where n is the number of segments. Furthermore, we prove that this query time is also optimal for any solution using near-linear space.
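To make the two problem statements concrete, the following is a minimal brute-force sketch in Python of both query types. It is only a reference baseline, not the data structure from the paper: each query takes time linear in the path length or in the number of segments, rather than \(O(\log n / \log \log n)\). The names `Edit`, `node_string`, `access`, and `segment_select` are hypothetical, and the sketch assumes that the root represents the empty string, that positions are 0-based, that segments are closed intervals given as `(x_left, x_right, y)`, and that j is 1-based.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Edit:
    kind: str        # "insert", "delete", or "replace"
    pos: int         # 0-based position in the string (assumed convention)
    char: str = ""   # unused for "delete"


def apply_edit(s: str, e: Edit) -> str:
    """Apply a single edit operation to the string s."""
    if e.kind == "insert":
        return s[:e.pos] + e.char + s[e.pos:]
    if e.kind == "delete":
        return s[:e.pos] + s[e.pos + 1:]
    if e.kind == "replace":
        return s[:e.pos] + e.char + s[e.pos + 1:]
    raise ValueError(f"unknown edit kind: {e.kind}")


def node_string(parent: List[Optional[int]],
                edit: List[Optional[Edit]], v: int) -> str:
    """String of node v: replay the edits on the root-to-v path.

    parent[v] is the parent of v (None for the root) and edit[v] is the
    edit labeling the edge (parent[v], v). The root is assumed to be "".
    """
    path = []
    while parent[v] is not None:
        path.append(edit[v])
        v = parent[v]
    s = ""
    for e in reversed(path):  # apply the edits in root-to-node order
        s = apply_edit(s, e)
    return s


def access(parent: List[Optional[int]],
           edit: List[Optional[Edit]], v: int, i: int) -> str:
    """Random access: character i of the string represented by node v."""
    return node_string(parent, edit, v)[i]


Segment = Tuple[int, int, int]  # (x_left, x_right, y)


def segment_select(segments: List[Segment], i: int, j: int) -> Segment:
    """Segment with the j-th smallest y among those crossing x = i."""
    crossing = sorted((s for s in segments if s[0] <= i <= s[1]),
                      key=lambda s: s[2])
    if not 1 <= j <= len(crossing):
        raise IndexError("fewer than j segments cross the vertical line")
    return crossing[j - 1]
```

A baseline like this is mainly useful for checking a faster implementation against small inputs; it stores nothing beyond the tree and the segment list.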

References

  1. Agarwal, P.K., Arge, L., Kaplan, H., Molad, E., Tarjan, R.E., Yi, K.: An optimal dynamic data structure for stabbing-semigroup queries. SIAM J. Comput. 41(1), 104–127 (2012)

  2. Barbay, J., Claude, F., Gagie, T., Navarro, G., Nekrich, Y.: Efficient fully-compressed sequence representations. Algorithmica 69(1), 232–268 (2014)

  3. Barbay, J., He, M., Munro, J.I., Rao, S.S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: Proc. 18th SODA, pp. 680–689 (2007)

  4. Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y.: Access, rank, and select in grammar-compressed strings. In: Proc. 23rd ESA, pp. 142–154 (2015)

  5. Belazzougui, D., Navarro, G.: Optimal lower and upper bounds for representing sequences. ACM Trans. Algorithms 11(4), 1–21 (2015)

  6. Bille, P., Christiansen, A.R., Cording, P.H., Gørtz, I.L., Skjoldjensen, F.R., Vildhøj, H.W., Vind, S.: Dynamic relative compression, dynamic partial sums, and substring concatenation. Algorithmica 80(11), 3207–3224 (2018). Announced at ISAAC 2016

  7. Bille, P., Christiansen, A.R., Prezza, N., Skjoldjensen, F.R.: Succinct partial sums and Fenwick trees. In: Proc. 24th SPIRE, pp. 91–96 (2017)

  8. Bille, P., Ettienne, M.B., Gørtz, I.L., Vildhøj, H.W.: Time–space trade-offs for Lempel–Ziv compressed indexing. Theoret. Comput. Sci. 713, 66–77 (2018)

  9. Bille, P., Gørtz, I.L.: Random access in persistent strings. In: Proc. 31st ISAAC (2020)

  10. Bille, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Tree compression with top trees. Inform. Comput. 243, 166–177 (2015)

  11. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). Announced at SODA 2011

  12. Chan, T.M.: Persistent predecessor search and orthogonal point location on the word RAM. ACM Trans. Algorithms 9(3), 1–22 (2013)

  13. Chan, T.M., Pătraşcu, M.: Transdichotomous results in computational geometry, I: Point location in sublogarithmic time. SIAM J. Comput. 39(2), 703–729 (2009)

  14. Chan, T.M., Tsakalidis, K.: Dynamic planar orthogonal point location in sublogarithmic time. In: Proc. 34th SoCG (2018)

  15. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inform. Theory 51(7), 2554–2576 (2005)

  16. Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)

  17. Chern, B., Ochoa, I., Manolakos, A., No, A., Venkat, K., Weissman, T.: Reference based genome compression. In: Proc. 12th ITW, pp. 427–431 (2012)

  18. de Berg, M., van Kreveld, M., Snoeyink, J.: Two-dimensional and three-dimensional point location in rectangular subdivisions. J. Algorithms 18(2), 256–277 (1995)

  19. Dietz, P.F.: Fully persistent arrays (extended abstract). In: Proceedings of the Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science, vol. 382, pp. 67–74 (1989)

  20. Dietz, P.F.: Optimal algorithms for list indexing and subset rank. In: Proc. 1st WADS, pp. 39–46 (1989)

  21. Do, H.H., Jansson, J., Sadakane, K., Sung, W.K.: Fast relative Lempel–Ziv self-index for similar sequences. Theoret. Comput. Sci. 532, 14–30 (2014)

  22. Driscoll, J., Sarnak, N., Sleator, D., Tarjan, R.: Making data structures persistent. J. Comput. System Sci. 38, 86–124 (1989)

  23. Fenwick, P.M.: A new data structure for cumulative frequency tables. Software: Pract. Exper. 24(3), 327–336 (1994)

  24. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)

  25. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theoret. Comput. Sci. 372(1), 115–121 (2007)

  26. Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proc. 21st STOC, pp. 345–354 (1989)

  27. Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. System Sci. 47(3), 424–436 (1993)

  28. Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci. 48(3), 533–551 (1994)

  29. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Proc. 6th LATA, pp. 240–251 (2012)

  30. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Proc. 14th LATIN, pp. 731–742 (2014)

  31. Gagie, T., Gawrychowski, P., Puglisi, S.J.: Approximate pattern matching in LZ77-compressed texts. J. Discrete Algorithms 32, 64–68 (2015)

  32. Gagie, T., Karhu, K., Navarro, G., Puglisi, S.J., Sirén, J.: Document listing on repetitive collections. In: Proc. 24th CPM, pp. 107–119 (2013)

  33. Ganardi, M., Jez, A., Lohrey, M.: Balancing straight-line programs. In: Proc. 60th FOCS, pp. 1169–1183 (2019)

  34. Golynski, A., Munro, J.I., Rao, S.S.: Rank/Select operations on large alphabets: a tool for text indexing. In: Proc. 17th SODA, pp. 368–373 (2006)

  35. Golynski, A., Raman, R., Rao, S.S.: On the redundancy of succinct data structures. In: Proc. 11th SWAT, pp. 148–159 (2008)

  36. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)

  37. Grossi, R., Raman, R., Rao, S.S., Venturini, R.: Dynamic compressed strings with random access. In: Proc. 40th ICALP, pp. 504–515 (2013)

  38. Hon, W.K., Sadakane, K., Sung, W.K.: Succinct data structures for searchable partial sums with optimal worst-case performance. Theoret. Comput. Sci. 412(39), 5176–5186 (2011)

  39. Hoobin, C., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections. Proc. VLDB Endowment 5(3), 265–273 (2011)

  40. Jørgensen, A.G., Larsen, K.G.: Range selection and median: tight cell probe lower bounds and adaptive data structures. In: Proc. 22nd SODA, pp. 805–813 (2011)

  41. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proc. 50th STOC, pp. 827–840 (2018)

  42. Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In: Proc. 17th SPIRE, pp. 201–206 (2010)

  43. Kuruppu, S., Puglisi, S.J., Zobel, J.: Optimized relative Lempel-Ziv compression of genomes. In: Proc. 34th ACSC, pp. 91–98 (2011)

  44. Liao, S.Y., Devadas, S., Keutzer, K.: A text-compression-based method for code size minimization in embedded systems. Trans. Design Autom. Electr. Syst. 4(1), 12–38 (1999)

  45. Liao, S.Y., Devadas, S., Keutzer, K., Tjiang, S.W.K., Wang, A.: Code optimization techniques in embedded DSP microprocessors. Design. Autom. Emb. Sys. 3(1), 59–73 (1998)

  46. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)

  47. Munro, J.I., Nekrich, Y.: Compressed data structures for dynamic sequences. In: Proc. 23rd ESA, pp. 891–902 (2015)

  48. Navarro, G.: Indexing highly repetitive collections. In: Proc. 23rd IWOCA, pp. 274–279 (2012)

  49. Navarro, G.: Document listing on repetitive collections with guaranteed performance. Theoret. Comput. Sci. 772, 58–72 (2019)

  50. Nekrich, Y.: A dynamic stabbing-max data structure with sub-logarithmic query time. In: Proc. 22nd ISAAC, pp. 170–179 (2011)

  51. Pătraşcu, M., Thorup, M.: Dynamic integer sets with optimal rank, select, and predecessor search. In: Proc. 55th FOCS, pp. 166–175 (2014)

  52. Pătraşcu, M., Demaine, E.D.: Logarithmic lower bounds in the cell-probe model. SIAM J. Comput. 35(4), 932–963 (2006). Announced at SODA 2004

  53. Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Proc. 7th WADS, pp. 426–437 (2001)

  54. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1-3), 211–222 (2003)

  55. Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. 17th SODA, pp. 1230–1239 (2006)

  56. Sarnak, N., Tarjan, R.E.: Planar point location using persistent search trees. Commun. ACM 29(7), 669–679 (1986)

  57. Storer, J.A., Szymanski, T.G.: The macro model for data compression. In: Proc. 10th STOC, pp. 30–39 (1978)

  58. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982)

  59. Tarjan, R.E., Vishkin, U.: Finding biconnected components and computing tree functions in logarithmic parallel time. In: Proc. 25th FOCS, pp. 12–20 (1984)

  60. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proc. 24th CPM, pp. 247–258 (2013)

  61. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23(3), 337–343 (1977)

Acknowledgements

We thank Jesper Jansson for pointing out segment selection as a problem of independent interest and the anonymous reviewers for their helpful comments that improved the presentation of earlier versions of this paper.

Author information

Corresponding author

Correspondence to Philip Bille.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An extended abstract appeared at the 31st International Symposium on Algorithms and Computation [9].

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bille, P., Gørtz, I.L. Random Access in Persistent Strings and Segment Selection. Theory Comput Syst 67, 694–713 (2023). https://doi.org/10.1007/s00224-022-10109-5

  • DOI: https://doi.org/10.1007/s00224-022-10109-5
