Threshold Queries

Published: 08 June 2023

Abstract

Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that significantly improves the asymptotic bounds of state-of-the-art algorithms for evaluating them. Our experimental evaluation of this method shows order-of-magnitude performance improvements.
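To make the notion concrete, the following minimal Python sketch counts the answers of a two-relation join only up to a threshold k and stops as soon as k answers have been seen. It illustrates what a threshold query asks for; it is not the evaluation method presented in the article. The relation layout and the names `join_pairs`, `count_up_to`, and `at_least` are assumptions made for this example.

```python
from itertools import count, islice
from typing import Hashable, Iterable, Iterator, Tuple


def join_pairs(
    r: Iterable[Tuple[Hashable, Hashable]],
    s: Iterable[Tuple[Hashable, Hashable]],
) -> Iterator[Tuple[Hashable, Hashable, Hashable]]:
    """Lazily yield (a, b, c) for each (a, b) in r and (b, c) in s (hypothetical helper)."""
    # Build a hash index on the join attribute of s.
    index = {}
    for b, c in s:
        index.setdefault(b, []).append(c)
    # Stream answers one at a time so a consumer can stop early.
    for a, b in r:
        for c in index.get(b, []):
            yield (a, b, c)


def count_up_to(answers: Iterable, k: int) -> int:
    """Count answers, but consume at most k of them."""
    return sum(1 for _ in islice(answers, k))


def at_least(answers: Iterable, k: int) -> bool:
    """Threshold test: are there at least k answers?"""
    return count_up_to(answers, k) >= k


# Toy relations (made-up data for illustration only).
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, "x"), (10, "y"), (20, "z")]

capped = count_up_to(join_pairs(R, S), 3)
enough = at_least(join_pairs(R, S), 6)
# Early termination: the cap applies even to an unbounded answer stream.
from_stream = count_up_to(count(), 4)
```

Because `islice` stops pulling from the generator after k items, the work done after the index is built is bounded by the threshold rather than by the full join size. This naive version still scans all of `s` once to build the index.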

