Abstract
Threshold queries are an important class of queries that require computing or counting answers only up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that can be used to significantly improve the asymptotic bounds of their state-of-the-art evaluation algorithms. Our experimental evaluation of this method shows order-of-magnitude performance improvements.
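To make the notion concrete, here is a minimal Python sketch of what "counting answers up to a threshold" means for a simple two-relation join. It illustrates only the semantics of a threshold query, not the evaluation method the abstract refers to. The names `join_pairs`, `count_up_to`, and `has_at_least` are hypothetical and chosen for this illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def join_pairs(r: Iterable[Pair], s: Iterable[Pair]) -> Iterator[Tuple[int, int, int]]:
    """Lazily yield (a, b, c) for every (a, b) in r and (b, c) in s."""
    # Hash index on the join attribute of s (hypothetical helper structure).
    index: Dict[int, List[int]] = {}
    for b, c in s:
        index.setdefault(b, []).append(c)
    for a, b in r:
        for c in index.get(b, ()):
            yield (a, b, c)


def count_up_to(answers: Iterable[object], threshold: int) -> int:
    """Count answers, but stop consuming once `threshold` answers were seen."""
    return sum(1 for _ in islice(answers, threshold))


def has_at_least(answers: Iterable[object], threshold: int) -> bool:
    """Decide whether there are at least `threshold` answers."""
    return count_up_to(answers, threshold) >= threshold


if __name__ == "__main__":
    r = [(i, i % 3) for i in range(1000)]
    s = [(j % 3, j) for j in range(1000)]
    # The full join is large, but only the first 10 answers are ever produced.
    print(count_up_to(join_pairs(r, s), 10))
    print(has_at_least(join_pairs([(1, 2)], [(2, 3), (2, 4)]), 3))
```

Because the join is a lazy generator, the counter never materializes more answers than the threshold asks for. Building the hash index still takes time linear in the second relation, so this sketch makes no claim about the asymptotic bounds discussed above.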