Threshold Queries

Authors:
Angela Bonifati (Lyon 1 Univ., Liris CNRS)
Stefania Dumbrava (ENSIIE & IP Paris)
George Fletcher (Eindhoven Univ. of Technology)
Jan Hidders (Birkbeck, Univ. of London)
Matthias Hofer (University of Bayreuth)
Wim Martens (University of Bayreuth)
Filip Murlak (Univ. of Warsaw)
Joshua Shinavier (LinkedIn)
Slawek Staworko (RelationalAI & Univ. of Lille)
Dominik Tomaszuk (Univ. of Bialystok)

ACM SIGMOD Record, Volume 52, Issue 1, March 2023, pp. 64–73. https://doi.org/10.1145/3604437.3604452

Published: 8 June 2023

Abstract

Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that can be used to significantly improve the asymptotic bounds of their state-of-the-art evaluation algorithms. Our experimental evaluation of these methods shows order-of-magnitude performance improvements.
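To make the notion of "counting answers up to a threshold" concrete, here is a minimal Python sketch. It is not the method presented in the article; it only illustrates the query semantics with a naive early-exit strategy over a lazily enumerated join. The relation shapes R(a, b) and S(b, c) and the helper names `join_answers`, `count_up_to`, and `at_least` are assumptions made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def join_answers(
    r: List[Tuple[int, int]], s: List[Tuple[int, int]]
) -> Iterator[Tuple[int, int, int]]:
    """Lazily enumerate answers (a, b, c) of the join R(a, b), S(b, c).

    Hypothetical example query; a simple hash join used only for illustration.
    """
    # Index S on its join attribute b.
    index = {}
    for b, c in s:
        index.setdefault(b, []).append(c)
    # Probe the index with each tuple of R, yielding answers one at a time.
    for a, b in r:
        for c in index.get(b, ()):
            yield (a, b, c)


def count_up_to(answers: Iterable, k: int) -> int:
    """Count answers, but consume at most k of them."""
    return sum(1 for _ in islice(answers, k))


def at_least(answers: Iterable, k: int) -> bool:
    """Threshold test: does the query have at least k answers?"""
    return count_up_to(answers, k) >= k


if __name__ == "__main__":
    r = [(1, 2), (3, 2), (4, 5)]
    s = [(2, 7), (2, 8), (5, 9)]
    print(count_up_to(join_answers(r, s), 3))
    print(at_least(join_answers(r, s), 6))
```

Because the answers are produced by a generator, `count_up_to` stops pulling from the join once the threshold is reached. This early exit bounds only the enumeration work after the index is built; it makes no claim about the asymptotic improvements described in the abstract.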



Published in
ACM SIGMOD Record, Volume 52, Issue 1, March 2023, 118 pages
ISSN: 0163-5808
DOI: 10.1145/3604437
Copyright © 2023 Copyright is held by the owner/author(s)
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- article