ACM Transactions on Database Systems: latest papers (Computer Science; Software Engineering journals)

Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-04-10
Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone

Successful data-driven science requires complex data engineering pipelines to clean, transform, and alter data in preparation for machine learning, and robust results can only be achieved when each step in the pipeline can be justified, and its effect on the data explained. In this framework, we aim at providing data scientists with facilities to gain an in-depth understanding of how each step in the

Updated: 2024-04-10

Sharing Queries with Nonequivalent User-defined Aggregate Functions

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-04-10
Chao Zhang, Toumani Farouk

This article presents Sharing User-Defined Aggregate Function (SUDAF), a declarative framework that allows users to write User-defined Aggregate Functions (UDAFs) as mathematical expressions and use them in Structured Query Language statements. SUDAF rewrites partial aggregates of UDAFs using built-in aggregate functions and supports efficient dynamic caching and reusing of partial aggregates. Our
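
As a rough illustration of rewriting a UDAF into built-in partial aggregates (a minimal sketch, not SUDAF's actual rewriting engine), population variance can be computed from COUNT, SUM, and SUM of squares. Such partials can be cached per partition, merged by addition, and reused by other UDAFs such as standard deviation. The names `Partial`, `partial_of`, `variance`, and `std_dev` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Built-in partial aggregates: COUNT(x), SUM(x), SUM(x*x)."""
    count: int
    total: float
    total_sq: float

    def merge(self, other: "Partial") -> "Partial":
        # Partials of disjoint partitions combine by addition,
        # which is what makes them cacheable and reusable.
        return Partial(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def partial_of(values: Iterable[float]) -> Partial:
    vals = list(values)
    return Partial(len(vals), sum(vals), sum(v * v for v in vals))


def variance(p: Partial) -> float:
    # Population variance E[x^2] - (E[x])^2, computed only from the partials.
    mean = p.total / p.count
    return p.total_sq / p.count - mean * mean


def std_dev(p: Partial) -> float:
    # A second UDAF that reuses the same cached partial aggregates.
    return variance(p) ** 0.5
```

For example, `variance(partial_of([1, 2, 3]).merge(partial_of([4, 5])))` combines two cached partitions without revisiting the raw values. The subtraction-based formula is simple but can lose precision on large values; a production system would likely prefer a numerically stabler merge.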

Updated: 2024-04-10

Database Repairing with Soft Functional Dependencies

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-04-10
Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, Muhammad Tibi

A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost)
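
The penalty model described above can be made concrete with a brute-force sketch, assuming the soft constraints are functional dependencies (each violating tuple pair costs the FD's weight) and each excluded tuple costs its own weight. The exhaustive search is exponential and only states the objective; it is not the paper's algorithm, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
FD = Tuple[Tuple[str, ...], str, float]  # (lhs attributes, rhs attribute, weight)


def violations(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str) -> int:
    # A pair of tuples violates lhs -> rhs if it agrees on lhs but not on rhs.
    return sum(
        1
        for a, b in combinations(rows, 2)
        if all(a[x] == b[x] for x in lhs) and a[rhs] != b[rhs]
    )


def penalty(kept: Sequence[int], rows: Sequence[Row],
            costs: Sequence[float], fds: Sequence[FD]) -> float:
    kept_set = set(kept)
    excluded_cost = sum(c for i, c in enumerate(costs) if i not in kept_set)
    kept_rows = [rows[i] for i in kept]
    fd_cost = sum(w * violations(kept_rows, lhs, rhs) for lhs, rhs, w in fds)
    return excluded_cost + fd_cost


def optimal_subset(rows: Sequence[Row], costs: Sequence[float],
                   fds: Sequence[FD]) -> Tuple[List[int], float]:
    # Exhaustive search over all 2^n subsets: exponential, for illustration only.
    n = len(rows)
    best = (list(range(n)), penalty(range(n), rows, costs, fds))
    for mask in range(1 << n):
        kept = [i for i in range(n) if mask >> i & 1]
        p = penalty(kept, rows, costs, fds)
        if p < best[1]:
            best = (kept, p)
    return best
```

Giving an FD an infinite weight (`float("inf")`) recovers the strict-constraint case, where any kept violation is unacceptable.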

Updated: 2024-04-10

The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra Space

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-03-23
Diego Arroyuelo, Adrián Gómez-Brandón, Aidan Hogan, Gonzalo Navarro, Juan Reutter, Javiel Rojas-Ledesma, Adrián Soto

We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a ring, regards each triple as a cyclic string of length 3. Each rotation of the triples is lexicographically sorted and the values of the last attribute are stored as a column, so we obtain the order of the next column by stably re-sorting the
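
The rotation-and-sort step can be sketched as follows, assuming integer-encoded triples and a plain Python list per column. This shows only which columns would be stored; the compact encoding and the navigation between columns that give the ring its space and time bounds are omitted. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]


def rotations(t: Triple) -> List[Triple]:
    # The three cyclic rotations of (s, p, o): spo, pos, osp.
    s, p, o = t
    return [(s, p, o), (p, o, s), (o, s, p)]


def ring_columns(triples: List[Triple]) -> Dict[str, List[int]]:
    """For each rotation, sort the rotated triples and keep only the last column."""
    columns: Dict[str, List[int]] = {}
    for k, name in enumerate(["spo", "pos", "osp"]):
        rotated = sorted(rotations(t)[k] for t in triples)
        columns[name] = [r[2] for r in rotated]
    return columns
```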

Updated: 2024-03-26

Fast Parallel Hypertree Decompositions in Logarithmic Recursion Depth

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-28
Georg Gottlob, Matthias Lanzinger, Cem Okulmus, Reinhard Pichler

Various classic reasoning problems with natural hypergraph representations are known to be tractable if a hypertree decomposition (HD) of low width exists. The resulting algorithms are attractive for practical use in fields like databases and constraint satisfaction. However, algorithmic use of HDs relies on the difficult task of first computing a decomposition of the hypergraph underlying a given

Updated: 2024-02-28

Linking Entities across Relations and Graphs

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-28
Wenfei Fan, Ping Lu, Kehan Pang, Ruochun Jin, Wenyuan Yu

This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking functions and thresholds for measuring vertex closeness, path associations, and important properties as parameters, parametric simulation identifies tuples t in 𝒟 and vertices v in G that refer to the same real-world entity, based on both topological and semantic matching

Updated: 2024-02-28

Ad Hoc Transactions through the Looking Glass: An Empirical Study of Application-Level Transactions in Web Applications

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-28
Zhaoguo Wang, Chuzhe Tang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, Haibo Chen

Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little has been known about them. This paper presents the first comprehensive study on ad hoc

Updated: 2024-02-28

Identifying the Root Causes of DBMS Suboptimality

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-28
Sabah Currim, Richard T. Snodgrass, Young-Kyoon Suh

The query optimization phase within a database management system (DBMS) ostensibly finds the fastest query execution plan from a potentially large set of enumerated plans, all of which correctly compute the same result of the specified query. Sometimes the cost-based optimizer selects a slower plan, for a variety of reasons. Previous work has focused on increasing the performance of specific components

Updated: 2024-02-28

A family of centrality measures for graph data based on subgraphs

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-23
Sebastián Bugedo, Cristian Riveros, Jorge Salas

We present the theoretical foundations and first experimental study of a new approach in centrality measures for graph data. The main principle is straightforward: the more relevant subgraphs around a vertex, the more central it is in the network. We formalize the notion of “relevant subgraphs” by choosing a family of subgraphs that, given a graph G and a vertex v, assigns a subset of connected subgraphs

Updated: 2024-02-23

GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2024-02-20
David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, Daniel Delayo, J. Ahmed Dellas, Martín Farach-Colton, Tyler Seip, Kenny Zhang

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components problem

Updated: 2024-02-21

Partial Order Multiway Search

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-11-13
Lu Shangqi, Wim Martens, Matthias Niewerth, Yufei Tao

Partial order multiway search (POMS) is a fundamental problem that finds applications in crowdsourcing, distributed file systems, software testing, and more. This problem involves an interaction between an algorithm 𝒜 and an oracle, conducted on a directed acyclic graph 𝒢 known to both parties. Initially, the oracle selects a vertex t in 𝒢 called the target. Subsequently, 𝒜 must identify the target

Updated: 2023-11-13

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-11-13
Herodotos Herodotou, Elena Kakoulli

The use of storage tiering is becoming popular in data-intensive compute clusters due to the recent advancements in storage technologies. The Hadoop Distributed File System, for example, now supports storing data in memory, SSDs, and HDDs, while OctopusFS and hatS offer fine-grained storage tiering solutions. However, current big data platforms (such as Hadoop and Spark) are not exploiting the presence

Updated: 2023-11-13

DomainNet: Homograph Detection and Understanding in Data Lake Disambiguation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-09-12
Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

Modern data lakes are heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: How can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied in computational linguistics, data management, and data science, we show that

Updated: 2023-09-15

Model Counting Meets F0 Estimation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-08-09
A. Pavan, N. V. Vinodchandran, Arnab Bhattacharyya, Kuldeep S. Meel

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities

Updated: 2023-08-09

Enabling Timely and Persistent Deletion in LSM-Engines

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-08-09
Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis

Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete operation inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art
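
The tombstone mechanism described above can be illustrated with a toy store, assuming an in-memory buffer that flushes into immutable sorted runs and a single full compaction. It shows why a deleted key stays physically present until compaction, not the paper's techniques for bounding that delay. `TinyLSM` and its methods are hypothetical.

```python
from typing import Dict, List, Optional

TOMBSTONE = object()  # sentinel value marking a logically deleted key


class TinyLSM:
    """Toy LSM store: an in-memory buffer plus immutable sorted runs, newest first."""

    def __init__(self, buffer_limit: int = 2) -> None:
        self.buffer: Dict[str, object] = {}
        self.runs: List[Dict[str, object]] = []  # runs[0] is the newest run
        self.buffer_limit = buffer_limit

    def put(self, key: str, value: str) -> None:
        self._write(key, value)

    def delete(self, key: str) -> None:
        # Out-of-place delete: older versions remain, shadowed by a tombstone.
        self._write(key, TOMBSTONE)

    def _write(self, key: str, value: object) -> None:
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.runs.insert(0, dict(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, key: str) -> Optional[str]:
        for level in [self.buffer, *self.runs]:  # newest data first
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else str(value)
        return None

    def stored_entries(self, key: str) -> int:
        # How many physical entries (values or tombstones) still exist for key.
        return sum(key in level for level in [self.buffer, *self.runs])

    def compact(self) -> None:
        # Merge all runs; newer entries win. Tombstones can be dropped here
        # because no older run is left for them to shadow.
        merged: Dict[str, object] = {}
        for run in reversed(self.runs):  # oldest first, so newer entries overwrite
            merged.update(run)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.runs = [live] if live else []
```

After `delete("a")`, `get("a")` already returns `None`, but `stored_entries("a")` stays above zero until `compact()` runs; that gap between logical and physical deletion is the problem the abstract points to.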

Updated: 2023-08-09

Efficient Bi-objective SQL Optimization for Enclaved Cloud Databases with Differentially Private Padding

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-06-26
Yaxing Chen, Qinghua Zheng, Zheng Yan

Hardware-enabled enclaves have been applied to efficiently enforce data security and privacy protection in cloud database services. Such enclaved systems, however, are reported to suffer from I/O-size (also referred to as communication-volume)-based side-channel attacks. Although differentially private padding has been exploited as a principal method to defend against these attacks, it introduces a challenging

Updated: 2023-06-27

Model Counting meets F0 Estimation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-06-20
A. Pavan, N. V. Vinodchandran, Arnab Bhattacharyya, Kuldeep S. Meel

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities

Updated: 2023-06-20

Enabling Timely and Persistent Deletion in LSM-Engines

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-06-08
Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis

Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete operation inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art

Updated: 2023-06-08

Proportionality on Spatial Data with Context

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-05-13
Georgios J. Fakas, Georgios Kalamatianos

More often than not, spatial objects are associated with some context, in the form of text, descriptive tags (e.g., points of interest, flickr photos), or linked entities in semantic graphs (e.g., Yago2, DBpedia). Hence, location-based retrieval should be extended to consider not only the locations but also the context of the objects, especially when the retrieved objects are too many and the query

Updated: 2023-05-13

Reversible Database Watermarking Based on Order-preserving Encryption for Data Sharing

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-05-13
Donghui Hu, Qing Wang, Song Yan, Xiaojun Liu, Meng Li, Shuli Zheng

In the era of big data, data sharing not only boosts the economy of the world but also brings about problems of privacy disclosure and copyright infringement. The collected data may contain users’ sensitive information; thus, privacy protection should be applied to the data prior to them being shared. Moreover, the shared data may be re-shared to third parties without the consent or awareness of the

Updated: 2023-05-13

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-03-13
Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is

Updated: 2023-03-15

Robust and Efficient Sorting with Offset-value Coding

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-03-13
Thanh Do, Goetz Graefe

Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup, and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding
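
Offset-value coding, as generally described, encodes a key relative to a smaller base key by the length of their shared prefix (the offset) and the key's column value at that position. For ascending order and fixed-length keys that are all at least the base, two codes usually decide a comparison without touching the remaining columns. The sketch below recomputes codes on the fly for clarity, whereas a real sort would carry them along; the function names are hypothetical and this is not the authors' implementation.

```python
from typing import Optional, Tuple

Key = Tuple[int, ...]


def offset_value_code(key: Key, base: Key) -> Tuple[int, Optional[int]]:
    """Offset = length of the prefix shared with base; value = key's column there."""
    offset = 0
    while offset < len(key) and key[offset] == base[offset]:
        offset += 1
    value = key[offset] if offset < len(key) else None  # None: key equals base
    return offset, value


def compare_with_codes(a: Key, b: Key, base: Key) -> int:
    """Compare fixed-length keys a, b >= base in ascending order; return -1, 0, or 1."""
    oa, va = offset_value_code(a, base)
    ob, vb = offset_value_code(b, base)
    if oa != ob:
        # Both keys are >= base; the one sharing a longer prefix with base is smaller.
        return -1 if oa > ob else 1
    if va is None or vb is None:  # equal offsets at full length: both equal base
        return 0
    if va != vb:
        return -1 if va < vb else 1
    # Codes tie: only now compare the columns after the shared offset.
    for x, y in zip(a[oa + 1:], b[oa + 1:]):
        if x != y:
            return -1 if x < y else 1
    return 0
```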

Updated: 2023-03-14

Efficiently Cleaning Structured Event Logs: A Graph Repair Approach

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-03-13
Ruihong Huang, Jianmin Wang, Shaoxu Song, Xuemin Lin, Xiaochen Zhu, Jian Pei

Event data are often dirty owing to various recording conventions or simply system errors. These errors may cause serious damage to real applications, such as inaccurate provenance answers, poor profiling results, or the concealment of interesting patterns in event data. Cleaning dirty event data is therefore in strong demand. While existing event data cleaning techniques view event logs as sequences, structural

Updated: 2023-03-14

Efficient Sorting, Duplicate Removal, Grouping, and Aggregation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2023-01-06
Thanh Do, Goetz Graefe, Jeffrey Naughton

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external merge sort; and hash aggregation relies on an in-memory hash table plus hash partitioning to temporary storage. Cost-based query optimization chooses which algorithm
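
In-stream aggregation, the first of the three algorithms listed, can be sketched in a few lines under the assumption that input rows are `(key, value)` pairs already sorted on the key. Only the current group's accumulator is held in memory, and each group is emitted as soon as the key changes; the explicit check shows why sorted input is a precondition. `in_stream_sum` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional, Tuple


def in_stream_sum(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """SUM per key over input already sorted on the key, holding one group at a time."""
    current: Optional[str] = None
    acc = 0
    for key, value in rows:
        if current is not None and key < current:
            raise ValueError("in-stream aggregation requires input sorted on the key")
        if current is not None and key != current:
            yield current, acc  # the key changed, so the previous group is complete
            acc = 0
        current = key
        acc += value
    if current is not None:
        yield current, acc
```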

Updated: 2023-01-07

Proximity Queries on Terrain Surface

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-12-16
Victor Junqiu Wei, Raymond Chi-Wing Wong, Cheng Long, David Mount, Hanan Samet

Due to advances in geo-spatial positioning and computer graphics technology, digital terrain data has become increasingly popular. Query processing on terrain data has attracted considerable attention from both academia and industry. Proximity queries such as the shortest path/distance query, k nearest/farthest neighbor query, and top-k closest/farthest pairs

Updated: 2022-12-16

Deciding Robustness for Lower SQL Isolation Levels

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-11-06
Bas Ketsman, Christoph Koch, Frank Neven, Brecht Vandevoort

While serializability always guarantees application correctness, lower isolation levels can be chosen to improve transaction throughput at the risk of introducing certain anomalies. A set of transactions is robust against a given isolation level if every possible interleaving of the transactions under the specified isolation level is serializable. Robustness therefore always guarantees application

Updated: 2022-11-06

Conjunctive Queries: Unique Characterizations and Exact Learnability

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-11-06
Balder Ten Cate, Victor Dalmau

We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism

Updated: 2022-11-06

Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-08-18
Nofar Carmeli, Shai Zeevi, Christoph Berkholz, Alessio Conte, Benny Kimelfeld, Nicole Schweikardt

As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports

Updated: 2022-08-18

On Finding Rank Regret Representatives

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-08-18
Abolfazl Asudeh, Gautam Das, H. V. Jagadish, Shangqi Lu, Azade Nazi, Yufei Tao, Nan Zhang, Jianwen Zhao

Selecting the best items in a dataset is a common task in data exploration. However, the concept of “best” lies in the eyes of the beholder: Different users may consider different attributes more important and, hence, arrive at different rankings. Nevertheless, one can remove “dominated” items and create a “representative” subset of the data, comprising the “best items” in it. A Pareto-optimal representative
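
Removing "dominated" items yields the Pareto-optimal set (the skyline). A quadratic sketch, assuming numeric attributes where larger is better, is shown below. It illustrates only the dominance filter the abstract mentions; the rank-regret representative studied in the paper is a different, typically much smaller subset. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Item = Tuple[float, ...]


def dominates(a: Item, b: Item) -> bool:
    # a dominates b: at least as good on every attribute, strictly better on one
    # (here "better" means larger).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(items: Sequence[Item]) -> List[Item]:
    """Pareto-optimal items: those not dominated by any other item (O(n^2) scan)."""
    return [p for p in items if not any(dominates(q, p) for q in items)]
```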

Updated: 2022-08-18

Persistent Summaries

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-08-18
Tianjing Zeng, Zhewei Wei, Ge Luo, Ke Yi, Xiaoyong Du, Ji-Rong Wen

A persistent data structure, also known as a multiversion data structure in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried
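
The general idea of a persistent (multiversion) structure can be sketched with a path-copying binary search tree: every update returns a new root, copies only the nodes on the search path, and shares the rest with earlier versions, so any old version remains queryable. This is a generic textbook technique, not the persistent summaries proposed in the paper; deletion is omitted, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return a new root; only the nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # change an existing record


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


class VersionedMap:
    def __init__(self) -> None:
        self.versions: List[Optional[Node]] = [None]  # version 0 is empty

    def put(self, key: int, value: str) -> int:
        self.versions.append(insert(self.versions[-1], key, value))
        return len(self.versions) - 1  # id of the newly created version

    def get(self, key: int, version: int) -> Optional[str]:
        return lookup(self.versions[version], key)
```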

Updated: 2022-08-18

Influence Maximization Revisited: Efficient Sampling with Bound Tightened

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-08-18
Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, Jing Tang

Given a social network G with n nodes and m edges, a positive integer k, and a cascade model C, the influence maximization (IM) problem asks for k nodes in G such that the expected number of nodes influenced by the k nodes under cascade model C is maximized. The state-of-the-art approximate solutions run in O(k(n+m) log n/ε²) expected time while returning a (1 − 1/e − ε) approximate solution with at

Updated: 2022-08-18

Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-06-25
Nofar Carmeli, Shai Zeevi, Christoph Berkholz, Alessio Conte, Benny Kimelfeld, Nicole Schweikardt

As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports

Updated: 2022-06-27

Persistent Data Sketching

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-23

A persistent data structure, also known as a multiversion data structure in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried

Updated: 2022-05-23

Conjunctive Regular Path Queries with Capture Groups

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-23
Markus L. Schmid

In practice, regular expressions are usually extended by so-called capture groups or capture variables, which allow one to capture a subexpression by a variable that can be referenced in the regular expression in order to describe repetitions of subwords. We investigate how this concept could be used for pattern-based graph querying; i.e., we investigate conjunctive regular path queries (CRPQs) that are

Updated: 2022-05-23

Incremental Graph Computations: Doable and Undoable

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-23
Wenfei Fan, Chao Tian

The incremental problem for a class \( {\mathcal {Q}} \) of graph queries aims to compute, given a query \( Q \in {\mathcal {Q}} \), graph G, answers Q(G) to Q in G and updates ΔG to G as input, changes ΔO to output Q(G) such that Q(G⊕ΔG) = Q(G)⊕ΔO. It is called bounded if its cost can be expressed as a polynomial function in the sizes of Q, ΔG and ΔO, which reduces the computations on possibly big

Updated: 2022-05-23

Mining Order-preserving Submatrices under Data Uncertainty: A Possible-world Approach and Efficient Approximation Methods

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-23
Ji Cheng, Da Yan, Wenwen Qu, Xiaotian Hao, Cheng Long, Wilfred Ng, Xiaoling Wang

Given a data matrix \( D \), a submatrix \( S \) of \( D \) is an order-preserving submatrix (OPSM) if there is a permutation of the columns of \( S \), under which the entry values of each row in \( S \) are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes and finding customers with similar preference. However, noise is ubiquitous in real
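
The OPSM definition above can be checked directly for a given submatrix S (rows and columns already selected). If a column permutation making every row strictly increasing exists, it must be the sorted order of the first row, so it suffices to derive it from row 0 and verify the other rows. This is a deterministic sketch of the definition only; it does not model the uncertain-data setting studied in the paper. `opsm_permutation` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def opsm_permutation(rows: Sequence[Sequence[float]]) -> Optional[List[int]]:
    """Return a column order under which every row is strictly increasing, or None."""
    if not rows:
        return []
    order = sorted(range(len(rows[0])), key=lambda j: rows[0][j])
    for row in rows:
        # Ties or inversions between consecutive columns break the ordering.
        if any(row[a] >= row[b] for a, b in zip(order, order[1:])):
            return None
    return order
```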

Updated: 2022-05-23

Optimal Joins Using Compressed Quadtrees

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-23
Diego Arroyuelo, Gonzalo Navarro, Juan L. Reutter, Javiel Rojas-Ledesma

Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now count several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality one either needs to build completely new indexes or must

Updated: 2022-05-23

Influence Maximization Revisited: Efficient Sampling with Bound Tightened

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-19
Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, Jing Tang

Given a social network G with n nodes and m edges, a positive integer k, and a cascade model \(\mathcal{C}\), the influence maximization (IM) problem asks for k nodes in G such that the expected number of nodes influenced by the k nodes under cascade model \(\mathcal{C}\) is maximized. The state-of-the-art approximate solutions run in O(k(n + m) log n/ϵ²) expected time while returning a (1 − 1/e

Updated: 2022-05-19

Unified Route Planning for Shared Mobility: An Insertion-based Framework

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-05-02
Yongxin Tong, Yuxiang Zeng, Zimu Zhou, Lei Chen, Ke Xu

There has been a dramatic growth of shared mobility applications such as ride-sharing, food delivery, and crowdsourced parcel delivery. Shared mobility refers to transportation services that are shared among users, where a central issue is route planning. Given a set of workers and requests, route planning finds for each worker a route, i.e., a sequence of locations to pick up and drop off passengers/parcels
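
The abstract is cut off before the framework is described, so the following only sketches the generic insertion operator suggested by the title, under stated assumptions: a single worker, Euclidean distances, a route whose first stop is the worker's current location, and no capacity or deadline constraints. It tries every pickup/dropoff position pair and keeps the one that adds the least travel distance; `route_length` and `best_insertion` are hypothetical names.

```python
import math
from typing import List, Tuple

Loc = Tuple[float, float]


def route_length(route: List[Loc]) -> float:
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def best_insertion(route: List[Loc], pickup: Loc, dropoff: Loc) -> Tuple[List[Loc], float]:
    """Insert pickup before dropoff at the positions that least increase route length.

    route[0] is the worker's current location and stays first. Brute force over
    all O(n^2) position pairs; capacity and deadline constraints are ignored.
    """
    base = route_length(route)
    best_route, best_cost = list(route), math.inf
    for i in range(1, len(route) + 1):
        with_pickup = route[:i] + [pickup] + route[i:]
        for j in range(i + 1, len(with_pickup) + 1):
            candidate = with_pickup[:j] + [dropoff] + with_pickup[j:]
            added = route_length(candidate) - base
            if added < best_cost:
                best_route, best_cost = candidate, added
    return best_route, best_cost
```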

Updated: 2022-05-02

On Finding Rank Regret Representatives

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-04-27
Abolfazl Asudeh, Gautam Das, H. V. Jagadish, Shangqi Lu, Azade Nazi, Yufei Tao, Nan Zhang, Jianwen Zhao

Selecting the best items in a dataset is a common task in data exploration. However, the concept of “best” lies in the eyes of the beholder: different users may consider different attributes more important and, hence, arrive at different rankings. Nevertheless, one can remove “dominated” items and create a “representative” subset of the data, comprising the “best items” in it. A Pareto-optimal representative

Updated: 2022-04-28

Height Optimized Tries

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-04-06
Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht, Viktor Leis

We present the Height Optimized Trie (HOT), a fast and space-efficient in-memory index structure. The core algorithmic idea of HOT is to dynamically vary the number of bits considered at each node, which enables a consistently high fanout and thereby good cache efficiency. For a fixed maximum node fanout, the overall tree height is minimal and its structure is deterministically defined. Multiple carefully

Updated: 2022-04-06

The Space-Efficient Core of Vadalog

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-04-06
Gerald Berger, Georg Gottlob, Andreas Pieris, Emanuel Sallinger

Vadalog is a system for performing complex reasoning tasks such as those required in advanced knowledge graphs. The logical core of the underlying Vadalog language is the warded fragment of tuple-generating dependencies (TGDs). This formalism ensures tractable reasoning in data complexity, while a recent analysis focusing on a practical implementation led to the reasoning algorithm around which the

Updated: 2022-04-06

Incremental Graph Computations: Doable and Undoable

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-03-10
Wenfei Fan, Chao Tian

The incremental problem for a class \({\mathcal {Q}} \) of graph queries aims to compute, given a query \(Q \in {\mathcal {Q}} \), graph G, answers Q(G) to Q in G and updates ΔG to G as input, changes ΔO to output Q(G) such that Q(G⊕ΔG) = Q(G)⊕ΔO. It is called bounded if its cost can be expressed as a polynomial function in the sizes of Q, ΔG and ΔO, which reduces the computations on possibly big G

Updated: 2022-03-10

Sampling a Near Neighbor in High Dimensions (Just Accepted)

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2022-02-04
Martin Aumüller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, Francesco Silvestri

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal
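
The fairness requirement in the r-NN setting above can be stated with a linear-scan reference, assuming Euclidean distance: every point within distance r of q should be equally likely to be returned. This only specifies the desired output distribution; the paper's subject is sublinear data structures that achieve it, which this scan is not. Function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def r_near_neighbors(points: Sequence[Point], q: Point, r: float) -> List[Point]:
    return [p for p in points if math.dist(p, q) <= r]


def fair_r_nn(points: Sequence[Point], q: Point, r: float,
              rng: random.Random) -> Optional[Point]:
    """Return a point chosen uniformly at random among all points within distance r of q."""
    candidates = r_near_neighbors(points, q, r)
    return rng.choice(candidates) if candidates else None
```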

Updated: 2022-02-04

A Formal Framework for Complex Event Recognition

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-12-08
Alejandro Grez, Cristian Riveros, Martín Ugarte, Stijn Vansummeren

Complex event recognition (CER) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real time. CER finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. Existing CER languages lack a clear semantics, however, which makes them hard to understand and generalize

Updated: 2021-12-08

On Directed Densest Subgraph Discovery

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-11-15
Chenhao Ma, Yixiang Fang, Reynold Cheng, Laks V. S. Lakshmanan, Wenjie Zhang, Xuemin Lin

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph

Updated: 2021-11-15

Timely Reporting of Heavy Hitters Using External Memory

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-11-15
Shikha Singh, Prashant Pandey, Michael A. Bender, Jonathan W. Berry, Martín Farach-Colton, Rob Johnson, Thomas M. Kroeger, Cynthia A. Phillips

Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸ N-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem
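
The reporting rule above can be sketched with exact in-memory counting, assuming the stream length N is known in advance and rounding the threshold up to T = ⌈ɸN⌉ so it is an integer. Each item is reported exactly once, at the position of its T-th occurrence. The paper targets external memory and bounded reporting delay; this sketch does neither, and `timely_heavy_hitters` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def timely_heavy_hitters(stream: Iterable[str], n: int, phi: float) -> List[Tuple[str, int]]:
    """Report (item, position) as soon as an item reaches T = ceil(phi * n) occurrences."""
    threshold = math.ceil(phi * n)
    counts: Counter = Counter()
    reports = []
    for pos, item in enumerate(stream):
        counts[item] += 1
        if counts[item] == threshold:  # report once, exactly at the T-th occurrence
            reports.append((item, pos))
    return reports
```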

Updated: 2021-11-15

Balancing Expressiveness and Inexpressiveness in View Design

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-11-15
Michael Benedikt, Pierre Bourhis, Louis Jachiet, Efthymia Tsamoura

We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual sources, but with the intention of supporting integrated

Updated: 2021-11-15

SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-09-28
Immanuel Trummer, Junxiong Wang, Ziyun Wei, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis, Ankush Rayabhari

SkinnerDB uses reinforcement learning for reliable join ordering, exploiting an adaptive processing engine with specialized join algorithms and data structures. It maintains no data statistics and uses no cost or cardinality models. Also, it uses no training workloads nor does it try to link the current query to seemingly similar queries in the past. Instead, it uses reinforcement learning to learn

Updated: 2021-09-28

Stream Data Cleaning under Speed and Acceleration Constraints

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-09-28
Shaoxu Song, Fei Gao, Aoqian Zhang, Jianmin Wang, Philip S. Yu

Stream data are often dirty, for example, owing to unreliable sensor reading or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that the cleaning should avoid changing those originally correct/clean data, a.k.a. the minimum modification rule in data cleaning

Updated: 2021-09-28

Error Bounded Line Simplification Algorithms for Trajectory Compression: An Experimental Evaluation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-09-28
Xuelian Lin, Shuai Ma, Jiahao Jiang, Yanchen Hou, Tianyu Wo

Nowadays, various sensors are collecting, storing, and transmitting tremendous trajectory data, and it is well known that the storage, network bandwidth, and computing resources could be heavily wasted if raw trajectory data is directly adopted. Line simplification algorithms are effective approaches to attacking this issue by compressing a trajectory to a set of continuous line segments, and are commonly
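
As one concrete instance of error-bounded line simplification, here is the classic Douglas-Peucker algorithm on 2D points, assuming perpendicular distance to the chord as the error measure. It is shown as a familiar representative of the algorithm family, not as any specific method from this evaluation. Every dropped point ends up within epsilon of the line through the two kept points that enclose it.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    # Distance from p to the line through a and b (to a itself if a == b).
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Keep a subset of points so every dropped point lies within epsilon of its chord."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # the whole span is approximated by one segment
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # avoid duplicating the split point
```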

Updated: 2021-09-28

Bag Query Containment and Information Theory

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-09-28
Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, Dan Suciu

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the

Updated: 2021-09-28
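
On a database without duplicate tuples, the multiplicity that bag semantics assigns to the answer of a Boolean conjunctive query equals the number of homomorphisms from the query into the database. The brute-force sketch below (exponential in the number of variables, purely illustrative, with a hypothetical relation R) uses this to show two queries that are equivalent under set semantics but not under bag semantics: both Q1 and Q2 are true on the example database, yet Q2 returns its answer 4 times and Q1 only 2 times, so Q2 is not bag-contained in Q1.

```python
from itertools import product


def count_homomorphisms(atoms, db):
    """Count valuations of the query variables that map every atom to a fact of db.

    atoms: list of (relation, (var, ...)); db: set of (relation, (const, ...))."""
    variables = sorted({v for _, args in atoms for v in args})
    domain = sorted({c for _, args in db for c in args})
    count = 0
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        if all((rel, tuple(valuation[v] for v in args)) in db for rel, args in atoms):
            count += 1
    return count


db = {("R", (1, 1)), ("R", (1, 2))}
q1 = [("R", ("x", "y"))]                     # Q1() :- R(x, y)
q2 = [("R", ("x", "y")), ("R", ("x", "z"))]  # Q2() :- R(x, y), R(x, z)
print(count_homomorphisms(q1, db), count_homomorphisms(q2, db))  # 2 4
```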

On the Enumeration Complexity of Unions of Conjunctive Queries

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-05-30
Nofar Carmeli, Markus Kröll

We study the enumeration complexity of Unions of Conjunctive Queries (UCQs). We aim to identify the UCQs that are tractable in the sense that the answer tuples can be enumerated with a linear preprocessing phase and a constant delay between successive tuples. It has been established that, in the absence of self-joins and under conventional complexity assumptions, the CQs that admit such an evaluation

Updated: 2021-05-30
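
The tractable case the abstract refers to can be illustrated on a simple self-join-free query, Q(x, y, z) :- R(x, y), S(y, z). The sketch below (relation names and data are hypothetical) runs a linear-time preprocessing phase that indexes S and removes dangling R tuples, then enumerates answers with a generator; assuming constant-time dictionary access, the work between consecutive answers is bounded by a constant. Unions of such queries, the subject of the article, are not covered by this sketch.

```python
from collections import defaultdict


def preprocess(R, S):
    """Linear-time phase for Q(x, y, z) :- R(x, y), S(y, z)."""
    s_by_y = defaultdict(list)
    for y, z in S:
        s_by_y[y].append(z)
    s_by_y = dict(s_by_y)
    # semi-join reduction: drop R tuples that have no join partner in S
    r_reduced = [(x, y) for x, y in R if y in s_by_y]
    return r_reduced, s_by_y


def enumerate_answers(r_reduced, s_by_y):
    """Every remaining R tuple has at least one partner, so no step is wasted:
    each answer is produced after a constant amount of work."""
    for x, y in r_reduced:
        for z in s_by_y[y]:
            yield (x, y, z)


R = [(1, "a"), (2, "b"), (3, "c")]
S = [("a", 10), ("a", 11), ("c", 12)]
print(list(enumerate_answers(*preprocess(R, S))))
# [(1, 'a', 10), (1, 'a', 11), (3, 'c', 12)]
```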

Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-05-30
Amine Mhedhbi, Chathura Kankanamge, Semih Salihoglu

We study the problem of optimizing one-time and continuous subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We make two main contributions: 1. A cost-based dynamic programming

Updated: 2021-05-30
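
A worst-case optimal plan matches one query vertex at a time, extending each partial match by intersecting the adjacency lists of already-matched vertices. The sketch below does this for the directed triangle query with the fixed vertex order a, b, c; choosing that order well is precisely the optimization problem the article addresses, and its cost-based method is not reproduced here. The edge list is hypothetical.

```python
from collections import defaultdict


def triangles(edges):
    """Evaluate Q(a, b, c) :- E(a, b), E(b, c), E(a, c) one query vertex at a time (order a, b, c)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
    nbrs = dict(nbrs)
    results = []
    for a, a_out in nbrs.items():
        for b in a_out:
            # c must extend both E(b, c) and E(a, c): intersect the two adjacency sets
            for c in nbrs.get(b, set()) & a_out:
                results.append((a, b, c))
    return results


print(sorted(triangles([(1, 2), (2, 3), (1, 3), (3, 4)])))  # [(1, 2, 3)]
```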

Embedded Functional Dependencies and Data-completeness Tailored Database Design

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-05-30
Ziheng Wei, Sebastian Link

We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data

Updated: 2021-05-30
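
As we understand the notion, an embedded functional dependency E: X -> Y (with X and Y contained in the attribute set E) requires the ordinary FD X -> Y to hold on the tuples that have no missing value on any attribute of E, so tuples incomplete on E are ignored regardless of how missing values are interpreted. A minimal checker under that reading, with rows as Python dicts, `None` as the missing-value marker, and hypothetical attributes emp, dept, and mgr:

```python
def satisfies_efd(rows, E, X, Y):
    """Check the eFD E: X -> Y on rows (dicts), with None marking a missing value.

    The FD X -> Y is tested only on the rows that have no missing value on E."""
    if not (set(X) | set(Y)) <= set(E):
        raise ValueError("X and Y must be subsets of E")
    seen = {}
    for row in rows:
        if any(row[attr] is None for attr in E):
            continue  # row is incomplete on E, so the eFD does not constrain it
        lhs = tuple(row[attr] for attr in X)
        rhs = tuple(row[attr] for attr in Y)
        if seen.setdefault(lhs, rhs) != rhs:
            return False  # two E-complete rows agree on X but differ on Y
    return True


rows = [
    {"emp": "a", "dept": "d1", "mgr": "m1"},
    {"emp": "b", "dept": "d1", "mgr": None},
    {"emp": "c", "dept": "d1", "mgr": "m1"},
]
print(satisfies_efd(rows, E={"dept", "mgr"}, X=["dept"], Y=["mgr"]))  # True
```

Row b is ignored because its mgr value is missing; adding a complete row for department d1 with a different manager would make the check return False.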

Graph Indexing for Efficient Evaluation of Label-constrained Reachability Queries

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-05-30
Yangjun Chen, Gagandeep Singh

Given a directed edge-labeled graph G, checking whether vertex v is reachable from vertex u under a label set S means determining whether there is a path from u to v whose edge labels are all contained in S. Such a query is referred to as a label-constrained reachability (LCR) query. In this article, we present a new approach to store a compressed transitive closure of G in the form of intervals

Updated: 2021-05-30
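
The query semantics can be pinned down with an index-free baseline: a breadth-first search that only follows edges whose label is in S. This is the per-query traversal that an index such as the article's compressed transitive closure is designed to avoid; the sketch does not implement that index. Edges are assumed to be (source, label, target) triples, and the graph is hypothetical.

```python
from collections import deque


def lcr_reachable(edges, u, v, allowed):
    """BFS baseline: is there a path u -> v using only edges whose label is in `allowed`?"""
    adj = {}
    for src, label, dst in edges:
        if label in allowed:  # edges with other labels are never traversed
            adj.setdefault(src, []).append(dst)
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


edges = [(1, "a", 2), (2, "b", 3), (1, "c", 3)]
print(lcr_reachable(edges, 1, 3, {"a", "b"}), lcr_reachable(edges, 1, 3, {"a"}))  # True False
```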

Constant-Delay Enumeration for Nondeterministic Document Spanners

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-04-14
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth

We consider the information extraction framework known as document spanners and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results

Updated: 2021-04-14
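
In the spanner framework the output of an extraction is a set of spans, that is, intervals of positions in the document. The brute-force baseline below makes concrete what constant-delay algorithms improve on; for brevity it uses a Python regular expression with a single implicit variable in place of a variable-set automaton (an assumption, not the article's model) and 0-based half-open spans. It needs no preprocessing, but it tries every one of the quadratically many spans, so the gap between two consecutive outputs can grow with the document length, whereas constant-delay enumeration, as the title says, bounds that gap by a constant after preprocessing.

```python
import re


def all_spans(document, pattern):
    """Naive one-variable spanner: every span [i, j) whose content fully matches pattern.

    Tries all O(n^2) spans, so the delay between outputs is not bounded by a constant."""
    regex = re.compile(pattern)
    n = len(document)
    for i in range(n + 1):
        for j in range(i, n + 1):
            # fullmatch restricted to document[i:j] via the pos/endpos arguments
            if regex.fullmatch(document, i, j):
                yield (i, j)


print(list(all_spans("ab c", r"[a-z]+")))  # [(0, 1), (0, 2), (1, 2), (3, 4)]
```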

Scotty

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-03-27
Jonas Traub, Philipp Marian Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics, such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions)

Updated: 2021-03-27
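
A common way to avoid redundant computation for overlapping windows is stream slicing: the stream is cut into non-overlapping slices, each event updates one slice's partial aggregate, and each window result is combined from the slices it covers. The sketch below applies this to time-based sliding windows with a sum over a finite batch of events; the function name, the choice of sum, and the restriction to windows starting at time 0 or later are assumptions, and Scotty's general slicing, which also covers window types such as sessions, is considerably more involved.

```python
from math import gcd


def sliding_sums(events, size, slide):
    """Sum per sliding window [start, start + size), start = 0, slide, 2*slide, ...

    events are (timestamp, value) pairs with non-negative integer timestamps. Each
    event updates exactly one slice; overlapping windows then share slice partials."""
    slice_len = gcd(size, slide)  # every window boundary falls on a slice boundary
    partial = {}
    for ts, value in events:
        idx = ts // slice_len
        partial[idx] = partial.get(idx, 0) + value
    if not partial:
        return {}
    last_slice = max(partial)
    slices_per_window = size // slice_len
    results = {}
    start = 0
    while start // slice_len <= last_slice:
        first = start // slice_len
        results[start] = sum(partial.get(first + k, 0) for k in range(slices_per_window))
        start += slide
    return results


events = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(sliding_sums(events, size=4, slide=2))  # {0: 10, 2: 12, 4: 5}
```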

An Empirical Study of Moment Estimators for Quantile Approximation

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-03-18
Rory Mitchell, Eibe Frank, Geoffrey Holmes

We empirically evaluate lightweight moment estimators for the single-pass quantile approximation problem, including maximum entropy methods and orthogonal series with Fourier, Cosine, Legendre, Chebyshev and Hermite basis functions. We show how to apply stable summation formulas to offset numerical precision issues for higher-order moments, leading to reliable single-pass moment estimators up to order

Updated: 2021-03-18
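
Higher-order moments are sums of large powers, where naive floating-point accumulation loses precision. Kahan (compensated) summation is one standard remedy; whether it matches the stable summation formulas used in the article is not stated here, so treat the sketch below as an illustration of the general idea only. It accumulates the raw power sums up to a chosen order in a single pass; the class name is hypothetical.

```python
class CompensatedMoments:
    """Single-pass raw power sums sum(x**k), k = 0..order, each kept with Kahan compensation."""

    def __init__(self, order):
        self.sums = [0.0] * (order + 1)
        self.comp = [0.0] * (order + 1)  # running low-order error of each sum

    def add(self, x):
        power = 1.0
        for k in range(len(self.sums)):
            y = power - self.comp[k]
            t = self.sums[k] + y
            self.comp[k] = (t - self.sums[k]) - y  # what was lost when forming t
            self.sums[k] = t
            power *= x

    def raw_moments(self):
        n = self.sums[0]  # the k = 0 sum is the number of values seen
        return [s / n for s in self.sums]


acc = CompensatedMoments(order=3)
for x in [1.0, 2.0, 3.0, 4.0]:
    acc.add(x)
print(acc.raw_moments())  # [1.0, 2.5, 7.5, 25.0]
```

Adding many values far below the rounding unit of a large running sum shows the difference: a naive `+=` loop drops them entirely, while the compensated sum keeps their contribution.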

Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future

ACM Trans. Database Syst. (IF 1.8) Pub Date : 2021-03-18
Venkata Vamsikrishna Meduri, Kanchan Chowdhury, Mohamed Sarwat

Predicting the user's next SQL query from her sequence of queries up to the current timestep, during an ongoing interaction session with the database, can help in speculative query processing and increase interactivity. While existing machine learning (ML)-based approaches use recommender systems to suggest relevant queries to a user, there has been no exhaustive study on

Updated: 2021-03-18
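
A simple point of comparison for the ML models such a study evaluates is a first-order Markov model over query templates: predict the template most often observed right after the current one in past sessions. The sketch below assumes queries have already been reduced to template strings (a hypothetical preprocessing step), and it is not necessarily one of the models the article evaluates.

```python
from collections import Counter, defaultdict


class MarkovNextQuery:
    """First-order Markov baseline: predict the template most often seen after the current one."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, sessions):
        for session in sessions:
            for prev, nxt in zip(session, session[1:]):
                self.transitions[prev][nxt] += 1  # count observed successor pairs
        return self

    def predict(self, current):
        followers = self.transitions.get(current)
        if not followers:
            return None  # template never seen with a successor
        return followers.most_common(1)[0][0]


sessions = [
    ["SELECT * FROM t1", "SELECT * FROM t2", "UPDATE t1"],
    ["SELECT * FROM t1", "SELECT * FROM t2"],
    ["SELECT * FROM t1", "UPDATE t1"],
]
model = MarkovNextQuery().fit(sessions)
print(model.predict("SELECT * FROM t1"))  # SELECT * FROM t2
```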
