Cardinality estimation with smoothing autoregressive models
World Wide Web (IF 3.7), Pub Date: 2023-07-28, DOI: 10.1007/s11280-023-01195-7
Yuming Lin, Zejun Xu, Yinghao Zhang, You Li, Jingwei Zhang

Cardinality estimation, which aims to accurately estimate the result size of queries, is a fundamental task in database query processing and optimization. One of the most recent and effective approaches is to use deep autoregressive models to learn the joint probability distribution of the data through unsupervised learning. However, because of data sparsity, it is difficult for the estimator to capture the actual distribution, which reduces the accuracy of cardinality estimation. In addition, the progressive sampling used by autoregressive estimators is prone to error propagation, which is more evident in high-dimensional data. To reduce the error of autoregressive cardinality estimation and to obtain a better trade-off between estimation accuracy and latency, we propose a random smoothing autoregressive cardinality estimation model (SAM-CE), which combines a random smoothing technique with a deep autoregressive model to simplify the learning of joint probability distributions. We also design a smooth progressive sampling method suited to range queries, which improves estimator accuracy by improving sample quality. We conduct extensive experiments to demonstrate the effectiveness and performance of the proposed SAM-CE. The results show that SAM-CE achieves state-of-the-art effectiveness in cardinality estimation.
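To make the progressive sampling idea concrete, the following is a minimal sketch for a two-column table. It is not SAM-CE and applies no smoothing: empirical conditional distributions computed from the rows stand in for a learned deep autoregressive model, and the names `fit_conditionals` and `progressive_sample_estimate` are hypothetical, chosen only for this illustration. The sketch factorizes the joint distribution as P(x1) P(x2 | x1), samples x1 restricted to its query range, and averages the probability mass that falls inside both ranges.

```python
import random
from collections import Counter, defaultdict


def fit_conditionals(rows):
    """Empirical P(x1) and P(x2 | x1); a stand-in for a learned autoregressive model."""
    n = len(rows)
    p1 = {v: c / n for v, c in Counter(a for a, _ in rows).items()}
    groups = defaultdict(Counter)
    for a, b in rows:
        groups[a][b] += 1
    p2 = {
        a: {b: c / sum(cnt.values()) for b, c in cnt.items()}
        for a, cnt in groups.items()
    }
    return p1, p2


def progressive_sample_estimate(p1, p2, range1, range2, n_rows,
                                n_samples=1000, seed=0):
    """Estimate the row count of: lo1 <= x1 <= hi1 AND lo2 <= x2 <= hi2."""
    rng = random.Random(seed)
    lo1, hi1 = range1
    lo2, hi2 = range2

    # Step 1: restrict the first column's distribution to its query range.
    in_range1 = {v: p for v, p in p1.items() if lo1 <= v <= hi1}
    mass1 = sum(in_range1.values())
    if mass1 == 0:
        return 0.0
    values, weights = zip(*in_range1.items())

    total = 0.0
    for _ in range(n_samples):
        # Step 2: draw x1 from the in-range distribution.
        x1 = rng.choices(values, weights=weights)[0]
        # Step 3: in-range mass of the second column, conditioned on the draw.
        mass2 = sum(p for v, p in p2[x1].items() if lo2 <= v <= hi2)
        total += mass1 * mass2

    # The sample mean estimates the query selectivity; scale by table size.
    return n_rows * total / n_samples


rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 3)]
p1, p2 = fit_conditionals(rows)
estimate = progressive_sample_estimate(p1, p2, (1, 2), (1, 1), len(rows))
```

Because each column is sampled conditioned on the values drawn before it, any error in an early conditional carries into every later step, which is the error propagation the abstract refers to. With more columns the same loop would be repeated once per column.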




Updated: 2023-07-29