Lero: applying learning-to-rank in query optimizer,The VLDB Journal

当前位置： X-MOL 学术 › VLDB J. › 论文详情

Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)

Lero: applying learning-to-rank in query optimizer
The VLDB Journal ( IF 4.2 ) Pub Date : 2024-04-25 , DOI: 10.1007/s00778-024-00850-3
Xingguang Chen , Rong Zhu , Bolin Ding , Sibo Wang , Jingren Zhou

In recent studies, machine learning techniques have been employed to support or enhance cost-based query optimizers in DBMS. Although these approaches have shown superiority in certain benchmarks, they also suffer from certain drawbacks. These include unstable performance, high training costs, and slow model updating, which can be attributed to the inherent challenges of predicting the cost or latency of execution plans using machine learning models. In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization. The key observation is that the relative order or rank of plans, rather than the exact cost or latency, is sufficient for query optimization. Lero employs a pairwise approach to train a classifier to compare any two plans and tell which one is better. Such a binary classification task is much easier than the regression task to predict the cost or latency, in terms of model efficiency and effectiveness. Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native optimizer. With its non-intrusive design, Lero can be implemented on top of any existing DBMS with minimum integration efforts. We implement Lero and demonstrate its outstanding performance using PostgreSQL and Spark SQL. In our experiments, Lero achieves near-optimal performance on several benchmarks. It reduces the execution time of the native PostgreSQL optimizer by up to \(70\%\) and other learned query optimizers by up to \(37\%\) on single-machine environments. On distributed environments, our Lero improves the running time of the native Spark SQL optimizer by up to \(27\%\). Meanwhile, Lero continuously learns and automatically adapts to query workloads and changes in data.

中文翻译：

Lero：在查询优化器中应用学习排名

在最近的研究中，机器学习技术已被用来支持或增强 DBMS 中基于成本的查询优化器。尽管这些方法在某些基准测试中显示出优越性，但它们也存在某些缺点。其中包括性能不稳定、训练成本高和模型更新慢，这可以归因于使用机器学习模型预测执行计划的成本或延迟的固有挑战。在本文中，我们介绍了一种名为Lero 的学习排名查询优化器，它构建在本机查询优化器之上，并不断学习以改进查询优化。关键的观察结果是，计划的相对顺序或排名，而不是确切的成本或延迟，足以进行查询优化。Lero采用成对方法来训练分类器来比较任意两个计划并判断哪个更好。就模型效率和有效性而言，这种二元分类任务比回归任务更容易预测成本或延迟。Lero不是从头开始构建学习优化器，而是旨在利用数十年的数据库智慧并改进本机优化器。凭借其非侵入式设计，Lero可以在任何现有 DBMS 之上实现，并且只需最少的集成工作。我们使用 PostgreSQL 和 Spark SQL实现Lero并展示其出色的性能。在我们的实验中，Lero在多个基准测试中实现了接近最佳的性能。在单机环境中，它可以将本机 PostgreSQL 优化器的执行时间减少多达\(70\%\)，并将其他学习查询优化器的执行时间减少多达\(37\%\) 。在分布式环境中，我们的Lero将原生 Spark SQL 优化器的运行时间提高了高达\(27\%\)。同时，Lero不断学习并自动适应查询工作负载和数据变化。

更新日期：2024-04-25

点击分享查看原文

点击收藏

阅读更多本刊最新论文

全部期刊列表>>