DynQ: a dynamic query engine with query-reuse capabilities embedded in a polyglot runtime
The VLDB Journal (IF 4.2), Pub Date: 2023-03-13, DOI: 10.1007/s00778-023-00784-2
Filippo Schiavio, Daniele Bonetta, Walter Binder

Language-integrated query (LINQ) frameworks offer a convenient programming abstraction for processing in-memory collections of data, allowing developers to concisely express declarative queries in popular programming languages. Existing LINQ frameworks rely on the type system of statically typed languages such as C# or Java to perform query compilation and execution. As a consequence of this design, they do not support dynamic languages such as Python, R, or JavaScript. Such languages are, however, very popular among data scientists, who would benefit from LINQ frameworks in data-analytics applications. The gap between dynamic languages and LINQ frameworks has been partially bridged by the recent work DynQ, a novel query engine designed for dynamic languages. DynQ is language-agnostic, since it can execute SQL queries in all languages supported by the GraalVM platform. Moreover, DynQ can execute queries that combine data from multiple sources, namely in-memory object collections, on-file data, and external database systems. The evaluation of DynQ shows performance comparable to equivalent hand-optimized code and in line with common data-processing libraries and embedded databases, making DynQ an appealing query engine for standalone analytics applications and for data-intensive server-side workloads. In this work, we extend DynQ to address the problem of optimizing high-throughput workloads in the context of fluent APIs. In particular, we focus on applications that use data-processing libraries mostly to execute many queries on small batches of data, e.g., in microservices, as well as applications that use data-processing libraries within recursive functions. For this purpose, we present reusable compiled queries, a novel approach to query execution that allows the same dynamically compiled code to be reused for different queries.
As we show in our evaluation, thanks to reusable compiled queries, DynQ can also speed up applications that heavily use data-processing libraries on small datasets using a typical fluent API.

Updated: 2023-03-14