Apache Nemo: A Framework for Optimizing Distributed Data Processing
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2021-10-15, DOI: 10.1145/3468144
Won Wook Song 1, Youngseok Yang 1, Jeongyoon Eo 1, Jangho Seo 2, Joo Yeon Kim 3, Sanha Lee 2, Gyewon Lee 1, Taegeon Um 1, Haeyoon Cho 1, Byung-Gon Chun 1

Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often require significant effort to use. Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficient fine control.

We describe Apache Nemo, an optimization framework for distributed dataflow processing that provides fine control for high performance and also ensures correctness for ease of use. We combine several techniques to achieve this, including an intermediate representation of dataflow, compiler optimization passes, and runtime extensions. Our evaluation results show that Nemo enables composable and reusable optimizations that bring performance improvements on par with existing specialized runtimes tailored for a specific deployment scenario. Apache Nemo is open-sourced at https://nemo.apache.org as an Apache incubator project.
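
As a rough illustration of how an intermediate representation and composable optimization passes can fit together, the following minimal Python sketch may help. It is not Nemo's actual API: the names `Vertex`, `Edge`, `IRDag`, `default_pass`, `spill_to_disk_pass`, `compose`, and `run`, as well as the property keys `placement` and `data_store`, are all hypothetical and chosen only for this example. The sketch assumes that passes rewrite only execution annotations and never the application logic, which is one simple way to keep results unchanged while tuning how a job runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Vertex:
    name: str
    fn: Callable[[list], list]  # application logic; passes never modify this
    props: Dict[str, str] = field(default_factory=dict)  # hypothetical execution annotations


@dataclass
class Edge:
    src: str
    dst: str
    props: Dict[str, str] = field(default_factory=dict)  # hypothetical communication annotations


@dataclass
class IRDag:
    vertices: List[Vertex]  # assumed to form a linear chain, listed in execution order
    edges: List[Edge]


Pass = Callable[[IRDag], IRDag]


def default_pass(dag: IRDag) -> IRDag:
    """Fill in defaults without overriding annotations set by earlier passes."""
    for v in dag.vertices:
        v.props.setdefault("placement", "any")
    for e in dag.edges:
        e.props.setdefault("data_store", "memory")
    return dag


def spill_to_disk_pass(dag: IRDag) -> IRDag:
    """Hypothetical policy: keep all intermediate data on disk."""
    for e in dag.edges:
        e.props["data_store"] = "disk"
    return dag


def compose(*passes: Pass) -> Pass:
    """Chain passes so that they are applied left to right."""
    def combined(dag: IRDag) -> IRDag:
        for p in passes:
            dag = p(dag)
        return dag
    return combined


def run(dag: IRDag, data: list) -> list:
    """Toy single-process executor: apply each vertex function in order."""
    for v in dag.vertices:
        data = v.fn(data)
    return data


def build_example() -> IRDag:
    double = Vertex("double", lambda xs: [2 * x for x in xs])
    total = Vertex("total", lambda xs: [sum(xs)])
    return IRDag([double, total], [Edge("double", "total")])
```

In this sketch, `compose(spill_to_disk_pass, default_pass)(build_example())` changes only the annotations on the graph, so `run` returns the same result before and after optimization. A real runtime would read such annotations to decide scheduling and data movement; that part is outside the scope of this toy example.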




Updated: 2021-10-15