Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2023-12-15, DOI: 10.1145/3631709
Hayfa Tayeb, Ludovic Paillat, Bérenger Bramas

Leveraging the SIMD capability of modern CPU architectures is mandatory to take full advantage of their increased performance. To exploit this capability, binary executables must be vectorized, either manually by developers or automatically by a tool. For this reason, the compilation research community has developed several strategies for transforming scalar code into a vectorized implementation. However, most existing automatic vectorization techniques in modern compilers are designed for regular codes, leaving irregular applications with non-contiguous data access patterns at a disadvantage. In this article, we present a new tool, Autovesk, that automatically generates vectorized code from scalar code, specifically targeting irregular data access patterns. We describe how our method transforms a graph of scalar instructions into a vectorized one, using different heuristics to reduce the number or cost of instructions. Finally, we demonstrate the effectiveness of our approach on various computational kernels using Intel AVX-512 and ARM SVE. We compare the speedups of Autovesk vectorized code over GCC, Clang LLVM, and Intel automatic vectorization optimizations. We achieve competitive results on linear kernels and up to 11× speedups on irregular kernels.
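
As a rough illustration of what turning a graph of scalar instructions into a vectorized one can involve, the toy sketch below groups independent scalar instructions that share an opcode and a dependency depth into packs of a fixed width. This is not Autovesk's algorithm or any of its heuristics: the graph representation, the function `pack_scalar_graph`, and the operand names are hypothetical and chosen only for illustration. The non-contiguous operand names (`a3`, `a7`, `a1`, `a4`) stand in for an irregular access pattern.

```python
from collections import defaultdict


def pack_scalar_graph(graph, width):
    """Group scalar instructions into vector-like packs (toy model).

    graph maps an instruction name to (opcode, [operand names]).
    Operand names that are not keys of graph are treated as inputs.
    """
    depth = {}

    def level(name):
        # Inputs sit at depth 0; an instruction sits one level
        # above its deepest operand.
        if name not in graph:
            return 0
        if name not in depth:
            _, operands = graph[name]
            depth[name] = 1 + max(level(op) for op in operands)
        return depth[name]

    # Instructions at the same depth cannot depend on each other,
    # so same-opcode instructions at one depth can share a pack.
    groups = defaultdict(list)
    for name, (opcode, _) in graph.items():
        groups[(level(name), opcode)].append(name)

    packs = []
    for (_, opcode), names in sorted(groups.items()):
        # Split each group into chunks of at most `width` lanes.
        for i in range(0, len(names), width):
            packs.append((opcode, names[i:i + width]))
    return packs


# Hypothetical kernel: four products of non-contiguous inputs,
# followed by two pairwise sums.
kernel = {
    "t0": ("mul", ["a3", "b0"]),
    "t1": ("mul", ["a7", "b1"]),
    "t2": ("mul", ["a1", "b2"]),
    "t3": ("mul", ["a4", "b3"]),
    "s0": ("add", ["t0", "t1"]),
    "s1": ("add", ["t2", "t3"]),
}
packs = pack_scalar_graph(kernel, width=4)
```

In this toy model the four multiplications collapse into a single 4-lane pack, and the two additions form a partially filled pack. A real tool would also have to decide how to load the scattered inputs and how to rearrange lanes between packs, which is where the cost-reducing heuristics mentioned in the abstract would come into play.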

Updated: 2023-12-15