Compilation of Modular and General Sparse Workspaces
arXiv - CS - Programming Languages Pub Date : 2024-04-06 , DOI: arxiv-2404.04541
Genghan Zhang, Olivia Hsu, Fredrik Kjolstad

Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result tensor. We address this shortcoming through a compiler design that generates code that uses sparse intermediate tensors (sparse workspaces) as efficient adapters between compute code that scatters and result tensors that do not support random insertion. Our compiler automatically detects sparse scattering behavior in tensor expressions and inserts necessary intermediate workspace tensors. We present an algorithm template for workspace insertion that is the backbone of our code generation algorithm. Our algorithm template is modular by design, supporting sparse workspaces that span multiple user-defined implementations. Our evaluation shows that sparse workspaces can be up to 27.12$\times$ faster than the dense workspaces of prior work. On the other hand, dense workspaces can be up to 7.58$\times$ faster than the sparse workspaces generated by our compiler in other situations, which motivates our compiler design that supports both. Our compiler produces sequential code that is competitive with hand-optimized linear and tensor algebra libraries on the expressions they support, but that generalizes to any other expression. Sparse workspaces are also more memory efficient than dense workspaces as they compress away zeros. This compression can asymptotically decrease memory usage, enabling tensor computations on data that would otherwise run out of memory.
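To make the idea of a workspace as an adapter concrete, here is a minimal sketch, assuming a row-by-row sparse matrix product C = A·B over CSR operands. It is not the code the paper's compiler generates; the function name `spgemm_csr`, the `(pos, crd, val)` tuple layout, and the use of a Python dict as the sparse workspace are all illustrative assumptions. The inner loops scatter products to arbitrary column coordinates `j`, which an append-only CSR result cannot accept directly, so each row is first accumulated in a workspace and then appended in sorted order.

```python
def spgemm_csr(a, b, n_cols, sparse_workspace=True):
    """Multiply two CSR matrices row by row, staging each row in a workspace.

    a, b: hypothetical (pos, crd, val) triples in CSR layout.
    n_cols: number of columns of b (and of the result).
    """
    a_pos, a_crd, a_val = a
    b_pos, b_crd, b_val = b
    c_pos, c_crd, c_val = [0], [], []

    # Dense workspace: O(n_cols) memory no matter how sparse a row is.
    dense = None if sparse_workspace else [0.0] * n_cols
    seen = None if sparse_workspace else [False] * n_cols

    for i in range(len(a_pos) - 1):
        sparse = {}    # sparse workspace: stores only coordinates that occur
        touched = []   # coordinates written in the dense workspace this row
        for p in range(a_pos[i], a_pos[i + 1]):
            k = a_crd[p]
            for q in range(b_pos[k], b_pos[k + 1]):
                j = b_crd[q]  # scatter target: arrives in no fixed order
                prod = a_val[p] * b_val[q]
                if sparse_workspace:
                    sparse[j] = sparse.get(j, 0.0) + prod
                else:
                    if not seen[j]:
                        seen[j] = True
                        touched.append(j)
                    dense[j] += prod

        if sparse_workspace:
            row = sorted(sparse.items())
        else:
            row = [(j, dense[j]) for j in sorted(touched)]
            for j in touched:  # reset only the touched entries for reuse
                dense[j] = 0.0
                seen[j] = False

        # The result only supports appends, so copy the row out in order.
        for j, v in row:
            c_crd.append(j)
            c_val.append(v)
        c_pos.append(len(c_crd))

    return c_pos, c_crd, c_val
```

Both modes yield the same CSR result. The dense variant pays for an array as long as the full column dimension, while the dict grows only with the number of nonzeros in a row, which is the memory trade-off the abstract describes. Which one runs faster depends on the data, consistent with the abstract's report that each kind of workspace wins in some situations.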

Updated: 2024-04-09