Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs
ACM Transactions on Architecture and Code Optimization (IF 1.6). Pub Date: 2023-12-28. DOI: 10.1145/3633462
Khalid Ahmad, Cris Cecka, Michael Garland, Mary Hall

An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and its applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this paper, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based implementations against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout, and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to identify the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
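
To make the unfolding step concrete, here is a minimal NumPy/SciPy sketch of the idea on the CPU (the function name `unfold_coo` and the toy sizes are illustrative; this is not the paper's GPU code or its hierarchical layout): the contracted mode becomes the column index of a sparse matrix, the remaining modes are linearized into the row index, and the tensor contraction reduces to an ordinary SpMM.

```python
import numpy as np
from scipy.sparse import coo_matrix

def unfold_coo(indices, values, shape, mode):
    """Mode-`mode` unfolding of a sparse COO tensor into a 2D COO matrix.
    indices: (nnz, ndim) int array; values: (nnz,); shape: tensor dims."""
    keep = [d for d in range(len(shape)) if d != mode]
    rows = np.zeros(len(values), dtype=np.int64)
    for d in keep:                      # linearize kept modes (row-major)
        rows = rows * shape[d] + indices[:, d]
    cols = indices[:, mode]             # contracted mode -> column index
    nrows = int(np.prod([shape[d] for d in keep]))
    return coo_matrix((values, (rows, cols)), shape=(nrows, shape[mode]))

# Toy example: contract a 3x4x5 sparse tensor X with a 5x8 dense matrix U
# along mode 2, i.e. Y[i, j, :] = sum_k X[i, j, k] * U[k, :].
rng = np.random.default_rng(0)
idx = np.stack([rng.integers(0, s, 20) for s in (3, 4, 5)], axis=1)
vals = rng.standard_normal(20)
U = rng.standard_normal((5, 8))

A = unfold_coo(idx, vals, (3, 4, 5), mode=2).tocsr()  # unfold, then CSR
Y = (A @ U).reshape(3, 4, 8)                          # SpMM, then fold back
print(Y.shape)
```

The kernel-selection step can be pictured the same way. Below is a hedged sketch assuming features such as nnz, density, and tensor order; the feature set and labels here are invented for illustration and are not the paper's trained model.

```python
from sklearn.tree import DecisionTreeClassifier

# Train on precomputed tensor properties, labeled with the kernel that was
# empirically fastest on that tensor; at run time, predict before launching.
X_train = [[1e6, 1e-4, 3], [5e7, 1e-6, 4], [2e5, 1e-3, 3]]   # toy features
y_train = ["SpMM-hierarchical", "cuSPARSE-SpMM", "PASTA"]    # toy labels
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(clf.predict([[8e5, 2e-4, 3]]))   # chosen kernel for a new tensor
```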



Updated: 2023-12-30