Highly Efficient Self-checking Matrix Multiplication on Tiled AMX Accelerators
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2024-02-15, DOI: 10.1145/3633332
Chandra Sekhar Mummidi, Victor C. Ferreira, Sudarshan Srinivasan, Sandip Kundu

General Matrix Multiplication (GEMM) is a computationally expensive operation that is used in many applications such as machine learning. Hardware accelerators are increasingly popular for speeding up GEMM computation, with Tiled Matrix Multiplication (TMUL) in recent Intel processors being an example. Unfortunately, the TMUL hardware is susceptible to errors, necessitating online error detection. Algorithm-based Error Detection (ABED) is a powerful technique for detecting errors in matrix multiplication. In this article, we consider the implementation of an ABED technique that integrates seamlessly with the TMUL hardware to minimize performance overhead. Unfortunately, rounding errors introduced by floating-point operations do not allow a straightforward implementation of ABED in TMUL. Previously, an error bound was considered to address rounding errors in ABED. If the error detection threshold is set too low, it will trigger false alarms, while a loose bound will allow errors to escape detection. In this article, we propose an adaptive error threshold that takes the TMUL input values into account to address the problem of false triggers and error escapes, and we provide a taxonomy of the various error classes. This threshold is obtained from theoretical error analysis but is not easy to implement in hardware. Consequently, we relax the threshold so that it can be computed easily in hardware. While ABED ensures error-free computation, it does not guarantee full coverage of all hardware faults. To address this problem, we propose an algorithmic pattern generation technique that ensures full coverage of all hardware faults. To evaluate the benefits of our proposed solution, we conducted fault-injection experiments and show that our approach does not produce any false alarms or detection escapes for observable errors. We conducted additional fault-injection experiments on a Deep Neural Network (DNN) model and found that, if a fault is not detected, it does not cause any misclassification.
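
For context, ABED checks a GEMM result C = A·B against the checksum identity e^T(AB) = (e^T A)B, where e is the all-ones vector, and accepts small discrepancies caused by floating-point rounding. The Python/NumPy sketch below illustrates this idea with an input-dependent tolerance. It is a minimal sketch only: the function name abed_gemm, the test setup, and the specific tolerance formula are illustrative assumptions and do not reproduce the paper's TMUL-integrated implementation or its hardware-relaxed adaptive threshold.

    import numpy as np

    def abed_gemm(A: np.ndarray, B: np.ndarray):
        """Compute C = A @ B and run an ABED column-checksum test with an
        input-dependent tolerance that absorbs floating-point rounding."""
        C = A @ B
        # Checksum identity: e^T (A B) == (e^T A) B.
        direct = C.sum(axis=0)        # e^T C, computed from the result
        encoded = A.sum(axis=0) @ B   # (e^T A) B, computed from the inputs
        # Rounding-aware tolerance, scaled by the accumulation lengths and the
        # magnitudes of the summed products (illustrative bound only; the paper
        # derives a tighter adaptive threshold and then relaxes it for hardware).
        m, k = A.shape
        eps = np.finfo(C.dtype).eps
        tol = 4.0 * (m + k) * eps * (np.abs(A).sum(axis=0) @ np.abs(B))
        error_detected = bool(np.any(np.abs(direct - encoded) > tol))
        return C, error_detected

    # A fault-free multiplication should never trip the detector.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 48)).astype(np.float32)
    B = rng.standard_normal((48, 32)).astype(np.float32)
    C, err = abed_gemm(A, B)
    assert not err

Setting the tolerance too small would flag ordinary rounding noise as a hardware error (a false alarm), while setting it too large would let small corruptions pass (a detection escape), which is the trade-off the adaptive threshold in the paper addresses.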




Updated: 2024-02-16