Toward Energy-efficient STT-MRAM-based Near Memory Computing Architecture for Embedded Systems
ACM Transactions on Embedded Computing Systems (IF 2), Pub Date: 2024-04-25, DOI: 10.1145/3650729
Yueting Li 1, Xueyan Wang 1, He Zhang 1, Biao Pan 1, Keni Qiu 2, Wang Kang 1, Jun Wang 1, Weisheng Zhao 1

Convolutional Neural Networks (CNNs) have significantly impacted embedded system applications across various domains. However, CNN workloads exacerbate the real-time processing and hardware resource constraints of embedded systems. To tackle these issues, we propose a spin-transfer torque magnetic random-access memory (STT-MRAM)-based near memory computing (NMC) design for embedded systems. We optimize this design in three respects. First, a fast-pipelined STT-MRAM readout scheme provides higher memory bandwidth for the NMC design, enhancing real-time processing capability with a non-trivial area overhead. Second, a direct index compression format, combined with a digital sparse matrix-vector multiplication (SpMV) accelerator, supports the varied matrices found in practical applications and alleviates computing resource requirements. Third, custom NMC instructions and a stream converter for NMC systems dynamically adjust the available hardware resources for better utilization. Experimental results demonstrate that the memory bandwidth of STT-MRAM reaches 26.7 GB/s. The digital SpMV accelerator improves energy consumption and latency by up to 64× and 1,120×, respectively, across matrices with sparsity ranging from 10% to 99.8%. Transmission of single-precision and double-precision elements improves by up to 8× and 9.6×, respectively. Furthermore, our design achieves a throughput of up to 15.9× over state-of-the-art designs.
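
For readers unfamiliar with SpMV, the following is a minimal software sketch of why a compressed format reduces work as sparsity grows. It uses the standard compressed sparse row (CSR) layout as a stand-in; it is not the paper's direct index compression format, whose details the abstract does not give. The function names `dense_to_csr` and `spmv_csr` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    dense: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Compress a dense matrix into CSR arrays (values, col_idx, row_ptr).

    Assumption: CSR is used here only as a familiar example of a
    compressed sparse format, not as the format proposed in the paper.
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only nonzero entries
                col_idx.append(j)  # remember the column of each nonzero
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return values, col_idx, row_ptr


def spmv_csr(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    dense = [
        [0, 2, 0],
        [1, 0, 3],
        [0, 0, 0],
    ]
    vals, cols, ptrs = dense_to_csr(dense)
    print(spmv_csr(vals, cols, ptrs, [1, 2, 3]))
```

In this sketch the number of multiply-accumulate operations equals the number of stored nonzeros rather than the full matrix size, which is the general reason sparse formats help at high sparsity; the specific gains reported in the abstract come from the authors' hardware design, not from this code.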




Updated: 2024-04-25