ASIC Design of Nanoscale Artificial Neural Networks for Inference/Training by Floating-Point Arithmetic
IEEE Transactions on Nanotechnology (IF 2.4), Pub Date: 2024-02-20, DOI: 10.1109/tnano.2024.3367916
Farzad Niknia, Ziheng Wang, Shanshan Liu, Pedro Reviriego, Ahmed Louri, Fabrizio Lombardi

Inference and on-chip training of Artificial Neural Networks (ANNs) are computationally challenging for large datasets; hardware implementations are needed to accelerate this computation while meeting metrics such as operating frequency, power dissipation, and accuracy. In this article, a high-performance ASIC-based design is proposed to implement both forward and backward propagation of multi-layer perceptrons (MLPs) at the nanoscale. To attain a higher accuracy, floating-point arithmetic units are employed in the multiply-and-accumulate (MAC) array of the proposed design; moreover, a hybrid implementation scheme is utilized to achieve flexibility (for networks of different sizes) at an overall low hardware overhead. The proposed design is fully pipelined, and its performance is independent of the network size, except for the number of cycles and the latency. The efficiency of the proposed nanoscale MLP-based design is analyzed for inference (which takes place over multiple steps) and for training (where the complex processing of backward propagation is made efficient by eliminating many redundant calculations). Moreover, the impact of different floating-point precision formats on the final accuracy and on the hardware metrics under the same design constraints is studied. A comparative evaluation of the proposed MLP design for different datasets and floating-point precision formats is provided. Results show that, compared to current schemes found in the technical literature, the proposed design attains the best operating frequency and accuracy while still offering good latency and energy dissipation.
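The abstract describes a floating-point MAC array that executes both forward and backward propagation, and a study of how the floating-point precision format affects the final accuracy. As a rough, software-only illustration of those two ideas (not the authors' ASIC datapath), the sketch below emulates a fixed-precision floating-point MAC unit in Python/NumPy, rounding after every multiply and every accumulate, and trains a toy two-layer MLP through it at two precisions. All network sizes, the learning rate, and the float16/float32 pair are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mac(w, x, dtype):
    """Sequential multiply-and-accumulate, rounding to `dtype` after every
    product and every addition, as a fixed-precision FP MAC unit would."""
    acc = dtype(0)
    for wi, xi in zip(w.astype(dtype), x.astype(dtype)):
        acc = dtype(acc + dtype(wi * xi))
    return float(acc)

def matvec(W, x, dtype):
    """Matrix-vector product realized as one MAC pass per output neuron."""
    return np.array([mac(W[j], x, dtype) for j in range(W.shape[0])])

def forward(x, W1, W2, dtype):
    """Forward propagation of a toy 2-layer MLP through the emulated MACs."""
    z1 = matvec(W1, x, dtype)
    h = np.maximum(z1, 0.0)                # ReLU activation
    y = matvec(W2, h, dtype)
    return z1, h, y

def backward(x, z1, h, y, t, W1, W2, dtype, lr=1e-2):
    """Backward propagation for a squared-error loss; the error is pushed
    back through the same emulated MAC path."""
    dy = y - t                             # dL/dy for L = 0.5*||y - t||^2
    dh = matvec(W2.T, dy, dtype)           # error at the hidden layer
    dz1 = dh * (z1 > 0)                    # ReLU derivative
    W2 -= lr * np.outer(dy, h)             # weight updates (kept in float64
    W1 -= lr * np.outer(dz1, x)            # here; only the MACs are rounded)

rng = np.random.default_rng(0)
W1_init = rng.standard_normal((8, 4)) * 0.5
W2_init = rng.standard_normal((2, 8)) * 0.5
x = rng.standard_normal(4)
t = np.array([1.0, -1.0])                  # regression target

for dtype in (np.float16, np.float32):     # two candidate FP formats
    W1, W2 = W1_init.copy(), W2_init.copy()
    for _ in range(200):                   # a few "on-chip" training steps
        z1, h, y = forward(x, W1, W2, dtype)
        backward(x, z1, h, y, t, W1, W2, dtype)
    print(np.dtype(dtype).name, "output after training:", np.round(y, 4))
```

Running the sketch shows the two formats converging to slightly different outputs for the same target, which is the kind of accuracy-versus-precision trade-off the paper quantifies in hardware, alongside the frequency, power, and latency metrics that a software model cannot capture.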
