A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA
Microprocessors and Microsystems (IF 2.6), Pub Date: 2024-02-17, DOI: 10.1016/j.micpro.2024.105037
Nour Elshahawy, Sandy A. Wasif, Maggie Mashaly, Eman Azab

Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proven effective at solving complex problems across many applications and fields. This paper focuses on optimizing the activation function (AF) block of the neural network (NN) hardware architecture. The AF block is based on a probability-based sigmoid function approximation (P-SFA) block combined with a novel real-time probability (PRT) module that calculates the probability of the input data. The proposed NN design aims to use as few hardware resources and as little area as possible while maintaining high recognition accuracy. The proposed AF module consists of two P-SFA blocks and the PRT component. The proposed NN architecture is evaluated on Field Programmable Gate Arrays (FPGAs). The design achieves a recognition accuracy of 97.84% on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs and a logical critical path delay of 8.853 ns. The P-SFA block consumes 6 mW and the PRT block consumes 5 mW.
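
To illustrate why approximating the sigmoid saves hardware, the sketch below implements a generic piecewise-linear approximation whose slopes are powers of two, so that in hardware each segment needs only a shift and an add. This is not the paper's P-SFA block, whose internals the abstract does not describe; the breakpoints and coefficients follow the well-known PLAN-style scheme, and the names `pwl_sigmoid` and `max_abs_error` are hypothetical and chosen only for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Reference sigmoid, used only to measure the approximation error."""
    return 1.0 / (1.0 + math.exp(-x))


def pwl_sigmoid(x: float) -> float:
    """Piecewise-linear sigmoid approximation (PLAN-style coefficients).

    Assumption: this is a generic stand-in, not the P-SFA design.
    Slopes 1/4, 1/8, 1/32 are powers of two, i.e. right shifts in hardware.
    """
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375   # slope 1/32
    elif a >= 1.0:
        y = 0.125 * a + 0.625       # slope 1/8
    else:
        y = 0.25 * a + 0.5          # slope 1/4
    # Use the symmetry sigmoid(-x) = 1 - sigmoid(x) for negative inputs.
    return y if x >= 0 else 1.0 - y


def max_abs_error(lo: float = -8.0, hi: float = 8.0, steps: int = 1601) -> float:
    """Largest absolute error against the reference over a uniform grid."""
    worst = 0.0
    for i in range(steps):
        x = lo + i * (hi - lo) / (steps - 1)
        worst = max(worst, abs(pwl_sigmoid(x) - sigmoid(x)))
    return worst


if __name__ == "__main__":
    print(f"max |error| on [-8, 8]: {max_abs_error():.4f}")
```

A probability-aware design such as the one the abstract describes could, in principle, use the input distribution to decide where approximation accuracy matters most, but how P-SFA and PRT actually do this is not stated here.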

Updated: 2024-02-17