An On-Chip Fully Connected Neural Network Training Hardware Accelerator Based on Brain Float Point and Sparsity Awareness
IEEE Open Journal of Circuits and Systems Pub Date : 2023-02-23 , DOI: 10.1109/ojcas.2023.3245061
Tsung-Han Tsai, Ding-Bang Lin

In recent years, deep neural networks (DNNs) have brought revolutionary progress to various fields. They are widely used in image pre-processing, image enhancement, face recognition, voice recognition, and other applications, gradually replacing traditional algorithms, and their rise has reshaped artificial intelligence. Because neural network algorithms are computationally intensive, they require GPUs or dedicated accelerator hardware for real-time computation. However, the high cost and high power consumption of GPUs result in low energy efficiency, which has recently motivated much research on digital hardware accelerators for deep neural networks. In this paper, we propose an efficient and flexible neural network training processor for fully connected layers. The proposed training processor features low power consumption, high throughput, and high energy efficiency. It exploits the sparsity of neuron activations to reduce both the number of memory accesses and the memory space required. It uses a novel reconfigurable computing architecture to maintain high performance during both forward propagation and backward propagation. The processor is implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU104 FPGA, operates at 200 MHz with a power consumption of 6.444 W, and achieves 102.43 GOPS.
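
The abstract names two ideas that are easy to illustrate in software: the brain floating point (bfloat16) number format from the title, and the use of activation sparsity to skip work. The sketch below is only an illustration under stated assumptions, not the authors' hardware design. The function names, the round-to-nearest-even choice, and the (index, value) compression scheme are all hypothetical, since the abstract does not describe the actual encoding or dataflow. The last lines derive an energy-efficiency figure from the two numbers reported in the abstract.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x.

    bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of an IEEE 754 float32. Rounding here is round-to-nearest-even
    (an assumption; hardware may simply truncate). NaN is not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def relu(values):
    """ReLU produces exact zeros, which is where activation sparsity comes from."""
    return [max(0.0, v) for v in values]


def compress_nonzero(activations):
    """Keep only (index, value) pairs for nonzero activations (hypothetical scheme)."""
    return [(i, a) for i, a in enumerate(activations) if a != 0.0]


def sparse_dot(compressed, weights):
    """Dot product that touches only the weights paired with nonzero activations."""
    return sum(a * weights[i] for i, a in compressed)


activations = relu([-1.0, 2.0, 0.0, 3.0])
weights = [0.5, 0.25, 4.0, 1.0]
compressed = compress_nonzero(activations)
dense_result = sum(a * w for a, w in zip(activations, weights))
sparse_result = sparse_dot(compressed, weights)

# Energy efficiency implied by the abstract's reported figures.
gops_per_watt = 102.43 / 6.444
```

In this toy case half of the activations are zero, so the sparse path stores two entries instead of four and reads two weights instead of four while producing the same result. Dividing the reported throughput by the reported power gives roughly 15.9 GOPS/W.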

Updated: 2023-02-23