Survey of convolutional neural network accelerators on field-programmable gate array platforms: architectures and optimization techniques
Journal of Real-Time Image Processing (IF 3), Pub Date: 2024-03-29, DOI: 10.1007/s11554-024-01442-8
Hyeonseok Hong , Dahun Choi , Namjoon Kim , Haein Lee , Beomjin Kang , Huibeom Kang , Hyun Kim

With recent advancements in high-performance computing, convolutional neural networks (CNNs) have achieved remarkable success in various vision tasks. However, along with improvements in model accuracy, model size and computational complexity have grown significantly as the number of parameters has increased. Although graphics processing unit (GPU) platforms, equipped with high-performance memory and specialized for parallel processing, are commonly used for CNN processing, their significant power consumption makes them difficult to deploy on edge devices. To address these issues, research is underway on accelerating CNN models with field-programmable gate arrays (FPGAs). FPGAs provide a high level of flexibility, allowing efficient optimization of convolution operations, which account for a significant portion of CNN computation. Additionally, FPGAs are known for their low power consumption compared to GPUs, making them a promising energy-efficient platform. In this paper, we review and summarize various approaches and techniques related to the design of FPGA-based CNN accelerators. Specifically, to study CNN accelerators comprehensively, we investigate the advantages and disadvantages of various methods for optimizing CNN accelerators, as well as previously designed efficient accelerator architectures. We expect this paper to serve as an important guideline for future hardware research in artificial intelligence.
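To make the abstract's point about convolution dominating CNN computation concrete, the following is a minimal C sketch of the plain convolution loop nest that FPGA accelerators typically restructure (by tiling, unrolling, and pipelining the inner loops). All dimensions and names (IN_CH, OUT_CH, conv2d, etc.) are illustrative assumptions, not taken from the surveyed designs.

#include <stdio.h>

#define IN_CH   3   /* input channels  */
#define OUT_CH  4   /* output channels */
#define IN_DIM  8   /* input height/width (square, no padding, stride 1) */
#define K       3   /* kernel height/width */
#define OUT_DIM (IN_DIM - K + 1)

/* Naive 2D convolution: the six nested loops below dominate CNN
 * inference cost. FPGA designs reorder, tile, unroll, and pipeline
 * them so that many multiply-accumulate units operate in parallel
 * every cycle. */
static void conv2d(const float in[IN_CH][IN_DIM][IN_DIM],
                   const float w[OUT_CH][IN_CH][K][K],
                   float out[OUT_CH][OUT_DIM][OUT_DIM])
{
    for (int oc = 0; oc < OUT_CH; oc++)
        for (int oy = 0; oy < OUT_DIM; oy++)
            for (int ox = 0; ox < OUT_DIM; ox++) {
                float acc = 0.0f;
                for (int ic = 0; ic < IN_CH; ic++)        /* often unrolled */
                    for (int ky = 0; ky < K; ky++)        /* across channels */
                        for (int kx = 0; kx < K; kx++)    /* and kernel taps */
                            acc += in[ic][oy + ky][ox + kx] * w[oc][ic][ky][kx];
                out[oc][oy][ox] = acc;
            }
}

int main(void)
{
    static float in[IN_CH][IN_DIM][IN_DIM], w[OUT_CH][IN_CH][K][K];
    static float out[OUT_CH][OUT_DIM][OUT_DIM];

    /* Deterministic dummy data, just to exercise the kernel. */
    for (int ic = 0; ic < IN_CH; ic++)
        for (int y = 0; y < IN_DIM; y++)
            for (int x = 0; x < IN_DIM; x++)
                in[ic][y][x] = (float)(ic + y + x) / 10.0f;
    for (int oc = 0; oc < OUT_CH; oc++)
        for (int ic = 0; ic < IN_CH; ic++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    w[oc][ic][ky][kx] = 0.01f * (float)(oc - ic);

    conv2d(in, w, out);
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}

The total multiply-accumulate count of this nest, OUT_CH x OUT_DIM x OUT_DIM x IN_CH x K x K, grows rapidly with channel count and resolution, which is why the optimization techniques surveyed in the paper focus on this kernel.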



Updated: 2024-03-30