FPGA-Based High-Speed Energy-Efficient 32-Bit Fixed-Point MAC Architecture for DSP Application in IoT Edge Computing
Journal of Circuits, Systems and Computers (IF 1.5), Pub Date: 2024-04-10, DOI: 10.1142/s0218126624502505
Mitul Sudhirkumar Nagar, Sohan H. Patel, Pinalkumar Engineer

Designing high-speed, energy-efficient building blocks for image and digital signal processing (DSP) architectures is an active research field. This work designs a high-speed, energy-efficient multiply-accumulate (MAC) unit to improve the performance of field-programmable gate array (FPGA)-based accelerators and soft-core processors. Three distinct 32-bit fixed-point signed MAC architectures were designed in Verilog and synthesized for the Zynq 7000 ZedBoard to identify the most efficient one. The ultimate goal is a fast, energy-efficient MAC unit that can reach speeds up to that of the DSP48 block, reducing the latency of IoT edge computing. Energy efficiency was achieved in the partial product generation (PPG) and partial product addition (PPA) stages of the proposed Booth radix-4 Dadda (BR4D)-based MAC. In the PPG stage, the width of the partial product (PP) terms was optimized with Bewick's sign extension to reduce power consumption. In the PPA stage, the PP rows are reduced with Dadda-based PPA to lower the critical path delay (CPD). The proposed BR4D MAC unit reduces dynamic power, CPD, power-delay product (PDP) and energy-delay product (EDP) by 22%, 9%, 29% and 36%, respectively, compared to a standard Booth radix-4 Wallace tree (BR4WT)-based MAC. Furthermore, the hybrid MACs (BR4WT and BR4D) were compared with current state-of-the-art (SoA) designs, and the proposed BR4D MAC was found to be 47% faster than the same design in the SoA. The proposed BR4D MAC was also evaluated under frequency scaling for low-power applications, with the clock frequency lowered in steps of 10 MHz from the maximum usable frequency (MUF) of 64 MHz down to 10 MHz. Reducing the clock frequency by 84% reduces power consumption in the same proportion and speed by 38%. Additionally, the proposed design helps extend the battery life of IoT end nodes, reducing energy consumption and EDP by 76% and 61%, respectively.
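
The two stages named in the abstract can be illustrated with a small behavioral model. The sketch below is not the authors' Verilog design: it is a plain Python illustration, under the assumption of textbook radix-4 Booth recoding and the standard Dadda height sequence (2, 3, 4, 6, 9, 13, ...). It does not model Bewick's sign extension, accumulator width, or any timing or power behavior, and all function names are hypothetical.

```python
def booth_radix4_digits(multiplier: int, width: int = 32) -> list[int]:
    """Recode a signed `width`-bit multiplier into width/2 digits in {-2,-1,0,1,2}.

    Digit i is y[2i] + y[2i-1] - 2*y[2i+1], with the implicit bit y[-1] = 0.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # y[-1]
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def partial_products(multiplicand: int, multiplier: int, width: int = 32) -> list[int]:
    """Return the width/2 partial products, each already weighted by 4**i."""
    digits = booth_radix4_digits(multiplier, width)
    return [d * multiplicand * 4**i for i, d in enumerate(digits)]


def mac(acc: int, a: int, b: int, width: int = 32) -> int:
    """Multiply-accumulate acc + a*b using Booth partial products.

    Python integers are unbounded, so accumulator overflow is not modeled.
    """
    return acc + sum(partial_products(a, b, width))


def dadda_stage_heights(rows: int) -> list[int]:
    """Target row heights for each Dadda reduction stage, tallest first."""
    heights = []
    h = 2
    while h < rows:
        heights.append(h)
        h = h * 3 // 2  # next Dadda height: floor(1.5 * h)
    return heights[::-1]


if __name__ == "__main__":
    print(mac(10, -3, 7))
    print(len(partial_products(123, 456)))
    print(dadda_stage_heights(16))
```

Radix-4 recoding halves the number of partial-product rows relative to a bit-by-bit multiplier (16 rows for a 32-bit operand in this model), and the Dadda sequence gives the number of compression stages those rows need before the final carry-propagate addition.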




Updated: 2024-04-13