LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics
ACM Transactions on Embedded Computing Systems (IF 2) Pub Date: 2024-03-18, DOI: 10.1145/3640464
Zhiqiang Que¹, Hongxiang Fan¹, Marcus Loo¹, He Li², Michaela Blott³, Maurizio Pierini⁴, Alexander Tapper⁵, Wayne Luk¹

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedentedly low latency. Incorporating FPGA-based GNNs into particle detectors is uniquely challenging because the networks must be deployed for online event selection with sub-microsecond latency at data rates of hundreds of terabytes per second in the Level-1 triggers of the CERN Large Hadron Collider experiments. This article proposes a novel outer-product based matrix multiplication approach, enhanced by exploiting the structured adjacency matrix and a column-major data layout. In addition, we propose a custom code transformation for the matrix multiplication operations, which leverages the structured sparsity patterns and binary features of the adjacency matrices to reduce latency and improve hardware efficiency. A fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented which not only finds designs with much lower latency but also finds high-accuracy designs under given latency constraints. To facilitate this, a customizable template for this low-latency GNN hardware architecture has been designed and open-sourced, enabling the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is low enough to deploy GNNs in a sub-microsecond, real-time collider trigger system, allowing the trigger to benefit from their improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
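To make the outer-product idea concrete, the sketch below illustrates, under our own assumptions rather than from the paper's released template, how a matrix product with a binary adjacency matrix can be computed as a sequence of rank-1 updates: column k of the adjacency matrix times row k of the node-feature matrix. Because the adjacency entries are binary, each multiply-accumulate degenerates into a guarded addition, which is the kind of property a code transformation can exploit to save multipliers on an FPGA. The sizes N_NODE and N_FEAT, the data types, and all identifiers are illustrative.

    // Minimal sketch of outer-product matrix multiplication with a binary
    // adjacency matrix: C = A x B accumulated one rank-1 update at a time.
    // Sizes, types, and names are assumptions for illustration only.
    #include <array>
    #include <cstddef>

    constexpr std::size_t N_NODE = 8;   // number of graph nodes (assumed)
    constexpr std::size_t N_FEAT = 4;   // feature width per node (assumed)

    using AdjCol  = std::array<bool,  N_NODE>;    // one column of the binary adjacency matrix
    using FeatRow = std::array<float, N_FEAT>;    // one row of the node-feature matrix
    using FeatMat = std::array<FeatRow, N_NODE>;  // full feature / result matrix

    // Accumulate C += a_col * b_row (one rank-1 outer-product update).
    // Because a_col is binary, each update is a guarded addition, not a multiply.
    void outer_product_update(const AdjCol& a_col, const FeatRow& b_row, FeatMat& c) {
        for (std::size_t i = 0; i < N_NODE; ++i) {   // in an HLS design this loop would be unrolled
            if (a_col[i]) {                          // structured sparsity: skip zero entries outright
                for (std::size_t j = 0; j < N_FEAT; ++j) {
                    c[i][j] += b_row[j];             // add instead of multiply-accumulate
                }
            }
        }
    }

    // C = A x B, built up column-by-column of A (column-major access of the adjacency matrix).
    void adj_matmul(const std::array<AdjCol, N_NODE>& a_cols, const FeatMat& b, FeatMat& c) {
        for (auto& row : c) row.fill(0.0f);
        for (std::size_t k = 0; k < N_NODE; ++k) {
            outer_product_update(a_cols[k], b[k], c);
        }
    }

In this form no DSP multipliers are consumed by the adjacency product, and the fixed, known-at-compile-time sparsity pattern lets a synthesis tool prune the skipped branches entirely; this is only a schematic of the technique named in the abstract, not the published architecture.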




Updated: 2024-03-22