RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network
ACM Transactions on Architecture and Code Optimization (IF 1.6) Pub Date: 2023-12-14, DOI: 10.1145/3617685
Hui Yu, Yu Zhang, Jin Zhao, Yujian Liao, Zhiying Huang, Donghao He, Lin Gu, Hai Jin, Xiaofei Liao, Haikun Liu, Bingsheng He, Jianhui Yue
Dynamic Graph Neural Networks (DGNNs) have recently attracted significant research attention across many domains, because most real-world graphs are inherently dynamic. Despite many research efforts, existing hardware/software solutions for DGNNs still suffer from substantial redundant computation and memory access overhead, because they irregularly access and recompute all graph data of every graph snapshot. To address these issues, we propose RACE, an efficient redundancy-aware accelerator that enables energy-efficient execution of DGNN models. Specifically, we incorporate a redundancy-aware incremental execution approach into the accelerator design, which directly obtains the output features of the latest graph snapshot by correctly and incrementally refining the output features of the previous snapshot, while also enabling regular accesses to vertices' input features. By traversing the graph on the fly, RACE identifies the vertices unaffected by graph updates between successive snapshots, so that their states (i.e., output features) from the previous snapshot can be reused when processing the latest snapshot. The vertices affected by graph updates are also tracked, and their new states are incrementally recomputed from their neighbors' input features in the latest snapshot to preserve correctness. In this way, the processing and accessing of graph data unaffected by graph updates can be correctly eliminated, reducing redundant computation and memory access overhead. In addition, the most frequently accessed input features are dynamically identified according to the graph topology and are preferentially kept resident in on-chip memory to reduce off-chip communication.
Experimental results show that RACE achieves average speedups of 1139× and 84.7× for DGNN inference, with average energy savings of 2242× and 234.2×, compared with state-of-the-art software DGNN implementations running on an Intel Xeon CPU and an NVIDIA A100 GPU, respectively. Moreover, for DGNN inference, RACE obtains average speedups of 13.1×, 11.7×, 10.4×, and 7.9× and energy savings of 14.8×, 12.9×, 11.5×, and 8.9× over the state-of-the-art Graph Neural Network accelerators AWB-GCN, GCNAX, ReGNN, and I-GCN, respectively.
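The incremental-execution idea described in the abstract can be illustrated with a minimal Python sketch. This is an assumption-laden simplification, not RACE's actual hardware datapath: it models a single mean-aggregation GNN layer over an adjacency list, and the names (affected_vertices, incremental_layer) are illustrative only.

```python
# Hedged sketch of redundancy-aware incremental execution over graph snapshots.
# Assumptions (not from the paper's implementation): one mean-aggregation layer,
# scalar features, and adjacency stored as {vertex: [in-neighbors]}.

def affected_vertices(edge_updates):
    """Vertices whose one-hop aggregation changes after edge insertions/deletions.

    Each update (u, v) changes v's in-neighborhood, so v must be recomputed.
    """
    return {v for (_, v) in edge_updates}

def incremental_layer(features, adj_new, prev_out, edge_updates):
    """Reuse prev_out for unaffected vertices; recompute only affected ones."""
    out = dict(prev_out)  # reuse previous-snapshot states (the redundancy saving)
    for v in affected_vertices(edge_updates):
        neigh = adj_new.get(v, [])
        # Recompute v's state from its neighbors' input features in the
        # latest snapshot, preserving correctness for updated vertices.
        out[v] = sum(features[u] for u in neigh) / len(neigh) if neigh else 0.0
    return out
```

For an L-layer model the affected set would grow by one hop per layer, since a changed vertex perturbs its neighbors' aggregations downstream; a faithful implementation would track that propagation, which this single-layer sketch omits.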




Updated: 2023-12-14