Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2024-03-23, DOI: 10.1145/3641853
Feng Xue 1 , Chenji Han 1 , Xinyu Li 1 , Junliang Wu 1 , Tingting Zhang 2 , Tianyi Liu 3 , Yifan Hao 4 , Zidong Du 4 , Qi Guo 4 , Fuxin Zhang 4

Indirect memory accesses (IMAs, i.e., A[f(B[i])]) are typical memory access patterns in applications such as graph analysis, machine learning, and databases. IMAs are composed of producer-consumer pairs, where each consumer's memory address is derived from the data loaded by its producer. Because of this inherent value dependence, IMAs exhibit poor locality, which makes prefetching ineffective. Recording the potentially complex graphs of instruction dependencies among IMA producers and consumers is challenging, so current state-of-the-art hardware prefetchers either (a) exhibit inadequate IMA identification abilities or (b) rely on the run-ahead mechanism and therefore prefetch IMAs only intermittently and insufficiently.
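
To make the A[f(B[i])] pattern concrete, the following minimal sketch models the address stream of one producer-consumer pair. It is an illustration only and not taken from the paper: the base addresses, the element size, the index transform `f`, and the contents of `B` are all hypothetical values chosen for the example.

```python
# Illustrative sketch only: models the address stream of an indirect
# memory access A[f(B[i])]. All constants below are assumptions made
# for this example, not values from the paper.

ELEM_SIZE = 8      # assumed bytes per array element
BASE_A = 0x10000   # assumed base address of array A
BASE_B = 0x20000   # assumed base address of array B


def f(x):
    """Hypothetical index transform; the identity would give plain A[B[i]]."""
    return 2 * x + 1


def ima_address_trace(b_values):
    """Return one (producer_addr, consumer_addr) pair per loop iteration."""
    trace = []
    for i, b in enumerate(b_values):
        # Producer load B[i]: the address depends only on i, so it is strided.
        producer_addr = BASE_B + i * ELEM_SIZE
        # Consumer load A[f(B[i])]: the address depends on the loaded value b.
        consumer_addr = BASE_A + f(b) * ELEM_SIZE
        trace.append((producer_addr, consumer_addr))
    return trace


def strides(addrs):
    """Differences between consecutive addresses in a stream."""
    return [nxt - cur for cur, nxt in zip(addrs, addrs[1:])]


B = [7, 0, 42, 3, 19]  # arbitrary index data held in B
trace = ima_address_trace(B)
producer_strides = strides([p for p, _ in trace])
consumer_strides = strides([c for _, c in trace])
```

In this sketch the producer stream advances by a constant stride, which a stride-based prefetcher could follow, while the consumer strides vary with the contents of `B`. That data dependence is the poor locality the abstract refers to.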

To solve this problem, we propose Tyche, an efficient and general hardware prefetcher to enhance IMA performance. Tyche adopts a bilateral propagation mechanism to precisely extract the instruction dependencies as simple chains of moderate length (rather than complex graphs). Based on the exact instruction dependencies, Tyche can accurately identify various IMA patterns, including nonlinear ones, and generate accurate prefetching requests continuously. Evaluated on a broad set of benchmarks, Tyche achieves an average performance speedup of 16.2% over the state-of-the-art spatial prefetcher Berti. More importantly, Tyche outperforms the state-of-the-art IMA prefetchers IMP, Gretch, and Vector Runahead by 15.9%, 12.8%, and 10.7%, respectively, with a lower storage overhead of only 0.57 KB.




Updated: 2024-03-23