Hardware Prefetching Tuning Method Based on Program Phase Behavior
Journal of Circuits, Systems and Computers (IF 1.5), Pub Date: 2024-03-28, DOI: 10.1142/s0218126624501585
Liangming Huang, Li Yan, Tiebin Wu

Modern high-performance processor systems universally employ hardware prefetch engines to address the “memory wall” issue. Nonetheless, prefetchers are typically activated with the default configuration at system startup, and this fixed configuration does not always achieve the intended performance in the face of varied programs and may even degrade performance. As a result, it is crucial to investigate the prefetch configuration tuning method that adapts to different program characteristics in order to take full advantage of hardware prefetching. In this study, a hardware prefetching tuning method based on program phase behavior is proposed to determine the prefetch configuration that maximizes the overall predicted performance of the program through low-overhead online profiling. In the profiling process, the branch instruction vector sampled by the hardware performance counter is used to dynamically classify the program phase behavior, and the performance profiling is performed for each type of phase. Simultaneously, the recurring program phases are no longer profiled to reduce overhead. Following the profiling, the prefetch configuration with the best predicted performance is derived by combining the performance data from each phase and its running time proportion. The results of the tests on prefetch-sensitive programs in SPEC2006, NPB, and PARSEC demonstrate that the prefetch configuration obtained using the suggested method has a geometric average performance improvement of 7.34% over the default configuration and achieves 99.34% of the optimal configuration. Furthermore, the profiling run adds only 2.24% extra overhead as compared to the default configuration.
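
The abstract's two central steps, classifying phases from sampled branch vectors and then weighting per-phase performance by running-time share, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how vectors are compared or how the per-phase data are combined, so the following are all assumptions made for illustration: the Manhattan-distance matching rule, the `threshold` value, the use of IPC as the performance metric, the configuration names, and the weighted-time formula, which treats each phase's interval count under the baseline as its time share.

```python
from dataclasses import dataclass, field


def normalize(vec):
    """Scale a branch-count vector so its entries sum to 1."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class Phase:
    signature: list                            # normalized vector of the first interval seen
    intervals: int = 0                         # sampling intervals assigned to this phase
    perf: dict = field(default_factory=dict)   # config name -> measured IPC (hypothetical metric)


def classify(phases, branch_vec, threshold=0.5):
    """Return the index of the matching phase; create a new phase if none is close.

    A returned index that already has `perf` data marks a recurring phase,
    which would not need to be profiled again.
    """
    sig = normalize(branch_vec)
    for i, phase in enumerate(phases):
        if manhattan(phase.signature, sig) < threshold:
            return i
    phases.append(Phase(signature=sig))
    return len(phases) - 1


def best_config(phases, baseline="default"):
    """Pick the config with the highest predicted whole-program speedup.

    Assumed model: predicted relative run time is the sum over phases of
    (time share under the baseline) / (per-phase speedup of the config).
    """
    total = sum(p.intervals for p in phases)

    def predicted_speedup(cfg):
        rel_time = sum(
            (p.intervals / total) * (p.perf[baseline] / p.perf[cfg])
            for p in phases
        )
        return 1.0 / rel_time

    scores = {cfg: predicted_speedup(cfg) for cfg in phases[0].perf}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    phases = []
    # Three sampled intervals; the third resembles the first (recurring phase).
    for vec in ([10, 0, 0, 10], [0, 20, 20, 0], [9, 1, 0, 10]):
        phases[classify(phases, vec)].intervals += 1
    # Hypothetical per-phase IPC measurements for three made-up configurations.
    phases[0].perf = {"default": 1.0, "aggressive": 2.0, "off": 0.8}
    phases[1].perf = {"default": 1.0, "aggressive": 0.5, "off": 1.0}
    print(best_config(phases))
```

Weighting by time share rather than averaging per-phase speedups matters when a configuration helps a long phase but hurts a short one: the harmonic-style combination above predicts the net effect on total run time instead of treating all phases equally.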




Updated: 2024-03-29