An OpenMP Runtime for Transparent Work Sharing Across Cache-Incoherent Heterogeneous Nodes (Just Accepted)
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2022-02-04, DOI: 10.1145/3505224
Robert Lyerly 1 , Carlos Bilbao 1 , Changwoo Min 1 , Christopher J. Rossbach 2 , Binoy Ravindran 1

In this work we present libHetMP, an OpenMP runtime that automatically and transparently distributes parallel computation across heterogeneous nodes. libHetMP targets platforms comprising CPUs with different instruction set architectures (ISAs) coupled by a high-speed memory interconnect, where cross-ISA binary incompatibility and non-coherent caches require that application data be marshaled before it can be shared across CPUs. Consequently, work distribution decisions must account for both the relative compute performance of the asymmetric CPUs and communication overheads. libHetMP drives workload distribution decisions without programmer intervention by measuring performance characteristics during cross-node execution. A novel HetProbe loop iteration scheduler decides whether cross-node execution is beneficial. When it is, HetProbe distributes work according to the relative performance of the CPUs; when it is not, HetProbe places all work on the set of homogeneous CPUs that provides the best performance. We evaluate libHetMP using compute kernels from several OpenMP benchmark suites and show a geometric mean 41% speedup in execution time across asymmetric CPUs. Because some workloads exhibit irregular behavior across iterations, we extend libHetMP with a second scheduler called HetProbe-I. Our evaluation of HetProbe-I shows that, by triggering periodic distribution decisions, it can further improve speedup for irregular computation, in some cases by up to 24%.
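The probe-then-decide idea in the abstract can be illustrated with a minimal sketch. This is not libHetMP's implementation: the abstract does not state which metrics HetProbe collects or how it compares them, so the `NodeProbe` fields, the `split_iterations` function, and the decision rule (compare combined cross-node throughput against the best single node's local throughput) are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeProbe:
    """Hypothetical probe result for one set of homogeneous CPUs."""
    name: str
    cross_node_rate: float  # iterations/s while sharing data across nodes (assumed metric)
    local_rate: float       # iterations/s if this node ran all work alone (assumed metric)


def split_iterations(total, probes):
    """Return {node name: iteration count} for `total` loop iterations.

    Assumed rule, not taken from the paper: cross-node execution is
    considered beneficial only if the combined cross-node throughput
    exceeds the best node's local-only throughput.
    """
    combined = sum(p.cross_node_rate for p in probes)
    best_local = max(probes, key=lambda p: p.local_rate)

    if combined <= best_local.local_rate:
        # Not beneficial: place all work on the best-performing node.
        return {p.name: (total if p is best_local else 0) for p in probes}

    # Beneficial: split in proportion to measured cross-node throughput.
    shares = {p.name: int(total * p.cross_node_rate / combined) for p in probes}
    # Hand any rounding leftover to the node that was fastest during probing.
    fastest = max(probes, key=lambda p: p.cross_node_rate)
    shares[fastest.name] += total - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Made-up numbers for two hypothetical nodes with different ISAs.
    probes = [
        NodeProbe("x86", cross_node_rate=300.0, local_rate=350.0),
        NodeProbe("arm", cross_node_rate=100.0, local_rate=120.0),
    ]
    print(split_iterations(1000, probes))
```

A periodic variant in the spirit of HetProbe-I could re-run the probe and call `split_iterations` again every fixed number of iterations, so the split tracks workloads whose per-iteration cost changes over time; the re-probing interval would be another assumed parameter.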



Updated: 2022-02-04