Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs, Concurrency and Computation: Practice and Experience

Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2024-04-12, DOI: 10.1002/cpe.8113
Verónica G. Melesse Vergara, Reuben D. Budiardja, Wayne Joubert
Summary: The Oak Ridge Leadership Computing Facility (OLCF) has a long history of supporting and promoting GPU-accelerated computing, starting with the deployment of the Titan supercomputer in 2012 and continuing with the Summit supercomputer, which has a theoretical peak performance of approximately 200 petaflops. Because the majority of Summit's computational power comes from its 27,648 GPUs, users must port their applications to one of the supported programming models in order to make efficient use of the system. To prepare for the transition to Frontier, the OLCF's exascale supercomputer, users will need to adapt to an entirely new ecosystem that includes new hardware and software technologies. First, users will need to familiarize themselves with the AMD Radeon GPU architecture. Furthermore, users who have previously relied on CUDA will need to transition to the Heterogeneous-Computing Interface for Portability (HIP) or one of the other supported programming models (e.g., OpenMP, OpenACC). In this work, we describe our initial experiences and lessons learned in porting three applications or proxy apps currently running on Summit to the HPE/Cray ecosystem to leverage the compute power of AMD GPUs: minisweep, GenASiS, and Sparkler. Each is representative of current production workloads at the OLCF, and together they span different programming languages and different programming models.

Updated: 2024-04-12