DxPU: Large Scale Disaggregated GPU Pools in the Datacenter
ACM Transactions on Architecture and Code Optimization ( IF 1.6 ) Pub Date : 2023-10-05 , DOI: 10.1145/3617995
Bowen He 1 , Xiao Zheng 2 , Yuan Chen 1 , Weinan Li 2 , Yajin Zhou 3 , Xin Long 2 , Pengcheng Zhang 2 , Xiaowei Lu 2 , Linquan Jiang 2 , Qiang Liu 2 , Dennis Cai 2 , Xiantao Zhang 2
The rapid adoption of AI and the convenience offered by cloud services have resulted in growing demand for GPUs in the cloud. Generally, GPUs are physically attached to host servers as PCIe devices. However, this fixed pairing of host servers and GPUs is highly inefficient in terms of resource utilization, upgrades, and maintenance. To address these issues, the GPU disaggregation technique has been proposed to decouple GPUs from host servers: GPUs are aggregated into a pool, and GPU node(s) are allocated from it according to user demand. However, existing GPU disaggregation systems have flaws in software-hardware compatibility, disaggregation scope, and capacity.
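The pool-and-allocate idea described above can be illustrated with a minimal sketch. This is a hypothetical toy model, not DxPU's actual design or API: the names `GpuNode`, `GpuPool`, `allocate`, and `release` are invented for illustration, and a real disaggregation system would attach GPUs over a fabric rather than mutate in-memory records.

```python
# Toy sketch of GPU disaggregation: GPUs are decoupled from hosts,
# held in a shared pool, and attached to a host on demand.
# All names here are illustrative, not part of DxPU.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GpuNode:
    node_id: int
    attached_host: Optional[str] = None  # None while the GPU sits in the pool


@dataclass
class GpuPool:
    nodes: List[GpuNode] = field(default_factory=list)

    def allocate(self, host: str, count: int) -> List[GpuNode]:
        """Attach `count` free GPU nodes to `host` and return them."""
        free = [n for n in self.nodes if n.attached_host is None]
        if len(free) < count:
            raise RuntimeError("GPU pool exhausted")
        chosen = free[:count]
        for node in chosen:
            node.attached_host = host
        return chosen

    def release(self, host: str) -> None:
        """Detach all of `host`'s GPUs and return them to the pool."""
        for node in self.nodes:
            if node.attached_host == host:
                node.attached_host = None


# A host requests 4 GPUs from an 8-GPU pool, then returns them for reuse.
pool = GpuPool([GpuNode(i) for i in range(8)])
granted = pool.allocate("vm-42", 4)
pool.release("vm-42")
```

Unlike a fixed host-GPU assembly, released GPUs immediately become available to any other host, which is the utilization benefit the abstract refers to.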

In this paper, we present a new datacenter-scale implementation of GPU disaggregation, named DxPU. DxPU efficiently solves the above problems and can flexibly allocate as many GPU node(s) as users demand. To understand the performance overhead incurred by DxPU, we build a performance model for AI-specific workloads. Guided by the modeling results, we develop a prototype system, which has been deployed in the datacenter of a leading cloud provider for a test run. We also conduct detailed experiments to evaluate the performance overhead introduced by our system. The results show that, in most user scenarios, the overhead of DxPU is less than 10% compared with native GPU servers.



Updated: 2023-10-06