Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2024-01-19, DOI: 10.1145/3634916
Zhonghua Wang ₁, Yixing Guo ₁, Kai Lu ₁, Jiguang Wan ₁, Daohui Wang ₂, Ting Yao ₂, Huatao Wu ₂

Memory disaggregation is a promising architecture for modern datacenters that separates compute and memory resources into independent pools connected by ultra-fast networks, which can improve memory utilization, reduce cost, and enable elastic scaling of compute and memory resources. However, existing memory disaggregation solutions based on remote direct memory access (RDMA) suffer from high latency and additional overheads, including page faults and code refactoring. Emerging cache-coherent interconnects such as CXL offer opportunities to reconstruct high-performance memory disaggregation. However, existing CXL-based approaches are limited by physical distance and cannot be deployed across racks.

In this article, we propose Rcmp, a novel low-latency and highly scalable memory disaggregation system based on RDMA and CXL. Its key feature is that it uses CXL to improve the performance of RDMA-based systems and uses RDMA to overcome CXL’s distance limitation. To address the mismatch between RDMA and CXL in terms of granularity, communication, and performance, Rcmp (1) provides global page-based memory space management and enables fine-grained data access, (2) designs an efficient communication mechanism to avoid communication blocking issues, (3) proposes a hot-page identification and swapping strategy to reduce RDMA communications, and (4) designs an RDMA-optimized RPC framework to accelerate RDMA transfers. We implement a prototype of Rcmp and evaluate its performance using micro-benchmarks and a key-value store running YCSB benchmarks. The results show that Rcmp can achieve 5.2× lower latency and 3.8× higher throughput than RDMA-based systems. We also demonstrate that Rcmp scales well as the number of nodes increases, without compromising performance.
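The abstract does not describe how hot pages are identified, so the following is only a minimal sketch of the general idea behind points (1) and (3): addresses are split into a page id and an offset, remote accesses are counted per page, and a page is swapped into the local pool once it crosses a threshold. The names `PAGE_SIZE`, `HOT_THRESHOLD`, `split_address`, and `HotPageTracker`, as well as the threshold policy itself, are assumptions made for illustration and are not taken from Rcmp.

```python
from collections import Counter

PAGE_SIZE = 4096    # hypothetical page size in bytes
HOT_THRESHOLD = 3   # hypothetical: remote accesses before a page counts as hot


def split_address(global_addr):
    """Split a global address into (page id, offset within the page)."""
    return divmod(global_addr, PAGE_SIZE)


class HotPageTracker:
    """Toy model: count remote accesses per page and swap hot pages locally."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.remote_hits = Counter()   # page id -> remote access count
        self.local_pages = set()       # pages already swapped into the local pool

    def access(self, global_addr):
        """Return which path serves the access: 'local', 'remote', or 'swapped'."""
        page_id, _offset = split_address(global_addr)
        if page_id in self.local_pages:
            return "local"             # assumed to be served without the network
        self.remote_hits[page_id] += 1
        if self.remote_hits[page_id] >= self.threshold:
            # The page is now considered hot: move it to the local pool so that
            # later accesses to it no longer need a remote transfer.
            self.local_pages.add(page_id)
            del self.remote_hits[page_id]
            return "swapped"
        return "remote"
```

In this toy model, repeated accesses to the same page go over the remote path until the threshold is reached, after which they are served locally. A real system would also need eviction, concurrency control, and coherence handling, none of which are modeled here.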

Updated: 2024-01-19
