Accurate and fast congestion feedback in MEC-enabled RDMA datacenters
Journal of Cloud Computing (IF 3.418), Pub Date: 2024-03-25, DOI: 10.1186/s13677-024-00642-8
Xin He, Feifan Liang, Weibei Fan, Junchang Wang, Lei Han, Fu Xiao, Wanchun Dou

Mobile edge computing (MEC) is a novel computing paradigm that pushes computation and storage resources to the edge of the network. The interconnection of edge servers forms small-scale data centers, enabling MEC to provide low-latency network services for mobile users. Remote Direct Memory Access (RDMA) is now widely deployed in such data centers to reduce CPU overhead and network latency. Many congestion control mechanisms have been proposed for RDMA data centers, aiming to provide low-latency data delivery and high-throughput network services. However, our fine-grained experimental analysis reveals that existing congestion control mechanisms still have performance limitations due to inappropriate congestion notifications and a long congestion feedback cycle. In this paper, we propose Mercury, an accurate and fast congestion feedback mechanism. Mercury comprises two key components: (1) state-driven congestion detection and (2) window-based congestion notification. Specifically, state-driven congestion detection monitors the queue length and the number of packets received at the switch egress port when Priority Flow Control (PFC) is triggered. It determines the states of egress ports and identifies the flows that actually contribute to congestion. Window-based congestion notification then calculates window sizes for the congested flows and rapidly returns Congestion Notification Packets (CNPs) carrying the window information to the sender, which facilitates rate adjustment of the congested flows. Mercury is compatible with existing RDMA congestion control (CC) mechanisms and can be easily implemented in switches. We employ real-world data sets and conduct both micro-benchmark and large-scale simulations to evaluate the performance of Mercury. The results indicate that, thanks to its accurate and fast congestion feedback, Mercury reduces the 99th-percentile tail flow completion time by up to 45.1%, 41.8%, 38.7%, 30.9%, and 37.9% compared with Timely, DCQCN, DCQCN+TCD, PACC, and HPCC, respectively.

Updated: 2024-03-25