Abstract
Remote Direct Memory Access (RDMA) is a widely adopted optimization in datacenter networking that outperforms traditional kernel-based TCP/IP networking through mechanisms such as kernel bypass and hardware offloading. However, RDMA faces a scalability challenge in connection management: the limited on-chip memory capacity of the RDMA Network Interface Card (RNIC) prevents all connection context from residing in the RNIC's memory, inducing considerable performance degradation when a large number of connections are maintained. In this paper, we propose a novel RNIC microarchitecture that achieves peak performance and scales well with the number of connections. First, we model the RNIC and identify two key factors that degrade performance as the number of connections grows: head-of-line blocking when accessing the connection context, and connection context dependency in transmission processing. To address the head-of-line blocking problem, we combine a non-blocking connection requester with a connection context management module that processes prepared connections first, sustaining the peak message rate as the number of connections grows. In addition, to eliminate connection context dependency in the RNIC, we deploy a latency-hiding connection context scheduling strategy that maintains low latency as the number of connections increases. We implement and evaluate our design, demonstrating that it sustains the peak message rate (66.4 Mop/s) and low latency (3.89 µs) while scaling to over 50,000 connections with a smaller on-chip memory footprint.
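The head-of-line-avoidance idea in the abstract can be illustrated with a toy software model (a sketch only; the names `NonBlockingScheduler`, `submit`, and `context_arrived` are illustrative and not part of the paper's hardware design): requests whose connection context is already resident on-chip are served immediately, while a context miss triggers a fetch and the request waits aside instead of stalling the whole queue.

```python
from collections import deque

class NonBlockingScheduler:
    """Toy model of non-blocking connection scheduling: requests whose
    connection context is on-chip are processed first; a context miss
    parks the request until the fetch completes, so it never blocks
    later, already-prepared connections (no head-of-line blocking)."""

    def __init__(self, cached_contexts):
        self.cached = set(cached_contexts)   # contexts resident on-chip
        self.pending = deque()               # requests waiting on a context fetch
        self.order = []                      # observed processing order

    def submit(self, conn_id):
        if conn_id in self.cached:
            self.order.append(conn_id)       # prepared: process immediately
        else:
            self.pending.append(conn_id)     # miss: fetch context, do not stall

    def context_arrived(self, conn_id):
        # A fetched context is now on-chip; drain its parked requests in order.
        self.cached.add(conn_id)
        drained = [c for c in self.pending if c == conn_id]
        self.pending = deque(c for c in self.pending if c != conn_id)
        self.order.extend(drained)

sched = NonBlockingScheduler(cached_contexts={1, 3})
for conn in [2, 1, 3, 2]:        # connection 2's context is not cached
    sched.submit(conn)
sched.context_arrived(2)
print(sched.order)               # → [1, 3, 2, 2]: prepared connections went first
```

In a blocking design the first request on connection 2 would stall connections 1 and 3 behind it; here they complete while the context fetch for connection 2 is in flight, which is the behavior the non-blocking connection requester is meant to preserve at scale.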
Availability of data and materials
All data generated or analyzed during this study are included in this published article; the relevant program can be found in [14].
References
Zhu Y et al (2015) Congestion control for large-scale RDMA deployments. ACM SIGCOMM Comput Commun Rev 45(4):523–536
Guo C et al (2016) RDMA over commodity ethernet at scale. In: Proceedings of the 2016 ACM SIGCOMM Conference
Gao Y et al (2021) When cloud storage meets RDMA. In: 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)
Li Y et al (2019) HPCC: High precision congestion control. In: Proceedings of the ACM Special Interest Group on Data Communication. pp 44-58
DatacenterDynamics (2024) What is a hyperscale data center? Website. https://www.datacenterdynamics.com/en/analysis/what-is-a-hyperscale-data-center/
Zamanian E et al (2016) The end of a myth: Distributed transactions can scale. arXiv preprint arXiv:1607.00655
Chen Y, Lu Y, Shu J (2019) Scalable RDMA RPC on reliable connection with efficient resource sharing. In: Proceedings of the Fourteenth EuroSys Conference 2019
Sidler D et al (2020) StRoM: smart remote memory. In: Proceedings of the Fifteenth European Conference on Computer Systems
Xilinx (2022) ERNIC. Website. https://www.xilinx.com/products/intellectual-property/ef-di-ernic.html
Schelten N et al (2020) A high-throughput, resource-efficient implementation of the RoCEv2 remote DMA protocol for network-attached hardware accelerators. In: 2020 International Conference on Field-Programmable Technology (ICFPT). IEEE
NVIDIA (2022) Mellanox adapters programmer’s reference manual (PRM). Website. https://network.nvidia.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
Wang X et al (2021) StaR: breaking the scalability limit for RDMA. In: 2021 IEEE 29th International Conference on Network Protocols (ICNP). IEEE
InfiniBand Trade Association (2022) InfiniBand architecture specification, Vol 1. Website. https://cw.infinibandta.org/document/dl/8567
NCSG Group (2022) csRNA. https://github.com/ncsg-group/csRNA
Kang N et al (2022) csRNA: Connection-Scalable RDMA NIC Architecture in Datacenter Environment. In: 2022 IEEE 40th International Conference on Computer Design (ICCD). IEEE
Kalia A, Kaminsky M, Andersen DG (2016) Design guidelines for high performance RDMA systems. In: 2016 USENIX Annual Technical Conference (USENIX ATC 16)
Mellanox Technologies (2023) mthca driver. Website. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/infiniband/hw/mthca
Kalia A, Kaminsky M, Andersen DG (2019) Datacenter RPCs can be general and fast. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19)
Neugebauer R et al (2018) Understanding PCIe performance for end host networking. In: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
Wang Z et al (2020) Shuhai: benchmarking high bandwidth memory on FPGAs. In: 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE
Wang Z et al (2023) SRNIC: a scalable architecture for RDMA NICs. In: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)
NVIDIA (2024) ConnectX-5 InfiniBand adapter cards. Website. https://nvdam.widen.net/s/pkxbnmbgkh/networking-infiniband-datasheet-connectx-5-2069273
Mellanox Technologies (2023) mlx5 driver. Website. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/infiniband/hw/mlx5
Lowe-Power J et al (2020) The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152
Monga SK, Kashyap S, Min C (2021) Birds of a feather flock together: scaling RDMA RPCs with flock. In: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
Mellanox Technologies (2022) libibverbs. Website. https://github.com/linux-rdma/rdma-core/tree/master/libibverbs
NVIDIA (2024) ConnectX-6 InfiniBand adapter cards. Website. https://nvdam.widen.net/s/5j7xtzqfxd/connectx-6-infiniband-datasheet-1987500-r2
Pan P et al (2019) Towards stateless RNIC for data center networks. In: Proceedings of the 3rd Asia-Pacific Workshop on Networking 2019
Tsai S-Y, Zhang Y (2017) LITE kernel RDMA support for datacenter applications. In: Proceedings of the 26th Symposium on Operating Systems Principles
Singhvi A et al (2020) 1RMA: re-envisioning remote memory access for multi-tenant datacenters. In: Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication
NVIDIA (2024) Dynamically connected transport. Website. https://docs.nvidia.com/networking/display/bf3dpu/introduction
Mittal R et al (2018) Revisiting network support for RDMA. In: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication
Funding
This work is supported in part by the National Key Research and Development Program of China (No. 2021YFB0300700), the International Partnership Program of the Chinese Academy of Sciences (No. 171111KYSB20180011), the National Science and Technology Innovation Program 2030 (No. 2020AAA0104402), NSFC (No. 61972380), and the Research and Innovation Program of the State Key Laboratory of Computer Architecture, ICT, CAS (No. CARCH5407).
Author information
Contributions
Ning Kang, Guangming Tan, Guojun Yuan, and Zhan Wang wrote the main part of the manuscript. Ning Kang, Fan Yang, Zhenlong Ma, and Xiaoxiao Ma are responsible for the design and evaluation. All authors reviewed the manuscript.
Ethics declarations
Ethical Approval
Not applicable.
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kang, N., Wang, Z., Yang, F. et al. Towards connection-scalable RNIC architecture. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05991-4