Abstract
Many researchers are turning to SmartNIC offloads to improve the performance of high-performance networked systems. In this editorial, I discuss why SmartNICs are an especially powerful form factor for improving I/O-intensive applications, and how their position in the dataplane enables them to take on a central role in managing I/O. Rather than focusing on the benefits of individual offloads, this paper explores the position of SmartNICs in the overall system integration of datacenter servers at the hardware and software level. I argue that SmartNICs should be viewed as 'data movement controllers' (NIC-DMCs), which are responsible for the tasks involved in moving data between the network, CPU, accelerators, and other endpoints: multiplexing/steering, interfacing between protocols, and enforcing I/O policies. I then enumerate open questions about how the hardware and software systems of the future will evolve to accommodate a dedicated NIC-DMC that is independent of the CPU complex.
Index Terms
- The I/O Driven Server: From SmartNICs to Data Movement Controllers