
SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs

Published: 09 April 2019

Abstract

Recent GPUs enable Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices such as NVMe SSDs, which improves efficiency by excluding the CPU from the data path between them. Unfortunately, using p2p to access files is challenging because of the subtleties of low-level, non-standard interfaces, which bypass the OS file I/O layers and may hurt system performance. Developers must possess intimate knowledge of these low-level interfaces to manually handle data consistency and misaligned accesses.
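The abstract names misaligned accesses as one subtlety that developers otherwise handle by hand. The following is a minimal sketch under a hypothetical assumption: p2p DMA can move only whole device blocks of a fixed size. Under that assumption, an arbitrary file read splits into three parts. An unaligned head and an unaligned tail go through the regular CPU path, and a block-aligned body is the p2p candidate. `split_request`, `Split`, and the 4 KiB `BLOCK_SIZE` are illustrative names and values, not SPIN's interface. The sketch also ignores the mapping from file offsets to device blocks, which a real file system must resolve.

```python
from typing import NamedTuple, Tuple

# Hypothetical device block size; real NVMe logical block sizes vary (e.g. 512 B or 4 KiB).
BLOCK_SIZE = 4096

Range = Tuple[int, int]  # (offset, length) in bytes


class Split(NamedTuple):
    head: Range  # unaligned prefix, read through the regular CPU path
    body: Range  # whole blocks, candidate for p2p DMA
    tail: Range  # unaligned suffix, read through the regular CPU path


def split_request(offset: int, length: int, block: int = BLOCK_SIZE) -> Split:
    """Split the byte range [offset, offset + length) on block boundaries."""
    end = offset + length
    body_start = -(-offset // block) * block  # round offset up to a block boundary
    body_end = (end // block) * block         # round end down to a block boundary
    if body_start >= body_end:
        # No complete block inside the range, so nothing can go over p2p.
        return Split(head=(offset, length), body=(end, 0), tail=(end, 0))
    return Split(
        head=(offset, body_start - offset),
        body=(body_start, body_end - body_start),
        tail=(body_end, end - body_end),
    )


print(split_request(1000, 10000))
# Split(head=(1000, 3096), body=(4096, 4096), tail=(8192, 2808))
```

In this model, a small read that contains no complete block gets no p2p share at all. This is one reason a purely p2p path can be a poor fit for some access patterns.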

We present SPIN, which integrates p2p into the standard OS file I/O stack and dynamically activates p2p where appropriate, transparently to the user. It combines p2p with page cache accesses and re-enables read-ahead for sequential reads, all while maintaining standard POSIX file system consistency, portability across GPUs and SSDs, and compatibility with virtual block devices such as software RAID.
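The abstract says that SPIN combines p2p with page cache accesses but does not say how the two are mixed. The following is a rough, hypothetical illustration of why mixing them matters for consistency. A block already in the page cache may hold data newer than the copy on the SSD, so it should be read from the cache. Uncached blocks can be fetched with p2p. The function `plan_read` and its block-granular model are assumptions made for illustration, not SPIN's actual algorithm.

```python
from typing import List, Set, Tuple

Segment = Tuple[str, int, int]  # (source, first_block, block_count)


def plan_read(first: int, count: int, cached: Set[int]) -> List[Segment]:
    """Route each block of a read to the page cache or to p2p DMA.

    Cached blocks are copied from the page cache because they may hold data
    newer than the SSD copy; uncached blocks are fetched with p2p. Consecutive
    blocks with the same source are merged into a single segment.
    """
    plan: List[Segment] = []
    for blk in range(first, first + count):
        source = "cache" if blk in cached else "p2p"
        if plan and plan[-1][0] == source:
            # Extend the current run (blocks are visited in order, so it is contiguous).
            _, start, n = plan[-1]
            plan[-1] = (source, start, n + 1)
        else:
            plan.append((source, blk, 1))
    return plan


print(plan_read(0, 8, cached={2, 3, 6}))
# [('p2p', 0, 2), ('cache', 2, 2), ('p2p', 4, 2), ('cache', 6, 1), ('p2p', 7, 1)]
```

Merging adjacent blocks keeps the number of separate transfers small. A practical design might also weigh the overhead of many tiny p2p transfers against simply copying through the CPU.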

We evaluate SPIN on NVIDIA and AMD GPUs using standard file I/O benchmarks, application traces, and end-to-end experiments. SPIN achieves significant speedups across a wide range of workloads, exceeding p2p throughput by up to an order of magnitude. It also speeds up an aerial imagery rendering application by 2.6× by dynamically adapting to its input-dependent file access pattern, enables 3.3× higher throughput for a GPU-accelerated log server, and makes a highly optimized GPU-accelerated image collage application run 29% faster with only 30 changed lines of code.

          • Published in

            ACM Transactions on Computer Systems, Volume 36, Issue 2
            May 2018
            112 pages
            ISSN:0734-2071
            EISSN:1557-7333
            DOI:10.1145/3323874

            Copyright © 2019 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 9 April 2019
            • Accepted: 1 January 2019
            • Revised: 1 October 2018
            • Received: 1 January 2018

            Qualifiers

            • research-article
            • Research
            • Refereed
