SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs

Authors:
Shai Bergman, Technion-Israel Institute of Technology, Haifa, Israel
Tanya Brokhman, Technion-Israel Institute of Technology, Haifa, Israel
Tzachi Cohen, Technion-Israel Institute of Technology, Haifa, Israel
Mark Silberstein, Technion-Israel Institute of Technology, Haifa, Israel


ACM Transactions on Computer Systems, Volume 36, Issue 2, Article No. 5, pp. 1–26
https://doi.org/10.1145/3309987

Published: 09 April 2019


Abstract

Recent GPUs support Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices such as NVMe SSDs, which removes the CPU from the data path between them and improves efficiency. Unfortunately, using p2p to access files is challenging: the low-level, non-standard p2p interfaces bypass the OS file I/O layers and may hurt system performance, and developers must have intimate knowledge of these interfaces to manually handle the subtleties of data consistency and misaligned accesses.
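The misaligned-access problem can be made concrete with a small sketch. Assuming, for illustration only, that the SSD transfers data in 4 KiB blocks (the real granularity depends on the device's logical block size), a file read splits into a block-aligned middle that a DMA engine could move directly and unaligned head and tail fragments that must take another path. The function `split_read` and the labels `"p2p"` and `"cpu"` are hypothetical and are not part of SPIN's interface.

```python
from typing import List, Tuple

# Assumed transfer granularity for illustration only; real NVMe devices use
# their own logical block size (commonly 512 B or 4 KiB).
BLOCK = 4096


def split_read(offset: int, length: int, block: int = BLOCK) -> List[Tuple[str, int, int]]:
    """Split the byte range [offset, offset + length) into pieces.

    Hypothetical labels:
      "p2p" - whole blocks that a device could DMA directly to GPU memory
      "cpu" - unaligned head/tail bytes served through the regular CPU path
    """
    if length <= 0:
        return []
    end = offset + length
    aligned_start = -(-offset // block) * block  # round offset up to a block boundary
    aligned_end = (end // block) * block         # round end down to a block boundary
    if aligned_start >= aligned_end:
        # The range contains no complete block, so nothing can go over p2p.
        return [("cpu", offset, end)]
    parts: List[Tuple[str, int, int]] = []
    if offset < aligned_start:
        parts.append(("cpu", offset, aligned_start))
    parts.append(("p2p", aligned_start, aligned_end))
    if aligned_end < end:
        parts.append(("cpu", aligned_end, end))
    return parts


if __name__ == "__main__":
    # A 10,000-byte read starting at byte 100.
    print(split_read(100, 10_000))
    # [('cpu', 100, 4096), ('p2p', 4096, 8192), ('cpu', 8192, 10100)]
```

Only the middle range is eligible for direct DMA here; a request smaller than one block, or one that straddles a single block boundary, falls entirely to the CPU path.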

We present SPIN, which integrates p2p into the standard OS file I/O stack and dynamically activates p2p where appropriate, transparently to the user. SPIN combines p2p with page cache accesses and re-enables read-ahead for sequential reads, while maintaining standard POSIX file system consistency, portability across GPUs and SSDs, and compatibility with virtual block devices such as software RAID.
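The idea of combining page cache accesses with p2p can be illustrated at page granularity. The sketch below assumes 4 KiB pages and a hypothetical set `cached` holding the indices of file pages already resident in the page cache; `route_pages` is likewise a hypothetical name. Serving resident pages from the cache matters for consistency, since a cached page may hold data newer than the copy on the SSD. This illustrates the general idea only and is not SPIN's kernel implementation.

```python
from typing import List, Set, Tuple

PAGE = 4096  # assumed page size in bytes


def route_pages(offset: int, length: int, cached: Set[int]) -> List[Tuple[str, int, int]]:
    """Route each page touched by [offset, offset + length) to a data source.

    Returns runs of (source, first_page, end_page) with end_page exclusive.
    "cache": page is resident in the page cache (may be newer than the SSD copy)
    "p2p":   page is not resident, so it could be read directly from the SSD
    """
    if length <= 0:
        return []
    first = offset // PAGE
    last = (offset + length - 1) // PAGE
    runs: List[Tuple[str, int, int]] = []
    for page in range(first, last + 1):
        src = "cache" if page in cached else "p2p"
        if runs and runs[-1][0] == src and runs[-1][2] == page:
            # Extend the previous run so contiguous pages share one request.
            runs[-1] = (src, runs[-1][1], page + 1)
        else:
            runs.append((src, page, page + 1))
    return runs


if __name__ == "__main__":
    # Pages 1 and 2 of a five-page read are already cached.
    print(route_pages(0, 5 * PAGE, {1, 2}))
    # [('p2p', 0, 1), ('cache', 1, 3), ('p2p', 3, 5)]
```

Coalescing consecutive pages into runs keeps the number of separate requests small. A real implementation would also have to handle alignment, read-ahead, and writes that race with the p2p transfer, all of which this sketch ignores.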

We evaluate SPIN on NVIDIA and AMD GPUs using standard file I/O benchmarks, application traces, and end-to-end experiments. SPIN achieves significant speedups across a wide range of workloads, exceeding p2p throughput by up to an order of magnitude. It also speeds up an aerial imagery rendering application by 2.6× by dynamically adapting to its input-dependent file access pattern, enables 3.3× higher throughput for a GPU-accelerated log server, and makes a highly optimized GPU-accelerated image collage application run 29% faster with only 30 changed lines of code.


Index Terms

SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
    2. Extra-functional properties
      1. Software performance


Published in
ACM Transactions on Computer Systems Volume 36, Issue 2
May 2018
112 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/3323874
Editor: Michael Swift, University of Wisconsin, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 April 2019
- Accepted: 1 January 2019
- Revised: 1 October 2018
- Received: 1 January 2018
Published in TOCS Volume 36, Issue 2

Author Tags
Accelerators
GPU
I/O subsystem
file systems
operating systems
Qualifiers
- research-article
- Research
- Refereed