SPIN
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2019-04-10, DOI: 10.1145/3309987
Shai Bergman, Tanya Brokhman, Tzachi Cohen, Mark Silberstein

Recent GPUs enable Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices like NVMe SSDs, excluding the CPU from the data path between them for efficiency. Unfortunately, using p2p to access files is challenging because of the subtleties of low-level non-standard interfaces, which bypass the OS file I/O layers and may hurt system performance. Developers must possess intimate knowledge of low-level interfaces to manually handle the subtleties of data consistency and misaligned accesses. We present SPIN, which integrates p2p into the standard OS file I/O stack, dynamically activating p2p where appropriate, transparently to the user. It combines p2p with page cache accesses and re-enables read-ahead for sequential reads, all while maintaining standard POSIX FS consistency, portability across GPUs and SSDs, and compatibility with virtual block devices such as software RAID. We evaluate SPIN on NVIDIA and AMD GPUs using standard file I/O benchmarks, application traces, and end-to-end experiments. SPIN achieves significant performance speedups across a wide range of workloads, exceeding p2p throughput by up to an order of magnitude. It also boosts the performance of an aerial imagery rendering application by 2.6× by dynamically adapting to its input-dependent file access pattern, enables 3.3× higher throughput for a GPU-accelerated log server, and enables 29% faster execution for the highly optimized GPU-accelerated image collage with only 30 changed lines of code.

Updated: 2019-04-10