Architectural support for sharing, isolating and virtualizing FPGA resources.
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2024-02-16, DOI: 10.1145/3648475
Panagiotis Miliadis, Dimitris Theodoropoulos, Dionisios N. Pnevmatikatos, Nectarios Koziris

FPGAs are increasingly popular in cloud environments for their ability to offer on-demand acceleration and improved compute efficiency. Providers would like to increase utilization by multiplexing customers on a single device, similar to how processing cores and memory are shared. Nonetheless, multi-tenancy still faces major architectural limitations, including: a) inefficient sharing of memory interfaces across hardware tasks, exacerbated by technological limitations and peculiarities, b) insufficient solutions for performance isolation, data isolation, and high quality of service, c) absent or simplistic allocation strategies for distributing external FPGA memory across hardware tasks. This paper presents a full-stack solution for enabling multi-tenancy on FPGAs. Specifically, our work proposes an intra-FPGA virtualization layer to share FPGA interfaces and resources across tenants. To achieve efficient inter-connectivity between virtual FPGAs (vFPGAs) and external interfaces, we employ a compact network-on-chip architecture to optimize resource utilization. Dedicated memory management units implement the concept of virtual memory in FPGAs, providing mechanisms to isolate the address space and enable memory protection. We also introduce a memory segmentation scheme to effectively allocate FPGA address space and enhance isolation through hardware-software support, while preserving the efficacy of memory transactions. We assess our solution on an Alveo U250 Data Center FPGA Card, employing ten real-world benchmarks from the Rodinia and Rosetta suites. Our framework preserves the performance of hardware tasks from a non-virtualized environment, while enhancing the device aggregate throughput through resource sharing: up to 3.96x in isolated settings and up to 2.31x in highly congested settings, where an external interface is shared across four vFPGAs. Finally, our work ensures high quality of service, with hardware tasks achieving up to 0.95x of their native performance, even when resource sharing introduces interference from other accelerators.
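
The abstract does not describe how the memory management units or the segmentation scheme are built, so the following is only an illustrative sketch of the general idea, not the authors' design. It assumes one contiguous base-and-limit segment per vFPGA; the names `SegmentMMU`, `assign`, `translate`, and `SegmentFault` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Raised when an access falls outside the tenant's own segment."""


@dataclass(frozen=True)
class Segment:
    base: int   # physical start address (hypothetical layout)
    limit: int  # segment size in bytes


class SegmentMMU:
    """Toy base-and-limit translator: one segment per vFPGA (an assumption)."""

    def __init__(self):
        self._segments = {}  # maps vFPGA id -> Segment

    def assign(self, vfpga_id, base, limit):
        """Give a vFPGA a physical segment, refusing any overlap with others."""
        new_end = base + limit
        for other_id, seg in self._segments.items():
            overlaps = base < seg.base + seg.limit and seg.base < new_end
            if other_id != vfpga_id and overlaps:
                raise ValueError(f"segment overlaps vFPGA {other_id}")
        self._segments[vfpga_id] = Segment(base, limit)

    def translate(self, vfpga_id, vaddr, length=1):
        """Map a tenant-local address to a physical one, with a bounds check."""
        seg = self._segments[vfpga_id]
        if vaddr < 0 or length <= 0 or vaddr + length > seg.limit:
            raise SegmentFault(f"vFPGA {vfpga_id}: access outside its segment")
        return seg.base + vaddr


mmu = SegmentMMU()
mmu.assign(0, base=0x0000, limit=0x1000)
mmu.assign(1, base=0x1000, limit=0x1000)
physical = mmu.translate(1, 0x10)  # tenant 1 sees address 0x10, memory sees 0x1010
```

In this sketch each tenant addresses memory from zero, and isolation comes from two checks: segments may not overlap when assigned, and every access is bounds-checked before the base is added. Translation is a single addition, which is one plausible way a segmentation scheme could keep memory transactions cheap.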



Updated: 2024-02-16