FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2024-02-23, DOI: 10.1145/3649135
Haitao Du, Yuhan Qin, Song Chen, Yi Kang

DRAM memory is a performance bottleneck for many applications, due to its high access latency. Previous work has mainly focused on data locality, introducing small-but-fast regions to cache frequently accessed data, thereby reducing the average latency. However, these locality-based designs face three challenges in modern multi-core systems: 1) Inter-application interference leads to random memory access traffic. 2) Fairness issues prevent the memory controller from over-prioritizing data locality. 3) Write-intensive applications have much lower locality and evict substantial numbers of dirty entries. With frequent data movement between the fast in-DRAM cache and slow regular arrays, the overhead induced by moving data may even offset the performance and energy benefits of in-DRAM caching.

In this paper, we decouple the data movement process into two distinct phases. The first phase is Load-Reduced Destructive Activation (LRDA), which destructively promotes data into the in-DRAM cache. The second phase is Delayed Cycle-Stealing Restoration (DCSR), which restores the original data when the DRAM bank is idle. LRDA decouples the most time-consuming restoration phase from activation, and DCSR hides the restoration latency through prevalent bank-level parallelism. We propose FASA-DRAM, which incorporates destructive activation and delayed restoration techniques to enable both in-DRAM caching and proactive latency-hiding mechanisms. Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area overhead. Furthermore, FASA-DRAM outperforms state-of-the-art designs in both performance and energy efficiency.
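The two-phase idea can be illustrated with a toy model. This is only a sketch of the general principle (promote now, restore later when idle), not the authors' design: the `Bank` class, its methods, and all latency constants are hypothetical and chosen purely for illustration, and cache eviction is ignored.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical latencies in arbitrary cycles; NOT taken from the paper.
T_DESTRUCTIVE_ACT = 5   # activation that skips restoring the source row
T_RESTORE = 10          # restoring the source row afterwards
T_CACHE_HIT = 3         # access served by the fast in-DRAM cache


@dataclass
class Bank:
    """Toy model of one bank with a fast region (assumed, simplified)."""
    cache: set = field(default_factory=set)        # rows promoted to the fast region
    pending: deque = field(default_factory=deque)  # rows awaiting restoration
    busy_cycles: int = 0     # cycles spent on the demand path
    hidden_cycles: int = 0   # restoration cycles performed while idle

    def access(self, row: int) -> int:
        """Serve a demand access and return its latency."""
        if row in self.cache:
            latency = T_CACHE_HIT
        else:
            # Phase 1 (LRDA-like): promote destructively, defer restoration.
            self.cache.add(row)
            self.pending.append(row)
            latency = T_DESTRUCTIVE_ACT
        self.busy_cycles += latency
        return latency

    def idle(self, cycles: int) -> int:
        """Phase 2 (DCSR-like): spend idle cycles restoring deferred rows.

        Returns the number of rows restored.
        """
        restored = 0
        while self.pending and cycles >= T_RESTORE:
            self.pending.popleft()
            cycles -= T_RESTORE
            self.hidden_cycles += T_RESTORE
            restored += 1
        return restored


if __name__ == "__main__":
    bank = Bank()
    print([bank.access(r) for r in (7, 7, 9)])  # two misses and one hit
    print(bank.idle(25))                        # rows restored during idle time
```

In this sketch a miss costs only `T_DESTRUCTIVE_ACT` on the demand path, whereas a coupled design would pay `T_DESTRUCTIVE_ACT + T_RESTORE` before the bank is free again; the restoration cost is moved into `idle`, where it does not delay demand accesses.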




Updated: 2024-02-23