PMAlloc: A Holistic Approach to Improving Persistent Memory Allocation
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2024-02-03, DOI: 10.1145/3643886
Zheng Dang 1 , Shuibing He 1 , Xuechen Zhang 2 , Peiyi Hong , Zhenxin Li , Xinyu Chen , Haozhe Song 1 , Xian-He Sun 3 , Gang Chen 1

Persistent memory allocation is a fundamental building block for developing high-performance, in-memory applications. Existing persistent memory allocators suffer from several performance issues. First, their poor heap metadata management may introduce repeated cache line flushes and small random accesses in persistent memory. Second, they use static slab segregation, which dramatically increases memory consumption when allocation request sizes change. Third, they are not aware of NUMA effects, which leads to remote persistent memory accesses during allocation and deallocation. In this paper, we design a novel allocator, named PMAlloc, to solve these issues simultaneously. (1) PMAlloc eliminates cache line reflushes by mapping contiguous data blocks in slabs to interleaved metadata entries stored in different cache lines. (2) It writes small metadata units to a persistent bookkeeping log in a sequential pattern to remove random heap metadata accesses in persistent memory. (3) Instead of using static slab segregation, it supports slab morphing, which allows slabs to be transformed between size classes to significantly improve slab usage. (4) It uses a local-first allocation policy to avoid allocating remote memory blocks, and it supports a two-phase deallocation mechanism, consisting of recording and synchronization, to minimize the number of remote memory accesses during deallocation. PMAlloc is complementary to existing consistency models. Results on six benchmarks demonstrate that PMAlloc improves the performance of state-of-the-art persistent memory allocators by up to 6.4x and 57x for small and large allocations, respectively. PMAlloc with NUMA optimizations brings a 2.9x speedup in multi-socket evaluation and is up to 36x faster than other persistent memory allocators. Using PMAlloc reduces memory usage by up to 57.8%. In addition, we integrate PMAlloc into a persistent FPTree. Compared to state-of-the-art allocators, PMAlloc improves the performance of this application by up to 3.1x.
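
The interleaved metadata mapping in point (1) can be illustrated with a small sketch. This is not PMAlloc's actual layout, which the abstract does not specify. It assumes 64-byte cache lines, 8-byte metadata entries, and a simple modulo scheme, and the names `interleaved_slot`, `contiguous_slot`, and `count_reflushes` are hypothetical. It only shows why spreading the metadata of neighbouring blocks across different cache lines avoids flushing the same line twice in a row when adjacent blocks are allocated back to back.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def contiguous_slot(block_index, num_blocks, entry_bytes=8):
    """Baseline layout: metadata entries are packed in block order."""
    entries_per_line = CACHE_LINE_BYTES // entry_bytes
    return block_index // entries_per_line, block_index % entries_per_line


def interleaved_slot(block_index, num_blocks, entry_bytes=8):
    """Hypothetical interleaved layout: neighbouring blocks map to
    different cache lines. Returns (cache_line, slot_within_line)."""
    entries_per_line = CACHE_LINE_BYTES // entry_bytes
    num_lines = -(-num_blocks // entries_per_line)  # ceiling division
    return block_index % num_lines, block_index // num_lines


def count_reflushes(mapping, num_blocks, entry_bytes=8):
    """Count back-to-back allocations of adjacent blocks whose metadata
    entries fall in the same cache line (each would re-flush that line)."""
    lines = [mapping(i, num_blocks, entry_bytes)[0] for i in range(num_blocks)]
    return sum(1 for prev, cur in zip(lines, lines[1:]) if prev == cur)


if __name__ == "__main__":
    print("contiguous :", count_reflushes(contiguous_slot, 64))
    print("interleaved:", count_reflushes(interleaved_slot, 64))
```

Under these assumptions, allocating the 64 blocks of a slab in order touches the same metadata cache line on consecutive allocations 56 times with the packed layout and never with the interleaved one. A real allocator would also have to account for flush and fence ordering and for deallocation patterns, which this sketch ignores.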




Updated: 2024-02-04