Enabling Timely and Persistent Deletion in LSM-Engines
ACM Transactions on Database Systems (IF 1.8), Pub Date: 2023-08-09, DOI: 10.1145/3599724
Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis

Data-intensive applications have fueled the evolution of log-structured merge (LSM) based key-value engines that employ the out-of-place paradigm to support high ingestion rates with low read/write interference. These benefits, however, come at the cost of treating deletes as second-class citizens. A delete operation inserts a tombstone that invalidates older instances of the deleted key. State-of-the-art LSM-engines do not provide guarantees as to how fast a tombstone will propagate to persist the deletion. Further, LSM-engines only support deletion on the sort key. To delete on another attribute (e.g., timestamp), the entire tree is read and re-written, leading to undesired latency spikes and increasing the overall operational cost of a database. Efficient and persistent deletion is key to supporting: (i) streaming systems operating on a window of data, (ii) privacy with latency guarantees on data deletion, and (iii) en masse cloud deployment of data systems.
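
The tombstone behavior described above can be illustrated with a toy model. This is a minimal sketch, not the design of any real LSM-engine: the class `ToyLSM`, its methods, and the single-step `compact_all` are hypothetical names and simplifications introduced here. It shows that a delete only inserts a marker, so the older value stays on disk until some later compaction happens to merge the two.

```python
TOMBSTONE = object()  # sentinel marking a logical delete (assumption of this sketch)


class ToyLSM:
    """Toy out-of-place store: one in-memory buffer plus immutable runs."""

    def __init__(self):
        self.memtable = {}
        self.runs = []  # newest run first; each run is a plain dict

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # A delete is just another write: it inserts a tombstone.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # Persist the buffer as a new immutable run.
        if self.memtable:
            self.runs.insert(0, dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # The newest version wins; a tombstone hides older versions.
        for level in [self.memtable] + self.runs:
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else value
        return None

    def physical_copies(self, key):
        # Count flushed runs that still hold a real value for the key.
        return sum(
            1 for run in self.runs if key in run and run[key] is not TOMBSTONE
        )

    def compact_all(self):
        # Merge every run, oldest to newest, then drop the tombstones.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


db = ToyLSM()
db.put("user42", "profile")
db.flush()
db.delete("user42")
db.flush()
logically_deleted = db.get("user42") is None
copies_before_compaction = db.physical_copies("user42")
db.compact_all()
copies_after_compaction = db.physical_copies("user42")
```

In this sketch the key is logically gone right after `delete`, yet one physical copy survives until `compact_all` runs. Nothing in the toy model bounds when that happens, which is the missing guarantee the abstract points to.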

Further, we document that LSM-based key-value engines perform suboptimally in the presence of deletes in a workload. Tombstone-driven logical deletes, by design, are unable to purge the deleted entries in a timely manner, and retaining the invalidated entries perpetually affects the overall performance of LSM-engines in terms of space amplification, write amplification, and read performance. Moreover, the potentially unbounded latency for persistent deletes brings in critical privacy concerns in light of the data privacy protection regulations, such as the right to be forgotten in EU’s GDPR, the right to delete in California’s CCPA and CPRA, and deletion right in Virginia’s VCDPA. Toward this, we introduce the delete design space for LSM-trees and highlight the performance implications of the different classes of delete operations.

To address these challenges, in this article, we build a new key-value storage engine, Lethe+, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order. We show that Lethe+ supports any user-defined threshold for the delete persistence latency, offering higher read throughput (1.17× to 1.4×) and lower space amplification (2.1× to 9.8×), with a modest increase in write amplification (between 4% and 25%) that can be further amortized to less than 1%. In addition, Lethe+ supports efficient range deletes on a secondary delete key by dropping entire data pages without sacrificing read performance or employing a costly full tree merge.
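
One way to picture a range delete that drops entire pages is the sketch below. It is an illustration under stated assumptions, not Lethe+'s actual layout: the abstract says only that the layout weaves the sort and delete key order, so the choice to partition a file's entries into pages by delete key and then sort each page by sort key, along with the names `Entry`, `build_pages`, and `range_delete`, is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    sort_key: int
    delete_key: int  # secondary attribute, e.g. a timestamp
    value: str


def build_pages(entries, page_size):
    """Partition entries into pages by delete key, then sort each page by sort key."""
    by_delete_key = sorted(entries, key=lambda e: e.delete_key)
    pages = []
    for start in range(0, len(by_delete_key), page_size):
        chunk = by_delete_key[start:start + page_size]
        pages.append(sorted(chunk, key=lambda e: e.sort_key))
    return pages


def range_delete(pages, lo, hi):
    """Delete entries with lo <= delete_key < hi.

    Returns (remaining_pages, pages_dropped, pages_rewritten).
    """
    kept, dropped, rewritten = [], 0, 0
    for page in pages:
        keys = [e.delete_key for e in page]
        if lo <= min(keys) and max(keys) < hi:
            dropped += 1  # fully covered: drop the page without reading entries
            continue
        if max(keys) < lo or min(keys) >= hi:
            kept.append(page)  # disjoint from the range: untouched
            continue
        survivors = [e for e in page if not (lo <= e.delete_key < hi)]
        rewritten += 1  # partial overlap: only this page is rewritten
        if survivors:
            kept.append(survivors)
    return kept, dropped, rewritten


# Twelve entries whose delete keys are a permutation of 0..11.
entries = [Entry(sort_key=i, delete_key=(i * 7) % 12, value=f"v{i}") for i in range(12)]
pages = build_pages(entries, page_size=4)
remaining, pages_dropped, pages_rewritten = range_delete(pages, lo=0, hi=6)
```

With this layout, a range delete on the delete key touches at most the pages at the boundary of the range, and every page strictly inside it is discarded whole. The cost is that entries are sorted by sort key only within a page, so a lookup inside a file may need to consult more than one page.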


Updated: 2023-08-09