Oasis: Controlling Data Migration in Expansion of Object-based Storage Systems
ACM Transactions on Storage (IF 1.7), Pub Date: 2023-01-19, DOI: https://dl.acm.org/doi/10.1145/3568424
Yiming Zhang, Li Wang, Shun Gai, Qiwen Ke, Wenhao Li, Zhenlong Song, Guangtao Xue, Jiwu Shu

Object-based storage systems are widely used for file storage, block storage, blob storage (e.g., large videos), and similar scenarios, where data is placed across a large number of object storage devices (OSDs). Data placement is critical for the scalability of decentralized object-based storage systems. The state-of-the-art CRUSH placement method is a decentralized algorithm that deterministically places object replicas onto storage devices without relying on a central directory. While CRUSH-based storage systems enjoy the benefits of decentralization, such as high scalability, robustness, and performance, they suffer from uncontrolled data migration when the capacity of a storage cluster is expanded (i.e., when new OSDs are added). This migration is inherent to the way CRUSH works and causes significant performance degradation when the expansion is nontrivial.
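To illustrate why directory-free deterministic placement leads to migration on expansion, here is a minimal sketch. It does not implement CRUSH; it uses rendezvous (highest-random-weight) hashing as a simplified stand-in, and the object names, OSD names, and cluster sizes are made up for the example.

```python
import hashlib


def place(obj: str, osds: list[str]) -> str:
    """Return the OSD with the highest hash score for this object.

    This is rendezvous hashing, a stand-in for CRUSH: placement is
    computed from the object name and the OSD list alone, with no
    central directory.
    """
    def score(osd: str) -> int:
        digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(osds, key=score)


def migrated_fraction(objects: list[str], old_osds: list[str], new_osds: list[str]) -> float:
    """Fraction of objects whose computed location changes after expansion."""
    moved = sum(place(o, old_osds) != place(o, new_osds) for o in objects)
    return moved / len(objects)


# Hypothetical cluster: 8 OSDs expanded to 12.
objects = [f"obj-{i}" for i in range(10_000)]
old = [f"osd.{i}" for i in range(8)]
new = old + [f"osd.{i}" for i in range(8, 12)]

frac = migrated_fraction(objects, old, new)
print(f"fraction of objects that must migrate: {frac:.3f}")
```

In this stand-in, each object moves exactly when one of the new OSDs wins its hash contest, so about 4/12 of the existing objects are expected to migrate. The fraction follows from the size of the expansion rather than from any operator decision, which is the sense in which the migration is uncontrolled.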

This article presents MapX, a novel extension to CRUSH that uses an extra time-dimension mapping (from object creation times to cluster expansion times) to control data migration after cluster expansions. Each expansion is viewed as a new layer of the CRUSH map, represented by a virtual node beneath the CRUSH root. MapX controls the mapping from objects onto layers by manipulating the timestamps of the intermediate placement groups (PGs). MapX is applicable to a large variety of object-based storage scenarios where object timestamps can be maintained as higher-level metadata. We have applied MapX to the state-of-the-art Ceph-RBD (RADOS Block Device) to implement a migration-controllable, decentralized object-based block store called Oasis. Oasis extends the RBD metadata structure to maintain and retrieve approximate object creation times (for migration control) at the granularity of expansion layers. Experimental results show that the MapX-based Oasis block store outperforms the CRUSH-based Ceph-RBD (which is busy migrating objects after expansions) by 3.17× to 4.31× in tail latency, and by 76.3% for reads and 83.8% for writes in IOPS.
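The time-dimension mapping can be pictured with a small sketch. This is an illustration of the idea as the abstract describes it, not the MapX or Oasis implementation: the `Layer` type, the `select_layer` and `place` helpers, and the integer timestamps are all hypothetical, the within-layer placement again uses rendezvous hashing rather than CRUSH, and the manipulation of PG timestamps is not modeled.

```python
import bisect
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One cluster expansion: the OSDs added and when they were added."""
    expansion_time: int
    osds: tuple[str, ...]


def pick_osd(obj: str, osds: tuple[str, ...]) -> str:
    """Rendezvous hashing inside one layer (a stand-in for CRUSH)."""
    def score(osd: str) -> int:
        digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(osds, key=score)


def select_layer(layers: list[Layer], creation_time: int) -> Layer:
    """Map a creation time to the latest layer that existed at that time.

    Assumes `layers` is sorted by expansion_time.
    """
    times = [layer.expansion_time for layer in layers]
    idx = bisect.bisect_right(times, creation_time) - 1
    if idx < 0:
        raise ValueError("object predates the first layer")
    return layers[idx]


def place(obj: str, creation_time: int, layers: list[Layer]) -> str:
    """Choose a layer by creation time, then an OSD within that layer."""
    return pick_osd(obj, select_layer(layers, creation_time).osds)


# Hypothetical cluster with one initial layer.
layers = [Layer(expansion_time=0, osds=("osd.0", "osd.1", "osd.2", "osd.3"))]
old_objects = {f"obj-{i}": i for i in range(1, 101)}  # name -> creation time
before = {name: place(name, t, layers) for name, t in old_objects.items()}

# Expansion: append a new layer instead of rehashing over all OSDs.
layers.append(Layer(expansion_time=1000, osds=("osd.4", "osd.5", "osd.6", "osd.7")))
after = {name: place(name, t, layers) for name, t in old_objects.items()}

unchanged = before == after
new_home = place("obj-new", 1500, layers)
print("old objects stayed in place:", unchanged)
print("new object placed on:", new_home)
```

Because an object's layer depends only on its creation time, appending a layer leaves every existing object where it was, while objects created after the expansion land on the new OSDs. The sketch deliberately stops there: moving selected data onto new layers, which the abstract attributes to manipulating PG timestamps, would need a further mechanism on top of this mapping.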




Updated: 2023-01-19