-
Index Shipping for Efficient Replication in LSM Key-Value Stores with Hybrid KV Placement ACM Trans. Storage (IF 1.7) Pub Date : 2024-04-16 Giorgos Stilianakis, Giorgos Saloustros, Orestis Chiotakis, Giorgos Xanthakis, Angelos Bilas
Key-value (KV) stores based on LSM tree have become a foundational layer in the storage stack of datacenters and cloud services. Current approaches for achieving reliability and availability favor reducing network traffic and send to replicas only new KV pairs. As a result, they perform costly compactions to reorganize data in both the primary and backup nodes, which increases device I/O traffic and
-
eZNS: Elastic Zoned Namespace for Enhanced Performance Isolation and Device Utilization ACM Trans. Storage (IF 1.7) Pub Date : 2024-04-12 Jaehong Min, Chenxingyu Zhao, Ming Liu, Arvind Krishnamurthy
Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction, hold the potential to significantly enhance the cost-efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, making them in-adaptable to workload runtime behavior, unscalable to underlying hardware capabilities, and interfering
-
A Contract-aware and Cost-effective LSM Store for Cloud Storage with Low Latency Spikes ACM Trans. Storage (IF 1.7) Pub Date : 2024-04-04 Yuanhui Zhou, Jian Zhou, Kai Lu, Ling Zhan, Peng Xu, Peng Wu, Shuning Chen, Xian Liu, Jiguang Wan
Cloud storage is gaining popularity because features such as pay-as-you-go significantly reduce storage costs. However, the community has not sufficiently explored its contract model and latency characteristics. As LSM-Tree-based key-value stores (LSM stores) become the building block for numerous cloud applications, how cloud storage would impact the performance of key-value accesses is vital. This
-
Tarazu: An Adaptive End-to-end I/O Load-balancing Framework for Large-scale Parallel File Systems ACM Trans. Storage (IF 1.7) Pub Date : 2024-04-04 Arnab K. Paul, Sarah Neuwirth, Bharti Wadhwa, Feiyi Wang, Sarp Oral, Ali R. Butt
The imbalanced I/O load on large parallel file systems affects the parallel I/O performance of high-performance computing (HPC) applications. One of the main reasons for I/O imbalances is the lack of a global view of system-wide resource consumption. While approaches to address the problem already exist, the diversity of HPC workloads combined with different file striping patterns prevents widespread
-
Exploiting Data-pattern-aware Vertical Partitioning to Achieve Fast and Low-cost Cloud Log Storage ACM Trans. Storage (IF 1.7) Pub Date : 2024-02-19 Junyu Wei, Guangyan Zhang, Junchao Chen, Yang Wang, Weimin Zheng, Tingtao Sun, Jiesheng Wu, Jiangwei Jiang
Cloud logs can be categorized into on-line, off-line, and near-line logs based on the access frequency. Among them, near-line logs are mainly used for debugging, which means they prefer a low query latency for better user experience. Besides, the storage system for near-line logs prefers a low overall cost including the storage cost to store compressed logs, and the computation cost to compress logs
-
Introduction to the Special Section on USENIX ATC 2023 ACM Trans. Storage (IF 1.7) Pub Date : 2024-02-19 Dan Williams, Julia Lawall
No abstract available.
-
Perseid: A Secondary Indexing Mechanism for LSM-Based Storage Systems ACM Trans. Storage (IF 1.7) Pub Date : 2024-02-19 Jing Wang, Youyou Lu, Qing Wang, Yuhao Zhang, Jiwu Shu
LSM-based storage systems are widely used for superior write performance on block devices. However, they currently fail to efficiently support secondary indexing, since a secondary index query operation usually needs to retrieve multiple small values, which scatter in multiple LSM components. In this work, we revisit secondary indexing in LSM-based storage systems with byte-addressable persistent memory
-
Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search ACM Trans. Storage (IF 1.7) Pub Date : 2024-02-19 Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, Myoungsoo Jung
We propose CXL-ANNS, a software-hardware collaborative approach to enable scalable approximate nearest neighbor search (ANNS) services. To this end, we first disaggregate DRAM from the host via compute express link (CXL) and place all essential datasets into its memory pool. While this CXL memory pool allows ANNS to handle billion-point graphs without an accuracy loss, we observe that the search performance
-
Polling Sanitization to Balance I/O Latency and Data Security of High-density SSDs ACM Trans. Storage (IF 1.7) Pub Date : 2024-02-19 Jiaojiao Wu, Zhigang Cai, Fan Yang, Jun Li, Francois Trahay, Zheng Yang, Chao Wang, Jianwei Liao
Sanitization is an effective approach for ensuring data security through scrubbing invalid but sensitive data pages, with the cost of impacts on storage performance due to moving out valid pages from the sanitization-required wordline, which is a logical read/write unit and consists of multiple pages in high-density SSDs. To minimize the impacts on I/O latency and data security, this article proposes
-
An End-to-End High-Performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Nannan Zhao, Muhui Lin, Hadeel Albahar, Arnab K. Paul, Zhijie Huang, Subil Abraham, Keren Chen, Vasily Tarasov, Dimitrios Skourtis, Ali Anwar, Ali R. Butt
The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place a high pressure on the infrastructure of container registries that store and distribute images and container storage systems on the Docker client side that manage image layers and store
-
An LSM Tree Augmented with B+ Tree on Nonvolatile Memory ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Donguk Kim, Jongsung Lee, Keun Soo Lim, Jun Heo, Tae Jun Ham, Jae W. Lee
Modern log-structured merge (LSM) tree-based key-value stores are widely used to process update-heavy workloads effectively as the LSM tree sequentializes write requests to a storage device to maximize storage performance. However, this append-only approach leaves many outdated copies of frequently updated key-value pairs, which need to be routinely cleaned up through the operation called compaction
-
gLSM: Using GPGPU to Accelerate Compactions in LSM-tree-based Key-value Stores ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Hui Sun, Jinfeng Xu, Xiangxiang Jiang, Guanzhong Chen, Yinliang Yue, Xiao Qin
Log-structured-merge tree or LSM-tree is a technological underpinning in key-value (KV) stores to support a wide range of performance-critical applications. By conducting data re-organization in the background by virtue of compaction operations, the KV stores have the potential to swiftly service write requests with sequential batched disk writes and read requests for KV items constantly sorted by
-
A Scalable Wear Leveling Technique for Phase Change Memory ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Wang Xu, Israel Koren
Phase Change Memory (PCM), one of the recently proposed non-volatile memory technologies, has been suffering from low write endurance. For example, a single-layer PCM cell could only be written approximately 10^8 times. This limits the lifetime of a PCM-based memory to a few days rather than years when memory-intensive applications are running. Wear leveling techniques have been proposed to improve the write
-
Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDs ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Shucheng Wang, Qiang Cao, Hong Jiang, Ziyi Lu, Jie Yao, Yuxing Chen, Anqun Pan
Following a conventional design principle that pays more fast-CPU-cycles for fewer slow-I/Os, popular software storage architecture Linux Multiple-Disk (MD) for parity-based RAID (e.g., RAID5 and RAID6) assigns one or more centralized worker threads to efficiently process all user requests based on multi-stage asynchronous control and global data structures, successfully exploiting characteristics
-
Block-level Image Service for the Cloud ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Huiba Li, Zhihao Zhang, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu, Yiming Zhang, Windsor Hsu
Businesses increasingly need agile and elastic computing infrastructure to respond quickly to real-world situations. By offering efficient process-based virtualization and a layered image system, containers are designed to enable agile and elastic application deployment. However, creating or updating large container clusters is still slow due to the image downloading and unpacking process. In this
-
Exploiting Flat Namespace to Improve File System Metadata Performance on Ultra-Fast, Byte-Addressable NVMs ACM Trans. Storage (IF 1.7) Pub Date : 2024-01-30 Miao Cai, Junru Shen, Bin Tang, Hao Huang, Baoliu Ye
The conventional file system provides a hierarchical namespace by structuring it as a directory tree. Tree-based namespace structure leads to inefficient file path walk and expensive namespace tree traversal, underutilizing ultra-low access latency and superior sequential performance provided by non-volatile memories (NVMs). This article proposes FlatFS+, an NVM file system that features a flat namespace
-
Practical Design Considerations for Wide Locally Recoverable Codes (LRCs) ACM Trans. Storage (IF 1.7) Pub Date : 2023-11-14 Saurabh Kadekodi, Shashwat Silas, David Clausen, Arif Merchant
Most of the data in large-scale storage clusters is erasure coded. At exascale, optimizing erasure codes for low storage overhead, efficient reconstruction, and easy deployment is of critical importance. Locally recoverable codes (LRCs) have deservedly gained central importance in this field, because they can balance many of these requirements. In our work, we study wide LRCs; LRCs with large number
-
From Missteps to Milestones: A Journey to Practical Fail-Slow Detection ACM Trans. Storage (IF 1.7) Pub Date : 2023-11-01 Ruiming Lu, Erci Xu, Yiming Zhang, Fengyi Zhu, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Jiwu Shu, Minglu Li, Jiesheng Wu
The newly emerging “fail-slow” failures plague both software and hardware where the victim components are still functioning yet with degraded performance. To address this problem, this article presents Perseus, a practical fail-slow detection framework for storage devices. Perseus leverages a light regression-based model to quickly pinpoint and analyze fail-slow failures at the granularity of drives
-
Empowering Storage Systems Research with NVMeVirt: A Comprehensive NVMe Device Emulator ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-31 Sang-Hoon Kim, Jaehoon Shim, Euidong Lee, Seongyeop Jeong, Ilkueon Kang, Jin-Soo Kim
There have been drastic changes in the storage device landscape recently. At the center of the diverse storage landscape lies the NVMe interface, which allows high-performance and flexible communication models required by these next-generation device types. However, its hardware-oriented definition and specification are bottlenecking the development and evaluation cycle for new revolutionary storage
-
The Security War in File Systems: An Empirical Study from A Vulnerability-centric Perspective ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Jinghan Sun, Shaobo Li, Jun Xu, Jian Huang
This article presents a systematic study on the security of modern file systems, following a vulnerability-centric perspective. Specifically, we collected 377 file system vulnerabilities committed to the CVE database in the past 20 years. We characterize them from four dimensions: why the vulnerabilities appear, how the vulnerabilities can be exploited, what consequences can arise, and how the vulnerabilities
-
Hybrid Block Storage for Efficient Cloud Volume Service ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Yiming Zhang, Huiba Li, Shengyun Liu, Peng Huang
The migration of traditional desktop and server applications to the cloud brings the challenges of high performance, high reliability, and low cost to the underlying cloud storage. To satisfy the requirement, this article proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead
-
Owner-free Distributed Symmetric Searchable Encryption Supporting Conjunctive Queries ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Qiuyun Tong, Xinghua Li, Yinbin Miao, Yunwei Wang, Ximeng Liu, Robert H. Deng
Symmetric Searchable Encryption (SSE), as an ideal primitive, can ensure data privacy while supporting retrieval over encrypted data. However, existing multi-user SSE schemes require the data owner to share the secret key with all query users or always be online to generate search tokens. While there are some solutions to this problem, they have at least one weakness, such as non-supporting conjunctive
-
A High-performance RDMA-oriented Learned Key-value Store for Disaggregated Memory Systems ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, Jiajie Sheng
Disaggregated memory systems separate monolithic servers into different components, including compute and memory nodes, to enjoy the benefits of high resource utilization, flexible hardware scalability, and efficient data sharing. By exploiting the high-performance RDMA (Remote Direct Memory Access), the compute nodes directly access the remote memory pool without involving remote CPUs. Hence, the
-
Understanding Persistent-memory-related Issues in the Linux Kernel ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Om Rameshwar Gatla, Duo Zhang, Wei Xu, Mai Zheng
Persistent memory (PM) technologies have inspired a wide range of PM-based system optimizations. However, building correct PM-based systems is difficult due to the unique characteristics of PM hardware. To better understand the challenges as well as the opportunities to address them, this article presents a comprehensive study of PM-related issues in the Linux kernel. By analyzing 1,553 PM-related
-
FASTSync: A FAST Delta Sync Scheme for Encrypted Cloud Storage in High-bandwidth Network Environments ACM Trans. Storage (IF 1.7) Pub Date : 2023-10-03 Suzhen Wu, Zhanhong Tu, Yuxuan Zhou, Zuocheng Wang, Zhirong Shen, Wei Chen, Wei Wang, Weichun Wang, Bo Mao
More and more data are stored in cloud storage, which brings two major challenges. First, the modified files in the cloud should be quickly synchronized to ensure data consistency, e.g., delta synchronization (sync) achieves efficient cloud sync by synchronizing only the updated part of the file. Second, the huge data in the cloud needs to be deduplicated and encrypted, e.g., Message-Locked Encryption
-
The Security War in File Systems: An Empirical Study from A Vulnerability-Centric Perspective ACM Trans. Storage (IF 1.7) Pub Date : 2023-07-17 Jinghan Sun, Shaobo Li, Jun Xu, Jian Huang
This paper presents a systematic study on the security of modern file systems, following a vulnerability-centric perspective. Specifically, we collected 377 file system vulnerabilities committed to the CVE database in the past 20 years. We characterize them from four dimensions that include why the vulnerabilities appear, how the vulnerabilities can be exploited, what consequences can arise, and how
-
FASTSync: a FAST Delta Sync Scheme for Encrypted Cloud Storage in High-Bandwidth Network Environments ACM Trans. Storage (IF 1.7) Pub Date : 2023-07-07 Suzhen Wu, Zhanhong Tu, Yuxuan Zhou, Zuocheng Wang, Zhirong Shen, Wei Chen, Wei Wang, Weichun Wang, Bo Mao
More and more data are stored in cloud storage which brings two major challenges. First, the modified files in the cloud should be quickly synchronized to ensure data consistency, e.g., delta synchronization (sync) achieves efficient cloud sync by synchronizing only the updated part of the file. Second, the huge data in the cloud needs to be deduplicated and encrypted, e.g., Message-Locked Encryption
-
Owner-Free Distributed Symmetric Searchable Encryption Supporting Conjunctive Queries ACM Trans. Storage (IF 1.7) Pub Date : 2023-07-05 Qiuyun Tong, Xinghua Li, Yinbin Miao, Yunwei Wang, Ximeng Liu, Robert H. Deng
Symmetric Searchable Encryption (SSE), as an ideal primitive, can ensure data privacy while supporting retrieval over encrypted data. However, existing multi-user SSE schemes require the data owner to share the secret key with all query users or always be online to generate search tokens. While there are some solutions to this problem, they have at least one weakness, such as non-supporting conjunctive
-
The Design of Fast and Lightweight Resemblance Detection for Efficient Post-Deduplication Delta Compression ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Wen Xia, Lifeng Pu, Xiangyu Zou, Philip Shilane, Shiyi Li, Haijun Zhang, Xuan Wang
Post-deduplication delta compression is a data reduction technique that calculates and stores the differences of very similar but non-duplicate chunks in storage systems, which is able to achieve a very high compression ratio. However, the low throughput of widely used resemblance detection approaches (e.g., N-Transform) usually becomes the bottleneck of delta compression systems due to introducing
-
KVRangeDB: Range Queries for a Hash-based Key–Value Device ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Mian Qin, Qing Zheng, Jason Lee, Bradley Settlemyer, Fei Wen, Narasimha Reddy, Paul Gratz
Key–value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage device is a promising technology for both simplifying the
-
A Universal SMR-aware Cache Framework with Deep Optimization for DM-SMR and HM-SMR Disks ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Diansen Sun, Ruixiong Tan, Yunpeng Chai
To satisfy the enormous storage capacities required for big data, data centers have been adopting high-density shingled magnetic recording (SMR) disks. However, the weak fine-grained random write performance of SMR disks caused by their inherent write amplification and unbalanced read–write performance poses a severe challenge. Many studies have proposed solid-state drive (SSD) cache systems to address
-
Localized Validation Accelerates Distributed Transactions on Disaggregated Persistent Memory ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Ming Zhang, Yu Hua, Pengfei Zuo, Lurong Liu
Persistent memory (PM) disaggregation significantly improves the resource utilization and failure isolation to build a scalable and cost-effective remote memory pool in modern data centers. However, due to offering limited computing power and overlooking the bandwidth and persistence properties of real PMs, existing distributed transaction schemes, which are designed for legacy DRAM-based monolithic
-
Performance Bug Analysis and Detection for Distributed Storage and Computing Systems ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Jiaxin Li, Yiming Zhang, Shan Lu, Haryadi S. Gunawi, Xiaohui Gu, Feng Huang, Dongsheng Li
This article systematically studies 99 distributed performance bugs from five widely deployed distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop MapReduce and ZooKeeper). We present the TaxPerf database, which collectively organizes the analysis results as over 400 classification labels and over 2,500 lines of bug re-description. TaxPerf is classified into six bug categories
-
Visibility Graph-based Cache Management for DRAM Buffer Inside Solid-state Drives ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Zhibing Sha, Jun Li, Fengxiang Zhang, Min Huang, Zhigang Cai, Francois Trahay, Jianwei Liao
Most solid-state drives (SSDs) adopt an on-board Dynamic Random Access Memory (DRAM) to buffer the write data, which can significantly reduce the amount of write operations committed to the flash array of SSD if data exhibits locality in write operations. This article focuses on efficiently managing the small amount of DRAM cache inside SSDs. The basic idea is to employ the visibility graph technique
-
Derrick: A Three-layer Balancer for Self-managed Continuous Scalability ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Andrzej Jackowski, Leszek Gryz, Michał Wełnicki, Cezary Dubnicki, Konrad Iwanicki
Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly
-
CostCounter: A Better Method for Collision Mitigation in Cuckoo Hashing ACM Trans. Storage (IF 1.7) Pub Date : 2023-06-19 Haonan Wu, Shuxian Wang, Zhanfeng Jin, Yuhang Zhang, Ruyun Ma, Sijin Fan, Ruili Chao
Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth. Hence, the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. A better path can be selected when collisions occur using a cost
-
Hybrid Block Storage for Efficient Cloud Volume Service ACM Trans. Storage (IF 1.7) Pub Date : 2023-05-08 Yiming Zhang, Huiba Li, Shengyun Liu, Peng Huang
The migration of traditional desktop and server applications to the cloud brings the challenges of high performance, high reliability and low cost to the underlying cloud storage. To satisfy the requirement, this paper proposes a hybrid cloud-scale block storage system called Ursa. Trace analysis shows that the I/O patterns served by block storage have only limited locality to exploit. Therefore, instead
-
Introduction to the Special Section on USENIX ATC 2022 ACM Trans. Storage (IF 1.7) Pub Date : 2023-04-08 Jiri Schindler, Noa Zilberman
No abstract available.
-
Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-25 Miryeong Kwon, Seungjun Lee, Hyunkyu Choi, Jooyoung Hwang, Myoungsoo Jung
We propose Vigil-KV, a hardware and software co-designed framework that eliminates long-tail latency almost perfectly by introducing strong latency determinism. To make Get latency deterministic, Vigil-KV first enables a predictable latency mode (PLM) interface on a real datacenter-scale NVMe SSD, having knowledge about the nature of the underlying flash technologies. Vigil-KV at the system-level then
-
TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-22 Guanyu Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen
Out-of-core systems rely on high-performance cache sub-systems to reduce the number of I/O operations. Although the page cache in modern operating systems enables transparent access to memory and storage devices, it suffers from efficiency and scalability issues on cache misses, forcing out-of-core systems to design and implement their own cache components, which is a non-trivial task. This study proposes
-
ZNSwap: un-Block your Swap ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Shai Bergman, Niklas Cassel, Matias Bjørling, Mark Silberstein
We introduce ZNSwap, a novel swap subsystem optimized for the recent Zoned Namespace (ZNS) SSDs. ZNSwap leverages ZNS’s explicit control over data management on the drive and introduces a space-efficient host-side Garbage Collector (GC) for swap storage co-designed with the OS swap logic. ZNSwap enables cross-layer optimizations, such as direct access to the in-kernel swap usage statistics by the
-
CacheSack: Theory and Experience of Google’s Admission Optimization for Datacenter Flash Caches ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Tzu-Wei Yang, Seth Pollen, Mustafa Uysal, Arif Merchant, Homer Wolfmeister, Junaid Khalid
This article describes the algorithm, implementation, and deployment experience of CacheSack, the admission algorithm for Google datacenter flash caches. CacheSack minimizes the dominant costs of Google’s datacenter flash caches: disk IO and flash footprint. CacheSack partitions cache traffic into disjoint categories, analyzes the observed cache benefit of each subset, and formulates a knapsack problem
-
Introduction to the Special Section on USENIX OSDI 2022 ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Marcos K. Aguilera, Hakim Weatherspoon
No abstract available.
-
An In-depth Comparative Analysis of Cloud Block Storage Workloads: Findings and Implications ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Jinhong Li, Qiuping Wang, Patrick P. C. Lee, Chao Shi
Cloud block storage systems support diverse types of applications in modern cloud services. Characterizing their input/output (I/O) activities is critical for guiding better system designs and optimizations. In this article, we present an in-depth comparative analysis of production cloud block storage workloads through the block-level I/O traces of billions of I/O requests collected from two production
-
Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Suli Yang, Jing Liu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
In this article, we present an approach to systematically examine the schedulability of distributed storage systems, identify their scheduling problems, and enable effective scheduling in these systems. We use Thread Architecture Models (TAMs) to describe the behavior and interactions of different threads in a system, and show both how to construct TAMs for existing systems and utilize TAMs to identify
-
PSA-Cache: A Page-state-aware Cache Scheme for Boosting 3D NAND Flash Performance ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Shujie Pang, Yuhui Deng, Genxiong Zhang, Yi Zhou, Yaoqin Huang, Xiao Qin
Garbage collection (GC) plays a pivotal role in the performance of 3D NAND flash memory, where Copyback has been widely used to accelerate valid page migration during GC. Unfortunately, copyback is constrained by the parity symmetry issue: data read from an odd/even page must be written to an odd/even page. After migrating two odd/even consecutive pages, a free page between the two migrated pages will
-
FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Kewen He, Yujie An, Yijing Luo, Xiaoguang Liu, Gang Wang
The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excellent write performance. But LSM-Tree-based KV stores still have the overhead of write-ahead log and write stall caused by slow L0 flush and L0-L1 compaction. New byte-addressable, persistent memory (PM) devices bring an opportunity to improve the write performance of LSM-Tree. Previous studies on PM-based
-
TPFS: A High-Performance Tiered File System for Persistent Memories and Disks ACM Trans. Storage (IF 1.7) Pub Date : 2023-03-06 Shengan Zheng, Morteza Hoseinzadeh, Steven Swanson, Linpeng Huang
Emerging fast, byte-addressable persistent memory (PM) promises substantial storage performance gains compared with traditional disks. We present TPFS, a tiered file system that combines PM and slow disks to create a storage system with near-PM performance and large capacity. TPFS steers incoming file input/output (I/O) to PM, dynamic random access memory (DRAM), or disk depending on the synchronicity
-
Boosting Cache Performance by Access Time Measurements ACM Trans. Storage (IF 1.7) Pub Date : 2023-02-17 Gil Einziger, Omri Himelbrand, Erez Waisbard
Most modern systems utilize caches to reduce the average data access time and optimize their performance. Recently proposed policies implicitly assume uniform access times, but variable access times naturally appear in domains such as storage, web search, and DNS resolution. Our work measures the access times for various items and exploits variations in access times as an additional signal for caching
-
Oasis: Controlling Data Migration in Expansion of Object-based Storage Systems ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-19 Yiming Zhang, Li Wang, Shun Gai, Qiwen Ke, Wenhao Li, Zhenlong Song, Guangtao Xue, Jiwu Shu
Object-based storage systems have been widely used for various scenarios such as file storage, block storage, blob (e.g., large videos) storage, and so on, where the data is placed among a large number of object storage devices (OSDs). Data placement is critical for the scalability of decentralized object-based storage systems. The state-of-the-art CRUSH placement method is a decentralized algorithm
-
Improving Storage Systems Using Machine Learning ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-19 Ibrahim Umit Akgun, Ali Selman Aydin, Andrew Burford, Michael McNeill, Michael Arkhangelskiy, Erez Zadok
Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for
-
End-to-end I/O Monitoring on Leading Supercomputers ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-11 Bin Yang, Wei Xue, Tianyu Zhang, Shichao Liu, Xiaosong Ma, Xiyang Wang, Weiguo Liu
This paper offers a solution to overcome the complexities of production system I/O performance monitoring. We present Beacon, an end-to-end I/O resource monitoring and diagnosis system for the 40960-node Sunway TaihuLight supercomputer, currently the fourth-ranked supercomputer in the world. Beacon simultaneously collects and correlates I/O tracing/profiling data from all the compute nodes, forwarding
-
Efficient Crash Consistency for NVMe over PCIe and RDMA ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-11 Xiaojian Liao, Youyou Lu, Zhe Yang, Jiwu Shu
This article presents crash-consistent Non-Volatile Memory Express (ccNVMe), a novel extension of the NVMe that defines how host software communicates with the non-volatile memory (e.g., solid-state drive) across a PCI Express bus and RDMA-capable networks with both crash consistency and performance efficiency. Existing storage systems pay a huge tax on crash consistency, and thus cannot fully exploit
-
Reliability Evaluation of Erasure-coded Storage Systems with Latent Errors ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-11 Ilias Iliadis
Large-scale storage systems employ erasure-coding redundancy schemes to protect against device failures. The adverse effect of latent sector errors on the Mean Time to Data Loss (MTTDL) and the Expected Annual Fraction of Data Loss (EAFDL) reliability metrics is evaluated. A theoretical model capturing the effect of latent errors and device failures is developed, and closed-form expressions for the
-
Extending and Programming the NVMe I/O Determinism Interface for Flash Arrays ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-11 Huaicheng Li, Martin L. Putra, Ronald Shi, Fadhil I. Kurnia, Xing Lin, Jaeyoung Do, Achmad Imam Kistijantoro, Gregory R. Ganger, Haryadi S. Gunawi
Predictable latency on flash storage is a long-pursuit goal, yet unpredictability stays due to the unavoidable disturbance from many well-known SSD internal activities. To combat this issue, the recent NVMe IO Determinism (IOD) interface advocates host-level controls to SSD internal management tasks. Although promising, challenges remain on how to exploit it for truly predictable performance. We present
-
InDe: An Inline Data Deduplication Approach via Adaptive Detection of Valid Container Utilization ACM Trans. Storage (IF 1.7) Pub Date : 2023-01-11 Lifang Lin, Yuhui Deng, Yi Zhou, Yifeng Zhu
Inline deduplication removes redundant data in real-time as data is being sent to the storage system. However, it causes data fragmentation: logically consecutive chunks are physically scattered across various containers after data deduplication. Many rewrite algorithms aim to alleviate the performance degradation due to fragmentation by rewriting fragmented duplicate chunks as unique chunks into new
-
Improving the Endurance of Next Generation SSD’s using WOM-v Codes ACM Trans. Storage (IF 1.7) Pub Date : 2022-12-16 Shehbaz Jaffer, Kaveh Mahdaviani, Bianca Schroeder
High density Solid State Drives, such as QLC drives, offer increased storage capacity, but an order of magnitude fewer Program and Erase (P/E) cycles, limiting their endurance and hence usability. We present the design and implementation of non-binary, Voltage-Based Write-Once-Memory (WOM-v) Codes to improve the lifetime of QLC drives. First, we develop a FEMU based simulator test-bed to evaluate the gains of
-
ctFS: Replacing File Indexing with Hardware Memory Translation through Contiguous File Allocation for Persistent Memory ACM Trans. Storage (IF 1.7) Pub Date : 2022-12-16 Ruibin Li, Xiang Ren, Xu Zhao, Siwei He, Michael Stumm, Ding Yuan
Persistent byte-addressable memory (PM) is poised to become prevalent in future computer systems. PMs are significantly faster than disk storage, and accesses to PMs are governed by the Memory Management Unit (MMU) just as accesses with volatile RAM. These unique characteristics shift the bottleneck from I/O to operations such as block address lookup—for example, in write workloads, up to 45% of the
-
EMPRESS: Accelerating Scientific Discovery through Descriptive Metadata Management ACM Trans. Storage (IF 1.7) Pub Date : 2022-12-12 Margaret Lawson, William Gropp, Jay Lofstead
High-performance computing scientists are producing unprecedented volumes of data that take a long time to load for analysis. However, many analyses only require loading in the data containing particular features of interest and scientists have many approaches for identifying these features. Therefore, if scientists store information (descriptive metadata) about these identified features, then for
-
The what, The from, and The to: The Migration Games in Deduplicated Systems ACM Trans. Storage (IF 1.7) Pub Date : 2022-11-15 Roei Kisous, Ariel Kolikant, Abhinav Duggal, Sarai Sheinvald, Gala Yadgar
Deduplication reduces the size of the data stored in large-scale storage systems by replacing duplicate data blocks with references to their unique copies. This creates dependencies between files that contain similar content and complicates the management of data in the system. In this article, we address the problem of data migration, in which files are remapped between different volumes as a result