research-article

SILK+ Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads

Published: 30 May 2020

Abstract

Log-Structured Merge Key-Value stores (LSM KVs) are designed to offer good write performance, by capturing client writes in memory, and only later flushing them to storage. Writes are later compacted into a tree-like data structure on disk to improve read performance and to reduce storage space use. It has been widely documented that compactions severely hamper throughput. Various optimizations have successfully dealt with this problem. These techniques include, among others, rate-limiting flushes and compactions, selecting among compactions for maximum effect, and limiting compactions to the highest level by so-called fragmented LSMs.

In this article, we focus on latencies rather than throughput. We first document the fact that LSM KVs exhibit high tail latencies. The techniques that have been proposed for optimizing throughput do not address this issue, and, in fact, in some cases, exacerbate it. The root cause of these high tail latencies is interference between client writes, flushes, and compactions. Another major cause for tail latency is the heterogeneous nature of the workloads in terms of operation mix and item sizes whereby a few more computationally heavy requests slow down the vast majority of smaller requests.

We introduce the notion of an Input/Output (I/O) bandwidth scheduler for an LSM-based KV store to reduce tail latency caused by interference of flushing and compactions and by workload heterogeneity. We explore three techniques as part of this I/O scheduler: (1) opportunistically allocating more bandwidth to internal operations during periods of low load, (2) prioritizing flushes and compactions at the lower levels of the tree, and (3) separating client requests by size and by data access path. SILK+ is a new open-source LSM KV that incorporates this notion of an I/O scheduler.
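The three techniques above can be illustrated with a minimal sketch. It is not SILK+'s implementation: the function names, the priority table, the minimum-bandwidth floor, the 4 KiB size threshold, and the memory/disk notion of an access path are all assumptions made for illustration only.

```python
# Illustrative sketch only; every name and constant below is hypothetical.

# Lower number = more urgent. Flushes come first, then compactions at the
# lowest level of the tree, then compactions at higher levels.
PRIORITY = {"flush": 0, "l0_compaction": 1, "higher_compaction": 2}


def internal_bandwidth(total_mbps, client_mbps, min_internal_mbps=10.0):
    """Technique (1): internal operations receive the bandwidth that client
    traffic leaves unused, so they get more during periods of low load.
    The floor (an assumption) keeps internal work from starving entirely."""
    return max(min_internal_mbps, total_mbps - client_mbps)


def schedule_internal(ops, budget_mbps):
    """Technique (2): hand out the internal budget in priority order.
    `ops` is a list of (kind, demand_mbps) pairs; the result is a list of
    (kind, granted_mbps) pairs in the order they were served."""
    grants = []
    remaining = budget_mbps
    for kind, demand in sorted(ops, key=lambda op: PRIORITY[op[0]]):
        granted = min(demand, remaining)
        grants.append((kind, granted))
        remaining -= granted
    return grants


def route_request(value_bytes, served_from_memory, small_limit=4096):
    """Technique (3): classify a client request by size and by access path,
    so that small requests do not queue behind heavy ones."""
    size_class = "small" if value_bytes <= small_limit else "large"
    path = "memory" if served_from_memory else "disk"
    return (size_class, path)


if __name__ == "__main__":
    budget = internal_bandwidth(total_mbps=200, client_mbps=150)
    pending = [("higher_compaction", 80), ("flush", 30), ("l0_compaction", 40)]
    print(budget, schedule_internal(pending, budget))
    print(route_request(512, served_from_memory=True))
```

In this sketch a flush is always served before any compaction, which mirrors the stated goal of keeping client writes from stalling behind internal work; a real scheduler would also need to re-evaluate the budget continuously as client load changes.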



        • Published in

          ACM Transactions on Computer Systems, Volume 36, Issue 4
          Section: Best of ATC 2019 and Regular Paper
          November 2018
          115 pages
          ISSN: 0734-2071
          EISSN: 1557-7333
          DOI: 10.1145/3394910

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 30 May 2020
          • Online AM: 7 May 2020
          • Revised: 1 January 2020
          • Accepted: 1 January 2020
          • Received: 1 October 2019

          Qualifiers

          • research-article
          • Research
          • Refereed
