Abstract
This paper proposes iSwap, a new memory page swap mechanism that reduces ineffective I/O swap operations and improves the QoS of high-priority applications in cloud environments. iSwap works in the OS kernel. It accurately learns the reuse patterns of memory pages and makes swap decisions accordingly, which avoids ineffective operations. When memory pressure is high, iSwap compresses the pages of latency-critical (LC), or high-priority, applications and keeps them in main memory. This avoids I/O operations for the LC applications and ensures their QoS, while pages of low-priority applications are evicted from main memory instead. iSwap has low overhead and works well for cloud applications with large memory footprints. We evaluate iSwap on Intel x86 and ARM platforms. Under high memory pressure, the experimental results show that iSwap reduces ineffective swap operations by 8.0% to 19.2% and improves the QoS of LC applications by 36.8% to 91.3%, compared with the latest LRU-based approach widely used in modern OSes.
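The abstract describes iSwap's swap decision only at a high level. The following is a minimal sketch of that kind of priority-aware, reuse-aware decision. It is our own illustration and not iSwap's kernel implementation. It assumes several things the abstract does not state: a page's next reuse is predicted from the mean interval between its past accesses, a single `reuse_threshold` separates soon-to-be-reused pages from cold ones, a boolean `high_pressure` flag stands in for the kernel's memory-pressure signal (such as a watermark check), and cold LC pages are swapped out normally when pressure is not high. All names (`Page`, `Priority`, `Action`, `decide`, `predicted_reuse_in`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    LC = "latency-critical"  # high-priority application
    BE = "best-effort"       # low-priority application


class Action(Enum):
    KEEP = "keep"          # page is expected back soon; swapping it out would be ineffective
    COMPRESS = "compress"  # compress the page but keep it in main memory
    SWAP_OUT = "swap_out"  # evict the page to the swap device (I/O)


@dataclass
class Page:
    pfn: int                  # page frame number (identifier only)
    priority: Priority
    access_times: list = field(default_factory=list)  # past access timestamps, ascending

    def predicted_reuse_in(self, now: float) -> float:
        """Hypothetical reuse predictor: time until the next access, extrapolated
        from the mean interval between past accesses."""
        ts = self.access_times
        if len(ts) < 2:
            return float("inf")  # no reuse history: treat the page as cold
        intervals = [b - a for a, b in zip(ts, ts[1:])]
        remaining = ts[-1] + sum(intervals) / len(intervals) - now
        # An overdue prediction means the pattern broke; treat the page as cold.
        return remaining if remaining >= 0 else float("inf")


def decide(page: Page, now: float, high_pressure: bool, reuse_threshold: float) -> Action:
    """Choose what to do with one reclaim candidate (assumed policy)."""
    # Skip pages predicted to be reused soon: swapping them out would just
    # trigger a swap-in shortly afterwards (an ineffective I/O pair).
    if page.predicted_reuse_in(now) < reuse_threshold:
        return Action.KEEP
    # Under high memory pressure, keep LC pages in RAM in compressed form.
    if high_pressure and page.priority is Priority.LC:
        return Action.COMPRESS
    # Everything else (cold low-priority pages, or any cold page under
    # normal pressure) goes to the swap device.
    return Action.SWAP_OUT


# Usage: timestamps are in arbitrary units, now = 100, threshold = 20.
hot_be = Page(1, Priority.BE, [80.0, 90.0, 98.0])  # next access predicted at 107
cold_lc = Page(2, Priority.LC, [0.0, 70.0])        # next access predicted at 140
cold_be = Page(3, Priority.BE, [0.0, 70.0])
print(decide(hot_be, 100.0, True, 20.0))   # Action.KEEP
print(decide(cold_lc, 100.0, True, 20.0))  # Action.COMPRESS
print(decide(cold_be, 100.0, True, 20.0))  # Action.SWAP_OUT
```

The mean-interval predictor is deliberately simple. The point of the sketch is the ordering of the checks: pages that are predicted to be reused soon are filtered out first, so no ineffective swap-out/swap-in pair is issued. Only after that filter does the application's priority decide between in-memory compression and eviction to the swap device.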
Index Terms
- iSwap: A New Memory Page Swap Mechanism for Reducing Ineffective I/O Operations in Cloud Environments