-
Optimizing Resource Management for Shared Microservices: A Scalable System Design ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-02-13 Shutian Luo, Chenyu Lin, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, Huanle Xu, Chengzhong Xu
A common approach to improving resource utilization in data centers is to adaptively provision resources based on the actual workload. One fundamental challenge of doing this in microservice management frameworks, however, is that different components of a service can exhibit significant differences in their impact on end-to-end performance. To make resource management more challenging, a single microservice
-
Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-02-13 Laiping Zhao, Yushuai Cui, Yanan Yang, Xiaobo Zhou, Tie Qiu, Keqiu Li, Yungang Bao
Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort batch (BE) jobs in datacenters. However, they usually treat multi-component LCs as monolithic applications and treat BEs as “second-class citizens” when allocating resources to them. Neglecting the inconsistent interference tolerance abilities of LC components and the inconsistent preemption
-
Hardware-Software Collaborative Tiered-Memory Management Framework for Virtualization ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-02-13 Sai Sha, Chuandong Li, Xiaolin Wang, Zhenlin Wang, Yingwei Luo
The tiered-memory system can effectively expand the memory capacity for virtual machines (VMs). However, virtualization introduces new challenges specifically in enforcing performance isolation, minimizing context switching, and providing resource overcommit. None of the state-of-the-art designs consider virtualization and address these challenges; we observe that a VM with tiered memory incurs up
-
Diciclo: Flexible User-level Services for Efficient Multitenant Isolation ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-02-13 Giorgos Kappes, Stergios V. Anastasiadis
Containers are a mainstream virtualization technique for running stateful workloads over persistent storage. In highly utilized multitenant hosts, resource contention at the system kernel leads to inefficient container input/output (I/O) handling. Although there are interesting techniques to address this issue, they incur high implementation complexity and execution overhead. As a cost-effective alternative
-
PMAlloc: A Holistic Approach to Improving Persistent Memory Allocation ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-02-03 Zheng Dang, Shuibing He, Xuechen Zhang, Peiyi Hong, Zhenxin Li, Xinyu Chen, Haozhe Song, Xian-He Sun, Gang Chen
Persistent memory allocation is a fundamental building block for developing high-performance and in-memory applications. Existing persistent memory allocators suffer from many performance issues. First, they may introduce repeated cache line flushes and small random accesses in persistent memory for their poor heap metadata management. Second, they use static slab segregation resulting in a dramatic
-
Trinity: High-Performance and Reliable Mobile Emulation through Graphics Projection ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-01-24 Hao Lin, Zhenhua Li, Di Gao, Yunhao Liu, Feng Qian, Tianyin Xu
Mobile emulation, which creates full-fledged software mobile devices on a physical PC/server, is pivotal to the mobile ecosystem. Unfortunately, existing mobile emulators perform poorly on graphics-intensive apps in terms of efficiency and compatibility. To address this, we introduce graphics projection, a novel graphics virtualization mechanism that adds a small-size projection space inside the guest
-
Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2024-01-15 Jie Zhao, Jinchen Xu, Peng Di, Wang Nie, Jiahui Hu, Yanzhi Yi, Sijia Yang, Zhen Geng, Renwei Zhang, Bojie Li, Zhiliang Gan, Xuefeng Jin
Loop tiling and fusion are two essential transformations in optimizing compilers to enhance the data locality of programs. Existing heuristics either perform loop tiling and fusion in a particular order, missing some of their profitable compositions, or execute ad-hoc implementations for domain-specific applications, calling for a generalized and systematic solution in optimizing compilers. In this
-
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2023-12-18 Michael Pellauer, Jason Clemons, Vignesh Balaji, Neal Crago, Aamer Jaleel, Donghyuk Lee, Mike O’Connor, Angshuman Parashar, Sean Treichler, Po-An Tsai, Stephen W. Keckler, Joel S. Emer
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep learning, graph and data analytics, and scientific computing. Current high-performance broad-domain architectures, such as GPUs, often suffer memory system inefficiencies by moving too much data or moving it too far through the memory hierarchy. To increase performance and efficiency, proposed domain-specific accelerators
-
Partial Network Partitioning ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2023-12-18 Basil Alkhatib, Sreeharsha Udayashankar, Sara Qunaibi, Ahmed Alquraan, Mohammed Alfatafta, Wael Al-Manasrah, Alex Depoutovitch, Samer Al-Kiswany
We present an extensive study focused on partial network partitioning. Partial network partitions disrupt the communication between some but not all nodes in a cluster. First, we conduct a comprehensive study of system failures caused by this fault in 13 popular systems. Our study reveals that the studied failures are catastrophic (e.g., lead to data loss), easily manifest, and are mainly due to design
-
Charlotte: Reformulating Blockchains into a Web of Composable Attested Data Structures for Cross-Domain Applications ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2023-12-18 Isaac Sheff, Xinwen Wang, Kushal Babel, Haobin Ni, Robbert van Renesse, Andrew C. Myers
Cross-domain applications are rapidly adopting blockchain techniques for immutability, availability, integrity, and interoperability. However, for most applications, global consensus is unnecessary and may not even provide sufficient guarantees. We propose a new distributed data structure: Attested Data Structures (ADS), which generalize not only blockchains but also many other structures used by distributed
-
Filesystem Fragmentation on Modern Storage Systems ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2023-12-18 Jonggyu Park, Young Ik Eom
Filesystem fragmentation has been one of the primary reasons for computer systems to get slower over time. However, there have been rapid changes in modern storage systems over the past decades, and modern storage devices such as solid state drives have different mechanisms to access data, compared with traditional rotational ones. In this article, we revisit filesystem fragmentation on modern computer
-
Charlotte: Reformulating Blockchains into a Web of Composable Attested Data Structures for Cross-Domain Applications ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2023-07-22 Isaac Sheff, Xinwen Wang, Kushal Babel, Haobin Ni, Robbert van Renesse, Andrew C. Myers
Cross-domain applications are rapidly adopting blockchain techniques for immutability, availability, integrity, and interoperability. However, for most applications, global consensus is unnecessary and may not even provide sufficient guarantees. We propose a new distributed data structure: Attested Data Structures (ADS), which generalize not only blockchains, but also many other structures used by
-
Partial Network Partitioning ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-12-19 Basil Alkhatib, Sreeharsha Udayashankar, Sara Qunaibi, Ahmed Alquraan, Mohammed Alfatafta, Wael Al-Manasrah, Alex Depoutovitch, Samer Al-Kiswany
We present an extensive study focused on partial network partitioning. Partial network partitions disrupt the communication between some but not all nodes in a cluster. First, we conduct a comprehensive study of system failures caused by this fault in 13 popular systems. Our study reveals that the studied failures are catastrophic (e.g., lead to data loss), easily manifest, and are mainly due to design
-
Using Pattern of On-Off Routers and Links and Router Delays to Protect Network-on-Chip Intellectual Property ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-11-24 Arnab Kumar Biswas
Intellectual Property (IP) reuse is a well known practice in chip design processes. Nowadays, network-on-chips (NoCs) are increasingly used as IP and sold by various vendors to be integrated in a multiprocessor system-on-chip (MPSoC). However, IP reuse exposes the design to IP theft, and an attacker can launch IP stealing attacks against NoC IPs. With the growing adoption of MPSoC, such attacks can
-
Efficient Instruction Scheduling Using Real-time Load Delay Tracking ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-11-24 Andreas Diavastos, Trevor E. Carlson
Issue time prediction processors use dataflow dependencies and predefined instruction latencies to predict issue times of repeated instructions. In this work, we make two key observations: (1) memory accesses often take additional time to arrive than the static, predefined access latency that is used to describe these systems. This is due to contention in the memory hierarchy and variability in DRAM
-
Efficient Instruction Scheduling using Real-time Load Delay Tracking ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-15 Andreas Diavastos, Trevor E. Carlson
Issue time prediction processors use data-flow dependencies and predefined instruction latencies to predict issue times of repeated instructions. In this work, we make two key observations: (1) memory accesses often take additional time to arrive than the static, predefined access latency that is used to describe these systems. This is due to contention in the memory hierarchy and variability in DRAM
-
Using Pattern of On-off Routers and Links and Router Delays to Protect Network-on-Chip Intellectual Property ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-13 Arnab Kumar Biswas
Intellectual Property (IP) reuse is a well known practice in chip design processes. Nowadays, network-on-chips (NoCs) are increasingly used as IP and sold by various vendors to be integrated in a multiprocessor system-on-chip (MPSoC). However, IP reuse exposes the design to IP theft, and an attacker can launch IP stealing attacks against NoC IPs. With the growing adoption of MPSoC, such attacks can
-
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui
To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy compared to SSDs. Hence, NVMs have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However
-
An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Robert Lyerly, Carlos Bilbao, Changwoo Min, Christopher J. Rossbach, Binoy Ravindran
In this work, we present libHetMP, an OpenMP runtime for automatically and transparently distributing parallel computation across heterogeneous nodes. libHetMP targets platforms comprising CPUs with different instruction set architectures (ISA) coupled by a high-speed memory interconnect, where cross-ISA binary incompatibility and non-coherent caches require application data be marshaled to be shared
-
The Role of Compute in Autonomous Micro Aerial Vehicles: Optimizing for Mission Time and Energy Efficiency ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Behzad Boroujerdian, Hasan Genc, Srivatsan Krishnan, Bardienus Pieter Duisterhof, Brian Plancher, Kayvan Mansoorshahi, Marcelino Almeida, Wenzhi Cui, Aleksandra Faust, Vijay Janapa Reddi
Autonomous and mobile cyber-physical machines are becoming an inevitable part of our future. In particular, Micro Aerial Vehicles (MAVs) have seen a resurgence in activity. With multiple use cases, such as surveillance, search and rescue, package delivery, and more, these unmanned aerial systems are on the cusp of demonstrating their full potential. Despite such promises, these systems face many challenges
-
ROME: All Overlays Lead to Aggregation, but Some Are Faster than Others ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Marcel Blöcher, Emilio Coppa, Pascal Kleber, Patrick Eugster, William Culhane, Masoud Saeida Ardekani
Aggregation is common in data analytics and crucial to distilling information from large datasets, but current data analytics frameworks do not fully exploit the potential for optimization in such phases. The lack of optimization is particularly notable in current “online” approaches that store data in main memory across nodes, shifting the bottleneck away from disk I/O toward network and compute resources
-
H-Container: Enabling Heterogeneous-ISA Container Migration in Edge Computing ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Tong Xing, Antonio Barbalace, Pierre Olivier, Mohamed L. Karaoui, Wei Wang, Binoy Ravindran
Edge computing is a recent computing paradigm that brings cloud services closer to the client. Among other features, edge computing offers extremely low client/server latencies. To consistently provide such low latencies, services should run on edge nodes that are physically as close as possible to their clients. Thus, when the physical location of a client changes, a service should migrate between
-
Boosting Inter-process Communication with Architectural Support ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-07-05 Yubin Xia, Dong Du, Zhichao Hua, Binyu Zang, Haibo Chen, Haibing Guan
IPC (inter-process communication) is a critical mechanism for modern OSes, including not only microkernels such as seL4, QNX, and Fuchsia where system functionalities are deployed in user-level processes, but also monolithic kernels like Android where apps frequently communicate with plenty of user-level services. However, existing IPC mechanisms still suffer from long latency. Previous software optimizations
-
ROME: All Overlays Lead to Aggregation, but Some Are Faster than Others (Just Accepted) ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-03-16 Marcel Blöcher, Emilio Coppa, Pascal Kleber, Patrick Eugster, William Culhane, Masoud Saeida Ardekani
Aggregation is common in data analytics and crucial to distilling information from large datasets, but current data analytics frameworks do not fully exploit the potential for optimization in such phases. The lack of optimization is particularly notable in current “online” approaches which store data in main memory across nodes, shifting the bottleneck away from disk I/O toward network and compute
-
An OpenMP Runtime for Transparent Work Sharing Across Cache-Incoherent Heterogeneous Nodes (Just Accepted) ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-02-04 Robert Lyerly, Carlos Bilbao, Changwoo Min, Christopher J. Rossbach, Binoy Ravindran
In this work we present libHetMP, an OpenMP runtime for automatically and transparently distributing parallel computation across heterogeneous nodes. libHetMP targets platforms comprising CPUs with different instruction set architectures (ISA) coupled by a high-speed memory interconnect, where cross-ISA binary incompatibility and non-coherent caches require application data be marshaled to be shared
-
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories (Just Accepted) ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-02-04 Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui
To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy-inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy compared to SSDs. Hence, NVMs have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However
-
Shooting Down the Server Front-End Bottleneck ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2022-01-04 Rakesh Kumar, Boris Grot
The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction footprints. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between metadata storage cost and performance. Temporal Stream prefetchers deliver high performance but require a prohibitive amount of metadata
-
Apache Nemo: A Framework for Optimizing Distributed Data Processing ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-10-15 Won Wook Song, Youngseok Yang, Jeongyoon Eo, Jangho Seo, Joo Yeon Kim, Sanha Lee, Gyewon Lee, Taegeon Um, Haeyoon Cho, Byung-Gon Chun
Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often
-
Scaling Membership of Byzantine Consensus ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-10-15 Burcu Canakci, Robbert Van Renesse
Scaling Byzantine Fault Tolerant (BFT) systems in terms of membership is important for secure applications with large participation such as blockchains. While traditional protocols have low latency, they cannot handle many processors. Conversely, blockchains often have hundreds to thousands of processors to increase robustness, but they typically have high latency or energy costs. We describe various
-
Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-07-29 Zhiqiang Zuo, Kai Wang, Aftab Hussain, Ardalan Amiri Sani, Yiyu Zhang, Shenming Lu, Wensheng Dou, Linzhang Wang, Xuandong Li, Chenxi Wang, Guoqing Harry Xu
There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most of the existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite the presence of many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations
-
Modular and Distributed Management of Many-Core SoCs ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-07-08 Marcelo Ruaro, Anderson Sant’ana, Axel Jantsch, Fernando Gehm Moraes
Many-Core Systems-on-Chip increasingly require Dynamic Multi-objective Management (DMOM) of resources. DMOM uses different management components for objectives and resources to implement comprehensive and self-adaptive system resource management. DMOMs are challenging because they require a scalable and well-organized framework to make each component modular, allowing it to be instantiated or redesigned
-
SmartIO ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-07-08 Jonas Markussen, Lars Bjørlykke Kristiansen, Pål Halvorsen, Halvor Kielland-Gyrud, Håkon Kvale Stensland, Carsten Griwodz
The large variety of compute-heavy and data-driven applications accelerate the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared
-
Metron ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-07-08 Georgios P. Katsikas, Tom Barbette, Dejan Kostić, Gerald Q. Maguire Jr., Rebecca Steinert
Deployment of 100 Gigabit Ethernet (GbE) links challenges the packet processing limits of commodity hardware used for Network Functions Virtualization (NFV). Moreover, realizing chained network functions (i.e., service chains) necessitates the use of multiple CPU cores, or even multiple servers, to process packets from such high speed links. Our system Metron jointly exploits the underlying network
-
Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-07-01 Youwei Zhuo, Jingji Chen, Gengyu Rao, Qinyi Luo, Yanzhi Wang, Hailong Yang, Depei Qian, Xuehai Qian
To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. It exemplifies a gap between programming model
-
A Simulation Software for the Evaluation of Vulnerabilities in Reputation Management Systems ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-06-05 Vincenzo Agate, Alessandra De Paola, Giuseppe Lo Re, Marco Morana
Multi-agent distributed systems are characterized by autonomous entities that interact with each other to provide, and/or request, different kinds of services. In several contexts, especially when a reward is offered according to the quality of service, individual agents (or coordinated groups) may act in a selfish way. To prevent such behaviours, distributed Reputation Management Systems (RMSs) provide
-
AI Tax ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-03-26 Daniel Richins, Dharmisha Doshi, Matthew Blackmore, Aswathy Thulaseedharan Nair, Neha Pathapati, Ankit Patel, Brainard Daguman, Daniel Dobrijalowski, Ramesh Illikkal, Kevin Long, David Zimmerman, Vijay Janapa Reddi
Artificial intelligence and machine learning are experiencing widespread adoption in industry and academia. This has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models; this, in turn, has spurred research into specialized hardware AI accelerators. Given the rapid pace of advances, it is easy to forget that they are often developed
-
UNIQ ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-03-26 Chaim Baskin, Natan Liss, Eli Schwartz, Evgenii Zheltonozhskii, Raja Giryes, Alex M. Bronstein, Avi Mendelson
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise to the weights at training time. As a by-product of injecting noise to weights, we find that activations can also be quantized to as low as 8-bit with only a minor accuracy degradation. Our non-uniform
-
KylinX ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-02-12 Yiming Zhang, Chengfei Zhang, Yaozheng Wang, Kai Yu, Guangtao Xue, Jon Crowcroft
Unikernel specializes a minimalistic LibOS and a target application into a standalone single-purpose virtual machine (VM) running on a hypervisor, which is referred to as (virtual) appliance. Compared to traditional VMs, Unikernel appliances have smaller memory footprint and lower overhead while guaranteeing the same level of isolation. On the downside, Unikernel strips off the process abstraction
-
Highly Concurrent Latency-tolerant Register Files for GPUs ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2021-01-04 Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register file to reduce the register file power consumption by caching registers
-
Introduction to the Special Issue on the Award Papers of USENIX ATC 2019 ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-06-11 Dahlia Malkhi, Dan Tsafrir
This special issue of ACM Transactions on Computer Systems presents the three papers from the 2019 USENIX Annual Technical Conference (ATC’19) that won the Best Paper Award. The scope of ATC is broad. It covers all practical aspects related to systems software, and its goal is to improve and further the knowledge of computing systems of all scales, from small embedded devices to large data centers
-
SILK+ Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-05-31 Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, Diego Didona
Log-Structured Merge Key-Value stores (LSM KVs) are designed to offer good write performance, by capturing client writes in memory, and only later flushing them to storage. Writes are later compacted into a tree-like data structure on disk to improve read performance and to reduce storage space use. It has been widely documented that compactions severely hamper throughput. Various optimizations have
-
Transactuations ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-05-31 Tanakorn Leesatapornwongsa, Aritra Sengupta, Masoud Saeida Ardekani, Gustavo Petri, Cesar A. Stuardo
A large class of IoT applications read sensors, execute application logic, and actuate actuators. However, the lack of high-level programming abstractions compromises correctness, especially in the presence of failures and unwanted interleaving between applications. A key problem arises when operations on IoT devices or the application itself fails, which leads to inconsistencies between the physical
-
A Retargetable System-level DBT Hypervisor ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-05-31 Tom Spink, Harry Wagstaff, Björn Franke
System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different from that of the host machine. Due to their performance-critical nature, system-level DBT frameworks are typically hand-coded and heavily optimized, both for their guest and host architectures. While this results in
-
Corrigendum to “Derecho: Fast State Machine Replication for Cloud Services,” by Jha et al., ACM Transactions on Computer Systems (TOCS) Volume 36, Issue 2, Article No. 4 ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-05-30 Jha
No abstract available.
-
Effective Detection of Sleep-in-atomic-context Bugs in the Linux Kernel ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-05-04 Jia-Ju Bai, Julia Lawall, Shi-Min Hu
Atomic context is an execution state of the Linux kernel in which kernel code monopolizes a CPU core. In this state, the Linux kernel may only perform operations that cannot sleep, as otherwise a system hang or crash may occur. We refer to this kind of concurrency bug as a sleep-in-atomic-context (SAC) bug. In practice, SAC bugs are hard to find, as they do not cause problems in all executions. In
-
Assisting Static Compiler Vectorization with a Speculative Dynamic Vectorizer in an HW/SW Codesigned Environment ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-04-04 Rakesh Kumar, Alejandro Martínez, Antonio González
Compiler-based static vectorization is used widely to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective in vectorizing traditional array-based applications. However, compilers’ inability to do accurate interprocedural pointer disambiguation and interprocedural array dependence analysis severely limits vectorization opportunities. HW/SW codesigned
-
Spanner ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-04-04 James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API
-
Fast and Portable Locking for Multicore Architectures ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-04-04 Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall, Gilles Muller
The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea
-
Protocol Responsibility Offloading to Improve TCP Throughput in Virtualized Environments ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-04-04 Sahan Gamage, Ramana Rao Kompella, Dongyan Xu, Ardalan Kangarlou
Virtualization is a key technology that powers cloud computing platforms such as Amazon EC2. Virtual machine (VM) consolidation, where multiple VMs share a physical host, has seen rapid adoption in practice, with increasingly large numbers of VMs per machine and per CPU core. Our investigations, however, suggest that the increasing degree of VM consolidation has serious negative effects on the VMs’
-
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2020-04-04 Yunsup Lee, Rimas Avizienis, Alex Bishara, Richard Xia, Derek Lockhart, Christopher Batten, Krste Asanović
We present a taxonomy and modular implementation approach for data-parallel accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) architectural design patterns. We introduce Maven, a new VT microarchitecture based on the traditional vector-SIMD microarchitecture, that is considerably simpler to implement and easier to program than previous VT designs. Using an extensive
-
An Instruction Set Architecture for Machine Learning ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-08-13 Yunji Chen, Huiying Lan, Zidong Du, Shaoli Liu, Jinhua Tao, Dong Han, Tao Luo, Qi Guo, Ling Li, Yuan Xie, Tianshi Chen
Machine Learning (ML) is a family of models for learning from data to improve performance on a certain task. ML techniques, especially recent renewed neural networks (deep neural networks), have proven to be efficient for a broad range of applications. ML techniques are conventionally executed on general-purpose processors (such as CPU and GPGPU), which usually are not energy efficient, since
-
Software Prefetching for Indirect Memory Accesses ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-06-18 Sam Ainsworth, Timothy M. Jones
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. This article
-
The Arm Triple Core Lock-Step (TCLS) Processor ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-06-18 Xabier Iturbe, Balaji Venu, Emre Ozer, Jean-Luc Poupat, Gregoire Gimenez, Hans-Ulrich Zurek
The Arm Triple Core Lock-Step (TCLS) architecture is the natural evolution of Arm Cortex-R Dual Core Lock-Step (DCLS) processors to increase dependability, predictability, and availability in safety-critical and ultra-reliable applications. TCLS is simple, scalable, and easy to deploy in applications where Arm DCLS processors are widely used (e.g., automotive), as well as in new sectors where the presence
-
SPIN ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-04-10 Shai Bergman, Tanya Brokhman, Tzachi Cohen, Mark Silberstein
Recent GPUs enable Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices like NVMe SSDs to exclude the CPU from the data path between them for efficiency. Unfortunately, using p2p to access files is challenging because of the subtleties of low-level non-standard interfaces, which bypass the OS file I/O layers and may hurt system performance. Developers must possess intimate knowledge
-
Mitigating Load Imbalance in Distributed Data Serving with Rack-Scale Memory Pooling ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-04-10 Stanko Novakovic, Alexandros Daglis, Dmitrii Ustiugov, Edouard Bugnion, Babak Falsafi, Boris Grot
To provide low-latency and high-throughput guarantees, most large key-value stores keep the data in the memory of many servers. Despite the natural parallelism across lookups, the load imbalance, introduced by heavy skew in the popularity distribution of keys, limits performance. To avoid violating tail latency service-level objectives, systems tend to keep server utilization low and organize the data
-
Derecho ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-04-04 Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Robbert Van Renesse, Sydney Zink, Kenneth P. Birman
Cloud computing services often replicate data and may require ways to coordinate distributed actions. Here we present Derecho, a library for such tasks. The API provides interfaces for structuring applications into patterns of subgroups and shards, supports state machine replication within them, and includes mechanisms that assist in restart after failures. Running over 100Gbps RDMA, Derecho can send
-
Deca ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-03-14 Xuanhua Shi, Zhixiang Ke, Yongluan Zhou, Hai Jin, Lu Lu, Xiong Zhang, Ligang He, Zhenyu Hu, Fei Wang
In-memory caching of intermediate data and active combining of data in shuffle buffers have been shown to be very effective in minimizing the recomputation and I/O cost in big data processing systems such as Spark and Flink. However, it has also been widely reported that these techniques would create a large amount of long-living data objects in the heap. These generated objects may quickly saturate
-
Lock–Unlock ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-03-14 Rachid Guerraoui, Hugo Guiroux, Renaud Lachaize, Vivien Quéma, Vasileios Trigonakis
A plethora of optimized mutex lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications that consider different performance metrics, such as energy efficiency and tail latency. In this article, we perform
-
Venice ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2019-03-14 Boyan Zhao, Rui Hou, Jianbo Dong, Michael Huang, Sally A. Mckee, Qianlong Zhang, Yueji Liu, Ye Li, Lixin Zhang, Dan Meng
Consolidated server racks are quickly becoming the standard infrastructure for engineering, business, medicine, and science. Such servers are still designed much as they were when organized as individual, distributed systems. Given that many fields rely on big-data analytics substantially, its cost-effectiveness and performance should be improved, which can be achieved by flexibly allowing
-
Ryoan ACM Trans. Comput. Syst. (IF 1.5) Pub Date : 2018-12-17 Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, Emmett Witchel
Users of modern data-processing services such as tax preparation or genomic screening are forced to trust them with data that the users wish to keep secret. Ryoan protects secret data while it is processed by services that the data owner does not trust. Accomplishing this goal in a distributed setting is difficult, because the user has no control over the service providers or the computational platform