-
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation arXiv.cs.DC Pub Date : 2024-04-17 Jinghai He, Haoyu Liu, Yuhang Wu, Zeyu Zheng, Tingyu Zhu
We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms. Compared to the implementation using only CPU (Central Processing Unit), the GPU implementation benefits from computational advantages of parallel processing for large-scale matrices and vectors operations. Numerical
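The kind of computation that benefits is easy to picture: a first-order update whose gradient is dominated by one large matrix-vector product. A minimal sketch on a toy quadratic objective (an illustrative assumption, not one of the paper's three tasks), with NumPy standing in for the device — with CuPy the same array code typically runs on a GPU:

```python
import numpy as np

def sgd_step(theta, grad_fn, lr=0.01):
    """One first-order update; the gradient here is a large
    matrix-vector product, which parallelizes naturally on a GPU."""
    return theta - lr * grad_fn(theta)

# Toy objective f(x) = 0.5 x^T A x - b^T x, gradient A x - b
# (hypothetical example, chosen only so convergence is easy to check).
rng = np.random.default_rng(0)
n = 512
A = np.eye(n) * 2.0          # well-conditioned for a stable toy run
b = rng.standard_normal(n)
grad = lambda x: A @ x - b   # dominated by a matrix-vector product

x = np.zeros(n)
for _ in range(200):
    x = sgd_step(x, grad, lr=0.1)

print(bool(np.allclose(A @ x, b, atol=1e-3)))  # True: converged
```

On a GPU the per-iteration cost of `A @ x` drops sharply for large `n`, which is the advantage the abstract refers to.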
-
A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications arXiv.cs.DC Pub Date : 2024-04-17 Antonio Boiano, Marco Di Gennaro, Luca Barbieri, Michele Carminati, Monica Nicoli, Alessandro Redondi, Stefano Savazzi, Albert Sund Aillet, Diogo Reis Santos, Luigi Serio
Federated Learning (FL) has emerged as a promising approach for privacy-preserving machine learning, particularly in sensitive domains such as healthcare. In this context, the TRUSTroke project aims to leverage FL to assist clinicians in ischemic stroke prediction. This paper provides an overview of the TRUSTroke FL network infrastructure. The proposed architecture adopts a client-server model with
-
Hierarchical storage management in user space for neuroimaging applications arXiv.cs.DC Pub Date : 2024-04-17 Valérie Hayot-Sasson, Tristan Glatard
Neuroimaging open-data initiatives have led to increased availability of large scientific datasets. While these datasets are shifting the processing bottleneck from compute-intensive to data-intensive, current standardized analysis tools have yet to adopt strategies that mitigate the costs associated with large data transfers. A major challenge in adapting neuroimaging applications for data-intensive
-
IoTSim-Osmosis-RES: Towards autonomic renewable energy-aware osmotic computing arXiv.cs.DC Pub Date : 2024-04-17 Tomasz Szydlo, Amadeusz Szabala, Nazar Kordiumov, Konrad Siuzdak, Lukasz Wolski, Khaled Alwasel, Fawzy Habeeb, Rajiv Ranjan
Internet of Things systems exist in various areas of our everyday life. For example, data from sensors installed in smart cities and homes is processed in edge and cloud computing centres, providing several benefits that improve our lives. The place of data processing is related to the required system response times -- processing data closer to its source results in a shorter system response time. The Osmotic
-
Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route arXiv.cs.DC Pub Date : 2024-04-17 Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato
Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. This paradigm eliminates the need for centralizing sensitive raw data in one location but faces the significant challenge of high parameter synchronization delays, which stems from the constraints of bandwidth-limited
-
Undo and Redo Support for Replicated Registers arXiv.cs.DC Pub Date : 2024-04-17 Leo Stewen, Martin Kleppmann
Undo and redo functionality is ubiquitous in collaboration software. In single user settings, undo and redo are well understood. However, when multiple users edit a document, concurrency may arise, leading to a non-linear operation history. This renders undo and redo more complex both in terms of their semantics and implementation. We survey the undo and redo semantics of current mainstream collaboration
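For the single-user setting the abstract calls well understood, undo/redo over a register is just a cursor into a linear history. A minimal sketch (a hypothetical class, not the paper's replicated construction — concurrency makes the history non-linear, which is exactly the complication the paper studies):

```python
class UndoableRegister:
    """Single-user register with a linear undo/redo history."""
    def __init__(self, initial=None):
        self._history = [initial]  # values in assignment order
        self._pos = 0              # cursor: index of the current value

    def set(self, value):
        # A fresh write discards any redoable future, as in most editors.
        del self._history[self._pos + 1:]
        self._history.append(value)
        self._pos += 1

    def get(self):
        return self._history[self._pos]

    def undo(self):
        if self._pos > 0:
            self._pos -= 1

    def redo(self):
        if self._pos < len(self._history) - 1:
            self._pos += 1

r = UndoableRegister()
r.set("a"); r.set("b"); r.undo()
print(r.get())   # a
r.redo()
print(r.get())   # b
```

With multiple concurrent writers there is no single `_pos` cursor to move, so the semantics of "undo" itself must be redefined.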
-
Mutiny! How does Kubernetes fail, and what can we do about it? arXiv.cs.DC Pub Date : 2024-04-17 Marco Barletta, Marcello Cinque, Catello Di Martino, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targeting the data store preserving the cluster state, and iii) compare results of our fault/error injection experiments with real-world failures, showing that our fault/error injections can recreate many real-world
-
GPU-Based Parallel Computing Methods for Medical Photoacoustic Image Reconstruction arXiv.cs.DC Pub Date : 2024-04-16 Xinyao Yi, Yuxin Qiao
Recent years have witnessed a rapid advancement in GPU technology, establishing it as a formidable high-performance parallel computing technology with superior floating-point computational capabilities compared to traditional CPUs. This paper explores the application of this technology in the field of photoacoustic imaging, an emerging non-destructive testing technique in biomedical engineering characterized
-
XMiner: Efficient Directed Subgraph Matching with Pattern Reduction arXiv.cs.DC Pub Date : 2024-04-17 Pingpeng Yuan, Yujiang Wang, Tianyu Ma, Siyuan He, Ling Liu
Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. The state-of-the-art graph matching algorithms and systems are mainly designed for undirected graphs. Directed graph matching is more complex than undirected graph matching because the edge direction must be taken into account before the exploration of each directed
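The directional constraint the abstract mentions shows up even in a naive backtracking matcher: a graph edge can only realize a pattern edge if its orientation agrees. A toy sketch (not XMiner's pattern-reduction algorithm):

```python
def find_matches(pattern_edges, graph_edges):
    """Naive backtracking matcher for a directed pattern in a directed
    graph. Every pattern edge (a, b) must map onto a graph edge with the
    SAME direction -- the extra constraint vs. the undirected case."""
    pattern_nodes = sorted({u for e in pattern_edges for u in e})
    gset = set(graph_edges)
    gnodes = {u for e in graph_edges for u in e}

    def extend(mapping):
        if len(mapping) == len(pattern_nodes):
            yield dict(mapping)
            return
        p = pattern_nodes[len(mapping)]
        for g in gnodes:
            if g in mapping.values():   # keep the mapping injective
                continue
            mapping[p] = g
            # every pattern edge between already-mapped nodes must exist
            ok = all((mapping[a], mapping[b]) in gset
                     for a, b in pattern_edges
                     if a in mapping and b in mapping)
            if ok:
                yield from extend(mapping)
            del mapping[p]

    return list(extend({}))

# Directed triangle 0->1->2->0 in a 4-node toy graph (hypothetical data).
pattern = [(0, 1), (1, 2), (2, 0)]
graph = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
matches = find_matches(pattern, graph)
print(len(matches))  # 3: the rotations of the one directed triangle
```

Node "d" never participates: it has no outgoing edge, so the direction check prunes it — the effect real systems exploit far more aggressively.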
-
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe arXiv.cs.DC Pub Date : 2024-04-16 Christopher Rae, Joseph K. L. Lee, James Richings
With the rapid increase in machine learning workloads performed on HPC systems, it is beneficial to regularly perform machine learning specific benchmarks to monitor performance and identify issues. Furthermore, as part of the Edinburgh International Data Facility, EPCC currently hosts a wide range of machine learning accelerators including Nvidia GPUs, the Graphcore Bow Pod64 and Cerebras CS-2, which
-
Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development arXiv.cs.DC Pub Date : 2024-04-16 Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, Yankun Xu, Xiuwen Guo, Yunlong Fei, Zhaoying Wang, Mingkui Li, Yingjing Jiang, Lv Lu, Liang Su
With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries to
-
Distributing Context-Aware Shared Memory Data Structures: A Case Study on Unordered Linked List arXiv.cs.DC Pub Date : 2024-04-15 Raaghav Ravishankar, Sandeep Kulkarni, Sathya Peri, Gokarna Sharma
In this paper, we focus on partitioning a context-aware shared memory data structure so that it can be implemented as a distributed data structure running on multiple machines. By context-aware data structures, we mean that the result of an operation not only depends upon the value of the shared data but also upon the previous operations performed by the same client. While there is substantial work
-
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores arXiv.cs.DC Pub Date : 2024-04-15 Zixuan Li, Mingxing Duan, Huizhang Luo, Wangdong Yang, Kenli Li, Keqin Li
Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw tensors is impractical due to the significant memory and computational overhead involved. The current mainstream approach involves compressing or decomposing the original tensor. One popular tensor decomposition algorithm is the Tucker decomposition
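The compression the abstract refers to starts from not storing zeros at all; coordinate (COO) format is the usual raw layout for such tensors. An illustrative class (not cuFastTuckerPlus's internal format):

```python
import numpy as np

class COOTensor:
    """Coordinate-format sparse tensor: store only the nonzeros.
    Decomposition algorithms such as Tucker work on this layout directly
    to avoid ever materializing the dense tensor."""
    def __init__(self, shape, coords, values):
        self.shape = shape
        self.coords = np.asarray(coords)   # (nnz, ndim) index rows
        self.values = np.asarray(values)   # (nnz,) nonzero entries

    def to_dense(self):
        dense = np.zeros(self.shape)
        dense[tuple(self.coords.T)] = self.values
        return dense

# A 100x100x100 tensor with 3 nonzeros: COO stores 3*(3+1) numbers
# instead of 1,000,000 (toy data for illustration).
t = COOTensor((100, 100, 100),
              coords=[(0, 0, 0), (5, 7, 9), (99, 99, 99)],
              values=[1.0, 2.5, -3.0])
dense = t.to_dense()
print(dense[5, 7, 9])       # 2.5
print(float(dense.sum()))   # 0.5
```

`to_dense` exists here only to check correctness; at real scales it is exactly the step one must avoid.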
-
LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System arXiv.cs.DC Pub Date : 2024-04-16 Shijing Hu, Ruijun Deng, Xin Du, Zhihui Lu, Qiang Duan, Yi He, Shih-Chia Huang, Jie Wu
Recent large vision models (e.g., SAM) enjoy great potential to facilitate intelligent perception with high accuracy. Yet, the resource constraints in the IoT environment tend to prevent such large vision models from being deployed locally, incurring considerable inference latency that makes it difficult to support real-time applications, such as autonomous driving and robotics. Edge-cloud collaboration
-
BoLD: Fast and Cheap Dispute Resolution arXiv.cs.DC Pub Date : 2024-04-16 Mario M. Alvarez, Henry Arneson, Ben Berger, Lee Bousfield, Chris Buckland, Yafah Edelman, Edward W. Felten, Daniel Goldman, Raul Jordan, Mahimna Kelkar, Akaki Mamageishvili, Harry Ng, Aman Sanghi, Victor Shoup, Terence Tsao
BoLD is a new dispute resolution protocol that is designed to replace the originally deployed Arbitrum dispute resolution protocol. Unlike that protocol, BoLD is resistant to delay attacks. It achieves this resistance without a significant increase in onchain computation costs and with reduced staking costs.
-
Optimizing Malware Detection in IoT Networks: Leveraging Resource-Aware Distributed Computing for Enhanced Security arXiv.cs.DC Pub Date : 2024-04-12 Sreenitha Kasarapu, Sanket Shukla, Sai Manoj Pudukotai Dinakarrao
In recent years, networked IoT systems have revolutionized connectivity, portability, and functionality, offering a myriad of advantages. However, these systems are increasingly targeted by adversaries due to inherent security vulnerabilities and limited computational and storage resources. Malicious applications, commonly known as malware, pose a significant threat to IoT devices and networks. While
-
ChainScience 2024, Conference Proceedings arXiv.cs.DC Pub Date : 2024-04-15 Nicolò Vallarano, Claudio J. Tessone
ChainScience 2024, the second edition of the interdisciplinary conference, brought together academics, practitioners, and industry experts to explore novel developments in the realm of distributed ledger technologies. The conference aimed to bridge diverse fields such as informatics, business, economics, finance, regulation, law, mathematics, physics, and complexity science. The papers presented in
-
The intelligent prediction and assessment of financial information risk in the cloud computing model arXiv.cs.DC Pub Date : 2024-04-14 Yufu Wang, Mingwei Zhu, Jiaqiang Yuan, Guanghui Wang, Hong Zhou
Cloud computing is a kind of distributed computing in which the network "cloud" breaks a huge data calculation and processing task into countless small programs, which are then processed and analyzed by a system composed of multiple servers, with the results returned to the user. This report explores the intersection of cloud computing and financial
-
Egret: Reinforcement Mechanism for Sequential Computation Offloading in Edge Computing arXiv.cs.DC Pub Date : 2024-04-14 Haosong Peng, Yufeng Zhan, DiHua Zhai, Xiaopu Zhang, Yuanqing Xia
As an emerging computing paradigm, edge computing offers computing resources closer to the data sources, helping to improve the service quality of many real-time applications. A crucial problem is designing a rational pricing mechanism to maximize the revenue of the edge computing service provider (ECSP). However, prior works have considerable limitations: clients are static and are required to disclose
-
Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching arXiv.cs.DC Pub Date : 2024-04-14 Haosong Peng, Yufeng Zhan, Peng Li, Yuanqing Xia
Cloud-edge collaborative computing paradigm is a promising solution to high-resolution video analytics systems. The key lies in reducing redundant data and managing fluctuating inference workloads effectively. Previous work has focused on extracting regions of interest (RoIs) from videos and transmitting them to the cloud for processing. However, a naive Infrastructure as a Service (IaaS) resource
-
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs arXiv.cs.DC Pub Date : 2024-04-14 Elliot Kolker-Hicks, Di Zhang, Dong Dai
High Performance Computing (HPC) systems are used across a wide range of disciplines for both large and complex computations. HPC systems often receive many thousands of computational tasks at a time, colloquially referred to as jobs. These jobs must then be scheduled as optimally as possible so they can be completed within a reasonable timeframe. HPC scheduling systems often employ a technique called
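The technique the abstract is about to name is typically backfilling: letting short jobs jump the queue when they cannot delay the reserved start of the job at the head. A sketch of one EASY-backfilling pass (simplified assumptions: a single node-count resource, current time taken as 0, the head job feasible on the machine; not the paper's reinforcement-learning policy):

```python
def easy_backfill(free_nodes, running, queue):
    """One scheduling pass of EASY backfilling (sketch).
    running: list of (end_time, nodes) for jobs already executing.
    queue:   FCFS list of (name, nodes, runtime). Time "now" is 0.
    Returns the jobs started now without delaying the queue head."""
    started = []
    queue = list(queue)
    # Start jobs from the front of the queue while they fit.
    while queue and queue[0][1] <= free_nodes:
        name, nodes, _ = queue.pop(0)
        free_nodes -= nodes
        started.append(name)
    if not queue:
        return started
    # Head job doesn't fit: reserve the earliest time it could start.
    head_nodes = queue[0][1]
    avail, shadow = free_nodes, None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head_nodes:
            shadow = end           # reservation start for the head job
            break
    if shadow is None:
        return started             # head can never fit (not modeled here)
    extra = avail - head_nodes     # nodes the head won't need even then
    # Backfill: a later job may start if it fits now AND either finishes
    # before the reservation or leaves the reserved nodes untouched.
    for name, nodes, runtime in queue[1:]:
        if nodes <= free_nodes and (runtime <= shadow or nodes <= extra):
            free_nodes -= nodes
            if runtime > shadow:
                extra -= nodes     # it occupies reserved-era capacity
            started.append(name)
    return started

# 10 nodes total, 6 busy until t=5. The head needs 8 (must wait); a short
# 2-node job backfills because it finishes before t=5 (toy data).
print(easy_backfill(free_nodes=4,
                    running=[(5, 6)],
                    queue=[("head", 8, 10), ("small", 2, 3), ("long", 4, 9)]))
```

An RL-based scheduler like the paper's replaces the fixed "fits and doesn't delay" rule with a learned selection policy.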
-
Centralization in Proof-of-Stake Blockchains: A Game-Theoretic Analysis of Bootstrapping Protocols arXiv.cs.DC Pub Date : 2024-04-15 Varul Srivastava, Sankarshan Damle, Sujit Gujar
Proof-of-stake (PoS) has emerged as a natural alternative to the resource-intensive Proof-of-Work (PoW) blockchain, as was recently seen with the Ethereum Merge. PoS-based blockchains require an initial stake distribution among the participants. Typically, this initial stake distribution is called bootstrapping. This paper argues that existing bootstrapping protocols are prone to centralization. To
-
Enhancing IoT Malware Detection through Adaptive Model Parallelism and Resource Optimization arXiv.cs.DC Pub Date : 2024-04-12 Sreenitha Kasarapu, Sanket Shukla, Sai Manoj Pudukotai Dinakarrao
The widespread integration of IoT devices has greatly improved connectivity and computational capabilities, facilitating seamless communication across networks. Despite their global deployment, IoT devices are frequently targeted for security breaches due to inherent vulnerabilities. Among these threats, malware poses a significant risk to IoT devices. The lack of built-in security features and limited
-
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework arXiv.cs.DC Pub Date : 2024-04-12 Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong
Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed
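The core primitive of a DGRW step is weighted neighbor sampling. An inverse-transform sketch in pure Python (toy graph data; the paper's GPU framework is concerned with executing exactly this kind of per-step sampling at scale without pre-processing buffers):

```python
import random
from itertools import accumulate
from bisect import bisect

def weighted_walk(adj, weights, start, length, rng=random.Random(42)):
    """One weighted random walk: each step picks a neighbor with
    probability proportional to its edge weight, via inverse-transform
    sampling over the cumulative weights."""
    walk = [start]
    for _ in range(length):
        nbrs = adj.get(walk[-1])
        if not nbrs:
            break  # dead end: the walk stops early
        cum = list(accumulate(weights[walk[-1]]))
        # draw u ~ U[0, total) and find its bucket in the cumulative sums
        walk.append(nbrs[bisect(cum, rng.random() * cum[-1])])
    return walk

# Toy weighted digraph: from "a", the edge to "b" is 3x likelier than "c".
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
weights = {"a": [3.0, 1.0], "b": [1.0], "c": [1.0]}
walk = weighted_walk(adj, weights, "a", length=5)
print(walk[0], len(walk))  # starts at "a"; 6 nodes for 5 steps
```

The workload-imbalance issue the abstract mentions arises because `cum` has the length of the vertex degree, which is power-law distributed across walkers.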
-
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning arXiv.cs.DC Pub Date : 2024-04-12 Liwei Wang, Jun Li, Wen Chen, Qingqing Wu, Ming Ding
Federated Learning (FL) facilitates collaborative machine learning by training models on local datasets, and subsequently aggregating these local models at a central server. However, the frequent exchange of model parameters between clients and the central server can result in significant communication overhead during the FL training process. To solve this problem, this paper proposes a novel FL framework
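The aggregation step the abstract describes is, in the standard FedAvg baseline, a dataset-size-weighted average of the client models. A sketch of that baseline (toy single-layer models; not the paper's layer-divergence feedback scheme, which decides how much of each layer to exchange):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg server step: average client parameters weighted by each
    client's local dataset size."""
    total = sum(client_sizes)
    return sum((n / total) * w
               for w, n in zip(client_weights, client_sizes))

# Two clients, each holding a flat parameter vector (hypothetical values).
w1 = np.array([1.0, 2.0])
w2 = np.array([3.0, 6.0])
agg = fedavg([w1, w2], client_sizes=[10, 30])
print(agg)  # [2.5 5. ] -- clients weighted 1/4 and 3/4
```

Communication-efficient variants like the paper's reduce how often, or for which layers, vectors like `w1` and `w2` cross the network.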
-
emucxl: an emulation framework for CXL-based disaggregated memory applications arXiv.cs.DC Pub Date : 2024-04-12 Raja Gond, Purushottam Kulkarni
The emergence of CXL (Compute Express Link) promises to transform the status of interconnects between host and devices and in turn impact the design of all software layers. With its low overhead, low latency, and memory coherency capabilities, CXL has the potential to improve the performance of existing devices while making viable new operational use cases (e.g., disaggregated memory pools, cache coherent
-
Byzantine Reliable Broadcast with Low Communication and Time Complexity arXiv.cs.DC Pub Date : 2024-04-11 Thomas Locher
Byzantine reliable broadcast is a fundamental problem in distributed computing, which has been studied extensively over the past decades. State-of-the-art algorithms are predominantly based on the approach to share encoded fragments of the broadcast message, yielding an asymptotically optimal communication complexity when the message size exceeds the network size, a condition frequently encountered
-
Performance Analysis of Decentralized Physical Infrastructure Networks and Centralized Clouds arXiv.cs.DC Pub Date : 2024-04-12 Jan von der Assen, Christian Killer, Alessandro De Carli, Burkhard Stiller
The advent of Decentralized Physical Infrastructure Networks (DePIN) represents a shift in the digital infrastructure of today's Internet. While Centralized Service Providers (CSP) monopolize cloud computing, DePINs aim to enhance data sovereignty and confidentiality and increase resilience against a single point of failure. Due to the novelty of the emerging field of DePIN, this work focuses on the
-
NotNets: Accelerating Microservices by Bypassing the Network arXiv.cs.DC Pub Date : 2024-04-09 Peter Alvaro, Matthew Adiletta, Adrian Cockroft, Frank Hady, Ramesh Illikkal, Esteban Ramos, James Tsai, Robert Soulé
Remote procedure calls are the workhorse of distributed systems. However, as software engineering trends, such as micro-services and serverless computing, push applications towards ever finer-grained decompositions, the overhead of RPC-based communication is becoming too great to bear. In this paper, we argue that point solutions that attempt to optimize one aspect of RPC logic are unlikely to mitigate
-
Analysis of Distributed Algorithms for Big-data arXiv.cs.DC Pub Date : 2024-04-09 Rajendra Purohit, K R Chowdhary, S D Purohit
Parallel and distributed processing is becoming the de facto industry standard, and a large part of current research targets how to make computing scalable and distributed dynamically, without allocating resources on a permanent basis. The present article focuses on the study and performance of distributed and parallel algorithms and their file systems, to achieve scalability at the local level
-
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey arXiv.cs.DC Pub Date : 2024-04-09 Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resources
-
A Systematic Literature Survey of Sparse Matrix-Vector Multiplication arXiv.cs.DC Pub Date : 2024-04-09 Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, the comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in
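The kernel itself is compact: with the matrix in compressed sparse row (CSR) form, each output entry is a dot product over that row's stored nonzeros. A plain reference version (the optimizations the survey covers restructure this loop for specific hardware):

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x with A in CSR format: data holds the nonzero values,
    indices their column positions, and indptr the row boundaries."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]
    return y

# A = [[4, 0, 9],
#      [0, 7, 0],
#      [0, 0, 5]]   (toy matrix for illustration)
data = np.array([4.0, 9.0, 7.0, 5.0])
indices = np.array([0, 2, 1, 2])
indptr = np.array([0, 2, 3, 4])
x = np.array([1.0, 2.0, 3.0])
print(spmv_csr(data, indices, indptr, x))  # [31. 14. 15.]
```

The irregular, data-dependent access `x[indices[lo:hi]]` is what makes SpMV memory-bound and such a persistent optimization target.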
-
A Survey of Distributed Graph Algorithms on Massive Graphs arXiv.cs.DC Pub Date : 2024-04-09 Lingkai Meng, Yu Shao, Long Yuan, Longbin Lai, Peng Cheng, Xue Li, Wenyuan Yu, Wenjie Zhang, Xuemin Lin, Jingren Zhou
Distributed processing of large-scale graph data has many practical applications and has been widely studied. In recent years, a lot of distributed graph processing frameworks and algorithms have been proposed. While many efforts have been devoted to analyzing these, mostly on the basis of programming models, less research focuses on understanding their challenges in distributed environments
-
Software-based Security Framework for Edge and Mobile IoT arXiv.cs.DC Pub Date : 2024-04-09 José Cecílio, Alan Oliveira de Sá, André Souto
With the proliferation of Internet of Things (IoT) devices, ensuring secure communications has become imperative. Due to their low cost and embedded nature, many of these devices operate with computational and energy constraints, neglecting the potential security vulnerabilities that they may bring. This work-in-progress is focused on designing secure communication among remote servers and embedded
-
KaMPIng: Flexible and (Near) Zero-overhead C++ Bindings for MPI arXiv.cs.DC Pub Date : 2024-04-08 Demian Hespe, Lukas Hübner, Florian Kurpicz, Peter Sanders, Matthias Schimek, Daniel Seemaier, Christoph Stelz, Tim Niklas Uhl
The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels from low-level MPI calls to convenient STL-style
-
Efficient Distributed Data Structures for Future Many-core Architectures arXiv.cs.DC Pub Date : 2024-04-08 Panagiota Fatourou, Nikolaos D. Kallimanis, Eleni Kanellou, Odysseas Makridakis, Christi Symeonidou
We study general techniques for implementing distributed data structures on top of future many-core architectures with non cache-coherent or partially cache-coherent memory. With the goal of contributing towards what might become, in the future, the concurrency utilities package in Java collections for such architectures, we end up with a comprehensive collection of data structures by considering different
-
Towards Reconfigurable Linearizable Reads arXiv.cs.DC Pub Date : 2024-04-08 Myles Thiessen, Aleksey Panas, Guy Khazma, Eyal de Lara
Linearizable datastores are desirable because they provide users with the illusion that the datastore is run on a single machine that performs client operations one at a time. To reduce the performance cost of providing this illusion, many specialized algorithms for linearizable reads have been proposed which significantly improve read performance compared to write performance. The main difference
-
Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures arXiv.cs.DC Pub Date : 2024-04-05 Baodi Shan, Mauricio Araya-Polo
Accelerated computing is widely used in high-performance computing. Therefore, it is crucial to experiment and discover how to better utilize the latest generations of GPUs on relevant applications. In this paper, we present results and share insights about highly tuned stencil-based kernels for NVIDIA Ampere (A100) and Hopper (GH200) architectures. Performance results yield useful insights into the behavior
-
Stable Blockchain Sharding under Adversarial Transaction Generation arXiv.cs.DC Pub Date : 2024-04-05 Ramesh Adhikari, Costas Busch, Dariusz Kowalski
Sharding is used to improve the scalability and performance of blockchain systems. We investigate the stability of blockchain sharding, where transactions are continuously generated by an adversarial model. The system consists of $n$ processing nodes that are divided into $s$ shards. Following the paradigm of classical adversarial queuing theory, transactions are continuously received at injection
-
Sharding Distributed Databases: A Critical Review arXiv.cs.DC Pub Date : 2024-04-05 Siamak Solat
This article examines the significant challenges encountered in implementing sharding within distributed replication systems. It identifies the impediments to achieving consensus among large participant sets, leading to scalability, throughput, and performance limitations. These issues primarily arise due to the message complexity inherent in consensus mechanisms. In response, we investigate the potential
-
Achieving High-Performance Fault-Tolerant Routing in HyperX Interconnection Networks arXiv.cs.DC Pub Date : 2024-04-05 Cristóbal Camarero, Alejandro Cano, Carmen Martínez, Ramón Beivide
Interconnection networks are key actors that condition the performance of current large datacenter and supercomputer systems. Both topology and routing are critical aspects that must be carefully considered for a competitive system network design. Moreover, when daily failures are expected, this tandem should exhibit resilience and robustness. Low-diameter networks, including HyperX, are cheaper than
-
Iniva: Inclusive and Incentive-compatible Vote Aggregation arXiv.cs.DC Pub Date : 2024-04-07 Arian Baloochestani, Hanish Gogada, Leander Jehl, Hein Meling
Many blockchain platforms use committee-based consensus for scalability, finality, and security. In this consensus scheme, a committee decides which blocks get appended to the chain, typically through several voting phases. Platforms typically leverage the committee members' recorded votes to reward, punish, or detect failures. A common approach is to let the block proposer decide which votes to include
-
RACS and SADL: Towards Robust SMR in the Wide-Area Network arXiv.cs.DC Pub Date : 2024-04-05 Pasindu Tennage, Antoine Desjardins, Lefteris Kokoris-Kogias
Consensus algorithms deployed in the crash fault tolerant setting choose a leader-based architecture in order to achieve the lowest latency possible. However, when deployed in the wide area they face two key robustness challenges. First, they lose liveness when the network is unreliable because they rely on timeouts to find a leader. Second, they cannot have a high replication factor because of the
-
VELLET: Verifiable Embedded Wallet for Securing Authenticity and Integrity arXiv.cs.DC Pub Date : 2024-04-05 Hiroki Watanabe, Kohei Ichihara, Takumi Aita
The blockchain ecosystem, particularly with the rise of Web3 and Non-Fungible Tokens (NFTs), has experienced a significant increase in users and applications. However, this expansion is challenged by the need to connect early adopters with a wider user base. A notable difficulty in this process is the complex interfaces of blockchain wallets, which can be daunting for those familiar with traditional
-
SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models arXiv.cs.DC Pub Date : 2024-04-04 Aditya Shankar, Hans Brouwer, Rihan Hai, Lydia Chen
Synthetic tabular data is crucial for sharing and augmenting data across silos, especially for enterprises with proprietary data. However, existing synthesizers are designed for centrally stored data. Hence, they struggle with real-world scenarios where features are distributed across multiple silos, necessitating on-premise data storage. We introduce SiloFuse, a novel generative framework for high-quality
-
Wilkins: HPC In Situ Workflows Made Easy arXiv.cs.DC Pub Date : 2024-04-04 Orcun Yildiz, Dmitriy Morozov, Arnur Nigmetov, Bogdan Nicolae, Tom Peterka
In situ approaches can accelerate the pace of scientific discoveries by allowing scientists to perform data analysis at simulation time. Current in situ workflow systems, however, face challenges in handling the growing complexity and diverse computational requirements of scientific tasks. In this work, we present Wilkins, an in situ workflow system that is designed for ease-of-use while providing
-
Use Cases for High Performance Research Desktops arXiv.cs.DC Pub Date : 2024-04-04 Robert Henschel, Jonas Lindemann, Anders Follin, Bernd Dammann, Cicada Dennis, Abhinav Thota
High Performance Research Desktops are used by HPC centers and research computing organizations to lower the barrier of entry to HPC systems. These Linux desktops are deployed alongside HPC systems, leveraging the investments in HPC compute and storage infrastructure. By serving as a gateway to HPC systems they provide users with an environment to perform setup and infrastructure tasks related to the
-
Mining Area Skyline Objects from Map-based Big Data using Apache Spark Framework arXiv.cs.DC Pub Date : 2024-04-04 Chen Li, Ye Zhu, Yang Cao, Jinli Zhang, Annisa Annisa, Debo Cheng, Yasuhiko Morimoto
The computation of the skyline provides a mechanism for utilizing multiple location-based criteria to identify optimal data points. However, the efficiency of these computations diminishes and becomes more challenging as the input data expands. This study presents a novel algorithm aimed at mitigating this challenge by harnessing the capabilities of Apache Spark, a distributed processing platform,
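The dominance test behind the skyline is simple to state; what the paper distributes over Spark partitions is its quadratic cost. A sequential sketch (smaller-is-better assumed on every criterion; hypothetical hotel data, e.g. (distance, price)):

```python
def skyline(points):
    """Skyline (Pareto-optimal set): keep a point iff no other point is
    at least as good on every criterion and different on at least one."""
    def dominates(p, q):
        # p dominates q: no worse anywhere, and not identical to q
        return all(a <= b for a, b in zip(p, q)) and p != q
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# (distance_km, price): (2.5, 60) is beaten by (2.0, 50) on both criteria.
hotels = [(1.0, 90), (2.0, 50), (3.0, 40), (2.5, 60), (1.5, 100)]
print(skyline(hotels))  # [(1.0, 90), (2.0, 50), (3.0, 40)]
```

A distributed version computes local skylines per partition and merges them, since a point dominated locally can never appear in the global skyline.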
-
INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems arXiv.cs.DC Pub Date : 2024-04-04 Yiqing Wang, Xiaoyan Liu, Hailong Yang, Xinyu Yang, Pengbo Wang, Yi Liu, Zhongzhi Luan, Depei Qian
As modern HPC computing platforms become increasingly heterogeneous, it is challenging for programmers to fully leverage the computation power of massive parallelism offered by such heterogeneity. Consequently, task-based runtime systems have been proposed as an intermediate layer to hide the complex heterogeneity from the application programmers. The core functionality of these systems is to realize
-
Groundhog: Linearly-Scalable Smart Contracting via Commutative Transaction Semantics arXiv.cs.DC Pub Date : 2024-04-04 Geoffrey Ramseyer, David Mazières
Groundhog is a novel design for a smart contract execution engine based around concurrent execution of blocks of transactions. Unlike prior work, transactions within a block in Groundhog are not ordered relative to one another. Instead, our key design insights are first, to design a set of commutative semantics that lets the Groundhog runtime deterministically resolve concurrent accesses to shared
-
Reducing the Impact of I/O Contention in Numerical Weather Prediction Workflows at Scale Using DAOS arXiv.cs.DC Pub Date : 2024-04-03 Nicolau Manubens, Simon D. Smart, Emanuele Danovaro, Tiago Quintino, Adrian Jackson
Operational Numerical Weather Prediction (NWP) workflows are highly data-intensive. Data volumes have increased by many orders of magnitude over the last 40 years, and are expected to continue to do so, especially given the upcoming adoption of Machine Learning in forecast processes. Parallel POSIX-compliant file systems have been the dominant paradigm in data storage and exchange in HPC workflows
-
vPALs: Towards Verified Performance-aware Learning System For Resource Management arXiv.cs.DC Pub Date : 2024-04-03 Guoliang He, Gingfung Yeung, Sheriffo Ceesay, Adam Barker
Accurately predicting task performance at runtime in a cluster is advantageous for a resource management system to determine whether a task should be migrated due to performance degradation caused by interference. This is beneficial for both cluster operators and service owners. However, deploying performance prediction systems with learning methods requires sophisticated safeguard mechanisms due to
-
A Survey on Error-Bounded Lossy Compression for Scientific Datasets arXiv.cs.DC Pub Date : 2024-04-03 Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello
Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each
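The common core of such compressors is a quantizer whose bin width is derived from the user's error bound; real codecs (SZ and others) add prediction and entropy coding on top. A sketch of just the bound-preserving step:

```python
import numpy as np

def quantize(data, abs_err):
    """Uniform scalar quantization with a guaranteed absolute error
    bound: bins of width 2*abs_err mean the reconstruction is always
    within abs_err of the original value."""
    return np.round(data / (2 * abs_err)).astype(np.int64)

def dequantize(codes, abs_err):
    return codes * (2 * abs_err)

# Smooth synthetic signal (illustrative data, not a scientific dataset).
data = np.sin(np.linspace(0, 10, 1000))
eb = 1e-2
recon = dequantize(quantize(data, eb), eb)
print(bool(np.max(np.abs(recon - data)) <= eb))  # True: bound holds
```

The integer codes are what subsequently compress well: for smooth data, prediction makes most of them small or zero before the entropy-coding stage.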
-
History Trees and Their Applications arXiv.cs.DC Pub Date : 2024-04-03 Giovanni Viglietta
In the theoretical study of distributed communication networks, "history trees" are a discrete structure that naturally models the concept that anonymous agents become distinguishable upon receiving different sets of messages from neighboring agents. By conveniently organizing temporal information in a systematic manner, history trees have been instrumental in the development of optimal deterministic
-
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms arXiv.cs.DC Pub Date : 2024-04-03 Jiaang Duan, Shiyou Qian, Dingyu Yang, Hanwen Hu, Jian Cao, Guangtao Xue
With its elastic power and a pay-as-you-go cost model, the deployment of deep learning inference services (DLISs) on serverless platforms is emerging as a prevalent trend. However, the varying resource requirements of different layers in DL models hinder resource utilization and increase costs, when DLISs are deployed as a single function on serverless platforms. To tackle this problem, we propose
-
Vocabulary Attack to Hijack Large Language Model Applications arXiv.cs.DC Pub Date : 2024-04-03 Patrick Levi, Christoph P. Neumann
The fast advancements in Large Language Models (LLMs) are driving an increasing number of applications. Together with the growing number of users, we also see an increasing number of attackers who try to outsmart these systems. They want the model to reveal confidential information, specific false information, or offensive behavior. To this end, they manipulate their instructions for the LLM by inserting
-
MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving arXiv.cs.DC Pub Date : 2024-04-02 Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, Hao Zhang
Large language models (LLMs) have demonstrated remarkable performance, and organizations are racing to serve LLMs of varying sizes as endpoints for use-cases like chat, programming and search. However, efficiently serving multiple LLMs poses significant challenges for existing approaches due to varying popularity of LLMs. In the paper, we present MuxServe, a flexible spatial-temporal multiplexing system
-
Haina Storage: A Decentralized Secure Storage Framework Based on Improved Blockchain Structure arXiv.cs.DC Pub Date : 2024-04-02 Zijian Zhou, Caimei Wang, Xiaoheng Deng, Jianhao Lu, Qilue Wen, Chen Zhang, Hong Li
Although decentralized storage technology based on the blockchain can effectively realize secure data storage on cloud services, there are still some problems in the existing schemes, such as low storage capacity and low efficiency. To address these issues, we propose a novel decentralized storage framework, which mainly includes four aspects: (1) we propose a Bi-direction Circular
-
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling arXiv.cs.DC Pub Date : 2024-03-31 Kamran Razavi, Saeid Ghafouri, Max Mühlhäuser, Pooyan Jamshidi, Lin Wang
Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service Level Objectives (SLOs) at the request level. This paper presents Sponge, a novel deep learning inference serving system that maximizes resource efficiency
-
Experiences, Results and Reflections from Managing Real-World Experiments with FANETs and UAVs -- Extended Version arXiv.cs.DC Pub Date : 2024-03-29 Bruno José Olivieri de Souza, Markus Endler
In the research on FANETs (Flying Ad-Hoc Networks) and distributed coordination of UAVs (Unmanned Aerial Vehicles), also known as drones, there are many studies that validate their proposals through simulations. Simulations are important, but beyond them, there is also a need for real-world tests to validate the proposals and enhance results. However, field experiments involving drones and FANETs are