Sampling-Based Multi-Job Placement for Heterogeneous Deep Learning Clusters IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-17 Kaiyang Liu, Jingrong Wang, Zhiming Huang, Jianping Pan
-
On Off-chaining Smart Contract Runtime Protection: A Queuing Model Approach IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-16 Isra M. Ali, Mohamed M. Abdallah
-
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-12 Xin Du, Minglong Wang, Zhihui Lu, Qiang Duan, Yuhao Liu, Jianfeng Feng, Huarui Wang
-
FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning with Partitioning and Parallelism of Search Space IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-10 Xiaqing Li, Qi Guo, Guangyan Zhang, Siwei Ye, Guanhua He, Yiheng Yao, Rui Zhang, Yifan Hao, Zidong Du, Weimin Zheng
-
MPMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-08 Zheng Zhang, Yaqi Xia, Hulin Wang, Donglin Yang, Chuang Hu, Xiaobo Zhou, Dazhao Cheng
-
G-Learned Index: Enabling Efficient Learned Index on GPU IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-02 Jiesong Liu, Feng Zhang, Lv Lu, Chang Qi, Xiaoguang Guo, Dong Deng, Guoliang Li, Huanchen Zhang, Jidong Zhai, Hechen Zhang, Yuxing Chen, Anqun Pan, Xiaoyong Du
AI and GPU technologies have been widely applied to solve Big Data problems. The total data volume worldwide reached 200 zettabytes in 2022, and efficiently indexing the required content among such massive data has become a serious challenge. Recently, a promising learned index has been proposed to address this challenge: it achieves extremely high efficiency while retaining marginal space overhead. However, we notice that
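As context for the entry above: a learned index replaces tree traversal with a model that predicts a key's position in a sorted array, then corrects the prediction within a recorded maximum error. A minimal single-model sketch, assuming a least-squares linear fit (the class and method names are illustrative; this is not the paper's GPU design):

```python
import bisect

class LearnedIndex:
    """Sketch of a learned index: fit position ~ a*key + b over a sorted
    array, record the worst-case prediction error, and answer lookups by
    binary-searching only the window [pred - err, pred + err]."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.keys) or 1.0
        self.a = sum((x - mean_x) * (i - mean_y)
                     for i, x in enumerate(self.keys)) / var
        self.b = mean_y - self.a * mean_x
        # Maximum absolute error of the model over all stored keys.
        self.err = max(abs(i - self._predict(k))
                       for i, k in enumerate(self.keys))

    def _predict(self, key):
        return self.a * key + self.b

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, int(p - self.err))
        hi = min(len(self.keys), int(p + self.err) + 2)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The marginal space overhead the abstract mentions is visible here: the "index" is just two floats and an error bound on top of the sorted data.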
-
Parallel Computation of Dominance Scores for Multidimensional Datasets on GPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-27 Wei-Mei Chen, Hsin-Hung Tsai, Joon Fong Ling
The dominance scoring problem in a multidimensional dataset is to return the number of points dominated by a given point, which is a common metric for evaluating the quality of a data point. Dominance scoring is an elementary operator for variations of the skyline operator, including top-$k$ dominating and $k$-skyband queries. This study proposes query processing for dominance scores that operates
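The dominance-score definition stated above is easy to pin down in code. A minimal sequential sketch, assuming a smaller-is-better convention in every dimension (the paper's contribution is a parallel GPU formulation, which this does not attempt):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller-is-better convention assumed)."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def dominance_score(point, dataset):
    """Number of points in the dataset dominated by `point`."""
    return sum(dominates(point, q) for q in dataset)

pts = [(1, 1), (2, 3), (3, 2), (4, 4)]
dominance_score((1, 1), pts)  # → 3: (1, 1) dominates the other three
```

The naive version is O(n) per query point, which is why batched, GPU-parallel evaluation matters for skyline-style workloads.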
-
HybridChain: Fast, Accurate, and Secure Transaction Processing With Distributed Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-26 Amirhossein Taherpour, Xiaodong Wang
In order to fully unlock the transformative power of distributed ledgers and blockchains, it is crucial to develop innovative consensus algorithms that can overcome the obstacles of security, scalability, and interoperability, which currently hinder their widespread adoption. This paper introduces HybridChain that combines the advantages of sharded blockchain and DAG distributed ledger, and a consensus
-
AtRec: Accelerating Recommendation Model Training on CPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-25 Siqi Wang, Tianyu Feng, Hailong Yang, Xin You, Bangduo Chen, Tongxuan Liu, Zhongzhi Luan, Depei Qian
The popularity of recommendation models and the enhanced AI processing capability of CPUs have provided massive performance opportunities to deliver satisfactory experiences to a large number of users. Unfortunately, existing recommendation model training methods fail to achieve high efficiency due to unique challenges such as dynamic shape and high parallelism. To address the above limitations, we
-
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-21 Jiamin Fan, Kui Wu, Guoming Tang, Yang Zhou, Shengqiang Huang
Clustered federated learning (CFL) is a promising solution to address the non-IID problem in the spatial domain for federated learning (FL). However, existing CFL solutions overlook the non-IID issue in the temporal domain and lack consideration of time efficiency. In this work, we propose a novel approach, called ClusterFLADS, which takes advantage of the false predictions of the inappropriate global
-
Taking RNA-RNA Interaction to Machine Peak IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-21 Chiranjeb Mondal, Sanjay Rajopadhye
RNA-RNA interactions (RRIs) are essential in many biological processes, including gene transcription, translation, and localization. They play a critical role in diseases such as cancer and Alzheimer’s. Algorithms to model RRI typically use dynamic programming and have the complexity $\Theta (N^{3} \, M^{3})$ in time and $\Theta (N^{2} \, M^{2})$ in space where $N$ and $M$ are the lengths of the two
-
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-20 Xinyi Ji, Jiankuo Dong, Tonggui Deng, Pinchang Zhang, Jiafeng Hua, Fu Xiao
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses hard computational problems on lattices to build secure encryption and decryption systems that resist attacks from quantum computing
-
Resource Aware Clustering for Tackling the Heterogeneity of Participants in Federated Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-20 Rahul Mishra, Hari Prabhat Gupta, Garvit Banga, Sajal K. Das
-
PROV-IO$^+$: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Runzhou Han, Mai Zheng, Suren Byna, Houjun Tang, Bin Dong, Dong Dai, Yong Chen, Dongkyun Kim, Joseph Hassoun, David Thorsley
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative
-
DMA-Assisted I/O for Persistent Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Dingding Li, Weijie Zhang, Mianxiong Dong, Kaoru Ota
Modern local persistent memory (PM) file systems often rely on CPU-based memory copying for data transfer between DRAM and PM, resulting in significant CPU resource consumption. While some nascent systems explore DMA (direct memory access) as an alternative for improved efficiency, the intricacies and trade-offs remain obscure. This paper investigates the feasibility of DMA for PM I/O and argues that
-
Analytical Modeling and Throughput Computation of Blockchain Sharding IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-12 Pourya Soltani, Farid Ashtiani
-
Revisiting PM-Based B$^{+}$-Tree With Persistent CPU Cache IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Bowen Zhang, Shengan Zheng, Liangxu Nie, Zhenlin Qi, Hongyi Chen, Linpeng Huang, Hong Mei
Persistent memory (PM) promises near-DRAM performance as well as data persistence. Recently, a new feature called eADR is available for PM-equipped platforms to guarantee the persistence of CPU cache. The emergence of eADR presents unique opportunities to build lock-free data structures and unleash the full potential of PM. In this paper, we propose NBTree, a lock-free PM-friendly B$^+$-Tree, to
-
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Fan Yuan, Xiaojian Yang, Shengguo Li, Dezun Dong, Chun Huang, Zheng Wang
Multigrid preconditioned conjugate gradient (MGPCG) is commonly used in high-performance computing (HPC) workloads. However, MGPCG is notoriously challenging to optimize since most of its computation kernels are memory-bounded with low arithmetic intensity and non-trivial communication patterns among parallel processes. This article presents new techniques to improve the data locality and reduce the
-
FHVAC: Feature-Level Hybrid Video Adaptive Configuration for Machine-Centric Live Streaming IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Yuanhong Zhang, Weizhan Zhang, Haipeng Du, Caixia Yan, Li Liu, Qinghua Zheng
With the widespread deployment of edge computing, the focus has shifted to machine-centric live video streaming, where endpoint-collected videos are transmitted over networks to edge servers for analysis. Unlike maximizing user's Quality of Experience (QoE), machine-centric video streaming optimizes the machine's Quality of Inference (QoI) by balancing the inference accuracy, inference delay, and transmission
-
Critique of “Productivity, Portability, Performance: Data-Centric Python” by SCC Team From Sun Yat-sen University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Han Huang, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen, Dan Huang
-
HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-28 Yi-Chien Lin, Bingyi Zhang, Viktor K. Prasanna
As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGA for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this
-
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-22 Zhe Wang, Jia Hu, Geyong Min, Zhiwei Zhao, Zi Wang
One fundamental problem of content caching in edge computing is how to replace contents in edge servers with limited capacities to meet the dynamic requirements of users without knowing their preferences in advance. Recently, online deep reinforcement learning (DRL)-based caching methods have been developed to address this problem by learning an edge cache replacement policy using samples collected
-
Byzantine-Tolerant Causal Ordering for Unicasts, Multicasts, and Broadcasts IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Anshuman Misra, Ajay D. Kshemkalyani
Byzantine fault-tolerant causal ordering of messages is useful to many applications. Causal ordering requires a property that we term strong safety, and liveness. In this paper, we use execution histories to prove that it is impossible to solve causal ordering – strong safety and liveness – in a deterministic manner for unicasts, multicasts, and broadcasts in an asynchronous system with one or more
-
Analysis and Reproducibility of “Productivity, Portability, Performance: Data-Centric Python” IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Christopher Lompa, Piotr Luczynski
-
High Throughput Lattice-Based Signatures on GPUs: Comparing Falcon and Mitaka IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of
-
INT-Label: Lightweight In-Band Network-Wide Telemetry via Distributed Labeling IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Enge Song, Tian Pan, Haoyu Song, Qiang Fu, Yingjiang Liu, Chenhao Jia, Chuanying Yuan, Minglan Gao, Jiao Zhang, Tao Huang, Yunjie Liu
In-band Network Telemetry (INT) enables hop-by-hop device-internal state exposure for maintaining and troubleshooting data center networks. To achieve network-wide telemetry coverage, orchestration on top of the INT primitive is required. A straightforward solution would flood the network with INT probe packets for maximum measurement coverage, which leads to a huge bandwidth overhead. A refined solution
-
End-to-End Bayesian Networks Exact Learning in Shared Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Subhadeep Karan, Zainul Abideen Sayed, Jaroslaw Zola
Bayesian networks are important Machine Learning models with many practical applications in, e.g., biomedicine and bioinformatics. The problem of Bayesian networks learning is $\mathcal{NP}$-hard and computationally challenging. In this article, we propose practical parallel exact algorithms to learn Bayesian networks from data. Our approach uses shared-memory task parallelism to realize exploration
-
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-19 Ke Cheng, Sheng Zhang, Meizhao Liu, Yingcheng Gu, Liu Wei, Huanyu Cheng, Kai Liu, Yu Song, Xiaohang Shi, Andong Zhu, Lei Tang
Deploying microservice instances in geo-distributed edge clouds which are located at the network edge and in proximity to end-users can provide on-site processing, thereby improving the quality of service (QoS). To accommodate the time-varying request arrival rate of each edge cloud, the deployment scheme of microservice instances is dynamically adapted, which is called microservice autoscaling. However
-
Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Burak Aksar, Efe Sencan, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim Brandt, Brian Kulis, Manuel Egele, Ayse K. Coskun
With the increasing scale and complexity of High-Performance Computing (HPC) systems, performance variations in applications caused by anomalies have become significant bottlenecks in system health and operational efficiency. As we move towards exascale systems, these variations become more prominent due to the increased sharing of resources. Such variations lead to lower energy efficiency and higher
-
Joint Optimization of Parallelism and Resource Configuration for Serverless Function Steps IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Zhaojie Wen, Qiong Chen, Yipei Niu, Zhen Song, Quanfeng Deng, Fangming Liu
Function-as-a-Service (FaaS) offers a fine-grained resource provision model, enabling developers to build highly elastic cloud applications. User requests are handled by a series of serverless functions step by step, which forms a multi-step workflow. The developers are required to set proper configurations for functions to meet service level objectives (SLOs) and save costs. However, developing the
-
X-Shard: Optimistic Cross-Shard Transaction Processing for Sharding-Based Blockchains IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-01 Jie Xu, Yulong Ming, Zihan Wu, Cong Wang, Xiaohua Jia
Recent advances in cryptocurrencies have sparked significant interest in blockchain technology. However, scalability issues remain a major challenge for wide adoption of blockchains. Sharding is a promising approach to scale blockchains, but existing sharding-based blockchains fail to achieve expected performance gains due to limitations in cross-shard transaction processing. In this paper, we propose
-
An Offline-Transfer-Online Framework for Cloud-Edge Collaborative Distributed Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Tianyu Zeng, Xiaoxi Zhang, Jingpu Duan, Chao Yu, Chuan Wu, Xu Chen
Recent advances in deep reinforcement learning (DRL) have made it possible to train various powerful agents to perform complex tasks in real-time environments. With the next-generation communication technologies, making cloud-edge collaborative artificial intelligence service with evolved DRL agents can be a significant scenario. However, agents with different algorithms and architectures in the same
-
Multi-Agent Deep Reinforcement Learning Framework for Renewable Energy-Aware Workflow Scheduling on Distributed Cloud Data Centers IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Amanda Jayanetti, Saman Halgamuge, Rajkumar Buyya
The ever-increasing demand for the cloud computing paradigm has resulted in the widespread deployment of multiple datacenters, the operations of which consume very high levels of energy. The carbon footprint resulting from these operations threatens environmental sustainability while the increased energy costs have a direct impact on the profitability of cloud providers. Using renewable energy sources
-
CloudSimPer: Simulating Geo-Distributed Datacenters Powered by Renewable Energy Mix IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jie Song, Peimeng Zhu, Yanfeng Zhang, Ge Yu
Nowadays, studies on energy-efficient datacenters, especially DataCenters powered by a Renewable Energy mix (DCRE), have gained great attention. DCREs are large-scale, geo-distributed, and equipped with on-site renewable energy generators. Given these features, it is expensive to perform empirical evaluations of proposed algorithms and solutions on real-world DCREs, while the state-of-the-art datacenter
-
EvoGWP: Predicting Long-Term Changes in Cloud Workloads Using Deep Graph-Evolution Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jialun Li, Jieqian Yao, Danyang Xiao, Diying Yang, Weigang Wu
Workload prediction plays a crucial role in resource management of large-scale cloud datacenters. Although quite a number of methods/algorithms have been proposed, long-term changes have not been explicitly identified and considered. Due to shifting user demands, workload relocations, or other reasons, the “resource usage pattern” of a workload, which is usually quite stable in a short-term view, may
-
Synergistically Rebalancing the EDP of Container-Based Parallel Applications IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Vinicius S. da Silva, Everton C. de Lima, Janaína Schwarzrock, Fábio D. Rossi, Marcelo C. Luizelli, Antonio Carlos S. Beck, Arthur F. Lorenzon
The use of containers has become standard in cloud environments. However, many parallel applications in containers will not present gains proportional to the extra available hardware. This inefficient use of hardware naturally leads to energy consumption waste. With that in mind, we propose TT-Autoscaling. It works at two different levels: a) in the container, by automatically and transparently tuning
-
Reproducing Performance of Data-Centric Python by SCC Team From National Tsing Hua University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-18 Fu-Chiang Chang, En-Ming Huang, Pin-Yi Kuo, Chan-Yu Mou, Hsu-Tzu Ting, Pang-Ning Wu, Jerry Chou
-
Suppressing the Interference Within a Datacenter: Theorems, Metric and Strategy IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen, Yungang Bao
Under the cloud computing paradigm, a datacenter accommodates many co-running applications sharing system resources. Although highly concurrent applications improve resource utilization, the resulting resource contention can increase the uncertainty of quality of service (QoS). Previous studies have shown that achieving high resource utilization and high QoS simultaneously is challenging. Moreover
-
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Guangjing Huang, Qiong Wu, Peng Sun, Qian Ma, Xu Chen
As a privacy-preserving distributed learning paradigm, federated learning (FL) enables multiple client devices to train a shared model without uploading their local data. To further enhance the privacy protection performance of FL, differential privacy (DP) has been successfully incorporated into FL systems to defend against privacy attacks from adversaries. In FL with DP, how to stimulate efficient
-
Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-10 Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good performance on large and regular-shaped GEMMs, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in emerging HPC applications. Recent research has focused on improving GEMM performance
-
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-09 Linpeng Jia, Yanxiu Liu, Keyuan Wang, Yi Sun
Sharding is one of the most promising technologies for significantly increasing blockchain transaction throughput. However, as the number of shards increases, the ratio of cross-shard transactions in existing blockchain sharding protocols gradually approaches 100%. Since cross-shard transactions consume many times more resources than intra-shard transactions, the processing overhead of cross-shard
-
EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-04 Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant
-
Real-Time Offloading for Dependent and Parallel Tasks in Cloud-Edge Environments Using Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-03 Xing Chen, Shengxi Hu, Chujia Yu, Zheyi Chen, Geyong Min
As an effective technique to relieve resource constraints on mobile devices (MDs), computation offloading utilizes powerful cloud and edge resources to process the computation-intensive tasks of mobile applications uploaded from MDs. In cloud-edge computing, the resources (e.g., cloud and edge servers) that can be accessed by mobile applications may change dynamically. Meanwhile
-
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-15 Dongsheng Li, Shengwei Li, Zhiquan Lai, Yongquan Fu, Xiangyu Ye, Lei Cai, Linbo Qiao
With the increasing volumes of data samples and deep neural network (DNN) models, efficiently scaling the training of DNN models has become a significant challenge for server clusters with AI accelerators in terms of memory and computing efficiency. Existing parallelism schemes can be broadly classified into three categories: data parallelism (splitting data samples), model parallelism (splitting model
-
TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-06 Daoce Wang, Jesus Pulido, Pascal Grosset, Sian Jin, Jiannan Tian, Kai Zhao, James Ahrens, Dingwen Tao
Today's scientific simulations require significant data volume reduction because of the enormous amounts of data produced and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation
-
Batch Jobs Load Balancing Scheduling in Cloud Computing Using Distributional Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Tiangang Li, Shi Ying, Yishi Zhao, Jianga Shang
In cloud computing, reasonably allocating computing resources for batch jobs to ensure the load balance of dynamic clusters and meet user requests is an important and challenging task. Most existing studies are based on deep Q-networks (DQN), which utilize neural networks to estimate the expected value of cumulative return in the scheduling process. Value-based DQN algorithms ignore the complete
-
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Yaozheng Fang, Zhiyuan Zhou, Surong Dai, Jinni Yang, Hui Zhang, Ye Lu
The performance bottleneck of blockchain has shifted from consensus to serial smart contract execution in transaction validation. Previous works predominantly focus on inter-contract parallel execution, but they fail to address the inherent limitations of each smart contract execution performance. In this paper, we propose PaVM, the first smart contract virtual machine that supports both inter-contract
-
Enabling Efficient Erasure Coding in Disaggregated Memory Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-15 Qiliang Li, Liangliang Xu, Yongkun Li, Min Lyu, Wei Wang, Pengfei Zuo, Yinlong Xu
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM with low memory cost. In DM with EC, objects are first coded in compute servers, then directly written to memory servers via high-speed networks like one-sided RDMA. However, as the one-sided RDMA latency goes down to the microsecond level, coding
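For context on the erasure-coding entry above: the simplest erasure code is a single XOR parity block (RAID-5 style), which already shows the encode-then-recover pattern that coding in compute servers follows. A minimal sketch (illustrative only; the helper names are assumptions, and real DM systems use Reed-Solomon-style codes tolerating multiple failures):

```python
def encode_parity(blocks):
    """Append one XOR parity block to k equal-sized data blocks."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return list(blocks) + [parity]

def recover(stripe, lost):
    """Rebuild the single lost block by XOR-ing all surviving blocks
    (works because XOR of the full stripe, parity included, is zero)."""
    rec = bytes(len(stripe[0]))
    for i, b in enumerate(stripe):
        if i != lost:
            rec = bytes(x ^ y for x, y in zip(rec, b))
    return rec

stripe = encode_parity([b"abcd", b"wxyz"])
recover(stripe, 0)  # → b"abcd"
```

The memory saving the abstract alludes to: k data blocks plus one parity block tolerate one failure at (k+1)/k storage cost, versus 2x for replication.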
-
Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-14 Zhe Jiang, Kecheng Yang, Nathan Fisher, Nan Guan, Neil C. Audsley, Zheng Dong
Following the trend of increasing autonomy in real-time systems, multi-core System-on-Chips (SoCs) have enabled devices to better handle the large streams of data and intensive computation required by such autonomous systems. In modern multi-core SoCs, each L1 cache is designed to be tied to an individual processor, and a processor can only access its own L1 cache. This design method ensures the system's
-
Enabling Streaming Analytics in Satellite Edge Computing via Timely Evaluation of Big Data Queries IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-13 Zichuan Xu, Guangyuan Xu, Hao Wang, Weifa Liang, Qiufen Xia, Shangguang Wang
Internet-of-Things (IoT) applications from many industries, such as transportation (maritime, road, rail, air), fleet management, offshore monitoring, and farming, are located in remote areas without cellular connectivity. Such IoT applications continuously generate stream data with hidden values that need to be unveiled in real time. Streaming analytics is emerging as a popular type of Big Data analytics
-
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-10 Changlong Li, Yu Liang, Liang Shi, Chao Wang, Chun Jason Xue, Xuehai Zhou
This article presents LegoSwap, a cross-device memory swapping mechanism for mobile devices. It exploits the unbalanced utilization of memory resources across devices. With LegoSwap, remote memory is utilized in a seamless plug-and-play manner. It achieves comparable-to-local swapping performance based on existing network infrastructure. In addition, LegoSwap frees from the effect of remote I/O disconnection
-
US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-09 Yunqi Gao, Bing Hu, Mahdi Boloursaz Mashhadi, A-Long Jin, Pei Xiao, Chunming Wu
The communication bottleneck severely constrains the scalability of distributed deep learning, and efficient communication scheduling accelerates distributed DNN training by overlapping computation and communication tasks. However, existing approaches based on tensor partitioning are not efficient and suffer from two challenges: 1) the fixed number of tensor blocks transferred in parallel can not necessarily
-
Demystifying the Cost of Serverless Computing: Towards a Win-Win Deal IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-07 Fangming Liu, Yipei Niu
Serverless is an emerging computing paradigm that greatly simplifies the development, deployment, and maintenance of cloud applications. However, due to potential cost issues brought by widely adopted pricing models, it is difficult to answer how to use and operate serverless computing services from the perspectives of users and providers. To demystify the cost of serverless computing, we present one
-
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-06 Isam Mashhour Al Jawarneh, Paolo Bellavista, Antonio Corradi, Luca Foschini, Rebecca Montanari
The widespread adoption of the Internet of Things (IoT) has motivated the emergence of mixed workloads in smart cities, where fast-arriving geo-referenced big data streams are joined with archive tables, aiming to enrich streams with descriptive attributes that enable insightful analytics. Applications now rely on finding, in real time, which geographical regions streaming data tuples belong to.
-
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-25 Yuan Li, Ahmed Louri, Avinash Karanth
Large-scale deep neural network (DNN) accelerators are poised to facilitate the concurrent processing of diverse DNNs, imposing demanding challenges on the interconnection fabric. These challenges encompass overcoming performance degradation and energy increase associated with system scaling while also necessitating flexibility to support dynamic partitioning and adaptable organization of compute resources
-
Parallel and Distributed Bayesian Network Structure Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-23 Jian Yang, Jiantong Jiang, Zeyi Wen, Ajmal Mian
Bayesian networks (BNs) are graphical models representing uncertainty in causal discovery, and have been widely used in medical diagnosis and gene analysis due to their effectiveness and good interpretability. However, mainstream BN structure learning methods are computationally expensive, as they must perform numerous conditional independence (CI) tests to decide the existence of edges. Some researchers
-
Online Learning Algorithms for Context-Aware Video Caching in D2D Edge Networks IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-20 Qiufen Xia, Zhiwei Jiao, Zichuan Xu
With the emergence of various short video platforms such as TikTok and Instagram, coupled with the accelerated pace of people's lives, people are spending more time sharing and watching online videos than ever before, and they gradually turn their attention to short videos with short duration and novel content. Browsing and watching short videos by users with their energy-capacitated devices, such
-
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-18 Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, Depei Qian
Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed
-
FedHAP: Federated Hashing With Global Prototypes for Cross-Silo Retrieval IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-16 Meilin Yang, Jian Xu, Wenbo Ding, Yang Liu
Deep hashing has been widely applied in large-scale data retrieval due to its superior retrieval efficiency and low storage cost. However, data are often scattered in data silos with privacy concerns, so performing centralized data storage and retrieval is not always possible. Leveraging the approach of federated learning (FL) to perform deep hashing is a recent research trend. However, existing frameworks
-
UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-20 Yizhi Huang, Yan Liu, Yang Bai, Si Chen, Renfa Li
Recent research has shown that collaborative computing of CPUs and GPUs in the same system can effectively accelerate large-scale SGD-based matrix factorization (MF), but it faces the problem of limited scalability due to parameter synchronization in the server. Theoretically, asynchronous methods can overcome this shortcoming. However, through a series of tests, observations, and analyses, we realize