Sampling-Based Multi-Job Placement for Heterogeneous Deep Learning Clusters IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-17 Kaiyang Liu, Jingrong Wang, Zhiming Huang, Jianping Pan
-
On Off-chaining Smart Contract Runtime Protection: A Queuing Model Approach IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-16 Isra M. Ali, Mohamed M. Abdallah
-
HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-12 Xin Du, Minglong Wang, Zhihui Lu, Qiang Duan, Yuhao Liu, Jianfeng Feng, Huarui Wang
-
FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning with Partitioning and Parallelism of Search Space IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-10 Xiaqing Li, Qi Guo, Guangyan Zhang, Siwei Ye, Guanhua He, Yiheng Yao, Rui Zhang, Yifan Hao, Zidong Du, Weimin Zheng
-
MPMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-08 Zheng Zhang, Yaqi Xia, Hulin Wang, Donglin Yang, Chuang Hu, Xiaobo Zhou, Dazhao Cheng
-
G-Learned Index: Enabling Efficient Learned Index on GPU IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-04-02 Jiesong Liu, Feng Zhang, Lv Lu, Chang Qi, Xiaoguang Guo, Dong Deng, Guoliang Li, Huanchen Zhang, Jidong Zhai, Hechen Zhang, Yuxing Chen, Anqun Pan, Xiaoyong Du
AI and GPU technologies have been widely applied to solve Big Data problems. The total data volume worldwide reached 200 zettabytes in 2022, and efficiently indexing the required content among such massive data has become a serious challenge. Recently, a promising learned index has been proposed to address this challenge: it achieves extremely high efficiency while retaining marginal space overhead. However, we notice that
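As context for the entry above: a learned index replaces tree traversal with a model that predicts a key's position in a sorted array, then corrects the prediction within a recorded maximum error. A minimal single-model sketch, assuming a least-squares linear fit (the class and method names are illustrative; this is not the paper's GPU design):

```python
import bisect

class LearnedIndex:
    """Sketch of a learned index: fit position ~ a*key + b over a sorted
    array, record the worst-case prediction error, and answer lookups by
    binary-searching only the window [pred - err, pred + err]."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.keys) or 1.0
        self.a = sum((x - mean_x) * (i - mean_y)
                     for i, x in enumerate(self.keys)) / var
        self.b = mean_y - self.a * mean_x
        # Maximum absolute error of the model over all stored keys.
        self.err = max(abs(i - self._predict(k))
                       for i, k in enumerate(self.keys))

    def _predict(self, key):
        return self.a * key + self.b

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, int(p - self.err))
        hi = min(len(self.keys), int(p + self.err) + 2)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The marginal space overhead the abstract mentions is visible here: the "index" is just two floats and an error bound on top of the sorted data.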
-
Parallel Computation of Dominance Scores for Multidimensional Datasets on GPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-27 Wei-Mei Chen, Hsin-Hung Tsai, Joon Fong Ling
The dominance scoring problem in a multidimensional dataset is to return the number of points dominated by a given point, which is a common metric for evaluating the quality of a data point. Dominance scoring is an elementary operator for variations of the skyline operator, including top-$k$ dominating and $k$-skyband queries. This study proposes query processing for dominance scores that operates
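The dominance-score definition stated above is easy to pin down in code. A minimal sequential sketch, assuming a smaller-is-better convention in every dimension (the paper's contribution is a parallel GPU formulation, which this does not attempt):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller-is-better convention assumed)."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def dominance_score(point, dataset):
    """Number of points in the dataset dominated by `point`."""
    return sum(dominates(point, q) for q in dataset)

pts = [(1, 1), (2, 3), (3, 2), (4, 4)]
dominance_score((1, 1), pts)  # → 3: (1, 1) dominates the other three
```

The naive version is O(n) per query point, which is why batched, GPU-parallel evaluation matters for skyline-style workloads.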
-
HybridChain: Fast, Accurate, and Secure Transaction Processing With Distributed Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-26 Amirhossein Taherpour, Xiaodong Wang
In order to fully unlock the transformative power of distributed ledgers and blockchains, it is crucial to develop innovative consensus algorithms that can overcome the obstacles of security, scalability, and interoperability, which currently hinder their widespread adoption. This paper introduces HybridChain that combines the advantages of sharded blockchain and DAG distributed ledger, and a consensus
-
AtRec: Accelerating Recommendation Model Training on CPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-25 Siqi Wang, Tianyu Feng, Hailong Yang, Xin You, Bangduo Chen, Tongxuan Liu, Zhongzhi Luan, Depei Qian
The popularity of recommendation models and the enhanced AI processing capability of CPUs have provided massive performance opportunities to deliver satisfactory experiences to a large number of users. Unfortunately, existing recommendation model training methods fail to achieve high efficiency due to unique challenges such as dynamic shape and high parallelism. To address the above limitations, we
-
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-21 Jiamin Fan, Kui Wu, Guoming Tang, Yang Zhou, Shengqiang Huang
Clustered federated learning (CFL) is a promising solution to address the non-IID problem in the spatial domain for federated learning (FL). However, existing CFL solutions overlook the non-IID issue in the temporal domain and lack consideration of time efficiency. In this work, we propose a novel approach, called ClusterFLADS, which takes advantage of the false predictions of the inappropriate global
-
Taking RNA-RNA Interaction to Machine Peak IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-21 Chiranjeb Mondal, Sanjay Rajopadhye
RNA-RNA interactions (RRIs) are essential in many biological processes, including gene transcription, translation, and localization. They play a critical role in diseases such as cancer and Alzheimer’s. Algorithms to model RRI typically use dynamic programming and have the complexity $\Theta (N^{3} \, M^{3})$ in time and $\Theta (N^{2} \, M^{2})$ in space where $N$ and $M$ are the lengths of the two
-
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-20 Xinyi Ji, Jiankuo Dong, Tonggui Deng, Pinchang Zhang, Jiafeng Hua, Fu Xiao
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses hard computational problems on lattices to build secure encryption and decryption systems that resist attacks from quantum computing
-
Resource Aware Clustering for Tackling the Heterogeneity of Participants in Federated Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-20 Rahul Mishra, Hari Prabhat Gupta, Garvit Banga, Sajal K. Das
-
PROV-IO$^+$: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Runzhou Han, Mai Zheng, Suren Byna, Houjun Tang, Bin Dong, Dong Dai, Yong Chen, Dongkyun Kim, Joseph Hassoun, David Thorsley
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative
-
DMA-Assisted I/O for Persistent Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-14 Dingding Li, Weijie Zhang, Mianxiong Dong, Kaoru Ota
Modern local persistent memory (PM) file systems often rely on CPU-based memory copying for data transfer between DRAM and PM, resulting in significant CPU resource consumption. While some nascent systems explore DMA (direct memory access) as an alternative for improved efficiency, the intricacies and trade-offs remain obscure. This paper investigates the feasibility of DMA for PM I/O and argues that
-
Analytical Modeling and Throughput Computation of Blockchain Sharding IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-12 Pourya Soltani, Farid Ashtiani
-
Revisiting PM-Based B$^{+}$-Tree With Persistent CPU Cache IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Bowen Zhang, Shengan Zheng, Liangxu Nie, Zhenlin Qi, Hongyi Chen, Linpeng Huang, Hong Mei
Persistent memory (PM) promises near-DRAM performance as well as data persistence. Recently, a new feature called eADR is available for PM-equipped platforms to guarantee the persistence of CPU cache. The emergence of eADR presents unique opportunities to build lock-free data structures and unleash the full potential of PM. In this paper, we propose NBTree, a lock-free PM-friendly B$^+$-Tree, to
-
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-05 Fan Yuan, Xiaojian Yang, Shengguo Li, Dezun Dong, Chun Huang, Zheng Wang
Multigrid preconditioned conjugate gradient (MGPCG) is commonly used in high-performance computing (HPC) workloads. However, MGPCG is notoriously challenging to optimize since most of its computation kernels are memory-bounded with low arithmetic intensity and non-trivial communication patterns among parallel processes. This article presents new techniques to improve the data locality and reduce the
-
FHVAC: Feature-Level Hybrid Video Adaptive Configuration for Machine-Centric Live Streaming IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Yuanhong Zhang, Weizhan Zhang, Haipeng Du, Caixia Yan, Li Liu, Qinghua Zheng
With the widespread deployment of edge computing, the focus has shifted to machine-centric live video streaming, where endpoint-collected videos are transmitted over networks to edge servers for analysis. Unlike maximizing user's Quality of Experience (QoE), machine-centric video streaming optimizes the machine's Quality of Inference (QoI) by balancing the inference accuracy, inference delay, and transmission
-
Critique of “Productivity, Portability, Performance: Data-Centric Python” by SCC Team From Sun Yat-sen University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-03-04 Han Huang, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen, Dan Huang
-
HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-28 Yi-Chien Lin, Bingyi Zhang, Viktor K. Prasanna
As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGA for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this
-
Agile Cache Replacement in Edge Computing via Offline-Online Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-22 Zhe Wang, Jia Hu, Geyong Min, Zhiwei Zhao, Zi Wang
One fundamental problem of content caching in edge computing is how to replace contents in edge servers with limited capacities to meet the dynamic requirements of users without knowing their preferences in advance. Recently, online deep reinforcement learning (DRL)-based caching methods have been developed to address this problem by learning an edge cache replacement policy using samples collected
-
Byzantine-Tolerant Causal Ordering for Unicasts, Multicasts, and Broadcasts IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Anshuman Misra, Ajay D. Kshemkalyani
Byzantine fault-tolerant causal ordering of messages is useful to many applications. Causal ordering requires a property that we term strong safety, and liveness. In this paper, we use execution histories to prove that it is impossible to solve causal ordering – strong safety and liveness – in a deterministic manner for unicasts, multicasts, and broadcasts in an asynchronous system with one or more
-
Analysis and Reproducibility of “Productivity, Portability, Performance: Data-Centric Python” IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-21 Christopher Lompa, Piotr Luczynski
-
High Throughput Lattice-Based Signatures on GPUs: Comparing Falcon and Mitaka IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of
-
INT-Label: Lightweight In-Band Network-Wide Telemetry via Distributed Labeling IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Enge Song, Tian Pan, Haoyu Song, Qiang Fu, Yingjiang Liu, Chenhao Jia, Chuanying Yuan, Minglan Gao, Jiao Zhang, Tao Huang, Yunjie Liu
In-band Network Telemetry (INT) enables hop-by-hop device-internal state exposure for maintaining and troubleshooting data center networks. To achieve network-wide telemetry coverage, orchestration on top of the INT primitive is required. A straightforward solution would flood the network with INT probe packets for maximum measurement coverage, which leads to a huge bandwidth overhead. A refined solution
-
End-to-End Bayesian Networks Exact Learning in Shared Memory IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-20 Subhadeep Karan, Zainul Abideen Sayed, Jaroslaw Zola
Bayesian networks are important Machine Learning models with many practical applications in, e.g., biomedicine and bioinformatics. The problem of Bayesian networks learning is $\mathcal{NP}$-hard and computationally challenging. In this article, we propose practical parallel exact algorithms to learn Bayesian networks from data. Our approach uses shared-memory task parallelism to realize exploration
-
GeoScale: Microservice Autoscaling With Cost Budget in Geo-Distributed Edge Clouds IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-19 Ke Cheng, Sheng Zhang, Meizhao Liu, Yingcheng Gu, Liu Wei, Huanyu Cheng, Kai Liu, Yu Song, Xiaohang Shi, Andong Zhu, Lei Tang
Deploying microservice instances in geo-distributed edge clouds which are located at the network edge and in proximity to end-users can provide on-site processing, thereby improving the quality of service (QoS). To accommodate the time-varying request arrival rate of each edge cloud, the deployment scheme of microservice instances is dynamically adapted, which is called microservice autoscaling. However
-
Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Burak Aksar, Efe Sencan, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim Brandt, Brian Kulis, Manuel Egele, Ayse K. Coskun
With the increasing scale and complexity of High-Performance Computing (HPC) systems, performance variations in applications caused by anomalies have become significant bottlenecks in system health and operational efficiency. As we move towards exascale systems, these variations become more prominent due to the increased sharing of resources. Such variations lead to lower energy efficiency and higher
-
Joint Optimization of Parallelism and Resource Configuration for Serverless Function Steps IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-13 Zhaojie Wen, Qiong Chen, Yipei Niu, Zhen Song, Quanfeng Deng, Fangming Liu
Function-as-a-Service (FaaS) offers a fine-grained resource provision model, enabling developers to build highly elastic cloud applications. User requests are handled by a series of serverless functions step by step, which forms a multi-step workflow. The developers are required to set proper configurations for functions to meet service level objectives (SLOs) and save costs. However, developing the
-
X-Shard: Optimistic Cross-Shard Transaction Processing for Sharding-Based Blockchains IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-02-01 Jie Xu, Yulong Ming, Zihan Wu, Cong Wang, Xiaohua Jia
Recent advances in cryptocurrencies have sparked significant interest in blockchain technology. However, scalability issues remain a major challenge for wide adoption of blockchains. Sharding is a promising approach to scale blockchains, but existing sharding-based blockchains fail to achieve expected performance gains due to limitations in cross-shard transaction processing. In this paper, we propose
-
An Offline-Transfer-Online Framework for Cloud-Edge Collaborative Distributed Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Tianyu Zeng, Xiaoxi Zhang, Jingpu Duan, Chao Yu, Chuan Wu, Xu Chen
Recent advances in deep reinforcement learning (DRL) have made it possible to train various powerful agents to perform complex tasks in real-time environments. With the next-generation communication technologies, making cloud-edge collaborative artificial intelligence service with evolved DRL agents can be a significant scenario. However, agents with different algorithms and architectures in the same
-
Multi-Agent Deep Reinforcement Learning Framework for Renewable Energy-Aware Workflow Scheduling on Distributed Cloud Data Centers IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-31 Amanda Jayanetti, Saman Halgamuge, Rajkumar Buyya
The ever-increasing demand for the cloud computing paradigm has resulted in the widespread deployment of multiple datacenters, the operations of which consume very high levels of energy. The carbon footprint resulting from these operations threatens environmental sustainability while the increased energy costs have a direct impact on the profitability of cloud providers. Using renewable energy sources
-
CloudSimPer: Simulating Geo-Distributed Datacenters Powered by Renewable Energy Mix IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jie Song, Peimeng Zhu, Yanfeng Zhang, Ge Yu
Nowadays, studies on energy-efficient datacenters, especially DataCenters powered by a Renewable Energy mix (DCRE), have gained great attention. DCREs are large-scale, geo-distributed, and equipped with on-site renewable energy generators. Given these features, it is expensive to perform empirical evaluations of proposed algorithms and solutions on real-world DCREs, while the state-of-the-art datacenter
-
EvoGWP: Predicting Long-Term Changes in Cloud Workloads Using Deep Graph-Evolution Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Jialun Li, Jieqian Yao, Danyang Xiao, Diying Yang, Weigang Wu
Workload prediction plays a crucial role in resource management of large-scale cloud datacenters. Although quite a number of methods/algorithms have been proposed, long-term changes have not been explicitly identified and considered. Due to shifting user demands, workload relocations, or other reasons, the “resource usage pattern” of a workload, which is usually quite stable in a short-term view, may
-
Synergistically Rebalancing the EDP of Container-Based Parallel Applications IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-23 Vinicius S. da Silva, Everton C. de Lima, Janaína Schwarzrock, Fábio D. Rossi, Marcelo C. Luizelli, Antonio Carlos S. Beck, Arthur F. Lorenzon
The use of containers has become standard in cloud environments. However, many parallel applications in containers will not present gains proportional to the extra available hardware. This inefficient use of hardware naturally leads to energy consumption waste. With that in mind, we propose TT-Autoscaling. It works at two different levels: a) in the container, by automatically and transparently tuning
-
Reproducing Performance of Data-Centric Python by SCC Team From National Tsing Hua University IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-18 Fu-Chiang Chang, En-Ming Huang, Pin-Yi Kuo, Chan-Yu Mou, Hsu-Tzu Ting, Pang-Ning Wu, Jerry Chou
-
Suppressing the Interference Within a Datacenter: Theorems, Metric and Strategy IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen, Yungang Bao
Under the cloud computing paradigm, a datacenter accommodates many co-running applications sharing system resources. Although highly concurrent applications improve resource utilization, the resulting resource contention can increase the uncertainty of quality of service (QoS). Previous studies have shown that achieving high resource utilization and high QoS simultaneously is challenging. Moreover
-
Collaboration in Federated Learning With Differential Privacy: A Stackelberg Game Analysis IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-16 Guangjing Huang, Qiong Wu, Peng Sun, Qian Ma, Xu Chen
As a privacy-preserving distributed learning paradigm, federated learning (FL) enables multiple client devices to train a shared model without uploading their local data. To further enhance the privacy protection performance of FL, differential privacy (DP) has been successfully incorporated into FL systems to defend against privacy attacks from adversaries. In FL with DP, how to stimulate efficient
-
Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-10 Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream Basic Linear Algebra Subprograms (BLAS) libraries can deliver good performance on large and regular-shaped GEMMs, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in emerging HPC applications. Recent research has focused on improving GEMM performance
-
Estuary: A Low Cross-Shard Blockchain Sharding Protocol Based on State Splitting IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-09 Linpeng Jia, Yanxiu Liu, Keyuan Wang, Yi Sun
Sharding is one of the most promising technologies for significantly increasing blockchain transaction throughput. However, as the number of shards increases, the ratio of cross-shard transactions in existing blockchain sharding protocols gradually approaches 100%. Since cross-shard transactions consume many times more resources than intra-shard transactions, the processing overhead of cross-shard
-
EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-04 Di Wu, Rehmat Ullah, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
Efficiently running federated learning (FL) on resource-constrained devices is challenging since they are required to train computationally intensive deep neural networks (DNN) independently. DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training where the layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant
-
Real-Time Offloading for Dependent and Parallel Tasks in Cloud-Edge Environments Using Deep Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2024-01-03 Xing Chen, Shengxi Hu, Chujia Yu, Zheyi Chen, Geyong Min
As an effective technique to relieve resource constraints on mobile devices (MDs), computation offloading utilizes powerful cloud and edge resources to process the computation-intensive tasks of mobile applications uploaded from MDs. In cloud-edge computing, the resources (e.g., cloud and edge servers) that can be accessed by mobile applications may change dynamically. Meanwhile
-
A Memory-Efficient Hybrid Parallel Framework for Deep Neural Network Training IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-15 Dongsheng Li, Shengwei Li, Zhiquan Lai, Yongquan Fu, Xiangyu Ye, Lei Cai, Linbo Qiao
With the increasing volumes of data samples and deep neural network (DNN) models, efficiently scaling the training of DNN models has become a significant challenge for server clusters with AI accelerators in terms of memory and computing efficiency. Existing parallelism schemes can be broadly classified into three categories: data parallelism (splitting data samples), model parallelism (splitting model
-
TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-12-06 Daoce Wang, Jesus Pulido, Pascal Grosset, Sian Jin, Jiannan Tian, Kai Zhao, James Ahrens, Dingwen Tao
Today's scientific simulations require significant data volume reduction because of the enormous amounts of data produced and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation
-
Batch Jobs Load Balancing Scheduling in Cloud Computing Using Distributional Reinforcement Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Tiangang Li, Shi Ying, Yishi Zhao, Jianga Shang
In cloud computing, reasonably allocating computing resources for batch jobs to ensure the load balance of dynamic clusters and meet user requests is an important and challenging task. Most existing studies are based on deep Q-networks (DQN), which utilize neural networks to estimate the expected value of cumulative return in the scheduling process. Value-based DQN algorithms ignore the complete
-
PaVM: A Parallel Virtual Machine for Smart Contract Execution and Validation IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-20 Yaozheng Fang, Zhiyuan Zhou, Surong Dai, Jinni Yang, Hui Zhang, Ye Lu
The performance bottleneck of blockchain has shifted from consensus to serial smart contract execution in transaction validation. Previous works predominantly focus on inter-contract parallel execution, but they fail to address the inherent limitations of each smart contract execution performance. In this paper, we propose PaVM, the first smart contract virtual machine that supports both inter-contract
-
Enabling Efficient Erasure Coding in Disaggregated Memory Systems IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-15 Qiliang Li, Liangliang Xu, Yongkun Li, Min Lyu, Wei Wang, Pengfei Zuo, Yinlong Xu
Disaggregated memory (DM) separates compute and memory resources to build a huge memory pool. Erasure coding (EC) is expected to provide fault tolerance in DM with low memory cost. In DM with EC, objects are first coded in compute servers, then directly written to memory servers via high-speed networks like one-sided RDMA. However, as the one-sided RDMA latency goes down to the microsecond level, coding
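For context on the erasure-coding entry above: the simplest erasure code is a single XOR parity block (RAID-5 style), which already shows the encode-then-recover pattern that coding in compute servers follows. A minimal sketch (illustrative only; the helper names are assumptions, and real DM systems use Reed-Solomon-style codes tolerating multiple failures):

```python
def encode_parity(blocks):
    """Append one XOR parity block to k equal-sized data blocks."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return list(blocks) + [parity]

def recover(stripe, lost):
    """Rebuild the single lost block by XOR-ing all surviving blocks
    (works because XOR of the full stripe, parity included, is zero)."""
    rec = bytes(len(stripe[0]))
    for i, b in enumerate(stripe):
        if i != lost:
            rec = bytes(x ^ y for x, y in zip(rec, b))
    return rec

stripe = encode_parity([b"abcd", b"wxyz"])
recover(stripe, 0)  # → b"abcd"
```

The memory saving the abstract alludes to: k data blocks plus one parity block tolerate one failure at (k+1)/k storage cost, versus 2x for replication.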
-
Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-14 Zhe Jiang, Kecheng Yang, Nathan Fisher, Nan Guan, Neil C. Audsley, Zheng Dong
Following the trend of increasing autonomy in real-time systems, multi-core System-on-Chips (SoCs) have enabled devices to better handle the large streams of data and intensive computation required by such autonomous systems. In modern multi-core SoCs, each L1 cache is designed to be tied to an individual processor, and a processor can only access its own L1 cache. This design method ensures the system's
-
Enabling Streaming Analytics in Satellite Edge Computing via Timely Evaluation of Big Data Queries IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-13 Zichuan Xu, Guangyuan Xu, Hao Wang, Weifa Liang, Qiufen Xia, Shangguang Wang
Internet-of-Things (IoT) applications from many industries, such as transportation (maritime, road, rail, air), fleet management, offshore monitoring, and farming, are located in remote areas without cellular connectivity. Such IoT applications continuously generate stream data with hidden values that need to be unveiled in real time. Streaming analytics is emerging as a popular type of Big Data analytics
-
Flexible and Efficient Memory Swapping Across Mobile Devices With LegoSwap IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-10 Changlong Li, Yu Liang, Liang Shi, Chao Wang, Chun Jason Xue, Xuehai Zhou
This article presents LegoSwap, a cross-device memory swapping mechanism for mobile devices. It exploits the unbalanced utilization of memory resources across devices. With LegoSwap, remote memory is utilized in a seamless plug-and-play manner. It achieves comparable-to-local swapping performance based on existing network infrastructure. In addition, LegoSwap frees from the effect of remote I/O disconnection
-
US-Byte: An Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-09 Yunqi Gao, Bing Hu, Mahdi Boloursaz Mashhadi, A-Long Jin, Pei Xiao, Chunming Wu
The communication bottleneck severely constrains the scalability of distributed deep learning, and efficient communication scheduling accelerates distributed DNN training by overlapping computation and communication tasks. However, existing approaches based on tensor partitioning are not efficient and suffer from two challenges: 1) the fixed number of tensor blocks transferred in parallel can not necessarily
-
Demystifying the Cost of Serverless Computing: Towards a Win-Win Deal IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-07 Fangming Liu, Yipei Niu
Serverless is an emerging computing paradigm that greatly simplifies the development, deployment, and maintenance of cloud applications. However, due to potential cost issues brought by widely adopted pricing models, it is difficult to answer how to use and operate serverless computing services from the perspectives of users and providers. To demystify the cost of serverless computing, we present one
-
SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-11-06 Isam Mashhour Al Jawarneh, Paolo Bellavista, Antonio Corradi, Luca Foschini, Rebecca Montanari
The widespread adoption of the Internet of Things (IoT) has motivated the emergence of mixed workloads in smart cities, where fast-arriving geo-referenced big data streams are joined with archive tables, aiming to enrich streams with descriptive attributes that enable insightful analytics. Applications now rely on finding, in real time, which geographical regions streaming data tuples belong to.
-
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-25 Yuan Li, Ahmed Louri, Avinash Karanth
Large-scale deep neural network (DNN) accelerators are poised to facilitate the concurrent processing of diverse DNNs, imposing demanding challenges on the interconnection fabric. These challenges encompass overcoming performance degradation and energy increase associated with system scaling while also necessitating flexibility to support dynamic partitioning and adaptable organization of compute resources
-
Parallel and Distributed Bayesian Network Structure Learning IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-23 Jian Yang, Jiantong Jiang, Zeyi Wen, Ajmal Mian
Bayesian networks (BNs) are graphical models representing uncertainty in causal discovery, and have been widely used in medical diagnosis and gene analysis due to their effectiveness and good interpretability. However, mainstream BN structure learning methods are computationally expensive, as they must perform numerous conditional independence (CI) tests to decide the existence of edges. Some researchers
-
Online Learning Algorithms for Context-Aware Video Caching in D2D Edge Networks IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-20 Qiufen Xia, Zhiwei Jiao, Zichuan Xu
With the emergence of various short video platforms such as TikTok and Instagram, coupled with the accelerated pace of people's lives, people are spending more time sharing and watching online videos than ever before, and they gradually turn their attention to short videos with short duration and novel content. Browsing and watching short videos by users with their energy-capacitated devices, such
-
Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-18 Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, Depei Qian
Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed
-
FedHAP: Federated Hashing With Global Prototypes for Cross-Silo Retrieval IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-10-16 Meilin Yang, Jian Xu, Wenbo Ding, Yang Liu
Deep hashing has been widely applied in large-scale data retrieval due to its superior retrieval efficiency and low storage cost. However, data are often scattered in data silos with privacy concerns, so performing centralized data storage and retrieval is not always possible. Leveraging the approach of federated learning (FL) to perform deep hashing is a recent research trend. However, existing frameworks
-
UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization IEEE Trans. Parallel Distrib. Syst. (IF 5.3) Pub Date : 2023-09-20 Yizhi Huang, Yan Liu, Yang Bai, Si Chen, Renfa Li
Recent research has shown that collaborative computing of CPUs and GPUs in the same system can effectively accelerate large-scale SGD-based matrix factorization (MF), but it faces the problem of limited scalability due to parameter synchronization in the server. Theoretically, asynchronous methods can overcome this shortcoming. However, through a series of tests, observations, and analyses, we realize