Not All Federated Learning Algorithms Are Created Equal: A Performance Evaluation Study
arXiv - CS - Machine Learning Pub Date: 2024-03-26, DOI: arxiv-2403.17287
Gustav A. Baumgart, Jaemin Shin, Ali Payani, Myungjin Lee, Ramana Rao Kompella

Federated Learning (FL) has emerged as a practical approach to training a model from decentralized data. The proliferation of FL has led to the development of numerous FL algorithms and mechanisms. Much prior work has focused primarily on the accuracy of these approaches, while other aspects, such as computational overhead, performance, and training stability, remain poorly understood. To bridge this gap, we conduct an extensive performance evaluation of several canonical FL algorithms (FedAvg, FedProx, FedYogi, FedAdam, SCAFFOLD, and FedDyn) using Flame, an open-source federated learning framework. Our comprehensive measurement study reveals that no single algorithm works best across all performance metrics. A few key observations: (1) While some state-of-the-art algorithms achieve higher accuracy than others, they incur either higher computation overhead (FedDyn) or higher communication overhead (SCAFFOLD). (2) Recent algorithms exhibit a smaller standard deviation in accuracy across clients than FedAvg, indicating that these advanced algorithms deliver more stable performance. (3) However, algorithms such as FedDyn and SCAFFOLD are more prone to catastrophic failures without the support of additional techniques such as gradient clipping. We hope that our empirical study helps the community build best practices for evaluating FL algorithms.
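
Since the abstract names FedAvg as the baseline aggregation rule and gradient clipping as the stabilizing technique behind observation (3), a minimal sketch of both may help readers unfamiliar with them. This is an illustrative assumption, not the paper's implementation: the function names, the weighting by local dataset size, and the clipping threshold are all hypothetical; the paper's actual experiments are run in the Flame framework.

import numpy as np

def clip_update(update: np.ndarray, max_norm: float) -> np.ndarray:
    """Scale a client update down so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(update)
    return update if norm <= max_norm else update * (max_norm / norm)

def aggregate(global_weights: np.ndarray,
              client_updates: list[np.ndarray],
              client_sizes: list[int],
              max_norm: float | None = None) -> np.ndarray:
    """FedAvg-style server step: average client updates weighted by
    local dataset size, optionally clipping each update first."""
    if max_norm is not None:
        client_updates = [clip_update(u, max_norm) for u in client_updates]
    total = sum(client_sizes)
    weighted = sum((n / total) * u
                   for u, n in zip(client_updates, client_sizes))
    return global_weights + weighted

# Toy round with three clients holding unequal amounts of data.
w = np.zeros(4)
updates = [np.array([1.0, 0.0, 0.0, 0.0]),
           np.array([0.0, 2.0, 0.0, 0.0]),
           np.array([0.0, 0.0, 50.0, 0.0])]
w = aggregate(w, updates, client_sizes=[10, 30, 5], max_norm=5.0)
print(w)  # the third client's outsized update is clipped before averaging

The toy round illustrates why clipping matters for stability: without it, a single divergent client update (here, the third client's) would dominate the weighted average and could derail the global model, which is the kind of catastrophic failure the study attributes to FedDyn and SCAFFOLD when clipping is absent.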

Updated: 2024-03-27