Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding
arXiv - CS - Sound, Pub Date: 2024-02-05, DOI: arxiv-2402.02889
Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergistic combination for exploiting audio data for general-purpose audio understanding without compromising user data privacy. However, few efforts have been made to investigate SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated by large-scale heterogeneous audio sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into large-scale FL settings simulated with non-independent and identically distributed (non-iid) data. We propose a novel Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning intermediate feature representations from large-scale decentralized heterogeneous clients holding unlabelled audio data. Our study finds that audio F-SSL approaches perform on par with centralized audio-SSL approaches on the audio-retrieval task. Extensive experiments demonstrate the effectiveness and significance of FASSL, as it assists in obtaining the optimal global model for state-of-the-art FL aggregation methods.
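
For readers unfamiliar with the "FL aggregation methods" the abstract refers to, the following is a minimal sketch of the standard FedAvg rule: a server averages client parameters, weighted by each client's local sample count. It is an illustration only and not the FASSL method; the abstract does not specify FASSL's aggregation. The function name `fedavg`, the `Weights` type, and the flat-list parameter layout are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical layout: each model is a dict of parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by local sample counts (FedAvg)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    # Assumes every client reports the same parameter names and shapes.
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two illustrative clients with unequal amounts of local data.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
sizes = [1, 3]
global_weights = fedavg(clients, sizes)
```

With non-iid clients, the sample-count weighting means clients holding more data pull the global model further toward their local solution, which is one reason alternative aggregation rules exist.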

Updated: 2024-02-06