Adversarial Data Augmentation for Robust Speaker Verification
arXiv - CS - Sound. Pub Date: 2024-02-05, DOI: arxiv-2402.02699
Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.
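The paper does not publish an implementation in this abstract, but the adversarial scheme it describes (a shared encoder whose embeddings must classify speakers well while fooling an auxiliary augmentation-type classifier) is commonly realized with a gradient-reversal layer. The following is a minimal PyTorch sketch under that assumption; all module names, layer sizes, and the `lambd` trade-off weight are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass,
    gradient negated (and scaled by lambd) in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the encoder, so minimizing the
        # augmentation-classifier loss *maximizes* it w.r.t. the encoder.
        return -ctx.lambd * grad_output, None


class ADASpeakerNet(nn.Module):
    """Toy A-DA model: shared encoder, speaker head, and an adversarial
    augmentation head behind a gradient-reversal layer (all sizes hypothetical)."""

    def __init__(self, feat_dim=40, emb_dim=64, n_speakers=10, n_aug_types=4, lambd=1.0):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )
        self.speaker_head = nn.Linear(emb_dim, n_speakers)
        self.aug_head = nn.Linear(emb_dim, n_aug_types)  # augmentation classifier
        self.lambd = lambd

    def forward(self, x):
        emb = self.encoder(x)
        spk_logits = self.speaker_head(emb)
        # The augmentation classifier sees reversed gradients, pushing the
        # encoder toward augmentation-invariant speaker embeddings.
        aug_logits = self.aug_head(GradReverse.apply(emb, self.lambd))
        return emb, spk_logits, aug_logits


# One illustrative training step on random data.
model = ADASpeakerNet()
x = torch.randn(8, 40)                    # batch of (augmented) features
spk_labels = torch.randint(0, 10, (8,))   # speaker identities
aug_labels = torch.randint(0, 4, (8,))    # which augmentation was applied
_, spk_logits, aug_logits = model(x)
loss = F.cross_entropy(spk_logits, spk_labels) + F.cross_entropy(aug_logits, aug_labels)
loss.backward()
```

Because both cross-entropy terms are minimized jointly, the reversal layer is what turns the second term into an adversarial objective for the encoder, matching the "deceive the augmentation classifier" behavior described above.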

Updated: 2024-02-06