Detecting Voice Cloning Attacks via Timbre Watermarking
arXiv - CS - Sound, Pub Date: 2023-12-06, DOI: arxiv-2312.03410
Chang Liu, Jie Zhang, Tianwei Zhang, Xi Yang, Weiming Zhang, Nenghai Yu

Releasing audio content to the public is now common. However, with the rise of voice cloning technology, attackers can easily impersonate a specific person by using that person's publicly released audio without permission. It is therefore important to detect potential misuse of released audio content and to protect its timbre from impersonation. To this end, we introduce a novel concept, "Timbre Watermarking", which embeds watermark information into the target individual's speech, ultimately defeating voice cloning attacks. To ensure the watermark is robust to the voice cloning model's learning process, we design an end-to-end voice cloning-resistant detection framework. The core idea of our solution is to embed and extract the watermark in the frequency domain in a temporally invariant manner. To generalize across different voice cloning attacks, we modulate their shared process and integrate it into our framework as a distortion layer. Experiments demonstrate that the proposed timbre watermarking defends against different voice cloning attacks, strongly resists various adaptive attacks (e.g., reconstruction-based removal attacks and watermark overwriting attacks), and is practical in real-world services such as PaddleSpeech, Voice-Cloning-App, and so-vits-svc. Ablation studies are also conducted to verify the effectiveness of our design. Some audio samples are available at https://timbrewatermarking.github.io/samples.
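
To make the phrase "embed and extract the watermark in the frequency domain in a temporally invariant manner" concrete, here is a minimal toy sketch in NumPy. It is not the authors' framework, which is an end-to-end trained network with a distortion layer; it only illustrates the general idea of applying the same frequency-domain modification to every time frame, so that any frame-aligned excerpt carries the full message. Everything in it is an assumption made for illustration: the names `embed`, `extract`, `FRAME`, `CHIPS_PER_BIT`, and `ALPHA`, the spread-spectrum scheme, the non-overlapping framing, and the white-noise stand-in for speech (real speech has a non-flat spectrum that this toy decoder does not compensate for).

```python
import numpy as np

FRAME = 512          # samples per non-overlapping frame (assumed value)
CHIPS_PER_BIT = 16   # number of frequency bins carrying each bit (assumed value)
ALPHA = 0.2          # relative embedding strength (assumed value)


def _carrier(n_bits, key):
    """Key-dependent +/-1 chip for every frequency bin that carries the message."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=n_bits * CHIPS_PER_BIT)


def _frames(signal):
    """Cut the signal into whole, non-overlapping frames (trailing samples dropped)."""
    n_frames = len(signal) // FRAME
    return np.asarray(signal)[: n_frames * FRAME].reshape(n_frames, FRAME)


def embed(signal, bits, key=0):
    """Scale the used frequency bins by the same gain pattern in every frame."""
    bits = np.asarray(bits)
    chips = _carrier(len(bits), key)
    signs = np.repeat(2.0 * bits - 1.0, CHIPS_PER_BIT)  # bit 1 -> +1, bit 0 -> -1
    gain = np.ones(FRAME // 2 + 1)
    gain[1 : 1 + chips.size] += ALPHA * signs * chips   # skip the DC bin
    spec = np.fft.rfft(_frames(signal), axis=1) * gain  # identical gain per frame
    return np.fft.irfft(spec, n=FRAME, axis=1).reshape(-1)


def extract(signal, n_bits, key=0):
    """Average the log-magnitude spectrum over time, then correlate with the chips."""
    chips = _carrier(n_bits, key)
    mag = np.abs(np.fft.rfft(_frames(signal), axis=1))[:, 1 : 1 + chips.size]
    profile = np.log(mag + 1e-12).mean(axis=0)          # time axis is averaged away
    profile -= profile.mean()
    scores = (profile * chips).reshape(n_bits, CHIPS_PER_BIT).mean(axis=1)
    return (scores > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    host = rng.standard_normal(200 * FRAME)             # white-noise stand-in for speech
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = embed(host, bits, key=42)
    print(extract(marked, len(bits), key=42))                             # full signal
    print(extract(marked[50 * FRAME : 150 * FRAME], len(bits), key=42))   # excerpt only
```

Because the decoder averages over the time axis, it does not depend on which frames it is given, which is the sense of "temporally invariant" illustrated here. Whether such a mark survives a voice cloning model's training is exactly what the paper's distortion layer and end-to-end training address, and this sketch makes no claim about that.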


