Detecting Voice Cloning Attacks via Timbre Watermarking
arXiv - CS - Sound. Pub Date: 2023-12-06. DOI: arxiv-2312.03410. Authors: Chang Liu, Jie Zhang, Tianwei Zhang, Xi Yang, Weiming Zhang, Nenghai Yu
Nowadays, it is common to release audio content to the public. However, with
the rise of voice cloning technology, attackers can easily
impersonate a specific person by using that person's publicly released audio
without permission. It is therefore important to detect any potential
misuse of the released audio content and protect its timbre from being
impersonated. To this end, we introduce a novel concept, "Timbre Watermarking",
which embeds watermark information into the target individual's speech,
eventually defeating the voice cloning attacks. To ensure the watermark is
robust to the voice cloning model's learning process, we design an end-to-end
voice cloning-resistant detection framework. The core idea of our solution is
to embed and extract the watermark in the frequency domain in a temporally
invariant manner. To generalize across different voice cloning
attacks, we modulate their shared process and integrate it into our framework
as a distortion layer. Experiments demonstrate that the proposed timbre
watermarking can defend against different voice cloning attacks, exhibit strong
resistance against various adaptive attacks (e.g., reconstruction-based removal
attacks, watermark overwriting attacks), and achieve practicality in real-world
services such as PaddleSpeech, Voice-Cloning-App, and so-vits-svc. In addition,
ablation studies are also conducted to verify the effectiveness of our design.
Some audio samples are available at
https://timbrewatermarking.github.io/samples.
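
The core idea stated above, embedding and extracting in the frequency domain in a temporally invariant manner, can be illustrated with a toy sketch. This is not the authors' framework, which is a learned end-to-end network with a distortion layer; it is a hand-crafted stand-in under stated assumptions. The frame length, bin range, embedding strength, the pairwise-bin encoding, and the names `embed` and `extract` are all hypothetical, and white noise stands in for speech. Each payload bit sets a fixed gain on a pair of neighbouring frequency bins, and the same gain pattern is applied to every frame, so the payload does not depend on where in time a segment is taken.

```python
import numpy as np

FRAME = 512      # samples per frame (hypothetical choice)
FIRST_BIN = 20   # first frequency bin carrying the payload (hypothetical)
ALPHA = 0.3      # embedding strength (hypothetical)


def _frames(x):
    """Split a 1-D signal into non-overlapping frames, dropping the remainder."""
    n = len(x) // FRAME
    return x[: n * FRAME].reshape(n, FRAME)


def embed(x, bits):
    """Apply the same per-frequency gain pattern to every frame."""
    spec = np.fft.rfft(_frames(x), axis=1)
    gain = np.ones(spec.shape[1])
    for i, b in enumerate(bits):
        lo, hi = FIRST_BIN + 2 * i, FIRST_BIN + 2 * i + 1
        sign = 1.0 if b else -1.0
        gain[lo] = 1.0 + sign * ALPHA  # bit 1 boosts the lower bin of the pair
        gain[hi] = 1.0 - sign * ALPHA  # and attenuates the upper bin
    return np.fft.irfft(spec * gain, n=FRAME, axis=1).ravel()


def extract(y, n_bits):
    """Average log-magnitudes over time, then compare each bin pair."""
    mag = np.abs(np.fft.rfft(_frames(y), axis=1)) + 1e-12
    mean_log = np.log(mag).mean(axis=0)
    idx = FIRST_BIN + 2 * np.arange(n_bits)
    return (mean_log[idx] > mean_log[idx + 1]).astype(int)


rng = np.random.default_rng(0)
audio = rng.standard_normal(FRAME * 200)    # white-noise stand-in for speech
payload = rng.integers(0, 2, size=16)       # 16-bit watermark
marked = embed(audio, payload)

recovered_full = extract(marked, 16)                 # whole signal
recovered_crop = extract(marked[FRAME * 100:], 16)   # frame-aligned second half
```

Because extraction averages over the time axis, a frame-aligned crop carries the same payload as the full signal. The sketch makes no claim about robustness to a voice cloning model's training process, which is the problem the paper's learned framework addresses.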
Updated: 2023-12-07