Proactive Detection of Voice Cloning with Localized Watermarking
arXiv - CS - Sound, Pub Date: 2024-01-30, DOI: arxiv-2401.17264
Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss, which enables localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, which improves imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility, based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.
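To make the idea of sample-level localization concrete, the sketch below shows one way per-sample detector outputs could be turned into watermarked time spans. It is an illustration only, not AudioSeal's code or API: the function name `watermarked_segments`, the `threshold` and `min_len` parameters, and the assumption that the detector emits one probability per audio sample are all hypothetical.

```python
from typing import List, Sequence, Tuple


def watermarked_segments(
    probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
    min_len: int = 1,        # hypothetical minimum span length, in samples
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sample ranges where probs >= threshold.

    `probs` is assumed to hold one watermark probability per audio sample,
    as a per-sample detector might produce.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at i; keep it only if it is long enough.
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the signal.
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs)))
    return segments


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.95, 0.2, 0.8, 0.7, 0.9]
    print(watermarked_segments(scores))
```

Dividing the returned sample indices by the sampling rate gives the spans in seconds. Raising `min_len` would suppress isolated above-threshold samples, at the cost of missing very short watermarked fragments.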

Updated: 2024-01-31