Detection of Deepfake Environmental Audio
arXiv - CS - Sound, Pub Date: 2024-03-26, DOI: arxiv-2403.17529
Hafsa Ouajdi, Oussama Hadder, Modan Tailleur, Mathieu Lagrange, Laurie M. Heller

With the ever-rising quality of deep generative models, it is increasingly important to be able to tell whether the audio data at hand were recorded or synthesized. Although the detection of fake speech signals has been studied extensively, the detection of fake environmental audio has not. We propose a simple and efficient pipeline for detecting fake environmental sounds based on the CLAP audio embedding. We evaluate this detector using audio data from the 2023 DCASE challenge task on Foley sound synthesis. Our experiments show that fake sounds generated by 44 state-of-the-art synthesizers can be detected with an average accuracy of 98%. We show that an audio embedding learned on environmental audio outperforms a standard VGGish embedding, increasing detection performance by 10%. Informal listening to false-negative examples (fake sounds the detector classified as real) reveals audible features that the detector missed, such as distortion and implausible background noise.

Updated: 2024-03-28