Exploring Green AI for Audio Deepfake Detection
arXiv - CS - Sound Pub Date : 2024-03-21 , DOI: arxiv-2403.14290
Subhajit Saha, Md Sahidullah, Swagatam Das

State-of-the-art audio deepfake detectors built on deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage is accompanied by a significant carbon footprint, mainly due to the use of high-performance computing with accelerators and long training times. Studies show that an average deep NLP model produces around 626k lbs of CO₂, equivalent to five times the lifetime emissions of an average US car. This is certainly a massive threat to the environment. To tackle this challenge, this study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources. Our proposed framework utilizes off-the-shelf self-supervised learning (SSL) based models which are pre-trained and available in public repositories. In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we apply classical machine learning algorithms, such as logistic regression and shallow neural networks, to the SSL embeddings extracted with the pre-trained model. Our approach shows competitive results compared to the commonly used high-carbon-footprint approaches. In experiments with the ASVspoof 2019 LA dataset, we achieve a 0.90% equal error rate (EER) with fewer than 1k trainable model parameters. To encourage further research in this direction and support reproducible results, the Python code will be made publicly accessible following acceptance. GitHub: https://github.com/sahasubhajit/Speech-Spoofing-
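The pipeline described in the abstract (frozen embeddings, a small classical classifier, EER as the metric) can be illustrated with a minimal CPU-only sketch. This is not the authors' code. The embeddings here are synthetic Gaussian vectors standing in for SSL embeddings, the dimension of 16 is arbitrary, and the function names (`make_embeddings`, `train_logistic_regression`, `equal_error_rate`, `run_demo`) are hypothetical. The classifier is logistic regression trained by plain batch gradient descent, and the EER is approximated by a threshold sweep over the observed scores.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear score w.x + b; higher means "more likely bona fide".
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def equal_error_rate(bona_scores, spoof_scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of FAR and FRR at the threshold where the two are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(bona_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bona_scores) / len(bona_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def make_embeddings(n, dim, mean, rng):
    # ASSUMPTION: synthetic stand-in for SSL embeddings, one vector per utterance.
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def run_demo(dim=16, seed=0):
    rng = random.Random(seed)
    # Label 1 = bona fide, label 0 = spoof.
    train_X = make_embeddings(200, dim, 1.0, rng) + make_embeddings(200, dim, -1.0, rng)
    train_y = [1] * 200 + [0] * 200
    w, b = train_logistic_regression(train_X, train_y)

    test_bona = make_embeddings(100, dim, 1.0, rng)
    test_spoof = make_embeddings(100, dim, -1.0, rng)
    eer = equal_error_rate(
        [score(w, b, x) for x in test_bona],
        [score(w, b, x) for x in test_spoof],
    )
    n_params = len(w) + 1  # one weight per embedding dimension plus the bias
    return n_params, eer


if __name__ == "__main__":
    n_params, eer = run_demo()
    print(f"trainable parameters: {n_params}, EER: {eer:.4f}")
```

Because the upstream model stays frozen, the only trainable parameters are the classifier's weights and bias, so the count grows linearly with the embedding dimension. That is why a linear classifier over a fixed-size embedding can stay well under 1k parameters.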

Updated: 2024-03-22