Focal Modulation Networks for Interpretable Sound Classification
arXiv - CS - Sound. Pub Date: 2024-02-05, arXiv: 2402.02754
Luca Della Libera, Cem Subakan, Mirco Ravanelli

The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust. While there has been extensive exploration of interpretation techniques in vision and language, interpretability in the audio domain has received limited attention, primarily focusing on post-hoc explanations. This paper addresses the problem of interpretability by-design in the audio domain by utilizing the recently proposed attention-free focal modulation networks (FocalNets). We apply FocalNets to the task of environmental sound classification for the first time and evaluate their interpretability properties on the popular ESC-50 dataset. Our method outperforms a similarly sized vision transformer both in terms of accuracy and interpretability. Furthermore, it is competitive against PIQ, a method specifically designed for post-hoc interpretation in the audio domain.
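The abstract's key building block is the focal modulation operator, which replaces self-attention with a query projection multiplied elementwise by a "modulator" built from gated, hierarchically pooled context. The sketch below is a minimal NumPy illustration of that idea under simplifying assumptions (moving-average pooling in place of depthwise convolutions, random weights, no normalization); the function and parameter names are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, k):
    # Same-length moving average along the time axis: a stand-in for the
    # depthwise-convolutional context aggregation used in FocalNets.
    T = x.shape[0]
    out = np.empty_like(x)
    for t in range(T):
        lo, hi = max(0, t - k), min(T, t + k + 1)
        out[t] = x[lo:hi].mean(axis=0)
    return out

def focal_modulation(x, params, levels=2):
    """Sketch of focal modulation. x: (T, D) token sequence -> (T, D)."""
    Wq, Wg, Wm, Wo = params
    q = x @ Wq                       # query projection
    gates = x @ Wg                   # (T, levels + 1) per-level gating weights
    ctx = x
    modulator = np.zeros_like(x)
    for l in range(levels):
        ctx = avg_pool(ctx, k=2 ** l)          # growing receptive field per level
        modulator += gates[:, l:l + 1] * ctx   # gated aggregation of each level
    # Final level: global average context, also gated.
    modulator += gates[:, levels:levels + 1] * ctx.mean(axis=0, keepdims=True)
    return (q * (modulator @ Wm)) @ Wo         # elementwise modulation + output proj

T, D, levels = 8, 4, 2
params = (rng.standard_normal((D, D)),
          rng.standard_normal((D, levels + 1)),
          rng.standard_normal((D, D)),
          rng.standard_normal((D, D)))
x = rng.standard_normal((T, D))
y = focal_modulation(x, params, levels)
print(y.shape)
```

Because the modulator is an explicit, spatially localized summary of the input, intermediate activations can be inspected directly, which is what makes the architecture amenable to interpretability by design.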

Updated: 2024-02-06