Environmental Sound Classification Based on Attention Feature Fusion and Improved Residual Network
Automatic Control and Computer Sciences Pub Date : 2023-08-27 , DOI: 10.3103/s0146411623040119
Jinfang Zeng , Yuxing Liu , Mengjiao Wang , Xin Zhang

Abstract

Environmental sound classification (ESC) is an important research area in artificial intelligence, and its accuracy depends heavily on feature extraction. However, most existing methods for generating feature sets rely on simple feature fusion, which is ineffective for multi-class classification. To address this problem and improve the performance of neural networks on ESC tasks, we first add the Gaussian error linear unit (GELU) activation function and gated linear units (GLU) to the residual network, which improves the network’s stability. We then propose a feature fusion method based on the attention mechanism and employ squeeze-and-excitation networks (SENet) so that the network learns the fused features and trains more effectively, which offers clear advantages over existing classification methods. Experimental results show that our model achieves a clear improvement in classification accuracy on two datasets: ESC-10 (98.27%) and ESC-50 (98.32%).
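The abstract names three building blocks (GELU, GLU, and squeeze-and-excitation) without giving the network's layout. As a rough illustration only, the sketch below implements the standard textbook definitions of each in plain Python. It is not the authors' architecture: the function names, the bias-free weight matrices `w1` and `w2`, and the use of the exact (erf-based) GELU rather than an approximation are all assumptions made for this example.

```python
import math
from typing import List, Tuple


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(values: List[float]) -> List[float]:
    """Gated linear unit: split the input into halves (a, b), return a * sigmoid(b)."""
    if len(values) % 2 != 0:
        raise ValueError("GLU needs an even number of values")
    half = len(values) // 2
    return [a * sigmoid(b) for a, b in zip(values[:half], values[half:])]


def se_reweight(
    channels: List[List[float]],
    w1: List[List[float]],  # hypothetical reduction weights, shape (reduced, C)
    w2: List[List[float]],  # hypothetical expansion weights, shape (C, reduced)
) -> Tuple[List[List[float]], List[float]]:
    """Squeeze-and-excitation over C channels, each a flat list of activations."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates
```

In a trained network the gates would differ per channel, so informative feature maps are emphasized and less useful ones are suppressed; this is the sense in which channel attention can weight fused features.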




Updated: 2023-08-28