SDFIE-NET – A self-learning dual-feature fusion information capture expression method for birdsong recognition, Applied Acoustics



Applied Acoustics (IF 3.4), Pub Date: 2024-04-04, DOI: 10.1016/j.apacoust.2024.110004
Qin Zhang, Shipeng Hu, Lu Tang, Rui Deng, Choujun Yang, Guoxiong Zhou, Aibin Chen

Bird recognition is important for monitoring bird populations and protecting ecosystems. Identifying birds from images can be difficult because of the complexity of natural environments. Song-based recognition allows birds to be identified with only a small amount of background noise introduced; however, recognizing birdsong efficiently remains a challenging task. To address this problem, this paper proposes a self-learning dual-feature fusion information capture expression method (SDFIE-NET) for birdsong recognition. First, a Mel filter is used to extract the low-frequency characteristics of the birdsong. Because fixed-parameter filters cannot adapt their feature extraction to different birdsongs, we also incorporate Leaf, a fully learnable audio classification front-end, which can self-learn different extraction parameters for the birdsong. The proposed dual-feature fusion module (SCDFF) effectively combines the high-frequency feature information and low-frequency differences acquired by the two approaches, reducing information redundancy and improving characterization capability. Second, the backbone network uses SDFIE-NET, which is composed of the Fused-MBConv module and a modified CA-MBConv module. A Criss-Cross Attention module is added after each layer composed of Fused-MBConv modules. This improves the speed and accuracy of effective information transfer between internal modules and increases the expressive power of the model at the pixel level. To enhance the anti-interference and generalization ability of the model, we constructed a self-made dataset (Bird_alldata) consisting of 30 kinds of birdsong. In a variety of experiments on this dataset, recognition accuracy reached 95.77% and the F1-score reached 95.52%. Generalization experiments were also conducted on the environmental sound dataset Urbansound8K and the birdsong dataset Birdsdata, where the model achieved recognition accuracies of 94.05% and 94.10% and F1-scores of 94.21% and 94.05%, respectively.
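
For readers unfamiliar with the fixed-parameter Mel front-end that the abstract contrasts with the learnable Leaf front-end, the following is a minimal NumPy sketch of a triangular Mel filter bank applied to a short-time power spectrum. It is not the authors' implementation: the function names, the 22050 Hz sample rate, the 512-point FFT, the hop of 256 samples, and the 40 Mel bands are all assumptions chosen for illustration, since the abstract does not state the paper's settings.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Common HTK-style Hz-to-mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels=40, n_fft=512, sample_rate=22050, f_min=0.0, f_max=None):
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filters.

    All default values are illustrative assumptions, not the paper's settings.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    # n_mels + 2 edge points, equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    # Centre frequency of every rFFT bin.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: the lower of the two slopes, clipped at zero.
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sample_rate=22050, n_fft=512, hop=256, n_mels=40):
    """Frame the signal, take the power spectrum, and project onto mel bands."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Small constant avoids log(0) on silent frames.
    return np.log(mel + 1e-10)
```

Because the filter edges here are fixed once `n_mels` and the frequency range are chosen, every recording is analysed with the same bands. That is the limitation the abstract cites as motivation for adding a front-end whose filter parameters are learned from the data.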


Updated: 2024-04-04
