E-DOCRNet: A multi-feature fusion network for dog bark identification
Applied Acoustics (IF 3.4), Pub Date: 2024-03-19, DOI: 10.1016/j.apacoust.2024.109950
Rui Deng, Guoxiong Zhou, Lu Tang, Choujun Yang, Aibin Chen

Barking is the most common way dogs communicate. The human ear has difficulty distinguishing the information carried in a bark, whereas deep learning methods improve recognition accuracy and efficiency. However, information is lost during feature extraction. To address this problem, this paper proposes a lightweight multi-feature fusion network, E-DOCRNet. The network consists of four stages: information pre-processing, feature extraction, feature fusion, and decision making/classification. First, the dog bark audio is converted into Constant-Q Transform (CQT) Chroma, Spectral Contrast, Tonnetz, and Mel spectrograms by various filters. Features in these representations are then extracted by an optimized EfficientNetV2_s and the proposed DOCRNN network. Next, the time-domain and frequency-domain features are fused with learnable weights by the proposed Bi-feature Fusion Block (BFB). Finally, the fused feature is classified into four categories. The study collected barking data from six common dog breeds to build the Dogbark_GA dataset, in which each audio clip is labeled as adult male, adult female, juvenile male, or juvenile female. E-DOCRNet was evaluated on the self-built dataset and the public UrbanSound8K dataset, reaching accuracies of 92.3% and 94.6%, respectively. Comparison and ablation experiments indicate that the proposed method has advantages over existing advanced classification methods.
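
The abstract does not specify how the Bi-feature fusion Block (BFB) computes its weights. As a rough illustration of what "learnable weighted fusion" of a time-domain and a frequency-domain feature can look like, the sketch below assumes two scalar logits that are normalized with a softmax and used to blend two equal-length feature vectors. The names `softmax`, `fuse_features`, and `logits` are hypothetical and are not taken from the paper; in a real network the logits would be trained parameters.

```python
import math


def softmax(logits):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(time_feat, freq_feat, logits):
    """Blend two equal-length feature vectors with softmax-normalized weights.

    Assumption: one scalar weight per branch. The actual BFB may use
    per-channel weights or a different normalization.
    """
    if len(time_feat) != len(freq_feat):
        raise ValueError("feature vectors must have the same length")
    w_time, w_freq = softmax(logits)
    return [w_time * t + w_freq * f for t, f in zip(time_feat, freq_feat)]


# Equal logits give equal weights, so the result is the element-wise mean.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], logits=[0.0, 0.0])
print(fused)
```

Raising one logit relative to the other shifts the fused vector toward that branch, which is the behavior gradient descent would adjust during training under this assumption.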

Updated: 2024-03-19