A Novel S-LDA Features for Automatic Emotion Recognition from Speech using 1-D CNN
International Journal of Mathematical, Engineering and Management Sciences Pub Date : 2022-01-01 , DOI: 10.33889/ijmems.2022.7.1.004
Pradeep Tiwari, A. D. Darji

Emotions are explicit and serious mental activities that find expression in speech, body gestures, facial features, and the like. Speech is a fast, effective, and highly convenient mode of human communication, and has therefore become the most researched modality in Automatic Emotion Recognition (AER). Extracting the most discriminative and robust features from speech for AER nevertheless remains a challenge. This paper proposes a new algorithm, Shifted Linear Discriminant Analysis (S-LDA), to extract modified features from static low-level features such as Mel-Frequency Cepstral Coefficients (MFCC) and pitch. A 1-D Convolutional Neural Network (CNN) is then applied to these modified features to extract high-level features for AER. The classification performance of the proposed technique has been evaluated on three standard databases: the Berlin EMO-DB emotional speech database, the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the eNTERFACE database. The proposed technique outperforms state-of-the-art methods: the best AER accuracy obtained is 86.41% on the eNTERFACE database, 99.59% on the Berlin database, and 99.57% on the SAVEE database.
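The paper itself defines S-LDA precisely; absent those details, a minimal numpy sketch of one plausible reading — stacking shifted windows of frame-level features (e.g. MFCC vectors) into supervectors and applying classical Fisher LDA to them — might look like this. The window length, shift, and centre-frame labeling here are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Classical Fisher LDA: find directions maximizing between-class
    scatter over within-class scatter. X has shape (n_samples, n_features)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge keeps Sw invertible when windows outnumber features poorly
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_components]]

def shifted_lda_features(frames, labels, win=5, shift=1, n_components=1):
    """Hypothetical 'shifted' LDA: stack `win` consecutive frames into one
    supervector, slide the window by `shift`, then project the stacked
    vectors with LDA. `frames` has shape (n_frames, n_coeffs)."""
    stacked, stacked_y = [], []
    for i in range(0, len(frames) - win + 1, shift):
        stacked.append(frames[i:i + win].ravel())
        stacked_y.append(labels[i + win // 2])  # label of the centre frame
    stacked = np.asarray(stacked)
    W = lda_projection(stacked, np.asarray(stacked_y), n_components)
    return stacked @ W
```

The resulting low-dimensional sequence (one projected vector per window position) is the kind of feature stream a 1-D CNN could then consume along the time axis.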
