Stressed Speech Emotion Recognition Using Teager Energy and Spectral Feature Fusion with Feature Optimization
Computational Intelligence and Neuroscience ( IF 3.120 ) Pub Date : 2023-10-11 , DOI: 10.1155/2023/5765760
Surekha Reddy Bandela 1 , S Siva Priyanka 2 , K Sunil Kumar 3 , Y Vijay Bhaskar Reddy 4 , Afework Aemro Berhanu 5

The objective of speech emotion recognition (SER) is to enhance the man–machine interface. It can also be used to monitor the physiological state of a person in critical situations. Recently, speech emotion recognition has also found applications in medicine and forensics. A new feature extraction technique based on the Teager energy operator (TEO), called the Teager energy-autocorrelation envelope (TEO-Auto-Env), is proposed for detecting stressed emotions. TEO is designed to boost the energy of stressed speech signals, whose energy is reduced during speech production, and is therefore used in this analysis. A stressed speech emotion recognition (SSER) system is developed that combines TEO-Auto-Env with spectral features to detect emotions. The spectral features considered are Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and relative spectra–perceptual linear prediction (RASTA-PLP). The EMO-DB (German), EMOVO (Italian), IITKGP (Telugu), and EMA (English) databases are used in this analysis. The emotions are classified with the k-nearest neighbor (k-NN) classifier for gender-dependent (GD) and speaker-independent (SI) cases. The proposed SSER system provides improved accuracy compared to existing systems. Average recall is used for performance evaluation. The highest classification accuracy is achieved with the combination of TEO-Auto-Env, MFCC, and LPCC features: 91.4% (SI), 91.4% (GD-male), and 93.1% (GD-female) for EMO-DB; 68.5% (SI), 68.5% (GD-male), and 74.6% (GD-female) for EMOVO; 90.6% (SI), 91% (GD-male), and 92.3% (GD-female) for EMA; and 95.1% (GD-female) for the IITKGP female database.
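The abstract names the Teager energy operator but does not give the formula for TEO-Auto-Env. As a rough illustration only, the sketch below assumes the standard discrete form of the operator, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and a hypothetical per-frame feature: the mean of the normalized autocorrelation of the TEO profile. The function names, the absence of any filterbank, and the choice of summary statistic are assumptions made for this example and may differ from the authors' method.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    first and last samples have no neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def teo_autocorr_feature(frame):
    """Hypothetical feature (an assumption, not the paper's definition):
    mean absolute value of the normalized autocorrelation of the TEO profile.
    """
    psi = teager_energy(frame)
    full = np.correlate(psi, psi, mode="full")
    acf = full[psi.size - 1:]          # keep non-negative lags only
    if acf[0] <= 0:                    # silent frame: no energy to normalize by
        return 0.0
    acf = acf / acf[0]                 # lag-0 value becomes 1
    return float(np.abs(acf).mean())


# A pure tone has constant Teager energy A^2 * sin^2(omega).
n = np.arange(202)
tone = 2.0 * np.cos(0.3 * n)
print(teager_energy(tone)[:3])
print(teo_autocorr_feature(tone))
```

For a pure tone the TEO profile is flat, so the normalized autocorrelation decays linearly with lag. Irregular excitation would make the profile less regular and change the value. In a full system, a feature like this would be computed frame by frame and concatenated with the spectral features before classification.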

Updated: 2023-10-11