Comparative Performance Analysis of Metaheuristic Feature Selection Methods for Speech Emotion Recognition, Measurement Science Review

Measurement Science Review (IF 0.9), Pub Date: 2024-04-13, DOI: 10.2478/msr-2024-0010
Turgut Ozseven₁, Mustafa Arpacioglu₂

Speech emotion recognition systems are built on acoustic or spectral features. Acoustic analysis extracts numerical features from speech files using digital signal processing methods; an alternative approach analyzes time-frequency images of speech using image processing. The number of features obtained by acoustic analysis runs into the thousands, which increases classification complexity and causes classification accuracy to vary. Feature selection removes features unrelated to emotion from the feature space, with the aim of improving classifier performance. Traditional feature selection methods are mostly based on statistical analysis; an alternative is to use metaheuristic algorithms to detect and remove irrelevant features from the feature set. In this study, we compare the performance of metaheuristic feature selection algorithms for speech emotion recognition. To this end, a comparative analysis was performed on four datasets, eight metaheuristics, and three classifiers. The results show that classification accuracy increases when the feature set is reduced. For all datasets, the highest accuracy was achieved with the support vector machine. The highest accuracies for the EMO-DB, EMOVA, eNTERFACE’05, and SAVEE datasets are 88.1%, 73.8%, 73.3%, and 75.7%, respectively.
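The abstract describes metaheuristic feature selection only at a high level and does not name the eight algorithms, so the following is a minimal, hypothetical sketch of the general wrapper idea, not the authors' method. It assumes a simple binary genetic algorithm as the metaheuristic, a nearest-centroid classifier in place of the SVM (to keep the example dependency-free), synthetic data instead of a speech corpus, and a weighted fitness of the form alpha · error + (1 - alpha) · (selected features / total features), a form often used in wrapper feature selection. The value alpha = 0.99 and every function name below are illustrative assumptions.

```python
import random


def make_toy_data(n_per_class=40, n_informative=3, n_noise=17, seed=0):
    """Hypothetical two-class data: the first n_informative features are
    shifted by class, the remaining n_noise features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        shift = 2.0 * label
        for _ in range(n_per_class):
            row = [rng.gauss(shift, 1.0) for _ in range(n_informative)]
            row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Test accuracy of a nearest-centroid classifier using only masked features
    (a stand-in for the SVM and other classifiers used in the study)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, cen))
                for c, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)


def fitness(mask, data, alpha=0.99):
    """Lower is better: weighted error plus a penalty on the fraction of kept features."""
    err = 1.0 - nearest_centroid_accuracy(*data, mask)
    return alpha * err + (1 - alpha) * sum(mask) / len(mask)


def ga_feature_selection(data, n_features, pop_size=20, generations=30,
                         p_mut=0.05, seed=1):
    """Binary genetic algorithm over feature masks (True = keep the feature)."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, data)
        return cache[key]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    # Seed the population with the full feature set plus random masks, so the
    # elitist search can never end worse than "keep everything".
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = min(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation per gene.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=score)
    return best


X, y = make_toy_data()
order = list(range(len(y)))
random.Random(2).shuffle(order)
train, test = order[::2], order[1::2]
data = ([X[i] for i in train], [y[i] for i in train],
        [X[i] for i in test], [y[i] for i in test])

all_on = [True] * len(X[0])
best = ga_feature_selection(data, len(all_on))
print("all features accuracy:", nearest_centroid_accuracy(*data, all_on))
print("selected features:", [j for j, keep in enumerate(best) if keep])
print("selected accuracy:", nearest_centroid_accuracy(*data, best))
```

Seeding the population with the all-features mask and carrying the best mask into every generation means the returned subset never scores worse than the full feature set under this fitness; whether it actually raises accuracy, as the study reports for its datasets, depends on the data and the classifier.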

Updated: 2024-04-13