Multimodal Arabic emotion recognition using deep learning
Speech Communication (IF 3.2), Pub Date: 2023-11-04, DOI: 10.1016/j.specom.2023.103005
Noora Al Roken, Gerassimos Barlas

Emotion recognition has been an active research area for decades, owing to the complexity of the problem and its significance in human–computer interaction. Various methods have been employed to tackle it, leveraging inputs such as speech, 2D and 3D images, audio signals, and text, all of which can convey emotional information. Recently, researchers have begun combining multiple modalities to improve the accuracy of emotion classification, recognizing that different emotions may be better expressed through different input types. This paper introduces a novel Arabic audio-visual natural-emotion dataset, investigates two existing multimodal classifiers, and proposes a new classifier trained on our Arabic dataset. Our evaluation covers several aspects, including variations in visual dataset size, joint versus disjoint training, single-modality versus multimodal networks, and consecutive versus overlapping segmentation. Under 5-fold cross-validation, our proposed classifier achieved an average F1-score of 0.912 and an accuracy of 0.913 for natural emotion recognition.
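The page does not include implementation details, but the late-fusion idea behind audio-visual classifiers of this kind can be sketched as below. Everything here (layer sizes, the six-class label set, the LateFusionClassifier name) is an illustrative assumption, not the authors' architecture:

```python
# Hypothetical late-fusion sketch in PyTorch; dimensions and class count
# are placeholders, not values from the paper.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, num_classes=6):
        super().__init__()
        # One independent encoder per modality.
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU())
        # Concatenated embeddings feed a shared classification head.
        self.head = nn.Linear(64 + 64, num_classes)

    def forward(self, audio_feat, visual_feat):
        fused = torch.cat(
            [self.audio_branch(audio_feat), self.visual_branch(visual_feat)],
            dim=-1)
        return self.head(fused)

# Forward pass on random tensors standing in for real audio/visual features.
model = LateFusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 6])
```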
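Likewise, the difference between the consecutive and overlapping segmentation compared in the evaluation comes down to the hop size of a sliding window. A minimal sketch, with window and hop lengths chosen purely for illustration:

```python
# Illustrative windowing only; the window/hop sizes are assumptions,
# not the paper's actual settings.
import numpy as np

def segment(signal: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into fixed-length windows.

    hop == win gives consecutive (non-overlapping) segments;
    hop <  win gives overlapping segments.
    """
    n = 1 + max(0, len(signal) - win) // hop
    return np.stack([signal[i * hop : i * hop + win] for i in range(n)])

x = np.arange(16000)                          # e.g. one second at 16 kHz
consecutive = segment(x, win=4000, hop=4000)  # 4 segments, no overlap
overlapping = segment(x, win=4000, hop=2000)  # 7 segments, 50% overlap
print(consecutive.shape, overlapping.shape)   # (4, 4000) (7, 4000)
```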
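Finally, the reported scores come from 5-fold cross-validation. A rough sketch of that protocol with scikit-learn, using random placeholder data and a logistic-regression stand-in for the actual network:

```python
# Sketch of a 5-fold evaluation loop; features, labels, and the classifier
# are placeholders, not the paper's data or model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))      # placeholder fused features
y = rng.integers(0, 6, size=200)    # placeholder emotion labels

accs, f1s = [], []
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred, average="macro"))

# Average the per-fold scores, as in the paper's reported metrics.
print(f"accuracy={np.mean(accs):.3f}  macro-F1={np.mean(f1s):.3f}")
```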
