Speech emotion recognition based on Graph-LSTM neural network
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2023-10-11, DOI: 10.1186/s13636-023-00303-9
Yan Li, Yapeng Wang, Xu Yang, Sio-Kei Im

Graph Neural Networks have recently been extended to speech signal processing, because graphs offer a more compact and flexible way to represent speech sequences. However, the relationship structures used in recent studies tend to be relatively simple. Moreover, the graph convolution module has limitations that impede its adaptability to complex application scenarios. In this study, we construct the speech graph using feature similarity and introduce a novel graph neural network architecture that leverages an LSTM aggregator and weighted pooling. On the IEMOCAP dataset, the method achieves an unweighted accuracy of 65.39% and a weighted accuracy of 71.83%, performance comparable to or better than existing graph baselines. The method also improves the interpretability of the model to some extent and identifies speech emotion features effectively.
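The abstract gives no implementation details, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes frame-level feature vectors as graph nodes, cosine similarity with a hypothetical k-nearest-neighbour rule for edges (`build_speech_graph`, `k`), and a softmax-weighted sum as one possible form of weighted pooling (`weighted_pool`, with hypothetical per-node `scores` that a real model would learn). It also shows the unweighted/weighted accuracy convention common in IEMOCAP work (mean per-class recall vs. overall accuracy); that this is how the reported figures are defined is an assumption. The LSTM aggregator itself is not sketched.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_speech_graph(frames, k=2):
    """Connect each frame (node) to its k most similar other frames.

    Returns a symmetric adjacency dict: node index -> set of neighbour indices.
    The k-nearest-neighbour rule is a hypothetical choice for illustration.
    """
    n = len(frames)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        sims = sorted(
            ((cosine(frames[i], frames[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise so the graph is undirected
    return adj

def weighted_pool(node_feats, scores):
    """Softmax-weighted sum of node embeddings into one graph embedding.

    `scores` stands in for hypothetical learned per-node importance scores.
    """
    m = max(scores)
    w = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(w)
    w = [x / total for x in w]
    dim = len(node_feats[0])
    return [sum(w[i] * node_feats[i][d] for i in range(len(node_feats)))
            for d in range(dim)]

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall, so every emotion class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy usage: four 2-D "frames" forming two similar pairs.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(build_speech_graph(frames, k=1))  # {0: {1}, 1: {0}, 2: {3}, 3: {2}}

y_true = ["neu", "neu", "neu", "ang"]
y_pred = ["neu", "neu", "neu", "neu"]
print(weighted_accuracy(y_true, y_pred))    # 0.75
print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

With imbalanced labels, as in the toy example, predicting the majority class keeps weighted accuracy high while unweighted accuracy drops, which is why both figures are usually reported together.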

Updated: 2023-10-11