Concept-guided multi-level attention network for image emotion recognition
Signal, Image and Video Processing (IF 2.3). Pub Date: 2024-03-13. DOI: 10.1007/s11760-024-03074-8
Hansen Yang, Yangyu Fan, Guoyun Lv, Shiya Liu, Zhe Guo

Image emotion recognition aims to predict people’s emotional responses to visual stimuli. Recently, emotional-region discovery has become a hot topic in this field because it brings significant improvements to the task. Existing studies mainly discover emotional regions through sophisticated analysis from the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts in guiding the mining of emotional regions, CMANet adopts a multi-level architecture in which an attended semantic feature is first computed under the guidance of the feature from the holistic branch. Subsequently, with this attended semantic feature, the emotional region of the feature map in the local branch can be attended to. An adaptive fusion method is then proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable-weight cross-entropy loss is proposed that enables the model to focus on hard samples, further improving performance. Experiments on several affective image datasets show that the proposed method is effective and superior to state-of-the-art methods.
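The abstract does not give the exact formulation of the variable-weight cross-entropy loss, but its stated goal of up-weighting hard (easily confused) samples can be sketched with a focal-loss-style weighting, where the per-sample weight shrinks as the predicted probability of the true class grows. The function name and the `gamma` hyperparameter below are illustrative assumptions, not the paper's definitions:

```python
import math

def variable_weight_cross_entropy(logits, target, gamma=2.0):
    """Cross-entropy with a variable per-sample weight (1 - p_t)^gamma.

    Easy samples (high probability on the true class) receive a small
    weight; hard samples dominate the loss. `gamma` is an assumed
    hyperparameter controlling how sharply easy samples are discounted.
    """
    # Softmax with max-shift for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    p_t = exps[target] / sum(exps)        # probability of the true class
    weight = (1.0 - p_t) ** gamma         # variable weight: hard -> large
    return weight * -math.log(p_t)

# A confidently correct (easy) sample yields a much smaller loss
# than a barely-separated (hard) sample.
easy = variable_weight_cross_entropy([4.0, 0.0, 0.0], target=0)
hard = variable_weight_cross_entropy([0.1, 0.0, 0.0], target=0)
```

With `gamma = 0` this reduces to ordinary cross-entropy, so the weighting can be tuned against the standard loss as a baseline.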




Updated: 2024-03-14