VIEMF: Multimodal metaphor detection via visual information enhancement with multimodal fusion, Information Processing & Management


Information Processing & Management ( IF 8.6 ) Pub Date : 2024-01-23 , DOI: 10.1016/j.ipm.2024.103652
Xiaoyu He , Long Yu , Shengwei Tian , Qimeng Yang , Jun Long , Bo Wang

In this paper, we study multimodal metaphor detection to obtain real semantic meaning from multiple heterogeneous information sources. Existing approaches suffer from two main drawbacks. (1) They focus on textual aspects and overlook the characteristics of visual metaphor information. (2) They lack efficient methods for fusing multimodal metaphor features. To address the first issue, we propose a visual information enhancement method based on dual-granularity visual feature fusion, which obtains complete metaphorical visual features. To address the second and achieve bidirectional interaction among multimodal metaphor features, we develop a multi-interactive crossmodal residual network (MCRN) that fuses the consistent and complementary information between modalities, and we design a progressive fusion strategy to enhance the iterative fusion ability of the model. We extensively evaluate the proposed method on the popular Met-meme metaphor detection benchmark, where it outperforms the existing state-of-the-art methods by a large margin: F1 score improvements range from 1.47% to 2.55% across languages. We further extend the evaluation to the Sarcasm dataset to validate the model's ability to perceive semantic contrasts and meaning transformations, and the experimental results are superior to those of a strong baseline model.
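
The abstract gives no architectural details for MCRN, so the following is only a rough sketch of the general idea it names: bidirectional cross-modal attention with a residual connection, repeated over several fusion steps. Everything here is an assumption made for illustration: the function names (`cross_attend`, `residual_cross_fusion`, `progressive_fusion`), the use of plain scaled dot-product attention without learned projections, and the fixed step count. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query vector attends over the context vectors.

    Plain scaled dot-product attention; the context serves as both keys
    and values (no learned projections, an assumption for brevity).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, context))
                         for j in range(dim)])
    return attended


def residual_cross_fusion(text, image):
    """One bidirectional step: each modality attends to the other, and
    the attended features are added back to the original (residual)."""
    text_from_image = cross_attend(text, image)
    image_from_text = cross_attend(image, text)
    new_text = [[a + b for a, b in zip(t, u)]
                for t, u in zip(text, text_from_image)]
    new_image = [[a + b for a, b in zip(i, u)]
                 for i, u in zip(image, image_from_text)]
    return new_text, new_image


def progressive_fusion(text, image, steps=2):
    """Repeat the bidirectional residual step (hypothetical step count)."""
    for _ in range(steps):
        text, image = residual_cross_fusion(text, image)
    return text, image


if __name__ == "__main__":
    text_tokens = [[1.0, 0.0], [0.0, 1.0]]   # two toy text features
    image_regions = [[2.0, 3.0]]             # one toy image feature
    fused_text, fused_image = progressive_fusion(text_tokens, image_regions)
    print(len(fused_text), len(fused_image))
```

Because each step adds attended features to the originals rather than replacing them, both modalities keep their own information while absorbing the other's, which is one common way to read "consistent and complementary" fusion.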


Updated: 2024-01-25
