Efficient Visual Metaphor Image Generation Based on Metaphor Understanding, Neural Processing Letters



Efficient Visual Metaphor Image Generation Based on Metaphor Understanding
Neural Processing Letters (IF 3.1), Pub Date: 2024-04-16, DOI: 10.1007/s11063-024-11609-w
Chang Su, Xingyue Wang, Shupin Liu, Yijiang Chen

Metaphor has significant implications for revealing cognitive and thinking mechanisms. Visual metaphor image generation not only presents metaphorical connotations intuitively but also reflects AI’s understanding of metaphor through the generated images. This paper investigates the task of generating images from text that contains visual metaphors. We explore metaphor image generation and create a dataset of sentences with visual metaphors. We then propose a visual metaphor image generation framework based on metaphor understanding, which is more closely tailored to the essence of metaphor, makes better use of visual features, and is more interpretable. Specifically, the framework extracts the source domain, target domain, and metaphor interpretation from metaphorical sentences, separating the elements of the metaphor to deepen the understanding of its themes and intentions. The framework also introduces image data from the source domain to capture visual similarities and generate domain-specific visual enhancement prompts. Finally, these prompts are combined with the metaphor interpretation sentences to form the final prompt text. Experimental results demonstrate that this approach effectively captures the essence of metaphor and generates metaphorical images consistent with the textual meaning.
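The last step described in the abstract, combining visual enhancement prompts with the metaphor interpretation to form the final prompt text, can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MetaphorAnalysis`, `build_visual_prompt`, and `build_final_prompt`, the prompt wording, and the example sentence are all hypothetical, and the visual cues are supplied by hand rather than derived from source-domain images as in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaphorAnalysis:
    """The three elements the abstract says are extracted from a sentence."""
    source_domain: str   # concept the metaphor borrows from
    target_domain: str   # concept being described
    interpretation: str  # literal paraphrase of the metaphor


def build_visual_prompt(source_domain: str, visual_cues: List[str]) -> str:
    """Turn visual cues for the source domain into a prompt fragment.

    Assumption: the cues are given by hand here; the paper derives them
    from source-domain image data.
    """
    cues = [c.strip() for c in visual_cues if c.strip()]
    if not cues:
        return ""
    return f"with the visual qualities of {source_domain}: " + ", ".join(cues)


def build_final_prompt(analysis: MetaphorAnalysis, visual_cues: List[str]) -> str:
    """Join the interpretation sentence with the visual enhancement fragment."""
    parts = [analysis.interpretation.strip().rstrip(".")]
    visual = build_visual_prompt(analysis.source_domain, visual_cues)
    if visual:
        parts.append(visual)
    return ", ".join(parts)


# Hypothetical example: "The crowd was an ocean."
example = MetaphorAnalysis(
    source_domain="ocean",
    target_domain="crowd",
    interpretation="A huge crowd fills the square and moves in waves.",
)
final_prompt = build_final_prompt(example, ["rolling waves", "deep blue tones"])
print(final_prompt)
```

Keeping the interpretation and the visual fragment as separate pieces mirrors the separation of metaphor elements described above, and makes it possible to inspect which part of the prompt contributed which feature of a generated image.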


Updated: 2024-04-18
