Knowledge enhancement and scene understanding for knowledge-based visual question answering
Knowledge and Information Systems (IF 2.7), Pub Date: 2023-12-14, DOI: 10.1007/s10115-023-02028-9
Zhenqiang Su, Gang Gou

Knowledge-based visual question answering requires not only attention to the visual content of an image but also support from relevant external knowledge, so that the model can reason better about questions and answers. The semantics of the question should not be overlooked, because knowledge retrieval depends on more than visual information alone. This paper first proposes a question-based semantic retrieval strategy that compensates for the knowledge image-based retrieval fails to supply, so that visual and knowledge information can be combined more effectively. Second, image captions are added to help the model achieve better scene understanding. Finally, modal knowledge is represented and accumulated in the form of triplets. Experimental results on the OK-VQA dataset show that the proposed method improves on two baseline methods by 4.24% and 1.90%, respectively, demonstrating its effectiveness.
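To make the idea of question-based retrieval concrete, the Python sketch below ranks knowledge triplets against a query built from the question text plus an image caption. It is not the authors' method: a bag-of-words cosine similarity stands in for a learned semantic encoder, and the knowledge base, the stopword list, and the names `tokenize`, `cosine`, and `retrieve` are all hypothetical. It only illustrates how question semantics and a textual scene description can surface knowledge that a query built from fewer cues would miss.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge triplets: (head, relation, tail).
Triplet = tuple[str, str, str]

# Hypothetical stopword list; a real system would use a learned encoder instead.
STOPWORDS = {"a", "an", "the", "is", "are", "this", "that", "what", "of", "on", "for", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, caption: str, kb: list[Triplet], k: int = 2) -> list[Triplet]:
    """Return up to k triplets with positive similarity to question + caption."""
    # The query combines question semantics with a textual scene description,
    # so retrieval does not rely on visual features alone.
    query = Counter(tokenize(question + " " + caption))
    scored = [(cosine(query, Counter(tokenize(" ".join(t)))), t) for t in kb]
    # Highest score first; the sort is stable, so ties keep knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]


kb = [
    ("fire hydrant", "used for", "putting out fires"),
    ("banana", "is a", "fruit"),
    ("umbrella", "used for", "rain"),
]
question = "What is this red object used for?"
caption = "a red fire hydrant on a sidewalk"

print(retrieve(question, "", kb, k=1))       # [('umbrella', 'used for', 'rain')]
print(retrieve(question, caption, kb, k=1))  # [('fire hydrant', 'used for', 'putting out fires')]
```

In this toy example the question alone shares only "used" with the candidate triplets, so the shorter umbrella triplet scores highest; adding the caption contributes "fire" and "hydrant", and the fire-hydrant triplet moves to the top. A practical system would replace the term-count vectors with dense sentence embeddings.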

Updated: 2023-12-16