Natural language guided object retrieval in images
Acta Informatica (IF 0.6), Pub Date: 2021-07-19, DOI: 10.1007/s00236-021-00400-2
Ahmad Ostovar, Suna Bensch, Thomas Hellström

The ability to understand the surrounding environment and to communicate with interacting humans is an important functionality for many automated systems in which visual input (e.g., images, video) and natural language input (speech or text) must be related to each other. Possible applications include automatic image caption generation, interactive surveillance systems, and human-robot interaction. In this paper, we propose algorithms for automatically responding to natural language queries about an image. Our approach uses a predefined neural net to detect objects and their bounding boxes in images, models spatial relations between bounding boxes with a neural net, analyzes the queries with a syntactic parser, and introduces algorithms that map natural language to properties in the images. These algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users who assess the quality of the answers generated by our system.
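To make two of the ideas in the abstract concrete, here is a minimal, hypothetical sketch; it is not the authors' implementation. It assumes a toy hand-written word-vector table (`TOY_VECTORS`) in place of real pretrained embeddings, matches a query noun to the detected label with the highest cosine similarity above an assumed threshold of 0.8, and replaces the paper's neural relation model with hand-written geometric rules for "left" and "above". Antonyms are used so that "right" and "below" are evaluated by swapping the arguments of "left" and "above". All names (`Detection`, `best_match`, `holds`, `BASE_RELATIONS`, `ANTONYMS`) are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


# Hypothetical toy word vectors; a real system would use pretrained embeddings.
TOY_VECTORS = {
    "cup": (0.9, 0.1, 0.0),
    "mug": (0.85, 0.15, 0.05),
    "laptop": (0.1, 0.9, 0.1),
    "computer": (0.15, 0.85, 0.2),
    "dog": (0.0, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def best_match(word, detections, threshold=0.8):
    """Return the detection whose label is most similar to `word`, or None."""
    q = TOY_VECTORS.get(word)
    if q is None:
        return None  # unknown word: no vector to compare against
    scored = [(cosine(q, TOY_VECTORS[d.label]), d)
              for d in detections if d.label in TOY_VECTORS]
    if not scored:
        return None
    score, det = max(scored, key=lambda s: s[0])
    return det if score >= threshold else None


def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# Hand-written rules stand in for the paper's learned relation model.
# Image y coordinates grow downward, so "above" means a smaller center y.
BASE_RELATIONS = {
    "left": lambda a, b: center(a.box)[0] < center(b.box)[0],
    "above": lambda a, b: center(a.box)[1] < center(b.box)[1],
}
# Each antonym is evaluated by swapping the arguments of its base relation.
ANTONYMS = {"right": "left", "below": "above"}


def holds(relation, a, b):
    """True if `a <relation> b`, e.g. holds("right", laptop, cup)."""
    if relation in BASE_RELATIONS:
        return BASE_RELATIONS[relation](a, b)
    if relation in ANTONYMS:
        return BASE_RELATIONS[ANTONYMS[relation]](b, a)
    raise ValueError(f"unknown relation: {relation}")


detections = [
    Detection("cup", (10, 50, 60, 100)),
    Detection("laptop", (120, 30, 300, 180)),
    Detection("dog", (320, 200, 420, 300)),
]
mug = best_match("mug", detections)
computer = best_match("computer", detections)
print(mug.label, computer.label)        # cup laptop
print(holds("right", computer, mug))    # True
```

In this sketch, modeling only one relation from each antonym pair halves the number of relations that need rules or a learned model; the abstract does not say whether the paper uses antonyms in exactly this way.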

Updated: 2021-07-19