A heterogeneous attention fusion mechanism for the cross-environment scene classification of the home service robot
Robotics and Autonomous Systems ( IF 4.3 ) Pub Date : 2024-01-04 , DOI: 10.1016/j.robot.2024.104619
Bo Zhu , Ximing Fan , Xiang Gao , Guozheng Xu , Junzhe Xie

Many methods have been proposed to improve the scene classification capacity of service robots. However, most are designed from a purely technical standpoint, without reference to any cognitive principle of the brain, and, from design to evaluation, the particularities of the robot task, such as cross-environment generalization and explicit semantic preservation and interpretation, are still not fully considered. As a result, the scene-cognition behavior of robots remains far from that of humans, and their environmental adaptability is poor: it is difficult for a robot to learn place concepts from discrete fragments and then continuously perceive them with a limited view in unvisited spaces. Inspired by recent findings from neuroscience, an attention-based global and object attribute fusion mechanism (AGOFM for short), consisting of three parts, is proposed to overcome these deficiencies. In the global attribute part, a global feature extractor and a sequence context extractor generate a holistic feature; the involved context integrates limited views into an overall impression of the scene that guides attention. In the object attribute part, a novel object vector is proposed that simultaneously encodes the quantity, category, and confidence of the detected objects, all tied to the vector index and to high-level semantics. In the attention generation part, two sorted top-X characteristics derived from the above two parts are fed into a fully connected (FC) network with batch normalization to generate effective attention. The attention weights are then applied to the batch-normalized global and object vectors respectively, and the two heterogeneous sources of information are directly fused by another FC network to achieve scene classification. Policies for multi-learner fusion and frame rejection are also provided.
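To make the object attribute part concrete, the following is a minimal sketch of how a category-indexed object vector and its sorted top-X characteristic might be built. The encoding shown (accumulating detection confidences into the slot of each category, so a slot reflects both how many instances were detected and how certain the detector was) is an assumption for illustration; the paper's exact encoding may differ, and the function names are hypothetical.

```python
def object_vector(detections, num_categories):
    """Build a category-indexed object vector from detector output.

    detections: list of (category_id, confidence) pairs.
    Assumed encoding: each slot accumulates the confidences of its
    category, so its value grows with both object quantity and
    detection confidence, while the slot index carries the semantic
    category label.
    """
    v = [0.0] * num_categories
    for cat, conf in detections:
        v[cat] += conf
    return v


def top_x(vec, x):
    """Sorted top-X values, the kind of characteristic fed to the
    attention-generating FC network."""
    return sorted(vec, reverse=True)[:x]


# Example: two detections of category 3 and one of category 7.
dets = [(3, 0.9), (3, 0.8), (7, 0.6)]
v = object_vector(dets, num_categories=10)
tx = top_x(v, 2)  # the two most salient category slots
```

A global feature vector would be reduced to its own sorted top-X characteristic in the same way before both are concatenated for the attention FC network.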
Finally, a novel evaluation paradigm is proposed: the model is trained on a discrete prior dataset, and inference is then tested on a traditional dataset and two robot-view datasets, simulating the cross-environment situation. Under these severe conditions, the results demonstrate that the proposed method outperforms several popular methods.



