Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains
Journal of Intelligent & Robotic Systems (IF 3.3), Pub Date: 2024-04-16, DOI: 10.1007/s10846-024-02092-5
Hamidreza Kasaei, Mohammadreza Kasaei, Georgios Tziafas, Sha Luo, Remo Sasso

To aid humans in everyday tasks, robots need to know which objects exist in the scene, where they are, and how to grasp and manipulate them in different situations. Object recognition and grasping are therefore two key functionalities for autonomous robots. Most state-of-the-art approaches treat object recognition and grasping as two separate problems, even though both use visual input. Furthermore, the robot's knowledge is fixed after the training phase; if the robot encounters new object categories, it must be retrained to incorporate the new information without catastrophic forgetting. To resolve this problem, we propose a deep learning architecture with an augmented memory capacity that handles open-ended object recognition and grasping simultaneously. In particular, our approach takes multiple views of an object as input and jointly estimates a pixel-wise grasp configuration as well as a deep scale- and rotation-invariant representation as output. The obtained representation is then used for open-ended object recognition through a meta-active learning technique. We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories from very few examples on-site, in both simulation and real-world settings. Our approach enables a robot to acquire knowledge about new object categories using, on average, fewer than five instances per category, and achieves 95% object recognition accuracy and above 91% grasp success rate in (highly) cluttered scenarios in both simulation and real-robot experiments. A video of these experiments is available online at: https://youtu.be/n9SMpuEkOgk
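The pipeline the abstract describes (a shared per-view encoder, a pixel-wise grasp head, a view-pooled scale/rotation-robust object descriptor, and a memory that learns new categories from a handful of examples) can be summarised with a minimal PyTorch-style sketch. This is an illustration under assumed layer sizes and class names, not the authors' implementation; in particular, the nearest-prototype memory below stands in for their meta-active learning component.

```python
# Minimal sketch of a joint multi-view grasping + open-ended recognition model.
# All architecture details here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiViewGraspRecognitionNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared per-view convolutional encoder (hypothetical backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Grasp head: upsamples back to the input resolution and predicts a
        # grasp-quality score per pixel of each view.
        self.grasp_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Recognition head: global pooling + projection to a compact descriptor.
        self.embed_head = nn.Linear(64, feat_dim)

    def forward(self, views: torch.Tensor):
        # views: (batch, n_views, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.encoder(views.reshape(b * v, c, h, w))       # (b*v, 64, H/4, W/4)
        grasp_maps = self.grasp_head(feats).reshape(b, v, 1, h, w)
        pooled = F.adaptive_avg_pool2d(feats, 1).flatten(1)       # (b*v, 64)
        # Max-pool across views so the descriptor does not depend on view order.
        obj_repr = self.embed_head(pooled).reshape(b, v, -1).max(dim=1).values
        return grasp_maps, F.normalize(obj_repr, dim=-1)


class PrototypeMemory:
    """Open-ended recognition: store a mean descriptor per category and classify
    by cosine similarity, so new categories can be added from a few examples."""
    def __init__(self):
        self.prototypes = {}

    def teach(self, label: str, descriptors: torch.Tensor):
        self.prototypes[label] = F.normalize(descriptors.mean(0), dim=-1)

    def classify(self, descriptor: torch.Tensor) -> str:
        scores = {k: float(descriptor @ p) for k, p in self.prototypes.items()}
        return max(scores, key=scores.get)


if __name__ == "__main__":
    net = MultiViewGraspRecognitionNet()
    views = torch.rand(1, 3, 3, 64, 64)           # one object seen from 3 views
    grasp_maps, descriptor = net(views)
    memory = PrototypeMemory()
    memory.teach("mug", descriptor.detach())      # learn a category from one object
    print(grasp_maps.shape, memory.classify(descriptor[0].detach()))
```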



Updated: 2024-04-16