Deep Reinforcement Learning-Based Large-Scale Robot Exploration
IEEE Robotics and Automation Letters (IF 5.2), Pub Date: 2024-03-20, DOI: 10.1109/lra.2024.3379804
Yuhong Cao, Rui Zhao, Yizhuo Wang, Bairan Xiang, Guillaume Sartoretti

In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale Lidar-based autonomous robot exploration problems in a 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this end, our approach relies on learned attention mechanisms, whose ability to capture long-term dependencies at different spatial scales lets the agent reason over its entire belief about known areas. Our approach further relies on ground truth information (i.e., privileged learning) to guide the environment estimation during training, as well as on a graph rarefaction algorithm that allows models trained in small-scale environments to scale to large-scale ones. Simulation results show that our model achieves better exploration efficiency (a 12% reduction in path length, 6% in makespan) and lower planning time (a 60% reduction) than state-of-the-art planners in a $130\,\text{m} \times 100\,\text{m}$ benchmark scenario. We also validate our learned model on hardware.
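The abstract mentions two components that are easier to picture with a small example: an attention-based policy that reasons over a graph built from the robot's belief, and a graph rarefaction step that keeps that graph compact as the map grows. The sketch below is a minimal illustration under assumed design choices (PyTorch, 4-dimensional node features, a simple distance-based rarefaction rule, and the hypothetical names `AttentionPolicy` and `rarefy_graph`); it is not the authors' implementation.

```python
# Hedged sketch of an attention-based exploration policy over a graph of
# candidate viewpoints, plus a toy graph-rarefaction step. Feature layout and
# the rarefaction rule are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn


class AttentionPolicy(nn.Module):
    """Scores candidate next viewpoints using self-attention over the whole
    belief graph (assumed node features: x, y, visited flag, frontier utility)."""

    def __init__(self, node_dim=4, embed_dim=128, num_heads=8, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(node_dim, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, node_feats, neighbor_idx):
        # node_feats: (1, N, node_dim); neighbor_idx: indices of candidate next nodes
        h = self.encoder(self.embed(node_feats))              # (1, N, embed_dim)
        logits = self.score(h[0, neighbor_idx]).squeeze(-1)   # one logit per candidate
        return torch.distributions.Categorical(logits=logits)


def rarefy_graph(nodes, keep_radius=4.0):
    """Toy rarefaction: greedily keep nodes at least `keep_radius` apart, so a
    graph built in a large map stays about as dense as in small training maps."""
    kept = []
    for p in nodes:
        if all(torch.dist(p, q) >= keep_radius for q in kept):
            kept.append(p)
    return torch.stack(kept) if kept else nodes[:1]


if __name__ == "__main__":
    torch.manual_seed(0)
    coords = torch.rand(200, 2) * 100.0                       # dense candidate viewpoints
    sparse = rarefy_graph(coords)                              # rarefied graph
    feats = torch.cat([sparse, torch.zeros(len(sparse), 2)], dim=1).unsqueeze(0)
    policy = AttentionPolicy()
    dist = policy(feats, neighbor_idx=torch.arange(min(8, len(sparse))))
    print("rarefied nodes:", len(sparse), "sampled action:", dist.sample().item())
```

In this toy setup the policy attends over every node of the (rarefied) graph but only emits logits for the robot's current neighbors, which mirrors the general idea of reasoning over the full belief while acting locally; the actual state encoding, training signal, and rarefaction criterion in the paper may differ.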

Updated: 2024-03-20