Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments
Autonomous Agents and Multi-Agent Systems (IF 1.9) | Pub Date: 2024-03-26 | DOI: 10.1007/s10458-024-09641-0
Junchao Li, Mingyu Cai, Zhen Kan, Shaoping Xiao

Abstract

Motion planning for autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) and express the complex task as a linear temporal logic (LTL) formula. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, the problem is recast as finding an optimal policy on the product of the PL-POMDP and the LDGBA that satisfies the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and perform task recognition. Our contributions include the proposed method, the use of LTL and the LDGBA, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method through simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results show that the proposed method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications such as the control of unmanned aerial vehicles.
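The abstract gives no implementation details, but the architecture it describes (an LSTM that summarizes the observation history of the PL-POMDP, combined with the current LDGBA state so that Q-values are learned over the product space) can be sketched as follows. This is a minimal illustrative sketch in PyTorch under assumed interfaces; the class name RecurrentQNetwork, the dimensions, and the one-hot encoding of the automaton state are our assumptions, not the authors' code.

import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    # Hypothetical LSTM-based Q-network: the LSTM summarizes the observation
    # history of the PL-POMDP, and the current LDGBA state is appended so that
    # Q-values are estimated over the product state space.
    def __init__(self, obs_dim, num_automaton_states, num_actions, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.num_automaton_states = num_automaton_states
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + num_automaton_states, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, obs_history, automaton_state):
        # obs_history: (batch, seq_len, obs_dim); automaton_state: (batch,) long tensor
        _, (h_n, _) = self.lstm(obs_history)      # h_n: (1, batch, hidden_dim)
        summary = h_n.squeeze(0)                  # (batch, hidden_dim)
        q_one_hot = torch.nn.functional.one_hot(
            automaton_state, self.num_automaton_states
        ).float()
        return self.head(torch.cat([summary, q_one_hot], dim=-1))  # (batch, num_actions)

if __name__ == "__main__":
    net = RecurrentQNetwork(obs_dim=8, num_automaton_states=4, num_actions=5)
    obs_history = torch.randn(2, 10, 8)           # batch of 2 histories, 10 steps each
    q_state = torch.tensor([0, 3])                # current LDGBA states
    print(net(obs_history, q_state).shape)        # torch.Size([2, 5])

Such a network could be trained with a standard deep Q learning loop over the product of the PL-POMDP and the LDGBA; the training procedure and reward shaping used in the paper are not specified in the abstract.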




Updated: 2024-03-27