State-Dependent Maximum Entropy Reinforcement Learning for Robot Long-Horizon Task Learning
Journal of Intelligent & Robotic Systems (IF 3.3) | Pub Date: 2024-01-24 | DOI: 10.1007/s10846-024-02049-8
Deshuai Zheng, Jin Yan, Tao Xue, Yong Liu

Task-oriented robot learning has shown significant potential with the development of Reinforcement Learning (RL) algorithms. However, learning long-horizon tasks remains a formidable challenge for robots because such tasks are inherently complex, typically comprising multiple diverse stages. General-purpose RL algorithms commonly suffer from slow convergence, or even fail to converge at all, when applied to these tasks. These difficulties stem from local-optimum traps and redundant exploration at the start of a new stage or at the junction between two consecutive stages. To address them, we propose a novel state-dependent maximum entropy (SDME) reinforcement learning algorithm. The algorithm balances the trade-off between exploration and exploitation around three kinds of critical states that arise from the structure of long-horizon tasks. We conducted experiments in an open-source simulation environment on two representative long-horizon tasks. The proposed SDME algorithm learns faster and more stably, requiring only about one-third of the learning samples needed by the baseline approaches. Furthermore, we assessed the generalization ability of our method under randomly initialized conditions, and the results show that the success rate of the SDME algorithm is nearly twice that of the baselines. Our code will be available at https://github.com/Peter-zds/SDME.
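The abstract does not spell out the SDME objective, but the core idea of a state-dependent maximum-entropy trade-off can be illustrated with a SAC-style actor update in which the entropy temperature is a learned function of the state rather than a single global scalar. The sketch below is not the authors' implementation; the network sizes, the StateDependentTemperature module, and the policy.sample interface are assumptions made purely for illustration.

import torch.nn as nn

class StateDependentTemperature(nn.Module):
    # Maps a state to a positive entropy coefficient alpha(s)
    # (illustrative stand-in for a state-dependent temperature).
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Softplus(),  # Softplus keeps alpha(s) > 0
        )

    def forward(self, state):
        return self.net(state)

def max_entropy_actor_loss(policy, q_net, temperature, states):
    # SAC-style maximum-entropy actor loss with a per-state temperature:
    #   L = E_s[ alpha(s) * log pi(a|s) - Q(s, a) ],  a ~ pi(.|s)
    # policy.sample is assumed to return a reparameterized action and its log-probability.
    actions, log_probs = policy.sample(states)
    alpha = temperature(states).detach()   # alpha(s); no gradient flows to the actor through it
    q_values = q_net(states, actions)
    return (alpha * log_probs - q_values).mean()

In such a formulation, a larger alpha(s) near a critical state pushes the policy to explore more there, while a smaller alpha(s) lets it exploit the current value estimates; how SDME actually identifies and weights its three kinds of critical states is detailed in the paper itself.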



Updated: 2024-01-25