Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning
Machine Learning (IF 7.5) Pub Date: 2024-02-05, DOI: 10.1007/s10994-023-06503-w
Lisheng Wu, Ke Chen

Reinforcement learning often struggles to accomplish a sparse-reward long-horizon task in a complex environment. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find a pathway to the desired goal, and doing so efficiently is one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to reach the desired goals efficiently. In this paper, we propose a novel learning objective that optimizes the entropy of both achieved goals and the new goals to be explored, enabling more efficient goal exploration in sub-goal-selection-based GCRL. To optimize this objective, we first explore and exploit the frequently occurring goal-transition patterns mined in environments similar to the current task to compose skills via skill learning. The pre-trained skills are then applied to goal exploration with theoretical justification. Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance.
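The abstract describes an entropy-based objective for choosing which sub-goals to explore next. The paper itself is not reproduced here, so the following is only a minimal sketch of the general idea of entropy-driven sub-goal selection, not the authors' algorithm: it assumes a kernel density estimate (scikit-learn's KernelDensity) over previously achieved goals and picks the candidate goal whose addition most increases the estimated entropy, i.e. a novel, low-density goal. The function name select_exploration_goal, the bandwidth value, and the toy 2-D goal space are all illustrative assumptions.

```python
# Hedged sketch: entropy-based sub-goal selection with a KDE over achieved goals.
# This is NOT the authors' implementation, only an illustration of the principle.
import numpy as np
from sklearn.neighbors import KernelDensity

def select_exploration_goal(achieved_goals, candidate_goals, bandwidth=0.1):
    """Return the candidate goal whose addition most increases the Monte-Carlo
    entropy estimate of the achieved-goal distribution (favouring novel goals)."""
    kde = KernelDensity(bandwidth=bandwidth).fit(achieved_goals)
    # Entropy estimate of the current achieved-goal distribution.
    base_entropy = -kde.score_samples(achieved_goals).mean()
    gains = []
    for g in candidate_goals:
        augmented = np.vstack([achieved_goals, g[None, :]])
        kde_aug = KernelDensity(bandwidth=bandwidth).fit(augmented)
        gains.append(-kde_aug.score_samples(augmented).mean() - base_entropy)
    return candidate_goals[int(np.argmax(gains))]

# Toy usage: achieved goals clustered near the origin of a 2-D goal space;
# the selected goal should lie in a sparsely visited region.
achieved = np.random.randn(200, 2) * 0.3
candidates = np.random.uniform(-2, 2, size=(50, 2))
print("next exploration goal:", select_exploration_goal(achieved, candidates))
```

In the paper, such exploration is additionally guided by pre-trained skills built from recurring goal-transition patterns, which the sketch above does not model.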



Updated: 2024-02-06