A survey on model-based reinforcement learning
Science China Information Sciences (IF 8.8), Pub Date: 2024-01-23, DOI: 10.1007/s11432-022-3696-5
Fan-Ming Luo , Tian Xu , Hang Lai , Xiong-Hui Chen , Weinan Zhang , Yang Yu

Reinforcement learning (RL) interacts with the environment to solve sequential decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world applications, even though RL excels at playing complex video games that permit numerous trial-and-error attempts. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is believed to be a promising direction, as it constructs environment models in which trial and error can occur without incurring actual costs. In this survey, we investigate MBRL with a particular focus on recent advances in deep RL. For non-tabular environments, there is always a generalization error between the learned model and the actual environment. Consequently, it is crucial to analyze the gap between policy training in the environment model and in the actual environment, which guides algorithm design toward better model learning, model utilization, and policy training. In addition, we discuss recent developments of model-based techniques in other forms of RL, such as offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Furthermore, we discuss the applicability and benefits of MBRL for real-world tasks. Finally, the survey concludes with a discussion of promising future directions for MBRL. We believe that MBRL has great unrealized potential and benefits in real-world applications, and we hope this survey will encourage additional research on MBRL.
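To make the core idea concrete, the sketch below illustrates the basic model-based loop the abstract describes: collect a small amount of real experience, fit an environment model from it, then choose actions by "trial and error" inside the learned model rather than in the real environment. This is a minimal, self-contained toy example; the environment, model, and planner (ToyEnv, fit_model, plan_in_model) are illustrative assumptions and not taken from the survey itself.

```python
import numpy as np

class ToyEnv:
    """Toy task: a scalar state that the agent should drive toward 0."""
    def __init__(self):
        self.state = np.random.uniform(-1.0, 1.0)

    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        # True dynamics, unknown to the agent, with a little noise.
        self.state = self.state + 0.1 * action + np.random.normal(0, 0.01)
        reward = -abs(self.state)          # closer to 0 is better
        return self.state, reward


def fit_model(transitions):
    """Learn a linear dynamics model s' = a*s + b*u by least squares."""
    X = np.array([[s, u] for s, u, _ in transitions])
    y = np.array([s_next for _, _, s_next in transitions])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                             # [a, b]


def plan_in_model(state, coef, candidates=(-1.0, 0.0, 1.0), horizon=5):
    """Pick the action whose imagined rollout keeps the state nearest 0."""
    best_action, best_return = None, -np.inf
    for a in candidates:
        s, ret = state, 0.0
        for _ in range(horizon):
            s = coef[0] * s + coef[1] * a   # imagined transition, no real cost
            ret += -abs(s)
        if ret > best_return:
            best_action, best_return = a, ret
    return best_action


env = ToyEnv()
transitions = []

# 1) Collect a little real experience with random actions.
s = env.reset()
for _ in range(50):
    a = np.random.choice([-1.0, 0.0, 1.0])
    s_next, _ = env.step(a)
    transitions.append((s, a, s_next))
    s = s_next

# 2) Learn the environment model from that experience.
coef = fit_model(transitions)

# 3) Act in the real environment by planning inside the learned model.
s = env.reset()
total_reward = 0.0
for _ in range(20):
    a = plan_in_model(s, coef)
    s, r = env.step(a)
    total_reward += r
print(f"learned dynamics coefficients: {coef}, return: {total_reward:.2f}")
```

In this sketch the "generalization error" discussed in the abstract shows up as the mismatch between the fitted linear model and the noisy true dynamics; deep MBRL methods surveyed in the paper replace the linear model with learned neural dynamics and the enumeration-based planner with policy optimization inside the model.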




Updated: 2024-01-26