A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers, International Journal of Neural Systems


International Journal of Neural Systems (IF 8), Pub Date: 2023-10-20, DOI: 10.1142/s012906572350065x
Enrique Adrian Villarrubia-Martin, Luis Rodriguez-Benitez, Luis Jimenez-Linares, David Muñoz-Valero, Jun Liu


Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interaction with an environment. However, traditional RL algorithms have several limitations, such as the need for large amounts of data and the difficulty of long-term credit assignment, i.e., determining which actions actually produced a given reward. Recently, Transformers have shown that they can address these limitations in the offline RL setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and to address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent based on the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the two agents, the framework improves the online agent's training efficiency in the first iterations, and it also improves the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
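As a rough illustration of what handing a replay buffer from an online agent to a Decision Transformer style agent might involve, the sketch below reshapes stored transitions into per-episode (return-to-go, state, action) sequences, the input format commonly associated with the Decision Transformer. It is not taken from the paper: the `Transition` class, the function names, the integer states and actions, and the undiscounted return-to-go are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Transition:
    """One replay-buffer entry (hypothetical layout, not from the paper)."""
    state: int
    action: int
    reward: float
    done: bool


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return-to-go: sum of rewards from step t to episode end."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def buffer_to_sequences(
    buffer: Iterable[Transition],
) -> List[List[Tuple[float, int, int]]]:
    """Split a flat buffer into episodes of (return-to-go, state, action).

    Assumes transitions are stored in the order they were collected.
    A trailing episode that never reached done=True is dropped.
    """
    episodes: List[List[Tuple[float, int, int]]] = []
    current: List[Transition] = []
    for tr in buffer:
        current.append(tr)
        if tr.done:
            rtg = returns_to_go([t.reward for t in current])
            episodes.append(
                [(g, t.state, t.action) for g, t in zip(rtg, current)]
            )
            current = []
    return episodes


if __name__ == "__main__":
    demo = [
        Transition(0, 1, 1.0, False),
        Transition(1, 0, 0.0, False),
        Transition(2, 1, 2.0, True),
    ]
    for step in buffer_to_sequences(demo)[0]:
        print(step)
```

In a hybrid setup of the kind the abstract describes, sequences like these could be used to train the Transformer agent on data gathered by the online agent. How the paper actually schedules the exchange between the two agents is not specified in the abstract.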


Updated: 2023-10-20
