An improved DDPG algorithm based on evolution-guided transfer in reinforcement learning
Journal of Physics: Conference Series, Pub Date: 2024-02-01, DOI: 10.1088/1742-6596/2711/1/012016
Xueqian Bai, Haonian Wang

Deep Reinforcement Learning (DRL) algorithms enable agents to act autonomously in sophisticated control tasks. However, when deep neural networks (DNNs) are used as function approximators, DRL is challenged by sparse rewards and the long training time required for exploration. Evolutionary Algorithms (EAs), a family of black-box optimization techniques, apply well to single-agent real-world problems and are untroubled by temporal credit assignment; however, both approaches demand large amounts of sampled data. To advance DRL research on pursuit-evasion games, this paper contributes a novel policy optimization algorithm named Evolutionary Algorithm Transfer - Deep Deterministic Policy Gradient (EAT-DDPG). EAT-DDPG incorporates parameter transfer, initializing the DNN of DDPG with parameters evolved by an EA. In addition, the diverse experiences produced by the EA are stored in DDPG's replay buffer before the EA phase terminates. EAT-DDPG is thus an improved version of DDPG that aims to maximize the reward of the DDPG-trained agent within a finite number of episodes. The experimental environment is a pursuit-evasion scenario in which the evader moves according to a fixed policy; the results show that the agent explores policies more efficiently with EAT-DDPG during learning.
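
To make the two transfer mechanisms in the abstract concrete, here is a minimal Python sketch, not the paper's implementation: the ToyPursuitEnv, the linear policy, and the truncation-selection EA are all illustrative assumptions. It shows (1) an EA phase whose rollouts both evolve policy parameters and fill a replay buffer, (2) the transfer of the best evolved parameters to initialize the actor, after which (3) standard DDPG gradient updates would proceed from that warm start instead of from scratch.

```python
# Sketch of the EAT-DDPG warm-start idea: evolve policy parameters with a
# simple EA, store every EA-generated transition in a replay buffer, then
# hand both the parameters and the buffer to DDPG. Environment, policy
# class, and EA scheme are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

class ToyPursuitEnv:
    """Hypothetical 1-D pursuit task: the agent chases a fixed-policy evader."""
    def reset(self):
        self.agent, self.evader, self.t = 0.0, 5.0, 0
        return np.array([self.evader - self.agent])

    def step(self, action):
        self.agent += float(np.clip(action, -1.0, 1.0))
        self.evader += 0.1                      # evader follows a fixed policy
        self.t += 1
        gap = self.evader - self.agent
        reward = -abs(gap)                      # higher reward the closer the pursuit
        done = self.t >= 50 or abs(gap) < 0.1
        return np.array([gap]), reward, done

def rollout(env, theta, buffer=None):
    """Run one episode with a linear policy a = theta[0]*s + theta[1];
    optionally record each transition for the DDPG replay buffer."""
    s, ret, done = env.reset(), 0.0, False
    while not done:
        a = theta[0] * s[0] + theta[1]
        s2, r, done = env.step(a)
        if buffer is not None:
            buffer.append((s.copy(), a, r, s2.copy(), done))
        ret += r
        s = s2
    return ret

# --- EA phase: evolve parameters while storing every experience produced ---
env, replay_buffer = ToyPursuitEnv(), []
population = [rng.normal(0.0, 1.0, size=2) for _ in range(20)]
for gen in range(30):
    scored = sorted(population,
                    key=lambda th: rollout(env, th, replay_buffer),
                    reverse=True)
    elites = scored[:5]                         # truncation selection
    population = elites + [elites[rng.integers(len(elites))]
                           + rng.normal(0.0, 0.1, size=2)  # Gaussian mutation
                           for _ in range(15)]

# --- Transfer phase: initialize the DDPG actor from the evolved parameters ---
# In the full algorithm this would copy the evolved weights into the actor
# DNN; here the "actor" is the same linear policy, so the copy is direct.
best_theta = max(population, key=lambda th: rollout(env, th))
actor_params = best_theta.copy()

# --- DDPG phase (not shown): gradient updates start from actor_params and
# sample minibatches from replay_buffer, which already holds the diverse
# EA-generated experiences instead of being empty.
print(f"EA best return: {rollout(env, best_theta):.2f}, "
      f"replay buffer seeded with {len(replay_buffer)} transitions")
```

Under these assumptions, the key design point is that the EA's rollouts are not wasted: the same episodes that score candidate parameters also pre-populate the replay buffer, so DDPG begins learning from a diverse, non-empty experience set and a non-random actor.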
