A Dual Decision-Making Continuous Reinforcement Learning Method Based on Sim2Real
International Journal of Software Engineering and Knowledge Engineering (IF 0.9), Pub Date: 2023-11-22, DOI: 10.1142/s0218194023500626
Wenwen Xiao, Xinzhi Wang, Xiangfeng Luo, Shaorong Xie

Continuous reinforcement learning carries potential security risks when applied in real-world scenarios, which could have significant societal implications. Although its range of applications is expanding, most applications remain confined to virtual environments. If only a single continuous learning method is applied to an unmanned system, the system still forgets previously learned experience and must be retrained when it encounters unknown environments, which reduces its learning efficiency. To address these issues, some scholars have proposed prioritizing the experience replay pool and using transfer learning to apply previously learned strategies to new environments. However, these methods only slow the rate at which the unmanned system forgets its experience; they do not fundamentally solve the problem. Nor can they prevent the system from taking dangerous actions or falling into local optima. Therefore, we propose a dual decision-making continuous learning method based on simulation to reality (Sim2Real). The method employs a knowledge body to escape the local-optimum dilemma and corrects bad strategies in a timely manner to ensure that the unmanned system makes the best decision every time. Our experimental results show that the method achieves a 30% higher success rate than other state-of-the-art methods and that the model remains highly effective when transferred to real scenes.
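
For readers unfamiliar with the prioritized experience replay that the abstract cites as prior work, the following is a minimal sketch of the generic proportional variant. It is not the authors' method. The class name `PrioritizedReplayBuffer`, the parameters `alpha` and `eps`, and the use of the absolute TD error as the priority are illustrative assumptions that do not come from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical sketch: sample index i with probability p_i**alpha / sum_k p_k**alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling; larger values favor high priority
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_idx = 0       # slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority (1.0 if empty).
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement, weighted by priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idxs = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return idxs, [self.items[i] for i in idxs]

    def update(self, idxs, td_errors):
        # After a learning step, set each sampled priority to |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Because old transitions are overwritten once the buffer reaches capacity, a scheme of this kind can only slow forgetting, which is consistent with the limitation the abstract describes.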




Updated: 2023-11-22