Potential-based reward shaping using state–space segmentation for efficiency in reinforcement learning
Future Generation Computer Systems (IF 7.5), Pub Date: 2024-04-07, DOI: 10.1016/j.future.2024.03.057
Melis İlayda Bal, Hüseyin Aydın, Cem İyigün, Faruk Polat

Reinforcement Learning (RL) algorithms learn slowly in environments with sparse explicit reward structures because the agent receives limited feedback on its behavior. This problem is exacerbated in complex tasks with large state and action spaces. To address this inefficiency, we propose a novel approach based on state–space segmentation that decomposes the task and provides more frequent feedback to the agent. Our approach extracts state–space segments by formulating the problem as a minimum cut problem on a transition graph constructed from the agent’s experiences while interacting with the environment. These segments are then leveraged in the agent’s learning process through potential-based reward shaping. Experiments on benchmark problem domains with sparse rewards demonstrate that the proposed method effectively accelerates the agent’s learning without compromising computation time, while upholding the policy invariance principle.
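The sketch below illustrates the general idea described in the abstract, not the authors' implementation: a weighted transition graph is built from observed experiences, a global minimum cut splits the states into two segments, and a segment-based potential function drives potential-based reward shaping F(s, s') = γΦ(s') − Φ(s), which is the standard form that preserves policy invariance. The graph construction, the Stoer–Wagner cut, and the goal-based potential values are illustrative assumptions.

```python
# A minimal sketch (assumptions noted above), not the paper's algorithm.
from collections import Counter
import networkx as nx

GAMMA = 0.99  # discount factor, assumed

def build_transition_graph(transitions):
    """Undirected graph whose edge weights count observed (s -> s') transitions."""
    counts = Counter()
    for s, _, s_next in transitions:
        if s != s_next:
            counts[frozenset((s, s_next))] += 1
    g = nx.Graph()
    for pair, w in counts.items():
        u, v = tuple(pair)
        g.add_edge(u, v, weight=w)
    return g

def segment_states(graph):
    """Split the state space into two segments via a global minimum cut."""
    _, (seg_a, seg_b) = nx.stoer_wagner(graph)
    return set(seg_a), set(seg_b)

def make_potential(segments, goal_state):
    """Assumed heuristic: higher potential for the segment containing the goal."""
    seg_a, seg_b = segments
    goal_segment = seg_a if goal_state in seg_a else seg_b
    return lambda s: 1.0 if s in goal_segment else 0.0

def shaped_reward(r, s, s_next, phi):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + GAMMA * phi(s_next) - phi(s)

if __name__ == "__main__":
    # Toy 6-state chain with a bottleneck between states 2 and 3.
    experience = [(0, 0, 1), (1, 0, 2), (2, 0, 1), (1, 0, 0),
                  (2, 0, 3), (3, 0, 4), (4, 0, 5), (5, 0, 4), (4, 0, 3)]
    graph = build_transition_graph(experience)
    segments = segment_states(graph)
    phi = make_potential(segments, goal_state=5)
    print("segments:", segments)                         # {0, 1, 2} and {3, 4, 5}
    print("shaped reward 2 -> 3:", shaped_reward(0.0, 2, 3, phi))  # 0.99 bonus at the cut
```

In this toy example the least-traversed edge (2–3) is the minimum cut, so crossing it toward the goal segment yields an extra shaping reward even when the environment reward is zero, which is the kind of more frequent feedback the abstract refers to.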
