Uncertainty-aware Distributional Offline Reinforcement Learning
arXiv - CS - Machine Learning. Pub Date: 2024-03-26, DOI: arxiv-2403.17646
Xiaocong Chen, Siyu Wang, Tong Yu, Lina Yao

Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this setting is ensuring the safety of the learned policy by quantifying the uncertainties associated with different actions and with environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method that simultaneously addresses both epistemic uncertainty and environmental stochasticity. The resulting model-free offline RL algorithm learns risk-averse policies and characterizes the entire distribution of discounted cumulative rewards, rather than merely maximizing the expected discounted return. Our method is rigorously evaluated through comprehensive experiments on both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
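The two ingredients named in the abstract, a critic that represents the full return distribution and a risk-averse criterion optimized over that distribution instead of its mean, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it shows one common way to realize these ideas with a quantile-based distributional critic and a CVaR-style risk measure in PyTorch. All names here (QuantileCritic, quantile_huber_loss, cvar_from_quantiles, N_QUANTILES, RISK_LEVEL) are hypothetical placeholders chosen for this example.

```python
# Illustrative sketch only: a quantile-based distributional critic with a
# CVaR-style risk-averse objective. Assumes standard PyTorch; all names are
# placeholders, not the paper's code.
import torch
import torch.nn as nn

N_QUANTILES = 32   # number of quantile estimates used to represent Z(s, a)
RISK_LEVEL = 0.25  # alpha for CVaR_alpha: average over the worst 25% of returns


class QuantileCritic(nn.Module):
    """Maps (state, action) to N_QUANTILES quantile estimates of the return distribution."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch, N_QUANTILES)
        return self.net(torch.cat([state, action], dim=-1))


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """Quantile regression (Huber) loss between predicted and target quantiles."""
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32, device=pred.device) + 0.5) / N_QUANTILES
    # Pairwise TD errors, shape (batch, N_target, N_pred)
    td = target.unsqueeze(2) - pred.unsqueeze(1)
    huber = torch.where(td.abs() <= kappa, 0.5 * td.pow(2), kappa * (td.abs() - 0.5 * kappa))
    loss = (taus.view(1, 1, -1) - (td < 0).float()).abs() * huber / kappa
    return loss.mean()


def cvar_from_quantiles(quantiles: torch.Tensor, alpha: float = RISK_LEVEL) -> torch.Tensor:
    """Approximate CVaR_alpha by averaging the lowest alpha-fraction of quantile estimates."""
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    k = max(1, int(alpha * quantiles.shape[-1]))
    return sorted_q[..., :k].mean(dim=-1)
```

Under this sketch, the critic would be fit with quantile_huber_loss against distributional Bellman targets built from the offline dataset, and a risk-averse actor would be trained to maximize cvar_from_quantiles(critic(s, pi(s))) instead of the mean over quantiles; the paper's actual algorithm may differ in how it handles epistemic uncertainty and the offline data constraint.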

Updated: 2024-03-27