On maximizing probabilities for over-performing a target for Markov decision processes
Optimization and Engineering ( IF 2.1 ) Pub Date : 2023-12-08 , DOI: 10.1007/s11081-023-09870-4
Tanhao Huang , Yanan Dai , Jinwen Chen

This paper studies the dual relation between risk-sensitive control and large-deviation control, i.e., maximizing the probability of over-performing a target, for Markov decision processes. To derive the desired duality, we apply a non-linear extension of the Krein-Rutman theorem to characterize the optimal risk-sensitive value and prove that there exists an optimal policy which is stationary and deterministic. The right-hand derivative of this value function is used to characterize the specific targets for which the duality holds. It is proved that the optimal policy for the over-performing probability can be approximated by the optimal one for the risk-sensitive control. The range of the right- and left-hand derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between these two types of optimal control problems are also presented.
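The duality the abstract describes can be illustrated numerically. The sketch below (a hypothetical toy example, not taken from the paper) computes the optimal risk-sensitive value Λ(θ) = limₙ (1/n) log supₚᵢ E[exp(θ Sₙ)] for a small finite MDP by relative value iteration on the multiplicative Bellman operator (the nonlinear eigenvalue problem to which the Krein-Rutman-type argument applies), then forms the Legendre-type dual I(c) = sup_θ [θc − Λ(θ)], which gives the exponential decay rate of the over-performing probability P(Sₙ/n ≥ c). The MDP data `P`, `r` and the grid of θ values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions (illustration only).
# P[a, s, s'] are transition probabilities, r[a, s] one-step rewards.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def risk_sensitive_value(theta, n_iter=500):
    """Approximate Lambda(theta) = lim_n (1/n) log sup_pi E[exp(theta*S_n)]
    by relative value iteration on the multiplicative Bellman operator,
    written in log space. The maximizing action in each state yields a
    stationary deterministic optimal policy, as in the abstract."""
    v = np.zeros(P.shape[1])
    lam = 0.0
    for _ in range(n_iter):
        # q[a, s] = theta*r(a, s) + log sum_{s'} P(s'|s, a) exp(v(s'))
        q = theta * r + np.log(np.einsum('ast,t->as', P, np.exp(v)))
        v_new = q.max(axis=0)   # optimize over actions
        lam = v_new[0]          # normalize so the iteration stays bounded
        v = v_new - lam
    return lam

def rate_function(c, thetas=np.linspace(1e-3, 5.0, 200)):
    """Legendre-type dual I(c) = sup_theta [theta*c - Lambda(theta)]:
    the large-deviation decay rate of P(S_n / n >= c) under duality."""
    return max(theta * c - risk_sensitive_value(theta) for theta in thetas)
```

For targets c where the supremum is attained at an interior θ (the case singled out in the abstract via the one-sided derivatives of Λ), the maximizing θ identifies the risk-sensitive problem whose optimal policy approximates the optimal over-performing policy.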




Updated: 2023-12-08