Anderson acceleration for partially observable Markov decision processes: A maximum entropy approach
Automatica (IF 6.4), Pub Date: 2024-02-08, DOI: 10.1016/j.automatica.2024.111557
Mingyu Park, Jaeuk Shin, Insoon Yang

Partially observable Markov decision processes (POMDPs) are a rich mathematical framework that covers a large class of complex sequential decision-making problems under uncertainty with limited observations. However, the complexity of POMDPs poses various computational challenges, motivating the need for an efficient algorithm that rapidly finds a good enough suboptimal solution. In this paper, we propose a novel accelerated offline POMDP algorithm that exploits Anderson acceleration (AA), a technique for efficiently solving fixed-point problems using previous solution estimates. Our algorithm is based on the Q-function approximation (QMDP) method to alleviate the scalability issue inherent in POMDPs. Inspired by the quasi-Newton interpretation of AA, we propose a maximum entropy variant of QMDP, which we call soft QMDP, to fully benefit from AA. We prove that the overall algorithm converges to the suboptimal solution obtained by soft QMDP. Our algorithm can also be implemented in a model-free manner using simulation data. Provable error bounds on the residual and the solution are provided to examine how simulation errors propagate through the proposed algorithm. Finally, the performance of our algorithm is tested on several benchmark problems. According to our experimental results, the proposed algorithm converges significantly faster than its standard counterparts without degrading the solution quality.
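To make the fixed-point view in the abstract concrete, the following is a minimal NumPy sketch of Anderson acceleration applied to a soft (log-sum-exp) Bellman operator on the underlying MDP, followed by the standard QMDP rule that weights Q-values by the belief. It is an illustration under stated assumptions, not the authors' algorithm: the names `soft_bellman`, `anderson`, and `qmdp_action`, the temperature `tau`, the memory size, the residual-decrease safeguard, and the random toy model are all hypothetical choices made for this example.

```python
import numpy as np

def soft_bellman(Q, P, R, gamma, tau):
    """Soft Bellman operator on a table Q of shape (S, A); P has shape (S, A, S)."""
    m = Q.max(axis=1, keepdims=True)
    # Numerically stable tau * logsumexp(Q / tau) over actions.
    V = m[:, 0] + tau * np.log(np.exp((Q - m) / tau).sum(axis=1))
    return R + gamma * (P @ V)

def anderson(F, x0, memory=5, tol=1e-10, max_evals=2000):
    """Anderson-accelerated fixed-point iteration for x = F(x).

    Assumption: a simple safeguard falls back to the plain step F(x)
    whenever the accelerated candidate does not reduce the residual.
    """
    x, g, evals = x0.copy(), F(x0), 1
    X, G = [x.ravel()], [g.ravel()]          # history of iterates and their images
    while evals < max_evals:
        res = np.abs(g - x).max()
        if res < tol:
            break
        cand = g                               # plain fixed-point step by default
        if len(X) > 1:
            Gm = np.stack(G, axis=1)
            Fm = Gm - np.stack(X, axis=1)      # residual history, newest column last
            # Least-squares mixing coefficients on residual differences.
            coef, *_ = np.linalg.lstsq(np.diff(Fm, axis=1), Fm[:, -1], rcond=None)
            cand = (Gm[:, -1] - np.diff(Gm, axis=1) @ coef).reshape(x0.shape)
        g_cand = F(cand)
        evals += 1
        if cand is not g and np.abs(g_cand - cand).max() >= res:
            cand = g                           # safeguard: reject the accelerated step
            g_cand = F(cand)
            evals += 1
        x, g = cand, g_cand
        X.append(x.ravel())
        G.append(g.ravel())
        X, G = X[-(memory + 1):], G[-(memory + 1):]
    return x, evals

def qmdp_action(belief, Q):
    """Standard QMDP rule: pick the action maximizing the belief-weighted Q-value."""
    return int(np.argmax(belief @ Q))

# Toy random model (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
S, A, gamma, tau = 20, 4, 0.9, 0.5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
F = lambda Q: soft_bellman(Q, P, R, gamma, tau)

Q_aa, n_evals = anderson(F, np.zeros((S, A)))

Q_plain = np.zeros((S, A))
for _ in range(2000):                          # unaccelerated baseline
    Q_plain = F(Q_plain)

action = qmdp_action(np.full(S, 1.0 / S), Q_aa)
```

The log-sum-exp in `soft_bellman` replaces the hard max of standard value iteration, which makes the operator smooth; this smoothness is one reading of why a maximum entropy variant pairs well with the quasi-Newton interpretation of AA mentioned in the abstract. Comparing `n_evals` with the number of plain iterations needed for the same tolerance gives a rough sense of the speed-up on this toy problem.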

Updated: 2024-02-08