Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration, Science China Information Sciences

Science China Information Sciences (IF 8.8), Pub Date: 2024-03-27, DOI: 10.1007/s11432-023-3982-y
Xiushan Jiang, Yanshuang Wang, Dongya Zhao, Ling Shi

This study investigates the Pareto optimal strategy problem for multi-player mean-field stochastic systems governed by Itô differential equations using reinforcement learning (RL), and derives a partially model-free solution for Pareto optimal control. First, by exploiting the convexity of the cost functions, the Pareto optimal control problem is solved via a weighted-sum optimal control problem. Subsequently, using on-policy RL, a novel policy iteration (PI) algorithm based on the ℌ-representation technique is presented. In particular, the algorithm alternates between policy evaluation and policy update steps, and the Pareto optimal control policy is obtained when no further improvement in system performance occurs. This avoids directly solving the complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples demonstrate the effectiveness of the proposed algorithm.
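
The two ideas named in the abstract, weighted-sum scalarization and alternating policy evaluation with policy update, can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a deterministic, scalar-state, two-player linear-quadratic problem with known dynamics, so it contains no mean-field terms, no Itô noise, no ℌ-representation, and nothing model-free. All names (`weighted_sum`, `policy_iteration`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def weighted_sum(weights, values):
    """Scalarize per-player quantities with Pareto weights (assumed to sum to 1)."""
    return sum(w * v for w, v in zip(weights, values))


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Kleinman-style policy iteration for a scalar-state LQ problem.

    Assumed dynamics: dx/dt = a*x + sum_j b[j]*u_j, with feedback u_j = -k[j]*x.
    Assumed cost:     integral of q*x^2 + sum_j r[j]*u_j^2.
    k0 must be stabilizing, i.e. a - sum_j b[j]*k0[j] < 0.
    """
    k = list(k0)
    p_prev = math.inf
    for _ in range(max_iter):
        a_cl = a - sum(bj * kj for bj, kj in zip(b, k))
        if a_cl >= 0:
            raise ValueError("gain is not stabilizing")
        # Policy evaluation: solve 2*a_cl*p + q + sum_j r_j*k_j^2 = 0 for p.
        stage = q + sum(rj * kj * kj for rj, kj in zip(r, k))
        p = -stage / (2.0 * a_cl)
        # Policy update: k_j = b_j * p / r_j.
        k = [bj * p / rj for bj, rj in zip(b, r)]
        # Stop when the evaluated cost no longer improves.
        if abs(p_prev - p) < tol:
            break
        p_prev = p
    return p, k


# Hypothetical two-player data (illustrative values only).
a = 1.0                       # open-loop unstable drift
b = [1.0, 0.5]                # input gains of player 1 and player 2
w = [0.4, 0.6]                # Pareto weights
q_players = [1.0, 3.0]        # each player's state weight
r_players = [[1.0, 2.0],      # player 1's weights on (u1, u2)
             [2.0, 1.0]]      # player 2's weights on (u1, u2)

# Weighted-sum cost parameters.
q = weighted_sum(w, q_players)
r = [weighted_sum(w, [r_players[i][j] for i in range(2)]) for j in range(2)]

p, k = policy_iteration(a, b, q, r, k0=[2.0, 0.0])

# Closed-form root of the scalar Riccati equation 2*a*p - s*p^2 + q = 0.
s = sum(bj * bj / rj for bj, rj in zip(b, r))
p_riccati = (a + math.sqrt(a * a + s * q)) / s
```

In this toy setting the iterate `p` can be compared against `p_riccati`, which shows the point made in the abstract in miniature: the iteration reaches the Riccati solution without solving the Riccati equation directly. Sweeping the weights `w` over the simplex would trace out different Pareto optimal gains under the same assumptions.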

Updated: 2024-03-27
