Decentralized fused-learner architectures for Bayesian reinforcement learning, Artificial Intelligence


Artificial Intelligence (IF 14.4), Pub Date: 2024-02-13, DOI: 10.1016/j.artint.2024.104094
Augustin A. Saucan, Subhro Das, Moe Z. Win

Decentralized training is a robust solution for learning over an extensive network of distributed agents. Many existing solutions average locally inferred parameters, which constrains the architecture to independent agents with identical learning algorithms. Here, we propose decentralized fused-learner architectures for Bayesian reinforcement learning, named fused Bayesian-learner architectures (FBLAs), that are capable of learning an optimal policy by fusing potentially heterogeneous Bayesian policy gradient learners, i.e., agents that employ different learning architectures to estimate the gradient of a control policy. The novelty of FBLAs lies in fusing the full posterior distributions of the local policy gradients. Higher-order information, i.e., probabilistic uncertainty, is used to robustly fuse the locally trained parameters. FBLAs find the barycenter of all local posterior densities by minimizing the total Kullback–Leibler divergence from the barycenter distribution to the local posterior densities. The proposed FBLAs are demonstrated on a sensor-selection problem for Bernoulli tracking, where multiple sensors observe a dynamic target and only a subset of sensors is allowed to be active at any time.
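To make the barycenter idea concrete, here is a minimal sketch of fusing local posteriors by minimizing a weighted sum of Kullback–Leibler divergences from the fused density to each local density. It is not the authors' implementation. It assumes, purely for illustration, that each local posterior over the policy gradient is a Gaussian with diagonal covariance; the abstract does not state the form of the posteriors. Under that assumption the minimizer is the normalized weighted geometric mean of the local densities, which for Gaussians has a closed form: the fused precision is the weighted average of the local precisions, and the fused mean is the precision-weighted average of the local means. The function names `fuse_diag_gaussians`, `kl_diag`, and `total_kl`, and the example numbers, are hypothetical.

```python
from math import log


def fuse_diag_gaussians(means, variances, weights=None):
    """Closed-form minimizer of sum_i w_i * KL(q || p_i) over q.

    Assumption (not from the paper): every p_i is a Gaussian with
    diagonal covariance, given per dimension as a mean and a variance.
    Weights default to uniform and are expected to sum to 1.
    """
    n = len(means)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dim):
        # Fused precision: weighted average of the local precisions.
        precision = sum(w / v[d] for w, v in zip(weights, variances))
        # Fused mean: precision-weighted average of the local means.
        weighted = sum(
            w * m[d] / v[d] for w, m, v in zip(weights, means, variances)
        )
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


def kl_diag(mean_q, var_q, mean_p, var_p):
    """KL(q || p) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        vq / vp + (mp - mq) ** 2 / vp - 1.0 + log(vp / vq)
        for mq, vq, mp, vp in zip(mean_q, var_q, mean_p, var_p)
    )


def total_kl(mean_q, var_q, means, variances, weights):
    """Weighted total divergence from q to all local posteriors."""
    return sum(
        w * kl_diag(mean_q, var_q, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Two hypothetical learners estimating a 2-D policy gradient.
# The first is more confident (smaller variance) in dimension 0.
local_means = [[1.0, 0.0], [3.0, 0.0]]
local_vars = [[1.0, 1.0], [4.0, 1.0]]
uniform = [0.5, 0.5]

fused_mean, fused_var = fuse_diag_gaussians(local_means, local_vars, uniform)
print(fused_mean, fused_var)
print(total_kl(fused_mean, fused_var, local_means, local_vars, uniform))
```

In this toy case the fused mean in dimension 0 is about 1.4, closer to the confident learner's estimate of 1.0 than the plain parameter average of 2.0. That illustrates how uncertainty information changes the result compared with simple averaging, under the Gaussian assumption stated above.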


Updated: 2024-02-13
