Cognition-Driven Multiagent Policy Learning Framework for Promoting Cooperation
IEEE Transactions on Games (IF 2.3), Pub Date: 2022-06-27, DOI: 10.1109/tg.2022.3186386
Zhiqiang Pu 1 , Huimu Wang 2 , Boyin Liu 1 , Jianqiang Yi 1

Many attempts have been made to promote cooperation in multiagent systems. However, several issues that draw less attention but may dramatically degrade cooperation performance still exist, such as redundant information interactions among neighbors and difficulties in understanding complex and dynamic environments from high-level cognition. To address these limitations, a cognition-driven multiagent policy (CDMAP) learning framework is proposed in this article. It includes a cognition difference network (CDN), a coupling cognition network (CCN), and a policy optimization network (PON). CDN is designed based on a variational autoencoder, where a concept of cognition difference is defined to prune redundant interactions among agents for more efficient communication. Based on the pruned topology, CCN captures the hidden representations of the surrounding environment. Several coupling graph attention layers are incorporated in CCN, each with different but coupled adjacency matrices, yielding a comprehensive state understanding from multiple representation spaces. Based on the captured hidden states, PON generates the final policies, where QMIX is adopted as a value factorization method to alleviate the credit-assignment problem. Finally, CDMAP is evaluated on two representative multiagent games, Google Research Football and StarCraft II. The results demonstrate its superior effectiveness compared with existing methods.
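
The abstract names QMIX as the value factorization method used in PON. As a rough illustration of what that involves in general (not the authors' implementation), the sketch below mixes per-agent Q-values into a joint value using state-conditioned, non-negative weights, which is what makes the joint value monotonic in each agent's Q-value. The class name `ToyQmixMixer`, the layer sizes, the single-layer hypernetworks, and the random untrained weights are all assumptions made for this example.

```python
import math
import random


def elu(x):
    """Exponential linear unit, a monotonically increasing activation."""
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, vec):
    """Dense layer: `weights` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


class ToyQmixMixer:
    """Hypothetical, untrained QMIX-style mixer with one hidden layer."""

    def __init__(self, n_agents, state_dim, hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        self.hidden = hidden

        def rand_layer(n_out, n_in):
            w = [[rng.uniform(-1, 1) for _ in range(n_in)]
                 for _ in range(n_out)]
            b = [rng.uniform(-1, 1) for _ in range(n_out)]
            return w, b

        # Hypernetworks (simplified to single linear layers here):
        # each maps the global state to parameters of the mixing network.
        self.hyper_w1 = rand_layer(n_agents * hidden, state_dim)
        self.hyper_b1 = rand_layer(hidden, state_dim)
        self.hyper_w2 = rand_layer(hidden, state_dim)
        self.hyper_b2 = rand_layer(1, state_dim)

    def q_total(self, agent_qs, state):
        # abs() keeps the mixing weights non-negative, so the joint value
        # can never decrease when an individual agent's Q-value increases.
        w1 = [abs(x) for x in linear(*self.hyper_w1, state)]
        b1 = linear(*self.hyper_b1, state)
        w2 = [abs(x) for x in linear(*self.hyper_w2, state)]
        b2 = linear(*self.hyper_b2, state)[0]

        hidden_out = []
        for j in range(self.hidden):
            pre = b1[j] + sum(agent_qs[i] * w1[i * self.hidden + j]
                              for i in range(self.n_agents))
            hidden_out.append(elu(pre))
        return b2 + sum(h * w for h, w in zip(hidden_out, w2))


if __name__ == "__main__":
    mixer = ToyQmixMixer(n_agents=3, state_dim=2)
    state = [0.5, -1.0]
    print(mixer.q_total([1.0, 2.0, 3.0], state))
```

Because the weights are non-negative and the activation is increasing, each agent can pick its own greedy action and the joint action still maximizes the mixed value; that property is what lets a factorized method address credit assignment with decentralized execution.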

Updated: 2022-06-27