RevAP: A bankruptcy-based algorithm to solve the multi-agent credit assignment problem in task start threshold-based multi-agent systems
Robotics and Autonomous Systems (IF 4.3), Pub Date: 2024-01-12, DOI: 10.1016/j.robot.2024.104631
Hossein Yarahmadi, Mohammad Ebrahim Shiri, Hamidreza Navidi, Arash Sharifi, Moharram Challenger

Multi-Agent Systems (MASs) are a prominent symbol of Distributed Artificial Intelligence (DAI). Learning in MASs, which is commonly based on Reinforcement Learning (RL), plays an essential role in unknown environments. To solve the Multi-agent Credit Assignment (MCA) problem, this study introduces the Task Start Threshold (TST) of agents as a new constraint in a multi-score operational environment, which transforms the MCA problem into a bankruptcy problem. Building on the bankruptcy concept, a new base algorithm called Reverse Adjusted Proportional (RevAP) is introduced. Based on this algorithm, three methods, PTST, T-MAS, and T-KAg, are presented to solve the MCA problem with different strategies. The proposed methods are evaluated in terms of group learning rate, confidence, expertness, certainty, efficiency, correctness, and density, and compared with state-of-the-art methods: ranking, dynamic, and history-based methods as knowledge-based methods; Counterfactual Multi-Agent Policy Gradient (COMA) as an example of policy-based methods; Value-Decomposition Network (VDN) as an example of value-based methods; and Shapley Q-value Deep Deterministic Policy Gradient (SQDDPG) as a game theory-based method. The results show that the proposed approach outperforms the existing methods on the majority of these parameters.
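To make the bankruptcy framing concrete, the sketch below implements the classic proportional rule from the bankruptcy literature. It is not RevAP, whose definition the abstract does not give. It assumes, for illustration only, that the global reward plays the role of the estate and that each agent's TST plays the role of its claim; the function name `proportional_rule` and the sample numbers are hypothetical.

```python
from typing import List


def proportional_rule(estate: float, claims: List[float]) -> List[float]:
    """Classic proportional bankruptcy rule (NOT the paper's RevAP algorithm).

    Assumption for illustration: `estate` stands for the global reward to be
    shared, and `claims` stands for the agents' task start thresholds.
    """
    if estate < 0 or any(c < 0 for c in claims):
        raise ValueError("estate and claims must be non-negative")

    total = sum(claims)
    if total == 0:
        # Nobody claims anything, so nobody receives anything.
        return [0.0 for _ in claims]
    if estate >= total:
        # Not a bankruptcy situation: every claim can be met in full.
        return [float(c) for c in claims]

    # Bankruptcy situation: each agent receives the same fraction of its claim.
    return [estate * c / total for c in claims]


# Hypothetical example: a reward of 60 shared among claims of 20, 30 and 50.
shares = proportional_rule(60.0, [20.0, 30.0, 50.0])
```

In this example the claims sum to 100, which exceeds the estate of 60, so each agent receives 60% of its claim. Other bankruptcy rules would divide the same estate differently; the proportional rule is used here only because it is the simplest baseline.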




Updated: 2024-01-12