A simulation and experimentation architecture for resilient cooperative multiagent reinforcement learning models operating in contested and dynamic environments


SIMULATION (IF 1.6), Pub Date: 2024-03-23, DOI: 10.1177/00375497241232432
Ishan Honhaga, Claudia Szabo

Cooperative multiagent reinforcement learning (c-MARL) approaches are increasingly being used to make decisions in contested and dynamic environments, which tend to differ widely from the environments used to train them. As such, there is a need for a more in-depth understanding of their resilience and robustness in conditions such as network partitions, node failures, or attacks. In this article, we propose a modeling and simulation framework that explores the resilience of four c-MARL models when faced with different types of attacks, and the impact that training with different perturbations has on the effectiveness of these attacks. We show that c-MARL approaches are highly vulnerable to perturbations of observation, action reward, and communication, with performance dropping by more than 80% from the baseline. We also show that appropriate training with perturbations can dramatically improve performance in some cases, but can also result in overfitting, making the models less resilient against other attacks. This is a first step toward a more in-depth understanding of the resilience of c-MARL models and the effect that contested environments can have on their behavior, and toward the resilience of complex systems in general.
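The abstract does not describe how the framework injects perturbations or scores a model, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a hypothetical cooperative task in which every agent must report the same integer target, a Gaussian-noise attack on observations, and a "performance drop" defined as the fractional loss relative to the unperturbed baseline. All names (`perturb_observation`, `evaluate`, `relative_drop`) and parameters are illustrative assumptions.

```python
import random
from typing import List, Sequence


def perturb_observation(obs: Sequence[float], sigma: float,
                        rng: random.Random) -> List[float]:
    """Hypothetical observation attack: add zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in obs]


def relative_drop(baseline: float, attacked: float) -> float:
    """Fractional performance drop relative to a positive baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - attacked) / baseline


def evaluate(sigma: float, episodes: int = 2000, n_agents: int = 3,
             seed: int = 0) -> float:
    """Mean team reward on a toy task under observation noise `sigma`.

    Each agent observes the integer target and acts by rounding what it
    sees. The team is rewarded only if every agent reports the target,
    so a single misled agent costs the whole team the episode.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        target = rng.randint(0, 4)
        clean = [[float(target)] for _ in range(n_agents)]
        seen = [perturb_observation(o, sigma, rng) for o in clean]
        actions = [round(o[0]) for o in seen]
        total += 1.0 if all(a == target for a in actions) else 0.0
    return total / episodes


if __name__ == "__main__":
    baseline = evaluate(sigma=0.0)   # no attack
    attacked = evaluate(sigma=1.0)   # noisy observations
    print(f"baseline={baseline:.3f} attacked={attacked:.3f} "
          f"drop={relative_drop(baseline, attacked):.1%}")
```

Because the team reward requires every agent to act correctly, noise on individual observations compounds across agents, which is one intuitive reason cooperative settings can degrade sharply under perturbation. The numbers this toy produces say nothing about the results reported in the article.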


Updated: 2024-03-23
