Laser Learning Environment: A new environment for coordination-critical multi-agent tasks


arXiv - CS - Multiagent Systems. Pub Date: 2024-04-04, DOI: arxiv-2404.03596
Yannick Molinghen, Raphaël Avalos, Mark Van Achter, Ann Nowé, Tom Lenaerts

We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and receive no intermediate reward for accomplishing those joint actions (zero-incentive dynamics). The challenge of such problems lies in escaping the state space bottlenecks caused by interdependence steps, since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms against LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritized experience replay and n-step returns hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as a cooperative MARL benchmark.
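For readers unfamiliar with the n-step return mentioned in the abstract, the following is a minimal sketch of the standard n-step TD target. It is not taken from the paper or from the LLE code base; the function name `n_step_target` and its parameters are hypothetical, chosen only for illustration. It shows that when every reward in the n-step window is zero, as on the unrewarded stretch through a bottleneck, the target reduces to the discounted bootstrap value alone.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Standard n-step TD target (hypothetical helper, not from the paper).

    rewards:         the n rewards r_t, ..., r_{t+n-1} observed along the window
    bootstrap_value: value estimate at the state reached after n steps,
                     e.g. max_a Q(s_{t+n}, a)
    gamma:           discount factor
    """
    target = 0.0
    # Discounted sum of the observed rewards in the window.
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    # Discounted bootstrap from the state at the end of the window.
    target += (gamma ** len(rewards)) * bootstrap_value
    return target


# A window with no reward at all: only the bootstrap term contributes.
zero_reward_target = n_step_target([0.0, 0.0, 0.0], bootstrap_value=1.0, gamma=0.9)

# A one-step window recovers the usual one-step TD target r + gamma * V(s').
one_step_target = n_step_target([1.0], bootstrap_value=2.0, gamma=0.5)
```

The sketch only illustrates the definition of the target; it makes no claim about why the authors observe that n-step returns hinder exploration in LLE.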


Updated: 2024-04-05
