Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes

Bolshakov, V. E.; Alfimtsev, A. N.

doi:10.1134/S1064562423701132


Published: 11 March 2024

Volume 108, pages S382–S392, (2023)

Doklady Mathematics

V. E. Bolshakov¹ & A. N. Alfimtsev¹


Abstract

In the rapidly evolving field of reinforcement learning, combining hierarchical and multiagent learning methods presents unique challenges and opens up new opportunities. This paper combines multilevel hierarchical learning with subgoal discovery and multiagent reinforcement learning with hindsight experience replay. The result is the multiagent subgoal hierarchy algorithm (MASHA), which allows multiple agents to learn efficiently in complex environments, including environments with sparse rewards. We demonstrate the proposed approach in one such environment within the StarCraft II strategy game and compare it with existing approaches. The algorithm follows the paradigm of centralized learning with decentralized execution, which makes it possible to balance coordination and autonomy of the agents.
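
The abstract names hindsight experience replay as one ingredient for coping with sparse rewards. The sketch below illustrates only the generic idea of hindsight goal relabeling on a toy grid; it is not the MASHA implementation, and every name in it (`Transition`, `sparse_reward`, `relabel_with_final_state`, the grid coordinates) is a hypothetical choice made for illustration. It assumes a reward of 0 when the goal is reached and -1 otherwise, and relabels a failed episode with the state the agent actually ended in.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical grid position (assumption, not from the paper)


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse signal: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, substituting the finally reached state as the goal."""
    if not episode:
        return []
    achieved_goal = episode[-1].next_state
    return [
        replace(
            t,
            goal=achieved_goal,
            reward=sparse_reward(t.next_state, achieved_goal),
        )
        for t in episode
    ]


# A failed episode: the original goal (3, 3) is never reached.
original_goal = (3, 3)
path = [(0, 0), (0, 1), (1, 1)]
episode = [
    Transition(s, 0, s_next, original_goal, sparse_reward(s_next, original_goal))
    for s, s_next in zip(path, path[1:])
]

# The relabeled copy treats (1, 1) as the goal, so its last step is a success.
relabeled = relabel_with_final_state(episode)
```

Storing both `episode` and `relabeled` in a replay buffer gives the learner at least one successful transition per episode, which is the usual motivation for hindsight relabeling when rewards are sparse.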



Funding

This work was supported by State Assignment no. FSFN-2023-0006.

Author information

Authors and Affiliations

Bauman Moscow State Technical University, Moscow, Russia
V. E. Bolshakov & A. N. Alfimtsev


Corresponding authors

Correspondence to V. E. Bolshakov or A. N. Alfimtsev.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Translated by E. Oborin

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Bolshakov, V.E., Alfimtsev, A.N. Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes. Dokl. Math. 108 (Suppl 2), S382–S392 (2023). https://doi.org/10.1134/S1064562423701132


Received: 01 September 2023
Revised: 29 September 2023
Accepted: 18 October 2023
Published: 11 March 2024
Issue Date: December 2023
DOI: https://doi.org/10.1134/S1064562423701132

