Abstract
In the rapidly evolving field of reinforcement learning, the combination of hierarchical and multiagent learning methods presents unique challenges and opens up new opportunities. This paper combines multilevel hierarchical learning with subgoal discovery and multiagent reinforcement learning with hindsight experience replay. The result is the multiagent subgoal hierarchy algorithm (MASHA), which allows multiple agents to learn efficiently in complex environments, including environments with sparse rewards. We demonstrate the proposed approach in one such environment within the StarCraft II strategy game and compare it with existing approaches. The algorithm follows the paradigm of centralized learning with decentralized execution, which makes it possible to balance coordination among agents with their autonomy.
Funding
This work was supported by State assignment no. FSFN-2023-0006.
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Additional information
Translated by E. Oborin
Publisher’s Note.
Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Bolshakov, V.E., Alfimtsev, A.N. Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes. Dokl. Math. 108 (Suppl 2), S382–S392 (2023). https://doi.org/10.1134/S1064562423701132
DOI: https://doi.org/10.1134/S1064562423701132