Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes

Abstract

In the rapidly evolving field of reinforcement learning, the combination of hierarchical and multiagent learning methods presents unique challenges and opens up new opportunities. This paper discusses a combination of multilevel hierarchical learning with subgoal discovery and multiagent reinforcement learning with hindsight experience replay. Combining these approaches leads to the multiagent subgoal hierarchy algorithm (MASHA), which allows multiple agents to learn efficiently in complex environments, including environments with sparse rewards. We demonstrate the results of the proposed approach in one such environment inside the StarCraft II strategy game and compare it with other existing approaches. The proposed algorithm is developed in the paradigm of centralized training with decentralized execution, which makes it possible to balance coordination and autonomy of the agents.
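
To make the sparse-reward ingredient named above concrete, the sketch below shows hindsight experience replay in its generic "future" relabeling form: transitions from a failed episode are stored a second time with the original subgoal replaced by a state the agent actually reached, so the otherwise sparse reward still provides a learning signal. This is a minimal Python sketch of the general technique, not the MASHA implementation described in the paper; the `Transition` structure, the exact-match `reached` test, and all names are illustrative assumptions.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Transition:
    state: Tuple[int, int]      # observation before acting (hypothetical grid state)
    action: int
    goal: Tuple[int, int]       # subgoal assigned by the level above
    next_state: Tuple[int, int]
    reward: float

def reached(state: Tuple[int, int], goal: Tuple[int, int]) -> bool:
    # Sparse success test: the subgoal counts as achieved only on exact match.
    return state == goal

def her_relabel(episode: List[Transition], k: int = 4) -> List[Transition]:
    """'Future'-strategy hindsight relabeling: alongside each original
    transition, store k copies whose goal is a state actually visited later
    in the episode, so failed trajectories still teach goal reaching."""
    out: List[Transition] = []
    for t, tr in enumerate(episode):
        out.append(tr)  # keep the original transition
        future_states = [e.next_state for e in episode[t:]]
        for new_goal in random.choices(future_states, k=k):
            r = 0.0 if reached(tr.next_state, new_goal) else -1.0
            out.append(replace(tr, goal=new_goal, reward=r))
    return out

# Toy usage: a three-step episode that never reaches its original goal (5, 5).
episode = [
    Transition((0, 0), 1, (5, 5), (0, 1), -1.0),
    Transition((0, 1), 1, (5, 5), (0, 2), -1.0),
    Transition((0, 2), 1, (5, 5), (0, 3), -1.0),
]
buffer = her_relabel(episode)
print(len(buffer), "stored transitions;",
      sum(tr.reward == 0.0 for tr in buffer), "relabeled as successes")
```

In a hierarchical, multiagent setting of the kind the abstract describes, such relabeling would typically be applied per agent and per level of the hierarchy, each level's replay buffer being relabeled against the subgoals that level was assigned.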

Funding

This work was supported by State assignment no. FSFN-2023-0006.

Author information

Corresponding authors

Correspondence to V. E. Bolshakov or A. N. Alfimtsev.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Translated by E. Oborin

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bolshakov, V.E., Alfimtsev, A.N. Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes. Dokl. Math. 108 (Suppl 2), S382–S392 (2023). https://doi.org/10.1134/S1064562423701132
