Abstract
Deep reinforcement learning (RL) has driven dramatic advances in many tasks, such as playing games, controlling robots, and navigating complex environments. However, it requires many interactions with the environment. This contrasts with human learning: humans can exploit prior knowledge, which significantly speeds up learning by avoiding unnecessary exploration. Previous work on integrating knowledge into RL did not model the uncertainty inherent in human cognition, which reduces the reliability of the knowledge. In this paper, we propose the knowledge-guided policy network, a novel framework that combines suboptimal human knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller that represents human knowledge and a refine module that fine-tunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing reinforcement learning algorithms such as PPO, AC, and SAC. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines suboptimal human knowledge with RL, significantly improves the learning efficiency of basic RL algorithms, even when the human prior knowledge performs very poorly on its own. We further examine the effect of the number of fuzzy rules and the interpretability of the learned policy, which makes the proposed framework more complete and better grounded. The code for this research is available at https://github.com/yuyuanq/reinforcement-learning-using-knowledge-controller.
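To make the idea concrete, the sketch below shows one common way fuzzy rules can encode a human prior as a state-dependent action suggestion, using Takagi-Sugeno-style inference with Gaussian membership functions. The rule centres, widths, and consequent actions here are purely illustrative assumptions, not the parameters or exact rule form used in the paper; in the full framework, such a prior would be combined with and refined by a learned policy network.

```python
import math

def gaussian_mf(x, c, sigma):
    """Membership degree of x in a Gaussian fuzzy set centred at c."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def fuzzy_controller(state, rules):
    """Takagi-Sugeno-style inference.

    Each rule is (centres, sigmas, action). A rule's firing strength
    is the product of its membership degrees over the state dimensions;
    the output is the firing-strength-weighted average of rule actions.
    """
    strengths, actions = [], []
    for centres, sigmas, action in rules:
        w = 1.0
        for x, c, s in zip(state, centres, sigmas):
            w *= gaussian_mf(x, c, s)
        strengths.append(w)
        actions.append(action)
    total = sum(strengths) or 1e-8  # avoid division by zero
    return sum(w * a for w, a in zip(strengths, actions)) / total

# Two illustrative rules for a 1-D state (e.g. a pole angle):
# "if angle is negative, push left; if angle is positive, push right".
rules = [
    ((-0.2,), (0.1,), -1.0),
    (( 0.2,), (0.1,), +1.0),
]
prior_action = fuzzy_controller((0.2,), rules)  # close to +1.0
```

Because inference is a smooth, differentiable function of the state, such a controller can sit inside an end-to-end policy and have its suggestion corrected by a refine module trained with standard RL updates.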
References
Berenji, H. R. (1992). A reinforcement learning-based architecture for fuzzy logic control. International Journal of Approximate Reasoning, 6(2), 267–292.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv:1606.01540
Celemin, C., & Ruiz-del Solar, J. (2019). An interactive framework for learning continuous actions policies based on corrective feedback. Journal of Intelligent & Robotic Systems, 95(1), 77–97.
Cheng, C. A., Yan, X., Wagener, N., & Boots, B. (2018). Fast policy learning through imitation and reinforcement. arXiv:1805.10413
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.
Cruz, F., Twiefel, J., Magg, S., Weber, C., & Wermter, S. (2015). Interactive reinforcement learning through speech guidance in a domestic scenario. In: 2015 international joint conference on neural networks (IJCNN), (pp. 1–8). IEEE
Dai, X., Li, C. K., & Rad, A. B. (2005). An approach to tune fuzzy controllers based on reinforcement learning for autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 6(3), 285–293.
Eysenbach, B., Salakhutdinov, R. R., & Levine, S. (2019). Search on the replay buffer: Bridging planning and reinforcement learning. In Advances in neural information processing systems (vol. 32).
Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., & Vechev, M. (2019). DL2: Training and querying neural networks with logic. In Proceedings of international conference on machine learning (pp. 1931–1941).
Garcez, A.S.d., Broda, K.B., & Gabbay, D.M. (2012). Neural-symbolic learning systems: Foundations and applications. Berlin: Springer.
Ha, D., Dai, A., & Le, Q.V. (2016). Hypernetworks. arXiv:1609.09106
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861–1870). PMLR.
Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In: Advances in neural information processing systems (pp. 4565–4573).
Ho, M.K., Littman, M.L., Cushman, F., & Austerweil, J.L. (2015). Teaching with rewards and punishments: Reinforcement or communication? In: CogSci
Hu, Z., Ma, X., Liu, Z., Hovy, E., & Xing, E. (2016). Harnessing deep neural networks with logic rules. arXiv:1603.06318
Jang, J. S. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685.
Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980
Knox, W.B., & Stone, P. (2009). Interactively shaping agents via human reinforcement: The tamer framework. In Proceedings of the fifth international conference on Knowledge capture (pp. 9–16)
Konda, V.R., & Tsitsiklis, J.N. (2000). Actor-critic algorithms. In: Advances in neural information processing systems (pp. 1008–1014). Citeseer
Kuhlmann, G., Stone, P., Mooney, R., & Shavlik, J. (2004). Guiding a reinforcement learner with natural language advice: Initial results in Robocup Soccer. In The AAAI-2004 workshop on supervisory control of learning and adaptive systems. San Jose, CA
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
MacGlashan, J., Ho, M.K., Loftin, R., Peng, B., Wang, G., Roberts, D.L., Taylor, M.E., & Littman, M.L. (2017). Interactive learning from policy-dependent human feedback. In International conference on machine learning (pp. 2285–2294). PMLR
Mathewson, K. W., & Pilarski, P. M. (2016). Simultaneous control and human feedback in the training of a robotic agent with actor-critic reinforcement learning. arXiv:1606.06979
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
Najar, A., & Chetouani, M. (2021). Reinforcement learning with human advice: A survey. Frontiers in Robotics and AI, 8.
De Raedt, L., & Kimmig, A. (2015). Probabilistic (logic) programming concepts. Machine Learning, 100(1), 5–47.
Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62(1), 107–136.
Rosenstein, M.T., Barto, A.G., Si, J., Barto, A., Powell, W., & Wunsch, D. (2004). Supervised actor-critic reinforcement learning. Learning and approximate dynamic programming: Scaling up to the real world (pp. 359–380).
Ross, S., Gordon, G., & Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 627–635). JMLR Workshop and Conference Proceedings.
Schmidhuber, J. (1992). Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1), 131–139.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In Proceedings of international conference on machine learning (pp. 1889–1897).
Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347
Silva, A., & Gombolay, M. (2021). Encoding human domain knowledge to warm start reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence (vol. 35, pp. 5042–5050).
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., & Lanctot, M. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Squire, S., Tellex, S., Arumugam, D., & Yang, L. (2015). Grounding English commands to reward functions. In Robotics: Science and systems
Sun, J., Karray, F., Basir, O., & Kamel, M. (2002). Fuzzy logic-based natural language processing and its application to speech recognition. In 3rd WSES international conference on fuzzy sets and systems (pp 11–15).
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: an introduction. Cambridge: MIT Press.
Takagi, T., & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15(1), 116–132.
Tasfi, N. (2016). Pygame learning environment. https://github.com/ntasfi/PyGame-Learning-Environment
Vogel, A., & Jurafsky, D. (2010). Learning to follow navigational directions. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 806–814).
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
Wu, Y., Mansimov, E., Grosse, R.B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in neural information processing systems (pp. 5279–5288).
Yager, R. R., & Zadeh, L. A. (2012). An introduction to fuzzy logic applications in intelligent systems (Vol. 165). Berlin: Springer.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. A. (1988). Fuzzy logic. Computer, 21(4), 83–93.
Zhang, S., & Sridharan, M. (2022). A survey of knowledge-based sequential decision-making under uncertainty. AI Magazine, 43(2), 249–266.
Zhang, P., Hao, J., Wang, W., Tang, H., Ma, Y., Duan, Y., & Zheng, Y. (2020). KoGuN: Accelerating deep reinforcement learning via integrating human suboptimal knowledge. In Proceedings of the twenty-ninth international joint conference on artificial intelligence (IJCAI).
Zhang, Y., Ren, J., Li, J., Fang, Q., & Xu, X. (2021). Deep q-learning with explainable and transferable domain rules. In International conference on intelligent computing (pp. 259–273). Springer
Zhou, S., Ren, W., Ren, X., Mi, X., & Yi, X. (2021). Kg-rl: A knowledge-guided reinforcement learning for massive battle games. In Pacific rim international conference on artificial intelligence (pp. 83–94). Springer
Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA) (pp 3357–3364). IEEE
Acknowledgements
The work is supported by the National Natural Science Foundation of China (Grant Nos. 62106172 and U1836214), the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission (No. 56917ZXRGGX00150), the Tianjin Natural Science Fund (No. 19JCYBJC16300), Research on Data Platform Technology Based on Automotive Electronic Identification System, and the Science and Technology on Information Systems Engineering Laboratory (Grant No. WDZC20205250407).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is an extended version of the paper [48] presented at the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), Virtual, Japan, 2020.
Yuanqiang Yu and Peng Zhang have contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Zhang, P., Zhao, K. et al. Accelerating deep reinforcement learning via knowledge-guided policy network. Auton Agent Multi-Agent Syst 37, 17 (2023). https://doi.org/10.1007/s10458-023-09600-1