Learning Robust Locomotion for Bipedal Robot via Embedded Mechanics Properties

  • Research Article
  • Published: 2024
Journal of Bionic Engineering

Abstract

Reinforcement learning (RL) offers great potential for legged-robot locomotion, but the gap between simulation and the real world makes sim-to-real transfer for legged robots challenging. A legged robot's support polygon can help overcome some of these challenges: a quadruped robot has a considerable support polygon, a bipedal robot with actuated feet a smaller one, and a point-footed bipedal robot the smallest. Consequently, despite the sim-to-real gap, most recent RL approaches are deployed on real quadruped robots, which are inherently more stable, whereas RL-based locomotion of bipedal robots remains challenged by the zero-shot sim-to-real task. This is especially true for point-footed bipeds, which achieve better dynamic performance but whose inevitable tumbles add extra barriers to sim-to-real transfer. The crux of this problem is the difference in mechanics properties between the physical robot and the simulated one, which makes it difficult for learned skills to carry over to the physical bipedal robot. In this paper, we introduce embedded mechanics properties (EMP), based on optimization with Gaussian processes, into RL training, making sim-to-real transfer possible on the BRS1-P robot used in this work; the trained policy can thus be deployed on the BRS1-P without further tuning. We validate the performance of the learning-based BRS1-P under disturbances and on terrains never encountered during training, demonstrating its bipedal locomotion and disturbance resistance.
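The abstract couples Gaussian-process-based optimization with RL training but gives no implementation detail, so the following is only a hedged illustration of the general technique: Bayesian optimization with a Gaussian-process surrogate, used to identify simulator mechanics parameters (e.g., joint friction or mass offsets) that minimize the discrepancy between simulated and measured robot behavior. Everything here is an assumption made for illustration; `rollout_discrepancy`, the parameter bounds, and the synthetic objective are hypothetical and do not reproduce the authors' EMP implementation.

```python
# Hedged sketch: Bayesian optimization with a Gaussian-process surrogate
# for identifying mechanics parameters before RL training. This is NOT
# the paper's EMP code; rollout_discrepancy and the bounds are made up.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def rollout_discrepancy(theta: np.ndarray) -> float:
    """Hypothetical objective: distance between simulated and measured
    joint trajectories when the simulator uses mechanics parameters
    theta. Stubbed with a quadratic bowl so the sketch runs end to end."""
    target = np.array([0.3, 0.7])  # stand-in for the true physical values
    return float(np.sum((theta - target) ** 2))

bounds = np.array([[0.0, 1.0], [0.0, 1.0]])  # assumed parameter ranges
rng = np.random.default_rng(0)

# A few random evaluations seed the GP surrogate.
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
y = np.array([rollout_discrepancy(x) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X, y)
    # Expected-improvement acquisition, evaluated on random candidates.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]  # most promising parameter vector
    X = np.vstack([X, x_next])
    y = np.append(y, rollout_discrepancy(x_next))

theta_star = X[np.argmin(y)]
print("identified mechanics parameters:", theta_star)
# theta_star would then parameterize the simulator during RL training,
# narrowing the mechanics gap before zero-shot deployment.
```

In this reading, the GP surrogate keeps the number of expensive simulator-versus-hardware comparisons small, and the identified parameters are embedded into the training simulator so the policy learns against dynamics closer to the physical robot's.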


Availability of Data and Materials

The datasets generated and analyzed during the current study are not publicly available, as they also form part of an ongoing study, but they are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant No. 62073041, and in part by the “111” Project under Grant B08043.

Author information


Corresponding author

Correspondence to Fei Meng.

Ethics declarations

Conflict of Interest

The authors certify that there is no conflict of interest pertaining to the content of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary information: The results of this work are shown in the accompanying video.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Y., Chen, X., Meng, F. et al. Learning Robust Locomotion for Bipedal Robot via Embedded Mechanics Properties. J Bionic Eng (2024). https://doi.org/10.1007/s42235-023-00452-9

