
Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models

Article · Published in Nature Machine Intelligence

A preprint version of the article is available at arXiv.

Abstract

Knowledge from animals and humans inspires robotic innovation. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards that accurately describe the specific system, rather than on the generalized understanding that animals possess. Here we propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that is pre-trainable, reusable and enrichable for legged robots. The primitive module summarizes knowledge from animal motion data: inspired by large pre-trained models in language and image understanding, we introduce deep generative models to produce motor control signals that drive legged robots to act like real animals. We then shape various traversing capabilities at a higher level, aligned with the environment, by reusing the primitive module. Finally, a strategic module is trained to focus on complex downstream tasks by reusing knowledge from the previous levels. We apply the trained hierarchical controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a specially designed, challenging multi-agent chase tag game, where lifelike agility and strategy emerge in the robots.
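
To make the three-level decomposition concrete, the sketch below outlines how such a hierarchy could be wired together. It is a minimal illustration in PyTorch, not the authors' released implementation: the module names, network sizes and interfaces (PrimitiveDecoder, EnvironmentalPolicy, StrategicPolicy, the latent and command dimensions) are all assumptions, and in practice each level would be trained in turn with reinforcement learning while the levels beneath it stay frozen, as the abstract describes.

```python
# Minimal sketch of a three-level hierarchical controller, assuming the
# decomposition described in the abstract. All names, sizes and interfaces
# are illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class PrimitiveDecoder(nn.Module):
    """Primitive level: a generative decoder, pre-trained on animal motion
    data, mapping a latent skill code plus proprioception to joint targets."""
    def __init__(self, latent_dim=32, proprio_dim=60, action_dim=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, z, proprio):
        return self.net(torch.cat([z, proprio], dim=-1))


class EnvironmentalPolicy(nn.Module):
    """Environmental level: reuses the frozen primitive decoder by emitting
    latent skill codes conditioned on terrain perception and a command."""
    def __init__(self, percep_dim=128, command_dim=16, proprio_dim=60, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(percep_dim + command_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, perception, command, proprio):
        return self.net(torch.cat([perception, command, proprio], dim=-1))


class StrategicPolicy(nn.Module):
    """Strategic level: chooses commands (for example, a target direction in
    the chase tag game) for the environmental level from game-state features."""
    def __init__(self, game_dim=64, command_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(game_dim, 256), nn.ReLU(),
            nn.Linear(256, command_dim),
        )

    def forward(self, game_state):
        return self.net(game_state)


# One control step through the stack; during training, each level is
# optimized while the pre-trained levels below it are kept frozen.
primitive, env_level, strategy = PrimitiveDecoder(), EnvironmentalPolicy(), StrategicPolicy()
proprio = torch.zeros(1, 60)      # proprioceptive state (placeholder)
perception = torch.zeros(1, 128)  # terrain features (placeholder)
game_state = torch.zeros(1, 64)   # game-state features (placeholder)

command = strategy(game_state)                 # strategic level sets the goal
z = env_level(perception, command, proprio)   # environmental level picks a skill
action = primitive(z, proprio)                # primitive level emits joint targets
```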

Fig. 1: A framework overview of the proposed method.
Fig. 2: Evaluation of the primitive motor controllers.
Fig. 3: Performance evaluation of the environmental-primitive motor controllers.
Fig. 4: Snapshots in the chase tag game.


Data availability

The full motion data from the Labrador retriever together with the retargeted data for the MAX robot are available from Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/. The raw motion clips are in .bvh format, and the retargeted data are organized in .txt files.
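
For readers working with the released data, a loader along the following lines could be used for the retargeted clips. It assumes each .txt file stores one frame per row of whitespace-separated numeric values; that layout is an assumption, not documented here, so verify it against the released files before relying on it.

```python
# Hypothetical loader for the retargeted motion files. Assumes one frame per
# row of whitespace-separated floats; verify against the released data.
import numpy as np


def load_retargeted_clip(path: str) -> np.ndarray:
    """Return a (num_frames, num_channels) array from a retargeted .txt clip."""
    return np.loadtxt(path)
```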

Code availability

The code is available on Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and on GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/.

References

  1. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems Vol. XIV (MIT Press Journals, 2018).

  2. Haarnoja, T., Hartikainen, K., Abbeel, P. & Levine, S. Latent space policies for hierarchical reinforcement learning. In Proc. 35th International Conference on Machine Learning 1851–1860 (PMLR, 2018).

  3. Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Rob. 4, eaau5872 (2019).

  4. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Rob. 5, eabc5986 (2020).

  5. Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Rob. 7, eabk2822 (2022).

  6. Kumar, A., Fu, Z., Pathak, D. & Malik, J. RMA: rapid motor adaptation for legged robots. In Proc. Robotics: Science and Systems Vol. XVII (2021).

  7. Cheng, X., Shi, K., Agarwal, A. & Pathak, D. Extreme parkour with legged robots. In Conference on Robot Learning (2023).

  8. Zhuang, Z. et al. Robot parkour learning. In Conference on Robot Learning (2023).

  9. Hoeller, D., Rudin, N., Sako, D. & Hutter, M. ANYmal parkour: learning agile navigation for quadrupedal robots. Sci. Rob. 9, eadi7566 (2024).

  10. Yang, Y. et al. CAJun: continuous adaptive jumping using a learned centroidal controller. In Proc. 7th Conference on Robot Learning Vol. 229, 2791–2806 (PMLR, 2023).

  11. Caluwaerts, K. et al. Barkour: benchmarking animal-level agility with quadruped robots. Preprint at https://doi.org/10.48550/arXiv.2305.14654 (2023).

  12. Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci. Rob. 8, eade2256 (2023).

  13. Yang, C., Yuan, K., Zhu, Q., Yu, W. & Li, Z. Multi-expert learning of adaptive legged locomotion. Sci. Rob. 5, eabb2174 (2020).

  14. Peng, X. B. et al. Learning agile robotic locomotion skills by imitating animals. In Proc. Robotics: Science and Systems (2020).

  15. Bohez, S. et al. Imitate and repurpose: learning reusable robot movement skills from human and animal behaviors. Preprint at https://doi.org/10.48550/arXiv.2203.17138 (2022).

  16. Levine, S., Wang, J. M., Haraux, A., Popović, Z. & Koltun, V. Continuous character control with low-dimensional embeddings. ACM Trans. Graphics 31, 28 (2012).

  17. Ling, H. Y., Zinno, F., Cheng, G. & Van De Panne, M. Character controllers using motion VAEs. ACM Trans. Graphics 39, 40 (2020).

  18. Tirumala, D. et al. Behavior priors for efficient reinforcement learning. J. Mach. Learn. Res. 23, 9989–10056 (2022).

  19. Heess, N. et al. Learning and transfer of modulated locomotor controllers. Preprint at https://doi.org/10.48550/arXiv.1610.05182 (2016).

  20. Merel, J. et al. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations 4647–4660 (Curran Assoc., 2019).

  21. Hasenclever, L., Pardo, F., Hadsell, R., Heess, N. & Merel, J. CoMic: complementary task learning & mimicry for reusable skills. In Proc. 37th International Conference on Machine Learning Vol. 119, 4105–4115 (PMLR, 2020).

  22. Liu, S. et al. From motor control to team play in simulated humanoid football. Sci. Rob. 7, eabo0235 (2022).

  23. Zhu, Q., Zhang, H., Lan, M. & Han, L. Neural categorical priors for physics-based character control. ACM Trans. Graphics 42, 178 (2023).

  24. Ji, Y. et al. Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems 1479–1486 (IEEE, 2022).

  25. van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems (2017).

  26. Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning Vol. 139, 8821–8831 (PMLR, 2021).

  27. Roy, A., Vaswani, A., Neelakantan, A. & Parmar, N. Theory and experiments on vector quantized autoencoders. Preprint at https://doi.org/10.48550/arXiv.1805.11063 (2018).

  28. Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning Vol. 4 (Springer, 2006).

  29. Chi, W., Jiang, X. & Zheng, Y. A linearization of centroidal dynamics for the model-predictive control of quadruped robots. In 2022 International Conference on Robotics and Automation 4656–4663 (IEEE, 2022).

  30. Zhou, Q. et al. MAX: a wheeled-legged quadruped robot for multimodal agile locomotion. IEEE Trans. Autom. Sci. Eng. 1–21 (2024).

  31. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://doi.org/10.48550/arXiv.1707.06347 (2017).

  32. Sun, P. et al. TLeague: a framework for competitive self-play based distributed multi-agent reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.2011.12895 (2020).

  33. Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (2017).

  34. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  35. Gouelle, A., Mégrot, F. & Müller, B. Interpreting spatiotemporal parameters, symmetry, and variability in clinical gait analysis. Handb. Hum. Motion 689–707 (2018).

  36. Jarvis, S. L. et al. Kinematic and kinetic analysis of dogs during trotting after amputation of a thoracic limb. Am. J. Vet. Res. 74, 1155–1163 (2013).

  37. Pálya, Z., Rácz, K., Nagymáté, G. & Kiss, R. M. Development of a detailed canine gait analysis method for evaluating harnesses: a pilot study. PLoS ONE 17, e0264299 (2022).

  38. World Chase Tag. Wikipedia https://en.wikipedia.org/wiki/World_Chase_Tag (accessed 23 March 2023).

  39. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  40. Han, L. et al. TStarBot-X: an open-sourced and comprehensive study for efficient league training in StarCraft II full game. Preprint at https://doi.org/10.48550/arXiv.2011.13729 (2020).

  41. Coulom, R. Bayesian Elo rating (2005).

  42. Xie, Z. et al. Learning locomotion skills for Cassie: iterative design and sim-to-real. In Proc. Conference on Robot Learning Vol. 100, 317–329 (PMLR, 2020).

  43. Peng, X. B., Kanazawa, A., Malik, J., Abbeel, P. & Levine, S. SFV: reinforcement learning of physical skills from videos. ACM Trans. Graph. 37, 178 (2018).

  44. Zhang, H. et al. Learning physically simulated tennis skills from broadcast videos. ACM Trans. Graph. 42, 95 (2023).

  45. Gleicher, M. Retargetting motion to new characters. In Proc. 25th Annual Conference on Computer Graphics and Interactive Techniques 33–42 (Association for Computing Machinery, 1998).

  46. Peng, X. B. & Van De Panne, M. Learning locomotion skills using DeepRL: does the choice of action space matter? In Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation 12:1–12:13 (Association for Computing Machinery, 2017).

  47. Ho, J. & Ermon, S. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems Vol. 29 (2016).

  48. Agarwal, A., Kumar, A., Malik, J. & Pathak, D. Legged locomotion in challenging terrains using egocentric vision. In Proc. 6th Conference on Robot Learning (eds Liu, K. et al.) 403–415 (PMLR, 2023).

  49. Li, T. et al. Learning terrain-adaptive locomotion with agile behaviors by imitating animals. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems 339–345 (IEEE, 2023).

  50. Rusu, A. A. et al. Policy distillation. Preprint at https://doi.org/10.48550/arXiv.1511.06295 (2015).

  51. Han, L. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Code Ocean https://doi.org/10.24433/CO.8441152.v3 (2024).

Acknowledgements

We thank S. Li for his early contributions to motion retargeting. We thank our colleagues at Tencent Robotics X and Tencent Cloud for constructive discussions and computing resources. We also thank the Labrador retriever that wore the motion-capture markers and performed the movements for motion data collection.

Author information

Contributions

L.H. organized the research project. L.H., Q.Z., C. Zhang, T.L. and H.Z. designed, implemented and experimented with various environmental settings, neural network architectures, algorithms and so on. C. Zhou, T.L. and C. Zhang collected the animal motion dataset. L.H. and Yizheng Zhang iterated over multiple versions of the physics-based simulator and its settings. J.S., Y.L., Yizheng Zhang, T.L., Q.Z. and L.H. completed the real robot experiments. Q.Z., R.Z. and C. Zhou contributed to improving the training infrastructure. Y.L., J.L., Yufeng Zhang, R.W., W.C., X.L., Y. Zhu, L.X. and X.T. maintained the robot hardware and software during the project. L.H. wrote the paper with contributions from H.Z., C. Zhang, Q.Z., T.L. and J.S.; Z.Z. provided general scope advice and consistently supported the team.

Corresponding authors

Correspondence to Lei Han or Qingxu Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Ken Caluwaerts and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 6.1–6.5, Tables 1–4 and Figs. 1–4.

Reporting Summary

Supplementary Video 1

Main movie for the PMC model.

Supplementary Video 2

Main movie for the EPMC model.

Supplementary Video 3

Main movie for the SEPMC model.

Supplementary Video 4

The performance of all the trained policies in simulation.

Supplementary Video 5

The performance of the fall-recovery model in a real-world experiment.

Supplementary Video 6

The performance of the student environment-level network using an onboard depth camera in a real-world experiment.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, L., Zhu, Q., Sheng, J. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Nat Mach Intell 6, 787–798 (2024). https://doi.org/10.1038/s42256-024-00861-3
