Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning

Zhang, Tianle; Liu, Zhen; Pu, Zhiqiang; Yi, Jianqiang; Liang, Yanyan; Zhang, Du

doi:10.1007/s12555-022-0171-z

Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning

Regular Papers
Robot and Applications
Published: 05 May 2023

Volume 21, pages 2350–2362, (2023)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

International Journal of Control, Automation and Systems Aims and scope Submit manuscript

Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning

Download PDF

255 Accesses
2 Citations
Explore all metrics

Abstract

Although deep reinforcement learning has recently achieved some successes in robot navigation, there are still unsolved problems. Particularly, a robot guided by a distant ultimate goal is easy to get stuck in danger or encounter collisions in dynamic crowded environments due to the lack of long-term perspectives. In this paper, a novel subgoal-guided approach based on two-level hierarchical deep reinforcement learning with spatial-temporal graph attention networks (ST-GANets), called SG-HDRL, is proposed for a robot navigating in a dynamic crowded environment with autonomous obstacles, e.g., crowd. Specifically, the high-level policy, that models the spatial-temporal relation between the robot and the obstacles using the obstacles’ trajectories by the designed high-level ST-GANet, generates intermediate subgoals from a longer-term perspective over higher temporal scales. The subgoals give a favorable and collision-free direction to avoid encountering danger or collisions while approaching the ultimate goal. The low-level policy, that similarly implements the designed low-level ST-GANet to implicitly predict the obstacles’ motions, takes the subgoals as short-term guidance through an intrinsic reward incentive to generate primitive actions for the robot. Simulation results demonstrate that SG-HDRL using ST-GANets has better performances compared with state-of-the-art baselines. Furthermore, the proposed SG-HDRL is deployed to an experimental platform based on omnidirectional cars, and experiment results validate the effectiveness and practicability of the proposed SG-HDRL.

Article PDF

Multi-objective deep reinforcement learning for crowd-aware robot navigation with dynamic human preference

Article 14 June 2023

Human-Aware Subgoal Generation in Crowded Indoor Environments

Robot navigation in a crowd by integrating deep reinforcement learning and online planning

Article 17 March 2022

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
Article Google Scholar
Z. Sui, Z. Pu, J. Yi, and S. Wu, “Formation control with collision avoidance through deep reinforcement learning using model-guided demonstration,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 6, pp. 2358–2372, 2020.
Article Google Scholar
J.-Y. Jhang, C.-J. Lin, C.-T. Lin, and K.-Y. Young, “Navigation control of mobile robots using an interval type-2 fuzzy controller based on dynamic-group particle swarm optimization,” International Journal of Control, Automation, and Systems, vol. 16, no. 5, pp. 2446–2457, 2018.
Article Google Scholar
T. Zhang, Z. Liu, S. Wu, Z. Pu, and J. Yi, “Multi-robot cooperative target encirclement through learning distributed transferable policy,” Proc. of International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1–8, 2020.
J. Xin, C. Dong, Y. Zhang, Y. Yao, and A. Gong, “Visual servoing of unknown objects for family service robots,” Journal of Intelligent & Robotic Systems, vol. 104, Article number 10, 2022.
M. Tampubolon, L. Pamungkas, H.-J. Chiu, Y.-C. Liu, and Y.-C. Hsieh, “Dynamic wireless power transfer for logistic robots,” Energies, vol. 11, no. 3, p. 527, 2018.
Article Google Scholar
W. Youn, H. Ko, H. Choi, I. Choi, J.-H. Baek, and H. Myung, “Collision-free autonomous navigation of a small uav using low-cost sensors in GPS-denied environments,” International Journal of Control, Automation, and Systems, vol. 19, no. 2, pp. 953–968, 2021.
Article Google Scholar
T. Zhang, T. Qiu, Z. Pu, Z. Liu, and J. Yi, “Robot navigation among external autonomous agents through deep reinforcement learning using graph attention network,” IFACPapersOnLine, vol. 53, no. 2, pp. 9465–9470, 2020.
Google Scholar
R. Tang and H. Yuan, “Cyclic error correction based q-learning for mobile robots navigation,” International Journal of Control, Automation, and Systems, vol. 15, no. 4, pp. 1790–1798, 2017.
Article Google Scholar
P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” The International Journal of Robotics Research, vol. 17, no. 7, pp. 760–772, 1998.
Article Google Scholar
J. Van den Berg, M. Lin, and D. Manocha, “Reciprocal velocity obstacles for real-time multi-agent navigation,” Proc. of IEEE International Conference on Robotics and Automation, IEEE, pp. 1928–1935, 2008.
J. van den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” Robotics Research, C. Pradalier, R. Siegwart, and G. Hirzinger, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 3–19, 2011.
Chapter Google Scholar
M. Kuderer, H. Kretzschmar, C. Sprunk, and W. Burgard, “Feature-based prediction of trajectories for socially compliant navigation.” Robotics: Science and Systems, 2012.
G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, “Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns,” Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.
Article Google Scholar
H.-T. L. Chiang, A. Faust, M. Fiser, and A. Francis, “Learning navigation behaviors end-to-end with autorl,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 2007–2014, 2019.
Article Google Scholar
Y. Chen, C. Liu, B. E. Shi, and M. Liu, “Robot navigation in crowds by graph convolutional networks with attention learned from human gaze,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2754–2761, 2020.
Article Google Scholar
Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning,” Proc. of IEEE International Conference on Robotics and Automation (ICRA), pp. 285–292, 2017.
Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350, 2017.
P. Long, T. Fanl, X. Liao, W. Liu, H. Zhang, and J. Pan, “Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning,” Proc. of IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 6252–6259, 2018.
T. Fan, X. Cheng, J. Pan, D. Manocha, and R. Yang, “Crowdmove: Autonomous mapless navigation in crowded scenarios,” arXiv preprint arXiv:1807.07870, 2018.
T. Fan, P. Long, W. Liu, and J. Pan, “Fully distributed multi-robot collision avoidance via deep reinforcement learning for safe and efficient navigation in complex scenarios,” arXiv preprint arXiv:1808.03841, 2018.
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
Article Google Scholar
M. Everett, Y. F. Chen, and J. P. How, “Motion planning among dynamic, decision-making agents with deep reinforcement learning,” Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 3052–3059, 2018.
C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning,” Proc. of International Conference on Robotics and Automation (ICRA), pp. 6015–6022, 2019.
X. Yang, M. Moallem, and R. V. Patel, “A layered goal-oriented fuzzy motion planning strategy for mobile robot navigation,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 35, no. 6, pp. 1214–1224, 2005.
Article Google Scholar
D. Wang, D. Liu, N. Kwok, and K. Waldron, “A subgoal-guided force field method for robot navigation,” Proc. of IEEE/ASME International Conference on Mechtronic and Embedded Systems and Applications, IEEE, pp. 488–493, 2008.
O. Nachum, S. S. Gu, H. Lee, and S. Levine, “Data-efficient hierarchical reinforcement learning,” Proc. of the 32nd International Conference on Neural Information Processing Systems, pp. 3303–3313, 2018.
T. D. Kulkarni, K. R. Narasimhan, A. Saeedi, and J. B. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” arXiv preprint arXiv:1604.06057, 2016.
T. Haarnoja, K. Hartikainen, P. Abbeel, and S. Levine, “Latent space policies for hierarchical reinforcement learning,” Proc. of International Conference on Machine Learning, PMLR, pp. 1851–1860, 2018.
R. Makar, S. Mahadevan, and M. Ghavamzadeh, “Hierarchical multi-agent reinforcement learning,” Proc. of the fifth International Conference on Autonomous Agents, pp. 246–253, 2001.
A. S. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg, D. Silver, and K. Kavukcuoglu, “Feudal networks for hierarchical reinforcement learning,” arXiv preprint arXiv:1703.01161, 2017.
P.-L. Bacon, J. Harb, and D. Precup, “The option-critic architecture,” Proc. of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2021.
Article MathSciNet Google Scholar
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin, “Attention is all you need,” Proc. of 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, I. Osband et al., “Deep q-learning from demonstrations,” Proc. of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 3223–3230, 2018.
Article Google Scholar
N. Ltd, “Nokov products: Mars series,” in Beijing, China, 2021. [Online]. Available: https://www.nokov.com/

Download references

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Tianle Zhang, Zhen Liu, Zhiqiang Pu & Jianqiang Yi
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
Tianle Zhang, Zhen Liu, Zhiqiang Pu & Jianqiang Yi
Faculty of Innovation Engineering, Macau University of Science and Technology (M.U.S.T), Macau, China
Yanyan Liang & Du Zhang

Authors

Tianle Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Zhen Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhiqiang Pu
View author publications
You can also search for this author in PubMed Google Scholar
Jianqiang Yi
View author publications
You can also search for this author in PubMed Google Scholar
Yanyan Liang
View author publications
You can also search for this author in PubMed Google Scholar
Du Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhen Liu.

Additional information

Tianle Zhang received his B.Eng. degree in mechanical engineering and automation from University of Science and Technology Beijing, Beijing, China, in 2018. He is currently pursuing a Ph.D. degree in control theory and control engineering with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His current research interests include planning and decision-making in mobile robots, multiagent deep reinforcement learning (MDRL), and swarm intelligence.

Zhen Liu received his B.Eng. degree in automation from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2010, and a Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2015. He is currently an Associate Professor with the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. His research interests include robust adaptive control, fuzzy control, and applications of aerospace systems.

Zhiqiang Pu received his B.Eng. degree in automation from Wuhan University, Wuhan, China, in 2009, and a Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2014. He is currently a Professor in the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He has been supported by the Talent Program of Youth Innovation Promotion Association CAS since 2017. His research interests include decision intelligence, collective intelligence, and applications of unmanned autonomous systems.

Jianqiang Yi received his B.Eng. degree in mechanical engineering from Beijing Institute of Technology, Beijing, China, in 1985, and his M.Eng. and Ph.D. degrees in automation from the Kyushu Institute of Technology, Kitakyushu, Japan, in 1989 and 1992, respectively. From 1992 to 1994, he worked as a Research Fellow with the Computer Software Development Company, Tokyo, Japan. From 1994 to 2001, He was with MYCOM, Inc., Kyoto, Japan. Since 2001, he has been a Full Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. He has authored or co-authored over 100 international journal papers and 270 international conference papers. His research interests mainly include intelligent control, adaptive control, robot system, and collective intelligence.

Yanyan Liang received his B.S. degree from the Chongqing University of Posts and Telecommunications, Chongqing, China, in 2004, and his M.S. and Ph.D. degrees from the Macau University of Science and Technology (MUST), Taipa, Macau, in 2006 and 2009, respectively. He is currently an Associate Professor with MUST. He has published more than 40 papers related to pattern recognition, image processing, and computer vision in IEEE Transactions and international conferences, including IEEE Transactions on Imageprocessing, IEEE Transactions on Cybernetics, IEEE Transactions on Multimedia, IEEE Transactions on Information Forensics and Security, IJCAI, CVPR, ICPR, and FG. He is also working on smart city applications with computer vision and big data. His research interests include computer vision, image processing, and machine learning.

Du Zhang received his M.S. degree from Nanjing University, Nanjing, China, in 1982, and a Ph.D. degree from the University of Illinois, USA, in 1987. He is currently a Chair Professor with the Faculty of Innovation Engineering, Macau University of Science and Technology, Macau. His research interests cover machine learning (STEP perpetual learning), knowledge-based systems, big data analytics, software engineering, and high-level petri net applications.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Key Research and Development Program of China under Grant 2018AAA0101005, in part by the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No. XDA27030204, in part by the External Cooperation Key Project of Chinese Academy Sciences (No. 173211KYSB20200002), in part by the Foundation under Grant 2019-JCJQ-ZD-049, and in part by the National Natural Science Foundation of China (62073323).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, T., Liu, Z., Pu, Z. et al. Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning. Int. J. Control Autom. Syst. 21, 2350–2362 (2023). https://doi.org/10.1007/s12555-022-0171-z

Download citation

Received: 23 February 2022
Revised: 29 April 2022
Accepted: 28 July 2022
Published: 05 May 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12555-022-0171-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning

Abstract

Article PDF

Similar content being viewed by others

Multi-objective deep reinforcement learning for crowd-aware robot navigation with dynamic human preference

Human-Aware Subgoal Generation in Crowded Indoor Environments

Robot navigation in a crowd by integrating deep reinforcement learning and online planning

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation