Abstract
In recent years, several control policies for multi-degree-of-freedom (DOF) manipulators based on deep reinforcement learning have been proposed. To avoid complexity, previous studies imposed various constraints on the high-dimensional state-action space, which hinders learning a generalized policy function. In this study, the control problem is addressed by introducing a hierarchical reinforcement learning method that learns an end-to-end control policy for a multi-DOF manipulator without any constraints on the state-action space. The proposed method learns the hierarchical policy using two off-policy methods. Using human demonstration data and a newly proposed data-correction method, the resulting end-to-end controller for the multi-DOF manipulator is shown to outperform non-hierarchical deep reinforcement learning methods.
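To make the hierarchical control scheme concrete, the following is a minimal toy sketch of the general idea of a two-level, goal-conditioned hierarchy: a high-level policy proposes subgoals at a fixed interval, and a low-level policy acts toward the current subgoal. A point in joint space stands in for the real multi-DOF manipulator, and both policies are hand-coded proportional rules rather than the learned off-policy networks used in the paper; all names and constants here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

K = 5  # high-level decision interval: steps between subgoal updates (assumed)

def high_level_policy(state, final_goal):
    # Propose a subgoal a bounded step from the current state toward the goal.
    direction = final_goal - state
    dist = np.linalg.norm(direction)
    step = min(dist, 1.0)
    return state + (direction / (dist + 1e-8)) * step

def low_level_policy(state, subgoal):
    # Act toward the current subgoal (a proportional-control stand-in for
    # the learned low-level policy).
    return 0.5 * (subgoal - state)

def rollout(start, goal, steps=50):
    state = start.astype(float)
    subgoal = state
    for t in range(steps):
        if t % K == 0:                       # re-plan the subgoal every K steps
            subgoal = high_level_policy(state, goal)
        state = state + low_level_policy(state, subgoal)
        # Intrinsic reward for the low level: negative distance to its subgoal.
        r_intrinsic = -np.linalg.norm(state - subgoal)
    return state

final = rollout(np.zeros(3), np.array([3.0, -2.0, 1.0]))
print(np.round(final, 2))
```

In a learned version, each level would be trained with its own off-policy algorithm (e.g., an actor-critic per level), with the low level rewarded for reaching subgoals and the high level for overall task progress.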
This research was supported by the MOTIE under the Industrial Foundation Technology Development Program supervised by the KEIT (No. 20008613).
Cheol-Hui Min received his B.S. degree in mechanical engineering from Korea University, in 2017, and an M.S. degree in mechanical engineering from Korea University, in 2019. He is currently pursuing a Ph.D. degree in electrical and computer engineering at Seoul National University. His research interests include 3D vision for robotics, model-based deep reinforcement learning, and optimal control.
Jae-Bok Song received his B.S. and M.S. degrees in mechanical engineering from Seoul National University, in 1983 and 1985, respectively. He received his Ph.D. degree from M.I.T. in 1992. He is currently a Professor at the School of Mechanical Engineering at Korea University, where he has served as the director of the Intelligent Robotics Laboratory since 1993. His research interests include robot safety and robotic system design and control.
Cite this article
Min, CH., Song, JB. Hierarchical End-to-end Control Policy for Multi-degree-of-freedom Manipulators. Int. J. Control Autom. Syst. 20, 3296–3311 (2022). https://doi.org/10.1007/s12555-021-0511-4