Abstract
In recent years, several control policies for multi-degree-of-freedom (DOF) manipulators based on deep reinforcement learning have been proposed. To avoid complexity, previous studies imposed various constraints on the high-dimensional state-action space, which hinders learning a generalized policy function. In this study, the control problem is addressed by introducing a hierarchical reinforcement learning method that learns an end-to-end control policy for a multi-DOF manipulator without any constraints on the state-action space. The proposed method learns the hierarchical policy using two off-policy methods. Using human demonstration data and a newly proposed data-correction method, the resulting end-to-end controller for the multi-DOF manipulator is shown to outperform non-hierarchical deep reinforcement learning methods.
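To make the hierarchical control scheme concrete, the following is a minimal toy sketch of the general idea of a two-level, goal-conditioned hierarchy: a high-level policy proposes subgoals at a fixed interval, and a low-level policy acts toward the current subgoal. A point in joint space stands in for the real multi-DOF manipulator, and both policies are hand-coded proportional rules rather than the learned off-policy networks used in the paper; all names and constants here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

K = 5  # high-level decision interval: steps between subgoal updates (assumed)

def high_level_policy(state, final_goal):
    # Propose a subgoal a bounded step from the current state toward the goal.
    direction = final_goal - state
    dist = np.linalg.norm(direction)
    step = min(dist, 1.0)
    return state + (direction / (dist + 1e-8)) * step

def low_level_policy(state, subgoal):
    # Act toward the current subgoal (a proportional-control stand-in for
    # the learned low-level policy).
    return 0.5 * (subgoal - state)

def rollout(start, goal, steps=50):
    state = start.astype(float)
    subgoal = state
    for t in range(steps):
        if t % K == 0:                       # re-plan the subgoal every K steps
            subgoal = high_level_policy(state, goal)
        state = state + low_level_policy(state, subgoal)
        # Intrinsic reward for the low level: negative distance to its subgoal.
        r_intrinsic = -np.linalg.norm(state - subgoal)
    return state

final = rollout(np.zeros(3), np.array([3.0, -2.0, 1.0]))
print(np.round(final, 2))
```

In a learned version, each level would be trained with its own off-policy algorithm (e.g., an actor-critic per level), with the low level rewarded for reaching subgoals and the high level for overall task progress.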
This research was supported by the MOTIE under the Industrial Foundation Technology Development Program supervised by the KEIT (No. 20008613).
Cheol-Hui Min received his B.S. degree in mechanical engineering from Korea University, in 2017, and an M.S. degree in mechanical engineering from Korea University, in 2019. He is currently pursuing a Ph.D. degree in electrical and computer engineering at Seoul National University. His research interests include 3D vision for robotics, model-based deep reinforcement learning, and optimal control.
Jae-Bok Song received his B.S. and M.S. degrees in mechanical engineering from Seoul National University, in 1983 and 1985, respectively. He received his Ph.D. degree from M.I.T. in 1992. He is currently a Professor at the School of Mechanical Engineering at Korea University, where he has served as the director of the Intelligent Robotics Laboratory since 1993. His research interests include robot safety and robotic system design and control.
Cite this article
Min, CH., Song, JB. Hierarchical End-to-end Control Policy for Multi-degree-of-freedom Manipulators. Int. J. Control Autom. Syst. 20, 3296–3311 (2022). https://doi.org/10.1007/s12555-021-0511-4