Abstract
Deep Reinforcement Learning (DRL) is a promising Machine Learning technique that enables robotic systems to learn high-dimensional control policies efficiently. However, generating good policies requires carefully defining appropriate reward functions, state spaces, and action spaces. There is no unique methodology for making these choices, and parameter tuning is time-consuming. In this paper, we investigate how the choice of both the reward function and the hyper-parameters affects the quality of the learned policy. To this aim, we compare four DRL algorithms when learning continuous torque control policies for manipulation tasks via a model-free approach. Specifically, we simulate a manipulator robot and formulate two tasks: random target reaching and a pick-and-place application, each with two different reward functions. We then select the algorithms and multiple hyper-parameters and exhaustively compare their learning performance across the two tasks. Finally, we include the simulated and real-world execution of our best policies. The obtained performance demonstrates the validity of our proposal. Users can follow our approach when selecting the best-performing algorithm for a given assignment. Moreover, they can exploit our results to solve the same tasks, even with other manipulator robots. The generated policies are easily portable to a physical setup while guaranteeing a close match between simulated and real behaviors.
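To illustrate the kind of reward-function design choice the abstract refers to, the following is a minimal sketch of a dense (distance-based) and a sparse reward for a random-target reaching task. All function names, weights, and tolerances here are illustrative assumptions, not the paper's actual formulations.

```python
import numpy as np

def reaching_reward(ee_pos, target_pos, action, dist_weight=1.0, ctrl_weight=0.01):
    """Dense shaped reward for a reaching task (illustrative).

    Penalizes the Euclidean distance between the end-effector and the
    target, plus a small quadratic cost on the commanded torques to
    discourage aggressive control actions.
    """
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos))
    ctrl_cost = ctrl_weight * float(np.square(action).sum())
    return -dist_weight * dist - ctrl_cost

def sparse_reaching_reward(ee_pos, target_pos, tol=0.05):
    """Sparse alternative: reward only within a tolerance of the target."""
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos))
    return 1.0 if dist < tol else 0.0
```

Dense rewards typically speed up early learning by providing a gradient everywhere, while sparse rewards avoid biasing the solution but make exploration harder; comparing such variants per task is the kind of trade-off the paper's experiments address.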
A. Franceschetti and E. Tosello—Equal contribution.
Notes
1. Developed MuJoCo model of a UR5 manipulator equipped with a Robotiq 3-finger gripper available at http://www.mujoco.org/forum/index.php?resources/universal-robots-ur5-robotiq-s-model-3-finger-gripper.22/.
2. Universal Robots UR5 specs available at https://www.universal-robots.com/products/ur5-robot/.
3. Robotiq 3-Finger adaptive gripper specs available at https://robotiq.com/products/3-finger-adaptive-robot-gripper.
4. Unified Robot Description Format (URDF) definition available at http://wiki.ros.org/urdf.
5. rllabplusplus available at https://github.com/shaneshixiang/rllabplusplus.
6. Video of performed experiments available at https://youtu.be/W1EMChcjkKA.
Acknowledgments
Part of this work was supported by MIUR (Italian Minister for Education), under the initiative Departments of Excellence (Law 232/2016), and by Fondazione Cariverona, under the project Collaborazione Uomo-Robot per Assemblaggi Manuali Intelligenti (CURAMI).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Franceschetti, A., Tosello, E., Castaman, N., Ghidoni, S. (2022). Robotic Arm Control and Task Training Through Deep Reinforcement Learning. In: Ang Jr, M.H., Asama, H., Lin, W., Foong, S. (eds) Intelligent Autonomous Systems 16. IAS 2021. Lecture Notes in Networks and Systems, vol 412. Springer, Cham. https://doi.org/10.1007/978-3-030-95892-3_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95891-6
Online ISBN: 978-3-030-95892-3
eBook Packages: Intelligent Technologies and Robotics (R0)