Abstract
We benchmark common reinforcement learning algorithms on a modified version of OpenAI Gym's CartPole, a virtual environment that simulates an inverted pendulum. While Policy Gradient, Actor-Critic, and Proximal Policy Optimization can all balance the pendulum, only Policy Gradient and Actor-Critic learn to do so quickly and consistently in simulation. After the trained models are transferred to the real world, all three algorithms satisfactorily balance a real inverted pendulum, and Actor-Critic rejects disturbances best among the algorithms tested.
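The paper does not reproduce source code, but the benchmarked algorithms are standard. As a rough illustration of the simplest of the three, the following is a minimal sketch of a vanilla policy-gradient (REINFORCE) agent on Gym's CartPole. It assumes PyTorch and the classic Gym API; the network size, learning rate, discount factor, and episode count are illustrative placeholders, not the authors' settings.

```python
# Minimal REINFORCE (vanilla policy gradient) on CartPole.
# Illustrative sketch only -- not the authors' code. Assumes the classic
# Gym API (env.reset() -> obs; env.step() -> obs, reward, done, info);
# all hyperparameters below are placeholders.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(4, 64), nn.Tanh(),  # 4 observations: x, x_dot, theta, theta_dot
    nn.Linear(64, 2),             # 2 discrete actions: push left / push right
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # Discounted returns-to-go serve as the per-step weight.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.append(g)
    returns.reverse()
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # Gradient ascent on expected return = descent on negative weighted log-probs.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Actor-Critic and PPO build on this same objective: Actor-Critic replaces the normalized return with a learned advantage estimate, and PPO additionally clips the policy-update ratio to keep each step conservative.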
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bates, D., Tran, H. (2022). Benchmarking Virtual Reinforcement Learning Algorithms to Balance a Real Inverted Pendulum. In: Arai, K. (ed.) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol. 296. Springer, Cham. https://doi.org/10.1007/978-3-030-82199-9_17
DOI: https://doi.org/10.1007/978-3-030-82199-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82198-2
Online ISBN: 978-3-030-82199-9
eBook Packages: Intelligent Technologies and Robotics (R0)