Abstract
Reinforcement Learning (RL) is a well-established technique for learning solutions to control problems from an agent's interactions with its domain. However, RL is known to be inefficient in real-world problems, where the state space and the set of actions grow quickly. Recently, heuristics, case-based reasoning (CBR), and transfer learning have been used as tools to accelerate the RL process. This paper investigates a class of algorithms called Transfer Learning Heuristically Accelerated Reinforcement Learning (TLHARL), which uses CBR as a heuristic within a transfer-learning setting to accelerate RL. The main contributions of this work are a new TLHARL algorithm based on the traditional RL algorithm Q(λ) and the application of TLHARL to two distinct real-robot domains: robot soccer with small-scale robots and humanoid-robot stability learning. Experimental results show that the proposed method significantly improves the learning rate in both domains.
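The core idea behind heuristically accelerated RL is to bias, but not replace, the learned action values with a heuristic bonus (here derived from retrieved cases). The following is a minimal sketch of that action-selection rule; the names `Q`, `H`, and the weight `xi` are illustrative, and the full TLHARL algorithm additionally builds on Q(λ) with eligibility traces, which this sketch omits.

```python
import random


def choose_action(Q, H, state, actions, xi=1.0, epsilon=0.1):
    """Heuristically accelerated epsilon-greedy action selection (sketch).

    Q: dict mapping (state, action) -> learned action value
    H: dict mapping (state, action) -> heuristic bonus, e.g. derived from
       cases retrieved via CBR (zero when no case applies)
    xi: weight controlling how strongly the heuristic biases selection
    """
    if random.random() < epsilon:
        return random.choice(actions)  # explore as in plain epsilon-greedy
    # Exploit: the heuristic H shifts the argmax toward case-suggested
    # actions while the underlying Q-values are still learned normally.
    return max(actions, key=lambda a: Q.get((state, a), 0.0)
                                      + xi * H.get((state, a), 0.0))
```

Because the heuristic only enters the action-selection step, the convergence properties of the underlying temporal-difference update are preserved; a misleading heuristic slows learning but does not prevent it.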
Acknowledgements
Reinaldo Bianchi acknowledges support from FAPESP (2016/21047-3), Paulo E. Santos acknowledges support from FAPESP-IBM (2016/18792-9) and CNPq (307093/2014-0), Isaac J. da Silva acknowledges support from CAPES, and Ramon Lopez de Mantaras acknowledges support from Generalitat de Catalunya Research Grant 2014 SGR 118 and CSIC Project 201550E022.
Cite this article
Bianchi, R.A.C., Santos, P.E., da Silva, I.J. et al. Heuristically Accelerated Reinforcement Learning by Means of Case-Based Reasoning and Transfer Learning. J Intell Robot Syst 91, 301–312 (2018). https://doi.org/10.1007/s10846-017-0731-2