Abstract
In this paper, we consider reinforcement learning in systems with an unknown environment, where the agent must trade off efficiently between exploration (long-term optimization) and exploitation (short-term optimization). The ε-greedy algorithm uses a near-greedy action selection rule: it behaves greedily (exploitation) most of the time, but with a small probability ε it instead selects an action at random (exploration). Many previous works have shown that such undirected random exploration is inefficient at driving the agent towards poorly modeled states. This study therefore evaluates the role of heuristic-based exploration in reinforcement learning. We propose three methods: neighborhood-search-based exploration, simulated-annealing-based exploration, and tabu-search-based exploration. All three techniques follow the same rule: "explore the least-visited state". In simulation, these techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation).
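To make the contrast concrete, below is a minimal sketch, not the authors' implementation: the tabular Q-values, visit counts, and known successor function are illustrative assumptions. It contrasts undirected ε-greedy selection with a count-directed rule in the spirit of "explore the least-visited state", which the proposed heuristics refine.

```python
import random
from collections import defaultdict

def epsilon_greedy(state, actions, q_values, epsilon=0.1):
    """Undirected epsilon-greedy: random action with probability epsilon,
    otherwise the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(list(actions))                       # undirected exploration
    return max(actions, key=lambda a: q_values[(state, a)])       # exploitation

def count_directed(state, actions, q_values, visit_counts, successor, epsilon=0.1):
    """Directed variant (illustrative): exploration steps move to the
    least-visited successor state instead of a uniformly random action."""
    if random.random() < epsilon:
        return min(actions, key=lambda a: visit_counts[successor(state, a)])
    return max(actions, key=lambda a: q_values[(state, a)])

# Minimal usage on a toy grid: tables start empty, successor dynamics assumed known.
q_values = defaultdict(float)
visit_counts = defaultdict(int)
successor = lambda s, a: (s[0] + a[0], s[1] + a[1])               # deterministic moves
actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
a = count_directed((0, 0), actions, q_values, visit_counts, successor)
visit_counts[successor((0, 0), a)] += 1
```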
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Anh Vien, N., Hoang Viet, N., Lee, S., Chung, T. (2007). Heuristic Search Based Exploration in Reinforcement Learning. In: Sandoval, F., Prieto, A., Cabestany, J., Graña, M. (eds) Computational and Ambient Intelligence. IWANN 2007. Lecture Notes in Computer Science, vol 4507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73007-1_14
DOI: https://doi.org/10.1007/978-3-540-73007-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73006-4
Online ISBN: 978-3-540-73007-1