Abstract
This paper describes a simple memory-augmentation technique that employs tabular Q-learning to solve binary cell-structured mazes whose exits are generated randomly at the start of each solution attempt. Standard tabular Q-learning can solve any maze given continuous learning; however, if learning is stopped and the policy is frozen, the agent cannot adapt to newly generated exits. To avoid using Recurrent Neural Networks (RNNs) for such memory-dependent tasks, we designed and implemented a simple external memory that records the agent's cell-visit history. This memory also expands the state representation, helping tabular Q-learning distinguish whether the agent is entering or exiting a maze corridor. Experiments on five maze problems of varying complexity are presented. Each maze has either two or four predefined exits, one of which is randomly assigned at the start of each solution attempt. The results show that tabular Q-learning with a frozen policy can outperform standard deep-learning algorithms, without incorporating RNNs into the model structure.
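As a concrete illustration of the idea in the abstract, the following sketch shows one way an external visit-history memory might augment a tabular Q-learner's state, so that a frozen policy can still react to the agent's own path. It is a minimal sketch, not the authors' implementation; the class name, the visit-count cap, and the reset_memory hook are illustrative assumptions.

    # Minimal sketch (not the authors' code): tabular Q-learning whose state
    # is augmented with an external memory of cell-visit counts.
    import random
    from collections import defaultdict

    class VisitMemoryQLearner:
        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)      # Q[(state, action)] -> value
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.visits = defaultdict(int)   # external memory: cell -> visit count

        def augmented_state(self, cell):
            # Expand the raw cell coordinate with its (capped) visit count,
            # so re-entering a corridor looks different from the first entry.
            return (cell, min(self.visits[cell], 3))

        def act(self, cell, greedy=False):
            state = self.augmented_state(cell)
            if not greedy and random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, cell, action, reward, next_cell):
            state = self.augmented_state(cell)
            self.visits[next_cell] += 1      # record the move in the memory
            next_state = self.augmented_state(next_cell)
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            td_target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

        def reset_memory(self):
            # Clear the visit history at the start of each solution attempt;
            # the Q-table itself (the frozen policy) is kept intact.
            self.visits.clear()

With a frozen policy, only reset_memory and act (with greedy=True) would be called at test time; the Q-table stays fixed while the visit memory supplies the path information the agent needs to adapt to a newly assigned exit.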
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pisheh Var, M., Fairbank, M., Samothrakis, S. (2023). Finding Eulerian Tours in Mazes Using a Memory-Augmented Fixed Policy Function. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_22
DOI: https://doi.org/10.1007/978-3-031-37717-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37716-7
Online ISBN: 978-3-031-37717-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)