Abstract
This paper describes a simple memory-augmentation technique that employs tabular Q-learning to solve binary cell-structured mazes whose exits are generated randomly at the start of each solution attempt. Standard tabular Q-learning can solve any maze given continuous learning; however, if learning is stopped and the policy is frozen, the agent cannot adapt to newly generated exits. To avoid using Recurrent Neural Networks (RNNs) for such memory-dependent tasks, we designed and implemented a simple external memory that records the agent's cell-visit history. This memory also expands the state representation, helping tabular Q-learning distinguish whether the agent is entering or exiting a maze corridor. Experiments on five maze problems of varying complexity are presented. Each maze has either two or four predefined exits, one of which is randomly assigned at the start of each solution attempt. The results show that tabular Q-learning with a frozen policy can outperform standard deep-learning algorithms, without incorporating RNNs into the model structure.
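As a concrete illustration of the idea in the abstract, the following sketch shows one way an external visit-history memory might augment a tabular Q-learner's state, so that a frozen policy can still react to the agent's own path. It is a minimal sketch, not the authors' implementation; the class name, the visit-count cap, and the reset_memory hook are illustrative assumptions.

    # Minimal sketch (not the authors' code): tabular Q-learning whose state
    # is augmented with an external memory of cell-visit counts.
    import random
    from collections import defaultdict

    class VisitMemoryQLearner:
        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)      # Q[(state, action)] -> value
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.visits = defaultdict(int)   # external memory: cell -> visit count

        def augmented_state(self, cell):
            # Expand the raw cell coordinate with its (capped) visit count,
            # so re-entering a corridor looks different from the first entry.
            return (cell, min(self.visits[cell], 3))

        def act(self, cell, greedy=False):
            state = self.augmented_state(cell)
            if not greedy and random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, cell, action, reward, next_cell):
            state = self.augmented_state(cell)
            self.visits[next_cell] += 1      # record the move in the memory
            next_state = self.augmented_state(next_cell)
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            td_target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

        def reset_memory(self):
            # Clear the visit history at the start of each solution attempt;
            # the Q-table itself (the frozen policy) is kept intact.
            self.visits.clear()

With a frozen policy, only reset_memory and act (with greedy=True) would be called at test time; the Q-table stays fixed while the visit memory supplies the path information the agent needs to adapt to a newly assigned exit.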
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pisheh Var, M., Fairbank, M., Samothrakis, S. (2023). Finding Eulerian Tours in Mazes Using a Memory-Augmented Fixed Policy Function. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_22
DOI: https://doi.org/10.1007/978-3-031-37717-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37716-7
Online ISBN: 978-3-031-37717-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)