
Finding Eulerian Tours in Mazes Using a Memory-Augmented Fixed Policy Function

Conference paper

Intelligent Computing (SAI 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 711)


Abstract

This paper describes a simple memory augmentation technique that enables tabular Q-learning to solve binary-cell-structured mazes whose exit is generated randomly at the start of each solution attempt. Standard tabular Q-learning can solve any maze given continuous learning; however, once learning stops and the policy is frozen, the agent cannot adapt to newly generated exits. To avoid relying on recurrent neural networks (RNNs) for such memory-dependent tasks, we designed and implemented a simple external memory that records the agent's cell-visit history. This memory also augments the state representation, helping tabular Q-learning distinguish whether its current path is entering or exiting a maze corridor. Experiments on five maze problems of varying complexity are presented. Each maze has either two or four predefined exits, one of which is randomly assigned at the start of each solution attempt. The results show that tabular Q-learning with a frozen policy can outperform standard deep-learning algorithms, without incorporating RNNs into the model structure.
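To make the idea concrete, the sketch below shows one way a tabular Q-learning agent's state can be augmented with an external visit-history memory. This is a minimal illustration under our own assumptions, not the authors' published implementation: the memory is modelled as a set of visited cells, a "seen before" flag is folded into the Q-table key, and the class and method names (MemoryAugmentedQAgent, observe, reset_memory) are illustrative only.

# Minimal sketch (our assumption, not the paper's published code) of tabular
# Q-learning whose state is augmented with an external visit-history memory.
import random
from collections import defaultdict

class MemoryAugmentedQAgent:
    """Tabular Q-learning agent with an external 'visited cells' memory.

    The memory is cleared at the start of every solution attempt; the Q-table
    key is the pair (cell, seen_before), so a frozen policy can still behave
    differently the second time it passes through the same corridor cell.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # Q[(state, action)] -> estimated return
        self.actions = actions        # e.g. ["up", "down", "left", "right"]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.visited = set()          # external memory of visited cells
        self.frozen = False           # True once training is stopped

    def reset_memory(self):
        # Called at the start of each solution attempt (new random exit).
        self.visited.clear()

    def observe(self, cell):
        # Augment the raw (row, col) position with a 'seen before' flag,
        # then record the visit in the external memory.
        state = (cell, cell in self.visited)
        self.visited.add(cell)
        return state

    def act(self, state):
        # Epsilon-greedy during training; purely greedy once frozen.
        if not self.frozen and random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; skipped entirely when the policy is frozen.
        if self.frozen:
            return
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

Once the Q-table is frozen, only the visited-cell memory changes between attempts, which, in this sketch, is what allows a fixed policy to react differently to corridors it has already explored.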

Author information

Corresponding author

Correspondence to Mahrad Pisheh Var.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Pisheh Var, M., Fairbank, M., Samothrakis, S. (2023). Finding Eulerian Tours in Mazes Using a Memory-Augmented Fixed Policy Function. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_22
