An AlphaZero-Inspired Approach to Solving Search Problems

Chapter in: Decision Making Under Uncertainty and Constraints

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 217)

Abstract

AlphaZero and its extension MuZero are computer programs that use machine-learning techniques to play at a superhuman level in chess, Go, and a few other games. They achieved this level of play solely with reinforcement learning from self-play, without any domain knowledge except the game rules. It is a natural idea to adapt the methods and techniques used in AlphaZero to solving search problems such as the Boolean satisfiability problem (in its search version). Given a search problem, how should it be represented for an AlphaZero-inspired solver? What are the “rules of solving” for this search problem? We describe possible representations in terms of easy-instance solvers and self-reductions, and we give examples of such representations for the satisfiability problem. We also describe a version of Monte Carlo tree search adapted for search problems.
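
As a concrete illustration of the two ingredients named in the abstract, the sketch below shows what an easy-instance solver and a self-reduction can look like for CNF satisfiability. This is a minimal sketch only, not the chapter's formal definitions: the clause representation (DIMACS-style integer literals) and the function names assign, easy_answer, self_reduce, and solve are assumptions made for this illustration, and the plain depth-first driver stands in for the learned, MCTS-guided search an AlphaZero-style solver would use.

# A minimal sketch (not from the chapter) of an easy-instance solver and a
# self-reduction for SAT. A CNF formula is a list of clauses; a clause is a
# list of nonzero ints (DIMACS-style literals: v means "variable v is true",
# -v means "variable v is false").

from typing import Dict, List, Optional, Tuple

Clause = List[int]
CNF = List[Clause]

def assign(cnf: CNF, lit: int) -> CNF:
    """Simplify cnf under the assumption that literal lit is true."""
    simplified = []
    for clause in cnf:
        if lit in clause:                      # clause already satisfied: drop it
            continue
        simplified.append([l for l in clause if l != -lit])  # remove falsified literal
    return simplified

def easy_answer(cnf: CNF) -> Optional[bool]:
    """Easy-instance solver: decides only trivial formulas."""
    if not cnf:
        return True                            # no clauses left: satisfiable
    if any(len(clause) == 0 for clause in cnf):
        return False                           # empty clause: unsatisfiable
    return None                                # not an easy instance

def self_reduce(cnf: CNF) -> Tuple[int, CNF, CNF]:
    """One self-reduction step: branch on a variable, producing two smaller
    instances whose solutions yield a solution of cnf."""
    var = abs(cnf[0][0])                       # naive, non-learned branching rule
    return var, assign(cnf, var), assign(cnf, -var)

def solve(cnf: CNF) -> Optional[Dict[int, bool]]:
    """Plain depth-first driver; an AlphaZero-style solver would instead let a
    learned policy/value network (via MCTS) choose which branch to explore."""
    answer = easy_answer(cnf)
    if answer is True:
        return {}
    if answer is False:
        return None
    var, pos_branch, neg_branch = self_reduce(cnf)
    for value, branch in ((True, pos_branch), (False, neg_branch)):
        model = solve(branch)
        if model is not None:
            model[var] = value
            return model
    return None

# (x1 or x2) and (not x1 or x2) is satisfied, e.g., by x1 = x2 = True.
print(solve([[1, 2], [-1, 2]]))                # -> {2: True, 1: True}

In this scheme, the branching rule inside self_reduce is the natural place for learning: instead of always picking the first variable, a trained policy/value network would score the candidate reductions, and the Monte Carlo tree search variant mentioned in the abstract would use those scores to decide which sub-instance to expand next.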

Author information

Corresponding author: Evgeny Dantsin

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dantsin, E., Kreinovich, V., Wolpert, A. (2023). An AlphaZero-Inspired Approach to Solving Search Problems. In: Ceberio, M., Kreinovich, V. (eds) Decision Making Under Uncertainty and Constraints. Studies in Systems, Decision and Control, vol 217. Springer, Cham. https://doi.org/10.1007/978-3-031-16415-6_20
