Abstract
A Monte-Carlo evaluation estimates the value of a position by averaging the outcomes of several random continuations. The method can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework for combining tree search with Monte-Carlo evaluation that does not separate a min-max phase from a Monte-Carlo phase. Instead of backing up the min-max value close to the root and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity. The resulting algorithm was implemented in a 9×9 Go-playing program, Crazy Stone, which won the 10th KGS computer-Go tournament.
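The abstract does not state the backup operator's formula, so the Python sketch below only illustrates the general idea of a value that moves from the average toward the maximum as simulations accumulate. The function name `blended_backup`, the weight `n / (n + k)`, and the constant `k` are assumptions made for this illustration; they are not the operator defined in the paper.

```python
def blended_backup(child_values, child_counts, k=16.0):
    """Blend the mean and the max of the child values (illustrative only).

    child_values: estimated value of each child, seen from the player
                  to move at the parent node.
    child_counts: number of simulations that went through each child.
    k:            hypothetical constant controlling how quickly the
                  result shifts from averaging to maximizing.
    """
    total = sum(child_counts)
    if total == 0:
        raise ValueError("at least one simulation is required")

    # Averaging backup: mean outcome of all simulations through the parent.
    mean = sum(v * c for v, c in zip(child_values, child_counts)) / total

    # Min-max style backup: value of the best child visited so far.
    best = max(v for v, c in zip(child_values, child_counts) if c > 0)

    # Assumed weight: near 0 with few simulations, near 1 with many.
    w = total / (total + k)
    return (1.0 - w) * mean + w * best


if __name__ == "__main__":
    values = [0.2, 0.8]
    print(blended_backup(values, [1, 1]))        # few simulations: near the mean
    print(blended_backup(values, [1000, 1000]))  # many simulations: near the max
```

With few simulations the result stays close to the plain average, which is less sensitive to noise in individual children; with many simulations it approaches the value of the best child, as a min-max backup would.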
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coulom, R. (2007). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M. (eds) Computers and Games. CG 2006. Lecture Notes in Computer Science, vol 4630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75538-8_7
DOI: https://doi.org/10.1007/978-3-540-75538-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75537-1
Online ISBN: 978-3-540-75538-8
eBook Packages: Computer Science, Computer Science (R0)