Abstract
Regret minimization is central to both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative in MCTS to cumulative regret, i.e., the regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy to minimize cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique and show the performance of H-MCTS in six distinct two-player games: Amazons, AtariGo, Ataxx, Breakthrough, NoGo, and Pentalath.
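To illustrate the two regret objectives the abstract contrasts, the following is a minimal Python sketch (not the authors' implementation) of the two bandit policies involved: Sequential Halving, which spends a fixed budget to recommend one arm and thereby targets simple regret, and UCB1, the selection rule underlying UCT, which balances exploration and exploitation to keep cumulative regret low. The `arms` argument and reward range are assumptions for the example; H-MCTS itself applies these rules to tree nodes rather than flat bandit arms.

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """UCB1: pick the arm maximizing mean + c * sqrt(ln(N) / n_i).
    Targets cumulative regret; unvisited arms are tried first."""
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # always sample an untried arm before using the bound
        score = v / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def sequential_halving(arms, budget):
    """Sequential Halving: split the budget over ceil(log2 k) rounds,
    sample every surviving arm equally, then discard the worse half.
    Targets simple regret; returns the index of the recommended arm.
    Each arms[i] is assumed to be a no-argument callable returning a
    stochastic reward (a hypothetical interface for this sketch)."""
    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    means = {i: 0.0 for i in survivors}
    pulls = {i: 0 for i in survivors}
    for _ in range(rounds):
        per_arm = max(1, budget // (len(survivors) * rounds))
        for i in survivors:
            for _ in range(per_arm):
                r = arms[i]()
                pulls[i] += 1
                means[i] += (r - means[i]) / pulls[i]  # running mean
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

The contrast is visible in what each function returns: UCB1 chooses where to spend the *next* simulation (good deep in the tree, where every playout's reward counts), while Sequential Halving commits its whole budget before making a single recommendation (good near the root, where only the final move choice matters). H-MCTS stitches the two together at a depth boundary in the search tree.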
© 2014 Springer International Publishing Switzerland
Cite this paper
Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M. (2014). Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds) Computer Games. CGW 2014. Communications in Computer and Information Science, vol 504. Springer, Cham. https://doi.org/10.1007/978-3-319-14923-3_1
Print ISBN: 978-3-319-14922-6
Online ISBN: 978-3-319-14923-3
eBook Packages: Computer Science, Computer Science (R0)