Abstract
Regret minimization is central to both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative in MCTS to cumulative regret, i.e., the regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy to minimize cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique and show the performance of H-MCTS in six distinct two-player games: Amazons, AtariGo, Ataxx, Breakthrough, NoGo, and Pentalath.
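To illustrate the two regret objectives the abstract contrasts, the following is a minimal Python sketch (not the authors' implementation) of the two bandit policies involved: Sequential Halving, which spends a fixed budget to recommend one arm and thereby targets simple regret, and UCB1, the selection rule underlying UCT, which balances exploration and exploitation to keep cumulative regret low. The `arms` argument and reward range are assumptions for the example; H-MCTS itself applies these rules to tree nodes rather than flat bandit arms.

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """UCB1: pick the arm maximizing mean + c * sqrt(ln(N) / n_i).
    Targets cumulative regret; unvisited arms are tried first."""
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            return i  # always sample an untried arm before using the bound
        score = v / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def sequential_halving(arms, budget):
    """Sequential Halving: split the budget over ceil(log2 k) rounds,
    sample every surviving arm equally, then discard the worse half.
    Targets simple regret; returns the index of the recommended arm.
    Each arms[i] is assumed to be a no-argument callable returning a
    stochastic reward (a hypothetical interface for this sketch)."""
    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    means = {i: 0.0 for i in survivors}
    pulls = {i: 0 for i in survivors}
    for _ in range(rounds):
        per_arm = max(1, budget // (len(survivors) * rounds))
        for i in survivors:
            for _ in range(per_arm):
                r = arms[i]()
                pulls[i] += 1
                means[i] += (r - means[i]) / pulls[i]  # running mean
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

The contrast is visible in what each function returns: UCB1 chooses where to spend the *next* simulation (good deep in the tree, where every playout's reward counts), while Sequential Halving commits its whole budget before making a single recommendation (good near the root, where only the final move choice matters). H-MCTS stitches the two together at a depth boundary in the search tree.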
© 2014 Springer International Publishing Switzerland
Cite this paper
Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M. (2014). Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds) Computer Games. CGW 2014. Communications in Computer and Information Science, vol 504. Springer, Cham. https://doi.org/10.1007/978-3-319-14923-3_1
Print ISBN: 978-3-319-14922-6
Online ISBN: 978-3-319-14923-3
eBook Packages: Computer Science, Computer Science (R0)