Abstract
A minimax version of temporal difference learning (minimax TD-learning) is presented, analogous to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum Markov game with imperfect information. Two different criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Practical aspects of linear programming and fictitious play for solving matrix games are also discussed.
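The abstract mentions fictitious play as one of the methods used for solving matrix games. A minimal sketch of the idea, for illustration only: each player repeatedly best-responds to the opponent's empirical mixture of past plays, and for two-player zero-sum games the empirical frequencies converge to a minimax solution. The payoff matrix below (matching pennies) is a hypothetical example, not a game from the paper.

```python
import numpy as np

# Hypothetical example: matching pennies, payoffs for the row player.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def fictitious_play(A, iterations=50000):
    """Approximate a minimax solution of the zero-sum matrix game A."""
    m, n = A.shape
    row_counts = np.zeros(m)  # how often each row action has been played
    col_counts = np.zeros(n)
    row_counts[0] = 1.0       # arbitrary initial plays
    col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixture:
        # the row player maximizes, the column player minimizes.
        row_br = np.argmax(A @ (col_counts / col_counts.sum()))
        col_br = np.argmin((row_counts / row_counts.sum()) @ A)
        row_counts[row_br] += 1.0
        col_counts[col_br] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

x, y = fictitious_play(A)
game_value = x @ A @ y  # should approach 0, the value of matching pennies
```

For matching pennies both empirical mixtures approach the uniform strategy (1/2, 1/2); convergence is slow, which is one of the practical trade-offs against linear programming that the paper discusses.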
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Dahl, F.A., Halck, O.M. (2000). Minimax TD-Learning with Neural Nets in a Markov Game. In: López de Mántaras, R., Plaza, E. (eds) Machine Learning: ECML 2000. ECML 2000. Lecture Notes in Computer Science(), vol 1810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45164-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67602-7
Online ISBN: 978-3-540-45164-8