Abstract
A minimax version of temporal difference learning (minimax TD-learning) is presented, analogous to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum Markov game with imperfect information. Two different criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Practical aspects of linear programming and fictitious play for solving matrix games are also discussed.
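The abstract mentions fictitious play as one of the methods used for solving matrix games. A minimal sketch of the idea, for illustration only: each player repeatedly best-responds to the opponent's empirical mixture of past plays, and for two-player zero-sum games the empirical frequencies converge to a minimax solution. The payoff matrix below (matching pennies) is a hypothetical example, not a game from the paper.

```python
import numpy as np

# Hypothetical example: matching pennies, payoffs for the row player.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def fictitious_play(A, iterations=50000):
    """Approximate a minimax solution of the zero-sum matrix game A."""
    m, n = A.shape
    row_counts = np.zeros(m)  # how often each row action has been played
    col_counts = np.zeros(n)
    row_counts[0] = 1.0       # arbitrary initial plays
    col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixture:
        # the row player maximizes, the column player minimizes.
        row_br = np.argmax(A @ (col_counts / col_counts.sum()))
        col_br = np.argmin((row_counts / row_counts.sum()) @ A)
        row_counts[row_br] += 1.0
        col_counts[col_br] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

x, y = fictitious_play(A)
game_value = x @ A @ y  # should approach 0, the value of matching pennies
```

For matching pennies both empirical mixtures approach the uniform strategy (1/2, 1/2); convergence is slow, which is one of the practical trade-offs against linear programming that the paper discusses.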
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Dahl, F.A., Halck, O.M. (2000). Minimax TD-Learning with Neural Nets in a Markov Game. In: López de Mántaras, R., Plaza, E. (eds) Machine Learning: ECML 2000. ECML 2000. Lecture Notes in Computer Science(), vol 1810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45164-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67602-7
Online ISBN: 978-3-540-45164-8