Temporal Difference Learning and Simulated Annealing for Optimal Control: A Case Study

Leng, Jinsong; Sathyaraj, Beulah M.; Jain, Lakhmi

doi:10.1007/978-3-540-78582-8_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4953)

Included in the following conference series:

KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications

Abstract

The trade-off between exploration and exploitation has an important impact on the performance of temporal difference learning. Several action selection strategies exist, but it is unclear which performs better, and their impact may depend on the application domain and human factors. This paper presents a modified Sarsa(λ) control algorithm that samples actions in conjunction with a simulated annealing technique. A game of soccer, which has a large, dynamic and continuous state space, is used as the simulation environment. The empirical results demonstrate that the simulated annealing approach significantly improves the quality of convergence.
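The abstract does not spell out the modified algorithm, so the following is only a minimal sketch of one way simulated annealing might be combined with Sarsa(λ) action selection. It assumes a Metropolis-style acceptance test, a geometric cooling schedule, accumulating eligibility traces, and a tabular toy corridor in place of the soccer simulation. All function names, parameters, and values here are hypothetical and are not taken from the paper.

```python
import math
import random


def metropolis_action(q_values, temperature, rng):
    """Pick an action with a Metropolis-style acceptance test (assumed rule).

    A uniformly random candidate replaces the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = rng.randrange(len(q_values))
    if temperature <= 0.0:
        return greedy
    accept_prob = math.exp((q_values[candidate] - q_values[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def sarsa_lambda_sa(n_states=6, episodes=200, alpha=0.1, gamma=0.95, lam=0.8,
                    t_start=1.0, cooling=0.98, max_steps=1000, seed=0):
    """Tabular Sarsa(lambda) on a toy corridor; actions: 0 = left, 1 = right.

    The agent starts in state 0 and receives reward 1 on reaching the
    rightmost state. The temperature is cooled once per episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    temperature = t_start
    for _ in range(episodes):
        traces = [[0.0, 0.0] for _ in range(n_states)]
        state = 0
        action = metropolis_action(q[state], temperature, rng)
        steps = 0
        while state != goal and steps < max_steps:
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            next_action = metropolis_action(q[next_state], temperature, rng)
            # The terminal state has no successor value.
            bootstrap = 0.0 if next_state == goal else q[next_state][next_action]
            delta = reward + gamma * bootstrap - q[state][action]
            traces[state][action] += 1.0  # accumulating trace
            for s in range(n_states):
                for a in range(2):
                    q[s][a] += alpha * delta * traces[s][a]
                    traces[s][a] *= gamma * lam
            state, action = next_state, next_action
            steps += 1
        temperature *= cooling  # geometric cooling (assumed schedule)
    return q, temperature


if __name__ == "__main__":
    q_table, final_temperature = sarsa_lambda_sa()
    print("final temperature:", final_temperature)
    for s, row in enumerate(q_table):
        print(s, [round(v, 3) for v in row])
```

At high temperature nearly every candidate is accepted, so the agent explores almost uniformly. As the temperature falls, lower-valued candidates are rejected more often and the policy approaches greedy selection.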

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA 5095, Australia
Jinsong Leng, Beulah M. Sathyaraj & Lakhmi Jain

Editor information

Ngoc Thanh Nguyen, Geun Sik Jo, Robert J. Howlett, Lakhmi C. Jain

About this paper

Cite this paper

Leng, J., Sathyaraj, B.M., Jain, L. (2008). Temporal Difference Learning and Simulated Annealing for Optimal Control: A Case Study. In: Nguyen, N.T., Jo, G.S., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2008. Lecture Notes in Computer Science, vol 4953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78582-8_50

DOI: https://doi.org/10.1007/978-3-540-78582-8_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78581-1
Online ISBN: 978-3-540-78582-8
eBook Packages: Computer Science, Computer Science (R0)
