Abstract
An important issue in Reinforcement Learning (RL) is how to accelerate or otherwise improve the learning process. In this paper, we study the influence of some RL parameters on the learning speed. Although the convergence properties of RL have been widely studied, no precise rules exist for correctly choosing the reward function and the initial Q-values. Our method guides the choice of these parameters in the context of reaching a goal in minimal time. We develop a theoretical study and provide experimental justification for choosing, on the one hand, the reward function and, on the other hand, particular initial Q-values based on a goal bias function.
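Since the full chapter text is not reproduced on this page, the sketch below only illustrates the two levers named in the abstract and is not the authors' exact formulation: a tabular Q-learning agent on a small chain world whose Q-table is initialized with a goal bias (here an assumed exponential decay with distance to the goal, `goal_bias`) and trained with a uniform step penalty, a simple reward choice for minimum-time tasks. All names and constants (`N_STATES`, `scale`, the learning parameters) are illustrative.

```python
import math
import random

N_STATES = 20        # states 0 .. 19 on a chain; the goal is the right end
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def goal_bias(state, scale=5.0):
    """Hypothetical goal bias: decays with the distance to the goal."""
    return math.exp(-abs(GOAL - state) / scale)

# Goal-biased initialization instead of the usual all-zeros Q-table.
Q = {(s, a): goal_bias(s) for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic transition with a uniform -1 step penalty."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 0.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy selection; the biased initial values steer the
        # greedy choice toward the goal from the very first episodes.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

Under these assumptions, the biased table plays the same accelerating role as an uninformed agent that must first discover the goal by random exploration, which is the comparison the abstract's "goal bias function" is aimed at.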
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matignon, L., Laurent, G.J., Le Fort-Piat, N. (2006). Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) Artificial Neural Networks – ICANN 2006. Lecture Notes in Computer Science, vol. 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_87
DOI: https://doi.org/10.1007/11840817_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8