Abstract
This paper analyzes the mean-square asymptotic behavior of constant stepsize temporal-difference algorithms. The analysis covers the case of linear (cost-to-go) function approximation and the case of Markov chains with an uncountable state space. An asymptotic upper bound on the mean-square deviation of the algorithm iterates from the optimal parameter value of the (cost-to-go) function approximator achievable by temporal-difference learning is derived as a function of the stepsize.
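As a rough illustration of the quantity the abstract refers to, the sketch below runs a constant stepsize temporal-difference update with a linear approximator and estimates the mean-square deviation of the iterates from the temporal-difference fixed point. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the two-state chain (the paper treats uncountable state spaces), the costs, the scalar feature, the discount factor, the use of TD(0) rather than a general variant, and the names `td_fixed_point` and `mean_square_deviation`.

```python
import random

# Hypothetical two-state example; all numbers below are illustrative assumptions.
P = [[0.9, 0.1], [0.1, 0.9]]   # transition probabilities of the chain
COST = [1.0, 0.0]              # one-stage cost g(s)
PHI = [1.0, 2.0]               # scalar feature phi(s), so theta is a scalar
DISCOUNT = 0.9                 # discount factor
PI = [0.5, 0.5]                # stationary distribution (P is symmetric)


def td_fixed_point():
    """Solve E_pi[phi(s) * (g(s) + a*phi(s')*theta - phi(s)*theta)] = 0 for theta."""
    num = sum(PI[s] * PHI[s] * COST[s] for s in range(2))
    den = sum(
        PI[s] * PHI[s]
        * (PHI[s] - DISCOUNT * sum(P[s][t] * PHI[t] for t in range(2)))
        for s in range(2)
    )
    return num / den


def mean_square_deviation(stepsize, n_steps=200_000, burn_in=20_000, seed=0):
    """Time-averaged squared distance of the TD(0) iterate from the fixed point."""
    rng = random.Random(seed)
    theta_star = td_fixed_point()
    theta, s = 0.0, 0
    total = 0.0
    for k in range(n_steps):
        # Sample the next state from row s of P.
        s_next = 0 if rng.random() < P[s][0] else 1
        # Temporal difference under the linear approximation phi(s) * theta.
        td_error = COST[s] + DISCOUNT * PHI[s_next] * theta - PHI[s] * theta
        # Constant stepsize update: the stepsize is never decreased.
        theta += stepsize * PHI[s] * td_error
        if k >= burn_in:
            total += (theta - theta_star) ** 2
        s = s_next
    return total / (n_steps - burn_in)


if __name__ == "__main__":
    for gamma in (0.1, 0.02, 0.005):
        print(gamma, mean_square_deviation(gamma))
```

With a constant stepsize the iterates do not converge to the fixed point but keep fluctuating around it; running the loop for several stepsizes gives an empirical feel for how the size of that fluctuation depends on the stepsize, which is the dependence the paper's bound describes.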
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tadić, V. (1999). On the Asymptotic Behavior of a Constant Stepsize Temporal-Difference Learning Algorithm. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science, vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_11
DOI: https://doi.org/10.1007/3-540-49097-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65701-9
Online ISBN: 978-3-540-49097-5
eBook Packages: Springer Book Archive