Abstract
The sample complexity of a reinforcement-learning algorithm is highly coupled to how proficiently it explores, which in turn depends critically on the effective size of its state space. This paper proposes a new exploration mechanism for model-based algorithms in continuous state spaces that automatically discovers the relevant dimensions of the environment. We show that this information can be used to dramatically decrease the sample complexity of the algorithm over conventional exploration techniques. This improvement is achieved by maintaining a low-dimensional representation of the transition function. Empirical evaluations in several environments, including simulation benchmarks and a real robotics domain, suggest that the new method outperforms state-of-the-art algorithms and that the behavior is robust and stable.
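The abstract's core idea, discovering which state dimensions the transition function actually depends on, can be illustrated with a small sketch. The snippet below is not the paper's algorithm; it is a generic PCA-style analysis (via SVD) of observed one-step transition deltas, shown only to make the notion of "relevant dimensions" concrete. The function name, the variance threshold, and the toy 5-D environment are all assumptions introduced for illustration.

```python
import numpy as np

def relevant_dimensions(states, next_states, var_threshold=0.99):
    """Estimate an effective dimensionality of the transition dynamics.

    Illustrative sketch only (not the method of Nouri & Littman 2010):
    runs PCA on the observed transition deltas and returns the number of
    principal directions needed to explain `var_threshold` of their
    variance, together with the corresponding projection basis.
    """
    deltas = next_states - states                 # one-step state changes
    centered = deltas - deltas.mean(axis=0)
    # Singular values measure the delta variance along each principal axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, var_threshold) + 1)
    return k, vt[:k]                              # rank and low-dim basis

# Toy example: a 5-D state space whose dynamics only involve dims 0 and 1.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 5))
S_next = S.copy()
S_next[:, 0] += 0.5 * S[:, 0]                        # state-dependent drift
S_next[:, 1] += rng.normal(scale=0.1, size=200)      # noisy motion
k, basis = relevant_dimensions(S, S_next)            # k == 2 here
```

In a model-based learner, such a basis could let the transition model (and hence the exploration bonus) be fit in the recovered low-dimensional space rather than the full state space, which is the intuition behind the sample-complexity gains the abstract claims.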
Editors: José L Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag.
Cite this article
Nouri, A., Littman, M.L. Dimension reduction and its application to model-based exploration in continuous spaces. Mach Learn 81, 85–98 (2010). https://doi.org/10.1007/s10994-010-5202-y