Abstract
This paper presents a Q-learning method that works in continuous domains. Our approach is further characterized by the use of an incremental topology-preserving map (ITPM) to partition the input space, and by the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the “winning unit” weighted by their Q-values. TD(λ) then updates the Q-values of the discrete actions according to their contribution to the executed action. Units are created incrementally, and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward and average-reward Q-learning in an infinite-horizon robotics task.
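To make the mechanism described above concrete, here is a minimal sketch of one decision step: find the winning ITPM unit for the current input, blend its M discrete actions into a continuous action using weights derived from their Q-values, and update each discrete action's Q-value in proportion to its contribution. All names (`Unit`, `winning_unit`, `continuous_action`, `td_update`) are illustrative assumptions, as are the softmax weighting and the one-step (λ = 0) update; the paper itself uses TD(λ) with eligibility traces.

```python
import numpy as np

class Unit:
    """One ITPM unit: a prototype covering a region of the input space,
    with Q-values for M discrete actions spanning the action range.
    (Hypothetical structure for illustration, not the authors' code.)"""
    def __init__(self, prototype, actions):
        self.w = np.asarray(prototype, dtype=float)   # position in input space
        self.actions = np.asarray(actions, dtype=float)  # M discrete actions
        self.q = np.zeros(len(actions))               # one Q-value per action

def winning_unit(units, x):
    # Winning unit = the one whose prototype is nearest the current input x.
    return min(units, key=lambda u: np.linalg.norm(u.w - x))

def continuous_action(unit):
    # Continuous action = average of the unit's discrete actions weighted
    # by their Q-values. Softmax weights keep the average well defined
    # when Q-values are negative; this is one plausible choice, not
    # necessarily the paper's.
    weights = np.exp(unit.q - unit.q.max())
    weights /= weights.sum()
    return float(weights @ unit.actions), weights

def td_update(unit, weights, reward, next_value, alpha=0.1, gamma=0.95):
    # One-step TD target (lambda = 0 for brevity). Each discrete action's
    # Q-value moves toward the target in proportion to its contribution
    # (its mixing weight) to the executed continuous action.
    target = reward + gamma * next_value
    unit.q += alpha * weights * (target - unit.q)

# Sketch of one interaction step:
units = [Unit(prototype=[0.0], actions=np.linspace(-1.0, 1.0, 5))]
u = winning_unit(units, x=np.array([0.1]))
a, w = continuous_action(u)
# ...execute a, observe reward r and the next input x'...
td_update(u, w, reward=0.5, next_value=max(winning_unit(units, [0.2]).q))
```

The softmax here merely keeps the mixing weights positive and normalized; per the abstract, the method weights the discrete actions directly by their Q-values, which fits naturally with the knowledge-based initialization of new units' Q-values.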
Cite this article
Millán, J.d.R., Posenato, D. & Dedieu, E. Continuous-Action Q-Learning. Machine Learning 49, 247–265 (2002). https://doi.org/10.1023/A:1017988514716