Abstract
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high-dimensional continuous state-spaces. In high dimensions it is essential that neither planning nor exploration occurs uniformly over the state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game theory and computational geometry to concentrate high resolution, efficiently and adaptively, only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high-dimensional spaces; future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in fewer than ten trials and a few minutes.
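To make the abstract's game-theoretic planning step concrete, here is a minimal Python sketch (ours, not the authors' implementation) of the two ideas it names: a minimax "worst-case steps to goal" computation over the cells of the partition, and a split rule that adds resolution only along the border between winning and losing cells. The cell names, the aim_B/aim_G actions, and the outcome table below are invented for illustration; the paper's actual local controller, partition data structure, and planner are richer.

```python
import math

GOAL = "G"

def minimax_values(cells, outcomes, goal=GOAL):
    """Worst-case steps-to-goal over the current partition:
    V(goal) = 0;  V(c) = min over actions a of (1 + max over observed
    outcomes o of V(o)).  The max treats the unknown within-cell dynamics
    as an adversary; cells with no winning strategy keep V = inf."""
    V = {c: (0.0 if c == goal else math.inf) for c in cells}
    for _ in range(len(cells)):          # enough sweeps to reach a fixed point
        for c in cells:
            if c == goal:
                continue
            best = math.inf
            for outs in outcomes.get(c, {}).values():
                worst = max(V[o] for o in outs)   # adversary picks the outcome
                best = min(best, 1.0 + worst)     # agent picks the action
            V[c] = best
    return V

def cells_to_split(V, neighbours):
    """Refine resolution only where it matters: every cell on the border
    between 'winning' (finite V) and 'losing' (infinite V) regions."""
    border = set()
    for c, nbrs in neighbours.items():
        for n in nbrs:
            if math.isinf(V[c]) != math.isinf(V[n]):
                border.update({c, n})
    return border

# Toy three-cell corridor (hypothetical data).  Aiming at B from A has
# sometimes been observed to bounce back into A, so the adversary can trap
# the agent; the planner reports A as losing and the split rule targets
# the A/B border.
cells = ["A", "B", GOAL]
outcomes = {"A": {"aim_B": {"A", "B"}},   # observed outcome cells per action
            "B": {"aim_G": {GOAL}}}
neighbours = {"A": ["B"], "B": ["A", GOAL], GOAL: ["B"]}

V = minimax_values(cells, outcomes)
print(V)                              # {'A': inf, 'B': 1.0, 'G': 0.0}
print(cells_to_split(V, neighbours))  # {'A', 'B'}: split these cells
```

The design point this sketch preserves is that refinement is driven by planning failure: a cell is split only when the worst-case analysis says no winning strategy exists from it, so high resolution accumulates on critical regions rather than uniformly over the state-space.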
Cite this article
Moore, A.W., Atkeson, C.G. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Mach Learn 21, 199–233 (1995). https://doi.org/10.1007/BF00993591