Abstract
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high-dimensional continuous state-spaces. In high dimensions it is essential that neither planning nor exploration occurs uniformly over the state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game theory and computational geometry to concentrate high resolution, efficiently and adaptively, only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high-dimensional spaces; future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in fewer than ten trials and a few minutes.
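To make the abstract's game-theoretic planning step concrete, here is a minimal Python sketch (ours, not the authors' implementation) of the two ideas it names: a minimax "worst-case steps to goal" computation over the cells of the partition, and a split rule that adds resolution only along the border between winning and losing cells. The cell names, the aim_B/aim_G actions, and the outcome table below are invented for illustration; the paper's actual local controller, partition data structure, and planner are richer.

```python
import math

GOAL = "G"

def minimax_values(cells, outcomes, goal=GOAL):
    """Worst-case steps-to-goal over the current partition:
    V(goal) = 0;  V(c) = min over actions a of (1 + max over observed
    outcomes o of V(o)).  The max treats the unknown within-cell dynamics
    as an adversary; cells with no winning strategy keep V = inf."""
    V = {c: (0.0 if c == goal else math.inf) for c in cells}
    for _ in range(len(cells)):          # enough sweeps to reach a fixed point
        for c in cells:
            if c == goal:
                continue
            best = math.inf
            for outs in outcomes.get(c, {}).values():
                worst = max(V[o] for o in outs)   # adversary picks the outcome
                best = min(best, 1.0 + worst)     # agent picks the action
            V[c] = best
    return V

def cells_to_split(V, neighbours):
    """Refine resolution only where it matters: every cell on the border
    between 'winning' (finite V) and 'losing' (infinite V) regions."""
    border = set()
    for c, nbrs in neighbours.items():
        for n in nbrs:
            if math.isinf(V[c]) != math.isinf(V[n]):
                border.update({c, n})
    return border

# Toy three-cell corridor (hypothetical data).  Aiming at B from A has
# sometimes been observed to bounce back into A, so the adversary can trap
# the agent; the planner reports A as losing and the split rule targets
# the A/B border.
cells = ["A", "B", GOAL]
outcomes = {"A": {"aim_B": {"A", "B"}},   # observed outcome cells per action
            "B": {"aim_G": {GOAL}}}
neighbours = {"A": ["B"], "B": ["A", GOAL], GOAL: ["B"]}

V = minimax_values(cells, outcomes)
print(V)                              # {'A': inf, 'B': 1.0, 'G': 0.0}
print(cells_to_split(V, neighbours))  # {'A', 'B'}: split these cells
```

The design point this sketch preserves is that refinement is driven by planning failure: a cell is split only when the worst-case analysis says no winning strategy exists from it, so high resolution accumulates on critical regions rather than uniformly over the state-space.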
Cite this article
Moore, A.W., Atkeson, C.G. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Mach Learn 21, 199–233 (1995). https://doi.org/10.1007/BF00993591