Abstract
Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function converges only after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will be discovered only extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach: it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards, however, has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies for supplying guidance through these policies are discussed and evaluated experimentally in several relational domains to demonstrate the merits of the approach.
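To make the kind of guidance described above concrete, the following is a minimal sketch, not the paper's RRL system (which replaces the Q-table with a relational regression learner): a tabular Q-learner that, with some probability, takes its next action from a supplied "reasonable policy" instead of its own epsilon-greedy rule. The environment interface (env.reset(), env.step(), env.actions()) and the reasonable_policy callable are hypothetical stand-ins, and states are assumed hashable.

    import random
    from collections import defaultdict

    def guided_q_learning(env, reasonable_policy, episodes=500, alpha=0.1,
                          gamma=0.9, epsilon=0.1, guidance_prob=0.3):
        # Tabular Q-learning in which, with probability guidance_prob, the
        # next action comes from a supplied "reasonable policy" rather than
        # from epsilon-greedy selection, so sparse rewards are found sooner.
        # The env interface and reasonable_policy are assumed, not from the paper.
        Q = defaultdict(float)  # (state, action) -> estimated value
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                if random.random() < guidance_prob:
                    action = reasonable_policy(state)            # guided step
                elif random.random() < epsilon:
                    action = random.choice(env.actions(state))   # random exploration
                else:
                    action = max(env.actions(state),
                                 key=lambda a: Q[(state, a)])    # greedy step
                next_state, reward, done = env.step(action)
                best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                                default=0.0)
                # Standard off-policy Q-learning update; guided actions are
                # treated exactly like the agent's own choices.
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q

Because Q-learning is off-policy, updates computed from guided actions remain valid, which is what allows experience generated by a reasonable policy to be mixed freely with the agent's own exploration.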
Cite this article
Driessens, K., Džeroski, S. Integrating Guidance into Relational Reinforcement Learning. Machine Learning 57, 271–304 (2004). https://doi.org/10.1023/B:MACH.0000039779.47329.3a