Abstract
Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function converges only after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will be discovered only extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach: it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards, however, has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies for supplying guidance through these policies are discussed and evaluated experimentally in several relational domains to demonstrate the merits of the approach.
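To make the kind of guidance described above concrete, the following is a minimal sketch, not the paper's RRL system (which replaces the Q-table with a relational regression learner): a tabular Q-learner that, with some probability, takes its next action from a supplied "reasonable policy" instead of its own epsilon-greedy rule. The environment interface (env.reset(), env.step(), env.actions()) and the reasonable_policy callable are hypothetical stand-ins, and states are assumed hashable.

    import random
    from collections import defaultdict

    def guided_q_learning(env, reasonable_policy, episodes=500, alpha=0.1,
                          gamma=0.9, epsilon=0.1, guidance_prob=0.3):
        # Tabular Q-learning in which, with probability guidance_prob, the
        # next action comes from a supplied "reasonable policy" rather than
        # from epsilon-greedy selection, so sparse rewards are found sooner.
        # The env interface and reasonable_policy are assumed, not from the paper.
        Q = defaultdict(float)  # (state, action) -> estimated value
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                if random.random() < guidance_prob:
                    action = reasonable_policy(state)            # guided step
                elif random.random() < epsilon:
                    action = random.choice(env.actions(state))   # random exploration
                else:
                    action = max(env.actions(state),
                                 key=lambda a: Q[(state, a)])    # greedy step
                next_state, reward, done = env.step(action)
                best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                                default=0.0)
                # Standard off-policy Q-learning update; guided actions are
                # treated exactly like the agent's own choices.
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q

Because Q-learning is off-policy, updates computed from guided actions remain valid, which is what allows experience generated by a reasonable policy to be mixed freely with the agent's own exploration.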
Cite this article
Driessens, K., Džeroski, S. Integrating Guidance into Relational Reinforcement Learning. Machine Learning 57, 271–304 (2004). https://doi.org/10.1023/B:MACH.0000039779.47329.3a