Fast learning of relational kernels

Landwehr, Niels; Passerini, Andrea; De Raedt, Luc; Frasconi, Paolo

doi:10.1007/s10994-009-5163-1

Fast learning of relational kernels

Published: 05 January 2010

Volume 78, pages 305–342, (2010)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

Fast learning of relational kernels

Download PDF

Niels Landwehr¹,
Andrea Passerini²,
Luc De Raedt³ &
…
Paolo Frasconi⁴

902 Accesses
19 Citations
Explore all metrics

Abstract

We develop a general theoretical framework for statistical logical learning with kernels based on dynamic propositionalization, where structure learning corresponds to inferring a suitable kernel on logical objects, and parameter learning corresponds to function learning in the resulting reproducing kernel Hilbert space. In particular, we study the case where structure learning is performed by a simple FOIL-like algorithm, and propose alternative scoring functions for guiding the search process. We present an empirical evaluation on several data sets in the single-task as well as in the multi-task setting.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

Argyriou, A., Hauser, R., Micchelli, C. A., & Pontil, M. (2006). A DC-programming algorithm for kernel selection. In Proceedings of the 23rd international conference of machine learning (ICML-2006) (pp. 41–48), Pittsburgh, PA, USA.
Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. In Advances in neural information processing systems 19 (pp. 41–48). Cambridge: MIT Press.
Google Scholar
Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101, 138–156.
Article MATH MathSciNet Google Scholar
Ben-David, S., Eiron, N., & Simon, H. U. (2002). Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3, 441–461.
Article MathSciNet Google Scholar
Bengio, Y., Delalleau, O., & Roux, N. L. (2005). The curse of highly variable functions for local kernel machines. In Advances in neural information processing systems 18. Cambridge: MIT Press.
Google Scholar
Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA.
Blockeel, H., Dzeroski, S., Kompare, B., Kramer, S., Pfahringer, B., & Laer, W. (2004). Experiments in predicting biodegradability. Applied Artificial Intelligence, 18(2), 157–181.
Article Google Scholar
Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 1579–1619.
MathSciNet Google Scholar
Caponnetto, A., Micchelli, C., Pontil, M., & Ying, Y. (2008). Universal kernels for multi-task learning. Journal of Machine Learning Research.
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75.
Article Google Scholar
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46(1–3), 131–159.
Article MATH Google Scholar
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20, 1–25.
Google Scholar
Cristianini, N., Shawe-Taylor, J., Elisseef, A., & Kandola, J. (2001). On kernel-target alignment. In Advances in neural information processing systems 14. Cambridge: MIT Press.
Google Scholar
Cucker, F., & Smale, S. (2002). On the mathematical foundations of learning. Bulletin (New Series) of the American Mathematical Society, 39(1), 1–49.
Article MATH MathSciNet Google Scholar
Cumby, C. M., & Roth, D. (2003). On kernel methods for relational learning. In Proceedings of the twentieth international conference on machine learning (pp. 107–114), Washington, DC, USA.
Datta, P., & Kibler, D. F. (1993). Concept sharing: a means to improve multi-concept learning. In Proceedings of the 10th international conference on machine learning, Amherst, MA, USA.
Davis, J., Burnside, E., de Castro Dutra, I., Page, D., & Costa, V. S. (2005). An integrated approach to learning Bayesian networks of rules. In Lecture notes in computer science : Vol. 3720. Machine learning, 16th European conference (pp. 84–95), Porto, Portugal. Berlin: Springer.
Google Scholar
De Raedt, L. (2008). Logical and relational learning. Berlin: Springer.
Book MATH Google Scholar
De Raedt, L., & Ramon, J. (2004). Condensed representations for inductive logic programming. In Proceedings of the 9th international conference on the principles of knowledge representation and reasoning.
De Raedt, L., Lavrac, N., & Dzeroski, S. (1993). Multiple predicate learning. In Proceedings of the 13th international joint conference on artificial intelligence (pp. 1037–1043), Chambery, France.
De Raedt, L., Frasconi, P., Kersting, K., & Muggleton, S. (2008). Lecture notes in computer science : Vol. 4911. Probabilistic inductive logic programming—theory and applications. Berlin: Springer.
MATH Google Scholar
Dehaspe, L., Toivonen, H., & King, R. (1998). Finding frequent substructures in chemical compounds. In Proceedings of the 4th international conference on knowledge discovery and data mining.
Deshpande, A., Milch, B., Zettlemoyer, L., & Kaelbling, L. (2007). Learning probabilistic relational dynamics for multiple tasks. In Proceedings of the 23rd conference on uncertainty in artificial intelligence (UAI-07) (pp. 83–92).
Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637.
MathSciNet Google Scholar
Fang, H., Tong, W., Shi, L., Blair, R., Perkins, R., Branham, W., Hass, B., Xie, Q., Dial, S., Moland, C., & Sheehan, D. (2001). Structure-activity relationships for a large diverse set of natural, synthetic, and environmental estrogens. Chemical Research in Toxicology, 14(3), 280–294.
Article Google Scholar
Frasconi, P., Passerini, A., Muggleton, S., & Lodhi, H. (2005). Declarative kernels. In Kramer, S., & Pfahringer, B. (Eds.), Proceedings of the 15th international conference on inductive logic programming, late-breaking papers (pp. 17–19).
Frasconi, P., Jaeger, M., & Passerini, A. (2008). Feature discovery with type extension trees. In Lecture notes in computer science : Vol. 5194. ILP 2008: Proceedings of the 18th international conference. Berlin: Springer.
Google Scholar
Freund, Y., & Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.
Article MATH Google Scholar
Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5(1), 49–58.
Article Google Scholar
Gärtner, T., Lloyd, J., & Flach, P. (2004). Kernels and distances for structured data. Machine Learning, 57(3), 205–232.
Article MATH Google Scholar
Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge: MIT Press.
MATH Google Scholar
Höffgen, K.-U., Simon, H.-U., & van Horn, K. S. (1995). Robust trainability of single neurons. Journal of Computer and System Sciences, 50(1), 114–125.
Article MATH MathSciNet Google Scholar
Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st international conference on machine learning, Banff, Alberta, Canada.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning (pp. 169–184). Cambridge: MIT Press.
Google Scholar
Karalič, A., & Bratko, I. (1997). First order regression. Machine Learning, 26(2–3), 147–176.
Article MATH Google Scholar
Kersting, K., & De Raedt, L. (2007). Bayesian logic programming: theory and tools. In Getoor, L. & Taskar, B. (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press.
Google Scholar
Khan, K., Muggleton, S., & Parson, R. (1998). Repeat learning using predicate invention. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 165–174), Madison, Wisconsin, USA. Berlin: Springer.
Google Scholar
Kimeldorf, G. S., & Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics, 41, 495–502.
Article MATH MathSciNet Google Scholar
King, R., Srinivasan, A., & Sternberg, M. (1995). Relating chemical activity to structure: an examination of ILP successes. New Generation Computing, 13(2, 4), 411–433.
Article Google Scholar
Kirsten, M., Wrobel, S., & Horváth, T. (2001). Distance based approaches to relational learning and clustering. In Relational data mining (pp. 213–230). Berlin: Springer.
Google Scholar
Kok, S., & Domingos, P. (2005). Learning the structure of Markov logic networks. In Proceedings of the 22nd international conference on machine learning (pp. 441–448), Bonn, Germany. New York: ACM.
Chapter Google Scholar
Kramer, S., & De Raedt, L. (2001). Feature construction with version spaces for biochemical applications. In Proceedings of the 18th international conference on machine learning (pp. 258–265). San Mateo: Morgan Kaufmann.
Google Scholar
Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27–72.
Google Scholar
Landwehr, N., Kersting, K., & De Raedt, L. (2005). nFOIL: integrating naive Bayes and FOIL. In Proceedings of the 20th national conference on artificial intelligence (pp. 795–800), Pittsburgh, Pennsylvania, USA.
Landwehr, N., Passerini, A., De Raedt, L., & Frasconi, P. (2006). kFOIL: Learning simple relational kernels. In Proceedings of the 21st national conference on artificial intelligence, July 16–20, 2006, Boston, Massachusetts, USA.
Lavrac, N., & Dzeroski, S. (1994). ILP: techniques and application. Chichester: Ellis Horwood.
Google Scholar
Leslie, C. S., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: a string kernel for SVM protein classification. In Pacific symposium on biocomputing (pp. 566–575), Lihue, Hawaii, USA.
Lloyd, J. W. (1987). Foundations of logic programming (2nd extended ed.). Berlin: Springer.
MATH Google Scholar
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. The Journal of Machine Learning Research, 2, 419–444.
Article MATH Google Scholar
Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. Journal of Machine Learning Research, 6, 1099–1125.
MathSciNet Google Scholar
Micchelli, C., Xu, Y., & Zhang, H. (2006). Universal kernels. The Journal of Machine Learning Research, 7, 2651–2667.
MathSciNet Google Scholar
Muggleton, S. (2000). Learning stochastic logic programs. In Getoor, L. & Jensen, D. (Eds.), Proceedings of the AAAI2000 workshop on learning statistical models from relational data.
Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: theory and methods. Journal of Logic Programming, 19/20, 629–679.
Article MathSciNet Google Scholar
Muggleton, S., Amini, A., & Sternberg, M. (2005). Support vector inductive logic programming. In Lecture notes in computer science : Vol. 3735. Discovery science, 8th international conference, Proceedings (pp. 163–175), Singapore. Berlin: Springer.
Google Scholar
Obozinski, G., Taskar, B., & Jordan, M. (June, 2006). Multi-task feature selection. Technical report, Dept. of Statistics, UC Berkeley.
Ong, C. S., Smola, A. J., & Williamson, R. C. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071.
MathSciNet Google Scholar
Passerini, A., Frasconi, P., & De Raedt, L. (2006). Kernels on prolog proof trees: statistical learning in the ILP setting. In Probabilistic, logical and relational learning—towards a synthesis.
Platt, J. (1999). Sequential minimal optimization: a fast algorithm for training support vector machines. In Advances in kernel methods: support vector learning (pp. 185–208).
Poggio, T., & Smale, S. (2003). The mathematics of learning: dealing with data. Notices of the American Mathematical Society, 50(5), 537–544.
MATH MathSciNet Google Scholar
Popescul, A., Ungar, L., Lawrence, S., & Pennock, D. (2003). Statistical relational learning for document mining. In Proceedings of the 3rd IEEE international conference on data mining (pp. 275–282).
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA.
Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.
Google Scholar
Rakotomamonjy, A. (2004). Optimizing area under ROC curve with SVMs. In Workshop on ROC analysis in AI at the 15th European conference on artificial intelligence, Pisa, Italy.
Ramon, J. (2002). Clustering and instance based learning in first order logic. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium.
Ramon, J., & Bruynooghe, M. (1998). A framework for defining distances between first-order logic objects. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 271–280), Madison, Wisconsin, USA. Berlin: Springer.
Google Scholar
Reid, M. D. (2004). Improving rule evaluation using multitask learning. In Lecture notes in artificial intelligence : Vol. 3194. Inductive logic programming, 14th international conference, Proceedings, Porto, Portugal. Berlin: Springer.
Google Scholar
Rückert, U., & Kramer, S. (2007). Margin-based first-order rule learning. Machine Learning, 70(2–3), 189–206.
Google Scholar
Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., & Tsuda, K. (2009). gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 75(1), 69–89.
Article Google Scholar
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th international conference on machine learning (pp. 807–814), Corvallis, Oregon, USA.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Google Scholar
Slattery, S., & Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In Lecture notes in computer science. : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings, Madison, Wisconsin, USA. Berlin: Springer.
Google Scholar
Srinivasan, A., Muggleton, S., King, R., & Sternberg, M. (1996). Theories for mutagenicity: a study of first-order and feature-based induction. Artificial Intelligence, 85, 277–299.
Article Google Scholar
Srinivasan, A., King, R. D., & Bristol, D. (1999). An assessment of ILP-assisted models for toxicology and the PTE-3 experiment. In Lecture notes in computer science : Vol. 1634. Inductive logic programming, 9th international workshop, ILP-99, Proceedings, Bled, Slovenia, June 24–27, 1999. Berlin: Springer.
Google Scholar
Steck, H. (2007). Hinge rank loss and the area under the ROC curve. In Proceedings of the 18th European conference on machine learning, Warsaw, Poland.
Swamidass, S. J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., & Baldi, P. (2005). Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(1), 359–368.
Article Google Scholar
Wachman, G., & Khardon, R. (2007). Learning from interpretations: a rooted kernel for ordered hypergraphs. In Proceedings of the 24th international conference on machine learning, Corvallis, Oregon, USA.
Weston, J., Schölkopf, B., Eskin, E., Leslie, C., & Noble, W. (2003). Dealing with large diagonals in kernel matrices. Annals of the Institute of Statistical Mathematics, 55(2), 391–408.
MATH MathSciNet Google Scholar
Wheeler, D. L. L., Barrett, T., Benson, D. A. A., Bryant, S. H. H., Canese, K., Chetvernin, V., Church, D. M. M., Dicuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L. Y. Y., Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D. J. J., Madden, T. L. L., Maglott, D. R. R., Miller, V., Ostell, J., Pruitt, K. D. D., Schuler, G. D. D., Shumway, M., Sequeira, E., Sherry, S. T. T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R. L. L., Tatusova, T. A. A., Wagner, L., & Yaschenko, E. (2008). Database resources of the national center for biotechnology information. Nucleic Acids Research.

Download references

Author information

Authors and Affiliations

Department for Computer Science, University of Potsdam, August-Bebel-Str. 89, 14482, Potsdam, Germany
Niels Landwehr
Dipartimento di Ingegneria e Scienza dell’Informazione, Università degli Studi di Trento, Via Sommarive 14, 38100, Povo, TN, Italy
Andrea Passerini
Departement Computerwetenschappen, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Via di Santa Marta 3, 50139, Firenze, Italy
Paolo Frasconi

Authors

Niels Landwehr
View author publications
You can also search for this author in PubMed Google Scholar
Andrea Passerini
View author publications
You can also search for this author in PubMed Google Scholar
Luc De Raedt
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Frasconi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Niels Landwehr.

Additional information

Editor: Lise Getoor.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Landwehr, N., Passerini, A., De Raedt, L. et al. Fast learning of relational kernels. Mach Learn 78, 305–342 (2010). https://doi.org/10.1007/s10994-009-5163-1

Download citation

Received: 27 October 2008
Revised: 24 August 2009
Accepted: 15 November 2009
Published: 05 January 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10994-009-5163-1

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Fast learning of relational kernels

Abstract

Article PDF

Similar content being viewed by others

Learning Łukasiewicz Logic Fragments by Quadratic Programming

Kernel Methods for Structured Data

Lazy and Eager Relational Learning Using Graph-Kernels

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Navigation

Fast learning of relational kernels

Abstract

Article PDF

Similar content being viewed by others

Learning Łukasiewicz Logic Fragments by Quadratic Programming

Kernel Methods for Structured Data

Lazy and Eager Relational Learning Using Graph-Kernels

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation