Abstract
We develop a general theoretical framework for statistical logical learning with kernels based on dynamic propositionalization, where structure learning corresponds to inferring a suitable kernel on logical objects, and parameter learning corresponds to function learning in the resulting reproducing kernel Hilbert space. In particular, we study the case where structure learning is performed by a simple FOIL-like algorithm, and propose alternative scoring functions for guiding the search process. We present an empirical evaluation on several data sets in the single-task as well as in the multi-task setting.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Argyriou, A., Hauser, R., Micchelli, C. A., & Pontil, M. (2006). A DC-programming algorithm for kernel selection. In Proceedings of the 23rd international conference of machine learning (ICML-2006) (pp. 41–48), Pittsburgh, PA, USA.
Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. In Advances in neural information processing systems 19 (pp. 41–48). Cambridge: MIT Press.
Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101, 138–156.
Ben-David, S., Eiron, N., & Simon, H. U. (2002). Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3, 441–461.
Bengio, Y., Delalleau, O., & Roux, N. L. (2005). The curse of highly variable functions for local kernel machines. In Advances in neural information processing systems 18. Cambridge: MIT Press.
Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA.
Blockeel, H., Dzeroski, S., Kompare, B., Kramer, S., Pfahringer, B., & Laer, W. (2004). Experiments in predicting biodegradability. Applied Artificial Intelligence, 18(2), 157–181.
Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 1579–1619.
Caponnetto, A., Micchelli, C., Pontil, M., & Ying, Y. (2008). Universal kernels for multi-task learning. Journal of Machine Learning Research.
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75.
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46(1–3), 131–159.
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20, 1–25.
Cristianini, N., Shawe-Taylor, J., Elisseef, A., & Kandola, J. (2001). On kernel-target alignment. In Advances in neural information processing systems 14. Cambridge: MIT Press.
Cucker, F., & Smale, S. (2002). On the mathematical foundations of learning. Bulletin (New Series) of the American Mathematical Society, 39(1), 1–49.
Cumby, C. M., & Roth, D. (2003). On kernel methods for relational learning. In Proceedings of the twentieth international conference on machine learning (pp. 107–114), Washington, DC, USA.
Datta, P., & Kibler, D. F. (1993). Concept sharing: a means to improve multi-concept learning. In Proceedings of the 10th international conference on machine learning, Amherst, MA, USA.
Davis, J., Burnside, E., de Castro Dutra, I., Page, D., & Costa, V. S. (2005). An integrated approach to learning Bayesian networks of rules. In Lecture notes in computer science : Vol. 3720. Machine learning, 16th European conference (pp. 84–95), Porto, Portugal. Berlin: Springer.
De Raedt, L. (2008). Logical and relational learning. Berlin: Springer.
De Raedt, L., & Ramon, J. (2004). Condensed representations for inductive logic programming. In Proceedings of the 9th international conference on the principles of knowledge representation and reasoning.
De Raedt, L., Lavrac, N., & Dzeroski, S. (1993). Multiple predicate learning. In Proceedings of the 13th international joint conference on artificial intelligence (pp. 1037–1043), Chambery, France.
De Raedt, L., Frasconi, P., Kersting, K., & Muggleton, S. (2008). Lecture notes in computer science : Vol. 4911. Probabilistic inductive logic programming—theory and applications. Berlin: Springer.
Dehaspe, L., Toivonen, H., & King, R. (1998). Finding frequent substructures in chemical compounds. In Proceedings of the 4th international conference on knowledge discovery and data mining.
Deshpande, A., Milch, B., Zettlemoyer, L., & Kaelbling, L. (2007). Learning probabilistic relational dynamics for multiple tasks. In Proceedings of the 23rd conference on uncertainty in artificial intelligence (UAI-07) (pp. 83–92).
Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637.
Fang, H., Tong, W., Shi, L., Blair, R., Perkins, R., Branham, W., Hass, B., Xie, Q., Dial, S., Moland, C., & Sheehan, D. (2001). Structure-activity relationships for a large diverse set of natural, synthetic, and environmental estrogens. Chemical Research in Toxicology, 14(3), 280–294.
Frasconi, P., Passerini, A., Muggleton, S., & Lodhi, H. (2005). Declarative kernels. In Kramer, S., & Pfahringer, B. (Eds.), Proceedings of the 15th international conference on inductive logic programming, late-breaking papers (pp. 17–19).
Frasconi, P., Jaeger, M., & Passerini, A. (2008). Feature discovery with type extension trees. In Lecture notes in computer science : Vol. 5194. ILP 2008: Proceedings of the 18th international conference. Berlin: Springer.
Freund, Y., & Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.
Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5(1), 49–58.
Gärtner, T., Lloyd, J., & Flach, P. (2004). Kernels and distances for structured data. Machine Learning, 57(3), 205–232.
Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge: MIT Press.
Höffgen, K.-U., Simon, H.-U., & van Horn, K. S. (1995). Robust trainability of single neurons. Journal of Computer and System Sciences, 50(1), 114–125.
Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st international conference on machine learning, Banff, Alberta, Canada.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning (pp. 169–184). Cambridge: MIT Press.
Karalič, A., & Bratko, I. (1997). First order regression. Machine Learning, 26(2–3), 147–176.
Kersting, K., & De Raedt, L. (2007). Bayesian logic programming: theory and tools. In Getoor, L. & Taskar, B. (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press.
Khan, K., Muggleton, S., & Parson, R. (1998). Repeat learning using predicate invention. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 165–174), Madison, Wisconsin, USA. Berlin: Springer.
Kimeldorf, G. S., & Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics, 41, 495–502.
King, R., Srinivasan, A., & Sternberg, M. (1995). Relating chemical activity to structure: an examination of ILP successes. New Generation Computing, 13(2, 4), 411–433.
Kirsten, M., Wrobel, S., & Horváth, T. (2001). Distance based approaches to relational learning and clustering. In Relational data mining (pp. 213–230). Berlin: Springer.
Kok, S., & Domingos, P. (2005). Learning the structure of Markov logic networks. In Proceedings of the 22nd international conference on machine learning (pp. 441–448), Bonn, Germany. New York: ACM.
Kramer, S., & De Raedt, L. (2001). Feature construction with version spaces for biochemical applications. In Proceedings of the 18th international conference on machine learning (pp. 258–265). San Mateo: Morgan Kaufmann.
Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27–72.
Landwehr, N., Kersting, K., & De Raedt, L. (2005). nFOIL: integrating naive Bayes and FOIL. In Proceedings of the 20th national conference on artificial intelligence (pp. 795–800), Pittsburgh, Pennsylvania, USA.
Landwehr, N., Passerini, A., De Raedt, L., & Frasconi, P. (2006). kFOIL: Learning simple relational kernels. In Proceedings of the 21st national conference on artificial intelligence, July 16–20, 2006, Boston, Massachusetts, USA.
Lavrac, N., & Dzeroski, S. (1994). ILP: techniques and application. Chichester: Ellis Horwood.
Leslie, C. S., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: a string kernel for SVM protein classification. In Pacific symposium on biocomputing (pp. 566–575), Lihue, Hawaii, USA.
Lloyd, J. W. (1987). Foundations of logic programming (2nd extended ed.). Berlin: Springer.
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. The Journal of Machine Learning Research, 2, 419–444.
Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. Journal of Machine Learning Research, 6, 1099–1125.
Micchelli, C., Xu, Y., & Zhang, H. (2006). Universal kernels. The Journal of Machine Learning Research, 7, 2651–2667.
Muggleton, S. (2000). Learning stochastic logic programs. In Getoor, L. & Jensen, D. (Eds.), Proceedings of the AAAI2000 workshop on learning statistical models from relational data.
Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: theory and methods. Journal of Logic Programming, 19/20, 629–679.
Muggleton, S., Amini, A., & Sternberg, M. (2005). Support vector inductive logic programming. In Lecture notes in computer science : Vol. 3735. Discovery science, 8th international conference, Proceedings (pp. 163–175), Singapore. Berlin: Springer.
Obozinski, G., Taskar, B., & Jordan, M. (June, 2006). Multi-task feature selection. Technical report, Dept. of Statistics, UC Berkeley.
Ong, C. S., Smola, A. J., & Williamson, R. C. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071.
Passerini, A., Frasconi, P., & De Raedt, L. (2006). Kernels on prolog proof trees: statistical learning in the ILP setting. In Probabilistic, logical and relational learning—towards a synthesis.
Platt, J. (1999). Sequential minimal optimization: a fast algorithm for training support vector machines. In Advances in kernel methods: support vector learning (pp. 185–208).
Poggio, T., & Smale, S. (2003). The mathematics of learning: dealing with data. Notices of the American Mathematical Society, 50(5), 537–544.
Popescul, A., Ungar, L., Lawrence, S., & Pennock, D. (2003). Statistical relational learning for document mining. In Proceedings of the 3rd IEEE international conference on data mining (pp. 275–282).
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA.
Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.
Rakotomamonjy, A. (2004). Optimizing area under ROC curve with SVMs. In Workshop on ROC analysis in AI at the 15th European conference on artificial intelligence, Pisa, Italy.
Ramon, J. (2002). Clustering and instance based learning in first order logic. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium.
Ramon, J., & Bruynooghe, M. (1998). A framework for defining distances between first-order logic objects. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 271–280), Madison, Wisconsin, USA. Berlin: Springer.
Reid, M. D. (2004). Improving rule evaluation using multitask learning. In Lecture notes in artificial intelligence : Vol. 3194. Inductive logic programming, 14th international conference, Proceedings, Porto, Portugal. Berlin: Springer.
Rückert, U., & Kramer, S. (2007). Margin-based first-order rule learning. Machine Learning, 70(2–3), 189–206.
Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., & Tsuda, K. (2009). gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 75(1), 69–89.
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th international conference on machine learning (pp. 807–814), Corvallis, Oregon, USA.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Slattery, S., & Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In Lecture notes in computer science. : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings, Madison, Wisconsin, USA. Berlin: Springer.
Srinivasan, A., Muggleton, S., King, R., & Sternberg, M. (1996). Theories for mutagenicity: a study of first-order and feature-based induction. Artificial Intelligence, 85, 277–299.
Srinivasan, A., King, R. D., & Bristol, D. (1999). An assessment of ILP-assisted models for toxicology and the PTE-3 experiment. In Lecture notes in computer science : Vol. 1634. Inductive logic programming, 9th international workshop, ILP-99, Proceedings, Bled, Slovenia, June 24–27, 1999. Berlin: Springer.
Steck, H. (2007). Hinge rank loss and the area under the ROC curve. In Proceedings of the 18th European conference on machine learning, Warsaw, Poland.
Swamidass, S. J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., & Baldi, P. (2005). Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(1), 359–368.
Wachman, G., & Khardon, R. (2007). Learning from interpretations: a rooted kernel for ordered hypergraphs. In Proceedings of the 24th international conference on machine learning, Corvallis, Oregon, USA.
Weston, J., Schölkopf, B., Eskin, E., Leslie, C., & Noble, W. (2003). Dealing with large diagonals in kernel matrices. Annals of the Institute of Statistical Mathematics, 55(2), 391–408.
Wheeler, D. L. L., Barrett, T., Benson, D. A. A., Bryant, S. H. H., Canese, K., Chetvernin, V., Church, D. M. M., Dicuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L. Y. Y., Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D. J. J., Madden, T. L. L., Maglott, D. R. R., Miller, V., Ostell, J., Pruitt, K. D. D., Schuler, G. D. D., Shumway, M., Sequeira, E., Sherry, S. T. T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R. L. L., Tatusova, T. A. A., Wagner, L., & Yaschenko, E. (2008). Database resources of the national center for biotechnology information. Nucleic Acids Research.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editor: Lise Getoor.
Rights and permissions
About this article
Cite this article
Landwehr, N., Passerini, A., De Raedt, L. et al. Fast learning of relational kernels. Mach Learn 78, 305–342 (2010). https://doi.org/10.1007/s10994-009-5163-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-009-5163-1