Abstract
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world data sets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene data set by more than 10%.
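The two ideas in the abstract, a kernel that follows the structure of the data and its conversion to a distance for 1-nearest neighbour, can be illustrated with a small sketch. It is not the paper's construction. The toy kernel below only mimics the idea of recursing on the shape of the data, using Python tuples, frozensets, strings and numbers as assumed stand-ins for typed terms. The distance uses the standard kernel-induced formula d(x, y) = sqrt(k(x, x) - 2 k(x, y) + k(y, y)), which is assumed here because the abstract does not state the conversion. All function names and the example data are hypothetical.

```python
import math


def kernel(x, y):
    """Toy structural kernel (illustrative assumption, not the paper's definition)."""
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return float(x * y)                       # linear kernel on numbers
    if isinstance(x, str) and isinstance(y, str):
        return 1.0 if x == y else 0.0             # matching kernel on symbols
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Tuples: sum the kernels of corresponding components.
        return sum(kernel(a, b) for a, b in zip(x, y))
    if isinstance(x, frozenset) and isinstance(y, frozenset):
        # Sets: sum the element kernel over all pairs of elements.
        return sum(kernel(a, b) for a in x for b in y)
    raise TypeError("arguments must share the same structure")


def kernel_distance(x, y):
    """Distance induced by the kernel; it can be zero for distinct inputs."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))           # clamp rounding error below zero


def nearest_neighbour(query, training):
    """Return the label of the training instance closest to the query."""
    _, label = min(training, key=lambda pair: kernel_distance(query, pair[0]))
    return label


# Hypothetical instances: a symbol paired with a set of numeric measurements.
training = [
    (("C", frozenset({1.0, 2.0})), "a"),
    (("O", frozenset({5.0})), "b"),
]
query = ("C", frozenset({1.0, 2.5}))

print(nearest_neighbour(query, training))
```

With this toy kernel, the sets {1.0, 2.0} and {3.0} are at distance zero although they differ, which shows why such a distance is in general only a pseudo-metric.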
Cite this article
Gärtner, T., Lloyd, J.W. & Flach, P.A. Kernels and Distances for Structured Data. Machine Learning 57, 205–232 (2004). https://doi.org/10.1023/B:MACH.0000039777.23772.30