Abstract
Kernel methods are a class of non-parametric learning techniques relying on kernels. A kernel generalizes the dot product to arbitrary domains and can thus be seen as a similarity measure between data points with complex structure. The use of kernels decouples the representation of the data from the specific learning algorithm, provided the algorithm can be defined in terms of distances or similarities between instances. Under this unifying formalism a wide range of methods has been developed, dealing with binary and multiclass classification, regression, ranking, clustering and novelty detection, to name a few. Recent developments include statistical tests of dependency and alignments between related domains, such as documents written in different languages. Key to the success of any kernel method is the definition of an appropriate kernel for the data at hand. A well-designed kernel should capture the aspects characterizing similar instances while remaining computationally efficient. Building on the seminal work by D. Haussler on convolution kernels, a vast literature on kernels for structured data has arisen. Kernels have been designed for sequences, trees and graphs, as well as for arbitrary relational data represented in first- or higher-order logic. From the representational viewpoint, this addresses one of the main limitations of statistical learning approaches, namely the difficulty of dealing with complex domain knowledge, and it has produced interesting connections between the complementary fields of statistical and symbolic learning. Another connection made possible by kernels is between generative and discriminative learning: data are represented with generative models, and appropriate kernels built on top of these models are then used in a discriminative setting.
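As a concrete illustration of the decoupling the abstract describes, the sketch below implements a minimal k-spectrum kernel for strings (in the spirit of the spectrum kernel of Leslie et al., cited in the references). It is an illustrative sketch only, not code from the chapter: the kernel maps each string to its vector of k-mer counts and returns the dot product of those vectors, so any kernel machine that consumes a kernel function could use it unchanged.

```python
from collections import Counter

def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """k-spectrum kernel: dot product of k-mer count vectors.

    Each string is implicitly mapped to a feature vector indexed by
    all length-k substrings (k-mers); the kernel value is the inner
    product of the two count vectors.
    """
    def kmers(x: str) -> Counter:
        return Counter(x[i:i + k] for i in range(len(x) - k + 1))
    u, v = kmers(s), kmers(t)
    # Only k-mers present in both strings contribute to the sum.
    return sum(count * v[m] for m, count in u.items())

# A valid kernel is symmetric: K(s, t) == K(t, s).
print(spectrum_kernel("GATTACA", "ATTAC"))  # shared 3-mers ATT, TTA, TAC -> 3
```

Because the kernel value is an explicit inner product in the k-mer feature space, the resulting Gram matrix is positive semidefinite by construction, which is what allows it to be plugged into an SVM or kernel PCA without any change to those algorithms.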
Keywords
- Feature Space
- Kernel Method
- Reproducing Kernel Hilbert Space
- Inductive Logic Programming
- Kernel Principal Component Analysis
References
Aiolli, F., Da San Martino, G., Sperduti, A.: Route kernels for trees. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 17–24. ACM, New York (2009)
Amari, S.I.: Mathematical foundations of neurocomputing. Proceedings of the IEEE 78(9), 1443–1463 (1990)
Amari, S.I.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
Aronszajn, N.: Theory of reproducing kernels. Trans. Amer. Math. Soc. 68, 337–404 (1950)
Bakir, G.H., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data (Neural Information Processing). The MIT Press (2007)
Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups. Springer, New York (1984)
Borgwardt, K.M.: Graph Kernels. PhD thesis, Ludwig-Maximilians-University Munich (2007)
Borgwardt, K.M., Kriegel, H.-P.: Shortest-path kernels on graphs. In: Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM 2005, pp. 74–81. IEEE Computer Society, Washington, DC (2005)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proc. 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, pp. 144–152 (July 1992)
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14. MIT Press (2002)
Collins, M., Duffy, N.: New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia, PA, USA, pp. 263–270 (2002)
Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press (2000)
Cristianini, N., Kandola, J., Elisseeff, A., Shawe-Taylor, J.: On kernel-target alignment. In: Advances in Neural Information Processing Systems 14, vol. 14, pp. 367–373 (2002)
De Raedt, L.: Logical and Relational Learning. Springer (2008)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)
Fletcher, R.: Practical Methods of Optimization, 2nd edn. John Wiley & Sons (1987)
Frasconi, P., Passerini, A.: Learning with Kernels and Logical Representations. In: De Raedt, L., Frasconi, P., Kersting, K., Muggleton, S.H. (eds.) Probabilistic Inductive Logic Programming. LNCS (LNAI), vol. 4911, pp. 56–91. Springer, Heidelberg (2008)
Gärtner, T.: Exponential and geometric kernels for graphs. In: NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data (2002)
Gärtner, T., Flach, P., Kowalczyk, A., Smola, A.J.: Multi-instance kernels. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the 19th International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann (2002)
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels for Structured Data. In: Matwin, S., Sammut, C. (eds.) ILP 2002. LNCS (LNAI), vol. 2583, pp. 66–83. Springer, Heidelberg (2003)
Gärtner, T.: Kernels for Structured Data. PhD thesis, Universität Bonn (2005)
Gärtner, T., Flach, P.A., Wrobel, S.: On Graph Kernels: Hardness Results and Efficient Alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003)
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels and distances for structured data. Mach. Learn. 57, 205–232 (2004)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.J.: A kernel method for the two-sample-problem. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 513–520. MIT Press, Cambridge (2007)
Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., Smola, A.J.: A kernel statistical test of independence. In: Platt, J.C., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems 20. MIT Press, Cambridge (2008)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
Ham, J., Lee, D.D., Mika, S., Schölkopf, B.: A kernel view of the dimensionality reduction of manifolds. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, p. 47. ACM, New York (2004)
Haussler, D.: Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz (1999)
Hoffmann, H.: Kernel PCA for novelty detection. Pattern Recogn. 40, 863–874 (2007)
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Annals of Statistics 36(3), 1171–1220 (2008)
Horváth, T., Gärtner, T., Wrobel, S.: Cyclic pattern kernels for predictive graph mining. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2004, pp. 158–167. ACM, New York (2004)
Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7(1-2), 95–114 (2000)
Jaakkola, T., Haussler, D.: Probabilistic kernel regression models. In: Proc. of Neural Information Processing Conference (1998)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 487–493. MIT Press, Cambridge (1999)
Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5, 819–844 (2004)
Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods – Support Vector Learning, ch. 11, pp. 169–185. MIT Press (1998)
Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled graphs. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 321–328. AAAI Press (2003)
Keerthi, S.S., Duan, K.B., Shevade, S.K., Poo, A.N.: A fast dual algorithm for kernel logistic regression. Mach. Learn. 61, 151–165 (2005)
Kim, K., Franz, M.O., Schölkopf, B.: Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(9), 1351–1366 (2005)
Kimeldorf, G., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Applic. 33, 82–95 (1971)
Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete input spaces. In: Sammut, C., Hoffmann, A. (eds.) Proc. of the 19th International Conference on Machine Learning, pp. 315–322. Morgan Kaufmann (2002)
Lanckriet, G.R.G., Cristianini, N., Bartlett, P., El Ghaoui, L., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)
Landwehr, N., Passerini, A., De Raedt, L., Frasconi, P.: kfoil: learning simple relational kernels. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 389–394. AAAI Press (2006)
Landwehr, N., Passerini, A., De Raedt, L., Frasconi, P.: Fast learning of relational kernels. Mach. Learn. 78, 305–342 (2010)
Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: a string kernel for SVM protein classification. In: Proc. of the Pacific Symposium on Biocomputing, pp. 564–575 (2002)
Leslie, C., Eskin, E., Weston, J., Noble, W.S.: Mismatch string kernels for SVM protein classification. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 1417–1424. MIT Press, Cambridge (2003)
Leslie, C., Kuang, R., Eskin, E.: Inexact matching string kernels for protein classification. In: Schölkopf, B., Tsuda, K., Vert, J.-P. (eds.) Kernel Methods in Computational Biology. MIT Press (2004)
Lodhi, H., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. In: Advances in Neural Information Processing Systems, pp. 563–569 (2000)
Da San Martino, G.: Kernel Methods for Tree Structured Data. PhD thesis, Department of Computer Science, University of Bologna (2009)
Menchetti, S.: Learning Preference and Structured Data: Theory and Applications. PhD thesis, Dipartimento di Sistemi e Informatica, DSI, Università di Firenze, Italy (December 2005)
Menchetti, S., Costa, F., Frasconi, P.: Weighted decomposition kernels. In: Proceedings of the 22nd International Conference on Machine Learning, ICML 2005, pp. 585–592. ACM, New York (2005)
Mercer, J.: Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London A 209, 415–446 (1909)
Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. J. Mach. Learn. Res. 7, 2651–2667 (2006)
Moschitti, A.: Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 318–329. Springer, Heidelberg (2006)
Muggleton, S., De Raedt, L.: Inductive logic programming: Theory and methods. Journal of Logic Programming 19/20, 629–679 (1994)
Passerini, A., Frasconi, P., De Raedt, L.: Kernels on prolog proof trees: Statistical learning in the ILP setting. Journal of Machine Learning Research 7, 307–342 (2006)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Burges, C., Schölkopf, B. (eds.) Advances in Kernel Methods–Support Vector Learning. MIT Press (1998)
Poggio, T., Smale, S.: The mathematics of learning: Dealing with data. Notices of the American Mathematical Society 50(5), 537–544 (2003)
Quinlan, J.R.: Learning Logical Definitions from Relations. Machine Learning 5, 239–266 (1990)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
Rakotomamonjy, A., Bach, F.R., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press (December 2005)
Saitoh, S.: Theory of Reproducing Kernels and its Applications. Longman Scientific Technical, Harlow (1988)
Saunders, G., Gammerman, A., Vovk, V.: Ridge regression learning algorithm in dual variables. In: Proc. 15th International Conf. on Machine Learning, pp. 515–521 (1998)
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high dimensional distribution. Neural Computation 13, 1443–1471 (2001)
Schölkopf, B., Smola, A., Müller, K.R.: Kernel principal component analysis. In: Advances in Kernel Methods–Support Vector Learning, pp. 327–352. MIT Press (1999)
Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)
Schölkopf, B., Warmuth, M.K. (eds.): COLT/Kernel 2003. LNCS (LNAI), vol. 2777. Springer, Heidelberg (2003)
Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)
Shin, K., Kuboyama, T.: A generalization of Haussler’s convolution kernel: mapping kernel. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 944–951. ACM, New York (2008)
Sterling, L., Shapiro, E.: The art of Prolog: advanced programming techniques, 2nd edn. MIT Press, Cambridge (1994)
Swamidass, S.J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., Baldi, P.: Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics 21, 359–368 (2005)
Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recognition Letters 20, 1191–1199 (1999)
Tikhonov, A.N.: On solving ill-posed problems and the method of regularization. Dokl. Akad. Nauk USSR 153, 501–504 (1963)
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. JMLR 6, 1453–1484 (2005)
Tsuda, K., Kin, T., Asai, K.: Marginalized kernels for biological sequences. Bioinformatics 18(suppl. 1), S268–S275 (2002)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.: Graph kernels. Journal of Machine Learning Research 11, 1201–1242 (2010)
Vishwanathan, S.V.N., Smola, A.: Fast Kernels for String and Tree Matching. Advances in Neural Information Processing Systems 15 (2003)
Wahba, G.: Spline Models for Observational Data. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
Wale, N., Watson, I.A., Karypis, G.: Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst. 14, 347–375 (2008)
Watkins, C.: Dynamic alignment kernels. In: Smola, A.J., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classifiers, pp. 39–50. MIT Press (2000)
© 2013 Springer-Verlag Berlin Heidelberg
Passerini, A. (2013). Kernel Methods for Structured Data. In: Bianchini, M., Maggini, M., Jain, L. (eds) Handbook on Neural Information Processing. Intelligent Systems Reference Library, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36657-4_9
Print ISBN: 978-3-642-36656-7
Online ISBN: 978-3-642-36657-4