Abstract
Supervised learning on graphs is a central subject in graph data processing. In graph classification and regression, we assume that the target values of a certain number of graphs or a certain part of a graph are available as a training dataset, and our goal is to derive the target values of other graphs or the remaining part of the graph. In drug discovery applications, for example, a graph and its target value correspond to a chemical compound and its chemical activity. In this chapter, we review state-of-the-art methods of graph classification. In particular, we focus on two representative methods, graph kernels and graph boosting, and we present other methods in relation to the two methods. We describe the strengths and weaknesses of different graph classification methods and recent efforts to overcome the challenges.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. VLDB 1994, pages 487–499, 1994.
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In Proc 2nd SIAM Data Mining Conference (SDM), pages 158–174, 2002.
R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
M. Boley and H. Grosskreutz. A randomized approach for approximating the number of frequent sets. In Proceedings of the 8th IEEE International Conference on Data Mining, pages 43–52, 2008.
K. M. Borgwardt, C. S. Ong, S. Schonauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl. 1):i47–i56, 2006.
O. Chapelle, A. Zien, and B. Scholkopf, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press and McGraw Hill, 1990.
A. Demiriz, K.P. Bennet, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1–3):225–254, 2002.
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng., 17(8):1036–1050, 2005.
O. du Merle, D. Villeneuve, J. Desrosiers, and P. Hansen. Stabilized column generation. Discrete Mathematics, 194:229–237, 1999.
F. Eichinger, K. Bohm, and M. Huber. Mining edge-weighted call graphs to localise software bugs. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 333–348, 2008.
T. Gartner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proc. of the Sixteenth Annual Conference on Computational Learning Theory, 2003.
I. Guyon, J. Weston, S. Bahnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3):389–422, 2002.
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
Z. Harchaoui and F. Bach. Image classification with segmentation graph kernels. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.
C. Helma, T. Cramer, S. Kramer, and L.D. Raedt. Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J. Chem. Inf. Comput. Sci., 44:1402–1411, 2004.
T. Horvath, T. Gartner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 158–167, 2004.
A. Inokuchi. Mining generalized substructures from a set of labeled graphs. In Proceedings of the 4th IEEE Internatinal Conference on Data Mining, pages 415–418. IEEE Computer Society, 2005.
Y. Karklin, R.F. Meraz, and S.R. Holbrook. Classification of non-coding rna using graph representations of secondary structure. In Pacific Symposium on Biocomputing, pages 4–15, 2005.
H. Kashima, T. Kato, Y. Yamanishi, M. Sugiyama, and K. Tsuda. Link propagation: A fast semi-supervised learning algorithm for link prediction. In 2009 SIAM Conference on Data Mining, pages 1100–1111, 2009.
H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 21st International Conference on Machine Learning, pages 321–328. AAAI Press, 2003.
T. Kato, H. Kashima, and M. Sugiyama. Robust label propagation on multiple networks. IEEE Trans. Neural Networks, 20(1):35–44, 2008.
J. Kazius, S. Nijssen, J. Kok, T. Back, and A.P. Ijzerman. Substructure mining using elaborate chemical representation. J. Chem. Inf. Model., 46:597–605, 2006.
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 1–2:273–324, 1997.
R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input. In ICML 2002, 2002.
T. Kudo, E. Maeda, and Y. Matsumoto. An application of boosting to graph classification. In Advances in Neural Information Processing Systems 17, pages 729–736. MIT Press, 2005.
D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Graph kernels for molecular structure - activity relationship analysis with support vector machines. J. Chem. Inf. Model., 45:939–951, 2005.
P. Mahe and J.P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75:3–35, 2009.
S. Morishita. Computing optimal hypotheses efficiently for boosting. In Discovery Science, pages 471–481, 2001.
S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proceedings of ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pages 226–236, 2000.
S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology, 9(Suppl. 1):S4, 2008.
S. Nijssen and J.N. Kok. A quickstart in frequent structure mining can make a difference. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 647–652. ACM Press, 2004.
S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir. Weighted substructure mining for image analysis. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2007.
J. Pei, J. Han, B. Mortazavi-asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Mining sequential patterns by pattern-growth: The prefixs-pan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11):1424–1440, 2004.
G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Trans. Patt. Anal. Mach. Intell., 24(9):1184–1199, 2002.
R. Rosipal and N. Kramer. Overview and recent advances in partial least squares. In Subspace, Latent Structure and Feature Selection Techniques, pages 34–51. Springer, 2006.
W.J. Rugh. Linear System Theory. Prentice Hall, 1995.
H. Saigo, N. Kramer, and K. Tsuda. Partial least squares regression for graph mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 578–586, 2008.
H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda. GBoost: A mathematical programming approach to graph classification and regression. Machine Learning, 2008.
A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern., 13:353–362, 1983.
B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
H. Shin, A.M. Lisewski, and O. Lichtarge. Graph sharpening plus graph integration: a synergy that improves protein functional classification. Bioinformatics, 23:3217–3224, 2007.
A. Subramanya and J. Bilmes. Soft-supervised learning for text classification. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1090–1099, 2008.
K. Tsuda. Entire regularization paths for graph data. In Proceedings of the 24th International Conference on Machine Learning, pages 919–926, 2007.
K. Tsuda and T. Kudo. Clustering graphs by weighted substructure mining. In Proceedings of the 23rd International Conference on Machine Learning, pages 953–960. ACM Press, 2006.
K. Tsuda and K. Kurihara. Graph mining with variational dirichlet process mixture models. In SIAM Conference on Data Mining (SDM), 2008.
K. Tsuda and W.S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004.
K. Tsuda, H.J. Shin, and B. Scholkopf. Fast protein classification with multiple networks. Bioinformatics, 21(Suppl. 2):ii59–ii65, 2005.
S.V.N. Vishwanathan, K.M. Borgwardt, and N.N. Schraudolph. Fast computation of graph kernels. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2006. MIT Press.
N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the 2006 IEEE International Conference on Data Mining, pages 678–689, 2006.
X. Yan and J. Han. gSpan: graph-based substructure pattern mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, pages 721–724. IEEE Computer Society, 2002.
D. Zhou, O. Bousquet, J. Weston, and B. Scholkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems (NIPS) 16, pages 321–328. MIT Press, 2004.
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proc. of the Twentieth International Conference on Machine Learning (ICML), pages 912–919. AAAI Press, 2003.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
Tsuda, K., Saigo, H. (2010). Graph Classification. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_11
Download citation
DOI: https://doi.org/10.1007/978-1-4419-6045-0_11
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer ScienceComputer Science (R0)