Abstract
Many powerful methods for intelligent data analysis have become available in the fields of machine learning and data mining. However, almost all of these methods are based on the assumption that the objects under consideration are represented in terms of feature vectors, or collections of attribute values. In the present paper we argue that symbolic representations, such as strings, trees or graphs, have a representational power that is significantly higher than the representational power of feature vectors. On the other hand, operations on these data structure that are typically needed in data mining and machine learning are more involved than their counterparts on feature vectors. However, recent progress in graph matching and related areas has led to many new practical methods that seem to be very promising for a wide range of applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. The MIT Press (2001)
Kantardzic, M.: Data Mining-Concepts, Models, Methods, and Algorithms. Wiley-Interscience (2003)
Mitchell, T.: Machine Learning. McGraw Hill (1997)
Fu, K.-S.: Syntactic Pattern Recognition and Applications. Prentice Hall (1982)
Bunke, H., Sanfeliu, A. (eds.): Syntactic and Structural Pattern Recognition-Theory and Applications. World Scientific (1990)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. Journal of the Association for Computing Machinery 21(1) (1974) 168–173
Ullman, J.R.: An algorithm for subgraph isomorphism. Journal of the ACM 23(1) (1976) 31–42
Messmer, B.T., Bunke, H.: A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. PAMI 20(5) (1998) 493–507
McGregor, J.: Backtrack search algorithms and the maximal common subgraph problem. Software-Practice and Experience 12 (1982) 23–13
Levi, G.: A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo 9 (1972) 341–354
Bunke, H., Shearer, K.: A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19 (1998) 255–259
Wallis, W.D., Shoubridge, P., Kraetzl, M., Ray, D.: Graph distances using graph union Pattern Recognition Letters 22 (2001) 701–704
Fernandez, M.L, Valiente, G.: A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters 22 (2001) 753–758
Bunke, H.: Error correcting graph matching: on the influence of the underlying cost function. IEEE Trans. PAMI 21 (1999) 917–922
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. Journal of Pattern Recognition and Art. Intelligence, Special issue on Graph Matching in Computer Vision and Pattern Recognition (to appear)
Jain, A., Murty, M., Flynn, P.: Data clustering: a review. ACM Computing Surveys 31 (1999) 264–323
Seong, D., Kim, H., Park, K.: Incremental clustering of attributed graphs. IEEE Trans. SMC (1993) 1399–1411
Luo, B., Wilson, R.E., Hancock, E.: Spectral feature vectors for graph clustering. In: Caelli, T. et al. (eds.): Structural, Syntactic and Statistical Pattern Recognition. Lecture Notes in Computer Science, Vol. 2396. Springer (2002) 83–93
Sanfeliu, A., Serratosa, F., Alquezar, R.: Synthesis of function-described graphs and clustering of attributed graph. Int. Journal of Pattern Recognition and Art. Intell. 16 (2002) 621–655
Zahn, C.T.: Graph-theoretical methods for detecting and describing Gestalt structures. IEEE Trans. Computers C-20 (1971) 68–86
Jiang, X., Münger, A., Bunke, H.: On median graphs: properties, algorithms, and applications. IEEE Trans. Pattern Analysis and Machine Intell. 23(10) (2001) 1144–1151
Bunke, H., Günter, S.: Weighted mean of a pair of graphs. Computing 67 (2001) 209–224
Bunke, H., Kandel, A.: Mean and maximal common subgraph of two graphs. Pattern Recognition Letters 21 (2000) 163–168
Günter, S., Bunke, H.: Self-organizing map for clustering in the graph domain. Pattern Recognition Letters 23 (2002) 401–417
Günter, S., Bunke, H.: Validation indices for graph clustering. Pattern Recognition Letters 24 (2003) 1107–1113
Hopcroft, J., Wong, J.: Linear time algorithm for isomorphism of planar graphs. Proc. 6th Annual Symposium on Theory of Computing (1974) 172–184
Jiang, X., Bunke, H.: Marked subgraph isomorphism of ordered graphs. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.): Advances in Pattern Recognition. Lecture Notes in Computer Science, Vol. 1451. Springer Verlag (1998) 122–131
Dickinson, P., Bunke, H., Dadej, A., Kraetzl, M.: On graphs with unique node labels. Proc. 4th IAPR Workshop on Graph based Representations, York (2003)
Bunke, H., Kraetzl, M., Shoubridge, P., Wallis, W.: Detection of abnormal change in time series of graphs. Journal of Interconnection Networks 3 (2002) 85–101
Bunke, H., Kraetzl, M.: Classification and detection of abnormal events in time series of graphs. In: Last, M., Kandel, A., Bunke, H. (eds.): Data Mining in Time Series Databases. World Scientific (2003)
Schenker, A., Last, M., Bunke, H., Kandel, A.: Clustering of documents using a graph model. In: Antonacopoulos, A., Hu, J. (eds.): Web Document Analysis: Challenges and Opportunities. World Scientific (2003)
Schenker, A., Last, M., Bunke, H., Kandel, A.: Classification of web documents using graph matching. Int. Journal of Pattern Recognition and Art. Intelligence (to appear)
Salton, G.: Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer. Addison Wesley (1989)
Schenker, A., Last, M., Bunke, H., Kandel, A.: Graph representations for web document clustering. Proc. 1st Iberian Conf. on Pattern Recognition and Image Analysis (2003)
Ristad, E., Yianilos, P.: Learning string edit distance. IEEE Trans. PAMI 20 (1998) 522–532
Parizeau, M., Ghazzali, N., Hebert, J.-F.: Optimizing the cost matrix for approximate string matching using genetic algorithms. Pattern Recognition 31 (1998) 431–440
Gomez-Ballester, M., Forcada, M., Mico, M.: A gradient-ascent method to adapt the edit distance to a classification task. In: Torres, M.I., Sanfeliu, A. (eds.): Pattern Recognition and Applications. IOS Press (2000) 13–18
Ambauen, R., Fischer, S., Bunke, H.: Graph edit distance with node splitting and merging and its application to diatom identification. Proc. 4th IAPR Workshop on Graph based Representations, York (2003)
Neuhaus, M., Bunke, H.: Self-organizing graph edit distance. Proc. 4th IAPR Workshop on Graph based Representations, York (2003)
Neuhaus, M.: Learning Graph Edit Distance. Diploma Thesis, University of Bern (2003)
Kohonen, T.: Self-Organizing Maps. Springer Verlag (1995)
Irniger, Ch., Bunke, H.: Graph Matching: Filtering large databases of graphs using decision trees. In: Jolion, J.-M., Kropatsch, W., Vento, M. (eds.): Proc. 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition. (2001) 239–249
Irniger, Ch., Bunke, H.: Theoretical analysis and experimental comparison of graph matching algorithms for database filtering. Proc. 4th IAPR Workshop on Graph based Representations, York (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bunke, H. (2003). Graph-Based Tools for Data Mining and Machine Learning. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_2
Download citation
DOI: https://doi.org/10.1007/3-540-45065-3_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40504-7
Online ISBN: 978-3-540-45065-8
eBook Packages: Springer Book Archive