Abstract
In this paper, we present a novel graph database-mining method called APGM (APproximate Graph Mining) to mine useful patterns from noisy graph database. In our method, we designed a general framework for modeling noisy distribution using a probability matrix and devised an efficient algorithm to identify approximate matched frequent subgraphs. We have used APGM to both synthetic data set and real-world data sets on protein structure pattern identification and structure classification. Our experimental study demonstrates the efficiency and efficacy of the proposed method.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Aggarwal CC (2009) Managing and mining uncertain data. Springer, Berlin
Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: Proceedings of the 2009 ACM SIGKDD international conference on Knowledge discovery and data mining (SIGKDD’09), pp 29–37
Bandyopadhyay D, Snoeyink J (2004) Almost-Delaunay simplices: nearest neighbor relations for imprecise points. In: ACM-SIAM symposium on distributed algorithms, pp 403–412
Chan J, Bailey J, Leckie C (2008) Discovering correlated spatio-temporal changes in evolving graphs. Knowl Inf Syst 16(1): 53–96
Chen C, Yan X, Zhu F, Han J (2007) Gapprox: mining frequent approximate patterns from a massive network. In: Proceedings of the 2007 international conference on data mining (ICDM’07)
Eddy SR (2004) Where did the blosum62 alignment score matrix come from. Nat Biotechnol 22: 1035–1036
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 14
Holder LB, Cook DJ, Djoko S (1994) Substructures discovery in the subdue system. In: Proceedings of AAAI’94 workshop knowledge discovery in databases, pp 169–180
Hu H, Yan X, Huang Y, Han J, Zhou XJ (2005) Mining coherent dense subgraphs across massive biological networks for functional discovery. In: Proceedings of the 2005 international conference on intelligent systems for molecular biology (ISMB’05)
Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraph in the presence of isomorphism. In: Proceedings of the 2003 IEEE international conference on data mining (ICDM’03), pp 549–552
Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 2003 international conference on data mining (ICDM’03)
Huan J, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A, Wang W (2006) Distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining. In: Proceedings of the IEEE computational systems bioinformatics
Huan J, Prins J, Wang W, Carter C, Dokholyan NV (2006) Coordinated evolution of protein sequences and structures with structure entropy. In: Computer Science Department Technical Report
Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004) Mining family specific residue packing patterns from protein structure graphs. In: Proceedings of the 8th annual international conference on research in computational molecular biology (RECOMB), pp 308–315
Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 581–586
Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Proceeding of 2000 practice of knowledge discovery in databases conference (PKDD’00), pp 13–23
Judson KA, Lubinski JM, Jiang M, Chang Y, Eisenberg RJ, Cohen GH, Friedman HM (2003) Blocking immune evasion as a novel approach for prevention and treatment of herpes simplex virus infection. J Virol 77: 12639–12645
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of the 2001 international conference on data mining (ICDM’01), pp 313–320
Lahiri M, Berger-Wolf TY (2009) Periodic subgraph mining in dynamic networks. Knowl Inf Syst (online first 09/2009)
Lahiri M, Berger-Wolf TY (2007) Structure prediction in temporal networks using frequent subgraphs. Computat Intell Data Min, pp. 35–42
Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997) CATH—a hierarchic classification of protein domain structures. Structure 5(8): 1093–1108
Pei J, Jiang D, Zhang A (2005) Mining cross-graph quasi-cliques in gene expression and protein interaction data. ICDE, pp 353–354
De Raedt L, Kramer S (2001) The levelwise version space algorithm and its application to molecular fragment finding. In: IJCAI’01: seventeenth international joint conference on artificial intelligence, vol 2, pp 853–859
Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19: 1589–1591
Weng C-H, Chen Y-L (2010) Mining fuzzy association rules from uncertain data. Knowl Inf Syst 23(2): 129–152
Yada K, Motoda H, Washio T, Miyawaki A (2004) Consumer behavior analysis by graph mining technique. Lecture Notes in Computer Science, pp 800–806
Yan X, Han J (2002) gspan: graph-based substructure pattern mining. In: Procceeding of international conference on data mining (ICDM’02), pp 721–724
Yan X, Zhu F, Yu PS, Han J (2006) Feature-based substructure similarity search. ACM Trans Database Syst 31(4): 1418–1453
Zhang S, Yang J (2008) Ram: randomized approximate graph mining export. Scientific and Statistical Database Management
Zhang S, Yang J, Cheedella V (2007) Monkey: approximate graph mining based on spanning trees. In: Proceeding of IEEE 23rd international conference data engineering (ICDE’07), pp 1247–1249
Zou Z, Li J, Gao H, Zhang S (2009) Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 2009 conference on information and knowledge management (CIKM’09), pp 583–592
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Jia, Y., Zhang, J. & Huan, J. An efficient graph-mining method for complicated and noisy data with real-world applications. Knowl Inf Syst 28, 423–447 (2011). https://doi.org/10.1007/s10115-010-0376-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-010-0376-y