Abstract
The rapid development of the Gene Ontology (GO) and the huge amount of biomedical data annotated with GO terms necessitate computing the semantic similarity of GO terms and, in turn, measuring the functional similarity of genes based on their annotations. This paper proposes a novel and efficient method to measure the semantic similarity of GO terms. The method addresses the limitations of existing GO term similarity measures by using the information content of all ancestor terms of a GO term to determine that term's semantic content. The aggregate information content of all ancestor terms implicitly reflects the term's location in the GO graph and also represents how human curators use the term and all of its ancestors to annotate genes. We show that the semantic similarity of GO terms obtained by our method closely matches human perception. Extensive experimental studies show that this novel method outperforms all existing methods in terms of correlation with gene expression data.
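To make the idea of aggregate information content concrete, the following is a minimal sketch rather than the paper's actual formulation, which the abstract does not state. It assumes a toy ontology, made-up annotation counts, the standard definition IC(t) = -log p(t), and a Lin-style normalization over shared ancestors. The names `PARENTS`, `DIRECT_COUNTS`, `aggregate_ic`, and `similarity` are all hypothetical.

```python
import math

# Hypothetical toy ontology (not real GO data): term -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Hypothetical counts of genes annotated directly to each term.
# Assumption: each gene is annotated to exactly one term, so counts can be summed.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "A1": 4, "A2": 1, "AB": 2}


def ancestors(term):
    """Return the term itself plus every ancestor reachable in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) is the share of annotations at t or below it."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def aggregate_ic(term):
    """Sum the information content of the term and all of its ancestors."""
    return sum(information_content(t) for t in ancestors(term))


def similarity(a, b):
    """Assumed Lin-style ratio: shared aggregate IC over total aggregate IC."""
    shared = ancestors(a) & ancestors(b)
    shared_ic = sum(information_content(t) for t in shared)
    total = aggregate_ic(a) + aggregate_ic(b)
    # total is zero only when both terms are the root, which is identical to itself.
    return 2.0 * shared_ic / total if total > 0 else 1.0


if __name__ == "__main__":
    for pair in [("A1", "A2"), ("A1", "A"), ("A1", "B")]:
        print(pair, round(similarity(*pair), 3))
```

In this sketch, two terms score higher when the ancestors they share carry more information relative to everything above each term, so a term's position in the graph and the annotation usage of its ancestors both affect the result.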
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, X., Li, L., Srimani, P.K., Yu, P.S., Wang, J.Z. (2013). Measure the Semantic Similarity of GO Terms Using Aggregate Information Content. In: Cai, Z., Eulenstein, O., Janies, D., Schwartz, D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science, vol 7875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38036-5_23
DOI: https://doi.org/10.1007/978-3-642-38036-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38035-8
Online ISBN: 978-3-642-38036-5
eBook Packages: Computer Science (R0)