Abstract
Semantic text mining has been a challenging research topic in recent years. Many studies measure the similarity of two documents using ontologies such as Medical Subject Headings (MeSH) and Gene Ontology (GO). However, most of these studies consider only a single relationship type in an ontology. To represent documents more comprehensively, a semantic document similarity calculation method is proposed that applies the Average Maximum Match algorithm with double relations in GO. Experimental results show that the double-relations-based similarity calculation method outperforms traditional semantic similarity measurements.
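To make the Average Maximum Match idea concrete, the following is a minimal sketch, not the chapter's implementation. It assumes the commonly used form of the measure: each term in one document is matched with its most similar term in the other document, and the best-match scores from both directions are averaged. The function `average_maximum_match`, the `toy_term_sim` lookup, and the term labels `T1` to `T3` are hypothetical placeholders; in the proposed method, the term-level score would presumably come from a GO-based measure that accounts for both relation types.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# A term-level similarity function returning a score in [0, 1].
TermSim = Callable[[str, str], float]


def average_maximum_match(
    doc_a: Sequence[str], doc_b: Sequence[str], term_sim: TermSim
) -> float:
    """Average of best-match term similarities, taken in both directions.

    Assumed form (not confirmed by the abstract):
    (sum_a max_b s(a, b) + sum_b max_a s(b, a)) / (|A| + |B|)
    """
    if not doc_a or not doc_b:
        return 0.0  # convention chosen here for empty documents
    best_a = sum(max(term_sim(t, u) for u in doc_b) for t in doc_a)
    best_b = sum(max(term_sim(u, t) for t in doc_a) for u in doc_b)
    return (best_a + best_b) / (len(doc_a) + len(doc_b))


# Hypothetical symmetric term similarities, for illustration only.
_TOY_SCORES: Dict[FrozenSet[str], float] = {frozenset(("T1", "T3")): 0.5}


def toy_term_sim(t: str, u: str) -> float:
    """Identical terms score 1.0; unlisted pairs score 0.0."""
    if t == u:
        return 1.0
    return _TOY_SCORES.get(frozenset((t, u)), 0.0)


if __name__ == "__main__":
    print(average_maximum_match(["T1", "T2"], ["T2", "T3"], toy_term_sim))
```

Because the term-level measure is passed in as a function, the same document-level code can be reused with a single-relation or a double-relation term similarity, which is the comparison the abstract describes.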
Acknowledgements
This study was supported by the National Natural Science Foundation of China (61702324).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, J., Li, M., Zhang, Z., Li, K. (2020). An Efficient Semantic Document Similarity Calculation Method Based on Double-Relations in Gene Ontology. In: Pan, JS., Li, J., Tsai, PW., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 157. Springer, Singapore. https://doi.org/10.1007/978-981-13-9710-3_19
DOI: https://doi.org/10.1007/978-981-13-9710-3_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9709-7
Online ISBN: 978-981-13-9710-3
eBook Packages: Intelligent Technologies and Robotics (R0)