Abstract
Biomedical data is growing rapidly, and making use of it requires better retrieval systems. A basic problem when retrieving documents for a query is word mismatch: the query and the stored documents use different words to express the same concepts. Two techniques are commonly used to address this problem: query paraphrasing and query expansion. In query paraphrasing, the query is rewritten using synonyms of its terms. Query expansion techniques are further categorized as local or global. Local query expansion analyzes the top-ranked documents retrieved for a query, and different ranking models have been introduced to rank the documents of a collection based on terms and features. A pool of candidate terms for expanding the query is obtained from these documents. After feature selection from this term pool, the final set of candidate expansion terms still contains a few terms that cause query drift. To overcome this problem, a semantic filtering technique was used, and semantic similarity measures are the basis of successful semantic filtering. Global query expansion, by contrast, analyzes the whole collection to find word relationships, for example by extracting synonyms of query words from a dictionary or thesaurus. In this research, we evaluated well-known probability-based ranking models, namely LM-Dirichlet, LM Jelinek-Mercer, and BM25, for biomedical data retrieval. We performed an experimental analysis on 36 biomedical queries, iteratively applying diverse preprocessing techniques. The state-of-the-art TREC Genomics biomedical dataset was the core of all experiments. BM25 was observed to be the best of these retrieval models for biomedical data.
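To make the comparison of ranking models concrete, here is a minimal sketch that scores a toy three-document collection with textbook forms of BM25 and of query likelihood under Dirichlet and Jelinek-Mercer smoothing. The parameter values (k1 = 1.2, b = 0.75, μ = 2000, λ = 0.7), the non-negative IDF variant, the helper names (`bm25`, `lm_dirichlet`, `lm_jelinek_mercer`, `rank`) and the toy documents are all assumptions for illustration; the abstract does not state the settings used in the experiments.

```python
import math
from collections import Counter


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25; IDF uses a log(1 + ...) form so it stays non-negative (assumed variant)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)  # document frequency of t
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[t]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def _collection_model(docs):
    """Term counts and total token count over the whole collection."""
    counts = Counter(t for d in docs for t in d)
    return counts, sum(counts.values())


def lm_dirichlet(query, doc, docs, mu=2000.0):
    """Query log-likelihood with Dirichlet-prior smoothing."""
    counts, total = _collection_model(docs)
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = counts[t] / total
        if p_c == 0:
            continue  # term unseen in the collection: skip to avoid log(0)
        score += math.log((tf[t] + mu * p_c) / (len(doc) + mu))
    return score


def lm_jelinek_mercer(query, doc, docs, lam=0.7):
    """Query log-likelihood with linear (Jelinek-Mercer) smoothing; lam weights the collection model."""
    counts, total = _collection_model(docs)
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = counts[t] / total
        if p_c == 0:
            continue
        score += math.log((1 - lam) * tf[t] / len(doc) + lam * p_c)
    return score


def rank(score_fn, query, docs):
    """Document indices sorted from best to worst under score_fn."""
    return sorted(range(len(docs)), key=lambda i: score_fn(query, docs[i], docs), reverse=True)


# Hypothetical toy collection, already tokenized (no stemming or stop-word removal).
docs = [
    "brca1 mutation breast cancer risk".split(),
    "gene expression in yeast cell cycle".split(),
    "breast cancer gene brca1 and brca2 variants".split(),
]
query = "brca1 breast cancer".split()

for name, fn in [("BM25", bm25), ("LM-Dirichlet", lm_dirichlet), ("LM-JM", lm_jelinek_mercer)]:
    print(name, rank(fn, query, docs))  # each line ends with [0, 2, 1]
```

On this toy input all three models agree, ranking the shorter document that contains every query term first; differences between them only show up on realistic collections and depend on the parameter settings.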
We used different term-scoring techniques, namely Baseline, BNS, Chi-Square, CoDice, BIM, KLD, LRF, PRF, and RSV, to score the terms related to each query. Comparing the average MAP score over all queries showed that the BNS term-scoring technique is the best for biomedical data. Semantic similarity measures, namely path-based, Wu and Palmer, and Leacock and Chodorow (LCH), were then applied to the terms selected by BNS to obtain the most appropriate terms for query expansion. Finally, each query was expanded with the most similar terms, documents were retrieved with the expanded queries, and the resulting MAP scores were evaluated to reach the final conclusions of this research. Query expansion improved the results of biomedical data retrieval, and the LCH semantic similarity measure was found to be the best for query expansion in a biomedical data retrieval system.
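The pipeline above (score candidate terms, filter them semantically, evaluate with MAP) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the chapter's implementation: BNS is taken to be Forman's Bi-Normal Separation, with the top-ranked feedback documents treated as positives and the rest of the collection as negatives; the is-a tree `PARENT` is a hypothetical stand-in for a real biomedical ontology, which the abstract does not name; and LCH is computed as -log(p / 2D) with the path length p counted in nodes, the common convention (for example in NLTK's WordNet interface). All function names are hypothetical.

```python
import math
from statistics import NormalDist

# --- 1. Term scoring: Bi-Normal Separation (assumed to be Forman's BNS) ---
def bns(df_pos, n_pos, df_neg, n_neg, eps=0.0005):
    """|F^-1(tpr) - F^-1(fpr)|, with F^-1 the inverse standard normal CDF.

    Positives are assumed to be the top-ranked feedback documents and negatives
    the rest of the collection; rates are clipped to avoid infinite z-scores.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(df_pos / n_pos, eps), 1 - eps)
    fpr = min(max(df_neg / n_neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))


# --- 2. Semantic filtering on a hypothetical toy is-a tree (child -> parent) ---
PARENT = {
    "neoplasm": "disease",
    "carcinoma": "neoplasm",
    "breast carcinoma": "carcinoma",
    "adenoma": "neoplasm",
    "infection": "disease",
    "influenza": "infection",
}


def ancestors(c):
    """Concepts from c up to the root, including c itself."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(c):
    return len(ancestors(c))  # depth counted in nodes; the root has depth 1


MAX_DEPTH = max(depth(c) for c in set(PARENT) | set(PARENT.values()))


def path_nodes(a, b):
    """(nodes on the shortest path between a and b, their lowest common subsumer)."""
    up_a, up_b = ancestors(a), ancestors(b)
    lcs = next(x for x in up_a if x in up_b)
    return up_a.index(lcs) + up_b.index(lcs) + 1, lcs


def path_sim(a, b):
    return 1.0 / path_nodes(a, b)[0]


def wup_sim(a, b):
    lcs = path_nodes(a, b)[1]
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def lch_sim(a, b):
    return -math.log(path_nodes(a, b)[0] / (2.0 * MAX_DEPTH))


def filter_terms(query_concept, candidates, sim=lch_sim, k=2):
    """Keep the k candidate terms most similar to the query concept."""
    return sorted(candidates, key=lambda c: sim(query_concept, c), reverse=True)[:k]


# --- 3. Evaluation: (mean) average precision ---
def average_precision(ranked_ids, relevant):
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i  # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: one (ranked_ids, relevant_ids) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


print(bns(8, 10, 50, 10_000) > bns(3, 10, 3_000, 10_000))  # True
print(filter_terms("breast carcinoma", ["influenza", "adenoma", "carcinoma"]))  # ['carcinoma', 'adenoma']
print(round(average_precision([3, 1, 2], {1, 2}), 4))  # 0.5833
```

A term that is frequent in the feedback documents but rare elsewhere gets a high BNS score, while a term equally common in both sets scores zero. Swapping `sim=lch_sim` for `path_sim` or `wup_sim` in `filter_terms` is one way to compare the three similarity measures on the same candidate pool.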
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this entry
Cite this entry
Qadeer, M., Hussain, C.G., Hussain, C.M. (2022). Biomedical Data Retrieval Using Enhanced Query Expansion. In: Hussain, C.M., Di Sia, P. (eds) Handbook of Smart Materials, Technologies, and Devices. Springer, Cham. https://doi.org/10.1007/978-3-030-84205-5_63
DOI: https://doi.org/10.1007/978-3-030-84205-5_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84204-8
Online ISBN: 978-3-030-84205-5
eBook Packages: Engineering, Reference Module Computer Science and Engineering