Biomedical Data Retrieval Using Enhanced Query Expansion

Qadeer, Muhammad; Hussain, Chuadhery Ghazanfar; Hussain, Chaudhery Mustansar

doi:10.1007/978-3-030-84205-5_63

Abstract

Biomedical data are growing rapidly, and better retrieval systems are needed to make use of them. A basic problem when retrieving data relevant to a query is word mismatch: different words are used to express the same concepts in the query and in the stored documents. Two techniques are commonly used to solve this problem: query paraphrasing and query expansion. In query paraphrasing, the query is rephrased using synonyms of its terms. Query expansion techniques are further categorized as local or global. Local query expansion analyzes the top-ranked documents retrieved for a query, and different ranking models have been introduced to rank the documents in a collection based on terms and features. A pool of candidate terms for expanding the query is collected from these documents. After feature selection from this term pool, the final candidate expansion terms still contain a few terms that cause query drift. To overcome this problem, semantic filtering was used, and semantic similarity measures are the basic technique behind successful semantic filtering. Global query expansion, by contrast, analyzes the whole collection to find word relationships; for example, synonyms of query words are extracted from a dictionary or thesaurus. In this research, we evaluated well-known probabilistic ranking models, namely LM-Dirichlet, LM Jelinek-Mercer, and BM25, for biomedical data retrieval. For the evaluation, we performed experiments on 36 biomedical queries, applying diverse preprocessing techniques iteratively. The state-of-the-art TREC Genomics biomedical dataset served as the core of all experiments. BM25 was observed to be the best information retrieval model for biomedical data.
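
The three ranking models named above can be sketched as scoring functions over tokenized documents. This is a minimal illustration, not the chapter's implementation: the function names, the toy corpus, and the parameter defaults (k1 = 1.2, b = 0.75 for BM25; mu = 2000 for Dirichlet smoothing; lambda = 0.7 for Jelinek-Mercer) are assumptions, since the abstract does not state the settings used.

```python
import math
from collections import Counter


def collection_prob(term, docs):
    """Maximum-likelihood probability of `term` in the whole collection."""
    total = sum(len(d) for d in docs)
    return sum(d.count(term) for d in docs) / total


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query`; all texts are token lists."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        # IDF with +1 inside the log so it never goes negative
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[term]
        norm = k1 * (1 - b + b * len(doc) / avgdl)  # document-length normalization
        score += idf * f * (k1 + 1) / (f + norm)
    return score


def lm_dirichlet_score(query, doc, docs, mu=2000.0):
    """Query log-likelihood with Dirichlet-prior smoothing."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_c = collection_prob(term, docs)
        if p_c == 0:
            continue  # term absent from the collection: skipped in this sketch
        score += math.log((tf[term] + mu * p_c) / (len(doc) + mu))
    return score


def lm_jm_score(query, doc, docs, lam=0.7):
    """Query log-likelihood with Jelinek-Mercer (linear interpolation) smoothing."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_c = collection_prob(term, docs)
        if p_c == 0:
            continue
        p_d = tf[term] / len(doc)
        score += math.log((1 - lam) * p_d + lam * p_c)
    return score


def rank(query, docs, scorer):
    """Return document indices sorted by descending score."""
    return sorted(range(len(docs)), key=lambda i: scorer(query, docs[i], docs), reverse=True)


docs = [
    "brca1 mutation breast cancer risk".split(),
    "gene expression in yeast cells".split(),
    "breast cancer gene brca1 brca2 expression study".split(),
]
query = "brca1 breast cancer".split()
for scorer in (bm25_score, lm_dirichlet_score, lm_jm_score):
    print(scorer.__name__, rank(query, docs, scorer))
```

On this toy corpus all three models rank the short document containing every query term first and the document with no query term last; BM25 does this through explicit length normalization, the language models through smoothing against collection statistics.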
We used different term-scoring techniques, namely Baseline, BNS, Chi-Square, CoDice, BIM, KLD, LRF, PRF, and RSV, to score the terms related to each query. Comparing the mean average precision (MAP) over all queries showed that the BNS term-scoring technique is the best for biomedical data. Different semantic similarity measures, namely path-based, Wu and Palmer, and Leacock and Chodorow (LCH), were applied to the terms extracted by BNS to select the most appropriate terms for query expansion. Finally, in each run the queries were expanded with the most similar terms, documents were retrieved with the expanded queries, and the resulting MAP scores were evaluated to reach the final conclusions of this research. Query expansion improved biomedical data retrieval, and the LCH semantic similarity measure was found to be the best for query expansion in a biomedical data retrieval system.
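
The term-scoring, semantic-filtering, and evaluation steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the chapter's code. BNS follows Forman's definition |F^-1(tpr) - F^-1(fpr)| (F^-1 is the inverse standard normal CDF) with rates clipped to [0.0005, 0.9995], and it assumes the top-ranked feedback documents form the positive class. The path, Wu and Palmer, and LCH measures are computed on a small hypothetical is-a tree rather than a real biomedical ontology such as UMLS, with LCH taken as -log(p / 2D), where p is the number of nodes on the shortest path and D is the tree's maximum depth. MAP is the mean of per-query average precision. All concept names, counts, and helper names are made up for illustration.

```python
import math
from statistics import NormalDist


# --- Term scoring: Bi-Normal Separation (Forman, 2003) ---
def bns(tp, fp, pos, neg, eps=0.0005):
    """|F^-1(tpr) - F^-1(fpr)| with rates clipped away from 0 and 1.

    tp: feedback (top-ranked) documents containing the term, out of `pos`
    fp: non-feedback documents containing the term, out of `neg`
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))


# --- Semantic similarity on a toy is-a tree (hypothetical, not UMLS) ---
PARENT = {
    "disease": None,
    "neoplasm": "disease",
    "infection": "disease",
    "breast_neoplasm": "neoplasm",
    "carcinoma": "neoplasm",
}


def ancestors(concept):
    """Path from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    return len(ancestors(concept))  # root has depth 1


MAX_DEPTH = max(depth(c) for c in PARENT)


def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor."""
    b_anc = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_anc)


def path_nodes(a, b):
    """Number of nodes on the shortest is-a path between a and b."""
    common = depth(lcs(a, b))
    return (depth(a) - common) + (depth(b) - common) + 1


def path_sim(a, b):
    return 1.0 / path_nodes(a, b)


def wup_sim(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def lch_sim(a, b):
    return -math.log(path_nodes(a, b) / (2.0 * MAX_DEPTH))


# --- Evaluation: average precision and MAP ---
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i  # precision at each relevant hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked_doc_ids, relevant_doc_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


print(round(bns(8, 5, 10, 100), 3))   # term concentrated in feedback documents
print(round(bns(5, 50, 10, 100), 3))  # term equally common everywhere
print(round(lch_sim("breast_neoplasm", "carcinoma"), 3),
      round(lch_sim("breast_neoplasm", "infection"), 3))
print(mean_average_precision([(["d1", "d2", "d3"], {"d1", "d3"})]))
```

Here the term concentrated in the feedback documents scores about 2.49 while the evenly spread term scores 0, and LCH rates two siblings under "neoplasm" as more similar than concepts that meet only at the root, which is the kind of ordering semantic filtering relies on to discard drifting expansion terms.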

Author information

Authors and Affiliations

Department of Education, Computer science and Technology, Punjab, Pakistan
Muhammad Qadeer & Chuadhery Ghazanfar Hussain
Department of Chemistry and Environmental Science, New Jersey Institute of Technology, Newark, NJ, USA
Chaudhery Mustansar Hussain

Corresponding author

Correspondence to Chaudhery Mustansar Hussain.

Editor information

Editors and Affiliations

Department of Chemistry and Environmental Science, New Jersey Institute of Technology, Newark, NJ, USA
Chaudhery Mustansar Hussain
School of Science, University of Padova, Padova, Italy
Paolo Di Sia

About this entry

Cite this entry

Qadeer, M., Hussain, C.G., Hussain, C.M. (2022). Biomedical Data Retrieval Using Enhanced Query Expansion. In: Hussain, C.M., Di Sia, P. (eds) Handbook of Smart Materials, Technologies, and Devices. Springer, Cham. https://doi.org/10.1007/978-3-030-84205-5_63

DOI: https://doi.org/10.1007/978-3-030-84205-5_63
Published: 10 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84204-8
Online ISBN: 978-3-030-84205-5
eBook Packages: Engineering; Reference Module Computer Science and Engineering