Biomedical Data Retrieval Using Enhanced Query Expansion

  • Reference work entry
Handbook of Smart Materials, Technologies, and Devices

Abstract

Biomedical data is growing rapidly, and better retrieval systems are needed to make use of it. A basic problem in retrieving data for a query is word mismatch: queries and stored documents often use different words to express the same concept. Two techniques are commonly used to address this problem: query paraphrasing and query expansion. In query paraphrasing, the query is rewritten using synonyms of its terms. Query expansion techniques are further categorized as local and global. Local query expansion analyzes the top-ranked documents retrieved for a query. Different ranking models have been introduced to rank the documents in a collection based on terms and features. A pool of candidate expansion terms is obtained from these top-ranked documents. After feature selection from the term pool, the selected candidate expansion terms can still include a few terms that cause query drift. To overcome this problem, semantic filtering was used; semantic similarity measures are the basic techniques underlying successful semantic filtering. Global query expansion, by contrast, relies on analysis of the whole collection to discover word relationships; synonyms of query words are extracted from a dictionary or thesaurus. In this research, we evaluated well-known probabilistic ranking models, namely LM-Dirichlet, LM Jelinek-Mercer, and BM25, for biomedical data retrieval. We performed the experimental analysis iteratively, using diverse preprocessing techniques, on 36 biomedical queries. The state-of-the-art TREC Genomics biomedical dataset was used as the core of all experiments. BM25 was observed to be the best-performing information retrieval model for biomedical data.
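To make the ranking step concrete, the following is a minimal sketch of BM25 scoring in its commonly published Okapi form. It is not the chapter's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF (always positive), one of several variants in use.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection, not drawn from the TREC Genomics data.
docs = [
    "brca1 gene mutation breast cancer".split(),
    "protein folding in yeast".split(),
    "gene expression in breast tissue".split(),
]
scores = bm25_scores(["brca1", "gene"], docs)
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy collection the first document matches both query terms, the third matches one, and the second matches none, so `ranking` orders them accordingly. The language-model rankers named above (LM-Dirichlet and LM Jelinek-Mercer) would replace the scoring expression with a smoothed query likelihood.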
We used different term scoring techniques, namely Baseline, BNS, Chi-Square, CoDice, BIM, KLD, LRF, PRF, and RSV, to score the terms related to the query. Comparing the mean average precision (MAP) scores averaged over all queries showed that BNS is the best term scoring technique for biomedical data. Different semantic similarity measures, namely path-based, Wu and Palmer, and Leacock and Chodorow (LCH), were then applied to the terms extracted with BNS to obtain the most appropriate terms for query expansion. Finally, the queries were expanded with the most similar terms each time, documents were retrieved with the expanded queries, and the MAP results were evaluated to draw the final conclusions of this research. Query expansion improved the biomedical data retrieval results, and LCH was found to be the best semantic similarity measure for query expansion in a biomedical data retrieval system.
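The three semantic similarity measures named above can be sketched on a small is-a hierarchy. This is an illustration under stated assumptions, not the chapter's code: the taxonomy `PARENT` is a hypothetical toy (not UMLS or any real ontology), path lengths and depths are counted in nodes, and the maximum depth is taken from the toy hierarchy itself. Published tools differ on these conventions, so absolute values may not match theirs.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not a real ontology.
PARENT = {
    "disease": "entity",
    "cancer": "disease",
    "infection": "disease",
    "breast_cancer": "cancer",
    "lung_cancer": "cancer",
    "influenza": "infection",
}

def ancestors(concept):
    """Return the chain from `concept` up to the root, inclusive."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    up_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in up_b)

def path_length(a, b):
    """Number of nodes on the shortest is-a path between a and b."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1

# Maximum depth of the toy taxonomy, used by the LCH measure.
MAX_DEPTH = max(depth(c) for c in list(PARENT) + ["entity"])

def path_sim(a, b):
    """Path-based similarity: inverse of the path length."""
    return 1.0 / path_length(a, b)

def wup_sim(a, b):
    """Wu and Palmer: depth of the LCS relative to the two concept depths."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def lch_sim(a, b):
    """Leacock and Chodorow: negative log of path length scaled by 2 * max depth."""
    return -math.log(path_length(a, b) / (2.0 * MAX_DEPTH))

# Rank hypothetical candidate expansion terms by LCH similarity to a query concept.
candidates = ["influenza", "lung_cancer"]
ranked = sorted(candidates, key=lambda c: lch_sim("breast_cancer", c), reverse=True)
```

Here `ranked` places `lung_cancer` ahead of `influenza`, because it shares the closer ancestor `cancer` with the query concept. A semantic filter of the kind described would keep only candidates whose similarity exceeds some chosen threshold.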

Author information

Corresponding author

Correspondence to Chaudhery Mustansar Hussain.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this entry

Cite this entry

Qadeer, M., Hussain, C.G., Hussain, C.M. (2022). Biomedical Data Retrieval Using Enhanced Query Expansion. In: Hussain, C.M., Di Sia, P. (eds) Handbook of Smart Materials, Technologies, and Devices. Springer, Cham. https://doi.org/10.1007/978-3-030-84205-5_63
