Strength Pareto fitness assignment for pseudo-relevance feedback: application to MEDLINE

Khennak, Ilyes; Drias, Habiba

doi:10.1007/s11704-016-5560-0

Research Article
Published: 18 October 2017

Volume 12, pages 163–176 (2018)
Frontiers of Computer Science

Abstract

Because of users’ growing utilization of unclear and imprecise keywords when characterizing their information need, it has become necessary to expand their original search queries with additional words that best capture their actual intent. The selection of the terms that are suitable for use as additional words is in general dependent on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating the degree of relatedness between a candidate expansion word and the query keywords: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents where the query keywords appear; (2) proximity, where more importance is assigned to terms having a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances the retrieval performance as compared to the baseline.
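
The way the two criteria can be satisfied together may be easier to see in a small sketch. The abstract does not give the exact fitness formulation, so the code below assumes an SPEA2-style strength Pareto fitness: the strength of a candidate term is the number of candidates it dominates, and its raw fitness is the sum of the strengths of the candidates that dominate it (lower is better, 0 means non-dominated). The function names, the candidate terms, and their scores are hypothetical and purely illustrative; they are not taken from the article or from MEDLINE.

```python
from typing import Dict, List, Tuple

# Each candidate term maps to (co_occurrence, mean_distance).
# Assumption: higher co-occurrence is better, lower distance is better.
Scores = Dict[str, Tuple[float, float]]


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def strength_pareto_fitness(scores: Scores) -> Dict[str, int]:
    """SPEA2-style raw fitness (an assumed formulation): lower is better."""
    terms = list(scores)
    # Strength: how many other candidates this term dominates.
    strength = {
        t: sum(dominates(scores[t], scores[u]) for u in terms if u != t)
        for t in terms
    }
    # Raw fitness: total strength of every candidate that dominates this term.
    return {
        t: sum(strength[u] for u in terms
               if u != t and dominates(scores[u], scores[t]))
        for t in terms
    }


def select_expansion_terms(scores: Scores, k: int) -> List[str]:
    """Pick the k terms with the lowest raw fitness; ties go to higher co-occurrence."""
    fitness = strength_pareto_fitness(scores)
    ranked = sorted(scores, key=lambda t: (fitness[t], -scores[t][0], scores[t][1]))
    return ranked[:k]


# Hypothetical candidate expansion terms with made-up scores.
candidates: Scores = {
    "insulin": (12, 2.0),
    "glucose": (10, 1.5),
    "therapy": (8, 4.0),
    "patient": (12, 6.0),
    "study": (5, 7.0),
}

fitness = strength_pareto_fitness(candidates)
selected = select_expansion_terms(candidates, 3)
print(fitness)   # {'insulin': 0, 'glucose': 0, 'therapy': 5, 'patient': 3, 'study': 7}
print(selected)  # ['insulin', 'glucose', 'patient']
```

In this toy data, "insulin" and "glucose" are both non-dominated: one co-occurs more often, the other lies closer to the query terms, so neither criterion is sacrificed for the other. A weighted sum of the two criteria would instead require choosing weights in advance.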

Author information

Authors and Affiliations

Laboratory for Research in Artificial Intelligence, Computer Science Department, University of Sciences and Technology Houari Boumediene (USTHB), Algiers, 16111, Algeria
Ilyes Khennak & Habiba Drias

Corresponding author

Correspondence to Ilyes Khennak.

Additional information

Ilyes Khennak is a PhD student in computer science at the University of Sciences and Technology Houari Boumediene (USTHB), Algeria. He received his master's degree in intelligent computer systems from USTHB in 2011. His research interests include artificial intelligence and information retrieval.

Habiba Drias received the MS degree in computer science from Case Western Reserve University, USA in 1984 and the PhD degree in computer science from the University of Sciences and Technology Houari Boumediene (USTHB), Algeria, in collaboration with UPMC, France, in 1993. She has been a full professor at USTHB since 1999 and directs the Laboratory of Research in Artificial Intelligence (LRIA). She has published around 200 papers in well-recognized international conference proceedings and journals and has directed 20 PhD theses, 38 master's theses, and 31 engineering projects. In 2013, she won the Algerian Scopus award in computer science, and in 2015 she was selected by a jury of international academicians as a founding member of the Algerian Academy of Science and Technology (AAST).

Electronic supplementary material

Supplementary material, approximately 353 KB.

About this article

Cite this article

Khennak, I., Drias, H. Strength Pareto fitness assignment for pseudo-relevance feedback: application to MEDLINE. Front. Comput. Sci. 12, 163–176 (2018). https://doi.org/10.1007/s11704-016-5560-0

Received: 26 December 2015
Accepted: 12 September 2016
Published: 18 October 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11704-016-5560-0
