Abstract
The task of finding novel information in information retrieval (IR) was proposed recently and has attracted growing attention. Compared with techniques in traditional document-level retrieval, query expansion (QE) plays a dominant role in this new task. This paper presents an empirical study of the effectiveness of different QE techniques for finding novel information. The conclusions are drawn from experiments on two standard test collections, those of the TREC 2002 and TREC 2003 novelty tracks. The local co-occurrence-based QE approach performs best, yielding a consistent improvement of more than 15% and, in some cases, enhancing both precision and recall. Proximity-based and dependency-based QE are also effective, each yielding an improvement of about 10%. Pseudo relevance feedback works better than semantics-based QE, and the latter is not helpful for finding novel information.
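To make the idea of local co-occurrence-based QE concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "local" means a small set of candidate sentences already retrieved for the topic, that co-occurrence is counted at the sentence level, and that no stopword filtering or term weighting is applied. The function names `tokenize` and `expand_query` and the parameter `k` are hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def expand_query(query, local_sentences, k=3):
    """Append the k terms that co-occur most often with the query terms.

    Assumption: every sentence that contains at least one query term
    contributes one count to each non-query term it contains.
    """
    query_terms = tokenize(query)
    query_set = set(query_terms)
    counts = Counter()
    for sentence in local_sentences:
        terms = set(tokenize(sentence))
        if terms & query_set:
            counts.update(terms - query_set)
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:k]]
```

Called with a query and a handful of topic-related sentences, `expand_query` returns the original query terms followed by the strongest co-occurring terms. A realistic system would at least remove stopwords and weight the added terms, which this sketch deliberately leaves out.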
Supported by the Chinese Natural Science Foundation (No. 60223004, 60321002, 60303005), and partially sponsored by a joint project with IBM China Research.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, M., Lin, C., Ma, S. (2005). How Effective Is Query Expansion for Finding Novel Information? In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_16
DOI: https://doi.org/10.1007/978-3-540-30211-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science (R0)