Combining Families of Information Retrieval Algorithms Using Metalearning

  • Chapter
Survey of Text Mining

Abstract

This chapter describes experiments that use metalearning to combine families of information retrieval (IR) algorithms obtained by varying the normalizations and similarity functions. By metalearning, we mean the following simple idea: a family of IR algorithms is applied to a corpus of documents for which relevance is known, producing a learning set. A machine learning algorithm is then applied to this learning set to produce a classifier that combines the different IR algorithms. In experiments with TREC-3 data, this technique significantly improved precision at the same level of recall. Most prior work in this area has combined different IR algorithms using various averaging schemes or a fixed combining function. In metalearning, the combining function is itself a statistical model that, in general, depends on the document, the query, and the scores produced by the component IR algorithms.
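
The idea can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not name the normalizations, similarity functions, or learning algorithm used, so the three normalizations, two similarity functions, logistic-regression learner, and toy judged pairs below are all assumptions chosen for illustration. Every function and variable name is hypothetical. The sketch also feeds the learner only the component scores, whereas the combining function described above may in general depend on the document and the query as well.

```python
import math
from itertools import product

# Hypothetical term-weight normalizations (stand-ins, not the chapter's).
def raw(counts):
    return dict(counts)

def log_tf(counts):
    return {t: 1.0 + math.log(c) for t, c in counts.items() if c > 0}

def binary(counts):
    return {t: 1.0 for t, c in counts.items() if c > 0}

# Hypothetical similarity functions.
def dot(q, d):
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def cosine(q, d):
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot(q, d) / (nq * nd) if nq and nd else 0.0

# The family: every (normalization, similarity) pairing is one IR algorithm.
FAMILY = list(product([raw, log_tf, binary], [dot, cosine]))

def term_counts(text):
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def score_vector(query, doc):
    """Score one query-document pair with every algorithm in the family."""
    q, d = term_counts(query), term_counts(doc)
    return [sim(norm(q), norm(d)) for norm, sim in FAMILY]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Estimated probability of relevance from the component scores."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit the combining model by stochastic gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias term
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = label - predict(w, x)
            w[0] += lr * err
            for i, xi in enumerate(x):
                w[i + 1] += lr * err * xi
    return w

# Toy corpus with known relevance (1 = relevant, 0 = not relevant).
judged = [
    ("text retrieval", "text retrieval systems rank text documents", 1),
    ("text retrieval", "cooking pasta at home", 0),
    ("machine learning", "machine learning builds a classifier", 1),
    ("machine learning", "weather report for tuesday", 0),
]

# The learning set: one row of component scores per judged pair.
X = [score_vector(q, d) for q, d, _ in judged]
y = [rel for _, _, rel in judged]
weights = train_logistic(X, y)

p_relevant = predict(weights, X[0])
p_nonrelevant = predict(weights, X[1])
```

Here `weights` plays the role of the learned combining function: ranking documents by `predict(weights, score_vector(query, doc))` replaces a fixed average of the six component scores with a model fitted to the relevance judgments.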

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Cornelson, M., Greengrass, E., Grossman, R.L., Karidi, R., Shnidman, D. (2004). Combining Families of Information Retrieval Algorithms Using Metalearning. In: Berry, M.W. (ed.) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_7

  • DOI: https://doi.org/10.1007/978-1-4757-4305-0_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-3057-6

  • Online ISBN: 978-1-4757-4305-0

  • eBook Packages: Springer Book Archive
