Abstract
Mean average precision has been widely used in information retrieval evaluation events such as TREC, and it is regarded as a good system measure because of its sensitivity and reliability. However, its drawbacks with regard to partial relevance judgment have been largely ignored. In many cases, partial relevance judgment is the only practical option because of the large document collections involved.
In this paper we address this issue through analysis and experiment. Our investigation shows that when only partial relevance judgment is available, mean average precision suffers from several drawbacks: its values are inaccurate, it lacks an explicit interpretation, and it is subject to the evaluation environment. Moreover, mean average precision is not superior in sensitivity and reliability to some other measures, such as precision at a given document cutoff, even though these are believed to be its major advantages. Our experiments also suggest that average precision over all retrieved documents would be a good measure in such a situation.
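For readers unfamiliar with the measures compared above, the following is a minimal sketch (not the authors' code) of average precision for a single query and precision at a document cutoff, assuming binary relevance labels for a ranked result list:

```python
def precision_at_k(rels, k):
    """Precision over the top-k ranked documents.

    rels: ranked list of binary relevance labels (1 = relevant, 0 = not).
    """
    return sum(rels[:k]) / k

def average_precision(rels, num_relevant):
    """Average precision for one query: the precision at the rank of each
    relevant document retrieved, summed and divided by the total number of
    relevant documents for the query (num_relevant).

    Mean average precision (MAP) is this value averaged over all queries.
    """
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / num_relevant if num_relevant else 0.0

# Example: relevant documents at ranks 1 and 3, two relevant documents total.
rels = [1, 0, 1, 0, 0]
ap = average_precision(rels, num_relevant=2)   # (1/1 + 2/3) / 2 = 5/6
p3 = precision_at_k(rels, 3)                   # 2/3
```

Note that `average_precision` divides by the full number of relevant documents for the query, which is exactly the quantity that is unknown under partial relevance judgment; this dependence is one source of the inaccuracy the paper analyses.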
Keywords
- Information Retrieval
- Relevant Document
- Average Precision
- Information Retrieval System
- Relevance Judgment
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, S., McClean, S. (2006). Information Retrieval Evaluation with Partial Relevance Judgment. In: Bell, D.A., Hong, J. (eds) Flexible and Efficient Information Handling. BNCOD 2006. Lecture Notes in Computer Science, vol 4042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788911_7
Print ISBN: 978-3-540-35969-2
Online ISBN: 978-3-540-35971-5