Abstract
Mean average precision has been widely used in information retrieval evaluation events such as TREC, and it is regarded as a good system measure because of its sensitivity and reliability. However, its drawbacks with regard to partial relevance judgment have been largely ignored. In many cases, partial relevance judgment is the only practical option because of the large document collections involved.
In this paper we address this issue through analysis and experiment. Our investigation shows that when only partial relevance judgment is available, mean average precision suffers from several drawbacks: its values are inaccurate, it lacks an explicit interpretation, and it is subject to the evaluation environment. Moreover, mean average precision is not superior in sensitivity and reliability to some other measures, such as precision at a given document cutoff, even though these are believed to be its major advantages. Our experiments also suggest that average precision over all retrieved documents would be a good measure in such a situation.
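For readers unfamiliar with the measures compared above, the following is a minimal sketch (not the authors' code) of average precision for a single query and precision at a document cutoff, assuming binary relevance labels for a ranked result list:

```python
def precision_at_k(rels, k):
    """Precision over the top-k ranked documents.

    rels: ranked list of binary relevance labels (1 = relevant, 0 = not).
    """
    return sum(rels[:k]) / k

def average_precision(rels, num_relevant):
    """Average precision for one query: the precision at the rank of each
    relevant document retrieved, summed and divided by the total number of
    relevant documents for the query (num_relevant).

    Mean average precision (MAP) is this value averaged over all queries.
    """
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / num_relevant if num_relevant else 0.0

# Example: relevant documents at ranks 1 and 3, two relevant documents total.
rels = [1, 0, 1, 0, 0]
ap = average_precision(rels, num_relevant=2)   # (1/1 + 2/3) / 2 = 5/6
p3 = precision_at_k(rels, 3)                   # 2/3
```

Note that `average_precision` divides by the full number of relevant documents for the query, which is exactly the quantity that is unknown under partial relevance judgment; this dependence is one source of the inaccuracy the paper analyses.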
Keywords
- Information Retrieval
- Relevant Document
- Average Precision
- Information Retrieval System
- Relevance Judgment
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, S., McClean, S. (2006). Information Retrieval Evaluation with Partial Relevance Judgment. In: Bell, D.A., Hong, J. (eds) Flexible and Efficient Information Handling. BNCOD 2006. Lecture Notes in Computer Science, vol 4042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788911_7
Print ISBN: 978-3-540-35969-2
Online ISBN: 978-3-540-35971-5