Abstract
Search effectiveness metrics quantify the relevance of the ranked document lists returned by retrieval systems. In this paper we characterize metrics according to seven numeric properties: boundedness, monotonicity, convergence, top-weightedness, localization, completeness, and realizability. We demonstrate that these properties partition the commonly used evaluation metrics, and hence provide a framework in which the relationships between effectiveness metrics, including their relative merits for different applications, can be better understood.
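As an informal illustration of how such properties can separate metrics, the sketch below checks two of them on precision at depth k and rank-biased precision (RBP). The abstract does not define the properties, so the readings used here are assumptions: "bounded" is taken to mean that every score lies in [0, 1], and "top-weighted" that moving a relevant document toward the top of the ranking strictly increases the score. The helper names `is_bounded` and `rewards_promotion` are hypothetical and do not come from the paper.

```python
from itertools import product


def precision_at_k(rels, k):
    """Fraction of the top-k documents that are relevant (binary rels)."""
    return sum(rels[:k]) / k


def rbp(rels, p=0.5):
    """Rank-biased precision with persistence p: (1 - p) * sum_i r_i * p^(i-1)."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))


def is_bounded(metric, depth=5):
    """Assumed reading of boundedness: every binary ranking scores in [0, 1]."""
    return all(
        0.0 <= metric(list(r)) <= 1.0 for r in product([0, 1], repeat=depth)
    )


def rewards_promotion(metric):
    """Assumed reading of top-weightedness: a relevant document at rank 1
    scores strictly higher than the same document at rank 3."""
    return metric([1, 0, 0]) > metric([0, 0, 1])


if __name__ == "__main__":
    p_at_3 = lambda rels: precision_at_k(rels, 3)
    p_at_5 = lambda rels: precision_at_k(rels, 5)
    print("P@5 bounded:", is_bounded(p_at_5))
    print("RBP bounded:", is_bounded(rbp))
    print("P@3 rewards promotion:", rewards_promotion(p_at_3))
    print("RBP rewards promotion:", rewards_promotion(rbp))
```

Under these assumed readings both metrics are bounded, but only RBP distinguishes the two rankings in `rewards_promotion`; precision at depth 3 gives them the same score because it ignores order within the top k.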
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moffat, A. (2013). Seven Numeric Properties of Effectiveness Metrics. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_1
DOI: https://doi.org/10.1007/978-3-642-45068-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science, Computer Science (R0)