Abstract
Information retrieval (IR) evaluation measures are cornerstones for determining the suitability and task performance of retrieval systems. Their metric and scale properties make it possible to compare one system against another in order to establish differences or similarities. Based on the representational theory of measurement (RTM), this paper determines these properties by exploiting the information contained in a retrieval measure itself. It establishes the intrinsic framework of a retrieval measure, which is the common scenario when the domain set is not explicitly specified. A method to determine the metric and scale properties of any retrieval measure is provided, requiring knowledge of only some of its attained values. The method establishes three main categories of retrieval measures according to their intrinsic properties, and some common user-oriented and system-oriented evaluation measures are classified according to the presented taxonomy.
Notes
- 1.
Here, the commonly used term "IR evaluation metric" collides with the mathematical term "metric", which will be used later in this paper. To avoid this clash, the rest of the paper refers to "IR evaluation metrics" as "IR evaluation measures", reserving the term "metric" for its mathematical sense.
- 2.
Typically, a SERP includes content of a non-homogeneous nature, such as images, query suggestions, knowledge panels, etc. Here, however, we consider the classical ordered (or unordered) list of documents, since it is the common structure considered when the evaluation of ranking models is studied.
- 3.
The associated weak order, \(\preceq _f\), may be transformed into a total order by considering the following equivalence relation: \(\mathbf {\hat{r}_1} \sim _{f} \mathbf {\hat{r}_2} \Leftrightarrow f(\mathbf {\hat{r}_1}) = f(\mathbf {\hat{r}_2})\). Let \(\mathbf {R^{*}}\) be the set of equivalence classes, and let \(\mathbf {\hat{r}^{*}_1}\) and \(\mathbf {\hat{r}^{*}_2}\) be two elements of this set containing the individual system output rankings \(\mathbf {\hat{r}_1}\), \(\mathbf {\hat{r}_2} \in \textbf{R}\), respectively. The following ordering can be defined on \(\mathbf {R^{*}}\): \(\mathbf {\hat{r}^{*}_1} \preceq _{f}^{*} \mathbf {\hat{r}^{*}_2} \Leftrightarrow \mathbf {\hat{r}_1} \preceq _{f} \mathbf {\hat{r}_2}\). Then, \((\mathbf {R^{*}}, \preceq _{f}^{*})\) is called the reduction or quotient of \((\textbf{R}, \preceq _{f})\), where \(\preceq _{f}^{*}\) is well-defined and \((\mathbf {R^{*}}, \preceq _{f}^{*})\) is a totally ordered set [72].
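The quotient construction in this note can be sketched computationally. The following Python snippet is illustrative only: the binary-relevance tuples and the P@2 scorer are hypothetical stand-ins for \(\textbf{R}\) and f. It groups system outputs into the equivalence classes of \(\sim _f\) and orders the classes by their shared measure value:

```python
from itertools import groupby

def quotient_order(rankings, f):
    """Group outputs into classes r1 ~ r2 <=> f(r1) == f(r2), then order the
    classes by the shared value of f, yielding the totally ordered quotient."""
    ranked = sorted(rankings, key=f)          # sorted is stable, groupby needs sorted input
    return [list(cls) for _, cls in groupby(ranked, key=f)]

# Toy example: rankings as tuples of binary relevance, scored by P@2
# (precision at cutoff 2); two outputs collapse into one class.
p_at_2 = lambda r: (r[0] + r[1]) / 2
outputs = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 1)]
classes = quotient_order(outputs, p_at_2)
# classes: [[(0, 0, 1)], [(1, 0, 1), (0, 1, 1)], [(1, 1, 0)]]
```

Note that \((1, 0, 1)\) and \((0, 1, 1)\) land in the same class because P@2 cannot distinguish them, which is exactly the collapse performed by \(\sim _f\).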
- 4.
Imagine hypothetical beings living on the surface of a two-dimensional Euclidean space, \(\mathbb {R}^2\), ignorant of the surrounding three-dimensional space (but with a sense of Euclidean distance). These beings are local observers, whose view is limited to a two-coordinate environment. The geometrical elements of this surface capable of being observed or measured by these beings (essentially lengths) constitute what is called the intrinsic geometry of the surface. The intrinsic properties of the surface are those which depend exclusively on the surface itself.
- 5.
- 6.
As noted in Sect. 4, the intrinsic properties of a retrieval measure deduced with this framework are based on the RTM.
References
Allan, J., Aslam, J., Belkin, N., Buckley, C., Callan, J., Croft, B., Dumais, S., Fuhr, N., Harman, D., Harper, D.J., et al.: Challenges in information retrieval and language modeling: report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September 2002. In: ACM SIGIR Forum, vol. 37(1), pp. 31–47. ACM, New York, NY, USA (2003)
Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retrieval 12(4), 461–486 (2009)
Amigó, E., Gonzalo, J., Mizzaro, S.: What is my problem? Identifying formal tasks and metrics in data mining on the basis of measurement theory. IEEE Trans. Knowl. Data Eng. (2021)
Amigó, E., Gonzalo, J., Verdejo, F.: A general evaluation measure for document organization tasks. In: Proceedings of the 36th international ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 643–652 (2013)
Amigó, E., Mizzaro, S.: On the nature of information access evaluation metrics: a unifying framework. Inf. Retr. J. 23(3), 318–386 (2020)
Azzopardi, L., Thomas, P., Craswell, N.: Measuring the utility of search engine result pages: an information foraging based measure. In: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 605–614 (2018)
Baccianella, S., Esuli, A., Sebastiani, F.: Evaluation measures for ordinal regression. In: 2009 Ninth International Conference on Intelligent Systems Design and Applications, pp. 283–287. IEEE (2009)
Belew, R.K.: Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge University Press (2000)
Blair, D.C.: Information retrieval, 2nd ed., C.J. van Rijsbergen. London: Butterworths. JASIS 30(6), 374–375 (1979). https://doi.org/10.1002/asi.4630300621
Bollmann, P.: Two axioms for evaluation measures in information retrieval. In: SIGIR, vol. 84, pp. 233–245. Citeseer (1984)
Bollmann, P., Cherniavsky, V.S.: Measurement-theoretical investigation of the mz-metric. In: Proceedings of the 3rd Annual ACM Conference on Research and Development in Information Retrieval, pp. 256–267. Citeseer (1980)
Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 25–32 (2004)
Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: ACM SIGIR Forum, vol. 51(2), pp. 235–242. ACM, New York, NY, USA (2017)
Busin, L., Mizzaro, S.: Axiometrics: An axiomatic approach to information retrieval effectiveness metrics. In: Proceedings of the 2013 Conference on the Theory of Information Retrieval, pp. 22–29 (2013)
Büttcher, S., Clarke, C.L., Yeung, P.C., Soboroff, I.: Reliable information retrieval evaluation with incomplete and biased judgements. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 63–70 (2007)
Carmel, D., Yom-Tov, E.: Estimating the query difficulty for information retrieval. Synth. Lect. Inf. Concepts, Retr., Serv. 2(1), 1–89 (2010)
Carterette, B.: System effectiveness, user models, and user utility: a conceptual framework for investigation. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 903–912 (2011)
Carterette, B.A.: Multiple testing in statistical analysis of systems-based information retrieval experiments. ACM Trans. Inf. Syst. (TOIS) 30(1), 1–34 (2012)
Chapelle, O., Metlzer, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 621–630 (2009)
Cleverdon, C.W.: The significance of the cranfield tests on index languages. In: Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12 (1991)
Clinchant, S., Gaussier, E.: Is document frequency important for prf? In: Conference on the Theory of Information Retrieval, pp. 89–100. Springer, Berlin (2011)
Clinchant, S., Gaussier, E.: A theoretical analysis of pseudo-relevance feedback models. In: Proceedings of the 2013 Conference on the Theory of Information Retrieval, pp. 6–13 (2013)
Cooper, W.S.: Expected search length: A single measure of retrieval effectiveness based on the weak ordering action of retrieval systems. Am. Doc. 19(1), 30–41 (1968)
Croft, W.B., Metzler, D., Strohman, T.: Search Engines: Information Retrieval in Practice, vol. 520. Addison-Wesley Reading (2010)
Do Carmo, M.P.: Differential Geometry of Curves and Surfaces: Revised and Updated, 2nd edn. Courier Dover Publications (2016)
Fang, H.: An axiomatic approach to information retrieval. Technical report (2007)
Fang, H., Tao, T., Zhai, C.: A formal study of information retrieval heuristics. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49–56 (2004)
Fang, H., Tao, T., Zhai, C.: Diagnostic evaluation of information retrieval models. ACM Trans. Inf. Syst. (TOIS) 29(2), 1–42 (2011)
Fang, H., Zhai, C.: An exploration of axiomatic approaches to information retrieval. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 480–487 (2005)
Fang, H., Zhai, C.: Semantic term matching in axiomatic approaches to information retrieval. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 115–122 (2006)
Ferrante, M., Ferro, N., Fuhr, N.: Towards meaningful statements in ir evaluation: Mapping evaluation measures to interval scales. IEEE Access 9, 136182–136216 (2021)
Ferrante, M., Ferro, N., Fuhr, N.: Response to moffat’s comment on “towards meaningful statements in ir evaluation: Mapping evaluation measures to interval scales” (2022). https://doi.org/10.48550/ARXIV.2212.11735
Ferrante, M., Ferro, N., Pontarollo, S.: A general theory of ir evaluation measures. IEEE Trans. Knowl. Data Eng. 31(3), 409–422 (2018)
Ferro, N., Peters, C.: Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF, vol. 41. Springer, Berlin (2019)
Flach, P.: Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward. In: Proceedings of the AAAI Conference on Artificial Intelligence, 01, pp. 9808–9814 (2019)
Fraleigh, J.B.: A First Course in Abstract Algebra. Pearson Education India (2003)
Fréchet, M.M.: Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884–1940) 22(1), 1–72 (1906)
Fuhr, N.: Some common mistakes in ir evaluation, and how they can be avoided. In: ACM SIGIR Forum, vol. 51(3), pp. 32–41. ACM, New York, NY, USA (2018)
Gaudette, L., Japkowicz, N.: Evaluation methods for ordinal classification. In: Canadian Conference on Artificial Intelligence, pp. 207–210. Springer, Berlin (2009)
Gauss, C.F.: Disquisitiones Generales Circa Superficies Curvas, vol. 1. Typis Dieterichianis (1828)
Giner, F.: A comment to “a general theory of ir evaluation measures” (2023). arXiv:2303.16061
Guccione, J.A.: Espacios métricos. Universidad de Buenos Aires, Texto (2018)
Han, L., Roitero, K., Maddalena, E., Mizzaro, S., Demartini, G.: On transforming relevance scales. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 39–48 (2019)
Hand, D.J.: Statistics and the theory of measurement. J. R. Stat. Soc. A. Stat. Soc. 159(3), 445–473 (1996)
Harman, D.: Information retrieval evaluation. Synth. Lect. Inf. Concepts, Retr., Serv. 3(2), 1–119 (2011)
Hauff, C., de Jong, F.: Retrieval system evaluation: Automatic evaluation versus incomplete judgments. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 863–864 (2010)
Hausdorff, F.: Set Theory, vol. 119. American Mathematical Soc. (2005)
Huibers, T.W.C.: An axiomatic theory for information retrieval. Ph.D. thesis (1996)
Hull, D.: Using statistical testing in the evaluation of retrieval experiments. In: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 329–338 (1993)
Hungerford, T.W.: Algebra, vol. 73. Springer Science & Business Media (2012)
Jacobson, N.: Basic Algebra I. Courier Corporation (2012)
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of ir techniques. ACM Trans. Inf. Syst. (TOIS) 20(4), 422–446 (2002)
Kando, N.: Information retrieval system evaluation using multi-grade relevance judgments-discussion on averageable single-numbered measures. IPSJ SIG Notes 63, 105–112 (2001)
Karimzadehgan, M., Zhai, C.: Axiomatic analysis of translation language model for information retrieval. In: European Conference on Information Retrieval, pp. 268–280. Springer, Berlin (2012)
Kazai, G.: Report of the inex 2003 metrics working group. In: Initiative for the Evaluation of XML Retrieval (INEX): INEX 2003 Workshop Proceedings, Dagstuhl, Germany (2004)
Kazai, G., Lalmas, M.: Inex 2005 evaluation measures. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds.) Advances in XML Information Retrieval and Evaluation, pp. 16–29. Springer, Berlin (2006)
Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in ir evaluation. J. Am. Soc. Inform. Sci. Technol. 53(13), 1120–1129 (2002)
Korfhage, R.R.: Information Storage and Retrieval. Wiley, USA (1997)
Krantz, D., Luce, D., Suppes, P., Tversky, A.: Foundations of Measurement, vol. I: Additive and Polynomial Representations (1971)
Krantz, D.H.: Foundations of Measurement, vol. II. Geometrical, Threshold and Probabilistic Representations (1989)
Luce, D., Krantz, D., Suppes, P., Tversky, A.: Foundations of Measurement, Vol. III Representation, Axiomatization, and Invariance (1990)
Maddalena, E., Mizzaro, S.: Axiometrics: Axioms of information retrieval effectiveness metrics. In: EVIA@ NTCIR (2014)
Michell, J.: Measurement scales and statistics: a clash of paradigms. Psychol. Bull. 100(3), 398 (1986)
Michell, J.: An Introduction to the Logic of Psychological Measurement. Psychology Press (2014)
Moffat, A.: Seven numeric properties of effectiveness metrics. In: Asia Information Retrieval Symposium, pp. 1–12. Springer, Berlin (2013)
Moffat, A.: Batch evaluation metrics in information retrieval: Measures, scales, and meaning. IEEE Access 10, 105564–105577 (2022)
Moffat, A., Bailey, P., Scholer, F., Thomas, P.: Incorporating user expectations and behavior into the measurement of search effectiveness. ACM Trans. Inf. Syst. (TOIS) 35(3), 1–38 (2017)
Moffat, A., Zobel, J.: Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. (TOIS) 27(1), 1–27 (2008)
Montazeralghaem, A., Zamani, H., Shakery, A.: Axiomatic analysis for improving the log-logistic feedback model. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 765–768 (2016)
Pollock, S.M.: Measures for the comparison of information retrieval systems. Am. Doc. 19(4), 387–397 (1968)
Rahimi, R., Montazeralghaem, A., Shakery, A.: An axiomatic approach to corpus-based cross-language information retrieval. Inf. Retr. J. 23(3), 191–215 (2020)
Roberts, F.S.: Measurement theory. Encycl. Math. Appl. 7 (1985)
Robertson, S.: On gmap: and other transformations. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp. 78–83 (2006)
Robertson, S.: On the history of evaluation in ir. J. Inf. Sci. 34(4), 439–456 (2008)
Rocchio, J.: Performance indices for document retrieval systems. In: Information Storage and Retrieval p. 83 (1964)
Rosset, C., Mitra, B., Xiong, C., Craswell, N., Song, X., Tiwary, S.: An axiomatic approach to regularizing neural ranking models. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 981–984 (2019)
Sagara, Y.: Performance measures for ranked output retrieval systems. J. Jpn. Soc. Inf. Knowl. 12(2), 22–36 (2002)
Sakai, T.: New performance metrics based on multigrade relevance: their application to question answering. In: NTCIR (2004)
Sakai, T.: Metrics, statistics, tests. In: PROMISE Winter School, pp. 116–163. Springer, Berlin (2013)
Sakai, T.: Statistical reform in information retrieval? In: ACM SIGIR Forum, vol. 48, pp. 3–12. ACM, New York, NY, USA (2014)
Sakai, T.: On fuhr’s guideline for ir evaluation. In: ACM SIGIR Forum, vol. 54, pp. 1–8. ACM, New York, NY, USA (2021)
Sakai, T., Kando, N.: On information retrieval metrics designed for evaluation with incomplete relevance assessments. Inf. Retr. 11(5), 447–470 (2008)
Sakai, T., Oard, D.W., Kando, N.: Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact. Springer Nature (2021)
Salton, G.: Automatic Information Organization and Retrieval (1968)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. Mcgraw-Hill (1983)
Sanderson, M.: Test collection based evaluation of information retrieval systems. Found. Trends Inf. Retr. 4(4), 247–375 (2010)
Savoy, J.: Statistical inference in retrieval effectiveness evaluation. Inf. Process. Manag. 33(4), 495–512 (1997)
Sebastiani, F.: An axiomatically derived measure for the evaluation of classification algorithms. In: Proceedings of the 2015 International Conference on the Theory of Information Retrieval, pp. 11–20 (2015)
Sirotkin, P.: On search engine evaluation metrics (2013). arXiv:1302.2318
Stevens, S.S.: Mathematics, Measurement, and Psychophysics. Wiley, New York (1951)
Stevens, S.S., et al.: On the Theory of Scales of Measurement. Bobbs-Merrill, College Division (1946)
Swets, J.A.: Information retrieval systems. Science 141(3577), 245–250 (1963)
Urbano, J., Lima, H., Hanjalic, A.: Statistical significance testing in information retrieval: an empirical analysis of type i, type ii and type iii errors. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 505–514 (2019)
Van Rijsbergen, C.J.: Foundation of evaluation. J. Doc. 30(4), 365–373 (1974)
Vanbelle, S., Albert, A.: A note on the linearly weighted kappa coefficient for ordinal scales. Stat. Methodol. 6(2), 157–163 (2009)
Velleman, P.F., Wilkinson, L.: Nominal, ordinal, interval, and ratio typologies are misleading. Am. Stat. 47(1), 65–72 (1993)
Voorhees, E.M.: The trec 2005 robust track. In: ACM SIGIR Forum, vol. 40, pp. 41–48. ACM, New York, NY, USA (2006)
Voorhees, E.M., Harman, D.K.: TREC: Experiment and Evaluation in Information Retrieval, vol. 63. Citeseer (2005)
Voorhees, E.M., et al.: Overview of the trec 2003 robust retrieval track. In: Trec, pp. 69–77 (2003)
Wicaksono, A.F., Moffat, A.: Metrics, user models, and satisfaction. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 654–662 (2020)
Zhang, F., Liu, Y., Li, X., Zhang, M., Xu, Y., Ma, S.: Evaluating web search with a bejeweled player model. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 425–434 (2017)
A Appendix
1.1 A.1 Formal Proofs
Proof
(Proposition 1) Symmetry is trivially verified since \(d_{f}(\mathbf {\hat{r}_1},\mathbf {\hat{r}_2})= \vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_2}) \vert = \vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert = d_{f}(\mathbf {\hat{r}_2},\mathbf {\hat{r}_1})\). The triangle inequality follows directly from the triangle inequality on the real numbers: \(\vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_2}) \vert \le \vert f(\mathbf {\hat{r}_1}) - f(\mathbf {\hat{r}_3}) \vert + \vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_2}) \vert \). \(\square \)
Proof
(Proposition 2) A useful result about metric spaces [42] states the following: "Let \((\mathbf {R_2}, d_2)\) be a metric space and let \(f:\mathbf {R_1} \longrightarrow \mathbf {R_2}\) be an injective (one-to-one) function; then \((\mathbf {R_1}, d_1)\) is a metric space, where \(d_1(\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}) = d_2(f(\mathbf {\hat{r}_1}), f(\mathbf {\hat{r}_2}))\), \(\forall \mathbf {\hat{r}_1}\), \(\mathbf {\hat{r}_2} \in \mathbf {R_1}\)".
In the retrieval scenario, \((\mathbf {R_2}, d_2) = (\mathbb {R}, \vert \cdot \vert )\), which is the metric space of the real line endowed with the usual norm (the absolute value). Let f be a one-to-one IR evaluation measure; from the previous result, it follows that \((\mathbf {R_1}, d_1) = (\textbf{R}, d_f)\) is a metric space, i.e., \(d_f\) satisfies the three postulates of a metric. \(\square \)
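As a finite-domain sanity check of Propositions 1 and 2, the pulled-back distance \(d_f\) can be tested exhaustively. The snippet below is a sketch under assumed toy data: rankings are binary-relevance tuples, `injective` is a hypothetical one-to-one scorer, and `precision` collapses two distinct rankings, so for it \(d_f\) fails the identity of indiscernibles and is only a pseudometric:

```python
from itertools import product

def d_f(f, x, y):
    """Distance pulled back through the evaluation measure f."""
    return abs(f(x) - f(y))

def is_metric(points, f, eps=1e-12):
    """Exhaustively check the three metric postulates for d_f on a finite domain."""
    for x, y in product(points, repeat=2):
        if abs(d_f(f, x, y) - d_f(f, y, x)) > eps:            # symmetry
            return False
        if (d_f(f, x, y) < eps) != (x == y):                  # identity of indiscernibles
            return False
    for x, y, z in product(points, repeat=3):
        if d_f(f, x, y) > d_f(f, x, z) + d_f(f, z, y) + eps:  # triangle inequality
            return False
    return True

# Hypothetical toy domain: rankings as binary-relevance tuples.
points = [(1, 0), (0, 1), (1, 1)]
injective = lambda r: 2 * r[0] + r[1]   # assigns a distinct value to each ranking
precision = lambda r: sum(r) / len(r)   # collapses (1, 0) and (0, 1)
# is_metric(points, injective) -> True; is_metric(points, precision) -> False
```

Symmetry and the triangle inequality hold for any f (Proposition 1); only injectivity decides whether \(d_f\) separates distinct rankings (Proposition 2).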
Proof
(Proposition 3) We first prove the implication from right to left. Consider a metric ordinal scale, f, whose attained values are equally spaced.
An interval is called prime if \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}\}\). First, we show that the function \(F(\textbf{x}, \textbf{y}) = \vert f(\textbf{x}) - f(\textbf{y}) \vert \) attains its minimum value exactly on the prime intervals.
Let \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_3}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}, \mathbf {\hat{r}_3}\}\) be a non-prime interval with \(\mathbf {\hat{r}_1} \preceq _f \mathbf {\hat{r}_2} \preceq _f \mathbf {\hat{r}_3}\); then, \(f(\mathbf {\hat{r}_1}) \le f(\mathbf {\hat{r}_2}) \le f(\mathbf {\hat{r}_3})\), since f is an ordinal scale. Hence, \(\vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_1}) \vert = \vert f(\mathbf {\hat{r}_3}) - f(\mathbf {\hat{r}_2}) \vert + \vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert \), where both summands are strictly positive since f is a metric (different elements attain different values); thus, the minimum value of F is not attained at \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_3}]\). In addition, F assigns the same value to every prime interval. Given a prime interval, \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\), one of its consecutive prime intervals, \([\mathbf {\hat{r}_2}, \mathbf {\hat{r}_3}]\), can be considered, since \(\preceq _f\) is a weak order (every pair of elements is comparable). These two prime intervals verify \(f(\mathbf {\hat{r}_1})< f(\mathbf {\hat{r}_2}) < f(\mathbf {\hat{r}_3})\), since f is a metric, and the attained values of f are equally spaced. Thus, \(F(\mathbf {\hat{r}_1},\mathbf {\hat{r}_2}) = k \in \mathbb {R}^{+}\) for every prime interval \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\).
Now, we show that equally spaced intervals (not necessarily prime) are assigned equal differences. Consider any non-prime interval, \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_m}] = \{\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}, \ldots , \mathbf {\hat{r}_m}\}\). As f is a metric, it attains different values for different elements; thus, it can be assumed that \(f(\mathbf {\hat{r}_1})< f(\mathbf {\hat{r}_2})< \cdots< f(\mathbf {\hat{r}_{m-1}}) < f(\mathbf {\hat{r}_m})\). Then, every interval \([\mathbf {\hat{r}_i}, \mathbf {\hat{r}_{i+1}}]\), for \(i=1, \ldots, m-1\), is a prime interval, and F attains its minimum at these intervals. As \(f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_1}) = f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_{m-1}}) + f(\mathbf {\hat{r}_{m-1}}) - \cdots - f(\mathbf {\hat{r}_2}) + f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1})\) and \(f(\mathbf {\hat{r}_{i+1}}) - f(\mathbf {\hat{r}_i}) = k\) for \(1 \le i \le m-1\), it follows that \(f(\mathbf {\hat{r}_m}) - f(\mathbf {\hat{r}_1}) = k \cdot (m-1)\), which depends only on the span of the interval, m, and not on the elements considered. Therefore, equally spaced intervals are assigned equal differences, i.e., f is an interval scale.
Finally, we prove the other implication. Let \([\mathbf {\hat{r}_1}, \mathbf {\hat{r}_2}]\) be any prime interval of \(\textbf{R}\). As f is an interval scale, equally spaced intervals are assigned equal differences, i.e., the value \(\vert f(\mathbf {\hat{r}_2}) - f(\mathbf {\hat{r}_1}) \vert \) is constant over all prime intervals of \(\textbf{R}\); in addition, it must be strictly positive. To see that the attained values are equally spaced, it suffices to check that different elements of \(\textbf{R}\) are assigned different values of f, which holds since f is a metric. \(\square \)
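The equivalence in Proposition 3 can be illustrated numerically: when the attained values of f are equally spaced, every pair of intervals with the same span receives the same difference, and otherwise it does not. The two value lists below are illustrative assumptions (uniformly spaced values, as a measure like P@4 would attain, versus geometrically spaced ones):

```python
def span_differences(values):
    """For a list of attained measure values, collect the set of differences
    f(r_j) - f(r_i) observed for every span length j - i.  On an interval
    scale each span maps to a single difference."""
    vals = sorted(values)
    diffs = {}
    for i in range(len(vals)):
        for j in range(i + 1, len(vals)):
            diffs.setdefault(j - i, set()).add(round(vals[j] - vals[i], 9))
    return diffs

equally_spaced = [0.0, 0.25, 0.5, 0.75, 1.0]     # uniformly spaced attained values
unequally_spaced = [0.0, 0.5, 0.75, 0.875, 1.0]  # geometrically spaced values
# every span yields one difference -> interval-scale behaviour
assert all(len(s) == 1 for s in span_differences(equally_spaced).values())
# spans of equal length yield distinct differences -> ordinal only
assert any(len(s) > 1 for s in span_differences(unequally_spaced).values())
```

This mirrors the proof: with equal spacing k, an interval of span m contributes the single difference \(k \cdot m\), while unequal spacing breaks this invariance.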
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Giner, F. (2024). An Intrinsic Framework of Information Retrieval Evaluation Measures. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_47
Print ISBN: 978-3-031-47720-1
Online ISBN: 978-3-031-47721-8