Abstract
Assessing the relative performance of search systems requires a test collection with a pre-defined set of queries and corresponding relevance judgements. The state-of-the-art process for constructing test collections involves issuing a large number of queries and, for each query, judging a set of documents pooled from the runs submitted by a group of participating systems. However, the initial set of judgements may be insufficient to reliably evaluate the performance of future, as-yet-unseen systems. In this paper, we propose a method that expands the set of relevance judgements as new systems are evaluated, assuming a limited budget for acquiring additional judgements. From the documents retrieved by the new systems we create a pool of unjudged documents. Rather than distributing the budget uniformly across all queries, we first select a subset of queries that are effective in evaluating systems and then allocate the budget uniformly across only these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.
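The following is a minimal sketch of the budget-allocation idea outlined in the abstract: choose a subset of queries judged useful for discriminating between systems, then split the judging budget uniformly across that subset. The query usefulness scores, pool ordering, and function names are illustrative assumptions; the paper's actual query-selection criterion is not reproduced here.

```python
# Sketch of the budget-allocation strategy described in the abstract.
# query_scores, pools, and the selection rule are assumed inputs, not the
# paper's exact method.

def allocate_judging_budget(query_scores, pools, budget, num_queries):
    """Pick the `num_queries` queries with the highest (assumed) usefulness
    scores, then split the judging budget uniformly across them.

    query_scores : dict mapping query id -> estimated usefulness for evaluation
    pools        : dict mapping query id -> list of unjudged documents,
                   e.g. ordered by how often the new systems retrieved them
    budget       : total number of additional relevance judgements available
    num_queries  : size of the query subset to judge
    """
    # Select the subset of queries deemed most effective for evaluating systems.
    selected = sorted(query_scores, key=query_scores.get, reverse=True)[:num_queries]

    # Uniform allocation: each selected query receives an equal share of the budget.
    per_query = budget // len(selected)

    # Judge the top unjudged pooled documents for each selected query.
    return {q: pools[q][:per_query] for q in selected}


if __name__ == "__main__":
    # Toy example: three candidate queries, a budget of four judgements, keep two queries.
    scores = {"q1": 0.9, "q2": 0.2, "q3": 0.7}
    pools = {"q1": ["d1", "d2", "d3"], "q2": ["d4"], "q3": ["d5", "d6", "d7"]}
    print(allocate_judging_budget(scores, pools, budget=4, num_queries=2))
    # -> {'q1': ['d1', 'd2'], 'q3': ['d5', 'd6']}
```

Queries left out of the subset receive no additional judgements under this scheme, which is the key contrast with spreading the same budget thinly across all queries.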
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hosseini, M., Cox, I.J., Milic-Frayling, N., Vinay, V., Sweeting, T. (2011). Selecting a Subset of Queries for Acquisition of Further Relevance Judgements. In: Amati, G., Crestani, F. (eds) Advances in Information Retrieval Theory. ICTIR 2011. Lecture Notes in Computer Science, vol 6931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23318-0_12
DOI: https://doi.org/10.1007/978-3-642-23318-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23317-3
Online ISBN: 978-3-642-23318-0
eBook Packages: Computer Science (R0)