Abstract
State-of-the-art score normalization methods use generative models that rely on sometimes unrealistic assumptions. We propose a novel parameter estimation method for score normalization based on logistic regression, using the expected parameters from past queries. Experiments on the Gov2 and CluewebA collections indicate that our method is consistently more precise in predicting the number of relevant documents in the top-n ranks than a state-of-the-art generative approach and another parameter estimate for logistic regression.
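As a rough illustration of the idea in the abstract, the following minimal sketch maps retrieval scores to probabilities of relevance with a logistic function and sums them to estimate the number of relevant documents in the top-n ranks. It rests on assumptions not stated in the abstract: a single score feature with an intercept and a slope, and "expected parameters" read as the arithmetic mean of the parameters fitted on past queries. The function names and the numeric values are hypothetical, and the paper's actual estimator may differ.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_parameters(past_params):
    """Average (intercept, slope) pairs fitted on past queries.

    Assumption: the expectation is approximated by the sample mean.
    """
    n = len(past_params)
    intercept = sum(p[0] for p in past_params) / n
    slope = sum(p[1] for p in past_params) / n
    return intercept, slope


def normalize(scores, intercept, slope):
    """Map raw scores to estimated probabilities of relevance."""
    return [sigmoid(intercept + slope * s) for s in scores]


def expected_relevant_in_top_n(scores, intercept, slope, n):
    """Sum the relevance probabilities of the n highest-scoring documents."""
    top = sorted(scores, reverse=True)[:n]
    return sum(normalize(top, intercept, slope))


# Hypothetical per-query parameters from two past queries.
past = [(-2.0, 1.0), (-4.0, 3.0)]
a, b = expected_parameters(past)

# Hypothetical scores for a new query.
scores = [1.5, 3.0, 0.0]
estimate = expected_relevant_in_top_n(scores, a, b, n=2)
print(round(estimate, 4))
```

Summing probabilities gives an expected count because each document's relevance is treated as a Bernoulli variable; no independence assumption is needed for the expectation itself.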
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Aly, R. (2014). Score Normalization Using Logistic Regression with Expected Parameters. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_60
DOI: https://doi.org/10.1007/978-3-319-06028-6_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06027-9
Online ISBN: 978-3-319-06028-6
eBook Packages: Computer Science (R0)