Abstract
A common limitation of many language modeling approaches is that retrieval scores are based mainly on exact matching of terms in queries and documents, ignoring the semantic relations among terms. Latent Dirichlet Allocation (LDA) is an approach that tries to capture the semantic dependencies among words. However, when used as a document representation, LDA has had no successful applications in information retrieval (IR). In this paper, we propose a single-document-based LDA (SLDA) document model for IR. The proposed model has been evaluated on four TREC collections; the results show that the SLDA document modeling method is comparable to state-of-the-art language modeling approaches and offers a novel way to use the LDA model to improve retrieval performance.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, D., Rao, L., Wang, T. (2011). An Empirical Study of SLDA for Information Retrieval. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_8
DOI: https://doi.org/10.1007/978-3-642-25631-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25630-1
Online ISBN: 978-3-642-25631-8
eBook Packages: Computer Science (R0)