Abstract
Topic models such as Latent Dirichlet Allocation (LDA) are widely used methods for text analysis. Recently, moment-based inference with provable performance guarantees has been proposed for topic models. In contrast to inference algorithms that approximate the maximum likelihood objective, moment-based inference comes with theoretical guarantees on recovering the model parameters. One such method is tensor orthogonal decomposition, which requires only mild assumptions for exact recovery of the topics. However, it suffers from scalability issues because it constructs dense, high-dimensional tensors. In this work, we propose a speedup technique that exploits the special structure of these tensors. It is efficient in both time and space, and requires only two passes over the corpus. It improves on the state-of-the-art inference algorithm by one to three orders of magnitude while matching its inference quality.
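To make the bottleneck concrete, below is a minimal sketch of the standard dense tensor power method for orthogonal decomposition, the baseline that the proposed technique accelerates. It assumes the third-order moment tensor has already been whitened into a symmetric k×k×k array `T`; the function name `tensor_power_method` and parameters such as `n_restarts` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tensor_power_method(T, n_components, n_iters=100, n_restarts=10, seed=0):
    """Recover an orthogonal decomposition T ~ sum_i lam_i * v_i (x) v_i (x) v_i
    of a symmetric 3rd-order tensor via power iteration with deflation.
    A generic sketch of the classic method, not the paper's speedup."""
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    for _ in range(n_components):
        best_v, best_lam = None, -np.inf
        for _ in range(n_restarts):           # random restarts to find the top eigenpair
            v = rng.standard_normal(k)
            v /= np.linalg.norm(v)
            for _ in range(n_iters):          # power update: v <- T(I, v, v) / ||T(I, v, v)||
                v = np.einsum('ijl,j,l->i', T, v, v)
                v /= np.linalg.norm(v)
            lam = np.einsum('ijl,i,j,l->', T, v, v, v)   # eigenvalue lam = T(v, v, v)
            if lam > best_lam:
                best_lam, best_v = lam, v
        eigvals.append(best_lam)
        eigvecs.append(best_v)
        # deflate: subtract the recovered rank-1 component before the next sweep
        T -= best_lam * np.einsum('i,j,l->ijl', best_v, best_v, best_v)
    return np.array(eigvals), np.array(eigvecs)
```

Merely materializing the dense tensor is expensive, and each power update T(I, v, v) costs O(k³) on it; exploiting the tensor's structure so that it never has to be formed explicitly is the kind of saving the abstract describes.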
Keywords
- Gibbs Sampling
- Latent Dirichlet Allocation
- Inference Method
- Probabilistic Latent Semantic Analysis
- Power Iteration
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, C., Liu, X., Song, Y., Han, J. (2014). Scalable Moment-Based Inference for Latent Dirichlet Allocation. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44845-8_19
DOI: https://doi.org/10.1007/978-3-662-44845-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44844-1
Online ISBN: 978-3-662-44845-8