Abstract
Topic models such as Latent Dirichlet Allocation (LDA) are widely used methods for text analysis. Recently, moment-based inference with provable performance guarantees has been proposed for topic models. In contrast to inference algorithms that approximate the maximum likelihood objective, moment-based inference comes with theoretical guarantees on recovering the model parameters. One such method is tensor orthogonal decomposition, which requires only mild assumptions for exact recovery of the topics. However, it suffers from scalability issues because it constructs dense, high-dimensional tensors. In this work, we propose a speedup technique that exploits the special structure of these tensors. It is efficient in both time and space, and requires only two passes over the corpus. It improves on the state-of-the-art inference algorithm by one to three orders of magnitude while matching its inference quality.
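To make the bottleneck concrete, below is a minimal sketch of the standard dense tensor power method for orthogonal decomposition, the baseline that the proposed technique accelerates. It assumes the third-order moment tensor has already been whitened into a symmetric k×k×k array `T`; the function name `tensor_power_method` and parameters such as `n_restarts` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tensor_power_method(T, n_components, n_iters=100, n_restarts=10, seed=0):
    """Recover an orthogonal decomposition T ~ sum_i lam_i * v_i (x) v_i (x) v_i
    of a symmetric 3rd-order tensor via power iteration with deflation.
    A generic sketch of the classic method, not the paper's speedup."""
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    for _ in range(n_components):
        best_v, best_lam = None, -np.inf
        for _ in range(n_restarts):           # random restarts to find the top eigenpair
            v = rng.standard_normal(k)
            v /= np.linalg.norm(v)
            for _ in range(n_iters):          # power update: v <- T(I, v, v) / ||T(I, v, v)||
                v = np.einsum('ijl,j,l->i', T, v, v)
                v /= np.linalg.norm(v)
            lam = np.einsum('ijl,i,j,l->', T, v, v, v)   # eigenvalue lam = T(v, v, v)
            if lam > best_lam:
                best_lam, best_v = lam, v
        eigvals.append(best_lam)
        eigvecs.append(best_v)
        # deflate: subtract the recovered rank-1 component before the next sweep
        T -= best_lam * np.einsum('i,j,l->ijl', best_v, best_v, best_v)
    return np.array(eigvals), np.array(eigvecs)
```

Merely materializing the dense tensor is expensive, and each power update T(I, v, v) costs O(k³) on it; exploiting the tensor's structure so that it never has to be formed explicitly is the kind of saving the abstract describes.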
Keywords
- Gibbs Sampling
- Latent Dirichlet Allocation
- Inference Method
- Probabilistic Latent Semantic Analysis
- Power Iteration
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, C., Liu, X., Song, Y., Han, J. (2014). Scalable Moment-Based Inference for Latent Dirichlet Allocation. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44845-8_19
DOI: https://doi.org/10.1007/978-3-662-44845-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44844-1
Online ISBN: 978-3-662-44845-8