Finding Better Topics: Features, Priors and Constraints

Wu, Xiaona; Zeng, Jia; Yan, Jianfeng; Liu, Xiaosheng

doi:10.1007/978-3-319-06605-9_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Latent Dirichlet allocation (LDA) is a popular probabilistic topic modeling paradigm. In practice, LDA users usually face two problems. First, common words and stop words tend to occupy all topics, which leads to poor topic interpretability. Second, there is little guidance on how to improve the low-dimensional topic features for better clustering or classification performance. To find better topics, we re-examine LDA from three perspectives: continuous features, asymmetric Dirichlet priors, and sparseness constraints, using variants of belief propagation (BP) inference algorithms. We show that continuous features can effectively remove common and stop words from topics, and that asymmetric Dirichlet priors have substantial advantages over symmetric priors, whereas sparseness constraints do little to improve overall performance.
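As a rough illustration of why continuous features can suppress common and stop words, the sketch below replaces raw word counts with TF-IDF weights. The abstract does not say which continuous feature the chapter uses, so TF-IDF is an assumption here, and the function name `tfidf_features` and the toy documents are hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_features(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Assumption: plain TF-IDF stands in for the "continuous features"
    mentioned in the abstract; the chapter may use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    features = []
    for doc in docs:
        counts = Counter(doc)
        features.append({
            # Term frequency times inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return features


# Toy corpus (hypothetical): "the" occurs in every document.
docs = [
    "the topic model finds the topics".split(),
    "the stop words fill the topics".split(),
    "the prior shapes the model".split(),
]
weights = tfidf_features(docs)
```

A word that occurs in every document, such as "the" in this toy corpus, gets an inverse document frequency of log(1) = 0, so its weight vanishes no matter how often it appears, while rarer words keep positive weights. A topic model fed such weights instead of integer counts would have little evidence to assign to stop words.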

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Xiaona Wu, Jia Zeng, Jianfeng Yan & Xiaosheng Liu

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Wu, X., Zeng, J., Yan, J., Liu, X. (2014). Finding Better Topics: Features, Priors and Constraints. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_25

DOI: https://doi.org/10.1007/978-3-319-06605-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)
