Abstract
This paper introduces the MTN (Multiple-Topic-Number) algorithm, which addresses the problem of selecting the number of topics in a topic model. Rather than committing to a single topic number, the algorithm builds LDA (Latent Dirichlet Allocation) matrices for several different topic numbers so that the LDA output combines better with a downstream machine learning algorithm. This avoids the traditional difficulty of choosing a topic number that is either too small or too large. The combination is carried out using different levels of a machine learning tree structure. Experimental results show the efficiency of the proposed algorithm.
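The abstract does not say how the LDA matrices are built or combined, so the following is only an illustrative sketch of the general idea, not the authors' MTN implementation. It assumes that "LDA matrices of different topic numbers" means the document-topic matrices from several independent LDA fits, and that they are combined by simple concatenation. A small collapsed Gibbs sampler stands in for whatever inference method the paper uses, and the function names `lda_doc_topic` and `multi_topic_features` are hypothetical. The tree-level combination step described in the abstract is not reproduced here.

```python
import random

def lda_doc_topic(docs, n_topics, vocab_size, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the document-topic matrix.

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    Every returned row is a probability vector of length n_topics.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total tokens per topic
    z = []                                              # topic assignment of each token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed, normalised topic proportions per document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(ndk[d][t] + alpha) / denom for t in range(n_topics)])
    return theta

def multi_topic_features(docs, topic_numbers, vocab_size, **kwargs):
    """Concatenate the document-topic matrices obtained for several topic numbers.

    Assumption: plain concatenation is used as the combination step.
    """
    mats = [lda_doc_topic(docs, k, vocab_size, **kwargs) for k in topic_numbers]
    return [sum((m[d] for m in mats), []) for d in range(len(docs))]

# Toy corpus of three documents over a four-word vocabulary.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1]]
features = multi_topic_features(toy_docs, [2, 3, 4], vocab_size=4, n_iter=20)
```

Each row of `features` has one block per topic number, so a downstream learner (for instance a tree-based classifier) sees coarse and fine topic representations side by side and no single topic number has to be chosen in advance.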
Acknowledgement
This work was supported by Shenzhen Science and Technology Plan under grant number JCYJ20180306171938767 and the Shenzhen Foundational Research Funding JCYJ20180507183527919.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tang, L., Zhao, L. (2020). A Novel Topic Number Selecting Algorithm for Topic Model. In: Pan, JS., Lin, JW., Liang, Y., Chu, SC. (eds) Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and Computing, vol 1107. Springer, Singapore. https://doi.org/10.1007/978-981-15-3308-2_53
DOI: https://doi.org/10.1007/978-981-15-3308-2_53
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3307-5
Online ISBN: 978-981-15-3308-2
eBook Packages: Intelligent Technologies and Robotics (R0)