Text Classification Based on Topic Modeling and Chi-square

  • Conference paper
Genetic and Evolutionary Computing (ICGEC 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1107)

Abstract

This paper compares two topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI), and one feature selection algorithm, chi-square, for extracting feature words from news text. After feature extraction, three classifiers (Logistic Regression, Naive Bayes, and SVM) are compared on news classification. In the tests, the combination of LSI and Logistic Regression gives the best result among the compared methods, with a precision of 96% and a recall of 95%.
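As an illustration of the chi-square feature selection step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the standard 2x2 contingency-table form of the statistic (term present or absent versus document in or out of a class) and whitespace tokenization. The toy corpus and the names `chi2_scores` and `top_terms` are hypothetical.

```python
from collections import Counter


def chi2_scores(docs, labels, target):
    """Chi-square score of every term with respect to the class `target`.

    Uses the 2x2 table of term presence vs. class membership:
    chi2 = N * (N11*N00 - N10*N01)^2 /
           ((N11+N01) * (N11+N10) * (N10+N00) * (N01+N00))
    """
    n = len(docs)
    doc_terms = [set(d.split()) for d in docs]  # assumption: whitespace tokens
    n_pos = sum(1 for y in labels if y == target)

    # Document frequency of each term overall and inside the target class.
    df_all = Counter(t for terms in doc_terms for t in terms)
    df_pos = Counter(
        t for terms, y in zip(doc_terms, labels) if y == target for t in terms
    )

    scores = {}
    for term, df in df_all.items():
        n11 = df_pos[term]           # term present, document in class
        n10 = df - n11               # term present, document not in class
        n01 = n_pos - n11            # term absent, document in class
        n00 = n - n11 - n10 - n01    # term absent, document not in class
        denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
        scores[term] = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms (ties in arbitrary order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy corpus, not the paper's news data set.
docs = [
    "football match goal",
    "football team goal",
    "bank market rates",
    "bank shares market",
]
labels = ["sport", "sport", "finance", "finance"]

scores = chi2_scores(docs, labels, target="sport")
selected = top_terms(scores, k=4)
print(sorted(selected))
```

Terms that appear in every document of one class and in none of the other receive the highest score, so they are kept as feature words, while terms seen in only one document score lower. The selected terms would then be passed to a classifier such as Logistic Regression.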

Author information

Correspondence to Yujia Sun.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, Y., Platoš, J. (2020). Text Classification Based on Topic Modeling and Chi-square. In: Pan, JS., Lin, JW., Liang, Y., Chu, SC. (eds) Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and Computing, vol 1107. Springer, Singapore. https://doi.org/10.1007/978-981-15-3308-2_56
