Case Studies of Several Popular Text Classification Methods

Karim, Awatif; Hami, Youssef; Loqman, Chakir; Boumhidi, Jaouad

doi:10.1007/978-3-031-29857-8_56

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 668)

Included in the following conference series:

International Conference on Digital Technologies and Applications

Abstract

The volume of data generated worldwide is growing at an exponential rate. Automatic data classification has therefore become a necessity, and many researchers are focusing on evaluating natural language processing techniques and improving text classification methods.

Recently, deep learning models have achieved state-of-the-art results in many areas, including a wide variety of NLP applications. Deep learning has the potential to handle and analyze massive data in both supervised and unsupervised modes, and in real time. This paper briefly introduces several feature extraction and classification algorithms, then analyzes and compares how different textual representations affect the performance of various text classification algorithms. The results show that distributed word representations such as word2vec and GloVe outperform other feature extraction methods such as bag-of-words (BOW). More importantly, contextual embeddings such as BERT can achieve good performance compared with traditional word embeddings and with other classification methods.
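To make the contrast between BOW and distributed word representations concrete, the following is a minimal sketch and not the setup used in the paper. It assumes a hypothetical table of hand-made 3-dimensional word vectors (`TOY_VECTORS`) standing in for learned word2vec or GloVe vectors, and compares two short documents that share no words. The helper names `bow_vector`, `mean_embedding`, and `cosine` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors, hand-made for illustration only.
# Real word2vec or GloVe vectors are learned from large corpora.
TOY_VECTORS = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.0],
    "movie": [0.0, 0.9, 0.1],
    "film":  [0.2, 0.7, 0.1],
}


def bow_vector(tokens, vocabulary):
    """Bag-of-words: one count per vocabulary word, word order ignored."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(tokens, vectors):
    """Represent a document as the average of its known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two documents with similar meaning but no words in common.
doc_a = "good movie".split()
doc_b = "great film".split()
vocabulary = sorted(set(doc_a) | set(doc_b))

bow_sim = cosine(bow_vector(doc_a, vocabulary), bow_vector(doc_b, vocabulary))
emb_sim = cosine(mean_embedding(doc_a, TOY_VECTORS),
                 mean_embedding(doc_b, TOY_VECTORS))

print(f"BOW cosine similarity:            {bow_sim:.2f}")
print(f"Mean-embedding cosine similarity: {emb_sim:.2f}")
```

Under these toy assumptions the BOW vectors are orthogonal because the documents share no tokens, while the averaged dense vectors remain close. This is one intuition for why distributed representations can help a downstream classifier; it says nothing about the magnitude of the gains reported in the paper.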

Acknowledgments

This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA) and the CNRST of Morocco [Alkhawarizmi/2020/36].

Author information

Authors and Affiliations

LISAC Laboratory, Faculty of Science Dhar El Mehraz, Sidi Mohamed Ben Abdellah University, Box 30003, Fez, Morocco
Awatif Karim, Chakir Loqman & Jaouad Boumhidi
MASI Team, National School of Applied Science, Abdelmalek Essaadi University, Box 1818, Tangier, Morocco
Youssef Hami

Corresponding author

Correspondence to Awatif Karim.

Editor information

Editors and Affiliations

Ecole Nationale des Sciences Appliquées, Fez, Morocco
Saad Motahhir
Faculty of Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Badre Bossoufi

About this paper

Cite this paper

Karim, A., Hami, Y., Loqman, C., Boumhidi, J. (2023). Case Studies of Several Popular Text Classification Methods. In: Motahhir, S., Bossoufi, B. (eds) Digital Technologies and Applications. ICDTA 2023. Lecture Notes in Networks and Systems, vol 668. Springer, Cham. https://doi.org/10.1007/978-3-031-29857-8_56

DOI: https://doi.org/10.1007/978-3-031-29857-8_56
Published: 29 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29856-1
Online ISBN: 978-3-031-29857-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
