Abstract
Part-of-speech (POS) tagging plays an important role in natural language processing (NLP) applications such as information retrieval, machine translation, spell checking, and sentiment analysis. Much work has been done on Bangla POS tagging using machine learning, but the results are not yet satisfactory. Moreover, no effective research work has been conducted on Bangla POS tagging using deep learning, because of data scarcity. With this in mind, our work addresses Bangla POS tagging using both machine learning and deep learning approaches. We compare several well-known supervised POS tagging approaches (Brill, HMM, unigram, bigram, trigram, and a recurrent neural network) for the Bangla language. Supervised POS tagging requires a large data set to tag accurately. We therefore use a large data set to build a Bangla POS tagger that accepts raw Bangla text and produces POS-tagged output that can be used directly in other NLP applications. From the comparison, we identify the tagging approach that performs best. Bangla is an inflectional language, which makes assigning grammatical categories to Bangla words a difficult task. Nevertheless, our proposed model works well for the Bangla language.
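To make the kind of comparison described above concrete, the following is a minimal sketch of a unigram tagger and a bigram tagger with backoff, both scored by token accuracy. It is not the chapter's implementation: the three-sentence toy corpus, the tag names, the class names, and the default tag `NOUN` are all assumptions chosen for illustration, and a real experiment would use a large annotated Bangla corpus.

```python
from collections import Counter, defaultdict

# Toy tagged sentences (illustrative assumption, not the chapter's corpus).
TRAIN = [
    [("আমি", "PRON"), ("ভাত", "NOUN"), ("খাই", "VERB")],
    [("সে", "PRON"), ("বই", "NOUN"), ("পড়ে", "VERB")],
    [("আমি", "PRON"), ("বই", "NOUN"), ("পড়ি", "VERB")],
]
TEST = [[("সে", "PRON"), ("ভাত", "NOUN"), ("খায়", "VERB")]]


class UnigramTagger:
    """Assigns each word its most frequent training tag."""

    def __init__(self, sentences, default="NOUN"):
        counts = defaultdict(Counter)
        for sent in sentences:
            for word, tag in sent:
                counts[word][tag] += 1
        self.best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.default = default  # tag used for unseen words (assumption)

    def tag(self, words):
        return [(w, self.best.get(w, self.default)) for w in words]


class BigramTagger:
    """Conditions on (previous tag, word); backs off when the pair is unseen."""

    def __init__(self, sentences, backoff):
        counts = defaultdict(Counter)
        for sent in sentences:
            prev = "<s>"  # sentence-start marker
            for word, tag in sent:
                counts[(prev, word)][tag] += 1
                prev = tag
        self.best = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        self.backoff = backoff

    def tag(self, words):
        out, prev = [], "<s>"
        for w in words:
            tag = self.best.get((prev, w))
            if tag is None:
                tag = self.backoff.tag([w])[0][1]
            out.append((w, tag))
            prev = tag
        return out


def accuracy(tagger, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    total = correct = 0
    for sent in gold:
        pred = tagger.tag([w for w, _ in sent])
        for (_, p), (_, g) in zip(pred, sent):
            total += 1
            correct += p == g
    return correct / total


uni = UnigramTagger(TRAIN)
bi = BigramTagger(TRAIN, backoff=uni)
for name, tagger in [("unigram", uni), ("bigram+backoff", bi)]:
    print(name, round(accuracy(tagger, TEST), 2))
```

In this sketch the inflected verb form in the test sentence never appears in training, so both taggers fall back to the default tag for it. That is a small-scale view of why an inflectional language makes tagging harder when training data is limited.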
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jueal Mia, M., Hassan, M., Biswas, A.A. (2022). Effectiveness Analysis of Different POS Tagging Techniques for Bangla Language. In: Somani, A.K., Mundra, A., Doss, R., Bhattacharya, S. (eds) Smart Systems: Innovations in Computing. Smart Innovation, Systems and Technologies, vol 235. Springer, Singapore. https://doi.org/10.1007/978-981-16-2877-1_13
DOI: https://doi.org/10.1007/978-981-16-2877-1_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2876-4
Online ISBN: 978-981-16-2877-1
eBook Packages: Intelligent Technologies and Robotics (R0)