Experimental Evaluation of Deep Learning Models for Marathi Text Classification

Kulkarni, Atharva; Mandhane, Meet; Likhitkar, Manali; Kshirsagar, Gayatri; Jagdale, Jayashree; Joshi, Raviraj

doi:10.1007/978-981-16-6407-6_53


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 237)


Abstract

The Marathi language is one of the prominent languages used in India. It is predominantly spoken by the people of Maharashtra. Over the past decade, the usage of the language on online platforms has increased tremendously. However, research on Natural Language Processing (NLP) approaches for Marathi text has not received much attention. Marathi is a morphologically rich language and uses a variant of the Devanagari script in its written form. This work aims to provide a comprehensive overview of available resources and models for Marathi text classification. We evaluate CNN, LSTM, ULMFiT, and BERT-based models on two publicly available Marathi text classification datasets and present a comparative analysis. The pre-trained Marathi FastText word embeddings by Facebook and IndicNLP are used in conjunction with word-based models. We show that basic single-layer models based on CNN and LSTM coupled with FastText embeddings perform on par with the BERT-based models on the available datasets. We hope our paper aids focused research and experiments in the area of Marathi NLP.
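
To make the idea of a "basic single-layer CNN over word embeddings" concrete, the following is a minimal sketch of the forward pass only, written in plain Python. It is not the authors' implementation: the random vectors merely stand in for pre-trained FastText embeddings, the weights are untrained, and every name and size here (EMBED_DIM, NUM_FILTERS, KERNEL_SIZE, NUM_CLASSES, classify, the sample sentence) is a hypothetical choice made for illustration.

```python
import math
import random

# Assumed toy sizes; real FastText vectors are typically 300-dimensional.
EMBED_DIM = 8
NUM_FILTERS = 4
KERNEL_SIZE = 3
NUM_CLASSES = 3

rng = random.Random(0)


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def embed(tokens, table):
    """Look up one vector per token.

    Unseen tokens get a fresh random vector; in practice this table would be
    loaded from pre-trained embeddings instead.
    """
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    return [table[tok] for tok in tokens]


def conv1d_relu(vectors, filters, biases):
    """Slide each filter over windows of KERNEL_SIZE tokens, then apply ReLU."""
    n_windows = len(vectors) - KERNEL_SIZE + 1
    feature_maps = []
    for flt, bias in zip(filters, biases):
        fmap = []
        for start in range(n_windows):
            # Flatten the window of token vectors into one long vector.
            window = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
            activation = sum(w * x for w, x in zip(flt, window)) + bias
            fmap.append(max(0.0, activation))
        feature_maps.append(fmap)
    return feature_maps


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, table, params):
    """Embed, convolve, max-pool over time, and project to class probabilities."""
    vectors = embed(tokens, table)
    # Pad short inputs with zero vectors so at least one window exists.
    while len(vectors) < KERNEL_SIZE:
        vectors.append([0.0] * EMBED_DIM)
    fmaps = conv1d_relu(vectors, params["filters"], params["conv_bias"])
    pooled = [max(fmap) for fmap in fmaps]  # max-over-time pooling
    logits = []
    for row, bias in zip(params["dense"], params["dense_bias"]):
        logits.append(sum(w * p for w, p in zip(row, pooled)) + bias)
    return softmax(logits)


params = {
    "filters": random_matrix(NUM_FILTERS, KERNEL_SIZE * EMBED_DIM),
    "conv_bias": [0.0] * NUM_FILTERS,
    "dense": random_matrix(NUM_CLASSES, NUM_FILTERS),
    "dense_bias": [0.0] * NUM_CLASSES,
}
embedding_table = {}

# Hypothetical whitespace-tokenized Marathi sentence.
probs = classify("ही एक चाचणी बातमी आहे".split(), embedding_table, params)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how a single convolutional layer with max-over-time pooling turns a sequence of word vectors into a fixed-size feature vector for a classifier. A practical version would load real embeddings and train the weights in a deep learning framework.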



Acknowledgements

This work was done under the L3Cube Pune mentorship program. We would like to express our gratitude towards our mentors at L3Cube for their continuous support and encouragement.

Author information

Authors and Affiliations

Pune Institute of Computer Technology, Pune, Maharashtra, India
Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar & Jayashree Jagdale
Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
Raviraj Joshi

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, CMR Institute of Technology, Hyderabad, India
Vinit Kumar Gunjan
Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY, USA
Jacek M. Zurada

Cite this paper

Kulkarni, A., Mandhane, M., Likhitkar, M., Kshirsagar, G., Jagdale, J., Joshi, R. (2022). Experimental Evaluation of Deep Learning Models for Marathi Text Classification. In: Gunjan, V.K., Zurada, J.M. (eds) Proceedings of the 2nd International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications. Lecture Notes in Networks and Systems, vol 237. Springer, Singapore. https://doi.org/10.1007/978-981-16-6407-6_53

DOI: https://doi.org/10.1007/978-981-16-6407-6_53
Published: 10 January 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6406-9
Online ISBN: 978-981-16-6407-6
eBook Packages: Intelligent Technologies and Robotics (R0)
