A Comprehensive Study on Natural Language Processing, It’s Techniques and Advancements in Nepali Language

  • Conference paper
Advanced Computational and Communication Paradigms (ICACCP 2023)

Abstract

Natural Language Processing (NLP) is the ability of a system to understand, interpret, and analyze human language in forms such as spoken words and text files. A plethora of language-processing models have been developed, spanning rule-based, traditional machine learning, and neural network approaches. NLP has a wide range of applications, including speech recognition, machine translation, sentiment analysis, chatbots, and intelligent personal assistants. In recent years, NLP has made significant progress thanks to deep learning models, which have greatly improved the performance of NLP systems. For a machine to understand speech, it must be able to understand sentences, which is possible only when the sentences are well structured, a property that grammar ensures. This paper surveys natural language processing work done for the Nepali language, along with the resources available for it.

Corresponding author

Correspondence to Sital Sharma.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Sharma, S., Sharma, K., Sen, B. (2023). A Comprehensive Study on Natural Language Processing, It’s Techniques and Advancements in Nepali Language. In: Borah, S., Gandhi, T.K., Piuri, V. (eds) Advanced Computational and Communication Paradigms. ICACCP 2023. Lecture Notes in Networks and Systems, vol 535. Springer, Singapore. https://doi.org/10.1007/978-981-99-4284-8_13
