Automatic Stemming of Words for Punjabi Language

Gupta, Vishal

doi:10.1007/978-3-319-04960-1_7

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 264)

Abstract

The main task of a stemmer is to reduce words that are not in their original form, and are therefore absent from the dictionary, to their root words. After stemming, the stemmer looks the word up in the dictionary. If no match is found, the word may be incorrect or may be a name; otherwise the word is correct. For any language, a stemmer is a basic linguistic resource needed to develop high-accuracy Natural Language Processing (NLP) applications such as machine translation, document classification, document clustering, text question answering, topic tracking, text summarization and keyword extraction. This paper concentrates on fully automatic stemming of Punjabi words, covering Punjabi nouns, verbs, adjectives, adverbs, pronouns and proper names. A list of 18 suffixes for Punjabi nouns and proper names, a number of other suffixes for Punjabi verbs, adjectives and adverbs, and different stemming rules for Punjabi nouns, verbs, adjectives, adverbs, pronouns and proper names have been generated after analysis of a Punjabi corpus. This is the first time that a complete Punjabi stemmer covering nouns, verbs, adjectives, adverbs, pronouns and proper names has been proposed, and it will be useful for developing other Punjabi NLP applications with high accuracy. The portion of the stemmer that handles proper names and nouns has been implemented as part of a Punjabi text summarizer, with MS Access as the back end and ASP.NET as the front end, with 87.37% efficiency.
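The strip-then-look-up procedure described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the romanised suffix rules, the toy dictionary and the function name `stem` are hypothetical and are not the paper's 18-suffix list, its rule set or its ASP.NET implementation.

```python
from typing import Optional

# Hypothetical (suffix, replacement) rules in romanised form.
# Illustrative placeholders only, not the suffix list from the paper.
SUFFIX_RULES = [
    ("ian", "i"),  # e.g. "kurian"  -> "kuri"
    ("an", ""),    # e.g. "kitaban" -> "kitab"
    ("e", "a"),    # e.g. "munde"   -> "munda"
]

# Toy dictionary of root words (an assumption for this sketch).
DICTIONARY = {"kuri", "munda", "kitab"}


def stem(word: str) -> Optional[str]:
    """Return the dictionary root of `word`, or None if none is found.

    None corresponds to the abstract's "no match" case: the word may be
    incorrect or may be a name.
    """
    # A word that is already a root needs no stemming.
    if word in DICTIONARY:
        return word
    # Try longer suffixes first so that "ian" is preferred over "an".
    rules = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept the candidate only if the dictionary confirms it.
            if candidate in DICTIONARY:
                return candidate
    return None


if __name__ == "__main__":
    for w in ["kurian", "munde", "kitaban", "vishal"]:
        print(w, "->", stem(w))
```

Checking each candidate against the dictionary before accepting it is what separates a confirmed root from a possible name or misspelling. A production stemmer would operate on Gurmukhi text and would need per-word-class rules of the kind the abstract mentions.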

Author information

Authors and Affiliations

University Institute of Engineering & Technology, Panjab University Chandigarh, Chandigarh, India
Vishal Gupta

Corresponding author

Correspondence to Vishal Gupta.

Editor information

Editors and Affiliations

Technopark Campus, Indian Institute of Information Technology and Management – Kerala (IIITM-K), Trivandrum, Kerala, India
Sabu M. Thampi
Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
Alexander Gelbukh
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
Jayanta Mukhopadhyay

About this paper

Cite this paper

Gupta, V. (2014). Automatic Stemming of Words for Punjabi Language. In: Thampi, S., Gelbukh, A., Mukhopadhyay, J. (eds) Advances in Signal Processing and Intelligent Recognition Systems. Advances in Intelligent Systems and Computing, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-04960-1_7

DOI: https://doi.org/10.1007/978-3-319-04960-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04959-5
Online ISBN: 978-3-319-04960-1
eBook Packages: Engineering, Engineering (R0)
