Abstract
Chunking is challenging for free word order languages because their phrase structure is relatively unrestricted. A robust chunker benefits downstream NLP applications. This paper presents a hybrid chunker for the Gujarati language. Contextual information, in the form of the last two Unicode characters of each word and its part-of-speech (POS) tag, serves as the key feature set for a chunker built with a machine learning approach. Four statistical techniques, namely support vector machines (SVM), conditional random fields (CRF), Naïve Bayes, and hidden Markov models (HMM), were implemented to identify the most appropriate technique for chunking Gujarati text. Linguistic rules were then designed to further improve performance. The resulting chunker achieves an accuracy of 98.21%, with precision, recall, and F1 score of 96.42%, 95.62%, and 96.02%, respectively.
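The feature set described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the context window of one token on each side, the `<PAD>` marker, and the three-word example sentence with its POS tags are all assumptions made for illustration. The sketch takes the last two Unicode code points of each word together with its POS tag, and also checks that the reported F1 score is the harmonic mean of the reported precision and recall.

```python
from typing import Dict, List, Tuple

# A tagged sentence is a list of (word, POS tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def last_two_chars(word: str) -> str:
    """Return the last two Unicode code points of a word.

    Python strings are sequences of code points, so slicing works
    directly on Gujarati text. Words shorter than two code points
    are returned unchanged.
    """
    return word[-2:]


def token_features(sent: TaggedSentence, i: int, window: int = 1) -> Dict[str, str]:
    """Build suffix and POS features for token i and its neighbours.

    The window size is an assumption; the abstract only states that
    contextual suffix and POS information is used.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sent):
            word, pos = sent[j]
            feats[f"suffix2[{offset}]"] = last_two_chars(word)
            feats[f"pos[{offset}]"] = pos
        else:
            # Positions outside the sentence get a padding marker.
            feats[f"suffix2[{offset}]"] = "<PAD>"
            feats[f"pos[{offset}]"] = "<PAD>"
    return feats


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence ("Ram went home") with illustrative POS tags.
example: TaggedSentence = [("રામ", "NNP"), ("ઘરે", "NN"), ("ગયો", "VM")]
features = [token_features(example, i) for i in range(len(example))]

# Consistency check on the figures reported in the abstract (in percent).
reported_f1 = round(f1_score(96.42, 95.62), 2)
```

Feature dictionaries of this shape can be passed to most sequence labelling or classification toolkits; how the paper encodes them for each of the four techniques is not stated in the abstract.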
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tailor, C., Patel, B. (2021). Chunker for Gujarati Language Using Hybrid Approach. In: Rathore, V.S., Dey, N., Piuri, V., Babo, R., Polkowski, Z., Tavares, J.M.R.S. (eds) Rising Threats in Expert Applications and Solutions. Advances in Intelligent Systems and Computing, vol 1187. Springer, Singapore. https://doi.org/10.1007/978-981-15-6014-9_10
DOI: https://doi.org/10.1007/978-981-15-6014-9_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6013-2
Online ISBN: 978-981-15-6014-9
eBook Packages: Intelligent Technologies and Robotics (R0)