Abstract
In natural language processing applications, such as question answering systems and, more specifically, semantic role labelling, an important task during the text normalization phase is lemmatization, which consists of identifying words that share the same root despite their surface differences. Because a practical lemmatization tool suitable for Italian, a highly inflectional language, is lacking, this paper presents LIT, a rule-based Italian lemmatizer that combines full rule-based lemmatization of all dictionary words with a discovery algorithm that attempts to predict the grammar of neologisms. The presentation is followed by a practical application of LIT to Europarl v7, a well-known open corpus.
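As a rough illustration of the two-stage idea described above (dictionary-based lemmatization first, then a fallback that guesses the lemma of unknown words), the following minimal sketch may help. It is not LIT's actual rule base or discovery algorithm: the lexicon entries, the suffix rules, the minimum-stem threshold, and the function name `lemmatize` are all hypothetical and chosen only for illustration.

```python
# Toy sketch of dictionary lookup plus suffix-rule fallback.
# LEXICON and SUFFIX_RULES are hypothetical examples, NOT LIT's rule base.

LEXICON = {
    "gatti": "gatto",      # plural noun -> singular
    "case": "casa",        # plural noun -> singular
    "parlano": "parlare",  # inflected verb -> infinitive
}

# (suffix, replacement) pairs used only when a word is not in the lexicon.
SUFFIX_RULES = [
    ("ando", "are"),  # gerund of -are verbs
    ("ato", "are"),   # past participle of -are verbs
    ("i", "o"),       # masculine plural
]


def lemmatize(word):
    """Return a lemma for `word`, preferring the lexicon over suffix rules."""
    token = word.lower()
    # Stage 1: known dictionary words are resolved by direct lookup.
    if token in LEXICON:
        return LEXICON[token]
    # Stage 2: unknown words (e.g. neologisms) are guessed from their ending,
    # trying the longest suffix first and requiring a stem of 3+ characters.
    for suffix, replacement in sorted(SUFFIX_RULES, key=lambda r: -len(r[0])):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    # No rule applies: return the token unchanged.
    return token
```

Under these assumptions, a dictionary word such as "gatti" is resolved by lookup, while an out-of-dictionary form such as "googlando" falls through to the suffix rules. A real system would need far richer rules and part-of-speech information to avoid wrong guesses.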
Acknowledgment
All research activities described here are part of the project “ABAUT - Application for Brand Auditing and Trend”, funded to SPHERA Srl by the Italian Ministry of Economic Development (project no. F/050420/00/X32, call “HORIZON 2020” PON I&C 2014–2020).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Molendini, S., Guerrieri, A., Filieri, A. (2020). LIT: Rule Based Italian Lemmatizer. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1038. Springer, Cham. https://doi.org/10.1007/978-3-030-29513-4_28
DOI: https://doi.org/10.1007/978-3-030-29513-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29512-7
Online ISBN: 978-3-030-29513-4
eBook Packages: Intelligent Technologies and Robotics (R0)