Abstract
Two problems arise when developing a stemming scheme for diachronic corpora: (1) the morphological systems of natural languages change over time, and these changes are usually not sufficiently documented; and (2) such corpora exhibit very diverse orthographic characteristics. This short paper briefly describes a stemming strategy for a diachronic corpus of Mexican Spanish that partially addresses these problems. The method's success rates are compared with those of a Porter stemmer.
References
Porter, M.F.: An Algorithm for Suffix Stripping. Program 14(3), 130–137 (1980)
Medina-Urrea, A., Hlaváčová, J.: Automatic Recognition of Czech Derivational Prefixes. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 189–197. Springer, Heidelberg (2005)
Medina-Urrea, A., Buenrostro Díaz, E.C.: Características cuantitativas de la flexión verbal del chuj. Estudios de Lingüística Aplicada 38, 15–31 (2003)
Medina-Urrea, A., Alvarado García, M.: Análisis cuantitativo y cualitativo de la derivación léxica en ralámuli. Primer Coloquio Leonardo Manrique, Mexico, Conaculta-INAH (2004)
Medina-Urrea, A.: Automatic Discovery of Affixes by Means of a Corpus: A Catalog of Spanish Affixes. Journal of Quantitative Linguistics 7(2), 97–114 (2000)
Harris, J.: Historical Excursus: Reflexes of the Medieval Stridents. In: Spanish Phonology, pp. 189–206. MIT Press, Cambridge (1969)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Medina-Urrea, A. (2006). Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_12
DOI: https://doi.org/10.1007/11671299_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)