Abstract
In this paper we present an application of the stacking technique to a chunking task: named entity recognition. Stacking consists of applying machine learning techniques to combine the results of different models. Instead of using several corpora or several tagger generators to obtain the models needed for stacking, we applied three transformations to a single training corpus and then used the four versions of the corpus (the original and the three transformed ones) to train a single tagger generator. Taking as baseline the results obtained with the original corpus ($F_{\beta=1}$ = 81.84), our experiments show that all three transformations improve on this baseline (the best one reaches 84.51) and that stacking improves it further, reaching an $F_{\beta=1}$ of 88.43.
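To make the idea of stacking more concrete, the following is a minimal Python sketch under simplifying assumptions, not the authors' implementation. Each token receives a tuple of tags, one from each of four base models (standing in for taggers trained on the original corpus and on three transformed versions of it), and a meta-learner is fit on those tuples against the gold tags. The lookup-table meta-learner, the majority-vote fallback, the toy BIO-tagged data, and all function names (`train_meta`, `predict_meta`, `majority_vote`, `spans`, `f1`) are hypothetical illustrations, not necessarily the meta-classifier used in the paper. $F_{\beta=1}$ is computed here as the harmonic mean of precision and recall over exact-match entity chunks.

```python
from collections import Counter, defaultdict


def train_meta(base_preds, gold):
    """Hypothetical lookup-table meta-learner: map each combination of
    base-model tags to the gold tag it co-occurred with most often."""
    table = defaultdict(Counter)
    for combo, tag in zip(base_preds, gold):
        table[combo][tag] += 1
    return {combo: counts.most_common(1)[0][0] for combo, counts in table.items()}


def majority_vote(combo):
    """Baseline combiner: the tag proposed by most base models."""
    return Counter(combo).most_common(1)[0][0]


def predict_meta(table, base_preds):
    """Apply the learned table; fall back to voting for unseen combinations."""
    return [table[c] if c in table else majority_vote(c) for c in base_preds]


def spans(tags):
    """Extract (start, end, type) chunks from a list of BIO tags."""
    result = set()
    start = etype = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final chunk
        closes = tag == "O" or tag.startswith("B-") or tag[2:] != etype
        if closes:
            if etype is not None:
                result.add((start, i, etype))
                start = etype = None
            if tag != "O":  # B-X, or an I-X that does not continue the chunk
                start, etype = i, tag[2:]
    return result


def f1(gold_tags, pred_tags):
    """Chunk-level F(beta=1): harmonic mean of precision and recall."""
    g, p = spans(gold_tags), spans(pred_tags)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


# Toy data (hypothetical): each tuple holds the tags proposed by four base taggers.
train_preds = [
    ("B-PER", "B-PER", "B-PER", "O"),
    ("I-PER", "I-PER", "O", "I-PER"),
    ("O", "O", "O", "O"),
    ("B-ORG", "B-LOC", "B-ORG", "B-ORG"),  # most models confuse LOC with ORG here
    ("O", "O", "O", "O"),
    ("B-ORG", "B-ORG", "B-ORG", "B-ORG"),
    ("I-ORG", "O", "I-ORG", "I-ORG"),
    ("O", "O", "O", "O"),
]
train_gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "O"]

test_preds = [train_preds[3], train_preds[2], train_preds[0], train_preds[1], train_preds[2]]
test_gold = ["B-LOC", "O", "B-PER", "I-PER", "O"]

table = train_meta(train_preds, train_gold)
stacked = predict_meta(table, test_preds)
voted = [majority_vote(c) for c in test_preds]
print(f1(test_gold, voted))    # 0.5
print(f1(test_gold, stacked))  # 1.0
```

On this toy data, majority voting repeats the models' shared LOC/ORG confusion, while the meta-learner has learned that this particular pattern of disagreement signals a location. In a realistic setup the meta-learner should be trained on base-model outputs for data the base models did not see during their own training (for example a held-out split or cross-validation); otherwise it learns from over-optimistic predictions.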
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Troyano, J.A., Díaz, V.J., Enríquez, F., Carrillo, V., Cruz, F. (2005). Applying Stacking and Corpus Transformation to a Chunking Task. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2005. EUROCAST 2005. Lecture Notes in Computer Science, vol 3643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556985_20
DOI: https://doi.org/10.1007/11556985_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29002-5
Online ISBN: 978-3-540-31829-3
eBook Packages: Computer Science, Computer Science (R0)