Abstract
This paper discusses the use of unlabeled data for Spanish Named Entity Recognition. Two techniques are used: self-training to detect entities in the text, and co-training to classify the entities already detected. We introduce a new co-training algorithm that applies voting techniques to decide which unlabeled examples are added to the training set at each iteration. We also propose a way to improve performance on the detected entities, and present a brief comparative study with existing co-training algorithms.
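The abstract does not specify the classifiers, features, or voting rule, so the following Python sketch only illustrates the general idea of co-training with voting. Everything in it is an assumption made for illustration: the toy per-feature classifier, the two views (`word` and `prev`), the unanimous-vote rule, and all function names. It should not be read as the authors' algorithm.

```python
from collections import Counter


def train_view_model(examples, view):
    """Toy classifier (hypothetical): for one feature view, remember the
    most frequent label seen with each feature value."""
    counts = {}
    for features, label in examples:
        counts.setdefault(features[view], Counter())[label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, features, view):
    """Return the label for this view's feature value, or None if unseen."""
    return model.get(features[view])


def co_train_with_voting(labeled, unlabeled, views, max_iterations=10):
    """Grow the labeled set with unlabeled examples on which every
    view-specific classifier votes for the same label (assumed rule)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iterations):
        # Retrain one classifier per view on the current labeled set.
        models = [train_view_model(labeled, v) for v in views]
        agreed, remaining = [], []
        for features in pool:
            votes = [predict(m, features, v) for m, v in zip(models, views)]
            # Accept only a unanimous vote with no abstentions.
            if votes[0] is not None and all(v == votes[0] for v in votes):
                agreed.append((features, votes[0]))
            else:
                remaining.append(features)
        if not agreed:
            break  # nothing new was labeled, so further rounds change nothing
        labeled.extend(agreed)
        pool = remaining
    return labeled, pool


# Tiny made-up data: each entity has a surface word and the preceding word.
seed = [
    ({"word": "Madrid", "prev": "en"}, "LOC"),
    ({"word": "Garcia", "prev": "don"}, "PER"),
]
unlabeled = [
    {"word": "Madrid", "prev": "en"},   # both views agree
    {"word": "Madrid", "prev": "don"},  # views disagree
    {"word": "Lima", "prev": "de"},     # both views abstain
]
labeled, pool = co_train_with_voting(seed, unlabeled, views=["word", "prev"])
```

In this run only the first unlabeled example is added to the training set; the other two stay in the pool because the views disagree or have never seen the feature values. A stricter or looser voting rule (for example, majority voting over more than two classifiers) would change which examples are admitted at each iteration.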
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozareva, Z., Bonev, B., Montoyo, A. (2005). Self-training and Co-training Applied to Spanish Named Entity Recognition. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science(), vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_78
DOI: https://doi.org/10.1007/11579427_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29896-0
Online ISBN: 978-3-540-31653-4
eBook Packages: Computer Science (R0)