Abstract
This article addresses the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic CLTC, both for the case in which a sufficient number of training examples is available for each new language and for the case in which no training examples are available for some language.
Experimental results for the bilingual classification of the ILO corpus (with documents in English and Spanish) are reported for three approaches: bilingual training, terminology translation, and profile-based translation.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bel, N., Koster, C.H.A., Villegas, M. (2003). Cross-Lingual Text Categorization. In: Koch, T., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2003. Lecture Notes in Computer Science, vol 2769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45175-4_13
DOI: https://doi.org/10.1007/978-3-540-45175-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40726-3
Online ISBN: 978-3-540-45175-4
eBook Packages: Springer Book Archive