Abstract
This paper presents an approach to automatic subject indexing of full-text documents with multiple labels, based on binary support vector machines (SVMs). The aim was to test the applicability of SVMs on a real-world dataset. We also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations was compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi-label classification of documents in multiple languages.
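The abstract describes multi-label indexing built from binary SVMs. One common way to do this, which may or may not match the authors' exact setup, is to train one binary classifier per subject label and assign every label whose classifier scores a document positively. The sketch below illustrates that idea only. The four toy documents, the label names, the hyperparameters, and the plain subgradient-descent trainer (standing in for a real SVM solver) are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter


def featurize(text, vocab):
    """Bag-of-words term counts over a fixed vocabulary; unknown words are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def train_binary_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM trained by subgradient descent on the L2-regularised hinge loss.

    y holds +1 / -1 targets. The hyperparameters are illustrative, not tuned.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active: step towards the example
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def train_multilabel(docs, label_sets, labels, vocab):
    """Train one independent binary SVM per label (one-vs-rest)."""
    X = [featurize(d, vocab) for d in docs]
    return {
        label: train_binary_svm(X, [1 if label in s else -1 for s in label_sets])
        for label in labels
    }


def predict(models, text, vocab):
    """Return every label whose binary classifier gives a positive score."""
    x = featurize(text, vocab)
    return {
        label
        for label, (w, b) in models.items()
        if sum(wi * xi for wi, xi in zip(w, x)) + b > 0
    }


# Hypothetical toy corpus: each document carries one or more subject labels.
docs = ["rice harvest", "irrigation canal", "rice irrigation", "forest timber"]
label_sets = [{"crops"}, {"water"}, {"crops", "water"}, {"forestry"}]
labels = ["crops", "water", "forestry"]
vocab = sorted({word for d in docs for word in d.split()})

models = train_multilabel(docs, label_sets, labels, vocab)
print(sorted(predict(models, "rice irrigation", vocab)))
print(sorted(predict(models, "forest timber logging", vocab)))
```

Because each label gets its own classifier, a document can receive zero, one, or several labels, which is what distinguishes multi-label indexing from ordinary single-label classification. A practical system would replace the toy trainer with an established SVM implementation and a richer document representation.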
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lauser, B., Hotho, A. (2003). Automatic Multi-label Subject Indexing in a Multilingual Environment. In: Koch, T., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2003. Lecture Notes in Computer Science, vol 2769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45175-4_14
DOI: https://doi.org/10.1007/978-3-540-45175-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40726-3
Online ISBN: 978-3-540-45175-4
eBook Packages: Springer Book Archive