Abstract
From a documentation standpoint, rigorous labeling requires identifying the geographic locations related to each document. Although tools and geographic databases exist, it is hard to find an automated labeling system for multilingual texts that specializes in this type of recognition and can be adapted to a particular context.
This paper proposes a method that combines geographic location techniques with Natural Language Processing and with statistical and semantic disambiguation tools to label documents appropriately in the general case. The method can be configured and fine-tuned for a given context to optimize the results. The paper also describes the experience of applying the proposed method to a content management system in a real organization (a major Spanish newspaper). The experimental results show an overall accuracy of around 80%, which indicates the potential of the proposal.
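As a rough illustration of the general idea (gazetteer lookup followed by context-based disambiguation), a minimal sketch might look like the following. It is not the system described in the paper: the toy gazetteer, the prior weights, the clue words, and the names `GAZETTEER`, `tokenize`, `tag_locations`, and `context_weight` are all hypothetical, and the scoring rule is only an assumed stand-in for the statistical and semantic tools mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: surface form -> candidate places.
# Priors and clue words are invented for illustration only.
# Place names use ASCII spellings because the tokenizer below is ASCII-only.
GAZETTEER = {
    "zaragoza": [
        {"id": "Zaragoza, Spain", "prior": 0.9, "clues": {"aragon", "ebro", "spain"}},
        {"id": "Zaragoza, Mexico", "prior": 0.1, "clues": {"coahuila", "mexico"}},
    ],
    "cordoba": [
        {"id": "Cordoba, Spain", "prior": 0.6, "clues": {"andalusia", "spain", "mosque"}},
        {"id": "Cordoba, Argentina", "prior": 0.4, "clues": {"argentina", "pampas"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into ASCII alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_locations(text, context_weight=1.0):
    """Map each gazetteer term found in the text to its best-scoring candidate.

    The score is the candidate's prior plus context_weight times the number
    of occurrences of its clue words in the document. context_weight plays
    the role of a tunable, context-specific setting.
    """
    counts = Counter(tokenize(text))
    labels = {}
    for token in counts:
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue

        def score(candidate):
            overlap = sum(counts[word] for word in candidate["clues"])
            return candidate["prior"] + context_weight * overlap

        labels[token] = max(candidates, key=score)["id"]
    return labels
```

With no supporting context the prior decides the label, while clue words in the surrounding text can override it; setting `context_weight` to zero reduces the sketch to a prior-only lookup, which hints at how such a method could be tuned for a specific newsroom.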
This research work has been supported by the CICYT project TIN2010-21387-C02-02 and DGA-FSE.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garrido, A.L., Buey, M.G., Ilarri, S., Mena, E. (2013). GEO-NASS: A Semantic Tagging Experience from Geographical Data on the Media. In: Catania, B., Guerrini, G., Pokorný, J. (eds) Advances in Databases and Information Systems. ADBIS 2013. Lecture Notes in Computer Science, vol 8133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40683-6_5
DOI: https://doi.org/10.1007/978-3-642-40683-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40682-9
Online ISBN: 978-3-642-40683-6
eBook Packages: Computer Science, Computer Science (R0)