Abstract
In this article, we present a model that makes digital newspapers easier to read by extracting the locations of news items from the articles and showing the associated places on a map. A supervised keyword-based extraction module recognizes and classifies geographical locations as named entities. The extraction results are improved using dictionaries or gazetteers (lists of named entities from the geographic area where the news takes place). Thesauri are also used to check and complete the results and to disambiguate named entities. Finally, the model has been applied to “El Norte de Castilla”, a digital publication from Valladolid, to validate it and to identify the tools and techniques that give the best results.
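The gazetteer lookup and thesaurus-based disambiguation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `GAZETTEER` entries, the `broader` field (loosely modelled on a thesaurus "broader term" relation), and the function names `find_mentions` and `disambiguate` are all hypothetical, and a plain dictionary match stands in for the supervised extraction module.

```python
import re
from collections import Counter

# Hypothetical gazetteer: lower-cased place name -> candidate entries.
# "broader" plays the role of a thesaurus broader-term link (assumption).
GAZETTEER = {
    "valladolid": [
        {"id": "valladolid-es", "broader": "Castilla y León"},
        {"id": "valladolid-mx", "broader": "Yucatán"},
    ],
    "medina del campo": [{"id": "medina-del-campo", "broader": "Castilla y León"}],
    "tordesillas": [{"id": "tordesillas", "broader": "Castilla y León"}],
}


def find_mentions(text):
    """Return the gazetteer names that occur in text, longest names first."""
    lowered = text.lower()
    found = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        # Whole-word match so that a name is not found inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.append(name)
    return found


def disambiguate(mentions):
    """Resolve each mention to one candidate id.

    Unambiguous mentions vote for their broader region; an ambiguous
    mention takes the candidate whose broader region has the most votes
    (the first candidate wins ties).
    """
    context = Counter(
        GAZETTEER[m][0]["broader"] for m in mentions if len(GAZETTEER[m]) == 1
    )
    resolved = {}
    for m in mentions:
        best = max(GAZETTEER[m], key=lambda c: context[c["broader"]])
        resolved[m] = best["id"]
    return resolved


article = "Vecinos de Tordesillas y Medina del Campo viajaron a Valladolid."
places = disambiguate(find_mentions(article))
```

Here the ambiguous name "Valladolid" resolves to the Spanish city because the other places in the article share its broader region. The resolved ids could then be mapped to coordinates for display on a map.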
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gómez, C.G., Cuadrado, A.F., Mínguez, J.D., de la Torre, E.V. (2012). Automatic Extraction of Geographic Locations on Articles of Digital Newspapers. In: Rodríguez, J., Pérez, J., Golinska, P., Giroux, S., Corchuelo, R. (eds) Trends in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28795-4_12
DOI: https://doi.org/10.1007/978-3-642-28795-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28794-7
Online ISBN: 978-3-642-28795-4
eBook Packages: Engineering, Engineering (R0)