Abstract
In this paper, we empirically evaluate an automated technique for assigning geospatial coordinates to previously unseen documents, using only their raw text as evidence. The technique relies on a hierarchical representation of the Earth's surface and on linear classifiers. We measured the results obtained with models based on Support Vector Machines over collections of geo-referenced Wikipedia articles in four languages: English, German, Spanish, and Portuguese. The best-performing models achieved state-of-the-art results on the English Wikipedia collection, with an average prediction error of 83 kilometers and a median error of just 9 kilometers.
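The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical latitude/longitude quadtree as the hierarchy (rather than an equal-area discretization such as HEALPix), a multiclass perceptron standing in for the Support Vector Machines, greedy top-down descent through the hierarchy, the mean coordinates of training documents in the predicted leaf cell as the output, and the haversine formula on a spherical Earth instead of an ellipsoidal (Vincenty-style) distance. All names (`cell_path`, `Perceptron`, `HierarchicalGeocoder`, `error_summary`) and the toy documents are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere; a simpler stand-in for ellipsoidal methods."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def cell_path(lat, lon, depth):
    """Hypothetical hierarchy: a lat/lon quadtree in which each cell has 4 children."""
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    path = []
    for _ in range(depth):
        mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
        upper, right = lat >= mid_lat, lon >= mid_lon
        path.append(2 * upper + right)  # child index in 0..3
        south, north = (mid_lat, north) if upper else (south, mid_lat)
        west, east = (mid_lon, east) if right else (west, mid_lon)
    return tuple(path)

def features(text):
    """Bag-of-words term counts from the raw text."""
    return Counter(text.lower().split())

class Perceptron:
    """Multiclass perceptron: a simple linear classifier standing in for an SVM."""
    def __init__(self, epochs=10):
        self.epochs = epochs
        self.w = defaultdict(Counter)  # label -> sparse weight vector

    def score(self, label, x):
        return sum(self.w[label][f] * v for f, v in x.items())

    def predict(self, x):
        return max(self.labels, key=lambda lab: self.score(lab, x))

    def fit(self, xs, ys):
        self.labels = sorted(set(ys))
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):
                pred = self.predict(x)
                if pred != y:  # standard perceptron update on mistakes
                    for f, v in x.items():
                        self.w[y][f] += v
                        self.w[pred][f] -= v
        return self

class HierarchicalGeocoder:
    """One linear classifier per internal cell; prediction descends greedily."""
    def __init__(self, depth=3):
        self.depth = depth

    def fit(self, docs, coords):
        groups, leaves = defaultdict(list), defaultdict(list)
        for doc, (lat, lon) in zip(docs, coords):
            x, path = features(doc), cell_path(lat, lon, self.depth)
            for k in range(self.depth):
                groups[path[:k]].append((x, path[k]))  # parent cell -> child label
            leaves[path].append((lat, lon))
        self.models = {prefix: Perceptron().fit([x for x, _ in items], [y for _, y in items])
                       for prefix, items in groups.items()}
        # Naive coordinate mean per leaf (ignores the antimeridian wrap-around).
        self.centroids = {path: (statistics.fmean(p[0] for p in pts),
                                 statistics.fmean(p[1] for p in pts))
                          for path, pts in leaves.items()}
        return self

    def predict(self, doc):
        x, prefix = features(doc), ()
        while len(prefix) < self.depth:
            prefix += (self.models[prefix].predict(x),)
        return self.centroids[prefix]

def error_summary(preds, golds):
    """Mean and median distance errors in kilometers."""
    errors = [haversine_km(*p, *g) for p, g in zip(preds, golds)]
    return statistics.fmean(errors), statistics.median(errors)

# Toy usage with made-up documents (not data from the paper).
train_docs = ["lisbon tagus portugal", "porto douro portugal", "berlin spree germany",
              "munich isar germany", "sydney harbour australia", "new york hudson"]
train_coords = [(38.72, -9.14), (41.15, -8.61), (52.52, 13.40),
                (48.14, 11.58), (-33.87, 151.21), (40.71, -74.01)]
model = HierarchicalGeocoder(depth=3).fit(train_docs, train_coords)
test_docs = ["a day trip from lisbon along the tagus", "beer gardens of munich and berlin"]
test_coords = [(38.72, -9.14), (52.52, 13.40)]
preds = [model.predict(d) for d in test_docs]
mean_km, median_km = error_summary(preds, test_coords)
```

Greedy descent keeps the sketch short, but a wrong decision at a coarse level cannot be corrected further down; scoring several candidate paths (a beam) is a common alternative. The mean and median returned by `error_summary` correspond to the two kinds of error figures the abstract reports.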
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Melo, F., Martins, B. (2015). Geocoding Textual Documents Through a Hierarchy of Linear Classifiers. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds) Progress in Artificial Intelligence. EPIA 2015. Lecture Notes in Computer Science, vol 9273. Springer, Cham. https://doi.org/10.1007/978-3-319-23485-4_59
DOI: https://doi.org/10.1007/978-3-319-23485-4_59
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23484-7
Online ISBN: 978-3-319-23485-4
eBook Packages: Computer Science, Computer Science (R0)