Generating Spatial Footprints from Hiking Blogs

Acheson, Elise; Wartmann, Flurina M.; Purves, Ross S.

doi:10.1007/978-3-319-63946-8_2

Part of the book series: Lecture Notes in Geoinformation and Cartography (LNGC)

Included in the following conference series:

International Conference on Spatial Information Theory

Abstract

Explicitly linking text documents to geographical space is an important processing step for applications such as map visualization, spatial querying, and placename disambiguation. In this work, we present a proof-of-concept processing pipeline to generate spatial footprints for a spatially-rich, manually-annotated corpus of hiking blogs. We present preliminary results obtained by exploiting the spatially-focused nature of our input data and the rich placename resources at our disposal. Future work will fully automate the pipeline and systematically examine the influence of processing decisions on the footprints and downstream tasks.

1 Introduction

How can we geographically represent text documents? For documents strongly linked to geographical space, such as hiking blogs, a representative geographical area can be automatically generated by combining natural language processing methods with geographical processing such as spatial filtering or clustering. The resulting document representations, known as ‘document scopes’ or ‘document footprints’, are useful in downstream tasks (e.g. spatial queries), upstream tasks (e.g. improving placename disambiguation), and as an end in themselves (e.g. visualizing a text document on a mapping interface) (Monteiro et al. 2016; Purves et al. 2007; Quercini et al. 2010).

The context of our work is a project on how people describe landscapes in Switzerland. With the goal of comparing landscape descriptions from different data sources (hiking blogs and Flickr photos), document footprints are generated for a web-crawled corpus of hiking blogs in order to query and select Flickr photos based on location. For this task, we aimed to generate high-precision, geographically focused footprints.

2 Data and Methods

Our corpus consisted of first-person narrative web documents related to ten study sites in the German-speaking region of Switzerland. Documents were collected by targeted web-crawling, with five texts per site selected by manual triage, for a final corpus of 50 documents.

To generate document footprints, we followed the established three-step processing pipeline (Amitay et al. 2004; Monteiro et al. 2016) consisting of: (1) identifying placenames, (2) grounding placenames, and (3) generating a footprint (geometry) (Fig. 1). Poor placename identification is known to be a major source of error that propagates downstream into the document scope (Amitay et al. 2004; Purves et al. 2007). Thus, to obtain precise footprints, we performed step 1 manually, which was feasible due to the small corpus size. For step 2, grounding, we first aggregated placenames that repeated within a study site and recorded their frequencies, then queried an API to obtain ranked results from the SwissNames3D gazetteer for each placename. For the final step, footprint generation, we experimented with two approaches: iterative filtering based on the centroid and standard deviation of our candidate points (Smith and Crane 2001), and clustering using DBSCAN to identify one main cluster and discard outliers.
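The centroid-based filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_by_centroid`, the use of the root-mean-square distance as the measure of spread, the cut-off factor `k=2.0`, and the iteration cap are all assumptions made for illustration, and the points are assumed to be in projected planar coordinates (metres).

```python
from math import hypot, sqrt


def filter_by_centroid(points, k=2.0, max_iter=10):
    """Iteratively drop points lying far from the centroid.

    points   : list of (x, y) tuples in planar coordinates (assumption)
    k        : cut-off as a multiple of the RMS distance (assumed value)
    max_iter : safety cap on the number of passes (assumed value)
    """
    pts = list(points)
    for _ in range(max_iter):
        if not pts:
            break
        # Centroid of the points that are still candidates.
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        dists = [hypot(x - cx, y - cy) for x, y in pts]
        # Root-mean-square distance to the centroid, used here as the spread.
        spread = sqrt(sum(d * d for d in dists) / len(dists))
        kept = [p for p, d in zip(pts, dists) if d <= k * spread]
        if len(kept) == len(pts):
            break  # nothing was removed, so the result is stable
        pts = kept
    return pts


# Nine tightly grouped candidates plus one distant, probably mis-grounded point.
candidates = [(float(x), float(y)) for x in range(3) for y in range(3)]
candidates.append((100.0, 100.0))
filtered = filter_by_centroid(candidates)
```

In this toy input the distant point is dropped on the first pass and the second pass removes nothing, so the loop stops with the nine grouped points.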

Finally, we experimented with permutations of processing decisions in order to generate footprints that best suited our requirements. Decisions included: how many candidates to retain for each placename at the grounding stage; whether to give higher priority to placenames with exactly one candidate; and whether to use the per-site frequency of placenames. We automated the entire processing pipeline, starting from the manually annotated placenames, to output ten convex hulls or bounding boxes on each run.
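As an illustration of the two output geometries, the following sketch computes a bounding box and a convex hull from a set of retained points. The function names are hypothetical, and the hull uses Andrew's monotone chain algorithm as one common choice; the paper does not state how its hulls were computed.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a non-empty list of (x, y)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def convex_hull(points):
    """Convex hull via Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical retained points for one study site.
site_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
box = bounding_box(site_points)
hull = convex_hull(site_points)
```

A bounding box is cheaper to query against, while a convex hull follows the retained points more closely; which is preferable depends on how much extra area the downstream query can tolerate.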

3 Results and Conclusion

Our preliminary results showed that for placename grounding, simple approaches worked well enough for our purposes: ranked candidate placenames from SwissNames3D were sufficiently accurate that we obtained good results by retaining just the top candidate for each placename. For footprint generation, satisfactory results were obtained using both distance-based filtering and DBSCAN clustering, but the DBSCAN results were better suited to more complex geometric arrangements.
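To show why density-based clustering can cope with elongated or bent arrangements of points, here is a compact, dependency-free sketch of DBSCAN followed by selection of the largest cluster. The values of `eps` and `min_pts` and the helper name `main_cluster` are assumptions for illustration; the paper does not report its parameter settings or implementation.

```python
from collections import Counter
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j, (x, y) in enumerate(points)
                if hypot(x - xi, y - yi) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def main_cluster(points, eps, min_pts):
    """Keep only the points of the largest cluster; noise is discarded."""
    labels = dbscan(points, eps, min_pts)
    counts = Counter(label for label in labels if label != -1)
    if not counts:
        return []
    best = counts.most_common(1)[0][0]
    return [p for p, label in zip(points, labels) if label == best]


# An L-shaped chain of candidates (e.g. along a valley) plus one outlier.
chain = [(i, 0) for i in range(5)] + [(4, j) for j in range(1, 5)]
kept = main_cluster(chain + [(50, 50)], eps=1.5, min_pts=3)
```

Because cluster membership depends only on local density, the whole L-shaped chain is kept even though its two ends are far from the centroid, which a purely distance-based filter might trim.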

Our footprint requirements stemmed from our downstream tasks: performing a spatial query for Flickr photos, and ultimately, comparing data sources about landscapes at our ten study sites. These task-based requirements, along with the availability of quality placename resources for our area of study, influenced our processing decisions at every stage. Future work will fully automate the placename identification stage, and will systematically measure the effects of permutations in processing decisions on the downstream tasks of querying and document comparison.
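The downstream spatial query can be pictured as a point-in-polygon test applied to photo locations. The sketch below uses the standard ray-casting test; the photo record layout (dictionaries with `id`, `x`, `y` keys) and the function names are hypothetical, and the photo coordinates are assumed to be in the same coordinate system as the footprint.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon.

    polygon is a list of (x, y) vertices in order, without repeating the
    first vertex at the end. Points exactly on an edge are not handled
    specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def query_photos(photos, footprint):
    """Return the photos whose location falls inside the footprint."""
    return [p for p in photos if point_in_polygon((p["x"], p["y"]), footprint)]


# Hypothetical footprint and photo records for one study site.
footprint = [(0, 0), (4, 0), (4, 4), (0, 4)]
photos = [
    {"id": "a", "x": 1, "y": 1},
    {"id": "b", "x": 5, "y": 1},
    {"id": "c", "x": 2, "y": 3},
]
selected = query_photos(photos, footprint)
```

Under these assumptions, a tighter footprint returns fewer but more relevant photos, which is in line with the stated aim of high-precision, geographically focused footprints.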

References

Amitay E, Har’El N, Sivan R, Soffer A (2004) Web-a-where: geotagging web content. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, ACM, New York, NY, USA, SIGIR ’04, pp 273–280. doi:10.1145/1008992.1009040
Monteiro BR, Davis CA Jr, Fonseca F (2016) A survey on the geographic scope of textual documents. Comput Geosci. doi:10.1016/j.cageo.2016.07.017. http://www.sciencedirect.com/science/article/pii/S0098300416301972
Purves RS, Clough P, Jones CB, Arampatzis A, Bucher B, Finch D, Fu G, Joho H, Syed AK, Vaid S, Yang B (2007) The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the Internet. Int J Geogr Inf Sci 21(7):717–745. doi:10.1080/13658810601169840
Quercini G, Samet H, Sankaranarayanan J, Lieberman MD (2010) Determining the spatial reader scopes of news sources using local lexicons. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, ACM, New York, NY, USA, GIS ’10, pp 43–52. doi:10.1145/1869790.1869800
Smith DA, Crane G (2001) Disambiguating geographic names in a historical digital library. In: Constantopoulos P, Sølvberg IT (eds) Research and advanced technology for digital libraries, no. 2163 in Lecture notes in computer science. Springer, Berlin, Heidelberg, pp 127–136. doi:10.1007/3-540-44796-2_12

Author information

Authors and Affiliations

Geography Department, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
Elise Acheson, Flurina M. Wartmann & Ross S. Purves

Corresponding author

Correspondence to Elise Acheson.

Editor information

Editors and Affiliations

Department of Geodesy and Geoinformation, Vienna University of Technology (TU-Wien), Wien, Austria
Paolo Fogliaroni
Department of Geography, Birkbeck, University of London, London, Middlesex, United Kingdom
Andrea Ballatore
Department of Industrial and Information Engineering and Economics, University of L'Aquila, L'Aquila, Italy
Eliseo Clementini

About this paper

Cite this paper

Acheson, E., Wartmann, F.M., Purves, R.S. (2018). Generating Spatial Footprints from Hiking Blogs. In: Fogliaroni, P., Ballatore, A., Clementini, E. (eds) Proceedings of Workshops and Posters at the 13th International Conference on Spatial Information Theory (COSIT 2017). COSIT 2017. Lecture Notes in Geoinformation and Cartography. Springer, Cham. https://doi.org/10.1007/978-3-319-63946-8_2

DOI: https://doi.org/10.1007/978-3-319-63946-8_2
Published: 16 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63945-1
Online ISBN: 978-3-319-63946-8
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)
