1 Introduction

Recent advancements in Volunteered Geographic Information (VGI) have generated a plethora of unstructured geospatial data [25]. Among the most common forms of such data are natural language place descriptions depicting human perception of space. We are motivated to develop automated means for interpreting such descriptions so that the spatial knowledge they contain can be retrieved. This objective is useful across diverse areas of application, including geographic information systems (GIS), spatial user interfaces and visualization, services for smart cities, and road network enrichment [8].

Interpreting place descriptions involves relating all noun phrases in a sentence that represent geographic entities to corresponding entities in a geographic database [30]. This task is also referred to as geo-referencing. The categories of nouns present in a sentence vary from proper names (a named entity like London) to type identifier nouns such as market or river. Other examples include abstract nouns that provide meta-level concepts like location, spot, place, or position [27], and non-spatial place information such as place semantics, equipment, ethnicity, activities, or affordance. Recognizing place names and geo-referencing them using toponym resolution has received much attention, and considerable progress has been made [4, 7, 13, 14, 16, 17]. We aim to advance existing techniques to handle places that are only described by their type or that are paraphrased, which poses new challenges to geo-referencing. For example, consider the place description “a nice spot for getting ice-cream is just next to Old Town Hall, Bamberg”. Existing techniques can identify the named entities Old Town Hall and Bamberg, but the gist of the sentence, namely that a particular ice cream parlour is indicated, is not revealed. In order to geo-reference the “spot for getting ice-cream”, one needs to identify a geographic entity that matches the description. With OpenStreetMap (OSM), queries for specific types of objects may be posed (e.g., a restaurant, a café, a bar), but for a query to reveal the desired target, the correct entity type, indicated by the associated OSM tag, must be used. Since natural language is open-ended and human place concepts are sometimes vague, a reliable one-to-one mapping from phrases to entity types in OSM cannot be achieved; only a ranking of likely entity types can be determined. Consequently, a method for geo-referencing paraphrased place descriptions needs to employ some form of search. In previous work [31], we have described such a search method for simultaneous geo-referencing of named entities and unnamed entities, i.e., entities referred to by their type. For reasons of efficiency, any practical application can only afford to try a few candidate entity types (in particular when facing multiple paraphrased places in a single sentence), so determining a good ranking is key. With the techniques discussed in this paper, we aim to allow paraphrased places to be geo-referenced by computing the most likely OSM tags for them.

We note that geo-referencing unnamed and paraphrased phrases may also be beneficial to geo-referencing named entities. Given a natural language description like “Bao is my ideal lunch spot”, humans can easily infer that Bao refers to a place where one can eat. Indeed, Bao is the name of (at least) one restaurant in the United Kingdom, yet it is also the name of a town in the Philippines. Using existing named entity recognition systems, Bao is typically resolved as the name of the town. These systems are unable to recognize Bao as a restaurant, since named entity recognition does not consider the ambiguity of locations and cannot handle locations with the same label in a text [23].

Several researchers have focused on identifying specific contextual factors that are postulated to have an impact on natural language interpretation [26]. One can argue that abstract place concepts and details such as the typical food found at a particular place allow us to predict contextual information that eases the process of place interpretation. There are numerous categories of contextual factors from which indications for place types can be derived. An explicit form is given by the language phrase, and likewise the ontological statement, “is a”, as in “Leeds is a city”, which can be exploited to disambiguate interpretations and thus improve geo-referencing [11, 29].

With this paper, we aim to draw attention to the challenging task of geo-referencing paraphrased places and to show how context can be exploited for this task. In particular, we are interested in the question of how context information provided in a sentence can help to determine the correct entity type, i.e., the associated OSM tag, of a described place. The basic idea is to generate semantically replaceable terms (SRTs), of which there can be more than one, in the form of type nouns for any paraphrased place. For example, coffee can be linked to café, while coffee and café together can be linked to restaurant, and so on.

For identifying the SRTs, we first extract the nouns of a sentence and then determine the semantic similarity of the nouns to a list of OSM tags. To this end, we consider natural language models that provide a semantic similarity measure. Using semantic similarity, we propose a clustering method that allows us to determine semantically replaceable terms for nouns in a place description. In an experimental study, the ranking induced by this procedure is compared to rankings obtained from the raw semantic similarity of the natural language models, and it is shown to provide a further improvement when the underlying models perform well enough. Our proposed processing pipeline is depicted in Fig. 1.

Fig. 1 Processing pipeline for determining entity types from a sentence; not all nouns of a sentence are considered, as outlined in Algorithm 1; clustering may be applied optionally with any semantic similarity measure

Our study is conducted on a hand-annotated data set consisting of sentences crawled from travel blogs and English Wikipedia, using a compilation of documented OSM tags, i.e., key-value attributes, and a number of different semantic similarity measures. We argue that even many intuitively phrased sentences are ambiguous in terms of entity types, and a context-sensitive geo-referencing strategy is thus needed. With our investigations, we aim to answer the following questions:

  • To what extent can we exploit nouns present in a sentence to provide contextual information for geo-referencing a paraphrased place?

  • To what extent does semantic similarity help in identifying semantically replaceable terms that link words to their corresponding OSM tag?

The remainder of this paper is structured as follows: Sect. 2 discusses work related to our study and summarizes previous approaches. Section 3 presents our methods, while Sect. 4 describes an evaluation and discusses the results obtained. Section 5 concludes the paper with a summary and an outlook.

2 Literature Review

Current geo-referencing systems mostly focus on place name recognition and disambiguation. Recently, more research has been conducted on also extracting unnamed entities that do not appear in gazetteers [1, 5]. Such unnamed entities are part of many spatial expressions, and their meaning often depends on outside knowledge or the surrounding context. Especially in everyday language, people use their own names or boundaries to refer to and delimit places (e.g., downtown, city center) [10]. Even places that have a precise location might thus not be easily detectable. Vagueness, however, is not limited to regions but also applies to concepts, where a probabilistic value indicates the degree of membership to a certain category [20]. Exploiting knowledge about typical co-occurrences of toponyms improves disambiguation between possible spatial interpretations, similar to how knowledge of geographical context allows the name of a city or restaurant to be disambiguated. Hu et al. [11] have shown how structured concept-level information from DBPedia can be exploited in disambiguation. The same challenge is present in VGI systems, where volunteers face the task of assigning places to correct categories. In OpenStreetMap (OSM), this is done by including key-value attributes, called tags, which specify an object’s category or add a feature (e.g., amenity=university, name=’University of Bamberg’). Entries in OSM can then be efficiently queried using those tags. Others [7] have also used those tags, among other crowd-sourced data, to enrich entries in geospatial gazetteers. Employing a fuzzy match algorithm, they were able to account for approximate spelling and approximate geocoding in order to filter out duplicates.
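To make the querying mechanism concrete, the following is a minimal sketch of how entries could be retrieved by tag through the public Overpass API; the endpoint, bounding box, and use of the requests library are illustrative assumptions, not part of the systems discussed above.

```python
# A hedged sketch of querying OSM entries by tag via the public
# Overpass API; the bounding box roughly covers Bamberg and is
# an illustrative assumption.
import requests

query = """
[out:json];
node["amenity"="university"]["name"="University of Bamberg"]
  (49.8, 10.8, 50.0, 11.0);  // south, west, north, east
out body;
"""
response = requests.post("https://overpass-api.de/api/interpreter",
                         data={"data": query})
for element in response.json().get("elements", []):
    print(element["tags"].get("name"), element["lat"], element["lon"])
```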

Our goal is to automatically infer the correct OSM tag, i.e., key-value attribute, of a paraphrased entity from the surrounding sentence. We are essentially bridging the gap between existing research that focuses on geoparsing by looking solely at the text and work that tries to make OSM and its tags more approachable on a semantic level.

Already in early geoparsing research, Woodruff and Plaunt [28] noticed the potential of context in understanding natural texts. Aside from place names, they also extracted, for example, potential candidates that matched feature types taken from the Geographic Names Information System (GNIS) provided by the U.S. Geological Survey. These feature types can be compared to those of the OSM tags, featuring entries such as bar, beach, or school. Similarly, they are linked to place names in GNIS and can thus provide valuable information when geo-referencing text. To establish a connection between words in the text and those tags, they additionally employed syntactic transformations such as depluralization (e.g., from beaches to beach) to deal with situations where no direct match is possible. Still, the method used is only able to establish connections between words that directly appear in the tag database. We argue that words provide semantic information beyond their syntactic representation and hence can help us identify a correct tag even without a direct or syntactically similar match.

Making a semantic match between different words has been studied extensively in the field of semantic similarity. Generally, semantic similarity gives a quantitative value of how closely two concepts are related [15]. Semantic similarity is of high importance in many applications in geographic information science (GIScience), and multiple different measures to calculate it have been introduced [24]. For now, we are only concerned with the question of whether contextual words can contribute to the identification of the correct OSM tag. We observe, for example, that on this task the similarity measures initially proposed for WordNet [21] do not provide valuable information (see Sect. 3.2.1). By contrast, WordNet has previously shown promising results when only matching the unnamed entity itself to corresponding OSM tags [30].

Making the OSM tags more semantically approachable has also been addressed before. All available tags can, for example, be browsed using the Taginfo system. More approachable might be the TagFinder service, which expands upon the Taginfo data by integrating automatic translation (German–English), a thesaurus, and an adapted domain-specific semantic network to offer a more flexible, semantic search. While this system is great at providing lists of potential tags, it only has a limited vocabulary and, for example, does not provide any results for everyday words such as cake or linguine, which are used to paraphrase restaurants and both appear in our dataset. Other projects directly try to make OSM data more accessible by rearranging tags in a new semantic ontology, as done in OSMonto, or by creating a whole semantic network for OSM, as done in LinkedGeoData or the OSM Semantic Network [2].

All those approaches differ from ours in that we are not interested in changing the overall OSM structure but rather argue that we can infer semantically relevant information purely based on a list of all available tags.

3 Problem Description and Methodology

Given a set of natural language words that describe some geographic entity, the main objective of our work is to generate a ranked list of OSM tags that are likely used for representing the entity in OSM. Skimming over texts from Wikipedia and online travel blogs that describe geographic entities, we easily encounter situations in which it is challenging to identify the information required for mapping a target object to a precise type. We argue that a successful approach has to tackle the following major challenges:

  • Conceptualization of entities in text varies across sentences as writers employ a common-sense understanding rather than a strict geographic taxonomy.

  • Classification of a single noun is subject to variations in OSM concepts since the semantic model underlying OSM is open-ended. Moreover, variations across volunteers preparing the data are inevitable.

  • Language is rich and offers many words to communicate nuances of a single entity type. Conceptual boundaries of natural language words are not always aligned with those of the OSM ontology.

Our approach is based on an explicit representation of non-spatial information extracted from the text. The method is divided into two primary modules: the first extracts nouns from the input, and the second finds the semantic similarity between the extracted nouns and entity type nouns present in a lexicon of possible OSM tags.

3.1 Noun Extraction

The objective of the first module is to translate a single input sentence into a set of nouns that represents spatial and non-spatial information. In contrast to classic parsing, we do not aim to capture the full structure of the sentence; we only retrieve noun phrases. Noun phrases can either be spatial entities, i.e., named or unnamed entities, or other nouns that provide further contextual information. For this extraction, the part-of-speech tagger from the Natural Language Toolkit (NLTK) [3] is used. If two nouns follow one another (e.g., “art” followed by “building”), they are treated as one single noun. Every token tagged as a noun phrase is then further analysed, following a pragmatic approach to categorization, which is presented in Algorithm 1. If the noun can be found in the geographic gazetteer GeoNames, it is categorized as a named entity. If the noun can be linked to a spatial category in OSM, it is categorized as an unnamed entity of the respective category. If neither provides a match, it is added to a list of context words. Exploiting those context words is the subject of this work.

Algorithm 1
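Algorithm 1 is shown only as a figure in the original; the following Python sketch captures the categorization logic described above. NLTK is used for tagging as stated in the text, while lookup_geonames and lookup_osm_category are hypothetical stand-ins for the GeoNames gazetteer and OSM category lookups.

```python
# A minimal sketch of the noun extraction and categorization step
# (Algorithm 1); may require nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

def extract_and_categorize(sentence, lookup_geonames, lookup_osm_category):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    # collect nouns, merging consecutive nouns into one compound noun
    nouns, current = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
        elif current:
            nouns.append(" ".join(current))
            current = []
    if current:
        nouns.append(" ".join(current))
    named, unnamed, context = [], [], []
    for noun in nouns:
        if lookup_geonames(noun):        # found in the GeoNames gazetteer
            named.append(noun)
        elif lookup_osm_category(noun):  # matches a spatial OSM category
            unnamed.append(noun)
        else:                            # neither: a non-spatial context word
            context.append(noun)
    return named, unnamed, context
```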

Consider for example the sentence “This place has great coffee”. Using the above algorithm, we cannot identify a named or unnamed entity, but only the contextual words place and coffee. As humans, we are likely able to infer that this place might be a café, based on the contextual information that it offers coffee. We use the extracted context words as input for our second component, which automatically derives the most likely OSM tags for the given input and can thus lead to identifying the place mentioned.

3.2 Inferring Spatial Knowledge from Non-Spatial Context Words

Given a number of previously identified context words, this module is concerned with finding the most closely related OSM concepts in the form of key-value tags. Previous work often relies on WordNet [19] to establish semantic relationships [2, 30]. We therefore compare our approach against a WordNet-based baseline in order to identify its potential and possible weaknesses (Sect. 3.2.1). As a proof of concept, we implemented two approaches based on pre-trained word embeddings: noun-to-tag similarity (Sect. 3.2.2) and tag-to-tag similarity (Sect. 3.2.3).

3.2.1 Using WordNet as a Baseline

Following the research of [30], we used WordNet as a starting point to calculate the similarity between context nouns and OSM tags. WordNet [19] is a lexical database of English. Alongside the definition of a word, WordNet presents the distinct meanings associated with that word. These meanings are grouped into sets called synsets, where each synset represents a distinct concept. Words can accordingly be assigned to one or more synsets.

WordNet further builds an ontological structure for nouns and verbs using a hierarchy of is-a relations [21]. Using this structure, similarity can easily be computed as a distance measure in the conceptual hierarchy. Additionally, each word sense (synset) comes with a short definition, or gloss, from which further similarity measures can be derived. Based on these sources of information, six similarity measures for WordNet were initially proposed [21]. Three of them are based on the path lengths between different concepts (lch, wup, and path), and the other three are based on information content (res, lin, and jcn). Further, there exist three measures of relatedness that aim at expressing related concepts (hso, lesk, and vector).

For an initial test, we extracted a small set of OSM tags and compared all similarity and relatedness measures summarized in [21] for a number of non-spatial context words. We quickly noticed that the similarity measures returned inconsistent results for our use case, or did not render any meaningful results at all. To demonstrate this, we use 10 arbitrary OSM tags from different domains and compare the similarity and relatedness for two different context words, namely lunch and bread. As possible OSM tags in this study we consider hotel, bakery, cafe, garden, bridge, library, picnic-table, fast-food, church, and university. In case of multiple word senses, we automatically select the one with the highest similarity value. The results are displayed in Table 1 and Table 2, respectively. For all measures, higher values represent a higher similarity. Values for lch, res, jcn, and lesk lie in the interval \([0,\infty)\), wup, path, and lin in the interval \([0,1]\), and hso in \([0,16]\).
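As an illustration, the following sketch shows how such sense-maximized WordNet similarities can be computed with NLTK's WordNet corpus reader; the wup measure stands in for the other path-based measures, which work analogously.

```python
# A minimal sketch of computing WordNet similarity with NLTK, taking
# the maximum over all sense pairs as described above; shown for the
# wup (Wu-Palmer) measure. May require nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def max_wup_similarity(word, tag):
    """Highest Wu-Palmer similarity over all noun synset pairs, or None."""
    scores = [s1.wup_similarity(s2)
              for s1 in wn.synsets(word, pos=wn.NOUN)
              for s2 in wn.synsets(tag, pos=wn.NOUN)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=None)

print(max_wup_similarity("bread", "bakery"))
print(max_wup_similarity("lunch", "fast_food"))  # WordNet uses underscores
```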

Table 1 Comparison of WordNet similarities for lunch
Table 2 Comparison of WordNet similarities for bread

It can be noted that some OSM tags, like picnic-table, do not yield an entry in WordNet, and thus no similarity can be computed. For other, vastly different concepts (e.g., bakery and library, or church and cafe), all or most values are identical, making the similarity measure almost meaningless. Only fast-food renders surprisingly high results, making it the potentially best match for both terms. We argue that an adequate measure should at least render higher values for the combination of bread and bakery. Lastly, the relatedness measure lesk always returns its highest values for bridge, which is objectively not the most related term to either bread or lunch. Similar observations were made with other context words and also when including further OSM tags.

3.2.2 Noun-to-Tag Similarity

As an alternative to WordNet, we first designed a simple algorithm based on word embeddings. Word embeddings, like Word2Vec [18] or GloVe [22], map words to real-valued vectors in such a way that words with similar meanings, e.g., synonyms, are mapped to nearby vectors. Large text corpora are used to train the mapping function.

Using those models, we retrieve the vector representations of all input tokens (nouns). Each is then compared to the vectors of all available OSM tags, and the respective vector similarities are computed. Before further processing, all results for a single word are normalized such that the most similar OSM tag receives the value 1.0 (maximum similarity) and the least similar OSM tag the value 0, i.e., the values are scaled to the interval \([0,1]\). This step is motivated by the assumption that each place must correspond to some entity in the OSM database and that each context word should contribute equally to finding the correct OSM tag.

To illustrate this, consider again the sentence “This place has great coffee”. We first extract the nouns of the sentence, giving us the context words place and coffee. For each, the similarity to all OSM tags is calculated. For example, using a pre-trained Word2Vec model, coffee is most similar to the OSM tags cafe (0.46), bakery (0.41), and icecream (0.39), and least similar to road (-0.01), raceway (-0.02), and bay (-0.03). place is more ambiguous and thus yields more arbitrary results, with the highest values for streets (0.25), building (0.21), and city (0.20), and the lowest values for geyser (-0.01), locomotive (-0.01), and peak (-0.02). These values are then normalized, such that the highest value represents 100% (i.e., cafe and streets both become 1.00) and the smallest value represents 0% (i.e., bay and peak both become 0). The two normalized similarity scores for each OSM tag are then added, and the result is the final score of the tag. Using this method, the best five tags are cafe (1.59), bakehouse (1.40), restaurant (1.36), biergarten (1.33), and bakery (1.27). When querying for the described entity, the OSM tag with the highest similarity would be used first. If no match is obtained, the next best alternative is tried, and so forth. In our case, cafe is the best match based on Word2Vec similarity, and also the correct OSM tag for the spatial entity in our dataset. Assuming the search area to be identified, we can thus find the place described with one single query to OSM.

The algorithm for computing the noun-to-tag similarity for any sentence, given as a list of context words, together with the list of all possible OSM tags, is depicted in Algorithm 2.

Algorithm 2
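Since Algorithm 2 appears only as a figure in the original, here is a minimal sketch of the noun-to-tag ranking, assuming a gensim KeyedVectors model (e.g., the Google News Word2Vec vectors) provides the similarity function; tag names and the model path are illustrative.

```python
# A minimal sketch of the noun-to-tag similarity ranking (Algorithm 2);
# the model file path below is an illustrative assumption.
from gensim.models import KeyedVectors

def noun_to_tag_scores(context_words, osm_tags, model):
    """Rank OSM tags by summed, per-word min-max normalized similarity."""
    scores = {tag: 0.0 for tag in osm_tags}
    for word in context_words:
        if word not in model:  # skip out-of-vocabulary context words
            continue
        sims = {tag: model.similarity(word, tag)
                for tag in osm_tags if tag in model}
        lo, hi = min(sims.values()), max(sims.values())
        for tag, s in sims.items():
            # scale this word's similarities to [0, 1] so every context
            # word contributes equally to the final score
            scores[tag] += (s - lo) / (hi - lo) if hi > lo else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# model = KeyedVectors.load_word2vec_format(
#     "GoogleNews-vectors-negative300.bin", binary=True)
# ranking = noun_to_tag_scores(["place", "coffee"],
#                              ["cafe", "bakery", "restaurant", "road"], model)
```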

3.2.3 Tag-to-Tag Similarity

Simply adding up the similarity scores over all context words, as done in the noun-to-tag algorithm described above, can yield varying sets of candidates. Often, the general category of the correct OSM tag is represented in the first few results; sometimes, however, the exact tag we are looking for is not among them but somewhere further down the list. To overcome this, we propose a second algorithm that additionally examines the semantic similarity between different OSM tags, and we hence call it the tag-to-tag similarity algorithm. It operates on a pruned list of OSM tags retrieved from the noun-to-tag algorithm. The overall idea is that similar OSM tags should also share a high similarity score. By taking the n most similar other OSM tags for each OSM tag in the pruned list, we can identify tags that are related to many of the tags already found to be most relevant to the sentence. Given a matrix of n related OSM tags for each of the m pruned OSM tags, we simply count the number of times each OSM tag is represented. OSM tags that share the same count form a cluster. While this often results in clusters of one contextual category, a cluster may also contain completely unrelated OSM tags. After the clusters are formed, the list of all clusters, sorted by count, is returned. The full algorithm is depicted in Algorithm 3.

Algorithm 3
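Analogously, the following sketch outlines the tag-to-tag clustering of Algorithm 3, assuming the same kind of embedding model; the choice of n, the number of most similar tags considered per tag, is a free parameter of the method.

```python
# A minimal sketch of the tag-to-tag clustering (Algorithm 3);
# pruned_tags is the output of the noun-to-tag step.
from collections import Counter

def tag_to_tag_clusters(pruned_tags, model, n=5):
    """Cluster pruned OSM tags by how often each appears among the
    n most similar tags of the others; clusters sorted by count."""
    tags = [t for t in pruned_tags if t in model]  # assume in-vocabulary tags
    counts = Counter({t: 0 for t in tags})
    for tag in tags:
        neighbours = sorted((t for t in tags if t != tag),
                            key=lambda t: model.similarity(tag, t),
                            reverse=True)
        counts.update(neighbours[:n])  # count the top-n related tags
    # tags sharing the same count form one cluster
    clusters = {}
    for tag in tags:
        clusters.setdefault(counts[tag], []).append(tag)
    return [clusters[c] for c in sorted(clusters, reverse=True)]
```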

In order to get a better understanding of the algorithm, let us consider an example. Given the sentence “I came across an Italian place whose bread is to die for in Berlin.”, the context nouns extracted using Algorithm 1 are place and bread. Using noun-to-tag similarity, we get the following best five OSM tags: food-court, fast-food, bakehouse, bakery, and picnic-table. These tags all relate to food items but are not appropriate, as for this sentence the correct OSM tag would be cafe, which, under noun-to-tag similarity, appears at position 25 in the sorted list of all OSM tags.

Using tag-to-tag similarity, the candidates are first reduced to a set of 12 OSM tags: bar, cafe, farm, bird-hide, drinking-water, house, houseboat, restaurant, food-court, fast-food, garden, and picnic-table. We then determine how closely these 12 tags are related to one another by means of Algorithm 3. The output consists of clusters of OSM tags that are related to one another. A high occurrence number in the similarity lists indicates that a tag is related to most other OSM tags and therefore most likely captures the correct context of the input sentence; in other words, it is more likely to be the correct OSM tag for the sentence at hand. Out of these 12 tags, picnic-table is closely related to 6 others and therefore to the most; garden, food-court, and fast-food are each related to 5 other tags; house, restaurant, houseboat, drinking-water, and bird-hide are related to 4; and finally cafe, farm, and bar are related to 3 other OSM tags.

Putting these clusters in a list, we obtain [[picnic-table], [garden, food-court, fast-food], [house, houseboat, drinking-water, restaurant], [cafe, farm, bar]]. In this list of clusters, the correct output cafe can be found within the fourth cluster. Since the clusters are sorted, a query would have to go through the three prior clusters (8 OSM tags) before finding the correct tag at the first position in cluster four, i.e., when executing its ninth query to OSM.

4 Experiment and Discussion

We have implemented our approach as a research prototype to evaluate the contribution of using semantic similarity to exploit context in the interpretation of place descriptions. The focus of our prototype is to improve querying and resolve ambiguity in text-based geo-information retrieval by finding the correct OSM tag.

4.1 Experimental Setup and Data

To the best of our knowledge, there exists no dataset for evaluating the geo-referencing of spatial text that addresses complex place descriptions involving unnamed entities. We collected a corpus of sentences that contain place descriptions from travel blogs and English Wikipedia (15 descriptions), the latter by scanning the summary part of articles. In total, our evaluation dataset comprises 103 sentences and multi-sentence text fragments. For now, we focus mostly on text descriptions about places that are either related to food or tourism.

Further, we are specifically interested in sentences containing contextual non-spatial information. Taking those context words, we evaluate how well we can infer the correct OSM tags. The aim of our evaluation is to assess the applicability of semantic similarity with simple ranking and clustering methods to this problem. In other words, the goal is not primarily to measure computational performance but rather to understand how well the problem at large can be tackled using different methods based on semantic similarity.

Our evaluation is based on running all algorithms on all sentences of the dataset. We first execute the algorithms on a given input sentence. As a result, we retrieve a sorted list of possible OSM tags. We then identify the position at which the correct OSM tag is listed. The lower this position number, the better the algorithm, as a lower position number indicates that the correct OSM tag will be considered earlier when searching for the place. In case multiple tags receive the same score and the order is thus ambiguous, we follow a worst-case assumption and treat the correct tag as the last within this set.
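This worst-case treatment of ties can be made precise with a small helper; the function below is an illustrative sketch, not part of the original prototype.

```python
# A minimal sketch of the worst-case rank used in the evaluation:
# under ties, the correct tag is placed after every tag whose score
# is greater than or equal to its own.
def worst_case_position(scores, correct_tag):
    """scores: dict mapping tag -> score; returns the 1-based rank."""
    s = scores[correct_tag]
    return sum(1 for v in scores.values() if v >= s)

# e.g. worst_case_position({"cafe": 2.0, "bar": 2.0, "farm": 3.0}, "cafe") -> 3
```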

To obtain ground truth, we manually annotated the input sentences with their correct OSM tags. This was done by browsing OSM using the original named entity, taken from the source of the sentence. We then extracted the nouns according to the method described earlier. To give an impression of the dataset, some annotated sentences are displayed in Table 3. For each sentence, the corresponding context words are highlighted and the correct OSM tag is given.

Table 3 Annotated sentences with correct OSM tag

The highlighted nouns indicate that there are different combinations of context words. The extracted nouns may belong to the same category, like bread and pasta, and thus represent one specific context. Or, in the case of more abstract context words, like spot or location, only a generic context not related to a single category can be inferred. We are interested in seeing whether different inputs affect the prediction of the correct OSM tag.

4.2 Comparison with Semantic Similarity Approaches

The aim of this experimental setting is to compare the performance of clustering against that of raw semantic similarity approaches. For this purpose, we have considered three standard semantic similarity measures: (i) GloVe [22] word vectors trained on the Common Crawl dataset, as contained in the spaCy medium model for English (en_core_web_md) in version 3.3, (ii) Word2Vec [18] trained on the Google News dataset (about 100 billion words), and (iii) semantic similarity scores based on embeddings from a pre-trained transformer model. Precisely, the model used is ‘microsoft/deberta-large-mnli’, a DeBERTa model [9] fine-tuned for the Multi-Genre Natural Language Inference (MNLI) task, available as part of the BERTScore [32] Python library. Among over 130 pre-trained language models included in BERTScore, the one selected was recommended by the authors as correlating best with human scores. With DeBERTa being a derivative of the popular BERT model [6], and given the library’s name, we refer to this semantic similarity approach simply as BERT. BERT differs from the other word embedding approaches in that its word vectors are context sensitive: next to the usual token embedding, segmentation and position embeddings are added that depend on the context a given word appears in [6]. Note, however, that given our task, we only provide a list of extracted context words and hence may not benefit from this contextual information. Hence, we also conducted an analysis on the complete, original input sentences. As this did not lead to significant differences, we do not include this approach, for the sake of comparability with the fixed word vector embedding approaches (Word2Vec, GloVe).
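For concreteness, the following sketch shows one plausible way of querying the three similarity backends for a word pair; the model names match those cited above, but the exact invocation used in the experiments is an assumption for illustration.

```python
# A hedged sketch of querying the three similarity backends for one
# word pair; the Word2Vec file path is an illustrative assumption.
import spacy                              # GloVe vectors (en_core_web_md)
from gensim.models import KeyedVectors   # Word2Vec (Google News)
from bert_score import score             # DeBERTa-based similarity ("BERT")

nlp = spacy.load("en_core_web_md")
glove_sim = nlp("coffee").similarity(nlp("cafe"))

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
w2v_sim = w2v.similarity("coffee", "cafe")

# BERTScore returns precision/recall/F1 over token embeddings;
# the F1 value serves as the similarity score here
P, R, F1 = score(["coffee"], ["cafe"],
                 model_type="microsoft/deberta-large-mnli")
bert_sim = F1.item()
```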

For all pre-trained models (i.e., GloVe, Word2Vec, and BERT), we calculated both the noun-to-tag similarity without pruning and the tag-to-tag similarity (indicated as clustered in the tables). For each method, we denote the position at which the correct OSM tag appears (Table 6). For a more compact presentation of the results, we also computed the precision for finding the correct OSM tag up to a given position k, denoted as Hit@k (Table 4). In a practical application that aims to identify the mentioned place by evaluating queries until a matching place is found, one would require k queries to OSM in order to find the correct spatial entity for a given set of context words. It is hence of practical interest to keep k as small as possible, or in other words, to keep the Hit@k value high for small k.

Finally, we computed the normalized discounted cumulative gain (nDCG) [12] for the top-k results (Table 5, with \(k = 20\)). The nDCG measures ranking quality by weighting the position of the correct OSM tag on a logarithmic scale. It is defined as \(nDCG = \frac{DCG}{IDCG}\), where \(DCG = \sum _{i=1}^k \frac{2^{rel_i} - 1}{\log _2(i+1)}\) with \(rel_i\) being the relevance of the i-th result, which is 1 if it is the correct OSM tag and 0 otherwise. The DCG hence simplifies to \(DCG = 1 / \log _2(r+1)\), where r indicates the position at which the correct OSM tag is found. The ideal discounted cumulative gain (IDCG) is the maximum possible DCG value, which corresponds to finding the correct OSM tag at position \(r=1\) and simply defaults to 1. Therefore, the nDCG for a given dataset is the average DCG value over all input sentences. The higher this value, the better the method in general, as on average the correct OSM tag will be found earlier in the search.

It is worth noting that nDCG@20, being a top-k measure, only considers DCG values for positions \(r \le 20\) and returns 0 for all other r. Instead, we could include all positions, but would then need to find an adequate way of representing the pruned OSM tags in the clustering approach. Conservatively, we could use the DCG value of position n, where n is the number of all available OSM tags; essentially, we would then score the method as if the correct tag were found at the very last possible position. Note that even such a conservative calculation does not change the ranking of the methods.
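The computation of nDCG@20 as used here can be summarized in a few lines; the following is a minimal sketch under the binary-relevance simplification derived above.

```python
# A minimal sketch of nDCG@20: with a single relevant item at rank r,
# DCG reduces to 1/log2(r+1), IDCG is 1, and the dataset score is the
# mean per-sentence DCG.
import math

def ndcg_at_k(positions, k=20):
    """positions: 1-based rank of the correct tag per sentence
    (None if the tag was pruned or not found)."""
    gains = [1.0 / math.log2(r + 1) if r is not None and r <= k else 0.0
             for r in positions]
    return sum(gains) / len(gains)

# e.g. ndcg_at_k([1, 3, None, 25]) -> (1 + 0.5 + 0 + 0) / 4 = 0.375
```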

Table 4 Precision of all algorithms given as percentage for the correct OSM tag to appear on a position up to k
Table 5 Normalized Discounted Cumulative Gain [12] nDCG@20 of all algorithms. Higher is better
Table 6 Full results of all algorithms

4.3 Discussion of Results

Determining OSM tags from unconstrained natural language is a challenging task. In a first analysis, we consider the sentences from Table 3, leading to the results shown in Table 7, where each line represents a sentence in the form of its extracted context words, the correct OSM tag, and the position in the sorted result list produced by the tag-to-tag similarity method.

Table 7 Results for sentences in Table 3 using GloVe similarity measure with clustering

For the tag-to-tag similarity method, we first notice that no value for house and beer is found. This is due to the pruning phase in combination with one highly ambiguous word (house) and one more specific word (beer). In such a scenario, the two words can return vastly different similarity values for the same OSM tags, and in consequence the correct tag may be pruned when the individual rankings are combined. The same behaviour has been observed using BERT similarity.

In cases where the words belong to the same context category, like tea and cupcakes, or bread and pasta, the correct OSM tag can often be found at a low position. As the tags extracted for both nouns relate to the same category, clustering will further favour values within this category and contribute to a better ranking of the correct answer.

Likewise, as mentioned above, clustering can have adverse effects in cases where context nouns yield a low similarity to the correct OSM tag. For example, in the sentence “Birdland has all kinds of avian species, including flamingos, pelicans, penguins, and owls”, the correct OSM tag zoo was pruned due to a very low similarity score between flamingos and zoo. We may attribute this problem to a poor performance of the semantic similarity measure in capturing possible relations between concepts. By contrast, the presence of an unspecific context word like place had no adverse effect in this setting.

In a second analysis, we compare the performance of the clustering algorithm with the pure usage of semantic similarity, with the aim to identify whether clustering improves the performance of these measures. The results are displayed in Tables 4 and 6. The data in Table 6 provides a complete overview of the positions at which the correct OSM tag appears using GloVe, Word2Vec, and BERT with and without clustering. When the clustering algorithm is used, correct tags may get pruned off (indicated by column ‘N/A’); otherwise, the position refers to the index at which the correct OSM tag appears in the sorted list of pruned tags (see Algorithm 3). BERT, Word2Vec, and GloVe are able to generate plausible options for most input sentences, despite low overall scores in Table 4 and Table 5. For example, if we obtain a ranking that starts with cafe, fast-food, etc., but the correct tag restaurant only appears at a much later position, then only that position determines the result, and successfully finding OSM tags of similar semantic meaning is not credited (Fig. 2).

Nevertheless, there is still room for improvement, as we often observe tags unrelated to all context words at early positions in the result list. For BERT, we notice that as the number of both abstract and context-specific nouns in a sentence increases, one group of words takes priority and the other group effectively gets ignored. Consider the following example, where the correct answer is the OSM tag restaurant: “Royal China is the place to be when you want dim sum, duck, and Cantonese dishes that are so good your ‘quick lunch’ turns into a three-hour feast”, with the extracted nouns place, dim-sum, duck, dishes, and lunch. Using the noun-to-tag algorithm with BERT, restaurant is located at position 71, while the first ten positions are more related to duck and include tags such as pond, ditch, and mud. Figure 3 provides a detailed overview of the experimental results, indicating the number of nouns and the position of correct OSM tags.

For Word2Vec, the mixture of abstract and context-specific words does not appear to have any significant impact. Yet, we observe an inability to deal with compound nouns like fast-food, retail-area, or picnic-spot, as these tags often receive positions greater than 20.

In contrast to Word2Vec, GloVe is able to deal with compound nouns more reliably, but is unable to provide an informative similarity score involving abstract context nouns. As these appear frequently in our dataset, the performance of GloVe measured as nDCG@20 is generally low. The poor performance of the similarity measure alone appears to be further amplified by clustering, resulting in an even worse performance.

Overall, from the results displayed in Table 6, we can conclude that the performance of BERT and Word2Vec is superior to that of GloVe. However, all three semantic similarity measures yield a significant number of OSM tags appearing at positions greater than 20. Only for Word2Vec in combination with clustering do we exceed 20% in Hit@1, i.e., only then could one identify the paraphrased place using a single query. The performance observed hinges greatly on the individual models, as all provide semantic similarity measures trained from specific datasets using specific machine learning methods.

Fig. 2 Scatter plot showing the position of correct OSM tags using Word2Vec with clustering; position \(-1\) indicates that the tag is not contained in the answer

Fig. 3 Scatter plot showing the position of correct OSM tags using BERT; position 10 also includes all results \(> 10\)

We also conclude that the proposed clustering method only becomes effective when the underlying semantic similarity measure is sufficiently reliable. As can be seen in Table 6, only the performance of Word2Vec has been improved by the application of clustering. In cases where all context nouns are already from the same domain (e.g., tea, cupcake, lemon tart), the performance of clustering is usually within the same range as that of Word2Vec without clustering. Clustering particularly helps to identify the correct tag in cases where the context nouns present some ambiguity. For example, the correct OSM tag pub for the sentence with the context words place, Michelin star, and food appears at position 27 without clustering, but at position 7 after applying clustering. Yet, there also exist cases where the combination of Word2Vec and clustering is unable to provide any answer at all. These are often cases where the context nouns are a combination of abstract and specific nouns (like place and bread), one of which is judged to be dissimilar to the correct tag, which then gets pruned away. We use only the OSM tags appearing at a position less than or equal to 20 in the noun-to-tag ranking as input to the clustering step. As a result, if the noun-to-tag version does not find the correct OSM tag within the first 20 positions, clustering cannot find it either. In a practical setting where the described place gets searched for, it is key that the correct tag is ranked high and appears very early in the resulting list of tags; it is thus unlikely to matter whether the correct tag does not appear within the first 20 positions or does not appear at all.

In contrast to Word2Vec, the performance of BERT and GloVe decreases after clustering, as both models are subject to noisy similarity judgments. In particular, for GloVe, a larger set of context nouns results in a smaller and more specific set of OSM tags. However, with more abstract nouns, like spot, interior, or drink, the output clusters typically also contain more irrelevant tags, like temple, shrine, and cathedral for BERT, and ice-cream, picnic-spot, hedge, and country for GloVe. Clustering then amplifies problems already present in the similarity judgments.

From our experiments we can conclude that word embeddings like Word2Vec can play an essential role in identifying OSM tags, which can aid the task of geo-referencing paraphrased places. In most situations, we were able to detect plausible tags within the first few candidates. Using pruning and clustering techniques, we were able to improve results compared to the original semantic similarity method. Although the results may not appear impressive, the proposed method allows more than 20% of the paraphrased places to be identified with only one query. This is a clear improvement over existing approaches to geo-information retrieval as they mainly focus on named entities.

The results achieved motivate further investigations to determine the most effective strategy. For example, the consideration of adjectives or verbs might further improve results, but would require new techniques to assess their similarity to OSM tags. Additionally, a method to remove abstract nouns that are not informative for the place category (e.g., place, spot) may help to overcome the current limitations of the models or our clustering algorithm. Considering the poor performance of a contemporary transformer model (DeBERTa), it would also be interesting to consider alternative ways of posing the OSM tag estimation task, to apply such models more effectively and obtain better results.

5 Summary and Conclusions

In this paper, we investigate how contextual information can be exploited to help geo-reference places that are only paraphrased in natural language place descriptions (“a place to eat near river Regnitz”). For identifying such places in the OSM database, one requires OSM type names to query for the object (an entity of type restaurant near river Regnitz, for example). We consider the semantic similarity of words to OSM tags as a method for inferring the most likely OSM tags of the place in the OSM database. Natural language place descriptions can be ambiguous and imperfect, so a perfect interpretation is not possible in general. Approaches can only determine a set of (ranked) candidates, aiming to assign a high likelihood to the correct OSM tag. This paper proposes methods to tackle this task and evaluates them on a hand-annotated dataset comprising sentences from Wikipedia and travel blogs.

In our approach, semantic similarity is determined by means of WordNet, pre-trained word vectors (GloVe, Word2Vec), and pre-trained transformer-based word embeddings (DeBERTa). Of those methods, Word2Vec-based semantic similarity already provides reasonable results, as it is able to identify the correct OSM tag within the first five results in almost 50% of the cases. In about 16% of the input sentences, Word2Vec even ranked the correct OSM tag at the first position, requiring only one single query to the OSM database to correctly identify the paraphrased place. Using a clustering algorithm, we are able to improve upon these results: in more than 22% of the cases, we are able to correctly interpret the paraphrased place with only one query to the OSM database.

While these results look promising, we still find instances where the proposed methods fail. This comes as little surprise, as these approaches only consider the co-occurrence of words and do not possess true text understanding. We are nevertheless motivated to continue our work, as it can still advance current geo-referencing systems.

In future work, we are going to explore the usage of additional contextual information, such as provided by verbs and adjectives. Finally, we want to integrate these methods within an automated system for interpretation of place descriptions in order to determine the overall effect of contextual information.