1 Introduction

Consider the following indoor route description [Footnote 1] (taken from soleway.ugent.be):

Example 1

Enter the hallway. Take the elevator or the stairs to the third floor. Take the double grey door between the stairs and the elevator. Turn right. Take the double brown door. My office is at the left side, a bit over halfway the corridor.

and also this corresponding (made-up) place description:

Example 2

My office is in Building S8, on the third floor, about halfway along a corridor that connects at one end to the central staircase of the building and at the other end to an emergency exit.

Both kinds of descriptions can be mapped to graphs. Graphs contain the essence of these descriptions in a more abstract and database-accessible form. In current practice, the nodes of these graphs represent the locative noun expressions (“hallway”, “double grey door”), and the edges represent the explicit relationships between these (“at the left side”, “between”), or the actions to be taken to reach one from the other (“take”, “turn”, “enter”). Actions can also be considered as implicit relationships of connectedness.

Common language route or place descriptions are provided by people for people with certain questions (“where is”, “how do I get to”). Thus, the extraction and formal representation of human environmental knowledge aims at automated answering of questions of this kind, in terms close to human language. This kind of knowledge is difficult to extract from geometry-based spatial databases [11, 13]. The automatic extraction from language is also challenging but not addressed in this paper (all interpretations in this paper are human-made). This paper focuses on the characteristics of human indoor route descriptions and on the capability of formal graph representations to capture the essence of the encoded human spatial knowledge [Footnote 2]. It uses for its study a corpus of crowd-sourced indoor route descriptions that is investigated here for the first time.

The formalization of human route and place knowledge in graphs has been done so far for outdoor environments [2, 4, 36]. Hence, this paper will make the following contributions to knowledge:

  1. identify deficiencies of existing route and place graph ontologies to represent spatial knowledge from indoor route descriptions;

  2. identify a continuity principle in order to derive implied qualitative spatial relationships between places;

  3. show that route and place graph ontologies can be integrated.

The hypothesis of this paper is that the environmental knowledge encapsulated in verbal indoor route descriptions can be extracted and represented in a graph structure for querying.

The paper starts with a review of route ontologies and graph representations for the knowledge in route and place descriptions (Sect. 2). It then explores the route graph of Brosset et al. [4] and the place graph of Vasardani et al. [36] for their respective capacity to represent indoor route descriptions, which is then also compared with the capacity of the two graphs to represent place descriptions given in route perspective (Sect. 3). From these observations, Sect. 4 seeks to integrate the two graphs. An application of graphs in query-answering is presented in Sect. 5. Conclusions are discussed in Sect. 6.

2 Literature Review

The use of graphs for spatial knowledge representation is not new. For example, all transport networks have a graph structure, but they carry few semantics, since these networks consist of geometric elements of identical semantics, such as the street segments in a road network or the direct flight connections in an airline network. Closer to the interest of this paper are uses of graphs to represent semantically rich knowledge (of places, routes, or networks), even at the cost of geometry, since natural language descriptions are usually qualitative about places and their relationships.

Pioneering in this direction is the work of Kuipers, who made a case for qualitative spatial reasoning in robot route planning. He uses triplets of ‘views’ – sensory input that, by equivalence classes, forms a notion of place – and ‘actions’ – equivalent to the movement verbs in natural language descriptions [22]: (view, action, view). Later he extended this basic graph structure, which he compared with a cognitive map [23], for semantic attributes and a hierarchical structure, in order to cater for applications such as the communication of a robot (vehicle) with its passenger. The ontology developed for this purpose is the hybrid spatial semantic hierarchy [1, 24]. Similarly, Krieg-Brückner and colleagues [21] first proposed a light-weight ontology of route graphs in an indoor environment, which then can be specialized for different user categories or travel modes, for example for people sitting in an intelligent wheelchair, and communicating with the same. Again, the application in mind is robotic route planning.

Yet other route ontologies focus on multi-criteria route planning, for example for personalization purposes, or for adaptation to special circumstances or needs [5, 26, 29, 30]. Here, ontology-based knowledge modeling can enrich a mobility network with a range of criteria, which are then available to the route planning algorithm. Works like these aim to capture a broad range of semantics of network elements in order to enable flexible choice.

In contrast to graph representations built for route planning, the current paper is concerned with the ontology (i.e., the nature) of a route description itself. The interest of this paper is in representing the spatial knowledge from route descriptions, not the route choice. Ontologies of route descriptions are formed by studying text corpora. For [2, 4], the corpus consisted of hiking or orienteering route descriptions in natural environments. Their ontology is based essentially on triplets (from location, action, to location), in our indoor context for example (from the hallway, take the elevator, to the third floor). Where the origin (from location) is missing in a verbal route description, it can be inferred. For example, “enter the hallway” implies an origin (“from where you are”, or “from outside of the building”) for triplet completion.
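The triplet completion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of [2, 4]: the names `RouteTriplet` and `complete_origins` are hypothetical, and the rule "a missing origin is the previous target, or a default start" is a simplification.

```python
from typing import NamedTuple, Optional


class RouteTriplet(NamedTuple):
    origin: str   # the "from location"
    action: str
    target: str   # the "to location"


def complete_origins(steps, start="outside of the building"):
    """Fill each missing origin with the previous target (assumed rule).

    `steps` is a list of (origin or None, action, target) tuples.
    `start` is the assumed origin of the very first step.
    """
    triplets = []
    current = start
    for origin, action, target in steps:
        if origin is None:
            origin = current
        triplets.append(RouteTriplet(origin, action, target))
        current = target
    return triplets


# Human-made interpretation of the first two sentences of Example 1.
steps = [
    (None, "enter", "hallway"),
    (None, "take the elevator", "third floor"),
]
triplets = complete_origins(steps)
for t in triplets:
    print(t)
```

The second triplet produced here is the one quoted in the text, (from the hallway, take the elevator, to the third floor).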

In parallel, a similar graph model was developed to represent the knowledge in verbal place descriptions [36]. This model is based on triplets (locatum, spatial relation, relatum), such as in (my office, on, third floor). The spatial relation is directed from locatum to relatum. Although the triplet structure looks similar, it is worth mentioning that route descriptions refer to the location and orientation of the moving individual, while place descriptions refer to the location of places relative to each other, in varying perspectives. I.e., their reference systems are different.
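For comparison, a place graph built from (locatum, spatial relation, relatum) triplets could be held in a simple adjacency structure. The sketch below is an assumption for illustration, not the implementation of [36]; `PlaceTriplet`, `build_place_graph` and `where_is` are hypothetical names, and apart from (my office, on, third floor) the triplets are a hand-made reading of Example 2.

```python
from collections import defaultdict
from typing import NamedTuple


class PlaceTriplet(NamedTuple):
    locatum: str
    relation: str
    relatum: str


def build_place_graph(triplets):
    """Directed multigraph: locatum -> list of (relation, relatum)."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.locatum].append((t.relation, t.relatum))
    return graph


def where_is(graph, place):
    """Answer a 'where' question from the outgoing edges of a place."""
    return [f"{place} {rel} {relatum}" for rel, relatum in graph.get(place, [])]


triplets = [
    PlaceTriplet("my office", "in", "Building S8"),
    PlaceTriplet("my office", "on", "third floor"),
    PlaceTriplet("my office", "halfway along", "corridor"),
]
graph = build_place_graph(triplets)
answers = where_is(graph, "my office")
print(answers)
```

Note that the edges are directed from locatum to relatum, as stated above, and that no moving individual appears anywhere in the structure.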

A complementary approach takes a spatial knowledge representation and generates natural language descriptions of routes. This approach has been an active field of research, particularly for car navigation and web mapping services. The commercial solutions are still mostly limited to the turn-by-turn paradigm and typically hide the origin, such as “in 300 m turn right”, implying a full triplet (from here, turn right, after moving for 300 m). They can afford to be stripped to the essential because they are provided in-situ, and thus do not need to be memorable. Nevertheless, there have been calls to include landmarks in these descriptions in order to support people’s cognitive processes of matching with the environment [7, 10]. To consider landmarks, the original turn-by-turn structure requires modification for the optional inclusion of other elements. The most complete data structure in this regard has been suggested by [14].

Going one step further is generating mixed route descriptions. Provided the driver is familiar with parts of the route, natural language generation of route descriptions can start with a place description to identify the anchor point, before continuing with the (sequential) route description, such as in “Go to the post office opposite the station [you know the way], then take ...” [31]. A mixed form is also produced by an approach that generates a narrative of the experience of an indoor environment from a digital geometric representation such as a CAD or BIM model [3]. The result is not necessarily a route description, but a mixture of views in the environment (a place description) and movement through the environment (a route description).

Indoor environments have some properties that make them conceptually and perceptually different from outdoor environments [28]. First and foremost, movements in indoor environments also happen across levels, a property explicitly excluded for example by the route ontology of [2, 4] but adding substantially to the complexity of human wayfinding [15, 16]. Indoor environments are also environments of relatively short vistas and small-scale landmarks compared to outdoor environments [37], and for this reason they typically lack global landmarks and absolute spatial reference frames. Their dense structure also calls for simple route directions [34, 35], among other reasons because these route directions have to be memorable in order to allow users to roam without being forced to constantly watch a smartphone screen [27]. Not only pragmatic reasons, but also linguistic research supports this aim for short descriptions [8, 9]. Finally, indoor (built) environments are typically of high regularity, with narrative strategies adapted accordingly.

The current paper will extract environmental knowledge from verbal (human) indoor route descriptions. We will discover that these route descriptions are more often mixed than not, i.e., contain sequential parts (route descriptions) and configurational parts (place descriptions). Hence the paper will explore the use of the verbal route and place description ontologies, and possible combinations.

3 Exploring Graph Models for an Indoor Route Description

The hypothesis – that the environmental knowledge encapsulated in verbal indoor route descriptions can be extracted and represented in a graph structure – requires some definitions, which then can be applied to our data sets.

Definition 1

A route description is a verbal instruction to follow a particular route through an environment.

A route description answers a how [to find] question. The expectation is that a route description has predominantly a sequential structure (as in Example 1), although parts of the sequence can be folded [19]:

Example 3

Go to Level 3 [you know how to], then turn right ...

and non-sequential forms are possible as well, for example hierarchic instructions:

Example 4

My office? You have to get to the third floor; best to take the elevator. Just turn right behind the entrance to find the elevator.

Definition 2

A place description is a verbal explanation of a configuration of places.

A place description answers a where question. Place descriptions can have a survey or a route perspective in their narrative strategy [32]. An example for a survey (‘birds-eye’) perspective is:

Example 5

My office is on the third floor, in the North Wing of the building.

An example for the route perspective is:

Example 6

From the stairs my office is at the left side, halfway the corridor.

Place descriptions can also have a hierarchical form, zooming in or out, like the one shown in Example 2.

Also, as mentioned before, route and place descriptions can mix, such as in Example 1, where the configurational “My office is at the left side”, a place description, sits at the end of a declared route description. This example also illustrates the complexity for automatic interpretation since the configurational part still carries the direction of the route as spatial reference frame for the relative direction relationship. In addition to tracking spatial reference systems, other complexities have already been mentioned above, such as the completion of triplets by inference.

Studying two corpora – one of indoor route descriptions [Footnote 3], and one of mixed indoor-outdoor place descriptions of a campus [36], both manually tagged – reveals that both contain substantial portions of descriptions that mix narrative structures. Of the 1127 indoor route descriptions in total, 823 (73%) contain configurational parts, and of the 42 campus (place) descriptions, 27 (64%) contain route perspective parts. These substantial numbers show that neither of the ontologies above is capable of comprehensively capturing the environmental knowledge expressed in verbal indoor route descriptions. In the following, we will illustrate the capacities of route graphs and place graphs, both on a route description and a place description. In addition, the applied natural language interpretation process can be stricter (allowing only for explicit relations) or more flexible (allowing also for implied relations and references across sentences). The observations will lead to a strategy for storing the collected route and place knowledge together, presented in Sect. 4.

3.1 Brosset’s Route Graph Applied to Indoor Route Descriptions

We will first investigate the capability of Brosset and colleagues’ route graph [2, 4] to represent the environmental knowledge in the 1127 indoor route descriptions of Soleway. The structure of this route graph is based on triplets of two places (nodes) and the action between these places (edges). The edges can have a further attribute, a qualitative spatial relation, as in “turn right”, an action-relation pair. The triplets extracted from one description can be concatenated, and these route graphs can be further merged to semantic networks if different route descriptions show sufficient evidence for common places.

While there is no ontological difference between environmental knowledge extracted from outdoor descriptions (the subject of study by Brosset and colleagues) and indoor descriptions (our interest), the Soleway dataset has revealed other issues, such as:

  • Indoor descriptions seem to have significantly more places characterized by their types and properties rather than by name, increasing the chance of ambiguity. Also, indoor descriptions can draw from only a small set of types due to repetitive design [18]. With higher ambiguity, indoor descriptions provide a significantly higher challenge for merging.

  • Indoor descriptions seem to rely on only two types of actions (Table 1): locomotion (expressed by verbs such as go or walk) and choice (expressed by verbs such as take or find). A third group of verbs are static, relating to configurational parts.

Table 1. The frequent verbs in the Soleway corpus, based on the Stanford NLP toolkit.

Choice implies a motion in order to realize the choice, hence, choice actions require a more detailed ontological commitment. A choice of a place such as “take the elevator” implies a travel on the elevator (in Example 1 from ground floor to third floor). And thus, the elevator is in this context a vehicle for motion between the two places (elevator@GF and elevator@3F), such that these triplets can be formed: (hallway, walk, elevator@GF), and (elevator@GF, take_elevator, elevator@3F). The same is true for doors: “Take the door” makes the door a passage from one space to another space. This postulate relies on an assumption of a continuous movement, independent from the actual narrative form. We call this assumption the continuity principle.
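The node split implied by a choice action can be illustrated as follows. This is a minimal sketch under assumptions: the helper `expand_choice` and its parameters are hypothetical, and the `place@level` naming simply mirrors the notation used in the triplets above.

```python
def expand_choice(vehicle, from_level, to_level, approach_from,
                  approach_action="walk"):
    """Expand a choice action such as "take the elevator" into two triplets.

    The chosen place is split into a start node and an end node, and the
    approach to the start node is added so that the movement is continuous
    (the continuity principle).
    """
    start = f"{vehicle}@{from_level}"
    end = f"{vehicle}@{to_level}"
    return [
        (approach_from, approach_action, start),
        (start, f"take_{vehicle}", end),
    ]


# "Take the elevator ... to the third floor" in Example 1.
triplets = expand_choice("elevator", "GF", "3F", approach_from="hallway")
for t in triplets:
    print(t)
```

The same helper would apply to doors if the two sides of the door are treated as the start and end of the passage, which is again an assumption of this sketch.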

Figure 1 shows Example 1 in this route graph model. Nodes are the places named in the description, with those place names linked to choice actions expanded to start and end of the corresponding motion. The edges represent the actions found in the description.

Fig. 1. Applying the route model of Brosset et al. to Example 1.

3.2 Vasardani’s Place Graph Applied to Indoor Route Descriptions

Vasardani’s place graph was formed on observations in a corpus of campus descriptions. The campus descriptions already cover indoor elements, and thus we expected fewer challenges from a scale perspective, but rather from the fact that Soleway descriptions are route descriptions, not place descriptions.

Here, a now purely spatial continuity principle suggests inferring some relationships that are not explicit, forming for example the triplets (elevator, in [Footnote 4], hallway) and (elevator, in, third_floor). These inferred edges are included in Fig. 2.

Fig. 2. Applying the place model of Vasardani et al. to Example 1, with inferred edges.

3.3 Brosset’s Route Graph Applied to Place Descriptions

Since verbal place descriptions can also take a route perspective as narrative strategy, i.e., can contain parts that may be recognized as route descriptions, we also apply Brosset’s route graph to place descriptions, for later comparison with Sect. 3.4. The common example shall be a description from the corpus of campus descriptions [36]:

Example 7

Entering the campus from the main entrance on Grattan Street the visitor first needs to climb stairs to get onto South Lawn and on top of the carpark. Now walking towards Old Arts, the Medical School and Baillieu Library are on the left, the Geography and Architecture buildings on the right. Reaching the Old Quad, you enter the most beautiful and oldest part of the campus. Unfortunately it is not very big and soon the visitor passes University House and is again surrounded by yellow brick buildings from the 70ies. Ahead now are the sports facilities of the university.

The place description contains references to 16 places, including “the most beautiful ...part of the campus” and “[above] the carpark”, and “[a group of] yellow brick buildings”. Since the graph is formed from triplets only, Fig. 3 shows in particular that only a subset of these places (11) are connected by an action.

Fig. 3. Applying the route model of Brosset to Example 7.

3.4 Vasardani’s Place Graph Applied to Place Description

A place graph can also be extracted from Example 7. This graph contains all named places that are connected in the description by a qualitative spatial relationship (Fig. 4), which is a different subset than before, of 7 places.

Fig. 4. Applying the place model of Vasardani to Example 7.

3.5 Observations on Environmental Knowledge Capture

Based on the experiments above, place and route information coexist in both place and route descriptions, and can be disambiguated by the verbs (Table 1). Their coexistence could have multiple reasons, beyond individual preference for a particular narrative strategy. In particular, configurational elements may occur in route descriptions for the conversational context (e.g., assumed familiarity of the recipient with the environment) or for environmental context (e.g., a need for disambiguation) [12]. The speaker of Example 1 sees a need to disambiguate “the door between stair and elevator” from other doors, and is using relations instead of actions. In addition, observations show a strategic use of switching between route and place elements to provide a spatial reference frame – the heading of the route – for relative directions in the configurational part of the description. For example, in “Take the double brown door. My office is at the left” (Example 1) the left in the configurational part refers to the walking direction.

The coexistence of place and route information in route as well as place descriptions [33] is a compelling motivation for reconciling the current graph representation models for places and routes, despite their ontological differences in nodes, edges, and reference frames. Relying on one model only, even if it is the one that fits best in a current situation, will lose some of the environmental knowledge encapsulated in the verbal description. This reconciliation will be investigated in the next section.

4 Integrating the Graphs

Here, a strategy is proposed for storing extracted place and route knowledge in an integrated graph representation.

4.1 Reconciling the Edges

An integrated graph has to address the different semantics of edges in route and place graphs. While in route graphs the edges are representing actions, possibly enriched by (mostly) directions, in place graphs the edges represent qualitative spatial relationships (often topological ones) without a link to an action.

The way to integrate edges of different types in an ontology is by abstraction. The integrated edge represents (just) a relation between two places, which can be further established by attributes:

  (1)  relation :: {action, qsrelation}

       action :: Walk | Take | NoAction

       qsrelation :: Near | Left | Right | In | ... | NoQSRelation

For example, relation = {Walk, NoQSRelation} reverts to a route graph edge, and relation = {NoAction, Near} reverts to a place graph edge. Since all considered graphs are multi-graphs, both edges can co-exist between two nodes. Thus, this abstraction allows the basic route and place graphs to be merged into one graph.
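One possible encoding of the abstracted edge of Specification (1) is sketched below. It is an assumption for illustration only: the class names, the `Edge` fields and the two helper methods are hypothetical, the enumeration of qualitative spatial relations is deliberately incomplete (the specification leaves it open), and the hallway/elevator pair is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WALK = "Walk"
    TAKE = "Take"
    NO_ACTION = "NoAction"


class QSRelation(Enum):
    NEAR = "Near"
    LEFT = "Left"
    RIGHT = "Right"
    IN = "In"
    NO_QS_RELATION = "NoQSRelation"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    action: Action = Action.NO_ACTION
    qsrelation: QSRelation = QSRelation.NO_QS_RELATION

    def is_route_edge(self) -> bool:
        # An edge carries route knowledge if it has a positive action.
        return self.action is not Action.NO_ACTION

    def is_place_edge(self) -> bool:
        # An edge carries place knowledge if it has a spatial relation.
        return self.qsrelation is not QSRelation.NO_QS_RELATION


# A multigraph as a plain list: two edges between the same pair of nodes.
edges = [
    Edge("hallway", "elevator", action=Action.WALK),
    Edge("hallway", "elevator", qsrelation=QSRelation.NEAR),
]
print([e.is_route_edge() for e in edges])
print([e.is_place_edge() for e in edges])
```

Keeping both attributes on every edge, with explicit "no value" members, means an action-relation pair such as “turn right” fits the same structure without a third edge type.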

4.2 Inferences

The unified graph representation can be enriched by inferred relationships.

Locomotion verbs can induce a topological relationship of path-connectedness.

  (2)  (action = Walk) \(\rightarrow\) (qsrelation = path-connected)

This inference can either be made explicit as a relation, or set up as a database constraint. Qualitative reasoning on this relationship is possible: If A is path-connected with B, and B with C, then it is possible to reach C from A.
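The reasoning step "if A is path-connected with B, and B with C, then C can be reached from A" amounts to a reachability search. The sketch below assumes, for illustration, that path-connectedness is kept directed as stated by the description; the function names and the places A to D are hypothetical.

```python
from collections import deque


def infer_path_connected(edges):
    """Apply Specification (2): every Walk edge yields a path-connected pair.

    `edges` is a list of (source, action, target) tuples.
    """
    return [(s, t) for s, action, t in edges if action == "Walk"]


def reachable(pairs, start, goal):
    """Breadth-first search over path-connected pairs (treated as directed)."""
    adjacency = {}
    for s, t in pairs:
        adjacency.setdefault(s, set()).add(t)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


edges = [
    ("A", "Walk", "B"),
    ("B", "Walk", "C"),
    ("C", "NoAction", "D"),  # configurational edge, no path inferred
]
pairs = infer_path_connected(edges)
print(reachable(pairs, "A", "C"))
print(reachable(pairs, "A", "D"))
```

Whether the inferred relation should also be symmetric (one can usually walk back) is a modelling decision that this sketch leaves open.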

Choice verbs led previously to a split of nodes in the route graph, while the nodes were preserved in place graphs. The two different needs can be reconciled by maintaining the semantics and granularity of place graphs – describing the locations of places in relation to each other – and adding a new element to the integrated graph: A loop. A loop is an edge that connects a node to itself, and thus a loop can describe a movement with a place (“take the elevator”), on a place (“take the stairs”), or through a place (“take the door”). The loop has two attributes: A from and a to, where the places are inferred from path continuity. The action of a loop is by default Take, and the spatial relationship is by default NoQSRelation.

  (3)  looprelation :: {from, to}

       from :: place

       to :: place

For example, “take the elevator to the third floor” would be represented by a loop:

l1 = elevator, (from: hallway, to: third floor), elevator

And the “take the [double grey] door” would be represented by a loop:

l2 = door, (from: elevator, to: corridor), door
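A loop as specified in (3) could be stored as a small record. The following is a sketch under assumptions: the class `Loop` and its method `as_edge` are hypothetical, and the attributes are named `from_place` and `to_place` only because `from` is a reserved word in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    place: str              # the node the loop is attached to
    from_place: str         # inferred from path continuity
    to_place: str           # inferred from path continuity
    action: str = "Take"    # default action of a loop
    qsrelation: str = "NoQSRelation"  # default spatial relationship

    def as_edge(self):
        """Return the loop as (node, attributes, same node)."""
        return (self.place,
                {"from": self.from_place, "to": self.to_place},
                self.place)


l1 = Loop("elevator", from_place="hallway", to_place="third floor")
l2 = Loop("door", from_place="elevator", to_place="corridor")
print(l1.as_edge())
print(l2.as_edge())
```

Because the loop starts and ends at the same node, the elevator and the door each remain a single place in the integrated graph, unlike the split nodes of the route graph.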

Qualitative spatial relations and static verbs allow a default assumption of path-connectedness, and thus reachability, in another inferred edge. In some sense this inference is the inverse to Specification (2):

  (4)  (qsrelation = In \(\vee\) Next \(\vee\) Near) \(\rightarrow\) (action = Walk)
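Specification (4) can be read as a rule that adds one inferred edge per qualifying configurational edge. The sketch below is an illustration under assumptions: the tuple layout (source, action, qsrelation, target), the name `infer_walk_edges`, and the choice to skip edges that already carry an action are all hypothetical. The second input edge is made up to show a relation that does not trigger the rule.

```python
# Relations that, by default assumption, imply reachability.
REACHABILITY_RELATIONS = {"In", "Next", "Near"}


def infer_walk_edges(edges):
    """Return the Walk edges implied by Specification (4).

    The inferred edges are returned separately so that the original
    configurational edges stay untouched in the multigraph.
    """
    inferred = []
    for source, action, qsrelation, target in edges:
        if action == "NoAction" and qsrelation in REACHABILITY_RELATIONS:
            inferred.append((source, "Walk", "NoQSRelation", target))
    return inferred


edges = [
    ("elevator", "NoAction", "In", "hallway"),
    ("my office", "NoAction", "Left", "corridor"),
]
inferred = infer_walk_edges(edges)
print(inferred)
```

Since this is default reasoning, an implementation would likely need a way to retract an inferred edge when a description states the contrary; that is not covered here.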

Similarly, the static verb see suggests adding a relation viewrelation :: InView | NotInView to Specification (1). A viewrelation is neither about reachability, nor is it a canonized qualitative spatial relation [6, 25] since any reasoning with it requires geometry. However, it can be derived directly (“from ... you see ...”) or implied by common sense for many expressed actions (for example, “take the elevator” implies that I perceive the elevator by some environmental cues when I consume this instruction) and spatial relationships (for example, “the door between elevator and stairs” implies that I see all three places). Such common sense (default) reasoning is part of the strategy of a recipient of such descriptions [20]. Even if this relationship is not about reachability per se, in indoor space it often (and thus by default assumption) is the case, as another facet of the spatial continuity assumption.

Such inferences from default reasoning allow a natural language parser to translate some configurational statements into actionable statements. For example, if “the door between elevator and stairs” in Example 1 implies that I see all three places when I consume this description (and that is when I have reached Level 3 either by elevator or stairs), then the following triplets can be added:

  • t1 = (elevator at Level 3, walk, door)

  • t2 = (landing at Level 3, walk, door)

Fig. 5. Integrating place and route graph by abstracting edges in Example 1.

5 Application

If integrated route graphs contain the knowledge of human route descriptions, then it should be possible to generate route descriptions from the graphs again. This is a relevant thought, since it makes it possible to use a few collected route descriptions to prepare many more route descriptions by recombination. In this regard, the merging of graphs from different route descriptions is a process functionally equivalent to the cognitive process of integrating route knowledge into survey knowledge, which also enables the recombination of segments to routes from anywhere to anywhere [32].

Querying the integrated indoor route graph for paths relies on the connectedness (reachability) provided by action terms. Route planning can happen on any edge with a positive action attribute (i.e., not NoAction). Then, natural language generation can apply the well-known paradigm of (from loc, action, to loc). The example below guides from the elevator at Level 3 to “my office”, using the knowledge of the integrated graph in Fig. 5.

From elevator (walk to/take) double grey door;

from grey double door (walk to/take) brown double door;

from brown double door (walk to/take) corridor;

from corridor (walk to/take) my office.

or simply, with implied origins:

(walk to/take) grey double door;

(walk to/take) brown double door;

(walk to/take) corridor;

(walk to/take) my office.

Configurational edges (qsrelation) are not yet included. They may be used to add landmarks along the route.
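The query and generation steps of this section can be sketched as a breadth-first search over edges with a positive action, followed by a template that prints (from loc, action, to loc). This is an illustration under assumptions: `find_route` and `describe` are hypothetical names, the edge list is hand-written to mirror the generated listing rather than read from Fig. 5, and every action is labelled Walk for simplicity.

```python
from collections import deque


def find_route(edges, start, goal):
    """Shortest route (fewest steps) using only edges whose action is not NoAction."""
    adjacency = {}
    for source, action, target in edges:
        if action != "NoAction":
            adjacency.setdefault(source, []).append((action, target))
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for action, nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, action)
                queue.append(nxt)
    if goal not in previous:
        return None
    steps = []
    node = goal
    while previous[node] is not None:
        origin, action = previous[node]
        steps.append((origin, action, node))
        node = origin
    steps.reverse()
    return steps


def describe(steps, with_origins=True):
    """Render triplets as text, optionally with implied origins."""
    lines = []
    for origin, action, target in steps:
        prefix = f"from {origin} " if with_origins else ""
        lines.append(f"{prefix}({action}) {target}")
    return ";\n".join(lines) + "."


edges = [
    ("elevator", "Walk", "double grey door"),
    ("double grey door", "Walk", "double brown door"),
    ("double brown door", "Walk", "corridor"),
    ("corridor", "Walk", "my office"),
    ("double grey door", "NoAction", "stairs"),  # configurational only
]
steps = find_route(edges, "elevator", "my office")
print(describe(steps))
print(describe(steps, with_origins=False))
```

Configurational edges are filtered out before the search, which matches the statement above that they are not yet used; they could later be looked up per step to add landmarks.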

6 Conclusions

The paper addressed the hypothesis that the environmental knowledge encapsulated in natural language indoor route descriptions can be extracted and represented in a graph structure for querying. As the experiments showed, route information and place information coexist in verbal route descriptions. The complementary roles of route and place information were discussed using examples.

Existing route and place graph ontologies were compared, and their individual limitations have been demonstrated. The observations helped to identify four mechanisms to develop a stronger graph representation of the environmental knowledge contained in verbal indoor descriptions, based on the verb classification of Table 1:

  1. Locomotion verbs can induce an edge for connectedness.

  2. Choice verbs can induce a loop in the graph representing a movement (change of location) while being in, on, or at a place.

  3. Static, configurational verbs in conjunction with spatial prepositions (characterizing qualitative spatial relationships between places) are only one kind of relation – the integrated graph uses a generalized concept of relation.

  4. Applying a continuity principle (in space and time), first, missing origins of movements can be inferred, and secondly, some configurational descriptions can be converted into locomotion actions.

The integrated graph demonstrably contains more of the environmental knowledge: While Fig. 1 has 9 places (but two duplicated by splitting) and 7 labelled relations, and Fig. 2 has 8 places and 10 labelled relations, the integrated graph in Fig. 5 has 8 places and 24 labelled relations. Thus, there is strong evidence supporting the hypothesis: that the environmental knowledge can be represented in one integrated graph, and with more detail than either in the route or place graph, by mechanisms of inference. Also, the same inferences made to add to the integrated graph can be used to answer questions. Thus, the paper also demonstrated ways of using the integrated route graph ontology for query-answering.

The principles identified leave a number of questions for future work. First, the specifications are conceptual so far, and neither implemented nor comprehensive. For example, other modes of mobility in indoor environments than walking (wheelchair), or walking with certain constraints, are not yet considered. Secondly, the specifications have to be implemented in natural language parsers in order to extract not only explicit triplets [17], but also those that can be inferred from the verbs. And thirdly, the integration of graphs from multiple verbal indoor route descriptions (of the same building) requires new attention, since the intersecting elements are not always explicit [2].