Introduction

Early models of Geographical Information Systems (GIS) have been deeply influenced by quantitative models and geometrical representations of space [18]. Despite the interest of these approaches for cartographical applications, they do not completely reflect the way a human being perceives and describes his/her environment since he/she preferably stores and processes qualitative information [10, 20, 36]. Qualitative terms arise from common sense, i.e., intuitive concepts we daily manipulate to interact with our environment. Over the past few years, naive geography has been established as the field of study that combines modeling concepts inspired from common sense and perception of space [8]. Naive geography addresses the modeling of space using concepts derived from daily experience and human knowledge. In particular, naive geography provides a conceptual framework for the study of verbal descriptions derived from the perception of natural landscapes. The perception of space encompasses cognitive principles that favor memorization of the properties of an environment, and communication of its salient properties to an external referent using natural language [14, 44].

This paper extends an initial work in which the objective was to provide a linguistic analysis of a verbal description [23]. Our approach addresses the modeling and the representation of the verbal description of a scenery by an observer perceiving it. It combines a conceptual analysis of the sentences and terms, and a structure-based study of the spatial and rhythmic properties that emerge from such descriptions. We consider the case of an observer located in an unknown natural environment, perceiving his/her 360° surroundings, and who is asked to provide a description of his/her environment to an off-site addressee. The verbal description of a natural environment generally underlines salient entities, as well as the spatial relations binding them, and the overall structure of the environment. We introduce a conceptual modeling of a landscape description, supported by a semantic model that provides a bridge between the perception of a landscape and a qualitative representation. The aim of our research is to provide a representation resulting from the perception of a natural environment, that favors the search for a mapping between descriptions of sceneries and a GIS representation.

The paper is organized as follows. Section 2 briefly discusses related work on the cognitive representations of natural landscapes. Section 3 presents the experimental study, where a panel of observers are given the task of producing verbal descriptions of a natural environment. Section 4 provides a conceptualization of these verbal descriptions and introduces a conceptual model. Section 5 introduces the spatial views that reflect an environmental scene, while Section 6 illustrates the potential of the approach. Finally, Section 7 draws the conclusions and outlines further work.

Spatial perception and mental representations of space

An observer derives an egocentric perception of an environment that reflects his/her own body of knowledge and experience, and his/her interactions with the proximate environment [32]. The resulting mental representation is not limited to a combination of spatial concepts, and a perception based on common sense is not an exact replication of the real world [20]. This directly reflects the close relationship between common sense, cognitive, and sensorial processes [42]. It raises several questions: how human beings acquire spatial knowledge, which linguistic constructs they use to define and characterize an environment, and to what degree the language people use affects their ability to interact effectively with spatial information.

Cognitive studies make a distinction between the internal level oriented to the neuropsychological processes that generate a mental representation and the external level that reproduces the structure-based organization of the observed phenomenon. Our purpose concerns the latter, and it is oriented to the modeling of an environment based on a verbal description. As shown in previous studies, an observer acting in an environment organizes the perceived space according to proximities and senses [44]. The “proximate environment” is defined as the area perceived by several senses, while the “landscape area” that extends to the horizon is perceived by sight alone [11]. The notion of mental space is defined as a cognitive space derived from cultural, pragmatic, and linguistic concepts, and common-sense-based reasoning [30]. It has been schematized by “eliminating details and simplifying features around a framework consisting of elements and the relations among them” [28]. Tversky categorizes four cognitive spaces defined by the actions achievable at different scales: the “space of the body” that integrates our own actions and sensations, i.e., the area where entities can be touched; a “space near the body”, larger than the previous one, and conceptualized as a three-dimensional environment where objects are located; the “navigational space”, i.e., the space in which the observer has to navigate to perceive it as a whole; and the “space of graphics” made of graphical representations of space (cf. Fig. 1).
Similar categorizations of cognitive spaces have been suggested where the identified categories characterize the different spaces according to the scale of entities and their relation to the observer: “figural space” is described as being the space smaller than the body and composed of small objects, “vista space” as the region visible from a single location, “environmental space” as the region accessible via displacement or navigation, and “geographic space” the space too large to be apprehended directly [31].

Fig. 1 Tversky’s cognitive spaces

A mental representation associated with environmental knowledge is denoted as a cognitive map [42], elsewhere referred to as a “map in the head” [21]. Cognitive maps result from acquisitions using various modalities and perspectives, and are “fragmented, schematized, incomplete and multimodal” [43]. A “cognitive collage” aggregates sequences of cognitive maps from different scales and levels of abstraction. It is made of specific cognitive representations, including fragments with imprecise spatial information, particularly for distance or orientation metrics.

Landmarks and salient features are prominent objects used in mental representations of an environment. The role of these structural elements was characterized by Lynch in “The Image of the City”, which refers to a cognitive representation where landmarks, nodes, paths, quarters, and barriers are considered as primitive concepts of an urban environment [27]. These structural elements are typical of such an environment, but landmarks should also be considered as salient natural entities used for the orientation of an observer in a natural environment. Couclelis defines a landmark as a reference that identifies a fixed point in the environment and helps a person carry out orientation tasks [6]. A landmark is not only useful for navigation: it can also help to locate entities in the vicinity of salient places [1]. Landmarks are defined as spatial constructs with key characteristics that make them recognizable in the environment. Sorrows and Hirtle introduce three categories of landmarks, based either on their visual properties that contrast with the other entities of the environment, on physical characteristics when those play a prominent role or location in the environment, or on cognitive qualities for which the meaning prevails [39]. Landmarks that play a prominent role in the qualification of a specific environment constitute key references for the conceptualization of a natural scene.

Context and experimental study

The perception of natural landscapes can be studied directly through in situ observations or through various image-based alternatives. Photographs have been recognized as substitutes for in situ landscape surveys as they facilitate the collection of verbal descriptions resulting from the perception of an environment. They have been widely used in studies oriented to the cross-comparison of similarities and differences resulting from the direct or indirect perception of landscapes [19, 29, 35]. It has been observed that although photographs entail some limitations for the precise identification of natural features, and some visual distortions due to the planar projection of a three-dimensional space, they provide a convenient alternative for the representation of visual environments [34, 40]. This is largely due to the high quality of digital photographs and the use of adapted lights and intensities.

Our experimental study was conducted in the context of several semi-natural landscapes located in France. The experiment was performed with a panel of 23 participants (18 males and five females), non-experts in GIS, and with no previous knowledge of the considered landscapes. The principles of the experiment are as follows. Several 360° panoramic images were displayed on a computer screen. Photographs were presented in an interactive manner using a Java interface, and the observers were able to explore the scenery by rotating a given view, as they would have done by rotating their body in a natural environment. Photographs were displayed at a large size in order to limit the restriction due to the difference of accuracy between an environment perceived in situ and one perceived through a photograph. Participants were also able to zoom in on a specific area or entity in order to minimize depth-of-field errors. Sceneries were selected in order to favor an unbiased perception and orientation of the observer, since they were composed of entities with a high structural role (e.g., a footpath crossing the scenery from left to right of the observer, a lake near the observer or a footpath running along the seaside that clearly makes the distinction between land and the ocean). After a quick overview of the panoramic photograph using a dynamic interface, the panelists were asked to perform a virtual tour of the scene, viewing an observer-centered photograph, and spontaneously describe this photograph in order to favor recognition of the site by an off-site addressee. Participants were also asked to specify when they were rotating the photograph (to the left or right) and ideally use relative relations to translate this movement as they would have done when rotating in situ. Verbal descriptions were recorded using a data storage device and were not to exceed 5 min.

An example of description resulting from the perception of the scenery presented in Fig. 2 is as follows: “I’m on a footpath that runs along a castle that was certainly constructed during the Middle Ages, and a pond. In front of me, there is a little valley with the castle on the left of it and at the horizon, I can distinguish a mountain range. Behind me, there is the pond with a large meadow behind and a forest far away”.

Fig. 2 Experimental panorama of a semi-natural environment

The verbal descriptions generated by the panelists have led to the following results. Most of the participants describe the panoramic photograph from left to right, taking a reference point to begin their description. Most verbal descriptions are implicitly organized with a hierarchy. 60% of the participants first describe the scene as a whole with sentences like “I am in a mountainous region”. The salient concepts structuring the scene are first mentioned but not located. Next, participants describe the content of the landscape, and the way these concepts are related to the others. This shows evidence of a hierarchical perception of space where the landscape is first perceived and described as a whole, before being specifically described, which confirms previous studies and evidence of hierarchies in the perception of spatial information [15]. When observing the environment, humans perceive distinct concepts that are part of the landscape. Most descriptions contain entities directly related to the observer’s cultural and ontological background, and his/her common sense. A scene description encompasses salient entities, as their role is prominent in the structure and organization of the environment. Participants identified human-made (50%), relief (30%) and vegetation (15%) entities.

These entities are qualitatively associated with others using spatial relations. The observers interpret photographs of an environmental scene by using spatial relations to qualify the location of the concepts such as “behind the mountain”, “in front of the house”, “in the background”, “in the foreground”, “in the long distance”, proximity adjectives such as “near”, “close to”, “far from”, “further”, directional relations such as “to the right of” and a few constructs that generate a three-dimensional representation of the scenery such as “above” or “below”. Most of these terms are relative constructs associated with the location of another entity identified in the landscape. It also appears that the roles played by the entities selected by the observer are determined by their contribution to the hierarchical organization of the environment. This confirms Tversky’s intuition that the organization of an environment shapes the structure of a verbal description and precedes its linear structure [44]. The location of these concepts depends on their proximity to the observer. The proximity spaces that emerge from these descriptions vary according to the terms used.

The experiment also highlighted the prominent role played by directional relations (50%), which are used more often than proximity (30%) and topological constructs (20%). This suggests that directional relations are among the most appropriate for structuring a panoramic view. It is also worth noting the marginal role played by quantitative measurements in the descriptions: only two participants used metric data to specify a proximity. This may have been caused by the use of photographs, which lack the depth of field of three-dimensional space, to prompt the descriptions. Moreover, these quantitative measurements are mostly associated with an imprecise linguistic term such as “about 500 meters from my position” or “around 400 meters from the house”.

In order to describe their environment, the observers took a perspective that depended on the frame of reference used [41]. Levinson distinguishes three frames of reference that are intrinsic, absolute or relative depending on the point of view of the observer and the described entities [25]. The intrinsic frame of reference is a binary relation in which the location of an entity is defined in relation to another one. The absolute frame of reference is also a binary relation where the location of a given entity is defined by a fixed support (e.g., cardinal directions). Last but not least, the relative frame of reference is a ternary system since the location of an entity is given both from the point of view of the observer, and the location of another entity of the environmental scene. Experiments show that observers mainly combine two frames of reference, the relative and intrinsic ones, depending on the location of the entities. However, participants are most likely to use a relative frame of reference to locate salient entities. This confirms previous work on the prominent role played by relative frames of reference [41].

Conceptual model of an environmental scene

We define an environmental scene as the 360° environment, perceived and described by an observer from a static point of view. It is associated with an anthropocentric description, i.e., what an observer perceives from his/her location. As humans tend to perceive their environment using different levels of spatial perception [11], an environmental scene is qualitatively structured using the concept of proximity spaces, each defined according to the actions an observer is able to achieve in it. They are defined as follows:

  • The space of the body as introduced by Tversky [43]. It corresponds to the space that integrates our own actions and sensations. It contains easily accessible and recognizable objects.

  • The experienced space that can be easily apprehended by moving around the space of the body.

  • The distant space is the environment located between the experienced space and the space at the horizon, and is hardly accessible without a significant displacement.

  • The space at the horizon is made of silhouettes that constitute the boundaries of the forms of relief, i.e., the boundary between land and sky [5].

Cognitive spaces reflect different levels of interaction that vary with the scale of the entities composing them [31]. They are influenced by the field of vision of the observer, i.e., their distance from the observer [11], and the actions the observer is able to perform in them [43]. Similarly, proximity spaces are determined by the actions the observer is able to perform in them, and their distance from him/her. The way the boundaries of proximity spaces are conceptualized by an observer depends on the landscapes studied. They might be materialized by fiat regions of space, i.e., with non-well-defined boundaries as there is an uncertainty on the location of their boundaries. Alternatively, they can be revealed by qualitative discontinuities that correspond to a physical reality, and are then considered as bona fide boundaries [38].

The description of an environmental scene can be considered as a one-dimensional semantic timeline [24]. This timeline is given its rhythm by the succession of sentences, forms, landmarks, and relations that structure the environment. This reflects the fact that space is structured through the use of periodic, recognizable signs and punctuated by similarities and observable changes [7].

From a conceptual point of view, two definitions of space coexist. On the one hand, Newton argues that the existence of space is independent from the existence of entities composing it. On the other hand, Leibniz argues that space is purely relative, never empty and that it is defined by the entities composing it. Our position is close to the latter as we argue that an environmental scene cannot be defined without specification of the location of entities composing the environmental scene, and the relationships that relate them.

Observable entities are materialized by geographical lexemes, i.e., abstract units of morphological analysis in linguistics that correspond to a set of forms taken by a single word [3]. These units include vegetal entities such as meadows or forests, structural and geomorphological features such as mountains or valleys, water bodies such as lakes or marshlands, and human-made features such as roads or buildings. These geographical entities are closely related to common-sense concepts. Geographical entities that are commonly used in everyday discourse are essential for the apprehension of our daily natural environment. They often appear in cartographic symbols, but geographers or cartographers do not always provide a formal definition of them. Since they are mainly used for the description of an application domain, their physical existence cannot be questioned [37]. However, the formalization of these concepts requires an ontology in order to clearly reflect their semantics.

Ontology is the branch of metaphysics that addresses the nature of spatial characteristics of beings and things that exist [45]. Conceptual models applied to information systems provide a more pragmatic meaning defining ontologies by their main functions as “a specification of a conceptualization” [12]. One of the interests of these approaches is to infer facts that were previously unknown by the system [13]. The main objective of these approaches is to precisely describe the concepts of the application, and to formalize them through a theoretical framework, in order to convey the semantics and syntax appropriate for a specific domain or application. The design of an application ontology consists in a systematic description of the features that characterize this application, and the relations between them and the features specific to the domain.

Since GIS databases rest on ontological commitments [9] and the descriptions of landscapes need to be semantically well structured, we associate a semantic meaning with the terms of the description using an ontology. The construction of our application ontology is based on the knowledge resulting from a taxonomy derived from a topographic database provided by the French Institut Géographique National. This vector database covers the French territory, at a scale of 1:10,000. One third of the collected descriptions are used for the design of the application ontology. A top–down approach is applied for the modeling process, since the entities identified by the description, which reference geographical entities of the environment, are associated with semantic categories, either anthropomorphic or natural [17]. The application ontology is also composed of spatial relations that are categorized by their type, e.g., topological, orientation, and distance (Fig. 3).

Fig. 3 Top level concepts of the application ontology

Schematization of a verbal description

We introduce a schematization approach whose objective is to facilitate the understanding of the concepts and spatial structures that emerge from the verbal description of a scenery. A verbal description is made of a corpus of sentences. A sentence contains several concepts associated using relations that underline the structure of the scene. We characterize a sentence by concepts, i.e., forms and landmarks modeled as entities, and the relationships that relate them. We consider that an environmental scene is ordered by two orthogonal dimensions: the rhythm produced by the timeline of the sentences generated by the observer [16], and proximity spaces that locate the entities relative to his/her location. The rhythm of the verbal description is modeled by a linguistic view, and proximity spaces by a proximity spaces view. The approach is completed by a semantic view that takes into account the ontological characteristics of the represented scene, and directional properties modeled using a directional cones view. The principles and properties of these views are developed in the following subsections.

Linguistic view

The linguistic view provides a semantic schematization oriented to the modeling of the properties of the entities that appear in a verbal description. Concepts and spatial relations are represented by their corresponding lexemes, and associated to semantics derived from the application ontology. The linguistic view combines a conceptual diagram with a framework structured by proximity spaces and the rhythm of the description.

The linguistic view gives a representation of the verbal description of the environment, composed of entities related by relationships, and associated to lexemes of the verbal description (Fig. 4). This view is structured by two dimensions, the temporal and spatial ones. The temporal dimension is given by the ordering of the entities and relationships, and the ordering of sentences represented by the vertical bars that illustrate the end of a sentence and the beginning of another one. The spatial dimension is delineated by the limits of the proximity spaces. The linguistic view models the entities identified in the verbal description according to the temporal ordering of the sentences in which they appear, and materializes the spatial relations between them. This timeline reflects a cognitive order of importance, i.e., important or salient entities are firstly perceived and quoted [22, 28].

Fig. 4 Linguistic view of a verbal description

The semantics exhibited by the verbal description of a scenery can be closely schematized by a musical score composed of several spatial entities whose layout constitutes a rhythm. According to Bar Yosef, the properties of the musical time organizations and the space concepts are analogous when time is conceived as an axis that transforms time organization and space concept into an analogy between two spaces, the first one-dimensional and the second two- or three-dimensional [2]. In light of what is achieved in the musical domain, the interest of such a metaphor is to better understand the rhythm of a verbal description, and its harmony that reveals the respective role of the entities of the scenery. This representation can be analyzed along a melodic dimension that follows the timeline. This should favor the study of the entities that compose a verbal description and the spatial arrangements that emerge.

The representation of a verbal description by a linguistic view is achieved by semi-automated data processing that retains the sentence structure. This process is carried out with the Tinki parser, a semi-automatic parser used in natural language processing [26]. The text resulting from the verbal description of an environmental scene is filtered in order to keep only the relevant information. Let us consider the example of the sentence “I’m on a footpath that runs along a castle that was certainly constructed during the Middle Ages”. Since the relative clause qualifying the “castle” is not directly pertinent for our modeling purposes, it is not retained. A linguistic analysis is applied, taking into account co-references. A co-reference corresponds to a relation between a pronoun and its antecedent. Let us consider the following sentence “On the left, we can see a castle and a valley on the right of it” which becomes “On the left, there is a castle. A valley is on the right of the castle” after processing. This representation outlines the entities quoted several times in the description (or entities with initial co-references such as the one labelled “castle” in the previous example), which reinforces their role in the scene description. This confirms previous studies showing that linguistic salience is generally linked to physical or visual salience [4, 22].

Once the linguistic process is completed, the identified entities are associated with one or more proximity spaces. This implies a specification of the distance and directional relations that qualify the location of entities. Let us consider the example “I’m on a footpath. In front of me, there is a little valley with the castle on the left of it and at the horizon, I can distinguish a mountain range”. As the space of the body corresponds to the area where entities can be directly apprehended, the “footpath” is directly located in the space of the body. In contrast, as language can generate several interpretations, the “valley” and the “castle” can be located in either the experienced or the distant proximity space. Lastly, specific terms such as “at the horizon, I can distinguish” illustrate specific cases where the entities are located in the space at the horizon. Since the application ontology also integrates distance relations and the associated linguistic terms used by the panel of participants, the allocation of a particular proximity space to an entity is semi-automatically computed.

The principles of the modeling approach being introduced, we hereafter present its formal representation. Let \(\mathcal{P}\) be the set of sentences composing a verbal description, \(\mathcal{D}\) the set of verbal descriptions, \(\mathcal{U}\) the set of elementary units composing a sentence, \(\mathcal{E}\) the set of entities of an environmental scene, and \(\mathcal{R}\) the set of spatial relations including the null element \(\emptyset\). A verbal description \(D\) is formalized as an ordered set of sentences \(p_{i}\in\mathcal{P}\), i.e., \(D = [p_{1}, p_{2}, \ldots, p_{n}]\) where \(D \in\mathcal{D}\) and \(n \geq 1\). A sentence \(p_{i}\) is an ordered set of elementary units \(u_{i}\in\mathcal{U}\), i.e., \(\forall i \in [1,\ldots,m]\) with \(m \geq n\), \(p_{i} = [u_{1}, u_{2}, \ldots, u_{m}]\). An elementary unit \(u_{i}\) is a triplet such that \(u_{i} = [e_{j}, r_{k}, e_{l}]\) with \(e_{j}, e_{l} \in \mathcal{E}\), and \(r_{k} \in \mathcal{R}\).
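As an illustration of this formalization, the following minimal Python sketch encodes a verbal description as an ordered list of sentences, each being an ordered list of triplets. The names `Unit`, `Sentence`, `Description` and `entities` are hypothetical, and the triplets are one possible hand-made reading of the example description given in the experimental study, not the output of the parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Unit:
    """Elementary unit [e_j, r_k, e_l]; relation=None stands for the null element."""
    source: str
    relation: Optional[str]
    target: str


Sentence = List[Unit]          # ordered elementary units of one sentence
Description = List[Sentence]   # ordered sentences, with n >= 1

# Hypothetical encoding of the example description (assumption of this sketch).
description: Description = [
    [Unit("observer", "on", "footpath"),
     Unit("footpath", "runs along", "castle"),
     Unit("footpath", "runs along", "pond")],
    [Unit("valley", "in front of", "observer"),
     Unit("castle", "on the left of", "valley"),
     Unit("mountain range", "at the horizon", "observer")],
    [Unit("pond", "behind", "observer"),
     Unit("meadow", "behind", "pond"),
     Unit("forest", "far away", "observer")],
]


def entities(d: Description) -> Set[str]:
    """Collect every entity that appears in at least one triplet."""
    return {e for sentence in d for u in sentence for e in (u.source, u.target)}
```

Ordered lists are used here because both the order of the sentences and the order of the units within a sentence carry the rhythm of the description.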

As the observer implicitly acts as a spatial reference, he/she refers to his/her location at least once during the description of the surroundings, i.e., \(\forall D \in \mathcal{D}, \exists e_{i} \in \mathcal{E}, e_i = observer\). Three classes of spatial relations are used by the panelists to locate entities. Let \(\mathcal{T}\) = {directional, distance, topological} be the set of possible types for the relations, then the function \(f_{\rm relation}\) that associates a type with a relation is given by

$$ \begin{array}{ccccl} f{\rm_{relation}} & : & \mathcal{R} \smallsetminus \{\emptyset\} & \to & \mathcal{T}\\ & & r_{i} & \mapsto & {\rm {type}} \\ \end{array} $$
(1)
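The typing function of Eq. 1 can be sketched as a simple lookup. The lexicon below is hypothetical: the assignment of each term to a type is an assumption of this sketch and is not taken from the application ontology.

```python
from typing import Optional

RELATION_TYPES = {"directional", "distance", "topological"}

# Hypothetical lexicon (assumption): each term is mapped to exactly one type.
LEXICON = {
    "to the right of": "directional",
    "on the left of": "directional",
    "in front of": "directional",
    "behind": "directional",
    "near": "distance",
    "close to": "distance",
    "far from": "distance",
    "on": "topological",
    "runs along": "topological",
}


def f_relation(relation: Optional[str]) -> str:
    """Return the type of a relation; the null relation is outside the domain."""
    if relation is None:
        raise ValueError("f_relation is not defined for the null relation")
    return LEXICON[relation]  # raises KeyError for a term missing from the lexicon
```

Rejecting the null relation mirrors the domain of the function, which excludes the null element.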

Spatial entities are linked by relationships, the function h that models the relation r k associating two entities e i and e j is given by

$$ \begin{array}{ccccl} h & : & \mathcal{E}^{2} & \to & \mathcal{R}\\ & & (e_{i}, e_{j}) & \mapsto & r_{k}, \textrm{such as $[e_{i}, r_{k}, e_{j}] \in \mathcal{U}$}. \end{array} $$
(2)

Let \(\mathcal{S}\) = {body, exp, dist, hor} be the set of proximity spaces, \(\mathfrak{P}(\mathcal{S})\) the power set of \(\mathcal{S}\), i.e., the set of subsets of \(\mathcal{S}\), and \(e_{i} \in \mathcal{E}\). The function \(f_{\rm space}(e_{i})\) that locates the entities is given by

$$ \begin{array}{ccccl} f_{\rm space} & : & \mathcal{E} & \to & \mathfrak{P}(\mathcal{S}) \smallsetminus \{\emptyset\} \\ & & e_{i} & \mapsto & \{s_{i}\} \textrm{ such that } s_i \in \mathcal{S} \\ \end{array} $$
(3)

With respect to the example given by Fig. 4, we have \(f_{\rm space}({\rm footpath}) = \{{\rm body}\}\) and \(f_{\rm space}({\rm castle}) = \{{\rm exp}, {\rm dist}\}\). It follows immediately that at most four proximity spaces can be associated with a given entity, i.e., \(\forall i, 1 \leq Card(f_{\rm space}(e_{i})) \leq 4\), where Card() is the cardinality operator. Also, when an entity is located over several proximity spaces, it fulfils a contiguity constraint defined as follows

$$ 2 \leq Card(f{\rm_{space}}(e_{i})) \leq 4 \Rightarrow \bigcap f{\rm_{space}}(e_{i}) \neq \emptyset $$
(4)
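The cardinality and contiguity constraints can be checked with a short Python sketch. It reads contiguity as "the proximity spaces of an entity form an unbroken run in the order body, exp, dist, hor"; this reading, the ordering itself, and the names `PROXIMITY_ORDER`, `F_SPACE`, `is_valid` and `is_contiguous` are assumptions of the sketch rather than a restatement of Eq. 4.

```python
from typing import Dict, Set

# Assumed ordering of proximity spaces, from the observer outwards.
PROXIMITY_ORDER = ["body", "exp", "dist", "hor"]

# Allocation for a few entities of the running example.
F_SPACE: Dict[str, Set[str]] = {
    "footpath": {"body"},
    "castle": {"exp", "dist"},
    "mountain range": {"hor"},
}


def is_valid(spaces: Set[str]) -> bool:
    """Between one and four proximity spaces, all drawn from the known set."""
    return 1 <= len(spaces) <= 4 and spaces <= set(PROXIMITY_ORDER)


def is_contiguous(spaces: Set[str]) -> bool:
    """True when the spaces occupy consecutive positions in PROXIMITY_ORDER."""
    idx = sorted(PROXIMITY_ORDER.index(s) for s in spaces)
    return bool(idx) and idx[-1] - idx[0] == len(idx) - 1
```

Under this reading, an entity allocated to {exp, dist} is accepted, whereas an allocation such as {body, dist} is rejected because it skips the experienced space.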

Let us denote by \(f_{\rm occurrence}(e_{i}, j)\) the function that returns the number of occurrences of an entity \(e_{i}\) in the sentence \(j\)

$$ \begin{array}{ccccl} f_{\rm occurrence} & : & \mathcal{E} \times \mathbb{N}^{*} & \to & \mathbb{N} \textrm{, with $\mathbb{N}^{*}$ the set of natural numbers excluding 0}\\ & & (e_i, j) & \mapsto & k \\ \end{array} $$
(5)

It also follows that each entity quoted in the verbal description appears at least once in a sentence, i.e., \(\forall e_{i} \in \mathcal{E}, \exists j \in \mathbb{N}^{*}\) such that \(f_{\rm occurrence}(e_{i}, j) \geq 1\). Finally, \(f_{\rm sentence}\) is defined as the function that associates a set of sentence numbers \(\{j\}\) with an entity \(e_{i}\). A non-empty set corresponds to the presence of the entity \(e_{i}\) in the sentence(s) \(\{j\}\)

$$ \begin{array}{ccccl} f{\rm_{sentence}} & : &\mathcal{E} & \to & \mathfrak{P}(\mathbb{N}) \\ & & e_{i} & \mapsto & \{j\} \textrm{ such as } f{\rm_{occurrence}}(e_i, j) \geq 1. \\ \end{array} $$
(6)
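Both functions can be sketched in a few lines of Python. The sketch counts the appearances of an entity in the triplets of a sentence, which is an assumption: occurrences in the filtered text could be counted differently. The triplets are a hypothetical hand-made encoding of the running example, and the description is passed explicitly as a first argument for convenience.

```python
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (entity, relation, entity)

# Hypothetical encoding of the three-sentence example (assumption of this sketch).
description: List[List[Triplet]] = [
    [("observer", "on", "footpath"),
     ("footpath", "runs along", "castle"),
     ("footpath", "runs along", "pond")],
    [("valley", "in front of", "observer"),
     ("castle", "on the left of", "valley"),
     ("mountain range", "at the horizon", "observer")],
    [("pond", "behind", "observer"),
     ("meadow", "behind", "pond"),
     ("forest", "far away", "observer")],
]


def f_occurrence(d: List[List[Triplet]], entity: str, j: int) -> int:
    """Number of times `entity` appears in the triplets of sentence j (1-based)."""
    if not 1 <= j <= len(d):
        return 0
    return sum((src == entity) + (dst == entity) for src, _rel, dst in d[j - 1])


def f_sentence(d: List[List[Triplet]], entity: str) -> Set[int]:
    """Set of sentence numbers in which `entity` occurs at least once."""
    return {j for j in range(1, len(d) + 1) if f_occurrence(d, entity, j) >= 1}
```

With this encoding, `f_sentence(description, "footpath")` evaluates to `{1}` and `f_sentence(description, "castle")` to `{1, 2}`.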

The verbal description of Fig. 4 has three sentences with \(f_{\rm sentence}({\rm footpath}) = \{1\}\) and \(f_{\rm sentence}({\rm castle}) = \{1, 2\}\). The constraint \(\forall i, 1 \leq Card(f_{\rm sentence}(e_{i})) \leq n\), where \(n\) is the number of sentences of the verbal description, illustrates the fact that a given entity exists at least in one sentence, and at most in all sentences of the verbal description. The example illustrated in Fig. 4 shows evidence of a progressive description from nearby to distant spaces. A landmark, the castle, is present in two different sentences, thus playing a prominent role in the relative location of the other landmarks of the description. This example confirms that the boundaries between the distant and the experienced spaces are difficult to evaluate in most cases.

Semantic view

The objective of the semantic view is to provide a summarized representation of the entities and spatial relations identified in a verbal description. A semantic view is defined by a finite and connected graph G where nodes model named entities, and edges model the named relations (Fig. 5). The semantic graph distinguishes the node referencing the observer from the nodes that reference the spatial entities quoted in the verbal description and represented by the linguistic view. More formally, the semantic graph G of the scene description is given by the pair of elements G = (V, E). The elements of V are the nodes of the graph G, and the elements of E are the labelled edges. Each triplet \(u_{i}\) corresponds to a subgraph \(G_{i}\) where \(e_{j}\) and \(e_{k}\) are the vertices and \(r_{k}\) the edge.

Fig. 5 Semantic view of a verbal description

This view outlines the entities that play a central role in the description (e.g., observer, pond and footpath) and the outliers (e.g., forest). It also reveals the diversity of the terms used for the entities and relations derived from the ontology, and thus the variety of the elements identified in the landscape.
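A semantic graph of this kind can be sketched from a list of triplets as shown below. The triplets and relation labels are hypothetical and are not taken from the application ontology. Using the node degree to separate central entities from outliers is an assumption made for this illustration, since the text does not specify a centrality measure.

```python
from collections import defaultdict, deque

# Hypothetical elementary units (e_j, r_k, e_k); relation labels are
# illustrative only.
TRIPLETS = [
    ("observer", "on", "footpath"),
    ("observer", "near", "pond"),
    ("footpath", "along", "pond"),
    ("observer", "in front of", "castle"),
    ("footpath", "towards", "forest"),
    ("pond", "before", "castle"),
]


def build_graph(triplets):
    """Return G = (V, E): V is the set of nodes, E the set of labelled edges."""
    nodes, edges = set(), set()
    for e_j, relation, e_k in triplets:
        nodes.update((e_j, e_k))
        edges.add((e_j, relation, e_k))
    return nodes, edges


def degrees(nodes, edges):
    """Number of labelled edges incident to each node."""
    degree = {node: 0 for node in nodes}
    for e_j, _, e_k in edges:
        degree[e_j] += 1
        degree[e_k] += 1
    return degree


def is_connected(nodes, edges):
    """Breadth-first check that every node is reachable (edges as undirected)."""
    if not nodes:
        return True
    neighbours = defaultdict(set)
    for e_j, _, e_k in edges:
        neighbours[e_j].add(e_k)
        neighbours[e_k].add(e_j)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes


V, E = build_graph(TRIPLETS)
deg = degrees(V, E)
central = {v for v in V if deg[v] == max(deg.values())}  # assumed proxy
outliers = {v for v in V if deg[v] == 1}                 # assumed proxy
print(is_connected(V, E), sorted(central), sorted(outliers))
```

On this toy input the observer, the footpath and the pond share the highest degree, while the forest is attached by a single edge.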

Proximity spaces view

The integration of proximity spaces within the semantic view gives another interpretation of a scene description (cf. Fig. 6). Let \(\mathcal{E}{\rm_{body}}\) be the set of entities located in the space of the body, \(\mathcal{E}{\rm_{exp}}\) the set of entities located in the experienced space, \(\mathcal{E}{\rm_{dist}}\) the set of entities located in the distant space, and \(\mathcal{E}{\rm_{hor}}\) the set of entities located in the space at the horizon. The proximity spaces are defined as

$$\begin{array}{rll} \mathcal{E}{\rm_{body}} &=& \{ e_{i} \textrm{ such that } f{\rm_{space}}(e_{i}) = \{s_j \textrm{ such that } \exists s_k \textrm{ such that } s_{k} = {\rm{body}} \}\} \\ \mathcal{E}{\rm_{exp}} &=& \{ e_{i} \textrm{ such that } f{\rm_{space}}(e_{i}) = \{s_j \textrm{ such that } \exists s_k \textrm{ such that } s_{k} = {\rm{exp}} \}\} \\ \mathcal{E}{\rm_{dist}} &=& \{ e_{i} \textrm{ such that } f{\rm_{space}}(e_{i}) = \{s_j \textrm{ such that } \exists s_k \textrm{ such that } s_{k} = {\rm{dist}} \}\} \\ \mathcal{E}{\rm_{hor}} & =& \{ e_{i} \textrm{ such that } f{\rm_{space}}(e_{i}) = \{s_j \textrm{ such that } \exists s_k \textrm{ such that } s_{k} = {\rm{hor}} \}\} \end{array}$$
(7)
Fig. 6 Proximity spaces representation

The proximity spaces view provides a summarized representation of the entities associated with the proximity spaces. Figure 6 illustrates the prominent role of the distant and experienced spaces in the example considered. It is also worth noting the relations that connect entities between different proximity spaces (e.g., footpath), or within a given proximity space (e.g., pond). Overall, this view provides a representation of the structure of the landscape that results from the verbal description, of the respective importance of the proximity spaces identified, and of the relative depth of field of the image of the landscape perceived by the observer.
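As a rough sketch, the sets of Eq. (7) can be derived from the proximity space function by collecting, for each space, the entities whose image contains that space. The values for the footpath and the castle are those given for the Fig. 4 example; the pond and forest values are hypothetical. The contiguity check reads Eq. (4) as "the occupied spaces are consecutive in the order body, exp, dist, hor", which is one possible interpretation and an assumption of this sketch.

```python
SPACES = ("body", "exp", "dist", "hor")  # ordered from nearest to farthest

F_SPACE = {
    "footpath": {"body"},       # from the Fig. 4 example
    "castle": {"exp", "dist"},  # from the Fig. 4 example
    "pond": {"exp"},            # hypothetical
    "forest": {"dist"},         # hypothetical
}


def entities_in(space, f_space):
    """Entities whose set of proximity spaces contains `space`."""
    return {entity for entity, spaces in f_space.items() if space in spaces}


def is_valid(spaces):
    """Check 1 <= Card <= 4 and contiguity (assumed reading of Eq. 4)."""
    if not spaces or not spaces <= set(SPACES):
        return False
    ranks = sorted(SPACES.index(s) for s in spaces)
    # Consecutive ranks: the span equals the number of spaces minus one.
    return ranks[-1] - ranks[0] == len(ranks) - 1


partition = {space: entities_in(space, F_SPACE) for space in SPACES}
for space in SPACES:
    print(space, sorted(partition[space]))

print(all(is_valid(spaces) for spaces in F_SPACE.values()))  # True
print(is_valid({"body", "dist"}))  # False: the experienced space is skipped
```

An entity such as the castle appears in two of the resulting sets, which is how the view can show relations that cross proximity spaces.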

Directional cones view

Humans also tend to structure space using bodily directions that relate the perceived entities to their location in space. The way these directional relations organize space implicitly generates a partition of space, often represented using conceptual directional cones [33]. A cone-based partition emerges from the directional relations identified. The proximity spaces view is enriched by the directional relations and an orientation-based structure of space. We retain a cone-based partition with four possible directions: front, back, right, and left (cf. Fig. 7).

Fig. 7 Directional cones representation

A directional cone is detected when at least one directional relation exists between the observer and an entity of the environmental scene. The number of directional cones can vary from two (front–back or right–left) to four (front, back, right, and left), depending on the directional relations used between the observer and an entity or a group of entities to describe the surroundings. More formally, let \(<_{t}\) be a temporal ordering operator on the terms of the description, i.e., entities and relations of elementary units, \(\mathcal{E}{\rm_{front}}\) the set of entities located in the directional cone in front of the observer, \(\mathcal{E}{\rm_{back}}\) the set of entities located in the back cone, \(\mathcal{E}{\rm_{left}}\) the set of entities located in the left cone, \(\mathcal{E}{\rm_{right}}\) the set of entities located in the right cone, and x a directional relation such that x ∈ \(\{front, back, left, right\} \subset \mathcal{R}\). We define \(\mathcal{E}_{x}\) as

$$\begin{array}{lll} &&\mathcal{E}_{x} = \{ e_{m} \textrm{ such that } e_n <_{t} e_i \leq_{t} e_m \textrm{ with } (f{\rm_{relation}}(h(e_n, e_i))={\rm{directional}}\\ &&{\kern24pt}\quad\land e_n={\rm{observer}}) \land (h(e_n, e_i)=x) \} \end{array}$$
(8)

Since the application ontology distinguishes directional relations from the others, a new directional cone is detected automatically when a directional relation is specified between the observer and an entity or a group of entities. The user should then manually adjust the number of directional cones and their composition. The example of Fig. 8 illustrates the roles played by directional relations in verbal descriptions. It appears here that directional relations tend to reference distant entities in order to make their location more precise. Conversely, they are barely used for nearby entities.
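The automatic detection step can be sketched as a single pass over the elementary units in their order of utterance. The units below are hypothetical. Closing a cone when the next directional relation from the observer appears is an assumption of this sketch: Eq. (8) only states that the entities of a cone follow the directional relation in the temporal order, and the manual adjustment by the user is not modeled.

```python
DIRECTIONS = ("front", "back", "left", "right")
OBSERVER = "observer"

# Hypothetical elementary units (e_n, relation, e_i), in order of utterance.
UNITS = [
    (OBSERVER, "on", "footpath"),   # not directional: no cone is opened yet
    (OBSERVER, "front", "castle"),  # opens the front cone
    ("castle", "near", "pond"),     # pond is attached to the current cone
    (OBSERVER, "back", "forest"),   # opens the back cone
]


def detect_cones(units):
    """Map each detected direction to the entities quoted while it is active."""
    cones = {}
    current = None
    for e_n, relation, e_i in units:
        # A directional relation from the observer opens (or reopens) a cone.
        if e_n == OBSERVER and relation in DIRECTIONS:
            current = relation
            cones.setdefault(current, [])
        if current is None:
            continue  # entities quoted before any cone stay unassigned
        for entity in (e_n, e_i):
            if entity != OBSERVER and entity not in cones[current]:
                cones[current].append(entity)
    return cones


print(detect_cones(UNITS))
# {'front': ['castle', 'pond'], 'back': ['forest']}
```

The footpath stays outside any cone here because it is quoted before the first directional relation, which is consistent with the observation that directional relations are barely used for nearby entities.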

Fig. 8 Directional cones search

This direction-based view clearly distinguishes between the spaces in front of and behind the observer. It provides another characterization of the verbal description and allows a preliminary spatialization of the identified entities thanks to an observer-centered structure of the scenery. The representation that results from the perception of an environmental scene is then structured by the proximity spaces and by the directional cones, which provide an observer-centered reference for the location of entities.

Conceptual map of an environmental scene

The successive views provide a semantic foundation for the derivation of a conceptual map that should reconcile them within a representation that integrates the proximity spaces, directional cones, entities, relationships and sentences of the verbal description. We define such a “conceptual map” as an abstraction that characterizes the mental mapping of a verbal description. It should provide a bridge between the linguistic characteristics of a verbal description, its mental abstraction, and the semantics of the scene observed.

Figure 9 gives an example of such a conceptual map where space is structured into two areas, i.e., the front and back of the observer. The conceptual map that results from this example shows evidence of a close relationship between the different sentences described and the entities identified. This reflects a form of continuity in the description, visible in the intertwining of the salient features that emerge.

Fig. 9 Conceptual map of a semi-natural landscape

These conceptual maps allow comparison of descriptions either made by different observers, or resulting from different landscapes. Let us consider another example of verbal description, that of an observer walking along a seacoast of Brittany, France, illustrated by Fig. 10: “Behind me, there is a huge lighthouse and four houses nearby. I’m on a footpath, and on my right there is a grassy hill. On my left is the Ocean”. The conceptual map that emerges from this verbal description shows a clear separation between the different parts of the scene. The observer mainly uses body-centered directions to qualify the location of entities such as the “footpath”, the “lighthouse” and the “ocean”. There are few connections between the entities, which reflects a landscape with a clear separation between the different entities identified by the observer, between the sentences and the entities, and between the cone-based regions of the scene. It is also worth noting the small number of entities identified, which illustrates the relative flatness of the landscape. Last, the “lighthouse” clearly plays a salient role, as several entities are spatially related to it.

Fig. 10 Conceptual map of a maritime environment

Conclusion

The research presented in this paper introduces a language-based and cognitive approach that models the verbal description of a landscape scene. A verbal description is modeled by a qualitative, structural and proximity-based representation that reflects its semantics. The structural properties of a verbal description are linked to several semantic views that attempt to reflect the user’s perception, and rely on a semantic-based analysis. The semantic, proximity spaces and directional cones views together provide a step towards a “conceptual map”, conceived as an abstraction that characterizes an environmental scene from its verbal description. The model is based on the spatial structure, the rhythmic organization of space, and the diversity, ordering and salience of the entities identified by the observer, and reflects the main characteristics and entities of the environment. Such a model qualifies and characterizes natural landscapes, and provides a framework for the analysis of the properties of the verbal descriptions made by different observers, and for cross-comparisons of different landscape descriptions. Although applied here to natural landscapes, this modeling approach could potentially be applied to urban environments.

Further work concerns an evaluation of the properties of the salient entities identified in a description of a given landscape, and the design of a prototype that should provide a preliminary resource for the development of a mapping between an observer's scenery description and a conventional GIS representation. Such a prototype should support the georeferencing of the observer from the analysis of the salient and structural components of a scenery description.