1 Introduction

Data-driven scientific discovery has become an important computing paradigm in many areas, including the Internet of Things, social networks and remote sensing. Under this paradigm, mobility data, commonly called trajectory data, is the core resource that reveals the instantaneous behaviour of mobile entities. Basically, trajectory data is a record of the evolution of the position (perceived as a point) of an object moving in space during a given time interval in order to achieve a given goal [9]. This field is recent but active, essentially because of the rise of applications, pervasive devices and positioning technologies that produce mobility data. Managing collected mobility data is expected to extract useful knowledge about moving objects and thereby facilitate the understanding of their behaviour from analytic and cognitive perspectives. Collected mobility data has therefore given rise to different trajectory data models, obtained either by enhancing the classical “conceptual models” used for designing database schemas or by proposing new ones such as “ontology-based representations”. Yet disparate trajectory data, stored and manipulated in classical and semantic databases, provides only limited support for analysing and understanding the behaviour and activities of mobile objects represented by heterogeneous trajectory data models.

To fulfil this need, the design model of trajectory data can be expressed in a formal language, for instance the Description Logics (DL) formalism or one of its fragments [4]. Together with DL, one may also consider developing and using ontologies. Some studies argue that ontologies go beyond conceptual models by making them consensual and enriching them with reasoning mechanisms [2]. In addition, ontologies allow a shared understanding and may offer a common model for different structures and representations of trajectory data, from which designers can pick the appropriate knowledge to define trajectories for sharing, exchange or integration. Alongside, the Trajectory Data Warehouse (TrDW) is considered an efficient tool for analysing and extracting valuable information from heterogeneous trajectory data sources.

For this purpose, this paper sheds light on a Semantic Trajectory Data Warehouse (STrDW) that combines the best of the Data Warehousing and Semantic Data Modelling worlds. Our proposal (i) provides a generic shared trajectory ontology that explains the semantics of these data in an unambiguous way and (ii) defines the STrDW conceptual model. It saves much of the effort and time designers need to acquire domain knowledge, since that knowledge is extracted from the generic ontology. The STrDW treats the trajectory as a first-class semantic concept and provides an ontology-based multidimensional model. The generic shared ontology that we propose is expressed in OWL-DL and covers the most important existing conceptual and ontological trajectory data models. We focus on DL because it is able to capture the most popular class-based data modelling formalisms used in databases, warehouses and information systems analysis [4].

The remainder of this paper is structured as follows. In Sect. 2, we analyse the evolution of warehousing approaches towards trajectory data warehousing. Section 3 outlines the basic notions required to understand our approach. Section 4 presents an ontology-driven approach that describes a STrDW conceptual model by using a generic trajectory ontology. Section 5 illustrates our work with a case study dealing with the Edinburgh Informatics Forum. Section 6 concludes the paper and suggests some future work.

2 Related Research: Towards a STrDW

The first step in extracting knowledge from trajectory data is to provide a design model able to represent moving objects. To that end, the database community has stored and managed this type of data in Spatio-Temporal Databases (STDB) [18] and Moving Object Databases (MOD) [13, 21] by defining spatio-temporal data types, inter alia moving point and moving region data types. However, the ability of current DBMSs, even when extended to support spatio-temporal data, is limited to storing raw trajectories, which omits any semantic information and analysis capabilities. To exploit this data efficiently, there have been attempts to enrich it with semantic annotations in order to support different views of knowledge. For that purpose, ontology building and logics have recently attracted research aimed at supporting trajectory-based applications with semantic approaches [1, 16, 20, 22]. The majority of these approaches treat the ontology as a storage repository and not as a domain ontology from which designers can pick concepts and properties to represent trajectory data.

There are many ways of analysing trajectory data efficiently. Warehousing and mining techniques, among others, support the extraction of valuable information from disparate raw trajectories. In our research area, a TrDW is the application of data warehousing techniques to trajectory data [7, 19]. Before the TrDW, research communities were interested in analysing spatio-temporal data in Spatio-Temporal Data Warehouses (STDW). Various multidimensional models have been proposed for STDW [23], aiming at the integration of various data sources containing spatio-temporal data. Trajectory data is a particular case of spatio-temporal data that characterizes object mobility. A TrDW is therefore a particular case of STDW in which the trajectory is the fact [3, 7, 17]. However, implementing a DW is a complex task that often forces designers to acquire wide knowledge of the domain, thus requiring a high level of expertise and making the task prone to failure. In real-world projects, we have faced a set of situations, e.g., additivity and conformed dimensions, in which we believe that the use of some kind of knowledge resource will improve the design of data warehouses.

In the light of these issues, ontologies seem to be a promising solution, since they are a common conceptualization of a universe of discourse, representing shared knowledge in terms of classes and properties in a way that is formal, consensual and referenceable [14]. The first attempt to set up a Semantic Spatio-Temporal Data Warehouse is given by the authors of [5], who annotate the data cube elements with domain ontologies as well as a mathematical ontology. On top of this, substantial research has been conducted on methods and tools for designing the DW through ontologies. The team in [10, 11] gathers domain ontologies and semantically annotated data resources. The authors of [12] present the OLAP cube on the basis of an OLAP design ontology. The work of [6] defines a DW on the basis of a global ontology integrating the local ontologies of the ontology-based database sources participating in the integration process. The work of [15] defines the DW multidimensional model from an ontology by identifying functional dependencies between concepts. Table 1 summarizes the evolution of the reviewed warehousing approaches according to two criteria: the type of warehouse (DW, STDW, TrDW) and the technique used for designing the DW (ontology, conceptual).

In this context, we take a different point of view to unify the modelling and the analysis of trajectory data. The innovation of our work consists of offering a generic trajectory ontology that describes heterogeneous mobility data sources. The shared trajectory ontology covers the most important existing formalisms and representations of the trajectory concept. This ontology serves as a semantic layer for the STrDW, allowing the analysis of heterogeneous trajectory data sources.

Table 1. Evolution of warehousing approaches

3 Problem Definition

We focus in our study on different representations of trajectory data. In the following, we outline basic facts (representations) relevant to our work:

Definition 1

(Raw trajectory). A sequence of spatio-temporal positions recording the trace of a moving object, i.e., \(\{(x_{0}, y_{0}, t_{0}), \ldots, (x_{n}, y_{n}, t_{n})\}\), where \(x_{i}, y_{i}, t_{i} \in \mathfrak{R}\) for \(i = 0, \ldots, n\) and \(t_{0} < t_{n}\).
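As an illustration of Definition 1, the following minimal Python sketch represents a raw trajectory as a list of (x, y, t) samples and checks its temporal ordering. The names `Position` and `is_raw_trajectory` are hypothetical, and the requirement that timestamps be strictly increasing from one sample to the next is an assumption of this sketch rather than something stated in the definition.

```python
from typing import List, NamedTuple


class Position(NamedTuple):
    """One spatio-temporal sample (x, y, t) of a moving object."""
    x: float
    y: float
    t: float


def is_raw_trajectory(samples: List[Position]) -> bool:
    """Return True if the samples are non-empty and strictly ordered in time.

    Strict ordering between consecutive samples is an assumption of this sketch.
    """
    if not samples:
        return False
    return all(a.t < b.t for a, b in zip(samples, samples[1:]))


raw = [Position(0.0, 0.0, 0.0), Position(1.5, 0.2, 1.0), Position(2.0, 1.0, 2.5)]
print(is_raw_trajectory(raw))  # True
```

A sequence with repeated or decreasing timestamps is rejected, which is one simple way to filter out malformed traces before any semantic enrichment.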

Definition 2

(Structured trajectory). A set of sub-trajectories following predefined paths. A sub-trajectory includes exactly one Begin and one End, and at least one Stop. Moves connect stops to other elements (Stop, Begin, End), i.e., \(\{(\text{Sub-trajectory}_{1}, \ldots, \text{Sub-trajectory}_{n})\}\), where \(\text{Sub-trajectory} = \{\text{Begin}, \text{Move}_{1}, \ldots, \text{Stop}_{n-1}, \text{Move}_{n}, \text{End}\}\), \(\text{Begin} = \{x_{0}, y_{0}, t_{0}\}\), \(\text{Stop}_{n-1} = \{x_{n-1}, y_{n-1}, t_{n-1}\}\) and \(\text{End} = \{x_{n}, y_{n}, t_{n}\}\), with \(x_{i}, y_{i}, t_{i} \in \mathfrak{R}\) for \(i = 0, \ldots, n\) and \(t_{0} < t_{n-1} < t_{n}\).

Definition 3

(Trajectory with ROI). A sequence of visited places (regions) and intervals. A region is a set of consecutive line segments, i.e., \(\{(\text{ROI}_{1}, \ldots, \text{ROI}_{n})\}\), where \(\text{ROI}_{i} = (\text{Region}_{i}, \text{Interval}_{i})\) for \(i = 1, \ldots, n\) and \(\text{Interval}_{1}\) is before \(\text{Interval}_{n}\).

Definition 4

(Semantic trajectory). A structured trajectory whose spatio-temporal positions are annotated. Begin, Stop, Move and End become geographical concepts linked to points of interest rather than plain spatio-temporal data, i.e., \(\text{Semantic Trajectory} = \{(\text{SemanticSub-trajectory}_{1}, \ldots, \text{SemanticSub-trajectory}_{n})\}\), where \(\text{SemanticSub-trajectory} = \{\text{SemanticBegin}, \text{SemanticMove}_{1}, \ldots, \text{SemanticStop}_{n-1}, \text{SemanticMove}_{n}, \text{SemanticEnd}\}\), \(\text{SemanticBegin} = \{x_{0}, y_{0}, t_{0}, \text{PointOfInterest}\}\), \(\text{SemanticStop}_{n-1} = \{x_{n-1}, y_{n-1}, t_{n-1}, \text{PointOfInterest}\}\) and \(\text{SemanticEnd} = \{x_{n}, y_{n}, t_{n}, \text{PointOfInterest}\}\), with \(x_{i}, y_{i}, t_{i} \in \mathfrak{R}\) for \(i = 0, \ldots, n\), \(t_{0} < t_{n-1} < t_{n}\), and PointOfInterest a geographical place.

Definition 5

(Semantic ROI). A trajectory with ROI annotated with semantic information, i.e., trajectory with Semantic ROI \(= \{(\text{SemanticROI}_{1}, \ldots, \text{SemanticROI}_{n})\}\), where \(\text{SemanticROI}_{i} = (\text{Region}_{i}, \text{Interval}_{i}, \text{PointOfInterest})\) for \(i = 1, \ldots, n\), \(\text{Interval}_{1}\) is before \(\text{Interval}_{n}\), and PointOfInterest is a geographical place.

Definition 6

(Space-time path). A semantic trajectory extended with the activity of the mobile object, i.e., space-time path \(= \{(\text{Space-time}_{1}, \text{Activity}_{1}), \ldots, (\text{Space-time}_{n}, \text{Activity}_{n})\}\), where \(\text{Space-time}_{i} = \{x_{i}, y_{i}, t_{i}, \text{PointOfInterest}\}\) for \(i = 1, \ldots, n\), PointOfInterest is a geographical place, and Activity is contextual information about the activity of the moving object.
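To make Definition 6 more concrete, here is a small Python sketch of a space-time path as a list of (Space-time, Activity) pairs. The class and function names (`SpaceTime`, `PathElement`, `activities_at`) are hypothetical, the coordinates are invented for illustration, and the points of interest and activities are borrowed from the case study of Sect. 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpaceTime:
    """A spatio-temporal position annotated with a point of interest."""
    x: float
    y: float
    t: float
    point_of_interest: str


@dataclass(frozen=True)
class PathElement:
    """One (Space-time_i, Activity_i) pair of a space-time path."""
    space_time: SpaceTime
    activity: str


def activities_at(path: List[PathElement], poi: str) -> List[str]:
    """List the activities observed at a given point of interest, in time order."""
    hits = [e for e in path if e.space_time.point_of_interest == poi]
    return [e.activity for e in sorted(hits, key=lambda e: e.space_time.t)]


# Invented sample values, for illustration only.
path = [
    PathElement(SpaceTime(3.0, 4.0, 20.0, "coffee"), "drink coffee"),
    PathElement(SpaceTime(0.0, 0.0, 5.0, "front door"), "walking"),
    PathElement(SpaceTime(3.1, 4.0, 12.0, "coffee"), "phone call"),
]
print(activities_at(path, "coffee"))  # ['phone call', 'drink coffee']
```

Keeping the point of interest inside the space-time element mirrors the definition, where the annotation belongs to the position and the activity is attached alongside it.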

4 STrDW Approach

The spread of ontologies in web applications and their use by different companies has led to the creation of a large amount of web data referencing ontologies. These data are called Ontology-Based Moving Object Data (OBMOD). Some solutions proposed to manage OBMOD, such as main-memory tools like Protégé, which are primarily used for designing ontologies, and triple stores like Jena TDB and Virtuoso, which are used for publishing triples, cannot offer acceptable performance for handling huge trajectory datasets. To overcome this problem, data warehousing solutions have been proposed that offer efficient storage and querying mechanisms for heterogeneous OBMOD. The resulting data warehouses are called Semantic Trajectory Data Warehouses (STrDWs), which store both ontology and trajectory data (Fig. 1). Our objective in this paper is to define an ontology-based design approach for modelling and analysing heterogeneous OBMOD. To fulfil this objective, we need to define (i) a global shared trajectory ontology that explains the semantics of these data and (ii) the structure of the STrDW conceptual model. The following subsections describe each step.

Fig. 1. Proposed framework

4.1 Trajectory Global Ontology

In this section, we define the global ontology. We adopt a modular approach to facilitate reusability and possible design extension. The proposed model holds three modules: Geometric Trajectory Ontology (GTO), Geographic Ontology (GO) and Application Domain Ontology (ADO).

Geometric Module. GTO holds resources that describe how the movement of moving objects can be understood and how trajectories can be represented. It covers the most important trajectory representations. A raw trajectory describes a trajectory as (position, timestamp) pairs. A structured trajectory organizes a trajectory into sub-trajectories, each of which includes a Begin, a set of Stops and Moves, and an End; it also provides a set of relations between concepts such as hasBegin, hasEnd, hasStop and hasMove. A trajectory with ROI represents a trajectory in the form of moving regions. In addition, GTO enriches structured trajectory and trajectory with ROI with geographical features through the relation hasGeometry, giving rise respectively to Semantic Trajectory and Semantic ROI. The proposed model also represents a trajectory in the form of a space-time path, which annotates semantic stops with the corresponding activities. This set of resources allows us to define a generic ontology-based geometric facet for trajectory data and supports linking trajectory concepts to application and geographical concepts (Fig. 2).

Fig. 2. Geometric Trajectory Ontology

Application Domain Module. ADO contains resources relevant to a field such as traffic management, bird migration or transportation. This module describes the mobile object (e.g., an animal or a person) and the possible activities related to its displacement, such as physical activities (e.g., reading) and virtual activities (e.g., mailing). The module also presents points of interest relevant to the application domain (e.g., a university).

Geographic Module. GO contains concepts about the geographic environment in which mobile objects are involved. Concepts are likely to include those describing the topography of the land (e.g., mountain, lake), networks (e.g., road network, railway network), building places (e.g., home, work, supermarket) and anything else that is of interest to the application. This module is closely related to the geometric trajectory module, as each trajectory concept that has a spatial implication is to be linked to a type of geography that is used by the application to specify the corresponding spatial measure. The module is also related to the application domain module, as its concepts may also have a thematic description providing application information beyond the geographic and geometric facets. For example, concepts about building places may include standard schemes defined in the geographic module in addition to other features specific to the application domain (Table 2).

Table 2. A description of geographic ontology

Combining the GTO, GO and ADO leads to the overall Generic Semantic Trajectory Ontology. This final ontology maintains interoperability since it follows a modular (domain-oriented) approach, ensures genericity as it covers the most important trajectory data works, and assures consensuality because it is based on conceptualizations commonly shared by the mobility data community.

4.2 STrDW Schema Design

In this section, we discuss the design of the STrDW, which is tailored around semantic concepts that allow the specification of the thematic and spatio-temporal aspects of the moving object and its trajectory. The STrDW design starts from the annotation process, using an ontology-based design methodology.

The works reviewed in the state of the art proposed ontology-based methods for the design of semantic data warehouses. Most of these works rely on a single domain ontology, and threshold values must be set by the designer for the annotation of the warehouse resources. This clearly increases the complexity of the design task for autonomous designers. In this work, the design of the STrDW is derived from the global ontology. This is done by importing all the resources related to the trajectory representation chosen by the designer (e.g., raw trajectory, space-time path) from the generic ontology defined in the previous section into the STrDW conceptual model. The extracted sub-ontology is called the Semantic Trajectory Data Warehouse Ontology (STrDWO). The extracted model then needs to be annotated with multidimensional roles such as fact, dimension, measure and dimension attribute. The annotation phase identifies the multidimensional role of each resource in the STrDWO.
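The extraction of the STrDWO can be pictured as a projection over the global ontology. The sketch below is only an illustration under strong assumptions: the ontology is reduced to a toy dictionary, "all resources related to the chosen representation" is interpreted as graph reachability, and relation names other than hasStop and hasGeometry (for example `hasActivity`, `hasPosition`, `hasDuration`) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy global ontology: concept -> list of (relation, target concept).
# The content is hypothetical and far smaller than a real ontology.
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "SpaceTimePath": [("hasStop", "SemanticStop"), ("hasActivity", "Activity")],
    "SemanticStop": [("isa", "Stop"), ("hasGeometry", "PointOfInterest")],
    "Stop": [("hasPosition", "Point"), ("hasDuration", "Interval")],
    "RawTrajectory": [("hasPosition", "Point"), ("hasTimestamp", "Instant")],
    "Interval": [("hasStart", "Instant"), ("hasEnd", "Instant")],
}


def extract_sub_ontology(root: str) -> Set[str]:
    """Collect every concept reachable from the chosen representation type."""
    seen = {root}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        for _relation, target in ONTOLOGY.get(concept, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(extract_sub_ontology("RawTrajectory")))
# ['Instant', 'Point', 'RawTrajectory']
```

Choosing "SpaceTimePath" as the root instead pulls in the stop, activity, point-of-interest and temporal concepts while leaving "RawTrajectory" out, which matches the idea that the designer's choice of representation determines the content of the STrDWO.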

Broadly, what is most evident about semantic models built around trajectory data is that they always contain spatio-temporal resources representing the time-varying geometry of trajectory data, in addition to a thematic part capturing its application-specific aspect. We therefore suppose here that a generic ontology is generally composed of four types of resources: fact, thematic, temporal and spatial, which are represented respectively within the following modules (sub-ontologies):

  • GTO: The fact is the trajectory representation type selected by the designer, e.g., structured trajectory;

  • ADO and GO: Application domain concepts gather the resources relevant to the mobile object, its activities during the travel and the visited points of interest;

  • Temporal Ontology: Temporal concepts and roles are based on the standard OWL-Time ontology (Footnote 1) developed by the W3C;

  • Spatial Ontology: Spatial concepts and roles are based on the Geo ontology (Footnote 2), likewise developed by the W3C.

The question to be asked here is then:

“How can the STrDW design model be extracted from these four parts so that the resulting model takes into account the special nature of disparate trajectory data?”

By analysing the spatial and temporal sub-ontologies of the generic ontology, the following temporal and spatial concepts are identified: Instant and Interval for the temporal sub-ontology, and Point, Line and Region for the spatial sub-ontology. An Interval is described by a Start Date and an End Date, both instances of the Instant concept. A first step of the annotation process for the STrDW is therefore to identify the concepts and relations of the GTO, ADO and GO sub-ontologies that can be assimilated to these spatial and temporal concepts.

Algorithm 1 is proposed to bring out these resources. Since the thematic part is the subject of analysis, the STrDW’s fact is extracted from the GTO concepts according to the type of trajectory chosen by the designer. A STrDW is also mainly composed of spatial and temporal dimensions, extracted from the GTO concepts that are assimilated to concepts of the spatial and temporal sub-ontologies. The fact measures are time and space, represented by spatial and temporal concepts from the GTO sub-ontology. Algorithm 1 constructs the fact and the spatial, temporal and thematic dimensions:

Algorithm 1. Construction of the fact and of the spatial, temporal and thematic dimensions
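The annotation step described in the text can be approximated by a simple rule-based pass over the STrDWO resources. The following Python sketch is not the algorithm of the paper: it is a hypothetical reading of the prose, in which the designer's chosen concept becomes the fact, concepts assimilated to the spatial or temporal sub-ontology become spatial or temporal dimensions, and ADO and GO concepts become thematic dimensions. The function name `annotate` and the "unclassified" fallback are assumptions.

```python
from typing import Dict

# Concepts of the spatial and temporal sub-ontologies named in the text.
SPATIAL = {"Point", "Line", "Region"}
TEMPORAL = {"Instant", "Interval"}


def annotate(resources: Dict[str, str], fact: str) -> Dict[str, str]:
    """Assign a multidimensional role to each STrDWO resource.

    `resources` maps a concept name to the module it comes from:
    "GTO", "ADO", "GO", "Spatial" or "Temporal" (hypothetical encoding).
    """
    roles: Dict[str, str] = {}
    for concept, module in resources.items():
        if concept == fact:
            roles[concept] = "fact"
        elif concept in SPATIAL or module == "Spatial":
            roles[concept] = "spatial dimension"
        elif concept in TEMPORAL or module == "Temporal":
            roles[concept] = "temporal dimension"
        elif module in ("ADO", "GO"):
            roles[concept] = "thematic dimension"
        else:
            # Left for the designer to decide; the text gives no rule here.
            roles[concept] = "unclassified"
    return roles


resources = {
    "SpaceTimePath": "GTO",
    "SemanticStop": "GTO",
    "Point": "Spatial",
    "Interval": "Temporal",
    "Pedestrian": "ADO",
    "PointOfInterest": "GO",
}
roles = annotate(resources, fact="SpaceTimePath")
print(roles["SpaceTimePath"], "|", roles["Pedestrian"])  # fact | thematic dimension
```

Resources that match none of the rules are flagged rather than guessed, since the multidimensional role of intermediate GTO concepts depends on the designer's requirements.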

5 Case Study: Edinburgh Informatics Forum

We illustrate the generic semantic trajectory modelling approach with a case study related to the Edinburgh Informatics Forum (Footnote 3) [8]. In the remaining subsections, we present the application scenario and derive the STrDW conceptual model.

5.1 Application Scenario

This subsection presents the application scenario. Our research is motivated by a scenario involving the trajectories of pedestrians walking through the Informatics Forum, the main building of the School of Informatics at the University of Edinburgh. The data covers several months of observation, which resulted in about 1000 observed trajectories each working day.

The source trajectory datasets consist of time-stamped locations. Additional information about the pedestrian and his or her activity during the trip is provided too. The main components of the trajectory dataset are [8]:

  • Reference\(_{i}\): the trajectory’s reference;

  • Long and Lat: respectively the longitude and latitude, i.e., the spatial coordinates of the pedestrian’s position;

  • Start-date and End-date: respectively the start and end temporal coordinates of a pedestrian’s trajectory.

The movement of pedestrians is still relatively poorly understood. In this work, our research team is interested in collecting and analysing data coming from these pedestrians in order to understand their behaviour from cognitive and analytic perspectives. Clearly, it is hard to exploit raw trajectory data to that end. For that reason, a semantic layer was added to the trajectory data and the prominent semantic components were revealed. The STrDW design we developed is tailored around the main concepts of the aforementioned ontological model, which is what allows it to support semantic trajectory concepts.

5.2 The Design Model

The STrDW model is derived from the already existing semantic layer including thematic, spatial and temporal ontologies. The designer identifies resources and their coordinates according to the mentioned trajectory representation type. For example, we consider in this case the trajectory representation type “Space-time path”. A Space-time path is defined as follows:

Space-time path  \(\underline{\texttt {equivalentTo}}\)  SemanticStop  \(\cup \)  Activity

Semantic Stop  \(\underline{\texttt {isa}}\)  Stop.hasGeometry PointofInterest

Stop  \(\underline{\texttt {isa}}\)  Point  \(\cap \) Interval

In addition, the designer instantiates the ADO according to the case study. In our case, the ADO comprises the mobile object (pedestrian), the activities (phone call, drink coffee, walking, eating) and the visited points of interest (stairs, night exit, coffee, elevator, labs, front door), as illustrated in Fig. 3.

Fig. 3. Application domain ontology

The ADO is linked to GTO (i) and GO (ii) respectively by using the following statements:

  (i) Pedestrian \(\underline{\texttt {hasTrajectory}}\) Trajectory

      Activity \(\underline{\texttt {equivalentTo}}\) ActivityS

      SemanticROI \(\underline{\texttt {IsLocatedIn}}\) PointOfInterest

      SemanticSub-Trajectory \(\underline{\texttt {IsLocatedIn}}\) PointOfInterest

      SpaceTimepath \(\underline{\texttt {IsLocatedIn}}\) PointOfInterest

  (ii) StreetA \(\underline{\texttt {equivalentTo}}\) StreetG

      PointOfInterest \(\underline{\texttt {equivalentTo}}\) BuildingPlace

The projection of resources then allows the extraction of the sub-ontology STrDWO from the global ontology. This step is of paramount importance because it later permits the definition of the STrDW conceptual model based on ontological concepts that express the user’s actual requirements (trajectory representation type) as closely as possible. In addition, the user’s requirements are also used for annotating the STrDWO with multidimensional concepts such as fact, dimension, measure and dimension attribute, which results in the STrDW conceptual model. A first possible design model for the application scenario is given in Fig. 4.

Fig. 4. Proposed model of the STrDW

5.3 Analysis

Here is a statement that incorporates a user requirement example:

Q:“Analyse pedestrian activities in a given time interval in a specific point of interest”

The result to analyse is the rate of the different pedestrian activities at a specific place and time. This result is quantified by metrics, in this case duration\(\_\)stop and time\(\_\)allocation. The criteria influencing this result are time, space and pedestrian characteristics (gender and age). The design model for the application scenario given in Fig. 4 involves numeric measures (duration\(\_\)stop and time\(\_\)allocation) and three dimensions:

  • Time-Dim: organized following the hierarchy: second, minute, hour, day, month and year;

  • Space-Dim: organized following the hierarchy: position, stop;

  • Pedestrian-Dim: represented by the pedestrian’s attributes: name, gender, age and identifier.
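As an illustration of how a dimension hierarchy is used, the sketch below rolls a timestamp up to a chosen level of the Time-Dim hierarchy. It is a minimal example with a hypothetical function name (`roll_up`) and an invented date; it relies only on Python's standard `datetime` module.

```python
from datetime import datetime

# Time-Dim levels from coarsest to finest, following the hierarchy above.
LEVELS = ["year", "month", "day", "hour", "minute", "second"]


def roll_up(ts: datetime, level: str) -> tuple:
    """Return the Time-Dim member of `ts` at the requested hierarchy level."""
    parts = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
    return parts[: LEVELS.index(level) + 1]


ts = datetime(2010, 5, 24, 14, 37, 9)  # invented timestamp
print(roll_up(ts, "hour"))  # (2010, 5, 24, 14)
```

Two facts that share the same rolled-up tuple fall into the same cell at that level, which is the basis for aggregating measures along the time dimension.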

Space-Dim is the spatial dimension of the model and contains two levels related to a spatial hierarchy. Those levels reference geometric objects. The aggregation function applied to the measure Activity-Rate is the rate of pedestrian activities (phone call, drink coffee, walking, eating), calculated, here for walking, as \(\text{Activity-rate} = \frac{\text{Walking-Sum}}{\text{All-Activities-Sum}}\). This custom aggregation function is implemented to take into account the requirements imposed by our model and its aims. The fact table is composed of the dimension keys at their lowest level, which form the symbolic coordinates of the measure value. In this model, activities are the subject of the multidimensional analysis, so the designer can deduce information about the activity of the pedestrian during a particular period of time and at a particular location in the forum.
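The Activity-rate aggregation can be sketched in a few lines of Python. This is an illustrative reading of the formula, assuming that the sums are taken over the duration\(\_\)stop measure and that the facts have already been filtered to the time interval and point of interest of the query; the function name `activity_rate` and the sample values are hypothetical.

```python
from typing import Iterable, Tuple


def activity_rate(facts: Iterable[Tuple[str, float]], activity: str) -> float:
    """Share of the total stop duration spent on one activity.

    Each fact is an (activity, duration_stop) pair. Summing duration_stop
    is an assumption of this sketch.
    """
    facts = list(facts)
    total = sum(duration for _name, duration in facts)
    if total == 0:
        return 0.0  # no recorded activity in the selected cell
    selected = sum(duration for name, duration in facts if name == activity)
    return selected / total


# Invented sample facts, for illustration only.
facts = [("walking", 30.0), ("phone call", 10.0), ("walking", 20.0), ("eating", 40.0)]
print(activity_rate(facts, "walking"))  # 0.5
```

Because the rate is a ratio of two sums, it is not additive across cells: when rolling up along a dimension, the numerator and denominator should be aggregated separately and divided afterwards.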

6 Conclusion and Future Work

Throughout this work, we have been motivated by the need to support applications dealing with heterogeneous trajectory data sources. To meet this need, we first presented a shared trajectory ontology that serves as a semantic layer from which designers can pick resources to represent their trajectories. Then, we offered an ontology-based STrDW approach for modelling and analysing heterogeneous OBMOD, allowing interoperability, reusability and maintenance between applications supporting trajectory data.

Research on this topic is crucial for expanding the usefulness of multidimensional models to non-traditional applications. The STrDW contains huge amounts of mobility data, so optimization issues are of paramount importance for both data storage and retrieval.