1 Ontology Based Data Access

Ontology Based Data Access (OBDA) [4] is a paradigm for accessing data through a conceptual layer. Usually, the conceptual layer is expressed in the form of an RDF(S) [10] or OWL [15] ontology, and the data is stored in relational databases. The terms in the conceptual layer are mapped to the data layer using mappings, which associate with each element of the conceptual layer a (possibly complex) SQL query over the data sources. The mappings have been formalized in the recent R2RML W3C standard [6]. Together with the data, the mappings define a virtual RDF graph, which can then be queried using an RDF query language such as SPARQL [7].

Formally, an OBDA system is a triple \(\mathcal{O}=\langle \mathcal{T},\mathcal{S},\mathcal{M}\rangle \), where:

  • \(\mathcal{T}\) is the intensional level of an ontology. We consider ontologies formalized in description logics (DLs), hence \(\mathcal{T}\) is a DL TBox.

  • \(\mathcal{S}\) is a relational database representing the sources.

  • \(\mathcal{M}\) is a set of mapping assertions, each one of the form

    $$ \varPhi (\varvec{x})~ \leftarrow ~ \varPsi (\varvec{x}) $$

    where

    • \({\varPhi (\varvec{x})}\) is a query over \(\mathcal{S}\), returning tuples of values for \(\varvec{x}\)

    • \({\varPsi (\varvec{x})}\) is a query over \(\mathcal{T}\) whose free variables are from \(\varvec{x}\).
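To make the definition concrete, the following minimal sketch models a single mapping assertion in plain Java, with no Ontop dependency. Everything in it is an illustrative assumption: the class names `Mapping` and `Triple`, the `title` table, the IRI template and the class `mo:Movie`. The rows passed to `virtualTriples` stand in for the answers of the source query over \(\mathcal{S}\); an actual OBDA system would normally not materialize the triples but use the mapping during query translation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MappingSketch {

    /** One triple of the virtual RDF graph. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    /** Hypothetical mapping assertion: a source query over S plus a triple template over x. */
    static final class Mapping {
        final String sourceSql;        // query over the database (not executed in this sketch)
        final String subjectTemplate;  // template with {column} placeholders
        final String predicate;
        final String objectTemplate;

        Mapping(String sourceSql, String subjectTemplate, String predicate, String objectTemplate) {
            this.sourceSql = sourceSql;
            this.subjectTemplate = subjectTemplate;
            this.predicate = predicate;
            this.objectTemplate = objectTemplate;
        }

        /** Instantiates the triple template with one answer tuple of the source query. */
        Triple apply(Map<String, String> row) {
            return new Triple(fill(subjectTemplate, row), predicate, fill(objectTemplate, row));
        }

        /** Replaces each {column} placeholder with the value of that column in the row. */
        private static String fill(String template, Map<String, String> row) {
            String result = template;
            for (Map.Entry<String, String> e : row.entrySet()) {
                result = result.replace("{" + e.getKey() + "}", e.getValue());
            }
            return result;
        }
    }

    /** Applies the mapping to every answer tuple, yielding the triples it contributes. */
    static List<Triple> virtualTriples(Mapping mapping, List<Map<String, String>> rows) {
        List<Triple> triples = new ArrayList<>();
        for (Map<String, String> row : rows) {
            triples.add(mapping.apply(row));
        }
        return triples;
    }

    public static void main(String[] args) {
        Mapping mapping = new Mapping(
                "SELECT id FROM title",
                "<http://example.org/movie/{id}>", "rdf:type", "mo:Movie");
        // Stand-in for the answers of mapping.sourceSql over the database.
        List<Map<String, String>> rows = List.of(Map.of("id", "1"), Map.of("id", "2"));
        for (Triple t : virtualTriples(mapping, rows)) {
            System.out.println(t);
        }
    }
}
```

Each answer tuple of the source query yields one instantiation of the template, which is how a set of mapping assertions induces a virtual RDF graph over the database.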

The main functionality of OBDA systems is query answering. A schematic description of the query transformation process (usually SPARQL to SQL) performed by a typical OBDA system is provided in Fig. 1. In such an architecture, queries posed over a conceptual layer are translated into a query language that can be handled by the data layer. The translation is independent of the actual data in the data layer. In this way, the actual query evaluation can be delegated to the system managing the data sources.

Fig. 1. Query processing in an OBDA system

2 The Ontop Framework

Ontop is an open-source OBDA framework released under the Apache license and developed at the Free University of Bozen-Bolzano (footnote 1). It currently acts as the query transformation module of the EU project Optique (footnote 2).

To the best of our knowledge, Ontop is the first OBDA system to support all of the following W3C recommendations: OWL, R2RML, SPARQL, SWRL, and the OWL 2 QL entailment regime of SPARQL. In addition, all the major commercial and free databases are supported. For each component of the OBDA system, Ontop supports the widely used standards:

  • Mapping. Ontop supports two mapping languages: (1) the native Ontop mapping language which is easy to learn and use and (2) the RDB2RDF Mapping Language (R2RML) which is a W3C recommendation.

  • Ontology. Ontop fully supports the OWL 2 QL ontology language [11], which is a superset of RDFS. OWL 2 QL is based on the DL-Lite family of description logics [5], which are lightweight ontology languages that guarantee that queries over the ontology can be rewritten into equivalent queries over the data source. Recently, Ontop has also been extended to support the linear recursive fragment of SWRL (Semantic Web Rule Language) [8, 16].

  • Data Source. Ontop supports all databases that implement SQL 99. These include all major relational database systems, e.g., PostgreSQL, MySQL, H2, DB2, Oracle, and MS SQL Server.

  • Query. Ontop supports essentially all the features of SPARQL 1.0 and the OWL 2 QL entailment regime of SPARQL 1.1 [9]. Support for other features of SPARQL 1.1 (e.g., aggregates, property path queries, negation) is ongoing work.

The core of Ontop is the SPARQL engine Quest, which supports the RDFS and OWL 2 QL entailment regimes by rewriting SPARQL queries (over the virtual RDF graph) into SQL queries (over the relational database). Ontop is able to generate efficient, highly optimized SQL queries [13, 14] that in some cases are very close to the SQL queries a database expert would write.
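The idea of answering a query under an entailment regime by rewriting can be illustrated with a deliberately small sketch. It handles only one kind of inference, class hierarchies, and it is not Quest's actual algorithm: a query for the instances of a class is expanded to the class and all its subclasses, and the source queries mapped to those classes are combined with UNION. The class names, the `title` table and its `kind` column are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class RewritingSketch {

    /** Returns the given class followed by all of its direct and indirect subclasses. */
    static List<String> withSubclasses(String cls, Map<String, List<String>> directSubclasses) {
        List<String> result = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (result.contains(current)) {
                continue; // already visited, guards against cycles in the hierarchy
            }
            result.add(current);
            for (String sub : directSubclasses.getOrDefault(current, List.of())) {
                todo.push(sub);
            }
        }
        return result;
    }

    /**
     * Translates the query "all instances of cls" into SQL:
     * rewrite w.r.t. the hierarchy, then replace each class by its mapped source query.
     * No data is accessed; only the ontology and the mappings are used.
     */
    static String rewriteToSql(String cls,
                               Map<String, List<String>> directSubclasses,
                               Map<String, String> classToSql) {
        List<String> parts = new ArrayList<>();
        for (String c : withSubclasses(cls, directSubclasses)) {
            String sql = classToSql.get(c);
            if (sql != null) { // classes without a mapping contribute no answers
                parts.add(sql);
            }
        }
        return String.join(" UNION ", parts);
    }

    public static void main(String[] args) {
        // Hypothetical TBox fragment: TVSeries is a subclass of Movie.
        Map<String, List<String>> hierarchy = Map.of("mo:Movie", List.of("mo:TVSeries"));
        // Hypothetical mappings from classes to source queries.
        Map<String, String> mappings = Map.of(
                "mo:Movie", "SELECT id FROM title WHERE kind = 'movie'",
                "mo:TVSeries", "SELECT id FROM title WHERE kind = 'tv series'");
        System.out.println(rewriteToSql("mo:Movie", hierarchy, mappings));
    }
}
```

Because the translation uses only the ontology and the mappings, the resulting SQL can be handed to the database as is, which is what makes delegating evaluation to the data layer possible.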

The Ontop framework can be used as:

  • a plugin for Protege 4 which provides a graphical interface for mapping editing and SPARQL query execution,

  • a Java library which implements both the OWL API and Sesame API interfaces, available as Maven dependencies, and

  • a SPARQL end-point through Sesame’s Workbench.

3 A Demo of the Movie Scenario

In this section, we describe a complete demo of Ontop using the movie scenario [12]. The datasets and systems are available online (footnote 3).

3.1 Movie Scenario Dataset

The Movie Ontology. The movie ontology MO aims to provide a controlled vocabulary to semantically describe movie-related concepts (e.g., Movie, Genre, Director, Actor) and the corresponding individuals (“Ice Age”, “Drama”, “Steven Spielberg” or “Johnny Depp”) [3]. The ontology contains concept hierarchies for movie categorization that enable user-friendly presentation of movie descriptions in the appropriate detail. We made several additions to the ontology terminology to meet the requirements of the demo, e.g., the concepts TVSeries and Actress.

IMDb Data. IMDb’s data is provided as text files (footnote 4), which need to be converted into an SQL file using a third-party tool. Our IMDb raw data was downloaded in 2010, and the SQL script was generated using IMDbPY (footnote 5). IMDbPY generates an SQL schema (tables) appropriate for storing IMDb data and then reads the IMDb plain text data files to generate the SQL INSERT commands that populate the tables. It can generate PostgreSQL, MySQL and DB2 SQL scripts. In this demo we use a PostgreSQL-compatible script, and the database takes up around 6 GB on disk.

Mappings. The mappings for this scenario associate the data in the SQL database with the movie ontology’s vocabulary. They are “natural” mappings, in the sense that their only purpose was to make it possible to query the data through the ontology. There was no intention to highlight the benefits of any algorithm or technique used in Ontop. The first version of the mappings for this scenario was developed by students of the Free University of Bozen-Bolzano as part of a lab assignment. The current mappings are an improved version created by our development team.

Queries. We included around 40 queries, stored in the file movieontology.q, which can be used to explore the data set. The queries vary in complexity, from very simple to fairly complex. Note that some form of inference (beyond simple query evaluation) is involved in most of these queries; in particular, hierarchies are often involved.

Fig. 2. Movie ontology

3.2 Using Protege Plugin

We demonstrate how to use Ontop as a Protege plugin. The steps are:

  1. Start PostgreSQL with IMDb data.

  2. Start Protege with the Ontop plugin from the command line.

  3. Open the OWL file movieontology.owl from Protege. The Ontop plugin will also automatically open the mapping file movieontology.obda and the query file movieontology.q.

  4. Check the ontology and mappings. Two screenshots of the ontology and mappings are shown in Figs. 2 and 3.

  5. Start the Quest reasoner from the menu.

  6. Run sample queries and check the generated SQL. For example, we can execute the query “Find names that act as both the director and the actor at the same time produced in Eastern Asia” as shown in Fig. 4.

Fig. 3. Movie mappings

Fig. 4. Example query

3.3 Using Java API

We show how the movie scenario can be implemented using the Ontop Java libraries through the OWL API and the Sesame API. The complete code for the demo is available online (footnote 6).

Using OWL API. The OWL API is a Java API and reference implementation for creating, manipulating and serializing OWL ontologies [2]. In the first example we use the OWL API to execute all 40 SPARQL queries over the movie ontology, using the mappings in our obda format and a PostgreSQL database with the IMDb data.

Ontop uses Maven to manage its dependencies. Since the release of version 1.10, Ontop itself has been deployed to the Maven Central repository. All artifacts share the groupId it.unibz.inf.ontop. In this example we use the OWL API interface of Ontop, so we put the following in the pom.xml:

figure a

Moreover we need the dependency for PostgreSQL JDBC driver as shown below.

figure b

The files needed to start the Ontop reasoner are the ontology file movieontology.owl and the obda file movieontology.obda. The obda file contains both the mappings and the database settings. This allows us to access the data in the PostgreSQL database using the mappings in the OBDA model. First we load the OWL file and the OBDA file:

figure c

Next we create a new instance of the reasoner (QuestOWL reasoner), adding the necessary preferences to prepare its configuration. We prepare the

figure d

Ontop supports a file format that holds multiple SPARQL queries. Here we execute each of the 40 queries in the file movieontology.q. Within the reasoner instance, each SPARQL query is translated into an SQL query, which is then used to retrieve the results from the PostgreSQL database. For simplicity, we only display to the user the number of results of each query and the time required for its execution.

figure e
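The reporting loop described above (number of results and execution time per query) can be approximated independently of any Ontop class. In the minimal sketch below, the executor passed to `runAll` is a hypothetical stand-in for the call that translates and evaluates one SPARQL query and returns its number of answers; the query identifiers, the query texts and the class names `QueryBatchSketch` and `Summary` are likewise assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

public class QueryBatchSketch {

    /** Number of answers and elapsed wall-clock time for one query. */
    static final class Summary {
        final int rows;
        final long millis;

        Summary(int rows, long millis) {
            this.rows = rows;
            this.millis = millis;
        }
    }

    /**
     * Runs every query through the given executor and records, per query id,
     * how many answers it returned and how long the call took.
     */
    static Map<String, Summary> runAll(Map<String, String> queries, ToIntFunction<String> executor) {
        Map<String, Summary> summaries = new LinkedHashMap<>(); // keeps the input order
        for (Map.Entry<String, String> query : queries.entrySet()) {
            long start = System.nanoTime();
            int rows = executor.applyAsInt(query.getValue());
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            summaries.put(query.getKey(), new Summary(rows, elapsedMillis));
        }
        return summaries;
    }

    public static void main(String[] args) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("q1", "SELECT ?x WHERE { ?x a mo:Movie }");
        queries.put("q2", "SELECT ?x WHERE { ?x a mo:Actor }");

        // Stub executor: a real one would evaluate the SPARQL query and count the answers.
        ToIntFunction<String> stub = sparql -> sparql.contains("mo:Movie") ? 3 : 0;

        runAll(queries, stub).forEach((id, s) ->
                System.out.println(id + ": " + s.rows + " results in " + s.millis + " ms"));
    }
}
```

Measuring around the whole call includes both the translation to SQL and the evaluation by the database; separating the two would require hooks that this sketch does not assume.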

At the end of the execution we close all connections and we dispose of the reasoner.

figure f

Using Sesame API. OpenRDF Sesame is a de-facto standard framework for processing RDF data and includes parsers, storage solutions (RDF databases, a.k.a. triplestores), reasoning and querying, using the SPARQL query language [1].

In the second example we show how to create a repository and execute a single query using Sesame API. First we need to add the Sesame API module of Ontop as a dependency to the pom file pom.xml.

figure g

Then we set up the repository and create a connection. The repositories must always be initialized first. We get the repository connection that will be used to execute the query.

figure h

We load the SPARQL file q1Movie.rq which contains the same query that we used for the Protege example.

figure i

Now we are ready to execute the query using the created Sesame repository connection and output the results of the SPARQL query retrieved from the database.

figure j

Finally we close all the connections and release the resources.

figure k
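One common way to make sure that results, connections and repositories are released even when a query fails is Java's try-with-resources statement, which closes resources in the reverse order of their declaration. The sketch below uses a hypothetical `Resource` class as a stand-in for the real objects; whether the corresponding Ontop or Sesame classes implement `AutoCloseable` depends on the library version, and where they do not, an equivalent try/finally block is needed.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {

    /** Hypothetical stand-in for a repository, connection or result object. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Opens the three resources, uses them, and returns the order of events. */
    static List<String> runQuery() {
        List<String> log = new ArrayList<>();
        try (Resource repository = new Resource("repository", log);
             Resource connection = new Resource("connection", log);
             Resource result = new Resource("result", log)) {
            log.add("iterate results");
        } // closed here in reverse order: result, connection, repository
        return log;
    }

    public static void main(String[] args) {
        runQuery().forEach(System.out::println);
    }
}
```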