
1 Introduction and Motivation

The second version of the Web Ontology Language (OWL 2) offers three profiles that provide significant advantages in different application scenarios: OWL 2 EL, OWL 2 RL and OWL 2 QL. All of them are defined as syntactic restrictions of OWL 2 [11] with different computational complexity. OWL 2 RL, which we focus on, allows polynomial-time reasoning algorithms to be implemented in a standard rule engine. Moreover, this profile has been designed so that reasoning tasks can be performed in a forward chaining rule system by implementing a set of predefined rules. However, a naive implementation of an OWL 2 RL reasoner is known to perform poorly with large ABoxes [7]. Furthermore, the official list (Footnote 1) of OWL 2 reasoners supporting OWL 2 RL is limited, and there is a lack of tools that can generate rules for different rule engines.

Usually, a rule-based system processes data only in its working memory, which is limited by the available RAM. With forward chaining (bottom-up evaluation), which is commonly used in reasoning tasks, the user obtains conclusions as a set of inferred facts. In this set, it is hard to find the particular facts the user is interested in. Thus, a query has to be executed in order to obtain the necessary results, which is better than looking through the working memory manually. Moreover, the forward chaining approach performs reasoning with all facts in the working memory. Therefore, some of the inferred facts are useless and many rules are fired unnecessarily, which decreases the efficiency of query answering. One way of increasing efficiency and scalability is to store data outside the working memory and to load facts only when they are needed. This can improve the scalability and efficiency of both reasoning and query answering.

These issues motivate us to provide an easy-to-use framework for performing ABox reasoning with OWL 2 RL ontologies in any forward chaining rule engine. Moreover, we want to support efficient query answering over relational data that is semantically described by mappings between an ontology and a database schema. In this paper we focus on query answering with OWL 2 RL ontologies executed by forward chaining rule engines. However, the presented approach can also be applied to ontologies that are more expressive than OWL 2 RL. This is possible because we first use HermiT (Footnote 2) to perform TBox reasoning (with the terminological part of an ontology). Then, we start ABox reasoning (with the assertional part of the ontology). Thus, we can employ a rule-based engine to execute reasoning and query answering.

The main goal of this paper is to present the database connectivity of the RuQAR (Rule-based Query Answering and Reasoning) framework, in which query answering and reasoning can be performed using Drools (Footnote 3) and Jess (Footnote 4). Moreover, we present an evaluation of RuQAR using the LUBM ontology benchmark [6].

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 presents an overview of the translation of an OWL 2 RL ontology into rules. Relational database access is presented in Sect. 4, whereas Sect. 5 describes the implementation details and the experimental evaluation of RuQAR. Section 6 contains concluding remarks and a description of future work.

2 Related Work

A method for storing the ABox as well as the reasoning results in a relational database is described in [4]. The presented OwlOntDB system proposes a novel database-driven forward chaining method that performs scalable reasoning over OWL 2 RL ontologies with large ABoxes. However, OwlOntDB does not support query answering “on-the-fly”: adding a single fact requires reasoning and materialization to be performed again. Otherwise, an answer may be incomplete or unsound.

OWL 2 RL rule-based reasoners are presented in [12]. In this case, Jess and Drools perform inferences with rules that directly represent the semantics of the OWL 2 RL Profile. As a result, these rules can be regarded as naive ones. Moreover, this approach applies a non-triple-based representation of facts and of patterns in rules, which makes it difficult to use in other applications.

In [9], another scalable OWL 2 RL reasoner is presented, in which the inference engine is implemented within the Oracle database system. The proposed reasoner introduces novel techniques for parallel processing, with special optimizations for computing the owl:sameAs property. However, this approach does not support “on-the-fly” query answering either.

The most closely related work regarding ontology transformation is the approach employed in DLEJena [10]. However, DLEJena can use only one reasoning tool (Jena, in contrast to Jess and Drools in our case). Moreover, we employ a slightly different translation approach: DLEJena uses template rules to produce instantiated rules, whereas we provide a Java-based generation of rules. Our approach does not produce redundant instantiated rules as in [10]. Furthermore, in DLEJena the entailment rules are created at runtime, whereas RuQAR produces ABox rules ahead of the reasoning process.

3 Ontology Translation Method

When applying a rule engine to ontology-based reasoning, one needs to translate the ontology into rules and facts. In our previous work [2] we proposed an approach that splits such reasoning into two successive processes: TBox reasoning (which solves the concept subsumption problem) and ABox reasoning (which solves the instance checking problem). Moreover, we provided a method of translating an OWL 2 based ontology into two sets: one of rules and one of facts. In this section we give an overview of that approach, which is necessary to understand the following sections; more details can be found in [1, 2].

Since we focus on executing rule-based reasoning with different rule engines, we proposed the Abstract Syntax of Rules and Facts (ASRF). Rules and facts generated by our translation method are first expressed in ASRF. Then, the ASRF expressions have to be translated into the native language of the chosen rule engine. RuQAR provides the translation into Jess and Drools out of the box.
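To make the last translation step more concrete, the following minimal sketch renders one triple-based rule, held in a small ASRF-like structure, as Jess and Drools source text. It is an illustration under stated assumptions, not RuQAR's generator: the `TriplePattern` and `Rule` classes are hypothetical, the Jess output assumes a deftemplate `triple` with `subject`, `predicate` and `object` slots, and the Drools output assumes a Java class `Triple` with matching getters and a three-argument constructor. The example rule swaps subject and object of hasCousin triples.

```python
from dataclasses import dataclass
from typing import Tuple

FIELDS = ("subject", "predicate", "object")


@dataclass(frozen=True)
class TriplePattern:
    # Each position holds a variable ("?w") or a constant ("hasCousin").
    s: str
    p: str
    o: str

    def terms(self) -> Tuple[str, str, str]:
        return (self.s, self.p, self.o)


@dataclass(frozen=True)
class Rule:
    # Hypothetical ASRF-like rule: all body patterns must match; head patterns are asserted.
    name: str
    body: Tuple[TriplePattern, ...]
    head: Tuple[TriplePattern, ...]


def is_var(term: str) -> bool:
    return term.startswith("?")


def jess_term(term: str) -> str:
    # Jess variables keep their '?' prefix; constants become string literals.
    return term if is_var(term) else '"' + term + '"'


def drl_term(term: str) -> str:
    # DRL bindings conventionally use a '$' prefix; constants become string literals.
    return "$" + term[1:] if is_var(term) else '"' + term + '"'


def to_jess(rule: Rule) -> str:
    """Render the rule against an assumed Jess deftemplate `triple` with three slots."""
    def pattern(tp: TriplePattern) -> str:
        slots = " ".join(f"({f} {jess_term(t)})" for f, t in zip(FIELDS, tp.terms()))
        return f"(triple {slots})"

    lhs = " ".join(pattern(tp) for tp in rule.body)
    rhs = " ".join(f"(assert {pattern(tp)})" for tp in rule.head)
    return f"(defrule {rule.name} {lhs} => {rhs})"


def to_drools(rule: Rule) -> str:
    """Render the rule against an assumed Java class Triple(subject, predicate, object)."""
    bound = set()
    lines = [f'rule "{rule.name}"', "when"]
    for tp in rule.body:
        constraints = []
        for f, t in zip(FIELDS, tp.terms()):
            if is_var(t) and t not in bound:
                bound.add(t)
                constraints.append(f"{drl_term(t)} : {f}")   # first occurrence binds
            else:
                constraints.append(f"{f} == {drl_term(t)}")  # constant or join
        lines.append(f"    Triple( {', '.join(constraints)} )")
    lines.append("then")
    for tp in rule.head:
        args = ", ".join(drl_term(t) for t in tp.terms())
        lines.append(f"    insert( new Triple( {args} ) );")
    lines.append("end")
    return "\n".join(lines)


SYMMETRIC_HAS_COUSIN = Rule(
    name="prp-symp-hasCousin",
    body=(TriplePattern("?w", "hasCousin", "?z"),),
    head=(TriplePattern("?z", "hasCousin", "?w"),),
)

print(to_jess(SYMMETRIC_HAS_COUSIN))
print(to_drools(SYMMETRIC_HAS_COUSIN))
```

Keeping the rule engine-neutral and adding one small renderer per engine mirrors the role the text assigns to ASRF; the actual ASRF syntax and RuQAR's generated code may differ.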

The translation schema of an OWL 2 ontology into ASRF sets is presented in Fig. 1. It consists of the following steps:

  1. An OWL 2 ontology is loaded into the HermiT engine, under the assumption that the ontology is consistent.

  2. TBox reasoning is executed by HermiT. As a result, a new, classified version of the ontology (a new TBox) is obtained.

  3. The ontology is translated into rules and facts expressed in ASRF. If the ABox is empty, the set of facts is also empty.

Fig. 1. Translation schema of an OWL 2 ontology into the ASRF syntax.

According to this schema, the two ASRF sets separate the TBox part (the set of rules) from the ABox part (the set of facts) of an ontology. Thus, we are able to perform ABox reasoning with a forward chaining rule engine after translating both ASRF sets into the engine’s language.
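The separation can be illustrated with a deliberately small sketch. It assumes a hypothetical dictionary format for the ontology and handles only SubClassOf axioms; the transitive closure computed by `classify` merely stands in for HermiT's classification, which covers far more than subsumption chains. Rules come only from the classified TBox, facts only from the ABox, so an empty ABox yields an empty fact set.

```python
def classify(subclass_axioms):
    """Stand-in for HermiT: transitive closure of (sub, super) SubClassOf pairs."""
    closure = set(subclass_axioms)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure
                   if b == c and a != d}
        if derived <= closure:
            return closure
        closure |= derived


def translate(ontology):
    """Split an ontology into (rules, facts): rules from the TBox, facts from the ABox."""
    tbox = classify(ontology["subClassOf"])
    # One instantiated rule per (sub, super) pair of the classified TBox.
    rules = sorted(f"(?x rdf:type {sub}) -> (?x rdf:type {sup})" for sub, sup in tbox)
    facts = sorted(ontology["abox"])
    return rules, facts


# Hypothetical toy ontology; consistency is assumed, as in step 1 above.
ONTOLOGY = {
    "subClassOf": {("GraduateStudent", "Student"), ("Student", "Person")},
    "abox": [("alice", "rdf:type", "GraduateStudent")],
}
rules, facts = translate(ONTOLOGY)
for r in rules:
    print(r)
print(facts)
```

Because classification already adds GraduateStudent ⊑ Person, a single rule derives that alice is a Person without chaining two rules at reasoning time.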

After classification by HermiT, an OWL 2 ontology is translated into a set of ASRF rules as follows. For each supported OWL 2 RL/RDF rule and each corresponding OWL 2 RL axiom in the given ontology, a rule that reflects this axiom is created. In other words, rather than transforming the semantics of OWL 2 RL into rules, we create rules according to this semantics combined with the given ontology. For example, when the object property hasCousin is defined as a SymmetricObjectProperty, our method generates a rule that follows the semantics of this property. As a result, whenever an instance of hasCousin occurs, the symmetric instance should be inferred (the following abbreviations are used: S for Subject, P for Predicate and O for Object):

$$\begin{aligned}&If \qquad (Triple \ \ (S \ \ ?w) \ (P \ \ ``hasCousin") \ (O \ \ ?z))\\ \nonumber&Then \quad (Triple \ \ (S \ \ ?z) \ \ (P \ \ ``hasCousin") \ (O \ \ ?w)) \end{aligned}$$
(1)

Rule (1) therefore reflects the semantics of the prp-symp rule from Table 5 in [11]. Such semantically equivalent rules, containing a direct reference to the given ontology, are created for each OWL 2 RL axiom that exists in this ontology. Each generated rule is an instantiated version of the corresponding OWL 2 RL/RDF rule for a particular TBox. Consequently, the generated rules should be regarded as rules related to ontology instances (i.e. instantiated rules or ABox rules). We call these rules ontology-dependent, since they express the semantics of a particular TBox and are intended for ABox reasoning (with facts). Thus, the rules can be applied directly in a forward chaining rule engine after translation from ASRF into the engine’s language, which enables reasoning with the assertional part. This approach has a positive impact on reasoning efficiency, since the semantics of the TBox is directly reflected in the generated rules. Furthermore, the average number of conditions in rule bodies is smaller than in the corresponding OWL 2 RL/RDF rules, which further increases reasoning efficiency.
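The difference between the generic prp-symp rule and its instantiated form can be checked with a toy forward chainer. This is a minimal sketch, not RuQAR code: the tuple-based rule format and the naive fixpoint loop are our own assumptions. The generic rule needs two conditions (one of them matching the TBox declaration), while the instantiated rule for hasCousin needs one, and both derive the same ABox facts.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` matches `fact`; return None on conflict."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def forward_chain(rules, facts):
    """Naive forward chaining to a fixpoint; each rule is a (body, head) pair."""
    memory = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            bindings = [{}]
            for condition in body:  # join the body conditions left to right
                extended = []
                for b in bindings:
                    for fact in memory:
                        m = match(condition, fact, b)
                        if m is not None:
                            extended.append(m)
                bindings = extended
            derived |= {tuple(b[t] if is_var(t) else t for t in head) for b in bindings}
        if derived <= memory:
            return memory
        memory |= derived


# Generic prp-symp: T(?p, rdf:type, owl:SymmetricProperty), T(?x, ?p, ?y) -> T(?y, ?p, ?x)
GENERIC_PRP_SYMP = ([("?p", "rdf:type", "owl:SymmetricProperty"), ("?x", "?p", "?y")],
                    ("?y", "?p", "?x"))


def instantiate_prp_symp(tbox):
    """One single-condition rule per property declared symmetric in the TBox."""
    return [([("?x", prop, "?y")], ("?y", prop, "?x"))
            for prop, kind, cls in sorted(tbox)
            if kind == "rdf:type" and cls == "owl:SymmetricProperty"]


TBOX = {("hasCousin", "rdf:type", "owl:SymmetricProperty")}
ABOX = {("ann", "hasCousin", "bob")}

generic = forward_chain([GENERIC_PRP_SYMP], TBOX | ABOX) - TBOX
instantiated_rules = instantiate_prp_symp(TBOX)
instantiated = forward_chain(instantiated_rules, ABOX)
print(sorted(instantiated))
```

The instantiated rule no longer needs the TBox triple in working memory, which is one way to see why ontology-dependent rules have fewer conditions.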

The current RuQAR implementation lacks support for some rules defined in the OWL 2 specification [11]. We decided to use the simplest subset of OWL 2 RL/RDF rules, which is easily implementable in any reasoning engine. Moreover, we excluded “constraint” rules (e.g. cls-nothing2 from Table 6 in the OWL 2 RL Profile) and rules that are not used in ABox reasoning (e.g. the rules from Table 9 in [11]). Nevertheless, some rules still need to be implemented, e.g. cls-maxqc3 from Table 6 in [11].

Our translation method may provide more entailments during reasoning than those represented by the OWL 2 RL/RDF rules. This results from the fact that we apply a DL-based reasoner and perform TBox reasoning first; the extent of this gain, however, depends on the expressivity of the given ontology. On the other hand, applying our approach to an ontology that contains expressions beyond the OWL 2 RL Profile will not provide all the entailments derived by an appropriate DL-based reasoner. In that case, reasoning with RuQAR is sound but not complete. We observed this in our evaluation with LUBM ontologies, where all results obtained with RuQAR were within the entailments derived by Pellet (Footnote 5). This is the expected result, since the constructs used in the LUBM ontologies go beyond OWL 2 RL axioms.

4 Mapping Ontology Predicates to Relational Data

In order to enable semantic access to relational data, it is necessary to express relational concepts in terms of ontology concepts, that is, to define mappings between a relational schema and ontology classes (concepts) and relations (roles). Given such mappings, one can transform relational data into RDF triples and process that copy in semantic applications. This method has an obvious drawback: the copy has to be kept synchronized with the database. Another method is to create a data adapter based on query rewriting. Such adapters rewrite SPARQL [5] queries into SQL [3] queries and execute them in an RDBMS. This method can be fast in data retrieval, but without a reasoner the full potential of the ontology cannot be exploited. The third method is to generate semantic data from relational data “on-the-fly”, on demand of the requesting application, and then to process that data with a reasoner. We use this third method to bridge the gap between the representation of relational data and semantically described data.

A key step in our mapping approach is linking data stored in a relational database to a knowledge base (an ontology). We accomplish this by creating mapping rules that contain SQL queries in their heads. These rules serve as mappings that relate ontology predicates to the corresponding database.

In our mapping method we assume that the ontology which will be used and translated into rules is properly constructed (i.e. the ontology can be classified without inconsistencies). Then, predicate-database mappings have to be defined. Each predicate-database mapping is defined as a rule of the following form:

$$\begin{aligned} Ontology\_predicate \ \rightarrow \ SQL\_statement \end{aligned}$$
(2)

Each SQL statement should return rows with one, two or three columns, representing instances of a class, a property or a triple, respectively. The body of each mapping rule contains an ontology element, which is instantiated when the SQL query residing in the head is executed. In other words, every execution of an SQL query provides a set of RDF triples. We assume that every SQL query has one of the following permissible forms (query patterns):

$$\begin{aligned}&SELECT \ Col_1 \ FROM \ * \ WHERE \ (Col_1 \ is \ not \ NULL); \end{aligned}$$
(3)
$$\begin{aligned}&SELECT \ Col_1, \ Col_2 \ FROM \ * \ WHERE \nonumber \\&\ ((Col_1 \ is \ not \ NULL) \ AND \ (Col_2 \ is \ not \ NULL)); \end{aligned}$$
(4)
$$\begin{aligned}&SELECT \ Col_1, \ Col_2, \ Col_3 \ FROM \ * \ WHERE \nonumber \\&\ ((Col_1 \ is \ not \ NULL) \ AND \ (Col_2 \ is \ not \ NULL) \ AND \ (Col_3 \ is \ not \ NULL)); \end{aligned}$$
(5)

where:

  • \(Col_1,\ Col_2,\ Col_3\) are the attributes (columns) that occur in the result of a query,

  • \(*\) is an SQL expression that can contain any SQL commands available in the SQL server, e.g. a table name or a nested SELECT query,

  • \((Col_x \ is \ not \ NULL)\) means that NULL values are not allowed in the results.
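A minimal sketch of how patterns (3) to (5) could be generated and how their rows could be read as triples follows. The function names, the dictionary of mappings and the class/property names are hypothetical; `source` simply fills the * position. One-column rows become class assertions, two-column rows become property assertions, and three-column rows are taken as triples.

```python
def query_pattern(columns, source):
    """Build pattern (3), (4) or (5); `source` fills the * position."""
    if not 1 <= len(columns) <= 3:
        raise ValueError("query patterns have one, two or three columns")
    where = " AND ".join(f"({c} is not NULL)" for c in columns)
    if len(columns) > 1:
        where = f"({where})"
    return f"SELECT {', '.join(columns)} FROM {source} WHERE {where};"


def rows_to_triples(predicate, rows):
    """Read result rows of a mapping query as RDF-like triples."""
    triples = set()
    for row in rows:
        if len(row) == 1:    # pattern (3): instance of the class `predicate`
            triples.add((row[0], "rdf:type", predicate))
        elif len(row) == 2:  # pattern (4): instance of the property `predicate`
            triples.add((row[0], predicate, row[1]))
        elif len(row) == 3:  # pattern (5): the row already is a triple
            triples.add(tuple(row))
        else:
            raise ValueError("unexpected number of columns")
    return triples


# Hypothetical mapping rules: ontology predicate -> SQL query.
MAPPINGS = {
    "Employee": query_pattern(["IDEmployee"], "Employee"),
    "worksFor": query_pattern(["IDEmployee", "IDCompany"], "Employee"),
}
for predicate, sql in MAPPINGS.items():
    print(predicate, "->", sql)
```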

The default meaning of a query pattern is to provide access to a relational database by obtaining the results of the query execution. Pattern (3) can be used to obtain instances of a class, pattern (4) gathers instances of a property, whereas pattern (5) aims at loading arbitrary triples. The patterns are designed so that they can easily be executed with or without values. Executing a pattern without values returns all mapped instances. Otherwise, only the requested instances are returned (if they occur in the database). In this way, values can be perceived as constraints. For example, let us assume that we have the following pattern for the property worksFor:

$$\begin{aligned}&SELECT \ IDEmployee, \ IDCompany \ FROM \ Employee \ WHERE \nonumber \\&((IDEmployee \ is \ not \ NULL) \ AND \ (IDCompany \ is \ not \ NULL)); \end{aligned}$$
(6)

Pattern (6) obtains instances of the worksFor relation between an employee’s id and a company’s id. If we do not know any value of the columns used (IDEmployee or IDCompany), we execute a query that looks exactly like pattern (6). However, when values of the variables are known, we execute queries such as the following:

$$\begin{aligned}&SELECT \ IDEmployee, \ IDCompany \ FROM \ Employee \ WHERE \nonumber \\&((IDEmployee \ in \ (4,\ 5,\ 6)) \ AND \ (IDCompany \ is \ not \ NULL)); \end{aligned}$$
(7)
$$\begin{aligned}&SELECT \ IDEmployee, \ IDCompany \ FROM \ Employee \ WHERE \nonumber \\&((IDEmployee \ is \ not \ NULL) \ AND \ (IDCompany \ in \ (11,\ 13,\ 15))); \end{aligned}$$
(8)

In query (7) the IDEmployee values 4, 5 and 6 are known, whereas in query (8) the IDCompany values 11, 13 and 15 are known. As a result, we obtain only those instances of the worksFor property that contain these values. We can also execute a query, such as (9), that contains values for both columns:

$$\begin{aligned}&SELECT \ IDEmployee, \ IDCompany \ FROM \ Employee \ WHERE \nonumber \\&((IDEmployee \ in \ (4)) \ AND \ (IDCompany \ in \ (100))); \end{aligned}$$
(9)
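Queries (6) to (9) differ only in which "is not NULL" conditions are replaced by IN lists. The sketch below, run against an in-memory SQLite table with hypothetical rows, shows one way to instantiate a pattern from known values. It is an assumption-laden illustration rather than RuQAR's JDBC code; in particular, it passes the known values as bound parameters (`?` placeholders) instead of splicing literals into the SQL text, a choice that avoids quoting and injection issues.

```python
import sqlite3


def constrained_query(columns, source, known=None):
    """Instantiate a query pattern; columns with known values get an IN (...) filter.

    Returns (sql, params); values are passed as bound parameters instead of literals.
    """
    known = known or {}
    conditions, params = [], []
    for c in columns:
        values = known.get(c)
        if values:
            placeholders = ", ".join("?" for _ in values)
            conditions.append(f"({c} in ({placeholders}))")
            params.extend(values)
        else:
            conditions.append(f"({c} is not NULL)")
    where = " AND ".join(conditions)
    if len(columns) > 1:
        where = f"({where})"
    return f"SELECT {', '.join(columns)} FROM {source} WHERE {where};", params


# Hypothetical Employee table: one row has a NULL company, which every pattern filters out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (IDEmployee INTEGER, IDCompany INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(4, 11), (5, 13), (6, None), (7, 15), (8, 20)])

COLS = ["IDEmployee", "IDCompany"]
q6 = conn.execute(*constrained_query(COLS, "Employee")).fetchall()
q7 = conn.execute(*constrained_query(COLS, "Employee", {"IDEmployee": [4, 5, 6]})).fetchall()
q8 = conn.execute(*constrained_query(COLS, "Employee", {"IDCompany": [11, 13, 15]})).fetchall()
q9 = conn.execute(*constrained_query(COLS, "Employee",
                                     {"IDEmployee": [4], "IDCompany": [100]})).fetchall()
```

With these sample rows, employee 6 never appears because its company is NULL, and query (9) returns no rows because employee 4 is not linked to company 100.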

Each mapping rule that follows our method includes an SQL pattern (query) in the head, while the body contains an ontology predicate (class or property) or a triple. Whenever an instance of a class, property or triple is required, we add a special trigger fact that activates the corresponding rule. When the rule fires, the corresponding SQL query is executed and the results are added to the reasoning engine. As a result, the query answering process fires rules with SQL queries only when data actually need to be accessed. It is important to note that queries that follow our patterns may combine data from different tables, so complex queries can be executed, e.g. nested SELECT statements (placed in the * position).
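The on-demand behaviour can be imitated with a toy working memory. This is a minimal sketch under explicit assumptions: `LazyWorkingMemory` and its `request` method are hypothetical, `request` stands in for asserting a trigger fact and firing the matching mapping rule, and only class and property mappings are handled. It shows two properties described in the text: only the requested predicate is loaded, and a second request sees rows inserted in the meantime without re-materializing anything else.

```python
import sqlite3


class LazyWorkingMemory:
    """Toy working memory in which mapping rules fire only on demand.

    `request(predicate)` stands in for asserting a trigger fact: the mapping rule
    for that predicate fires, runs its SQL query and replaces previously loaded
    facts for the predicate with the rows currently in the database.
    Only class (one-column) and property (two-column) mappings are handled.
    """

    def __init__(self, conn, mappings):
        self.conn = conn
        self.mappings = mappings  # predicate -> SQL query (hypothetical format)
        self.facts = set()
        self.fired = []           # log of mapping rules that fired

    @staticmethod
    def _about(fact, predicate):
        _, p, o = fact
        return p == predicate or (p == "rdf:type" and o == predicate)

    def request(self, predicate):
        self.fired.append(predicate)
        self.facts = {f for f in self.facts if not self._about(f, predicate)}
        for row in self.conn.execute(self.mappings[predicate]):
            if len(row) == 1:
                self.facts.add((row[0], "rdf:type", predicate))
            else:
                self.facts.add((row[0], predicate, row[1]))
        return {f for f in self.facts if self._about(f, predicate)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (IDEmployee INTEGER, IDCompany INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(4, 11), (5, None)])
MAPPINGS = {
    "Employee": "SELECT IDEmployee FROM Employee WHERE (IDEmployee is not NULL);",
    "worksFor": "SELECT IDEmployee, IDCompany FROM Employee "
                "WHERE ((IDEmployee is not NULL) AND (IDCompany is not NULL));",
}
wm = LazyWorkingMemory(conn, MAPPINGS)
first = wm.request("worksFor")                        # only worksFor is loaded
conn.execute("INSERT INTO Employee VALUES (6, 13)")   # the database changes
second = wm.request("worksFor")                       # re-queried on the current state
print(first, second, wm.fired)
```

A real rule engine would also run the ontology-dependent rules over the loaded facts; the sketch covers only the loading side.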

5 Implementation and Experiments

RuQAR implements our approach of translating OWL 2 RL ontologies into sets expressed in the ASRF syntax. The database interface, including our mapping method, is also provided; the current version supports JDBC connectivity. The tool is developed in Java. RuQAR is able to execute ABox reasoning and query answering with two state-of-the-art rule engines: Drools and Jess. Moreover, RuQAR can be used as a library in applications that require efficient ABox reasoning. The tool uses the OWL API [8] to load and process ontology files. We use Drools 5.5 and Jess 7.1, and we employ MS SQL Server 2012 to store an ontology in a relational database.

RuQAR also supports the automatic transformation of ontology data into a relational database. However, the transformation is a very basic one: the ABox part of an ontology is transformed into a database with one table containing three columns, namely subject, predicate and object. Thus, it is easy to automatically generate mappings between an ontology and the corresponding database. The transformation and the generation of mapping rules require only access to a database server and an ontology; both processes can then be executed automatically.
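A sketch of this basic transformation and of automatic mapping generation follows, using SQLite instead of MS SQL Server. The table name `Triples`, the helper names, the sample ABox and the choice to derive the class and property lists from the ABox itself are all assumptions for illustration. Each generated query follows pattern (3) or (4), with a nested SELECT in the * position.

```python
import sqlite3


def store_abox(conn, abox):
    """Copy ABox triples into a single three-column table."""
    conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", abox)


def sql_literal(name):
    # Quote an ontology name as an SQL string literal, doubling any single quotes.
    return "'" + name.replace("'", "''") + "'"


def generate_mappings(abox):
    """One mapping per class and property that occurs in the ABox."""
    classes = sorted({o for s, p, o in abox if p == "rdf:type"})
    properties = sorted({p for s, p, o in abox if p != "rdf:type"})
    mappings = {}
    for c in classes:  # pattern (3) with a nested SELECT in the * position
        mappings[c] = ("SELECT subject FROM (SELECT subject FROM Triples "
                       f"WHERE predicate = 'rdf:type' AND object = {sql_literal(c)}) AS t "
                       "WHERE (subject is not NULL);")
    for p in properties:  # pattern (4)
        mappings[p] = ("SELECT subject, object FROM (SELECT subject, object FROM Triples "
                       f"WHERE predicate = {sql_literal(p)}) AS t "
                       "WHERE ((subject is not NULL) AND (object is not NULL));")
    return mappings


ABOX = [("alice", "rdf:type", "Student"),
        ("bob", "rdf:type", "Student"),
        ("alice", "takesCourse", "course1")]
conn = sqlite3.connect(":memory:")
store_abox(conn, ABOX)
MAPPINGS = generate_mappings(ABOX)
students = sorted(row[0] for row in conn.execute(MAPPINGS["Student"]))
courses = conn.execute(MAPPINGS["takesCourse"]).fetchall()
print(students, courses)
```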

We evaluated RuQAR’s query answering feature using the LUBM test ontology taken from the KAON2 website (Footnote 6). We used different datasets (LUBM_0, ..., LUBM_4), where a higher number means a larger ABox. Because of limited space, we present here only the results for the largest set. We performed the evaluation with the following engines: Jess, Drools and Pellet. Our tests were executed on a Windows 10 desktop machine with an i7-4820K CPU at 3.7 GHz and Java 1.7 update 79, with the maximum heap space set to 15 GB.

Our evaluation covers the execution of 14 LUBM queries in two cases: (i) TBox and ABox are stored in main memory, and (ii) TBox is stored in main memory while ABox resides in a relational database. The evaluation schema for the first case (called the IM case) was as follows:

  1. Perform TBox reasoning with HermiT.

  2. Transform the classified ontology into ASRF rules and facts.

  3. Generate rules and facts for a rule engine.

  4. Load rules and facts into the rule engine.

  5. Run reasoning.

  6. Execute queries.

The evaluation schema for the second case (called the DB case) was as follows:

  1. Store the ABox part of the ontology in a corresponding relational database.

  2. Generate mapping rules for each class and property.

  3. Perform TBox reasoning with HermiT.

  4. Transform the classified ontology into ASRF rules.

  5. Generate rules for a rule engine.

  6. Load both sets of rules (the generated rules and the mapping rules) into the rule engine.

  7. Execute queries and perform reasoning.

Both cases were applied to Jess and Drools. For the Pellet engine, we loaded the ontology, performed TBox and ABox reasoning separately, and then executed the 14 LUBM queries.

It is worth noting that in the DB case, reasoning was performed only during the execution of queries. This means that this kind of execution should be regarded as top-down or goal-oriented reasoning. In the other cases, we had to perform reasoning first before we were able to execute queries (without reasoning, we would not be able to obtain complete results).

In each case we recorded reasoning times and query answering times, and counted the results. We executed tests both with data stored in the working memory of an engine and with data stored in a relational database (the latter only with Drools and Jess, which follow our mapping method). Moreover, we validated the engines to check that they produced identical results (a similar empirical approach was employed in [4, 12]). The LUBM queries were defined in the Jess Language, in the Drools Rule Language and in SPARQL (for the Pellet engine).

Figures 2 and 3 present the results of our query answering evaluation. Each test was executed three times and average times are presented, in milliseconds. Figure 2 shows the IM evaluation. The times shown there exclude reasoning (TBox and ABox), since we wanted to show the differences between query execution times. The combined (TBox+ABox) reasoning times were 8.5 s in Drools, 10.6 s in Jess and 14.8 s in Pellet. In this case, Drools was the fastest both in performing reasoning and in executing queries.

Fig. 2. LUBM queries executed in the working memory. Times in ms.

Fig. 3. LUBM queries executed with data stored in a relational database, compared with the working memory tests. Times in ms.

Figure 3 shows the DB evaluation compared with the IM one. To make the comparison adequate (Footnote 7), we added the reasoning times to the query execution times in the IM results. The results show that Drools and Jess usually perform better when a database is used as ABox storage, especially in comparison with Pellet. Queries 2, 8 and 9 require loading a huge number of triples; in queries 8 and 9, both engines load more than half of all triples stored in the relational database. As a result, loading data from the database has a strong impact on efficiency in these queries. Nevertheless, our evaluation shows that querying data stored in a relational database using a rule engine is possible and efficient. An important advantage is that, when RuQAR is used with a relational database, the answer to a query is always up to date, since queries are executed on the current state of the database (“on-the-fly”). In the other cases, whenever the data change, the whole reasoning process has to be performed again before any query can be executed. More information about RuQAR, ASRF and efficiency issues can be found at RuQAR’s web page: http://etacar.put.poznan.pl/jaroslaw.bak/RuQAR.php.
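The adjustment made for Fig. 3 can be written down as follows. The notation is ours and only reflects one reading of the description above: it assumes that the combined reasoning time \(R_e\) of engine \(e\) in the IM case is added to every IM query time \(Q^{IM}_e(q)\), while a DB time \(Q^{DB}_e(q)\) already contains on-demand loading and goal-oriented reasoning.

$$\begin{aligned}&T^{IM}_e(q) = R_e + Q^{IM}_e(q), \qquad T^{DB}_e(q) = Q^{DB}_e(q),\\ &R_{Drools} = 8500\ ms, \quad R_{Jess} = 10600\ ms, \quad R_{Pellet} = 14800\ ms. \end{aligned}$$

Under this reading, every IM value for Drools carries a fixed offset of 8500 ms, so any DB query time below 8500 ms is faster than the corresponding Drools IM value regardless of \(Q^{IM}_{Drools}(q)\).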

6 Conclusions and Future Work

In this paper we presented a query answering method and relational database connectivity implemented in the RuQAR framework, which is intended for use with OWL 2 RL ontologies translated into a set of rules. Moreover, we described and performed an evaluation of RuQAR with Drools, Jess and Pellet, comparing them when executing queries in the working memory and with the use of a relational database. Our results show that, in our experiments, the rule engines (Drools and Jess) were a better choice than Pellet for executing ABox queries.

In the next release of RuQAR we will provide novel and optimized query processing (currently, we use the query functions directly supported in Drools and Jess). We also plan to perform experiments with the latest versions of Drools and Jess, 6.5 and 8.0, respectively. This will allow us to check whether reasoning efficiency and query answering performance have improved. In the Drools case, we will also be able to compare two different algorithms: PHREAK (Drools 6.5) and ReteOO (Drools 5.5).