
1 Introduction

The aim of the Semantic Web is to make information available in a form that is understandable and automatically manageable by machines. In order to realize this vision, the W3C has supported the development of a family of knowledge representation formalisms of increasing complexity for defining ontologies, called OWL (Web Ontology Language), that are based on Description Logics (DLs). Many inference systems, generally called reasoners, have been proposed to reason upon these ontologies, such as Pellet [23], Hermit [22] and Fact++ [24].

Nonetheless, modeling real-world domains requires dealing with information that is incomplete or that comes from sources with different trust levels. This motivates the need for uncertainty management in the Semantic Web, and many proposals have appeared for combining probability theory with OWL languages or with the underlying DLs [4, 8, 12, 14, 15]. Among them, in [18, 26] we introduced the DISPONTE semantics for probabilistic DLs. DISPONTE borrows the distribution semantics [20] from Probabilistic Logic Programming, which has emerged as one of the most effective approaches for representing probabilistic information in Logic Programming languages. Examples of probabilistic reasoners that perform inference under DISPONTE are BUNDLE [18, 19, 26], TRILL and TRILL\(^\text {P}\) [26, 27]. The first one is implemented in Java, whereas the other two are written in Prolog to exploit Prolog’s backtracking facilities during the search for all possible justifications.

In order to perform probabilistic inference over DISPONTE knowledge bases (KBs), it is necessary to find the covering set of justifications, and this is accomplished by a non-probabilistic reasoner. The first version of BUNDLE was able to perform this search only by exploiting the Pellet reasoner [23].

In this paper, we propose a new version of BUNDLE which is modular and allows one to use different OWL reasoners and different approaches for justification finding. In particular, it embeds Pellet, Hermit, Fact++ and JFact as OWL reasoners, and three justification generators, namely GlassBox (only for Pellet), BlackBox and OWL Explanation. The introduced modularity has two main advantages with respect to BUNDLE’s previous version. First, it allows one to plug in a new OWL API-based reasoner in a very simple manner. Second, the framework can be easily extended by including new concrete implementations of algorithms for justification finding.

In this modular version, BUNDLE’s performance strongly depends on the sub-system employed to build the set of justifications for a given query. To evaluate it, we ran several experiments on different real-world and synthetic datasets.

The paper is organized as follows: Sect. 2 briefly introduces DLs, while Sect. 3 illustrates the justification finding problem. Sections 4 and 5 present DISPONTE and the theoretical aspects of inference in DISPONTE KBs respectively. The description of BUNDLE is provided in Sect. 6. Finally, Sect. 7 shows the experimental evaluation and Sect. 8 concludes the paper.

2 Description Logics

An ontology describes the concepts of the domain of interest and their relations with a formalism that allows information to be processed by machines. The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies or knowledge bases. OWL 2 [25] is the latest version of this language and has been a W3C recommendation since 2012.

Description Logics (DLs) provide a logical formalism for knowledge representation. They are useful in all the domains where it is necessary to represent information and to perform inference on it, such as software engineering, medical diagnosis, digital libraries, databases and Web-based information systems. They possess nice computational properties such as decidability and (for some DLs) low complexity [1].

There are many different DL languages that differ in the constructs that are allowed for defining concepts (sets of individuals of the domain) and roles (sets of pairs of individuals). The \(\mathcal {SROIQ}\)(D) DL is one of the most common fragments; it was introduced by Horrocks et al. in [7] and it is of particular importance because it is semantically equivalent to OWL 2.

Let us consider a set of atomic concepts \(\mathbf {C}\), a set of atomic roles \(\mathbf {R}\) and a set of individuals \(\mathbf {I}\). A role can be an atomic role \(R \in \mathbf {R}\), the inverse \(R^{-}\) of an atomic role \(R \in \mathbf {R}\) or a complex role \(R \circ S\). We use \(\mathbf {R^{-}}\) to denote the set of all inverses of roles in \(\mathbf {R}\). Each \(A \in \mathbf {C}\), \(\bot \) and \(\top \) are concepts and, if \(a \in \mathbf {I}\), then \(\{a\}\) is a concept called a nominal. If C, \(C_1\) and \(C_2\) are concepts and \(R \in \mathbf {R}\cup \mathbf {R^{-}}\), then \((C_1\sqcap C_2)\), \((C_1\sqcup C_2 )\) and \(\lnot C\) are concepts, as well as \(\exists R.C\), \(\forall R.C\), \({\ge }n R.C\) and \({\le }n R.C\) for an integer \(n \ge 0\).

A knowledge base (KB) \(\mathcal {K} = (\mathcal {T}, \mathcal {R}, \mathcal {A})\) consists of a TBox \(\mathcal {T} \), an RBox \(\mathcal {R} \) and an ABox \(\mathcal {A} \). An RBox \(\mathcal {R} \) is a finite set of transitivity axioms Trans(R), role inclusion axioms \(R \sqsubseteq S\) and role chain axioms \(R\circ P \sqsubseteq S\), where \(R, P, S \in \mathbf {R} \cup \mathbf {R^{-}}\). A TBox \( \mathcal {T} \) is a finite set of concept inclusion axioms \(C\sqsubseteq D\), where C and D are concepts. An ABox \(\mathcal {A} \) is a finite set of concept membership axioms a : C and role membership axioms (ab) : R, where C is a concept, \(R \in \mathbf {R}\) and \(a,b \in \mathbf {I}\).

A KB is usually assigned a semantics using interpretations of the form \(\mathcal {I} = (\varDelta ^\mathcal {I}, \cdot ^\mathcal {I})\), where \(\varDelta ^\mathcal {I} \) is a non-empty domain and \(\cdot ^\mathcal {I} \) is the interpretation function that assigns an element in \(\varDelta ^\mathcal {I} \) to each individual a, a subset of \(\varDelta ^\mathcal {I} \) to each concept C and a subset of \(\varDelta ^\mathcal {I} \times \varDelta ^\mathcal {I} \) to each role R. The mapping \(\cdot ^\mathcal {I} \) is extended to complex concepts as follows (where \(R^\mathcal {I} (x,C) = \{y|\langle x,y\rangle \in R^\mathcal {I}, y \in C^\mathcal {I} \}\) and \(\#X\) denotes the cardinality of the set X):
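The following summary, which the original table is assumed to match, follows the standard \(\mathcal {SROIQ}\) semantics (see, e.g., [1]):

$$\begin{aligned} \top ^\mathcal {I}&= \varDelta ^\mathcal {I}&\bot ^\mathcal {I}&= \emptyset \\ \{a\}^\mathcal {I}&= \{a^\mathcal {I} \}&(\lnot C)^\mathcal {I}&= \varDelta ^\mathcal {I} \smallsetminus C^\mathcal {I} \\ (C_1\sqcap C_2)^\mathcal {I}&= C_1^\mathcal {I} \cap C_2^\mathcal {I}&(C_1\sqcup C_2)^\mathcal {I}&= C_1^\mathcal {I} \cup C_2^\mathcal {I} \\ (\exists R.C)^\mathcal {I}&= \{x \mid R^\mathcal {I} (x,C) \ne \emptyset \}&(\forall R.C)^\mathcal {I}&= \{x \mid R^\mathcal {I} (x,\lnot C) = \emptyset \} \\ ({\ge }n R.C)^\mathcal {I}&= \{x \mid \#R^\mathcal {I} (x,C) \ge n\}&({\le }n R.C)^\mathcal {I}&= \{x \mid \#R^\mathcal {I} (x,C) \le n\} \\ (R^{-})^\mathcal {I}&= \{\langle y,x\rangle \mid \langle x,y\rangle \in R^\mathcal {I} \}&(R\circ S)^\mathcal {I}&= R^\mathcal {I} \circ S^\mathcal {I} \end{aligned}$$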

\(\mathcal {SROIQ}\)(D) also permits the definition of datatype roles, which connect an individual to an element of a datatype such as integers, floats, etc.

A query Q over a KB \(\mathcal {K} \) is usually an axiom for which we want to test the entailment from the KB, written as \(\mathcal {K} \,\models \,Q\).

Example 1

Consider the following KB, “Crime and Punishment”:
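(The axioms below are the same ones that reappear, annotated with probabilities, in Example 2.)

$$\begin{aligned}&\exists { \textsf {killed}} .\top \sqsubseteq { \textsf {Nihilist}}&&{ \textsf {Nihilist}} \sqsubseteq { \textsf {GreatMan}} \\&{ \textsf {(raskolnikov, alyona) : killed}}&&{ \textsf {(raskolnikov, lizaveta) : killed}} \end{aligned}$$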

This KB states that if you killed someone then you are a nihilist and whoever is a nihilist is a “great man” (TBox). It also states that Raskolnikov killed Alyona and Lizaveta (ABox). The KB entails the query \(Q= { \textsf {raskolnikov}} : { \textsf {GreatMan}} \) (but are we sure about that?).

3 Justification Finding Problem

Here we discuss the problem of finding the covering set of justifications for a given query. This non-standard reasoning service is also known as axiom pinpointing [21] and it is useful for tracing derivations and debugging ontologies. The problem has been investigated by various authors [2, 6, 9, 21]. A justification corresponds to an explanation for a query Q. An explanation is a subset of logical axioms \(\mathcal {E} \) of a KB \(\mathcal {K} \) such that \(\mathcal {E} \,\models \,Q\), whereas a justification is an explanation that is minimal w.r.t. set inclusion. Formally, we say that an explanation \(\mathcal {J} \subseteq \mathcal {K} \) is a justification if for all \(\mathcal {J'} \subset \mathcal {J} \), \(\mathcal {J'} \,\not \models \,Q\), i.e. \(\mathcal {J} '\) is not an explanation for Q. The problem of enumerating all the justifications for a given query is called axiom pinpointing or justification finding. The set of all the justifications for the query Q is the covering set of justifications for Q. Given a KB \(\mathcal {K} \), the covering set of justifications for Q is denoted by All-Just(\(Q,\mathcal {K} \)).

Below, we provide the formal definition of the justification finding problem.

Definition 1

(Justification finding problem).

Input: A knowledge base \(\mathcal {K} \), and an axiom Q such that \(\mathcal {K} \,\models \,Q\).

Output: The set All-Just(\(Q,\mathcal {K} \)) of all the justifications for Q in \(\mathcal {K} \).

There are two categories of algorithms for finding a single justification: glass-box algorithms [9] and black-box algorithms. The former category is reasoner-dependent, i.e. a glass-box algorithm implementation depends on a specific reasoner, whereas a black-box algorithm is reasoner-independent, i.e. it can be used with any reasoner. In both cases, we still need a reasoner to obtain a justification.

It is possible to incrementally compute all justifications for an entailment by using Reiter’s Hitting Set Tree (HST) algorithm [17]. This algorithm repeatedly calls a glass-box or a black-box algorithm, which builds a new justification. To avoid the extraction of already found justifications, at each iteration the extraction process is performed on a KB from which some axioms are removed, taking into account the previously found justifications. For instance, given a KB \(\mathcal {K}\) and a query Q, if the justification \(\mathcal {J} = \{ E_1, E_2, E_3 \}\) was found, where the \(E_i\)s are axioms, then, to avoid the generation of the same justification, the HST algorithm tries to find a new justification on \(\mathcal {K} ' = \mathcal {K} \smallsetminus \{E_1\}\). If no new justification is found, the HST algorithm backtracks and tries to find another justification by removing other axioms of \(\mathcal {J} \), one at a time.
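As an illustration, the following Java sketch captures this expansion in a simplified form: it omits the node-reuse and pruning optimizations of the full HST algorithm, the findOne parameter stands for any single-justification generator (glass-box or black-box) that returns one justification or null when the reduced KB no longer entails the query, and it is not the actual BUNDLE implementation.

import java.util.HashSet;
import java.util.Set;
import java.util.function.BiFunction;

/** Simplified sketch of the HST expansion over generic axioms of type A. */
public final class HSTSketch {

    /** Collects in allJustifications every justification of query w.r.t. kb. */
    public static <A> void expand(Set<A> kb, A query,
                                  BiFunction<Set<A>, A, Set<A>> findOne,
                                  Set<Set<A>> allJustifications) {
        Set<A> just = findOne.apply(kb, query);   // one justification, or null if kb no longer entails query
        if (just == null) {
            return;                               // branch closed: backtrack
        }
        allJustifications.add(just);              // a set of sets, so duplicates are absorbed
        for (A axiom : just) {                    // branch on each axiom of the found justification
            Set<A> reducedKb = new HashSet<>(kb);
            reducedKb.remove(axiom);              // forbid this axiom along the current branch
            expand(reducedKb, query, findOne, allJustifications);
        }
    }
}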

4 Probabilistic Description Logics

DISPONTE [18, 26] applies the distribution semantics [20] to Probabilistic Description Logic KBs.

In DISPONTE, a probabilistic knowledge base \(\mathcal {K}\) is a set of certain axioms or probabilistic axioms. Certain axioms take the form of regular DL axioms. Probabilistic axioms take the form

$$ p \,{::}\, E $$

where \(p \in [0,1]\) and E is a DL axiom. \( p\,{::}\,E \) means that we have degree of belief p in axiom E.

DISPONTE associates an independent Boolean random variable to each probabilistic DL axiom. A world is obtained by taking the certain axioms together with every probabilistic axiom whose random variable is assigned value 1. The probability of a world w is computed by multiplying the probability \(p_i\) of each probabilistic axiom \(E_i\) included in the world by the probability \(1-p_i\) of each probabilistic axiom \(E_i\) not included in the world.

Below, we provide some formal definitions for DISPONTE.

Definition 2

(Atomic choice). An atomic choice is a couple \((E_i, k)\) where \(E_i\) is the ith probabilistic axiom and \(k \in \{0, 1\}\). The variable k indicates whether \(E_i\) is chosen to be included in a world \((k = 1)\) or not \((k = 0)\).

Definition 3

(Composite choice). A composite choice \(\kappa \) is a consistent set of atomic choices, i.e., \((E_i, k) \in \kappa \), \((E_i, m) \in \kappa \) implies \(k = m\) (only one decision is taken for each axiom).

The probability of composite choice \(\kappa \) is

$$P(\kappa ) = \prod _{(E_i, 1)\in \kappa }p_i\prod _{(E_i, 0)\in \kappa }(1-p_i)$$

where \(p_i\) is the probability associated with axiom \(E_i\); the product form follows from the independence of the random variables associated with the axioms.

Definition 4

(Selection). A selection \(\sigma \) is a total composite choice, i.e., it contains an atomic choice \((E_i, k)\) for every probabilistic axiom of the theory. A selection \(\sigma \) identifies a theory \(w_\sigma \) called a world: \(w_\sigma = \mathcal {C}\cup \{E_i|(E_i, 1)\in \sigma \}\), where \(\mathcal {C}\) is the set of certain axioms.

\(P(w_\sigma )\) is a probability distribution over worlds. Let us indicate with \(\mathcal {W}\) the set of all worlds. The probability of Q is [18]:

$$ P(Q) = \sum _{w\in \mathcal {W}: w\models Q}P(w) $$

i.e. the probability of the query is the sum of the probabilities of the worlds in which the query is true.

Example 2

Let us consider the knowledge base and the query \(Q= { \textsf {raskolnikov}} : { \textsf {GreatMan}} \) of Example 1 where some of the axioms are probabilistic:

$$\begin{aligned} E_1 =&\ 0.2\,{::}\, { \textsf {Nihilist}} \sqsubseteq { \textsf {GreatMan}}&C_1=&\ \exists { \textsf {killed}} .\top \sqsubseteq { \textsf {Nihilist}} \\ E_2 =&\ 0.6\,{::}\, { \textsf {(raskolnikov, alyona) : killed}}&E_3=&\ 0.7\,{::}\, { \textsf {(raskolnikov, lizaveta) : killed}} \end{aligned}$$

Whoever is a nihilist is a “great man” with probability 0.2 (\(E_1\)) and Raskolnikov killed Alyona and Lizaveta with probability 0.6 and 0.7 respectively (\(E_2\) and \(E_3\)). Moreover there is a certain axiom (\(C_1\)). The KB has eight worlds and Q is true in three of them, corresponding to the selections:

$$\begin{aligned} \{\{(E_1,1),(E_2,1),(E_3,1)\},\ \{(E_1,1),(E_2,1),(E_3,0)\},\ \{(E_1,1),(E_2,0),(E_3,1)\}\} \end{aligned}$$

The probability is \(P(Q) = 0.2\cdot 0.6\cdot 0.7 + 0.2\cdot 0.6\cdot (1-0.7) + 0.2\cdot (1-0.6)\cdot 0.7 = 0.176\).

5 Inference in Probabilistic Description Logics

It is often infeasible to find all the worlds where the query is true. To reduce reasoning time, inference algorithms find, instead, explanations for the query and then compute the probability of the query from them. Below we provide the definitions of DISPONTE explanations and justifications, which are tightly intertwined with the previous definitions of explanation and justification for the non-probabilistic case.

Definition 5

(DISPONTE Explanation). A composite choice \(\phi \) identifies a set of worlds \(\omega _\phi = \{ w_\sigma | \sigma \in \mathcal {S}, \sigma \supseteq \phi \}\), where \(\mathcal {S}\) is the set of all selections. We say that \(\phi \) is an explanation for Q if Q is entailed by every world of \(\omega _\phi \).

Definition 6

(DISPONTE Justification). We say that an explanation \(\gamma \) is a justification if, for all \(\gamma ' \subset \gamma \), \(\gamma '\) is not an explanation for Q.

A set of explanations \(\varvec{\varPhi }\) is covering for Q if every world \(w_\sigma \in \mathcal {W} \) in which Q is entailed is such that \(w_\sigma \in \bigcup _{\phi \in \varvec{\varPhi }} \omega _\phi \). In other words, a covering set \(\varvec{\varPhi }\) identifies all the worlds in which Q succeeds.

Two composite choices \(\kappa _1\) and \(\kappa _2\) are incompatible if their union is inconsistent. For example, \(\kappa _1 = \{(E_i, 1) \}\) and \(\kappa _2 = \{(E_i, 0)\}\) are incompatible. A set \(\varvec{K}\) of composite choices is pairwise incompatible if for all \( \kappa _1 \in \varvec{K}\), \(\kappa _2 \in \varvec{K}\), \(\kappa _1 \ne \kappa _2\) implies that \(\kappa _1\) and \(\kappa _2\) are incompatible. The probability of a pairwise incompatible set of composite choices \(\varvec{K}\) is \(P(\varvec{K}) = \sum _{\kappa \in \varvec{K}} P(\kappa )\).

Given a query Q and a covering set of pairwise incompatible explanations \(\varvec{\varPhi }\), the probability of Q is [18]:

$$\begin{aligned} P(Q) = \sum _{w_\sigma \in \omega _{\varvec{\varPhi }}} P(w_\sigma ) = P(\omega _{\varvec{\varPhi }}) = P(\varvec{\varPhi }) = \sum _{\phi \in \varvec{\varPhi }} P(\phi ) \end{aligned}$$
(1)

where \(\omega _{\varvec{\varPhi }}\) is the set of worlds identified by the set of explanations \(\varvec{\varPhi }\).

Example 3

Consider the KB and the query \(Q= { \textsf {raskolnikov}} : { \textsf {GreatMan}} \) of Example 2. We have the following covering set of pairwise incompatible explanations: \(\varvec{\varPhi } = \{\{(E_1, 1), (E_2, 1)\},\ \{(E_1, 1), (E_2, 0), (E_3, 1)\}\}.\) The probability of the query is \(P(Q) = 0.2\cdot 0.6 + 0.2 \cdot 0.4\cdot 0.7 = 0.176.\)

Unfortunately, in general, explanations (and hence justifications) are not pairwise incompatible. The problem of calculating the probability of a query is therefore reduced to that of finding a covering set of justifications and then transforming it into a covering set of pairwise incompatible explanations.

We can thus use justification finding algorithms for non-probabilistic DLs to find the covering set of non-probabilistic justifications, keep only the probabilistic axioms so as to obtain the covering set of DISPONTE justifications, and then transform it into a pairwise incompatible covering set of explanations, from which it is easy to compute the probability.

Example 4

Consider the KB and the query \(Q= { \textsf {raskolnikov}} : { \textsf {GreatMan}} \) of Example 2. If we use justification finding algorithms by ignoring the probabilistic annotations, we find the following non-probabilistic justifications: \( \varvec{\mathcal {J}} = \{\{E_1, C_1, E_2\},\ \{E_1, C_1, E_3\}\} \). Then we can translate them into DISPONTE justifications: \(\varvec{\varGamma } = \{\{(E_1, 1), (E_2, 1)\},\ \{(E_1, 1), (E_3, 1)\}\}\). Note that \(\varvec{\varGamma }\) is not pairwise incompatible, therefore we cannot directly use Eq. (1). The solution to this problem will be shown in the following section.

6 BUNDLE

The reasoner BUNDLE [18, 19] computes the probability of a query w.r.t. DISPONTE KBs by first computing all the justifications for the query, then converting them into a pairwise incompatible covering set of explanations by building a Binary Decision Diagram (BDD). Finally, it computes the probability by traversing the BDD. A BDD for a function of Boolean variables is a rooted graph that has one level for each Boolean variable. A node n has two children corresponding respectively to the 1 value and the 0 value of the variable associated with the level of n. When drawing BDDs, the 0-branch is distinguished from the 1-branch by drawing it with a dashed line. The leaves store either 0 or 1.

Given the set \(\varvec{\varPhi }\) of all DISPONTE explanations for a query Q, we can define the Disjunctive Normal Form Boolean formula \(f_{\varvec{\varPhi }}\) representing the disjunction of all explanations as \(f_{\varvec{\varPhi }}(\mathbf {X})=\bigvee _{\phi \in \varvec{\varPhi }}\bigwedge _{(E_i,1)\in \phi }X_{i}\bigwedge _{(E_i,0)\in \phi }\overline{X_{i}}\). The variables \(\mathbf {X}=\{ X_i\mid p_i\,{::}\,E_i \in \mathcal {K} \}\) are independent Boolean random variables with \(P(X_i=1)=p_i\), and the probability that \(f_{\varvec{\varPhi }}(\mathbf {X})\) takes value 1 gives the probability of Q.

BDDs perform Shannon expansion of the Boolean function \(f_{\varvec{\varPhi }}\), which makes the disjuncts, and hence the associated explanations, mutually exclusive, i.e. pairwise incompatible.
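As a worked instance, for the DISPONTE justifications \(\varvec{\varGamma }\) of Example 4 the formula is \(f(\mathbf {X})=(X_1\wedge X_2)\vee (X_1\wedge X_3)\); expanding it on \(X_2\) gives

$$ f(\mathbf {X}) = (X_1\wedge X_2)\vee (X_1\wedge \lnot X_2\wedge X_3) $$

whose disjuncts correspond exactly to the pairwise incompatible explanations \(\varvec{\varPhi }\) of Example 3.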

Given the BDD, we can use function Probability described in [10] to compute the probability. This dynamic programming algorithm traverses the diagram from the leaves to the root and computes the probability of a formula encoded as a BDD.
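Restating [10] in a form consistent with the computation of Example 5 below: for an inner node n whose level is associated with the variable \(X_i\) (with \(P(X_i=1)=p_i\)), the function computes

$$ \textsc {Probability}(n) = p_i\cdot \textsc {Probability}(child_1(n)) + (1-p_i)\cdot \textsc {Probability}(child_0(n)) $$

where \(child_1(n)\) and \(child_0(n)\) are the children reached through the 1-branch and the 0-branch respectively, while a leaf simply returns the value (0 or 1) it stores.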

Example 5

(Example 2 cont.). Let us consider the KB and the query of Example 2. If we associate random variables \(X_{1}\) with axiom \(E_1\), \(X_{2}\) with \(E_2\) and \(X_3\) with \(E_3\), the BDD representing the set of explanations is shown in Fig. 1. By applying function Probability [10] to this BDD we get

$$\begin{aligned} \textsc {Probability}(n_3)&= 0.7\cdot 1+0.3\cdot 0=0.7\\ \textsc {Probability}(n_2)&= 0.6\cdot 1+0.4\cdot 0.7=0.88\\ \textsc {Probability}(n_1)&= 0.2\cdot 0.88+0.8\cdot 0=0.176 \end{aligned}$$

and therefore \(P(Q)=\textsc {Probability}(n_1)=0.176\), which corresponds to the probability given by DISPONTE.

Fig. 1. BDD representing the set of explanations for the query of Example 2.

BUNDLE uses implementations of the HST algorithm to incrementally obtain all the justifications. However, the first version was able to use only a glass-box approach which was dependent on the Pellet reasoner [23].

In the following, we illustrate the modifications introduced in the new modular version of BUNDLE.

Figure 2 shows the new architecture of BUNDLE. The main novelties are the adoption of the OWL Explanation library [6] and of the BlackBox approach offered by the OWL API. Thanks to them, BUNDLE is now reasoner-independent and can exploit different OWL reasoners.

Fig. 2. Software architecture of BUNDLE.

Modularity is therefore realized in two directions: (1) support for different OWL reasoners: Pellet 2.5.0, Hermit 1.3.8.413 [22], Fact++ 1.6.5 [24] and JFact 4.0.4; (2) three different strategies for finding a justification, namely:

  • GlassBox. A glass-box approach which depends on Pellet. It is a modified version of the GlassBoxExplanation class contained in the Pellet Explanation library.

  • BlackBox. A black-box approach offered by the OWL API [5]. The OWL API is a Java API for the creation and manipulation of OWL 2 ontologies.

  • OWL Explanation. A library that is part of the OWL Explanation Workbench [6]. The latter also contains a Protégé plugin, underpinned by the library, that allows Protégé users to find justifications for entailments in their OWL 2 ontologies.

All reasoners can be paired with the BlackBox and OWL Explanation methods, while only Pellet can exploit the GlassBox method.

To find all the justifications of a given query with the GlassBox and BlackBox approaches, an implementation of the HST algorithm is used, namely a modified version of the HSTExplanationGenerator class of the OWL API. We modified this class in order to support annotated axioms (DISPONTE axioms are OWL axioms annotated with a probability). OWL Explanation, instead, already contains an HST implementation and a black-box approach that support annotated axioms.
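For concreteness, the following sketch shows how a DISPONTE axiom such as \(E_1\) of Example 2 can be built as a probability-annotated axiom through the OWL API; the entity IRIs are invented for the example and the annotation-property IRI is only a placeholder, since the exact IRI recognized by BUNDLE is the one documented in its manual.

import java.util.Collections;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLAnnotation;
import org.semanticweb.owlapi.model.OWLAnnotationProperty;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLDataFactory;

/** Sketch: encoding 0.2 :: Nihilist ⊑ GreatMan as an annotated OWL axiom. */
public class AnnotatedAxiomSketch {
    public static void main(String[] args) {
        OWLDataFactory df = OWLManager.getOWLDataFactory();
        OWLClass nihilist = df.getOWLClass(IRI.create("http://example.org/kb#Nihilist"));
        OWLClass greatMan = df.getOWLClass(IRI.create("http://example.org/kb#GreatMan"));
        // Placeholder annotation property: BUNDLE expects the probability IRI given in its manual
        OWLAnnotationProperty probability =
                df.getOWLAnnotationProperty(IRI.create("http://example.org/disponte#probability"));
        OWLAnnotation annotation = df.getOWLAnnotation(probability, df.getOWLLiteral(0.2));
        OWLAxiom e1 = df.getOWLSubClassOfAxiom(nihilist, greatMan, Collections.singleton(annotation));
        System.out.println(e1);
    }
}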

BUNDLE can be used as a standalone desktop application or, in this new version, as a library.

6.1 Using BUNDLE as Application

BUNDLE is an open-source software and is available on Bitbucket, together with its manual, at https://bitbucket.org/machinelearningunife/bundle.

A BUNDLE image was deployed on Docker Hub. Users can start using BUNDLE with just a couple of docker commands. All they have to do is pull the image and start the container with the commands:

(Listing a: docker commands to pull the BUNDLE image and start the container.)

A bash shell of the container then starts and users can use BUNDLE by running the command bundle. For instance, if we consider the KB and the query of Example 2, the user can ask the query with:

(Listing b: the bundle command asking the query of Example 2.)

6.2 Using BUNDLE as Library

BUNDLE can also be used as a library. The library is set up as a Maven project and published on Maven Central.

Once the developer has added the BUNDLE dependency to the project’s POM file, the probability of a query can be obtained in just a few lines:

(Listing c: Java code that computes the probability of a query with the BUNDLE library.)

where rootOntology and query are objects of the classes OWLOntology and OWLAxiom of the OWL API library respectively.
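As a minimal sketch (not the original listing), the two objects can be prepared with the OWL API as follows; the file name and IRIs refer to the KB of Example 2 and are invented for illustration, and the BUNDLE-specific call is left out because its exact entry point belongs to the library’s API (see the manual).

import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLDataFactory;
import org.semanticweb.owlapi.model.OWLNamedIndividual;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;

/** Sketch: preparing the rootOntology and query objects to be passed to BUNDLE. */
public class BundleInputSketch {
    public static void main(String[] args) throws OWLOntologyCreationException {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // The probabilistic KB: DISPONTE axioms are OWL axioms annotated with a probability
        OWLOntology rootOntology =
                manager.loadOntologyFromOntologyDocument(new File("crime_and_punishment.owl"));

        // The query axiom raskolnikov : GreatMan
        OWLDataFactory df = manager.getOWLDataFactory();
        OWLClass greatMan = df.getOWLClass(IRI.create("http://example.org/kb#GreatMan"));
        OWLNamedIndividual raskolnikov =
                df.getOWLNamedIndividual(IRI.create("http://example.org/kb#raskolnikov"));
        OWLAxiom query = df.getOWLClassAssertionAxiom(greatMan, raskolnikov);

        // rootOntology and query are then handed to BUNDLE's library entry point (not shown here)
        System.out.println(rootOntology.getAxiomCount() + " axioms loaded; query: " + query);
    }
}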

In particular, the developer can inject the preferred OWL API-based reasoner and perform probabilistic inference without modifying BUNDLE.

7 Experiments

We performed three different tests to compare the possible configurations of BUNDLE, which depend on the chosen reasoner and justification search strategy, for a total of 9 combinations. In the first test we compared all configurations on four different datasets, in order to highlight which reasoner/strategy combination shows the best behavior in terms of inference time. In the second one, we considered KBs of increasing size in terms of the number of probabilistic axioms. Finally, in the third test, we asked queries on a synthetic dataset of increasing size. The last two experiments were aimed at investigating the scalability of the different configurations. All tests were performed on the HPC System Marconi, equipped with Intel Xeon E5-2697 v4 (Broadwell) @ 2.30 GHz, using 8 cores for each test.

Test 1. The first test considers 4 real-world KBs of various complexity, as in [27]: (1) BRCA [11], which models the risk factors of breast cancer; (2) an extract of DBPedia [13], containing structured information from Wikipedia, typically that contained in the infobox on the right-hand side of pages; (3) Biopax level 3 [3], which models metabolic pathways; (4) Vicodi [16], which contains information on European history and models historical events and important personalities.

We used a version of the DBPedia and Biopax KBs without the ABox and a version of BRCA and Vicodi with an ABox containing 1 individual and 19 individuals respectively. For each KB, we added a probability annotation to each axiom; the probability values were randomly assigned. We randomly created 50 subclass-of queries for all the KBs and 50 instance-of queries for BRCA and Vicodi, following the concept hierarchy of the KBs and ensuring that each query had at least one explanation.

Table 1 shows the average time in seconds to answer queries with the different BUNDLE configurations. Bold values highlight the fastest configuration for each KB. With the exception of DBPedia, the best results are obtained by Pellet with the GlassBox approach, which corresponds to the configuration of the previous, non-modular version of BUNDLE. However, the use of the OWL Explanation library with Pellet shows competitive results. For the BioPax and Vicodi KBs, the BlackBox approach with Fact++ was not able to return a result (cells with “crash”).

Table 1. Average time (in seconds) for probabilistic inference with all possible configurations of BUNDLE over different datasets (Test 1). For the BioPax and Vicodi KBs, Fact++/BlackBox was not able to return a result due to an internal error.

Test 2. The second test was performed following the approach presented in [11] on the BRCA KB (\(\mathcal {ALCHF}(D)\), 490 axioms). To test BUNDLE, we randomly generated and added an increasing number of subclass-of probabilistic axioms. The number of these axioms was varied from 9 to 16 and, for each number, 100 different consistent ontologies were created. Although the number of additional axioms is small, they may cause an exponential increase in the complexity of inference (see [11] for a detailed explanation).

Then, an individual was added to every KB: it was randomly assigned to each simple class that appeared in the probabilistic axioms, and a random probability was attached to each such assignment. Complex classes contained in the conditional constraints were split into their components; e.g., the complex class PostmenopausalWomanTakingTestosterone was divided into PostmenopausalWoman and WomanTakingTestosterone. Finally, we ran 100 probabilistic queries of the form a : C, where a is the added individual and C is a class randomly selected among those that represent women under increased and lifetime risk, such as WomanUnderLifetimeBRCRisk and WomanUnderStronglyIncreasedBRCRisk, which are at the top of the concept hierarchy.

Table 2 shows the execution time averaged over the 100 queries as a function of the number of probabilistic axioms. For each size, bold values indicate the best configuration. The BlackBox approaches are much slower on average than the others. The best performance is shown by Pellet/GlassBox until size 12, and by Pellet/OWLExp, Fact++/OWLExp and JFact/OWLExp from size 13.

Table 2. Average execution time (ms) for probabilistic inference with different configurations of BUNDLE on versions of the BRCA KB of increasing size (Test 2).

Test 3. In the third test we artificially created a set of KBs of increasing size of the following form:

where \(n \ge 1\) and \(1\le i\le n\). The query \(Q= B_0\sqsubseteq B_n\) has \(2^n\) explanations, even if the KB has a size that is linear in n.

We increased n from 2 to 10 in steps of 2 and collected the running time, averaged over 50 executions. Table 3 shows, for each n, the average time in seconds that the systems took to compute the probability of the query Q (in bold, the best time for each size). We set a timeout of 10 min for each query, so the cells with “–” indicate that the timeout occurred. This experiment confirms what was already suggested by Test 2, i.e. the best results in terms of scalability are obtained by the OWL Explanation method paired with any reasoner except Hermit. Thanks to this library, the new version of BUNDLE is able to outperform the first version (corresponding to Pellet/GlassBox) by reaching a larger dataset size.

Table 3. Average time (in seconds) for probabilistic inference with different configurations of BUNDLE on a synthetic dataset (Test 3). “–” means that the execution timed out (600 s).

8 Conclusions

In this paper, we presented a modular version of BUNDLE, a system for reasoning on Probabilistic Description Logic KBs that follow the DISPONTE semantics. Modularity allows one to pair 4 different OWL reasoners with 3 different approaches for finding query justifications. In addition, BUNDLE can now be used both as a standalone application and as a library. We provided a comparison between the various reasoner/approach configurations over different datasets, showing that Pellet paired with GlassBox, or any reasoner (except Hermit) paired with the OWL Explanation library, achieves the best results in terms of inference time on a probabilistic ontology. In the future, we plan to study the effects of glass-box or grey-box methods for collecting explanations.