
1 Introduction

The main idea of the Semantic Web is making information available in a form that is understandable and automatically manageable by machines [16]. In order to realize this vision, the W3C has supported the development of a family of knowledge representation formalisms of increasing complexity for defining ontologies, called Web Ontology Language (OWL). In particular, OWL defines the sublanguages OWL-Lite, OWL-DL (based on Description Logics) and OWL-Full. Since the real world often contains uncertain information, it is fundamental to be able to represent and reason with such information. This problem has been investigated by various authors both in the general case of First Order Logic (FOL) [5, 14, 27] and in the case of restricted logics, such as Description Logics (DLs) and Logic Programming (LP).

In particular, in LP the distribution semantics [39] has emerged as one of the most effective approaches for representing probabilistic information and it underlies many probabilistic LP languages such as Probabilistic Horn Abduction [30], PRISM [39, 40], Independent Choice Logic [29], Logic Programs with Annotated Disjunctions [47], ProbLog [9] and CP-logic [46].

In [7, 34, 36, 37] we applied the distribution semantics to DLs obtaining DISPONTE for “DIstribution Semantics for Probabilistic ONTologiEs” (Spanish for “get ready”), in which we annotate axioms of a theory with a probability and assume that each axiom is independent of the others. A DISPONTE knowledge base (KB for short) defines a probability distribution over regular KBs (worlds) and the probability of a query is obtained from the joint probability of the worlds and the query.

In order to fully support the development of the Semantic Web, efficient DL reasoners, such as Pellet [44], RacerPro [12] and HermiT [43], are used to extract implicit information from the modeled ontologies, and probabilistic DL reasoners, such as PRONTO [21], are used to compute the probability of the inferred information. Most DL reasoners implement a tableau algorithm in a procedural language. However, some tableau expansion rules are non-deterministic, requiring the developers to implement a search strategy in an or-branching search space. Moreover, in some cases we want to compute all explanations for a query, thus requiring the exploration of all the non-deterministic choices of the tableau algorithm.

We present the algorithm BUNDLE for “Binary decision diagrams for Uncertain reasoNing on Description Logic thEories”, that performs inference over DISPONTE DLs. BUNDLE exploits an underlying reasoner such as Pellet [44] that returns explanations for queries.

Moreover, we present the system TRILL for “Tableau Reasoner for descrIption Logics in proLog”, a tableau reasoner implemented in the declarative Prolog language. Prolog’s search strategy is exploited for taking into account the non-determinism of the tableau rules. TRILL uses the Thea2 library [45] for parsing OWL in its various dialects. Thea2 translates OWL files into a Prolog representation in which each axiom is mapped into a fact. TRILL can check the consistency of a concept and the entailment of an axiom from an ontology, and can return the “pinpointing formula” for queries.

Both BUNDLE and TRILL use the inference techniques developed for probabilistic logic programs under the distribution semantics, in particular Binary Decision Diagrams (BDDs), for computing the probability of queries from the set of all explanations and the pinpointing formula respectively. They encode the results of the inference process in a BDD from which the probability can be computed in time linear in the size of the diagram.

In the following, Sect. 2 briefly introduces \(\mathcal {ALC}\) and \(\mathcal {SHOIN}(\mathbf {D})\) DLs. Section 3 presents the DISPONTE semantics while Sect. 4 defines the problem of answering queries to DLs. Sections 5 and 6 describe BUNDLE and TRILL respectively. Section 7 illustrates related work. Section 8 shows experiments and Sect. 9 concludes the paper.

2 Description Logics

Description Logics (DLs) are knowledge representation formalisms that possess nice computational properties such as decidability and/or low complexity, see [1, 2] for excellent introductions. DLs are particularly useful for representing ontologies and have been adopted as the basis of the Semantic Web.

While DLs can be translated into FOL, they are usually represented using a syntax based on concepts and roles. A concept corresponds to a set of individuals of the domain while a role corresponds to a set of pairs of individuals of the domain. We first briefly describe \(\mathcal {ALC}\) and then \(\mathcal {SHOIN}(\mathbf {D})\), showing the difference with \(\mathcal {ALC}\).

Let \(\mathbf {A}\), \(\mathbf {R}\) and \(\mathbf {I}\) be sets of atomic concepts, roles and individuals, respectively. Concepts are defined by induction as follows. Each \(C \in \mathbf {A}\) is a concept, \(\bot \) and \(\top \) are concepts. If \(C\), \(C_1\) and \(C_2\) are concepts and \(R \in \mathbf {R}\), then \((C_1\sqcap C_2)\), \((C_1\sqcup C_2)\) and \(\lnot C\) are concepts, as well as \(\exists R.C\) and \(\forall R.C\). A TBox \(\mathcal{T}\) is a finite set of concept inclusion axioms \(C\sqsubseteq D\), where \(C\) and \(D\) are concepts. We use \(C \equiv D\) to abbreviate the conjunction of \(C \sqsubseteq D\) and \(D\sqsubseteq C\). An ABox \(\mathcal{A}\) is a finite set of concept membership axioms \(a : C\), role membership axioms \((a, b) : R\), equality axioms \(a = b\) and inequality axioms \(a \ne b\), where \(C\) is a concept, \(R \in \mathbf {R}\) and \(a, b \in \mathbf {I}\). A knowledge base \(\mathcal{K}= (\mathcal{T}, \mathcal{A})\) consists of a TBox \(\mathcal{T}\) and an ABox \(\mathcal{A}\). A knowledge base \(\mathcal{K}\) is usually assigned a semantics in terms of interpretations \(\mathcal{I}= ({\varDelta }^\mathcal{I}, \cdot ^\mathcal{I})\), where \({\varDelta }^\mathcal{I}\) is a non-empty domain and \(\cdot ^\mathcal{I}\) is the interpretation function that assigns an element in \({\varDelta }^\mathcal{I}\) to each \(a \in \mathbf {I}\), a subset of \({\varDelta }^\mathcal{I}\) to each \(C \in \mathbf {A}\) and a subset of \({\varDelta }^\mathcal{I}\times {\varDelta }^\mathcal{I}\) to each \(R \in \mathbf {R}\).

The mapping \(\cdot ^\mathcal{I}\) is extended to all concepts (where \(R^\mathcal{I}(x) = \{y | (x, y) \in R^\mathcal{I}\}\)) as:

$$ \begin{array}{rcl} \top ^\mathcal{I}&=&{\varDelta }^\mathcal{I}\\ \bot ^\mathcal{I}&=&\emptyset \\ (C_1\sqcap C_2)^\mathcal{I}&=&C_1^\mathcal{I}\cap C_2^\mathcal{I}\\ (C_1\sqcup C_2)^\mathcal{I}&=&C_1^\mathcal{I}\cup C_2^\mathcal{I}\\ (\lnot C)^\mathcal{I}&=&{\varDelta }^\mathcal{I}\setminus C^\mathcal{I}\\ (\forall R.C)^\mathcal{I}&=&\{x\in {\varDelta }^\mathcal{I}\mid R^\mathcal{I}(x)\subseteq C^\mathcal{I}\}\\ (\exists R.C)^\mathcal{I}&=&\{x\in {\varDelta }^\mathcal{I}\mid R^\mathcal{I}(x)\cap C^\mathcal{I}\ne \emptyset \} \end{array} $$

The satisfaction of an axiom \(E\) in an interpretation \(\mathcal{I}= ({\varDelta }^\mathcal{I}, \cdot ^\mathcal{I})\), denoted by \(\mathcal{I}\models E\), is defined as follows: (1) \(\mathcal{I}\models C\sqsubseteq D\) iff \(C^\mathcal{I}\subseteq D^\mathcal{I}\), (2) \(\mathcal{I}\models a : C\) iff \(a^\mathcal{I}\in C^\mathcal{I}\), (3) \(\mathcal{I}\models (a, b) : R\) iff \((a^\mathcal{I}, b^\mathcal{I}) \in R^\mathcal{I}\), (4) \(\mathcal{I}\models a = b\) iff \(a^\mathcal{I}= b^\mathcal{I}\), (5) \(\mathcal{I}\models a \ne b\) iff \(a^\mathcal{I}\ne b^\mathcal{I}\). \(\mathcal{I}\) satisfies a set of axioms \(\mathcal {E}\), denoted by \(\mathcal{I}\models \mathcal {E}\), iff \(\mathcal{I}\models E\) for all \(E \in \mathcal {E}\). An interpretation \(\mathcal{I}\) satisfies a knowledge base \(\mathcal{K}= (\mathcal{T}, \mathcal{A})\), denoted \(\mathcal{I}\models \mathcal{K}\), iff \(\mathcal{I}\) satisfies \(\mathcal{T}\) and \(\mathcal{A}\). In this case we say that \(\mathcal{I}\) is a model of \(\mathcal{K}\).

In the following we describe \(\mathcal {SHOIN}(\mathbf {D})\), showing what it adds to \(\mathcal {ALC}\). A role is either an atomic role \(R \in \mathbf {R}\) or the inverse \(R^-\) of an atomic role \(R \in \mathbf {R}\). We use \({\mathbf R^-}\) to denote the set of all inverses of roles in \(\mathbf {R}\). An RBox \(\mathcal{R}\) consists of a finite set of transitivity axioms \(Trans(R)\), where \(R \in \mathbf {R}\), and role inclusion axioms \(R\sqsubseteq S\), where \(R, S \in \mathbf {R} \cup \mathbf {R}^{-}\).

If \(a\in \mathbf {I}\), then \(\{a\}\) is a concept called nominal, and if \(C\), \(C_1\) and \(C_2\) are concepts and \(R \in \mathbf {R}\cup \mathbf {R}^-\), then \(\ge nR\) and \(\le nR\) for an integer \(n\ge 0\) are also concepts. A \(\mathcal {SHOIN}(\mathbf {D})\) KB \(\mathcal{K}= (\mathcal{T}, \mathcal{R}, \mathcal{A})\) consists of a TBox \(\mathcal{T}\), an RBox \(\mathcal{R}\) and an ABox \(\mathcal{A}\).

The mapping \(\cdot ^\mathcal{I}\) is extended to all new concepts (where \(\#X\) denotes the cardinality of the set \(X\)) as:

$$ \begin{array}{rcl} (R^-)^\mathcal{I}&=&\{(y,x)\mid (x,y)\in R^\mathcal{I}\}\\ \{a\}^\mathcal{I}&=&\{a^\mathcal{I}\}\\ (\ge n R)^\mathcal{I}&=&\{x\in {\varDelta }^\mathcal{I}\mid \#R^\mathcal{I}(x)\ge n\}\\ (\le n R)^\mathcal{I}&=&\{x\in {\varDelta }^\mathcal{I}\mid \#R^\mathcal{I}(x)\le n\} \end{array} $$

\(\mathcal {SHOIN}(\mathbf {D})\) allows the definition of datatype roles, i.e., roles that map an individual to an element of a datatype such as integers, floats, etc. Then new concept definitions involving datatype roles are added that mirror those involving roles introduced above. We also assume that we have predicates over the datatypes.

The satisfaction of an axiom \(E\) in an interpretation \(\mathcal{I}= ({\varDelta }^\mathcal{I}, \cdot ^\mathcal{I})\), denoted by \(\mathcal{I}\models E\), is defined as for \(\mathcal {ALC}\), plus the following ones regarding RBox axioms: (6) \(\mathcal{I}\models Trans(R)\) iff \(R^\mathcal{I}\) is transitive, (7) \(\mathcal{I}\models R\sqsubseteq S\) iff \(R^\mathcal{I}\subseteq S^\mathcal{I}\). An interpretation \(\mathcal{I}\) satisfies a knowledge base \(\mathcal{K}= (\mathcal{T},\mathcal{R}, \mathcal{A})\), denoted \(\mathcal{I}\models \mathcal{K}\), iff \(\mathcal{I}\) satisfies \(\mathcal{T}\), \(\mathcal{R}\) and \(\mathcal{A}\). In this case we say that \(\mathcal{I}\) is a model of \(\mathcal{K}\).

A DL is decidable if the problem of checking the satisfiability of a KB is decidable. In particular, \(\mathcal {SHOIN}(\mathbf {D})\) is decidable iff there are no number restrictions on non-simple roles. A role is non-simple iff it is transitive or it has transitive subroles.

A query \(Q\) over a KB \(\mathcal{K}\) is usually an axiom for which we want to test the entailment from the KB, written \(\mathcal{K}\models Q\). The entailment test may be reduced to checking the unsatisfiability of a concept in the knowledge base, i.e., the emptiness of the concept. For example, the entailment of the axiom \(C\sqsubseteq D\) may be tested by checking the unsatisfiability of the concept \(C \sqcap \lnot D\).

Example 1

The following KB is inspired by the ontology people+pets [28]:

$$\begin{array}{l} \exists hasAnimal.Pet \sqsubseteq NatureLover\\ { fluffy }: Cat \\ tom: Cat \\ Cat\sqsubseteq Pet\\ (kevin,{ fluffy }):hasAnimal \\ (kevin,tom):hasAnimal \end{array}$$

It states that individuals that own an animal which is a pet are nature lovers and that \(kevin\) owns the animals \({ fluffy }\) and \(tom\). Moreover, \({ fluffy }\) and \(tom\) are cats and cats are pets. The query \(Q=kevin:NatureLover\) is entailed by the KB.

3 The DISPONTE Semantics

DISPONTE [37] applies the distribution semantics [39] of probabilistic logic programming to DLs. A program following this semantics defines a probability distribution over normal logic programs called worlds. Then the distribution is extended to queries and the probability of a query is obtained by marginalizing the joint distribution of the query and the programs.

In DISPONTE, a probabilistic knowledge base \(\mathcal{K}\) contains a set of probabilistic axioms which take the form

$$ p~{::}~E \qquad \qquad (1) $$

where \(p\) is a real number in \([0,1]\) and \(E\) is a DL axiom.

The idea of DISPONTE is to associate an independent Boolean random variable with each probabilistic axiom. To obtain a world \(w\), we decide whether or not to include each probabilistic axiom in \(w\). A world is therefore a non-probabilistic KB that can be assigned a semantics in the usual way. A query is entailed by a world if it is true in every model of the world.

The probability \(p\) can be interpreted as an epistemic probability, i.e., as the degree of our belief in axiom \(E\). For example, a probabilistic concept membership axiom \( p~{::}~a:C \) means that we have degree of belief \(p\) in \(C(a)\). A probabilistic concept inclusion axiom of the form \( p~{::}~C\sqsubseteq D \) represents the fact that we believe in the truth of \(C \sqsubseteq D\) with probability \(p\).

Formally, an atomic choice is a couple \((E_i,k)\) where \(E_i\) is the \(i\)th probabilistic axiom and \(k\in \{0,1\}\). \(k\) indicates whether \(E_i\) is chosen to be included in a world (\(k\) = 1) or not (\(k = 0\)). A composite choice \(\kappa \) is a consistent set of atomic choices, i.e., \((E_i,k)\in \kappa , (E_i,m)\in \kappa \) implies \(k=m\) (only one decision is taken for each axiom). The probability of a composite choice \(\kappa \) is \(P(\kappa )=\prod _{(E_i,1)\in \kappa }p_i\prod _{(E_i, 0)\in \kappa } (1-p_i)\), where \(p_i\) is the probability associated with axiom \(E_i\). A selection \(\sigma \) is a total composite choice, i.e., it contains an atomic choice \((E_i,k)\) for every axiom of the theory. A selection \(\sigma \) identifies a theory \(w_\sigma \) called a world in this way: \(w_\sigma =\{E_i|(E_i,1)\in \sigma \}\). Let us indicate with \(\mathcal {S}_\mathcal{K}\) the set of all selections and with \(\mathcal {W}_\mathcal{K}\) the set of all worlds. The probability of a world \(w_\sigma \) is \(P(w_\sigma )=P(\sigma )=\prod _{(E_i,1)\in \sigma }p_i\prod _{(E_i, 0)\in \sigma } (1-p_i)\). \(P(w_\sigma )\) is a probability distribution over worlds, i.e., \(\sum _{w\in \mathcal {W}_\mathcal{K}}P(w)=1\).
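For instance, consider a KB containing only the two probabilistic axioms \(0.6~{::}~Cat\sqsubseteq Pet\) and \(0.4~{::}~tom:Cat\) (the probability values are chosen here purely for illustration). There are four selections and thus four worlds, with probabilities

$$\begin{array}{ll} P(\{Cat\sqsubseteq Pet,\ tom:Cat\})=0.6\cdot 0.4=0.24, &{} P(\{Cat\sqsubseteq Pet\})=0.6\cdot 0.6=0.36,\\ P(\{tom:Cat\})=0.4\cdot 0.4=0.16, &{} P(\emptyset )=0.4\cdot 0.6=0.24, \end{array}$$

which sum to 1. The query \(tom:Pet\) is entailed only by the first world, so \(P(tom:Pet)=0.24\).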

We can now assign probabilities to queries. Given a world \(w\), the probability of a query \(Q\) is defined as \(P(Q|w)=1\) if \(w\models Q\) and 0 otherwise. The probability of a query can be defined by marginalizing the joint probability of the query and the worlds:

$$ P(Q)=\sum _{w\in \mathcal {W}_\mathcal{K}}P(Q,w)=\sum _{w\in \mathcal {W}_\mathcal{K}} P(Q|w)P(w)=\sum _{w\in \mathcal {W}_\mathcal{K}: w\models Q}P(w) \qquad \qquad (2) $$

4 Querying KBs

In order to answer queries to DL KBs, a tableau algorithm [42] can be used. Such an algorithm decides whether an axiom is entailed by a KB by refutation: axiom \(E\) is entailed if the KB extended with \(\lnot E\) has no model. The algorithm works on completion graphs, also called tableaux: they are ABoxes that can also be seen as graphs, where each node represents an individual \(a\) and is labeled with the set of concepts \(\mathcal {L}(a)\) it belongs to. Each edge \(\langle a,b\rangle \) in the graph is labeled with the set of roles \(\mathcal {L}(\langle a,b\rangle )\) to which the couple \((a,b)\) belongs. The algorithm starts from a tableau that contains the ABox of the KB and the negation of the axiom to be proved. For example, if the query is a membership one, \(C(a)\), it adds \(\lnot C\) to the label of \(a\). If we query for the emptiness (unsatisfiability) of a concept \(C\), the algorithm adds a new anonymous node \(a\) to the tableau and adds \(C\) to the label of \(a\). The axiom \(C \sqsubseteq D\) can be proved by showing that \(C \sqcap \lnot D\) is unsatisfiable. The algorithm repeatedly applies a set of consistency preserving tableau expansion rules (see [35] for a list of expansion rules for \(\mathcal {SHOIN}(\mathbf {D})\)) until a clash (i.e., a contradiction) is detected or a clash-free graph is found to which no more rules are applicable.
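For instance, consider the query \(Q=kevin:NatureLover\) of Example 1 (a sketch of the derivation; the exact rule applications depend on the expansion strategy). The algorithm adds \(\lnot NatureLover\) to \(\mathcal {L}(kevin)\). From \({ fluffy }:Cat\) and \(Cat\sqsubseteq Pet\), the expansion rules add \(Pet\) to \(\mathcal {L}({ fluffy })\). The inclusion axiom \(\exists hasAnimal.Pet\sqsubseteq NatureLover\) then leads, in every non-deterministic branch, either to \(\forall hasAnimal.\lnot Pet\), which propagates \(\lnot Pet\) to \(\mathcal {L}({ fluffy })\) and clashes with \(Pet\), or to \(NatureLover\), which clashes with \(\lnot NatureLover\) in \(\mathcal {L}(kevin)\). Since all completion graphs contain a clash, \(Q\) is entailed.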

Some of the rules used by the tableau algorithm are non-deterministic, i.e., they generate a finite set of tableaux. Thus the algorithm keeps a set of tableaux \(T\). If a non-deterministic rule is applied to a graph \(G\) in \(T\), then \(G\) is replaced by the resulting set of graphs.

An \(event\) during the execution of the algorithm can be [18]: (1) \(Add(C,a)\), the addition of a concept \(C\) to \(\mathcal {L}(a)\); (2) \(Add(R,\langle a,b\rangle )\), the addition of a role \(R\) to \(\mathcal {L}(\langle a,b\rangle )\); (3) \(Merge(a,b)\), the merging of the nodes \(a\), \(b\); (4) \({\ne }(a,b)\), the addition of the inequality \(a{\ne }b\) to the relation \({\ne }\); (5) \(Report(g)\), the detection of a clash \(g\). We use \(\mathcal {E}\) to denote the set of events recorded during the execution of the algorithm. A clash is either:

  • a couple \((C,a)\) where \(C\) and \(\lnot C\) are present in the label of node \(a\), i.e. \(\{C,\lnot C\} \subseteq \mathcal {L}(a)\);

  • a couple \((Merge(a,b), {\ne }(a,b))\), where the events \(Merge(a,b)\) and \({\ne }(a,b)\) belong to \(\mathcal {E}\).

Each time a clash is detected in a completion graph \(G\), the algorithm stops applying rules to \(G\). Once every completion graph in \(T\) contains a clash or no more expansion rules can be applied to it, the algorithm terminates. If all the completion graphs in the final set \(T\) contain a clash, the algorithm returns unsatisfiable as no model can be found. Otherwise, any one clash-free completion graph in \(T\) represents a possible model for \(C(a)\) and the algorithm returns satisfiable.

In order to perform probabilistic inference, we need not only to answer queries but also to compute explanations for queries. In fact, computing the probability of a query by generating the worlds of the KB would be impractical as there is an exponential number of them. By computing explanations, we find compact representations of the set of worlds where the query is true, as shown below.

4.1 Finding Explanations

The problem of finding explanations for a query has been investigated by various authors [13, 18–20, 41]. It was called axiom pinpointing in [41] and considered as a non-standard reasoning service useful for tracing derivations and debugging ontologies. In particular, Schlobach and Cornet [41] define minimal axiom sets or MinAs for short.

Definition 1

(MinA). Let \(\mathcal{K}\) be a knowledge base and \(Q\) an axiom that follows from it, i.e., \(\mathcal{K}\models Q\). We call a set \(M\subseteq \mathcal{K}\) a minimal axiom set or MinA for \(Q\) in \(\mathcal{K}\) if \(M \models Q\) and it is minimal w.r.t. set inclusion.

We also call a MinA an explanation. The problem of enumerating all MinAs is called min-a-enum in [41]. All-MinAs(\(Q,\mathcal{K}\)) is the set of all MinAs for query \(Q\) in the knowledge base \(\mathcal{K}\).

We report here the techniques used by Pellet [44] to compute explanations for queries. Pellet first finds a single MinA by using a modified version of the tableau algorithm and then finds the others with a black box method: axioms are iteratively removed from the KB and new MinAs are computed until all possible MinAs have been found. The modified tableau algorithm is shown in Algorithm 1.

Algorithm 1. The modified tableau algorithm.

In this algorithm, each expansion rule also updates a tracing function \(\tau \), which associates sets of axioms with events in the derivation. For example, \(\tau (Add(C,a))\) (\(\tau (Add(R,\langle a,b \rangle ))\)) is the set of axioms needed to explain the event \(Add(C,a)\) (\(Add(R,\langle a,b \rangle )\)). For the sake of brevity, we define \(\tau \) for couples (concept, individual) and (role, couple of individuals) as \(\tau (C,a)=\tau (Add(C,a))\) and \(\tau (R,\langle a,b \rangle )=\tau (Add(R,\langle a,b \rangle ))\) respectively. The function \(\tau \) is initialized as the empty set for all the elements of its domain, except for \(\tau (C,a)\) and \(\tau (R,\langle a,b\rangle )\), to which the values \(\{a:C\}\) and \(\{(a,b):R\}\) are assigned if \(a:C\) and \((a,b):R\) are in the ABox respectively. The expansion rules add axioms to the values of \(\tau \).

If \(g_1,\ldots ,g_n\) are the clashes, one for each tableau of the final set, the output of the algorithm Tableau is \(S = \bigcup _{i\in \{1,...,n\}}\tau (g_i)\setminus \{C(a)\}\) where \(a\) is the anonymous individual initially assigned to \(C\).

Tableau returns a single MinA. To solve min-a-enum, Pellet uses the hitting set algorithm [31]. The algorithm, described in detail in [18], starts from a MinA \(S\) and initializes a labeled tree called Hitting Set Tree (HST) with \(S\) as the label of its root \(v\). Then it selects an arbitrary axiom \(E\) in \(S\), removes it from \(\mathcal{K}\), generating a new knowledge base \(\mathcal{K}' = \mathcal{K}- \{E\}\), and tests the unsatisfiability of \(C\) w.r.t. \(\mathcal{K}'\). If \(C\) is still unsatisfiable, we obtain a new explanation. The algorithm adds a new node \(w\) and a new edge \(\langle v,w\rangle \) to the tree, then it assigns this new explanation to the label of \(w\) and the axiom \(E\) to the label of the edge. The algorithm repeats this process until the unsatisfiability test returns negative: in that case the algorithm labels the new node with \(OK\), makes it a leaf, backtracks to a previous node, selects a different axiom to be removed from the KB and repeats these operations until the HST is fully built. The algorithm also eliminates extraneous unsatisfiability tests based on previous results: once a path leading to a node labeled \(OK\) is found, any superset of that path is guaranteed to be a path leading to a node where \(C\) is satisfiable, and thus no additional unsatisfiability test is needed for that path, as indicated by an \(X\) in the node label. When the HST is fully built, all leaves of the tree are labeled with \(OK\) or \(X\). The distinct non-leaf nodes of the tree collectively represent the set All-MinAs\((C,\mathcal{K})\).
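For instance, for the query \(Q=kevin:NatureLover\) of Example 1, suppose the first MinA found is \(\{(kevin,{ fluffy }):hasAnimal,\ { fluffy }:Cat,\ Cat\sqsubseteq Pet,\ \exists hasAnimal.Pet\sqsubseteq NatureLover\}\). Removing \((kevin,{ fluffy }):hasAnimal\) from the KB still leaves \(Q\) entailed through \(tom\), yielding a second MinA, whereas removing \(Cat\sqsubseteq Pet\) makes \(Q\) no longer entailed, so that node is labeled \(OK\).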

In [3, 4], Baader and Peñaloza introduced the problem of finding a pinpointing formula instead of All-MinAs(\(Q,\mathcal{K}\)) for queries. The pinpointing formula is a monotone Boolean formula in which each Boolean variable corresponds to an axiom of the KB. The formula is built using the variables and the conjunction and disjunction connectives, and it compactly encodes the set of all MinAs. Let us assume that each axiom \(E\) of a KB \(\mathcal{K}\) is associated with a propositional variable, indicated with \(var(E)\). The set of all propositional variables is indicated with \(var(\mathcal{K})\). A valuation \(\nu \) of a monotone Boolean formula is the set of propositional variables that are true. For a valuation \(\nu \subseteq var(\mathcal{K})\), let \(\mathcal{K}_{\nu } := \{t \in \mathcal{K}\mid var(t)\in \nu \}\).

Definition 2

(Pinpointing formula). Given a query \(Q\) and a KB \(\mathcal{K}\), a monotone Boolean formula \(\phi \) over \(var(\mathcal{K})\) is called a pinpointing formula for \(Q\) if for every valuation \(\nu \subseteq var(\mathcal{K})\) it holds that \(\mathcal{K}_{\nu } \models Q\) iff \(\nu \) satisfies \(\phi \).

In Lemma 2.4 of [4], the authors proved that we can obtain all MinAs from a pinpointing formula by transforming the formula into DNF and removing disjuncts implying other disjuncts. The example below illustrates axiom pinpointing and the pinpointing formula.

Example 2

(Pinpointing formula). Consider the KB of Example 1. We associate Boolean variables with axioms as follows: \(F_1 = \exists hasAnimal.Pet \sqsubseteq NatureLover\), \(F_2=(kevin,{ fluffy }):hasAnimal\), \(F_3=(kevin,tom):hasAnimal\), \(F_4={ fluffy }: Cat\), \(F_5=tom: Cat\) and \(F_6= Cat\sqsubseteq Pet\). Let \(Q=kevin:NatureLover\) be the query, then All-MinAs(\(Q,\mathcal{K}\)) \(= \{\{F_2, F_4, F_6, F_1\},\{F_3, F_5, F_6, F_1\}\}\), while the pinpointing formula is \(((F_2 \wedge F_4) \vee (F_3 \wedge F_5))\wedge F_6 \wedge F_1\).

A tableau algorithm can be modified to find the pinpointing formula. See [4] for the details.

4.2 Probabilistic Inference

We do not have to generate all the worlds where a query is true in order to compute its probability: finding a pinpointing formula is enough.

From a pinpointing formula \(\phi \) for \(Q\) we can compute the probability \(P(\phi )\) of \(\phi \) being true from the probability of the Boolean variables that appear in \(\phi \) assuming all the variables are independent. \(P(\phi )\) is the sum of the probabilities of the valuations that make the formula true. The probability of a valuation is given by

$$ P(\nu )=\prod _{var(E_i)\in \nu }p_i\prod _{var(E_i)\in var(\mathcal{K})\setminus \nu }(1-p_i) $$

where \(p_i\) is the probability associated with axiom \(E_i\). Computing \(P(\phi )\) is equivalent to performing weighted model counting [38]: each variable \(var(E_i)\) has a weight \(p_i\) when set to true and a weight \(1 - p_i\) when set to false, the weight of a truth assignment is the product of the weights of its literals and the weighted model count of a formula is the sum of the weights of its satisfying assignments.
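For instance, take the pinpointing formula of Example 2, \(\phi =((F_2 \wedge F_4) \vee (F_3 \wedge F_5))\wedge F_6 \wedge F_1\), and assume, purely for illustration, that the axioms are annotated with \(p_1=0.9\), \(p_4=0.4\), \(p_5=0.3\) and \(p_6=0.6\), while the two role assertions \(F_2\) and \(F_3\) are certain (probability 1). Then \(P(\phi )=(p_4+p_5-p_4 p_5)\cdot p_6\cdot p_1=(0.4+0.3-0.12)\cdot 0.6\cdot 0.9=0.3132\), where the first factor exploits the independence of \(F_4\) and \(F_5\).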

Theorem 1

If \(\phi \) is a pinpointing formula for the query \(Q\) from a KB \(\mathcal{K}\), then \(P(Q)=P(\phi )\).

Proof

Every valuation \(\nu \subseteq var(\mathcal{K})\) that satisfies \(\phi \) uniquely corresponds to a world where \(Q\) is true. Thus the sum of the probabilities of the valuations that satisfy \(\phi \) is equal to the sum of the probabilities of the worlds where \(Q\) is true.

The pinpointing formula can be obtained directly from the inference algorithm or can be built starting from the set of all explanations \(K={\textsc {All}}\)-\({\textsc {MinAs}}(Q,\mathcal{K})\) as follows:

$$\phi _K=\bigvee _{\kappa \in K}\bigwedge _{E_i\in \kappa }var(E_i).$$

It is easy to see that every valuation that makes \(\phi _K\) true uniquely corresponds to a world where \(Q\) is true. \(\phi _K\) is in Disjunctive Normal Form (DNF).
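For Example 2, \(\phi _K=(F_2\wedge F_4\wedge F_6\wedge F_1)\vee (F_3\wedge F_5\wedge F_6\wedge F_1)\), which is logically equivalent to the pinpointing formula \(((F_2 \wedge F_4) \vee (F_3 \wedge F_5))\wedge F_6 \wedge F_1\) given above.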

Weighted model counting is a #P-complete problem [11]. A practical approach for solving it involves knowledge compilation [8]: we translate the formula to a target language that allows weighted model counting in polynomial time. In this case the complexity is confined to the compilation process.

5 BUNDLE

BUNDLE is based on Pellet [44] and extends it in order to allow the computation of the probability of queries from a probabilistic knowledge base that follows the DISPONTE semantics. BUNDLE can answer concept and role membership queries and subsumption queries, and can find explanations both for the unsatisfiability of one or all concepts contained in the KB and for the inconsistency of a KB.

Irrespective of which representation of the explanations we choose, a DNF or a general pinpointing formula, we can apply knowledge compilation and transform it into a Binary Decision Diagram (BDD), from which we can compute the probability (perform weighted model counting) of the query with a dynamic programming algorithm that is linear in the size of the BDD.

A BDD for a function of Boolean variables is a rooted graph that has one level for each Boolean variable. A node \(n\) in a BDD has two children: one corresponding to the 1 value of the variable associated with the level of \(n\), indicated with \(child_1(n)\), and one corresponding to the 0 value of the variable, indicated with \(child_0(n)\). When drawing BDDs, the 0-branch - the one going to \(child_0(n)\) - is distinguished from the 1-branch by drawing it with a dashed line. The leaves store either 0 or 1. Figure 1 shows a BDD for the function \( f(\mathbf {X})=(X_{1}\wedge X_{3})\,\vee \,(X_{2}\wedge X_{3}) \), where the variables \(\mathbf {X}=\{X_{1},X_{2},X_{3}\}\) are independent Boolean random variables whose probability of being true is \(p_i\) for the variable \(X_i\).

Fig. 1. BDD representing the function \(f(\mathbf {X})=(X_{1}\wedge X_{3})\vee (X_{2}\wedge X_{3})\).

A BDD performs a Shannon expansion of the Boolean formula \(f(\mathbf {X})\): if \(X\) is the variable associated with the root level of a BDD, the formula \(f(\mathbf {X})\) can be represented as \(f(\mathbf {X})=X\wedge f^X(\mathbf {X})\vee \overline{X}\wedge f^{\overline{X}}(\mathbf {X})\), where \(f^X(\mathbf {X})\) (\(f^{\overline{X}}(\mathbf {X})\)) is the formula obtained from \(f(\mathbf {X})\) by setting \(X\) to 1 (0). Since the two disjuncts are mutually exclusive, the probability of \(f(\mathbf {X})\) can be computed as \(P(f(\mathbf {X}))=P(X)P( f^X(\mathbf {X}))+(1-P(X))P(f^{\overline{X}}(\mathbf {X}))\). Algorithm 2 shows the function Prob that implements the dynamic programming algorithm for computing the probability of a formula encoded as a BDD. The function should also store the values of already visited nodes in a table so that, if a node is visited again, its probability can be retrieved from the table. For the sake of simplicity the algorithm does not show this optimization, but it is fundamental to achieve a cost linear in the number of nodes: without it, the cost of function Prob would be proportional to \(2^n\), where \(n\) is the number of Boolean variables.

Algorithm 2. Function Prob: dynamic programming computation of the probability of a formula encoded as a BDD.
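To illustrate the idea behind Prob, the following Prolog sketch computes the probability of a BDD represented, purely for illustration (this is not the representation used by BUNDLE or CUDD), as ground terms node(Var,Child1,Child0) with leaves one and zero; var_prob/2 is an assumed table associating each variable with its probability, and tabling plays the role of the table of visited nodes mentioned above:

    :- table prob/2.                      % memoize visited nodes

    % prob(+BDD, -P): P is the probability of the encoded function.
    prob(one,  1.0).                      % leaf storing 1
    prob(zero, 0.0).                      % leaf storing 0
    prob(node(Var, Child1, Child0), P) :-
        var_prob(Var, PV),                % P(Var = 1)
        prob(Child1, P1),                 % probability of the 1-branch
        prob(Child0, P0),                 % probability of the 0-branch
        P is PV * P1 + (1 - PV) * P0.     % Shannon expansion

    % Example: the BDD of Fig. 1 with assumed probabilities.
    var_prob(x1, 0.4).  var_prob(x2, 0.3).  var_prob(x3, 0.6).
    % ?- N3 = node(x3, one, zero),
    %    prob(node(x1, N3, node(x2, N3, zero)), P).
    % P = 0.348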

The main BUNDLE function, shown in Algorithm 3, first builds a data structure \(PMap\) that associates each DL axiom \(E_i\) with its probability \(p_i\). In the OWL files the probabilistic information is specified using the annotation system allowed by the OWL language. Then BUNDLE uses Pellet’s \(\textsc {ExpHST}(C,\mathcal{K})\) function that computes all the MinAs for the unsatisfiability of a concept \(C\) using the Hitting Set Tree algorithm. BUNDLE exploits the version of this function in which we can specify the maximum number of explanations to be found.

Algorithm 3. Function BUNDLE.

Two data structures are initialized: \(VarAx\) is an array that maintains the association between Boolean random variables (whose index is the array index) and couples (axiom, probability), and \(BDD\) stores a BDD. \(BDD\) is initialized to the zero Boolean function.

Then BUNDLE performs two nested loops that build a BDD representing the pinpointing formula in DNF. To manipulate BDDs we used JavaBDD, a Java interface to a number of underlying BDD manipulation packages; as the underlying package we used CUDD.

In the outer loop, BUNDLE combines BDDs for different explanations. In the inner loop, BUNDLE generates the BDD for a single explanation.

In the outer loop, \(BDDE\) is initialized to the “one” Boolean function. In the inner loop, the axioms of each MinA are considered one by one. The value \(p\) associated with the axiom is extracted from \(PMap\). The axiom is searched for in \(VarAx\) to see whether it was already assigned a random variable. If not, a cell is added to \(VarAx\) to store the couple. At this point we know the position \(i\) of the couple in \(VarAx\) and thus the index of its Boolean variable \(X_i\). We obtain a BDD representing \(X_i=1\) with BDDGetIthVar and conjoin it with \(BDDE\); at the end of the inner loop, \(BDDE\) is disjoined with \(BDD\). After the two loops, function Prob of Algorithm 2 is called on \(BDD\) and its result is returned to the user.

6 TRILL

TRILL implements a tableau algorithm that computes the pinpointing formula representing the set of MinAs. After generating the pinpointing formula, TRILL converts it into a BDD and computes the probability of the query. TRILL can answer concept and role membership queries and subsumption queries, and can find explanations both for the unsatisfiability of a concept contained in the KB and for the inconsistency of the entire KB. TRILL was implemented in Prolog, so the management of the non-determinism of the rules is delegated to this language.

We use the Thea2 library [45] for converting OWL DL KBs into Prolog. Thea2 performs a direct translation of the OWL axioms into Prolog facts. For example, a simple subclass axiom between two named classes, \(Cat\sqsubseteq Pet\), is written using the subClassOf/2 predicate as subClassOf('Cat','Pet'). For more complex axioms, Thea2 exploits the list construct of Prolog, so the axiom \(NatureLover \equiv PetOwner \sqcup GardenOwner\) becomes equivalentClasses(['NatureLover', unionOf(['PetOwner', 'GardenOwner'])]). When a probabilistic KB is given as input, for each probabilistic axiom of the form \(Prob~{::}~Axiom\) a fact p(Axiom,Prob) is asserted in the Prolog KB.
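For instance, the certain KB of Example 1 would be translated into facts along the following lines (a sketch: the exact predicate names and term structure produced by Thea2 may differ):

    % Thea2-style Prolog rendering of the KB of Example 1 (illustrative).
    subClassOf(someValuesFrom('hasAnimal', 'Pet'), 'NatureLover').
    subClassOf('Cat', 'Pet').
    classAssertion('Cat', 'fluffy').
    classAssertion('Cat', 'tom').
    propertyAssertion('hasAnimal', 'kevin', 'fluffy').
    propertyAssertion('hasAnimal', 'kevin', 'tom').

    % If Cat ⊑ Pet were annotated with probability 0.6, TRILL would
    % additionally assert: p(subClassOf('Cat','Pet'), 0.6).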

In order to represent the tableau, TRILL uses a couple \(Tableau=(A, T)\), where \(A\) is a list containing information about nominal individuals and class assertions with the corresponding value of the pinpointing formula, while \(T\) is a triple (\(G\), \(RBN\), \(RBR\)) in which \(G\) is a directed graph that contains the structure of the tableau, \(RBN\) is a red-black tree (a key-value dictionary) in which a key is a couple of individuals and its value is the set of labels of the edge between the two individuals, and \(RBR\) is a red-black tree in which a key is a role and its value is the set of couples of individuals linked by the role. This representation allows the information needed during the execution of the tableau algorithm to be found quickly. For managing the blocking system we use a predicate for each blocking state: nominal/2, blockable/2, blocked/2, indirectly_blocked/2 and safe/3. Each predicate takes as arguments the individual \(Ind\) and the tableau \((A, T)\); safe/3 also takes the role \(R\) as input. For each individual \(Ind\) in the ABox we add the atom \(nominal(Ind)\) to \(A\); then, every time we have to check the blocking status of an individual, we call the corresponding predicate, which returns the status by inspecting the tableau.

Deterministic and non-deterministic tableau expansion rules are treated differently. Non-deterministic rules are implemented by a predicate \(rule\_name(Tab, TabList)\) that, given the current tableau \(Tab\), returns the list of tableaux \(TabList\) created by the application of the rule on \(Tab\), while deterministic rules are implemented by a predicate \(rule\_name(Tab,Tab1)\) that, given the current tableau \(Tab\), returns the tableau \(Tab1\) obtained by the application of the rule on \(Tab\).

Expansion rules are applied in order by apply_all_rules/2, first the non-deterministic ones and then the deterministic ones. The predicate apply_nondet_rules(RuleList,Tab,Tab1) takes as input the list of non-deterministic rules and the current tableau and returns a tableau obtained by the application of one of the rules. apply_nondet_rules/3 is called as apply_nondet_rules([or_rule],Tab,Tab1) and is shown in Fig. 2. If a non-deterministic rule is applicable, the list of tableaux obtained by its application is returned by the predicate corresponding to the applied rule, a cut is performed to avoid backtracking to other rule choices and a tableau from the list is non-deterministically chosen with the member/2 predicate.

Fig. 2. Definition of the non-deterministic expansion rules by means of the predicates apply_all_rules/2 and apply_nondet_rules/3.
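A plausible reconstruction of these predicates from the description above (a sketch under the stated behavior, not the exact TRILL source; the list of deterministic rule names is assumed) is:

    % Apply rules until no rule changes the tableau (fixpoint).
    apply_all_rules(Tab, Applied) :-
        apply_nondet_rules([or_rule], Tab, Applied1),
        ( Tab = Applied1 ->
            Applied = Applied1                  % nothing changed: stop
        ;   apply_all_rules(Applied1, Applied)  % new round of rules
        ).

    % Try non-deterministic rules first, then deterministic ones.
    apply_nondet_rules([], Tab, Applied) :-
        apply_det_rules([and_rule, unfold_rule, add_exists_rule, forall_rule],
                        Tab, Applied).
    apply_nondet_rules([Rule | _Rules], Tab, Applied) :-
        Call =.. [Rule, Tab, TabList],
        call(Call), !,                 % rule applicable: cut away other rules
        member(Applied, TabList).      % non-deterministic choice of a tableau
    apply_nondet_rules([_Rule | Rules], Tab, Applied) :-
        apply_nondet_rules(Rules, Tab, Applied).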

If no non-deterministic rule is applicable, deterministic rules are tried sequentially with the predicate apply_det_rules/3, shown in Fig. 3, which is called as apply_det_rules(RuleList,Tab,Tab1). It takes as input the list of deterministic rules and the current tableau and returns a tableau obtained with the application of one of the rules.

After the application of a deterministic rule, a cut avoids backtracking to other possible choices for the deterministic rules. If no rule is applicable, the input tableau is returned and rule application stops, otherwise a new round of rule application is performed.

Fig. 3. Definition of the deterministic expansion rules by means of the predicate apply_det_rules/3.
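Analogously, apply_det_rules/3 can be reconstructed as follows (again a sketch, not the exact TRILL source):

    % Try deterministic rules in order; return the input tableau unchanged
    % if none is applicable, so that apply_all_rules/2 detects the fixpoint.
    apply_det_rules([], Tab, Tab).
    apply_det_rules([Rule | _Rules], Tab, Applied) :-
        Call =.. [Rule, Tab, Applied],
        call(Call), !.                 % rule applied: cut away the rest
    apply_det_rules([_Rule | Rules], Tab, Applied) :-
        apply_det_rules(Rules, Tab, Applied).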

Fig. 4. Code of the predicate build_bdd/2.

Once the pinpointing formula is built, TRILL builds the corresponding BDD by using the build_bdd/2 predicate, shown in Fig. 4, which takes as input a pinpointing formula and returns the corresponding BDD. It scans the pinpointing formula and, for each variable, it searches for the probabilistic axiom corresponding to the variable with the query p(Axiom,Prob). If the query succeeds, it creates the corresponding BDD and combines it with the BDD representing the pinpointing formula. Finally, it computes the probability of the query from the BDD thus built using the predicate compute_prob/2. The predicates one/1 and zero/1 return BDDs representing the Boolean constants 1 and 0; and/3 and or/3 execute Boolean operations between BDDs. get_var_n/4 returns the random variable associated with axiom X and the list of probabilities [Prob,ProbN], where \(\mathtt{ProbN }=1-\mathtt{Prob }\). equality/3 returns the BDD B associated with the expression VX=val, where VX is a random variable and val is 0 or 1. The predicate p/2 is used for specifying the association between axioms and probabilities, i.e., p(subClassOf('A','B'),0.9) asserts that the axiom \(A \sqsubseteq B\) is associated with a probability of \(0.9\). The predicates compute_prob/2, one/1, zero/1, and/3, or/3, get_var_n/4 and equality/3 are imported from a Prolog library of the cplint suite [33].
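Since Fig. 4 is not reproduced here, the following is a minimal sketch of how build_bdd/2 might look (not the actual TRILL code), assuming the pinpointing formula is a ground term built with and/2 and or/2 over axioms, and using a hypothetical helper axiom_bdd/2 that wraps the cplint predicates get_var_n/4 and equality/3 described above:

    % build_bdd(+Formula, -BDD): BDD encoding of the pinpointing formula.
    build_bdd(or(F1, F2), BDD) :- !,
        build_bdd(F1, B1),
        build_bdd(F2, B2),
        or(B1, B2, BDD).               % disjunction of the sub-BDDs
    build_bdd(and(F1, F2), BDD) :- !,
        build_bdd(F1, B1),
        build_bdd(F2, B2),
        and(B1, B2, BDD).              % conjunction of the sub-BDDs
    build_bdd(Axiom, BDD) :-
        p(Axiom, _Prob), !,            % probabilistic axiom
        axiom_bdd(Axiom, BDD).         % BDD for "its variable = 1"
    build_bdd(_Axiom, BDD) :-
        one(BDD).                      % certain axiom: constant 1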

7 Related Work

While there are many works that propose approaches for combining probability and DLs, there are relatively few inference algorithms. One of these is PRONTO [21] which, similarly to BUNDLE, is based on Pellet. PRONTO performs inference on P-\(\mathcal {SHIQ}(\mathbf {D})\) [25] KBs instead of DISPONTE. In these KBs the probabilistic part contains conditional constraints of the form \((D|C)[l,u]\) that informally mean “generally, if an object belongs to \(C\), then it belongs to \(D\) with a probability in the interval \([l,u]\)”. P-\(\mathcal {SHIQ}(\mathbf {D})\) uses probabilistic lexicographic entailment from probabilistic default reasoning and allows both terminological and assertional probabilistic knowledge about instances of concepts and roles. P-\(\mathcal {SHIQ}(\mathbf {D})\) is based on Nilsson’s probabilistic logic [27] that defines probabilistic interpretations instead of a single probability distribution over theories.

Differently from BUNDLE and PRONTO, reasoners written in Prolog can exploit Prolog’s backtracking facilities for performing the search. This has been observed in various works. Beckert and Posegga [6] proposed a tableau reasoner in Prolog for First Order Logic (FOL) based on free-variable semantic tableaux. However, the reasoner is not tailored to DLs.

Hustadt, Motik and Sattler [17] presented the KAON2 algorithm that exploits basic superposition, a refutational theorem proving method for FOL with equality, and a new inference rule, called decomposition, to reduce a \(\mathcal {SHIQ}\) KB into a disjunctive datalog program. DLog [24] is an ABox reasoning algorithm for the \(\mathcal {SHIQ}\) language that allows the content of the ABox to be stored externally in a database and answers instance check and instance retrieval queries by transforming the KB into a Prolog program.

Meissner [26] presented the implementation of a Prolog reasoner for the DL \(\mathcal {ALCN}\). This work was the basis of the work of Herchenröder [15], who considered \(\mathcal {ALC}\) and improved the work of Meissner by implementing heuristic search techniques to reduce the running time. Faizi [10] added to [15] the possibility of returning information about the steps executed during the inference process for queries, but still handled only \(\mathcal {ALC}\).

A different approach is that of Ricca et al. [32], who presented OntoDLV, a system for reasoning on a logic-based ontology representation language called OntoDLP. OntoDLP is an extension of (disjunctive) ASP and can interoperate with OWL. OntoDLV rewrites the OWL KB into the OntoDLP language, can retrieve information directly from external OWL ontologies and answers queries by using ASP.

TRILL differs from the previous works in the target description logic (\(\mathcal {ALC}\)) and in the fact that those reasoners do not return explanations for the given queries. Moreover, TRILL differs from DLog in particular in its ability to answer general queries instead of instance check and instance retrieval only.

8 Experiments

In this section, we evaluate the performance of TRILL and BUNDLE. We first compare BUNDLE with the publicly available version of PRONTO on four probabilistic ontologies. The experiments were performed on Linux machines with a 3.10 GHz Intel Xeon E5-2687W and 2 GB memory allotted to Java.

The first ontology is BRCA, which models breast cancer risk assessment. It contains a certain part and a probabilistic part. The tests were defined following [22]: we randomly sampled axioms from the probabilistic part of this ontology, which were then added to the certain part. Each sample was thus a probabilistic KB with the full certain part of the BRCA ontology and a subset of the probabilistic constraints. We varied the number of these constraints from 9 to 15 and, for each number, we generated 100 different consistent ontologies. In order to generate a query, an individual \(a\) is added to the ontology. \(a\) is randomly assigned to each class that appears in the sampled conditional constraints with probability 0.6. If the class is composite, as for example PostmenopausalWomanTakingTestosterone, \(a\) is assigned to the component classes rather than to the composite one. In the example above, \(a\) will be added to PostmenopausalWoman and WomanTakingTestosterone. The ontologies are then translated into DISPONTE by replacing the constraint \((D|C)[l,u]\) with the axiom \(u~{::}~C\sqsubseteq D\).

For each ontology we performed the query \(a:C\), where the class \(C\) is randomly selected among those that represent women under increased and lifetime risk, such as WomanUnderLifetimeBRCRisk and WomanUnderStronglyIncreasedBRCRisk. We then applied both BUNDLE and PRONTO to each generated test and measured the execution time and the memory used. Figure 5(a) shows the execution time averaged over the 100 KBs as a function of the number of probabilistic axioms and, similarly, Fig. 5(b) shows the average amount of memory used. As one can see, execution times are similar for small KBs, but the difference between the two reasoners rapidly increases for larger knowledge bases. BUNDLE’s memory usage is always less than 53 % of PRONTO’s.

Fig. 5. Comparison between BUNDLE and PRONTO on the BRCA KB.

The other three ontologies are an extract from the Cell ontology, which represents cell types of prokaryotic, fungal, and eukaryotic organisms, an extract from the NCI Thesaurus, which describes human anatomy, and an extract from the Teleost_anatomy ontology (Teleost for short), a multi-species anatomy ontology for teleost fishes. For each of these KBs we considered the versions of increasing size used in [23]: the authors added 250, 500, 750 and 1000 new probabilistic conditional constraints to the extract of the publicly available non-probabilistic version of each ontology. We converted these KBs into DISPONTE in the same way as for the BRCA ontology and created a set of 100 different random subclass queries for each KB, such as \(CL\_0000802 \sqsubseteq CL\_0000800\) for the Cell KB, \(NCI\_C32042 \sqsubseteq NCI\_C32890\) for the NCI Thesaurus and \(TAO\_0001102 \sqsubseteq TAO\_0000139\) for the Teleost KB. For generating the queries we built the hierarchy of each KB and randomly selected two classes connected in the hierarchy for each query, so that each query had at least one explanation.

In Table 1 we report, for each version of the datasets, the average execution time BUNDLE took to perform inference. In addition, for each KB we report the number of its non-probabilistic TBox axioms. On these datasets, PRONTO always terminated with an out-of-memory error.

Table 1. Average execution time for the queries to the Cell, Teleost and NCI KBs. The first column reports the size of the non-probabilistic TBox of each KB.
Table 2. Results of the experiments on BRCA, DBPedia, Biopax and Vicodi KBs in terms of average times for computing the probability of queries. The first column reports the size of the non-probabilistic TBox of each KB.

As can be seen, BUNDLE needs less memory and is faster than the publicly available version of PRONTO. BUNDLE can answer most queries in a few seconds and can manage larger KBs than PRONTO.

Finally, we tested TRILL’s performance in computing the probability of queries by comparing it with BUNDLE. The experiments were performed on a Linux machine with a 2.33 GHz Intel Dual Core E6550 and 2 GB memory allotted to Java. We considered four knowledge bases of varying complexity: BRCA, already used for the comparison with PRONTO; an extract of the DBPedia ontology obtained from Wikipedia; Biopax level 3, which models metabolic pathways; and Vicodi, which contains information about European history. For the tests, we used the DBPedia and Biopax KBs without ABox, while for BRCA and Vicodi we used a small ABox containing 1 individual for the former and 19 individuals for the latter. We added 50 probabilistic axioms to each KB. For BRCA we used the probabilistic axioms already created for the previous test, while for the other KBs we created the probabilistic axioms by randomly selecting certain axioms from them and associating a random probability. For each dataset we randomly created 100 different queries. In particular, for DBPedia and Biopax we created 100 subclass-of queries, while for the other KBs we created 80 subclass-of and 20 instance-of queries. Some examples of queries are \(Village \sqsubseteq PopulatedPlace\) for DBPedia, \(TransportWithBiochemicalReaction \sqsubseteq Entity\) for Biopax and Creator(Anthony-van-Dyck-is-Painter-in-Flanders) for Vicodi. The queries generated for the BRCA KB are similar to those used in the test of BUNDLE. For generating the subclass-of queries, we randomly selected two classes connected in the class hierarchy, while for the instance-of queries we randomly selected an individual \(a\) and a class to which \(a\) belongs by following the class hierarchy, starting from a class to which \(a\) explicitly belongs, so that each query had at least one explanation. Table 2 shows, for each ontology, the number of non-probabilistic axioms and the average time in seconds that TRILL and BUNDLE took for answering the queries.

These preliminary tests show that TRILL is sometimes able to outperform BUNDLE, thanks to the fact that translating the set of explanations into a DNF formula is not required. On DBPedia, however, its longer running time may be due to the lack of the optimizations that BUNDLE inherits from Pellet. This is evidence that a Prolog implementation of a Semantic Web tableau reasoner is feasible and may lead to a practical system.

9 Conclusions

In this paper we presented the DISPONTE semantics for probabilistic DLs, which is inspired by the distribution semantics of probabilistic logic programming. We also presented the systems BUNDLE and TRILL for reasoning on DISPONTE KBs and described their implementations. Both systems were tested on real-world datasets. The experiments show that BUNDLE uses less memory and is faster than the publicly available version of the probabilistic reasoner PRONTO and is able to manage larger KBs. Moreover, the results for TRILL show that Prolog is a viable language for implementing DL reasoning algorithms and that its performance is comparable with that of a state-of-the-art reasoner. Both TRILL and BUNDLE are able to deal with ontologies of significant complexity.