
1 Introduction

A fundamental element of any intelligent application is the ability to store and manipulate knowledge from the application domain. Logic-based knowledge representation languages such as description logics (DLs) [1] provide a clear syntax and an unambiguous semantics that guarantee the correctness of the results obtained. However, languages based on classical logic are ill-suited for handling the uncertainty inherent in many application domains. To overcome this limitation, various probabilistic logics have been investigated over the last three decades (e.g., [3, 15, 20]). In particular, several probabilistic DLs have been developed [18, 19]. To handle probabilistic knowledge, many approaches require a complete definition of a joint probability distribution (JPD) [5, 6, 8, 16, 26]. One approach to avoid a full JPD specification was proposed by Paris [22]: the user gives a partial specification through a set of probabilistic constraints, and this partial knowledge is completed by means of the principle of maximum entropy.

In this paper we consider a new probabilistic extension of description logics based on the principle of maximum entropy. In our approach we group different axioms from a knowledge base together into so-called contexts, which are identified by a propositional formula. Intuitively, each context corresponds to a possible situation, in which the associated sub-KB is guaranteed to hold. Uncertainty is associated to the contexts through a set of probabilistic constraints, which are interpreted under the principle of maximum entropy.

To facilitate the understanding of our approach, we focus on the DL \(\mathcal{ALC}\)  [27] as a prototypical example of a knowledge representation language, and on propositional probabilistic constraints as the framework for expressing uncertainty. As a reasoning service we consider subsumption relations between concepts given some partial knowledge of the current context. Since the knowledge in a knowledge base is typically incomplete, one cannot expect to obtain a precise probability for a given consequence. Instead, we compute a belief interval that describes all the probability degrees that can be associated to the consequence without contradiction. The lower bound of the interval corresponds to a sceptical view, considering only the most fundamental models of the knowledge base. The upper bound, in contrast, reflects the credulous view, in which every context that is not explicitly ruled out is taken into account. In the worst case we get the trivial interval [0, 1]; in the best case we get a point probability, where the upper and lower bounds coincide. In some applications, it might be reasonable to consider only one of these bounds. For instance, if the probability interval that a treatment will cause heavy complications is [0.01, 0.05], we might want to use the upper bound 0.05. In contrast, when the probability interval that a treatment will be successful is [0.7, 0.9], we might be more interested in the lower bound 0.7.

The main contributions of this paper are the following:

  • we define the new probabilistic description logic \(\mathcal {ALCP}\) that allows for a flexible description of axiomatic dependencies, and its reasoning problems (Sect. 3);

  • we explain in detail how degrees of belief for the subsumption problem can be computed (Sect. 4); and

  • we show that \(\mathcal {ALCP}\) satisfies several desirable properties of probabilistic logics (Sect. 5).

2 Maximum Entropy

We start by recalling the basic notions of probabilistic propositional logic and the principle of maximum entropy.

Let \(\mathcal {L}\) be a propositional language constructed over a finite signature \(\mathsf {sig} (\mathcal {L})\), i.e., a set of propositional variables, in the usual way. An \(\mathcal {L}\)-interpretation \(\textit{v}\) is a truth assignment of the propositional variables in \(\mathsf {sig} (\mathcal {L})\). \(\textit{Int}(\mathcal {L})\) denotes the set of all \(\mathcal {L}\)-interpretations. Satisfaction of a formula \(\phi \in \mathcal {L} \) by an \(\mathcal {L}\)-interpretation \(\textit{v} \in \textit{Int}(\mathcal {L}) \) (denoted \(\textit{v} ~\models ~\phi \)) is defined as usual. A probability distribution over \(\mathcal {L} \) is a function \(P: \textit{Int}(\mathcal {L}) \rightarrow [0,1]\) where \(\sum _{\textit{v} \in \textit{Int}(\mathcal {L})} P(\textit{v}) = 1\). Probability distributions are extended to arbitrary \(\mathcal {L}\)-formulas \(\phi \) by setting \(P(\phi ) = \sum _{\textit{v} ~\models ~\phi } P(\textit{v})\).
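These notions can be made concrete in a few lines of Python. This is an illustrative sketch only; the signature, the formulas, and the weights below are invented for the example and are not taken from the paper:

```python
from itertools import product

# Toy signature; the names and weights are invented for illustration.
sig = ("bird", "flies")

# Int(L): all truth assignments to the variables in sig.
interpretations = [dict(zip(sig, bits))
                   for bits in product([False, True], repeat=len(sig))]

# A probability distribution P: Int(L) -> [0, 1] summing to 1, stored as a
# list of weights parallel to `interpretations`.
weights = [0.4, 0.1, 0.1, 0.4]

def prob(phi):
    """Extend P to a formula: P(phi) = sum of P(v) over all v |= phi.
    A formula is given as a Boolean predicate on interpretations."""
    return sum(w for v, w in zip(interpretations, weights) if phi(v))

p_bird = prob(lambda v: v["bird"])                           # 0.1 + 0.4 = 0.5
p_bird_and_flies = prob(lambda v: v["bird"] and v["flies"])  # 0.4
```

Representing formulas as predicates sidesteps parsing; any propositional connective becomes a Python Boolean expression over the assignment.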

Definition 1

(Probabilistic Constraints, Models). Given the propositional language \(\mathcal {L}\), a probabilistic constraint (over \(\mathcal {L})\) is an expression of the form

$$\begin{aligned} c_0 + \sum _{i=1}^k c_i \cdot \mathsf {p} (\phi _i) \ge 0 \end{aligned}$$
(1)

where \(c_0, c_i \in \mathbb {R}\) and \(\phi _i \in \mathcal {L} \) for \(1 \le i \le k\). A probability distribution P over \(\mathcal {L}\) is a model of the probabilistic constraint \(c_0+\sum _{i=1}^k c_i \cdot \mathsf {p} (\phi _i) \ge 0\) if and only if \(c_0+\sum _{i=1}^k c_i \cdot P(\phi _i) \ge 0\). The distribution P is a model of the set of probabilistic constraints \(\mathcal {R}\) (\(P~\models ~\mathcal {R} \)) iff it satisfies all the constraints in \(\mathcal {R}\). The set of all models of \(\mathcal {R} \) is denoted by \(\textit{Mod} (\mathcal {R})\). If \(\textit{Mod} (\mathcal {R}) \ne \emptyset \), we say that \(\mathcal {R}\) is consistent.

Our probabilistic constraints can express the most common types of constraints considered in the literature on probabilistic logics. For instance, probabilistic conditionals \((\psi \mid \phi )[\ell ,u]\) are satisfied iff \(\ell \cdot P(\phi ) \le P(\psi \wedge \phi ) \le u \cdot P(\phi )\) [17]. That is, whenever \(P(\phi ) > 0\), the conditional is satisfied iff the conditional probability of \(\psi \) given \(\phi \) lies between \(\ell \) and u. Sometimes the condition \(P(\phi ) > 0\) is demanded explicitly, but strict inequalities are computationally difficult to handle and the semantic differences are negligible in many cases; see [25] for a thorough discussion. These conditions can be expressed in the form (1) as follows:

$$\begin{aligned} \mathsf {p} (\psi \wedge \phi ) - \ell \cdot \mathsf {p} (\phi )&\ge 0, \qquad \text {and}\\ u \cdot \mathsf {p} (\phi ) - \mathsf {p} (\psi \wedge \phi )&\ge 0. \end{aligned}$$

Probabilistic constraints can also express more complex restrictions; for example, we can state that the probability that a bird cannot fly is at most one fourth of the probability that a bird flies through the constraint

$$\begin{aligned} \frac{1}{4} \mathsf {p} ({\small {{\textsc {bird}}}} \wedge {\small {{\textsc {flies}}}}) - \mathsf {p} ({\small {{\textsc {bird}}}} \wedge \lnot {\small {{\textsc {flies}}}}) \ge 0. \end{aligned}$$
(2)

To improve readability, we will often rewrite constraints in a more compact manner, using conditionals as in the first example, or e.g. rewriting (2) as \( \frac{1}{4} \mathsf {p} ({\small {{\textsc {bird}}}} \wedge {\small {{\textsc {flies}}}}) \ge \mathsf {p} ({\small {{\textsc {bird}}}} \wedge \lnot {\small {{\textsc {flies}}}}). \)
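Since a probabilistic constraint of form (1) is just a linear inequality over formula probabilities, both checking whether a given distribution is a model and compiling a conditional \((\psi \mid \phi )[\ell ,u]\) into two such constraints are mechanical. A minimal Python sketch; the distribution and formulas are invented for illustration:

```python
from itertools import product

# Worlds over {bird, flies}; the weights are an arbitrary example distribution.
interpretations = [dict(zip(("bird", "flies"), bits))
                   for bits in product([False, True], repeat=2)]
weights = [0.3, 0.3, 0.05, 0.35]

def prob(weights, phi):
    return sum(w for v, w in zip(interpretations, weights) if phi(v))

def satisfies(weights, c0, terms):
    """Model check for form (1): c0 + sum_i c_i * P(phi_i) >= 0."""
    return c0 + sum(c * prob(weights, phi) for c, phi in terms) >= -1e-12

# Constraint (2): (1/4) p(bird & flies) - p(bird & ~flies) >= 0.
bird_and_flies = lambda v: v["bird"] and v["flies"]
bird_not_flies = lambda v: v["bird"] and not v["flies"]
constraint2 = (0.0, [(0.25, bird_and_flies), (-1.0, bird_not_flies)])

def conditional(psi, phi, low, up):
    """Compile (psi | phi)[low, up] into two constraints of form (1)."""
    both = lambda v: psi(v) and phi(v)
    return [(0.0, [(1.0, both), (-low, phi)]),  # p(psi&phi) - low*p(phi) >= 0
            (0.0, [(up, phi), (-1.0, both)])]   # up*p(phi) - p(psi&phi) >= 0

ok2 = satisfies(weights, *constraint2)
ok_cond = all(satisfies(weights, c0, terms)
              for c0, terms in conditional(lambda v: v["flies"],
                                           lambda v: v["bird"], 0.7, 0.9))
```

The small `-1e-12` slack only guards against floating-point round-off; mathematically the check is exactly the \(\ge 0\) condition of Definition 1.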

In general, consistent sets of probabilistic constraints have infinitely many models, and there is no obvious way to distinguish between them. One well-studied approach for dealing with this diversity is to focus on the model that maximizes the entropy

$$\begin{aligned} H(P) = - \sum _{\textit{v} \in \textit{Int}(\mathcal {L})} P(\textit{v}) \cdot \log P(\textit{v}). \end{aligned}$$

From an information-theoretic point of view, the maximum entropy (ME) distribution can be regarded as the most conservative one in the sense that it minimizes the information-theoretic distance (that is, the KL-divergence) to the uniform distribution among all probability distributions that satisfy our constraints. In particular, if there are no restrictions on the probability distributions considered, then the uniform distribution is the ME distribution, see, e.g., [28] for a more detailed discussion of these issues. A complete characterization of maximum entropy for the purpose of uncertain reasoning can be found in [22].
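The KL connection can be spelled out in a two-line derivation (added here for clarity; it is not part of the original text), writing \(N = |\textit{Int}(\mathcal {L})|\) and U for the uniform distribution over \(\textit{Int}(\mathcal {L})\):

$$\begin{aligned} H(P)&= -\sum _{\textit{v}} P(\textit{v}) \log P(\textit{v}) = -\sum _{\textit{v}} P(\textit{v}) \log \frac{P(\textit{v})}{1/N} - \sum _{\textit{v}} P(\textit{v}) \log \frac{1}{N}\\&= \log N - \mathrm {KL}(P \,\Vert \, U). \end{aligned}$$

Hence maximizing H(P) over \(\textit{Mod} (\mathcal {R})\) is the same as minimizing \(\mathrm {KL}(P \,\Vert \, U)\), and with no constraints the maximum \(H(P) = \log N\) is attained exactly by U.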

Definition 2

(ME-Model). Let \(\mathcal {R}\) be a consistent set of probabilistic constraints. The ME-model \(P^\textit{ME}_{\mathcal {R}}\) of \(\mathcal {R} \) is the unique solution of the maximization problem \( \arg \max _{P~\models ~\mathcal {R}} H(P). \)

Existence and uniqueness of \(P^\textit{ME}_{\mathcal {R}}\) follow from the fact that H is strictly concave and continuous, and that the probability distributions satisfying \(\mathcal {R} \) form a compact and convex set. \(P^\textit{ME}_{\mathcal {R}}\) is usually computed by deriving an unconstrained optimization problem by means of the Karush-Kuhn-Tucker conditions. The resulting problem can be solved, for instance, by (quasi-)Newton methods with cost in the order of \(|\textit{Int}(\mathcal {L}) |^3\); see, e.g., [21] for more details on these techniques.

3 The Probabilistic Description Logic \(\mathcal {ALCP}\)

\(\mathcal {ALCP}\) is a probabilistic extension of the classical description logic \(\mathcal{ALC}\) capable of expressing complex logical and probabilistic relations. As in classical DLs, the main building blocks of \(\mathcal {ALCP}\) are concepts. Syntactically, \(\mathcal {ALCP}\) concepts are constructed exactly as \(\mathcal{ALC}\) concepts. Given two disjoint sets \(\mathsf{N_C}\) of concept names and \(\mathsf{N_R}\) of role names, \(\mathcal {ALCP}\) concepts are built using the grammar rule \( C\,{:}{:=}\,A \mid \lnot C \mid C\sqcap C\mid \exists r.C, \) where \(A\in \mathsf{N_C} \) and \(r\in \mathsf{N_R} \). Note that disjunction and universal quantification can be derived from these constructors by using logical equivalences like \(C_1 \sqcup C_2 \equiv \lnot (\lnot C_1 \sqcap \lnot C_2)\) and \(\forall r.C \equiv \lnot \exists r.\lnot C\). The knowledge of the application domain is expressed through a finite set of axioms that restrict the way the different concepts and roles may be interpreted. To express both probabilistic and logical relationships, each axiom is annotated with a formula from \(\mathcal {L}\) that intuitively expresses the context in which the axiom holds.

Definition 3

(KB). An \({\mathcal {L}}\)-restricted general concept inclusion (\(\mathcal {L}\)-GCI) is of the form \(\langle C\sqsubseteq D:\kappa \rangle \) where C, D are \(\mathcal {ALCP}\) concepts and \(\kappa \) is an \(\mathcal {L}\)-formula. An \(\mathcal {L} {\text {-TBox}}\) is a finite set of \(\mathcal {L}\)-GCIs. An \(\mathcal {ALCP}\) knowledge base (KB) over \(\mathcal {L}\) is a pair \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) where \(\mathcal {R}\) is a set of probabilistic constraints and \(\mathcal {T}\) is an \(\mathcal {L}\)-TBox.

Example 4

Consider an application modeling beliefs about bacterial and viral infections using the concept names \(\small {\textsf {strep}}\) (streptococcal infection), \(\small {\textsf {bac}}\) (bacterial infection), \(\small {\textsf {vir}}\) (viral infection), \(\small {\textsf {inf}}\) (infection), and \(\small {\textsf {ab}}\) (antibiotic); and the role names \(\small {\textsf {sf}}\) (suffers from), and \(\small {\textsf {suc}}\) (successful treatment); and the propositional variables \(\small {\textsc {res}}\) (antibiotic resistance), and \(\small {{\textsc {h}}}\) (heavy use of antibiotics by patient). Define the \(\mathcal {L}\)-TBox \({\mathcal {T} _{\mathsf {exa}}}\) containing the \(\mathcal {L}\)-GCIs

$$\begin{aligned}&{\langle \exists {\small {{\textsf {sf}}}}.{\small {{\textsf {bac}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}: \lnot {\small {{\textsc {res}}}}{\wedge } \lnot {\small {{\textsc {h}}}}\rangle },&\langle \exists {\small {{\textsf {sf}}}}.{\small {{\textsf {vir}}}} \sqsubseteq \lnot \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}: \top \rangle ,&\quad \langle {\small {{\textsf {strep}}}} \sqsubseteq {\small {{\textsf {bac}}}}: \top \rangle ,\\&{\langle \exists {\small {{\textsf {sf}}}}.{\small {{\textsf {bac}}}} \sqsubseteq \lnot \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}: {\small {{\textsc {res}}}}\rangle },&{\langle {\small {{\textsf {bac}}}} \sqsubseteq {\small {{\textsf {inf}}}}: \top \rangle },&\quad \langle {\small {{\textsf {vir}}}} \sqsubseteq {\small {{\textsf {inf}}}}: \top \rangle , \end{aligned}$$

where \(\top \) is any \(\mathcal {L}\)-tautology. For example, the first \(\mathcal {L}\)-GCI states that a bacterial infection can be treated successfully with antibiotics if no antibiotic resistance is present and there was no heavy use of antibiotics; the second one states that viral infections can never be treated successfully with antibiotics. Consider additionally the set \(\mathcal {R}\) containing the probabilistic constraints

$$\begin{aligned}&({\small {{\textsc {res}}}})[0.05],&({\small {{\textsc {res}}}} \mid {\small {{\textsc {h}}}})[0.8]. \end{aligned}$$

That is, the probability of an antibiotic resistance is \(5\,\%\) if no further information is given. If the patient used antibiotics heavily, the probability increases to \(80\,\%\). Together, these components form the \(\mathcal {ALCP}\) KB \({\mathcal {K} _{\mathsf {exa}}} =(\mathcal {R},{\mathcal {T} _{\mathsf {exa}}})\).
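The ME-model of these two constraints can be approximated numerically. The sketch below is not the quasi-Newton procedure mentioned in Sect. 2, but a plain gradient descent on the dual (exponential-family) parametrization; the encoding of worlds and constraints is our own, purely for illustration:

```python
from itertools import product
from math import exp

# L-interpretations over (res, h), encoded as pairs of booleans.
worlds = list(product([False, True], repeat=2))

# The example's constraints, rewritten as equalities E_p[f_j] = b_j:
#   p(res) = 0.05            ->  f1(v) = [res],                b1 = 0.05
#   p(res & h) = 0.8 * p(h)  ->  f2(v) = [res & h] - 0.8*[h],  b2 = 0
features = [
    lambda v: 1.0 if v[0] else 0.0,
    lambda v: (1.0 if (v[0] and v[1]) else 0.0) - (0.8 if v[1] else 0.0),
]
targets = [0.05, 0.0]

def me_distribution(features, targets, steps=50_000, lr=1.0):
    """Gradient descent on the ME dual: the optimum has the Gibbs form
    p(v) ~ exp(sum_j lam_j * f_j(v)); the dual gradient is E_p[f] - b."""
    lam = [0.0] * len(features)
    for _ in range(steps):
        scores = [exp(sum(l * f(v) for l, f in zip(lam, features)))
                  for v in worlds]
        z = sum(scores)
        p = [s / z for s in scores]
        expect = [sum(pi * f(v) for pi, v in zip(p, worlds)) for f in features]
        lam = [l - lr * (e - b) for l, e, b in zip(lam, expect, targets)]
    return dict(zip(worlds, p))

P = me_distribution(features, targets)
p_res = sum(pr for v, pr in P.items() if v[0])
p_h = sum(pr for v, pr in P.items() if v[1])
p_res_and_h = P[(True, True)]
```

The resulting distribution satisfies both constraints while spreading the remaining mass as evenly as possible; in particular, the world with neither resistance nor heavy use receives mass of roughly 0.94, consistent with the degrees reported later in Example 15.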

Notice that the probabilistic constraints, and hence the representation of the uncertainty in the knowledge, refer only to the propositional formulas that label the \(\mathcal {L}\)-GCIs. In \(\mathcal {ALCP}\), the uncertainty of the knowledge is handled through these propositional formulas as explained next.

A possible world interprets both the axiom language (i.e., the concept and role names) and the context language (the propositional variables). Intuitively, it describes a possible context (\(\mathcal {L}\)-interpretation) together with the relationships between concepts in that situation (\(\mathcal {ALC}\)-interpretation).

Definition 5

(Possible World). A possible world is a triple \(\mathcal {I} =(\varDelta ^\mathcal {I},\cdot ^\mathcal {I},\textit{v} ^\mathcal {I})\) where \(\varDelta ^\mathcal {I} \) is a non-empty set (called the domain), \(\textit{v} ^\mathcal {I} \) is an \(\mathcal {L}\)-interpretation, and \(\cdot ^\mathcal {I} \) is an interpretation function that maps every concept name A to a set \(A^\mathcal {I} \subseteq \varDelta ^\mathcal {I} \) and every role name r to a binary relation \(r^\mathcal {I} \subseteq \varDelta ^\mathcal {I} \times \varDelta ^\mathcal {I} \).

The interpretation function \(\cdot ^\mathcal {I} \) is extended to complex concepts as usual in DLs by letting \((\lnot C)^\mathcal {I}:=\varDelta ^\mathcal {I} \setminus C^\mathcal {I} \); \((\exists r.C)^\mathcal {I}:=\{d\in \varDelta ^\mathcal {I} \mid \exists e\in \varDelta ^\mathcal {I}.(d,e)\in r^\mathcal {I}, e\in C^\mathcal {I} \}\); and \((C\sqcap D)^\mathcal {I}:=C^\mathcal {I} \cap D^\mathcal {I} \). A possible world is a model of an \(\mathcal {L}\)-GCI iff it satisfies the description logic constraint of the axiom whenever it satisfies the context.

Definition 6

(Model of TBox). A possible world \(\mathcal {I} =(\varDelta ^\mathcal {I},\cdot ^\mathcal {I},\textit{v} ^\mathcal {I})\) is a model of the \(\mathcal {L}\)-GCI \(\langle C\sqsubseteq D:\kappa \rangle \) (\(\mathcal {I} \models \langle C\sqsubseteq D:\kappa \rangle \)) iff (i) \(\textit{v} ^\mathcal {I} \not \models \kappa \), or (ii) \(C^\mathcal {I} \subseteq D^\mathcal {I} \). It is a model of the \(\mathcal {L}\)-TBox \(\mathcal {T}\) iff it is a model of all \(\mathcal {L}\)-GCIs in \(\mathcal {T}\).

The classical DL \(\mathcal{ALC}\) is a special case of \(\mathcal {ALCP}\) where all the axioms are annotated with an \(\mathcal {L}\)-tautology \(\top \). To preserve the syntax of classical DLs, we denote such \(\mathcal {L}\)-GCIs as \(C\sqsubseteq D\) instead of \(\langle C\sqsubseteq D:\top \rangle \). In this case, the condition (i) from Definition 6 cannot be satisfied, and hence a model is required to satisfy \(C^\mathcal {I} \subseteq D^\mathcal {I} \) for all \(\mathcal {L}\)-GCIs \(C\sqsubseteq D\) in the TBox. For a deeper introduction to classical \(\mathcal{ALC}\), see [1].

According to our semantics, we only demand that the \(\mathcal {L}\)-GCIs are satisfied in some specific contexts. Thus, it is often useful to focus on the classical \(\mathcal{ALC}\) TBox that contains the knowledge that holds in a particular situation. For a KB \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) and \(\textit{v} \in \textit{Int}(\mathcal {L}) \), the \(\textit{v}\)-restricted TBox is the \(\mathcal{ALC}\) TBox

$$\begin{aligned} \mathcal {T} _\textit{v}:= \{ C\sqsubseteq D \mid \langle C\sqsubseteq D:\kappa \rangle \in \mathcal {T}, \textit{v} \models \kappa \}. \end{aligned}$$

The possible world \(\mathcal {I}\) satisfies \(\mathcal {T} _\textit{v} \) (\(\mathcal {I} \models \mathcal {T} _\textit{v} \)) if for all \(\mathcal {L}\)-GCIs \(C\sqsubseteq D\in \mathcal {T} _\textit{v} \) it holds that \(C^\mathcal {I} \subseteq D^\mathcal {I} \). In the following, we will often consider subsumption and strong non-subsumption between concepts w.r.t. a restricted TBox. We say that C is subsumed by D w.r.t. \(\mathcal {T} _\textit{v} \) (\(\mathcal {T} _\textit{v} \models C\sqsubseteq D\)) if for every \(\mathcal {I} \models \mathcal {T} _\textit{v} \) it holds that \(C^\mathcal {I} \subseteq D^\mathcal {I} \). Dually, C is strongly non-subsumed by D w.r.t. \(\mathcal {T} _\textit{v} \) (\(\mathcal {T} _\textit{v} \models C\not \,\not \sqsubseteq D\)) if for every \(\mathcal {I} \models \mathcal {T} _\textit{v} \), \(C^\mathcal {I} \not \subseteq D^\mathcal {I} \) holds. Notice that strong non-subsumption requires that the inclusion between the concepts fails in every possible world satisfying \(\mathcal {T} _\textit{v} \). Hence, this condition is stricter than merely negating the subsumption relation.
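Computing \(\mathcal {T} _\textit{v}\) from a labeled TBox is a simple filter. A Python sketch over the TBox of Example 4, with concepts as opaque strings and contexts as predicates over \(\textit{v} = (\textsc{res}, \textsc{h})\); this encoding is ours and purely illustrative:

```python
# L-GCIs as (lhs, rhs, kappa) triples; kappa is a predicate on v = (res, h).
tbox = [
    ("∃sf.bac", "∃suc.ab",  lambda v: not v[0] and not v[1]),  # ¬res ∧ ¬h
    ("∃sf.bac", "¬∃suc.ab", lambda v: v[0]),                   # res
    ("∃sf.vir", "¬∃suc.ab", lambda v: True),                   # ⊤
    ("strep",   "bac",      lambda v: True),
    ("bac",     "inf",      lambda v: True),
    ("vir",     "inf",      lambda v: True),
]

def restrict(tbox, v):
    """T_v: the classical TBox of those GCIs whose context holds in v."""
    return [(c, d) for c, d, kappa in tbox if kappa(v)]

# In the context "resistance present, no heavy use", the first axiom drops
# out and the second becomes active:
t_res = restrict(tbox, (True, False))
```

Deciding subsumption or strong non-subsumption over the resulting classical TBox would then be delegated to an ordinary \(\mathcal{ALC}\) reasoner, which is outside the scope of this sketch.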

We now describe how the probabilistic constraints are handled in our logic. An \(\mathcal {ALCP}\)-interpretation consists of a finite set of possible worlds and a probability function over these worlds.

Definition 7

(\(\mathcal {ALCP}\)-Interpretation). An \(\mathcal {ALCP}\)-interpretation is a pair of the form \(\mathcal {P} =(\mathfrak {I},P_\mathfrak {I})\), where \(\mathfrak {I}\) is a non-empty, finite set of possible worlds and \(P_\mathfrak {I} \) is a probability distribution over \(\mathfrak {I}\).

Each \(\mathcal {ALCP}\)-interpretation induces a probability distribution over \(\mathcal {L}\). The probability of a context can be obtained by adding the probabilities of all possible worlds in which this context holds.

Definition 8

(Distribution Induced by \(\mathcal {P} \) ). Let \(\mathcal {P} =(\mathfrak {I},P_\mathfrak {I})\) be an \(\mathcal {ALCP}\)-interpretation. The probability distribution \(P^{\mathcal {P}}: \textit{Int}(\mathcal {L}) \rightarrow [0,1]\) induced by \(\mathcal {P}\) is defined by \(P^{\mathcal {P}} (\textit{v}) := \sum _{\mathcal {I} \in \mathfrak {I} |_{\textit{v}}} P_\mathfrak {I} (\mathcal {I})\), where \(\mathfrak {I} |_{\textit{v}} = \{ (\varDelta ^\mathcal {I},\cdot ^\mathcal {I},\textit{v} ^\mathcal {I}) \in \mathfrak {I} \mid \textit{v} ^\mathcal {I} = \textit{v} \}\).

As usual, reasoning is restricted to interpretations that satisfy the restrictions imposed by the knowledge base. In our case, we have to demand that the interpretation is consistent with both the classical and the probabilistic part of our knowledge base. That is, we consider only those possible worlds that satisfy both the terminological knowledge (\(\mathcal {T}\)) and the probabilistic constraints (\(\mathcal {R}\)).

Definition 9

(Model). Let \(\mathcal {P} =(\mathfrak {I},P_\mathfrak {I})\) be an \(\mathcal {ALCP}\)-interpretation. \(\mathcal {P}\) is consistent with the TBox \(\mathcal {T}\) if every \(\mathcal {I} \in \mathfrak {I} \) is a model of \(\mathcal {T}\). \(\mathcal {P}\) is consistent with the set of probabilistic constraints \(\mathcal {R}\) iff \(P^{\mathcal {P}} ~\models ~\mathcal {R} \). The \(\mathcal {ALCP}\)-interpretation \(\mathcal {P}\) is a model of the KB \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) iff it is consistent with both \(\mathcal {T}\) and \(\mathcal {R}\). As usual, a KB is consistent iff it has a model.

Notice that \(\mathcal {ALCP}\)-KBs can express both logical and probabilistic dependencies between axioms. For instance, two \(\mathcal {L}\)-GCIs \(\langle C_1\sqsubseteq D_1:\kappa _1\rangle \) and \(\langle C_2\sqsubseteq D_2:\kappa _2\rangle \) with \(\kappa _1\Rightarrow \kappa _2\) express that in every context in which the first \(\mathcal {L}\)-GCI applies, the second one must hold as well. Similarly, probabilistic dependencies between axioms are expressed via the probabilistic constraints over the labeling formulas.

We are interested in computing degrees of belief for subsumption relations between concepts. We define the conditional probability of a subsumption relation given a context with respect to a given \(\mathcal {ALCP}\)-interpretation following the usual notions of conditioning.

Definition 10

(Probability of Subsumption). Let C, D be concepts, \(\kappa \) a context and \(\mathcal {P} \) an \(\mathcal {ALCP}\)-interpretation. The conditional probability of \(C\sqsubseteq D\) given \(\kappa \) with respect to \(\mathcal {P} \) is

$$\begin{aligned} \textit{Pr} _\mathcal {P} (C\sqsubseteq D \mid \kappa ):= \frac{\sum _{\mathcal {I} \in \mathfrak {I},\mathcal {I} ~\models ~\kappa ,\mathcal {I} \models C\sqsubseteq D}P_\mathfrak {I} (\mathcal {I})}{\sum _{\mathcal {I} \in \mathfrak {I},\mathcal {I} ~\models ~\kappa }P_\mathfrak {I} (\mathcal {I})}. \end{aligned}$$
(3)

Notice that the denominator in (3) can be rewritten as

$$\begin{aligned} \sum _{\mathcal {I} \in \mathfrak {I},\mathcal {I} ~\models ~\kappa }P_\mathfrak {I} (\mathcal {I})&= \sum _{\textit{v} ~\models ~\kappa } \sum _{\mathcal {I} \in \mathfrak {I} |_\textit{v}}P_\mathfrak {I} (\mathcal {I}) = \sum _{\textit{v} ~\models ~\kappa } P^{\mathcal {P}} (\textit{v}) = P^{\mathcal {P}} (\kappa ). \end{aligned}$$

As usual, the conditional probability is only well-defined when \(P^{\mathcal {P}} (\kappa )>0\).

Recall that the set of probabilistic constraints \(\mathcal {R}\) may be satisfied by an infinite class of probability distributions. In the spirit of maximum entropy reasoning, we consider only the most conservative ones in the sense that they induce the ME-model \(P^\textit{ME}_{\mathcal {R}}\) of \(\mathcal {R}\).

Definition 11

(ME- \(\mathcal {ALCP}\) -Model). An \(\mathcal {ALCP}\)-model \(\mathcal {P}\) of \(\mathcal {K}\) is called an ME-\(\mathcal {ALCP}\) -model of \(\mathcal {K}\) iff \(P^{\mathcal {P}} = P^\textit{ME}_{\mathcal {R}} \). The set of all ME-\(\mathcal {ALCP}\)-models of \(\mathcal {K} \) is denoted by \(\textit{Mod} _\textit{ME} (\mathcal {K})\). \(\mathcal {K}\) is called ME-consistent iff \(\textit{Mod} _\textit{ME} (\mathcal {K}) \ne \emptyset \).

Note that ME-consistency is a strictly stronger notion than consistency: ME-consistent knowledge bases are always consistent, but the converse does not necessarily hold. In particular, the ME-model may assign positive probability to a context whose restricted TBox is inconsistent, as we show in the following example.

Example 12

Let \(\mathsf {sig} (\mathcal {L}) = \{x\}\) and \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) be the KB with \(\mathcal {R} = \emptyset \) and \(\mathcal {T} = \{\langle A\sqcup \lnot A \sqsubseteq A\sqcap \lnot A: x\rangle \}\). Since \(A\sqcup \lnot A \sqsubseteq A\sqcap \lnot A\) is contradictory, each \(\mathcal {ALCP}\)-model of \(\mathcal {K}\) must satisfy \(\lnot x\). Such models certainly exist, but in each such model \(\mathcal {P}\) we have \(P^{\mathcal {P}} (x) = 0\). However, since \(\mathcal {R} = \emptyset \), we have \(P^\textit{ME}_{\mathcal {R}} (x) = 0.5\), and hence \(\mathcal {K} \) has no ME-model.

ME-inconsistency rules out some undesired cases in which the whole knowledge base is consistent, but the TBox restricted to some context is inconsistent. The following theorem gives a simple characterization of ME-consistency: to verify ME-consistency of a KB, it suffices to check consistency of the TBoxes induced by the \(\mathcal {L}\)-interpretations that have positive probability with respect to \(P^\textit{ME}_{\mathcal {R}} \). By the properties of the ME distribution, these are the interpretations that are not explicitly restricted to have zero probability through \(\mathcal {R}\).

Theorem 13

The KB \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) is ME-consistent iff for every \(\textit{v} \in \textit{Int}(\mathcal {L}) \) such that \(P^\textit{ME}_{\mathcal {R}} (\textit{v})>0\), \(\mathcal {T} _\textit{v} \) is consistent.

For the rest of this paper we consider only ME-consistent KBs. Hence, whenever we speak of a KB \(\mathcal {K}\), we implicitly assume that \(\mathcal {K}\) has at least one ME-model.

We are interested in computing the probability of a subsumption relation w.r.t. a given KB \(\mathcal {K}\). Notice that, although we consider only one probability distribution \(P^\textit{ME}_{\mathcal {R}}\), there can still exist many different ME-models of \(\mathcal {K}\), which yield different probabilities for the same subsumption relation. One way to handle this is to consider the smallest and largest probabilities that can be consistently associated to this relation. We call them the sceptical and the credulous degrees of belief, respectively.

Definition 14

(Degree of Belief). Let C, D be \(\mathcal {ALCP}\) concepts, \(\kappa \) a context, and \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) an \(\mathcal {ALCP}\) KB. The sceptical degree of belief of \(C\sqsubseteq D\) given \(\kappa \) w.r.t. \(\mathcal {K}\) is

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D \mid \kappa ):= \inf _{\mathcal {P} \in \textit{Mod} _\textit{ME} (\mathcal {K})}\textit{Pr} _\mathcal {P} (C\sqsubseteq D \mid \kappa ). \end{aligned}$$

The credulous degree of belief of \(C\sqsubseteq D\) given \(\kappa \) w.r.t. \(\mathcal {K}\) is

$$\begin{aligned} \mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D \mid \kappa ):= \sup _{\mathcal {P} \in \textit{Mod} _\textit{ME} (\mathcal {K})}\textit{Pr} _\mathcal {P} (C\sqsubseteq D \mid \kappa ). \end{aligned}$$

Example 15

Consider the KB \({\mathcal {K} _{\mathsf {exa}}} =(\mathcal {R},{\mathcal {T} _{\mathsf {exa}}})\) from Example 4. If we ask for the degrees of belief that a patient who suffers from an infection can be successfully treated with antibiotics, we obtain

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {inf}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid \top )&{} = 0,\\ \mathcal {B} ^\mathsf {c} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {inf}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid \top )&{} = 1. \end{aligned}$$

These bounds are not very informative, but they are perfectly justified by our knowledge base since we do not know anything about the effectiveness of antibiotics with respect to infections in general. However, for a patient who suffers from a streptococcal infection we get

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid \top )&{} = 0.9405,\\ \mathcal {B} ^\mathsf {c} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid \top )&{} = 0.95. \end{aligned}$$

If we know that this patient used antibiotics heavily in the past, then there is nothing in our knowledge base that guarantees the existence of a successful treatment. Hence, the degrees of belief become

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid {\small {{\textsc {h}}}})&{} = 0,\\ \mathcal {B} ^\mathsf {c} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid {\small {{\textsc {h}}}})&{} = 0.2. \end{aligned}$$

Our definition of the sceptical degree of belief raises a philosophical question: should there be no difference between the degree of belief 0 and an infinitely small degree of belief? A dual question arises for the credulous degree of belief and the probability 1. However, as we show in the next section, the sceptical and credulous degrees of belief actually correspond to minimum and maximum rather than to infimum and supremum (see Corollary 20) so that these questions become vacuous. From the following theorem we can conclude that every intermediate degree can also be obtained by some model of the KB.

Theorem 16

(Intermediate Value Theorem). Let \(p_1<p_2\), and let \(\mathcal {P} _1\) and \(\mathcal {P} _2\) be two ME-\(\mathcal {ALCP}\)-models of the KB \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) such that \(\textit{Pr} _{\mathcal {P} _1}(C\sqsubseteq D\mid \kappa ) = p_1\) and \(\textit{Pr} _{\mathcal {P} _2}(C\sqsubseteq D\mid \kappa ) = p_2\). Then for each p between \(p_1\) and \(p_2\) there exists an ME-\(\mathcal {ALCP}\)-model \(\mathcal {P} \) of \(\mathcal {K}\) such that \(\textit{Pr} _{\mathcal {P}}(C\sqsubseteq D\mid \kappa ) = p\).

As we will show in Corollary 20, both the sceptical degree \(\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\) and the credulous degree \(\mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\) are in fact witnessed by some ME-models. Therefore it is meaningful to consider the whole interval of beliefs between \(\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\) and \(\mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\).

Definition 17

(Belief Interval). Let C, D be \(\mathcal {ALCP}\) concepts, \(\kappa \in \mathcal {L} \) a context, and \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) an \(\mathcal {ALCP}\) KB. The belief interval for \(C\sqsubseteq D\) w.r.t. \(\mathcal {K}\) given \(\kappa \) is

$$\begin{aligned} \mathcal {B} _\mathcal {K} (C\sqsubseteq D \mid \kappa ):= [\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D \mid \kappa ), \mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D \mid \kappa )]. \end{aligned}$$

4 Computing Beliefs

In this section we show how to compute the belief interval. The first theorem states that the sceptical degree of belief for a subsumption relation can be computed by adding the probabilities of those \(\mathcal {L}\)-interpretations w that entail this subsumption in the corresponding restricted TBox \(\mathcal {T} _w\).

Theorem 18

Let \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) be a KB, C, D two concepts, and \(\kappa \) a context such that \(P^\textit{ME}_{\mathcal {R}} (\kappa )>0\). Then

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )= \frac{\sum _{{w\in \textit{Int}(\mathcal {L})},{\mathcal {T} _w\models C\sqsubseteq D},{w\models \kappa }}P^\textit{ME}_{\mathcal {R}} (w)}{P^\textit{ME}_{\mathcal {R}} (\kappa )}. \end{aligned}$$

Dually, the credulous degree of belief for a subsumption relation can be computed by removing all the situations in which this relation cannot possibly hold.

Theorem 19

Let \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) be a KB, C, D two concepts, and \(\kappa \) a context with \(P^\textit{ME}_{\mathcal {R}} (\kappa )>0\). Then

$$\begin{aligned} \mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )= 1 - \frac{\sum _{{w\in \textit{Int}(\mathcal {L})},{\mathcal {T} _w\models C\not \,\not \sqsubseteq D},{w\models \kappa }}P^\textit{ME}_{\mathcal {R}} (w)}{P^\textit{ME}_{\mathcal {R}} (\kappa )}. \end{aligned}$$
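Read together, Theorems 18 and 19 reduce both degrees of belief to weighted sums over \(\mathcal {L}\)-interpretations. The following sketch makes this concrete; all names are hypothetical, and the predicates `entails_sub` and `entails_nonsub` stand in for calls to a classical \(\mathcal {ALC}\) reasoner deciding entailment from the restricted TBox \(\mathcal {T} _w\).

```python
def degrees_of_belief(p_me, satisfies_context, entails_sub, entails_nonsub):
    """Compute the sceptical and credulous degrees of belief.

    p_me              -- dict mapping each L-interpretation w to P^ME_R(w)
    satisfies_context -- predicate for w |= kappa
    entails_sub       -- placeholder: does T_w entail C subsumed-by D?
    entails_nonsub    -- placeholder: does T_w entail the strong
                         non-subsumption of C by D?
    """
    p_kappa = sum(p for w, p in p_me.items() if satisfies_context(w))
    if p_kappa == 0:
        raise ValueError("Theorems 18/19 require P^ME_R(kappa) > 0")
    # Theorem 18: sum over worlds entailing the subsumption
    sceptical = sum(p for w, p in p_me.items()
                    if satisfies_context(w) and entails_sub(w)) / p_kappa
    # Theorem 19: remove worlds in which the subsumption cannot hold
    credulous = 1 - sum(p for w, p in p_me.items()
                        if satisfies_context(w) and entails_nonsub(w)) / p_kappa
    return sceptical, credulous
```

With three worlds of probabilities 0.5, 0.3, 0.2, where only the first entails the subsumption and only the third entails the non-subsumption, the sketch yields the interval [0.5, 0.8].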

To prove these theorems, one can build two models \(\mathcal {P}\) and \(\mathcal {Q}\) of the KB \(\mathcal {K}\) such that \(\textit{Pr} _{\mathcal {P}}(C\sqsubseteq D\mid \kappa )\) and \(\textit{Pr} _{\mathcal {Q}}(C\sqsubseteq D\mid \kappa )\) are exactly the degrees expressed by Theorems 18 and 19, respectively. As a byproduct of these proofs, we obtain that the infimum and supremum defining the sceptical and the credulous degrees of belief are in fact a minimum and a maximum attained by some ME-models, yielding the following corollary.

Corollary 20

Let \(\mathcal {K}\) be an \(\mathcal {ALCP}\) KB, C, D be two concepts, and \(\kappa \) be a context. There exist two ME-models \(\mathcal {P}\),\(\mathcal {Q}\) of \(\mathcal {K}\) with \(\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )=\textit{Pr} _\mathcal {P} (C\sqsubseteq D\mid \kappa )\) and \(\mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )=\textit{Pr} _\mathcal {Q} (C\sqsubseteq D\mid \kappa )\).

A direct consequence of Theorems 18 and 19 is that if we want to compute the belief interval for \(C\sqsubseteq D\) given some context, it suffices to identify all \(\mathcal {L}\)-interpretations whose induced (classical) TBoxes entail the subsumption relation \(C\sqsubseteq D\) (for the sceptical belief) or the strong non-subsumption \(C\not \,\not \sqsubseteq D\) (for the credulous belief). Recall that every set of propositional interpretations can be represented by a propositional formula. This motivates the following definition.

Definition 21

(Consequence Formula). An \(\mathcal {L}\)-formula \(\phi \) is a consequence formula for \(C\sqsubseteq D\) (respectively \(C\not \,\not \sqsubseteq D\)) w.r.t. the \(\mathcal {L}\)-TBox \(\mathcal {T}\) if for every \(w\in \textit{Int}(\mathcal {L}) \) it holds that \(w\models \phi \) iff \(\mathcal {T} _w\models C\sqsubseteq D\) (respectively \(\mathcal {T} _w\models C\not \,\not \sqsubseteq D\)).
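Definition 21 suggests a direct, if naive, construction: enumerate all \(\mathcal {L}\)-interpretations, query a classical reasoner on each restricted TBox, and collect the satisfying worlds into a disjunctive normal form. A minimal sketch, in which `entails` is a hypothetical stand-in for the \(\mathcal {ALC}\) reasoner:

```python
from itertools import product

def consequence_formula_dnf(variables, entails):
    """Build a DNF representation of the consequence formula from
    Definition 21: one conjunctive clause (encoded as a dict from
    variable to truth value) per L-interpretation w with T_w entailing
    the queried (non-)subsumption.  `entails(world)` stands in for a
    classical ALC reasoner."""
    clauses = []
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if entails(world):
            clauses.append(world)  # this world becomes one clause
    return clauses  # read as the disjunction of its clauses
```

For two context variables where entailment holds exactly when both are false (as for the first consequence formula in the running example), the result is the single clause corresponding to \(\lnot v_1 \wedge \lnot v_2\). Known boundary computation techniques avoid this exhaustive enumeration.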

If we are able to compute these consequence formulas, then the computation of the belief interval can be reduced to the evaluation of the probability of these formulas w.r.t. the ME-distribution satisfying \(\mathcal {R}\).

Theorem 22

Let \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) be an \(\mathcal {ALCP}\) KB, \(\phi \) and \(\psi \) be consequence formulas for \(C\sqsubseteq D\) and \(C\not \,\not \sqsubseteq D\) w.r.t. \(\mathcal {T}\), respectively, and \(\kappa \) a context. Then \(\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )=P^\textit{ME}_{\mathcal {R}} (\phi \mid \kappa )\) and \(\mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )=1-P^\textit{ME}_{\mathcal {R}} (\psi \mid \kappa )\).

Example 23

In our running example, one can see that a consequence formula for \(\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}}\sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\) is \(\lnot {\small {{\textsc {res}}}}\wedge \lnot {\small {{\textsc {h}}}}\). Indeed, in order to deduce this consequence it is necessary to satisfy the first axiom of \({\mathcal {T} _{\mathsf {exa}}}\), which is only guaranteed in the context \(\lnot {\small {{\textsc {res}}}}\wedge \lnot {\small {{\textsc {h}}}}\). Similarly, \({\small {{\textsc {res}}}}\) is a consequence formula for \(\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}}\not \,\not \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\). Knowing both the consequence formulas and the ME-model, we can deduce

$$\begin{aligned} \mathcal {B} ^\mathsf {s} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid \top )= {}&P^\textit{ME}_{\mathcal {R}} (\lnot {\small {{\textsc {res}}}}\wedge \lnot {\small {{\textsc {h}}}}) = 0.9405, \text { and} \\ \mathcal {B} ^\mathsf {c} _{\mathcal {K} _{\mathsf {exa}}} (\exists {\small {{\textsf {sf}}}}.{\small {{\textsf {strep}}}} \sqsubseteq \exists {\small {{\textsf {suc}}}}.{\small {{\textsf {ab}}}}\mid {\small {{\textsc {h}}}})= {}&1-P^\textit{ME}_{\mathcal {R}} ({\small {{\textsc {res}}}}\mid {\small {{\textsc {h}}}})= 0.2. \end{aligned}$$

In particular, Theorem 22 implies that the belief interval can be computed in two phases. The first phase uses purely logical reasoning to compute the consequence formulas, while the second phase applies probabilistic inferences to compute the degrees of belief from these formulas. We now briefly explain how the consequence formulas can be computed.

Notice first that subsumption and non-subsumption are monotonic consequences in the sense of [2]; that is, if an \(\mathcal{ALC}\) TBox \(\mathcal {T}\) entails the subsumption \(C\sqsubseteq D\), then every superset of \(\mathcal {T}\) also entails this consequence. Similarly, adding more axioms to a TBox entailing \(C\not \,\not \sqsubseteq D\) does not remove this entailment. Moreover, the set of all \(\mathcal {L}\)-formulas (modulo logical equivalence) forms a distributive lattice ordered by generality, in which \(\mathcal {L}\)-interpretations are all the join prime elements. Thus, the consequence formulas from Definition 21 are in fact the so-called boundaries from [2]. Hence, they can be computed using any of the known boundary computation approaches.

Assuming that the number of contexts is small in comparison to the size of the TBox, it is better to compute the degrees of belief through a more direct approach following Theorems 18 and 19. In order to compute \(\mathcal {B} ^\mathsf {s} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\) and \(\mathcal {B} ^\mathsf {c} _\mathcal {K} (C\sqsubseteq D\mid \kappa )\), it suffices to enumerate all interpretations \(\textit{v} \in \textit{Int}(\mathcal {L}) \) and check whether \(\textit{v} \models \kappa \) and whether \(\mathcal {T} _\textit{v} \models C\sqsubseteq D\) or \(\mathcal {T} _\textit{v} \models C\not \,\not \sqsubseteq D\) (see Algorithm 1). This approach requires \(2^{|\mathsf {sig} (\mathcal {L})|}\) calls to a standard \(\mathcal{ALC}\) reasoner, and each of these calls runs in exponential time in \(|\mathcal {T} |\) [9]. Notice that this algorithm has an any-time behaviour: it is possible to stop its execution at any moment and obtain an approximation of the belief interval. Moreover, the longer the algorithm runs, the better this approximation becomes. Thus, this method is adequate for a system where finding good approximations efficiently may be more important than computing the precise answers.

[Algorithm 1: computing the belief interval by enumerating all \(\mathcal {L}\)-interpretations]
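The any-time behaviour can be sketched as follows: as worlds are processed one at a time, the unprocessed probability mass yields a guaranteed outer approximation of the belief interval that tightens monotonically and converges to the exact interval of Theorems 18 and 19. All names are hypothetical; the predicates again stand in for SAT checks and calls to a classical \(\mathcal {ALC}\) reasoner.

```python
def anytime_belief_interval(p_me, satisfies_context, entails_sub, entails_nonsub):
    """Yield successively tighter outer approximations [lower, upper] of
    the belief interval, in the spirit of Algorithm 1.

    p_me maps L-interpretations to P^ME_R(w); stopping early gives a
    sound approximation, finishing gives the exact interval."""
    remaining = sum(p_me.values())   # probability mass not yet processed
    p_kappa = p_sub = p_nonsub = 0.0
    for w, p in p_me.items():
        remaining -= p
        if satisfies_context(w):
            p_kappa += p
            if entails_sub(w):       # T_w entails the subsumption
                p_sub += p
            if entails_nonsub(w):    # T_w entails the non-subsumption
                p_nonsub += p
        denom = p_kappa + remaining  # upper bound on final P^ME_R(kappa)
        lower = p_sub / denom        # never exceeds the sceptical degree
        upper = 1 - p_nonsub / denom # never below the credulous degree
        yield lower, upper
```

Each yielded pair contains the true belief interval because `p_sub` and `p_nonsub` can only grow, while `denom` can only shrink towards \(P^\textit{ME}_{\mathcal {R}} (\kappa )\); the last pair is exact.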

5 Properties

We now investigate some properties of probabilistic logics [22]. First we show that \(\mathcal {ALCP}\) is language and representation invariant; invariance here is meant with respect to the logical objects involved. Language invariance means that merely extending the language without changing the knowledge base should not affect reasoning results. Representation invariance means that equivalent knowledge bases should yield equal inference results. Notice that different notions of representation dependence exist in the literature. For instance, in [11] a very different notion is considered, where the language and the knowledge base are changed simultaneously. This case is not covered by our notion of representation invariance. \(\mathcal {ALCP}\) also satisfies an independence property; i.e., reasoning results about a part of the language are not changed when we add knowledge about an independent part of the language. Finally, \(\mathcal {ALCP}\) is continuous in the sense that minor changes in the probabilistic knowledge expressed by a knowledge base cannot induce major changes in the reasoning results.

Theorem 24

(Representation Invariance). Let \(\mathcal {K} _i=(\mathcal {R} _i,\mathcal {T} _i)\), \(i\in \{1,2\}\), be two KBs such that \(\textit{Mod} (\mathcal {R} _1) = \textit{Mod} (\mathcal {R} _2)\) and \(\textit{Mod} (\mathcal {T} _1) = \textit{Mod} (\mathcal {T} _2)\). Then for all concepts C, D and contexts \(\kappa \in \mathcal {L} \), \(\mathcal {B} _{\mathcal {K} _1}(C\sqsubseteq D \mid \kappa ) = \mathcal {B} _{\mathcal {K} _2}(C\sqsubseteq D \mid \kappa )\).

\(\mathcal {ALCP}\) is not only representation invariant, but also language invariant. This property is of computational interest, in particular in combination with independence, which we investigate next. To illustrate this, suppose that we added knowledge about bone fractures in our medical example, which is independent of the knowledge about infections. Independence guarantees that we can ignore the knowledge about infections when answering queries about bone fractures. In this way, we can decrease the size of the knowledge base. Language invariance guarantees that we can also ignore the concepts, relations and propositional variables related to the infection domain. Thus, we can decrease the size of the language. Exploiting both properties, the size of the computational problems can sometimes be decreased significantly.

Theorem 25

(Language Invariance). Let \(\mathcal {K} _1,\mathcal {K} _2\) be KBs over \(\mathcal {L} ^1, \mathsf{N_C^{1}}, \mathsf{N_R^{1}} \) and \(\mathcal {L} ^2, \mathsf{N_C^{2}}, \mathsf{N_R^{2}} \), respectively. If \(\mathcal {K} _1 = \mathcal {K} _2\), \(\mathcal {L} ^1 \subseteq \mathcal {L} ^2, \mathsf{N_C^{1}} \subseteq \mathsf{N_C^{2}} \) and \(\mathsf{N_R^{1}} \subseteq \mathsf{N_R^{2}} \), then for all concepts \(C,D \in \mathsf{N_C^{1}} \) and contexts \(\kappa \in \mathcal {L} ^1\), it holds that

$$\begin{aligned} \mathcal {B} _{\mathcal {K} _1}(C\sqsubseteq D \mid \kappa ) = \mathcal {B} _{\mathcal {K} _2}(C\sqsubseteq D \mid \kappa ). \end{aligned}$$

For an \(\mathcal {L}\)-TBox \(\mathcal {T}\), we define the signature of \(\mathcal {T}\) to be the set \(\mathsf {sig} (\mathcal {T})\) of all concept names and role names appearing in \(\mathcal {T}\). Likewise, \(\mathsf {sig} (\mathcal {R})\) is the set of all propositional variables appearing in \(\mathcal {R}\). The signature of a KB \(\mathcal {K} =(\mathcal {R},\mathcal {T})\) is \(\mathsf {sig} (\mathcal {K}):=\mathsf {sig} (\mathcal {R})\cup \mathsf {sig} (\mathcal {T})\).

Theorem 26

(Independence). Let \(\mathcal {K} _1,\mathcal {K} _2\) be KBs such that \(\mathsf {sig} (\mathcal {K} _1)\cap \mathsf {sig} (\mathcal {K} _2)=\emptyset \), C, D be two concepts, and \(\kappa \) a context where \(\left( \mathsf {sig} (C)\cup \mathsf {sig} (D)\cup \mathsf {sig} (\kappa )\right) \cap \mathsf {sig} (\mathcal {K} _2)=\emptyset \). Then \(\mathcal {B} _{\mathcal {K} _1}(C\sqsubseteq D\mid \kappa )=\mathcal {B} _{\mathcal {K} _1\cup \mathcal {K} _2}(C\sqsubseteq D\mid \kappa )\).

To conclude, we consider continuity. One important practical feature of continuous probabilistic logics is that they guarantee a numerically stable behaviour. That is, minor rounding errors due to floating-point arithmetic will not result in major errors in the computed probabilities. As demonstrated by Paris in [22], measuring the difference between probabilistic knowledge bases is subtle and is best addressed by comparing knowledge bases extensionally; i.e., with respect to their model sets. To this end, Paris considered the Blaschke metric. Formally, the Blaschke distance \(\Vert S_1, S_2 \Vert _B\) between two convex sets \(S_1, S_2\) is defined by

$$\begin{aligned}&\inf \{\delta \in \mathbb {R} \mid \ \forall P_1 \in S_1 \exists P_2 \in S_2: \Vert P_1, P_2 \Vert _2 \le \delta \ \textit{and} \\&\qquad \qquad \forall P_2 \in S_2 \exists P_1 \in S_1:\Vert P_2, P_1 \Vert _2 \le \delta \} \end{aligned}$$

Intuitively, \(\Vert S_1, S_2 \Vert _B\) is the smallest real number d such that for each probability distribution in one of the sets, there is a probability distribution in the other that has distance at most d to the former. We say that a sequence of knowledge bases \((\mathcal {K} _i)\) converges to a knowledge base \(\mathcal {K}\) iff the classical part of each \(\mathcal {K} _i\) is equivalent to the classical part of \(\mathcal {K} \) and the probabilistic part converges to the probabilistic part of \(\mathcal {K} \) with respect to the Blaschke metric. Our reasoning approach behaves indeed continuously with respect to this metric.
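For finite sets of distributions (e.g., finite samples from the two model sets), the Blaschke distance coincides with the Hausdorff distance under the Euclidean norm, which translates directly into code. A minimal sketch; the function name is ours:

```python
from math import dist  # Euclidean distance between coordinate tuples

def blaschke_distance(s1, s2):
    """Blaschke/Hausdorff distance between two finite, non-empty sets of
    probability distributions, each distribution given as a tuple of
    point probabilities over the same (finite) set of worlds."""
    # farthest a point of s1 can be from its nearest neighbour in s2 ...
    d12 = max(min(dist(p1, p2) for p2 in s2) for p1 in s1)
    # ... and symmetrically for s2 against s1
    d21 = max(min(dist(p2, p1) for p1 in s1) for p2 in s2)
    return max(d12, d21)
```

For the singleton sets \(\{(1,0)\}\) and \(\{(0,1)\}\) this gives \(\sqrt{2}\), and identical sets are at distance 0, matching the intuition above.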

Theorem 27

(Continuity). Let \((\mathcal {K} _i)\) be a convergent sequence of KBs with limit \(\mathcal {K} \) and \(\mathcal {B} _{\mathcal {K} _i}(C\sqsubseteq D\mid \kappa ) = [\ell _i,u_i]\). If \(\mathcal {B} _{\mathcal {K}}(C\sqsubseteq D\mid \kappa ) = [\ell ,u]\), then \((\ell _i)\) converges to \(\ell \) and \((u_i)\) converges to u (with respect to the usual topology on \(\mathbb {R}\)).

6 Related Work

Relational probabilistic logical approaches can be roughly divided into those that consider probability distributions over the domain, those that consider probability distributions over possible worlds and those that combine both ideas [10]. Our framework belongs to the second group. Maximum entropy reasoning in propositional probabilistic logics has been discussed extensively, e.g., in [13, 22], and various extensions to first-order languages have been considered in recent years [3, 4, 14, 15]. In these works, the domain is restricted to a finite number of constants or bounded in the limit. We circumvent the need to do so by combining a classical first-order logic with unbounded domain with a probabilistic logic with fixed domain.

Many probabilistic DLs have also been considered in the last decades [16, 18, 19]. Our approach is closest to Bayesian DLs [5, 6] and DISPONTE [26]. The greatest difference from the former lies in the fact that \(\mathcal {ALCP}\) KBs do not require a complete specification of the probability distribution, but only a set of probabilistic constraints. Moreover, the previous formalisms consider only the sceptical degree of belief, while we are interested in the full belief interval. In contrast to DISPONTE, \(\mathcal {ALCP}\) is capable of expressing both logical and probabilistic dependencies between the axioms in a KB; in addition, DISPONTE requires all uncertainty degrees to be assigned as mutually independent point probabilities, while \(\mathcal {ALCP}\) allows for a more flexible specification.

7 Conclusions

We have introduced the probabilistic DL \(\mathcal {ALCP}\), which extends the classical DL \(\mathcal {ALC}\) with the capability of expressing and reasoning about uncertain contextual knowledge defined through the principle of maximum entropy. Effective reasoning methods were developed using the decoupling between the logical and the probabilistic components of \(\mathcal {ALCP}\) KBs. We also studied the properties of this logic in relation to other probabilistic logics.

We plan to extend this work in several directions. First, instead of considering only the ME-model, we could reason over all probability distributions that satisfy our probabilistic constraints, similarly to [12, 17, 20]. This will in general result in larger belief intervals. A smaller interval is preferable since it corresponds to a more precise degree of belief. However, when using all probability distributions, the size of the interval can be a good indicator of the variation of the possible beliefs in our query with respect to the knowledge base.

In some applications it is also useful to allow more expressive propositional or relational context languages like those proposed in [4, 7, 15, 24]. Similarly, we can consider other DLs for our concept language. Indeed, \(\mathcal{ALC}\) was chosen as a prototypical DL for studying the basic properties of our framework. Including additional constructors into the formalism should be relatively simple. In contrast, considering other reasoning problems beyond subsumption is less straightforward. Recall, for instance, that if an \(\mathcal {ALCP}\) KB \(\mathcal {K}\) contains an inconsistent context with positive probability, then \(\mathcal {K}\) has no models. It is thus unclear how to handle the probability of consistency of a KB.

Practical reasoning with \(\mathcal {ALCP}\) can currently be performed by combining existing ME-reasoners with any \(\mathcal {ALC}\) reasoner according to Algorithm 1. Clearly, such an approach can still be further optimized. We are working on combining the classical and probabilistic reasoning parts in more sophisticated ways.