1 Introduction

Description Logics (DLs) [2] are a successful family of logic-based knowledge representation languages that are employed in various application domains; arguably, their most prominent success was the adoption of the DL-based language OWL as the standard ontology language for the Semantic Web. A DL knowledge base (aka ontology) consists of a TBox and an ABox. In the former, concepts can be used to state terminological constraints as so-called general concept inclusions (GCIs). For example, the concept \(\exists parent .( Famous \sqcap Rich )\) describes individuals that have a parent that is both famous and rich, and the GCI \(\exists friend . Famous \sqsubseteq Famous \) states that individuals that have a famous friend are famous themselves. The expressiveness of a DL depends on which constructors for building concepts are available. The concepts in our example use the constructors conjunction (\(\sqcap \)) and existential restriction (\(\exists r.C\)), which together with the top concept (\(\top \)) are the ones available in the DL \(\mathcal {EL} \), to which we restrict our attention here. While being quite inexpressive, \(\mathcal {EL} \) is nevertheless frequently used for building ontologies, and it has the advantage over more expressive DLs that reasoning is polynomial w.r.t. \(\mathcal {EL} \) ontologies. In the ABox, one can relate named individuals with concepts and with each other. For example, the concept assertion \((\exists parent . Rich )\,( BEN )\) states that Ben has a rich parent, and the role assertion \( friend ( BEN , JOHN )\) says that Ben has John as a friend. If concept assertions are restricted to employing only concept names, like \( Famous ( JOHN )\), rather than complex concepts, then the ABox is called simple. DL systems provide their users with inference services that automatically derive implicit consequences such as instance relationships. For example, given the ABox assertions and the GCI introduced above, we can derive that Ben is famous, i.e., that the assertion \( Famous ( BEN )\) follows from this ontology.

Although DL reasoners are usually sound (i.e., only derive instance relationships that indeed follow from the ontology), a computed consequence may still be incorrect in the application domain, due to the fact that the modelling of the domain in the ontology is erroneous. The question is then how to repair the ontology such that one gets rid of the unwanted consequences, but retains as many consequences as possible. Classical repair approaches that are based on removing axioms from the ontology [8, 11, 15, 16, 18, 19] are not optimal since, by removing large axioms, one may also lose information that does not contribute to the unwanted consequence. For example, if the concept assertion for John is \(( Famous \sqcap Rich )( JOHN )\) rather than just \( Famous ( JOHN )\), then to get rid of the consequence \( Famous ( BEN )\) we need to remove the whole assertion, and thus unnecessarily also lose the information that John is rich.

Extending our previous work in [5, 7], we investigated in [3] how to compute optimal repairs in a setting where the ABox may contain errors, but the TBox is assumed to be a correct \(\mathcal {EL} \) TBox, and thus remains unchanged. More precisely, we consider a generalization of ABoxes called quantified ABoxes (qABoxes) both as input for and as result of the repair process since this allows us to retain more consequences. Such a qABox is a simple ABox where, however, some of the individuals are anonymized, which is formally expressed by existentially quantifying over them. In [3], we introduce two different notions of repair, depending on which entailment relation between qABoxes is considered: classical logical entailment or \(\mathsf {IQ}\)-entailment, where the latter retains as many instance relationships as possible (but not necessarily answers to conjunctive queries). For the \(\mathsf {IQ}\) case, we show that optimal \(\mathsf {IQ}\)-repairs always exist and can be computed in exponential time. In the worst case, such repairs may be exponentially large and there may be exponentially many of them. Reusing an example from the introduction of [3], let us assume that the input ABox contains the information that Ben has a parent, Jerry, that is both rich and famous, that the TBox contains the GCI \( Famous \sqsubseteq Rich \), and that we want to remove the consequence \((\exists parent .( Rich \sqcap Famous ))( BEN )\). Using the optimized repair approach of [3], we obtain the following qABox as one of the optimal \(\mathsf {IQ}\)-repairs: \( \exists \{y\}.\{ parent ( BEN , y), Rich (y), Famous ( JERRY ), Rich ( JERRY )\}. \)

The advantage of using qABoxes rather than ABoxes for repair is that more information can be retained (e.g. the fact that Ben has a rich parent). The disadvantage is that, though anonymized individuals are part of the OWL standard, DL systems usually do not accept them as input. Thus, the question arises whether one can also obtain optimal repairs if one restricts the output of the repair process to being ABoxes. In the above example, the qABox obtained as an optimal \(\mathsf {IQ}\)-repair can actually be expressed by an ABox with complex concept assertions: \( \{(\exists parent . Rich )( BEN ), Famous ( JERRY ), Rich ( JERRY )\}. \)

However, this is not always the case. As an example, consider the ABox \(\mathcal {A}:= \{ parent ( BEN , JERRY ), Rich ( JERRY )\}\) and the TBox \( \mathcal {T}:= \{ \exists parent . Rich \sqsubseteq Famous , Famous \sqsubseteq \exists friend . Famous , \exists friend . Famous \sqsubseteq Famous \}, \) which together imply that Ben is famous. Assume that Ben wants to get rid of this consequence. The repair approach of [3] yields the following qABox as an optimal \(\mathsf {IQ}\)-repair: \( \exists {\,\{x, y\}}\hbox {.}\,{\{} parent ( BEN , x), Rich ( JERRY ), friend ( BEN , y), friend (y, y) \} \). This qABox retains the information that Ben has a parent (but not that Jerry is this parent) and that Ben is the starting point of an infinite \( friend \)-chain, i.e., Ben belongs to the concepts \(C_n := (\exists friend .)^n\top \) for all \(n\ge 1\). The latter is the reason why this qABox cannot be expressed by an \(\mathsf {IQ}\)-equivalent ABox, which in turn is the reason why there is no optimal ABox repair. The culprit is obviously the cycle \( friend (y, y)\). However, such cycles need not always cause problems. In fact, if we remove the third GCI \(\exists friend . Famous \sqsubseteq Famous \) from the TBox, then the following qABox is an optimal \(\mathsf {IQ}\)-repair:

$$ \begin{array}{r@{}l} \exists {\,\{x, y\}}\hbox {.}\,{\{}&{} parent ( BEN , x), Rich ( JERRY ), \\ &{} friend ( BEN , y), friend (y, y), Famous (y) \}. \end{array} $$

This qABox can be expressed by an ABox that is \(\mathsf {IQ}\)-equivalent to it w.r.t. the given TBox: \( \{ (\exists parent .\top )( BEN ), Rich ( JERRY ), (\exists friend . Famous )( BEN )\}. \) The reason is that, due to the existence of a famous friend of Ben, the GCI \( Famous \sqsubseteq \exists friend . Famous \) now yields the infinite \( friend \)-chain.

These examples demonstrate that optimal ABox repairs may not always exist, and that it is not obvious when they do. The main contribution of the present paper is that we show how to decide the existence of optimal ABox repairs in exponential time, and how to compute all such repairs in case they exist. There may exist exponentially many such repairs, and each one may in the worst case be of double-exponential size. Our approach for showing these results roughly proceeds as follows. First, we observe that classical entailment between a qABox and an ABox coincides with so-called \(\mathsf {IRQ}\)-entailment, which is slightly stronger than \(\mathsf {IQ}\)-entailment by additionally taking role assertions between named individuals into account. Then, we show that both the canonical and the optimized \(\mathsf {IQ}\)-repairs of [3] can be used not only to obtain all optimal \(\mathsf {IQ}\)-repairs, but also to compute all optimal \(\mathsf {IRQ}\)-repairs. Subsequently, we introduce the notion of an optimal ABox approximation of a given qABox, and prove that the set of optimal ABox approximations of all optimal \(\mathsf {IRQ}\)-repairs yields all optimal ABox repairs. A given qABox may not have an optimal ABox approximation, but if it does, then this approximation is unique up to equivalence and of at most exponential size. Then we investigate the problem of deciding the existence of optimal ABox approximations. The first step is to transform the qABox into a specific form, called pre-approximation, which is saturated w.r.t. the TBox and consists of the original role assertions between named individuals and, for each named individual a, a sub-qABox \(\mathcal {B} _a\). We prove that the original qABox has an optimal ABox approximation iff all the named individuals a have a most specific concept (msc) \(C_a\) in \(\mathcal {B} _a\) w.r.t. the TBox. The optimal ABox approximation is then obtained by replacing each \(\mathcal {B} _a\) with \(C_a(a)\) in the pre-approximation. We can then use the results stated in [20] to test the existence of the msc in polynomial time and to generate the at most exponentially large msc. Given that the optimal \(\mathsf {IRQ}\)-repairs may be of exponential size, this yields the complexity upper bounds for testing the existence and computing optimal ABox repairs mentioned above. Due to space constraints, we cannot give complete proofs of all our results. They can be found in [4].

2 Preliminaries

We start with introducing the DL \(\mathcal {EL} \) as well as TBoxes and (quantified) ABoxes. Then we consider the entailment relations relevant for this paper.

The name space available for defining \(\mathcal {EL} \) concepts and ABox assertions is given by a signature \(\varSigma \), which is the disjoint union of sets \(\varSigma _{\mathsf {O}}\), \(\varSigma _{\mathsf {C}}\), and \(\varSigma _{\mathsf {R}}\) of object names, concept names, and role names. Starting with concept names and the top concept \(\top \), \(\mathcal {EL} \) concepts are defined inductively: if C, D are \(\mathcal {EL} \) concepts and r is a role name, then \(C\sqcap D\) (conjunction) and \(\exists {\,r}\hbox {.}\,{C}\) (existential restriction) are also \(\mathcal {EL} \) concepts. An \(\mathcal {EL} \) general concept inclusion (GCI) is of the form \(C\sqsubseteq D\), an \(\mathcal {EL} \) concept assertion is of the form C(u), and a role assertion is of the form r(u, v), where C, D are \(\mathcal {EL} \) concepts, \(r\in \varSigma _{\mathsf {R}} \), and \(u,v\in \varSigma _{\mathsf {O}} \). An \(\mathcal {EL} \) TBox is a finite set of \(\mathcal {EL} \) GCIs and an \(\mathcal {EL} \) ABox is a finite set of \(\mathcal {EL} \) concept assertions and role assertions. Such an ABox is called simple if all its concept assertions are of the form A(u) with \(A\in \varSigma _{\mathsf {C}} \). A quantified ABox (qABox) is of the form \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) where X is a finite subset of \(\varSigma _{\mathsf {O}} \) and \(\mathcal {A} \) is a simple ABox, which we call the matrix of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\). We call the elements of X variables and the other object names occurring in \(\mathcal {A} \) individuals. The set of individual names occurring in \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) is denoted with \(\varSigma _{\mathsf {I}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\), and the set of all object names (including the variables) with \(\varSigma _{\mathsf {O}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\).
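The syntax just introduced maps onto a small data model. The following Python sketch is purely illustrative: the class names `Top`, `Name`, `And`, `Exists`, and `QABox`, as well as the tuple encoding of assertions, are assumptions of this sketch and are not taken from [3].

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Top:
    """The top concept."""


@dataclass(frozen=True)
class Name:
    """A concept name A."""
    name: str


@dataclass(frozen=True)
class And:
    """A conjunction of two concepts."""
    left: "Concept"
    right: "Concept"


@dataclass(frozen=True)
class Exists:
    """An existential restriction over a role name and a filler concept."""
    role: str
    filler: "Concept"


Concept = Union[Top, Name, And, Exists]


@dataclass(frozen=True)
class QABox:
    """A quantified ABox: a set X of variables and a simple ABox as matrix."""
    variables: FrozenSet[str]
    concept_assertions: FrozenSet[Tuple[str, str]]     # (A, u) stands for A(u)
    role_assertions: FrozenSet[Tuple[str, str, str]]   # (r, u, v) stands for r(u, v)

    def objects(self) -> FrozenSet[str]:
        """All object names of the qABox, including the variables."""
        objs = {u for _, u in self.concept_assertions}
        for _, u, v in self.role_assertions:
            objs |= {u, v}
        return frozenset(objs) | self.variables

    def individuals(self) -> FrozenSet[str]:
        """The object names that are not variables."""
        return self.objects() - self.variables


# The optimal IQ-repair from the introduction, written in this encoding.
repair = QABox(
    variables=frozenset({"y"}),
    concept_assertions=frozenset({("Rich", "y"), ("Famous", "JERRY"), ("Rich", "JERRY")}),
    role_assertions=frozenset({("parent", "BEN", "y")}),
)
```

Frozen dataclasses make concepts and qABoxes hashable, so they can be stored in sets, which is convenient when sets of repairs are compared.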

The semantics of the syntactic entities introduced above can either be defined directly using interpretations, or by a translation into first-order logic (FO). For the sake of brevity, we choose the latter approach (see [3] for the former). In the translation, the elements of \(\varSigma _{\mathsf {O}}\), \(\varSigma _{\mathsf {C}}\), and \(\varSigma _{\mathsf {R}}\) are respectively viewed as constant symbols, unary predicate symbols, and binary predicate symbols. \(\mathcal {EL} \) concepts C are inductively translated into FO formulas \(\phi _C(x)\) with one free variable x:

  • concept A for \(A\in \varSigma _{\mathsf {C}} \) is translated into A(x) and \(\top \) into \(A(x)\vee \lnot A(x)\) for an arbitrary \(A\in \varSigma _{\mathsf {C}} \);

  • if C, D are translated into \(\phi _C(x)\) and \(\phi _D(x)\), then \(C\sqcap D\) is translated into \(\phi _C(x)\wedge \phi _D(x)\) and \(\exists {\,r}\hbox {.}\,{C}\) into \(\exists {\,y}\hbox {.}\,{(r(x,y)\wedge \phi _C(y))}\), where \(\phi _C(y)\) is obtained from \(\phi _C(x)\) by replacing the free variable x by a different variable y.

GCIs \(C\sqsubseteq D\) are translated into sentences \(\phi _{C\sqsubseteq D} := \forall {\,x}\hbox {.}\,{(\phi _C(x)\rightarrow \phi _D(x))}\) and TBoxes \(\mathcal {T} \) into \(\phi _\mathcal {T}:= \bigwedge _{{C\sqsubseteq D}\in \mathcal {T}}\phi _{C\sqsubseteq D}\). Concept assertions C(u) are translated into \(\phi _C(u)\), role assertions r(u, v) stay the same, and ABoxes \(\mathcal {A} \) are translated into the conjunction \(\phi _\mathcal {A} \) of the translations of their assertions. For a quantified ABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\), the elements of X are viewed as first-order variables rather than constants, and its translation is \(\exists {\,\vec {x}}\hbox {.}\,{\phi _\mathcal {A}}\), where \(\vec {x}\) is the tuple of the variables in X in arbitrary order.
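The inductive translation can be sketched as a short recursive function. This is a minimal illustration under assumptions of our own: concepts are encoded as nested tuples (`("top",)`, `("name", A)`, `("and", C, D)`, `("exists", r, C)`), formulas are plain ASCII strings, and the function names `translate` and `translate_gci` are hypothetical.

```python
from itertools import count


def translate(concept, var="x", fresh=None, witness="A"):
    """Translate an EL concept into an FO formula with free variable `var`.

    `witness` is the arbitrary concept name used to express the top concept.
    """
    if fresh is None:
        fresh = count(1)  # supplies indices for the bound variables y1, y2, ...
    kind = concept[0]
    if kind == "top":
        return f"({witness}({var}) | ~{witness}({var}))"
    if kind == "name":
        return f"{concept[1]}({var})"
    if kind == "and":
        left = translate(concept[1], var, fresh, witness)
        right = translate(concept[2], var, fresh, witness)
        return f"({left} & {right})"
    if kind == "exists":
        y = f"y{next(fresh)}"  # a variable different from `var`
        body = translate(concept[2], y, fresh, witness)
        return f"exists {y}.({concept[1]}({var},{y}) & {body})"
    raise ValueError(f"unknown concept constructor: {kind!r}")


def translate_gci(lhs, rhs):
    """Translate a GCI into a universally quantified implication."""
    return f"forall x.({translate(lhs)} -> {translate(rhs)})"


famous_rich_parent = ("exists", "parent", ("and", ("name", "Famous"), ("name", "Rich")))
print(translate(famous_rich_parent))
print(translate_gci(("exists", "friend", ("name", "Famous")), ("name", "Famous")))
```

Using a shared counter for the bound variables keeps nested and sibling existential restrictions from reusing the same variable name.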

Let \(\alpha , \beta \) be (q)ABoxes, concept inclusions, or concept assertions (possibly not both of the same kind), and \(\mathcal {T} \) an \(\mathcal {EL}\) TBox. Then we say that \(\alpha \) entails \(\beta \) w.r.t. \(\mathcal {T}\) (written \(\alpha \models ^{\mathcal {T}}\beta )\) if the implication \((\phi _\alpha \wedge \phi _\mathcal {T})\rightarrow \phi _\beta \) is valid according to the semantics of FO. Furthermore, \(\alpha \) and \(\beta \) are equivalent w.r.t. \(\mathcal {T}\) (written \(\alpha \equiv ^\mathcal {T} \beta \)), if \(\alpha \models ^{\mathcal {T}}\beta \) and \(\beta \models ^{\mathcal {T}}\alpha \). In case \(\mathcal {T} =\emptyset \), we will sometimes write \(\models \) instead of \(\models ^\emptyset \). If \(\emptyset \models ^{\mathcal {T}}C\sqsubseteq D\), then we also write \(C\sqsubseteq ^{\mathcal {T}}D\) and say that C is subsumed by D w.r.t. \(\mathcal {T}\); in case \(\mathcal {T} =\emptyset \) we simply say that C is subsumed by D. If \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}C(a)\), then a is called an instance of C w.r.t. \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\mathcal {T} \). For ABoxes, the instance relation is defined analogously. Entailment between qABoxes w.r.t. an \(\mathcal {EL} \) TBox is NP-complete, but the subsumption and the instance problem are polynomial [7].

Note that ABoxes are a special case of qABoxes. For simple ABoxes, this is the case where \(X=\emptyset \). For general ABoxes, one can express complex concept assertions by introducing existentially quantified variables (e.g., \(\{(A\sqcap \exists r.B)(a)\}\) is equivalent to \(\exists {\,\{x\}}\hbox {.}\,{\{A(a), r(a,x), B(x)\}}\)). For this reason, the entailment relations defined below for qABoxes are also well-defined for ABoxes.
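The conversion of an ABox with complex concept assertions into an equivalent qABox can be sketched as follows. The tuple encoding of concepts, the function name `abox_to_qabox`, and the naming scheme `x1, x2, ...` for fresh variables are assumptions of this sketch; in particular, it assumes that no individual is already called `x1`, `x2`, and so on.

```python
from itertools import count


def abox_to_qabox(concept_assertions, role_assertions):
    """Flatten an ABox into a qABox, returned as a pair (variables, matrix).

    concept_assertions: iterable of (concept, individual), with concepts encoded as
        ("top",), ("name", A), ("and", C, D), or ("exists", r, C)
    role_assertions: iterable of (r, u, v)
    The matrix contains ("c", A, u) for A(u) and ("r", r, u, v) for r(u, v).
    """
    fresh = count(1)
    variables = set()
    matrix = {("r", r, u, v) for r, u, v in role_assertions}

    def unfold(concept, obj):
        kind = concept[0]
        if kind == "top":
            return  # the top concept holds for every object and adds nothing
        if kind == "name":
            matrix.add(("c", concept[1], obj))
        elif kind == "and":
            unfold(concept[1], obj)
            unfold(concept[2], obj)
        elif kind == "exists":
            x = f"x{next(fresh)}"  # one fresh variable per existential restriction
            variables.add(x)
            matrix.add(("r", concept[1], obj, x))
            unfold(concept[2], x)
        else:
            raise ValueError(f"unknown concept constructor: {kind!r}")

    for concept, individual in concept_assertions:
        unfold(concept, individual)
    return variables, matrix


# The example from the text: (A AND EXISTS r.B)(a).
variables, matrix = abox_to_qabox(
    [(("and", ("name", "A"), ("exists", "r", ("name", "B"))), "a")], []
)
```

For the example, the result has the single variable `x1` and the matrix `{A(a), r(a, x1), B(x1)}`, matching the qABox given above up to renaming of the variable.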

IQ-Entailment. If one is mainly interested in asking instance queries, i.e., in what kind of instance relations a qABox entails, then the following weaker form of entailment can be used [3, 7]. We say that the qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) IQ-entails the qABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) w.r.t. the \(\mathcal {EL}\) TBox \(\mathcal {T}\) (written \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_\mathsf {IQ} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\)) if every concept assertion C(a) entailed w.r.t. \(\mathcal {T}\) by the latter is also entailed w.r.t. \(\mathcal {T}\) by the former. Whenever we compare two qABoxes \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\), we follow [7] and assume without loss of generality that they are renamed apart, which means that X is disjoint with \(\varSigma _{\mathsf {O}} (\exists {\,Y}\hbox {.}\,{\mathcal {B}})\) and Y is disjoint with \(\varSigma _{\mathsf {O}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\), and we further assume that the two qABoxes speak about the same set of individual names \(\varSigma _{\mathsf {I}} {:}{=}\varSigma _{\mathsf {I}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\cup \varSigma _{\mathsf {I}} (\exists {\,Y}\hbox {.}\,{\mathcal {B}})\).

For the case of an empty TBox, it was shown in [7] that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models _\mathsf {IQ} ^\emptyset \exists {\,Y}\hbox {.}\,{\mathcal {B}}\) iff there is a simulation from \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) to \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\). A simulation from \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) to \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) is a relation \(\mathfrak {S} \subseteq \varSigma _{\mathsf {O}} (\exists {\,Y}\hbox {.}\,{\mathcal {B}})\times \varSigma _{\mathsf {O}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) such that \((a,a)\in \mathfrak {S} \) for each \(a\in \varSigma _{\mathsf {I}} \) and, for each \((u,v)\in \mathfrak {S} \), \(A(u)\in \mathcal {B} \) implies \(A(v)\in \mathcal {A} \) and \(r(u,u')\in \mathcal {B} \) implies that there exists an object \(v'\in \varSigma _{\mathsf {O}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) such that \((u',v')\in \mathfrak {S} \) and \(r(v,v')\in \mathcal {A} \). Since checking the existence of a simulation can be done in polynomial time [10], the simulation characterization of \(\mathsf {IQ}\)-entailment shows that \(\mathsf {IQ}\)-entailment between qABoxes can be decided in polynomial time if \(\mathcal {T} =\emptyset \) [7].
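A simulation, if one exists, can be found by a greatest-fixpoint computation: start with all pairs that satisfy the concept-name condition and repeatedly delete pairs that violate the role condition. The sketch below is a naive version of this idea, not the optimized algorithm of [10]; the function name `has_simulation` and the encoding of matrices as pairs of sets are assumptions made here.

```python
def has_simulation(source, target, individuals):
    """Decide whether there is a simulation from `source` to `target`.

    Each qABox is given by its matrix as a pair (concepts, roles), where
    concepts is a set of (A, u) and roles is a set of (r, u, v).
    """
    src_concepts, src_roles = source
    tgt_concepts, tgt_roles = target

    def objects(concepts, roles):
        objs = {u for _, u in concepts}
        for _, u, v in roles:
            objs |= {u, v}
        return objs | set(individuals)

    # Candidate pairs: every concept name asserted for u must be asserted for v.
    sim = {
        (u, v)
        for u in objects(src_concepts, src_roles)
        for v in objects(tgt_concepts, tgt_roles)
        if all((a, v) in tgt_concepts for (a, w) in src_concepts if w == u)
    }

    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            for (r, u1, u2) in src_roles:
                if u1 != u:
                    continue
                # r(u, u2) needs some r(v, v2) in the target with (u2, v2) in sim.
                if not any((u2, v2) in sim
                           for (s, v1, v2) in tgt_roles if s == r and v1 == v):
                    sim.discard((u, v))
                    changed = True
                    break

    # The greatest relation satisfying both conditions must relate each individual to itself.
    return all((a, a) in sim for a in individuals)


loop_on_a = (set(), {("r", "a", "a")})
loop_on_x = (set(), {("r", "a", "x"), ("r", "x", "x")})
print(has_simulation(loop_on_a, loop_on_x, {"a"}))  # simulation exists
print(has_simulation(loop_on_x, loop_on_a, {"a"}))  # simulation exists
```

Since each iteration of the outer loop either removes a pair or terminates, the number of iterations is bounded by the number of candidate pairs, which gives a polynomial bound for this naive procedure as well.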

Fig. 1. The \(\mathsf {IQ}\)-saturation rules from [3].

To extend these results to the case of a non-empty TBox, the notion of an \(\mathsf {IQ}\)-saturation is introduced in [3]. The saturation rules given in Fig. 1 add new variables and assertions to the qABox if the existence of a corresponding element and the validity of the assertion is implied by the TBox. To be more precise, for each existential restriction \(\exists {\,r}\hbox {.}\,{C}\) occurring in \(\mathcal {T} \), a fresh variable \(x_C\) not contained in the initial qABox is introduced. When applying the \(\exists \)-rule to an assertion of the form \((\exists {\,r}\hbox {.}\,{C})(t)\), this variable is always used for the successor object. As pointed out in [3], \(\mathsf {IQ}\)-saturation (i.e., the exhaustive application of the \(\mathsf {IQ}\)-saturation rules) terminates in polynomial time and generates a qABox \(\mathsf {sat} ^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\), which can be seen as a qABox representation of what is called the canonical model in [13, Sect. 5.2]. \(\mathsf {IQ}\)-entailment for qABoxes w.r.t. an \(\mathcal {EL}\) TBox is now characterized in [3] as follows.

Theorem 1 ([3])

Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox and \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) qABoxes. Then the following statements are equivalent:

  • \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_\mathsf {IQ} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\),

  • \(\mathsf {sat} ^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}}) \models ^\emptyset _\mathsf {IQ} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\),

  • there is a simulation from \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) to \(\mathsf {sat} ^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\).

Since the \(\mathsf {IQ}\)-saturation can be computed in polynomial time, this clearly shows that \(\mathsf {IQ}\)-entailment for qABoxes w.r.t. an \(\mathcal {EL}\) TBox can also be decided in polynomial time.

IRQ-Entailment. If we are not only interested in implied concept assertions, but also in implied role assertions, then \(\mathsf {IQ}\)-entailment is not sufficient. Instead, we must use \(\mathsf {IRQ}\)-entailment. We say that the qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) IRQ-entails the qABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) w.r.t. the \(\mathcal {EL}\) TBox \(\mathcal {T}\) (written \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_{\mathsf {IRQ}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\)) if every concept or role assertion entailed w.r.t. \(\mathcal {T}\) by the latter is also entailed w.r.t. \(\mathcal {T}\) by the former.

It is easy to see that a qABox cannot entail a role assertion involving a variable, and it can only entail a role assertion between individuals if its matrix contains this assertion. This yields the following characterization of \(\mathsf {IRQ}\)-entailment, which shows that \(\mathsf {IRQ}\)-entailment can be decided in polynomial time.

Proposition 2

Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox and \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) qABoxes. Then the following statements are equivalent:

  • \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_{\mathsf {IRQ}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\),

  • \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_\mathsf {IQ} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\) and \(r(a,b)\in \mathcal {B} \) implies \(r(a,b)\in \mathcal {A} \) for all \(r\in \varSigma _{\mathsf {R}} \) and \(a,b\in \varSigma _{\mathsf {I}} \).
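The second statement of the proposition suggests a two-part test: a purely syntactic check on role assertions between individuals, followed by an \(\mathsf {IQ}\)-entailment test. The sketch below illustrates this; the function name `irq_entails` is hypothetical, and the \(\mathsf {IQ}\)-entailment test is passed in as a callback `iq_entails` that is assumed to be implemented elsewhere (for instance via saturation and simulation as in Theorem 1).

```python
def irq_entails(a_matrix, b_matrix, individuals, iq_entails):
    """Check IRQ-entailment of the qABox with matrix `b_matrix` by the one with `a_matrix`.

    Each matrix is a pair (concepts, roles) with roles a set of (r, u, v).
    `iq_entails(a_matrix, b_matrix)` is an assumed oracle for IQ-entailment
    w.r.t. the TBox under consideration.
    """
    a_roles = a_matrix[1]
    b_roles = b_matrix[1]
    named = set(individuals)
    # Every role assertion between individuals in B must occur literally in A.
    roles_preserved = all(
        (r, u, v) in a_roles
        for (r, u, v) in b_roles
        if u in named and v in named
    )
    return roles_preserved and iq_entails(a_matrix, b_matrix)


# Usage with a stub oracle that accepts every pair of qABoxes.
always = lambda a, b: True
a = ({("A", "a")}, {("r", "a", "x"), ("r", "x", "x")})
b = (set(), {("r", "a", "a")})
print(irq_entails(a, b, {"a"}, always))  # False: r(a, a) is not in the matrix of a
```

Keeping the role check separate from the oracle mirrors the structure of the proposition and makes clear that the additional cost over \(\mathsf {IQ}\)-entailment is a linear scan of the role assertions.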

Since ABoxes consist of concept and role assertions, we obtain the following characterization of entailment between a qABox and an ABox, which implies that this entailment can be decided in polynomial time.

Proposition 3

Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox, \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) a qABox, and \(\mathcal {B} \) an ABox. Then \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}\mathcal {B} \) iff \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_{\mathsf {IRQ}} \mathcal {B} \).

3 Optimal ABox Repairs and Approximations

We first introduce the notion of an optimal repair w.r.t. an entailment relation, and show that the approaches for computing optimal \(\mathsf {IQ}\)-repairs described in [3] can also be used to compute optimal \(\mathsf {IRQ}\)-repairs. Then, we define optimal ABox approximations and show some useful properties for them. Finally, we introduce optimal ABox repairs, and describe how optimal ABox approximations can be used to obtain them from optimal \(\mathsf {IRQ}\)-repairs.

3.1 Optimal \(\mathsf {IQ}\)- and \(\mathsf {IRQ}\)-Repairs

We start by recalling the definition of optimal repairs given in [3], but consider \(\mathsf {IRQ}\) as an additional entailment relation.

Definition 4

Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox and \({\mathsf {QL}} \in \{{\mathsf {IRQ}},\mathsf {IQ} \}\).

  • An \(\mathcal {EL}\) repair request is a finite set of \(\mathcal {EL}\) concept assertions.

  • Given a qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and an \(\mathcal {EL}\) repair request \(\mathcal {R}\), a \(\mathsf {QL}\)-repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) is a qABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) such that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\not \models ^{\mathcal {T}}C(a)\) for all \(C(a)\in \mathcal {R} \).

  • Such a repair \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) is optimal if there is no \(\mathsf {QL}\)-repair \(\exists {\,Z}\hbox {.}\,{\mathcal {C}}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) such that \(\exists {\,Z}\hbox {.}\,{\mathcal {C}}\models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\not \models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,Z}\hbox {.}\,{\mathcal {C}}\).

Two qABoxes are QL-equivalent if they \(\mathsf {QL}\)-entail each other, and \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) strictly QL-entails \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) if \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\not \models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,X}\hbox {.}\,{\mathcal {A}}\). We say that a set \(\mathfrak {R} \) of \(\mathsf {QL}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) QL-covers all \(\mathsf {QL}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) if for every \(\mathsf {QL}\)-repair \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) there exists an element \(\exists {\,Z}\hbox {.}\,{\mathcal {C}}\) of \(\mathfrak {R} \) such that \(\exists {\,Z}\hbox {.}\,{\mathcal {C}}\models ^{\mathcal {T}}_{\mathsf {QL}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\). It is easy to see that such a covering set \(\mathfrak {R}\) must contain, up to \(\mathsf {QL}\)-equivalence, all optimal \(\mathsf {QL}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\), and thus one can obtain from it, up to \(\mathsf {QL}\)-equivalence, the set of all optimal \(\mathsf {QL}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) by removing elements that are strictly \(\mathsf {QL}\)-entailed by another element. Clearly, this set still \(\mathsf {QL}\)-covers all \(\mathsf {QL}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\).
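The step from a covering set to the set of optimal repairs can be sketched generically. In the following illustration the entailment test is an arbitrary callback `entails(x, y)`, and the function name `optimal_elements` is an assumption of this sketch. The usage example replaces qABoxes by finite sets and entailment by the superset relation, which is only a toy stand-in.

```python
def optimal_elements(covering, entails):
    """Return the elements of `covering` that are not strictly entailed by another
    element, keeping one representative per equivalence class."""

    def strictly_entails(x, y):
        return entails(x, y) and not entails(y, x)

    maximal = [b for b in covering
               if not any(strictly_entails(c, b) for c in covering)]

    representatives = []
    for b in maximal:
        # Skip b if an equivalent element has already been selected.
        if not any(entails(b, r) and entails(r, b) for r in representatives):
            representatives.append(b)
    return representatives


# Toy stand-in: x "entails" y iff y is a subset of x.
toy_entails = lambda x, y: y <= x
covering = [frozenset({1, 2}), frozenset({1}), frozenset({2, 3}), frozenset({1, 2})]
print(optimal_elements(covering, toy_entails))
```

The procedure performs a quadratic number of entailment tests, so for \(\mathsf {IQ}\) and \(\mathsf {IRQ}\), where entailment is polynomial, its running time is polynomial in the total size of the covering set.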

In [3], two ways of computing such a covering set for \(\mathsf {IQ}\)-repairs are described, the canonical \(\mathsf {IQ}\)-repairs and the optimized \(\mathsf {IQ}\)-repairs (see Proposition 8 and Theorem 14 in [3]). Since these covering sets are of at most exponential cardinality, their elements are of at most exponential size, and \(\mathsf {IQ}\)-entailment can be decided in polynomial time, this shows that, up to \(\mathsf {IQ}\)-equivalence, the set of all optimal \(\mathsf {IQ}\)-repairs can be computed in exponential time.

The canonical (optimized) \(\mathsf {IQ}\)-repairs also yield covering sets for the \(\mathsf {IRQ}\) case. The reason is basically that the approaches for constructing them introduced in [3] do not generate new role assertions between individuals and preserve as many of them as possible, although this is not required for \(\mathsf {IQ}\)-entailment.

Proposition 5

Let \(\mathcal {T} \) be an \(\mathcal {EL}\) TBox, \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) a qABox, and \(\mathcal {R}\) an \(\mathcal {EL}\) repair request. If \(\mathfrak {R}\) is the set of all canonical or all optimized \(\mathsf {IQ}\)-repairs obtained from this input according to the definitions in [3], then \(\mathfrak {R}\) is a set of \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) that \(\mathsf {IRQ}\)-covers all \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\). In particular, up to \(\mathsf {IRQ}\)-equivalence, the set of optimal \(\mathsf {IRQ}\)-repairs can be computed in exponential time, and it \(\mathsf {IRQ}\)-covers all \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\).

Note that, though we have the same covering set \(\mathfrak {R}\) in the \(\mathsf {IQ}\) and in the \(\mathsf {IRQ}\) case, the sets of optimal repairs obtained from it by removing strictly entailed elements need not coincide since different entailment relations are used during this removal. Since the requirements for \(\mathsf {IQ}\)-entailment are weaker than for \(\mathsf {IRQ}\)-entailment, a qABox may be removed from \(\mathfrak {R}\) in the \(\mathsf {IQ}\) case, but must be retained in the \(\mathsf {IRQ}\) case. Also notice that the proposition need not hold for arbitrary \(\mathsf {IQ}\)-covering sets. Its proof uses properties of the canonical and the optimized \(\mathsf {IQ}\)-repairs that need not hold for arbitrary covering sets.

Example 6

Consider the qABox \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {A} = {\{ A(a), r(a,x), r(x,x)\}}\), assume that the TBox is empty, and that the repair request is \(\{ A(a) \}\). An optimal \(\mathsf {IQ}\)-repair \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A} '}\) can be obtained from this qABox by removing the assertion A(a) from \(\mathcal {A} \), and this is also an optimal \(\mathsf {IRQ}\)-repair. However, the ABox \(\{r(a,a)\}\) is also an optimal \(\mathsf {IQ}\)-repair since it is \(\mathsf {IQ}\)-equivalent to \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A} '}\), but it is not even an \(\mathsf {IRQ}\)-repair since it is not \(\mathsf {IRQ}\)-entailed by \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A}}\).

3.2 Optimal ABox Approximations

Given a qABox, we are now interested in finding an ABox that approximates it as closely as possible in the sense that a minimal amount of information is lost. In the definition below, we use classical entailment. Note, however, that, according to Proposition 3, this coincides with \(\mathsf {IRQ}\)-entailment.

Definition 7

Given a qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and an \(\mathcal {EL}\) TBox \(\mathcal {T}\), we call an \(\mathcal {EL}\) ABox \(\mathcal {B}\) an ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) if \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^\mathcal {T} \mathcal {B} \). The ABox approximation \(\mathcal {B}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) is optimal if there is no ABox approximation \(\mathcal {C}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) such that \(\mathcal {C} \models ^\mathcal {T} \mathcal {B} \), but \(\mathcal {B} \not \models ^\mathcal {T} \mathcal {C} \).

Such an optimal ABox approximation need not exist. The qABox \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A} '}\) with \(\mathcal {A} ' = \{ r(a,x), r(x,x)\}\) is an example of this. In fact, this qABox entails \(((\exists {r}.)^n{\top })(a)\) for all \(n\ge 1\), which is not possible for an ABox entailed by \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {A} '}\) since such an ABox cannot contain role assertions and can contain only finitely many concept assertions. However, if an optimal ABox approximation exists, then it is unique up to equivalence. This is an easy consequence of the fact that the union of two ABox approximations is again an ABox approximation.
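The claim about the concepts \((\exists {r}.)^n{\top }\) can be checked mechanically for any fixed bound on n. The sketch below assumes an empty TBox, in which case an object is an instance of an \(\mathcal {EL}\) concept iff the concept, read as a tree, can be matched into the matrix starting at that object. The tuple encoding of concepts and the helper names `is_instance` and `chain` are assumptions made for this illustration.

```python
def is_instance(concept, obj, concepts, roles):
    """Instance check w.r.t. the empty TBox against a matrix given by
    concepts (set of (A, u)) and roles (set of (r, u, v))."""
    kind = concept[0]
    if kind == "top":
        return True
    if kind == "name":
        return (concept[1], obj) in concepts
    if kind == "and":
        return (is_instance(concept[1], obj, concepts, roles)
                and is_instance(concept[2], obj, concepts, roles))
    if kind == "exists":
        # Some r-successor of obj must be an instance of the filler.
        return any(is_instance(concept[2], v, concepts, roles)
                   for (r, u, v) in roles if r == concept[1] and u == obj)
    raise ValueError(f"unknown concept constructor: {kind!r}")


def chain(role, n):
    """Build the concept consisting of n nested existential restrictions over top."""
    concept = ("top",)
    for _ in range(n):
        concept = ("exists", role, concept)
    return concept


cyclic = {("r", "a", "x"), ("r", "x", "x")}   # the matrix discussed above
acyclic = {("r", "a", "x")}                   # the same matrix without the loop
print(all(is_instance(chain("r", n), "a", set(), cyclic) for n in range(1, 50)))
print(is_instance(chain("r", 2), "a", set(), acyclic))
```

With the loop on x, every chain of the tested lengths is matched by going around the cycle, whereas without the loop already the chain of length 2 fails. A finite ABox without role assertions can only assert chains up to the largest nesting depth occurring in it, which is the intuition behind the non-existence argument.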

Proposition 8

If \(\mathcal {B} _1\) and \(\mathcal {B} _2\) are optimal ABox approximations of the qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. the \(\mathcal {EL}\) TBox \(\mathcal {T}\), then \(\mathcal {B} _1\) and \(\mathcal {B} _2\) are equivalent w.r.t. \(\mathcal {T}\).

Optimal ABox approximations can now be characterized as follows.

Theorem 9

The ABox \(\mathcal {B} \) is an optimal ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) iff \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\mathcal {B} \) are \(\mathsf {IRQ}\)-equivalent.

Proof

First, assume that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\mathcal {B} \) are \(\mathsf {IRQ}\)-equivalent w.r.t. \(\mathcal {T}\). Then \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}\mathcal {B} \) by Proposition 3, and thus \(\mathcal {B} \) is an ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\). If \(\mathcal {C} \) is another ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\), then \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}}\mathcal {C} \) by definition, and thus \(\mathcal {B} \models ^{\mathcal {T}}\mathcal {C} \) due to the assumed \(\mathsf {IRQ}\)-equivalence. This shows optimality of \(\mathcal {B} \).

Second, assume that \(\mathcal {B} \) is an optimal ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) that is not \(\mathsf {IRQ}\)-equivalent with \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\). Then there is either a role assertion that belongs to \(\mathcal {A} \), but not to \(\mathcal {B} \), or a concept assertion that is entailed w.r.t. \(\mathcal {T}\) by \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\), but not by \(\mathcal {B} \). Adding this assertion to \(\mathcal {B} \) yields an ABox \(\mathcal {B} '\) that is an ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\). In addition, it satisfies \(\mathcal {B} '\models ^{\mathcal {T}}\mathcal {B} \), but not \(\mathcal {B} \models ^{\mathcal {T}}\mathcal {B} '\), which contradicts the assumed optimality of \(\mathcal {B} \).    \(\square \)

An approach for deciding whether a given qABox has an optimal ABox approximation, and for computing it in case it exists, will be described in Sect. 4. But first, we show how optimal ABox approximations can be used to compute optimal ABox repairs.

3.3 Optimal ABox Repairs

The repair approaches developed in [3] in general yield quantified ABoxes as output, even if the input is an ABox. We are now interested in producing repairs that are ABoxes. The approach developed below does not require the input to be an ABox. It actually assumes that the input is a qABox, which means that input ABoxes first need to be transformed into equivalent qABoxes.

Definition 10

Let \(\mathcal {T} \) be an \(\mathcal {EL}\) TBox, \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) a qABox, and \(\mathcal {R}\) an \(\mathcal {EL}\) repair request. We call an \(\mathcal {EL}\) ABox \(\mathcal {B}\) an ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) if \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^\mathcal {T} \mathcal {B} \) and \(\mathcal {B} \not \models ^{\mathcal {T}}C(a)\) for all \(C(a)\in \mathcal {R} \). The ABox repair \(\mathcal {B}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) is optimal if there is no ABox repair \(\mathcal {C}\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) such that \(\mathcal {C} \models ^\mathcal {T} \mathcal {B} \), but \(\mathcal {B} \not \models ^\mathcal {T} \mathcal {C} \).

Our approach for computing optimal ABox repairs proceeds as follows: first, we compute the set of all optimal \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\), and then ABox-approximate the elements of this set. In the following, if we say that \(\mathfrak {R}\) is the set of optimal \(\mathsf {IRQ}\)-repairs of a qABox, we mean that, for every optimal \(\mathsf {IRQ}\)-repair, \(\mathfrak {R}\) contains one element of its \(\mathsf {IRQ}\)-equivalence class. Also, for a given qABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\), we define

$$ \mathsf {Oapp}^\mathcal {T} (\exists {\,Y}\hbox {.}\,{\mathcal {B}}) := \left\{ \begin{array}{cl} \{\mathcal {C} \} &{} \text {for an optimal ABox approx.}\ \mathcal {C} \text { of } \exists {\,Y}\hbox {.}\,{\mathcal {B}}\text { w.r.t.}\ \mathcal {T},\\ \emptyset &{} \text {if no optimal ABox approx.}\ \text {of } \exists {\,Y}\hbox {.}\,{\mathcal {B}}\text { w.r.t.}\ \mathcal {T}\ \text {exists}. \end{array} \right. $$

Theorem 11

Let \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) be a qABox, \(\mathcal {T} \) an \(\mathcal {EL}\)-TBox, \(\mathcal {R} \) an \(\mathcal {EL}\) repair request, and \(\mathfrak {R}\) the set of optimal \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \). Then the set

$$ \bigcup _{\exists {\,Y}\hbox {.}\,{\mathcal {B}}\in \mathfrak {R}}\mathsf {Oapp}^\mathcal {T} (\exists {\,Y}\hbox {.}\,{\mathcal {B}}) $$

consists of all optimal ABox repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \) up to equivalence.

Proof

First, assume that the ABox \(\mathcal {C} \) belongs to the union defined in the statement of the theorem. Then \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models _{\mathsf {IRQ}} ^{\mathcal {T}} \exists {\,Y}\hbox {.}\,{\mathcal {B}}\models ^{\mathcal {T}} \mathcal {C} \) for some qABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\in \mathfrak {R} \) that has \(\mathcal {C} \) as an optimal ABox approximation. This implies that \(\mathcal {C} \) does not entail any of the concept assertions in \(\mathcal {R} \) (since \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) does not) and that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models ^{\mathcal {T}} \mathcal {C} \). Thus, \(\mathcal {C} \) is an ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \). It remains to show that it is optimal. Assume to the contrary that \(\mathcal {C} '\) is an ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \) such that \(\mathcal {C} '\models ^{\mathcal {T}}\mathcal {C} \), but \(\mathcal {C} \not \models ^{\mathcal {T}}\mathcal {C} '\). Since \(\mathcal {C} \) and \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) are \(\mathsf {IRQ}\)-equivalent by Theorem 9, this is a contradiction to the fact that \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) is an optimal \(\mathsf {IRQ}\)-repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T} \) since \(\mathcal {C} '\) would then be a better \(\mathsf {IRQ}\)-repair.

Second, assume that the ABox \(\mathcal {C} \) is an optimal ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \). Then \(\mathcal {C} \) is also an \(\mathsf {IRQ}\)-repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \), and thus Proposition 5 yields that there is an optimal \(\mathsf {IRQ}\)-repair \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\in \mathfrak {R} \) such that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\models _{\mathsf {IRQ}} ^{\mathcal {T}}\, \exists {\,Y}\hbox {.}\,{\mathcal {B}}\models _{\mathsf {IRQ}} ^{\mathcal {T}} \mathcal {C} \). We know by Proposition 3 that the second \(\mathsf {IRQ}\)-entailment is in fact an entailment, and thus \(\mathcal {C} \) is an ABox approximation of \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\). It remains to show that it is optimal. Assume to the contrary that \(\mathcal {C} '\) is an ABox approximation of \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) such that \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\models ^{\mathcal {T}}\mathcal {C} '\models ^{\mathcal {T}}\mathcal {C} \), but \(\mathcal {C} \not \models ^{\mathcal {T}}\mathcal {C} '\). But then \(\mathcal {C} '\) is an ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \) (since \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\) is a repair) that is better than \(\mathcal {C} \), which contradicts our assumption that \(\mathcal {C} \) is optimal.    \(\square \)

Once we have developed a method for computing the sets \(\mathsf {Oapp}^\mathcal {T} (\exists {\,Y}\hbox {.}\,{\mathcal {B}})\), this theorem shows how to compute the set of all optimal ABox repairs of a given qABox. Such a method will be introduced in the next section. Before doing this, we want to point out that, in contrast to the set of optimal \(\mathsf {IRQ}\)-repairs, which covers all \(\mathsf {IRQ}\)-repairs, the set of optimal ABox repairs in general does not cover all ABox repairs.

Example 12

Consider the ABox \(\mathcal {A} = {\{ A(a), r(a,b), B(b)\}}\), the TBox \(\mathcal {T} = \{ B\sqsubseteq \exists {\,r}\hbox {.}\,{B}, \exists {\,r}\hbox {.}\,{B} \sqsubseteq B\}\) and the repair request \(\mathcal {R} = \{ (A\sqcap \exists {\,r}\hbox {.}\,{B})(a) \}\). There are basically three options for \(\mathsf {IRQ}\)-repairing \(\mathcal {A} \): remove \(A(a)\), remove \(B(b)\), or remove \(r(a,b)\). Since things implied by the TBox must also be taken into account, these three options yield the following optimal \(\mathsf {IRQ}\)-repairs of \(\mathcal {A} \) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\):Footnote 5 \(\mathcal {B} _1 = \{ r(a,b), B(b)\}\) as well as \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {B} _i}\) for \(i=2,3\), where \(\mathcal {B} _2 = \{ A(a), r(a,b), r(b,x), r(x,x)\}\) and \(\mathcal {B} _3 = \{ A(a), B(b), r(a,x), r(x,x)\}\). Of these three, \(\mathcal {B} _1\) is already an ABox, and thus its own optimal ABox approximation, whereas the other two have no optimal ABox approximation. However, they have non-optimal ABox approximations, which are not necessarily covered by \(\mathcal {B} _1\). For example, \(\{ A(a), r(a,b), (\exists {\,r}\hbox {.}\,{\exists {\,r}\hbox {.}\,{\top }})(b)\}\) is an ABox approximation of \(\exists {\,\{x\}}\hbox {.}\,{\mathcal {B} _2}\) and an ABox repair of \(\mathcal {A} \) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\), but since it contains \(A(a)\), it is not entailed by \(\mathcal {B} _1\).

4 Computing Optimal ABox Approximations

In this section, we assume that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) is a qABox and \(\mathcal {T}\) an \(\mathcal {EL}\) TBox. We will develop an approach for deciding whether \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) has an optimal ABox approximation w.r.t. \(\mathcal {T}\), which in the affirmative case also yields such an optimal approximation.

The first step is to saturate \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) using the \(\mathsf {IQ}\)-saturation rules of Fig. 1. In the following, let \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) denote a (fixed) qABox obtained by applying the \(\mathsf {IQ}\)-saturation rules exhaustively to \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\). Note that the size of \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) is polynomial in the size of the input \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\mathcal {T}\), and that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) are \(\mathsf {IQ}\)-equivalent w.r.t. \(\mathcal {T}\) by Theorem 1. In addition, it is easy to see that these two qABoxes contain the same individuals and the same role assertions between individuals. Thus, they are even \(\mathsf {IRQ}\)-equivalent w.r.t. \(\mathcal {T}\). As before, we use \(\varSigma _{\mathsf {I}}\) to denote the set of individuals of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\).

In the next step, we transform \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) into a new qABox, called pre-approximation, whose matrix basically consists of the union of ABoxes \(\mathcal {B} _a\) for each \(a\in \varSigma _{\mathsf {I}} \), extended with the role assertions between individuals in \(\mathcal {A}\). Each ABox \(\mathcal {B} _a\) contains a as the only individual name, and further contains a fully anonymized copy of the saturation \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\), which is connected with a by indispensable role assertions.

Definition 13

We call a role assertion \(r(a,u)\) in \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) for \(a\in \varSigma _{\mathsf {I}} \) indispensable if there is no role assertion \(r(a,b)\) for \(b\in \varSigma _{\mathsf {I}} \) such that there is a simulation from \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) to itself that contains \((u,b)\).

Since an individual always simulates itself, only role assertions \(r(a,u)\) where \(u\) is a variable can be indispensable. We are now ready to define the pre-approximation.
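The test of Definition 13 can be sketched in Python as follows. This is only an illustration under stated assumptions: a qABox is represented by plain sets of tuples, all function names are hypothetical, and the notion of simulation is assumed to be the usual one (concept names are preserved and every role successor is matched by a related successor). Since the identity relation is a simulation from a qABox to itself and simulations are closed under union, a simulation containing \((u,b)\) exists iff \((u,b)\) belongs to the largest simulation, which the sketch computes by fixpoint refinement.

```python
def maximal_simulation(objects, concepts, roles):
    """Largest simulation from a qABox matrix to itself (assumed definition).

    concepts: set of pairs (A, u) for concept assertions A(u)
    roles:    set of triples (r, u, v) for role assertions r(u, v)
    """
    labels = {u: {A for (A, v) in concepts if v == u} for u in objects}
    succ = {u: {(r, w) for (r, v, w) in roles if v == u} for u in objects}
    # Start with all pairs that respect concept names, then refine.
    sim = {(u, v) for u in objects for v in objects if labels[u] <= labels[v]}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            # Every r-successor of u needs a simulating r-successor of v.
            ok = all(
                any(r2 == r and (u2, v2) in sim for (r2, v2) in succ[v])
                for (r, u2) in succ[u]
            )
            if not ok:
                sim.discard((u, v))
                changed = True
    return sim


def indispensable_role_assertions(individuals, objects, concepts, roles):
    """Role assertions r(a, u), a an individual, that are indispensable."""
    sim = maximal_simulation(objects, concepts, roles)
    result = set()
    for (r, a, u) in roles:
        if a not in individuals:
            continue
        dominated = any(
            (r, a, b) in roles and (u, b) in sim for b in individuals
        )
        if not dominated:
            result.add((r, a, u))
    return result
```

For the qABox \(\exists {\,\{x\}}\hbox {.}\,{\{r(a,x), r(x,x)\}}\), the sketch reports \(r(a,x)\) as indispensable, whereas for the ABox \(\{r(a,a)\}\) it reports none, since \(r(a,a)\) is dominated by itself.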

Definition 14

The pre-approximation \(\mathsf {pre}\hbox {-}\mathsf {approx}^\mathcal {T} _{\mathsf {IRQ}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) is defined as the quantified ABox \(\exists {\,Y}\hbox {.}\,{\mathcal {B}}\), where

$$\begin{aligned}&\begin{aligned} Y{:}{=}{}&\{\, u'\, |\, u\text { is an object name occurring in }\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}}) \,\},\\ \mathcal {B} {:}{=}{}&\bigcup \{\, \mathcal {B} _a\, |\, a\text { is an individual name in }\varSigma _{\mathsf {I}} \,\}\\&\cup \{\, r(a,b)\, |\, r(a,b)\text { occurs in } \underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\text { where }a, b\in \varSigma _{\mathsf {I}} \,\},\\ \mathcal {B} _a{:}{=}{}&\{\, A(a)\, |\, A(a)\text { occurs in }\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}}) \,\}\\&\cup \{\, r(a,u')\, |\, r(a,u)\text { occurs in }\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\text { and is indispensable} \,\}\\&\cup \{\, A(u')\, |\, A(u)\text { occurs in }\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}}) \,\}\\&\cup \{\, r(u',v')\, |\, r(u,v)\text { occurs in }\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}}) \,\}. \end{aligned} \end{aligned}$$

Obviously, the pre-approximation can be computed in polynomial time. In addition, it is \(\mathsf {IRQ}\)-equivalent to \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) [4].
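The construction of Definition 14 can be sketched as follows. The sketch assumes that the input is already \(\mathsf {IQ}\)-saturated and that the set of indispensable role assertions has been determined beforehand; the tuple-based representation and the names `prime` and `pre_approximation` are hypothetical choices made for this illustration.

```python
def prime(u):
    """Name of the anonymized copy u' of the object name u."""
    return u + "'"


def pre_approximation(individuals, objects, concepts, roles, indispensable):
    """Build (Y, concept assertions, role assertions) as in Definition 14.

    The arguments describe the (already saturated) qABox:
    concepts are pairs (A, u), roles are triples (r, u, v), and
    indispensable is the subset of roles that is indispensable.
    """
    variables = {prime(u) for u in objects}
    # A(a) for individuals a, and A(u') for every object name u.
    new_concepts = {(A, u) for (A, u) in concepts if u in individuals}
    new_concepts |= {(A, prime(u)) for (A, u) in concepts}
    # Role assertions between individuals are kept as they are.
    new_roles = {
        (r, a, b) for (r, a, b) in roles
        if a in individuals and b in individuals
    }
    # Indispensable assertions connect individuals with the copy.
    new_roles |= {(r, a, prime(u)) for (r, a, u) in indispensable}
    # The anonymized copy of all role assertions.
    new_roles |= {(r, prime(u), prime(v)) for (r, u, v) in roles}
    return variables, new_concepts, new_roles
```

Applied to the ABox \(\{r(a,a)\}\) with no indispensable role assertions, this yields the variable \(a'\) and the matrix \(\{r(a,a), r(a',a')\}\).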

Lemma 15

The qABoxes \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) and \(\mathsf {pre}\hbox {-}\mathsf {approx}^\mathcal {T} _{\mathsf {IRQ}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) are \(\mathsf {IRQ}\)-equivalent w.r.t. the empty TBox \(\emptyset \), and thus also w.r.t. \(\mathcal {T}\).

Since we already know that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) and \(\underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) are \(\mathsf {IRQ}\)-equivalent w.r.t.  \(\mathcal {T}\), this shows that \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) is \(\mathsf {IRQ}\)-equivalent to its pre-approximation w.r.t. \(\mathcal {T}\). Consequently, an ABox \(\mathcal {C}\) is an optimal ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\) iff it is an optimal ABox approximation of the pre-approximation w.r.t. \(\mathcal {T}\).

To test whether \(\mathsf {pre}\hbox {-}\mathsf {approx}^\mathcal {T} _{\mathsf {IRQ}} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\) has an optimal ABox approximation w.r.t. \(\mathcal {T}\), it is sufficient to check whether, for all \(a\in \varSigma _{\mathsf {I}} \), the individual a has a most specific concept in \(\mathcal {B} _a\) w.r.t. \(\mathcal {T}\).

Definition 16

Let \(\mathcal {C} \) be an \(\mathcal {EL}\) ABox, \(\mathcal {T}\) an \(\mathcal {EL}\) TBox, and a an individual name. The \(\mathcal {EL}\) concept C is a most specific concept (msc) of a in \(\mathcal {C} \) w.r.t. \(\mathcal {T}\) if \(\mathcal {C} \models ^{\mathcal {T}}C(a)\) and \(\mathcal {C} \models ^{\mathcal {T}}D(a)\) implies \(C\sqsubseteq ^{\mathcal {T}} D\) for all \(\mathcal {EL}\) concepts D.

The most specific concept need not exist, but if it does, then it is unique up to equivalence w.r.t. \(\mathcal {T}\). The ABox \(\mathcal {C}:= \{r(a,a)\}\) is a simple example where the msc of a does not exist w.r.t. the empty TBox. In fact, \(\mathcal {C} \models (\exists r.)^n\top (a)\) for all \(n\ge 1\), and it is easy to see that no \(\mathcal {EL}\) concept can be subsumed by these infinitely many concepts. Note, however, that \(\mathcal {C} \) has an optimal ABox approximation since it is itself an ABox. In this case, the pre-approximation is \(\{r(a,a)\}\cup \mathcal {B} _a\) where \(\mathcal {B} _a = \{r(a',a')\}\). There is no role assertion \(r(a,a')\) since \(r(a,a)\) is not indispensable. While \(a'\) does not have an msc in \(\mathcal {B} _a\), this is not what we are interested in. We want to know whether a has one, and the answer is “yes” since \(\top \) is an msc of a in \(\mathcal {B} _a\). The problem of testing for the existence of and computing the msc in \(\mathcal {EL}\) was investigated in [20], where the following result is stated.

Proposition 17

([20]). Let \(\mathcal {C} \) be an \(\mathcal {EL}\) ABox, \(\mathcal {T}\) an \(\mathcal {EL}\) TBox, and a an individual name. It can be decided in polynomial time whether a has a most specific concept in \(\mathcal {C} \) w.r.t. \(\mathcal {T}\), and if the msc exists, then it can be computed in exponential time.

The main idea underlying the proof of this proposition (rephrased into the setting of the present paper) is to unravel the \(\mathsf {IQ}\)-saturation of \(\mathcal {C}\) w.r.t. \(\mathcal {T}\) into a concept \(C_k\) for an increasing number k of steps, starting from a. After each step, one tests whether the ABox \(\{C_k(a)\}\) \(\mathsf {IQ}\)-entails \(\exists {\,X}\hbox {.}\,{\mathcal {C}}\) w.r.t. \(\mathcal {T}\), where X consists of the object names in \(\mathcal {C}\) different from a. In case this test succeeds, the concept \(C_k\) is the msc of a in \(\mathcal {C} \) w.r.t. \(\mathcal {T}\). This yields an effective test for the existence of the msc since the following can be shown: there is a polynomial p such that, for the bound \(k(\mathcal {C},\mathcal {T}) := p(|\mathcal {C} |,|\mathcal {T} |)\), the entailment test succeeds after at most \(k(\mathcal {C},\mathcal {T})\) steps iff the msc exists.

For example, for the ABox \(\mathcal {C} ^{(1)} = \{r(a,a)\}\) and the TBox \(\mathcal {T} ^{(1)} = \emptyset \), the 0-step unraveling is \(C_0^{(1)} = \top \), the 1-step unraveling is \(C^{(1)}_1 = \exists r.\top \), the 2-step unraveling is \(C^{(1)}_2 = \exists r.\exists r.\top \), etc. It is easy to see that there is no k such that the entailment test succeeds. Thus, it does not succeed for \(k(\mathcal {C} ^{(1)},\mathcal {T} ^{(1)})\), which shows that a does not have an msc. If instead we consider the ABox \(\mathcal {C} ^{(2)} = \{A(a), r(a,b), s(a,b), r(b,c), s(b,c), B(c)\}\) w.r.t. \(\mathcal {T} ^{(2)} = \emptyset \), then the 0-step unraveling is \(C^{(2)}_0 = A\), the 1-step unraveling is \(C^{(2)}_1 = A \sqcap \exists r.\top \sqcap \exists s.\top \), the 2-step unraveling is \(C^{(2)}_2 = A \sqcap \exists r.(\exists r.B\sqcap \exists s.B)\sqcap \exists s.(\exists r.B\sqcap \exists s.B)\), and the 3-step unraveling is identical to \(C^{(2)}_2\). The entailment test succeeds for \(k = 2\). It is easy to see that, whenever the unraveling becomes stable (which happens if no cycle in the ABox is reachable from a), then the entailment test succeeds. However, a reachable cycle in the ABox need not prevent the existence of the msc. For example, the individual a has the msc \(\exists r.B\) in \(\mathcal {C} ^{(3)} = \{ r(a,b), r(b,b), B(b)\}\) w.r.t. \(\mathcal {T} ^{(3)} = \{B\sqsubseteq \exists {\,r}\hbox {.}\,{B}\}\).
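The unravelings of the first two examples can be reproduced with a small Python sketch. It covers only the case of the empty TBox, where no saturation is needed, and it renders concepts as strings; the representation of ABoxes as sets of tuples and the function names are assumptions made for this illustration, not part of the method of [20].

```python
def _conjuncts(concepts, roles, u, k):
    """Top-level conjuncts of the k-step unraveling starting from u."""
    result = sorted(A for (A, v) in concepts if v == u)
    if k > 0:
        for (r, v, w) in sorted(roles):
            if v == u:
                filler = _render(_conjuncts(concepts, roles, w, k - 1), True)
                result.append("∃" + r + "." + filler)
    return result


def _render(conjuncts, nested=False):
    """Join conjuncts; the empty conjunction is the top concept."""
    if not conjuncts:
        return "⊤"
    text = " ⊓ ".join(conjuncts)
    # Parentheses are only needed around a proper conjunction.
    return "(" + text + ")" if nested and len(conjuncts) > 1 else text


def unravel(concepts, roles, u, k):
    """k-step unraveling of an ABox from u, w.r.t. the empty TBox."""
    return _render(_conjuncts(concepts, roles, u, k))


# The ABox C^(1) = {r(a,a)}: the unraveling never becomes stable.
c1_roles = {("r", "a", "a")}
print(unravel(set(), c1_roles, "a", 2))

# The ABox C^(2): the unraveling is stable from k = 2 onwards.
c2_concepts = {("A", "a"), ("B", "c")}
c2_roles = {("r", "a", "b"), ("s", "a", "b"), ("r", "b", "c"), ("s", "b", "c")}
print(unravel(c2_concepts, c2_roles, "a", 2))
```

The sketch only produces the candidate concepts \(C_k\); the entailment test that decides whether some \(C_k\) is the msc is not implemented here.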

As sketched until now, this method for deciding the existence of the msc does not yield a polynomial-time decision procedure. The reason is that, though the bound \(k(\mathcal {C},\mathcal {T})\) on the number of steps is polynomial, the unraveled concepts \(C_k\) may become exponential even for \(k\le k(\mathcal {C},\mathcal {T})\), as can be seen using an obvious generalization of our example ABox \(\mathcal {C} ^{(2)}\). This problem can be avoided by employing structure-sharing, which can be realized by representing the ABoxes \(\{C_k(a)\}\) by \(\mathsf {IQ}\)-equivalent qABoxes. In our second example, the ABox \(\{C^{(2)}_2(a)\}\) can be represented by the more compact \(\mathsf {IQ}\)-equivalent qABox \(\exists {\,\{x,y\}}\hbox {.}\,{\{A(a), r(a,x), s(a,x), r(x,y), s(x,y), B(y)\}}\) (see the definition of the k-unraveling in [1] for how such an unraveling with structure sharing can be defined in general). It is easy to see that the qABoxes representing the ABoxes \(\{C_k(a)\}\) are of polynomial size. Since \(\mathsf {IQ}\)-entailment between qABoxes is polynomial, this yields the polynomiality result stated in the proposition. Note, however, that the msc obtained this way is still an unraveled concept \(C_k\) without structure sharing, and thus may be of exponential size.
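The growth of the unraveled concepts can be observed in a small experiment. The sketch below assumes that the generalization of \(\mathcal {C} ^{(2)}\) is a chain of objects \(a_0,\ldots ,a_n\) in which consecutive objects are connected by both r and s; this reading, the helper names, and the size measure (the number of existential restrictions) are assumptions made for illustration.

```python
def chain(n):
    """Role assertions r(a_i, a_{i+1}) and s(a_i, a_{i+1}) for i < n."""
    return {
        (role, f"a{i}", f"a{i + 1}")
        for i in range(n)
        for role in ("r", "s")
    }


def unraveling_size(roles, u, k):
    """Number of existential restrictions in the k-step unraveling from u."""
    if k == 0:
        return 0
    return sum(
        1 + unraveling_size(roles, w, k - 1)
        for (r, v, w) in roles
        if v == u
    )


for n in (2, 5, 10):
    roles = chain(n)
    # Without sharing the concept doubles per level; the chain has 2n assertions.
    print(n, unraveling_size(roles, "a0", n), len(roles))
```

Under these assumptions the unraveled concept contains \(2^{n+1}-2\) existential restrictions, whereas a representation with structure sharing needs only the 2n role assertions of the chain.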

The following theorem shows that existence of the optimal ABox approximation can be reduced to existence of the msc (see [4] for the proof).

Theorem 18

Let \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) be a qABox with set of individuals \(\varSigma _{\mathsf {I}}\), let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox, and let \(\mathcal {B} _a\) for all \(a\in \varSigma _{\mathsf {I}} \) be the ABoxes introduced in Definition 14. Then \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) has an optimal ABox approximation w.r.t. \(\mathcal {T}\) iff, for all individuals \(a\in \varSigma _{\mathsf {I}} \), the msc of a in \(\mathcal {B} _a\) w.r.t. \(\mathcal {T}\) exists. If the latter condition is satisfied and \(C_a\) are these most specific concepts, then the following ABox is an optimal ABox approximation of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) w.r.t. \(\mathcal {T}\):

$$ \{\, C_a(a)\, |\, a\in \varSigma _{\mathsf {I}} \} \cup \{r(a,b)\, |\, r(a,b)\text { occurs in } \underline{\mathsf {sat}}^\mathcal {T} _\mathsf {IQ} (\exists {\,X}\hbox {.}\,{\mathcal {A}})\text { where }a, b\in \varSigma _{\mathsf {I}} \,\}. $$

In particular, the existence of an optimal ABox approximation can be tested in polynomial time and such an optimal approximation can be computed in exponential time if it exists.

5 Computing Optimal ABox Repairs

We can now reap the benefits from the results shown in the previous two sections. Given a qABox \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\), an \(\mathcal {EL}\) TBox \(\mathcal {T}\), and an \(\mathcal {EL}\) repair request \(\mathcal {R}\), we can compute the set \(\mathfrak {R}\) of optimal \(\mathsf {IRQ}\)-repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) in exponential time. More precisely, by Proposition 5 this set contains at most exponentially many repairs, each of which has at most exponential size. Theorem 11 then says that the set of all optimal ABox repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) (up to equivalence) consists of the optimal ABox approximations w.r.t. \(\mathcal {T}\) of those elements of \(\mathfrak {R}\) for which such an optimal approximation exists. Finally, Theorem 18 shows how to decide existence of such optimal approximations and how to compute them if they exist. Since the elements of \(\mathfrak {R}\) are already of exponential size, existence can be tested in exponential time and the size of the computed approximations is at most double-exponential.

Theorem 19

Let \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) be a qABox, \(\mathcal {T}\) an \(\mathcal {EL}\)-TBox, and \(\mathcal {R} \) an \(\mathcal {EL}\) repair request. Then the existence of an optimal ABox repair of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T} \) can be decided in exponential time, and the set of all such repairs can be computed in double-exponential time. This set contains at most exponentially many elements, each of which has at most double-exponential size.

If the given qABox does not have an optimal repair or if we are looking for a repair not covered by an optimal one, our approach can also be used to compute non-optimal ABox repairs. In fact, consider an optimal \(\mathsf {IRQ}\)-repair that does not have an optimal ABox approximation. Then there are individuals a whose msc in \(\mathcal {B} _a\) does not exist. Following [17], we can then use the role-depth bounded msc instead, which is basically obtained by unraveling up to a fixed bound k on the role-depth (i.e., the maximal nesting of existential restrictions). This way, we can produce a set of (possibly) non-optimal ABox repairs, which covers all ABox repairs whose concept assertions satisfy this bound on the role depth.

There are also cases where the existence of the optimal ABox approximation of the optimal \(\mathsf {IRQ}\)-repairs is guaranteed. In fact, if the qABox is acyclic and the TBox is cycle-restricted (i.e., there is no concept C such that \(C\sqsubseteq ^{\mathcal {T}}\exists r_1.\cdots \exists r_k.C\), as defined in [3]), then the optimal \(\mathsf {IRQ}\)-repairs are acyclic, which implies that the ABoxes \(\mathcal {B} _a\) in the pre-approximations are also acyclic. Consequently, all optimal \(\mathsf {IRQ}\)-repairs have an optimal ABox approximation. The following corollary is an easy consequence of this observation.

Corollary 20

Let \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) be an acyclic qABox, \(\mathcal {T}\) a cycle-restricted \(\mathcal {EL}\)-TBox, and \(\mathcal {R} \) an \(\mathcal {EL}\) repair request. Then the set of optimal ABox repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T}\) is non-empty, and it \(\mathsf {IRQ}\)-covers all ABox repairs of \(\exists {\,X}\hbox {.}\,{\mathcal {A}}\) for \(\mathcal {R} \) w.r.t. \(\mathcal {T}\).

6 Conclusion

Traditional repair approaches for DL-based ontologies, which compute maximal subsets of the ontology that do not have the unwanted consequences, are syntax-dependent and thus may remove too many consequences. Recently developed syntax-independent approaches for repairing DL ABoxes [3, 5, 7] compute optimal repairs that do not lose consequences unnecessarily, but they have the disadvantage that they produce quantified ABoxes rather than traditional ABoxes. In this paper we show how to overcome this problem by developing methods for computing optimal repairs that are traditional ABoxes. These methods are based on the computation of optimal \(\mathsf {IRQ}\)-repairs, by adapting the approaches in [3] for computing optimal \(\mathsf {IQ}\)-repairs, and then optimally approximating these qABoxes with ABoxes.

A perceived disadvantage of our approach could be that optimal ABox repairs need not exist, and even if they do, they need not cover all ABox repairs. However, by Corollary 20 this problem does not occur if the ABox is acyclic and the TBox is cycle-restricted. To see how often this corollary applies in practice, we checked the 80 large ontologies used in the experiments in [3]: 62 have cycle-restricted TBoxes, and of those only 7 have cyclic ABoxes. Thus, our Corollary 20 applies to 55 of the 80 ontologies considered in [3].

Another disadvantage could be the potentially double-exponential size of optimal ABox repairs. However, the first exponential comes from the computation of the optimal \(\mathsf {IQ}\)-repairs, and the experiments in [3] indicate that this exponential blow-up does not occur in practice if the optimized approach for computing \(\mathsf {IQ}\)-repairs is used. We do not yet have experimental results regarding the possible exponential blow-up due to the computation of ABox approximations, but would be surprised if this happened often in practice.

What is called “repair” in the DL community is closely related to what is called “contraction” in the Belief Change community. For classical repairs and also for the gentle repairs of [6], this connection was investigated in [14]. It would be interesting to see whether this investigation can be extended to our optimal ABox repairs. The original intention underlying our repair approach is that the ontology engineer chooses one of the computed optimal repairs as the new, repaired ABox. Alternatively, one could try to adapt the different repair semantics employed in inconsistency-tolerant query answering [9, 12] from classical repairs to our optimal repairs.