1 Introduction

An inconsistent database is a database that fails to satisfy one or more integrity constraints that the data are hand are supposed to obey. Inconsistency in databases arises in a variety of applications, including data integration and data warehousing, where the task is to bring together data distributed over different sources that may obey mutually incompatible constraints. In practice, inconsistency is handled mainly via data cleaning, which means that the inconsistent database is transformed, through deletions or additions, to a consistent one that is then used for query answering or for warehousing purposes. This process, however, forces arbitrary choices to be made since, in general, there is a multitude of ways in which an inconsistent databases can be transformed to a consistent one. Arenas, Bertossi and Chomicki [4] introduced a principled approach to the management of inconsistency by formulating the notions of a repair of an inconsistent database and of consistent query answering. Intuitively, a repair of an inconsistent database I is a consistent database J that differs from I in a “minimal” way. Furthermore, the consistent answers of a query q on an inconsistent database I are defined to be the intersection \(\bigcap \{q(J): J ~\text {is a repair of} \ I\}\). Thus, the inconsistencies in the database are kept, but are handled at query time by considering all repairs and returning the tuples that are guaranteed to be in the result of the query on every repair.

Two algorithmic problems concerning repairs of inconsistent databases naturally arise. The first is, of course, the problem of computing the consistent answers of a query over an inconsistent database. The second is the repair checking problem, which can be thought of as the model-checking problem for repairs: given two database instances I and J, is J a repair of I? Since the publication of [4] in 1999, these two problems have been extensively explored for different types of repairs (set-based repairs, cardinality-based repairs, attribute-based repairs) and for different types of constraints. As regards types of constraints, the earlier work on repair checking and consistent query answering focused on functional dependencies, inclusion dependencies, and denial constraints (see the overviews [7, 13]). More recently, broader classes of constraints, such as tuple-generating dependencies (tgds) and equality-generating dependencies (egds), have also been considered in the study of repair checking and consistent query answering. As is well known, these classes of constraints were originally investigated in the context of classical dependency theory, but in the past decade have found numerous uses in the context of data integration and data exchange. For some of these broader classes of constraints, the repair-checking problem has been studied in [2] and [24], and the consistent query answering problem in [32] and [3].

In this paper, we systematically explore the data complexity of consistent query answering for sets of tgds and egds. This means that for every fixed set Σ of tgds and egds and for every fixed conjunctive query q, we consider the complexity of the following algorithmic problem: given a database instance I, compute the consistent answers of q on I w.r.t. Σ. Concerning the types of repairs, we consider set-based repairs, that is, subset-repairs, superset-repairs, and ⊕-repairs (symmetric difference repairs).

Our main results about the data complexity of consistent query answering, together with previously known results, are summarized in Table 1. In this table, by an entry such as “C/C-comp.” we mean that for every fixed set of dependencies in the class considered and for every fixed conjunctive query, the problem is in C and that there is a set of dependencies in the class and a conjunctive query for which the problem is C-complete.

Table 1 Data complexity of the repair checking problem and the consistent query answering problem for conjunctive queries

One finding of our investigation is that the class of LAV (local-as-view) tgds exhibits tractable algorithmic behavior as regards the data complexity of consistent query answering with respect to all three types of set-based repairs considered here; this extends earlier results about inclusion dependencies and subset-repairs [14]. Another finding is that there is a set (in fact, a singleton set) of GAV (global-as-view) tgds for which the data complexity of the consistent query answering with respect to both subset-repairs and ⊕-repairs is coNP-complete; this strengthens a coNP-completeness result for GAV tgds and functional dependencies with respect to ⊕-constraints in [32]. We also show that there are sets of weakly acyclic sets of tgds for which the data complexity of the consistent query answering problem is \({{\Pi }_{2}^{p}}\)-complete with respect to both subset-repairs and ⊕-repairs; earlier, \({{\Pi }_{2}^{p}}\)-completeness results had been obtained for the data complexity of consistent query answering for sets of functional dependencies and universal constraints [32] with respect to ⊕-repairs. Finally, we show that the assumption of weak acyclicity is of the essence for the decidability of the consistent query answering problem. Specifically, we show that there is a fixed set of tgds and a fixed conjunctive query for which the consistent query answering problem is undecidable with respect to ⊕-repairs; furthermore, a similar undecidability result holds for superset-repairs. Previously, it was known that the consistent query answering problem was undecidable in combined complexity for conjunctive queries and for sets of inclusion dependencies and functional dependencies with respect to superset-repairs [31]. It was also known that the consistent query answering problem was undecidable in combined complexity for unions of conjunctive queries and for sets of inclusion dependencies and functional dependencies with respect to ⊕-repairs [11]. Finally, it was known that there is a fixed set of universal constraints and a fixed universal query for which consistent query answering is undecidable with respect to ⊕-repairs [3].

In addition to the data complexity of consistent query answering, we also addressed the following question: For which types of dependencies is it the case that, given a database instance, there is an efficient way to compute a “representative” and “useful” repair?

We formalized and answered this question by introducing the notion of a universal repair and the notion of an n-universal repair, where n is a positive integer. These notions are analogous to and, in fact, are motivated from the notion of a universal solution in data exchange [17]. Furthermore, the notion of universal repair is closely related to the notion of nucleus in [35], since a universal repair is nucleus that is also a repair. Informally, a universal repair is a repair such that the consistent answers of an arbitrary conjunctive query can be computed by essentially evaluating the query on the universal repair. Similarly, an n-universal repair has the same property but only for conjunctive queries with at most n atoms. We study the existence of universal repairs and of n-universal repairs, as well as structural properties of such repairs and the complexity of computing them. We show that, if Σ is a set of LAV tgds, then every database instance has a unique universal ⊕-repair that is also a universal subset-repair and can be computed in polynomial time. Furthermore, while not every database instance has a universal superset-repair, we show that for every n≥1, an n-universal superset-repair exists and can be computed in polynomial time. If Σ is a weakly acyclic set of tgds and egds, things are the other way around: every database instance has a universal superset-repair that can be computed in polynomial time, but not every instance has a universal, or even a 1-universal, ⊕-repair; furthermore, similar limitations hold true for subset-repairs.

An extended abstract of the present paper has appeared earlier in [33]. The present paper augments it by providing detailed proofs and new material. In particular, Proposition 3.8 and Theorem 4.1 are new results obtained since the publication of [33].

2 Basic Notions

A schema R is a finite sequence (R 1,…,R k ) of relation symbols, each of a fixed arity. An instance I over R is a sequence \(({R^{I}_{1}}, {\ldots } ,{R^{I}_{k}})\), where each \({R^{I}_{i}}\) is a relation of the same arity as R i . For notational simplicity, we shall write R i to denote both the relation symbol and the relation \({R^{I}_{i}}\) that interprets it. A fact of an instance I (over R) is an expression R i (v 1,…,v m ), where R i is one of the relations of I and v 1,…,v m are values such that (v 1,…,v m )∈R i . Every instance can be identified with the set of its facts. The size of an instance is the number of facts it has. The active domain of an instance I, denoted by adom(I), is the set of all values occurring in the facts of I. We assume that all instances I considered are finite, which means that every relation R i of I is finite, for 1≤ik.

Definition 2.1 (Dependencies)

A tuple-generating dependency (tgd) is a first-order sentence of the form

$$\forall \mathbf{x} (\phi(\mathbf{x}) \to\exists \mathbf{y}~\psi(\mathbf{x},\mathbf{y})), $$

where ϕ, ψ are conjunctions of atomic formulas, x=(x 1,…,x n ) and y=(y 1,…,y m ) are tuples of variables, and every universally quantified variable x i occurs in ϕ.

An equality-generating dependency (egd) is a first-order sentence of the form

$$\forall \mathbf{x} (\phi(\mathbf{x})\to x_{k}=x_{\ell}), $$

where ϕ is a conjunction of atomic formulas, x=(x 1,…,x n ) is a tuple of variables, 1≤k,n, and each universally quantified variable x i occurs in ϕ.

In what follows, by the term dependency, we will mean a tgd or an egd. Also, by a set of dependencies, we will mean a finite set of dependencies.

Definition 2.2

A local-as-view or, simply, LAV tgd is a tgd ∀x(ϕ(x)→∃y ψ(x,y)) in which ϕ is a single atomic formula.

A global-as-view or, simply, GAV tgd is a tgd ∀x(ϕ(x)→ψ(x )) in which ψ is a single atomic formula such that the variables in x are among the variables of x.

For example, the copy tgd ∀xy(E(x,y)→F(x,y)) is both a LAV tgd and a GAV tgd. In contrast,

$$\forall x\forall y (E(x,y)\rightarrow \exists z(F(x,z)\land F(z,y))) $$

is a LAV tgd that is not a GAV tgd, while

$$\forall x\forall y \forall z(E(x,z)\land E(z,y)\rightarrow F(x,y))$$

is a GAV tgd that is not a LAV tgd. Note that every inclusion dependency is a LAV tgd, but not vice-versa. Furthermore, every tgd with no existential quantifiers in its right-hand side (such tgds are called full) is logically equivalent to a set of GAV tgds. For example, the full tgd ∀xyz(E(x,z)∧E(z,y)→F(x,y)∧P(x)) is logically equivalent to the set consisting of the GAV tgds ∀xyz(E(x,z)∧E(z,y)→F(x,y)) and ∀xyz(E(x,z)∧E(z,y)→P(x)).

From now on and for the sake of readability, we will often drop the universal quantifiers when writing dependencies.

As mentioned in the Introduction, tgds play an important role in data exchange and data integration, where they are used to specify the relationship between a source (local) schema and a target (global) schema or to express constraints in a target (global) schema. Moreover, it is known that weakly acyclic sets of tgds have good algorithmic properties as regards data exchange that, in general, are not possessed by arbitrary sets of tgds.

Definition 2.3 (Weak Acyclicity [15, 17])

Let Σ be a set of tgds and egds over a schema S.

  • The dependency graph of Σ is the following directed graph.

    The nodes are the pairs (R,i), where RS is a relation of some arity k, and 1≤ik. We call such pairs positions.

    There is an edge from position (R,i) to position (S,j) if Σ contains a tgd ∀x(ϕ→∃y ψ) such that some variable from x occurs in position (R,i) in ϕ and in position (S,j) in ψ.

    There is a special edge from position (R,i) to position (S,j) if Σ contains a tgd ∀x(ϕ→∃y ψ) such that (i) some variable from x occurs in position (R,i) in ϕ and also occurs in ψ, and (ii) some variable from y occurs in position (S,j) in ψ.

  • We say that Σ is weakly acyclic if the dependency graph contains no cycle going through a special edge.

Every set of GAV tgds is weakly acyclic, since the dependency graph has no special edges. It is also easy to see that every acyclic set of inclusion dependencies is weakly acyclic. However, the set Σ={D(x,y)→M(y),M(y)→∃x D(x,y)} is a cyclic, yet weakly acyclic set of inclusion dependencies. Finally, the singleton set consisting of the inclusion dependency R(x,y)→∃z R(y,z) is not weakly acyclic because the dependency graph contains a self-loop on position R.2.

Intuitively, a repair of an instance I with respect to a set of dependencies Σ is an instance J that satisfies Σ and “differs minimally” from I. Here, we will consider three different set-based types of repairs. If I and J are instances, we will write IJ to denote the symmetric difference (IJ)∪(JI) of I and J, that is, IJ is the set of all facts that either belong to I and not to J, or belong to J and not to I.

Definition 2.4 (Repairs)

Let I,J be instances, and Σ a set of dependencies.

  • We say that J is an ⊕-repair of I w.r.t. Σ if J⊧Σ and there is no instance J such that J ⊧Σ and \(I\oplus J^{\prime } \subsetneq I\oplus J\).

  • We say that J is a subset-repair of I w.r.t. Σ if J is both a subset of I and a ⊕-repair of I w.r.t. Σ. Equivalently, J is a subset-repair of I w.r.t. Σ if JI, J⊧Σ and there is no instance J I such that J ⊧Σ and \(J \subsetneq J^{\prime }\).

  • We say that J is a superset-repair of I w.r.t Σ if J is both a superset of I and a ⊕-repair of I w.r.t. Σ. Equivalently, J is a superset-repair of I w.r.t. Σ if IJ, J⊧Σ, and there is no instance J such that IJ , J ⊧Σ and \(J^{\prime } \subsetneq J\).

In what follows and when the set Σ of dependencies is understood from the context, we will often drop the reference to Σ and, instead, write simply that J is an ⊕-repair of I, and similarly for subset and superset repairs.

For example, consider the key constraint R(x,y)∧R(x,z)→y=z and the instance I={R(a,b),R(a,c),R(b,c)}. Then I has exactly two subset-repairs J 1 and J 2 that are also its only ⊕-repairs, namely, J 1={R(a,b),R(b,c)} and J 2={R(a,c),R(b,c)}.

Next, let Σ={P(x)→Q(x)} and I={P(a),P(b)}. Then I has exactly four ⊕-repairs, namely, J 1=, J 2={P(a),Q(a)}, J 3={P(b),Q(b)}, and J 4={P(a),P(b),Q(a),Q(b)}. Note that J 1 is the only subset-repair of I, while J 4 is the only superset-repair of I.

For a different example, let Σ={R(x,y)→∃z R(y,z)} and I={R(1,2)}. It is easy to see that, for every n≥1 and every k<n, the “loop” instance

$$J_{n,k} = \{ R(i,i+1)\mid 1\leq i < n\} \cup \{R(n,k)\} $$

is a superset-repair (hence, also a ⊕-repair) of I w.r.t. Σ. Thus, I has infinitely many, and arbitrarily large, superset-repairs w.r.t. Σ.

Next, we give the definitions of the main algorithmic problems in the study of repairs.

Definition 2.5 (Repair Checking)

Let Σ be a set of dependencies, and let ⋆∈{⊕,subset,superset}. The ⋆-repair checking problem w.r.t. Σ asks: given instances I and I , check whether I is a ⋆-repair of I w.r.t. Σ.

Definition 2.6 (Consistent Answers)

Let Σ be a set of dependencies, q a conjunctive query, I an instance, and ⋆∈{⊕,subset,superset}.

  • The ⋆-consistent answers of q on I w.r.t. Σ, denoted by ⋆-Con(q,I, Σ), is the intersection

    $$\bigcap \{ q(J) \mid J \text{ is a}\ \star\text{-repair of } I \text{ w.r.t. } {\Sigma} \}. $$
  • The ⋆-consistent query answering problem of q w.r.t. Σ asks: given an instance I, compute the ⋆-consistent answer of q on I with respect to Σ.

Several remarks are in order now. First, we wish to emphasize that only finite instances and only finite repairs are considered in this paper. It is known that there exist sets of dependencies Σ, conjunctive queries q, and instances I such that the consistent answer to q on I w.r.t. Σ would yield a different result if infinite repairs would be considered as well (see [31]). Second, if Σ consists of tgds and egds, there is always a subset-repair and, hence, also an ⊕-repair. If Σ consists of tgds only, there is also always a superset-repair.

Proposition 2.7

If Σ is a set dependencies, then every instance has a subset-repair and, hence, also an ⊕-repair w.r.t. Σ. Furthermore, Σ is a set of tgds, then every instance has a superset-repair w.r.t. Σ.

Proof

The empty instance satisfies Σ, hence, for every instance I that fails to satisfy Σ, there is a maximal (w.r.t. ⊆) instance J such that JI and J⊧Σ. Now let Σ consist of tgds only, and let I be an instance. Take J to be the instance that has the same active domain of I and in which each relation symbol denotes the complete relation (of appropriate arity) over the given active domain. Then IJ and J⊧Σ. Hence, there is a minimal (w.r.t. ⊆) instance J J such that IJ and J ⊧Σ. By construction, J is a superset-repair of I w.r.t. Σ. □

Finally, it is easy to see that, for all ⋆∈{⊕,subset,superset}, it is the case that the ⋆-repair checking problem is in coNP. Moreover, the subset-consistent query answering is always in \({{\Pi }^{p}_{2}}\). For example, to check that J is not an ⊕-repair of I, one has to guess an instance J of size at most |I|+|J| and verify that J ⊧Σ and IJ IJ.

Note that the notion of superset-consistent answers is closely related to the notion of certain answers in incomplete information and in data exchange. To make the connection with data exchange precise, recall from [17] that a schema mapping (also called data exchange setting) is a 4-tuple \(\mathcal {M}=(\mathbf {S}, \mathbf {T}, {\Sigma }_{st}, {\Sigma }_{t})\), where S and T are disjoint schemas called the source schema and the target schema, Σ s t is a collection of source-to-target tgds, and Σ t is a set of target dependencies, that is, dependencies over the target schema. Recall also that an instance J over the target schema is said to be a solution for an instance I over the source schema if the pair (I,J) satisfies all constraints in Σ s t ∪Σ t . Finally, recall that, if \(\mathcal {M}\) is a schema mapping, q is a query over the target schema of \(\mathcal {M}\), and I is an instance over the source schema of \(\mathcal {M}\), then the certain answers of q on I w.r.t. \(\mathcal {M}\), denoted by \(\text {certain}(q,I,\mathcal {M})\), are defined as follows:

$$\text{certain}(q,I,\mathcal{M}) = \bigcap\{q(J)|\ J\ \text{is solution for}\ I\ \text{w.r.t.}\ \mathcal{M}\}.$$

The connection between superset-consistent answers and certain answers is given by the next proposition.

Proposition 2.8

Let Σ is a set of dependencies and let \(\mathcal {M}_{\Sigma }\) be the schema mapping defined as follows:

  1. (i)

    The source-to-target tgds of \(\mathcal {M}_{\Sigma }\) are copy tgds of the form R (x)→R(x), where R is a relation symbol of the schema of Σ and R is a new relation symbol whose arity is that of R.

  2. (ii)

    The target tgds and egds of \(\mathcal {M}_{\Sigma }\) are the tgds and egds in Σ.

Then for every conjunctive query q and every instance I,

$$\text{superset-Con}(q,I,{\Sigma}) = \text{certain}(q,I^{\prime},\mathcal{M}_{\Sigma}),</p><p class="noindent">$$

where I is the copy of the instance I obtained by changing the name of every relation R of I to R .

The preceding proposition is proved by first observing that every superset repair of I w.r.t. Σ is a solution of I w.r.t. \(\mathcal {M}_{\Sigma }\) and then combining the monotonicity of conjunctive queries with the fact that every solution J for I w.r.t. \(\mathcal {M}_{\Sigma }\) contains a superset-repair of I. We will make use of this connection in some of our complexity results for the superset-consistent query answering problem.

Finally, we recall the definition of data complexity from [34] adapted to our context.

Definition 2.9

Let \(\mathcal L\) be a class of dependencies, let ⋆∈{⊕,subset,superset}, and let \(\mathcal C\) be a complexity class.

  • We say that the data complexity of ⋆-consistent query answering w.r.t. \(\mathcal L\) is in \(\mathcal C\) if for every finite subset Σ of \(\mathcal L\) and for every conjunctive query q, we have that the ⋆-consistent query answering problem of q w.r.t. Σ is in \(\mathcal C\).

  • We say that the data complexity of ⋆-consistent query answering w.r.t. \(\mathcal L\) is \(\mathcal C\) -complete if it is in \(\mathcal C\) and there are a finite subset Σ of \(\mathcal L\) and a conjunctive query q , such that the ⋆-consistent query answering problem of q w.r.t. Σ is \(\mathcal C\)-complete.

3 Universal Repairs

By analogy to the notion of a universal solution in data exchange [17], we will now introduce the notion of a universal repair. Since it will turn out that universal repairs often do not exist, we also introduce a weaker notion, namely, that of a n-universal repair, n ≥ 1.

Definition 3.1

Let Σ be a set of dependencies, I an instance, and ⋆∈{⊕,subset,superset}.

  • A universal ⋆-repair of I w.r.t. Σ is a ⋆-repair J of I w.r.t. Σ such that if q is a conjunctive query, then

    $$\star-\text{Con}(q,I,{\Sigma}) = q(J)_{\downarrow},$$

    where q(J) is the set of tuples in q(J) containing only values from the active domain adom(I) of I.

  • For n≥1, an n-universal ⋆-repair of I w.r.t. Σ is a ⋆-repair J of I w.r.t. Σ such that if q is conjunctive query with at most n atoms, then

    $$\star-\text{Con}(q,I,{\Sigma}) = q(J)_{\downarrow}.$$

Clearly, a universal repair is also a n-universal repair, for every n≥1. As we shall soon see, however, the converse is not always true. The next proposition, which follows immediately from the definitions and the fact that the data complexity of conjunctive queries is in Ptime, justifies the introduction of the notions of universal repairs and n-universal repairs.

Proposition 3.2

Let Σ be a set of dependencies and let ⋆∈{⊕,subset, superset}.

  • Assume that n is a positive integer. If there is a polynomial-time algorithm that, given an instance I, it returns an n-universal ⋆-repair of I w.r.t. Σ, then the data complexity of the ⋆-consistent query answering problem for conjunctive queries with at most n atoms is in Ptime.

  • In particular, if there is a polynomial-time algorithm that, given an instance I, it returns a universal ⋆-repair for I w.r.t. Σ, then the data complexity of the ⋆-consistent query answering for conjunctive queries is in Ptime.

It is not hard to see that the notion of a universal repair can be characterized in terms of homomorphisms.

Proposition 3.3

Let Σ be a set of dependencies, let ⋆∈{⊕, subset, superset}, and let I and J be two instances. Then the following statements are equivalent.

  1. (1)

    J is a universal ⋆-repair of I w.r.t. Σ.

  2. (2)

    J is a ⋆-repair of I w.r.t. Σ and for every ⋆-repair J of I w.r.t. Σ, there is a homomorphism h:J→J such that h(a)=a, for every a in adom(I)∩adom(J).

Proof

The direction (1)⇒(2) is proved by applying the definition of universal ⋆-repair to the variant of canonical conjunctive query of J in which the elements in the intersection adom(I)∩adom(J) are viewed as free variables and the elements in adom(J)∖adom(I) are existentially quantified. The direction (2)⇒(1) follows from the preservation of conjunctive queries under homomorphisms. □

Note that a similar characterization of n-universal repairs in terms of homomorphisms does not hold; intuitively, the reason is that, in general, an n-universal repair may contain more than n facts, so the proof of direction (1)⇒(2) does not go through.

The notion of a universal repair is closely related to that of a nucleus, which was introduced and studied in the context of update repairs by J. Wijsen in [35]. The notion of a nucleus can be adapted to set-based repairs as follows.

Let Σ be a set of dependencies, ⋆∈{⊕, subset, superset}, and I an instance. A ⋆-nucleus for I w.r.t. Σ is an instance J such that if q is a conjunctive query, then ⋆-Con(q,I,Σ)=q(J) . Thus, a ⋆-universal repair for I w.r.t. Σ is ⋆-nucleus for I w.r.t. Σ that also happens to be a ⋆-repair of I w.r.t. Σ. It should be pointed out that, if an instance I has finitely many ⋆-repairs w.r.t. Σ, then a ⋆-nucleus for I w.r.t. Σ always exists. To see this, let J be the direct product of all ⋆-repairs of I w.r.t. Σ. From the definition of the direct product and the fact that the direct product has homomorphisms to each of its factors, it is easy to verify that for every Boolean conjunctive query q, we have that ⋆-Con(q,I,Σ)=q(J). This does not exactly extend to non-Boolean conjunctive queries, but comes quite close. For example, if q is a unary conjunctive query and a is an element in the active domain adom(I), then a∈⋆-Con(q,I,Σ) if and only if the tuple (a,…,a)∈q(J). Therefore, one can obtain a ⋆-nucleus J for I from J by taking an isomorphic copy of J in which every tuple in J of the form (a,…,a) with a∈adom(I) is identified with a, while every tuple that is not of this form is identified with a unique element not in adom(I). This construction is also described in [35, Definition 4.4]. See also the proof of Proposition 3.8 below, where a similar construction is used.

Note that all ⋆-nuclei of I w.r.t. Σ are homomorphically equivalent via homomorphisms that are the identity on elements of the active domain adom(I) of I. However, it is possible that every ⋆-nucleus of I w.r.t. Σ is of size exponentially bigger than the size of I.

We now give several examples illustrating universal repairs and n-universal repairs.

Example 1

Let Σ={R(x,y)→∃z R(y,z)}.

First, consider the instance I={R(1,2),R(2,2),R(1,3),R(3,4)}. Then the instance J={R(1,2),R(2,2)} is a universal subset-repair of I w.r.t. Σ, because it is the only subset-repair of I.

Next, consider the instance I={R(1,2)}. Then I has no universal superset-repair; indeed, this is so because every superset-repair of I must contain a cycle of some length n 0 (recall that all repairs must be finite), and therefore cannot be universal since, as seen earlier, not every superset-repair contains a cycle of the same length n 0. However, for every n≥1, there are n-universal superset-repairs of I. Specifically, one such n-universal repair is the “loop” instance

$$J_{4n,2n}=\{R(i,i+1)| 1\leq i < 4n\}\cup \{R(4n,2n)\}. $$

Example 2

Let Σ={P(x)∧Q(x)→R(x)} and I={P(a),Q(a)}. The instance J={P(a),Q(a),R(a)} is a universal superset-repair of I w.r.t. Σ, because it is the only superset-repair of I. However, I has no universal subset-repair, since its subset-repairs are {P(a)} and {Q(a)}. In fact, I does not even have a 1-universal subset-repair, since the queries P(x) and Q(x) return different values on the two subset-repairs of I.

Example 3

Let Σ={P(x)∧Q(y)→x=y} and I={P(a),Q(b)}. First, I has no superset-repair w.r.t. Σ, hence it has no 1-universal superset-repair. Furthermore, it has neither a 1-universal subset-repair, nor a 1-universal ⊕-repair w.r.t. Σ.

The next proposition describes some basic properties of the different types of universal repairs, and the relationship between them. When we consider an isomorphism between repairs J,J of an instance I, we assume that the isomorphism h in question is constant on the active domain of I, that is, h(a)=a for all a∈adom(I).

Proposition 3.7

Let Σ be a set of dependencies.

  1. 1.

    Every instance has at most one universal superset-repair, up to isomorphism.

  2. 2.

    An instance has a universal subset-repair if and only if it has exactly one subset-repair. Consequently, every instance has at most one universal subset-repair.

  3. 3.

    Every universal ⊕-repair of an instance I is a universal subset-repair of I. Consequently, every instance has at most one universal ⊕-repair. In contrast, a universal subset-repair need not be a universal ⊕-repair.

  4. 4.

    For every instance I and every instance J, the following statements are equivalent:

    1. (1)

      J is a universal ⊕-repair of I.

    2. (2)

      J is a universal subset-repair of I and J⊆J , for every ⊕-repair J of I.

    3. (3)

      J is a subset repair of I and J⊆J , for every ⊕-repair J of I.

Proof

  1. 1.

    Let J 1 and J 2 be universal superset-repairs of I. In particular, the active domain of I is contained in both the active domain adom(J 1) of J 1 and the active domain adom(J 2) of J 2. Let q 1 and q 2 be the canonical conjunctive queries of J 1 and J 2, where, as in the proof of Proposition 3.3, the values in the active domain adom(I) of I are treated as free variables of the queries, and the values in adom(J i )∖adom(I), i=1,2, are treated as existentially quantified variables of the queries. By universality, q 1(J 1) =q 1(J 2) and q 2(J 1) =q 2(J 2) . This implies that there are homomorphisms h 1:J 1J 2 and h 2:J 2J 1 such that h 1(a)=a and h 2(a)=a for all values a in the active domain of I. Consequently, J 1 and J 2 are homomorphically equivalent, where the values from the active domain of I are treated as constants. Furthermore, since J 1 and J 2 are superset-repairs of I, we claim that each of them is a core (with the values from the active domain of I treated as constants). This means that there is no proper retraction of J i to itself, i.e., we have to show that, for i=1,2, there is no homomorphism h:J i J i such that h is the identity on its range and h(J i ) (as a set of facts) is properly contained in J i . Suppose that h:J i J i is a homomorphism that is the identity on its range. Then we have that Ih(J i )⊆J i . Moreover, since Σ consists of tgds and egds, J i satisfies Σ, and h is the identity on its range, we have that h(J i )⊧Σ (this means that tgds and egds are preserved under retractions - see also cf. [23]). Since J i is a superset repair of I, it follows that h(J i )=J i . This completes the proof that J 1 and J 2 are cores. But it is a well known fact that any two homomorphically equivalent core instances are isomorphic (cf. [18, 25]).

  2. 2.

    If an instance I has exactly one subset-repair J, then, trivially, J is a universal subset-repair. If, on the other hand, there are (at least) two different subset-repairs J 1 and J 2, then, by the definition of subset repairs, neither is a subset of the other, and hence, neither can be universal. This is so because each J i , i=1,2, contains a fact, say R(a), that is not included in all subset-repairs; therefore, J i does not correctly compute the consistent answers for the query R(x).

  3. 3.

    Let J be a universal ⊕-repair of I w.r.t. Σ. Let J be a subset-repair of I (recall that every instance has a subset-repair). Since J is a universal ⊕-repair, every fact R(a) of I that belongs to J also belongs also to J (for, if not, then evaluating the query q(x)=R(x) in J would not yield the consistent answers). Consequently, J IJI, and therefore, by the definition of ⊕-repairs, J=J . Hence J is a subset-repair.

    Not every universal subset-repair is a universal ⊕-repair. To see this, consider the set Σ={P(x)→∃y Q(y),P(x)∧Q(x)→R(x)} and the instances I={P(a),Q(a)} and J={Q(a)}. Then J is a universal subset-repair of I, but not a universal ⊕-repair of I, as may be seen by considering the ⊕-repair J ={P(a),Q(b)} and the query q(x)=Q(x). In fact, this shows that J is not even a 1-universal ⊕-repair of I.

  4. 4.

    For the implication (1)⇒(2), if J is a universal ⊕-repair of I, then, as shown in Part 3, J is a universal subset-repair of I. Moreover, by ⊕-universality, J is contained in every ⊕-repair of I. The implication (2)⇒(3) is obvious. Finally, for the implication (3)⇒(1), assume that J is a subset repair of I that is contained in every ⊕-repair of I, and let q be a conjunctive query. Since every subset repair is also a ⊕-repair, we have that ⊕-Con(q,I,Σ)⊆q(J) =q(J) (the equality holds because J is a subinstance of I). Moreover, if J is an arbitrary ⊕-repair of I, then we have that q(J)⊆q(J ), since JJ and conjunctive queries are monotone. Thus, q(J)⊆⊕-Con(q,I,Σ) and so ⊕-Con(q,I,Σ)=q(J) =q(J).

Note that Part 2 and Part 3 of Proposition 3.7 hold with “n-universal” in place of “universal”, where n is an arbitrary positive integer. The reason is that universality was applied only to queries with a single atom. In particular, the counterexample in the proof of Part 3 shows that a universal subset-repair need not even be a 1-universal ⊕-repair. Note also that the notion of a universal superset-repair is intimately linked to the notion of a core universal solution in data exchange [18], as seen in the proof of Proposition 3.7. We will further exploit this connection when we consider superset-repairs for weakly acyclic sets of tgds and egds.

Proposition 3.8

Let Σ be a set of dependencies. Every instance that has a superset-repair w.r.t. Σ has an n-universal superset-repair w.r.t. Σ, for each positive integer n.Footnote 1 Moreover, there is a set of tgds Σ for which there is no algorithm that, given an instance I, computes a 1-universal superset repair of I.

Proof

Let Σ be a set of dependencies, let I be an instance, and let n be a natural number. Let Q n the set of all pairs 〈q,a〉 where q is a conjunctive query with at most n atoms, and a∈adom(I)k (where k≥0 is the arity of q) be such that a∉superset-Con(q,I,Σ). For each 〈q,a〉∈Q n , let J q,a be a superset-repair of I w.r.t. Σ such that aq(J q,a). Let J × be the direct product π{J q,a∣〈q,a〉∈Q n } (cf. [16]), and let J be an isomorphic copy of J × in which each “diagonal element”, i.e., each element of the form (a,a,…,a) is replaced by a.

Claim 1:

IJ and J⊧Σ.

First, we prove that I is a subset of J. Every fact R(a 1,…,a m ) of I must belong to all superset-repairs of I, from which it follows, by the definition of direct products, that R((a 1,…,a 1),…,(a m ,…,a m )) belongs to J ×, and therefore R(a 1,…,a m ) belongs to J.

Next we show that Σ is true in J. This follows from the known fact that tgds and egds are preserved under taking direct products [16]: whenever I 1,…,I n are instances satisfying a set of tgds and egds Σ, then the direct product π i I i satisfies Σ. This finishes the proof of the Claim.

It follows from Claim 1 that there is some superset-repair J 0 of I such that J 0J. We prove that J 0 is a n-universal superset-repair of I w.r.t. Σ. That is, for all queries q with at most n atoms,

$$\text{superset-Con}(q,I,{\Sigma})=q(J_{0})_{\downarrow}. $$

One inclusion follows from the fact that J 0 is a superset-repair of I w.r.t. Σ. For the other inclusion, we reason by contraposition: suppose that a∉superset-Con(q,I,Σ). Then, by construction, aq(J q,a). Let h:J ×J q,a be the natural map that sends each tuple to its value at the coordinate corresponding to 〈q,a〉. Let h :JJ q,a be the composition of h with the isomorphism between J × and J. In other words,

$$h^{\prime}(x) = \left\{\begin{array}{lllllll} a & \text{if}\, x=a\ \text{for some}\ a\in\text{adom}(I) \\ h(x) & \text{otherwise} \end{array}\right. $$

Since h:J ×J q,a is a homomorphism, we have that h :JJ q,a is a homomorphism as well. Since conjunctive queries are preserved by homomorphisms, it follows that aq(J). Since J 0J and conjunctive queries are monotone, this implies that aq(J 0).

We now proceed to the second part of the proposition. It was shown in [10] that there is a set of tgds Σ and a Boolean conjunctive query q consisting of a single atom, such that the following problem is undecidable:

Given an instance I, decide whether, for every instance J⊧Σ with IJ, we have that q is true in J.

Using the notation used in Proposition 2.8, this is precisely the problem of computing the certain answers of q with respect to the schema mapping \(\mathcal {M}_{\Sigma }\). It follows that there is no effective algorithm for computing a 1-universal superset-repair with respect to Σ, since, by Proposition 2.8, such an algorithm would immediately yield an algorithm for computing the certain answers of q with respect to Σ. □

4 LAV tgds

The main result of this section is that if Σ is a set of LAV tgds and q is a conjunctive query, then there is a polynomial-time algorithm for computing the ⋆-consistent answers of q on a given instance I w.r.t. Σ, where ⋆∈{⊕, subset, superset}. We also show that there is a polynomial-time algorithm for the ⋆-repair checking problem w.r.t. Σ, where ⋆∈{⊕,subset, superset}. In the special case in which Σ is a set of inclusion dependencies, the tractability of the subset-repair checking problem and the subset-consistent query answering problem for conjunctive queries was established in [14]. The tractability of the subset and the ⊕-repair checking problem for a weakly acyclic set of LAV tgds was shown in [2] and was subsequently extended to the broader class of semi-LAV sets of tgds [24] (which is still a subclass of the class of weakly acyclic sets of tgds). Finally, the ⊕-consistent query problem for sets of inclusion dependencies and universal constraints was studied in [9], where disjunctive logic programming was used to obtain the ⊕-consistent answers, but no complexity results were established.

In obtaining our tractability results, we will use extensively the notions of universal repair and of n-universal repair, n≥1, for sets of LAV tgds.

We say that a set of dependencies Σ is closed under union if for all instances I 1,I 2 such that I 1⊧Σ and I 2⊧Σ, we have that I 1I 2⊧Σ. It is well known that every set of LAV tgds is closed under union (e.g., see [2 , 12]). The next result shows that, in a certain sense, this property is characteristic of LAV tgds. For completeness, we also include a proof that sets of LAV tgds are closed under union.

Theorem 4.1

Every set of LAV tgds is closed under union. Furthermore, if Σ is a set of tgds that is closed under union, then Σ is logically equivalent to a set of LAV tgds.

Proof

Suppose that Σ is a set of LAV tgds and that I 1 and I 2 are instances such that both satisfy Σ. Let ∀x(R(x)→∃y ψ(x,y) be a tgd in Σ. Then, for every tuple of values a such that R(a) holds in I 1I 2, we have that R(a) already holds in I 1 or I 2, and hence ∃y ψ(a,y) holds in I 1 or in I 2, which implies that it also holds in I 1I 2. This shows that I 1I 2⊧Σ.

We now prove that the converse holds. Let Σ be a set of tgds that is closed under union, and let R be the schema of Σ. We define Σ as the (finite) set of all LAV tgds over R implied by Σ and containing at most n atoms in the right-hand side, where n is the maximum number of atoms in the right-hand side of a tgd in Σ. By definition, Σ logically implies every LAV tgd in Σ. Conversely, consider any tgd in Σ of the form

$$\sigma ~=~ \forall\mathbf{x}(\phi(\mathbf{x})\to\exists\mathbf{y}\psi(\mathbf{x},\mathbf{y}))$$

We will show that Σσ. Let I be the canonical instance of ϕ(x), that is, the instance whose active domain consists of the variables in x and whose facts are the atoms in ϕ. Let g be the natural variable assignment for x, i.e., the map that sends every variable in x to itself.

For each fact f of I, let J f be an n-universal superset-repair of {f} with respect to Σ (using Proposition 3.8), using all fresh elements except for the elements belonging to the fact f itself. Finally, let \(J=\bigcup \{J_{f}\mid \text {\textit {f} is a fact of \textit {I}}\}\). It follows from the closure under union that J⊧Σ. Furthermore, by construction, we have that IJ. In particular, we have that Jϕ(x) under the variable assignment g, and therefore, ψ(x,y) must be satisfied under some variable assignment g extending g.

In what follows, the basic idea will be that we split the conjunctive query ∃y ψ(x,y) into different parts, such that each part is mapped (by g ) into a different “component” J f of J. If it happens to be the case that g maps each existential variable to a value outside of a d o m(I), then this will turn out to be easy to do. In general, it may be the case that g maps some of the variables y to elements of adom(I). However, if this is the case, then we can find a substitution instance of ψ that is satisfied, and where those existential variables have been replaced some of the universally quantified variables. Let ∃y ψ (x,y ) be a maximal substitution instance (i.e., replacing as many existentially quantified variables by universal quantified variables as possible) for which it is the case that J⊧∃y ψ (x,y ) under the variable assignment g, and let h be a variable assignment for x and y extending g such that ψ (x,y ) is satisfied in J under h. Note that by construction, h(x)=x for all xx and h(y)∉adom(I) for all yy .

We now label each atom A of ψ by a fact f of I such that h(A) belongs to J f (note that, by construction of J, we can always find such a fact. If there are more than one, we choose one arbitrarily). Observe that, if two atoms of ψ receive a different label, they cannot share an existentially quantified variable (this is precisely because all existential variables are mapped by h to values outside a d o m(I), which must therefore belong to a unique “component” J q of J). For each fact f of I, let q f be the conjunctive query that is obtained as the subquery of ∃y ψ(x,y ) consisting of the atoms labeled f. Then by construction, q f is a conjunctive query that is satisfied in J f under the natural variable assignment g. Since q f has at most n atoms and J f is an n-universal superset-repair of {f}, it follows that q f is true in every superset-repair of {f} with respect to Σ, under the natural variable assignment g. But this means that Σ implies the LAV tgd ∀x(f(x)→q(x)), and hence the latter belongs to Σ. This argument applies to each fact f of I. It follows that, whenever ϕ(x) is satisfied in an instance satisfying Σ, then the LAV tgds in Σ ensure that ∃y ψ(x,y ), and hence also ∃y ψ(x,y) is satisfied. In other words, Σ implies σ. □

Closure under union implies that every instance has a universal ⊕-repair. In fact, the following characterization holds.

Proposition 4.2

Let Σ be a set of dependencies. The following statements are equivalent:

  1. (1)

    Σ is closed under union.

  2. (2)

    Every instance has a universal ⊕-repair w.r.t. Σ.

  3. (3)

    Every instance has a universal subset-repair w.r.t. Σ.

Proof

For the direction (1) ⇒ (2), let Σ be a set of tgds that is closed under union, and let I be an instance. Since the empty instance satisfies Σ, we have that I has at least one subset-repair w.r.t. Σ. In fact, I has a unique subset-repair.Footnote 2 For, if J 1 and J 2 are subset-repairs of I, then J 1J 2 is also a subinstance of I satisfying Σ, and hence, by the definition of repairs, J 1=J 1J 2=J 2. Clearly, the unique subset-repair J of I is a universal subset-repair of I. Furthermore, we claim that J is a universal ⊕-repair of I. Let q be a conjunctive query. We have to show that ⊕-Con(q,I,Σ)=q(J) . From the definition of ⊕-Con(q,I,Σ) and since J is a ⊕-repair of I, it follows that ⊕-Con(q,I,Σ)⊆q(J) . For the other inclusion, let K be a ⊕-repair of I. We have to show that q(J) q(K). Since J and K are ⊕-repairs, we have \(J \vDash {\Sigma }\) and \(K \vDash {\Sigma }\). By closure under union, we have that also \(J \cup K\vDash {\Sigma }\). As JI, we have I⊕(JK)⊆IK. Since K is a ⊕-repair of I, this can only happen if KJ=K, which means that JK. It follows that q(J) q(K). Hence, J is a universal ⊕-repair of I.

The direction (2) ⇒ (3) follows immediately from Proposition 3.7.

[3 ⇒ 1] Let I 1,I 2 be instances satisfying Σ. Towards a contradiction, suppose that I 1I 2 does not satisfy Σ. Let J be the universal subset-repair of I 1I 2. Then J must omit some fact of I 1I 2. Without loss of generality, we may assume that J omits a fact of I 1. Since I 1 satisfies Σ, there must be a subset-repair of I 1I 2 that contains all facts in I 1. But this subset-repair must then be incomparable to J, which means that J is not the only subset-repair of I, and hence, by Proposition 3.7, J is not a universal subset-repair of I, a contradiction. □

Corollary 4.3

If Σ is a set of LAV tgds, then every instance I has a unique subset repair, which is also the unique universal subset-repair and the unique universal ⊕-repair of I w.r.t. Σ.

Corollary 4.3 implies, in particular, that if Σ is a set of LAV tgds, then, for all conjunctive queries q and instances I, ⊕-Con(q,I,Σ)=subset-Con(q,I,Σ).

Theorem 4.4

Let Σ be a set of LAV tgds. There is a polynomial-time algorithm that, given an instance I, computes the unique universal subset-repair of I w.r.t. Σ (which is also the unique universal ⊕-repair of J w.r.t. Σ).

Proof

Let I be an instance. By Corollary 4.3, I has a unique subset repair I 0, which is also both the unique universal subset-repair and the unique universal ⊕-repair of I. We will show how to compute I 0 in polynomial time. We start with a definition. Let τ be a LAV tgd of the form ∀x (R(x)→∃y ψ(x,y)). We say that a fact t falsifies the LAV tgd τ in an instance J if t is in J, it is of the form R(a), and ∃y ψ(a,y) is false in J.

The algorithm works as follows. It starts with the instance I. As long as the current instance contains a fact R(a) that falsifies some tgd in Σ, it removes it from the current instance.

Clearly, the algorithm runs in time polynomial in the size of I. It is easy to show by induction that, at any time of the run of the algorithm, the current instance is a superset of the unique subset-repair I 0 of I. Hence, if J is the instance obtained at the end of the run of the algorithm, we have that I 0J. Moreover, since there is no tuple falsifying a tgd in Σ in J, we have that J satisfies Σ. Since \(J \vDash {\Sigma }\), I 0JI, and I 0 is a subset-repair of I, we conclude that I 0 is equal to J. □

The situation for superset-repairs is very different. Example 3.4 shows that an instance may have infinitely many superset-repairs of arbitrarily large size, and it may have no universal superset-repairs. Nevertheless, we will show that if Σ is a set of LAV tgds then, for every instance I and every n≥1, there is an n-universal superset-repair for I w.r.t. Σ that can be computed in polynomial time in the size of I. For this, we need the following lemma.

Lemma 4.5

Let Σ be a set of LAV tgds. There is a polynomial time algorithm that, given instances I and J with I⊆J and J⊧Σ, computes a superset-repair K of I w.r.t. Σ such that K⊆J.

Proof

Figure 1 depicts an algorithm that, given I and J with IJ and \(J \vDash {\Sigma }\), either verifies that J is a repair of I or computes a set K such that \(I \subseteq K \subsetneq J\) and \(K \vDash {\Sigma }\). By applying repeatedly the algorithm until it outputs a repair, we obtain the result.

Fig. 1
figure 1

Algorithm for the superset-repair checking problem with respect to a set of LAV tgds

By Theorem 4.4, computing the subset-repair of an instance is in polynomial time. It follows that the algorithm in Fig. 1 runs in time polynomial in the sizes of I and J.

We prove the correctness of the algorithm. Suppose first that the algorithm outputs an instance K. We have to show that \(I \subseteq K \subsetneq J\) and \(K \vDash {\Sigma }\). By construction, there exists a fact R(a) in JI such that K is the subset-repair of J∖{R(a)}. This implies that \(K \vDash {\Sigma }\) and \(K \subsetneq J\). Moreover, K contains I, as the algorithm returned K.

Next, assume that J is not a superset-repair. We have to prove that the algorithm does not output “J is a repair”. Since J is not a superset-repair, there exists an instance J 0 such that \(I \subseteq J_{0} \subsetneq J\) and \(J_{0} \vDash {\Sigma }\). Take R(a) in JJ 0. Let L be the subset-repair of J∖{R(a)}.

Consider the instance L 0:=LJ 0. By Theorem 4.1, \(L_{0} \vDash {\Sigma }\). Moreover, L 0J∖{R(a)} and JL 0JK. Since K is a subset-repair of J, this can only happen if L 0=K. That is, J 0L. Together with IJ 0, we obtain IL. Recall that L is the subset-repair of J∖{R(a)}. This means that, when the algorithm enters the loop and picks R(a), it stops running and returns the instance L. In particular, the algorithm does not output “J is a repair”. □

Theorem 4.6

Let Σ be a set of LAV tgds, and let n≥1. There is a polynomial time algorithm that, given an instance I, computes an n-universal superset-repair of I w.r.t. Σ.

Proof

Let R be the schema consisting of the relations in Σ. We may safely restrict attention to instances over R (because facts involving relations outside of R do not require repairing).

By Propositions 3.8 and 2.7, every instance I has an n-universal superset-repair w.r.t. Σ. Using the fact that Σ consists of LAV tgds, we can furthermore show that an n-universal superset-repair of a given instance can be effectively constructed: let q be any conjunctive query over R with at most n atoms, and let a be a tuple from a d o m(I) whose length is the arity of q. Recall from Proposition 2.8 that a∈superset-Con(q,I,Σ) holds if and only if a is belongs to q(J) for all instances J extending I and satisfying Σ. It was shown in [5 , 10] that the latter is decidable in double exponential time, and, moreover, when a∉superset-Con(q,I,Σ), a counterexample (i.e., an instance J⊧Σ such that IJ and aq(J)) can be effectively constructed (this was shown in [5, 10] for guarded tgds, which form a proper generalization of LAV tgds). These facts immediately turn the construction described in the proof of Proposition 3.8 into an effective procedure.

In the remainder of the proof, we show that, for fixed Σ and n, we can compute n-universal superset-solutions in polynomial time.

We first consider instances consisting of a single fact (over R). Let I={R(a)}, where R(a) is some fact over R. By Proposition 3.8, I has an n-universal superset-repair, which we will denote by J R(a). For each fact R(a), we choose some suitable such J R(a). There are only finitely many distinct facts over R, up to isomorphism (i.e., up to a one-to-one renaming of values), and therefore the map that sends each fact R(a) to the chosen instance J R(a) can be viewed as a finite object (by assuming that, whenever two facts f,f are isomorphic, then the chosen instances J f and \(J_{f^{\prime }}\) are isomorphic as well). We can think of it as a lookup table that will be used by the algorithm below to efficiently construct n-universal repairs for instances consisting of a single fact. Note that the lookup table does not depend on the input instance (it depends only on Σ, which is fixed) and hence constructing the lookup table is not part of the computation performed by the algorithm.

We now proceed with the description of the algorithm. Given as input an arbitrary instance I, the algorithm constructs, in polynomial time, an instance J, namely, the union of all instances J R(a) where R(a) is a fact belonging to I. We assume here that each instance J R(a) contains, besides a, only fresh values not occurring in any of the other instances.

Clearly, we have that IJ. Since LAV tgds are closed under union, we have that J⊧Σ. In what follows, we will show that, for every query q of size at most n, and for every tuple a from the active domain of I, if aq(J) then a∈superset-Con(q,I,Σ) in I. By Lemma 4.5, we can construct, in polynomial time, a subinstance J of J is a superset-repair of I. We will show that J is in fact an n-universal superset-repair of I.

Let aq(J) where q is a conjunctive query of size at most n and a is a tuple from the active domain of I. For the sake of a contradiction, assume that a∉superset-Con(q,I,Σ). We may assume that q is a minimal size query for which this happens.

Let h be a witnessing homorphism from q to J. We know that h is injective, for, otherwise, we could replace q by a smaller query. We also know that the image of h is entirely contained in a single part J F of J (where F is a fact of I). This is because, otherwise, q could be divided into two subqueries, one whose image is contained in some J F and the other containing the rest of the query q. One of these two parts would then be a smaller counterexample.

But, since the image of h is entirely contained in J F , and we know that J F is an n-universal superset-repair of {F}, we have that a∈superset-Con(q,{F},Σ)⊆superset-Con(q,I,Σ). □

The proof of Theorem 4.6 also shows that, given a set of LAV tgds Σ, a natural number n≥1, and an instance I, we can effectively compute an n-universal superset-repair of I w.r.t. Σ.

The main result of this section is now an immediate consequence of Proposition 3.2, Theorems 4.4, and 4.6.

Theorem 4.7

For every set Σ of LAV tgds, for every conjunctive query q, and for ⋆∈{⊕,superset,subset}, the ⋆-consistent query answering problem w.r.t. Σ is solvable in polynomial time.

The Repair-Checking Problem

We now consider the repair checking problem for sets of LAV tgds. First, we show how the ⊕-repair checking problem can be reduced to the subset-repair checking problem and the superset-repair checking problem.

Lemma 4.8

Let Σ be a set of LAV tgds, and let I and J be two instances. Then the following statements are equivalent:

  1. 1.

    J is a ⊕-repair of I w.r.t. Σ.

  2. 2.

    J is a superset-repair of I∩J w.r.t. Σ and J is a subset-repair of I∪J w.r.t. Σ.

Proof

Let Σ be a set of LAV tgds, and let I and J be two instances. For the direction (1)⇒(2), assume that J is a ⊕-repair of I. First, we prove that J is a superset-repair of IJ. Let K 0 be an instance such that \(K_{0} \vDash {\Sigma }\) and IJK 0J. We have to show that K 0=J. As IJK 0J, we also have IK 0IJ. Since \(K_{0} \vDash {\Sigma }\) and J is a ⊕-repair of I, this can only happen if K 0=J.

Next we prove that J is a subset-repair of IJ. Let K 1 be an instance such that \(K_{1} \vDash {\Sigma }\) and JK 1IJ. We have to show that J=K 1. Since JK 1IJ, we have IK 1IJ. Putting this together with the facts that \(K_{1} \vDash {\Sigma }\) and J is a ⊕-repair of I, we obtain K 1=J.

For the direction (2)⇒(1), assume that J is a superset-repair of IJ w.r.t. Σ and that J is a subset-repair of IJ w.r.t. Σ. Let K be an instance such that \(K \vDash {\Sigma }\) and IKIJ. We have to show that K=J. Since IKIJ, we have

$$J \cap I \subseteq K \cap I \text{ and } K \backslash I \subseteq J \backslash I. $$

As KIJI, we have KJIJ. Moreover, since \(K \vDash {\Sigma }\) and \(J \vDash {\Sigma }\), it follows from Theorem 4.1 that \(K \cup J \vDash {\Sigma }\). Recall that J is a subset-repair of IJ. Since \(K \cup J \vDash {\Sigma }\) and JKJIJ, this implies that J=KJ. Hence, KJ.

Recall that JIKI. Putting everything together, we have IJKJ. Since \(K \vDash {\Sigma }\) and J is a repair of IJ, this implies K=J. □

Theorem 4.9

Let Σ be a set of LAV tgds, and let ⋆∈{⊕,superset,subset}. The ⋆-repair checking problem w.r.t. Σ is solvable in polynomial time.

Proof

By Lemma 4.8, it suffices to give polynomial-time algorithms for the superset-repair checking problem and the subset-repair checking problem. The fact that the subset-repair checking problem is solvable in polynomial time follows directly from Theorem 4.4 (since it suffices to check for equality with the universal subset-repair). Finally, the fact that the superset-repair checking problem is solvable in polynomial time is a direct consequence of Lemma 4.5 □

Adding egds

The tractability results concerning LAV tgds are optimal, in the sense that if we consider sets of LAV tgds and egds, then most of them do not remain true. We discuss first the repair-checking problem for sets of LAV tgds and egds. In [14], it was shown that there is a set consisting of a cyclic inclusion dependency and a functional dependency for which the subset-repair checking problem is coNP-complete. In [2], the intractability of subset-repair checking (and ⊕-repair checking) was shown to hold for the union of an acyclic set of inclusion dependencies with a set of egds. However, the superset-repair checking problem can still be solved in polynomial time. This follows from the next lemma combined with Theorem 4.9.

Lemma 4.10

Let Σ be a set of tgds and egds, and let I and J be two instances such that I⊆J and \(J \vDash {\Sigma }\). Then J is a superset-repair of I w.r.t. Σ if and only if J is a superset-repair of I w.r.t. the set Σ 1 consisting of all tgds in Σ.

Proof

Assume that Σ=Σ1∪Σ2, where Σ1 is a set of tgds and Σ2 is a set of egds. Let I and J be two instances such that IJ and \(J \vDash {\Sigma }_{1}\cup {\Sigma }_{2}\). It is clear that if J is a superset-repair of I w.r.t. Σ1, then J is a superset-repair of I w.r.t. Σ1∪Σ2.

For the other direction, assume that J is a superset-repair of I w.r.t. Σ1∪Σ2. Towards a contradiction, assume that J is not a superset-repair of I w.r.t. Σ1. Hence, there is an instance J such that \(I \subseteq J^{\prime } \subsetneq J\) and \(J^{\prime } \vDash {\Sigma }_{1}\). Since Σ2 is a set of egds, \(J \vDash {\Sigma }_{2}\) and J J, we also have that \(J^{\prime } \vDash {\Sigma }_{2}\). So, J is a superset of I such that J J and \(J^{\prime } \vDash {\Sigma }_{1} \cup {\Sigma }_{2}\). This contradicts the fact that J is a superset-repair of I w.r.t. Σ=Σ1∪Σ2. □

Next, we comment on the consistent query answering problem for sets of LAV tgds and egds. Theorem 4.7 in [14] shows that there is a set Σ of inclusion dependencies and functional dependencies, and a conjunctive query q such that computing the subset-consistent answers of q w.r.t. Σ is a \({{\Pi }_{2}^{p}}\)-complete problem. Furthermore, as regards the combined complexity of consistent query answering, it follows from Theorem 5 in [31] that the following problem is undecidable: given a set Σ of LAV tgds and egds, a conjunctive query q and an instance I, compute the superset-consistent answers of q on I w.r.t. Σ. We leave it as an open problem whether or not this undecidability result still holds for some fixed conjunctive query q and some fixed set Σ of LAV tgds and egds. It is also unknown whether or not the data complexity of ⊕-consistent answers for conjunctive queries is decidable. As mentioned in the Introduction, the ⊕-consistent answering problem for unions of conjunctive queries and for sets of inclusion dependencies and functional dependencies was shown to be undecidable in combined complexity [11].

Finally, if we consider LAV tgds together with egds, there might not always exist a universal (or even 1-universal) subset-repair, ⊕-repair or superset-repair, as seen in Example 3.6.

5 GAV tgds

In this section, we investigate the existence and efficient computability of universal repairs for GAV tgds, we review known complexity results for repair checking and consistent query answering for GAV tgds, and we provide new matching lower bounds. Furthermore, toward the end of the section, we will briefly consider the addition of egds, and will also discuss semi-LAV sets of tgds; as mentioned earlier, semi-LAV sets of tgds were introduced in [24].

Recall that in the case of LAV tgds, every instance has a unique subset-repair, which is also the unique universal ⊕-repair. This is far from true for GAV tgds. First of all, Example 3.5 shows that there is a set Σ of GAV tgds and an instance I such that I has no 1-universal subset repair w.r.t. Σ; therefore, Proposition 3.7 implies that I also has no 1-universal ⊕-repair w.r.t. Σ. In addition, even if an instance happens to have a universal subset-repair, this may not be a universal ⊕-repair, as revealed by the next example.

Example 4

Consider the set

$${\Sigma}=\{P(x) \land Q(y) \to M(y), M(x) \to P(x)\}$$

and the instance I={M(a),Q(b)}. The instance J={Q(b)} is then the only subset-repair of I, hence it is also a universal subset-repair of I. However, J is not a universal ⊕-repair of I, as can be seen by considering the ⊕-repair J ={M(a),P(a)} and the query Q(x). This also shows that, unlike the case of LAV tgds, the subset-consistent answers of a conjunctive query w.r.t. a class of GAV tgds do not always coincide with the ⊕-consistent answers of the same query w.r.t. the same class of GAV tgds.

Unlike the case for subset-repairs and ⊕-repairs, we will show that, for sets of GAV tgds, every instance has a universal superset-repair. In fact, every instance has a unique superset-repair; moreover, the latter property is characteristic for GAV tgds (this will be analogous to the characterization of LAV tgds given in the previous section). To state and proof this characterization, we need a definition and a lemma. We say that a set Σ of dependencies is closed under intersection if for all instances I 1,I 2 such that I 1⊧Σ and I 2⊧Σ, we have that I 1I 2⊧Σ. Closure under intersection will turn out to be equivalent to the existence of a unique superset-repair, and this characterizes the GAV tgds among all tgds.

Lemma 5.2

Let Σ be a set of tgds and let I be an instance that has a unique superset-repair J w.r.t. Σ. If I is an instance and J is a superset-repair of I w.r.t. Σ, then every homomorphism h:I→I can be extended to a homomorphism h :J→J .

Proof

Here, adom(K) will denote the active domain of an instance K, while rng(h) will denote the range of the function h. Let \(J^{\prime }_{h}\) be the following instance:

  • \(\text {adom}(J^{\prime }_{h}) = \text {adom}(I)\cup (\text {adom}(J^{\prime })\setminus \text {rng}(h))\).

  • For every fact R(b 1,…,b n ) of J , we populate \(J^{\prime }_{h}\) with all possible facts that can be obtained by replacing each value b i by a value a i d o m(I) such that h(a i )=b i , if b i ∈rng(h), or leaving b i untouched, otherwise.

Since h(I)⊆I J , we have that \(I\subseteq J^{\prime }_{h}\). Let h be a function defined on \(\text {adom}(J^{\prime }_{h})\) in the following way: if ad o m(I), then h (a)=h(a); if ad o m(I) (but a∈adom(J )∖rng(h)), then h (a)=a. From the construction of \(J^{\prime }_{h}\) and the definition of h , it follows that h is a homomorphism from \(J^{\prime }_{h}\) to J . To complete the proof, it suffices to show that \(J^{\prime }_{h}\models {\Sigma }\). Indeed, once we have shown this, it will follow that \(J\subseteq J^{\prime }_{h}\), because \(I\subseteq J^{\prime }_{h}\) and J is the unique superset-repair of I; consequently, the restriction of h on adom(J) is a homomorphism from J to J .

To show that \(J^{\prime }_{h}\models {\Sigma }\), consider an arbitrary tgd from Σ, say,

$$\phi(\mathbf{x})\to\exists\mathbf{y}\psi(\mathbf{x},\mathbf{y}). $$

Suppose that \(J^{\prime }_{h}\) satisfies ϕ(a) for some tuple of values a. Then, by the construction of \(J^{\prime }_{h}\), we have that J satisfies ϕ(h (a)), and hence, it also satisfies ψ(h (a),b) for some values b=b 1,…,b m . Let \(\mathbf {a}^{\prime }=a^{\prime }_{1}, \ldots , a^{\prime }_{m}\) be a tuple of values such that \(h^{\prime }(a^{\prime }_{i})=b_{i}\), for 1≤im. Then, by construction, \(J^{\prime }_{h}\) satisfies ψ(a,a ). □

Theorem 5.3

Let Σ be a set of tgds. Then the following statements are equivalent:

  1. 1.

    Every instance has a unique (universal) superset-repair w.r.t. Σ.

  2. 2.

    Σ is closed under intersection.

  3. 3.

    Σ is logically equivalent to a set of GAV tgds.

Proof

The proof goes round-robin.

  1. [1 ⇒3

    ] Assume that every instance I has a unique superset-repair J. It follows that J can only contain values that come from the active domain of I (otherwise, by taking an isomorphic copy in which the values outside of the active domain of I are replaced by fresh values, we would refute the uniqueness of the repair). Consider a tgd τ∈Σ of the form ϕ(x)→∃y ψ(x,y), where ψ is a conjunction of atomic formulas. Let I ϕ be the canonical instance of ϕ(x), and let J ϕ be the unique superset-repair of I ϕ . Since J ϕ is a repair of I ϕ , it satisfies the tgd τ. It follows that ∃y ψ(x,y) is satisfied in J ϕ under the natural assignment. Since J ϕ contains only values that are from the active domain of I ϕ , this means that J ϕ satisfies ψ(x,x ) for some tuple of values x from the active domain of I ϕ . Let \(\widehat {\tau }\) be the dependency

    $$\phi(\mathbf{x})\to\psi(\mathbf{x},\mathbf{x}^{\prime}), $$

    which can be equivalently written as a finite set of GAV tgds, and let \(\widehat {\Sigma }=\{\widehat {\tau }\mid \tau \in {\Sigma }\}\). We claim that Σ and \(\widehat {\Sigma }\) are logically equivalent. The fact that Σ logically implies Σ follows directly from the construction of Σ. For the other direction, suppose that I⊧Σ and Iϕ(a) for some tuple of values a. Let I ϕ be again the canonical instance of ϕ(x). Then the function h:xa is a homomorphism from I ϕ to I. Since J ϕ is the unique superset-repair of I ϕ w.r.t. Σ, Lemma 5.2 implies that h extends to a homomorphism from J ϕ to every superset-repair of I. Since I satisfies Σ, it is its own repair, which means that I satisfies ψ(a,h(a)). Therefore, \(I\models \widehat {\tau }\).

  2. [3 ⇒ 2

    ] Let Σ be a set of GAV tgds. Let I 1⊧Σ and I 2⊧Σ, and suppose that, for some GAV tgd t∈Σ of the form

    $$\phi(x_{1},\ldots,x_{n})\to R(x_{i_{1}},\ldots,x_{i_{k}}) $$

    it holds that I 1I 2ϕ(a 1,…,a n ) for some values a 1,…,a n . Then both I 1 and I 2 satisfy ϕ(a 1,…,a n ); since I 1 and I 2 satisfy Σ, we have that I 1 and I 2 satisfy \(R(a_{i_{1}}, \ldots , a_{i_{k}})\); thus, the fact \(R(a_{i_{1}}, \ldots , a_{i_{k}})\) belongs to the intersection of I 1 and I 2.

  3. [2 ⇒ 1

    ] Let Σ be a set of tgds that is closed under intersection. Let I be an instance and let J be the instance consisting of all possible facts over the active domain of I. Then IJ and J⊧Σ, hence a superset-repair of I must exist. Suppose that I has two distinct superset-repairs J 1 and J 2. By closure under intersection, J 1J 2⊧Σ, hence the properties of superset-repairs imply that J 1=J 1J 2=J 2, a contradiction.

The classical chase procedure from dependency theory (see [1]) provides a method for efficiently computing the unique superset-repair of an instance w.r.t. a set of GAV tgds.

Theorem 5.4

Let Σ be a set of GAV tgds. There is a polynomial time algorithm that, given an instance I, computes the unique (universal) superset-repair of I w.r.t. Σ.

We now move to consistent query answering. Let Σ be a set of GAV tgds and let q be a conjunctive query. The preceding Theorem 5.4 implies immediately that the superset-consistent query answering problem for q w.r.t. Σ is solvable in polynomial time. Staworko and Chomicki [32] showed that both the subset-consistent query answering problem and the ⊕-consistent query answering problem for q w.r.t. Σ are in coNP. This will also follow from Theorem 6.2 in the next section, which implies that the size of every ⊕-repair of an instance I w.r.t. a set of GAV tgds is bounded by a polynomial in the size of I. In [32], it is also shown that there is a conjunctive query q and a set Σ consisting of GAV tgds and egds such that the ⊕-consistent query answering problem for q w.r.t. Σ is coNP-complete. Our next result improves on this lower bound by showing that it can hold even for a set consisting of a single GAV tgd.

Theorem 5.5

There is a set Σ consisting of a single GAV tgd and there is a conjunctive query q such that both the subset-consistent query answering problem and the ⊕-consistent query answering problem for q w.r.t. Σ are coNP-complete.

Proof

We produce a GAV tgd τ, a conjunctive query q, and a polynomial-time reduction from the complement of Positive 1-in-3-SAT to the problem of finding the ⊕-consistent answers of q w.r.t. {τ}. Recall that Positive 1-in-3-SAT is the following NP-complete problem [22]: given a Boolean formula ϕ in conjunctive normal form and such that each clause is a disjunction of the form (x 1x 2x 3) of three positive literals, is there a truth assignment that makes true exactly one variable in every clause?

Let τ be the following GAV tgd:

$$\forall x,u,u^{\prime} (P(x,u) \wedge P (x,u^{\prime}) \to E(u,u^{\prime})) $$

and let q be the following conjunctive query:

$$\begin{array}{@{}rcl@{}} &&\exists x_{1}, x_{2}, x_{3},u_{1}, u_{2}, u_{3} (R(x_{1},x_{2},x_{3}) \wedge P(x_{1},u_{1})\\ &&\quad\quad\quad\wedge~ P(x_{2},u_{2}) \wedge P(x_{3},u_{3}) \wedge S(u_{1},u_{2},u_{3})). \end{array} $$

The intuition behind the relation symbols is as follows: P encodes truth values for the variables in a given Boolean formula, while E is used to simulate equality. The tgd τ expresses that each variable is assigned at most one truth value.

The relation R encodes every clause (x 1x 2x 3) occurring in a formula in conjunctive normal form such that each clause is a disjunction of three positive literals. The relation S will consist of the triples in {0,1}3∖{(1,0,0),(0,1,0),(0,0,1)} . The conjunctive query q expresses that there are a clause (x 1x 2x 3) and truth values assigned to x 1,x 2 and x 3 such that it is not the case that exactly one variable is true in the clause (x 1x 2x 3).

Let ϕ be a formula of the form

$$(x_{11} \vee x_{12} \vee x_{13}) \wedge {\dots} \wedge (x_{n1} \vee x_{n2} \vee x_{n3}), $$

where for all 1≤in and for all 1≤j≤3, x i j is a variable. We denote by V a r the set {x i j ∣1≤in,1≤i≤3}. We construct an instance I such that

$$ \oplus\text{-Con}(q,I,{\Sigma}) = {true} \quad \text{iff} \quad \phi \notin \textsc{Positive 1-in-3-SAT}. $$
(1)

The instance I is defined as follows:

$$\begin{array}{@{}rcl@{}} R^{I} & = & \{ (x_{i1}, x_{i2},x_{i3}) \mid 1 \leq i \leq n \},\\ P^{I} & = & \{ (x_{ij},0), (x_{ij},1) \mid 1 \leq i \leq n, 1 \leq j \leq 3\}, \\ E^{I} & = & \{ (1,1),(0,0) \},\\ S^{I} & = & \{ 0,1\}^{3} \backslash \{ (1,0,0),(0,1,0),(0,0,1) \}. \end{array} $$

In order to prove the implication from left to right of (1), suppose that ⊕-Con(q,I,Σ)=t r u e. Let V:P r o p→{0,1} be an assignment. We have to show that it is not the case that exactly one variable is true in every clause. That is, there exists a conjunct

$$x_{i1} \vee x_{i2} \vee x_{i3} $$

of ϕ such that (V(x i1),V(x i2),V(x i3)) does not belong to {(1,0,0),(0,1,0),(0,0,1)}.

We define an instance J 0 in the following way:

$$\begin{array}{@{}rcl@{}} R^{J_{0}} & = & \{ (x_{i1}, x_{i2},x_{i3}) \mid 1 \leq i \leq n \},\\ P^{J_{0}} & = & \{ (x_{ij},V(x_{ij}))\mid 1 \leq i \leq n, 1 \leq j \leq 3\},\\ E^{J_{0}} & = & \{ (1,1),(0,0) \},\\ S^{J_{0}} & = & \{ 0,1\}^{3} \backslash \{ (1,0,0),(0,1,0),(0,0,1) \}. \end{array} $$

We can check that J 0 is a ⊕-repair of I with respect to Σ. Since the ⊕-consistent answer of q is true, q is true in J 0. Hence, there are x 1,x 2,x 3,u 1,u 2 and u 3 such that

$$R(x_{1},x_{2},x_{3}) \wedge P(x_{1},u_{1}) \wedge P(x_{2},u_{2}) \wedge P(x_{3},u_{3}) \wedge S(u_{1},u_{2},u_{3}) $$

holds in J 0. By definition of R J0, R J0(x 1,x 2,x 3) implies that x 1x 2x 3 is a conjunct of ϕ. It also follows from the definition of P J0 that for all 1≤i≤3, P J0(x i ,u i ) implies u i =V(x i ). Next, by definition of S J0, S J0(u 1,u 2,u 2) means that (u 1,u 2,u 3)∉{(1,0,0),(0,1,0),(0,0,1)}.

Putting everything together, we have a conjunct x 1x 2x 3 of ϕ such that (V(x i1),V(x i2),V(x i3)) does not belong to {(1,0,0),(0,1,0),(0,0,1)}. This finishes the proof of the implication from left to right of (1).

Next we show the implication from right to left of (1). Suppose that ϕ does not belong to Positive 1-in-3-SAT. Let J be a ⊕-repair of I.

We start by proving that R J=R I and S J=S I. Consider the instance J 0 obtained by replacing R J by R I and S J by S I. Since t is true in J and neither R nor S occurs in Σ, t is true in K. By definition of J 0, IJ 0IJ. Since J 0 is a ⊕-repair, this implies that J 0 is equal to J. Therefore, R J=R I and S J=S I.

Since R J=R I and S J=S I, it follows from the definitions of R I, S I and q that

(‡) q is true in J iff there are a conjunct x i1x i2x i3 of ϕ and u 1,u 2,u 3 such that P J(x i1,u 1), P J(x i2,u 2), P J(x i3,u 3) and (u 1,u 2,u 3)∉{(1,0,0),(0,1,0),(0,0,1)}.

Claim 1

One of the followings holds:

  1. (a)

    For all xP r o p and for all u∈{0,1}, P(x,u) belongs to J.

  2. (b)

    For all xP r o p, there is a unique u∈{0,1} such that P J(x,u) holds.

Proof of Claim

We start by proving that

$$ \forall x \in {Prop}, \forall u, u^{\prime} (P^{J}(x,u) \text{ or } E^{J}(u,u^{\prime}) \Rightarrow u,u^{\prime} \in \{ 0,1\}) $$
(2)

Consider the instance J 1 obtained from J by deleting the tuples of the form P J(x,u), E J(u,u ) and E J(u ,u), where xP r o p and u∉{0,1}. The tgd t is satisfied in J 1. It also follows from the definitions of I and J 1 that IJ 1IJ. Since J is a ⊕-repair, this means that J 1=J. That is, for all u,u such that P J(x,u) or E J(u,u ) or E J(u ,u) holds, u belongs to {0,1}.

Next we show that

$$ \{ (0,0),(1,1) \} \subseteq E^{J}. $$
(3)

Consider the instance J 2 obtained from J by adding the tuples E(0,0) and E(0,1) to J. We can check that J 2 satisfies t. Moreover, by definitions of I and J 2, we have IJ 2IJ. Since J is a ⊕-repair, J 2=J. That is, E(0,0) and E(0,1) belong to J.

Putting (2) and (3) together, we obtain that

$$\{ (0,0),(1,1) \} \subseteq E^{J} \subseteq \{ (0,0), (1,1), (0,1), (1,0)\}. $$

We make the following case distinction:

  • Suppose that E J={(0,0),(1,1)}. We show that (b) holds. That is, for all xP r o p, there is a unique u∈{0,1} such that P J(x,u) holds. Since E J={(0,0),(1,1)} and the constraint

    $$\forall x,u,u^{\prime} (P(x,u) \wedge P (x,u^{\prime}) \to E(u,u^{\prime})) $$

    is true in J, we have that for all xP r o p, there exists at most one element u∈{0,1} such that P J(x,u).

    Hence, in order to prove (b), it remains to show that for all xP r o p, there exists u∈{0,1} such that P J(x,u) holds. Let x be an element of P r o p. Suppose for contradiction that there is no u∈{0,1} such that P J(x,u). By (2), this implies that there is no u such that P J(x,u). Let J 3 be the instance obtained by adding the tuple P J(x,0) to J. We prove that t is true in J 3. Suppose that P J3(y,v) and \(P^{J_{6}}(y,v^{\prime })\) hold. We have to show \(E^{J_{3}}(v,v^{\prime })\). If yx, this follows from the definition of J 3 and the fact that t is true in J.

    Next suppose that y=x. That is, P J3(x,v) and \(P^{J_{3}}(x,v^{\prime })\) hold. Recall that there is no u such that P J(x,u). Hence, by definition of J 3, P J3(x,v) and \(P^{J_{3}}(x,v^{\prime })\) imply v=v =0. Since E J={(0,0),(1,1)} and E J=E J3, we have \(E^{J_{3}} (v,v^{\prime })\). This finishes the proof that t is true in J 3.

    Next, it follows from the definition of J 3 that IJ 3IJ. Since J is a repair, J 3=J. This contradicts our assumption that there is no u∈{0,1} such that P J(x,u). This finishes the proof that (b) holds.

  • Next suppose that (0,1) or (1,0) belongs to E J. Without loss of generality, we may assume that (0,1) belongs to E J.

    There must be an element x 0 such that P J(x 0,0) and P J(x 0,1) hold. Indeed, if there is no such element, then we can let J 4 be the instance obtained by removing the tuple E(0,1) from J. Then, since there is no x 0 such that P J(x 0,0) and P J(x 0,1), t is satisfied in J 4. We also have that IJ 4JJ. As J is a ⊕-repair, this implies that J=J 4. That is, E(0,1) does not belong to J, which is a contradiction. Hence, there is x 0 such that P J(x 0,0) and P J(x 0,1).

    Now, since P(x 0,1)∧P(x 0,0) holds in J and the tgd

    $$\forall x,u,u^{\prime} (P(x,u) \wedge P (x,u^{\prime}) \to E(u,u^{\prime})) $$

    is satisfied in J, we have that E(1,0) belongs to J. Therefore, E J={(0,0),(1,0),(0,1),(1,1)}.

    We show that for all xP r o p and for all u∈{0,1}, P(x,u) belongs to J. Take xP r o p and u∈{0,1}. Define J 5 as the instance obtained by adding the tuple P(x,u) to J. Since t is true in J and E J={(0,0),(1,0),(0,1),(1,1)}, t remains true in J 5. It is clear that IJ 5IJ. As J is a ⊕-repair, this implies that J=J 5. In other words, P(x,u) belongs to J. This finishes the proof of the claim.

Suppose first that (a) holds. That is, for all xP r o p and for all u∈{0,1}, P(x,u) belongs to J. We show that q is true in J. Let x i1x i2x i3 be a conjunct of ϕ. Since (a) holds, P(x i1,0), P(x i2,0) and P(x i3,0) belong to J. By (‡), this implies that q is true in J.

Next suppose that (b) holds. That is, for all xP r o p, there is a unique u∈{0,1} such that P J(x,u) holds. Hence, we may choose a valuation V:P r o p→{0,1} such that for all xP r o p and for all u∈{0,1},

$$V(x) = u \quad \text{iff} \quad P^{J} (x,u). $$

Since ϕ does not belong to Positive 1-in-3-SAT, it is not the case that exactly one variable is true in every clause. Therefore, there exists a conjunct x i1x i2x i3 of ϕ such that

$$(V(x_{i1}),V(x_{i2}) ,V(x_{i3})) \notin \{ (1,0,0),(0,1,0),(0,0,1)\}. $$

Moreover, by definition of V, P(x i1,V(x i1)), P(x i2,V(x i2)) and P(x i2,V(x i3)) belong to J. It follows from (‡) that q is true in J. □

Extensions: egds and Semi-LAV

All complexity upper bounds described in this section hold for the more general case of sets of GAV tgds and egds. Note that if Σ is a set of GAV tgds and egds, then an instance I may not have any superset-repair w.r.t. Σ. Still, if a superset-repair exists, then it is unique, and it can be computed in polynomial time using the chase procedure. The existence of a superset-repair can be tested in polynomial time as well (c.f. Theorem 6.1 below).

In [24], the class of semi-LAV sets of dependencies was introduced; it contains properly the class of all sets of GAV tgds, as well as the class of all weakly acyclic sets of LAV tgds. It was shown in [24] that the ⊕-repair checking problem for semi-LAV sets of tgds is still solvable in polynomial time (however, this is no longer true if egds are allowed [2]). For ⋆-consistent query answering, where ⋆∈{subset,superset,⊕}, the complexity for semi-LAV sets of tgds is the same as the complexity for sets of GAV tgds. The lower bounds naturally transfer, and the upper bounds follow from the fact that repair checking is in polynomial time, together with Theorems 6.1 and 6.2 in the next section, since every semi-LAV set of tgds is, by definition, weakly acyclic.

6 Weakly Acyclic Sets of tgds (and egds)

In this section, we study the consistent query answering problem for conjunctive queries and weakly acyclic sets of tgds and egds. We begin by considering universal superset-repairs. Recall that, according to Proposition 3.7, every instance has at most one universal superset-repair, up to isomorphism.

Theorem 6.1

Let Σ be a weakly acyclic set of tgds and egds. If an instance has a superset-repair w.r.t. Σ, then it has a universal superset-repair. Moreover, there is a polynomial time algorithm that, given an instance I, tests whether it has a superset-repair w.r.t. Σ and if so, computes the (unique up to isomorphism) universal superset-repair of I.

Proof

We rely on known results from data exchange. We say that an instance J is a solution for an instance I w.r.t. Σ if IJ and J satisfies Σ. Thus, a superset-repair of I is a solution of I such that no strict subset is a solution for I. In [23], it was shown that if Σ is a fixed weakly acyclic set of tgds and egds, then there is a polynomial-time algorithm that, given an instance I, tests whether it has a solution w.r.t. Σ, and if so, computes a solution J satisfying the following additional properties:

  1. (i)

    For each solution J of I, there is a homomorphism h:JJ that is the identity on values from the active domain of I.

  2. (ii)

    For each solution J satisfying the above condition (i), we have that JJ .

This solution J is is known as the “core universal solution” of I [18], and is unique up to isomorphism.

Let J be the core universal solution of I w.r.t Σ. Since IJ and J satisfies Σ, it remains to show that (a) J is a superset-repair of I, i.e., there is no \(J^{\prime }\subsetneq J\) that contains I and satisfies Σ, and (b) J is a universal superset-repair of I. The first item follows immediately from the above properties (i) and (ii): any such J would satisfy condition (i), thereby contradicting the fact that J satisfies condition (ii) above. The second item follows from the fact that J satisfies condition (i) and from the preservation of conjunctive queries under homomorphisms. □

Recall from Example 3.5 that, in general, an instance might not have a 1-universal subset-repair (hence, a 1-universal ⊕-repair) w.r.t. a weakly acyclic set of tgds.

Next, we move to the consistent query answering problem. One of the key observations for obtaining an upper bound for the complexity of the ⊕-consistent answers is that the size of every ⊕-repair of an instance I is polynomial in the size of I. The proof relies on the solution-aware chase, which was introduced in the context of peer data exchange [32].

Theorem 6.2

Let Σ be a weakly acyclic set of tgds and egds. There is a polynomial p(x) such that for all instances I and for all ⊕-repairs J of I with respect to Σ, the size of J is bounded by p(|I|), where |I| is the size of I.

Proof

Let Σ be a weakly acyclic set of tgds and egds. The proof relies on the algorithm described in the proof of Lemma 3.4 in [20]. Specifically, the proof of that lemma yields a polynomial p(x) and an algorithm that works as follows. Given instances K and L such that KL and \(L \vDash {\Sigma }\), the algorithm computes an instance f(K,L) such that Kf(K,L)⊆L, \(f(K,L)\vDash {\Sigma }\), and the size of f(K,L) is bounded by p(|K|).

Fix an ⊕-repair I 0 of I. We show that the size of I 0 is bounded by p(|I|). We can use the algorithm described in Lemma 3.4 in [20] to produce an instance J:=f(II 0,I 0) satisfying the following. The size of J is bounded by p(|II 0|), II 0JI 0 and \(J \vDash {\Sigma }\). In particular, the size of J is bounded by p(|I|).

In order to show that the size of I 0 is bounded by p(|I|), it is sufficient to show that I 0=J. Since I 0 is a ⊕-repair of I, this is equivalent to proving that J⊧Σ and IJII 0.

We already know that J⊧Σ. It remains to show that IJII 0. Let R(a) be a fact in IJ. We prove that R(a) belongs to II 0. Suppose first that R(a) belongs to JI. We know that JI 0. Hence, if R(a) belongs to JI, then R(a) belongs to I 0I. This implies that R(a) belongs to II 0. Next, assume that R(a) belongs to IJ. We have to show that R(a)∈II 0. Since R(a) belongs to I, this means that we have to prove that R(a)∉I 0. Suppose for contradiction that R(a) belongs to I 0. Then R(a) belongs to II 0. Recall that II 0J. Putting this together with R(a)∈II 0, we obtain that R(a) belongs to J, which is a contradiction. □

Our main result concerning consistent query answering w.r.t. a weakly acyclic set of tgds is a \({{\Pi }_{2}^{p}}\) lower bound both for subset-repairs and for ⊕-repairs. A \({{\Pi }_{2}^{p}}\) lower bound had been obtained earlier for the problem of finding the ⊕-consistent answers w.r.t. a set of functional dependencies and universal constraints [32].

Theorem 6.3

Let Σ be a weakly acyclic set of tgds and egds and let q be a conjunctive query.

  1. 1.

    The superset-consistent query answering problem for q w.r.t. Σ is in PTIME.

  2. 2.

    The ⊕-consistent (subset-consistent) query answering problem for q w.r.t. Σ is in \({{\Pi }_{2}^{p}}\).

  3. 3.

    There is a set Σ of weakly acyclic tgds and a conjunctive query q such that both the ⊕-consistent and the subset-consistent query answering problems for q w.r.t. Σ are \({{\Pi }_{2}^{p}}\) -complete.

Proof

Part 1 is an an immediate consequence of Theorem 6.1. It also follows from Proposition 2.8 and results in [17] concerning the tractability of the certain answers of conjunctive queries in data exchange w.r.t. weakly acyclic sets of tgds and egds. Part 2 follows from Theorem 6.2 and the fact that both the ⊕-repair and the subset-repair checking problems with respect to any set of tgds and egds is in CoNP. For Part 3, we give a weakly acyclic set Σ of tgds and a Boolean conjunctive query q with the following properties: for every quantified Boolean formula ϕ of the form ∀p 1…∀p n q 1…∃q k ψ, where ψ is a conjunction of clauses containing 3 literals, we can construct in polynomial time an instance I ϕ so that the following statements are equivalent:

  1. 1.

    ϕ is true ;

  2. 2.

    subset-Con (q,I ϕ ,Σ)=⊤;

  3. 3.

    ⊕-Con(q,I ϕ ,Σ)=⊤.

Evaluating such formulas is known to be a \({{\Pi }_{2}^{p}}\)-complete problem [30].

We introduce the following tgds:

$$\begin{array}{@{}rcl@{}} \phi_{1} & =& Q(q,v) \to \exists s A(s), \\ \phi_{2} & = & A(s) \wedge Q^{\prime}(q) \to \exists v Q(q,v),\\ \phi_{3} & = & Q(q,v) \wedge Q(q,v^{\prime}) \to E(v, v^{\prime}),\\ \phi_{4} & = & P(p,v) \wedge P(p,v^{\prime}) \to E(v,v^{\prime}),\\ \phi_{a_{1}a_{2}a_{3}}(X_{1},X_{2},X_{3}) & = & R_{a_{1}a_{2}a_{3}} (x_{1},x_{2},x_{3}) \wedge A(s) \wedge \\ & & X^{\prime}_{1}(x_{1}) \wedge X^{\prime}_{2} (x_{2}) \wedge X^{\prime}_{3} (x_{3}) \to \\ & & \exists v_{1}, v_{2}, v_{3} (X_{1} (x_{1},v_{1}) \wedge X_{2} (x_{2},v_{2}) \\ & & \wedge X_{3}(x_{3},v_{3}) \wedge T_{a_{1}a_{2}a_{3}}(v_{1},v_{2},v_{3})), \end{array} $$

where a i ∈{0,1} and X i ∈{P,Q}, for i=1,2,3. We now let

$$\begin{array}{@{}rcl@{}} {\Sigma} & = & \{\phi_{1}, \phi_{2},\phi_{3},\phi_{4} \} \cup \\ & & \{ \phi_{a_{1}a_{2}a_{3}}(X_{1}, X_{2},X_{3}) \mid a_{i} \in \{ 0,1\}, X_{i} \in \{ P,Q\} \}. \end{array} $$

We also let q be the query ∃s A(s). Note that Σ is a weakly acyclic set. Indeed, the only special edges are from positions in Q , P and R; however, there is no incoming edge to a position in Q , P or R.

The intuition behind the relation symbols is as follows. The relation P encodes the universally quantified variables p 1,…,p n of ϕ, while Q encodes the existentially quantified variables q 1,…,q k of ϕ. Furthermore, P encodes truth values for p 1,…,p n , while Q encodes truth values for q 1,…,q k . We use the symbol E to simulate equality; it will consist of the tuples (0,0) and (1,1). The tgds ϕ 3 and ϕ 4 express that each variable gets assigned at most one truth value.

In what follows, we will use x,x 1,x 2,x 3,… to denote literals. If x is a positive literal, we define the inverse sign of x, denoted by i s(x), to be 0. If x is a negative literal, we define i s(x) to be 1. Thus, i s(x) denotes the opposite of the sign of the literal x. For a 1,a 2,a 3∈{0,1}, we will use a relation \(R_{a_{1}a_{2}a_{3}}\) to encode clauses of the form x 1x 2x 3 such that i s(x i )=a i . The relation \(T_{a_{1}a_{2}a_{3}}\) consists of those truth assignments that make true the clauses of the form x 1x 2x 3, where i s(x i )=a i .

The symbol A acts as guard. It is activated (that is, becomes non-empty) as soon as one variable in {q 1,…,q k } gets assigned a truth value (see the tgd ϕ 1). Once A has been activated, it makes sure that each variable in {q 1,…,q k } is assigned a truth value (as expressed by the tgd ϕ 2). Hence, either no variable in {q 1,…,q k } gets assigned a truth value or all variables in {q 1,…,q k } get assigned a truth value. Moreover, when A is non-empty, the tgds of the form \(\phi _{a_{1}a_{2}a_{3}} (X_{1},X_{2},X_{3})\) express that the truth assignment is such that each clause of the form x 1x 2x 3 occurring in ϕ is true under the truth assignment.

Consider a quantified Boolean formula ϕ of the form

$$\forall p_{1}, \dots, p_{n} \exists q_{1}, \dots, q_{k} \psi, $$

where ψ is a conjunction of clauses of the form

$$x_{i1} \vee x_{i2} \vee x_{i3}, $$

with 1≤il and x i j ∈{p a p a ,q b q b ∣1≤an,1≤bk}.

We construct now an instance I ϕ so that the following statements are equivalent:

  1. 1.

    ϕ is true ;

  2. 2.

    subset-Con (q,I ϕ ,Σ)=⊤;

  3. 3.

    ⊕-Con(q,I ϕ ,Σ)=⊤.

We define I ϕ =I by

$$\begin{array}{@{}rcl@{}}</p><p class="noindent">A^{I} & = & \{ 1 \}, \\ (Q^{\prime})^{I} & = & \{ q_{i} \mid 1 \leq i \leq k\}, \\ (P^{\prime})^{I} & = & \{ p_{i} \mid 1 \leq i \leq n\}, \\ P^{I} & = & \{ (p_{i},v) \mid 1 \leq i \leq n, v \in \{ 0,1\}\}, \\ Q^{I} & = & \{ (q_{i},v) \mid 1 \leq i \leq k, v \in \{ 0,1\}\}, \\ E^{I} & = & \{ (0,0), (1,1) \}, \\ T^{I}_{a_{1}a_{2},a_{3}} & = & \{ (v_{1}, v_{2},v_{3}) \mid \forall 1\leq i \leq 3, v_{i} \in \{ 0,1\}, \\ & & (v_{1},v_{2},v_{3}) \neq (a_{1},a_{2},a_{3})\},\\ R^{I}_{a_{1}a_{2}a_{3}} & = & \{ (x_{1},x_{2},x_{3}) \mid x_{1} \vee x_{2} \vee x_{3} \text{ is a clause of } \psi \text{ and } \\ & & \forall 1 \leq i \leq 3, {is}(x_{i}) = a_{i} \}. \end{array} $$

In order to prove that the instance I has the required properties, it suffices to show that

$$\begin{array}{@{}rcl@{}} \text{subset-Con} (q,I,{\Sigma}) = \top & \Rightarrow & \phi \text{ is true}, \end{array} $$
(4)
$$\begin{array}{@{}rcl@{}} \phi \text{ is true } & \Rightarrow & \oplus\text{-Con} (q,I,{\Sigma}) = \top . \end{array} $$
(5)

We start by showing (4).

Claim 2

If the subset-consistent answer of q is true, then ϕ is true. □

Proof of Claim

Suppose that the consistent answer of q is true and let V P :{p i ∣1≤in}→{0,1} be a valuation. We have to find a valuation V Q :{q i ∣1≤ik}→{0,1} such that ψ is true under V P V Q . We define an instance J in the following way:

$$\begin{array}{@{}rcl@{}} A^{J} & = & \emptyset, \\ (Q^{\prime})^{J} & = & (Q^{\prime})^{I}, \\ (P^{\prime})^{J} & = & (P^{\prime})^{I}, \\ P^{J} & = & \{ (p_{i},V_{P}(p_{i})) \mid 1 \leq i \leq n\}, \\ Q^{J} & = & \emptyset,\\ E^{J} & = & E^{I}, \\ T^{J}_{a_{1}a_{2}a_{3}} & = & T^{I}_{a_{1}a_{2}a_{3}},\\ R^{J}_{a_{1}a_{2}a_{3}} & = & R^{I}_{a_{1}a_{2}a_{3}}. \end{array} $$

The formulas of Σ are satisfied in J. Hence, there is a subset-repair J 0 of I such that IJ 0IJ. This implies that \((Q^{\prime })^{J_{0}} = (Q^{\prime })^{I}\), \((P^{\prime })^{J_{0}} = (P^{\prime })^{I}\), E J0=E I, T J0=T I, R J0=R I, \(Q^{J_{0}} \subseteq Q^{I}\) and \(P^{J} \subseteq P^{J_{0}}\).

As the subset-consistent answer of q is true, we have A J0. Take i∈{1,…,k}. Since A J0, \((Q^{\prime })^{J_{0}}(q_{i})\) holds and ϕ 2i is true in J 0, there exists v i such that Q J0(q i ,v i ) holds. As \(Q^{J_{0}} \subseteq Q^{I}\), v i belongs to {0,1}.

We define a valuation V Q :{q i ∣1≤ik}→{0,1} such that for all i∈{1,…,k}, we have

$$V_{Q}(q_{i}) = v_{i}. $$

We show that ψ is true under V:=V P V Q . Consider a clause x i1x i2x i3 of ψ. For all 1≤j≤3, define a j by i s(x i j ). By definition of R I, we have \(R^{I}_{a_{1}a_{2}a_{3}}(x_{i1}, x_{i2}, x_{i3})\). Since \(R^{I}_{a_{1}a_{2}a_{3}} = R^{J_{0}}_{a_{1}a_{2}a_{3}}\), we also have \(R^{J_{0}}_{a_{1}a_{2}a_{3}} (x_{i1}, x_{i2}, x_{i3})\).

If j∈{1,2,3} and x i j =p l for some 1≤ln, we define X j as P. Otherwise, we define X j as Q. By definition of (P )J and (Q )J, we have \((X_{j}^{\prime })^{J} (x_{ij})\). Since \((X_{j}^{\prime })^{J} \subseteq (X_{j}^{\prime })^{J_{0}}\), this implies \((X_{j}^{\prime })^{J_{0}} (x_{ij})\).

Putting everything together we obtain that

$$ R^{J_{0}}_{a_{1}a_{2}a_{3}} (x_{i1},x_{i2},x_{i3}) \wedge (X_{1}^{\prime})^{J_{0}}(x_{i1}) \wedge (X_{2}^{\prime})^{J_{0}} (x_{i2}) \wedge (X_{3}^{\prime})^{J_{0}} (x_{i3}). $$
(6)

Since Σ is true in J 0, \(\phi _{a_{1}a_{2}a_{3}}(X_{1}, X_{2},X_{3})\) holds in J 0. Putting this together with (6) and the fact that A J0, there exists v 1,v 2 and v 3 such that

$$X^{J_{0}}_{1} (x_{i1},v_{1}) \wedge X^{J_{0}}_{2} (x_{i2},v_{2}) \wedge X^{J_{0}}_{3}(x_{i3},v_{3}) \wedge T^{J_{0}}_{a_{1}a_{2}a_{3}} (v_{1},v_{2},v_{3}). $$

We show that for all 1≤j≤3,

$$ v_{j} = V(x_{ij}). $$
(7)

Take j∈{1,2,3}. Suppose first that X j =Q. Since ϕ 3i is true in J 0 and E J0={(0,0),(1,1)}, there is a unique v j such that Q J0(x i j ,v j ) and v j ∈{0,1}. By definition of V, we have Q J0(x i j ,V(x i j )). Hence, v j =V(x i j ). The case where X J =P is similar and this finishes the proof of (7).

Since \(T^{J_{0}}_{a_{1}a_{2}a_{3}} (v_{1},v_{2},v_{3})\) and \(T^{J_{0}}_{a_{1}a_{2}a_{3}} = T^{I}_{a_{1}a_{2}a_{3}}\), we have (v 1,v 2,v 3)≠(a 1,a 2,a 3). It follows from (7) that

$$(V(x_{i1}), V(x_{i2}), V(x_{i3})) \neq (a_{1},a_{2},a_{3}). $$

By definition a 1,a 2 and a 3, this means

$$(V(x_{i1}), V(x_{i2}), V(x_{i3})) \neq ({is}(x_{1}),{is}(x_{2}), {is}(x_{3})). $$

Together with the definition of the inverse sign i s, we obtain that x i1x i2x i3 is true under the valuation V. □

Next we prove that (5) holds.

Claim 3

If ϕ is true, then the ⊕-consistent answer of q is true.

Proof of Claim

Assume that ϕ is true and let J be a ⊕-repair of I. Suppose for contradiction that A J=. As ϕ 1 is true in J, this implies that Q J=.

We start by showing that (P )J=(P )I. Let J 1 be the instance obtained from J by replacing (P )J by (P )I. Since Σ is satisfied in J and A J=, Σ is satisfied in J 1 as well. Moreover, it follows from the definition of J 1 that IJ 1IJ. Since J is a ⊕-repair of I, this implies J=J 1. Hence, (P )J=(P )I. Similarly, we can show that (Q )J=(Q )I.

Next we prove that

$$ \text{either } E^{J}= E^{I} \text{ or } E^{J} = \{ (0,0),(1,1),(1,0),(0,1) \}. $$
(8)

In order to show (8), we start by proving that

$$ E^{I} \subseteq E^{J} \subseteq \{ (0,0),(1,1),(1,0),(0,1) \}. $$
(9)

We define J 2 as the instance obtained from J by replacing E J by (E JE I)∩{(0,0),(1,1),(1,0),(0,1)}. We can check that Σ is satisfied in J 2. Moreover, we have that IJ 2IJ. Since J is a ⊕-repair of I, this means that J=J 2. In particular,

$$E^{J} = (E^{J} \cup E^{I}) \cap \{ (0,0),(1,1),(1,0),(0,1) \}. $$

This implies that (9) holds.

Now in order to derive (8) from (9), it is sufficient to prove that E J(0,1) implies E J(1,0) and E J(1,0) implies E J(0,1). We only show that E J(0,1) implies E J(1,0); the other implication is similar.

Assume that E J(0,1). We claim that there exists 1≤in such that P J(p i ,0) and P J(p i ,1). Suppose for contradiction that there is no such a i. Let J 3 be the instance obtained from J by replacing E J by E J∖{(0,1)}. Since there is no i such that P J(p i ,0) and P J(p i ,1), ϕ 4i is true for all 1≤in. As Q J=, ϕ 3i is also true for all 1≤ik. It follows that Σ is true in J 3. As \(I \oplus J_{3} \varsubsetneq I \oplus J\), this contradicts the fact that J is a ⊕-repair of I. Hence, there exists 1≤i 0n such that \(P^{J}(p_{i_{0}},0)\) and \(P^{J}(p_{i_{0}},1)\).

Since \(P^{J}(p_{i_{0}},1) \wedge P^{J}(p_{i_{0}},0)\) holds and \(\phi _{4i_{0}}\) is true in J, we must have E J(1,0). This finishes the proof that E J(0,1) implies E J(1,0); hence the proof of (8).

  • Suppose first that E J=E I. Since for all 1≤in, ϕ 4i is true in J, there is at most one value v i ∈{0,1} such that P J(p i ,v i ). Hence, we can pick a valuation V P :{p i ∣1≤in}→{0,1} such that for all 1≤in,

    $$P^{J}(p_{i},v_{i}) \quad \text{implies} \quad v_{i} = V_{p}(p_{i}). $$

    Since ϕ is true, there exists a valuation V Q :{q i ∣1≤ik}→{0,1} such that ψ is true under the valuation V P V Q .

    Next we define the instance J 1 in the following way:

    $$\begin{array}{@{}rcl@{}} A^{J_{1}} & = & \{ 1\},\\ (P^{\prime})^{J_{1}} & = & (P^{\prime})^{I}, \\ (Q^{\prime})^{J_{1}} & = & (Q^{\prime})^{I}, \\ E^{J_{1}} & = & E^{I}, \\ P^{J_{1}} & = & P^{J},\\ Q^{J_{1}} & = & \{ (q_{i},V_{q}(q_{i})) \mid 1 \leq i \leq k\},\\ T^{J_{1}}_{a_{1}a_{2}a_{3}} & = & T^{I}_{a_{1}a_{2}a_{3}} ,\\ R^{J_{1}}_{a_{1}a_{2}a_{3}} & = & R^{I}_{a_{1}a_{2}a_{3}} . \end{array} $$

    Using the fact that ψ is true under the valuation V P V Q , it is possible to check that Σ is true in J 1. Moreover, as Q J= and A J=, we can also show that IJ 1IJ. Since J is a ⊕-repair of I, we can conclude that J=J 1. This implies that A J, which is a contradiction.

  • Suppose finally that E J={(0,0),(1,1),(1,0),(0,1)}. We define an instance J 1 in the following way:

    $$\begin{array}{@{}rcl@{}} A^{J_{1}} & = & \{ 1\},\\ (P^{\prime})^{J_{1}} & = & (P^{\prime})^{I}, \\ (Q^{\prime})^{J_{1}} & = & (Q^{\prime})^{I}, \\ E^{J_{1}} & = & E^{J} = \{ (0,0),(1,1),(1,0),(0,1) \}, \\ P^{J_{1}} & = & P^{I},\\ Q^{J_{1}} & = & Q^{I},\\ T^{J_{1}}_{a_{1}a_{2}a_{3}} & = & T^{I}_{a_{1}a_{2}a_{3}} ,\\ R^{J_{1}}_{a_{1}a_{2}a_{3}} & = & R^{I}_{a_{1}a_{2}a_{3}} . \end{array} $$

    Using the fact that E J1={(0,0),(1,1),(1,0),(0,1)}, we can check that Σ is true in J 1. Next, as E J={(0,0),(1,1),(1,0),(0,1)}, we also have IJ 1IJ. Since J is a ⊕-repair of I, this implies that J=J 1. Hence, A J, which is a contradiction.

7 Arbitrary tgds (and egds)

In this section, we obtain results about arbitrary sets of tgds that contrast sharply with our earlier results about weakly acyclic sets of tgds. We begin by pointing out that our previous techniques are not of help in the study of arbitrary sets of tgds. First, as seen in Example 3.4, there is a (non-weakly acyclic) set of LAV tgds Σ and an instance I such that I has superset-repairs w.r.t. Σ (hence also ⊕-repairs w.r.t. Σ) of arbitrarily large sizes that cannot be bounded by any function in the size of the original instance. This contrasts with the state of affairs for weakly acyclic sets of tgds and egds in Theorem 6.2. Furthermore, recall Theorem 4.6, which states that if Σ is any set of LAV tgds, then given any instance we compute in polynomial time an n-universal superset-repair w.r.t. Σ, for any fixed n≥1 (even though a universal superset-repair may not exist). But, by Proposition 3.8, in cases where Σ consists of arbitrary tgds, an n-universal superset-repairs may not be effectively computable even for n=1.

We present now the results concerning the consistent query answering problem. We begin with the subset-consistent query answering and the superset-consistent query answering problems.

Theorem 7.1

The following statements are true.

  1. 1.

    If Σ is a set of tgds and q is a conjunctive query, then the subset-consistent query answering problem for q w.r.t. Σ is in \({{\Pi }^{p}_{2}}\).

  2. 2.

    There is a set of tgds Σ and a conjunctive query q such that the subset-consistent query answering problem for q w.r.t. Σ is \({{\Pi }^{p}_{2}}\) -complete.

  3. 3.

    There is a set of tgds Σ and a conjunctive query q such that the superset-consistent query answering problem for q w.r.t. Σ is undecidable.

Proof

We already mentioned the \({{\Pi }^{p}_{2}}\) upper bound of subset-consistent query answering in Section 2; the lower bound was shown in Theorem 6.3, even for a weakly acyclic set of tgds.

As regards superset-repairs, it was shown in [28] that there is a (non-weakly acyclic) set Σ of tgds and egds such that the following problem is undecidable: given an instance I, is there an instance J⊧Σ such that IJ? Since every such instance J contains a superset-repair for I, we have that checking whether a given instance has a superset-repair with respect to Σ is an undecidable problem as well. By taking q=∃x P(x), where P is a fresh relation, it follows that the superset-consistent query answering problem is also undecidable. The egds of Σ can be eliminated by a standard construction: we add another binary relation E to the schema and extend Σ with GAV tgds stating that E is a congruence, i.e., an equivalence relation such that every other relation in the schema is closed under substitution of equals by equals with respect to the equivalence relation E. Alternatively, a proof of this undecidability result can be obtained by adapting the proof of Theorem 7.2 given later on in this section. □

We now come to the main result, which asserts that there is a set Σ of tgds and a conjunctive query q such that computing the ⊕-consistent answers of q w.r.t. Σ is an undecidable problem. As mentioned in the Introduction, this improves a result in [3] asserting that there is a set of universal first-order sentences and a universal query (in fact, the negation of a conjunctive query) such that computing the ⊕-consistent answers is an undecidable problem.

Theorem 7.2

There is a set Σ of tgds and a conjunctive query q such that the ⊕-consistent query answering problem for q w.r.t. Σ is undecidable.

The remainder of this section is devoted to the proof of Theorem 7.2. We start with the following lemmas.

Lemma 7.3

Let Σ be a set of tgds and egds. Suppose that R is a relational symbol that does not appear on the right-hand side of any tgd in Σ. Then for all ⊕-repairs J of an instance I, R J ⊆R I.

Proof

Let J be a ⊕-repair of I. We define the instance K such that R K=R JR I and for all relational symbols SR, S K=S J. Since \(J \vDash {\Sigma }\) and R never appears on the right-hand side of a tgd in Σ, Σ remains true in K. It also follows from the definition of K that IKIJ. Since J is a ⊕-repair, this implies that K=J. In particular, R J is equal to R K(=R JR I). That is, R JR I. □

Lemma 7.4

Let Σ be a set of tgds and let q be a query of the form

$$ q_{1} \vee {\dots} \vee q_{k} \vee \neg r_{1} \vee {\dots} \vee \neg r_{l}, $$
(10)

where q 1 ,…,q k ,r 1 ,…,r l are conjunctive queries over a schema R. There is a set Σ of tgds and a conjunctive query q , both over a schema R R, such that for every instance I over R, we can compute in polynomial time an instance I over R satisfying

$$\oplus \text{-Con} (q, I,{\Sigma}) = \oplus \text{-Con} (q^{\prime},I^{\prime},{\Sigma}^{\prime}). $$

Proof

Suppose that q is a query of the form

$$q_{1} \vee {\dots} \vee q_{k} \vee \neg r_{1} \vee {\dots} \vee \neg r_{l}, $$

where q 1,…,q k ,r 1,…,r l are conjunctive queries.

Let A and B be fresh relation symbols. We define R as the schema R∪{A,B}. For all 1≤ik, if q i =∃x i ϕ i (x i ), we let t i be the tgd given by

$$\phi_{i} (\mathbf{x_{i}}) \to \exists s B(s). $$

Next, if r i =∃y i ψ i (y i ) for 1≤il, we define t as the tgd

$$A(s) \wedge \psi_{1} (\mathbf{y}_{1}) \wedge {\dots} \wedge \psi_{l} (\mathbf{y}_{l}) \to \exists u B(u), $$

with s not occurring in y 1,…,y l . We define Σ as the set

$${\Sigma} \cup \{ t, t_{i} \mid 1 \leq i \leq k\} $$

and we let q be the conjunctive query ∃s A(s). Let I be an instance over R. We define I such that \(A^{I^{\prime }} = \{ 1\}\), \(B^{I^{\prime }} = \emptyset \) and for all RR, \(R^{I^{\prime }} = R^{I}\). We have to show that

$$ \oplus\text{-Con} (q, I,{\Sigma}) = \top \text{ iff } \quad \oplus\text{-Con} (q^{\prime},I^{\prime},{\Sigma}^{\prime}) = \top. $$
(11)

We start by proving the direction from left to right of (11). Assume that ⊕-Con(q,I,Σ) is true. Let J be a ⊕-repair of I with respect to Σ. We have to show that ∃s A(s) is true in J . We define \(J^{\prime }_{0}\) as the instance obtained from J by adding the fact A(1). It suffices to show that \(J^{\prime } = J^{\prime }_{0}\).

Let J be the instance over R such that for all RR, \(R^{J} = R^{J^{\prime }}\). J is a ⊕-repair of I with respect to Σ. Since ⊕-Con(q,I,Σ) is true, q is true in J. Therefore, q is also true in J .

We show that this implies that t is true in \(J^{\prime }_{0}\). We make the following case distinction:

  • Suppose that for some 1≤ik, q i is true in J . Hence, there exists a such that ϕ i (a) holds in J . Since Σ is true in J , t i given by

    $$\phi_{i} (\mathbf{x_{i}}) \to \exists s B(s) $$

    is true in J . It follows that ∃s B(s) is true in J ; hence, in \(J^{\prime }_{0}\). This implies that the tgd t is true in \(J^{\prime }_{0}\).

  • Next suppose that for some 1≤in, ¬r i is true in J . Since r i is false in J , r i is also false in \(J^{\prime }_{0}\). So there are no y i such ψ i (y i ) holds in \(J^{\prime }_{0}\). Hence, the left-hand side of t is never satisfied in \(J^{\prime }_{0}\). In particular, t is true in \(J^{\prime }_{0}\).

Since A does not occur in Σ∪{t i ∣1≤ik} and that set of tgds is true in J , Σ∪{t i ∣1≤ik} is also true in \(J^{\prime }_{0}\). Together with \(J^{\prime }_{0} \vDash t\), we obtain \(J^{\prime }_{0} \vDash {\Sigma }^{\prime }\). It is clear that \(I^{\prime } \oplus J^{\prime }_{0} \subseteq I^{\prime } \oplus J^{\prime }\). Since J is a ⊕-repair of I with respect to Σ, this implies \(J^{\prime }_{0} = J^{\prime }\) and this finishes the proof that q is true in J .

Next we prove the implication from right to left of (11). Assume that ⊕-Con(q ,I ) is true. Suppose for contradiction that q is false in a ⊕-repair J of I with respect to Σ. Let J be the instance such that \(A^{J^{\prime }} = \emptyset \), \(B^{J^{\prime }} = \emptyset \) and for all RR, \(R^{J^{\prime }} = R^{J}\). In order to finish the proof, it suffices to show that J is a ⊕-repair of I with respect to Σ. Indeed, q is clearly false in J , which contradicts the fact that ⊕-Con(q ,I ) is true.

Now we prove that J is a ⊕-repair of I with respect to Σ. There exists a ⊕-repair K of I with respect to Σ such that IK IJ . We prove that J =K . Let K be the instance over R such that for all RR, \(R^{K} = R^{K^{\prime }}\). K is a ⊕-repair of I with respect to Σ and IKIJ. Since J is a ⊕-repair of I with respect to Σ, K=J. It follows that for all RR, \(R^{J^{\prime }} = R^{K^{\prime }}\).

In order to prove that J =K , it remains to show that \(B^{J^{\prime }} = B^{K^{\prime }}\) and \(A^{J^{\prime }} = A^{K^{\prime }}\). Since \(B^{J^{\prime }} = B^{I^{\prime }}\) and I K I J , we have \(B^{K^{\prime }} = B^{I^{\prime }} = B^{J^{\prime }} \).

Suppose for contradiction that \(A^{J^{\prime }} \neq A^{K^{\prime }}\). Since \(A^{J^{\prime }} = \emptyset \), \(A^{I^{\prime }} = \{1 \}\) and IK IJ , this can only happen if \(A^{K^{\prime }} = \{ 1\}\). Recall that q is false in J. In particular, for all 1≤il, r i is true in J. That is, for all 1≤il, there exists a tuple b i such that ψ i (b i ) is true in J; hence, in J and in K . This implies that

$$A(1) \wedge \psi_{1} (\mathbf{b}_{1}) \wedge {\dots} \wedge \psi_{l} (\mathbf{b}_{l}) $$

is true in K . Since K is a ⊕-repair with respect to Σ, t given by

$$A(s) \wedge \psi_{1} (\mathbf{y}_{1}) \wedge {\dots} \wedge \psi_{l} (\mathbf{y}_{l}) \to \exists u B(u), $$

is true in K . Therefore, ∃u B(u) is true in K . This contradicts the fact that \(B^{K^{\prime }} = \emptyset \). □

Note that Lemma 7.4 remains true if we consider subset-repairs, instead of ⊕-repairs. In the case of superset-repairs, we do not know whether the lemma holds.

Theorem 7.2 will be proved via a reduction from the halting problem for two-register machines. In describing two-register machines, we follow the definition used in [6].

A two-register machine (2RM) is similar to Turing machines, except that, instead of a tape, it has two registers r 1,r 2. Each register contains a natural number. A 2RM is programmed by a numbered sequence α 0,…,α l of instructions. Each instruction α i is either an addition or a subtraction. An addition has the form +(r g,j), with r g a register number and jl an instruction number. Its semantics is: add one to register r g and move to instruction α j . A subtraction has the form −(r g,j,k) with r g a register number and j,kl instruction numbers. Its semantics is: if content of register r g is zero, then move to instruction α j ; otherwise, subtract one from register r g and move to instruction α k .

An instantaneous description (ID) of a 2RM M is a triple (s,t,i) with il an instruction number and m,n≥0 natural numbers representing the content of the registers r 1 and r 2. The unique successor of an ID (s,t,i) is the ID (s ,t ,i ) such that

  • if α i =+(1,j), then s =s+1, t =t, i =j;

  • if α i =+(2,j), then s =s, t =t+1, i =j;

  • if α i =−(1,j,k) and s≠0, then s =s−1, t =t, i =j;

  • if α i =−(1,j,k) and s=0, then s =s, t =t, i =k;

  • if α i =−(2,j,k) and t≠0, then s =s, t =t−1, i =j;

  • if α i =−(2,j,k) and t=0, then s =s, t =t, i =k.

The ID (0,0,l) is called final. If M=(α 0,…,α l ) is a 2RM, then the run of M is the sequence (D i ) i≥1 of ID’s such that D 0=(0,0,0) and D i+1 is the successor of D i , for all i≥1. The run is halting if it contains the ID (0,0,l). The halting problem for 2RM is to determine whether the run of a given 2RM is halting.

Theorem 7.5

[8] The halting problem for two-register machines is undecidable.

In the proof of Theorem 7.2, we also make use of the following property of 2RMs.

Lemma 7.6

Given a 2RM M=(α 0 ,…,α l ), we can compute a 2RM M such that the successor of the ID (0,0,l) in M is itself and

$$\text{the run of } M \text{ is halting} \quad \text{iff} \quad \text{the run of } M^{\prime} \text{ is halting}. $$

Proof

Let M=(α 0,…,α l ) be a 2RM. We define M =(β 0,…,β l+4) as the following 2RM:

$$\begin{array}{@{}rcl@{}} \beta_{i} & = & \alpha_{i},\\ \beta_{l} & = & - (1, l+2,l+4),\\ \beta_{l+1} & = & \alpha_{l}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \beta_{l+2} & = & + (1,l+1),\\ \beta_{l+3} & = & + (2,l+1),\\ \beta_{l+4} & = & - (2,l+3,l+4), \end{array} $$

where 0≤i<l. By definition of M , the successor of ID (0,0,l+4) is (0,0,l+4). Hence, it remains to show that the run of M is halting iff the run of M is halting. This is implied by the following property: for all IDs a and b such that a≠(0,0,l) and b is the successor of a in M, there exists a sequence a 0a n of ID such that

$$ a_{0} = a, a_{n} = b \text{ and } a_{i+1} \text{ is the successor of the ID } a_{i} \text{ in } M^{\prime}, $$
(12)

where 1≤i<n.

Let b be the successor in M of an ID a=(s,t,i) such that a≠(l,0,0). We prove the existence of a sequence a 0a n satisfying (12). If il, then the sequence a 0 a 1 with a 0=a and a 1=b satisfies (12).

Next suppose that i=l. We make the following case distinction. Assume first that s≠0. Then the successor ID of a=(s,t,l) in M is (s−1,t,l+2). It also follows from the definition of M that the successor ID of (s−1,t,l+2) in M is (s,t,l+1). Since β l+1=α l , the successor ID of (s,t,l+1) in M is b. Hence, the sequence a 0a 3 with a 0=a, a 1=(s−1,t,l+2), a 2=(s,t,l+1) and a 3=b satisfies (12).

Assume now that s=0. Since a≠(0,0,l), this means that t≠0. By definition of M , the successor ID of a=(0,t,l) in M is a 1=(0,t,l+4). Since t≠0, the successor ID of a 1 in M is a 2=(0,t−1,l+3). Next, the successor ID of a 2 in M is a 3=(0,t,l+1). Since β l+1=α l , the successor ID of (0,t,l+1) in M is b. The sequence a 0a 4 with a 0=a and a 4=b satisfies (12). □

We now proceed with the proof of Theorem 7.2.

Proof of Theorem 7.2

We shall give a reduction from the halting problem for 2RMs. For this, we define a set Σ of tgds and a conjunctive query q such that for every 2RM M, we can compute an instance I such that

$$ \text{the run of} M \text{is halting}\,\Longleftrightarrow \oplus\text{-Con} (q^{\prime},I^{\prime},{\Sigma}^{\prime}) \neq \top . $$
(13)

The idea is that each ⊕-repair of I M encodes the run of a subset of the instructions in M (due to the symmetric difference semantics, we cannot prevent instructions from being dropped). The run of M will be halting if and only if some ⊕-repair of I M contains the final ID of M. Hence, if q is a conjunctive query expressing that the final ID occurs, then the run of M is halting if and only if q is true in some ⊕-repair of I. That is, the run of M is halting if and only if ⊕-Con(¬q,I M,Σ)≠⊤. The problem is that ¬q is not a conjunctive query, and this is where we use Lemma 7.4.

By Lemma 7.4, in order to find Σ and q satisfying (1), it suffices to find a set Σ of tgds and a query q of the form 10 (as stated in Lemma 7.4) such that for all 2RM M, we can compute an instance I satisfying

$$ \text{the run of }M \text{ is halting} \quad \Longleftrightarrow \quad \oplus \text{-Con}(q, I,{\Sigma}) \neq \top. $$
(14)

Let Σ be the set of the following tgds:

$$\begin{array}{@{}rcl@{}} \phi_{1}^{+} & = & S(s,t,i) \wedge R_{1,+} (i,j) \to \exists s^{\prime} {Succ}(s,s^{\prime}) \\ \psi_{1}^{+} & = & S(s,t,i) \wedge R_{1,+} (i,j) \wedge {Succ}(s,s^{\prime}) \to S(s^{\prime},t,j) \\ \phi_{2}^{+} & = & S(s,t,i) \wedge R_{2,+} (i,j) \to \exists t^{\prime} {Succ}(t,t^{\prime}) \\ \psi_{2}^{+} & = & S(s,t,i) \wedge R_{2,+} (i,j) \wedge {Succ}(t,t^{\prime}) \to S(s,t^{\prime},j) \\ \phi_{1}^{-} & = & S(s,t,i) \wedge R_{1,-} (i,j,k) \wedge {Succ}(s^{\prime},s) \to S(s^{\prime},t,j) \\ \phi_{1}^= & = & S(s,t,i) \wedge R_{1,-} (i,j,k) \wedge {Zero} (s) \to S(s,t,k) \\ \phi_{2}^{-} & = & S(s,t,i) \wedge R_{2,-} (i,j,k) \wedge {Succ}(t^{\prime},t) \to S(s,t^{\prime},j) \\ \phi_{2}^= & = & S(s,t,i) \wedge R_{2,-} (i,j,k) \wedge {Zero} (t) \to S(s,t,k) \\ \phi_{1}^{{Anc}} & = & {Anc} (u,s) \wedge {Succ} (s,s^{\prime}) \to {Anc} (u,s^{\prime}) \\ \phi_{2}^{{Anc}} & = & {Succ} (s,s^{\prime}) \to {Anc} (s,s^{\prime}) \\ \phi_{f} & = & {Final} (j) \wedge S(s,t,j) \wedge {Zero} (s) \wedge {Zero} (t) \to \exists s F(s) \end{array} $$

Let q be the query ∃s A n c(s,s)∨¬(∃s F(s)). The intuition behind the relation symbols is as follows. We use S to encode IDs: the fact S(s,t,i) corresponds to the ID where s is the content of the first register, t is the content of the second register and i is an instruction number. We use the relation S u c c to encode the successor relation on an initial segment of the natural numbers, while A n c encodes the strict natural ordering on that initial segment. The relation A n c is the transitive closure of S u c c, as expressed by the tgds \(\phi _{1}^{{Anc}}\) and \(\phi _{2}^{{Anc}}\). We represent the initial and the final instructions of the machine with the relations Z e r o and F i n a l. The relation Z e r o will consist of the singleton 0, while F i n a l will consist of the singleton l (where l+1 is the number of instructions of the machine).

The relations R 1,+, R 1,−, R 2,+ and R 2,− encode the instructions of the machines. A fact R n,+(i,j) expresses that the instruction α i is equal to +(n,j). Similarly, a fact R n,−(i,j,k) expresses that the instruction α i is the subtraction −(n,j,k).

The tgd \(\phi _{1}^{+}\) expresses that if the ID is (s,t,i) and instruction α i needs to add one to the first register, then the initial segment of the natural numbers that we consider should at least contain the successor of s. The tgd \(\psi _{1}^{+}\) expresses that if the ID is (s,t,i), the instruction α i is +(1,j), and s is equal to s+1, then we move to the ID (s ,t,j). The tgds \(\phi _{2}^{+}\) and \(\psi _{2}^{+}\) have similar meanings.

The tgd \(\phi _{1}^{-}\) expresses that if the ID is (s,t,i), the instruction α i is −(1,j,k), and s is equal to s−1 (in particular, s≠0), then we move to the ID (s ,t,j). The tgd ϕ1= expresses that if the ID is (s,t,i), the instruction α i is +(1,j) and s is equal to 0, then we move to the ID (s,t,k).

Finally, F acts as an indicator for termination. It becomes non-empty, when the instance contains an ID of the form (0,0,l). This is expressed by the tgd ϕ f . The query q is false in an instance if and only if the relation F is non-empty and the ancestor relation does not contain any tuple of the form (s,s). Intuitively, this means that the instance contains an ID of the form (0,0,l) and that the successor relation does not contain any cycle.

We associate an instance I M with each 2RM. Let M=(α 0,…,α l ) be a 2RM. By Lemma 7.6, we may assume that the successor of the ID (0,0,l) is itself. We define an instance I:=I M as follows:

$$\begin{array}{@{}rcl@{}} S^{I} & = & \{ (0,0,0) \}, \\ R_{1,+}^{I} & = & \{ (i,j) \mid \alpha_{i} = +(1,j) \},\\ R_{1,-}^{I} & = & \{ (i,j,k) \mid \alpha_{i} = -(1,j,k) \},\\ R_{2,+}^{I} & = & \{ (i,j) \mid \alpha_{i} = +(2,j) \},\\ R_{2,-}^{I} & = & \{ (i,j,k) \mid \alpha_{i} = -(2,j,k) \}, \\ {Zero}^{I} & = & \{ 0\}, \\ {Final}^{I} & =& \{ l \},\\ {Succ}^{I} & = & \emptyset,\\ {Anc}^{I} & = & \emptyset,\\ F^{I} & = & \emptyset,\\ B^{I} & = & \emptyset. \end{array} $$

We start by proving the implication from left to right of (14). Assume that the run of M is halting. Let (s n ,t n ,i n ) nω be the run of M. There exists m such that (s m ,t m ,i m )=(0,0,l). Let n 0 be the maximum of the set {s n ,t n ∣0≤nm}. We define an instance J 0 as follows

$$\begin{array}{@{}rcl@{}} S^{J_{0}} & = & \{ (s_{n},t_{n},i_{n}) \mid 0 \leq n \leq m \},\\ R_{1,+}^{J_{0}} & = & R_{1,+}^{I},\\ R_{1,-}^{J_{0}} & = &R_{1,-}^{I} ,\\ R_{2,+}^{J_{0}} & = & R_{2,+}^{I},\\ R_{2,-}^{J_{0}} & = & R_{2,-}^{I}, \\ {Zero}^{J_{0}} & = & \{ 0\}, \\ {Final}^{J_{0}} & =& \{ l \},\\ {Succ}^{J_{0}} & = & \{ (s,s+1) \mid 0 \leq s< n_{0}\},\\ {Anc}^{J_{0}} & = & \{ (s,t) \mid 0 \leq s < t \leq n_{0} \},\\ F^{J_{0}} & = & \{ 1\},\\ B^{J_{0}} & = & \emptyset. \end{array} $$

The query q is false in J 0. Hence, if J 0 is a ⊕-repair, this implies that ⊕-Con(¬q,I,Σ)≠⊤.

We prove that J 0 is a ⊕-repair. It is a simple exercise to check that Σ is true in J 0. Next let K 0 be an instance such that \(K_{0} \vDash {\Sigma }\) and IK 0IJ 0. We have to prove that J 0=K 0.

First, for all relational symbols T∈{R 1,+,R 1,−,R 2,+,R 2,−,Z e r o,F i n a l}, we have T J0=T I. Since IK 0IJ 0, this implies that for all T∈{R 1,+,R 1,−,R 2,+,R 2,−,Z e r o,F i n a l},

$$T^{K_{0}} = T^{I} = T^{J_{0}}. $$

Next, we show that S J0=S K0 and \({Succ}^{J_{0}} = {Succ}^{K_{0}}\). Since \(S^{I} \subseteq S^{J_{0}}\), \({Succ}^{I} \subseteq S^{J_{0}}\) and IK 0IJ 0, we have

$$ S^{K_{0}} \subseteq S^{J_{0}} \text{ and } {Succ}^{K_{0}} \subseteq {Succ}^{J_{0}}. $$
(15)

Next we prove by induction on n that for all 0≤nm,

$$\begin{array}{@{}rcl@{}} && (s_{n},t_{n},i_{n}) \in S^{K_{0}} \text{ and } \\ && \{ (s,s+1) \mid 0 \leq s+1 \leq f(n)\} \subseteq {Succ}^{K_{0}}, \end{array} $$
(16)

where f(n) is the maximum of the set {s j ,t j ∣0≤jn}. Suppose first that n=0. Recall that (s 0,t 0,i 0)=(0,0,0). Since (0,0,0) belongs to \(S^{I} \cap S^{J_{0}}\) and IK 0IJ 0, (0,0,0) also belongs to S K0.

For the induction step, suppose that (16) holds. The ID (s n+1,t n+1,i n+1) is the unique successor of (s n ,t n ,i n ).

  • Suppose that i n is an addition. Without loss of generality, we may assume that i n is of the form +(1,j). Hence, (s n+1,t n+1,i n+1) is equal to (s n +1,t n ,j). First, we show that

    $$ (s,s+1) \mid 0 \leq s+1 \leq f(n+1)\} \subseteq {Succ}^{K_{0}}. $$
    (17)

    Since t n+1=t n and s n+1=s n +1, it follows from the definition of f(n) and f(n+1) that either f(n+1)=f(n) or f(n+1)=s n +1. If f(n+1)=f(n), then (17) follows immediately from the induction hypothesis.

    Suppose next that f(n+1)=s n +1. Let s be such that 0≤s+1≤s n +1. We have to show that (s,s+1) belongs to S u c c K0. If s+1≤s n , then by definition of s n , s+1≤f(n). By induction hypothesis, (s,s+1) belongs to S u c c K0. Next suppose that s+1=s n +1. We have to show that (s n ,s n +1) belongs to S u c c K0.

    By induction hypothesis, S(s n ,t n ,i n ) belongs to K 0. Moreover, by definition of \(R_{1,+}^{J_{0}}\), R 1,+(i n ,j) belongs to J 0. We observed earlier that \(R_{1,+}^{J_{0}} = R_{1,+}^{K_{0}}\). Hence, R 1,+(i n ,j) belongs to K 0. Hence,

    $$S(s_{n},t_{n},i_{n}) \wedge R_{1,+}(i_{n},j) $$

    is true in K 0. The formula \(\phi _{1}^{+}\) given by

    $$S(s,t,i) \wedge R_{1,+}(i,j) \to \exists s^{\prime} {Succ} (s,s^{\prime}), $$

    is true in K 0. Hence, there is s such that S u c c(s n ,s ) belongs to K 0. Recall that \({Succ}^{K_{0}} \subseteq {Succ}^{J_{0}}\). By definition of S u c c J0, \({Succ}^{K_{0}}(s_{n},s^{\prime })\) implies s =s n +1. This means that (s n ,s n +1) belongs to S u c c K0. This finishes the proof of (17)

    Next we prove that S(s n+1,t n+1,i n+1) belongs to K 0. We showed in the previous paragraph that

    $$S(s_{n},t_{n},i_{n}) \wedge R_{1,+}(i_{n},j) \wedge {Succ} (s_{n},s_{n}+1) $$

    is true in K 0. Recall that \(\psi _{1}^{+}\) given by

    $$S(s,t,i) \wedge R_{1,+}(i,j) \wedge {Succ} (s,s^{\prime}) \to S(s^{\prime},t,j) $$

    is true in K 0. It follows that S(s n +1,t n ,j) belongs to K 0. That is, S(s n+1,t n+1,i n+1) belongs to K 0.

  • Suppose that i n is a subtraction. Without loss of generality, we may assume that i n is of the form −(1,j,k).

    Since i n is a subtraction, f(n+1)=f(n). Hence, using the induction hypothesis, we obtain

    $$\{(s,s+1) \mid 0 \leq s+1 \leq f(n+1)\} \subseteq {Succ}^{K_{0}}. $$

    Next we prove that (s n+1,t n+1,i n+1) belongs to K 0. By definition of \(R_{1,-}^{J_{0}}\), R 1,−(i n ,j,k) belongs to J 0. Recall that \(R_{1,-}^{J_{0}} = R_{1,-}^{K_{0}}\). Hence, R 1,−(i n ,j,k) belongs to K 0.

    Suppose first that s n ≠0. Therefore, (s n+1,t n+1,i n+1) is equal to (s n −1,t n ,j). By induction hypothesis, S(s n ,t n ,i n ) belongs to K 0 and (s n −1,s n ) belongs to S u c c K0. It follows that

    $$S(s_{n},t_{n},i_{n}) \wedge R_{1,-} (i_{n},j,k) \wedge {Succ} (s_{n} -1,s+n) $$

    holds in K 0. Since \(\phi _{1}^{-}\) given by

    $$S(s,t,i) \wedge R_{1,-} (i,j,k) \wedge {Succ} (s^{\prime},s) \to S(s^{\prime},t,j) $$

    is true in K 0, S(s n −1,t n ,j) belongs to K 0. That is, (s n+1,t n+1,i n+1) belongs to K 0.

    Next assume that s n =0. Hence, (s n+1,t n+1,i n+1) is equal to (0,t n ,k). By definition of Z e r o J0, Z e r o(0) holds in J 0. Together with \({Zero}^{J_{0}} ={Zero}^{K_{0}}\), we obtain Z e r o K0(0). Therefore,

    $$S(0,t_{n},i_{n}) \wedge R_{1,-}(i_{n},j,k) \wedge {Zero} (0) $$

    holds in K 0. Since ϕ1= given by

    $$S(s,t,i) \wedge R_{1,-} (i,j,k) \wedge Q_{0} (s) \to S(s,t,k) $$

    is true in K 0, S(0,t n ,k) belongs to K 0. That is, (s n+1,t n+1,i n+1) belongs to K 0.

This finishes the proof that (16) holds for all 0≤nm.

It follows from (16) that \(S^{J_{0}} \subseteq S^{K_{0}}\) and \({Succ}^{J_{0}} \subseteq {Succ}^{K_{0}}\). Together with (15), we obtain S J0=S K0 and \({Succ}^{J_{0}} = {Succ}^{K_{0}}\).

Next we show that \({Anc}^{K_{0}} = {Anc}^{J_{0}}\). Recall that \(\phi _{1}^{{Anc}}\) and \(\phi _{2}^{{Anc}}\) are true in K 0. Together with \({Succ}^{J_{0}} = {Succ}^{K_{0}}\), we obtain that \({Anc}^{J_{0}} \subseteq {Anc}^{K_{0}}\). Since A n c I= and IK 0IJ 0, this implies \({Anc}^{J_{0}} = {Anc}^{K_{0}}\).

In order to finish the proof that K 0=J 0, it remains to show that F K0=F J0. We established earlier that S K0=S J0. In particular, (s m ,t m ,i m )=(0,0,l) belongs to S K0. We also proved that \({Zero}^{K_{0}} = {Zero}^{J_{0}} = \{ 0\}\) and \({Final}^{K_{0}} = {Final}^{J_{0}} = \{ l\}\). Hence,

$${Final} (l) \wedge S(0,0,l) \wedge {Zero}(0) \wedge {Zero} (0) $$

holds in K 0. The formula ϕ f given by

$${Final}(j) \wedge S(s,t,j) \wedge {Zero} (s) \wedge {Zero}(t) \to \exists s F(s) $$

is true in K 0. This implies that F K0. Moreover, since \(F^{I} \subseteq F^{J_{0}}\) and IK 0IJ 0, we have \(F^{K_{0}} \subseteq F^{J_{0}}\). Recall that F J0 is a singleton. Together with F K0, we obtain F K0=F J0.

We prove the implication from right to left of (14). Suppose that the consistent answers of q with respect to I and Σ is false. Hence, there is an instance J that is a ⊕-repair of I and such that q is false in J. That is,

$$ J \vDash \neg \exists s {Anc} (s,s) \quad \text{and} \quad F^{J} \neq \emptyset. $$
(18)

We start by proving

$$\begin{array}{@{}rcl@{}} & R^{J}_{i,j} \subseteq R^{I}_{i,j} , \end{array} $$
(19)
$$\begin{array}{@{}rcl@{}} & {Zero}^{J} = {Zero}^{I} = \{ 0 \}, \end{array} $$
(20)
$$\begin{array}{@{}rcl@{}} & {Final}^{J} = {Final}^{I} = \{ l\} \end{array} $$
(21)
$$\begin{array}{@{}rcl@{}} & (0,0,0) \in S^{J} . \end{array} $$
(22)

Inclusion (19) follows from Lemma 7.3. Next we prove (20); that is, Z e r o J=Z e r o I. The inclusion Z e r o JZ e r o I follows from Lemma 7.3. Since Z e r o I={0}, Z e r o J= or Z e r o J=Z e r o I. Suppose for contradiction that Z e r o J=. Let J 3 be the instance such that F J3=, and, for all relational symbols R distinct from F, R J3=R J. Since Z e r o J=, the set Σ remains true in R J3. Moreover, IJ 3IJ. Since J is a ⊕-repair, J 3=J. In particular, F J=. This contradicts (18) and finishes the proof of (20).

The proof that (21) holds is similar to the proof of (20). In order to prove (22), suppose for contradiction that (0,0,0) does not belong to S J. Let J 2 be the instance such that S J2=, F J2= and for all relational symbols R distinct from S and F, R J2=R J. We can check that Σ is true in J 2 and IJ 2IJ. Hence, J=J 2. In particular, F J=. This contradicts (18). Therefore, (0,0,0) belongs to S J. This finishes the proof of (19), (20), (21) and (22).

The proof of the implication from right to left of (14) is basically divided in two parts. First we prove that we may identify the relation S u c c with the successor relation on a well-chosen initial segment of the natural numbers. Second, we show how we may associate the run of M with the facts of the form S(s,t,i).

We start by associating a natural number with each value occurring in S. Let N be the set {s∣∃s ,S J(s,s ) or S J(s ,s)}. We pick a map f:NN∪{⊥} such that for all sN,

  • if there is s such that S u c c J(s,s ), then S u c c J(s,f(s)),

  • otherwise, f(s)=⊥.

We show that there is a smallest natural number n 0 such that f n0+1(0)=⊥ where f n0+1(0) is obtained by applying f (n 0+1) times to 0. Suppose for contradiction that there is no n such that f n(0)=⊥. Since J is finite, there is a sequence s 1,…,s n such that for all 1≤in−1, S u c c I(s i ,s i+1) and S u c c(s n ,s 1). Since \(\phi _{1}^{{Anc}}\) and \(\phi _{2}^{{Anc}}\) are true in J, we obtain that A n c J(s 1,s 1). This contradicts (18). Hence, there is a smallest natural number n 0 such that f n0+1(0)=⊥.

We define U as the set {f n(0)∣0≤nn 0}. Without loss of generality, we may identify the element f n(0) (where 0≤nn 0) with the natural number n. By definition of f and since \(\phi _{1}^{{Anc}}\) and \(\phi _{2}^{{Anc}}\) are true in J, we have

$$\begin{array}{@{}rcl@{}} \{ (s,s+1) \mid 0 \leq s < n_{0}\} & \subseteq & {Succ}^{J} \end{array} $$
(23)
$$\begin{array}{@{}rcl@{}} \{ (s,t) \mid 0 \leq s < t \leq n_{0} \} & \subseteq & {Anc}^{J} \end{array} $$
(24)

Claim 4

  1. (a)

    S J={(s,t,i)∈S J∣0≤s,tn 0},

  2. (b)

    S u c c J={(s,s+1)∣0≤s<n 0},

  3. (c)

    A n c J={(s,t)∣0≤s<tn 0}.

Proof of Claim

Let J 5 be the instance such that S J5 is equal to {(s,t,i)∈S J∣0≤s,tn 0}, S u c c J5 is equal to {(s,s+1)∣0≤s<n 0}, A n c J5 is equal to {(s,t)∣0≤s<tn 0} and for all relational symbols R∉{S,S u c c,A n c}, R J5=R J. In order to prove (a), (b) and (c), we have to show that J=J 5.

First, we show that Σ is true in J 5.

By definition of S u c c J5 and A n c J5, the formulas \(\phi _{1}^{{Anc}}\) and \(\phi _{2}^{{Anc}}\) are true in J 5.

Next, as F J5=F J and F J (see (18)), ϕ f is also true in J 5. Now we prove that \(\phi _{1}^{-}\) is true in J 5. Suppose that S J5(s,t,i), \(R^{J_{5}}_{1,-} (i,j,k)\) and \({Succ}^{J_{5}} (s^{\prime },s)\) hold. We have to show that \(S^{J_{5}} (s^{\prime },t,j)\) holds. By definition of J 5 and by inclusion (23), this implies S J(s,t,i), \(R^{J}_{1,-} (i,j,k)\) and S u c c J(s ,s). Since \(\phi _{1}^{-}\) is true in J, we have S J(s ,t,j). Now, as S J5(s,t,i) and \({Succ}^{J_{5}} (s^{\prime },s)\), s and t are natural numbers between 0 and n 0. Putting that together with the fact that S J(s ,t,j), we obtain \(S^{J_{5}} (s^{\prime },t,j)\). This finishes the proof that \(\phi _{1}^{-}\) is true in J 5. The proofs that ϕ1=, \(\phi _{2}^{-}\), ϕ2=, \(\psi _{1}^{+}\) and \(\psi _{2}^{+}\) are true in J 5 are similar.

Next we prove that \(\phi _{1}^{+}\) is true in J 5. Suppose that S J5(s,t,i) and \(R^{J_{5}}_{1,+} (i,j)\) hold. In particular, s and t belong to {n∣0≤nn 0}. We have to find s such that \({Succ}^{J_{5}} (s,s^{\prime })\). By definition of J 5, S J5(s,t,i) and \(R^{J_{5}}_{1,+} (i,j)\) imply S J(s,t,i) and \(R^{J}_{1,+} (i,j)\). Since \(\phi _{1}^{+}\) is true in J, there exists s such that S u c c J(s,s ). The problem is that there is no guarantee that s is a natural number between 0 and n 0.

Since s has a successor s in J, we know by definition of f and n 0 that s<n 0. Hence, S u c c J5(s,s+1). By (23), this implies S u c c J(s,s+1). We may conclude that \(\phi _{1}^{+}\) is true in J 5. The proof that \(\phi _{2}^{+}\) is true in J 5 is similar. This finishes the proof of the Claim.

It follows from the definition of J 5 and inclusions (23) and (24) that IJ 5IJ. Since J is a ⊕-repair, this means that J=J 5. Hence, (a), (b) and (c) hold. □

The next step is to show how the relation S encodes the run of M. For that purpose, we introduce the notion of reachability. We say that a tuple S(s ,t ,j) in J is 1-step reachable from a tuple S(s,t,i) in J if

  • either \(R^{J}_{1,+} (i,j)\), s =s+1 and t=t ,

  • or \(R^{J}_{2,+}(i,j)\), t =t+1 and s=s ,

  • or \(R^{J}_{1,-}(i,j,k)\) for some k, s =s−1 and t=t ,

  • or \(R^{J}_{1,-} (i,k,j)\) for some k, Z e r o(s), s=s and t=t ,

  • or \(R^{J}_{2,-}(i,j,k)\) for some k, t =t−1 and s=s ,

  • or \(R^{J}_{2,-} (i,k,j)\) for some k, Z e r o(s), s=s and t=t .

Since Z e r o J={0} and \(R^{J}_{i,j} \subseteq R^{I}_{i,j}\) (where i∈{1,2} and j∈{+,−}), we obtain

$$\begin{array}{@{}rcl@{}} && \text{if} S(s^{\prime},t^{\prime},j) \text{is 1-step reachable from} S(s,t,i), \\ && \text{the successor of the ID}\ (s,t,i)\ \text{is}\ (s^{\prime},t^{\prime},i^{\prime}). \end{array} $$
(25)

A tuple S(j,s ,t ) is reachable from a tuple S(s,t,i) if the pair (S(s,t,i),S(j,s ,t )) belongs to the transitive closure of the 1-step reachability relation.

Now let m 0 be the maximum of the set

$$\{ s,t \mid S(s,t,i) \text{ is reachable from } S(0,0,0)\}. $$

We show that

  • S J={S(s,t,i)∣S(s,t,i) is reachable from S(0,0,0)},

  • S u c c J={(s,s+1)∣0≤s<m 0},

  • A n c J={(s,t)∣0≤s<tm 0}.

Let J 6 be the database such that S J6 is equal to {S(s,t,i)∣S(s,t,i) is reachable from S(0,0,0)}, S u c c J6 is equal to {(s,s+1)∣0≤s<m 0}, A n c J6 is equal to {(s,t)∣0≤s<tm 0} and for all relational symbols R∉{S,S u c c,A n c}, R J6=R J.

We can check that Σ remains true in the instance J 6. By definition of J 6, we also have IJ 6IJ. Hence, J=J 6; that is, (i), (ii) and (iii) hold.

Next we prove that there exists s m ,t m and j m such that

$$ S^{J} (s_{m},t_{m},j_{m}), {Zero} (s_{m}), {Zero} (t_{m}) \text{ and } {Final} (j_{m}). $$
(26)

Suppose for contradiction that there are not such s m ,t m and j m . Consider the instance J 7 obtained from J by removing the tuples of the form F J(s). The tgds of Σ remains true in J 7. Moreover, IJ 7IJ. Since J is a ⊕-repair, J=J 7. That is, F J=, which contradicts (18). Therefore, there exists s m ,t m and j m satisfying (26).

We proved earlier that F i n a l J=F i n a l I and Z e r o J=Z e r o I. Hence, s m =t m =0 and j m =l. Together with (26), this means that the fact S(0,0,l) belongs to J. Since J satisfies (i), the tuple S(0,0,l) is reachable from (0,0,0). That is, there is a sequence (s 0,t 0,i 0),…,(s m ,t m ,i m ) such that

  • (s 0,t 0,i 0)=(0,0,0),

  • for all 0≤n<m, S(s n+1,t n+1,i n+1) is 1-step reachable from (s n ,t n ,i n ),

  • (s m ,t m ,i m )=(0,0,l).

It follows from (25) and the hypothesis that the ID (0,0,l) is its unique successor, that the sequence

$$(s_{0},t_{0},i_{0}),</p><p class="noindent">\dots, (s_{m-1},t_{m-1},i_{m-1}), (0,0,l), (0,0,l), \dots $$

is the run of the machine M. Hence, the run of M is halting.

8 Concluding Remarks

In this paper, we carried out a fairly comprehensive investigation of the data complexity of the consistent query answering problem for conjunctive queries with respect to classes of constraints that have played a major role in data exchange and data integration. In the process, we brought into front stage and used extensively the notions of universal repairs and n-universal repairs, n≥1.

One problem left open by our investigation is the data complexity of the superset-consistent answers and of the ⊕-consistent answers for conjunctive queries w.r.t. sets of inclusion dependencies and equality-generating dependencies. As mentioned earlier, this problem has been shown to be undecidable in combined complexity [31]. It would also be interesting to investigate algorithmic aspects concerning the existence of universal repairs. Specifically, what can one say about the complexity of the following decision problem: given a set Σ of dependencies and an instance I, does I have a ⋆-universal repair w.r.t. Σ, where ⋆∈{⊕, subset, superset}? And similarly for n-universal repairs, n≥1. Note that results in [26, 27] already imply that there is a set Σ of tgds for which the problem of checking whether a given instance has a universal superset-repair is undecidable.

Finally, the results presented give rise to challenging complete classification problems that, if resolved, may take the form of a dichotomy or a trichotomy theorems. For example, is it true that for every set Σ of GAV tgds and every conjunctive query q, the ⊕-consistent answers of q are either in Ptime or coNP-complete? Even for just key constraints only partial results towards such a dichotomy theorem have been obtained so far [19, 21, 29, 36, 37].