Keywords

1 Introduction

Embedded dependencies generalize the concept database dependencies within the framework of first-order logic. Their implication is undecidable but however recursively enumerable, thus enabling complete axiomatizations. A standard example of such a proof procedure is the chase that was invented in the late 1970 s [1, 2], and then soon extended to equality and tuple generating dependencies [3]. In this paper we present an axiomatization for the class of embedded dependencies that simulates the chase at the logical level using inclusion dependencies. In particular, completeness of the rules is obtained by constructing deductions in which all the intermediate steps are inclusion dependencies, except for the first and the last step. These inclusion dependencies consist of attributes of which some are new, i.e., such that they are not allowed to appear at any earlier stage of the deduction.

As a background example, consider the combined class of functional and inclusion dependencies. It is well known that the corresponding implication problem is undecidable, lacking hence finite axiomatization [4, 5]. One strategy in such situations has been to search for axiomatizations within a more general class of dependencies, and partly for this reason many different dependency notions were introduced in the 1980 s. For instance, a textbook on dependency theory from 1991 considers more than 80 different dependency classes [6]. In [7] Mitchell proposed another strategy by presenting an axiomatization of functional and inclusion dependencies using a notion of new attributes which should be thought of as implicitly existentially quantified. In this paper we take an analogous approach, and present an axiomatization for embedded dependencies where new attributes correspond to new values obtained from an associated chasing sequence. These attributes can be thought of as implicitly existentially quantified in the sense of team semantics, that is, a semantic framework that has teams, i.e., sets of assignments, as its underlying concept [8]. Team semantics is compositionally applicable to logics that extend first-order logic with various database dependencies [9, 10]. In this setting, inclusion logic, i.e., first-order logic with additional inclusion dependencies, captures the positive fragment of greatest fixed-point logic and hence all \(\mathsf {PTIME}\) recognizable classes of finite, ordered models [1113]. Therefore, inclusion dependencies with new attributes can be thought of as greatest fixed-point logic expressions. This may in part enable succinct intermediate steps in deductions in contrast to axiomatic systems that simulate the chase by composing first-order definable dependencies.

The methods described in this paper generalize the axiomatization of conditional independence and inclusion dependencies presented in [14]. It is also worth noting that extending relations with new attributes reminds of algebraic dependencies, that are, typed embedded dependencies defined in algebraic terms. The complete axiomatization of algebraic dependencies presented in [15] involves also an extension schema that introduces new copies of attributes.

2 Preliminaries

For two sets A and B, we write AB to denote their union, and for two sequences \({\varvec{a}} {\varvec{b}}\), we write \({\varvec{a}} {\varvec{b}}\) to denote their concatenation. For a sequence \({\varvec{a}} =(a_1, \ldots ,a_n)\) and a mapping f, we write \(f({\varvec{a}})\) for \((f(a_1),\ldots ,f(a_n))\). We denote by \(\mathrm {id}\) the identity function and by \(\mathrm {pr}_i\) the function that maps a sequence to its ith projection. For a function f and \(A\subseteq \text {Dom}(f)\), we write \({ \left. f \right| _{A} }\) for the restriction of f to A, and for a set of mappings F, we write \({ \left. F \right| _{A} }\) for \(\{{ \left. f \right| _{A} }: f\in F\}\).

We start by fixing two countably infinite sets \(\mathsf {Val}\) and \(\mathsf {Att}\), the first denoting possible values of relations and the second attributes. For notational convenience, we will assume that \(\mathsf {Val}= \mathsf {Att}\). For \(R\subseteq \mathsf {Att}\), a tuple over R is a mapping \(R \rightarrow \mathsf {Val}\), and a relation over R is a set of tuples over R. We may sometimes write r[R] to denote that r is a relation over R. Values of a relation r over R are denoted by \(\mathsf {Val}(r)\), i.e., \(\mathsf {Val}(r):=\{t(A):t\in r,A\in R\}\). Let f be a valuation, i.e., a mapping \(\mathsf {Val}\rightarrow \mathsf {Val}\). Then for a tuple t, we write \(f(t):=f\circ t\), and for a relation r, \(f(r) :=\{f( t) :t\in r\}\). A valuation f embeds a relation r (a tuple t) to \(r'\) if \(f(r) \subseteq r'\) (\(f(t)\in r\)). Since we are usually interested only valuations of a relation, we say that \(f: \mathsf {Val}(r)\rightarrow \mathsf {Val}\) is a valuation on r. For a valuation f on r, we say that g is an extension of f to another relation \(r'\) if g is a valuation on \(r'\) such that it agrees with f on values of \(\mathsf {Val}(r)\cap \mathsf {Val}(r')\).

Embedded dependencies (ed’s) can be written using first-order logic in the following way.

Definition 1

(Embedded Dependency). Embedded dependency is a first-order sentence of the form

$$\forall x_1, \ldots ,x_n \big ( \phi (x_1, \ldots , x_n) \rightarrow \exists z_1 \ldots \exists z_k\psi (y_1, \ldots ,y_m)\big )$$

where \(\{z_1, \ldots ,z_k\} = \{y_1, \ldots ,y_m\}\setminus \{x_1, \ldots ,x_n\}\) and

  • \(\phi \) is a (possibly empty) conjunction of relational atoms using all of the variables \(x_1, \ldots ,x_n\);

  • \(\psi \) is a conjunction of relational and equality atoms using all of the variables \(z_1, \ldots ,z_k\);

  • there are no equality atoms in \(\psi \) involving existentially quantified variables.

If at most one relation symbol occurs in an ed, then we say that the ed is unirelational, and otherwise it is multirelational. An ed is called typed if there is an assignment of variables to column positions such that variables in relation atoms occur only in their assigned position, and each equality atom involves a pair of variables assigned to the same position. Otherwise we say that an ed is untyped. If \(\psi \) contains only one atom, then we say that the ed is single-head, and otherwise it is multi-head. A single-head ed where \(\psi \) is an equality is called an equality generating dependency (egd). If \(\psi \) is a conjunction of relational atoms, then the ed is called a tuple generating dependency (tgd). For notational simplicity, we restrict attention to unirelational ed’s. It is easy to se that any ed is equivalent to a set of tgd’s and egd’s, and hence we restrict attention to ed’s that belong to either of these subclasses.

The following alternative tableau presentation for egd’s and tgd’s are used in this paper.

Definition 2

Let T and \(T'\) be finite relations over R, and \(x,y\in \mathsf {Val}(T)\). Then \((T,x=y)\) and \((T,T')\) are an egd and a tgd over R, respectively, with the below satisfaction relation for a relation r over \(S\supseteq R\):

  • \(r \models (T, x = y) \Leftrightarrow \) for all valuations f such that \(f(T) \subseteq { \left. r \right| _{R} }\), it holds that \(f(x)=f(y)\).

  • \(r \models (T,T') \Leftrightarrow \) for all valuations f on T such that \(f(T)\subseteq { \left. r \right| _{R} }\), there is an extension g of f to \(T'\) such that \(g(T')\subseteq { \left. r \right| _{R} }\).

Sometimes we write \(\sigma [R]\) to denote that \(\sigma \) is a dependency over R. If T or \(T'\) is a singleton, then we may omit the set braces in the notation, e.g., write (Tt) instead of \((T,\{t\})\).

We also extend valuations to dependencies. For an egd \(\sigma = (T,x=y)\) we write \(\mathsf {Val}(\sigma )=\mathsf {Val}(T)\), and for a tgd \(\tau = (T,T')\) we write \(\mathsf {Val}(\sigma )=\mathsf {Val}(T)\cup \mathsf {Val}(T')\). Moreover, if f is a valuation, then \(f(\sigma )=(f(T),f(x)=f(y))\) and \(f(\tau )=(f(T),f(T'))\).

Example 1

Consider the relation r and the tgd’s \(\sigma _1:= (\{t,t'\},\{u\})\) and \(\sigma _2:= (\{t,t'\},\{v,v'\})\) obtained from Fig. 1.Footnote 1 We notice that there are two valuations on \(\{t,t'\}\) that embed \(\{t,t'\}\) to r, namely \(f:=\{(x,0),(y,1),(z,2)\}\) and \(g:=\{(x,3),(y,0),(z,1)\}\). Then \(r\models \sigma _1\) since f and g embed u into r, witnessed by tuples \(s_2\) and \(s_3\), respectively. We also notice that \(r\not \models \sigma _2\) since, although \(f\cup \{(a,3)\}\) embeds \(\{v,v'\}\) into r, no extension of g does the same.

Fig. 1.
figure 1

Relation r and tgd’s \(\sigma _1,\sigma _2\)

Next we define inclusion dependencies which are examples of possibly untyped tgd’s.

Definition 3

(Inclusion Dependency). Let \(A_1, \ldots ,A_n\) and \(B_1, \ldots ,B_n\) be (not necessarily distinct) tuples of attributes. Then \(A_1 \ldots A_n \subseteq B_1 \ldots B_n\) is an inclusion dependency (ind) over \(R= \{A_i,B_i: i=1, \ldots ,n\}\) with the following semantic rule for a relation r over \(S\supseteq R\):

$$r\models A_1 \ldots A_n \subseteq B_1 \ldots B_n \Leftrightarrow \forall s\in r\exists s'\in r \forall i=1, \ldots ,n: s(A_i)=s'(B_i).$$

The axiomatization presented in the next section involves inclusion dependencies that introduce new attributes. These attributes are here interpreted as existentially quantified in lax team semantics sense [9]:

$$\begin{aligned} r\models \exists A \phi \Leftrightarrow r[f/ A] \models \phi \text { for some } f:r\rightarrow \mathcal {P}(\mathsf {Val})\setminus \{\emptyset \}, \end{aligned}$$
(1)

where \(r[f/ A]:=\{t(x/A) : x\in f(A)\}\) and t(x / A) is the mapping that agrees with t everywhere except that it maps A to x. Interestingly, inclusion logic formulae with this concept of existential quantification can be characterized with positive greatest fixed-point logic formulae (see Theorem 15 in [11]).

3 Axiomatization

In this section we present an axiomatization for the class of all embedded dependencies. The axiomatization contains an identity rule and three rules for the chase. We also involve conjunction in the language and therefore incorporate its usual introduction and elimination rules in the definition. Regarding the equalities that appear in the rules, note that both \(AB\subseteq AA\) and \(AB \subseteq BB\) indicate that the values of A and B coincide in each row. Therefore, we use \(A=B\) to denote ind’s of either form. For a tgd (an egd) \(\sigma \), we say that \(x\in \mathsf {Val}(\sigma )\) is distinct if it appears at most once as a value in \(\sigma \). Namely,

  • for a tgd \(\sigma =(T,T')[R]\), x is distinct if for all \(t,t'\in T\cup T'\) and \(A,B\in R\), if \(t(A)=x=t'(B)\), then \(t=t'\) and \(B=B'\);

  • for an egd \(\sigma =(T,y=z)[R]\), x is distinct if \(x\not \in \{y,z\}\) and for all \(t,t'\in T\) and \(A,B\in R\), if \(t(A)=x=t'(B)\), then \(t=t'\) and \(B=B'\).

Lastly, note that in the following rules we assume that values can appear as attributes and vice versa.

Definition 4

In addition to the below rules we adopt the usual introduction and elimination rules for conjunction. In the last three rules, we assume that \({\varvec{A}}\) is a sequence listing the attributes of R.

  1. EE

    Equality Exchange:

    $$\text {if }A=B \wedge \sigma \text {, then } \tau .$$

    where \(\sigma \) is an ind and \(\tau \) is obtained from \(\sigma \) by replacing any number of occurrences of A by B and any number of occurrences of B by A.

  2. CS

    Chase Start:

    $$(T^*,\mathrm {id})[RS]\wedge \bigwedge _{t\in T} t({\varvec{A}}) \subseteq {\varvec{A}}$$

    where \(T={ \left. T^* \right| _{R} }\), \(S= \mathsf {Val}(T)\) consists of new attributes, and R consists of distinct values.

  3. CR

    Chase Rule:

    $$\text {tgd:}\quad \text {if }( T, T')[R] \wedge \bigwedge _{t\in T}f\circ t({\varvec{A}}) \subseteq {\varvec{A}} \text {, then } \bigwedge _{t'\in T'}f\circ t'({\varvec{A}}) \subseteq {\varvec{A}},$$
    $$\text { egd:}\quad \text {if }(T,x=y)[R] \wedge \bigwedge _{t\in T} f\circ t({\varvec{A}}) \subseteq {\varvec{A}} \text {, then } f(x)=f(y),$$

    where tgd: f is a valuation that it is 1-1 on \(\mathsf {Val}(T')\setminus \mathsf {Val}(T)\), and f(x) is a new attribute for \(x\in \mathsf {Val}(T')\setminus \mathsf {Val}(T)\).

  4. CT

    Chase Termination:

    $$\text { tgd:}\quad \text {if }(T^*,\mathrm {id})[R S]\wedge \bigwedge _{t'\in T'} u\circ t'({\varvec{A}})\subseteq {\varvec{A}} ,\text { then } (T ,T')[ R],$$
    $$\text { egd:}\quad \text {if }(T^*,\mathrm {id})[ R S]\wedge x=y ,\text { then } (T ,x=y)[R],$$

    where \(T={ \left. T^* \right| _{R} }\), \(S= \mathsf {Val}(T)\), and \(\mathsf {Val}({ \left. T^* \right| _{S} })\) consists of distinct values. Moreover, tgd: u is a mapping \(\mathsf {Val}(T')\rightarrow \mathsf {Att}\) that is the identity on \(\mathsf {Val}(T)\cap \mathsf {Val}(T')\), and egd: \(x,y\in \mathsf {Val}(T)\).

For a dependency \(\sigma \) over R, we let \(\mathsf {Att}(\sigma ):=R\), and for a set of dependencies \(\varSigma \), we let \(\mathsf {Att}(\varSigma ):=\bigcup _{\sigma \in \varSigma }\mathsf {Att}(\sigma )\) .

Definition 5

A deduction from \(\varSigma \) is a sequence \((\sigma _1, \ldots ,\sigma _n)\) such that:

  1. 1.

    Each \(\sigma _i\) is either an element of \(\varSigma \), an instance of [CS], or follows from one or more formulae of \(\{\sigma _1, \ldots ,\sigma _{i-1}\}\) by one of the rules presented above.

  2. 2.

    For each \(A\in \mathsf {Att}(\sigma _i)\), if A is new in \(\sigma _i\), then \(A\not \in \mathsf {Att}(\varSigma \cup \{\sigma _1,\ldots ,\sigma _{i-1}\})\), and otherwise \(A\in \mathsf {Att}(\varSigma \cup \{\sigma _1,\ldots ,\sigma _{i-1}\})\).

We say that \(\sigma \) is provable from \(\varSigma \), written \(\varSigma \vdash \sigma \), if there is a deduction \((\sigma _1, \ldots ,\sigma _n)\) from \(\varSigma \) with \(\sigma =\sigma _n\) and such that no attributes in \(\sigma \) are new in \(\sigma _1, \ldots ,\sigma _n\).

We will also use the following rules that are derivable from [EE]:

  1. ES

    Equality Symmetry:

    $$\text {if }A=B\text {, then } B=A.$$
  2. ET

    Equality Transitivity:

    $$\text {if }A = B \wedge B = C \text {, then }A = C.$$

One may find the chase rules slightly convoluted at first sight. However, the ideas behind the rules are relatively simple as illustrated in the following examples.

Example 2

(Chase Start). Let \(\sigma _0:= (\{t_0,t_1\},\{u_0\})[RS]\) be as in Fig. 2, for \(R:=\{A,B,C\}\) and \(S:=\{x,y,z\}\). Then

Fig. 2.
figure 2

Dependencies \(\sigma _0,\sigma _1,\sigma _2\)

$$\begin{aligned} \tau :=\sigma _0\wedge xyz\subseteq ABC \wedge xy\subseteq BC \end{aligned}$$

is an instance of [CS]. Here xyz are interpreted either as values or as new attributes. By the latter we intuitively mean that any relation r[ABC] can be extended to some \(r'[ABCxyz]\) such that \(r' \models \tau \). For instance, one can define \(r':=q(r)\) where q is the following SPJR query

$$ ABC \bowtie (\pi _{xyz}(\sigma _{xy=BC}(\rho _{xyz/ABC}(ABC)\bowtie ABC)))$$

where \(\sigma \) refers to (S)election, \(\pi \) to (P)rojection, \(\bowtie \) to (J)oin, and \(\rho \) to (R)ename operator. Then q(r) is a relation over RS such that its restriction to xyz lists all abc for which there exist \(s,s'\in r\) such that \(s(ABC)=abc\) and \(s'(BC)=ab\). Let \(\sigma _1=(\{t_0,t_1\},\{u_1\})[R]\) be as in Fig. 2. Now,

$$\begin{aligned} r\models \sigma _1 \Leftrightarrow q(r)\models zx \subseteq AC. \end{aligned}$$

Hence proving \(\varSigma \models \sigma _1\) reduces to showing that \(\varSigma \cup \{\tau \} \models zx\subseteq AC\).

Example 3

(Chase Rule). Assume

$$\begin{aligned} \sigma _2 \wedge xyz \subseteq ABC\wedge xy \subseteq BC \end{aligned}$$
(2)

where \(\sigma _2= (\{t_0,t_1\},\{u_2,u_3\})[R]\) is as in Fig. 2, for \(R:=\{A,B,C\}\). Then, interpreting f as \(\mathrm {id}\), one can derive with one application of [CR]

$$\begin{aligned} zv \subseteq AC \wedge vz \subseteq AC \end{aligned}$$
(3)

from (2). Note that in (3) v is interpreted as a new attribute, and the idea is that any relation r[R] satisfying (2) and with \(v\not \in R\) can be extended to a relation \(r'[R\cup \{v\}]\) satisfying (3) by introducing suitable values for v.

Example 4

(Chase Termination). Assume

$$\begin{aligned} \sigma _0 \wedge zx\subseteq AC \end{aligned}$$
(4)

where \(\sigma _0= (\{t_0,t_1\},\{u_0\})[RS]\) is as in Fig. 2, for \(R:=\{A,B,C\}\) and \(S:=\{x,y,z\}\). Then, letting \(u=\mathrm {id}\), one can derive \(\sigma _1\) as in Fig. 2 from (4) with one application of [CT].

4 Soundness Theorem

In this section we show that the axiomatization presented in the previous section is sound. First note that the next lemma follows from the definitions of egd’s, tgd’s and ind’s.

Lemma 1

Let \(\sigma \) be a dependency over R, and let r and \(r'\) be relations over supersets of R and with \({ \left. r \right| _{R} }={ \left. r' \right| _{R} }\). Then \(r\models \sigma \Leftrightarrow r'\models \sigma \).

Then we prove the following lemma which implies soundness of the axioms. For attribute sets \(R,R'\) with \(R\subseteq R'\) and a relation r over R, we say that a relation \(r'\) over \(R'\) is an extension of r to \(R'\) if \({ \left. r' \right| _{R} }= r\). Recall from Eq. 1 that exactly such extensions are used in the existential quantification of lax team semantics.

Lemma 2

Let r be a relation over \(\mathsf {Att}(\varSigma )\) such that \(r\models \varSigma \), and let \((\sigma _1, \ldots ,\sigma _n)\) be a deduction from \(\varSigma \). Then there exists an extension \(r'\) of r to \(\mathsf {Att}(\varSigma \cup \{\sigma _1, \ldots ,\sigma _n\})\) such that \(r'\models \varSigma \cup \{\sigma _1, \ldots ,\sigma _n\}\).

Proof

We prove the claim by induction on n. We denote by \(R_n\) the set \(\mathsf {Att}(\varSigma \cup \{\sigma _1, \ldots ,\sigma _n\})\). Assuming the claim for \(n-1\), we first find an extension \(r_{n-1}\) of r to \(R_{n-1}\) such that \(r_{n-1}\models \varSigma \cup \{\sigma _1, \ldots ,\sigma _{n-1}\}\). If \(\sigma _n\) is obtained by an application of a conjunction or some ind rule, then it is easy to see that we may choose \(r_n:=r_{n-1}\). Hence, it suffices to consider the cases where \(\sigma _n\) is obtained by using one of the chase rules. Due to Lemma 1, it suffices to find an extension \(r_n\) of \(r_{n-1}\) to \(R_n\) such that \(r_n\models \sigma _n\). In the following cases, \({\varvec{A}}\) denotes a sequence listing the attributes of \(R\subseteq R_{n-1}\).

Case [CS]. Assume that \(\sigma _n\) is obtained by [CS] and is of the form

$$(T^*,\mathrm {id})[RS]\wedge \bigwedge _{t\in T} t({\varvec{A}}) \subseteq {\varvec{A}}$$

where \(T={ \left. T^* \right| _{R} }\), \(S=\mathsf {Val}(T)\) consists of new attributes and R of distinct values. Let \(r_n:= r_{n-1}\bowtie r\) be an extension of \(r_{n-1} \) to \(R_n=R_{n-1} S\), where

$$r:= \{ h: h\text { is a valuation on }T \text { such that } h(T) \subseteq { \left. r_{n-1} \right| _{R} }\}.$$

We claim that \(r_n\models \sigma _{n}\). Consider the first conjunct of \(\sigma _n\), and let h be a valuation on \(T^*\) such that \(h(T^*) \subseteq { \left. r_n \right| _{RS} }\). Then \({ \left. h \right| _{S} }\) is is a valuation on T such that \(h(T) \subseteq { \left. r_{n} \right| _{R} }={ \left. r_{n-1} \right| _{R} }\), i.e., \({ \left. h \right| _{S} }={ \left. t_0 \right| _{S} }\) for some \(t_0\in r_n\). Since R consists of distinct values and thus \(R\cap \text {Dom}(h) =\emptyset \), we may define \(h'\) as an extension of h with \(A\mapsto t_0(A)\), for \(A\in R\). Then \({ \left. h' \right| _{RS} }={ \left. t_0 \right| _{RS} }\in { \left. r_n \right| _{RS} }\), and therefore \(r_n \models (T^*,\mathrm {id})[RS]\).

Consider then \( t({\varvec{A}}) \subseteq {\varvec{A}}\), for \(t\in T\), and let \(t_0\in r_n\). By the definition, \({ \left. t_0 \right| _{S} }= h\) for some valuation h on T such that \(h(T) \subseteq { \left. r_{n} \right| _{R} }\), and hence we obtain that for some \(t_1\in r_n\). Therefore, \(r_n\models t({\varvec{A}}) \subseteq {\varvec{A}}\).

Case [CR]. Assume that \(\sigma _n\) is of the form (i) \(\bigwedge _{t'\in T'}f\circ t'({\varvec{A}}) \subseteq {\varvec{A}}\) or (ii) \(f(x)=f(y)\), and is obtained by [CR] from

  1. (i)

    \(( T, T')[R] \wedge \bigwedge _{t\in T}f\circ t({\varvec{A}}) \subseteq {\varvec{A}}, \)

  2. (ii)

    \((T,x=y)[R] \wedge \bigwedge _{t\in T} f\circ t({\varvec{A}}) \subseteq {\varvec{A}}, \)

where in case (ii) f is a valuation on \(T\cup T'\) such that it is 1-1 on \(S:=\mathsf {Val}(T')\setminus \mathsf {Val}(T)\) and f(x) is a new attribute for \(x\in S\). Let \(s\in r_{n-1}\). Since \(r_{n-1}\models \bigwedge _{t\in T}f\circ t({\varvec{A}}) \subseteq {\varvec{A}} \), we first obtain that \(s\circ f (T)\subseteq { \left. r_{n-1} \right| _{R} }\).

  1. (i)

    Since \(r_{n-1} \models ( T, T')[R]\) we find a mapping \(g:S\rightarrow \mathsf {Val}\) such that \(h(T')\subseteq { \left. r_{n-1} \right| _{R} }\), for \(h= g\cup (s\circ f)\). Since f is 1-1 on S, we can now define \(r_n\) as the relation obtained from \(r_{n-1}\) by extending each \(s\in r_{n-1}\) with \(f(x) \mapsto g(x)\) for \(x \in S\). Then for each \(s\in r_n\), \(s\circ f(T')\subseteq { \left. r_n \right| _{R} }\), and hence we obtain that \(r_n \models \bigwedge _{t'\in T'} f\circ t'({\varvec{A}}) \subseteq {\varvec{A}}.\)

  2. (ii)

    It suffices to show that \(r_{n-1}\models f(x)=f(y)\). Since \(s\circ f (x)=s\circ f(y)\) by \(r_{n-1} \models (T,x=y)[R]\), this follows immediately.

Case [CT]. Assume that \(\sigma _n\) is of the form (i) \((T ,T')[ R]\) or (ii) \((T ,x=y)[ R]\) and is obtained by [CT] from

  1. (i)

    \((T^*,\mathrm {id})[R S]\wedge \bigwedge _{t'\in T'} u\circ t'({\varvec{A}})\subseteq {\varvec{A}},\) where u is a mapping \(\mathsf {Val}(T')\rightarrow \mathsf {Att}\) that is the identity on \(\mathsf {Val}(T)\cap \mathsf {Val}(T')\),

  2. (ii)

    \((T^*,\mathrm {id})[ R S]\wedge x=y,\) where \(x,y\in \mathsf {Val}(T)\).

Moreover, in both cases \(T={ \left. T^* \right| _{R} }\), \(S= \mathsf {Val}(T)\), and \(\mathsf {Val}({ \left. T^* \right| _{S} })\) consists of distinct values. It suffices to show that \(r_{n-1}\models \sigma _n\), so let h be a valuation on T such that \(h(T)\subseteq { \left. r_{n-1} \right| _{R} }\). Since \(\mathsf {Val}({ \left. T^* \right| _{S} })\) consists of disctinct values, h can be extended to a valuation \(h'\) on \(T^*\) such that \(h'(T^*)\subseteq { \left. r_{n-1} \right| _{RS} }\). Since \(r_{n-1}\models (T^*,\mathrm {id})[R S]\), there is an extension \(h''\) of \(h'\) to attributes in R such that \({ \left. h'' \right| _{RS} } \in { \left. r_{n-1} \right| _{RS} }\). Hence, we obtain that \({ \left. h \right| _{S} } \in { \left. r_{n-1} \right| _{S} }\). Let then \(s\in r_{n-1}\) be such that it agrees with h on S.

  1. (i)

    Since \( r_{n-1}\models \bigwedge _{t'\in T'} u\circ t'({\varvec{A}})\subseteq {\varvec{A}}\), we obtain that \(s\circ u(T') \subseteq { \left. r_{n-1} \right| _{R} }\). Moreover, we notice that \(s\circ u = h\) on \(\mathsf {Val}(T) \cap \mathsf {Val}(T')\).

  2. (ii)

    Since \( r_{n-1}\models x=y\), we obtain that \(s(x)=s(y)\). Then \(h(x)=h(y)\) since \(x,y\in S\).

Hence, in both cases we obtain that \(r_{n-1}\models \sigma _n\). This concludes the [CT] case and the proof.   \(\square \)

Using the previous lemma, soundness of the rules follows.

Theorem 1

Let \(\varSigma \cup \{\sigma \}\) be a finite set of egd’s and tgd’s over R. Then \(\varSigma \models \sigma \) if \(\varSigma \vdash \sigma \).

Proof

Let r be a relation such that \(r\models \varSigma \), and assume that \((\sigma _1, \ldots ,\sigma _n)\) is a deduction from \(\varSigma \) where \(\sigma =\sigma _n\) contains no attributes that appear as new in \(\sigma _1, \ldots ,\sigma _n\). If \(R':=\mathsf {Att}(\varSigma \cup \{\sigma _1, \ldots ,\sigma _n\})\), then by Lemma 2 we find an extension \(r'\) of \({ \left. r \right| _{R} }\) to \(R'\) such that \(r'\models \sigma \). Then using Lemma 1 we obtain that \(r\models \sigma \).   \(\square \)

5 Chase Revisited

In this section we define the chase for the class of egd’s and tgd’s. The chase algorithm was generalized to typed egd’s and tgd’s in [3], and here we present the chase using notation similar to that in [16]. First let us assume, for notational convenience, that there is a total, well-founded order \(<\) on the set \(\mathsf {Val}\), e.g., \(x_1<x_2<x_3<\ldots \) for \(\mathsf {Val}=\{x_1, x_2, x_3,\ldots \}\). Let \(\varSigma \cup \{\sigma \}\) be a set of egd’s and tgd’s over R. A chasing sequence of \(\sigma \) over \(\varSigma \) is a (possibly infinite) sequence \(\sigma _0,\sigma _1, \ldots ,\sigma _n,\ldots \) where \(\sigma _0=\sigma \), and \(\sigma _{n+1}\) is obtained from \(\sigma _n\), with \(T:=\mathrm {pr}_1(\sigma _n)\), according to either of the following rules.

Let \(\tau \in \varSigma \) be of the form \((S,x=y)\), and suppose that there is a valuation f on S such that \(f(S)\subseteq T\) but \(f(x) \ne f(y)\). Then \(\tau \) (and f) can be applied to \(\sigma _n\) as follows:

  • egd rule: Let \(\sigma _{n+1}:= g(\sigma _n)\) where \(g: \mathsf {Val}\rightarrow \mathsf {Val}\) is the identity everywhere except that it maps f(y) to f(x) if \(f(x) < f(y)\), and f(x) to f(y) if \(f(y) < f(x)\).

Let \(\tau \in \varSigma \) be of the form \((S, S')\), and suppose that there is a valuation f on S such that \(f(S) \subseteq T\), but there exists no extension \(f'\) of f to \(S'\) such that \(f(S')\subseteq T\). Then \(\tau \) can be applied to \(\sigma _n\) as follows:

  • tgd rule: List all \(f_1, \ldots , f_n\) that have the above property, and for each \(f_i\) choose a distinct extension to \(S'\), i.e., an extension \(f'_i\) to \(S'\) such that each variable in \(\mathsf {Val}(S') \setminus \mathsf {Val}(S)\) is assigned a distinct new value greater than any value in \(\mathsf {Val}(\sigma _0)\cup \ldots \cup \mathsf {Val}(\sigma _n)\). Moreover, no new value is assigned by two \(f'_i,f'_j\) where \(i \ne j\). Then we let \(\sigma _{n+1}:( T \cup f'_1(S')\cup \ldots \cup f'_m(S'),\mathrm {pr}_2(\sigma _{n}))\).

Construction of a chasing sequence is restricted with the following two conditions:

  1. (i)

    Whenever an egd is applied, it is applied repeatedly until it is no longer applicable.

  2. (ii)

    No dependency is starved, i.e., each dependency that is applicable infinitely many times is applied infinitely many times.

Let \(\overline{(\varSigma ,\sigma )}=\sigma _0, \sigma _1,\ldots \) be a chasing sequence of \(\sigma \) over \(\varSigma \). Due to the possibility of applying egd’s, a chasing sequence may not be monotone with respect to \(\subseteq \). Hence, depending on whether \(\sigma \) is a tgd or an egd, we define

  • egd: \(\text {chase}\overline{(\varSigma ,\sigma )}:=(T^1,x=y)\),

  • tgd: \(\text {chase}\overline{(\varSigma ,\sigma )}:=(T^1,T^2)\),

where \(T^i:= \{u:\exists m \forall n\ge m (u\in \mathrm {pr}_i(\sigma _n))\}\) and \(x=y\) is \(\mathrm {pr}_2(\sigma _n)\) for \(n\in \mathbb {N}\) such that \(\mathrm {pr}_2(\sigma _n)=\mathrm {pr}_2(\sigma _m)\) for all \(m\ge n\). Note that “newer” values introduced by the tgd rule are always greater than the “older” ones, and values may only be replaced with smaller ones. Hence, no value can change infinitely often, and therefore \(\text {chase}\overline{(\varSigma ,\sigma )}\) is always well defined and non-empty.

We also associate each chasing sequence with the following descending valuations \(\rho _n\), for \(n \ge 0\). We let \(\rho _0=\mathrm {id}\), \(\rho _{n+1}= g\circ \rho _n\) if \(\sigma _{n+1}\) is obtained by an application of the egd rule where \(\sigma _{n+1}=g(\sigma _{n})\), and \(\rho _{n+1}= \mathrm {id}\circ \rho _n\) otherwise. We then define \(\rho (x) = \lim _{n\rightarrow \infty } \rho _n(x)\), i.e., \(\rho (x) = \rho _n(x)\) if \(n\in \mathbb {N}\) such that \(\rho _{m}(x) =\rho _n(x)\) for all \(m\ge n\). Then we obtain that

$$\text {chase}\overline{(\varSigma ,\sigma )} =\bigcup _{n=0}^{\infty }\rho (\sigma _n).$$

A dependency \(\tau \) is trivial if

  • \(\tau \) is of the form \((T,x=x)\), or

  • \(\tau \) is of the form \((T,T')\) and there is a valuation f on \(T'\) such that f is the identity on \(\mathsf {Val}(T)\cap \mathsf {Val}(T')\) and \(f(T') \subseteq T\).

It is well-known that the chase algorithm captures unrestricted implication of dependencies. For the proof of the following proposition, see Appendix of the arXiv version of the paper [17].

Proposition 1

Let \(\varSigma \cup \{\sigma \}\) be a set of egd’s and tgd’s over R. Then the following are equivalent:

  1. (i)

    \(\varSigma \models \sigma \),

  2. (ii)

    there is a chasing sequence \(\overline{(\varSigma ,\sigma )}=\sigma _0, \sigma _1,\ldots \) of \(\sigma \) over \(\varSigma \) such that \(\text {chase}\overline{(\varSigma ,\sigma )}\) is trivial,

  3. (iii)

    there is a chasing sequence \(\overline{(\varSigma ,\sigma )}=\sigma _0, \sigma _1,\ldots \) of \(\sigma \) over \(\varSigma \) such that \(\sigma _n\) is trivial, for some n.

6 Completeness Theorem

In this section we show that the rules presented in Definition 4 are complete for the implication problem of embedded dependencies. Let us first illustrate the use of the axioms in the following simple example.

Example 5

Consider the implication problem \(\{\sigma ,\sigma '\}\models \tau \) where \(\sigma ,\sigma ',\tau \) are illustrated in Fig. 3, e.g., \(\sigma =(T,t)\) where T consists of the top two rows of \(\sigma \) and t is the bottom row. Note that \(\sigma \) and \(\tau \) are embedded multivalued dependencies of the form \(A\twoheadrightarrow B|C\) and \(A \twoheadrightarrow B|CD\), respectively, and \(\sigma '\) is a functional dependency of the form \(C\rightarrow D\). It is easy to see that the implication holds, and this can be also verified by a chasing sequence \(\tau _0,\tau _1,\tau _2\) of \(\tau \) over \(\{\sigma ,\sigma '\}\) where \(\tau _2\) is trivial (Fig. 4). In the chasing sequence, \(\tau _0=\tau \) and \(\tau _1\) is the result of applying \(\sigma \) to \(\tau _0\). For this, note that there exists two valuations on T that embed T to \(\mathrm {pr}_1(\tau _0)\) but has no extension that embeds t into \(\mathrm {pr}_1(\tau _0)\). These valuations are the identity and the function f that swaps the values of the top and bottom row of T. Then \(\tau _1\) is obtained by adding to \(\mathrm {pr}_1(\tau _0)\) \(\mathrm {id}^*(t)\) and \(f^*(t)\) where \(\mathrm {id}^*\) and \(f^*\) are distinct extensions of \(\mathrm {id}\) and f to t, e.g., \(\mathrm {id}^*=\mathrm {id}\) also on \(d_2\) and \(f^*\) maps \(d_2\) to \(d_3\). Also, \(\tau _2\) is the result of applying \(\sigma '\) to \(\tau _1\) two times, i.e., \(\tau _2\) is obtained from \(\tau _1\) by replacing \(d_3\) with \(d_0\) and \(d_2\) with \(d_1\). Clearly \(\tau _2\) is trivial, and hence we obtain the claim by Proposition 1.

Fig. 3.
figure 3

Dependencies \(\sigma ,\sigma ',\tau \)

Fig. 4.
figure 4

Chasing sequence \(\tau _0,\tau _1,\tau _2\)

Fig. 5.
figure 5

 \((T,\mathrm {id})[RS]\)

This procedure can now be simulated with our axioms as follows. First, with one application of [CS] we derive

$$(T,\mathrm {id})[RS] \wedge a_0b_0c_0d_0\subseteq ABCD \wedge a_0b_1c_1d_1\subseteq ABCD$$

where \(T=\{t,t'\}\), \(R=\{A,B,C,D\}\), and \(S=\{a_0,b_0,b_1,c_0,c_1,d_0,d_1\}\) is a set of values that are interpreted as new attributes. Here t(x) and \(t'(x)\), for \(x\in S\), and ABCD are interpreted as distinct values. (Tt)[RS] is illustrated in Fig. 5 where all the distinct values are hidden. Now with one application of [CR], letting \(f= \mathrm {id}\), we derive \(a_0b_0c_1d_2\subseteq ABCD\) from

$$\begin{aligned} \sigma \wedge a_0b_0c_0d_0\subseteq ABCD \wedge a_0b_1c_1d_1\subseteq ABCD \end{aligned}$$
(5)

Note that in this step, \(d_2\) is interpreted as a new attribute. Let then f be the valuation that is the identity on \(a_0,b_0,b_1,d_1\), and otherwise maps \(a_1\mapsto a_0\), \(c_0\mapsto c_1\), and \(d_0\mapsto d_2\). We notice that \(f(a_0b_0c_0d_0)=a_0b_0c_1d_2\) and \(f(a_1b_1c_0d_1)=a_0b_1c_1d_1\). Hence, we may derive with one application of [CR] \(f(d_0)=f(d_1)\), i.e., \(d_2=d_1\) from

$$\begin{aligned} \sigma ' \wedge f(a_0b_0c_0d_0)\subseteq ABCD \wedge f(a_1b_1c_0d_1)\subseteq ABCD. \end{aligned}$$

Then we apply [EE] and derive \(a_0b_0c_1d_1\subseteq ABCD\) from

$$d_2=d_1 \wedge a_0b_0c_1d_2\subseteq ABCD$$

Finally, we may apply [CT] and derive \(\tau \) from \((T,\mathrm {id})[RS] \wedge a_0b_0c_1d_1\subseteq ABCD.\)

The following lemma shows that the above technique extends to all chasing sequences. The proof is straightforward and hence omitted here (see Appendix of the arXiv version of the paper [17]).

Lemma 3

Let \(\overline{(\varSigma ,\sigma )}=\sigma _0,\sigma _1,\ldots \) be a chasing sequence of \(\sigma \) over \(\varSigma \), where \(\varSigma \cup \{\sigma \}\) is a finite set of egd’s and tgd’s over R, let \({\varvec{A}}\) be a sequence listing the attributes of R, let \(T:=\mathrm {pr}_1(\sigma )\) and \(T_i := \mathrm {pr}_1(\sigma _i)\), and let \(n\in \mathbb {N}\). Then there exists a deduction from \(\varSigma \), with attributes from \(R\cup \bigcup _{i\in \mathbb {N}} \mathsf {Val}(T_i)\), listing the following dependencies:

  1. (i)

    \((T^*,\mathrm {id})[RS]\) where \({ \left. T^* \right| _{R} }=T\), \(S=\mathsf {Val}(T)\), and \( { \left. T^* \right| _{S} }\) consists of distinct values,

  2. (ii)

    \(f(x)=f(y)\), for each application of \((S,x=y)\) and f to \(\sigma _m\), for \(m < n\),

  3. (iii)

    \( t({\varvec{A}}) \subseteq {\varvec{A}}\), for \(t \in T_m\) where \(m\le n\).

With the lemma, we can now show completeness.

Theorem 2

Let \(\varSigma \cup \{\sigma \}\) be a finite set of egd’s and tgd’s over R. Then \(\varSigma \models \sigma \Leftrightarrow \varSigma \vdash \sigma \).

Proof

Assume that \(\varSigma \models \sigma \), and let \({\varvec{A}}\) be a sequence listing R. Then by Proposition 1 there is a chasing sequence \(\overline{(\varSigma ,\sigma )}=\sigma _0, \sigma _1, \ldots \) of \(\sigma \) over \(\varSigma \) such that \(\sigma _n\) is trivial for some n. Let \(D=(\tau _1, \ldots ,\tau _l)\) be a deduction from \(\varSigma \) obtained by Lemma 3, and let \(T:=\mathrm {pr}_1(\sigma )\) and \(T_i:=\mathrm {pr}_1(\sigma _i)\).

Assume first that \(\sigma \) is an egd of the form \((T,x=y)\). Then \(\sigma _n\) is \((T_n,z=z)\) where \(z=\rho _n(x)=\rho _n(y)\). Now, either \(\rho _{i+1}(x)\) is \(\rho _i(x)\), or the equality \(\rho _{i+1}(x)=\rho _i(x)\) (or its reverse) is listed in D by item (ii). Hence, using repeatedly [ES,ET] we may further on derive \(z=x\). Since \(z=y\) is derivable analogously, we therefore obtain \(x=y\) by [ES,ET]. Then with one application of [CT], we derive \((T,x=y)\) from \((T^*,\mathrm {id})[RS] \wedge x=y\) where \({ \left. T^* \right| _{R} }=T\). Note that the \((T^*,\mathrm {id})[RS]\) of the correct form is listed in D by item (i) of Lemma 3.

Assume then that \(\sigma \) is a tgd of the form \((T,T')\), and let \( T'_i:= \mathrm {pr}_2(\sigma _i)\). Then \(\sigma _n\) is \((T_n,T'_n)\), and there is a valuation f on \(T_n'\) such that f is the identity on \(\mathsf {Val}(T_n)\cap \mathsf {Val}(T'_n)\) and \(f(T'_n)\subseteq T_n\). Let \(t'\in T'\). Then \(\rho _n\circ t'\in T'_n\) and by item (iii) of Lemma 3 we obtain that \(f\circ \rho _n\circ t'({\varvec{A}} )\subseteq {\varvec{A}}\) is listed in D. For \(A\in R\), we have then two cases :

  • If \(t'(A) \in \mathsf {Val}(T')\cap \mathsf {Val}(T)\), then we first notice that \(f\circ \rho _n\circ t'(A)\) is \(\rho _n\circ t'(A)\) since \(\rho _n\circ t'(A) \in \mathsf {Val}(T'_n)\cap \mathsf {Val}(T_n)\). Also we notice that the equality \(\rho _n\circ t'(A)=t'(A)\) can be derived analogously to the egd case.

  • If \(t'(A) \in \mathsf {Val}(T')\setminus \mathsf {Val}(T)\), then \(f\circ \rho _n \circ t' (A)= f\circ t'(A)\) since by the definition of the chase \(\rho _n\) is the identity on \(\mathsf {Val}(T')\setminus \mathsf {Val}(T)\).

Now, letting \(f^*\) be the mapping \(\mathsf {Val}(T')\rightarrow \mathsf {Att}\) which is the identity on \(\mathsf {Val}(T')\cap \mathsf {Val}(T)\) and agrees with f on \(\mathsf {Val}(T')\setminus \mathsf {Val}(T)\), we can by the previous reasoning and using repeatedly [EE] derive \(f^*\circ t'({\varvec{A}}) \subseteq {\varvec{A}}\) from \(f\circ \rho _n \circ t' ({\varvec{A}}) \subseteq {\varvec{A}}\). Finally, we can then with one application of [CT] derive \((T,T')\) from

$$(T^*,\mathrm {id})[RS] \wedge \bigwedge _{t'\in T'} f^*\circ t'({\varvec{A}}) \subseteq {\varvec{A}}.$$

   \(\square \)

7 Typed Dependencies

Consider then the class of typed embedded dependencies. In this setting [CS] and [CT] can be replaced with rules that involve only embedded join dependencies (ejd’s) and inclusion dependencies. We define ejd’s over tuples of attributes as follows.

Definition 6

Let \({\varvec{A}}_1, \ldots ,{\varvec{A}}_n\) be tuples of attributes listing \(R_1, \ldots ,R_n\), respectively, and let \(R:= \bigcup _{i=1}^n R_i\). Then \(\bowtie ({\varvec{A}}_i)_{i=1}^n\) is an embedded join dependency with the semantic rule

  • \(r \models \bowtie ({\varvec{A}}_i)_{i=1}^n\) if and only if \({ \left. r \right| _{ {\varvec{R}}} } = { \left. r \right| _{ {\varvec{R}}_1} }\bowtie \ldots \bowtie { \left. r \right| _{{\varvec{R}}_n} }\).

The two alternative rules for the chase are now the following. We call a relation typed if none of its values appears in two distinct columns.

  1. CS*

    Chase Start\(^*\):

    $$\bigwedge _{t\in T }{\varvec{A}} \subseteq t({\varvec{A}}) \wedge \bowtie (t({\varvec{A}}))_{t\in T} \wedge \bigwedge _{t\in T } t({\varvec{A}}) \subseteq {\varvec{A}} $$

    where T is a typed relation and \(\mathsf {Val}(T)\) is a set of new attributes.

  2. CT*

    Chase Termination\(^*\):

    $$tgd: \text {if } \bigwedge _{t\in T} {\varvec{A}}\subseteq t({\varvec{A}}) \wedge \bowtie (t({\varvec{A}}))_{t\in T} \wedge \bigwedge _{t'\in T'}u\circ t'({\varvec{A}}) \subseteq {\varvec{A}}, \text { then }(T,T')[R], $$
    $$egd: \text {if } \bigwedge _{t\in T} {\varvec{A}}\subseteq t({\varvec{A}}) \wedge \bowtie (t({\varvec{A}}))_{t\in T} \wedge x=y, \text { then }(T,x=y)[R], $$

    where tgd: u is a mapping \(\mathsf {Val}(T')\rightarrow \mathsf {Att}\) that is the identity on \(\mathsf {Val}(T')\cap \mathsf {Val}(T')\), and egd: \(x,y\in \mathsf {Val}(T)\).

The first rule is sound for typed dependencies since, for arbitrary r with \(\text {Dom}(r)\cap \mathsf {Val}(T) =\emptyset \), an instance of [CS*] is satisfied by \(r\bowtie q(r)\) where q is the SPJR query

$$ \rho _{t_1({\varvec{A}})/{\varvec{A}}} {\varvec{A}} \bowtie \ldots \bowtie \rho _{t_n({\varvec{A}})/{\varvec{A}}} {\varvec{A}},$$

where \(\rho \) is the rename operator and \(T=\{t_1, \ldots ,t_n\}\). However, a counter example for soundness can be easily constructed for untyped dependencies. If T and r are the relations illustrated in Fig. 6, then no extension \(r'\) of r to \(\mathsf {Val}(T)\) satisfies \(\bigwedge _{t\in T} t(AB)\subseteq AB\).

Fig. 6.
figure 6

Relations T and r

Soundness of [CT*] is obtained analogously to that of [CT]. Also, completeness is obtained by deriving exactly in the same way as in the general case, \(\bigwedge _{t'\in T'}u\circ t'({\varvec{A}}) \subseteq {\varvec{A}}\) (in the tgd case) or \(x=y\) (in the egd case) from \(\bigwedge _{t\in T} t({\varvec{A}}) \subseteq {\varvec{A}}\). Let us then write \(\varSigma \vdash ^*\sigma \) if \(\sigma \) is deduced from \(\varSigma \) in the sense of Definition 5 and using rules [EE,CS*,CR,CT*] together with elimination and introduction of conjunction. Then we obtain the following theorem.

Theorem 3

Let \(\varSigma \cup \{\sigma \}\) be a finite set of typed egd’s and tgd’s over R. Then \(\varSigma \models \sigma \Leftrightarrow \varSigma \vdash ^*\sigma \).