1 Introduction

Over time, databases undergo two kinds of changes: structural changes (i.e., changes in the schema), also known as restructuring or schema evolution, and data changes (i.e., insertions, deletions and modifications of tuples). Data changes are usually referred to as updates. In the same way, one may talk about selection queries or simply queries (non-modification queries) and updates.

A non-modification query usually must be answered many times. That is why, as a rule, such queries are maintained as auxiliary relations, which in the context of databases are called materialized views. Non-materialized views are called virtual views and, as a rule, are not updatable. In this paper, we consider only materialized views and only relational databases.

In this paper, we concentrate on two problems: handling queries under restructuring of databases and handling queries under database updates.

Handling Queries Under Restructuring of Databases: During its life cycle, a database may be restructured several times. At the same time, there are several application programs, each oriented to some specific generation of the database. The problem under investigation is:

Given: There are two different generations of the same database, g and \(g+1\). There is an application, running on the \(g^{th}\) generation: \(Q_g\).

Find: An application \(Q_{g+1}\), running on the \((g+1)^{th}\) generation with the same results.

Let us consider a toy example. The \(g^{th}\) generation of the database contains only one relation P, while the \((g+1)^{th}\) generation contains two relations R and S, such that \(P=(R\bowtie S)\). The application \(Q_g\), running on the \(g^{th}\) generation, is a simple modification query on P that deletes tuples from P according to some condition \(\theta \), expressed in terms of P. The set of deleted tuples is defined by \(\bigtriangledown _{\theta }P\) rather than given by enumeration.

Rules of this kind, such as deletion over a join, are problematic. Indeed, in \(\bigtriangledown _{\theta }(R\bowtie S)\) the formula \(\theta \) may be complicated. When we substitute \((R\bowtie S)\) for P in \(\theta \), we obtain a new formula in terms of R and S that mixes attributes from both relations. In order to evaluate \(\theta \), we must first produce \((R\bowtie S)\) and then delete \(\bigtriangledown _{\theta }\) from the join, whereas we would prefer to derive from \(\theta \) (if possible) some formulae \(\theta _1^R,\ldots ,\theta _{\imath }^R\) over R and \(\theta _1^S,\ldots ,\theta _{\jmath }^S\) over S, which we can apply to R and S respectively in order to obtain the same desired result.
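To make the difficulty concrete, the following minimal Python sketch models relations as sets of tuples. The schemas R(a, b) and S(b, c), the sample data and the helper names `join` and `delete` are assumptions introduced only for illustration; they do not come from the paper. The sketch compares deletion over the join with a naive push-down to the components, once for a condition that mentions only an attribute of R and once for a condition that mixes attributes of R and S.

```python
# Hypothetical toy data: R(a, b) and S(b, c); P = R join S has schema (a, b, c).
R = {(1, 10), (2, 10), (3, 20)}
S = {(10, "x"), (20, "y"), (20, "z")}

def join(r, s):
    """Natural join of R(a, b) and S(b, c) on the shared attribute b."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

def delete(rel, theta):
    """Remove from rel every tuple that satisfies the condition theta."""
    return {t for t in rel if not theta(t)}

P = join(R, S)

# Case 1: theta mentions only attribute a, which comes from R.
theta_on_P = lambda t: t[0] >= 2
theta_on_R = lambda t: t[0] >= 2
case1_agrees = delete(P, theta_on_P) == join(delete(R, theta_on_R), S)

# Case 2: theta mixes attribute a (from R) with attribute c (from S).
mixed = lambda t: t[0] >= 2 and t[2] == "y"
naive = join(delete(R, lambda t: t[0] >= 2), delete(S, lambda t: t[1] == "y"))
case2_agrees = delete(P, mixed) == naive

print(case1_agrees, case2_agrees)
```

On this toy data the first comparison succeeds and the second fails: the naive push-down of the mixed condition removes too many tuples, which is why a more careful derivation of the component formulae is needed.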

In logical notation, the formulae \(\theta _1^R,\dots ,\theta _{\imath }^R\) over R and \(\theta _1^S,\dots ,\theta _{\jmath }^S\) over S are Feferman-Vaught reduction sequences (or simply, reductions), cf. [19]. The sequences are sets of queries such that each such query can be evaluated on the components R and S. Next, from the local answers, and possibly some additional information, we compute the answer. In this paper, we generalize the notion of Feferman-Vaught reduction sequences to handling queries over \(\Phi \)–sums.

Handling queries under database updates: Materialized views contain some derived portion of the database information, stored as new relations. In order to reflect the changes made to the source relations, the views should be modified by adding or deleting tuples, without total re-computation from the database.

Given: A materialized view and a database update.

Find: A set of view updates that uses the old content of the view and deletes from and inserts into the view some sets of tuples defined on the source database.

In the case of incremental view maintenance, we try to find an effective way to refresh the content of the view by applying updates to it. The updates should be derived from the update on the source database, without total re-computation of the view. In many cases, this simplifies the maintenance procedure.
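As a small illustration of this idea, the following Python sketch refreshes a join view after tuples are inserted into one source relation. It relies on the standard set-semantics identity \((R\cup \Delta R)\bowtie S=(R\bowtie S)\cup (\Delta R\bowtie S)\); the relation schemas, the data and the helper name `join` are hypothetical and serve only as an example.

```python
def join(r, s):
    """Natural join of R(a, b) and S(b, c) on the shared attribute b."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

# Hypothetical source relations and a view materialized once.
R = {(1, 10), (2, 20)}
S = {(10, "x"), (20, "y"), (30, "z")}
view = join(R, S)

# Update on the source database: tuples inserted into R.
delta_R = {(3, 30), (4, 10)}

# Incremental refresh: the view delta is computed from the update and S only.
view_delta = join(delta_R, S)
refreshed = view | view_delta

# Full re-computation, shown only for comparison.
recomputed = join(R | delta_R, S)
```

Note that the view delta is computed from the source relation S and not from the old view alone, so even in this simple case the refresh is not a map over the view only.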

Unfortunately, as a rule, the derived view contains only some small part of the database information, and it is impossible to obtain the desired results as a map over the view alone. Using an extension of the logical machinery of syntactically defined translation schemes, first introduced in [29] and recently used in [21] in the context of database theory, we give a precise definition of incremental view re-computation and prove that every query expressible in several extensions of First Order Logic (FOL) allows incremental view re-computation.

In general, this contribution deals with exploitation of logical reduction techniques in database theory. This approach unifies different aspects, related to both schema and data evolution in databases, into a single framework. It is assumed that the reader is familiar with database theory as presented in [1] and has logical background as described in [15].

The logical reduction techniques used here come in the form of Feferman-Vaught reduction sequences and translation schemes, known also in model theory as interpretations. The interpretations give rise to two induced maps, translations and transductions. Transductions describe the induced transformation of database instances and the translations describe the induced transformations of queries. Translation schemes appear naturally in the context of databases. The first example is the vertical decomposition of a relation scheme into two relation schemes with overlapping attribute sets. The reconstruction of the original scheme from the vertical decomposition can also be viewed as a translation scheme. The same is true for horizontal decompositions and the definition of views. More surprisingly, updates can also be cast into this framework. Finally, translation schemes also describe the evolution of one database scheme over different generations of database designs.

The paper is structured in the following way. Section 2 presents a short review of related work. Section 3 provides the definitions and main results related to syntactically defined translation schemes. Section 4 is dedicated to handling queries under restructuring of databases. Section 5 is dedicated to handling queries under database updates. Section 6 summarizes the paper.

2 Related Works

The maintenance of dynamic databases has a long history, cf. [9–11, 13]. One of the most recent papers is [21], inspired by [31]. Like [21], we are also “interested in some arbitrary but fixed query on a finite structure, which is subject to an ongoing sequence of local changes, and after each change the answer to the query should remain available.”

In [21], the local changes of the database were limited to elements, which are constantly inserted in and deleted from the database. We take the following verbatim from the Conclusion and future work section of [21]: “We think that it is interesting to consider updates that are induced by first-order formulae. On the one hand one can consider formulae which induce updates directly to the structure, i.e. consider updates that change all tuples with the property defined by the formula. On the other hand one can perform canonical updates to one structure and consider the changes that are induced on a first-order interpreted structure.” In this paper, we propose a unified logic based approach to maintenance of queries under database changes. We show how this approach works not only for FOL but also for different extensions of it, used in the database theory.

In [26], the incremental view maintenance problem was investigated from an algebraic perspective. The author constructed a ring of databases and used it as the foundation for the design of a query calculus in which powerful aggregate queries can be expressed. In this framework, a query language needed to be closed under computing an additive inverse (as a generalization of the union operation on relations to support insertions and deletions), and the join operation had to be distributive over this addition to support normalization, factorization, and the taking of deltas of queries.

Some propagation techniques for view updates may be found in [3]. In [24], the complexity of testing the correctness of an arbitrary update to a database view is analyzed, going back to the constant-complement approach of Bancilhon and Spyratos, cf. [5]. We must mention the recent exciting works [16–18, 20, 23], which use propagation techniques for view updates as well. However, none of them considers the question in comparable generality. In fact, we do not need most of the commonly used additional assumptions. For example, we do not need the structures to be ordered. Moreover, we allow both restructuring of the database and insertion, deletion or set operations under the same logical framework. In addition, we do not restrict ourselves to FOL but also consider various extensions of it.

3 Translation Schemes

In this section, we introduce the general framework for syntactically defined translation schemes in terms of databases. We assume that the reader is familiar with precise definitions of extensions of FOL, cf. [25]. The notion of abstract translation schemes goes back to Rabin, cf. [29]. The translation schemes are also known in model theory as interpretations, as described in particular in [25]. The definition is valid for a wide class of logics or query languages, including Datalog or Second Order Logic (SOL) as well as FOL, MSOL, TC, n-TC, LFP or n-LFP. However, we start from Relational Calculus in the form of FOL. Occasionally, we use Relational Algebra expressions when they are more convenient for the reader.

We follow Codd’s notation, cf. [7]. Database systems should present the user with tables called relations (\(R_1, R_2, \ldots \)), whose columns are headed by attributes (\(A_1, A_2, \ldots \)). The set of attributes of a relation is called the schema for that relation (\(R_1[\bar{\mathcal{A}}], R_2[\bar{\mathcal{B}}], \ldots \)). The set of schemas for the relations (\(\mathbf {R}, \mathbf {S}, \ldots \)) is called a relational database schema, or just database schema. The rows of a relation (t) are called tuples. A tuple has one component (\(t[A_1]\), \(t[A_2]\), \(\dots \)) for each attribute of the relation. We shall call a set of tuples for a given relation an instance (\(I(R_1)\), \(I(R_2)\), \(\dots \)) of that relation.

Definition 1

(Translation Schemes \({\Phi }\) ). Let \(\mathbf {R}\) and \(\mathbf {S}\) be two database schemes. Let \(\mathbf {S} = (S_1, \ldots , S_m)\) and let \(\rho (S_i)\) be the arity of \(S_i\). Let \(\Phi = \langle \phi , \phi _1, \ldots , \phi _m \rangle \) be FOL formulae over \(\mathbf {R}\). \(\Phi \) is k–feasible for \(\mathbf {S}\) over \(\mathbf {R}\) if \(\phi \) has exactly k distinct free FOL variables and each \(\phi _i\) has \(k \cdot \rho (S_i)\) distinct free first order variables. Such a \(\Phi = \langle \phi , \phi _1, \ldots , \phi _m \rangle \) is also called a k–\(\mathbf {R}\)–\(\mathbf {S}\)–translation scheme or, in short, a translation scheme, if the parameters are clear from the context.

  • If \(k=1\) we speak of scalar or non–vectorized translation schemes.

  • If \(\phi \) is a tautology, then the translation scheme is non–relativized. Otherwise, \(\phi \) defines relativization of the new database domain.

The formulae \(\phi , \phi _1, \ldots , \phi _m \) can be thought of as queries. \(\phi \) describes the new domain, and the \(\phi _i\)’s describe the new relations. Vectorization creates one attribute out of a finite sequence of attributes. The use of vectorized translation schemes in the context of databases is shown in particular in [2] and [27]. We shall discuss concrete examples after we have introduced the induced transformation of database instances.

A (partial) function \(\Phi ^*\) from \(\mathbf {R}\) instances to \(\mathbf {S}\) instances can be directly associated with a translation scheme \(\Phi \).

Definition 2

(Induced Map \(\Phi ^*\) ). Let \(I(\mathbf {R})\) be a \(\mathbf {R}\) instance and \(\Phi \) be k–feasible for \(\mathbf {S}\) over \(\mathbf {R}\). The instance \(I(\mathbf {S})_{\Phi }\) is defined as follows:

  1. The universe of \(I(\mathbf {S})_{\Phi }\) is the set \(I(\mathbf {S})_{\Phi } =\{\bar{a} \in I(\mathbf {R})^k: I(\mathbf {R}) \models \phi (\bar{a}) \}\).

  2. The interpretation of \(S_i\) in \(I(\mathbf {S})_{\Phi }\) is the set

    $$\begin{aligned} I(\mathbf {S})_{\Phi }(S_i) =\{\bar{a} \in {I(\mathbf {S})_{\Phi }}^{\rho (S_i)}: I(\mathbf {R}) \models (\phi _i(\bar{a}) ) \}. \end{aligned}$$

    Note that \(I(\mathbf {S})_{\Phi }\) is a \(\mathbf {S}\) instance of cardinality at most \(\mid \mathbf {R}\mid ^k\).

  3. The partial function \(\Phi ^*: I(\mathbf {R}) \rightarrow I(\mathbf {S})\) is defined by \(\Phi ^*(I(\mathbf {R})) = I(\mathbf {S})_{\Phi }\). Note that \(\Phi ^*(I(\mathbf {R}))\) is defined iff \(I(\mathbf {R}) \models \exists \bar{x} \phi \).

\(\Phi ^*\) maps \(\mathbf {R}\) instances into \(\mathbf {S}\) instances, by computing the answers to the queries \(\phi _1, \ldots , \phi _m \) over the domain of \(\mathbf {R}\) specified by \(\phi \), see Fig. 1. The definition of \(\Phi ^*\) can be extended to subsets of \(\mathbf {R}\) instances in the usual way.
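The construction of \(\Phi ^*\) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the formulae \(\phi \) and \(\phi _i\) are given semantically as Python predicates rather than as syntax, and the function name `induced_map`, the relation name `Next` and the data are hypothetical. The example uses a vectorized scheme with \(k=2\): the new elements are the edges of a directed graph, and the new binary relation holds between an edge and any edge that starts where it ends.

```python
from itertools import product

def induced_map(domain, phi, phis, k):
    """Sketch of Phi^*. phi is a predicate on k-tuples over the old domain;
    phis maps a relation name to (arity, predicate on arity-many k-tuples).
    Returns None when the new universe is empty, i.e. when Phi^* is undefined."""
    universe = {a for a in product(domain, repeat=k) if phi(a)}
    if not universe:
        return None
    relations = {
        name: {t for t in product(universe, repeat=arity) if pred(*t)}
        for name, (arity, pred) in phis.items()
    }
    return universe, relations

# Hypothetical R-instance: a directed graph with edge relation E.
domain = {0, 1, 2}
E = {(0, 1), (1, 2)}

universe, relations = induced_map(
    domain,
    phi=lambda a: a in E,                              # new elements are the edges
    phis={"Next": (2, lambda u, v: u[1] == v[0])},     # edge u is followed by edge v
    k=2,
)
```

The new universe is a subset of the set of pairs over the old domain, which matches the cardinality bound noted in Definition 2.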

Next we want to describe the way formulae (query expressions) are transformed when we transform databases by \(\Phi ^*\). For this, a function \(\Phi ^\#\) from \(\mathcal{L}_1\)–formulae over \(\mathbf {S}\) to \(\mathcal{L}_2\)–formulae over \(\mathbf {R}\) can be directly associated with a translation scheme \(\Phi \), see Fig. 1.

Definition 3

(Induced map \(\Phi ^\#\) ). Let \(\theta \) be a \(\mathbf {S}\)–formula and \(\Phi \) be k–feasible for \(\mathbf {S}\) over \(\mathbf {R}\). The formula \(\theta _{\Phi }\) is defined inductively as follows:

  1. For each \(S_i \in \mathbf {S}\) and \(\theta = S_i( x_1,\ldots , x_l)\), let \(x_{j,h}\) be new variables with \(j \le l\) and \(h \le k\), and denote \(\bar{x}_j = \langle x_{j,1}, \ldots , x_{j,k}\rangle \). We set \(\theta _{\Phi } = \phi _i(\bar{x}_1, \ldots , \bar{x}_l)\).

  2. For the boolean connectives, the translation distributes, i.e. if \(\theta = (\theta _1 \vee \theta _2)\) then \(\theta _{\Phi }= ({\theta _1}_{\Phi } \vee {\theta _2}_{\Phi })\) and if \(\theta = \lnot \theta _1\) then \(\theta _{\Phi }= \lnot {\theta _1}_{\Phi }\), and similarly for \(\wedge \).

  3. For the existential quantifier, we use relativization, i.e., if \(\theta = \exists y \theta _1\), let \(\bar{y} = \langle y_1, \ldots , y_k \rangle \) be new variables. We set \(\theta _{\Phi }= \exists \bar{y} (\phi (\bar{y}) \wedge ({\theta _1})_{\Phi }). \)

  4. For infinitary logics: if \(\theta =\bigwedge \Psi \) then \(\theta _{\Phi }=\bigwedge \Psi _{\Phi }\).

  5. For second order variables V of arity \(\ell \) and \(\bar{a}\) a vector of length \(\ell \) of first order variables or constants, we translate \(V(\bar{a})\) by treating V like a relation symbol and put \(\theta _{\Phi }= \exists V (\forall \bar{v} (V(\bar{v}) \rightarrow (\phi (\bar{v_1}) \wedge \ldots \wedge \phi (\bar{v_{\ell }}) \wedge ({\theta _1})_{\Phi }))).\)

  6. For LFP: if \(\theta = n\)-\(LFP \bar{x},\bar{y},\bar{u},\bar{v} \theta _1\) then \(\theta _{\Phi } = nk\)-\(LFP \bar{x},\bar{y},\bar{u},\bar{v} {\theta _1}_{\Phi }\).

  7. For TC: if \(\theta = n\)-\(TC \bar{x},\bar{y},\bar{u},\bar{v} \theta _1\) then \(\theta _{\Phi } = nk\)-\(TC \bar{x},\bar{y},\bar{u},\bar{v} {\theta _1}_{\Phi }\).

  8. The function \(\Phi ^\#:\mathcal{L}_1\) over \(\mathbf {S}\rightarrow \mathcal{L}_2 \) over \(\mathbf {R}\) is defined by \(\Phi ^\#(\theta ) = \theta _{\Phi }\).

  9. For a set of \(\mathbf {S}\)–formulae \(\Sigma \) we define

    $$\begin{aligned} \Phi ^\#(\Sigma )= \{\theta _{\Phi }:\theta \in \Sigma \text{ or } \theta = \forall \bar{y}(S_i \leftrightarrow S_i)\} \end{aligned}$$

    This is to avoid problems with \(\Sigma \) containing only quantifier free formulae, as \(\Phi ^\#(\Sigma )\) need not be a set of tautologies even if \(\Sigma \) is. If \(\Sigma \) contains only quantifier free formulae, we can thus reflect the effect of relativization.

Observation 1

  1. \(\Phi ^\# (\theta ) \in FOL\) (SOL, TC, LFP) if \(\theta \in FOL (SOL, TC, LFP)\), even for vectorized \(\Phi \).

  2. \(\Phi ^\# (\theta ) \in MSOL\) provided \(\theta \in MSOL\), but only for scalar \(\Phi \).

  3. \(\Phi ^\# (\theta ) \in nk\)-TC (nk-LFP) provided \(\theta \in n\)-TC (n-LFP) and \(\Phi \) is k–feasible.

  4. \(\Phi ^\# (\theta ) \in TC^{kn} (LFP^{kn},L^{kn}_{\infty \omega })\) provided \(\theta \in TC^n (LFP^n,L^{n}_{\infty \omega })\) and \(\Phi \) is k–feasible.

The following fundamental theorem is folklore and establishes the correctness of the translation, cf. [15]. Figure 1 illustrates the fundamental theorem.

Theorem 1

Let \(\Phi = \langle \phi , \phi _1, \ldots , \phi _m \rangle \) be a k–\(\mathbf {R}\)–\(\mathbf {S}\)–translation scheme, \(I(\mathbf {R})\) be a \(\mathbf {R}\)-instance and \(\theta \) be a FOL–formula over \(\mathbf {S}\). Then \(I(\mathbf {R}) \models \Phi ^\#(\theta )\) iff \(\Phi ^*(I(\mathbf {R})) \models \theta \).
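The fundamental property can be checked mechanically on small instances. The following Python sketch is a minimal illustration for the scalar case \(k=1\) only; it is not taken from the paper. Formulae are encoded as nested tuples, and the function names `holds`, `rename`, `translate` and `transduce`, the relation names `E` and `Sym`, and the data are all hypothetical. The renaming step assumes that bound variables of the scheme do not clash with the variables of \(\theta \).

```python
from itertools import product

def holds(f, dom, rels, env):
    """Evaluate a formula, given as nested tuples, in the structure (dom, rels)."""
    op = f[0]
    if op == "rel":
        return tuple(env[v] for v in f[2]) in rels[f[1]]
    if op == "eq":
        return env[f[1]] == env[f[2]]
    if op == "not":
        return not holds(f[1], dom, rels, env)
    if op == "and":
        return holds(f[1], dom, rels, env) and holds(f[2], dom, rels, env)
    if op == "or":
        return holds(f[1], dom, rels, env) or holds(f[2], dom, rels, env)
    if op == "exists":
        return any(holds(f[2], dom, rels, {**env, f[1]: a}) for a in dom)
    raise ValueError(op)

def rename(f, m):
    """Rename free variables of f by the mapping m (assumes no variable capture)."""
    op = f[0]
    if op == "rel":
        return ("rel", f[1], tuple(m.get(v, v) for v in f[2]))
    if op == "eq":
        return ("eq", m.get(f[1], f[1]), m.get(f[2], f[2]))
    if op == "not":
        return ("not", rename(f[1], m))
    if op in ("and", "or"):
        return (op, rename(f[1], m), rename(f[2], m))
    if op == "exists":
        return ("exists", f[1], rename(f[2], {k: v for k, v in m.items() if k != f[1]}))
    raise ValueError(op)

def translate(theta, scheme):
    """Phi^#: rewrite a formula over S into a formula over R (scalar case)."""
    op = theta[0]
    if op == "rel":
        formals, body = scheme["rels"][theta[1]]
        return rename(body, dict(zip(formals, theta[2])))
    if op == "eq":
        return theta
    if op == "not":
        return ("not", translate(theta[1], scheme))
    if op in ("and", "or"):
        return (op, translate(theta[1], scheme), translate(theta[2], scheme))
    if op == "exists":
        var, phi = scheme["phi"]
        guard = rename(phi, {var: theta[1]})          # relativization to phi
        return ("exists", theta[1], ("and", guard, translate(theta[2], scheme)))
    raise ValueError(op)

def transduce(dom, rels, scheme):
    """Phi^*: compute the S-instance defined inside the R-instance (dom, rels)."""
    var, phi = scheme["phi"]
    new_dom = {a for a in dom if holds(phi, dom, rels, {var: a})}
    new_rels = {
        name: {t for t in product(new_dom, repeat=len(formals))
               if holds(body, dom, rels, dict(zip(formals, t)))}
        for name, (formals, body) in scheme["rels"].items()
    }
    return new_dom, new_rels

# Hypothetical R-instance: a directed graph with edge relation E.
dom, rels = {1, 2, 3}, {"E": {(1, 2), (3, 3)}}
scheme = {
    "phi": ("x", ("exists", "y", ("rel", "E", ("x", "y")))),   # x has an outgoing edge
    "rels": {"Sym": (("u", "v"), ("or", ("rel", "E", ("u", "v")),
                                        ("rel", "E", ("v", "u"))))},
}
theta = ("exists", "a", ("exists", "b",
         ("and", ("rel", "Sym", ("a", "b")), ("not", ("eq", "a", "b")))))

left = holds(translate(theta, scheme), dom, rels, {})       # I(R) satisfies Phi^#(theta)?
right = holds(theta, *transduce(dom, rels, scheme), {})     # Phi^*(I(R)) satisfies theta?
```

In this example both sides evaluate to the same truth value. The relativization guard matters here: element 2 has no outgoing edge and so lies outside the new domain, and without the guard the translated formula could be satisfied by the pair (1, 2).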

Fig. 1. Components of translation schemes and the fundamental property

Now, we can define the composition of translation schemes:

Definition 4

(Composition of Translation Schemes). Let \(\Psi = \langle \psi , \psi _1, \ldots , \psi _{m_1} \rangle \) be a \(k_1\)–\(\mathbf {R}\)–\(\mathbf {S}\)–translation scheme, and let \(\Phi = \langle \phi , \phi _1, \ldots , \phi _{m_2} \rangle \) be a \(k_2\)–\(\mathbf {S}\)–\(\mathbf {T}\)–translation scheme. Then we denote by \(\Psi \circ \Phi \) the \((k_1 \cdot k_2)\)–\(\mathbf {R}\)–\(\mathbf {T}\)–translation scheme given by \(\langle \Psi ^{\#}(\phi ), \Psi ^{\#}(\phi _1), \ldots , \Psi ^{\#}(\phi _{m_2}) \rangle \). \(\Psi \circ \Phi \) is called the composition of \(\Phi \) with \(\Psi \).

One can easily check that the syntactically defined composition of translation schemes has the following semantic property: \((\Psi \circ \Phi )^*(I(\mathbf {R})) = \Phi ^*(\Psi ^*(I(\mathbf {R}))).\)

Now, we give a series of examples of translation schemes relevant to the field of database theory. Assume that in all the examples we are given a database scheme \(\mathbf {R}=(R_1,R_2,\ldots ,R_n)\).

Example 1

(Restriction of the Domain). Assume we want to restrict the domain of \(\mathbf {R}\) by allowing only elements that satisfy some condition, defined by a formula \(\phi (x)\) in the chosen language (FOL, relational calculus, etc.). The corresponding translation scheme \(\Phi _{Restriction}\) is:

$$\begin{aligned} \Phi _{Restriction}= \langle \phi , R_1, R_2,\ldots , R_n \rangle . \end{aligned}$$

Example 2

(Deletion of a Definable Set of Tuples from a Relation). Assume we want to delete from a relation, say, \(R_i\) of \(\mathbf {R}\) the set of tuples that satisfy some condition, defined by a formula \(\theta \), so that only the tuples that do not satisfy \(\theta \) remain. The corresponding translation scheme \(\Phi _{DT}\) is:

$$\begin{aligned} \Phi _{DT}= \langle x\approx x, R_1, \ldots , R_{i-1}, R_{i}\wedge \lnot \theta , R_{i+1}, \ldots , R_n \rangle . \end{aligned}$$

Example 3

(Insertion of a Tuple into a Relation). Assume we want to insert a tuple into a relation, say, \(R_{i}[A_1,\ldots ,A_{k_{i}}]\) of the \(\mathbf {R}\), where \(R_{i}\) contains \(k_{i}\) attributes. The corresponding translation scheme \(\Phi _{InT}\) is a parametrized translation scheme with \(k_{i}\) parameters \(a_1,\ldots , a_{k_i}\), which can be expressed, for example, in FOL in the following way:

$$\begin{aligned} \Phi _{InT}= \langle x\approx x, R_1, R_{2}, \ldots , R_{i-1}, (R_i\bigvee (\bigwedge _{1\le j\le k_i} x_j\approx a_j)), R_{i+1}, \ldots , R_n \rangle . \end{aligned}$$
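The effect of \(\Phi _{DT}\) and \(\Phi _{InT}\) on instances can be sketched in Python as follows. This is a minimal illustration assuming that a database instance is a dictionary from relation names to sets of tuples; the function names and the data are hypothetical. Following the formula \(R_i\wedge \lnot \theta \), deletion keeps exactly the tuples of \(R_i\) that do not satisfy \(\theta \); all other relations are copied unchanged, as in the translation schemes above.

```python
def delete_definable(instance, name, theta):
    """Effect of Phi_DT: relation `name` becomes R_i AND NOT theta."""
    updated = dict(instance)
    updated[name] = {t for t in instance[name] if not theta(t)}
    return updated

def insert_tuple(instance, name, a):
    """Effect of Phi_InT with parameters a: relation `name` becomes
    R_i OR (x_1 = a_1 AND ... AND x_k = a_k)."""
    updated = dict(instance)
    updated[name] = instance[name] | {tuple(a)}
    return updated

# Hypothetical instance with two relations.
db = {"R1": {(1, "a"), (2, "b")}, "R2": {(5,)}}

after_delete = delete_definable(db, "R1", lambda t: t[0] > 1)
after_insert = insert_tuple(db, "R1", (3, "c"))
```

Both functions return a new instance and leave the input untouched, which mirrors the view of an update as a map from one instance to another.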

Example 4

(Vertical Decomposition (Projections)). The vertical decomposition, given by a translation scheme in FOL notation, is:

$$\begin{aligned} \Phi _{VD} = \langle x\approx x, \phi _1, \ldots , \phi _n \rangle , \end{aligned}$$

where each \(\phi _i\) is of the form \(\phi _i(\bar{x}_i)=\exists \bar{y}_i R_{i}(\bar{x}_i,\bar{y}_i)\). \(R_{i}\) is a relation symbol from \(\mathbf {R}\) and \(\bar{x}_i\) is a vector of free variables. In relational algebra notation, this amounts to \(\phi _i(\bar{x}_i)=\pi _{\bar{x}_i}R_{i}\).

Example 5

(Vertical Composition (Join)). The vertical composition, given by a translation scheme in FOL notation, is:

$$\begin{aligned} \Phi _{VC} = \langle x\approx x, \phi _1, \ldots , \phi _n \rangle , \end{aligned}$$

where each \(\phi _i\) is of the form \(\phi _i(\bar{x}_i)=\bigwedge _{l=1}^{k}\ R_{i_l}(\bar{x}_{i_l}),\) \(R_{i_l}\) is a relation symbol from \(\mathbf {R}\) and \(\bar{x}_i\) is a vector of free variables. Furthermore \(\cup _j\bar{x}_{i_j}=\bar{x}_i\) and for all \(\bar{x}_{i_{j_1}}\) there is \(\bar{x}_{i_{j_2}}\) such that \(\bar{x}_{i_{j_1}}\cap \bar{x}_{i_{j_2}}\ne \emptyset \). In relational algebra notation, this amounts to \(\phi _i(\bar{x}_i)=\bowtie _{l=1}^{k}R_{i_l}\). If there are no common free variables, this just defines the Cartesian product.
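The interplay of Examples 4 and 5 can be illustrated with a short Python sketch, in which one relation is projected onto two overlapping attribute sets and then rejoined. The schemas, the data and the helper names `project` and `join_on_first` are assumptions made for this example only. It shows a case where the join reconstructs the original relation and a case where it produces additional tuples.

```python
def project(rel, positions):
    """Projection onto the attributes at the given positions."""
    return {tuple(t[i] for i in positions) for t in rel}

def join_on_first(r1, r2):
    """Join R1(k, a) and R2(k, b) on the shared first attribute k."""
    return {(k, a, b) for (k, a) in r1 for (k2, b) in r2 if k == k2}

# Hypothetical relation in which attribute k determines a and b.
R = {(1, "ann", "nyc"), (2, "bob", "sfo")}
R1, R2 = project(R, (0, 1)), project(R, (0, 2))
lossless = join_on_first(R1, R2) == R

# Hypothetical relation in which k no longer determines a and b.
T = {(1, "ann", "nyc"), (1, "bob", "sfo")}
T1, T2 = project(T, (0, 1)), project(T, (0, 2))
rejoined = join_on_first(T1, T2)
```

In the second case the rejoined relation strictly contains T, so the vertical composition is a reconstruction map only under suitable assumptions on the instances.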

Example 6

(Horizontal Decomposition (Exceptions)). Assume we want to decompose a relation, say, \(R_i[A_1,\ldots ,A_{k_{i}}]\) into two parts \(R^1_i[A_1,\ldots ,A_{k_{i}}]\) and \(R^2_i[A_1,\ldots ,A_{k_{i}}]\) such that all tuples of the first part satisfy some definable condition (formula) \(\theta \) and all tuples of the second part do not. Such a transformation is called horizontal decomposition of \(R_i\) along \(\theta \) and in FOL notation is:

$$\begin{aligned} \Phi _{HD}= \langle x\approx x, R_1, R_{2}, \ldots , R_{i-1}, R_i\wedge \theta ,R_i\wedge \lnot \theta , R_{i+1}, \ldots , R_n \rangle . \end{aligned}$$

Example 7

(Horizontal Composition (Union)). Assume we want to compose a new relation, say, \(R_{n+1}[A_1,\ldots ,A_{k_{n+1}}]\) from two given relations \(R_{i_1}[A_1,\ldots ,A_{k_{n+1}}]\) and \(R_{i_2}[A_1,\ldots ,A_{k_{n+1}}]\). Such a transformation is called horizontal composition of \(R_{n+1}\) and in FOL notation is:

$$\begin{aligned} \Phi _{HC}= \langle x\approx x, R_1, R_{2}, \ldots , R_n, R_{i_1}\vee R_{i_2} \rangle . \end{aligned}$$

The translation \(\Phi _{HC}\) is called the horizontal composition (union) of \(R_{i_1}\) and \(R_{i_2}\).
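Examples 6 and 7 can be sketched together in Python: a relation is split along a condition \(\theta \) and then recomposed by union. The function names and the data are hypothetical, and relations are again modelled as sets of tuples.

```python
def horizontal_decomposition(rel, theta):
    """Split rel into (R AND theta, R AND NOT theta)."""
    part1 = {t for t in rel if theta(t)}
    part2 = {t for t in rel if not theta(t)}
    return part1, part2

def horizontal_composition(r1, r2):
    """Union of two relations over the same attributes."""
    return r1 | r2

# Hypothetical relation and condition.
R = {(1, "a"), (2, "b"), (3, "c")}
theta = lambda t: t[0] % 2 == 1

R_theta, R_not_theta = horizontal_decomposition(R, theta)
recomposed = horizontal_composition(R_theta, R_not_theta)
```

Since the two parts are disjoint and together cover R, the union restores the original relation for every instance.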

Example 8

(Definition of a View). Assume we are given a database scheme that contains four relations: \(\mathbf R=(R_1,R_2,R_3,R_4)\). Assume that we want to define a view of a snapshot that is derived from the database by applying the following query, given in the format of relational algebra: \(\phi _{View}=(\pi _A R_1 \cup R_2) \bowtie (R_3-\sigma _\xi R_4)\). In this case, the corresponding translation scheme is:

$$\begin{aligned} \Phi _{View}=\langle x\approx x, \phi _{View}\rangle . \end{aligned}$$

4 Handling Queries Under Restructuring of Databases

In terms of translation schemes, the problem of handling queries under restructuring of databases may be paraphrased in the following way, see Fig. 2:

Given: Two different generations of the same database, say, \(\mathbf {R}^g\) and \(\mathbf {R}^{g+1}\). Additionally, we have two maps: \(\Phi _{g}\) and \(\Psi _{g}\), where \(\Phi _{g}\) produces \(\mathbf {R}^{g+1}\) from \(\mathbf {R}^g\) and \(\Psi _{g}\) is the corresponding reconstruction map. Finally, there is an application (translation scheme) \(\Phi _{g}^{app}\) on the \(g^{th}\) generation.

Find: An application (translation scheme) \(\Phi _{g+1}^{app}\) on the \((g+1)^{th}\) generation, such that: \(\Phi _{g+1}^{app*}(\Phi _{g}^*(\mathbf {R}^g))=\Phi _{g}^{app*}(\mathbf {R}^g).\)
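One naive candidate for \(\Phi _{g+1}^{app}\) is to run the old application on the reconstructed instance. The following Python sketch illustrates this under the assumption that the reconstruction is exact, i.e. that applying \(\Psi _{g}\) after \(\Phi _{g}\) returns the original instance. All function names, schemas and data are hypothetical.

```python
def project(rel, positions):
    """Projection onto the attributes at the given positions."""
    return {tuple(t[i] for i in positions) for t in rel}

def phi_g(p):
    """Restructuring: P(k, a, b) is split into R(k, a) and S(k, b)."""
    return project(p, (0, 1)), project(p, (0, 2))

def psi_g(r, s):
    """Reconstruction: P is recovered as the join of R and S on k."""
    return {(k, a, b) for (k, a) in r for (k2, b) in s if k == k2}

def app_g(p):
    """Application on generation g: a selection query on P."""
    return {t for t in p if t[2] == "nyc"}

def app_g1(r, s):
    """Candidate application on generation g+1: reconstruct, then run app_g."""
    return app_g(psi_g(r, s))

# Hypothetical instance in which k is a key, so the decomposition is lossless.
P = {(1, "ann", "nyc"), (2, "bob", "sfo"), (3, "eve", "nyc")}
same_result = app_g1(*phi_g(P)) == app_g(P)
```

This candidate still materializes the join before answering the query. Avoiding that step is precisely the motivation for looking for formulae that can be evaluated on the components separately.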

Fig. 2. Query on two different generations of database

Example 9

Assume that we are given database scheme \(\mathbf {R^g}=(R^g)\) and two restructurings, defined by the following pair of translation schemes:

  1. \(\mathbf {R^{g+1}}=(R^{g+1}_1,R^{g+1}_2)\), \(\Psi _{g}=(\psi ^{g})\) and \(\psi ^{g}=(R^{g+1}_1\bowtie R^{g+1}_2)\).

  2. \(\mathbf {R^{g+2}}=(R^{g+2}_1,R^{g+2}_2,R^{g+2}_3,R^{g+2}_4)\), \(\Psi _{g+1}=(\psi ^{g+1}_1,\psi ^{g+1}_2)\) and \(\psi ^{g+1}_1=(\pi _{A}R^{g+2}_1\cup R^{g+2}_2)\), \(\psi ^{g+1}_2=(R^{g+2}_3-\sigma _{\zeta }R^{g+2}_4)\).

Assume that \(Q_g\) is a simple modification query on \(\mathbf {R}^g\), which deletes tuples from \(R^g\) according to some condition \(\theta \), expressed in terms of \(R^g\). The set of deleted tuples is defined by \(\bigtriangledown _{\theta }R^g\) rather than given by enumeration. In such a case, we want to understand which tuples of which relations from \(\mathbf {R}^{g+2}\) must be deleted, or possibly updated in other ways, in order to produce the same output. Using substitutions, we obtain over \(\mathbf {R}^{g+2}\):

$$\begin{aligned} \bigtriangledown _{\Psi ^\#_{g+1} (\Psi ^\#_g(\theta ))} ((\pi _{A}R^{g+2}_1\cup R^{g+2}_2)\bowtie (R^{g+2}_3-\sigma _{\zeta }R^{g+2}_4)). \end{aligned}$$

From Example 9, we observe that the derived set of tuples, defined by \(\Psi ^\#_{g+1} (\Psi ^\#_g(\theta ))\), seems to be already in terms of \(\mathbf {R}^{g+2}\). However, the corresponding modification procedure cannot be directly presented in terms of updates of relations from \(\mathbf {R}^{g+2}\).

4.1 Handling of Queries over Disjoint Unions and Shufflings

The Disjoint Union (DJ) is the simplest example of juxtaposition, where none of the components are linked to each other. Assume we have a set of database schemes \(\mathbf {R}_{\imath }\) and we want to define a database scheme that represents their DJ. In this case, we add a so-called index scheme \(\mathbf {R}_{I}\), which specifies the parameters of the composition of the database schemes. The index scheme is a database scheme whose instances are used in combining disjoint databases into a single database.

Definition 5

(Disjoint Union). Let \(\mathbf {R}_{I}\) be a database scheme chosen as an index scheme \(\mathbf {R}_{I}=( R_1^{I},\ldots ,R_{\jmath ^I}^I)\) with domain I and \(\mathbf {R}_{\imath }=( R_1^{\imath },\ldots , R_{\jmath ^{\imath }}^{\imath })\) be a database scheme with domain \(D_{\imath }\). In the general case, the resulting database scheme \(\mathbf {R}={\bigsqcup }_{\imath \in I}\mathbf {R}_{\imath }\) with the domain \( I\cup \dot{\bigcup }_{\imath \in I}D_{\imath }\) will be

$$\begin{aligned} \mathbf {R}=( P(\imath ,x), Index(x), R^I_{j}(1\le j\le \jmath ^I), R^{\imath }_{j^i}(\imath \in I,1\le j^i\le \jmath ^{\imath })), \end{aligned}$$

where, for all \(\imath \in I\):

  • the instance of \(P(\imath ,x)\) in \(\mathbf {R}\) contains a tuple \((\imath ,x)\) iff x came from \(\mathbf {R}_{\imath }\);

  • the instance of Index(x) in \(\mathbf {R}\) contains x iff x came from I;

  • \(R^I_{j}(1\le j\le \jmath ^I)\) are from \(\mathbf {R}_{I}\) and

  • \(R^{\imath }_{j^i}(\imath \in I,1\le j^i\le \jmath ^{\imath })\) are from \(\mathbf {R}_{\imath }\).
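The construction of Definition 5 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the index scheme carries no relations of its own, component elements are tagged with their index label to make the domains disjoint, and the function name `disjoint_union` and the data are hypothetical.

```python
def disjoint_union(index, components):
    """index: set of labels I.
    components: dict mapping a label to (domain, {relation name: set of tuples}).
    Returns the domain and the relations of the disjoint union."""
    domain = set(index)
    P = set()
    relations = {}
    for i in index:
        dom_i, rels_i = components[i]
        for x in dom_i:
            domain.add((i, x))            # tagging keeps the components disjoint
            P.add((i, (i, x)))            # P(i, x): element x came from component i
        for name, tuples in rels_i.items():
            relations[(name, i)] = {tuple((i, x) for x in t) for t in tuples}
    relations["P"] = P
    relations["Index"] = {(i,) for i in index}   # Index(x): x came from I
    return domain, relations

# Hypothetical components: two small graphs indexed by "left" and "right".
components = {
    "left": ({1, 2}, {"E": {(1, 2)}}),
    "right": ({1}, {"E": {(1, 1)}}),
}
domain, relations = disjoint_union({"left", "right"}, components)
```

The element 1 occurs in both components, but after tagging it yields two different elements of the union, in line with the assumption that the component domains are disjoint.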

Now, we give the classical theorem for the DJ, cf. [19, 22].

Theorem 2

(Feferman-Vaught-Gurevich). Let \(\mathbf {R}_{I}\) be an index scheme with domain of size k and let \(\mathbf {R}={\bigsqcup }_{\imath \in I}\mathbf {R}_{\imath }\). For any FOL formula \(\varphi \) over \(\mathbf {R}\) there are:

  1. formulae of FOL \(\psi _{1,1},\ldots ,\psi _{1,j_1},\ldots , \psi _{k,1},\ldots ,\psi _{k,j_{k}}\)

  2. a formula of MSOL \( \psi _{I}\)

  3. a boolean function \(F_{\varphi }(b_{1,1},\ldots ,b_{1,j_1}, \ldots , b_{k,1}, \ldots ,b_{k,j_{k}}, b_{I})\)

with the formulae in 1-2 having the following property:

$$\begin{aligned} I({\mathbf {R}}_{\imath }) \models \psi _{\imath ,\jmath } \text{ iff } b_{\imath ,\jmath } =1 \text{, } \text{ and } I({\mathbf {R}_{I}}) \models \psi _{I} \text{ iff } b_{I} =1 \end{aligned}$$

and, for the boolean function of 3, we have

$$\begin{aligned} I({\mathbf {R}}) \models \varphi \text{ iff } F_{\varphi }(b_{1,1},\ldots ,b_{1,j_1},\ldots ,b_{k,1}, \ldots ,b_{k,j_{k}}, b_{I}) =1. \end{aligned}$$

Note that we require that \(F_{\varphi }\) and the \(\psi _{\imath ,\jmath }\)’s depend only on \(\varphi \), k and \(\mathbf {R}_1,\ldots ,\mathbf {R}_{k}\) but not on the instances involved.
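A concrete reduction sequence for one simple sentence can be verified by brute force. The Python sketch below is an illustration under stated assumptions and is not taken from the paper: the index set is fixed to two components, so no index formula is needed, and the sentence is \(\varphi =\exists x\exists y(x\ne y\wedge E(x,x)\wedge E(y,y))\). The local formulae are \(\psi _{\imath ,1}\) ("at least one loop") and \(\psi _{\imath ,2}\) ("at least two loops"), and the boolean function is \(F=b_{1,2}\vee b_{2,2}\vee (b_{1,1}\wedge b_{2,1})\). All function names are hypothetical.

```python
from itertools import combinations, product

def phi_on_union(e1, e2):
    """Evaluate phi directly on the disjoint union of two graphs."""
    union = {((1, x), (1, y)) for (x, y) in e1} | {((2, x), (2, y)) for (x, y) in e2}
    looped = {x for (x, y) in union if x == y}
    return len(looped) >= 2

def local_bits(edges):
    """Truth values of psi_1 (at least one loop) and psi_2 (at least two loops)."""
    n = sum(1 for (x, y) in edges if x == y)
    return n >= 1, n >= 2

def F(b11, b12, b21, b22):
    """Boolean function combining the local answers."""
    return b12 or b22 or (b11 and b21)

def all_edge_sets(n):
    """All edge relations over the domain {0, ..., n-1}."""
    pairs = list(product(range(n), repeat=2))
    return [set(c) for r in range(len(pairs) + 1) for c in combinations(pairs, r)]

# Check the reduction on every pair of graphs with at most two vertices each.
graphs = [e for n in (1, 2) for e in all_edge_sets(n)]
reduction_is_correct = all(
    phi_on_union(e1, e2) == F(*local_bits(e1), *local_bits(e2))
    for e1 in graphs for e2 in graphs
)
```

The local formulae and the boolean function are fixed before any instance is inspected, which reflects the requirement that they depend only on the sentence and the schemes and not on the instances involved.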

For the case of the DJ, we assume that the domains of the databases at each site are disjoint. However, as a rule, the values of certain attributes may appear at several sites. We can assume that the domain of the index scheme is fixed and known; however, we cannot (without additional assumptions) fix a finite number of one place predicates. This puts the main limitation on the use of Theorem 2. Moreover, even if \(\phi _{=}\) exists for some fixed database instance, it must be independent of the current content of the database and must be formulated syntactically in advance. In addition, it must be relatively small, as otherwise it causes an explosion in the size of the other formulae. Now, we apply the logical machinery.

Definition 6

(Partitioned Index Structure). Let \({\mathcal I}\) be an index structure over \(\tau _{ind}\). \({\mathcal I}\) is called finitely partitioned into \(\ell \) parts if there are unary predicates \(I_{\alpha }\), \(\alpha < \ell \), in the vocabulary \(\tau _{ind}\) of \({\mathcal I}\) such that their interpretation forms a partition of the universe of \({\mathcal I}\).

In addition to the DJ, one may produce a new structure by shuffling.

Definition 7

(Shuffle over Partitioned Index Structure). Let \({\mathcal A}_i, i \in I\) be a family of structures such that for each \(i \in I_{\alpha }\): \({\mathcal A}_i \cong {\mathcal B}_{\alpha }\). In this case, we say that \(\biguplus _{\alpha < \beta }^{{\mathcal I}}{\mathcal B}_{\alpha }\) is the shuffle of \({\mathcal B}_{\alpha }\) along the partitioned index structure \({\mathcal I}\).

We generalize Theorem 2 by introducing abstract preservation properties in the following way:

Definition 8

(Preservation Properties with Fixed Index Set). For two logics \(\mathcal{L}_1\) and \(\mathcal{L}_2\) we define Preservation Property for Disjoint Union

  • Input of operation: Indexed set of structures;

  • Preservation Property: if for each \(i\in I\) (index set) \({\mathcal A}_i\) and \({\mathcal B}_i\) satisfy the same sentences of \(\mathcal{L}_1\) then the disjoint unions \(\bigsqcup _{i\in I}{\mathcal A}_i \) and \(\bigsqcup _{i\in I}{\mathcal B}_i \) satisfy the same sentences of \(\mathcal{L}_2\).

  • Notation: DJ-\(PP(\mathcal{L}_1, \mathcal{L}_2)\)

Definition 9

(Preservation Properties with Variable Index Structures). For two logics \(\mathcal{L}_1\) and \(\mathcal{L}_2\) we define Preservation Properties for Shuffle

  • Input of operation: A family of structures \({\mathcal B}_{\alpha }: \alpha < \beta \) and a (finitely) partitioned index structure \({\mathcal I}\) with \(I_{\alpha }\) a partition.

  • Preservation Property: Assume that for each \(\alpha < \beta \) the pair of structures \({\mathcal A}_{\alpha }, {\mathcal B}_{\alpha }\) satisfy the same sentences of \(\mathcal{L}_1\), and \({\mathcal I}, {\mathcal I}\) satisfy the same MSOL-sentences. Then the shuffles \(\biguplus _{\alpha < \beta }^{{\mathcal I}}{\mathcal A}_{\alpha } \) and \(\biguplus _{\alpha < \beta }^{{\mathcal I}}{\mathcal B}_{\alpha }\) satisfy the same sentences of \(\mathcal{L}_2\).

  • Notation: Shu-\(PP(\mathcal{L}_1, \mathcal{L}_2)\) (FShu-\(PP(\mathcal{L}_1, \mathcal{L}_2)\))

Now, we list which Preservation Properties hold for which logics.

Theorem 3

Let \({\mathcal I}\) be an index structure and \(\mathcal{L}\) be any of FOL, \(FOL^{m,k}\), \(L_{\omega _1, \omega }^{\omega }\), \(L_{\omega _1, \omega }^{k}\), \(MSOL^m\), \(MTC^m\), \(MLFP^m\), or \(FOL[{\mathbf Q}]^{m,k}\) (\(L_{\omega _1, \omega }[{\mathbf Q}]^{k}\)) with unary generalized quantifiers. Then DJ-\(PP(\mathcal{L}, \mathcal{L})\) and FShu-\(PP(\mathcal{L}, \mathcal{L})\) hold. Note that this includes DJ-\(PP(FOL^{m,k}, FOL^{m,k})\) and FShu-\(PP(FOL^{m,k}, FOL^{m,k})\) with the same bounds for both arguments, and similarly for the other logics.

Proof

  • The proofs for FOL and MSOL are classical, see in particular [6]. Extension for \(FOL^{m,k}\) can be done directly from the proof for FOL.

  • The proof for MLFP was given in [4].

  • The proof was given in [8].

  • The proof was given in [30].

Recall that, in analyzing Example 9, we decided that from \(\theta \) of \(\bigtriangledown _{\Psi ^\#_{g+1} (\Psi ^\#_g(\theta ))} ((\pi _{A}R^{g+2}_1\cup R^{g+2}_2)\bowtie (R^{g+2}_3-\sigma _{\zeta }R^{g+2}_4))\) we want to derive formulae \(\theta _1^R,\dots ,\theta _{\imath }^R\) over R and \(\theta _1^S,\dots ,\theta _{\jmath }^S\) over S, which we then apply to R and S respectively. We now formulate this requirement more formally:

Definition 10

(Reduction Sequence). Let \({\mathcal I}\) be a finitely partitioned \(\tau _{ind}\)-index structure and \(\mathcal{L}\) be a logic.

Let \({{\mathcal A}} = \biguplus _{\alpha < \beta }^{\varvec{\mathcal {I}}} {\mathcal B}_{\alpha }\) be the \(\tau \)–structure which is the finite shuffle of the \(\tau _{\alpha }\)-structures \({\mathcal B}_{\alpha }\) over \(\varvec{\mathcal {I}}\) or another combination of the components. An \(\mathcal{L}_1\)-reduction sequence for shuffling for \(\phi \in {\mathcal{L}_2}(\tau _{shuffle})\) is given by

  1. a boolean function \(F_{\phi }(b_{1,1},\ldots ,b_{1,j_1}, \ldots , b_{\beta ,1}, \ldots ,b_{\beta ,j_{\beta }}, b_{I,1},\ldots ,b_{I,j_{I}})\)

  2. set \(\Upsilon \) of \({\mathcal{L}_1}\)–formulae \(\Upsilon =\{\psi _{1,1},\ldots ,\psi _{1,j_1},\ldots , \psi _{\beta ,1},\ldots ,\psi _{\beta ,j_{\beta }}\}\)

  3. MSOL–formulae \(\psi _{I,1},\ldots ,\psi _{I,j_{I}}\)

and has the property that for every \({{\mathcal A}}\), \({\mathcal I}\) and \({{\mathcal B}}_{\alpha }\) as above with \({{\mathcal B}}_{\alpha } \models \psi _{\alpha ,j}\) iff \(b_{\alpha ,j} =1\) and \({{\mathcal B}}_{I} \models \psi _{I,j}\) iff \(b_{I,j} =1\) we have

$$\begin{aligned} {{\mathcal A}} \models \phi \text{ iff } F_{\phi }(b_{1,1},\ldots ,b_{1,j_1},\ldots ,b_{\beta ,1}, \ldots ,b_{\beta ,j_{\beta }}, b_{I,1},\ldots ,b_{I,j_I}) =1. \end{aligned}$$

Note that we require that \(F_{\phi }\) and the \(\psi _{\alpha ,j}\)’s depend only on \(\phi \),\(\beta \) and \(\tau _1,\ldots ,\tau _{\beta }\) but not on the structures involved.
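The idea of a reduction sequence can be illustrated on a deliberately small case. The following is only a sketch under simplifying assumptions, not the algorithm of Theorem 4: it uses a plain disjoint union of two components instead of a shuffle, a vocabulary with one unary predicate P, and the sentence \(\phi \) "at least two elements satisfy P". Each component is represented just by the set of its elements satisfying P. The component sentences \(\psi _{\alpha ,1}\) ("at least one P-element") and \(\psi _{\alpha ,2}\) ("at least two P-elements") are evaluated locally, and a boolean function combines their truth values. All function names are hypothetical.

```python
from itertools import combinations

def disjoint_union(components):
    """Tag every element with its component index so the universes stay disjoint."""
    return {(i, x) for i, comp in enumerate(components) for x in comp}

def at_least(k, p_elements):
    """Truth value of 'at least k elements satisfy P' for a given set of P-elements."""
    return len(p_elements) >= k

def F_phi(b11, b12, b21, b22):
    """Boolean function for phi = 'at least two P-elements' on a two-component union."""
    return b12 or b22 or (b11 and b21)

def phi_via_reduction(comp1, comp2):
    # Evaluate psi_{alpha,1} and psi_{alpha,2} on each component separately ...
    b11, b12 = at_least(1, comp1), at_least(2, comp1)
    b21, b22 = at_least(1, comp2), at_least(2, comp2)
    # ... and combine the truth values without looking at the union itself.
    return F_phi(b11, b12, b21, b22)

def phi_direct(comp1, comp2):
    # Evaluate phi on the disjoint union for comparison.
    return at_least(2, disjoint_union([comp1, comp2]))

def all_subsets(universe):
    """All subsets of a small universe, used for an exhaustive comparison."""
    items = list(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]
```

In this sketch \(F_{\phi }\) and the component sentences depend only on \(\phi \) and on the number of components, not on the concrete structures, which mirrors the requirement stated in Definition 10.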

The following theorem partially answers the question of Example 9.

Theorem 4

Let \(\mathcal{L}\) be any of FOL, \(FOL^{m,k}\), \(L_{\omega _1, \omega }^{\omega }\), \(L_{\omega _1, \omega }^{k}\), \(MSOL^m\), \(MTC^m\), \(MLFP^m\), or \(FOL[{\mathbf Q}]^{m,k}\) with unary generalized quantifiers. There is an algorithm which, for given \(\mathcal{L}\), \(\tau _{ind}\), \(\tau _{\alpha }, \alpha < \beta \), \(\tau _{shuffle}\) and \(\phi \in \mathcal{L}(\tau _{shuffle})\), produces a reduction sequence for \(\phi \) for \((\tau _{ind}, \tau _{shuffle})\)-shuffling. However, \(F_{\phi }\) and the \(\psi _{\alpha ,j}\) are tower exponential in the quantifier rank of \(\phi \). Furthermore, F depends on the MSOL–theory of the index structure restricted to the same quantifier rank as \(\phi \).

Proof

By analyzing the proof of Theorem 3.

Note that Theorem 4 is not true for all logics as shown in [30].

4.2 Handling Queries Over \(\Phi \)–Sum

Combining Disjoint Unions and Shuffles with translation schemes, we can reach a very large set of useful structures. In this section, we present our new results in the field. We extend the classical Theorem 2 and the more recent Theorems 3 and 4 to the cases in which translation schemes are involved in the process of constructing the desired structure from the Disjoint Unions and Shuffles.

Definition 11

(\(\Phi \)–Sum for extensions of FOL). Let \({\mathcal I}\) be a finitely partitioned index structure and \(\mathcal{L}\) be any of FOL, MSOL, MTC, MLFP, or FOL with unary generalized quantifiers. Let \({{\mathcal A}} = {\bigsqcup }_{i \in I} {{\mathcal A}}_i\) or \({{\mathcal A}} = \biguplus _{\alpha < \beta }^{\varvec{\mathcal {I}}} {\mathcal B}_{\alpha }\) be a \(\tau \)–structure, where each \({{\mathcal A}}_i\) is isomorphic to one of \({{\mathcal B}}_1, \ldots ,{{\mathcal B}}_{\beta }\) over the vocabularies \(\tau _1,\ldots ,\tau _{\beta }\), in accordance with the partition.

For \(\Phi \) a scalar (non–vectorized) \(\tau \)–\(\sigma \) \({\mathcal{L}}\)–translation scheme, the \(\Phi \)–sum of \({{\mathcal B}}_1, \ldots ,{{\mathcal B}}_{\beta }\) over I is the structure \(\Phi ^*({{\mathcal A}})\), or rather any structure isomorphic to it.

Theorem 5

Let \({\mathbf {R}}_I\) be a finitely partitioned index database scheme, \(\mathcal{L}\) be any of FOL, MSOL, MTC, MLFP, or FOL with unary generalized quantifiers. Let \({{\mathbf {R}}}\) be the \(\Phi \)–sum of \({\mathbf {R}}_{{{\mathcal B}}_1},\) \( \ldots ,\) \({\mathbf {R}}_{{{\mathcal B}}_{\beta }}\) over I, as above. For every \(\varphi \in {\mathcal{L}}(\tau )\) there are

  1. a boolean function \(F_{\Phi , \varphi }(b_{1,1},\ldots ,b_{1,j_1},\ldots , b_{\beta ,1}, \ldots ,b_{\beta ,j_{\beta }},b_{I,1},\ldots ,b_{I,j_I})\)

  2. \(\mathcal{L}\)–formulae \(\psi _{1,1},\) \(\ldots ,\) \(\psi _{1,j_1},\) \(\ldots ,\) \(\psi _{\beta ,1},\) \(\ldots ,\) \(\psi _{\beta ,j_{\beta }}\)

  3. and MSOL–formulae \(\psi _{I,1},\) \(\ldots ,\) \(\psi _{I,j_I}\)

such that for every \({{\mathbf {R}}}\), \({\mathbf {R}}_I\) and \({\mathbf {R}}_{{{\mathcal B}}_{\imath }}\) as above with \({\mathbf {R}}_{{{\mathcal B}}_{\imath }} \models \psi _{\imath ,\jmath }\) iff \(b_{\imath ,\jmath } =1\) and \({\mathbf {R}}_{I} \models \psi _{I,\jmath }\) iff \(b_{I,\jmath } =1\) we have

$$\begin{aligned} {{\mathbf {R}}} \models \varphi \text{ iff } F_{\Phi , \varphi }(b_{1,1},\ldots ,b_{1,j_1},\ldots ,b_{\beta ,1}, \ldots ,b_{\beta ,j_{\beta }},b_{I,1},\ldots ,b_{I,j_I}) =1. \end{aligned}$$

Moreover, \(F_{\Phi , \varphi }\) and the \(\psi _{\imath ,\jmath }\) are computable from \(\Phi ^\#\) and \(\varphi \), but are tower exponential in the quantifier depth of \(\varphi \) (see Footnote 1).

Proof

By analyzing the proof of Theorem 4 and using Theorem 1.

Finally, we obtain our main result concerning the handling of queries under restructuring of databases:

Theorem 6

Let I be an index, \(\mathcal L\) be FOL (or rather any language for which Theorem 5 holds), and let \({\mathbf R}^{g+1}\) be the generalized sum of \({\mathbf R}^{g+1\prime }_1,\) \( \ldots ,\) \({\mathbf R}^{g+1\prime }_{\ell }\) over I, as usual. Let \(\Phi _g\), \(\Psi _g\) and \(\Phi ^{up}_g\) of the logic \(\mathcal L\) be as above. Any query \(\Phi ^{app}_g\) over \({\mathbf R}^{g}\) gives the corresponding query \(\Phi ^{app}_{g+1}\) over \({\mathbf R}^{g+1}\), where \(\Phi _{g+1}^{app}=\Phi _{g}^{app}(\Psi _{g})\) and each \(\varphi _{g+1,i}^{app}\) in \(\Phi _{g+1}^{app}\) may be computed with the help of the corresponding boolean function \(F_{ \{\Phi _g,\Psi _g,\Phi ^{app}_g\}, \varphi _{g+1,i}^{app}}\) \((b_{1,1}\),\(\ldots \),\(b_{1,j_1}\),\(\ldots \),\( b_{\ell ,1}\), \(\ldots \),\(b_{\ell ,j_{\ell }}\),\(b_{I,1}\),\(\ldots \),\(b_{I,j_I})\) as in Theorem 5.

5 Handling Queries Under Database Updates

Assume that we have a database scheme \(\mathbf R\) and a query (translation scheme) \(\Phi _{View}\), which defines the view. Assume that \(\mathbf R\) was updated by translation scheme \(\Phi ^{up}\). In terms of translation schemes, we obtain the following formulation:

Given: Translation scheme \(\Phi _V\), and the database update \(\Phi ^{up}_{DB}\).

Find: A set of view updates \(\Phi ^{incr}_V\) that uses the old content of the view and deletes from and inserts into the view some set of tuples defined on the source database.

This leads to the situation in Fig. 3, where \(\Phi ^{incr}_V\) uses both the database and the old view. For queries defined in relational algebra and for updates given as deletion and insertion of an (undefined) set of tuples, the question was investigated in [28]. For Datalog, the answer for the same kind of updates is given in [12]. However, these techniques were defined for specific languages. Moreover, the update operations used in both cases are data changes: the sets of tuples that are inserted into or deleted from relations are not defined by a formula, but given by enumeration.

Fig. 3. Incremental view maintenance under update.

Recently, in [21], the dynamic problem was introduced in the following way. For a sequence \(w = \sigma _ 1, \dots , \sigma _m \in \bigtriangleup ^*_{can} (\tau )\) of operations (update translation schemes like \(\Phi ^{up}_{DB}\) in our formulation) and a structure \({\mathcal U}\), \(w({\mathcal U})\) is the result of successively applying the operations to \({\mathcal U}\) (\(DB^{new}\)), and \(w({\mathcal U})={\mathcal U}\) (\(DB^{old}\)) if \(w = \epsilon \).

Definition 12

([21]). Let S be a Boolean query on \(\tau \)-structures. The dynamic problem \({\mathcal D}(S)\) associated with S is the set of pairs \({\mathcal D}= ({\mathcal U}, w)\) where \({\mathcal U}\in Fin(\tau )\) and \(w \in \bigtriangleup ^*_{can} (\tau )\) is an update sequence with \(w({\mathcal U}) \in S\). The query S is called the underlying static problem of \({\mathcal D}(S)\).

Dynamic problems are handled by incremental evaluation systems. These systems allow auxiliary relations over the universe of the input structure \({\mathcal U}\). An Incremental Evaluation System (IES) for a dynamic problem \({\mathcal D}(S)\) consists of a set of logical interpretations (translation schemes) and an additional logical sentence \(\varphi \). Given an initial structure \({\mathcal U}\), the IES defines auxiliary relations over the universe of \({\mathcal U}\) by an interpretation called the initial interpretation.
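To make the role of auxiliary relations concrete, here is a small sketch that is not taken from [21]: it uses reachability in a directed graph as the static problem, which is an assumption made purely for illustration. The auxiliary relation is the transitive closure of the edge relation; it is computed once from the initial structure and then updated after each edge insertion by a rule that reads only the old closure and the inserted edge. The function names are hypothetical.

```python
def transitive_closure(edges):
    """Initial auxiliary relation: naive fixed-point computation of the closure."""
    closure = set(edges)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def insert_edge(edges, closure, a, b):
    """Update the edge relation and the auxiliary closure after inserting (a, b).

    A pair (x, y) becomes reachable exactly if x already reached a (or x is a)
    and b already reached y (or y is b), so no recomputation from scratch is needed.
    """
    new_edges = edges | {(a, b)}
    sources = {x for (x, y) in closure if y == a} | {a}
    targets = {y for (x, y) in closure if x == b} | {b}
    new_closure = closure | {(x, y) for x in sources for y in targets}
    return new_edges, new_closure

def reachable(closure, s, t):
    """The Boolean query is answered from the auxiliary relation alone."""
    return (s, t) in closure
```

The update rule is a first-order condition over the old auxiliary relation and the parameters a and b, which is the kind of definable update the following paragraphs are concerned with.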

In practice, we are interested in update operations that are definable; we call them relational updates. Indeed, as a rule, a regular query that deletes data from (inserts data into) a database looks like: delete from relation R all tuples, such that ...

Let us use one example from [28] for our purposes and paraphrase it in the following way:

Example 10

Given database scheme \(\mathbf {R}=(R_1,R_2,R_3,R_4)\) and \(\Phi _{V}=(\phi )\), where

$$\begin{aligned} \phi =(\pi _{A}R_1\cup R_2)\bowtie (R_3-\sigma _{\zeta }R_4). \end{aligned}$$

Suppose a database update causes a set of tuples \(\bigtriangledown _{\theta }R_4\) to be deleted, where \(\theta \) is a formula that defines the set of tuples to be deleted.

The update changes only one relation and its translation scheme is: \(\Phi _{DB}^{up}=(R_1(x_1,\ldots ,x_{n_1}),R_2(x_1,\ldots ,x_{n_2}), R_3(x_1,\ldots ,x_{n_3}), R_4(x_1,\ldots ,x_{n_4})\wedge \) \(\lnot \theta (x_1,\ldots ,x_{n_4})),\) where \(\theta \), in general, contains parameters.

In terms of FOL, the query that defines the view is:

figure a

After the update, made by \(\Phi _{DB}^{up}\), the query is:

figure b

First, we show:

$$\begin{aligned}&((R_4(x_1,\ldots ,x_{n_3})\wedge \lnot \theta )\wedge \zeta )=\\&(R_4(x_1,\ldots ,x_{n_3})\wedge \zeta \wedge \lnot \zeta )\vee (R_4(x_1,\ldots ,x_{n_3})\wedge \zeta \wedge \lnot \theta )=\\&(R_4(x_1,\ldots ,x_{n_3})\wedge \zeta )\wedge (\lnot \theta \vee \lnot \zeta )=\\&(R_4(x_1,\ldots ,x_{n_3})\!\wedge \!\zeta \wedge \lnot R_4(x_1,\ldots ,x_{n_3}))\vee ((R_4(x_1,\ldots ,x_{n_3})\!\wedge \!\zeta )\wedge (\lnot \theta \vee \lnot \zeta ))=\\&(R_4(x_1,\ldots ,x_{n_3})\wedge \zeta )\wedge (\lnot R_4(x_1,\ldots ,x_{n_3})\vee \lnot \zeta \vee \lnot \theta )=\\&(R_4(x_1,\ldots ,x_{n_3})\wedge \zeta )\wedge \lnot (R_4(x_1,\ldots ,x_{n_3})\wedge \zeta \wedge \theta ). \end{aligned}$$
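The first and the last line of this chain can be checked mechanically. The sketch below treats \(R_4(x_1,\ldots ,x_{n_3})\), \(\theta \) and \(\zeta \) as three propositional atoms evaluated at one fixed tuple (an assumption that is harmless here, because every step of the derivation is propositional) and compares both sides on all eight truth assignments. The function names are hypothetical.

```python
from itertools import product

def lhs(r4, theta, zeta):
    # ((R4 and not theta) and zeta): the selection zeta applied to the updated R4
    return (r4 and not theta) and zeta

def rhs(r4, theta, zeta):
    # (R4 and zeta) and not (R4 and zeta and theta): old selection minus deleted part
    return (r4 and zeta) and not (r4 and zeta and theta)

def equivalent():
    """True if both sides agree on every truth assignment of the three atoms."""
    return all(lhs(*v) == rhs(*v) for v in product([False, True], repeat=3))
```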

Now, we use the equivalence, obtained above, for \(\Phi _{DB}^{up\#}(\phi (x_1,\ldots ,x_{n_3}))\):

figure c

The second part of \(\Phi _{DB}^{up\#}(\phi (x_1,\ldots ,x_{n_3}))\) is exactly

$$\begin{aligned} ((\pi _{A}R_1\cup R_2)\bowtie (R_3\cap \sigma _{\zeta }\bigtriangledown _{\theta }R_4)), \end{aligned}$$

if written in relational algebra notation.
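The propagation rule of Example 10 can be tried out on concrete data. The following is a minimal sketch, assuming hypothetical schemas that are not given in the text: \(R_1\) over (A, X), \(R_2\) over (A), and \(R_3\), \(R_4\) over (A, B), with relations stored as Python sets of tuples. It computes the view after the deletion in two ways: by full recomputation on the updated \(R_4\), and incrementally as the old view together with \((\pi _{A}R_1\cup R_2)\bowtie (R_3\cap \sigma _{\zeta }\bigtriangledown _{\theta }R_4)\). The helper names and the sample conditions are assumptions.

```python
def project_a(r1):
    """pi_A: keep only the A-component of each (A, X) tuple."""
    return {(a,) for (a, _x) in r1}

def select(rel, cond):
    """sigma_cond: tuples of rel satisfying the condition."""
    return {t for t in rel if cond(t)}

def join_on_a(left, right):
    """Natural join of a unary relation over A with a binary relation over (A, B)."""
    return {(a, b) for (a, b) in right if (a,) in left}

def view(r1, r2, r3, r4, zeta):
    """phi = (pi_A R1 union R2) join (R3 minus sigma_zeta R4)."""
    return join_on_a(project_a(r1) | r2, r3 - select(r4, zeta))

def incremental_view(old_view, r1, r2, r3, r4, zeta, theta):
    """View after deleting the theta-tuples from R4, built from the old view."""
    deleted = select(r4, theta)  # the set of deleted tuples defined by theta
    delta = join_on_a(project_a(r1) | r2, r3 & select(deleted, zeta))
    return old_view | delta

def zeta(t):   # sample selection condition (assumption)
    return t[1] <= 30

def theta(t):  # sample deletion condition (assumption)
    return t[0] == 1

R1 = {(1, "p"), (2, "q")}
R2 = {(3,)}
R3 = {(1, 10), (2, 20), (3, 30), (4, 40)}
R4 = {(1, 10), (3, 30), (5, 50)}

old = view(R1, R2, R3, R4, zeta)
new_incremental = incremental_view(old, R1, R2, R3, R4, zeta, theta)
new_recomputed = view(R1, R2, R3, R4 - select(R4, theta), zeta)
```

On this sample data both computations give the same relation; the incremental one only touches the tuples selected by \(\theta \).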

Example 10 shows that the only tools we actually used to obtain the new propagation rules were logical equivalences. Note additionally that, in general, any update translation scheme \(\Phi ^{up}=(\phi _1,\ldots ,\phi _i,\ldots ,\phi _n), \) which deletes (inserts) tuples, according to condition \(\theta \), from (to) relation \(R_i\) of database scheme \(\mathbf {R}=(R_1,\ldots ,R_i,\ldots ,R_n)\) has the form: \(\phi _j=R_j\) if \(i\not =j\) and \(\phi _i=(R_i\wedge \lnot \theta )\) (or \(\phi _i=(R_i\vee \theta )\) for insertion of tuples, described by \(\theta \)), without relativization but parametrized.

The following proposition generalizes the example:

Proposition 1

For any formula \(\xi \) of FOL, MSOL or SOL and for any update translation scheme \(\Phi ^{up}\) of the same logic, it holds: \(\Phi ^{up\#}(\xi )=\xi \) or there is a set of formulae \(\xi ^{\prime }_i\), \(1\le i\le n\) of the same logic, such that \(\Phi ^{up\#}(\xi )= (\ldots ((\xi \circ _1 \xi ^{\prime }_1)\circ _2 \xi ^{\prime }_2)\ldots \circ _n \xi ^{\prime }_n)\), where \(\circ _i\in \{\wedge ,\vee \}\).

Proof

By induction on \(\xi \).
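As a worked instance of Proposition 1 (a sketch under the assumption that \(\chi \) is a formula in which \(R_i\) does not occur; the symbol \(\chi \) is introduced here only for illustration), take \(\xi =(R_i\wedge \chi )\). For the deletion update \(\phi _i=(R_i\wedge \lnot \theta )\) and for the insertion update \(\phi _i=(R_i\vee \theta )\), substituting \(\phi _i\) for \(R_i\) gives, respectively:

$$\begin{aligned}&\Phi ^{up\#}(\xi )=((R_i\wedge \lnot \theta )\wedge \chi )=((R_i\wedge \chi )\wedge \lnot \theta )=(\xi \wedge \lnot \theta );\\&\Phi ^{up\#}(\xi )=((R_i\vee \theta )\wedge \chi )=((R_i\wedge \chi )\vee (\theta \wedge \chi ))=(\xi \vee (\theta \wedge \chi )). \end{aligned}$$

In both cases \(n=1\): for deletion \(\circ _1=\wedge \) and \(\xi ^{\prime }_1=\lnot \theta \); for insertion \(\circ _1=\vee \) and \(\xi ^{\prime }_1=(\theta \wedge \chi )\).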

To show the same fact for LFP, IFP and TC, we use:

Theorem 7

Given \(\psi _1(\bar{x},X,\bar{y})\) and \(\psi _2(\bar{x},X,\bar{z})\), it holds:

$$\begin{aligned}&LFP \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y})\vee \psi _2(\bar{x},X,\bar{z}))= LFP \bar{x},X,\bar{u}(LFP \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y}))\vee \psi _2(\bar{x},X,\bar{z}));\\&IFP \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y})\vee \psi _2(\bar{x},X,\bar{z}))= IFP \bar{x},X,\bar{u}(IFP \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y}))\vee \psi _2(\bar{x},X,\bar{z}));\\&TC \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y})\vee \psi _2(\bar{x},X,\bar{z}))= TC \bar{x},X,\bar{u}(TC \bar{x},X,\bar{u}(\psi _1(\bar{x},X,\bar{y}))\vee \psi _2(\bar{x},X,\bar{z})). \end{aligned}$$

The same holds for \(\wedge \) as well.

Proof

The proof follows directly from the semantics of LFP, IFP and TC.
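For the transitive closure with disjunction, the statement can be checked on finite binary relations. The sketch below covers only this case and makes simplifying assumptions: \(\psi _1\) and \(\psi _2\) are given extensionally as sets of pairs, and there are no additional parameters or second-order variables. The function name `tc` and the sample relations are hypothetical.

```python
def tc(edges):
    """Transitive closure of a finite binary relation by fixed-point iteration."""
    closure = set(edges)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

# Sample relations standing in for psi_1 and psi_2 (assumptions for illustration).
psi1 = {(1, 2), (2, 3)}
psi2 = {(3, 4), (4, 1)}

# TC(psi1 or psi2) compared with TC(TC(psi1) or psi2).
left = tc(psi1 | psi2)
right = tc(tc(psi1) | psi2)
```

The two sides agree because \(\psi _1\vee \psi _2\) is contained in \(TC(\psi _1)\vee \psi _2\), which in turn is contained in \(TC(\psi _1\vee \psi _2)\), and taking the transitive closure is monotone and idempotent.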

Now, it remains to combine Proposition 1 and Theorem 7 with the following results, proven in [14]:

Theorem 8

If \(\varphi \) is an LFP-formula and \(\varphi ^{\prime }\) is an IFP-formula then there is a first-order formula \(\psi \), such that \(\varphi \) is equivalent to \(\exists (\forall )\bar{u}^{\prime } LFP \bar{x},X,\bar{u}\psi \) and there is an existential first-order formula \(\psi ^{\prime }\), such that \(\varphi ^{\prime }\) is equivalent to \(\exists (\forall )\bar{u}^{\prime } IFP \bar{x},X,\bar{u}\psi ^{\prime }\).

Theorem 9

Suppose that we have two constants c and d and in our model \(c\not =d\). Let \(\varphi \) be an existential pos-TC-formula. Then \(\varphi \) is equivalent to a formula of the form: \(TC\bar{x},\bar{x}^{\prime },c,d\psi (\bar{x},\bar{x}^{\prime })\), where \(\psi \) is a first-order quantifier-free formula.

Finally, we obtain our main result concerning the handling of queries under database updates:

Theorem 10

Every query expressible in FOL, MSOL, SOL, LFP, IFP and existential pos-TC allows incremental view re-computation.

Proof

Use Proposition 1 with Theorems 7, 8 and 9.

As I-\(DATALOG\equiv IFP\) and on ordered databases LFP(TC) covers polynomial time (logarithmic space) computations, we, in particular, have:

Corollary 1

  1. Every I-DATALOG program allows incremental re-computation.

  2. On ordered databases every program, computable in polynomial time or logarithmic space, allows incremental re-computation.

6 Discussion and Conclusions

The paper introduces a unified logic-based approach to the maintenance of queries under database changes and shows how known transfer results for translation schemes can be applied to particular problems in database maintenance. This approach unifies different aspects, related to both schema and data evolution in databases, into a single framework. The basic underlying notion of a logical translation scheme and its induced maps is based on the classical syntactic notion of interpretability from logic, made explicit by M. Rabin in [29].

In analyzing computations on different generations of databases with our general technique, we encountered several problems with some kinds of rules, for example, deletion over join. Systematically using the technique of translation schemes, we introduced the notion of \(\Phi \)-sums and showed how queries expressible in different extensions of FOL may be handled over different generations of the \(\Phi \)-sums.

Moreover, using the technique of translation schemes, we introduced the notion of incremental view re-computation. We proved that every query expressible in FOL, MSOL, SOL, LFP, IFP and existential pos-TC allows incremental view re-computation. This result leads to the corollary that every I-DATALOG program allows incremental re-computation. Moreover, it follows from our main results that, on ordered databases, every program computable in polynomial time or logarithmic space allows incremental re-computation.