Keywords

1 Introduction

Datalog, a declarative logic programming language, has many applications in a variety of domains such as deductive databases [17], data integration [12], program analysis [4, 11], bidirectional programming [21], and so forth. Verifying Datalog programs plays an essential role to guarantee the properties of these programs required by the applications. When a property is not satisfied, it is more important to reduce the user’s burden in debugging the unexpected behavior of the program.

This kind of debugging problem, which arises when a property of a program is not satisfied, has not been well studied for Datalog. There are two challenges in practice. The first challenge is searching for a concrete input database, i.e., a counterexample that reveals the unexpected behavior of the program. The second challenge is locating the buggy Datalog rules that break the property. By adopting the algorithmic debugging method [7], a few approaches were proposed for debugging Datalog programs [5, 6, 14]. However, the existing approaches neither provide users a way to specify the properties of Datalog programs nor generate counterexamples to show the incorrectness of the programs. To locate a bug, these approaches ask the users many questions about the computation correctness of the Datalog program. In other words, the users have to find out whether the Datalog program has unintended interpretations, e.g., the intention is not met by the program results. Identifying such unintended interpretations becomes costly when the input database of the program is not small.

An ideal approach to debugging would allow the user to specify the program’s properties and automatically run all the checks. The properties of a program are commonly specified by a set of assertions such as equalities, domain constraints, containments, and so forth. For Datalog, which is a logic programming language in relational databases, it is intuitive for programmers to specify the assertions in the forms of relational predicates. For example, one may consider that some relations of the Datalog program must be equivalent or some relations must be empty, i.e., the corresponding predicates are always false.

We illustrate with the following example the property specifications and the debugging problem of Datalog programs.

Fig. 1.
figure 1

Motivating example. The unexpected tuple and the buggy rule are highlighted.

Example 1 (Motivating Example: View Update Strategy)

In this example, we consider an application of Datalog in describing view update strategies [21]. Suppose that we are given a database of two base relations \(s_1(A, B)\) and \(s_2(A, B)\) (Fig. 1) with a view v(AB) defined over these two relations by a union query: \(v = get({s_1, s_2}) = s_1 \cup s_2\). The following is a buggy Datalog program (denoted as putdelta) that describes a view update strategy, i.e., a description about how to update the base relations \(s_1\) and \(s_2\) through the view v.

figure a

In putdelta, for a relation, \(\varDelta ^+\) and \(\varDelta ^-\) denote the insertion and deletion sets on the relation, respectively. Rules (r1) and (r2) state that if a tuple \(\langle X,Y \rangle \) is in \(s_1\) or \(s_2\) but not in v, it will be deleted from \(s_1\) or \(s_2\), respectively. Rule (r3) checks the tuples in v but not in \(s_1\), and stores these tuples in a mediate relation m. The last rule states that if a tuple \(\langle X,Y \rangle \) is in m but not in \(s_2\), it will be inserted into \(s_1\). putdelta takes as input the states of \(s_1, s_2\), and v to produce the delta relations of \(s_1\) and \(s_2\).

Such a putback program putdelta is required to satisfy round-tripping properties to maintain the consistency of view updates, as formulated in the existing works [10, 21]. Here, we illustrate the problem with the property (called GetPut) that in the input of putdelta, if the view is unchanged, i.e., \(v = s_1 \cup s_2\), the output of putdelta must be empty. We use first-order logic sentences (Fig. 1) to specify the constraints of the input (called precondition) and the constraints over the output (called postcondition).

Figure 1 shows a counterexample of GetPut that is a collection of tuples in the source tables and the view (\(s_1, s_2, v\)). Over this counterexample, the result of putdelta is \(\varDelta ^-_{s_1} = \varDelta ^-_{s_2} = \emptyset \) and \(\varDelta ^+_{s_1} = \{\langle b_2, a_2 \rangle \}\). That means tuple \(\langle b_2, a_2 \rangle \) is inserted into \(s_1\). This insertion is not expected by the postcondition. Since the input of putdelta satisfies the precondition but the output does not satisfy the postcondition, the GetPut property of putdelta is violated.

The user may wonder why tuple \(\langle b_2, a_2 \rangle \) of \(\varDelta ^+_{s_1}\) occurs unexpectedly in the output of putdelta. From this unexpected tuple, the problem now is to detect which rules in the original Datalog program are the causes. Here, in the head of rule (r3), the variables X and Y are placed in the wrong positions and thereby some wrong tuples are derived. This bug must be fixed to make putdelta satisfy the GetPut property.    \(\square \)

We believe that for a required property of a Datalog program, the user may not only have unexpected mistakes such as typos but also have wrong intentions that do not conform to the property. Providing suggestions on how to correct the program is very useful to users but is a challenging issue. In addition, debugging is an ambiguous process that there are many possible causes for a bug. Therefore, it is essential to design an interface that lets users interact with the underlying debugging engine. For example, the user can mark suspicious rules to inspect or decide how to proceed for the bug ambiguity.

The key insight of this paper is that counterexamples play a central role in debugging Datalog programs. First, a program is buggy if and only if a counterexample exists. Second, to be useful for debugging the Datalog program, a counterexample is expected to be a realistic and simple database.

Our approach is statically generating such a counterexample rather than dynamically testing the program with randomly generated test cases as in other works such as [3]. Over the generated counterexample, bugs can be observed in the execution results of the Datalog program. Although data provenance techniques from the database literature [16] can provide useful support to explain how and why the unexpected results are derived, whether we can use this provenance information to efficiently track down the detailed source of bugs remains unclear. In this paper, we fulfill this gap by a novel method that combines the provenance information with the user interaction for resolving the ambiguity in debugging. In summary, this paper has the following contributions:

  • We present a new way to use a syntactic extension of non-recursive Datalog for specifying the properties of a Datalog program.

  • To explain to the user the behavior of the written Datalog program, we develop a counterexample generator that statically checks specified properties of non-recursive Datalog programs and generates counterexamples for showing why the properties are not satisfied.

  • To reduce the user’s effort of correcting buggy Datalog programs, we design a user interface and a provenance-based debugging engine to assist the user in locating the bugs with the counterexamples. The debugging engine provides correction hints to the user when the bugs are found.

  • To demonstrate the efficiency and the usability of the proposed approach, we have implemented a prototype of the approach and evaluated it with Datalog programs in practice. The source code is available upon request.

The paper is organized as follows. Section 2 gives some background about the Datalog language with syntax extensions. In Sect. 3, we explain the design of our proposed counterexample generation method. We describe the counterexample-guided debugging approach in Sect. 4 and the experiment in Sect. 5. Section 6 presents related works. Section 7 wraps up the paper.

2 Background

A pure Datalog program is a finite set of logical rules, and each rule is an expression of the form [9]:

where \(r_0, r_1, \ldots , r_n\) are relations, “” is a variant of the standard logical implication “\(\leftarrow \)” from the rule body in the right-hand side to the rule head in the left-hand side. Each \(\textit{\textbf{X}}_i\) (\(i\in [0,n]\)) is a tuple of variables. Each variable occurring in \(\textit{\textbf{X}}_0\) must occur in at least one of \(\varvec{X_1}, \ldots , \varvec{X_n}\) in the body.

Fig. 2.
figure 2

Counterexample generation architecture

The relations in a Datalog program are divided into two categories:

  • EDB relations, which are physically stored in a relational database, called extensional database (EDB). These relations are the input of the program.

  • IDB (intensional database) relations, which are derived from the EDB relations using the Datalog program. An IDB relation occurs in some rule heads while an EDB relation can never be in the head of a rule. An IDB relation is recursive if it appears in both the head and the body of a rule. A Datalog program is non-recursive if it has no recursive IDB relation.

We can extend Datalog by allowing negations and built-in predicates such as equality (\(=\)) or comparison (\(<,>\)) in Datalog rule bodies but in a safe way that each variable occurring in the negated atoms or the built-in predicates must also occur in some positive atoms [9]. Throughout the paper, we refer Datalog to the Datalog language with the extensions of safe negation and built-in predicates.

Let P be a Datalog program and D be the database of all the EDB and IDB relations. A tuple \(\textit{\textbf{A}}\) in r, or a fact \(r(\textit{\textbf{A}})\), is immediately inferred from P and D if it satisfies one of the following conditions:

  • \(\textit{\textbf{A}} \in r\), where r is an EDB relation.

  • is an instantiation of a rule in P, i.e., all variables in the rule are substituted with constants. Here, a negative fact \(\lnot r_i(\textit{\textbf{A}}_i)\) holds if the fact \(r_i(\textit{\textbf{A}}_i)\) does not hold, i.e., \(\textit{\textbf{A}}_i\) is not a tuple of \(r_i\) in D. This is based on the Closed World Assumption (CWA) [9].

Semantically, evaluating P is computing the minimum database D such that every tuple in D is immediately inferred from D and P. In other words, we compute the least fixpoint of the immediate inference operator. In the standard bottom-up evaluation strategy for Datalog, the least fixpoint is obtained from P and the input EDB database by deriving all IDB tuples with a finite number of immediate inferences. To deal with negations in the Datalog program, the Datalog program is stratified to ensure that all the tuples of an IDB relation are derived before using any negative facts of this IDB relation in other immediate inferences. This is because if an IDB relation is incomplete, it is not sufficient to judge a negative fact of the IDB relation. The sequence of immediate inferences used for deriving a fact is called a proof of the fact and can be represented in a proof tree with different levels of the applied rules and facts.

3 Counterexample Generation

In this section, we present our approach to statically validating and generating counterexamples for a specified property of a non-recursive Datalog program.

Figure 2 shows our counterexample generation architecture. It consists of two main parts: a validator for statically checking the specified property and a counterexample generator for finding a counterexample for the property. The Datalog program with its property specification is first passed to the validator. If the validator successfully proves that the program satisfies the property, we conclude there is no counterexample. If the validator fails, the Datalog program is passed to the counterexample generator. Since many static checks such as equivalence for Datalog programs are undecidable [19], in both the validator and generator, we transform the property of the Datalog program into logical constraints that can be solved by an SMT solver, even though the termination is not guaranteed.

3.1 Specifying Program Properties

As mentioned previously, rather than introducing a new language, our approach is to use the same language to specify properties of a non-recursive Datalog program using preconditions and postconditions. By following the syntax introduced in [8, 21], we allow Datalog rules to have truth constant false (denoted as \(\bot \)) in the head. In this way, a precondition, as well as a postcondition, is a set of Datalog rules that have the following form:

figure b

That means \(\forall \textit{\textbf{X}}, (r_1(\textit{\textbf{X}}_1) \wedge \ldots \wedge r_n(\textit{\textbf{X}}_n)) \rightarrow \bot \), where \(\textit{\textbf{X}}\) are all the free variables.

Example 2

Consider the GetPut property in Example 1, which says that if there is no change to the view v, there is no change to the base tables \(s_1\) and \(s_2\). We use non-recursive Datalog to specify the precondition as follows:

The first two rules store the union of \(s_1\) and \(s_2\) in a mediate relation \(v^{old}\), and the last two rules indicate that v is the same as \(v^{old}\), i.e., the view does not change. And we can specify the postcondition that there is no change to the base tables as follows.

3.2 Validation

We use an SMT solver to prove the specified property of the Datalog program by translating the property into a first-order logic (FO) sentence. If there is a proof such that the FO sentence is valid, the property is satisfied.

Our transformation from non-recursive Datalog to first-order logic is based on the standard transformation [2, 9]. Let P be a non-recursive Datalog program, we inductively transform each relation r in P and the rules of the precondition and the postcondition into an equivalent FO formula \(\varphi _r\) as follows:

If r is an EDB relation, \(\varphi _{r} = r(\varvec{X_r}) = r(X_1,\ldots ,X_{arity(r)})\).

If r is an IDB relation, i.e., r occurs in the head of m rules:

The FO formula of r, if considering only the i-th rule, is \(\varphi _{r,i}(\varvec{X_r}) = \exists \textit{\textbf{E}}_i, \bigwedge \limits _{j=1}^{n_i} \beta _{{i,j}}\), where \(\textit{\textbf{E}}_i\) contains the bound variables of the i-th rule, i.e., the variables not in the rule head, and

By combining all the rules of r, we have:

$$\begin{aligned} \varphi _{r}(\varvec{X_r}) = \bigvee \limits _{i = 1}^m \varphi _{r,i}(\varvec{X_r}) = \bigvee \limits _{i = 1}^m \left( \exists \textit{\textbf{E}}_i, \bigwedge \limits _{j=1}^{n_i} \beta _{{i,j}}\right) \end{aligned}$$

By having the first-order formulas of all the IDB relations, each special Datalog rule of (*), which has \(\bot \) in the head in the precondition and postcondition, is transformed into a first-order sentence: \(\forall \textit{\textbf{X}}, (\varphi _{r_1}(\textit{\textbf{X}}_1) \wedge \ldots \wedge \varphi _{r_n}(\textit{\textbf{X}}_n)) \rightarrow \bot \). The precondition, as well as the postcondition, is a conjunction of all its FO sentences transformed from the special Datalog rules.

Let \(\varphi _{pre}\) and \(\varphi _{post}\) be the first-order sentences of the precondition and the postcondition, respectively. We employ an automated theorem prover to prove whether \(\varphi _{post}\) holds if \(\varphi _{pre}\) holds. In other words, we check whether the following first-order sentence is valid: \(\varphi _{pre} \rightarrow \varphi _{post}\).

3.3 Generating Counterexamples

As mentioned previously, to assist the user in debugging a specified property, we shall generate counterexamples, which are used to guide the user to the location of bugs. The simpler the counterexamples are, the easier the user can succeed in debugging the program.

To generate a counterexample, our idea is to create a symbolic database and transform the evaluation of the Datalog program over the symbolic database with the specified property into a constraint program in Rosette [20]. The Rosette symbolic execution runtime translates the program into logical constraints that are performed by an underlying SMT solver such as Z3 [1]. The result obtained by the Rosette framework is an interpretation of the symbolic input over which the specified property of the Datalog program is violated.

Fig. 3.
figure 3

Transformation from Datalog to functions

To put it more concretely, we construct a symbolic input of the source and view tables by representing each table as a list of tuples, each tuple is a list, where each element is a symbolic value. The order and the duplicates of tuples are ignored because a relation is a set of tuples rather than a list. Considering Example 1, assuming that the types of attributes A and B are integer and real, respectively, we define a symbolic table v as follows (similarly for \(s_1\) and \(s_2\)).

figure c

Since string values are not supported in the underlying SMT solvers, in our transformation, we use an integer symbol for a string attribute. A value for this integer symbol will be mapped to a string value by using a predefined dictionary, where the integer value is used as an index to determine the corresponding string value. In other words, we build up a partial bijective function that maps an integer value to a string in the dictionary. Since the dictionary has finite words, we limit the values of a string attribute to be in the predefined dictionary. For example, for a relation r(S : string), we define a symbolic tuple as the following:

figure d

The assertion in the second line ensures that the value of \(s_1\) is in the index range of the dictionary.

We evaluate a non-recursive Datalog program over a symbolic input by using four functions: Cartesian product, Filter, Map, and Concat. Figure 3 illustrates the steps for evaluating a relation r. For each rule of r, we first take a cartesian product over all positive relations in the rule body and then apply a filter (Filter\(_1\)) for the join attributes, a filter (Filter\(_2\)) for all built-in predicates, and another filter (Filter\(_3\)) for the negative relations. Over the tuples resulted from these tree filters, we use a mapping function to select the attributes appearing in the rule headFootnote 1. If r is defined by multiple rules, we evaluate r in each rule and concatenate all the resulted tuples. For a non-recursive Datalog program, which has many IDB relations, we can inductively evaluate all the IDB relations in the program.

Example 3

For the first rule in Fig. 3, we take a cartesian product of the two positive relations s and u. The result is first filtered by Filter\(_2\) to select only tuples, where the second attribute of s agrees with the first attribute of u, i.e., \(Y_s=Y_u\). Filter\(_2\) is applied to select the tuples satisfying \(X>1\). Filter\(_3\) checks whether there exists a tuple \(\langle X_t, Z_t \rangle \) in t that agrees with the attributes \(X_s\) and \(Z_u\) in the tuples resulted from Filter\(_2\). The mapping function takes a projection over the three-dimension tuples and results in two-dimension tuples. Function Concat gets all the tuples computed by the two rules.    \(\square \)

We now turn to encode the property that is specified by the precondition and the postcondition. Recall that the precondition, as well as the postcondition, is a set of Datalog rules having constant \(\bot \) in the head. To encode these Datalog rules into Rosette constraints, we first replace \(\bot \) with a normal predicate, named \(\emptyset _{pre}\) for the precondition and \(\emptyset _{post}\) for the postcondition, and then encode the evaluation of the obtained Datalog rules into functions as presented previously. These two relations, \(\emptyset _{pre}\) and \(\emptyset _{post}\), are both expected to be empty. With the evaluation of \(\emptyset _{pre}\) and \(\emptyset _{post}\) over the symbolic input presented previously, we first encode the precondition into an assertion that the length of table \(\emptyset _{pre}\) is equal to 0 as the following:

figure e

We then add another assertion that the length of table \(\emptyset _{post}\) is greater than 0 to solve the constraint on the symbolic input that the precondition is satisfied but the postcondition is violated:

figure f
figure g

Algorithm 1 summarizes the main steps in our proposed counterexample generation. Starting from 0, we increase the maximum size, denoted as n, of each input EDB table. With a value of n, we construct n symbolic tuples for each EDB table. We encode the specified property by constructing assertions corresponding to the precondition and the postcondition. We input these assertions to the Rosette framework [20] to find a value for each symbol in the input that the precondition is satisfied but the postcondition is not. If it succeeds, we stop the while loop, instantiate all the EDB symbolic tables, and eliminate duplicates. Otherwise, we continue the loop with an increased value of n.

4 Interactively Locating Bugs with Counterexamples

In this section, we present our method for interactively debugging a non-recursive Datalog program with counterexamples. Our approach consists of a user interface and an underlying debugging engine that assists the user in determining the location of bugs that cause the unexpected behavior of the program.

4.1 Checking Counterexamples

As presented in the previous section, a counterexample is an instance of the input database of the Datalog program such that the property, which is specified by the precondition and the postcondition, is not satisfied. Given an instance of the input database, to check whether the property is violated, we evaluate the output and check whether the input satisfies the precondition and the output does not satisfy the postcondition. Recall that both the precondition and the postcondition are written in Datalog rules with a constant \(\bot \) in the head. We check these conditions by replacing \(\bot \) with \(\emptyset _{pre}(\textit{\textbf{X}})\)/\(\emptyset _{post}(\textit{\textbf{X}})\) for the precondition/postcondition, where \(\textit{\textbf{X}}\) are variables in the rule body, and evaluating the obtained Datalog rules. The specified property is violated if \(\emptyset _{pre}\) is empty but \(\emptyset _{post}\) is not empty. Any tuple appearing in \(\emptyset _{post}\) is the symptom of the unexpected behavior of the Datalog program with respect to the specified property.

Example 4

Consider the putdelta program with an input database in Example 1 and its GetPut property specified in Example 2. To check GetPut, we check the emptiness of \(\emptyset _{pre}\) and \(\emptyset _{post}\) in the following rules:

Fig. 4.
figure 4

Strata-based sequentialization.

Clearly, in the result, there is no tuple in \(\emptyset _{pre}\) but there is a tuple \(\langle b_2, a_2 \rangle \) in \(\emptyset _{post}\). Therefore, GetPut is violated.

4.2 Dialog-Based User Debugging Interface

Given a counterexample, the debugging problem is to locate the buggy Datalog rules that cause the symptom that the output is faulty. It is extremely ambiguous to determine the locations of bugs since there may be many possible reasons for a fault in the output. Therefore, we allow the user to be involved in the debugging process by designing a dialog-based interface that asks the user to confirm and choose relevant options to handle the ambiguity occurring in the debugging process.

Since Datalog is a declarative programming language, the computation is not explicitly described in the Datalog program. Rather than constructing the computation tree or graph from the Datalog program as in other existing works [5, 6, 14], we shall sequentialize the Datalog program to construct an order of the rules for the evaluation. In other words, we partition the original Datalog program into a sequence of smaller parts, where the final output of the program is obtained by evaluating these parts one by one in the order defined by the sequence. Similarly, we also sequentialize Datalog rules of the postcondition, where the head \(\bot \) is replaced by \(\emptyset _{post}\).

To construct a partition \(\{P_1, P_2, \ldots , P_n\}\) of a Datalog program P, we use the well-known stratification method for Datalog [9] simplified for the case that there is no recursion in the Datalog program. Specifically, we use the precedence graph defined as the following.

Definition 1

The precedence graph \(G_P\) of a Datalog program P is a directed graph, where nodes are the IDB relations of P and edges are relation dependencies: if or is a rule in P, then \(\langle r',r\rangle \), which represents that \(r'\) precedes r, is an edge in \(G_P\).

For a precedence graph, we assign to each node, which is a relation, all the rules of the relation. The rules in each node in the precedence graph form a stratum. We assign to each stratum a unique position such that if stratum \(P_i\) precedes stratum \(P_j\) in the precedence graph, then \(i < j\). Clearly, each stratum in the graph can be evaluated only after all its preceding stratums are evaluated.

Figure 4 shows a program P, which is partitioned into n parts \(P_1, P_2, \ldots , P_n\), and postcondition rules, which are partitioned into m parts \(\varSigma _1, \ldots , \varSigma _m\). The input of P, which consists of EDB relations, is the input for the first part \(P_1\). We evaluate the output of P by evaluating each part individually that the output of \(P_{i-1}\) (IDB\(_{i-1}\)) becomes the input of \(P_i\) (EDB\(_i\)) for every part \(P_i\). Similarly, the output of P is the input of the postcondition rules. By evaluating \(\varSigma _1, \ldots , \varSigma _m\) in this order, we obtain \(\emptyset _{post}\).

Any tuple unexpectedly appearing in \(\emptyset _{post}\) indicates that the specified property is violated. From this fault symptom, the debugging process is to analyze how the data is changed after each stratum to detect which stratum contains the bugs. In the input/output of a stratum, there are two types of faulty tuples: wrong tuples, which unexpectedly appear, and missing tuples, which cannot be computed as expected. For example, all the tuples in \(\emptyset _{post}\) are wrong. This is caused by wrong or missing tuples in the input of \(\varSigma _m\), i.e., the output of \(\varSigma _{m-1}\).

For each stratum \(P_i\), if there is a wrong/missing tuple in the output of \(P_i\) (IDB\(_i\)), we have two possible reasons: \(P_i\) contains the buggy rules; or the input of \(P_i\), which is the output of \(P_{i-1}\), contains wrong/missing tuples.

Since the root cause of the property violation is in the original Datalog program P, only \(P_1, P_2, \ldots , P_n\) need to be inspected. Meanwhile, the stratums of the postcondition rules, \(\varSigma _1, \ldots , \varSigma _m\), do not need to be inspected. They are used to detect faulty tuples in the output of P. Our underlying debugging engine automatically predicts the possible faults in the input of each stratum \(\varSigma _i\). In this way, the possible faults in the output of P are detected without user interaction.

The user interaction is allowed when the underlying debugging engine inspects the stratums from \(P_n\) to \(P_1\). At each stratum \(P_i\), when having a faulty tuple in the output of \(P_i\), we let the user confirm and choose one of the two reasons for diagnosing the bugs by questioning the user about the validity of IDB\(_{i-1}\), i.e., the input of \(P_i\). Specifically, we evaluate all the stratums preceding \(P_i\) to obtain IDB\(_{i-1}\) and use the faulty output of \(P_i\) (IDB\(_i\)) to predict faulty tuples in IDB\(_{i-1}\). On one hand, if the user confirms that IDB\(_{i-1}\) is valid, the underlying engine will suspect \(P_i\) to infer possible buggy rules. On the other hand, if the user finds suspiciousness in IDB\(_{i-1}\), the underlying engine will infer possible wrong/missing tuples in IDB\(_{i-1}\) assuming \(P_{i}\) is correct, and then question the user to confirm the relevant faulty tuples.

Fig. 5.
figure 5

Debugging interaction example.

Example 5

Figure 5 illustrates a debugging session for the putdelta program and its GetPut property shown in Example 1. Here, putdelta is stratified into four parts, \(P_1, P_2, P_3, P_4\), corresponding to the four rules defining the four IDB relations in the program. There is only one stratum \(\varSigma _1\) for the postcondition rules.    \(\square \)

4.3 Debugging Engine

We now present our underlying debugging engine that generates debugging details for the dialog-based user interaction and performs the debugging process based on the user’s choices. Specifically, the debugging engine traverses all the stratums from the last one to the first one. At each stratum \(P_i\), the debugging engine predicts possible faults in the input of the stratum that cause the faults observed in the output of the stratum and lets the user confirm and choose one fault. If the user confirms the input of \(P_i\) is correct, the engine suspects \(P_i\). In contrast, if the user chooses one fault, the engine goes to the preceding stratum \(P_{i-1}\) for inspecting.

Assuming that the rules in the stratum are correct, and there is a faulty (wrong or missing) tuple in the output of the stratum, we predict faulty tuples in the input of the stratum based on the provenance information of the faulty tuple in the output that is how it is derived or how it is not derived.

For a wrong tuple in the output of the stratum, its provenance can be explained by constructing all the proof trees that are used by the stratum to derive the tuple. In our stratification strategy, each stratum contains only rules of an IDB relation. Therefore, the maximum height of the proof trees of wrong output tuples is 1. If a wrong tuple does not belong to the IDB relation, it is derived directly from the same wrong tuple in the input of the stratum. In contrast, if a wrong tuple belongs to the IDB relation, it is derived by an immediate inference with rules in the stratum, thus its proof trees have height 1. The proof trees can be extracted from the standard bottom-up evaluation strategy [9] of Datalog by assembling all the immediate inferences.

Example 6

Considering the putdelta program in Example 4 and its stratification in Fig. 5, the provenance of tuple \(\langle b_2, a_2 \rangle \) of \(\emptyset _{post}\) in the output of the last stratum is explained by the following proof tree:

where \(\varDelta ^+_{s_1}(b_2, a_2)\) is explained by the previous stratum as the following:

   \(\square \)

From the constructed proof trees, we detect all the faulty tuples in the input that must be changed to make the wrong tuples in the output disappear. For a wrong tuple, which is derived directly from the same tuple in the input of the stratum, we conclude this tuple in the input of the stratum is wrong. For a wrong tuple derived by the rules of the stratum, all the proof trees of this tuple must be deconstructed by changing the facts used in these proof trees.

Let w be the IDB relation defined in a stratum \(P_i\), and \(w(\textit{\textbf{A}}_0)\) be a wrong tuple in the output of \(P_i\). A proof tree of \(w(\textit{\textbf{A}}_0)\) has the following form:

Here, we apply the rule with the facts \((\lnot )r_1(\textit{\textbf{A}}_1), \ldots , (\lnot )r_n(\textit{\textbf{A}}_n)\) to infer \(w(\textit{\textbf{A}}_0)\). Since \(w(\textit{\textbf{A}}_0)\) is derived if all the facts \((\lnot )r_1(\textit{\textbf{A}}_1)\), ..., and \((\lnot )r_n(\textit{\textbf{A}}_n)\) hold, changing one of \((\lnot )r_1(\textit{\textbf{A}}_1), \ldots , (\lnot )r_n(\textit{\textbf{A}}_n)\) is sufficient to make \(w(\textit{\textbf{A}}_0)\) not derived, and thus correct \(w(\textit{\textbf{A}}_0)\). In other words, \(w(\textit{\textbf{A}}_0)\) is wrong because one of the facts \((\lnot )r_1(\textit{\textbf{A}}_1), \ldots , (\lnot )r_n(\textit{\textbf{A}}_n)\) is wrong. We exclude facts that are from EDB relations because the EDB database is not computed by the Datalog program. We raise a question to the user interface to let the user confirm and choose one wrong tuple. This is repeatedly performed for each proof tree of each wrong tuple in the output of \(P_i\).

Remark 1

A fact \(\lnot r(\textit{\textbf{A}})\) is wrong iff \(r(\textit{\textbf{A}})\) is missing. This follows from the closed world assumption (CWA).

A missing tuple, which is not derived in the output of a stratum, is explained by any proof tree that fails to be constructed. The failed proof tree cannot be completed because of some facts that are required but do not hold. As presented previously, in our stratification strategy, each stratum contains only rules of an IDB relation that the proof trees of a tuple have maximum height 1. A proof tree, which has height 1, is constructed by instantiating a rule in the stratum. To avoid constructing an infinite number of proof trees that are not related to the context of the Datalog program, as other approaches [16], we restrict the Datalog program to its active domain, which is the set of all constants appearing in the EDB relations and the program. Specifically, only values in the active domain are used to instantiate a rule. In this way, we obtain a finite number of proof trees for a tuple in the output.

We detect the faulty tuples in the input that cause a missing tuple in the output as follows. If the missing tuple does not belong to the IDB relation defined by the rules in the stratum, we conclude it is missing in the input of the stratum. In contrast, we construct a proof tree of the missing tuple by instantiating a rule in the stratum and then find all the facts not holding in the rule body. Clearly, these faulty facts explain the missing tuple in the output of the stratum. In this way, by constructing all the proof trees, we enumerate all possible faults in the input and raise a question to the user for choosing the most suitable fault. To reduce the number of possible faults, we also prefer the smaller faults to the bigger ones. A fault is smaller if the number of faulty facts in the fault is smaller. The smaller a fault is, the more easily it can be fixed.

We have predicted all the faults (wrong and missing tuples) in the input of a stratum based on the assumption that the rules in the stratum are correct. At the user interface level, we have raised questions to the user to confirm the faults in the input that cause the faulty tuples in the output. Since a stratum contains only rules of an IDB relation, named \(r_i\), changing the rules in the stratum can only correct the faulty tuples of \(r_i\) in the output. Therefore, for the faulty tuples of \(r_i\), if in the input, there is no possible fault or the user confirms no predicted fault is suitable, we can conclude that the rules in the stratum contain the bugs and start inspecting the stratum’s rules.

Given a faulty tuple in the output of a stratum and assuming that all the tuples in the input are correct, the problem is to determine which rules of \(r_i\) are wrong or whether a rule is missing. For a wrong tuple in the output, to locate the corresponding buggy rules, we use the wrong tuple’s proof trees constructed before. Specifically, all the rules applied in these proof trees are wrong since they must be changed to make the wrong tuple disappear in the output. For a missing tuple in the output, the user has two ways to fix the rules for producing the missing tuple. The first option is changing one of the rules in the stratum so that it can produce the missing tuple. The second option is adding to the stratum a new rule that can be applied to derive the missing tuple.

To assist the user in correcting the buggy rules in the stratum, we give the user correction hints by showing the proof trees of the faulty tuples and showing the input and the output expected for adding/changing the rules. To be efficient, at each stratum, we show all these observations to the users for finding the cheapest way to correct all the bugs found.

Fig. 6.
figure 6

Debugging demonstration.

Example 7

We illustrate our debugging approach by considering the putdelta program in Example 1 with another property, called PutGet [21], specified as follows. There is no rule for the precondition, and the postcondition is:

figure h

That means if we apply delta relations, \(\varDelta ^\pm _{s_1/s_2}\) obtained from the putdelta program, to the source relations, \(s_1\) and \(s_2\), and calculate the view \(v^{new}\) again, we expect \(v^{new}\) to be the same as the initial view v. Let us consider a counterexample of PutGet as the following: \(s_1=\{\langle a_1, b_1 \rangle \}\), \(s_2 = \emptyset \), \(v = \{\langle a_1, b_1 \rangle , \langle a_2, b_2 \rangle \}\). Over this counterexample, the result of putdelta is: \(\varDelta ^-_{s_1} = \varDelta ^-_{s_1} = \emptyset \), \(\varDelta ^+_{s_1} = \{\langle b_2, a_2 \rangle \}\). Thus, \(v^{new} = \{\langle a_1, b_1 \rangle , \langle b_2, a_2 \rangle \}\), leading to that \(\emptyset _{post} = \{\langle a_2, b_2 \rangle , \langle b_2, a_2 \rangle \}\) in the rules (r10) and (r11). Therefore, the PutGet property is violated.

Figure 6 illustrates how the causes of the wrong tuples \(\emptyset _{post} (a_2, b_2)\) and \(\emptyset _{post} (b_2, a_2)\) are predicted. Here, the putdelta program is stratified into \(P_1\), \(P_2\), \(P_3\), \(P_4\) and the PutGet precondition is stratified into \(\varSigma _1\), \(\varSigma _2\), \(\varSigma _3\), \(\varSigma _4\).

For the wrong tuple \(\emptyset _{post} (b_2, a_2)\), by using its proof trees at each stratum of \(\varSigma _1, \varSigma _2, \varSigma _3\) and \(\varSigma _4\), we have wrong tuples \(v^{new}(b_2, a_2)\), \(s_1^{new}(b_2, a_2)\), \(s_1^{new}(b_2, a_2)\), and \(\varDelta ^+_{s_1}(b_2, a_2)\), respectively. Since stratum \(\varSigma _2\) does not contain any rules defining \(s_1^{new}\), the wrong tuple \(s_1^{new}(b_2, a_2)\) in the output of \(\varSigma _2\) is simply derived from this wrong tuple \(s_1^{new}(b_2, a_2)\) in the input of \(\varSigma _2\).

For the wrong tuple \(\emptyset _{post} (a_2, b_2)\), at stratum \(\varSigma _4\), we predict a wrong fact \(\lnot v^{new}(a_2, b_2)\) in the input of \(\varSigma _4\). That means \(v^{new}(a_2, b_2)\) is missing. At stratum \(\varSigma _3\), there are two possible proof trees corresponding to rules (r8) and (r9), respectively. Therefore, there are two possible causes of \(v^{new}(a_2, b_2)\): \(s_1^{new}(a_2, b_2)\) is missing or \(s_2^{new}(a_2, b_2)\) is missing. We continue to predict the causes of each of these tuples \(s_1^{new}(a_2, b_2)\) and \(s_2^{new}(a_2, b_2)\). Eventually, some predicted causes are invalid. For example, at \(\varSigma _2\), the cause of the missing tuple \(s_2^{new}(a_2, b_2)\) is a missing tuple \(s_2(a_2, b_2)\) which cannot be fixed because \(s_2\) is an EDB relation. There is only one valid cause: \(\varDelta ^+_{s_1}(a_2, b_2)\) is missing.

Table 1. Debugging results. indicates that the property is satisfied.

After predicting the faults in the output of \(P_4\), i.e., the output of the putdelta program, the user interaction is triggered. At stratum \(P_4\), assuming \(P_4\) is correct, the cause of the wrong tuple \(\varDelta ^+_{s_1}(b_2, a_2)\) is a wrong tuple \(m(b_2, a_2)\) and the cause of the missing tuple \(\varDelta ^+_{s_1}(a_2, b_2)\) is a missing tuple \(m(a_2, b_2)\). Here, a question of confirming whether \(m(b_2, a_2)\) is wrong and whether \(m(a_2, b_2)\) is missing is raised to the user interface. If the user confirms there is no faulty tuple, the debugging engine will inspect \(P_4\); in contrast, it goes to stratum \(P_3\). For inspecting \(P_4\), since there is only one rule (r4) that is used in the proof tree of \(\varDelta ^+_{s_1}(b_2, a_2)\) and \(\varDelta ^+_{s_1}(a_2, b_2)\), (r4) is a buggy rule. For \(P_3\), because no fault in the input of \(P_3\) is predicted, the engine inspects \(P_3\) without user interaction. Interestingly, both the choices of inspecting \(P_4\) or going to \(P_3\) can detect the bug that can be solved. Specifically, changing m(XY) in (r4) to m(YX) can make \(\varDelta ^+_{s_1}(b_2, a_2)\) disappear and make \(\varDelta ^+_{s_1}(a_2, b_2)\) appear in the output, and thus PutGet satisfied. Similarly, changing m(YX) in (r3) to m(XY) can also correct the program.    \(\square \)

5 Implementation and Experiment

We have implemented a prototype for our debugging approach in Ocaml and integrated it with Rosette [20] and Z3 [1] as the SMT solvers for our counterexample generation. The user can interact with our system via a command-line tool. By the tool, the user can start a debugging session with a counterexample which is automatically generated by the tool or given by the user.

To evaluate our approach, we use non-recursive Datalog programs collected in [21]. These programs are written for implementing practical view update strategies that are required to be well-defined (called the DeltaDis property) and satisfy the round-tripping properties, i.e., GetPut and PutGet, with the corresponding view definitions to guarantee the consistency between the views and the source tables. We randomly add bugs to these programs and run an experiment to evaluate the performance of our approach in debugging these programs. Specifically, we measure the time for generating counterexamples, the size of the generated counterexamples, and the number of questions used to ask the user for locating the bugs. The experiment is performed on a computer of 2 CPUs and 4 GB RAM running Ubuntu Server LTS 16.04. We set up a timeout of 1 min for generating counterexamples.

Table 1 summarizes the results of our experiment. The time for generating counterexamples and the size of counterexamples almost increase against the number of rules in the program and the specified properties. The generating time also depends on the difficulty of the bugs and the complexity of Datalog rules. For example, phonelist has a smaller generating time than koncerty because the rules of phonelist are more straightforward. products has a bigger generating time than purchaseview because PutGet is usually more complex than GetPut. For vehicle_view, the counterexample generator does not terminate after the maximum allowed running time. The results show that the number of questions used in locating bugs is usually small. This number depends on the complexity of the program and the difficulty of the bugs. Some simple programs such as luxuryitems have no question, meanwhile, some bigger programs such as all_cars and koncerty, which contain more bugs or more user-written rules, need more questions with the user interaction to find the buggy rules.

6 Related Work

Algorithmic debugging [18], also known as declarative debugging, is a semi-automatic debugging technique that is based on the answers of the programmer to a series of questions generated automatically by the algorithmic debugger. Due to its abstraction level, this technique is relevant to declarative programming languages such as Datalog. Some approaches [5, 6, 14] have been proposed to apply algorithmic debugging to Datalog. These existing approaches can assist the user after a fault (i.e., a counterexample) is detected but suffer from the well-known scalability problems of algorithmic debugging [7] that more user interaction is required in the debugging process. In our approach, we strengthen the algorithmic debugging technique applied to non-recursive Datalog by statically generating minimum-size counterexamples for the debugging process. We exploit provenance techniques [13, 15, 16] to automatically predict the root causes of the observed faults of the Datalog programs for reducing the human effort of answering the questions raised by the algorithmic debugger.

7 Conclusion

In this paper, we have presented a novel debugging approach to non-recursive Datalog programs. Our framework assists users in checking and generating counterexamples for the programs with properties prespecified by users and then uses counterexamples to guide the users to the location of bugs via a dialog-based interface. The experimental results show the performance of our approach.