1 Introduction

Meta-interpretive learning (MIL) [4, 6, 25] is a form of inductive logic programming (ILP). MIL uses second-order Horn clauses, called metarules, as a form of declarative bias [28]. Metarules define the structure of learnable programs, which in turn defines the hypothesis space. For instance, to use MIL to learn the grandparent/2 relation given the parent/2 relation, the chain metarule would be suitable:

$$\begin{aligned} P(A,B) \leftarrow Q(A,C), R(C,B) \end{aligned}$$

The letters P, Q, and R denote existentially quantified second-order variables (i.e. variables that can be bound to predicate symbols), and the letters A, B, and C denote universally quantified first-order variables (i.e. variables that can be bound to constant symbols). Given this metarule, the background parent/2 relation, and examples of the grandparent/2 relation, MIL uses a Prolog meta-interpreter to generate a proof of the examples by finding substitutions for the second-order variables. In this scenario, MIL could perform the substitutions {P/grandparent, Q/parent, R/parent} to induce the theory:

$$\begin{aligned} grandparent(A,B) \leftarrow parent(A,C), parent(C,B) \end{aligned}$$

Many ILP systems, such as BLIP [7], Clint [29], DIALOGS [9], MOBAL [14], and MIL-HEX [13], use metarules (or variants of them). Non-ILP program induction systems, such as ProPPR [32], SYNTH [1], and DILP [8], also use variants of metarules. However, despite their widespread use, there is little work determining which metarules to use for a given task. Instead, suitable metarules are typically assumed to be given as part of the background knowledge.

In MIL, deciding which metarules to use is a trade-off between efficiency and expressivity. The hypothesis space increases given more metarules (Theorem 3.3), so we wish to use fewer metarules. But if we use too few metarules then we lose expressivity. For instance, it is impossible to learn the grandparent/2 relation using only monadic metarules. To address this issue, Cropper and Muggleton [3] used Progol’s entailment-reduction algorithm [23] to identify irreducible, or minimal, sets of metarules. Their approach removed entailment redundant clauses from sets of metarules, where a clause C is entailment redundant in a clausal theory \(T \cup \{C\}\) when \(T \models C\). To illustrate this form of redundancy, consider the clausal theory:

$$\begin{aligned} \begin{array}{l} C_1 = p(A,B) \leftarrow q(A,B) \\ C_2 = p(A,B) \leftarrow q(A,B), r(A) \end{array} \end{aligned}$$

The clause \(C_2\) is entailment redundant because it is a logical consequence of \(C_1\).

Cropper and Muggleton showed that, in some cases, as few as two metarules are sufficient to entail an infinite fragment of second-order dyadic datalog. Moreover, they showed that learning with minimal sets of metarules improves predictive accuracies and reduces learning times compared to non-minimal sets. However, entailment reduction is not always the most appropriate form of reduction. To illustrate this point, suppose you want to learn the father/2 relation given the background relations parent/2, male/1, and female/1. Then a suitable hypothesis would be:

$$\begin{aligned} father(A,B) \leftarrow parent(A,B), male(A) \end{aligned}$$

To learn such a theory, one would need a metarule of the form \(P(A,B) \leftarrow Q(A,B),R(A)\). Now suppose you have the metarules:

$$\begin{aligned} \begin{array}{l} M_1 = P(A,B) \leftarrow Q(A,B)\\ M_2 = P(A,B) \leftarrow Q(A,B),R(A) \end{array} \end{aligned}$$

Running entailment reduction on these metarules would remove \(M_2\) because it is a logical consequence of \(M_1\). But it is impossible to learn the intended father/2 theory given only \(M_1\). As this example shows, entailment reduction can be too strong because it can remove metarules necessary to specialise a clause (where \(M_2\) can be seen as a specialisation of \(M_1\)).

To address this issue, we describe a new form of reduction based on derivations. Let \(\vdash \) denote derivability under SLD-resolution [15]. Then a Horn clause C is derivationally redundant in a Horn theory \(T \cup \{C\}\) when \(T \vdash C\). A Horn theory is derivationally irreducible if it contains no derivationally redundant clauses. To illustrate the difference between entailment reduction and derivation reduction, consider the metarules:

$$\begin{aligned} \begin{array}{l} M_1 = P(A,B) \leftarrow Q(A,B)\\ M_2 = P(A,B) \leftarrow Q(A,B),R(A)\\ M_3 = P(A,B) \leftarrow Q(A,B),R(A,B)\\ M_4 = P(A,B) \leftarrow Q(A,B),R(A,B),S(A,B) \end{array} \end{aligned}$$

Running entailment reduction on these would leave the single metarule \(M_1\) because it entails the rest of the theory. By contrast, performing derivation reduction would only remove \(M_4\) because it can be derived by self-resolving \(M_3\). The remaining metarules \(M_2\) and \(M_3\) are not derivationally redundant because there is no way to derive them from the other metarules.
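To make the derivation of \(M_4\) explicit: resolving the body literal \(Q(A,B)\) of \(M_3\) with the head of a fresh copy \(M_3' = P'(A',B') \leftarrow Q'(A',B'),R'(A',B')\) gives the unifier \(\{P'/Q, A'/A, B'/B\}\) and the resolvent

$$\begin{aligned} P(A,B) \leftarrow Q'(A,B), R'(A,B), R(A,B) \end{aligned}$$

which is a variant of \(M_4\). No analogous derivation exists for \(M_2\) or \(M_3\) from the other metarules.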

1.1 Contributions

Our main contributions are:

  • We introduce the derivation reduction problem, the problem of removing derivationally redundant clauses from a clausal theory, and show that the problem is undecidable in general (Sect. 3)

  • We introduce a derivation reduction algorithm (Sect. 3)

  • We run derivation and entailment reduction on finite sets of metarules to identify minimal sets (Sect. 4)

  • We theoretically study whether sets of metarules can be derivationally reduced (Sect. 4)

  • We experimentally compare learning with derivation and entailment reduced metarules, where the results show that using the former set results in higher predictive accuracies and lower learning times (Sect. 5).

2 Related Work

Meta-interpretive Learning. Although the study of metarules has implications for many ILP approaches [1, 7,8,9, 14, 29, 32], we are primarily motivated by MIL. MIL is a form of ILP based on a Prolog meta-interpreter. The key difference between a MIL learner and a standard Prolog meta-interpreter is that whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner additionally attempts to prove a goal by fetching second-order metarules, supplied as background knowledge, whose heads unify with the goal. The resulting meta-substitutions are saved and can be reused in later proofs. Following the proof of a set of goals, a logic program is formed by projecting the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and learning recursive theories.
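As an illustration, the following is a minimal sketch of such a meta-interpreter, written in the style of Metagol [5]. It is a simplification under our own assumptions, not the authors' implementation: the list encoding of atoms, the names metarule/4, body_pred/1, and prove/3, and the program-size bound are illustrative, and support for reusing metasubstitutions and for predicate invention is omitted.

% Minimal MIL meta-interpreter sketch (illustrative simplification).

% Background relations available to the learner.
body_pred(parent/2).
parent(ann,andy).
parent(andy,amy).

% Metarules, encoded as metarule(Name, MetaSubs, Head, Body).
metarule(chain, [P,Q,R], [P,A,B], [[Q,A,C],[R,C,B]]).
metarule(ident, [P,Q], [P,A,B], [[Q,A,B]]).

% prove(+Atoms, +Prog0, -Prog): prove every atom in turn, accumulating
% one metasubstitution per fetched metarule; Prog is the induced program.
prove([], Prog, Prog).
prove([Atom|Atoms], Prog0, Prog) :-
    prove_aux(Atom, Prog0, Prog1),
    prove(Atoms, Prog1, Prog).

% Case 1: the atom is provable from the background knowledge.
prove_aux([P,A,B], Prog, Prog) :-
    body_pred(P/2),
    Goal =.. [P,A,B],
    call(Goal).
% Case 2: fetch a metarule whose head unifies with the atom, record the
% metasubstitution, and prove the body. The program-size bound ensures
% termination; real implementations use iterative deepening instead.
prove_aux(Atom, Prog0, Prog) :-
    length(Prog0, N), N < 2,
    metarule(Name, Subs, Atom, Body),
    prove(Body, [sub(Name,Subs)|Prog0], Prog).

% ?- prove([[grandparent,ann,amy]], [], Prog).
% Prog = [sub(chain,[grandparent,parent,parent])]
% Projecting the metasubstitution onto the chain metarule gives:
%   grandparent(A,B) :- parent(A,C), parent(C,B)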

Metarules. Metarules were introduced in the Blip system [7]. Kietz and Wrobel [14] studied generality measures for metarules in the RDT system. A generality order is necessary because the RDT system searches the hypothesis space (which is defined by the metarules) in a top-down general-to-specific order. A key difference between RDT and MIL is that whereas RDT requires metarules of increasing complexity (e.g. rules with an increasing number of literals in the body), MIL derives more complex metarules through resolution. This point is important because the ability to derive more complex metarules through resolution allows us to start from smaller sets of primitive or core metarules. The focus of this paper is identifying such core sets.

Using metarules to build a logic program is similar to the use of refinement operators in ILP [26, 31] to build a definite clause literal-by-literal. As with refinement operators, it seems reasonable to ask about completeness and irredundancy of a set of metarules, which we explore in this paper.

Logical Redundancy. Detecting and eliminating redundancy in a clausal theory is useful in many areas of computer science. In ILP, logically reducing a theory is useful to remove redundancy from a hypothesis space to improve learning performance [3, 10]. In general, simplifying or reducing a theory often makes a theory easier to understand and use, and may also have computational efficiency advantages.

Plotkin [27] introduced methods to decide whether a clause is subsumption redundant in a first-order clausal theory. The same problem, and slight variants of it, have been extensively studied in the propositional case [18, 19]. Removing redundant clauses has numerous applications, such as improving the efficiency of SAT solving [12]. In contrast to these works, we focus on reducing theories formed of second-order Horn clauses, which to our knowledge has not yet been extensively explored. Another difference is that we study redundancy based on SLD-derivations. Langlois et al. [16] also considered derivations. They studied combinatorial problems for propositional Horn clauses. By contrast, we focus on derivationally reducing sets of second-order Horn clauses.

The work most relevant to this paper is by Cropper and Muggleton [3]. They used Progol’s entailment-reduction algorithm [23] to identify irreducible, or minimal, sets of metarules. Their approach removed entailment redundant clauses from sets of metarules. They identified theories that are (1) entailment complete for certain fragments of second-order Horn logic, and (2) minimal or irreducible, in that no further reductions are possible. They demonstrated that in some cases as few as two clauses are sufficient to entail an infinite language. However, they only considered small and highly constrained fragments of metarules. In particular, they focused on metarules where each literal is dyadic and each term variable appears exactly twice (we call this fragment exactly-two-connected, see Definition 4.2). In this paper, we go beyond entailment reduction and introduce derivation reduction. We also consider more general fragments of metarules.

3 Logical Reduction

We now introduce the derivation reduction problem, the problem of removing derivationally redundant clauses from a clausal theory. Before introducing this problem, we describe preliminary notation and also describe entailment reduction, to which we compare our new approach.

3.1 Preliminaries

We assume familiarity with logic programming notation [21], but we restate some key terminology. A clause is a disjunction of literals. A clausal theory is a set of clauses. A Horn clause is a clause with at most one positive literal. A Horn theory is a set of Horn clauses. Most of the concepts introduced in this section can be defined for any resolution-based proof system, but, because MIL is based on a Prolog meta-interpreter, we focus on SLD-resolution [15]. To identify clauses derivable from a theory, we first define a function \(R^n(T)\) of a Horn theory T as:

$$\begin{aligned} \begin{array}{l} R^0(T) = T\\ R^n(T) = \{C \mid C_1 \in R^{n-1}(T),\; C_2 \in T,\; C \text{ is the binary resolvent of } C_1 \text{ and } C_2\} \end{array} \end{aligned}$$

We use this definition to define the SLD-closure of a Horn theory:

Definition 3.1

(SLD-closure). The SLD-closure \(R^*(T)\) of a Horn theory T is: \(\bigcup _{n\in \mathbb {N}}R^n(T)\)
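For example, if T contains only the chain metarule \(P(A,B) \leftarrow Q(A,C), R(C,B)\), then \(R^1(T)\) additionally contains its self-resolvent \(P(A,B) \leftarrow Q_1(A,D), R_1(D,C), R(C,B)\) (up to variable renaming), and the SLD-closure \(R^*(T)\) contains such chains of every length.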

We can now state our notion of derivability:

Definition 3.2

(Derivability). A Horn clause C is derivable from the Horn theory T, written \(T \vdash C\), if and only if \(C \in R^*(T)\).

We also introduce k-derivability:

Definition 3.3

(k-derivability). Let k be a natural number. Then a Horn clause C is k-derivable from the Horn theory T, written \(T \vdash _k C\), if and only if \(C \in R^k(T)\).

Some definitions and results in this section rely on Kowalski’s subsumption theorem for SLD-resolution [15], which is based on SLD-deductions [15]:

Definition 3.4

(SLD-deduction). Let T be a Horn theory and C be a Horn clause. Then there exists an SLD-deduction of C from T, written \(T \vdash _d C\), if C is a tautology or if there exists a clause D such that \(T \vdash D\) and D subsumes C.

We denote an SLD-deduction restricted by k-derivability as \(\vdash _{d_k}\). To illustrate the difference between \(\vdash \) and \(\vdash _d\), consider the clauses \(M_1\) to \(M_4\) defined in the introduction. We have \(\{M_1\} \vdash _d \{M_2,M_3,M_4\}\) because \(M_1\) subsumes each of \(M_2\), \(M_3\), and \(M_4\), but \(\{M_1\} \not \vdash \{M_2,M_3,M_4\}\) because none of them can be obtained from \(M_1\) by resolution alone. Kowalski’s subsumption theorem shows the relationship between SLD-deductions and logical entailment:

Theorem 3.1

(SLD-subsumption theorem). Let T be a Horn theory and C be a Horn clause. Then \(T \models C\) if and only if \(T \vdash _d C\).

A more general version of this theorem also applies to unconstrained resolution [26].

3.2 Entailment Reduction

Muggleton [23] provided two definitions for eliminating entailment redundant clauses from a clausal theory:

Definition 3.5

(Entailment redundant clause). The clause C is entailment redundant in the clausal theory \(T \cup \{C\}\) whenever \(T\models C\).

Definition 3.6

(Entailment reduced theory). A clausal theory is entailment reduced if and only if it does not contain any entailment redundant clauses.

If C is entailment redundant in \(T \cup \{C\}\) then T is entailment equivalent to \(T \cup \{C\}\) because \(T\models T \cup \{C\}\) and \(T \cup \{C\} \models T\). Muggleton’s definitions apply to clauses, but can easily be adapted to Horn clauses.

Because entailment between arbitrary Horn clauses is undecidable [22], determining whether a Horn clause is entailment redundant in a Horn theory is also undecidable. Algorithm 1 finds a k-bounded entailment reduction (henceforth called an E-reduction) \(T'\) of a Horn theory T.

[Algorithm 1: k-bounded entailment reduction of a Horn theory]

In Sect. 4, we use a Prolog implementation of Algorithm 1 to find E-reduced sets of metarules.
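As an illustration, the following is a minimal sketch of such an implementation, operating on encapsulated clauses (Sect. 4) represented as Head-BodyList pairs. The predicate names and the per-branch depth bound are our assumptions, not the authors' code; the entailment check uses the standard refutation method of assuming the skolemised body of a clause and attempting to derive its head.

% Sketch of Algorithm 1 (k-bounded entailment reduction). Clauses are
% encapsulated Horn clauses represented as Head-BodyList pairs.
% Illustrative only: names and the bounding strategy are assumptions.

e_reduce(K, T0, T) :- e_reduce_(T0, T0, K, T).

e_reduce_([], T, _, T).
e_reduce_([C|Cs], T0, K, T) :-
    exclude(==(C), T0, Rest),          % the theory without C itself
    (   entails_k(Rest, C, K)          % C is entailment redundant
    ->  e_reduce_(Cs, Rest, K, T)
    ;   e_reduce_(Cs, T0, K, T)
    ).

% entails_k(+T, +C, +K): test T |= C using at most K resolution steps
% per branch. By refutation: skolemise C, assume its body atoms, and
% try to derive its head from T.
entails_k(T, C, K) :-
    copy_term(C, Head-Body),
    numbervars(Head-Body, 0, _),       % skolemise the clause variables
    prove_k(T, Body, Head, K).

prove_k(_, Assumed, Goal, _) :-
    member(Goal, Assumed).
prove_k(T, Assumed, Goal, K) :-
    K > 0, K1 is K - 1,
    member(Clause, T),
    copy_term(Clause, Goal-Body),      % fresh variant whose head unifies
    prove_all_k(T, Assumed, Body, K1).

prove_all_k(_, _, [], _).
prove_all_k(T, Assumed, [G|Gs], K) :-
    prove_k(T, Assumed, G, K),
    prove_all_k(T, Assumed, Gs, K).

% Example: the encapsulated metarules M1 and M2 from Sect. 1.
% ?- e_reduce(10, [m(P,A,B)-[m(Q,A,B)],
%                  m(P2,A2,B2)-[m(Q2,A2,B2),m(R2,A2)]], T).
% T = [m(P,A,B)-[m(Q,A,B)]]           % M2 is entailment redundant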

3.3 Derivation Reduction

We now describe a new form of reduction based on derivability. We first define derivationally redundant clauses:

Definition 3.7

(Derivationally redundant clause). A Horn clause C is derivationally redundant in the Horn theory \(T \cup \{C\}\) if and only if \(T\vdash C\).

We can now define derivationally reduced theories:

Definition 3.8

(Derivationally reduced theory). A Horn theory is derivationally reduced if and only if it does not contain any derivationally redundant clauses.

We now define the derivation reduction problem:

Definition 3.9

(Derivation reduction problem). Given a Horn theory T, the derivation reduction problem is to find a finite theory \(T' \subseteq T\) such that (1) \(T' \vdash C\) for every Horn clause C in T, and (2) \(T'\) is derivationally reduced.

Note that a solution to the derivation reduction problem must be a finite set. For convenience, we name the output of the derivation reduction problem:

Definition 3.10

(Derivation reduction). Let T and \(T'\) be the input and output respectively from a derivation reduction problem. Then we call \(T'\) a derivation reduction (or D-reduction) of T.

The following proposition outlines the connection between an E-reduction and a D-reduction:

Proposition 3.1

Let T be a Horn theory, \(T_E\) be an E-reduction of T, and \(T_D\) be a D-reduction of T. Then \(T_E \subseteq T_D\).
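This follows because derivability implies entailment (by Theorem 3.1, since \(T \vdash C\) implies \(T \vdash _d C\)), so every derivationally redundant clause is also entailment redundant, and E-reduction therefore removes at least every clause that D-reduction removes.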

Algorithm 2 finds a D-reduction \(T'\) of a Horn theory T. Note that a fragment can have multiple D-reductions. For instance, consider the theory T:

$$\begin{aligned} \begin{array}{l} C_1 = P(A,B) \leftarrow Q(B,A)\\ C_2 = P(A,B) \leftarrow Q(A,C),R(C,B)\\ C_3 = P(A,B) \leftarrow Q(C,A),R(C,B) \end{array} \end{aligned}$$

One D-reduction of T is \(\{C_1,C_2\}\) because resolving the first body literal of \(C_2\) with the head of \(C_1\) derives \(C_3\) (up to variable renaming). Another D-reduction of T is \(\{C_1,C_3\}\) because resolving the first body literal of \(C_3\) with the head of \(C_1\) likewise derives \(C_2\).
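To see the first derivation explicitly: unifying the body literal \(Q(A,C)\) of \(C_2\) with the head of a fresh copy \(C_1' = P'(A',B') \leftarrow Q'(B',A')\) gives the unifier \(\{P'/Q, A'/A, B'/C\}\), and replacing the literal with the body of \(C_1'\) yields

$$\begin{aligned} P(A,B) \leftarrow Q'(C,A), R(C,B) \end{aligned}$$

which is a variant of \(C_3\).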

As with the entailment reduction problem, the derivation reduction problem is undecidable for Horn theories:

Theorem 3.2

(Horn decidability). The derivation reduction problem for Horn theories is undecidable.

Proof

Assume the opposite: that the problem is decidable, which implies that \(T \vdash C\) is decidable. Since \(T \vdash C\) is decidable and subsumption between Horn clauses is decidable [11], finding an SLD-deduction is also decidable. Therefore, by the SLD-subsumption theorem, entailment between Horn clauses is decidable. However, entailment between Horn clauses is undecidable [30], so the assumption cannot hold. Therefore, the problem must be undecidable.

[Algorithm 2: derivation reduction of a Horn theory]

In future work, described in Sect. 6, we want to study the decidability of the derivation reduction problem for other forms of logic, such as datalog. To overcome the aforementioned undecidability issue, we use a k-bounded D-reduction algorithm (omitted for brevity). The k-bounded version is similar to Algorithm 2 but additionally takes as input a resolution depth bound k, which is used to constrain the SLD-derivability check. This k-bounded version has the following worst-case time complexity:

Proposition 3.2

(k-bounded derivation reduction complexity). Given a Horn theory T and a natural number k, k-bounded derivation reduction requires at most \(O(|T|^k)\) resolutions.

Sketch proof 1

In the worst case the algorithm searches the whole SLD-tree which has a maximum branching factor |T| and a maximum depth k. Thus, the overall complexity is \(O(|T|^k)\).
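To make the bounded algorithm concrete, the following is a minimal sketch, using the same Head-BodyList clause representation and illustrative predicate names as the entailment reduction sketch above. The naive closure computation is exponential, in line with Proposition 3.2.

% Sketch of k-bounded derivation reduction: a clause C is derivationally
% redundant in T if a variant of C appears in the bounded SLD-closure
% R^k(T \ {C}). Illustrative assumptions as in the entailment sketch.

d_reduce(K, T0, T) :- d_reduce_(T0, T0, K, T).

d_reduce_([], T, _, T).
d_reduce_([C|Cs], T0, K, T) :-
    exclude(==(C), T0, Rest),
    (   derivable_k(Rest, C, K)
    ->  d_reduce_(Cs, Rest, K, T)
    ;   d_reduce_(Cs, T0, K, T)
    ).

% derivable_k(+T, +C, +K): C is a variant (not merely a subsumed
% instance) of a clause in the bounded closure of T.
derivable_k(T, C, K) :-
    closure_k(T, K, Closure),
    member(D, Closure),
    C =@= D.

% closure_k(+T, +K, -Closure): the clauses of R^K(T) (Sect. 3.1),
% computed naively; real implementations prune duplicates aggressively.
closure_k(T, 0, T).
closure_k(T, K, Closure) :-
    K > 0, K1 is K - 1,
    closure_k(T, K1, C1),
    findall(R, (member(X, C1), member(Y, T), resolvent(X, Y, R)), Rs),
    append(C1, Rs, Closure).

% resolvent(+C1, +C2, -R): a binary resolvent of definite clauses,
% formed by unifying one body literal of C1 with the head of C2.
resolvent(C1, C2, Head-Body) :-
    copy_term(C1, Head-B1),
    copy_term(C2, H2-B2),
    select(H2, B1, B1Rest),
    append(B2, B1Rest, Body).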

In Sect. 4, we use the k-bounded entailment and derivation reduction algorithms to logically reduce sets of metarules. From this point onwards, any reference to the entailment or derivation reduction algorithms refer to the k-bounded versions.

3.4 Language Class and Hypothesis Space

In Sect. 4 we logically reduce fragments of second-order datalog formed of metarules, where a fragment of a theory is a syntactically restricted subset of that theory [2]. We make explicit our notion of a metarule, which is a second-order Horn clause:

Definition 3.11

(Second-order Horn clause). A second-order Horn clause is of the form:

$$A_0 \leftarrow A_1, \dots , A_m$$

where each \(A_i\) is a literal of the form \(P(T_1,\dots ,T_n )\) where P is either a predicate symbol or a second-order variable that can be substituted by a predicate symbol, and each \(T_i\) is either a constant symbol or a first-order variable that can be substituted by a constant symbol.

We denote the language of second-order datalog as \(\mathcal {H}\), which we further restrict. Our first restriction is on the syntactic form of clauses in \(\mathcal {H}\):

Definition 3.12

(The fragment \(\mathcal {H}^a_m\)). We denote as \(\mathcal {H}^a_m\) the fragment of \(\mathcal {H}\) where each literal has arity at most a and each clause has at most m literals in the body.

Having defined this fragment we can characterise the size of the hypothesis space of a MIL learner given metarules restricted to this fragment. The following result generalises previous results [4, 20]:

Theorem 3.3

(Number of programs in \(\mathcal {H}^a_m\)). Given p predicate symbols and k metarules (not necessarily distinct), the number of \(\mathcal {H}^a_m\) programs expressible with at most n clauses is \(O((p^{m+1}k)^n)\).

Proof

The number of clauses which can be constructed from a \(\mathcal {H}^a_m\) metarule given p predicate symbols is at most \(p^{m+1}\) because for a given metarule there are potentially \(m+1\) predicate variables with \(p^{m+1}\) possible substitutions. Therefore the set of such clauses \(S_{k,p,m}\) which can be formed from k distinct \(\mathcal {H}^a_m\) metarules using p predicate symbols has cardinality at most \(kp^{m+1}\). It follows that the number of programs which can be formed from a selection of n rules chosen from \(S_{k,p,m}\) is at most \(O((p^{m+1}k)^n)\).
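For instance, with \(p = 10\) predicate symbols, \(k = 5\) metarules with at most \(m = 2\) body literals, and programs of at most \(n = 3\) clauses, the bound is \((10^{3} \times 5)^{3} = 1.25 \times 10^{11}\) programs; adding a sixth metarule raises the bound to \((10^{3} \times 6)^{3} = 2.16 \times 10^{11}\).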

Theorem 3.3 shows that the MIL hypothesis space increases given more metarules, which suggests that we should remove redundant metarules. The next section explores this idea.

4 Reduction of Metarules

We now logically reduce fragments of second-order datalog, where the fragments correspond to sets of metarules. The goal is to identify a finite minimal set of metarules from which a larger (possibly infinite) set can be derived. To reason about metarules using Prolog (i.e. when running the Prolog implementations of the reduction algorithms), we use a method called encapsulation [3], which transforms a second-order logic program into a first-order logic program. To aid readability of the results, we present non-encapsulated (i.e. second-order) metarules.
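For example (following the general idea of [3]; the functor name m and the predicate name chain_metarule/1 below are our illustrative choices), encapsulation turns a second-order atom \(P(A,B)\) into the first-order atom m(P,A,B), so that second-order variables become ordinary Prolog variables:

% Encapsulation sketch. The chain metarule P(A,B) <- Q(A,C), R(C,B),
% encapsulated as a Head-BodyList pair (the clause representation used
% by the reduction sketches in Sect. 3):
chain_metarule(m(P,A,B)-[m(Q,A,C), m(R,C,B)]).

% Applying the substitution {P/grandparent, Q/parent, R/parent} yields
% the encapsulated first-order clause:
%   m(grandparent,A,B) :- m(parent,A,C), m(parent,C,B)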

We focus on fragments of second-order datalog useful to ILP. We follow standard ILP convention [3, 8, 26] and only consider fragments consisting of connected clauses:

Definition 4.1

(Connected clause). A clause is connected if the literals in the clause cannot be partitioned into two sets such that the variables appearing in the literals of one set are disjoint from the variables appearing in the literals of the other set.

We further restrict the fragments using syntactic restrictions on their clauses, namely on literal arity and clause body size (Definition 3.12). We use \(=\) to denote that each literal has arity exactly a. For instance, the fragment where each clause has at most two body literals and each literal has arity exactly 3 is denoted \(\mathcal {H}^{3=}_2\). We consider two fragments, exactly-two-connected and two-connected, both of which contain only connected clauses. Our goal is to first identify k-bounded E-reductions and D-reductions of these fragments using the reduction algorithms described in Sect. 3. To identify reductions for an infinite fragment, such as \(\mathcal {H}^a_*\), we first run the reduction algorithms on the sub-fragment \(\mathcal {H}^a_5\) using a resolution bound of 10 (i.e. \(k=10\)). Having found k-bounded reductions of the fragment \(\mathcal {H}^a_5\), our goal is to then theoretically determine whether larger (preferably infinite) sets can be derived from these reductions.

4.1 Exactly-Two-Connected Fragment

We first consider an exactly-two-connected fragment, studied by Cropper and Muggleton [3]. The restriction is:

Definition 4.2

(Exactly-two-connected clause). A clause is exactly-two-connected if each term variable appears exactly twice.

We denote the exactly-two-connected fragment of \(\mathcal {H}\) as \(\mathcal {E}\). Figure 1(a) shows the results of applying the entailment and derivation reduction algorithms to \(\mathcal {E}^{2=}_5\). Both algorithms return the same reduced set of two metarules in \(\mathcal {E}^{2=}_2\). This result corroborates the result of Cropper and Muggleton [3]. Figure 1(b) shows the results of applying the entailment and derivation reduction algorithms to \(\mathcal {E}^{2}_5\), a fragment not studied by Cropper and Muggleton. Again, both algorithms return the same reduced set of metarules in \(\mathcal {E}^{2}_2\). We now show that \(\mathcal {E}^2_\infty \) has the same D-reduction as \(\mathcal {E}^2_2\):

Fig. 1. (a) and (b) show the results of applying the E-reduction and D-reduction algorithms to the corresponding fragments

Theorem 4.1

(\(\mathcal {E}^2_\infty \) reducibility). The fragment \(\mathcal {E}^2_\infty \) has the same D-reduction as \(\mathcal {E}^2_2\).

Proof

See Appendix A for the proof.

An immediate consequence of Theorem 4.1 is the result:

Corollary 4.1

The fragment \(\mathcal {E}^2_\infty \) has the same E-reduction as \(\mathcal {E}^2_2\).

Proof

Follows from Theorem 4.1 and Proposition 3.1.

4.2 Two-Connected Fragment

A common constraint in ILP is to require that all the variables in a clause appear at least twice [24, 29]. We study a slight variant of this constraint which we call two-connected:

Definition 4.3

(Two-connected clause). A clause is two-connected if each term variable appears in at least two literals.

We denote the two-connected fragment of \(\mathcal {H}\) as \(\mathcal {K}\). Figure 2 shows the results of applying the entailment and derivation reduction algorithms to \(\mathcal {K}^{2=}_5\). Unlike the exactly two-connected fragment, the algorithms do not return the same reduced sets of metarules. Whereas the E-reduced set is in \(\mathcal {K}^{2=}_2\), the D-reduced set is not in \(\mathcal {K}^{2=}_2\). In fact, although the majority of clauses have been removed, the D-reduced set still contains a clause with five body literals. Figure 3 shows the results of applying the entailment and derivation reduction algorithms to \(\mathcal {K}^{2}_5\). Again, whereas the E-reduced set is in \(\mathcal {K}^{2}_2\), the D-reduced set is not in \(\mathcal {K}^{2}_2\), and again contains a clause with five body literals. We now show that the D-reduction of \(\mathcal {K}^{2}_5\) is not in \(\mathcal {K}^{2}_2\):

Proposition 4.1

(\(\mathcal {K}^2_5\) irreducibility). There is no D-reduction of \(\mathcal {K}^2_5\) in \(\mathcal {K}^2_2\).

Sketch proof 2

We use the clause \(P_0(x_1,x_2)\leftarrow P_1(x_1,x_3),P_2(x_1,x_4),P_3(x_2,x_3),P_4(x_2,x_4),P_5(x_3,x_4)\) as a counterexample. We explore the different ways to derive this clause from strictly smaller clauses. We reach a contradiction each time. See Appendix B for the full proof.

We also show that \(\mathcal {K}^{2}_\infty \) has no D-reduction:

Theorem 4.2

(\(\mathcal {K}^2_\infty \) irreducibility). \(\mathcal {K}^2_\infty \) has no D-reduction.

Sketch proof 3

We define a transformation that turns an irreducible clause, such as the counterexample in Proposition 4.1, into a larger irreducible clause. See Appendix B for the full proof.

4.3 Discussion

We have used the entailment and derivation reduction algorithms to reduce four fragments of second-order datalog, corresponding to sets of metarules. Theorem 4.2 shows that certain fragments do not have finite reductions. This result has implications for the completeness of any ILP system which relies on metarules. In MIL, for instance, the result implies incompleteness when learning programs in the fragment \(\mathcal {K}^2_\infty \).

Fig. 2. Reductions of \(\mathcal {K}^{2=}_5\)

Fig. 3. Reductions of \(\mathcal {K}^{2}_5\)

5 Experiments

As explained in Sect. 1, entailment reduction can be too strong because it can remove metarules necessary to make a hypothesis more specific. The contrast between entailment and derivation reduction was shown in the previous section, where in all cases the E-reductions are a subset of the D-reductions. However, as shown in Theorem 3.3, the MIL hypothesis space increases given more metarules, which suggests that we should use fewer metarules. In this section we experimentally explore this trade-off between expressivity and efficiency. Specifically, we describe an experiment that compares learning with the different reduced sets of metarules. We test the null hypothesis:

  • Null hypothesis 1: There is no difference in learning performance when using E-reduced and D-reduced sets of metarules

Materials. We use Metagol [5], the main MIL implementation, to compare learning with the E-reduced and D-reduced sets of metarules for the fragment \(\mathcal {K}^2_5\). We also compare a third set, which we call D*-reduced: the D-reduced set without the irreducible metarule with five body literals. We compare the sets of metarules on the Michalski trains problems [17], where the task is to induce a hypothesis that distinguishes five eastbound trains from five westbound trains. To generate the experimental data, we first randomly generated 8 target train programs, where the programs are progressively more difficult, as measured by the number of literals.

Fig. 4. (a) and (b) show the predictive accuracies (%) and learning times (seconds, rounded to 2 decimal places) respectively when using different reduced sets of metarules on the Michalski trains problems

Method. Our experimental method is as follows. For each target program:

  1. Generate 10 training examples, half positive and half negative.

  2. Generate 200 testing examples, half positive and half negative.

  3. For each set of metarules m:

     (a) Learn a program p using the training examples and metarules m, with a timeout of 10 min.

     (b) Measure the predictive accuracy of p using the testing examples.

We repeat the above method 20 times, and measure mean predictive accuracies, learning times, and standard errors.
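Step 3(a) corresponds to a call to Metagol. The sketch below shows the shape of one such task. It is based on the published Metagol interface, but the exact declarations vary between Metagol versions, and the example atoms and metarules shown are hypothetical placeholders rather than the actual experimental setup.

% Sketch of one learning task (step 3(a)) with Metagol [5]. The
% examples and metarules below are hypothetical placeholders.
:- use_module('metagol').              % assumes metagol.pl is available

% Background predicates the learner may use in clause bodies.
body_pred(has_car/2).
body_pred(long/1).

% Two illustrative metarules from a reduced set.
metarule([P,Q,R], [P,A], [[Q,A,B],[R,B]]).
metarule([P,Q,R], [P,A,B], [[Q,A,C],[R,C,B]]).

% Learn a program from the training examples with a 10 min timeout.
run(Prog) :-
    Pos = [f(train1), f(train2)],      % hypothetical positive examples
    Neg = [f(train3), f(train4)],      % hypothetical negative examples
    call_with_time_limit(600, learn(Pos, Neg, Prog)).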

Fig. 5. Example programs learned by Metagol when varying the metarule set. Only the D*-reduction program is success set equivalent to the target program (when restricted to the target predicate f/1). In all three cases Metagol discovered that if a carriage has three wheels then it is a long carriage, i.e. Metagol discovered that the literal long(C2) is redundant in the target program. In fact, if you unfold the D*-reduction program to remove the invented predicates, the resulting single-clause program is one literal shorter than the target program.

Results. Figure 4(a) shows the predictive accuracies when learning with the different sets of metarules. In 6/8 tasks, the D-reduced set has higher predictive accuracies than the E-reduced set. The D-reduced and D*-reduced sets have similar levels of performance, except on the most difficult task 8, where the D*-reduced set greatly outperforms the other two sets. A McNemar's test on the D-reduced and D*-reduced accuracies confirmed the significance of this difference at the p < 0.01 level. The poor performance of the D-reduced set on task 8 is because Metagol often times out when using these metarules, which can be explained by the larger hypothesis space searched (Theorem 3.3). Specifically, when searching for a program with 3 clauses, the D-reduced space contains \(4.27 \times 10^{23}\) programs, whereas the D*-reduced space contains \(1.51 \times 10^{13}\) programs. When searching for a program with 4 clauses, the D-reduced space contains \(3.21 \times 10^{31}\) programs, whereas the D*-reduced space contains \(3.72 \times 10^{17}\) programs.

Figure 4(b) shows the learning times when learning with the different reduced sets of metarules. Again, the D-reduced set has lower learning times than the E-reduced set, and again the D*-reduced set outperforms the D-reduced set on the most difficult task 8. A paired t-test on the D-reduced and D*-reduced learning times confirmed the significance at the p < 0.01 level. Figure 5 shows the target program for task 8 and example programs learned by Metagol using the various reduced sets of metarules. The null hypothesis is therefore refuted, in terms of both predictive accuracies and learning times.

6 Conclusions and Further Work

We have introduced the derivation reduction problem (Definition 3.9), the problem of removing derivationally redundant clauses from a clausal theory. We have also introduced a derivation reduction algorithm, which we have used to reduce sets of metarules. We have shown that certain sets of metarules do not have finite reductions (Theorem 4.2), which has implications for completeness, not only for MIL but for any ILP system that relies on metarules. We also compared learning programs using the E-reduced and D-reduced sets of metarules. In general, using the D-reduced set outperforms the E-reduced set both in terms of predictive accuracies and learning times. We also compared a D*-reduced set, a subset of the D-reduced metarules, which, although derivationally incomplete, outperforms the other two sets in terms of predictive accuracies and learning times.

Limitations and Future Work. Theorem 3.2 shows that the derivation reduction problem is undecidable for general Horn theories. In future work, we wish to study the decidability of the derivation reduction problem for other fragments of logic, such as function-free theories. For the decidable cases, we wish to identify more efficient reduction algorithms. Theorem 4.2 shows that certain fragments of datalog do not have finite D-reductions. Future work should explore techniques to mitigate this result, such as exploring whether special metarules, such as a currying metarule [4], could alleviate the issue. We have compared D-reduction to E-reduction, but we would also like to compare other forms of reduction, such as \(\theta \)-subsumption reduction [27]. We would also like to study other fragments of logic, including triadic logics. We have shown that, although incomplete, the D*-reduced set of metarules outperforms the other sets in our Michalski trains experiment. We would like to explore this incompleteness in more detail, such as determining the degree of incompleteness. Finally, we would like to run more experiments comparing the learning performance of the different sets of metarules on a wider variety of problems.