1 Introduction

Saturation-based proof search has been the leading technology in automated theorem proving for first-order logic for some time. The core idea of this approach is to saturate a set of clauses (including the negated goal) with respect to some inference system with the aim of deriving a contradiction and concluding that the goal holds. Over the last few years this technology has been extended to reason with both quantifiers and theories such as arithmetic and term algebras (also known as algebraic, recursive or inductive datatypes), making it highly applicable in areas such as program analysis and verification, which were previously the sole domain of SMT solvers. However, so far little has been done to extend saturation-based proof search with automated induction. Most attempts to date have focussed on using saturation-based methods to discharge subgoals once an induction axiom has been selected.

The aim of this work is to extend saturation-based proof search with lightweight methods for automated induction where those techniques are integrated directly into proof search i.e. they do not rely on some external procedure to produce subgoals. We achieve this by the introduction of new inference rules capturing inductive steps and new proof search heuristics to guide their application. Our approach is based on the research hypothesis that many problems requiring induction only require relatively simple applications of induction.

Example 1

As an introductory example, consider the problem of proving the commutativity of \((\forall x \forall y) \mathsf {plus}(x,y) \approx \mathsf {plus}(y,x)\), where x and y range over natural numbers. We now briefly describe how our approach handles this problem.

When we Skolemise its negation, we obtain the clause \(\mathsf {plus}(\sigma _0,\sigma _1) \not \approx \mathsf {plus}(\sigma _1,\sigma _0)\). In this paper, we will denote by \(\sigma _i\) fresh Skolem constants introduced by converting formulas to clausal form.

Our approach will immediately apply induction to \(\sigma _0\) in the negated conjecture by resolving this clause with the (clausal form of the) induction axiom

$$ \left( \begin{array}{l} \mathsf {plus}(\mathsf {zero},\sigma _1) \approx \mathsf {plus}(\sigma _1,\mathsf {zero}) \wedge \\ (\forall z) \left( \begin{array}{l} \mathsf {plus}(z,\sigma _1) \approx \mathsf {plus}(\sigma _1,z) \rightarrow \\ \mathsf {plus}(\mathsf {succ}(z),\sigma _1) \approx \mathsf {plus}(\sigma _1,\mathsf {succ}(z)) \end{array} \right) \end{array} \right) \rightarrow (\forall x) \mathsf {plus}(x,\sigma _1) \approx \mathsf {plus}(\sigma _1,x) $$

to produce the following subgoals:

$$\begin{aligned} \begin{array}{ll} \mathsf {plus}(\mathsf {zero},\sigma _1) \not \approx \mathsf {plus}(\sigma _1,\mathsf {zero}) \vee &{}\mathsf {plus}(\mathsf {succ}(\sigma _2),\sigma _1) \not \approx \mathsf {plus}(\sigma _1,\mathsf {succ}(\sigma _2))\\ \mathsf {plus}(\mathsf {zero},\sigma _1) \not \approx \mathsf {plus}(\sigma _1,\mathsf {zero}) \vee &{}\mathsf {plus}(\sigma _1,\sigma _2) \approx \mathsf {plus}(\sigma _2,\sigma _1) \end{array} \end{aligned}$$
(1)

Clause splitting is then used to split the search space into two parts to be considered separately. This splitting is important to our approach and can be used in any saturation theorem prover implementing some version of it, for example using splitting with backtracking as in SPASS [24] or the AVATAR architecture as in Vampire [22]. The first part contains \(\mathsf {plus}(\mathsf {zero},\sigma _1) \not \approx \mathsf {plus}(\sigma _1,\mathsf {zero})\) and is refuted by deriving \(\mathsf {plus}(\sigma _1,\mathsf {zero}) \not \approx \sigma _1\) using the definition of \(\mathsf {plus}\) and applying a second induction step to \(\sigma _1\) in this clause. By resolving with a similar induction axiom to before, the following clauses are produced and are refuted via the definition of \(\mathsf {plus}\) and the injectivity of datatype constructors.

$$ \begin{array}{ll} \mathsf {zero}\not \approx \mathsf {plus}(\mathsf {zero},\mathsf {zero}) \vee &{} \mathsf {succ}(\sigma _3) \not \approx \mathsf {plus}(\mathsf {succ}(\sigma _3),\mathsf {zero}) \\ \mathsf {zero}\not \approx \mathsf {plus}(\mathsf {zero},\mathsf {zero}) \vee &{} \mathsf {plus}(\sigma _3,\mathsf {zero}) \approx \sigma _3 \end{array} $$

The second part of the clause splitting then contains the other half of the clauses given above. Superposition is then applied to these clauses and the axioms of \(\mathsf {plus}\) to derive

$$\mathsf {succ}(\mathsf {plus}(\sigma _1,\sigma _2)) \not \approx \mathsf {plus}(\sigma _1,\mathsf {succ}(\sigma _2))$$

and a third induction step is applied to this clause on \(\sigma _1\). The resulting subgoals can again be refuted via the definition of \(\mathsf {plus}\) and the injectivity of datatype constructors.

While inductive reasoning in this example may seem to be the same as in almost any other inductive theorem prover, there is an essential difference: instead of reducing goals to subgoals using induction and trying to prove these subgoals using theory reasoning or again induction, we simply consider induction as an additional inference rule adding new formulas to the search space. In a way, every clause generated during the proof search becomes a potential target for applying induction and induction becomes integrated in the saturation process.

In this example there were three applications of induction to ground unit clauses in the search space; however, our implementation performs five induction steps, two of which are unnecessary for the proof. This is typical of saturation-based proof search, where many irrelevant consequences are derived. The observation is important: our general approach is to derive consequences (inductive or otherwise) in a semi-guided fashion, meaning that we may make many unnecessary induction steps. This, however, is the philosophy behind saturation-based approaches.

During proof search for this example it was necessary to (i) decide which clauses to apply induction to, (ii) decide which term within that clause to apply induction to, and (iii) decide how to apply induction. We address issues (ii) and (iii) in this paper, whilst relying on the clause selection techniques of saturation-based theorem provers for (i). We begin in Sect. 2 by introducing the necessary preliminary definitions for the work. In Sect. 3 we address (iii), how we apply induction, through the introduction of a set of new inference rules. In Sect. 4 we consider (ii) through a number of heuristics for selecting goals for induction. Then in Sect. 5 we show how standard clause splitting techniques can be used in our induction proofs (without any additional work) for case splitting. Section 6 describes implementation and experimental evaluation. We then consider related work in Sect. 7 before concluding in Sect. 8.

2 Preliminaries

Multi-sorted First-Order Logic. We consider standard multi-sorted first-order predicate logic with equality. We allow all standard boolean connectives and quantifiers in the language. We denote terms by \(s, t\), variables by \(x, y, z\), constants by a, and function symbols by f. We consider equality \(\approx \) as part of the language, that is, equality is not a symbol. An atom is an equality or a predicate applied to a list of terms. A literal is an atom A or its negation \(\lnot A\). Literals that are atoms are called positive, while literals of the form \(\lnot A\) are negative. If \(L = \lnot A\) is a literal we write \(\lnot L\) for the literal A. A clause is a disjunction of literals \(L_1 \vee \ldots \vee L_n\), where \(n \ge 0\). When \(n=0\), we will speak of the empty clause, denoted by \(\Box \). We denote atoms by A, literals by L, clauses by C, and formulas by F, all possibly with indices. Formulas can be clausified (transformed into a set of clauses) via standard techniques (e.g. [13] and our recent work in [15]). We write \(\mathsf{clausify}(F)\) for the set of clauses obtained from F by clausification.

By an expression E we mean a term, atom, literal, or clause. We write E[t] to mean an expression E with a particular occurrence of a term t and then E[s] for that expression with the particular occurrence of t replaced by term s.

A multi-sorted signature is a finite set of symbols and a finite set of sorts with the accompanying function \( srt \) providing sorts for the symbols.

The Theory of Finite Term Algebras. In this paper we consider induction for finite term algebras, also known as algebraic, inductive, or recursive datatypes. A definition of the first-order theory of term algebras over a finite signature can be found in e.g. [17] and a description of how saturation-based proof search may be extended to reason with such structures is given in [9]. Let \(\varSigma \) be a finite set of function symbols containing at least one constant. Denote by \(\mathcal {T}(\varSigma )\) the set of all ground terms built from the symbols in \(\varSigma \). The \(\varSigma \)-term algebra is the algebraic structure whose carrier set is \(\mathcal {T}(\varSigma )\) and defined in such a way that every ground term is interpreted by itself (we leave details to the reader).

We will often consider extensions of term algebras by additional symbols. Elements of \(\varSigma \) will be called term constructors (or simply constructors), to distinguish them from other function symbols. We will differentiate between recursive constructors, which have at least one argument of the sort being constructed, and base constructors, which have none. Where we wish to differentiate we may write \(\mathcal {T}(\varSigma _B,\varSigma _R)\) for base constructors \(\varSigma _B\) and recursive constructors \(\varSigma _R\).

In practice, it can be useful to consider multiple sorts, especially for problems taken from functional programming. In this setting, each term algebra constructor has a type \(\tau _1 \times \dots \times \tau _n \rightarrow \tau \). The requirement for at least one constant is replaced by the requirement that for every sort, there exists a ground term of this sort. We also consider theories, which mix constructor and non-constructor sorts. That is, some sorts contain constructors and some do not (e.g. arithmetic).

Finally, we associate n destructor (or projection) functions with every constructor c of arity n such that each destructor returns one of the arguments of c. Note that the behavior of destructors is unspecified on some terms.

Example 2

We introduce two term algebras. Firstly, that of natural numbers

$$ nat := \mathsf{zero} ~\mid ~\mathsf{succ}(\mathsf{dec}( nat )) $$

and secondly that of integer lists

$$ list := \mathsf {nil}~\mid ~\mathsf {cons}(\mathsf {hd}( Int ),\mathsf {tail}( list )). $$

Note that this second term algebra relies on a built-in integer sort.
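As an illustration, the two algebras of Example 2 can be mirrored as datatypes, with the destructors \(\mathsf{dec}\), \(\mathsf{hd}\) and \(\mathsf{tail}\) appearing as field names. This is a Python sketch for exposition only, not part of any implementation:

```python
# The term algebras of Example 2 as toy Python datatypes (illustration only).
# Constructors become classes; destructors become field accessors.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class Succ:
    dec: "Nat"            # the destructor dec returns the predecessor

Nat = Union[Zero, Succ]   # nat := zero | succ(dec(nat))

@dataclass(frozen=True)
class Nil:
    pass

@dataclass(frozen=True)
class Cons:
    hd: int               # destructor hd: the built-in integer sort
    tail: "IntList"       # destructor tail: the recursive argument

IntList = Union[Nil, Cons]  # list := nil | cons(hd(Int), tail(list))

# e.g. the ground term succ(succ(zero)) is Succ(Succ(Zero()))
```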

Saturation-Based Proof Search. An important concept in this work is that of saturation with respect to an inference system. Inference systems are used in the theory of superposition [12] implemented by several leading automated first-order theorem provers, including Vampire [10] and E [18]. Superposition theorem provers implement proof search for an inference system \(\mathcal {S}\) using so-called saturation algorithms, as follows. Given a set S of formulas, superposition-based theorem provers try to saturate S with respect to \(\mathcal {S}\), that is, build a set of formulas that contains S and is closed under inferences in \(\mathcal {S}\). At every step, a saturation algorithm selects an inference of \(\mathcal {S}\), applies this inference to S, and adds the conclusions of the inference to the set S. If at some moment the empty clause \(\Box \) is obtained then, by soundness of \(\mathcal {S}\), we can conclude that the input set of clauses is unsatisfiable. Figure 1 gives a simple saturation algorithm. It omits the important notion of redundancy, which does not interact with the elements of proof search we consider here but is core to the implementation in the Vampire theorem prover. It is important to note that the only way to guide proof search is via how we select clauses and how we perform inferences on them.

Fig. 1. Simple saturation algorithm.
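The following sketch reflects the structure of the algorithm in Fig. 1. It is a Python rendering under assumed interfaces for clause selection and inference application, not Vampire's code:

```python
# A minimal sketch of the given-clause saturation loop of Fig. 1, omitting
# redundancy elimination as the figure does. Clauses are frozensets of
# literals; the empty frozenset stands for the empty clause. The callbacks
# `select` and `inferences` are assumptions of this sketch.
EMPTY_CLAUSE = frozenset()

def saturate(initial_clauses, select, inferences):
    passive = set(initial_clauses)   # clauses not yet processed
    active = set()                   # all inferences among active clauses done
    while passive:
        given = select(passive)      # clause selection: the main way to guide search
        passive.remove(given)
        if given == EMPTY_CLAUSE:    # derived the empty clause: input unsatisfiable
            return "unsat"
        active.add(given)
        for conclusion in inferences(given, active):
            if conclusion not in active and conclusion not in passive:
                passive.add(conclusion)
    return "saturated"               # closed under inferences, no refutation found
```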

3 Performing Induction

This section introduces inference rules for induction on term algebras.

3.1 General Approach

We begin by describing our general approach. The idea is to add inference rules that capture the application of induction to the selected clause in proof search. These inference rules will be applied during proof search to selected clauses in the same way as other inference rules such as resolution. We define an induction axiom to be any valid (in the underlying theory) formula of the form

$$ formula \rightarrow (\forall x) (L[x]). $$

For simplicity we assume that this formula is closed, leaving out the general case due to lack of space. The idea is to resolve this with a clause \(\lnot L[t] \vee C\), obtaining \( formula \rightarrow C\). Again, for simplicity we assume that t is a ground term. As long as the induction axiom is valid, this approach is always sound. If the resulting formula is not a clause, it is then converted to CNF.
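Schematically (our rendering of this step), each clause \(F_i\) in the clausal form of \(\lnot formula \) contributes an axiom clause \(F_i \vee L[x]\), and binary resolution against the goal clause instantiates x to t:

$$ \frac{F_i \vee L[x] \qquad \lnot L[t] \vee C}{F_i \vee C} \quad \text {with } x \mapsto t, \text { for each } F_i \in \mathsf{clausify}(\lnot formula ) $$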

The idea is that L[t] is a (sub)goal we are trying to prove (by induction). This is an interesting point: typically, saturation-based proof search is not goal-oriented (although one can introduce heuristics that support this), but this approach to induction is goal-oriented in nature, as the conclusion of an induction inference is a subgoal that, if refuted, proves the goal represented by the premise. Also, similarly to [6], by resolving the induction axiom to reduce the goal to subgoals we bypass the literal selection used in saturation algorithms. This means that, if we simply added the (clausal form of the) induction axiom to the search space, we would most likely never use it to resolve against the goal in the way shown above, since the literal L[x] would not necessarily be selected.

Below we consider two different kinds of induction axioms, introducing three inference rules, parametrised by some (general) term algebra. To formalise the selection of goals that can be proved by induction we introduce a predicate \( sel (C,L,t)\) that is true if C is a clause, L a literal in C and t a term in L. We will call this predicate the induction heuristic since it will be used to decide when induction should be applied. In this case we will informally say that t is the induction term and L the induction literal in C.

Here we concentrate on how induction should be performed once an induction term and literal have been selected. Section 4 discusses choices for selection.

3.2 Structural Induction

We begin by motivating the inference rule by the simple example of inductively proving that the length of a list is non-negative.

Example 3

(Structural Induction on Lists). Consider the following conjecture \((\forall x : list )( \mathsf {len}(x) \ge 0)\) for integer lists (defined in Example 2) given the axioms \(\mathsf {len}(\mathsf {nil}) \approx 0\) and \((\forall x : Int , y : list )( \mathsf {len}(\mathsf {cons}(x,y)) \approx 1 + \mathsf {len}(y))\) for the \(\mathsf {len}\) function. To prove this conjecture we must first negate it to get \(\lnot (\mathsf {len}(\sigma ) \ge 0)\) and then introduce the induction axiom

$$ \mathsf {len}(\mathsf {nil}) \ge 0 \wedge (\forall x,y) (\mathsf {len}(x) \ge 0 \rightarrow \mathsf {len}(\mathsf {cons}(y,x)) \ge 0) \rightarrow (\forall x)(\mathsf {len}(x) \ge 0) $$

which is then resolved against \(\lnot (\mathsf {len}(\sigma ) \ge 0)\) to give, after conversion to CNF, two clauses

$$ \begin{array}{l} \lnot (\mathsf {len}(\mathsf {nil}) \ge 0) \vee \mathsf {len}(\sigma _1) \ge 0 \\ \lnot (\mathsf {len}(\mathsf {nil}) \ge 0) \vee \lnot (\mathsf {len}(\mathsf {cons}(\sigma _2,\sigma _1)) \ge 0), \end{array} $$

which can be refuted using the axioms for \(\mathsf {len}\). The question now is what inference rule is needed for performing the above induction step. To do so, we define the induction heuristic \( sel (C,L,t)\) to hold when L is the only literal in C and t is a constant of the sort \( list \). This rule effectively results in the following inferences performed by a saturation theorem prover:

$$ \frac{\lnot A[a]}{\begin{array}{l} \lnot A[\mathsf {nil}] \vee A[\sigma _1] \\ \lnot A[\mathsf {nil}] \vee \lnot A[\mathsf {cons}(\sigma _2,\sigma _1)] \end{array}} $$

where a is a constant, A[a] is ground, \( srt (a) = \mathsf{list}\) and \(\sigma _1,\sigma _2\) are fresh constants.
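To make the shape of such rules concrete for an arbitrary datatype, the following sketch generates a structural induction axiom from a constructor description. The representation (formulas as strings, the triple format for constructors) is purely our illustration, not Vampire's implementation:

```python
# A sketch of generating the structural induction axiom of Sect. 3.2 for an
# arbitrary datatype, over a toy string representation of formulas.
def structural_induction_axiom(goal, var, constructors):
    """`goal` is a formula with a `{var}` placeholder, e.g. "len({x}) >= 0";
    `constructors` lists (name, other_args, recursive_args) arities."""
    def inst(term):
        return goal.format(**{var: term})
    fresh = iter(f"y{i}" for i in range(1000))
    cases = []
    for name, other, rec in constructors:
        oth_vars = [next(fresh) for _ in range(other)]
        rec_vars = [next(fresh) for _ in range(rec)]
        args = oth_vars + rec_vars
        term = f"{name}({', '.join(args)})" if args else name
        hyps = " & ".join(inst(v) for v in rec_vars)   # induction hypotheses
        step = f"({hyps}) -> {inst(term)}" if hyps else inst(term)
        cases.append(f"forall {', '.join(args)}. {step}" if args else step)
    premise = " & ".join(f"({c})" for c in cases)
    return f"({premise}) -> forall {var}. {inst(var)}"

# For lists: structural_induction_axiom("len({x}) >= 0", "x",
#     [("nil", 0, 0), ("cons", 1, 1)])
# yields (roughly) "((len(nil) >= 0) &
#   (forall y0, y1. (len(y1) >= 0) -> len(cons(y0, y1)) >= 0))
#   -> forall x. len(x) >= 0"
```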

3.3 Well-Founded Induction

Suppose that \(x \succ y\) is any binary predicate that is interpreted as a well-founded relation (which is not necessarily an ordering). We require both arguments of \(\succ \) to be of the same sort. Then the following is a valid formula, which represents well-founded induction on this relation:

$$ \forall x(\lnot L[x] \rightarrow \exists y (x \succ y \wedge \lnot L[y])) \rightarrow \forall x L[x]. $$

When we Skolemise this formula, we obtain two clauses

$$ \begin{array}{l} \lnot L[\sigma _1] \vee L[x] \quad \quad \quad \lnot \sigma _1 \succ y \vee L[y] \vee L[x] \end{array} $$

We can instead use the following two equivalent clauses (the literal \(L[x]\) in the second clause is redundant: if L holds universally both clause sets are satisfied, and otherwise \(L[x]\) can be instantiated to a false instance and dropped):

$$\begin{aligned} \begin{array}{l} \lnot L[\sigma _1] \vee L[x] \quad \quad \quad \lnot \sigma _1 \succ y \vee L[y] \end{array} \end{aligned}$$
(2)

Well-founded induction is the most general form of induction (though in practice it can only be used when the relation \(\succ \) can be expressed in the first-order language we are using). We are interested in finding special cases of well-founded induction for term algebras. There are two obvious candidates for it: the immediate subterm relation \(\succ _1\) and the subterm relation discussed in the sequel.

Let us begin with the immediate subterm relation. Note that the relation \(\succ \) must have both arguments of the same sort, so the corresponding induction rule will only be useful for term algebras where at least one argument of a constructor has the same sort as the constructor itself. Fortunately, this is the case for the three most commonly used inductive data types: natural numbers, lists and trees.

Let us first provide a complete axiomatisation of the immediate subterm relation \(\succ _1\) for natural numbers and lists:

$$ \begin{array}{l} \lnot (\mathsf {zero}\succ _1 x) \\ \mathsf {succ}(x) \succ _1 y \leftrightarrow x \approx y \end{array} ~~~~~~~~~~~~~~~~~~ \begin{array}{l} \lnot (\mathsf {nil}\succ _1 x) \\ \mathsf {cons}(x,y) \succ _1 z \leftrightarrow y \approx z \end{array} $$
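As a worked instance (our own rendering), suppose \(\sigma _1 \approx \mathsf {succ}(z)\); the second nat axiom then gives \(\sigma _1 \succ _1 z\), and resolving with the second clause of (2) makes the induction hypothesis \(L[z]\) available:

$$ \frac{\sigma _1 \approx \mathsf {succ}(z) \qquad \mathsf {succ}(z) \succ _1 z}{\sigma _1 \succ _1 z} \qquad \qquad \frac{\sigma _1 \succ _1 z \qquad \lnot (\sigma _1 \succ _1 y) \vee L[y]}{L[z]} $$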

The subterm relation is generally not axiomatisable. However, this is not a problem: as an incomplete axiomatisation of the subterm relation we can use any set of formulas that are true of this relation (though this restricts what can be proved about it). Anything we prove using such a set of formulas then also holds for the subterm relation itself, which makes the corresponding induction rule valid too.

We can generalise the immediate subterm and subterm relation also to trees and some other (but not all!) inductively defined types. We do not include general definitions here as they become very involved with multiple sorts and mutually recursive type definitions.

3.4 Inductive Strengthening

We now consider a different form of induction axiom (inspired by [16]).

Example 4

For the negated conjecture \(\lnot (\mathsf {len}(\sigma _1) \ge 0)\) from Example 3, we consider a different way in which to inductively demonstrate L[x] and thus refute this claim. The idea here is to argue that if there does not exist a smallest list with negative length then the length of all lists is non-negative. This can be captured in the induction axiom

$$ \lnot (\exists x)\left( \begin{array}{l} \lnot (\mathsf {len}(x) \ge 0) \wedge \\ (\forall y)( \mathsf{subterm}_ list (x,y) \rightarrow \mathsf {len}(\mathsf {tail}(y)) \ge 0) \end{array} \right) \rightarrow (\forall z)(\mathsf {len}(z) \ge 0) $$

where \(\mathsf{subterm}_ list (x,y)\) is true if y is a subterm of x of \( list \) sort. However, as argued in [9], the \(\mathsf{subterm}\) relation needs to be axiomatised and these axioms (which include transitivity) can have a large negative impact on the search space. Therefore, we can consider two alternative induction axioms. The first is the weak form where we consider only direct subterms of x, as follows.

$$ \lnot (\exists x)\left( \begin{array}{l} \lnot (\mathsf {len}(x) \ge 0) \wedge \\ (x \approx \mathsf {cons}(\mathsf {hd}(x),\mathsf {tail}(x)) \rightarrow \mathsf {len}(\mathsf {tail}(x)) \ge 0) \end{array} \right) \rightarrow (\forall y)(\mathsf {len}(y) \ge 0) $$

This is clausified as

$$ \begin{array}{l} \mathsf {len}(x) \ge 0 \vee \lnot (\mathsf {len}(\sigma _2) \ge 0) \\ \mathsf {len}(x) \ge 0 \vee \sigma _2 \not \approx \mathsf {cons}(\mathsf {hd}(\sigma _2),\mathsf {tail}(\sigma _2)) \vee \mathsf {len}(\mathsf {tail}(\sigma _2)) \ge 0 \end{array} $$

which can be resolved against the conjecture \(\lnot (\mathsf {len}(\sigma _1) \ge 0)\) as before.
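For concreteness (our rendering of this step), resolving both clauses against \(\lnot (\mathsf {len}(\sigma _1) \ge 0)\) with \(x \mapsto \sigma _1\) yields the subgoals

$$ \begin{array}{l} \lnot (\mathsf {len}(\sigma _2) \ge 0) \\ \sigma _2 \not \approx \mathsf {cons}(\mathsf {hd}(\sigma _2),\mathsf {tail}(\sigma _2)) \vee \mathsf {len}(\mathsf {tail}(\sigma _2)) \ge 0 \end{array} $$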

The second (taken from [9]) is where we represent the subterm relation in a way that is more friendly to saturation-based theorem provers i.e. we introduce a fresh predicate \(\mathsf{less}_x\) and then add axioms such that it holds for exactly those terms smaller than the existential witness x. This can be written as follows.

$$ \lnot (\exists x)\left( \begin{array}{l} \lnot (\mathsf {len}(x) \ge 0) \wedge (\forall z)( \mathsf{less}_x(z) \rightarrow \mathsf {len}(z) \ge 0) \\ \wedge ~ (x \approx \mathsf {cons}(\mathsf {hd}(x),\mathsf {tail}(x)) \rightarrow \mathsf{less}_x(\mathsf {tail}(x)))~ \wedge \\ (\forall y)(\mathsf{less}_x(\mathsf {cons}(\mathsf {hd}(y),\mathsf {tail}(y))) \rightarrow \mathsf{less}_x(\mathsf {tail}(y))) \\ \end{array} \right) \rightarrow (\forall y)(\mathsf {len}(y) \ge 0) $$

Again, the specific approach taken in this example can be generalised to an arbitrary term algebra \( ta = \mathcal{T}(\varSigma _B \cup \varSigma _R)\). The existential part \( exists _ ta (L)\) of the general induction axiom can be given as

$$ (\exists x)\left( \lnot L[x] \wedge \bigwedge _{\mathsf{con}(\ldots ,\mathsf{d_i},\ldots ) \in \varSigma _R}~ \bigwedge _{j \in rec (\mathsf{con})} (x \approx \mathsf{con}(\ldots ,\mathsf{d}_i(x),\ldots ) \rightarrow L[\mathsf{d_j}(x)]) \right) $$

for the first approach and as

$$ (\exists x)\left( \begin{array}{l} \lnot L[x]~ \wedge ~ (\forall z)( \mathsf{less}_x(z) \rightarrow \ L[z]) ~\wedge \\ \bigwedge \nolimits _{\mathsf{con}(\ldots ,\mathsf{d_i},\ldots ) \in \varSigma _R} \bigwedge \nolimits _{j \in rec (\mathsf{con})} x \approx \mathsf{con}(\ldots ,\mathsf{d}_i(x),\ldots ) \rightarrow \mathsf{less}_x(\mathsf{d_j}(x)) ~ \wedge \\ (\forall y)( \bigwedge \nolimits _{\mathsf{con}(\ldots ,\mathsf{d_i},\ldots ) \in \varSigma _R} \bigwedge \nolimits _{j \in rec (\mathsf{con})} \mathsf{less}_x(\mathsf{con}(\ldots ,\mathsf{d}_i(y),\ldots )) \rightarrow \mathsf{less}_x(\mathsf{d}_j(y)) ) \\ \end{array} \right) $$

for the second approach. The general induction rule then becomes

$$ \frac{L[t] \vee C}{\mathsf{clausify}\left( exists _ ta (\lnot L) \vee C \right) } $$

for ground literal L[t], clause C and term t, where \( srt (t) = ta \) and \( sel ( L[t] \vee C, L[t],t)\).

One could consider an optimisation where this approach is applied directly to the input (as is done in [16]). However, this would introduce induction axioms too early in proof search, i.e. it goes against the saturation-based philosophy. One could also consider reusing Skolem constants instead of introducing new ones where t in the above rule is already a Skolem constant. However, this could be done at most once per Skolem constant.

Table 1. Illustrating the induction inference schemas for the \( rbtree \) term algebra.

3.5 Comparing the Approaches with an Example

To illustrate the differences in the clauses produced by the above three approaches we give, in Table 1, the introduced inference rules instantiated with \( ta = rbtree \) defined as

$$ \begin{array}{ll} rbtree := \mathsf{empty}~\mid ~ \mathsf{leaf}(\mathsf{lval}( Int )) &{}~\mid ~ \mathsf{red}(\mathsf{rval}( Int ),\mathsf{rleft}( rbtree ),\mathsf{rright}( rbtree )) \\ {} &{}~\mid ~ \mathsf{black}(\mathsf{bval}( Int ),\mathsf{bleft}( rbtree ),\mathsf{bright}( rbtree )) \end{array} $$

This covers all the important cases from above: (i) base constructors of non-zero arity, and (ii) multiple base and multiple recursive constructors. Notice how the structural induction rule in this case introduces 7 new Skolem constants and 9 clauses (although this could be slightly optimised here), whilst the inductive strengthening approaches introduce a single Skolem constant.

4 Selecting Where to Apply Induction

We now consider how to define various induction heuristics.

4.1 Goal-Directed Search

In our introductory example (Example 1, proving commutativity of addition) we (usefully) applied induction three times. The first application was directly to the goal, and the other two were to unit clauses derived directly from the result of this first induction. We hypothesise that this is a typical scenario and introduce heuristics for this common case. An important observation is that a universal goal whose body is an implication becomes a set of ground unit clauses once negated and clausified.

Unit Clauses. A unit clause represents a single goal or subgoal that, if refuted, will lead to a final proof. Conversely, applying the above induction inference rules to non-unit clauses will lead to applications of induction that may not be as general as needed. This selection can be defined as follows for some literal L[t] and term t.

$$ sel _U(L[t],L[t],t) $$

Negative Literals. Typically, goal statements are positive and therefore proof search is attempting to derive a contradiction from a negative statement. Applying induction to a negative statement leads to a mixture of positive and negative conclusions. As we saw in the introductory example, it is common to apply further induction to the negative conclusions. This selection can be defined as follows for clause C, atom A and term t.

$$ sel _N(C \vee \lnot A[t],\lnot A[t],t) $$

However, it is easy to see cases where this is too restrictive. For example, the goal from Example 3 could have been rewritten as \((\forall x)(\lnot (\mathsf {len}(x) < 0))\) and the negated goal on which induction should be performed would have been positive.

Constants. Given a purely universal goal, the terms of interest will be Skolem constants (whether this Skolemisation occurred within the solver or not), and the terms targeted by repeated induction are also typically the Skolem constants introduced by a previous induction. Therefore, to restrict application of induction to this special case, we can restrict it to constants only. This selection can be defined as follows for clause C, literal L and constant a.

$$ sel _C(C \vee L[a], L[a], a) $$

Special Symbols. The goal will typically contain the symbols on which induction should be performed. Additionally, further induction steps are often performed on the Skolem constants introduced by a previous induction. We define a selection predicate parameterised by a set of symbols \(\alpha \) as follows for clause C, literal L and term t.

$$ sel _\alpha (C \vee L[t], L[t], t) \Leftrightarrow (t = f(t_1,\ldots ,t_n) \rightarrow f \in \alpha ) \wedge (t = a \rightarrow a \in \alpha ) $$

and define the predicates \( sel _G\) and \( sel _I\) for the set G of goal symbols and the set I of Skolem constants introduced by induction.

The \( sel \) function is defined as any conjunction of zero or more of the above with the trivial selection function that is true on all inputs of term algebra sort.
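The following sketch illustrates how these heuristics compose. It is a Python rendering over a toy term representation; all names are our illustration, not Vampire's internals:

```python
# A sketch of the selection heuristics of this section.
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    top: str            # head symbol (a constant when args is empty)
    args: tuple = ()
    sort: str = "nat"

@dataclass(frozen=True)
class Literal:
    atom: str
    terms: tuple
    positive: bool = True

def sel_U(clause, lit, t):      # unit clauses only
    return len(clause) == 1

def sel_N(clause, lit, t):      # negative induction literals only
    return not lit.positive

def sel_C(clause, lit, t):      # induction terms must be constants
    return not t.args

def sel_symbols(alpha):         # sel_alpha: head symbol drawn from the set alpha
    return lambda clause, lit, t: t.top in alpha

def make_sel(ta_sorts, *heuristics):
    """Conjunction of chosen heuristics with the trivial selection that
    accepts any term of term algebra sort."""
    def sel(clause, lit, t):
        return t.sort in ta_sorts and all(h(clause, lit, t) for h in heuristics)
    return sel

# e.g. sel = make_sel({"nat", "list"}, sel_U, sel_N, sel_C)
```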

4.2 Inferring Goal Clause(s)

One issue with the above heuristics is that we may not have an explicit goal in our input problem. Indeed, SMT-LIB [1] has no syntax for indicating the goal (unlike TPTP [21]). To address this we define a notion of goal symbol that is independent of the notion of an explicit goal being given.

Given a set of input formulas \(F_1, \ldots , F_n\) and a set \(\mathcal{G}\) containing zero or more formulas \(F_i\) marked as goal formulas, a goal symbol is a symbol such that

  • It appears in a formula \(F \in \mathcal{G}\), or

  • It is a Skolem constant introduced in the clausification of some \(F \in \mathcal{G}\), or

  • It appears in at most limit formulas, or

  • It is a Skolem constant introduced by the Skolemisation of some formula \(F_i\) of the form \(\exists x . F\)

where limit is a parameter to the process. In the case where there is a single goal formula we would expect limit to be 1; however, the input may have been subject to some additional preprocessing, meaning that the goal is represented by a few clauses in the input. The last point applies because many goals take the form of negated universal statements; this is also how formulas for induction are identified in [16] (our approach is more general).

Once all such goal symbols have been identified, the set \(\mathcal{G}\) is extended to include all formulas containing a goal symbol. This is done as \(\mathcal{G}\) typically plays another role in proof search as clauses derived from formulas in \(\mathcal{G}\) may be prioritised in clause selection, providing some heuristic goal-directionality.
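A sketch of this goal-symbol inference, with formulas represented simply as sets of symbols (the helper names and the representation are our illustration):

```python
# Goal-symbol inference following the four criteria above.
from collections import Counter

def goal_symbols(formulas, marked_goals, skolems_of, exists_skolems, limit=1):
    """formulas: iterable of symbol sets; marked_goals: the subset marked as
    goals; skolems_of(f): Skolems introduced when clausifying f;
    exists_skolems: Skolems from Skolemising top-level existentials."""
    occurrences = Counter(s for f in formulas for s in f)
    goal = set()
    for f in marked_goals:
        goal |= f                     # symbols appearing in a goal formula
        goal |= skolems_of(f)         # Skolems from clausifying the goal
    goal |= {s for s, n in occurrences.items() if n <= limit}  # rare symbols
    goal |= set(exists_skolems)       # Skolems from formulas (exists x . F)
    return goal
```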

5 Case Splitting for Free

An important part of inductive proofs is typically the case splitting between the base case and the inductive step. In this section we describe a clause splitting approach (implemented in Vampire as AVATAR [14, 22]) that achieves this.

We briefly describe the ground part of the AVATAR framework for clause splitting as case splitting for induction only requires the ground part. The general idea is that given a set of clauses S and a ground clause \(L_1 \vee L_2\) we can consider the two sub-problems \(S \cup \{ L_1 \}\) and \(S \cup \{ L_2 \}\) independently.

Let \(\mathsf{name}\) be a function from ground literals to labels that is injective up to symmetry of equality. Let \(C \leftarrow A\) be a labelled clause where A is a set of labels. We can lift an inference system on clauses to one on labelled clauses where each conclusion takes the union of the labels of its premises. The previous rules for induction can be extended such that the consequent clauses take the labels of the premise clause. Figure 2 shows how the simple saturation algorithm from Sect. 2 can be extended to perform ground clause splitting. It uses a SAT solver to which we add clauses of labels, and from which we request the difference between the new model and the previous one in terms of added and removed labels.

Fig. 2. Simple saturation algorithm with ground clause splitting.
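The naming and splitting step at the heart of Fig. 2 can be sketched as follows (our illustration; the SAT solver consuming the label clauses, and its model differences, are left abstract):

```python
# A sketch of ground clause splitting in the style of the ground part of
# AVATAR. The literal representation and the (sat_clause, components) return
# shape are our illustration.
from itertools import count

_fresh_label = count(1)
_label_of = {}

def name(literal):
    # injective naming of ground literals by propositional variables
    # (symmetry of equality is ignored here for simplicity)
    if literal not in _label_of:
        _label_of[literal] = next(_fresh_label)
    return _label_of[literal]

def split(clause, is_ground):
    """Split a ground clause L1 v ... v Ln (n > 1) into the label clause
    name(L1) v ... v name(Ln) for the SAT solver, plus labelled unit
    components Li <- {name(Li)}; return None if the clause is not splittable.
    `is_ground` is an assumed predicate on literals."""
    if len(clause) <= 1 or not all(is_ground(l) for l in clause):
        return None
    sat_clause = [name(l) for l in clause]
    components = [((l,), frozenset({name(l)})) for l in clause]
    return sat_clause, components
```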

To understand why this is useful, consider the conclusions of the inference rules given in Table 1. These clauses are all ground and multi-literal, i.e. they capture multiple cases. As an example, when proving the conjecture \(\mathsf{height}(t) \ge 0\) our implementation considers and refutes between 6 and 8 different cases, depending on which form of induction rule is used.

6 Experimental Evaluation

In this section we describe the implementation and evaluation of the techniques described in this paper.

Implementation. We extended the Vampire [10] theorem prover with additional options to capture the techniques described in the previous sections. Table 2 gives an overview of these new options. The sik option captures the different approaches introduced in Sect. 3. The indm option limits the depth of induction. The remaining options capture the choices made in Sect. 4. Our implementation of the induction inference rules ensures that we never instantiate the same induction axiom more than once, and that proof search is unaffected when the problem contains no term algebra sorts. Furthermore, this implementation is fully compatible with all other proof search options and heuristics in Vampire. Our implementation is available online.

Table 2. New options and their values.

Experimental Setup. We use two sets of benchmarks from SMT-LIB from the UFDT and UFDTLIA logics where UF stands for Uninterpreted Functions, DT stands for DataTypes and LIA stands for Linear Integer Arithmetic; we do not consider AUFDTLIA as it does not contain problems interesting for induction. UFDT consists of 4376 problems known not to be satisfiable (we excluded problems either marked as, or found to be, satisfiable during experiments) and UFDTLIA consists of 303 problems that formed the TIP benchmark set in 2015 as used in [16]. Experiments are run on StarExec [20].

6.1 Research Questions

In this section we look at two research questions that naturally arise in our work.

Which Options are Useful? Given the set of introduced options, we would like to know which will be useful in general. Vampire is a portfolio solver and would normally run a series of strategies combining different options. Therefore, any option able to solve problems uniquely may be useful for a portfolio mode. Table 3 compares the option values across the SMT-LIB problems. All option values, with the exception of -sik three and non-zero values for indmd, solve some problems uniquely. For each option there is a clear choice of default value. The fact that non-zero values for indmd were not useful in general suggests that there was not a problem with an explosion of iterative induction steps. This is most likely due to the fact that clause selection will favour a breadth-first exploration of the space. The solved problems did not rely heavily on inferring goal symbols or selection via special symbols. This suggests that the problems of interest either had shallow proofs that followed quickly from the input, or contained few relevant symbols for induction.

Table 3. Comparing option values.

What Do the Proofs Look Like? We ran Vampire in a portfolio mode using the additional options -sik one -indm 0 -indc all on the SMT-LIB UFDTLIA problem set and recorded (i) the number of induction inferences appearing in proofs, and (ii) the maximum depth of these inductions. The results are in Table 4. In the majority of cases only a few induction steps are used but there are 11 problems where more than 10 inductions are required and the proof of induction-vmcai2015/leon/heap-goal3.smt2 uses 145 induction steps. As suggested above, induction is relatively shallow with the maximum depth in proofs being 6 and most necessary inductions not being nested.

Table 4. Statistics from 165 successful problems in UFDTLIA.

6.2 Comparative Evaluation

We compare the new techniques to CVC4 on the SMT-LIB benchmarks in Table 5 running both solvers with and without induction. We currently restrict our attention to CVC4 as this is the only solver available that runs on these problems and supports induction (Z3 does not support induction). It is worth noting that CVC4 was reported comparable to Zipperposition in [4] but has improved considerably in the meantime.

Table 5. Comparative results with CVC4 on SMT-LIB benchmarks.

Overall CVC4 solves more problems, but Vampire solves 48 problems that CVC4 (or any other solver) does not. We consider this an impressive result for a first implementation and believe that Vampire will solve many more previously unsolved problems when more heuristics, options and induction axioms are implemented.

It is interesting to note that the majority of problems are solvable without induction, suggesting the need for better benchmarks. However, we also observe that Vampire will commonly use induction to solve a problem more quickly even when induction is not required. This is also a very interesting observation, since normally the addition of new non-simplifying rules slows down saturation theorem provers.

The fact that the UFDTLIA benchmarks are a version of TIP allows us to indirectly compare to other inductive theorem provers. Table 6 uses historical data from the literature to show that Vampire is competitive with these solvers but does not perform as well in general. This can be explained by missing features specialised for induction that have not yet been implemented in Vampire.

Table 6. Using published data to compare to induction provers. Data for CVC4 and Vampire taken from our experiments, other data taken from [23].

7 Related Work

We focus on explicit induction approaches, rather than implicit induction, e.g. the inductionless induction [3] approach. Within this we identify two areas of relevant work: the specialised area of inductive theorem proving and the general approach of extending first-order theorem provers with induction.

Tools that use theorem provers as backends often include induction hypotheses in the input. For example, Dafny was extended with an induction layer that wraps the underlying SMT solver and inserts useful induction hypotheses [11].

There are a number of inductive theorem provers, such as ACL2 [7], IsaPlanner [5], Zeno [19], and HipSpec [2], that differ in architecture from our approach. ACL2 relies on a special procedure for deciding when to apply induction, HipSpec is based on a technique called theory exploration, and IsaPlanner and Zeno follow a top-down approach. Therefore, in these other techniques most of the proof search effort is dedicated to deciding where to apply induction (as it is quite costly), whereas our approach is less guided in this sense but induction is, in some sense, cheap. In general, inductive solvers are well suited to problems that require complex induction but only relatively simple reasoning otherwise. Our focus is the converse case.

The main previous attempt to extend a saturation-based superposition theorem prover with induction is in Zipperposition by Cruanes [4]. This approach is formulated for (generally defined) structural induction over inductive datatypes. The main difference between this previous work and ours is the way in which [4] puts together datatype reasoning, inductive reasoning, and reasoning by cases using AVATAR, whereas our work keeps all three parts separate. As a result, our approach is more general; our definition of induction does not depend on inductive datatypes and works without AVATAR, so it can be added to existing saturation theorem provers with little effort. For example, our generality results in the ability to implement well-founded induction.

We do note, however, that Cruanes explores heuristics for where to apply induction from the broader inductive theorem proving literature that we have not yet explored.

Finally, we note that the experimental results of [4] have a different focus from our own as they focus on problems suited for inductive theorem provers whereas our research (and our experiments) focus on problems requiring a little bit of induction and a lot of complex first-order reasoning.

Another approach [23] wraps superposition-based proof search in an extra process that iteratively explores the space of possible inductions. There has also been work on incorporating induction for natural numbers into the superposition calculus [8]. CVC4 has been extended with a set of techniques for induction [16]. Their rules are similar to ours but the setting is different, as CVC4 is a DPLL(T)-based SMT solver using quantifier instantiation to handle quantifiers.

8 Conclusion

In this paper we introduce a new method for integrating induction into a saturation-based theorem prover using superposition. Our approach utilises the clause-splitting framework for case splitting. Experimental results show that the new options allow us to solve many problems requiring complex (e.g. nested) inductions.