1 Introduction

An increasing number of automatic theorem provers can generate certificates, or proofs, that justify the formulas they derive. These proofs can be checked by other programs and shared across reasoning systems. Some users will also want to inspect this output to understand why a formula holds. Proof production is generally well understood for the core proving methods and for many theories commonly used in satisfiability modulo theories (SMT). But most automatic provers also perform some formula processing or preprocessing—such as clausification and rewriting with theory-specific lemmas—and proof production for this aspect is less mature.

For most provers, the code for processing formulas is lengthy and deals with a multitude of cases, some of which are rarely executed. Although it is crucial for efficiency, this code tends to be given much less attention than other aspects of provers. Developers are reluctant to invest effort in producing detailed proofs for such processing, since this requires adapting a lot of code. As a result, the granularity of inferences for formula processing is often coarse. Sometimes, processing features are even disabled to avoid gaps in proofs, at a high cost in proof search performance.

Fine-grained proofs are important for a variety of applications. We propose a framework to generate such proofs without slowing down proof search. Proofs are expressed using an extensible set of inference rules (Sect. 2). The succedent of a rule is an equality between the original term and the translated term. (It is convenient to consider formulas a special case of terms.) The rules have a fine granularity, making it possible to cleanly separate theories. Clausification, theory-specific simplifications, and expansion of ‘let’ expressions are instances of this framework. Skolemization may seem problematic, but with the help of Hilbert’s choice operator, it can also be integrated into the framework. Some provers provide very detailed proofs for parts of the solving, but we are not aware of any publications about practical attempts to provide easily reconstructible proofs for processing formulas containing quantifiers and ‘let’ expressions.

At the heart of the framework lies a generic contextual recursion algorithm that traverses the terms to translate (Sect. 3). The context fixes some variables, maintains a substitution, and keeps track of polarities or other data. The transformation-specific work, including the generation of proofs, is performed by plugin functions that are given as parameters to the framework. The recursion algorithm, which is critical for the performance and correctness of the generated proofs, needs to be implemented only once. Another benefit of the modular architecture is that we can easily combine several transformations in a single pass, without complicating the code unduly or compromising the level of detail of the proof output. For very large inputs, this can improve performance.

The inference rules and the contextual recursion algorithm enjoy many desirable properties (Sect. 4). The rules are sound, and the treatment of binders is correct even in the presence of name clashes. Moreover, with suitable data structures, proof generation adds an overhead that is proportional to the time spent processing the terms. Checking proofs represented as directed acyclic graphs (DAGs) can be performed with a time complexity that is linear in their size. Detailed proofs of the metatheory are included in a technical report [2], together with more explanations and examples.

We implemented the approach in veriT (Sect. 5), an SMT solver that is competitive on problems combining equality, linear arithmetic, and quantifiers [3]. Compared with other SMT solvers, veriT is known for its very detailed proofs [5], which are reconstructed in the proof assistants Coq [1] and Isabelle/HOL [6] and in the GAPT system [10]. As a proof of concept, we implemented a prototype checker in Isabelle/HOL.

By adopting the new framework, we were able to remove large amounts of complicated code in the solver, while enabling detailed proofs for more transformations than before. The contextual recursion algorithm had to be implemented only once and is more thoroughly tested than any of the monolithic transformations it subsumes. Our empirical evaluation reveals that veriT is as fast as before even though it now generates finer-grained proofs.

1.1 Conventions

Our setting is a many-sorted classical first-order logic as defined by the SMT-LIB standard [4]. A signature consists of a set of sorts and a set of function symbols. Nullary function symbols are called constants. We assume that the signature contains a \(\mathsf {Bool}\) sort and constants \(\mathsf {true},\, \mathsf {false}: \mathsf {Bool}\), a family of function symbols \(\simeq \) interpreted as equality, and the connectives \(\lnot \), \(\wedge \), \(\vee \), and \(\rightarrow \). Formulas are terms of type \(\mathsf {Bool}\), and equivalence is equality (\(\simeq \)) on \(\mathsf {Bool}\). Terms are built over symbols from the signature and variables from a fixed family of infinite sets. In addition to \(\forall \) and \(\exists \), we rely on two more binders: Hilbert’s choice operator \(\varepsilon x.\varphi \) and a ‘let’ construct, \(\mathrm {let}~{\bar{x}_n}\simeq {\bar{s}_n}~\mathrm {in}~t\), which simultaneously assigns n variables.

We use the symbol \(=\) for syntactic equality on terms. We reserve the names \(\mathsf {a}, \mathsf {f}, \mathsf {p},\mathsf {q}\) for function symbols; \(x, y\) for variables; \(r, s, t, u\) for terms (which may be formulas); \(\varphi , \psi \) for formulas; and Q for quantifiers (\(\forall \) and \(\exists \)). We use the notations \({\bar{a}_n}\) and \((a_i)_{i=1}^n\) to denote the tuple, or vector, \((a_{1},\ldots ,a_{n})\). We write [n] for \(\{1,\dots ,n\}\).

Given a term t, the set of its free variables is written \( FV (t)\). The notation \(t[{\bar{x}_n}]\) stands for a term that may depend on \({\bar{x}_n}\); \(t[{\bar{s}_n}]\) is the corresponding term where the terms \({\bar{s}_n}\) are substituted for \({\bar{x}_n}\). Bound variables in t are renamed to avoid capture. Following these conventions, Hilbert choice and ‘let’ are characterized by

\(\models \exists x.\> \varphi [x] \,\rightarrow \, \varphi [\varepsilon x.\> \varphi [x]]\)   (\(\varepsilon _{1}\))

\(\models (\mathrm {let}~{\bar{x}_n}\simeq {\bar{s}_n}~\mathrm {in}~t[{\bar{x}_n}]) \simeq t[{\bar{s}_n}]\)   (let)

Substitutions \(\rho \) are functions from variables to terms such that \(\rho (x_i) \not = x_i\) for at most finitely many variables \(x_i\). We write them as \(\{\bar{x}_n \mapsto {\bar{s}_n}\}\). The substitution \(\rho [{\bar{x}_n} \mapsto {\bar{s}_n}]\) maps each variable \(x_i\) to the term \(s_i\) and otherwise coincides with \(\rho \). The application of a substitution \(\rho \) to a term t is denoted by \(\rho (t)\). It is capture-avoiding; bound variables in t are renamed as necessary. Composition \(\rho ' \circ \rho \) is defined as for functions (i.e., \(\rho \) is applied first).

2 Inference System

The inference rules used by our framework depend on a notion of context defined by the grammar  \(\mathrm {\Gamma }\,::=\, \varnothing \,\mid \, \mathrm {\Gamma },\, x \,\mid \, \mathrm {\Gamma },\,{\bar{x}_n}\mapsto {\bar{s}_n}\). Each context entry either fixes a variable x or defines a substitution \(\{{\bar{x}_n} \mapsto {\bar{s}_n}\}\). If a context introduces the same variable several times, the rightmost entry shadows the others. Abstractly, a context \(\mathrm {\Gamma }\) fixes a set of variables and specifies a substitution \( subst (\mathrm {\Gamma })\) defined by \( subst (\varnothing ) = \{\}\), \( subst (\mathrm {\Gamma },\, x) = subst (\mathrm {\Gamma })[x \mapsto x]\), and \( subst (\mathrm {\Gamma },\, {\bar{x}_n} \mapsto {\bar{t}_n}) = subst (\mathrm {\Gamma }) \mathrel \circ \{\bar{x}_n \mapsto {\bar{t}_n}\}\). In the second equation, the \([x \mapsto x]\) update shadows any replacement of x induced by \(\mathrm {\Gamma }\). We write \(\mathrm {\Gamma }(t)\) to abbreviate the capture-avoiding substitution \( subst (\mathrm {\Gamma })(t)\).
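For instance, unfolding these equations on two small contexts, where the second illustrates shadowing:

\( subst (x,\; y \mapsto \mathsf {g}(x)) = \{x \mapsto x\} \circ \{y \mapsto \mathsf {g}(x)\} = \{x \mapsto x,\; y \mapsto \mathsf {g}(x)\}\)

\( subst (x \mapsto \mathsf {a},\; x) = \{x \mapsto \mathsf {a}\}[x \mapsto x] = \{x \mapsto x\}\)

Hence \(\mathrm {\Gamma }(\mathsf {f}(y)) = \mathsf {f}(\mathsf {g}(x))\) for the first context \(\mathrm {\Gamma }\), whereas the second context maps x to itself, the entry \(x \mapsto \mathsf {a}\) being shadowed by the later fixing of x.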

Transformations of terms (and formulas) are justified by judgments of the form \(\mathrm {\Gamma }\;{\vartriangleright }\;t\simeq u\), where \(\mathrm {\Gamma }\) is a context, t is an unprocessed term, and u is the corresponding processed term. The free variables in t and u must appear in the context \(\mathrm {\Gamma }\). Semantically, the judgment expresses the equality of the terms \(\mathrm {\Gamma }(t)\) and u for all variables fixed by \(\mathrm {\Gamma }\). Crucially, the substitution applies only on the left-hand side of the equality.

The inference rules for the transformations covered in this paper are presented below.

[Figure a: the inference rules of the framework (Taut\(_{\mathscr {T}}\), Trans, Cong, Bind, Sko\(_{\exists }\), Sko\(_{\forall }\), and Let)]
  • Taut\(_{\mathscr {T}}\) relies on an oracle to derive arbitrary lemmas in a theory \(\mathscr {T}\). In practice, the oracle will produce some kind of certificate to justify the inference. An important special case, for which we use the name Refl, is syntactic equality.

  • Trans needs the side condition because the term t appears both on the left-hand side of \(\simeq \) (where it is subject to \(\mathrm {\Gamma }\)’s substitution) and on the right-hand side.

  • Cong can be used for any function symbol \(\mathsf {f}\), including the logical connectives.

  • Bind is a congruence rule for quantifiers. The rule also justifies the renaming of the bound variable. The side condition prevents an unwarranted variable capture. In the antecedent, the renaming is expressed by a substitution in the context.

  • Sko\(_{\exists }\) and Sko\(_{\forall }\) exploit (\(\varepsilon _{1}\)) to replace a quantified variable with a suitable witness, simulating skolemization. We can think of the \(\varepsilon \) expression in each rule abstractly as a fresh function symbol that takes any fixed variables it depends on as arguments.

  • Let exploits (let) to expand a ‘let’ expression. The terms \(\bar{r}_n\) assigned to the variables \(\bar{x}_n\) can be transformed into terms \(\bar{s}_n\).

The antecedents of all the rules inspect subterms structurally, without modifying them. Modifications to the term on the left-hand side are delayed; the substitution is applied only in Taut. This is crucial to obtain compact proofs that can be checked efficiently. By systematically renaming variables in Bind, we can satisfy most side conditions trivially.

Judgments can be encoded into a well-understood theory of binders: the simply typed \(\lambda \)-calculus. This provides a solid basis to reason about them, and to reconstruct proofs expressed in the inference system. We refer to our technical report [2] for details.

The set of rules can be extended to cater for arbitrary transformations that can be expressed as equalities, using Hilbert choice to represent fresh symbols if necessary. The usefulness of Hilbert choice for proof reconstruction is well known [7, 19, 21], but we push the idea further and use it to simplify the inference system and make it more uniform.

Example 1

The following derivation tree justifies the expansion of a ‘let’ expression:

[Figure b: derivation tree justifying the expansion of a ‘let’ expression]
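For the instance used in Example 2 below, \(\varphi = \mathrm {let}\ x\simeq \mathsf {a}\ \mathrm {in}\ \mathsf {p}(x, x)\), such a tree can be sketched as follows, with two Refl leaves for the occurrences of x, a nullary and a binary Cong step, and a final Let step:

\[
\dfrac{\;{\vartriangleright }\;\mathsf {a} \simeq \mathsf {a} \qquad
       \dfrac{x \mapsto \mathsf {a}\;{\vartriangleright }\;x \simeq \mathsf {a} \qquad x \mapsto \mathsf {a}\;{\vartriangleright }\;x \simeq \mathsf {a}}
             {x \mapsto \mathsf {a}\;{\vartriangleright }\;\mathsf {p}(x, x) \simeq \mathsf {p}(\mathsf {a}, \mathsf {a})}\;\textsc {Cong}}
      {\;{\vartriangleright }\;(\mathrm {let}\ x\simeq \mathsf {a}\ \mathrm {in}\ \mathsf {p}(x, x)) \simeq \mathsf {p}(\mathsf {a}, \mathsf {a})}\;\textsc {Let}
\]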

Skolemization can be applied regardless of polarity. Normally, we skolemize only positive existential quantifiers and negative universal quantifiers. However, skolemizing other quantifiers is sound in the context of proving. The trouble is that it is generally incomplete, if we introduce Skolem symbols and forget their definitions in terms of Hilbert choice. To paraphrase Orwell, all quantifiers are skolemizable, but some quantifiers are more skolemizable than others.

3 Contextual Recursion

We propose a generic algorithm for term transformations, based on structural recursion. The algorithm is parameterized by a few simple plugin functions embodying the essence of the transformation. By combining compatible plugin functions, we can perform several transformations in one traversal. Transformations can depend on some context that encapsulates relevant information, such as bound variables, variable substitutions, and polarity. Each transformation can define its own notion of context.

The output is generated by a proof module that maintains a stack of derivation trees. The procedure \(\textit{apply}(R,\, n,\, \mathrm {\Gamma },\, t,\, u)\) pops n derivation trees \(\smash {\mathscr {\bar{D}}_n}\) from the stack and pushes the tree of \(\mathrm {\Gamma }\;{\vartriangleright }\;t \simeq u\) obtained by applying rule R to \(\smash {\mathscr {\bar{D}}_n}\). The plugin functions are responsible for invoking \(\textit{apply}\) as appropriate.
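The following is a minimal Python sketch of such a proof module; derivation trees are plain records, and the names Step and ProofModule are our own, not taken from veriT:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Step:
    rule: str                       # name of the inference rule, e.g. "Cong" or "Let"
    context: Any                    # the context of the derived judgment
    lhs: Any                        # unprocessed term t
    rhs: Any                        # processed term u
    premises: List["Step"] = field(default_factory=list)

class ProofModule:
    """Maintains a stack of derivation trees."""
    def __init__(self) -> None:
        self.stack: List[Step] = []

    def apply(self, rule: str, n: int, ctx: Any, t: Any, u: Any) -> None:
        # Pop the n derivation trees justifying the antecedents (restoring the
        # order in which they were pushed) and push the tree deriving ctx |> t ~ u.
        premises = [self.stack.pop() for _ in range(n)]
        premises.reverse()
        self.stack.append(Step(rule, ctx, t, u, premises))
```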

3.1 The Generic Algorithm

The algorithm performs a depth-first postorder contextual recursion on the term to process. Subterms are processed first; then an intermediate term is built from the resulting subterms and is processed in turn. The context \(\mathrm {\Delta }\) is updated in a transformation-specific way with each recursive call. It is abstract from the point of view of the algorithm. The plugin functions are divided into two groups: \(ctx\_{let}\), \({ctx\_{quant}}\), and \({ctx\_{app}}\) update the context when entering the body of a binder or when moving from a function symbol to one of its arguments; \({build\_{let}}\), \({build\_{quant}}\), \({build\_{app}}\), and \({build\_{var}}\) return the processed term and produce the corresponding proof as a side effect.

[Figure c: the generic contextual recursion algorithm (‘process’)]
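As an illustration, the recursion can be rendered in Python roughly as follows; the term datatypes and the bundling of the plugin functions into a parameter p are our own choices, not part of the original presentation:

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    sym: str
    args: Tuple[Any, ...] = ()

@dataclass(frozen=True)
class Quant:
    q: str            # "forall" or "exists"
    var: str
    body: Any

@dataclass(frozen=True)
class Let:
    xs: Tuple[str, ...]
    rs: Tuple[Any, ...]
    body: Any

def process(delta, t, p):
    """Depth-first postorder contextual recursion; p bundles the plugin functions."""
    if isinstance(t, Var):
        return p.build_var(delta, t)
    if isinstance(t, App):
        # Compute a (possibly different) context for each argument, recurse,
        # then let the plugin rebuild the application and record the proof.
        deltas = [p.ctx_app(delta, t.sym, t.args, i) for i in range(len(t.args))]
        us = tuple(process(d, a, p) for d, a in zip(deltas, t.args))
        return p.build_app(delta, deltas, t.sym, t.args, us)
    if isinstance(t, Quant):
        d1 = p.ctx_quant(delta, t.q, t.var, t.body)
        return p.build_quant(delta, d1, t.q, t.var, t.body, process(d1, t.body, p))
    if isinstance(t, Let):
        d1 = p.ctx_let(delta, t.xs, t.rs, t.body)
        return p.build_let(delta, d1, t.xs, t.rs, t.body, process(d1, t.body, p))
    raise ValueError(f"unexpected term: {t!r}")
```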

3.2 ‘Let’ Expansion

The first instance of the contextual recursion algorithm expands ‘let’ expressions and renames bound variables systematically to avoid capture. Skolemization and theory simplification, presented below, assume that this transformation has been performed. The context consists of a list of fixed variables and variable substitutions, as in Sect. 2. The plugin functions are as follows:

[Figure d: plugin functions for ‘let’ expansion]

The \({ctx\_{let}}\) and \({build\_{let}}\) functions process ‘let’ expressions. In \({ctx\_{let}}\), the substituted terms are processed further before they are added to a substitution entry in the context. In \({build\_{let}}\), the \(\textsc {Let}\) rule is applied and the transformed term is returned. Analogously, the \({ctx\_{quant}}\) and \({build\_{quant}}\) functions rename quantified variables systematically. This ensures that any variables that arise in the range of the substitution specified by \({ctx\_{let}}\) will resist capture when the substitution is applied. Finally, the \(ctx\_{app}\), \(build\_{app}\), and \(build\_{var}\) functions simply reproduce the term traversal in the generated proof; they perform no transformation-specific work.
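Continuing the Python sketch above (reusing Var, App, Quant, Let, process, and ProofModule), the ‘let’ expansion plugins might look as follows; the context is modeled as a list of 'fix' and 'subst' entries, and the helper names are ours:

```python
import itertools

def apply_ctx(gamma, x):
    # Apply subst(Gamma) to a variable; rightmost entries shadow earlier ones.
    # (Sufficient here because substituted terms are already fully processed.)
    for kind, data in reversed(gamma):
        if kind == 'fix' and data == x.name:
            return x
        if kind == 'subst' and x.name in data:
            return data[x.name]
    return x

class LetExpansion:
    def __init__(self, proof):
        self.proof = proof
        self._fresh = itertools.count()

    # -- context updates --------------------------------------------------
    def ctx_app(self, gamma, sym, ts, i):
        return gamma

    def ctx_quant(self, gamma, q, x, body):
        y = f"{x}_{next(self._fresh)}"                 # systematic renaming
        return gamma + [('subst', {x: Var(y)})]

    def ctx_let(self, gamma, xs, rs, body):
        ss = [process(gamma, r, self) for r in rs]     # process the substituted terms first
        return gamma + [('subst', dict(zip(xs, ss)))]

    # -- term building and proof recording --------------------------------
    def build_var(self, gamma, x):
        u = apply_ctx(gamma, x)
        self.proof.apply('Refl', 0, gamma, x, u)
        return u

    def build_app(self, gamma, gammas, sym, ts, us):
        t, u = App(sym, tuple(ts)), App(sym, tuple(us))
        self.proof.apply('Cong', len(us), gamma, t, u)
        return u

    def build_quant(self, gamma, gamma1, q, x, body, psi):
        y = gamma1[-1][1][x].name                      # fresh name chosen by ctx_quant
        self.proof.apply('Bind', 1, gamma, Quant(q, x, body), Quant(q, y, psi))
        return Quant(q, y, psi)

    def build_let(self, gamma, gamma1, xs, rs, body, u):
        self.proof.apply('Let', len(rs) + 1, gamma, Let(tuple(xs), tuple(rs), body), u)
        return u
```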

Example 2

Following up on Example 1, assume \(\varphi = \mathrm {let}\ x\simeq \mathsf {a}\ \mathrm {in}\ \mathsf {p}(x, x)\). Given the above plugin functions, \(\textit{process}(\varnothing ,\,\varphi )\) returns \(\mathsf {p}(\mathsf {a},\mathsf {a})\). It is instructive to study the evolution of the stack during the execution of \(\textit{process}\). First, in \({ctx\_{let}}\), the term \(\mathsf {a}\) is processed recursively; the call to \(build\_{app}\) pushes a nullary Cong step with succedent \({\vartriangleright }\;\mathsf {a} \simeq \mathsf {a}\) onto the stack. Then the term \(\mathsf {p}(x,\, x)\) is processed. For each of the two occurrences of x, \({build\_{var}}\) pushes a Refl step onto the stack. Next, \(build\_{app}\) applies a Cong step to justify rewriting under \(\mathsf {p}\): The two Refl steps are popped, and a binary Cong is pushed. Finally, \({build\_{let}}\) performs a Let inference with succedent \({\vartriangleright }\;\varphi \simeq \mathsf {p}(\mathsf {a}, \mathsf {a})\) to complete the proof: The two Cong steps on the stack are replaced by the Let step. The stack now consists of a single item: the derivation tree of Example 1.
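With the sketches above, the run described in Example 2 can be replayed directly:

```python
proof = ProofModule()
phi = Let(('x',), (App('a'),), App('p', (Var('x'), Var('x'))))    # let x = a in p(x, x)

result = process([], phi, LetExpansion(proof))

assert result == App('p', (App('a'), App('a')))                   # p(a, a)
assert len(proof.stack) == 1 and proof.stack[0].rule == 'Let'     # a single Let tree remains
```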

3.3 Skolemization

Our second transformation, skolemization, assumes that ‘let’ expressions have been expanded and bound variables have been renamed apart. The context is a pair \(\mathrm {\Delta }= (\mathrm {\Gamma },\,p)\), where \(\mathrm {\Gamma }\) is as defined in Sect. 2 and \(p\) is the polarity (\(+\), −, or ?) of the term being processed. The main plugin functions are those that manipulate quantifiers:

[Figure e: quantifier-related plugin functions (ctx_quant and build_quant) for skolemization]

The polarity is updated by \(ctx\_{app}\), which is not shown. For example, \(ctx\_{app}((\mathrm {\Gamma },\,-),\,\lnot ,\,\varphi ,\,1)\) returns \((\mathrm {\Gamma },\,+)\), because if \({\lnot }\,\varphi \) occurs negatively in a larger formula, then \(\varphi \) occurs positively. The plugin functions \(build\_{app}\) and \({build\_{var}}\) are as for ‘let’ expansion.

Positive occurrences of \(\exists \) and negative occurrences of \(\forall \) are skolemized. All other quantifiers are kept as they are. The \( sko\_term \) function returns an applied Skolem function symbol following some reasonable scheme; for example, outer skolemization [20] creates an application of a fresh function symbol to all variables fixed in the context. To comply with the inference system, the application of Sko\(_{\exists }\) or Sko\(_{\forall }\) in \({build\_{quant}}\) instructs the proof module to systematically replace the Skolem term with the corresponding \(\varepsilon \) term when outputting the proof.
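These quantifier-handling plugins can be sketched in the same Python setting; the rule identifiers 'Sko_ex' and 'Sko_all', the connective names, and the Skolem-symbol naming scheme are ours, and the replacement of Skolem terms by \(\varepsilon \) terms in the proof output is left out:

```python
import itertools

def flip(pol):
    return {'+': '-', '-': '+'}.get(pol, '?')

class Skolemization:
    """Context is a pair (gamma, polarity), with polarity in {'+', '-', '?'}."""
    def __init__(self, proof):
        self.proof = proof
        self._count = itertools.count()

    def ctx_app(self, delta, sym, ts, i):
        gamma, pol = delta
        if sym == 'not':
            return (gamma, flip(pol))
        if sym == 'implies':
            return (gamma, flip(pol) if i == 0 else pol)
        if sym in ('and', 'or'):
            return (gamma, pol)
        return (gamma, '?')                 # e.g. under an equivalence, the polarity is unknown

    def _skolemize(self, q, pol):
        return (q == 'exists' and pol == '+') or (q == 'forall' and pol == '-')

    def _sko_term(self, gamma, x):
        # Outer skolemization: apply a fresh symbol to all variables fixed in the context.
        fixed = tuple(Var(v) for kind, v in gamma if kind == 'fix')
        return App(f"sk_{x}_{next(self._count)}", fixed)

    def ctx_quant(self, delta, q, x, body):
        gamma, pol = delta
        if self._skolemize(q, pol):
            return (gamma + [('subst', {x: self._sko_term(gamma, x)})], pol)
        return (gamma + [('fix', x)], pol)  # keep the quantifier, fix its variable

    def build_quant(self, delta, delta1, q, x, body, psi):
        gamma, pol = delta
        t = Quant(q, x, body)
        if self._skolemize(q, pol):
            self.proof.apply('Sko_ex' if q == 'exists' else 'Sko_all', 1, gamma, t, psi)
            return psi
        self.proof.apply('Bind', 1, gamma, t, Quant(q, x, psi))
        return Quant(q, x, psi)

    # build_var and build_app are as for 'let' expansion, after unpacking gamma:
    def build_var(self, delta, x):
        gamma, _ = delta
        u = apply_ctx(gamma, x)
        self.proof.apply('Refl', 0, gamma, x, u)
        return u

    def build_app(self, delta, deltas, sym, ts, us):
        gamma, _ = delta
        self.proof.apply('Cong', len(us), gamma, App(sym, tuple(ts)), App(sym, tuple(us)))
        return App(sym, tuple(us))
```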

3.4 Theory Simplification

All kinds of theory simplification can be performed on formulas. We restrict our focus to a simple yet quite characteristic instance: the simplification of \(u + 0\) and \(0 + u\) to u. We assume that ‘let’ expressions have been expanded. The context is a list of fixed variables. The plugin functions \(ctx\_{app}\) and \({build\_{var}}\) are as for ‘let’ expansion; the remaining ones are presented below.

[Figure f: plugin functions for theory simplification]

The quantifier manipulation code, in \({ctx\_{quant}}\) and \({build\_{quant}}\), is straightforward. The interesting function is \(build\_{app}\). It first applies the Cong rule to justify rewriting the arguments. Then, if the resulting term \(\mathsf {f}(\bar{u}_n)\) can be simplified further into a term u, it performs a transitive chain of reasoning: \(\mathsf {f}(\bar{t}_n) \simeq \mathsf {f}(\bar{u}_n) \simeq u\).
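In the same Python sketch, the simplification of \(u + 0\) and \(0 + u\) could be realized as follows; the rule identifier 'Taut_arith' stands for the Taut\(_{\mathscr {T}}\) rule instantiated with linear arithmetic, and the quantifier-related plugins are omitted:

```python
ZERO = App('0')

def simplify_add_zero(t):
    # The single rewrite rule considered here: u + 0 ~> u and 0 + u ~> u.
    if isinstance(t, App) and t.sym == '+' and len(t.args) == 2:
        a, b = t.args
        if b == ZERO:
            return a
        if a == ZERO:
            return b
    return None

class ArithSimplification:
    """Context is a list of fixed variables, so subst(Gamma) is the identity."""
    def __init__(self, proof):
        self.proof = proof

    def build_app(self, gamma, gammas, sym, ts, us):
        t, u1 = App(sym, tuple(ts)), App(sym, tuple(us))
        self.proof.apply('Cong', len(us), gamma, t, u1)        # f(t_n) ~ f(u_n)
        u2 = simplify_add_zero(u1)
        if u2 is not None:
            self.proof.apply('Taut_arith', 0, gamma, u1, u2)   # f(u_n) ~ u, by the oracle
            self.proof.apply('Trans', 2, gamma, t, u2)         # chain the two equalities
            return u2
        return u1
```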

3.5 Combinations of Transformations

Theory simplification can be implemented as a family of transformations, each member of which embodies its own set of theory-specific rewrite rules. If the union of the rewrite rule sets is confluent and terminating, a unifying implementation of \(build\_{app}\) can apply the rules in any order until a fixpoint is reached. Moreover, since theory simplification modifies terms independently of the context, it is compatible with ‘let’ expansion and skolemization. This allows us to perform arithmetic simplification in the substituted terms of a ‘let’ expression in a single pass.
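Under the stated confluence and termination assumptions, such a unifying \(build\_{app}\) could simply iterate its rewrite rules, in the spirit of the following sketch:

```python
def rewrite_to_fixpoint(t, rules):
    """Apply rules (functions returning a rewritten term or None) until none applies."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            u = rule(t)
            if u is not None:
                t, changed = u, True
                break
    return t

# e.g. rewrite_to_fixpoint(App('+', (App('x'), ZERO)), [simplify_add_zero]) == App('x')
```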

The combination of ‘let’ expansion and skolemization is less straightforward. Consider the formula \(\varphi = \mathrm {let}\ y\simeq \exists x.\>\mathsf {p}(x)\ \mathrm {in}\ y\rightarrow y\). When processing the subformula \(\exists x.\>\mathsf {p}(x)\), we cannot (or at least should not) skolemize the quantifier, because it has no unambiguous polarity; indeed, the variable y occurs both positively and negatively in the ‘let’ expression’s body. We can of course give up and perform two passes: The first pass expands ‘let’ expressions, and the second pass skolemizes and simplifies terms. There is also a way to perform all the transformations in a single instance of the framework, described in our report [2].

3.6 Scope and Limitations

Other possible instances of contextual recursion are the clause normal form (CNF) transformation and the elimination of quantifiers using one-point rules. CNF transformation is an instance of rewriting of Boolean formulas and can be justified by a \(\textsc {Taut}\smash {_{\mathsf {Bool}}}\) rule. The Tseytin transformation can be supported by representing the introduced constants by the formulas they represent, similarly to our treatment of Skolem terms. One-point rules (e.g., the transformation of \(\forall x.\> x \simeq \mathsf {a} \rightarrow \mathsf {p}(x)\) into \(\mathsf {p}(\mathsf {a})\)) are similar to ‘let’ expansion and can be represented in much the same way in our framework.

Some transformations, such as symmetry breaking [9] and rewriting based on global assumptions, require a global analysis of the problem that cannot be captured by local substitution of equals for equals. They are beyond the scope of the framework. Other transformations, such as simplification based on associativity and commutativity of function symbols, require traversing the terms to be simplified when applying the rewriting. Since process visits terms in postorder, the complexity of such simplifications would be quadratic, whereas a dedicated depth-first preorder traversal can perform them in linear time. Hence, applying such transformations optimally is also outside the scope of the framework.

4 Theoretical Properties

The first two metatheoretical results below concern the soundness of the inference rules and the correctness of the recursion algorithm that generates proofs in that system. The other results have to do with the cost of proof generation and checking.

Theorem 1

(Soundness of Inferences). If the judgment \(\mathrm {\Gamma }\;{\vartriangleright }\;t\simeq u\) is derivable using the inference system with theories \({\mathscr {T}_{1},\ldots ,\mathscr {T}_{n}}\), then \(\models _{\mathscr {T}_{1} \,\cup \, \cdots \,\cup \, \mathscr {T}_{n}} \mathrm {\Gamma }(t) \simeq u\).

Theorem 2

(Total Correctness of Recursion). For the instances presented in Sect. 3, the contextual recursion algorithm always produces correct proofs.

Observation 3

(Complexity of Recursion). For the instances presented in Sect. 3, the ‘\(\textit{process}\)’ function is called at most once on every subterm of the input.

As a corollary, if all the operations performed in \(\textit{process}\) excluding the recursive calls can be accomplished in constant time, the algorithm has linear-time complexity with respect to the input. There exist data structures for which the following operations take constant time: extending the context with a fixed variable or a substitution, accessing direct subterms of a term, building a term from its direct subterms, choosing a fresh variable, applying a context to a variable, checking if a term matches a simple template, and associating the parameters of the template with the subterms. Thus, it is possible to have a linear-time algorithm for ‘let’ expansion and simplification. Skolemization, on the other hand, is at least quadratic in the worst case, since each Skolem term repeats the variables fixed in the context.

Observation 4

(Overhead of Proof Generation). For the instances presented in Sect. 3, the number of ‘\(\textit{apply}\)’ calls is proportional to the number of subterms in the input.

Notice that all arguments passed to \(\textit{apply}\) must be computed by the transformations regardless of whether proofs are generated. If an \(\textit{apply}\) call takes constant time, the proof generation overhead is linear in the size of the input. To achieve this performance, it is necessary to use sharing to represent contexts and terms in the output.

Observation 5

(Cost of Proof Checking). Checking an inference step can be performed in constant time if checking the side condition takes constant time.

The above statement may appear weak, since checking the side conditions might itself be linear, leading to a cost of proof checking that can be at least quadratic in the size of the proof. Fortunately, most of the side conditions can be checked efficiently. For example, simplification proofs can be checked in linear time because \( subst (\mathrm {\Gamma })\) is always the identity. Moreover, certifying a proof by checking each step locally is not the only possibility. An alternative is to use an algorithm similar to the \(\textit{process}\) function to check a proof in the same way as it has been produced, exploiting sophisticated invariants.

5 Implementation

The ideas presented in this paper have been implemented in two tools. We implemented the contextual recursion algorithm and the transformations described in Sect. 3 in the SMT solver veriT [8], showing that replacing the previous ad hoc code with the generic proof-producing framework had no significant detrimental impact on the solving times. In addition, we developed a prototypical proof checker for the inference system described in Sect. 2 using Isabelle/HOL [18], to convince ourselves that veriT’s output can easily be reconstructed.

5.1 Isabelle

The Isabelle/HOL proof assistant is based on classical higher-order logic (HOL), a variant of the simply typed \(\lambda \)-calculus. The proof checker is included in the development version of Isabelle.

Derivations are represented by a recursive datatype in Standard ML, Isabelle’s primary implementation language. A derivation is a tree whose nodes are labeled by rule names. The rule Taut\(_{\mathscr {T}}\) also carries a theorem that represents the oracle, and the rules Trans and Let are labeled with the terms that occur only in the antecedent (t and \(\bar{s}_n\)). Judgments \(\mathrm {\Gamma }\;{\vartriangleright }\;t\simeq u\) are translated to HOL equalities \(t' \simeq u'\), where \(t'\) and \(u'\) are HOL terms in which the context \(\mathrm {\Gamma }\) is encoded using \(\lambda \)-abstractions and (for substitutions) applications. For example, the judgment \(x,\, y \mapsto \mathsf {g}(x)\;{\vartriangleright }\;\mathsf {f}(y) \simeq \mathsf {f}(\mathsf {g}(x))\) is represented by the HOL equality \((\lambda x.\> (\lambda y.\> \mathsf {f}\>y)\> (\mathsf {g}\>x)) \simeq (\lambda x.\> \mathsf {f}\> (\mathsf {g}\>x))\).

Because reconstruction is not verified, there are no guarantees that it will always succeed, but when it does, the result is certified by Isabelle’s LCF-style inference kernel [11]. We hard-coded a few dozen examples to test different cases, such as this one: Given the HOL terms

and the ML tree

the reconstruction function returns the HOL theorem \(t \simeq u\).

5.2 veriT

We implemented the contextual recursion framework in the SMT solver veriT, replacing large parts of the previous non-proof-producing, hard-to-maintain code. Even though it offers more functionality (proof generation), the preprocessing module is about 20% smaller than before and consists of about 3000 lines of code. There are now only two traversal functions instead of 10. This is, for us, a huge gain in maintainability.

We were able to reuse veriT’s existing proof module and proof format [5]. A proof is a list of inferences, each of which consists of an identifier, the name of the rule, the identifiers of the dependencies, and the derived clause. The use of identifiers makes it possible to represent proofs as DAGs. We extended the format with the inference rules of Sect. 2. The rules that augment the context take a sequence of inferences—a subproof—as a justification. The subproof occurs within the scope of the extended context.

In contrast with the abstract proof module described in Sect. 3, veriT leaves Refl steps implicit for judgments of the form \(\mathrm {\Gamma }\;{\vartriangleright }\;t \simeq t\). The other inference rules are generalized to cope with missing Refl judgments. In addition, when printing proofs, the proof module can automatically replace terms in the inferences with some other terms. This is necessary for transformations such as skolemization and ‘if–then–else’ elimination. We must apply a substitution in the replaced term if the original term contains variables. In veriT, efficient data structures are available to perform this.

The implementation of contextual recursion uses a single global context, augmented before processing a subterm and restored afterwards. The context consists of a set of fixed variables, a substitution, and a polarity. In our setting, the substitution satisfies the side conditions by construction. If the context is empty, the result of processing a subterm is cached. For skolemization, a separate cache is used for each polarity. No caching is attempted under binders.

Invoking process on a term returns the identifier of the inference at the root of its transformation proof in addition to the processed term. These identifiers are threaded through the recursion to connect the proof. The proofs produced by instances of contextual recursion are inserted into the larger resolution proof produced by veriT.

Transformations performing theory simplification were straightforward to port to the new framework: Their \(build\_{app}\) functions simply apply rewrite rules until a fixpoint is reached. Porting transformations that interact with binders required special attention in handling the context and producing proofs. Fortunately, most of these aspects are captured by the inference system and the abstract contextual recursion framework, where they can be studied independently of the implementation.

Some transformations are performed outside of the framework. Proofs of CNF transformation are expressed using the inference rules of veriT’s underlying SAT solver, so that any tool that can reconstruct SAT proofs can also reconstruct these proofs. Simplification based on associativity and commutativity of function symbols is implemented as a dedicated procedure, for efficiency reasons. It currently produces coarse-grained proofs.

To evaluate the impact of the new contextual recursion algorithm and of producing detailed proofs, we compare the performance of different configurations of veriT. Our experimental data is available online. We distinguish three configurations. Basic only applies transformations for which the old code provided some (coarse-grained) proofs. Extended also applies transformations for which the old code did not provide any proofs, whereas the new code provides detailed proofs. Complete applies all transformations available, regardless of whether they produce proofs.

More specifically, Basic applies the transformations for ‘let’ expansion, skolemization, elimination of quantifiers based on one-point rules, elimination of ‘if–then–else’, theory simplification for rewriting n-ary symbols as binary, and elimination of equivalences and exclusive disjunctions with quantifiers in subterms. Extended adds Boolean and arithmetic simplifications to the transformations performed by Basic. Complete performs global rewriting simplifications and symmetry breaking in addition to the transformations in Extended.

The evaluation relies on two main sets of benchmarks from SMT-LIB [4] without bit vectors and nonlinear arithmetic (currently not supported by veriT): the \(20\,916\) benchmarks in the quantifier-free (QF) categories, and the \(30\,250\) benchmarks labeled as unsatisfiable in the non-QF categories. Our experiments were conducted on servers equipped with two Intel Xeon E5-2630 v3 processors, with eight cores per processor, and 126 GB of memory. Each run of the solver uses a single core. The time limit was set to 30 s, a reasonable value for interactive use within a proof assistant.

The table below shows the number of problems solved in total by each configuration.

                 Without proofs             With proofs
                 Old code     New code      Old code     New code
  Basic           42 235       42 258        42 104       42 118
  Extended        42 324       42 389        N/A          42 271
  Complete        42 585       42 613        N/A          N/A

These results indicate that the new generic contextual recursion algorithm and the production of detailed proofs do not impact performance negatively compared with the old code and coarse-grained proofs. Moreover, allowing Boolean and arithmetic simplifications leads to some improvements. We expect that generating proofs for the global transformations would lead to substantial improvements on quantifier-free problems.

6 Related Work

Most automatic provers that support the TPTP syntax for problems generate proofs in TSTP format [24]. Like a veriT proof, a TSTP proof consists of a list of inferences. TSTP does not mandate any inference system; the meaning of the rules and the granularity of inferences vary across systems. For example, the E prover [22] combines clausification, skolemization, and variable renaming into a single inference, whereas Vampire [15] appears to cleanly separate preprocessing transformations. SPASS’s [25] custom proof format does not record preprocessing steps; reverse engineering is necessary to make sense of its output, and optimizations ought to be disabled [6, Sect. 7.3].

Most SMT solvers can parse the SMT-LIB [4] format, but each solver has its own output syntax. Z3’s proofs can be quite detailed [17], but rewriting steps often combine many rewrite rules. CVC4’s format is an instance of LF [13] with Side Conditions (LFSC) [23]; despite recent progress [12, 14], neither skolemization nor quantifier instantiation is currently recorded in the proofs. Proof production in Fx7 [16] is based on an inference system whose formula processing fragment is subsumed by ours; for example, skolemization is more ad hoc, and there is no explicit support for rewriting.

7 Conclusion

We presented a framework to represent and generate proofs of formula processing and its implementation in veriT and Isabelle/HOL. The framework centralizes the delicate issue of manipulating bound variables and substitutions soundly and efficiently, and it is flexible enough to accommodate many interesting transformations. Although it was implemented in an SMT solver, there appears to be no intrinsic limitation that would prevent its use in other kinds of first-order, or even higher-order, automatic provers. The framework covers many preprocessing techniques and can be part of a larger toolbox.

Detailed proofs have been a defining feature of veriT for many years now. It now produces more detailed justifications than ever, but there are still some global transformations for which the proofs are nonexistent or leave much to be desired. In particular, supporting rewriting based on global assumptions would be essential for proof-producing inprocessing, and symmetry breaking would be interesting in its own right.