1 Introduction

Higher-order (HO) logic is a pervasive setting for reasoning about numerous real-world applications. In particular, it is widely used in proof assistants (also known as interactive theorem provers) to provide trustworthy, formal, and machine-checkable proofs of theorems. A major challenge in these applications is to automate as much as possible the production of these formal proofs, thereby reducing the burden of proof on the users. An effective approach to achieving stronger automation in proof assistants is to rely on less expressive but more automatic theorem provers to discharge some of the proof obligations. Systems such as HOLyHammer, Miz\(\mathbb {A}\mathbb{R}\), Sledgehammer, and Why3, which provide a one-click connection from proof assistants to first-order (FO) provers, have led in recent years to considerable improvements in proof-assistant automation [14]. A similar layered approach is also used by automatic HO provers such as Leo-III [43] and Satallax [17], which regularly invoke FO provers to discharge intermediate goals that depend solely on FO reasoning. However, as noted in previous work [12, 30, 48], in both cases the reduction to FOL has its own disadvantages: full encodings into FO, such as those performed by the hammers, may lead to issues with performance, soundness, or completeness. On the other hand, the combination of FO and HO reasoning in automatic HO provers may suffer from the HO prover itself having to perform substantial FO reasoning, for which it is not optimized. This is typically the case in HO problems with a large FO component, which occur often in practice. We aim to overcome these shortcomings by extending Satisfiability Modulo Theories (SMT) [8] solvers, a class of highly successful automatic FO provers, to natively support HOL.

The two main challenges in extending SMT solvers to HOL lie in dealing with partial function applications and with functional variables, i.e., quantified variables of higher-order type. The former mainly affects term representation and core algorithms, which in FOL rely on the fact that all function symbols are fully applied. The latter impacts quantifier instantiation techniques, which must now account for quantified variables occurring in function symbol positions. Moreover, HO problems can often be proved only if functional variables are instantiated with synthesized \(\lambda \)-terms, typically via HO unification [23], which is undecidable in general.

Contributions. We present two approaches for extending SMT solvers to natively support HO reasoning (HOSMT). The first one, the pragmatic approach (Sect. 3), targets existing state-of-the-art SMT solvers with large code bases and complex data structures optimized for the FO case. In this approach, we extend a solver with only minimal modifications to its core data structures and algorithms. In the second approach, the redesign approach (Sect. 4), we rethink a solver’s data structures and develop new algorithms aimed specifically at HO reasoning. This approach may lead to better results but is better suited to lightweight solvers, i.e., less optimized solvers with a smaller code base. Moreover, it provides more flexibility to later develop new techniques especially suited for higher-order reasoning. A common theme of both approaches is that the instantiation algorithms are not extended with HO unification: this is a challenge significant enough that we plan to explore it in a later phase of this work. We include proofs, more examples, and related work in a technical report [5].

We present an extensive experimental evaluation (Sect. 5) of our pragmatic and redesign approaches as implemented respectively in the state-of-the-art SMT solver CVC4 [6] and the lightweight solver veriT [16]. Besides comparisons against state-of-the-art HO provers, we also evaluate these solvers against themselves, comparing a native HO encoding using the extensions in this paper to the base versions of the solvers with the more traditional FO encoding (not using the extensions).

Related Work. The pioneering work of Robinson [41] on reducing higher-order reasoning to first-order logic via translation inspired successful tools such as Sledgehammer [36] and CoqHammer [19], which automate HO reasoning via automatic FO provers. Early work on native HO proving, e.g., Andrews’s higher-order resolution [1] and Kohlhase’s higher-order tableaux [29], inspired modern HO provers such as LEO-II [11] and Leo-III [43], which implement variations of HO resolution, and Satallax [17], which is based on a HO tableau calculus guided by a SAT solver. Our approach, however, is conceptually closer to recent work by Blanchette et al. [9, 48] on gracefully generalizing the superposition calculus [2, 33] to support higher-order reasoning. As a first step, they targeted the \(\lambda \)-free fragment of higher-order logic, presenting a refutationally complete calculus [9] and an initial implementation as a prototype extension of the Zipperposition prover [18]. More recently, they integrated their approach into the state-of-the-art FO prover E [48], showing competitive results against state-of-the-art HO provers. Their next step, as is ours, is to extend their calculus to support \(\lambda \)-terms while preserving their completeness guarantees.

2 Preliminaries

Our monomorphic higher-order language \(\mathscr {L}\) is defined in terms of right-associative binary sort constructors \(\rightarrow \), \(\times \) and pairwise-disjoint countably infinite sets \(\mathcal {S}\), \(\mathcal {X}\), and \(\mathcal {F}\), of atomic sorts, variables, and function symbols, respectively. We use the notations \(\bar{a}_n\) and \(\bar{a}\) to denote the tuple \((a_1,\dots ,a_n)\) or the cross product \(a_1\times \dots \times a_n\), depending on context, with \({n\ge 0}\). We extend this notation to pairwise binary operations over tuples in the natural way. A sort \(\tau \) is either an element of \(\mathcal {S}\) or a functional sort \(\bar{\tau }_n\rightarrow \tau \) from sorts \(\bar{\tau }_n\) to sort \(\tau \). The elements of \(\mathcal {X}\) and \(\mathcal {F}\) are annotated with sorts, so that \({x:\tau }\) is a variable of sort \(\tau \) and \({\mathsf {f}:\bar{\tau }_n\rightarrow \tau }\) is an n-ary function symbol of sort \(\bar{\tau }_n\rightarrow \tau \). We identify function symbols of sort \(\bar{\tau }_0\rightarrow \tau \) with function symbols of sort \(\tau \), which we call constants when \(\tau \) is not a functional sort. Whenever convenient, we drop the sort annotations when referring to symbols.

The set of terms is defined inductively: every variable \(x:\tau \) is a term of sort \(\tau \). For variables \({\bar{x}_n:\bar{\tau }_n}\) and a term \({t:\tau }\) of sort \(\tau \), the expression \(\lambda \bar{x}_n.\,t\) is a term of sort \(\bar{\tau }_n\rightarrow \tau \), called a \(\lambda \)-abstraction, with bound variables \(\bar{x}_n\) and body t. A variable occurrence is free in a term if it is not bound by a \(\lambda \)-abstraction. For a term \({t:\bar{\tau }_n\rightarrow \tau }\) and terms \({t_1:\tau _1},\dots ,\,{t_m:\tau _m}\) with \(m \le n\), the expression \(t(\bar{t}_m)\) is a term, called an application of t, the head of the application, to the arguments \(\bar{t}_m\). The application is total and has sort \(\tau \) if \(m = n\); it is partial and has sort \(\tau _{m+1}\times \dots \times \tau _n\rightarrow \tau \) if \(m < n\). A \(\lambda \)-application is an application whose head is a \(\lambda \)-abstraction. The subterm relation is defined recursively: a term is a subterm of itself; if a term is an application, all subterms of its arguments are also its subterms. Note that this is not the standard definition of subterms in HOL, which also includes application heads and all partial applications. The set of all subterms in a term t is denoted by \(\mathbf {T}(t)\). We assume \(\mathcal {S}\) contains a sort o, the Boolean sort, and that \(\mathcal {F}\) contains Boolean constants \(\top ,\,\bot \), a Boolean unary function \(\lnot \), Boolean binary functions \(\wedge ,\,\vee \), and, for every sort \(\tau \), a family of equality symbols \({{\simeq } : \tau \times \tau \rightarrow o}\) and a family of symbols \({\mathsf {ite} : o\times \tau \times \tau \rightarrow \tau }\). These symbols are interpreted in the usual way as, respectively, logical constants, connectives, identity, and if-then-else (ITE). We refer to terms of sort o as formulas and to terms of sort \(\bar{\tau }\rightarrow o\) as predicates. An atom is a total predicate application. A literal or constraint is an atom or its negation. We assume the language contains the \(\forall \) and \(\exists \) binders over formulas, defined as usual, in addition to the \(\lambda \) binder. A formula or a term is ground if it is binder-free. We use the symbol \(=\) for syntactic equality on terms. We reserve the names \(\mathsf {a},\mathsf {b},\mathsf {c},\mathsf {f},\mathsf {g},\mathsf {h},\mathsf {p}\) for function symbols; w, x, y, z for variables in general; F, G for variables of functional sort; r, s, t, u for terms; and \(\varphi , \psi \) for formulas. The notation \(t[\bar{x}_n]\) stands for a term whose free variables are included in the tuple of distinct variables \(\bar{x}_n\); \(t[\bar{s}_n]\) is the term obtained from t by a simultaneous substitution of \(\bar{s}_n\) for \(\bar{x}_n\).

We assume \(\mathcal {F}\) contains a family of application symbols \({@:(\bar{\tau }_n\rightarrow \tau )\times \tau _1\rightarrow (\tau _2\times \dots \times \tau _n\rightarrow \tau )}\) for all \(n \ge 1\). We use it to model (curried) applications of terms of functional sort \(\bar{\tau }_n\rightarrow \tau \). For example, given a function symbol \({\mathsf {f}:\tau _1\times \tau _2\rightarrow \tau _3}\) and application symbols \({@:(\tau _1\times \tau _2\rightarrow \tau _3)\times \tau _1\rightarrow (\tau _2\rightarrow \tau _3)}\) and \(@:(\tau _2\rightarrow \tau _3)\times \tau _2\rightarrow \tau _3\), the terms \(@(\mathsf {f},\,t_1)\) and \(@(@(\mathsf {f},\,t_1),\,t_2)\) have, respectively, the same denotation as \(\lambda x_2:\tau _2.\, \mathsf {f}(t_1, x_2)\) and \(\mathsf {f}(t_1, t_2)\).

An applicative encoding is a well-known approach for performing HO reasoning using FO provers. This encoding converts every functional sort into an atomic sort, every n-ary symbol into a nullary symbol, and uses @ to encode applications. Thus, all applications, partial or not, become total, and quantification over functional variables becomes quantification over regular FO variables. We adopt Henkin semantics [10, 27] with extensionality and choice, as is standard in automatic HO theorem proving.

2.1 SMT Solvers and Quantified Reasoning

SMT solvers that process quantified formulas can be seen as containing three main components: a preprocessing module, a ground solver, and an instantiation module. Given an input formula \(\varphi \), the preprocessing module applies various transformations (e.g., Skolemization and clausification) to obtain another, equisatisfiable, formula \(\varphi '\). The ground solver operates on the formula \(\varphi '\), abstracting all of its atoms and quantified formulas and treating them as if they were propositional variables. It provides an assignment \(E\cup Q\), where E is a set of ground literals and Q is a set of quantified formulas appearing in \(\varphi '\), such that \(E\cup Q\) propositionally entails \(\varphi '\). The ground solver then determines the satisfiability of E according to a decision procedure for a combination of background theories. If E is satisfiable, the instantiation module generates new instances: ground formulas of the form \(\lnot (\forall \bar{x}.\,\psi ) \vee \psi \sigma \), where \(\forall \bar{x}.\,\psi \) is a quantified formula in Q and \(\sigma \) is a substitution from the variables in \(\bar{x}\) to ground terms. These instances are, after preprocessing, added conjunctively to the input of the ground solver, which proceeds to derive a new assignment \(E'\cup Q'\), if possible. This interplay may terminate either if \(\varphi '\) is proven unsatisfiable or if a model is found for an assignment \(E\cup Q\) that is also a model of \(\varphi '\).
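To make this interplay concrete, the following Python sketch outlines the solving loop just described. It is an illustration only: GroundSolver, preprocess, and instantiate are hypothetical stand-ins for the corresponding solver modules, not the actual API of any SMT solver.

```python
# Minimal sketch of the ground-solver/instantiation interplay described above.
# GroundSolver, preprocess, and instantiate are hypothetical stand-ins.

def solve(phi):
    phi = preprocess(phi)                      # Skolemization, clausification, ...
    ground = GroundSolver(phi)
    while True:
        assignment = ground.get_assignment()   # E (ground literals), Q (quantified)
        if assignment is None:
            return "unsat"                     # abstraction is propositionally unsat
        E, Q = assignment
        if not ground.theory_consistent(E):    # decision procedure for E
            ground.block(E)                    # refine the propositional abstraction
            continue
        # generate instances: formulas of the form not(forall x. psi) or psi*sigma
        instances = [inst for q in Q for inst in instantiate(q, E)]
        if not instances:
            return "sat"                       # E and Q extend to a model of phi
        for inst in instances:                 # conjoin the instances and iterate
            ground.add(preprocess(inst))
```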

Extending SMT solvers to HOL can be achieved by extending these three components so that: (1) the preprocessing module eliminates \(\lambda \)-abstractions; (2) the ground decision procedure supports a ground extensional logic with partial applications, which we denote QF_HOSMT; and (3) the instantiation module instantiates variables of functional type and takes into account partial applications and equations between functions. We can perform each of these tasks pragmatically without heavily modifying the solver, which is useful when extending highly optimized state-of-the-art SMT solvers (Sect. 3). Alternatively, we can perform these extensions in a more principled way by redesigning the solver, which better suits lightweight solvers (Sect. 4).

3 A Pragmatic Extension for HOSMT

We pragmatically extend the ground SMT solver to QF_HOSMT by removing \(\lambda \)-expressions (Sect. 3.1), checking ground satisfiability (Sect. 3.2), and generating models (Sect. 3.3). Extensions to the instantiation module are discussed in Sect. 3.4.

3.1 Eliminating \(\lambda \)-Abstractions and Partial Applications of Theory Symbols

To ensure that the formulas that reach the core solving algorithm are \(\lambda \)-free, a preprocessing pass is used to first eliminate \(\lambda \)-applications and then eliminate any remaining \(\lambda \)-abstractions. The former are eliminated via \(\beta \)-reduction, with each application \((\lambda \bar{x}.\, t[\bar{x}])\> \bar{u}\) replaced by the equivalent term \(t[\bar{u}]\). The substitution renames bound variables in t as needed to avoid capture.
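As an illustration, here is a self-contained Python sketch of this pass on a toy term representation of our own devising (not the solver's actual data structures); for simplicity it only reduces total \(\lambda \)-applications.

```python
# Self-contained sketch of the beta-reduction pass, on a toy term
# representation (an assumption of this illustration): ("const", name) for
# function symbols, ("var", name) for variables, ("app", head, [args]) for
# applications, and ("lam", [params], body) for lambda-abstractions.
import itertools

fresh = (f"_v{i}" for i in itertools.count())      # supply of fresh variable names

def free_vars(t):
    tag = t[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]).union(*[free_vars(a) for a in t[2]])
    return free_vars(t[2]) - set(t[1])             # lam: body minus bound params

def subst(t, sigma):
    """Capture-avoiding simultaneous substitution of terms for variables."""
    tag = t[0]
    if tag == "const":
        return t
    if tag == "var":
        return sigma.get(t[1], t)
    if tag == "app":
        return ("app", subst(t[1], sigma), [subst(a, sigma) for a in t[2]])
    params, body = t[1], t[2]
    sigma = {x: s for x, s in sigma.items() if x not in params}
    clash = set().union(*[free_vars(s) for s in sigma.values()])
    renaming = {p: ("var", next(fresh)) for p in params if p in clash}
    new_params = [renaming.get(p, ("var", p))[1] for p in params]
    return ("lam", new_params, subst(subst(body, renaming), sigma))

def beta_reduce(t):
    """Replace every total lambda-application (lam xs. t) us by t[us]."""
    if t[0] == "app" and t[1][0] == "lam" and len(t[1][1]) == len(t[2]):
        params, body = t[1][1], t[1][2]
        args = [beta_reduce(a) for a in t[2]]
        return beta_reduce(subst(body, dict(zip(params, args))))
    if t[0] == "app":
        return ("app", beta_reduce(t[1]), [beta_reduce(a) for a in t[2]])
    if t[0] == "lam":
        return ("lam", t[1], beta_reduce(t[2]))
    return t

# (lam x. x) a  ~>  a
assert beta_reduce(("app", ("lam", ["x"], ("var", "x")), [("const", "a")])) == ("const", "a")
```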

Two main approaches exist for eliminating (non-applied) \(\lambda \)-abstractions: reduction to combinators [35] and \(\lambda \)-lifting [28]. Combinators allow \(\lambda \)-terms to be synthesized during solving without the need for HO unification. This translation, however, introduces a large number of quantifiers and often leads to performance loss [13, Sect. 6.4.2]. We instead apply \(\lambda \)-lifting in our pragmatic extension.

In \(\lambda \)-lifting, each \(\lambda \)-abstraction is replaced by a fresh function symbol, and a quantified formula is introduced to define the symbol in terms of the original expression. Note that this is similar to the typical approach used for eliminating ITE expressions in SMT solvers. The new function takes as arguments the variables bound by the respective \(\lambda \)-abstraction and the free variables occurring in its body. More precisely, \(\lambda \)-abstractions of the form \(\lambda \bar{x}_n.\,t[\bar{x}_n,\,\bar{y}_m]\) of type \(\bar{\tau }_n\rightarrow \tau \) with \({\bar{y}_m:\bar{\upsilon }_m}\) occurring in a formula \(\varphi \) are lifted to (possibly partial) applications \(\mathsf {f}(\bar{y}_m)\), where \(\mathsf {f}\) is a fresh function symbol of type \(\bar{\upsilon }_m\times \bar{\tau }_n\rightarrow \tau \). Moreover, the formula \(\forall \bar{y}_m\bar{x}_n.\,\mathsf {f}(\bar{y}_m,\, \bar{x}_n)\simeq t[\bar{x}_n,\,\bar{y}_m]\) is added conjunctively to \(\varphi \). To minimize the number of new functions and quantified formulas introduced, eliminated expressions are cached so that the same definition can be reused.
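Continuing the previous sketch (and reusing its free_vars and itertools), \(\lambda \)-lifting with caching can be illustrated as follows; as a simplification, the cache here is keyed syntactically rather than modulo \(\alpha \)-equivalence.

```python
# Sketch of lambda-lifting on the same toy representation (free_vars and
# itertools come from the previous sketch). `defs` accumulates the quantified
# definitions: forall ys xs. f(ys, xs) = body.
def lam_lift(t, defs, cache, counter=itertools.count(1)):
    tag = t[0]
    if tag in ("const", "var"):
        return t
    if tag == "app":
        return ("app", lam_lift(t[1], defs, cache),
                [lam_lift(a, defs, cache) for a in t[2]])
    params, body = t[1], lam_lift(t[2], defs, cache)     # lift inner lambdas first
    key = (tuple(params), repr(body))                    # cache not modulo alpha
    if key not in cache:
        f = ("const", f"_lift{next(counter)}")
        ys = sorted(free_vars(("lam", params, body)))    # free variables of the lambda
        defs.append(("forall", ys + list(params),
                     ("eq", ("app", f, [("var", v) for v in ys + list(params)]), body)))
        cache[key] = (f, ys)
    f, ys = cache[key]
    # the lambda is replaced by the (possibly partial) application f(ys)
    return ("app", f, [("var", y) for y in ys]) if ys else f

defs, cache = [], {}
t = ("lam", ["x"], ("app", ("const", "g"), [("var", "x"), ("var", "y")]))
print(lam_lift(t, defs, cache))   # ('app', ('const', '_lift1'), [('var', 'y')])
```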

In the presence of a background theory T, the norm in SMT, a prior preprocessing step is also needed to make all applications of theory, or interpreted, symbols total: each term of the form \(\mathsf {h}(\bar{t}_m)\), where \({\mathsf {h}:\bar{\tau }_n\rightarrow \tau }\) is a symbol of T and \(m < n\), is converted to \(\lambda \bar{x}_{n-m}.\,\mathsf {h}(\bar{t}_m,\,\bar{x}_{n-m})\), which is then \(\lambda \)-lifted as above to an uninterpreted symbol \(\mathsf {f}\), defined by the quantified formula \(\forall \bar{y}\,\forall \bar{x}_{n-m}.\, \mathsf {f}(\bar{y},\,\bar{x}_{n-m})\simeq \mathsf {h}(\bar{t}_m,\,\bar{x}_{n-m})\), with \(\bar{y}\) collecting the free variables of \(\bar{t}_m\).
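On the same toy representation, this totalization pre-pass is a two-step combination of \(\lambda \)-abstraction over the missing arguments followed by the lifting above (a sketch only; the symbol name and arity are passed explicitly).

```python
# Sketch of the totalization pre-pass on the same toy terms: a partial
# application of a theory symbol h of arity n is expanded into a
# lambda-abstraction over the missing arguments and lifted with lam_lift.
def totalize(h, args, n, defs, cache):
    xs = [f"_x{i}" for i in range(n - len(args))]        # the missing arguments
    lam = ("lam", xs, ("app", ("const", h), args + [("var", x) for x in xs]))
    return lam_lift(lam, defs, cache)

# the partial application (+ 1) becomes a fresh symbol f, with
# forall x. f(x) = +(1, x) recorded in defs
print(totalize("+", [("const", "1")], 2, defs, cache))
```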

We stress that careful engineering is required to perform \(\lambda \)-lifting correctly in an SMT solver not originally designed for it. For instance, using the existing machinery for ITE removal may be insufficient, since this may not properly handle instances occurring inside binders or as the head of applications.

3.2 Extending the Ground Solver to QF_HOSMT

Since we operate after preprocessing in a \(\lambda \)-free setting in which only uninterpreted functions may occur partially applied, lifting the ground solver to QF_HOSMT amounts to extending the solver for ground literals in the theory of Equality and Uninterpreted Functions (EUF) to handle partial applications and extensionality.

The decision procedure for ground EUF adopted by SMT solvers is based on classical congruence closure algorithms [24, 31]. While the procedure is easily extensible to HOL (with partial applications but no \(\lambda \)-abstractions) via a uniform applicative encoding [32], many SMT solvers require that function symbols occurring in (FO) terms be fully applied. Instead of redesigning the solver to accommodate partial applications, we apply a lazy applicative encoding where only partial applications are converted.

Concretely, during term construction, all partial applications are converted to total applications by means of the binary symbol @, while fully applied terms are kept in their regular representation. Determining the satisfiability of a set of EUF constraints E containing terms in both representations is done in two phases: if E is determined to be satisfiable by the regular first-order procedure, we introduce equalities between regular terms (i.e., fully applied terms without the @ symbol) and their applicative counterpart and recheck the satisfiability of the resulting set of constraints. However, we only introduce these equalities for regular terms which interact with partially applied ones. This interaction is characterized by function symbols appearing as members of congruence classes in the E-graph, the congruence closure of E built by the EUF decision procedure. A function symbol occurs in an equivalence class if it is an argument of an @ symbol or if it appears in an equality between function symbols. The equalities between regular terms and their applicative encodings are kept internal to the E-graph, therefore not affecting other parts of the ground decision procedure.

Example 1

Given \({\mathsf {f}:\tau \times \tau \rightarrow \tau }\), \({\mathsf {g},\mathsf {h}:\tau \rightarrow \tau }\) and \({\mathsf {a}:\tau }\), consider the set of constraints \(E=\{@(\mathsf {f},\mathsf {a})\simeq \mathsf {g},\ \mathsf {f}(\mathsf {a},\mathsf {a})\not \simeq \mathsf {g}(\mathsf {a}),\ \mathsf {g}(\mathsf {a})\simeq \mathsf {h}(\mathsf {a})\}\). We have that E is initially found to be satisfiable. However, since \(\mathsf {f}\) and \(\mathsf {g}\) occur partially applied, we augment the set of constraints with a correspondence between the HO and FO applications of \(\mathsf {f},\mathsf {g}\):

$$ E'=E\cup \{@( @(\mathsf {f},\mathsf {a}), \mathsf {a})\simeq \mathsf {f}(\mathsf {a},\mathsf {a}),\, @(\mathsf {g},\mathsf {a})\simeq \mathsf {g}(\mathsf {a})\} $$

When determining the satisfiability of \(E'\), the equality \(@(@(\mathsf {f},\mathsf {a}),\mathsf {a})\simeq @(\mathsf {g},\mathsf {a})\) will be derived by congruence and hence, \(\mathsf {f}(\mathsf {a},\mathsf {a}) \simeq \mathsf {g}(\mathsf {a})\) will be derived by transitivity, leading to a conflict. Notice that we do not require equalities between fully applied terms whose functions do not appear in the E-graph and their equivalent in the applicative encoding. In particular, the equality \(\mathsf {h}(\mathsf {a}) \simeq @(\mathsf {h},\mathsf {a})\) is not introduced in this example.\(\bullet \)
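Although this procedure lives inside the solver's EUF engine, its effect can be replayed in any off-the-shelf EUF solver. The sketch below does so for Example 1 using Z3's Python API (assuming the z3-solver package is installed); the at1/at2 functions and the *_fo names are our ad hoc encoding of @ and of the regular applications, not part of our implementation.

```python
# The constraint sets of Example 1 replayed in an off-the-shelf EUF solver
# (Z3's Python API). This illustrates the effect of the App-encode
# equalities, not our actual implementation.
from z3 import DeclareSort, Function, Const, Solver

tau  = DeclareSort("tau")
fun1 = DeclareSort("fun1")                 # stand-in for the sort tau -> tau
fun2 = DeclareSort("fun2")                 # stand-in for the sort tau x tau -> tau
f, g, a = Const("f", fun2), Const("g", fun1), Const("a", tau)
f_fo = Function("f_fo", tau, tau, tau)     # regular, fully applied f
g_fo = Function("g_fo", tau, tau)          # regular, fully applied g
h_fo = Function("h_fo", tau, tau)          # h only occurs fully applied
at2  = Function("at2", fun2, tau, fun1)    # @ : (tau x tau -> tau) x tau -> (tau -> tau)
at1  = Function("at1", fun1, tau, tau)     # @ : (tau -> tau) x tau -> tau

s = Solver()
s.add(at2(f, a) == g, f_fo(a, a) != g_fo(a), g_fo(a) == h_fo(a))
print(s.check())                           # sat: E alone is satisfiable

# App-encode: equate regular applications with their applicative encoding
s.add(at1(at2(f, a), a) == f_fo(a, a), at1(g, a) == g_fo(a))
print(s.check())                           # unsat: the conflict of Example 1
```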

Fig. 1. Derivation rules for checking satisfiability of QF_HOSMT constraints in EUF.

We formalize the above procedure via the calculus in Fig. 1. The derivation rules operate on a current set E of constraints. A derivation rule can be applied if its premises are met. A rule’s conclusion either adds an equality literal to E or replaces E by \(\bot \) to indicate unsatisfiability. A rule application is redundant if its conclusion leaves E unchanged. A constraint set is saturated if it admits only redundant rule applications.

Rules Refl, Sym, Trans, Cong and Conflict are standard for EUF decision procedures based on congruence closure, i.e., the smallest superset of a set of equations that is closed under entailment in the theory of equality. The rule App-encode equates a full application to its applicative encoding equivalent, and it is applied only to applications of functions which occur as subterms in E. As mentioned above, this can only be the case if the function itself appears as an argument of an application, which happens when it is partially applied (as argument of @ or \({\simeq }\)).

Rule Extensionality is similar to how extensionality is handled in decision procedures for extensional arrays [21, 44]. If two non-nullary functions are disequal in E, then a witness of their disequality is introduced. The extensionality property is characterized by the axiom \((\forall \bar{x}_n.\ \mathsf {f}(\bar{x}_n)\simeq \mathsf {g}(\bar{x}_n))\leftrightarrow \mathsf {f}\simeq \mathsf {g}\), for all functions \(\mathsf {f}\) and \(\mathsf {g}\) of the same type. The rule ensures the left-to-right direction of the axiom (the opposite one is ensured by App-encode together with the congruence closure rules). To simplify the presentation we assume that, for every term \(t:\bar{\tau }_n\rightarrow \tau \) occurring in E, there is a fresh symbol \(\mathsf {f}':\bar{\tau }_n\rightarrow \tau \) such that \(\mathsf {f}'\simeq t\) is in E.

Example 2

Consider the function symbols \({\mathsf {f},\mathsf {g}:\tau \rightarrow \tau }\), \({\mathsf {a}:\tau }\), and the set of constraints \(E=\{\mathsf {f}\simeq \mathsf {g}, \mathsf {f}(\mathsf {a})\not \simeq \mathsf {g}(\mathsf {a})\}\). The constraints are initially satisfiable with respect to the congruence closure rules. However, since \(\mathsf {f},\mathsf {g}\in \mathbf {T}(E)\), the rule App-encode is applied twice to derive \(\mathsf {f}(\mathsf {a})\simeq @(\mathsf {f},\,\mathsf {a})\) and \(\mathsf {g}(\mathsf {a})\simeq @(\mathsf {g},\,\mathsf {a})\). Then, via Cong, from \(\mathsf {f}\simeq \mathsf {g}\) we infer \(@(\mathsf {f},\,\mathsf {a})\simeq @(\mathsf {g},\,\mathsf {a})\), which leads to a conflict via transitivity.\(\bullet \)
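Example 2 can be replayed the same way, reusing tau, fun1, a, at1, and the imports from the previous sketch:

```python
# Example 2 replayed in the same setting (names come from the previous sketch).
f1, g1 = Const("f1", fun1), Const("g1", fun1)
f1_fo, g1_fo = Function("f1_fo", tau, tau), Function("g1_fo", tau, tau)

s2 = Solver()
s2.add(f1 == g1, f1_fo(a) != g1_fo(a))
print(s2.check())                          # sat before App-encode
s2.add(f1_fo(a) == at1(f1, a), g1_fo(a) == at1(g1, a))
print(s2.check())                          # unsat via Cong and Trans
```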

Decision Procedure. Any derivation strategy for the calculus that does not stop until it saturates or generates \(\bot \) yields a decision procedure for the satisfiability of QF_HOSMT constraints in the EUF theory, according to the following results for the calculus.

Proposition 1

(Termination). Every sequence of non-redundant rule applications is finite.

Proposition 2

(Refutation Soundness). A constraint set is unsatisfiable if \(\bot \) is derivable from it.

Proposition 3

(Solution Soundness). Every saturated constraint set is satisfiable.

Even though we could apply the rules in any order, for better performance we only apply App-encode and Extensionality once the other rules have only redundant applications. Moreover, App-encode has precedence over Extensionality.

3.3 Model Generation for Ground Formulas

When our decision procedure for QF_HOSMT saturates, it can produce a first-order model M as a witness for the satisfiability of its input. Typically, the models generated by SMT solvers for theories in first-order logic map uninterpreted functions \(\mathsf {f}:\bar{\tau }_n\rightarrow \tau \) to functions, denoted \(M(\mathsf {f})\), of the form

$$ \lambda \bar{x}_n.\,\mathsf {ite}(x_1 \simeq t^1_1\wedge \dots \wedge x_n \simeq t^1_n,\, s_1,\,\ldots ,\, \mathsf {ite}(x_1 \simeq t^{m-1}_1\wedge \dots \wedge x_n \simeq t^{m-1}_n,\, s_{m-1},\, s_m) \ldots ) $$

in which every entry but the last corresponds to an application \(\mathsf {f}(t^i_1,\dots ,t^i_n)\), modulo congruence, occurring in the problem, with value \(s_i\). In other words, functions are interpreted in models M as almost constant functions.

In the presence of partial applications, this scheme can sometimes lead to functions with exponentially many entries. For example, consider the satisfiable formula

$$ \begin{array}{l} \mathsf {f}_1(\mathsf {a})\simeq \mathsf {f}_1(\mathsf {b})\wedge \mathsf {f}_1(\mathsf {b})\simeq \mathsf {f}_2\wedge \mathsf {f}_2(\mathsf {a})\simeq \mathsf {f}_2(\mathsf {b})\wedge \mathsf {f}_2(\mathsf {b})\simeq \mathsf {f}_3\wedge \mathsf {f}_3(\mathsf {a})\simeq \mathsf {f}_3(\mathsf {b})\wedge \mathsf {f}_3(\mathsf {b})\simeq \mathsf {c} \end{array} $$

in which \(\mathsf {f}_1:\tau \times \tau \times \tau \rightarrow \tau \), \(\mathsf {f}_2:\tau \times \tau \rightarrow \tau \), \(\mathsf {f}_3:\tau \rightarrow \tau \), and \(\mathsf {a}, \mathsf {b}, \mathsf {c}:\tau \). To produce the model values of \(\mathsf {f}_1\) as a list of total applications with three arguments into an element of the interpretation of \(\tau \), we would need to account for 8 cases, i.e., we would require 8 \(\mathsf {ite}\) entries to indicate that \(\mathsf {f}_1(x,y,z) \simeq \mathsf {c}\) for all inputs with \(x,y,z \in \{\mathsf {a}, \mathsf {b}\}\). The number of entries in the model is exponential in the “depth” of the chain of functions that each partial application is equal to, which can make model building infeasible even when just a few functions are chained as in the above example.

To avoid such exponential behavior, model building assigns values to functions in terms of the other functions that their partial applications are equated to. In the above example, \(\mathsf {f}_1\) would have only two model values, depending on whether its first argument is \(\mathsf {a}\) or \(\mathsf {b}\), each deferring to the model value of \(\mathsf {f}_2\) applied to the two remaining arguments. In other words, we construct \(M(\mathsf {f}_1)\) as the term:

$$ \lambda xyz.\, \mathsf {ite}(x \simeq \mathsf {a},\, M(\mathsf {f}_2)(y,\, z), \mathsf {ite}(x \simeq \mathsf {b},\, M(\mathsf {f}_2)(y,\, z), \_ )) $$

where \(M(\mathsf {f}_2)\) is the model value of \(\mathsf {f}_2\) and \(\_\) is an arbitrary value. The model value of \(\mathsf {f}_2\) is analogously built in terms of the model value of \(\mathsf {f}_3\). This guarantees a model construction that is polynomial in the number of constraints in the problem, even in the presence of partial applications.
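The sketch below illustrates this chained construction in Python for the example above, under the hypothetical interpretation where the domain of \(\tau \) is \(\{\mathsf {a},\mathsf {b}\}\): each model value defers to the next function in the chain instead of enumerating all argument tuples.

```python
# Chained model construction for the formula above; the domain of tau is
# {"a", "b"} and "c" names the value of the constant c (an assumed
# interpretation, for illustration only).
def M_f3(x):
    return "c"                        # f3(a) = f3(b) = c: constant on the domain

def M_f2(x, y):
    if x in ("a", "b"):               # f2(a) = f2(b) = f3: defer to M_f3
        return M_f3(y)
    return "c"                        # arbitrary default entry

def M_f1(x, y, z):
    if x in ("a", "b"):               # f1(a) = f1(b) = f2: defer to M_f2
        return M_f2(y, z)
    return "c"                        # arbitrary default entry

# two entries per function instead of 2**3 ite cases for f1:
assert all(M_f1(x, y, z) == "c" for x in "ab" for y in "ab" for z in "ab")
```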

Extensionality and Finite Sorts. Model construction assigns different values to terms not asserted equal. Therefore, if non-nullary functions \(\mathsf {f},\mathsf {g}:\bar{\tau }_n\rightarrow \tau \) occur as terms in different congruence classes but are not asserted disequal, we ensure they are assigned different model values by introducing disequalities of the form \(\mathsf {f}(\bar{\mathsf {sk}}_n)\not \simeq \mathsf {g}(\bar{\mathsf {sk}}_n)\) for fresh \(\bar{\mathsf {sk}}_n\). This is necessary because model values for functions are built based on their applications occurring in the constraint set. However, such disequalities are guaranteed to be satisfiable only if \(\bar{\tau }_n,\tau \) are infinite sorts.

Example 3

Let E be a saturated set of constraints s.t. \(\mathsf {p}_1,\mathsf {p}_2,\mathsf {p}_3:\tau \rightarrow o\in \mathbf {T}(E)\) and no equalities or disequalities between them occur in E. In the congruence closure of E the functions \(\mathsf {p}_1,\mathsf {p}_2,\mathsf {p}_3\) each occur in a different congruence class but are not asserted disequal, so a naive model construction would, in order to build their model values, introduce disequalities \(\mathsf {p}_1(\mathsf {sk}_1) \not \simeq \mathsf {p}_2(\mathsf {sk}_1)\), \(\mathsf {p}_1(\mathsf {sk}_2) \not \simeq \mathsf {p}_3(\mathsf {sk}_2)\), and \(\mathsf {p}_2(\mathsf {sk}_3) \not \simeq \mathsf {p}_3(\mathsf {sk}_3)\), for fresh \(\mathsf {sk}_1,\mathsf {sk}_2,\mathsf {sk}_3:\tau \). However, if \(\tau \) has cardinality one, these disequalities make E unsatisfiable, since \(\mathsf {sk}_1,\mathsf {sk}_2,\mathsf {sk}_3\) must be equal and o has cardinality 2.\(\bullet \)

To prevent this issue, whenever the set of constraints E is saturated, we introduce, for every pair of functions \(\mathsf {f},\mathsf {g}:\bar{\tau }_n\rightarrow \tau \in \mathbf {T}(E)\) s.t. \(n>0\) and neither \(\mathsf {f}\simeq \mathsf {g}\) nor \(\mathsf {f}\not \simeq \mathsf {g}\) occurs in E, the splitting lemma \(\mathsf {f}\simeq \mathsf {g}\vee \mathsf {f}\not \simeq \mathsf {g}\). In the above example this would amount to adding the lemmas \(\mathsf {p}_1 \simeq \mathsf {p}_2 \vee \mathsf {p}_1 \not \simeq \mathsf {p}_2,\, \mathsf {p}_1 \simeq \mathsf {p}_3 \vee \mathsf {p}_1 \not \simeq \mathsf {p}_3,\text { and } \mathsf {p}_2 \simeq \mathsf {p}_3 \vee \mathsf {p}_2 \not \simeq \mathsf {p}_3\), thus ensuring that the decision procedure detects the inconsistency before saturation.
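Generating these splitting lemmas is straightforward, as the following sketch shows; funs_by_sort is a hypothetical view of the term index, and a real implementation would restrict the pairs as described above.

```python
# Naive generation of the splitting lemmas: one lemma per pair of same-sort
# non-nullary functions in T(E). funs_by_sort is a hypothetical view of the
# term index; a real implementation would skip pairs already (dis)equal in E.
from itertools import combinations

def splitting_lemmas(funs_by_sort):
    return [("or", ("eq", f, g), ("not", ("eq", f, g)))
            for funs in funs_by_sort.values()
            for f, g in combinations(funs, 2)]

print(splitting_lemmas({("tau", "o"): ["p1", "p2", "p3"]}))   # three lemmas, as in Example 3
```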

3.4 Extending the Quantifier Instantiation Module to HOSMT

The main quantifier instantiation techniques in SMT solving are trigger-based [22], conflict-based [4, 38], model-based [26, 40], and enumerative [37]. Lifting any of them to HOSMT presents its own challenges. We focus here on extending the E-matching [20] algorithm, the keystone of trigger-based instantiation, the most commonly used technique in SMT solvers. In this technique, instantiations are chosen for quantified formulas \(\varphi \) based on triggers. A trigger is a term (or set of terms) containing the free variables occurring in \(\varphi \). Matching a trigger term against ground terms in the current set of assertions E results in a substitution that is used to instantiate \(\varphi \).

The presence of higher-order constraints poses several challenges for E-matching. First, notice that the @ symbol is an overloaded operator. Applications of this symbol can be selected as terms that appear in triggers. Special care must be taken so that applications of @ are not matched with ground applications of @ whose arguments have different types. Second, functions can be equated in higher-order logic. As a consequence, a match may involve a trigger term and a ground term with different head symbols. Third, since we use a lazy applicative encoding, our ground set of terms may contain a mixture of partially and fully applied function applications. Thus, our indexing techniques must be robust to handle combinations of the two. The following example demonstrates the last two challenges.

Example 4

Consider E with the equality \(@(\mathsf {f},\mathsf {a})\simeq \mathsf {g}\) and the term \(\mathsf {f}(\mathsf {a},\mathsf {b})\), where \(\mathsf {f}:\tau \times \tau \rightarrow \tau \) and \(\mathsf {g}:\tau \rightarrow \tau \). Notice that \(\mathsf {g}(x)\) is equivalent modulo E to the term \(\mathsf {f}(\mathsf {a},\mathsf {b})\) under the substitution \(x \mapsto \mathsf {b}\). Such a match is found by indexing all terms that are applications of either \(@(\mathsf {f},\mathsf {a})\) or \(\mathsf {g}\) in a common term index. This ensures that, when matching \(\mathsf {g}(x)\), the term \(\mathsf {f}(\mathsf {a},\mathsf {b})\), whose applicative counterpart is \(@(@(\mathsf {f},\mathsf {a}), \mathsf {b})\), is considered.\(\bullet \)

We extended the regular first-order E-matching algorithm of CVC4 as described in this section. Extensions to the other instantiation techniques of CVC4, such as model-based quantifier instantiation, are left as future work.

Extending Expressivity via Axioms. Even though not synthesizing \(\lambda \)-abstractions prevents us from fully lifting the above instantiation techniques to HOL, we remark that, as we see in Sect. 5, this pragmatic extension can often prove HO theorems, sometimes even at higher rates than full-fledged HO provers. Success rates can be further improved by using well-chosen axioms to prove problems that otherwise could not be proved without synthesizing \(\lambda \)-abstractions.

Example 5

Consider the ground formula \(\varphi =\mathsf {a}\not \simeq \mathsf {b}\) with \(\mathsf {a}\), \(\mathsf {b}\) of sort \(\tau \) and the quantified formula \(\psi =\forall {F,G : \tau \rightarrow \tau }.\, F\simeq G\). Intuitively \(\psi \) states that all functions of sort \(\tau \rightarrow \tau \) are equal. However, this is inconsistent with \(\varphi \), which forces \(\tau \) to contain at least two elements and therefore \(\tau \rightarrow \tau \) to contain at least four functions. For a prover to detect this inconsistency it must apply an instantiation like \(\{F \mapsto (\lambda w.\, \mathsf {a}),\, G \mapsto (\lambda w.\, \mathsf {b})\}\) to \(\psi \), which would need HO unification. However, adding the axiom

$$\begin{aligned} \forall {F : \tau \rightarrow \tau }.\, \forall x,y:\tau .\, \exists {G:\tau \rightarrow \tau }.\,\forall z:\tau .\, G(z)\simeq \mathsf {ite}(z\simeq x,\,y,\,F(z)) \end{aligned}$$
(SAX)

makes the problem provable without the need to synthesize \(\lambda \)-abstractions.\(\bullet \)

We denote the above axiom as the store axiom (SAX) because it simulates how arrays are updated via the store operation. As we note in Sect. 5, introducing this axiom for all functional sorts occurring in the problem often allows our pragmatically extended solver to prove problems it would not be able to prove otherwise. Intuitively, the reason is that instances can be generated not only from terms in the original problem, but also from the larger set of functions representable in the formula signature.

4 Redesigning a Solver for HOSMT

In the previous section we discussed how to address the challenges of HO reasoning in SMT while minimally changing the SMT solver. Alternatively, we can redesign the solver to support HO features directly, which requires rethinking its core data structures and algorithms. We propose one such redesign below. We again assume that the solver operates on formulas without \(\lambda \)-abstractions and without partial applications of theory symbols, which can be achieved via preprocessing (Sect. 3.1).

4.1 Redesigning the Core Ground Solver for HOSMT

Efficient implementations of the congruence closure (CC) procedure for EUF reasoning operate on Union-Find data structures and have asymptotic time complexity \(\mathcal {O}(n\,\mathrm {log}\,n)\). To accommodate partial applications, we propose a simpler algorithm which operates on an E-graph where nodes are terms and edges are relations (equality, congruence, disequality) between them. An equivalence class is a connected component without disequality edges. All operations on the graph (incremental addition of new constraints, backtracking, conflict analysis, proof production) are implemented straightforwardly. This simpler implementation comes at the cost of a higher worst-case time complexity (the CC algorithm becomes quadratic) but integrates better with various other features such as term addition, support for injective functions, rewriting, or even computation, in particular for \(\beta \)- and \(\eta \)-conversion, which can now be done during solving rather than as preprocessing. In the redesigned approach, the solver keeps two term representations, a curried one and a regular one. In the regular one, partial and total applications are distinguished by type information. The curried representation is used only by the congruence closure algorithm. It is integrated with the rest of the solver via an interface with translation functions \(\mathtt {curry}\) and \(\mathtt {uncurry}\) between the two representations. For conciseness, instead of writing \(@(@(\mathsf {f},\,t_1),\,t_2)\) below, we use the curried notation \(((\mathsf {f}\; t_1)\; t_2)\), omitting parentheses when unambiguous.

Example 6

Given \({\mathsf {f}:\tau \times \tau \rightarrow \tau }\), \({\mathsf {g},\mathsf {h}:\tau \rightarrow \tau }\) and \({\mathsf {a}:\tau }\), consider the constraints \(\{ \mathsf {f}(\mathsf {a}) \simeq \mathsf {g},\ \mathsf {f}(\mathsf {a},\mathsf {a})\not \simeq \mathsf {g}(\mathsf {a}),\ \mathsf {g}(\mathsf {a})\simeq \mathsf {h}(\mathsf {a})\}\). The congruence closure module will operate on \(\{ \mathsf {f}\, \mathsf {a}\simeq \mathsf {g},\ \mathsf {f}\, \mathsf {a}\,\mathsf {a} \not \simeq \mathsf {g}\, \mathsf {a},\ \mathsf {g}\, \mathsf {a}\simeq \mathsf {h}\, \mathsf {a}\}\), thanks to the \(\mathtt {curry}\) translation.\(\bullet \)
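The translation functions themselves are simple folds over the term structure. The following sketch, reusing the toy term representation from the sketches of Sect. 3, shows one possible implementation.

```python
# Sketch of the curry/uncurry interface, on the toy terms of the Sect. 3
# sketches; "@" nodes are binary, left-associated applications.
def curry(t):
    if t[0] != "app":
        return t
    acc = curry(t[1])
    for arg in t[2]:
        acc = ("@", acc, curry(arg))
    return acc

def uncurry(t):
    if t[0] != "@":
        return t
    args = []
    while t[0] == "@":
        args.append(uncurry(t[2]))
        t = t[1]
    return ("app", uncurry(t), list(reversed(args)))

t = ("app", ("const", "f"), [("const", "a"), ("const", "a")])
assert curry(t) == ("@", ("@", ("const", "f"), ("const", "a")), ("const", "a"))
assert uncurry(curry(t)) == t      # round trip: f(a, a)
```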

SMT solvers generally perform theory combination via equalities over terms shared between different theories. Given the different term representations kept between the CC procedure and the rest of the solver, to ensure that theory combination is done properly, the redesigned core ground solver keeps track of terms shared with other theory solvers. Whenever an equality is inferred on a term whose translation is shared with another theory, a shared equality is sent out in terms of the translation.

Example 7

Consider the function symbols \(\mathsf {f}:\mathsf {Int}\rightarrow \mathsf {Int}\), \({\mathsf {p}:\mathsf {Int}\rightarrow o}\), \(\mathsf {a},\mathsf {b},\mathsf {c_1},\mathsf {c_2},\mathsf {c_3},\mathsf {c_4}:\mathsf {Int}\), the set of arithmetic constraints \(\{\mathsf {a} \le \mathsf {b},\, \mathsf {b} \le \mathsf {a},\, \mathsf {p}(\mathsf {f}(\mathsf {a}) - \mathsf {f}(\mathsf {b})),\, \lnot \mathsf {p}(0),\, \mathsf {c_1} \simeq \mathsf {c_3} - \mathsf {c_4},\, \mathsf {c_2} \simeq 0\}\), and the set of curried equality constraints \(E=\{\mathsf {p}\,\mathsf {c_1},\, \lnot (\mathsf {p}\,\mathsf {c_2}),\, \mathsf {c_3} \simeq \mathsf {f}\,\mathsf {a},\, \mathsf {c_4} \simeq \mathsf {f}\,\mathsf {b}\}\). The equalities \(\mathsf {c_3} \simeq \mathsf {f}\,\mathsf {a}\) and \(\mathsf {c_4} \simeq \mathsf {f}\,\mathsf {b}\) keep track of the fact that \(\mathsf {f}\,\mathsf {a}\) and \(\mathsf {f}\,\mathsf {b}\) are shared. The arithmetic module deduces \(\mathsf {a} \simeq \mathsf {b}\), which is added to E, yielding \(E'= E \cup \{\mathsf {a} \simeq \mathsf {b}\}\). By congruence, \(\mathsf {f}\,\mathsf {a} \simeq \mathsf {f}\,\mathsf {b}\) is derived, which propagates \(\mathsf {c_3} \simeq \mathsf {c_4}\) to the arithmetic solver. With this new equality, arithmetic reasoning derives \(\mathsf {c_1} \simeq \mathsf {c_2}\), whose addition to the equality constraints produces the unsatisfiable constraint set \(E' \cup \{\mathsf {c_1} \simeq \mathsf {c_2}\}\).\(\bullet \)

Extensionality. The Extensionality rule (Fig. 1) is sufficient for handling extensionality at the ground level. However, it has shortcomings when quantifiers, even just first-order ones, are considered, as shown in the example below. In the redesigned solver, extensionality is better handled via axioms.

Example 8

Consider the constraints \(E = \{\mathsf {h}\,\mathsf {f}\simeq \mathsf {b},\,\mathsf {h}\,\mathsf {g}\not \simeq \mathsf {b},\,\forall x.\,\mathsf {f}(x)\simeq \mathsf {a},\ \forall x.\,\mathsf {g}(x)\simeq \mathsf {a}\}\), with \(\mathsf {h}:\tau \rightarrow \tau \rightarrow \tau ,\,\mathsf {f},\mathsf {g}:\tau \rightarrow \tau , \,\mathsf {a},\mathsf {b}:\tau \). The pragmatic solver could prove this problem unsatisfiable only with a ground decision procedure that derives consequences of disequalities, since deriving \(\mathsf {f}\not \simeq \mathsf {g}\) is necessary to derive \(\mathsf {f}(\mathsf {sk})\not \simeq \mathsf {g}(\mathsf {sk})\), via extensionality, which then leads to a conflict. But SMT solvers are well known not to propagate all disequalities, for efficiency reasons. In contrast, with the axiom \(\forall F\,G.\ F \simeq G \vee F(\mathsf {sk}_{F,G}) \not \simeq G(\mathsf {sk}_{F,G})\), the instantiation \(\{F\mapsto \mathsf {f},\,G\mapsto \mathsf {g}\}\) (which may be derived, e.g., via enumerative instantiation, since \(\mathsf {f},\mathsf {g}\in \mathbf {T}(E)\)) provides the splitting lemma \({ \mathsf {f} \simeq \mathsf {g} \vee \mathsf {f}(\mathsf {sk}) \not \simeq \mathsf {g}(\mathsf {sk})}\). The case \(E \cup \{\mathsf {f} \simeq \mathsf {g}\}\) leads to a conflict by pure ground reasoning, while the case \(E \cup \{ \mathsf {f}(\mathsf {sk}) \not \simeq \mathsf {g}(\mathsf {sk}) \}\) leads to a conflict from the instances \(\mathsf {f}(\mathsf {sk}) \simeq \mathsf {a},\, \mathsf {g}(\mathsf {sk})\simeq \mathsf {a}\) of the quantified formulas in E. \(\bullet \)

4.2 Quantifier Instantiation Module

In the pragmatic approach, the challenges for the E-matching procedure lay in properly accounting, within the term-indexing data structure, for the @ symbol, functional equality, and the mixture of partial and total applications, all of which lead to different term representations. In the redesigned approach, the second challenge remains the same, and term indexing is extended as in Sect. 3.4 to cope with it. The first and third challenges present themselves differently, however, since the curried representation of terms is used only inside the E-graph of the new CC procedure. To apply E-matching properly, term indexing is extended to support queries by type, returning all the subterms of a given type that occur in the E-graph, translated back to the uncurried representation.

Example 9

Consider \(E=\{\mathsf {f}(\mathsf {a}, \mathsf {g}(\mathsf {b}, \mathsf {c}))\simeq \mathsf {a}, \forall F.\, F(\mathsf {a}) \simeq \mathsf {h},\,\) \(\forall y.\, \mathsf {h}(y) \not \simeq \mathsf {a}\}\) and the set of triggers \(\{ F(\mathsf {a}), \mathsf {h}(y)\}\) where \(\mathsf {a},\mathsf {b},\mathsf {c}:\tau \), \({\mathsf {h}:\tau \rightarrow \tau }\) and \({\mathsf {f},\mathsf {g}:\tau \times \tau \rightarrow \tau }\). The set of ground curried terms in E is \(\{ \mathsf {f}\, \mathsf {a}\, (\mathsf {g}\, \mathsf {b}\, \mathsf {c}), \mathsf {f}\, \mathsf {a}, \mathsf {g}\, \mathsf {b}, \mathsf {g}\, \mathsf {b}\, \mathsf {c}, \mathsf {f}, \mathsf {g},\mathsf {a},\mathsf {b},\mathsf {c} \}\). To do E-matching with \(F(\mathsf {a})\) and \(\mathsf {h}(y)\) the index returns the sets of uncurried subterms \(\{\mathsf {f}(\mathsf {a}, \mathsf {g}(\mathsf {b}, \mathsf {c})), \mathsf {a},\) \(\mathsf {g}(\mathsf {b}, \mathsf {c}), \mathsf {b},\mathsf {c}\}\) and \(\{ \mathsf {f}(\mathsf {a}), \mathsf {g}(\mathsf {b})\}\) for the types \(\tau \) and \(\tau \rightarrow \tau \), respectively. \(\bullet \)

Since we do not perform HO unification, to instantiate functional variables it suffices to extend the standard E-matching algorithm applied by SMT solvers to account for function applications with variable heads. When matching a term \(F(\bar{s}_n)\) with a ground term t, the procedure essentially matches F with the head of ground terms congruent to t, as long as each \(s_i\) in \(\bar{s}_n\) can be matched with the corresponding \(t_i\) in \(\bar{t}_n\). In the above example, matching the trigger \(F(\mathsf {a})\) with the term \(\mathsf {f}(\mathsf {a})\) yields the substitution \(\{F\mapsto \mathsf {f}\}\).
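The following self-contained sketch shows this extended matching step under strong simplifications: matching here is purely syntactic rather than modulo the congruence E, so it only illustrates how a variable head F is handled.

```python
# Self-contained sketch of matching with functional-variable heads; purely
# syntactic (no congruence classes). Terms are as in the earlier sketches:
# ("const", name), ("var", name), ("app", head, [args]).
def match(trigger, ground, sigma):
    """Extend sigma so that trigger matches ground, or return None."""
    tag = trigger[0]
    if tag == "const":
        return sigma if trigger == ground else None
    if tag == "var":                               # first-order or functional variable
        bound = sigma.get(trigger[1])
        if bound is None:
            return {**sigma, trigger[1]: ground}
        return sigma if bound == ground else None
    if ground[0] != "app" or len(trigger[2]) != len(ground[2]):
        return None
    sigma = match(trigger[1], ground[1], sigma)    # the head may be a variable F
    for s_i, t_i in zip(trigger[2], ground[2]):
        if sigma is None:
            return None
        sigma = match(s_i, t_i, sigma)
    return sigma

# matching the trigger F(a) against the partial application f(a) yields F -> f
trig = ("app", ("var", "F"), [("const", "a")])
grnd = ("app", ("const", "f"), [("const", "a")])
print(match(trig, grnd, {}))      # {'F': ('const', 'f')}
```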

5 Evaluation

We have implemented the above techniques in the state-of-the-art SMT solver CVC4 and in the lightweight solver veriT. We distinguish between two main versions of each solver: one that performs a full applicative encoding (Sect. 2) into FOL a priori, denoted @cvc and @vt, and another that implements the pragmatic (Sect. 3) or redesigned (Sect. 4) extensions to HOL within the solvers, denoted cvc and vt. Both CVC4 modes eliminate \(\lambda \)-abstractions via \(\lambda \)-lifting. Neither veriT configuration supports benchmarks with \(\lambda \)-abstractions. The CVC4 configurations that employ the “store axiom” (Sect. 3.4) are denoted by the suffix -sax.

We use the state-of-the-art HO provers Leo-III [43], Satallax [17, 25] and Ehoh [42, 48] as baselines in our evaluation. The first two have refutationally complete calculi for extensional HOL with Henkin semantics, while the third only supports \(\lambda \)-free HOL without first-class Booleans. For Leo-III and Satallax we use their configurations from the CASC competition [47], while for Ehoh we report on its best non-portfolio configuration, Ehoh hb, from Vukmirović et al. [48].

We split our account between proving HO theorems and producing countermodels for HO conjectures, since the two require different strengths from the systems considered. We discuss only two systems, CVC4 and Satallax, in the second evaluation, since Leo-III and veriT do not provide models and Ehoh is model-sound only with respect to \(\lambda \)-free Henkin semantics, not full Henkin semantics. We ran our experiments on a cluster equipped with Intel E5-2637 v4 CPUs running Ubuntu 16.04, providing one core, 60 s, and 8 GB RAM for each job. The full experimental data is publicly available.

We consider the following sets of HO benchmarks: the 3,188 monomorphic HO benchmarks in TPTP [46], split into three subsets: the 530 problems that are both \(\lambda \)-free and without first-class Booleans (TH0); the 743 that are only \(\lambda \)-free (oTH0); and the 1,915 that are neither (\(\lambda o\)TH0). The next sets are Sledgehammer (SH) benchmarks from the Judgment Day test harness [15], consisting of 1,253 provable goals manually chosen from different Isabelle theories [34] and encoded into \(\lambda \)-free monomorphic HOL problems without first-class Booleans. The encoded problems are such that they are provable only if the original goal is. These problems are split into four subsets, JD\(^{32}_{\mathrm {lift}}\), JD\(^{32}_{\mathrm {combs}}\), JD\(^{512}_{\mathrm {lift}}\), and JD\(^{512}_{\mathrm {combs}}\), depending, respectively, on whether they include 32 or 512 Isabelle lemmas, or facts, and on whether \(\lambda \)-abstractions are removed via \(\lambda \)-lifting or via SK-style combinators. The last set, \(\lambda o\)SH\(^{1024}\), has 832 SH benchmarks generated from 832 provable goals randomly selected from different Isabelle theories, encoded with 1,024 facts and preserving \(\lambda \)s and first-class Booleans. Considering a varying number of facts in the SH benchmarks emulates the needs of increasingly larger problems in interactive verification, while the different \(\lambda \)-handling schemes allow us to measure which alternative benefits each particular solver more.

We point out that our extensions of CVC4 and veriT do not significantly compromise their performance on FO benchmarks. The pragmatic extension of CVC4 has virtually the same performance as the original solver on SMT-LIB [7], the standard SMT test suite. The redesigned veriT does exhibit considerably lower performance: it is, for example, three times slower on the QF_UF category of SMT-LIB due to its slower ground solver for EUF. It nevertheless still performs better on this category than CVC4, which shows that, despite the added cost of supporting higher-order reasoning, the FO performance of veriT remains on par with the state of the art.

Table 1. Proved theorems per benchmark set. Best results are in bold.

5.1 Proving HO Theorems

The number of theorems proved by each solver configuration per benchmark set is given in Table 1. Grayed-out cells represent unsupported benchmark sets. Figure 2 plots the number of benchmarks solved within a given time; it only includes benchmark sets supported by all solvers (namely TH0 and the JD benchmarks).

As expected, the results vary significantly between benchmark sets. Leo-III and Satallax have a clear advantage on TPTP, which contains a significant number of small logical problems meant to exercise the HO features of a prover. Considering the TPTP benchmarks from less to more expressive, i.e., adding first-class Booleans and then \(\lambda \)s, the advantages of these systems only increase. We also observe that both @cvc and cvc, but especially the latter, benefit from -sax as more complex benchmarks are considered in TPTP, showing that the disadvantage of not synthesizing \(\lambda \)-abstractions can sometimes be offset by well-chosen axioms. Nevertheless, the results on \(\lambda o\mathrm {TH0}\) show that this axiom alone is far from enough to close the gap between @cvc and cvc, with cvc giving up more often due to a lack of instantiations to perform.

Fig. 2. Execution times in secs on 5,543 benchmarks, from TH0 and JD, supported by all solvers.

Sledgehammer-generated problems stem from formalization efforts across different applications. As others note [45, 48], the bottleneck in solving these problems is often scalability and efficient FO reasoning rather than a refined handling of HO constructs, especially as more facts are considered. Thus, the ability to synthesize \(\lambda \)-abstractions alone does not suffice, and Ehoh and the CVC4 extensions eventually surpass the native HO provers as more facts are considered. In particular, in the largest set we considered, \(\lambda o\mathrm {SH}^{1024}\), both @cvc and cvc have significant advantages. As in \(\lambda o\mathrm {TH0}\), @cvc also solves more problems than cvc in \(\lambda o\mathrm {SH}^{1024}\), which we attribute again to @cvc being able to perform more instantiations than cvc. On commonly solved problems, however, cvc is often faster than @cvc, albeit by a small margin: 15% on average.

Both CVC4 configurations dominate \(\mathrm {JD}^{512}\) with a significant margin over Ehoh and Leo-III. Comparing the results between using \(\lambda \)-lifting and combinators, the former favors cvc and the latter @cvc. These results, as well as the previously discussed ones, indicate that, for unsatisfiable benchmarks, the pragmatic extension of CVC4 should not, in its current state, substitute an encoding-based approach but complement it. In fact, a virtual best solver combining all the CVC4 configurations, as well as configurations employing interleaved enumerative instantiation [37], in a portfolio would solve 703 problems in \(\mathrm {JD}^{512}_{\mathrm {lift}}\), 702 in \(\mathrm {JD}^{512}_{\mathrm {combs}}\), 453 in \(\lambda o\mathrm {SH}^{1024}\), and 408 in TH0, the most in these categories, even when also considering a virtual best solver of all Ehoh configurations from [48]. The CVC4 portfolio would also solve 482 problems in \(\mathrm {JD}^{32}_{\mathrm {lift}}\) and 482 in \(\mathrm {JD}^{32}_{\mathrm {combs}}\), doing almost as well as Leo-III, and 1,001 problems in \(\lambda o\mathrm {TH0}\). The virtual best CVC4 has a success rate 3 percentage points higher than @cvc on Sledgehammer benchmarks, as well as overall, which represents a significant improvement when considering the usage of these solvers as backends for interactive theorem provers.

Differently from the pragmatic extension in CVC4, which provides more of an alternative to the full applicative encoding, the redesigned veriT is an outright improvement, with vt consistently solving more problems, and with better solving times, than @vt, especially on harder problems, as seen in the wider separation between them after 10 s in Fig. 2. Overall, veriT’s performance, consistent with its status as a lightweight solver, lags behind CVC4 and Ehoh as bigger benchmarks are considered. However, it is respectable compared with Leo-III’s and ahead of Satallax’s, thus validating the effort of redesigning the solver for a more refined handling of higher-order constructs and suggesting that further extensions should be beneficial.

Table 2. Conjectures with found countermodels per benchmark set. Best results in bold.

5.2 Providing Countermodels to HO Conjectures

The number of countermodels found by each solver configuration per benchmark set is given in Table 2. We consider the two CVC4 extensions, @cvc and cvc, run in finite-model-finding mode (-fmf) [39]. The built-in HO support in cvc is vastly superior to @cvc when it comes to model finding, as cvc-fmf greatly outperforms @cvc-fmf-sax. We note that @cvc-fmf is only model-sound if combined with -sax: differently from cvc-fmf, which fails to provide a model as soon as it is faced with quantification over a functional sort, @cvc-fmf encodes functional sorts as atomic sorts and thus needs the extra axiom to ensure model soundness. For example, @cvc-fmf considers Example 5 satisfiable, while @cvc-fmf-sax properly reports it unsatisfiable.

The high number of countermodels in \(\mathrm {JD}^{32}\) indicates, not surprisingly, that providing few facts makes several SH goals unprovable. Nevertheless, it is still useful to know where exactly the Sledgehammer generation is being “incomplete” (i.e., making originally provable goals unprovable), something that is difficult to determine without effective model finding procedures.

6 Concluding Remarks

We have presented extensions for SMT solvers to handle HOSMT problems. The pragmatic extension of CVC4, which could be implemented in other state-of-the-art SMT solvers with a similar level of effort, performs similarly to the standard encoding-based approach despite its limited support for HO instantiation. Moreover, it allows numerous new problems to be solved by CVC4, with a portfolio approach performing very competitively, often ahead of state-of-the-art HO provers. The redesigned veriT, on the other hand, consistently outperforms its encoding-based counterpart, showing it can be the basis for future advances towards stronger HO automation.