figure a

1 Introduction

Bit-precise reasoning is paramount for software and hardware verification. Bit-vectors directly and naturally model basic building blocks of both software and hardware, like registers, integers, memory, and more. Many applications rely on satisfiability modulo theories (SMT) for reasoning about bit-vectors, and the number of solvers and techniques for handling bit-vector formulas is large and increasing. One indication of that is the number of bit-vector benchmarks in the SMT-LIB [7] benchmark library, by far the highest among all benchmark categories in the library. The current state of the art for determining the satisfiability of fixed-size bit-vector formulas is a technique called bit-blasting. With this technique, the input formula is first simplified by means of satisfiability preserving transformations. Then, it is fully reduced to a propositional satisfiability (SAT) problem and handed to a SAT solver [11]. The success of this approach is mainly due to the fact that modern SAT solvers are able to solve complex propositional formulas with millions of variables very efficiently. Thus, problems that can be efficiently encoded as SAT instances can leverage the great progress in SAT solving. Nevertheless, bit-blasting has scalability limitations, especially with large bit-widths. In fact, even for conventional bit-widths such as 32 and 64, bit-blasting may face scalability issues, in particular for formulas containing bit-vector arithmetic operators.

The work described in this paper is part of an ongoing effort to improve the scalability of bit-precise reasoning by offering alternatives to bit-blasting that primarily use word-level reasoning and rely on bit-level reasoning only when needed. Specifically, we study a translation of bit-vector formulas to an extension of integer arithmetic; that is, we replace bit-blasting by int-blasting. To encode bitwise bit-vector operators, the extension introduces an operator that represents the bitwise and operation over integers, parameterized by bit-width. The idea of using arithmetic reasoning to solve bit-vector formulas is not new (e.g., [12, 19]). We believe, however, that recent progress in arithmetic solvers (e.g., [14]), especially for non-linear arithmetic (e.g., [17, 18, 26, 41]), make it worthwhile to revisit this approach, as these techniques can be leveraged by applying them to the int-blasted formulas.

We study two kinds of translations: an eager one and a (semi-)lazy one. In the former, the input bit-vector formula is eagerly translated to an integer formula with uninterpreted functions. In the latter, most of the formula is translated eagerly while preserving satisfiability except for bitwise operators (such as bitwise and), which are handled lazily using a counterexample-guided abstraction refinement (CEGAR) loop [28].

We additionally consider two alternative ways to encode bitwise bit-vector operations in integer arithmetic for the purposes of abstraction refinement: one based on a polynomial expansion and the other based on bit-level comparisons. Both alternatives require non-linear arithmetic reasoning, as recovering individual bits from an integer encoding of a bit-vector is achieved via division and modulo operations. The main difference between the two alternatives in the context of an SMT solver implementation, and our reason for considering both, is that the first further exercises the arithmetic subsolver whereas the second relies more heavily on the underlying SAT engine.

Contributions. We have implemented the aforementioned variants of int-blasting in the cvc5 SMT solver (the successor of CVC4 [5]) and evaluated our implementation experimentally to estimate its potential. For that, we compiled a new set of benchmarks, encoding equivalence checks of rewrite rule candidates proposed by the syntax-guided rewrite rule enumeration framework presented by Nötzli et al. [36]. We show that for those benchmarks, int-blasting significantly outperforms bit-blasting as the bit-width increases. We further evaluated our technique on the QF_BV benchmarks in the SMT-LIB benchmark library [6], as well as on 35 benchmarks that arise from smart contract verification, and observed that int-blasting is complementary to bit-blasting on those benchmarks.

Outline. After introducing some background and notation in Sect. 2, Sect. 3 introduces an extension of the theory of integer arithmetic, in which an operator representing bitwise and is added for each bit-width. We present a translation from the theory of bit-vectors to this extension, in Sect. 4, along with eager and lazy algorithms for solving the translated formula. We discuss an initial experimental evaluation of the various translations in Sect. 5 and conclude in Sect. 6 with some directions for further work.

Table 1. Considered bit-vector operators with SMT-LIB 2 syntax. In , \(0 \le l \le u < n\).

2 Preliminaries

We review the usual notions and terminology of many-sorted first-order logic with equality (see [21, 44] for more detailed information). Let \(S\) be a set of sort symbols. For every sort \({\sigma \in S}\), we assume an infinite set of variables that are pairwise disjoint across sorts. A signature \(\varSigma \) consists of a set \(\varSigma ^s \!\subseteq S \) of sort symbols and a set \(\varSigma ^f \) of function symbols. Arities of function symbols are defined in the usual way, and correspond to their types, that is, they take the form \(\sigma _1\times \ldots \times \sigma _n\rightarrow \sigma \) where \(\sigma _1,\ldots ,\sigma _n,\sigma \) are sorts. Constants are treated as functions with no input sorts. We assume that \(\varSigma \) includes a sort \(\mathsf {Bool}\), interpreted as the Boolean domain, and the \(\mathsf {Bool}\) constants \(\top \) and \(\bot \) (respectively for true and false). Signatures do not contain separate predicate symbols and use instead function symbols with \(\mathsf {Bool}\) return type.

We assume the usual definitions of well-sorted terms, literals, and formulas, and refer to them as \(\varSigma \)-terms, \(\varSigma \)-literals, and \(\varSigma \)-formulas, respectively. These are constructed using the symbols in \(\varSigma \), variables, quantifiers and connectives, as well as the if-then-else constructor \(\mathrm {ite}(\varphi ,\,t_1,\,t_2) \), where \(\varphi \) is a formula and \(t_1\) and \(t_2\) are \(\varSigma \)-terms of the same sort.

A \(\varSigma \)-interpretation \({\mathcal {I}} \) maps: each \(\sigma \in \varSigma ^s \) to a distinct non-empty set of values \(\sigma ^{\mathcal {I}} \) (the domain of \(\sigma \) in \({\mathcal {I}}\)); each variable x of sort \(\sigma \) to an element \({x^{\mathcal {I}} \in \sigma ^{\mathcal {I}} }\); and each \(f^{\sigma _1 \cdots \sigma _n \sigma } \in \varSigma ^f \) to a total function \(f^{\mathcal {I}} \!\!: \sigma _1^{\mathcal {I}} \,\times \,...\, \times \, \sigma _n^{\mathcal {I}} \rightarrow \sigma ^{\mathcal {I}} \) if \(n > 0\), and to an element in \(\sigma ^{\mathcal {I}} \) if \(n = 0\). We use the usual notion of a satisfiability relation \(\models \) between \(\varSigma \)-interpretations and \(\varSigma \)-formulas. A term of the form \(\mathrm {ite}(\varphi ,\,t_1,\,t_2) \) is interpreted in an interpretation \({\mathcal {I}} \) as \(t_{1}^{{\mathcal {I}}}\) if \({\mathcal {I}} \,\models \, \varphi \), and as \(t_2^{{\mathcal {I}}}\) otherwise. For each sub-signature \(\varSigma '\) of \(\varSigma \), the reduct \({\mathcal {I}} ^{\varSigma '}\) of \({\mathcal {I}} \) to \(\varSigma '\) is obtained from \({\mathcal {I}} \) by restricting it to the sorts and symbols of \(\varSigma '\).

A \(\varSigma \)-theory T is a non-empty class of \(\varSigma \)-interpretations, such that every interpretation that only disagrees from one in T on the variable assignments is also in T. A \(\varSigma \)-formula \(\varphi \) is T-satisfiable (resp., T-unsatisfiable, T-valid) if it is satisfied by some (resp., no, all) interpretations in T.

The signature \(\varSigma _{\mathrm {BV}} \) of fixed-size bit-vectors is defined in the SMT-LIB 2 standard [7], and includes a unique sort for each positive integer n (representing the bit-vector width), denoted here as \(\sigma _{[n]} \). Without loss of generality, we take \(\varSigma _{\mathrm {BV}} \) to consist of a restricted set of bit-vector function symbols (or bit-vector operators) as listed in Table 1. The selection of operators is arbitrary but complete in the sense that it suffices to express all bit-vector operators defined in SMT-LIB 2. We further assume that \(\varSigma _{\mathrm {BV}} \) includes all bit-vector constants of sort \(\sigma _{[n]} \) for each n, represented as bit-strings. To simplify the notation, we will sometimes denote them by the corresponding natural number. If a term t has sort \(\sigma _{[n]} \) then we denote n by \(\kappa (t) \). The SMT-LIB 2 standard for the \(\varSigma _{\mathrm {BV}} \)-theory \(T_{\mathrm {BV}} \) defines a set of \(\varSigma _{\mathrm {BV}} \)-interpretations \({\mathcal {I}} \), such that for each positive integer n, \({\sigma _{[n]}}^{{\mathcal {I}}}\) is the set of all bit-vectors of size n and function symbols are interpreted as the corresponding word-level operations in these domains (for details, see [38, 39]). All function symbols (of non-zero arity) in \(\varSigma _{\mathrm {BV}} \) are overloaded for every \(\sigma _{[n]} \! \in \varSigma _{\mathrm {BV}} \). We refer to the i-th bit of t as t[i] with \(0\le i < n\). We interpret t[0] as the least significant bit (LSB), and \(t[n-1]\) as the most significant bit (MSB). The unsigned interpretation of a bit-vector v of width k as a natural number is given by \({\left[ v\right] }_{\mathbb {N}} = \varSigma _{i=0}^{k-1}{v}\left[ i\right] \cdot 2^i\), and its signed interpretation as an integer is given by . Given \(0\le n < 2^{k}\), the bit-vector of width k with unsigned interpretation n is denoted \({\left[ n\right] }^{k}_{\mathbb {BV}}\). This notation is extended also for n outside this bound by defining \({\left[ n\right] }^{k}_{\mathbb {BV}}:={\left[ n \bmod 2^{k}\right] }^{k}_{\mathbb {BV}}\).Footnote 1

We consider a theory \(T_{\mathrm {IA}} \) of integer arithmetic whose signature \(\varSigma _{\mathrm {IA}} \) includes a single sort \(\mathsf {Int} \), function symbols \(+\), −, \(\cdot \), \(\mathrm{div}\), and \(\bmod \) of arity \(\mathsf {Int} \times \mathsf {Int} \rightarrow \mathsf {Int} \), the function symbol \(\mathrm {pow2} \) of arity \(\mathsf {Int} \rightarrow \mathsf {Int} \), the predicate symbols < and \(\le \) of arity \(\mathsf {Int} \times \mathsf {Int} \rightarrow \mathsf {Bool} \), and a constant symbol of sort \(\mathsf {Int} \) for every integer. The \(\mathrm {pow2} \)-free fragment of this theory is identical to the SMT-LIB 2 theory of integers [43]. Its models are all possible expansions of the models of the SMT-LIB 2 theory obtained by interpreting \(\mathrm {pow2} ({n}) \) as \(2^n\) when n is a non-negative constant, and interpreting \(\mathrm {pow2} ({n}) \) arbitrarily otherwise.

3 Integer Arithmetic with Bitwise and

In this paper, we reduce \(T_{\mathrm {BV}} \)-satisfiability to satisfiability in a theory that extends \(T_{\mathrm {IA}} \) as follows. We first extend the signature \(\varSigma _{\mathrm {IA}} \) with binary function symbols , one for each positive integer k. We define two theories for the extended signature: the first treats the new symbols as uninterpreted functions (UF); the second interprets them as bitwise and operators on integers modulo \(2^k\). This is defined formally as follows:

Definition 1

The signature is obtained from \(\varSigma _{\mathrm {IA}} \) by adding a function symbol of arity \(\mathsf {Int} \times \mathsf {Int} \rightarrow \mathsf {Int} \) for each \(k>0\). The -theory \(T_{\mathrm {IA} \mathrm {UF}} \) consists of all -interpretations whose \(\varSigma _{\mathrm {IA}} \)-reduct is a \(T_{\mathrm {IA}} \)-interpretation. The -theory consists of all \(T_{\mathrm {IA} \mathrm {UF}} \)-interpretations \({\mathcal {I}} \) in which .

Following Footnote 1, notice that is fully interpreted, even for integers that are not between 0 and \(2^{k}\). In the following definition, we identify a decidable fragment of that corresponds to formulas that originate from \(\varSigma _{\mathrm {BV}} \).

Definition 2

Let n be a positive integer, and let t be a -term of sort \(\mathsf {Int} \). A -formula \(\varphi \) is a t-n-range constraint if it has the form , , or (\(0 \bowtie _1 t \wedge t\bowtie _2 n)\) for \(\bowtie _1,\bowtie _2\in \left\{ \,<,\le \,\right\} \). A formula \(\varphi \) is a range constraint if it is a t-n-range constraint for some t and n; a formula \(\varphi \) is bounded if there are quantifier-free formulas \(\varphi _1\), \(\varphi _2\), \(\psi _1,\ldots ,\psi _m\) such that \(\varphi =\varphi _1\wedge \varphi _2\), \(\varphi _2=\bigwedge _{i=1}^m \psi _i\), each \(\psi _i\) is a range constraint, and for each term t that occurs in \(\varphi _1\) that is either a variable or has the form , there exist \(1\le i\le m\) and a positive integer n such that \(\psi _i\) is a t-n-range constraint.

Example 1

Let \(\varphi _1\) be and \(\varphi _2\) be . Then \(\varphi _1\wedge \varphi _2\) is not bounded, because it does not include any range constraint for . Consider the formula \(\varphi _2'\) obtained from \(\varphi _2\) by conjoining the range constraint . Then \(\varphi _1\wedge \varphi _2'\) is bounded.

A naive algorithm for deciding -satisfiability of bounded -formulas can be obtained by enumerating all possible values for variables within the specified bounds, and checking if the formula evaluates to true. If it does, a full model can be constructed according to Definition 1. In fact, bounds over variables are sufficient for -satisfiability since the semantics of in is fixed. A similar decision procedure can be obtained for \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability, which does require the bounds over -terms. This algorithm gives us:

Proposition 1

The - and \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability of bounded formulas is decidable.

In the next section, we show that the class of bounded formulas in is both useful and effective: it is expressive enough to describe bit-vector formulas and can be reduced to problems for which there are efficient solvers.

4 Int-Blasting

In this section, we present our integer-based approach for solving \(T_{\mathrm {BV}} \)-satisfiability. There are two stages in our approach. The first, described in Sect. 4.1 and proved correct in Sect. 4.2, translates \(T_{\mathrm {BV}} \)-formulas to . The second, described in Sect. 4.3 and 4.4, solves the resulting formulas by eager and lazy reductions to \(T_{\mathrm {IA} \mathrm {UF}} \), respectively. Although we developed our translations for the full fragment of \(T_{\mathrm {BV}} \), to simplify the exposition in this paper, we will restrict ourselves to quantifier-free formulas only.

Fig. 1.
figure 1

Translation \(\mathcal {T} \) from \(\varSigma _{\mathrm {BV}} \) to . We denote by k the bit-width \(\kappa (t_2) \) of the second argument, except for the cases of and , where it denotes the bit-width \(\kappa (t_1) \) of the only argument; x ranges over bit-vector variables; \(\chi \) is a one-to-one mapping from bit-vector variables to integer variables; c ranges over bit-vector constants.

4.1 From \(T_{\mathrm {BV}} \) to 

The first step is to translate \(\varSigma _{\mathrm {BV}} \)-formulas to equisatisfiable -formulas, so that the original formula is \(T_{\mathrm {BV}} \)-satisfiable if, and only if, its translation is -satisfiable. For this purpose, we define a translation function \(\mathcal {T} \) as shown in Fig. 1, which recursively translates \(\varSigma _{\mathrm {BV}} \)-formulas to (via the conversion function \(\mathcal {C} \)) and collects additional lemmas about the ranges of the translated variables and the introduced -terms (via the function \(\textsc {Lem} ^{\le }\)).

Conversion Function \(\mathcal {C} \). We use a one-to-one mapping \(\chi \) from bit-vector variables (i.e., variables of sort \(\sigma _{[k]} \) for some \(k>0\)) to integer variables (i.e., variables of sort \(\mathsf {Int} \)). A bit-vector constant c is translated to its integer counterpart using \({\left[ \_\right] }_{\mathbb {N}}\) which maps c to its unsigned integer interpretation. For Boolean connectives \(\diamond \, \in \{ \wedge ,\vee ,\Rightarrow ,\lnot ,\Leftrightarrow \}\), equalities, and unsigned comparators with \(\bowtie \ \in \{ <, \le , >, \ge \}\), the conversion function is recursively applied to their arguments. In the latter case, is replaced by its \(\varSigma _{\mathrm {IA}} \) counterpart \(\bowtie \). Signed comparators are handled similarly, except that the arguments are processed with function \(\mathsf {uts} _{k}(\_) \) (unsigned to signed with bit-width k), also defined in Fig. 1, which ensures that the semantics of signed comparison is preserved properly. For a given integer n in the range \(0\le n < 2^{k}\), it returns \({\left[ {\left[ n\right] }^{k}_{\mathbb {BV}}\right] }_{\mathbb {Z}}\), the signed interpretation of the bit-vector whose unsigned interpretation is n. Bit-vector addition is translated to integer addition modulo \(2^{k}\), where k is the bit-width of the arguments. Bit-vector subtraction, multiplication, and one’s and two’s complement are handled similarly. For division, the SMT-LIB 2 standard defines a default value for bit-vector division by 0, but not for integer division by 0. This is handled by wrapping the translated division term in an \(\mathrm {ite}\), which embeds the semantics of bit-vector division within integer arithmetic. A similar pattern is followed for remainder. Note that there is no need to take the result modulo \(2^k\) for one’s complement and unsigned division and remainder, as they are guaranteed to be within the correct bounds. Concatenation and extraction are handled as expected, using multiplication, division, and modulo. Left/right shifts are obtained by multiplying/dividing the first argument by 2 to the power of the second argument. Bitwise and is translated to , where k is determined according to the bit-width of the bit-vector arguments. Bitwise or () and xor ( ) are reduced to other operators, using the following identities that hold for all bit-vectors x and y [46]:

(1)

Lemmas Function \(\textsc {Lem} ^{\le } \). Function \(\textsc {Lem} ^{\le } \) takes a -formula and collects necessary range constraints for integer variables and terms of the form that are introduced by \(\mathcal {C} \). For variables, the range is determined by the bit-width of the original bit-vector variable. For , and terms, the constraint is determined by the bit-width of the arguments. Since and are eliminated, the constraint is stated in terms of . Notice that the terms introduced by Eq. (1) have the same arguments as the original terms. For all other terms and formulas, \(\textsc {Lem} ^{\le } \) simply collects such constraints recursively.

4.2 Correctness

The correctness of \(\mathcal {T} \) is stated in the following theorem. It follows from the SMT-LIB 2 semantics of bit-vectors and arithmetic, and from Definition 1. Its proof is by structural induction on \(\varphi \). Most cases are similar to the correctness proof of the translation by Niemetz et al. [35], from bit-vectors with parametric width to integers, with the main difference being the case of . Unlike that work, where the quantified axiomatization had to be proven correct by induction on the bit-width, here the correctness follows directly from Definition 1.

Theorem 1

A \(\varSigma _{\mathrm {BV}} \)-formula \(\varphi \) is \(T_{\mathrm {BV}} \)-satisfiable iff \(\mathcal {T} \,\varphi \) is -satisfiable.

The theorem is actually stronger than stated: from any model \({\mathcal {I}} \) of \(\mathcal {T} \,\varphi \) one can compute a satisfying assignment for \(\varphi \)’s free variables, simply by assigning to each free variable x of \(\varphi \) the bit-vector corresponding to the (integer) value of \(\chi (x)\) in \({\mathcal {I}} \). An analogous result holds in the opposite direction as well.

We prove this theorem in the remainder of this section, focusing on the left-to-right direction. The other direction is shown similarly. Throughout the proof, we employ the following notation:

$$\begin{aligned} {{ bsel }}_{i}(x):=(x \,\mathrm{div}\, 2^i) \bmod 2 \end{aligned}$$
(2)

The term \({{ bsel }}_{i}(x)\) represents the selection of the i-th bit in the bit-vector representation of x. In particular, it is always 0 or 1.

Let \(\varphi \) be a \(\varSigma _{\mathrm {BV}} \)-formula. We assume without loss of generality that \(\varphi \) does not have any occurrence of the ite operator, as it can be eliminated using the Boolean operators and the introduction of fresh variables. Suppose \(\varphi \) is \(T_{\mathrm {BV}} \)-satisfiable and let \({\mathcal {A}} \) be a \(T_{\mathrm {BV}} \)-interpretation that satisfies it. We prove that \(\mathcal {T} \,\varphi \) is -satisfiable. Define the following -interpretation \({\mathcal {B}} \): all function symbols and constants are interpreted as defined by ; division and remainder by 0, as well as \(\mathrm {pow2} ({m}) \) for any negative m are defined arbitrarily; for every bit-vector variable x, the value in \({\mathcal {B}} \) of its translation is the unsigned interpretation of its value in \({\mathcal {A}} \), that is:

$$\chi (x)^{{\mathcal {B}}}:={\left[ x^{{\mathcal {A}}}\right] }_{\mathbb {N}}.$$

This fixes \({\mathcal {B}} \). Also, \({\mathcal {B}} \) is a -interpretation by construction.

Notice that every term of the form \(t_1 \,\mathrm{div}\, t_2\) or \(t_1 \bmod t_2\) that occurs in the translation is guarded by an assumption that \(t_2\) is not 0. Similarly, \(\mathrm {pow2} \) is always applied on arguments that are guaranteed to be non-negative. Therefore, the interpretation of these corner cases in \({\mathcal {B}} \) can indeed remain arbitrary.

We first prove the following lemma, which states the correctness of the translation for terms, that is, that the translation of each \(\varSigma _{\mathrm {BV}} \)-term is interpreted in \({\mathcal {B}} \) as the unsigned interpretation of the original term’s value in \({\mathcal {A}} \).

Lemma 1

\((\mathcal {C} \,t)^{{\mathcal {B}}}={\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\) for every \(\varSigma _{\mathrm {BV}} \)-term t of sort \(\sigma _{[k]} \).

Proof

By induction on t. If t is a bit-vector variable then \((\mathcal {C} \,t)^{{\mathcal {B}}}=\chi (t)^{{\mathcal {B}}}={\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\) by the definitions of \(\mathcal {C} \) and \({\mathcal {B}} \). If t is a bit-vector constant then \((\mathcal {C} \,t)^{{\mathcal {B}}}={\left[ t\right] }_{\mathbb {N}}^{{\mathcal {B}}}={\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\) by the definition of \({\left[ \_\right] }_{\mathbb {N}}\). If t has the form , then by the definition of \(\mathcal {C} \), \((\mathcal {C} \,t)^{{\mathcal {B}}}=(\mathcal {C} \,t_1 + \mathcal {C} \,t_2 \bmod 2^{k})^{{\mathcal {B}}}\). Now, \(2^{k}\ne 0\) and hence the interpretation in \({\mathcal {B}} \) is governed by , and is equal to \((\mathcal {C} \,t_1)^{{\mathcal {B}}}+(\mathcal {C} \,t_2)^{{\mathcal {B}}} \bmod 2^{k}\). By the induction hypothesis, this is equal to \({\left[ t_1^{{\mathcal {A}}}\right] }_{\mathbb {N}}+{\left[ t_2^{{\mathcal {A}}}\right] }_{\mathbb {N}} \bmod 2^{k}\). By the semantics of according to the SMT-LIB 2 standard, this is the same as \({\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\). The other bit-vector operators are handled similarly. and are eliminated by \(\mathcal {C} \), and the correctness of this elimination follows from [46].

Finally suppose t has the form . By the definition of \(\mathcal {C} \), . By Definition 1, since \({\mathcal {B}} \) is a -interpretation, . By the induction hypothesis, this is the same as . Now, \({\left[ \_\right] }_{\mathbb {N}}\) and \({\left[ \_\right] }^{}_{\mathbb {BV}}\) cancel each other, and hence we get , which is the same as \({\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\).    \(\square \)

Going back to \(\varphi \), which is assumed to be satisfied by \({\mathcal {A}} \), we now prove that , that is . First, we prove that \({\mathcal {B}} \,\models \,\mathcal {C} \,\varphi \) by induction on \(\varphi \). The induction step, in which \(\varphi \) is recursively constructed from propositional connectives, trivially follows from the induction hypothesis, hence we focus on the induction base. In the induction base, \(\varphi \) has either the form \(t_1=t_2\), , or for some . If \(\varphi \) has the form \(t_1=t_2\), then since \({\mathcal {A}} \,\models \,\varphi \), \(t_1^{{\mathcal {A}}}=t_2^{{\mathcal {A}}}\). By Lemma 1, \((\mathcal {C} \,t_1)^{{\mathcal {B}}}={\left[ t_1^{{\mathcal {A}}}\right] }_{\mathbb {N}}={\left[ t_2^{{\mathcal {A}}}\right] }_{\mathbb {N}}=(\mathcal {C} \,t_2)^{{\mathcal {B}}}\), and therefore \({\mathcal {B}} \,\models \,\mathcal {C} \,\varphi \). If \(\varphi \) has the form then since \({\mathcal {A}} \,\models \,\varphi \), . By Lemma 1, and . Thus we get , and so . The case of is shown similarly. Finally, if \(\varphi \) has the form then since \({\mathcal {A}} \,\models \,\varphi \), we have . In turn, by the semantics of as defined in the SMT-LIB 2 standard, this means that \({\left[ t_1^{{\mathcal {A}}}\right] }_{\mathbb {Z}}<{\left[ t_2^{{\mathcal {A}}}\right] }_{\mathbb {Z}}\). By the definition of \(\mathsf {uts} \), we get \(\mathsf {uts} _{k}({\left[ t_1^{{\mathcal {A}}}\right] }_{\mathbb {N}}) <\mathsf {uts} _{k}({\left[ t_2^{{\mathcal {A}}}\right] }_{\mathbb {N}}) \), with \(k=\kappa (t_1) \). By Lemma 1 we have: \(\mathsf {uts} _{k}((\mathcal {C} \,t_1)^{{\mathcal {B}}}) <\mathsf {uts} _{k}((\mathcal {C} \,t_2)^{{\mathcal {B}}}) \), which means \({\mathcal {B}} \,\models \, \mathcal {C} \,\varphi \). The case of is shown similarly.

Next, we prove that , also by induction on \(\varphi \). Similarly to the above, the induction step follows directly from the induction hypothesis and so we focus on the induction base, in which \(\varphi \) is atomic, and hence it has the form \(t_1=t_2\), , or for some \(\bowtie \in \left\{ \,<,\le ,>,\ge \,\right\} \). By the definition of \(\textsc {Lem} ^{\le } \), \(\textsc {Lem} ^{\le } (\varphi )=\textsc {Lem} ^{\le } (t_1)\wedge \textsc {Lem} ^{\le } (t_2)\). We thus prove that \({\mathcal {B}} \,\models \,\textsc {Lem} ^{\le } (t)\) for any term t of sort \(\sigma _{[k]} \) by an inner induction on t. If t is a bit-vector variable, \(\textsc {Lem} ^{\le } (t)=0\le \chi (t) < 2^{k}\). By Lemma 1, \(\chi (t)^{{\mathcal {B}}}={\left[ t^{{\mathcal {A}}}\right] }_{\mathbb {N}}\), and by the definition of \({\left[ \_\right] }_{\mathbb {N}}\), \(0\le \chi (t)^{{\mathcal {B}}}<2^{k}\). If t is a bit-vector constant, then the condition is trivially satisfied. If t has the form with , then . By the induction hypothesis, \({\mathcal {B}} \,\models \, \textsc {Lem} ^{\le } (t_1)\wedge \textsc {Lem} ^{\le } (t_2)\). Also, by Definition 1, and the fact that it is a -interpretation. For any other form of t, this follows immediately from the induction hypothesis.    \(\square \)

Fig. 2.
figure 2

Translation \(\mathcal {T} _A\) from to \(T_{\mathrm {IA} \mathrm {UF}} \), parameterized by \(A\in \left\{ \,\textsf {sum},\textsf {bitwise} \,\right\} \). x and c range over integer variables and constants, resp.; \({\diamond }\) ranges over the connectives; f ranges over \(\varSigma _{\mathrm {IA}} \)-symbols; \({ bsel }\) is from Eq. (2).

4.3 -Satisfiability: Eager Approach

Now that we have reduced \(T_{\mathrm {BV}} \)-satisfiability to -satisfiability, we present eager and lazy reductions from the latter to \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability. The first approach for determining -satisfiability is an eager reduction to \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability. The reduction is defined by the translation \(\mathcal {T} _{A}\), which is parameterized by a mode \(A\in \left\{ \,\textsf {sum},\textsf {bitwise} \,\right\} \), as shown in Fig. 2.

The translation adds to \(\varphi \) a conjunction \( \textsc {Lem} ^{ \& }_{A}(\varphi ) \) of lemmas that reflect the definition of for each relevant k. Function \( \textsc {Lem} ^{ \& }_{A} \), when applied to a term or formula e, recursively collects lemmas for subterms of e of the form .

The introduced lemma depends on the mode A. For \(A=\textsf {sum} \), the lemma represents the usual encoding of integers in binary notation, by summing powers of 2 with coefficients that depend on the bits. Alternatively, for \(A=\textsf {bitwise} \), the translation introduces a lemma that compares each i-parity of the -term to its expected result, based on the i-parities of the two arguments. The lemmas use the term \(\mathrm {ITE}_{\textit{and}}({x},{y}) \) to encode each bit using the \(\mathrm {ite}\) operator. This case splitting requires access to the i-th bit in the bit-vector representations of \(t_1\), \(t_2\), and . These are abbreviated by \(a_i\), \(b_i\), and \(c_i\) in Fig. 2, and are defined using the function \({ bsel }\) from Eq. (2).

The main difference between \(\textsf {bitwise}\) and \(\textsf {sum}\) is in the balance between the arithmetic solver and the Boolean solver. While both approaches heavily use \(\bmod \) and \(\mathrm{div}\) terms, the \(\textsf {bitwise}\) mode only includes comparisons between such terms, thus relying mainly on the SAT solver, as well as the equality solver. In contrast, the \(\textsf {sum}\) mode incorporates them within sums and multiplications by constants, making heavy use of the arithmetic solver.

Fig. 3.
figure 3

Procedures for \(T_{\mathrm {BV}} \)-satisfiability. We assume \({P}_{T_{\mathrm {IA} \mathrm {UF}}} \) is a procedure for \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability that returns a finite representation of a model for satisfiable formulas.

The following theorem states the correctness of the reduction described in Fig. 2 from -satisfiability to \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability. It follows from the semantics of \(T_{\mathrm {BV}} \) and Definition 1, which induces the same semantics for as the one induced by the lemmas that are produced in \({{\textsc {Iand}}}_{A}(t_1,t_2) \).

Theorem 2

Let \(\varphi \) be a -formula. For all \(A\in \left\{ \,\textsf {sum},\textsf {bitwise} \,\right\} \), \(\varphi \) is -satisfiable iff \(\mathcal {T} _{A}\,\varphi \) is \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiable.

Proof

Suppose \(\varphi \) is -satisfiable and let \({\mathcal {A}} \) be a -interpretation that satisfies it. Now, \({\mathcal {A}} \) is also a \(T_{\mathrm {IA} \mathrm {UF}} \)-interpretation, and hence what is left to show is that \( {\mathcal {A}} \,\models \, \textsc {Lem} ^{ \& }_{A}(\varphi ) \), which directly follows from Definition 1 and a routine verification of the -validity of \( \textsc {Lem} ^{ \& }_{A}(\varphi ) \) for \(A\in \left\{ \,\textsf {sum},\textsf {bitwise} \,\right\} \).

Now suppose \(\mathcal {T} _{A}\,\varphi \) is \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiable and let \({\mathcal {A}} \) be a \(T_{\mathrm {IA} \mathrm {UF}} \)-interpretation that satisfies \(\mathcal {T} _{A}\,\varphi \). We prove that \(\varphi \) is -satisfiable. Let \({\mathcal {B}} \) be the -interpretation obtained from \({\mathcal {A}} \) by ignoring the interpretations of in \({\mathcal {A}} \), and redefining them according to Definition 1. Clearly, \({\mathcal {B}} \) is a -interpretation. To show that it satisfies \(\varphi \), it suffices to show that for any term that occurs in \(\varphi \). All other terms that occur in \(\varphi \) are interpreted the same as in \({\mathcal {A}} \), by the way \({\mathcal {B}} \) was defined. Now suppose occurs in \(\varphi \). Suppose for contradiction that . Since \({\mathcal {B}} \) is a -interpretation, this means that . In other words, . Hence there is some \(0\le i< k\) such that . Now, recall \({ bsel }\) from Eq. (2), which equals to 0 or 1, according to the i-th bit in the bit-vector representation of the input integer. Using the semantics of in SMT-LIB 2, we get that . For both modes \(\textsf {sum}\) and \(\textsf {bitwise}\), this means \({\mathcal {A}} \,\not \models \, {{\textsc {Iand}}}_{A}(t_{1},t_{2}) \). For the former, the sums will evaluate differently, while for the latter, a direct disequality will be obtained. This is a contradiction to the assumption that \({\mathcal {A}} \,\models \,\mathcal {T} _{A}\,\varphi \).    \(\square \)

We use \(\mathcal {T} _{A}\) in the eager procedure \(\textsc {Eager}_{{BV}}^{{A}} ({\varphi }) \) of Fig. 3, in which the input \(\varSigma _{\mathrm {BV}} \)-formula \(\varphi \) is processed through \(\mathcal {T} \) to obtain an equisatisfiable formula \(\mathcal {T} \,\varphi \) in , and then through \(\mathcal {T} _A\) to get an equisatisfiable formula in \(T_{\mathrm {IA} \mathrm {UF}} \). The result is then handed to a \(T_{\mathrm {IA} \mathrm {UF}} \)-solver \({P}_{T_{\mathrm {IA} \mathrm {UF}}} \) for bounded formulas, which is expected to be a decision procedure for the \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability of quantifier-free formulas that also returns (a finite representation of) a \(T_{\mathrm {IA} \mathrm {UF}} \)-model satisfying the input formula whenever that formula is \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiable. Notice that \(\mathcal {T} \) always generates bounded formulas due to \(\textsc {Lem} ^{\le } \), and \(\mathcal {T} _A\) preserves boundedness as it does not introduce any new variables or terms of the form . This leads to the following correctness result for \(\textsc {Eager}_{{BV}}^{{A}} \).

Proposition 2

\(\textsc {Eager}_{{BV}}^{{A}} \) is a decision procedure for the \(T_{\mathrm {BV}} \)-satisfiability of quantifier-free formulas.

4.4 -Satisfiability: Lazy Approach

We now examine a CEGAR-based approach, which applies the function \( \textsc {Lem} ^{ \& }_{A} \) in the \(\mathcal {T} _A\) translation in a lazy and incremental way. Our CEGAR-procedure \(\textsc {Lazy}_{{BV}}^{{A}} \) is described in Fig. 3. It maintains a set \(\varGamma \) of assertions, initially set to the translation of the input \(\varSigma _{\mathrm {BV}} \)-formula \(\varphi \) using \(\mathcal {T} \), and a set \(\varDelta \) of terms of the form in \(\mathcal {T} \,\varphi \). Similarly to the eager approach, we utilize the decision procedure \({P}_{T_{\mathrm {IA} \mathrm {UF}}} \) for \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiability. If, at any point, \({P}_{T_{\mathrm {IA} \mathrm {UF}}} \) determines that \(\varGamma \) is \(T_{\mathrm {IA} \mathrm {UF}} \)-unsatisfiable, \(\textsc {Lazy}_{{BV}}^{{A}} \) returns “unsat”. Otherwise, the model \({\mathcal {I}} \) of \(\varGamma \) returned by \({P}_{T_{\mathrm {IA} \mathrm {UF}}} \) is validated against a set \(\varLambda \) of lemmas, instantiated with the terms in \(\varDelta \). The set \(\varLambda \) is a union of two sets of lemmas: (i) a set of basic lemmas \( Prop (\varDelta ) \) that capture basic properties of bitwise \(\textit{and} \): upper bounds, idempotence, symmetry, and values for special inputs; and (ii) lemmas based on \( \textsc {Lem} ^{ \& }_{A} \), as defined in Fig. 2. Any lemmas falsified by \({\mathcal {I}} \) make the model unsuitable for \(\varphi \). Such lemmas are then added to \(\varGamma \), and the process repeats. If all of the lemmas in \(\varLambda \) are satisfied, the algorithm returns “sat”.

The correctness argument for \(\textsc {Lazy}_{{BV}}^{{A}} \) is similar to that of Proposition 2. At any point in the procedure, \(\varGamma \) consists of \(\mathcal {T} \,\varphi \), as well as a subset of \(\varLambda \). It is routine to check that every formula in \(\varLambda \) is -valid. If the procedure returns “unsat”, this means that the abstraction \(\varGamma \) is not \(T_{\mathrm {IA} \mathrm {UF}} \)-satisfiable, which means that \(\mathcal {T} \,\varphi \) itself is -unsatisfiable. By Theorem 1, \(\varphi \) is \(T_{\mathrm {BV}} \)-unsatisfiable. In contrast, when the procedure returns “sat”, a satisfying -interpretation for \(\mathcal {T} \,\varphi \) can be constructed according to Definition 1 from the \(T_{\mathrm {IA} \mathrm {UF}} \)-interpretation \({\mathcal {I}} \), in a similar fashion to the proof of Theorem 2. In turn, this interpretation can be translated to a \(T_{\mathrm {BV}} \)-interpretation following Theorem 1. Since \(\mathcal {T} \,\varphi \) is bounded, we then have the following.

Proposition 3

\(\textsc {Lazy}_{{BV}}^{{A}} \) is a decision procedure for the \(T_{\mathrm {BV}} \)-satisfiability of quantifier-free formulas.

Remark 1

At this point, it is instructive to compare the translation presented here to that by Niemetz et al. [35]. Although the solutions offered in the two works are similar, they differ on the problem they address. Niemetz et al. study the satisfiability of formulas over bit-vectors with parametric bit-widths, while this paper focuses on the regular SMT-LIB 2 theory of fixed-width bit-vectors. Since the translation to integers involves the bit-width of the terms in the input formula, parametric bit-widths require the introduction of quantifiers in the translation in practically all cases. In contrast, by considering only inputs over fixed bit-widths, our approach requires no quantifiers at all. Also, the solving technique we present here has both eager and lazy variants, with two alternative encodings in each. Instead, Niemetz et al. present only eager translations. The most successful translation there mostly resembles our eager \(\textsf {sum}\) mode, with some additional quantified axioms that correspond \( Prop (t_1,t_2) \) from Fig. 3. A counterpart to the bitwise mode was not considered there. Furthermore, their method was only evaluated on benchmarks with a single parametric bit-width due to the limited expressiveness supported by the prototype implementation. In contrast, our technique is fully implemented within the cvc5 solver.

5 Experimental Results

5.1 Implementation and Experiments

We implemented both \(\textsc {Eager}_{{BV}}^{{A}} \) and \(\textsc {Lazy}_{{BV}}^{{A}} \) in the cvc5 SMT solver and evaluated the implementation on three classes of benchmarks.Footnote 2 The eager translations are implemented in a preprocessing pass that translates the entire input formula to a formula over the SMT-LIB 2 theory of integers, without any extension. The lazy translations use the same preprocessing pass; however, the translated formulas include the operators. The CEGAR loop for is implemented as part of the non-linear extension of the arithmetic solver of cvc5.

Note that cvc5 does not have built-in support for \(\mathrm {pow2} \). For all \(\varSigma _{\mathrm {BV}} \)-operators except and this does not matter in practice since the argument to \(\mathrm {pow2} \) is a concrete constant. For the shift operators, the argument t to \(\mathrm {pow2} \) may include variables, but the value of \(\mathrm {pow2} ({t}) \) only matters when \(0\le t < k\), where k is the bit-width of the original \(\varSigma _{\mathrm {BV}} \)-term. Thus, we are able to eliminate \(\mathrm {pow2} \)-terms by enumerating a finite set of cases using \(\mathrm {ite}\)-terms.

In accordance with Sect. 4, our implementation focuses on finding and improving strategies for lemma instantiation. Another aspect of integer reasoning is the evaluation of operations over constants, especially when the constants are large, as in our experience, operations on big integers can take up to 30–40% of the overall runtime. In the experiments described below, these are handled by the CLN library [25], which is supported by cvc5. Our focus on lemma instantiation is meant to reduce how often expensive numeric operations must be invoked.

We evaluated our int-blasting approaches \(\textsc {Eager}_{{BV}}^{{A}}\) and \(\textsc {Lazy}_{{BV}}^{{A}}\) for \(A \in \{\textsf {sum}, \textsf {bitwise} \}\) on three sets of benchmarks: (1) the QF_BV benchmarks from SMT-LIB, (2) a set of benchmarks consisting of equivalence checks of bit-vector rewrite rule candidates, and (3) 35 benchmarks originating from a smart contract verification application.Footnote 3 We compared our four int-blasting configurations, denoted \( {eager_s}\), \( {eager_b}\), \( {lazy_s}\), \( {lazy_b}\), where b stands for bitwise and s stands for sum, against (1) cvc5 running its eager bit-vector solver using CaDiCaL [10] as the SAT back end, (2) Bitwuzla [31] version 0.1-202011 (the QF_BV winner of the 2020 SMT competition), (3) Yices [20] version 2.6.2 with CaDiCaL as the SAT back end (the QF_BV runner-up at the same competition), and (4) bw-ind, the prototype implementation for proving bit-width independent properties used by Niemetz et al. [35], which uses the arithmetic solver of cvc5 as a back-end, the same arithmetic solver used in our int-blasting approaches. We used bw-ind only for the second benchmark set since its support is limited to benchmarks that contain a single bit-width. We performed all experiments on a cluster with Intel Xeon CPU E5-2620 v4 CPUs with 2.1 GHz and 128 GB memory.

Table 2. Overall results on all three benchmark sets.
Fig. 4.
figure 4

Number of solved benchmarks grouped by bit-width.

5.2 Results

Table 2 summarizes the overall results for all benchmark sets. For each set and running configuration, it shows the total number of solved benchmarks (slvd), sat results (sat), unsat result (uns) and number of memory-outs (m).

QF_BV Benchmarks (SMT-LIB). The QF_BV benchmark set includes all 41,713 benchmarks from the 2020 SMT-LIB release. We used a limit of 600 s of CPU time and a memory limit of 8 GB for each solver/benchmark pair. None of the int-blasting configurations is competitive with the other bit-blasting solvers. This is as expected since the QF_BV benchmark set contains few benchmarks with bit-widths larger than 64, the target of our approach. The pspace family of QF_BV benchmarks consists of benchmarks with bit-widths ranging from \(5,\!000\) to \(30,\!000\). The more challenging benchmarks in this set, however, contain the bitwise and operator, and our int-blasting approach cannot solve them within the time limit. All four int-blasting approaches are more competitive on unsatisfiable benchmarks than satisfiable ones. This is because int-blasting relies heavily on the performance of cvc5’s procedure for non-linear integer arithmetic. This procedure is based on instantiating a set of lemma schemas [16, 41], which may show unsatisfiability quickly when useful lemmas are discovered, but may take longer to converge when the problem is satisfiable. Overall, each of our int-blasting configurations is able to solve 18 benchmarks that none of the bit-blasting approaches is able to solve; 14 of these are from the arithmetic-heavy Sage2 family, which includes a wide range of both arithmetic and bitwise operators, including shifts and bitwise and, or, and xor.

Equivalence Checks of Rewrite Rule Candidates (ECRW). The ECRW benchmark set consists of equivalence checks of rewrite rule candidates for \(T_{\mathrm {BV}} \)-terms and formulas. They were automatically generated using a state-of-the-art Syntax-Guided Synthesis (SyGuS) [2] solver implemented in cvc5 [40]. We enumerated pairs of \(\varSigma _{\mathrm {BV}} \)-terms that are equivalent for bit-vectors of bit-width 4. These pairs of terms were generated over a sub-signature of \(\varSigma _{\mathrm {BV}} \) consisting of the constants 0 and 1, the \(=\) operator, and the unsigned comparison operators and , as well as the operators , , , , , , and . In total, we generated 5, 491 distinct equivalence checks with bit-width 4. Each equivalence check was then instantiated with bit-widths 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, and 8192, resulting in a total of 54, 910 benchmarks. An important feature of the generated checks is that they exclude equivalences that are already derivable solely by the rewriter of cvc5. We used a CPU time limit of 300 s and a memory limit of 8 GB per solver/benchmark pair. For this benchmark set, our evaluation included bw-ind, whose primary purpose is to prove bit-width independent properties via bit-vectors of parametric widths. Since this benchmark set consists of fixed-width bit-vectors and not parametric ones, we added a constraint that specifies the concrete bit-width of each benchmark, by comparing it to the parametric bit-width. It is evident that bw-ind does not perform well on this benchmark set. This is expected given that this approach is the only one that makes any use of quantifiers.

On this benchmark set, all int-blasting approaches outperformed all other approaches. Figures 4a to 4c provide a more fine-grained analysis for this set by depicting the number of solved benchmarks grouped by bit-width for each solver on (a) benchmarks with applications of the bitwise operator (29%), (b) benchmarks without (71%), and (c) the full ECRW benchmark set. The bit-blasting approaches are marked with circles, while the int-blasting approaches are marked with other shapes. For each subset of benchmarks there is a bit-width k for which the best int-blasting configuration, the lazy \(\textsf {bitwise}\) mode, outperforms all other configurations and solvers: 512 for those benchmarks with , 128 for those without, and 256 for the full set. This shows that int-blasting can be a useful tool to add to the tool-box of bit-precise reasoning engines, in the presence of large bit-widths. Surprisingly, even for bit-width 16, there were benchmarks for which int-blasting performed better than bit-blasting. For example, there are 78 benchmarks of bit-width 16, without the operator that were solved by the int-blasting approaches in less than 1 s, while all the bit-blasting approaches required more than 10 s (in many of these cases, bit-blasting required more than 100 s).

Comparing the different int-blasting configurations, Fig. 4b clearly shows that for benchmarks without applications, the lazy and eager int-blasting configurations are almost bit-width independent, and perform equally well (in turn, their markings overlap in the figure). This is expected because the translations differ from one another only in the way they handle . Moreover, the -free part of our translations is actually bit-width independent, as the size of the generated terms does not depend on it, except for shift operators, which are not included in this benchmark set. The differences between the translations are visible, also as expected, for benchmarks with applications, as shown in Fig. 4a. There, the best int-blasting configuration is \( {lazy_b}\). In the presence of bitwise operators, both the eager and lazy translations introduce terms whose size does depend on the bit-width. Accordingly, we see a clear decrease in the performance of the eager translations as the bit-width increases, while little performance degradation is observable for the lazy translations. This can be explained by the fact that the eager approach introduces bit comparison lemmas or sum-based lemmas before the integer solver comes into play. In contrast, the lazy approach introduces those lemmas only if the model generated in the CEGAR loop falsifies them, so there are generally fewer terms whose size depends on the bit-width.

As for the better performance of \(\textsf {bitwise}\) compared to \(\textsf {sum}\), we conjecture that the bitwise translation outperforms the sum translation because it is a more direct translation to SAT. The sum translation relies on the linear arithmetic solver generating simple conflicts and lemmas over linear arithmetic literals that correspond to the same reasoning in a more indirect way. While this choice is not obvious, our experiments have confirmed that the former is superior.

Smart Contract Verification Benchmarks (SC). This benchmark set consists of 35 benchmarks from a smart contract verification application. They contain (linear and non-linear) arithmetic operators, bitwise operators, as well as uninterpreted functions, and reason about bit-vectors of width 256. These benchmarks originate from verification conditions that are directly produced by Certora’s verification tool for Ethereum smart contracts [15]. They encode algebraic properties of low-level methods in smart contracts (e.g., commutativity of balance updates). The application requires the generation of models, which the eager bit-blasting configuration of cvc5 does not support for uninterpreted functions. We imposed a CPU time limit of 3,600 s and a memory limit of 32 GB per solver/benchmark pair.

The int-blasting configurations are able to solve 24 benchmarks, whereas the bit-blasting solvers solve less (Bitwuzla solves 16 and Yices solves 9). In addition to solving more benchmarks in this benchmark set, the int-blasting approaches are also faster: The 24 benchmarks that are solved by int-blasting take a total of 232 s, to be solved, where 22 out of these benchmarks are solved in a total time of 20 s. This is the case for all int-blasting configurations. In contrast, Bitwuzla solves 16 benchmarks in 5,900 s, and Yices solves 9 benchmarks in 3,900 s. Notice that unsatisfiable benchmarks seem to be better suited for int-blasting, while satisfiable benchmarks are solved better with bit-blasting. This positions int-blasting as a useful complement to bit-blasting.

6 Related Work, Conclusion, and Future Work

Related Work. Earlier integer-based techniques for bit-precise reasoning focus on translating hardware register transfer level (RTL) constraints into integer linear programming (ILP) and are thus limited to the linear arithmetic subset of the theory of bit-vectors [13, 48]. Similarly, Achterberg’s PhD thesis [1] studies translations of bit-vector constraints over linear arithmetic to integers in the context of constraint programming, while bit-blasting non-linear and bitwise operators. Kafle et al. [27] present an approach based on Benders Decomposition [9] for solving modular arithmetic problems after translating them to linear integer arithmetic (LIA). Another approach to solving modular arithmetic problems that originates from software verification was studied by Vizel et al. [45], using a model checking approach. The MathSAT5 solver [19] applies a layered approach for computing Craig’s interpolants for the theory of bit-vectors by first converting the problem into an overapproximated LIA problem [24]. When that approach is unsuccessful, MathSAT5 automatically falls back to finding a propositional interpolant via bit-blasting. Earlier versions of MathSAT also utilized this approach for solving bit-vector problems [12]. A similar but more sophisticated approach [3, 4] is implemented in the Princess theorem prover [42]. Another recent LIA-based interpolation method is presented in [37]. Although similar in spirit to that of MathSAT5 [24], it is often able to recover the word-level structure from the propositional interpolant.

In contrast to [3, 13, 27, 48], we focus on general bit-vector problems, and unlike [3, 12, 13, 24, 27, 48], we translate bit-vector problems into an extension of non-linear integer arithmetic. As a result, our approach can handle all operators of the theory of bit-vectors. We present several variants of our technique, including a CEGAR-based one similar in spirit to the lazy approaches discussed above.

Alternative approaches to bit-blasting based on bit-vector reasoning and the so-called model constructing satisfiability calculus (mcSAT) [30] have shown promising results [23, 47]. Other orthogonal bit-vector-based alternatives include local search techniques which, while refutationally incomplete, are particularly effective in combination with bit-blasting [22, 32,33,34]. We reduce the amount of bit-blasting by converting bit-vector formulas to non-linear integer arithmetic formulas and relying on a DPLL(T)-based SMT approach [8] to solve them.

Our translation of bit-vector formulas to integer formulas is similar to the one for solving formulas with bit-vectors of parametric bit-width we proposed in previous work [35]. However, in this case, the bit-width is not parametric but fixed, which eliminates the need for the translation to introduce quantifiers. A more detailed comparison with that work is provided in Remark 1.

We implemented an earlier prototype of this approach in lazybv2int [49] that used our SMT solver cvc5 as a black box, via the solver-agnostic API of Smt-switch [29]. Initial evaluation led us to the conclusion that it is preferable to implement int-blasting inside cvc5, thus utilizing its efficient mechanisms such as handling of terms and rewriting.

Conclusion. We studied eager and lazy translations from bit-vector formulas to an extension of integer arithmetic, and implemented them in the SMT solver cvc5. The translations reduce arithmetic bit-vector operators as defined in the SMT-LIB 2 standard, and differ in the way they handle bitwise operators. For those, we examined sum-based and bit-based approaches. The experiments we conducted on equivalence checks for rewrite rule candidates show promising results for formulas that involve multiplications and divisions of large bit-vectors. For SMT-LIB benchmarks, our approach is less effective than state-of-the-art approaches largely based on bit-blasting, though not in all cases. Finally, the smart contracts benchmarks show that our approach provides a complement to bit-blasting, especially for unsatisfiable formulas.

Future Work. We believe that alternative approaches for bit-precise reasoning, including mcSAT, local search, and integer-based approaches, can be further developed and improved to the point where they can become a true complement to bit-blasting in applications where bit-blasting struggles to scale up. We plan to continue this line of research by studying integer-based abstractions of other bit-vector operators, in particular, the shift operators. Interestingly, our translations also generate challenging benchmarks for non-linear integer arithmetic solvers. We plan to use these benchmarks to improve non-linear integer reasoning, specifically in the presence of division and modulo operations. For that, we target a submission of such benchmarks to the SMT-LIB library.