1 Introduction

Bit-precise reasoning over bit-vector logics is important for many practical applications of Satisfiability Modulo Theories (SMT), particularly for hardware and software verification. Examples of state-of-the-art SMT solvers with support for bit-precise reasoning are Boolector [9], MathSAT [12], STP [31], Z3 [22], and Yices [25].

The theory of fixed-size bit-vector logics is investigated in several scientific works [4, 5, 13, 21, 27], and even concrete formats for specifying such bit-vector problems exist, e.g., the SMT-LIB format [3] or the BTOR format [10]. Working with non-fixed-size bit-vectors has been considered for instance in [1, 5], and more recently in [55, 56], but is not further discussed in this paper. Most industrial applications (and examples in the SMT-LIB Footnote 1) have fixed bit-width.

We investigate the complexity of solving fixed-size bit-vector formulas. Some papers propose such complexity results, e.g., in [4], the authors consider the common quantifier-free bit-vector logic and give an argument for NP-hardness of its satisfiability problem. In [13], a sublogic of the previous one is claimed to be NP-complete. Interestingly, in [14], there is a claim about the full quantifier-free logic being NP-complete, however the proposed decision procedure justifies this claim only if the bit-widths of the bit-vectors in the input formula are written/encoded in unary format. In [59, 60], the quantified case is addressed, and the satisfiability problem for this logic with uninterpreted functions is proved to be NExpTime-complete. However, the proof, similarly to the decision procedure in [14], only holds if we assume unary encoded bit-widths.

Parts of our paper already appeared as previous work [30, 41]. Apart from this, we are not aware of any work that investigates how the encoding of the bit-widths in the input affects complexity (as an exception, see [19, Page 239, Footnote 3]). In practice, the more natural and exponentially more succinct logarithmic encoding is used, such as in the SMT-LIB [3] or the BTOR [10] format. We investigate how complexity varies if we consider either a unary or a binary encoding. Note that binary encoding, throughout the whole paper, can be replaced with any other logarithmic encoding.

The present paper extends our previous work in several ways. After giving a motivation for the use of binary encoded bit-vector logics in Section 2, we specify various fixed-size bit-vector logics in detail (Section 3). While our previous papers were referring to the common syntax and semantics used in other works, e.g., [4, 5, 10, 13, 21, 27], but was never fully specified from the theoretical point of view, we now want to provide self-contained descriptions for the bit-vector logics that we are considering. Therefore, we introduce syntax and semantics for fixed-size bit-vector logics containing all common bit-vector operations as used in the SMT-LIB format.

After these preliminary definitions, we give a short overview of the existing complexity results for bit-vector logics with unary encoding in Section 4. We then introduce the concept of scalar-boundedness for bit-vector logics with binary encoding in Section 5 and give improved versions of our complexity proofs for quantifier-free bit-vector logics in Section 6. Although our previous proofs from [30, 41] are still valid, we modified and restructured our work to present those proofs in a clearer, easier-to-read, way. In Section 7, we look at the expressiveness of various bit-vector operations and analyze whether they can be used to extend some of the previously defined fragments or to give an alternative characterization of a given class.

We then revisit the quantified case in Section 8 and give new complexity results for fragments with restrictions on operations and the bit-widths of universal variables. Also, we provide a new complexity result for quantifier-free logics extended with non-recursive macros, which are allowed, for example, in the SMT-LIB format. Finally, we discuss practical considerations of our results in Section 9. A brief overview of related work is presented in [29, 42]. We then explain how our theoretic contributions can help to improve practical SMT solving.

The Appendix contains examples that make some definitions and proofs easier to understand.

2 Motivation

In practice, state-of-the-art bit-vector solvers rely on rewriting and bit-blasting. The latter is defined as the process of translating a bit-vector description (also called word-level description) into a combinatorial circuit, as in hardware synthesis. The result can then be checked by a (propositional) SAT solver.

Usually, numbers contained in a bit-vector description (e.g. the bit-widths of bit-vector variables) are encoded in a logarithmic way. When translating the original description into a circuit, all numbers are effectively replaced by their unary encoding. Bit-blasting can therefore lead to an exponential growth, if the numbers are not logarithmic in the original description size.

To illustrate this effect on a practical example, consider the following bit-vector formula in SMT-LIB syntax [3]:

(set-logic QF_BV)

(declare-fun x () (_ BitVec 1000000))

(declare-fun y () (_ BitVec 1000000))

(declare-fun z () (_ BitVec 1000000))

(assert (= z (bvadd x y)))

(assert (= z (bvshl x (_ bv1 1000000))))

(assert (distinct x y))

The first line defines the logic to be the one of quantifier-free bit-vectors. The following three lines introduce bit-vector variables x, y, and z of bit-width one million. The last three lines enforce some constraints between the variables. Basically, the formula verifies that, for an arbitrary bit-vector x of bit-width one million, there exists no bit-vector yx with x+y=x≪1.

Written to a file, this formula can be encoded with 217 bytes. Using the SMT solver Boolector (even with all rewritings switched on), bit-blasting produces a circuit of size 129 MB encoded in the actually rather compact AIGER format. Tseitin transformation results in a CNF in DIMACS format of size 843 MB. A bit-width of 10 million bits can be represented by four more bytes in the original SMT-LIB input, but could not be bit-blasted anymore with our tool-flow (due to integer overflow). As this example illustrates, checking satisfiability of bit-vector formulas through bit-blasting can suffer dramatically from the exponential growth caused by the implicit unary re-encoding of the numbers.

Obviously, its exponential nature also disqualifies bit-blasting as a sound way to prove that the satisfiability problem for (quantifier-free) bit-vector logics is in NP. In [41], we showed that deciding bit-vector logics, even without quantifiers, is much harder. It turned out to be NExpTime-complete. Informally speaking, we showed that moving from unary to binary encoding for bit-widths increases complexity exponentially and that binary encoding has at least as much expressive power as quantification. However, in [30, 41], we also proposed certain restrictions for bit-vector problems to remain in a “lower” complexity class, when moving from unary to binary encoding.

These theoretical insights as well as later practical results from [29, 42] give reason to look into bit-vector logics more closely and to provide a comprehensive framework for dealing with complexity of bit-vector logics, particularly combined with the use of a binary encoding.

3 Preliminaries

\(\mathbb {N}\) denotes the set of natural numbers {0,1,2,… }, while \(\mathbb {N}^{+}\) denotes \(\mathbb {N}\backslash \{0\}\). \(\mathbb {B} := \{0,1\}\) is the Boolean domain, thus truth values false and true are represented by 0 and 1, respectively.

Given \(n\in \mathbb {N}^{+}\), let L n denote the ceiling of the logarithm of n base 2: L n:=⌈log2n⌉.

3.1 SAT, QBF, and DQBF

Let V be a set of Boolean variables. Boolean formulas over V are defined inductively as follows: (i) x is a Boolean formula where xV; (ii) ¬ϕ 0, (ϕ 0ϕ 1), (ϕ 0ϕ 1), (ϕ 0ϕ 1), and (ϕ 0ϕ 1) are Boolean formulas where ϕ 0,ϕ 1 are Boolean formulas. A Boolean formula ϕ is satisfiable iff there exists an assignment \(\alpha : V \mapsto \mathbb {B}\) to the variables, such that ϕ evaluates to 1 under α. The Boolean satisfiability problem (SAT) is NP-complete.

The class of Quantified Boolean Formulas (QBF) is obtained by adding quantifiers to Boolean formulas. Each QBF ψ can be written in prenex normal form, i.e., as a closed formula Q.ϕ where Q is a quantifier prefixV 0V 1V 2V 3…∀V m−1V m , the V i s are pairwise disjoint sets of variables, and ϕ is a Boolean formula, which is called the matrix of ψ. A variable vV i depends on a variable v V j iff i>j. This defines a total order on the variables of ψ. A QBF is satisfiable iff there exist Skolem functions for its existential variables to make the formula evaluate to 1. The satisfiability problem for QBF is PSpace-complete [48, 57].

Instead of using totally ordered quantifiers, it is also possible to extend Boolean formulas with Henkin quantifiers [34]. Henkin quantifiers specify variable dependencies explicitly instead of using implicit dependencies defined by the quantifier order. This allows to define more general dependency constraints only requiring a partial order. Adding Henkin quantifiers to Boolean formulas results in the class of Dependency Quantified Boolean Formulas (DQBF), as first defined in [50]. Again, a DQBF can always be expressed in prenex normal form, i.e., as a closed formula Q .ϕ, where Q is a quantifier prefix

$$\forall u_{1},\dots,u_{m} \exists e_{1}(u_{1,1},\dots,u_{1,m_{1}}),\dots,e_{n}(u_{n,1},\dots,u_{n,m_{n}}) $$

where each u i,j is a universally quantified variable, \(m_{i}\in \mathbb {N}\), and the matrix ϕ is a Boolean formula. In DQBF, existential variables can always be placed after all universal variables in the quantifier prefix, since the dependencies of a certain variable are explicitly given and not implicitly defined by the order of the prefix (in contrast to QBF). The more general quantifier order makes DQBF more powerful than QBF and allows more succinct encodings. A DQBF is satisfiable iff there exist Skolem functions for its existential variables to make the formula evaluate to 1. In DQBF, the arguments for Skolem functions of an existential variable are exactly the universal variables that are explicitly specified in its Henkin quantifier. The satisfiability problem for DQBF is NExpTime-complete [49, 50]. Although we did not formally specify the dependencies of universal variables, this can be done by the use of Herbrand functions [2].

Throughout our paper, we use SAT, QBF, and DQBF to give reductions from or to certain bit-vector logics, showing inclusion or hardness for the corresponding complexity class, respectively. While SAT and QBF are considered to be prototypical complete problems for their complexity classes, DQBF is used less frequently. Another NExpTime-complete logic used in reductions in the context of unary encoded bit-vector logics [59] is Effectively Propositional Logic (EPR) [45]. However, due to its simplicity, we consider DQBF to be a better choice for our purposes.

3.2 Circuits

We distinguish between two kind of circuits: combinatorial circuits and sequential circuits. For both kinds of circuits, we stick closely to the definitions in [55]:

A combinatorial circuit with n i inputs and n o outputs is a finite acyclic directed graph with exactly n i vertices of in-degree zero and n o vertices of out-degree zero. All vertices of a non-zero in-degree have a logical function assigned to them and are called gates. All vertices of in-degree one represent a NOT-gate and vertices of greater in-degrees are either AND- or OR-gates. Given boolean values for the inputs, each gate can be evaluated in the natural way according to the logical function it represents. As already noted in the introduction, this kind of representation of a bit-vector formula is created during bit-blasting. For every combinatorial circuit, a corresponding set of n o SAT formulas with n i variables can be constructed naturally.

A (clocked) sequential circuit SC consists of a combinatorial circuit C and a set of D-type flip-flops. The data input of each flip-flop is connected to a unique output of C and the Q-output of each flip-flop is connected to a unique input of C. Such a backward-connected output-input pair will be denoted as a state variable. The circuit is assumed to work in clock pulses. In every clock pulse, it takes the values of its inputs and computes the output values. Via the flip-flops these values are routed back to the inputs for the use in the next clock cycle. Inputs of C that do not receive their value from an output through a flip-flop will be called the inputs of the sequential circuit SC and outputs of C that do not pass their value to an input of a flip-flop will be called the outputs of the sequential circuit SC.

All the state variables are assumed to be provided with initial values stored in the flip-flops before the first clock cycle. The input variables need to be provided values from outside the system at every clock cycle and the output variables produce a new output at every clock cycle. A sequential circuit can be used to recognize languages. A word \(w \in (\{0,1\}^{n_{i}})^{+}\) is said to be accepted by a sequential circuit SC with one output o, iff the value of o is 1 after the last clock cycle when w is given as input, one letter each clock cycle.

Symbolic model checking for sequential circuits refers to the problem of checking whether the language for a given sequential circuit is empty. It is known to be PSpace-complete [51, 52, 54].

3.3 Fixed-Size Bit-Vector Logics

A bit-vector, or word, is a sequence of bits, i.e., Boolean values. Such a sequence may be either infinite or of a fixed size \(n\in \mathbb {N}^{+}\), where n is called the bit-width of the bit-vector. While non-fixed-size bit-vectors have been considered for example in [1, 5, 55, 56], working with fixed-size bit-vectors is the focus of this paper.

Let D n denote the set of all bit-vectors of bit-width n. Given dD n , the ith bit of d is denoted by d [i], where \(i\in \mathbb {N}\) and i<n. Using vector notation, d is written as (d [n−1] ,…,d [1] ,d [0] ), i.e., the most significant bit standing on the left-hand side and the least significant bit on the right-hand side. Sometimes we omit parentheses and commas.

Syntax and semantics of fixed-size bit-vector logics do not differ much in the literature [4, 5, 13, 21, 27]. Concrete formats for specifying bit-vector problems also exist, e.g., the SMT-LIB format [3] or the BTOR format [10]. In the subsequent sections, we give the necessary definitions, in a more general way than in the works cited above, in order to propose a uniform and general framework using any set of bit-vector operations.

3.3.1 Syntax

The main objective of this section is to define bit-vector formulas. As it turns out in Definition 2 and 3, such a formula, informally speaking, is a combination of bit-vector operations on some atomic elements, each of which can be represented either as a bit-vector or an integer, which we call a scalar. Let us emphasize that scalars in formulas are not represented as bit-vectors. Note that the bit-width of a bit-vector is also a scalar.

A bit-vector operator symbol (or operator for short) represents an operation that takes some bit-vector operands and scalar operands, and computes a single bit-vector. Given an arbitrary operator set, one has to specify syntactic rules for using the operators. Definition 1 of a signature captures these rules by providing three properties for each operator: (1) An operator is given an arity, which is a pair of numbers that specify the number of bit-vector operands and the number of scalar operands, respectively. For instance, the arithmetic operator addition has 2 bit-vector and 0 scalar operands, while extraction has 1 bit-vector and 2 scalar operands. (2) Since there usually exist restrictions on what kind of operands are legal to use with an operator, a signature has to specify a condition on the bit-widths and scalar values of operands. For instance, the operands of addition must be of the same bit-width; the scalar operands i,j of extraction must be less than the bit-width of the bit-vector operand and ij. (3) A bit-width of the resulting bit-vector is assigned to each legal combination of bit-widths and scalar values of operands.

Definition 1 (Signature)

A signature for an operator set O p is defined as a set Σ O p :={〈a r i t y o ,c o n d o ,w i d o 〉 | oO p}, where

  • \(arity_{o} \in \mathbb {N}\times \mathbb {N}\);

  • \(cond_{o}: (\mathbb {N}^{+})^{k} \times \mathbb {N}^{l} \mapsto \mathbb {B}\) where 〈k,l〉:=a r i t y o ;

  • \(wid_{o}: Par_{o} \mapsto \mathbb {N}^{+}\) where

    $$Par_{o} := \left\{ p \in (\mathbb{N}^{+})^{k} \times \mathbb{N}^{l} \ | \ \langle k,l \rangle := arity_{o},\ cond_{o}(p) \right\}. $$

Table 1 shows the set of the most common operators provided by the SMT-LIB format [3] and the literature [4, 5, 13, 21, 27], such as bitwise operators (negation, and, or, xor, etc.), relational operators (equality, unsigned/signed less than, unsigned/signed less than or equal, etc.), arithmetic operators (addition, subtraction, multiplication, unsigned/signed division, unsigned/signed remainder, etc.), shifts (left shift, logical/arithmetic right shift), extraction, concatenation, zero/sign extension, etc. Let O p denote the common operator set given in Table 1. O p includes all bit-vector operators used in the SMT-LIB providing a collection of the most common bit-vector operators in software and hardware verification; other frameworks, like Boolector and Z3, provide additional useful operators, e.g., reduction operators and overflow operators. Let Σ O p denote the common signature for O p. Note that Table 1 specifies some of the syntactic properties provided by Σ O p in an implicit way: the arity is completely, the condition is partly implicit.

Table 1 Syntax (signature) for common bit-vector operators

The simplest bit-vector expressions, or terms, are the variables and constants, as Definition 2 shows. Operators can be applied to bit-vector terms which obey the syntactic rules given by the signature of the operator set. While operators have a priori fixed syntax and semantics, uninterpreted functions can be introduced on demand.

Definition 2 (Term)

A bit-vector term t of bit-width \(n\in \mathbb {N}^{+}\) is denoted by t [n]. A term over a signature Σ O p is defined inductively as follows:

ᅟ ᅟ

Let us emphasize that, in a term, bit-widths are specified explicitly only for constants, variables, and uninterpreted functions. In all other cases, the bit-width is implicit, i.e., it can be derived from the bit-widths of the operands of operations. In the following, we may omit explicit bit-widths and parentheses if they can be concluded from the context.

Definition 3 (Formula)

A bit-vector formula is an expression Q.t [1], where t [1] is a bit-vector term, Q is a quantifier prefix \(Q_{0}x_{0}{\!}^{[n_{0}]}Q_{1}{\dots } Q_{k}x_{k}{\!}^{[n_{k}]}\), each Q i ∈{∀,∃}, and each \(x_{i}^{[n_{i}]}\) is a bit-vector variable. We call t the matrix of the formula.

If only existential quantifiers appear in a formula, we may omit the quantifier prefix and refer to this kind of formula as a quantifier-free one. In the same way, we refer to a formula as being quantified, if it contains universal quantifiers.

Without loss of generality, we can assume that variables and uninterpreted functions are identified by their unique names. In a formula, therefore, each variable and each uninterpreted function must be used in a consistent way, regarding its bit-width and the bit-widths of its arguments.

In the literature, most of the approaches distinguish between a bit-vector level and a Boolean level within a bit-vector formula, by allowing only relational operators (i.e., operators with result of bit-width 1) at the Boolean level [4, 11, 13, 21, 27]. Note that, in our definitions, there is no such explicit distinction. Therefore, for example, relational operators are allowed to be embedded in concatenations or arithmetic operations. However, by introducing the so-called flat form in Definition 8, the same separation of a Boolean level and a bit-vector level can be made in any bit-vector formula over Σ O p , assuming the common interpretation of Σ O p , defined in Section 3.3.2.

3.3.2 Semantics

Given a signature Σ O p and an operator oO p where 〈k,l〉:=a r i t y o , each p:=(n 1,…,n k ,i 1,…,i l )∈P a r o can be mapped to a set of possible operands (bit-vectors and scalars) and also to a set of possible results (bit-vectors). These two sets, called the domain and the range of p, are defined as follows:

$$\begin{array}{@{}rcl@{}} Dom_{o}(p) & := & D_{n_{1}} \times {\dots} \times D_{n_{k}} \times \{i_{1}\} \times {\dots} \times \{i_{l}\} \\ Range_{o}(p) & := & D_{wid_{o}(p)} \end{array} $$

In order to evaluate a term or formula, it is first necessary to interpret all the operators we use (Definition 4), and then to assign domain elements to free variables and to interpret uninterpreted functions (Definition 5).

Definition 4 (Interpretation)

An interpretation of a signature Σ O p is defined as a set \(\widehat {Op}\) of functions, consisting of an \(\widehat {o}\) for each oO p, such that

$$\widehat{o}: \bigcup_{p\in Par_{o}} Dom_{o}(p) \ \mapsto \bigcup_{p\in Par_{o}} Range_{o}(p) $$

where

$$\forall p\in Par_{o}, d\in Dom_{o}(p) \ . \ \widehat{o}(d) \in Range_{o}(p) $$

Let \(\widehat {\mathbf {Op}}\) denote the common interpretation of Σ O p , detailed in Table 2, based on [13, 16, 27] and the SMT-LIB. Note that Table 2 uses a notation that is introduced by the following definitions.

Table 2 Semantics (interpretation) for common bit-vector operators

Definition 5 (Model)

\(M := \langle \alpha ,\widehat {F} \rangle \) is a model for a formula Φ where

  • α is an assignment, i.e., it assigns an element of D n to each free variable x [n] in Φ;

  • \(\widehat {F}\) is a set of interpretations \(\widehat {f}: D_{n_{1}} \times {\dots } \times D_{n_{k}} \mapsto D_{n}\) of all uninterpreted functions \(f^{[n]}\left (t_{1}{\!}^{[n_{1}]},\dots ,t_{k}{\!}^{[n_{k}]} \right )\) in Φ.

To facilitate the presentation, similar to [13, 27], we define an auxiliary bijective meta-function n a t n :D n ↦[0,2n−1]. Given a bit-vector dD n , \(nat_{n}(d) := {\sum }_{i=0}^{n-1} 2^{i}d\!\left [i\right ]\). We also introduce the inverse meta-function \(bv_{n} := nat_{n}^{-1}\).

Definition 6 (Evaluation)

Given a signature Σ O p , a formula Φ over Σ O p , an interpretation \(\widehat {Op}\) of Σ O p , and a model \(M := \langle \alpha ,\widehat {F} \rangle \) for Φ, Φ can be evaluated to either 0 or 1, by using the inductive definition of the evaluation function \([{\kern -2.3pt}[ \cdot ]{\kern -2.3pt}]^{\widehat {Op}}_{M}\), as follows:

ᅟ ᅟ

As mentioned before, the common interpretation \(\widehat {\mathbf {Op}}\) is given in Table 2. In the table, we omit the interpretation and the model for evaluation. Furthermore, we use two abbreviations:

$$\begin{array}{@{}rcl@{}} \textit{msb}\left( t^{[n]}\right) & \ := \ & \left[{\kern-2.3pt}[ t \right]{\kern-2.3pt}]\!\left[n-1\right] \\ \textit{abs}\left( t^{[n]}\right) & \ := \ & \left\{ \begin{array}{ll} - t & \text{if } msb(t) \\ t & \text{otherwise} \\ \end{array} \right. \end{array} $$

In the Appendix, we use the notation t [n] d, where dD n , as an alternative for [[t [n]]]=d, assuming an appropriate model for t, implied by the context.

A formula Φ (over Σ O p ) is satisfiable over an interpretation \(\widehat {Op}\) (of Σ O p ) iff there exists a model M for Φ such that \(\left [{\kern -2.3pt}[ {\Phi } \right ]{\kern -2.3pt}]^{\widehat {Op}}_{M} = 1\). M is called a satisfying model for Φ over \(\widehat {Op}\).

Definition 7 (Bit-blasting)

Bit-blasting, or flattening [44], a bit-vector formula Φ means to construct an equisatisfiable Boolean formula ϕ. Φ and ϕ are equisatisfiable over an interpretation \(\widehat {Op}\) iff the following condition holds: there exists a satisfying model for Φ over \(\widehat {Op}\) iff there exists a satisfying assignment for ϕ.

Bit-blasting techniques represent bit-vector variables as strings of Boolean variables and encode bit-vector operations as corresponding Boolean circuits. It is a well-known fact that for all common operations, interpreted by \(\widehat {\mathbf {Op}}\), a corresponding polynomial-size (in the bit-widths of operands) Boolean circuit can be constructed. This fact plays an important role in several of our proofs.

3.3.3 Logics and Encodings

For the rest of this paper, we fix the operator set we use to O p with the signature Σ O p (Table 1) and the interpretation \(\widehat {\mathbf {Op}}\) (Table 2), and we refer to this framework as the Common Operator Framework.

By considering bitwise operators in the Boolean case (i.e., for bit-width 1) as logical connectives, the same separation of a Boolean level and a bit-vector level can be made in any bit-vector formula as in most approaches in the literature [4, 11, 13, 21, 27]. Notice, however, that relational operations can occur not only at the Boolean level, but even below that, due to Definition 2, which allows any operations to be nested. In order to be compatible with the above-mentioned two-level approaches, we introduce a normal form for bit-vector formulas as follows:

Definition 8 (Flat Form)

A bit-vector formula Φ is in flat form iff it does not contain any nested relational operations.

It is easy to see that any bit-vector formula Φ can be translated into flat form with only linear growth in formula size. For each nested relational operation in Φ, iteratively replace the innermost one \(o(t_{1}{\!}^{[n_{1}]},\dots ,t_{k}{\!}^{[n_{k}]},i_{1},\dots ,i_{l})\) by introducing a new (Tseitin) variable t s [1] existentially quantified at the innermost prefix position and adding the constraint \(ts{\!}^{[1]} \Leftrightarrow o(t_{1}{\!}^{[n_{1}]},\dots ,t_{k}{\!}^{[n_{k}]},i_{1},\dots ,i_{l})\) to the formula (i.e., conjuncting it with the matrix).

In this paper, we investigate the following four common bit-vector logics, as well as fragments and extensions thereof:

QF_BV::

quantifier-free bit-vector formulas without uninterpreted functions;

QF_UFBV::

quantifier-free formulas allowing uninterpreted functions;

BV::

formulas allowing quantification, but no uninterpreted functions;

UFBV::

formulas allowing quantification and uninterpreted functions.

We distinguish between logics that use a unary or a binary encoding on scalars appearing in formulas. Recall that binary encoding can be replaced with any other logarithmic encoding. Note that a scalar can appear either as a bit-width or a scalar operand. The value c of a bit-vector constant c [n] is always encoded in binary format, since it represents a bit-vector.

Definition 9 (Logic with Unary and Binary Encoding)

Given a bit-vector logic \(\mathcal {L}\), let \(\mathcal {L}{}1\) and \(\mathcal {L}{}2\) denote the logic \(\mathcal {L}\) using unary and binary encoding on all the scalars in formulas, respectively.

In the rest of this paper, we investigate the complexity of the satisfiability problem for QF_BV1, QF_UFBV1, BV1, UFBV1, QF_BV2, QF_UFBV2, BV2, and UFBV2. For this, we define the size of a formula.

Definition 10 (Formula Size)

Suppose we are given a bit-vector logic \(\mathcal {L}\) and a formula \({\Phi }\in \mathcal {L}\), with \({\Phi } := Q_{0}x_{0}{\!}^{[n_{0}]}Q_{1}x_{1}{\!}^{[n_{1}]}{\dots } Q_{k}x_{k}{\!}^{[n_{k}]}.t^{[1]}\). The size of Φ is defined as \(\left |{\Phi }\right | := \left |x_{0}{\!}^{[n_{0}]}\right | + {\dots } + \left |x_{k}{\!}^{[n_{k}]}\right | + \left |t^{[1]}\right |\).

The expression |t [n]| denotes the size of a term t [n] and is defined as follows:

ᅟ ᅟ

4 Logics With Unary Encoding

First, we consider bit-vector logics with unary encoding. The results of this section can also be found in our previous work [41].

Without uninterpreted functions nor quantification, i.e., for QF_BV1, the following complexity result can be shown (for partial results and related work see also [4] and [13]):

Proposition 1

QF_BV1 is NP-complete.Footnote 2

Proof

Recall that QF_BV1 uses the Common Operator Framework. Therefore, by bit-blasting, QF_BV1 can be (polynomially) reduced to Boolean formulas, for which the satisfiability problem (SAT) is NP-complete. The other direction follows from the fact that Boolean formulas are actually QF_BV1 formulas with terms of bit-width 1. i.e., the class of Boolean formulas is a subset of QF_BV1. □

Adding uninterpreted functions to QF_BV1 does not increase complexity:

Proposition 2

QF_UFBV1 is NP-complete.

Proof

In a quantifier-free formula, uninterpreted functions can be eliminated by replacing each occurrence with a new bit-vector variable and adding (at most quadratic many) Ackermann constraints (see, e.g., [44, Chapter 3.3.1]). Therefore, QF_UFBV1 can be polynomially translated into QF_BV1. The other direction follows from the fact that QF_BV1⊂QF_UFBV1. □

Adding quantifiers to QF_BV1 yields the following complexity (see also [19]):

Proposition 3

BV1 is PS pace-complete.

Proof

By bit-blasting, BV1 can be reduced to Quantified Boolean Formulas (QBF), which is PSpace-complete. Hardness follows from the fact that QBF⊂BV1 (following the same argument as in Proposition 1). □

Adding quantifiers to QF_UFBV1 increases complexity exponentially:

Proposition 4

UFBV1 is NE xpTime-complete (see [59]).

Proof

The Effectively Propositional Logic (EPR), is a common NExpTime-complete [45] logic, and can be reduced to UFBV1 [59, Theorem 7]. For completing the other direction, apply the reduction in [59, Theorem 7] combined with bit-blasting of the bit-vector operations. □

5 Scalar-Bounded Problems

For some of our remaining complexity results, we apply the concept of re-encoding scalars from binary to unary format. Due to the nature of these encodings, this process can lead to an exponential growth in formula size for the general case. However, this exponential growth can be avoided sometimes.

In [41], we introduced the concept of bit-width bounded bit-vector problems. In this section, we generalize this concept by introducing the concept of scalar-boundedness, a sufficient condition for bit-vector problems to remain in the “lower” complexity class, when re-encoding scalars from binary to unary format. This condition tries to capture the bounded nature of scalars in certain problems.

Note that, in any bit-vector formula, there has to be at least one scalar, due to the fact that there has to be at least one term with explicit specification of its bit-width (as a scalar).Footnote 3 Given a formula Φ, let m a x s c l (Φ) denote the maximal scalar in Φ and, furthermore, let c n t s c l (Φ) denote the number of scalars in Φ.

Definition 11 (Scalar-Bounded Formula Set)

An infinite set S of bit-vector formulas is (polynomially) scalar-bounded, iff there exists a polynomial function \(p:\mathbb {N}\mapsto \mathbb {N}\) such that ∀Φ∈S. max s c l (Φ)≤p(cnt s c l (Φ)).

Proposition 5

Given a scalar-bounded set S of formulas with binary encoded scalars, any Φ∈S grows polynomially when re-encoding the scalars to unary format.

Proof

Let Φ denote the formula obtained through re-encoding scalars in Φ to unary format. For the size of Φ, the following upper bound holds: |Φ|≤cnt s c l (Φ)⋅max s c l (Φ)+|Φ|. Note that c n t s c l (Φ)⋅max s c l (Φ) is an upper bound on the sum over the sizes of all the scalars in Φ. The second term, |Φ|, represents an upper bound for the part of Φ that does not contain any scalars. Since S is scalar-bounded, it holds that

$$\begin{array}{@{}rcl@{}} |{\Phi}^{\prime}| & \ \leq \ & \textit{cnt}_{\mathsf{scl}}\left( {\Phi} \right)\cdot\textit{max}_{\mathsf{scl}}\left( {\Phi} \right) + |{\Phi}| \\ & \ \leq \ & \textit{cnt}_{\mathsf{scl}}\left( {\Phi} \right)\cdot p(\textit{cnt}_{\mathsf{scl}}\left( {\Phi} \right)) + |{\Phi}| \ \leq \ |{\Phi}|\cdot p(|{\Phi}|) + |{\Phi}| \end{array} $$

where p is a polynomial function. Therefore, the size of Φ is polynomial in the size of Φ. □

By applying this proposition to the logics of Section 3.3.3 together with the results from Section 4, we get:

Corollary 1

Suppose we are given a scalar-bounded set S of bit-vector formulas. If S⊆QF_BV2 (and even if S⊆QF_UFBV2), then S∈ NP. If S⊆BV2, then S∈ PS pace. If S⊆ UFBV2, then S∈ NE xpTime.

6 Quantifier-Free Logics with Binary Encoding

Our main contribution in [30, 41] was to give complexity results for bit-vector logics with the more common binary encoding in the general case (i.e., for sets of formulas that are not scalar-bounded). In this section, we present modified versions of our proofs for the quantifier-free logics and restructured our results in order to give a better overall picture.

First we introduce our main complexity results as theorems, starting with the full logic of QF_BV2 in Theorem 1, and continuing with three fragments of QF_BV2 in Theorem 2, 3, 4. All these theorems reference separate lemmas, which we introduce afterwards.

Theorem 1

QF_BV2 is NE xpTime-complete [41].

Proof

It is easy to see that QF_BV2∈N ExpTime, since a QF_BV2 formula can be translated exponentially to QF_BV1∈ NP (Proposition 1), by applying a simple unary re-encoding to all the scalars in the formula. NExpTime-hardness of QF_BV2 is a direct consequence of Lemma 1, in which a fragment of QF_BV2 is proved to be NExpTime-hard. □

Note that UFBV1 and QF_BV2 have the same complexity. This shows that, informally speaking, binary encoding on scalars has the same expressive power as quantification and uninterpreted functions altogether.

In [30], we investigated the complexity of the satisfiability problem for the following three fragments of QF_BV2, which only allow a restricted set of bit-vector operations in formulas:

QF_BV2c ::

only bitwise operations, equality, and left shift by constant, i.e., t [n]c [n] where c is a constant, are allowed.

QF_BV2≪1::

only bitwise operations, equality, and left shift by 1, i.e., t [n]≪1[n], are allowed.

QF_BV2 b w ::

only bitwise operations and equality are allowed.

Theorem 2

QF_BV2 ≪c is NE xpTime-complete [30].

Proof

In Lemma 1, we give a reduction from DQBF (which is NExpTime-complete) to QF_BV2c . This shows the NExpTime-hardness of QF_BV2c . The fact that QF_BV2c N ExpTime directly follows from Theorem 1. □

Theorem 3

QF_BV2 ≪1 is PS pace-complete [30].

Proof

In Lemma 2, we give a reduction from QBF (which is PSpace-complete) to QF_BV2≪1. This shows the PSpace-hardness of QF_BV2≪1. In Lemma 3, we then prove PSpace-inclusion by giving a reduction from satisfiability for QF_BV2≪1 to the model checking problem for sequential circuits. Symbolic model checking for sequential circuits is PSpace-complete as well [51, 52, 54]. □

Also note that this theorem has an important practical aspect. It allows us to use symbolic model checkers (see the hardware model checking competition) for solving these restricted bit-vector problems instead of using SAT solvers after an exponential explosion through bit-blasting. This is further discussed in Section 9.

Theorem 4

QF_BV2 b w is NP-complete [30].

Proof

Since Boolean formulas are a subset of QF_BV2 b w , NP-hardness follows directly. To show that QF_BV2 b w ∈ NP, we give a reduction from QF_BV2 b w to a scalar-bounded set of formulas S⊂QF_BV2 in Lemma 4. The claim then follows from Corollary 1. □

As already hinted in Proposition 2, adding uninterpreted functions to all quantifier-free logics we discussed so far does not affect complexity. We formalize this in the following proposition:

Proposition 6

QF_UFBV2 and QF_UFBV2 ≪c are NE xp T ime-complete, QF_UFBV2 ≪1 is PS pace-complete, and QF_UFBV2 b w is NP-complete [30, 41].

Proof

Apply the same arguments as were used in Proposition 2. □

As we outlined above, now we propose our main lemmas, referenced in the previous theorems.

Lemma 1

DQBF can be reduced to QF_BV2 ≪c [30, 41].

Proof

The basic idea is to use bit-vector expressions to encode function tables in an exponentially more succinct way, which then allows us to characterize independence of an existential variable from a particular universal variable in a polynomial way.

In the proof, we apply bit masks of the form

$$\mathit{binmagic}\left( 2^{m},\ 2^{n}\right) := \overbrace{\underbrace{0{\dots} 0}_{2^{m}}\underbrace{1\dots 1}_{2^{m}}\dots\underbrace{0{\dots} 0}_{2^{m}}\underbrace{1{\dots} 1}_{2^{m}}}^{2^{n}} $$

Note that these bit masks correspond to the so-called binary magic numbers (or magic masks in [39, p. 141]), and can arithmetically be calculated in the following way (actually as the result of a geometric sum):

$$\begin{array}{@{}rcl@{}} \mathit{binmagic}\left( 2^{m},\ 2^{n}\right) & := & {2^{(2^{n})}-1} \over {2^{(2^{m})}+1} \end{array} $$

In order to reformulate this definition in terms of bit-vectors, (i) the numerator can be written as \(\sim \!0^{[2^{n}]}\), (ii) \(2^{(2^{m})}\) as 1≪2m, and (iii) the resulting binary magic number as a bit-vector variable \(b^{[2^{n}]}\):

$$\begin{array}{@{}rcl@{}} b^{[2^{n}]} & \ = \ & \sim\!0^{[2^{n}]} \ /_{\mathbf{u}} \ \left( (1 \ll 2^{m}) + 1 \right) \\ b \cdot \left( (1 \ll 2^{m}) + 1 \right) & \ = \ & \sim\!0^{[2^{n}]} \\ (b \ll 2^{m}) + b & \ = \ & \sim\!0^{[2^{n}]} \end{array} $$

Addition can be eliminated easily as follows, by using two’s complement representation for −1 and −b:

$$\begin{array}{@{}rcl@{}} (b \ll 2^{m}) + b & \ = \ & -1 \\ b \ll 2^{m} & \ = \ & - 1 - b \\ b \ll 2^{m} & \ = \ & - 1 + \sim\! b + 1 \\ b \ll 2^{m} & \ = \ & \sim\! b \end{array} $$

We now use the binary magic numbers to create a certain set of fully-specified exponential-size bit-vectors by using a polynomial expression, due to binary encoding on scalars. Afterwards, we then formally point out the well-known fact that those bit-vectors correspond exactly to the set of all assignments. By adding constraints on those bit-vectors, we can then use a polynomial-size bit-vector formula for cofactoring Skolem-functions in order to express independency constraints. □

First, we describe the reduction, then we show that the reduction is polynomial, and, finally, that it is correct. An example can be found in Appendix A.

The reduction

Let ψ:=Q.ϕ denote a DQBF with quantifier prefix Q and matrix ϕ. Further, let u 0,…,u n−1 and \(e_{0},\dots ,e_{n^{\prime }-1}\) denote all the universal and existential variables that occur in Q, respectively. Translate ψ to a QF_BV2c formula Φ by eliminating the quantifier prefix and translating the matrix ϕ as follows:

Step 1.:

Replace all Boolean constants 0 and 1 with \(0^{[2^{n}]}\) and \(\sim \!0^{[2^{n}]}\), all Boolean universal variables u m and existential variables \(e_{m^{\prime }}\) with bit-vector variables \(U_{m}{\!}^{[2^{n}]}\) and \(E_{m^{\prime }}{\!}^{[2^{n}]}\), and all logical connectives with corresponding bitwise bit-vector operators (e.g., ∧ with & ). Let \(t^{[2^{n}]}\) denote the bit-vector term generated so far. Extend it to the formula \(t = \ \sim \!0^{[2^{n}]}\). We refer to this as Φ0.

Step 2.:

We now construct Φ1 by adding new constraints to Φ0. For each u m ∈{u 0,…,u n−1}, in order to assign a binary magic number to U m , add the following equality (i.e., conjunct it with the current formula):

$$\begin{array}{@{}rcl@{}} U_{m} \ll 2^{m} & \ = \ & \sim\! U_{m} \end{array} $$
Step 3.:

Next, we construct Φ2 by adding another set of constraints to Φ1. For each existential variable \(e_{m^{\prime }} \in \{e_{0},\dots ,e_{n^{\prime }-1}\}\), depending on the universal variables \(Deps(e_{m^{\prime }})\subseteq \{u_{0},\dots ,u_{n-1}\}\), and for each \(u_{m}\notin Deps(e_{m^{\prime }})\), add the following equality:

$$\begin{array}{@{}rcl@{}} E_{m^{\prime}} \ \& \ \sim\! U_{m} & \ = \ & (E_{m^{\prime}} \ll 2^{m}) \ \& \ \sim\! U_{m} \end{array} $$
(1)

Finally, we define Φ:=Φ2.

Polynomiality

Note that all the scalars and constants in Φ are encoded in binary form. Therefore, exponential bit-widths and constants (2n and 2m) are encoded into linear many (n and m) binary digits. We now show that each reduction step results only in polynomial growth of the formula size.

Step 1 may introduce additional bit-vector constants to the formula and adds variables \(U_{m}^{[2^{n}]}, E_{m^{\prime }}^{[2^{n}]}\). The total number of elements is bounded by the size of the input. All bit-widths are 2n and, therefore, the resulting formula is bounded quadratically in the input size. Step 2 adds n equalities as constraints. Again, all bit-widths are 2n. Thus, the size of the added constraints is bounded quadratically in the input size. Step 3 adds at most n constraints for each existential variable. All bit-widths are 2n. Therefore, the size is bounded cubically in the input size.

Correctness

In order to show that the original DQBF ψ and the resulting bit-vector formula Φ are equisatisfiable we consider the individual steps separately.

In Step 1, we used the matrix ϕ of ψ to create a bit-vector formula with the same underlying structure which is true iff each row evaluates to 1. Since all the bits of bit-vectors in Φ0 are independent of each other and there are no additional constraints on the bit-vector variables, Φ0 is satisfiable iff the Boolean formula ϕ is satisfiable.

Now consider the bit-vector variables U m after constructing Φ1 by adding the constraints of Step 2. In the following, we formalize the well-known fact that the combination of all the U m s corresponds exactly to all possible assignments to the universal variables of ψ. By construction, all bits of U m are fixed to some constant value. Additionally, for every bit-index b i ∈[0,2n−1], there exists a bit-index b j ∈[0,2n−1] such that

$$\begin{array}{@{}rcl@{}} \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] & \neq& \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{j}\right] \ \text{ and} \end{array} $$
(2a)
$$\begin{array}{@{}rcl@{}} \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i}\right] & =& \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{j}\right], \ \forall k \neq m. \end{array} $$
(2b)

Actually, we can define b j in the following way (considering the 0th bit the least significant):

$$\begin{array}{@{}rcl@{}} b_{j} := \left\{\begin{array}{ll} b_{i} - 2^{m} & \text{if } \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 0 \\ b_{i} + 2^{m} & \text{if}\, \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 1 \end{array} \right. \end{array} $$

By defining b j this way, (2a) and (2b) both hold, which can be seen as follows. Let R(c,l) be the bit-vector of length l with each bit set to the Boolean constant c. Equation (2a) holds, since, due to construction, U m consists of 2n−1−m concatenated bit-vector fragments 0…01…1=R(0,2m)R(1,2m) (with both 2m zeros and 2m ones). Therefore, it is easy to see that

$$\begin{array}{c} \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] \neq \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i} - 2^{m}\right] \text{ and } \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] \neq \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i} + 2^{m}\right] \text{ holds if} \\ \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 0 \text{ and } \left[{\kern-2.3pt}[ U_{m} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 1, \text{ respectively}. \end{array} $$

With a similar argument, we can show that (2b) holds:

$$\begin{array}{c} \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i} - 2^{m}\right] \text{ and } \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i} + 2^{m}\right] \text{ holds if} \\ \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 0 \text{ and } \left[{\kern-2.3pt}[ U_{k} \right]{\kern-2.3pt}]\!\left[b_{i}\right] = 1, \text{ respectively}, \end{array} $$

since b i −2m and b i +2m are located either still in the same half or already in a concatenated copy of a R(0,2k)R(1,2k) fragment, if km.

Now, consider all possible assignments to the universal variables of our original DQBF ψ. For a given assignment α∈{0,1}n, the existence of such a previously defined b j for every U m and b i allows us to iteratively find a b α such that ([[U 0]] [b α ],…,[[U n−1]] [b α ])=α. Thus, we have a bijective mapping from the universal assignments α for ψ to the bit-indices b α for Φ1. Up to this point, each bit-vector \(e_{m^{\prime }}\) can basically still take \(2^{(2^{n})}\) different values in Φ1. The value of each individual bit \(\left [{\kern -2.3pt}[ E_{m^{\prime }} \right ]{\kern -2.3pt}]\!\left [b_{\alpha }\right ]\) corresponds to the value that \(e_{m^{\prime }}\) takes under a given universal assignment α∈{0,1}n. Note that, without any further restriction, there is no connection between the different bits of \(e_{m^{\prime }}\) and, therefore, the bit-vector represents an arbitrary Skolem-function for \(e_{m^{\prime }}\). It may have different values for all universal assignments and thus would allow \(e_{m^{\prime }}\) to depend on all universal variables. Consequently, Φ1 is satisfiable iff the QBF \(\forall u_{1},\dots ,u_{n-1} \exists e_{1},\dots ,e_{n^{\prime }-1}. \phi \) is satisfiable.

In Step 3, we rule out all those assignments to the \(e_{m^{\prime }}\)s that correspond to Skolem-functions which do not respect the dependency scheme of ψ. Whenever \(e_{m^{\prime }}\) does not depend on a universal variable u m , we add the constraint of (1). In DQBF, independence can be formalized in the following way: \(e_{m^{\prime }}\) does not depend on u m if \(e_{m^{\prime }}\) has to take the same value in the case of all pairs of universal assignments α,β∈{0,1}n where α[k]=β[k] for all km. Exactly this is enforced by our constraint. Looking at the corresponding bit-indices b α and b β for α and β, respectively, our constraint for independence ensures that [[E]] [b α ]=[[E]] [b β ]. More precisely, (1) ensures that the positive and negative cofactors of the Skolem-function for \(e_{m^{\prime }}\) with respect to an independent variable u m have the same value. Having added those constraints, Φ2 is now respecting the dependency scheme and therefore Φ is satisfiable iff the original DQBF ψ is satisfiable.

Lemma 2

QBF can be reduced to QF_BV2 ≪1 [30].

Proof

To show the PSpace-hardness of QF_BV2≪1, we give a reduction from QBF, similar to the one from DQBF to QF_BV2c that we used in Lemma 1.

For our reduction, we again use the binary magic numbers. Note that, in Lemma 1, we used left shift by constant to construct the binary magic numbers. This is not permitted in QF_BV2≪1. We therefore give an alternative construction of the binary magic numbers using only bitwise operations, equality, and left shift by 1.

Let \(b_{0}{\!}^{[2^{n}]},\dots ,b_{n-1}{\!}^{[2^{n}]}\) be n initially unconstrained bit-vector variables. By adding certain constraints, we want to ensure that the only possible value the variables can take are those of the binary magic numbers. For the following argument, consider the bit-vector variables \(b_{0}{\!}^{[2^{n}]},\dots ,b_{n-1}{\!}^{[2^{n}]}\) as column vectors in a matrix \(B^{[2^{n} \times n]}\). Written next to each other in this way, the matrix formed by the binary magic numbers would be uniquely determined by the following property: If each row of B is interpreted as a number 0≤c<2n in binary representation, the next row is equal to c+1. The rows of B therefore represent a counter from 0 to 2n−1. We can capture this fact by adding the following n constraints, with m∈{0,…,n−1}:

$$\begin{array}{@{}rcl@{}} \left( \bigwedge_{0 \leq i < m} b_{i}\right) \oplus b_{m} \ = \ b_{m} \ll 1 \end{array} $$

The left side of each constraint considers one specific column of B (i.e. one index of the counter) and the value of each position will change iff all columns to the right are equal to 1 (i.e. the lower indices of the counter generate an overflow). In this sense, the left sides of all constraints increment the counter value corresponding to a row of B. The right sides of all constraints ensure that the incremented counter value is placed in the next row of B. □

As already mentioned, we now give the reduction which is similiar to the one in Lemma 1. An example can be found in Appendix B.

The reduction

Let ψ:=Q.ϕ denote a QBF with quantifier prefix Q and matrix ϕ. Since ψ is a QBF (in contrast to DQBF in Lemma 1), we know that Q defines a total order on the universal variables. We assume the universal variables u 0,…,u n−1 of ϕ are ordered according to their appearance in Q, with u 0 and u n−1 being the innermost and outermost variable, respectively. Translate ψ to a QF_BV2≪1 formula Φ by eliminating the quantifier prefix and translating the matrix as follows:

Step 1.:

Replace all Boolean constants 0 and 1 with \(0^{[2^{n}]}\) and \(\sim \!0^{[2^{n}]}\), all Boolean universal variables u m and existential variables \(e_{m^{\prime }}\) with bit-vector variables \(U_{m}{\!}^{[2^{n}]}\) and \(E_{m^{\prime }}{\!}^{[2^{n}]}\), and all logical connectives with corresponding bitwise bit-vector operators (e.g., ∧ with &). Let \(t^{[2^{n}]}\) denote the bit-vector term generated so far. Extend it to the formula \(t = \ \sim \!0^{[2^{n}]}\). We refer to this as Φ0.

Step 2.:

We now construct Φ1 by adding new constraints to Φ0. For each universal variable u m ∈{u 0,…,u n−1}, in order to assign a binary magic number to \(U_{m}{\!}^{[2^{n}]}\), add the following equality (i.e., conjunct it with the current formula):

$$\begin{array}{@{}rcl@{}} \left( \bigwedge_{0 \leq i < m} U_{i}\right) \oplus U_{m} = U_{m} \ll 1 \end{array} $$
Step 3.:

Next, we construct Φ2 by adding another set of constraints to Φ1. For each existential variable \(e_{m^{\prime }} \in \{e_{0},\dots ,e_{n^{\prime }-1}\}\) depending on the universal variables \(Deps(e_{m^{\prime }}) = \{u_{m},\dots ,u_{n-1}\}\), with u m being the innermost universal variable that \(e_{m^{\prime }}\) depends on, check the following conditions:

if \(Deps(e_{m^{\prime }}) = \emptyset \), add the equality:

$$\begin{array}{@{}rcl@{}} E_{m^{\prime}} \ \& \ \! \sim\! 1 \ = \ E_{m^{\prime}} \ll 1 \end{array} $$
(3)

otherwise, if m≠0, add the two equalities:

$$\begin{array}{@{}rcl@{}} U^{\prime}_{m} & \ = \ & \sim\! \left( (U_{m} \ll 1) \oplus U_{m} \right) \end{array} $$
(4)
$$\begin{array}{@{}rcl@{}} E_{m^{\prime}} \ \& \ U^{\prime}_{m} & \ = \ & (E_{m^{\prime}} \ll 1) \ \& \ U^{\prime}_{m} \end{array} $$
(5)

Finally, we define Φ:=Φ2.

Step 1 and Step 2 are equal to those of Lemma 1 apart from the fact that a different construction for the binary magic numbers is used.

Again, each bit-index of Φ corresponds to the evaluation of ψ under a specific assignment to the universal variables u 0,…,u n−1, and, by construction of \(U_{0}{\!}^{[2^{n}]},\dots ,U_{n-1}{\!}^{[2^{n}]}\), all possible assignments are considered. Equation (4) creates a bit-vector \(U^{\prime }_{m}{\!}^{[2^{n}]}\) for which each bit equals to 1 iff the corresponding universal variable changes its value from one universal assignment to the next. In contrast to Lemma 1, this can now only be done for neighbouring bit-indices since we are only allowed to use left shift by 1 instead of arbitrary constants in Step 3. For QBF, this is sufficient because Q defines a total order on the universal variables.

Of course, (4) does not have to be added multiple times, if several existential variables depend on the same universal variable. Equations (5) and (3) ensure that the corresponding bits of \(E_{m^{\prime }}{\!}^{[2^{n}]}\) satisfy the dependency scheme of ψ by only allowing the value of \(e_{m^{\prime }}\) to change if an outer universal variable takes a different value. If \(Deps(e_{m^{\prime }}) = \{u_{0},\dots ,u_{n-1}\}\), i.e., if \(e_{m^{\prime }}\) depends on all universal variables, (4) evaluates to \(U^{\prime }_{0} = 0^{[2^{n}]}\), and, as a consequence, (5) simplifies to true. Because of this, no constraints need to be added for m=0.

A similar approach used for translating QBF to Symbolic Model Verification (SMV) can be found in [23]. See also [51] for a translation from QBF to sequential circuits.

Lemma 3

QF_BV2 ≪1 can be reduced to sequential circuits [30].

Proof

In [55, 56], the authors give a polynomial translation from quantifier-free Presburger arithmetic with bitwise operations (QFPAbit [53]) to sequential circuits. While they deal with non-fixed-size bit-vectors, we focus on fixed-size bit-vectors but share the goal of avoiding the exponential explosion due to explicit state representation as for example used in MONA [38]. We can adopt their approach in order to construct a translation for QF_BV2≪1. Related work, introducing an automata-based representation for Presburger Arithmetic (without bitwise operations), can be found in [61].

For the most part, the basic structure as well as the arguments used throughout the reduction are the same as in [55, 56]. To keep the proof compact, we therefore focus on pointing out the changes compared to their earlier work and regularly refer to [55, 56] for the technical details.

As mentioned, the main difference between QFPAbit and QF_BV2≪1 is the fact that bit-vectors of arbitrary, non-fixed, size are allowed in QFPAbit while all bit-vectors contained in QF_BV2≪1 have a fixed bit-width. We now give the reduction.

Given Φ∈QF_BV2≪1 in flat form, let x [n],y [n] denote bit-vector variables, c [n] a bit-vector constant, and t 1 [n],t 2 [n] bit-vector terms only containing bit-vector variables and bitwise operations. Following [55, 56], we further assume w.l.o.g that Φ only consists of logical combinations of three types of atomic expressions: t 1 [n]=t 2 [n], x [n]=c [n], and x [n]=y [n]≪1[n]. Similar to generating a formula in flat form (Definition 8), it is easy to see that any QF_BV2≪1 formula can be written like this with only linear growth in size by introducing Tseitin variables.

We then encode each equality in Φ into an individual sequential circuit separately. In the following, those are referred to as atomic sequentical circuits. Compared to [55, 56], two modifications for the construction of an atomic sequential circuits are needed. First, we need to give a translation of x=y≪1 to sequential circuits. This can be done, for example, by using the sequential circuit for x=2⋅y in QFPAbit. The second modification relates to dealing with fixed-size bit-vectors. Let n be the bit-width of all bit-vectors in a given atomic expression. We extend each atomic sequential circuit to include a counter (circuit). The counter initially is set to 0 and is incremented by 1 in each clock cycle up to a value of n. When the counter reaches a value of n, the counter as well as the original atomic sequential circuit keep their value during all remaining cycles. In this way, their output also remains the same during all following cycles.

Using D-type flip-flops, as in the definition of sequential circuits in Section 3.2, this can be easily realized by adding a combinatorial part: Assume that the counter consists of k bits, represented by flip-flops c 0,…,c k−1 with outputs o 0,…,o k−1, respectively. Checking whether the counter has reached a value of n can be realized by a Boolean function f(o 0,…,o k−1), represented as a combinatorial circuit. Further, let c denote the flip-flop of the original atomic sequential circuit and let o and i (which again can be an arbitrary function) denote its output and its input, respectively. We now replace the input i by a combinatorial circuit realizing the function

$$(f(o_{0},\dots,o_{k-1}) \land o) \lor (\neg f(o_{0},\dots,o_{k-1}) \land i) $$

This forces c to use its own output as its input if the counter has reached a value of n, and use its regular input otherwise. The counter flip-flops c 1,…,c k will be forced to stabilize after n has been reached in the same way. Note that a counter like this can be realized with L n gates, i.e., polynomially in the size of Φ. For a practical implementation, it is of course not necessary to introduce separate counters for each atomic sequential circuit. Instead, one counter can be used to address all atomic sequential circuits. However, concerning our complexity result, this obviously makes no difference.

In contrast to the implementation described in [55], we further assume that the input streams for all variables start with the least significant bit. As already pointed out by the authors in [55], their choice was arbitrary and it is no more complicated to construct the circuits the other way around.

Finally, after constructing all atomic sequential circuits, their outputs are combined by logical gates following the Boolean structure of Φ, in the same way as for non-fixed bit-width in [55, 56]. Due to the counters being part of the atomic sequential circuits, we ensure that for every input stream x i , that represents a bit-vector variable of bit-width n i , only the first n i bits of x i influence the result of the whole circuit. □

Lemma 4

QF_BV2 b w ∈ NP [30].

Proof

To show that QF_BV2 b w ∈ NP, we give a reduction from QF_BV2 b w to a scalar-bounded set of formulas S. With S⊂QF_BV2, the claim then follows from Corollary 1. An example, that combines further results from Section 7.2, can be found in Appendix C.

Suppose we are given a formula Φ∈QF_BV2 b w in flat form (Definition 8). We assume that any inequality t 1 [n]t 2 [n] in Φ is expressed by ∼ (t 1 [n]=t 2 [n]). If Φ contains any constants c [m] where c≠0, we remove those constants in a (polynomial) pre-processing step. Let c m a x [m]:=b k−1b 1 b 0 be the largest constant in Φ denoted in binary representation with b k−1=1 and arbitrary bits b k−2,…,b 0. We now replace each equality t 1 [n]=t 2 [n], in Φ with

$$t_{1,0}{\!}^{[1]} = t_{2,0}{\!}^{[1]} \ \land \ {\dots} \ \land \ t_{1,{n-1}}{\!}^{[1]} = t_{2,{n-1}}{\!}^{[1]}, $$

if nk. Otherwise, if n>k, we instead replace t 1 [n]=t 2 [n] with

$$t_{1,0}{\!}^{[1]} = t_{2,0}{\!}^{[1]} \ \land \ {\dots} \ \land \ t_{1,{k-1}}{\!}^{[1]} = t_{2,{k-1}}{\!}^{[1]} \ \land \ t\text{\textsc{hi}}_{1}{\!}^{[n-k]} = t\text{\textsc{hi}}_{2}{\!}^{[n-k]}. $$

For 0≤i< min{n,k}, we use t 1,i [1]=t 2,i [1] to express the ith row of the original equality. For constructing the terms t 1,i [1] and t 2,i [1], (i) replace each occurrence of a variable x [n] with the variable x i [1], and (ii) replace each constant c [n] with 0[1] if the ith bit of c is 0, and with ∼ 0[1] otherwise.

In a similar way, if n>k, t hi 1 [nk]=t hi 2 [nk] represents the remaining nk rows of the original equality corresponding to the most significant bits. For constructing t hi 1 [nk] and t hi 2 [nk], (i) replace each occurrence of a variable x [n] with the variable x hi [nk], and (ii) replace each constant c [n] with 0[nk].

Since this pre-processing step is logarithmic in the value of c m a x , it is polynomial in |Φ|. Without loss of generality, we now assume that Φ does not contain any bit-vector constants different from 0[n].

We now construct a formula Φ by reducing the bit-widths of all bit-vector terms in Φ. We use c n t e q (Φ) to denote the number of equalities in Φ. Each term t [n] in Φ is then replaced with a term \(t^{[n^{\prime }]}\), with n := min{n,cnt e q (Φ)}≤|Φ|. Apart from this, Φ is exactly the same as Φ. As a consequence, m a x s c l )≤|Φ|. The set of formulas constructed in this way is scalar-bounded according to Definition 11.

To complete our proof, we now have to show that the proposed reduction is sound, i.e., out of every satisfying assignment to the bit-vector variables \(x_{1}{\!}^{[n_{1}]},\dots ,x_{k}{\!}^{[n_{k}]}\) for Φ we can also construct a satisfying assignment to \(x_{1}{\!}^{[n^{\prime }_{1}]},\dots ,x_{k}{\!}^{[n^{\prime }_{k}]}\) for Φ and vice versa.

It is easy to see that whenever we have a satisfying assignment α for Φ, we can construct a satisfying assignment α for Φ. This can be done by simply setting all additional bits of all bit-vector variables to the same value as the most significant bit of the corresponding original vector, i.e., by performing a signed extension. Since all equalities still evaluate to the same value under the extended assignment, α(F)=α (F ) for all equalities F and F of Φ and Φ, respectively. As a direct consequence, α(Φ)=α )=1.

The other direction needs slightly more reasoning. Given α, with α(Φ)=1, we need to construct α , with α )=1. Again, we want to ensure that α (F )=α(F) for all equalities F and F in Φ and Φ, respectively.

In each variable \(x_{i}^{[n_{i}]}\), i∈{1,…,k}, we select some of the bits. For each equality F with α(F)=0, we select a bit-index as a witness for its evaluation. If α(F)=1, we select an arbitrary bit-index. We then mark the selected bit-index in all bit-vector variables contained in F, as well as in all other bit-vector variables of the same bit-width. Having done this for all equalities, we end up with sets M i of selected bit-indices, for all i∈{1,…,k}, where

$$\begin{array}{@{}rcl@{}} \left| M_{i} \right| & \ \leq \ & \min\{n_{i}, \textit{cnt}_{\mathsf{eq}}\left( {\Phi}\right)\} \\ M_{i} & \ = \ & M_{j} \qquad \forall j \in \{1,\dots,k\} \ \text{with} \ n_{i} = n_{j} \end{array} $$

The selected indices contain a witness for the evaluation of each equality. We now add arbitrary further bit-indices, again selecting the same indices in bit-vector variables of the same bit-width, until |M i |= min{n i ,cnt e q (Φ)} ∀i∈{1,…,k}.

Finally, we can directly construct α using the selected indices and get α )=α(Φ)=1 because of the fact that we included a witness for every equality in our index-selection process. Note that we only had to choose a specific witness for the case that α(F)=0. For α(F)=1, we were able to choose an arbitrary bit-index because every satisfied equality is obviously still satisfied when only a subset of all bit-indices is considered. □

Remark 1

A similar proof can be found in [35, 36]. While the focus of [35, 36] lies on improving the practical efficiency of SMT-solvers by reducing the bit-width of a given formula before bit-blasting, the author does not investigate its influence on the complexity of a given problem class. In fact, the author claims that bit-vector theories with common operations are NP-complete. As we have already shown, this only holds if unary encoding on scalars is used. However, unary encoding leads to the fact that the given class of formulas remains NP-complete, independent of whether a reduction of the bit-width is possible. While the arguments on bit-width reduction given in [35, 36] still hold for binary encoded bit-vector formulas when only bitwise operations are used, our proof considers the effect on the complexity of the problem class.

7 Fragment Extensions and Alternative Characterizations

In this section, we investigate possible extensions to the fragments we have been dealing with so far and give alternative characterizations of specific logics. We use the term base operations to refer to the operations that we previously selected to define a certain class of bit-vector problems. Considering the complexity results from the previous section, we know that the specific sets of base operations are sufficient to guarantee certain completeness results. This leads towards two potential directions of analysis.

On the one hand, it is interesting to see which common operations could be added to a fragment without increasing the complexity of the satisfiability problem. With QF_BV2c being NExpTime-complete, any common operation can extend this fragment without increasing complexity; the full extension is exactly the definition of QF_BV2. It is more interesting to investigate which operations can be added to QF_BV2 b w and QF_BV2≪1 while still remaining in NP and PSpace, respectively. In order to check this, we present several reductions of additional operations to base operations.

On the other hand, it is also interesting to explore possible reductions of base operations to additional ones. We showed that the satisfiability problem for QF_BV2 b w , i.e., when bitwise operations and equality are used as base operations, is NP-complete. Using left shift by 1 or left shift by constant as an additional base operation makes the satisfiability problem PSpace-hard (Lemma 2) or NExpTime-hard (Lemma 1), respectively. If it is possible to show that any of these two base operations can be reduced to another operation o (together with bitwise operations and equality), then o can be considered as an alternative base operation, ensuring the satisfiability problem to remain hard for the specific complexity class.

7.1 Notation

Note that, since binary encoding is used on scalars, all the translations of operations must be logarithmic in the bit-widths of operands, in order to ensure that a reduction is polynomial in the formula size.

For describing our reductions, we often use the following form:

figure b

By this description, we want to express that we replace a term t e r m 1 in a formula Φ with t e r m 2, and simultaneously add all the quantifier-free formulas f o r m u l a 1, …, f o r m u l a k to Φ(i.e., conjunct each of them with the matrix of Φ). We call f o r m u l a 1, …, f o r m u l a k the assertions in the definition. All the variables that do not occur in t e r m 1, but do occur in any of the expressions t e r m 2, f o r m u l a 1, …, f o r m u l a k are considered as Tseitin variables, i.e., they are assumed to be added to Φ as new existential variables at the innermost prefix position.

Let us note that, in our fragments, it is sufficient to use a minimal functionally complete set of bitwise operations, e.g., bvnand alone.

By bitwise operations and equality, functional if-then-else (ite) can be expressed easily, as follows. Note that, in order to avoid exponential blowup, a Tseitin variable x is introduced for the Boolean condition:

figure c

7.2 QF_BV2 b w

Let us introduce the operation indexing t [n] [i], which is defined as t[i:i], i.e., a special case of extraction. Although, in Section 7.4, we show that adding extraction makes the fragment NExpTime-hard, QF_BV2 b w can be extended with indexing without growth in complexity.

Theorem 5

QF_BV2 b w extended by indexing is in NP.

Proof

To show this, we extend the proof of Lemma 4 by an additional pre-processing step even before removing the non-zero constants. Suppose we are given a formula Φ∈QF_BV2 b w , also containing expressions t [n] [i]. Let

$$I \ := \ \{i \ | \ t^{[n]}\!\left[i\right] \textnormal{ appears in } {\Phi}\} $$

be the set of all indices that appear explicitly in the formula. Assume I={i 1,…,i m } with i l <i l+1, ∀l∈{1,…,m−1}. After extracting those bit-indices from Φ, we explicitly encode the corresponding bits into Boolean variables, by translating Φ in a similar way as in Lemma 4. Consider three different kinds of terms in the following order:

  1. 1.

    Terms t [n] [i] are replaced by t i [1].

  2. 2.

    Terms t [1] remain in the formula as they are.

  3. 3.

    Any other term has a bit-width n>1. Therefore, we know that it can only occur as part of an equality t 1 [n]=t 2 [n]. We define

    $$l^{\prime} := \left|\left\{ l \in \{1,\dots,m\} \ | \ i_{l} < n\right\}\right| $$

    as the number of explicitly specified indices smaller than n. Now, similar to Lemma 4, replace each equality t 1 [n]=t 2 [n] with

    $$(t_{1,0}{\!}^{[1]} = t_{2,0}{\!}^{[1]}) \ \land \ {\dots} \ \land \ (t_{1,n-1}{\!}^{[1]} = t_{2,n-1}{\!}^{[1]}), $$

    if n=l . Otherwise, if n>l , replace t 1 [n]=t 2 [n] with

    $$\left( \displaystyle\bigwedge_{l \in \{1,\dots,l^{\prime}\}} (t_{1,i_{l}}{\!}^{[1]} = t_{2,i_{l}}{\!}^{[1]})\right) \land \ t\text{\textsc{rem}}_{1}{\!}^{[n-l^{\prime}]} = t\text{\textsc{rem}}_{2}{\!}^{[n-l^{\prime}]}. $$

As in Lemma 4, we use t 1,i [1]=t 2,i [1] to express the ith row of the original equality. In the same way, t i [1], being introduced for an indexing, represents the ith bit of t. The new terms t 1,i , t 2,i , and t i are constructed in the same way as in Lemma 4.

Similarly, if n>l , the expression \(t\text {\textsc {rem}}_{1}{\!}^{[n-l^{\prime }]} = t\text {\textsc {rem}}_{2}{\!}^{[n-l^{\prime }]}\) represents the remaining nl rows of the original equality corresponding to the indices that have not been extracted explicitly. Those terms are again constructed in the same way as in Lemma 4, except for the construction of new constants: each constant c [n] is replaced with a new constant \(c\text {\textsc {rem}}^{[n-l^{\prime }]}\) by setting the jth bit of c rem to the value of the kth bit of c, for k:= min{k | |{1,…,k }∖I|=j}.

After this translation, the resulting formula Φ does not contain indexing operations anymore and is equisatisfiable to the original one. Also, |Φ|≤p(|Φ|) for some polynomial p, since the growth in size is bounded by the number of occurrences of the indexing operation in Φ. Note that this reduction is only possible because there is no interaction between different bit-indices, i.e., because Φ only contains bitwise operations and equality, apart from indexing. □

Similarly, extending QF_BV2 b w with additional relational operations from Table 1 does not increase complexity, either.

Theorem 6

QF_BV2 b w extended by relational operations from Table 1 is in NP.

Proof

We give a reduction for the relational operation unsx‘igned less than (bvult). The remaining relational operations in Table 1 can be reduced in a similar way. Given Φ∈QF_BV2 b w (without indexing), additionally containing expressions t 1 [n] < u t 2 [n], we adopt the proof of Lemma 4 in three ways.

First, the elimination of constants has to be modified. Again, let c m a x :=b k−1b 1 b 0 be the largest constant in Φ denoted in binary representation with b k−1=1 and arbitrary bits b k−2,…,b 0. Without loss of generality, assume n>k. We now replace each relation t 1 [n] < u t 2 [n] in Φ with

$$\begin{array}{lr} & (t\text{\textsc{hi}}_{1}{\!}^{[n-k]} \ <_{\mathbf{u}}\ t\text{\textsc{hi}}_{2}{\!}^{[n-k]}) \\ \lor & (t\text{\textsc{hi}}_{1}{\!}^{[n-k]} = t\text{\textsc{hi}}_{2}{\!}^{[n-k]}) \land (\lnot t_{1,{k-1}}{\!}^{[1]} \land t_{2,{k-1}}{\!}^{[1]}) \\ \lor & {\dots} \\ \lor & (t\text{\textsc{hi}}_{1}{\!}^{[n-k]} = t\text{\textsc{hi}}_{2}{\!}^{[n-k]}) \land (t_{1,{k-1}}{\!}^{[1]} \Leftrightarrow t_{2,{k-1}}{\!}^{[1]}) \land {\dots} \land (\lnot t_{1,0}{\!}^{[1]} \land t_{2,0}{\!}^{[1]}) \\ \end{array} $$

All expressions t 1,i [1], t 2,i [1], t hi 1 [nk], and t hi 2 [nk] are defined in the same way as it was done in Lemma 4.

Second, we need to use the number of all the relational operations c n t r e l (Φ), when reducing the bit-widths in Φ.

The third modification is needed for constructing a satisfying assignment α for the bit-width reduced formula Φ out of the satisfying assignment α for Φ. When selecting the bit-index which is used as a witness for the evaluation of a given expression t 1 [n] < u t 2 [n], we choose the index of the most significant bit which is assigned to a different value in the two terms. As in Lemma 4, an arbitrary bit-index can be chosen if both terms are assigned to the same value.

Again, the reduction is only possible because there is no interaction between different bit-indices. While we only considered t 1 [n] < u t 2 [n] in our proof, it is easy to see that it holds for all relational operations from Table 1. All unsigned operations can be replaced by t 1 [n] < u t 2 [n] as in the definition of Table 1. For signed operations, an additional if-then-else constraint on the most significant bit is needed. □

So far, we only discussed extensions by indexing and relational operations separately. However, using the same principles, it is indeed possible to show that we can add both kind of operations at the same time without growth in complexity. We only sketch the argument: As in the original proof for indexing, we first remove all occurrences of the indexing operation from the formula. This time, it is not sufficient to extract those bit-indices from the bit-vectors. Instead, we have to split all bit-vectors at the corresponding bit-index. Let i with 0<i<n be an index that explicitly occurs at some point in the formula. Replace t 1 [n] < u t 2 [n] with

$$\begin{array}{lr} & (t\text{\textsc{hi}}_{1}{\!}^{[n-i-1]} \ <_{\mathbf{u}}\ t\text{\textsc{hi}}_{2}{\!}^{[n-i-1]}) \\ \lor & (t\text{\textsc{hi}}_{1}{\!}^{[n-i-1]} = t\text{\textsc{hi}}_{2}{\!}^{[n-i-1]}) \land (\lnot t_{1,i}{\!}^{[1]} \land t_{2,i}{\!}^{[1]})\\ \lor & (t\text{\textsc{hi}}_{1}{\!}^{[n-i-1]} = t\text{\textsc{hi}}_{2}{\!}^{[n-i-1]}) \land (t_{1,i}{\!}^{[1]} \Leftrightarrow t_{2,i}{\!}^{[1]}) \land (t\text{\textsc{lo}}_{1}{\!}^{[i]} \ <_{\mathbf{u}}\ t\text{\textsc{lo}}_{2}{\!}^{[i]}) \end{array} $$

For the more general case, with indices I={i 1,…,i m }, the bit-vectors need to be split analogously at all bit-indices i l . Apart from this, the reduction works as already described. This leads to the following corollary:

Corollary 2

QF_BV2 b w extended by indexing together with relational operations from Table 1 is in NP.

See Appendix C for an example.

7.3 QF_BV2≪1

Figure 1 depicts our forthcoming results on extending QF_BV2≪1 with operations. An edge (o 1,o 2) means that o 1 can be reduced to o 2, together with bitwise operations and equality. The vertex bvshl_1 represents left shift by 1, and plays a central role as being a base operation in QF_BV2≪1. The vertex bvmul_c represents multiplication by constant, and the four vertices to the right correspond to different kinds of unsigned and signed relational operations. All the other vertices are self-explanatory. Note that each operation which is mutually reachable with bvshl_1, namely bvlshr_1, bvadd, bvsub, and bvmul_c, can be used as an alternative base operation instead of bvshl_1.

Fig. 1
figure 1

Extending QF_BV2≪1 with operations

First, we show that QF_BV2≪1 can be extended with indexing. Although a similar result was proposed for QF_BV2 b w , the reduction we used there is not appropriate for QF_BV2≪1, because of the presence of shifts in the formulas.

Theorem 7

QF_BV2 ≪1 extended by indexing is in PS pace.

Proof

The counter we introduced in our translation from QF_BV2≪1 to sequential circuits (Lemma 3) can be used to return the value at a specific bit-index of a bit-vector. □

Instead of left shift by 1, we could also have used logical right shift by 1 to define QF_BV2≪1.

Theorem 8

left shift by 1 and logical right shift by 1 can be reduced to each other.

Proof

We give a direct translation:

figure d

Further, it is well-known that any arithmetic right shift t 1 [n] s t 2 [n] can be reduced to logical right shift, as follows: i t e(t 1 [n−1] , ∼ (∼ t 1 u t 2) , t 1 u t 2). □

We now look at arithmetic operations:

Theorem 9

QF_BV2 ≪1 extended with linear modular arithmetic is in PS pace.

Proof

Addition can be expressed as follows:

figure e

Multiplication by constant can be splitted into several multiplications by 2, i.e., left shift by 1, and addition, similar to [55, 56]. Given such a multiplication t [n]c [n], we introduce two sets of variables, s i and x i , 0≤iL c. Each s i represents ti, and calculated by shifting s i−1 by 1. Note that only logarithmic many steps need to be performed. Each x i represents the subresult in the ith step. By considering the individual bits of c, s i either is or is not added to the previous subresult x i−1. Finally, x L c provides the required product.

figure f

Considering the opposite direction, t≪1 can easily be expressed as t⋅2. Consequently, it can also be expressed as t s+t s where t s [n] is a Tseitin variable for t. This shows we could also have used addition instead of left shift by 1 to define QF_BV2≪1.

Unary minus (bvneg) and subtraction (bvsub) can obviously be added to QF_BV2≪1 by using two’s complement representation. Furthermore, it is easy to see that addition and subtraction can be reduced to each other. Extending QF_BV2≪1 with additional relational operations, such as unsigned less than (bvult), does not increase complexity either. A term t 1 [n] < u t 2 [n] is the same as checking whether t 1t 2 < u 0 holds, which can be replaced by constructing an adder for t 1+(∼ t 2)+1, analogously to the one above, and then check whether overflow occurs, i.e., t s 2≠0 & ¬c o u t [n−1]. Obviously, the common unsigned or signed relational operations less than, greater than, less than or equal, and greater than or equal are equally powerful. □

7.4 QF_BV2c

Figure 2 depicts our forthcoming results on extending QF_BV2c with operations. The vertex bvshl_c represents left shift by constant, which is a base operation. Since bvshl_1 is a special case of bvshl_c, all the operations that can extend QF_BV2≪1 (cf. the previous section), represented by the dashed segment in the upper left corner, can obviously be reduced to bvshl_c. Actually, as we have already mentioned before, any common operation can extend this fragment, with QF_BV2c being NExpTime-complete. This explains why bvshl_c is reachable from all the vertices. We only give the most interesting explicit reductions in this direction.

Fig. 2
figure 2

Extending QF_BV2c with operations

The other direction, i.e., presenting operations being reachable from bvshl_c, is more important from the theoretical point of view, since those ones can be used as alternative base operations instead of bvshl_c. These operations are extract, concat, bvmul, bvshl, bvlshr_c, and bvlshr.

Theorem 10

bvshl_c and bvlshr_c can be reduced to each other.

Proof

Given a term t [n]c [n] or t [n] u c [n], there are two boundary cases: if c=0 then rewrite the term to t; if cn then to 0[n]. Otherwise, i.e., when 0<c<n, the following reductions can be applied:

figure g

Each kind of shift by constant is a special case of the respective general shift.Footnote 4 As mentioned in the previous section, arithmetic shift can be expressed by logical shift. □

Theorem 11

extraction, concatenation, and bvshl_c can be reduced to each other.

Proof

First, consider extraction and concatenation:

figure h

The base operation bvshl_c can then easily be expressed by extraction and concatenation (and also by any of them alone, since they can be reduced to each other). The boundary cases for bvshl_c can be handled in the same way as above, therefore we now assume that 0<c<n, and rewrite the term t [n]c [n] to t[nc−1:0]∘0[c].

The reduction in the other way around, i.e., extraction (or concatenation) to bvshl_c and bvlshr_c, takes a special role. Given a term t [n][i:j], extraction produces a new term of bit-width ij+1. This change in bit-width (which also occurs for concatenation) cannot be captured by only applying rewriting rules using shifts. However, we can find a reduction from bit-vector formulas using only extraction, bitwise operations, and equality to ones using only shifts by constant, bitwise operations, and equality, as follows.

Given a formula Φ with bit-vector variables \(x_{1}{\!}^{[n_{1}]}, \dots , x_{l}{\!}^{[n_{l}]}\), let us calculate the maximal bit-width n m a x := maxk{n k }. First, replace each extraction t [m][i:j] in Φ with

$$\left( t \ll (n_{\mathsf{max}}-1-i) \right) \ \gg_{\mathbf{u}} \ (n_{\mathsf{max}}-1-i+j) $$

Then, replace each bit-vector variable \(x_{k}{\!}^{[n_{k}]}\) with a new bit-vector variable \(x^{\prime }_{k}{\!}^{[n_{\mathsf {max}}]}\). Finally, for each \(x^{\prime }_{k}\), add the following assertion to the formula:

$$\begin{array}{@{}rcl@{}} x^{\prime}_{k}{\!}^{[n_{\mathsf{max}}]} \ = \ \left( x^{\prime}_{k} \ll (n_{\mathsf{max}} - n_{k})\right) \ \gg_{\mathbf{u}} \ (n_{\mathsf{max}} - n_{k}) \end{array} $$

In the resulting formula, all bit-vectors have the same bit-width, and each bit-vector and each result of an extraction can take exactly those values it could take in the original formula, apart from leading zeros. □

We now take a closer look at multiplication:

Theorem 12

multiplication and bvshl_c can be reduced to each other.

Proof

First, we show how bvshl_c can be expressed by bvmul. Again, assume that 0<c<n. In this case, t [n]c [n] can be expressed as t⋅2c. We can construct 2c, being an exponential number, as a bit-vector in L c steps using exponentiation by squaring. We introduce two sets of variables, p i and x i , 0≤iL c. Each p i represents the number \(2^{(2^{i})}\), and each x i the subresult in the ith step. By considering the individual bits of c, the previous subresult x i−1 either is or is not multiplied by p i . Finally, x L c provides the value 2c.

figure i

Although we know, based on the complexity results, that even general multiplication can be expressed in this fragment, it is still a non-trivial task to give an explicit reduction. While everal polynomial multiplication algorithms in the bit-width of operands exist, we cannot directly apply them since we now need a logarithmic translation in the bit-width. Before showing how to simulate the common “shift and add” algorithm, we first introduce four bit-vector helper operations to make the presentation as transparent as possible: binmagic, selfconcat, halfshuffle, and expand. Furthermore, let us introduce the notation P n for the nearest power of 2, and define it as follows: P n:=2Ln.

For the helper operation binmagic, which is in fact about constructing a binary magic number, we use the same notation and approach as in Lemma 1, where m<n:

figure j

Halfshuffle applies a logarithmic translation, which is based on the generalization of a bit-vector operation called half-shuffle [58, Chpt. 7]. This generalized variant receives a bit-vector \(t^{[2^{m}]}\) and produces the following bit-vector of bit-width 2n:

$$\underbrace{0{\dots} 0}_{2^{n-m}-1} \ t[2^{m}-1] \ \underbrace{0{\dots} 0}_{2^{n-m}-1} \ t[2^{m}-2] \ {\dots} \ \underbrace{0{\dots} 0}_{2^{n-m}-1} \ t[0] $$

In the initialization step, we apply zero extension to t. Then, in m steps, we shuffle smaller and smaller bit groups in our bit-vector. In the 1st step, the two halves (i.e., 2m−1-bit groups) are shuffled. In the 2nd step, the halves of all the previously shuffled halves (i.e., 2m−2-bit groups), and so on. In the last step, we shuffle single bits, and this is how to put each bit at its destination. Assume again that mn.

figure k

As it can be seen above, in the ith step we (i) shift our current bit-vector left by the constant 2ni−2mi, (ii) merge it with the original bit-vector, by using bitwise or, (iii) and we mask some unnecessary bit groups out, by using a binary magic number.

For an example, see Appendix D.

Expand “multiplies” each bit of \(t^{[2^{m}]}\) into a bit group of size 2nm. The resulting bit-vector can be visualized as follows:

$$\underbrace{t[2^{m}-1]{\dots} t[2^{m}-1]}_{2^{n-m}} \ \underbrace{t[2^{m}-2]{\dots} t[2^{m}-2]}_{2^{n-m}} \ {\dots} \ \underbrace{t[0]{\dots} t[0]}_{2^{n-m}} $$

In the initial step, we use halfshuffle. In the next nm steps, we copy larger and larger non-zero bit groups, by using left shift and bitwise or. Assume again that mn.

figure l

Now we are ready to propose how to express multiplication by simulating the common “shift and add” algorithm for integers. In a first step, one of the operands is multiplied independently by each digit of the other operand. Using base 2, this multiplication by a single digit can be expressed by a logical and-operation. Afterwards the results of the single-digit multiplications are shifted by the offset of the corresponding digit and finally added to give the result of the full multiplication. While this approach is straightforward in a naive implementation, we have to ensure only logarithmic many operations in the bit-width are used in our encoding. To achieve this, we generate bit-vectors of quadratic bit-width (P n)2 out of our original operands of bit-width n, by applying selfconcat to the first one and expand to the second one. Using bitwise and on the two new bit-vectors, we directly get the results of all single-digit multiplications in one step. More precisely, the resulting bit-vector consists of P n groups of P n bits, each group representing the result of one single-digit multiplication. To add all P n partial results, a binary addition algorithm is used. Iteratively pairs of neighbouring groups are shifted relative to each others’ offsets and then added to form one new group. The number of groups therefore is halved in each step, resulting in the final sum after log2(P n)=L n steps. For a detailed example, see also Appendix E.

figure m

8 Logics with Quantifiers and Binary Encoding

In this section, we look into the complexity of quantified bit-vector logics with binary encoding. While we already gave some results for BV2 and UFBV2 in [41], we now extend our previous work by discussing some fragments of those logics. Additionally, we also take a look at non-recursive macros (as allowed, e.g., in the SMT-LIB format) for quantifier-free logics, which have a very similar effect to restricting the bit-width of universal variables in quantified logics. We give new complexity results for all fragments and extensions.

8.1 General Quantification

By allowing quantification and uninterpreted functions, and using binary encoding, we obtain UFBV2, the most expressive bit-vector logic considered in this paper.

Theorem 13

UFBV2 is 2-NE xpTime-complete [41].

Proof

It is straightforward to see that UFBV2 ∈ 2-NExpTime, since every UFBV2 formula can be translated exponentially to a formula in UFBV1 ∈ NExpTime (Proposition 4), by applying a simple unary re-encoding to all the scalars in the formula. 2-NExpTime-hardness directly follows from Lemma 5. □

To prove that UFBV2 is 2-NExpTime-hard, we pick a 2-NExpTime-hard problem and then reduce it to UFBV2. We can find a suitable problem among the so-called Domino Tiling problems [17]. First, we give a definition of a domino system and then specify a 2-NExpTime-hard problem on this kind of systems.

Definition 12 (Domino System)

A domino system is a tuple 〈T,H,V,n〉, where

  • T is a finite set of tile types, in our case, T=[0,k−1], where k≥1;

  • H,VT×T are the horizontal and vertical matching conditions, respectively;

  • n≥1, encoded in unary format.

Note that the above definition differs (but not substantially) from the classical one in [17]. Without loss of generality, we use sub-sequential natural numbers for identifying tiles. Similarly to [46 , 47], the size factor n, encoded in unary form, is part of the input. However, while a start tile α and a terminal tile ω is usually used, in our case the starting tile is denoted by 0 and the terminal tile by k−1, without loss of generality.

There are different Domino Tiling problems examined in the literature. In [17], a classical tiling problem is introduced, namely the Square Tiling problem, which can be defined as follows:

Definition 13 (Square Tiling)

Given a domino system 〈T,H,V,n〉, an f(n)-square tiling is a mapping λ:[0,f(n)−1]×[0,f(n)−1]↦T such that

  • the first row starts with the start tile: λ(0,0)=0

  • the last row ends with the terminal tile: λ(f(n)−1,f(n)−1)=k−1

  • all horizontal matching conditions hold:

    $$\left( \lambda(i,j),\lambda(i,j+1) \right) \in H \,\,\, \forall i\leq f(n),j< f(n)-1 $$
  • all vertical matching conditions hold:

    $$\left( \lambda(i,j),\lambda(i+1,j) \right) \in V\,\,\, \forall i< f(n)-1, \ j\leq f(n) $$

In [17], a general theorem on the complexity of Domino Tiling problems is proved. More precisely, the f(n)-square tiling problem can be shown to be NTime (f(n))-complete. In particular, this implies that the \(2^{(2^{n})}\)-square tiling problem is 2-NExpTime-complete.

Lemma 5

The \(2^{(2^{n})}\) -square tiling problem can be reduced to UFBV2.

Proof

Given a domino system 〈T=[0,k−1],H,V,n〉, let us introduce the following notations which we intend to use in the resulting UFBV2 formula.

  • Represent each tile in T with the corresponding bit-vector constant of bit-width L k.

  • Represent the horizontal and vertical matching conditions with the uninterpreted functions (actually predicates) h [1](t 1 [Lk],t 2 [Lk]) and v [1](t 1 [Lk],t 2 [Lk]), respectively.

  • Represent the tiling with an uninterpreted function \(\lambda ^{[{\mathsf {L}k}]}(i^{[2^{n}]} , j^{[2^{n}]})\). λ returns the tile in the cell at the row index i and column index j. Notice that the bit-width of i and j is exponential in the size of the domino system, but due to binary encoding it can represented polynomially.

The resulting UFBV2 formula is as follows:

$$\begin{array}{@{}rcl@{}} \begin{array}{ll} \forall i^{[2^{n}]},j^{[2^{n}]}.& \\ & \lambda(0,0)=0 \quad \land \quad \ \lambda\!\left( 2^{(2^{n})}-1,2^{(2^{n})}-1\right)=k-1 \quad \\ & \land \bigwedge_{(t_{1},t_{2})\in H} h(t_{1},t_{2}) \quad \land \quad \bigwedge_{(t_{1},t_{2})\in V} v(t_{1},t_{2}) \\ & \land \quad \left( \ j \neq 2^{(2^{n})}-1 \ \Rightarrow \ h\left( \lambda(i,j),\lambda(i,j+1) \right) \ \right) \\ & \land \quad \left( \ i \neq 2^{(2^{n})}-1 \ \Rightarrow \ v\left( \lambda(i,j),\lambda(i+1,j) \right) \ \right) \end{array} \end{array} $$

This formula contains four kinds of constants. Three can be encoded directly (\(0^{[2^{n}]}\), 0[Lk], and (k−1)[Lk]). The constant \(2^{(2^{n})}-1\) has to be encoded as \(\sim \! 0^{[2^{n}]}\) in order to avoid an exponential representation. The size of the resulting formula, due to binary encoding on bit-widths, is polynomial in the size of the domino system. □

Similar to Section 6 and to our work in [30], we can now restrict the set of operations in UFBV2 to allow only bitwise operations, equality and left shift by constant (or left shift by 1). We refer to this logic as UFBV2c (or UFBV2≪1, in the case of left shift by 1). From a different point of view, it is also possible to consider this as an extension of QF_BV2c and QF_BV2≪1 by quantifiers and uninterpreted functions.

Since we can use bitwise operations, equality and left shift by constant to express all common operations, UFBV2c remains 2-NExpTime-complete. However, in contrast to quantifier-free logics, we do not lose any expressiveness in UFBV2≪1, either. We can see this already from the fact that we only used bitwise operations, equality and addition in Lemma 5. Since, as we pointed out in Section 7.3, addition can be reduced to bitwise operations, equality and left shift by 1, the following result follows immediately:

Corollary 3

UFBV2 ≪1 is 2-NE xpTime-complete.

Nevertheless, we want to formalize this in a proposition and give a constructive proof by showing how UFBV2c can be reduced to UFBV2≪1.

Proposition 7

UFBV2 ≪c can be reduced to UFBV2 ≪1 .

Proof

Let Φ denote a bit-vector formula, x [n], y [n] fresh bit-vector variables, and f n [n](⋅,⋅) a fresh uninterpreted function of arity 2, taking arguments of bit-width n. Replace each expression t [n]c [n] in Φ with f n [n](t [n],c [n]), extend the quantifier prefix of Φ with ∀x [n],y [n], and add the following two constraints to the matrix of Φ:

$$\begin{array}{@{}rcl@{}} f_{n}{\!}^{[n]}(x,0) &=& x \\ f_{n}{\!}^{[n]}(x,y+1) &=& f_{n}{\!}^{[n]}(x,y) \ll 1 \end{array} $$

While the second constraint still contains addition to improve readability, this can be replaced by using left shift by 1, as described in Section 7.3. □

Remark 2

This result is not very surprising if we consider the alternative characterizations of QF_BV2≪1 and QF_BV2c as given in Section 7. We showed that addition is equally expressive as left shift by 1 and multiplication is equally expressive as left shift by constant. In Peano arithmetic, multiplication is defined by using addition, uninterpreted functions, and quantification. In the context of bit-vectors, this definition of multiplication can be expressed by introducing ∀x [n],y [n] to the quantifier prefix and adding the following constraints:

$$\begin{array}{@{}rcl@{}} f_{n}{\!}^{[n]}\left( x,0 \right) & \ = \ & 0 \\ f_{n}{\!}^{[n]}\left( x,y + 1 \right) & \ = \ & f_{n}{\!}^{[n]}\left( x,y \right) + x \end{array} $$

With these two axioms, the multiplication t 1 [n]t 2 [n] of two elements in Peano arithmetic is uniquely defined by the instance f n [n](t 1 [n],t 2 [n]) of the uninterpreted function f n .

While we were also able to give some complexity results for BV2 in [41], it remains unclear whether BV2 is complete for any complexity class.

Proposition 8

BV2 ∈ E xpSpace and BV2 is NExpTime-hard [41].

Proof

Given a BV2 formula, a simple unary re-encoding can be used to give an exponential translation to BV1 ∈ PSpace (Proposition 3). Therefore, BV2 ∈ ExpSpace. Because of QF_BV2 ⊂ BV2, NExpTime-hardness follows trivially. □

8.2 Restricting the Bit-Width of Universal Variables

We now show that a completeness result can be obtained when a certain restriction to the bit-width of the universal variables is applied. For a given formula Φ∈ BV2, let m a x b w(∃)(Φ) and m a x b w(∀)(Φ) denote the maximal bit-width of all the existentially and universally quantified variables, respectively. (We define m a x b w(∃)(Φ):=0 and m a x b w(∀)(Φ):=0 if Φ does not contain any existential or universal variables respectively.) Now we give a definition, similar to the one of scalar-boundedness in Definition 11:

Definition 14 (Universally Bit-Width Bounded Formula Set)

An infinite set S of quantified bit-vector formulas is universally bit-width bounded, iff there exists a polynomial function \(p:\mathbb {N}\mapsto \mathbb {N}\) such that ∀Φ∈S. max b w(∀)(Φ)≤p(L max b w(∃)(Φ)).

Theorem 14

If S⊂UFBV2 (or S⊂BV2) is universally bit-width bounded, then S∈NExpTime.

Proof

Let S⊂UFBV2 be universally bit-width bounded and let p 0 be the polynomial function that exists according to Definition 14. For any Φ0S, let n:=|Φ0|. We can assume that Φ0 contains at most kn universal variables. Also, let m a x s c l 0) and c n t s c l 0) be defined in the same way as it was done in Section 5. This implies m a x b w(∃)0)≤max s c l 0)≤2n and c n t s c l 0)≤n.

In order to prove that SN ExpTime, we now give a translation into QF_BV1∈ NP which is only single-exponential in n=|Φ0| for any Φ0S. First, all universal variables are eliminated by universal expansion. This produces a quantifier-free formula Φ1∈QF_UFBV2 with

$$\begin{array}{@{}rcl@{}} \textit{max}_{\mathsf{scl}}\left( {\Phi}_{1} \right) & \ = \ & \textit{max}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \ \leq \ 2^{n} \\ \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{1} \right) & \ \leq \ & \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \cdot 2^{k \cdot \textit{max}_{\mathsf{bw}(\forall)}\left( {\Phi}_{0} \right)} \\ & \ \leq \ & \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \cdot 2^{n \cdot p_{0}\left( {\mathsf{L}\textit{max}_{\mathsf{bw}(\exists)}\left( {\Phi}_{0} \right)}\right)} \\ & \ \leq \ & \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \cdot 2^{p_{1}(n)} \end{array} $$

for some polynomial function p 1. Since Φ1 does not contain any (universal) quantifiers, it can be polynomially translated to some Φ2∈QF_BV2, by replacing all uninterpreted functions of Φ1 with bit-vector variables and adding at most quadratic many Ackermann constraints (as in Proposition 2). Therefore,

$$\begin{array}{@{}rcl@{}} \textit{max}_{\mathsf{scl}}\left( {\Phi}_{2} \right) & \ = \ & \textit{max}_{\mathsf{scl}}\left( {\Phi}_{1} \right) \ \leq \ 2^{n} \\ \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{2} \right) & \ \leq \ & p_{2}\left( \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{1} \right)\right) \ \leq \ p_{2}\left( \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \cdot 2^{p_{1}(n)}\right) \end{array} $$

for some polynomial function p 2. In a last step, a unary re-encoding is applied to Φ2 (similar to Proposition 1), resulting in Φ3∈QF_BV1. The size of Φ3 is bounded by

$$\begin{array}{@{}rcl@{}} |{\Phi}_{3}| & \ \leq \ & \textit{max}_{\mathsf{scl}}\left( {\Phi}_{2} \right) \cdot \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{2} \right) + c \\ & \ \leq \ & 2^{n} \cdot p_{2}\left( \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \cdot 2^{p_{1}(n)}\right) + c \ \leq \ 2^{p_{3}(n)} + c \end{array} $$

for some polynomial function p 3. Therefore, Φ3∈QF_BV1 is only single-exponential in the size of Φ0. Together with QF_BV1∈ NP (Proposition 1), this shows that SN ExpTime. □

We now define BVlog⊂ BV2 and UFBVlog⊂ UFBV2 as the set of all Φ∈BV2 and Φ∈UFBV2 with m a x b w(∀)(Φ)≤L max b w(∃)(Φ)+1, respectively. These fragments are of special practical interest, because they can be used to express quantification over array indices if arrays are represented as bit-vectors. Arrays play an important role in automated Software Model Checking as, for example, done in the SAGE project by Microsoft [33]. Quantification over array indices is also discussed in [7], where the so-called bounded array property fragment is addressed.

Theorem 15

BV log and UFBV log are NE xp T ime-complete.

Proof

It is easy to see that BVlog and UFBVlog are NExpTime-hard since both logics are an extension of QF_BV2, which is already NExpTime-hard (Proposition 6). The other direction is a consequence of Theorem 14, since BVlog and UFBVlog are universally bit-width bounded by definition. □

Note that this kind of proof only holds for bit-vector logics with binary encoding. When a unary encoding is used, restricting the bit-width of universal variables does not have any effect on the complexity of the given problem class.

8.3 Non-Recursive Macros

A very similar effect occurs when non-recursive macros are added to our logics. For example, SMT-LIB allows the usage of non-recursive macros via the keywords define-fun and let. In the general case, allowing macros can increase the complexity of a given class. For instance, Boolean formulas extended by non-recursive macros equal to the class of Boolean Programs or Nested Boolean Functions (NBF), which is known to be PSpace-complete [15, 20]. The same obviously holds for QF_BV1.

However, as shown in Theorem 16, extending QF_UFBV2 (and even QF_BV2) with non-recursive macros does not give additional expressiveness, in terms of complexity. Let the subscript \(\mathcal {M}\) denote the fact that, additionally, non-recursive macros can be used in our logic.

Definition 15 (Logic with Non-Recursive Macros)

Given a bit-vector logic \(\mathcal {L}\), let \(\mathcal {L}_{\mathcal {M}}\) denote the set of all bit-vector formulas in the following form:

$$\begin{array}{@{}rcl@{}} Q \ \forall u_{0}{\!}^{[n_{0}]},\dots,u_{k}{\!}^{[n_{k}]} \ . & & t^{[1]} \\ & \land \ & f_{0}{\!}^{[w_{0}]}(u_{0},\dots,u_{k_{0}}) = d_{0}{\!}^{[w_{0}]} \\ & \land \ & {\dots} \\ & \land \ & f_{m}{\!}^{[w_{m}]}(u_{0},\dots,u_{k_{m}}) = d_{m}{\!}^{[w_{m}]} \end{array} $$

where (i) \(Q.t^{[1]}\in \mathcal {L}\), (ii) the universal variables \(u_{i}{\!}^{[n_{i}]}\) do not appear in Q.t [1], (iii) the uninterpreted functions f i are called macros, (iv) the terms \(d_{i}{\!}^{[w_{i}]}\) are called macro definitions, and (v) d i contains no occurrence of f j if ij.

Note that t might contain occurrences of any f i . Expanding a macro f i means to replace all occurrences \(f_{i}(s_{0},\dots ,s_{k_{i}})\) in t with d i σ, where \(s_{0},\dots ,s_{k_{i}}\) are terms and \(\sigma :=\{ u_{0} \mapsto s_{0},\dots ,u_{k_{i}} \mapsto s_{k_{i}} \}\) is a term substitution.

We now introduce a normal form, similar to the flat form in Definition 16, in order to obtain an upper bound for the growth in formula size when applying macro expansion.

Definition 16 (Functional Flat Form)

A bit-vector formula Φ is in function flat form iff every uninterpreted function in Φ has only variables as arguments.

It is easy to see that any Φ can be translated into functional flat form with only linear growth in formula size. Given a term \(f(t_{1}{\!}^{[n_{1}]},\dots ,t_{k}{\!}^{[n_{k}]})\) in Φ, where f is an uninterpreted function, check if t i is a variable: if it is, then x i :=t i ; otherwise let \(x_{i}^{[n_{i}]}\) be a new Tseitin variable existentially quantified at the innermost prefix position, and add the constraint x i =t i to the formula. Then, replace the original term with f(x 1,…,x k ).

Theorem 16

\(\text {QF\_UFBV2}_{\mathcal {M}}\) is NE xp T ime-complete.

Proof

NExpTime-hardness is obvious, since \(\text {QF\_UFBV2} \subset \text {QF\_UFBV2}_{\mathcal {M}}\). Inclusion can be shown in a similar way as it is done in Theorem 14.

Let \({\Phi }_{0} := \forall u_{0}{\!}^{[n_{0}]},\dots ,u_{k}{\!}^{[n_{k}]} \ . \ t \land t_{M}\) be a \(\text {QF\_UFBV2}_{\mathcal {M}}\) formula of size n:=|Φ0|, where t∈QF_UFBV2 and t M consists of all the macro definitions. Assume that t is in functional flat form. We now inductively expand all macros in t, in the order of f m ,f m−1,…,f 0, and also, after each expansion step, we translate the resulting formula into functional flat form again.

First, each macro occurrence \(f_{m}(x_{0},\dots ,x_{k_{m}})\) in t is replaced by an instance d m σ of the macro definition. Since each x i is a variable, we know that |d m σ|=|d m |≤n. Because f m has at most n occurrences in t, expanding f m results in a formula of size bounded by n 2. Recall that we also translate the resulting formula into functional flat form, resulting in formula size bounded linearly in n 2.

Then, we expand f m−1, which now has at most n 2 occurrences. The resulting formula is of size bounded linearly in n 3. By continuing the expansion process with f m−2,…,f 0, we finally obtain from t a formula Φ1∈QF_UFBV2 that contains no more macros. It holds that

$$\begin{array}{@{}rcl@{}} \textit{max}_{\mathsf{scl}}\left( {\Phi}_{1} \right) & \ = \ & \textit{max}_{\mathsf{scl}}\left( {\Phi}_{0} \right) \ \leq \ 2^{n} \\ \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{1} \right) & \ \leq \ & l\left( n^{m+1}\right) \ \leq \ l\left( n^{n}\right) \ \leq \ l\left( 2^{n \cdot {\mathsf{L}n}}\right) \end{array} $$

for some linear function l. We now apply a unary re-encoding to Φ1, yielding Φ2∈QF_UFBV1. The size of Φ2 is bounded by

$$|{\Phi}_{2}| \ \leq \ \textit{max}_{\mathsf{scl}}\left( {\Phi}_{1} \right) \cdot \textit{cnt}_{\mathsf{scl}}\left( {\Phi}_{1} \right) + c \ \leq \ 2^{n} \cdot l\left( 2^{n \cdot {\mathsf{L}n}}\right) + c $$

which is only single exponential in the size of Φ0. This gives \(\text {QF\_UFBV2}_{\mathcal {M}} \in NE\textsc {xp}T\textsc {ime}\). □

9 Practical Considerations

As mentioned in Section 2, our original motivation for considering the complexity of bit-vector logics comes from the fact that state-of-the-art SMT solvers usually rely on bit-blasting when dealing with bit-vector formulas. Our introductory example shows the effect that the exponential explosion caused by bit-blasting can have on a bit-vector formula and, therefore, current SMT solvers often are not able to deal efficiently with bit-vector formulas that are not scalar-bounded.

While our complexity results in Section 6 explain why this is the case from a complexity-theoretic point of view, it is of high practical interest if and how state-of-the-art SMT solvers can profit from this knowledge.

9.1 Alternative Approaches

Instead of using bit-blasting, we can try to find alternative approaches for solving bit-vector formulas.

One possible approach is to polynomially translate bit-vector formulas to some other logic in the same complexity class. For example, target logics for QF_BV2c (or general QF_BV2) are DQBF or EPR, which are both NExpTime-complete [45, 49, 50]. For QF_BV2≪1, a translation to model checking for sequential circuits as given in Lemma 3 can be used instead. In both cases, we can profit from the performance of existing techniques for other problem classes. While DQBF solvers have not been considered at all until our recent work in [28], their performance does not nearly reach the one of current EPR solvers as, e.g., iProver [40]. On the other hand, many efficient model checkers for sequential circuits in SMV or AIGER format exist.

In [42], we therefore gave a polynomial translation from QF_BV2 to EPR (this is in contrast to existing translations in [26, 37], which are not guaranteed to be polynomial in the general case), and then did an experimental evaluation using iProver to solve the resulting EPR formulas. The overall results were rather mixed. While we were able to solve some formulas faster, SMT solvers performed better by orders of magnitude on most other problems considering runtime. Looking at the space requirements, iProver performed better in general. However, the gain was less significant than expected. An explanation for this can be found in the way iProver deals with EPR formulas. By solving propositional overapproximations and iteratively applying instantiations of predicates (the underlying concept is known as the Inst-Gen calculus), the formula can also grow exponentially. Of course this is no flaw in iProver but a direct consequence of the NExpTime-completeness of EPR and QF_BV2.

A different situation occurs when we look at QF_BV2≪1. As seen in our introductory example, bit-blasting can still be exponential for formulas of this class. However, we know that it is possible to solve this kind of formulas in polynomial space, since QF_BV2≪1∈ PSpace. In [29], we therefore presented a polynomial translation from QF_BV2≪1 to SMV. Since current model checkers usually expect input in AIGER format, we then also translated our outputs to AIGER format using smv2aig, which is part of the AIGER library. Our experiments showed that, with growing bit-width, BDD-based model checkers (e.g., NuSMV [18] and IImcFootnote 5, using techniques described in [6, 8], with BDD-engine enabled) outperformed state-of-the-art SMT solvers on almost all of our benchmarks by orders of magnitude in runtime. Considering space requirements, the gain was even more significant. On the other hand, model checkers based on unrolling performed worse and comparable to SMT solvers on most benchmarks. This is not surprising, since unrolling to the full bit-width turns out to be the same as bit-blasting.

Altogether, our experiments show that the theoretical results given in [30, 41] and Section 6 can practically lead to improvements in state-of-the-art SMT solving. It is an interesting open problem to look at these results more closely and to integrate those concepts into SMT solvers in order to to increase their overall performance.

9.2 Benchmark Problems

Another practical outcome of our theoretical work was the creation of several different benchmark sets.

In [42], we proposed two new sets of QF_BV2 benchmarks for our experiments on evaluating the performance of EPR solvers for quantifier-free bit-vector formulas. In connection with our experiments on using model checkers for efficiently solving restricted bit-vector formulas, we generated six more benchmark sets for QF_BV2≪1 in [29].

Another family of benchmarks was directly derived from the discussion on the expressiveness of bit-vector operations in this paper. As we know from Section 6, all common bit-vector operations can be logarithmically expressed (in bit-width) by bitwise operations and equality in combination with shift by constant, multiplication, concatenation, or slicing. While we did not give direct translations for all common bit-vector operations in this work, we encoded most of them into SMT-LIB instances and used Boolector to verify their correctness for various bit-widths.

These benchmarks, together with those from [29, 42], can be found on our webpageFootnote 6 and will be submitted to the SMT-LIB. All of our benchmark sets are challenging for state-of-the-art SMT solvers (as well as for EPR solvers and model checkers) due to the fact that they are not scalar-bounded. For better solvers and future challenges, the difficulty of a problem can be adjusted by simply increasing the bit-widths in the original SMT-LIB instance. Bit-blasted versions of our benchmarks also turned out to be challenging for state-of-the-art SAT solvers in the SAT Competition 2013Footnote 7 [43].

10 Conclusion

We discussed the complexity of deciding various quantified and quantifier-free fixed-size bit-vector logics. In contrast to existing literature, where usually it is not distinguished between unary and binary encoding on scalars in formulas, we argued that it is important to make this distinction. Most of our results apply to the actually much more natural binary encoding as it is also used in standard formats, e.g., in the SMT-LIB format. For this kind of logics, already the quantifier-free fragment without uninterpreted functions (QF_BV2) turned out to be NExpTime-complete [41].

In this paper, we extended our previous work from [30, 41]. We first gave a detailed formal framework for fixed-size bit-vector logics including definitions for syntax and semantics. Our self-contained formalization is the first to consider different encodings and to provide a concrete measure for the size of bit-vector formulas as well as to provide the possibility to include arbitrary bit-vector operations.

Regarding the Common Operator Framework, as used, e.g., in the SMT-LIB format, we then revisited our previous complexity results from [30, 41] and extended those results in several ways. For quantifier-free logics, we combined our earlier work and restructured it to present several of our proofs in a clearer, easier-to-read way, with some small modifications in the proofs.

We then looked at several bit-vector operations and discussed their expressiveness, and checked if these operations can be logarithmically translated to each other (in bit-width). This kind of analysis helps to understand the complexity that is inherent in certain classes of bit-vector formulas and its relation to the kind of encoding used for bit-widths. While this allows us to check what kind of properties can be expressed in a given fragment, it also enables us to identify easier subclasses of formulas, which then can be solved more efficiently in practice by applying specialized algorithms.

Considering quantified logics, it is still an open question whether BV2 is complete for any complexity class. However, we could give some new results for quantified logics with a restriction on the bit-width of universal variables. We introduced the notion of universally bit-width bounded problems and showed that this kind of problems are in NExpTime. This then allowed us to prove that BV2 l o g is NExpTime-complete. Since bit-vector logics with arrays represented by bit-vectors are in this set if quantification is only allowed on array indices, this class is of particular practical interest.

For a last complexity theoretical result, we looked into \(\text {QF\_BV2}_{\mathcal {M}}\), the class of quantifier-free bit-vector logics extended with non-recursive macros, as allowed, e.g., in the SMT-LIB format. Again, we showed that this logic remains NExpTime-complete. Altogether, we provide the currently most complete overview on the complexity of common bit-vector logics.

To point out that our theoretical insights are also interesting from a practical point of view, we briefly discussed two approaches of solving bit-vector formulas not by bit-blasting but by using translations based on our complexity results. While bit-blasting is exponential in general, we proposed polynomial translations into EPR and SMV in recent practical work [29, 42], to show that bit-vector solvers can indeed profit from our techniques. Several QF_BV2 benchmark families that we created throughout our work turned out to be challenging for state-of-the-art SMT and SAT solvers

For future work, it is still an interesting topic to consider our results in the context of parametrized complexity [24]. In particular, our definitions of (polynomially) scalar-bounded and universally bit-width bounded problem sets might be of relevance in this context. So far, mainly problems in NP are considered in parametrized complexity. This is another reason why extending our work in this direction is of special interest. Also, as already mentioned, the complexity of BV2 is still another open problem. Finally, from the practical side, it would be interesting to investigate how state-of-the-art SMT solvers can profit most from our insights and techniques.