
1 Introduction

In traditional microprocessors, the core operations are bitwise logical operations and fixed-point wrap-around arithmetic. The behaviour of machine-code programs can be formally verified by model checking, enumerating all possible system states and transitions (the state space) and then verifying their properties. Unfortunately, naïve exhaustive enumeration of states quickly leads to prohibitively large state spaces (state space explosion), making verification infeasible.

State space explosion may be mitigated by a variety of techniques. One of them is abstraction, where a more efficient state space structure preserving certain properties of the original is constructed [3, p. 17]. Typically, the formal verification requirement is that it should be impossible to prove anything not provable in the original state space (soundness for true), while allowing overapproximation, leading to the possibility of a false counterexample.

For machine-code model checking, three-valued abstract representation of bits was introduced in [7] where each abstract bit can have value “zero”, “one”, or “perhaps one, perhaps zero” (unknown). Using this abstraction, bit and bit-vector movement operations may be performed directly on abstract bits. Each movement operation produces a single abstract result, avoiding state space explosion. The caveat is that overapproximation is incurred as relationships between unknown values are lost.

The three-valued representation was further augmented in [11] with bitwise logic operations (AND, OR, NOT, ...) producing a single abstract result, further reducing the severity of state space explosion. However, other operations still required instantiation of the unknown values to enumerate all concrete input possibilities, treating each arising output possibility as distinct. This would lead not only to output computation time increasing exponentially with the number of unknown bits, but also to potential creation of multiple new states and the possibility of severe state space explosion. For example, an operation with two 32-bit inputs and a 32-bit output could require up to \(2^{64}\) concrete operation computations and could produce up to \(2^{32}\) new states.

The necessity of instantiation when encountering arithmetic operations severely reduced the usefulness of a microcontroller machine-code model checker with three-valued abstraction developed by one of the authors [8]. This prompted our research into performing arbitrary operations without instantiation, with emphasis on fast computation of the results of arithmetic operations.

1.1 Our Contribution

In this paper, we formulate the forward operation problem, where an arbitrary operation performed on three-valued abstract bit-vector inputs results in a single three-valued abstract bit-vector output which preserves soundness of model checking. While the best possible output can always be found in worst-case time exponential in the number of three-valued input bits, this is already slow for 8-bit binary operations and infeasible for 16-bit and wider operands.

To aid with the construction of worst-case polynomial-time algorithms, we devise a novel modular extreme-finding technique. Using this technique, we find a linear-time algorithm for abstract addition and a worst-case quadratic-time algorithm for abstract multiplication.

Our results will allow model checkers that use the three-valued abstraction technique to compute the state space faster and to manage its size by only performing instantiation when necessary, reducing the risk of state space explosion.

2 Related Work

Many-valued logics have been extensively studied on their own, including Kleene logic [6] used for three-valued model checking [11]. In [10], three-valued logic was used for static program analysis of 8-bit microcontroller programs. Binary decision diagrams (BDDs) were used to compress input-output relationships for arbitrary abstract operations. This resulted in high generation times and storage usage, making the technique infeasible to use with 16-bit or 32-bit operands. These restrictions are not present in our approach where we produce the abstract operation results purely algorithmically, but precomputation may still be useful for abstract operations with no known worst-case polynomial-time algorithms.

In addition to machine-code analysis and verification, multivalued logics are also widely used for register-transfer level digital logic simulation. The IEEE 1164 standard [5] introduces nine logic values, out of which ‘0’ (zero), ‘1’ (one), and ‘X’ (unknown) directly correspond to three-valued abstraction. For easy differentiation between concrete and abstract values, we will use the IEEE 1164 notation in this paper, using single quotes to represent an abstract bit and double quotes to represent an abstract bit-vector (tuple of abstract bits), e.g. “0X1” means (‘0’, ‘X’, ‘1’). While we primarily consider microprocessor machine-code model checking as our use case, we note that the presented algorithms might also be useful for simulation, automated test pattern generation, and formal verification of digital circuits containing adders and multipliers.

In [14], it was proposed that instantiation may be performed based only on interesting variables. For example, if a status flag “zero” is of interest, a tuple of values “XX” from which the flag is computed should be replaced by the possibilities {“00”, “1X”, “X1”}. This leads to less severe state space explosion than naïve instantiation, but it is not directly relevant here, as we discuss avoiding instantiation entirely during operation resolution.

In this paper, we define certain pseudo-Boolean functions and search for their global extremes. This is also known as pseudo-Boolean optimization [2]. Problems in this field are often NP-hard. However, the pseudo-Boolean functions for addition and multiplication that we will use in this paper have special forms that will allow us to resolve the corresponding problems in polynomial time without having to resort to advanced pseudo-Boolean optimization techniques.

3 Basic Definitions

Let us consider a binary concrete operation which produces a single M-bit output for each combination of two N-bit operands, i.e. \(r: \mathbb {B}^N \times \mathbb {B}^N \rightarrow \mathbb {B}^M\). We define the forward operation problem as the problem of producing a single abstract bit-vector output given supplied abstract inputs, preserving soundness. The output is not pre-restricted (the operation computation moves only forward). To preserve soundness, the abstract output must contain all possible concrete outputs that would be generated by first performing instantiation, receiving a set of concrete possibilities, and then performing the operation on each possibility.

To easily formalize this requirement, we first formalize three-valued abstraction using sets. Each three-valued abstract bit value (‘0’,‘1’,‘X’) identifies all possible values the corresponding concrete bit can take. We define the abstract bit as a subset of \(\mathbb {B}= \{0,1\}\) and the abstract bit values as

$$\begin{aligned} \text {‘0’} {\mathop {=}\limits ^{\text {def}}}\{0\}, \quad \text {‘1’} {\mathop {=}\limits ^{\text {def}}}\{1\}, \quad \text {‘X’} {\mathop {=}\limits ^{\text {def}}}\{0,1\}. \end{aligned}$$
(1)

This formalization corresponds exactly to the meaning of ‘X’ as “possibly 0, possibly 1”. Even though \(\emptyset \) is also a subset of \(\mathbb {B}\), it is never assigned to any abstract bit as there is always at least a single output possibility.

If an abstract bit is either ‘0’ or ‘1’, we consider it known; if it is ‘X’, we consider it unknown. For ease of representation in equations, we also introduce an alternative math-style notation \(\hat{X} {\mathop {=}\limits ^{\text {def}}}\{0,1\}.\)

Next, we define abstract bit-vectors as tuples of abstract bits. For clarity, we use hat symbols to denote abstract bit-vectors and abstract operations. We use zero-based indexing for simplicity of representation and correspondence to typical implementations, i.e. \(\hat{a}_0\) means the lowest bit of abstract bit-vector \(\hat{a}\). We denote slices of the bit-vectors by indexing via two dots between endpoints, i.e. \(\hat{a}_{0..2}\) means the three lowest bits of abstract bit-vector \(\hat{a}\). In case the slice reaches higher than the most significant bit of an abstract bit-vector, we assume it to be padded with ‘0’, consistent with interpretation as an unsigned number.

3.1 Abstract Bit Encodings

In implementations of algorithms, a single abstract bit may be represented by various encodings. First, we formalize a zeros-ones encoding of abstract bit \(\hat{a}_i\) using concrete bits \(a^0_i \in \mathbb {B}\), \(a^1_i \in \mathbb {B}\) via

$$\begin{aligned} \begin{aligned}&a^0_i = 1 \iff 0 \in \hat{a}_i, \; a^1_i = 1 \iff 1 \in \hat{a}_i, \end{aligned} \end{aligned}$$
(2)

which straightforwardly extends to bit-vectors \(a^0\), \(a^1\). Assuming \(\hat{a}\) has \(A \in \mathbb {N}_0\) bits, \(\hat{a} \in (2^{\mathbb {B}})^A\), while \(a^0 \in {\mathbb {B}}^A\), \(a^1 \in {\mathbb {B}}^A\), i.e. they are concrete bit-vectors.

We also formalize a mask-value encoding: the mask bit \(a^\text {m}_i = 1\) exactly when the abstract bit is unknown. When the abstract bit is known, the value bit \(a^\text {v}_i\) corresponds to the abstract value (0 for ‘0’, 1 for ‘1’), as previously used in [11]. For simplicity, we further require \(a^\text {v}_i = 0\) if \(a^\text {m}_i = 1\). We formalize the encoding of abstract bit \(\hat{a}_i\) using concrete bits \(a^\text {m}_i \in \mathbb {B}\), \(a^\text {v}_i \in \mathbb {B}\) via

$$\begin{aligned} \begin{aligned}&a^\text {m}_i = 1 \iff 0 \in \hat{a}_i \wedge 1 \in \hat{a}_i, \; a^\text {v}_i = 1 \iff 0 \notin \hat{a}_i \wedge 1 \in \hat{a}_i, \end{aligned} \end{aligned}$$
(3)

which, again, straightforwardly extends to bit-vectors \(a^\text {m} \in {\mathbb {B}}^A\) and \(a^\text {v} \in {\mathbb {B}}^A\). We note that the encodings can be quickly converted via

$$\begin{aligned} \begin{aligned}&a^0_i = 1 \iff a^\text {m}_i = 1 \vee a^\text {v}_i = 0, \; a^1_i = 1 \iff a^\text {m}_i = 1 \vee a^\text {v}_i = 1, \\ \;&a^\text {m}_i = 1 \iff a^0_i = 1 \wedge a^1_i = 1, \; a^\text {v}_i = 1 \iff a^0_i = 0 \wedge a^1_i = 1. \end{aligned} \end{aligned}$$
(4)

We note that when interpreting each concrete possibility in abstract bit-vector \(\hat{a}\) as an unsigned binary number, \(a^\text {v}\) corresponds to the minimum, while \(a^1\) corresponds to the maximum. For conciseness and intuitiveness, we will not explicitly note the conversions in the presented algorithms. Furthermore, where usage of arbitrary encoding is possible, we will write the hat-notated abstract bit-vector, e.g. \(\hat{a}\).
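To illustrate the encodings, the conversions in Eq. 4 map directly to bit-parallel word operations. The following minimal C++ sketch (with our own naming, under the machine-word assumptions of Subsect. 3.3) converts whole abstract bit-vectors at once:

```cpp
#include <cstdint>

// Zeros-ones encoding: bit i of `zeros` is a0_i, bit i of `ones` is a1_i.
struct ZerosOnes { uint64_t zeros, ones; };
// Mask-value encoding: bit i of `mask` is am_i, bit i of `value` is av_i.
struct MaskValue { uint64_t mask, value; };

// Eq. 4 applied to all bits at once; `width` is a mask selecting the A valid
// bits, needed because negation would otherwise set bits above the width.
ZerosOnes to_zeros_ones(MaskValue a, uint64_t width) {
    return {(a.mask | ~a.value) & width,   // a0 = am or not av
            (a.mask | a.value) & width};   // a1 = am or av
}
MaskValue to_mask_value(ZerosOnes a) {
    return {a.zeros & a.ones,              // am = a0 and a1
            ~a.zeros & a.ones};            // av = not a0 and a1
}
```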

3.2 Abstract Transformers

We borrow the notions defined in this subsection from abstract interpretation [4, 12], adapting them for the purposes of this paper.

The set of concrete bit-vector possibilities given by a tuple containing A abstract bits, \(\hat{a} \in (2^{\mathbb {B}})^A\), is given by a concretization function \(\gamma : (2^{\mathbb {B}})^A \rightarrow 2^{(\mathbb {B}^A)}\),

$$\begin{aligned} \gamma (\hat{a}) {\mathop {=}\limits ^{\text {def}}}\{ a \in \mathbb {B}^A \; | \; \forall i \in \{0, \dots , A-1\} \; . \; a_i \in \hat{a}_i \}. \end{aligned}$$
(5)

Conversely, the transformation of a set of bit-vector possibilities \(C \in 2^{(\mathbb {B}^A)}\) to a single abstract bit-vector \(\hat{a} \in (2^{\mathbb {B}})^A\) is determined by an abstraction function \(\alpha : 2^{(\mathbb {B}^A)} \rightarrow (2^{\mathbb {B}})^A\) which, to prevent underapproximation and to ensure soundness of model checking, must fulfill \(C \subseteq \gamma ( \alpha (C) ).\)

An abstract operation \(\hat{r}:~(2^{\mathbb {B}})^N \times (2^{\mathbb {B}})^N \rightarrow (2^{\mathbb {B}})^M\) corresponding to concrete operation \(r: \mathbb {B}^N \times \mathbb {B}^N \rightarrow \mathbb {B}^M\) is an approximate abstract transformer if it overapproximates r, that is,

$$\begin{aligned} \begin{aligned}&\forall \hat{a} \in (2^{\mathbb {B}})^N, \hat{b} \in (2^{\mathbb {B}})^N \; . \; \{ r(a,b) \; | \; a \in \gamma (\hat{a}), b \in \gamma (\hat{b})\} \subseteq \gamma (\hat{r}(\hat{a},\hat{b})). \end{aligned} \end{aligned}$$
(6)

The number of concrete possibilities \(|\gamma ( \alpha (C) )|\) should be minimized to prevent unnecessary overapproximation. For three-valued bit-vectors, the best abstraction function \(\alpha ^{\text {best}}\) is uniquely given by

$$\begin{aligned} \forall i \in \{0, \dots , A-1\} \; . \; (\alpha ^{\text {best}} (C))_i {\mathop {=}\limits ^{\text {def}}}\{ c_i \in \mathbb {B}\; | \; c \in C \}. \end{aligned}$$
(7)

By using \(\alpha ^{\text {best}}\) to perform the abstraction on the minimal set of concrete results from Eq. 6, we obtain the best abstract transformer for arbitrary concrete operation r, i.e. an approximate abstract transformer resulting in the least amount of overapproximation, uniquely given as

$$\begin{aligned} \hat{r}^{\text {best}}_k(\hat{a}, \hat{b}) = \alpha ^{\text {best}}(\{ r_k(a, b) \; | \; a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \}). \end{aligned}$$
(8)

We note that when no input abstract bit is \(\emptyset \), there is at least one concrete result r(a, b) and no output abstract bit can be \(\emptyset \). Thus, the three-valued representation is truly sufficient.

3.3 Algorithm Complexity Considerations

We will assume that the presented algorithms are implemented on a general-purpose processor that operates on binary machine words and can compute bitwise operations, bit shifts, addition and subtraction in constant time. Every bit-vector used fits in a machine word. This is a reasonable assumption, as the processor used for verification will likely have a machine word size equal to or greater than that of the processor that runs the program under consideration.

We also assume that the ratio of M to N is bounded, allowing us to express the presented algorithm time complexities using only N. Memory complexity is not an issue as the presented algorithms use only a fixed amount of temporary variables in addition to the inputs and outputs.

3.4 Naïve Universal Abstract Algorithm

Equation 8 immediately suggests a naïve algorithm for computing \(\hat{r}^{\text {best}}\) for any given \(\hat{a}, \hat{b}\): enumerating all \(a, b \in \mathbb {B}^N\), filtering out the ones that do not satisfy \(a \in \gamma (\hat{a}) \; \wedge \; b \in \gamma (\hat{b})\), and marking the results of r(a, b), which is easily done in the zeros-ones encoding. This naïve algorithm has a running time of \(\varTheta (2^{2N})\).

Average-case computation time can be improved by only enumerating unknown input bits, but worst-case time is still exponential. Even for 8-bit binary operations, the worst-case input combination (all bits unknown) would require \(2^{16}\) concrete operation computations. For 32-bit binary operations, it would require \(2^{64}\) computations, which is infeasible. Finding worst-case polynomial-time algorithms for common operations is therefore of significant interest.
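For concreteness, a C++ sketch of the naïve algorithm follows, reusing the ZerosOnes structure from the sketch in Subsect. 3.1; the function and parameter names are ours, and \(N \le 63\) is assumed so that the enumeration counter fits a machine word.

```cpp
#include <cstdint>

// Naive universal abstract algorithm: enumerate all concrete operand pairs,
// skip those not contained in the abstract inputs, and mark every output bit
// value that occurs. Theta(2^(2N)) concrete computations.
template <typename ConcreteOp>  // uint64_t r(uint64_t a, uint64_t b)
ZerosOnes naive_transformer(ZerosOnes a, ZerosOnes b, unsigned N, unsigned M,
                            ConcreteOp r) {
    const uint64_t in_mask = (uint64_t{1} << N) - 1;
    const uint64_t out_mask = (M < 64) ? (uint64_t{1} << M) - 1 : ~uint64_t{0};
    ZerosOnes out{0, 0};
    for (uint64_t ca = 0; ca <= in_mask; ++ca) {
        // ca is in gamma(a-hat) iff its one-bits are allowed by a.ones and
        // its zero-bits are allowed by a.zeros.
        if ((ca & ~a.ones) || (~ca & ~a.zeros & in_mask)) continue;
        for (uint64_t cb = 0; cb <= in_mask; ++cb) {
            if ((cb & ~b.ones) || (~cb & ~b.zeros & in_mask)) continue;
            const uint64_t res = r(ca, cb) & out_mask;
            out.ones |= res;               // output bits that can be 1
            out.zeros |= ~res & out_mask;  // output bits that can be 0
        }
    }
    return out;
}
```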

4 Formal Problem Statement

Theorem 1

The best abstract transformer of abstract bit-vector addition is computable in linear time.

Theorem 2

The best abstract transformer of abstract bit-vector multiplication is computable in worst-case quadratic time.

In Sect. 5, we will introduce a novel modular extreme-finding technique which we will use as a basis for finding fast best abstract transformer algorithms. Using this technique, we will prove Theorems 1 and 2 by constructing corresponding algorithms in Sects. 6 and 7, respectively. We will experimentally evaluate the presented algorithms to demonstrate their practical efficiency in Sect. 8.

5 Modular Extreme-Finding Technique

The concrete operation function r may be replaced by a pseudo-Boolean function \(h: \mathbb {B}^N \times \mathbb {B}^N \rightarrow \mathbb {N}_0\) where the output of r is the output of h written in base 2. Such a function fulfills

$$\begin{aligned} \begin{aligned}&\forall a \in \mathbb {B}^N, b \in \mathbb {B}^N, \forall k \in \{0, \dots , M-1\} \; . \; \\&\qquad \quad r_k(a,b) = 1 \iff (h(a,b) \bmod 2^{k+1}) \ge 2^k. \end{aligned} \end{aligned}$$
(9)

The best abstract transformer definition in Eq. 8 is then equivalent to

$$\begin{aligned} \begin{aligned}&\forall k \in \{0, \dots , M-1\} \; . \; \\&\qquad \quad (0 \in \hat{r}^{\text {best}}_k \iff \exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; (h(a,b) \bmod 2^{k+1}) < 2^k) \; \wedge \\&\qquad \quad (1 \in \hat{r}^{\text {best}}_k \iff \exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; (h(a,b) \bmod 2^{k+1}) \ge 2^k). \end{aligned} \end{aligned}$$
(10)

The forward operation problem is therefore transformed into a problem of solving certain modular inequalities, which is possible in polynomial time for certain operations. We will later show that these include addition and multiplication.

If the inequalities were not modular, it would suffice to find the global minimum and maximum (extremes) of h. Furthermore, the modular inequalities in Eq. 10 can be thought of as alternating intervals of length \(2^k\). Intuitively, if it were possible to move from the global minimum to the global maximum in steps of at most \(2^k\) by using different values of \(a \in \gamma (\hat{a}), b \in \gamma (\hat{b})\) in h(a, b), it would suffice to find the global extremes and determine whether they are in the same \(2^k\) interval. If they were, only one of the modular inequalities would be satisfied, resulting in a known \(r_k\) (either ‘0’ or ‘1’). If they were not, each modular inequality would be satisfied by some a, b, resulting in \(r_k = \hat{X}\).

We will now formally prove that our reasoning for this modular extreme-finding method is indeed correct.

Lemma 1

Consider a sequence of integers \(t=(t_0, t_1, \dots , t_{T-1})\) that fulfills

$$\begin{aligned} \begin{aligned} \forall n \in [0,T-2] \; . \;&|t_{n+1} - t_n| \le 2^k.\\ \end{aligned} \end{aligned}$$
(11)

Then,

$$\begin{aligned} \begin{aligned}&\exists v \in [\min {t},\max {t}] \; . \; (v \bmod 2^{k+1})< 2^k \iff \;\\&\exists n \in [0,T-1] \; . \; (t_n\bmod 2^{k+1}) < 2^k. \end{aligned} \end{aligned}$$
(12)

Proof

As all elements of the sequence t lie in the range \([\min {t},\max {t}]\), the backward direction is trivial. The forward direction trivially holds if v is contained in t. If it is not, it is definitely contained in some range \((v^-, v^+)\), where \(v^-\), \(v^+\) are successive values in the sequence t. If neither \({(v^- \bmod 2^{k+1}) < 2^k}\) nor \({(v^+ \bmod 2^{k+1}) < 2^k}\) held, then, since \(|v^+ - v^-| \le 2^k\) and consecutive intervals of values violating the inequality are separated by more than \(2^k\), both \(v^-\) and \(v^+\) would lie in the same such interval, and so would the value v between them, contradicting \((v \bmod 2^{k+1}) < 2^k\).    \(\square \)

Theorem 3

Consider a pseudo-Boolean function \({f: \mathbb {B}^N \times \mathbb {B}^N \rightarrow \mathbb {Z}}\), two inputs \(\hat{a}, \hat{b} \in (2^{\mathbb {B}})^N\), and a sequence \(p = (p_0, p_1, \dots , p_{P-1})\) where each element is a pair \((a, b) \in \gamma (\hat{a}) \times \gamma (\hat{b})\), that fulfill

$$\begin{aligned} \begin{aligned}&\forall n \in [0, P-2] \; . \; |f(p_{n+1}) - f(p_n)| \le 2^k,\\&f(p_0) = \min _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{f(a,b)}, \\&f(p_{P-1}) = \max _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{f(a,b)}. \end{aligned} \end{aligned}$$
(13)

Then,

$$\begin{aligned} \begin{aligned} \forall \, C \in \mathbb {Z} \; . \; (&\exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; ((f(a, b) + C) \bmod 2^{k+1})< 2^k\\ \iff&\exists n \in [0, P-1] \; . \; ((f(p_n) + C) \bmod 2^{k+1}) < 2^k). \end{aligned} \end{aligned}$$
(14)

Proof

Since each element of p is a pair \((a, b) \in \gamma (\hat{a}) \times \gamma (\hat{b})\), the backward direction is trivial. For the forward direction, use Lemma 1 to convert the sequence \((f(p_n) + C)_{n=0}^{P-1}\) to the range \([f(p_0) + C, f(p_{P-1}) + C]\) and rewrite the forward direction as

$$\begin{aligned} \begin{aligned}&\forall \, C \in \mathbb {Z} \; . \; (\exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; ((f(a, b) + C) \bmod 2^{k+1})< 2^k \implies \\&\exists v \in \bigg [\min _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{(f(a,b) + C)} \; , \max _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{(f(a,b) + C)} \bigg ] \; . \; (v \bmod 2^{k+1}) < 2^k). \end{aligned} \end{aligned}$$
(15)

The implication clearly holds, as the value \(f(a, b) + C\) itself lies within the range in question, completing the proof.    \(\square \)

While Theorem 3 forms a basis for the modular extreme-finding method, there are two problems. First, finding global extremes of a pseudo-Boolean function is not generally trivial. Second, the step condition, that is, the absence of a step longer than \(2^k\) in h, must be ensured. Otherwise, one of the inequality intervals could be “jumped over”. For non-trivial operators, steps longer than \(2^k\) are surely present in h for some k. However, instead of h, it is possible to use a tuple of functions \((h_k)_{k=0}^{M-1}\) where each one fulfills Eq. 10 for a given k exactly when h does. This is definitely true if each \(h_k\) is congruent with h modulo \(2^{k+1}\).

Fast best abstract transformer algorithms can now be formed based on finding extremes of \(h_k\), provided that \(h_k\) changes by at most \(2^k\) when exactly one bit of input changes its value, which implies that a sequence p with properties required by Theorem 3 exists. For ease of expression of the algorithms, we define a function which discards bits of a number x below bit k (or, equivalently, performs integer division by \(2^k\)),

$$\begin{aligned} \zeta _k(x) = \left\lfloor \frac{x}{2^k} \right\rfloor . \end{aligned}$$
(16)

For conciseness, given inputs \(\hat{a} \in (2^{\mathbb {B}})^N, \hat{b} \in (2^{\mathbb {B}})^N\), we also define

$$\begin{aligned} \begin{aligned} h^{\text {min}}_k {\mathop {=}\limits ^{\text {def}}}\min _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{h_k(a,b)}, \qquad h^{\text {max}}_k {\mathop {=}\limits ^{\text {def}}}\max _{\begin{array}{c} a \in \gamma (\hat{a})\\ b \in \gamma (\hat{b}) \end{array}}{h_k(a,b)}. \end{aligned} \end{aligned}$$
(17)

Equation 10 then can be reformulated as follows: if \(\zeta _k(h^{\text {min}}_k) \ne \zeta _k(h^{\text {max}}_k)\), both inequalities are definitely fulfilled (as each one must be fulfilled by some element of the sequence) and output bit k is unknown. Otherwise, only one inequality is fulfilled, the output bit k is known and its value corresponds to \(\zeta _k(h^{\text {min}}_k) \bmod 2\). This forms the basis of Algorithm 1, which provides a general blueprint for fast abstract algorithms. Proper extreme-finding for the considered operation must be added to the algorithm, denoted by \((\dots )\) in the algorithm pseudocode. We will devise extreme-finding for fast abstract addition and multiplication operations in the rest of the paper.

[Algorithm 1: general blueprint of a fast abstract operation algorithm; pseudocode omitted.]
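Rendered in C++, the blueprint might look as follows. This is our illustrative sketch (reusing the MaskValue structure from Subsect. 3.1), with the operation-specific extreme-finding supplied as a callback; an arithmetic right shift is assumed to implement \(\zeta _k\) for possibly negative extremes (guaranteed since C++20).

```cpp
#include <cstdint>
#include <utility>

// Blueprint of a fast abstract algorithm: for each output bit k, compute the
// extremes of h_k (Eq. 17) and compare zeta_k of both (Eq. 16). If they
// differ, both modular inequalities of Eq. 10 are fulfilled and the output
// bit is 'X'; otherwise it is known with value zeta_k(h_k^min) mod 2.
template <typename ExtremeFinder>  // std::pair<int64_t, int64_t>(unsigned k)
MaskValue abstract_blueprint(unsigned M, ExtremeFinder extremes) {
    MaskValue r{0, 0};
    for (unsigned k = 0; k < M; ++k) {
        const auto [hmin, hmax] = extremes(k);   // h_k^min, h_k^max
        if ((hmin >> k) != (hmax >> k))          // zeta_k differs
            r.mask |= uint64_t{1} << k;          // output bit k is 'X'
        else
            r.value |= uint64_t((hmin >> k) & 1) << k;  // known output bit
    }
    return r;
}
```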

6 Fast Abstract Addition

To express fast abstract addition using the modular extreme-finding technique, we first define a function expressing the unsigned value of a concrete bit-vector a with an arbitrary number of bits A,

$$\begin{aligned} \varPhi (a) {\mathop {=}\limits ^{\text {def}}}\sum _{i=0}^{A-1} 2^i a_i. \end{aligned}$$
(18)

Pseudo-Boolean addition is then defined simply as

$$\begin{aligned} \begin{aligned} h^+(a,b) {\mathop {=}\limits ^{\text {def}}}\varPhi (a) + \varPhi (b). \end{aligned} \end{aligned}$$
(19)

To fulfill the step condition, we define

$$\begin{aligned} \begin{aligned} h_k^+(a,b) = \varPhi (a_{0..k}) + \varPhi (b_{0..k}). \end{aligned} \end{aligned}$$
(20)

This is congruent with \(h^+\) modulo \(2^{k+1}\). The step condition is trivially fulfilled for every function \(h_k^+\) in \((h^+_k)_{k=0}^{M-1}\), as changing the value of a single bit of a or b changes the result of \(h_k^+\) by at most \(2^k\). We note that this is due to \(h^+\) having a special form where only single-bit summands with power-of-2 coefficients are present. Finding the global extremes is trivial as each summand only contains a single abstract bit. Recalling Subsect. 3.1, the extremes can be obtained as

$$\begin{aligned} \begin{aligned} h^{+,\text {min}}_k \leftarrow \varPhi (a^\text {v}_{0..k}) + \varPhi (b^\text {v}_{0..k}),\\ h^{+,\text {max}}_k \leftarrow \varPhi (a^1_{0..k}) + \varPhi (b^1_{0..k}). \end{aligned} \end{aligned}$$
(21)

The best abstract transformer for addition is obtained by combining Eq. 21 with Algorithm 1. Time complexity is trivially \(\varTheta (N)\), proving Theorem 1. Similar reasoning can be used to obtain fast best abstract transformers for subtraction and general summation, only changing computation of \(h^{\text {min}}_k\) and \(h^{\text {max}}_k\).
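As a usage example, instantiating the blueprint sketch from Sect. 5 with Eq. 21 may look as follows, where av, bv and a1, b1 are machine words holding the value and ones bit-vectors of the two inputs (again our naming, not taken from the paper's artifact):

```cpp
// Extreme-finding for abstract addition per Eq. 21: keep bits 0..k of the
// minima (value bits) and of the maxima (ones bits), then add.
auto addition_extremes = [&](unsigned k) -> std::pair<int64_t, int64_t> {
    const uint64_t low = (uint64_t{2} << k) - 1;   // mask of bits 0..k
    return {int64_t((av & low) + (bv & low)),      // h_k^{+,min}
            int64_t((a1 & low) + (b1 & low))};     // h_k^{+,max}
};
const MaskValue sum = abstract_blueprint(M, addition_extremes);
```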

For further understanding, we will show how fast abstract addition behaves for “X0” + “11”:

$$\begin{aligned} \begin{aligned} &k=0: \; h^{+,\text {min}}_0 = 0 + 1 = 1, \; h^{+,\text {max}}_0 = 0 + 1 = 1, \; \zeta _0(1) = \zeta _0(1) \implies \hat{r}_0 = \text {‘1’},\\ &k=1: \; h^{+,\text {min}}_1 = 0 + 3 = 3, \; h^{+,\text {max}}_1 = 2 + 3 = 5, \; \zeta _1(3) = 1 \ne 2 = \zeta _1(5) \implies \hat{r}_1 = \text {‘X’},\\ &k=2: \; h^{+,\text {min}}_2 = 3, \; h^{+,\text {max}}_2 = 5, \; \zeta _2(3) = 0 \ne 1 = \zeta _2(5) \implies \hat{r}_2 = \text {‘X’}. \end{aligned} \end{aligned}$$
(22)

For \(M=3\), the result is “XX1”. For \(M>3\), the result is padded by ‘0’ to the left, preserving the unsigned value of the output. For \(M<3\), the addition is modular. This fully corresponds to the behaviour of concrete binary addition.

7 Fast Abstract Multiplication

Multiplication is typically implemented on microprocessors with three different input signedness combinations: unsigned \(\times \) unsigned, signed \(\times \) unsigned, and signed \(\times \) signed, with signed variables using two’s complement encoding. It is a well-known fact that signed-unsigned and signed multiplication can be converted to unsigned multiplication by sign-extending the signed multiplicands to the product width. This could pose problems when the most significant bit is ‘X’, but such a multiplicand can be split beforehand into two cases, ‘0’ and ‘1’. This allows us to only consider unsigned multiplication in this section, with signed multiplication incurring only a constant-factor slowdown.

7.1 Obtaining a Best Abstract Transformer

Abstract multiplication could be resolved similarly to abstract addition by rewriting multiplication as addition of a sequence of shifted summands (long multiplication) and performing fast abstract summation. However, this does not result in a best abstract transformer. The shortest counterexample is “11”\(\, \cdot \,\)“X1”. Here, the unknown bit \(b_1\) is added twice before influencing \(r_2\), once as a summand in the computation of \(r_2\) and once as a carryover from \(r_1\):

[Long multiplication tableau for “11” \(\cdot \) “X1” omitted: the unknown summand \(b_1\) appears both in the column of \(r_2\) and as a carryover from the column of \(r_1\).]

In fast abstract summation, the summand \(b_1\) is treated as distinct for each output bit computation, resulting in unnecessary overapproximation of multiplication.

Instead, to obtain a fast best abstract transformer for multiplication, we apply the modular extreme-finding technique to multiplication itself, without intermediate conversion to summation. Fulfilling the maximum \(2^k\) step condition is not as easy as it was for addition. The multiplication output function \(h^*\) is defined as

$$\begin{aligned} \begin{aligned} h^*(a,b) {\mathop {=}\limits ^{\text {def}}}\varPhi (a) \cdot \varPhi (b) = \sum _{i=0}^{N-1} \; \sum _{j=0}^{N-1} \; 2^{i+j} a_i b_j. \end{aligned} \end{aligned}$$
(23)

One could try to use congruences to remove some summands from \(h^*\) while keeping all remaining summands positive. This would result in

$$\begin{aligned} \begin{aligned} h_k(a,b) = \sum _{i=0}^{k} \; \sum _{j=0}^{k-i} \; 2^{i+j} a_i b_j. \end{aligned} \end{aligned}$$
(24)

Changing a single bit \(a_i\) would change the result by \(\sum _{j=0}^{k-i} \; 2^{i+j} b_j\). This sums to at most \(2^{k+1}-1\) and thus does not always fulfill the maximum \(2^k\) step condition. However, the sign of the summand \(2^k a_i b_{k-i}\) can be flipped due to congruence modulo \(2^{k+1}\), after which the change of result from a single bit flip is always in the interval \([-2^k,2^k-1]\). Therefore, to fulfill the maximum \(2^k\) step condition, we define \(h^*_k: \mathbb {B}^N \times \mathbb {B}^N \rightarrow \mathbb {Z}\) as

$$\begin{aligned} \begin{aligned} h^*_k(a,b) {\mathop {=}\limits ^{\text {def}}}\left( - \sum _{i=0}^{k} 2^k a_i b_{k-i} \right) + \left( \sum _{i=0}^{k-1} \; \sum _{j=0}^{k-i-1} \; 2^{i+j} a_i b_j \right) . \end{aligned} \end{aligned}$$
(25)

For more insight into this definition, we will return to the counterexample to the previous approach, “11”\(\, \cdot \,\)“X1”, which resulted in unnecessary overapproximation for \(k=2\). Writing \(h^*_2\) computation as standard addition similarly to the previously shown long multiplication, the carryover of \(b_1\) is counteracted by the summand \(-2^2 b_1\):

[Tableau of the \(h^*_2\) computation for “11” \(\cdot \) “X1” omitted: the carryover of \(b_1\) into the third column is counteracted by the summand \(-2^2 b_1\).]

It is apparent that \(\zeta _2(h^{\text {min}}_2) = \zeta _2(h^{\text {max}}_2) = 0\) and unnecessary overapproximation is not incurred. Using that line of thinking, the definition of \(h^*_k\) in Eq. 25 can be intuitively regarded as ensuring that the carryover of an unknown bit into the k-th column is neutralized by a corresponding k-th column summand. Consequently, when an unknown bit can only appear in both of them simultaneously, no unnecessary overapproximation is incurred.

While the maximum \(2^k\) step condition is fulfilled in Eq. 25, extreme-finding is much more complicated than for addition, becoming heavily dependent on abstract input bit pairs of form \((\hat{a}_i, \hat{b}_{k-i})\) where \(0 \le i \le k\). Such pairs result in a summand \(-2^k a_i b_{k-i}\) in \(h^*_k\). When multiplication is rewritten using long multiplication as previously, this summand is present in the k-th column. We therefore name such pairs k-th column pairs for conciseness.

In Subsect. 7.2, we show that if at most one k-th column pair where \(\hat{a}_i = \hat{b}_{k-i} = \hat{X}\) (double-unknown pair) exists, extremes of \(h^*_k\) can be found easily. In Subsect. 7.3, we prove that if at least two double-unknown pairs exist, \(r_k=\hat{X}\). Taken together, this yields a best abstract transformer algorithm for multiplication. In Subsect. 7.4, we discuss implementation considerations of the algorithm with emphasis on reducing computation time. Finally, in Subsect. 7.5, we present the final algorithm.

7.2 At Most One Double-Unknown k-th Column Pair

An extreme is given by values \(a \in \gamma (\hat{a}), b \in \gamma (\hat{b})\) for which the value \(h^*_k(a,b)\) is minimal or maximal (Eq. 17). We will show that such a, b can be found successively when at most one double-unknown k-th column pair is present.

First, for single-unknown k-th column pairs where \(\hat{a}_i = \hat{X}\), \(\hat{b}_{k-i} \ne \hat{X}\), we note that in Eq. 25, the difference between \(h^*_k\) when \(a_i=1\) and when \(a_i=0\) is

$$\begin{aligned} \begin{aligned} h^*_k(a,b \; | \; a_i=1) - h^*_k(a,b \; | \; a_i=0) = - 2^k b_{k-i} + \sum _{j=0}^{k-i-1} \; 2^{i+j} b_j. \end{aligned} \end{aligned}$$
(26)

Since the result of the sum over j must be in the interval \([0,2^k-1]\), the direction of the change (negative or non-negative) is uniquely given by the value of \(b_{k-i}\), which is known. It is therefore sufficient to ensure \(a^{\text {min}}_i \leftarrow b_{k-i}\) when minimizing and \(a^{\text {max}}_i \leftarrow 1 - b_{k-i}\) when maximizing. Similar reasoning can be applied to single-unknown k-th column pairs where \(\hat{a}_i \ne \hat{X}\), \(\hat{b}_{k-i} = \hat{X}\).

After assigning values to all unknown bits in single-unknown k-th column pairs, the only still-unknown bits are the ones in the only double-unknown k-th column pair present. In case such a pair \(\hat{a}_i = \hat{X}\), \(\hat{b}_j = \hat{X}, j = k-i\) is present, the difference between \(h^*_k\) when \(a_i\) and \(b_j\) are set to arbitrary values and when they are set to 0 is

$$\begin{aligned} \begin{aligned}&h^*_k(a,b) - h^*_k(a,b \; | \; a_i=0,b_j=0) =\\&- 2^k a_i b_j + 2^i a_i \left( \sum _{z=0}^{j-1} \; 2^z b_z \right) + 2^j b_j \left( \sum _{z=0}^{i-1} \; 2^z a_z \right) . \end{aligned} \end{aligned}$$
(27)

When minimizing, it is clearly undesirable to choose \(a^{\text {min}}_i \ne b^{\text {min}}_j\). Considering that the change should not be positive, \(a^{\text {min}}_i = b^{\text {min}}_j = 1\) should be chosen if and only if

$$\begin{aligned} \begin{aligned} 2^i \left( \sum _{z=0}^{j-1} \; 2^z b_z \right) + 2^j \left( \sum _{z=0}^{i-1} \; 2^z a_z \right) \le 2^k. \end{aligned} \end{aligned}$$
(28)

When maximizing, it is clearly undesirable to choose \(a^{\text {max}}_i = b^{\text {max}}_j\). Of the two remaining options, \(a^{\text {max}}_i = 1, b^{\text {max}}_j = 0\) should be chosen if and only if

$$\begin{aligned} \begin{aligned} 2^j \left( \sum _{z=0}^{i-1} \; 2^z a_z \right) \le 2^i \left( \sum _{z=0}^{j-1} \; 2^z b_z \right) . \end{aligned} \end{aligned}$$
(29)

Of course, the choice is arbitrary when both possible choices result in the same change. Once the lone double-unknown k-th column pair, if present, is resolved, there are no unknown bits left, and the values of the \(h^*_k\) extremes can thus be computed as

$$\begin{aligned} \begin{aligned}&h^{*,\text {min}}_k = \left( - \sum _{i=0}^{k} 2^k a^{\text {min}}_i b^{\text {min}}_{k-i} \right) + \left( \sum _{i=0}^{k-1} \; \sum _{j=0}^{k-i-1} \; 2^{i+j} a^{\text {min}}_i b^{\text {min}}_j \right) ,\\&h^{*,\text {max}}_k = \left( - \sum _{i=0}^{k} 2^k a^{\text {max}}_i b^{\text {max}}_{k-i} \right) + \left( \sum _{i=0}^{k-1} \; \sum _{j=0}^{k-i-1} \; 2^{i+j} a^{\text {max}}_i b^{\text {max}}_j \right) . \end{aligned} \end{aligned}$$
(30)
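For concreteness, the following C++ fragment sketches this extreme-finding for a single output bit k in scalar form, assuming mask-value encoded inputs am, av, bm, bv held in 64-bit words, at most one double-unknown k-th column pair, and \(k \le 62\) so that the comparisons cannot overflow; the variable names are ours, and Algorithm 2 performs the same choices bit-parallel.

```cpp
// Choose concrete bits attaining the extremes of h*_k (Eqs. 26-29).
uint64_t amin = av, amax = av, bmin = bv, bmax = bv;
int di = -1, dj = -1;                         // double-unknown pair, if any
for (unsigned i = 0; i <= k; ++i) {
    const unsigned j = k - i;
    const bool ua = (am >> i) & 1, ub = (bm >> j) & 1;
    if (ua && ub) { di = int(i); dj = int(j); }   // defer to Eqs. 28 and 29
    else if (ua) {                            // single-unknown pair, Eq. 26
        amin |= ((bv >> j) & 1) << i;         // a_i <- b_{k-i} minimizes
        amax |= (((bv >> j) & 1) ^ 1) << i;   // a_i <- 1 - b_{k-i} maximizes
    } else if (ub) {                          // symmetric single-unknown pair
        bmin |= ((av >> i) & 1) << j;
        bmax |= (((av >> i) & 1) ^ 1) << j;
    }
}
if (di >= 0) {                                // the lone double-unknown pair
    uint64_t sa = amin & ((uint64_t{1} << di) - 1);  // Phi(a_{0..i-1}), minimized
    uint64_t sb = bmin & ((uint64_t{1} << dj) - 1);  // Phi(b_{0..j-1}), minimized
    if ((sb << di) + (sa << dj) <= (uint64_t{1} << k)) {      // Eq. 28
        amin |= uint64_t{1} << di;
        bmin |= uint64_t{1} << dj;
    }
    sa = amax & ((uint64_t{1} << di) - 1);           // maximized low bits
    sb = bmax & ((uint64_t{1} << dj) - 1);
    if ((sa << dj) <= (sb << di)) amax |= uint64_t{1} << di;  // Eq. 29
    else                          bmax |= uint64_t{1} << dj;
}
// The h*_k extremes then follow from Eq. 30 with these assignments.
```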

7.3 Multiple Double-Unknown k-th Column Pairs

Lemma 2

Consider a sequence of integers \(t=(t_0, t_1, \dots , t_{T-1})\) that fulfills

$$\begin{aligned} \begin{aligned} \forall n \in [0, T-2] \; . \; |t_{n+1} - t_n| \le 2^k, t_0 + 2^k \le t_{T-1}. \end{aligned} \end{aligned}$$
(31)

Then,

$$\begin{aligned} \begin{aligned} \exists n \in [0, T-1] \; . \; (t_n \bmod 2^{k+1}) < 2^k. \end{aligned} \end{aligned}$$
(32)

Proof

Use Lemma 1 to transform the claim to equivalent

$$\begin{aligned} \begin{aligned} \exists v \in [\min {t}, \max {t}] \; . \; (v \bmod 2^{k+1}) < 2^k. \end{aligned} \end{aligned}$$
(33)

Since \([t_0, t_0 + 2^k] \subseteq [\min {t}, \max {t}]\), this claim is implied by

$$\begin{aligned} \begin{aligned} \exists v \in [t_0, t_0 + 2^k] \; . \; (v \bmod 2^{k+1}) < 2^k. \end{aligned} \end{aligned}$$
(34)

As \([t_0, t_0 + 2^k] \bmod 2^{k+1}\) has \(2^k+1\) elements and there are only \(2^k\) elements that do not fulfill \((v \bmod 2^{k+1}) < 2^k\), Eq. 34 holds due to the pigeonhole principle.    \(\square \)

Corollary 1

Given a sequence of integers \((t_0, t_1, \dots , t_{T-1})\) that fulfills the conditions of Lemma 2 and an arbitrary integer \(C \in \mathbb {Z}\), the lemma also holds for the sequence \((t_0+C, t_1+C, \dots , t_{T-1}+C)\).

Theorem 4

Let \(\hat{r}^{*,\text {best}}_k\) be the best abstract transformer of multiplication. Let \(\hat{a}\) and \(\hat{b}\) be such that there are \(p_1,p_2,q_1,q_2\) in \(\{0,\dots ,k\}\) where

$$\begin{aligned} \begin{aligned} p_1 \ne p_2, p_1+q_2=k, p_2+q_1=k,\\ \hat{a}_{p_1}=\hat{X}, \hat{a}_{p_2}=\hat{X}, \hat{b}_{q_1}=\hat{X}, \hat{b}_{q_2}=\hat{X}. \end{aligned} \end{aligned}$$
(35)

Then \(\hat{r}^{*,\text {best}}_k(\hat{a}, \hat{b})=\hat{X}\).

Proof

For an abstract bit-vector \(\hat{c}\) with positions of unknown bits \(u_1,\dots , u_n\), denote the concrete bit-vector \(c\in \gamma (\hat{c})\) for which \(\forall i\in \{ 1,\dots , n\} \; . \; c_{u_i}=s_i\) by \(\gamma _{s_1,\dots , s_n}(\hat{c})\). Let \(\varPhi _{s_1,\dots ,s_n}(\hat{c}){\mathop {=}\limits ^{\text {def}}}\varPhi (\gamma _{s_1,\dots , s_n}(\hat{c}))\).

Now, without loss of generality, assume that \(\hat{a}\) only has unknown values in positions \(p_1\) and \(p_2\), that \(\hat{b}\) only has unknown values in positions \(q_1, q_2\), and that \(p_1<p_2, q_1<q_2\). Then, for \(s_1,s_2, t_1, t_2 \in \mathbb {B}\), using \(h(a,b)=\varPhi (a) \cdot \varPhi (b)\),

$$\begin{aligned} \begin{aligned}&h(\gamma _{s_1,s_2}(\hat{a}), \gamma _{t_1, t_2}(\hat{b})) = (2^{p_1} s_1 + 2^{p_2} s_2 + \varPhi _{00}(\hat{a})) \cdot (2^{q_1} t_1 + 2^{q_2} t_2 + \varPhi _{00}(\hat{b})). \end{aligned} \end{aligned}$$
(36)

Define \(A{\mathop {=}\limits ^{\text {def}}}\varPhi _{00}(\hat{a})\) and \(B{\mathop {=}\limits ^{\text {def}}}\varPhi _{00}(\hat{b})\) and let them be indexable similarly to bit-vectors, i.e. \(A_{0..z} = {(A \mod 2^{z+1})}\), \(A_{z} = \zeta _z(A_{0..z})\). Define

$$\begin{aligned} \begin{aligned}&h^{\text {proof}}_k(\gamma _{s_1,s_2}(\hat{a}), \gamma _{t_1, t_2}(\hat{b})) {\mathop {=}\limits ^{\text {def}}}\\&\qquad \quad 2^{{p_1}+{q_1}} s_1 t_1+2^{{p_1}+{q_2}} s_1 t_2 + 2^{q_1} t_1 A_{0..p_2-1} + 2^{p_1} s_1 B_{0..q_2-1} \; + \\&\qquad \quad 2^{{p_2}+{q_1}} s_2 t_1 + 2^{{p_2}+{q_2}} s_2 t_2 + 2^{q_2} t_2 A_{0..p_1-1} + 2^{p_2} s_2 B_{0..q_1-1} + AB. \end{aligned} \end{aligned}$$
(37)

As \(A_{p_1}=A_{p_2}=B_{q_1}=B_{q_2}=0\), \(h^{\text {proof}}_k\) and h are congruent modulo \(2^{k+1}\). Define

$$\begin{aligned} \begin{aligned} D(s_1,s_2, t_1, t_2){\mathop {=}\limits ^{\text {def}}}h^{\text {proof}}_k(\gamma _{s_1,s_2}(\hat{a}), \gamma _{t_1, t_2}(\hat{b})) \; - \; h^{\text {proof}}_k(\gamma _{00}(\hat{a}), \gamma _{00}(\hat{b})). \end{aligned} \end{aligned}$$
(38)

As \(p_1+q_2=k\) and \(p_2+q_1=k\),

$$\begin{aligned} \begin{aligned} D(s_1,s_2, t_1, t_2) = \;&2^{{p_1}+{q_1}} s_1 t_1+2^k s_1 t_2 + 2^{q_1} t_1 A_{0..p_2-1} + 2^{p_1} s_1 B_{0..q_2-1} + \\&2^k s_2 t_1 + 2^{{p_2}+{q_2}} s_2 t_2 + 2^{q_2} t_2 A_{0..p_1-1} + 2^{p_2} s_2 B_{0..q_1-1}. \end{aligned} \end{aligned}$$
(39)

Set \(s_1, s_2, t_1, t_2\) to specific chosen values and obtain

$$\begin{aligned} \begin{aligned}&D(1,1,0,0)= D(1,0,0,0)+D(0,1,0,0),\\&D(0,0,1,1)= D(0,0,1,0)+D(0,0,0,1),\\&D(1,0,0,1)= 2^k+D(1,0,0,0)+D(0,0,0,1). \end{aligned} \end{aligned}$$
(40)

Inspecting the various summands, note that

$$\begin{aligned} \begin{aligned} D(1,0,0,0) \in [0,2^k-1], \; D(0,1,0,0) \in [0,2^k-1],\\ D(0,0,1,0) \in [0,2^k-1], \; D(0,0,0,1) \in [0,2^k-1],\\ D(1,1,0,0) - D(1,0,0,0) \in [0,2^k-1],\\ D(0,0,1,1) - D(0,0,1,0) \in [0,2^k-1]. \end{aligned} \end{aligned}$$
(41)

Recalling Eq. 10, the best abstract transformer can be obtained as

$$\begin{aligned} \begin{aligned}&0 \in \hat{r}^{\text {best}}_k \iff \exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; (h_k^{\text {proof}}(a,b) \bmod 2^{k+1})< 2^k,\\&1 \in \hat{r}^{\text {best}}_k \iff \exists a \in \gamma (\hat{a}), b \in \gamma (\hat{b}) \; . \; ((h_k^{\text {proof}}(a,b) + 2^k) \bmod 2^{k+1}) < 2^k. \end{aligned} \end{aligned}$$
(42)

Constructing a sequence of \(h^{\text {proof}}_k(\gamma _{s_1,s_2}(\hat{a}), \gamma _{t_1, t_2}(\hat{b}))\) that fulfills the conditions of Lemma 2 then implies that both inequalities can be fulfilled due to Corollary 1, which will complete the proof. Furthermore, as \(D(s_1,s_2,t_1,t_2)\) only differs from \(h^{\text {proof}}_k(\gamma _{s_1,s_2}(\hat{a}), \gamma _{t_1, t_2}(\hat{b}))\) by the absence of the summand AB, which does not depend on the choice of \(s_1,s_2,t_1,t_2\), it suffices to construct a sequence of \(D(s_1,s_2,t_1,t_2)\) that fulfills the conditions of Lemma 2 as well.

There is at least a \(2^k\) step between D(0, 0, 0, 0) and D(1, 0, 0, 1). They will form the first and the last elements of the sequence, respectively. It remains to choose the elements in their midst so that there is a step of at most \(2^k\) between successive elements.

Case 1

\(D(0,1,0,0) \ge D(0,0,0,1)\). Considering Eqs. 40 and 41, a qualifying sequence is

$$\begin{aligned} (D(0,0,0,0), D(1,0,0,0), D(1,1,0,0), D(1,0,0,1)). \end{aligned}$$
(43)

Case 2

\(D(0,1,0,0) < D(0,0,0,1)\). Using Eq. 39, rewrite the case condition to

$$\begin{aligned} 2^{p_2-p_1} D(1,0,0,0) < 2^{q_2-q_1} D(0,0,1,0). \end{aligned}$$
(44)

As \(p_1+q_2=k, p_2+q_1=k\), it also holds that \(q_2-q_1=p_2-p_1\). Rewrite the case condition further to

$$\begin{aligned} 2^{p_2-p_1} D(1,0,0,0) < 2^{p_2-p_1} D(0,0,1,0). \end{aligned}$$
(45)

Therefore, \(D(1,0,0,0) < D(0,0,1,0)\). Considering Eqs. 40 and 41, a qualifying sequence is

$$\begin{aligned} (D(0,0,0,0), D(0,0,1,0), D(0,0,1,1), D(1,0,0,1)). \end{aligned}$$
(46)

This completes the proof.   \(\square \)

7.4 Implementation Considerations

There are some considerations to be taken into account for an efficient implementation of the fast multiplication algorithm.

The first question is how to detect the positions of single-unknown and double-unknown k-th column pairs. As such pairs produce summands of the form \(2^k a_i b_{k-i}\), it is necessary to perform a bit reversal of one of the bit-vectors before bitwise logic operations can be used for position detection. Fortunately, it suffices to perform the reversal only once at the start of the computation. Defining the bit reversal of the first z bits of b as \(\lambda (b,z) = (b_{z-1-i})_{i=0}^{z-1}\), when the machine word size \(W \ge k+1\), reversal of the first \(k+1\) bits (i.e. the bits in \(b_{0..k}\)) may be performed as

$$\begin{aligned} \begin{aligned} \lambda (b,k+1) = ((b_{k-i})_{i=0}^{k}) = ((b_{W-1-i})_{i=W-k-1}^{W-1}) = \lambda (b,W)_{W-k-1..W-1}. \end{aligned} \end{aligned}$$
(47)

It is thus possible to precompute \(\lambda (b,W)\) and, for each k, obtain \(\lambda (b,k+1)\) via a right shift by \(W-k-1\) bits, which can be performed in constant time. Furthermore, bit reversals of power-of-two-sized words can be performed in logarithmic time on standard architectures [1, p. 33–35], which makes the computation of \(\lambda (b,W)\) even more efficient.
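For illustration, a standard logarithmic-time word reversal (in the spirit of the bit tricks referenced from [1]) combined with Eq. 47 might be implemented as follows in C++, assuming \(W = 64\):

```cpp
#include <cstdint>

// lambda(b, W): reverse all 64 bits by swapping ever-larger bit groups.
uint64_t reverse_word(uint64_t x) {
    x = ((x >> 1)  & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2)  & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8)  & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    return (x >> 32) | (x << 32);
}

// Precompute once per operation, then per Eq. 47, for each k:
//   uint64_t rev_b = reverse_word(b);            // lambda(b, W)
//   uint64_t lam_b = rev_b >> (64 - (k + 1));    // lambda(b, k+1), O(1) per k
```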

The second problem is finding out whether multiple double-unknown k-th column pairs exist, and if there is only a single one, determining its position. While that can be determined trivially in linear time, a find-first-set algorithm can also be used, which can be implemented in logarithmic time on standard architectures [1, p. 9] and is also typically implemented as a constant-time instruction on modern processors.
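Combining the reversal with a bitwise AND and a find-first-set yields the needed detection; a C++20 sketch follows (std::countr_zero is C++20, older compilers provide equivalent builtins; am and bm are the mask words of the inputs):

```cpp
#include <bit>
#include <cstdint>

// Positions i where (a_i, b_{k-i}) is a double-unknown k-th column pair;
// bits above k are shifted out to zero, so no extra masking is needed.
uint64_t pairs = am & (reverse_word(bm) >> (64 - (k + 1)));
if (pairs == 0) {
    // no double-unknown pair: Subsect. 7.2 applies directly
} else if ((pairs & (pairs - 1)) == 0) {
    const unsigned i = std::countr_zero(pairs);  // the single pair is (i, k-i)
} else {
    // multiple double-unknown pairs: output bit k is 'X' (Theorem 4)
}
```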

The third problem, computation of the \(h^*_k\) extremes in Eq. 30, is not as easily mitigated. This is chiefly because summands with coefficients above \(2^k\) are removed due to the congruence modulo \(2^{k+1}\). While typical processors contain a single-cycle multiplication operation, we have not found an efficient way to use it for computation of Eq. 25. To understand why this is problematic, computation of \(h^*_k\) with 3-bit operands and \(k=2\) can be visualised as

[Tableau of the \(h^*_2\) computation for 3-bit operands omitted: the summands \(2^{i+j} a_i b_j\) with \(i+j > 2\) are struck out, and the third-column summands \(2^2 a_i b_{2-i}\) are subtracted rather than added.]

The struck-out summands are removed due to the congruence modulo \(2^{k+1}\), while the k-th column pair summands are subtracted instead of added. These changes could be performed via modifications of a traditional multiplier implementation (resulting in a custom processor instruction), but they are problematic when only traditional instructions can be performed in constant time. Instead, we propose computing \(h^*_k\) via

$$\begin{aligned} \begin{aligned} h^*_k(a,b) = \sum _{i=0}^{k} a_i \left( - 2^k b_{k-i} + 2^i \varPhi (b_{0..k-i-1}) \right) . \end{aligned} \end{aligned}$$
(48)

As each summand over i can be computed in constant time on standard architectures, \(h^*_k(a,b)\) can be computed in linear time. Multiplication techniques with lower time complexity, such as Karatsuba multiplication or the Schönhage-Strassen algorithm [13], could also be considered, but they are unlikely to improve practical computation time when N corresponds to the word size of common microprocessors, i.e. \(N \le 64\).
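A direct C++ sketch of Eq. 48 follows; the function name is ours, and the running sum is assumed to fit into a signed 64-bit word (true for moderate k; a wider accumulator would be needed near \(k = 63\)):

```cpp
#include <cstdint>

// h*_k per Eq. 48: sum over the set bits a_i of
// (-2^k b_{k-i} + 2^i Phi(b_{0..k-i-1})), each in constant time.
int64_t h_star_k(uint64_t a, uint64_t b, unsigned k) {
    int64_t sum = 0;
    for (unsigned i = 0; i <= k; ++i) {
        if (!((a >> i) & 1)) continue;               // summand has factor a_i
        const int64_t col = int64_t((b >> (k - i)) & 1) << k;    // 2^k b_{k-i}
        const uint64_t low = b & ((uint64_t{1} << (k - i)) - 1); // Phi(b_{0..k-i-1})
        sum += -col + (int64_t(low) << i);
    }
    return sum;
}
```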

7.5 Fast Abstract Multiplication Algorithm

Applying the previously discussed improvements directly leads to Algorithm 2. For conciseness, in the algorithm description, bitwise operations are denoted by the corresponding logical operation symbol, shorter operands have high zeros added implicitly, and the bits of \(a^{\text {min}}, a^{\text {max}}, b^{\text {min}}, b^{\text {max}}\) above k are not used, so there is no need to mask them to zero.

[Algorithm 2: fast abstract multiplication algorithm; pseudocode omitted.]

Upon inspection, it is clear that the computation is dominated by the evaluation of \(h^{\text {min}}_{k}, h^{\text {max}}_{k}\) and that the worst-case time complexity is \(\varTheta (N^2)\), proving Theorem 2. Since the loops depend on M, which does not change when signed multiplication is considered (only N does), signed multiplication is expected to incur at most a factor-of-4 slowdown when 2N fits the machine word size, the possible slowdown occurring due to the splitting of the most significant bits of the multiplicands (discussed at the start of Sect. 7).

8 Experimental Evaluation

We implemented the naïve universal algorithm, the fast abstract addition algorithm, and the fast abstract multiplication algorithm in the C++ programming language, without using any parallelization techniques. In addition to successfully checking equivalence of the naïve and fast algorithm outputs for \(N \le 9\), we measured the performance of the algorithms on random inputs. The implementation and measurement scripts are available in the accompanying artifact [9].

To ensure result trustworthiness, random inputs are uniformly distributed and generated using a C++ standard library Mersenne twister before the measurement. The computed outputs are assigned to a volatile variable to prevent their removal due to compile-time optimization. Each measurement is taken 20 times and corrected sample standard deviation is visualised.

The program was compiled with GCC 9.3.0, in 64-bit mode and with the maximum speed optimization level -O3. It was run on the conference-supplied virtual machine on an x86-64 desktop system with an AMD Ryzen 1500X processor.

8.1 Visualisation and Interpretation

We measured the CPU time taken to compute outputs for \(10^6\) random input combinations for all algorithms for \(N \le 8\), visualising the time elapsed in Fig. 1. As expected, the naïve algorithm exhibits an exponential dependency on N, and the fast addition algorithm seems to always be better than the naïve one. The fast multiplication algorithm dominates the naïve one for \(N \ge 6\). The computation time of the naïve algorithm makes its usage for \(N \ge 16\) infeasible even if more performant hardware and parallelization techniques were used.

For the fast algorithms, we also measured and visualised the results up to \({N = 32}\) in Fig. 2. Fast addition is extremely quick for all reasonable input sizes and fast multiplication remains quick enough even for \(N = 32\). Fast multiplication results do not seem to exhibit a noticeable quadratic dependency. We consider it plausible that as N rises, so does the chance that there are multiple double-unknown k-th column pairs for an output bit and it is set to ‘X’ quickly, counteracting the worst-case quadratic computation time.

Finally, we fixed \(N = 32\), changing the independent variable to the number of unknown bits in each input, visualising the measurements in Fig. 3. As expected, the fast multiplication algorithm exhibits a prominent peak, with the easiest instances being all-unknown, as almost all output bits will be quickly set to ‘X’ due to multiple double-unknown k-th column pairs. Even at the peak, at around six unknown bits per input, the throughput is still above one hundred thousand computations per second, which should be enough for model checking usage.

In summary, while the naïve algorithm is infeasible for usage even with 16-bit inputs, the fast algorithms remain quick enough even for 32-bit inputs.

Fig. 1. Measured computation times for \(10^6\) random abstract input combinations.

Fig. 2. Measured computation time for \(10^6\) random abstract input combinations, fast algorithms only.

Fig. 3. Measured computation times for \(10^6\) random abstract input combinations with fixed \(N = 32\), while the number of unknown bits in each input varies.

9 Conclusion

We devised a new modular extreme-finding technique for the construction of fast algorithms which compute the best permissible three-valued abstract bit-vector result of concrete operations on three-valued abstract bit-vector inputs when the output is not restricted otherwise (the forward operation problem). Using the introduced technique, we presented a linear-time algorithm for abstract addition and a worst-case quadratic-time algorithm for abstract multiplication. We implemented the algorithms and evaluated them experimentally, showing that their speed is sufficient even for 32-bit operations, for which naïve algorithms are infeasibly slow. As such, they may be used to improve the speed of model checkers which use three-valued abstraction.

There are various research paths that could further the results of this paper. Lesser-used operations still remain to be inspected, most notably the division and remainder operations. Composing multiple abstract operations into one could also potentially reduce overapproximation. Most interestingly, the forward operation problem could be augmented with pre-restrictions on outputs, which would allow not only fast generation of the state space in forward fashion, but its fast pruning as well, allowing fast verification via state space refinement. Furthermore, verification of hardware containing adders and multipliers could be improved as well, e.g. by augmenting Boolean satisfiability solvers with algorithms that narrow the search space when such a structure is found.