1 Introduction

A major objective of the technology industry is to provide customers with small and fast devices that are, at the same time, energy-efficient. System designers therefore focus on three major aspects when designing a system: area, latency, and power consumption.

In terms of area, it is well known to researchers in the field of digital technology that the number of transistors that fit on an integrated circuit has risen steadily since the early 1960s thanks to technological advances, a trend known as Moore’s law [1]. This miniaturization trend has been slowing down and is expected to end in 2025 [2].

The question is how to address these three aspects when “simply” shrinking the transistors will no longer be possible in the foreseeable future. It turns out that many applications, especially in the domain of digital signal processing, do not require strictly correct computations [3]. This is because human perception itself is not perfect. In other situations, the customer might even be willing to accept incorrect results in favor of a faster, smaller, or less energy-hungry system [4].

A design paradigm known as approximate computing [5, 6] exploits this. The basic idea is to trade off computational accuracy for gains in nonfunctional aspects such as reduced area, lower latency, and lower power consumption.

The literature distinguishes two main approaches to introducing approximations into a design in order to achieve gains in one or more of the aspects mentioned above (see, e.g., [7]): (a) physical changes to the design, such as voltage over-scaling or overclocking, or (b) altering the functionality. We pursue the latter approach in this work. More precisely, we present a novel and fast approximate logic synthesis (ALS) technique whose optimization goal is the circuit area. We refer the reader to [8] for a general survey of ALS techniques.

In this work we propose an ALS method that (a) aims to minimize the area of an approximated circuit, (b) specifically targets arithmetic circuits, (c) operates on (X)AIG representations of Boolean functions, and (d) has a small execution time due to a fast method of evaluating the error introduced by the approximation.

2 Related Work

Approximate logic synthesis has been performed on many different representations of Boolean functions using very different means of approximation.

Initial work has been done by using the structural information of a given circuit, e.g., by cutting the carry chain of adders or multipliers [9].

On a higher level of abstraction, researchers extended programming languages [10] and hardware description languages [11] with constructs to automatically compile/synthesize approximated systems.

Preliminary work on approximating graph structures has mostly been done on BDDs [12,13,14]. In this work, we use the AIG data structure. The work closest to the one presented in this manuscript is [15], where the authors find cuts within an AIG that are replaced by approximations; the introduced error is bounded by a miter structure that is evaluated using SAT. While the authors of [14] work on BDDs, we employ their idea of exploiting properties of the approximation operation to speed up the computation of the error metric, and we also adopt their algorithmic approach for AIG approximation.

3 Background

3.1 Notation and Conventions

In this paper, all functions will be of type \(f:\mathbb {B}^n \rightarrow \mathbb {B}^m\). The m individual output functions are denoted by \(f_i\). The interpretation of f(x) as a natural number with the usual binary encoding is denoted by \( \operatorname {\mathrm {val}}(f(x))\).

For a function f with m = 1, i.e., a Boolean function, its ON/OFF-set is denoted by \( \operatorname {\mathrm {ON}}/ \operatorname {\mathrm {OFF}}(f)\), i.e.

$$\displaystyle \begin{aligned} \operatorname{\mathrm{ON}}(f) := \{ x \mid f(x) = 1\} ~\text{and}~ \operatorname{\mathrm{OFF}}(f) := \{ x \mid f(x) = 0\}. \end{aligned}$$

The size of an ON/OFF-set is denoted by \(\# \operatorname {\mathrm {ON}}(f)/\# \operatorname {\mathrm {OFF}}(f)\).

Given a Boolean function f, an approximated version of it is denoted by a hat, i.e., \(\hat {f}\). Primary inputs are labeled A, B, … and primary outputs X, Y, Z. Within this manuscript, the approximations do not alter the number of input or output variables.

The truth density \( \operatorname {\mathrm {td}}(f)\) of a function f is the ratio between the size of the ON-set of f and the total number of input assignments, i.e.

$$\displaystyle \begin{aligned} \operatorname{\mathrm{td}}(f)=\frac{1}{2^n}\cdot \#\operatorname{\mathrm{ON}}(f). \end{aligned}$$

The name stems from the fact that the truth density gives the probability of f being 1, i.e., true, for a uniformly distributed random input.
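For illustration, the following minimal Python sketch computes the truth density of a small Boolean function by exhaustive enumeration; the helper td and the example functions are illustrative only and not part of the presented tool.

```python
from itertools import product

def td(f, n):
    """Truth density of an n-input Boolean function f: fraction of the 2^n
    input assignments on which f evaluates to 1 (#ON(f) / 2^n)."""
    on = sum(1 for x in product((0, 1), repeat=n) if f(*x))
    return on / 2 ** n

# A 2-input AND is true on 1 of 4 assignments, an XOR on 2 of 4.
print(td(lambda a, b: a & b, 2))   # 0.25
print(td(lambda a, b: a ^ b, 2))   # 0.5
```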

3.2 (XOR-)AND-Inverter Graphs

To efficiently represent Boolean functions, many representations have been proposed. This work focuses on AND-Inverter Graphs (AIGs) [16] and XOR-AND-Inverter Graphs (XAIGs) [17]. Both structures are directed acyclic graphs. In both representations, nodes without incoming edges represent primary inputs, and nodes without outgoing edges represent primary outputs. For AIGs, the internal nodes represent the logical AND operation, whereas in an XAIG, the nodes can represent either the logical AND or the logical XOR operation. In both types of graphs, edges may be negated. We denote the size, i.e., the number of nodes, of an (X)AIG G by #G.

Example 1

Consider the addition of the two-bit numbers \((CA), (DB)\in \mathbb {B}^2\), i.e., (ZYX) = (CA) + (DB). An AIG representing the adder is shown in Fig. 1a. The nodes are AND operations, while dashed edges indicate negations. The output X is computed as

$$\displaystyle \begin{aligned} X= \underbrace{\neg \underbrace{\left(\neg B \wedge \neg A\right)}_{\mathrm{Node}6} \wedge \neg \underbrace{\left(A \wedge B\right)}_{\mathrm{Node}5}}_{\mathrm{Node}7}. {} \end{aligned} $$
(1)

Figure 1b shows an XAIG representing the same functionality, i.e., a two-bit adder. The gray nodes are XOR nodes. Note that the computation of X in Eq. (1) is actually an XOR operation, i.e., X = A ⊕ B. This is reflected in the XAIG by node 7, which completely represents the computation of X. This shows that XAIGs may save nodes compared to AIGs: the AIG uses the three nodes 5, 6, and 7 to represent the computation of X (see Fig. 1a).
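As a quick sanity check of Example 1, the following sketch (illustrative only, not part of the presented tool) evaluates the node functions of the AIG from Fig. 1a and confirms that node 7 indeed computes A ⊕ B.

```python
from itertools import product

# Node functions of the AIG in Fig. 1a (dashed edges correspond to negation).
node5 = lambda a, b: a & b                      # A AND B
node6 = lambda a, b: (1 - b) & (1 - a)          # NOT B AND NOT A
node7 = lambda a, b: (1 - node6(a, b)) & (1 - node5(a, b))   # Eq. (1)

# Node 7 computes the sum bit X = A XOR B for all four input assignments.
assert all(node7(a, b) == a ^ b for a, b in product((0, 1), repeat=2))
```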

Fig. 1

AIG and XAIG for a two-bit adder. (a) AIG for a two-bit adder computing (CA) + (DB) = (ZYX). Each node represents an AND operation. Dashed lines indicate negation. (b) XAIG representing the same functionality as the AIG in (a). The gray nodes are XOR nodes; the other nodes are AND nodes

While there is no one-to-one correspondence between the number of nodes in an (X)AIG and the resulting circuit size, the rule of thumb “fewer nodes lead to smaller circuits” often holds and is used within this work.

In this work, we expect the functionality to be approximated by ALS to be given as an AIG. Hence, instead of directly optimizing the circuit represented by the AIG for the area of an actual physical realization, we aim to minimize the number of AIG nodes.

3.3 Error Metrics

To evaluate systems in terms of the quality of the computed values, many different error metrics have been proposed. Each of these metrics measures different aspects of the approximated functionality (see [18] for an overview of commonly used metrics). Some examples of error metrics are

$$\displaystyle \begin{aligned} \operatorname{\mathrm{er}}(f, \hat{f}) &= \frac{1}{2^n}\cdot \sum_{x\in \mathbb{B}^n} \left[ f(x) \ne \hat{f}(x)\right], {} \end{aligned} $$
(2)
$$\displaystyle \begin{aligned} \operatorname{\mathrm{wce}}(f,\hat{f}) &= \max_{x\in\mathbb{B}^n} \lvert{\operatorname{\mathrm{val}}(f(x)) - \operatorname{\mathrm{val}}(\hat{f}(x))}\rvert, \, \text{and}{} \end{aligned} $$
(3)
$$\displaystyle \begin{aligned} \operatorname{\mathrm{whd}}(f,\hat{f}) & = \sum_{i=0}^{m-1} 2^i \sum_{x\in \mathbb{B}^n} \left(f_i(x) \oplus \hat{f}_i(x)\right).{} \end{aligned} $$
(4)

The error rate (Eq. 2) counts how often the approximated function \(\hat {f}\) computes an incorrect result (the bracketed condition evaluates to 1 if it holds and to 0 otherwise). This metric is not well suited for evaluating approximations of arithmetic circuits, as it completely ignores the actual function values and therefore does not capture how severe the errors are. Since this metric is rather simple to evaluate (or to estimate using Monte Carlo simulation), it is nevertheless often used in the literature. The worst-case error (Eq. 3) does take the values of f and \(\hat {f}\) into account and returns the largest error. The last error metric (Eq. 4) is a weighted variant of the Hamming distance, derived from the mean Hamming distance as presented in [19]. The weights \(2^i\) ensure that the bit position of an error is taken into account: errors in the more significant bits have a larger influence on the error value than errors in the less significant bits. The metrics (3) and (4) are well suited for arithmetic circuits.
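As an illustration of the three metrics, the following Python sketch evaluates them by exhaustive enumeration for a tiny example; the function names and the interface (output bits returned as a tuple, least significant bit first) are purely illustrative, and exhaustive evaluation is of course only feasible for very small n.

```python
from itertools import product

def to_val(bits):
    """Natural-number value of an output tuple (f_0, ..., f_{m-1}), LSB first."""
    return sum(b << i for i, b in enumerate(bits))

def error_metrics(f, f_hat, n):
    """Exhaustively evaluate er (Eq. 2), wce (Eq. 3), and whd (Eq. 4)."""
    er = wce = whd = 0
    for x in product((0, 1), repeat=n):
        y, y_hat = f(*x), f_hat(*x)
        er  += y != y_hat
        wce  = max(wce, abs(to_val(y) - to_val(y_hat)))
        whd += sum((yi ^ yh) << i for i, (yi, yh) in enumerate(zip(y, y_hat)))
    return er / 2 ** n, wce, whd

# Addition of two single bits: exact (sum, carry) vs. dropping the carry.
add  = lambda a, b: (a ^ b, a & b)
addx = lambda a, b: (a ^ b, 0)
print(error_metrics(add, addx, 2))   # (0.25, 2, 2)
```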

Example 2

Table 1 shows the truth table for a function f, an approximation \(\hat {f}\) of f, and the error metric values for the three error metrics introduced above.

Table 1 Error metric values for the error rate, the worst-case error, and the weighted Hamming distance for an exemplary function f and its approximated function \(\hat {f}\)

All these error metrics have in common that they are computationally expensive to determine [20], making iterative ALS techniques that rely on repeatedly evaluating an error metric infeasible. It is possible to accelerate the error metric computation when properties of the approximation operation on a specific data structure can be exploited [14]. In this work, we adopt the greedy bucket-based algorithm from [14] to operate on (X)AIGs and choose the weighted Hamming distance as our error metric. We will use the truth density propagation from [21] to quickly compute (an estimate of) \( \operatorname {\mathrm {whd}}\) (see Sect. 4.3).

4 Fast AIG Approximate Logic Synthesis

4.1 Bucket-Based Approximation Algorithm

We first describe the presented ALS technique, a bucket-based approximation algorithm, on a high level of abstraction before explaining the technical details in the following sections.

The main idea behind the algorithm is, given an AIG G, to define multiple buckets that contain approximations of G with fewer nodes than G. Each bucket has an error threshold, and only approximated AIGs whose \( \operatorname {\mathrm {whd}}\) error is less than this threshold are stored in the bucket. All buckets are sorted in ascending order of their thresholds. The algorithm iterates over the AIGs currently stored in the buckets and tries to further approximate them without exceeding the error threshold of the last bucket. When no further approximations are possible, the algorithm terminates and returns the buckets.

Algorithm 1: Fast approximate AIG synthesis

The returned buckets form the Pareto front for the two optimization criteria: the number of AIG nodes (which we use as a stand-in for the circuit’s area) and the weighted Hamming distance error metric.

The algorithm is depicted in Algorithm 1. In lines 1–3 the buckets are set up. They are initialized with copies of the AIG that is to be approximated, and the first bucket (having the smallest error threshold) is selected as the one whose AIG is approximated first. The algorithm runs as long as approximations have been performed (lines 4–20). For the current bucket, nodes and corresponding approximation operations that can be applied are found (line 6). Each of these approximations is applied (line 7), and the result is evaluated for whether it can be put into one of the buckets, i.e., whether there is a bucket containing an AIG with a larger error and more nodes (line 9). If that is the case, the corresponding bucket is updated (lines 11–12). If the updated bucket has a lower error threshold than the currently used bucket, this bucket is used in the next iteration (lines 13–14); otherwise, the next bucket is used (lines 17 and 19).
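Since the pseudocode of Algorithm 1 is not reproduced in this text, the following Python sketch is a free reconstruction of the bucket-based loop from the prose description above. All helper callables (candidate search, replacement application, error estimation, node count) are hypothetical placeholders, and details such as bucket selection and termination may differ from the actual implementation inside ABC.

```python
def bucket_als(aig, thresholds, find_candidates, apply_op, whd_estimate, size):
    """Greedy bucket-based ALS sketch. thresholds: whd bounds in ascending
    order; the helpers are assumed to be supplied by the synthesis framework."""
    buckets = [aig for _ in thresholds]    # best (smallest) AIG found per bucket
    errors  = [0.0 for _ in thresholds]    # its whd w.r.t. the original AIG
    current, changed = 0, True
    while changed:
        changed = False
        for node, op in find_candidates(buckets[current]):
            candidate = apply_op(buckets[current], node, op)
            error = whd_estimate(aig, candidate)
            # Store the candidate in the tightest bucket that tolerates its
            # error and currently holds a larger AIG.
            for i, bound in enumerate(thresholds):
                if error <= bound and size(candidate) < size(buckets[i]):
                    buckets[i], errors[i] = candidate, error
                    changed = True
                    # Continue with the updated bucket if it is tighter than
                    # the current one; otherwise move on to the next bucket.
                    current = i if i < current else min(current + 1, len(thresholds) - 1)
                    break
    return list(zip(buckets, errors))
```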

We implemented the proposed ALS method in the state-of-the-art logic synthesis tool ABC [22].

4.2 Approximation Operations

In this work, we make use of two different approximation operations, XOR replacement and constant replacement, as they can be efficiently implemented on the AIG data structure. After a replacement has been conducted, the structure of the AIG has changed, and new optimization rules may apply. Therefore, after each replacement, the AIG is optimized by ABC again.

Example 3

Replacing an input A of an AND node v (i.e., v represents A ∧ B) with the constant 0, for example, allows us to further replace the node v itself with the constant 0, as we have

$$\displaystyle \begin{aligned} A \wedge B \stackrel{\mathrm{replace}~A~\mathrm{with}~0}{\rightsquigarrow} 0 \wedge B = 0. \end{aligned} $$

XOR Replacement

The idea behind XOR replacement is to first identify nodes in the initial AIG G that form an XOR operation and then to replace them by a single node only.

In order to identify the nodes forming an XOR operation, the AIG G is transformed into an equivalent XAIG G′ (see step (a) in Fig. 2). This step is handled automatically by ABC. Note that the transformation does not necessarily replace all AND nodes by XOR nodes.

Fig. 2

Example of an XOR replacement: the AIG is transformed into an XAIG by identifying an XOR structure (step (a)), which is then replaced by a single node, yielding the approximated AIG (step (b))

The second step (see step (b) in Fig. 2) then replaces the found XOR node by a single AND node. Note that one or multiple edges in the graph might be negated in this process (see the outgoing edge of node 5 on the right of Fig. 2).

In order to find suitable replacements for the XOR node, we investigated the XOR behavior depending on the truth densities of the inputs of the XOR operation. Table 2 shows the replacements introducing the smallest error. We obtained the replacements via exhaustive testing.

Table 2 XOR Replacements based on the truth densities of A and B

As a tie breaker in the case when both NAND and OR are suitable replacements, we choose the NAND replacement when either \( \operatorname {\mathrm {td}}(A) > (1- \operatorname {\mathrm {td}}(B))\) or \( \operatorname {\mathrm {td}}(B) > (1- \operatorname {\mathrm {td}}(A))\) holds; otherwise, we replace the XOR node with an OR node.

Example 4

Consider the AIG on the left of Fig. 2 and assume \( \operatorname {\mathrm {td}}(A)=0.7\) and \( \operatorname {\mathrm {td}}(B)=0.5\). The nodes 3, 4, and 5 form an XOR operation and, hence, can be replaced according to Table 2. As both NAND and OR are valid replacements, we have to check the tie breaker to decide on the actual replacement. As we have \( \operatorname {\mathrm {td}}(B) = 0.5 > 0.3 = (1-0.7) = (1- \operatorname {\mathrm {td}}(A))\), the three nodes are replaced by a single NAND node.
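A minimal sketch of the tie breaker used in Example 4 is given below; since Table 2 is not reproduced here, the sketch only covers the case in which both NAND and OR are admissible, and the function name and interface are illustrative.

```python
def xor_tie_breaker(td_a, td_b):
    """Choose between NAND and OR when both are admissible replacements
    for an XOR node with input truth densities td_a and td_b."""
    if td_a > 1 - td_b or td_b > 1 - td_a:
        return "NAND"
    return "OR"

# Example 4: td(A) = 0.7, td(B) = 0.5  ->  NAND
print(xor_tie_breaker(0.7, 0.5))   # NAND
```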

Replacing any XOR node v in the AIG of a function f according to Table 2 yields an approximation \(\hat {f}\) where

$$\displaystyle \begin{aligned} \operatorname{\mathrm{ON}}(f_v) \subseteq \operatorname{\mathrm{ON}}(\hat{f}_v) \vee \operatorname{\mathrm{ON}}(\hat{f}_v) \subseteq \operatorname{\mathrm{ON}}(f_v) {} \end{aligned} $$
(5)

holds. Here \(f_v/\hat {f}_v\) is the function represented by the node v. Equation 5 describes over-/underapproximations, respectively. Note that the property in Eq. (5) holds only locally at the replaced node.

Constant Replacement

When a node in the AIG has a truth density close to either 0 or 1, it can be considered a constant 0 or 1 node. To make this decision, the user can specify a corresponding decision threshold. As long as this threshold is less than 0.5, i.e., the node v is replaced with the constant 0 only when \( \operatorname {\mathrm {td}}(v) < 0.5\) and with the constant 1 only when \( \operatorname {\mathrm {td}}(v) > 0.5\), the constant replacement operation also has the property in Eq. (5).

For this replacement operation, the AIG G does not need to be transformed into an XAIG.
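A minimal sketch of the constant-replacement decision under the assumptions just described; the function name, the interface, and the example threshold of 0.1 are illustrative.

```python
def constant_replacement(td_v, decision_threshold):
    """Decide whether a node with truth density td_v may be replaced by a
    constant; decision_threshold is the user-specified bound (< 0.5)."""
    if td_v < decision_threshold:
        return 0      # almost never true  -> underapproximation (ON-set shrinks)
    if td_v > 1 - decision_threshold:
        return 1      # almost always true -> overapproximation  (ON-set grows)
    return None       # truth density not close enough to either constant

print(constant_replacement(0.05, 0.1))   # 0
print(constant_replacement(0.97, 0.1))   # 1
print(constant_replacement(0.60, 0.1))   # None
```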

4.3 Fast Computation of the Weighted Hamming Distance

We review the definition of the weighted Hamming distance error metric from Eq. (4)

$$\displaystyle \begin{aligned} \operatorname{\mathrm{whd}}(f,\hat{f}) = \sum_{i=0}^{m-1} 2^i \cdot \left(\sum_{x\in \mathbb{B}^n} \left(f_i(x) \oplus \hat{f}_i(x)\right)\right) \end{aligned} $$
(6)

and note that the Hamming distance of the individual output functions \(f_i\) can be computed using the truth density as follows:

$$\displaystyle \begin{aligned} \operatorname{\mathrm{whd}}(f,\hat{f}) =\sum_{i=0}^{m-1} 2^i \cdot \left(2^n\cdot\lvert{\operatorname{\mathrm{td}}(f_i) - \operatorname{\mathrm{td}}(\hat{f}_i)}\rvert\right) =2^n\cdot \sum_{i=0}^{m-1} 2^i \cdot \lvert{\operatorname{\mathrm{td}}(f_i) - \operatorname{\mathrm{td}}(\hat{f}_i)}\rvert. {} \end{aligned} $$
(7)

For this equality to hold, the function \(\hat {f}\) must have been obtained by applying an approximation operation for which the property in Eq. (5) holds.
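The following sketch (illustrative only) evaluates Eq. (7) for a small example in which the approximation satisfies Eq. (5), dropping the carry bit of the addition of two single bits, and reproduces the exact \( \operatorname {\mathrm {whd}}\) of 2 for this example.

```python
from itertools import product

def td_bit(f, i, n):
    """Truth density of the i-th output bit of an n-input function."""
    return sum(f(*x)[i] for x in product((0, 1), repeat=n)) / 2 ** n

def whd_from_td(f, f_hat, n, m):
    """Eq. (7): whd via per-output truth densities; valid only when every
    output bit of f_hat over- or underapproximates the corresponding bit
    of f in the sense of Eq. (5)."""
    return 2 ** n * sum(2 ** i * abs(td_bit(f, i, n) - td_bit(f_hat, i, n))
                        for i in range(m))

# Dropping the carry bit is an underapproximation of output bit 1.
add  = lambda a, b: (a ^ b, a & b)
addx = lambda a, b: (a ^ b, 0)
print(whd_from_td(add, addx, n=2, m=2))   # 2.0, identical to the exact whd
```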

Example 5

Consider the two approximations \(\hat {\text{X}}\) and \(\tilde {\text{X}}\) shown in the truth table in Table 3. For the approximation operation yielding \(\hat {\text{X}}\), property (5) holds, i.e., we have \( \operatorname {\mathrm {ON}}({\text{X}}) \subset \operatorname {\mathrm {ON}}(\hat {\text{X}})\). This property does not hold for the approximation \(\tilde {\text{X}}\). Computing the \( \operatorname {\mathrm {whd}}\) using Eq. (7) shows that the over-/underapproximation property is crucial:

$$\displaystyle \begin{aligned} \operatorname{\mathrm{whd}}(\text{X}, \hat{\text{X}}) &= 2^2\cdot \lvert{0.50 - 0.75}\rvert = 4 \cdot 0.25= 1\\ \operatorname{\mathrm{whd}}(\text{X}, \tilde{\text{X}}) &= 2^2\cdot \lvert{0.50 - 0.50}\rvert = 4 \cdot 0.00 = 0\\ \end{aligned} $$
Table 3 Approximating X using operations for which the over-/underapproximation property Eq. (5) does hold (\(\hat {\text{X}}\)) and does not hold (\(\tilde {\text{X}}\))

The value \( \operatorname {\mathrm {whd}}(\text{X}, \tilde {\text{X}})=0\) is clearly incorrect.

The advantage of computing \( \operatorname {\mathrm {whd}}\) using Eq. (7) instead of using the initial definition of Eq. (4) is that the actual time necessary to determine the value can be greatly reduced if the computation of the truth densities can be done quickly. We will see in Sect. 4.4 how this is possible.

As the proposed ALS method (see Algorithm 1) is an iterative approach, many \( \operatorname {\mathrm {whd}}\) values have to be computed during a synthesis run. When the total number of inputs does not exceed 16, the AIG can be fully evaluated and exact results can be computed. When the AIG grows beyond this, the truth density and, hence, the \( \operatorname {\mathrm {whd}}\) are computed iteratively by determining the \( \operatorname {\mathrm {whd}}\) locally for the approximated node only. We then adopt an additive model that accumulates the locally computed errors until the output node is reached. Since this additive model does not take into account that consecutive errors might cancel each other out (a situation also known as error masking), it leads to an overestimation of the total error. The upside of this simplification is that it allows for a very fast estimation of the total \( \operatorname {\mathrm {whd}}\).

4.4 Truth Density Computation

So far, we used the truth density values of all (X)AIG nodes without considering how to actually compute them. In this work, we make use of two different means of obtaining the truth densities of the nodes.

The first means of obtaining the truth density is to directly use ABC. The tool estimates the truth density values of the nodes by running a number of simulations of the graph, i.e., evaluating the graph for a given number of randomly generated inputs. The quality of the result greatly varies with the number of simulations and, hence, the time one is willing to spend on the estimation.

As the computation of the densities is crucial for both the decision of which node to replace and the computation of the weighted Hamming distance error metric, we chose to use the error propagation method presented in [21]. While its intended use is to propagate the error rate through a general Boolean network, it can easily be applied to our use case, as (X)AIGs are nothing but specific Boolean networks and the truth density is already computed by the approach as a “by-product.”

The speed of the approach from [21] stems from the fact that it does not perform full simulations of the (X)AIG but instead computes the truth density using symbolic variables. It should be noted that the computed densities are only exact when there is no reconvergent fanout in the (X)AIG. Nevertheless, extensive tests have shown that the degradation of the results in the presence of reconvergence is negligible.
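To make the propagation idea concrete, the following sketch shows the standard signal-probability propagation through an (X)AIG under the assumption that the fanins of every node are statistically independent. This is not the exact formulation of [21], and the node encoding is invented for the example; applied to nodes 5–7 of Fig. 1a, it also illustrates why reconvergent fanout makes the result inexact.

```python
def propagate_td(nodes, input_td):
    """Propagate truth densities through a topologically ordered list of
    (name, kind, fanin0, neg0, fanin1, neg1) nodes, assuming independent
    fanins (exact only without reconvergent fanout)."""
    td = dict(input_td)
    for name, kind, a, na, b, nb in nodes:
        pa = 1 - td[a] if na else td[a]          # negated edges flip the density
        pb = 1 - td[b] if nb else td[b]
        td[name] = pa * pb if kind == "AND" else pa + pb - 2 * pa * pb  # AND / XOR
    return td

# Nodes 5-7 of Fig. 1a with uniformly distributed inputs, td(A) = td(B) = 0.5:
nodes = [("n5", "AND", "A", False, "B", False),
         ("n6", "AND", "A", True,  "B", True),
         ("n7", "AND", "n5", True, "n6", True)]
print(propagate_td(nodes, {"A": 0.5, "B": 0.5})["n7"])   # 0.5625, exact value is 0.5
```

Because nodes 5 and 6 share the inputs A and B, their outputs are not independent, which is exactly the reconvergence situation in which the propagated density (0.5625) deviates from the exact truth density (0.5) of X = A ⊕ B.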

5 Experimental Evaluation

5.1 Experimental Setup

We implemented the proposed ALS technique in the state-of-the-art logic synthesis tool ABC [22] using the probabilistic error propagation tool from [21].

As benchmark circuits, generic n-bit adders and multipliers as well as the EvoApproxLibLITE library [18] are used.

Instead of directly specifying the \( \operatorname {\mathrm {whd}}\) value of the buckets, we define a threshold t ∈ [0, 1] that reflects how large the error in the most significant bit of the output is allowed to be in percent. This allows us to define buckets that capture similar error behavior for circuits of different size, i.e., one does not have to (manually) compute different bucket values for an 8-bit and a 16-bit multiplier. A threshold value t can be translated into an estimate of the \( \operatorname {\mathrm {whd}}\) via \( \operatorname {\mathrm {whd}}(f,\hat {f}) \approx t\cdot 2^n\cdot 2^{m-1}\).
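For illustration, this translation can be written as follows; the chosen dimensions of an 8-bit unsigned adder (n = 16 inputs, m = 9 output bits) are an assumption made only for the example.

```python
def bucket_whd_bound(t, n, m):
    """whd bucket bound estimated from a threshold t in [0, 1], i.e., the
    allowed fraction of the maximum error in the most significant output bit."""
    return t * 2 ** n * 2 ** (m - 1)

# 8-bit unsigned adder: n = 16 inputs, m = 9 output bits.
print(bucket_whd_bound(0.0625, 16, 9))   # 1048576.0
```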

All experiments were executed on an AMD Ryzen 5 3600XT 6-Core CPU with 3.80-GHz and 16-GB memory running Ubuntu 20.04 in WSL 2 on Windows 10 Build 19044.1645.

5.2 Scalability

To assess the scalability of our approach, we performed approximate logic synthesis on adders of increasing bit width using 5 buckets with threshold values 0.0156, 0.03125, 0.0625, 0.125, and 0.25. In the experiments, it turned out that the error propagation implementation has a memory leak preventing it from being used for AIGs with more than ≈ 300 nodes. Therefore, the following results were obtained using ABC’s simulation method.

The results of the synthesis runs are presented in Table 4. For each bit width, the results for each bucket are listed in a separate line. The approximated AIGs were converted to a list of logic gates using ABC. Afterward, the area and delay have been computed by ABC using the mcnc.genlib gate library. For each physical aspect (the number of gates, the area, and the delay), we present the reduction or increase in percent next to the absolute values in the table. We further report the \( \operatorname {\mathrm {whd}}\) for the AIGs.

Table 4 ALS results for adders of varying bit width. For each bucket, the number of AIG nodes, the number of gates, the area, the delay, and the \( \operatorname {\mathrm {whd}}\) are reported. For the number of nodes/gates, the area, and the delay, the relative change with respect to the unapproximated AIG is also shown

Using the number of AIG nodes as a stand-in for the circuit area works well: the reduction in nodes is qualitatively reflected in the reduction in the number of gates and the reduction of the area. As can be seen, the goal of optimizing circuits for area has been achieved. It is interesting to see that the reduction remains in the range of ≈ 65%–75% for threshold values up to 0.125. Only after allowing for a 25% weighted error in the most significant bit are further size reductions achieved.

While our method is capable of reducing the area, it does, in turn, increase the delay of the circuit (usually to ≈ 112%–125% of the original delay). This value, again, drops when a larger threshold is used. As we do not explicitly optimize for delay, this is an acceptable trade-off.

As can be seen, the actual \( \operatorname {\mathrm {whd}}\) values for the buckets increase with increasing bit width of the adders. This shows that choosing a means to describe buckets that abstracts away the bit width is helpful.

5.3 Multi-Objective Optimization for Area and \( \operatorname {\mathrm {whd}}\)

The benchmark library EvoApproxLibLITE [18, 23] provides a selection of approximate adders and multipliers. They have been synthesized via exhaustive search with respect to various error metrics (including \( \operatorname {\mathrm {er}}\) and \( \operatorname {\mathrm {wce}}\)) as well as area and power consumption. The benchmark set does not evaluate the \( \operatorname {\mathrm {whd}}\) error metric.

As the final buckets of the presented approach form the Pareto front of the multi-objective optimization problem with the optimization criteria \( \operatorname {\mathrm {whd}}\) and area, we compare our results with the exhaustive results from EvoApproxLibLITE. We select the adders and multipliers from the benchmark set that form the Pareto front with respect to area and the \( \operatorname {\mathrm {wce}}\) error metric and compute the \( \operatorname {\mathrm {whd}}\) values for them so that we can compare the benchmark circuits to our results. The circuits optimized for this metric were chosen as \( \operatorname {\mathrm {wce}}\) does take into account the order of the output bits, and, therefore, the corresponding circuits allow for the fairest comparison.

The comparison for 8-bit unsigned adders and multipliers is shown in Fig. 3. The notation for our circuits is as follows: “add8u_b75” refers to an unsigned 8-bit adder from the bucket with a threshold of 0.75. For EvoApproxLibLITE, the naming scheme is of the form “add8u_〈ID〉” and directly taken from their website.

Fig. 3

Comparison of the synthesis results of the proposed approach (dark blue dots) and EvoApproxLibLITE (red squares) with respect to area and the weighted Hamming distance error metric for (a) 8-bit unsigned adders and (b) 8-bit unsigned multipliers

For the adders (Fig. 3a), the proposed ALS method clearly produces better results than EvoApproxLibLITE. These results can be explained in part by the fact that EvoApproxLibLITE optimized for a different error metric and in part by the fact that the computation of the sum bits in an adder is essentially a large XOR structure. When looking at the results for the multipliers (Fig. 3b), one can see that the applied approximations no longer result in points close to the Pareto front. When investigating which approximation operations have been chosen by the proposed ALS algorithm (see Table 5), one can see that the ratio of XOR replacements to constant replacements is higher for the adder than for the multiplier. This further hints that XOR replacement is well suited for adders, while multipliers do not benefit as much from this particular kind of approximation.

Table 5 Number of XOR/Constant replacements for 8-bit adders and multipliers

5.4 Truth Density Computation

To investigate the difference in execution time between the ABC simulation-based truth density estimation and the method from [21], we synthesized adders of increasing bit width using 5 buckets with threshold values 0.0156, 0.03125, 0.0625, 0.125, and 0.25. The results are reported in Fig. 4. As can be seen, the error propagation approach clearly excels with respect to the execution time. Due to the memory leak issue (see Sect. 5.2), we cannot show results for larger circuits. While using ABC for the error estimation is already fast, using error propagation shows great potential to further accelerate our proposed ALS technique.

Fig. 4
Execution time of the proposed ALS method in seconds for adders of varying bit width, comparing the ABC simulation-based truth density estimation with the error propagation from [21]

6 Conclusion and Outlook

We presented a novel and fast greedy, bucket-based approximate logic synthesis technique working on AIGs that simultaneously aims to minimize the area of the resulting circuit and the error introduced by the approximations. We chose the weighted Hamming distance error metric \( \operatorname {\mathrm {whd}}\) to assess the functional quality of the circuit, as it takes into account the order of the output bits. We found a means of efficiently computing \( \operatorname {\mathrm {whd}}\) by computing truth densities and exploiting properties of the used approximation operations. The effectiveness of the presented method has been evaluated in a set of experiments.

The next step is to fix the memory leak in the fast error propagation tool to further enhance the speed of the proposed method.