Keywords

1 Introduction

The design of electronic circuits is usually a complex task which requires knowledge of specific methodologies. The use of evolutionary algorithms (EAs) to design digital systems gave rise to the Evolvable Hardware (EH) field [9]. The reasons which brought EH to the spotlight of hardware development are its ability to [9]: (i) reach novel architectures which the conventional methods would hardly provide due to their non-flexible nature, (ii) find fair solutions for problems where the specifications are incomplete, (iii) deliver fair solutions in scenarios where there is not a perfect solution, (iv) achieve fault tolerance at the hardware level, and (v) to make the design less dependent on the expert.

EH has provided good results as in [4, 7, 9], but EH faces issues such as the representation of solutions, as long chromosomes are usually required to encode complex circuits which lead to large search spaces and, consequently, increased difficulty in solving the problem. In addition, in the design of combinational digital circuits (CDCs), the processing time grows exponentially with the number of circuit inputs, and EH also uses techniques which are very time consuming [7].

Addressing the design of CDCs, energy efficiency, complexity and delay are features of digital circuits to be analyzed during the manufacturing process once people desire faster, simpler and energetically efficient devices. Approximate Computing (AC), a new paradigm in electronic projects, explores systems which could tolerate loss (i.e. precision) in order to reduce complexity, costs, delay and to increase the energy efficiency of the systems [4, 5]. AC can be found in inherently error resilient situations [10], such as multimedia [5], machine learning [11], approximate arithmetic circuits [4], and FIR and IIR approximate filters [9].

The usage of AC and EH in the context of evolutionary design has gained attention as these approaches may lead to the conception of circuit architectures which are different and might be superior to the designs created by specialists [9]. In this scenario, digital circuits obtained via AC are classified as approximate digital circuits (ADCs) [10]. Their requirements are relaxed aiming at achieving savings in energy consumption, delay, and complexity. As it may be necessary to increase the complexity of a target circuit to reduce errors, energy efficiency and delay would probably be degraded as these quantities are conflicting. Thus, a multiobjective optimization problem arises [4]. A variety of trade-offs between error, delay, and energy efficiency can be found by a multiobjective approach, enabling the design of a vast number of ADCs. Applying the evolutionary approach in the scenario of intrinsic evolutionary design (e.i. in which the evolutionary process is conducted in the target device) could bring new possibilities as the final solution is already implemented in hardware.

Cartesian Genetic Programming (CGP) [7] is a genetic programming method in which programs are expressed as directed acyclic graphs (DAGs) with their nodes organized in a matrix [7]. CGP allows for a convenient representation when several inputs and outputs are required and, consequently, has become the most popular method in the evolutionary design of CDCs [7].

Here, we propose a technique to design approximate combinational digital circuits (ACDC) based on CGP where the size of the population varies according to the number of non-dominated solutions. One candidate solution starts the search process, the population is allowed to grow up to a maximum number of non-dominated solutions, and, when the population size exceeds a predefined threshold, the candidate solutions with the lowest crowding distances [3] are eliminated. We study here the trade-off between the delay, output error, and power dissipation when designing approximate digital arithmetic circuits. In particular, 8-bit adder (A8) and 8-bit multiplier (M8) are commonly used in the context of AC and EH [4]. Thus, we included them in computational experiments. In addition, the Arithmetic Logic Unit (ALU) is considered here. The results found are compared to those obtained by other methods from the literature.

2 Related Work

Vasicek et al. [10] developed ADCs in which the requirements on functional equivalences between the specifications and implementations were relaxed leading to gains in the speed of computation, area occupied on a chip, and energy consumption. That approach can also be used in multimedia and image compression applications. For instance, most users would not notice variations in the brightness degree in some pixels of an image [5].

Hrbacek et al. [4] proposed a multiobjective approach to design ADCs using CGP to represent candidate circuits, and the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) [3] to explore the search space by analyzing the trade-off between error, delay and power dissipation. The initial population was composed of fully functional circuits, instead of random circuits, and approximate versions of M8 and S8 were designed with significant power consumption savings.

Kaufmann et al. [6] proposed a local search algorithm called hybrid evolutionary strategy (hES) based on the evolutionary strategy and the concepts of Pareto dominance. The hES method uses the Fast non-dominated sort and Crowding-distance of NSGA-II [3], and alternates its evolutionary process using global and local search. That method and its periodization with NSGA-II was compared with Strength Pareto Evolutionary Algorithm 2 (SPEA2) and NSGA-II when designing 2-bit multiplier and adder. CGP’s representation was adopted, the functional quality of the solutions was treated as a constraint, and circuit area and delay were the objectives. The authors concluded that for the evolution of digital circuits which uses CGP as a representation model, hES and its periodization are significantly better than NSGA-II and SPEA2.

Fig. 1.
figure 1

(extracted from [4]).

Illustration of the representation used in CGP

3 Cartesian Genetic Programming

Cartesian Genetic Programming (CGP) [7], is an evolutionary technique where a circuit is represented by a grid of \(n_{r} \times n_{c}\) interconnected nodes, as in Fig. 1. The processing elements, represented by F, can be either elementary gates (e.g. AND, OR, XOR) or functional level components (e.g. adders, comparators, shifters, and multipliers). Nodes inputs can be connected either to one of the \(n_{i}\) primary inputs or to a node located in the previous l columns; l is defined as “levels-back”, which controls the connectivity of the graphs. Each node has a fixed number of inputs, \(N_{ni}\), and outputs, \(N_{no}\). The nodes are able to perform one of the functions from the set \(\varGamma \) (\(F \in \varGamma \)). Each one of the \(n_{o}\) primary outputs can be connected either to a primary input or to an output of a node. A candidate circuit represented by CGP is described by a sequence of integers which represents the functions and the connections between the nodes. The encoding of a chromosome consists of \(n_{r} \times n_{c}\) triplets (\(i_{1}\), \(i_{2}\), F). Thus, for the case considered here where the gates have two inputs, it is possible to represent a node with its input indices \(i_{1}\) and \(i_{2}\) (other nodes), and a function \(\psi \). The last part of GP’s encoding contains \(n_{o}\) integers which specify the primary output nodes of the circuit.

In the standard CGP, the population consists of a fixed number of individuals, commonly, randomly initialized. Also, some previous knowledge of the problem can be used. The optimization process when designing circuits is normally an evolutionary strategy \((1 + \lambda )\)-ES, where \(\lambda \) offspring are created by the application of a point mutation (i.e. \(\mu _{r}\)% of genes are modified) to the current solution. These \(1 + \lambda \) individuals are evaluated and the best one survives to the next generation. The process is repeated until a stop criterion is reached.

3.1 Objective Functions

Four objectives (to be minimized) are often considered in the design: area, error, power dissipation, and propagation delay. They are described in the sequence.

Area. Analyzing the area of a digital circuit is relevant as it affects the manufacturing cost. The larger the circuit c, the larger the area of the printed circuit board necessary for its implementation. The area occupied by c can be estimated by counting its processing elements (e.g., transistors, elementary ports, functional level components). Here, the area of c is defined as its number of gates.

Error Functions. The Hamming distance between two binary words is often adopted as the error measure in the evolutionary design of digital circuits [4]. Let \(O^{(i)}_{orig}\) and \(O^{(i)}_{app}\) denote, respectively, the output binary word of a fully functional digital circuit and that of an ADC generated for the input vector i. The Hamming distance is defined as \(d_{h} = \sum _{\forall i} (O^{(i)}_{orig} \oplus O^{(i)}_{app})\), where \(\oplus \) is XOR, i.e., the number of positions in which the bits of the two binary words differ from each other.

Power Dissipation. The main sources of power dissipation of a digital circuit are [2]: switching component of power (\(P_{switching}\)), short-circuit component of power (\(P_{short-circuit}\)), and the leakage current component of power (\(P_{leakage}\)). The power dissipation is then computed by \(P = a_{0\rightarrow 1}.C_{L}.f_{clk}.V_{dd}^{2} + I_{sc}.V_{dd} + I_{leakage}.V_{dd}\), where \(C_{L}\) is the load capacitance, \(f_{clk}\) is the clock frequency, \(a_{0\rightarrow 1}\) is the node transition activity factor, \(I_{sc}\) is the short circuit current, \(I_{leakage}\) is the leakage current, and \(V_{dd}\) is the supply voltage. The node transition factor quantifies the average number of times a logic gate makes a state transition that dissipates power within a period of clock. It can be defined as \(a_{0\rightarrow 1} = p_{0}.p_{1} = p_{0}.(1 - p_{0})\).

Propagation Delay. The delay of a digital circuit c, the time spent for the changes in the input cause any effects in the output, is defined here as \(D_{c}\) and is calculated as the delay of the longest path according to \(D_{c} =\max _{\forall p \in path}\sum _{c_{i}\in p} t_{d}(c_{i})\), where \(t_{d}\) is the delay of a cell \(c_{i}\) and \(t_{d}\) is normally provided by the manufacturers.

4 Non-dominated Sorting Genetic Algorithm II

Most of the multiobjective EAs (MOEA) are based on the Pareto dominance concept which states that a solution \(\mathbf {x_{1}}\) dominates a solution \(\mathbf {x_{2}}\) if: (i) \(\mathbf {x_{1}}\) is no worse than \(\mathbf {x_{2}}\) in all objectives; and (ii) \(\mathbf {x_{1}}\) is strictly better than \(\mathbf {x_{2}}\) in at least one objective. NSGA-II [3] has two main features: non-dominated ranking and crowding distance. In the non-dominated ranking, the individuals in the set \(R_{t}\) –parent population (\(P_{t}\)) plus offspring population (\(Q_{t}\))– are sorted according to their dominance. All Pareto optimal solutions, which are the feasible non-dominated solutions, form the first front \(F_{1}\). The non-dominated solutions from \(R_t \setminus F_{1}\)Footnote 1 compose the second front \(F_{2}\), and so on. A rank corresponding to the front index is assigned to each individual, i.e., i is given to the solutions in \(F_{i}\). The individuals in the first fronts form \(P_{t+1}\).

However, as the parent population size is \(|R_{t}|/2\), where \(|\cdot |\) denotes the cardinality of a set, one can determine i such that \(|F_{1} \cup F_{2} \cup \dots \cup F_{i-1}| \le |R_{t}|/2\) and \(|F_{1} \cup F_{2} \cup \dots \cup F_{i}| > |R_{t}|/2\). Thus, some individuals in \(F_i\) can not be included in the next population, as the population size is fixed. The crowding-distance is calculated as the semi-perimeter of the hyperrectangle formed by the values of adjacent neighbor of each candidate solution in the objectives space and it is adopted by NSGA-II to select the candidate solutions in less populated regions.

5 The Proposed Method

The idea here is to design ACDC, considering multiple objectives, using CGP. Most of the MOEAs, such as NSGA-II, maintain a population of fixed size during the evolutionary process; the larger the population, the larger the number of trade-offs between the different objectives analyzed, as a larger population allows for better coverage of the search space. As a consequence, the chances of premature convergence is reduced. However, the computational cost for each iteration of the method associated with this population may be significant. In contrast, a small population leads to a coarser coverage, which can result in the non-exploration of promising areas of the search space increasing the probability that the algorithm gets stuck at a local optimum [8]. An attractive alternative is thus to permit that the size of the population varies during the search process. For this, we propose here the variation of the number of individuals in the population from one to a (user defined) maximum value. The proposed approach uses CGP to represent the candidate circuits, builds a set of Pareto solutions observing the trade-off between key circuit parameters and, when the population size becomes larger than \(Tam\_Max\), the crowding distance from NSGA-II [3] is applied to select individuals. Algorithm 1 presents a pseudocode of the proposal.

figure a

Initially, the population of parents (\(P_{t}\)) is initialized with a single (randomly generated) individual and the population of offspring (\(Q_{t}\)) is empty. The proposal evolves the candidate circuits while the stop criterion is not met. During the search, the parent and offspring populations are combined into a single population \(R_{t} = P_{t} \cup Q_{t}\). Then, the non-dominated individuals from \(R_{t}\), \(F_{1}\), are selected to compose the population \(P_{t}\). Thus, the size of the population \(P_{t}\) is not fixed, as the number of non dominated solutions may vary. Crowding distance [3] is applied when the population size is larger than a user-defined threshold (\(Tam\_Max\)). This is used in order to avoid the population size to become very large. Crowding distance can be used to preserve the diversity of the solutions and is calculated as half of the perimeter of the cuboid formed by the nearest neighbor with respect to the objective function values. The individuals are sorted in descending order (Crowded-comparison-operator-sort) and the first ones are selected to compose \(P_t\). Finally, new candidate circuits are generated using point mutation (Mutate(P\(_{i}\))), the most commonly used mutation operator in CGP [7]. In this context, one allele of a gene is randomly selected and its value is replaced by another valid value chosen randomly. The point mutation rate (\(\mu _{r}\)%) is a parameter that affects the number of genes mutated.

To deal with constrained problems, the proposed method uses the Constrained NSGA-II approach [3]. In this context, it is said that a solution \(\mathbf {x_{1}}\) dominates a solution \(\mathbf {x_{2}}\) if any of the following conditions is true: (i) \(\mathbf {x_{1}}\) is feasible and \(\mathbf {x_{2}}\) is not; (ii) \(\mathbf {x_{1}}\) and \(\mathbf {x_{2}}\) are both infeasible, but \(\mathbf {x_{1}}\) has a smaller overall constraint violation; and (iii) \(\mathbf {x_{1}}\) and \(\mathbf {x_{2}}\) are both feasible and \(\mathbf {x_{1}}\) dominates \(\mathbf {x_{2}}\).

6 Computational Experiments

The experiments were conducted in the extrinsic evolutionary scenario (search occurs in software) using the truth tables of w-bit (\(w = 2, 8\)) adder and multiplier, and an ALU circuits. The problems are labeled here as Aw and Mw, respectively, for adder and multiplier circuits, and they can be used to construct filtering structures, such as Finite Impulse Response Filter (FIR filter) and Infinite Impulse Response Filter (IIR Filter). These filters are quite important in digital signal processing such as image processing, video processing, audio processing, and wireless communication systems [5]. Also, adders and multipliers are used in circuits dedicated to calculate hyperbolic and trigonometric functions, such as the Coordinate Rotation Digital Computer (CORDIC) [1]. The Ripple Carry Adder and Ripple Carry Array Multiplier were adopted as reference architectures for the 2- and 8-bit adder and multiplier, respectively. Finally, the architecture of ALU is important for digital signal processing, and it is present in all computing devices such as microprocessors, computers and embedded systems. The SN54/74LS181 architectureFootnote 2, a 4-bit ALU which can perform 16 logic operations and also a diversity of arithmetic operations, was adopted here. The output is encoded by a 8-bit word, 4 bits are reserved for each variable, and 6 bits are used as controllers (logical/arithmetical operator). The data provided by a current manufacturer of digital devices is used to calculate power dissipation and delay (\(V_{dd}\), \(C_{L}\) and \(t_{d}\) in Sect. 3.1). NexperiaFootnote 3 was selected as its gates have low \(t_{d}\) values. The gates are: 74LVC1G08 (AND), 74LVC1G08 (NAND), 74LVC1G32 (OR), 74LVC1G86 (XOR), 74LVC1G02 (NOR), HEF4077B (XNOR). These circuits are important but simples, as we conducted the experiments as a preliminary evaluation of the proposed approach.

The proposed CGPMO+APSFootnote 4 was compared to hES and its variant with periodization (hn\(^{10}\)) [6], and the CGP combined with NSGA-II [4]. The experiments were divided into three scenarios: (i) constrained multiobjective optimization (no domain information added), (ii) the design of ACDCs, (no domain information added); and (iii) the design of ACDCs with information from the domain specialist.

6.1 Scenario 1 – Constrained Multiobjective Optimization

Here, S2 and M2 are the problems, the Hamming distance is the measure of functional quality (a constraint in this scenario), and the area and delay are the objective functions to be minimized. The computational experiments were conducted as in [6], and the results obtained by the proposed CGPMO+APS are compared to those found by hES and hn\(^{10}\). The initial population of the CGPMO+APS is composed by 1 individual randomly generated, \(\varGamma = \{0, 1, a, b, \overline{a}, \overline{b}, a.b, a.\overline{b}, \overline{a}.b, \overline{a}.\overline{b}, a \oplus b, a \oplus \overline{b}, a +b, a +\overline{b}, \overline{a}+b, \overline{a}+\overline{b}, a .\overline{c}+b.c, a .\overline{c}+\overline{b}.c, \overline{a}.\overline{c}+b.c, \overline{a}. \overline{c}+\overline{b}.c\}\) is the set of Boolean functions that can be executed by processing nodes, \(\mu _{r} = 5\%\), \(n_{r} = 1\), \(n_{c} = 200\), \(l = 200\), 400.000 fitness evaluations were allowed for S2, and 1.600.000 fitness evaluations for M2. Finally, 20 independent runs were performed.

In [6], the Additive Epsilon Indicator (AEI) and the Kruskal-Wallis non-parametric test were used. Here we do not present a statistical comparison using the results of CGPMO+APS as the results of each independent run were not provided in [6]. However, the capacity of generating feasible circuits with various combinations of area and delay was analyzed in [6]. Thus, we provide here a comparison in terms of feasible circuits generated by the techniques considered and these values are presented in Table 1. In this scenario we analyze the number of independent runs which resulted in functionally correct solutions. The results indicate that the proposed CGPMO+APS obtained better results in terms of fully functional circuits when compared to the other methods, as it is the only technique which found functionally correct circuits in all 20 independent runs.

Table 1. Number of independent runs with functionally correct solutions.

6.2 Scenario 2 – Approximate Design of CLCs from Scratch

The present set of experiments is composed of three objective functions commonly employed when designing ADCs in the context of EH: (i) Hamming distance, (ii) delay, and (iii) power dissipation. The circuits S8, M8, and ALU were used as problems to be solved. The approach proposed in [4], labeled here as CGP+NSGA-II, was implemented and its results are used in the comparative analysis. Both CGP+NSGA-II and the proposed CGPMO+APS used the same parameter settings: \(\varGamma = \{\text {AND, NAND, OR, XOR, NOR, XNOR} \}\), \(\mu _{r} = 5\%\), \(n_{r}=1\). The number of columns were \(l = n_{c} = 300, 200, 1000\) for the ALU, S8, and M8, respectively, and these values are those used in [4]. Also, the population size of CGP+NSGA-II is equal to 50.

Table 2. Parameters of circuits with the lowest error values when the population is randomly initialized and \(30\times 10^6\) circuit evaluations are allowed. The reference models (with error = 0) are the ALU SN54/74LS181 with delay = 22.22 ns and power = 1.14 mW, the Ripple Carry Adder with delay = 25.70 ns and power = 0.50 mW, and the Ripple Carry Array Multiplier with delay = 74.30 ns and power = 3.21 mW. Here, error is the percentage of incorrect outputs values, and delay and power are the ratio of the values observed in the circuits generated and those of the reference architecture.

Here we analyze the results obtained by CGPMO+APS and CGP+NSGA-II when the initial population was randomly generated and \(30\times 10^{6}\) objective function evaluations were allowed. Table 2 presents the operational characteristics of 10 candidate solutions for ALU, S8, and M8 with the smallest errors considering all independent runs. The techniques analyzed here were not able to design fully functional ALUs, S8, and M8 from scratch. It can also be noticed that no circuits reached delay values lower than or equal to the delay of the reference circuits. The ALU models obtained by CGPMO+APS show a smaller level of error than those found by CGP+NSGA-II. The (10) ALUs designed by CGPMO+APS and presented in Table 2 dissipate less power and have a smaller delay than those (10) obtained by CGP+NSGA-II. Regarding S8, the circuits generated dissipate less power than the reference architecture. Among the S8 circuits, that with the largest power dissipation (\(79.63\%\) of the power dissipated by the reference architecture) was found by CGPMO+APS and it is superior, in terms of energy efficiency, to all 10 circuits that present the lowest error values found by CGP+NSGA-II. The error values of the S8 circuits obtained by CGPMO+APS and CGP+NSGA-II are similar. It is noted that the 10 M8 models with the smallest error values obtained by both approaches dissipate less power than the Ripple Carry Array Multiplier. Also, the M8 circuits found by CGPMO+APS dissipate less power and have smaller delay than those obtained by CGP+NSGA-II.

Table 3 presents the hypervolume found by CGPMO+APS and CGP+NSGA-II, and the p-values of the Kruskal-Wallis tests. CGPMO+APS presented the highest mean hypervolume values for all circuits analyzed. According to the Kruskal-Wallis test, the results found by the proposed CGPMO+APS are statistically different from those obtained by CGP+NSGA-II for S8 and ALU.

6.3 Scenario 3 – Approximate CLCs Using Conventional Models

Besides the design of ADCs from scratch, CGPMO+APS and CGP+NSGA-II were also applied to optimize conventional architectures by observing the trade-off between error, power, and delay. As a result, the techniques provide a set of ADCs to the specialist, who can choose the best for his/her application. Also, \(30 \times 10^{6}\) circuit evaluations are allowed here.

The operational features of 10 circuits with the smallest errors for ALUs, S8, and M8 are shown in Table 4. CGPMO+APS and CGP+NSGA-II were able to obtain fully functional circuits in terms of error, and these circuits dissipate less power or present lower delay values than those of the base architecture for all the cases tested (ALU, S8, and M8). CGPMO+APS generated an ALU with lower values of delay and power dissipation than those of the conventional architecture. Also, with respect to S8 and M8, the fully functional circuits generated by the proposed method are better in delay or power dissipation. The results obtained by CGPMO+APS are, in general, better than those found by CGP+NSGA-II. This advantage is specifically large with respect to the delay values of ALU.

Table 3. Hypervolume values. The reference points are the highest values obtained for each objective function: (0.5221, 192.3000, 0.5861), (0.4722, 291.7000, 0.4865), and (0.5257, 374.5000, 2.6115), respectively, for ALU, S8, M8. Kruskal-Wallis test is applied and the p-values are also presented.
Table 4. Parameters of circuits with the lowest error values when the population is initialized with a conventional architecture and \(30\times 10^6\) circuit evaluations are allowed. The reference models (with error = 0) are those in the caption of Table 2.

Table 5 presents the hypervolume found by CGPMO+APS and CGP+NSGA-II. CGPMO+TP presented the highest mean values of hypervolume for all circuits analyzed. According to the Kruskal-Wallis test, the results of CGPMO+APS are statistically better than those of CGP+NSGA-II for all the tested circuits.

Table 5. Hypervolumes values and p-values of the Kruskal-Wallis test. The reference points are: (0.4284, 165.8000, 1.1348), (0.4722, 74.4000, 0.6369), and (0.5353, 82.3000, 3.2135), respectively, for ALU, S8, and M8.

7 Concluding Remarks and Future Work

A multiobjective Cartesian Genetic Programming technique with adaptive population size (CGPMO+APS) was proposed here to design approximate combinational digital circuits (ACDCs). Three situations were considered in the experiments: (i) constrained multiobjective optimization (no domain information added), (ii) the design of ACDCs (no domain information added); and (iii) the design of ACDCs with information from the domain specialist.

In (i), the results indicated that CGPMO+APS was able to design more feasible circuits than the other approaches analyzed. In (ii) and (iii), the results show that both CGPMO+APS and CGP+NSGA-II are not suitable for the construction of complex architectures such as ALUs, S8, and M8 without the introduction of the knowledge of the domain expert. On the other hand, when a conventional architecture is adopted as the initial solution, the two approaches synthesized ALUs, S8, and M8 that do not present errors with respect to the truth table and with improvement in delay or power dissipation when compared to the reference architecture. Particularly, the CGPMO+APS obtained a fully functional ALU with lower delay and power dissipation than those of the reference architecture. Also, CGPMO+APS obtained better mean values of hypervolume than those of CGP+NSGA-II.

The use of Binary Decision Diagrams to reduce the processing time and solving more complex problems are relevant research avenues.