INTRODUCTION

The hardware implementation of binary addition is a fundamental architectural component in microprocessor systems such as microprocessors, digital signal processors, mathematical coprocessors, microcontrollers, mobile devices, etc. In these systems, in the design of operating devices, combiners play an important role in performing many computer arithmetic operations based on addition. In this case, the increase in the performance of operating devices depends on the efficiency of the adder. Developing an efficient adder that delivers high performance is a pressing challenge.

Currently, almost all modern computers in critical paths use parallel-prefix adders (PPAs) [1]. They are highly efficient in terms of speed and are used to quickly add two multidigit binary numbers. The literature describes several PPAs with a different occupied area and maximum delay time [2]. PPAs include Sklansky, Kogge–Stone, Brent–Kung, Han–Carlson, and Ladner–Fischer adders. The Kogge-Stone adder is considered to be the fastest, since it has the smallest propagation delay time in the circuit implementation among these PPAs [3, 4].

The aim of this study is to develop a multibit modified PPA to reduce the hardware and time costs in comparison with the well-known Kogge–Stone adder.

KOGGE–STONE ADDER

The Kogge–Stone adder is considered the fastest standard PPA, which adds two n-bit numbers \(A = {{a}_{{n - 1}}}{{a}_{{n - 2}}}.....{{a}_{0}},\;B = {{b}_{{n - 1}}}{{b}_{{n - 2}}}.....{{b}_{0}},\) and forms an n‑bit result, \(S = {{s}_{{n - 1}}}{{s}_{{n - 2}}}.....{{s}_{0}}\) and output transfer \({{c}_{{out}}}.\) The Kogge–Stone adder circuits for 8 and 32 bits are shown in Figs. 1 and 2. First the circuits compute the carry generation signals \({{g}_{i}}\) and half-sum bits \({{h}_{i}}\) for bit pairs by logical equations, and \({{g}_{i}} = {{a}_{i}} \cdot {{b}_{i}};\;{{h}_{i}} = {{a}_{i}} \otimes {{b}_{i}},\) then for nodes using the expression (gi:k, pi:k) = (gi:j + pi:jGj–1:k, pi:jpj–1:k) until the carry generation signal ci is not be known for each digit. The sum si is determined by signals hi and ci–1 according to si = hici–1 through the Exclusive OR element. The Kogge–Stone adder is described in more detail in [5, 6].

Fig. 1.
figure 1

Logic diagram of the 8-bit Kogge–Stone adder.

Fig. 2.
figure 2

32-bit Kogge–Stone adder circuit.

The Kogge–Stone adder is faster than the Sklansky, Brent–Kung, Han–Carlson, and Ladner–Fischer adders [7, 8]. However, for the hardware implementation, the Kogge–Stone adder requires more logic elements, hence, an increased area, consumes more power, and is more expensive. Nevertheless, the Kogge–Stone adder is used in high-performance applications, since performance is the most important determining factor.

DEVELOPMENT OF A MODIFIED PPA

In order to reduce the hardware and time costs, a modified version of the PPA has been developed, which uses its own prefix tree structure. Assume that A = an–1an–2a0 and B = bn–1bn–2b0 are two binary numbers that will be added up, and S = sn–1sn–2s0 and cout represent their sum and output carry. The idea of designing a modified adder begins with the logic of generating a propagation carry with a parallel-prefix carry. To calculate the generation g and propagation p of the transfer, we assume that

$${{g}_{i}} = {{a}_{i}} \cdot {{b}_{i}},\,\,\,\,{{p}_{i}} = {{a}_{i}} + {{b}_{i}},$$
(1)

where i = 0 ≤ in – 1; and the symbols “·” and “+” denote the logical operations AND and OR, respectively.

The calculation of the normal carries of the adder is performed as

$$\begin{gathered} {{c}_{i}} = {{g}_{i}} + {\text{ }}{{p}_{i}} \cdot {{g}_{{i-}}}_{1} + {\text{ }}{{p}_{i}} \cdot {{p}_{{i-}}}_{1} \cdot {{g}_{{i-}}}_{2} \\ + {\text{ }}...{\text{ }} + {\text{ }}{{p}_{i}} \cdot {{p}_{{i-}}}_{1} \cdot {{p}_{{i-}}}_{2} \cdot {{ \ldots }} \cdot {\text{ }}{{p}_{1}} \cdot {{g}_{0}}. \\ \end{gathered} $$
(2)

According to definition (1), it is assumed that piGi = gi. Then we can perform identical transformations similarly to (2):

$$\begin{gathered} {{c}_{i}} = \left( {{{g}_{i}} + {\text{ }}{{g}_{{i-}}}_{1} + {\text{ }}{{p}_{{i-}}}_{1} \cdot {{g}_{{i-}}}_{2}} \right. \\ \left. { + {\text{ }}... + {{p}_{{i-}}}_{1} \cdot {{p}_{{i-}}}_{2} \cdot \ldots \cdot {{p}_{1}} \cdot {{g}_{0}}} \right)~{{p}_{i}}. \\ \end{gathered} $$
(3)

After defining the term of variable ki, Eq. (3) can be written as

$${{c}_{i}} = {{k}_{i}} \cdot {{p}_{i}},$$
(4)
$$\begin{gathered} {\text{where}}\,\,{{k}_{i}} = {\text{ }}{{g}_{i}} + {\text{ }}{{g}_{{i-}}}_{1} \\ + \,\,{{p}_{{i-}}}_{1} \cdot {{g}_{{i-}}}_{2} + {\text{ }}...{\text{ }} + {{p}_{{i-}}}_{1} \cdot {{p}_{{i-}}}_{2} \cdot \ldots \cdot {{p}_{1}} \cdot {{g}_{0}}. \\ \end{gathered} $$
(5)

From (4) it follows that the bit switching activity ci equally depends on the values received by bits pi of the propagation carry, and values ki. In particular, for k5 according to (5) we have

$$\begin{gathered} {{k}_{5}} = {{g}_{5}} + {{g}_{4}} + {{p}_{4}} \cdot {{g}_{3}} + {{p}_{4}} \cdot {{p}_{3}} \cdot {{g}_{2}} \\ + \,\,{{p}_{4}} \cdot {{p}_{3}} \cdot {{p}_{2}} \cdot {{g}_{1}} + {{p}_{4}} \cdot {{p}_{3}} \cdot {{p}_{2}} \cdot {{p}_{1}} \cdot {{g}_{0}}. \\ \end{gathered} $$

Using the definition pigi = gi, Eq. (6) can be transformed:

$$\begin{gathered} {{k}_{5}} = {{g}_{5}} + {{g}_{4}} + {{p}_{4}} \cdot {{p}_{3}}~\left( {{{g}_{3}} + {{g}_{2}}} \right) \\ + \,{{p}_{4}} \cdot {{p}_{3}} \cdot {{p}_{2}} \cdot {{p}_{1}}~\left( {{{g}_{1}} + {{g}_{0}}} \right). \\ \end{gathered} $$
(7)

We assume that

$${{G}_{i}} = {\text{ }}{{g}_{i}} + {\text{ }}{{g}_{{i-}}}_{1},\,\,\,\,{{P}_{i}} = {\text{ }}{{p}_{i}} \cdot {{p}_{{i-}}}_{1},$$
(8)

then Eq. (7) is equivalent to the equation

$${{k}_{5}} = {{G}_{5}} + {{P}_{4}} \cdot {{G}_{3}} + {{P}_{4}} \cdot {{P}_{2}} \cdot {{G}_{1}}.$$

The value of signal k5 can be formed using the prefix associative operator ° according to the following expression:

$${{k}_{5}} = {\text{ }}\left( {{{G}_{5}},{{P}_{4}}} \right)^\circ \left( {{{G}_{3}},{{P}_{2}}} \right)^\circ \left( {{{G}_{1}}} \right).$$

Thus, an equation is obtained that realizes the value k5. In the case of an 8-bit adder, the remaining values k7k0 are calculated similarly:

$$\left. \begin{gathered} {{k}_{7}} = {{G}_{7}} + {{P}_{6}} \cdot {{G}_{5}} + {{P}_{6}} \cdot {{P}_{4}} \cdot {{G}_{3}} + {{P}_{6}} \cdot {{P}_{4}} \cdot {{P}_{2}} \cdot {{G}_{1}} \hfill \\ {{k}_{6}} = {{G}_{6}} + {{P}_{5}} \cdot {{G}_{4}} + {{P}_{5}} \cdot {{P}_{3}} \cdot {{G}_{2}} + {{P}_{4}} \cdot {{P}_{3}} \cdot {{P}_{1}} \cdot {{G}_{0}} \hfill \\ {{k}_{5}} = {{G}_{5}} + {{P}_{4}} \cdot {{G}_{3}} + {{P}_{4}} \cdot {{P}_{2}} \cdot {{G}_{1}} \hfill \\ {{k}_{4}} = {{G}_{4}} + {{P}_{3}} \cdot {{G}_{2}} + {{P}_{3}} \cdot {{P}_{1}} \cdot {{G}_{0}} \hfill \\ {{k}_{3}} = {{G}_{3}} + {{P}_{2}} \cdot {{G}_{1}} \hfill \\ {{k}_{2}} = {{G}_{2}} + {{P}_{1}} \cdot {{G}_{0}} \hfill \\ {{k}_{1}} = {{G}_{1}} \hfill \\ {{k}_{0}} = {{G}_{0}} \hfill \\ \end{gathered} \right\},$$

where P1 = p1 and G0 = g0.

The received bits ki can be represented in the prefix associative operator form:

$$\left. \begin{gathered} {{k}_{7}} = \left( {{{G}_{7}},{{P}_{6}}} \right) \circ \left( {{{G}_{5}},{{P}_{4}}} \right) \circ \left( {{{G}_{3}},{{P}_{2}}} \right) \circ \left( {{{G}_{1}}} \right) \hfill \\ {{k}_{6}} = \left( {{{G}_{6}},{{P}_{5}}} \right) \circ \left( {{{G}_{4}},{{P}_{3}}} \right) \circ \left( {{{G}_{2}},{{P}_{1}}} \right) \circ \left( {{{G}_{0}}} \right) \hfill \\ {{k}_{5}} = \left( {{{G}_{5}},{{P}_{4}}} \right) \circ \left( {{{G}_{3}},{{P}_{2}}} \right) \circ \left( {{{G}_{1}}} \right) \hfill \\ {{k}_{4}} = \left( {{{G}_{4}},{{P}_{3}}} \right) \circ \left( {{{G}_{2}},{{P}_{1}}} \right) \circ \left( {{{G}_{0}}} \right) \hfill \\ {{k}_{3}} = \left( {{{G}_{3}},{{P}_{2}}} \right) \circ \left( {{{G}_{1}}} \right) \hfill \\ {{k}_{2}} = \left( {{{G}_{2}},{{P}_{1}}} \right) \circ \left( {{{G}_{0}}} \right) \hfill \\ {{k}_{1}} = \left( {{{G}_{1}}} \right) \hfill \\ {{k}_{0}} = \left( {{{G}_{0}}} \right) \hfill \\ \end{gathered} \right\}.$$
(9)

After analyzing (9), by induction we can obtain that the bits ki are even, and odd bit positions can be expressed as

$${{k}_{{2m}}} = \left( {{{G}_{{2m}}},{{P}_{{2m-1}}}} \right)^\circ \left( {{{G}_{{2m-2}}},{{P}_{{2m-3}}}} \right)^\circ \ldots ~^\circ \left( {{{G}_{0}}} \right),$$
(10)
$${{k}_{{2m{\text{ }} + 1}}} = \left( {{{G}_{{2m + 1}}},{{P}_{{2m}}}} \right)^\circ \left( {{{G}_{{2m-1}}},{{P}_{{2m-2}}}} \right)^\circ \ldots ^\circ \left( {{{G}_{1}}} \right).$$
(11)

Each associative operator links pairs of signals G and P and is defined as follows:

$$\left( {G,P} \right)^\circ ~~\left( {G{\kern 1pt} ',P{\kern 1pt} '} \right) = \left( {G + P \cdot G{\kern 1pt} '} \right),\,\,\,\,\left( {P \cdot P{\kern 1pt} '} \right).$$
(12)

Taking into account the operator °, the calculations of signal ki with a parallel prefix can be represented as a prefix tree of the modified adder. Computing bits ki and pi instead of the normal translations ci makes it harder to get the bits of the final sum si since, in this case,

$${{s}_{i}} = {{a}_{i}} \oplus {{b}_{i}} \oplus {{c}_{{i-}}}_{1} = {{a}_{i}} \oplus {{b}_{i}} \oplus {{k}_{{i-}}}_{1}{{p}_{{i-}}}_{1}.$$

After the identical transformations (aibi) we get

$$\begin{gathered} {{s}_{i}} = \left( {\overline {{{a}_{i}}} + \overline {{{b}_{i}}} } \right)\left( {{{a}_{i}} + {{b}_{i}}} \right) \oplus {{c}_{{i - 1}}} \\ = \overline {{{a}_{i}}{{b}_{i}}} \left( {{{a}_{i}} + {{b}_{i}}} \right) \oplus {{k}_{{i - 1}}}{{p}_{{i - 1}}}{\text{,}} \\ \end{gathered} $$
(13)

where the symbols – and ⊕ denote the logical operations NOT and Exclusive OR. According to (1) at \(\overline {{{a}_{i}}{{b}_{i}}} = \overline {{{g}_{i}}} \) and ai + bi = pi, in order to realize the final sum si expression (13) can be represented as

$${{s}_{i}} = \overline {{{g}_{i}}} \cdot {{p}_{i}} \oplus {{k}_{{i - 1}}} \cdot {{p}_{{i - 1}}}.$$

However, computing bits si can be transformed in the following identical way:

$${{s}_{i}} = \overline {{{k}_{{i - 1}}}} \left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) + {{k}_{{i - 1}}}\left( {\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) \oplus {{p}_{{i - 1}}}} \right).$$
(14)

Equation (14) can be implemented using a multiplexer that chooses either \(\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right)\) or \(\left( {\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) \oplus {{p}_{{i - 1}}}} \right)\) according to value ki–1. As a rule, the Exclusive OR gate has an almost equal multiplexer delay and propagation delay time from the input of the input operands to the establishment of the function’s result \(\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) \oplus {{p}_{{i - 1}}}\) is less than before the establishment of the signal ki–1. Therefore, the excess delay will not appear in the adder circuit due to the multiplexer for the output of the final sum si. The output carryover cout is formed almost simultaneously with the bits of the sum using the relation cn–1 = kn–1pn–1. Thus, according to expressions (1), (8), (10), (11), and (14), the modified adder is implemented by a three-stage circuit consisting of a precomputation stage, a prefix tree formation stage, and a result calculation stage. The logic diagram of the 8-bit modified adder is shown in Fig. 3.

Fig. 3.
figure 3

Logic circuit of an 8-bit modified adder.

This scheme works as follows. The precomputing stage generates signals gi and pi for all categories ai and bi using the logical elements AND and OR in accordance with (1). After combining bits gi, pi, gi–1, and pi–1, signals Gi and Pi–1 are calculated based on (8) using the logical elements AND and OR. The second stage adder, or prefix tree, calculates signals ki using bits Gi and Pi–1 in accordance with (10) and (11). At the stage of generating the result, the adder calculates sum si based on (14) using the elements NOT, AND, Exclusive OR, and the multiplexer. The output carryover cout is formed using the ratio cn–1 = kn–1pn–1.

To design a 64-bit modified adder circuit, schematic blocks (black and white) are implemented using (12), which are used for greater clarity of the prefix tree. These blocks take inputs from both the given bit position (l–1Gi, l–1Pi–1), and from the lower bit positions \({{(}^{{l-}}}^{1}G_{i}^{{{\kern 1pt} '}}{{,}^{{l-}}}^{1}P_{{i - 1}}^{{{\kern 1pt} '}})\) at the previous level. They are connected to form signals ki. The network of these blocks is called the prefix tree. Figure 4 shows a schematic of a 64-bit modified adder.

Fig. 4.
figure 4

64-bit modified adder circuit.

This circuit calculates signals gi and pi for pairs of input bits, then calculates signals Gi and Pi–1 to create a prefix tree. Then log2n – 1 = 5 levels of schematic blocks is used to form a prefix tree that calculates signals ki for each digit. After that, at the last stage, the result of addition si and the output transfer cout, together with signals ki are calculated.

COMPARISON OF HARDWARE AND TIME COSTS A OF PPA

Let us estimate the hardware and time costs of a PPA in terms of the occupied area and maximum delay. To estimate the area occupied by the adder, it is necessary to sum the areas of all the used logic elements. For the maximum delay, the longest (critical) path along which the signal passes is chosen, and the delay time of all logic elements along this path is added [9]. Assume that αNOT and τNOT are the occupied area and the delay of the NOT element, respectively; αAND and τAND are the AND element; αOR and τOR are the OR element; αexc and τexc are the Exclusive OR element; and αm and τm are the multiplexer.

After analyzing the structures of the considered PPAs, we obtain the formulas for calculating the occupied area and the maximum delay:

— for the n-bit Kogge–Stone adder

$$\begin{gathered} {{\alpha }_{{{\text{KS}}}}} = \left( {2n\left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right) - 2n + 3} \right){{\alpha }_{{{\text{AND}}}}} \\ + \left( {n\left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right) - n + 1} \right){{\alpha }_{{{\text{OR}}}}} + \left( {2n - 1} \right){{\alpha }_{{{\text{exc}}}}}, \\ \end{gathered} $$
$${{\tau }_{{{\text{KS}}}}} = \left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right){{\tau }_{{{\text{AND}}}}} + \left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right){{\tau }_{{{\text{OR}}}}} + 2{{\tau }_{{{\text{exc}}}}}{\text{;}}$$

— and for the for n-bit modified adder

$$\begin{gathered} {{\alpha }_{{{\text{MOD}}}}} = n{{\alpha }_{{{\text{NOT}}}}} + \left( {\left( {n{\text{lo}}{{{\text{g}}}_{2}}n} \right) + n} \right){{\alpha }_{{{\text{AND}}}}} \\ + \,\,\left( {2n - 1 + \frac{n}{2}\left( {{\text{lo}}{{{\text{g}}}_{2}}n - 1} \right)} \right){{\alpha }_{{{\text{OR}}}}} \\ + \,\,\left( {n - 1} \right){{\alpha }_{{{\text{EXC}}}}} + \left( {n - 2} \right){{\alpha }_{{\text{m}}}}{\text{,}} \\ \end{gathered} $$
$${{\tau }_{{{\text{mod}}}}} = \left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right){{\tau }_{{{\text{AND}}}}} + \left( {{\text{lo}}{{{\text{g}}}_{2}}n} \right){{\tau }_{{{\text{OR}}}}} + {{\tau }_{{\text{m}}}}.$$

To quantify the delay, assume that τNOT = 0.5, τAND = τOR = 1 conventional units, and τexc = τm= 2. To estimate the area, we assume that αNOT = 0.5, αAND = αOR = 1, and αexc = αm= 2. The obtained values of the occupied area and the maximum delay (operating time) of the PPA for 8, 16, 32, and 64 bits are given in Table 1.

It can be seen from the Table 1 that when increasing the bit depth from the point of view of hardware and time costs, the modified adder is characterized by better performance and a smaller footprint than the Kogge–Stone adder. In particular, the footprint of the 32-bit and 64-bit modified adders is 11 and 16.5% less, respectively, than that of the standard Kogge–Stone adder. For 32 and 64 bits, the proposed adder gives a decrease in the maximum delay by almost 7% compared to the Kogge–Stone adder.

Table 1.   Evaluation of the occupied area and the maximum delay of adders

DEVELOPMENT OF A SCHEME FOR CHECKING THE RESULTS OF THE MODIFIED ADDER

To confirm the reliability of the modified adder’s operation, a scheme for checking the reliability of the results is proposed (Fig. 5). The circuit contains an 8‑bit modified adder, an 8-bit standard Kogge–Stone adder, a 16-bit synchronous totalizer, a 9-bit digital equality comparator, a four-input AND gate, and two inverters. The synthesis of an adding counter and an equality comparator is considered in [10].

Fig. 5.
figure 5

Circuit for checking the results of the modified adder’s operation.

The circuit has inputs—clock signals (clk), start (start), and reset (reset)—and outputs—a 16-bit number (q[15…0]), the finish (Done), and detected error (Error). The input start is used to count or stop the counter (1, count; 0, stop). The input reset controls the operation of the counter (1 to run, 0 to reset). The calculated counter values are expressed by the output q[15…0]. The signal Done shows that all possible argument values have been successfully tested as input arguments for two adders. If the signal Done = 1, then the process of checking the results was successful; i.e., all the results of the modified adder are correct. The signal Error indicates that an error has been detected in some result of the modified adder compared to the result of the reference adder. If there is an error, then the output Error = 1. Otherwise the signal Error = 0.

Let us consider how the circuit works. Suppose the output signal E of the comparator and inversion signal’s \(\overline {{{p}_{{out}}}} \) output carry counter are 1. In this case reset = 1 and start = 1, while clk enters the clock input of the counter through the AND element. Then the counter calculates all 216 = 65 536 possible values for a 16-bit binary number, and the transition to the next value occurs on the rising edge of the clock pulse. The calculated values of the operands for the counters are defined as q15q8 for the first term (operand A) and q7q0 for the second term (operand B). The received operands are supplied to the corresponding inputs of the modified and reference adders. These adders perform addition on the input operands in different ways and send 9-bit sums to the outputs. The comparator then compares the two 9-bit binary sums from the adders and provides one output signal indicating whether they are equal or not. If the corresponding digits are equal, then signal E = 1 and the process is repeated until the finish is reached. If any discharges S1 and S2 are unequal, the signal Error = 1 and the verification process stops. When the output carry counter Pout is logical 1, then the signal Done will be set to logical 1; i.e., the verification process ends and the process stops.

MODELING AND VALIDATION OF RESULTS

The proposed scheme was modeled in a graphical editor in the CAD Altera Quartus-II environment. For simulation in the Quartus-II environment, the functional simulation mode was selected using the Cyclone-II-EP2C20F484C7 FPGA family. Figure 6 shows a diagram of the results (version at the level of register transfers of RTL-Viewer) obtained after compiling the diagram. The successful verification of the results of the modified adder operation when adding two 8-bit numbers with all possible values is confirmed by the timing diagram (Fig. 7) obtained in the Quartus II environment.

Fig. 6.
figure 6

Scheme of RTL-Viewer results obtained after compilation.

Fig. 7.
figure 7

Time chart of checking the results of the modified adder’s operation when adding two 8-bit numbers with all possible values.

CONCLUSIONS

As a result of a comparative analysis of the Kogge–Stone adder and the modified adder, the following points were established: when the number of bits of operands is 16, 32, or 64, the modified adder performs better and occupies a smaller area than the Kogge–Stone adder. The reliability of the results of the modified adder when adding with all possible values of two 8-bit operands was checked using the developed circuit, which was modeled in the CAD Altera Quartus-II environment. The results are confirmed by the time chart.

The proposed modified adder can be used in the design of high-speed operating devices in computer arithmetic.