Abstract
For the fast addition of two multibit binary numbers, parallel-prefix adders (PPAs) are currently considered effective. Several PPAs are known with different time and hardware characteristics, and in particular, the Kogge–Stone adder is faster than other PPAs. However, this adder has a large number of logical elements and, therefore, occupies a large area, which leads to an increase in its price. This paper analyzes the Kogge–Stone adder. To reduce its hardware and time costs, a modified PPA is developed. Adders are compared in terms of the occupied area and the maximum delay of an operation. A results’ verification scheme is implemented to confirm the reliability of the modified adder’s operation. This circuit is simulated in the CAD Altera Quartus-II environment. As a result, it is found that when performing operations with 32- and 64-bit operands, the developed adder reduces the occupied area by 11 and 16.5%, respectively, and the maximum delay by 7%, compared to a Kogge–Stone adder.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
INTRODUCTION
The hardware implementation of binary addition is a fundamental architectural component in microprocessor systems such as microprocessors, digital signal processors, mathematical coprocessors, microcontrollers, mobile devices, etc. In these systems, in the design of operating devices, combiners play an important role in performing many computer arithmetic operations based on addition. In this case, the increase in the performance of operating devices depends on the efficiency of the adder. Developing an efficient adder that delivers high performance is a pressing challenge.
Currently, almost all modern computers in critical paths use parallel-prefix adders (PPAs) [1]. They are highly efficient in terms of speed and are used to quickly add two multidigit binary numbers. The literature describes several PPAs with a different occupied area and maximum delay time [2]. PPAs include Sklansky, Kogge–Stone, Brent–Kung, Han–Carlson, and Ladner–Fischer adders. The Kogge-Stone adder is considered to be the fastest, since it has the smallest propagation delay time in the circuit implementation among these PPAs [3, 4].
The aim of this study is to develop a multibit modified PPA to reduce the hardware and time costs in comparison with the well-known Kogge–Stone adder.
KOGGE–STONE ADDER
The Kogge–Stone adder is considered the fastest standard PPA, which adds two n-bit numbers \(A = {{a}_{{n - 1}}}{{a}_{{n - 2}}}.....{{a}_{0}},\;B = {{b}_{{n - 1}}}{{b}_{{n - 2}}}.....{{b}_{0}},\) and forms an n‑bit result, \(S = {{s}_{{n - 1}}}{{s}_{{n - 2}}}.....{{s}_{0}}\) and output transfer \({{c}_{{out}}}.\) The Kogge–Stone adder circuits for 8 and 32 bits are shown in Figs. 1 and 2. First the circuits compute the carry generation signals \({{g}_{i}}\) and half-sum bits \({{h}_{i}}\) for bit pairs by logical equations, and \({{g}_{i}} = {{a}_{i}} \cdot {{b}_{i}};\;{{h}_{i}} = {{a}_{i}} \otimes {{b}_{i}},\) then for nodes using the expression (gi:k, pi:k) = (gi:j + pi:j ⋅ Gj–1:k, pi:j ⋅ pj–1:k) until the carry generation signal ci is not be known for each digit. The sum si is determined by signals hi and ci–1 according to si = hi ⊕ ci–1 through the Exclusive OR element. The Kogge–Stone adder is described in more detail in [5, 6].
The Kogge–Stone adder is faster than the Sklansky, Brent–Kung, Han–Carlson, and Ladner–Fischer adders [7, 8]. However, for the hardware implementation, the Kogge–Stone adder requires more logic elements, hence, an increased area, consumes more power, and is more expensive. Nevertheless, the Kogge–Stone adder is used in high-performance applications, since performance is the most important determining factor.
DEVELOPMENT OF A MODIFIED PPA
In order to reduce the hardware and time costs, a modified version of the PPA has been developed, which uses its own prefix tree structure. Assume that A = an–1an–2…a0 and B = bn–1bn–2…b0 are two binary numbers that will be added up, and S = sn–1sn–2…s0 and cout represent their sum and output carry. The idea of designing a modified adder begins with the logic of generating a propagation carry with a parallel-prefix carry. To calculate the generation g and propagation p of the transfer, we assume that
where i = 0 ≤ i ≤ n – 1; and the symbols “·” and “+” denote the logical operations AND and OR, respectively.
The calculation of the normal carries of the adder is performed as
According to definition (1), it is assumed that pi ⋅ Gi = gi. Then we can perform identical transformations similarly to (2):
After defining the term of variable ki, Eq. (3) can be written as
From (4) it follows that the bit switching activity ci equally depends on the values received by bits pi of the propagation carry, and values ki. In particular, for k5 according to (5) we have
Using the definition pi ⋅ gi = gi, Eq. (6) can be transformed:
We assume that
then Eq. (7) is equivalent to the equation
The value of signal k5 can be formed using the prefix associative operator ° according to the following expression:
Thus, an equation is obtained that realizes the value k5. In the case of an 8-bit adder, the remaining values k7…k0 are calculated similarly:
where P1 = p1 and G0 = g0.
The received bits ki can be represented in the prefix associative operator form:
After analyzing (9), by induction we can obtain that the bits ki are even, and odd bit positions can be expressed as
Each associative operator links pairs of signals G and P and is defined as follows:
Taking into account the operator °, the calculations of signal ki with a parallel prefix can be represented as a prefix tree of the modified adder. Computing bits ki and pi instead of the normal translations ci makes it harder to get the bits of the final sum si since, in this case,
After the identical transformations (ai ⊕ bi) we get
where the symbols – and ⊕ denote the logical operations NOT and Exclusive OR. According to (1) at \(\overline {{{a}_{i}}{{b}_{i}}} = \overline {{{g}_{i}}} \) and ai + bi = pi, in order to realize the final sum si expression (13) can be represented as
However, computing bits si can be transformed in the following identical way:
Equation (14) can be implemented using a multiplexer that chooses either \(\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right)\) or \(\left( {\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) \oplus {{p}_{{i - 1}}}} \right)\) according to value ki–1. As a rule, the Exclusive OR gate has an almost equal multiplexer delay and propagation delay time from the input of the input operands to the establishment of the function’s result \(\left( {\overline {{{g}_{i}}} \cdot {{p}_{i}}} \right) \oplus {{p}_{{i - 1}}}\) is less than before the establishment of the signal ki–1. Therefore, the excess delay will not appear in the adder circuit due to the multiplexer for the output of the final sum si. The output carryover cout is formed almost simultaneously with the bits of the sum using the relation cn–1 = kn–1 ⋅ pn–1. Thus, according to expressions (1), (8), (10), (11), and (14), the modified adder is implemented by a three-stage circuit consisting of a precomputation stage, a prefix tree formation stage, and a result calculation stage. The logic diagram of the 8-bit modified adder is shown in Fig. 3.
This scheme works as follows. The precomputing stage generates signals gi and pi for all categories ai and bi using the logical elements AND and OR in accordance with (1). After combining bits gi, pi, gi–1, and pi–1, signals Gi and Pi–1 are calculated based on (8) using the logical elements AND and OR. The second stage adder, or prefix tree, calculates signals ki using bits Gi and Pi–1 in accordance with (10) and (11). At the stage of generating the result, the adder calculates sum si based on (14) using the elements NOT, AND, Exclusive OR, and the multiplexer. The output carryover cout is formed using the ratio cn–1 = kn–1 ⋅ pn–1.
To design a 64-bit modified adder circuit, schematic blocks (black and white) are implemented using (12), which are used for greater clarity of the prefix tree. These blocks take inputs from both the given bit position (l–1Gi, l–1Pi–1), and from the lower bit positions \({{(}^{{l-}}}^{1}G_{i}^{{{\kern 1pt} '}}{{,}^{{l-}}}^{1}P_{{i - 1}}^{{{\kern 1pt} '}})\) at the previous level. They are connected to form signals ki. The network of these blocks is called the prefix tree. Figure 4 shows a schematic of a 64-bit modified adder.
This circuit calculates signals gi and pi for pairs of input bits, then calculates signals Gi and Pi–1 to create a prefix tree. Then log2n – 1 = 5 levels of schematic blocks is used to form a prefix tree that calculates signals ki for each digit. After that, at the last stage, the result of addition si and the output transfer cout, together with signals ki are calculated.
COMPARISON OF HARDWARE AND TIME COSTS A OF PPA
Let us estimate the hardware and time costs of a PPA in terms of the occupied area and maximum delay. To estimate the area occupied by the adder, it is necessary to sum the areas of all the used logic elements. For the maximum delay, the longest (critical) path along which the signal passes is chosen, and the delay time of all logic elements along this path is added [9]. Assume that αNOT and τNOT are the occupied area and the delay of the NOT element, respectively; αAND and τAND are the AND element; αOR and τOR are the OR element; αexc and τexc are the Exclusive OR element; and αm and τm are the multiplexer.
After analyzing the structures of the considered PPAs, we obtain the formulas for calculating the occupied area and the maximum delay:
— for the n-bit Kogge–Stone adder
— and for the for n-bit modified adder
To quantify the delay, assume that τNOT = 0.5, τAND = τOR = 1 conventional units, and τexc = τm= 2. To estimate the area, we assume that αNOT = 0.5, αAND = αOR = 1, and αexc = αm= 2. The obtained values of the occupied area and the maximum delay (operating time) of the PPA for 8, 16, 32, and 64 bits are given in Table 1.
It can be seen from the Table 1 that when increasing the bit depth from the point of view of hardware and time costs, the modified adder is characterized by better performance and a smaller footprint than the Kogge–Stone adder. In particular, the footprint of the 32-bit and 64-bit modified adders is 11 and 16.5% less, respectively, than that of the standard Kogge–Stone adder. For 32 and 64 bits, the proposed adder gives a decrease in the maximum delay by almost 7% compared to the Kogge–Stone adder.
DEVELOPMENT OF A SCHEME FOR CHECKING THE RESULTS OF THE MODIFIED ADDER
To confirm the reliability of the modified adder’s operation, a scheme for checking the reliability of the results is proposed (Fig. 5). The circuit contains an 8‑bit modified adder, an 8-bit standard Kogge–Stone adder, a 16-bit synchronous totalizer, a 9-bit digital equality comparator, a four-input AND gate, and two inverters. The synthesis of an adding counter and an equality comparator is considered in [10].
The circuit has inputs—clock signals (clk), start (start), and reset (reset)—and outputs—a 16-bit number (q[15…0]), the finish (Done), and detected error (Error). The input start is used to count or stop the counter (1, count; 0, stop). The input reset controls the operation of the counter (1 to run, 0 to reset). The calculated counter values are expressed by the output q[15…0]. The signal Done shows that all possible argument values have been successfully tested as input arguments for two adders. If the signal Done = 1, then the process of checking the results was successful; i.e., all the results of the modified adder are correct. The signal Error indicates that an error has been detected in some result of the modified adder compared to the result of the reference adder. If there is an error, then the output Error = 1. Otherwise the signal Error = 0.
Let us consider how the circuit works. Suppose the output signal E of the comparator and inversion signal’s \(\overline {{{p}_{{out}}}} \) output carry counter are 1. In this case reset = 1 and start = 1, while clk enters the clock input of the counter through the AND element. Then the counter calculates all 216 = 65 536 possible values for a 16-bit binary number, and the transition to the next value occurs on the rising edge of the clock pulse. The calculated values of the operands for the counters are defined as q15…q8 for the first term (operand A) and q7…q0 for the second term (operand B). The received operands are supplied to the corresponding inputs of the modified and reference adders. These adders perform addition on the input operands in different ways and send 9-bit sums to the outputs. The comparator then compares the two 9-bit binary sums from the adders and provides one output signal indicating whether they are equal or not. If the corresponding digits are equal, then signal E = 1 and the process is repeated until the finish is reached. If any discharges S1 and S2 are unequal, the signal Error = 1 and the verification process stops. When the output carry counter Pout is logical 1, then the signal Done will be set to logical 1; i.e., the verification process ends and the process stops.
MODELING AND VALIDATION OF RESULTS
The proposed scheme was modeled in a graphical editor in the CAD Altera Quartus-II environment. For simulation in the Quartus-II environment, the functional simulation mode was selected using the Cyclone-II-EP2C20F484C7 FPGA family. Figure 6 shows a diagram of the results (version at the level of register transfers of RTL-Viewer) obtained after compiling the diagram. The successful verification of the results of the modified adder operation when adding two 8-bit numbers with all possible values is confirmed by the timing diagram (Fig. 7) obtained in the Quartus II environment.
CONCLUSIONS
As a result of a comparative analysis of the Kogge–Stone adder and the modified adder, the following points were established: when the number of bits of operands is 16, 32, or 64, the modified adder performs better and occupies a smaller area than the Kogge–Stone adder. The reliability of the results of the modified adder when adding with all possible values of two 8-bit operands was checked using the developed circuit, which was modeled in the CAD Altera Quartus-II environment. The results are confirmed by the time chart.
The proposed modified adder can be used in the design of high-speed operating devices in computer arithmetic.
REFERENCES
Yakunin, A.N. and Aung, M.S., Comparative analysis of characteristics of binary multi-bit parallel adders, Izv. Vyssh. Uchebn. Zaved., Elektron., 2018, vol. 23, no. 3, pp. 299–301.
Daphni, S. and Vijula Grace, K.S., A review analysis of parallel prefix adders for better performance in VLSI applications, in Proceedings of the IEEE International Conference on Circuits and Systems, Thiruvananthapuram, India, 2017, pp. 103–106.
Daphni, S. and Vijula Grace, S.K., Design and analysis of 32-bit parallel prefix adders for low power VLSI applications, Adv. Sci. Technol. Eng. Syst., 2019, vol. 4, pp. 102–106.
Rahila, KC. and Sajesh Kumar, U., A comprehensive comparative analysis of parallel prefix adders for asic implementation, in Proceedings of the International Conference on Systems Energy and Environment, GCE Kannur, Kerala, July 2019, pp. 1–5.
Aung, M.S. and Yakunin, A.N., Reduction of the hardware complexity of a parallel prefix adder, in Proceedings of the International Conference EIConRus-2018, St. Petersburg, Moscow, June 28–31, 2018, Moscow: MIET, 2018, pp. 1348–1349.
Penchalaiah, U. and Siva Kumar, V.G., Design of high-speed and energy-efficient parallel prefix Kogge– Stone adder, in Proceedings of the IEEE International Conference on System, Computation, Automation and Networking, Pondicherry, India, July 6–7, 2018, 2018, pp. 1–6.
Yakunin, A.N. and Aung, M.S., Increasing the operating speed of a multi-bit binary multiplier, in Problemy razrabotki perspektivnykh mikro- i nanoelektronnykh sistem MES-2018 (Problems of the Development of Promising Micro- and Nanoelectronic Systems, Proceedings of the 7th All-Russia Conference), 2018, vol. 2, pp. 149–151.
Yakunin, A.N. and Aung, M.S., Research and modification of a multi-bit parallel-prefix adder, Izv. Vyssh. Uchebn. Zaved., Elektron., 2019, vol. 24, no. 2, pp. 197–207.
Chervyakov, N.I., Lyakhov, P.A., Valueva, M.V., and Krivolapova, O.V., Comparative analysis of adders hardware implementation on FPGA, Nauka. Innov. Tekhnol., 2016, no. 4, pp. 99–108.
Harris, D. and Harris, S., Digital Design and Computer Architecture, New York: Morgan Kaufmann, 2012.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yakunin, A.N., San, A.M. & Htun, H.M. Improving the Operating Efficiency of a Multibit Binary Parallel-Prefix Adder. Russ Microelectron 50, 491–498 (2021). https://doi.org/10.1134/S106373972107012X
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1134/S106373972107012X