Multiplication of a Constant (2k ± 1) and Its Fast Hardware Implementation

Jui, Pin-Chang; Wey, Chin-Long; Shiue, Muh-Tian

doi:10.1007/s11265-015-0978-4

Multiplication of a Constant (2^k ± 1) and Its Fast Hardware Implementation

Published: 14 February 2015

Volume 82, pages 41–53, (2016)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Signal Processing Systems Aims and scope Submit manuscript

Multiplication of a Constant (2^k ± 1) and Its Fast Hardware Implementation

Download PDF

Pin-Chang Jui¹,
Chin-Long Wey² &
Muh-Tian Shiue¹

219 Accesses
Explore all metrics

Abstract

Constant multiplier performs a multiplication of a data-input with a constant value. Constant multipliers are essential components in various types of arithmetic circuits, such as filters in digital signal processor (DSP) units and they are prevalent in modern VLSI designs. This study presents efficient algorithms and their fast hardware implementation for performing multiplying-by-(2^k ± 1), or (2^k ± 1)N, operation with additions. No multiplications are needed. The value of (2^k ± 1)N can be computed by adding (±N) to its k-bits left-shifted value 2^kN. The additions can be performed by the full-adder-based (FA-based) ripple carry adder (RCA) for simple architecture. This paper presents the unit cells for additions (UCAs). Results show that the UCA-based RCA achieves 34 % faster than the FA-based RCA. Further, in order to improve the speed performance with lower hardware cost, this paper also presents a simple and modular hybrid adder with the proposed UCA concept, where the hybrid adder takes the lower-bit carry lookahead adder (CLA) as a module and many of the CLA modules are serially connected in a fashion similar to the RCA. Results show that the proposed hybrid adder achieved speed performance improvement while maintaining its modular and regular structure.

Realization of Carry-Free Adder Circuit Using FPGA

Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier

Review on Different Types of Multipliers and Its Performance Comparisons

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Constant multiplier performs a multiplication of a data-input with a constant value. Constant multipliers are essential components in various types of arithmetic circuits, such as filters in digital signal processor (DSP) units, dominate the hardware complexity of digital filters [1]. In addition, they are prevalent in modern VLSI designs.

The multiplication by a fixed-point constant can be done “multiplier-less” using additions and shifts only. In such filters the number of adders determines the implementation cost. Since the shifters are implemented as hard-wired inter-block connections, they are considered “free” in transposed implementation of an FIR filter; each input is multiplied by several coefficients [1–3]. Constant multiplier design has been investigated for several decades. However, the emphasis was placed on minimizing the number of additions required to achieve the multiplication of a given constant [4].

In this study, the emphasis is placed on performing the constant multiplication with a faster adder in only one addition operation. This paper targets the development of the multiplication of a constant (2^k ± 1). The value of (2^k + 1)N can be computed by adding N to its k-bit left-shifted value 2^kN. On the other hand, The value of (2^k-1)N can be computed by subtracting N from its k-bits left-shifted value 2^kN, or adding (−N) to 2^kN. The additions can be performed by a simple ripple carry adder (RCA), or a higher speed carry-lookahead adder (CLA).

The unit cells for additions (UCAs) are introduced in this study to construct the UCA-based RCA. Results will show that the UCA-based RCA achieves approximately 34 % faster than Full-adder-(FA)-based RCAs. In order to further improve the speed performance, a simple and modular hybrid adder is also presented, where reasonably smaller bit size of CLA is used as a module and many modules are serially connected in a fashion similar to the RCA.

In the next section, the conventional multiplication of a constant (2^k ± 1) using FA-based RCA is discussed. Section III presents the proposed UCA-based RCA structures for (2^k ± 1)N operation. Section IV describes a hybrid adder for the constant multiplication. Finally, a brief concluding remark is given in Section V.

2 FA-based RCAs for (2^k ± 1)N Operations

Let N = (a_n-1a_n-2…a₀) be an n-bit number. For the (2^k + 1)N operation, there exists a number m such that n = m × k (if n < m × k, sign extension is applied). Thus, N can be expressed as (A_m-1A_m-2…A₀), where A_i = (a_(i+1)k-1…a_ik+1a_ik), i = 0, 1, …,m-1, and the (2^k + 1)N operation can be performed by adding N to its k-bits left-shifted value 2^kN, i.e., (1 + 2^k)N = N + 2^kN, where 2^kN = (A_m-1A_m-2…A₀0) and 0 = (0..00).

$$ \begin{array}{llllllllll}\mathrm{N}\hfill & 0\hfill & {\mathrm{A}}_{\mathrm{m}-1}\hfill & \dots \hfill & {\mathrm{A}}_{\mathrm{i}+1}\hfill & {\mathrm{A}}_{\mathrm{i}}\hfill & {\mathrm{A}}_{\mathrm{i}-1}\hfill & \dots \hfill & {\mathrm{A}}_1\hfill & {\mathrm{A}}_0\hfill \\ {}\underset{\bar{\mkern6mu}}{+{2}^{\mathrm{k}}\mathrm{N}}\hfill & {\mathrm{A}}_{\mathrm{m}-1}\hfill & {\mathrm{A}}_{\mathrm{m}-2}\hfill & \dots \hfill & {\mathrm{A}}_{\mathrm{i}}\hfill & {\mathrm{A}}_{\mathrm{i}-1}\hfill & {\mathrm{A}}_{\mathrm{i}-2}\hfill & \dots \hfill & {\mathrm{A}}_0\hfill & 0\hfill \\ {}\left(1+{2}^{\mathrm{k}}\right)\mathrm{N}\hfill & {\mathrm{S}}_{\mathrm{m}}\hfill & {\mathrm{S}}_{\mathrm{m}-1}\hfill & \dots \hfill & {\mathrm{S}}_{\mathrm{i}+1}\hfill & {\mathrm{S}}_{\mathrm{i}}\hfill & {\mathrm{S}}_{\mathrm{i}-1}\hfill & \dots \hfill & {\mathrm{S}}_1\hfill & {\mathrm{S}}_0\hfill \end{array} $$

(1)

The 3N operation is performed by 3N = N + 2 N,

$$ \begin{array}{lllllll}\mathrm{N}\hfill & 0\hfill & {\mathrm{a}}_{\mathrm{n}-1}\hfill & {\mathrm{a}}_{\mathrm{n}-2}\hfill & \dots \hfill & {\mathrm{a}}_1\hfill & {\mathrm{a}}_0\hfill \\ {}\underset{\bar{\mkern6mu}}{ + 2\mathrm{N}}\hfill & {\mathrm{a}}_{\mathrm{n}-1}\hfill & {\mathrm{a}}_{\mathrm{n}-2}\hfill & {\mathrm{a}}_{\mathrm{n}-3}\hfill & \dots \hfill & {\mathrm{a}}_0\hfill & 0\hfill \\ {}3\mathrm{N}\hfill & {\mathrm{s}}_{\mathrm{n}}\hfill & {\mathrm{s}}_{\mathrm{n}-1}\hfill & {\mathrm{s}}_{\mathrm{n}-2}\hfill & \dots \hfill & {\mathrm{s}}_1\hfill & {\mathrm{s}}_0\hfill \end{array} $$

(2)

and the 9N = N + 8 N operation with k = 3 is operated as

$$ \begin{array}{lllllllll}\mathrm{N}\hfill & 0\hfill & 0\hfill & 0\hfill & {\mathrm{a}}_{3\mathrm{m}-1}\hfill & \dots \hfill & {\mathrm{a}}_{3\mathrm{i}+2}{\mathrm{a}}_{3\mathrm{i}+1}{\mathrm{a}}_{3\mathrm{i}}\hfill & \dots \hfill & {\mathrm{a}}_3{\mathrm{a}}_{2\ }{\mathrm{a}}_1{\mathrm{a}}_0\hfill \\ {}\underset{\bar{\mkern6mu}}{ + 8\mathrm{N}}\hfill & {\mathrm{a}}_{3\mathrm{m}-1}\hfill & {\mathrm{a}}_{3\mathrm{m}-2}\hfill & {\mathrm{a}}_{3\mathrm{m}-3}\hfill & {\mathrm{a}}_{3\mathrm{m}-4}\hfill & \dots \hfill & {\mathrm{a}}_{3\mathrm{i}-1\ }{\mathrm{a}}_{3\mathrm{i}-2\ }{\mathrm{a}}_{3\mathrm{i}-3}\hfill & \dots \hfill & {\mathrm{a}}_{0\ }00\ 0\hfill \\ {}9\mathrm{N}\hfill & {\mathrm{s}}_{3\mathrm{m}+2}\hfill & {\mathrm{s}}_{3\mathrm{m}+1}\hfill & {\mathrm{s}}_{3\mathrm{m}}\hfill & {\mathrm{s}}_{3\mathrm{m}-1}\hfill & \dots \hfill & {\mathrm{s}}_{3\mathrm{i}+2}{\mathrm{s}}_{3\mathrm{i}+1}{\mathrm{s}}_{3\mathrm{i}}\hfill & \dots \hfill & {\mathrm{s}}_3{\mathrm{s}}_{2\ }{\mathrm{s}}_1{\mathrm{s}}_0\hfill \end{array} $$

(3)

On the other hand, (2^k-1)N = 2^kN+(−N) = 2^kN + N* + 1, where N* is the bit-complement of N, i.e., N* = (A_m-1’A_m-2’…A₀’) and A_i’ = (a_(i+1)k-1’…a_ik+1’a_ik’), i = 0,1,…,m-1. Thus, (2^k-1)N operation can be performed by adding the k-bits left-shifted value 2^kN to N* with an initial carry of 1. For example, the 7 N operation is performed as follows,

$$ \begin{array}{lllllllll}\mathrm{N}*\hfill & 1\hfill & 1\hfill & 1\hfill & {\mathrm{a}}_{3\mathrm{m}-1}'\hfill & \dots \hfill & {\mathrm{a}}_{3\mathrm{i}+2}'{\mathrm{a}}_{3\mathrm{i}+1}'{\mathrm{a}}_{3\mathrm{i}}'\hfill & \dots \hfill & {\mathrm{a}}_3'{\mathrm{a}}_2'{\mathrm{a}}_1'{\mathrm{a}}_0'\hfill \\ {}\underset{\bar{\mkern6mu}}{ + 8\mathrm{N}}\hfill & {\mathrm{a}}_{3\mathrm{m}-1}\hfill & {\mathrm{a}}_{3\mathrm{m}-2}\hfill & {\mathrm{a}}_{3\mathrm{m}-3}\hfill & {\mathrm{a}}_{3\mathrm{m}-4}\hfill & \dots \hfill & {\mathrm{a}}_{3\mathrm{i}-1\kern0.5em }{\mathrm{a}}_{3\mathrm{i}-2\kern0.5em }{\mathrm{a}}_{3\mathrm{i}-3}\hfill & \dots \hfill & {\mathrm{a}}_{0\ }00\kern0.5em 0\hfill \\ {}7\mathrm{N}\hfill & {\mathrm{s}}_{3\mathrm{m}+2}\hfill & {\mathrm{s}}_{3\mathrm{m}+1}\hfill & {\mathrm{s}}_{3\mathrm{m}}\hfill & {\mathrm{s}}_{3\mathrm{m}-1}\hfill & \dots \hfill & {\mathrm{s}}_{3\mathrm{i}+2\ }{\mathrm{s}}_{3\mathrm{i}+1\kern0.5em }{\mathrm{s}}_{3\mathrm{i}}\hfill & \dots \hfill & {\mathrm{s}}_3{\mathrm{s}}_{2\ }{\mathrm{s}}_1{\mathrm{s}}_0\hfill \end{array} $$

(4)

Figure 1a shows a full-adder (FA) cell [5, 6]. A FA cell takes two data inputs, a_i-1 and a_i, and a carry input bit c_i-1, and produces a carry output bit c_i and a sum bit s_i, for i = 0 ~ n, where a₋₁ = 0 and c₋₁ = 0. The logic functions are

$$ \begin{array}{c}\hfill {\mathrm{s}}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}}\oplus {\mathrm{a}}_{\mathrm{i}\hbox{-} 1}\oplus {\mathrm{c}}_{\mathrm{i}\hbox{-} 1}\hfill \\ {}\hfill {\mathrm{c}}_{\mathrm{i}}=\left({\mathrm{a}}_{\mathrm{i}}\oplus {\mathrm{a}}_{\mathrm{i}\hbox{-} 1}\right){\mathrm{c}}_{\mathrm{i}\hbox{-} 1}+{\mathrm{a}}_{\mathrm{i}}{\mathrm{a}}_{\mathrm{i}\hbox{-} 1}\hfill \end{array} $$

(5)

The point-to-point delays of the FA and HA (Half-adder) cells are

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{cc}\left(\mathrm{F}\mathrm{A}\right)}={\varDelta}_{\mathrm{carry}-\mathrm{in}-\mathrm{t}\mathrm{o}-\mathrm{carry}-\mathrm{out}\left(\mathrm{F}\mathrm{A}\right)}=2{\varDelta}_{\mathrm{NAND}2}\hfill \\ {}\hfill {\varDelta}_{\mathrm{cs}\left(\mathrm{F}\mathrm{A}\right)}={\varDelta}_{\mathrm{carry}-\mathrm{t}\mathrm{o}-\mathrm{sum}\left(\mathrm{F}\mathrm{A}\right)}={\varDelta}_{\mathrm{XOR}}\hfill \\ {}\hfill {\varDelta}_{\mathrm{ic}\left(\mathrm{F}\mathrm{A}\right)}={\varDelta}_{\mathrm{input}-\mathrm{t}\mathrm{o}-\mathrm{carry}\left(\mathrm{F}\mathrm{A}\right)}={\varDelta}_{\mathrm{XOR}}+2{\varDelta}_{\mathrm{NAND}2}\hfill \\ {}\hfill {\varDelta}_{\mathrm{cc}\left(\mathrm{H}\mathrm{A}\right)}={\varDelta}_{\mathrm{ic}\left(\mathrm{H}\mathrm{A}\right)}={\varDelta}_{\mathrm{NOR}2}\;{\varDelta}_{\mathrm{cs}\left(\mathrm{H}\mathrm{A}\right)}={\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(6)

Figure 1b shows an (n + 1)-bit FA-based RCA for 3 N operation. The RCA is comprised of (n-2) FAs and 2 HAs connected in series. The critical path includes the input-to- carry of the right-most HA (Δ_ic(HA)), the carry-to-carry (Δ_cc(FA)) of (n-2) FAs, and the carry-to-sum (Δ_cs(HA)) of the left-most HA, i.e.,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{ic}\left(\mathrm{H}\mathrm{A}\right)}+\left(\mathrm{n}-2\right){\varDelta}_{\mathrm{cc}\left(\mathrm{F}\mathrm{A}\right)}+{\varDelta}_{\mathrm{cs}\left(\mathrm{H}\mathrm{A}\right)}\hfill \\ {}\hfill ={\varDelta}_{\mathrm{NOR}2}+2\left(\mathrm{n}-2\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(7)

In general, the critical path delay of the (km)-bits FA-based RCA for (2^k + 1)N operation can be expressed as

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(\left({2}^{\mathrm{k}+1}\right)\mathrm{N}\right)}={\varDelta}_{\mathrm{ic}\left(\mathrm{H}\mathrm{A}\right)}+\left(\mathrm{k}\mathrm{m}-\mathrm{k}-1\right){\varDelta}_{\mathrm{cc}\left(\mathrm{F}\mathrm{A}\right)}+\left(\mathrm{k}-1\right){\varDelta}_{\mathrm{cc}\left(\mathrm{H}\mathrm{A}\right)}+{\varDelta}_{\mathrm{cs}\left(\mathrm{H}\mathrm{A}\right)}\hfill \\ {}\hfill =\mathrm{k}{\varDelta}_{\mathrm{NOR}2}+2\left(\mathrm{k}\mathrm{m}-\mathrm{k}-1\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(8)

Similarly, the critical path delay of the (km)-bits FA-based RCA for (2^k-1)N operation can be expressed as

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(\left({2}^{\mathrm{k}\hbox{-} 1}\right)\mathrm{N}\right)}={\varDelta}_{\mathrm{ic}\left(\mathrm{H}\mathrm{A}\right)*}+\left(\mathrm{k}\mathrm{m}-\mathrm{k}-1\right){\varDelta_{\mathrm{cc}}}_{\left(\mathrm{F}\mathrm{A}\right)}+\left(\mathrm{k}-1\right){\varDelta}_{\mathrm{cc}\left(\mathrm{H}\mathrm{A}\right)*}+{\varDelta}_{\mathrm{cs}\left(\mathrm{H}\mathrm{A}\right)*}\hfill \\ {}\hfill ={\varDelta}_{\mathrm{inv}}+\left[2\left(\mathrm{k}\mathrm{m}-1\right)-\mathrm{k}\right]{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XNOR}}\hfill \end{array} $$

(9)

The term, (HA)* in (9), indicate a HA resulted from a FA with an input “1”, where Δ _ic(HA) * = Δ _inv + Δ _NAND2, Δ _cc(HA) * = Δ _NAND2, and Δ _cs(HA) * = Δ _XNOR.

By (8) and (9), with k = 3, the delays of the RCA in Fig. 1c and d for 9 N and 7 N operations respectively are

$$ {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(9\mathrm{N}\right)}=3{\varDelta}_{\mathrm{NOR}2}+2{\left(3\mathrm{m}-4\right)}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}} $$

(10)

$$ {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(7\mathrm{N}\right)}={\varDelta}_{\mathrm{inv}}+\left(6\mathrm{m}-5\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XNOR}} $$

(11)

Finally, Fig. 1e illustrates the (3 m)-bits FA-based RCA with dual mode for both 7 N and 9 N operations, where mode = 0 for 9 N operation and mode = 1 for 7 N operation. Similarly, one can easily realize a (km)-bits FA-based RCA for (2^k ± 1)N operations.

3 UCA-based RCAs for (2^k ± 1)N Operations

This section presents the UCA-based RCAs for (2^k + 1)N operation. The implementation concept is readily applied for (2^k-1)N operations.

3.1 UCA-based RCA for 3 N Operation

Consider the 3 N operation, as illustrated in (2). By Shannon expansion theorem, the carry and sum functions in (5) can be re-written as

$$ {\mathrm{c}}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}}'\left({\mathrm{a}}_{\mathrm{i}-1}{\mathrm{c}}_{\mathrm{i}-1}\right)+{\mathrm{a}}_{\mathrm{i}}\left({\mathrm{a}}_{\mathrm{i}-1}+{\mathrm{c}}_{\mathrm{i}-1}\right)={\mathrm{a}}_{\mathrm{i}}'\mathrm{W}{0}_{\mathrm{i}-1}+{\mathrm{a}}_{\mathrm{i}}\mathrm{W}{1}_{\mathrm{i}-1} $$

(12)

$$ {\mathrm{s}}_{\mathrm{i}} = {\mathrm{a}}_{\mathrm{i}}\oplus {\mathrm{u}}_{\mathrm{i}},\mathrm{where}\ {\mathrm{u}}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}-1}\oplus {\mathrm{c}}_{\mathrm{i}-1} $$

(13)

where W0_i-1 = a_i-1c_i-1 and W1_i-1 = a_i-1 + c_i-1. To construct the carry propagation paths, we consider both W0_i and W1_i, in the next stage. By (12), we can easily derive

$$ \begin{array}{l}\mathrm{W}{0}_{\mathrm{i}} = {\mathrm{a}}_{\mathrm{i}}{\mathrm{c}}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}}\left({\mathrm{a}}_{\mathrm{i}}'\mathrm{W}{0}_{\mathrm{i}-1}+{\mathrm{a}}_{\mathrm{i}}\mathrm{W}{1}_{\mathrm{i}-1}\right)={\mathrm{a}}_{\mathrm{i}}\mathrm{W}{1}_{\mathrm{i}-1};\ \mathrm{and}\hfill \\ {}\mathrm{W}{1}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}}+{\mathrm{c}}_{\mathrm{i}}={\mathrm{a}}_{\mathrm{i}}+{\mathrm{a}}_{\mathrm{i}}'\mathrm{W}{0}_{\mathrm{i}-1}+{\mathrm{a}}_{\mathrm{i}}\mathrm{W}{1}_{\mathrm{i}-1}={\mathrm{a}}_{\mathrm{i}}+\mathrm{W}{0}_{\mathrm{i}-1}\hfill \end{array} $$

(14)

By (13), the functions u_i can be expressed as

$$ {\mathrm{u}}_{\mathrm{i}}=\left[{\mathrm{a}}_{\mathrm{i}-1}{\mathrm{c}}_{\mathrm{i}-1}+{\mathrm{a}}_{\mathrm{i}-1}'{\mathrm{c}}_{\mathrm{i}-1}'\right]'=\left[\mathrm{W}{0}_{\mathrm{i}-1}+\mathrm{W}{1}_{\mathrm{i}-1}'\right]'=\mathrm{W}{0}_{\mathrm{i}-1}'\mathrm{W}{1}_{\mathrm{i}-1} $$

(15)

Figure 2a shows a UCA cell for 3 N operation. The cell is comprised of three blocks: Carry Propagation Path (CPP) block, Function u_i Generation (FUG) block, and Sum Generation (SMG) block. The critical path of an n-bit UCA-based RCA, as illustrated in Fig. 2b, includes an NAND gate in cell#1, (n-3) UCAs in cell#2 to cell#(n-2), and the inverter, NOR2, and XOR gates in cell#(n-1), where Δ_UCA = Δ_NOR2, i.e., the delay is

$$ {\varDelta}_{\mathrm{RCA}\left(\mathrm{U}\mathrm{C}\mathrm{A}\right)\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}2}+\left(\mathrm{n}-2\right){\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{inv}}+{\varDelta}_{\mathrm{XOR}} $$

(16)

3.2 UCA-based RCA for 5 N Operation

Consider the 5 N operation, as shown in (1) with k = 2, where A_i = (a_2i+1,a_2i), i = 0,1,…,m-1, and n = 2 m. (Note that a zero is added as the most significant bit if n is not an even number.) Two additions, (a_2i+1 + a_2i-1 + c_2i) and (a_2i + a_2i-2 + c_2i-1), are performed. By (5),

$$ \begin{array}{c}\hfill {\mathrm{c}}_{2\mathrm{i}}={\mathrm{a}}_{2\mathrm{i}}'\left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)+{\mathrm{a}}_{2\mathrm{i}}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right);\hfill \\ {}\hfill {\mathrm{s}}_{2\mathrm{i}}={\mathrm{a}}_{2\mathrm{i}}\oplus {\mathrm{u}}_{2\mathrm{i}},\ \mathrm{where}\ {\mathrm{u}}_{2\mathrm{i}}={\mathrm{a}}_{2\mathrm{i}-2}\oplus {\mathrm{c}}_{2\mathrm{i}-1};\hfill \\ {}\hfill {\mathrm{c}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}+1}'\left({\mathrm{a}}_{2\mathrm{i}-1}{\mathrm{c}}_{2\mathrm{i}}\right)+{\mathrm{a}}_{2\mathrm{i}+1}\left({\mathrm{a}}_{2\mathrm{i}-1}+{\mathrm{c}}_{2\mathrm{i}}\right);\hfill \\ {}\hfill {\mathrm{s}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}+1}\oplus {\mathrm{u}}_{2\mathrm{i}+1},\ \mathrm{where}\ {\mathrm{u}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}-1}\oplus {\mathrm{c}}_{2\mathrm{i}};\hfill \end{array} $$

(17)

By (17), plugging c_2i to c_2i+1, we have

$$ \begin{array}{l}{\mathrm{c}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}+1}'{\mathrm{a}}_{2\mathrm{i}-1}\left[{\mathrm{a}}_{2\mathrm{i}}'\left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)+{\mathrm{a}}_{2\mathrm{i}}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]+\hfill \\ {}\begin{array}{lll}\hfill & \begin{array}{ll}\hfill & \hfill \end{array}\hfill & {\mathrm{a}}_{2\mathrm{i}+1}\left({\mathrm{a}}_{2\mathrm{i}-1} + {\mathrm{a}}_{2\mathrm{i}}'\left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)+{\mathrm{a}}_{2\mathrm{i}}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right);\hfill \end{array}\hfill \\ {}\begin{array}{ll}\hfill & ={\mathrm{a}}_{2\mathrm{i}+1}'{\mathrm{a}}_{2\mathrm{i}}'\left({\mathrm{a}}_{2\mathrm{i}-1}{\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right) + {\mathrm{a}}_{2\mathrm{i}+1}'{\mathrm{a}}_{2\mathrm{i}}\left[{\mathrm{a}}_{2\mathrm{i}-1}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]+\hfill \end{array}\hfill \\ {}\begin{array}{lll}\hfill & \begin{array}{ll}\hfill & \hfill \end{array}\hfill & {\mathrm{a}}_{2\mathrm{i}+1}{\mathrm{a}}_{2\mathrm{i}}'\left[{\mathrm{a}}_{2\mathrm{i}-1}+{\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right] + {\mathrm{a}}_{2\mathrm{i}+1}{\mathrm{a}}_{2\mathrm{i}}\left[{\mathrm{a}}_{2\mathrm{i}-1} + {\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right]\hfill \end{array}\hfill \\ {}\begin{array}{ll}\hfill & ={\mathrm{a}}_{2\mathrm{i}+1}'{\mathrm{a}}_{2\mathrm{i}}'\mathrm{W}{00}_{2\mathrm{i}-1}+{\mathrm{a}}_{2\mathrm{i}+1}'{\mathrm{a}}_{2\mathrm{i}}\mathrm{W}0{1}_{2\mathrm{i}-1}+\hfill \end{array}\hfill \\ {}\begin{array}{lllll}\hfill & \hfill & \hfill & \hfill & {\mathrm{a}}_{2\mathrm{i}+1}{\mathrm{a}}_{2\mathrm{i}}'\mathrm{W}1{0}_{2\mathrm{i}-1}+{\mathrm{a}}_{2\mathrm{i}+1}{\mathrm{a}}_{2\mathrm{i}}\mathrm{W}{11}_{2\mathrm{i}-1}\hfill \end{array}\hfill \end{array} $$

(18)

where

$$ \begin{array}{cc}\hfill \mathrm{W}{00}_{2\mathrm{i}-1} = {\mathrm{a}}_{2\mathrm{i}-1}{\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1};\hfill & \hfill \mathrm{W}0{1}_{2\mathrm{i}-1} = {\mathrm{a}}_{2\mathrm{i}-1}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right);\hfill \\ {}\hfill \mathrm{W}1{0}_{2\mathrm{i}-1} = {\mathrm{a}}_{2\mathrm{i}-1}+{\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1};\hfill & \hfill \mathrm{W}{11}_{2\mathrm{i}-1} = {\mathrm{a}}_{2\mathrm{i}-1}+{\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1};\hfill \end{array} $$

The 2-bits UCA cell for 5 N, referred to as a UCA2 cell can be derived as in the following property.

Property 1

The logic functions of the CPP block are expressed as

$$ \begin{array}{c}\hfill \mathrm{W}{00}_{2\mathrm{i}+1} = {\mathrm{a}}_{2\mathrm{i}+1}\cdotp {\mathrm{a}}_{2\mathrm{i}}\cdotp \mathrm{W}{11}_{2\mathrm{i}-1}\hfill \\ {}\hfill \mathrm{W}0{1}_{2\mathrm{i}+1} = {\mathrm{a}}_{2\mathrm{i}+1}\cdotp \left({\mathrm{a}}_{2\mathrm{i}} + \mathrm{W}1{0}_{2\mathrm{i}-1}\right)\hfill \\ {}\hfill \mathrm{W}1{0}_{2\mathrm{i}+1} = {\mathrm{a}}_{2\mathrm{i}+1} + \left({\mathrm{a}}_{2\mathrm{i}}\cdotp \mathrm{W}0{1}_{2\mathrm{i}-1}\right)\hfill \\ {}\hfill \mathrm{W}{11}_{2\mathrm{i}+1} = {\mathrm{a}}_{2\mathrm{i}+1} + {\mathrm{a}}_{2\mathrm{i}} + \mathrm{W}{00}_{2\mathrm{i}-1}\hfill \end{array} $$

(18)

and the functions u_2i and u_2i+1 of the FUG block are

$$ \begin{array}{c}\hfill {\mathrm{u}}_{2\mathrm{i}}=\left(\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}0{1}_{2\mathrm{i}-1}\right)+\left(\mathrm{W}1{0}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right)\hfill \\ {}\hfill {\mathrm{u}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}}'\left[\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1}\right]+{\mathrm{a}}_{2\mathrm{i}}\left[\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right]\hfill \end{array} $$

(20)

Proof

Consider W00_2i+1 = a_2i+1a_2ic_2i+1, with c_2i+1 in (18), we obtain W00_2i+1 = a_2i+1a_2iW11_2i-1. Similarly, one can easily derive the remaining terms in (19). By (17),

$$ \begin{array}{c}\hfill {\mathrm{u}}_{2\mathrm{i}}={\mathrm{a}}_{2\mathrm{i}-2}\oplus {\mathrm{c}}_{2\mathrm{i}-1}=\left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)\oplus \left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\hfill \\ {}\hfill =\left[{\mathrm{a}}_{2\mathrm{i}-1}\oplus \left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]\oplus \left[{\mathrm{a}}_{2\mathrm{i}-1}\oplus \left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]\hfill \\ {}\hfill =\left(\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1}\right)\oplus \left(\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right)\hfill \\ {}\hfill =\left(\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1}\right)'\left(\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right)+\hfill \\ {}\hfill \left(\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1}\right)\ \left(\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right)'\hfill \end{array} $$

Since

$$ \begin{array}{c}\hfill \mathrm{W}{00}_{2\mathrm{i}-1}\mathrm{W}{11}_{2\mathrm{i}-1}=\mathrm{W}{00}_{2\mathrm{i}-1},\ \mathrm{W}1{0}_{2\mathrm{i}-1}'\mathrm{W}0{1}_{2\mathrm{i}-1}'=\mathrm{W}1{0}_{2\mathrm{i}-1}',\hfill \\ {}\hfill \mathrm{W}1{0}_{2\mathrm{i}-1}\mathrm{W}0{1}_{2\mathrm{i}-1}=\mathrm{W}0{1}_{2\mathrm{i}-1},\ \mathrm{and}\ \mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}'=\mathrm{W}{11}_{2\mathrm{i}-1}',\hfill \\ {}\hfill {\mathrm{u}}_{2\mathrm{i}} = \mathrm{W}{00}_{2\mathrm{i}-1}\mathrm{W}0{1}_{2\mathrm{i}-1}'+\mathrm{W}1{0}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}+\hfill \\ {}\hfill \mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}0{1}_{2\mathrm{i}-1}+\mathrm{W}1{0}_{2\mathrm{i}-1}\mathrm{W}{11}_{2\mathrm{i}-1}'\hfill \end{array} $$

and since

$$ \begin{array}{c}\hfill \mathrm{W}{00}_{2\mathrm{i}-1}\mathrm{W}0{1}_{2\mathrm{i}-1}'=0\ \mathrm{and}\ \mathrm{W}1{0}_{2\mathrm{i}-1}\mathrm{W}{11}_{2\mathrm{i}-1}'=0,\ \mathrm{thus}\hfill \\ {}\hfill {\mathrm{u}}_{2\mathrm{i}} = \left(\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}0{1}_{2\mathrm{i}-1}\right)+\left(\mathrm{W}1{0}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right)\hfill \end{array} $$

Similarly,

$$ \begin{array}{c}\hfill {\mathrm{u}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}-1}\oplus {\mathrm{c}}_{2\mathrm{i}}={\mathrm{a}}_{2\mathrm{i}-1}\oplus \left[{\mathrm{a}}_{2\mathrm{i}}'\left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)+{\mathrm{a}}_{2\mathrm{i}}\left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]\hfill \\ {}\hfill = {\mathrm{a}}_{2\mathrm{i}}'\left[{\mathrm{a}}_{2\mathrm{i}-1}\oplus \left({\mathrm{a}}_{2\mathrm{i}-2}{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]+{\mathrm{a}}_{2\mathrm{i}}\left[{\mathrm{a}}_{2\mathrm{i}-1}\oplus \left({\mathrm{a}}_{2\mathrm{i}-2}+{\mathrm{c}}_{2\mathrm{i}-1}\right)\right]\hfill \\ {}\hfill ={\mathrm{a}}_{2\mathrm{i}}'\left[\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1}\right]+{\mathrm{a}}_{2\mathrm{i}}\left[\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1}\right]\hfill \end{array} $$

Figure 3a shows the UCA2 cell, where the CPP block can be realized by using NAND2/NOR2 gates, as illustrated in Fig. 3b. The critical path, as indicated in red, of the (2 m)-bits UCA2-based RCA in Fig. 3c includes an INV and an NOR2 gate in cell #0, two NOR2 gates in each of cells #1 to #(m-2), and the FUG and SMG blocks in cell #(m-1). Thus, the propagation delay is

$$ {\varDelta}_{\mathrm{RCA}\left(\mathrm{U}\mathrm{C}\mathrm{A}2\right)\left(5\mathrm{N}\right)}={\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2}+\left(\mathrm{m}-2\right){\varDelta}_{\mathrm{UCA}2}+{\varDelta}_{\mathrm{FUG}\left(5\mathrm{N}\right)}+{\varDelta}_{\mathrm{SMG}} $$

(21)

3.3 UCA-based RCA for (2^k + 1)N Operation

We first construct the CPP block and then the FUG and SMG blocks

CPP Block

Let r_j be a binary number, either a 0 or a 1,

$$ \begin{array}{c}\hfill {\#}_{\mathrm{r}\mathrm{i}}={\#}_0=``\cdotp "\left(\mathrm{Logic}\ \mathrm{AND}\ \mathrm{operation}\right),\ \mathrm{i}\mathrm{f}\ {\mathrm{r}}_{\mathrm{i}}=0,\hfill \\ {}\hfill {\#}_{\mathrm{r}\mathrm{i}}={\#}_1=``+"\left(\mathrm{logic}\ \mathrm{OR}\ \mathrm{operation}\right),\kern0.5em \mathrm{i}\mathrm{f}\ {\mathrm{r}}_{\mathrm{i}}=1.\hfill \end{array} $$

(22)

Thus, W01_2i+1 in (19) can be expressed as

$$ \mathrm{W}{\left({\mathrm{r}}_1{\mathrm{r}}_0\right)}_{2\mathrm{i}+1} = {\mathrm{a}}_{2\mathrm{i}+1}{\#}_0\left({\mathrm{a}}_{2\mathrm{i}}{\#}_1\mathrm{W}{\left({\mathrm{r}}_1'{\mathrm{r}}_0'\right)}_{2\mathrm{i}-1}\right) $$

(23)

where r₁’ and r₀’ are the bit-complement of r₁ and r₀, respectively.

Figure 3b shows the block diagram of the UCA2 cells, where the CPP block is highlighted. The CPP block contains 4 carry propagation paths, Pa, a = 0 ~ 3, and each path contains 2 gates, Jab, b = 0 or 1. The gate type of Jab is determined by the index of the path output Wab. For example, the output W10 of the path pa which includes both gates J21 and J20. Here, W10 means a = 1 and b = 0, i.e., J21 is an OR gate (because a = 1) and J20 is an AND gate (because b = 0). Fig. 4a is a symbolic representation of that in Fig. 3b.

Similarly, the CPP block of the UCAk cells for (2^k + 1)N operation can be expressed as

$$ \mathrm{W}{\left({\mathrm{r}}_{\mathrm{k}-1}..{\mathrm{r}}_1{\mathrm{r}}_0\right)}_{\mathrm{k}\left(\mathrm{i}+1\right)-1}={\mathrm{a}}_{\mathrm{k}\left(\mathrm{i}+1\right)-1}{\#}_{\mathrm{r}\mathrm{k}-1}\left(\dots {\#}_{\mathrm{r}1}{\left({\mathrm{a}}_{\mathrm{k}\mathrm{i}}{\#}_{\mathrm{r}0}\mathrm{W}\left({\mathrm{r}}_{\mathrm{k}-1}'\dots {\mathrm{r}}_0'\right.\right)}_{\mathrm{k}\mathrm{i}-1}\right) $$

(24)

Figure 4b shows the symbolic representation of the logic function for the CPP block of the UCAk cell and its delay is Δ_UCA(k) = kΔ_NOR2.

FUG and SMG Blocks

Let U(x0) and U(x1) be defined as follows: (“x” means “don’t care term”)

$$ \mathrm{U}\left(\mathrm{x}0\right)=\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}1{0}_{2\mathrm{i}-1},\ \mathrm{U}\left(\mathrm{x}1\right)=\mathrm{W}0{1}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1} $$

(25)

U(x0) takes the terms Wx0, while U(x1) is formed by the terms Wx1. Let V(0x) and V(1x) be defined as

$$ \mathrm{V}\left(0\mathrm{x}\right)=\mathrm{W}{00}_{2\mathrm{i}-1}'\mathrm{W}0{1}_{2\mathrm{i}-1},\mathrm{V}\left(1\mathrm{x}\right)=\mathrm{W}1{0}_{2\mathrm{i}-1}'\mathrm{W}{11}_{2\mathrm{i}-1} $$

(26)

Thus, (19) can be expressed as

$$ \begin{array}{l}{\mathrm{u}}_{2\mathrm{i}+1}={\mathrm{a}}_{2\mathrm{i}}'\mathrm{U}\left(\mathrm{x}0\right)+{\mathrm{a}}_{2\mathrm{i}}\mathrm{U}\left(\mathrm{x}1\right)\hfill \\ {}{\mathrm{u}}_{2\mathrm{i}}=\mathrm{V}\left(0\mathrm{x}\right)+\mathrm{V}\left(1\mathrm{x}\right)\hfill \end{array} $$

(27)

Figure 5a is the symbolic representation of FUG and SMG blocks of UCA2 cell for 5 N operation, where U(x0)* is an AND3 gate with 3 inputs a_2i’, W00_2i-1’, and W10_2i-1, and U(x1)* is that with a_2i, W01_2i-1’, and W11_2i-1. Thus, the delay path, as shown in Fig. 5b, includes an AND3, a OR2, and a XOR, and it can be implemented with an INV, an NAND3, an NAND2, and a XOR as follows,

$$ {\varDelta}_{\mathrm{FUG}\left(5\mathrm{N}\right)}={\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2};{\varDelta}_{\mathrm{SMG}}={\varDelta}_{\mathrm{XOR}} $$

(28)

Similarly, the UCA3 cell for 9 N operation, the FUG block can be expressed as follows,

$$ \begin{array}{c}\hfill {\mathrm{u}}_{3\mathrm{i}+2}={\mathrm{a}}_{3\mathrm{i}+1}'{\mathrm{a}}_{3\mathrm{i}}'\mathrm{U}\left(\mathrm{x}00\right)+{\mathrm{a}}_{3\mathrm{i}+1}'{\mathrm{a}}_{3\mathrm{i}}\mathrm{U}\left(\mathrm{x}01\right)+\hfill \\ {}\hfill \kern1.56em {\mathrm{a}}_{3\mathrm{i}+1}{\mathrm{a}}_{3\mathrm{i}}'\mathrm{U}\left(\mathrm{x}10\right)+{\mathrm{a}}_{3\mathrm{i}+1}{\mathrm{a}}_{3\mathrm{i}}\mathrm{U}\left(\mathrm{x}11\right)\hfill \\ {}\hfill {\mathrm{u}}_{3\mathrm{i}+1}={\mathrm{a}}_{3\mathrm{i}}'\left[\mathrm{V}\left(0\mathrm{x}0\right)+\mathrm{V}\left(1\mathrm{x}0\right)\right]+{\mathrm{a}}_{3\mathrm{i}}\left[\mathrm{V}\left(0\mathrm{x}1\right)+\mathrm{V}\left(1\mathrm{x}1\right)\right]\hfill \\ {}\hfill {\mathrm{u}}_{3\mathrm{i}} = \mathrm{V}\left(0\mathrm{x}0\right)+\mathrm{V}\left(1\mathrm{x}0\right)+\mathrm{V}\left(0\mathrm{x}1\right)+\mathrm{V}\left(1\mathrm{x}1\right)\hfill \end{array} $$

(29)

Figure 5c illustrates the delay path for 9 N operation, where u_3i+2, in (29), can be realized by four AND4 gates and one OR4 gates. However, the term a_3i+1’a_3i’U(x00) can be realized by an AND2 with a_3i+1’ and a_3i’ and an AND3 which takes the output of AND2 and two inputs of U(x00). Since the function AND2 with a_3i+1’ and a_3i’ can be pre-calculated and the AND2 gate will not be in the critical path, as shown in Fig. 5c. The OR4 is realized by a NOR2 followed by a NAND2. This results in an INV, an NAND3, an NOR2, and an NAND2 included in the critical path of the FUG block of the UCA3 cell, i.e.,

$$ {\varDelta}_{\mathrm{FUG}\left(9\mathrm{N}\right)}={\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{NAND}2};{\varDelta}_{\mathrm{SMG}}={\varDelta}_{\mathrm{XOR}} $$

(30)

Similarly, the delay path in Fig. 5d for UCA cell is,

$$ {\varDelta}_{\mathrm{FUG}\left(17\mathrm{N}\right)}={\varDelta}_{\mathrm{INV}}+2{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NOR}3};{\varDelta}_{\mathrm{SMG}}={\varDelta}_{\mathrm{XOR}} $$

(31)

3.4 Performance Evaluation

The proposed design methodology is mainly about the structure of hardware implementation which is based on the newly developed equations. The actual realization of the proposed UCA, for example, may vary significantly. Therefore, the following performance comparisons were conducted assuming the same type of hardware realization. Thus, the speed performance estimation only consider the total gate delay in the critical path under the same conditions, where the path delays and loading effects were not considered for this rough estimation. In fact, the performances are evaluated based on the gate delays of the standard cells in TSMC 0.18 μm CMOS process technology, as tabulated in Table 1 [2], where the cell height is 5.04 μm.

Table 1 Cell delay data [2]

Full size table

The delays of UCA-based RCAs are, n = 2^t, t ≥ 4, where (Δ_inv + (k-1)Δ_NOR2) is in Cell#0 for 5 N and 17 N operations,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{U}\mathrm{C}\mathrm{A}\right)\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}2}+\left(\mathrm{n}-2\right){\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{inv}}+{\varDelta}_{\mathrm{XOR}}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{U}\mathrm{C}\mathrm{A}\right)\left(5\mathrm{N}\right)}=\left(\mathrm{n}-3\right){\varDelta}_{\mathrm{NOR}2}+2{\varDelta}_{\mathrm{inv}}+{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{U}\mathrm{C}\mathrm{A}\right)\left(17\mathrm{N}\right)}=\left(\mathrm{n}-5\right){\varDelta}_{\mathrm{NOR}2}+2{\varDelta}_{\mathrm{inv}}+2{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NOR}3}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(32)

By (8), the delays of FA-based RCAs are, n = 2^t, t ≥ 4,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NOR}2}+2\left(\mathrm{n}-2\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(5\mathrm{N}\right)}=2{\varDelta}_{\mathrm{NOR}2}+2\left(\mathrm{n}-3\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RCA}\left(\mathrm{F}\mathrm{A}\right)\left(17\mathrm{N}\right)}=4{\varDelta}_{\mathrm{NOR}2}+2\left(\mathrm{n}-5\right){\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(33)

Based on (32), (33), and Table 1, the delays of both FA-based RCAs and UCA-based RCAs for 3 N (k = 1), 5 N (k = 2), and 17 N (k = 4) with the 16 ~ 1024 bits were computed and the values were tabulated in Table 2. Results show that UCA-RCA for 17 N operation has slightly better than that for 5 N operation. This is so simply because (2Δ_NOR2 + Δ_NAND2) = 0.1176 ns and (Δ_NAND3 + Δ_NOR3) = 0.1044 ns, where the difference is about 0.01 ns.

Table 2 Performance evaluation for RCAs

Full size table

Based on Table 2, Fig. 6a and b plot the delays of both FA-based RCA and UCA-based RCA, respectively, for 3 N, 5 N, and 17 N operations. Results show that their delays are almost the same for the same size with the same approach.

Table 2 tabulates the normalized speed performance, where the delay of FA-based RCA is normalized and set to 1. The delay ratios of UCA-based UCAs for various bit sizes are shown in Fig. 6c. Results show that the delay ratios are approximately 65 % which can be simply calculated from the delay ratio of UCA cell (Δ_NOR2) over a FA cell (2Δ_NAND2), i.e., Δ _NOR2/(2Δ _NAND2) = 0.0426/0.0648 = 0.66 = 66 %.

Even though the proposed UCA-based RCAs are about 34 % faster than the FA-based ones, their speed performance is still too slow particularly for wider bit sizes. The following section attempts to further improve the speed performance.

4 Hybrid Adders for (2^k ± 1)N Operations

As mentioned, the RCA achieves lower hardware cost, while the CLA offers high speed performance. In order to achieve higher speed performance for the additions, a hybrid adder combining RCA and CLA (or generate-propagate adder), as shown in Fig. 7, is presented. The adder is simple and modular, where the lower bit CLA is taken as a module.

4.1 UCA-based Hybrid Adder for 3N Operation

For 3N operation, by (14), the carry-out bits at the first 4 stages of the CPP blocks can be written as follows,

$$ \begin{array}{ll}\begin{array}{l}\mathrm{W}{0}_0={\mathrm{a}}_0\mathrm{W}{1}_{-1}\hfill \\ {}\mathrm{W}{0}_1={\mathrm{a}}_1\mathrm{W}{1}_0={\mathrm{a}}_0{\mathrm{a}}_1+{\mathrm{a}}_1\mathrm{W}{0}_{-1}\hfill \\ {}\mathrm{W}{0}_2={\mathrm{a}}_1{\mathrm{a}}_2+{\mathrm{a}}_0{\mathrm{a}}_2\mathrm{W}{1}_{-1}\hfill \\ {}\mathrm{W}{0}_3={\mathrm{a}}_2{\mathrm{a}}_3+{\mathrm{a}}_0{\mathrm{a}}_1{\mathrm{a}}_3+{\mathrm{a}}_1{\mathrm{a}}_3\mathrm{W}{0}_{-1}\hfill \end{array}\hfill & \begin{array}{l}\mathrm{W}{1}_0={\mathrm{a}}_0+\mathrm{W}{0}_{-1}\hfill \\ {}\mathrm{W}{1}_1={\mathrm{a}}_1+\mathrm{W}{0}_0={\mathrm{a}}_1+{\mathrm{a}}_0\mathrm{W}{1}_{-1}\hfill \\ {}\mathrm{W}{1}_2={\mathrm{a}}_2+{\mathrm{a}}_0{\mathrm{a}}_1+{\mathrm{a}}_1\mathrm{W}{0}_{-1}\hfill \\ {}\mathrm{W}{1}_3={\mathrm{a}}_3+{\mathrm{a}}_1{\mathrm{a}}_2+{\mathrm{a}}_0{\mathrm{a}}_2\mathrm{W}{1}_{-1}\hfill \end{array}\hfill \end{array} $$

(34)

Thus, both W0₃ and W1₃ can be written as

$$ \begin{array}{ll}\mathrm{W}{0}_3={\mathrm{R}}_{01}+{\mathrm{R}}_{00}\mathrm{W}{0}_{-1}\hfill & \mathrm{W}{1}_3={\mathrm{Q}}_{01}+{\mathrm{Q}}_{00}\mathrm{W}{1}_{-1}\hfill \end{array} $$

(35)

where

$$ {\mathrm{R}}_{00}={\mathrm{a}}_1{\mathrm{a}}_3,\ {\mathrm{R}}_{01}={\mathrm{a}}_2{\mathrm{a}}_3+{\mathrm{a}}_0{\mathrm{a}}_1{\mathrm{a}}_3,{\mathrm{Q}}_{00}={\mathrm{a}}_0{\mathrm{a}}_2,\ {\mathrm{Q}}_{01}={\mathrm{a}}_3+{\mathrm{a}}_1{\mathrm{a}}_2 $$

(36)

Similarly, both W0₇ and W1₇ are expressed as

$$ \begin{array}{ll}\mathrm{W}{0}_7={\mathrm{R}}_{11}+{\mathrm{R}}_{10}\mathrm{W}{0}_3\hfill & \mathrm{W}{1}_7={\mathrm{Q}}_{11}+{\mathrm{Q}}_{10}\mathrm{W}{1}_3\hfill \end{array} $$

(37)

where

$$ {\mathrm{R}}_{10}={\mathrm{a}}_5{\mathrm{a}}_7,\ {\mathrm{R}}_{11}={\mathrm{a}}_6{\mathrm{a}}_7+{\mathrm{a}}_4{\mathrm{a}}_5{\mathrm{a}}_7,\ {\mathrm{Q}}_{00}={\mathrm{a}}_4{\mathrm{a}}_6,\ {\mathrm{Q}}_{01}={\mathrm{a}}_7+{\mathrm{a}}_5{\mathrm{a}}_6 $$

By (35), both W0₇ and W1₇ in (37) can be re-written as

$$ \begin{array}{l}\mathrm{W}{0}_7={\mathrm{R}}_{11}+{\mathrm{R}}_{10}{\mathrm{R}}_{01}+{\mathrm{R}}_{10}{\mathrm{R}}_{00}\mathrm{W}{0}_{-1}\hfill \\ {}\mathrm{W}{1}_7={\mathrm{Q}}_{11}+{\mathrm{Q}}_{10}{\mathrm{Q}}_{01}+{\mathrm{Q}}_{10}{\mathrm{Q}}_{00}\mathrm{W}{1}_{-1}\hfill \end{array} $$

(38)

Figure 8a shows a 32-bits hybrid adder, where each CLA unit processes 8-bit data. Each CLA unit, adopting the parallel prefix adder structure [7, 8], is comprised of two 4-bit RQ generator units (RQGUs, or RQGU-4), a 2-bit CLA (or CLA-2), and two 4-bit Function U generation units (FUGUs, or FUGU-4). The RQGU-4, as shown in Fig. 8b, realizes the functions in (34) and it replaces both PG unit and Block CLA-4 (BCLA-4) in the conventional CLA structure. The CLA-2, as illustrated in Fig. 8c, implements the functions in (35) and (38), and the FUGU-4 in Fig. 8d, is for the functions in (34) and (15). Note that c_i = (W0_i,W1_i) in Fig. 8a, The delays of these units can be expressed as

$$ \begin{array}{l}{\varDelta}_{\mathrm{RQGU}-4}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2};{\varDelta}_{\mathrm{CLA}-2}=2{\varDelta}_{\mathrm{NAND}3};\hfill \\ {}{\varDelta}_{\mathrm{FUGU}-4}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2};{\varDelta}_{\mathrm{SUM}}={\varDelta}_{\mathrm{XOR}};\hfill \end{array} $$

The critical path of the 32-bit hybrid adder includes a RQGU-4, four CLA-2 s, a FUGU-4, and a SUM, i.e.,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{HyA}\left(3\mathrm{N}\right)(32)}={\varDelta}_{\mathrm{RQGU}-4}+4{\varDelta}_{\mathrm{CLA}-2}+{\varDelta}_{\mathrm{FUGU}-4}+{\varDelta}_{\mathrm{SUM}}\hfill \\ {}\hfill =10{\varDelta}_{\mathrm{NAND}3}+2{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

In general, for n = 2^t, t > 3, the delay

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{HyA}\left(3\mathrm{N}\right)\left(\mathrm{n}\right)}={\varDelta}_{\mathrm{RQGU}-4}+\left(\mathrm{n}/8\right){\varDelta}_{\mathrm{CLA}-2}+{\varDelta}_{\mathrm{FUGU}-4}+{\varDelta}_{\mathrm{SUM}}\hfill \\ {}\hfill =\left(\mathrm{n}/4+2\right){\varDelta}_{\mathrm{NAND}3}+2{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{XOR}}\hfill \end{array} $$

(39)

Similarly, the 4-bit CLA module of the hybrid adder in Fig. 8 can be replaced by 8-bit one. The delay is

$$ {\varDelta}_{\mathrm{HyA}\left(3\mathrm{N}\right)\left(\mathrm{n}\right)}={\varDelta}_{\mathrm{RQGU}-8}+\left(\mathrm{n}/16\right){\varDelta}_{\mathrm{CLA}-2}+{\varDelta}_{\mathrm{FUGU}-8}+{\varDelta}_{\mathrm{SUM}} $$

(40)

where

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RQGU}-8}={\varDelta}_{\mathrm{AND}5}+{\varDelta}_{\mathrm{OR}3};\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-8}={\varDelta}_{\mathrm{AND}5}+{\varDelta}_{\mathrm{OR}4}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2}.\hfill \end{array} $$

(41)

The CLA module of the hybrid adder may include multi-level structure similar to the conventional CLA. For example, the CLA module of the 64-bit hybrid adder in Fig. 9 is with 2-level CLA structure. Let L denote as the number of levels in the CLA module. Thus, the delay of such hybrid adder is, r = m*2^L, m = 4 or 8, L ≥ 2, and n ≥ 2r,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{HyA}\left(3\mathrm{N}\right)\left(\mathrm{n}\right)}={\varDelta}_{\mathrm{RQGU}-\mathrm{m}}+2\left(\mathrm{L}-1\right){\varDelta}_{\mathrm{BCLA}-2}+\left(\mathrm{n}/\mathrm{r}\right){\varDelta}_{\mathrm{CLA}-2}+\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-\mathrm{m}}+{\varDelta}_{\mathrm{SUM}}\hfill \end{array} $$

(42)

4.2 UCA-based Hybrid Adder for (2^k + 1)N Operation

Similar to the derivation process for 3 N operation in (35)-(38), the following component delays can be derived from Property 1 for 5 N operation, where

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RQGU}-4\left(5\mathrm{N}\right)}={\varDelta}_{\mathrm{RQGU}-4\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2};\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-4\left(5\mathrm{N}\right)}=2{\varDelta}_{\mathrm{NAND}3}+2{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}};\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-8\left(5\mathrm{N}\right)}={\varDelta}_{\mathrm{RQGU}-8\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2}\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-8\left(5\mathrm{N}\right)}=2{\varDelta}_{\mathrm{NAND}3}+2{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{INV}};\hfill \end{array} $$

(43)

Similarly, one can also derive for 17 N operation as

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RQGU}-4\left(17\mathrm{N}\right)}={\varDelta}_{\mathrm{RQGU}-4\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2};\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-4\left(17\mathrm{N}\right)}=3{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}3};\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-8\left(17\mathrm{N}\right)}={\varDelta}_{\mathrm{RQGU}-8\left(3\mathrm{N}\right)}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2};\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-8\left(17\mathrm{N}\right)}=3{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}3}.\hfill \end{array} $$

(44)

For 9N operation with k = 3, the UCA3 cells are employed and each cell contains 3 bits. Thus, the delays are

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RQGU}-3\left(9\mathrm{N}\right)}=2{\varDelta}_{\mathrm{NAND}2};\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-6\left(9\mathrm{N}\right)}={\varDelta}_{\mathrm{AND}4}+{\varDelta}_{\mathrm{OR}3}={\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{OR}3}\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-3\left(9\mathrm{N}\right)}=3{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NOR}2}+2{\varDelta}_{\mathrm{NAND}3};\hfill \\ {}\hfill {\varDelta}_{\mathrm{FUGU}-6\left(9\mathrm{N}\right)}=2{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{INV}}+{\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{OR}3}.\hfill \end{array} $$

(45)

Similarly, for (2^k + 1)N operations, the k-bit cells, UCAk cells, are employed. Thus, the hybrid adder may include the k-bit CLA modules.

4.3 Performance Evaluation

This section roughly estimates the speed performances of the FA-based RCAs, UCA-based RCA, and the hybrid adders with various bit sizes for 3N, 5N, 9N, and 17N operations. However, the performance evaluation process is readily applied for any bit sizes and (2^k ± 1)N operations.

The speed performance is roughly evaluated based on the gate delays of the standard cells listed in Table 1, where the AND (NAND) and OR (NOR) gates are limited to their fan-ins up to 3. Thus, the AND5 and OR4 gates in (41) are implemented in two-level logic gates, i.e., AND5 = (AND3) •(AND2), and OR4 = (OR2) + (OR2).

The delays of the RQGU-4 and RQGU-8 for 3N, 5N, and 17N operations are the same. By Table 1,

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{RQGU}-4}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}=0.0777\mathrm{ns}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-8}={\varDelta}_{\mathrm{AND}5}+{\varDelta}_{\mathrm{OR}4}\hfill \\ {}\hfill \kern2.04em ={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+2{\varDelta}_{\mathrm{NOR}2}=0.1629\mathrm{ns}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-3}=2{\varDelta}_{\mathrm{NAND}2}=0.0648\mathrm{ns}\hfill \\ {}\hfill {\varDelta}_{\mathrm{RQGU}-6}={\varDelta}_{\mathrm{NAND}}+{\varDelta}_{\mathrm{NOR}2}+{\varDelta}_{\mathrm{NOR}3}+{\varDelta}_{\mathrm{INV}}=0.1602\mathrm{ns}\hfill \end{array} $$

(46)

Based on (39, 40, 41, 42, 43, 44, 45, and 46), the delays of the hybrid adders are calculated and tabulated in Table 3.

Table 3 Performance comparison

Full size table

Results show that the delays are indeed improved as the number of levels, L, increases. For example, for 17 N operation with 1024 bits, the delays was improved from 66.51 ns for the FA-based RCA to 43.64 ns for the proposed UCA-based RCA, as shown in Table 2, Further, the delay is reduced to 12.07 ns and 6.44 ns for the proposed hybrid adders with 4-bits and 8-bits modules, respectively, as illustrated in Table 3 and plotted in Fig. 10a. The delays can be further decreased to 2.31 ns and 1.76 ns by using the proposed hybrid with 4-levels of 4-bits and 8-bits modules, respectively, as plotted in Fig. 10b.

4.4 Discussion

The proposed UCA design concept has the advantages of better speed performance than the conventional FA design approach. This sub-section compares the speed performance of the UCA-CLA and conventional CLA (CV-CLA). The delays of both BCLA-4 and CLA-4 [5–7] are

$$ \begin{array}{c}\hfill {\varDelta}_{\mathrm{CLA}-4}={\varDelta}_{\mathrm{AND}5}+{\varDelta}_{\mathrm{OR}5}={\varDelta}_{\mathrm{NAND}3}+{\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{NOR}3}+{\varDelta}_{\mathrm{NOR}2}\hfill \\ {}\hfill {\varDelta}_{\mathrm{BCLA}-4}={\varDelta}_{\mathrm{AND}4}+{\varDelta}_{\mathrm{OR}4}=2\left({\varDelta}_{\mathrm{NAND}2}+{\varDelta}_{\mathrm{NOR}2}\right)\hfill \end{array} $$

For n-bit CV-CLAs, n = 4^t, with CLA-4 and BCLA-4 modules, the delay is

$$ {\varDelta}_{\mathrm{CV}-\mathrm{L}\mathrm{A}}={\varDelta}_{\mathrm{PGU}}+2\left(\mathrm{t}-1\right){\varDelta}_{\mathrm{BCLA}-4}+{\varDelta}_{\mathrm{CLA}-4}+{\varDelta}_{\mathrm{SUM}} $$

(47)

and the delay of an n-bit UCA-CLA is

$$ {\varDelta}_{\mathrm{UCA}-\mathrm{C}\mathrm{L}\mathrm{A}}={\varDelta}_{\mathrm{RQGU}-4}+2\left(\mathrm{t}-2\right){\varDelta}_{\mathrm{BCLA}-4}+{\varDelta}_{\mathrm{CLA}-4}+{\varDelta}_{\mathrm{FUGU}-4}+{\varDelta}_{\mathrm{SUM}} $$

(48)

Note that the delay (Δ_RQGU-4 + Δ_FUGU-4) in UCA-CLA is compared with (Δ_PGU + 2Δ_BCLA-4) in CV-CLA. Table 4 compares the speed performance of CV-CLA and UCA- CLA for 3 N, 5 N, and 17 N operations with various bit sizes.

Table 4 Performance comparison (CLA Modules)

Full size table

The UCA-CLAs for 3N, 5N, and 17N operations respectively achieve approximately 30 %, 25 %, and 15 % less delays than CV-CLA for 16-bit addition, and about 13 %, 11 %, and 7 % for 1024 bit addition. As mentioned, the delay of FUGU-2^k block increases with k considerably. Thus, the proposed UCA-CLAs may gain the advantage of better speed performance for lower values of k. The proposed simple and modular UCA-RCA are perfectly applied for those constant multiplications with smaller bit sizes, such as 3 N, 5 N, and 7 N operations for arithmetic coding, Booth encoding, and the divider design [9] as a cost-effective solution. For higher speed performance, the proposed UCA-CLA can also be applied.

5 Conclusion

Constant multiplier performs a multiplication of a data-input with a constant value. They are essential components in various types of arithmetic circuits, such as filters in digital signal processor (DSP) units and they are prevalent in modern VLSI designs. This study has presented efficient algorithm and fast hardware implementation for (2^k ± 1)N operation with additions. No multiplications were needed.

The salient features of the proposed UCA design concept came from the use of the carry function of (12) instead of (5), thus improving the propagation delay path and achieving 34 % in speed performance improvement. The design concept is perfectly applied for the RCA design for faster speed. The proposed UCA-HyA provides a cost-effective solution for constant multiplication, and the proposed UCA-CLA further improves the speed performance.

In addition, the study presents the hardware implementation for (2^k + 1)N operation, how to apply the proposed UCA design concept for any other form of constants with one addition also leads a very interesting topic for future research. Finally, as mentioned in [2], the constant divider can be developed in a similar way.

References

Cappello, P. R., & Steiglitz, K. (1984). Some complexity issues in digital signal processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(No. 5), 1037–1041.
Article MATH Google Scholar
Wey, C. L., Jui, P.-C., & Sung, G.-N. (2014). Efficient Multiply-by-3 and Divide-by-3 Algorithms and Their Fast Hardware Implementation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E97-A(2), 616–623.
Article Google Scholar
Jui, P.-C., Sung, G.-N., & Wey, C. L. (2012). Efficient algorithm and hardware implementation of 3N for arithmetic and for radix-8 encodings, Proc. of IEEE Midwest Symp. on Circuits and Systems, Boise, Idaho (pp. 418–421).
Google Scholar
Oudjida, A. K., & Chaillet, N. (2014). Radix-2r Arithmetic for multiplication by a constant. IEEE Transactions on Circuits and Systems – II: Express Briefs, 61, 349–353.
Article Google Scholar
Koren, I. (1993). Computer arithmetic algorithms. New Jersey: Prentice-Hall, Inc.
Google Scholar
Huang, K. (1979). Computer arithmetic: principles, architecture, and design. New York: Wiley.
Google Scholar
Chang, S.-K., & Wey, C. L. (2012). A fast 64-bit hybrid adder design in 90 nm CMOS process. Boise: Proc. of IEEE Midwest Symp. on Circuits and Systems.
Google Scholar
Brent, R. P., & Kung, H. T. (1982). A regular layout for parallel adders. IEEE Transactions on Computers, 31(3), 260–264.
Article MATH MathSciNet Google Scholar
Wey, C. L. (1996). Built-In Self-Test (BIST) design of high-speed carry-free dividers. IEEE Transactions on VLSI Systems, 4(1), 141–145.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electrical Engineering, National Central University, Jhongli, Taiwan
Pin-Chang Jui & Muh-Tian Shiue
Department of Electrical and Computer Engineering, National Chiao Tung Univesity, Hsinchu, Taiwan
Chin-Long Wey

Authors

Pin-Chang Jui
View author publications
You can also search for this author in PubMed Google Scholar
Chin-Long Wey
View author publications
You can also search for this author in PubMed Google Scholar
Muh-Tian Shiue
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pin-Chang Jui.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jui, PC., Wey, CL. & Shiue, MT. Multiplication of a Constant (2^k ± 1) and Its Fast Hardware Implementation. J Sign Process Syst 82, 41–53 (2016). https://doi.org/10.1007/s11265-015-0978-4

Download citation

Received: 09 November 2014
Revised: 25 January 2015
Accepted: 02 February 2015
Published: 14 February 2015
Issue Date: January 2016
DOI: https://doi.org/10.1007/s11265-015-0978-4

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Multiplication of a Constant (2^k ± 1) and Its Fast Hardware Implementation

Abstract

Similar content being viewed by others

Realization of Carry-Free Adder Circuit Using FPGA

Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier

Review on Different Types of Multipliers and Its Performance Comparisons

1 Introduction

2 FA-based RCAs for (2^k ± 1)N Operations

3 UCA-based RCAs for (2^k ± 1)N Operations

3.1 UCA-based RCA for 3 N Operation

3.2 UCA-based RCA for 5 N Operation

Property 1

Proof

3.3 UCA-based RCA for (2^k + 1)N Operation

CPP Block

FUG and SMG Blocks

3.4 Performance Evaluation

4 Hybrid Adders for (2^k ± 1)N Operations

4.1 UCA-based Hybrid Adder for 3N Operation

4.2 UCA-based Hybrid Adder for (2^k + 1)N Operation

4.3 Performance Evaluation

4.4 Discussion

5 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multiplication of a Constant (2k ± 1) and Its Fast Hardware Implementation

Abstract

Similar content being viewed by others

Realization of Carry-Free Adder Circuit Using FPGA

Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier

Review on Different Types of Multipliers and Its Performance Comparisons

1 Introduction

2 FA-based RCAs for (2k ± 1)N Operations

3 UCA-based RCAs for (2k ± 1)N Operations

3.1 UCA-based RCA for 3 N Operation

3.2 UCA-based RCA for 5 N Operation

Property 1

Proof

3.3 UCA-based RCA for (2k + 1)N Operation

CPP Block

FUG and SMG Blocks

3.4 Performance Evaluation

4 Hybrid Adders for (2k ± 1)N Operations

4.1 UCA-based Hybrid Adder for 3N Operation

4.2 UCA-based Hybrid Adder for (2k + 1)N Operation

4.3 Performance Evaluation

4.4 Discussion

5 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation

Multiplication of a Constant (2^k ± 1) and Its Fast Hardware Implementation

2 FA-based RCAs for (2^k ± 1)N Operations

3 UCA-based RCAs for (2^k ± 1)N Operations

3.3 UCA-based RCA for (2^k + 1)N Operation

4 Hybrid Adders for (2^k ± 1)N Operations

4.2 UCA-based Hybrid Adder for (2^k + 1)N Operation