Multipliers

Deschamps, Jean-Pierre; Sutter, Gustavo D.; Cantó, Enrique

doi:10.1007/978-94-007-2987-2_8

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 149))

Abstract

Multiplication is a basic arithmetic operation whose execution is based on 1-digit by 1-digit multipliers and multi-operand adders. Most FPGA families include the basic components for implementing fast and cost-effective multipliers. Furthermore, they also include optimized fixed-size multipliers which, in turn, can be used for implementing larger-size multipliers.

The basic multiplication algorithm is described in Sect. 8.1. Several combinational implementations are proposed in Sect. 8.2. They correspond to different types of multi-operand adders: iterative ripple-carry adders, carry-save adders, multi-operand adders based on counters, radix-2^k and mixed-radix adders. Sequential implementations are proposed in Sect. 8.3. They use the shift and add method, implemented with either a ripple-carry adder or a carry-save adder. If integer operands are considered, several options are proposed in Sect. 8.4. A first method consists of multiplying B’s complement integers as if they were naturals; the drawback of this conceptually simple method is that the operands must be represented, and multiplied, with as many digits as the final result. Better options are a modification of the shift and add algorithm, multiplication of naturals followed by a post-correction, and the Booth algorithms. Section 8.5 describes a LUT-based method for implementing constant multipliers, that is to say, circuits that compute c · y + u, where c is a constant.

8.1 Basic Algorithm

Consider two radix-B numbers

$$ x = x_{n - 1} \,\cdot\,B^{n - 1} + x_{n - 2} \,\cdot\,B^{n - 2} + \, \ldots \, + x_{ 1} \,\cdot\,B + x_{0} \,{\text{and}}\,y = y_{m - 1} \,\cdot\,B^{m - 1} + y_{m - 2} \,\cdot\,B^{m - 2} + \, \ldots \, \, + y_{ 1} \,\cdot\,B + y_{0} , $$

where x_i and y_i belong to {0, 1, …, B − 1}. An n-digit by m-digit multiplier generates a radix-B number

$$ z = z_{n + m - 1} \,\cdot\,B^{n + m - 1} + z_{n + m - 2} \,\cdot\,B^{n + m - 2} + \, \ldots \, + z_{ 1} \,\cdot\,B + z_{0} $$

such that

$$ z = x\, \cdot \,y. $$

A somewhat more general definition includes the addition of two further numbers

$$ u = u_{n - 1} \,\cdot\,B^{n - 1} + u_{n - 2} \,\cdot\,B^{n - 2} + \, \ldots \, + u_{ 1} \,\cdot\,B + u_{0} \,{\text{and}}\,v = v_{m - 1} \,\cdot\,B^{m - 1} \, + v_{m - 2} \,\cdot\,B^{m - 2} + \, \ldots \, + v_{ 1} \,\cdot\,B + v_{0} , $$

so that

$$ z = x\,\cdot\,y + u + v. $$

(8.1)

Observe that the maximum value of z is

$$ \left( {B^{n} - 1} \right)\left( {B^{m} - 1} \right) \, + \, \left( {B^{n} - 1} \right) \, + \, \left( {B^{m} - 1} \right) \, = B^{n + m} - 1. $$

In order to compute (8.1), first define a 1-digit by 1-digit multiplier (Fig. 8.1a): given four B-ary digits a, b, c and d, it generates two B-ary digits e and f such that

$$ a\,\cdot\,b + c + d = e\,\cdot\,B + f. $$

(8.2)

If B = 2, it amounts to a 2-input AND gate and a 1-digit adder (Fig. 8.1b).

An n-digit by 1-digit multiplier made up of n 1-digit by 1-digit multipliers is shown in Fig. 8.2. It computes

$$ z = x\,\cdot\,b + u + d $$

(8.3)

where x and u are n-digit numbers, b and d are 1-digit numbers, and z is an (n + 1)-digit number. Observe that the maximum value of z is

$$ \left( {B^{n} - 1} \right)\left( {B - 1} \right) \, + \, \left( {B^{n} - 1} \right) \, + \, \left( {B - 1} \right) \, = B^{n + 1} - 1. $$

Using the iterative circuit of Fig. 8.2 as a computation resource, the computation of (8.1) amounts to computing the m n-digit by 1-digit products

$$ \begin{aligned} z^{\left( 0 \right)} &= x\,\cdot\,y_{0} + u + v_{0} , \\ z^{\left( 1\right)} &= \, \left( {x\,\cdot\,y_{ 1} + v_{ 1} } \right)B, \\ z^{\left( 2\right)} &= \, \left( {x\,\cdot\,y_{ 2} + v_{ 2} } \right)B^{ 2} , \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {x\,\cdot\,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} , \\ \end{aligned} $$

(8.4)

and to adding them, that is

$$ z = z^{\left( 0 \right)} + z^{( 1)} + z^{( 2)} + \, \ldots \, + z^{(m - 1)} = x\,\cdot\,y + u + v. $$

(8.5)

For that, one of the multioperand adders of Sect. 7.7 can be used. As an example, if Algorithm 7.2 is used, then z is computed as follows.

Algorithm 8.1: Multiplication, right to left algorithm
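The listing of Algorithm 8.1 is not reproduced here. The Python sketch below illustrates Eqs. (8.4) and (8.5) using the building blocks of Figs. 8.1a and 8.2. It assumes that the m partial products are accumulated row by row, as in the ripple-carry array of Sect. 8.2.1: the least significant digit of each row is final, and the other n digits are passed to the next row. Digit lists are little-endian. The function names (`cell`, `row`, `multiply`, `to_digits`, `from_digits`) are hypothetical.

```python
def to_digits(value, count, base):
    """Little-endian radix-`base` digits of a natural that fits in `count` digits."""
    digits = []
    for _ in range(count):
        value, d = divmod(value, base)
        digits.append(d)
    assert value == 0, "value does not fit in the requested number of digits"
    return digits


def from_digits(digits, base):
    """Value of a little-endian digit list."""
    return sum(d * base**i for i, d in enumerate(digits))


def cell(a, b, c, d, base):
    """1-digit by 1-digit multiplier, Eq. (8.2): a*b + c + d = e*B + f; returns (e, f)."""
    return divmod(a * b + c + d, base)


def row(x, b, u, d, base):
    """n-digit by 1-digit multiplier (Fig. 8.2), Eq. (8.3): n+1 digits of x*b + u + d."""
    z, carry = [], d
    for xi, ui in zip(x, u):
        carry, f = cell(xi, b, ui, carry, base)
        z.append(f)
    return z + [carry]


def multiply(x, y, u, v, base):
    """z = x*y + u + v, Eq. (8.1); x, u have n digits and y, v have m digits."""
    upper, z = list(u), []
    for yj, vj in zip(y, v):
        r = row(x, yj, upper, vj, base)  # x*y_j + upper + v_j, n+1 digits
        z.append(r[0])                   # least significant digit is final
        upper = r[1:]                    # the other n digits feed the next row
    return z + upper                     # n + m digits
```

For example, with B = 10 and u = v = 0, `multiply(to_digits(7918, 4, 10), to_digits(541, 3, 10), to_digits(0, 4, 10), to_digits(0, 3, 10), 10)` returns `[8, 3, 6, 3, 8, 2, 4]`, the digits of 7918 · 541 = 4283638, least significant first.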

8.2 Combinational Multipliers

8.2.1 Ripple-Carry Parallel Multiplier

The combinational circuit of Fig. 8.3 implements Algorithm 8.1 (with n = 4 and m = 3). One of its critical paths has been shaded. Its computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = \, \left( {n + 2m - 2} \right)\,\cdot\,T_{multiplier} \left( { 1, 1} \right). $$

(8.6)

The following VHDL model describes the circuit of Fig. 8.3 (B = 2).

A complete generic model parallel_multiplier.vhd is available at the Authors’ web page.

8.2.2 Carry-Save Parallel Multiplier

A straightforward modification of the multiplier of Fig. 8.3, based on the carry-save principle, is shown in Fig. 8.4. The circuit is made up of an n-by-m array of 1-by-1 multipliers, whose computation time is equal to n · T(1,1), plus an m-digit output adder. Its critical path has been shaded. Its computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = n\, \cdot \,T_{multiplier} \left( { 1, 1} \right) \, + m\, \cdot \,T_{adder} \left( 1\right) \, \le \, \left( {n + m} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$

(8.7)

The following VHDL model describes the circuit of Fig. 8.4 (B = 2).

A complete generic model parallel_csa_multiplier.vhd is available at the Authors’ web page.

8.2.3 Multipliers Based on Multioperand Adders

A straightforward implementation of Eqs. (8.4) and (8.5) can also be considered (Fig. 8.5). For that, any type of multioperand adder can be used.

Example 8.1

Consider an n-bit by 7-bit multiplier. The 7-operand adder can be divided up into a 7-to-3 counter, a 3-to-2 counter and a ripple-carry adder. The complete structure is shown in Fig. 8.6 and is described by the following VHDL model:

A complete generic model N_by_7_multiplier.vhd is available at the Authors’ web page.
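The following bit-level Python sketch shows the structure of Example 8.1. It is an illustration only, not the N_by_7_multiplier.vhd model.

- The seven partial products x · y_j · 2^j are first compressed column by column by a 7-to-3 counter.
- A 3-to-2 counter (carry-save adder) then reduces them to two operands.
- A single carry-propagate addition gives the product.

The function names are hypothetical.

```python
def counter_7_to_3(ops, width):
    """Column-wise 7-to-3 counter: replaces 7 operands by 3 with the same sum."""
    w0 = w1 = w2 = 0
    for i in range(width):
        ones = sum((op >> i) & 1 for op in ops)   # 0..7, a 3-bit count
        w0 |= (ones & 1) << i                     # count bit of weight 2^i
        w1 |= ((ones >> 1) & 1) << (i + 1)        # count bit of weight 2^(i+1)
        w2 |= ((ones >> 2) & 1) << (i + 2)        # count bit of weight 2^(i+2)
    return w0, w1, w2


def counter_3_to_2(a, b, c):
    """3-to-2 counter applied to all bit positions at once: a + b + c = s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def n_by_7_multiply(x, y, n):
    """x: n-bit natural, y: 7-bit natural."""
    partial = [((y >> j) & 1) * (x << j) for j in range(7)]  # operands of Eq. (8.4)
    a, b, c = counter_7_to_3(partial, n + 7)
    s, carry = counter_3_to_2(a, b, c)
    return s + carry                                          # final ripple-carry adder
```

Only the last addition propagates carries. This is the part that the FPGA carry logic handles efficiently.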

Numerous multipliers based on trees of counters have been proposed and reported, among others the Wallace and Dadda multipliers (Wallace [4]; Dadda [3]). Nevertheless, as already mentioned (Comment 7.5), in many cases the best FPGA implementations are based on relatively simple algorithms. These lead to regular circuits that can take advantage of the special-purpose carry logic circuitry. An example of an efficient FPGA implementation is described next.

Consider the set of equations (8.4). If two successive steps are merged into a single step (loop unrolling), the new set of equations is:

$$ \begin{aligned} z^{{\left( { 1,0} \right)}} &= \, \left( {x\,\cdot\,y_{ 1} + v_{ 1} } \right)B + x\,\cdot\,y_{0} + u + v_{0} , \\ z^{{\left( { 3, 2} \right)}} &= \, \left[ {\left( {x\,\cdot\,y_{ 3} + v_{ 3} } \right)B + \, \left( {x\,\cdot\,y_{ 2} + v_{ 2} } \right)} \right]B^{ 2} , \\ &\qquad\qquad\ldots \\ z^{{\left( {m - 1,m - 2} \right)}} &= \, [\left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B + \, \left( {x\, \cdot \,y_{m - 2} + v_{m - 2} } \right)]B^{m - 2} , \\ \end{aligned} $$

(8.8)

and the product is equal to

$$ z = z^{{\left( { 1,0} \right)}} + z^{( 3, 2)} + \, \ldots \, + z^{(m - 1,m - 2)} . $$

Assuming that u = 0, the basic operation to implement (8.8) is

$$ z^{{\left( {j + 1,j} \right)}} = \, \left( {x\, \cdot \,y_{j + 1} + v_{j + 1} } \right)B + \left( {x\,\cdot\,y_{j} + v_{j} } \right) $$

to which corresponds the circuit of Fig. 8.7 (with n = 4).

The circuit of Fig. 8.7 can be decomposed into n + 1 vertical slices of the type shown in Fig. 8.8a (with obvious simplifications regarding the first and last slices). Finally, if B = 2 and v_j = 0, the carries of the first line are equal to 0, so that the circuit of Fig. 8.8a can be implemented as shown in Fig. 8.8b.

Comment 8.1

Most FPGAs include the basic components for implementing the structure of Fig. 8.8b. Moreover, the synthesis tools are able to generate optimized multipliers from a simple VHDL expression, such as

z <= x * y;

Furthermore, many FPGAs also include fixed-size multiplier blocks.

8.2.4 Radix-2^k and Mixed-Radix Parallel Multipliers

The basic multiplication algorithm (Sect. 8.1) and the corresponding ripple-carry and carry-save multipliers (Sects. 8.2.1 and 8.2.2) have been defined for any radix B. In particular, radix-2^k multipliers can be defined. This allows the synthesis of n · k-bit by m · k-bit multipliers using k-bit by k-bit multipliers as building blocks.

The following VHDL model defines a radix-2^k ripple-carry parallel multiplier. The main iteration consists of m · n instantiations of any type of k-bit by k-bit combinational multiplier that computes z = a · b + c + d and represents z in the form $ z_{H} \, \cdot \, 2^{k} \, + z_{L} , $ where z_H and z_L are k-bit numbers:

A complete generic model base_2k_parallel_multiplier.vhd is available at the Authors’ web page.

The stored-carry encoding can also be applied. Once again, the main iteration consists of m · n instantiations of any type of k-bit by k-bit multiplier, and the connections are similar to those of the carry-save multiplier of Sect. 8.2.2. A complete generic model base_2k_csa_multiplier.vhd is available at the Authors’ web page.

A straightforward generalization of relations (8.2) to (8.5) allows defining mixed-radix combinational multipliers. First consider the circuit of Fig. 8.1a, assuming that

$$ a,c\, \in \left\{ {0,{ 1},\, \ldots \,,B_{ 1} - 1} \right\},\,{\text{and}}\,b,d\, \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\}. $$

Then

$$ z = a\, \cdot \,b + c + d \le \, \left( {B_{ 1} - 1} \right)\, \cdot \,\left( {B_{ 2} - 1} \right) \, + \, \left( {B_{ 1} - 1} \right) \, + \, \left( {B_{ 2} - 1} \right) \, = B_{ 1} \, \cdot \,B_{ 2} - { 1}, $$

so that z can be expressed under the form

$$ z = e\, \cdot \,B_{ 1} + f,{\text{ with}}\,e \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\},f \in \left\{ {0,{ 1},\, \ldots \,,B_{ 1} - 1} \right\}. $$

Then, consider the circuit of Fig. 8.2, assuming that x and u are n-digit radix-B₁ numbers, and b and d are 1-digit radix-B₂ numbers. Thus,

$$ x\, \cdot \,b + u + d = z_{n} \, \cdot \,B_{ 1}^{n} + z_{n - 1} \, \cdot \,B_{ 1}^{n - 1} + \, \ldots \, + z_{ 1} \, \cdot \,B_{ 1} + z_{0} , $$

(8.9)

with

$$ z_{n} \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\}{\text{ and}}\,z_{i} \in \left\{ {0, 1,\, \ldots \,,B_{ 1} - 1} \right\},\forall i\,{\text{in }}\left\{ {0, 1,\, \ldots \,,n - 1} \right\}. $$

Finally, given two n-digit radix-B₁ numbers x and u, and two m-digit radix-B₂ numbers y and v, compute

$$ \begin{aligned} z^{\left( 0 \right)} &= x\, \cdot \,y_{0} + u + v_{0} , \\ z^{\left( 1\right)} &= \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B_{ 2} , \\ z^{\left( 2\right)} &= \, \left( {x\, \cdot \,y_{ 2} + v_{ 2} } \right)B_{ 2}^{ 2} , \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B_{ 2}^{m - 1} . \\ \end{aligned} $$

(8.10)

Then

$$ z = z^{\left( 0 \right)} + z^{( 1)} + z^{( 2)} + \, \ldots \, + z^{(m - 1)} = x\, \cdot \,y + u + v. $$

(8.11)

Consider the case where

$$ B_{1} = 2^{{k_{1} }} ,\;B_{2} = 2^{{k_{2} }} . $$

An easy way to define a VHDL model of the corresponding multiplier consists of first modelling a circuit that implements (8.9). The main iteration consists of n instantiations of any type of k₁-bit by k₂-bit combinational multiplier that computes

$$ a\,\cdot\,b + c + d = z_{H} \,\cdot\,2^{{k_{1} }} + z_{L} , $$

where z_H is a k₂-bit number and z_L a k₁-bit number:

Then, it remains to instantiate m rows:

A complete generic model MR_parallel_multiplier.vhd is available at the Authors’ web page.
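The following Python sketch illustrates relations (8.9) to (8.11) for B₁ = 2^k₁ and B₂ = 2^k₂. Each row is computed with k₁-bit by k₂-bit cells as in (8.9). For brevity, the m rows z^(j) are then added with ordinary integer arithmetic rather than with the connection pattern of Fig. 8.9. The function names are hypothetical.

```python
def split(value, k, count):
    """Little-endian radix-2**k digits of a natural."""
    return [(value >> (k * i)) & ((1 << k) - 1) for i in range(count)]


def mr_cell(a, b, c, d, k1, k2):
    """k1-bit by k2-bit cell: a*b + c + d = zH * 2**k1 + zL (a, c < 2**k1; b, d < 2**k2)."""
    z = a * b + c + d
    zh, zl = z >> k1, z & ((1 << k1) - 1)
    assert zh < (1 << k2)          # guaranteed by the bound B1 * B2 - 1
    return zh, zl


def mr_row(x, b, u, d, k1, k2):
    """Eq. (8.9): n radix-B1 digits z_0..z_{n-1} and one radix-B2 digit z_n."""
    z, carry = [], d
    for xi, ui in zip(x, u):
        carry, zl = mr_cell(xi, b, ui, carry, k1, k2)
        z.append(zl)
    return z, carry


def mr_multiply(x, y, u, v, n, m, k1, k2):
    """Eqs. (8.10)-(8.11): x, u have n radix-2**k1 digits; y, v have m radix-2**k2 digits."""
    xd, ud = split(x, k1, n), split(u, k1, n)
    yd, vd = split(y, k2, m), split(v, k2, m)
    total = 0
    for j in range(m):
        low, top = mr_row(xd, yd[j], ud if j == 0 else [0] * n, vd[j], k1, k2)
        row_value = sum(t << (k1 * i) for i, t in enumerate(low)) + (top << (k1 * n))
        total += row_value << (k2 * j)      # z^(j) has weight B2**j
    return total
```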

The circuit defined by the preceding VHDL model is a two-dimensional array similar to that of Fig. 8.3, but with more complex connections. As an example, with k₁ = 4 and k₂ = 2, the connections corresponding to cell (j, i) are shown in Fig. 8.9. As before, a stored-carry encoding circuit could also be designed, but with an even more complex connection pattern. It is left as an exercise.

8.3 Sequential Multipliers

8.3.1 Shift and Add Multiplier

In order to synthesize sequential multipliers, the basic algorithm of Sect. 8.1 can be modified. For that, Eqs. (8.4) are replaced by the following:

$$ \begin{aligned} z^{\left( 0 \right)} &= \, \left( {u + x\, \cdot \,y_{0} + v_{0} } \right)/B, \\ z^{\left( 1\right)} &= \, \left( {z^{(0)} + x\, \cdot \,y_{ 1} + v_{ 1} } \right)/B, \\ z^{\left( 2\right)} &= \, \left( {z^{(1)} + x\, \cdot \,y_{ 2} + v_{ 2} } \right)/B, \\ &\qquad\qquad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {z^{(m - 2)} + x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)/B. \\ \end{aligned} $$

(8.12)

Multiply the first equation by B, the second by B², and so on, and add the equations so obtained. The intermediate terms z^(0), z^(1), …, z^(m−2) appear on both sides and cancel out. The result is

$$ z^{{\left( {m - 1} \right)}} B^{m} = u + x\, \cdot \,y_{0} + v_{0} + \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B + \, \ldots \, + \, \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} = xy + u + v. $$

Algorithm 8.2: Shift and add multiplication

A data path for executing Algorithm 8.2 is shown in Fig. 8.10a. The following VHDL model describes the circuit of Fig. 8.10a (B = 2).

The complete circuit also includes an m-state counter and a control unit. A complete generic model shift_and_add_multiplier.vhd is available at the Authors’ web page.

If v = 0, the same shift register can be used for storing both y and the least significant bits of z. The modified circuit is shown in Fig. 8.10b. A complete generic model shift_and_add_multiplier2 is also available.
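The listing of Algorithm 8.2 is not reproduced here. The Python sketch below is a cycle-by-cycle model of a data path like that of Fig. 8.10b, assuming B = 2 and v = 0. An n-bit register holds z^(j). A single m-bit shift register initially holds y and progressively receives the least significant bits of z. The function name is hypothetical.

```python
def shift_and_add_shared_register(x, y, u, n, m):
    """Binary shift and add (Algorithm 8.2 with v = 0); returns z = x*y + u (n+m bits)."""
    assert 0 <= x < 2**n and 0 <= u < 2**n and 0 <= y < 2**m
    acc, sr = u, y                        # acc: n-bit register, sr: m-bit shift register
    for _ in range(m):                    # one clock cycle per bit of y (m-state counter)
        t = acc + x * (sr & 1)            # (n+1)-bit adder output, at most 2**(n+1) - 2
        acc = t >> 1                      # divide by B = 2: upper n bits go back to acc
        sr = (sr >> 1) | ((t & 1) << (m - 1))   # y_j leaves at the LSB, z_j enters at the MSB
    return (acc << m) | sr                # after m cycles sr holds z_{m-1} ... z_0
```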

The computation time of the circuits of Fig. 8.10 is approximately equal to

$$ T_{multiplier} \left( {n,m} \right) \, = m\, \cdot \,T_{multiplier} \left( {n, 1} \right) \, = m\, \cdot \,n\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$

(8.13)

8.3.2 Shift and Add Multiplier with CSA

The shift and add algorithm can also be executed with stored-carry encoding. After m steps the result is obtained under the form

$$ s_{n - 1} \,B^{n + m - 1} + \, \left( {c_{n - 2} + s_{n - 2} } \right)B^{n + m - 2} + \, \ldots \, + \, \left( {c_{0} + s_{0} } \right)B^{m} + z_{m - 1} \,B^{m - 1} + \, \ldots \, + z_{ 1} B + z_{0} , $$

and an additional n-digit adder computes

$$ s_{n - 1} \,B^{n - 1} + \, \left( {c_{n - 2} + s_{n - 2} } \right)B^{n - 2} + \ldots + \, \left( {c_{0} + s_{0} } \right) \, = z_{m + n - 1} \,B^{n - 1} + \, \ldots \, + z_{m + 1} \,B + z_{m} . $$

The corresponding data path is shown in Fig. 8.11. The carry-save adder computes

$$ y_{ 1} + y_{ 2} + y_{ 3} = s + c, $$

where y₁, y₂ and y₃ are n-bit numbers, and s and c are (n + 1)-bit numbers. At the end of step i, the least significant bit of s is z_i, and the n most significant bits of s and c are transmitted to the next step:

The complete circuit also includes an m-state counter and a control unit. A complete generic model sequential_CSA_multiplier.vhd is available at the Authors’ web page. The minimum clock period is equal to the delay of a 1-bit by 1-bit multiplier. Thus, the total computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = m\, \cdot \,T_{multiplier} \left( { 1, 1} \right) \, + T_{adder} \left( n \right) \, \le \, \left( {n + m} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$

(8.14)
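The Python sketch below models the stored-carry shift and add scheme behaviourally. It assumes binary operands, v = 0, and a sum register initialised with u (this initialisation is an assumption of the sketch). Each loop iteration models one clock cycle:

- A 3-to-2 counter adds x · y_i to the pair (s, c) without carry propagation.
- The least significant sum bit is z_i.
- The sum vector is shifted one position. The carry vector, whose weight is already 2, is kept as is.

A single carry-propagate addition after the loop matches the T_adder(n) term of (8.14). The function name is hypothetical.

```python
def sequential_csa_multiply(x, y, u, n, m):
    """Shift and add with stored-carry encoding; returns z = x*y + u (n+m bits)."""
    s, c = u, 0                               # stored-carry pair, value s + c
    z_low = 0
    for i in range(m):                        # one clock cycle per bit of y
        a = x if (y >> i) & 1 else 0          # partial product x * y_i
        total_sum = s ^ c ^ a                 # 3-to-2 counter, sum bits
        carry = (s & c) | (s & a) | (c & a)   # carry bits, weight 2
        z_low |= (total_sum & 1) << i         # LSB of the sum is z_i
        s, c = total_sum >> 1, carry          # new s + c = (old s + c + a - z_i) / 2
    high = s + c                              # final n-bit carry-propagate adder
    return (high << m) | z_low
```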

Comment 8.2

In sequential_CSA_multiplier.vhd the done flag is raised as soon as the final values of the adder inputs are available. A more accurate control unit should raise the flag k cycles later, where k · T_clk is an upper bound on the n-bit adder delay. The value of k could be defined as a generic parameter (Exercise 8.3).

8.4 Integers

Given four B’s complement integers

$$ x = x_{n} x_{n - 1} x_{n - 2} \ldots x_{0} ,\,\,\,y = \, y_{m} y_{m - 1} y_{m - 2} \ldots y_{0} ,\,\,u = u_{n} u_{n - 1} \,\,u_{n - 2} \ldots u_{0} ,\,\,\,\,v = v_{m} v_{m - 1} v_{m - 2} \, \ldots \,v_{0} , $$

belonging to the ranges

$$ - B^{n} \le \, x < B^{n} ,-B^{m} \le \, y < B^{m} , - B^{n} \le \, u < B^{n} ,-B^{m} \le \, v < B^{m} , $$

the sum z = x · y + u + v belongs to the interval

$$ - B^{n + m + 1} \le \, z < B^{n + m + 1} . $$

Thus, z is a B’s complement number of the form

$$ z = z_{n + m + 1} \,z_{n + m} \,\,z_{n + m - 1} \, \ldots \,z_{ 1} z_{0} . $$

8.4.1 Mod 2B^n+m Multiplication

The integer represented by a vector $ x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{ 1} \,x_{0} $ is

$$ x = - x_{n} B^{n} + x_{n - 1} \,B^{n - 1} + x_{n - 2} \,B^{n - 2} \, + \ldots + x_{ 1} B + x_{0} , $$

while the natural number natural(x) represented by the same vector is

$$ natural\,\left( x \right) \, = x_{n} B^{n} + x_{n - 1} \,B^{n - 1} + x_{n - 2} \,B^{n - 2} + \ldots + x_{ 1} B + x_{0} . $$

As x_n ∈ {0, 1}, either natural(x) = x or natural(x) = x + 2Bⁿ; moreover, 0 ≤ natural(x) < 2Bⁿ. So,

$$ natural\left( x \right) = x\,\bmod \,2B^{n} . $$

The following method can be used to compute z = x · y + u + v. First, represent the operands x, y, u and v with the same number of digits (n + m + 2) as the result z (digit extension, Sect. 7.8). Then, compute z = x · y + u + v as if x, y, u and v were naturals:

$$ z = natural\left( x \right)\, \cdot \,natural\left( y \right) \, + natural\left( u \right) \, + natural\left( v \right) \, = natural\left( {x\, \cdot \,y + u + v} \right). $$

Finally, reduce z modulo $ 2B^{n + m + 1} . $ Assume that before the mod $ 2B^{n + m + 1} $ reduction

$$ z = \, \ldots \, + z_{n + m + 1} \,B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{ 1} B + z_{0} ; $$

then

$$ z\,\bmod \,2B^{n + m + 1} = \left( {\left( { \ldots + z_{n + m + 2} \,B + z_{n + m + 1} } \right)\bmod 2} \right)B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{1} B + z_{0} . $$

In particular, if B is even,

$$ z\, \bmod \, 2B^{n + m + 1} = \left( {z_{n + m + 1} \, \bmod \, 2} \right)B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{ 1} B + z_{0} . $$

Example 8.2

Assume that B = 10, n = 4, m = 3, x = 7918, y = −541, u = −7017, v = 742, and compute z = 7918·(−541) + (−7017) + 742. In 10’s complement: x = 07918, y = 1459, u = 12983, v = 0742.

1. Express all operands with 9 digits: x = 000007918, y = 199999459, u = 199992983, v = 000000742.
2. Compute x · y + u + v: 000007918 · 199999459 + 199992983 + 000000742 = 1583795710087.
3. Reduce 1583795710087 modulo 2·10⁸: (1583795710087) mod 2·10⁸ = (7 mod 2)·10⁸ + 95710087 = 195710087.

The result 195710087 is the 10’s complement representation of −4289913.
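The three steps of Example 8.2 can be reproduced with the following Python sketch. It assumes a B’s complement vector with sign digit in {0, 1} is handled through its natural value, natural(x) = x mod 2Bⁿ, as established above. The function names are hypothetical.

```python
def to_complement(value, digits, base):
    """natural(value) for a B's complement vector of `digits` digits (sign digit in {0, 1})."""
    half = base ** (digits - 1)
    assert -half <= value < half
    return value % (2 * half)


def from_complement(nat, digits, base):
    """Inverse mapping: the leading digit (0 or 1) has weight -B**(digits-1)."""
    half = base ** (digits - 1)
    return nat - 2 * half if nat >= half else nat


def mod_multiply(x, y, u, v, n, m, base):
    """z = x*y + u + v computed on naturals, then reduced mod 2*B**(n+m+1)."""
    width = n + m + 2                                  # digits of the result z
    # 1. digit extension: every operand represented with n+m+2 digits
    X, Y, U, V = (to_complement(t, width, base) for t in (x, y, u, v))
    # 2. multiply and add as if the operands were naturals
    z = X * Y + U + V
    # 3. reduce modulo 2*B**(n+m+1)
    return z % (2 * base ** (width - 1))
```

With the data of Example 8.2, `mod_multiply(7918, -541, -7017, 742, 4, 3, 10)` returns 195710087, and `from_complement(195710087, 9, 10)` returns −4289913.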

Thus, any multiplier for natural numbers can be used. As an example, an (n + m + 2)-digit by (n + m + 2)-digit carry-save multiplier could be used (Fig. 8.4). As the result is reduced modulo $ 2B^{n + m + 1} $, only the rightmost part of the circuit is used (if B is even), so that there is no output adder, and the most significant digit is reduced mod 2. An example with n = 3 and m = 2 is shown in Fig. 8.12. The corresponding computation time is equal to

$$ \left( {n + m + 2} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$

(8.15)

This delay is practically the same as that of a carry-save combinational multiplier (8.7). Nevertheless, the number of 1-digit by 1-digit multiplication cells is equal to $ 1 + 2 + 3 + \ldots + \left( {n + m + 2} \right) = \left( {n + m + 2} \right)\left( {n + m + 3} \right)/2 $ instead of n · m.

A very simple way to generate a VHDL model consists of defining (n + m + 2)-bit representations of all operands and instantiating an (n + m + 2)-bit by (n + m + 2)-bit carry-save multiplier:

Only n + m + 2 output bits of the carry-save multiplier are connected to output ports, and the synthesis program will prune the circuit accordingly.

A complete generic model integer_CSA_multiplier.vhd is available at the Authors’ web page.

To conclude, this approach is conceptually attractive because any type of multiplier for natural numbers can be used. Nevertheless, the cost of the corresponding circuits is very high.

8.4.2 Modified Shift and Add Algorithm

Consider again four B’s complement integers

$$ x = x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,y \, = \, y_{m} y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,\,\,u = u_{n} \,u_{n - 1} \,\,u_{n - 2} \, \ldots \,u_{0} ,\,\,v = v_{m} \,v_{m - 1} \,v_{m - 2} \, \ldots \,v_{0} . $$

A set of equations similar to (8.12) can be defined:

$$ \begin{aligned} z^{\left( 0 \right)} &= \left( {u + x\, \cdot \,y_{0} + v_{0} } \right)/B, \\ z^{\left( 1\right)} &= \left( {z^{(0)} + x\, \cdot \,y_{ 1} + v_{ 1} } \right)/B, \\ z^{\left( 2\right)} &= \left( {z^{(1)} + x\, \cdot \,y_{ 2} + v_{ 2} } \right)/B, \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \left( {z^{(m - 2)} + x\cdot\,y_{m - 1} + v_{m - 1} } \right)/B, \\ z^{\left( m \right)} &= \left( {z^{(m - 1)} - x\cdot\,y_{m} - v_{m} } \right)/B. \\ \end{aligned} $$

(8.16)

Multiply the first equation by B, the second by B², and so on, and add the m + 1 equations so obtained. The result is

$$ z^{\left( m \right)} B^{m + 1} = u + x\, \cdot \,y_{0} + v_{0} + \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B + \, \ldots \, + \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} - \left( {x\, \cdot \,y_{m} + v_{m} } \right)B^{m} = xy + u + v. $$

Algorithm 8.3: Modified shift and add multiplication

In what follows it is assumed that v_m = 0, that is to say v ≥ 0. So, in order to implement Algorithm 8.3, the two following computation primitives must be defined:

$$ z = u + x\, \cdot \,b + d $$

(8.17)

and

$$ z = u - x\, \cdot \,b, $$

(8.18)

where

$$ - B^{n} \le x < B^{n} , - B^{n} \le u < B^{n} , \, 0 \, \le b < B, \, 0 \, \le d < B. $$

Thus, in the first case,

$$ - B^{n + 1} \le z < B^{n + 1} , $$

and in the second case

$$ - B^{n + 1} + \, \left( {B - 1} \right) \, \le z < B^{n + 1} , $$

so that in both cases z is an (n + 2)-digit B’s complement integer and natural(z) = z mod 2Bⁿ⁺¹.

The first primitive (8.17) is implemented by the circuit of Fig. 8.13 and the second (8.18) by the circuit of Fig. 8.14. In both circuits, z_{n+1} is computed modulo 2.
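The listing of Algorithm 8.3 is not reproduced here. The Python sketch below follows Eqs. (8.16) with v_m = 0:

- Each of the first m steps applies primitive (8.17).
- The last step applies primitive (8.18).
- Every step divides by B, and the remainder becomes the next output digit z_j.

For readability the sketch works on signed Python integers, so `divmod` (floor division) plays the role of the B’s complement shift. The circuits of Figs. 8.13 and 8.14 instead keep natural(z) = z mod 2Bⁿ⁺¹. The function name is hypothetical.

```python
def modified_shift_and_add(x, y, u, v, n, m, base):
    """Algorithm 8.3 sketch: x, u have n+1 digits, y has m+1 digits (B's complement), 0 <= v < B**m."""
    assert -base**n <= x < base**n and -base**n <= u < base**n
    assert -base**m <= y < base**m and 0 <= v < base**m
    y_nat = y % (2 * base**m)                            # natural(y): digits y_m ... y_0
    y_digits = [(y_nat // base**j) % base for j in range(m)] + [y_nat // base**m]
    v_digits = [(v // base**j) % base for j in range(m)]  # v_m = 0
    acc, low = u, 0
    for j in range(m + 1):
        if j < m:
            t = acc + x * y_digits[j] + v_digits[j]      # primitive (8.17)
        else:
            t = acc - x * y_digits[m]                    # primitive (8.18), y_m in {0, 1}
        acc, digit = divmod(t, base)                     # divide by B; digit is z_j
        low += digit * base**j
    return acc * base**(m + 1) + low                     # z = x*y + u + v
```

For the data of Example 8.2, `modified_shift_and_add(7918, -541, -7017, 742, 4, 3, 10)` also yields −4289913. However, it uses (n + 1)-digit rather than (n + m + 2)-digit operands.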

As an example, the combinational circuit of Fig. 8.15 implements Algorithm 8.3 (with n = m = 2). Its cost and computation time are practically the same as in the case of a ripple-carry multiplier for natural numbers. It can be described by the following VHDL model.

A complete generic model modified_parallel_multiplier.vhd is available at the Authors’ web page.

The design of a sequential multiplier based on Algorithm 8.3 is left as an exercise.

8.4.3 Post Correction Multiplication

Given four B’s complement integers

$$ x = \, x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,y \, = \, y_{m} \,y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,u \, = \, u_{n} u_{n - 1} \,u_{n - 2} \, \ldots \,u_{0} ,v \, = \, v_{m} v_{m - 1} \,v_{m - 2} \, \ldots \,v_{0} , $$

the sum z = x · y + u + v, which belongs to the interval $ - B^{n + m + 1} \le \, z < B^{n + m + 1} , $ can be expressed in the form

$$ z = \left( {X_{0} \, \cdot \,Y_{0} + U_{0} + V_{0} } \right) \, + x_{n} \, \cdot \,y_{m} \, \cdot \,B^{n + m} - \, \left( {x_{n} \, \cdot \,Y_{0} + u_{n} } \right)\, \cdot \,B^{n} - \, \left( {y_{m} \, \cdot \,X_{0} + v_{m} } \right)\, \cdot \,B^{m} , $$

where X ₀, Y ₀, U ₀ and V ₀ are four naturals

$$ X_{0} = x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,\;Y_{0} = y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,\;U_{0} = u_{n - 1} \,u_{n - 2} \, \ldots \,u_{0} ,\;V_{0} = v_{m - 1} \,v_{m - 2} \, \ldots \,v_{0} $$

deduced from x, y, u and v by eliminating the sign bits. Thus, the computation of z amounts to the computation of

$$ Z_{0} = X_{0} \, \cdot \,Y_{0} + U_{0} + V_{0} , $$

that can be executed by any type of multiplier for naturals, plus a post correction that consists of several additions and left shifts.

If B = 2 and u = v = 0, then

$$ z = x\, \cdot \,y = X_{0} \, \cdot \,Y_{0} + x_{n} \, \cdot \,y_{m} \, \cdot \, 2^{n + m} -x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} . $$

The (n + m + 2)-bit 2’s complement representations of $ - x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} $ and $ - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} $ are

$$ (2^{m + 1} + 2^{m} + \overline{{(x_{n} \, \cdot \,y_{m - 1} )}} \, \cdot \,2^{m - 1} + \ldots + \overline{{(x_{n} \, \cdot \,y_{0} )}} \, \cdot \,2^{0} + 1)\, \cdot \,2^{n} \;\bmod \,2^{n + m + 2} , $$

and

$$ (2^{n + 1} + 2^{n} + \overline{{(y_{m} \, \cdot \,x_{n - 1} )}} \, \cdot \,2^{n - 1} + \, \ldots \, + \overline{{(y_{m} \, \cdot \,x_{0} )}} \, \cdot \,2^{0} + 1)\, \cdot \,2^{m} \;\bmod \,2^{n + m + 2} , $$

so that the representation of $ x_{n} \, \cdot \,y_{m} \, \cdot \, 2^{n + m} -x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} $ is

$$ \begin{gathered} (2^{n + m + 1} + x_{n} \, \cdot \,y_{m} \, \cdot \,2^{n + m} + \overline{{(x_{n} \, \cdot \,y_{m - 1} )}} \, \cdot \,2^{n + m - 1} + \ldots + \overline{{(x_{n} \, \cdot \,y_{0} )}} \, \cdot \,2^{n} \hfill \\ + 2^{n} + \overline{{(y_{m} \, \cdot \,x_{n - 1} )}} \, \cdot \,2^{n + m - 1} + \ldots + \overline{{(y_{m} \, \cdot \,x_{0} )}} \, \cdot \,2^{m} + 2^{m} )\;\bmod \,2^{n + m + 2} . \hfill \\ \end{gathered} $$
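The last expression can be checked numerically. The Python sketch below is hypothetical and illustrative only. It takes x and y as (n + 1)-bit and (m + 1)-bit 2’s complement bit vectors and computes X₀ · Y₀ with ordinary natural arithmetic. It then adds the correction word of the last equation and keeps n + m + 2 bits.

```python
def post_correction_multiply(x, y, n, m):
    """x: (n+1)-bit and y: (m+1)-bit 2's complement vectors; B = 2, u = v = 0."""
    width = n + m + 2
    xn, X0 = (x >> n) & 1, x & ((1 << n) - 1)        # sign bit and natural part of x
    ym, Y0 = (y >> m) & 1, y & ((1 << m) - 1)        # sign bit and natural part of y
    # constant terms of the correction word
    corr = (1 << (n + m + 1)) + ((xn & ym) << (n + m)) + (1 << n) + (1 << m)
    for i in range(m):
        corr += (1 - (xn & (Y0 >> i) & 1)) << (n + i)   # NOT(x_n . y_i), weight 2^(n+i)
    for i in range(n):
        corr += (1 - (ym & (X0 >> i) & 1)) << (m + i)   # NOT(y_m . x_i), weight 2^(m+i)
    return (X0 * Y0 + corr) & ((1 << width) - 1)        # (n+m+2)-bit 2's complement x*y
```

With n = 3 and m = 2 (the case of Fig. 8.16), the result agrees with the 7-bit 2’s complement representation of x · y for all operand pairs.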

A simple modification of the combinational multipliers of Figs. 8.3 and 8.4 allows computing x · y, where x is an (n + 1)-bit 2’s complement integer and y an (m + 1)-bit 2’s complement integer. An example is shown in Fig. 8.16 (n = 3, m = 2). The NAND multiplication cells are similar to that of Fig. 8.1b, except that the AND gate is replaced by a NAND gate [1].

The following VHDL model describes the circuit of Fig. 8.16.

A complete generic model postcorrection_multiplier.vhd is available at the Authors’ web page.

8.4.4 Booth Multiplier

Given an (m + 1)-bit 2’s complement integer $ y = - y_{m} \, \cdot \, 2^{m} + y_{m - 1} \, \cdot \, 2^{m - 1} + \, \ldots \, + y_{ 1} \, \cdot \, 2 { } + y_{0} , $ define

$$ y_{0}^{\prime} = - y_{0} \,{\text{and}}\,y_{j}^{\prime} = - y_{j} + y_{j - 1} ,\,\forall j\,{\text{in }}\left\{ { 1, 2, \ldots ,m} \right\}, $$

so that all coefficients y_j′ belong to {−1, 0, 1}. Then y can be represented in the form

$$ y = y_{m}^{\prime} \, \cdot \, 2^{m} + y_{m - 1}^{\prime} \, \cdot \, 2^{m - 1} + \ldots + y_{1}^{\prime} \, \cdot \, 2 { } + y_{0}^{\prime} , $$

the so-called Booth encoding of y (Booth [2]). Unlike the 2’s complement representation, in which y_m has a specific function, all coefficients y_j′ have the same function. Formally, the Booth representation of an integer has the same form as the binary representation of a natural, except that its digits belong to {−1, 0, 1}. The basic multiplication algorithm (Algorithm 8.1), with v = 0, can be used.

Algorithm 8.4: Booth multiplication, z = x·y + u

The following VHDL model describes a combinational circuit based on Algorithm 8.4. A complete generic model Booth1_multiplier.vhd is available at the Authors’ web page.
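The listing of Algorithm 8.4 is not reproduced here. As an illustration only, the Python sketch below derives the Booth digits y′_j from the bit vector of y. It then accumulates z = u + Σ y′_j · x · 2^j, so each step adds −x, 0 or x. The function names are hypothetical, and x and u are plain signed integers.

```python
def booth_digits(y, m):
    """Booth digits of the (m+1)-bit 2's complement vector y: y'_0 = -y_0, y'_j = -y_j + y_{j-1}."""
    def bit(j):
        return (y >> j) & 1
    return [-bit(0)] + [-bit(j) + bit(j - 1) for j in range(1, m + 1)]


def booth_multiply(x, y, u, m):
    """z = x*y + u, where y is given as an (m+1)-bit 2's complement vector."""
    z = u
    for j, d in enumerate(booth_digits(y, m)):   # d in {-1, 0, 1}
        z += d * (x << j)                        # add -x, 0 or +x with weight 2^j
    return z
```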

Higher radix Booth multipliers can be defined. Given an (m + 1)-bit 2’s complement integer $ y = - y_{m} \, \cdot \, 2^{m} + y_{m - 1} \, \cdot \, 2^{m - 1} + \ldots + y_{ 1} \, \cdot \, 2 { } + y_{0} , $ where m is odd, define

$$ y_{0}^{\prime} = - 2\, \cdot \,y_{1} + y_{0} ,y_{i}^{\prime} = - 2\, \cdot \,y_{2\, \cdot \,i + 1} + y_{2\, \cdot \,i} + y_{2\cdot\,i - 1} ,\,\forall i\,{\text{in}}\left\{ {1,2, \ldots ,\left( {m - 1} \right)/2} \right\}, $$

so that all coefficients y_i′ belong to {−2, −1, 0, 1, 2}. Then y can be represented in the form

$$ y = y_{{\left( {m - 1} \right)/ 2}}^{\prime} \, \cdot \, 4^{{\left( {m - 1} \right)/ 2}} + y_{{\left( {m - 1} \right)/ 2- 1}}^{\prime} \, \cdot \, 4^{{\left( {m - 1} \right)/ 2- 1}} + \, \ldots \, + y_{1}^{\prime} \, \cdot \, 4 { } + y_{0}^{\prime} , $$

the so-called Booth-2 encoding of y.

Example 8.3

Consider the case where m = 9 and thus (m−1)/2 = 4. The 2’s complement representation of −137 is 1101110111. The corresponding Booth-2 encoding is −1 2 −1 2 −1 and, indeed, −4⁴ + 2 · 4³ − 4² +2 · 4 − 1 = −137. The basic radix-4 multiplication algorithm, with v = 0, can be used.

Algorithm 8.5: Radix-4 Booth multiplication, z = x · y + u
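Similarly, the Python sketch below illustrates the radix-4 recoding and Algorithm 8.5 (the listing itself is not reproduced, and the names are hypothetical). Each step adds −2x, −x, 0, x or 2x with weight 4^i.

```python
def booth2_digits(y, m):
    """Radix-4 Booth digits of the (m+1)-bit 2's complement vector y (m odd), least significant first."""
    assert m % 2 == 1
    def bit(j):
        return (y >> j) & 1 if j >= 0 else 0     # y_{-1} = 0
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range((m - 1) // 2 + 1)]


def booth2_multiply(x, y, u, m):
    """z = x*y + u using one digit in {-2, ..., 2} per step."""
    z = u
    for i, d in enumerate(booth2_digits(y, m)):
        z += (d * x) << (2 * i)                  # second operand: -2x, -x, 0, x or 2x
    return z
```

For the vector 1101110111 of Example 8.3, `booth2_digits(0b1101110111, 9)` returns `[-1, 2, -1, 2, -1]`, the Booth-2 encoding of −137.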

A sequential implementation is shown in Fig. 8.17. It includes three components:

- a shift register whose content is shifted two positions at each step;
- a parallel register;
- an adder whose second operand is −2x, −x, 0, x or 2x, depending on the three least significant bits $ (y_{2i+1}, y_{2i}, y_{2i-1}) $ of the shift register.

At each step, two output bits are generated. Hence, the total computation time is equal to $ ((m + 1)/2)\cdot T_{clk} $, where T_clk must be greater than the computation time of an (n + 3)-bit adder. Thus,

$$ T(n,m) \cong \frac{m + 1}{2}\, \cdot \,T_{adder} (n + 3). $$

With respect to a radix-2 shift and add multiplier (Sect. 8.3.1), the computation time has been roughly divided by 2.

The following VHDL model describes the circuit of Fig. 8.17.

The complete circuit also includes an (m + 1)/2-state counter and a control unit. A complete generic model Booth2_sequential_multiplier.vhd is available at the Authors’ web page.

8.5 Constant Multipliers

Given an n-bit constant natural c and an m-bit natural y, the computation of c · y can be performed with any n-bit by m-bit multiplier whose first operand is connected to the constant value c. Then, the synthesis tool will eliminate useless components. In the case of FPGA implementations, an alternative method is to store the constant c within the LUTs.

Assume that the technology at hand includes k-input LUTs. The basic component is a circuit that computes w = c · b, where b is a k-bit natural. The maximum value of w is

$$ \left( { 2^{n} - 1} \right)\left( { 2^{k} - 1} \right) \, = { 2}^{n + k} - { 2}^{k} - { 2}^{n} + 1, $$

so w is an (n + k)-bit number. The circuit is shown in Fig. 8.18, with k = 6. It is made up of n + 6 LUT-6s. Each LUT computes one bit of the product, that is, it is programmed in such a way that

$$ w_{i} \left( b \right) = \left[ {c\, \cdot \,b} \right]_{i} ,\quad i = 0, 1, \ldots , n + 5, $$

where [c · b]_i denotes bit i of c · b.

Its computation time is equal to T_LUT6.

The following VHDL model describes the circuit of Fig. 8.18.

The function LUT_definition defines the LUT contents.
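The Python sketch below illustrates what the LUT contents must be. It is a hypothetical counterpart of LUT_definition, not the authors’ function. It builds one 2^k-entry truth table per product bit and evaluates w = c · b by addressing all tables with the same k-bit input b.

```python
def lut_contents(c, n, k):
    """One 2**k-entry truth table per output bit w_i of w = c * b (n + k tables)."""
    return [[((c * b) >> i) & 1 for b in range(2 ** k)] for i in range(n + k)]


def lut_constant_multiply(luts, b):
    """All LUTs are addressed by the same k-bit input b; LUT i delivers bit i of c*b."""
    return sum(table[b] << i for i, table in enumerate(luts))
```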

The circuit of Fig. 8.18 can be used as a component for generating constant multipliers. As an example, a sequential n-bit by m-bit constant multiplier is synthesized. First define a component similar to that of Fig. 8.2, with x constant. It computes z = c · b + u, where c is an n-bit constant natural, b a k-bit natural, and u an n-bit natural. The maximum value of z is

$$ \left( { 2^{n} - 1} \right)\left( { 2^{k} - 1} \right) \, + { 2}^{n} - 1 { } = { 2}^{n + k} - { 2}^{k} , $$

so it is an (n + k)-bit number. It consists of a k-bit by n-bit multiplier (Fig. 8.18) and an (n + k)-bit adder (Fig. 8.19).

Finally, the circuit of Fig. 8.19 can be used to generate a radix-2^k shift and add multiplier that computes z = c · y + u, where c is an n-bit constant natural, y an m-bit natural, and u an n-bit natural. The maximum value of z is

$$ \left( { 2^{n} - 1} \right)\left( { 2^{m} - 1} \right) \, + { 2}^{n} - 1 { } = { 2}^{n + m} - { 2}^{m} , $$

so z is an (n + m)-bit number. Assume that m is a multiple of k and that the radix-2^k representation of y is Y_{m/k−1} Y_{m/k−2} … Y_0, where each Y_i is a k-bit number. The circuit implements the following set of equations:

$$ \begin{aligned} z^{(0)} &= \left( u + c \cdot Y_{0} \right) / 2^{k}, \\ z^{(1)} &= \left( z^{(0)} + c \cdot Y_{1} \right) / 2^{k}, \\ z^{(2)} &= \left( z^{(1)} + c \cdot Y_{2} \right) / 2^{k}, \\ &\qquad\ldots \\ z^{(m/k - 1)} &= \left( z^{(m/k - 2)} + c \cdot Y_{m/k - 1} \right) / 2^{k}. \end{aligned} \tag{8.19} $$

Thus,

$$ z^{{\left( {m/k - 1} \right)}} \, \cdot \,\left( { 2^{k} } \right)^{m/k} = u + c\, \cdot \,Y_{0} + c\, \cdot \,Y_{ 1} \, \cdot \, 2^{k} + \ldots + c\, \cdot \,Y_{m/k - 1} \, \cdot \,\left( { 2^{k} } \right)^{m/k - 1} , $$

that is to say

$$ z^{{\left( {m/k - 1} \right)}} \, \cdot \, 2^{m} = c\, \cdot \,y + u. $$

The circuit is shown in Fig. 8.20.
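The recurrence (8.19) uses exact division by 2^k, so the z^(i) are fractional. A register-level reading keeps the integer part of z^(i) in an n-bit accumulator and stores the k bits shifted out at each step as the next k less significant bits of the result. The following Python sketch is a behavioral model under that assumption. The name `sequential_constant_mac` is hypothetical, and the product c · Y_i, produced by the LUTs of Fig. 8.18 in hardware, is simply computed with `*`.

```python
def sequential_constant_mac(c: int, y: int, u: int, n: int, m: int, k: int = 6) -> int:
    """Compute c * y + u with the radix-2**k recurrence (8.19).

    Each loop iteration models one step: a constant multiplication c * Y_i
    and an (n + k)-bit addition (the component of Fig. 8.19). The exact
    division by 2**k is modeled by keeping the integer part in `acc`
    (n bits) and collecting the k shifted-out bits in `low`.
    """
    if m % k:
        raise ValueError("m must be a multiple of k")
    if not (0 <= c < 2 ** n and 0 <= y < 2 ** m and 0 <= u < 2 ** n):
        raise ValueError("operands out of range")
    mask = (1 << k) - 1
    acc, low = u, 0
    for i in range(m // k):
        y_i = (y >> (k * i)) & mask      # radix-2**k digit Y_i
        s = acc + c * y_i                # at most 2**(n+k) - 2**k
        assert s < 2 ** (n + k)          # the (n + k)-bit adder never overflows
        low |= (s & mask) << (k * i)     # k less significant bits leave acc
        acc = s >> k                     # integer part of z^(i), below 2**n
    return (acc << m) | low              # z^(m/k - 1) * 2**m = c * y + u
```

For example, `sequential_constant_mac(45, 100, 17, n=6, m=12)` performs two steps (digits 36 and 1) and returns 4517 = 45 · 100 + 17. The integer accumulator never exceeds n bits, which is consistent with the (n + k)-bit bound on c · b + u derived above.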

The computation time is approximately equal to

$$ T \cong \left( m/k \right) \cdot \left( T_{LUT-k} + T_{adder}(n + k) \right). $$
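As an illustration with arbitrary values m = 36 and k = 6, the recurrence (8.19) has m/k = 6 steps, so

$$ T \cong 6 \cdot \left( T_{LUT-6} + T_{adder}(n + 6) \right), $$

whereas a radix-2 shift and add multiplier (Sect. 8.3.1) needs m = 36 steps, each including an adder delay. The actual gain depends on the relative values of T_LUT-6 and T_adder on the target device.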

The following VHDL model describes the circuit of Fig. 8.20 (k = 6).

A complete model sequential_constant_multiplier.vhd is available at the Authors’ web page.

The synthesis of constant multipliers for integers is left as an exercise.

8.6 FPGA Implementations

Several multipliers have been implemented within a Virtex-5 device (speed grade -2). These devices include Digital Signal Processing (DSP) slices that efficiently perform 25-bit by 18-bit multiplications, additions and accumulations. Apart from multiplier implementations based on LUTs and FFs, more efficient implementations that take advantage of the DSP slices are also reported. As before, times are expressed in ns and costs in numbers of Look-Up Tables (LUTs), flip-flops (FFs) and DSP slices. All VHDL models, as well as several test benches, are available at the Authors' web page.

8.6.1 Combinational Multipliers

The circuit of Fig. 8.3 has been synthesized for several values of n and m; the results are given in Table 8.1.

Table 8.1 Combinational multiplier

A faster implementation is obtained by using the carry-save method (Fig. 8.4; Table 8.2).

Table 8.2 Carry-save combinational multiplier

Multipliers based on the cell of Fig. 8.8b yield more efficient circuits; this is the default option of the synthesizer (Table 8.3).

Table 8.3 Optimized combinational multiplier

Finally, if DSP slices are used, better implementations are obtained (Table 8.4).

Table 8.4 Combinational multiplier with DSP slices

8.6.2 Radix-2^k Parallel Multipliers

Several (m · k)-bit by (n · k)-bit multipliers (Sect. 8.2.4) have been implemented (Table 8.5).

Table 8.5 Radix-2^k parallel multipliers

A faster implementation is obtained by using the carry-save method (Table 8.6).

Table 8.6 Carry-save radix-2^k parallel multipliers

The same circuits have been implemented with DSP slices. The implementation results are given in Tables 8.7 and 8.8.

Table 8.7 Radix-2^k parallel multipliers with DSPs

Table 8.8 Carry-save radix-2^k parallel multipliers with DSPs

8.6.3 Sequential Multipliers

Several shift and add multipliers have been implemented. The implementation results are given in Tables 8.9 and 8.10. Both the clock period T_clk and the total delay (m · T_clk) are given.

Table 8.9 Shift and add multipliers

Table 8.10 Sequential carry-save multipliers

8.6.4 Combinational Multipliers for Integers

A carry-save multiplier for integers is shown in Fig. 8.12. The synthesis results for several values of n and m are given in Table 8.11.

Table 8.11 Carry-save mod 2^n+m+1 multipliers

Another option is the modified shift and add algorithm of Sect. 8.4.2 (Fig. 8.15; Table 8.12).

Table 8.12 Modified shift and add algorithm

Table 8.13 reports several multipliers with post correction (Sect. 8.4.3).

Table 8.13 Multipliers with post correction

As a last option, several Booth multipliers have been implemented (Table 8.14).

Table 8.14 Combinational Booth multipliers

8.6.5 Sequential Multipliers for Integers

Several radix-4 Booth multipliers (Fig. 8.17) have been implemented. Both the clock period T_clk and the total delay (m · T_clk) are given (Table 8.15).

Table 8.15 Sequential radix-4 Booth multipliers

8.7 Exercises

1. Generate the VHDL model of a mixed-radix parallel multiplier (Sect. 8.2.4).
2. Synthesize a 2n-bit by 2n-bit parallel multiplier using n-bit by n-bit multipliers as building blocks.
3. Modify the VHDL model sequential_CSA_multiplier.vhd so that the done flag is raised when the final result is available (Comment 8.2).
4. Generate the VHDL model of a carry-save multiplier with post correction (Sect. 8.4.3).
5. Synthesize a sequential multiplier based on Algorithm 8.3.
6. Synthesize a parallel constant multiplier (Sect. 8.5).
7. Generate models of constant multipliers for integers.
8. Synthesize a constant multiplier that computes z = c_1·y_1 + c_2·y_2 + … + c_s·y_s + u.

References

Baugh CR, Wooley BA (1973) A two's complement parallel array multiplication algorithm. IEEE Trans Comput C-22:1045–1047
Booth AD (1951) A signed binary multiplication technique. Q J Mech Appl Math 4:236–240
Dadda L (1965) Some schemes for parallel multipliers. Alta Frequenza 34:349–356
Wallace CS (1964) A suggestion for fast multipliers. IEEE Trans Electron Comput EC-13:14–17

Author information

Authors and Affiliations

University Rovira i Virgili, Tarragona, Spain
Jean-Pierre Deschamps & Enrique Cantó
School of Computer Engineering, Universidad Autonoma de Madrid, Ctra. de Colmenar Km. 15, 28049, Madrid, Spain
Gustavo D. Sutter

Corresponding author

Correspondence to Jean-Pierre Deschamps.

About this chapter

Cite this chapter

Deschamps, JP., Sutter, G.D., Cantó, E. (2012). Multipliers. In: Guide to FPGA Implementation of Arithmetic Functions. Lecture Notes in Electrical Engineering, vol 149. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2987-2_8

DOI: https://doi.org/10.1007/978-94-007-2987-2_8
Published: 03 April 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2986-5
Online ISBN: 978-94-007-2987-2
eBook Packages: Engineering, Engineering (R0)
