
Multiplication is a basic arithmetic operation whose execution is based on 1-digit by 1-digit multipliers and multi-operand adders. Most FPGA families include the basic components for implementing fast and cost-effective multipliers. Furthermore, they also include optimized fixed-size multipliers which, in turn, can be used for implementing larger-size multipliers.

The basic multiplication algorithm is described in Sect. 8.1. Several combinational implementations are proposed in Sect. 8.2. They correspond to different types of multi-operand adders: iterative ripple-carry adders, carry-save adders, multi-operand adders based on counters, and radix-\( 2^{k} \) and mixed-radix adders. Sequential implementations are proposed in Sect. 8.3. They use the shift and add method, implemented with either a ripple-carry adder or a carry-save adder. Several options for integer operands are proposed in Sect. 8.4. A first method consists of multiplying B’s complement integers as if they were naturals; the drawback of this conceptually simple method is that the operands must be represented, and multiplied, with as many digits as the final result. Better options are a modification of the shift and add algorithm, multiplication of naturals followed by a post-correction, and the Booth algorithms. Section 8.5 describes a LUT-based method for implementing constant multipliers, that is to say, circuits that compute c · y + u, where c is a constant.

8.1 Basic Algorithm

Consider two radix-B numbers

$$ x = x_{n - 1} \,\cdot\,B^{n - 1} + x_{n - 2} \,\cdot\,B^{n - 2} + \, \ldots \, + x_{ 1} \,\cdot\,B + x_{0} \,{\text{and}}\,y = y_{m - 1} \,\cdot\,B^{m - 1} + y_{m - 2} \,\cdot\,B^{m - 2} + \, \ldots \, \, + y_{ 1} \,\cdot\,B + y_{0} , $$

where \( x_{i} \) and \( y_{i} \) belong to {0, 1, …, B − 1}. An n-digit by m-digit multiplier generates a radix-B number

$$ z = z_{n + m - 1} \,\cdot\,B^{n + m - 1} + z_{n + m - 2} \,\cdot\,B^{n + m - 2} + \, \ldots \, + z_{ 1} \,\cdot\,B + z_{0} $$

such that

$$ z = x\, \cdot \,y. $$

A somewhat more general definition considers the addition of two additional numbers

$$ u = u_{n - 1} \,\cdot\,B^{n - 1} + u_{n - 2} \,\cdot\,B^{n - 2} + \, \ldots \, + u_{ 1} \,\cdot\,B + u_{0} \,{\text{and}}\,v = v_{m - 1} \,\cdot\,B^{m - 1} \, + v_{m - 2} \,\cdot\,B^{m - 2} + \, \ldots \, + v_{ 1} \,\cdot\,B + v_{0} , $$

so that

$$ z = x\,\cdot\,y + u + v. $$
(8.1)

Observe that the maximum value of z is

$$ \left( {B^{n} - 1} \right)\left( {B^{m} - 1} \right) \, + \, \left( {B^{n} - 1} \right) \, + \, \left( {B^{m} - 1} \right) \, = B^{n + m} - 1. $$

In order to compute (8.1), first define a 1-digit by 1-digit multiplier: given four B-ary digits a, b, c and d, it generates two B-ary digits e and f such that

$$ a\,\cdot\,b + c + d = e\,\cdot\,B + f $$
(8.2)

(Fig. 8.1a).

Fig. 8.1: 1-digit by 1-digit multiplier. a Symbol, b internal structure (B = 2)

If B = 2, it amounts to a 2-input AND gate and a 1-digit adder (Fig. 8.1b).

An n-digit by 1-digit multiplier made up of n 1-digit by 1-digit multipliers is shown in Fig. 8.2.

Fig. 8.2: n-digit by 1-digit multiplier

It computes

$$ z = x\,\cdot\,b + u + d $$
(8.3)

where x and u are n-digit numbers, b and d are 1-digit numbers, and z is an (n + 1)-digit number. Observe that the maximum value of z is

$$ \left( {B^{n} - 1} \right)\left( {B - 1} \right) \, + \, \left( {B^{n} - 1} \right) \, + \, \left( {B - 1} \right) \, = B^{n + 1} - 1. $$
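The chain of cells of Fig. 8.2 can be mimicked in software. The following Python sketch is a behavioural model only, not the circuit itself. It assumes that the output digit e of each cell feeds the d input of the next, more significant, cell; the function names `mul_1x1`, `mul_nx1`, `to_digits` and `from_digits` are illustrative choices, not names used by the authors.

```python
def mul_1x1(a, b, c, d, B=2):
    """1-digit by 1-digit multiplier, relation (8.2): a*b + c + d = e*B + f."""
    e, f = divmod(a * b + c + d, B)
    return e, f

def mul_nx1(x_digits, b, u_digits, d, B=2):
    """n-digit by 1-digit multiplier, relation (8.3).

    Digit lists are least significant digit first; the result has n + 1 digits.
    """
    z = []
    carry = d                      # assumption: d enters the rightmost cell
    for x_i, u_i in zip(x_digits, u_digits):
        carry, f = mul_1x1(x_i, b, u_i, carry, B)
        z.append(f)
    z.append(carry)                # most significant digit z_n
    return z

def to_digits(value, n, B):
    """Radix-B digits of a natural, least significant first."""
    return [(value // B**i) % B for i in range(n)]

def from_digits(digits, B):
    """Natural represented by a list of radix-B digits."""
    return sum(d * B**i for i, d in enumerate(digits))

# Largest possible operands in radix 10 with n = 4: the result is B**(n+1) - 1.
z = mul_nx1(to_digits(9999, 4, 10), 9, to_digits(9999, 4, 10), 9, 10)
print(from_digits(z, 10))
```

With the largest operands, every cell produces the digit B − 1 and passes B − 1 on to its neighbour, which matches the bound stated above.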

Using the iterative circuit of Fig. 8.2 as a computation resource, the computation of (8.1) amounts to computing the m n-digit by 1-digit products

$$ \begin{aligned} z^{\left( 0 \right)} &= x\,\cdot\,y_{0} + u + v_{0} , \\ z^{\left( 1\right)} &= \, \left( {x\,\cdot\,y_{ 1} + v_{ 1} } \right)B, \\ z^{\left( 2\right)} &= \, \left( {x\,\cdot\,y_{ 2} + v_{ 2} } \right)B^{ 2} , \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {x\,\cdot\,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} , \\ \end{aligned} $$
(8.4)

and to adding them, that is

$$ z = z^{\left( 0 \right)} + z^{( 1)} + z^{( 2)} + \, \ldots \, + z^{(m - 1)} = x\,\cdot\,y + u + v. $$
(8.5)

For that, one of the multioperand adders of Sect. 7.7 can be used. As an example, if Algorithm 7.2 is used, then z is computed as follows.

Algorithm 8.1: Multiplication, right to left algorithm
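As a rough illustration of the right to left scheme, the following Python sketch accumulates the rows of (8.4) as in (8.5). It is a behavioural model under stated assumptions, not a transcription of Algorithm 8.1: operands are passed as Python integers, and the names `digits` and `multiply_right_to_left` are hypothetical.

```python
def digits(value, count, B):
    """Radix-B digits of a natural, least significant first."""
    return [(value // B**i) % B for i in range(count)]

def multiply_right_to_left(x, y, u, v, m, B=2):
    """Return x*y + u + v by accumulating the m rows of (8.4)."""
    y_digits, v_digits = digits(y, m, B), digits(v, m, B)
    acc = u                                   # u enters together with row 0
    for j in range(m):
        row = x * y_digits[j] + v_digits[j]   # n-digit by 1-digit product
        acc += row * B**j                     # weight B**j, then accumulate
    return acc

# With the largest n = 4, m = 3 radix-10 operands the result is B**(n+m) - 1.
print(multiply_right_to_left(9999, 999, 9999, 999, 3, 10))
```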

8.2 Combinational Multipliers

8.2.1 Ripple-Carry Parallel Multiplier

The combinational circuit of Fig. 8.3 implements Algorithm 8.1 (with n = 4 and m = 3). One of its critical paths has been shaded.

Fig. 8.3: Combinational multiplier

Its computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = \, \left( {n + 2m - 2} \right)\,\cdot\,T_{multiplier} \left( { 1, 1} \right). $$
(8.6)

The following VHDL model describes the circuit of Fig. 8.3 (B = 2).

A complete generic model parallel_multiplier.vhd is available at the Authors’ web page.

8.2.2 Carry-Save Parallel Multiplier

A straightforward modification of the multiplier of Fig. 8.3, similar to the carry-save principle, is shown in Fig. 8.4. The circuit is made up of an n-by-m array of 1-by-1 multipliers, whose computation time is equal to n · T(1,1), plus an m-digit output adder. Its critical path has been shaded.

Fig. 8.4: Carry-save combinational multiplier

Its computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = n\, \cdot \,T_{multiplier} \left( { 1, 1} \right) \, + m\, \cdot \,T_{adder} \left( 1\right) \, \le \, \left( {n + m} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$
(8.7)

The following VHDL model describes the circuit of Fig. 8.4 (B = 2).

A complete generic model parallel_csa_multiplier.vhd is available at the Authors’ web page.

8.2.3 Multipliers Based on Multioperand Adders

A straightforward implementation of Eqs. (8.4) and (8.5) can also be considered (Fig. 8.5). For that, any type of multioperand adder can be used.

Fig. 8.5: Multiplier with a multioperand adder

Example 8.1

Consider an n-bit by 7-bit multiplier. The 7-operand adder can be divided up into a 7-to-3 counter, a 3-to-2 counter and a ripple-carry adder. The complete structure is shown in Fig. 8.6 and is described by the following VHDL model:

Fig. 8.6: An n-bit by 7-bit multiplier

A complete generic model N_by_7_multiplier.vhd is available at the Authors’ web page.

Numerous multipliers, based on trees of counters, have been proposed and reported, among others the Wallace and Dadda multipliers (Wallace [4]; Dadda [3]). Nevertheless, as already mentioned (Comment 7.5), in many cases the best FPGA implementations are based on relatively simple algorithms, to which correspond regular circuits that can take advantage of the special-purpose carry logic circuitry. An example of an efficient FPGA implementation is described next.

Consider the set of equations (8.4). If two successive steps are merged into a single step (loop unrolling), the new set of equations is:

$$ \begin{aligned} z^{{\left( { 1,0} \right)}} &= \, \left( {x\,\cdot\,y_{ 1} + v_{ 1} } \right)B + x\,\cdot\,y_{0} + u + v_{0} , \\ z^{{\left( { 3, 2} \right)}} &= \, \left[ {\left( {x\,\cdot\,y_{ 3} + v_{ 3} } \right)B + \, \left( {x\,\cdot\,y_{ 2} + v_{ 2} } \right)} \right]B^{ 2} , \\ &\qquad\qquad\ldots \\ z^{{\left( {m - 1,m - 2} \right)}} &= \, [\left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B + \, \left( {x\, \cdot \,y_{m - 2} + v_{m - 2} } \right)]B^{m - 2} , \\ \end{aligned} $$
(8.8)

and the product is equal to

$$ z = z^{{\left( { 1,0} \right)}} + z^{( 3, 2)} + \, \ldots \, + z^{(m - 1,m - 2)} . $$

Assuming that u = 0, the basic operation to implement (8.8) is

$$ z^{{\left( {j + 1,j} \right)}} = \, \left( {x\, \cdot \,y_{j + 1} + v_{j + 1} } \right)B + \left( {x\,\cdot\,y_{j} + v_{j} } \right) $$

to which corresponds the circuit of Fig. 8.7 (with n = 4).

Fig. 8.7: 4-digit by 2-digit multiplier

The circuit of Fig. 8.7 can be decomposed into n + 1 vertical slices of the type shown in Fig. 8.8a (with obvious simplifications regarding the first and last slices). Finally, if B = 2 and \( v_{j} = 0 \), the carries of the first line are equal to 0, so that the circuit of Fig. 8.8a can be implemented as shown in Fig. 8.8b.

Fig. 8.8: Iterative cell of a parallel multiplier

Comment 8.1

Most FPGAs include the basic components for implementing the structure of Fig. 8.8b, and the synthesis tools are able to generate optimized multipliers from a simple VHDL expression, such as

z <= x * y;

Furthermore, many FPGAs also include fixed-size multiplier blocks.

8.2.4 Radix-2^k and Mixed-Radix Parallel Multipliers

The basic multiplication algorithm (Sect. 8.1) and the corresponding ripple-carry and carry-save multipliers (Sects. 8.2.1 and 8.2.2) have been defined for any radix B. In particular, radix-\( 2^{k} \) multipliers can be defined. This allows the synthesis of n · k-bit by m · k-bit multipliers using k-bit by k-bit multipliers as building blocks.

The following VHDL model defines a radix-\( 2^{k} \) ripple-carry parallel multiplier. The main iteration consists of m · n instantiations of any type of k-bit by k-bit combinational multiplier that computes z = a · b + c + d and represents z under the form \( z_{H} \, \cdot \, 2^{k} \, + z_{L} , \) where \( z_{H} \) and \( z_{L} \) are k-bit numbers:

A complete generic model base_2k_parallel_multiplier.vhd is available at the Authors’ web page.

The stored-carry encoding can also be applied. Once again, the main iteration consists of m · n instantiations of any type of k-bit by k-bit multiplier, and the connections are similar to those of the carry-save multiplier of Sect. 8.2.2. A complete generic model base_2k_csa_multiplier.vhd is available at the Authors’ web page.

A straightforward generalization of relations (8.2) to (8.5) allows defining mixed-radix combinational multipliers. First consider the circuit of Fig. 8.1a, assuming that

$$ a,c\, \in \left\{ {0,{ 1},\, \ldots \,,B_{ 1} - 1} \right\},\,{\text{and}}\,b,d\, \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\}. $$

Then

$$ z = a\, \cdot \,b + c + d \le \, \left( {B_{ 1} - 1} \right)\, \cdot \,\left( {B_{ 2} - 1} \right) \, + \, \left( {B_{ 1} - 1} \right) \, + \, \left( {B_{ 2} - 1} \right) \, = B_{ 1} \, \cdot \,B_{ 2} - { 1}, $$

so that z can be expressed under the form

$$ z = e\, \cdot \,B_{ 1} + f,{\text{ with}}\,e \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\},f \in \left\{ {0,{ 1},\, \ldots \,,B_{ 1} - 1} \right\}. $$

Then, consider the circuit of Fig. 8.2, assuming that x and u are n-digit radix-\( B_{1} \) numbers, and b and d are 1-digit radix-\( B_{2} \) numbers. Thus,

$$ x\, \cdot \,b + u + d = z_{n} \, \cdot \,B_{ 1}^{n} + z_{n - 1} \, \cdot \,B_{ 1}^{n - 1} + \, \ldots \, + z_{ 1} \, \cdot \,B_{ 1} + z_{0} , $$
(8.9)

with

$$ z_{n} \in \left\{ {0,{ 1},\, \ldots \,,B_{ 2} - 1} \right\}{\text{ and}}\,z_{i} \in \left\{ {0, 1,\, \ldots \,,B_{ 1} - 1} \right\},\forall i\,{\text{in }}\left\{ {0, 1,\, \ldots \,,n - 1} \right\}. $$

Finally, given two n-digit radix-\( B_{1} \) numbers x and u, and two m-digit radix-\( B_{2} \) numbers y and v, compute

$$ \begin{aligned} z^{\left( 0 \right)} &= x\, \cdot \,y_{0} + u + v_{0} , \\ z^{\left( 1\right)} &= \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B_{ 2} , \\ z^{\left( 2\right)} &= \, \left( {x\, \cdot \,y_{ 2} + v_{ 2} } \right)B_{ 2}^{ 2} , \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B_{ 2}^{m - 1} . \\ \end{aligned} $$
(8.10)

Then

$$ z = z^{\left( 0 \right)} + z^{( 1)} + z^{( 2)} + \, \ldots \, + z^{(m - 1)} = x\, \cdot \,y + u + v. $$
(8.11)
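Relations (8.9) to (8.11) can be checked numerically. The Python sketch below is a behavioural model under stated assumptions: x and u are split into radix-\( B_{1} \) digits, y and v into radix-\( B_{2} \) digits, and each row is produced by a chain of cells whose carry stays below \( B_{2} \). All function names are hypothetical.

```python
def digits(value, count, radix):
    """Digits of a natural in the given radix, least significant first."""
    return [(value // radix**i) % radix for i in range(count)]

def value(digs, radix):
    """Natural represented by a digit list."""
    return sum(d * radix**i for i, d in enumerate(digs))

def mixed_row(x_digits, b, u_digits, d, B1):
    """Relation (8.9): x*b + u + d, with x and u in radix B1 and b, d below B2."""
    z, carry = [], d
    for x_i, u_i in zip(x_digits, u_digits):
        carry, f = divmod(x_i * b + u_i + carry, B1)
        z.append(f)
    z.append(carry)        # top digit z_n, bounded by B2 - 1
    return z

def mixed_radix_multiply(x, y, u, v, n, m, B1, B2):
    """Relations (8.10) and (8.11): sum of the m weighted rows."""
    xd, ud = digits(x, n, B1), digits(u, n, B1)
    yd, vd = digits(y, m, B2), digits(v, m, B2)
    zero = [0] * n
    z = 0
    for j in range(m):
        row = mixed_row(xd, yd[j], ud if j == 0 else zero, vd[j], B1)
        assert row[-1] < B2                 # range of z_n stated after (8.9)
        z += value(row, B1) * B2**j         # weight B2**j
    return z

# Example with B1 = 16 (k1 = 4) and B2 = 4 (k2 = 2).
print(mixed_radix_multiply(0xBEEF, 27, 0x1234, 57, 4, 3, 16, 4) == 0xBEEF * 27 + 0x1234 + 57)
```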

Consider the case where

$$ B_{1} = 2^{{k_{1} }} ,\;B_{2} = 2^{{k_{2} }} . $$

An easy way to define a VHDL model of the corresponding multiplier consists in first modelling a circuit that implements (8.9). The main iteration consists of n instantiations of any type of \( k_{1} \)-bit by \( k_{2} \)-bit combinational multiplier that computes

$$ a\,\cdot\,b + c + d = z_{H} \,\cdot\,2^{{k_{1} }} + z_{L} , $$

where \( z_{H} \) is a \( k_{2} \)-bit number and \( z_{L} \) a \( k_{1} \)-bit number:

Then, it remains to instantiate m rows:

A complete generic model MR_parallel_multiplier.vhd is available at the Authors’ web page.

The circuit defined by the preceding VHDL model is a bidimensional array similar to that of Fig. 8.3, but with more complex connections. As an example, with \( k_{1} = 4 \) and \( k_{2} = 2 \), the connections corresponding to cell (j, i) are shown in Fig. 8.9. As before, a stored-carry encoding circuit could also be designed, but with an even more complex connection pattern. It is left as an exercise.

Fig. 8.9: Part of a 4n-bit by 2n-bit multiplier using 4-bit by 2-bit multiplication blocks

8.3 Sequential Multipliers

8.3.1 Shift and Add Multiplier

In order to synthesize sequential multipliers, the basic algorithm of Sect. 8.1 can be modified. For that, Eqs. (8.4) are replaced by the following:

$$ \begin{aligned} z^{\left( 0 \right)} &= \, \left( {u + x\, \cdot \,y_{0} + v_{0} } \right)/B, \\ z^{\left( 1\right)} &= \, \left( {z^{(0)} + x\, \cdot \,y_{ 1} + v_{ 1} } \right)/B, \\ z^{\left( 2\right)} &= \, \left( {z^{(1)} + x\, \cdot \,y_{ 2} + v_{ 2} } \right)/B, \\ &\qquad\qquad\ldots \\ z^{{\left( {m - 1} \right)}} &= \, \left( {z^{(m - 2)} + x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)/B. \\ \end{aligned} $$
(8.12)

Multiply the first equation by B, the second by \( B^{2} \), and so on, and add the equations so obtained. The result is

$$ z^{{\left( {m - 1} \right)}} B^{m} = u + x\, \cdot \,y_{0} + v_{0} + \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B + \, \ldots \, + \, \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} = xy + u + v. $$

Algorithm 8.2: Shift and add multiplication
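The recurrence (8.12) can be illustrated with a short Python sketch. It is a behavioural model, not a transcription of Algorithm 8.2: the division by B is modelled as a one-digit right shift whose shifted-out digit is kept as a result digit, and the names `digits` and `shift_and_add` are hypothetical.

```python
def digits(value, count, B):
    """Radix-B digits of a natural, least significant first."""
    return [(value // B**i) % B for i in range(count)]

def shift_and_add(x, y, u, v, m, B=2):
    """Return x*y + u + v using m add-and-shift steps, as in (8.12)."""
    acc = u                      # accumulator holding the upper digits
    low = []                     # digits shifted out: z_0, z_1, ..., z_(m-1)
    for y_i, v_i in zip(digits(y, m, B), digits(v, m, B)):
        acc, z_i = divmod(acc + x * y_i + v_i, B)   # add, then shift right
        low.append(z_i)
    # acc now holds the n most significant digits of the result
    return acc * B**m + sum(d * B**i for i, d in enumerate(low))

print(shift_and_add(7918, 541, 17, 742, 3, 10) == 7918 * 541 + 17 + 742)
```

If x and u are n-digit numbers, the accumulator never exceeds n digits after the shift, which is why a single n-digit by 1-digit multiplier can be reused at every step.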

A data path for executing Algorithm 8.2 is shown in Fig. 8.10a. The following VHDL model describes the circuit of Fig. 8.10a (B = 2).

Fig. 8.10: Shift and add multipliers

The complete circuit also includes an m-state counter and a control unit. A complete generic model shift_and_add_multiplier.vhd is available at the Authors’ web page.

If v = 0, the same shift register can be used for storing both y and the least significant bits of z. The modified circuit is shown in Fig. 8.10b. A complete generic model shift_and_add_multiplier2 is also available.

The computation time of the circuits of Fig. 8.10 is approximately equal to

$$ T_{multiplier} \left( {n,m} \right) \, = m\, \cdot \,T_{multiplier} \left( {n, 1} \right) \, = m\, \cdot \,n\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$
(8.13)

8.3.2 Shift and Add Multiplier with CSA

The shift and add algorithm can also be executed with stored-carry encoding. After m steps the result is obtained under the form

$$ s_{n - 1} \,B^{n + m - 1} + \, \left( {c_{n - 2} + s_{n - 2} } \right)B^{n + m - 2} + \, \ldots \, + \, \left( {c_{0} + s_{0} } \right)B^{m} + z_{m - 1} \,B^{m - 1} + \, \ldots \, + z_{ 1} B + z_{0} , $$

and an additional n-digit adder computes

$$ s_{n - 1} \,B^{n - 1} + \, \left( {c_{n - 2} + s_{n - 2} } \right)B^{n - 2} + \ldots + \, \left( {c_{0} + s_{0} } \right) \, = z_{m + n - 1} \,B^{n - 1} + \, \ldots \, + z_{m + 1} \,B + z_{m} . $$

The corresponding data path is shown in Fig. 8.11.

Fig. 8.11: Sequential carry-save multiplier

The carry-save adder computes

$$ y_{ 1} + y_{ 2} + y_{ 3} = s + c, $$

where \( y_{1} \), \( y_{2} \) and \( y_{3} \) are n-bit numbers, and s and c are (n + 1)-bit numbers. At the end of step i, the least significant bit of s is \( z_{i} \), and the n most significant bits of s and c are transmitted to the next step.

The complete circuit also includes an m-state counter and a control unit. A complete generic model sequential_CSA_multiplier.vhd is available at the Authors’ web page. The minimum clock period is equal to the delay of a 1-bit by 1-bit multiplier. Thus, the total computation time is equal to

$$ T_{multiplier} \left( {n,m} \right) \, = m\, \cdot \,T_{multiplier} \left( { 1, 1} \right) \, + T_{adder} \left( n \right) \, \le \, \left( {n + m} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$
(8.14)
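The stored-carry iteration can be sketched in Python as follows. This is a behavioural model for B = 2 under stated assumptions: the carry-save adder is modelled with bitwise operations on integers, the initial state is s = u and c = 0, and the names `csa` and `sequential_csa_multiply` are hypothetical.

```python
def csa(a, b, d):
    """3-to-2 carry-save adder on naturals: a + b + d == s + c."""
    s = a ^ b ^ d                                # bitwise sum without carries
    c = ((a & b) | (a & d) | (b & d)) << 1       # carries, weighted by 2
    return s, c

def sequential_csa_multiply(x, y, m, u=0):
    """Return x*y + u for an m-bit natural y, one multiplier bit per step."""
    s, c = u, 0
    z_low = 0
    for i in range(m):
        y_i = (y >> i) & 1
        s, c = csa(s, c, x if y_i else 0)
        z_low |= (s & 1) << i      # bit 0 of c is always 0, so z_i is bit 0 of s
        s, c = s >> 1, c >> 1      # the remaining bits go to the next step
    return ((s + c) << m) | z_low  # final carry-propagate addition

print(sequential_csa_multiply(13, 11, 4))
```

No carry propagates inside the loop; the only carry-propagate addition is the final `s + c`, which corresponds to the additional n-digit adder mentioned above.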

Comment 8.2

In sequential_CSA_multiplier.vhd the done flag is raised as soon as the final values of the adder inputs are available. A more correct control unit should raise the flag k cycles later, where \( k \cdot T_{clk} \) is an upper bound of the n-bit adder delay. The value of k could be defined as a generic parameter (Exercise 8.3).

8.4 Integers

Given four B’s complement integers

$$ x = x_{n} x_{n - 1} x_{n - 2} \ldots x_{0} ,\,\,\,y = \, y_{m} y_{m - 1} y_{m - 2} \ldots y_{0} ,\,\,u = u_{n} u_{n - 1} \,\,u_{n - 2} \ldots u_{0} ,\,\,\,\,v = v_{m} v_{m - 1} v_{m - 2} \, \ldots \,v_{0} , $$

belonging to the ranges

$$ - B^{n} \le \, x < B^{n} ,-B^{m} \le \, y < B^{m} , - B^{n} \le \, u < B^{n} ,-B^{m} \le \, v < B^{m} , $$

then z = x · y + u + v belongs to the interval

$$ - B^{n + m + 1} \le \, z < B^{n + m + 1} . $$

Thus, z is a B’s complement number of the form

$$ z = z_{n + m + 1} \,z_{n + m} \,\,z_{n + m - 1} \, \ldots \,z_{ 1} z_{0} . $$

8.4.1 Mod 2B^{n+m+1} Multiplication

The integer represented by a vector \( x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{ 1} \,x_{0} \) is

$$ x = - x_{n} B^{n} + x_{n - 1} \,B^{n - 1} + x_{n - 2} \,B^{n - 2} \, + \ldots + x_{ 1} B + x_{0} , $$

while the natural natural(x) represented by this same vector is

$$ natural\,\left( x \right) \, = x_{n} B^{n} + x_{n - 1} \,B^{n - 1} + x_{n - 2} \,B^{n - 2} + \ldots + x_{ 1} B + x_{0} . $$

As \( x_{n} \in \{0, 1\} \), either natural(x) = x or natural(x) = x + \( 2B^{n} \). So,

$$ natural\left( x \right) = x\,\bmod \,2B^{n} . $$

The following method can be used to compute z = x · y + u + v. First, represent the operands x, y, u and v with the same number of digits (n + m + 2) as the result z (digit extension, Sect. 7.8). Then, compute z = x · y + u + v as if x, y, u and v were naturals:

$$ z = natural\left( x \right)\, \cdot \,natural\left( y \right) \, + natural\left( u \right) \, + natural\left( v \right) \, = natural\left( {x\, \cdot \,y + u + v} \right). $$

Finally, reduce z modulo \( 2B^{n + m + 1} . \) Assume that before the mod \( 2B^{n + m + 1} \) reduction

$$ z = \, \ldots \, + z_{n + m + 1} \,B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{ 1} B + z_{0} ; $$

then

$$ z\,\bmod \,2B^{n + m + 1} = \left( {\left( { \ldots + z_{n + m + 2} \,B + z_{n + m + 1} } \right)\bmod 2} \right)B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{1} B + z_{0} . $$

In particular, if B is even,

$$ z\, \bmod \, 2B^{n + m + 1} = \left( {z_{n + m + 1} \, \mod \, 2} \right)B^{n + m + 1} + z_{n + m} \,B^{n + m} + z_{n + m - 1} \,B^{n + m - 1} + \ldots + z_{ 1} B + z_{0} . $$

Example 8.2

Assume that B = 10, n = 4, m = 3, x = 7918, y = −541, u = −7017, v = 742, and compute z = 7918·(−541) + (−7017) + 742. In 10’s complement: x = 07918, y = 1459, u = 12983, v = 0742.

  1. Express all operands with 9 digits: x = 000007918, y = 199999459, u = 199992983, v = 000000742.

  2. Compute x · y + u + v: 000007918 · 199999459 + 199992983 + 000000742 = 1583795710087.

  3. Reduce 1583795710087 modulo \( 2 \cdot 10^{8} \): 1583795710087 mod \( (2 \cdot 10^{8}) \) = (7 mod 2) · \( 10^{8} \) + 95710087 = 195710087.

The result 195710087 is the 10’s complement representation of −4289913.
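The three steps of Example 8.2 can be replayed in Python. The sketch below assumes that an integer w is represented on d digits by the natural w mod \( 2B^{d - 1} \), as in the definition of natural(x) above; the helper names `to_complement` and `from_complement` are hypothetical.

```python
def to_complement(value, num_digits, B):
    """B's complement representation on num_digits digits, as a natural."""
    return value % (2 * B**(num_digits - 1))

def from_complement(rep, num_digits, B):
    """Integer encoded by a B's complement representation."""
    modulus = 2 * B**(num_digits - 1)
    return rep - modulus if rep >= modulus // 2 else rep

B, n, m = 10, 4, 3
x, y, u, v = 7918, -541, -7017, 742
width = n + m + 2                       # 9 digits, as many as the result

# Step 1: extend all operands to the width of the result.
rx, ry, ru, rv = (to_complement(w, width, B) for w in (x, y, u, v))
# Step 2: multiply and add as if the operands were naturals.
raw = rx * ry + ru + rv
# Step 3: reduce modulo 2 * B**(n + m + 1) and decode.
z_rep = raw % (2 * B**(n + m + 1))
z = from_complement(z_rep, width, B)

print(raw, z_rep, z)   # 1583795710087 195710087 -4289913
```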

Thus, any multiplier for natural numbers can be used. As an example, an (n + m + 2)-digit by (n + m + 2)-digit carry-save multiplier could be used (Fig. 8.4). As the result is reduced modulo \( 2B^{n + m + 1} \), only the rightmost part of the circuit is used (if B is even), so that there is no output adder, and the most significant digit is reduced mod 2. An example with n = 3 and m = 2 is shown in Fig. 8.12. The corresponding computation time is equal to

$$ \left( {n + m + 2} \right)\, \cdot \,T_{multiplier} \left( { 1, 1} \right). $$
(8.15)

This delay is practically the same as that of a carry-save combinational multiplier (8.7). Nevertheless, the number of 1-digit by 1-digit multiplication cells is equal to \( 1 { } + { 2 } + { 3 } + \ldots + \, \left( {n + m + 2} \right) \, = \, \left( {n + m + 2} \right)\left( {n + m + 3} \right)/ 2 \) instead of n · m.

Fig. 8.12: Carry-save multiplier for integers (n = 3, m = 2)

A very simple way to generate a VHDL model consists of defining (n + m + 2)-bit representations of all operands and instantiating an (n + m + 2)-bit by (n + m + 2)-bit carry-save multiplier:

Only n + m + 2 output bits of the carry-save multiplier are connected to output ports, and the synthesis program will prune the circuit accordingly.

A complete generic model integer_CSA_multiplier.vhd is available at the Authors’ web page.

To conclude, this approach is conceptually attractive because any type of multiplier for natural numbers can be used. Nevertheless, the cost of the corresponding circuits is very high.

8.4.2 Modified Shift and Add Algorithm

Consider again four B’s complement integers

$$ x = x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,y \, = \, y_{m} y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,\,\,u = u_{n} \,u_{n - 1} \,\,u_{n - 2} \, \ldots \,u_{0} ,\,\,v = v_{m} \,v_{m - 1} \,v_{m - 2} \, \ldots \,v_{0} . $$

A set of equations similar to (8.12) can be defined:

$$ \begin{aligned} z^{\left( 0 \right)} &= \left( {u + x\, \cdot \,y_{0} + v_{0} } \right)/B, \\ z^{\left( 1\right)} &= \left( {z^{(0)} + x\, \cdot \,y_{ 1} + v_{ 1} } \right)/B, \\ z^{\left( 2\right)} &= \left( {z^{(1)} + x\, \cdot \,y_{ 2} + v_{ 2} } \right)/B, \\ &\qquad\quad\ldots \\ z^{{\left( {m - 1} \right)}} &= \left( {z^{(m - 2)} + x\cdot\,y_{m - 1} + v_{m - 1} } \right)/B, \\ z^{\left( m \right)} &= \left( {z^{(m - 1)} - x\cdot\,y_{m} - v_{m} } \right)/B. \\ \end{aligned} $$
(8.16)

Multiply the first equation by B, the second by \( B^{2} \), and so on, and add the m + 1 equations so obtained. The result is

$$ z^{\left( m \right)} B^{m + 1} = u + x\, \cdot \,y_{0} + v_{0} + \, \left( {x\, \cdot \,y_{ 1} + v_{ 1} } \right)B + \, \ldots \, + \left( {x\, \cdot \,y_{m - 1} + v_{m - 1} } \right)B^{m - 1} - \left( {x\, \cdot \,y_{m} + v_{m} } \right)B^{m} = xy + u + v. $$
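A minimal Python sketch of the recurrence (8.16) is given below. It is a behavioural model under stated assumptions: y and v are supplied as integers and split into m + 1 B's complement digits by the hypothetical helper `complement_digits`, and the division by B is modelled as a floor division whose remainder is kept as a result digit.

```python
def complement_digits(value, m, B):
    """Digits of an (m+1)-digit B's complement integer, least significant first.

    The last digit is the sign digit (0 or 1).
    """
    rep = value % (2 * B**m)
    return [(rep // B**i) % B for i in range(m + 1)]

def modified_shift_and_add(x, y, u, v, m, B=2):
    """Return x*y + u + v for integers, following (8.16)."""
    acc, low = u, []
    y_digits, v_digits = complement_digits(y, m, B), complement_digits(v, m, B)
    for i, (y_i, v_i) in enumerate(zip(y_digits, v_digits)):
        term = x * y_i + v_i
        # The last step uses the sign digits, whose weight is negative.
        acc = acc - term if i == m else acc + term
        acc, z_i = divmod(acc, B)       # floor division keeps 0 <= z_i < B
        low.append(z_i)
    return acc * B**(m + 1) + sum(d * B**i for i, d in enumerate(low))

print(modified_shift_and_add(7918, -541, -7017, 742, 3, 10))   # -4289913
```

The only difference with respect to the shift and add scheme for naturals is the subtraction performed in the last step.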

Algorithm 8.3: Modified shift and add multiplication

In what follows it is assumed that \( v_{m} = 0 \), that is to say v ≥ 0; so, in order to implement Algorithm 8.3, the following two computation primitives must be defined:

$$ z = u + x\, \cdot \,b + d $$
(8.17)

and

$$ z = u - x\, \cdot \,b, $$
(8.18)

where

$$ - B^{n} \le x < B^{n} , - B^{n} \le u < B^{n} , \, 0 \, \le b < B, \, 0 \, \le d < B. $$

Thus, in the first case,

$$ - B^{n + 1} \le z < B^{n + 1} , $$

and in the second case

$$ - B^{n + 1} + \, \left( {B - 1} \right) \, \le z < B^{n + 1} , $$

so that in both cases z is an (n + 2)-digit B’s complement integer and natural(z) = z mod \( 2B^{n + 1} \).

The first primitive (8.17) is implemented by the circuit of Fig. 8.13 and the second (8.18) by the circuit of Fig. 8.14. In both circuits, \( z_{n + 1} \) is computed modulo 2.

Fig. 8.13: First computation primitive

Fig. 8.14: Second computation primitive

As an example, the combinational circuit of Fig. 8.15 implements Algorithm 8.3 (with n = m = 2). Its cost and computation time are practically the same as in the case of a ripple-carry multiplier for natural numbers. It can be described by the following VHDL model.

Fig. 8.15: Combinational multiplier for integers (B = 2, m = n = 2)

A complete generic model modified_parallel_multiplier.vhd is available at the Authors’ web page.

The design of a sequential multiplier based on Algorithm 8.3 is left as an exercise.

8.4.3 Post Correction Multiplication

Given four B’s complement integers

$$ x = \, x_{n} \,x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,y \, = \, y_{m} \,y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,u \, = \, u_{n} u_{n - 1} \,u_{n - 2} \, \ldots \,u_{0} ,v \, = \, v_{m} v_{m - 1} \,v_{m - 2} \, \ldots \,v_{0} , $$

then z = x · y + u + v, belonging to the interval \( - B^{n + m + 1} \le \, z < B^{n + m + 1} , \) can be expressed under the form

$$ z = \left( {X_{0} \, \cdot \,Y_{0} + U_{0} + V_{0} } \right) \, + x_{n} \, \cdot \,y_{m} \, \cdot \,B^{n + m} - \, \left( {x_{n} \, \cdot \,Y_{0} + u_{n} } \right)\, \cdot \,B^{n} - \, \left( {y_{m} \, \cdot \,X_{0} + v_{m} } \right)\, \cdot \,B^{m} , $$

where X 0, Y 0, U 0 and V 0 are four naturals

$$ X_{0} = x_{n - 1} \,x_{n - 2} \, \ldots \,x_{0} ,\,Y_{0} = \, y_{m - 1} \,y_{m - 2} \, \ldots \,y_{0} ,\,U_{0} = \, u_{n - 1} \,u_{n - 2} \, \ldots \,u_{0} ,\,V_{0} = v_{m - 1} v_{m - 2} \, \ldots \,v_{0} $$

deduced from x, y, u and v by eliminating the sign bits. Thus, the computation of z amounts to the computation of

$$ Z_{0} = X_{0} \, \cdot \,Y_{0} + U_{0} + V_{0} , $$

that can be executed by any type of multiplier for naturals, plus a post correction that consists of several additions and left shifts.
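The post correction can be checked with a small Python sketch. It is a behavioural model under stated assumptions: each operand is split into its sign digit and its natural part, the sign terms attached to x and u carry the weight \( B^{n} \), and those attached to y and v carry the weight \( B^{m} \). The names `split` and `post_correction_multiply` are hypothetical.

```python
def split(value, n, B):
    """Return (sign digit, natural part) of an (n+1)-digit B's complement integer."""
    rep = value % (2 * B**n)
    return rep // B**n, rep % B**n

def post_correction_multiply(x, y, u, v, n, m, B=2):
    """Return x*y + u + v from a multiplication of naturals plus a correction."""
    x_n, X0 = split(x, n, B)
    y_m, Y0 = split(y, m, B)
    u_n, U0 = split(u, n, B)
    v_m, V0 = split(v, m, B)
    Z0 = X0 * Y0 + U0 + V0                      # any multiplier for naturals
    correction = (x_n * y_m * B**(n + m)
                  - (x_n * Y0 + u_n) * B**n     # terms weighted by B**n
                  - (y_m * X0 + v_m) * B**m)    # terms weighted by B**m
    return Z0 + correction

print(post_correction_multiply(7918, -541, -7017, 742, 4, 3, 10))   # -4289913
```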

If B = 2 and u = v = 0, then

$$ z = x\, \cdot \,y = X_{0} \, \cdot \,Y_{0} + x_{n} \, \cdot \,y_{m} \, \cdot \, 2^{n + m} -x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} . $$

The (n + m + 2)-bit 2’s complement representations of \( - x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} \) and \( - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} \) are

$$ (2^{m + 1} + 2^{m} + \overline{{(x_{n} \, \cdot \,y_{m - 1} )}} \, \cdot \,2^{m - 1} + \ldots + \overline{{(x_{n} \, \cdot \,y_{0} )}} \, \cdot \,2^{0} + 1)\, \cdot \,2^{n} \;\bmod \,2^{n + m + 2} , $$

and

$$ (2^{n + 1} + 2^{n} + \overline{{(y_{m} \, \cdot \,x_{n - 1} )}} \, \cdot \,2^{n - 1} + \, \ldots \, + \overline{{(y_{m} \, \cdot \,x_{0} )}} \, \cdot \,2^{0} + 1)\, \cdot \,2^{m} \;\bmod \,2^{n + m + 2} , $$

so that the representation of \( x_{n} \, \cdot \,y_{m} \, \cdot \, 2^{n + m} -x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} \) is

$$ \begin{gathered} (2^{n + m + 1} + x_{n} \, \cdot \,y_{m} \, \cdot \,2^{n + m} + \overline{{(x_{n} \, \cdot \,y_{m - 1} )}} \, \cdot \,2^{n + m - 1} + \ldots + \overline{{(x_{n} \, \cdot \,y_{0} )}} \, \cdot \,2^{n} \hfill \\ + 2^{n} + \overline{{(y_{m} \, \cdot \,x_{n - 1} )}} \, \cdot \,2^{n + m - 1} + \ldots + \overline{{(y_{m} \, \cdot \,x_{0} )}} \, \cdot \,2^{m} + 2^{m} )\;\bmod \,2^{n + m + 2} . \hfill \\ \end{gathered} $$

A simple modification of the combinational multipliers of Figs. 8.3 and 8.4 allows computing x · y, where x is an (n + 1)-bit 2’s complement integer and y an (m + 1)-bit 2’s complement integer. An example is shown in Fig. 8.16 (n = 3, m = 2). The nand multiplication cells are similar to that of Fig. 8.1b, except that the AND gate is replaced by a NAND gate [1].

Fig. 8.16: Multiplier with post correction

The following VHDL model describes the circuit of Fig. 8.16.

A complete generic model postcorrection_multiplier.vhd is available at the Authors’ web page.

8.4.4 Booth Multiplier

Given an (m + 1)-bit 2’s complement integer \( y = - y_{m} \, \cdot \, 2^{m} + y_{m - 1} \, \cdot \, 2^{m - 1} + \, \ldots \, + y_{ 1} \, \cdot \, 2 { } + y_{0} , \) define

$$ y_{0}^{\prime} = - y_{0} \,{\text{and}}\,y_{i}^{\prime} = - y_{i} + y_{i - 1} ,\,\forall i\,{\text{in }}\left\{ { 1, 2, \ldots ,m} \right\}, $$

so that all coefficients \( y_{i}^{\prime} \) belong to {−1, 0, 1}. Then y can be represented under the form

$$ y = y_{m}^{\prime} \, \cdot \, 2^{m} + y_{m - 1}^{\prime} \, \cdot \, 2^{m - 1} + \ldots + y_{1}^{\prime} \, \cdot \, 2 { } + y_{0}^{\prime} , $$

the so-called Booth’s encoding of y (Booth [2]). Unlike the 2’s complement representation, in which \( y_{m} \) has a specific function, all coefficients \( y_{i}^{\prime} \) have the same function. Formally, the Booth’s representation of an integer is the same as the binary representation of a natural. The basic multiplication algorithm (Algorithm 8.1), with v = 0, can be used.

Algorithm 8.4: Booth multiplication, z = x·y + u
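The Booth encoding and the resulting multiplication can be sketched in Python. This is a behavioural model, not a transcription of Algorithm 8.4; it assumes that y is passed as a Python integer in the (m + 1)-bit 2's complement range, and the names `booth_digits` and `booth_multiply` are hypothetical.

```python
def booth_digits(y, m):
    """Booth digits y'_0 ... y'_m of an (m+1)-bit 2's complement integer y."""
    # Python integers behave as sign-extended 2's complement under >> and &.
    bits = [(y >> i) & 1 for i in range(m + 1)]
    return [-bits[0]] + [-bits[i] + bits[i - 1] for i in range(1, m + 1)]

def booth_multiply(x, y, m, u=0):
    """Return x*y + u, adding or subtracting x according to each Booth digit."""
    acc = u
    for i, d in enumerate(booth_digits(y, m)):
        acc += (d * x) << i          # d is -1, 0 or 1: subtract, skip or add
    return acc

print(booth_digits(-3, 2), booth_multiply(7, -3, 2))   # [-1, 1, -1] -21
```

Every digit is handled in the same way, including the most significant one, which is the property that distinguishes the Booth encoding from the plain 2's complement representation.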

The following VHDL model describes a combinational circuit based on Algorithm 8.4. A complete generic model Booth1_multiplier.vhd is available at the Authors’ web page.

Higher radix Booth multipliers can be defined. Given an (m + 1)-bit 2’s complement integer \( y = - y_{m} \, \cdot \, 2^{m} + y_{m - 1} \, \cdot \, 2^{m - 1} + \ldots + y_{ 1} \, \cdot \, 2 { } + y_{0} , \) where m is odd, define

$$ y_{0}^{\prime} = - 2\, \cdot \,y_{1} + y_{0} ,y_{i}^{\prime} = - 2\, \cdot \,y_{2\, \cdot \,i + 1} + y_{2\, \cdot \,i} + y_{2\cdot\,i - 1} ,\,\forall i\,{\text{in}}\left\{ {1,2, \ldots ,\left( {m - 1} \right)/2} \right\}, $$

so that all coefficients \( y_{i}^{\prime} \) belong to {−2, −1, 0, 1, 2}. Then y can be represented under the form

$$ y = y_{{\left( {m - 1} \right)/ 2}}^{\prime} \, \cdot \, 4^{{\left( {m - 1} \right)/ 2}} + y_{{\left( {m - 1} \right)/ 2- 1}}^{\prime} \, \cdot \, 4^{{\left( {m - 1} \right)/ 2- 1}} + \, \ldots \, + y_{1}^{\prime} \, \cdot \, 4 { } + y_{0}^{\prime} , $$

the so-called Booth-2 encoding of y.

Example 8.3

Consider the case where m = 9 and thus (m − 1)/2 = 4. The 2’s complement representation of −137 is 1101110111. The corresponding Booth-2 encoding is −1 2 −1 2 −1 and, indeed, \( -4^{4} + 2 \cdot 4^{3} - 4^{2} + 2 \cdot 4 - 1 = -137 \). The basic radix-4 multiplication algorithm, with v = 0, can be used.

Algorithm 8.5: Radix-4 Booth multiplication, z = x · y + u
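The Booth-2 encoding of Example 8.3 and the corresponding radix-4 multiplication can be sketched in Python. This is a behavioural model, not a transcription of Algorithm 8.5; it assumes that y is a Python integer in the (m + 1)-bit 2's complement range with m odd, and the names `booth2_digits` and `booth2_multiply` are hypothetical.

```python
def booth2_digits(y, m):
    """Radix-4 Booth digits of an (m+1)-bit 2's complement integer, m odd.

    Digits are returned least significant first and lie in {-2, ..., 2}.
    """
    assert m % 2 == 1, "m must be odd"

    def bit(i):
        # y_(-1) is taken as 0, which gives y'_0 = -2*y_1 + y_0.
        return (y >> i) & 1 if i >= 0 else 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range((m + 1) // 2)]

def booth2_multiply(x, y, m, u=0):
    """Return x*y + u, processing two bits of y per step."""
    acc = u
    for i, d in enumerate(booth2_digits(y, m)):
        acc += d * x * 4**i          # the addend is -2x, -x, 0, x or 2x
    return acc

print(booth2_digits(-137, 9))        # [-1, 2, -1, 2, -1]
```

Only (m + 1)/2 additions are needed, instead of m + 1 with the radix-2 encoding.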

A sequential implementation is shown in Fig. 8.17. It includes a shift register whose content is shifted two positions at each step, a parallel register and an adder whose second operand is −2x, −x, 0, x or 2x depending on the three least significant bits \( (y_{2i + 1} , y_{2i} , y_{2i - 1}) \) of the shift register. At each step, two output bits are generated. Hence, the total computation time is equal to \( (m + 1)/2 \cdot T_{clk} \), where \( T_{clk} \) must be greater than the computation time of an (n + 3)-bit adder. Thus,

$$ T(n,m) \cong \frac{m + 1}{2}\, \cdot \,T_{adder} (n + 3). $$

Fig. 8.17: Sequential radix-4 Booth multiplier

With respect to a radix-2 shift and add multiplier (Sect. 8.3.1), the computation time has been divided by 2.

The following VHDL model describes the circuit of Fig. 8.17.

The complete circuit also includes an (m + 1)/2-state counter and a control unit. A complete generic model Booth2_sequential_multiplier.vhd is available at the Authors’ web page.

8.5 Constant Multipliers

Given an n-bit constant natural c and an m-bit natural y, the computation of c · y can be performed with any n-bit by m-bit multiplier whose first operand is connected to the constant value c. Then, the synthesis tool will eliminate useless components. In the case of FPGA implementations, an alternative method is to store the constant c within the LUTs.

Assume that the technology at hand includes k-input LUTs. The basic component is a circuit that computes w = c · b, where b is a k-bit natural. The maximum value of w is

$$ \left( { 2^{n} - 1} \right)\left( { 2^{k} - 1} \right) \, = { 2}^{n + k} - { 2}^{k} - { 2}^{n} + 1, $$

so w is an (n + k)-bit number. The circuit is shown in Fig. 8.18, with k = 6. It is made up of n + 6 LUT-6, each of them being programmed in such a way that

$$ w_{ 6j + 5\, \ldots \, 6j} \left( b \right) \, = \, \left[ {c_{ 1} \, \cdot \,b} \right]_{ 6j + 5\, \ldots \, 6j} . $$

Fig. 8.18: LUT implementation of a k-bit by n-bit constant multiplier

Its computation time is equal to \( T_{LUT6} \).

The following VHDL model describes the circuit of Fig. 8.18.

The function LUT_definition defines the LUT contents.

The circuit of Fig. 8.18 can be used as a component for generating constant multipliers. As an example, a sequential n-bit by m-bit constant multiplier is synthesized. First define a component similar to that of Fig. 8.2, with x constant. It computes z = c · b + u, where c is an n-bit constant natural, b a k-bit natural, and u an n-bit natural. The maximum value of z is

$$ \left( 2^{n} - 1 \right)\left( 2^{k} - 1 \right) + 2^{n} - 1 = 2^{n + k} - 2^{k}, $$

so z is an (n + k)-bit number. The component consists of a k-bit by n-bit multiplier (Fig. 8.18) and an (n + k)-bit adder (Fig. 8.19).

Fig. 8.19 Computation of z = c · b + u

Finally, the circuit of Fig. 8.19 can be used to generate a radix-2^k shift and add multiplier that computes z = c · y + u, where c is an n-bit constant natural, y an m-bit natural, and u an n-bit natural. The maximum value of z is

$$ \left( 2^{n} - 1 \right)\left( 2^{m} - 1 \right) + 2^{n} - 1 = 2^{n + m} - 2^{m}, $$

so z is an (n + m)-bit number. Assume that m is a multiple of k and that the radix-2^k representation of y is Y_{m/k−1} Y_{m/k−2} … Y_0, where each Y_i is a k-bit number. The circuit implements the following set of equations:

$$ \begin{aligned} z^{(0)} &= \left( u + c \cdot Y_{0} \right)/2^{k}, \\ z^{(1)} &= \left( z^{(0)} + c \cdot Y_{1} \right)/2^{k}, \\ z^{(2)} &= \left( z^{(1)} + c \cdot Y_{2} \right)/2^{k}, \\ &\qquad\qquad\ldots \\ z^{(m/k - 1)} &= \left( z^{(m/k - 2)} + c \cdot Y_{m/k - 1} \right)/2^{k}. \end{aligned} $$
(8.19)

Thus,

$$ z^{{\left( {m/k - 1} \right)}} \, \cdot \,\left( { 2^{k} } \right)^{m/k} = u + c\, \cdot \,Y_{0} + c\, \cdot \,Y_{ 1} \, \cdot \, 2^{k} + \ldots + c\, \cdot \,Y_{m/k - 1} \, \cdot \,\left( { 2^{k} } \right)^{m/k - 1} , $$

that is to say

$$ z^{{\left( {m/k - 1} \right)}} \, \cdot \, 2^{m} = c\, \cdot \,y + u. $$
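The recurrence (8.19) can be traced with a short behavioral sketch in Python. The function and variable names are hypothetical and the code is not the authors' VHDL model. The division by 2^k is modeled as a right shift; the k bits shifted out at each step are kept in a separate word (low), which is why the divisions lose no information. The sketch assumes that m is a multiple of k.

```python
def constant_multiply_shift_add(c: int, y: int, u: int, n: int, m: int, k: int = 6) -> int:
    """Return c * y + u, processing one k-bit digit Y_i of y per step."""
    if m % k != 0:
        raise ValueError("m must be a multiple of k")
    if not (0 <= c < (1 << n) and 0 <= u < (1 << n) and 0 <= y < (1 << m)):
        raise ValueError("operand out of range")
    mask = (1 << k) - 1
    acc = u                              # integer part of z, always below 2**n
    low = 0                              # bits shifted out so far
    for i in range(m // k):
        digit = (y >> (k * i)) & mask    # Y_i
        s = acc + c * digit              # (n + k)-bit sum, as in Fig. 8.19
        low |= (s & mask) << (k * i)     # k low-order bits are final result bits
        acc = s >> k                     # right shift by k positions
    return (acc << m) | low              # z * 2**m = c * y + u
```

For instance, constant_multiply_shift_add(200, 3000, 17, n=8, m=12) performs two steps with k = 6 and returns 200 · 3000 + 17.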

The circuit is shown in Fig. 8.20.

Fig. 8.20 n-bit by m-bit constant multiplier

The computation time is approximately equal to

$$ T \cong \left( m/k \right) \cdot \left( T_{LUT - k} + T_{adder} \left( n + k \right) \right). $$
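As an illustration of this estimate, take k = 6 and an operand size m = 24 (a value chosen here only for the example). The multiplier then needs m/k = 4 iterations, so that

$$ T \cong 4 \cdot \left( T_{LUT - 6} + T_{adder} \left( n + 6 \right) \right), $$

against 24 iterations for a radix-2 shift and add multiplier that processes one bit of y per step.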

The following VHDL model describes the circuit of Fig. 8.20 (k = 6).

A complete model sequential_constant_multiplier.vhd is available at the Authors’ web page.

The synthesis of constant multipliers for integers is left as an exercise.

8.6 FPGA Implementations

Several multipliers have been implemented within a Virtex 5-2 device. Devices of this family include Digital Signal Processing (DSP) slices that efficiently perform multiplications (25 bits by 18 bits), additions and accumulations. Apart from multiplier implementations based on LUTs and FFs, more efficient implementations that take advantage of the DSP slices are also reported. As before, the times are expressed in ns and the costs in numbers of Look Up Tables (LUTs), flip-flops (FFs) and DSP slices. All VHDL models as well as several test benches are available at the Authors’ web page.

8.6.1 Combinational Multipliers

The circuit is shown in Fig. 8.3. The synthesis results for several operand sizes n and m are given in Table 8.1.

Table 8.1 Combinational multiplier

A faster implementation is obtained by using the carry-save method (Fig. 8.4; Table 8.2).

Table 8.2 Carry-save combinational multiplier

If multipliers based on the cell of Fig. 8.8b are considered, more efficient circuits can be generated. It is the “by default” option of the synthesizer (Table 8.3).

Table 8.3 Optimized combinational multiplier

Finally, if DSP slices are used, better implementations are obtained (Table 8.4).

Table 8.4 Combinational multiplier with DSP slices

8.6.2 Radix-2^k Parallel Multipliers

Several (m · k)-bit by (n · k)-bit multipliers (Sect. 8.2.4) have been implemented (Table 8.5).

Table 8.5 Radix-2^k parallel multipliers

A faster implementation is obtained by using the carry-save method (Table 8.6).

Table 8.6 Carry-save radix-2^k parallel multipliers

The same circuits have been implemented with DSP slices. The implementation results are given in Tables 8.7 and 8.8.

Table 8.7 Radix-2^k parallel multipliers with DSPs
Table 8.8 Carry-save radix-2^k parallel multipliers with DSPs
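The principle of the radix-2^k parallel multipliers of this subsection, namely splitting both operands into k-bit digits and adding the weighted k-bit by k-bit partial products, can be sketched in Python. This is a behavioral illustration under assumed names (radix_2k_multiply, x_digits, y_digits); it does not model the adder structure (ripple-carry or carry-save) of the actual circuits.

```python
def radix_2k_multiply(x: int, y: int, x_digits: int, y_digits: int, k: int) -> int:
    """Multiply naturals x and y given as x_digits and y_digits radix-2**k digits."""
    if not (0 <= x < (1 << (k * x_digits)) and 0 <= y < (1 << (k * y_digits))):
        raise ValueError("operand out of range")
    mask = (1 << k) - 1
    xs = [(x >> (k * i)) & mask for i in range(x_digits)]   # digits of x
    ys = [(y >> (k * j)) & mask for j in range(y_digits)]   # digits of y
    z = 0
    for j, yd in enumerate(ys):
        for i, xd in enumerate(xs):
            # each k-bit by k-bit product has weight (2**k)**(i + j)
            z += (xd * yd) << (k * (i + j))
    return z
```

With k = 4, for example, radix_2k_multiply(0xABCD, 0x1234, 4, 4, 4) adds 16 partial products and returns 0xABCD · 0x1234.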

8.6.3 Sequential Multipliers

Several shift and add multipliers have been implemented. The implementation results are given in Tables 8.9 and 8.10. Both the clock period T_clk and the total delay (m · T_clk) are given.

Table 8.9 Shift and add multipliers
Table 8.10 Sequential carry-save multipliers

8.6.4 Combinational Multipliers for Integers

A carry-save multiplier for integers is shown in Fig. 8.12. The synthesis results for several numbers n and m of bits are given in Table 8.11.

Table 8.11 Carry-save mod 2^(n+m+1) multipliers

Another option is the modified shift and add algorithm of Sect. 8.4.2 (Fig. 8.15; Table 8.12).

Table 8.12 Modified shift and add algorithm

In Table 8.13, examples of post correction implementations are reported.

Table 8.13 Multipliers with post correction
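The idea of multiplying naturals and then applying a post correction can be sketched in Python for 2's complement operands. The formulation below is the generic textbook identity and may be arranged differently from the circuit of Sect. 8.4.3; the function name and the width parameters nx and ny (total numbers of bits of x and y, sign bit included) are assumptions made for the example.

```python
def multiply_with_post_correction(x: int, y: int, nx: int, ny: int) -> int:
    """Multiply two's complement integers via a natural product plus corrections."""
    if not (-(1 << (nx - 1)) <= x < (1 << (nx - 1))):
        raise ValueError("x does not fit in nx bits")
    if not (-(1 << (ny - 1)) <= y < (1 << (ny - 1))):
        raise ValueError("y does not fit in ny bits")
    xn = x & ((1 << nx) - 1)             # natural with the same bit pattern as x
    yn = y & ((1 << ny) - 1)             # natural with the same bit pattern as y
    p = xn * yn                          # multiplication of naturals
    if x < 0:
        p -= yn << nx                    # x = xn - 2**nx when the sign bit is 1
    if y < 0:
        p -= xn << ny                    # y = yn - 2**ny when the sign bit is 1
    if x < 0 and y < 0:
        p += 1 << (nx + ny)              # cross term 2**nx * 2**ny
    return p
```

The corrections follow from x = xn − x_sign · 2^nx and y = yn − y_sign · 2^ny, expanded as a product.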

As a last option, several Booth multipliers have been implemented (Table 8.14).

Table 8.14 Combinational Booth multipliers

8.6.5 Sequential Multipliers for Integers

Several radix-4 Booth multipliers have been implemented (Fig. 8.17). Both the clock period T_clk and the total delay (m · T_clk) are given (Table 8.15).

Table 8.15 Sequential radix-4 Booth multipliers

8.7 Exercises

  1. Generate the VHDL model of a mixed-radix parallel multiplier (Sect. 8.2.4).

  2. Synthesize a 2n-bit by 2n-bit parallel multiplier using n-bit by n-bit multipliers as building blocks.

  3. Modify the VHDL model sequential_CSA_multiplier.vhd so that the done flag is raised when the final result is available (Comment 8.2).

  4. Generate the VHDL model of a carry-save multiplier with post correction (Sect. 8.4.3).

  5. Synthesize a sequential multiplier based on Algorithm 8.3.

  6. Synthesize a parallel constant multiplier (Sect. 8.5).

  7. Generate models of constant multipliers for integers.

  8. Synthesize a constant multiplier that computes z = c_1 · y_1 + c_2 · y_2 + … + c_s · y_s + u.