Abstract
Multiplication is a basic arithmetic operation whose execution is based on 1-digit by 1-digit multipliers and multi-operand adders. Most FPGA families include the basic components for implementing fast and cost-effective multipliers. Furthermore, they also include optimized fixed-size multipliers which, in turn, can be used for implementing larger-size multipliers.
The basic multiplication algorithm is described in Sect. 8.1. Several combinational implementations are proposed in Sect. 8.2. They correspond to different types of multi-operand adders: iterative ripple-carry adders, carry-save adders, multi-operand adders based on counters, radix-\( 2^{k} \) and mixed-radix adders. Sequential implementations are proposed in Sect. 8.3. They use the shift and add method implemented with either a ripple-carry adder or a carry-save adder. If integer operands are considered, several options are proposed in Sect. 8.4. A first method consists of multiplying B’s complement integers as if they were naturals; the drawback of this conceptually simple method is that the operands must be represented, and multiplied, with as many digits as the final result. Better options are a modification of the shift and add algorithm, multiplication of naturals followed by a post-correction, and the Booth algorithms. Sect. 8.5 describes a LUT-based method for implementing constant multipliers, that is to say, circuits that compute c · y + u, where c is a constant.
8.1 Basic Algorithm
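Consider two radix-B numbers
\[ x = x_{n-1} \cdot B^{n-1} + \ldots + x_{1} \cdot B + x_{0}, \quad y = y_{m-1} \cdot B^{m-1} + \ldots + y_{1} \cdot B + y_{0}, \]
where \( x_{i} \) and \( y_{i} \) belong to {0, 1, …, B − 1}. An n-digit by m-digit multiplier generates a radix-B number
\[ z = z_{n+m-1} \cdot B^{n+m-1} + \ldots + z_{1} \cdot B + z_{0} \]
such that
\[ z = x \cdot y. \]
A somewhat more general definition considers the addition of two additional numbers
\[ u = u_{n-1} \cdot B^{n-1} + \ldots + u_{1} \cdot B + u_{0}, \quad v = v_{m-1} \cdot B^{m-1} + \ldots + v_{1} \cdot B + v_{0}, \]
so that
\[ z = x \cdot y + u + v. \qquad (8.1) \]
Observe that the maximum value of z is
\[ (B^{n} - 1)(B^{m} - 1) + (B^{n} - 1) + (B^{m} - 1) = B^{n+m} - 1. \]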
In order to compute (8.1), first define a 1-digit by 1-digit multiplier: given four B-ary digits a, b, c and d, it generates two B-ary digits e and f such that
\[ a \cdot b + c + d = e \cdot B + f \]
(Fig. 8.1a).
If B = 2, it amounts to a 2-input AND gate and a 1-digit adder (Fig. 8.1b).
An n-digit by 1-digit multiplier made up of n 1-digit by 1-digit multipliers is shown in Fig. 8.2. It computes
\[ z = x \cdot b + u + d, \]
where x and u are n-digit numbers, b and d are 1-digit numbers, and z is an (n + 1)-digit number. Observe that the maximum value of z is
\[ (B^{n} - 1)(B - 1) + (B^{n} - 1) + (B - 1) = B^{n+1} - 1. \]
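The two building blocks described above can be modelled behaviorally. The following Python sketch is illustrative only: the function names, the little-endian digit lists and the assumption that each cell's high digit e feeds the d input of the next cell are choices made here, not taken from Figs. 8.1 and 8.2.

```python
def mul_1x1(a, b, c, d, B):
    """1-digit by 1-digit multiplier: a*b + c + d = e*B + f, digits in 0..B-1."""
    e, f = divmod(a * b + c + d, B)
    return e, f


def mul_nx1(x_digits, b, u_digits, d, B):
    """n-digit by 1-digit multiplier: returns the n+1 digits of x*b + u + d.

    Digit lists are little-endian (index 0 is the least significant digit).
    Assumption: the e output of each cell is the d input of the next one.
    """
    carry = d
    z_digits = []
    for x_i, u_i in zip(x_digits, u_digits):
        carry, f = mul_1x1(x_i, b, u_i, carry, B)
        z_digits.append(f)
    z_digits.append(carry)  # most significant digit of the result
    return z_digits


def to_int(digits, B):
    """Value of a little-endian radix-B digit list."""
    return sum(digit * B**i for i, digit in enumerate(digits))
```

With all inputs at their maximum value B − 1, the cell output is (B − 1, B − 1), which is consistent with the bound on z given above.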
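Using the iterative circuit of Fig. 8.2 as a computation resource, the computation of (8.1) amounts to computing the m n-digit by 1-digit products
\[ z^{(0)} = x \cdot y_{0} + u + v_{0}, \quad z^{(1)} = x \cdot y_{1} + v_{1}, \quad \ldots, \quad z^{(m-1)} = x \cdot y_{m-1} + v_{m-1}, \qquad (8.4) \]
and to adding them, that is
\[ z = z^{(0)} + z^{(1)} \cdot B + \ldots + z^{(m-1)} \cdot B^{m-1}. \qquad (8.5) \]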
For that, one of the multioperand adders of Sect. 7.7 can be used. As an example, if Algorithm 7.2 is used, then z is computed as follows.
Algorithm 8.1: Multiplication, right to left algorithm
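A right-to-left multiplication of this kind can be sketched in Python as follows. This is a behavioral illustration under stated assumptions, not a transcription of Algorithm 8.1: the accumulator scheme and all names are hypothetical, x and u are passed as plain integers, and y and v as little-endian digit lists.

```python
def multiply_right_to_left(x, y_digits, u, v_digits, B):
    """Compute z = x*y + u + v one digit of y at a time, least significant first.

    At step i the accumulator absorbs the n-digit by 1-digit product
    x*y_i + v_i; its least significant digit is the output digit z_i and the
    remaining digits are carried over to the next step.
    """
    acc = u
    z_low = []
    for y_i, v_i in zip(y_digits, v_digits):
        acc, z_i = divmod(acc + x * y_i + v_i, B)
        z_low.append(z_i)
    # The final accumulator holds the most significant digits of z.
    m = len(y_digits)
    return acc * B**m + sum(z_i * B**i for i, z_i in enumerate(z_low))
```

If x and u are smaller than \( B^{n} \), the accumulator never exceeds \( B^{n} - 1 \) after the division, which is why an n-digit by 1-digit component suffices at every step.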
8.2 Combinational Multipliers
8.2.1 Ripple-Carry Parallel Multiplier
The combinational circuit of Fig. 8.3 implements Algorithm 8.1 (with n = 4 and m = 3). One of its critical paths has been shaded. Its computation time is equal to
The following VHDL model describes the circuit of Fig. 8.3 (B = 2).
A complete generic model parallel_multiplier.vhd is available at the Authors’ web page.
8.2.2 Carry-Save Parallel Multiplier
A straightforward modification of the multiplier of Fig. 8.3, similar to the carry-save principle, is shown in Fig. 8.4. The circuit is made up of an n-by-m array of 1-by-1 multipliers, whose computation time is equal to n · T(1,1), plus an m-digit output adder. Its critical path has been shaded. Its computation time is equal to
The following VHDL model describes the circuit of Fig. 8.4 (B = 2).
A complete generic model parallel_csa_multiplier.vhd is available at the Authors’ web page.
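The carry-save principle can be illustrated with a word-level Python model for B = 2. It is a sketch only: it does not reproduce the cell array of Fig. 8.4, and the function name and the bitwise formulation are assumptions made here.

```python
def carry_save_multiply(x, y, m):
    """Multiply natural x by the m-bit natural y with stored-carry accumulation.

    Each partial product is absorbed by a row of full adders (3-to-2 counters)
    without carry propagation; the invariant is s + c = sum of the partial
    products processed so far. A single carry-propagate addition ends the job.
    """
    s, c = 0, 0
    for i in range(m):
        p = (x << i) if (y >> i) & 1 else 0          # partial product x*y_i*2^i
        s, c = s ^ c ^ p, ((s & c) | (s & p) | (c & p)) << 1
    return s + c  # output adder
```

The point of the structure is that the only carry propagation happens in the final addition, which is what shortens the critical path with respect to the ripple-carry array.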
8.2.3 Multipliers Based on Multioperand Adders
A straightforward implementation of Eqs. (8.4) and (8.5) can also be considered (Fig. 8.5). For that, any type of multioperand adder can be used.
Example 8.1
Consider an n-bit by 7-bit multiplier. The 7-operand adder can be divided up into a 7-to-3 counter, a 3-to-2 counter and a ripple-carry adder. The complete structure is shown in Fig. 8.6 and is described by the following VHDL model:
A complete generic model N_by_7_multiplier.vhd is available at the Authors’ web page.
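The decomposition of Example 8.1 can be checked with a short behavioral model. The Python sketch below is hypothetical (names and word-level formulation are not from N_by_7_multiplier.vhd): a 7-to-3 counter is applied to every bit column, a 3-to-2 counter reduces the three resulting words to two, and an ordinary addition stands in for the ripple-carry adder.

```python
def counter_7_to_3(operands):
    """Column-wise 7-to-3 counter: seven words in, three words out, same sum."""
    width = max(op.bit_length() for op in operands)
    w0 = w1 = w2 = 0
    for j in range(width):
        count = sum((op >> j) & 1 for op in operands)  # 0..7 ones in column j
        w0 |= (count & 1) << j
        w1 |= ((count >> 1) & 1) << (j + 1)
        w2 |= ((count >> 2) & 1) << (j + 2)
    return w0, w1, w2


def counter_3_to_2(a, b, c):
    """Row of full adders: three words in, two words out, same sum."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def multiply_n_by_7(x, y):
    """n-bit by 7-bit multiplier built from counters and one final adder."""
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(7)]
    s, c = counter_3_to_2(*counter_7_to_3(partial_products))
    return s + c  # stands in for the ripple-carry adder
```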
Numerous multipliers based on trees of counters have been proposed and reported, among others the Wallace and Dadda multipliers (Wallace [4]; Dadda [3]). Nevertheless, as already mentioned (Comment 7.5), in many cases the best FPGA implementations are based on relatively simple algorithms, which correspond to regular circuits that can take advantage of the special-purpose carry logic circuitry. An example of an efficient FPGA implementation follows.
Consider the set of equations (8.4). If two successive steps are merged into a single step (loop unrolling), the new set of equations is:
and the product is equal to
Assuming that u = 0, the basic operation to implement (8.8) is
to which corresponds the circuit of Fig. 8.7 (with n = 4).
The circuit of Fig. 8.7 can be decomposed into n + 1 vertical slices of the type shown in Fig. 8.8a (with obvious simplifications regarding the first and last slices). Finally, if B = 2 and \( v_{j} = 0 \), the carries of the first line are equal to 0, so that the circuit of Fig. 8.8a can be implemented as shown in Fig. 8.8b.
Comment 8.1
Most FPGAs include the basic components for implementing the structure of Fig. 8.8b, and the synthesis tools have the capability to generate optimized multipliers from a simple VHDL expression, such as
Furthermore, many FPGAs also include fixed-size multiplier blocks.
8.2.4 Radix-\( 2^{k} \) and Mixed-Radix Parallel Multipliers
The basic multiplication algorithm (Sect. 8.1) and the corresponding ripple-carry and carry-save multipliers (Sects. 8.2.1 and 8.2.2) have been defined for any radix B. In particular, radix-\( 2^{k} \) multipliers can be defined. This allows the synthesis of n · k-bit by m · k-bit multipliers using k-bit by k-bit multipliers as building blocks.
The following VHDL model defines a radix-\( 2^{k} \) ripple-carry parallel multiplier. The main iteration consists of m · n instantiations of any type of k-bit by k-bit combinational multiplier that computes z = a · b + c + d and represents z under the form \( z_{H} \cdot 2^{k} + z_{L} \), where \( z_{H} \) and \( z_{L} \) are k-bit numbers:
A complete generic model base_2k_parallel_multiplier.vhd is available at the Authors’ web page.
The stored-carry encoding can also be applied. Once again, the main iteration consists of m · n instantiations of any type of k-bit by k-bit multiplier, and the connections are similar to those of the carry-save multiplier of Sect. 8.2.2. A complete generic model base_2k_csa_multiplier.vhd is available at the Authors’ web page.
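How a large multiplier is assembled from k-bit by k-bit blocks can be sketched behaviorally. The Python model below follows the ripple-carry organization (one row of n blocks per digit of y); the function names and the exact wiring between rows are assumptions made for illustration, not a description of base_2k_parallel_multiplier.vhd.

```python
def block_mul(a, b, c, d, k):
    """k-bit by k-bit block: a*b + c + d = z_H * 2**k + z_L; returns (z_H, z_L)."""
    z = a * b + c + d
    return z >> k, z & ((1 << k) - 1)


def split(value, count, k):
    """Little-endian list of `count` radix-2**k digits of `value`."""
    return [(value >> (i * k)) & ((1 << k) - 1) for i in range(count)]


def radix_2k_multiply(x, y, n, m, k):
    """(n*k)-bit by (m*k)-bit product using m*n calls to block_mul."""
    X, Y = split(x, n, k), split(y, m, k)
    upper = [0] * n            # n digits passed from one row to the next
    z_digits = []
    for j in range(m):         # one row of n blocks per digit of y
        carry = 0
        row = []
        for i in range(n):
            carry, low = block_mul(X[i], Y[j], upper[i], carry, k)
            row.append(low)
        row.append(carry)
        z_digits.append(row[0])  # digit j of the product is final
        upper = row[1:]
    z_digits.extend(upper)
    return sum(digit << (i * k) for i, digit in enumerate(z_digits))
```

For instance, with k = 4, n = 3 and m = 2 the model multiplies a 12-bit number by an 8-bit number using six 4-bit by 4-bit blocks.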
A straightforward generalization of relations (8.2) to (8.5) allows defining mixed-radix combinational multipliers. First consider the circuit of Fig. 8.1a, assuming that
Then
so that z can be expressed under the form
Then, consider the circuit of Fig. 8.2, assuming that x and u are n-digit radix-\( B_{1} \) numbers, and b and d are 1-digit radix-\( B_{2} \) numbers. Thus,
with
Finally, given two n-digit radix-\( B_{1} \) numbers x and u, and two m-digit radix-\( B_{2} \) numbers y and v, compute
Then
Consider the case where
\[ B_{1} = 2^{k_{1}}, \quad B_{2} = 2^{k_{2}}. \]
An easy way to define a VHDL model of the corresponding multiplier consists in first modelling a circuit that implements (8.9). The main iteration consists of n instantiations of any type of \( k_{1} \)-bit by \( k_{2} \)-bit combinational multiplier that computes
\[ z = a \cdot b + c + d = z_{H} \cdot 2^{k_{1}} + z_{L}, \]
where \( z_{H} \) is a \( k_{2} \)-bit number and \( z_{L} \) a \( k_{1} \)-bit number:
Then, it remains to instantiate m rows:
A complete generic model MR_parallel_multiplier.vhd is available at the Authors’ web page.
The circuit defined by the preceding VHDL model is a bidirectional array similar to that of Fig. 8.3, but with more complex connections. As an example, with \( k_{1} = 4 \) and \( k_{2} = 2 \), the connections corresponding to cell (j, i) are shown in Fig. 8.9. As before, a stored-carry encoding circuit could also be designed, but with an even more complex connection pattern. It is left as an exercise.
8.3 Sequential Multipliers
8.3.1 Shift and Add Multiplier
In order to synthesize sequential multipliers, the basic algorithm of Sect. 8.1 can be modified. For that, Eqs. (8.4) are replaced by the following:
Multiply the first equation by B, the second by \( B^{2} \), and so on, and add the resulting equations. The result is
Algorithm 8.2: Shift and add multiplication
A data path for executing Algorithm 8.2 is shown in Fig. 8.10a. The following VHDL model describes the circuit of Fig. 8.10a (B = 2).
The complete circuit also includes an m-state counter and a control unit. A complete generic model shift_and_add_multiplier.vhd is available at the Authors’ web page.
If v = 0, the same shift register can be used for storing both y and the least significant bits of z. The modified circuit is shown in Fig. 8.10b. A complete generic model shift_and_add_multiplier2 is also available.
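The register sharing described for v = 0 can be modelled cycle by cycle. The Python sketch below (B = 2) is a behavioral illustration with hypothetical names; it is not derived from shift_and_add_multiplier2 and leaves out the counter and control unit.

```python
def shift_and_add_shared_register(x, y, u, m):
    """Compute z = x*y + u in m cycles, with y an m-bit natural.

    acc       : parallel register holding the upper part of the running sum
    shift_reg : m-bit shift register, initially y; at the end it holds the
                m least significant bits of z
    """
    acc = u
    shift_reg = y
    for _ in range(m):
        y_0 = shift_reg & 1                       # current multiplier bit
        total = acc + (x if y_0 else 0)           # adder output
        # Shift right: the bit of y just used is dropped and the new
        # product bit enters at the most significant position.
        shift_reg = (shift_reg >> 1) | ((total & 1) << (m - 1))
        acc = total >> 1
    return (acc << m) | shift_reg
```

A product bit inserted at the top of the register needs m further shifts to reach position 0, so it can never be mistaken for a bit of y during the m cycles of the computation.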
The computation time of the circuits of Fig. 8.10 is approximately equal to
8.3.2 Shift and Add Multiplier with CSA
The shift and add algorithm can also be executed with stored-carry encoding. After m steps the result is obtained under the form
and an additional n-digit adder computes
The corresponding data path is shown in Fig. 8.11. The carry-save adder computes
where \( y_{1} \), \( y_{2} \) and \( y_{3} \) are n-bit numbers, and s and c are (n + 1)-bit numbers. At the end of step i, the least significant bit of s is \( z_{i} \), and the n most significant bits of s and c are transmitted to the next step:
The complete circuit also includes an m-state counter and a control unit. A complete generic model sequential_CSA_multiplier.vhd is available at the Authors’ web page. The minimum clock period is equal to the delay of a 1-bit by 1-bit multiplier. Thus, the total computation time is equal to
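The stored-carry version of the shift and add loop can be sketched as follows. This Python model (B = 2, u = v = 0) is a hypothetical illustration of the data path behaviour, not of sequential_CSA_multiplier.vhd; in particular, it represents the carry word c already shifted one position to the left, so that its least significant bit is always 0.

```python
def sequential_csa_multiply(x, y, m):
    """Compute x*y with a carry-save adder in the loop; y is an m-bit natural."""
    s, c = 0, 0      # stored-carry form of the upper part of the running sum
    z_low = 0        # product bits already generated
    for i in range(m):
        p = x if (y >> i) & 1 else 0
        # 3-to-2 reduction: s_new + c_new = s + c + p, without carry propagation
        s_new = s ^ c ^ p
        c_new = ((s & c) | (s & p) | (c & p)) << 1
        z_low |= (s_new & 1) << i            # c_new is even, so this bit is z_i
        s, c = s_new >> 1, c_new >> 1        # upper bits go to the next step
    return ((s + c) << m) | z_low            # final carry-propagate addition
```

Only the last line needs a carry-propagate adder, which is why the clock period of the loop can be as short as the delay of a single cell.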
Comment 8.2
In sequential_CSA_multiplier.vhd the done flag is raised as soon as the final values of the adder inputs are available. A more correct control unit should raise the flag k cycles later, where \( k \cdot T_{clk} \) is an upper bound of the n-bit adder delay. The value of k could be defined as a generic parameter (Exercise 8.3).
8.4 Integers
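Given four B’s complement integers
\[ x = x_{n} \, x_{n-1} \ldots x_{0}, \quad y = y_{m} \, y_{m-1} \ldots y_{0}, \quad u = u_{n} \, u_{n-1} \ldots u_{0}, \quad v = v_{m} \, v_{m-1} \ldots v_{0}, \]
belonging to the ranges
\[ -B^{n} \le x < B^{n}, \quad -B^{m} \le y < B^{m}, \quad -B^{n} \le u < B^{n}, \quad -B^{m} \le v < B^{m}, \]
then z = x · y + u + v belongs to the interval
\[ -B^{n+m+1} \le z < B^{n+m+1}. \]
Thus, z is a B’s complement number of the form
\[ z = z_{n+m+1} \, z_{n+m} \ldots z_{1} \, z_{0}. \]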
8.4.1 Mod \( 2B^{n+m+1} \) Multiplication
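The integer represented by a vector \( x_{n} \, x_{n-1} \, x_{n-2} \ldots x_{1} \, x_{0} \) is
\[ x = -x_{n} \cdot B^{n} + x_{n-1} \cdot B^{n-1} + \ldots + x_{1} \cdot B + x_{0}, \]
while the natural natural(x) represented by this same vector is
\[ natural(x) = x_{n} \cdot B^{n} + x_{n-1} \cdot B^{n-1} + \ldots + x_{1} \cdot B + x_{0}. \]
As \( x_{n} \in \{0, 1\} \), either natural(x) = x or natural(x) = x + \( 2B^{n} \). So,
\[ natural(x) = x \bmod 2B^{n}. \]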
The following method can be used to compute z = x · y + u + v. First, represent the operands x, y, u and v with the same number of digits (n + m + 2) as the result z (digit extension, Sect. 7.8). Then, compute z = x · y + u + v as if x, y, u and v were naturals:
Finally, reduce z modulo \( 2B^{n + m + 1} . \) Assume that before the mod \( 2B^{n + m + 1} \) reduction
then
In particular, if B is even,
Example 8.2
Assume that B = 10, n = 4, m = 3, x = 7918, y = −541, u = −7017, v = 742, and compute z = 7918·(−541) + (−7017) + 742. In 10’s complement: x = 07918, y = 1459, u = 12983, v = 0742.
1. Express all operands with 9 digits: x = 000007918, y = 199999459, u = 199992983, v = 000000742.
2. Compute x · y + u + v: 000007918 · 199999459 + 199992983 + 000000742 = 1583795710087.
3. Reduce 1583795710087 modulo \( 2 \cdot 10^{8} \): (1583795710087) mod \( 2 \cdot 10^{8} \) = (7 mod 2) · \( 10^{8} \) + 95710087 = 195710087.
The result 195710087 is the 10’s complement representation of −4289913.
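The three steps of Example 8.2 can be replayed with a few lines of Python. The helper names below are hypothetical, and the sketch works on integer values rather than digit vectors; it only illustrates the arithmetic of the method.

```python
def to_natural(value, digits, B):
    """Natural represented by the `digits`-digit B's complement form of value."""
    return value % (2 * B ** (digits - 1))


def from_natural(natural, digits, B):
    """Integer whose `digits`-digit B's complement representation is `natural`."""
    half = B ** (digits - 1)
    return natural - 2 * half if natural >= half else natural


def integer_multiply_mod(x, y, u, v, n, m, B):
    """Compute z = x*y + u + v by multiplying naturals modulo 2*B**(n+m+1)."""
    digits = n + m + 2                      # all operands extended to this size
    modulus = 2 * B ** (n + m + 1)
    X, Y, U, V = (to_natural(t, digits, B) for t in (x, y, u, v))
    Z = (X * Y + U + V) % modulus
    return Z, from_natural(Z, digits, B)


# Data of Example 8.2: B = 10, n = 4, m = 3
print(integer_multiply_mod(7918, -541, -7017, 742, 4, 3, 10))
# → (195710087, -4289913)
```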
Thus, any multiplier for natural numbers can be used. As an example, an (n + m + 2)-digit by (n + m + 2)-digit carry-save multiplier could be used (Fig. 8.4). As the result is reduced modulo \( 2B^{n+m+1} \), only the rightmost part of the circuit is used (if B is even), so that there is no output adder, and the most significant digit is reduced mod 2. An example with n = 3 and m = 2 is shown in Fig. 8.12. The corresponding computation time is equal to
This delay is practically the same as that of a carry-save combinational multiplier (8.7). Nevertheless, the number of 1-digit by 1-digit multiplication cells is equal to \( 1 + 2 + 3 + \ldots + (n + m + 2) = (n + m + 2)(n + m + 3)/2 \) instead of n · m.
A very simple way to generate a VHDL model consists of defining (n + m + 2)-bit representations of all operands and instantiating an (n + m + 2)-bit by (n + m + 2)-bit carry-save multiplier:
Only n + m + 2 output bits of the carry-save multiplier are connected to output ports, and the synthesis program will prune the circuit accordingly.
A complete generic model integer_CSA_multiplier.vhd is available at the Authors’ web page.
To conclude, this approach is conceptually attractive because any type of multiplier for natural numbers can be used. Nevertheless, the cost of the corresponding circuits is very high.
8.4.2 Modified Shift and Add Algorithm
Consider again four B’s complement integers
A set of equations similar to (8.12) can be defined:
Multiply the first equation by B, the second by \( B^{2} \), and so on, and add the m + 1 resulting equations. The result is
Algorithm 8.3: Modified shift and add multiplication
In what follows it is assumed that \( v_{m} = 0 \), that is to say v ≥ 0; so, in order to implement Algorithm 8.3, the two following computation primitives must be defined:
and
where
Thus, in the first case,
and in the second case
so that in both cases z is an (n + 2)-digit B’s complement integer and natural(z) = z mod \( 2B^{n+1} \).
The first primitive (8.17) is implemented by the circuit of Fig. 8.13 and the second (8.18) by the circuit of Fig. 8.14. In both circuits, \( z_{n+1} \) is computed modulo 2.
As an example, the combinational circuit of Fig. 8.15 implements Algorithm 8.3 (with n = m = 2). Its cost and computation time are practically the same as in the case of a ripple-carry multiplier for natural numbers. It can be described by the following VHDL model.
A complete generic model modified_parallel_multiplier.vhd is available at the Authors’ web page.
The design of a sequential multiplier based on Algorithm 8.3 is left as an exercise.
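As a starting point for that exercise, a shift and add loop for 2's complement operands can be sketched in Python. This is one common formulation and an assumption on our part, not a transcription of Algorithm 8.3: the first m steps add \( x \cdot y_{i} + v_{i} \), and the last step subtracts \( x \cdot y_{m} \) because the sign bit of y has weight \( -2^{m} \). All names are hypothetical.

```python
def modified_shift_and_add(x, y, u, v, m):
    """Compute z = x*y + u + v for integers x, u, an (m+1)-bit 2's complement y
    (-2**m <= y < 2**m) and an m-bit natural v (0 <= v < 2**m)."""
    y_bits = y & ((1 << (m + 1)) - 1)     # 2's complement bit pattern of y
    acc = u
    z_low = 0
    for i in range(m):
        total = acc + x * ((y_bits >> i) & 1) + ((v >> i) & 1)
        z_low |= (total & 1) << i
        acc = total >> 1                  # arithmetic shift: the sign is kept
    acc -= x * ((y_bits >> m) & 1)        # sign bit of y: weight -2**m
    return (acc << m) + z_low
```

In a sequential circuit the same adder-subtractor could serve both kinds of steps, with the control unit selecting subtraction in the last cycle.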
8.4.3 Post Correction Multiplication
Given four B’s complement integers
then z = x · y + u + v, belonging to the interval \( - B^{n + m + 1} \le \, z < B^{n + m + 1} , \) can be expressed under the form
where \( X_{0} \), \( Y_{0} \), \( U_{0} \) and \( V_{0} \) are four naturals
deduced from x, y, u and v by eliminating the sign bits. Thus, the computation of z amounts to the computation of
that can be executed by any type of multiplier for naturals, plus a post correction that consists of several additions and left shifts.
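If B = 2 and u = v = 0, then
\[ z = X_{0} \cdot Y_{0} + x_{n} \cdot y_{m} \cdot 2^{n+m} - x_{n} \cdot Y_{0} \cdot 2^{n} - y_{m} \cdot X_{0} \cdot 2^{m}. \]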
The (n + m + 2)-bit 2’s complement representations of \( - x_{n} \, \cdot \,Y_{0} \, \cdot \, 2^{n} \) and \( - y_{m} \, \cdot \,X_{0} \, \cdot \, 2^{m} \) are
and
so that the representation of \( x_{n} \cdot y_{m} \cdot 2^{n+m} - x_{n} \cdot Y_{0} \cdot 2^{n} - y_{m} \cdot X_{0} \cdot 2^{m} \) is
A simple modification of the combinational multipliers of Fig. 8.3 and 8.4 allows computing x · y, where x is an (n + 1)-bit 2’s complement integer and y an (m + 1)-bit 2’s complement integer. An example is shown in Fig. 8.16 (n = 3, m = 2). The nand multiplication cells are similar to that of Fig. 8.1b, but for the substitution of the AND gate by a NAND gate [1].
The following VHDL model describes the circuit of Fig. 8.16.
A complete generic model postcorrection_multiplier.vhd is available at the Authors’ web page.
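The decomposition behind the post correction method (B = 2, u = v = 0) can be verified numerically. The Python sketch below uses hypothetical names and plain integer arithmetic; it checks the identity only and says nothing about the NAND-cell array of Fig. 8.16.

```python
def post_correction_multiply(x, y, n, m):
    """Multiply the (n+1)-bit integer x by the (m+1)-bit integer y.

    x = -x_n*2**n + X0 and y = -y_m*2**m + Y0, where X0 and Y0 are the
    naturals obtained by dropping the sign bits.
    """
    x_n = 1 if x < 0 else 0
    y_m = 1 if y < 0 else 0
    X0 = x + (x_n << n)
    Y0 = y + (y_m << m)
    natural_product = X0 * Y0             # any multiplier for naturals
    correction = ((x_n * y_m) << (n + m)) - ((x_n * Y0) << n) - ((y_m * X0) << m)
    return natural_product + correction
```

For example, with n = 3 and m = 2, x = −8 and y = −4 give X0 = Y0 = 0, so the whole product comes from the correction term \( 2^{n+m} = 32 \).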
8.4.4 Booth Multiplier
Given an (m + 1)-bit 2’s complement integer \( y = -y_{m} \cdot 2^{m} + y_{m-1} \cdot 2^{m-1} + \ldots + y_{1} \cdot 2 + y_{0} \), define
\[ y'_{0} = -y_{0}, \quad y'_{i} = y_{i-1} - y_{i}, \quad i = 1, 2, \ldots, m, \]
so that all coefficients \( y'_{i} \) belong to {−1, 0, 1}. Then y can be represented under the form
\[ y = y'_{m} \cdot 2^{m} + y'_{m-1} \cdot 2^{m-1} + \ldots + y'_{1} \cdot 2 + y'_{0}, \]
the so-called Booth’s encoding of y (Booth [2]). Unlike the 2’s complement representation, in which \( y_{m} \) has a specific function, all coefficients \( y'_{i} \) have the same function. Formally, the Booth’s representation of an integer is the same as the binary representation of a natural. The basic multiplication algorithm (Algorithm 8.1), with v = 0, can be used.
Algorithm 8.4: Booth multiplication, z = x·y + u
The following VHDL model describes a combinational circuit based on Algorithm 8.4. A complete generic model Booth1_multiplier.vhd is available at the Authors’ web page.
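A behavioral sketch of radix-2 Booth multiplication is given below in Python. It assumes the standard recoding \( y'_{i} = y_{i-1} - y_{i} \) with \( y_{-1} = 0 \); the function names are hypothetical and the code is not related to Booth1_multiplier.vhd.

```python
def booth_encode(y, m):
    """Booth digits (little-endian, each in {-1, 0, 1}) of the (m+1)-bit
    2's complement integer y, with -2**m <= y < 2**m."""
    digits = []
    previous_bit = 0                      # y_{-1} = 0
    for i in range(m + 1):
        bit = (y >> i) & 1                # Python's >> sign-extends negatives
        digits.append(previous_bit - bit)
        previous_bit = bit
    return digits


def booth_multiply(x, y, u, m):
    """Compute z = x*y + u by adding or subtracting shifted copies of x."""
    z = u
    for i, digit in enumerate(booth_encode(y, m)):
        z += (digit * x) << i
    return z
```

Since every digit has the same role, the sign bit of y needs no special treatment in the loop, which is the point made in the text.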
Higher radix Booth multipliers can be defined. Given an (m + 1)-bit 2’s complement integer \( y = -y_{m} \cdot 2^{m} + y_{m-1} \cdot 2^{m-1} + \ldots + y_{1} \cdot 2 + y_{0} \), where m is odd, define
\[ y'_{0} = y_{0} - 2 \cdot y_{1}, \quad y'_{i} = y_{2i-1} + y_{2i} - 2 \cdot y_{2i+1}, \quad i = 1, 2, \ldots, (m-1)/2, \]
so that all coefficients \( y'_{i} \) belong to {−2, −1, 0, 1, 2}. Then y can be represented under the form
\[ y = y'_{(m-1)/2} \cdot 4^{(m-1)/2} + \ldots + y'_{1} \cdot 4 + y'_{0}, \]
the so-called Booth-2 encoding of y.
Example 8.3
Consider the case where m = 9 and thus (m − 1)/2 = 4. The 2’s complement representation of −137 is 1101110111. The corresponding Booth-2 encoding is −1 2 −1 2 −1 and, indeed, \( -4^{4} + 2 \cdot 4^{3} - 4^{2} + 2 \cdot 4 - 1 = -137 \).
The basic radix-4 multiplication algorithm, with v = 0, can be used.
Algorithm 8.5: Radix-4 Booth multiplication, z = x · y + u
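The radix-4 recoding and the corresponding multiplication loop can be sketched in Python. The names are hypothetical and the code is a behavioral illustration, assuming the recoding \( y'_{i} = y_{2i-1} + y_{2i} - 2 \cdot y_{2i+1} \) with \( y_{-1} = 0 \); it reproduces the encoding of Example 8.3.

```python
def booth2_encode(y, m):
    """Booth-2 digits (little-endian, each in {-2, ..., 2}) of the (m+1)-bit
    2's complement integer y; m must be odd."""
    if m % 2 == 0:
        raise ValueError("m must be odd")

    def bit(j):
        return (y >> j) & 1 if j >= 0 else 0   # y_{-1} = 0

    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range((m + 1) // 2)]


def booth2_multiply(x, y, u, m):
    """Compute z = x*y + u with (m+1)/2 additions of -2x, -x, 0, x or 2x."""
    z = u
    for i, digit in enumerate(booth2_encode(y, m)):
        z += (digit * x) << (2 * i)
    return z


print(booth2_encode(-137, 9))   # → [-1, 2, -1, 2, -1]
```

The list is printed least significant digit first; in this particular example the digit sequence happens to read the same in both directions.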
A sequential implementation is shown in Fig. 8.17. It includes a shift register whose content is shifted two positions at each step, a parallel register and an adder whose second operand is −2x, −x, 0, x or 2x depending on the three least significant bits \( (y_{2i+1}, y_{2i}, y_{2i-1}) \) of the shift register. At each step, two output bits are generated. Hence, the total computation time is equal to \( (m + 1)/2 \cdot T_{clk} \), where \( T_{clk} \) must be greater than the computation time of an (n + 3)-bit adder. Thus,
With respect to a radix-2 shift and add multiplier (Sect. 8.3.1), the computation time has been divided by 2.
The following VHDL model describes the circuit of Fig. 8.17.
The complete circuit also includes an (m + 1)/2-state counter and a control unit. A complete generic model Booth2_sequential_multiplier.vhd is available at the Authors’ web page.
8.5 Constant Multipliers
Given an n-bit constant natural c and an m-bit natural y, the computation of c · y can be performed with any n-bit by m-bit multiplier whose first operand is connected to the constant value c. Then, the synthesis tool will eliminate useless components. In the case of FPGA implementations, an alternative method is to store the constant c within the LUTs.
Assume that the technology at hand includes k-input LUTs. The basic component is a circuit that computes w = c · b, where b is a k-bit natural. The maximum value of w is
\[ (2^{n} - 1)(2^{k} - 1) < 2^{n+k}, \]
so w is an (n + k)-bit number. The circuit is shown in Fig. 8.18, with k = 6. It is made up of n + 6 LUT-6, each of them being programmed in such a way that
Its computation time is equal to T LUT6.
The following VHDL model describes the circuit of Fig. 8.18.
The function LUT_definition defines the LUT contents.
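The idea of storing the constant within the LUTs can be illustrated in Python. The function `lut_definition` below is a hypothetical stand-in for the role that the text attributes to LUT_definition, not a port of the authors' VHDL function: table j holds, for every k-bit input b, bit j of the product c · b.

```python
def lut_definition(c, n, k):
    """Contents of the n + k LUTs of a constant multiplier for the n-bit
    constant c: tables[j][b] is bit j of c*b, for every k-bit input b."""
    return [[(c * b >> j) & 1 for b in range(2 ** k)] for j in range(n + k)]


def lut_multiply(tables, b):
    """Evaluate w = c*b by reading one bit from each LUT."""
    return sum(table[b] << j for j, table in enumerate(tables))
```

With n = 6, c = 45 and k = 6, the model uses twelve 64-entry tables, one per output bit, all addressed by the same 6-bit input; this is why the delay is that of a single LUT.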
The circuit of Fig. 8.18 can be used as a component for generating constant multipliers. As an example, a sequential n-bit by m-bit constant multiplier is synthesized. First define a component similar to that of Fig. 8.2, with x constant. It computes z = c · b + u, where c is an n-bit constant natural, b a k-bit natural, and u an n-bit natural. The maximum value of z is
\[ (2^{n} - 1)(2^{k} - 1) + (2^{n} - 1) = (2^{n} - 1) \cdot 2^{k} < 2^{n+k}, \]
so it is an (n + k)-bit number. It consists of a k-bit by n-bit multiplier (Fig. 8.18) and an (n + k)-bit adder (Fig. 8.19).
Finally, the circuit of Fig. 8.19 can be used to generate a radix-\( 2^{k} \) shift and add multiplier that computes z = c · y + u, where c is an n-bit constant natural, y an m-bit natural, and u an n-bit natural. The maximum value of z is
\[ (2^{n} - 1)(2^{m} - 1) + (2^{n} - 1) = (2^{n} - 1) \cdot 2^{m} < 2^{n+m}, \]
so z is an (n + m)-bit number. Assume that the radix-\( 2^{k} \) representation of y is \( Y_{m/k-1} \, Y_{m/k-2} \ldots Y_{0} \), where each \( Y_{i} \) is a k-bit number. The circuit implements the following set of equations:
Thus,
that is to say
The circuit is shown in Fig. 8.20.
The computation time is approximately equal to
The following VHDL model describes the circuit of Fig. 8.20 (k = 6).
A complete model sequential_constant_multiplier.vhd is available at the Authors’ web page.
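The radix-\( 2^{k} \) shift and add scheme for a constant multiplier can be sketched in Python as follows. The model is behavioral and hypothetical (names, the product table standing in for the LUT-based multiplier, and the absence of a control unit are assumptions made here); it requires m to be a multiple of k.

```python
def sequential_constant_multiply(c, y, u, m, k):
    """Compute z = c*y + u in m/k steps, consuming k bits of y per step."""
    if m % k != 0:
        raise ValueError("m must be a multiple of k")
    products = [c * b for b in range(2 ** k)]   # stands in for the LUT contents
    mask = (1 << k) - 1
    acc = u          # upper part of the running sum
    z_low = 0        # output bits already generated
    for i in range(m // k):
        Y_i = (y >> (i * k)) & mask
        total = products[Y_i] + acc             # LUT multiplier plus adder
        z_low |= (total & mask) << (i * k)      # k output bits per step
        acc = total >> k
    return (acc << m) | z_low
```

If c and u are n-bit numbers, the bound \( (2^{n} - 1) \cdot 2^{k} \) given above implies that the value kept between steps always fits in n bits.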
The synthesis of constant multipliers for integers is left as an exercise.
8.6 FPGA Implementations
Several multipliers have been implemented within a Virtex 5-2 device. Those devices include Digital Signal Processing (DSP) slices that efficiently perform multiplications (25 bits by 18 bits), additions and accumulations. Apart from multiplier implementations based on LUTs and FFs, more efficient implementations, taking advantage of the availability of DSP slices, are also reported. As before, the times are expressed in ns and the costs in numbers of Look Up Tables (LUTs), flip-flops (FFs) and DSP slices. All VHDL models as well as several test benches are available at the Authors’ web page.
8.6.1 Combinational Multipliers
The circuit is shown in Fig. 8.3. The synthesis results for several numbers n and m of bits are given in Table 8.1.
A faster implementation is obtained by using the carry-save method (Fig. 8.4; Table 8.2).
If multipliers based on the cell of Fig. 8.8b are considered, more efficient circuits can be generated. It is the “by default” option of the synthesizer (Table 8.3).
Finally, if DSP slices are used, better implementations are obtained (Table 8.4).
8.6.2 Radix-\( 2^{k} \) Parallel Multipliers
Several m · k bits by n · k bits multipliers (Sect. 8.2.4) have been implemented (Table 8.5).
A faster implementation is obtained by using the carry-save method (Table 8.6).
The same circuits have been implemented with DSP slices. The implementation results are given in Tables 8.7 and 8.8.
8.6.3 Sequential Multipliers
Several shift and add multipliers have been implemented. The implementation results are given in Tables 8.9 and 8.10. Both the clock period \( T_{clk} \) and the total delay (\( m \cdot T_{clk} \)) are given.
8.6.4 Combinational Multipliers for Integers
A carry-save multiplier for integers is shown in Fig. 8.12. The synthesis results for several numbers n and m of bits are given in Table 8.11.
Another option is the modified shift and add algorithm of Sect. 8.4.2 (Fig. 8.15; Table 8.12).
In Table 8.13, examples of post correction implementations are reported.
As a last option, several Booth multipliers have been implemented (Table 8.14).
8.6.5 Sequential Multipliers for Integers
Several radix-4 Booth multipliers have been implemented (Fig. 8.17). Both the clock period \( T_{clk} \) and the total delay (\( m \cdot T_{clk} \)) are given (Table 8.15).
8.7 Exercises
1. Generate the VHDL model of a mixed-radix parallel multiplier (Sect. 8.2.4).
2. Synthesize a 2n-bit by 2n-bit parallel multiplier using n-bit by n-bit multipliers as building blocks.
3. Modify the VHDL model sequential_CSA_multiplier.vhd so that the done flag is raised when the final result is available (Comment 8.2).
4. Generate the VHDL model of a carry-save multiplier with post correction (Sect. 8.4.3).
5. Synthesize a sequential multiplier based on Algorithm 8.3.
6. Synthesize a parallel constant multiplier (Sect. 8.5).
7. Generate models of constant multipliers for integers.
8. Synthesize a constant multiplier that computes \( z = c_{1} \cdot y_{1} + c_{2} \cdot y_{2} + \ldots + c_{s} \cdot y_{s} + u \).
References
1. Baugh CR, Wooley BA (1973) A two’s complement parallel array multiplication algorithm. IEEE Trans Comput C-22:1045–1047
2. Booth AD (1951) A signed binary multiplication technique. Q J Mech Appl Math 4:236–240
3. Dadda L (1965) Some schemes for parallel multipliers. Alta Frequenza 34:349–356
4. Wallace CS (1964) A suggestion for fast multipliers. IEEE Trans Electron Comput EC-13:14–17
© 2012 Springer Science+Business Media Dordrecht
Deschamps, JP., Sutter, G.D., Cantó, E. (2012). Multipliers. In: Guide to FPGA Implementation of Arithmetic Functions. Lecture Notes in Electrical Engineering, vol 149. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2987-2_8
DOI: https://doi.org/10.1007/978-94-007-2987-2_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2986-5
Online ISBN: 978-94-007-2987-2