1 Introduction

Multiplication is often the critical path of integrated processors, such as digital signal processing [23], cryptography [11] and communication systems [7], as they have higher latency compared to the other arithmetic circuits like addition and subtraction. Moreover, many Internet of Things (IoT) applications, such as health monitoring and surveillance camera, have the limitations of power consumption and extensive computation for burst-mode signal processing. In the IoT applications, individual nodes, also called smart nodes, are often composed of sensors, signal processors and wireless transceivers [28]. To keep smart nodes working for enough time without frequent maintenance, a low-power signal processor with high energy efficiency is adopted to IoT applications [5, 13]. Therefore, it is reasonable to design more efficient multipliers in order to improve the whole device efficiency. One of the most suitable schemes is to use unconventional number systems like residue number systems (RNS) rather than conventional binary number systems. The bottleneck of binary number systems is the carry propagation chain as a function of operand width. This problem deteriorates sharply when the width of operands becomes larger. Here are the reasons which prove that RNS leads to a more efficient number system compared to its counterpart, i.e. binary number system [2,3,4]. RNS divides large numbers to smaller ones, called residues, based on the desired moduli set. Arithmetic operations could be done independently and in parallel on each module. Due to smaller residues, a smaller arithmetic circuit is needed for each module. There is no carry propagation chain among the moduli. These characteristics result in fast and low-power systems.

Although RNS eliminates the inter-carry propagation chain, the intra-carry propagation chain already exists, which bounds RNS speed-up [4]. Applying redundant encoding for representing each residue has turned to be the best answer to this problem [3]. In fact, the redundant number system is another way for accelerating arithmetic circuits independently. Employing the redundant number system whose digit set in radix-r number system contains more than r digits and consequently enables multiple representations for each redundant number can improve arithmetic operations like addition and multiplication through reduction or elimination of the carry propagation [8, 9, 12].

The combination of redundant and residue number systems to improve performance is called redundant residue number system (RRNS). RRNS is a special residue number system with desired moduli set which employs redundant representation to encode the residues. The basic characteristics of RRNS are fast, parallel and low-power arithmetic because of the following reasons: (1) decomposing the input operand into smaller residues, (2) lack of carry propagation among the moduli and (3) removing carry propagation inside the moduli.

The three existing RRNS encodings are binary signed-digit RNS (BSD-RNS) [29], maximally redundant signed-digit RNS (MRSD-RNS) [17] and stored-unibit-transfer RNS (SUT-RNS) [16]. When using high-radix (r > 2) redundant representation to encode the residues, it is called high-radix RRNS. Although BSD-RNS (with radix r = 2) utilizes fully redundant representation and offers faster arithmetic units, the two other ones utilize high-radix (with radix r > 2) redundant representation and enable a trade-off between delay and area overhead by selecting appropriate radix. Recently, MRSD-RNS was presented in [17] and also an adder structure based on this number representation was proposed. The authors indicate that MRSD-RNS adder has the least delay and power among the high-radix RRNS.

This paper extends the work of [17] and modifies the addition algorithm of 2n − 1 MRSD-RNS. It also explores a new modulo 2n − 1 MRSD-RNS multiplier for the first time. The simulation results indicate that the proposed modulo 2n − 1 MRSD-RNS multiplier has the least area, power and PDP for different delays among the RRNS multipliers. It has also the least delay among the existing high-radix RRNS multipliers.

The rest of the paper is organized as follows: In Sect. 2, RNS, redundant and RRNS are described. BSD-RNS and SUT-RNS encodings are also studied in the section. Section 3 develops formal representations and algorithmic notations for maximally redundant signed-digit (MRSD) encoding and efficient addition algorithm, as well as MRSD-RNS encoding and the recently presented adder structure. In Sect. 4, we propose a radix-2h MRSD-RNS multiplication algorithm and structure for modulo 2n − 1. Section 5 includes simulation results and comparison of our proposed MRSD-RNS multiplier with other RRNS multipliers. Finally, Sect. 6 concludes the paper.

2 Preliminaries

The three number systems, RNS, redundant and RRNS are first defined in this section. Then, we focus on existing RRNS encodings.

2.1 Residue Number System (RNS)

Definition 1

(residue number system) RNS is defined by a set of N positive and pairwise relatively prime moduli \( \left\{ {m_{1} ,m_{2} , \ldots ,m_{N} } \right\} \). A binary number R is represented in the determined residue number system as a set of N small integers \( R = \left( {R_{1} ,R_{2} , \ldots ,R_{N} } \right) \), where

$$ R_{i} = \left| R \right|_{{m_{i} }} \quad 0 \le R_{i} < m_{i} \quad {\text{and}}\quad 1 \le i \le N $$

and \( \left| R \right|_{{m_{i} }} \) denotes the residue of R modulo mi. This representation is individual for any integer R in the range [0, M − 1], where \( M = m_{1} \times m_{2} \times \cdots \times m_{N} \) is the dynamic range of the moduli set [4]. \( \square \)

Therefore, RNS converts large numbers to smaller residues according to its moduli. Arithmetic operations like addition, subtraction and multiplication are implemented in parallel on the moduli without any carry propagation between them. So, RNS can support high-speed, carry-limited, low-power and parallel arithmetic.

The sections of a typical RNS architecture are depicted in Fig. 1. As shown in the figure, the RNS is composed of three sections: (1) binary to RNS \( (m_{1} ,m_{2} , \ldots ,m_{N} ) \) conversion, RNS modular arithmetic circuits and finally, RNS \( (m_{1} ,m_{2} , \ldots ,m_{N} ) \) to binary conversion. The input binary number R is converted to the residues \( \left( {R_{1} ,R_{2} , \ldots ,R_{N} } \right) \) associated with the RNS moduli set \( (m_{1} ,m_{2} , \ldots ,m_{N} ) \). (2) RNS arithmetic operations are performed on \( \left( {R_{1} ,R_{2} , \ldots ,R_{N} } \right) \) and produce the final result \( \left( {R_{1}^{{\prime }} ,R_{2}^{{\prime }} , \ldots ,R_{N}^{{\prime }} } \right) \). 3) Finally, the residues \( \left( {R_{1}^{{\prime }} ,R_{2}^{{\prime }} , \ldots ,R_{N}^{{\prime }} } \right) \) are converted to R’ according to the moduli set. RNS is appropriate for applications composed of only addition, subtraction and multiplication. Division, sign detection and magnitude comparison are difficult to be computed in RNS [4].

Fig. 1
figure 1

Typical components of RNS architecture with the moduli set {m1, m2, …, mN}

2.2 Redundant Number System

A redundant number is formed by a collection of digits, each associated with a power-of-2 weight. A digit is also encoded as a collection of weighted bits. Since a redundant number exploits more bits than required to represent a binary number in a conventional binary number system, some numbers have several representations. Redundant number system allows carry-free addition independent of operand width, i.e. the addition is done without propagating a carry signal through the width of operand [1, 9, 31].

The components of a typical redundant number system architecture are illustrated in Fig. 2. As shown in the figure, the redundant number system is composed of three sections: binary to redundant conversion, redundant arithmetic circuits and finally, redundant to binary conversion.

Fig. 2
figure 2

Typical components of a redundant number system

An input operand R is first converted to redundant representation RR. Redundant arithmetic circuits operate on the input redundant operand RR and produce the output operand RR′ with redundant representation. Finally, the output RR′ is converted to the binary number R′.

2.3 Redundant Residue Number System (RRNS)

As mentioned earlier, utilizing redundant number system for representing each module of residue number system leads to more efficient arithmetic units due to its carry-free properties. RRNS is defined as follows:

Definition 2

(redundant residue number system) RRNS is defined by two parameters: (1) the moduli set parameter m = {m1, m2, …, mN} associated with the RNS and (2) the parameter ‘redundant’ associated with the redundant number system. So, we imply RRNS by the pair (m, redundant) where m is the moduli set and redundant is a desired redundant number system. \( \square \)

Without losing generality, Fig. 3 presents the block diagram of a typical RRNS(m, redundant) with the moduli set m = {m1, m2, …, mN} and every desired redundant representation. According to this figure, RRNS(m, redundant) includes five steps:

Fig. 3
figure 3

Generic model of the RRNS(m, redundant) with the moduli set m = {m1, m2,…, mN} and a defined redundant representation

  1. 1.

    Binary to RNS conversion: The input binary number R is converted to the residues {R1, R2, …, RN} according to the RNS moduli set m.

  2. 2.

    RNS residues to redundant conversion: The residues {R1, R2, …, RN} are converted to redundant residues {RR1, RR2, …, RRN} according to the defined redundant representation.

  3. 3.

    RRNS arithmetic circuits for each module: The final results \( \left\{ {RR_{1}^{{\prime }} ,RR_{2}^{{\prime }} , \ldots ,RR_{N}^{{\prime }} } \right\} \) are produced according to the arithmetic operations.

  4. 4.

    Redundant to RNS residues conversion: The redundant residues \( \left\{ {RR_{1}^{{\prime }} ,RR_{2}^{{\prime }} , \ldots ,RR_{N}^{{\prime }} } \right\} \) are converted to the residues \( \left\{ {R_{1}^{{\prime }} ,R_{2}^{{\prime }} , \ldots ,R_{N}^{{\prime }} } \right\} \) according to the defined redundant representation.

  5. 5.

    RNS to binary conversion: The residues \( \left\{ {R_{1}^{{\prime }} ,R_{2}^{{\prime }} , \ldots ,R_{N}^{{\prime }} } \right\} \) are converted to R′ according to the RNS moduli set m.

The first two steps (binary to RNS and RNS to redundant conversions) and also, the last two ones (redundant to RNS and RNS to binary conversions) could be merged to achieve a more efficient implementation. Three RRNS encodings of BSD-RNS [18, 26, 29], MRSD-RNS [17] and SUT-RNS [15, 16, 20] have been explored in the state-of-the-art articles. Nevertheless, BSD-RNS provides the fastest and the most symmetric arithmetic circuits; MRSD-RNS and SUT-RNS (as high-radix RRNSs) offer a trade-off between area and speed through various redundancies. Each number system can be chosen regarding the worst case delay or the available resources listed to a designer. In the rest of the section, we review the BSD-RNS and SUT-RNS encodings.

Definition 3

(BSD-RNS) BSD-RNS which stands for binary signed-digit residue number system is a class of RRNS number representation which utilizes redundant binary signed-digit (BSD) number system [9, 12] to represent the residues. So, BSD-RNS is implied by RRNS(m, BSD) where m is the moduli set. In this number system, each redundant residue RRi (in Fig. 3) is composed of k binary signed-digits where a binary signed-digit has three values in the range {− 1, 0, 1} and \( k = \log_{2} m_{i} \). \( \square \)

To encode the three values in the range {− 1, 0, 1}, one posibit and one negabit could be utilized together [21]. Posibit \( (x_{i} \)) is the well-known bit and has the values in the range {0, 1}. Negabit \( (X_{i} ) \) is the negated bit and has the values in the range {− 1, 0} as depicted in Table 1. It should be mentioned that in this article, posibits and negabits are represented by small and capital words, respectively. According to the table, the posibits (negabits) are represented by black (white) circles in this article. Figure 4 indicates dot notation and symbolic view of a k-digit redundant residue RRi in BSD-RNS, where BSD digits Ai (\( 0 \le i < k \)) consists of a posibit \( x_{i} \) and a negabit \( X_{i} \).

Table 1 Summary of two-valued digit sets: posibit, negabit and unibit [6]
Fig. 4
figure 4

Dot notation and symbolic representation of a k-digit redundant residue A in BSD-RNS [18]

Modulo 2k ± 1 BSD-RNS addition could be performed simply by adding an end-around-carry unit to the BSD adder [18, 22]. Making possible to represent results which are greater than the module value by negative range, only k digits are sufficient for representing modulo 2k + 1.

Modulo 2k ± 1 BSD-RNS multiplication algorithm proposed in [29] is composed of three main steps: (1) partial product generation, (2) addition tree for partial products reduction and (3) final BSD-RNS modulo addition. Some efficient BSD-RNS multipliers have been reported in the articles [14, 19, 24, 26, 30].

Definition 4

(SUT-RNS) SUT-RNS which stands for radix-2h (h > 1) stored-unibit transfer residue number system is a class of RRNS number representation which utilizes radix-2h SUT number system to represent the residues [20]. So, SUT-RNS is implied by RRNS(m, radix-2h SUT). In this number system, each redundant residue RRi (in Fig. 3) is composed of k radix-2h SUT digits where a radix-2h SUT digit, as presented in [6], has the values in the range \( \left[ { - 2^{h - 1} - 1,2^{h - 1} - 1 } \right] \). \( \square \)

Figure 5 indicates dot notation and symbolic representation of a k-digit radix-2h redundant residue in SUT-RNS, where radix-2h SUT digits Ai\( (0 \le i < k) \) consists of (h − 1) posibits \( x_{i}^{h - 2} \ldots x_{i}^{0} \), a negabit \( X_{i}^{h - 1} \) and a unibit \( x_{i}^{{{\prime }0}} \). Unibit is a bit in the range {− 1, 1} and is noted by a white square as shown in Table 1. It is shown in [15] that general building blocks such as full adder, half adder and compressor could be utilized to construct SUT-RNS arithmetic units.

Fig. 5
figure 5

Dot notation and symbolic representation of a k-digit redundant residue A in radix-2h SUT-RNS [20]

In [15, 16, 20], the adder, subtractor and multiplier structures have been presented for the moduli set {2n − 1, 2n, 2n + 1} based on SUT-RNS encoding. Although SUT-RNS increases delay slightly, it outperforms area overhead of the BSD-RNS. Besides, choosing appropriate radix value provides flexible trade-off between delay and area overhead.

3 MRSD and MRSD-RNS Encodings and Addition Algorithms

In this section, we explore formal representations for the definitions and addition algorithms of MRSD and MRSD-RNS. The algorithmic notations are developed with some modifications.

Definition 5

(MRSD encoding) In [13], Avezienis introduced redundant radix-r signed-digit (SD) number systems with digits in the range [− α, α]. The redundant radix-r SD with radixes greater than 2 is called high-radix signed-digit (HRSD) as shown in Fig. 6. In the figure, for digits Ai, where \( 0 \le i < k \), with the intention of eliminating carry propagation chain, α should be in the following range [13]:

$$ \left\lceil {\frac{r + 1}{2}} \right\rceil \le \alpha \le r - 1 $$
(1)

For the cases in which α = r–1 = 2 h-1, the radix-2h maximally redundant SD (MRSD) number systems lead to more efficient SD adders. In fact, the MRSD number is a HRSD number with the maximum possible range. Figure 6 indicates a k-digit radix-2h MRSD number which consists of h posibits \( x_{i}^{h - 1} \cdots x_{i}^{0} \) and a negabit \( X_{i}^{h} \). \( \square \)

Fig. 6
figure 6

Dot notation and symbolic representation of a radix-2h MRSD number system

An improved addition algorithm for MRSD number system was presented in [10] and is stated by Algorithm 1.

Algorithm 1

(Fast MRSD Addition) Let’s assume two k-digit radix-2h MRSD numbers A and B with digits Ai and Bi whose symbolic representations are:

$$ A_{i} = X_{i}^{h} x_{i}^{h - 1} \cdots x_{i}^{0} , B_{i} = Y_{i}^{h} y_{i}^{h - 1} \cdots y_{i}^{0} $$

The MRSD addition consists of two steps:

  • Parallel calculation of transfer digit ti by only considering digits Ai and Bi, where ti belongs to {− 1, 0, 1} and 0 ≤ i < k.

  • Parallel calculation of Si for digits Ai and Bi where 0 ≤ i < k. \( \square \)

The adder is composed of k MRSD adder cells presented in [10] as shown in Fig. 7. As proved in [10], the transfer digit ti+1 is calculated slightly according to the most significant bits (i.e. \( X_{i}^{h} x_{i}^{h - 1} \) and \( Y_{i}^{h} y_{i}^{h - 1} \)). So, the addition is performed independent of h and a carry-free addition is exploited.

Fig. 7
figure 7

Symbolic representation of MRSD addition scheme for two k-digit radix-2h MRSD numbers A and B proposed in [10]

Definition 6

(MRSD-RNS encoding) MRSD-RNS which stands for radix-2h (h > 1) maximally redundant signed-digit residue number system is a class of RRNS number representation which utilizes radix-2h MRSD number system (Definition 5) to represent the residues. So, MRSD-RNS is implied by RRNS(m, radix-2hMRSD). In this number system, each redundant residue is composed of k radix-2h MRSD digits where a radix-2h MRSD digit is composed of h posibits and a negabit defined earlier. \( \square \)

Figure 8 indicates dot notation and symbolic representation of a k-digit radix-2h redundant residue in MRSD-RNS, where radix-2h MRSD-RNS digits Ai (\( 0 \le i < k \)) consists of h posibits \( x_{i}^{h - 1} \cdots x_{i}^{0} \) and a negabit \( X_{i}^{0} \). So, the difference between a MRSD number (Definition 5) and a MRSD-RNS redundant residue (Definition 6) is their negabits position, where the negabit is considered as the most (least) significant bit in MRSD (MRSD-RNS) number system. MRSD-RNS encoding and its adder structure have been recently explored in [17]. The addition algorithm is described with some modifications in the next section.

Fig. 8
figure 8

Dot notation and symbolic representation of a k-digit radix-2h redundant residue A in MRSD-RNS [17]

4 Proposed Radix-2h MRSD-RNS Modulo 2k − 1 Addition and Multiplication Algorithms

This section explores the proposed addition and multiplication algorithms by Algorithms 2 and 3, respectively.

Algorithm 2

(MRSD-RNS Modulo 2n − 1 Addition) Let’s assume two k-digit radix-2h MRSD-RNS redundant residues A and B composed of digits Ai and Bi, where 0 ≤ i < k and n = k.h.

$$ A_{i} = \begin{array}{*{20}l} {x_{i}^{h - 1} \ldots x_{i}^{0} } \hfill & {} \hfill \\ {} \hfill & {X_{i}^{0} } \hfill \\ \end{array} ,\quad B_{j} = \begin{array}{*{20}l} {y_{j}^{h - 1} \ldots y_{j}^{0} } \hfill & {} \hfill \\ {} \hfill & {Y_{j}^{0} } \hfill \\ \end{array} $$
(2)

Since \( \left| {2^{n} X_{0}^{0} } \right|_{{2^{n} - 1}} = \left| {2^{0} X_{0}^{0} } \right|_{{2^{n} - 1}} \), \( X_{0}^{0} \) in Fig. 8 could be considered as the most significant negabit in position 2n. Now, we define new digits \( A_{i}^{{\prime }} \) and \( B_{i}^{{\prime }} \) shown in Fig. 9. The redundant residues A and B are composed of \( A_{i}^{{\prime }} \) and \( B_{i}^{{\prime }} \) defined as follows:

$$ A_{i}^{{\prime }} = X_{i}^{h} x_{i}^{h - 1} \ldots x_{i}^{0} ,\quad B_{i}^{{\prime }} = Y_{i}^{h} y_{i}^{h - 1} \ldots y_{i}^{0} $$
(3)
Fig. 9
figure 9

Defining new digits \( A_{i}^{{\prime }} \) and \( B_{i}^{{\prime }} \) based on the redundant residues A and B, respectively

MRSD-RNS adder of A and B for modulo 2n − 1 is made of three parts.

  1. 1.

    Parallel calculation of transfer digit ti by considering digits \( A_{i}^{{\prime }} \) and \( B_{i}^{{\prime }} \), where ti belongs to {− 1, 0, 1} and 0 ≤ i < k.

  2. 2.

    Parallel calculation of Si for digits \( A_{i}^{{\prime }} \) and \( B_{i}^{{\prime }} \) where 0 ≤ i < k.

  3. 3.

    The transfer produced in the last digit (tk) is ended around to the first digit as an input transfer. For modulo 2n − 1, the re-entrant carry is applied to the least significant digit position because we have:

    $$ \left| {2^{n} t_{k} } \right|_{{2^{n} - 1}} = \left| {(2^{n} - 1)t_{k} + t_{k} } \right|_{{2^{n} - 1}} = t_{k} $$
    (4)

    \( \square \)

It should be also noted that the addition procedure of [17] has been extended by presenting new Algorithm 2 and Figs. 9 and 10. The figures demonstrate appropriately the carry-free property of MRSD-RNS addition that had not been revealed in the previous work.

Fig. 10
figure 10

Architecture of proposed MRSD-RNS modulo 2n − 1 adder

As mentioned in [17], the only difference between MRSD addition (Algorithm 1) and MRSD-RNS addition (Algorithm 2) is in the third part of the above algorithm, i.e. the output transfer digit tk is ignored in MRSD addition, whereas it re-enters as input carry to the first digit in MRSD-RNS addition. In Fig. 10, the addition of two k-digit MRSD-RNS numbers A and B for modulo 2n − 1 is depicted.

Algorithm 3

(MRSD-RNS modulo 2n − 1 multiplication) Let’s assume two k-digit redundant residues A and B in radix-2h MRSD-RNS where n = k.h. The proposed algorithm is composed of three main steps and is depicted in Fig. 11:

Fig. 11
figure 11

Multiplying two k-digit redundant residues A and B in radix-2h MRSD-RNS

Step 1 :

Digit-product computation: radix-2h digit Ai is multiplied by Bj for 0 ≤ i, j < k;

Step 2 :

Digit-products arrangement and rotation to generate partial products, and

Step 3 :

Partial product accumulation

The rest of the section explains the above three steps in detail. Finally, it is concluded by two examples.

4.1 Step 1 of the Algorithm 3: Digit-Product Computation

In the first stage, we have the following inputs and outputs:

  • Inputs: digits Ai and Bj in the form of MRSD-RNS digit where 0 ≤ i, j < k

  • Outputs:k2 digit products Ai × Bj (shown by two-digit MRSD numbers P’jiPji)

In fact, k digits of A are multiplied by k digits of B. So k2 digit products are produced. In the following, we consider the multiplication of radix-2h digits Ai by Bj in MRSD-RNS for 0 ≤ i, j < k. Let’s assume that the symbolic representations of Ai and Bj are shown by (2).

The bit and symbolic representations of multiplying Ai and Bj are revealed in Fig. 12a, b, respectively, according to the first step of Algorithm 3. The final digit product Ai × Bj (shown by P’jiPji) is computed in the form of a two-digit radix-2h MRSD number as depicted in the figure. The digit-product computation could be divided into two parts:

Fig. 12
figure 12

a Bit representation and b symbolic representation of multiplying Ai and Bj according to the first step of Algorithm 3

Part 1) Bit-Product Generation: The first part is to derive the bit products including posibits and negabits. Figure 13 indicates symbolic representations and circuits for the bit-product derivation. There are three possible combinations of bit products as shown in Fig. 13:

Fig. 13
figure 13

Symbolic representations and required circuits for bit production as the first part of step 1 of the Algorithm 3: the production of a two posibits, b one posibit and one negabit, c two negabits

  • multiplying a posibit by a posibit which results in a posibit as shown in Fig. 13a,

  • multiplying a posibit by a negabit which results in a negabit as shown in Fig. 13b,

  • multiplying a negabit by a negabit which results in a posibit as shown in Fig. 13c.

The first type of bit production is for two posibits implemented by an and-gate as depicted in Fig. 13a. However, when multiplying a negabit Y in the range {− 1, 0} by a posibit x in the range {0, 1}, a bit in the range {− 1, 0} is achieved which is equivalent by a negabit. The bit multiplication is done by an and-gate with inverted negabit input Y and inverted negabit output S, as shown in Fig. 13b. Similarly, by multiplying two negabits in the range {− 1, 0}, a bit in the range {0, 1} is obtained which is shown by a posibit. The operation is done by a nor-gate as shown in Fig. 13c.

Part 2) Bit-Product Accumulation: In the second part, the bit products are accumulated to achieve the final digit product. To accumulate the bits, we employ adder structures. Figure 14 shows schematic representations of a full adder (half adder) used to accumulate a set of 3 (2) bits. The sum and carry of each summation are indicated by capital or small s and c letters depending on the input set of posibits and negabits. It is shown in the figure that standard full adders (half adders) could be applied to accumulate three (two) combinations of negabits and posibits. Standard half adder assumes a posibit with the value zero as the third input. However, to obtain the desired final representation, two combinations of equally weighted posibits and negabits utilize a semi-half adder shown by HA+ in the figure. The cell HA+1 assumes a negabit with the value of zero as the third input. So, its structure is a bit different from the standard half adder shown in Fig. 14. For bit-product summation, standard full adders (FA), standard half adders (HA) and semi-half adders (HA+1) represented in Fig. 14 are used. Finally, a two-digit MRSD number is achieved at the end of first step and is shown by \( P_{ji}^{{\prime }} P_{ji} \) as depicted in Fig. 12.

Fig. 14
figure 14

Different functionalities of the full-adder and half-adder cells to accumulate various sets of input posibits and negabits exploited for the second part of step 1 of the Algorithm 3

4.2 Step 2 of the Algorithm 3: Digit-Product Rotation and Partial Product Generation

The inputs and outputs of the second stage are as follows:

  • Inputs:k2 digit products Ai × Bj shown by two-digit MRSD numbers \( P_{ji}^{{\prime }} P_{ji} \)

  • Outputs: 2k partial products in the form of k-digit MRSD-RNS numbers

A two-digit MRSD number \( P_{ji}^{{\prime }} P_{ji} \), produced by multiplying Ai by Bj in the first step, is located in position indices of (i + j) and (i + j+1). In the second step, the input k2 two-digit MRSD numbers, obtained from the previous step, are arranged first to achieve 2k k-digit MRSD numbers. Figure 15 depicts two k-digit radix-2h MRSD-RNS redundant residues, A and B. Afterwards, the digits with position indices more than n (rotation axis in the figure) are rotated based on the difference between their position indices and n. That is, the modular rotation is performed whenever the addition of two indices of Puv or \( P_{uv}^{{\prime }} \) is greater than n. let us considered that q = u+v. Then, Puv or \( P_{uv}^{{\prime }} \) is rotated to \( \left| q \right|_{k} \) and \( \left| {q + 1} \right|_{k} \), respectively.

Fig. 15
figure 15

The first and second steps of the proposed Algorithm 3: reducing K2 two-digit MRSD numbers to 2k k-digit MRSD-RNS partial products (PPi and \( PP_{i}^{{\prime }} \)) by appropriate rotating according to the module

As indicated in Fig. 15, for multiplication of B0 by A, as an example, the only digit whose index is greater than n is \( P_{{\left( {k - 1} \right)0}}^{{\prime }} \) that should be rotated. Similarly, for e multiplication of B1 by A, 3 digits, i.e. \( P_{{_{{1\left( {k - 1} \right)}} }}^{{\prime }} \), P1(k−1) and P’1(k−2), and for the multiplication of Bk-1 by A, all digits except P00 have to be rotated. For modulo 2n − 1, the bits of rotating digits keep their polarity, i.e. posibits remain posibits and negabits remain negabits.

4.3 Step 3 of the Algorithm 3: Partial Products Accumulation

In this step, we have 2k k-digit MRSD-RNS partial products that can be accumulated simply by the proposed MRSD-RNS adder depicted in Fig. 10. So, k MRSD-RNS adders are first required to reduce 2k partial products to k partial products as shown in level 1 of Fig. 16. Then, \( k/2 \) MRSD-RNS adders are utilized for summation in the level 2. The reduction continues until the final result is achieved. The final result is achieved after \( 1 + \log_{2} k \) levels.

Fig. 16
figure 16

MRSD-RNS adder tree structure for summing 2k partial products

4.4 MRSD-RNS Examples

Example 1

(radix-8 MRSD-RNS modulo 29 − 1 multiplication, i.e. h = 3, k = 3 and n = 9) the first step of radix-8 MRSD-RNS multiplication for digits Ai and Bj results in a two-digit MRSD number as revealed in Fig. 17. The figure shows the bit representation of multiplying two radix-8 MRSD-RNS redundant residues digits Ai and Bj.

$$ A_{i} = \begin{array}{*{20}l} {x_{i}^{2} x_{i}^{1} x_{i}^{0} } \hfill & {} \hfill \\ {} \hfill & {X_{i}^{0} } \hfill \\ \end{array} ,\quad B_{j} = \begin{array}{*{20}l} {y_{i}^{2} y_{i}^{1} y_{i}^{0} } \hfill & {} \hfill \\ {} \hfill & {Y_{i}^{0} } \hfill \\ \end{array} $$
Fig. 17
figure 17

Example 1: step 1 of Algorithm 3 (digit-product computation) for two radix-8 MRSD-RNS digits Ai and Bj which results in a two-digit MRSD number \( P_{ji}^{{\prime }} \,P_{ji} \) (left) the numeric example for digit product 3 by 5 which results in 15 (right)

The first part in Fig. 17 indicated as bit product is the product of each bit of Bj multiplied by Ai. As mentioned earlier, the bit products are produced by the structures illustrated in Fig. 13 The second part includes addition of different sets of posibits and negabits separated by grey rectangle utilizing the schemes shown in Fig. 14. The process of bit-product summation is done during some steps until the final result is achieved. The final result is in the form of two-digit MRSD number. As a numeric example, the digit product of 3 by 5 is also shown in the figure (right). Th process of producing the final result 15 is also traceable.

Figure 18 indicates the three steps of radix-8 MRSD-RNS modulo 29 − 1 multiplication. After digit-product generation in the first step, digit products are rotated in the second step. Then, partial product accumulation for the 3-digit radix-8 MRSD-RNS multiplication is performed. The rotated partial products in the form of MRSD-RNS number are accumulated by MRSD-RNS adder.

Fig. 18
figure 18

Example 1: the three steps of the proposed modulo 29 − 1 multiplication algorithm for two 3-digit radix-8 MRSD-RNS numbers

Example 2

(radix-16 MRSD-RNS Digit Production) Figure 19 reveals the bit representation of multiplying two radix-16 MRSD-RNS digits Ai and Bj as the first step of Algorithm 3.

$$ A_{i} = \begin{array}{*{20}l} {x_{i}^{3} x_{i}^{2} x_{i}^{1} x_{i}^{0} } \hfill & {} \hfill \\ {} \hfill & {X_{i}^{0} } \hfill \\ \end{array} ,\quad B_{j} \begin{array}{*{20}l} {y_{i}^{3} y_{i}^{2} y_{i}^{1} y_{i}^{0} } \hfill & {} \hfill \\ {} \hfill & {Y_{i}^{0} } \hfill \\ \end{array} $$
Fig. 19
figure 19

Example 2: step 1 of Algorithm 3 (digit-product computation) for two radix-16 MRSD-RNS digits Ai and Bj which results in a two-digit MRSD number \( P_{ji}^{{\prime }} \,P_{ji} \)

The next two steps are similar to the previous example depicted in Fig. 15.

5 Simulation Results and Comparison

To compare the proposed MRSD-RNS multiplier with the existing RRNS multipliers, first we selected the most efficient existing modulo 2n − 1 multipliers for BSD-RNS [24, 30] and SUT-RNS [20]. Afterwards, we described MRSD-RNS, BSD-RNS and SUT-RNS multipliers in VHDL (VHSIC—Very high speed integrated circuits—Hardware Description Language) and then synthesized them by the Synopsys Design Vision tool. The modulo 2n − 1 RRNS multipliers were synthesized by TSMC-90 nm CMOS technology under typical conditions (1 V, 25 °C).

Figure 20 indicates total delay, PDP and ADP of radix-16 BSD-RNS, SUT-RNS, and the proposed MRSD-RNS multipliers versus different number of bits (n). Due to great range of changes in ADP and PDP diagrams, these two figures are depicted in logarithmic scales and for better understanding, the percentage of each bar is calculated and demonstrated on top of it. Moreover, Fig. 21 indicates total area and total power consumption of the three RRNS multipliers for n = 128, as an example.

Fig. 20
figure 20

Total ADP, PDP and delay of radix-16 MRSD-RNS, BSD_RNS and radix-16 SUT-RNS multipliers versus different number of digits

Fig. 21
figure 21

Total power and area of radix-16 MRSD-RNS, BSD_RNS and SUT-RNS multipliers for modulo 2128 − 1

As depicted in Fig. 20, BSD-RNS outperforms delay of the high-radix RRNSs for radix-16. But PDP and ADP of MRSD-RNS outperform the other RRNSs for n > 32. It is reasonable because the main property of high-radix RRNSs is their flexibility to receive the desired area-time-power constraints with increasing the number of bits. Accordingly, simulation results of RRNS multipliers for n = 64 and 128 are presented in Tables 2 and 3. The tables indicate total area, total power consumption, PDP and ADP of the four radix-16 RRNS modulo 264 − 1 and 2128 − 1 multipliers for their minimum possible delay. As it is reasonable, BSD-RNS multiplier offers the least delay because the carry propagates shorter path in comparison with high-radix RRNSs. However, the proposed MRSD-RNS multiplier has the least area, power, PDP and ADP.

Table 2 Delay, area, power, PDP and ADP of the modulo 264 − 1 RRNS multipliers for minimum delay effort
Table 3 Delay, area, power, PDP and ADP of the modulo 2128 − 1 RRNS multipliers for minimum delay effort

From Fig. 20, it is concluded that MRSD-RNS offers the least ADP and PDP among all RRNS multipliers for large numbers, i.e. when n is greater than 32. Therefore, the proposed MRSD-RNS could be recognized as an efficient RRNS in terms of area-time-power for large numbers. According to Fig. 21 for the cases in which delay is not the limiting factor, our proposed MRSD-RNS multiplier outperforms area and power of the BSD-RNS and SUT-RNS multipliers versus different delays. For example, to develop a multiplier with 7.5 ns delay, radix-16 modulo 2128 − 1 MRSD-RNS multiplier achieves 60.3%, 58.5% and 69.8% less area and 54.2%, 54.6% and 70.2% less power when compared with BSD-RNS [24, 30] and SUT-RNS [20], respectively.

For further enhancing the results, we utilized matrix multiplication benchmark [25, 27] which is the most suitable benchmark for comparing and approving the efficiency of multiplier architectures. Table 4 indicates the results of 2 × 2 square matrix multiplication for modulo 264 − 1 RRNS multipliers. As it is expected, although MRSD_RNS increases delay up to 13.5%, it improves PDP and ADP by 29.2% and 20%, respectively.

Table 4 Delay, area, power, PDP and ADP of the modulo 264 − 1 RRNS multipliers for 2-by-2 matrix multiplication benchmark

Moreover, in order to retrieve design layout, netlist of modulo 264 − 1 MRSD-RNS multiplier is extracted from Synopsis Design Vision tool and applied to Cadence SOC Encounter tool. SOC Encounter is an automatic place and route software from Cadence which enables quick full-chip virtual prototyping to accurately capture downstream physical or electrical impacts. Figure 22 represents the layout view of modulo 264 − 1 MRSD-RNS multiplier.

Fig. 22
figure 22

Layout view of modulo 264 − 1 MRSD-RNS multiplier

6 Conclusion

In order to achieve a good trade-off between delay, area and power, we need to explore new processors to support area-time-energy limited applications. RRNS turns out to be an efficient number system as it eliminates carry propagation chain inside of the RNS moduli. The proposed MRSD-RNS is a kind of RRNS that leads to area-time-power efficient arithmetic units compared to other RRNS encodings.

In this article, we first modified the MRSD-RNS modulo 2n − 1 addition algorithm and then proposed modulo 2n − 1 multiplier for MRSD-RNS encoding. According to the synthesis results, our newly proposed modulo 2n − 1 multiplier for radix-16 MRSD-RNS has less area, power, PDP and ADP than the most efficient RRNS modulo 2n − 1 multipliers for n greater than 32. Besides, MRSD-RNS achieves the least delay among existing high-radix RRNS multipliers. As an example, radix-16 modulo 2128 − 1 MRSD-RNS multiplier outperforms the delay of SUT-RNS multiplier by %29.5, while it achieves 42.2%, 37.7% and 52.5% less PDP and 54.9%, 55.1% and 54.9% less ADP compared to BSD-RNS [24], BSD-RNS [30] and SUT-RNS [20] multipliers, respectively. Moreover, for modulo 2128 − 1 multiplier, as an example, improvements in 60.3%, 58.5% and 69.8% less area and 54.2%, 54.6% and 70.2% less power are achieved for 7.5 ns delay in comparison with BSD-RNS [24], BSD-RNS [30] and SUT-RNS [20], respectively. Therefore, modulo 2n − 1 MRSD-RNS multiplier is a promising RRNS multiplier to realize area-time-power efficient processor.