Abstract
Modular inversion is a very common primitive used for the cryptographic computations. It is the most computation intensive unit which demands more resources as compared to other primitives. Inside the modular inversion arithmetic circuits, considerable speed up with optimized architecture is required. This paper proposes an optimized parallel architecture for Itoh-Tsujii modular inversion algorithm for the field GF(2256) by introducing 23 blocks. The comparative results with conventional architecture show the 30% reduction in LUT requirement with 37% in combinational delay.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
Keywords
- Modular inversion
- Galois Field (GF)
- Fermat’s little theorem
- Euclidean algorithm
- Extended Euclidean algorithm
1 Introduction
Galois Field arithmetic grabs a substantial growth in recent years due to its applications in numerous cryptographic systems. The mathematics in the Galois Field basically includes three types of function: (i) modular addition (ii) modular multiplication and (iii) modular inversion. Among them, the modular inversion has gathered significant attention as its properties are proven to be useful in the field of cryptography. The Extended Euclidean algorithm and Fermat’s little theorem are two most popular methods for large finite field inversion [1]. For extension fields GF(2 m ), the Itoh-Tsujii inversion algorithm [2] is the best alternative. It reduces extension field inversion to inversion in binary field for which inversion operation becomes easier. The inversion in binary field is done either using look-up tables or with a series of binary squaring and multiplication operations. The Itoh-Tsujii algorithm is applicable to finite fields GF(2 m ) in normal basis representation. However, the original reference deals with composite fields GF((2 n ) m ). This paper applies the idea of Itoh-Tsujii algorithm to composite fields GF((2 n ) m ) in polynomial basis representation. Although the use of exponentiation operations required in the algorithm make it much complex for general fields in a polynomial basis representation. The exponentiations can be computed with a very low complexity for certain classes of finite fields.
This paper is organized as follows; Sect. 2 discusses brief overview of mathematical background and related work of Itoh-Tsujii Algorithm, equations for Fermat’s little algorithm and develops a method for realizing Itoh-Tsujii algorithm. Section 3 gives an idea about previous works done in this field. Section 4 proposes the modifications in hardware implementation of Itoh-Tsujii algorithm. Experimental results are discussed in Sect. 5 and Sect. 6 concludes the paper.
2 Mathematical Background
The extended Euclidean algorithm is a modification of the Euclidean algorithm used to calculate GCD of two numbers. It contains recursive division operations that are not suitable for hardware implementation. Hence, this procedure is mainly used in software that is based on modular inversion. The Fermat’s little theorem [1], on the other hand, is mainly based on exponentiation of numbers which are relatively more hardware oriented. This approach is used as a reference to implement modular inversion in hardware architectures. The following subsection describes the mathematics of Fermat’s little theorem.
2.1 Fermat’s Little Theorem Based Inversion in GF(2 m )
Let \( \alpha \) is an element in a Galois Field GF(2 m ). The term \( \alpha^{ - 1} \in GF(2^{m} ) \) can be determined using following expression:
Now the term 1 + 2+… + 2 m−2 can be factorized in two ways:
2.2 Itoh-Tsujii Multiplicative Inversion in GF(2 m )
The Itoh-Tsujii algorithm is the modified form of Fermat’s little theorem. It evaluates the inversion using a series of recursive multiplications and squarings. Factorization of the expression \( 1 + 2 + 2^{2} + \ldots + 2^{{{\text{m}} - 2}} \) is carried out in such a way that minimum number of additions are required for implementation. For example, if m = 9, then the above expression can be decomposed as \( 1 + 2 + 2^{2} + \ldots + 2^{7} = \left( {1 + 2} \right).\left( {1 + 2^{2} } \right).\left( {1 + 2^{4} } \right) \). Since, it includes only three additions, it is most useful for hardware implementation. The number of plus signs in the decomposition of the statement \( 1 + 2+ \ldots + 2 ^{\text{m - 2}} \) denotes the number of multiplications required to implement the inversion. The job of this algorithm is to reduce the number of multiplication blocks as much as possible. The Itoh-Tsujii algorithm is based on the simple idea shown in (2) [3]. From these two expressions one can easily derive that the number of multiplications sufficient to determine the inverse of an element \( \alpha \in GF(2^{m} ) \).
Addition chain can also be used to reduce the number of multiplications [4]. It is a series of successive numbers formulated such that each number can be obtained by addition of two of its precedent numbers. An addition chain with minimum number of elements is generated and inverse can be computed using the expression
Where \( \beta_{k} = \alpha^{{2^{k} - 1}} \)
For simplicity, we shall denote \( \beta_{k} (\alpha ) \) by \( \beta_{k} \). For the analytical approach, we use the identity
For an element \( \alpha \in GF(2^{m} ) \) the inverse can be calculated as \( \alpha^{ - 1} = [\beta_{255} (\alpha )]^{2} \) [5]. The term \( \beta_{255} (\alpha ) \) is obtained using (3) and an addition chain for 255 given by
3 Related Work
Hardware architectures for modular inversion are proposed in [1] for extended Euclidean algorithm and Itoh-Tsujii algorithm, using polynomial as well as Gaussian normal basis. The Itoh-Tsujii algorithm is used to determine the modular inversion for the field GF(2 m ). It was first proposed in [2] for the normal basis representation. A lot of work has been done to improve the original algorithm and make it feasible to analyse for different basis representation. In [4], a theoretical model to implement Itoh-Tsujii algorithm on a k-input LUT based FPGA is presented. This idea was further reviewed in [5], where a modified Itoh-Tsujii algorithm was proposed for efficient implementations on FPGA platforms. A fast implementation of the algorithm was proposed in [6] which can evaluate the inverse in 10 clock cycles for GF(2 233 ) and GF(2 409 ) fields. In [7], the Itoh-Tsujii algorithm is generalized for the fields GF(q m ) using polynomial basis representation. In this paper, we propose an optimized parallel architecture of Itoh-Tsujii algorithm for GF(2 256 ) on FPGA platform.
4 Proposed Work
In this paper, the architecture of Itoh-Tsujii algorithm is modified in order to achieve efficient implementation on FPGA. We assess the analytical complexity of the addition chain shown in Table 1 as follows. The algorithm performs 10 iterations (since \( \beta_{1} (\alpha ) \) is \( \alpha \) itself) and one field multiplication per iteration. Thus, we conclude that a total of 10 field multiplication calculations are required. This is much better than the Fermat’s little theorem implementation which requires 255 multiplications. However, the number of square blocks required is still very high. The Itoh-Tsujii algorithm requires 254 square computation blocks. A hybrid Karatsuba multiplier is used for multiplication operation in binary field. The efficiency of architecture is estimated in terms of maximum combinational delay and power. In case of conventional (parallel) architecture of Itoh-Tsujii algorithm, a large number of cascaded square blocks are used, which degrades the performance of the device. The use of Quad [5] and Octet block improves the speed of modified architecture.
4.1 Significance of Quad Circuits
Since the number of squaring operations is as high as 255 for conventional Itoh-Tsujii algorithm over the GF(2 256 ), we need to improve the circuit in order to reduce the number of blocks for square operation. The quad circuit can be used to overcome this problem [6]. A quad circuit is a block which performs the operation of raising the input by a power of four instead of squaring operation. In Itoh-Tsujii algorithm, we can use any exponentiation circuit of the form 2n. In this paper, the advantages of using 22 circuits on FPGAs for exponentiation in fields with irreducible trinomials are observed. Quad circuits offer the best LUT utilization for an FPGA with four or six input LUTs. The irreducible trinomial for the field is x 9 + x + 1. We observe from Table 2 that the quad circuit’s LUT requirement significantly reduced by around 25% [5]. This is because the quad circuit utilizes FPGA resources better than the squarer. Moreover, since quad is a single stage combinational circuit, both circuits have the same delay of one LUT. These observations are scalable to larger fields like GF(2 233 ) and GF(2 193 ) [4].
The limitation of using quad circuits instead of squarers depends on the fields generated by irreducible polynomial. When irreducible pentanomials are used for generating the field instead of irreducible trinomials, the saving of area is almost negligible due to the fact that a quad circuit and two cascaded squarers will have about the same area. Unfortunately there is no irreducible trinomial for GF(2 256 ) field. However, the combination of squarer and quad computation blocks, significant improvement in overall area delay product is possible. Based on these observations we propose a hybrid-Itoh-Tsujii algorithm for fields generated by irreducible pentanomials which use quad exponentiation circuits as well as squarer circuits. Table 3 shows the evaluation of \( \beta_{255} (\alpha ) \) using an improved Itoh-Tsujii algorithm implemented using hybrid approach.
4.2 Significance of 2 3 Circuits
The idea of combining multiple squarer blocks into a single unit is further explored using a 23 circuit. The proposed logic block is a combinational unit, mathematically equivalent to three cascaded squarer blocks. However, combining multiple blocks into a single unit results in efficient LUT utilization and less computation time. Also, 3 completly divides 255, the architecture consists of 23 blocks only, except at the initial stage for precomputation of the term \( \alpha^{7} \) and the final stage of inversion [5]. Table 4 shows the evaluation of \( \beta_{255} (\alpha ) \) using an optimized Itoh-Tsujii algorithm architecture implemented using 23 blocks. It is observed that the cascade blocks of Quads and Squarer block shown in Table 3 is completely eliminated.
The architecture for Itoh-Tsujii algorithm considering a special class of irreducible polynomial, \( m\left( x \right) = x^{256} + x^{10} + x^{5} + x^{2} + 1 \) for GF(2 256 ) is presented in Fig. 1. It uses field multiplication, field squaring and field Octet operators as its primary building blocks. We also show how this version of the algorithm can be parallelized to improve the efficiency when implemented in hardware platforms.
5 Result and Discussions
The comparison of performance of various exponentiation blocks on the binary fields with irreducible polynomials is shown in Table 5. The use of 2n circuit improves the performance of exponentiation. For Virtex-6 and 7, both area and speed of Octet architecture is improved. This is due to higher utilization factor of respective FPGAs. The purpose of optimization of FPGA design is to ensure that the resources of the device are utilized completely. The smallest programmable unit in the FPGA is the lookup table, which generally has four (or six) inputs. The LUTs are used to implement any Boolean logic function of four (or six) variables. When a logic function with less than four (or six) variables is implemented using LUT, the LUT is underutilized. An optimized implementation is obtained when each LUT is utilized up to the maximum extent. The proposed 23 blocks are best-utilized using Virtex-6 and Virtex-7 FPGA platforms.
Table 6 compares various parameters of our architectures with generic architectures. These results show that in case of finite fields with irreducible pentanomial, the parameters of the circuit can be improved enough to be compared with finite fields with irreducible trinomials. The results show almost 40% improvement in cumulative delay when octet architecture is used.
6 Conclusion
This paper optimizes the parallel Itoh-Tsujii inverse algorithm to implement on FPGA platforms using Squarer, Quad and Octet blocks. Hybrid Itoh-Tsujii algorithm is put forward for the fields generated by irreducible pentanomials. Area-delay product is considerably reduced by using Octet block. The octet block of proposed architecture requires 30% less area and increases the speed of operation by 37%. An FPGA architecture of the Octet Itoh-Tsujii algorithm architecture results in 40% and 20% improved delay as compared with squarer and quad architectures respectively.
References
Trujillo-Olaya, V., Velasco-Medina, J.: Hardware architectures for inversion in GF (2m) using polynomial and gaussian normal basis. In: ANDESCON IEEE 2010 Conference Publications, pp. 1–5 (2010)
Itoh, T., Tsujii, S.: A fast algorithm for computing multiplicative inverses in GF(2m) using normal bases. Inf. Comput. 78(3), 171–177 (1988)
Dimitrov, V., Järvinen, K.: Another look at inversions over binary fields. In: 2013 IEEE 21st Symposium on Computer Arithmetic, pp. 211–218 (2013)
Roy, S.S., Rebeiro, C., Mukhopadhyay, D.: Theoretical modelling of the Itoh-Tsujii inversion algorithm for enhanced performance on k-LUT based FPGAs. In: Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, vol. 1, pp. 1–6, March 2011
Rebeiro, C., Roy, S.S., Reddy, D.S., Mukhopadhyay, D.: Revisiting the Itoh-Tsujii inversion algorithm for FPGA platforms. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19(8), 1508–1512 (2011)
Parrilla, L., Lloris, A., Castillo, E., et al.: Minimum-clock-cycle Itoh-Tsujii algorithm hardware implementation for cryptography applications over GF(2m) fields. Electron. Lett. 48(18), 1126–1128 (2012)
Guajardo, J., Paar, C.: Itoh-Tsujii inversion in standard basis and its application in cryptography and codes. Des. Codes Cryptogr. 25(2), 207–216 (2002)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zode, P., Deshmukh, R.B., Samad, A. (2017). Fast Architecture of Modular Inversion Using Itoh-Tsujii Algorithm. In: Kaushik, B., Dasgupta, S., Singh, V. (eds) VLSI Design and Test. VDAT 2017. Communications in Computer and Information Science, vol 711. Springer, Singapore. https://doi.org/10.1007/978-981-10-7470-7_5
Download citation
DOI: https://doi.org/10.1007/978-981-10-7470-7_5
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7469-1
Online ISBN: 978-981-10-7470-7
eBook Packages: Computer ScienceComputer Science (R0)