1 Introduction

Multiplication in characteristic three fields, denoted by \(\mathbb {F}_{3^{n}}\), where n≥1, is employed in curve-based cryptography. The use of these fields in elliptic curve cryptography has been discussed in [11, 14, 16, 17, 19, 21]. Examples of work related to efficient arithmetic in characteristic three fields can be found in [13, 20]. A common method for multiplication in \(\mathbb {F}_{3^{n}}\) is to use polynomial basis representation, in which the elements of \(\mathbb {F}_{3^{n}}\) are represented by polynomials of degree up to (n−1) over \(\mathbb {F}_{3}\), the finite field with three elements. To perform multiplication in \(\mathbb {F}_{3^{n}}\), the polynomials are first multiplied, and the result is then reduced modulo an irreducible polynomial of degree n over \(\mathbb {F}_{3}\). The arithmetic cost of the reduction step is linear in input size; on the other hand, the polynomial multiplication step requires a sub-quadratic complexity. The multiplication step is thus more costly than the reduction step. As a result, reducing the cost of polynomial multiplication over \(\mathbb {F}_{3}\) directly affects the cost of multiplication in \(\mathbb {F}_{3^{n}}\).

Our Contributions

For practical cryptographic applications, polynomial multiplication schemes with low arithmetic complexity are essentially Karatsuba-like algorithms. For example, recursive uses of 2-way and 3-way algorithms have a total arithmetic complexity of 7n 1.58−8n+2 and 6.8n 1.63−8n+2.2, respectively. In this paper, after introducing an improved version of the 2-way and 3-way algorithms, we propose a 3-way polynomial multiplication algorithm with five multiplications using interpolation in \(\mathbb {F}_{9}\). In contrast to the formula presented in [13], we show that the recursive use of the algorithm yields an arithmetic complexity of \(15n^{1.46}-4.85n\log _{3}n-14n\). In addition, the time delay complexity that are useful when the algorithm is mapped on to bit parallel hardware are also derived. The final proposal is a new 3-way algorithm for multiplication of polynomials over \(\mathbb {F}_{3}\) which uses three multiplications of polynomials of 1/3 of the input size over \(\mathbb {F}_{3}\) and one multiplication of polynomial of 1/3 of the input size over \(\mathbb {F}_{9}\) with a total complexity less than 15n 1.46 which is, to our knowledge, superior arithmetic complexity than that available with previously known algorithms.

Organization of the Paper

The remainder of the paper is organized as follows. Notations and preliminaries used in the rest of the paper are provided in Section 2. Relevant known algorithms along with our suggestions for their improvements are presented in Section 3. The next section introduces the proposed 3-way algorithm with five 1/3 sized polynomial multiplications over \(\mathbb {F}_{3}\) and \(\mathbb {F}_{9}\). Section 5 reports the results of our hardware implementation. Further improvements are discussed in Section 6 and concluding remarks are made in the final section.

2 Notations and Preliminaries

This section explains the notations used in the remainder of the paper and describes a few basic algorithms. Unless otherwise stated, the fields employed in the work are assumed to be of characteristic three. The following notations are used:

  • M 3,⊕(n): number of \(\mathbb {F}_{3}\) additions (or subtractions) required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{3}\).

  • M 3,⊗(n): number of \(\mathbb {F}_{3}\) multiplications required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{3}\).

  • M 3(n): number of total \(\mathbb {F}_{3}\) operations required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{3}\), i.e., M 3(n) = M 3,⊕(n) + M 3,⊗(n).

  • M 9,⊕(n): number of \(\mathbb {F}_{3}\) additions (or subtractions) required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{9}\).

  • M 9,⊗(n): number of \(\mathbb {F}_{3}\) multiplications required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{9}\).

  • M 9(n): number of total \(\mathbb {F}_{3}\) operations required for the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{9}\), i.e., M 9(n) = M 9,⊕(n) + M 9,⊗(n).

  • D 3(n): delay complexity associated with the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{3}\).

  • D 9(n): delay complexity associated with the multiplication of two degree n−1 polynomials over \(\mathbb {F}_{9}\).

  • D : latency of an \(\mathbb {F}_{3}\) addition (or subtraction).

  • D : latency of an \(\mathbb {F}_{3}\) multiplication.

We represent the elements of \(\mathbb {F}_{3^{n}}\) as polynomials over \(\mathbb {F}_{3}\) with a degree less than n. Moreover, we construct \(\mathbb {F}_{9}\cong \mathbb {F}_{3}[X]/(X^{2}+1)\) and assume that ω 2+1=0, where \(\omega \in \mathbb {F}_{9}.\)

A further assumption is that multiplication by −1 of a polynomial is cost free and that addition and subtraction have identical complexity. It should also be noted that the cost of the multiplication in \(\mathbb {F}_{9}\) can be assumed to be equivalent to four multiplications and two additions in \(\mathbb {F}_{3}\), based on the following formula:

$$(a+b\omega)(c+d\omega)=ac-bd+(bc+ad)\omega.$$

In addition, no cost is incurred for the multiplication of an element in \(\mathbb {F}_{9}\) by ω since (a + b ω)ω=−b + a ω.

Throughout the paper we make use of the solution to the following recurrence equation.

Lemma 1

Let a,b,ℓ be positive integers, n=b , a≠1, and

$$M(n)=aM(n/b)+cn+d+fn^{\delta}, \;\; M(1)=e$$
  1. (i)

    If a≠b and f=0 then the solution of M(n) is

    $$M(n)=\left( e+{\frac{bc}{a-b}}+{\frac{d}{a-1}} \right)n^{\log_{b}a}-{\frac{bc}{a-b}} n+{\frac{d}{a-1}}.$$
  2. (ii)

    If a=b then the solution of M(n) is

    $$M(n)={\frac{fb^{\delta}}{b^{\delta}-a}}n^{\delta}+\left( e-{\frac{fb^{\delta}}{b^{\delta}-a}}+{\frac{d}{a-1}} \right)n+cn\log_{b}n-{\frac{d}{a-1}}.$$
  3. (iii)

    If a≠b then the solution of M(n) is

    $$\begin{array}{@{}rcl@{}} M(n)&=&{\frac{fb^{\delta}}{b^{\delta}-a}}n^{\delta}+\left( e+{\frac{bc}{a-b}}{~}-{\frac{fb^{\delta}}{b^{\delta}-a}}{~}+\frac{d}{a-1} \right)n^{\log_{b}a}\\ &&-\left( {\frac{bc}{a-b}}\right)n-{~}{\frac{d}{a-1}}. \end{array} $$

Proof

Proofs of (i) and (ii) are in [10] and [7]. For (iii), we substitute the value of M(n/b) into M(n). Then, we have

$$M(n)=a(aM(n/b^{2})+cn/b+d+fn^{\delta}/b^{\delta})+cn+d+fn^{\delta}.$$

This equation yields

$$M(n)=a^{2}M(n/b^{2})+(cn+acn/b)+(d+ad)+(fn^{\delta}+afn^{\delta}/b^{\delta}).$$

When we substitute the value of M(n/b 2) into the last equation and continue this process, we obtain

$$\begin{array}{@{}rcl@{}} M(n)&=&a^{\ell} M(1)+cn(1+a/b+\ldots+(a/b)^{\ell-1})\\ &&+d(1+a+\ldots+a^{\ell-1})+fn^{\delta}(1+(a/b^{\delta})+\ldots\\ &&+(a/b^{\delta})^{\ell-1}). \end{array} $$

After computing the expressions in the parenthesis and using \(a^{\ell }=a^{\log _{b}n}=n^{\log _{b}a}\), we get

$$\begin{array}{@{}rcl@{}} M(n)\!&=&\!\frac{fb^{\delta}}{b^{\delta}-a}n^{\delta}\,+\,\left( e\,+\,\frac{bc}{a-b\!}-\!\frac{fb^{\delta}}{b^{\delta}-a}+\frac{d}{a-1} \right)n^{\log_{b}a}\\ &&-\left( \frac{bc}{a-b}\right)n-\frac{d}{a-1}. \end{array} $$

3 Known Algorithms and their Improvements

This section presents Karatsuba 2-way and 3-way algorithms for characteristic three fields. Following the work in [4] and [23] for improving the corresponding algorithms in characteristic two, we introduce improvements for characteristic three.

Remark 1

To apply recursive 2-way and 3-way algorithms, the polynomials are split in two and three parts, respectively. If n is not divisible by two or three, we pad the polynomial with one or two zeros so that the sizes become divisible by two or three. This adjustment has a negligible effect on the complexity.

3.1 Karatsuba 2-Way Algorithm

Let \(A={\sum }_{i=0}^{n-1} a_{i} X^{i}\) and \(B={\sum }_{i=0}^{n-1} b_{i} X^{i}\). We can divide A and B into two parts as follows: A(X) = A 0 + A 1 X n/2 and B(X) = B 0 + B 1 X n/2 where A i and B i are polynomials of degree less than n/2. Let \(C={\sum }_{i=0}^{2} C_{i} X^{ni/2}\) be the product of A and B. The Karatsuba 2-way algorithm [12, 15] is the following:

$$ \left\{\begin{array}{l} P_{0}=A_{0}B_{0},\;\; P_{1}=(A_{0}+A_{1})(B_{0}+B_{1}),\;\;P_{2}=A_{1}B_{1}, \\ C=P_{0}+(P_{1}-P_{0}-P_{2})X^{n/2}+P_{2}X^{n}. \end{array} \right. $$
(1)

The algorithm given in Eq. 1 requires three multiplications of two degree n/2−1 polynomials plus 4n−4 additions. On the other hand, the delay complexity of the algorithm is D 3(n/2)+3D . The recursive use of this algorithm is thus associated with the following complexities:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)& \leq &3M_{3,\otimes}(n/2), \, M_{3,\otimes}(1)=1, \\ M_{3,\oplus}(n)&\leq &3M_{3,\oplus}(n/2)+4n-4, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq &3M_{3}(n/2)+4n-4, \, M_{3}(1)=1,\\ D_{3}(n)&\leq &D_{3}(n/2)+3D_{\oplus}, D_{3}(1)=D_{\otimes} . \end{array} \right. $$
(2)

Using Lemma 1, we obtain the following bounds:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq &n^{\log_{2}3}, \\ M_{3,\oplus}(n)&\leq &6n^{\log_{2}3}-8n+2,\\ M_{3}(n)&\leq &7n^{\log_{2}3}-8n+2,\\ D_{3}(n)&\leq &3(\log_{2}n)D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(3)

3.2 Improved Karatsuba 2-Way Algorithm

Using the algorithm given in [4] enables us to reconstruct part of the algorithm given in Eq. 1 as follows:

$$ \left\{\begin{array}{l} P_{0}=A_{0}B_{0},\;\; P_{1}=(A_{0}+A_{1})(B_{0}+B_{1}),\;\;P_{2}=A_{2}B_{2}, \\ C=(X^{n/2}-1)(X^{n/2}P_{2}-P_{0})+P_{1}X^{n/2}. \end{array}\right. $$
(4)

The algorithm given in Eq. 4 requires three multiplications of two degree n/2−1 polynomials plus 7n/2−3 additions. The delay complexity of the algorithm is D 3(n/2)+3D . Therefore, using this algorithm recursively yields the following complexity:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&3M_{3,\otimes}(n/2), \, M_{3,\otimes}(1)=1, \\ M_{3,\oplus}(n)&\leq&3M_{3,\oplus}(n/2)+7n/2-3, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq&3M_{3}(n/2)+7n/2-3, \, M_{3}(n)=1,\\ D_{3}(n)&\leq&D_{3}(n/2)+3D_{\oplus}, D_{3}(1)=D_{\otimes}. \end{array} \right. $$
(5)

Using Lemma 1 gives the following bounds:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq& n^{\log_{2}3}, \\ M_{3,\oplus}(n)&\leq& 5.5n^{\log_{2}3}-7n+1.5,\\ M_{3}(n)&\leq&6.5n^{\log_{2}3}-7n+1.5,\\ D_{3}(n)&\leq& 3(\log_{2}n)D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(6)

Compared to Eq. 3, using Eq. 6 results in about 7% reduction in the number of arithmetic operations. The delay complexities of Eqs. 3 and 6 are the same.

3.3 Karatsuba Like 3-Way Algorithm

As before, let \(A={\sum }_{i=0}^{n-1} a_{i} X^{i}\) and \(B={\sum }_{i=0}^{n-1} b_{i} X^{i}\). This time we divide A and B into three parts as follows: A(X) = A 0 + A 1 X n/3 + A 2 X 2n/3 and B(X) = B 0 + B 1 X n/3 + B 2 X 2n/3 where A i and B i are polynomials of degree less than n/3. To compute the product C = A B, a Karatsuba-like 3-way algorithm, which can be obtained using the Chinese remainder theorem [22] and from [18], can be expressed as follows:

$$ \left\{\begin{array}{l} P_{0}=A_{0}B_{0},\;\; P_{1}\,=\,A_{1}B_{1},\;\;P_{2}\,=\,A_{2}B_{2},\;\;P_{3}\,=\,(A_{0}+\!A_{1})(B_{0}+B_{1}),\\[-.5pt] P_{4}=(A_{0}+A_{2})(B_{0}+B_{2}),\;\;P_{5}=(A_{1}+A_{2})(B_{1}+B_{2}).\\[-.5pt] C=P_{0}+(P_{3}-P_{0}-P_{1})X^{n/3}+(P_{4}+P_{1}-P_{0}-P_{2})X^{2n/3}+\\[-.5pt] (P_{5}-P_{1}-P_{2})X^{3n/3}+P_{2}X^{4n/3}. \end{array}\right. $$
(7)

The algorithm given in Eq. 7 requires six multiplications of two degree n/3−1 polynomials plus 2n additions for P i ’s, 14n/3−7 additions for the coefficients of C and 4n/3−4 additions for overlaps. On the other hand, the delay complexity of the algorithm is D 3(n/3)+4D . The recursive use of this algorithm is therefore associated with the following complexity:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&6M_{3,\otimes}(n/3), \, M_{3,\otimes}(1)=1, \\[-.5pt] M_{3,\oplus}(n)&\leq&6M_{3,\oplus}(n/3)+8n-11, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq&6M_{3}(n/3)+8n-11, \, M_{3}(1)=1,\\[-.5pt] D_{3}(n)&\leq&D_{3}(n/3)+4D_{\oplus}, D_{3}(1)=D_{\otimes}. \end{array} \right. $$
(8)

Applying Lemma 1 gives the following bounds:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq& n^{\log_{3}6}, \\[-.5pt] M_{3,\oplus}(n)&\leq&5.8n^{\log_{3}6}-8n+2.2,\\[-.5pt] M_{3}(n)&\leq&6.8n^{\log_{3}6}-8n+2.2,\\[-.5pt] D_{3}(n)&\leq&4(\log_{3})nD_{\oplus}+D_{\otimes}. \end{array} \right. $$
(9)

3.4 Improved Karatsuba Like 3-Way Algorithm

This section presents our redesign of the reconstruction part of the algorithm in Eq. 7 with the use of a technique similar to that reported in [6, 23]. It should be noted that the degree of P i products for 0≤i≤5 is 2n/3−2. We divide each P i into two parts as P i = P i L + x n/3 P i H where P i L is a degree n/3−1 polynomial and P i H is a degree n/3−2 polynomial. Substituting those representations of the products into the reconstruction part of Eq. 7 gives the following:

$$\begin{array}{@{}rcl@{}} C&=&P_{0L}+X^{n/3}(P_{0H}-P_{0L}-P_{1L}+P_{3L})+X^{2n/3}\\ &&\times(-P_{0L}-P_{0H}+P_{1L}-P_{1H}-P_{2L}+P_{3H}+P_{4L})\\ &&+X^{3n/3}(-P_{0H}+P_{1H}-P_{2L}-P_{2H}+P_{4H}+P_{5L}-P_{1L})\\ &&+X^{4n/3}(-P_{1H}+P_{2L}-P_{2H}+P_{5H})+X^{5n/3}P_{2H}. \end{array} $$
(10)

It should be noted that Eq. 10 contains no overlaps so that we compute only the cost of the coefficients. The algorithm can be improved through the observation of some common terms R 1 = P 0H P 1L and R 2 = P 1H P 2L in Eq. 10. We then have,

$$\begin{array}{@{}rcl@{}} C&=&P_{0L}+X^{n/3}(R_{1}-P_{0L}+P_{3L})+X^{2n/3}\\ &&\times(-R_{1}-P_{0L}-P_{1H}-P_{2L}+P_{3H}+P_{4L})\\ &&+X^{3n/3}(R_{2}-P_{0H}-P_{1L}-P_{2H}+P_{4H}+P_{5L})\\ &&+X^{4n/3}(-R_{2}-P_{2H}+P_{5H})+X^{5n/3}P_{2H}. \end{array} $$
(11)

The number of additions required in Eq. 11 is computed as follows: Computing (A 0 + A 1),(B 0 + B 1),(A 0 + A 2),(B 0 + B 2),(A 1 + A 2) and (B 1 + B 2) requires 2n additions. Computing R 1 and R 2 requires n/3−1 additions each. On the other hand, we need 2n/3 additions for (R 1P 0L + P 3L ), 5n/3−2 additions for −R 1P 0L P 1H P 2L + P 3H + P 4L , 5n/3−3 additions for R 2P 0H P 1L P 2H + P 4H + P 5L and 2n/3−2 additions for −R 2P 2H + P 5H . The delay complexity of the algorithm is the same as that of the previous one. The recursive use of this algorithm results in the following complexity:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&6M_{3,\otimes}(n/3), \, M_{3,\otimes}=1, \\ M_{3,\oplus}(n)&\leq&6M_{3,\oplus}(n/3)+22n/3-9, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq&6M_{3}(n/3)+22n/3-9, \, M_{3}(n)=1,\\ D_{3}(n)&\leq&D_{3}(n/3)+4D_{\oplus}, \, D_{3}(1)=D_{\otimes}. \end{array} \right. $$
(12)

Applying Lemma 1 gives the following solutions:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&n^{\log_{3}6}, \\ M_{3,\oplus}(n)&\leq&5.53n^{\log_{3}6}-7.33n+1.8,\\ M_{3}(n)&\leq&6.53n^{\log_{3}6}-7.33n+1.8,\\ D_{3}(n)&\leq& 4(\log_{3}n)D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(13)

The results in Eq. 13 represents a reduction of the complexity of Eq. 9 by approximately 4%. The delay complexities are the same.

Figure 1
figure 1

Multi-evaluation (left) and reconstruction (right) data flow for multiplication in \(\mathbb {F}_{9}[X]\).

4 Proposed 3-Way Algorithm with Five Multiplications

In this section, we introduce a 3-way algorithm for multiplying polynomials of degree n−1 over \(\mathbb {F}_{3}\) with five multiplications using interpolation in \(\mathbb {F}_{9}\). Let \(A={\sum }_{i=0}^{n-1} a_{i} X^{i}\) and \(B={\sum }_{i=0}^{n-1} b_{i} X^{i}\). We can divide A and B into three parts as follows: A(X) = A 0 + A 1 X n/3 + A 2 X 2n/3 and B(X) = B 0 + B 1 X n/3 + B 2 X 2n/3 where A i and B i are polynomials of degree less than n/3. Let \(C={\sum }_{i=0}^{4} C_{i} X^{in/3}\) be the product of A and B. Recall that \(\mathbb {F}_{9}=\mathbb {F}_{3}[X]/(X^{2}+1)\) and ω is a root of X 2+1=0 in \(\mathbb {F}_{9}\).

To obtain an algorithm for the product C = A B with five multiplications, we use the interpolation method which yields Toom-Cook like formulas [5, 79, 13, 22]. Since \(\mathbb {F}_{3}\) has insufficient points for the interpolation method, we use an element from \(\mathbb {F}_{9}\), i.e., we use the points \(0,1,2,\infty \) and ω as evaluation points. Evaluation of A B = C at those points then gives us the following system of linear equations in \(\mathbb {F}_{9}\):

$$\begin{array}{@{}rcl@{}} &&{}\text{Evaluation at}~X=0 \Longrightarrow P_{0}=A_{0}B_{0}=C_{0} \\ &&{}\text{Evaluation at}~X=1 \Longrightarrow P_{1}=(A_{0}+A_{1}+A_{2})\\ &&\times(B_{0}+B_{1}+B_{2})=C_{0}+C_{1}+{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=-1 \Longrightarrow P_{2}=(A_{0}-A_{1}+A_{2})\\ &&\times(B_{0}-B_{1}+B_{2})=C_{0}-C_{1}+{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=\omega \Longrightarrow P_{3}=(A_{0}+A_{1}\omega-A_{2})\\ &&\times(B_{0}+B_{1}\omega-B_{2})=C_{0}+C_{1}\omega-{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=\infty \Longrightarrow P_{4}=A_{2}B_{2}=C_{4}. \end{array} $$

Solving this system of linear equations yields the following algorithm for computing the product C = A B:

$$ \left\{\begin{array}{l} C_{0}=P_{0},\\ C_{1}=(P_{1}-P_{2})-(-P_{0}+P_{1}+P_{2}-P_{3}-P_{4})\omega,\\ C_{2}=-(P_{0}+P_{1}+P_{2}+P_{4}),\\ C_{3}=(P_{1}-P_{2})+(-P_{0}+P_{1}+P_{2}-P_{3}-P_{4})\omega,\\ C_{4}=P_{4}. \end{array}\right. $$
(14)

We now compute the cost of the recursive use of this algorithm. Assume that A and B are degree n−1 polynomials. A 0,A 1,A 2,B 0,B 1 and B 2 are therefore degree (n/3−1) polynomials. The cost associated with the operations are listed in Table 1.

Table 1 Cost of multi-evaluation and reconstruction for the new three-way split formulas.

Remark 2

For the \(\mathbb {F}_{3}\) computations indicated in Table 1, the cost of both U 4 and U 5 is zero since these are related to the ω-free part of the results .

To compute the delay complexity for the multiplication in \(\mathbb {F}_{9}[X]\), we have drawn the multi-evaluation and reconstruction data flow shown in Fig. 1. As can be seen from the figure, the critical path for the evaluation requires two additions. It begins at A 0 and ends at R 2. On the other hand, the critical path for the reconstruction needs four additions that starts from P 1 and continues through U 2, U 4, and U 5, ending at C 1. It should be noted that multiplication by ω is cost free and not counted in the complexity analysis. The last consideration is the requirement for one addition in the final overlap, so we obtain the complexity in Eq. 15.

$$ \left\{ \begin{array}{lll} M_{9,\otimes}(n)&\leq&5M_{9,\otimes}(n/3), \, M_{9,\otimes}(1)=4, \\ M_{9,\oplus}(n)&\leq&5M_{9,\oplus}(n/3)+20n-24, \, M_{9,\oplus}(1)=2,\\ M_{9}(n)&\leq&5M_{9}(n/3)+20n-24, \, M_{9}(1)=6,\\ D_{9}(n)&\leq&D_{9}(n/3)+7D_{\oplus}, D_{9}(1)=D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(15)

Applying Lemma 1 leads to these bounds:

$$ \left\{ \begin{array}{lcl} M_{9,\otimes}(n)&\leq&4n^{\log_{3}5}, \\ M_{9,\oplus}(n)&\leq&26n^{\log_{3}5}-30n+6,\\ M_{9}(n)&\leq&30n^{\log_{3}5}-30n+6,\\ D_{9}(n)&\leq&(7\log_{3}n+1)D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(16)

We now obtain the following complexity for M 3(n) by substituting the result from Eq. 15 into M 3(n) and applying Lemma 1:

$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&4M_{\otimes}(n/3)+M_{9,\otimes}(n/3), \, M_{3,\otimes}(1)=1,\\ M_{3,\oplus}(n)&\leq&4M_{3,\oplus}(n/3)+M_{9,\oplus}(n/3)+8n-10, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq&4M_{3}(n/3)+M_{9}(n/3)+8n-10, \, M_{3}(1)=6,\\ D_{3}(n)&\leq &D_{9}(n/3)+7D_{\oplus}. \end{array} \right. $$
(17)
$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&4n^{\log_{3}5}-3n^{\log_{3}4}, \\ M_{3,\oplus}(n)&\leq&26n^{\log_{3}5}-33.33n^{\log_{3}4}+6n+1.33,\\ M_{3}(n)&\leq&30n^{\log_{3}5}-36.33n^{\log_{3}4}+6n+1.33,\\ D_{3}(n)&\leq&(7\log_{3}n+1)D_{\oplus}+D_{\otimes}, \, D_{3}(1)=D_{\otimes}. \end{array} \right. $$
(18)

When we compare the total number of arithmetic operations in Eq. 18 with that in Eq. 13, which is the best known 3-way algorithm, we see that the algorithm presented in this section becomes cost effective after the size of polynomials is more than 729. The percentage of cost reduction increases significantly when the polynomial size increases. For example, the reduction is about 4% for n=37, 14% for n=38, 25% for n=39, and 55% for n=310. The delay complexity in Eq. 18 is about 75% higher than that in Eq. 13 and this relative difference remains the same for all practical values of n. This is because the dominant coefficients for the delay complexities in Eqs. 18 and 13 are \(7\log _{3} n\) and \(4\log _{3} n\), respectively.

5 Complexity Analysis and Implementation Results for Practical n Values

This section presents the arithmetic complexity analysis and hardware implementation of polynomial multiplications over \(\mathbb {F}_{3}\) and \(\mathbb {F}_{9}\) for n=167,193,239,317 and 353. It should first be noted that we use 2-way or 3-way splits. If n is not divisible by two or three, we pad the polynomial with one or two zeros so that the sizes become divisible by three. The adjustment has a negligible effect on the complexity. Another note is that using the same algorithm in every recursion until the size becomes unity fails to produce the best results and that employing the schoolbook method after the size becomes small enough yields a better outcome. We recall that the schoolbook method requires n 2 multiplications and (n−1)2 additions in order to multiply two degree (n−1) polynomials leading to

$$ M_{3}(n+1)\leq M_{3}(n)+4n. $$
(19)

When we refer to the schoolbook method it is implied that we are computing M 3(n+1) in terms of M 3(n).

To demonstrate the effect of this approach, we can consider, for example, M 3(8). Using the improved Karatsuba method in each recursion gives

$$M\!_{3}(8)\,=\,3M_{3}(4)\,+\,25\,=\,9M_{3}(2)+58=27M_{3}(1)+94=123. $$

On the other hand, using the schoolbook method after n=4 gives

$$M_{3}(8)=3M_{3}(4)+25=3\cdot25+25=100. $$

We use the same strategy for the multiplication in \(\mathbb {F}_{9}[X]\). It can be observed that using the schoolbook method and the improved Karatsuba 2-way method together for multiplication in \(\mathbb {F}_{9}[X]\) yields better results for small values of n. Recalling M 9(1)=6 from Section 2 leads to an easy determination that the improved Karatsuba 2-way method for multiplication in \(\mathbb {F}_{9}[X]\) gives

$$ M_{9}(n)\leq 3M_{9}(n/2)+7n-6, $$
(20)

and that the schoolbook method in \(\mathbb {F}_{9}[X]\) gives

$$ M_{9}(n+1)\leq M_{9}(n)+16n+4. $$
(21)

As a further refinement, we use an additional strategy and proceed as follows: We split the degree (n−1) polynomials that will be multiplied into two parts by extracting ω part and ω-free part, i.e., we write \(A,B \in \mathbb {F}_{9}[X]\) as A = A 0 + A 1 ω and B = B 0 + B 1 ω where \(A_{0},A_{1},B_{0},B_{1} \in \mathbb {F}_{3}[X]\) of degree (n−1). Then

$$\begin{array}{@{}rcl@{}} AB&=&(A_{0}+A_{1}\omega)(B_{0}+B_{1}\omega)=A_{0}B_{0}-A_{1}B_{1}\\ &&+((A_{0}+A_{1})(B_{0}+B_{1})-A_{0}B_{0}-A_{1}B_{1}) \omega, \end{array} $$
(22)

i.e., M 9(n)≤3M 3(n)+8n−3 and M 9(3)≤60.

We are now ready to compute the arithmetic cost of multiplication for n=167, 193, 249, 317, and 353. We have employed the following abbreviations for the algorithm names:

  • KA for the improved Karatsuba 2-way in \(\mathbb {F}_{3}[X]\) as presented in Section 3.2

  • SB for the schoolbook method in \(\mathbb {F}_{3}[X]\)

  • K A 9 for the improved Karatsuba 2-way in \(\mathbb {F}_{9}[X]\) as given by Eq. 20

  • A19 for the new 3-way algorithm for multiplication in \(\mathbb {F}_{9}[X]\), as explained in Section 4

  • A29 for multiplication in \(\mathbb {F}_{9}[X]\) as given by Eq. 22

It should be noted that when we recursively use an algorithm A times, we write (A). The recursions listed in Tables 2 are also used in our hardware implementations.

Table 2 Complexities for polynomial multiplication over \(\mathbb {F}_{3}\) and \(\mathbb {F}_{9}\) for special n values.

An additional note is that the classical approach for polynomial multiplication over \(\mathbb {F}_{9}\) is the method expressed in Eq. 22, which has

$$ M_{9}(n)/ M_{3}(n)\approx 3. $$
(23)

Our proposed algorithms reduce this ratio to 2.57 or lower for the values indicated in Table 2, representing an improvement of about 15%.

For hardware implementation using digital technologies, we represent the elements of \(\mathbb {F}_{3}\) as (x 1,x 2) where x 1, x 2∈{0,1} and the elements of \(\mathbb {F}_{3}\), 0, 1, and 2 are represented by (0,0), (1,0) and (0,1), respectively. The addition of the elements of \(\mathbb {F}_{3}\) is performed using the method reported in [20]. Let (x 1,x 2), (y 1,y 2), \((z_{1},z_{2}) \in \mathbb {F}_{3}\) such that (x 1,x 2)+(y 1,y 2)=(z 1,z 2). The addition can be implemented as follows:

$$\begin{array}{@{}rcl@{}} &&{}t = (x_{1}\, | \, y_{2})\,\, \hat{} \,\,(x_{2} \, | \, y_{1});\\ &&{}z_{1}= (x_{2} \, | \, y_{2}) \, \hat{} \, t;\\ &&{}z_{2}= (x_{1}\, | \, y_{1}) \, \hat{} \, t; \end{array} $$

It should also be noted that the negation is as follows: 2(x 1,x 2)=−(x 1,x 2)=(x 2,x 1), i.e., multiplication of an element of \(\mathbb {F}_{3}\) by −1 is essentially free of cost.

Assume that (x 1,x 2), (y 1,y 2), \((z_{1},z_{2}) \in \mathbb {F}_{3}\) such that (x 1,x 2)(y 1,y 2)=(z 1,z 2). We use the following method for multiplication in \(\mathbb {F}_{3}\).

$$\begin{array}{@{}rcl@{}} z_{1} = (x_{1}\, \& \,y_{1})\,\, |\,\, (x_{2} \,\& \,y_{2});\\ z_{2} = (x_{1}\, \& \,y_{2}) \,\,| \,\,(x_{2} \,\& \,y_{1}); \end{array} $$

Now, we show the representation and operations for \(\mathbb {F}_{9}\). Recall that we represen

$$\mathbb{F}_{9} \cong \mathbb{F}_{3}[\omega]/(\omega^{2}+1)\,=\,\{0,1,2,\omega,\omega+1,\omega+2,2\omega,2\omega+1,2\omega+2\}.$$

The elements of \(\mathbb {F}_{9}\) are therefore represented by a 0 + a 1 ω, where \(a_{1},a_{2} \in \mathbb {F}_{3}\). For \(a,b,c,d \in \mathbb {F}_{3}\), the addition and multiplication in \(\mathbb {F}_{9}\) are:

$$\left\{ \begin{array}{l} (a+b\omega)+(c+d\omega)=(a+c)+(b+d)\omega,\\ (a+b\omega)(c+d\omega)=ac-bd+(bc+ad)\omega. \end{array} \right. $$

Since the elements of \(\mathbb {F}_{3}\) are each represented as a two-tuple, we need two two-tuples for representing the elements of \(\mathbb {F}_{9}\), i.e., for \(a=a_{0}+a_{1}\omega \in \mathbb {F}_{9}\), we have a=(a 0,1,a 0,2)+(a 1,1,a 1,2)ω or simply a=[(a 0,1,a 0,2),(a 1,1,a 1,2)], where the entries are from {0,1}.

Let \(a,b \in \mathbb {F}_{9}\) such that a=[(a 1,a 2),(a 3,a 4)] and b=[(b 1,b 2),(b 3,b 4)]. Then a + b is obtained as follows:

$$a+b=[\underbrace{(a_{1},a_{2})+(b_{1},b_{2})}_{\text{Call}~\mathbb{F}_{3}~\text{addition}},\underbrace{(a_{3},a_{4})+(b_{3},b_{4})}_{\text{Call}~\mathbb{F}_{3}~\text{addition}}]. $$

On the other hand, multiplication of a=[(a 1,a 2),(a 3,a 4)] and b=[(b 1,b 2),(b 3,b 4)] can be performed as follows:

$$\begin{array}{@{}rcl@{}} ab&=&[(a_{1},a_{2})(b_{1},b_{2})-(a_{3},a_{4})(b_{3},b_{4}),(a_{3},a_{4})(b_{1},b_{2})\\ &&+(a_{1},a_{2})(b_{3},b_{4})]. \end{array} $$

It should be noted that multiplication in \(\mathbb {F}_{9}\) requires multiplication, addition and negation in \(\mathbb {F}_{3}\).

We have implemented the polynomial multiplication of degree (n−1) for n=167, 239, 317, and 353. For each of these values, we have used the proposed algorithms for \(\mathbb {F}_{3}\) and \(\mathbb {F}_{9}\) for a number of recursions at the beginning and as the size of polynomials became smaller we have switched to other algorithms so that the overall arithmetic complexity could be kept at a minimum. Table 2 lists the sequence of algorithms used for each of the values of n. For example, for the multiplication of polynomials of degree 166 (or size 167) over \(\mathbb {F}_{3}\), KA is used for the first six recursions, and SB is then used for the multiplication of polynomials of degree two. For computing the multiplication of polynomials of degree 166 over \(\mathbb {F}_{9}\), we used A19 twice, K A 9 once, A29 once, KA once and SB five times.

We have implemented the proposed algorithms at the Register Transfer Level (RTL) using Verilog HDL. For each algorithm, the sequence of operations described in Table 2 has been realized as pure combinational circuits. As an example, a high level block diagram of our circuits for the multiplication algorithm A19 is given in Appendix. In our implementation using Verilog, the schoolbook multiplication, \(\mathbb {F}_{3}\) addition and \(\mathbb {F}_{9}\) addition have each been coded to be configurable so that they can be instantiated in the recursive tree simply by passing the size parameter. The gate level synthesis has been performed in a Synopsys Design Compiler Version E-2010.12 using the TSMC 65 nm standard cell library at the worst case corner. The synthesis has been targeted to optimize for the area. The total areas and critical path delays achieved in the post-synthesis simulation for a variety of values of n are listed in Table 3. We note that there seems to be no previous ASIC implementation of characteristic three polynomial or field multiplication algorithms for values of n similar to those reported in Table 3. Readers interested in FPGA implementation results for multiplication of elements over characteristic fields like \(\mathbb {F}_{3^{97}}\) are referred to [20].

Table 3 Implementation results for polynomial multiplication over \(\mathbb {F}_{3}\) and \(\mathbb {F}_{9}\) for special values of n.

6 Further Improvements

M 3(n) can be improved about 50% if we design a new algorithm as follows: We use the evaluation points 0,1,ω,−ω and \(\infty \). Then we have

$$\begin{array}{@{}rcl@{}} &&{}\text{Evaluation at}~X=0 \Longrightarrow P_{0}=A_{0}B_{0}=C_{0} \\ &&{}\text{Evaluation at}~X=1 \Longrightarrow P_{1}=(A_{0}+A_{1}+A_{2})\\ &&\times(B_{0}+B_{1}+B_{2})=C_{0}+C_{1}+{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=\omega \Longrightarrow P_{2}=(A_{0}+A_{1}\omega-A_{2})\\ &&\times(B_{0}+B_{1}\omega-B_{2})=C_{0}+C_{1}\omega-{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=-\omega \Longrightarrow P_{3}=(A_{0}-A_{1}\omega-A_{2})\\ &&\times(B_{0}-B_{1}\omega-B_{2})=C_{0}-C_{1}\omega+{\cdots} +C_{4} \\ &&{}\text{Evaluation at}~X=\infty \Longrightarrow P_{4}=A_{2}B_{2}=C_{4}. \end{array} $$

Let P 2 = P 2,0 + ω P 2,1 and P 3 = P 3,0 + ω P 3,1. It can be observed that P 2,0 = P 3,0 and P 2,1=−P 3,1, which shows that P 3 can be obtained from P 2, thus avoiding the requirement to compute P 3. The following formula and recursions can easily be obtained with the use of the method described previously in this section:

$$ \left\{\begin{array}{l} C_{0}=P_{0},\\ C_{1}=-P_{0}-P_{1}-P_{2,0}-P_{4}+P_{2,1}\omega,\\ C_{2}=-P_{0}-P_{2,0}+P_{4},\\ C_{3}=-P_{0}-P_{1}-P_{2,0}-P_{4}-P_{2,1}\omega,\\ C_{4}=P_{4}. \end{array}\right. $$
(24)
$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&3M_{\otimes}(n/3)+M_{9,\otimes}(n/3), \, M_{3,\otimes}(1)=1,\\ M_{3,\oplus}(n)&\leq&3M_{3,\oplus}(n/3)+M_{9,\oplus}(n/3)+14n/3-6, \, M_{3,\oplus}(1)=0,\\ M_{3}(n)&\leq&3M_{3}(n/3)+M_{9}(n/3)+14n/3-6, \, M_{3}(1)=6,\\ D_{3}(n)&\leq &D_{9}(n/3)+7D_{\oplus}. \end{array} \right. $$
(25)
$$ \left\{ \begin{array}{lcl} M_{3,\otimes}(n)&\leq&2n^{\log_{3}5}-n, \\ M_{3,\oplus}(n)&\leq&13n^{\log_{3}5}-4.85n{\log_{3}n}-13n,\\ M_{3}(n)&\leq&15n^{\log_{3}5}-4.85n{\log_{3}n}-14n,\\ D_{3}(n)&\leq&(7\log_{3}n+1)D_{\oplus}+D_{\otimes}. \end{array} \right. $$
(26)

The complexity of each algorithm is summarized in Table 4. It should be noted that the new 3-way algorithm outperforms the improved Karatsuba algorithm when n>400.

Table 4 Complexities of the different approaches for multiplication over \(\mathbb {F}_{3}\).

Example 1

This example illustrates our design of an area efficient algorithm for n=709. Using the recursion in Eq. 25 and the values for M 3(239), M 9(239) and M 3(355) from Table 2 yields the following results:

$$\begin{array}{@{}rcl@{}} M_{3}(709)&<&M_{3}(717)\leq 3M_{3}(239)\\&&+M_{9}(239)+14\cdot 239-6=191870. \end{array} $$

On the other hand, improved Karatsuba gives

$$M_{3}(709)<M_{3}(710)\leq 3M_{3}(355)+7\cdot355-3. $$

Even if we use M 3(355) = M 3(353)≤67761 from Table 2, we get

$$M_{3}(709)\leq 205765. $$

The improved Karatsuba thus yields M 3(709)≤205765 while the new algorithm results in M 3(709)≤191870, i.e., an approximately 7% reduction in the complexity

Remark 3

The classical approach for performing multiplication in \(\mathbb {F}_{3^{2n}}\) is to use the Karatsuba algorithm in the first recursion so that the complexity becomes approximately M 3(2n)≤3M 3(n). The other possible method relies on the extension field representation of the elements based on the use of \(\mathbb {F}_{3^{2n}} \cong \mathbb {F}_{9^{n}}\). The elements of \(\mathbb {F}_{3^{2n}}\) can then be represented by the polynomials over \(\mathbb {F}_{9}\) of degree less than n. This method requires approximately M 3(2n)≤M 9(n). Recall that M 9(n)≈2.5M 3(n) (see Table 2). The use of the 3-way algorithm for multiplication of polynomials over \(\mathbb {F}_{9}\) is therefore superior to the classical approach by about 15%.

7 Conclusion

In this paper, we have proposed improved algorithms for multiplication in \(\mathbb {F}_{3^{n}}\). As a first step, we introduced improvements to the classical Karatsuba algorithm, which can also be employed for characteristic three fields, and we also indicated the computational cost of the improved Karatsuba 2-way and 3-way algorithms. Next, we explained our derivation of a new 3-way polynomial multiplication algorithm with five 1/3 sized multiplications using interpolation in \(\mathbb {F}_{9}\) and determined the arithmetic and delay complexity associated with the recursive use of this algorithm. We then described ASIC implementation of multiplication of polynomials that are of practical interest. The final contribution of this work is another efficient algorithm for multiplication in \(\mathbb {F}_{3^{n}}\) that uses polynomial multiplication over \(\mathbb {F}_{9}\) and produces superior results. This algorithm leads to about 15% reduction in terms of the number of basic \(\mathbb {F}_{3}\) operations needed for fields considered in Section 5.