
1 Introduction

Elliptic curve cryptography (ECC) provides an equivalent level of security with much smaller key sizes. Hence, ECC has been implemented in secure Internet-of-Things (IoT) devices [1] and various blockchain applications. Elliptic curve scalar multiplication (ECSM) is the fundamental computation in ECC, so it is important to construct secure and efficient ECSM. Studies on secure and efficient ECSM algorithms fall into two categories. The first direction is to find secure and efficient scalar multiplication algorithms [9,10,11, 13, 14]. The second direction is to find secure and efficient coordinates with addition formulae [3, 6, 7, 15, 17]. This paper concentrates on resisting the simple power attack (SPA) and the safe error attack (SEA). SPA exploits “if statements” and SEA exploits “dummy statements” to reveal significant bits of the input scalar. Thus, secure ECSM algorithms should exclude conditional and dummy statements. Elliptic curve complete addition (CA) formulae [7, 15, 17] yield secure ECSM algorithms but are inefficient in terms of memory and computational costs. Another secure ECSM, which uses (extended) affine coordinates, is more efficient in both memory and computational cost [8]; however, it scans the input scalar from right to left (RL).

In this paper, we propose secure and compact left-to-right (LR) ECSM algorithms based on affine coordinates. We improve Joye’s LR 2-ary algorithm so as to exclude the exceptional computations of the affine and extended affine formulae [8]. We propose two secure scalar multiplication algorithms: a new 2-ary LR algorithm (Algorithm 7) and a new two-bit 2-ary LR algorithm (Algorithm 8), the latter using two-bit scanning together with the affine double-and-quadruple algorithm (DQ-algorithm) [12]. Subsequently, applying the idea of the Montgomery trick [4, 12], we revise three affine combination-addition formulae, each of which requires only a single inversion. We combine Algorithms 7 and 8 with these affine combination-addition algorithms and extend Algorithm 8 to Algorithm 9, which uses our affine combination-addition algorithm (Algorithm 6).

In terms of memory, Algorithm 7 and Algorithm 7 with Algorithm 2 use the least memory, ten field elements, reducing that of Joye’s LR with CA formulae by \(37.5\%\) and that of Joye’s RL with CA formulae by \(47.37\%\). For the computational cost, we evaluate all ECSMs by counting the numbers of modular multiplications (M), modular squarings (S), and inversions (I). Multiplications by the curve parameters a and b (\(m_{a}\) and \(m_{b}\)) and modular additions (A) are omitted. In many cases, such as the National Institute of Standards and Technology (NIST) elliptic curves, only \(m_{a}\) and A can be omitted. Then our ECSMs (Algorithm 7 with (extended) affine, Algorithm 7 with (extended) affine and Algorithm 2, and Algorithm 9 with (extended) affine and Algorithm 6) are the most efficient over the larger interval \(\frac{I}{M} \le \frac{26.8-54/\ell }{1+17/\ell }\) (24.93 when the bit length \(\ell =256\)) compared to Joye’s LR with CA formulae. Experiments also show that our new LR ECSM algorithms reduce the computation time by more than 40% compared to Joye’s LR with CA formulae.

The remainder of this paper is organized as follows. Related studies are provided in Sect. 2. Our proposed algorithms are described in Sect. 3. In Sect. 4, we analyze our Algorithms 7–9 with (extended) affine and affine combination-addition algorithms (Algorithms 2, 3, 4, 5 and 6) from the theoretical and experimental perspectives. Finally, we conclude our work in Sect. 5.

2 Related Work

ECSM algorithms consist of two parts: scalar multiplication algorithms and elliptic curve addition formulae. Thus, related studies on secure and efficient ECSM algorithms can be divided into two categories: scalar multiplication algorithms [9,10,11, 13, 14] and elliptic curve coordinates with addition formulae [3, 7, 15, 17]. We briefly introduce related studies in this section.

$$\begin{aligned} \hbox {affine addition formula} \, (P\ne \pm Q) \nonumber \\ \begin{aligned} x_{3}=\left( \frac{y_{2}-y_{1}}{x_{2}-x_{1}}\right) ^{2}-x_{1}-x_{2}\\ y_{3}=\left( \frac{y_{2}-y_{1}}{x_{2}-x_{1}}\right) (x_{1}-x_{3})-y_{1} \end{aligned} \end{aligned}$$
(1)
$$\begin{aligned} \hbox {affine doubling formula} \, (2P\ne \mathcal {O}) \nonumber \\ \begin{aligned} x_{3}=\left( \frac{3x_{1}^{2}+a}{2y_{1}}\right) ^{2}-2x_{1}\\ y_{3}=\left( \frac{3x_{1}^{2}+a}{2y_{1}}\right) (x_{1}-x_{3})-y_{1} \end{aligned} \end{aligned}$$
(2)

2.1 Addition Formulae and Exceptional Computations

Let \(E({\mathbb {F}}_{p})\) be a Weierstrass elliptic curve over \({\mathbb {F}}_{p}\), \(E({\mathbb {F}}_{p}):y^{2}=x^{3}+ax+b\), \((a, b \in {\mathbb {F}}_{p})\). In affine coordinates, addition and doubling are computed as in Eqs. 1 and 2. The point at infinity cannot be represented in affine coordinates. Thus, \(\mathcal {O}+P\), \(P+Q=\mathcal {O}\), and \(P+P\), which cannot be computed correctly by the affine addition formula, are the so-called exceptional computations of the affine addition formula, and \(2P=\mathcal {O}\) is the exceptional computation of the affine doubling formula. Similarly, \(\mathcal {O}+P\) and \(P+P\) are exceptional computations of the Jacobian and projective addition formulae. To reduce the exceptional computations of affine coordinates, extended affine coordinates assign (0, 0) to the point at infinity for elliptic curves that do not contain the point (0, 0), such as prime-order elliptic curves [8]. Using the extended affine addition formulae, \(P+Q=\mathcal {O}\) and \(2P=\mathcal {O}\) are computed as (0, 0), which represents exactly the point at infinity. Both \(\mathcal {O}+P\) and \(P+P\) remain exceptional computations of the extended affine addition formula. We write (extended) affine for the mixed use of the original affine formulae and the extended affine formulae.
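For concreteness, the following is a minimal sketch of Eqs. 1 and 2 over \({\mathbb {F}}_{p}\) (plain affine only; the extended affine formulae of [8] are not reproduced here). The function and variable names are ours, and the exceptional inputs listed above are exactly the cases in which the sketch fails.

```python
# Minimal sketch (not the paper's routines) of the affine addition (Eq. 1) and
# doubling (Eq. 2) formulae over F_p. Points are (x, y) tuples; p is the field
# prime and a the curve coefficient. Exceptional inputs make the inversion fail.

def affine_add(P, Q, p):
    """P + Q via Eq. (1); requires P != +-Q and P, Q != O."""
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) * pow(x2 - x1, -1, p) % p   # fails when x1 == x2 (P = +-Q)
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def affine_dbl(P, a, p):
    """2P via Eq. (2); requires 2P != O, i.e. y1 != 0."""
    x1, y1 = P
    lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # fails when y1 == 0 (2P = O)
    x3 = (lam * lam - 2 * x1) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)
```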

The complete addition (CA) formulae for an elliptic curve [15] can compute the sum of any pair of elliptic curve points, and thus can be employed in secure scalar multiplication algorithms without introducing conditional statements to handle exceptional computations. However, the CA formulae are inefficient in terms of memory and computational costs, and they only work for prime-order elliptic curves.

Table 1 summarizes the computational cost of the elliptic curve addition formulae, where M, S, I, and A are the computational costs of one field multiplication, squaring, inversion, and addition, respectively. Further, \(m_{a}\) and \(m_{b}\) are the costs of one field multiplication by the parameters a and b, respectively. Assuming \(S = 0.8M\) and ignoring the costs of \(m_{a}\), \(m_{b}\), and A, one elliptic curve addition (ADD) plus one elliptic curve doubling (DBL) using the CA formulae costs 24M in total. Consequently, ADD and DBL using the affine formulae are cheaper than using the CA, Jacobian, or projective formulae when \(I<8.8M\), \(I<8M\), or \(I<9.1M\), respectively. If we employ these addition formulae on NIST elliptic curves, where \(a=-3\) and the cost \(m_{b}\) cannot be ignored, ADD and DBL using the CA formulae cost 26.4M in total, and the affine formulae are cheaper when \(I<10M\).
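These thresholds are consistent with the standard affine operation counts of \(I+2M+S\) per ADD and \(I+2M+2S\) per DBL (with \(S=0.8M\)):
$$\begin{aligned} 2I+4M+3S = 2I+6.4M< 24M \Longleftrightarrow I< 8.8M, \qquad 2I+6.4M< 26.4M \Longleftrightarrow I<10M. \end{aligned}$$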

Table 1. Computational cost of elliptic curve addition formulae

2.2 Scalar Multiplication Algorithms

Side-channel attacks include several methods of revealing significant bits of input scalars: simple power analysis (SPA), which makes use of conditional statements executed during an algorithm depending on the data being processed; differential power analysis (DPA), which uses the correlation between the power consumption and specific key-dependent bits; timing attacks, which use the relation between the execution time and the bits of the scalar; and safe error attacks (SEA), which use dummy statements [2, 5]. Therefore, to resist these attacks, we need to eliminate conditional statements in the ECSM against SPA, any relation between the execution time and the input scalar against timing attacks, and dummy statements against SEA. In addition, the power consumption should change at each new execution against DPA. Note that a countermeasure against timing attacks, padding ‘0’s in front of the input scalars so that every execution takes almost the same time, can easily be employed in our algorithms [16]. In this paper, we focus on SPA and SEA.

Regarding secure ECSM resisting SPA and SEA, three properties, namely, the generality of k, secure generality, and executable coordinates are defined in  [8]. In their paper, the authors evaluated secure ECSM focusing on RL scalar multiplication algorithms. We can evaluate Joye’s regular 2-ary LR algorithm (Algorithm 1) in the same way as shown in Theorem 1.

figure a

Theorem 1

Joye’s regular 2-ary LR algorithm satisfies the generality of k and the secure generality. Coordinates with CA formulae are its executable coordinates. Affine and Jacobian coordinates are not executable coordinates of this algorithm.

In [8], new (two-bit) 2-ary RL scalar multiplication algorithms are proposed by improving Joye’s regular 2-ary RL algorithm so that (extended) affine becomes executable coordinates for it. In fact, Joye’s regular 2-ary LR algorithm uses two fewer field elements of memory than Joye’s regular 2-ary RL algorithm. We therefore employ the same idea to improve Joye’s regular LR algorithm so that (extended) affine becomes its executable coordinates.

2.3 Inversion-Reduction Combination-Addition Formulae

Any two or more inversions of field elements can be computed using only a single inversion by applying the Montgomery trick. The computational cost of nI becomes \(3(n-1)M+I\), which is more efficient when \(I>3M\). Using this method, Eisenträger et al. proposed an affine doubling-and-addition algorithm (DA-algorithm), computing \(2P+Q\) as \((P+Q)+P\) for \(P(x_{1}, y_{1})\), \(Q(x_{2}, y_{2})\) using the following formulae [4]:

$$\begin{aligned} x_{3}=\lambda _{1}^{2}-x_{1}-x_{2}, y_{3}=\lambda _{1}(x_{1}-x_{3})-y_{1}, \lambda _{1}=\left( \frac{y_{2}-y_{1}}{x_{2}-x_{1}}\right) \end{aligned}$$
(3)
$$\begin{aligned} x_{4}=(\lambda _{2}-\lambda _{1})(\lambda _{2}+\lambda _{1})+x_{2}, y_{4}=\lambda _{2}(x_{1}-x_{4})-y_{1}, \lambda _{2}=-\lambda _{1}-\frac{2y_{1}}{x_{3}-x_{1}} \end{aligned}$$
(4)

The DA-algorithm first computes the inverse of \((x_{2}-x_{1})^{3}(x_{3}-x_{1})=(x_{2}-x_{1})(y_{2}-y_{1})^{2}-(2x_{1}+x_{2})(x_{2}-x_{1})^{3}\) and then recovers the inverses of \((x_{2}-x_{1})\) and \((x_{3}-x_{1})\) from it. The result \((x_{4}, y_{4})\) can thus be computed without computing \((x_{3}, y_{3})\) completely. The computational cost is \(9M+2S+I\).
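The following sketch illustrates this shared-inversion computation of \(2P+Q\) via Eqs. 3 and 4 for generic inputs; it is our own illustration (variable names included), not the memory-optimized Algorithm 2 given later.

```python
# Sketch of the DA-algorithm idea [4]: 2P + Q = (P + Q) + P with one inversion.
# Assumes generic inputs: P != +-Q, P + Q != O and 2P + Q != O; with a zero-
# tolerant inversion (1/0 := 0) the exceptional cases degenerate as in Sect. 3.1.

def double_and_add_shared_inv(P, Q, p):
    """Return 2P + Q on y^2 = x^3 + ax + b over F_p (the slopes never need a)."""
    (x1, y1), (x2, y2) = P, Q
    d = (x2 - x1) % p
    e = (y2 - y1) % p
    # D = (x2-x1)^3 (x3-x1) = (x2-x1)(y2-y1)^2 - (2x1+x2)(x2-x1)^3
    D = (d * e * e - (2 * x1 + x2) * d ** 3) % p
    invD = pow(D, -1, p)                                  # the only inversion
    inv_d = (e * e - (2 * x1 + x2) * d * d) * invD % p    # = 1/(x2 - x1)
    inv_t = d ** 3 * invD % p                             # = 1/(x3 - x1)
    lam1 = e * inv_d % p                                  # slope of Eq. (3)
    lam2 = (-lam1 - 2 * y1 * inv_t) % p                   # slope of Eq. (4)
    x4 = ((lam2 - lam1) * (lam2 + lam1) + x2) % p
    y4 = (lam2 * (x1 - x4) - y1) % p
    return (x4, y4)                                       # (x3, y3) never formed
```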

Le and Nguyen proposed an affine double and quadruple algorithm (DQ-algorithm)  [12]. Their algorithm can compute both 2P and 4P simultaneously with only one inversion computation. The computational cost is \(8M+8S+I\). Its memory use is improved to 11 field elements, as described in [8].
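Both combination-addition algorithms rest on the same simultaneous-inversion principle. For field elements that are already known (and therefore independent of one another), the generic Montgomery trick looks as follows; the function name and structure are ours. The DA- and DQ-algorithms additionally need algebraic identities such as the one for \((x_{2}-x_{1})^{3}(x_{3}-x_{1})\) above, because the quantities they invert depend on each other.

```python
# Montgomery's simultaneous inversion: n field inversions are traded for one
# inversion plus roughly 3(n-1) multiplications. All inputs must be nonzero.

def batch_inverse(xs, p):
    """Return [1/x mod p for x in xs] using a single field inversion."""
    n = len(xs)
    prefix = [0] * n
    running = 1
    for i, x in enumerate(xs):           # prefix products x0 * x1 * ... * x_{i-1}
        prefix[i] = running
        running = running * x % p
    inv = pow(running, -1, p)            # the single inversion
    out = [0] * n
    for i in range(n - 1, -1, -1):       # unwind: recover each 1/x_i
        out[i] = inv * prefix[i] % p
        inv = inv * xs[i] % p
    return out
```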

3 Secure and Efficient LR-ECSM Algorithms

Algorithm 1 satisfies the generality of k and the secure generality, and uses two fewer field elements of memory than Joye’s regular 2-ary RL algorithm. The affine addition formulae save memory and, depending on the ratio of the inversion and multiplication costs, are efficient, but they are not executable coordinates of Algorithm 1. For Joye’s regular 2-ary RL algorithm, an accelerated version with (extended) affine is proposed in [8]; however, the authors did not apply the idea to Algorithm 1. To combine the advantages of Algorithm 1 and affine coordinates, in this section we improve Algorithm 1 so that it can work with (extended) affine.

First, we revise affine combination-addition formulae using the Montgomery trick, which are used in our new LR ECSM to enhance the efficiency. We then propose our new LR scalar multiplication algorithms with (extended) affine and prove their security.

3.1 Affine Combination-Addition Formulae

In this section, using the Montgomery trick, we revise several affine combination-addition formulae, which are used in our LR ECSM Algorithms 7, 8 and 9. Table 2 shows our affine combination-addition formulae together with the previous ones. When the Montgomery trick is used to reduce the inversion cost here, the inverses that need to be computed depend on each other; thus, applying the trick is not straightforward. First, we improve the DQ-algorithm [8, 12] so that it uses only ten field elements of memory (Algorithm 3), one field element fewer than in [8].

figure b

DA, the computation of \(2P+Q\), is the basic operation in the main loop of Algorithm 1. The DA-algorithm is not described in detail in [4], so the memory it requires is unclear. We specify the DA-algorithm so that it uses only seven field elements of memory (Algorithm 2). The DA-algorithm in [4] has the exceptional inputs \(P+Q=\mathcal {O}\) and \(P=Q\), in which P and Q have the same x-coordinate. If such an exceptional input is processed by Algorithm 2, its output is \(-P\); that is, DA(P, Q) \(= -P\) when \(P+Q=\mathcal {O}\) or \(P=Q\). Suppose Algorithm 2 is applied twice to the exceptional input \(P+Q=\mathcal {O}\) (for example, \(P=(x,y)\) and \(Q=-P=(x,-y)\)). The first application outputs \(P \leftarrow \) DA(P, Q) \(=-P\), whereas the correct value is \(2P+Q=P\). Feeding the updated P and the original Q into the algorithm again, the second application outputs \(P \leftarrow \) DA(\(-P\), Q) \(=P\). Thus, Algorithm 2 computes the composition DA(DA(P, Q), Q) correctly when \(P+Q=\mathcal {O}\), even though a single application yields a wrong intermediate value; this is why Algorithm 2 cannot be used in Algorithm 7 directly. In Algorithms 2 and 3, inversions are computed in the same way as in the extended affine formulae [8], which means that the inverse of zero is computed as zero.

figure c
Table 2. Comparison of affine combination-addition algorithms

Our double-add algorithm (Algorithm 4) computes 2P and then \(2P+Q\), instead of computing \(P+Q\) and then \((P+Q)+P\) as in Algorithm 2. Algorithm 4 correctly computes \(2P+Q\) if \(2P \ne \mathcal {O}\), \(2P \ne Q\), \(2P+Q \ne \mathcal {O}\) and \(Q \ne \mathcal {O}\). Note that Algorithm 4 computes \(2P+Q\) correctly even when \(P+Q=\mathcal {O}\) or \(P=Q\), which cannot be computed correctly by Algorithm 2; Algorithm 4 can therefore be used in Algorithm 7 without exceptional inputs. Its memory usage is eight field elements and its computational cost is \(9M+5S+I\). Next, Algorithm 5 combines two consecutive affine additions into a single unit; it computes \(P+Q+R\) correctly if \(P+Q \ne \mathcal {O}\), \(P+Q+R \ne \mathcal {O}\), \(P \ne Q\), \(P+Q \ne R\) and \(P, Q, R \ne \mathcal {O}\), and it is used in Algorithm 8. Its memory usage is ten field elements and its computational cost is \(9M+4S+I\). Finally, our affine quadruple-and-add algorithm (Algorithm 6) combines the computation \(4P+Q\) of the main loop of our extended new two-bit 2-ary LR algorithm (Algorithm 9). In general, to obtain \(4P+Q\) we would compute \(P \leftarrow 2P\), \(P \leftarrow 2P\), and \(P \leftarrow P+Q\), which costs 3I. Algorithm 6 computes \(4P+Q\) correctly if \(4P+Q \ne \mathcal {O}\), \(4P \ne Q\) and \(P, 2P, 4P, Q \ne \mathcal {O}\). Its memory usage is ten field elements and its computational cost is \(18M+14S+I\).
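To illustrate the double-then-add pattern of Algorithm 4, the following sketch computes 2P first and adds Q afterwards, sharing a single inversion of \((2y_{1})^{3}(x_{3}-x_{Q})\), where \((x_{3}, y_{3})=2P\). Because \(x_{2}-x_{1}\) is never inverted, inputs with \(P=Q\) or \(P+Q=\mathcal {O}\) pose no problem. This is our own illustration of the pattern, not the paper’s memory-optimized Algorithm 4.

```python
# Double-then-add with one shared inversion: first 2P, then 2P + Q.
# Assumes 2P != O, 2P != Q, 2P + Q != O and Q != O (the conditions of Algorithm 4).

def double_then_add_shared_inv(P, Q, a, p):
    """Return 2P + Q on y^2 = x^3 + ax + b over F_p."""
    (x1, y1), (xq, yq) = P, Q
    u = 2 * y1 % p
    n = (3 * x1 * x1 + a) % p
    t = (n * n - (2 * x1 + xq) * u * u) % p   # t = u^2 (x3 - xq)
    D = u * t % p                             # D = u^3 (x3 - xq)
    invD = pow(D, -1, p)                      # the only inversion
    inv_u = t * invD % p                      # = 1/(2 y1)
    inv_x3q = u ** 3 * invD % p               # = 1/(x3 - xq)
    lam_d = n * inv_u % p                     # doubling slope of Eq. (2)
    x3 = (lam_d * lam_d - 2 * x1) % p
    y3 = (lam_d * (x1 - x3) - y1) % p
    lam_a = (y3 - yq) * inv_x3q % p           # addition slope (yq - y3)/(xq - x3)
    x4 = (lam_a * lam_a - x3 - xq) % p
    y4 = (lam_a * (x3 - x4) - y3) % p
    return (x4, y4)
```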

3.2 Secure and Efficient LR Scalar Multiplication

We improve Algorithm 1 to a new 2-ary LR algorithm (Algorithm 7) and a new two-bit 2-ary LR algorithm (Algorithm 8). We then combine Algorithms 7 and 8 with affine combination-addition algorithms to reduce the inversion computations.

Algorithm 8 uses two-bit scanning, unlike Algorithm 7. Algorithm 8 adjusts the length of the input scalar k, including the sign bit, to be even by padding ‘0’s after the sign bit (see Footnote 1). Therefore, two-bit scanning works for input scalars k of both even and odd length.

figure d

Both Algorithms 7 and 8 assume that \(k \in \mathbb {Z}/N\mathbb {Z}\) lies in \([-\frac{N}{2},\frac{N}{2}]\), which ensures that our algorithms exclude exceptional computations, as shown in Theorem 2. Then k is represented as \(k=(-1)^{k_{\ell }}\sum _{i=0}^{\ell -1}k_{i}2^{i}\) \((k_{i} \in \{0,1\})\), where \(k_{\ell } \in \{0,1\}\) is the sign bit and \(0 \le |k| \le \frac{N}{2}\). Algorithms 7 and 8 consist of three parts: initialization, a main loop, and a final correction. Compared with Algorithm 1, we change the initialization of R[.] and A to avoid the exceptional initialization \(A \leftarrow \mathcal {O}\) when \(k_{\ell -1} = 1\) and the exceptional computations \(2\mathcal {O}+P\), \(2\mathcal {O}+2P\), and \(-2P+2P\) in the main loop. Our initialization of R[.] and A causes 4P (when \(k_{0} = 0\)) or 3P (when \(k_{0} = 1\)) to be added to the final result. We correct this in Steps 7 and 8 of the final correction of Algorithms 7 and 8, and thus avoid the exceptional computation \(A \leftarrow A+R[1]\) in the original final correction of Algorithm 1. Note that extended affine is used only once, in Step 8 of Algorithms 7 and 8; in fact, it is only necessary for \(k=0\). If \(k=0\) is excluded from the input, the original affine formulae alone suffice.
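A small sketch of this scalar pre-processing, under our reading of the padding convention (the paper’s exact convention may differ), is given below; the function name is ours.

```python
# Map k in Z/NZ to its centered representative in [-N/2, N/2], split it into a
# sign bit and magnitude bits k_0..k_{l-1}, and pad a '0' at the most significant
# end (after the sign bit) so that two-bit scanning consumes full bit pairs.

def recode_scalar(k, N):
    k = k % N
    if k > N // 2:
        k -= N                      # centered representative in [-N/2, N/2]
    sign = 1 if k < 0 else 0        # the sign bit k_l
    mag = abs(k)
    bits = [(mag >> i) & 1 for i in range(max(mag.bit_length(), 1))]
    if len(bits) % 2 == 1:
        bits.append(0)              # pad a '0' after the sign bit
    return sign, bits               # bits[i] = k_i, least significant first
```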

To further reduce the inversion computations, Algorithm 2 or Algorithm 4 can be applied in Steps 5 and 7 of Algorithm 7. Although Algorithm 2 cannot be employed in Algorithm 7 directly when \(k_{\ell -1}=0\), applying Algorithm 2 twice to the exceptional inputs \(A+R[k_{i}]=\mathcal {O}\) outputs the correct result \(A \leftarrow 2P\). Thus, when Algorithm 2 is used in Algorithm 7, we can ensure correct computation by adjusting the number of ‘0’s from \(k_{\ell -1}\) up to the first ‘1’ on the left to be even, again by padding ‘0’s after the sign bit of k. Algorithm 3 is applied in Step 4 of Algorithm 8, and Algorithm 5 in Step 5 of Algorithm 8.

Our Algorithms 7 and 8 satisfy the generality of k as well as the secure generality, and the affine coordinates are executable coordinates for them. Algorithms 7 and 8 contain no conditional or dummy statements, and (extended) affine and the affine combination-addition algorithms can be employed without introducing conditional statements; the details will be given in the full version of this paper. Thus, Algorithms 7 and 8 with (extended) affine and affine combination-addition are secure ECSM algorithms.

figure e

Theorem 2

Let \(E({\mathbb {F}}_{p})\) be an elliptic curve without two-torsion points. Let \(P \in E({\mathbb {F}}_{p})\), \(P \ne \mathcal {O}\) be an elliptic curve point, whose order is \(N>4\). Then, Algorithms 7 and 8 using (extended) affine and affine combination-addition formulae can compute kP correctly without exceptional computations for any input \(k \in [-\frac{N}{2}, \frac{N}{2}]\).

Algorithm 7 combined with Algorithm 2 (or Algorithm 4) reduces the number of inversions in the main loop from two to one. By contrast, Algorithm 8 computes two inversions in its main loop. To reduce this to one inversion, we propose an extended new two-bit 2-ary LR algorithm (Algorithm 9), whose main loop uses our quadruple-and-add algorithm (Algorithm 6).

In Algorithm 9, we initialize \(R[0]=-6P\), \(R[1]=-5P\), \(R[2]=-4P\), \(R[3]=-3P\), and \(A=2P\), which can be computed without exceptional computations if \(N > 6\). Because of this initialization, the condition \(N>4\) is changed to \(N>6\). In Step 6 of Algorithm 9, we set R[0] and R[1] back to \(-2P\) and \(-P\), the values they had in Algorithm 8, so that the remaining part of the final correction can be computed correctly. Algorithm 9 has the generality of k and the secure generality, and it avoids all exceptional computations of the affine formulae when \(k \in [-\frac{N}{2},\frac{N}{2}]\), similarly to Algorithm 8. The proof of Theorem 2 extends easily to Algorithm 9.

4 Efficiency and Memory Analysis

4.1 Theoretical Analysis

Table 3. Computational cost and memory cost analysis
Table 4. The most efficient algorithm with the conditions of \(r=\frac{I}{M}\) (\(m_{a}=m_{b}=A=0\), \(\ell \) is bit length of k)

We analyzed the computational and memory costs of Algorithms 7–9 with (extended) affine and the affine combination-addition algorithms in comparison with Algorithm 1 with CA formulae, Joye’s regular 2-ary RL algorithm with CA formulae, and the two RL algorithms of [8]; the results are shown in Table 3.

The memory cost counts the number of \({\mathbb {F}}_{p}\) elements, including the memory used by the scalar multiplication algorithms themselves. For the computational cost, we evaluated all algorithms by counting the numbers of modular multiplications (M), squarings (S), multiplications by the parameters a and b (\(m_{a}\) and \(m_{b}\)), additions (A), and inversions (I). We assume that \(S=0.8M\) and that \(\ell \) is the bit length of the input scalar k. We denote the ratio of the inversion cost to the multiplication cost by r, i.e., \(I=rM\).

The total computational cost of Algorithm 1 with CA formulae is \(24\ell M\), and that of Joye’s RL with CA formulae is \(24(\ell +1)M\), if we ignore the costs of \(m_{a}\), \(m_{b}\), and A. Therefore, Algorithm 1 with CA formulae is more efficient than Joye’s regular 2-ary RL algorithm with CA formulae and uses less memory. Both the memory and computational costs of Algorithm 7 with Algorithm 2 are lower than those of Algorithm 7 with Algorithm 4. However, as stated earlier, Algorithm 2 can be applied to Algorithm 7 only when the number of ‘0’s between the sign bit and the first ‘1’ on the left is even. Packaging all computations of the main loop into a single computation unit reduces the inversion computations: Algorithm 9 with (extended) affine and Algorithm 6 has a computational cost of \((14.6\ell +27)M+(\frac{\ell +17}{2})I\), which is the best when \(\frac{8+33.2/\ell }{1-13/\ell } \le r \le \frac{18.8-54/\ell }{1+17/\ell }\). Therefore, if the ratio r is approximately 11, Algorithm 9 with (extended) affine and Algorithm 6 is the most efficient approach. Its memory usage is relatively high but still lower than that of Joye’s RL with CA formulae. Table 4 shows the most efficient ECSM algorithm as a function of the ratio \(r=\frac{I}{M}\); note that the conditions hardly change with the scalar length \(\ell \). In many cases, such as the NIST elliptic curves in Table 5, we can only assume \(m_{a}=A=0\), and the interval of r in which our algorithms are more efficient becomes larger.
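The crossover interval above is easy to evaluate numerically for concrete scalar lengths; the bound formulae are taken directly from the analysis, and only the short evaluation script below is ours.

```python
# Evaluate the interval of r = I/M in which Algorithm 9 with (extended) affine
# and Algorithm 6 is the most efficient (m_a = m_b = A = 0, S = 0.8M).

def alg9_interval(l):
    lower = (8 + 33.2 / l) / (1 - 13 / l)
    upper = (18.8 - 54 / l) / (1 + 17 / l)
    return lower, upper

for l in (224, 256, 384):
    lo, hi = alg9_interval(l)
    print(f"l = {l}: best for {lo:.2f} <= I/M <= {hi:.2f}")
# For l = 256 this gives roughly 8.56 <= I/M <= 17.43, so r = 11 falls inside.
```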

Table 5. NIST elliptic curves (\(y^2 = x^3-3x+c\))

Regarding the memory cost, Algorithm 7 uses the least memory, ten field elements, which reduces that of Algorithm 1 with CA formulae by \(37.5\%\).

4.2 Experimental Analysis

We implemented all algorithms listed in Table 3 on NIST P-224, P-256, and P-384; these curves are listed in Table 5. We randomly generated \(2 \times 10^5\) test scalars in the interval \([-\frac{N}{2},\frac{N}{2}]\), where N is the order of the point P, and measured the average scalar multiplication time of each algorithm. The experimental platform uses the C programming language with GNU MP 6.1.2, a multiple-precision arithmetic library, on a personal computer with an Intel(R) Core(TM) i7-8650U CPU (1.90 GHz base frequency, running at 2.11 GHz) and 16.0 GB RAM, under 64-bit Windows 10. We turned off Intel Turbo Boost, which automatically raises the processor’s operating frequency, and thus its performance, when demanding tasks are running, to make sure our computer works at 2.11 GHz.

Table 6. Average computation time for one scalar multiplication (milliseconds)
Table 7. Time of fundamental computations of GNU MP (milliseconds)

Table 6 shows the average scalar multiplication time. Algorithm 7 with Algorithm 2 is the most efficient for NIST P-256 and P-384. It reduces the computation time of Joye’s RL with CA by \(46.56\%\) for P-224, \(44.38\%\) for P-256, and \(38.88\%\) for P-384; the computation time of Algorithm 1 with CA by \(45.5\%\) for P-224, \(44.36\%\) for P-256, and \(39.39\%\) for P-384; and the computation time of the two-bit 2-ary RL algorithm [8] by \(16.11\%\) for P-224, \(16.22\%\) for P-256, and \(23.2\%\) for P-384. Algorithm 7 with Algorithm 2 also uses the least memory, ten field elements. Algorithm 9 with (extended) affine and Algorithm 6 is the most efficient for NIST P-224.

As discussed previously, the efficiency of our algorithms depends on the ratio \(r=\frac{I}{M}\). Algorithm 7 with (extended) affine and Algorithm 2 is the most efficient for P-256 and P-384 in our experiments, where the ratio \(r=\frac{I}{M}\) in the GNU MP library is 4.56714 and 6.22689, respectively, as shown in Table 7; these results reflect the theoretical analysis in Table 4. Algorithm 9 with (extended) affine and Algorithm 6 is the most efficient for P-224, where the ratio \(\frac{I}{M}\) is 4.08232, although Table 4 indicates that Algorithm 7 with (extended) affine should be the most efficient at this ratio. In terms of implementation time, both function calls and loop iterations cost time, depending on the compiler; Algorithm 9 has far fewer function calls and loop iterations than the other algorithms, which may explain the difference.

5 Conclusion

We improved the affine combination-addition formulae of double-add (DA), double-quadruple (DQ), two-adds (TA), and quadruple-add (QA) in terms of memory or computational cost. We also proposed three new secure LR scalar multiplication algorithms (Algorithms 7, 8 and 9) and proved that they satisfy the generality of k and the secure generality; in addition, they exclude the exceptional computations \(\mathcal {O}+P\), \(P+Q=\mathcal {O}\), and \(P+P\), which means that the affine coordinates are executable coordinates for them.

We analyzed our LR scalar multiplication algorithms with (extended) affine and affine combination-addition formulae from the theoretical perspective. In many cases, such as with NIST elliptic curves, we can only omit the computational costs of \(m_{a}\) and A. In this case, our algorithms (Algorithm 7 with (extended) affine, Algorithm 7 with (extended) affine and Algorithm 2, and Algorithm 9 with (extended) affine and Algorithm 6) are more efficient than Algorithm 1 with CA formulae when \(\frac{I}{M} \le \frac{26.8-54/\ell }{1+17/\ell }\) (24.93 at \(\ell =256\)).

We also analyzed the algorithms from an experimental perspective. Algorithm 7 with (extended) affine and Algorithm 2 achieves high efficiency and uses the least memory, ten field elements, which reduces the memory requirement of Algorithm 1 with CA formulae by \(37.5\%\).