1 Introduction

The first release of a cryptographic standard specifying elliptic curves for use in practice dates back to 2000 [21]. Nowadays, roughly one out of ten systems on the publicly observable Internet offers cipher suites in the Secure Shell (SSH) and Transport Layer Security (TLS) protocols that contain elliptic-curve-based cryptographic algorithms [16]. Most elliptic curve standards recommend curves for different perceived security levels that are either defined over prime fields or binary extension fields; on the Internet, however, the deployed curves are mostly defined over prime fields [16]. This can be partially explained by the increasing skepticism towards the security of elliptic curves defined over binary extension fields (justified by recent progress on solving the discrete logarithm problem on such curves [26]). Therefore, in this work, we only consider elliptic curves defined over prime fields.

Recently, part of the cryptographic community has been looking for alternatives to the currently deployed elliptic curves that may offer better performance and provide stronger overall security (see for example an evaluation of recent curve candidates in [12]). Most notably, the TLS working group has issued a formal request to the Crypto Forum Research Group (CFRG) asking for recommendations for new elliptic curves. The urge to change curves has been fueled by the recently leaked NSA documents, which suggest the existence of a back door in the Dual Elliptic Curve Deterministic Random Bit Generator [55]. Although cryptographers have suspected this at least as early as in 2007 [52], these recent revelations have accelerated a controversy on whether the widely deployed NIST curves [57] should be replaced by curves with a verifiably deterministic generation. Besides such security concerns, there has been significant progress related to both efficiency and security since the initial standardization of elliptic curve cryptography. Notable examples are algorithms protected against certain side-channel attacks, different “special” prime shapes which allow faster modular arithmetic, and a larger set of curve models from which to choose. For example, Edwards [25] discovered an interesting normal form for elliptic curves, now called the Edwards model, which was introduced to cryptographic applications by Bernstein and Lange [11]. A generalization of this curve model, known as the twisted Edwards model [7], facilitates the most efficient curve arithmetic [35]. Such (twisted) Edwards curves also have other attractive properties: they may be selected to support a complete addition law and are compatible with the Montgomery model, which supports efficient Montgomery ladder computations [47]. However, twisted Edwards curves cannot have a prime number of rational points over the base field, and they are therefore incompatible with the prime-order Weierstrass curves used in all of the current cryptographic standards [21, 48, 57].

Related work

The NIST curves [57] have been included in numerous standards (e.g., [21, 48]) and are deployed in many security protocols. The most recent speed record on the NIST curve which aims to provide 128-bit security is due to Gueron and Krasnov [31]. Alternatives to the NIST curves have been suggested by the German working group Brainpool [24]; their curve choices followed additional security requirements, one of which demands verifiably pseudo-random curve generation. Another alternative curve has been proposed by Bernstein [5]; this is a Montgomery curve, called Curve25519, which allows efficient computation of ECDH using the Montgomery ladder at the 128-bit security level. It was later shown by Bernstein et al. [9] that a twisted Edwards curve, birationally equivalent to Curve25519, can be used for efficient elliptic curve signature generation and verification. Recently, Bernstein and Lange started a project to select and analyze secure elliptic curves for use in cryptography: see [12] for a list of the security assessments the project performs and the requirements it imposes. A range of curves, targeting different security levels, is also presented in [12]. Following this, several new curves satisfying the requirements from [12], which facilitate both the twisted Edwards and Montgomery form, were proposed by Aranha et al. [3].

Motivation and rationale

The new curves presented in [3, 12] are all efficient and secure elliptic curves ready to be used in cryptography. This prompts the question as to why we should perform an efficiency and security analysis for a set of new curves. It is our opinion that not all options for prime fields and elliptic curve models have been considered in the recent curve proposal projects (either because they are overlooked or do not fit the requirements set by the project). Our goal is to rigorously analyze all of these different aspects from both a security and efficiency perspective, in hope that this paper helps practitioners better understand (and correctly implement) the choices that lie in front of them. Abandoning a set of standard curves demands a judicious selection of new curves, since this cannot be done too frequently if widespread adoption is desired. In that light, it is our opinion that one should consider all of the options available. For example, in contrast to [3, 12], our selection includes prime order Weierstrass curves. Just as the almost-prime order twisted Edwards curves have their practical advantages, we argue that there are also benefits to choosing prime order Weierstrass curves: the absence of small torsion simplifies the point/input validation process, and (over a prime field of fixed length) does not sacrifice any bits of security with respect to attacks on the underlying elliptic curve discrete logarithm problem (ECDLP). In addition, such curves are backwards compatible with current implementations supporting NIST curves over prime fields (i.e., no changes are required in protocols), and could be integrated into existing implementations by simply changing the curve constant and (in some cases) field arithmetic.Footnote 1

We investigate the selection of prime moduli that allow efficient modular arithmetic. As in [3, 5, 12, 15, 35, 41], we study pseudo-Mersenne primes of the form \(2^{\alpha }-\gamma \), but also primes of the form \(2^{\alpha }(2^{\beta }-\gamma )-1\) that can be used to accelerate Montgomery arithmetic [46] as used in [15, 32]. Following the deterministic selection requirement from [12], we pick two primes of each shape for a given targeted security level: one prime is selected to be slightly smaller than the other, which sacrifices a small amount of ECDLP security in favor of enhanced performance. Note that, as explained in Sect. 2, for practical considerations we require all primes to be congruent to \(3\) modulo \(4\). These primes are used to construct cryptographically suitable curves focusing on (arguably) the two most relevant curve models: short Weierstrass curves with the curve parameter \(a\) set to \(-3\) and twisted Edwards curves with the curve parameter \(a\) set to \(-1\). The prime order Weierstrass curves give full ECDLP security over prime fields of a fixed bitlength, while offering good practical performance. On the other hand, the twisted Edwards curves sacrifice a small amount of ECDLP security but facilitate the fastest realization of curve arithmetic [35]. Both types of curves are selected in a deterministic fashion (see Sect. 3 for the full details) and offer twist-security [5], a property which is useful in certain scenarios. We note that our prime and curve selection is meant to cover a wide range of options exhibiting attractive features. Nevertheless, there are other design alternatives that might offer different trade-offs between security, rigidity and performance on different platforms. We leave the investigation of other options as future work.

An important requirement for implementations of modern cryptographic algorithms is a constant runtime when the algorithm computes on secret data to guard against timing attacks [38]. In particular, this potential threat exists for two basic elliptic curve operations: variable-base and fixed-base scalar multiplication. One solution is to use a complete addition law. However, a complete addition law is typically less efficient compared to the dedicated formulas which can fail for certain inputs. In Sect. 4 we outline another solution to this problem for the variable-base case. We show that our algorithms which compute on secret data, can never run into any exceptional cases (i. e. produce incorrect results) while using the faster dedicated formulas and ensuring a constant runtime (with the exception of the very last addition; see Sect. 4.1 for the details). Hence, this solution results in faster implementations compared to the complete solution. In the fixed-base case the situation is more complicated: most efficient algorithms in the literature may potentially run into exceptions. While the use of a complete addition formula suffices to solve the problem on twisted Edwards curves, the high cost of complete additions on Weierstrass curves would degrade performance significantly [18] (see Appendix C.1). To solve this problem, we propose a new formula that works for all possible inputs by exploiting masking techniques. This pseudo-complete addition requires the same number of multiplications and squarings as the unprotected dedicated addition formula and drastically reduces the overhead of protecting scalar multiplication. We comment that the formula is also useful in the context of secure, exception-free multi-scalar multiplications. The reader is referred to Appendix C.1 for more details on the new formula.

We do not claim full security against other attacks such as simple power analysis (SPA); this is left for future work. Nevertheless, we remark that all the selected algorithms have a regular structure as required when implementing countermeasures against certain simple side-channel attacks.

Summary of contributions

  • Analysis of a new set of deterministically selected prime-order Weierstrass curves (see Table 1) which are defined over pseudo-Mersenne and Montgomery-friendly primes whose bitlengths match those of the NIST primes. See Sects. 2 and 3.

  • Analysis of a new set of deterministically selected composite-order twisted Edwards curves (see Table 2 and Sect. 3). In contrast to existing curve proposals, the selected curves present (simultaneously) minimal parameter \(d\) in the twisted Edwards form and minimal parameter \(A\) in isogenous Montgomery form (minimal in absolute value). See Sect. 3.3.

  • A new, (pseudo-)complete addition algorithm for general curves in short Weierstrass form. This algorithm works for all pairs of inputs and its execution incurs only a small overhead compared to the dedicated addition law. See Sect. C.1.

  • We demonstrate how to use the scalar multiplication algorithms and prove that they become exception-free and facilitate constant-time implementations when used this way. This allows one to use the more efficient dedicated formulas whenever possible, resulting in an efficient and secure solution for elliptic curve scalar multiplication. See Sect. 4.

  • A comprehensive software implementation providing timings for various scenarios; this includes performance estimates for the above curves when used in the context of the TLS protocol. See Sects. 5 and 6.

Proposed curves

Tables 1 and 2 show the curves that we have chosen deterministically according to our security and efficiency criteria. The tables show the target security level, which gives a rough estimate for the desired security in each case. Curve names indicate the curve model [w for the Weierstrass model and ed for the (twisted) Edwards model], the bitlength of the underlying base field prime and the type of prime (mont for Montgomery-friendly and mers for pseudo-Mersenne primes). In Appendix D, we provide the trace of Frobenius \(t\) for each curve, so the number of \(\mathbf{{F}}_p\)-rational points for the curve \(E\) and its quadratic twist \(E'\) can be computed as \(\#E(\mathbf{{F}}_p) = p + 1 - t\) and \(\#E'(\mathbf{{F}}_p)=p+1+t\). More details on the curve choices and their properties are given in Sect. 3.

Table 1 Summary of our chosen Weierstrass curves of the form \(E_b/\mathbf{{F}}_p:y^2=x^3-3x+b\) defined over \(\mathbf{{F}}_p\) with quadratic twist \(E_b'/\mathbf{{F}}_p: y^2=x^3-3x-b\) and target security level \(\lambda \)
Table 2 Summary of our chosen twisted Edwards curves of the form \(\mathcal {E}_d/\mathbf{{F}}_p: -x^2 + y^2 = 1 + dx^2y^2\) defined over \(\mathbf{{F}}_p\), where \(d = -(A-2)/(A+2)\), and the target security level is \(\lambda \)

2 Modular arithmetic: choosing primes

Over a prime field \(\mathbf{{F}}_p\) (with \(p>3\) prime), the computation of the elliptic curve group operation boils down to numerous computations modulo \(p\). In this section, we outline the types of primes that we prefer for efficiency and security considerations, and discuss how the primes are uniquely determined from a fixed security level. We have not experimented with using a smaller radix system to accumulate the intermediate carries, at the cost of increasing the number of multiplications. We leave the investigation of such approaches as future work.

Primes of the form \(2^\alpha -\gamma \) Selecting primes of a special form to enhance the performance of the modular reduction is not new. The primes standardized in the digital signature standard [57] have a special form allowing fast reduction based on the work by Solinas [53]. Even faster modular reduction can be achieved by selecting primes of the form \(p = 2^\alpha -\gamma \), known as pseudo-Mersenne primes. In this case, the value \(\alpha \) is determined by the security parameter and is typically a multiple of \(64\) (or slightly smaller). The integer \(\gamma \) is chosen to be a small positive integer, i.e., significantly smaller than \(2^{32}\). Given two integers \(x\) and \(y\) such that \(0\le x,y<2^\alpha -\gamma \), one can compute \(x\cdot y \mod (2^\alpha -\gamma )\) by first computing the product and writing this in a radix-\(2^\alpha \) system as \(x\cdot y=z_h\cdot 2^\alpha + z_\ell \). A first reduction step, based on the shape of the modulus, is \(z_h\cdot 2^\alpha + z_\ell \equiv z_\ell +z_h\cdot \gamma \pmod {2^\alpha -\gamma } = z\), where \(0\le z <(\gamma +1) 2^\alpha \). If this step is repeated, the result is such that \(0\le z < 2^\alpha + \gamma ^2\), which can finally be brought into the desired range by applying an additional correction modulo \(p\) using subtractions. A standard way of enhancing the performance is to use a redundant representation: instead of reducing \(z\) to the range \([0,2^\alpha -\gamma )\), one can often more efficiently reduce \(z\) to the range \([0,2^\alpha )\), or to the range \([0,2^{2s})\) if \(\alpha \) is a few bits smaller than \(2s\) (at a target security level of \(s\) bits). The latter case can be optimized further by computing exclusively in such a redundant form and performing a sole correction at the end of the scalar multiplication.

Given a security level of \(s\) bits, we consider the parameter \(\alpha \in \{2s,2s-1\}\). Taking \(\alpha =2s\) makes the prime as large as possible, matching one of the requirements to achieve maximal ECDLP security at the \(s\)-bit security level. Taking \(\alpha =2s-1\) sacrifices half a bit of ECDLP security in favor of potential enhancements in efficiency, as described above. Thus, fixing \(s\) results in two possible values for \(\alpha \) and subsequently two primes of the form \(2^{\alpha }-\gamma \): for a fixed \(\alpha \), we choose the smallest \(\gamma \) such that \(2^{\alpha }-\gamma \) is both prime and congruent to \(3\) modulo \(4\) (the rational behind this congruence condition is discussed below). Following our curve selection criteria, the values \(\gamma \) for the curves under analysis are always smaller than \(2^{10}\), which makes them attractive for efficient implementation on 16, 32 and 64-bit platforms.

Primes of the form \(2^\alpha (2^\beta -\gamma )-1\) Another approach to select primes is inspired by Montgomery arithmetic [46]. The idea behind Montgomery multiplication is to replace the relatively expensive divisions by computationally inexpensive logical shifts when computing the modular reduction. Some computations (and storage) can be avoided when primes of the form \(p=2^\alpha (2^\beta -\gamma )-1\) are used for positive integers \(\alpha , \beta \) and \(\gamma \) (cf. [1, 15, 32, 37, 39]). When the prime \(p\) is two bits short of a multiple of the word size \(w\) (i.e., \(w\mid \alpha +\beta +2\)), one can avoid a conditional subtraction in every multiplication [58].

There are different ways to construct Montgomery-friendly primes: for example, [32] prefers \(\gamma \) to be a power of two, while [15] sets \(\beta =64\) and \(\gamma \) as small as possible to specifically target 64-bit platforms. We make choices of \(\alpha , \beta \) and \(\gamma \) such that the modular arithmetic can be implemented efficiently on a wide range of platforms. Given a security level of \(s\) bits, we consider \(\alpha =8\delta \) and \(\beta \in \{2s-\alpha ,2s-2-\alpha \}\), and choose \(\gamma \) and \(\delta \) as the smallest positive integers such that \(p=2^\alpha (2^\beta -\gamma )-1\) is prime and \(\lceil \log _2(p)\rceil =2s\) (resp. \(\lceil \log _2(p)\rceil =2s-2\)) in the setting of \(\beta =2s-\alpha \) (resp. \(\beta =2s-2-\alpha \)). We start with \(\delta =1\) and increment it by 1 (if necessary) until \(\gamma \) is found. For instance, for \(s=192\) and \(\beta =2s-\alpha \), we observe that \((\delta , \gamma ) = (1, 79)\) results in a prime which can be written as

$$\begin{aligned} 2^{376}(2^8-79)-1= & {} 2^{352} (2^{32}-2^{24}\cdot 79)-1 \\= & {} 2^{320}(2^{64}-2^{56}\cdot 79)-1, \end{aligned}$$

for usage on 8-, 32- and 64-bit platforms, respectively. This has the advantage that the reduction step, which has to be computed at every iteration inside the interleaved Montgomery algorithm, can be computed using only a multiply-and-add and an addition instruction. Note that, by construction, primes of this form are always congruent to \(3\) modulo \(4\).

Constant-time modular arithmetic One of the measures to guard software implementations against various types of side-channel analysis such as timing attacks [38] is to ensure a constant running time. In practice, this often means writing code which does not contain branches depending on secret data. For instance, the interleaved Montgomery multiplication algorithm requires a conditional subtraction at the end. To remove this, we always compute the subtractions and select (mask) the correct value depending on the conditional flag. In the setting of primes of the shape \(2^\alpha -\gamma \), one must always compute the worst-case number of reduction rounds in order to ensure constant runtime.

Besides the “standard” modular operations, there is also the need for constant-time methods to compute the modular inversion and the modular square roots. In order to compute the inversion modulo a prime \(p\), one can use Fermat’s little theorem: i.e., compute \(a^{p-2}\equiv a^{-1} \pmod p\). Since our chosen primes all have a special shape, finding efficient addition chains for this exponentiation is not difficult. For the \(n\)-bit primes considered in this work, we found that we can always compute the modular inversion using at most \(1.11\lceil \log _2(p)\rceil \) modular multiplications and modular squarings. If \(p\equiv 3 \pmod 4\), then one can compute a modular square root \(x\) (if it exists) of an element \(a\) using \(x\equiv a^{\frac{p+1}{4}}\pmod p\). Since this can be performed efficiently, and in constant-time, we require all of our primes to be congruent to \(3\) modulo \(4\).

3 Curve selection

In this section we explain how the curves in Tables 1 and 2 were chosen based on the selection of primes that is outlined in Sect. 2. For each chosen prime \(p \equiv 3\pmod 4\), we provide two curves: one is a prime order short Weierstrass curve, while the other is an almost-prime order twisted Edwards curve.

3.1 Curve selection for Weierstrass curves

For a fixed prime \(p\), a specific curve \(E_b: y^2 = x^3 - 3x + b\) is uniquely determined by the curve parameter \(b \in \mathbf{{F}}_p \backslash \{\pm 2,0\}\). Note that, since \(p \equiv 3 \mod 4\), its non-trivial quadratic twist \(E_b'\) has the curve equation \(E_b': y^2 = x^3 - 3x - b\). In order to guarantee twist-security [5], we require both the group orders \(r=\#E_b(\mathbf{{F}}_p)\) and \(r'=\#E_b'(\mathbf{{F}}_p)\) to be prime. We have \(r = p+1-t\) and \(r'=p+1+t\) for \(|t| \le 2\sqrt{p}\) and demand \(|t| > 1\) because curves with \(t \in \{0,1\}\) are weak. Thus, depending on the sign of the trace \(t\), either \(r>p, r'<p\) or \(r<p, r'>p\). To ease implementation, we demand that \(r<p\) for all curves, i.e., we choose the curve with positive trace. To leave no room for manipulating the curve choice, we select all curve parameters deterministically, namely by choosing the integer \(b\) with the smallest absolute value that yields a curve with the above properties. Based on these considerations, the selection process is completely explained in accordance with the rigidity condition of [12]. Specifically, we search for a suitable coefficient \(b\) by starting with \(b=1\) and incrementing \(b\) by one until both \(r\) and \(r'\) are prime. For each value of \(b\), we use the Schoof–Elkies–Atkin (SEA) point counting algorithm [51] in Magma [17] to compute the trace \(t\) of \(E_b\), such that \(r=p+1-t\) and \(r'=p+1+t\). We use the implementation’s ‘early abort’ feature that abandons the computation when small factors are found either in the curve’s or the twist’s group order. Because of the curve model for \(E_b'\), the search only considers positive values of \(b\) and we select the sign of \(b\) to ensure that \(r<p\). The resulting curves are summarized in Table 1.

3.2 Curve selection for twisted Edwards (and Montgomery) curves

For a fixed prime \(p\), a specific twisted Edwards curve \(\mathcal {E}_d: -x^2 + y^2 = 1 + dx^2y^2\) is uniquely determined by the curve parameter \(d \in \mathbf{{F}}_p \backslash \{0,-1\}\). Let \(A= 2\frac{1-d}{d+1}\), and \(B=-(A+2)\). Theorem 3.2 of [7] shows that the twisted Edwards curve \(\mathcal {E}\) and the Montgomery curve \(By^2=x^3+Ax^2+x\) are birationally equivalent. If \(B\) is a square in \(\mathbf{{F}}_p\) (which it is for all our curves), then \(\mathcal {E}_d\) is birationally equivalent to \(E_A :y^2 = x^3 + Ax^2 + x\). As for the Weierstrass curves, we demand \(t>1\) to exclude the weak curves with \(t\in \{0,1\}\) and to ensure that \(4r<p\).

Ideally, it would be desirable to have a curve with minimal parameter \(d\) in the twisted Edwards form and minimal parameter \(A\) in the Montgomery form. Unfortunately, existing curve proposals have been forced to pick one form and optimize it at the expense of the other one. We show in Sect. 3.3 below, that a search minimizing the absolute value of the parameter \(d\) would find curves with the same group orders for curve and twist, where the latter corresponds to \(-(d+1)\). This means that a search for minimal absolute value of \(d\) will always find positive \(d\) first, which corresponds to negative \(A\). Our search thus minimizes the absolute values of \(A\) and \(d\) at the same time.

For each fixed \(p\), we start with \(A=-6\) and search for \(A \in 2+4\mathbf{{Z}}\) (subtracting 4 each time) until \(\#E_A=4r\) and \(\#E_A'=4r'\), where \(r\) and \(r'\) are both prime. Note that the discussion in Sect. 3.3 also shows that \(B=-(A+2)\) is always a square in \(\mathbf{{F}}_p\), which means that \(E_A': y^2 = x^3 - Ax^2 + x\) is a model for the non-trivial quadratic twist of \(E_A\). Again, for each \(A\), we use the SEA algorithm [51] in Magma [17] to compute the trace \(t\) of \(E\), which determines \(\#E_A=p+1-t\) and \(\#E_A'=p+1+t\). Section 3.3 also shows that \(A^2-4\) is non-square in \(\mathbf{{F}}_p\), which simplifies notions of completeness on \(E\) (see [5]). Furthermore, we check that the curve satisfies all conditions posed by [12], if one of them is not met,Footnote 2 we continue with the next value for \(A\). We note that the cofactors of \(4\) are minimal when insisting on an \(\mathbf{{F}}_p\)-rational twisted Edwards and/or Montgomery form. The resulting curves are summarized in Table 2.

3.3 Correspondence between minimal \(A\) and \(d\) for twisted Edwards curves

Table 2 contains a column with values for the parameter \(d_0 = -(A+2)/4\), which can be used for implementing twisted Edwards curves defined over our prime fields. The curve \(\mathcal {E}_{d_0}/\mathbf{{F}}_p: -x^2 + y^2 = 1 + d_0x^2y^2\) has the same number of \(\mathbf{{F}}_p\)-rational points as the curve \(\mathcal {E}_d/\mathbf{{F}}_p: -x^2 + y^2 = 1 + dx^2y^2\) with \(d = -(A-2)/(A+2)\) and the Montgomery curve \(E_A/\mathbf{{F}}_p: y^2=x^3+Ax^2+x\). Furthermore, the curve \(\mathcal {E}_{-(d_0+1)}/\mathbf{{F}}_p: -x^2 + y^2 = 1 - (d_0+1)x^2y^2\) has the same number of \(\mathbf{{F}}_p\)-rational points as the quadratic twist \(\mathcal {E}_d'\) and the quadratic twist \(E_{-A}\). In this section, we show that this is true in general, and that therefore, the relation between \(d_0\) and \(A\) shows that the value \(d_0\) is the minimal value for \(d\) defining \(\mathcal {E}_d\) such that all the criteria in our curve selection are satisfied if and only if \(A\) is the minimal such value for the Montgomery form. This shows that it is not necessary to search for a new set of twisted Edwards curves if one wants to minimize the parameter \(d\) instead of the Montgomery parameter \(A\). One can simply use the curve defined by \(d_0\).

The following lemma connects the two twisted Edwards curves \(\mathcal {E}_d\) and \(\mathcal {E}_{d_0}\) via an isogeny whenever \(d_0 = -1/(d+1)\). It also gives a condition on \(d_0\) which determines whether the map is defined over \(\mathbf{{F}}_p\). If this is the case, both curves have the same number of \(\mathbf{{F}}_p\)-rational points.

Lemma 1

Let \(\mathcal {E}_d: -x^2 + y^2 = 1 + dx^2y^2\) be a twisted Edwards curve defined over a prime field \(\mathbf{{F}}_p\) and let \(d_0 = -1/(d+1) \in \mathbf{{F}}_p\). Then there exists a \(4\)-isogeny \(\phi : \mathcal {E}_d \rightarrow \mathcal {E}_{d_0}\). If \(d_0\) is a square in \(\mathbf{{F}}_p\), the isogeny is defined over \(\mathbf{{F}}_p\), in particular \(\#\mathcal {E}_d(\mathbf{{F}}_p) = \#\mathcal {E}_{d_0}(\mathbf{{F}}_p)\).

Proof

The isogeny \(\phi \) is one of the isogenies described in Section 3 of [2]. This means, it is the composition of maps

$$\begin{aligned} \phi = \hat{\psi }_{-1,-1/(d+1)}\circ \sigma \circ \psi _{-1,d}. \end{aligned}$$

The map \(\psi _{-1,d}\) is the \(2\)-isogeny \(\psi _{-1,d}: \mathcal {E}_d \rightarrow L_{-d}\) to the Legendre form curve \(L_{-d}: y^2 = x(x-1)(x+d)\) given in [2], Theorem 3.2], and \(\hat{\psi }_{-1,-1/(d+1)}: L_{1/(d+1)} \rightarrow \mathcal {E}_{-1/(d+1)}\) is the dual of the corresponding isogeny for \(1/(d+1)\). The map \(\sigma \) is equal to the isomorphism \(\sigma _2\sigma _1: L_{-d} \rightarrow L_{1/(d+1)}\) given in [2], Section 3.2]. The composition \(\phi \) is defined over \(\mathbf{{F}}_p\) if \(d_0\) and thus \(-(d+1)\) is a square in \(\mathbf{{F}}_p\). This proves the lemma. \(\square \)

The next result uses the previous isogeny to show that the original curve \(\mathcal {E}_d\) and its twist \(\mathcal {E}_d'\) each have corresponding curves with small parameters \(d_0\) and \(-(d_0+1)\), respectively, which have the same number of \(\mathbf{{F}}_p\)-rational points, provided that both these small parameters are squares in \(\mathbf{{F}}_p\).

Lemma 2

Let \(A\in \mathbf{{F}}_p{\setminus }\{-2,2\}, d = -(A-2)/(A+2)\) and \(d_0 = -(A+2)/4\) such that both \(d_0\) and \(-(d_0 + 1)\) are squares in \(\mathbf{{F}}_p\). Then \(\#\mathcal {E}_d(\mathbf{{F}}_p) = \#\mathcal {E}_{d_0}(\mathbf{{F}}_p)\). Moreover, \(\#\mathcal {E}_d'(\mathbf{{F}}_p) = \#\mathcal {E}_{-(d_0+1)}(\mathbf{{F}}_p)\).

Proof

The first part follows from Lemma 1 because \(d_0 = -1/(d+1)\). Since the twist \(\mathcal {E}_d' = \mathcal {E}_{1/d}\), the second part follows from Lemma 1 with \(d\) replaced by \(1/d\), which means that \(d_0\) is replaced by \(-(d_0 + 1)\). \(\square \)

Finally, we show that indeed our search criteria, in particular the facts that \(p \equiv 3 \pmod 4\) and that both group orders are not divisible by \(8\), imply that \(d_0\) and \(-(d_0+1)\) as given in our setting are squares in \(\mathbf{{F}}_p\), which shows that the correspondence above holds.

Lemma 3

Let \(p \equiv 3 \pmod 4, d_0\in \mathbf{{F}}_p\) and let \(\mathcal {E}_{d_0}: -x^2 + y^2 = 1 + d_0x^2 y^2\) be a twisted Edwards curve such that \(\#\mathcal {E}_{d_0}(\mathbf{{F}}_p) = 4r\) and \(\#\mathcal {E}_{d_0}'(\mathbf{{F}}_p) = 4r'\) for primes \(r\) and \(r'\). Then \(d_0\) and \(-(d_0+1)\) are both squares in \(\mathbf{{F}}_p\).

Proof

We first prove that \(d_0\) is a square in \(\mathbf{{F}}_p\). Assume that it is not a square. Section 3 in [8] provides an exhaustive description of all points of order \(2\) and \(4\) on a twisted Edwards curve. If \(d_0\) is not a square, then \(-1/d_0\) is a square because \(p \equiv 3 \pmod 4\). Then the full \(2\)-torsion is defined over \(\mathbf{{F}}_p\), it consists of the affine point \((0,-1)\) and two points at infinity \(((1:0),(\pm \sqrt{-1/d_0}))\) (written as completed points in projective space \({\mathbb {P}}^1 \times {\mathbb {P}}^1\), see [8], Section 2.7]). Let \(s\in \mathbf{{F}}_p\) with \(s^2 = -1/d_0\), then exactly one of \(\pm s\) is a square, assume without loss of generality that it is \(s\). Then this value gives \(4\) affine points \((\pm \sqrt{s},\pm \sqrt{s})\) (signs chosen independently) of order \(4\) defined over \(\mathbf{{F}}_p\). The group structure of the \(4\)-torsion on \(\mathcal {E}_{d_0}\) that is defined over \(\mathbf{{F}}_p\) is thus \(\mathbf{{Z}}_2 \times \mathbf{{Z}}_4\) and has order \(8\). Therefore \(8\) must divide \(\#\mathcal {E}_{d_0}(\mathbf{{F}}_p)\), which contradicts our assumption that the group order is \(4r\) for \(r\) prime. Hence, \(d_0\) is a square.

We know that the twist \(\mathcal {E}'_{d_0}\) is birationally equivalent to \(\mathcal {E}_{1/d_0}\), and we have already shown that \(d_0\) is a square, so \(1/d_0\) is a square. We can apply Lemma 1 with \(d_0\) replaced by \(1/d_0\), which means that \(d=-(d_0 + 1)\), and obtain that \(\#\mathcal {E}_{-(d_0 +1)}(\mathbf{{F}}_p) = \#\mathcal {E}_{1/d_0}(\mathbf{{F}}_p) = 4r'\). Now looking at the 4-torsion defined over \(\mathbf{{F}}_p\) as above yields that \(-(d_0 + 1)\) is a square in \(\mathbf{{F}}_p\). \(\square \)

The minimality of \({\mathbf{d}_\mathbf{0}}\) All our selected twisted Edwards curves satisfy the conditions of the previous two lemmas. Therefore, one can choose to work with the isogenous curves defined by \(d_0\) or \(-(d_0+1)\), whichever is more convenient. The isogenous curves and their twists have the same orders as the original curves and their twists. Therefore all conditions required in the curve selection are satisfied with the added benefit of a small \(d\)-value.

We argue that \(d_0\) is of minimal absolute value defining a curve that satisfies the search criteria. Assume that \(A\) is a coefficient with minimal absolute value that yields a desired curve when minimizing for the Montgomery parameter (like the values for \(A\) in our examples). A search that minimizes the absolute value of the parameter \(d\) in the twisted Edwards model \(\mathcal {E}_d\), must find the value \(d_0 (\hbox {or }-(d_0+1))\) first since \(A = -(4d_0 + 2)\). Without loss of generality, let \(|d_0| < |d_0 + 1|\), i.e., \(d_0 > 0\), otherwise interchange \(d_0\) and \(-(d_0+1)\). Indeed, assume that a \(d_1\) with \(|d_1|<|d_0|\) leads to a curve that satisfies all criteria. Let \(A_1 = -(4d_1 + 2)\). By Lemma 3, \(d_1\) and \(-(d_1+1)\) are squares, then by Lemma 2, \(\#\mathcal {E}_{d_1}(\mathbf{{F}}_p) = \#\mathcal {E}_{\hat{d}_1}(\mathbf{{F}}_p) = \#E_{A_1}(\mathbf{{F}}_p)\), where \(\hat{d}_1 = -(A_1-2)/(A_1+2)\) and \(\#\mathcal {E}_{d_1}'(\mathbf{{F}}_p) = \mathcal {E}_{-(d_1+1)}(\mathbf{{F}}_p) = \#\mathcal {E}_{1/\hat{d}_1}(\mathbf{{F}}_p) = \#E_{-A_1}(\mathbf{{F}}_p)\). This means that the curve \(E_{A_1}\) satisfies the search criteria.

Since we fixed \(d_0 > 0\), we have \(A<0\). By assumption, we have \(-A = 4d_0 + 2 = 4|d_0| + 2 >4|d_1|+2\). Now consider the two cases \(d_1>0\) and \(d_1<0\). If \(d_1>0\), then \(A_1 = -(4d_1+2)<0\), and \(|A| = -A > 4|d_1| + 2 = 4d_1 + 2 = -A_1 = |A_1|\), contradicting the minimality of \(A\). Similarly, if \(d_1<0\), then \(A_1>0\) and \(|A| = -A > 4|d_1| + 2 > 4|d_1+1| + 2 = -4(d_1+1) + 2 = -(4d_1+2) = A_1 = |A_1|\), again a contradiction. Overall, this means that \(d_0\) must be the coefficient with minimal absolute value.

3.4 Curve properties

In both families of curves, note that for primes of the form \(2^{\alpha }-\gamma \), the bitlengths of \(r\) and \(r'\) differ by 1, since \(|t| \gg \gamma \) in general; for primes of the form \(2^{\alpha }(2^\beta -\gamma )-1\), the bitlengths of \(r\) and \(r'\) are always equal when \(\gamma \ne 0\). The curves in Table 2 can be used in different curve models: in the twisted Edwards model, in the Montgomery model for implementing Montgomery ladders, and also in the original Edwards model allowing complete addition formulas [11]. The latter can be seen as follows. Since \(p \equiv 3 \pmod 4, E_A\) is birationally equivalent to an Edwards curve by [7], Theorem 3.4]. Using the maps discussed in [7], Section 3], one can show that \(E_A: y^2 = x^3 + Ax^2 + x\) is birationally equivalent to \(\mathcal {E}_{-1/d}: x^2 + y^2 = 1 - (1/d)x^2y^2\). For all of our curves, \(d\) is a square in \(\mathbf{{F}}_p\), so \(-1/d\) is not a square, which means that the addition law on \(\mathcal {E}_{-1/d}\) is complete. All of the curves in Table 2 allow for an efficient map from a subset of their \(\mathbf{{F}}_p\)-rational points to bit strings of a certain length, such that they are indistinguishable from uniform random bitstrings of the same length (see [10], which is based on [29]). However, note that curves defined over pseudo-Mersenne primes are more suitable for achieving indistinguishability than those over Montgomery-friendly primes because for the latter primes \(p\), the value \((p+1)/2\) is further away from a power of \(2\) (see [10], §2.6]). The prime-order Weierstrass curves presented in Table 1 are similar in their basic properties to the NIST curves, as they have the same curve model, share the parameter \(a=-3\), and include prime fields of the same bit lengths as the ones for the NIST curves [57]. However, we stress that the curves in Table 1 do not allow any room for manipulations, which can be the case when the curve parameter \(b\) is allowed to be chosen “randomly”. Our curves are twist-secure, do not allow transfers, and have large discriminants (notions used to guard against certain attacks; e.g., see [12]). The work in [56] shows that indistinguishability can also be achieved for our prime-order Weierstrass curves in Table 1, however the resulting bit strings are twice as large as those that result from applying [10, 29] to the twisted Edwards curves in Table 2.

4 Efficient, constant-time, and exceptionless scalar multiplications

To protect against certain types of side-channel attacks [38], it is essential that scalar multiplications are computed in constant-time. This means that the running time of the algorithm for computing a scalar multiplication \(kP\) must be independent of the scalar \(k\) and the point \(P\). Classical curve arithmetic formulas have exceptional cases, i.e., they do not work for all points. Having conditional statements in the code that check for these cases means the algorithms have a variable running time depending on different input cases, but simply leaving them out might lead to exceptional point attacks that produce wrong results or cause other implementation errors. In this section we outline how constant-time algorithms can be achieved efficiently for our chosen Weierstrass and twisted Edwards curves in two different settings: the variable- and fixed-base scenarios. The variable-base scenario refers to the case in which the base point \(P\) can be different for each execution of the algorithm. In the fixed-base case, multiples of a public constant point can be precomputed, which allows different optimization possibilities. In Appendix A we present an algorithm for the double-scalar scenario, which carries out a computation of the form \(k_1P_1 + k_2P_2\) (see Algorithm 9). This occurs for example in the verification of ECDSA signatures. In this setting the verification algorithm operates on public inputs only, and one can profit from more efficient variable-time algorithms since the implementation does not require side-channel protection or constant-time execution.

We discuss the various cases for implementing scalar multiplication for the different curve models and algorithm choices. We list all algorithms as pseudo-code in Appendix A (scalar multiplication, point validation, precomputation and recoding) and in Appendix B (point operations). The reader is referred to Appendix C for complete details on the selection of explicit formulas. Note that several of these algorithms contain if-statements, which are marked in the pseudo-code according to their nature. For example, some of these statements occur in algorithms that are only run on public inputs and do not need to run in constant time; some of them are implemented in constant time via masking techniques; and some of them are there merely to allow us to represent several algorithms in one pseudo-code algorithm environment and to re-use the different variants in different scenarios. As soon as a specific scenario is chosen, these statements are always executed under the same condition. The remaining if-statements are the ones that when implemented introduce data-dependent branches into the algorithms. They occur only in algorithms for point doubling, point addition and merged point doubling/addition, where they correspond to exceptions, i.e., the exceptional cases for which the given formulas are not valid. But, whenever the implementation needs to be constant-time, the conditions for entering these if-statements are always false such that they are never executed (and can be removed in the code). Below, we argue that indeed no exceptional cases occur and that the proposed algorithms can be implemented to run in constant time (when used as described in the algorithms in Appendix A). Note that the neutral element on Weierstrass curves is the point at infinity, i.e., the point \((0:1:0)\) in projective coordinates, while on twisted Edwards curves the neutral element is the rational point \((0,1)\), and in the Montgomery ladder the neutral element is \((X :Z) = (0 :0)\). In this paper, they are all denoted by \(\mathcal {O}\).

4.1 Weierstrass scalar multiplications

Let \(E_b/\mathbf{{F}}_p\) be any of the Weierstrass curves in Table 1, with \(r=\#E_b(\mathbf{{F}}_p)\) prime. Let \(k\) be an integer scalar and \(P = (x_1,y_1) \in \mathbf{{F}}_p \times \mathbf{{F}}_p\). We consider the computation of efficient, constant-time and exception-free scalar multiplications in two scenarios.

The variable-base scenario On input of the scalar \(k\) and variable point \(P = (x_1,y_1)\), perform the following steps.

  1. 1.

    Validation Validate that \(k\in [1,r)\) and that \(P=(x_1,y_1) \in E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\}\) by checking that \(y_1^2=x_1^3-3x_1+b\). Otherwise, return false (see Algorithm 2).

  2. 2.

    Precomputation For a fixed window size \(2\le w < 10\), compute the \(2^{w-2}\) multiples \(\{ P, 3P, \ldots , (2^{w-1}-1)P\}\) of \(P\), and store them in a lookup table. This precomputation can be achieved using one point doubling and \(2^{w-2}-1\) point additionsFootnote 3 (see Algorithm 4).

  3. 3.

    Scalar recoding Convert the scalar \(k\) to odd by replacing \(k\) with \(r-k\) (if even) and recode it into exactly \(\lceil \log _2(r)/(w-1)\rceil +1\) odd, signed, non-zero digits in \(\{\pm 1,\pm 3, \ldots , \pm (2^{w-1}-1)\}\) (see Algorithm 6).

  4. 4.

    Evaluation Compute \(kP\) using a fixed window with the precomputed values from the previous step. This requires exactly \((w-1)\lceil \log _2(r)/(w-1)\rceil \) point doublings and \(\lceil \log _2(r)/(w-1)\rceil \) point additions, or \((w-2)\lceil \log _2(r)/(w-1)\rceil + 1\) point doublings, \(\lceil \log _2(r)/(w-1)\rceil - 1\) point doubling-additions and one addition when \(w > 2\). Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed via a “complete masked” addition (see Appendix C.1). The final result is negated if the original value of \(k\) was even.

This can be computed as outlined in Algorithm 1 in Appendix A.

Proposition 1

When computing variable-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 1 to implement the steps above, no exceptions occur.

Before proving the proposition, we fix notation to partition the non-zero points in a prime order subgroup of the group \(E_b(\mathbf{{F}}_p)\). For a fixed point \(P \in E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\}\), the map \([1, r) \rightarrow E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\},\ k \mapsto kP\) is a bijection. It induces a partition of \(E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\} = S_\mathrm{odd} \cup S_\mathrm{even}\) into two equally sized sets, where \(S_\mathrm{odd} = \{kP \mid k \in [1, r) \hbox { odd}\}\) and \(S_\mathrm{even} = \{kP \mid k \in [1, r) \hbox { even}\}\). Let \(T=\{ P, 3P, \ldots , (2^{w-1}-1)P\} \subset S_\mathrm{odd}\) and \(T^{-1}=\{(r-1) P, (r-3)P, \ldots , (r-(2^{w-1}-1))P\} \subset S_\mathrm{even}\). The set \(T^{-1}\) contains the inverses of the points in the set \(T\).

Proof

To exclude any exceptions in the course of Algorithm 1, we consider all of its doubling, addition and merged doubling/addition operations. First of all, it is easy to see that all doubling and addition steps for building the look-up table are exception-free. Note that the look-up table consists exactly of the points in the set \(T\) defined above. The precomputation as shown in Algorithm 4 starts by doubling \(P\) with Algorithm 10. The algorithm works for the point at infinity \(\mathcal {O}\) when defined as \((0:Y_1:0)\) with \(Y_1 \ne 0\), but the case \(P=\mathcal {O}\) is excluded by point validation, and it does not have any exceptions since there are no points of order \(2\) in the group \(E(\mathbf{{F}}_p)\). The points for the look-up table are then computed by adding \(2P \in S_\mathrm{even}\) to points from \(T\subset S_\mathrm{odd}\) only, i.e., the input points to the additions are always different and do not include \(\mathcal {O}\). Also \(-2P = (r-2)P\) is not among these points because \(2^{w-1}-1 < r-2\) (note \(2 \le w < 10\)).

The operations in the evaluation stage depend on the recoding of the scalar \(k\), which at this point in the algorithm satisfies \(0 < k < r\). Let \(t=\lceil \log _2(r)/(w-1)\rceil \), then with notation as in Algorithm 1, the scalar can be written as

$$\begin{aligned} k = \sum _{i=0}^t s_i |k_i| 2^{(w-1)i}, \end{aligned}$$

where \(s_i \in \{-1,1\}\) and \(k_i \in \mathbf{{Z}}\) with \(0 < |k_i| < 2^{w-1}\). The recoding used here guarantees \(k_t > 0\) such that \(s_t = 1\) and \(|k_t|=k_t\). Throughout the evaluation stage, the variable \(Q\) is used to denote the running value during the algorithm. At any stage, there is some \(z \in [0, r)\) such that \(Q = zP\). Let \(z_1>0\) and \(z_2 = 2^{w-1}z_1 \pm z_0\) with \(z_0\in \{1,3,\ldots ,2^{w-1}-1\}\), then \(z_2 \ge z_1\). If \(z_1 > 1\), we even have \(z_2 > z_1\). This means that whenever a positive integer is doubled \(w-1\) times and then an integer corresponding to one of the elements in the look-up table is either added or subtracted, the result cannot be smaller than the original integer. Thus, in the evaluation stage of Algorithm 1, after each sequence of \(w-1\) doublings and one addition step, the value \(z\) of the running point \(Q\) cannot decrease.

The evaluation stage begins with choosing an element from the lookup table \(T\) and assigning it to \(Q\). After the first assignment, we have \(z \in \{1, 3, \ldots , 2^{w-1}-1\}\). All the doubling operations in Lines 11, 14 and 18 of Algorithm 1 are done using Algorithm 10. Therefore, for the same reasons as explained above there are no exceptions possible in these steps. The last addition in Line 19 is done with a complete addition formula and hence also does not have any exceptional cases. It now suffices to ensure that all remaining addition steps (i.e., in Lines 12 and 15) do not run into exceptions.

First, assume that an exceptional case occurs in one of the additions in Step 15, which computes \(Q+R\) for \(R\in T\cup T^{-1}\) using Algorithm 2. Note that none of the doubling steps can ever output \(\mathcal {O}\) because there are no points of order \(2\) and \(\mathcal {O}\) is never input to any of them since the running value \(Q\) always has \(1<z<r\) for all points input to doubling steps prior to any of the additions in Step 15. Thus the only exceptional cases that could occur in this algorithm, are the cases where \(Q = \pm R\). This means that either \(Q\in T\) or \(Q\in T^{-1}\). Since \(Q\) is the output of a non-trivial doubling operation, we have \(Q\in S_\mathrm{even}\) which excludes \(Q\in T\) and means that \(Q\in T^{-1}\). Therefore, \(Q=zP\) with \(z \ge r- (2^{w-1} - 1)\). After each addition in Step 15 there are always \(w-1\) doublings that follow. Hence, the minimal value for \(z\) that can occur after the exceptional addition and the following doublings is \(2^{w-1} (r- 2(2^{w-1} - 1))\). The addition of a table element immediately after these doublings, can bring down this value to the minimal \(z_{\min } = 2^{w-1} (r- 2(2^{w-1} - 1)) - (2^{w-1}-1) = 2^{w-1} r - (2^{w}+1)(2^{w-1} - 1)\). This value is larger than \(r\), because otherwise, it follows that \(r \le 2^{w}+1\), which is not true for any of our curves. Given the observation that a positive integer does not decrease after any sequence of \(w-1\) doublings and a following addition of an integer corresponding to a look-up table element, the scalar \(k\) cannot be reached any more as the final value for \(z\) after the exceptional addition. This contradicts any exceptions in the additions of Step 15.

Next, assume that an exception occurs in one of the steps in Line 12 of Algorithm 1. This step is a merged doubling and addition step and is computed via Algorithm 11. The algorithm computes \(2Q + R\) for \(R\in T\cup T^{-1}\) as \((Q+R) + Q\). For the same reasons as above, the input point \(Q\) cannot be equal to \(\mathcal {O}\). Since \(R\in T\cup T^{-1}\), we have \(R\ne \mathcal {O}\). The first addition \(Q+R\) could have the same exceptions as the additions in Step 15 treated in the previous paragraph. This means that an exception can only be \(Q \in T^{-1}\) as above and again we look at the minimal value \(z_{\min }\) after carrying out the exceptional addition, the addition of \(Q\) and the following \(w-1\) doublings and subsequent addition (also the steps including the merged doubling and addition algorithm can be treated as such). This value is \(z_{\min } \ge 2^{w-1}\cdot (2r- 3(2^{w-1} - 1)) - (2^{w-1} - 1) = 2^{w}r - (3\cdot 2^{w-1}+1)(2^{w-1}-1)\). Again, this value is larger than \(r\), because otherwise we would have \(r\le 3\cdot 2^{w-1} + 1\), which does not hold for our curve parameters. As above this means that the scalar \(k<r\) cannot be reached as the final value of \(z\), contradicting any exception in the first addition in \((Q+R) + Q\). Finally, we assume that there is an exception in the second addition. We have already excluded \(Q=\mathcal {O}\) and \(Q+R = \mathcal {O}\). Hence, the only two possibilities for an exception are \(Q+R = Q\) or \(Q+R = -Q\). The first condition means that \(R=\mathcal {O}\) which is not possible since \(R\in T\cup T^{-1}\). We are thus left with the condition \(2Q = -R\) and hence either \(2Q \in T\) or \(2Q \in T^{-1}\). Since \(2Q \in S_\mathrm{even}\), it cannot be in \(T\), which leaves \(2Q \in T^{-1}\). This means that \(2z \ge r-(2^{w-1} - 1)\). The minimal value \(z_{\min }\) after the computation \((Q+R) + Q\) and the following \(w-1\) doublings and another addition is \(z_{\min }\ge 2^{w-1}(r-2(2^{w-1}-1)) - (2^{w-1}-1) = 2^{w-1}r - (2^w+1)(2^{w-1}-1)\). Again, this value is larger than \(r\), leaving no way to achieve the scalar \(k\) during the remaining computation. This excludes all exceptions in Line 12 and therefore all exceptions in Algorithm 1. \(\square \)

Given that the recoding always produces a fixed length for the scalar, this means that after a successful validation step, we do not execute any conditional statements.

The fixed-base scenario In this setting, the point \(P\) is fixed (e.g., as a public parameter of the system), so multiples of \(P\) can be precomputed offline and used to speedup the online computation of \(k P\). In terms of performance, it might be difficult to select the “optimal” size of the precomputed table. A larger table with more multiples of \(P\) typically means a reduced number of elliptic curve operations at runtime, but such tables might result in cache-misses which can result in a performance penalty. Moreover, when one wants to extract elements from this table in a cache-attack resistant manner, one should access every element and mask out the correct value to avoid leaking access patterns. Hence, using a larger table implies an increased access cost for every table-lookup.

This is not the only problem with large precomputed tables. As far as we know, one cannot show (for all inputs) that a current active point in the fixed-base scalar multiplication will not be the same (or have an opposite sign) as one of the many precomputed values. Although this might happen only with extremely low probability, such that honest parties may never encounter this by accident, active adversaries could manipulate such scalar/point combinations to force exceptions. This means that, unlike the variable-base multiplication, the implementation of the group law must cover exceptional cases. One solution is to use complete formulas (which have no exceptional cases). Unfortunately, the complete Weierstrass formulas from [18] (see Appendix C.1) are expensive compared to their incomplete counterparts, and using these would incur a much larger relative penalty than the complete formulas on (twisted) Edwards curves do. Another possible solution is to always compute two candidates for the addition, \(C_1=2P\) and \(C_2=P+L\), and select (in a constant time manner) \(C_1\) if \(P=L, \mathcal {O}\) if \(P=-L, L\) if \(P=\mathcal {O}, P\) if \(L=\mathcal {O}\), and \(C_2\) otherwise. At a first glance this approach seemingly increases the cost of an addition to be at least that of computing both an addition and a doubling. However, as noted by Chevallier-Mames et al. [22] for the case of binary affine operations, doubling and addition share several similarities in their formulas. By observing that these similarities naturally reflect to the projective formulas, we present a solution that achieves the required behavior explained above without increasing the number of modular multiplications or squarings required in a dedicated point addition (see Algorithms 18 and 19). The idea is to exploit the similarities in the doubling and addition routines by masking out the correct operands first, and using these as inputs to the arithmetic operations.

Hence, Algorithms 18 and 19 work for any input points, do not have any exceptional cases and have roughly the same run-time as their corresponding dedicated point additions. We note that Chevallier-Mames et al.’s approach tries to address a different problem and hence produces different formulas. In particular, they exploit similarities in the affine formulas to build (separate) routines for doubling and addition with the same pattern of field operations. This is done in order to eliminate differences in the power traces of the doubling and addition executions. In projective coordinates, however, the same approach would not work because of the extra operations required by addition in comparison to doubling (in this case, point operations are partitioned into smaller atomic units, each with the same pattern of field operations. Thus, this technique does not exploit similarities between doubling and addition).

For a scalar \(k\) and the fixed point \(P = (x_1,y_1)\), we make use of these formulas to perform the following steps.

Offline computation

  1. 1.

    Point validation Validate that \(P=(x_1,y_1) \in E_b(\mathbf{{F}}_p)\) \({\setminus } \{{\mathcal {O}}\}\) by checking that \(y_1^2=x_1^3-3x_1+b\). Otherwise, return false (see Algorithm 2).

  2. 2.

    Precomputation For a fixed window size \(2\le w <10\), compute \(v>0\) tables of \(2^{w-1}\) points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7). Convert all points in the lookup table to affine form. Online computation

  3. 3.

    Scalar validation Validate that the scalar \(k\in [1,r)\). Let the maximum bit-length of all valid scalars be \(t=\lceil \log _2(r)\rceil \).

  4. 4.

    Recoding Convert the scalar \(k\) to odd by replacing it with \(r-k\) (if even) and recode it into the mLSB-set representation (see Algorithm 8).

  5. 5.

    Evaluation Using the precomputed values from the offline precomputation, compute \(kP\) with exactly \(\lceil \frac{t}{w\cdot v}\rceil -1\) point doublings and \(v \lceil \frac{t}{w \cdot v}\rceil -1\) point additions.Footnote 4 All point additions are computed using the “complete masked” approach in Algorithm 18 in Appendix C.1. The final result is negated if the original value of \(k\) was even.

This approach is outlined in Algorithm 7 in Appendix A.

Proposition 2

When computing fixed-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof

Following the proof of Proposition 1, point doublings computed via Algorithm 10 do not fail for any rational points in \(E_b(\mathbf{{F}}_p)\) for any of the curves \(E_b\) in Table 1. Furthermore, Algorithm 10 also correctly computes doublings at the point at infinity, \({\mathcal {O}}\). Thus, no exceptions can arise in point doublings; and, since all online additions are implemented using the “complete” masking technique described in Appendix C.1, it follows that no exceptions can arise at any stage of the online computation (offline computations can also make use of this technique if necessary). \(\square \)

4.2 Twisted Edwards scalar multiplications

Let \({\mathcal {E}}_d/\mathbf{{F}}_p :-x^2+y^2=1+dx^2y^2\) be any of the twisted Edwards curves in Table 2, with \(\#E(\mathbf{{F}}_p)=4r\) for \(r\) prime. In a similar vein to [5, 34], we avoid small subgroup attacks by requiring all scalar multiplications to include a cofactor \(4\). Thus, let the integer \(\hat{k}\) be defined as \(\hat{k}:=4k\) with \(k \in [1,r)\), and let \(P = (x_1,y_1)\) be in \(\mathbf{{F}}_p \times \mathbf{{F}}_p\).

The variable-base scenario On input of \(\hat{k}\) and (variable) \(P = (x_1,y_1) \in \mathbf{{F}}_p \times \mathbf{{F}}_p\), we perform the following steps.

  1. 1.

    Validation Validate that \(\hat{k} \in [4\cdot 1, 4\cdot 2, \ldots , 4(r-1) ]\). Validate that \(P=(x_1,y_1) \in \mathcal {E}_d(\mathbf{{F}}_p){\setminus }\{ {\mathcal {O}}\}\) by checking that \(-x_1^2+y_1^2=1+dx_1^2y_1^2\) and that \(P \ne (0,1) = {\mathcal {O}}\) (see Algorithm 3). Otherwise, return false.

  2. 2.

    Clear torsion Compute \(Q \leftarrow [4]P\) using two consecutive doublings (as in Algorithm 3).

  3. 3.

    Revalidation Validate that (the projective point) \(Q\ne \mathcal {O}\). If not, reject.

  4. 4.

    Precomputation Compute the \(2^{w-2}\) odd, positive multiples \(\{ Q, 3Q, \ldots , (2^{w-1}-1)Q\}\) of \(Q\), and store them in a lookup table. This precomputation can be achieved using one point doubling and \(2^{w-2}-1\) point additionsFootnote 5 (see Algorithm 4).

  5. 5.

    Scalar recoding Using a window size of \(2\le w < 10\), convert the updated scalar \(k := \hat{k} / 4 \in [1,r-1]\) to odd by setting \(k\) to \(r-k\) (if even) and recode it into exactly \(\lceil \log _2(r)/(w-1)\rceil +1\) odd, signed, non-zero digits in \(\{\pm 1,\pm 3, \ldots , \pm (2^{w-1}-1)\}\) (see Algorithm 6).

  6. 6.

    Evaluation Compute \(\hat{k}P\) as \(kQ\), using exactly \((w-1)\lceil \log _2(r)/(w-1)\rceil \) point doublings and \(\lceil \log _2(r)/(w-1)\rceil \) point additions. Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed using the unified formula in [35], Eq. (5)]. The final result is negated if the original value of \(k\) was even.

This computation is given in Algorithm 1 in Appendix A.

Proposition 3

When computing variable-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 1 to implement the steps above, no exceptions occur.

Proof

The first three steps (validation, clear torsion, and revalidation) detailed in Sect. 4.2 ensure that the point \(Q\) has large prime order \(r\). Furthermore, only elements of \(\langle Q \rangle \) are encountered after the revalidation stage, meaning that Corollary 1 from [35] can be invoked to say that the additions in Algorithm 15 (from [35], but extended according to the representation suggested in [32]) will never fail to add points \(P\) and \(Q\) of odd order, except when \(P = Q\). This corollary also tells us that the formulas for point doubling in Algorithm 14 never fail for points of odd order. Similar to the addition formulas, these doubling formulas, which are from [7], are extended according to [32]. Thus, the proof from this point is identical to the proof of Proposition 1: we partition the elements in \(\langle Q \rangle {\setminus } \{ {\mathcal {O}} \}\) into \(S_\mathrm{odd}\) and \(S_\mathrm{even}\) to categorize the elements in the look-up table, and use this to show that the running value that is input into point additions can never be equal to an element in the look-up table, except possibly in the final addition, where we use the formula in [35], Eq. (5)], which is slightly slower, but is exception-free in \(\langle Q \rangle \). \(\square \)

The fixed-base scenario Let \(P = (x_1,y_1) \in \mathbf{{F}}_p\times \mathbf{{F}}_p\) be a fixed point and let \(\hat{k}= 4k\) be an integer scalar, which is a multiple of the cofactor \(4\). Then perform the following steps.

Offline computation

  1. 1.

    Validation Validate that \(\hat{k} \in [4\cdot 1, 4\cdot 2, \ldots , 4(r-1) ]\). Validate that \(P=(x_1,y_1) \in \mathcal {E}_d(\mathbf{{F}}_p){\setminus }\{ {\mathcal {O}}\}\) by checking that \(-x_1^2+y_1^2=1+dx_1^2y_1^2\) and that \(P \ne (0,1) = {\mathcal {O}}\) (see Algorithm 3). Otherwise, return false.

  2. 2.

    Clear torsion Compute \(Q \leftarrow [4]P\) using two consecutive doublings (see Algorithm 3).

  3. 3.

    Revalidation Validate that \(Q\ne \mathcal {O}\). If not, reject.

  4. 4.

    Precomputation For a fixed window size \(2\le w <10\), compute \(v>0\) tables of \(2^{w-1}\) points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7)—convert all points in the lookup table to affine form. Online computation.

  5. 5.

    Recoding Convert the updated scalar \(k:=\hat{k}/4\) to odd by setting \(k\) to \(r-k\) (if even) and recode it into the mLSB-set representation (see Algorithm 8).

  6. 6.

    Evaluation: Using the precomputed values from the offline precomputation, compute \(\hat{k}P\) as \(kQ\) with exactly \(\lceil \frac{t}{w\cdot v}\rceil -1\) point doublings and \(v \lceil \frac{t}{w \cdot v}\rceil -1\) point additions.Footnote 6 Every one of these additions is computed using the unified formulas from [35], Eq. (5)]. The final result is negated if the original value of \(k\) was even.

Algorithm 7 in Appendix A outlines this computation.

Proposition 4

When computing fixed-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof

As in the proof of Proposition 3, we start by noting that the (updated) point \(Q\) has odd order \(r\), and that we only compute on elements in \(\langle Q \rangle \). The only algorithm we use for online additions corresponds to the formulas in [35], Eq. (5)], which do not fail for any pair of inputs in \(\langle Q \rangle \). Additionally, the only algorithm we use for doublings is Algorithm 14 (from [7]), which is also exception-free on all inputs from \(\langle Q \rangle \). \(\square \)

4.3 The Montgomery ladder

Let \(E_A/\mathbf{{F}}_p :y^2=x^3+Ax^2+x\) be the Montgomery form of any of the curves in Table 2, with \(\#E_A(\mathbf{{F}}_p)=4r\), for \(r\) a large prime. Since the Montgomery ladder is not compatible with the recoding techniques discussed in Sect. 4, we take the following route to guarantee a fixed length scalar. For all \(k \in [1, r-1]\), we use the updated scalar \(\hat{k} = 4(\alpha r+k)\), where \(\alpha \) is the smallest positive integer such that \(\alpha r +1\) and \((\alpha +1) r -1\) have the same bitlength; \(\alpha \) is specific to \(r\), but for each of the curves in Table 2 we have \(\alpha \in \{1,2,3\}\). Note that scalar multiplication by \(\hat{k}\) corresponds to scalar multiplication by \(4k\) on \(E_A\), which thwarts small subgroup attacks in the same way as the twisted Edwards scalar multiplications in Sect. 4.2.

On input of \(\hat{k}\) and \(x_1 \in \mathbf{{F}}_p\), we perform the following steps.

  1. 1.

    Scalar validation First validate that \(\hat{k} \in 4{\mathbb {Z}}\), and then that the integer \(\hat{k}/4 \in [\alpha r+1,(\alpha +1)r-1]\). Otherwise, reject.

  2. 2.

    Evaluation Process the scalar by inputting \(\hat{k}\) and \((x_1 :1)\) into the standard \((X :Z)\)-only Montgomery ladder routine [47], §10], with constant \((A+2)/4\) in the addition formula. Since \(\hat{k} = 4(\alpha r+k)\), this can be done by inputting the fixed-length scalar \(\hat{k}/4=\alpha r+k\) and \((x_1 :1)\) into the Montgomery ladder to give \((X_1 :Z_1)\), before finishing with two repeated, standalone Montgomery doublings of \((X_1 :Z_1)\) to give \((\hat{X} :\hat{Z})= 4(X_1 :Z_1)\)

  3. 3.

    Normalize: If \(\hat{Z}=0\), return \({\mathcal {O}}\), otherwise return \(\hat{x}_1 = \hat{X}/\hat{Z}\).

Notice that there is no validation of the input coordinate \(x_1 \in \mathbf{{F}}_p\), i.e., that we do not check whether \(x_1^3+Ax_1^2+x_1\) is a square in \(\mathbf{{F}}_p\), so that \(x_1\) corresponds to a point (or points) on \(E_A\). Avoiding this check in the presence of twist-security is due to Bernstein (cf. [5]), since even if \(x_1\) corresponds to a point on the quadratic twist \(E_A'\), the output of the Montgomery ladder corresponds to a scalar multiplication on \(E_A'\), because scalar multiplications on both curves use the same constant \((A+2)/4\). In this case, multiplication by \(\hat{k} = 4(\alpha r+k)\) on \(E_A'\) no longer corresponds to the scalar \(4k\), but rather to the scalar \(4k'\), where \(k' \equiv (\alpha r +k) \mod r'\) for \(\#E_A'(\mathbf{{F}}_p)=4r'\). This is not a problem in practice since the cofactor of 4 still clears torsion on the twist, and the twist-security ensures that the discrete logarithm problem has a similar difficulty in \(E_A'(\mathbf{{F}}_p)\) as it does in \(E_A(\mathbf{{F}}_p)\). Following the arguments developed in [4] (see also [5], App. A–B]), it could be possible to prove that no exceptions can occur in Montgomery ladder implementations of the curves in Table 2 that follow Steps 1–3 above, subject to addressing the issues below.

It should first be pointed out that the lack of validation means that there are some scalar/point combinations which could produce exceptions. For example, suppose \(k\) is chosen as the unique integer less than \(r'\) such that \(k \equiv -\alpha r \mod r'\). If \(k\) is also less than \(r\), then \(\hat{k}:=4(\alpha r + k)\) is a valid scalar according to Step 1 above. But, if an unvalidated \(x\)-coordinate, say \(x_1'\), corresponds to a point \(P_1'\) on \(E_A'\), then \(\hat{k}P_1 = {\mathcal {O}}\), because \((\alpha r + k) \equiv 0 \mod r'\); note that outputting \({\mathcal {O}}\) in Step 3 above could leak information to an attacker. Furthermore, in practice these ladder implementations are often used in conjunction with non-ladder implementations on (most likely a twisted Edwards model of) the same curve—see Sect. 6. In such a scenario, the refined forms of the scalars in this section do not match the forms of the scalars in Sect. 4.2, so if the scalars above were to be used on the twisted Edwards form of \(E_A\), then Proposition 3 and Proposition 4 no longer provide any guarantees. More specifically, if an implementation synchronizes the inherently larger Montgomery ladder scalars above to also be used on the twisted Edwards curve, then the argument of \(\hat{k} \in [4, 8, \ldots , 4(r-1)]\) that was used in the proof of Proposition 3 no longer holds when \(\alpha > 0\). Roughly speaking, the fact that \(\hat{k}/4\) is now outside the range \([1,r-1]\) means that the running multiple of an input point can now reach the dangerous stage of a scalar multiplication (which we handle by using complete additions) before the final addition.

In the Montgomery ladder implementation of Curve25519 [5], and in the complementary Edwards “Ed25519” implementation [9], it seems that the above problems are overcome by restricting the set of permissible scalars to be of a lesser cardinality than the prime subgroup order. Namely, Curve25519 has \(r,r' > 2^{252}\), with all scalars being of the form \(\hat{k} = 8 \cdot (2^{251}+k)\) for \(k \in [0,2^{251}-1]\). As well as guaranteeing that all of the possible scalars \(\hat{k}\) have the same bitlength, this prevents the existence of a \(\hat{k}\) such that \(\hat{k} \equiv 0 \mod r\) or \(\hat{k} \equiv 0 \mod r'\). On the other hand, it also means that for a fixed base point \(P\) of order \(r\) on Ed25519, less than half of the elements in \(\langle P \rangle \) are possible outputs when computing scalar multiplications of \(P\).

As one potential alternative, we remark that a hybrid solution which uses both Montgomery and twisted Edwards scalar multiplications could parse scalars differently: \(k \in [0,r-1]\) could be modified to \(\hat{k}:=4(\alpha r + k)\) in the Montgomery implementation, but modified to \(\hat{k}:=4k\) in the twisted Edwards implementation. If, in addition, all \(x\)-coordinates were validated in Step 1 of the Montgomery ladder routine,Footnote 7 then this may well be enough to prove that all scalar multiplications compute correctly and without exception: Proposition 3 would then apply directly to the twisted Edwards part, while the techniques in [4, 5] could be used to prove the Montgomery ladder part.

5 Implementation results

To evaluate the performance of the selected curves, we developed a software libraryFootnote 8 that includes support for three scenarios: variable-base, fixed-base and double-scalar multiplication. The library can perform arithmetic on \(a=-1\) twisted Edwards, \(a=-3\) Weierstrass and Montgomery curves, and supports all of the new curves from Sect. 3, with exception of the Weierstrass curves with reduced bitlength (see Tables 1, 2). The implementation of the library is largely in the C-programming language with the modular arithmetic implemented in x64 assembly for Windows.

Taking the above into account, we remark that the purpose of the library is to allow the comparison and evaluation of a large number of curve options, using a generic design that is flexible, reduces code size, and eases maintenance effort. Nevertheless, the library achieves good performance in comparison with standalone implementations that are tailored towards speed records.

It is well known that it is non-trivial to create an efficient and secure implementation of cryptographic primitives (for use in elliptic curve cryptography). Complete formulas might avoid certain pitfalls to the programmer, but this can come at a performance cost. As illustrated in Sect. 4, and by our software library, it is possible to have efficient, constant-time, and exceptionless scalar multiplications with a reasonable easy implementation strategy.

Table 3 Experimental results for variable-base, fixed-base and double-scalar multiplication

Table 3 shows the performance details of scalar multiplication in the three scenarios of interest. Variable-base scalar multiplication is computed with the fixed-window method (see Algorithm 1 in Appendix A) using window width \(w=6\). Fixed-base scalar multiplication was computed using the mLSB-set method (see Algorithm 7 in Appendix A) using parameters \(w=5\) and \(v=4\) for the twisted Edwards curves at the 128-bit security level; all other cases use \(w=6\) and \(v=3\). These values correspond to precomputed tables of sizes: 6, 9 and 12 KB for Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively, and 6, 13.5 and 18 KB for twisted Edwards curves at the 128-, 192- and 256-bit security levels, respectively. Double scalar multiplication was computed using the \(w\)NAF method with interleaving (see Algorithm 9 in Appendix A) using window width \(w_1=6\) for the variable base and \(w_2=7\) for the fixed base. The latter corresponds to precomputed tables with sizes: 2, 3 and 4 KB for Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively, and 3, 4.5 and 6 KB for twisted Edwards curves at the 128-,192- and 256-bit security levels, respectively. The results (expressed in terms of computer cycles) were obtained by running and averaging \(10^4\) iterations of each computation on an Intel Core i7-2600 (Sandy Bridge) processor with Intel’s Turbo Boost and Hyper-Threading disabled. The variable- and fixed-base scalar multiplication routines have a constant running time which guards against various types of timing attacks [20, 38], including cache attacks [50] (e.g., see [19] in the asymmetric setting). This means that no conditional branches on secret data or secret indexes for table lookups are allowed in the implementations.

Our results suggest that reducing the size of the pseudo-Mersenne primes does not have a significant effect on the performance: below a factor \(1.04\) reduction of the running time at the expense of roughly half a bit of ECDLP security. However, using slightly smaller moduli in the setting of the Montgomery-friendly primes does pay off: a reduction of the running time by a factor \(1.20, 1.11\), and \(1.09\) at the \(128\)-, \(192\)-, and \(256\)-bit security level, respectively. This performance difference between pseudo-Mersenne and Montgomery-friendly primes can be explained by the fact that the final constant-time conditional subtraction in Montgomery multiplication can be omitted when reducing the modulus size appropriately. The size-reduced Montgomery-friendly primes are the best choice (with respect to performance) at the 128- and 192-bit security levels while the size-reduced pseudo-Mersenne prime is faster for the 256-bit security level. For full-word length moduli, Montgomery-friendly and pseudo-Mersenne primes achieve similar performance at the 128-bit security level, whereas full-word length pseudo-Mersenne moduli are the best option for the 192- and 256-bit security levels. The better performance of pseudo-Mersenne primes at high security levels can be explained by the inherent higher register pressure in our Montgomery-friendly implementations which results in more load and store operations for large moduli sizes. The faster arithmetic operations in the base field translate directly to optimizations in the different scenarios for the scalar multiplication.

In the setting of variable-base scalar multiplication the twisted Edwards implementation and the Montgomery ladder achieve similar performance at the 128 and 192-bit security levels. At the 256-bit security level the gap increases in favor of twisted Edwards which outperforms the Montgomery ladder by a factor 1.05.

Note that our best results using the twisted Edwards and Montgomery forms at the 128-bit security level are virtually equivalent to the state-of-the-art Montgomery ladder implementation of Curve25519 [5] (\(194,000\) cycles on the benchmark machine “sandy0” [13]). Given the significant level of code optimization applied on the Curve25519 implementation which includes full use of assembly for the curve and field arithmetic, this comparison demonstrates the high efficiency of the chosen 254-bit Montgomery-friendly prime.

The state-of-the-art implementation of the NIST P-256 curve [31] can compute a variable-base scalar multiplication in \(400,000\) cycles on a Sandy Bridge CPU. Our curve w-256-mont offers better security properties and results in a \(1.43\) times reduction of the running time compared to [31]. When switching from prime order Weierstrass curves using full size moduli to composite order twisted Edwards curves with size-reduced moduli one can expect a reduction in the running time by a factor between \(1.25\) and \(1.44\) at the price of a slight decrease in ECDLP security.

6 Real-world protocols

Although significant research has been devoted to optimize the most popular ECC operation (the variable-base scalar multiplication), in real-world cryptographic solutions it is often not as simple as computing just a single scalar multiplication with an unknown base. Cryptographic protocols typically require a combination of different types of scalar multiplications including fixed-, variable-base and multiple-scalar operations. In this section we study the TLS protocol, more specifically the computation of the TLS handshake using the ECDHE–ECDSA cipher suite. We outline the impact of using different curve and coordinate systems in practice.

Table 4 Cost estimates for the TLS handshake using the ECDHE–ECDSA cipher suite for different security levels where we consider the elliptic curve scalar multiplications

TLS with perfect forward secrecy Support for using elliptic curves in the TLS protocol is specified in RFC 4492 [14]. The cipher suites specified in this RFC use the elliptic curve Diffie–Hellman (ECDH) key exchange, whose keys may either be long-term or ephemeral. We focus our analysis on the latter case (denoted by ECDHE) since it offers perfect forward secrecy. Besides the usage of elliptic curves in the DH key exchange, TLS certificates contain a public key that the server uses to authenticate itself: this is an ECDSA public key for the case of the ECDHE–ECDSA cipher suite. The TLS handshake, using the ECDHE–ECDSA cipher suite, consists of three main components. The ECDSA signature generation (fixed-base scalar multiplication), ECDSA signature verification (double scalar multiplication), and ECDHE (one fixed- and one variable-base scalar multiplication).Footnote 9 We consider Weierstrass and twisted Edwards curves separately, with and without point compression. The cost of decompressing a point in Weierstrass and twisted Edwards form is stated in Table 7 (where we follow the approach described in [9] to decompress points on twisted Edwards curves).

When using Weierstrass curves the situation is not complicated: transmitting compressed points costs a single conversion while no additional cost is needed when transmitting uncompressed points. In the setting of twisted Edwards curves there are more possibilities. The simplest approach is to only use the Montgomery form; however, this is expensive since the Montgomery ladder cannot take advantage of the fixed-base setting. One might consider a hybrid solution: computing the fixed-base scalar multiplication using the birationally equivalent twisted Edwards curve while computing the variable-base scalar multiplication using the Montgomery ladder. In such a hybrid solution the protocol should specify if the coordinates are transmitted in (compressed) twisted Edwards or Montgomery coordinates (which are already in compressed form). When using such a hybrid solution in the setting of ECDHE, transmitting the points in Montgomery form is best (see Table 7). The cost for the conversion (between Montgomery and twisted Edwards) is roughly the same as when only using twisted Edwards curves and transmitting compressed points.

Table 4 gives the cost estimates for the separate components and total cost of the TLS handshake using the ECDHE–ECDSA cipher suite for different security levels. It includes the options with the best results for the cases of Weierstrass curves, twisted Edwards curves and the hybrid approach combining the use of the Montgomery ladder and twisted Edwards. The results suggest that the approach using only twisted Edwards achieves similar performance to the hybrid approach using the Montgomery ladder, while it avoids conversions between coordinate systems (the performance gap between both approaches is below 4 % in all the cases, compressed or uncompressed form). Furthermore, our Montgomery ladder implementations do not include the extra validation step discussed at the end of Sect. 4.3; if incorporated, this would incur additional overhead.

The results in Table 4 also show that the use of twisted Edwards for the ECDHE and full ECDHE–ECDSA computations are approximately a factor 1.46, 1.26 and 1.24 faster in comparison to the Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively. We also include the results from [31] when using NIST P-256. In [31] the fixed-base scalar multiplication is implemented using a relatively large (slightly over 150 KB) lookup table for the fixed-base scalar multiplication. It is unclear if this implementation accesses the table-lookup elements in a cache-attack resistant manner and if the dedicated addition formula used takes care of exceptions, and if so if this is done in constant time. This might explain the faster implementation results.

As a reference we also include the results for Ed25519 [9] (obtained from the “sandy0” benchmark machine [13]), which is a Schnorr-like signature scheme based on a twisted Edwards curve isomorphic to Curve25519. Note that [9] only computes signatures; when computing ECDH one could use the approach as described in [5] which uses the Montgomery ladder. In order to achieve perfect forward secrecy (ECDHE), the implementation can compute the fixed-base scalar multiplication using the Montgomery ladder (which is slow) or convert the point and compute the fixed-base scalar multiplication using the corresponding twisted Edwards curve (using a hybrid approach).

7 Conclusions

In this paper we have presented new elliptic curves for cryptography targeting the 128-, 192-, and 256-bit security levels. By considering different choices for the base field arithmetic, pseudo-Mersenne and Montgomery-friendly primes, we deterministically selected efficient twisted Edwards curves as well as traditional Weierstrass curves. Instead of resorting to the slower complete formulas, we show how to compute efficient scalar multiplications by using constant-time, exceptionless, dedicated group operations. For the cases in which they are not guaranteed to be exceptionless, we have proposed an efficient “complete” addition formula based on masking techniques for Weierstrass curves. Our implementation of the scalar multiplication in the three most-widely deployed scenarios show that our new backwards compatible Weierstrass curves offer enhanced security properties while improving the performance compared to the standard NIST Weierstrass curves. At the expense of at most a few bits of ECDLP security, our new twisted Edwards curves offer a performance increase of a factor 1.2–1.4 compared to our new Weierstrass curves. We demonstrated the potential cryptographic impact by showing cost estimates for these curves inside the TLS handshake protocol.