Selecting elliptic curves for cryptography: an efficiency and security analysis

Bos, Joppe W.; Costello, Craig; Longa, Patrick; Naehrig, Michael

doi:10.1007/s13389-015-0097-y

Regular Paper
Published: 01 May 2015

Volume 6, pages 259–286, (2016)
Journal of Cryptographic Engineering

Abstract

We select a set of elliptic curves for cryptography and analyze our selection from a performance and security perspective. This analysis complements recent curve proposals that suggest (twisted) Edwards curves by also considering the Weierstrass model. Working with both Montgomery-friendly and pseudo-Mersenne primes allows us to consider more possibilities which help to improve the overall efficiency of base field arithmetic. Our Weierstrass curves are backwards compatible with current implementations of prime order NIST curves, while providing improved efficiency and stronger security properties. We choose algorithms and explicit formulas to demonstrate that our curves support constant-time, exception-free scalar multiplications, thereby offering high practical security in cryptographic applications. Our implementation shows that variable-base scalar multiplication on the new Weierstrass curves at the 128-bit security level is about 1.4 times faster than the recent implementation record on the corresponding NIST curve. For practitioners who are willing to use a different curve model and sacrifice a few bits of security, we present a collection of twisted Edwards curves with particularly efficient arithmetic that are up to 1.42, 1.26 and 1.24 times faster than the new Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively. Finally, we discuss how these curves behave in a real-world protocol by considering different scalar multiplication scenarios in the transport layer security protocol. The proposed curves and the results of the analysis are intended to contribute to the recent efforts towards recommending new elliptic curves for Internet standards.

1 Introduction

The first release of a cryptographic standard specifying elliptic curves for use in practice dates back to 2000 [21]. Nowadays, roughly one out of ten systems on the publicly observable Internet offers cipher suites in the Secure Shell (SSH) and Transport Layer Security (TLS) protocols that contain elliptic-curve-based cryptographic algorithms [16]. Most elliptic curve standards recommend curves for different perceived security levels that are either defined over prime fields or binary extension fields; on the Internet, however, the deployed curves are mostly defined over prime fields [16]. This can be partially explained by the increasing skepticism towards the security of elliptic curves defined over binary extension fields (justified by recent progress on solving the discrete logarithm problem on such curves [26]). Therefore, in this work, we only consider elliptic curves defined over prime fields.

Recently, part of the cryptographic community has been looking for alternatives to the currently deployed elliptic curves that may offer better performance and provide stronger overall security (see for example an evaluation of recent curve candidates in [12]). Most notably, the TLS working group has issued a formal request to the Crypto Forum Research Group (CFRG) asking for recommendations for new elliptic curves. The urge to change curves has been fueled by the recently leaked NSA documents, which suggest the existence of a back door in the Dual Elliptic Curve Deterministic Random Bit Generator [55]. Although cryptographers have suspected this at least as early as in 2007 [52], these recent revelations have accelerated a controversy on whether the widely deployed NIST curves [57] should be replaced by curves with a verifiably deterministic generation. Besides such security concerns, there has been significant progress related to both efficiency and security since the initial standardization of elliptic curve cryptography. Notable examples are algorithms protected against certain side-channel attacks, different “special” prime shapes which allow faster modular arithmetic, and a larger set of curve models from which to choose. For example, Edwards [25] discovered an interesting normal form for elliptic curves, now called the Edwards model, which was introduced to cryptographic applications by Bernstein and Lange [11]. A generalization of this curve model, known as the twisted Edwards model [7], facilitates the most efficient curve arithmetic [35]. Such (twisted) Edwards curves also have other attractive properties: they may be selected to support a complete addition law and are compatible with the Montgomery model, which supports efficient Montgomery ladder computations [47]. 
However, twisted Edwards curves cannot have a prime number of rational points over the base field, and they are therefore incompatible with the prime-order Weierstrass curves used in all of the current cryptographic standards [21, 48, 57].

Related work

The NIST curves [57] have been included in numerous standards (e.g., [21, 48]) and are deployed in many security protocols. The most recent speed record on the NIST curve which aims to provide 128-bit security is due to Gueron and Krasnov [31]. Alternatives to the NIST curves have been suggested by the German working group Brainpool [24]; their curve choices followed additional security requirements, one of which demands verifiably pseudo-random curve generation. Another alternative curve has been proposed by Bernstein [5]; this is a Montgomery curve, called Curve25519, which allows efficient computation of ECDH using the Montgomery ladder at the 128-bit security level. It was later shown by Bernstein et al. [9] that a twisted Edwards curve, birationally equivalent to Curve25519, can be used for efficient elliptic curve signature generation and verification. Recently, Bernstein and Lange started a project to select and analyze secure elliptic curves for use in cryptography: see [12] for a list of the security assessments the project performs and the requirements it imposes. A range of curves, targeting different security levels, is also presented in [12]. Following this, several new curves satisfying the requirements from [12], which facilitate both the twisted Edwards and Montgomery form, were proposed by Aranha et al. [3].

Motivation and rationale

The new curves presented in [3, 12] are all efficient and secure elliptic curves ready to be used in cryptography. This prompts the question as to why we should perform an efficiency and security analysis for a set of new curves. It is our opinion that not all options for prime fields and elliptic curve models have been considered in the recent curve proposal projects (either because they are overlooked or do not fit the requirements set by the project). Our goal is to rigorously analyze all of these different aspects from both a security and efficiency perspective, in hope that this paper helps practitioners better understand (and correctly implement) the choices that lie in front of them. Abandoning a set of standard curves demands a judicious selection of new curves, since this cannot be done too frequently if widespread adoption is desired. In that light, it is our opinion that one should consider all of the options available. For example, in contrast to [3, 12], our selection includes prime order Weierstrass curves. Just as the almost-prime order twisted Edwards curves have their practical advantages, we argue that there are also benefits to choosing prime order Weierstrass curves: the absence of small torsion simplifies the point/input validation process, and (over a prime field of fixed length) does not sacrifice any bits of security with respect to attacks on the underlying elliptic curve discrete logarithm problem (ECDLP). In addition, such curves are backwards compatible with current implementations supporting NIST curves over prime fields (i.e., no changes are required in protocols), and could be integrated into existing implementations by simply changing the curve constant and (in some cases) field arithmetic.^{Footnote 1}

We investigate the selection of prime moduli that allow efficient modular arithmetic. As in [3, 5, 12, 15, 35, 41], we study pseudo-Mersenne primes of the form $2^{\alpha }-\gamma $, but also primes of the form $2^{\alpha }(2^{\beta }-\gamma )-1$ that can be used to accelerate Montgomery arithmetic [46] as used in [15, 32]. Following the deterministic selection requirement from [12], we pick two primes of each shape for a given targeted security level: one prime is selected to be slightly smaller than the other, which sacrifices a small amount of ECDLP security in favor of enhanced performance. Note that, as explained in Sect. 2, for practical considerations we require all primes to be congruent to $3$ modulo $4$. These primes are used to construct cryptographically suitable curves focusing on (arguably) the two most relevant curve models: short Weierstrass curves with the curve parameter $a$ set to $-3$ and twisted Edwards curves with the curve parameter $a$ set to $-1$. The prime order Weierstrass curves give full ECDLP security over prime fields of a fixed bitlength, while offering good practical performance. On the other hand, the twisted Edwards curves sacrifice a small amount of ECDLP security but facilitate the fastest realization of curve arithmetic [35]. Both types of curves are selected in a deterministic fashion (see Sect. 3 for the full details) and offer twist-security [5], a property which is useful in certain scenarios. We note that our prime and curve selection is meant to cover a wide range of options exhibiting attractive features. Nevertheless, there are other design alternatives that might offer different trade-offs between security, rigidity and performance on different platforms. We leave the investigation of other options as future work.

An important requirement for implementations of modern cryptographic algorithms is a constant runtime when the algorithm computes on secret data to guard against timing attacks [38]. In particular, this potential threat exists for two basic elliptic curve operations: variable-base and fixed-base scalar multiplication. One solution is to use a complete addition law. However, a complete addition law is typically less efficient than the dedicated formulas, which can fail for certain inputs. In Sect. 4 we outline another solution to this problem for the variable-base case. We show that our algorithms, which compute on secret data, can never run into exceptional cases (i.e., produce incorrect results) while using the faster dedicated formulas and ensuring a constant runtime (with the exception of the very last addition; see Sect. 4.1 for the details). Hence, this solution results in faster implementations compared to the complete solution. In the fixed-base case the situation is more complicated: most efficient algorithms in the literature may potentially run into exceptions. While the use of a complete addition formula suffices to solve the problem on twisted Edwards curves, the high cost of complete additions on Weierstrass curves would degrade performance significantly [18] (see Appendix C.1). To solve this problem, we propose a new formula that works for all possible inputs by exploiting masking techniques. This pseudo-complete addition requires the same number of multiplications and squarings as the unprotected dedicated addition formula and drastically reduces the overhead of protecting scalar multiplication. We comment that the formula is also useful in the context of secure, exception-free multi-scalar multiplications. The reader is referred to Appendix C.1 for more details on the new formula.

We do not claim full security against other attacks such as simple power analysis (SPA); this is left for future work. Nevertheless, we remark that all the selected algorithms have a regular structure as required when implementing countermeasures against certain simple side-channel attacks.

Summary of contributions

Analysis of a new set of deterministically selected prime-order Weierstrass curves (see Table 1) which are defined over pseudo-Mersenne and Montgomery-friendly primes whose bitlengths match those of the NIST primes. See Sects. 2 and 3.
Analysis of a new set of deterministically selected composite-order twisted Edwards curves (see Table 2 and Sect. 3). In contrast to existing curve proposals, the selected curves present (simultaneously) minimal parameter $d$ in the twisted Edwards form and minimal parameter $A$ in isogenous Montgomery form (minimal in absolute value). See Sect. 3.3.
A new, (pseudo-)complete addition algorithm for general curves in short Weierstrass form. This algorithm works for all pairs of inputs and its execution incurs only a small overhead compared to the dedicated addition law. See Sect. C.1.
We demonstrate how to use the scalar multiplication algorithms and prove that they become exception-free and facilitate constant-time implementations when used this way. This allows one to use the more efficient dedicated formulas whenever possible, resulting in an efficient and secure solution for elliptic curve scalar multiplication. See Sect. 4.
A comprehensive software implementation providing timings for various scenarios; this includes performance estimates for the above curves when used in the context of the TLS protocol. See Sects. 5 and 6.

Proposed curves

Tables 1 and 2 show the curves that we have chosen deterministically according to our security and efficiency criteria. The tables show the target security level, which gives a rough estimate for the desired security in each case. Curve names indicate the curve model [w for the Weierstrass model and ed for the (twisted) Edwards model], the bitlength of the underlying base field prime and the type of prime (mont for Montgomery-friendly and mers for pseudo-Mersenne primes). In Appendix D, we provide the trace of Frobenius $t$ for each curve, so the number of $\mathbf{{F}}_p$-rational points for the curve $E$ and its quadratic twist $E'$ can be computed as $\#E(\mathbf{{F}}_p) = p + 1 - t$ and $\#E'(\mathbf{{F}}_p)=p+1+t$. More details on the curve choices and their properties are given in Sect. 3.

Table 1 Summary of our chosen Weierstrass curves of the form $E_b/\mathbf{{F}}_p:y^2=x^3-3x+b$ defined over $\mathbf{{F}}_p$ with quadratic twist $E_b'/\mathbf{{F}}_p: y^2=x^3-3x-b$ and target security level $\lambda $

Table 2 Summary of our chosen twisted Edwards curves of the form $\mathcal {E}_d/\mathbf{{F}}_p: -x^2 + y^2 = 1 + dx^2y^2$ defined over $\mathbf{{F}}_p$, where $d = -(A-2)/(A+2)$, and the target security level is $\lambda $

2 Modular arithmetic: choosing primes

Over a prime field $\mathbf{{F}}_p$ (with $p>3$ prime), the computation of the elliptic curve group operation boils down to numerous computations modulo $p$. In this section, we outline the types of primes that we prefer for efficiency and security considerations, and discuss how the primes are uniquely determined from a fixed security level. We have not experimented with using a smaller radix system to accumulate the intermediate carries, at the cost of increasing the number of multiplications. We leave the investigation of such approaches as future work.

Primes of the form $2^\alpha -\gamma $

Selecting primes of a special form to enhance the performance of the modular reduction is not new. The primes standardized in the digital signature standard [57] have a special form allowing fast reduction based on the work by Solinas [53]. Even faster modular reduction can be achieved by selecting primes of the form $p = 2^\alpha -\gamma $, known as pseudo-Mersenne primes. In this case, the value $\alpha $ is determined by the security parameter and is typically a multiple of $64$ (or slightly smaller). The integer $\gamma $ is chosen to be a small positive integer, i.e., significantly smaller than $2^{32}$. Given two integers $x$ and $y$ such that $0\le x,y<2^\alpha -\gamma $, one can compute $x\cdot y \bmod (2^\alpha -\gamma )$ by first computing the product and writing this in a radix-$2^\alpha $ system as $x\cdot y=z_h\cdot 2^\alpha + z_\ell $. A first reduction step, based on the shape of the modulus, is $z = z_\ell +z_h\cdot \gamma \equiv z_h\cdot 2^\alpha + z_\ell \pmod {2^\alpha -\gamma }$, where $0\le z <(\gamma +1) 2^\alpha $. If this step is repeated, the result is such that $0\le z < 2^\alpha + \gamma ^2$, which can finally be brought into the desired range by applying an additional correction modulo $p$ using subtractions. A standard way of enhancing the performance is to use a redundant representation: instead of reducing $z$ to the range $[0,2^\alpha -\gamma )$, one can often more efficiently reduce $z$ to the range $[0,2^\alpha )$, or to the range $[0,2^{2s})$ if $\alpha $ is a few bits smaller than $2s$ (at a target security level of $s$ bits). The latter case can be optimized further by computing exclusively in such a redundant form and performing a sole correction at the end of the scalar multiplication.
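The two folding rounds and the final correction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the word-level implementation evaluated in the paper: the function names `fold` and `mul_mod` are hypothetical, Python integers are not constant-time, and a single final subtraction is assumed to suffice, which holds whenever $\gamma^2+\gamma \le p$.

```python
def fold(z, alpha, gamma):
    """One reduction round: z_h * 2^alpha + z_l  ->  z_l + z_h * gamma."""
    z_h, z_l = z >> alpha, z & ((1 << alpha) - 1)
    return z_l + z_h * gamma


def mul_mod(x, y, alpha, gamma):
    """Return x*y mod p for p = 2^alpha - gamma, assuming 0 <= x, y < p."""
    p = (1 << alpha) - gamma
    z = fold(x * y, alpha, gamma)   # now z < (gamma + 1) * 2^alpha
    z = fold(z, alpha, gamma)       # now z < 2^alpha + gamma^2
    if z >= p:                      # final correction by one subtraction
        z -= p
    return z
```

The congruence $2^\alpha \equiv \gamma \pmod p$ is the only property of the modulus that is used, so the routine returns the correct residue for any modulus of this shape with small $\gamma$; primality plays no role in the reduction itself.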

Given a security level of $s$ bits, we consider the parameter $\alpha \in \{2s,2s-1\}$. Taking $\alpha =2s$ makes the prime as large as possible, matching one of the requirements to achieve maximal ECDLP security at the $s$-bit security level. Taking $\alpha =2s-1$ sacrifices half a bit of ECDLP security in favor of potential enhancements in efficiency, as described above. Thus, fixing $s$ results in two possible values for $\alpha $ and subsequently two primes of the form $2^{\alpha }-\gamma $: for a fixed $\alpha $, we choose the smallest $\gamma $ such that $2^{\alpha }-\gamma $ is both prime and congruent to $3$ modulo $4$ (the rationale behind this congruence condition is discussed below). Following our curve selection criteria, the values $\gamma $ for the curves under analysis are always smaller than $2^{10}$, which makes them attractive for efficient implementation on 16-, 32- and 64-bit platforms.
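The deterministic choice of $\gamma$ can be illustrated with a short search. The sketch below is an assumption-laden stand-in for whatever tooling the authors used: `is_probable_prime` is a Miller-Rabin test with a fixed set of bases (probabilistic for numbers of this size, not a primality proof), and `smallest_gamma` is a hypothetical helper name.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test with fixed bases (a probable-prime test for large n)."""
    if n < 2:
        return False
    for q in bases:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True


def smallest_gamma(alpha):
    """Smallest gamma > 0 with 2^alpha - gamma (probably) prime and = 3 (mod 4)."""
    gamma = 1
    while True:
        p = (1 << alpha) - gamma
        if p % 4 == 3 and is_probable_prime(p):
            return gamma
        gamma += 1
```

For example, `smallest_gamma(8)` returns 5, since $2^8-1=255$ is composite and $2^8-5=251$ is a prime congruent to $3$ modulo $4$. For $\alpha\ge 2$ only values $\gamma\equiv 1 \pmod 4$ can satisfy the congruence condition, so the loop could step by 4; it is kept naive here for readability.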

Primes of the form $2^\alpha (2^\beta -\gamma )-1$

Another approach to select primes is inspired by Montgomery arithmetic [46]. The idea behind Montgomery multiplication is to replace the relatively expensive divisions by computationally inexpensive logical shifts when computing the modular reduction. Some computations (and storage) can be avoided when primes of the form $p=2^\alpha (2^\beta -\gamma )-1$ are used for positive integers $\alpha , \beta $ and $\gamma $ (cf. [1, 15, 32, 37, 39]). When the prime $p$ is two bits short of a multiple of the word size $w$ (i.e., $w\mid \alpha +\beta +2$), one can avoid a conditional subtraction in every multiplication [58].

There are different ways to construct Montgomery-friendly primes: for example, [32] prefers $\gamma $ to be a power of two, while [15] sets $\beta =64$ and $\gamma $ as small as possible to specifically target 64-bit platforms. We make choices of $\alpha , \beta $ and $\gamma $ such that the modular arithmetic can be implemented efficiently on a wide range of platforms. Given a security level of $s$ bits, we consider $\alpha =8\delta $ and $\beta \in \{2s-\alpha ,2s-2-\alpha \}$, and choose $\gamma $ and $\delta $ as the smallest positive integers such that $p=2^\alpha (2^\beta -\gamma )-1$ is prime and $\lceil \log _2(p)\rceil =2s$ (resp. $\lceil \log _2(p)\rceil =2s-2$) in the setting of $\beta =2s-\alpha $ (resp. $\beta =2s-2-\alpha $). We start with $\delta =1$ and increment it by 1 (if necessary) until $\gamma $ is found. For instance, for $s=192$ and $\beta =2s-\alpha $, we observe that $(\delta , \gamma ) = (1, 79)$ results in a prime which can be written as

$$\begin{aligned} 2^{376}(2^8-79)-1= & {} 2^{352} (2^{32}-2^{24}\cdot 79)-1 \\= & {} 2^{320}(2^{64}-2^{56}\cdot 79)-1, \end{aligned}$$

for usage on 8-, 32- and 64-bit platforms, respectively. This has the advantage that the reduction step, which has to be computed at every iteration inside the interleaved Montgomery algorithm, can be computed using only a multiply-and-add and an addition instruction. Note that, by construction, primes of this form are always congruent to $3$ modulo $4$.
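To make the benefit of this prime shape concrete, the following sketch checks the three equivalent representations above and runs a word-by-word Montgomery reduction for the 64-bit case. It relies on a standard observation rather than on code from the paper: since $p \equiv -1 \pmod{2^w}$, the usual Montgomery constant $\mu = -p^{-1} \bmod 2^w$ equals $1$, so the quotient digit is just the low word of the accumulator. The names `W`, `N` and `montgomery_reduce` are illustrative, and Python integers are not constant-time.

```python
W = 64                                   # word size in bits (assumed platform)
N = 6                                    # 384 bits = 6 words of 64 bits
p = 2**376 * (2**8 - 79) - 1             # the prime from the text (s = 192)

# The same modulus written for 32- and 64-bit words.
assert p == 2**352 * (2**32 - 2**24 * 79) - 1
assert p == 2**320 * (2**64 - 2**56 * 79) - 1


def montgomery_reduce(z, p=p, w=W, n=N):
    """Return z * 2^(-w*n) mod p for 0 <= z < p * 2^(w*n), assuming p = -1 mod 2^w."""
    mask = (1 << w) - 1
    for _ in range(n):
        q = z & mask                     # mu = 1, so q is simply the low word of z
        z = (z + q * p) >> w             # z + q*p = z - q = 0 (mod 2^w): exact shift
    if z >= p:                           # result is below 2p, one correction suffices
        z -= p
    return z
```

With $R = 2^{384}$, `montgomery_reduce(a * b)` equals $a\cdot b\cdot R^{-1} \bmod p$ for $0\le a,b<p$, which is the usual Montgomery product of operands kept in Montgomery form.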

Constant-time modular arithmetic

One of the measures to guard software implementations against various types of side-channel analysis such as timing attacks [38] is to ensure a constant running time. In practice, this often means writing code which does not contain branches depending on secret data. For instance, the interleaved Montgomery multiplication algorithm requires a conditional subtraction at the end. To remove this, we always compute the subtractions and select (mask) the correct value depending on the conditional flag. In the setting of primes of the shape $2^\alpha -\gamma $, one must always compute the worst-case number of reduction rounds in order to ensure constant runtime.
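The "always subtract, then select by mask" idea can be written down as follows. This is a sketch of the logic only: Python's arbitrary-precision integers give no timing guarantees, so a real implementation would express the same steps on fixed-width words in C or assembly. The helper name `cond_sub` is hypothetical.

```python
def cond_sub(z, p, nbits):
    """Return z - p if z >= p, else z, for 0 <= z < 2p and p < 2^nbits, branch-free."""
    d = z - p                        # the subtraction is always computed
    borrow = (d >> nbits) & 1        # 1 if z - p went negative, else 0
    mask = -borrow                   # all ones (-1) on borrow, 0 otherwise
    return (z & mask) | (d & ~mask)  # keep z on borrow, otherwise keep z - p
```

For instance, with $p=251$ and `nbits = 8`, the inputs 250, 251 and 300 map to 250, 0 and 49, respectively.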

Besides the “standard” modular operations, there is also the need for constant-time methods to compute the modular inversion and the modular square roots. In order to compute the inversion modulo a prime $p$, one can use Fermat’s little theorem: i.e., compute $a^{p-2}\equiv a^{-1} \pmod p$. Since our chosen primes all have a special shape, finding efficient addition chains for this exponentiation is not difficult. For the $n$-bit primes considered in this work, we found that we can always compute the modular inversion using at most $1.11\lceil \log _2(p)\rceil $ modular multiplications and modular squarings. If $p\equiv 3 \pmod 4$, then one can compute a modular square root $x$ (if it exists) of an element $a$ using $x\equiv a^{\frac{p+1}{4}}\pmod p$. Since this can be performed efficiently, and in constant time, we require all of our primes to be congruent to $3$ modulo $4$.
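Both exponentiations are easy to try out. The sketch below uses Python's built-in `pow` instead of the hand-tuned addition chains mentioned above, so it shows the formulas but neither the operation count nor constant-time behaviour. The modulus $2^{127}-1$ is chosen here only because it is a well-known prime congruent to $3$ modulo $4$; it is not one of the primes selected in this work.

```python
P = 2**127 - 1          # a Mersenne prime with P % 4 == 3, used for illustration only


def mod_inv(a, p=P):
    """Inverse of a nonzero a modulo a prime p via Fermat's little theorem."""
    return pow(a, p - 2, p)


def mod_sqrt(a, p=P):
    """Square root of a modulo a prime p = 3 (mod 4), or None if none exists."""
    x = pow(a, (p + 1) // 4, p)
    return x if (x * x - a) % p == 0 else None
```

The final check in `mod_sqrt` is needed because the exponentiation returns a value for every input: when $a$ is a non-residue, the candidate squares to $-a$ instead of $a$. For example, `mod_sqrt(P - 1)` returns `None`, since $-1$ is never a square modulo a prime congruent to $3$ modulo $4$.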

3 Curve selection

In this section we explain how the curves in Tables 1 and 2 were chosen based on the selection of primes that is outlined in Sect. 2. For each chosen prime $p \equiv 3\pmod 4$, we provide two curves: one is a prime order short Weierstrass curve, while the other is an almost-prime order twisted Edwards curve.

3.1 Curve selection for Weierstrass curves

For a fixed prime $p$, a specific curve $E_b: y^2 = x^3 - 3x + b$ is uniquely determined by the curve parameter $b \in \mathbf{{F}}_p \backslash \{\pm 2,0\}$. Note that, since $p \equiv 3 \pmod 4$, its non-trivial quadratic twist $E_b'$ has the curve equation $E_b': y^2 = x^3 - 3x - b$. In order to guarantee twist-security [5], we require both the group orders $r=\#E_b(\mathbf{{F}}_p)$ and $r'=\#E_b'(\mathbf{{F}}_p)$ to be prime. We have $r = p+1-t$ and $r'=p+1+t$ for $|t| \le 2\sqrt{p}$ and demand $|t| > 1$ because curves with $t \in \{0,1\}$ are weak. Thus, depending on the sign of the trace $t$, either $r>p, r'<p$ or $r<p, r'>p$. To ease implementation, we demand that $r<p$ for all curves, i.e., we choose the curve with positive trace. To leave no room for manipulating the curve choice, we select all curve parameters deterministically, namely by choosing the integer $b$ with the smallest absolute value that yields a curve with the above properties. Based on these considerations, the selection process is completely explained in accordance with the rigidity condition of [12]. Specifically, we search for a suitable coefficient $b$ by starting with $b=1$ and incrementing $b$ by one until both $r$ and $r'$ are prime. For each value of $b$, we use the Schoof–Elkies–Atkin (SEA) point counting algorithm [51] in Magma [17] to compute the trace $t$ of $E_b$, such that $r=p+1-t$ and $r'=p+1+t$. We use the implementation’s ‘early abort’ feature that abandons the computation when small factors are found either in the curve’s or the twist’s group order. Because of the curve model for $E_b'$, the search only considers positive values of $b$ and we select the sign of $b$ to ensure that $r<p$. The resulting curves are summarized in Table 1.
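The selection loop can be mimicked at toy scale. The sketch below replaces the SEA algorithm in Magma by naive point counting with Euler's criterion and replaces proper primality testing by trial division, so it is only usable for very small primes; the prime $1019$ in the usage note and the function names are assumptions made for illustration and have nothing to do with the curves in Table 1.

```python
def is_prime(n):
    """Trial division, adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def trace(p, b):
    """Trace of Frobenius of y^2 = x^3 - 3x + b over F_p by naive counting."""
    total = 0
    for x in range(p):
        v = (x * x * x - 3 * x + b) % p
        chi = pow(v, (p - 1) // 2, p)          # Euler's criterion: 0, 1 or p - 1
        total += -1 if chi == p - 1 else chi
    return -total                               # #E = p + 1 + total = p + 1 - t


def select_b(p):
    """Smallest |b| with prime curve and twist orders; the sign is set so that r < p."""
    for b in range(1, (p + 1) // 2):
        if b == 2:                              # b = +-2 gives a singular curve
            continue
        t = trace(p, b)
        if abs(t) <= 1:
            continue
        if is_prime(p + 1 - t) and is_prime(p + 1 + t):
            return (b, t) if t > 0 else (-b, -t)  # E_{-b} is the twist, with trace -t
    return None
```

Calling `select_b(1019)` returns a pair `(b, t)` with $t>1$, so that $r = p+1-t < p$ and $r' = p+1+t$ are both prime. Only positive $b$ are scanned because $E_{-b}$ is the quadratic twist of $E_b$ when $p\equiv 3 \pmod 4$, which is the same reason the search in the text considers positive values only.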

3.2 Curve selection for twisted Edwards (and Montgomery) curves

For a fixed prime $p$, a specific twisted Edwards curve $\mathcal {E}_d: -x^2 + y^2 = 1 + dx^2y^2$ is uniquely determined by the curve parameter $d \in \mathbf{{F}}_p \backslash \{0,-1\}$. Let $A= 2\frac{1-d}{d+1}$, and $B=-(A+2)$. Theorem 3.2 of [7] shows that the twisted Edwards curve $\mathcal {E}_d$ and the Montgomery curve $By^2=x^3+Ax^2+x$ are birationally equivalent. If $B$ is a square in $\mathbf{{F}}_p$ (which it is for all our curves), then $\mathcal {E}_d$ is birationally equivalent to $E_A :y^2 = x^3 + Ax^2 + x$. As for the Weierstrass curves, we demand $t>1$ to exclude the weak curves with $t\in \{0,1\}$ and to ensure that $4r<p$.

Ideally, it would be desirable to have a curve with minimal parameter $d$ in the twisted Edwards form and minimal parameter $A$ in the Montgomery form. Unfortunately, existing curve proposals have been forced to pick one form and optimize it at the expense of the other one. We show in Sect. 3.3 below, that a search minimizing the absolute value of the parameter $d$ would find curves with the same group orders for curve and twist, where the latter corresponds to $-(d+1)$. This means that a search for minimal absolute value of $d$ will always find positive $d$ first, which corresponds to negative $A$. Our search thus minimizes the absolute values of $A$ and $d$ at the same time.

For each fixed $p$, we start with $A=-6$ and search for $A \in 2+4\mathbf{{Z}}$ (subtracting 4 each time) until $\#E_A=4r$ and $\#E_A'=4r'$, where $r$ and $r'$ are both prime. Note that the discussion in Sect. 3.3 also shows that $B=-(A+2)$ is always a square in $\mathbf{{F}}_p$, which means that $E_A': y^2 = x^3 - Ax^2 + x$ is a model for the non-trivial quadratic twist of $E_A$. Again, for each $A$, we use the SEA algorithm [51] in Magma [17] to compute the trace $t$ of $E_A$, which determines $\#E_A=p+1-t$ and $\#E_A'=p+1+t$. Section 3.3 also shows that $A^2-4$ is non-square in $\mathbf{{F}}_p$, which simplifies notions of completeness on $E_A$ (see [5]). Furthermore, we check that the curve satisfies all conditions posed by [12]; if one of them is not met,^{Footnote 2} we continue with the next value for $A$. We note that the cofactors of $4$ are minimal when insisting on an $\mathbf{{F}}_p$-rational twisted Edwards and/or Montgomery form. The resulting curves are summarized in Table 2.

3.3 Correspondence between minimal $A$ and $d$ for twisted Edwards curves

Table 2 contains a column with values for the parameter $d_0 = -(A+2)/4$, which can be used for implementing twisted Edwards curves defined over our prime fields. The curve $\mathcal {E}_{d_0}/\mathbf{{F}}_p: -x^2 + y^2 = 1 + d_0x^2y^2$ has the same number of $\mathbf{{F}}_p$-rational points as the curve $\mathcal {E}_d/\mathbf{{F}}_p: -x^2 + y^2 = 1 + dx^2y^2$ with $d = -(A-2)/(A+2)$ and the Montgomery curve $E_A/\mathbf{{F}}_p: y^2=x^3+Ax^2+x$. Furthermore, the curve $\mathcal {E}_{-(d_0+1)}/\mathbf{{F}}_p: -x^2 + y^2 = 1 - (d_0+1)x^2y^2$ has the same number of $\mathbf{{F}}_p$-rational points as the quadratic twist $\mathcal {E}_d'$ and the quadratic twist $E_{-A}$. In this section, we show that this is true in general, and that therefore, the relation between $d_0$ and $A$ shows that the value $d_0$ is the minimal value for $d$ defining $\mathcal {E}_d$ such that all the criteria in our curve selection are satisfied if and only if $A$ is the minimal such value for the Montgomery form. This shows that it is not necessary to search for a new set of twisted Edwards curves if one wants to minimize the parameter $d$ instead of the Montgomery parameter $A$. One can simply use the curve defined by $d_0$.

The following lemma connects the two twisted Edwards curves $\mathcal {E}_d$ and $\mathcal {E}_{d_0}$ via an isogeny whenever $d_0 = -1/(d+1)$. It also gives a condition on $d_0$ which determines whether the map is defined over $\mathbf{{F}}_p$. If this is the case, both curves have the same number of $\mathbf{{F}}_p$-rational points.

Lemma 1

Let $\mathcal {E}_d: -x^2 + y^2 = 1 + dx^2y^2$ be a twisted Edwards curve defined over a prime field $\mathbf{{F}}_p$ and let $d_0 = -1/(d+1) \in \mathbf{{F}}_p$. Then there exists a $4$-isogeny $\phi : \mathcal {E}_d \rightarrow \mathcal {E}_{d_0}$. If $d_0$ is a square in $\mathbf{{F}}_p$, the isogeny is defined over $\mathbf{{F}}_p$, in particular $\#\mathcal {E}_d(\mathbf{{F}}_p) = \#\mathcal {E}_{d_0}(\mathbf{{F}}_p)$.

Proof

The isogeny $\phi $ is one of the isogenies described in Section 3 of [2]. That is, it is the composition of maps

$$\begin{aligned} \phi = \hat{\psi }_{-1,-1/(d+1)}\circ \sigma \circ \psi _{-1,d}. \end{aligned}$$

The map $\psi _{-1,d}$ is the $2$-isogeny $\psi _{-1,d}: \mathcal {E}_d \rightarrow L_{-d}$ to the Legendre form curve $L_{-d}: y^2 = x(x-1)(x+d)$ given in [2, Theorem 3.2], and $\hat{\psi }_{-1,-1/(d+1)}: L_{1/(d+1)} \rightarrow \mathcal {E}_{-1/(d+1)}$ is the dual of the corresponding isogeny for $1/(d+1)$. The map $\sigma $ is equal to the isomorphism $\sigma _2\sigma _1: L_{-d} \rightarrow L_{1/(d+1)}$ given in [2, Section 3.2]. The composition $\phi $ is defined over $\mathbf{{F}}_p$ if $d_0$ and thus $-(d+1)$ is a square in $\mathbf{{F}}_p$. This proves the lemma. $\square $

The next result uses the previous isogeny to show that the original curve $\mathcal {E}_d$ and its twist $\mathcal {E}_d'$ each have corresponding curves with small parameters $d_0$ and $-(d_0+1)$, respectively, which have the same number of $\mathbf{{F}}_p$-rational points, provided that both these small parameters are squares in $\mathbf{{F}}_p$.

Lemma 2

Let $A\in \mathbf{{F}}_p{\setminus }\{-2,2\}, d = -(A-2)/(A+2)$ and $d_0 = -(A+2)/4$ such that both $d_0$ and $-(d_0 + 1)$ are squares in $\mathbf{{F}}_p$. Then $\#\mathcal {E}_d(\mathbf{{F}}_p) = \#\mathcal {E}_{d_0}(\mathbf{{F}}_p)$. Moreover, $\#\mathcal {E}_d'(\mathbf{{F}}_p) = \#\mathcal {E}_{-(d_0+1)}(\mathbf{{F}}_p)$.

Proof

The first part follows from Lemma 1 because $d_0 = -1/(d+1)$. Since the twist $\mathcal {E}_d' = \mathcal {E}_{1/d}$, the second part follows from Lemma 1 with $d$ replaced by $1/d$, which means that $d_0$ is replaced by $-(d_0 + 1)$. $\square $
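Lemma 2 and the parameter relations behind it can be checked by brute force over a small field. The sketch below is a numerical sanity check under stated assumptions, not part of the proof: the toy prime $103$ and all function names are illustrative, and the point count of $\mathcal{E}_d$ is taken as the number of affine solutions plus the two $\mathbf{F}_p$-rational points at infinity of the completed curve, which is the situation for $a=-1$ and $p\equiv 3 \pmod 4$.

```python
p = 103                                    # toy prime with p = 3 (mod 4); an assumption


def chi(v):
    """Quadratic character of v in F_p: 1, -1 or 0."""
    c = pow(v % p, (p - 1) // 2, p)
    return -1 if c == p - 1 else c


def inv(v):
    """Inverse in F_p via Fermat's little theorem."""
    return pow(v % p, p - 2, p)


def count_edwards(d):
    """#E_d(F_p) for -x^2 + y^2 = 1 + d x^2 y^2: affine points plus 2 at infinity."""
    affine = 0
    for x in range(p):
        den = (1 - d * x * x) % p
        if den:                            # y^2 = (1 + x^2) / (1 - d x^2)
            affine += 1 + chi((1 + x * x) * den)
    return affine + 2


def count_montgomery(A):
    """#E_A(F_p) for y^2 = x^3 + A x^2 + x, including the point at infinity."""
    return p + 1 + sum(chi(x * x * x + A * x * x + x) for x in range(p))


def check_lemma2():
    """Check Lemma 2 for every admissible A in F_p; return how many A were checked."""
    checked = 0
    for A in range(p):
        if A in (2, p - 2):
            continue
        d = -(A - 2) * inv(A + 2) % p
        d0 = -(A + 2) * inv(4) % p
        if chi(d0) != 1 or chi(-(d0 + 1)) != 1:
            continue                       # hypotheses of Lemma 2 not met
        assert d0 == -inv(d + 1) % p       # d_0 = -1/(d+1)
        n = count_montgomery(A)
        assert count_edwards(d) == count_edwards(d0) == n
        twist = 2 * (p + 1) - n            # order of the quadratic twist
        assert count_edwards(inv(d)) == count_edwards(-(d0 + 1) % p) == twist
        checked += 1
    return checked
```

Running `check_lemma2()` raises no assertion error and returns the number of parameters $A$ for which both $d_0$ and $-(d_0+1)$ are squares, so for this toy field the curves $\mathcal{E}_d$, $\mathcal{E}_{d_0}$ and $E_A$ share one order, while $\mathcal{E}_{1/d}$ and $\mathcal{E}_{-(d_0+1)}$ share the order of the twist.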

Finally, we show that indeed our search criteria, in particular the facts that $p \equiv 3 \pmod 4$ and that both group orders are not divisible by $8$, imply that $d_0$ and $-(d_0+1)$ as given in our setting are squares in $\mathbf{{F}}_p$, which shows that the correspondence above holds.

Lemma 3

Let $p \equiv 3 \pmod 4, d_0\in \mathbf{{F}}_p$ and let $\mathcal {E}_{d_0}: -x^2 + y^2 = 1 + d_0x^2 y^2$ be a twisted Edwards curve such that $\#\mathcal {E}_{d_0}(\mathbf{{F}}_p) = 4r$ and $\#\mathcal {E}_{d_0}'(\mathbf{{F}}_p) = 4r'$ for primes $r$ and $r'$. Then $d_0$ and $-(d_0+1)$ are both squares in $\mathbf{{F}}_p$.

Proof

We first prove that $d_0$ is a square in $\mathbf{{F}}_p$. Assume that it is not a square. Section 3 in [8] provides an exhaustive description of all points of order $2$ and $4$ on a twisted Edwards curve. If $d_0$ is not a square, then $-1/d_0$ is a square because $p \equiv 3 \pmod 4$. Then the full $2$-torsion is defined over $\mathbf{{F}}_p$; it consists of the affine point $(0,-1)$ and two points at infinity $((1:0),(\pm \sqrt{-1/d_0}))$ (written as completed points in projective space ${\mathbb {P}}^1 \times {\mathbb {P}}^1$, see [8, Section 2.7]). Let $s\in \mathbf{{F}}_p$ with $s^2 = -1/d_0$. Then exactly one of $\pm s$ is a square; assume without loss of generality that it is $s$. Then this value gives $4$ affine points $(\pm \sqrt{s},\pm \sqrt{s})$ (signs chosen independently) of order $4$ defined over $\mathbf{{F}}_p$. The group structure of the $4$-torsion on $\mathcal {E}_{d_0}$ that is defined over $\mathbf{{F}}_p$ is thus $\mathbf{{Z}}_2 \times \mathbf{{Z}}_4$ and has order $8$. Therefore $8$ must divide $\#\mathcal {E}_{d_0}(\mathbf{{F}}_p)$, which contradicts our assumption that the group order is $4r$ for $r$ prime. Hence, $d_0$ is a square.

We know that the twist $\mathcal {E}'_{d_0}$ is birationally equivalent to $\mathcal {E}_{1/d_0}$, and we have already shown that $d_0$ is a square, so $1/d_0$ is a square. We can apply Lemma 1 with $d_0$ replaced by $1/d_0$, which means that $d=-(d_0 + 1)$, and obtain that $\#\mathcal {E}_{-(d_0 +1)}(\mathbf{{F}}_p) = \#\mathcal {E}_{1/d_0}(\mathbf{{F}}_p) = 4r'$. Now looking at the 4-torsion defined over $\mathbf{{F}}_p$ as above yields that $-(d_0 + 1)$ is a square in $\mathbf{{F}}_p$. $\square $
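Lemma 3 can be sanity-checked by brute force on toy parameters. The sketch below is illustrative only and is not part of the curve selection: the function names are hypothetical, the point count assumes that for $p \equiv 3 \pmod 4$ the completed curve has exactly two points at infinity (exactly one of $d_0$ and $-1/d_0$ is a square), and the twist order is taken as $2(p+1) - \#\mathcal {E}_{d_0}(\mathbf{{F}}_p)$. It also requires $r$ and $r'$ to be odd primes, which the divisibility-by-$8$ argument in the proof relies on.

```python
def is_square(v, p):
    """Euler's criterion: True iff v is a nonzero square modulo the odd prime p."""
    return pow(v % p, (p - 1) // 2, p) == 1


def is_odd_prime(n):
    """Trial division; sufficient for the toy sizes used here."""
    return n > 2 and n % 2 == 1 and all(n % q for q in range(3, int(n ** 0.5) + 1, 2))


def edwards_order(p, d):
    """Order of -x^2 + y^2 = 1 + d x^2 y^2 over F_p, assuming p = 3 (mod 4).

    Affine solutions are counted by brute force; the two points at infinity
    of the completed curve are added under the assumption stated above.
    """
    affine = sum(
        1
        for x in range(p)
        for y in range(p)
        if (-x * x + y * y - 1 - d * x * x * y * y) % p == 0
    )
    return affine + 2


def lemma3_candidates(p):
    """All d_0 with #E = 4r and #E' = 4r' for odd primes r and r'."""
    found = []
    for d in range(1, p - 1):  # skips d = 0 and d = -1, which are not valid curves
        n = edwards_order(p, d)
        n_twist = 2 * (p + 1) - n  # order of the quadratic twist
        if (n % 4 == 0 and n_twist % 4 == 0
                and is_odd_prime(n // 4) and is_odd_prime(n_twist // 4)):
            found.append(d)
    return found


for p in (11, 19, 23):  # toy primes with p = 3 (mod 4)
    for d0 in lemma3_candidates(p):
        # Lemma 3 predicts that both values are squares in F_p.
        assert is_square(d0, p) and is_square(-(d0 + 1), p)
```

For $p = 11$, the candidates found this way all have group order and twist order equal to $12 = 4 \cdot 3$.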

The minimality of ${\mathbf{d}_\mathbf{0}}$. All our selected twisted Edwards curves satisfy the conditions of the previous two lemmas. Therefore, one can choose to work with the isogenous curves defined by $d_0$ or $-(d_0+1)$, whichever is more convenient. The isogenous curves and their twists have the same orders as the original curves and their twists. Therefore all conditions required in the curve selection are satisfied with the added benefit of a small $d$-value.

We argue that $d_0$ is of minimal absolute value defining a curve that satisfies the search criteria. Assume that $A$ is a coefficient with minimal absolute value that yields a desired curve when minimizing for the Montgomery parameter (like the values for $A$ in our examples). A search that minimizes the absolute value of the parameter $d$ in the twisted Edwards model $\mathcal {E}_d$ must find the value $d_0$ (or $-(d_0+1)$) first since $A = -(4d_0 + 2)$. Without loss of generality, let $|d_0| < |d_0 + 1|$, i.e., $d_0 > 0$; otherwise interchange $d_0$ and $-(d_0+1)$. Indeed, assume that a $d_1$ with $|d_1|<|d_0|$ leads to a curve that satisfies all criteria. Let $A_1 = -(4d_1 + 2)$. By Lemma 3, $d_1$ and $-(d_1+1)$ are squares; then by Lemma 2, $\#\mathcal {E}_{d_1}(\mathbf{{F}}_p) = \#\mathcal {E}_{\hat{d}_1}(\mathbf{{F}}_p) = \#E_{A_1}(\mathbf{{F}}_p)$, where $\hat{d}_1 = -(A_1-2)/(A_1+2)$ and $\#\mathcal {E}_{d_1}'(\mathbf{{F}}_p) = \#\mathcal {E}_{-(d_1+1)}(\mathbf{{F}}_p) = \#\mathcal {E}_{1/\hat{d}_1}(\mathbf{{F}}_p) = \#E_{-A_1}(\mathbf{{F}}_p)$. This means that the curve $E_{A_1}$ satisfies the search criteria.

Since we fixed $d_0 > 0$, we have $A<0$. By assumption, we have $-A = 4d_0 + 2 = 4|d_0| + 2 >4|d_1|+2$. Now consider the two cases $d_1>0$ and $d_1<0$. If $d_1>0$, then $A_1 = -(4d_1+2)<0$, and $|A| = -A > 4|d_1| + 2 = 4d_1 + 2 = -A_1 = |A_1|$, contradicting the minimality of $A$. Similarly, if $d_1<0$, then $A_1>0$ and $|A| = -A > 4|d_1| + 2 > 4|d_1+1| + 2 = -4(d_1+1) + 2 = -(4d_1+2) = A_1 = |A_1|$, again a contradiction. Overall, this means that $d_0$ must be the coefficient with minimal absolute value.

3.4 Curve properties

In both families of curves, note that for primes of the form $2^{\alpha }-\gamma $, the bitlengths of $r$ and $r'$ differ by 1, since $|t| \gg \gamma $ in general; for primes of the form $2^{\alpha }(2^\beta -\gamma )-1$, the bitlengths of $r$ and $r'$ are always equal when $\gamma \ne 0$. The curves in Table 2 can be used in different curve models: in the twisted Edwards model, in the Montgomery model for implementing Montgomery ladders, and also in the original Edwards model allowing complete addition formulas [11]. The latter can be seen as follows. Since $p \equiv 3 \pmod 4$, $E_A$ is birationally equivalent to an Edwards curve by [7, Theorem 3.4]. Using the maps discussed in [7, Section 3], one can show that $E_A: y^2 = x^3 + Ax^2 + x$ is birationally equivalent to $\mathcal {E}_{-1/d}: x^2 + y^2 = 1 - (1/d)x^2y^2$. For all of our curves, $d$ is a square in $\mathbf{{F}}_p$, so $-1/d$ is not a square, which means that the addition law on $\mathcal {E}_{-1/d}$ is complete. All of the curves in Table 2 allow for an efficient map from a subset of their $\mathbf{{F}}_p$-rational points to bit strings of a certain length, such that they are indistinguishable from uniform random bitstrings of the same length (see [10], which is based on [29]). However, note that curves defined over pseudo-Mersenne primes are more suitable for achieving indistinguishability than those over Montgomery-friendly primes because for the latter primes $p$, the value $(p+1)/2$ is further away from a power of $2$ (see [10, §2.6]).

The prime-order Weierstrass curves presented in Table 1 are similar in their basic properties to the NIST curves, as they have the same curve model, share the parameter $a=-3$, and include prime fields of the same bit lengths as the ones for the NIST curves [57]. However, we stress that the curves in Table 1 do not allow any room for manipulations, which can be the case when the curve parameter $b$ is allowed to be chosen “randomly”. Our curves are twist-secure, do not allow transfers, and have large discriminants (notions used to guard against certain attacks; e.g., see [12]). The work in [56] shows that indistinguishability can also be achieved for our prime-order Weierstrass curves in Table 1; however, the resulting bit strings are twice as large as those that result from applying [10, 29] to the twisted Edwards curves in Table 2.

4 Efficient, constant-time, and exceptionless scalar multiplications

To protect against certain types of side-channel attacks [38], it is essential that scalar multiplications are computed in constant-time. This means that the running time of the algorithm for computing a scalar multiplication $kP$ must be independent of the scalar $k$ and the point $P$. Classical curve arithmetic formulas have exceptional cases, i.e., they do not work for all points. Having conditional statements in the code that check for these cases means the algorithms have a variable running time depending on different input cases, but simply leaving them out might lead to exceptional point attacks that produce wrong results or cause other implementation errors. In this section we outline how constant-time algorithms can be achieved efficiently for our chosen Weierstrass and twisted Edwards curves in two different settings: the variable- and fixed-base scenarios. The variable-base scenario refers to the case in which the base point $P$ can be different for each execution of the algorithm. In the fixed-base case, multiples of a public constant point can be precomputed, which allows different optimization possibilities. In Appendix A we present an algorithm for the double-scalar scenario, which carries out a computation of the form $k_1P_1 + k_2P_2$ (see Algorithm 9). This occurs for example in the verification of ECDSA signatures. In this setting the verification algorithm operates on public inputs only, and one can profit from more efficient variable-time algorithms since the implementation does not require side-channel protection or constant-time execution.

We discuss the various cases for implementing scalar multiplication for the different curve models and algorithm choices. We list all algorithms as pseudo-code in Appendix A (scalar multiplication, point validation, precomputation and recoding) and in Appendix B (point operations). The reader is referred to Appendix C for complete details on the selection of explicit formulas. Note that several of these algorithms contain if-statements, which are marked in the pseudo-code according to their nature. For example, some of these statements occur in algorithms that are only run on public inputs and do not need to run in constant time; some of them are implemented in constant time via masking techniques; and some of them are there merely to allow us to represent several algorithms in one pseudo-code algorithm environment and to re-use the different variants in different scenarios. As soon as a specific scenario is chosen, these statements are always executed under the same condition. The remaining if-statements are the ones that when implemented introduce data-dependent branches into the algorithms. They occur only in algorithms for point doubling, point addition and merged point doubling/addition, where they correspond to exceptions, i.e., the exceptional cases for which the given formulas are not valid. But, whenever the implementation needs to be constant-time, the conditions for entering these if-statements are always false such that they are never executed (and can be removed in the code). Below, we argue that indeed no exceptional cases occur and that the proposed algorithms can be implemented to run in constant time (when used as described in the algorithms in Appendix A). Note that the neutral element on Weierstrass curves is the point at infinity, i.e., the point $(0:1:0)$ in projective coordinates, while on twisted Edwards curves the neutral element is the rational point $(0,1)$, and in the Montgomery ladder the neutral element is $(X :Z) = (0 :0)$. 
In this paper, they are all denoted by $\mathcal {O}$.

4.1 Weierstrass scalar multiplications

Let $E_b/\mathbf{{F}}_p$ be any of the Weierstrass curves in Table 1, with $r=\#E_b(\mathbf{{F}}_p)$ prime. Let $k$ be an integer scalar and $P = (x_1,y_1) \in \mathbf{{F}}_p \times \mathbf{{F}}_p$. We consider the computation of efficient, constant-time and exception-free scalar multiplications in two scenarios.

The variable-base scenario. On input of the scalar $k$ and variable point $P = (x_1,y_1)$, perform the following steps.

1. Validation: Validate that $k\in [1,r)$ and that $P=(x_1,y_1) \in E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\}$ by checking that $y_1^2=x_1^3-3x_1+b$. Otherwise, return false (see Algorithm 2).
2. Precomputation: For a fixed window size $2\le w < 10$, compute the $2^{w-2}$ multiples $\{ P, 3P, \ldots , (2^{w-1}-1)P\}$ of $P$, and store them in a lookup table. This precomputation can be achieved using one point doubling and $2^{w-2}-1$ point additions^{Footnote 3} (see Algorithm 4).
3. Scalar recoding: Convert the scalar $k$ to odd by replacing $k$ with $r-k$ (if even) and recode it into exactly $\lceil \log _2(r)/(w-1)\rceil +1$ odd, signed, non-zero digits in $\{\pm 1,\pm 3, \ldots , \pm (2^{w-1}-1)\}$ (see Algorithm 6).
4. Evaluation: Compute $kP$ using a fixed window with the precomputed values from the previous step. This requires exactly $(w-1)\lceil \log _2(r)/(w-1)\rceil $ point doublings and $\lceil \log _2(r)/(w-1)\rceil $ point additions, or $(w-2)\lceil \log _2(r)/(w-1)\rceil + 1$ point doublings, $\lceil \log _2(r)/(w-1)\rceil - 1$ point doubling-additions and one addition when $w > 2$. Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed via a “complete masked” addition (see Appendix C.1). The final result is negated if the original value of $k$ was even.

This can be computed as outlined in Algorithm 1 in Appendix A.
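The scalar recoding step can be illustrated with a short sketch. Algorithm 6 is not reproduced in this section, so the code below uses one standard fixed-window recoding into odd signed digits that matches the digit set and digit count stated above; the function name is hypothetical, and Python integers do not run in constant time, so this shows only the arithmetic.

```python
def recode_odd_signed(k, r, w):
    """Recode k in [1, r) into t + 1 odd signed digits of absolute value < 2^(w-1).

    Returns (digits, negate): digits are least-significant first, and negate
    records whether k was even and therefore replaced by r - k.
    Assumes r is odd (e.g., a prime group order) and w >= 2.
    """
    assert 1 <= k < r and r % 2 == 1 and w >= 2
    negate = (k % 2 == 0)
    if negate:
        k = r - k  # r is odd, so r - k is odd
    # t = ceil(log2(r) / (w - 1)); bit_length equals ceil(log2(r)) for odd r > 1
    t = -(-r.bit_length() // (w - 1))
    digits = []
    for _ in range(t):
        d = (k % (1 << w)) - (1 << (w - 1))  # odd digit in (-2^(w-1), 2^(w-1))
        digits.append(d)
        k = (k - d) >> (w - 1)               # exact division; k stays odd and positive
    digits.append(k)                          # top digit, always positive
    return digits, negate


# Example with a toy odd modulus standing in for the group order r.
digits, negate = recode_odd_signed(k=618, r=1009, w=4)
value = sum(d * 2 ** (3 * i) for i, d in enumerate(digits))
assert value == 1009 - 618 and negate
```

Because the number of digits depends only on $r$ and $w$, the evaluation loop that consumes them has a fixed length for every valid scalar.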

Proposition 1

When computing variable-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 1 to implement the steps above, no exceptions occur.

Before proving the proposition, we fix notation to partition the non-zero points in a prime order subgroup of the group $E_b(\mathbf{{F}}_p)$. For a fixed point $P \in E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\}$, the map $[1, r) \rightarrow E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\},\ k \mapsto kP$ is a bijection. It induces a partition of $E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\} = S_\mathrm{odd} \cup S_\mathrm{even}$ into two equally sized sets, where $S_\mathrm{odd} = \{kP \mid k \in [1, r) \hbox { odd}\}$ and $S_\mathrm{even} = \{kP \mid k \in [1, r) \hbox { even}\}$. Let $T=\{ P, 3P, \ldots , (2^{w-1}-1)P\} \subset S_\mathrm{odd}$ and $T^{-1}=\{(r-1) P, (r-3)P, \ldots , (r-(2^{w-1}-1))P\} \subset S_\mathrm{even}$. The set $T^{-1}$ contains the inverses of the points in the set $T$.

Proof

To exclude any exceptions in the course of Algorithm 1, we consider all of its doubling, addition and merged doubling/addition operations. First of all, it is easy to see that all doubling and addition steps for building the look-up table are exception-free. Note that the look-up table consists exactly of the points in the set $T$ defined above. The precomputation as shown in Algorithm 4 starts by doubling $P$ with Algorithm 10. The algorithm works for the point at infinity $\mathcal {O}$ when defined as $(0:Y_1:0)$ with $Y_1 \ne 0$, but the case $P=\mathcal {O}$ is excluded by point validation, and it does not have any exceptions since there are no points of order $2$ in the group $E(\mathbf{{F}}_p)$. The points for the look-up table are then computed by adding $2P \in S_\mathrm{even}$ to points from $T\subset S_\mathrm{odd}$ only, i.e., the input points to the additions are always different and do not include $\mathcal {O}$. Also $-2P = (r-2)P$ is not among these points because $2^{w-1}-1 < r-2$ (note $2 \le w < 10$).

The operations in the evaluation stage depend on the recoding of the scalar $k$, which at this point in the algorithm satisfies $0 < k < r$. Let $t=\lceil \log _2(r)/(w-1)\rceil $, then with notation as in Algorithm 1, the scalar can be written as

$$\begin{aligned} k = \sum _{i=0}^t s_i |k_i| 2^{(w-1)i}, \end{aligned}$$

where $s_i \in \{-1,1\}$ and $k_i \in \mathbf{{Z}}$ with $0 < |k_i| < 2^{w-1}$. The recoding used here guarantees $k_t > 0$ such that $s_t = 1$ and $|k_t|=k_t$. Throughout the evaluation stage, the variable $Q$ is used to denote the running value during the algorithm. At any stage, there is some $z \in [0, r)$ such that $Q = zP$. Let $z_1>0$ and $z_2 = 2^{w-1}z_1 \pm z_0$ with $z_0\in \{1,3,\ldots ,2^{w-1}-1\}$, then $z_2 \ge z_1$. If $z_1 > 1$, we even have $z_2 > z_1$. This means that whenever a positive integer is doubled $w-1$ times and then an integer corresponding to one of the elements in the look-up table is either added or subtracted, the result cannot be smaller than the original integer. Thus, in the evaluation stage of Algorithm 1, after each sequence of $w-1$ doublings and one addition step, the value $z$ of the running point $Q$ cannot decrease.
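The inequality $z_2 \ge z_1$ can be checked directly. In the worst case the table value is subtracted and is as large as possible, $z_0 = 2^{w-1}-1$, so that

$$\begin{aligned} z_2 \ge 2^{w-1}z_1 - (2^{w-1}-1) = z_1 + (2^{w-1}-1)(z_1 - 1) \ge z_1, \end{aligned}$$

where the last step uses $z_1 \ge 1$ and $2^{w-1}-1 \ge 1$ for $w \ge 2$. The inequality is strict as soon as $z_1 > 1$.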

The evaluation stage begins with choosing an element from the lookup table $T$ and assigning it to $Q$. After the first assignment, we have $z \in \{1, 3, \ldots , 2^{w-1}-1\}$. All the doubling operations in Lines 11, 14 and 18 of Algorithm 1 are done using Algorithm 10. Therefore, for the same reasons as explained above there are no exceptions possible in these steps. The last addition in Line 19 is done with a complete addition formula and hence also does not have any exceptional cases. It now suffices to ensure that all remaining addition steps (i.e., in Lines 12 and 15) do not run into exceptions.

First, assume that an exceptional case occurs in one of the additions in Step 15, which computes $Q+R$ for $R\in T\cup T^{-1}$ using Algorithm 2. Note that none of the doubling steps can ever output $\mathcal {O}$ because there are no points of order $2$ and $\mathcal {O}$ is never input to any of them since the running value $Q$ always has $1<z<r$ for all points input to doubling steps prior to any of the additions in Step 15. Thus the only exceptional cases that could occur in this algorithm, are the cases where $Q = \pm R$. This means that either $Q\in T$ or $Q\in T^{-1}$. Since $Q$ is the output of a non-trivial doubling operation, we have $Q\in S_\mathrm{even}$ which excludes $Q\in T$ and means that $Q\in T^{-1}$. Therefore, $Q=zP$ with $z \ge r- (2^{w-1} - 1)$. After each addition in Step 15 there are always $w-1$ doublings that follow. Hence, the minimal value for $z$ that can occur after the exceptional addition and the following doublings is $2^{w-1} (r- 2(2^{w-1} - 1))$. The addition of a table element immediately after these doublings, can bring down this value to the minimal $z_{\min } = 2^{w-1} (r- 2(2^{w-1} - 1)) - (2^{w-1}-1) = 2^{w-1} r - (2^{w}+1)(2^{w-1} - 1)$. This value is larger than $r$, because otherwise, it follows that $r \le 2^{w}+1$, which is not true for any of our curves. Given the observation that a positive integer does not decrease after any sequence of $w-1$ doublings and a following addition of an integer corresponding to a look-up table element, the scalar $k$ cannot be reached any more as the final value for $z$ after the exceptional addition. This contradicts any exceptions in the additions of Step 15.
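The comparison of $z_{\min }$ with $r$ used above can be made explicit. Subtracting $r$ from $z_{\min }$ and factoring gives

$$\begin{aligned} z_{\min } - r = (2^{w-1}-1)\,r - (2^{w}+1)(2^{w-1}-1) = (2^{w-1}-1)\bigl (r - (2^{w}+1)\bigr ). \end{aligned}$$

Since $2^{w-1}-1 \ge 1$ for $w \ge 2$, the assumption $z_{\min } \le r$ would force $r \le 2^{w}+1$. For $w < 10$ this bound is at most $513$, far below any group order of cryptographic size.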

Next, assume that an exception occurs in one of the steps in Line 12 of Algorithm 1. This step is a merged doubling and addition step and is computed via Algorithm 11. The algorithm computes $2Q + R$ for $R\in T\cup T^{-1}$ as $(Q+R) + Q$. For the same reasons as above, the input point $Q$ cannot be equal to $\mathcal {O}$. Since $R\in T\cup T^{-1}$, we have $R\ne \mathcal {O}$. The first addition $Q+R$ could have the same exceptions as the additions in Step 15 treated in the previous paragraph. This means that an exception can only be $Q \in T^{-1}$ as above and again we look at the minimal value $z_{\min }$ after carrying out the exceptional addition, the addition of $Q$ and the following $w-1$ doublings and subsequent addition (also the steps including the merged doubling and addition algorithm can be treated as such). This value is $z_{\min } \ge 2^{w-1}\cdot (2r- 3(2^{w-1} - 1)) - (2^{w-1} - 1) = 2^{w}r - (3\cdot 2^{w-1}+1)(2^{w-1}-1)$. Again, this value is larger than $r$, because otherwise we would have $r\le 3\cdot 2^{w-1} + 1$, which does not hold for our curve parameters. As above this means that the scalar $k<r$ cannot be reached as the final value of $z$, contradicting any exception in the first addition in $(Q+R) + Q$. Finally, we assume that there is an exception in the second addition. We have already excluded $Q=\mathcal {O}$ and $Q+R = \mathcal {O}$. Hence, the only two possibilities for an exception are $Q+R = Q$ or $Q+R = -Q$. The first condition means that $R=\mathcal {O}$ which is not possible since $R\in T\cup T^{-1}$. We are thus left with the condition $2Q = -R$ and hence either $2Q \in T$ or $2Q \in T^{-1}$. Since $2Q \in S_\mathrm{even}$, it cannot be in $T$, which leaves $2Q \in T^{-1}$. This means that $2z \ge r-(2^{w-1} - 1)$. The minimal value $z_{\min }$ after the computation $(Q+R) + Q$ and the following $w-1$ doublings and another addition is $z_{\min }\ge 2^{w-1}(r-2(2^{w-1}-1)) - (2^{w-1}-1) = 2^{w-1}r - (2^w+1)(2^{w-1}-1)$. 
Again, this value is larger than $r$, leaving no way to achieve the scalar $k$ during the remaining computation. This excludes all exceptions in Line 12 and therefore all exceptions in Algorithm 1. $\square $

Given that the recoding always produces a fixed length for the scalar, this means that after a successful validation step, we do not execute any conditional statements.

The fixed-base scenario. In this setting, the point $P$ is fixed (e.g., as a public parameter of the system), so multiples of $P$ can be precomputed offline and used to speed up the online computation of $k P$. In terms of performance, it might be difficult to select the “optimal” size of the precomputed table. A larger table with more multiples of $P$ typically means a reduced number of elliptic curve operations at runtime, but such tables might result in cache misses, which can incur a performance penalty. Moreover, when one wants to extract elements from this table in a cache-attack resistant manner, one should access every element and mask out the correct value to avoid leaking access patterns. Hence, using a larger table implies an increased access cost for every table-lookup.
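The table access pattern described above, touching every entry and masking out all but the wanted one, can be sketched as follows. This is a minimal illustration with hypothetical helper names; Python integers are not constant-time, so the code only demonstrates the data flow that a C or assembly implementation would follow.

```python
def ct_is_equal(a, b, bits=32):
    """Return 1 if a == b else 0, using only arithmetic on `bits`-bit values."""
    mask = (1 << bits) - 1
    x = (a ^ b) & mask
    # (x | -x) has its top bit set exactly when x is non-zero.
    nonzero = ((x | (-x & mask)) >> (bits - 1)) & 1
    return 1 - nonzero


def ct_lookup(table, index):
    """Scan the whole table and keep only the entry stored at `index`."""
    result = [0] * len(table[0])
    for i, entry in enumerate(table):
        keep = -ct_is_equal(i, index)  # -1 (all ones) or 0
        for j, coord in enumerate(entry):
            result[j] |= coord & keep  # coord survives only when i == index
    return tuple(result)


# Toy table of "points" (x, y); real entries would be field elements.
table = [(11, 12), (21, 22), (31, 32), (41, 42)]
selected = ct_lookup(table, 2)
```

The cost of each lookup is linear in the table size, which is the access-cost trade-off mentioned above.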

This is not the only problem with large precomputed tables. As far as we know, one cannot show (for all inputs) that a current active point in the fixed-base scalar multiplication will not be the same (or have an opposite sign) as one of the many precomputed values. Although this might happen only with extremely low probability, such that honest parties may never encounter this by accident, active adversaries could manipulate such scalar/point combinations to force exceptions. This means that, unlike the variable-base multiplication, the implementation of the group law must cover exceptional cases. One solution is to use complete formulas (which have no exceptional cases). Unfortunately, the complete Weierstrass formulas from [18] (see Appendix C.1) are expensive compared to their incomplete counterparts, and using these would incur a much larger relative penalty than the complete formulas on (twisted) Edwards curves do. Another possible solution is to always compute two candidates for the addition, $C_1=2P$ and $C_2=P+L$, and select (in a constant-time manner) $C_1$ if $P=L$, $\mathcal {O}$ if $P=-L$, $L$ if $P=\mathcal {O}$, $P$ if $L=\mathcal {O}$, and $C_2$ otherwise. At first glance this approach seemingly increases the cost of an addition to be at least that of computing both an addition and a doubling. However, as noted by Chevallier-Mames et al. [22] for the case of binary affine operations, doubling and addition share several similarities in their formulas. By observing that these similarities naturally carry over to the projective formulas, we present a solution that achieves the required behavior explained above without increasing the number of modular multiplications or squarings required in a dedicated point addition (see Algorithms 18 and 19). The idea is to exploit the similarities in the doubling and addition routines by masking out the correct operands first, and using these as inputs to the arithmetic operations.

Hence, Algorithms 18 and 19 work for any input points, do not have any exceptional cases and have roughly the same run-time as their corresponding dedicated point additions. We note that Chevallier-Mames et al.’s approach tries to address a different problem and hence produces different formulas. In particular, they exploit similarities in the affine formulas to build (separate) routines for doubling and addition with the same pattern of field operations. This is done in order to eliminate differences in the power traces of the doubling and addition executions. In projective coordinates, however, the same approach would not work because of the extra operations required by addition in comparison to doubling. In that case, point operations are partitioned into smaller atomic units, each with the same pattern of field operations; thus, this technique does not exploit similarities between doubling and addition.
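For illustration, the simpler two-candidate strategy (compute both $C_1 = 2P$ and $C_2 = P+L$, then select by masking) can be sketched in affine coordinates. This is not Algorithm 18 or 19, which avoid the extra cost by masking operands first; the toy curve $y^2 = x^3 - 3x + 1$ over $\mathbf{{F}}_{23}$, the flag-based encoding of $\mathcal {O}$ and all function names are assumptions made for this example, and Python arithmetic is not constant-time.

```python
P_MOD = 23        # toy prime, for illustration only
A, B = -3, 1      # toy curve y^2 = x^3 - 3x + 1 over F_23
INF = (0, 0, 1)   # (x, y, inf_flag); the coordinates of O are placeholders


def inv(v):
    """Fermat inversion; maps 0 to 0 instead of raising, so no branch is needed."""
    return pow(v, P_MOD - 2, P_MOD)


def eq(a, b):
    """1 if a == b (mod p) else 0."""
    return int((a - b) % P_MOD == 0)


def select(bit, a, b):
    """Return a if bit == 1 else b, coordinate-wise, via masking."""
    mask = -bit  # -1 (all ones) or 0
    return tuple((u & mask) | (v & ~mask) for u, v in zip(a, b))


def masked_add(P, L):
    x1, y1, inf1 = P
    x2, y2, inf2 = L
    # Candidate C1 = 2P (tangent slope); garbage if y1 = 0, but then unselected.
    s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD
    x3 = (s * s - 2 * x1) % P_MOD
    C1 = (x3, (s * (x1 - x3) - y1) % P_MOD, 0)
    # Candidate C2 = P + L (chord slope); garbage if x1 = x2, but then unselected.
    s = (y2 - y1) * inv(x2 - x1) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    C2 = (x3, (s * (x1 - x3) - y1) % P_MOD, 0)
    same_x = eq(x1, x2)
    is_dbl = same_x & eq(y1, y2)    # P = L
    is_neg = same_x & eq(y1, -y2)   # P = -L
    R = select(is_dbl, C1, C2)
    R = select(is_neg, INF, R)      # P = -L takes priority (covers y = 0)
    R = select(inf2, P, R)          # L = O  ->  P
    R = select(inf1, L, R)          # P = O  ->  L
    return R
```

Both candidates are always computed, so the sequence of field operations does not depend on which case applies; only the final masked selections differ in their mask values.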

For a scalar $k$ and the fixed point $P = (x_1,y_1)$, we make use of these formulas to perform the following steps.

Offline computation

1. Point validation: Validate that $P=(x_1,y_1) \in E_b(\mathbf{{F}}_p) {\setminus } \{{\mathcal {O}}\}$ by checking that $y_1^2=x_1^3-3x_1+b$. Otherwise, return false (see Algorithm 2).
2. Precomputation: For a fixed window size $2\le w <10$, compute $v>0$ tables of $2^{w-1}$ points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7). Convert all points in the lookup table to affine form.

Online computation

3. Scalar validation: Validate that the scalar $k\in [1,r)$. Let the maximum bit-length of all valid scalars be $t=\lceil \log _2(r)\rceil $.
4. Recoding: Convert the scalar $k$ to odd by replacing it with $r-k$ (if even) and recode it into the mLSB-set representation (see Algorithm 8).
5. Evaluation: Using the precomputed values from the offline precomputation, compute $kP$ with exactly $\lceil \frac{t}{w\cdot v}\rceil -1$ point doublings and $v \lceil \frac{t}{w \cdot v}\rceil -1$ point additions.^{Footnote 4} All point additions are computed using the “complete masked” approach in Algorithm 18 in Appendix C.1. The final result is negated if the original value of $k$ was even.

This approach is outlined in Algorithm 7 in Appendix A.
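The operation counts stated for the variable-base and fixed-base scenarios can be tabulated with a small helper. This is a sketch with hypothetical function names; it takes the bit length $t = \lceil \log _2(r)\rceil $ as input and reports table sizes in points, without modelling the differing costs of doublings, dedicated additions and complete masked additions.

```python
from math import ceil


def variable_base_cost(t, w):
    """Fixed-window method: 2^(w-2) table points, (w-1)*ceil(t/(w-1)) doublings."""
    digits = ceil(t / (w - 1))
    return {"table": 2 ** (w - 2), "doublings": (w - 1) * digits, "additions": digits}


def fixed_base_cost(t, w, v):
    """mLSB-set comb method with v tables of 2^(w-1) points each."""
    e = ceil(t / (w * v))
    return {"table": v * 2 ** (w - 1), "doublings": e - 1, "additions": v * e - 1}


# Example for a 256-bit group order (parameter choices are illustrative).
print(variable_base_cost(256, 5))  # {'table': 8, 'doublings': 256, 'additions': 64}
print(fixed_base_cost(256, 4, 2))  # {'table': 16, 'doublings': 31, 'additions': 63}
```

The comparison makes the trade-off visible: the fixed-base method removes most doublings at the price of a larger table, and every lookup into that table must scan all of its entries.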

Proposition 2

When computing fixed-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof

Following the proof of Proposition 1, point doublings computed via Algorithm 10 do not fail for any rational points in $E_b(\mathbf{{F}}_p)$ for any of the curves $E_b$ in Table 1. Furthermore, Algorithm 10 also correctly computes doublings at the point at infinity, ${\mathcal {O}}$. Thus, no exceptions can arise in point doublings; and, since all online additions are implemented using the “complete” masking technique described in Appendix C.1, it follows that no exceptions can arise at any stage of the online computation (offline computations can also make use of this technique if necessary). $\square $

4.2 Twisted Edwards scalar multiplications

Let ${\mathcal {E}}_d/\mathbf{{F}}_p :-x^2+y^2=1+dx^2y^2$ be any of the twisted Edwards curves in Table 2, with $\#{\mathcal {E}}_d(\mathbf{{F}}_p)=4r$ for $r$ prime. In a similar vein to [5, 34], we avoid small subgroup attacks by requiring all scalar multiplications to include a cofactor $4$. Thus, let the integer $\hat{k}$ be defined as $\hat{k}:=4k$ with $k \in [1,r)$, and let $P = (x_1,y_1)$ be in $\mathbf{{F}}_p \times \mathbf{{F}}_p$.

The variable-base scenario. On input of $\hat{k}$ and (variable) $P = (x_1,y_1) \in \mathbf{{F}}_p \times \mathbf{{F}}_p$, we perform the following steps.

1. Validation: Validate that $\hat{k} \in [4\cdot 1, 4\cdot 2, \ldots , 4(r-1) ]$. Validate that $P=(x_1,y_1) \in \mathcal {E}_d(\mathbf{{F}}_p){\setminus }\{ {\mathcal {O}}\}$ by checking that $-x_1^2+y_1^2=1+dx_1^2y_1^2$ and that $P \ne (0,1) = {\mathcal {O}}$ (see Algorithm 3). Otherwise, return false.
2. Clear torsion: Compute $Q \leftarrow [4]P$ using two consecutive doublings (as in Algorithm 3).
3. Revalidation: Validate that (the projective point) $Q\ne \mathcal {O}$. If not, reject.
4. Precomputation: Compute the $2^{w-2}$ odd, positive multiples $\{ Q, 3Q, \ldots , (2^{w-1}-1)Q\}$ of $Q$, and store them in a lookup table. This precomputation can be achieved using one point doubling and $2^{w-2}-1$ point additions^{Footnote 5} (see Algorithm 4).
5. Scalar recoding: Using a window size of $2\le w < 10$, convert the updated scalar $k := \hat{k} / 4 \in [1,r-1]$ to odd by setting $k$ to $r-k$ (if even) and recode it into exactly $\lceil \log _2(r)/(w-1)\rceil +1$ odd, signed, non-zero digits in $\{\pm 1,\pm 3, \ldots , \pm (2^{w-1}-1)\}$ (see Algorithm 6).
6. Evaluation: Compute $\hat{k}P$ as $kQ$, using exactly $(w-1)\lceil \log _2(r)/(w-1)\rceil $ point doublings and $\lceil \log _2(r)/(w-1)\rceil $ point additions. Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed using the unified formula in [35, Eq. (5)]. The final result is negated if the original value of $k$ was even.

This computation is given in Algorithm 1 in Appendix A.
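The first three steps (validation, clearing torsion, revalidation) can be sketched on a toy curve. The parameters $p = 11$, $d = 5$ (a curve of order $12 = 4\cdot 3$) and the function names are assumptions chosen for illustration; the sketch uses the standard affine unified addition law for $a=-1$ rather than the projective formulas of Algorithm 3, and it is not constant-time.

```python
P_MOD, D = 11, 5   # toy curve -x^2 + y^2 = 1 + 5 x^2 y^2 over F_11, order 4 * 3
NEUTRAL = (0, 1)


def inv(v):
    """Fermat inversion modulo the toy prime."""
    return pow(v, P_MOD - 2, P_MOD)


def on_curve(P):
    x, y = P
    return (-x * x + y * y - 1 - D * x * x * y * y) % P_MOD == 0


def add(P, Q):
    """Unified twisted Edwards addition for a = -1 in affine coordinates."""
    x1, y1 = P
    x2, y2 = Q
    t = D * x1 * x2 * y1 * y2
    x3 = (x1 * y2 + y1 * x2) * inv(1 + t) % P_MOD
    y3 = (y1 * y2 + x1 * x2) * inv(1 - t) % P_MOD
    return (x3, y3)


def validate_and_clear_torsion(P):
    """Steps 1-3: return Q = [4]P, or None if the input is rejected."""
    if not on_curve(P) or P == NEUTRAL:
        return None                # Step 1: not a valid non-neutral point
    Q = add(P, P)
    Q = add(Q, Q)                  # Step 2: two consecutive doublings
    if Q == NEUTRAL:
        return None                # Step 3: P was a small-order point
    return Q


Q = validate_and_clear_torsion((1, 4))       # (1, 4) has order 12 on this curve
assert Q == (2, 3)
assert add(add(Q, Q), Q) == NEUTRAL          # Q has order r = 3
assert validate_and_clear_torsion((0, 10)) is None   # (0, -1) has order 2
```

After these steps every accepted $Q$ lies in the subgroup of prime order $r$, which is the property the exception-freeness argument relies on.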

Proposition 3

When computing variable-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 1 to implement the steps above, no exceptions occur.

Proof

The first three steps (validation, clear torsion, and revalidation) detailed in Sect. 4.2 ensure that the point $Q$ has large prime order $r$. Furthermore, only elements of $\langle Q \rangle $ are encountered after the revalidation stage, meaning that Corollary 1 from [35] can be invoked to say that the additions in Algorithm 15 (from [35], but extended according to the representation suggested in [32]) will never fail to add points $P$ and $Q$ of odd order, except when $P = Q$. This corollary also tells us that the formulas for point doubling in Algorithm 14 never fail for points of odd order. Similar to the addition formulas, these doubling formulas, which are from [7], are extended according to [32]. Thus, the proof from this point is identical to the proof of Proposition 1: we partition the elements in $\langle Q \rangle {\setminus } \{ {\mathcal {O}} \}$ into $S_\mathrm{odd}$ and $S_\mathrm{even}$ to categorize the elements in the look-up table, and use this to show that the running value that is input into point additions can never be equal to an element in the look-up table, except possibly in the final addition, where we use the formula in [35, Eq. (5)], which is slightly slower, but is exception-free in $\langle Q \rangle $. $\square $

The fixed-base scenario. Let $P = (x_1,y_1) \in \mathbf{{F}}_p\times \mathbf{{F}}_p$ be a fixed point and let $\hat{k}= 4k$ be an integer scalar, which is a multiple of the cofactor $4$. Then perform the following steps.

Offline computation

1. Validation: Validate that $\hat{k} \in [4\cdot 1, 4\cdot 2, \ldots , 4(r-1) ]$. Validate that $P=(x_1,y_1) \in \mathcal {E}_d(\mathbf{{F}}_p){\setminus }\{ {\mathcal {O}}\}$ by checking that $-x_1^2+y_1^2=1+dx_1^2y_1^2$ and that $P \ne (0,1) = {\mathcal {O}}$ (see Algorithm 3). Otherwise, return false.
2. Clear torsion: Compute $Q \leftarrow [4]P$ using two consecutive doublings (see Algorithm 3).
3. Revalidation: Validate that $Q\ne \mathcal {O}$. If not, reject.
4. Precomputation: For a fixed window size $2\le w <10$, compute $v>0$ tables of $2^{w-1}$ points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7), and convert all points in the lookup table to affine form.

Online computation

5. Recoding: Convert the updated scalar $k:=\hat{k}/4$ to odd by setting $k$ to $r-k$ (if even) and recode it into the mLSB-set representation (see Algorithm 8).
6. Evaluation: Using the precomputed values from the offline precomputation, compute $\hat{k}P$ as $kQ$ with exactly $\lceil \frac{t}{w\cdot v}\rceil -1$ point doublings and $v \lceil \frac{t}{w \cdot v}\rceil -1$ point additions.^{Footnote 6} Every one of these additions is computed using the unified formulas from [35, Eq. (5)]. The final result is negated if the original value of $k$ was even.

Algorithm 7 in Appendix A outlines this computation.

Proposition 4

When computing fixed-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof

As in the proof of Proposition 3, we start by noting that the (updated) point $Q$ has odd order $r$, and that we only compute on elements in $\langle Q \rangle $. The only algorithm we use for online additions corresponds to the formulas in [35, Eq. (5)], which do not fail for any pair of inputs in $\langle Q \rangle $. Additionally, the only algorithm we use for doublings is Algorithm 14 (from [7]), which is also exception-free on all inputs from $\langle Q \rangle $. $\square $

4.3 The Montgomery ladder

Let $E_A/\mathbf{{F}}_p :y^2=x^3+Ax^2+x$ be the Montgomery form of any of the curves in Table 2, with $\#E_A(\mathbf{{F}}_p)=4r$, for $r$ a large prime. Since the Montgomery ladder is not compatible with the recoding techniques discussed in Sect. 4, we take the following route to guarantee a fixed length scalar. For all $k \in [1, r-1]$, we use the updated scalar $\hat{k} = 4(\alpha r+k)$, where $\alpha $ is the smallest positive integer such that $\alpha r +1$ and $(\alpha +1) r -1$ have the same bitlength; $\alpha $ is specific to $r$, but for each of the curves in Table 2 we have $\alpha \in \{1,2,3\}$. Note that scalar multiplication by $\hat{k}$ corresponds to scalar multiplication by $4k$ on $E_A$, which thwarts small subgroup attacks in the same way as the twisted Edwards scalar multiplications in Sect. 4.2.

On input of $\hat{k}$ and $x_1 \in \mathbf{{F}}_p$, we perform the following steps.

1. Scalar validation: First validate that $\hat{k} \in 4{\mathbb {Z}}$, and then that the integer $\hat{k}/4 \in [\alpha r+1,(\alpha +1)r-1]$. Otherwise, reject.
2. Evaluation: Process the scalar by inputting $\hat{k}$ and $(x_1 :1)$ into the standard $(X :Z)$-only Montgomery ladder routine [47, §10], with constant $(A+2)/4$ in the addition formula. Since $\hat{k} = 4(\alpha r+k)$, this can be done by inputting the fixed-length scalar $\hat{k}/4=\alpha r+k$ and $(x_1 :1)$ into the Montgomery ladder to give $(X_1 :Z_1)$, before finishing with two repeated, standalone Montgomery doublings of $(X_1 :Z_1)$ to give $(\hat{X} :\hat{Z})= 4(X_1 :Z_1)$.
3. Normalization: If $\hat{Z}=0$, return ${\mathcal {O}}$; otherwise return $\hat{x}_1 = \hat{X}/\hat{Z}$.
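The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from our library: the function names are hypothetical, the textbook $(X :Z)$ doubling and differential-addition formulas are used, and the secret-dependent `if` would have to be replaced by a constant-time conditional swap in real code. Python's built-in `pow(x, -1, p)` (Python 3.8 or later) computes modular inverses.

```python
def xdbl(X, Z, a24, p):
    """(X:Z)-only Montgomery doubling, with a24 = (A+2)/4 mod p."""
    ss = (X + Z) ** 2 % p
    dd = (X - Z) ** 2 % p
    c = (ss - dd) % p                       # equals 4*X*Z
    return ss * dd % p, c * (dd + a24 * c) % p


def xadd(X2, Z2, X3, Z3, x1, p):
    """Differential addition; the difference of the inputs is (x1 : 1)."""
    u = (X2 - Z2) * (X3 + Z3) % p
    v = (X2 + Z2) * (X3 - Z3) % p
    return (u + v) ** 2 % p, x1 * (u - v) ** 2 % p


def ladder(m, x1, a24, p, nbits):
    """Process exactly nbits bits of m, most significant bit first."""
    R0, R1 = (1, 0), (x1 % p, 1)            # R0 = O, R1 = input point
    for i in reversed(range(nbits)):
        bit = (m >> i) & 1
        if bit:                             # NOT constant time (sketch only)
            R0, R1 = R1, R0
        R0, R1 = xdbl(*R0, a24, p), xadd(*R0, *R1, x1, p)
        if bit:
            R0, R1 = R1, R0
    return R0


def montgomery_scalar_mult(k_hat, x1, A, p, r, alpha):
    # Step 1: scalar validation.
    if k_hat % 4 != 0:
        raise ValueError("scalar is not a multiple of 4")
    m = k_hat // 4
    if not alpha * r + 1 <= m <= (alpha + 1) * r - 1:
        raise ValueError("scalar out of range")
    # Step 2: fixed-length ladder on k_hat/4, then two standalone doublings.
    a24 = (A + 2) * pow(4, -1, p) % p
    nbits = (alpha * r + 1).bit_length()
    X, Z = ladder(m, x1, a24, p, nbits)
    X, Z = xdbl(X, Z, a24, p)
    X, Z = xdbl(X, Z, a24, p)
    # Step 3: normalization; None stands for the point at infinity.
    if Z == 0:
        return None
    return X * pow(Z, -1, p) % p
```

Since the parameters of the curves in Table 2 are not repeated in this section, a convenient way to exercise the sketch is to plug in the public Curve25519 parameters as a stand-in and check that two parties derive the same shared $x$-coordinate.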

Notice that there is no validation of the input coordinate $x_1 \in \mathbf{{F}}_p$, i.e., we do not check whether $x_1^3+Ax_1^2+x_1$ is a square in $\mathbf{{F}}_p$, so that $x_1$ corresponds to a point (or points) on $E_A$. Avoiding this check in the presence of twist-security is due to Bernstein (cf. [5]), since even if $x_1$ corresponds to a point on the quadratic twist $E_A'$, the output of the Montgomery ladder corresponds to a scalar multiplication on $E_A'$, because scalar multiplications on both curves use the same constant $(A+2)/4$. In this case, multiplication by $\hat{k} = 4(\alpha r+k)$ on $E_A'$ no longer corresponds to the scalar $4k$, but rather to the scalar $4k'$, where $k' \equiv (\alpha r +k) \mod r'$ for $\#E_A'(\mathbf{{F}}_p)=4r'$. This is not a problem in practice since the cofactor of 4 still clears torsion on the twist, and the twist-security ensures that the discrete logarithm problem has a similar difficulty in $E_A'(\mathbf{{F}}_p)$ as it does in $E_A(\mathbf{{F}}_p)$. Following the arguments developed in [4] (see also [5, App. A–B]), it could be possible to prove that no exceptions can occur in Montgomery ladder implementations of the curves in Table 2 that follow Steps 1–3 above, subject to addressing the issues below.

It should first be pointed out that the lack of validation means that there are some scalar/point combinations which could produce exceptions. For example, suppose $k$ is chosen as the unique integer less than $r'$ such that $k \equiv -\alpha r \mod r'$. If $k$ is also less than $r$, then $\hat{k}:=4(\alpha r + k)$ is a valid scalar according to Step 1 above. But, if an unvalidated $x$-coordinate, say $x_1'$, corresponds to a point $P_1'$ on $E_A'$, then $\hat{k}P_1' = {\mathcal {O}}$, because $(\alpha r + k) \equiv 0 \mod r'$; note that outputting ${\mathcal {O}}$ in Step 3 above could leak information to an attacker. Furthermore, in practice these ladder implementations are often used in conjunction with non-ladder implementations on (most likely a twisted Edwards model of) the same curve (see Sect. 6). In such a scenario, the refined forms of the scalars in this section do not match the forms of the scalars in Sect. 4.2, so if the scalars above were to be used on the twisted Edwards form of $E_A$, then Proposition 3 and Proposition 4 no longer provide any guarantees. More specifically, if an implementation synchronizes the inherently larger Montgomery ladder scalars above to also be used on the twisted Edwards curve, then the argument of $\hat{k} \in [4, 8, \ldots , 4(r-1)]$ that was used in the proof of Proposition 3 no longer holds when $\alpha > 0$. Roughly speaking, the fact that $\hat{k}/4$ is now outside the range $[1,r-1]$ means that the running multiple of an input point can now reach the dangerous stage of a scalar multiplication (which we handle by using complete additions) before the final addition.

In the Montgomery ladder implementation of Curve25519 [5], and in the complementary Edwards “Ed25519” implementation [9], it seems that the above problems are overcome by restricting the set of permissible scalars to be of a lesser cardinality than the prime subgroup order. Namely, Curve25519 has $r,r' > 2^{252}$, with all scalars being of the form $\hat{k} = 8 \cdot (2^{251}+k)$ for $k \in [0,2^{251}-1]$. As well as guaranteeing that all of the possible scalars $\hat{k}$ have the same bitlength, this prevents the existence of a $\hat{k}$ such that $\hat{k} \equiv 0 \mod r$ or $\hat{k} \equiv 0 \mod r'$. On the other hand, it also means that for a fixed base point $P$ of order $r$ on Ed25519, fewer than half of the elements in $\langle P \rangle $ are possible outputs when computing scalar multiplications of $P$.
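For concreteness, the scalar form $\hat{k} = 8 \cdot (2^{251}+k)$ can be obtained from an arbitrary 256-bit integer by simple bit masking, a step often called clamping in Curve25519 implementations. The following Python sketch shows the idea; the function name `clamp` is ours.

```python
def clamp(n):
    """Force an integer into the form 8 * (2**251 + k) with 0 <= k < 2**251."""
    n &= (1 << 254) - 8      # keep bits 3..253: clears the low three bits and bits >= 254
    n |= 1 << 254            # set bit 254, i.e. add 8 * 2**251
    return n
```

Every output is a multiple of 8 with bitlength exactly 255, which matches the two properties discussed above.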

As one potential alternative, we remark that a hybrid solution which uses both Montgomery and twisted Edwards scalar multiplications could parse scalars differently: $k \in [0,r-1]$ could be modified to $\hat{k}:=4(\alpha r + k)$ in the Montgomery implementation, but modified to $\hat{k}:=4k$ in the twisted Edwards implementation. If, in addition, all $x$-coordinates were validated in Step 1 of the Montgomery ladder routine,^{Footnote 7} then this may well be enough to prove that all scalar multiplications compute correctly and without exception: Proposition 3 would then apply directly to the twisted Edwards part, while the techniques in [4, 5] could be used to prove the Montgomery ladder part.
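The $x$-coordinate validation mentioned above amounts to a single application of Euler's criterion to $x_1^3+Ax_1^2+x_1$. A minimal Python sketch follows; the function name and the toy parameters used to exercise it ($p=13$, $A=3$) are hypothetical and unrelated to the curves in Table 2.

```python
def x_is_on_curve(x1, A, p):
    """Return False iff x1^3 + A*x1^2 + x1 is a non-square modulo the odd prime p.

    Uses Euler's criterion: a non-zero non-square z satisfies
    z^((p-1)/2) = -1 (mod p).
    """
    rhs = (pow(x1, 3, p) + A * pow(x1, 2, p) + x1) % p
    return pow(rhs, (p - 1) // 2, p) != p - 1
```

With the toy parameters, $x_1=2$ gives $2^3+3\cdot 2^2+2 \equiv 9 \pmod{13}$, a square, so it is accepted, whereas $x_1=1$ gives $5$, a non-square, so it is rejected.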

5 Implementation results

To evaluate the performance of the selected curves, we developed a software library^{Footnote 8} that includes support for three scenarios: variable-base, fixed-base and double-scalar multiplication. The library can perform arithmetic on $a=-1$ twisted Edwards, $a=-3$ Weierstrass and Montgomery curves, and supports all of the new curves from Sect. 3, with the exception of the Weierstrass curves with reduced bitlength (see Tables 1, 2). The library is implemented largely in the C programming language, with the modular arithmetic implemented in x64 assembly for Windows.

Taking the above into account, we remark that the purpose of the library is to allow the comparison and evaluation of a large number of curve options, using a generic design that is flexible, reduces code size, and eases maintenance effort. Nevertheless, the library achieves good performance in comparison with standalone implementations that are tailored towards speed records.

It is well known that it is non-trivial to create an efficient and secure implementation of cryptographic primitives (for use in elliptic curve cryptography). Complete formulas might help the programmer avoid certain pitfalls, but this can come at a performance cost. As illustrated in Sect. 4, and by our software library, it is possible to have efficient, constant-time, and exceptionless scalar multiplications with a reasonably easy implementation strategy.

Table 3 Experimental results for variable-base, fixed-base and double-scalar multiplication


Table 3 shows the performance details of scalar multiplication in the three scenarios of interest. Variable-base scalar multiplication is computed with the fixed-window method (see Algorithm 1 in Appendix A) using window width $w=6$. Fixed-base scalar multiplication was computed using the mLSB-set method (see Algorithm 7 in Appendix A) using parameters $w=5$ and $v=4$ for the twisted Edwards curves at the 128-bit security level; all other cases use $w=6$ and $v=3$. These values correspond to precomputed tables of sizes: 6, 9 and 12 KB for Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively, and 6, 13.5 and 18 KB for twisted Edwards curves at the 128-, 192- and 256-bit security levels, respectively. Double-scalar multiplication was computed using the $w$NAF method with interleaving (see Algorithm 9 in Appendix A) using window width $w_1=6$ for the variable base and $w_2=7$ for the fixed base. The latter corresponds to precomputed tables with sizes: 2, 3 and 4 KB for Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively, and 3, 4.5 and 6 KB for twisted Edwards curves at the 128-, 192- and 256-bit security levels, respectively. The results (expressed in terms of computer cycles) were obtained by running and averaging $10^4$ iterations of each computation on an Intel Core i7-2600 (Sandy Bridge) processor with Intel’s Turbo Boost and Hyper-Threading disabled. The variable- and fixed-base scalar multiplication routines have a constant running time, which guards against various types of timing attacks [20, 38], including cache attacks [50] (e.g., see [19] in the asymmetric setting). This means that no conditional branches on secret data or secret indexes for table lookups are allowed in the implementations.
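The table sizes quoted above can be reproduced with a short calculation. The sketch below assumes (these are our assumptions, chosen because they are consistent with the reported sizes) that field elements occupy 32, 48 and 64 bytes at the three security levels, that a precomputed Weierstrass point is stored with two coordinates and a twisted Edwards point with three, that the mLSB-set method stores $v \cdot 2^{w-1}$ points, and that the $w$NAF table for the fixed base stores $2^{w_2-2}$ points.

```python
FIELD_BYTES = {128: 32, 192: 48, 256: 64}    # assumed bytes per field element
LEVELS = (128, 192, 256)


def table_kb(points, coords_per_point, field_bytes):
    """Size of a precomputed table in KB (1 KB = 1024 bytes)."""
    return points * coords_per_point * field_bytes / 1024


def mlsb_points(w, v):
    return v * 2 ** (w - 1)                  # points stored by the mLSB-set method


def wnaf_points(w):
    return 2 ** (w - 2)                      # odd multiples stored for width-w NAF


# Fixed-base tables: Weierstrass uses (w, v) = (6, 3) at every level.
fixed_weierstrass = [table_kb(mlsb_points(6, 3), 2, FIELD_BYTES[s]) for s in LEVELS]
# Twisted Edwards uses (5, 4) at the 128-bit level and (6, 3) otherwise.
fixed_edwards = [
    table_kb(mlsb_points(5, 4), 3, FIELD_BYTES[128]),
    table_kb(mlsb_points(6, 3), 3, FIELD_BYTES[192]),
    table_kb(mlsb_points(6, 3), 3, FIELD_BYTES[256]),
]
# Double-scalar tables for the fixed base, with w2 = 7.
double_weierstrass = [table_kb(wnaf_points(7), 2, FIELD_BYTES[s]) for s in LEVELS]
double_edwards = [table_kb(wnaf_points(7), 3, FIELD_BYTES[s]) for s in LEVELS]
```

Under these assumptions the four lists evaluate to 6, 9, 12 KB; 6, 13.5, 18 KB; 2, 3, 4 KB; and 3, 4.5, 6 KB, in agreement with the figures given in the text.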

Our results suggest that reducing the size of the pseudo-Mersenne primes does not have a significant effect on the performance: the running time is reduced by a factor below $1.04$, at the expense of roughly half a bit of ECDLP security. However, using slightly smaller moduli in the setting of the Montgomery-friendly primes does pay off: the running time is reduced by a factor of $1.20$, $1.11$ and $1.09$ at the $128$-, $192$- and $256$-bit security levels, respectively. This performance difference between pseudo-Mersenne and Montgomery-friendly primes can be explained by the fact that the final constant-time conditional subtraction in Montgomery multiplication can be omitted when reducing the modulus size appropriately. The size-reduced Montgomery-friendly primes are the best choice (with respect to performance) at the 128- and 192-bit security levels, while the size-reduced pseudo-Mersenne prime is faster for the 256-bit security level. For full-word length moduli, Montgomery-friendly and pseudo-Mersenne primes achieve similar performance at the 128-bit security level, whereas full-word length pseudo-Mersenne moduli are the best option for the 192- and 256-bit security levels. The better performance of pseudo-Mersenne primes at high security levels can be explained by the inherently higher register pressure in our Montgomery-friendly implementations, which results in more load and store operations for large moduli sizes. The faster arithmetic operations in the base field translate directly to optimizations in the different scenarios for the scalar multiplication.
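To illustrate why a slightly smaller modulus lets the final subtraction be dropped, the following Python sketch implements a textbook Montgomery multiplication that keeps operands in $[0, 2N)$. It relies on a standard bound: if $4N < R$ and both inputs are below $2N$, then $(ab + mN)/R < 4N^2/R + N < 2N$. The modulus used to exercise it is an arbitrary odd number, not one of the primes selected in this paper, and the exact bounds used in our library may differ.

```python
def mont_mul_no_final_sub(a, b, N, R_bits):
    """Return a*b*R^{-1} mod N as a representative in [0, 2N), R = 2**R_bits.

    Assumes N is odd, 4*N < R, and 0 <= a, b < 2*N. Under these assumptions
    no final conditional subtraction is needed to stay below 2*N.
    """
    R = 1 << R_bits
    n_prime = (-pow(N, -1, R)) % R           # -N^{-1} mod R
    t = a * b
    m = (t * n_prime) % R
    return (t + m * N) >> R_bits             # t + m*N is divisible by R
```

Results can be fed back into further multiplications unchanged; a single reduction into $[0, N)$ is only required at the very end of a computation.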

In the setting of variable-base scalar multiplication, the twisted Edwards implementation and the Montgomery ladder achieve similar performance at the 128- and 192-bit security levels. At the 256-bit security level the gap increases in favor of twisted Edwards, which outperforms the Montgomery ladder by a factor of 1.05.

Note that our best results using the twisted Edwards and Montgomery forms at the 128-bit security level are virtually equivalent to the state-of-the-art Montgomery ladder implementation of Curve25519 [5] ($194,000$ cycles on the benchmark machine “sandy0” [13]). Given the significant level of code optimization applied on the Curve25519 implementation which includes full use of assembly for the curve and field arithmetic, this comparison demonstrates the high efficiency of the chosen 254-bit Montgomery-friendly prime.

The state-of-the-art implementation of the NIST P-256 curve [31] can compute a variable-base scalar multiplication in $400,000$ cycles on a Sandy Bridge CPU. Our curve w-256-mont offers better security properties and results in a $1.43$ times reduction of the running time compared to [31]. When switching from prime order Weierstrass curves using full size moduli to composite order twisted Edwards curves with size-reduced moduli one can expect a reduction in the running time by a factor between $1.25$ and $1.44$ at the price of a slight decrease in ECDLP security.

6 Real-world protocols

Although significant research has been devoted to optimize the most popular ECC operation (the variable-base scalar multiplication), in real-world cryptographic solutions it is often not as simple as computing just a single scalar multiplication with an unknown base. Cryptographic protocols typically require a combination of different types of scalar multiplications including fixed-, variable-base and multiple-scalar operations. In this section we study the TLS protocol, more specifically the computation of the TLS handshake using the ECDHE–ECDSA cipher suite. We outline the impact of using different curve and coordinate systems in practice.

Table 4 Cost estimates for the TLS handshake using the ECDHE–ECDSA cipher suite for different security levels where we consider the elliptic curve scalar multiplications


TLS with perfect forward secrecy. Support for using elliptic curves in the TLS protocol is specified in RFC 4492 [14]. The cipher suites specified in this RFC use the elliptic curve Diffie–Hellman (ECDH) key exchange, whose keys may either be long-term or ephemeral. We focus our analysis on the latter case (denoted by ECDHE) since it offers perfect forward secrecy. Besides the usage of elliptic curves in the DH key exchange, TLS certificates contain a public key that the server uses to authenticate itself: this is an ECDSA public key for the case of the ECDHE–ECDSA cipher suite. The TLS handshake, using the ECDHE–ECDSA cipher suite, consists of three main components: the ECDSA signature generation (fixed-base scalar multiplication), ECDSA signature verification (double-scalar multiplication), and ECDHE (one fixed- and one variable-base scalar multiplication).^{Footnote 9} We consider Weierstrass and twisted Edwards curves separately, with and without point compression. The cost of decompressing a point in Weierstrass and twisted Edwards form is stated in Table 7 (where we follow the approach described in [9] to decompress points on twisted Edwards curves).

When using Weierstrass curves the situation is not complicated: transmitting compressed points costs a single conversion while no additional cost is needed when transmitting uncompressed points. In the setting of twisted Edwards curves there are more possibilities. The simplest approach is to only use the Montgomery form; however, this is expensive since the Montgomery ladder cannot take advantage of the fixed-base setting. One might consider a hybrid solution: computing the fixed-base scalar multiplication using the birationally equivalent twisted Edwards curve while computing the variable-base scalar multiplication using the Montgomery ladder. In such a hybrid solution the protocol should specify if the coordinates are transmitted in (compressed) twisted Edwards or Montgomery coordinates (which are already in compressed form). When using such a hybrid solution in the setting of ECDHE, transmitting the points in Montgomery form is best (see Table 7). The cost for the conversion (between Montgomery and twisted Edwards) is roughly the same as when only using twisted Edwards curves and transmitting compressed points.

Table 4 gives the cost estimates for the separate components and total cost of the TLS handshake using the ECDHE–ECDSA cipher suite for different security levels. It includes the options with the best results for the cases of Weierstrass curves, twisted Edwards curves and the hybrid approach combining the use of the Montgomery ladder and twisted Edwards. The results suggest that the approach using only twisted Edwards achieves similar performance to the hybrid approach using the Montgomery ladder, while it avoids conversions between coordinate systems (the performance gap between both approaches is below 4% in all cases, in compressed or uncompressed form). Furthermore, our Montgomery ladder implementations do not include the extra validation step discussed at the end of Sect. 4.3; if incorporated, this would incur additional overhead.

The results in Table 4 also show that the use of twisted Edwards curves for the ECDHE and full ECDHE–ECDSA computations is approximately a factor 1.46, 1.26 and 1.24 faster than the Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively. We also include the results from [31] when using NIST P-256. In [31] the fixed-base scalar multiplication is implemented using a relatively large (slightly over 150 KB) lookup table. It is unclear whether this implementation accesses the lookup table elements in a cache-attack-resistant manner, whether the dedicated addition formula used takes care of exceptions and, if so, whether this is done in constant time. This might explain the faster implementation results.

As a reference we also include the results for Ed25519 [9] (obtained from the “sandy0” benchmark machine [13]), which is a Schnorr-like signature scheme based on a twisted Edwards curve isomorphic to Curve25519. Note that [9] only computes signatures; when computing ECDH one could use the approach as described in [5] which uses the Montgomery ladder. In order to achieve perfect forward secrecy (ECDHE), the implementation can compute the fixed-base scalar multiplication using the Montgomery ladder (which is slow) or convert the point and compute the fixed-base scalar multiplication using the corresponding twisted Edwards curve (using a hybrid approach).

7 Conclusions

In this paper we have presented new elliptic curves for cryptography targeting the 128-, 192-, and 256-bit security levels. By considering different choices for the base field arithmetic, pseudo-Mersenne and Montgomery-friendly primes, we deterministically selected efficient twisted Edwards curves as well as traditional Weierstrass curves. Instead of resorting to the slower complete formulas, we show how to compute efficient scalar multiplications by using constant-time, exceptionless, dedicated group operations. For the cases in which they are not guaranteed to be exceptionless, we have proposed an efficient “complete” addition formula based on masking techniques for Weierstrass curves. Our implementation of the scalar multiplication in the three most widely deployed scenarios shows that our new backwards compatible Weierstrass curves offer enhanced security properties while improving the performance compared to the standard NIST Weierstrass curves. At the expense of at most a few bits of ECDLP security, our new twisted Edwards curves offer a performance increase of a factor 1.2–1.4 compared to our new Weierstrass curves. We demonstrated the potential cryptographic impact by showing cost estimates for these curves inside the TLS handshake protocol.

Notes

1. Cryptographic libraries with support for generic-prime field arithmetic (e.g., using Montgomery arithmetic) are fully compatible with the proposed curves.
2. The only instance where the first twisted Edwards curve we found did not fulfill all of the SafeCurves requirements was in the search for ed-383-mers: the constant $A=1629146$ corresponds to a curve-twist pair with $\#E_A=4r$ and $\#E_A'=4r'$, where $r$ and $r'$ are both prime, but the embedding degree of $E_A$ with respect to $r$ is $(r-1)/188$, which fails to meet the minimum requirement of $(r-1)/100$ imposed in [12].
3. Except for when $w=2$, where this comes for free.
4. We note that this cost increases by a single point addition when $wv \mid t$, since an extra precomputed point is needed in this case.
5. Again, except for when $w=2$, where this comes for free.
6. Again, we note that when $wv \mid t$, an extra precomputed point is needed.
7. Validating that $x_1 \in \mathbf{{F}}_p$ corresponds to $E_A$ would incur the small relative cost of an exponentiation and a few multiplications: namely, we reject $x_1$ if $(x_1^3+Ax_1^2+x_1)^{(p-1)/2} = -1$.
8. A version of the library (known as MSR ECCLib [44]) which supports a subset of the curves presented in this work is publicly available at http://research.microsoft.com/en-us/downloads/149804d4-b5f5-496f-9a17-a013b242c02d/.
9. This cost assumes the use of the simplest, most secure implementation approach, i.e., each ephemeral key is used once and then discarded.
10. We also corrected some typos in [18] that were pointed out in [6].
11. We did not optimize (1) aggressively; we simply grouped common subexpressions and employed obvious operation scheduling, so it is likely that there are faster routes.

References

Acar, T., Shumow, D.: Modular reduction without pre-computation for special moduli. Technical report. Microsoft Research (2010)
Ahmadi, O., Granger, R.: On isogeny classes of Edwards curves over finite fields. J. Number Theory 132(6), 1337–1358 (2012)
Aranha, D.F., Barreto, P.S.L.M., Pereira, G.C.C.F., Ricardini, J.E.: A note on high-security general-purpose elliptic curves. Cryptology ePrint Archive, Report 2013, 647 (2013). http://eprint.iacr.org/
Bernstein, D.J.: Can we avoid tests for zero in fast elliptic-curve arithmetic? (2006). http://cr.yp.to/papers.html#curvezero
Bernstein, D.J.: Curve25519: New Diffie–Hellman speed records. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) Public Key Cryptography—PKC 2006, vol. 3958 of LNCS, pp. 207–228. Springer, Heidelberg (2006)
Bernstein, D.J.: Counting points as a video game, 2010. Slides of a talk given at Counting Points: Theory, Algorithms and Practice, April 19, University of Montreal. http://cr.yp.to/talks/2010.04.19/slides.pdf
Bernstein, D.J., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) AFRICACRYPT, vol. 5023 of LNCS, pp. 389–405. Springer, Berlin (2008)
Bernstein, D.J., Birkner, P., Lange, T., Peters, C.: ECM using Edwards curves. Math. Comput. 82(282), 1139–1179 (2013)
Bernstein, D.J., Duif, N., Lange, T., Schwabe, P., Yang, B.-Y.: High-speed high-security signatures. J. Cryptogr. Eng. 2(2), 77–89 (2012)
Bernstein, D.J., Hamburg, M., Krasnova, A., Lange, T.: Elligator: elliptic-curve points indistinguishable from uniform random strings. In: ACM conference on computer and communications security (2013)
Bernstein, D.J., Lange, T.: Faster addition and doubling on elliptic curves. In: Kurosawa, K. (ed.) ASIACRYPT, vol. 4833 of LNCS, pp. 29–50. Springer, Berlin (2007)
Bernstein, D.J., Lange, T.: SafeCurves: choosing safe curves for elliptic-curve cryptography. http://safecurves.cr.yp.to. Accessed 16 Oct 2013
Bernstein, D.J., Lange, T. (eds.): eBACS: ECRYPT benchmarking of cryptographic systems. http://bench.cr.yp.to. Accessed 3 Feb 2014
Blake-Wilson, S., Bolyard, N., Gupta, V., Hawk, C., Moeller, B.: Elliptic curve cryptography (ECC) cipher suites for transport layer security (TLS). RFC 4492 (2006)
Bos, J.W., Costello, C., Hisil, H., Lauter, K.: Fast cryptography in genus 2. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT, vol. 7881 of LNCS, pp. 194–210. Springer, Berlin (2013)
Bos, J.W., Halderman, J.A., Heninger, N., Moore, J., Naehrig, M., Wustrow, E.: Elliptic curve cryptography in practice. In: Christin, N., Safavi-Naini, R. (eds.) Financial Cryptography and Data Security, vol. 8437 of LNCS, pp. 157–175. Springer, Berlin (2014)
Bosma, W., Cannon, J., Playoust, C.: The Magma algebra system. I. The user language. J. Symb. Comput. 24(3–4), 235–265 (1997). Computational algebra and number theory (London, 1993)
Bosma, W., Lenstra, H.W.: Complete systems of two addition laws for elliptic curves. J. Number Theory 53(2), 229–240 (1995)
Brumley, B.B., Hakala, R.M.: Cache-timing template attacks. In: Matsui, M. (ed.) ASIACRYPT, vol. 5912 of LNCS, pp. 667–684. Springer, Berlin (2009)
Brumley, D., Boneh, D.: Remote timing attacks are practical. In: Mangard, S. Standaert, F.-X. (eds.) Proceedings of the 12th USENIX security symposium, vol. 6225 of LNCS, pp. 80–94. Springer (2003)
Certicom Research.: Standards for efficient cryptography 2: recommended elliptic curve domain parameters. Standard SEC2, Certicom (2000)
Chevallier-Mames, B., Ciet, M., Joye, M.: Low-cost solutions for preventing simple side-channel analysis: side-channel atomicity. IEEE Trans. Comput. 53(6), 760–768 (2004)
Chudnovsky, D., Chudnovsky, G.: Sequences of numbers generated by addition in formal groups and new primality and factorization tests. Adv. Appl. Math. 7(4), 385–434 (1986)
ECC Brainpool.: ECC Brainpool Standard Curves and Curve Generation. http://www.ecc-brainpool.org/download/Domain-parameters.pdf (2005)
Edwards, H.M.: A normal form for elliptic curves. Bull. Am. Math. Soc. 44, 393–422 (2007)
Faugère, J.-C., Perret, L., Petit, C., Renault, G.: Improving the complexity of index calculus algorithms in elliptic curves over binary fields. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT, vol. 7237 of LNCS, pp. 27–44. Springer, Berlin (2012)
Faz-Hernández, A., Longa, P., Sánchez, A.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves (extended version). J. Cryptogr. Eng. 5(1), 31–52 (2015)
Feng, M., Zhu, B., Xu, M., Li, S.: Efficient comb elliptic curve multiplication methods resistant to power analysis. In: Cryptology ePrint Archive, Report 2005/222 (2005). http://eprint.iacr.org/2005/222
Fouque, P.-A., Joux, A., Tibouchi, M.: Injective encodings to elliptic curves. In: Boyd, C., Simpson, L. (eds.) ACISP, vol. 7959 of LNCS, pp. 203–218. Springer, Berlin (2013)
Gallant, R.P., Lambert, R.J., Vanstone, S.A.: Faster point multiplication on elliptic curves with efficient endomorphisms. In: Kilian, J. (ed.) CRYPTO, vol. 2139 of LNCS, pp. 190–200. Springer, Berlin (2001)
Gueron, S., Krasnov, V.: Fast prime field elliptic curve cryptography with 256 bit primes. Cryptology ePrint Archive, Report 2013/816 (2013). http://eprint.iacr.org/
Hamburg, M.: Fast and compact elliptic-curve cryptography. Cryptology ePrint Archive, Report 2012/309 (2012). http://eprint.iacr.org/
Hamburg, M.: Twisting Edwards curves with isogenies. Cryptology ePrint Archive, Report 2014/027 (2014). http://eprint.iacr.org/
Hankerson, D., Menezes, A., Vanstone, S.: Guide to Elliptic Curve Cryptography. Springer Verlag, Berlin (2004)
Hisil, H., Wong, K.K.-H., Carter, G., Dawson, E.: Twisted Edwards curves revisited. In: Pieprzyk, J. (ed.) Asiacrypt 2008, vol. 5350 of LNCS, pp. 326–343. Springer, Heidelberg (2008)
Joye, M., Tunstall, M.: Exponent recoding and regular exponentiation algorithms. In: Joye, M. (ed.) Proceedings of Africacrypt 2009, vol. 5580 of LNCS, pp. 334–349. Springer, Berlin (2009)
Knežević, M., Vercauteren, F., Verbauwhede, I.: Speeding up bipartite modular multiplication. In: Hasan, M., Helleseth, T. (eds.) Arithmetic of Finite Fields—WAIFI 2010, vol. 6087 of LNCS, pp. 166–179. Springer, Berlin/Heidelberg (2010)
Kocher, P.C.: Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) Crypto 1996, vol. 1109 of LNCS, pp. 104–113. Springer, Heidelberg (1996)
Lenstra, A.K.: Generating RSA moduli with a predetermined portion. In: Ohta, K., Pei, D. (eds.) Asiacrypt’98, vol. 1514 of LNCS, pp. 1–10. Springer, Berlin/Heidelberg (1998)
Lim, C.H., Lee, P.J.: More flexible exponentiation with precomputation. In: Desmedt, Y. (ed.) CRYPTO, vol. 839 of LNCS, pp. 95–107. Springer, Berlin (1994)
Longa, P., Gebotys, C.: Efficient techniques for high-speed elliptic curve cryptography. In: Mangard, S., Standaert, F.-X. (eds.) Proceedings of CHES 2010, vol. 6225 of LNCS, pp. 80–94. Springer, Berlin (2010)
Longa, P., Miri, A.: New composite operations and precomputation scheme for elliptic curve cryptosystems over prime fields. In: Cramer, R. (ed.) Proceedings of PKC 2008, vol. 4939 of LNCS, pp. 229–247. Springer, Berlin (2008)
Meloni, N.: New point addition formulae for ECC applications. In: Carlet, C., Sunar, B. (eds.) Workshop on Arithmetic of Finite Fields (WAIFI), vol. 4547 of LNCS, pp. 189–201. Springer, Berlin (2007)
Microsoft Research.: MSR Elliptic Curve Cryptography Library (MSR ECCLib) (2014). http://research.microsoft.com/en-us/projects/nums
Möller, B.: Algorithms for multi-exponentiation. In: Vaudenay, S., Youssef, A.M. (eds.) Selected Areas in Cryptography, vol. 2259 of LNCS, pp. 165–180. Springer, Berlin (2001)
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
National Security Agency.: Fact sheet NSA Suite B Cryptography. http://www.nsa.gov/ia/programs/suiteb_cryptography/index.shtml (2009)
Okeya, K., Takagi, T.: The width-$w$ NAF method provides small memory and fast elliptic curve scalars multiplications against side-channel attacks. In: Joye, M. (ed.) Proceedings of CT-RSA 2003, vol. 2612 of LNCS, pp. 328–342. Springer, Berlin (2003)
Osvik, D.A., Shamir, A., Tromer, E.: Cache attacks and countermeasures: the case of AES. In: Pointcheval, D. (ed.) CT-RSA, vol. 3860 of LNCS, pp. 1–20. Springer, Berlin (2006)
Schoof, R.: Counting points on elliptic curves over finite fields. Journal de théorie des nombres de Bordeaux 7(1), 219–254 (1995)
Shumow, D., Ferguson, N.: On the possibility of a back door in the NIST SP800-90 dual ec prng. http://rump2007.cr.yp.to/15-shumow.pdf (2007)
Solinas, J.A.: Generalized Mersenne numbers. Technical report CORR 99–39, Centre for Applied Cryptographic Research, University of Waterloo (1999)
Solinas, J.A.: Efficient arithmetic on Koblitz curves. Des. Codes Cryptogr. 19, 195–249 (2000)
The New York Times: Government announces steps to restore confidence on encryption standards. http://bits.blogs.nytimes.com/2013/09/10/government-announces-steps-to-restore-confidence-on-encryption-standards (2013)
Tibouchi, M.: Elligator squared: uniform points on elliptic curves of prime order as uniform random strings. Cryptology ePrint Archive, Report 2014/043 (2014) http://eprint.iacr.org/
U.S. Department of Commerce/National Institute of Standards and Technology: Digital signature standard (DSS). FIPS-186-4 (2013). http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
Walter, C.D.: Montgomery exponentiation needs no final subtractions. Electron. Lett. 35(21), 1831–1832 (1999)


Acknowledgments

We thank Niels Ferguson, Thorsten Kleinjung, Dan Shumow and Greg Zaverucha for their valuable feedback, comments, and help. We would also like to thank the anonymous reviewers of JCEN, who helped to improve the quality of the paper.

Author information

Authors and Affiliations

NXP Semiconductors, Leuven, Belgium
Joppe W. Bos
Microsoft Research, Redmond, USA
Craig Costello, Patrick Longa & Michael Naehrig

Corresponding author

Correspondence to Patrick Longa.

Appendices

Appendix A: Algorithms for scalar multiplication

Algorithms for variable-base scalar multiplication. Algorithm 1 computes scalar multiplication for the variable-base scenario using the fixed-window method from [49]. We refer to Sects. 4.1 and 4.2 for details on its usage with Weierstrass and twisted Edwards curves, respectively. The computation of this operation mainly consists of four different stages: input and point validation, precomputation, recoding and evaluation. Input and point validation are computed at the very beginning of the execution using Algorithm 2 for Weierstrass curves and Algorithm 3 for twisted Edwards curves. In particular, Algorithm 3 performs two doublings over the input point in twisted Edwards to ensure that subsequent computations are performed in the large prime order subgroup (avoiding small subgroup attacks). We remark that it is the protocol implementer’s responsibility to ensure that timing differences during the detection of errors do not leak sensitive information to an attacker. In the precomputation stage, the implementer should first select a window width $2\le w < 10$ according to efficiency and/or memory considerations. For example, selecting $w=6$ for 256-, 384- and 512-bit scalar multiplication was found to achieve optimal performance in our implementations of Weierstrass curves. Precomputation is then computed by successively executing $P+2P+2P+\cdots +2P$ with $2^{w-2}-1$ point additions and storing the intermediate results. Explicit schemes are given in Algorithms 4 and 5 for $a=-3$ Weierstrass and $a=-1$ twisted Edwards curves, respectively. In the recoding stage, we use a variant of the regular recoding by [36] that ensures fixed length (see Algorithm 6). Since Algorithm 6 only recodes odd integers, we include a conversion step at Step 6 to deal with even values. The corresponding correction is performed at Step 20. These computations should be executed in constant time to protect against timing attacks.
For example, a constant-time execution of Step 6 could be implemented as follows (assuming a two’s complement representation in which $-1 \equiv \hbox {0xFF}\ldots \hbox {FF}$, and bitlength$(odd) =$ bitlength$(k)$):

$$\begin{aligned} odd&= -(k\hbox { AND }1)&&\{\hbox {if }k\hbox { is odd then }odd=\hbox {0xFF}\ldots \hbox {FF}\hbox { else }odd=0\} \\ k'&= r-k \\ k&= (odd\hbox { AND }(k\hbox { XOR }k'))\hbox { XOR }k'&&\{\hbox {if }odd=0\hbox { then }k=r-k\} \end{aligned}$$
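The masked selection in Step 6 can be mimicked in a few lines of Python. This is only a sketch of the logic: Python integers are not constant time, the word size `BITS` and the helper name `make_odd` are illustrative assumptions, and the even case is assumed to map $k$ to $r-k$ (odd whenever $r$ is odd), so that the later correction is a single point negation.

```python
BITS = 256                      # assumed bit length shared by k, r and the mask
MASK = (1 << BITS) - 1          # emulates fixed-width two's complement words


def make_odd(k, r):
    """Return (k_odd, odd_mask) without branching on the parity of k."""
    odd = (-(k & 1)) & MASK               # all ones if k is odd, zero if k is even
    k_alt = (r - k) & MASK                # candidate used when k is even
    k_odd = (odd & (k ^ k_alt)) ^ k_alt   # k if odd, r - k otherwise
    return k_odd, odd
```

The returned mask plays the role of the flag that a later step would use to decide, again by masking, whether the final point has to be negated.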

The main computation in the evaluation stage consists of $t = \lceil \log _2(r)/(w-1) \rceil $ iterations, each computing $(w-1)$ doublings and one addition with a value from the precomputed table. For $a=-3$ Weierstrass curves, the use of Jacobian coordinates is a popular choice for efficiency reasons. If these are used, then Algorithm 1 can use an efficient merged doubling-addition formula [42] when $w>2$ by setting DBLADD = $true$. Other cases, including Weierstrass curves with $w=2$ or twisted Edwards curves, should use DBLADD = $false$. Note that the DBLADD flag is used only to simplify the description of the algorithm; an implementation might instead provide separate functions for twisted Edwards and Weierstrass curves. Following the recommendations from Sect. 4, the last addition should be performed with a unified formula (denoted by $\oplus $) in order to avoid exceptions, and it has therefore been separated from the main loop; see Steps 18 and 19. To achieve constant-time execution, the points from the precomputed table should be extracted by doing a full pass over all the points in the lookup table and masking the correct value with the index $(|k_i |-1)/2$. Finally, a suitable conversion to affine coordinates may be computed at Step 21 (if required).
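The recoding and evaluation schedule can be sketched on a toy group. The code below is an assumption-laden illustration, not a transcription of Algorithms 1 and 6: the additive group $\mathbb {Z}_n$ stands in for the curve group (so $kP$ is ordinary multiplication modulo `n`), the recoding is a generic signed odd-digit recoding in radix $2^{w-1}$, and the names `recode_odd`, `lookup` and `fixed_window_mul` are hypothetical. Python comparisons and branches are not constant time; real code would derive the masks and the sign arithmetically.

```python
def recode_odd(k, w, t):
    """Recode an odd k into t + 1 signed odd digits in radix 2^(w-1)."""
    assert k % 2 == 1
    digits = []
    for _ in range(t):
        d = (k % (1 << w)) - (1 << (w - 1))   # odd, |d| < 2^(w-1)
        digits.append(d)
        k = (k - d) >> (w - 1)                # remains odd and positive
    digits.append(k)                          # most significant digit
    return digits


def lookup(table, index):
    """Read table[index] by scanning every entry and masking the wanted one."""
    result = 0
    for j, entry in enumerate(table):
        mask = -int(j == index)               # -1 (all ones) or 0
        result |= entry & mask
    return result


def fixed_window_mul(k, P, w, t, n):
    """Compute k*P in the toy group Z_n with the fixed-window schedule."""
    table = [((2 * j + 1) * P) % n for j in range(1 << (w - 2))]  # P, 3P, 5P, ...
    digits = recode_odd(k, w, t)

    def signed_entry(d):
        entry = lookup(table, (abs(d) - 1) // 2)
        return entry if d > 0 else (-entry) % n   # sign would be masked in practice

    Q = signed_entry(digits[t])
    for i in range(t - 1, -1, -1):
        Q = (Q << (w - 1)) % n                # (w - 1) doublings
        Q = (Q + signed_entry(digits[i])) % n # one addition from the table
    return Q
```

With $w=4$ and $t=4$, for instance, the odd scalar 777 is recoded into five odd digits of absolute value below 8, and the loop performs four rounds of three doublings and one addition.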

Algorithm 7 computes scalar multiplication for the fixed-base scenario using the modified LSB-set method [27] (denoted by mLSB-set), which combines the comb method [40] and LSB-set recoding [28]. Refer to Sects. 4.1 and 4.2 for details on the use of the method with Weierstrass and twisted Edwards curves, respectively. This operation consists of computations executed offline, which involve point validation and precomputing multiples of the known input point, and computations executed online, which involve scalar validation, recoding and evaluation stages. As before, point validation for twisted Edwards using Algorithm 3 during the offline phase performs two doublings over the input point to ensure that the computation takes place in the large prime order subgroup. Again, it is the protocol implementer’s responsibility to ensure that timing differences during the detection of errors do not leak sensitive information to an attacker.

The implementer should choose a window width $2\le w < 10$ and a table parameter $v \ge 1$ according to efficiency and/or memory constraints, taking into account that the mLSB-set method requires $v \cdot 2^{w-1}$ precomputed points. For example, selecting $w=6$ and $v=3$ for 256-bit scalar multiplication was found to achieve optimal performance in our implementations of Weierstrass curves when storage is constrained to 6 KB.

During the online computation, the recoded scalar obtained from Algorithm 8 has a fixed length, which enables a fully regular execution when the representation is set up as described at Step 7. Since Algorithm 8 only recodes odd integers, we include a conversion step at Step 6 to deal with even values. The corresponding correction is performed at Step 13. In the evaluation stage, the main computation consists of $e-1 = \lceil \lceil \log _2(r)\rceil / (wv) \rceil - 1$ iterations each computing one doubling and $v$ additions with a value from the precomputed table. Following Sect. 4, the additions should be performed with a unified formula (denoted by $\oplus $) to avoid exceptions. Note that, as described in the variable-base case, all the conditional computations using “if” statements as well as the extraction of points from the precomputed table should be executed in constant time in order to protect against timing attacks (with the exception of Step 3, which depends on public parameters; any potential leak through the detection of errors at Step 4 should be assessed by the protocol’s implementer). Finally, a suitable conversion to affine coordinates may be computed at Step 14 (if required).
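The parameter trade-off can be checked with a short calculation. The sketch below assumes that each precomputed point is stored as an affine pair $(x, y)$ of `coord_bytes`-byte coordinates; that storage layout and the function name `mlsb_set_parameters` are illustrative assumptions, while the formulas for the table size and the iteration count are the ones quoted in the text.

```python
from math import ceil


def mlsb_set_parameters(bits, w, v, coord_bytes):
    """Return (table points, table bytes, main-loop iterations)."""
    points = v * 2 ** (w - 1)                # v * 2^(w-1) precomputed points
    table_bytes = points * 2 * coord_bytes   # assumed affine (x, y) storage
    e = ceil(bits / (w * v))                 # e = ceil(ceil(log2 r) / (w v))
    return points, table_bytes, e - 1        # each iteration: 1 doubling, v additions
```

Under these assumptions, `mlsb_set_parameters(256, 6, 3, 32)` gives 96 points occupying 6144 bytes, which agrees with the 6 KB budget mentioned above, and 14 main-loop iterations.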

Algorithm 9 computes double-scalar multiplication, which is typically found in signature verification schemes, and uses the width-$w$ non-adjacent form [54] with interleaving [30, 45]. We assume that one of the input points is known in advance $(P_2)$ whereas the other one is a variable base $(P_1)$. Hence, we distinguish two phases: offline, which involves validation of $P_2$ and a precomputation stage using the value $w_2$; and online, which involves scalar validation, point validation of $P_1$ and precomputation (using $w_1$), recoding and evaluation stages. Again, point validation for twisted Edwards curves with Algorithm 3 performs two doublings over the input points to ensure computation in the large prime order subgroup. The precomputation for both input points is performed as in the variable-base scenario using Algorithms 4 and 5 for $a=-3$ Weierstrass and $a=-1$ twisted Edwards curves, respectively. However, the implementer has additional freedom in the selection of $w_2$ since the precomputation for the fixed base is done offline. For example, we found that using $w_1=6$ and $w_2=7$ results in optimal performance in our implementations of Weierstrass curves when storage was restricted to 2, 3 and 4 KB for 128-, 192- and 256-bit security levels. In the online computation, recoding of the scalars is performed using [34, Algorithm 3.35]. Accordingly, the evaluation stage consists of $\lceil \log _2(r) \rceil + 1$ iterations, each consisting of one doubling and at most two additions (one per precomputed table). As in the variable-base case, for $a=-3$ Weierstrass curves using Jacobian coordinates one may use the merged doubling-addition formula [42] by setting DBLADD = $true$. A suitable conversion to affine coordinates may be computed at Step 39 (if required).
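The width-$w$ NAF recoding and the interleaved evaluation can be sketched as follows. As before, the additive group $\mathbb {Z}_n$ is a stand-in for the curve group, and the names `wnaf` and `interleaved_mul` are hypothetical. The recoding follows the textbook width-$w$ NAF construction, and the code branches on the digits, which is assumed acceptable only because signature verification handles public values.

```python
def wnaf(k, w):
    """Width-w NAF of a positive integer k, least significant digit first."""
    digits = []
    while k >= 1:
        if k & 1:
            d = k % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w           # signed residue: k mods 2^w
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def interleaved_mul(k1, P1, w1, k2, P2, w2, n):
    """Compute k1*P1 + k2*P2 in the toy group Z_n with interleaving."""
    T1 = [((2 * j + 1) * P1) % n for j in range(1 << (w1 - 2))]  # online table
    T2 = [((2 * j + 1) * P2) % n for j in range(1 << (w2 - 2))]  # offline table
    d1, d2 = wnaf(k1, w1), wnaf(k2, w2)
    length = max(len(d1), len(d2))
    d1 += [0] * (length - len(d1))
    d2 += [0] * (length - len(d2))
    Q = 0
    for i in range(length - 1, -1, -1):
        Q = (2 * Q) % n                           # one doubling per iteration
        for d, T in ((d1[i], T1), (d2[i], T2)):   # at most two additions
            if d > 0:
                Q = (Q + T[(d - 1) // 2]) % n
            elif d < 0:
                Q = (Q - T[(-d - 1) // 2]) % n
    return Q
```

Each table holds the odd multiples up to $(2^{w-1}-1)P$, that is $2^{w-2}$ entries, which is why a larger $w_2$ only costs offline work and storage.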

Appendix B: Algorithms for point operations

Refer to Algorithms 10–17.

Appendix C: Implementing the group law

Weierstrass curves

It is standard to represent points on $E_b:y^2=x^3-3x+b$ using Jacobian coordinates [21, 48, 57]: for non-zero $Z \in \mathbf{{F}}_p$, the tuple $(X :Y :Z)$ is used to represent the affine point $(X/Z^2, Y/Z^3)$ on $E_b$. There are many different variants of the Jacobian formulas originally proposed in [23]. In our implementation we use the doubling formula from [41] (see Algorithm 10). Point additions are usually performed between a running point and a point from a (precomputed) ‘look-up’ table. Typically, it is advantageous to leave the precomputed points in projective form for variable-base computations, and to convert them (offline) to their affine form for fixed-base computations. When elements in the table are stored in affine coordinates, point addition is performed using mixed Jacobian/affine coordinates using, for example, the formula presented in [34] (see Algorithm 13). There are cases in which exceptions in the formulas might arise. This is the case, for example, for fixed-base scalar multiplication. To achieve constant-time execution, we devised a complete formula based on masking that works for point addition, doubling, inverses and the point at infinity (see Algorithm 18). If points from the precomputed table are stored in projective coordinates, we use Chudnovsky coordinates to represent the affine point $(X/Z^2, Y/Z^3) \in E_b$ by the projective tuple $(X:Y:Z:Z^2:Z^3)$. The corresponding addition formula is given as Algorithm 12. More efficiently, whenever a doubling is followed by an addition (as in the main loop of the variable-base scalar multiplication; see Algorithm 1) one can use a merged doubling-addition formula [42] that is based on the special addition with the same $Z$-coordinate from [43] (see Algorithm 11). The different costs of the point formulas used in our implementation can be found in Table 5. Finally, the exact routine to perform the precomputation for the variable-base scenario is outlined in Algorithm 4. The scheme uses a straightforward variant of the general formulas, including the special addition from [43].
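To make the coordinate systems concrete, the following sketch implements a widely used Jacobian doubling for $a=-3$ together with the conversions to affine and Chudnovsky form. It is an illustration under assumptions: the operation sequence is the textbook one and is not claimed to match Algorithm 10 step by step, and the function names are hypothetical.

```python
def jacobian_double(P, p):
    """Double (X : Y : Z) on y^2 = x^3 - 3x + b over F_p (a = -3 assumed)."""
    X, Y, Z = P
    delta = Z * Z % p
    gamma = Y * Y % p
    beta = X * gamma % p
    alpha = 3 * (X - delta) * (X + delta) % p   # 3X^2 + aZ^4 with a = -3
    X3 = (alpha * alpha - 8 * beta) % p
    Z3 = ((Y + Z) ** 2 - gamma - delta) % p     # equals 2YZ
    Y3 = (alpha * (4 * beta - X3) - 8 * gamma * gamma) % p
    return X3, Y3, Z3


def to_affine(P, p):
    """Map Jacobian (X : Y : Z), Z != 0, to the affine point (X/Z^2, Y/Z^3)."""
    X, Y, Z = P
    zinv = pow(Z, -1, p)                        # modular inverse (Python 3.8+)
    return X * zinv ** 2 % p, Y * zinv ** 3 % p


def to_chudnovsky(P, p):
    """Extend Jacobian (X : Y : Z) to the tuple (X : Y : Z : Z^2 : Z^3)."""
    X, Y, Z = P
    Z2 = Z * Z % p
    return X, Y, Z, Z2, Z2 * Z % p
```

On the toy curve $y^2=x^3-3x+3$ over $\mathbf{{F}}_{101}$, for example, the tuples $(1:1:1)$ and $(9:27:3)$ both represent the affine point $(1,1)$, and doubling either of them yields a representation of the same affine point.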

Twisted Edwards curves

Hisil et al. [35] derive efficient formulas for additions on (special) twisted Edwards curves [7] by representing affine points $(X/Z, Y/Z)$ on ${\mathcal {E}}_d:-x^2+y^2=1+dx^2y^2$ by the projective tuple $(X:Y:Z:T)$, where $T = XY/Z$. Hamburg [32] proposes to represent such a projective point using five elements: $(X:Y:Z:T_1:T_2)$, where $T=T_1T_2$. This has the advantage of avoiding a required look-ahead when computing the elliptic curve scalar multiplication using the techniques from [35]. If the addition formulas are “dedicated”, they do not work for doubling but are usually more efficient. The details of the dedicated additions used in our implementation are outlined in Algorithms 15 and 16. For settings that might trigger exceptions in the formulas (e.g., fixed-base scalar multiplication), one can use the unified addition formula proposed by [35] (see Algorithm 17). The algorithm for point doubling on ${\mathcal {E}}_d$ is given in Algorithm 14: this extends the formula from [7] by using the five-element representation as suggested in [32].

When storing precomputed points, we follow the caching techniques described in [35]: we store affine points as $(x+y,y-x,2t)$ with $t=xy$, or projective points as $(X+Y:Y-X:2Z:2T)$ with $T=XY/Z$, both of which can speed up the scalar multiplication computation. Just as in the case of the Weierstrass curves above, it is usually advantageous to leave the precomputed points in projective form for variable-base computations, and to convert them (offline) to their affine form for fixed-base computations. The explicit routine that performs the precomputation for the variable-base scenario is outlined in Algorithm 5. The costs of the different formulas used in our implementation are displayed in Table 5.
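The benefit of the cached form can be seen in a sketch of a dedicated addition for $a=-1$ in extended coordinates. The formula below is the well-known dedicated addition from the literature, written so that the second operand is consumed directly in cached form $(X+Y:Y-X:2Z:2T)$. It is not claimed to reproduce Algorithms 15 and 16 line by line, it uses the four-element tuple $(X:Y:Z:T)$ rather than the five-element variant, and the function names are hypothetical.

```python
def cache(P, p):
    """Cached form (X+Y : Y-X : 2Z : 2T) of an extended point (X : Y : Z : T)."""
    X, Y, Z, T = P
    return (X + Y) % p, (Y - X) % p, 2 * Z % p, 2 * T % p


def dedicated_add(P, Qc, p):
    """P + Q on -x^2 + y^2 = 1 + d x^2 y^2; P extended, Q cached, P != Q."""
    X1, Y1, Z1, T1 = P
    sum2, diff2, two_Z2, two_T2 = Qc     # X2+Y2, Y2-X2, 2*Z2, 2*T2
    A = (Y1 - X1) * sum2 % p
    B = (Y1 + X1) * diff2 % p
    C = Z1 * two_T2 % p
    D = T1 * two_Z2 % p
    E, F = (D + C) % p, (B - A) % p
    G, H = (B + A) % p, (D - C) % p
    # Result in the order (X3 : Y3 : Z3 : T3); note that d is never used.
    return E * F % p, G * H % p, F * G % p, E * H % p
```

Because the sums, differences and doublings of the second operand are stored once in the table, each addition saves those operations; for $P=Q$ the output degenerates to $Z_3=0$, which is why dedicated formulas cannot be used for doubling.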

Table 5 An overview of the number of modular operations required to implement the group law for $a=-3$ Weierstrass, $a=-1$ twisted Edwards and Montgomery curves using different coordinate systems

Appendix C.1: Complete addition laws

An elliptic curve addition law is said to be complete if it correctly computes the group operation regardless of the two input points. Although employing such an addition law on its own can simplify the task of the implementer, it usually incurs a performance penalty. This is because the fastest formulas available for a particular curve model, which work fine for most input pairs, tend to fail on certain inputs. However, it is often the case that implementers can safely exploit the speed of such incomplete formulas by correctly dealing with all possible exceptions, or by designing the scalar multiplication routine such that exceptions can never arise.

All of the twisted Edwards curves presented in this paper can make use of the complete addition law in [11] by working on the birationally equivalent Edwards model $\mathcal {E}_{-1/d}: x^2 + y^2 = 1 - (1/d)x^2y^2$. The complete formulas are, however, slower than the fastest formulas on the twisted Edwards curve [35]. Even when working on an Edwards curve with complete formulas, an implementation of the scalar multiplication could still be sped up by mapping to a different curve, while retaining the complete formulas for all other operations. One could, for example, follow the approach suggested in [33] and use an isogeny to the twisted Edwards curve $\mathcal {E}_{-1/d-1}: -x^2 + y^2 = 1 - (1/d+1)x^2y^2$, or use the birational equivalence to $\mathcal {E}: -x^2 + y^2 = 1 + dx^2y^2$.

The situation for the prime order Weierstrass curves in this paper is more complicated. As pointed out by Bosma and Lenstra [18], the best that we can do for general elliptic curves is as follows: on input of two points $P_1$ and $P_2$, we must compute two candidate sums, $P_3$ and $P_3'$, for which we can only be guaranteed that at least one of them is a correct projective representation for $P_1+P_2$. In the case that precisely one of $P_3$ and $P_3'$ correctly corresponds to $P_1+P_2$, the other candidate has all of its coordinates as zero; although this makes it straightforward to write a constant-time routine for complete additions, it also means that computing complete additions in this way is much more costly than computing incomplete additions.

For the sake of comparison, we present the simplified version of the complete formulas^{Footnote 10} from [18], which are specialized to short Weierstrass curves of the form $E:y^2=x^3+ax+b$. For two input points $P_1 = (X_1 :Y_1 :Z_1)$ and $P_2 = (X_2 :Y_2 :Z_2)$ in homogeneous projective space, the two candidate sums $P_3=(X_3 :Y_3 :Z_3)$ and $P_3' = (X_3' :Y_3' :Z_3')$ are computed as

$$\begin{aligned}
X_3 &= (X_1Y_2-X_2Y_1)(Y_1Z_2+Y_2Z_1) - (X_1Z_2-X_2Z_1)(a(X_1Z_2+X_2Z_1)+3bZ_1Z_2-Y_1Y_2); \\
Y_3 &= -(3X_1X_2+aZ_1Z_2)(X_1Y_2-X_2Y_1) + (Y_1Z_2-Y_2Z_1)(a(X_1Z_2+X_2Z_1)+3bZ_1Z_2-Y_1Y_2); \\
Z_3 &= (3X_1X_2+aZ_1Z_2)(X_1Z_2-X_2Z_1) - (Y_1Z_2+Y_2Z_1)(Y_1Z_2-Y_2Z_1); \\
X_3' &= -(X_1Y_2+X_2Y_1)(a(X_1Z_2+X_2Z_1)+3bZ_1Z_2-Y_1Y_2) - (Y_1Z_2+Y_2Z_1)(3b(X_1Z_2+X_2Z_1)+a(X_1X_2-aZ_1Z_2)); \\
Y_3' &= Y_1^2Y_2^2+3aX_1^2X_2^2-2a^2X_1X_2Z_1Z_2-(a^3+9b^2)Z_1^2Z_2^2 + (X_1Z_2+X_2Z_1)(3b(3X_1X_2-aZ_1Z_2)-a^2(X_2Z_1+X_1Z_2)); \\
Z_3' &= (3X_1X_2+aZ_1Z_2)(X_1Y_2+X_2Y_1) + (Y_1Z_2+Y_2Z_1)(Y_1Y_2+3bZ_1Z_2+a(X_1Z_2+X_2Z_1)).
\end{aligned}$$

(1)
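The two candidate sums in (1) can be evaluated directly over a small prime field to see how the pair behaves. The sketch below is a straightforward transcription with shared subexpressions factored out. The function name is hypothetical, the last term of $Y_3'$ is taken to be $(a^3+9b^2)Z_1^2Z_2^2$ so that the expression is homogeneous, and no attempt is made at constant-time behaviour or at minimizing the operation count.

```python
def bosma_lenstra_candidates(P1, P2, a, b, p):
    """Return the two candidate sums of (1) for homogeneous projective points."""
    X1, Y1, Z1 = P1
    X2, Y2, Z2 = P2
    s = (X1 * Z2 + X2 * Z1) % p
    u = (a * s + 3 * b * Z1 * Z2 - Y1 * Y2) % p
    v = (3 * X1 * X2 + a * Z1 * Z2) % p
    yz_sum = (Y1 * Z2 + Y2 * Z1) % p
    xy_dif = (X1 * Y2 - X2 * Y1) % p
    xy_sum = (X1 * Y2 + X2 * Y1) % p

    X3 = (xy_dif * yz_sum - (X1 * Z2 - X2 * Z1) * u) % p
    Y3 = (-v * xy_dif + (Y1 * Z2 - Y2 * Z1) * u) % p
    Z3 = (v * (X1 * Z2 - X2 * Z1) - yz_sum * (Y1 * Z2 - Y2 * Z1)) % p

    X3p = (-xy_sum * u
           - yz_sum * (3 * b * s + a * (X1 * X2 - a * Z1 * Z2))) % p
    Y3p = (Y1 * Y1 * Y2 * Y2 + 3 * a * X1 * X1 * X2 * X2
           - 2 * a * a * X1 * X2 * Z1 * Z2
           - (a ** 3 + 9 * b * b) * Z1 * Z1 * Z2 * Z2
           + s * (3 * b * (3 * X1 * X2 - a * Z1 * Z2) - a * a * s)) % p
    Z3p = (v * xy_sum + yz_sum * (Y1 * Y2 + 3 * b * Z1 * Z2 + a * s)) % p
    return (X3, Y3, Z3), (X3p, Y3p, Z3p)
```

A caller would keep whichever candidate is not $(0, 0, 0)$; for $P_1=P_2$, for instance, the first candidate vanishes identically and the second one carries the doubling.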

In the case of $a=-3$ short Weierstrass curves, like the prime order curves in this paper, we found that the computations in (1) require at most^{Footnote 11} 22 multiplications, 3 multiplications by $b$, and one multiplication by $b^2-3$. The adaptation of the formulas to points in Jacobian coordinates can be achieved in the obvious way at an additional cost of 6 multiplications and 3 squarings: preceding (1), we can transform from Jacobian coordinates to homogeneous coordinates by taking $X_i \leftarrow X_i \cdot Z_i$ and then $Z_i \leftarrow Z_i^3$ for $i=1,2$; and, following the correct choosing of $P_3 = (X_3 :Y_3 :Z_3)$, we can move back to Jacobian coordinates by taking $X_3 \leftarrow X_3 \cdot Z_3$ and then $Y_3 \leftarrow Y_3 \cdot Z_3^2$.
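The coordinate changes described above amount to two multiplications and one squaring per point, which gives the stated total of 6 multiplications and 3 squarings for two inputs and one output. A minimal sketch, with hypothetical function names, is:

```python
def jacobian_to_homogeneous(P, p):
    """(X : Y : Z) with x = X/Z^2, y = Y/Z^3  ->  (XZ : Y : Z^3)."""
    X, Y, Z = P
    Z2 = Z * Z % p                       # 1 squaring
    return X * Z % p, Y, Z2 * Z % p      # 2 multiplications


def homogeneous_to_jacobian(P, p):
    """(X : Y : Z) with x = X/Z, y = Y/Z  ->  (XZ : YZ^2 : Z)."""
    X, Y, Z = P
    Z2 = Z * Z % p                       # 1 squaring
    return X * Z % p, Y * Z2 % p, Z      # 2 multiplications
```

Over $\mathbf{{F}}_{101}$, for example, the Jacobian tuple $(9:27:3)$ representing the affine point $(1,1)$ becomes the homogeneous tuple $(27:27:27)$, and mapping that tuple back gives another Jacobian representative of the same affine point.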

Table 6 The traces of Frobenius $t$ for the curves in Tables 1 and 2

Table 7 The cost of converting points when using the curves from Tables 1 and 2

Although the formulas in (1) are mathematically satisfactory, their computation costs around twice as much as an incomplete addition (see Table 5), which renders them far from satisfactory in cryptographic applications. On the other hand, the work-around we present in Algorithm 19 and Algorithm 18, while perhaps not as mathematically elegant, is equivalent for all practical purposes and incurs a much smaller overhead over the incomplete formulas. In particular, there are no additional multiplications or squarings (on top of those incurred during an incomplete addition) required when performing a complete addition via this masking approach (Tables 6, 7).

As briefly discussed in Sect. 4.1, the idea is to exploit the similarity between the sequences of operations computed in a doubling and an addition. On input of $P$ and $Q$, one would ordinarily compute the doubling $2P$ and the (non-unified) addition $P+Q$ and mask out the correct result at the end, depending on whether $P=Q$. However, the detection of $P=Q$ (or not) can be achieved much earlier in projective space using only a few operations that are common to both doublings and non-unified additions—see Line 17 (resp. Line 12) in Algorithm 19 (resp. Algorithm 18). After this detection, the required operation (doubling or addition) is achieved by masking the correct inputs and outputs through a sequence of subsequent computations, those which overlap in the explicit formulas for point doublings and additions. Of course, in the case that one or both of $P$ or $Q$ is ${\mathcal {O}}$, or that $P=-Q$, these superfluous computations are still computed in constant-time such that the correct result is masked out in a cache-attack resistant manner.
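The two ingredients of this masking idea, early detection of $P=Q$ in projective space and branch-free selection, can be sketched as follows. This is an illustration only and not a rendering of Algorithms 18 and 19: it assumes Jacobian inputs with non-zero $Z$, field elements below $2^{64}$, and hypothetical helper names; Python itself gives no timing guarantees.

```python
WORD_MASK = (1 << 64) - 1      # assumed 64-bit words for the mask arithmetic


def is_zero_mask(x):
    """All ones if x == 0, zero otherwise, for 0 <= x < 2^64."""
    return (((x | (-x & WORD_MASK)) >> 63) - 1) & WORD_MASK


def equal_mask(P, Q, p):
    """All ones if the Jacobian points P and Q represent the same affine point."""
    X1, Y1, Z1 = P
    X2, Y2, Z2 = Q
    Z1Z1, Z2Z2 = Z1 * Z1 % p, Z2 * Z2 % p
    U1, U2 = X1 * Z2Z2 % p, X2 * Z1Z1 % p             # compare X/Z^2
    S1, S2 = Y1 * Z2 * Z2Z2 % p, Y2 * Z1 * Z1Z1 % p   # compare Y/Z^3
    return is_zero_mask((U1 - U2) % p) & is_zero_mask((S1 - S2) % p)


def select(mask, A, B):
    """Coordinate-wise A if mask is all ones, B if mask is zero."""
    return tuple((mask & (a ^ b)) ^ b for a, b in zip(A, B))
```

The quantities `U1`, `U2`, `S1` and `S2` are the same products that a non-unified Jacobian addition computes anyway, which is why the detection adds little work; `select` would then route either the doubling or the addition operands through the shared sequence of field operations.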

Appendix D: Traces of Frobenius

Refer to Table 6.

Appendix E: Costs of point conversion

Refer to Table 7.

About this article

Cite this article

Bos, J.W., Costello, C., Longa, P. et al. Selecting elliptic curves for cryptography: an efficiency and security analysis. J Cryptogr Eng 6, 259–286 (2016). https://doi.org/10.1007/s13389-015-0097-y

Received: 25 October 2014
Accepted: 08 April 2015
Published: 01 May 2015
Issue Date: November 2016
DOI: https://doi.org/10.1007/s13389-015-0097-y
