1 Introduction

Constructive pairings were introduced into cryptography in the early 2000s with a one-round tripartite Diffie–Hellman key exchange [31], identity-based encryption [12], and short signatures [13]. More recently, new applications have been proposed, e.g. zero-knowledge proofs [11] used in the Zcash cryptocurrency and in electronic voting.

Pairing-based cryptography relies on the hardness of the discrete logarithm (DL) problem over two groups: an elliptic curve \(E(\mathbb {F}_p)\) and a finite field \(\mathbb {F}_{p^k}\), k being the embedding degree. Advances in discrete logarithm computation over special field extensions \(\mathbb {F}_{p^k}\) force us to review not only the parameter sizes of curves used in practice, but also the families of curves used to generate parameters.

The computation of discrete logarithms in finite fields and the factorisation of large integers are both addressed by the Number Field Sieve algorithm (NFS). Its complexity is \(L_{p^k}(1/3,c+o(1))\), with the notation \(L_{p^k}(\alpha ,c) = \exp (c (\log p^k)^\alpha (\log \log p^k)^{1-\alpha })\). For the families of finite fields that we consider in this paper, we have \(1.526 \le c \le 2.20\). The value of c depends on the variant of the NFS algorithm. For prime fields \(\mathbb {F}_p\), \(c=1.923\), and this complexity was considered the reference for choosing the size of any finite field \(\mathbb {F}_{p^k}\) where p is medium-sized to large relative to \(p^k\). However, for some degrees k the NFS algorithm can be parameterised differently, yielding a better complexity with a smaller c. The size of the finite field must then be increased so as to maintain the same security as previously thought, that is with \(c=1.923\). In the context of finite fields (unrelated to pairings), so-called special primes p are subject to the special NFS (SNFS) variant with \(c=1.526\): in these cases, the size of p gives a false sense of security [27, 28, 47, 49, 53]. The most efficient pairings were obtained with specific families of pairing-friendly curves, where the prime p is special (see Table 2). In 2013, Joux and Pierrot exploited this weakness [33]. In 2015, Barbulescu, Gaudry and Kleinjung revisited Schirokauer’s Tower NFS (TNFS) variant, and obtained \(c=1.526\) in a theoretical “special Tower NFS” variant (STNFS) [9]. In 2016, Kim and Barbulescu applied the STNFS variant to finite fields of composite extension degree k and obtained, in the best case, an algorithm with \(c=1.526\) [36]. This means that, asymptotically, the size of the finite field \(\mathbb {F}_{p^k}\) should be roughly doubled to provide the same security as previously thought.
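As a rough numerical illustration (our own back-of-the-envelope helper, using the crude convention \(o(1)=0\)), one can compare the effect of the exponents \(c=1.923\) and \(c=1.526\) on a 3072-bit field:

```python
import math

def log2_L(bits, c, alpha=1/3):
    # log2 of L_N(alpha, c) with o(1) set to 0, for an N of the given bit size
    ln_n = bits * math.log(2)
    return c * ln_n**alpha * math.log(ln_n)**(1 - alpha) / math.log(2)

# a 3072-bit field: generic NFS (c = 1.923) vs the special variant (c = 1.526)
sec_nfs = log2_L(3072, 1.923)
sec_snfs = log2_L(3072, 1.526)
```

For 3072 bits this gives roughly 139 bits versus 110 bits, which conveys the order of magnitude of the security loss, although (as discussed below) such extrapolations are too crude to fix key sizes.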

However, the asymptotic complexity does not provide enough accuracy to deduce the sizes of the finite fields that we would like to use for cryptography today. Menezes, Sarkar and Singh already showed that the efficiency of the STNFS variants depends on the total size and the extension degree [42], and that the STNFS variant with the best asymptotic complexity is not necessarily the best variant that applies to a 3000-bit finite field. Barbulescu and Duquesne then proposed a way to refine the estimates [7], with parameters at the 128-bit security level for BN, BLS, and KSS curves (where \(k=12\), 16 and 18).

Table 1 Sizes and DL cost estimates

The rule of thumb that prime fields of 3072 to 3200 bits offer 128 bits of security is extrapolated from the asymptotic complexity, re-scaled according to a record computation. It almost systematically relies on the bodacious assumption \(o(1)=0\) in the asymptotic formula. Since a record computation is not available for the TNFS and STNFS algorithms, the papers [7, 42] simulate a simplified version of the new algorithms to estimate their costs. We recall in Table 1 the popular pairing-friendly curves before the STNFS algorithm (2015), and the new propositions of [7], together with our estimate of the running-time of a DL computation in the corresponding fields \(\mathbb {F}_{p^k}\).

Generation of pairing-friendly curves. Pairings on elliptic curves map pairs in \(E(\mathbb {F}_p)[r] \times E(\mathbb {F}_{p^k})[r]\) to \(\mathbb {F}_{p^k}\), and the embedding degree k should be small, say \(1\le k\le 20\). The first suggested pairings used supersingular curves, because they were the first known way to produce curves with a small embedding degree. Then two other approaches were proposed (and surveyed in [26]). On the one hand, polynomial methods parameterise the characteristic p, the trace t, and the curve order r by polynomials. The parameterisation enables two fast variants of the Tate pairing: the ate and optimal ate pairings [52]. A Tate pairing computation contains an internal loop (the Miller loop) of length \(\log _2 r\), while this length is \(\log _2(t-1)\) (resp. \((\log _2 r)/\varphi (k)\)) for an ate (resp. optimal ate) pairing. On the other hand, non-polynomial methods were also proposed. Because of the large gap in pairing efficiency, the latter methods attracted less attention, and were not optimised.

Pairing-friendly curves from polynomial constructions enjoy fast implementations (e.g., [2]). However, the downside is that since p is parameterised by a polynomial, the Kim–Barbulescu STNFS algorithm applies (at least in some cases), and security estimates need to be revised. Alternatively, cryptographers look for new pairing-friendly curves. Chatterjee, Menezes and Rodríguez-Henríquez propose in [16] to revisit curves of embedding degree one (and trace \(t=2\)), avoiding the TNFS attack. The target finite field is a prime field, so it only requires \(\log _2(p) \ge 3072\) to get 128 bits of security [44]. Fotiadis and Konstantinou propose in [24] families of elliptic curves from a variant of the (polynomial) Brezing–Weng method. For composite embedding degree, they increase the parameter sizes in order to get TNFS-resistant curves, but they also propose curves of prime embedding degree for which the TNFS attack is restricted to \(\deg h = k\) only. Unfortunately, the prime p has a polynomial form with tiny coefficients, so the special variants (SNFS and STNFS) still apply.

Our contribution. In this article, we take a different approach: to avoid having to increase the size of the target finite field \(\mathbb {F}_{p^k}\), we choose a Cocks–Pinch curve so that p is not special and the special variants (SNFS, STNFS) do not apply. But on Cocks–Pinch curves, no optimal ate pairing is available, and the ate pairing is as slow as the Tate pairing because of a large trace t. So we modify the Cocks–Pinch method to obtain a trace of smallest possible size \((\log _2 r)/\varphi (k)\), and arrange so that the ate pairing or its variant [52, §2.2] is available. We obtain an optimal pairing as defined by Vercauteren in [52]. We generate curves of embedding degree 5 to 8 in order to compare the efficiency of the ate pairing with different sizes of parameters.

  • For composite extension degrees (\(k = 6\) and 8), we reuse some of the optimisations from the literature to obtain a pairing computation that is as efficient as with competing constructions. While it is true that by doing so, we endow our prime p with some special structure, we argue in this paper that the multivariate nature of our parameterisation offers much more flexibility than known constructions, and to our knowledge thwarts all known “special” variants of NFS.

  • For prime embedding degrees (\(k=5\) and 7), the TNFS attack is restricted to one choice: \(\deg h = k\). It leads to a smaller target finite field than in the composite cases. But a prime embedding degree eliminates some optimisation opportunities for the pairing computation.

This article also gives cost estimates and comparisons based on existing software. We show that the added confidence in the DL hardness can be obtained without sacrificing the pairing efficiency too much.

Organisation of the paper. In Sect. 2, we recall the Tate and ate pairings, in particular the notions of Miller loop and final exponentiation. We present in Sect. 3 our Cocks–Pinch variant to construct secure curves with an efficient pairing. Section 4 addresses the cost estimates for DL computation with the known variants of the NFS algorithm. In Sect. 5 we provide parameters of curves for 128 bits of security together with the analysis of the pairing cost. We compare the pairing efficiency to their challengers: BN, BLS12 and KSS16 curves, and embedding degree one curves from [16].

1.1 Code repository

Companion code is provided for several sections in this article, including code to reproduce experimental data. The code repository is publicly accessible at:

https://gitlab.inria.fr/smasson/cocks-pinch-variant.

2 Background on pairings

We present here the computation of two pairings used in practice, the Tate and ate pairings. Then we list refinements in the case of the ate pairing on BN curves.

Let E be an elliptic curve defined over \(\mathbb {F}_p\). Let \(\pi _p\) be the Frobenius endomorphism \((x,y)\mapsto (x^p, y^p)\). Its minimal polynomial is \(X^2-tX+p\) and t is called the trace. Let r be a prime divisor of \(\#E(\mathbb {F}_p) = p+1-t\). The r-torsion subgroup of E is denoted \(E[r]:= \{P\in E(\overline{\mathbb {F}_p}), [r]P = \mathcal {O}\}\) and has two subgroups of order r (eigenspaces of \(\pi _p\) in E[r]) that are useful for pairing applications: \(\mathbb {G}_1=E[r] \cap \ker (\pi _p - [1])\) and \(\mathbb {G}_2=E[r] \cap \ker (\pi _p - [p])\). The latter is defined over \(\mathbb {F}_{p^k}\), where the embedding degree k is the smallest integer \(k\in \mathbb {N}^*\) such that r divides \(p^k - 1\). A pairing-friendly curve is an elliptic curve that satisfies the following conditions: p and r are prime numbers, t is relatively prime to p, and k should be small. The discriminant \(-D\) is the fundamental discriminant of the quadratic imaginary field defined by \(X^2-tX+p\) (so that \(t^2-4p=-Dy^2\) for an integer y). All constructions require that |D| be small enough, so that the complex multiplication method is feasible (the record computation in [50] has \(|D| \sim 10^{16}\)). The \(\rho \)-value of E is defined by \(\rho (E) = \log (p)/\log (r)\). The “ideal” case is \(\rho (E) \approx 1\) when \(r = \#E(\mathbb {F}_p)\).
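The embedding degree is simply the multiplicative order of p modulo r, which is straightforward to compute; the helper below and its toy values are our own, not part of the companion code:

```python
def embedding_degree(p, r):
    # smallest k >= 1 such that r divides p^k - 1: the order of p modulo r
    k, t = 1, p % r
    while t != 1:
        t = t * p % r
        k += 1
    return k

# toy checks: r = 5 divides 59 + 1 (a supersingular case, k = 2),
# and the toy pair (p, r) = (17, 13) has embedding degree 6
k_ss = embedding_degree(59, 5)
k_toy = embedding_degree(17, 13)
```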

We recall the Tate and ate pairings definition, based on the same two steps: evaluating a function \(f_{s,Q}\) at a point P, and then raising it to the power \((p^k-1)/r\). (Sometimes the pairing is said to be reduced, to stress the final exponentiation.) The function \(f_{s,Q}\) has divisor \(\text {div}(f_{s,Q}) = s(Q) - ([s]Q) - (s-1)(\mathcal {O})\) and satisfies for integers i and j

$$\begin{aligned} f_{i+j,Q} = f_{i,Q} f_{j,Q} \frac{\ell _{[i]Q,[j]Q}}{v_{[i+j]Q}} \end{aligned}$$

where \(\ell _{[i]Q,[j]Q}\) and \(v_{[i+j]Q}\) are the two lines needed to compute \([i+j]Q\) from [i]Q and [j]Q (\(\ell \) through the two points, v the vertical). We compute \(f_{s,Q}(P)\) with the Miller loop presented in Algorithm 1.
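To make the Miller loop concrete, here is a self-contained toy implementation (our own didactic sketch, not the companion code) of the reduced Tate pairing on the supersingular curve \(y^2 = x^3 + x\) over \(\mathbb {F}_{59}\), which has \(r=5\) and \(k=2\); a distortion map supplies the second argument:

```python
# Didactic sketch (ours): the Miller loop instantiated as a reduced Tate
# pairing on y^2 = x^3 + x over F_59, #E(F_59) = 60, r = 5, k = 2.
p, r = 59, 5

# F_{p^2} = F_p[i]/(i^2 + 1); elements are pairs (a, b) representing a + b*i
def f2_mul(u, v):
    (a, b), (c, d) = u, v
    return ((a*c - b*d) % p, (a*d + b*c) % p)

def f2_sub(u, v):
    return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)

def f2_scale(c, u):
    return (c * u[0] % p, c * u[1] % p)

def f2_inv(u):
    n = pow(u[0]*u[0] + u[1]*u[1], p - 2, p)   # inverse of the norm, in F_p
    return (u[0] * n % p, -u[1] * n % p)

def f2_pow(u, e):
    out = (1, 0)
    while e:
        if e & 1:
            out = f2_mul(out, u)
        u = f2_mul(u, u)
        e >>= 1
    return out

# affine group law on E over F_p; None is the point at infinity
def padd(P, Q):
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0: return None
    if P == Q:
        lam = (3*P[0]*P[0] + 1) * pow(2*P[1], p - 2, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], p - 2, p) % p
    x3 = (lam*lam - P[0] - Q[0]) % p
    return (x3, (lam*(P[0] - x3) - P[1]) % p)

def pmul(P, n):
    R = None
    while n:
        if n & 1: R = padd(R, P)
        P = padd(P, P)
        n >>= 1
    return R

def miller(P, Q, n):
    # f_{n,P}(Q) by double-and-add on the bits of n (here n = order of P),
    # accumulating the line/vertical quotients l/v of each step
    (xQ, yQ), f, T = Q, (1, 0), P
    for bit in bin(n)[3:]:
        lam = (3*T[0]*T[0] + 1) * pow(2*T[1], p - 2, p) % p  # tangent at T
        l = f2_sub(f2_sub(yQ, (T[1], 0)), f2_scale(lam, f2_sub(xQ, (T[0], 0))))
        T = padd(T, T)
        f = f2_mul(f2_mul(f, f), f2_mul(l, f2_inv(f2_sub(xQ, (T[0], 0)))))
        if bit == '1':
            if T[0] == P[0] and (T[1] + P[1]) % p == 0:  # T = -P, so T+P = O
                f = f2_mul(f, f2_sub(xQ, (T[0], 0)))     # vertical line only
                T = None
            else:
                lam = (P[1] - T[1]) * pow(P[0] - T[0], p - 2, p) % p
                l = f2_sub(f2_sub(yQ, (T[1], 0)),
                           f2_scale(lam, f2_sub(xQ, (T[0], 0))))
                T = padd(T, P)
                f = f2_mul(f, f2_mul(l, f2_inv(f2_sub(xQ, (T[0], 0)))))
    return f

def tate(P, Q):
    # reduced Tate pairing f_{r,P}(Q)^((p^2 - 1)/r)
    return f2_pow(miller(P, Q, r), (p*p - 1) // r)

def phi(S):
    # distortion map (x, y) -> (-x, i*y), mapping G1 out of itself
    return (((p - S[0]) % p, 0), (0, S[1]))

# find a point P of order r = 5 by cofactor multiplication (60/5 = 12)
x = 1
while True:
    rhs = (x**3 + x) % p
    if pow(rhs, (p - 1)//2, p) == 1:
        P = pmul((x, pow(rhs, (p + 1)//4, p)), 12)
        if P is not None:
            break
    x += 1
e1 = tate(P, phi(P))
```

With these toy parameters one can check directly that `e1` is a non-trivial r-th root of unity and that the reduced pairing is bilinear, e.g. `tate(pmul(P, 2), phi(P))` equals `f2_mul(e1, e1)`.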

(Algorithm 1)

The Tate and ate pairings are defined by

$$\begin{aligned} \text {Tate}(P, Q):= f_{r, P}(Q)^{(p^k-1)/r}, \text{ and } \text {ate}(P, Q):= f_{t-1, Q}(P)^{(p^k-1)/r} \end{aligned}$$

where \(P \in \mathbb {G}_1\subset E[r](\mathbb {F}_p)\) and \(Q \in \mathbb {G}_2\subset E[r](\mathbb {F}_{p^k})\). The values \(\text {Tate}(P, Q)\), and \(\text {ate}(P, Q)\) are in the “target” group \(\mathbb {G}_T\) of r-th roots of unity in \(\mathbb {F}_{p^k}\).

Before we analyse in Sect. 5 the cost of computing pairings, we briefly comment on the CM discriminant \(-D\). When \(D=3\) (resp. \(D=4\)), the curve has complex multiplication by \(\mathbb {Q}(\sqrt{-3})\) (resp. \(\mathbb {Q}(\sqrt{-1})\)), so that a twist of degree 6 (resp. 4) exists. When E has d-th order twists for some d|k, then \(E[r](\mathbb {F}_{p^k})\) is isomorphic to \(E'[r](\mathbb {F}_{p^{k/d}})\) for some twist \(E'\). Dealing with the latter is easier. Therefore, composite extension degrees are often an invitation to choose \(D=3\) or \(D=4\).

3 Construction of secure curves with efficient ate pairing

In this section, we look for curves that are not threatened by recent variants of NFS. We make the following observations.

  • All families of curves in [26] compute p as a polynomial evaluated at a chosen integer. This (often) enables the STNFS algorithm [33], so that the DL problem in \(\mathbb {F}_{p^k}\) is easier than in other fields of same bit length.

  • While composite extension degrees are appealing for fast pairing computation (see Sect. 2), they also offer additional parameterisation choices for the TNFS algorithm [36]. This also makes DL computations in \(\mathbb {F}_{p^k}\) more efficient.

We wish to avoid special primes. Furthermore, as our range of interest \(5\le k\le 8\) contains the composite degrees \(k=6\) and \(k=8\), we acknowledge the need to choose the size of p so as to compensate for the TNFS attack.

The special variants of NFS rely on a special form of p: the prime integer should have a polynomial form \(p = p(u)\) where u is a seed and p(x) is a polynomial. Roughly put, for the special variants to apply, p(x) should have degree at least 3 and (most importantly) tiny coefficients. MNT curves of embedding degree \(k=6\) are defined by a characteristic \(p(x) = 4x^2+1\), a curve order \(r(x) = 4x^2-2x+1\) and a trace \(t(x) = 2x+1\). One can check that \(r(x) | \varPhi _6(p(x))\) where \(\varPhi _6(x) = x^2-x+1\) is the 6-th cyclotomic polynomial. Yet the prime p of an MNT curve is not special because p(x) has degree 2 only. BLS curves for \(k=6\) have \(p(x) = (x^4-3x^3+4x^2+1)/3\), \(r(x)=x^2-x+1\) and \(t(x) = x+1\). The prime p of a BLS-6 curve is special because p(x) has degree 4 and tiny coefficients (one can use 3p(x) in the special setting, and the largest coefficient is 4). We will obtain families of curves where p(x) has degree larger than 2 but has large coefficients, so that the special setting (STNFS) does not perform better than the generic setting (TNFS).
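These divisibility properties are easy to check numerically; the sketch below (our own) verifies, over many seeds x, that r(x) divides both the curve order and \(\varPhi _6(p(x))\) for the MNT-6 and BLS-6 families quoted above:

```python
# Sanity check (ours) of the MNT-6 and BLS-6 parameterisations quoted above.
def phi6(n):
    return n*n - n + 1

for x in range(2, 200):                      # MNT-6
    p, r, t = 4*x*x + 1, 4*x*x - 2*x + 1, 2*x + 1
    assert p + 1 - t == r                    # here the curve order is exactly r
    assert phi6(p) % r == 0                  # embedding degree 6 condition

for x in range(4, 300, 3):                   # BLS-6: x = 1 mod 3 makes p integral
    p = (x**4 - 3*x**3 + 4*x**2 + 1) // 3
    r, t = x*x - x + 1, x + 1
    assert (x**4 - 3*x**3 + 4*x**2 + 1) % 3 == 0
    assert (p + 1 - t) % r == 0              # cofactor (x-1)^2 / 3
    assert phi6(p) % r == 0
```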

To avoid special primes, we revisit the Cocks–Pinch method, which constructs pairing-friendly curves with freely chosen embedding degree k and discriminant \(-D\). The classical Cocks–Pinch algorithm first fixes the prime r and deduces a root of unity mod r to compute t, and then p satisfying the conditions of pairing-friendly curves. Instead, we first choose T small, and then compute r such that T is a root of the k-th cyclotomic polynomial \(\varPhi _k\) mod r. Then we observe that \(f_{T,Q}(P)\), like \(f_{t-1,Q}(P)\), gives the Miller loop of a bilinear pairing.

(Algorithm 2)

Our variant is given in Algorithm 2. The trace \(\bar{t}\in \mathbb {Z}/r\mathbb {Z}\) can be any of \(\bar{t}=T^i+1 \bmod r\) where \(\gcd (i,k)=1\), and \(\pm \bar{y} = (\bar{t}-2)/\pm \sqrt{-D} \bmod r\) as in the original method. We then lift \(\bar{t},\pm \bar{y}\) to \(t_0,y_0\in \mathbb {Z}\). Since \(\sqrt{-D}\) is only determined up to sign, we resolve the sign ambiguity as follows.

Remark 3.1

(Normalisation of \(t_0\) and \(y_0\)) When lifting \(\bar{t},\pm \bar{y}\) from \(\mathbb {Z}/r\mathbb {Z}\) to \(\mathbb {Z}\), we choose \(t_0\) as the signed representative of \(\bar{t}\) with smallest absolute value, and \(y_0\) as the smallest positive representative of \(\pm \bar{y}\).

The choice of the cofactors \(h_t\) and \(h_y\) in Algorithm 2 must abide by certain rules so that the Weil number \(\pi \) is an algebraic integer: if \(-D\equiv 0\mod 4\), then \(t_0+h_t\) must be even, and if \(-D\equiv 1\mod 4\), then \(t_0+y_0+h_t+h_y\) must be even. For p to have the desired bit length, we notice that \(h_t\) and \(h_y\) must be chosen in an annulus-like region given by the equation \(2^{\lambda _p+1}\le (t_0+h_tr)^2+D(y_0+h_yr)^2<2^{\lambda _p+2}\).
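The following simplified sketch of the method (our own code, with toy sizes and ad hoc cofactor ranges rather than the bit-length targeting described above) finds a \(k=6\), \(D=3\) curve; it uses the identity \((2T-1)^2 \equiv -3 \bmod \varPhi _6(T)\) to get \(\sqrt{-3}\) for free:

```python
# Simplified sketch (ours) of the modified Cocks-Pinch method for k = 6, D = 3:
# pick T with r = Phi_6(T) prime, set tbar = T + 1, ybar = (tbar - 2)/sqrt(-3),
# lift, then scan small cofactors h_t, h_y under the parity rule for -D = 1 mod 4.
def is_prime(n):
    # deterministic Miller-Rabin for the sizes used here
    if n < 2: return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % q == 0: return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2; s += 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1): continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1: break
        else:
            return False
    return True

def cocks_pinch_6(T_start):
    T = T_start
    while True:
        r = T*T - T + 1                     # Phi_6(T); T is a primitive 6th root mod r
        if is_prime(r):
            t0 = T + 1                      # lift of tbar = T + 1 mod r
            y0 = (T - 1) * pow(2*T - 1, r - 2, r) % r   # (t0 - 2)/sqrt(-3) mod r
            for ht in range(8):
                for hy in range(8):
                    if (t0 + y0 + ht + hy) % 2:         # parity rule, -D = 1 mod 4
                        continue
                    n4 = (t0 + ht*r)**2 + 3*(y0 + hy*r)**2
                    if n4 % 4 == 0 and is_prime(n4 // 4):
                        return n4 // 4, r, t0 + ht*r
        T += 1

p, r, t = cocks_pinch_6(10**4)
```

One can then verify that r divides the curve order \(p+1-t\) and that p has multiplicative order exactly 6 modulo r.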

The Miller algorithm in the ate pairing iterates on \(t-1 = T^i\), a k-th root of unity mod r. Iterating on another root of unity also gives a pairing, as the following statement shows.

Theorem 3.2

([52, §2.2]) Let \(P\in \mathbb G_1 = E[r]\cap \ker (\pi _p -[1])\) and \(Q\in \mathbb G_2 = E[r]\cap \ker (\pi _p - [p])\). Let T be a k-th root of unity mod r. Then, \(f_{T,Q}(P)\) defines a bilinear pairing and

$$\begin{aligned} \text {Tate}(Q,P)^L = f_{T,Q}(P)^{c(p^k-1)/N} \end{aligned}$$

where \(N = \gcd (T^k-1, p^k-1)\), \(T^k - 1= LN\), and \(c = \sum _{i=0}^{k-1}T^{k-1-i}p^{ji}\), with j such that \(T\equiv p^j \bmod r\).

In particular, our T in Algorithm 2 is convenient and defines an optimal ate pairing in the sense of [52, Definition 1]:

$$\begin{aligned} \text {OptimalAte}(P, Q):= f_{T, Q}(P)^{\frac{p^k-1}{r}}~. \end{aligned}$$

Does Algorithm 2 produce primes of a special form? In this paper, we will discuss in particular whether we hold to our promise that p, as issued by Algorithm 2, is not special. The prime p has the form

$$\begin{aligned} p = \frac{1}{4}\left( (t_0+h_tr)^2+D\left( y_0+h_yr\right) ^2\right) \end{aligned}$$

where \(t_0,y_0\) are centered representatives of \(T^i+1 \bmod r\) and \((t_0-2)/\sqrt{-D} \bmod r\) respectively, and both r and \(t_0\) are low-degree polynomials in T.

If T, \(h_t\), and \(h_y\) are chosen by Algorithm 2 as random integers of the desired bit length, and if D is arbitrary (so that \(y_0\) has no nice sparse polynomial expression in T), then the expression above is considered unlikely to yield any computational advantage to an attacker.

On the other hand, efficiency considerations may lead us to choose D specially, so as to allow extra automorphisms on the curve, for example choose \(-D\) as the discriminant of the d-th cyclotomic field \(\mathbb {Q} (\zeta _{d})\) for some \(d\mid k\). Then \(\sqrt{-D}\) typically has a low-degree polynomial expression in T. If T, \(h_t\), and \(h_y\) are then chosen with low Hamming weight, the answer is less clear. In comparison to other pairing-based constructions however (see Table 2), we have here a multivariate expression for p (it depends on T, \(h_t\), and \(h_y\)). First, there exists no special-purpose NFS construction that adapts well to a bivariate polynomial. Second, for fixed \(h_t\) and \(h_y\), one can obtain a univariate polynomial in T. This setting will provide a notable advantage to NFS only if \(h_t\) and \(h_y\) are tiny enough to produce a sparse polynomial \(p_{h_t,h_y}(T)\). As an illustration, here is the multivariate expression of 4p in the case \(k=8\), \(D=4\), and \(i=1\).

$$\begin{aligned} 4p&= (h_t^2 + 4h_y^2)T^8 +4 h_y T^7 - (4h_y - 1) T^6 +2(h_t - 1) T^5 + (2h_t^2 + 8h_y^2 + 2h_t + 1)T^4 \\&\quad +4h_yT^3 -(4h_y - 1)T^2 +2(h_t + 1)T +(h_t^2 + 4h_y^2 + 2h_t + 1). \end{aligned}$$
Table 2 Parameters of commonly used pairing-friendly curves

If \(h_t\) and \(h_y\) are small, one can get a univariate expression for p and exploit the S(T)NFS variant. In [24, Remark 3, Table 3], one finds a family (denoted FK-8 in the following to stand for Fotiadis–Konstantinou) with \(D=4,h_t=1,h_y=0\) and \(p(x)=(x^8+x^6+5x^4+x^2+4x+4)/4\), \(r(x)=x^4+1\), \(t(x)=x+1+r(x)=x^4+x+2\), \(y(x) = (x^3-x^2)/2\) (where \(p(x)=(t(x)^2+4y(x)^2)/4\)). In our case, for r of 256 bits and p of 544 bits, we have \(h_y\) of 16 bits, hence \(p_{h_t,h_y}(T)\) has coefficients of more than 32 bits.
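The multivariate expression of 4p displayed above (for \(k=8\), \(D=4\), \(i=1\)) can be verified numerically; the sketch below (our own, with the sign choice \(y_0=(T^3-T^2)/2\) as in the FK-8 family) checks it against the direct formula \((t_0+h_tr)^2+4(y_0+h_yr)^2\):

```python
import random

def p4_direct(T, ht, hy):
    # 4p = (t0 + ht*r)^2 + 4*(y0 + hy*r)^2 with r = Phi_8(T), t0 = T + 1
    r = T**4 + 1
    t0 = T + 1
    y0 = (T**3 - T**2) // 2        # always an integer: T^2 (T - 1) is even
    return (t0 + ht*r)**2 + 4*(y0 + hy*r)**2

def p4_expanded(T, ht, hy):
    # the multivariate expression displayed in the text
    return ((ht**2 + 4*hy**2)*T**8 + 4*hy*T**7 - (4*hy - 1)*T**6
            + 2*(ht - 1)*T**5 + (2*ht**2 + 8*hy**2 + 2*ht + 1)*T**4
            + 4*hy*T**3 - (4*hy - 1)*T**2 + 2*(ht + 1)*T
            + (ht**2 + 4*hy**2 + 2*ht + 1))

for _ in range(100):
    T, ht, hy = (random.randrange(1, 1 << 32) for _ in range(3))
    assert p4_direct(T, ht, hy) == p4_expanded(T, ht, hy)
```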

We design in Sect. 4 an estimate of the cost needed to compute a discrete logarithm in \(\mathbb {F}_{p^k}\) with STNFS and TNFS, and compare the costs obtained for MNT-6, BLS-6 and Cocks–Pinch-6 curves, and for FK-8, Cocks–Pinch-8 curves and a third family of curves with \(k=8\) (Tanaka–Nakamula curves). We show in Sect. 4.2 as well as in Appendix C that when \(h_t\) and \(h_y\) are very small, the NFS variants can exploit a univariate polynomial expression and reduce the security of the DL problem over the finite field \(\mathbb {F}_{p^k}\). But if these parameters are large enough, the advantage vanishes.

4 DL cost estimate and size requirements

In this section, we would like to determine, for each embedding degree \(k \in \{5,6,7,8\}\), the appropriate bit length of p so that the pairings on the curves constructed in Sect. 3 match the 128-bit security level. This leads us to assess the hardness of the DL computation in the subgroup of (256-bit) prime order r of \(\mathbb {F}_{p^k}\). Regarding notation, we naturally search for different values of p for each k, so that p depends on k. For brevity, however, we keep the notation p rather than \(p_k\).

We also do the same analysis for other curve families. We strive to take into account in our analysis the different known NFS variants. This complements the study in [7].

4.1 Strategy for estimating NFS cost

Estimating the computational cost of the number field sieve is a delicate task because of the inherent complexity of NFS, its numerous parameters and its variants, and moreover because of the very different nature of the different steps of the algorithm. This is a vastly different situation from, say, the assessment of the DLP hardness for elliptic curves, where the absence of any algorithm faster than generic \(O(\sqrt{r})\) methods makes estimates comparatively easy.

The two main steps of NFS are relation collection and linear algebra. We tried to estimate both, in terms of “elementary operations” that combine a memory access (to sieve array elements, to vector coefficients) as well as arithmetic operations. Our simulation methodology is not dissimilar to the one used in [7, 42]: starting from an NFS or TNFS setup, we estimate the norms involved in the relation collection step, and the associated smoothness probability. Further detail is given in Appendix B.2. We summarise here the variations that we introduce:

  • In the NFS context, coprime \((a,b)\) pairs, which form the primary source of candidates for smoothness, are counted as the total number of pairs in the sieve area times the factor \(\frac{6}{\pi ^2}=\frac{1}{\zeta (2)}\). In the TNFS context, the analogue of this scaling is given by \(\frac{1}{\zeta _K(2)}\) where \(\zeta _K\) is the Dedekind zeta function of the base number field [30].

  • We compute the smoothness probabilities of the two norms that are deduced from \((a,b)\) pairs as the average of the smoothness probabilities of norms over \(10^6\) samples, instead of the smoothness probability of the average norm over 25,600 samples.

  • Estimating the matrix size that results from a given set of parameter choices requires estimating the reduction factor of the so-called filtering step. As we show in Appendix B.2, recent large computations chose parameters very differently, which led to vastly different reduction factors. Based on a rationale for parameter choice that is in accordance with previous computations with the cado-nfs software, we estimate the filtering step as providing a (conservative) constant reduction factor of 9. This is very different from the reasoning in [7] and we justify this choice in Appendix B.2.
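The \(6/\pi^2\) coprimality density used in the first item above can be checked by exhaustive count (illustrative snippet, ours):

```python
from math import gcd, pi

# exhaustive count of coprime (a, b) pairs in a box, vs the 6/pi^2 heuristic
N = 1000
coprime = sum(1 for a in range(1, N + 1)
                for b in range(1, N + 1) if gcd(a, b) == 1)
density = coprime / N**2        # close to 6/pi^2 = 0.6079...
```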

4.2 Evolution of DL cost as \(h_t\), \(h_y\) increase for \(k=6\), 8

Embedding Degree 6: MNT-6, BLS-6 and Cocks–Pinch-6 Curves

We wish to compare the effectiveness of various NFS strategies, for our Cocks–Pinch-6 curves having \(D=3\), as well as for other curves. Targeting the 128-bit security level, we fix the size of r in Cocks–Pinch-6 curves to be 256 bits, hence T is 64 bits long. We set \(D=3\), and let p vary from 512 to 800 bits thanks to \(h_y\) of increasing size from 1 to 145 bits. For comparison, we generate MNT-6 parameters with Ben Lynn’s pbc library where p runs from 126 to 2047 bits, and BLS-6 parameters where p runs from 126 to 2046 bits (r is 64 to 1024 bits long). Then we run the STNFS and TNFS DL cost estimates for each p. The results are presented in Fig. 1, where the degree of h is implicitly chosen so as to minimise the estimated cost of (S)TNFS, and the notation \(L_N^0\) stresses the fact that the asymptotic complexity is used with o(1) set to 0.

Fig. 1 TNFS vs STNFS simulations for curves with \(k=6\)

MNT-6 and Cocks–Pinch-6 curves with p of 672 bits (\(p^k\) of 4032 bits) provide 128 bits of security, while a BLS-6 curve of that size only provides 116 bits of security; an 832-bit p is needed for a BLS-6 curve to provide 128 bits of security. The polynomials for TNFS are given in Table 12. All parameters can be obtained in the code repository mentioned in Sect. 1.1.

Embedding Degree 8: Tanaka–Nakamula, Fotiadis–Konstantinou and Cocks–Pinch-8 Curves. We do the same for \(k=8\). We generate Cocks–Pinch-8 parameters with \(D=4\), r of fixed length 256 bits, and \(h_y\) growing from 1 to 128 bits, so that p grows from 512 to 768 bits. We compare to Tanaka–Nakamula curves of \(k=8\) (TN-8) [51] where p(x) has degree 6, and Fotiadis–Konstantinou curves of \(k=8\) and \(D=4\) (FK-8) [24, Table 3] where p(x) has degree 8. We obtain Fig. 2. Cocks–Pinch-8 curves of r of 256 bits and p of 544 bits (\(h_t,h_y\) of 16 bits) provide 128 bits of security. TN-8 curves of p of 544 bits have r of 362 bits and offer 125 bits of security. FK-8 curves of p of 544 bits have r of 273 bits and provide 116 bits of security. To obtain 128 bits of security, TN-8 curves should have p of 576 bits (then r is 383 bits long) and FK-8 curves should have p of 672 bits (then r is 337 bits long).

Fig. 2 TNFS vs STNFS simulation for curves with \(k=8\)

Evolution of DL cost as \(\log _2 p\) changes for \(k=6,8\). It follows from Figs. 1 and 2 that the special form of p for our curves offers an advantage over the generic Conjugation–TNFS algorithm only for very small sizes of p, between 512 and 560 bits for \(k=6\), \(D=3\) (this corresponds to \(|h_t|\) and \(|h_y|\) of at most 20 bits), and between 512 and 536 bits for \(k=8\), \(D=4\) (\(|h_t|\) and \(|h_y|\) of at most 12 bits). For larger parameters, the generic TNFS algorithm is faster, which supports our claim that the primes that we use are not special.

4.3 Cost estimates at the 128-bit security level

Our simulation results are given in Table 3, which covers both the method in Sect. 3 as well as competing curves. In each case, only the most efficient NFS variant is retained. For reference, we also include a cost estimate, obtained with the same method, for 3072-bit prime fields. This fits reasonably well with the commonly accepted idea that this size matches 128-bit security (see e.g. [44, §5.6]).

Table 3 Comparison of DL cost estimates

In addition to the parameters for BLS12 and BN curves that were proposed in [7], we find it necessary to also consider curves in these families for a 446-bit prime field. As it turns out, this gives only a marginally different cost estimate for the attack, and the fact that p fits in seven 64-bit machine words is a clear implementation advantage. For BN-446 the seed \(u=2^{110}+2^{36}+1\) is taken from [45] and our seed for BLS12-446 is \(u=-(2^{74}+2^{73}+2^{63}+2^{57}+2^{50}+2^{17}+1)\) so that the curve is subgroup-, twist-, and subgroup-twist-secure.
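As a sanity check (our own snippet, using the standard BN polynomial parameterisation), the quoted BN-446 seed indeed gives a 446-bit p with \(\rho = 1\) and embedding degree 12:

```python
# BN polynomials evaluated at the BN-446 seed u = 2^110 + 2^36 + 1 quoted above
u = 2**110 + 2**36 + 1
p = 36*u**4 + 36*u**3 + 24*u**2 + 6*u + 1
r = 36*u**4 + 36*u**3 + 18*u**2 + 6*u + 1
t = 6*u**2 + 1

assert p.bit_length() == 446
assert p + 1 - t == r                  # BN curves have rho = 1
assert (p**4 - p**2 + 1) % r == 0      # r | Phi_12(p): embedding degree 12
```

(We do not re-check primality of p and r here; the seed is taken from [45].)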

5 Pairing cost

We now count the number of operations over \(\mathbb {F}_p\) to compute the optimal ate pairing with Algorithms 3, 4 and 5. We denote by \(\mathbf{m} _k\), \(\mathbf{s} _k\), \(\mathbf{i} _k\) and \(\mathbf{f} _k\) the costs of multiplication, squaring, inversion, and p-th power Frobenius in \(\mathbb {F}_{p^k}\), and by \(\mathbf{m} =\mathbf{m} _1\) the multiplication in \(\mathbb {F}_p\). We neglect additions and multiplications by small constants. In order to provide a common comparison base, we give estimated costs for \(\mathbf{m} _k\) and \(\mathbf{s} _k\) using Karatsuba-like formulas [19, 22, 43]. Inversions are computed using the expression below, which relies on efficient Frobenius powers:

$$\begin{aligned} a^{-1} = ({\text {Norm}}_{\mathbb {F}_{q^k}/\mathbb {F}_q}(a))^{-1}\times a^q\times \cdots \times a^{q^{k-1}}. \end{aligned}$$

Remark 5.1

(Frobenius cost) For the latter to perform well, it is very useful to have \(p\equiv 1 \bmod k\) (as in Algorithm 2), and to define \(\mathbb {F}_{p^k} = \mathbb {F}_p[x]/(x^k-\alpha )\): let \(w = \sum _{i=0}^{k-1} \omega _ix^i \in \mathbb {F}_{p^k}\). Then, \(w^{p^j} = \omega _0 + \sum _{i=1}^{k-1} \omega _i x^{ip^j}\). The terms \(x^{ip^j}\) do not depend on w and are precomputed. By Euclidean division by k, \(x^{ip^j} = x^{u_jk+i} = \alpha ^{u_j}x^i\). Therefore we have at most \(\mathbf{f} _k=(k-1)\mathbf{m} \) for any \(p^j\)-th power Frobenius. Note that for k even, we have \(x^{k/2\cdot (p^j-1)}=\alpha ^{(p^j-1)/2}=\pm 1\) so that \(x^{k/2\cdot p^j}=\pm x^{k/2}\), whence one multiplication can be saved.
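Both the Frobenius shortcut of Remark 5.1 and the norm-based inversion formula above are easy to exercise on a toy field; the parameters below (\(p=11\), \(k=5\), \(\alpha =2\), with \(x^5-2\) irreducible since 2 is not a 5th power mod 11) are our own:

```python
# Toy F_{p^k} = F_p[x]/(x^k - alpha) with p = 1 mod k (our parameters)
p, k, alpha = 11, 5, 2

# elements are coefficient lists [w0, ..., w_{k-1}] for sum w_i x^i
def mul(u, v):
    t = [0] * (2*k - 1)
    for i in range(k):
        for j in range(k):
            t[i + j] += u[i] * v[j]
    for i in range(2*k - 2, k - 1, -1):   # reduce with x^k = alpha
        t[i - k] += alpha * t[i]
    return [c % p for c in t[:k]]

def pow_naive(w, e):
    out = [1] + [0] * (k - 1)
    while e:
        if e & 1: out = mul(out, w)
        w = mul(w, w)
        e >>= 1
    return out

# precomputed constants: (x^i)^(p^j) = alpha^(i (p^j - 1)/k) * x^i
frob_const = [[pow(alpha, i * (p**j - 1) // k, p) for i in range(k)]
              for j in range(k)]

def frobenius(w, j=1):
    # w^(p^j) in at most k - 1 multiplications (the i = 0 term is free)
    return [w[i] * frob_const[j][i] % p for i in range(k)]

def inv(a):
    # a^{-1} = Norm(a)^{-1} * a^p * a^{p^2} * ... * a^{p^(k-1)}
    t = frobenius(a)
    aq = t
    for _ in range(k - 2):
        aq = frobenius(aq)
        t = mul(t, aq)
    n = mul(a, t)[0]                 # Norm(a) = a * t lies in F_p
    n_inv = pow(n, p - 2, p)
    return [c * n_inv % p for c in t]

w = [3, 1, 4, 1, 5]
```

One can check that `frobenius(w)` matches a naive computation of \(w^p\), and that `mul(w, inv(w))` is the identity.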

(Algorithm 3)

Consequences of the above are given in [48], notably \(\mathbf{i} _2=2\mathbf{m} +2\mathbf{s} _1+\mathbf{i} _1\) and \(\mathbf{i} _3=9\mathbf{m} +3\mathbf{s} _1+\mathbf{i} _1\), neglecting additions. Recursive application yields \(\mathbf{i} _k\) for \(k=2,3,4,6,8,12,16\). For \(k=5\), we compute \(t := a^q\times \cdots \times a^{q^4}=((a^q)^{1+q})^{1+q^2}\) and then use the fact that the norm of a is in \(\mathbb {F}_p\) to get \(\text {Norm}_{\mathbb {F}_{p^5}/\mathbb {F}_p}(a) = a \times t = a_0t_0 + \alpha \sum _{i=1}^{k-1}a_it_{k-i}\). We finally obtain that \(\mathbf{i} _5=3\mathbf{f} _5+ 2\mathbf{m} _k+\mathbf{i} _1+10\mathbf{m} \), and \(\mathbf{i} _7\) is obtained in a similar way. Table 4 summarises these estimated costs, and includes specialised costs for cyclotomic squares (see [29, §3.1]). We compared Table 4 with timings of the RELIC library [4] for primes p of 6 to 8 machine words and \(k=2,6,12\) on an Intel Core i5-4570 CPU at 3.20 GHz. The accordance is satisfactory (within 10%), to the point that we use Table 4 as a base. Additionally, we also measured the relative costs of \(\mathbf{i} _1\), \(\mathbf{s} _1\), and \(\mathbf{m} \) on the same platform, leading to \(\mathbf{i} _1 \approx 25\mathbf{m} \) and \(\mathbf{s} _1 \approx \mathbf{m} \).

5.1 Miller loop

The Miller loop evaluates functions defined over \(\mathbb {F}_{p^k}\) at a point of \(E(\mathbb {F}_{p})\). Algorithm 3 is essentially a repetition of Algorithm 1 with a few modifications related to practice: it is desirable to separate numerators and denominators so as to compute only one inversion at the end. Furthermore, the argument T may conveniently be handled in binary non-adjacent form \(T = \sum _{i=0}^n b_i2^i = ({b_nb_{n-1}\ldots b_2b_1b_0})_{\text {2-NAF}}\) with \(b_i\in \{-1,0,1\}\). We use the notation \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}\) for the number of non-zero \(b_i\).
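A minimal 2-NAF recoding (our own helper, illustrating how \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(T)\) may be computed):

```python
def naf2(n):
    # digits of the binary non-adjacent form of n, least significant first
    digits = []
    while n:
        if n & 1:
            d = 2 - (n % 4)     # +1 if n = 1 mod 4, -1 if n = 3 mod 4
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def hw_2naf(n):
    # number of non-zero 2-NAF digits
    return sum(1 for d in naf2(n) if d != 0)
```

For instance, \(7 = 2^3 - 1\) recodes as \((100\bar{1})_{\text {2-NAF}}\), with Hamming weight 2 instead of 3.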

Algorithm 3 uses three helper functions that are detailed as Algorithms 4 and 5. The input point S as well as the output point \(\mathbf{S} \) in these algorithms are in Jacobian coordinates [20]: the quadruple \((X,Y,Z,Z^2)\) represents the affine point \((X/Z^2,Y/Z^3)\). This saves inversions and multiplications.

Table 4 Relative cost of \(\mathbf{m} _k\), \(\mathbf{s} _k\) and \(\mathbf{i} _k\) for our finite field extensions
Table 5 Miller loop cost (see Equation (1))

In Table 5, we give the cost of the line computations for \(5\le k\le 8\). As it turns out, the embedding degree k affects the pairing cost in multiple ways. As mentioned in Sect. 2 (see also [3, 21]), twists allow efficient computations for 6|k (resp. 4|k) if we set \(D=3\) (resp. \(D=4\)). In Algorithm 3, the factors lying in proper subfields of \(\mathbb {F}_{p^k}\) are neutralised by the final exponentiation, and hence are not computed. In particular, \(\lambda _d\) and \(m_d\) are omitted from the Miller loop computation for these curves. Algorithms 4 and 5 are also simplified. Table 5 takes adaptations of these optimisations into account when twists are available, while the data for \(k=5\) and \(k=7\) comes straight from Algorithms 3, 4 and 5. We denote by \(\mathbf {mc}_{k}\) the cost of multiplying a constant \(c\in \mathbb {F}_p\) by an element of \(\mathbb {F}_{p^k}\). A consequence of Table 5 is that the Jacobi quartic, Hessian, and Edwards models induce comparatively expensive pairing computations, and the Weierstrass model is preferred in practice.

The final cost of Algorithm 3 is given by the following formula, where the notation \(\mathbf{c} _X\) denotes the cost of step X, or algorithm X.

$$\begin{aligned} \nonumber \mathbf{c} _{\textsc {MillerLoop}} =&(\log _2(T)-1)\left( \mathbf{c} _{\textsc {DoubleLine}}+ \mathbf{c} _{\textsc {VerticalLine}}\right) \nonumber \\&+ (\log _2(T)-2)\mathbf{c} _{\textsc {Update1}}\nonumber \\&+ ({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(T)-1)( \mathbf{c} _{\textsc {AddLine}} + \mathbf{c} _{\textsc {VerticalLine}} + \mathbf{c} _{\textsc {Update2}})\nonumber \\&+ (\text {only if }k\in \{5,7\})\mathbf{i} _k. \end{aligned}$$
(1)
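Equation (1) translates directly into a small operation-count calculator. The sketch below is ours: the step names and the unit costs are placeholders standing in for the entries of Tables 4 and 5, not values from the paper.

```python
def hw_2naf(n):
    """Number of non-zero digits in the 2-NAF of n."""
    w = 0
    while n:
        if n & 1:
            w += 1
            n -= 2 - (n % 4)
        n >>= 1
    return w

def miller_loop_cost(T, k, c):
    """Evaluate Equation (1). `c` maps step names to their cost (e.g. in F_p
    multiplications); the values used here are placeholders."""
    n = T.bit_length()                     # log2(T)
    total = (n - 1) * (c['DoubleLine'] + c['VerticalLine'])
    total += (n - 2) * c['Update1']
    total += (hw_2naf(T) - 1) * (c['AddLine'] + c['VerticalLine'] + c['Update2'])
    if k in (5, 7):                        # extra inversion when no twist is used
        total += c['i_k']
    return total

unit = {s: 1 for s in ('DoubleLine', 'VerticalLine', 'AddLine',
                       'Update1', 'Update2', 'i_k')}
```

With all step costs set to 1 and \(T = 2^{10}\) (11 bits, 2-NAF weight 1), Equation (1) counts 10 doubling steps and 9 updates, i.e. 29 operations, plus one inversion for \(k\in \{5,7\}\).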
Table 6 Cost \(\mathbf{c} _{\textsc {FirstExp}}\) of the first part of the final exponentiation

5.2 Final exponentiation, first and second part

The final exponentiation to the power \((p^k-1)/r\) is computed in two steps, named first and second part, corresponding to the two factors of the exponent: \(\frac{p^k-1}{\varPhi _k(p)}\times \frac{\varPhi _k(p)}{r}\).
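A toy numerical check (with a small prime that is in no way a practical pairing parameter) shows that the two factors indeed multiply back to the full exponent; here \(k = 6\), so \(\varPhi _6(x) = x^2 - x + 1\).

```python
k, p = 6, 19                     # toy values only
phi6 = lambda x: x * x - x + 1   # 6th cyclotomic polynomial
r = 7                            # a prime factor of Phi_6(19) = 343 = 7^3
assert phi6(p) % r == 0

first = (p**k - 1) // phi6(p)    # "easy" part: exponent (p^k - 1)/Phi_k(p)
second = phi6(p) // r            # "hard" part: exponent Phi_k(p)/r
assert first * second * r == p**k - 1
```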

The first part of the exponentiation uses a few Frobenius powers and inversions, and its cost (Table 6) depends on the value of \(\varPhi _k(p)\). It is very efficient because Frobenius powers are cheap (Remark 5.1). In particular, for \(x\in \mathbb {F}_{p^8}\), \(x^{p^4}\) is almost free: it is simply the conjugate of x seen in a quadratic extension of \(\mathbb {F}_{p^4}\).
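The same phenomenon is easy to observe in a quadratic extension (a sketch with toy parameters; the \(\mathbb {F}_{p^8}/\mathbb {F}_{p^4}\) case is analogous): for \(p \equiv 3 \pmod 4\) and \(\mathbb {F}_{p^2} = \mathbb {F}_p(i)\) with \(i^2 = -1\), the Frobenius map \(x \mapsto x^p\) is exactly conjugation \(a + bi \mapsto a - bi\).

```python
p = 10007                        # toy prime, p ≡ 3 (mod 4), so x^2 + 1 is irreducible

def mul(u, v):
    """Product in F_p[i]: (a + bi)(c + di) with i^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def power(u, e):
    """Square-and-multiply exponentiation in F_p[i]."""
    r, base = (1, 0), u
    while e:
        if e & 1:
            r = mul(r, base)
        base = mul(base, base)
        e >>= 1
    return r

x = (1234, 5678)
assert power(x, p) == (x[0], (-x[1]) % p)   # x^p is the conjugate of x
```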

The second part of the exponentiation is more expensive and is specific to each curve. The key ingredient is the base-p representation of the exponent, since Frobenius powers \(p^i\) are computed efficiently. Notice that in Algorithm 2, we have \(p\equiv (t-1)\equiv (t_0-1)\mod r\). Let c be such that \(p+1-t_0 = c \cdot r\). The expression \((\varPhi _k(p)-\varPhi _k(t_0-1))/r\) simplifies, and we obtain a nice generic formula in p and \(t_0\) for each embedding degree. The actual expression depends on the exponent i in Algorithm 2, as well as on congruence conditions on T. We only detail a few examples. Formulas for the other cases can be obtained with the companion software mentioned in Sect. 1.1.

For \(k=8\), we choose \(D=4\) so that \(\sqrt{-D}=2T^2\). When for instance we choose \(i=1\) in Algorithm 2, we have \(t_0=T+1\bmod r\). This leads to the following expression, where \(h_u\) denotes the integer \((h_t+1)/2\).

$$\begin{aligned} \varPhi _8(p)/r&= \varPhi _8(t_0-1)/r + (p+t_0-1)(p^2 + (t_0-1)^2)c, \nonumber \\ c&= ((((h_u^2-h_u+h_y^2+1/4) T +h_y) T -h_y + 1/4) T+h_u - 1) T +h_u^2 + h_y^2 \end{aligned}$$
(2)

where \(\varPhi _8(t_0-1)/r = \varPhi _8(T)/r=1\) by construction. To raise to the power \(\varPhi _8(p)/r\), we use the fact that T is even to deal with fractional values in the exponent. We obtain the following upper bound on the cost, using \(\mathbf{c} _T\), \(\mathbf{c} _u\), \(\mathbf{c} _y\) to denote the cost of raising to the power T, \(h_u\), and \(h_y\), respectively:

$$\begin{aligned} \mathbf{c} _{\textsc {SecondExp}, k=8} = (3\mathbf{c} _T + 2\mathbf{f} _k + 3\mathbf{m} _k) + (11\mathbf{m} _k + 4\mathbf{c} _T + 2\mathbf{c} _u + 2\mathbf{c} _y). \end{aligned}$$

For \(k=6\), we choose \(D = 3\) so that \(\sqrt{-D}=2T-1\). We obtain expressions that vary slightly depending on i, and on the congruence classes of \(h_t\bmod 2\) and \(T\bmod 3\). It also turns out to be more convenient to compute the cube of the pairing. If for instance we choose \(i=1\) in Algorithm 2, with \(T\bmod 3=1\) and \(h_t\bmod 2=1\), we have the following expression, where \(u = (h_t+1)/2\), \(w = h_y/2\), and \(T'=T-(T\bmod 3)=T-1\):

$$\begin{aligned} 3\varPhi _6(p)/r&= 3\varPhi _6(t_0-1)/r + 3(p+t_0)c, \nonumber \\ 3c&= ((3u^2 + 9w^2 - 3u - 3w + 1)T'+ \nonumber \\&\quad 3u^2 + 9w^2 - 6w)T'+ 3u^2 + 9w^2 + 3u - 9w. \end{aligned}$$
(3)

Raising to the power \(3\varPhi _k(p)/r\) thus has the following cost (we give an upper bound on all possible congruence conditions). We use \(\mathbf{c} _u\), \(\mathbf{c} _w\), \(\mathbf{c} _T\) and \(\mathbf{c} _{T'}\) to denote the cost of raising to the powers u, w, T and \(T'=T-(T\bmod 3)\), respectively.

$$\begin{aligned} \mathbf{c} _{\textsc {SecondExp}, k=6} = (\mathbf{c} _T + \mathbf{f} _k + 2\mathbf{s} _k + 4\mathbf{m} _k) + (12\mathbf{m} _k + 2\mathbf{s} _k + 2\mathbf{c} _u+ 2\mathbf{c} _w + 2\mathbf{c} _{T'}) \end{aligned}$$

For \(k=5\) and \(k=7\), we use \(p=(t_0-1)+c\varPhi _k(T)\) to reduce \(\varPhi _k(p)/\varPhi _k(T)\) (rational fraction in the indeterminate T) to the form

$$\begin{aligned} \varPhi _k(p)/r = {\sum _{0\le j\le k-2}p^ja_j(c,T)}. \end{aligned}$$

The exact expression of the coefficients \((a_j)_j\) depends on k and i, and so does the cost of raising to these powers. For example, for \(k=5\) and \(i=2\), we have

$$\begin{aligned} (a_j)_{0\le j\le 3}=(-cT^3 - T + 1, -cT^3 - (c+1)T + 1, cT^2 + c + 1, c). \end{aligned}$$
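These coefficients can be sanity-checked numerically. The sketch below assumes, consistently with the \(k=8\), \(i=1\) case where \(t_0 = T+1\), that \(i=2\) corresponds to \(t_0 - 1 = T^2\), so that \(p = T^2 + c\,\varPhi _5(T)\); the values of T and c are arbitrary toys with no cryptographic meaning.

```python
T, c = 5, 3                                   # arbitrary toy values
r = T**4 + T**3 + T**2 + T + 1                # Phi_5(T)
p = T**2 + c * r                              # p = (t0 - 1) + c*Phi_5(T), assuming t0 - 1 = T^2

a = (-c * T**3 - T + 1,                       # a_0
     -c * T**3 - (c + 1) * T + 1,             # a_1
     c * T**2 + c + 1,                        # a_2
     c)                                       # a_3

phi5_p = p**4 + p**3 + p**2 + p + 1           # Phi_5(p)
assert phi5_p == r * sum(a[j] * p**j for j in range(4))
```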

By applying this method, we found that for \(k=5\) and \(k=7\), raising to the power \(\varPhi _k(p)/r\) costs at most

$$\begin{aligned} \mathbf{c} _{\textsc {SecondExp}, k\in \{5,7\}}= 2\mathbf{i} _k + (k-2)(\mathbf{f} _k + \mathbf{c} _T + 2\mathbf{m} _k) + \mathbf{c} _c + \mathbf{m} _k \nonumber \end{aligned}$$

where the two inversions can be saved in the favourable case \(i=1\), and \(\mathbf{c} _T\) and \(\mathbf{c} _c\) are the costs of raising to the powers T and c, respectively.

6 Comparisons with previous curves

We compare the curves generated with Algorithm 2, of embedding degree 5 to 8, with the state of the art: BN and BLS12 curves [3], KSS16 curves [21, §4], and \(k=1\) curves [16]. Note that several estimates below differ marginally from [7], which uses a different estimated cost \(\mathbf{i} _{12} = \mathbf{i} _1 + 97\mathbf{m} \), as well as the \(7\mathbf{m} _2\) estimate that we mentioned in footnote 4 on page 14.

BN curve with a 462-bit prime p. Barbulescu and Duquesne give in [7] the parameters of a BN curve for 128 bits of security. The curve is defined from the parameter \(u = 2^{114} + 2^{101} - 2^{14} - 1\) and has a prime p of 462 bits (see Table 2). An estimation of the optimal ate pairing on this curve is also given in [7]; we reproduce the final count. The Miller loop iterates on \(T=6u+2\), of 117 bits and 2-NAF Hamming weight 7. Reusing Eq. (1) and Tables 4 and 5, and taking into account the correcting terms of the Miller loop for BN curves [52], we get:

$$\begin{aligned} \mathbf{c} _{\textsc {MillerLoop}}&= 116(3\mathbf{m} _2 + 6 \mathbf{s} _2 + 4\mathbf{m} ) + 115(\mathbf{s} _{12}+ 13\mathbf{m} _2) \\&\quad + 6(11\mathbf{m} _2 + 2\mathbf{s} _2 + 4\mathbf{m} + 13\mathbf{m} _2) \\&\quad +(11\mathbf{m} _2 + 2\mathbf{s} _2 + 4\mathbf{m} ) + 4\mathbf{m} _2 + 4\mathbf{m} + 2(13\mathbf{m} _2)+4(10\mathbf{m} ) \\&= 12180\mathbf{m} . \end{aligned}$$

According to [34, Corollary 4.1], raising to the power u costs

$$\begin{aligned} \mathbf{c} _u=4(114-1)\mathbf{m} _2 + (6\cdot 3-3)\mathbf{m} _2 + 3\mathbf{m} _{12} + 3\cdot 3\mathbf{s} _2 + \mathbf{i} _2 = 1585\mathbf{m} + \mathbf{i} . \end{aligned}$$

The final exponentiation costs

$$\begin{aligned} \mathbf{c} _{\textsc {FinalExp}} = \mathbf{i} _{12} + 12\mathbf{m} _{12} + 3\mathbf{s} _{12}^\text {cyclo} + 4\mathbf{f} _{12} + 3\mathbf{c} _u = 5591\mathbf{m} + 4\mathbf{i} \end{aligned}$$

where \(\mathbf{s} _{12}^\text {cyclo}\) is the cost of cyclotomic squarings (see [34]), namely \(\mathbf{s} _{12}^\text {cyclo} = 18\mathbf{m} \). The optimal ate pairing on the BN-462 curve costs in total \(17771\mathbf{m} +4\mathbf{i} \).
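The \(12180\mathbf{m} \) count can be re-derived mechanically. The sketch below assumes the classical tower-arithmetic estimates \(\mathbf{m} _2 = 3\mathbf{m} \), \(\mathbf{s} _2 = 2\mathbf{m} \) and \(\mathbf{s} _{12} = 36\mathbf{m} \) (Karatsuba multiplication and complex squaring); these values are our assumptions standing in for Table 4, which is not reproduced here.

```python
m = 1                  # cost unit: one F_p multiplication
m2, s2 = 3 * m, 2 * m  # assumed F_{p^2} costs (Karatsuba, complex squaring)
s12 = 36 * m           # assumed F_{p^12} squaring cost

# The five groups of terms mirror the displayed count above.
miller = (116 * (3 * m2 + 6 * s2 + 4 * m)
          + 115 * (s12 + 13 * m2)
          + 6 * (11 * m2 + 2 * s2 + 4 * m + 13 * m2)
          + (11 * m2 + 2 * s2 + 4 * m)
          + 4 * m2 + 4 * m + 2 * (13 * m2) + 4 * (10 * m))
assert miller == 12180
```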

BLS12 curve with a 461-bit prime p. We reproduce the results from [3], adapted to the parameter \(u = -2^{77} + 2^{50} + 2^{33}\) from [7]. We obtain:

$$\begin{aligned} \mathbf{c} _{\textsc {MillerLoop}}&= 76(3\mathbf{m} _2 + 6 \mathbf{s} _2 + 4 \mathbf{m} ) + 75(\mathbf{s} _{12} + 13\mathbf{m} _2) \\&\quad + 2(11\mathbf{m} _2 + 2 \mathbf{s} _2 + 4 \mathbf{m} + 13\mathbf{m} _2)\\&= 7685\mathbf{m} . \end{aligned}$$

As above, we adapt [34, Corollary 4.1]. Raising to the power u costs

$$\begin{aligned} \mathbf{c} _u&=4(77 - 1)\mathbf{m} _2 + (6\cdot 1 - 3)\mathbf{m} _2 + 2\mathbf{m} _{12} + 3\cdot 2\cdot \mathbf{s} _2 + \mathbf{i} _2 \\&= 1045\mathbf{m} + \mathbf{i} . \end{aligned}$$

The final exponentiation costs

$$\begin{aligned} \mathbf{c} _{\textsc {FinalExp}}&= \mathbf{i} _{12} + 12\mathbf{m} _{12} + 2\mathbf{s} ^\text {cyclo}_{12} + 4\mathbf{f} _{12} + 5\mathbf{c} _u\\&= 6043\mathbf{m} + 6\mathbf{i} . \end{aligned}$$

The optimal ate pairing on the BLS12-461 curve costs in total \(13728\mathbf{m} + 6\mathbf{i} \).

KSS16 curve with a 339-bit prime p. We reproduce the results from [21] with the parameter \(u=2^{35}-2^{32}-2^{18}+2^8+1\) from [7]. We obtain:

$$\begin{aligned} \mathbf{c} _{\textsc {MillerLoop}}&= 34(2\mathbf{m} _4 + 8 \mathbf{s} _4 + 8 \mathbf{m} ) + 33(\mathbf{s} _{16} + (8\mathbf{m} _4)) + 4(9\mathbf{m} _4 + 5 \mathbf{s} _4 + 8 \mathbf{m} )\\&\ \ \ + 3(14\mathbf{m} ) + 5\mathbf{m} _4 + \mathbf{s} _4+ 16\mathbf{m} + 6(8\mathbf{m} _4)\\&= 7691\mathbf{m} . \end{aligned}$$

Raising to the power u costs \(34\mathbf{s} _{16}^\text {cyclo} + 4\mathbf{m} _{16} = 1548\mathbf{m} \), and the final exponentiation costs:

$$\begin{aligned} \mathbf{c} _{\textsc {FinalExp}}&= \mathbf{i} _{16} + 32\mathbf{m} _{16} + 34\mathbf{s} ^\text {cyclo}_{16} + 8\mathbf{f} _{16} + 24\mathbf{m} _4 + 9(1684\mathbf{m} )\\&= 18210\mathbf{m} + \mathbf{i} . \end{aligned}$$

The optimal ate pairing on the KSS16-339 curve costs in total \(25901\mathbf{m} + \mathbf{i} \).

Curves of embedding degree one. The curves suggested by [16] are resistant to TNFS because the target finite field is \(\mathbb {F}_p\), with p as large as 3072 bits. The ate pairing is not available on these curves because the trace is \(t=2\), so the Tate pairing must be used. Its cost is given in [16]: for a 256-bit r, the Miller loop costs \(4626\mathbf{m} + \mathbf{i} \) and the final exponentiation costs \(4100\mathbf{m} \). The total cost is finally \(8726\mathbf{m} + \mathbf{i} \).

6.1 Our new STNFS-resistant curves at the 128-bit security level

We generate four curves of embedding degree 5, 6, 7 and 8 with Algorithm 2 combined with the CM method and estimate the cost of the new optimal ate pairing on these curves. Code to reproduce this search can be found in the repository mentioned in Sect. 1.1.

Twist-secure and subgroup-secure parameters. We checked our curves for twist- and subgroup-security (see [10]). For each curve, we checked the size of the cofactors of the curve and of its quadratic twist on \(\mathbb {G}_1\) and \(\mathbb {G}_2\). A curve E is \(\eta \)-subgroup-secure over \(\mathbb {F}_q\) if all the factors of \(\#E(\mathbb {F}_q)\) are at least as large as r, except those of at most \(\eta \) bits. A curve is twist-subgroup-secure if its quadratic twist is subgroup-secure. This makes five criteria: subgroup- and twist-subgroup-security for both \(\mathbb {G}_1\) and \(\mathbb {G}_2\), as well as subgroup-security for \(\mathbb {G}_T\) (with respect to \(\varPhi _k(p)\)). We selected four curves that are subgroup- and twist-subgroup-secure for \(\mathbb {G}_1\) with \(\eta =13\). Except in the \(k=7\) case, the curve containing the subgroup \(\mathbb {G}_2\) is also subgroup-secure. For embedding degrees 5 and 6, this curve is also twist-subgroup-secure. We did not investigate the \(\mathbb {G}_T\) subgroup-security: together with the Cocks–Pinch conditions, it would require finding parameters such that \(\varPhi _k(p)/r\) is prime or almost prime. With the sizes provided in the following paragraph, \(1088 \le \log _2(\varPhi _k(p)/r) \le 2816\): it is difficult to factor such a thousand-bit integer entirely, and it is very unlikely to be prime.

Parameter choices. We explain our choices of parameter sizes in Algorithm 2:

  • Size of the prime p. We target a size for the finite field \(\mathbb {F}_{p^k}\), which determines the size of p. These values can be read in Table 3.

  • Hamming weight (or 2-NAF weight) of T. We restrict to low weight T in order to get an efficient Miller loop. We choose \(\text {HW}_{\text {2-NAF}}(T) = 4\) for the \(k=5\) curve, or \(\text {HW}_{\text {2-NAF}}(T) = 5\) for others.

  • Discriminant. For efficiency, we target curves with as many automorphisms as possible. For \(k=6\) (resp. 8), we set \(D = 3\) (resp. 4) so that a sextic (resp. quartic) twist is available. For \(k=5\), we chose arbitrarily \(D\approx 10^{10}\), which is well within the feasible range for the CM method. For \(k=7\), the size of p (512 bits) restricts us to small discriminants, since we must have \(4p = t^2 + Dy^2\) with \(\log _2(t), \log _2(y) \approx \log _2(r) = 256\).

  • Hamming weight (or 2-NAF weight) of \(h_t\) and \(h_y\). As explained in Sect. 5.2, for \(k=6\) and \(k=8\) we restrict to low weight cofactors \(h_t\) and \(h_y\) so as to accelerate the exponentiation to the power c in the second part of the final exponentiation (see Equations (2) and (3)).

Allowing a cofactor of 13 bits, we obtain twist- and subgroup-secure elliptic curves of embedding degree five to eight. We denote by \(E^t\) the quadratic twist of E and \(\tilde{E}\) the degree d twist of E such that \(E(\mathbb {F}_{p^k})[r] \simeq \tilde{E}(\mathbb {F}_{p^{k/d}})[r]\). Points on E can be represented with the Edwards model if we restrict to curves with 4-torsion points. For the remainder of this section, the notation \(p_{N}\) denotes an arbitrary prime of N bits.

Curve of embedding degree 5. The curve \(E: y^2 = x^3 -3x + b_5\) defined over \(\mathbb {F}_p\) with

$$\begin{aligned} \begin{array}{rl} b_5= &{} \texttt {0x3dd2d2b0b2e68770bf01b41946ab867390cf9ecc4a858004fc769c}\\ &{} \texttt {278f079574677c7db3e7201c938b099f85eb6e85f200b95a80b24fdb}\\ &{} \texttt {df584098d690c6b91b21d00f52cc79473a11123b08ab2a616b4a4fbf}\\ p = &{} \texttt {0x40000138cd26ab94b86e1b2f7482785fa18f877591d2a4476b4760}\\ &{} \texttt {217f860bfe8674e2a4610d669328bda13044c030e8cc836a5b363f2d}\\ &{} \texttt {4c8abcab71b12091356bb4695c5626bc319d38bf65768c5695f9ad97} \end{array} \end{aligned}$$

satisfies

$$\begin{aligned} \#E(\mathbb {F}_p)= & {} 2^2 \cdot p_{405} \cdot r\quad \#\tilde{E}(\mathbb {F}_{p^{5}}) = p_{2393} \cdot (2^2 \cdot p_{405} \cdot r) \cdot r\\ \#E^t(\mathbb {F}_p)= & {} 2^2 \cdot p_{661}\quad \#E^t(\mathbb {F}_{p^{5}}) =p_{2649} \cdot (2^2 \cdot p_{661})\\ r= & {} {\texttt {0x9610000000015700ab80000126012600c4007000a800e00}}\\&{\texttt {0f000200040008001}} \end{aligned}$$

The additional parameters to obtain the curve from Algorithm 2 are:

and \(\mathbb {F}_{p^5}\) can be defined as \(\mathbb {F}_p[x] / (x^5 - 5)\).

Curve of embedding degree 6. The curve \(E: y^2 = x^3 - 1\) defined over \(\mathbb {F}_p\) with

$$\begin{aligned} \begin{array}{rl} p = &{} \texttt {0x9401ff90f28bffb0c610fb10bf9e0fefd59211629a7991563c5e468}\\ &{} \texttt {d43ec9cfe1549fd59c20ab5b9a7cda7f27a0067b8303eeb4b31555cf4}\\ &{} \texttt {f24050ed155555cd7fa7a5f8aaaaaaad47ede1a6aaaaaaaab69e6dcb} \end{array} \end{aligned}$$

satisfies

$$\begin{aligned} \#E(\mathbb {F}_p)= & {} 2^2 \cdot p_{414} \cdot r\quad \#\tilde{E}(\mathbb {F}_{p}) = 3 \cdot p_{414} \cdot r\\ \#E^t(\mathbb {F}_p)= & {} 2^2 \cdot 3 \cdot 7 \cdot p_{665}\quad \#\tilde{E}^t(\mathbb {F}_{p}) = 13 \cdot 19 \cdot p_{664}\\ r= & {} {\texttt {0xe0ffffffffffffc400000000000003ff10000000}}\\&{\texttt {000000200000000000000001}} \end{aligned}$$

The additional parameters to obtain the curve from Algorithm 2 are:

and \(\mathbb {F}_{p^6}\) can be defined as \(\mathbb {F}_p[x] / (x^6 - 2)\).

Curve of embedding degree 7. The curve \(E: y^2 = x^3 -3u^2x + b_7u^3\) defined over \(\mathbb {F}_p\) with

$$\begin{aligned} \begin{array}{rl} b_7= &{} \texttt {0x15d384c76889d377dd63600fbe42628e0c386a3e87}\\ &{} \texttt {915790188d944845aab2b649964f386dc90b3a9b612}\\ &{} \texttt {0af5da9a2aaead5e415dd958c5cfa80ea61aac268b0}\\ p = &{} \texttt {0x8f591a9876a6d2344ae66dd7540ea2fd28174755d1}\\ &{} \texttt {6c4ae5c5cd5c1d208e639271b48c8ba7453c95a2a9b}\\ &{} \texttt {e6434f2455504d419f13e35062aa5ebbc49ecfd30f9}\\ u = &{} \texttt {11} \end{array} \end{aligned}$$

satisfies

$$\begin{aligned} \#E(\mathbb {F}_p)= & {} 2^2 \cdot 3^2 \cdot p_{251} \cdot r\quad \#E^t(\mathbb {F}_{p^{7}}) =2^5 \cdot 5 \cdot p_{504}\\ r= & {} {\texttt {0xb63ccd541c3aa13c7b7098feb312eecf5648fd215c0d29}}\\&{\texttt {16714b429d14e8f889}} \end{aligned}$$

The additional parameters to obtain the curve from Algorithm 2 are:

$$\begin{aligned} T = 2^{43}- 2^{41}-{\texttt {0x47dfdb8}}, D = 20, i=6, h_t = -2, h_y = 0 \end{aligned}$$

and \(\mathbb {F}_{p^7}\) can be defined as \(\mathbb {F}_p[x] / (x^7 - 2)\).

Curve of embedding degree 8. The curve \(E: y^2 = x^3 + 2x\) defined over \(\mathbb {F}_p\) with

$$\begin{aligned} \begin{array}{rl} p = &{} \texttt {0xbb9dfd549299f1c803ddd5d7c05e7cc0373d9b1ac15b}\\ &{} \texttt {47aa5aa84626f33e58fe66943943049031ae4ca1d2719b}\\ &{} \texttt {3a84fa363bcd2539a5cd02c6f4b6b645a58c1085e14411} \end{array} \end{aligned}$$

satisfies

$$\begin{aligned} \#E(\mathbb {F}_p)= & {} 2^2 \cdot 3^2 \cdot 5 \cdot 41 \cdot p_{275} \cdot r\qquad \#E^t(\mathbb {F}_p) = 2^4 \cdot p_{540}\quad \#\tilde{E}(\mathbb {F}_{p^{2}}) = 2 \cdot 89 \cdot p_{824} \cdot r\\ r= & {} {\texttt {0xff0060739e18d7594a978b0ab6ae4ce3dbfd52a9}}\\&{\texttt {d00197603fffdf0000000101}} \end{aligned}$$

The additional parameters to obtain the curve from Algorithm 2 are:

and \(\mathbb {F}_{p^8}\) can be defined as \(\mathbb {F}_p[x] / (x^8 - 5)\).

6.2 Comparison with the state of the art

In order to compare the pairing on our curves with the computation on BN, BLS12, KSS16, and \(k=1\) curves, we need to determine the cost of a multiplication \(\mathbf{m} \) for different sizes of p. Indeed, a multiplication in the 3072-bit field of \(k=1\) curves is much more expensive than in a 512-bit prime field. Table 7 shows the benchmarks with RELIC [4] for base field arithmetic with the different primes involved in our pairings.

Pairing computation. The costs of the Miller loop and of the first part of the final exponentiation are given by Eq. (1) and Tables 5 and 6. The second part of the final exponentiation is covered by Sect. 5.2. This part is specific to each set of curve parameters, in particular \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(h_t)\) and \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(h_y)\). Appendix A goes into more detail for our curve with \(k=8\). Further detail on the second part of the final exponentiation for all exponents, covering the various cases, can be found in the code repository (see Sect. 1.1).

Table 8 summarises our comparison results for pairing computations. We warn the reader that the timings in Table 8 are not measurements of real pairing computations: we simply combined the base arithmetic operation timings of RELIC [4] with the multiplication counts detailed in the paragraphs above. This being said, for the curves where an actual implementation of the optimal ate pairing is available in RELIC (BN and BLS12 curves), the estimation that we obtain is within 10% of the actual computation time. This gives reasonable confidence in the validity of the other projected timings.

Miller loop. We obtain a faster Miller loop for the \(k=6\) and \(k=8\) curves compared to BN and BLS12 curves. The \(k=8\) curve has a shorter Miller loop (64 bits) compared to the BN and BLS12 ones (117 bits). The \(k=6\) curve has a sextic twist that allows computing \(f_{T, Q}(P)\) over the 672-bit field \(\mathbb {F}_{p}\), compared to a 922-bit quadratic extension field for BN and BLS12 curves. As for the cases \(k=5\) and \(k=7\), the Miller loop is not as efficient because no twist is available, and the computation is done over \(\mathbb {F}_{p^k}\). Comparing the \(k=6\) and \(k=7\) curves shows that using a curve with twists is a better option than having a short Miller loop. The best option is obviously to have both a short Miller loop and a curve with twists, as for \(k=8\) curves.

Final exponentiation. The rewriting tricks used in Sect. 5.2 for the final exponentiation apply for any curve obtained with Algorithm 2 with the optimisation \(r=\varPhi _k(T)\). For \(k=6\) and \(k=8\) the cofactor is smaller, and the discriminant \(D=3\), resp. \(D=4\), gives formulas that are as good as for BN and BLS12 curves. For \(k=5\) and \(k=7\) curves, the exponentiation is less efficient because fast cyclotomic squaring formulas are not available.

Table 7 \(\mathbb {F}_p\) multiplication timing for RELIC on an Intel Core i7-8700 CPU, 3.20GHz with TurboBoost disabled
Table 8 Pairing cost and timing extrapolation from Table 7

Total cost. Table 8 shows that our new pairing is almost as efficient as the optimal ate pairing on the BLS12 and KSS16 curves. Since Table 8 gives estimated timings, however, it is more appropriate to say that the performance difference is within the error margin. Additionally, we estimate that the optimal ate pairing on our \(k=8\) curve is up to 22 times more efficient than the Tate pairing on \(k=1\) curves [16].

Remark 6.1

We also estimated the cost of a Tate pairing on Cocks–Pinch curves without twists for \(k=5,7\) where the first input point P has coordinates in \(\mathbb {F}_p\) and the second input point Q has coordinates in \(\mathbb {F}_{p^k}\). The doublings and additions of points now take place in \(\mathbb {F}_p\), but the accumulations still require \(\mathbb {F}_{p^k}\) arithmetic. The costs are \(\mathbf{c} _{\textsc {AddLine}} = 8\mathbf{m} +3\mathbf{s} +2k\mathbf{m} \), \(\mathbf{c} _{\textsc {DoubleLine}}=7\mathbf{m} +4\mathbf{s} +2k\mathbf{m} \), \(\mathbf{c} _{\textsc {VerticalLine}}=k\mathbf{m} \), \(\mathbf{c} _{\textsc {Update1}}=2\mathbf{m} _k+\mathbf{s} _k\), and \(\mathbf{c} _{\textsc {Update2}} = 2\mathbf{m} _k\) (\(m_d \in \mathbb {F}_p\) is not computed since it would be 1 after the final exponentiation). The Miller length is \((k-1)\) times longer, of length \(\log _2 r\) instead of \(\log _2 T\). The accumulation steps \(\mathbf{c} _{\textsc {Update1}}\) cost \((k-1)\) times more. The Miller loop of a Tate pairing would cost roughly \(({{\,\mathrm{log_2}\,}}(r)-1) (\mathbf{c} _{\textsc {DoubleLine}}+\mathbf{c} _{\textsc {VerticalLine}}) + ({{\,\mathrm{log_2}\,}}(r)-2) \mathbf{c} _{\textsc {Update1}} + ({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(r)-1)(\mathbf{c} _{\textsc {AddLine}}+\mathbf{c} _{\textsc {VerticalLine}}+\mathbf{c} _{\textsc {Update2}}) + \mathbf{i} _k\). For \(k=5\) we have \({{\,\mathrm{log_2}\,}}(r) = 256\) and \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(r)=39\), which gives \(18585 \mathbf{m} \). For \(k=7\) we have \({{\,\mathrm{log_2}\,}}(r) = 256\) and \({{\,\mathrm{\text {HW}_\text {{2-NAF}}}\,}}(r)=90\), which gives \(31817 \mathbf{m} \). 
The gain of swapping P and Q in the Tate pairing is counterbalanced by the much longer Miller loop: in the end, the Miller loop of a Tate pairing would require more multiplications in \(\mathbb {F}_p\) than an optimal ate Miller loop, which costs \(14496\mathbf{m} \) for \(k=5\) and \(18830\mathbf{m} \) for \(k=7\) (see Table 8).

Elliptic curve scalar multiplication in \(\mathbb {G}_1\) and \(\mathbb {G}_2\). Our curve generation leads to larger primes p (up to eleven 64-bit words, instead of eight for BN and BLS12 curves). Scalar multiplication on \(\mathbb G_1\) is nevertheless not penalised by the slower finite field multiplications, because our curves benefit from other improvements. BN (resp. BLS12) curve parameters for 128 bits of security lead to scalar multiplications [k]P on \(\mathbb {G}_1\) and \(\mathbb {G}_2\) with \(\log _2(k) \approx 448\) (resp. 300). For our curves of embedding degree five to eight, we choose r of minimal size (256 bits, to withstand the Pollard rho attack). Some curves benefit from efficient group law arithmetic: the curves of embedding degree 5, 7, and 8 use the Edwards model, and the \(k=6\) curve uses the efficient formulas available for \(a=0\) curves, widely used in practice. The Gallant–Lambert–Vanstone (GLV) method can be applied on the \(k=6\) and \(k=8\) curves in order to reduce the number of doubling and addition steps. Over \(\mathbb {G}_2\), scalar multiplication is often accelerated by using a twist of the curve. This trick is available for embedding degrees 6 and 8, but not for \(k=5\) and 7. Even though the main topic of this paper is pairing computation, various protocols also compute scalar multiplications. The curves of embedding degree 5 and 7 benefit neither from twists nor from the GLV optimisation, so the cost over \(\mathbb {G}_2\) is too high for practical applications.

7 Conclusion

We modified the Cocks–Pinch method to generate pairing-friendly elliptic curves with an optimal Miller loop length \(\log r/\varphi (k)\), so as to compute an optimal ate pairing efficiently. Moreover, the parameters are carefully chosen so that the curves withstand the recent STNFS attacks on discrete logarithms. Our projected timings for the optimal ate pairing computation on our \(k=6\) and \(k=8\) curves are close to those of the BN and BLS12 curves, and the difference is very probably within the error margin between this projected analysis and an actual implementation. Compared to the \(k=1\) curves presented in [16], pairing computations on the curves suggested here are expected to be 7 (\(k=5\)) to 22 (\(k=8\)) times faster.

One lesson of our work is that the Miller loop length is very important for an efficient pairing, even in the Cocks–Pinch case. It matters much more than the \(\rho \) value.

With respect to the threats on pairings summarised in Table 1, users fearing further progress of NFS variants should prefer the more conservative choice of our modified Cocks–Pinch curves: unlike BN, BLS12, and KSS16 curves, they do not require much larger parameters to be STNFS-resistant.

We finally note that the short Miller loop on our \(k=6\) and \(k=8\) curves is well-suited to protocols where the product of several pairings is computed, the final exponentiation being computed only once, after the Miller loops. This is the case for the translation to the prime-order setting of the Boneh–Boyen IBE scheme, where the product of six pairings is computed in the decryption step, and for the hierarchical identity-based encryption based on the Lewko–Waters scheme, where the product of ten pairings is computed in the decryption step [40].