1 Introduction

Isogeny-based cryptography was independently introduced in 2006 by Couveignes [16], Rostovtsev and Stolbunov in [32, 34]. Since then, an ever increasing number of isogeny-based key-exchange protocols have been proposed. A selection of those protocols, especially relevant for this work, is briefly summarized below.

Operating with supersingular elliptic curves defined over the finite field \({\mathbb {F}}_{p^2}\), with p a prime, the Supersingular Isogeny-based Diffie-Hellman key exchange protocol (SIDH) was presented by Jao and De Feo in [21] (see also [17]). In 2017, the Supersingular Isogeny Key Encapsulation (SIKE) protocol, an SIDH variant, was submitted to the NIST post-quantum cryptography standardization project [2]. On July 2020, NIST announced that SIKE passed to the round 3 of this contest as an alternate candidate.

In 2018, the commutative group action protocol CSIDH was introduced by Castryck, Lange, Martindale, Panny and Renes in [8]. Operating with supersingular elliptic curves defined over a prime field \({\mathbb {F}}_p,\) CSIDH is a significantly faster version of the Couveignes-Rostovtsev-Stolbunov scheme variant as it was presented in [18].

In 2019, Costello proposed a variant of SIDH named B-SIDH [13]. In B-SIDH, Alice computes isogenies from a \((p +1)\)-torsion supersingular curve subgroup, whereas Bob has to operate on the \((p-1)\)-torsion subgroup of the quadratic twist of that curve. A remarkable feature of B-SIDH is that it can achieve similar classical and quantum security levels as SIDH, but using significantly smaller public/private key sizes. The single most important challenge in the implementation of B-SIDH is the high computational cost associated to the large degree isogenies involved in its execution.

More recently in 2021, Banegas et al. proposed in [3] a more efficient approach for computing constant-time CSIDH that they named CTIDH. In CTIDH the authors employ an economical key space mechanism (which was adapted from an idea of bounding the 1-norm of CSIDH secret key vector [28]), along with an approach for processing primes in batches [24].

In general, performing isogeny constructions and evaluations are the most expensive computational tasks of any isogeny-based protocol. This is especially true for CSIDH and B-SIDH, where [exceedingly] large odd prime degree-\(\ell \) isogenies come into play.

For decades now, Vélu’s formulas (cf. [22, Sect. 2.4] and [35, Theorem 12.16]) have been widely used to construct and evaluate degree-\(\ell \) isogenies. Using several elliptic curve and isogeny arithmetic optimization tricks reported in the last few years [9, 14, 27], the construction and evaluation of degree-\(\ell \) isogenies via Vélu’s formulas can be obtained at a computational cost of roughly \(6\ell \) field multiplications (see a detailed discussion in Sect. 2).

Bernstein, De Feo, Leroux and Smith presented in [5] a new approach for constructing and evaluating degree-\(\ell \) isogenies at a combined cost of just \({\tilde{O}}(\sqrt{\ell })\) field operations. This improvement was obtained by observing that the main polynomial product embedded in the isogeny computations, can be effectively accelerated via a baby-step giant-step approach [5, Algorithm 2]. Due to its square root complexity reduction (up to polylogarithm factors), in the remainder of this paper, we will refer to this variant of Vélu’s formulas, as √élu formulas or simply √élu.

As we will see in this paper, and as it was already hinted in [5], √élu has a noticeable impact on the performance of CSIDH, and even more so on B-SIDH. By way of illustration, consider the combined cost of constructing and evaluating degree-\(\ell \) isogenies for \(\ell =587,\) which corresponds to an example highlighted in [5, Appendix A.3].Footnote 1 For that degree \(\ell ,\) the authors report a cost of just \(2296\approx 3.898(\ell +2)\) field multiplications and squaring operations. This has to be compared with the cost of a classical Vélu approach that would take some \(3544\approx 6.017(\ell +2)\) multiplications.

In spite of the groundbreaking result announced in [5], along with the high performance achieved by its companion software library, the authors did not provide a practical cost analysis of their approach, but rather, they focus their attention on its asymptotical analysis. Moreover, their √élu implementation reported a rather modest 1% and 8% speedup over the traditional Vélu’s formulas when applied to the non constant-time CSIDH-512 and CSIDH-1024 instantiations, respectively. Furthermore, the authors of [5] left open the problem of assessing the practical impact of √élu on CSIDH and B-SIDH constant-time implementations.

Our Contributions. We present a concrete computational analysis of √élu. From this analysis, we conclude that for virtually all practical scenarios, the best approach for performing the polynomial products associated to the isogeny arithmetic is achieved by nothing more than carefully tailored Karatsuba polynomial multiplications. The main practical consequence of this observation is that computing degree-\(\ell \) isogenies with √élu has a concrete computational cost dominated by a \(b^{\log _2{(3)}}\) factor, where \(b \approx \frac{\sqrt{(\ell - 1)}}{2}\). We also present several tricks that permit to save multiplications when performing products involving the polynomials \(E_{J_0}\) and \(E_{J_1}\) as defined in Sect. 4. We additionally exploit the fact that the polynomials \(E_{J_0}\) and \(E_{J_1}\) are the reciprocal of each other. These simple but effective observations help us to construct and evaluate a degree-587 isogeny using only \(2180{{M}}\approx 3.701(\ell +2).\) This is about 5.3% cheaper than the same computation announced in [5]. This improvement also pushes to \(\ell = 89\) the threshold where computing degree-\(\ell \) isogenies with √élu becomes more effective than traditional Vélu.Footnote 2

In a nutshell, our main practical contributions can be summarized as follows:

  1. 1.

    We report the first constant-time implementation of the protocol B-SIDH introduced in [13]. Using the framework of [11], optimal strategies à la SIDH are applied to B-SIDH while also taking advantage of √élu. The experimental results for B-SIDH show a saving of up to 75% compared with an implementation of this protocol using traditional Vélu.

  2. 2.

    We used the framework presented in [11] to apply optimal strategies to CSIDH, while exploiting √élu. This allows us to present the first application of √élu to constant-time implementations of the CSIDH-512, CSIDH-1024, and CSIDH-1792 instantiations. A comparison with respect to CSIDH using traditional Vélu, reports savings of 5.357%, 13.68% and 25.938% field \({\mathbb {F}}_p\)-operations for CSIDH-512, CSIDH-1024, and CSIDH-1792, respectively.

  3. 3.

    In Sect. 4.3, we show that the number of field multiplications required for computing degree-\(\ell \) isogenies using √élu with Karatsuba polynomial multiplication has concrete computational cost closer to \(O(b^{\log _2(3)}).\)

Outline. The remainder of this paper is organized as follows. In Sect. 2, we give a description of traditional Vélu’s formulas. We include also a compact description of the B-SIDH and CSIDH protocols. In Sect. 3, we briefly discuss the application of optimal strategies to CSIDH and B-SIDH. In Sect. 4, we present an explicit description of √élu main building blocks KPS, xEVAL, and xISOG. In addition, we discuss several √élu algorithmic improvements in Sect. 4.2. We report the experimental results obtained from our software library in Sect. 5, first in Sect. 5.1 for CSIDH and then in Sect. 5.2 for B-SIDH. Finally, our concluding remarks are drawn in Sect. 6.

Notation.  M, S, and a denote the cost of computing a single multiplication, squaring, and addition (or subtraction) in the prime field \({\mathbb {F}}_p\), respectively.

2 Background

Most if not all of the fastest isogeny-based constant-time protocol implementations have adopted for their schemes Montgomery and twisted Edwards curve models. A Montgomery curve [26] is defined by the equation \(E_{A,B}: By^2~=~x^3 + Ax^2 + x\), such that \(B \ne 0\) and \(A^2 \ne 4.\) For the sake of simplicity, we will write \(E_{A}\) for \(E_{A,1}\) and will always consider \(B = 1.\) Moreover, it is customary to represent the constant A in the projective space \({\mathbb {P}}^1\) as \((A': C'),\) such that \(A = A'/C'\) (see [15]).

Let \(q = p^n,\) where p is a large prime number and n a positive integer. Let E be a supersingular Montgomery curve \(E: y^2 = x^3 +Ax^2 +x\) defined over \({\mathbb {F}}_q\), and let \(\ell \) be an odd prime number. Given an order-\(\ell \) point \(P \in E({\mathbb {F}}_q)\), the construction of a degree-\(\ell \) isogeny \(\phi : E \mapsto E'\) of kernel \(G = \langle P \rangle \) and its evaluation at a point \(Q\in E({\mathbb {F}}_q)\backslash G\) consist of the computation of the Montgomery coefficient \(A'\in {\mathbb {F}}_q\) of the codomain curve \(E': y^2 = x^3 +A'x^2 +x\) and the image point \(\phi (Q)\), respectively. In this paper, we will refer to these two tasks as isogeny construction and isogeny evaluation computations, respectively.

Vélu’s formulas (see [22, Sect. 2.4] and [35, Theorem12.16]), have been generally used to construct and evaluate degree-\(\ell \) isogenies by performing three main building blocks known as, KPSxISOG and xEVAL. The block KPS computes the first k multiples of the point P, namely, the set \(\{P, [2]P,\dots , [k]P\}.\) Using KPS as a sort of pre-computation ancillary module, xISOG finds the constants \((A': C')\in {\mathbb {F}}_q\) that determine the codomain curve \(E'.\) Also, using KPS as a building block, xEVAL calculates the x-coordinate of the image point \(\phi (Q)\in E'.\)

After applying a number of elliptic curve arithmetic tricks [9, 14, 27], the computational expenses of KPSxISOG and xEVALhave been found to be about \(3\ell , \ell \) and \(2\ell \) multiplications, respectively. This gives an overall cost of about \(6\ell \) multiplications for the combined cost of the isogeny construction and evaluation tasks. In Sect. 4, we give a detailed discussion of how the √élu approach of [5] drastically reduces the timing costs of traditional Vélu’s formulas.Footnote 3

In the remainder of this section, we briefly discuss the two isogeny-based protocols implemented in this paper, namely, CSIDH and B-SIDH.

2.1 Overviewing the CSIDH

Here, we give a simplified description of CSIDH. For more technical details, the interested reader is referred to [8, 9, 24, 29].

CSIDH is an isogeny-based protocol that can be used for key exchange and encapsulation [8], and other more advanced protocols and primitives. Figure 1 shows how CSIDH can be executed analogously to Diffie–Hellman, to produce a shared secret between Alice and Bob. Remarkably, the elliptic curves \(E_{BA}\) and \(E_{AB}\) computed by Alice and Bob at the end of the protocol are one and the same.

Fig. 1
figure 1

CSIDH key-exchange protocol

CSIDH works over a finite field \({\mathbb {F}}_p\), where p is a prime of the form

$$\begin{aligned} p = 4\prod _{i=1}^n\ell _i - 1 \end{aligned}$$

with \(\ell _1,\dots ,\ell _n\) a set of small odd primes. For example, the original CSIDH article [8] defined a 511-bit p with \(\ell _1,\dots ,\ell _{n-1}\) the first 73 odd primes, and \(\ell _n=587\). This instantiation is commonly known as CSIDH-512.

In CSIDH, we compute the action of small prime ideals, which happen to be factors of \(p + 1\). We have that the principal ideal \((\ell _i) \subset {\mathbb {Z}}[\sqrt{-p}]\) splits into two primes, namely \({\mathfrak {l}}_i = (\ell _i,\pi -1)\) and \({\bar{{\mathfrak {l}}}}_i = (\ell _i,\pi +1)\), where \(\pi \) is the Frobenius endomorphism. From the fact that \({\bar{{\mathfrak {l}}}}_i{\mathfrak {l}}_i = (\ell _i)\) is principal, we obtain CSIDH commutative property, \( {\bar{{\mathfrak {l}}}}_i*({\mathfrak {l}}_i*E) = {\mathfrak {l}}_i*({\bar{{\mathfrak {l}}}}_i*E) = E \), for all \(E/{\mathbb {F}}_p\) with endomorphism ring \(\text {End}(E) \cong {\mathbb {Z}}[\sqrt{-p}]\).

The set of public keys in CSIDH is a subset of all supersingular elliptic curves in Montgomery form, \(y^2=x^3+Ax^2+x,\) defined over \({\mathbb {F}}_p\). Since the CSIDH base curve E is supersingular, it follows that \(\#E({\mathbb {F}}_p) = (p + 1) = 4 \prod _{i=1}^n\ell _i\).

The input to the CSIDH class group action algorithm is an elliptic curve \(E:y^2=x^3+Ax^2+x\), represented by its A-coefficient, and an ideal class \({\mathfrak {a}} = \prod _{i =1}^{n} {\mathfrak {l}}_i^{e_i},\) represented by its list of secret exponents \((e_i,\dots ,e_n)\in {\llbracket -m \cdot \cdot m \rrbracket }^n\). The output is the A-coefficient of the elliptic curve \(E_A\) defined as,

$$\begin{aligned} E_A = {\mathfrak {a}}* E = {\mathfrak {l}}_1^{e_1} * \cdots * {\mathfrak {l}}_n^{e_n} * E. \end{aligned}$$
(1)

Taking advantage of the commutative property of the group action, we can implement the protocol shown in Fig. 1, which closely resembles the flow of the classical Diffie-Hellman protocol. Alice and Bob begin by selecting secret keys \({\mathfrak {a}}\) and \({\mathfrak {b}}\), and producing their corresponding public keys \(E_A = {\mathfrak {a}}* E\) and \(E_B = {\mathfrak {b}}* E\), respectively. After exchanging these public keys and taking advantage of the commutative property of the group action, Alice and Bob compute a shared secret as,

$$\begin{aligned} {\mathfrak {a}}* E_B = ({\mathfrak {a}}\cdot {\mathfrak {b}}) E = ({\mathfrak {b}}\cdot {\mathfrak {a}}) E = {\mathfrak {b}}* E_A. \end{aligned}$$

The computational cost of the group action described in 4 of Sect. A.1, is dominated by the calculation of n degree-\(\ell _i^{e_i}\) isogeny evaluations and constructions plus a total of \(\frac{n(n + 1)}{2}\) scalar multiplications by the prime factors \(\ell _i,\) for \(i = 1, \ldots , n.\) A similar multiplication-based approach for computing the group action algorithm was proposed in the original CSIDH protocol of [8]. It was first stated in [6, Sect. 8] (see also [20]) that this multiplication-based procedure could possibly be improved by adapting to CSIDH, the SIDH optimal strategy approach introduced by De Feo, Jao and Plût in [17]. We briefly discuss about the role of optimal strategies for large instances of CSIDH in Sect. 3, where the framework presented in [11] was adopted.

Fig. 2
figure 2

B-SIDH protocol for a prime p such that \(M|(p + 1)\) and \(N|(p - 1).\)

2.2 Playing the B-SIDH

B-SIDH was proposed by Costello in [13], Alice and Bob work in the \((p + 1)\)- and \((p - 1)\)-torsion of a set of supersingular curves defined over \({\mathbb {F}}_{p^2}\) and their quadratic twist set, respectively. B-SIDH is effectively twist-agnostic because optimized isogeny and Montgomery arithmetic only require the x-coordinate of the points along with the A coefficient of the curve.Footnote 4 This feature implies that B-SIDH can be executed entirely à la SIDH as shown in Fig. 2.Footnote 5

More concretely, as before let \(E: By^2 = x^3 + Ax^2 + x\) denote a supersingular Montgomery curve defined over \({\mathbb {F}}_{p^2},\) so that \(\#E({\mathbb {F}}_{p^2}) = (p + 1)^2,\) and let \(E_t / {\mathbb {F}}_{p^2}\) denote the quadratic twist of \(E / {\mathbb {F}}_{p^2}\). For efficiency reasons, the prime p is chosen so that both, \(p + 1\) and \(p -1\) are M- and N-smooth, respectively. In other words, \(M| (p + 1)\) and \(N| (p - 1)\). Then, \(E_t / {\mathbb {F}}_{p^2}\) can be modeled as, \((\gamma B) y^2 = x^3 + Ax^2 + x\), where \(\gamma \in {\mathbb {F}}_{p^2}\) is a non-square element and \(\#E_t({\mathbb {F}}_{p^2}) = (p - 1)^2.\) Note that the isomorphism connecting these two curves is determined by the map \(\iota :(x,y) \mapsto (x, jy)\) with \(j^2 = \gamma \) (see [13, Sect. 3]).

Hence, for any \({\mathbb {F}}_{p^2}\)-rational point \(P = (x,y)\) on \(E_t / {\mathbb {F}}_{p^2}\) it follows that \(Q = \iota (P) = (x, jy)\) is an \({\mathbb {F}}_{p^4}\)-rational point on E, such that \(Q + \pi ^2(Q) = {\mathcal {O}}\). Here \(\pi :(x,y) \mapsto (x^p, y^p)\) is the Frobenius endomorphism. This implies that Q is a zero-trace \({\mathbb {F}}_{p^4}\)-rational point on \(E / {\mathbb {F}}_{p^2}\).

B-SIDH can thus be seen as a reminiscent of the CSIDH protocol [8], where the quadratic twist is exploited to perform the computations using rational and zero-trace points with coordinates in \({\mathbb {F}}_{p^2}.\) Although B-SIDH allows to work over smaller fields than either SIDH or CSIDH, it requires the computation of considerably larger degree-\(\ell \) isogenies.

As illustrated in Fig. 2, B-SIDH can be executed analogously to the main flow of the SIDH protocol. B-SIDH public parameters correspond to a supersingular Montgomery curve \(E / {\mathbb {F}}_{p^2}:By^2 = x^3 + Ax^2 + x\) with \(\#E({\mathbb {F}}_{p^2}) = {(p+1)^2}\), two rational points \(P_a\) and \(Q_a\) on \(E / {\mathbb {F}}_{p^2}\), and two zero-trace \({\mathbb {F}}_{p^4}\)-rational points \(P_b\) and \(Q_b\) on \(E / {\mathbb {F}}_{p^2}\) such that

  • \(P_a\) and \(Q_a\) are two independent order-M points with \(M\mid (p+1)\), \(\gcd (M,2) = 2\), and \(\left[ \frac{M}{2}\right] Q_a = (0,0)\);

  • \(P_b\) and \(Q_b\) are two independent order-N points with \(N\mid (p-1)\) and \(\gcd (N,2) = 1\).

In practice, B-SIDH is implemented using projectivized x-coordinate points, and thus the point differences \({PQ}_a = P_a - Q_a\) and \({PQ}_b = P_b - Q_b\) must also be exchanged. Since the x-coordinates of \(P_a, Q_a, {PQ}_a, P_b, Q_b\) and \({PQ}_b,\) all belong to \({\mathbb {F}}_{p^2}\), a B-SIDH implementation must perform field arithmetic on that quadratic extension field.

As in the case of SIDH, the protocol flow of B-SIDH must perform two main phases, namely, key generation and secret sharing. In the key generation phase, the evaluation of the projectivized x-coordinate points x(P), x(Q) and \(x(P - Q)\) is required. Thus for B-SIDH, secret sharing is significantly cheaper than key generation.

We briefly discuss the role of optimal strategies for large instances of CSIDH and B-SIDH, in the next section.

3 Optimal strategies for the CSIDH and the B-SIDH

In [17], optimal strategies were introduced to efficiently compute degree-\(\ell ^e\) isogenies at a cost of approximately \(\frac{e}{2} \log _2{e}\) scalar multiplications by \(\ell \), \(\frac{e}{2} \log _2{e}\) degree-\(\ell \) isogeny evaluations, and e constructions of degree-\(\ell \) isogenous curves. Optimal strategies can be obtained using dynamic programming (see [2, 11] for concrete algorithms).

In the context of SIDH, optimal strategies tend to balance the number of isogeny evaluations and scalar multiplications to \(O(e\log {(e)}).\) In the case of CSIDH, optimal strategies are expected to be largely multiplicative, i.e., optimal strategies will tend to favor the computation of more scalar multiplications over isogeny evaluations. This is due to the fact that these operations are cheaper than large prime degree-\(\ell \) isogeny evaluations.

Let \({\mathcal {L}} = [ \ell _1, \ell _2, \ldots , \ell _{n} ]\) be the list of small odd prime numbers such that \(p~=~4 \cdot \prod _{i=1}^{n} \ell _i - 1\) is the prime number used in CSIDH. Here, we adopt the framework presented in [11], where the authors heuristically assumed that an arrangement of the set \({\mathcal {L}}\) from the smallest to the largest \(\ell _i,\) is close to the global optimal. For this fixed ordering, the authors of [11] reported a procedure that finds an optimal strategy with cubic complexity with respect to n.

Optimal strategies can also be used to improve the performance of B-SIDH, although in this case, we can see the resulting strategies as a hybrid between SIDH and CSIDH. On the one hand, B-SIDH follows the same SIDH protocol flow. On the other hand, B-SIDH must construct/evaluate several isogenies whose degrees are powers of large odd primes, as in CSIDH.

Let us assume that we need to construct a degree-L isogeny with \(L = {\ell _1}^{e_1} \cdot {\ell _2}^{e_2} \cdots {\ell _n}^{e_n}\), and let us write

$$\begin{aligned} \begin{aligned} {\mathcal {L}}'&= [\underbrace{\ell _1, \ldots , \ell _1}_{e_1},\underbrace{\ell _2, \ldots , \ell _2}_{e_2}, \ldots , \underbrace{\ell _n, \ldots , \ell _n}_{e_n} ]. \end{aligned} \end{aligned}$$
(2)

Then, in order to efficiently execute either the key generation or the secret sharing main phases of B-SIDH, we must find an optimal strategy for the setting \({\mathcal {L}}'\) as described in Algoritm 5 of “Appendix A.1”.

Notice that any B-SIDH strategy can be encoded as is customary in SIDH and CSIDH, i.e., by a list of \(e-1\) positive integers where \(e = \sum _{i = 1}^{n}e_i.\) Moreover, any such strategy can be evaluated by executing the dynamic-programming procedure shown in Algorithm 5.

4 New Vélu’s formulas

In this section we present a more detailed discussion of the √élu algorithms and their application to isogeny-based cryptography. We give several algorithmic tricks that slightly improve the performance of √élu as it was presented in [5].

Let \(E_A/{\mathbb {F}}_q\) be an elliptic curve defined in Montgomery form by the equation \(y^2 = x^3 + Ax^2 + x\), with \(A^2 \ne 4\). Let P be a point on \(E_A\) of odd prime order \(\ell \), and \(\phi : E_A \rightarrow E_{A'}\) a separable isogeny of kernel \(G = \langle P\rangle \) and codomain \(E_{A'}/{\mathbb {F}}_q: y^2 = x^3 + A'x^2 + x\).

Our main task here is to compute \(A'\) and the x-coordinate \(\phi _x(\alpha )\) of \(\phi (Q)\), for a rational point \(Q = (\alpha , \beta ) \in E_A({\mathbb {F}}_q)\setminus G\). As mentioned in [5] (see also [14, 25] and [27]), the following formulas allow to accomplish this task,

$$\begin{aligned} A' = 2 \; \frac{1+d}{1-d} \;&\text{ and } \; \phi _x(\alpha ) = \alpha ^\ell \; \frac{h_S(1/\alpha )^2}{h_S(\alpha )^2}, \text{ where } \end{aligned}$$
$$\begin{aligned} S = \{1, 3, \ldots , \ell -2\},\quad&d = \left( \frac{A-2}{A+2}\right) ^\ell \left( \frac{h_S(1)}{h_S(-1)}\right) ^8, \text{ and } \\ \;&h_S(X) = \prod _{s \in S} (X - x([s]P)). \end{aligned}$$

From the above, we see that the efficiency of computing \(A'\) and \(\phi _x(\alpha )\) directly depends on the cost of evaluating the polynomial \(h_S(X) = \prod _{s \in S} (X - x([s]P))\). A naive approach would compute \(h_S(X)\) by performing \(\#S-1\) polynomial products. Alternatively, exploiting a baby-step giant-step strategy √élu obtains a square root complexity speedup over a traditional Vélu approach. In the following, we briefly sketch this strategy.

Given \(E_A/{\mathbb {F}}_q\) an order-\(\ell \) point \(P \in E_A({\mathbb {F}}_q)\), and some value \(\alpha \in {\mathbb {F}}_q\) we want to efficiently evaluate the polynomial, \(h_S(\alpha ) = \prod _{i}^{\ell - 1} (\alpha - x([i]P)).\) From Lemma 4.3 of [5],

$$\begin{aligned} (X -x(P + Q))(X - x(P - Q))&= X^2 + \frac{F_1(x(P), x(Q))}{F_0(x(P), x(Q))}X \\&\quad + \frac{F_2(x(P), x(Q))}{F_0(x(P), x(Q))} \end{aligned}$$

where,

$$\begin{aligned} F_0(Z, X)&= Z^2 - 2XZ + X^2; \\\nonumber F_1(Z, X)&= -2( XZ^2 + (X^2 + 2AX + 1)Z + X); \\ F_2(Z, X)&= X^2Z^2 - 2XZ + 1.\nonumber \end{aligned}$$
(3)

This suggests a rearrangement à la Baby-step Giant-step as,

$$\begin{aligned} h(\alpha ) = \prod _{i\in I}\prod _{j\in J}(\alpha -x([i + s\cdot j]P))(\alpha -x([i - s\cdot j]P)) \end{aligned}$$

Now \(h(\alpha )\) can be efficiently computed by calculating the resultants of polynomials of the form,

$$\begin{aligned} h_I&\leftarrow \prod _{x_i \in {\mathcal {I}}}(Z - x_i) \in {\mathbb {F}}_q[Z]\\ E_{J}(\alpha )&\leftarrow \prod _{x_j \in {\mathcal {J}}}\left( F_0(Z, x_j)\alpha ^2 + F_1(Z, x_j)\alpha + F_2(Z, x_j)\right) . \end{aligned}$$

The most demanding operations of √élu require computing four different resultants of the form \(\texttt {Res}_Z(f(Z), g(Z))\) for polynomials \(f, g\in {\mathbb {F}}_q[Z]\). We compute these four resultants using a remainder tree approach supported by carefully tailored Karatsuba polynomial multiplications. In practice, the computational cost of performing degree-\(\ell \) isogenies using √élu is close to \(K(\sqrt{\ell })^{\log _2{3}}\) field operations, with K a constant.

4.1 Construction and evaluation of odd degree isogenies

As in Sect. 2, we consider the three building blocks KPS, xISOG, xEVAL, where KPS consists of computing the x coordinates of all the points in the kernel G, xISOG finds the codomain coefficient \(A'\), and xEVAL performs the computation of \(\phi _x(\alpha )\).

In line with the traditional approach, one could use the KPS procedure of traditional Vélu for computing the x coordinates of (\(\# S = (\ell - 1)/2\)) points in the kernel G. This will cost about \(3\ell \) field multiplications. More efficiently, √élu only computes the x-coordinates of points of G with indices in three subsets of S, each of size \(O(\sqrt{\ell })\). Denote by \({\mathcal {I}}\), \({\mathcal {J}}\) and \({\mathcal {K}}\) those subsets of S. Then, \({\mathcal {I}}\) and \({\mathcal {J}}\) are chosen such that the maps \({\mathcal {I}}\times {\mathcal {J}}\rightarrow S\) defined by \((i,j) \mapsto i+j\) and \((i,j) \mapsto i-j\) are injective and their images \({\mathcal {I}}+{\mathcal {J}}\), \({\mathcal {I}}-{\mathcal {J}}\) are disjoint. We call \(({\mathcal {I}}, {\mathcal {J}})\) an index system for S and write \({\mathcal {I}}\pm {\mathcal {J}}\) for \(({\mathcal {I}}+{\mathcal {J}}) \cup ({\mathcal {I}}-{\mathcal {J}})\). The remaining indices of S are gathered in \({\mathcal {K}}= S\backslash ({\mathcal {I}}\pm {\mathcal {J}})\). Algorithm 1 states the required KPS computations.

figure a

Let us recall that for achieving an efficient computation of xISOG and xEVAL , √élu requires the biquadratic polynomials of Eq. 3, which implies the computation of resultants of the form \(\texttt {Res}_Z(f(Z), g(Z)),\) for two polynomials \(f, g\in {\mathbb {F}}_q[Z]\).

We are now ready to present in Algorithms 2–3 the computation of xISOG and xEVAL, respectively. Deriving the resultants in Algorithms 2–3 may turn out to be a cumbersome task if it is not carried out in an elaborated way. For polynomials \(f = a\prod _{0\le i<n}(Z-x_i)\) and g in \({\mathbb {F}}_q[Z]\), their resultant \(\texttt {Res}(f,g) = a^n\prod _{0\le i<n}g(x_i)\) can be computed efficiently when the factorization of f is known, which is exactly the case in the algorithms at hand. Employing a remainder tree approach (an equivalent alternative being continued fractions), one evaluates the factors \(g(x_i)\) by computing \(g \bmod (Z-x_i)\), \(0\le i < n\), followed by their product.

One considerable advantage of using remainder trees here is that the subjacent product tree of the \((Z-x_i)\) factors, can be shared among all the resultants in Algorithm 2 and , since these linear polynomials depend only on the kernel \(\langle P \rangle \). In other words, the four resultants in Algorithms 2–3 show no dependencies among them and therefore, they can be computed concurrently by a √élu parallel implementation.

figure b
figure c

Notice that the single most recurrent high level operation of Algorithms 2–3, is the polynomial multiplication on the ring \({\mathbb {F}}_q[X].\) Thus, as in [5], it is essential that we utilize fast tailor-made polynomial multiplication algorithms. These customized algorithms are useful because for several modular computations, only a segment of the polynomial product is actually needed.

The resultant \(\texttt {Res}_Z(f(Z), g(Z))\) of two polynomials \(f, g\in {\mathbb {F}}_q[Z]\) can be computed with an asymptotic runtime complexity of \({\tilde{O}}(n)\) by using a fast polynomial multiplication. Here fast means that this polynomial operation has a \(O(n\log _2(n))\) computational complexity (see [4, p. 7,  Sect. 3]). The degree of the polynomials used for CSIDH and even B-SIDH, are sufficiently small so that Karatsuba polynomial multiplication (or related approaches such as Toom-Cook), emerges as the most efficient solution. For example, according to the implementation of [5], \(\ell = 587\) requires polynomials of degree \(\#{\mathcal {I}}= 16\) and \(2\times \#{\mathcal {J}}= 18\) (in the B-SIDH case this translates to \(\#{\mathcal {I}}, \#{\mathcal {J}}\le 150\)). It can be easily verified that Karatsuba polynomial multiplication becomes a more efficient choice than the Schönage-FFT approach (for a comprehensive analysis of these design options, see “Appendix A.2”).

4.2 Implementation speedups

In this section we report several algorithmic techniques that are exploited in our implementation to obtain some modest, but noticeably savings over [5]. Our first refinement affects xEVAL, and arises from the special shape of the biquadratic polynomials \(F_0\), \(F_1\), \(F_2\). Considering either variable, one can see that \(F_1\) is symmetric and \(F_0\) is symmetric to \(F_2\)Footnote 6, that is, \(F_1 = 1/Z^2\, F_1(1/Z, X)\) and \(F_2 = 1/Z^2\, F_0(1/Z, X)\) by, for example, considering the first variable. Now, using a projective representation of the x-coordinate \(\alpha = x/z\) in xEVAL, we can write a quadratic polynomial factor in \(E_{0,J}\) and a quadratic polynomial factor in \(E_{1,J}\) respectively as

$$\begin{aligned} E_{0,j}&= 1/x^2\left( F_0(Z, x_j)z^2 + F_1(Z, x_j)xz + F_2(Z, x_j)x^2\right) ; \\ E_{1,j}&= 1/z^2\left( F_0(Z, x_j)x^2 + F_1(Z, x_j)xz + F_2(Z, x_j)z^2\right) . \end{aligned}$$

Thus, it becomes clear that the polynomials \(x^{2\, \#J}E_{0,J}\) and \(z^{2\, \#J}E_{1,J}\) are symmetric to one another, allowing to save the computation of one of the two products \(E_{0,J}\), \(E_{1,J}\). This gives us an expected saving of \(\#{\mathcal {J}}\cdot \log _2\left( \#{\mathcal {J}}\right) \) polynomial multiplications via product trees.

Our next improvement is focused on the computation of \(E_{0,j}\) required in xEVAL. Let us write \(x_j~=~X_j/Z_j.\) Then, \(\left( F_0(Z, x_j)z^2 + F_1(Z, x_j)xz + F_2(Z, x_j)x^2\right) \) can be expressed as \(aZ^2 + bZ + c,\) where

$$\begin{aligned} a&= C{( x Z_j - z X_j)}^2; \\ 2b&= \big [C(X^2 + Z^2)\big ](-4X_jZ_j) - \big [2(X_j^2 + Z_j^2)\big ]\big (2[ C(XZ)]\big ) \\&\quad + \big (2[A'(XZ)]\big )(-4X_jZ_j);\\ c&= C{( x X_j - z Z_j )}^2. \end{aligned}$$

The three equations above can be implemented (with the help of some extra pre-computations required in xISOG) at a cost of 7M + 3S + 12a field operations. This cost represents a saving with respect to the implementation of [5], which requires 11M + 2S + 13a field operations. Assuming M\(= {{\textbf {S}}},\) this implies that our proposed formulas save 3 field multiplications per polynomial \(E_{0,j}\), \(0 \le j < \#{\mathcal {J}}\).

Let us now illustrate the improvements just described applied to the example \(\ell = 587.\) Let us recall that in the implementation of [5], we have \(\#{\mathcal {I}}= 16\) and \(\#{\mathcal {J}}= 9\). Consequently, our first improvement saves \(9 \log _2(9) \approx 28\) polynomial multiplications via product trees. On the other hand, our second improvement saves \(3 \times \#{\mathcal {J}}= 3 \times 9 = 27\) field multiplications.

4.3 A concrete complexity analysis

In this section, we present the computational cost associated to the combined evaluation of the KPS, xISOG, and xEVAL procedures.Footnote 7

Let \(b = \lfloor \frac{\sqrt{\ell -1}}{2}\rfloor \) as given in Step 1 of Algorithm 1. Note that KPS  (see Algorithm 1), can be performed at a cost of about 4b differential point additions (assuming \(\#{\mathcal {I}}\approx \#{\mathcal {J}}\approx b, \#{\mathcal {K}}\approx 2b\)), which implies an expense of at most \((24b){{M}}\) field multiplications.

Observe also that the computation of the polynomial \(h_I(Z)\) required at Step 1 of both, xISOG (2) and xEVAL (3) procedures, can be shared and thus must be computed only once. One interesting observation of [5], is that the computation of the polynomials \(E_{0,J}\) and \(E_{1,J}\) in xISOG (see Steps 2–3 of xISOG 2), can be performed at a cost of only one product tree procedure. Furthermore, as it was already discussed in Sect. 4.2, this same trick can also be applied to xEVAL, i.e., Steps 2–3 of xEVAL 3 can be calculated by executing only one product tree. Hence, each polynomial \(E_{i,J}\), \(i=0,1\), required by xISOG and xEVAL can be obtained at a cost of \((3b){{M}}\) and \((10b){{M}}\) field operations, respectively.

Additionally, in Steps 4–5 of xISOG and xEVAL, the computation of two resultants are required, implying that four resultants must be computed in total. Each Resultant corresponds to the computation of \(\texttt {Res}_Z(f(Z), g(Z))\) such that \(f, g\in {\mathbb {F}}_q[Z]\), \(\deg f = b' \approx b\) and \( \deg g = 2b\). We give in “Appendix A.3”, a detailed description of the cost of computing such a resultant in terms of b. This calculation is performed by computing the product of the remainder tree leaves. In “Appendix A.3”, it is shown that the complexity in terms of field operations associated to the computation of a resultant as described in Sect. 4.2 is given as,

$$\begin{aligned} R(b)&= \left( 3b^{\log _2(3)} + b\log _2(b) - \frac{5}{3}b + \frac{5}{6} \right) \;. \end{aligned}$$
(4)

The constants \(M_0\) and \(M_1\) in Steps 6-7 of xISOG and xEVAL, have a cost of \((4b){{M}}\) and \((8b){{M}}\) field operations, respectively. Lastly, the computations of the coefficient d of xISOG and the output of xEVAL require about \((6\log _2(b) + 22)\) multiplications. All in all and invoking Eq. 4, the evaluation of KPS, xISOG, and xEVAL procedures have a combined cost of approximately,

$$\begin{aligned} \text{ Cost }(b)&= 4\left( 3b^{\log _2(3)} + b\log _2(b) - \frac{5}{3}b + \frac{5}{6} \right) \nonumber \\&\quad + \left( b^{\log _2(3)} - \frac{2}{3}b\right) + 2\left( 3b^{\log _2(3)} - 2b\right) \nonumber \\&\quad + 49b + 6\log _2(b) + 22 \nonumber \\&= 19b^{\log _2(3)} + 4b\log _2(b) + \frac{113}{3}b + 6\log _2(b) + \frac{76}{3}\;. \end{aligned}$$
(5)
Fig. 3
figure 3

Measured and expected running time of \(\texttt {KPS}~+ \texttt {xISOG}~+ \texttt {xEVAL}\) for all the 207 small odd primes \(\ell _i\) required in the group action evaluation of CSIDH-1792 (see [11]). All computational costs are given in \({\mathbb {F}}_p\)-multiplications. The expected running time corresponds to \(1.4 \times \text{ Cost }(b)\). Additionally, \(b \approx \frac{\sqrt{(\ell - 1)}}{2}\)

Recall that \({\mathcal {K}}\) corresponds to the missing kernel computation part not included in the remainder trees, whose size is upper bounded by 2b elements. The precise size of \({\mathcal {K}}\) depends on the choice of the prime \(\ell \), which in turn determines the value of b. Since our model does not account for the cost associated with \({\mathcal {K}}\), it is just natural to expect a slight experimental discrepancy as shown in Fig. 3a, where we present the ratio between experimental and expected isogeny costs. For the sake of simplicity, we did not delve into further refinements of our model.

However, to quantify the correctness of the cost predicted by Eq. 5, we performed the following experiment. We computed degree-\(\ell \) isogenies for all the odd prime factors \(\ell _1, \ell _2, \ldots , \ell _{207}\) of \(p +1,\) where p is the prime used in the CSIDH-1792 instantiation proposed in [11]. Figure 3 shows an accurate correlation between the theoretical cost of Eq. 5 and the experimental results obtained from our Python3 library software, where we can see that the number of required field multiplications is at most 1.4 times larger than the expected value predicted by our analysis above.

Recall that the derivation of the expected cost of Eq. 5 (See “Appendix A.3”), is driven by the assumption that \({{M}}~= {{S}}\), which is the typical case for CSIDH. For the B-SIDH case on the other hand, since one is working on the quadratic extension field \({\mathbb {F}}_{q= p^2},\) it holds that \({{M}}_{{\mathbb {F}}_q}~= 3{{M}}_{{\mathbb {F}}_p}\) and \({{S}}_{{\mathbb {F}}_q}~= 2{{M}}_{{\mathbb {F}}_p}\), and thus \({{S}}_{{\mathbb {F}}_q}~= \frac{2}{3}{{M}}_{{\mathbb {F}}_q}.\) However, as an upper bound (for the B-SIDH case), we can assume \({{M}}_{{\mathbb {F}}_q}~= 3{{M}}_{{\mathbb {F}}_p}\) and \({{M}}_{{\mathbb {F}}_q}~= {{S}}_{{\mathbb {F}}_q}\), which gives an expected running-time of \(3\times \text{ Cost }(b)\) \({\mathbb {F}}_p\)-multiplications.

A memory analysis of √élu reveals that less than 4b points, equivalent to 8b field elements, are computed and stored in KPS. The computation of the trees determined by the polynomial \(h_I\) in Step 1 of xISOG and xEVAL, requires the storage of no more than \(3b\log _2{b}\) field elements.Footnote 8 All in all, √élu memory cost is of about \(8b + 3b\log _2{b}\) field elements.

By performing a quick inspection of Algorithms 1–3, one can see that it is straightforward to concurrently compute many of the operations required by all three of those procedures. Specifically, the calculation of the four resultants in Steps 4–5 of Algorithms 2–3 show no dependencies among them and can therefore be computed in parallel by a multi-core processor. Since the four resultant calculations accounts for about 85% of the total computational cost of √élu, the expected savings are substantial.

5 Experiments and discussion

In this section, we introduce the Python3-code constant-time library sibc (Supersingular Isogeny-Based Cryptographic constructions), dedicated to isogeny-based primitives. The sibc library aims to easily compare, test, and run SIDH-based primitives such as SIDH, SIKE, CSIDH, and BSIDH.

We remark that our CSIDH and B-SIDH implementations make extensive usage of the √élu formulas introduced in [5], boosted with the computational tricks presented in Sect.  4. Furthermore, we also exploit the optimal strategy framework presented in [11], which helps us to maximize the performance of both protocols.

In summary, our Python3-code software allows us to readily benchmark the total number of additions, multiplications, and squarings required by the instantiations of the two aforementioned protocols. To this end, we included counters inside the field arithmetic function cores for adding, multiplying, and squaring field elements. Hence, all the performance figures presented in this section correspond with our count of field operations in the base field \({\mathbb {F}}_p\). In the case of the B-SIDH experiments, we use standard arithmetic tricks over \({\mathbb {F}}_{p^2}\), to perform the multiplication and squaring at a cost of \(3{{M}}~+ 5{{a}}\) and \(2{{M}}~+ 3{{a}}\) base field operations, respectively.

All the experiments performed in this section are centered on comparing the following configurations, which are based on tradicional Vélu’s formulas [14, 31] and élu:

  • Using tradicional Vélu (labeled as tvelu);

  • Using √élu (labeled as svelu);

  • Using a hybrid between traditional Vélu and √élu (labeled as hvelu).

Notice that because of the nature of each protocol, the B-SIDH experiments are randomness-free, which implies that the same cost is reported for any given instance. In contrast, the CSIDH experiments have a variable cost determined by the randomness introduced by the order of the torsion points sampled from its Elligator-2 procedure (for a more detailed explanation see [9]).

5.1 Experiments on the CSIDH

Our Python3-code implementation of the CSIDH protocol includes a portable version for the following CSIDH instantiations,

  1. 1.

    OAYT-style [29]: two torsion point with dummy isogeny constructions;

  2. 2.

    MCR-style [24]: one torsion point with dummy isogeny constructions;

  3. 3.

    Dummy-free style [9]: two torsion point without dummy isogeny constructions.

Our software supports performing experiments with any prime field of \(p = 2^e \cdot \left( \prod _{i=1}^n \ell _i\right) - 1\) elements, for any \(e\ge 1.\) Our experiments were focused on the CSIDH-512 prime proposed in [8], the CSIDH-1024 prime proposed in [5], and the CSIDH-1792 prime proposed in [11]. We stress that the quantum security level offered by the CSIDH instantiations reported in this work have been recently call into question in [7, 10, 30].

Table 1 Number of field operation for the constant-time CSIDH-512 group action evaluation
Table 2 Number of field operation for the constant-time CSIDH-1024 group action evaluation

The required number of field operations for those CSIDH variants are reported in Tables 12, and 3. In addition, each table presents a comparison between the results of this work and the ones presented in [11]. Moreover, for each configuration we adopted optimal strategies and suitable bound vectors according to [11, Sects.  3.4, 4.4 and 4.5].

When comparing with respect to CSIDH constant-time implementations using traditional Vélu’s formulas, our experimental results report a saving of 5.357%, 13.68% and 25.938% field \({\mathbb {F}}_p\)-operations for CSIDH-512, CSIDH-1024, and CSIDH-1792, respectively. These results are somewhat more encouraging than the ones reported in [5], where speedups of about 1% and 8% were reported for a non constant-time implementation of CSIDH-512 and CSIDH-1024.

CTIDH, a fast constant-time implementation of CSIDH (cf. with Table 6), was recently reported in [3]. CTIDH combines an efficient key space mechanism along with a Matryoshka-structure-like for processing several degree-\(\ell \) isogenies in batches [23]. CTIDH’s key space is upper bounded by \(B_{i}\), the 1-norm of the i-th batch. Security wise, it is critical that all the isogenies associated to a given batch are calculated in constant-time. To assure this, all the isogenies in a batch are computed at the cost of the maximal degree of this batch. This can only be achieved by adding a significant number of dummy operations.

It appears clear that √élu plays a significant role for achieving CTIDH’s impressive computational performance. Unfortunately, the authors of [3] did not include a version of CTIDH using traditional Vélu’s formulas. This experiment would have allowed to precisely quantify the speedup contributed by √élu alone in the CTIDH performance.

Table 3 Number of field operation for the constant-time CSIDH-1792 group action evaluation. Counts are given in millions of operations, averaged over 1024 random experiments
Table 4 Number of base field operation in \({\mathbb {F}}_p\) for the public key generation phase of BSIDH
Table 5 Number of base field operation in \({\mathbb {F}}_p\) for the secret sharing phase of BSIDH
Table 6 Skylake Clock cycle timings for a key exchange protocol for different instantiations of the SIDH, CSIDH, and B-SIDH protocols

5.2 Experiments playing the B-SIDH

To the best of our knowledge, we present in this section the first implementation of the B-SIDH protocol, which was designed to be a constant-time one. As in the case of CSIDH, we report here the required number of \({\mathbb {F}}_p\) arithmetic operations. Similarly to CSIDH, the B-SIDH implementation provided in this work, allows to perform experiments with any prime field of p elements such that \(p \equiv 3 \bmod 4\). The main contribution provided in this subsection corresponds to a comparison of B-SIDH instantiations using the primes B-SIDHp253, B-SIDHp255,B-SIDHp247,B-SIDHp237 and B-SIDHp257, which are specified in “Appendix A.4”.

All the above primes were chosen considering the following features: (i) \(p \equiv 3 \bmod 4\), (ii) the parameters \(M|(p + 1)\) and \(N|(p - 1)\) are as smooth as it was possible to find, and (iii) \(2^{210} < N, M\). Our Python3-code implementation uses the degree-4 isogeny construction and evaluation formulas given in [12]. Additionally, the key generation does not perform xISOG  calls, which are expensive for large primes, it reconstructs the A-coefficient by using the three points pushed under the isogeny being computed (that is, we implement a projective version of get_A() procedure). The corresponding experimental results for the key generation and secret sharing phases are presented in Tables 4 and 5, respectively. It can be seen that significant savings ranging from 24% up to 76% were obtained by B-SIDH combined with √élu with respect to the same implementation of this protocol using traditional Vélu’s formulas.

Notice that the best results were obtained when using the B-SIDHp253 configuration, which seems to be faster than any CSIDH instantiation, mostly due to its small 256-bit field.

5.3 Discussion

Table 6 presents the clock cycle counts for several isogeny-based protocols recently reported in the literature. Rather than providing a direct comparison, the main purpose of including this table here is that of providing a perspective of the relative timing costs of several emblematic implementations of isogeny-based key-exchange primitives.

Table 7 Number of base field operation in \({\mathbb {F}}_p\) of both SIKE and B-SIKE (B-SIDH with KEM) protocol.

Clearly, √élu has a dramatic impact on the performance of B-SIDH, so much so that one can claim confidently that B-SIDH outperforms any instantiation of CSIDH. For example, using the B-SIDH configuration presented in example 2 of [13], Alice and Bob will require about \(1.620 \times {2}^{20}\) and \(1.343 \times {2}^{20}\) base field multiplications in \({\mathbb {F}}_p,\) where p is a 256-bit prime, respectively. In particular, making the conservative assumption that a 256-bit field multiplication takes 40 clock cycles, then a key exchange using B-SIDH would cost about \(118.520 \times 2^{20}\) clock cycles. On the other hand, the fastest CISDH-512 group action evaluation (see [11, 20]) takes about \(230\times 2^{20}\) clock cycles. Therefore, a key exchange using CSIDH would take about \(920 \times {2}^{20}\) clock cycles (considering four group action evaluations). This implies that B-SIDH is expected to be about 8x faster than the fastest CSIDH-512 C-code implementation.

Costello proposed as a possible application for B-SIDH, key exchange protocols executed in the context of a client-server session [13]. Typically, one could expect that the client has much more constrained computational resources than the server. In the case that we choose the prime B-SIDHp237 for performing a B-SIDH key exchange, Alice and Bob would require about \(0.13 \times {2}^{20}\) and \(3.953 \times {2}^{20}\) base field multiplications in \({\mathbb {F}}_p.\) Once again, assuming that a 256-bit field multiplication takes 40 clock cycles, then a key exchange using B-SIDH would cost about \(5.20 \times {2}^{20}\) and \(158.12 \times {2}^{20}\) clock cycles for Alice and Bob, respectively. For comparison, a SIKEp434 key exchange costs about \(10.73 \times {2}^{20}\) and \(12.04 \times {2}^{20}\) clock cycles for Alice and Bob, respectively. Hence, Alice (the client) will benefit with a B-SIDHp237 computation that is about twice as fast as the one required in SIKEp434. This will come at the price that Bob’s computation (the server) would become thirteen times more expensive. On the other hand, the B-SIDHp237 key sizes are noticeably smaller than the ones required in SIKEp434. This feature is especially valuable for highly constrained client devices.

In terms of security, the B-SIDH instantiations reported in this paper should achieve the same classical and quantum security level than a SIDH instantiations using the SIKEp434 prime. However, B-SIDH is susceptible to the active attack described in [19]. To offer protection against this kind of attacks, B-SIDH should incorporate a key encapsulation mechanism (KEM) such as the one included in [2]. Essentially, a B-SIDH augmented version with a key encapsulation mechanism (B-SIKE), inherits the same SIKE protocol flow: (i) KeyGen performs one degree-M isogeny, ii) Encaps computes two ephemeral degree-N isogenies, and iii) Decaps executes one degree-M isogeny and one ephemeral degree-N isogeny.

We illustrate the impact of a KEM in B-SIDH, comparing in Table 7 the associated timings of SIKE and B-SIKE instantiations with similar security level. In particular, we focus on our best B-SIDH instantiation: B-SIDHp253 with KEM (B-SIKEp253).

Once again, assuming that a 253-bit field multiplication takes 40 clock cycles, then a B-SIKEp253 instantiation would cost \((0.772 + 1.404 + 1.332)\times 40.0 \approx 140.32\) Millions of clock cycles. This is still faster than any CSIDH-512 instantiation, and also faster than CTIDH-512 [3], which is about twice as fast as CSIDH-512) Footnote 9

6 Conclusions

In this paper, we presented a concrete analysis of the √élu procedure introduced in [5]. From our analysis, we conclude that for most practical scenarios, the best approach for performing the polynomial products associated to √élu, is Karatsuba polynomial multiplication. The main concrete consequence of this observation is that computing degree-\(\ell \) isogenies with √élu has a practical computational complexity essentially proportional to \( b^{\log _2{(3)}},\) where \(b \approx \frac{\sqrt{(\ell - 1)}}{2}\).

We introduced several algorithmic tricks that permit to save multiplications when performing the polynomial products involving the computation of the resultants included in 23. The combination of these improvements allows us to construct and evaluate degree-\(\ell \) isogenies with a slightly lesser number of arithmetic operations than the ones employed in [5].

We applied √élu and optimal strategies to several instantiations of the CSIDH and B-SIDH protocols, producing the very first constant-time implementation of the latter protocol for a selection of primes taken from [5, 13].

Our future work includes C constant-time single-core and multi-core implementations of the two protocol instantiations studied in this work. We would also like to study more efficient selections of the sets \({\mathcal {I}}, {\mathcal {J}}\) and \({\mathcal {K}}\) as defined in Sect. 4.1, which could yield more economical computations of √élu.