1 How to hash onto pairing-friendly curves

In the last years, the author and some other researchers have made progress in constructing novel efficient hash functions to elliptic curves over finite fields. Today, it can be undeniably said that the theory of such hash functions has become an independent, rapidly developing subarea of elliptic cryptography. This claim is particularly confirmed by Chávez-Saab et al.’s article [1], which was recognized as one of the three best papers at Asiacrypt 2022. There are also a lot of other sources (including recent ones) on the topic. Inter alia, good surveys are represented in [2, Section 8], [3]. So, with the reader’s permission, a detailed introduction is not provided in order to avoid repetition. Instead, all necessary notions and statements will be gradually introduced or referred to in the process.

Let \(E_1\) be an ordinary (a.k.a. non-supersingular) pairing-friendly elliptic curve of embedding degree \(k > 1\) over a finite field \(\mathbb {F}_{q}\). Besides, put \(d:= \#\textrm{Aut}(E_1)\) and \(e:= k/d\). Recall that \(d \in \{2, 4, 6\}\), where \(d = 2\) if and only if \(j(E_1) \ne 0, 1728\) (resp., \(d = 4\) iff \(j(E_1) = 1728\) and \(d = 6\) iff \(j(E_1) = 0\)). Furthermore, we will assume that \(e \in \mathbb {N}\). It is claimed (e.g., in [2, Theorem 3.3.5]) that for any prime divisor \(r \mid N_1:= \#E_1(\mathbb {F}_{q})\) there is always a unique non-trivial \(\mathbb {F}_{q^e}\)-twist \(E_2\) (of degree d) such that \(r \mid N_2:= \#E_2(\mathbb {F}_{q^{e}})\). As is customary, denote by \(\mathbb {G}_1 \subset E_1(\mathbb {F}_{q})\) and \(\mathbb {G}_2 \hookrightarrow E_2(\mathbb {F}_{q^e})\) the eigenspaces of the Frobenius endomorphism on \(E_1[r] \subset E_1(\mathbb {F}_{q^k})\), associated with the eigenvalues 1, q, respectively. By abuse of notation, we will identify the order r subgroup \(\mathbb {G}_2 \subset E_1(\mathbb {F}_{q^k})\) with its image under an \(\mathbb {F}_{q^e}\)-isomorphism \(E_1 \rightarrow E_2\). Thus, \(\mathbb {G}_1 = E_1(\mathbb {F}_{q})[r]\) and \(\mathbb {G}_2 = E_2(\mathbb {F}_{q^e})[r]\).

This section explains how to hash onto \(\mathbb {G}_2\) more efficiently and why we sometimes do not need to hash directly onto \(\mathbb {G}_1\). In the first case, we will significantly exploit the presence of clearing the cofactor \(c_2:= N_2/r\). In the second one, on the contrary, clearing the cofactor \(c_1:= N_1/r\) can be fully or partially avoided. The fact is that optimal ate pairings \(a: \mathbb {G}_2 \times \mathbb {G}_1 \rightarrow \mu _r \subset \mathbb {F}_{q^k}^*\) [2, Theorem 3.3.4] can be painlessly (unlike \(E_2(\mathbb {F}_{q^e}) \times \mathbb {G}_1\)) extended to \(\mathbb {G}_2 \times E_1(\mathbb {F}_{q})\) in certain scenarios.

At the moment, due to [4, Table 1], the curve BLS12-381 is a de facto standard in pairing-based cryptography. More generally, the Barreto–Lynn–Scott family with \(k = 12\) and \(d = 6\) (see, e.g., [5, Section 3.1]) possesses the parameters

$$\begin{aligned} r(z) = z^4 - z^2 + 1, \quad q(z) = (z-1)^2 r(z)/3 + z. \end{aligned}$$

By definition, BLS12-381 is generated by \(z:= -0xd201000\) 000010000 and hence

$$\begin{aligned} \lceil \log _2(-z) \rceil = 64, \quad \lceil \log _2(r) \rceil = 255, \quad \lceil \log _2(q) \rceil = 381. \end{aligned}$$

Notice that \(r \ll q\) in contrast to the Barreto–Naehrig family [2, Example 4.2] of prime-order curves having also \(k = 12\) and \(d = 6\). Furthermore, as indicated in [6, Section 3.1], there are BLS curves (as opposed to BN ones) of arbitrary embedding degree k such that \(3 \mid k\), but \(18 \not \mid k\). All of them are ordinary curves of j-invariant 0.

Recall that almost all known hash functions \(\mathcal {H}_i: \{0,1\}^* \rightarrow \mathbb {G}_i\) are the compositions \(\mathcal {H}_i = [c_i] \circ h_i \circ \eta _i\). Here, \(\eta _i: \{0, 1\}^* \rightarrow S_i\) are hash functions to some finite sets, \(h_1: S_1 \rightarrow E_1(\mathbb {F}_{q})\) and \(h_2: S_2 \rightarrow E_2(\mathbb {F}_{q^e})\) are just maps traditionally called encodings, and finally \([c_i]\) is the scalar multiplication by \(c_i\) on the curve \(E_i\). The latter is said to be clearing cofactor. Surprisingly, it is enough and more efficient to multiply outputs of \(h_i\) by specific scalars \(c_i^\prime \in \mathbb {N}\) (different from \(c_i\) as a rule) such that \(r \not \mid c_i^\prime \) and \(c_1^\prime \mid c_1\) due to [6, Section 3.2], [7, Section 3] and conversely \(c_2 \mid c_2^\prime \) due to [5, 8]. The sets \(S_i\) are supposed to be pretty elementary, hence it is easy to combine \(\eta _i\) from existing hash functions \(\{0, 1\}^* \rightarrow \{0, 1\}^\ell \) for \(\ell \in \mathbb {N}\). The most complicated component of \(\mathcal {H}_i\) is no doubt \(h_i\), because its essence is based on high-dimensional algebraic geometry.

The majority of pairing-based protocols require a hash function to at most one group \(\mathbb {G}_1\) or \(\mathbb {G}_2\). Of course, any such protocol can be equivalently implemented for hashing to the other group. Without using point compression-decompression methods, elements of \(\mathbb {G}_1\) (resp., \(\mathbb {G}_2\)) are obviously represented by \(2\lceil \log _2(q) \rceil \) (resp., \(2e \lceil \log _2(q) \rceil \)) bits. Therefore, the choice often depends on whether a hash value should be more compact than the second pairing argument or vice versa. Besides, there are rare protocols, for example Scott’s identity-based key agreement [9], where both hash functions \(\mathcal {H}_i\) are necessary. Thus, the more cumbersome hashing to \(\mathbb {G}_2\) cannot be exchanged for hashing to \(\mathbb {G}_1\) in all situations.

1.1 How not to hash onto \(\mathbb {G}_1\)

As far as the author knows, (non-degenerate) optimal ate pairings \(a: \mathbb {G}_2 \times \mathbb {G}_1 \rightarrow \mu _r \subset \mathbb {F}_{q^k}^*\) are the most used in today’s real-world cryptography. The fact is that the corresponding Miller loop has the hypothetically smallest length \(\approx \log _2(r)/\varphi (k)\), where \(\varphi \) is Euler’s totient function. However, it is more practical to take the whole group \(E_1(\mathbb {F}_{q})\) instead of \(\mathbb {G}_1\). In this case, the pairing \(a: \mathbb {G}_2 \times E_1(\mathbb {F}_{q}) \rightarrow \mu _r\) becomes degenerate, but this is not important. A similar trick is done in [10, Section 5] for the Tate pairing [2, Section 3.2.2] in the context of isogeny-based cryptography, where, on the contrary, \(\mathbb {G}_2\) is replaced by \(E_1(\mathbb {F}_{q^k})\) in our notation.

First, the length of the Miller loop depends only on the order of \(\mathbb {G}_2\). Second, if for points \(P \in E_1(\mathbb {F}_{q})\) and \(Q \in \mathbb {G}_2\) we have \(a(Q, P) = 1\), then a fortiori \(a(Q, c_1^\prime P) = a(Q, P)^{c_1^\prime } = 1\). Further, many popular protocols (such as the aggregate BLS signature [11,12,13]) work correctly whether the order of P equals r or not. It should be borne in mind that the strong unforgeability property (unlike the usual existential one) is not satisfied anymore for this signature as emphasized in [11, Section 5.2]. Nevertheless, in the opinion of [14, Section 7], the existential unforgeability is sufficient in practice. Finally, the complexity of computing a(QP) remains the same as that of computing \(a(Q, c_1^\prime P)\), because P, \(c_1^\prime P\) are equally defined over \(\mathbb {F}_{q}\).

Often, multiplication by a cofactor \(c \in \mathbb {N}\) (on any elliptic curve of order cr) also serves as a kind of protection against the so-called small-subgroup attack [15] exploiting divisors of c smaller than r. In the absence of such divisors, the curve is called subgroup secure [16]. Such curves with \(c > 1\) necessarily possess the value \(\rho := \log _2(cr)/\log _2(r) \geqslant 2\) or, equivalently, \(c \geqslant r\). In fact, applying [c] in some situations is fraught with appearance of critical bugs such as double spending noticed at one time in CryptoNote cryptocurrencies [17]. Nonetheless, for simple protocols the given solution is quite appropriate whenever \(c \ll r\), which is frequently the case for \(c_1\) as opposed to \(c_2\). For instance, clearing cofactor is authorized in the NIST specification [18] of the classical Diffie–Hellman key exchange. By the way, state-of-the-art true subgroup membership tests for pairing-friendly curves are discussed in [7, Section 4], [19].

Assume that we continue dealing with the pairing \(a: \mathbb {G}_2 \times E_1(\mathbb {F}_{q}) \rightarrow \mu _r\), not clearing the cofactor \(c_1\). In this scenario, the BLS signature of a message \(m \in \{0,1\}^*\) (up to the swap of the pairing groups) is the point \(sP \in E_1(\mathbb {F}_{q})\), where \(s \in \mathbb {Z}/r\) and \(P:= (h_1 \circ \eta _1)(m)\). Since s is a secret key, the curve \(E_1\) has to be \(\mathbb {G}_1\)-strong, i.e., subgroup secure with respect to \(\mathbb {G}_1\). Typically, the BLS signature scheme is deployed on BLS12 curves (mostly on BLS12-381) whose value \(\rho \approx 1.5\). As a consequence, at least for this curve family one is obliged to return as the signature the point \(c_1^\prime sP = s\mathcal {H}_1(m) \in \mathbb {G}_1\). Yet, we are really able to benefit from the circumstance that a is efficiently defined on \(\mathbb {G}_2 \times E_1(\mathbb {F}_{q})\). Let’s demonstrate this advantage for the verificator whose computational load is incomparably larger than for each of signers.

Let \(G_2\) be a fixed generator of the group \(\mathbb {G}_2\). For \(1 \leqslant i \leqslant n \in \mathbb {N}\) there are arrays of secret keys \(s_i \in \mathbb {Z}/r\), public keys \(pk_i:= s_iG_2\), messages \(m_i \in \{0,1\}^*\), and signatures \(\sigma _i:= s_i\mathcal {H}_1(m_i)\). The signatures are aggregated into the short one \(\sigma := \sum _{i = 1}^n \sigma _i\). Recall that verifying \(\sigma \) (in a simplified version of the scheme) consists in doing the equality

$$\begin{aligned} a(G_2, \sigma )&= \prod _{i = 1}^n a\big ( pk_i, \mathcal {H}_1(m_i) \big ), \quad \text {that is}, \quad \\ a(G_2, \sigma )&= \left( \prod _{i = 1}^n a\big ( pk_i, (h_1\circ \eta _1)(m_i) \big ) \right) ^{c_1^\prime }. \end{aligned}$$

The scalar multiplication \([c_1^\prime ]\) in \(E_1(\mathbb {F}_{q})\) is itself cheaper than the exponentiation to \(c_1^\prime \) in \(\mathbb {F}_{q^k}\), unless k is a tiny number (say, \(k \leqslant 4\)). Such embedding degrees are beyond practical use. However, starting from a certain moderate value of n, the second equality clearly becomes less expensive to check. The given value heavily depends on many factors, albeit it is readily determined in each concrete case. Meantime, the aggregate BLS signature is mainly employed in systems intended for huge n, otherwise it makes little sense to abandon more traditional pairing-free signatures such as ECDSA.

Notice that the described trick resembles batching the final exponentiation of multi-pairing (see, e.g., [20]). Incidentally, when all the messages coincide \((m:= m_1 = \cdots = m_n)\), the verification process reduces to the quicker test \(a(G_2, \sigma ) = a\big ( pk, \mathcal {H}_1(m) \big )\), where \(pk:= \sum _{i = 1}^n pk_i\). In this setting, \(\sigma \) is said to be a multi-signature, while the proposed trick does not yield any performance gain, since the hash-to-curve function arises in a single copy. It is also worth saying that the trick holds valid for various modifications of the scheme thwarting the rogue public-key attack [12, Section 1.1]. Lastly, it is evident that the trick can be safely tailored to totally other cryptographic protocols involving simultaneously multi-pairing and hashing to \(E_1\).

To be honest, pairing-friendly curves with \(\rho \geqslant 2\) are niche, although they are actually used in (one layer) proof compositions [6, 21], not for the BLS signature scheme. The author does not know if it is possible to (efficiently) generate \(\mathbb {G}_1\)-strong elliptic curves with \(\rho \approx 2\) suitable for such compositions. As seen from [19, Table 1], the Cocks–Pinch curve CP6-782 and Brezing–Weng curve BW6-761 (on top of the curve BLS12-377 [22]) are not \(\mathbb {G}_1\)-strong. Note that BW6-761 was found in [21] to replace CP6-782, an order of magnitude slower curve: The former is of j-invariant 0 (not to mention the smaller q) in contrast to the latter. So, CP6-782 is most likely not applied anymore.

It is clear that for \(\rho \approx 2\) there is maximum one prime divisor \(\ell \mid c_1\) such that \(\ell \geqslant r\). Let’s imagine that such a divisor occurs, albeit this still has nothing to do with the curves CP6-782, BW6-761. For compactness, put \(\widetilde{c}_1:= c_1/\ell \in \mathbb {N}\). No doubt, one can securely work in the intermediate group \({\widetilde{\mathbb {G}}}_1:= [\widetilde{c}_1]E_1(\mathbb {F}_{q})\) of order \(r\ell \). It is seemingly much easier to construct a curve \(E_1\) on top of BLS12-377 with the given weaker property than with the \(\mathbb {G}_1\)-strongness. At the same time, we have the substantially faster hash function \({\widetilde{\mathcal {H}}}_1:= [\widetilde{c}_1] \circ h_1 \circ \eta _1: \{0,1\}^* \rightarrow {\widetilde{\mathbb {G}}}_1\). Indeed, the number \(\widetilde{c}_1\) is close to 1 (ideally, \(\widetilde{c}_1 = 1\)), hence the component \(h_1\) is the unique bottleneck for evaluating \({\widetilde{\mathcal {H}}}_1\). One more time, the pairing \(a: \mathbb {G}_2 \times {\widetilde{\mathbb {G}}}_1 \rightarrow \mu _r\) and its restriction on \(\mathbb {G}_2 \times \mathbb {G}_1\) (as well as the group operations in \(\mathbb {G}_1\), \({\widetilde{\mathbb {G}}}_1\)) are equivalent in the computational aspect. Thus, the group \({\widetilde{\mathbb {G}}}_1\) is unambiguously better than \(\mathbb {G}_1\).

1.1.1 How to hash onto \(E_1: y^2 = x^3 + b\) provided that \(\sqrt{b} \in \mathbb {F}_{q}\)

The previous section demonstrates several scenarios when the scalar multiplication \([c_1^\prime ]\) on the curve \(E_1\) is not required, while the corner map \(h_1: S_1 \rightarrow E_1(\mathbb {F}_{q})\) is inevitable. That is why its acceleration is an important task despite the fact that \([c_1^\prime ]\) may be a (drastically) slower map.

In [23] an encoding \(h_1: \mathbb {F}_{q}^2 \rightarrow E_1(\mathbb {F}_{q})\) is constructed for elliptic curves \(E_1\) as in the title of the present section. There, it is proved that \(h_1\) is admissible in the sense of [24, Definition 4], which leads (in compliance with [24, Theorem 1]) to the indifferentiable hash function \(h_1 \circ \eta _1\). It is worth clarifying that indifferentiable (from a random oracle) hashing is meant as in [24, Section 2.2]. Moreover, the only bottleneck of \(h_1\) consists in extracting one cubic root in \(\mathbb {F}_{q}\). For \(q \not \equiv 1 \, (\textrm{mod} \, 27)\) the latter can be implemented in constant time of raising to some power \(n_1 \in \mathbb {N}\) in the field \(\mathbb {F}_{q}\).

Lemma 1

Each BLS curve \(E_1\) fits the title condition.

Proof

It is suggested to borrow and properly complete the proof of [6, Proposition 2]. As said in it, always \(3 \mid c_1\), i.e., there is a point \((x_0,y_0) \in E_1(\mathbb {F}_{q})[3]\). As a result, \(x_0\) is a root of the 3-division polynomial \(\psi _3(x) = 3x(x^3 + 4b)\) of the curve \(E_1\). If \(x_0 = 0\), then \(y_0 = \sqrt{b}\). Otherwise, \(x_0 = -\root 3 \of {4b}\) and hence \(y_0 = \sqrt{-3b}\). Since \(E_1\) is an ordinary curve, \(\sqrt{-3} \in \mathbb {F}_{q}\) as is known and thus \(\sqrt{b} \in \mathbb {F}_{q}\). The lemma is proved. \(\square \)

In particular, the aforementioned encoding \(h_1\) is applicable to the curve BLS12-381 for which \(b = 4\) and \(n_1 = (q-10)/27\). Recall that famous (indirect) Wahby–Boneh’s encoding \(h_{WB}\) [25, Section 4] (based on the simplified SWU one [24, Section 7]) is also valid for BLS12-381. It requires to extract one square root in \(\mathbb {F}_{q}\), which for that curve is equivalent to raising in \(\mathbb {F}_{q}\) to the power \(n_2:= (q-3)/4 \in \mathbb {N}\). The hash function \(\textrm{H}_2\) from [25, Section 5] twice applies \(h_{WB}\) in order to act as a random oracle. By the way, the other indifferentiable hash function \(\textrm{H}_3\) is even less performant than \(\textrm{H}_2\) by virtue of [25, Figure 1].

To be exact, the Hamming weight \(w(n_1) = 192\) and \(w(n_2) = 228\). Denote by \(\ell (n_i)\) the length of a shortest addition chain for \(n_i\). In accordance with [26, Section 9.2.1], we establish the inequalities

$$\begin{aligned} 382 \le \ell (n_1) \lesssim 419, \quad 385 \le \ell (n_2) \lesssim 422. \end{aligned}$$

One cannot claim that these upper bounds are mathematically correct, because o(1) is omitted in contrast to the original inequality. However, in any case, the sought bounds are very close (probably equal) to ours.

On the other hand, following the sliding window method [26, Section 9.1.3] (with \(k = 5\)), the author explicitly derives in Magma [27] an addition chain for \(n_1\) (resp., \(n_2\)) whose length equals 449 (resp., 458). Curiously, a similar chain for \(n_2\) of the same length 458, obtained by means of more advanced methods, appears in the optimized library blst, namely in [28]. Thus, the encoding \(h_{WB}\) applied twice is much slower than the encoding \(h_1\) applied once. Indeed, \(2\cdot 458 - 449 = 467\) is a significant amount of multiplications in \(\mathbb {F}_{q}\) that can be eliminated by giving priority to \(h_1\) rather than two copies of \(h_{WB}\).

The author provides in [29] a general reference implementation of \(h_1\) in Sage. The corresponding Rust implementation and benchmarks for BLS12-381 (and BLS12-377) are given in [30] by Zhang who uses the famous library arkworks as a base. His low-level comparison of running time shows that the new encoding is actually more efficient than \(h_{WB}\).

1.2 How to hash onto \(\mathbb {G}_2\)

To the author’s knowledge, optimal ate pairings do not have a natural extension to \(E_2(\mathbb {F}_{q^e}) \times \mathbb {G}_1\). Conversely, (non-degenerate) twisted optimal ate pairings [2, Theorem 3.3.8] of the form \(\mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mu _r\) are readily extended to \(\mathbb {G}_1 \times E_2(\mathbb {F}_{q^e})\). But for them the Miller loop is mostly of a larger length than for (usual) optimal ate pairings. It is widely recognized that a pairing is a more laborious operation than an elliptic curve scalar multiplication. Therefore, reducing the Miller loop seems a better solution than avoiding the multiplication by \(c_2^\prime \).

For the sake of convenience, introduce so-called tensor multiplication of any two maps \(h: S \rightarrow G\), \(g: T \rightarrow G\) from sets S, T to the same group \((G, +)\):

$$\begin{aligned} h \otimes g: S \times T \rightarrow G \qquad (s, t) \mapsto h(s) + g(t). \end{aligned}$$

We know (e.g., from [2, Theorem 2.11]) that \(E_2(\mathbb {F}_{q^e}) \simeq \mathbb {Z}/(mr) \times \mathbb {Z}/\ell \), where \(\ell \mid m\) and \(m\ell = c_2\). Pick any independent points \(P_0, P_1 \in E_2(\mathbb {F}_{q^e})\) of orders m and \(\ell \), respectively. The independency is in the sense that \(P_1 \in E_2(\mathbb {F}_{q^e}) {\setminus } \langle P_0 \rangle \) if \(\ell > 1\), and \(P_1 = (0: 1: 0)\) if \(\ell = 1\). Consider the set \(V:= \mathbb {Z}/m \times \mathbb {Z}/\ell \) and the maps

$$\begin{aligned} \begin{array}{ll} g: V \rightarrow \langle P_0, P_1 \rangle \!=\! E_2(\mathbb {F}_{q^e})/\mathbb {G}_2 &{} \quad (v_0,v_1) \mapsto v_0P_0 \!+\! v_1P_1, \\ F: \mathbb {F}_{q^e} \times V \rightarrow \mathbb {G}_2 &{} \quad F:= [c_2^\prime ] \circ (h_2 \otimes g). \end{array} \end{aligned}$$

These maps resemble those of [24, Sections 1, 5, 6.1] except for the principal fact that therein \(g: V \times \mathbb {Z}/r \rightarrow E_2(\mathbb {F}_{q^e})\) or \(g: \mathbb {Z}/r \rightarrow \mathbb {G}_2\) in our notation.

Below, we need the notions of (B-)well-distributed encoding [31, Definitions 5] and (\(\epsilon \)-)regular map [31, Definition 3] with respect to the uniform distribution on its domain. It is also worth clarifying what exactly a “negligible” quantity \(\epsilon \in \mathbb {R}_{\geqslant 0}\) will mean for us. Apart from the slow construction from [24, Section 5], all known regular encodings (to an ordinary elliptic \(\mathbb {F}_{q}\)-curve E) are of the form \(\mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\). For all of them \(\epsilon = cq^{-1/2} + O(q^{-1})\) (cf. [24, Theorem 3]) with a small positive constant c. As a confirmation of the given words, see [1, Lemma 4], [23, Corollary 2], [32, Corollary 4], [33, Corollary 2], and Lemma 2. Since, in turn, \(\log _2(q)/2 \gtrsim \lambda \) for a desirable security level \(\lambda \) (typically, \(\approx 128\)), we will count \(\epsilon \) negligible when \(\log _2(\epsilon ) \lesssim -\lambda \).

This definition does not fit the alternative one accepted in [31] and some other sources on the topic in accordance with which negligibility takes place if \(\epsilon = o(\log (q)^{-n})\) for all \(n \in \mathbb {N}\). As usual, the disadvantage of the asymptotic definition is in supposing that q tends to infinity, although in life q is always a concrete number. That is why it is necessary to be prudent in utilizing the contributions of [31, Section 2.3] despite their attractiveness.

Theorem 1

Assume that \(h_2: \mathbb {F}_{q^e} \rightarrow E_2(\mathbb {F}_{q^e})\) is a B-well-distributed encoding (for \(B \in \mathbb {R}_{\geqslant 0})\). Then, the map F is \(\epsilon \)-regular, where \(\epsilon := B\sqrt{r/q^e}\). As a result, \(\epsilon \) is negligible whenever \(\log _2(B) \lesssim 0\) and \(\log _2(c_2) \gtrsim \log _2(r)\), which includes the case \(e \geqslant 2\).

Proof

The indicated value \(\epsilon \) is immediately derived from [31, Corollary 1] and [24, Lemma 13]. Besides,

$$\begin{aligned} \log _2(\epsilon )&\lesssim \log _2\big ( \sqrt{r/q^e} \big ) = \dfrac{\log _2(r/q^e)}{2} \approx \dfrac{-\log _2(c_2)}{2} \\&\lesssim \dfrac{-\log _2(r)}{2} \lesssim -\lambda . \end{aligned}$$

Lastly,

$$\begin{aligned} \log _2(c_2)&\approx \log _2(q^e/r) = e\log _2(q) - \log _2(r)\\&\gtrsim (e-1)\log _2(r), \end{aligned}$$

which is \(\geqslant \log _2(r)\) once \(e \geqslant 2\). The theorem is proved. \(\square \)

This theorem can be in principle reformulated with an abstract elliptic \(\mathbb {F}_{q}\)-curve E and a well-distributed encoding \(\mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\). However, the author decided to emphasize its relevance solely for curves \(E_2\). Truly, all real-world pairing-friendly curves \(E_1\) (not to mention plain ones) have the value \(\rho \lesssim 2\), that is, \(\log _2(c_1) \lesssim \log _2(r)\). Thereby, the theorem’s premise is fulfilled exclusively in the borderline case \(\rho \approx 2\). As remarked in Sect. 1.1, this happens for niche curves with the parameters \(k = d = 6\) such as BW6-761 or other BW6 curves from [6, Sections 4, 5]. For them \(E_2\) is a sextic twist equally defined over \(\mathbb {F}_{q}\). Thus, the pairing groups \(\mathbb {G}_1\), \(\mathbb {G}_2\) can be interpreted in a dual way. In other words, the results of Sect. 1.1 and of the current one are applicable to both of them. In particular, we do not lose the generality in the above theorem.

Note that F is a samplable map (in the sense of [24, Definition 4]) if, as is often the case, \(h_2\) enjoys a large image, that is, \(\# \textrm{Im}(h_2) = \Theta (q^e)\). Indeed, this property follows from [24, Lemma 13] and [31, Algorithm 1]. Eventually, we establish a series of corollaries.

Corollary 1

The map F is admissible.

Corollary 2

If a hash function \(\eta : \{0, 1\}^* \rightarrow \mathbb {F}_{q^e}\) is indifferentiable from a random oracle, then so is the hash function \([c_2^\prime ] \circ h_2 \circ \eta : \{0, 1\}^* \rightarrow \mathbb {G}_2\) (denoted by \(\textrm{H}_4\) in [25, Section 5]).

Proof

Take another random oracle \(\theta : \{0, 1\}^* \rightarrow V\). Therefore, the functions \((\eta , \theta )(s):= \big ( \eta (s), \theta (s) \big )\) and hence \(F \circ (\eta , \theta ): \{0, 1\}^* \rightarrow \mathbb {G}_2\) also act as a random oracle (the second fact is [24, Theorem 1]). Finally, obviously, \(\textrm{H}_4 = F \circ (\eta , \theta )\). \(\square \)

For the BLS12-381 curve \(E_2: y^2 = x^3 + 4(1+i)\) (where \(i:= \sqrt{-1} \not \in \mathbb {F}_{q}\)) in the role of \(h_2\) the article [25, Section 5] proposes Wahby–Boneh’s encoding. However, that article does not notice the indifferentiability of \(\textrm{H}_4\). By the way, the other (indifferentiable) hash functions \(\textrm{H}_5\), \(\textrm{H}_6\) are even slower than \(\textrm{H}_4\) by virtue of [25, Figure 1].

2 Batching two “relatively prime” encodings

Hash functions to classical (i.e., non-pairing-friendly) elliptic curves have become more and more in demand as well. Indeed, according to [34, Table I], they are actively used in many PAKE (Password-Authenticated Key Exchange) protocols. Incidentally, several years ago CFRG (Crypto Forum Research Group) conducted the PAKE selection process [35] in which the protocols CPace [36] and OPAQUE [37] won. Besides, such hash functions are necessary for some blind signatures (e.g., from [38, Section 3.3], [39, Section 6]), which serve as a basis of modern electronic voting systems. It is also worth mentioning that hashing to elliptic curves is applied in OPRFs (Oblivious Pseudorandom Functions) [40], among others, in the 2HashDH scheme [41, Section 3.1], [42, Section 3].

2.1 How to hash onto \(E(\mathbb {F}_{q})\) provided that \(q \equiv 2 \, (\textrm{mod} \, 3)\) and \(j(E) \ne 0, 1728\)

Consider an elliptic curve \(E: y^2 = x^3 + ax + b\) defined over a finite field \(\mathbb {F}_{q}\). Under the condition \(q \equiv 2 \, (\textrm{mod} \, 3)\) (resp., \(j(E) \ne 0, 1728\)), Icart’s encoding \(h_I\) [43] (resp., the simplified SWU one \(h_{sSWU}\)) is available. In accordance with [43, Lemma 4], [24, Lemma 6], for any \(P \in E(\mathbb {F}_{q})\) we have \(\# h_I^{-1}(P) \leqslant 4\) and \(\# h_{sSWU}^{-1}(P) \leqslant 8\). In fact, if an implementation of \(h_{sSWU}\) takes into account the sign of the y-coordinate, then \(\# h_{sSWU}^{-1}(P) \leqslant 4\). At the same time, by virtue of [32, Section 5], the encoding \(h_I\) (resp., \(h_{sSWU}\)) is B-well-distributed with \(B = 13\) (resp., \(B = 53\)) at least for q of a cryptographic size. Applying [31, Corollary 1], we thus get the next statement.

Lemma 2

The map \(F:= h_I \otimes h_{sSWU}: \mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\) is \(\epsilon \)-regular for the negligible value \(\epsilon := 26 \sqrt{N}/q\), where \(N:= \# E(\mathbb {F}_{q})\).

From now on, we assume in addition that \(q \equiv 3 \, (\textrm{mod} \, 4)\). Obviously,

$$\begin{aligned} q \equiv 2 \, (\textrm{mod} \, 3), \; q \equiv 3 \, (\textrm{mod} \, 4) \quad \Leftrightarrow \quad q \equiv 11 \, (\textrm{mod} \, 12). \end{aligned}$$

For the sake of compactness, introduce the naturals

$$\begin{aligned} \ell&:= \dfrac{2q-1}{3}, \quad e:= \dfrac{q+1}{4}, \\ k&:= \dfrac{q+1}{12} = \ell e \; \Big (\textrm{mod} \, \dfrac{q-1}{2} \Big ). \end{aligned}$$

Given \(Z = n/d\) such that \(n, d \in \mathbb {F}_{q}^*\), we obtain:

$$\begin{aligned} z:= Z^{k}&= n^{k} \cdot d^{q-1 - k} = n^{k} \cdot d^{(11q - 13)/12}\\&= nd^9 \cdot (nd^{11})^{(q-11)/12},\\ z^6&= Z^{(q+1)/2} = \left( \frac{Z}{q} \right) Z, \end{aligned}$$

where \( \left( \frac{Z}{q} \right) \) is the Legendre symbol. In particular, \(z = \root 6 \of {Z}\) whenever Z is a quadratic residue in \(\mathbb {F}_{q}\).

Given \((t, s) \in \mathbb {F}_{q}^2\), we need to evaluate \(h_I(t)\) and \(h_{sSWU}(s)\). As is known, separately each of these points can be computed in constant time of one exponentiation in \(\mathbb {F}_{q}\) (see the case of \(h_{sSWU}\) in [25, Section 4.2]). Let’s show that this is also possible simultaneously for the two points (and hence for F(ts)). The only cumbersome part of \(h_I\) (resp., \(h_{sSWU}\)) consists in the exponentiation \(\root 3 \of {f} = f^{\ell }\) (resp., \(\pm g^{e}\) such that \((g^{e})^2 = \left( \frac{g}{q} \right) g\)), where

$$\begin{aligned} f:= \left( \dfrac{3a - t^4}{6t} \right) ^{2} - b - \dfrac{t^6}{27}, \quad g:= -\dfrac{b}{a} \left( 1 + \dfrac{1}{s^4 - s^2} \right) . \end{aligned}$$

Evidently, \(f^\ell \) is the unique cubic root of f in \(\mathbb {F}_{q}\) and for our purpose it is sufficient to find \(g^e\) up to a sign. For the sake of simplicity, we exclude from consideration the zeros and poles of the functions f, g. As usual, they can be processed individually.

It is suggested to act in a similar way as in [44, Section 3], that is, for \(Z:= f^2 g^3\) to compute \(z = Z^k\) (almost \(\root 6 \of {Z}\)) instead of computing separately \(\root 3 \of {f}\) and \(\pm g^e\) (almost \(\sqrt{g}\)). Note that

$$\begin{aligned} z = f^{(q+1)/6} \cdot g^{e} = \left( \frac{f}{q} \right) \root 3 \of {f} \cdot g^{e}, \quad z^2 = \root 3 \of {f^2} \cdot \left( \frac{g}{q} \right) g. \end{aligned}$$

Introducing the auxiliary notation \(\theta := fg/z^2\), we get the equalities

$$\begin{aligned} \root 3 \of {f} = \dfrac{ \left( \frac{g}{q} \right) fg}{z^2} = \left( \frac{g}{q} \right) \theta ,\quad g^e = \dfrac{z}{ \left( \frac{f}{q} \right) \root 3 \of {f}} = \dfrac{z}{ \left( \frac{fg}{q} \right) \theta }. \end{aligned}$$

We see that \(\theta ^3 = \left( \frac{g}{q} \right) f\) and \(z^6 = \left( \frac{g}{q} \right) Z\). Therefore, the symbol \( \left( \frac{g}{q} \right) \) can be determined for free. More formally,

$$\begin{aligned} \big ( \root 3 \of {f},\, \pm g^e \big ) = {\left\{ \begin{array}{ll} \begin{array}{ll} \big ( \theta ,\, z/\theta \big ) &{} \quad \text {if} \quad \theta ^3 = f, \, \mathrm {i.e.}, \, z^6 = Z, \\ \big (-\theta ,\, z/\theta \big ) &{} \quad \text {otherwise}. \end{array} \end{array}\right. } \end{aligned}$$

Bearing in mind the formula above for \((n/d)^k\) without the inversion operation, we completely justify the next remark.

Remark 1

The map F (in contrast to \(h_I^{\otimes 2}\) and \(h_{sSWU}^{\otimes 2}\)) can be computed in constant time of one exponentiation in \(\mathbb {F}_{q}\).

By analogy with [27], given q, it is of course not difficult to derive explicit short addition chains for raising to the power k. Besides, F is a samplable map due to [31, Algorithm 1], which eventually leads to the following result.

Corollary 3

The map \(F: \mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\) is admissible.

Remark 1 holds valid when \(h_I\) (resp., \(h_{sSWU}\)) is replaced by any encoding implementable with the cost of extracting one cubic (resp., square) root in \(\mathbb {F}_{q}\). It is logical to choose \(h_I\) and \(h_{sSWU}\), because they are the most universal among such encodings known in the literature. In particular, these encodings are relevant even if N is a prime (that is, the cofactor equals 1), which is the case for many widespread elliptic curves. Note that for \(q \equiv 11 \, (\textrm{mod} \, 12)\) curves of j-invariants 0, 1728 are supersingular in compliance with [26, Section 24.2.1.c]. Since such curves pose special challenges for security by virtue of [2, Remark 2.22], the map \(h_{sSWU}\) does not have restrictions in the current context.

There are a lot of standardized elliptic curves over fields \(\mathbb {F}_{q}\) such that \(q \equiv 11 \, (\textrm{mod} \, 12)\). It is readily checked that this condition is fulfilled, e.g., for the (unique) French curve FRP256v1 [45], for the curves P-192, P-384, and Curve448-Goldilocks from NIST SP 800-186 [46, Section 4.2.1] as well as for all Russian curves [47, Appendix B] except for id-GostR3410-2001-CryptoPro-B-ParamSet. Remark 1 remains true in the case \(q \equiv 2 \, (\textrm{mod} \, 3)\), \(q \equiv 5 \, (\textrm{mod} \, 8)\) when a square root is still expressed via one exponentiation (see, e.g., [3, Appendix I.2]). However, the author has not encountered standardized curves over such fields, hence it was decided to stop at this moment in order not to inflate the text length. If required, the reader can easily repeat the conducted reasoning.

It is worth separate mentioning that unlike \(q \equiv 3 \, (\textrm{mod} \, 4)\), the remainder \(q \equiv 5 \, (\textrm{mod} \, 8)\) can arise when the j-invariant 1728 is ordinary, while the second assumption \(q \equiv 2 \, (\textrm{mod} \, 3)\) has nothing to do with the ordinariness of 1728. One might wonder if it is feasible to carry over the tensor-multiplication map F to this important case by changing \(h_{sSWU}\) to a suitable square-root encoding \(\mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\), e.g., from [50]. This turns out to be meaningless, because the work [33] constructs an admissible map \(\mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\) to all curves E of j-invariant 1728 regardless of the remainder of q modulo 3. The point is that the latter map needs to find a quartic root in \(\mathbb {F}_{q}\) rather than a sextic one. In addition, it is shown there that \(\root 4 \of {\cdot } \in \mathbb {F}_{q}\) is really represented via one exponentiation in \(\mathbb {F}_{q}\) once \(q \equiv 5 \, (\textrm{mod} \, 8)\).

2.2 Generalization of Icart’s encoding

The given section answers the following curious question.

Question 1

Do we know an example of a superelliptic \(\mathbb {F}_{q}\)-curve \(C: y^{m} = f(x)\) and an \(\mathbb {F}_{q}\)-cover \(\chi : C \rightarrow E\) (of small degree) onto some ordinary elliptic \(\mathbb {F}_{q}\)-curve E provided that \(m \in \mathbb {N}\) is relatively prime with \(3(q-1)\)?

The case \(m = 2\) is obviously impossible for odd q we deal with in this article. In turn, the case \(m = 3\) is deliberately excluded, because it is treated by Icart to the full extent. If the question has a positive answer, then we have yet another encoding \(h_\chi := \chi \circ pr_x^{-1}: \mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\), where \(pr_x\) is the projection from C to the x-coordinate. Truly, since \(\textrm{GCD}(m, q-1) = 1\), for every \(x \in \mathbb {F}_{q}\) there is a unique \(\mathbb {F}_{q}\)-root \(y = \root m \of {f(x)} = f(x)^{m^{-1} \, (\textrm{mod} \, q-1)}\). Of course, from the practical point of view, \(h_\chi \) is only useful if Icart’s encoding is not applicable, that is, \(q \equiv 1 \, (\textrm{mod} \, 3)\). However, in this section the remainder of q modulo 3 is insignificant.

By analogy with Sect. 2.1, we also possess the map \(F_\chi := h_\chi \otimes h_{sSWU}: \mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\). Note that \(\# h_\chi ^{-1}(P) \leqslant \deg (\chi )\) for each point \(P \in E(\mathbb {F}_{q})\). This property is sufficient for \(F_\chi \) to be regular (and hence admissible), because \(h_{sSWU}\) is already well-distributed. Moreover, since m is odd, we can extract the roots \(\root m \of {f}\) and \(\sqrt{g}\) at the cost of extracting the root \(z:= \root 2m \of {Z}\), where \(Z:= f^2g^m\). Indeed, \(z = \root m \of {f}\sqrt{g}\), hence \(\root m \of {f} = fg^{(m-1)/2}/z^{m-1}\) and \(\sqrt{g} = z/\root m \of {f}\). Furthermore, z is expressed via one exponentiation in \(\mathbb {F}_{q}\) whenever this is true for \(\sqrt{g}\), e.g., in the case \(q \equiv 3 \, (\textrm{mod} \, 4)\), because \(z = \smash {\sqrt{\root m \of {Z}}}\). As well as for \(m=3\), the reasoning is readily extended, taking into account the sign \( \left( \frac{Z}{q} \right) = \left( \frac{g}{q} \right) \) in the equality \(z^{2m} = \left( \frac{g}{q} \right) Z\).

Denote by \({\overline{C}} \subset \mathbb {P}^2\) and \(\overline{pr_x}: {\overline{C}} \rightarrow \mathbb {P}^1\) the projective closure of C and \(pr_x\), respectively. It is quite evident that \(\overline{pr_x}\) (resp., \({\overline{C}}\)) fits the definition of an exceptional cover (resp., median value curve) in the sense of [48] no matter \(m \geqslant \deg (f)\) or not. By definition, \(\overline{pr_x}: {\overline{C}}(\mathbb {F}_{q}) \rightarrow \mathbb {P}^1(\mathbb {F}_{q})\) is a bijection and \(\#{\overline{C}}(\mathbb {F}_{q}) = q+1\). As a result, \(h_\chi \) is an Icart-like encoding in the terminology of [49]. So, \(h_\chi \) is not “almost surjective”, that is, \(\#h_\chi (\mathbb {F}_{q}) \ne q + o(q)\). Therefore, this encoding itself cannot be admissible. Nevertheless, it is not required due to the availability of \(F_\chi \). Incidentally, another type of Icart-like encodings is studied in [50] for elliptic curves of j-invariants 0, 1728 whose Frobenius trace has a small divisor.

Interestingly, the author found only one desired example (with \(m=7\)) in the existing literature on elliptic curves. Consider the Klein quartic \(K:= X^3Y + Y^3Z + Z^3X\) on the projective plane \(\mathbb {P}^2_{(X:Y:Z)}\). It is clearly a non-singular curve of genus 3. The detailed information about the Klein quartic is represented in the book [51] and especially in its chapter “The Klein quartic in number theory”. In particular, there is a birational isomorphism between K and \(C_7: y^7 = x^2(x+1)\) of the form

$$\begin{aligned} \begin{array}{ll} \tau : K \dashrightarrow C_7 &{}\quad (X:Y:Z) \mapsto \left( \dfrac{Y^3}{Z^2X}, -\dfrac{Y}{Z} \right) , \\ \tau ^{-1}: C_7 \dashrightarrow K&{} \quad (x,y) \mapsto \left( {-}\dfrac{y^3}{x}: -y: 1 \right) . \end{array} \end{aligned}$$

Furthermore, we have a cover

$$\begin{aligned} \chi : C_7 \rightarrow E_7 \qquad (x,y) \mapsto \left( \dfrac{\textrm{num}_x}{\textrm{den}}, \dfrac{\textrm{num}_y}{\textrm{den}} \right) \end{aligned}$$

onto the elliptic curve \(E_7: y^2 = x^3 + (21/4)x^2 + 7x\), where

$$\begin{aligned} \textrm{num}_x&:= 2y\big ( x^2(-y^7 - y^5 - y^2 + y + 1)\\&\quad + x(-y^2 + y + 1)y^7+(y^5 + y^2 - y - 1)y^7 \big ), \\ \textrm{num}_y&:= x^2(3y^8 + 2y^7 + 3y^6 + 2y^5 + 2y^4 - y^3-3y^2 \\&\quad - 3y - 2)+ x(2y^4 - y^3 - 3y^2 - 3y - 2)y^7 \\&\quad + (-3y^6 - 2y^5 - 2y^4 + y^3 + 3y^2 + 3y + 2)y^7, \\ \textrm{den}&:= 2y^3(x^2 + xy^7 - y^7). \end{aligned}$$

The correctness of these formulas is checked in Magma [27]. Similar formulas (with respect to another model of \(E_7\)) are given in [52]. The cover \(\chi \) is nothing but the composition of \(\tau ^{-1}\) and the canonical map \(K \rightarrow K/\phi \simeq E_7\), where \(\phi \) is the cyclic shift \((X:Y:Z) \mapsto (Y:Z:X)\) on K. Inter alia, \(\deg (\chi ) = 3\). Finally, it does not seem that \(\chi \) can be easily modified to cover over \(\mathbb {F}_{q}\) the quadratic twist of \(E_7/\mathbb {F}_{q}\).

Put \(\zeta \) to be a fixed primitive 7-th root of unity and \(\alpha := (-1 + \sqrt{-7})/2 = \zeta ^4 + \zeta ^2 + \zeta \). It is known that \(j(E_7) = -3375 = -3^3 5^3\) and the natural lift \(E_7/\mathbb {Q}\) has complex multiplication by \(\mathbb {Z}[\alpha ]\) in the sense of [26, Section 18.1.1]. By the way, such pairing-friendly elliptic curves occur in the fresh paper [53]. The curve \(E_7/\mathbb {F}_{q}\) is ordinary if and only if \( \left( \frac{-7}{p} \right) = 1\) (see, e.g., [2, Section 4.3]), where \(p > 7\) denotes the characteristic of the field \(\mathbb {F}_{q}\). According to the quadratic reciprocity law [26, Section 2.3.4.a], this is equivalent to the condition \( \left( \frac{p}{7} \right) = 1\), that is, \(p \equiv 1, 2, 4 \, (\textrm{mod} \, 7)\). We need to exclude the case \(p \equiv 1 \, (\textrm{mod} \, 7)\) with regard to Question 1. In other words, \(\sqrt{-7} \in \mathbb {F}_{p}\), while \(\zeta \not \in \mathbb {F}_{p}\). More precisely, \(\mathbb {F}_{p}(\zeta ) = \mathbb {F}_{p^3}\), because the extension degree \([\mathbb {Q}(\zeta ): \mathbb {Q}] = \varphi (7) = 6\). Thus, we are interested in prime powers \(q \equiv 2,4 \, (\textrm{mod} \, 7)\).

Looking ahead, all the next equalities are verified in an elementary way. For \(\ell := 7^{-1} \, (\textrm{mod} \, q-1)\) and \(Z = n/d\) such that \(n, d \in \mathbb {F}_{q}^*\) we obtain:

The case \(q \equiv 2 \, (\textrm{mod} \, 7)\):

$$\begin{aligned} \ell&= \dfrac{6q-5}{7} \in \mathbb {N}, \quad \root 7 \of {Z} = Z^{\ell }= n^{\ell } \cdot d^{q-1 - \ell } \\&= n^{\ell } \cdot d^{(q - 2)/7} = n \cdot (n^6d)^{(q-2)/7}; \end{aligned}$$

The case \(q \equiv 4 \, (\textrm{mod} \, 7)\):

$$\begin{aligned} \ell&= \dfrac{2q-1}{7} \in \mathbb {N}, \quad \root 7 \of {Z} = Z^{\ell }= n^{\ell } \cdot d^{q-1 - \ell } \\&= n^{\ell } \cdot d^{(5q-6)/7} = nd^{2} \cdot (n^2d^{5})^{(q-4)/7}. \end{aligned}$$

Besides, for

$$\begin{aligned} e:= \dfrac{q+1}{4} \in \mathbb {N}, \quad k:= \ell e \left( \textrm{mod} \, \dfrac{q-1}{2} \right) , \quad L:= \left( \frac{Z}{q} \right) \end{aligned}$$

we eventually have:

The case \(q \equiv 2 \, (\textrm{mod} \, 7), \, q \equiv 3 \, (\textrm{mod} \, 4)\) or, equivalently, \(q \equiv 23 \, (\textrm{mod} \, 28)\):

$$\begin{aligned} k&= \dfrac{5q-3}{28} \in \mathbb {N}, \quad \root 14 \of {LZ} = Z^{k} = n^{k} \cdot d^{q-1 - k} \\&= n^{k} \cdot d^{(23q - 25)/28} = n^4d^{18} \cdot (n^5d^{23})^{(q-23)/28}; \end{aligned}$$

The case \(q \equiv 4 \, (\textrm{mod} \, 7), \, q \equiv 3 \, (\textrm{mod} \, 4)\) or, equivalently, \(q \equiv 11 \, (\textrm{mod} \, 28)\):

$$\begin{aligned} k&= \dfrac{11q-9}{28} \in \mathbb {N}, \quad \root 14 \of {LZ} = Z^{k} = n^{k} \cdot d^{q-1 - k} \\&= n^{k} \cdot d^{(17q - 19)/28}= n^4d^6 \cdot (n^{11}d^{17})^{(q-11)/28}. \end{aligned}$$

3 Taxonomy of hash functions to elliptic curves

This section aims to systematize known results on hashing to elliptic \(\mathbb {F}_{q}\)-curves E. Table 1 contains state-of-the-art admissible encodings of the form \(\mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\). For the sake of completeness, Table 2 exhibiting encodings \(\mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\) is also included. The latter are not regular, because their full images are only proportions of the whole group \(E(\mathbb {F}_{q})\). Nevertheless, in a series of situations they are more efficient than the former. The point is that some protocols remain secure whether a used hash function \(\{0, 1\}^* \rightarrow E(\mathbb {F}_{q})\) acts as a random oracle or not. In the literature there are a lot of other encodings to elliptic curves. The tables demonstrate only those, which arose earlier and which are, at the same time, the best at least for certain E and \(\mathbb {F}_{q}\).

Table 1 Taxonomy of admissible encodings \(\mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\) to elliptic \(\mathbb {F}_{q}\)-curves \(E: y^2 = x^3 + ax + b\)
Table 2 Taxonomy of (non-admissible) encodings \(\mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\) to elliptic \(\mathbb {F}_{q}\)-curves \(E: y^2 = x^3 + ax + b\)

The only exception is Skałba’s encoding [54]. In contrast to Shallue–van de Woestijne’s (SW) encoding [55], it does not cover curves of j-invariant 0, not to mention that Skałba’s formulas are quite awkward. Hence, it is widely recognized that the former is worse than the latter. Nonetheless, seminal Skałba’s work merits to be cited, because it is historically the first in this research area. Apart from \(\sqrt{\cdot }\), both encodings need the values of two Legendre symbols \( \left( \frac{\cdot }{q} \right) \). However, we have to bear in mind the recent breakthrough [57, 58] in fast constant-time implementations of the Legendre symbol. In other words, the computational complexity of the encodings is close to that of one square root extraction.

It is necessary to explain why Skałba’s encoding is seemingly admissible and what is meant by modification of the SW one, which appears to be equally admissible. The original encodings of the form \(\mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\) are of course not so, because they are far from surjective. First of all, recall that the threefold

$$\begin{aligned} T: y^2 = f(x_1)f(x_2)f(x_3) \quad \subset \quad \mathbb {A}^{4}_{(x_1, x_2, x_3, y)}, \end{aligned}$$

where \(E: y^2 = f(x):= x^3 + ax + b\), is at the core of the encodings under discussion. To be more precise, we have the cornerstone map

$$\begin{aligned}&h^\prime : T(\mathbb {F}_{q}) \rightarrow E(\mathbb {F}_{q}) \\&h^\prime := {\left\{ \begin{array}{ll} \begin{array}{ll} \left( x_1, \sqrt{f(x_1)} \right) &{} \quad \text {if} \quad \left( \frac{f(x_1)}{q} \right) \in \{0, 1\}, \\ \left( x_2, \sqrt{f(x_2)} \right) &{} \quad \text {if} \quad \left( \frac{f(x_2)}{q} \right) \in \{0, 1\},\\ \left( x_3, \sqrt{f(x_3)} \right) &{} \quad \text {otherwise}, \, \text {i.e.}, \, \left( \frac{f(x_3)}{q} \right) \in \{0, 1\}. \end{array} \end{array}\right. } \end{aligned}$$

Skałba’s encoding is based on \(\mathbb {F}_{q}\)-unirationality of the Châtelet surface (see, e.g., [59, Sections 1–2]). More concretely, one deals with the surface

figure a

By definition, there is a dominant \(\mathbb {F}_{q}\)-map \(\psi : \mathbb {A}^{2}_{(t_1,t_2)} \dashrightarrow S\) in the sense of [26, Definition 4.43]. Such a map is given in [54, Lemma 3] and yet another rational \(\mathbb {F}_{q}\)-map \(\varphi : S \dashrightarrow T\) is from [54, Lemma 2]. Besides, introduce the following notation:

$$\begin{aligned} \varphi \circ \psi = ( X_1, X_2, X_3, Y ) : \mathbb {A}^{2}_{(t_1,t_2)} \dashrightarrow T \end{aligned}$$

with irreducible fractions

$$\begin{aligned} X_i = \dfrac{\textrm{num}_i}{\textrm{den}_i}, \quad \textrm{num}_i, \textrm{den}_i \in \mathbb {F}_{q}[t_1,t_2], \quad \textrm{and} \quad Y \in \mathbb {F}_{q}(t_1,t_2). \end{aligned}$$

Finally, for \(\alpha \in \mathbb {F}_{q}\) let’s define the curves

$$\begin{aligned} C_{i, \alpha }:= \alpha \cdot \textrm{den}_i - \textrm{num}_i, \quad C_{i, \infty }:= \textrm{den}_i \quad \subset \quad \mathbb {A}^{2}_{(t_1,t_2)}. \end{aligned}$$

A “right” version of Skałba’s encoding is given as follows:

$$\begin{aligned}&h: \mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q}) \\&h(t_1, t_2):= \left\{ \begin{array}{ll} (0:1:0) &{} \quad \text {if} \quad \exists i: (t_1, t_2) \in C_{i, \infty },\\ \big ( h^\prime \circ \varphi \circ \psi \big )(t_1, t_2) &{} \quad \text {otherwise}. \end{array}\right. \end{aligned}$$

In order to shorten a bit formulas Skałba restricts h to the line \(t_1 = t_2\), because he does not worry about the regularity property. Using the intuition confirmed by [23, Theorem 3], [33, Theorem 1], the author suggests that the curves \(C_{i,\alpha }\) are probably absolutely irreducible except for few \(\alpha \). He does not possess sufficient computer resources to prove this statement, since Skalba’s formulas are fairly cumbersome. If the assumption is true, then, by analogy with [23, Corollary 2], [33, Corollary 2], it follows that the encoding h is regular. As usual, h is also efficiently computable and samplable in a clear way, which implies its admissibility.

The SW encoding is obtained in almost the same way. The difference consists in the surface

figure b

or, equivalently,

$$\begin{aligned} S: -y^2 = x^3 + 3t^2x^2 + ax + 4at^2 + b \quad \subset \quad \mathbb {A}^{3}_{(x,y,t)}. \end{aligned}$$

Note that the projection \(\pi : S \rightarrow \mathbb {A}^{1}_x\) is a conic bundle [60, Definition 6]. The original SW encoding just picks a non-degenerate \(\mathbb {F}_{q}\)-fiber \(\pi ^{-1}(\beta ) \subset \mathbb {A}^{2}_{(y,t)}\) (for some \(\beta \in \mathbb {F}_{q}\)) whose \(\mathbb {F}_{q}\)-parametrization \(\mathbb {A}^{1} \dashrightarrow \pi ^{-1}(\beta )\) is taken in the role of \(\psi \).

According to the quite constructive result [60, Theorem 1], the surface S is also \(\mathbb {F}_{q}\)-unirational, although in general it is not \(\mathbb {F}_{q}\)-rational as stressed in [55, Section 5]. As before, it is proposed to choose a dominant \(\mathbb {F}_{q}\)-map \(\psi : \mathbb {A}^{2} \dashrightarrow S\) of degree as little as possible. Once again, if the resulting curves \(C_{i,\alpha }\) (with respect to the new functions \(X_i\)) are absolutely irreducible, then the modified SW encoding h is admissible.

Recently, Chávez-Saab, Rodriguez-Henriquez, and Tibouchi [1] completely studied the case when S is an \(\mathbb {F}_{q}(x)\)-rational conic, that is, it has an \(\mathbb {F}_{q}(x)\)-point. Thereby, they constructed the birational \(\mathbb {F}_{q}\)-map \(\psi \) (of degree one). Be careful, \(\mathbb {F}_{q}\)-rationality of the surface S does not imply \(\mathbb {F}_{q}(x)\)-rationality of S as a conic. The corresponding encoding h was called SwiftEC. It is relevant for many elliptic curves arising in practice. Inter alia, all ordinary curves of j-invariant 0 are covered.

Nevertheless, the applicability conditions of SwiftEC are too restrictive for a series of interesting curves among which \(E: y^2 = x^3 + ax\) (of j-invariant 1728). That is why the work [33] does not lose significance. Moreover, the encoding \(h_1\) invented in [23] is still much faster than SwiftEC over highly 2-adic fields (see Sect. 4). Lots of modern curves [61] (including BLS12-377) are defined over such fields. The point is that \(h_1\) extracts a cubic root in \(\mathbb {F}_{q}\) rather than a square one, not to mention two Legendre symbols.

In conclusion, the last four rows of Table 1 encourage to formulate a very beautiful and practically useful conjecture (partially related to Question 1).

Conjecture 1

For any elliptic \(\mathbb {F}_{q}\)-curve E there is an admissible encoding \(\mathbb {F}_{q}^2 \rightarrow E(\mathbb {F}_{q})\) at the cost of one radical \(\root n \of {\cdot }\) in \(\mathbb {F}_{q}\) for some \(n \in \mathbb {N}\) without computing additional power residue symbols \( \left( \frac{\gamma }{q} \right) _{m}:= \gamma ^{(q-1)/m}\), where \(\gamma \in \mathbb {F}_{q}\) and \(m \mid q-1\).

4 How to hash over highly 2-adic fields

As said in the title, this section dwells on the problem of hashing to an elliptic curve E defined over a finite field \(\mathbb {F}_{q}\) such that \(2^\nu \mid \mid q-1\) for a non-small \(\nu \in \mathbb {N}\). According to Tables 12, the majority of curves have only hash functions computing a root of even degree. Therefore, we cannot expect to represent such a root as one or at least several exponentiations in \(\mathbb {F}_{q}\). For simplicity, let’s restrict to the case of a square root, but the arguments below are also true in the general case.

Undoubtedly, \(\sqrt{\cdot } \in \mathbb {F}_{q}\) can be found through (constant-time) Tonelli–Shanks’s algorithm [3, Appendix I.4]. However, it requires \(O(\log (q) + \nu ^2)\) operations in \(\mathbb {F}_{q}\). The given drawback can be mitigated to \(O\big (\log (q) + (\nu /\mu )^2 \big )\) ones with \(O(2^\mu \nu /\mu )\) storage (where \(\mu \) is a parameter) by means of Bernstein’s table-lookup variant [62]. In practice, this memory overhead is not significant for moderate 2-adicities such as the popular choice \(\nu \approx 32\). These words are justified, for example, by the square root implementation [63] aimed at the classical point decompression [64]. By contrast, in the context of hashing to elliptic curves (assuming a secret input as earlier) Bernstein’s approach is vulnerable to cache-timing attacks (cf. [65]). There is a vast literature about secure table lookups (see, e.g., [66]). However, it is desirable to completely avoid them if possible in order to be more confident in reliability.

Whenever j-invariant of E is different from 0, 1728, we can resort to the recent solution from [56]. Its novelty consists in an unexpected possibility to batch Müller’s square root algorithm [67] and some encoding \(h_M: \mathbb {F}_{q} \rightarrow E(\mathbb {F}_{q})\) including \(\sqrt{\cdot }\) in such a way that Müller’s algorithm does not contain anymore the non-deterministic subroutine. Recall that the unique bottleneck of Müller’s algorithm (and thereby of \(h_M\)) is computing the n-th element of a certain non-full Lucas sequence \(V_i(\cdot , 1)\), where \(n:= (q-1)/4\). These sequences periodically appear in various areas of cryptography (see, e.g., [68, Section 6.3.2]).

For determining \(V_n(\cdot , 1)\) we possess Postl’s algorithm [69], which performs \(\approx 2\log _2(q)\) multiplications in \(\mathbb {F}_{q}\). In fact, there is a folklore trick [70, 71] reducing the running time to \(\approx 2\log _2(q) - \nu \) ones. Taking it into account, Postl’s algorithm may be faster than (or at least of the same performance as) exponentiation in \(\mathbb {F}_{q}\) to some power e of length \(\approx \log _2(q)\) and of Hamming weight \(\omega \lesssim \log _2(q)\). This happens when

$$\begin{aligned} 2\log _2(q) - \nu \lesssim \log _2(q) + \omega \quad \Leftrightarrow \quad \log _2(q) \lesssim \nu + \omega . \end{aligned}$$

In this estimation the conventional binary exponentiation method is applied for the sake of simplicity. In particular, the encodings from [56, Table 1] extracting a higher-degree root \(\root m \of {\cdot }\) sometimes become slower than \(h_M\) even if \(\textrm{GCD}(m, q-1) = 1\).

As an illustration, look at the relatively new stark(jub) curves [72, 73] (of \(j \ne 0, 1728\)) defined over the same field \(\mathbb {F}_{q}\) for which \(\lceil \log _2(q)\rceil = 252\) and \(\nu = 192\). Note that \(q \equiv 2 \, (\textrm{mod} \, 3)\), hence Icart’s encoding \(h_I\) is applicable to these curves. Consequently, we deal with the parameters \(m = 3\), \(e = (2q-1)/3\), and \(\omega = 125\) as is readily checked. Substituting the values in the above inequality, we thus see that \(h_M\) performs approximately \(192 + 125 - 252 = 65\) fewer multiplications in \(\mathbb {F}_{q}\) than \(h_I\). Of course, utilizing a shorter addition chain for exponentiation to the power e allows to compensate the given difference. One can ultimately conclude that proper implementations of the encodings \(h_I\), \(h_M\) to stark(jub) curves have more or less identical execution time.

In the author’s opinion, the task of hashing in time \(O(\log (q))\) to all elliptic \(\mathbb {F}_{q}\)-curves of j-invariant 1728 is solvable. For instance, one of prospective approaches is mentioned in [56, Section 4]. Nonetheless, it is suggested to omit the case of \(j = 1728\) curves in the present section, because they are not considered over highly 2-adic fields in today’s real-world cryptography. The situation is radically opposite for \(j=0\) curves. So, it is worth explaining how an implementer should act if (s)he encounters the Weierstrass form \(E: y^2 = x^3 + b\) such that \(\sqrt{b} \not \in \mathbb {F}_{q}\).

As usual, one first needs to check the existence of an \(\mathbb {F}_{q}\)-isogeny \(\varphi : E^\prime \rightarrow E\) of small (prime) degree \(\ell \) from another elliptic curve \(E^\prime \). This can be done by means of [68, Theorem 25.4.6]. In this case, nothing prevents to employ the indirect encoding \(\varphi \circ h_M\). To evaluate \(\varphi \) we are able to use classical Vélu’s formulas [68, Section 25.1.1] requiring \(O(\ell )\) operations in \(\mathbb {F}_{q}\). Alternatively, there is in [74] the method called square-root Vélu and denoted by \(\sqrt{{\acute{\textrm{e}}\textrm{lu}}}\). It has the asymptotic complexity \(\widetilde{O}\big ( \sqrt{\ell } \big )\), that is, \(O\big (\sqrt{\ell } \big )\) up to polylogarithmic factors. In turn, the work [75] establishes that the complexity is closer to \(O(\ell ^{\log _2(3)/2})\). Owing to those sources, \(\sqrt{{\acute{\textrm{e}}\textrm{lu}}}\) becomes more efficient for \(\ell \approx 100\).

Now, we proceed to the most painful remaining case. Obviously, \(\sqrt{b} \in \mathbb {F}_{q^2}\), hence we can enjoy the admissible encoding \(h_1: \mathbb {F}_{q^2}^2 \rightarrow E(\mathbb {F}_{q^2})\). Further, the trace map

$$\begin{aligned} \textrm{Tr}: E(\mathbb {F}_{q^2}) \rightarrow E(\mathbb {F}_{q}) \qquad P \mapsto P + P^q \end{aligned}$$

comes to the fore, where \(P^q\) is the point \(\mathbb {F}_{q}\)-conjugate to P. Eventually, we get the composition \(\textrm{Tr} \circ h_1: \mathbb {F}_{q^2}^2 \rightarrow E(\mathbb {F}_{q})\). By the way, \(\textrm{Tr}\) is also leveraged in Sato–Hakuta’s encoding [76] relevant conversely for all curves of \(j \ne 0\). It is cute, but useless (regardless of \(\nu \)) if we bear in mind Tables 12.

Lemma 3

The map \(\textrm{Tr} \circ h_1\) is admissible.

Proof

The trace map is known to be a surjective homomorphism (as confirmed in [77, Lemma 1]), hence it is regular. At the same time, \(\textrm{Tr}\) is realized as an algebraic \(\mathbb {F}_{q}\)-map with quite clear formulas (see, e.g., [26, Section 7.4.2]), implying its samplability. Thus, the trace map is admissible, which readily entails the statement of the lemma. \(\square \)

One of disadvantages of this construction is that its domain has the bit length \(4\lceil \log _2(q) \rceil \) instead of \(2\lceil \log _2(q) \rceil \). Consequently, the execution time of a supplementary (indifferentiable) hash function \(\eta : \{0,1\}^* \rightarrow \mathbb {F}_{q^2}^2\) is doubled as well.

Besides, the cubic root arising in \(h_1\) takes place over \(\mathbb {F}_{q^2}\) rather than \(\mathbb {F}_{q}\). In contrast to a square \(\mathbb {F}_{q^2}\)-root [2, Algorithm 5.18], the author does not know how to express \(\root 3 \of {\cdot } \in \mathbb {F}_{q^2}\) through a few \(\mathbb {F}_{q}\)-roots. Since E is supposed to be an ordinary curve, \(q \equiv 1 \, (\textrm{mod} \, 3)\) and thereby the 3-adicities of \(\mathbb {F}_{q}\) and \(\mathbb {F}_{q^2}\) coincide. This means that \(\root 3 \of {\cdot } \in \mathbb {F}_{q^2}\) is represented as an exponentiation in \(\mathbb {F}_{q^2}\) if and only if the same holds over \(\mathbb {F}_{q}\), that is, \(q \not \equiv 1 \, (\textrm{mod} \, 27)\) due to [23, Lemma 6]. As above, denote by \(\omega \lesssim 2\log _2(q)\) the Hamming weight of the corresponding power. By virtue of Karatsuba’s trick [2, Algorithm 5.16], let’s assume that one multiplication in \(\mathbb {F}_{q^2}\) costs three multiplications in \(\mathbb {F}_{q}\). As a result, a cubic \(\mathbb {F}_{q^2}\)-root is found after \(\approx 2\log _2(q) + \omega \) multiplications in \(\mathbb {F}_{q^2}\), which amounts to \(\approx 6\log _2(q) + 3\omega \) ones in \(\mathbb {F}_{q}\). To sum up, the encoding \(\textrm{Tr} \circ h_1\) has the linear complexity \(O(\log (q))\) as desired unless the field \(\mathbb {F}_{q}\) is highly 3-adic. Fortunately, such fields do not occur in practice to the author’s knowledge.