1 Introduction

The rapid progress in wireless communication systems, personal communication systems, and smartcard technologies is impacting people’s lives at a staggering rate and in significant ways. New security challenges have to be solved in order to protect these communication systems from malicious attacks. Elliptic curve cryptosystems (ECC) provide efficient and robust solutions to many of the existing security issues in wireless communication systems [9, 10, 32]. When implementing ECC-based cryptographic protocols on smart devices such as RFID tags, smart cards, and wireless sensor nodes, it is often necessary to transmit or store points on elliptic curves. Since these smart devices usually have constrained resources in terms of computational capabilities, memory, and bandwidth, it is desirable to represent a point in a compact form.

Point compression is a technique that allows points on an elliptic curve to be represented with fewer bits of data, thereby reducing both the storage space on smart devices and the amount of data that needs to be transmitted to and from those devices [5]. A single point compression usually involves solving a quadratic equation over a finite field, which can be computationally expensive for low-cost smart devices. To overcome this issue, Khabbazian et al. [20] proposed double and triple point compression schemes, which allow a compact representation of two or three points on an elliptic curve without the computational cost associated with ordinary single point compression. Moreover, the authors pointed out that point compression can be used to reduce the required bandwidth in the case of random point multiplication. We note that the double point compression technique can also be directly applied to some pairing-free identity-based signature schemes (see [8, 17] for example) to reduce the required bandwidth during wireless transmission. Additional applications of various point compression techniques include the following.

Message authentication in vehicular ad hoc networks (VANETs)

  • In VANETs, vehicles are equipped with Dedicated Short Range Communication (DSRC) capabilities that enable a vehicle’s On-Board Unit (OBU) to communicate wirelessly with other vehicles’ OBUs or with Road Side Units (RSUs) [26]. Safety provision is one of the core goals of VANET deployment. To this end, an OBU periodically broadcasts traffic-related messages to surrounding vehicles in an authenticated manner. Elliptic curve cryptography and digital certificates have been employed to ensure the authenticity of broadcast safety messages [19]. To save communication bandwidth in VANETs, public keys in digital certificates are stored and transmitted in compressed form. To verify digital signatures, an OBU first needs to decompress the received public keys and then verify multiple digital signatures. Multiple point compression can be directly applied in this case to accelerate the decompression of multiple public keys, providing an efficient way to achieve a good trade-off between decompression performance and storage requirements.

Multi-user broadcast authentication in wireless sensor networks (WSNs)

  • Multi-user broadcast is an efficient and common communication paradigm in WSNs, where multiple users disseminate messages (i.e., queries or commands) into the network to retrieve the information of interest to them [25, 34]. Due to the constrained resources of wireless sensor nodes, elliptic curve cryptography and digital certificates are usually used to ensure the authenticity of broadcast messages in WSNs [15]. Again, the public keys of network users are stored in digital certificates in compressed form to reduce the energy consumption of sensor nodes. When receiving broadcast messages from multiple users, sensor nodes verify both digital certificates and signatures. Applying the multiple point compression technique enables sensor nodes to balance the performance of signature verification against the memory footprint on wireless sensor nodes.

Countermeasures against attacks

  • Nowadays ECDSA [3], ECDH [4] and other elliptic-curve based cryptographic algorithms are often implemented on secure devices such as smartcards. Such devices must be protected against side channel attacks and perturbation attacks.

  • A well-known countermeasure against many side channel attacks is to use randomized projective coordinates. If a smartcard stores only the x-coordinates of several points and the sum of their y-coordinates, it is possible to randomize or re-randomize the coordinates efficiently with fewer field multiplications. The whole set of points will then share a common, randomly chosen z-coordinate.

  • A classical countermeasure against perturbation attacks is the integrity check, such as a parity check or a CRC check. In the case when elliptic-curve points are compressed they can be stored in a smaller buffer and thus the checksum or the CRC value can be computed more efficiently [29].

Other potential applications

  • The ciphertext of the ElGamal Elliptic Curve Cryptosystem [21] is a pair of points of an elliptic curve. Point compression may result in shorter ciphertexts, which is crucial for many applications.

  • The internal state of some ECC-based pseudorandom generators includes two points of an elliptic curve. If the pseudorandom generator is rarely invoked it makes sense to store its state in a compressed form and decompress it when needed. In particular, the two-point compression method can be effective. An example of such a pseudorandom generator is Dual_EC_DRBG [27]. This pseudorandom generator has recently been deprecated by NIST due to certain security flaws [28]. The point compression techniques are likely to be applicable to the successors of Dual_EC_DRBG as well.

Considering the potential applications of the multiple point compression technique, in this paper we present new linear algebra (LA) based compression algorithms for multiple points on elliptic curves. Let us emphasize from the start that the goal of these new compression algorithms is not to reduce storage requirements by achieving a higher compression rate, but rather to trade current compression techniques for one which is asymptotically (when storing many points) almost as good and which additionally achieves superior decompression speed.

To this end, four- and five-point compression schemes are proposed which further generalize those of Khabbazian et al. [20]. Our new compression algorithms only make use of basic linear algebra (multiplications, additions, subtractions and at most one division) and do not need to solve any quadratic or higher degree equations over finite fields. In particular, the number of equivalent basic field multiplications (not counting the division/inversion) is constant and does not increase with the field size. Let us note that the running time of one division is equivalent to that of finding one square root. However, the strength of our method is that k inversions can be combined into a single inversion and \(3k-3\) multiplications [11, Algorithm 10.3.4 p. 489, a.k.a. Montgomery’s trick], whereas several square root extractions cannot be combined into one modulo a constant number of multiplications.
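The batching of inversions via Montgomery’s trick can be sketched as follows; this is an illustrative transcription of the standard algorithm, with a toy prime and inputs of our own choosing.

```python
# Montgomery's trick: invert k field elements at the cost of one
# inversion and 3(k-1) multiplications. Illustrative sketch only.
def batch_inverse(zs, p):
    """Return [z^{-1} mod p for z in zs] using a single inversion."""
    k = len(zs)
    prefix = [1] * (k + 1)            # prefix[i] = z_0 * ... * z_{i-1}
    for i, z in enumerate(zs):
        prefix[i + 1] = prefix[i] * z % p
    inv = pow(prefix[k], p - 2, p)    # the single field inversion
    out = [0] * k
    for i in range(k - 1, -1, -1):    # peel off one factor at a time
        out[i] = inv * prefix[i] % p
        inv = inv * zs[i] % p
    return out
```

For example, over \(\mathbb{F}_{23}\) the call `batch_inverse([2, 3, 5], 23)` performs one inversion and returns the three inverses `[12, 8, 14]`.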

The remainder of this paper is organized as follows. Section 2 briefly reviews the previous work on double and triple point compression schemes in [20], followed by the new four- and five-point LA based compression algorithms in Sect. 3. In Sect. 4, we generalize this method to an arbitrary number of points. Finally, Sect. 5 concludes this work.

2 Compression algorithms and previous work

For simplicity, in this work we restrict ourselves to curves over odd characteristic fields \(\mathbb {F}\). Given a plane equation \(y^2=f(x)\), where f is a polynomial in \(\mathbb {F}[x]\) without multiple roots, we usually store a point \(P_1=(x_1,y_1)\) by its coordinates (so-called trivial, or no, compression), or alternatively by \(x_1\) plus one extra bit, recovering \(y_1\) by solving a quadratic equation in \(\mathbb {F}\) and using the bit to discriminate between the two solutions. The latter method requires less memory, but more resources to decompress the point in order to find all its coordinates. In particular, solving a quadratic equation in \(\mathbb {F}\) requires the equivalent of a field inversion, costing a number of field multiplications which increases linearly in the field bit size. For instance, solving \(y^2 \equiv a \pmod p\) for \(p\equiv 3 \pmod 4\) will entail computing \(a^{(p+1)/4} \mod p\).

In this work we are interested in studying compression-decompression algorithms which use a constant number of field operations (multiplications, additions and inversions, with inversions grouped into a single one), independently of the field size. These methods typically make use of linear algebra only, and therefore we call them LA methods. We can distinguish between the following two flavours of compression-decompression algorithms:

  1. (1)

    LA compression, non-LA decompression (e.g. the classical method of storing \(x_1\) and 1 bit, recovering \(y_1\) by solving a quadratic equation),

  2. (2)

    LA compression, LA decompression (fully LA).

Additionally, one could conceive of a non-LA compression, LA decompression algorithm, although we could not find any in the literature. Nevertheless, our new compression method below (see Sect. 2.2) for two points is a first step towards this paradigm of a slow compression and very fast decompression method.

In [20] a fully LA method is presented to compress points \(P_1, \dots ,P_n\) when \(n=2,3\). The object of the present article is to extend their work to all values of n less than the characteristic of the field and to present an efficient implementation for \(n=4,5\).

We review the results of [20] here, in the case of fields of characteristic at least 5. Minor modifications can adapt this method to characteristic 2 and 3. Note that the exact performance improvement of multiple point compression technique over the traditional approach highly depends on the efficiency of solving a quadratic equation over finite fields (see Sect. 3.4 for more details).

2.1 Two-point compression

Given an elliptic curve \(y^2=x^3+ax+b=f(x)\) and points \(P_1=(x_1,y_1)\) and \(P_2=(x_2,y_2)\), we can represent them in general by \((x_1,x_2, \alpha =y_1+y_2)\). Then

$$\begin{aligned} y_1 = \frac{\alpha }{2} + \frac{x_1-x_2}{8\alpha } \left( 3(x_1+x_2)^2+ (x_1-x_2)^2 +4a \right) \end{aligned}$$

and

$$\begin{aligned} y_2 = \alpha - y_1. \end{aligned}$$

This is easily seen by remarking that

$$\begin{aligned} y_1-y_2 = \frac{y_1^2-y_2^2}{y_1+y_2} = \frac{f(x_1)-f(x_2)}{\alpha } \end{aligned}$$

and knowledge of \(\alpha =y_1+y_2\) will readily yield \(y_1, y_2\) by linear algebra. This algorithm works as long as \(\alpha \ne 0\). This condition can be checked a priori before decompression by computing \(y_i^2= f(x_i)\) and checking that \(y_1^2\ne y_2^2\) (a stronger condition). Otherwise, if \(y_1^2=y_2^2\), the compression algorithm stores \((x_1, x_2, y_1, b)\) where b is an extra bit to distinguish between the generic case and the exceptional case when \(y_2=-y_1\). Two-point compression can thus be achieved by storing 3 field elements and up to one extra bit.

When \(\alpha \ne 0\), the computational complexity of the decompression is \(2M + 2S + I\).
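As an illustration, the generic case of this decompression can be transcribed directly; the toy prime and curve below are our own examples, not parameters from the paper.

```python
# Two-point decompression (Sect. 2.1) for y^2 = x^3 + a*x + b over F_p.
# Toy parameters for illustration only.
p, a, b = 23, 2, 3          # p ≡ 3 (mod 4)

def decompress_pair(x1, x2, alpha):
    """Recover (y1, y2) from (x1, x2, alpha = y1 + y2), alpha != 0."""
    inv = pow(8 * alpha % p, p - 2, p)      # the single inversion
    y1 = (alpha * pow(2, p - 2, p)
          + (x1 - x2) * inv * (3 * (x1 + x2)**2 + (x1 - x2)**2 + 4 * a)) % p
    return y1, (alpha - y1) % p
```

For example, \(P_1=(1,12)\) and \(P_2=(3,6)\) lie on \(y^2=x^3+2x+3\) over \(\mathbb{F}_{23}\), and `decompress_pair(1, 3, 18)` returns `(12, 6)`.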

2.2 Alternative two-point compression

We introduce a new two-point compression algorithm which requires an inversion in the compression phase, while the decompression phase needs only multiplications and squarings.

In this case, points \(P_1 = (x_1,y_1)\) and \(P_2=(x_2,y_2)\) are represented by \(\Big (A = \frac{x_{1}-x_{2}}{y_{1}-y_{2}}, B = y_{1}-y_{2}, x_{2}\Big )\). The decompression algorithm works as follows.

Since \(x_{1}-x_{2} = AB\) we obtain \(x_{1} = AB+x_2\).

Then, having two elliptic curve equations

$$\begin{aligned} y_{1}^{2}= & {} x_{1}^{3}+ax_{1}+b\\ y_{2}^{2}= & {} x_{2}^{3}+ax_{2}+b \end{aligned}$$

the following two expressions are derived:

$$\begin{aligned} y_{1}^{2}-y_{2}^{2}= & {} x_{1}^{3}-x_{2}^{3}+a(x_{1}-x_{2})\\ y_{1}+y_{2}= & {} \frac{x_{1}-x_{2}}{y_{1}-y_{2}}\big ((x_{1}-x_{2})^{2}+3x_{1}x_{2}+a\big ) = A\big ((x_{1}-x_{2})^{2}+3x_{1}x_{2}+a\big ) \end{aligned}$$

Since \(y_1-y_2=B\), \(y_1\) and \(y_2\) are easily obtained by linear algebra:

$$\begin{aligned} y_{1}= & {} \frac{A\big ((x_{1}-x_{2})^{2}+3x_{1}x_{2}+a\big )+B}{2}\\ y_{2}= & {} y_1-B \end{aligned}$$

The overall computational cost of the decompression is \(3M + 1S\) if multiplications by constant elements are disregarded. The algorithm does require an inversion in the compression phase. However, the decompression needs only multiplications and squarings, which considerably reduces its computational complexity compared to the previously described two-point decompression method.

The algorithm works when \(y_1^2 \ne y_2^2\). In the case \(y_1^2 = y_2^2\), follow the compression algorithm described in Sect. 2.1.
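A sketch of this alternative scheme, continuing the toy curve used for illustration above (our own parameter choice, not from the paper):

```python
# Alternative two-point compression (Sect. 2.2): the inversion is paid
# once at compression time; decompression uses no inversion (the
# division by 2 is a multiplication by a precomputed constant).
p, a, b = 23, 2, 3

def compress_alt(x1, y1, x2, y2):
    A = (x1 - x2) * pow((y1 - y2) % p, p - 2, p) % p   # the inversion
    return A, (y1 - y2) % p, x2

def decompress_alt(A, B, x2):
    x1 = (A * B + x2) % p
    half = (p + 1) // 2                                # precomputed 1/2 mod p
    y1 = (A * ((x1 - x2)**2 + 3 * x1 * x2 + a) + B) * half % p
    return (x1, y1), (x2, (y1 - B) % p)
```

On the sample points \(P_1=(1,12)\), \(P_2=(3,6)\), compression yields \((A, B, x_2) = (15, 6, 3)\) and decompression recovers both points without a field inversion.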

2.3 Three-point compression

In this case, the generic compression algorithm represents \(P_i=(x_i,y_i)\) by \((x_1,x_2,x_3, \alpha =y_1+y_2+y_3)\). Decompression works as follows. Let

$$\begin{aligned} \beta = \alpha ^2 + y_3^2-y_1^2-y_2^2 = 2 (y_3+y_1)(y_3+y_2). \end{aligned}$$

Then

$$\begin{aligned} y_3 = \frac{\beta ^2 + 4\alpha ^2y_3^2- 4 y_1^2y_2^2}{4\alpha \beta } \end{aligned}$$

and since \(y_1+y_2 = \alpha -y_3\) we recover \(y_1, y_2\) by the previous two-point compression algorithm. We are in the generic case provided \(\alpha \ne 0\) and \(y_i^2\ne y_j^2\) for \(1\le i<j\le 3\), in which case by the above \(\beta \ne 0\). We look at the exceptional cases:

  1. (1)

    \(y_i^2\ne y_j^2\) for \(1\le i<j\le 3\) and \(\alpha =0\). In this case, the compression part will store \((x_1, x_2, x_3, y_1+y_2)\). We can recover \(y_1, y_2\) by the generic case of two-point compression and then \(y_3=-y_1-y_2\).

  2. (2)

    Two squares are equal but not three. Without loss of generality, \(y_1^2=y_2^2\ne y_3^2\). In this case, represent the three points as \((x_1,x_2,x_3, y_2+y_3,q_1)\), where \(q_1\in \{0,1\}\). Decompression works out \(y_2,y_3\) using generic two-point compression, and bit \(q_1\) is used to tell if \(y_1=y_2\) or \(-y_2\).

  3. (3)

    \(y_1^2=y_2^2=y_3^2\). In this case, store \((x_1,x_2,x_3,y_1,q_1,q_2)\), with \(q_i\in \{0,1\}\). The bits are used to pinpoint the signs of \(y_2\) and \(y_3\) relative to \(y_1\).

Three-point compression can thus be achieved by storing 4 field elements and up to 2 extra bits. As shown in [20], the computational complexity in the generic case is \(12M + 6S + I\).
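The generic recovery of \(y_3\) can be sketched as follows; the toy prime and sample values are of our own choosing.

```python
# Recover y3 from alpha = y1 + y2 + y3 and the known squares
# s_i = y_i^2 = f(x_i), following the generic three-point case.
p = 23

def recover_y3(alpha, s1, s2, s3):
    beta = (alpha * alpha + s3 - s1 - s2) % p          # = 2(y3+y1)(y3+y2)
    num = (beta * beta + 4 * alpha * alpha * s3 - 4 * s1 * s2) % p
    return num * pow(4 * alpha * beta % p, p - 2, p) % p
```

For \(y_1, y_2, y_3 = 5, 7, 10\) over \(\mathbb{F}_{23}\) (squares 2, 3, 8 and \(\alpha = 22\)), the function returns 10; \(y_1\) and \(y_2\) then follow from the two-point algorithm with \(y_1+y_2 = \alpha - y_3\).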

We next present our generalization of this method to four points.

3 Four- and five-point compression

3.1 General idea

Let n be the number of points to be compressed and let \(y_1, y_2, \ldots , y_n\) be the y-coordinates of these points. Denote

$$\begin{aligned} a_1= & {} y_1+y_2+\cdots +y_n, \\ a_2= & {} \sum _{1 \le i< j \le n} y_i y_j, \\&\ldots \\ a_n= & {} y_1y_2\ldots y_{n-1}y_n \end{aligned}$$

the elementary symmetric polynomials in the \(y_i\)’s. We want to express \(y_i\) in terms of the \(a_j\)’s and the even powers of \(y_i\), since we know the value \(y_i^2=f(x_i)\).

Theorem 1

For even values of n the y-coordinates \(y_1, \ldots , y_n\) can be generically expressed as follows:

$$\begin{aligned} y_i = \frac{a_n+y_i^2 a_{n-2}+ \cdots +y_i^{n-2} a_2+y_i^n}{a_{n-1}+y_i^2 a_{n-3}+ \cdots +y_i^{n-2}a_1},\quad i = 1, \ldots ,n. \end{aligned}$$
(1)

For odd values of n,

$$\begin{aligned} y_i = \frac{a_n+y_i^2 a_{n-2}+ \cdots +y_i^{n-1}a_1}{a_{n-1}+y_i^2 a_{n-3}+ \cdots +y_i^{n-3}a_2+y_i^{n-1}}, i = 1,\ldots ,n. \end{aligned}$$
(2)

Proof

We have

$$\begin{aligned} (x+y_1)\cdots (x+y_n) = x^n+a_1x^{n-1} + \cdots + a_{n-1} x + a_n . \end{aligned}$$

Letting \(x=-y_i\), we get

$$\begin{aligned} 0=\,&a_n- y_i a_{n-1} + y_i^2 a_{n-2} - y_i^3 a_{n-3} +\cdots \\ =\,&a_n + y_i^2 a_{n-2} + \cdots -y_i (a_{n-1} + y_i^2 a_{n-3} + \cdots ) \end{aligned}$$

which is what we want. \(\square \)

We will not consider the exceptional situations when the denominators are equal to 0 (see Sect. 4 for a discussion of this).
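Theorem 1 is an exact algebraic identity and can be sanity-checked numerically; the sample values below are arbitrary.

```python
# Check the n = 4 (even) case of Theorem 1 over the integers:
# y_i * (a3 + y_i^2 * a1) == a4 + y_i^2 * a2 + y_i^4 for all i.
from itertools import combinations
from math import prod

def esym(vals, k):
    """k-th elementary symmetric polynomial of vals."""
    return sum(prod(c) for c in combinations(vals, k))

ys = [3, 5, 7, 11]
a1, a2, a3, a4 = (esym(ys, k) for k in (1, 2, 3, 4))
for y in ys:
    assert a4 + y**2 * a2 + y**4 == y * (a3 + y**2 * a1)
```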

3.2 Four-point compression

As a next step, we introduce a four-point compression algorithm based on Theorem 1. The compression algorithm represents the four points \(P_i, i = 1, 2, 3, 4,\) by \((x_1, x_2, x_3, x_4, a_1 = y_1 + y_2 + y_3 + y_4)\). The decompression algorithm first retrieves \(a_2, a_3, a_4\) and then uses the following equation to compute \(y_i\)’s:

$$\begin{aligned} y_i = \frac{a_4+y_i^2a_2+y_i^4}{a_3+y_i^2a_1},\quad \text {for} \quad i = 1,2,3,4. \end{aligned}$$
(3)

Our initial goal is to express the elementary symmetric polynomials \(a_2, a_3\) and \(a_4\) in terms of \(a_1\) and \(y_i^2\)’s. Note that

$$\begin{aligned} a_2 = \frac{a_1^2-b_1}{2}, \end{aligned}$$
(4)
$$\begin{aligned} a_1a_3-a_4 = \frac{a_2^2-b_2}{2}, \end{aligned}$$
(5)
$$\begin{aligned} a_2a_4 = \frac{a_3^2-b_3}{2}, \end{aligned}$$
(6)

where

$$\begin{aligned} b_1= & {} y_1^2+y_2^2+y_3^2+y_4^2, \\ b_2= & {} y_1^2y_2^2+y_1^2y_3^2+y_1^2y_4^2+y_2^2y_3^2+y_2^2y_4^2+y_3^2y_4^2,\\ b_3= & {} y_1^2y_2^2y_3^2+y_1^2y_2^2y_4^2+y_1^2y_3^2y_4^2+y_2^2y_3^2y_4^2.\\ \end{aligned}$$

From (4) \(a_2\) is easily computable. Denote the right-hand side of Eq. (5) as A. A is computable since \(a_2\) is known. Hence, from (5) we get the following linear relationship between \(a_3\) and \(a_4\):

$$\begin{aligned} a_3 = \frac{A+a_4}{a_1}. \end{aligned}$$
(7)

Substituting \(a_3\) in Eq. (6) by the above expression gives the following quadratic equation in \(a_4\):

$$\begin{aligned} 2a_1^2a_2a_4 = A^2+2Aa_4+a_4^2-a_1^2b_3. \end{aligned}$$

\(a_4^2\) is efficiently computable since it is equal to \(y_1^2y_2^2y_3^2y_4^2\). Therefore

$$\begin{aligned} a_4 = \frac{A^2+a_4^2-a_1^2b_3}{2\big (a_1^2a_2-A\big )}. \end{aligned}$$
(8)

Formula (7) implies that

$$\begin{aligned} a_3 = \frac{2a_1^2a_2A-A^2+a_4^2-a_1^2b_3}{2a_1\big (a_1^2a_2-A\big )}. \end{aligned}$$
(9)

Substituting \(a_3\) and \(a_4\) from (8) and (9) into (3) gives

$$\begin{aligned} y_i = \frac{a_1\Big (A^2+a_4^2-a_1^2b_3+2\big (a_1^2a_2-A\big )\big (y_i^2a_2+y_i^4\big )\Big )}{2a_1^2a_2A-A^2+a_4^2-a_1^2b_3+2a_1^2y_i^2\big (a_1^2a_2-A\big )}. \end{aligned}$$
(10)

The computational complexity of the decompression algorithm described above is \(55M + 11S + 1I\), where M, S, and I denote the cost of a field multiplication, squaring and inversion, respectively. The complexity can be reduced significantly when only one y-coordinate has to be retrieved. For more details, the reader is referred to Appendix 2. The calculation of the computational complexity is based on the principles described in [23].
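For concreteness, the derivation above can be transcribed directly as follows; the prime and sample values are illustrative, and no attempt is made at the operation-count optimizations of Appendix 2.

```python
# Four-point decompression: recover y1..y4 from a1 = y1+y2+y3+y4 and
# the known squares s_i = y_i^2 = f(x_i), via Eqs. (4), (5), (7), (8), (3).
from itertools import combinations
from math import prod

p = 10007  # toy prime

def decompress4(a1, s):
    inv2 = pow(2, p - 2, p)
    b1 = sum(s) % p
    b2 = sum(prod(c) for c in combinations(s, 2)) % p
    b3 = sum(prod(c) for c in combinations(s, 3)) % p
    a2 = (a1 * a1 - b1) * inv2 % p                       # Eq. (4)
    A = (a2 * a2 - b2) * inv2 % p                        # RHS of Eq. (5)
    a4sq = prod(s) % p                                   # a4^2 = (y1 y2 y3 y4)^2
    den = 2 * (a1 * a1 * a2 - A) % p
    a4 = (A * A + a4sq - a1 * a1 * b3) * pow(den, p - 2, p) % p   # Eq. (8)
    a3 = (A + a4) * pow(a1, p - 2, p) % p                # Eq. (7)
    return [(a4 + si * a2 + si * si) * pow((a3 + si * a1) % p, p - 2, p) % p
            for si in s]                                 # Eq. (3)
```

With \(y_i = 3, 5, 7, 11\) (so \(a_1 = 26\) and squares 9, 25, 49, 121) the function returns the original y-coordinates.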

3.3 Five-point compression

Theorem 1 can also be used as a basis for a five-point compression algorithm. The compression algorithm represents the five points \(P_i, i = 1, 2, 3, 4, 5,\) by \((x_1, x_2, x_3, x_4, x_5, a_1 = y_1 + y_2 + y_3 + y_4 + y_5)\). The decompression algorithm first retrieves \(a_2, a_3, a_4, a_5\) from the following identities and then exploits Eq. (2) for \(n = 5\):

$$\begin{aligned} a_2= & {} (a_1^2 - b_1)/2, \end{aligned}$$
(11)
$$\begin{aligned} a_3 a_1 - a_4= & {} (a_2^2 - b_2)/2,\end{aligned}$$
(12)
$$\begin{aligned} a_4 a_2 - a_5 a_1= & {} (a_3^2 - b_3)/2,\end{aligned}$$
(13)
$$\begin{aligned} a_5 a_3= & {} (a_4^2 - b_4)/2. \end{aligned}$$
(14)

Equation (11) immediately yields \(a_2\) since \(a_1\) and \(b_i\), \(i = 1, 2, 3, 4,\) are known. It follows from (12) that \(a_3\) can be expressed as a linear function of \(a_4\):

$$\begin{aligned} a_3 = (A + a_4)/a_1. \end{aligned}$$
(15)

Substituting Formula (15) into (13) gives

$$\begin{aligned} a_4 a_2 - a_5 a_1 = \left( \frac{A^2 + 2 A a_4 + a_4^2}{a_1^2} - b_3 \right) \Bigg /2, \end{aligned}$$

which can be used to express \(a_4^2\) as follows:

$$\begin{aligned} a_4^2 = a_1^2 (2 a_4 a_2 - 2 a_5 a_1 + b_3) - A^2 - 2 A a_4. \end{aligned}$$
(16)

As a next step, we use this expression as well as (15) to rewrite (14) as follows:

$$\begin{aligned} a_5 \cdot \left( \frac{A + a_4}{a_1} \right) = \big [ a_1^2 \big (2 a_4 a_2 - 2 a_5 a_1 + b_3\big ) - A^2 - 2 A a_4 - b_4 \big ] \big /2, \end{aligned}$$

which implies

$$\begin{aligned} a_4 = \frac{a_5 \big (2 a_1^4 + 2 A\big ) + a_1 \big (A^2 + b_4 - b_3 a_1^2\big )}{a_1 \big (2a_1^2 a_2 - 2A\big ) - 2 a_5}. \end{aligned}$$
(17)

Let \(C = a_1 \big (2 a_1^2 a_2 - 2A\big )\), \(F = 2 a_1^4 + 2 A\) and \(G = a_1 \big (A^2 + b_4 - b_3 a_1^2\big )\). Then

$$\begin{aligned} a_4 = \frac{F a_5 + G}{C - 2 a_5}. \end{aligned}$$

Now formula \(a_4^2 = 2 a_5 a_3 + b_4\), which follows directly from (14), can be written as follows:

$$\begin{aligned} \left( \frac{F a_5 + G}{C - 2 a_5} \right) ^2 = 2 a_5 \cdot \left( \frac{A + a_4}{a_1} \right) + b_4 \end{aligned}$$

or

$$\begin{aligned} \left( \frac{F a_5 + G}{C - 2 a_5} \right) ^2 = 2 a_5 \cdot \left( \frac{A + (F a_5 + G)/(C - 2 a_5)}{a_1} \right) + b_4. \end{aligned}$$

Note that \(a_5^2\) can be efficiently computed, \(a_5^2 = y_1^2 y_2^2 y_3^2 y_4^2 y_5^2\). Thus we get

$$\begin{aligned} a_5 = \frac{-F^2 a_5^2 a_1 - G^2 a_1 - 8 ACa_5^2 + 2FC a_5^2 - 4 G a_5 ^2 + C^2 b_4 a_1 + 4 b_4 a_1 a_5^2}{2 G F a_1 - 2 A C^2 - 8 A a_5^2 - 2 G C + 4 F a_5^2 + 4 C b_4 a_1}. \end{aligned}$$

Let J be the numerator and H the denominator of this formula, so that \(a_5 = J/H\). Then

$$\begin{aligned} a_4 = \frac{FJ + GH}{CH - 2J}. \end{aligned}$$

Denote \(K = CH - 2J\), \(L = FJ + GH\). Then

$$\begin{aligned} a_4 = L/K. \end{aligned}$$

and

$$\begin{aligned} a_3 = \frac{AK + L}{a_1 K}. \end{aligned}$$

Since \(n = 5\), Eq. (2) gives

$$\begin{aligned} y_i = \frac{a_5 + y_i^2 a_3 + y_i^4 a_1}{a_4 + y_i^2 a_2 + y_i^4}, \end{aligned}$$

where \(i = 1, 2, 3, 4, 5\). Using the expressions above for \(a_3, a_4, a_5\) we get

$$\begin{aligned} y_i = \frac{J K a_1 + H y_i^2 \big (AK + L + K a_1^2 y_i^2\big )}{Ha_1 \big (L + K y_i^2 (a_2 + y_i^2)\big )}. \end{aligned}$$
(18)

Let \(M_i\) be the numerator of \(y_i\): \(M_i = J K a_1 + H y_i^2 (AK + L + K a_1^2 y_i^2)\). Then \(y_i\) can be converted to the common denominator

$$\begin{aligned} N = Ha_1 \Big (L + K y_1^2 \big (a_2 + y_1^2\big )\Big ) \cdot \Big (L + K y_2^2 \big (a_2 + y_2^2\big )\Big ) \cdots \Big (L + K y_5^2 \big (a_2 + y_5^2\big )\Big ) \end{aligned}$$

as follows:

$$\begin{aligned} y_1= & {} \frac{M_1 \Big (L + K y_2^2 \big (a_2 + y_2^2\big )\Big ) \cdots \Big (L + K y_5^2 \big (a_2 + y_5^2\big )\Big )}{N},\nonumber \\&\ldots \nonumber \\ y_5= & {} \frac{M_5 \Big (L + K y_1^2 \big (a_2 + y_1^2\big )\Big ) \cdots \Big (L + K y_4^2 \big (a_2 + y_4^2\big )\Big )}{N}. \end{aligned}$$
(19)

The latter formula can be used for the efficient computation of \(y_1, y_2, y_3, y_4, y_5\).

The overall computational complexity of the decompression is \(120 M + 12 S + I\) in the case when the goal is to retrieve the y-coordinates of all five points. As before, the complexity can be reduced if the goal of the algorithm is to retrieve the y-coordinate of only one point. Please refer to Appendix 3 for details.
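As a sanity check, the chain of identities (11)–(14) driving this derivation holds exactly over the integers; the sample y-values below are arbitrary.

```python
# Numerical check of Eqs. (11)-(14) for n = 5 over the integers.
from itertools import combinations
from math import prod

def esym(vals, k):
    """k-th elementary symmetric polynomial of vals."""
    return sum(prod(c) for c in combinations(vals, k))

ys = [3, 5, 7, 11, 13]
a = [esym(ys, k) for k in range(6)]                    # a[1], ..., a[5]
b = [esym([y * y for y in ys], k) for k in range(5)]   # b[1], ..., b[4]
assert 2 * a[2] == a[1]**2 - b[1]                          # Eq. (11)
assert 2 * (a[3] * a[1] - a[4]) == a[2]**2 - b[2]          # Eq. (12)
assert 2 * (a[4] * a[2] - a[5] * a[1]) == a[3]**2 - b[3]   # Eq. (13)
assert 2 * a[5] * a[3] == a[4]**2 - b[4]                   # Eq. (14)
```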

3.4 Discussion on the occupied memory and computational complexity

The two-point and three-point compression algorithms proposed by [20] as well as the four-point and five-point compression algorithms presented in this paper reduce the computational cost of the decompression compared to the ordinary single point compression, in which only the x-coordinates are stored and the y-coordinates are extracted through square root operations. On the other hand, the multiple point compression algorithms require roughly \(\frac{100}{n} \%\) more storage space than the ordinary point compression, where n denotes the number of points to compress.

3.4.1 Square-and-multiply exponentiation, \(p\equiv 3 \pmod 4\)

We will make a concrete comparison for the case of 256-bit curves. Observe that solving \(x^2 \equiv a \pmod p\) for \(p\equiv 3 \pmod 4\) translates into computing \(a^{(p+1)/4} \mod p\). At the same time, \(x^{-1} \pmod p\) can be computed as \(x^{p-2} \pmod p\). Thus we may assume that the square root operation and the inversion operation require roughly the same number of operations on average, that is \(R \approx I \approx 127M + 254S\). The complexity of the inversion operation can be reduced by applying the extended Euclidean algorithm instead of the exponentiation. For the sake of simplicity we do not consider this enhancement in the two tables below; in Sect. 3.4.5 alternative values of R and I are discussed.
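Both operations reduce to a single modular exponentiation of roughly the same length, as the following toy-prime sketch illustrates:

```python
# For p ≡ 3 (mod 4): square root and inversion each cost one
# exponentiation. Toy prime and residue chosen for illustration.
p = 23
a = 13                         # a quadratic residue mod 23 (13 ≡ 6^2)
r = pow(a, (p + 1) // 4, p)    # square root: r^2 ≡ a (mod p)
inv = pow(a, p - 2, p)         # inversion via Fermat's little theorem
assert r * r % p == a
assert a * inv % p == 1
```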

The following table provides a comparison between the ordinary-point and multiple-point compression algorithms.

| n | Occupied memory, ordinary | Occupied memory, multiple | Computational complexity, ordinary | Computational complexity, multiple |
|---|---|---|---|---|
| 2 | 2 | 3 | \(2R = 254M + 508S\) | \(129M+256S\) |
| 3 | 3 | 4 | \(3R = 381M + 762S\) | \(139M+260S\) |
| 4 | 4 | 5 | \(4R = 508M + 1016S\) | \(182M+265S\) |
| 5 | 5 | 6 | \(5R = 635M + 1270S\) | \(247M+266S\) |

Here the unit of memory is one field element; columns “ordinary” and “multiple” correspond to the ordinary-point and multiple-point compression algorithms, respectively. In this table we neglect the extra storage bits needed in exceptional situations (the reader is referred to Sect. 4 for more details).

The table above can be used to illustrate the differences in the memory usage and in the computational complexity between the ordinary-point and multiple-point compression algorithms. We assume that \(S = 0.8 M\), which is one of the possibilities considered in the database [23]. Then the table can be rewritten as follows.

| n | Occupied memory, ordinary | Occupied memory, multiple | Percentage | Computational complexity, ordinary | Computational complexity, multiple | Percentage |
|---|---|---|---|---|---|---|
| 2 | 2 | 3 | +50 | 660M | 334M | −49 |
| 3 | 3 | 4 | +33 | 991M | 347M | −65 |
| 4 | 4 | 5 | +25 | 1321M | 394M | −70 |
| 5 | 5 | 6 | +20 | 1651M | 460M | −72 |

The new two-point compression algorithm described in Sect. 2.2 requires one inversion in its compression phase. However, the computational complexity of its decompression is \(3M + 1S\), which takes 99.4 % less time than the decompression phase of the ordinary point compression method.

Please note that the multi-exponentiation technique [33] cannot be used to speed up the ordinary point compression algorithm. The end result of the multi-exponentiation is a product of several exponentiations while the ordinary point compression needs the result of each exponentiation separately.

3.4.2 Case \(p \equiv 2^s + 1\ \ ({\mathrm{mod}}\ {2^{s+1})}\)

So far we have considered only one example of the characteristic p, namely \(p\equiv 3 \pmod 4\). To get a broader view on the computational complexity of multiple point compression let us consider one more example.

Let \(p \equiv 2^s + 1 \pmod {2^{s+1}}\) for a small positive integer s. In this case square roots can be computed using the algorithm proposed by Koo et al. [22]. The algorithm is useful for small values of s such as \(s = 2, 3, 4\) since the number of branches increases exponentially as s grows. In order to compute x such that \(x^2 = a\) for a given square \(a \in \mathbb {F}_p\) the algorithm requires one of the following exponentiations \(a^{\frac{p-5}{8}}\), \(a^{\frac{p-9}{16}}\) or \(a^{\frac{p-17}{32}}\) depending on whether \(s = 2, 3\) or 4. This means that the square root calculation is about as expensive for \(p \equiv 2^s + 1 \pmod {2^{s+1}}\) and for \(p \equiv 3 \pmod 4\) for any fixed bit length. Thus the comparison tables given in the previous section are applicable for \(p \equiv 2^s + 1 \pmod {2^{s+1}}\) as well.

3.4.3 Quadratic extension fields

In [1], the authors proposed two efficient algorithms for computing square roots over even field extensions of the form \(\mathbb {F}_{q^2}\), with \(q = p^n\), p an odd prime and \(n \ge 1\). The two proposed algorithms address the cases \(q \equiv 1 \pmod 4\) and \(q \equiv 3 \pmod 4\), respectively, and have an associated computational cost roughly equivalent to one exponentiation in \(\mathbb {F}_{q^2}\). In the case when \(q \equiv 3 \pmod 4\), the authors showed that the complex method of [14] is the most efficient algorithm for square root computation. For instance, when \(p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1\) and \(q = p\), the square root computation using the complex method requires \(1I_p + 1149M_p + 6Mc_p\), where \(I_p, M_p\) and \(Mc_p\) denote the inversion, multiplication and multiplication by constant over the underlying \(\mathbb {F}_p\). Hence, decompressing two points costs \(2I_p + 2298M_p + 12Mc_p\). Using the two-point decompression technique in Sect. 2.1, one only needs \(I_p + 14M_p + 7Mc_p\), given that \(M_{p^2} = 3M_p + Mc_p, S_{p^2} = 2M_p + 2Mc_p\) and \(I_{p^2} = I_p + 4M_p + Mc_p\). Using the same complexity estimation as in Sect. 3.4.1, one can find that the two-point compression technique requires \(87.5\,\%\) less time when compared to the traditional approach. A similar analysis is also applicable to the case when \(q \equiv 1 \pmod 4\) and we omit the details here.

3.4.4 Small characteristic fields

Elliptic curves over binary fields have the shape \(y^2+xy=f(x)\) or \(y^2+ay=f(x)\) for some cubic polynomial f and constant a. In this case, the trivial compression method stores x and one bit. To decompress we have to solve a quadratic Artin–Schreier equation \(y^2+by=c\) instead of taking a square root.

For elliptic curves defined over \(\mathbb {F}_{2^m}\) (m is odd), solving a quadratic equation is highly efficient when compared to other finite fields. As shown experimentally in [18], the time complexity is approximately 2/3 the time of a field multiplication. Even in this extreme case, it is not difficult to demonstrate that the two-point decompression technique (which requires \(I + 3M + S\) [20]) is roughly \(50\,\%\) faster than the traditional method (which requires \(2\cdot \big (I + \frac{5}{3}M + S\big )\)) for decompressing two points on elliptic curves defined over binary fields, provided that \(I \ge 10M\) and \(S = 0.1M\) [6].

For elliptic curves defined over \(\mathbb {F}_{3^m}\) (m is odd), the best algorithm for square root extraction takes \(O(m^2\log m)\) steps [7]. If \(m = 2k + 1\) for some k, it is easily verified that the square root computation requires \(\left( \lfloor \log _2 k \rfloor + \omega (k) + 1\right) \cdot M\) in this case, where \(\omega (k)\) is the Hamming weight of the binary representation of k. In the case of \(m = 239\), extracting a square root over \(\mathbb {F}_{3^{239}}\) requires 13M. While the traditional method requires 26M (i.e., two square root computations) for decompressing two points, the proposed two-point decompression technique only requires \(1I + 2M + S\), which results in about \(50\,\%\) performance improvement. Here we assume that \(I = 10M, S = M\) and a Type-II normal basis is used, as illustrated in [2]. If \(I = 30 M\) the computational complexity of the two-point decompression is roughly the same as that of the traditional method.

3.4.5 Application to resource-constrained environments

When applying the multiple point compression technique to resource-constrained environments such as wireless sensor networks, the performance improvement over the ordinary point decompression method is further amplified due to the low-power processor architecture design. For instance, when implementing the 192-bit NIST curve on an 8-bit AVR microcontroller, the multiplication, squaring and inversion operations take 4,042, 2,658 and 280,829 clock cycles, respectively [24], which gives us \(I \approx 70M\), \(S \approx 0.65M\) and \(R = 127M + 189S \approx 250M\). Similarly to the above estimations, we can obtain that the two-, three-, four- and five-point decompression algorithms require 85.3, 88.5, 86.8 and \(84.2\,\%\) less time than the traditional approach, respectively, which further demonstrates the great advantages of using the multiple point compression technique on embedded devices.

4 A unified treatment of n-point compression

Considering the \(\mathbb {F}\)-rational generic points \(P_i=(x_i, y_i)\), \(i=1,\dots , n\), lying on the curve \(y^2=f(x)\), we now describe a fully LA method to compress and decompress these points using only \(n+1\) field values. When referring to a “constant number of operations” we mean that this number does not depend on the field size, but only on n and the polynomial f. Our method shows that, asymptotically, one can achieve a very fast compression–decompression algorithm by storing only one extra value, compared to the trivial algorithm which stores only the \(x_i\)’s. In practice, however, ad hoc methods such as those presented in the previous sections will be more efficient. Also, there are exceptional cases which would not arise in applications but can be dealt with theoretically by storing a fixed number of extra bits, as in [20]; hence our focus on an algorithm for generic points. Additionally, with cryptographic applications in mind, we assume that \(n<{\text {char}}\, \mathbb F\) to avoid dividing by zero.

To compress n generic points, simply store the values

$$\begin{aligned} \big (x_1, \dots , x_n, a_1= y_1+ \cdots + y_n\big ). \end{aligned}$$
(20)

To decompress, let \(a_1, \dots , a_n\) be, as in Sect. 3, the elementary symmetric polynomials in the \(y_i\)’s. Then, by Newton’s identities,

$$\begin{aligned} \nonumber {y_1^{2}+\cdots + y_n^{2}}&= {a_1^2} -2a_2,\\ {y_1^{4}+ \cdots + y_n^{4}}&= {a_1^4} - 4 a_1^2a_2 + 2{a_2^2} + 4a_1a_3- 4a_4, \\ \nonumber {y_1^{6}+ \cdots + y_n^{6}}&= {a_1^{6}} - 6 a_1^{4} a_2 + 9a_1^{2} a_2^{2} + 6a_1^{3} a_3 - 2{a_2^{3}} - 12 a_1 a_2 a_3\\ \nonumber&\quad \,- 6a_1^{2} a_4 + 3{a_3^{2}} + 6 a_2 a_4 + 6 a_1a_5 - 6a_6, \end{aligned}$$
(21)

and so on for \(p_{2k}=y_1^{2k}+\cdots + y_n^{2k}\), \(k=1,\dots ,n\). The left-hand sides are computable in a constant number of operations. The right-hand sides are polynomials \(f_{2k}(a_1,\dots ,a_n)\), so the system (21) can be rewritten as \(p_{2k} = f_{2k}(a_1,\dots ,a_n)\) for \(k=1,\dots ,n\). One can solve it for \(a_1,\dots ,a_n\) by considering the ideal \({\mathcal {I}}\) of \(\mathbb {F}(p_2,\dots ,p_{2n})[a_1,\dots ,a_n]\) generated by the \(p_{2k} - f_{2k}(a_1,\dots ,a_n)\) for \(k=1,\dots ,n\) and computing its reduced Gröbner basis with respect to the lexicographic ordering of the variables with \(a_1<a_2<\cdots <a_n\). Indeed, the Newton identities applied to the symmetric polynomials in the \(y_i^2\)’s show that the knowledge of \(p_{2k}\) for \(k=1,\dots ,n\) determines \(y_1^2,\dots ,y_n^2\) uniquely up to permutation. By our assumption, this means that given \(a_1\) there is either no choice or a unique choice (up to permutation) of \(y_1,\dots , y_n\) for which their symmetric polynomials \((a_1,\dots , a_n)\) belong to the variety V associated to \({\mathcal {I}}\). Therefore \(\#V=2^n\) and the points of V are separated by \(a_1\) (they all have distinct first coordinates). Note that in this reasoning we have kept in mind specializations of the \(p_{2k}\) to particular values in \({\mathbb {F}}\) corresponding to generic \(y_1,\dots , y_n\). In fact, this shows that when we work with undetermined coefficients, i.e. with \(\mathcal I\subset \mathbb {F}(p_2,\dots ,p_{2n})[a_1,\dots ,a_n]\), we have a fortiori \(\#V=2^n\), where \(V \subset \bigl (\overline{\mathbb {F}(p_2,\dots ,p_{2n})}\bigr )^n\), as well as separation by the first coordinate \(a_1\).
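The identities (21) are straightforward to sanity-check numerically; the following Python sketch (illustrative only, with function names of our choosing) verifies the \(p_2\) and \(p_4\) identities for \(n=3\) over the integers, where any ring of characteristic greater than n behaves the same way:

```python
# Numerical sanity check of the Newton identities (21) for n = 3.
from itertools import combinations
from math import prod

def elem_sym(ys, j):
    # j-th elementary symmetric polynomial a_j of the values ys
    # (zero when j exceeds the number of values)
    return sum(prod(c) for c in combinations(ys, j))

ys = (2, 3, 5)                                            # sample y_1, y_2, y_3
a1, a2, a3, a4 = (elem_sym(ys, j) for j in range(1, 5))   # a4 = 0 here

p2 = sum(y**2 for y in ys)
p4 = sum(y**4 for y in ys)
assert p2 == a1**2 - 2*a2
assert p4 == a1**4 - 4*a1**2*a2 + 2*a2**2 + 4*a1*a3 - 4*a4
print("p_2 and p_4 identities hold for", ys)
```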

Moreover, \({\mathcal {I}}\) is a radical ideal: consider again the elementary symmetric polynomials \(e_1= y_1^2+\cdots + y_n^2\), ..., \(e_n=y_1^2\cdots y_n^2\) in the \(y_i^2\)’s. We write the identity

$$\begin{aligned} \big (x^2-y_1^2\big )\cdots \big (x^2-y_n^2\big ) = (x-y_1)\cdots (x-y_n)(x+y_1)\cdots (x+y_n) \end{aligned}$$

as

$$\begin{aligned}&x^{2n}-e_1x^{2n-2}+e_2x^{2n-4}+\cdots + (-1)^n e_n \\&\quad = \Big (x^n - a_1 x^{n-1} + a_2 x^{n-2} + \cdots + (-1)^n a_n\Big )\\&\quad \quad \times \, \Big (x^n + a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n\Big ) \end{aligned}$$

This gives rise to the following system of n equations:

$$\begin{aligned} e_1&= -2a_2 +a_1^2 , \nonumber \\ e_2&= 2a_4-2a_1a_3+a_2^2 , \\ \nonumber e_3&= -2a_6+2a_1a_5-2a_2a_4+a_3^2 , \\ \nonumber&\text {etc.} \end{aligned}$$
(22)

Newton’s identities imply that \(\mathbb {F}[p_2,\dots ,p_{2n}]=\mathbb {F}[e_1,\dots ,e_{n}]\), since we are supposing that \({\text {char}} {\mathbb {F}}>n\); hence the two systems (21) and (22) are equivalent, in the sense that the ideals they define are equal in \(\mathbb {F}(p_2,\dots ,p_{2n})[a_1,\dots ,a_n] = \mathbb {F}(e_1,\dots ,e_{n})[a_1,\dots ,a_n]\). Looking at \({\mathcal {I}}\) as described by (22), we apply Bézout’s Theorem for hypersurfaces [16, Ch. 12.3]: since we already know that the n hyperquadrics (22) intersect in \(2^n\) points (the maximum allowed without common components), the multiplicity of intersection at each point is one, and therefore by [13, Ch. 4, Corollary 2.6] we deduce that \({\mathcal {I}}\) is radical (we do not need the hypothesis that the base field be algebraically closed, since we have already found \(V({\mathcal {I}})\)).

The Shape Lemma [12, 30] says that, under the preceding conditions, the reduced Gröbner basis has the following form:

$$\begin{aligned} \left[ g_{1}(a_1) , a_2+g_2(a_1), \dots , a_n + g_n(a_1) \right] , \end{aligned}$$

where \(g_k\in \mathbb {F}(p_2,\dots ,p_{2n})[a_1]\) with \(\deg g_1=2^n\) and \(\deg g_k<2^n\) for \(k=2,\dots ,n\). This Gröbner basis can be precomputed (it depends only on n) and from its form, given \(a_1\), the other \(a_k\)’s for \(k=2,\dots ,n\) can be recovered in a constant number of operations. See Appendix 1 for the expressions of \(g_k\) when \(n=4\).
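As a concrete illustration in the smallest case \(n=2\), the system (21) reduces to \(p_2 = a_1^2 - 2a_2\) and \(p_4 = a_1^4 - 4a_1^2 a_2 + 2a_2^2\); eliminating \(a_2\) by hand, one checks directly that the reduced Gröbner basis is

$$\begin{aligned} \left[ \, a_1^4 - 2p_2\,a_1^2 + \big (2p_4 - p_2^2\big ),\; a_2 + \tfrac{1}{2}\big (p_2 - a_1^2\big ) \, \right] , \end{aligned}$$

with \(\deg g_1 = 4 = 2^2\) and \(\deg g_2 = 2 < 2^2\), matching the predicted shape.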

Once we know the symmetric polynomials, we can compute the right-hand sides of

$$\begin{aligned} y_1+ \cdots + y_n&= a_1,\\ {y_1^{3}+ \cdots + y_n^{3}}&= {a_1^3} - 3 a_1a_2 + 3 a_3,\\ {y_1^{5}+ \cdots + y_n^{5}}&= {a_1^{5}} - 5 a_1^{3} a_2 + 5 a_1 a_2^{2} + 5 a_1^{2} a_3 - 5 a_2 a_3 - 5 a_1 a_4 + 5 a_5, \\&\vdots \end{aligned}$$

up to \(y_1^{2n-1}+ \cdots + y_n^{2n-1}\). Since

$$\begin{aligned} y_1^{2k-1}+ \cdots + y_n^{2k-1} = y_1\cdot y_1^{2k-2} + \cdots + y_n \cdot y_n^{2k-2} \end{aligned}$$

we will get n linear equations in the n variables \(y_1, \dots , y_n\), whose coefficient matrix is the Vandermonde matrix

$$\begin{aligned} \begin{pmatrix} 1 &{} \cdots &{} 1 \\ y_1^2 &{} \cdots &{} y_n^2 \\ \vdots &{} &{} \vdots \\ y_1^{2n-2} &{} \cdots &{} y_n^{2n-2} \\ \end{pmatrix} \end{aligned}$$

which will be invertible since \(y_i^2\ne y_j^2\) for \(1\le i<j\le n\) (otherwise not all \(2^n\) sums \(\pm y_1 \pm \cdots \pm y_n\) would be distinct). We can solve this system by linear algebra in a constant number of operations, thereby concluding the LA decompression. Thus we have proved the following theorem.

Theorem 2

Given a curve \(y^2=f(x)\) over a field \(\mathbb {F}\) of odd characteristic greater than n and \({\mathbb {F}}\)-rational generic points \(P_i=(x_i,y_i), i=1,\dots , n\) on it, we can fully LA compress and decompress them by storing \(n+1\) field elements.
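For \(n=2\) the whole decompression collapses to a handful of field operations, which makes for a compact end-to-end illustration. The Python sketch below (the prime, point values, and function name are toy parameters of our choosing, not cryptographic ones) recovers \((y_1, y_2)\) from the stored data \((x_1, x_2, a_1)\), using only that the squares \(y_i^2 = f(x_i)\) are computable from the \(x_i\):

```python
# Two-point LA decompression over F_p (toy prime), following the text:
# the stored data is (x1, x2, a1) with a1 = y1 + y2, and the squares
# s_i = y_i^2 = f(x_i) are recomputed from the x_i.
p = 101                     # toy prime, char > n = 2

def decompress2(s1, s2, a1, p):
    # s_i = y_i^2, assumed distinct (generic points).
    # Step 1: recover a2 = y1*y2 from p2 = s1 + s2 via (21): p2 = a1^2 - 2*a2.
    p2 = (s1 + s2) % p
    a2 = (a1 * a1 - p2) * pow(2, -1, p) % p
    # Step 2: odd power sum p3 = y1^3 + y2^3 by Newton: a1^3 - 3*a1*a2.
    p3 = (a1**3 - 3 * a1 * a2) % p
    # Step 3: solve the Vandermonde system
    #   [ 1   1 ] [y1]   [a1]
    #   [ s1  s2] [y2] = [p3]
    inv_det = pow((s2 - s1) % p, -1, p)
    y1 = (a1 * s2 - p3) * inv_det % p
    y2 = (p3 - a1 * s1) * inv_det % p
    return y1, y2

# Round trip with sample values y1 = 5, y2 = 12:
y1, y2 = 5, 12
s1, s2 = y1 * y1 % p, y2 * y2 % p   # in practice these are f(x1), f(x2)
a1 = (y1 + y2) % p                  # the single stored extra field element
print(decompress2(s1, s2, a1, p))   # (5, 12)
```

Note that no square root is ever extracted: step 1 is a linear solve for \(a_2\), and step 3 is a \(2\times 2\) linear system, exactly the "fully LA" behaviour claimed by the theorem.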

5 Conclusion

In this paper, we presented novel LA based compression techniques for multiple points on elliptic curves. We first described four- and five-point compression schemes that extend the work of Khabbazian et al. [20]. We then extended this analysis to deal with n points by generically storing \(n+1\) field elements. Using our new algorithms, the compression process requires almost no computational effort, and the decompression does not involve solving any quadratic or higher degree equations over finite fields. Hence, the proposed techniques can be employed to reduce the required bandwidth/memory when single point compression is computationally expensive. It is possible to generalise this method to deal with divisors on hyperelliptic curves represented in Mumford form. One aspect that would be interesting to explore is the possibility of designing a fully LA algorithm where compression and decompression take comparable amounts of time, or even where decompression is fast but compression is slower, in contrast with the present algorithm, where compression is immediate and decompression is much slower.