1 Introduction

Elliptic Curves Cryptosystems (ECC) that have been introduced by N. Koblitz [37] and V. Miller [45], are based on the notable discrete logarithm problem, which has been thoroughly studied in the literature and is supposed to be a hard mathematical problem. The main benefit in elliptic curves based algorithms is the size of the keys. Indeed, for the same level of security, the schemes require keys that are far smaller than those involved in classical public-key cryptosystems. The success of ECC led to a wide variety of applications in our daily life and they are now implemented on lots of embedded devices: smart-cards, micro-controller, and so on. Such devices are small, widespread and in the hands of end-users. Thus the range of threats they are confronted to is considerably wider than in the classical situation. In particular, physical attacks are taken into account when assessing the security of the application implementation (e.g. the PACE protocol in e-passports [33]) and countermeasures are implemented alongside the algorithms.

A physical attack may belong to one of the two following families: perturbation analysis or observation analysis. The first one tends to modify the cryptosystem processing with laser beams, clock jitter or voltage perturbation. Such attacks can be thwarted by monitoring the device environment with captors and by verifying the computations before returning the output. The second kind of attacks consists in measuring a physical information, such as the power consumption or the electro-magnetic emanation, during sensitive computations. Inside this latter area we can distinguish, what we call simple attacks, that directly deduces the value of the secret from one or a small number of observation(s) (e.g.Simple Power Analysis [39]) and advanced attacks involving a large number of observations and exploiting them through statistics (e.g. Differential Power Analysis [40] or Correlation Power Analysis [15]). Such attacks require the use of a statistical tool, also known as a distinguisher, together with a leakage model to compare hypotheses with real traces (each one related to known or chosen inputs). The latter constraint may however be relaxed thanks to the so-called collision attacks [52] which aim at detecting the occurrences of colliding values during a computation, that can be linked to the secret [12, 21, 48, 49]. In order to counteract all those attacks, randomization techniques can be implemented (e.g. scalar/message blinding for ECC [25]). The recent introduction of the so-called horizontal side-channel technique by Clavier et al. in [20] seems to have set up a new deal. This method, which is inspired by Walter’s work [56], takes its advantage in requiring a unique power trace, thus making classical randomization techniques ineffective. Up to now, it has been applied successfully on RSA implementations and we show in this paper that it can be combined with collision correlation analysis to provide efficient attack on elliptic curves protected implementations.

Core idea. In the context of embedded security, most ECC protocols (e.g. ECDSA [2] or ECDH [3]) use a short term secret that changes at each protocol iteration. In this particular setting, advanced side-channel attacks, which require several executions of the algorithm with the same secret, are ineffective. As a consequence, only protection against S-PA is usually needed, that can be done thanks to the popular atomicity principle [17, 29, 43]. Up to now, this technique is considered as achieving the best security/efficiency trade-off to protect against side-channel analysis. In this paper, we provide a new side-channel attack, called horizontal collision correlation analysis that defeats such protected ECC implementations. In particular, implementations using point/scalar randomization combined with atomicity are not secure, contrary to what was thought up to now. Moreover in order to complete our study, we also investigate the case of unified formulas Footnote 1. Indeed, we show that our horizontal collision correlation attack allows to distinguish, with a single leakage trace, a doubling operation from an addition one. This technique, which allows to eventually recover the secret scalar, is applied to three different atomic formulae on elliptic curves, namely those proposed by Chevallier-Mames et al. in [17], by Longa in [43], by Giraud and Verneuil in [29].

Paper Organisation. The paper is organized as follows. First, Section 2 introduces a framework enabling to formally study the resistance of an implementation against side-channel attacks in both Horizontal and Vertical modes. Section 3 recalls some basics about ECC in a side-channel attacks context. Then, under the assumption that one can distinguish common operands in modular multiplications, the outlines of our new horizontal collision correlation attack are presented in Section 4. After a theoretical analysis explaining how to practically deal with the distinguishability assumption, we provide in Section 5 experimental results for 160, 256 and 384-bit-size curves working with 8 or 32-bit registers. These results show that the attack success rate stays high even when significant noise is added to the leakage.

2 A comprehensive study of side-channel analyses

In the following, a general framework is introduced which enables to describe most of the existing attacks in a similar way and to identify their core differences (actually the leakage pre-treatment, the leakage model and the statistical test).

2.1 A general framework for simple and advanced side-channel analyses

Theoretically modelling a side-channel analysis obviously requires some prior statistical notions. Notations. A realization of a random variable X is referred to as the corresponding lower-case letter x. A sample of observations of X is denoted by (x) or by (x i ) i when an indexation is needed. In this case, the global event is sum up as (x)↩X. The i th coordinate of a variable X (resp. x), viewed as a vector, is denoted by X[i] (resp. x[i]). As usual, the notation \(\mathbb {E}[X]\) refers to the mean of X. For clarity reasons we sometimes use the notation \(\mathbb {E}_{X} [\cdot ]\) to enlighten the variable over which the mean is computed. The variance of X is denoted by var (X).

Attacks presented in this paper involve the linear correlation coefficient which measures the linear interdependence between two variables X and Y. It is defined as \(\rho (X,Y) = \frac {cov(X,Y)}{\sigma _{X}\sigma _{Y}}\), where cov (X,Y), called covariance between X and Y, equals \(\mathbb {E}[(X-\mathbb {E}[X]).(Y-\mathbb {E}[Y])] = \mathbb {E}[XY]-\mathbb {E}[X]\mathbb {E}[Y]\) and where σ X and σ Y respectively denotes the standard deviation of X and Y. The linear correlation coefficient can be approximated from realizations samples (x i )1≤in and (y i )1≤in of X and Y respectively. For this approximation, the following so-called Pearson’s coefficient is usually involved:

$$ \hat \rho(X,Y) = \frac {n{\sum}_{i}x_{i}y_{i}-{\sum}_{i} x_{i} {\sum}_{j} y_{j}}{\sqrt{n{\sum}_{i}{x_{i}^{2}} - \big({\sum}_{i} x_{i}\big)^{2}}\sqrt{n{\sum}_{j} {y_{j}^{2}} - \big({\sum}_{j} y_{j}\big)^{2}}}\enspace . $$
(1)

Attack Context. In the subsequent descriptions of side-channel analyses, an algorithm \(\mathcal {A}\) is modelled by a sequence of elementary calculations (C i ) i that are Turing machines augmented with a common random access memory (see [44] for more details about this model). Each elementary calculation C i reads its input X i in this memory and updates it with its output O i =C i (O i ). All the attacks below target a variable O(s,X) defined as the output of a specific computation (e.g. a multiplication) performed by the device and parametrized by a secret sub-part s and a public variableFootnote 2 X. In the following, we shall use O instead of O(s,X) if there is no ambiguity on s and X.

To recover information on s, the attacks are performed on a sample of observations related to the processing of O by the device. Each of those observations, such as power consumption, electromagnetic emanations, and so on, is usually composed of several physical measurements corresponding to leakages at different times t i . It can hence be viewed as a realization of a multivariate random variable L whose coordinates L[i] satisfy:

$$ \textit{\textbf{L}}[i] = \varphi_{i}\left(O\right) +B_{i} \enspace, $$
(2)

where φ i only depends on the device behaviour at time t i during the processing of O and B i is an independent Gaussian noise with zero mean and standard deviation σ i . The function φ i is a priori unknown. The index i will be sometimes omitted. In this case, it is assumed that the same function is associated to all the time indices. See Fig. 1 to illustrate the notations and Appendix A for an extension of the definitions and notations to higher-order contexts.

Fig. 1
figure 1

Power consumption of the processing of O

An SCA is based on the Hypothesis Testing principle [38]. To make this test, a set of prediction values hj are deduced from each hypothesis \(\hat {s}\) on s and from the sample of implementation inputs (x j ) corresponding to the observations. This step involves a leakage model function m that must have been priorly chosen by the attacker (for instance based on its knowledge on the attacked device architecture). With this model function, the prediction values h j are built s.t. \(\mathrm {h}_{j} = \textbf {m} (O(\hat {s},x_{j})) \). Eventually, the adversary uses a distinguisher ρ to compare the h j with the observations l j ↩L|X=x j . This results in a so-called score value, \({\Delta }[\hat {s}]\) for the hypothesis \(\hat {s}\). The tuple of scores \(({\Delta }[\hat {s}])_{\hat {s}}\) is called scores vector.

The overall set of SCA is usually split in two subclasses. The first one, called simple Side-Channel Analysis, contains all attacks where observations only need to be done on a single value of the public input parameter (this implies that all the x j are equal to a same value, say x). This set contains S-PA [39], S-EMA [27, 51] or S-TA (Timing Analysis) [39]. The second subclass, called advanced Side-Channel Analysis, includes attacks where observations of the targeted internal processing must be done for several public input parameters. In particular, it contains univariate SCA attacks such as D-PA [41], C-PA [15] or MI-PA [28] and multivariate SCA attacks such as HO-D-PA [41, 50] or HO-MI-PA [6]. We give hereafter a more formal description of those two subclasses. Simple SCA. The class of simple SCA includes all Vertical or Horizontal SCA where the adversary makes observations on a single input. Table 1 provides a description of a simple Side-Channel Analysis.Footnote 3

Table 1 Simple side-channel analysis

Remark 1

In theory, simple SCA may be conducted with a single observation. In practice however, it is often necessary to use several observations of the processing for the same variable x in order to reduce the noise impact. In this case, the statistical distinguisher ρ may for instance involve a preliminary step consisting in the averaging of the observations sample.

Example 1

(S-PA on an RSA implementation based on the left-to-right Square and Multiply algorithm) In this case, the target processing is the entire computation O=X s m o d n. For a constant message xX, the adversary starts by getting a sample of observations (l j )↩L|X=x. Then, ranging i from the index of the key most significant bit to 0, he makes an hypothesis \(\hat {s}[i]\). From the model, he hence gets, for each index i, a pattern \(p_{\hat {s} [i]}\) that is assumed to correspond to the power consumption profile of either a squaring plus a multiplication if \(\hat {s}[i]\) equals 1 or a squaring alone if \(\hat {s}[i]\) equals 0. Eventually, the adversary chooses the Euclidean distance as distinguisher Δ and computes it between \(\textit {\textbf {h}}=p_{\hat {s}[i]}\) and the part of the mean vector \( \frac {1}{N} {\sum }_{j} \textit {\textbf {l}}_{j}\) corresponding to the i t h step of the exponentiation. The hypothesis \(\hat {s}[i]\) minimizing the distance is assumed to be the most likely one.

Advanced SCA. All the attacks where observations must involve several inputs belong to the advanced SCA category. This kind of attacks follows the outlines given in Table 2.

Table 2 Advanced side-channel analysis (A-SCA)

Remark 2

Depending on the statistical treatment processed by the distinguisher, the latter one may include a particular leakage post-processing ε. This post-treatment may be used to select some particular points in the leakage traces and, possibly, to combine them. For instance, in a second-order advanced SCA involving the mutual information as distinguisher, the function ε can be defined such that \(\varepsilon \left (\textit {\textbf {L}}\right ) = \left (\textit {\textbf {L}}[p], \textit {\textbf {L}}[q] \right )\) for some constant indices (a.k.a. leakage times) p and q. In a second-order advanced SCA involving the correlation coefficient as distinguisher, ε may be defined such that \(\varepsilon \varepsilon \left (\textit {\textbf {L}}\right ) = (\textit {\textbf {L}}[p] - \mathbb {E}(\textit {\textbf {L}}[p])) \cdot (\textit {\textbf {L}}[q] - \mathbb {E}(\textit {\textbf {L}}[q]))\). Moreover, the choice of the model function must be done in accordance with the distinguisher (see e.g. [50] and [28]).

Example 2

In template A-SCA, the probability density function \(f_{\hat {s}, x_{j}}\) of the random variable \((\textit {\textbf {L}} | {S} = \hat {s}, X = x_{j})\) is estimated for all pairs \((\hat {s},x_{j})\). The model function m and the predictions (h j (⋅)) j are hence defined such that \(\textit {\textbf {h}}_{j}(\cdot )=\textbf {m} (o(\hat {s},x_{j})) = f_{\hat {s},x_{j}}(\cdot )\). The estimation may for instance be done on a copy of the attacked device on which an open access is allowed. Eventually, the distinguisher corresponds to a Maximum Likelihood Test: \({\Delta } [\hat {s}] = {\prod }_{j} f_{\hat {s},x_{j}}(\textit {\textbf {l}}_{j})\).

2.2 Leakage measurements and observations

In the literature, two main approaches have been defined to get the observations l j (which corresponds to the first step of the attacks in Tables 1 and 2). The first method simply consists in executing the implementation several times (with the same input in simple SCA or with several ones in advanced SCA) and in defining l j as the observation related to the j th algorithm execution. Those attacks are called Vertical. The second method refers to attacks where a single execution is needed and where each l j corresponds to the observation of a processing at a different time period during the latter. In this case, the index j refers to the time period. The underlying assumption is that all the observations rely on the same internal calculus O(s,X), parametrized by a same secret s and different known values x j in advanced SCA, or a constant one x in simple SCA. Attacks corresponding to this modus operandi are called Horizontal. Figure 2 illustrates the notations and the differences between the two modus operandi.

Fig. 2
figure 2

Vertical and Horizontal SCA

All the attacks discussed in Section 2.1 can be either Vertical or HorizontalFootnote 4. Even if the Horizontal or Vertical characteristic of an SCA has no impact on the attack steps themselves (as described in Tables 1 and 2), it does impact the implementation security analysis. Indeed, we will see in Section 5 that a countermeasure may become ineffective when going from one category of attacks to another one. We illustrate this in the context of secure RSA implementations.

2.3 Security evaluation

To analyse the resistance of an implementation against any of the attacks presented in previous sections, we can evaluate the following quantity:

$$\Big| \text{Pr}[\textbf{S}=\textbf{s}] - \text{Pr}\big[\textbf{S}=\textbf{s} \,|\, ({\Delta}[\hat{s}])_{\hat{s}} \big]\Big| \enspace .$$

Indeed, if this value turns out to be negligible, it means that the adversary does not gain any advantage in recovering the secret key s by processing the SCA than just guessing it at random from uniformly distributed values among the set of all possibilities. In this case, we consider the implementation to be resistant to the corresponding SCA attack (represented by the scores vector Δ[⋅]). This way of analyzing the security is very close to the approach based on guessing entropy introduced in [53]. Also, after considering Δ[⋅] as an Oracle with which the adversary interacts, the approach exactly corresponds to that followed in classical security proofs in modern cryptography.

2.4 Taxonomy

Based on the discussions conducted in previous sections, we propose here a general taxonomy for simple and advanced side-channel attacks. To name an attack we propose to use the convention [XXX]-[YYY]-[ZZZ] where:

  • XXX equals either S for simple SCA or is a reference to the statistical tool for advanced SCA (e.g. C for Correlation, MI for Mutual Information, ML for Maximum Likelihood, LR for Linear Regression, etc.). In case of multivariate SCA, we propose to pad the order/dimension followed by O at the left of the distinguisher letter.

  • YYY is an acronym referring to the leakage type; PA for Power Analysis, EMA for Electromagnetic Analysis, TA for Timing Attacks, etc.

  • ZZZ is optional and may be used to specify if the attack is profiled or not. In this case, ZZZ is replaced by P (for Profiling) or UnP (for UnProfiling). For instance, Template attack is a ML-PA-P attack.

Of course, all those attacks can be applied either on a Vertical or Horizontal mode. Figure 3 illustrates the taxonomy for some existing attacks.

Fig. 3
figure 3

Side-Channel Attacks

3 Side-channel attacks against elliptic curves

In this section, we study the resistance of several implementations of ECC algorithms with respect to horizontal SCA.

3.1 Background on Elliptic Curves

As this paper focuses on side-channel attacks on ECC, let us recall now some basics on elliptic curves and especially on the various ways of representing points on such objects (the reader could refer to [23, 32] for more details).

Throughout this paper, we are interested in elliptic curve implementations running on platforms (ASIC, FPGA, micro-controller) embedding a hardware modular multiplier (e.g. a 16-bit, 32-bit or 64-bit multiplier). On such implementations, the considered elliptic curves are usually defined over a prime finite field \(\mathbb {F}_{p}\). In the rest of this paper, we will assume that all curves are defined over \(\mathbb {F}_{p}\) with p≠{2,3}. The algorithm used for the hardware modular multiplication is assumed to be known to the attacker. Moreover, to simplify the attack descriptions, we assume hereafter that the latter multiplication is performed in a very simple way: a schoolbook long integer multiplication followed by a reduction. Most of current devices do not implement the modular multiplications that way, but the attacks described hereafter can always be adapted by changing the definition of the elementary operations of Section 4.3 (see Appendix B).

Definition. An elliptic curve E over a prime finite field \(\mathbb {F}_{p}\) with p≠{2,3} can be defined as an algebraic curve of affine reduced Weierstrass equation:

$$ (E) : y^{2} = x^{3} + ax + b \enspace, $$
(3)

with \((a,b) \in (\mathbb {F}_{p})^{2}\) and 4a 3+27b 2≠0. Let P=(x 1,y 1) and Q=(x 2,y 2) be two points on (E), the sum R=(x 3,y 3) of P and Q belongs to the curve under a well-known addition rule [37]. The set of pairs \((x,y) \in (\mathbb {F}_{p})^{2}\) belonging to (E), taken with an extra point 𝜗, called point at infinity, form an abelian group named \(E(\mathbb {F}_{p})\).

In the rest of the paper, the points will be represented using their projective coordinates. Namely, a point P=(x,y) is expressed as a triplet (X:Y:Z) such that X=x Z and Y=y Z. This choice is discussed in Appendix C.

3.2 Points operations in presence of SCA

This paper focusses on elliptic curves cryptosystems which involve the scalar multiplication [s]P, implemented with the well-known double and add algorithm.

In a non-protected implementation, the sequence of point doublings and point additions can reveal the value of s with a single leakage trace. Thus to protect the scheme against S-PA, the sequence of point operations must be independent from the secret value. This can be achieved in several ways. The double and add always algorithm [25] is the simplest solution. It consists in inserting dummy point additions each time the considered bit value of s is equal to 0. In average, this solution adds an overhead of \(\frac {\log _{2}(\textit{s})}{2}\) point additions. Another technique consists in using unified formulae for both addition and doubling [10, 11, 42]. Finally, the strategy which is usually adopted in constrained devices such as smart cards is atomicity since it achieves the best time/memory trade-off [17, 29, 43]. This principle is a refinement of the double and add always technique. It consists in writing addition and doubling operations as a sequence of a unique pattern. This pattern is itself a sequence of operations over \(\mathbb {F}_{p}\). Since the pattern is unique, the same sequence of field operations is repeated for the addition and the doubling, the only difference being the number of times the pattern is applied for each operation. It thus becomes impossible to distinguish one operation from the other or even to identify the starting and ending of these operations.

To defeat an atomic implementation, the adversary needs to use advanced side-channel attacks (see Section 2.1), such as D-PA, C-PA and so on. These attacks focus on the operations operands instead of only focusing on the kind of operations. They usually require more observations than for S-PA since they rely on statistical analyses. In the ECC literature, such attacks have only been investigated in the vertical setting, where they can be efficiently prevented by input randomization.

4 Horizontal collision correlation attack on ECC

We show hereafter that implementations combining atomicity and randomization techniques are in fact vulnerable to collision attacks in the horizontal setting. This raises the need for new dedicated countermeasures.

This section starts by recalling some basics on collision attacks. Then, assuming that the adversary is able to distinguish when two field multiplications have a common (possibly unknown) operand, we show how to exhibit flaws in the atomic algorithms proposed in [17, 29, 43] and also in implementations using the unified formulae for Edward’s curves [9]. Eventually, we apply the collision attack presented in the first subsection to show how to efficiently deal with the previous assumption.

4.1 Collision power analysis in the horizontal setting

To recover information on a subpart s of the secret s, collision side-channel analyses are usually performed on a sample of observations related to the processing, by the device, of two variables O 1 and O 2 that jointly depend on s. The advantage of those attacks, compared to the classical ones, is that the algorithm inputs can be unknown since the adversary does not need to compute predictions on the manipulated data. When performed in the horizontal setting, the observations on O 1 and O 2 are extracted from the same algorithm execution (see Section 2.1). Then, the correlation between the two samples of observations is estimated thanks to the Pearson’s coefficient (see (1)) in order to recover information on s. We sum up hereafter the outlines of this attack, that will be applied in the following.

Remark 3

In Table 3, we use Pearson’s coefficient to compare the two samples of observations but other choices are possible (e.g. mutual information).

Table 3 Collision power analysis

Remark 4

In order to deduce information on s from the knowledge of \(\hat \rho \), one may use for instance a Maximum Likelihood distinguisher (see a discussion on that point in Section 5).

In the next section, the attack in Table 3 is invoked as an Oracle enabling to detect whether two field multiplications share a common operand.

Assumption 1

The adversary can detect when two field multiplications have at least one operand in common.

In Section 4.3, we will come back to the latter hypothesis and will detail how it can indeed be satisfied in the particular context of ECC implementations on constrained systems.

4.2 Attacks on ECC implementations: core idea.

We start by presenting the principle of the attack on atomic implementations, and then on an implementation based on unified (addition and doubling) formulae over Edward’s curves.

Attack on Chevallier-Mames et al.’s Scheme.

In Chevallier-Mames et al.’s atomic scheme, historically the first one, the authors propose the three first patternsFootnote 5 given in Fig. 4 for the doubling of a point Q=(X 1:Y 1:Z 1) and the addition of Q with a second point P=(X 2:Y 2:Z 2).

Fig. 4
figure 4

Three first atomic patterns of point doubling and addition

As expected, and as a straightforward implication of the atomicity principle, the doubling and addition schemes perform exactly the same sequence of field operations if the star (dummy) operations are well chosenFootnote 6. This implies that it is impossible to distinguish a doubling from an addition by just looking at the sequence of calculations (i.e. by S-PA). Let us now focus on the operations’ operands. In the addition scheme, the field multiplications in Patterns 1 and 3 both involve the coordinate Z 2. On the contrary, the corresponding multiplications in the doubling scheme have a priori independent operands (indeed the first one corresponds to the multiplication X 1X 1, whereas the other one corresponds to \({Z_{1}^{2}}\cdot {Z_{1}^{2}}\)). If an adversary has a mean to detect this difference (which is actually the case under Assumption 1), then he is able to distinguish a doubling from an addition and thus to fully recover the secret scalar s. Indeed, let us focus on the processing of the second step of the double and add left-to-right algorithm, and let us denote by s the most significant bit of s. Depending on s, this sequence either corresponds to the processing of the doubling of Q=[2]P (case s=0) or to the addition of Q=[2]P with P (case s=1). Eventually, the results T 1 and T 2 of the field multiplications in respectively Patterns 1 and 3 satisfy:

$$\left\{ \begin{array}{l} T_{1} = \left(X_{1}\cdot X_{1}\right)^{1-s} \cdot \left(Z_{2}\cdot Z_{2}\right)^{s} \\ T_{2} =\left({Z_{1}^{2}}\cdot {Z_{1}^{2}} \right)^{1-s} \cdot \left({Z_{2}^{2}}\cdot Z_{2}\right)^{s} \\ \end{array}\right.\enspace , $$
(4)

where we recall that we have P=(X 2:Y 2:Z 2) and Q=(X 1:Y 1:Z 1). Equation 4 and Assumption 1 enables to deduce whether s equals 0 or 1. Applying this attack \(\log _{2}({s})\) times, all the bits of s can be recovered one after the other.

We now show that the same idea can successfully be applied to attack the other atomic implementations proposed in the literature, namely those of Longa [43] and Giraud and Verneuil [29].

Attack on Longa’s Scheme.

The atomic pattern introduced by Longa in [43] is more efficient than that of Chevallier-Mames et al. ’s scheme. This improvement is got by combining affine and Jacobian coordinates in the points addition, see Fig. 5.

Fig. 5
figure 5

The first and third patterns used in atomicity of Longa

It can be seen that the first and third patterns of Longa’s scheme contain two field multiplications that either have no operand in common (doubling case) or share the operand Z 1 (addition case). Similarly to Chevallier-Mames et al.’s scheme, we can hence define the two following random variables:

$$\left\{ \begin{array}{l} T_{1} = \left(Z_{1}^{}\cdot Z_{1}^{}\right)^{1-s} \cdot \left(Z_{1}^{}\cdot Z_{1}^{}\right)^{s}\\ T_{2} =\left(X_{1}^{}\cdot 4{Y_{1}^{2}}\right)^{1-s} \cdot \left({Z_{1}^{2}} \cdot Z_{1}^{}\right)^{s} \\ \end{array}\right. \enspace , $$
(5)

Under Assumption 1, it leads to the recovery of s.

Attack on Giraud and Verneuil’s Scheme.

Giraud and Verneuil introduced in [29] a new atomic pattern which reduces the number of field additions, negations and dummy operations (⋆) compared to the above proposals. The patterns are recalled in Fig. 6.

Fig. 6
figure 6

The beginning of Giraud and Verneuil’s patterns

Once again, depending on the secret s, we observe a repetition of two multiplications with a common operand in the first pattern of the addition scheme (ADD 1.), leading to the following equations:

$$\left\{ \begin{array}{l} T_{1} = \left(X_{1}\cdot X_{1}\right)^{1-s} \cdot \left(Z_{2}\cdot Z_{2}\right)^{s} \\ T_{2} =\left(2Y_{1}\cdot Y_{1}\right)^{1-s} \cdot \left({Z_{2}^{2}}\cdot Z_{2}\right)^{s} \\ \end{array}\right. \enspace , $$
(6)

which, under Assumption 1, leads to the recovery of s.

Remark 5

A second version of the patterns in Fig. 6 has been proposed in [29] which allows to save more field additions and negations without addition of dummy operations. This proposal share the same weakness as the previous ones and our attack still applies.

Attack on Edward’s Curves.

Edward’s representation of elliptic curves has been introduced in [26]. In a subsequent paper [10], Bernstein and Lange homogenized the curve equation in order to avoid field inversions in Edward’s addition and doubling formulae. For this homogenized representation, points addition and doubling are both computed thanks to the same formula. Let P=(X 1:Y 1:Z 1) and Q=(X 2:Y 2:Z 2) be two points on the curve, the sum R=(X 3:Y 3:Z 3) of P and Q is given by the following system:

$$ \left\{\begin{array}{lll} X_{3} &=& Z_{1}Z_{2}(X_{1}Y_{2}-Y_{1}X_{2})(X_{1}Y_{1}{Z_{2}^{2}}+{Z_{1}^{2}}X_{2}Y_{2})\\ Y_{3} &=& Z_{1}Z_{2}(X_{1}X_{2}+Y_{1}Y_{2})(X_{1}Y_{1}{Z_{2}^{2}}-{Z_{1}^{2}}X_{2}Y_{2})\\ Z_{3} &=& d{Z_{1}^{2}}{Z_{2}^{2}}(X_{1}X_{2}+Y_{1}Y_{2})(X_{1}Y_{2}-Y_{1}X_{2}) \end{array}\right.\enspace, $$
(7)

where d is some constant related to the Edward curve equation. Formulae (7) works whether P equals Q or not, meaning that it applies similarly for addition and doubling. This is one of the main advantage of Edward’s representation compared to the other ones (e.g. Projectives) where such unified formulae do not exist. For attack illustration purpose, we give in Fig. 7 a sequence of operations which could appear when evaluating the formulae in (7) either to compute P+P or P+Q.

Fig. 7
figure 7

First steps of algorithm for addition

Here, we can exploit the fact that the multiplication X 1 Z 1 is performed twice if P=Q (i.e when the formula processed a doubling), which is not the case otherwise (see Fig. 7). We can hence define the two following random variables:

$$\left\{ \begin{array}{l} T_{1} = \left(X_{1}\cdot Z_{1}\right)^{1-s} \cdot \left(X_{1}\cdot Z_{2}\right)^{s} \\ T_{2} =\left(X_{1}\cdot Z_{1}\right)^{1-s} \cdot \left(X_{2}\cdot Z_{1}\right)^{s} \\ \end{array}\right. \enspace , $$
(8)

which, under Assumption 1, leads to the recovery of s.

Remark 6

This technique still applies in the case of other unified formulae (e.g. those introduced in [16]). Indeed, the sequence of operations in [16] present the same weaknesses as illustrated in Fig. 7. The multiplication X 1 Z 1 is performed twice if the current operation is a doubling (see the first and third multiplications in [16, Section 3, Fig. 1]).

In [10], Bernstein and Lange propose a sequence of operations to evaluate (7) while minimizing the total number of multiplications and squarings. We recall it in Fig. 8.

Fig. 8
figure 8

First steps of algorithm for addition

For P=Q or PQ, it may be checked that the sequence in Fig. 8 does not contain two terms which share the same operand. This implies that the attack strategy developed against previous atomic schemes does no longer apply here. It is however possible to follow another attack strategy relying on some defect of the double-and-add algorithm relatively to collision attacks. Indeed, with the double-and-add algorithm, every addition operation leads to perform an addition by a same point : the base point of the scalar multiplication. If we assume that the position in the operations flow of such an addition is known, for example if the most or least significant bit of the exponent is 1, then it is trivial to identify all the positions of the point additions using Assumption 1. Remark that this is quite general : this attack does not rely on some property of the unified sequence of operations described by Bernstein and Lange to compute point additions on Edward curves. It shows that the double-and-add algorithm is vulnerable to the attacks we present even when used with unified formulae.

4.3 Distinguishing common operands in multiplications

In this section we apply the collision attack principle presented in Section 4.1 to show how an adversary may deal with Assumption 1. This will conclude our attack description. As mentioned before, we assume that the field multiplications are implemented in an arithmetic co-processor with a Long Integer Multiplication (LIM) followed by a reduction. Many other multiplication methods exist but our attack can always be slightly adapted to also efficiently apply to those methods (see Appendix B).

Let ω denote an architecture size (e.g. ω equals 8, 16 or 32) and let us denote by (X[t],…,X[1])2ω the base- 2ω representation of an integer. We recall hereafter the main steps of the LIM when applied between two integers X and Y.

figure a

Let W, X, Y and Z be four independent values of size t ω bits. We show hereafter how to distinguish by side-channel analysis the following three cases:

  • Case (0) where the device processes LIM(X,W) and LIM(Y,Z) (all the operands are independent),

  • Case (1) where LIM(X,Z) and LIM(Y,Z) are processed (the two LIM processings share an operand).

  • Case (2) where LIM(X,Z) is processed two times (the two LIM processings operate on the same inputs).

To summarize, case (i) corresponds to two integer multiplications sharing i input factor(s).

For such a purpose, and by analogy with our side-channel model in Section 2.1 and Table 3, we denote by C 1 (resp. C 2) the multiplication in the loop during the first LIM processing (resp. the second O processing) and by O 1 (resp. O 2) its result. The output of each ω-bit word multiplication during the loop may be viewed as a realization of the random variable \(O_{a, b}^{1}\) (resp. \(O_{a,b}^{2}\)). To each of those realizations we associate a leakage \(\ell _{a,b}^{1}\) (resp. \(\ell ^{2}_{a,b}\)). To distinguish between cases (1), (2), and (3), we directly apply the attack described in Table 3 and we compute the Pearson’s correlation coefficient:

$$ \rho\left( (\ell_{a,b}^{1})_{a,b} , (\ell_{a,b}^{2})_{a,b} \right) \enspace . $$
(9)

In the specific case of the LIM, where all the word products X[a]Y[b] are processed, on can also average the observations of computations sharing a same input word Y[b]. This post-processing on the observations leads to evaluate the following Pearson coefficient in the attack:

$$ \rho^{mean}\left( (\ell_{a,b}^{1})_{a,b} , (\ell_{a,b}^{2})_{a,b} \right) =\rho\left( \left(\frac1t\sum\limits_{a} \ell_{a,b}^{1}\right)_{b} , \left(\frac1t\sum\limits_{a} \ell_{a,b}^{2}\right)_{b}\right) \enspace . $$
(10)

In the following section, we actually argue that this second correlation coefficient gives better results, which is confirmed by our attacks simulations reported in Section 5.

4.4 Study of the attack soundness

This section aims at arguing on the soundness of the approach described previously to distinguish common operands in multiplications. For such a purpose, we explicit formulae for the linear correlation coefficients corresponding to Pearson’s coefficients given in (9) and (10). Indeed, Pearson’s coefficient can be viewed as an estimator of the linear correlation coefficient: when the number of samples tends toward infinity, it tends toward the linear correlation coefficient.

For simplicity, the development is made under the assumption that the device leaks the Hamming weight of the processed data but similar developments could be done for other models and would lead to other expressions. Under the Hamming weight assumption, we have \(\ell _{a,b}^{1} \hookleftarrow \mathbf {HW} (O_{a,b}^{1}) + B_{a,b}^{1}\) and \(\ell _{a,b}^{2} \hookleftarrow \mathbf {HW} (O_{a,b}^{2}) + B_{a,b}^{2}\) where the \(B^{i}_{a,b}\) random variables are independent Gaussian random variables with zero mean and same standard deviation σ. The three cases presented in last section, 0-, 1- or 2-shared factors in two t-word integers multiplications, can be expressed in the following manner:

  • \(O_{a,b}^{1} = X[a].W[b], O_{a,b}^{2} = Y[a].Z[b]\)

  • \(O_{a,b}^{1} = X[a].Z[b], O_{a,b}^{2} = Y[a].Z[b]\)

  • \(O_{a,b}^{1} = X[a].Z[b], O_{a,b}^{2} = X[a].Z[b]\)

We model the integers X,Y,Z,W as vectors of independent, uniformly distributed, ω-bit word random variables. Furthermore, as seen in the previous subsection, we can either apply the Pearson coefficient directly using (9), or take advantage of the properties of the LIM to aggregate all observations pertaining to a same word of the second factor of the multiplication using (10). In the former case, the pairs \((\ell _{a,b}^{1}, \ell _{a,b}^{2})\) can be seen as t 2 samples of a pair of noisy random variables following distributions which depend on the number of shared factors between O 1 and O 2. In the latter case, the pairs of aggregated values can be seen as t samples of pairs of noisy averaged random variables of the form \(\frac {1}{t}{\sum }_{a=1}^{t} \text {HW}(U[a]V[b])\), with distributions depending on the number of shared factors.

For everyone of the 6 cases (3 attack cases times 2 types of correlation processing – normal or with averaging –), the correlation between the two noisy random variables can be expressed as a function of statistical parameters of the Hamming weights of word products (variance and covariance), and the standard deviation σ of the noise random variables. To get the expressions, the bilinearity of the covariance, the independence of the noise random variables and the word random variables may be used, which eventually results in the following formulae:

$$\begin{array}{c} \rho_{0} = \rho_{0}^{mean} = 0;\\ \rho_{1} = \displaystyle\frac{cov(HW(X.Z), HW(Y.Z))}{var(HW(X.Z)) + \sigma^{2}}; \rho_{2} = \displaystyle\frac{var(HW(X.Z))}{var(HW(X.Z)) + \sigma^{2}};\\ \rho_{1} = \displaystyle\frac{t^{2}cov(HW(X.Z),HW(Y.Z))}{tvar(HW(X.Z)) + t(t-1)cov(HW(X.Z),HW(Y.Z)) + t\sigma^{2}};\\ \rho_{2} = \displaystyle\frac{tvar(HW(X.Z)) + t(t-1)cov(HW(X.Z),HW(Y.Z))}{tvar(HW(X.Z)) + t(t-1)cov(HW(X.Z),HW(Y.Z)) + t\sigma^{2}} \enspace , \end{array} $$

where the index of the correlation coefficient refers to the attack case (a.k.a. the number of shared operands between the two word products).

Establishing explicit expressions of the variance and covariance of the Hamming weight of products of uniformly distributed random variables in the general case remains an open question. There are two favourable cases where we are able to obtain such expressions.

The parameter ω is small.

The distribution considered can be computed and thus their variance and covariance can be derived. For example for ω=8, we have

$$var(HW(X.Z)) = \displaystyle\frac{1136522959}{2^{28}} $$
$$cov(HW(X.Z), HW(Y.Z)) = \displaystyle\frac{279558159}{2^{28}} $$

Hamming weight of least significant bits.

The elementary word multiplication takes as input two ω-bit words and outputs a 2ω-bit word. This output is stored in 2 words. If the leakage associated to the multiplication can be decomposed in two parts, associated to the most significant bits and least significant bits of the results, then word multiplication mod 2ω can be considered instead of word multiplication in the ring of the integers. Due to the ring structure of \(\mathbb {Z}/2^{\omega }\mathbb {Z}\), explicit formulae giving the variances and covariances of interest as function of ω can be derived.

We have

$$\begin{array}{rcl} var(HW(X.Z)) & = &\frac{1}{2^{2\omega+2}} \left((\omega+1)2^{2\omega} -2\omega 2^{\omega}- 1\right),\\ cov(HW(X.Z),HW(Y.Z)) &= &\frac{1}{2^{2\omega+2}} \left(2.2^{2\omega} -(2\omega+1) 2^{\omega}- 1\right), \end{array} $$
$$\begin{array}{rclrcl} \rho_{1} &=& \displaystyle\frac{1}{1+\frac{2^{2\omega+2}\sigma^{2} + (\omega-1)2^{2\omega}+2^{\omega}}{2.2^{2\omega} -(2\omega+1) 2^{\omega}- 1}}, & \rho_{1}^{mean} &=&\displaystyle\frac{1}{1+\frac{1}{t}\frac{2^{2\omega+2}\sigma^{2} + (\omega-1)2^{2\omega}+2^{\omega}}{2.2^{2\omega} -(2\omega+1) 2^{\omega}- 1}} ,\\ \rho_{2} &=&\displaystyle\frac{1}{1+\frac{2^{2\omega+2}\sigma^{2}}{(\omega+1)2^{2\omega} -2\omega 2^{\omega}- 1}}, & \rho_{2}^{mean} &=&\displaystyle\frac{1}{1+\frac{2^{2\omega+2}\sigma^{2}}{t(2.2^{2\omega} -(2\omega+1) 2^{\omega}- 1) + 2^{\omega}[(\omega-1)2^{\omega}+1]}}.\\ \end{array} $$

Note that when t tends towards infinity, the correlation coefficient of averaged variables tends towards 1 (which is optimal), whereas the correlation coefficient when considering directly the random variables has some value strictly lower than 1 independently of the size of the sample.

5 Experiments

In order to validate the approach presented in Section 4.3 and thus to illustrate the practical feasibility of our attack, we performed several simulation campaigns for various sizes of elliptic curves, namely \(\lceil \log _{2}(p) \rceil \in \{160,256,384\}\), implemented on different kinds of architectures, namely ω∈{8,32} using the Chevallier-Mames et al. ’s scheme. Each experiment has been performed in the same way. For each (p,ω), we computed Pearson’s correlation coefficient (10) between the sample of observations coming from the leakages on operations C1 and C2 in the two following cases Footnote 7:

  • when the secret bit s is equal to 1, that is when an addition is performed (which implies correlated random variables, see (4)),

  • when the secret bit s is equal to 0, that is when a doubling operation is performed (which implies independent random variables, see (4)).

From the configuration (p,ω), the size t of the observations’ samples used in the attack can be directly deduced: it equals \(\lceil \frac {\log _{2}(p)}{\omega } \rceil \). The quality of the estimations of the correlation coefficient by Pearson’s coefficient depends on both the observations signal to noise ratio (SNR) and t. When the SNR tends towards 0, the sample size t must tend towards infinity to deal with the noise. Since, in our attack the samples size cannot be increased (it indeed only depends on the implementation parameters p and ω), our correlation estimations tend towards zero when the SNR decreases. As a consequence, distinguishing the two Pearson coefficients coming from s=0 and s=1 becomes harder when the SNR decreases. This observation raises the need for a powerful (and robust to noise) test to distinguish the two coefficients. To take this into account for each setting (p,ω) and several SNR, we computed an histogram approximation of the distribution of Pearson’s coefficient defined in (10) over samples of size t. To build those kinds of templates, leakages have been generated in the Hamming weight model with additive Gaussian noise of mean 0 and standard deviationFootnote 8 σ. When there is no noise at all, namely when σ=0 (i.e \(\text {SNR}=+\infty \)), one can observe that the mean of Pearson’s coefficient is coherent with the predictions evaluated in Section 4.4.

Figures 9, 10, 11 and 12 illustrate the spreading of the obtained Pearson’s coefficients. The curves indicate the evolution of the maxima of the distributions, and the colored cone around the maximum indicates the smallest interval containing more than half of the probability weight of the Pearson’s coefficient distribution. This gives us information about the amount of trust we can put into the values obtained during the attacks. It also shows whether a distinction between the right hypothesis and the wrong one can easily be highlighted. For each SNR value (denoted by τ) and each sample size t, let us denote by \(\hat {\rho }_{0,t}(\tau )\) (resp. \(\hat {\rho }_{1,t}(\tau )\)) the random variable associated to the processing of (10) for s=0 (resp. for s=1). Clearly, the efficiency of the attack described in Section 4 depends on the ability of the adversary to distinguish, for a fixed pair (t,τ), the distribution of \(\hat {\rho }_{0,t}(\tau )\) from that of \(\hat {\rho }_{1,t}(\tau )\). In other terms, once the adversary has computed a Pearson coefficient \(\hat {\rho }\) he must decide between the two following hypotheses; \(\mathrm {H}_{0}: \hat {\rho } \hookleftarrow \hat {\rho }_{0,t}(\tau )\) or \(\mathrm {H}_{1}: \hat {\rho } \hookleftarrow \hat {\rho }_{1,t}(\tau )\). For such a purpose, we propose here to apply a maximum likelihood strategy and to choose the hypothesis having the highest probability to occur. Based on the approximation of the Pearson’s coefficient we obtained, we computed the value \(\rho _{t}^{\text {limit}}(\tau )\) for which the values of the density probability function in both hypotheses are equal. During the attack, if \(\hat {\rho }\) is smaller than \(\rho _{t}^{\text {limit}}(\tau )\), the distinguisher chooses H0, otherwise it chooses H1. Attacks reported in Figs. 13 and 14 apply this strategy. They aim at recovering one bit of the secret scalar.

Fig. 9
figure 9

Pre-computations on w=8-bit registers

Fig. 10
figure 10

Pre-computations on w=8-bit registers

Fig. 11
figure 11

Pre-computations on w=32-bit registers

Fig. 12
figure 12

Pre-computations on w=32-bit registers

Fig. 13
figure 13

Success rate of the attack on 8-bit registers

Fig. 14
figure 14

Success rate of the attack on 32-bit registers

Remark 7

Since the adversary is not assumed to know the exact leakage SNR, the maximum likelihood can be computed for several SNR values τ starting from \(\infty \) to some pre-defined threshold. This problematic occurs each time that the principle of collision attacks is applied.

Remark 8

For a curve of size \(n=\lceil \log _{2}(p) \rceil \) and a ω-bit architecture, the adversary can have a sample of \(t=\lceil \frac {n}{\omega }\rceil \) observations if he averages over the columns and \(t=\lceil (\frac {n}{\omega })^{2}\rceil \) without averaging. All experiments provided in this section have been performed using the “average” strategy.

This attack works for any kind of architecture, even for a 32-bit one (see Fig. 14), which is the most common case in nowadays implementations. In the presence of noise, the attack success decreases highly but stays quite successful for curves of size 160, 256 and 384 bits. In all experiments (Figs. 13 and 14), we also observe that the success rate of our attack increases when the size of the curve becomes larger. This behaviour can be explained by the increasing number of observations available in this case. Paradoxically, it means that when the theoretical level of security becomes stronger (i.e p is large), resistance against side-channel attacks becomes weaker. This fact stands in general for horizontal attacks and has already been noticed in [19, 56].

6 Discussion about possible countermeasures

In this section we first recall classical countermeasures that are usually involved to defeat simple SCA and vertical advanced SCA, and we discuss about their (in)efficiency in the horizontal setting. In particular, following the same reasoning as in [20], we alert on the fact that a countermeasure effectiveness can be annihilated when going from the vertical context to the horizontal one. Then, in Section 6.2, we particularly focus on several countermeasures dedicated to Horizontal advanced SCA, trying to identify those that are the most effective against the collisions attack proposed in this paper.

6.1 Overview of classical countermeasures on elliptic curves

Careful choice of the elliptic curve scalar multiplication.

A first natural idea is to look for a scalar multiplication scheme inherently resistant against our horizontal collisions attack. The choice of a regular scheme (always performing the same sequence of additions and doublings whatever the secret scalar) seems a priori pertinent as distinguishing the two operations brings no sensitive information. From this point of view, the schemes Double & Add Always [25] or the Montgomery Ladder [47] look interesting. We recall them hereafter:

figure b
figure c

Unfortunately, even if the above algorithms withstand straightforward applications of our attack, they stay vulnerable to a slight adaptation of it. Let us respectively denote by \(R_{0}^{(i)}\) and \(R_{1}^{(i)}\) the values of the registers R 0 and R 1 after the i thiteration of the loop.

  • For Algorithm 2, we have \(R_{0}^{(i+1)}=2R_{0}^{(i)}\). This doubling and the addition performed to compute \(R_{1}^{(i)}\) have the first operand \(R_{0}^{(i)}\) in common iff s i =0 (otherwise one can assume that they operate on independent operands). It is therefore possible to recover the value of each bit s i by applying the idea of our horizontal collisions attack Footnote 9 to the sequences of field operations involved in both \(R_{0}^{(i+1)}\leftarrow 2R_{0}^{(i)}\) and \(R_{1}^{(i)}\leftarrow R_{0}^{(i)}+P\).

  • For Algorithm 3, we have \(R_{1-s_{i+1}}^{(i+1)}=R_{0}^{(i)}+R_{1}^{(i)}\). This addition and the subsequent doubling performed to compute \(R_{s_{i}}^{(i+1)}\) have the operand \(R_{0}^{(i)}\) in common iff s i+1=0. It is therefore possible to recover the value of each bit s i+1 by applying the idea of our horizontal collisions attack to the manipulations of the field coordinates of the first operand in both \(R_{1-s_{i+1}}^{(i+1)}\leftarrow R_{0}^{(i)} + R_{1}^{(i)}\) and \(R_{s_{i+1}}^{(i+1)}\leftarrow 2R_{s_{i+1}}^{(i)}\). The same kind of flaw can also be found in the left-to-right version of the Montgomery Ladder proposed in [35].

This attack is defeated if the step \(R_{1-s_{i}}\leftarrow R_{0}+R_{1}\) in Algorithm 3 is replaced by \(R_{1-s_{i}}\leftarrow R_{1-s_{i}}+R_{s_{i}}\) (which is actually the way Montgomery Ladder is classically described in the SCA literature – e.g. [31] –).

Randomizing the scalar.

This countermeasure was proposed by Coron in [25]. It consists in changing the value of the secret scalar in the point multiplication for each computation. For most schemes based on elliptic curves this countermeasure may be viewed as part of the protocol since the secret used in the point multiplication changes at each execution of the protocol. This is for instance the case with ECDH and ECDSA. This explains why, usually, specific countermeasures against advanced (vertical) attacks are not implemented in the elliptic curve setting. When such a countermeasure is added on purpose or part of the protocol, it does not provide any protection against our attack since it only requires one power curve.

Blinding the base point.

This is the second countermeasure proposed by Coron in [25]. It consists in modifying the point P by adding a random point R. This countermeasure does not have any impact against our attack since the adversary recovers the value of the secret exponent independently from the base point value.

Randomizing the coordinates.

This third countermeasure of Coron [25], which modifies the coordinates of the point P, has no effect on our attack exactly for the same reasons as for the second countermeasure. Indeed, the secret scalar is recovered independently from the base point value.

Splitting the scalar.

It is considered in [18, 22]. It consists in splitting the scalar in two parts i.e in computing [s]P=[s 1]P+[s 2]P. This countermeasure decreases the vertical attacks efficiency by a quadratic factor since the two point multiplications need to be combined in a so-called second-order attack setting. Against our attack, this countermeasure is much less efficient. Indeed, its efficiency is only decreased by a factor 2 since we are able to recover the bits of s 1 first, then those of s 2 in an independent way.

Randomizing the curve or the field.

This idea is described in [4, 34, 54] with different techniques. All these techniques modify the input of the point multiplication but they do not hide the property exploited in most of our attacks, that is the reuse of the point coordinate Z in the addition operation, and thus turn out to be inefficient in our setting.

6.2 Investigating countermeasures inside modular multiplication

The previous section has raised the need for new techniques to thwart side-channel analysis in the horizontal setting. In this section, we deal with this issue by investigating whether the solutions proposed in [19, 20] and developed in [55, Sec. 2.7] in the context of RSA can be applied to ECC.

As discussed in Section 4.3 and Appendix B, the multiplication of two integers U=(U[t],…U[1])2ω and V=(V[t],…V[1])2ω frequently leads to the processing of the following ω-bit word multiplications:

$$\left( \begin{array}{cccc} U[1] V[1] & U[1] V[2] & {\cdots} & U[1] V[t] \\ U[2] V[1] & {\cdots} & {\cdots} & U[2] V[t] \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ U[t] V[1] \ & \ {\cdots} \ & {\cdots} & U[t] V[t] \end{array} \right) \enspace . $$

Essentially, our attack consists in detecting when the same value V is used for two different modular multiplications. For such a purpose we correlate the elements of the previous matrix with those of the following one:

$$\left( \begin{array}{cccc} U^{\prime}[1] V^{\prime}[1] & U^{\prime}[1] V^{\prime}[2] & {\cdots} & U^{\prime}[1] V^{\prime}[t] \\ U^{\prime}[2] V^{\prime}[1] & {\cdots} & {\cdots} & U^{\prime}[2] V^{\prime}[t] \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ U^{\prime}[t] V^{\prime}[1] & {\cdots} & {\cdots} & U^{\prime}[t] V^{\prime}[t] \end{array} \right) \enspace , $$

where \(V = V^{\prime }\) for s=1 and \(V \neq V^{\prime }\) for s=0.

Countermeasures proposed in [19, 20, 55] consist in protecting the U[i]V[j] (resp. \((U^{\prime }[i] V^{\prime }[j])\)) products by blinding some of the operands with different random values and/or by randomizing the order in which they are processed. We study hereafter the soundness of those techniques when applied in ECC context.

6.2.1 Operands blinding

This countermeasure blinds each U[i] and V[j] with two random values R 1 and R 2 such that U[i]V[j]=(U[i]−R 1)(V[j]−R 2)+R 1 V[j]+R 2 U[i]−R 1 R 2. Then, each of the four terms is computed independently. First of all, it must be noticed that the multiplicative masking of V[j] (by R 1) and U[i] (by R 2) is not effective when U[i] or V[j] is null, which introduces a flaw that may be exploited by C-PA [30]. Moreover, the values \(\frac {1}{t}({\sum }_{i} U[i])(V[j] - R_{2})\) are still correlated with the values \(\frac {1}{t}({\sum }_{i} U^{\prime }[i])(V^{\prime }[j] - R^{\prime }_{2})\) when \(V=V^{\prime }\). Thus, although this countermeasure decreases the attack efficiency, it does not totally remove the leakage.

6.2.2 Shuffling rows and columns

In this countermeasure, rows and columns of the matrix are permuted independently. This means that one permutation applies on the rows and a second one on the columns. However one can notice that the rows permutation is the same for each column and reciprocally. Shuffling rows has no impact since U is unknown in the attack. When averaging over each column of the matrix, we observe that \(\left (\frac {1}{t}({\sum }_{j} U[j]) V[i] \right )_{i}\) and \(\left (\frac {1}{t}({\sum }_{j} U^{\prime }[j]) V^{\prime }[i] \right )_{i}\) stay correlated when \(V[i]=V^{\prime }[i]\). However, due to the column permutation, the adversary needs to guess the value of the permutation in order to observe this correlation. As a consequence, this countermeasure adds a t! search factor to the computational time of the attack.

6.2.3 Shuffling and blinding

In this countermeasure, only the V[j] are blinded using t independent random values associated with each row, while the rows of the matrix are randomly permuted. Namely, the U[i] are blinded while the columns of the matrix are permuted. This countermeasure prevents our attack, but opens new issues such as the manipulation of the correcting factor related to the blinding part. This value could be exploited in a zero-value attack [30].

6.2.4 Global shuffling

This countermeasure, proposed in [7], starts from the shuffling and blinding countermeasure, but performs the shuffling of rows and columns simultaneously. This essentially replaces two permutations of size t by a single one of size t 2, hence increasing the security against brute force attack from 2(t!) to (t 2!). Of course, care should also be taken when propagating the carry during the reconstruction of UV from the U[i]V[j]. This point is successfully addressed in [7]. Eventually, among shuffling methods, this countermeasure seems to be the most interesting one to defeat our attack. It is however still an open problem to formally quantify its effectiveness.

7 Conclusion

In this paper, we investigated the horizontal correlation attacks, introduced by Clavier et al., in the context of ECC implementations. We showed that these attacks, although fundamentally belonging to the well-known advanced side-channel attacks, are not covered by traditional countermeasures such as randomization techniques. Indeed, we have shown that we are able to apply such horizontal attacks on state-of-the-art SCA-protected ECC implementations. We showed how to defeat all the atomic point addition and doubling schemes proposed in the literature and also the one using the unified formulas, even when combined with randomization. Our simulations confirmed the validity of our attack for classical sizes of curves. It stays applicable even in a moderate noise setting.

We feel that this topic opens many areas for further research. Namely, the formal study and proofs of countermeasures against horizontal attacks is necessary in order to effectively protect implementations against this kind of attacks. It would also be of interest to investigate the applicability of such attacks to other domains of cryptography, such as pairings, code-based cryptography or keyed hash functions.