1 Introduction

Not only do designers of cryptographic devices have to implement the algorithms efficiently, they also have to ensure that sensitive information that leaks through several side channels (time, temperature, power consumption, electromagnetic emanations, etc.) during the execution of an algorithm remains unexploited by an attacker. If not sufficiently protected, both symmetric and asymmetric cryptographic implementations are vulnerable to these so-called side-channel attacks (SCA). For public-key algorithms such as RSA, the main operation to be armoured consists of a multi-digit exponentiation over a finite ring. In this paper, we present an improved single-execution attack on a randomized implementation of RSA. However, the ideas and tools that we exploit would also apply in the context of CRT-RSA and (hyper)elliptic curves.

Attacking an exponentiation consists of identifying the bits of the exponent, a value that often has to be kept secret (it is either a secret key or a random secret value). Simple side-channel attacks [2], which use a single trace of execution, are easily defeated using so-called constant-time algorithms such as square-and-multiply-always [4], the Montgomery ladder [7] or atomicity [23].

However, these constant-time algorithms are not sufficient to defeat the more powerful differential [3] and correlation [5] attacks. Although very efficient against insufficiently protected implementations, these attacks suffer from the very large number of traces that must be collected in order to recover (part of) the secret. Collision attacks, proposed by Fouque in 2003 [6], are very efficient; they only require two traces of execution on well-chosen inputs. All these attacks are generally thwarted using exponent and/or message blinding based on elementary algebraic manipulations. For example, randomization of an RSA exponent relies on the fact that \(m^d \equiv m^{d + r\phi (n)}\) mod n for any (random) value \(r\) (see Sect. 5). Apart from these well-known tricks, randomization can also take place at the arithmetic level. The LRA concept [9], based on the Residue Number System, seems to be a robust, yet efficient [24, 25] alternative to more expensive (hardware) countermeasures.

Novel attacks [14–17] have recently emerged. Unlike the well-studied family of differential [3] and correlation attacks [5], these so-called horizontal correlation attacks (HCA) aim at correlating the Hamming weight HW(\(m\)) of a known message \(m\) with a set of well-chosen sample points \(t_{i}\) from one single trace. Some of them [15, 17] are indeed efficient in the presence of message blinding. They exploit the very high regularity of multi-digit exponentiation algorithms and represent a very serious threat against classical randomization countermeasures. A major advantage of single-trace-based attacks is their natural immunity to exponent blinding, since, in many cases, recovering a random exponent is sufficient to break the cryptosystem (see Sect. 5).

Profiled template attacks can recover the exponent using few traces. As the original template attack [11] suggests, the attacker must have full control of the device. In particular, he must be able to send plaintexts of his choice to a device with a known key. In the case of public-key algorithms, the public key is known and can also be used in the profiling phase. In this case, the pre-computation whose objective is to build the template set is referred to as supervised learning. In [13], supervised template attacks are successfully applied to modular exponentiations in order to distinguish squarings from multiplications. More recently, a template attack on constant-time exponentiation algorithms was presented in [12], while [19] suggests a technique to attack the exponent blinding. A template attack targeting the memory addressing was presented in [22]. All these methods fall into the class of supervised attacks, i.e., a learning phase is required during which the adversary constructs templates by exploring the statistical characteristics of various types of operations.

When the adversary does not have full control of the device, unsupervised methods are necessary. In [18], unsupervised learning was used to demonstrate the efficiency of localized EM attacks on exponentiations, using a \(k\)-means clustering algorithm to differentiate the attacked samples. Their attack is performed on an ECC [27] implementation over a binary field using Lopez-Dahab coordinates [26]. The scalar is recovered using leakages collected during the execution of a single scalar multiplication (\(k \in \mathbb {Z}, P \in E(\mathbb {F}_{2^m}) \longrightarrow [k]P \in E(\mathbb {F}_{2^m})\)). However, their attack relies on the ability to acquire several simultaneous EM traces from different probe positions. The leakages obtained from these multi-measurement sources are then combined in order to improve the signal-to-noise ratio. By doing so, they managed to classify the sampled points into two distinct sets which correspond to the zero bits (resp. non-zero bits) of the scalar \(k\).

In this paper, we present a single-trace, single-probe unsupervised attack, i.e., the side-channel data is collected from one EM probe only. In the next sections, we present the setting and the statistical tools that we used to recover the entire exponent of a constant-time, randomized RSA implementation. Our attack is unsupervised because it does not require any a priori knowledge of the device; in particular, we did not use the public key or send any chosen messages in order to learn the characteristics of the device. The chip under attack is a constant-time, RNS-based FPGA implementation of RSA protected with the Leak Resistant Arithmetic [9] and exponent blinding. Since all manipulated data are randomized, we explore the remaining leakages due to control instructions and memory activities. As previously demonstrated in the literature [20], memory and register addresses leak information related to the secret key. Instead of using simultaneous measurements as in [18], we combine the cluster classifications of several samples from each bit of the exponent. We then process the probabilities obtained from this first phase to recover the entire exponent. Our attack consists of four phases: trace pre-processing, identification of the points of interest, fuzzy \(k\)-means clustering, and exponent recovery. For this final phase, we present results obtained with three different statistical techniques (majority rule, normal probability density function and Bayesian classifier).

The paper is organized as follows: Sect. 2 gives details about the randomized exponentiation and the device under attack. The unsupervised learning based on clustering algorithms is detailed in Sect. 3. Section 4 presents the attack in detail and the results that we obtained with the three statistical tools mentioned above. Section 5 explains how the private key can be obtained from recovered randomized exponents. Possible countermeasures are suggested in Sect. 6.

2 The Randomized Exponentiation and the Device Under Test

The device under attack is an RNS-based implementation of RSA mapped onto a Spartan-3E xc3s1600 FPGA. For demonstration purposes, we considered a very weak 512-bit RSA. The modular exponentiation is computed with the regular and SPA-protected Montgomery ladder [8] using two sets of RNS bases \(\mathcal {A}\) and \(\mathcal {B}\) [10]. The atomic square-and-multiply [23] is also a regular, and faster, exponentiation algorithm. However, as shown in [15], randomized exponentiations can still be attacked through horizontal correlation attacks (HCA) if one of the intermediate operands, in this case the randomized input message, is used in several modular multiplications.

According to the leak resistant arithmetic (LRA) concept [9], the RNS moduli can be randomized before each exponentiation. This countermeasure acts as a message blinding technique because it brings a high degree of randomization to the data. Furthermore, HCA exploits the regularity of long-integer multiplication (or squaring); the parallel RNS arithmetic is thus a very limiting factor for this attack. Moreover, our hardware is protected with exponent blinding. Algorithm 1 shows the randomized exponentiation.

Algorithm 1. The randomized exponentiation.

The operation \(MM(x, y, N, \mathcal {B}, \mathcal {A})\) returns \(xyB^{-1}\) mod N in the two RNS bases \(\mathcal {A}\) and \(\mathcal {B}\). Both squarings and multiplications are computed with the same number of clock cycles.
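Since Algorithm 1 is only shown as a float above, the following Python sketch illustrates its structure as described in this section: LRA base randomization folded into the \(MM\) operation, exponent blinding with a random 32-bit factor (the blinding-factor width used in Sect. 5), and a regular Montgomery ladder performing one multiplication and one squaring per bit. The helpers to_mont, from_mont and mm are assumptions for illustration, not the actual hardware interface.

```python
import secrets

def montgomery_ladder_rsa(m, d, phi_n, mm, to_mont, from_mont):
    """Sketch of the randomized Montgomery ladder exponentiation.

    m        : input message
    d        : secret exponent
    phi_n    : Euler's totient of the modulus N
    mm       : mm(x, y) -> x*y*B^-1 mod N, i.e. MM() in the randomly
               drawn RNS bases A and B (assumed helper)
    to_mont / from_mont : conversion into / out of the Montgomery
               domain for the chosen bases (assumed helpers)
    """
    # Exponent blinding: d' = d + r*phi(N) for a random 32-bit r,
    # so that m^d' = m^d (mod N).
    r = secrets.randbits(32)
    d_blind = d + r * phi_n

    # Ladder invariant: a1 = a0 * m (in the Montgomery domain).
    a0 = to_mont(1)
    a1 = to_mont(m)

    # Regular ladder: one multiplication M_i and one squaring S_i per
    # bit, whatever its value. The exponent bit only selects WHICH
    # operands are read and written -- exactly the addressing leakage
    # exploited by the attack presented below.
    for bit in bin(d_blind)[2:]:          # MSB first
        if bit == '0':
            a1 = mm(a0, a1)               # multiplication M_i
            a0 = mm(a0, a0)               # squaring       S_i
        else:
            a0 = mm(a0, a1)               # multiplication M_i
            a1 = mm(a1, a1)               # squaring       S_i
    return from_mont(a0)
```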

First, as the exponent is randomized, a single-trace attack is the only option. Further, because the manipulated data are randomized with LRA, the target of our unsupervised attack is not the data contribution in the EM traces. By data, we mean the intermediate variables which depend on the randomly selected RNS bases and on the input message. Exponent-dependent decisions are taken by the architecture's control logic in order to determine the memory addresses for reading or writing operands before, during and after the modular multiplications. These conditional tests, as well as the accessed memory addresses, cause subtle leakages of information. These are the only sources of leakage that we exploit in the present unsupervised attack. We present the details of our attack in the next sections.

3 Unsupervised Learning and the Clustering Algorithms

Clustering is one of the most frequently used data mining techniques: an unsupervised learning process that partitions a data set into sub-groups so that the instances within a group are similar to each other and dissimilar to the instances of other groups. That is, we shall see what can be done when the collected samples are unlabelled and must be grouped into homogeneous clusters. Two different clustering methods are used in this work: the \(k\)-means and the fuzzy \(k\)-means algorithms [28].

The \(k\)-means algorithm is a geometric procedure for finding \(c\) means or centers (\(\mu _{1},\ldots ,\mu _{c}\)) from a set of \(n\) samples \(x_{j}\), where \(1 \le j \le n\). The initialization phase consists in defining the number of clusters \(c\) and assigning a random sample to each mean \(\mu _{i}\). Thereafter, the algorithm computes the squared Euclidean distances \(ED_{i,j}=\parallel x_{j}-\mu _{i}\parallel ^{2}\) for all \(n\) samples to obtain the maximum-likelihood estimate of the means \(\mu _{i}\). The \(k\)-means algorithm is shown in Algorithm 2.

Algorithm 2. The \(k\)-means algorithm.
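For concreteness, a minimal Python version of Algorithm 2 for the one-dimensional compressed samples used later in this paper (with \(c=2\) clusters) might read as follows; the function name and defaults are ours.

```python
import random

def k_means(samples, c=2, max_iter=100):
    """Hard k-means on scalar samples (Algorithm 2).

    Returns the c means and, for each sample, the index of its
    nearest mean (its cluster label)."""
    means = random.sample(list(samples), c)   # random initial means
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment: each sample joins the closest mean, using the
        # squared Euclidean distance ED_ij = ||x_j - mu_i||^2.
        new_labels = [min(range(c), key=lambda i: (x - means[i]) ** 2)
                      for x in samples]
        # Update: recompute each mean from its current members.
        new_means = []
        for i in range(c):
            members = [x for x, l in zip(samples, new_labels) if l == i]
            new_means.append(sum(members) / len(members) if members
                             else means[i])
        if new_labels == labels:              # no change: converged
            return new_means, new_labels
        means, labels = new_means, new_labels
    return means, labels
```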

The \(k\)-means algorithm iterates until the means \(\mu _{i}\) no longer change. At every iteration, each sample is assumed to belong to exactly one cluster. The fuzzy \(k\)-means algorithm relaxes this condition and assumes that each sample \(x_{j}\) has a degree of membership in every cluster \(\omega _{i}\), rather than belonging completely to just one cluster.

Initially, the probabilities of cluster membership for each point \(x_{j}\) of an \(n\)-sample vector \(\mathbf x \) are normalized over all clusters \(\omega _{i}\) as:

$$\begin{aligned} \sum _{i=1}^{c}P(\omega _{i}|x_{j})=1 \end{aligned}$$
(1)

where \(P(\omega _{i}|x_{j})\) is the probability that the sample \(x_{j}\) is in the cluster \(\omega _{i}\). At each iteration of the fuzzy \(k\)-means algorithm, the means (or centers) \(\mu _{i}\) are recomputed according to the following equation:

$$\begin{aligned} \mu _{i}=\frac{\sum _{j=1}^{n}[P(\omega _{i}|x_{j})]^{b}x_{j}}{\sum _{j=1}^{n}[P(\omega _{i}|x_{j})]^{b}} \end{aligned}$$
(2)

and the new probabilities are recomputed:

$$\begin{aligned} P(\omega _{i}|x_{j})=\frac{(1/ED_{ij})^{1/(b-1)}}{\sum _{r=1}^{c}(1/ED_{rj})^{1/(b-1)}}, \quad ED_{ij}=\parallel x_{j}-\mu _{i}\parallel ^{2} \end{aligned}$$
(3)

where \(b>1\) is a free parameter chosen to adjust the “blending” of the different clusters. An appropriate choice of \(b\) can improve the cluster classification if the analyzed data set is very noisy. In this work, this parameter is set to 2. Algorithm 3 shows the fuzzy \(k\)-means algorithm.

Algorithm 3. The fuzzy \(k\)-means algorithm.
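A minimal Python sketch of Algorithm 3, implementing Eqs. (2) and (3) with \(b=2\), is given below. The fixed iteration count and the small eps guarding against a zero distance are implementation details of ours, not part of the original description.

```python
import random

def fuzzy_k_means(samples, c=2, b=2, max_iter=100, eps=1e-12):
    """Fuzzy k-means on scalar samples (Algorithm 3).

    Returns the c centers and the membership matrix
    P[i][j] = P(omega_i | x_j), normalized per sample (Eq. 1)."""
    n = len(samples)
    # Random initial memberships, normalized over the clusters.
    P = [[random.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(P[i][j] for i in range(c))
        for i in range(c):
            P[i][j] /= s

    for _ in range(max_iter):
        # Eq. (2): recompute the centers from the weighted samples.
        mu = []
        for i in range(c):
            w = [P[i][j] ** b for j in range(n)]
            mu.append(sum(wj * x for wj, x in zip(w, samples)) / sum(w))
        # Eq. (3): recompute the memberships from inverse distances.
        newP = [[0.0] * n for _ in range(c)]
        for j in range(n):
            ed = [max((samples[j] - mu[i]) ** 2, eps) for i in range(c)]
            inv = [(1.0 / e) ** (1.0 / (b - 1)) for e in ed]
            s = sum(inv)
            for i in range(c):
                newP[i][j] = inv[i] / s
        P = newP
    return mu, P
```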

The next section describes the unsupervised attack in four phases. The \(k\)-means algorithm is used in the search for the points of interest, while the fuzzy \(k\)-means algorithm is employed for the cluster classification once the points of interest have been selected.

4 The Unsupervised Attack

Under realistic assumptions, an adversary mounting a single-execution attack on an exponentiation works in a noisy environment and, as already stated in [19], “single bits are never known with certainty [...] and an SPA attacker [...] can only give a probability that any particular operation is a squaring or a multiplication”, if the attacked device executes the square-and-multiply algorithm. If a single-execution attack is able to recover 98 % of the 1024 exponent bits and the adversary does not know the positions of the wrong bits inside the exponent, a brute-force attack requires \(\sum _{j=0}^{21}\binom{1024}{j}\approx 2^{144}\) steps to retrieve the incorrect bits. Therefore, the number of wrong bits in the recovered exponent must be very low, otherwise a single-execution attack is impractical.
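This estimate is easy to verify directly, for instance in Python:

```python
import math

# Number of exponent candidates when up to 21 of the 1024 bit
# positions may be wrong: sum of binomial coefficients C(1024, j).
steps = sum(math.comb(1024, j) for j in range(22))
print(math.log2(steps))   # ~ 144, i.e. about 2^144 candidates
```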

When applying non-profiled attacks on a single exponentiation trace, the adversary has no knowledge of the operation features (mean \(\mu \), variance \(\sigma ^{2}\)); all information must be recovered in an unsupervised manner. Regular binary algorithms [8, 23] compute the exponentiation iteratively, performing the same operations for each bit (or segment) of the exponent. Thus, an initial partitioning step is applied to the measured electromagnetic exponentiation trace in order to obtain \(\ell \) segments, each one representing one exponent bit interpretation. The segments are aligned and compressed to reduce noise and clock-jitter effects. Thereafter, as proposed in this attack, several points of interest are identified in each segment by computing an estimated and approximated difference of means. Cluster classification using the fuzzy \(k\)-means algorithm is then applied to each set of compressed samples, each set representing a selected point of interest and providing an estimated exponent. The last step consists in retrieving the randomized exponent using all the estimated exponents obtained from the cluster classification. The proposed attack, divided into four phases, is detailed below.

4.1 Phase 1: Trace Pre-processing

The attack starts by acquiring a single exponentiation trace from the device. Let us denote the randomized exponent by \(d_{1:\ell ,k}\), where \(\ell \) is the length of the exponent and \(k\) is the index of the exponentiation trace. In our case, the exponentiation is computed using the regular Montgomery ladder algorithm. The EM trace, of size \(L\), is sliced into \(\ell \) multiplications and \(\ell \) squarings, as depicted in Fig. 1.

Fig. 1. Exponentiation trace and its segmentation into multiplications and squarings.

Each multiplication (\(\text {M}_{i}\)) or squaring (\(\text {S}_{i}\)) in the acquired EM trace spans 74 clock cycles. The oscilloscope sampling rate was set to 20 GS/s during the acquisition step and the hardware computes the exponentiation at a clock frequency of 50 MHz, resulting in 59200 samples per exponent bit interpretation (\(\text {M}_{i}\text {S}_{i}\), i.e., \(2 \times 74\) clock cycles of 400 samples each). The device under attack does not feature any hardware countermeasure, e.g., time disarrangement, dummy cycles or frequency dividers. Therefore, the \(\ell \) multiplication-squaring segments \(\text {M}_{i}\text {S}_{i}\) can easily be identified and combined to explore the leakage of information. However, the clock jitter is still present in the acquired EM trace and must be suppressed using a trace alignment procedure.

Another important task in unsupervised single-execution attacks is to identify the points of interest that present the greatest leakage. A simple solution consists in averaging the 400 samples of one clock cycle into 1 sample and taking each averaged sample as a point of interest. Here, in order to preserve information over smaller windows, the trace is compressed by averaging 100 samples into 1 sample. This reduces the amount of data from 59200 to 592 compressed samples per exponent bit interpretation \(d_{i,k}\) (denoted by operation \(\langle \text {MS}\rangle _{i}\) in the sequel).
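For reference, this slicing and compression step can be written in a few lines of NumPy. The array shapes follow the figures given above (an aligned trace, \(\ell \) segments of 59200 samples, compressed 100-to-1); the function name is ours.

```python
import numpy as np

def compress_trace(trace, n_bits, samples_per_bit=59200, window=100):
    """Slice an aligned EM trace into one segment <MS>_i per exponent
    bit and average each non-overlapping window of `window` raw
    samples into one compressed sample.

    Returns the matrix T of shape (n_bits, samples_per_bit // window),
    i.e. (ell, 592) for the parameters used in this paper."""
    segments = trace[:n_bits * samples_per_bit].reshape(
        n_bits, samples_per_bit)
    return segments.reshape(n_bits, samples_per_bit // window,
                            window).mean(axis=2)
```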

Now, the \(\ell \) operations are represented by a matrix \(T\):

$$\begin{aligned} T=\begin{bmatrix} \langle \text {MS}\rangle _{1} \\\langle \text {MS}\rangle _{2} \\ \vdots \\ \langle \text {MS}\rangle _{\ell } \end{bmatrix}=\begin{bmatrix} t_{1,1}&t_{1,2}&\cdots&t_{1,592} \\ t_{2,1}&t_{2,2}&\cdots&t_{2,592} \\ \vdots&\vdots&\ddots&\vdots \\ t_{\ell ,1}&t_{\ell ,2}&\cdots&t_{\ell ,592} \end{bmatrix} \end{aligned}$$
(4)

Each row of the matrix \(T\) is a set of compressed samples \(\langle \text {MS}\rangle _{i}=\lbrace t_{i,j}\rbrace \) representing an exponent bit interpretation \(d_{i,k}\). The term \(\ell \) is the exponent bit length and, of course, the number of iterations in Algorithm 1 (steps 6–9). After the trace pre-processing, the attack enters its second phase, which consists in finding the points of interest.

4.2 Phase 2: Finding the Points of Interest

The success of the attack depends on the choice of the points of interest. In profiled or supervised attacks, these points can be found by computing a difference of means and observing the highest peaks of amplitude. In such a case, the adversary knows the key \(d\) and computes averaged traces \(\text {Tr}_{0}\) and \(\text {Tr}_{1}\), representing the truncated windows of sampled points when the exponent bit is zero and one, respectively, according to:

$$\begin{aligned}&\text {Tr}_{0} = \sum _{i}\langle \text {MS}\rangle _{d_{i}=0}&\text {Tr}_{1} = \sum _{i}\langle \text {MS}\rangle _{d_{i}=1} \end{aligned}$$
(5)

Because the presented attack aims at revealing the exponent in an unsupervised manner, the attacker must be assumed to have minimal knowledge of the target implementation when identifying the points of interest. Because all data are randomized, the remaining leakage is related to addressing and control executions. Therefore, by observing and studying the collected EM trace, the attacker can, for instance, localize the time points where the device performs such operations and discard the points that clearly carry no compromising information.

Our unsupervised analysis needs a set of points of interest in each segment \(\langle \text {MS}\rangle _{i}\) to retrieve the exponent. A basic idea is to initially apply a clustering algorithm over each set of compressed samples \(\lbrace t_{1:\ell ,j}\rbrace \) (each column of matrix \(T\)) and obtain \(592\) approximated exponents \(\widehat{d_{1:\ell ,j}}\), for \(1 \le j\le 592\). In our practical experiments, this leads to the recovery of around 93 % of the entire exponent on the most leaking set of compressed samples \(\lbrace t_{1:\ell ,j}\rbrace \), which is insufficient. However, this result can be used to calculate approximated averaged traces \(\widehat{\text {Tr}_{0}}\) and \(\widehat{\text {Tr}_{1}}\) from the approximated exponent \(\widehat{d_{1:\ell ,j}}\). For this initial step, we chose the \(k\)-means clustering algorithm because it is a simple and fast technique. Figure 2(a) shows the percentage of correctly recovered exponent bits for each analyzed set of compressed samples \(\lbrace t_{1:\ell ,j}\rbrace \), for \(1\le j \le 592\).

If the adversary selects the most likely exponent (in this case, the one obtained from the set \(\lbrace t_{1:\ell ,465}\rbrace \)), he computes the averaged traces \(\widehat{\text {Tr}_{0}}\) and \(\widehat{\text {Tr}_{1}}\). Figure 2(b) shows the approximated difference of mean traces \(\widehat{D} = \widehat{\text {Tr}_{0}}-\widehat{\text {Tr}_{1}}\). The difference of means \(D = \text {Tr}_{0}-\text {Tr}_{1}\), computed with the real randomized exponent running in the device, is depicted in Fig. 2(c).

Fig. 2. (a) Percentage of correct exponent bits. (b) Approximated difference of mean traces \(\widehat{D} = \widehat{\text {Tr}_{0}}-\widehat{\text {Tr}_{1}}\). (c) Difference of mean traces \(D = \text {Tr}_{0}-\text {Tr}_{1}\).

Note that the results in Fig. 2(b) and (c) are quite similar, and the adversary can select points of interest by observing the highest peaks of amplitude in \(\widehat{D}\). In the worst case, the adversary would have to compute approximated difference of mean traces, and select points of interest, from each one of the 592 possibilities. Clearly, selecting the most leaking point of interest reduces the computational time of the unsupervised attack. Moreover, we observed (see Fig. 2) that the highest percentages of correct exponent recovery coincide with the highest peaks of amplitude in the approximated difference of means \(\widehat{D}\). We used this observation as a heuristic to select the points of interest.
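A sketch of this Phase 2 heuristic, reusing the k_means sketch from Sect. 3, is given below. Two caveats: the per-column confidence proxy (the inter-cluster gap) and the fixed number of returned points are our assumptions for illustration, whereas the paper selects the most leaking column and the peaks of \(\widehat{D}\) by inspection; and averages are used instead of the sums of Eq. (5), which only rescales \(\widehat{D}\). Note also that the sign of \(\widehat{D}\) depends on the arbitrary cluster labeling, so only its magnitude is used.

```python
import numpy as np

def find_points_of_interest(T, n_points=17):
    """Phase 2 heuristic: cluster each column of T with k-means,
    keep the column with the widest inter-cluster gap (assumed
    confidence proxy), build the approximated difference of means
    D_hat from its labeling, and return the columns with the largest
    |D_hat| peaks as points of interest."""
    ell, n_cols = T.shape
    best_gap, best_labels = -1.0, None
    for j in range(n_cols):
        means, labels = k_means(T[:, j], c=2)
        gap = abs(means[0] - means[1])
        if gap > best_gap:
            best_gap, best_labels = gap, np.array(labels)
    d_hat = best_labels                     # approximated exponent
    # Approximated averaged traces and their difference (cf. Eq. 5).
    tr0 = T[d_hat == 0].mean(axis=0)
    tr1 = T[d_hat == 1].mean(axis=0)
    D_hat = tr0 - tr1
    # The most leaking columns: highest peaks of |D_hat|.
    return np.argsort(np.abs(D_hat))[-n_points:], D_hat
```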

4.3 Phase 3: Cluster Classification

After computing the approximated difference of mean traces from the set of compressed samples \(\lbrace t_{1:\ell ,465}\rbrace \), let us suppose the selection of \(u\) points of interest \(P=\lbrace p_{j}\rbrace \), for \(1 \le j \le u\), among the 592 possibilities. By observing the approximated difference of means \(\widehat{D}\) in Fig. 2(b), 17 points of interest were selected (\(p_{j} = 165, 166, 169, 281, 282, 284, 285, 342, 461, 462, 464, 465, 497, 498, 577, 580, 581\)), which are evidently the most leaking points.

A clustering is computed for each set of compressed samples \(\lbrace t_{1:\ell ,p_{j}}\rbrace \), for \(1 \le j \le u\), by applying the fuzzy \(k\)-means algorithm. This yields a classification of these samples into two classes (bits zero and one), i.e., one estimated exponent \(\widehat{d_{1:\ell ,p_{j}}}\) per set of samples \(\lbrace t_{1:\ell ,p_{j}}\rbrace \). Because real attacks operate in noisy environments, the clustering at each point of interest \(p_{j}\) contains classification errors. Figure 3 illustrates the cluster classification error for the set of compressed samples \(\lbrace t_{1:\ell ,169}\rbrace \): Fig. 3(a) shows the correct classification according to the real randomized exponent \(d_{1:\ell ,k}\), while Fig. 3(b) presents the cluster classification returned by the fuzzy \(k\)-means algorithm.

Fig. 3. Errors of cluster classification: (a) correct classification; (b) fuzzy \(k\)-means classification.

For each point of interest \(p_{j}\), the fuzzy \(k\)-means clustering algorithm returns two centers \(\mu _{1}\) and \(\mu _{2}\) and two groups of clustered samples. A remaining problem is to identify which class or operation (exponent bit zero or one) each cluster center represents. With \(u=17\) cluster classifications into two classes, there are \(2^{17}=131072\) different possibilities. The identification of the classes can be performed in two different ways:

1. Instead of selecting random samples to initialize the values \(\mu _{1}\) and \(\mu _{2}\) in Algorithm 3, we select the minimum and maximum samples from the set \(\lbrace t_{1:\ell ,p_{j}}\rbrace \) according to their amplitudes, i.e., the initialization in Algorithm 3 assigns \(\mu _{1}=min\lbrace t_{1:\ell ,p_{j}}\rbrace \) and \(\mu _{2}=max\lbrace t_{1:\ell ,p_{j}}\rbrace \). This ensures that \(\mu _{1}<\mu _{2}\) after the clustering. Then, by comparing the resulting cluster means \(\mu _{1}\) and \(\mu _{2}\) with the amplitude of the approximated difference of means \(\widehat{D}\), and also with \(\widehat{\text {Tr}_{0}}\) and \(\widehat{\text {Tr}_{1}}\), it is straightforward to identify the classes.

2. As every selected leaking point may lead to more than 50 % of exponent recovery, we take the recovered exponent \(\widehat{d_{1:\ell ,v}}\) from one point of interest \(v\), \(v \in \lbrace p_{j}\rbrace \), and compute the bitwise XOR between this exponent and every other estimated exponent. Let \(\ell \) be the size of the exponent, \(\widehat{d_{1:\ell ,p_{j}}}\) the recovered exponents for \(1\le j \le u\), \(p_{j} \ne v\), and \(h_{1:\ell }=\)XOR(\(\widehat{d_{1:\ell ,p_{j}}},\widehat{d_{1:\ell ,v}}\)) the bitwise results. If \(\sum _{i=1}^{\ell }h_{i}>\ell /2\), the two labelings disagree on most bits and we return NOT(\(\widehat{d_{1:\ell ,p_{j}}}\)); otherwise \(\widehat{d_{1:\ell ,p_{j}}}\) is kept unchanged (see the sketch after this list).
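A sketch of this second identification method, assuming the estimated exponents are lists of 0/1 integers:

```python
def align_labels(d_ref, d_est):
    """Make the cluster labeling of d_est consistent with d_ref:
    if the two estimated exponents disagree on more than half of
    the bits, the cluster labels of d_est are inverted."""
    ell = len(d_ref)
    disagreements = sum(a ^ b for a, b in zip(d_ref, d_est))
    if disagreements > ell / 2:
        return [1 - b for b in d_est]    # NOT(d_est)
    return d_est
```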

After the cluster classifications and the respective association of the classes, the attack enters its last phase, which combines all estimated exponents into one final exponent.

4.4 Phase 4: Exponent Recovery

The recovery of the final randomized exponent is performed with three different statistical techniques: majority decision, probability density function (pdf) and Bayesian classifier.

Majority Decision. Table 1 shows the cluster classification results for the first 40 bits of each estimated exponent \(\widehat{d_{1:\ell ,p_{j}}}\), considering the \(u=17\) points of interest. Using the majority decision, we can retrieve the randomized exponent \(\widehat{d_{1:\ell ,k}}\).

Table 1. Cluster classification of the (first 40) exponent bits and the recovery of \(\widehat{d_{1:\ell ,k}}\) using the majority rule.

Because the majority rule is a simple statistical procedure, it requires more points of interest to recover the correct exponent than the two techniques presented next, as will be demonstrated at the end of this section.
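The majority rule itself is straightforward; a minimal sketch over the \(u\) label-aligned estimated exponents (each a list of \(\ell \) bits):

```python
def majority_vote(estimates):
    """estimates: list of u estimated exponents, each a list of ell
    bits in {0, 1}, already label-aligned (Sect. 4.3).
    Returns the bitwise majority decision d_hat."""
    u = len(estimates)
    return [1 if sum(bits) > u / 2 else 0 for bits in zip(*estimates)]
```

With \(u=17\) (odd), no ties can occur in the vote.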

Probability Density Function. In [19], a probability density function based on the normal distribution parameters \(\mathcal {N}(\mu _{0},\sigma _{0})\) and \(\mathcal {N}(\mu _{1},\sigma _{1})\), where \(\mu \) and \(\sigma \) are the mean and the standard deviation, returns the likelihood that a sample \(t_{i,j}\) corresponds to an operation where the exponent bit \(d_{i,k}=0\). As the presented analysis is unsupervised, we do not know \(\mu _{0}\) and \(\mu _{1}\). However, the fuzzy \(k\)-means cluster classification returns two means (or centers) \(\mu _{1}\) and \(\mu _{2}\) for each set of compressed samples \(\lbrace t_{1:\ell ,p_{j}}\rbrace \), which can be used in their place. The standard deviation \(\sigma \) is computed over the whole set of samples \(\lbrace t_{1:\ell ,p_{j}}\rbrace \) of the evaluated point of interest \(p_{j}\). Then, the likelihood that a sample \(t_{i,p_{j}}\) corresponds to an operation where \(d_{i,k}=0\) is given by the equation below:

$$\begin{aligned} p(t_{i,p_{j}},\mu _{1})=\frac{e^{-(t_{i,p_{j}}-\mu _{1})^{2}/2\sigma ^{2}}}{e^{-(t_{i,p_{j}}-\mu _{1})^{2}/2\sigma ^{2}} + e^{-(t_{i,p_{j}}-\mu _{2})^{2}/2\sigma ^{2}}}, \quad 1 \le i \le \ell , 1 \le j \le u \end{aligned}$$
(6)

The following sum of probabilities then gives the likelihood that a set of points of interest \(\lbrace t_{i,p_{1:u}}\rbrace \), representing the operation \(\langle \text {MS}\rangle _{i}\), corresponds to an operation performed when the randomized exponent bit \(d_{i,k}=0\):

$$\begin{aligned} S_{0,1:u} = \frac{1}{u}\sum _{j=1}^{u}p(t_{i,p_{j}},\mu _{1}) \quad 1\le i \le \ell \end{aligned}$$
(7)

Then, for \(1\le i \le \ell \), the following decision returns the estimated randomized exponent bit \(\widehat{d_{i,k}}\):

$$\begin{aligned} \widehat{d_{i,k}} = \left\{ \begin{array}{lll} 0, &{} \text {if} &{} S_{0,1:u} \ge 0.5\\ 1, &{} \text {if} &{} S_{0,1:u} < 0.5 \end{array} \right. \end{aligned}$$
(8)
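Equations (6)–(8) translate directly into code. In the sketch below, centers[j] holds the two cluster centers \((\mu _{1},\mu _{2})\) and sigmas[j] the standard deviation for point of interest poi[j]; these container names are ours.

```python
import numpy as np

def pdf_recovery(T, poi, centers, sigmas):
    """Eqs. (6)-(8): T is the (ell x 592) matrix of compressed
    samples, poi the u selected column indices, centers[j] = (mu1,
    mu2) and sigmas[j] = sigma for point of interest poi[j]."""
    ell, u = T.shape[0], len(poi)
    S0 = np.zeros(ell)
    for j, p in enumerate(poi):
        (mu1, mu2), s = centers[j], sigmas[j]
        g1 = np.exp(-(T[:, p] - mu1) ** 2 / (2 * s ** 2))
        g2 = np.exp(-(T[:, p] - mu2) ** 2 / (2 * s ** 2))
        S0 += g1 / (g1 + g2)           # Eq. (6), accumulated over j
    S0 /= u                            # Eq. (7)
    return (S0 < 0.5).astype(int)      # Eq. (8): bit is 0 iff S0 >= 0.5
```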

Table 2 shows the final sum of probabilities \(S_{0,1:u}\) and the exponent decision from Eq. (8), considering the first 20 exponent bits \(\widehat{d_{1:20,k}}\) (due to space constraints in Table 2) and \(u=17\) points of interest.

Table 2. Cluster classification of the (first 20) exponent bits and the recovery of \(\widehat{d_{1:\ell ,k}}\) using the probability density function for normal distributions.

Bayesian Classifier. Bayesian decision theory assumes that the decision problem is posed in probabilistic terms. The classifier relies on the computation of the posterior probabilities \(P(\mu _{c}|t_{i,p_{j}})\), obtained from the prior probabilities \(P(\mu _{c})\) and the normal probability density functions \(p(t_{i,p_{j}},\mu _{c})\), where \(c \in \lbrace 1,2\rbrace \). The classification starts by obtaining the pdf estimation for each point of interest \(t_{i,p_{j}}\) of each operation \(i\). Again, this analysis uses the two cluster centers \(\mu _{1}\) and \(\mu _{2}\) in place of the means, and the standard deviation is computed from the whole set of compressed samples \(\lbrace t_{1:\ell ,p_{j}}\rbrace \):

$$\begin{aligned} p(t_{i,p_{j}},\mu _{1})= \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{(t_{i,p_{j}}-\mu _{1})^{2}}{2\sigma ^{2}}} \end{aligned}$$
(9)
$$\begin{aligned} p(t_{i,p_{j}},\mu _{2})= \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{(t_{i,p_{j}}-\mu _{2})^{2}}{2\sigma ^{2}}} \end{aligned}$$
(10)

The probability density functions \(p(t_{i,p_{j}},\mu _{1})\) and \(p(t_{i,p_{j}},\mu _{2})\) are obtained for \(1 \le i \le \ell \) and \(1 \le j \le u\). Taking as prior probabilities \(P(\mu _{c})\), \(c \in \lbrace 1,2\rbrace \), the posterior probabilities obtained at the previous point of interest \(p_{j-1}\), Bayes' formula yields the posterior probabilities \(P(\mu _{c}|t_{i,p_{j}})\) for the operations \(i\) and points of interest \(p_{j}\):

$$\begin{aligned}&P(\mu _{1}|t_{i,p_{j}})=\frac{p(t_{i,p_{j}},\mu _{1})P(\mu _{1})}{p(t_{i,p_{j}},\mu _{1})P(\mu _{1})+p(t_{i,p_{j}},\mu _{2})P(\mu _{2})}\end{aligned}$$
(11)
$$\begin{aligned}&P(\mu _{2}|t_{i,p_{j}})=\frac{p(t_{i,p_{j}},\mu _{2})P(\mu _{2})}{p(t_{i,p_{j}},\mu _{1})P(\mu _{1})+p(t_{i,p_{j}},\mu _{2})P(\mu _{2})} \end{aligned}$$
(12)

Bayes' formula is applied iteratively over all points of interest \(p_{j}\) for the same operation \(i\). At the end, this estimation returns the probability that a certain operation \(\langle \text {MS}\rangle _{i}\) is executed when the exponent bit \(d_{i,k}=0\). Table 3 shows the evolution of the posterior probabilities \(P(\mu _{1}|t_{i,p_{j}})\) over all points of interest \(t_{i,p_{j}}\), for \(1 \le j \le u\), and the respective percentage of correct exponent bits. Again, in this example we consider the first 20 exponent bits.
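The iterative update of Eqs. (9)–(12) can be sketched as follows; the uniform initial priors \(P(\mu _{1})=P(\mu _{2})=0.5\) are an assumption of ours, as the initial priors are not stated explicitly above.

```python
import numpy as np

def bayes_recovery(T, poi, centers, sigmas):
    """Eqs. (9)-(12): sequentially update P(mu_c | t) over the points
    of interest; the posterior at p_{j-1} becomes the prior at p_j."""
    ell = T.shape[0]
    P1 = np.full(ell, 0.5)                 # assumed uniform prior P(mu_1)
    for j, p in enumerate(poi):
        (mu1, mu2), s = centers[j], sigmas[j]
        norm = 1.0 / (s * np.sqrt(2 * np.pi))
        p1 = norm * np.exp(-(T[:, p] - mu1) ** 2 / (2 * s ** 2))  # Eq. (9)
        p2 = norm * np.exp(-(T[:, p] - mu2) ** 2 / (2 * s ** 2))  # Eq. (10)
        # Eq. (11): posterior for mu_1, reused as the next prior.
        P1 = p1 * P1 / (p1 * P1 + p2 * (1 - P1))
    return (P1 < 0.5).astype(int)          # bit is 0 iff P(mu_1|t) >= 0.5
```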

Table 3. Cluster classification of the (first 20) exponent bits and the recovery of \(\widehat{d_{1:\ell ,k}}\) using the Bayesian classifier.

For the three presented methods, we showed the cluster classification results for \(u=17\) points of interest. Figure 4 shows how the exponent recovery evolves with the number of points of interest. Figure 4(a) considers the evolution from the least to the most leaking point; note that with the Bayesian classifier, 11 points are necessary to recover the entire exponent (the same result can be observed in Table 3). On the other hand, as shown in Fig. 4(b), if the evolution is from the most to the least leaking point, the Bayesian classifier recovers 100 % of the exponent using only 4 points of interest per exponentiation segment.

Fig. 4. Relation between the exponent recovery and the number of points of interest: (a) from the least to the most leaking point and (b) from the most to the least leaking point (note the different scale).

5 Obtaining the Private Key from Randomized Exponents

For decryption and message signing, retrieving the randomized exponent \(d_{r} = d + r\phi (N)\) is as good as retrieving \(d\) itself. Therefore, a single-execution attack is sufficient to break the target device. However, for non-CRT implementations of RSA, and when the recovered randomized exponents contain few erroneous bits, the adversary can also improve the procedure using a step-by-step attack, as proposed in [19]. In this case, several blinding factors \(r\) used in the exponent randomization are recovered and then used to derive the exponent \(d\).

Approximately the \(\ell /2\) most significant bits of the exponent \(d\) are exposed when the public key \(e\) is small (\(3\), \(17\) or \(2^{16}+1\)). In this procedure, the term \(\phi (N)\) is approximated by \(N\) and the approximated exponent is given, for some \(k \in \mathbb {Z}\), \(k < e\), by:

$$\tilde{d} = \Big \lfloor \frac{1+kN}{e}\Big \rfloor $$

Consequently, the adversary can obtain the \(\ell /2\) most significant bits of all possible randomized exponents by computing \(\tilde{d_{r_{i}}} = \tilde{d} + r_{i}N\), for all \(i\) with \(r_{i} \in [0,2^{32}-1]\). Considering \(\lambda \) as the bit length of the blinding factor \(r\), the \(\lambda \) most significant bits of \(d+r\phi (N)\) are equivalent to the most significant bits of \(\tilde{d}+rN\). The adversary can thus compute all possible values of \(r\) and, by observing the recovered randomized exponent \(d_{r}\), deduce \(r\). Finally, he mounts the attack on exponentiation traces with possibly known blinding factors.
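Under these assumptions (small \(e\), \(\lambda \)-bit blinding factors, \(\phi (N) \approx N\)), the enumeration of candidate randomized exponents can be sketched as follows; the comparison width top_bits is an illustrative parameter of ours.

```python
def candidate_randomized_exponents(N, e, lam=32, top_bits=64):
    """Enumerate, for each k < e and each blinding factor r, the most
    significant bits of the candidate randomized exponent
    d_tilde + r*N (with d_tilde = floor((1 + k*N) / e)). Matching
    these against the top bits of a recovered exponent reveals r."""
    bitlen = N.bit_length() + lam          # size of d + r*phi(N)
    for k in range(1, e):
        d_tilde = (1 + k * N) // e         # approximates d (phi(N) ~= N)
        for r in range(2 ** lam):          # full enumeration shown only
            cand = d_tilde + r * N         # for clarity of exposition
            yield k, r, cand >> (bitlen - top_bits)
```

Note that since \(d_{r} = d + r\phi (N)\) with \(d < \phi (N) \approx N\), the quotient \(\lfloor d_{r}/N \rfloor \) already equals \(r\) or \(r-1\), so in practice the search for \(r\) is essentially immediate once \(d_{r}\) is recovered.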

6 Countermeasures

The efficiency of single-execution attacks on exponentiations is related to the signal-to-noise ratio (SNR) of the acquired traces. Theoretically, Algorithm 1 is SPA-protected because it is regular and all manipulated data are randomized thanks to algorithmic (exponent blinding) and arithmetic (leak resistant arithmetic) countermeasures. However, in a practical implementation of the Montgomery ladder, different operands are read and written depending on the exponent bits, and these memory accesses cause addressing-related leakages, as demonstrated by the presented unsupervised attack.

Hardware countermeasures such as the insertion of time disarrangement, dummy cycles or frequency dividers reduce the SNR. Balancing the power consumption is an alternative to avoid the leakage of information due to conditional tests or control decisions.

If the leakage is related to memory accesses, a possible solution is the randomization of the RAM addresses during the exponentiation. With this countermeasure in place, the unsupervised attack was unable to recover the randomized exponent entirely: selecting the same points of interest \(P=\lbrace p_{j}\rbrace \) and applying the fuzzy \(k\)-means clustering algorithm, we recovered only approximately 80 % of the exponent using the Bayesian classifier technique.

7 Conclusions

This paper presented an unsupervised attack on randomized exponentiations. The explored leakages stem from control executions and memory addressing. We proposed to combine the cluster classifications of several points of interest over each exponent bit interpretation in order to derive the randomized exponent from a single EM trace. The results were presented for three different statistical techniques; for the probability density function and the Bayesian classifier, we also showed the likelihoods obtained for the randomized exponent bits.

The presented unsupervised attack demonstrated the efficiency of clustering algorithms against single executions of exponentiations, even in the presence of algorithmic and arithmetic countermeasures. The obtained results show the importance of employing hardware countermeasures in public-key architectures.