1 Introduction

The side-channel information leakage about secret-dependent internal values is usually limited. Attackers who target implementations of symmetric ciphers may repeat measurements many times to collect sufficient leakage information while the secret remains unchanged. In the case of asymmetric algorithms, however, the secret is either ephemeral or blinded through countermeasures, and attackers are only allowed a single side-channel observation. Hence, it is crucial to record and exploit as much leakage as possible. Profiled attacks, e.g. template attacks, are powerful in exploiting leakage efficiently, but can be prevented by blinding or by denying attackers the full access required for profiling. Non-profiled attacks cannot be prevented in this way because they do not require profiles and are, hence, a much bigger threat to devices. Heyszl et al. [8] proposed to use well-established clustering algorithms for non-profiled attacks. In their practical experiments, they use k-means clustering after a simple sum-of-squares pre-processing of the measurement data.

We follow their proposal and significantly improve the algorithmic approach. Principal Component Analysis (PCA) has been used for pre-processing and data reduction in other side-channel attacks [2, 3, 5, 15, 22], and strategies to select only certain principal components have previously been mentioned [3]. We apply PCA to clustering-based, non-profiled attacks on exponentiation algorithms and perform practical experiments on an FPGA-based implementation of Elliptic Curve Cryptography (ECC), using high-resolution electromagnetic measurements as side-channel. We find that PCA concentrates exploitable leakage with comparably low variance into few components which are not the highest-ranked ones. Hence, as an important step after the transformation, we discard high-ranked as well as many low-ranked components during a parametrized selection. In our non-profiled setting, this requires some testing for the right selection parameters, hence, brute-force effort by the attacker. However, the significantly improved attack results clearly justify this. For cluster classification, we use the expectation maximization algorithm instead of the k-means algorithm [8]. The resulting attack is successful with single-channel measurements in significantly more cases than the algorithmic approach by Heyszl et al. [8]. Most of the achieved algorithmic improvement can be attributed to using PCA and the component selection as pre-processing before clustering. As expected, a profiled template attack still outperforms the improved non-profiled attack.

Another way to improve attacks in single-execution settings is to use multiple simultaneous channels and combine their leakage. Previous contributions have tested the combination of (low-resolution) magnetic field measurements and current consumption measurements [1, 22] using template attacks. High-resolution magnetic field measurements should generally provide better signal quality [10] and make it possible to capture multiple independent channels because the signals highly depend on the measurement location [9]. We present the first practical results from using three high-resolution magnetic field probes simultaneously and combine them in the clustering-based non-profiled attack. However, we find that the combination of three channels does not improve the results of the non-profiled PCA- and clustering-based attack compared to the best individual channel. We conclude that in the non-profiled setting, our approach seems unsuitable for combining multi-channel data. The profiled template attack, however, improves significantly through the combination of channels. In profiled settings, attackers are able to find the best measurement positions for single channels. Hence, the additional cost for multi-channel equipment is only reasonable in profiled settings and if the available leakage is still insufficient.

We first explain the background and related work of (non-profiled) attacks against exponentiations in Sect. 2.1. In Sect. 2.2, we cover the background and related work of magnetic field side-channels and multi-channel measurements. In our first main Sect. 3, we describe our algorithmic approach to improve clustering-based attacks on exponentiations and to handle multi-channel data. We back these considerations by practical experiments in Sect. 4 and discuss the results. We summarize our contribution and findings in Sect. 5.

2 Preliminaries

2.1 Non-profiled Attacks Against Exponentiations

The main computation in public key cryptosystems is modular integer exponentiation with secret exponents (e.g. RSA, DSA) or elliptic curve scalar multiplication (e.g. ECDSA) with secret scalars. In this contribution, we use the generalized terms ‘exponentiation algorithms’ and ‘secret exponents’. The secret exponent is usually either ephemeral by design (e.g. ECDSA) or blinded through countermeasures (e.g. exponent blinding in RSA, or in ECDSA to prevent profiling). Therefore, it is different for every execution and side-channel attackers may only exploit single executions. The first single-execution attack on exponentiations was presented by Kocher [13], who exploited data-dependent execution times of algorithms. To prevent this, improved algorithms like square-and-multiply-always, double-and-add-always or the Montgomery ladder execute constant operation sequences (e.g. side-channel atomic routines) and thereby avoid such simple side-channel attacks. In all those algorithms, exponents are scanned bit- or digit-wise (depending on whether it is a binary, m-ary, or sliding window exponentiation) and the computation is performed in a loop iterating a constant sequence of operations. (We will continue to refer to the binary case in this contribution.) Nonetheless, in many cases some side-channel leakage about the processed exponent remains, which can be referred to as single-execution leakage. Examples include data-dependent leakage from using pre-computed multiples in digit-wise multiplications [25], address-bit leakage [12], location-dependent leakage from accessing different storage locations [9], or operation-dependent leakage, e.g., when square and multiply operations can be distinguished [4].

Attacks against an exponentiation are carried out by partitioning side-channel measurements into trace-segments, with each segment corresponding to an independently processed bit of the secret exponent. The segmentation borders are either known a priori, or can often be derived from visual inspection or comparison of shifted trace parts. The trace for measuring n exponent bits consists of n trace-segments \(\varvec{t_d} = (t_{1+(d-1)\cdot {l}}, ..., t_{d\cdot l})\) with \(d \in [1,n]\), each of length \(l\) (time-samples), which is referred to as its dimensionality (of features). For analyzing and attacking the measurement data, an \(n\times l\) matrix \(\varvec{M}\) is constructed by placing each segment in one row. The contained leakage is exploited to find a structure, i.e. a partitioning of the rows according to the secret exponent values. Template attacks use a profiling step to create templates of the segments for different values. Profiling can be prevented in many cases by blinding countermeasures or by not allowing attackers full access to devices for profiling. We concentrate on non-profiled attacks because they cannot be prevented in this way and are, hence, more threatening.
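As a minimal illustration of this segmentation, the following Python/NumPy sketch builds the matrix \(\varvec{M}\) from a recorded trace; the function name and the assumption of a known, constant segment length are ours:

```python
import numpy as np

def segment_trace(trace, n, l):
    """Cut a 1-D side-channel trace into n trace-segments of length l
    and stack them as the rows of the (n x l) matrix M.

    trace: time samples covering n exponent-bit iterations
    n:     number of processed exponent bits
    l:     samples per iteration (segment dimensionality)
    """
    assert trace.size >= n * l, "trace too short for n segments of length l"
    # Row d (0-indexed) holds t_{d+1} = (t_{1+d*l}, ..., t_{(d+1)*l}).
    return trace[:n * l].reshape(n, l)
```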

There have been several published attacks on exponentiations which do not require profiling. Walter [25] was the first to describe an attack using a custom algorithm (resembling a clustering algorithm) to partition the segments into buckets. Messerges et al. [16], Clavier et al. [6], and Witteman et al. [26] use cross-correlation in non-profiled single-execution attacks on exponentiations. We pursue the approach by Heyszl et al. [8], who promote the use of established clustering algorithms (such as k-means) for non-profiled attacks due to the generality of the approach and its support for the combination of multiple channels. A correct classification of trace-segments equals the recovery of the secret exponent. (Later, Perin et al. [18] described a similar but heavily customized two-stage approach which seems tailored to their case and unsuitable for generalization.) We extend and significantly improve previous work by using Principal Component Analysis (PCA) and expectation maximization clustering (instead of k-means and simple pre-processing).

2.2 Multi-probe Measurements of Magnetic Fields

Using multiple side-channels concurrently and combining them in an attack is an important way of increasing the exploitable leakage in single-execution attacks. Agrawal et al. [1] first, and later Standaert and Archambeau [22], describe the combination of current consumption with magnetic field measurements in profiled attacks through concatenation of traces. Standaert and Archambeau [22] report better results from magnetic field than from current measurements and report an improvement from the combination of both channels. Souissi et al. [21] first presented results from combining two simultaneous measurements of the magnetic field. They measure the field close to two different supply capacitors of an FPGA and, in this way, measure the supply of two different parts of the FPGA.

We find that in many cases, side-channel measurements of the magnetic field are closely related to the consumption of the entire device because comparably large coil diameters (\({>}{500}\,{\upmu }\mathrm{m}\)) are used at large distances to the integrated circuits (\({>}{300}\,{\upmu }\mathrm{m}\)) [1, 7, 17, 20, 22]. Such measurements often capture the magnetic field of supply wires (bonding wires), which is directly proportional to the current consumption of the entire integrated circuit (including noise sources from within the device). In our opinion, it is unreasonable to simultaneously record more than one magnetic field channel in such cases due to this global character. Lately, high-resolution magnetic field measurements at close distances to an integrated circuit die have been investigated extensively by Heyszl et al. [9, 10]. Such high-resolution measurements require magnetic field probes with diameters of \(\approx {150}\,{\upmu }\mathrm{m}\) at close distances to the integrated circuit die (\({<}{100}\,{\upmu }\mathrm{m}\)). In our opinion, capturing multiple simultaneous magnetic field side-channels only makes sense for such high-resolution measurements, which can be restricted to parts of integrated circuits and therefore convey sufficiently different information (e.g. localized leakage [9]). Heyszl et al. [8] mention the combination of multiple high-resolution channels for non-profiled single-execution attacks, however, they did not perform actual simultaneous measurements. We extend their work and present the first results from an extensive practical study using three high-resolution micro-coil magnetic field channels.

3 Improving Clustering-Based Attacks

In this section, we describe our algorithmic approach to clustering-based non-profiled attacks on exponentiations which improves previous work [8]. We explain how we use Principal Component Analysis (PCA) as a pre-processing step for dimensionality reduction and feature selection in Sect. 3.1. We continue and describe how expectation maximization clustering can be used to attack single- and multi-channel measurements in Sect. 3.2. Finally, we describe how classification errors can be handled and derive the brute-force complexity as a measure to assess attack outcomes in Sect. 3.3.

3.1 PCA for Dimensionality Reduction and Feature Selection

Side-channel measurements usually lead to large amounts of data, especially when high sampling rates are required for magnetic field measurements. This increases the required computational power and memory consumption during subsequent data analysis. Only a small part of the data will contain exploitable leakage information. Hence, feature selection to discard the other parts is desirable.

Simple trace compression [14] is commonly used and usually justified by electrical properties. This includes extracting the peak value or computing the sum-of-squares (as done by Heyszl et al. [8]) over the time-period of one clock cycle. Another popular method is the selection of so-called points-of-interest. This subset is usually identified through profiling with known secrets.
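For reference, a minimal sketch of such a per-clock-cycle sum-of-squares compression; the parameter name samples_per_clock and the assumption that it divides the segment length are ours:

```python
import numpy as np

def sum_of_squares_compress(M, samples_per_clock):
    """Compress each trace-segment to one value per clock cycle by
    summing the squared samples within the cycle (cf. [8]).

    M: (n x l) segment matrix; samples_per_clock is assumed to divide l.
    """
    n, l = M.shape
    cycles = l // samples_per_clock
    return (M[:, :cycles * samples_per_clock]
            .reshape(n, cycles, samples_per_clock) ** 2).sum(axis=2)
```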

We concentrate on powerful non-profiled, unsupervised methods, specifically on PCA. PCA has been applied for data reduction in several side-channel contributions [2, 3, 5, 15, 22] for different attacks; Archambeau et al. [2] were the first to describe the use of PCA in the context of template attacks. Standaert and Archambeau [22] later compare PCA and Linear Discriminant Analysis (LDA) in the context of template attacks and confirm that LDA leads to superior results. We disregard LDA because it requires training data from profiling to achieve a representation which maximizes cluster separation.

PCA is based on Singular Value Decomposition (SVD) and transforms the data into another coordinate system subspace with linearly uncorrelated coordinates, using the variance as score function and, hence, maximizing the retained variance of the data. As described in Sect. 2.1, recorded side-channel measurements are cut into trace-segments corresponding to exponent bits. This leads to the real \(n\times l\) matrix \(\varvec{M}\) of measurement data for every probe (see Sect. 2.1). The SVD of \(\varvec{M}\) is \(\varvec{M} = U \varSigma V^*\), and the transformation of \(\varvec{M}\) into the orthogonal subspace equals \(U \varSigma \). This matrix \(U \varSigma \) consists of column vectors \((\varvec{PC_1}, ..., \varvec{PC_r})\), with r being the number of components and \(\varvec{PC_j}\) being a column-vector of shape \(n\times 1\), which is called a principal component. The maximum number of components is \(\min (n, l)\), which equals the number of trace-segments \(n\) because the segment-length \(l\) is usually much larger than \(n\). After applying PCA, the components are ordered by their variance, which can be found on the diagonal of \(\varSigma \). In our experiments, we normalize the variances of the principal components to one, i.e. we directly use \(\varvec{M}_{\mathrm {PCA}}=U\) instead of \(U \varSigma \). Before applying PCA, we remove the mean of every trace-segment as a standard measure.
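A minimal Python/NumPy sketch of this pre-processing, following the description above (row-wise mean removal, SVD, and use of \(U\) as the normalized transformation); the function name is ours:

```python
import numpy as np

def pca_transform(M):
    """PCA of the (n x l) segment matrix M via SVD.

    Returns U of shape (n x min(n, l)): the principal components with
    variances normalized to one, i.e. M_PCA = U instead of U * Sigma,
    ordered by the singular values s (descending).
    """
    # Remove the mean of every trace-segment (row-wise centering).
    M = M - M.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U, s
```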

Ideally, a transformation into a reduced subspace should maintain the ‘useful’ information while neglecting the ‘not useful’ information, which is difficult without supervision. PCA combines correlating input dimensions into single principal components. Archambeau et al. [2] propose to only retain the first-ranked components, assuming that the leakage is contained there, while discarding the remaining low-variance ones, assuming they only contain noise. Batina et al. [3] found in their practical experiments that results of correlation-based Differential Power Analysis (DPA) improved when removing first-ranked components. There are several reasons for high variances of the trace segments, e.g. data-dependent signal influences and noise, which are irrelevant to the desired classification. We suspect that relevant and irrelevant signal parts will aggregate within separate components. Also, in our experience, the ‘interesting’ leakage signal parts have rather low variance in the case of single-execution attacks.

Hence, we propose a selection strategy which discards several highest-ranked as well as many low-ranked components because they contain either noise or information which we are not interested in. We either select single principal components or a number of consecutive components (arbitrary subsets of components would most likely lead to an untestable number of possibilities). Reduced trace-segments \(\varvec{M}_{\mathrm {PCA},k: k+ i}= (\varvec{PC_{k}}, ..., \varvec{PC_{k+i-1}})\) are derived, with k the first selected component and \(i \ge 1\) the number of consecutive components retained. We trialled values of \(k \in [1, 20]\) and \(i \in \{1,2,4,6,9\}\) in our practical experiments and found that using one single component \(i = 1\) leads to the best results in our attack on average, and that the \(k \le 3\) first-ranked components should be discarded. This selection strategy reflects the approach of an attacker who is unable to perform profiling. An optimal selection of components can certainly not be determined a priori because it is highly device- and application-specific (a general issue in machine learning [27]). Hence, without a priori knowledge, an attacker has to trial different values for k and i. This, however, only requires an additional brute-force complexity of a few bits, and the improved attack outcomes clearly justify it.
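The selection and the attacker's trial of the parameters can be sketched as follows; the grid of candidate values matches our experiments, while the function names are ours:

```python
K_RANGE = range(1, 21)        # candidate first components k
I_CHOICES = (1, 2, 4, 6, 9)   # candidate numbers of components i

def select_components(U, k, i):
    """Keep the i consecutive components PC_k, ..., PC_{k+i-1}
    (1-indexed), discarding higher- and lower-ranked ones."""
    return U[:, k - 1 : k - 1 + i]

def trial_selections(U):
    """Enumerate all (k, i) candidates an attacker would brute-force."""
    for k in K_RANGE:
        for i in I_CHOICES:
            yield k, i, select_components(U, k, i)
```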

3.2 Expectation-Maximization Clustering of Multi-channel Data

Clustering algorithms can generally be split into supervised, semi-supervised and unsupervised algorithms. Our focus on non-profiled attacks restricts the choice to unsupervised algorithms. Heyszl et al. [8] first describe how unsupervised clustering algorithms can be used in a non-profiled attack to partition \(n\) trace-segments into classes according to their secret exponent values. An unsupervised cluster classification is equivalent to estimating the free parameters of the classes' assumed distribution model. The choice of the algorithm and its free parameters depends on the assumed probability distribution model, hence on the shape of the clusters. While Heyszl et al. use k-means clustering, we improve on this by using the expectation maximization algorithm while keeping the Gaussian distribution assumption on which both algorithms are based.

Expectation-maximization clustering provides more free parameters, which allows a generally better approximation of the cluster distributions and usually leads to better classification results. The algorithm is based on repeated expectation and maximization steps. During these iterations, the maximum-likelihood means and covariances of the Gaussian distributions are derived. The result is a classification and a class-membership probability which indicates the reliability of correct classification for each segment (resp. secret exponent bit). The number of free parameters in the clustering algorithm can be chosen. We assume that the cluster shapes are mainly defined by Gaussian distributed noise. Additionally, we assume the noise to be independent of the processed bit value. Hence, we choose to estimate two means and one joint full covariance matrix.
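This model (two means, one shared full covariance) corresponds, for instance, to scikit-learn's Gaussian mixture with a 'tied' covariance; a minimal sketch under that assumption:

```python
from sklearn.mixture import GaussianMixture

def em_cluster(X):
    """EM clustering with two Gaussian clusters sharing one joint full
    covariance matrix ('tied'), as assumed above.

    X: (n x i) reduced trace-segments (selected principal components).
    Returns hard labels and per-segment class-membership probabilities.
    """
    gmm = GaussianMixture(n_components=2, covariance_type='tied',
                          n_init=10, random_state=0).fit(X)
    return gmm.predict(X), gmm.predict_proba(X)
```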

Multiple simultaneous measurement channels are combined by concatenating the trace-segments from different channels which correspond to the same exponent bits [1, 8]. PCA is applied to each side-channel measurement channel separately before concatenation. For example, segments \(\varvec{M}_{\mathrm {PCA},k: k+ i}^1\) from measurement channel 1 are combined with segments \(\varvec{M}_{\mathrm {PCA},k: k+ i}^2\) from channel 2, leading to combined segments \(\varvec{M}_{\mathrm {PCA},k: k+ i}^{ combined } = (\varvec{M}_{\mathrm {PCA},k: k+ i}^1 , \varvec{M}_{\mathrm {PCA},k: k+ i}^2)\). An attacker would rather use the same values for k and i in all channels because testing different values of k and i for every channel without profiling significantly increases the attack complexity (e.g. repeating the clustering process \((20 \cdot 5)^3\) times).
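In code, this combination is a simple feature-wise concatenation; a sketch assuming all channels were reduced with the same k and i:

```python
import numpy as np

def combine_channels(reduced_segments):
    """Concatenate PCA-reduced segments of several channels along the
    feature axis; rows (exponent bits) must correspond across channels.

    reduced_segments: list of (n x i) arrays, one per channel.
    """
    return np.hstack(reduced_segments)

# Example: X = combine_channels([X1, X2, X3]) for three probes.
```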

3.3 Classification Errors and Required Brute-Force Complexity

If the recovered exponent is incorrect, the faulty bits need to be identified, which is usually hard. As described by Heyszl et al. [8], an attacker can use the bits' probabilities of correctness to judge which bits need to be trialled and follow a simple strategy to enumerate possible keys. This strategy leads to an estimated remaining brute-force complexity which we use to assess practical attack outcomes. Better, even optimal, key enumeration strategies [23, 24] will result in less required brute-force if the attacker applies them. However, the typically large key sizes in asymmetric cryptography make the application of such algorithms challenging for attackers as well as evaluators. The brute-force complexity used here instead can be seen as an upper bound for the rank of the correct key as derived from an optimal enumeration.

We chose the silhouette index score [19] as an indicator of the bits' error probability. It is based on the cumulative distance of each trace-segment to the other trace-segments of each cluster. The silhouette index is calculated for every \(\varvec{m}_{\mathrm {PCA}}\), which corresponds to one row of \(\varvec{M}_{\mathrm {PCA},k: k+ i}\), with \(\mathcal {C}_1\) being the set of trace segments \(\varvec{t_d}\) of the same cluster as \(\varvec{m}_{\mathrm {PCA}}\) (determined by the expectation maximization algorithm) and \(\mathcal {C}_2\) being the set of trace segments belonging to the other cluster. With the distance function \(dist(a, b)\) (we use the Euclidean distance due to the Gaussian noise assumption), the silhouette index s is computed as:

$$\begin{aligned} s(\varvec{m}_{\mathrm {PCA}}, \mathcal {C}_1, \mathcal {C}_2) = \frac{f(\mathcal {C}_2, \varvec{m}_{\mathrm {PCA}}) - f(\mathcal {C}_1, \varvec{m}_{\mathrm {PCA}}) }{\max (f(\mathcal {C}_1, \varvec{m}_{\mathrm {PCA}}),f(\mathcal {C}_2, \varvec{m}_{\mathrm {PCA}}))} \end{aligned}$$
(1)
$$\begin{aligned} f(\mathcal {C}, \varvec{m}_{\mathrm {PCA}}) = \frac{1}{|\mathcal {C}|} \sum _{x \in \mathcal {C}} dist(x, \varvec{m}_{\mathrm {PCA}}) \end{aligned}$$
(2)
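A direct per-segment implementation of Eqs. (1) and (2) in Python/NumPy could look as follows (a sketch; for simplicity, f over the segment's own cluster includes its zero distance to itself):

```python
import numpy as np

def silhouette_scores(X, labels):
    """Per-segment silhouette index following Eqs. (1) and (2): f is
    the mean Euclidean distance of a segment to all members of a
    cluster; C1 is the segment's own cluster, C2 the other one.
    Low s marks unreliably classified bits.
    """
    n = X.shape[0]
    s = np.empty(n)
    for idx in range(n):
        own = labels == labels[idx]
        d = np.linalg.norm(X - X[idx], axis=1)  # dist(x, m_PCA)
        f_own = d[own].mean()        # includes the zero self-distance
        f_other = d[~own].mean()
        s[idx] = (f_other - f_own) / max(f_own, f_other)
    return s
```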

After calculating the score for all \(n\) segments, the ones with the lowest s are brute-forced repeatedly, while including an increasing number of bits [8]. Let q be the last bit which is trialled until the correct exponent is found; then at most \(2^{(q+1+1)}\) different exponents have to be tested, which can be referred to as the remaining brute-force complexity after the attack [8]. One additional bit is included for the two possibilities of assigning labels to the two classes. It equals \(2^{(n+1+1)}\) at maximum and \(2^1\) at minimum.
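From the perspective of an evaluator who knows the correct exponent, this complexity can be computed directly; a sketch under the assumption that q is the 0-indexed rank of the last erroneous bit in the score ordering:

```python
import numpy as np

def brute_force_complexity(labels, key_bits, s):
    """Evaluator's metric: remaining brute-force complexity in bits.

    Bits are ranked by ascending silhouette score s; with q the rank of
    the last erroneous bit, at most 2^(q+1) exponents (times 2 for the
    two label assignments) must be tested. Returns log2 of that number,
    assuming the better of the two label assignments.
    """
    labels = np.asarray(labels, dtype=bool)
    key_bits = np.asarray(key_bits, dtype=bool)
    errs0 = labels != key_bits          # one label assignment
    errs1 = ~labels != key_bits         # the swapped assignment
    errs = errs0 if errs0.sum() <= errs1.sum() else errs1
    order = np.argsort(s)               # least reliable bits first
    wrong_ranks = np.nonzero(errs[order])[0]
    if wrong_ranks.size == 0:
        return 1                        # 2^1: only the label choice
    q = wrong_ranks.max()               # rank of last erroneous bit
    return (q + 1) + 1                  # log2 of 2^(q+1+1)
```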

4 Practical Evaluation

We present the first practical results from the simultaneous use of three high-resolution magnetic field probes. We chose a fixed geometric arrangement of the measurement probes close to the surface of an FPGA die and performed 400 measurements at different positions to gain conclusive insights from a high number of tests. We succeed in demonstrating the algorithmic improvement from our approach and derive conclusions about the benefit from using multiple channels simultaneously.

4.1 Design-Under-Test and Multi-probe Setup

Fig. 1. Geometric arrangement of measurement-probes on the FPGA die surface

As device under test, we use a Xilinx Spartan 3A FPGA (see Fig. 1a) which is configured with an Elliptic Curve Cryptography (ECC) design and performs a 163-bit elliptic curve scalar multiplication using a Montgomery ladder. This algorithm is a classical candidate for attacks against exponentiation algorithms since it processes the secret exponent bit-wise in \(n\) constant-time segments. As single-execution side-channel leakage about the consecutively processed exponent bits, we exploit location-based leakage, which is revealed by high-resolution measurements of the electromagnetic field [9].

After decapsulating the FPGA die (see Fig. 1a), we use an area of \({1700}\,{\upmu }\mathrm{m} \times {1700}\,{\upmu }\mathrm{m}\) on the surface of the die between bonding wires to place probes. We arrange three probes in a fixed formation and place them on 400 (\(20 \times 20\)) different positions within this area to be able to evaluate 400 data sets in our analysis. Figure 1 depicts the geometric arrangement of the probes from the side and from the top. The distance of the probes to the die surface is approximately \({100}\,{\upmu }\mathrm{m}\). We used three near-H-field (magnetic) probes with coil diameters of \({250}\,{\upmu }\mathrm{m}\), \({150}\,{\upmu }\mathrm{m}\) and \({100}\,{\upmu }\mathrm{m}\) which we had available in our laboratory. The bandwidth of the probes is \({6}\,\mathrm{GHz}\) with a built-in \({30}\,\mathrm{dB}\) amplifier. The signal is sampled synchronously to the device's clock at \(2.5\,\mathrm{{GS/s}}\). Contrary to other contributions [8, 18], no simple compression or pre-processing (e.g. averaging, maximum extraction or sum-of-squares during clock cycles) is applied before PCA and clustering. Such simple trace pre-processing techniques have been shown to have negative effects on results [10].

4.2 Quality of Principal Components

Fig. 2. Mean brute-force complexity for different selected principal components (k and i) over all measurement positions, including standard deviation as bars

Our algorithmic approach includes the selection of principal components after PCA as a first step before clustering. The selection is described by two parameters: k, the first selected component, and \(i \ge 1\), the number of consecutively selected components starting at the k-th one, as described in Sect. 3.1. In this section, we investigate the quality of different parameter choices. We executed the clustering-based attack on every single measurement from all three probes and 400 positions with choices of \(k \in [1, 20]\) and \(i \in \{1,2,4,6,9\}\) and assess the quality using the remaining brute-force complexity explained in Sect. 3.3.

Figure 2 shows the means over the \(3 \cdot 400\) resulting brute-force complexities for each combination of the parameters k and i. This allows us to compare the results of different probes on an equal footing and reveals some fundamental properties of our measurements. These high mean brute-force complexities of \({>}100\) bits are certainly not within the range of realistic computing capabilities. They result from including many low-scoring results. The standard deviations are shown as vertical bars and indicate that there are multiple results with significantly lower brute-force complexities (the diagram does not include the \(+1\) bit for assigning labels to classes). As an important observation, components of moderate rank (\(k < 10\), excluding the very first ones) seem preferable overall, and the first-ranked principal components do not contain exploitable leakage (see the curves with \(i=1\) or \(i=2\) in Fig. 2). This confirms our assumptions from Sect. 3.1 as well as similar observations by Batina et al. [3]. Thus, we discard first-ranked as well as low-ranked principal components before further analysis and achieve significantly improved brute-force complexities.

In Fig. 2, it can also be noted that component \(k=4\) seems to contain the most leakage on average, reaching the lowest mean brute-force complexities. It seems that PCA concentrates most of the exploitable leakage information into a single principal component. This means that the choice of \(i=1\) for the number of selected consecutive principal components led to the best results in our circumstances. We use this choice in the practical evaluation in Sect. 4.3. As another observation, curves with \(i>2\) lead to low complexities as soon as component 4 is included in the consecutively selected components. For illustrative purposes, we show the resulting principal components after the PCA transformation of an example trace in Appendix A.1.

Fig. 3. Brute-force complexity occurrences over different principal components (Color figure online)

4.3 Analyzing Separate Channels

For every probe, we have 400 measurements from different positions. We analyze the data from the three available channels separately: firstly, we perform pre-processing by applying PCA; secondly, we perform clustering using the expectation maximization algorithm; and thirdly, we compute the remaining brute-force complexity (this per-channel pipeline is sketched in code below). For every probe, and for every selection of principal components (every \(k \in [1, 20]\) with \(i=1\)), we summarize the results of the 400 tests in Fig. 3a, 3b, and 3c. Figure 3a shows results for the \({100}\,{\upmu }\mathrm{m}\) coil probe, Fig. 3b for the \({150}\,{\upmu }\mathrm{m}\) coil probe and Fig. 3c for the \({250}\,{\upmu }\mathrm{m}\) coil probe. The figures show, for every selection of k, how many of the 400 measurements of each probe lead to which brute-force complexity. The occurrence rate is visually indicated by the size of the respective dots: bigger dots mean that the corresponding brute-force complexity occurred more often. For example, in Fig. 3a, almost all of the 400 measurements lead to the maximum brute-force complexity of 163 for \(k < 5\) and \(k > 10\). For \(k=5\), however, many measurements lead to lower resulting brute-force complexities, some even reaching the minimum. The red dashed line highlights the 32 bit complexity level up to which all outcomes are easily manageable for attackers through computation.
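Putting the pieces from Sect. 3 together, the per-channel analysis can be sketched as follows (reusing the hypothetical helper functions from above; key_bits denotes the correct exponent bits, which only an evaluator knows):

```python
# Per-channel attack pipeline for one measurement (sketch).
U, _ = pca_transform(M)                     # Sect. 3.1: PCA via SVD
X = select_components(U, k=4, i=1)          # Sect. 3.1: component selection
labels, probs = em_cluster(X)               # Sect. 3.2: EM clustering
s = silhouette_scores(X, labels)            # Sect. 3.3: reliability scores
bf_bits = brute_force_complexity(labels, key_bits, s)  # evaluator's metric
```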

As an important finding, the probe with the \({150}\,{\upmu }\mathrm{m}\) coil diameter, depicted in Fig. 3b, leads to the best results by far. For the principal component \(k=4\), an astonishing \(56\,\%\) of the 400 measurements led to a remaining brute-force complexity \({\le } 32\) bit (summing up all outcomes at or below the red dashed line). This high number was unexpected and means that, with the improved algorithmics, more than half of all measurement positions exhibited sufficient leakage for a complete break. The \({100}\,{\upmu }\mathrm{m}\) probe depicted in Fig. 3a leads to only 3 % of cases \({\le } 32\) bit for \(k=5\). Also, the \({250}\,{\upmu }\mathrm{m}\) probe depicted in Fig. 3c only leads to 3 % of cases \({\le } 32\) bit for \(k=8\). Hence, the \({150}\,{\upmu }\mathrm{m}\) coil probe seems to work best under our circumstances. Since finding suitable measurement positions is rather easy (using the best probe), attackers should test different measurement positions instead of employing extensive computational brute-force. Such testing is comparably easy in the case of single-execution attacks because only a single measurement needs to be analyzed at every position.

Without knowing \(k=4\) and \(i=1\) a priori, attackers could make minimal heuristic assumptions like \(k \in [3, 10]\) and \(i \in \{1,4,9\}\), which could fit similar circumstances. This would result in an additional brute-force complexity of \(+4\) bits, which is not included in Fig. 3 and is justified by the significantly improved results.

To demonstrate the improvement of our proposal, we performed the original attack by Heyszl et al. [8] on the same measurements. A remaining brute-force complexity \({\le } 32\) bit is reached in none (0 %) of the 400 measurement cases using the \({150}\,{\upmu }\mathrm{m}\) coil probe. Compared to the 56 % achieved by the improved attack, this shows a striking improvement from applying PCA and expectation maximization clustering. (Only the \({250}\,{\upmu }\mathrm{m}\) coil probe led to marginally better results using the previous method, i.e., 8 % instead of 3 % of the cases \({\le } 32\) bit; however, in our opinion this does not invalidate the previous statement.)

We also compared the performance of the k-means and the expectation maximization clustering algorithms on single channels. Since we only select single components (\(i=1\)) after PCA, single channels consist of only one dimension and there is little benefit from more free parameters in the clustering algorithm. This is confirmed by the fact that expectation maximization and k-means clustering lead to almost equal results. This means that our reported improvement is mainly due to the PCA transformation and the selection of components. In the multi-channel case, however, more dimensions aggregate from the separate channels, making expectation maximization the more suitable choice.

As a benchmark for the high-resolution magnetic field measurements, we tested the improved non-profiled attack on a current consumption measurement. We use a \({1}\,\Omega \) measurement resistor and a differential probe at an unchanged sampling rate. To cancel one-time effects such as disturbances or noise, we repeated this 12 times and averaged the results. The outcome is a very high brute-force complexity of 152 bits. Hence, it is practically impossible to exploit the leakage from such current measurements. This underlines that high-resolution magnetic field measurements are clearly superior in leakage signal quality in our circumstances.

4.4 Combining Multiple Channels

After the individual analysis of the three measurement channels, we combined the channels for analysis as described in Sect. 3.2. The motivation for attackers to combine channels is to increase the exploitable leakage to improve attack outcomes, e.g. instead of trying to find better measurement positions.

Figure 3d shows the brute-force complexity results for the combined measurements in the same way as described in Sect. 4.3. A visual comparison of the combined results in Fig. 3d to the individual results in Fig. 3a, b, and c gives the impression that the overall result is comparable to Fig. 3b. However, expressed quantitatively in the same way as before, the combined channels lead to a remaining brute-force complexity of \({\le } 32\) bit in only 52 % of the cases for \(k=4\). Hence, as an important result, instead of an improvement we observe a slight degradation compared to the best individual case, which led to 56 % of cases \({\le } 32\) bit. This means that the described clustering-based non-profiled attack is unable to benefit from a combination of channels (in our circumstances).

We suspect that this is due to the fact that our selection strategy uses equal values of k and i to pre-process all three channels in the same way in case of combined attacks. This should be a significant disadvantage in our case, where different k are best for different channels (see the results in Sect. 4.3). Unfortunately, testing different values of k for every channel would significantly increase the complexity (e.g. repeating the clustering \(20^3\) times). Increasing the number of selected components i to circumvent this would include more noise in our circumstances, which in turn would degrade the classification results significantly (see how curves with \(i > 1\) result in higher mean values in Fig. 2).

We compared the improved non-profiled attack against a profiled template attack. This requires one additional trace for profiling at each position and for every probe. Templates consist of two means and a single full covariance matrix. To derive the remaining brute-force complexity as described in Sect. 3.3, we use bit-wise template matching results (a matching sketch follows below). For a fair comparison, we also apply PCA, including the selection strategy for k and i. A higher share of \(61\,\%\) of the positions (compared to 56 % for the non-profiled attack) leads to remaining brute-force complexities \({\le } 32\) bit for the \({150}\,{\upmu }\mathrm{m}\) coil probe, with \(i=1\) and \(k=4\). The profiled template attack thus outperforms the non-profiled attack. As the most important observation, we find that the combination of channels leads to an improved \(66\,\%\) of the cases with a remaining brute-force complexity \({\le } 32\) bit, with \(i=9\) and \(k=3\). This clearly demonstrates the gain of combining channels in the profiled setting.
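A hedged sketch of such bit-wise template matching with two means and one pooled full covariance, assuming PCA-reduced profiling segments X_profile (as NumPy arrays with known bits bits_profile) and attack segments X_attack; all names are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def template_match(X_profile, bits_profile, X_attack):
    """Build templates (two means, one pooled full covariance) from a
    profiling trace and match each attack segment bit-wise.
    Returns the recovered bits and both log-likelihoods per segment.
    """
    mu0 = X_profile[bits_profile == 0].mean(axis=0)
    mu1 = X_profile[bits_profile == 1].mean(axis=0)
    centered = np.where(bits_profile[:, None] == 0,
                        X_profile - mu0, X_profile - mu1)
    cov = np.cov(centered, rowvar=False)     # pooled covariance
    ll0 = multivariate_normal.logpdf(X_attack, mean=mu0, cov=cov)
    ll1 = multivariate_normal.logpdf(X_attack, mean=mu1, cov=cov)
    return (ll1 > ll0).astype(int), ll0, ll1
```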

In a profiled setting, attackers are able to test and find the ‘best’ measurement positions. This means that, in our circumstances, the use of multiple channels is only reasonable if the leakage of such best single channels is insufficient which diminishes the good results to a certain extent.

5 Conclusion

We significantly improved the algorithmic approach for non-profiled attacks against exponentiations by applying PCA and disregarding high- as well as low-ranked components following a simple selection strategy. This selection strategy requires some trying-out (additional brute-force), but this is highly rewarded by improved attack outcomes in terms of low brute-force complexities. With this approach, the unsupervised attack using a single-channel high-resolution magnetic field measurement is remarkably threatening and leads to manageable brute-force levels at over half of the tested measurement positions. This emphasizes the need to prevent all possible causes of exploitable single-execution leakage. Regarding our results from three simultaneous channels, we find that the combination of channels only improves the attack results significantly if a profiled attack is used. In case of the clustering-based, non-profiled attack, the results from the combination are only comparable to the best individual channel. In profiled settings, attackers are also able to look for the ‘best’ measurement positions. Hence, multi-channel attacks are only reasonable if the exploitable leakage is insufficient at such best positions.