Differential Power Analysis - Securing implementations of feedback-shift-register-based ciphers


2.1.2 Differential Power Analysis

Differential Power Analysis (DPA) attacks are the most popular type of power analysis attack. Their main advantage is that they do not require detailed knowledge of the attacked device. Moreover, DPA attacks can reveal the secret key even if the recorded power traces are extremely noisy. However, DPA has some drawbacks. First, the attacker needs several power traces of the algorithm executing on the device under attack. The exact number of traces depends on the noise of the signal and the leakage of the implementation, but collecting them usually requires prolonged physical access to the device. Second, the traces need to be perfectly synchronized.

DPA was first introduced by P. Kocher et al. [99, 101]. This family of attacks is also known generically as Differential Side-Channel Attacks (DSCA), DPA being the name of the first such attack.

Chapter 2. Related work

There are two main stages in any DSCA attack: data acquisition and data analysis.

In the data acquisition stage, the attacker executes the algorithm several times on the device under attack. For each execution, the attacker must save the data related to the execution and its power trace, together with a common reference that enables the synchronization of the traces.

In the data analysis stage, the attacker checks which candidate key best matches the collected traces. This is not done on the whole trace at once, but at a specific instant of the trace where the power consumption depends on an intermediate value (v) that is a function of known data and part of the key (k). Consequently, the first step is to choose an instant j for the attack that corresponds to a known intermediate state of the algorithm. For each possible key k, the associated v can be calculated from the known input data by applying the algorithm. The attacker thus has a vector of measured power values and, for each possible k, a vector of intermediate values. The attacker then estimates the hypothetical leakage that each possible intermediate value would generate. The final step is to find the key whose estimated leakage statistically best fits the measured power traces.
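As a minimal sketch of the hypothesis step, assume a toy target whose intermediate value is v = SBOX[p XOR k], with a 4-bit S-box and a Hamming-weight leakage model. The S-box table, the function names and the choice of leakage model are illustrative assumptions and are not taken from the attacks cited above.

```python
# Toy 4-bit S-box (illustrative assumption, not a cipher discussed in the text).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    """Number of bits set in x (the assumed leakage model)."""
    return bin(x).count("1")


def hypothetical_leakage(plaintexts, key_guess):
    """Estimated leakage of v = SBOX[p XOR k] for every known input p."""
    return [hamming_weight(SBOX[p ^ key_guess]) for p in plaintexts]


# One vector of hypothetical leakages per candidate key k.
plaintexts = [0x0, 0x5, 0xA, 0xF]
hypotheses = {k: hypothetical_leakage(plaintexts, k) for k in range(16)}
print(hypotheses[0x3])  # [3, 2, 3, 1]
```

Each of the 16 vectors is then compared against the measured power values at the chosen instant j; the statistical test used for that comparison is what distinguishes the attacks described next.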

In the original DPA attack the difference of means method is used. The real power traces are split into two sets using a selection function, which assigns a power trace to one set, S0, when the selected bit of the intermediate value is 0 and to the other, S1, when it is 1. For every possible part of the key there are two subsets S0 and S1. The result of the attack is a differential trace for every possible key candidate k, with one value for every instant j in the power trace. In the differential trace for a key guess, peaks appear at instants where the traces within each subset are related to each other and the two subsets differ (data-dependent instants). If the instant is not data dependent, an almost-zero value can be expected: a relationship between the traces of the same subset exists, but it is similar in both subsets. If the instant is data dependent but the key guess is not correct, the relationship within each subset should be weaker and similar in both of them. Consequently, the partial key guess with the highest value of the difference DT_j is the best candidate. If possible, the selection function and the instant j should correspond to mutually independent intermediate values in order to be sure that the greatest value of DT_j corresponds to the correct hypothesis [24].
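The difference of means method can be sketched as follows. This is a simulation under stated assumptions, not Kocher's original setup: the toy 4-bit S-box, the Hamming-weight leakage at a single sample index `poi`, the noise level and all function names are hypothetical.

```python
import random

# Toy 4-bit S-box and Hamming-weight leakage (illustrative assumptions).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    return bin(x).count("1")


def simulate_traces(key, rounds=50, n_samples=20, poi=7, sigma=0.2, seed=1):
    """Noise at every sample, plus HW(SBOX[p ^ key]) at sample index `poi`."""
    rng = random.Random(seed)
    plaintexts, traces = [], []
    for i in range(16 * rounds):
        p = i % 16  # cycle through all inputs so both subsets stay balanced
        trace = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        trace[poi] += hw(SBOX[p ^ key])
        plaintexts.append(p)
        traces.append(trace)
    return plaintexts, traces


def differential_trace(plaintexts, traces, key_guess, bit=1):
    """Mean of S1 minus mean of S0 at every instant j, for one key guess."""
    n_samples = len(traces[0])
    sums = [[0.0] * n_samples, [0.0] * n_samples]
    counts = [0, 0]
    for p, trace in zip(plaintexts, traces):
        b = (SBOX[p ^ key_guess] >> bit) & 1  # selection function
        counts[b] += 1
        for j, value in enumerate(trace):
            sums[b][j] += value
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0]
            for j in range(n_samples)]


def dpa_attack(plaintexts, traces, bit=1):
    """Return the key guess with the highest peak DT_j, and all peaks."""
    peaks = {k: max(differential_trace(plaintexts, traces, k, bit))
             for k in range(16)}
    return max(peaks, key=peaks.get), peaks


secret_key = 0x9
plaintexts, traces = simulate_traces(secret_key)
best_guess, peaks = dpa_attack(plaintexts, traces)
print(hex(best_guess))
```

The sketch ranks the guesses by the signed peak, following the "highest value of DT_j" criterion. With this particular toy S-box some wrong guesses produce strong negative peaks, so ranking by absolute value would be less reliable here.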

Kocher assumed that if the key guess is incorrect, the differential trace should approach a flat trace, but researchers such as T. Messerges [126] have noticed that “the actual trace may not be completely flat, as D with Kk incorrect may have a weak correlation to D with Kk correct”. Moreover, it has been shown [24] that this correlation can even be strong, and that it depends on the attacked algorithm and the selection function.

Although we have seen a selection function for one instant and one data bit, multiple bits and multiple instants can be used in order to increase the difference between the correct key guess and the incorrect ones. For instance, Messerges et al. [126] use d-bit data and two sets: they assign the traces whose intermediate value has the greater Hamming weight (HW) to S1 (H(V_ij) ≥ d/2) and the rest to S0 (H(V_ij) < d/2). In [24], Bevan improves the DPA attack using a 4-bit D function.

2.1. Power Analysis Attacks

[Figure 2.1: CPA attack results. (a) Simulated power trace: simulated leakage (HW) versus clock cycles. (b) Correlation coefficient (ρ) versus number of traces used (0 to 200), with the correct key guess in black.]

Instead of deciding the key value when the four selection functions agree, the sum of the four differences of means is used to reach a solution faster. This is possible because every bit influences the power consumption at the same time. Instead of always using a binary division, [7] uses d-bit attacks and divides the set of traces into d + 1 subsets. In [107] there is a formal definition of the differential trace which includes the mentioned attacks as particular cases, obtained by assigning values to the coefficients a_c in

$$DT = \sum_{c=0}^{d} a_c \, \frac{\sum_{S_c} S_{ij}}{|S_c|} \qquad (2.1)$$
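A small worked example may make the role of the coefficients clearer. The sketch below evaluates the weighted sum of subset means at a single instant j; the function name, the sample values and the idea of indexing the d + 1 subsets by Hamming weight are assumptions made for illustration.

```python
def differential_value(samples, classes, weights):
    """DT = sum over c of a_c * mean(S_c), evaluated at one instant j.

    samples[i] is the power sample of trace i at instant j, classes[i] is the
    subset index c (0..d) assigned to trace i, weights[c] is the coefficient a_c.
    """
    totals = [0.0] * len(weights)
    sizes = [0] * len(weights)
    for sample, c in zip(samples, classes):
        totals[c] += sample
        sizes[c] += 1
    # Empty subsets contribute nothing (a choice made for this sketch).
    return sum(a * totals[c] / sizes[c]
               for c, a in enumerate(weights) if sizes[c])


# Binary split (d = 1): a = (-1, +1) gives the plain difference of means.
print(differential_value([1.0, 3.0, 2.0, 6.0], [0, 0, 1, 1], [-1, 1]))  # 2.0

# d = 2: three subsets, here indexed by Hamming weight 0, 1 and 2.
print(differential_value([1.0, 2.0, 3.0, 5.0], [0, 1, 1, 2], [-1, 0, 1]))  # 4.0
```

With two subsets and coefficients (-1, +1) the expression reduces to the original difference of means, which is how the binary attacks appear as particular cases.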

Correlation Power Analysis (CPA) is the name of a DSCA attack that uses the correlation coefficient instead of the difference of means as its statistical method, in order to solve the problem of the “ghost peaks”: peaks obtained with erroneous key guesses whose intermediate values partially correlate with the correct ones. It was first proposed in [36].

According to [107], CPA requires only 15% of the traces required by DPA, and it solves the problem of “ghost peaks”.
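A minimal CPA sketch, assuming the same kind of toy target as a simulation: a 4-bit S-box, Hamming-weight leakage with Gaussian noise, and a single attacked instant. The S-box, the noise level and the function names are illustrative assumptions.

```python
import math
import random

# Toy 4-bit S-box and Hamming-weight leakage (illustrative assumptions).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cpa_attack(plaintexts, leakages):
    """Correlate the HW hypothesis of every key guess with the measurements."""
    scores = {k: pearson([hw(SBOX[p ^ k]) for p in plaintexts], leakages)
              for k in range(16)}
    return max(scores, key=scores.get), scores


# Simulated measurements at the attacked instant: HW leakage plus noise.
rng = random.Random(2)
secret_key = 0x6
plaintexts = [i % 16 for i in range(320)]
leakages = [hw(SBOX[p ^ secret_key]) + rng.gauss(0.0, 0.5) for p in plaintexts]

best_guess, scores = cpa_attack(plaintexts, leakages)
print(hex(best_guess), round(scores[best_guess], 2))
```

In this toy setting the guess `secret_key ^ 0x6` yields a strongly negative coefficient, because its hypothetical leakage is anti-correlated with that of the correct key. The sketch therefore keeps the sign of the coefficient instead of ranking by absolute value.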

Figure 2.1 shows the results of a sample CPA attack on an ASIC AES implementation. The graph on the left shows the correlation of all K = 256 subkey candidates with the measurement results as a function of the number of measured samples S. On the right, the correlation of all K = 256 subkey candidates is given for 10 000 measurements.

The combination of CPA with the described collision attacks is presented in [130] and [32]. In the presence of noise, the combined attack is more efficient than either attack executed individually.

Mutual Information Analysis (MIA) is a DSCA attack that applies information theory to develop a powerful attack without any device characterization [76]. The goal of the attack is to reduce the complexity and the assumptions of the models needed by previous side-channel attacks (SCA). The experimental setup is similar to that of other DSCA, but the distinguisher uses only generic assumptions and is therefore more widely applicable, although it might require more measurements to succeed.

MIA uses the concepts of entropy and mutual information between random variables. These random variables can be described using the probability density function (pdf).


The pdf is a function that represents the probability of the random variable X taking a given value x from a space 𝒳.

The entropy of a random variable is its uncertainty when an experiment is performed. The random variables used in MIA are L, representing the hypothetical or estimated leakage at the point of interest, and O representing the measurements or observations made.

Given these random variables, according to Information Theory definitions [58] the entropy H[L] is a measure of the uncertainty of the estimated leakage when an experiment is performed.

In MIA, Ĥ[O|L] is the estimated uncertainty of the observed measurements given an estimated leakage. It can be calculated for all the possible estimated leakages, depending on the possible key values.

The mutual information I(X;Y) expresses the dependence between two random variables X and Y. It represents the amount of information acquired about random variable Y by knowing X.
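For reference, the standard information-theoretic definitions for discrete random variables can be written as below. The last line instantiates them with the MIA variables L and O; the use of base-2 logarithms (bits) is a convention assumed here.

```latex
\begin{align}
H[X] &= -\sum_{x \in \mathcal{X}} \Pr[X = x] \, \log_2 \Pr[X = x] \\
H[X \mid Y] &= \sum_{y \in \mathcal{Y}} \Pr[Y = y] \, H[X \mid Y = y] \\
I(X;Y) &= H[X] - H[X \mid Y] = H[Y] - H[Y \mid X] \\
I(L;O) &= H[O] - H[O \mid L]
\end{align}
```

The mutual information is symmetric and equals zero exactly when the two variables are independent, which is why it can serve as a distinguisher: a wrong key hypothesis should make L less informative about O.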

MIA is divided into two stages: estimation of the pdfs and calculation of the probability distance. The pdf of O is estimated from the measurements. The pdf of the estimated leakage must be calculated for every considered value of the key, at least at the point of interest.

The calculation of the probability distance consists in measuring the dependence between every possible L and O. The estimated leakages that diverge least from the measurements most probably correspond to the correct key.

This generic toolbox for MIA was presented in [182]. The original MIA presented in [76] followed this structure, using histograms in the first stage and the mutual information as the distance measure. The mutual information is obtained by calculating Ĥ[O] and Ĥ[O|L] from the estimated pdfs.
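The two stages can be sketched with a histogram estimator as follows. This is a simplified illustration and not the implementation of [76]: the toy 4-bit S-box, the equal-width binning, the number of bins, the simulated noise and the function names are all assumptions.

```python
import math
import random
from collections import Counter

# Toy 4-bit S-box and Hamming-weight labels (illustrative assumptions).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    return bin(x).count("1")


def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def mutual_information(labels, observations, n_bins=8):
    """Histogram estimate of I(L;O) = H[O] - H[O|L]."""
    lo, hi = min(observations), max(observations)
    width = (hi - lo) / n_bins or 1.0
    bins = [min(int((o - lo) / width), n_bins - 1) for o in observations]
    h_o = entropy(Counter(bins).values())
    h_o_given_l = 0.0
    for label, count in Counter(labels).items():
        in_class = Counter(b for b, l in zip(bins, labels) if l == label)
        h_o_given_l += count / len(labels) * entropy(in_class.values())
    return h_o - h_o_given_l


def mia_attack(plaintexts, observations):
    """Score every key guess by the estimated mutual information."""
    scores = {k: mutual_information([hw(SBOX[p ^ k]) for p in plaintexts],
                                    observations)
              for k in range(16)}
    return max(scores, key=scores.get), scores


# Simulated observations at the point of interest.
rng = random.Random(3)
secret_key = 0xB
plaintexts = [i % 16 for i in range(480)]
observations = [hw(SBOX[p ^ secret_key]) + rng.gauss(0.0, 0.1)
                for p in plaintexts]

best_guess, scores = mia_attack(plaintexts, observations)
print(hex(best_guess))
```

Hamming-weight labels are used here because labelling the traces with the S-box output itself would not work: the S-box is a bijection, so every key guess would induce the same partition of the traces and therefore the same score.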

The problem of estimating a pdf from a limited number of random samples is well studied. Several proposals for MIA are described in [12, 185], dividing the methods into non-parametric methods, which require no assumption on the leakage model, and parametric methods, which perform better than non-parametric methods when the assumptions are correct.

The non-parametric methods include histogram-based estimators, adaptive partitioning of the XY plane, the kernel density estimator (KDE), the B-spline estimator, the k-nearest neighbours (kNN) estimator and the wavelet density estimator (WDE). Both KDE and WDE outperform the histogram methods according to [185].

The parametric methods include the Bayesian estimator, Edgeworth estimators, the maximum likelihood (ML) estimator, and the least squares estimator. These model-fitting methods require the attacker to establish the family of distributions that best fits the leakage random variable O.

MIA is more suitable than CPA when the attacker’s leakage model is sufficiently imprecise.


Attack  Distinguisher                    Requirements                          Model dependency  Efficacy
------  -------------------------------  ------------------------------------  ----------------  --------
DPA     Difference of means              Leakage model estimation              High              Low
CPA     Pearson correlation coefficient  Leakage model estimation              High              High
MIA     Mutual information               Leakage probability density function  Low               Medium

Table 2.2: DSCA summary

CPA is the most efficient attack when the attacker estimates a leakage model that matches the leakage model of the target. Therefore, the evaluation framework we propose, described in Chapter 3, considers using a simulator with the same leakage model used in the attacks. In Section 3.4 we describe the DSCA evaluation method used, which is based on CPA.

However, depending on the algorithm, DSCA might produce multiple “ghost peaks” when there is a high correlation between the target intermediate value and other intermediate values.

In [140] the authors propose Euclidean DPA (EDPA), which improves the results of CPA by taking “ghost peaks” into account. The proposed method is compatible with any DSCA. It makes use of the information leaked by the ghost peaks to diminish the ghost peaks themselves. The DSCA result is s_{i,j}, the result for key candidate k_i at time t_j. Besides the selected DSCA result s_{i,j}, the attack includes a correlation between the hypothetical power consumption for the different candidate keys, which they call inter-data correlation, c_{p,i}, between key candidates p and i.
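To illustrate what an inter-data correlation looks like, the sketch below computes the correlation between the hypothetical leakages of every pair of key candidates. It does not implement EDPA itself; the toy 4-bit S-box, the Hamming-weight model and the function names are assumptions.

```python
import math

# Toy 4-bit S-box and Hamming-weight leakage (illustrative assumptions).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def inter_data_correlation(plaintexts, n_keys=16):
    """c[p][i]: correlation between the hypotheses of key candidates p and i."""
    hyp = [[hw(SBOX[x ^ k]) for x in plaintexts] for k in range(n_keys)]
    return [[pearson(hyp[p], hyp[i]) for i in range(n_keys)]
            for p in range(n_keys)]


# Evaluate over all 16 inputs; no measurement is involved at this point.
c = inter_data_correlation(list(range(16)))
print(round(c[0][6], 2))  # -0.75
```

The matrix depends only on the algorithm and the leakage model, not on the traces, so it can be computed before the attack. Large off-diagonal entries such as the one printed above indicate pairs of candidates that are expected to produce ghost peaks.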

In contrast to CPA attacks, where only the highest correlation value is used to indicate the correct key hypothesis, EDPA attacks determine the correct key based on the correlation values of all key candidates. Euclidean similarity is used to ensure that higher CPA values contribute more than lower ones. The Euclidean similarity values are used to tune the DSCA result, reducing the peaks that are not significant in both. The final result is obtained by combining the traditional DSCA with the Euclidean similarity.

We consider that the attacker has great knowledge about the implementation. Therefore, the attack can be focused only on a time window in which the target intermediate value is manipulated, in order to avoid the effect of “ghost peaks”. However, in Section 3.4 we propose Differential CPA (DCPA) for cases where it is difficult to avoid the effect of “ghost peaks”. DCPA reduces the effect of inter-data correlation with a simpler approach than EDPA.


Higher Order DPA

The commonly suggested way to fight against first-order power analysis is random masking (see Section 2.2.3). It is known that masking can be defeated if the attacker knows how to correlate power consumption more than once per computation. This is known as second-order, or more generally higher-order, power analysis and was originally suggested by Messerges in [125]. These attacks are known to be more complex and delicate to carry out because they usually require the attacker to have a deeper knowledge of the device, although this might be alleviated in particular cases [183].

k-order DPA attacks generalize (first-order) DPA attacks by simultaneously considering k samples, within the same power consumption trace, that correspond to k different intermediate values. According to [55], the k leakage signals can be combined into one signal on which a first-order DPA is performed (“combining higher order DPA”), or an expert attacker can profile the implementation leakage (“profiling higher order DPA”). Once computed, the profile is used to launch an optimal probabilistic attack. In [55] both techniques are applied successfully to a third-order masked implementation that claimed to be secure [158]. There is a preliminary problem for higher-order attacks against masking countermeasures, which is knowing the instants (i and j) at which the data is manipulated. They can be found by an exhaustive search over a small time window [136]. Other methods evaluate the variance between executions of the encryption on the same input data [109, 75], assuming that the variance is only due to the manipulation of masked values, which are masked with a different value in each execution. If a chosen-plaintext scenario is not feasible, there is a generic method based on MIA [152] to reduce the exhaustive search proposed in [136]. It consists in evaluating the mutual information between the different instants of the small time window and selecting the tuples with the highest values.

In case the adversary only knows the offset δ = i − j (but neither i nor j), the previous attack can be extended as a “known-offset second-order DPA attack” [183]. The adversary evaluates the second-order differential trace.

Again, under certain assumptions, the second-order DPA trace exhibiting the highest DPA peak will likely uncover the value of the intermediate result.
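A known-offset second-order attack can be sketched on simulated data as follows. This is not the exact statistic of [183]: the first-order Boolean masking scheme, the squared-difference combining function, the correlation-based distinguisher, the toy 4-bit S-box and all names are assumptions chosen for illustration. The masks are enumerated exhaustively only to keep the toy data set balanced; a real device would draw them at random.

```python
import math
import random

# Toy 4-bit S-box and Hamming-weight leakage (illustrative assumptions).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    return bin(x).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def simulate_masked(key, delta=3, j_mask=4, n_samples=12, reps=4,
                    sigma=0.1, seed=7):
    """The mask m leaks at j_mask, the masked value v ^ m at j_mask + delta."""
    rng = random.Random(seed)
    plaintexts, traces = [], []
    for _ in range(reps):
        for p in range(16):
            for m in range(16):  # the attacker never sees m
                trace = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
                trace[j_mask] += hw(m)
                trace[j_mask + delta] += hw(SBOX[p ^ key] ^ m)
                plaintexts.append(p)
                traces.append(trace)
    return plaintexts, traces


def second_order_attack(plaintexts, traces, delta):
    """Combine samples j and j + delta, then correlate with HW(v) per guess."""
    n = len(traces[0]) - delta
    combined = [[(t[j + delta] - t[j]) ** 2 for t in traces] for j in range(n)]
    best = None
    for k in range(16):
        hyp = [hw(SBOX[p ^ k]) for p in plaintexts]
        for j in range(n):
            rho = pearson(hyp, combined[j])
            if best is None or rho > best[0]:
                best = (rho, k, j)
    return best  # (peak value, key guess, instant j)


secret_key = 0xD
plaintexts, traces = simulate_masked(secret_key)
rho, key_guess, instant = second_order_attack(plaintexts, traces, delta=3)
print(hex(key_guess), instant)
```

The squared difference is used because, under these assumptions, its expected value grows linearly with the Hamming weight of the unmasked intermediate value, so a first-order distinguisher applied to the combined trace exposes the key even though neither sample leaks it on its own.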

DPA is not the only DSCA that has been extended to multivariate analysis. MIA easily generalizes to multivariate statistics [16], enabling practical attacks on implementations with masking countermeasures, and hence the attacker does not need to worry about how to combine the leakages.

Stochastic methods have also been extended to multivariate analysis to attack protected implementations. In [60] the authors complete a practical attack using linear regression techniques, assuming that the leakage can be expressed as a linear combination of functions chosen according to the nature of the target device and the algorithm under attack.

High-order attacks are proved to defeat masking countermeasures, although they might be difficult to apply in real scenarios. As we consider that the attacker has great knowledge about the implementation, a high-order attack would be feasible in our evaluation framework. We should consider high-order attacks if we use masking countermeasures. However, high-order attacks do not provide any benefit compared to
