Chapter 4 Security of Algorithms

4.3 Power Analysis Attacks

4.3.2 Differential Power Analysis

Differential power analysis (DPA) is a statistical attack that uses power consumption data from a large number of encryptions to retrieve secret information about the key. DPA has proved to be a powerful cryptanalysis technique that has been able to extract the secret key from several DES implementations [2]. The DPA algorithm is presented below [2]:

1. A set of N plaintexts is randomly generated.

2. The power consumption during the encryption of the N plaintexts is measured. The attacker gets N traces each containing n values.

3. A hypothetical model of the chip is fed with the plaintexts (or ciphertexts) and a guess at one byte of the first (or last) sub-key.

4. A selection function, D, is applied to the output of the hypothetical model which separates the traces into two sets.

5. The average of both sets is computed and the difference between the averages is calculated.

6. Steps 3 to 5 are repeated for each sub-key guess. This will give 2^8 differential traces.

7. For each differential trace the peak and mean values are determined and the ratio between the two is calculated.

8. For a correct sub-key guess there will be large peaks seen in an otherwise flat differential trace.

9. To get all the sub-keys, steps 2 to 8 are repeated 16 times (for a 128-bit key).

The choice of hypothetical model determines the section of the algorithm that is being attacked. It takes the input or output of that section, generally a section of the plaintext or ciphertext, and a guess at one byte of the relevant sub-key, and returns either the output or the input of that section. The selection function separates the plaintexts or ciphertexts, and therefore their associated power traces, into two sets.

Kocher’s original hypothetical model and selection function D(C, b, Ks) [2] attacked the left-hand intermediate at the beginning of the 16th round. Its inputs were the ciphertext, C; a 6-bit sub-key guess, Ks, used to predict the output; and a value between 0 and 31, b, representing which bit of the DES intermediate was being attacked. The selection function applies the ciphertext and sub-key guess to an inverse DES algorithm and returns either a 1 or a 0, depending on the value of the b-th bit that these values would give. Varying the value of b modulo 4 targets different sub-bytes of the key, as in DES there are 8 s-boxes each with a 4-bit output.

Kocher was using DPA to analyse DES. Schuster reported that while the original selection function used by Kocher on DES works with the AES power consumption model, it was unsuccessful with real test data [3]. Schuster therefore proposed a new one based on the Hamming weight of the output of the s-box: if it is greater than four then the trace is added to one set, otherwise it is added to the other. Schuster used this to successfully crack an AES implementation running on an 8-bit microcontroller.
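The steps of the algorithm can be sketched on simulated data. The following is a minimal illustration rather than the setup used in [2] or [3]. It assumes an AES-style first-round target (s-box of plaintext byte XOR key byte), a Hamming-weight selection function of the kind attributed to Schuster above, and synthetic traces in which a single sample leaks the Hamming weight of the s-box output plus Gaussian noise. All function names, the noise level, the trace counts and the key value are hypothetical choices made for the example, and guesses are ranked by peak height alone rather than by the peak-to-mean ratio of step 7.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Build the AES s-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def hamming_weight(x):
    return bin(x).count("1")

def select(plaintext, guess):
    """Selection function: 1 if HW(s-box output) > 4, else 0."""
    return 1 if hamming_weight(SBOX[plaintext ^ guess]) > 4 else 0

def simulate_traces(key, num_traces, num_samples, leak_sample, rng, noise=0.5):
    """Steps 1-2 (simulated, assumed leakage model): random plaintexts and
    noisy traces that leak HW(s-box output) at one sample."""
    plaintexts = [rng.randrange(256) for _ in range(num_traces)]
    traces = []
    for p in plaintexts:
        trace = [rng.gauss(0.0, noise) for _ in range(num_samples)]
        trace[leak_sample] += hamming_weight(SBOX[p ^ key])
        traces.append(trace)
    return plaintexts, traces

def differential_trace(plaintexts, traces, guess):
    """Steps 3-5: split the traces into two sets and subtract the set means."""
    n = len(traces[0])
    sums = [[0.0] * n, [0.0] * n]
    counts = [0, 0]
    for p, trace in zip(plaintexts, traces):
        s = select(p, guess)
        counts[s] += 1
        for j, value in enumerate(trace):
            sums[s][j] += value
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(n)]

def recover_key_byte(plaintexts, traces):
    """Steps 6-8: one differential trace per guess; keep the largest peak."""
    peaks = [max(abs(d) for d in differential_trace(plaintexts, traces, g))
             for g in range(256)]
    return max(range(256), key=lambda g: peaks[g]), peaks

rng = random.Random(2024)
TRUE_KEY = 0x3C  # hypothetical key byte for the simulation
plaintexts, traces = simulate_traces(TRUE_KEY, 800, 6, 4, rng)
best_guess, peaks = recover_key_byte(plaintexts, traces)
print(f"recovered key byte: {best_guess:#04x}")
```

With real measurements, `simulate_traces` would be replaced by the recorded plaintexts and power traces; the partitioning and averaging are unchanged.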

4.3.2.1 Leakage Based Differential Power Analysis

As CMOS technology shrinks in size, leakage power becomes a more significant portion of overall power consumption. While leakage power is mainly dependent on physical parameters, its dependence on input patterns becomes significant in sub-90 nm technology [65]. Leakage power therefore needs to be considered when evaluating a system for susceptibility to DPA. Lin and Burleson took this into account and developed “Leakage-based” DPA (LDPA) [66].

The LDPA algorithm is essentially the same as the regular DPA algorithm, except that the power traces that are recorded capture both the dynamic power and the leakage power. The attack was tested on a SPICE simulation of an implementation of DES and it revealed the correct key after 120 traces using 45 nm CMOS, compared to 200 traces for regular DPA using 180 nm CMOS.

4.3.2.2 Correlation as the Statistical Test in DPA

The DPA attack described in section 4.3.2 uses a statistical test called the difference-of-means. The difference-of-means test simply takes the difference between the means of two sets of data; it assumes that the variances of the two data sets are the same, and not much information from the model can be included. Other tests have been proposed, including analysis of variance (ANOVA), which can simultaneously compare the means of several sets of data and works better than the difference-of-means test [67].

This section discusses the use of correlation in DPA using the Pearson correlation coefficient. This use of correlation was first described by Brier et al. in [68]. The coefficient reflects the degree of linear relationship between two random variables, so it can be used to provide a direct comparison between the real and hypothetical models of the device. It is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom. This is equivalent to dividing the covariance between the two variables by the product of their standard deviations, as shown in equation (4-2).

$$\rho = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{4-2}$$

In order to calculate an estimate of the correlation from a number of samples the formula in equation (4-3) must be used.

$$\rho = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{\sqrt{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}\,\sqrt{N \sum_{i=1}^{N} Y_i^2 - \left(\sum_{i=1}^{N} Y_i\right)^2}} \tag{4-3}$$

The coefficient ranges from −1 to 1, with the sign indicating the direction of the relationship. If the coefficient has the value 1 then a linear equation describes the relationship perfectly and positively: all data points lie on the same line and Y increases with X. A value of −1 means a linear equation describes the relationship perfectly but negatively, i.e. all data points lie on a single line but Y increases as X decreases. A correlation value of 0 means that there is no linear relationship between the variables.
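As a small illustration, the sample estimate of the correlation can be computed directly from the raw sums over N paired samples. The function name `pearson` and the example data are hypothetical; the three calls reproduce the three cases described above.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation computed from raw sums over N paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0: perfectly positive
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))    # -1.0: perfectly negative
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0: no linear relationship
```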

The technique described by Kocher in [2] attacks an algorithm by predicting the value of one bit and partitioning the traces accordingly. The method proposed by Brier is a multi-bit attack; it predicts the number of bits that change in a byte of registers. This means that the technique involved is slightly different from regular DPA. It has three stages: prediction, measurement and correlation. A description is given below [61]:

1. Prediction Stage

a. Predict the number of bit changes inside a number of targeted registers in a specific clock cycle.

b. Repeat this for all 2^8 possible values of a byte of the key and for N different randomly chosen plaintexts.

c. Put them in an N * 2^8 matrix. This is called the Prediction Matrix.

2. Measurement Stage

a. Measure the power consumption over all (C) clock cycles in the encryption process.

b. Record the highest power consumption in each clock cycle in an N * C matrix. This is called the Consumption Matrix.

3. Correlation Stage

a. Calculate the correlation between the column representing the clock cycle that was targeted in the prediction phase in the Consumption Matrix and each column in the Prediction Matrix.

b. The column of the Prediction Matrix that shows the greatest correlation is the one that represents a correct key guess.
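The three stages can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the procedure of [61] or [68]: the targeted register is assumed to hold an AES s-box output and to be reset to zero before each load, so the predicted number of bit changes equals the Hamming weight of the new value, and the measurement stage is replaced by a synthetic Consumption Matrix with Gaussian noise. All names, sizes and the key value are hypothetical.

```python
import math
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Build the AES s-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Sample Pearson correlation from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def prediction_matrix(plaintexts):
    """Prediction stage: N x 2^8 matrix of predicted bit changes
    (register assumed reset to 0, so changes = Hamming weight)."""
    return [[hamming_weight(SBOX[p ^ guess]) for guess in range(256)]
            for p in plaintexts]

def simulate_consumption(plaintexts, key, num_cycles, target_cycle, rng, noise=1.0):
    """Stand-in for the measurement stage: N x C matrix of noisy values,
    with the key-dependent bit changes added in the targeted cycle."""
    matrix = []
    for p in plaintexts:
        row = [rng.gauss(0.0, noise) for _ in range(num_cycles)]
        row[target_cycle] += hamming_weight(SBOX[p ^ key])
        matrix.append(row)
    return matrix

def correlation_stage(prediction, consumption, target_cycle):
    """Correlate the targeted Consumption Matrix column with every
    Prediction Matrix column and return the best-matching key guess."""
    measured = [row[target_cycle] for row in consumption]
    scores = [pearson([row[g] for row in prediction], measured) for g in range(256)]
    return max(range(256), key=lambda g: scores[g]), scores

rng = random.Random(7)
TRUE_KEY = 0xA7  # hypothetical key byte for the simulation
TARGET_CYCLE = 4
plaintexts = [rng.randrange(256) for _ in range(300)]
prediction = prediction_matrix(plaintexts)
consumption = simulate_consumption(plaintexts, TRUE_KEY, 10, TARGET_CYCLE, rng)
best_guess, scores = correlation_stage(prediction, consumption, TARGET_CYCLE)
print(f"best guess: {best_guess:#04x}")
```

Unlike the difference-of-means test, every trace contributes to the score in proportion to its predicted bit changes, which is why the multi-bit prediction can be used directly.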

It is possible to perform this type of attack using purely simulated data. This requires using a more detailed hypothetical model of the device that can be used to predict the bit changes in all of the registers in the device for all cycles and entering the data into an N * C Prediction Matrix. This is then used instead of the Consumption Matrix in the Correlation Stage.

4.3.2.3 Choice of Target in Differential Power Analysis

Both forms of power analysis attack a specific point in an algorithm. In DPA the position of this point is selected by the choice of selection function, and in a correlation attack the choice of which register to target is made explicitly. This section defines the properties that determine whether a particular register is an appropriate target for the attack. Figure 4-2 shows a diagram of the AES algorithm with all the possible positions of registers between the stages. It shows which of the registers in the design have the properties that make them suitable for the target of a DPA attack.

Figure 4-2: Diagram showing the predictability and fullness of registers at different points in AES.

Both forms of power analysis find the correct key value by testing all possible key values and finding the value whose result best fulfils the attack’s selection criterion. The target must therefore be determined by a small enough number of key bits for this to be computationally feasible. In practice this limit is assumed to be 16 bits [60]; below this limit a register is said to be predictable. In AES the s-boxes are 8 bits wide; this gives 256 different key values to test, which is easily performed. The Mix Columns operation mixes the data from 4 bytes; this means the output depends on 32 key bits, which is above the predictability limit.

A register is described as full if it leaks information about the key via its transitions. This is also a property required in order to make a register a valid target. As seen in Figure 4-2 register 1 does not leak information as it only contains plaintext data. Interestingly, registers 2 and 3 do not necessarily leak information either as the influence of the key on the transition cancels out over two successive plaintexts as illustrated in equation (4-4). They can be made to be full by resetting the contents to 0s between plaintexts. Also they can be full in smart card implementations where there is a constant instruction address loaded.

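$$\mathrm{Reg2}_1 \oplus \mathrm{Reg2}_2 = (\mathrm{plaintext}_1 \oplus \mathrm{key}) \oplus (\mathrm{plaintext}_2 \oplus \mathrm{key}) = \mathrm{plaintext}_1 \oplus \mathrm{plaintext}_2 \tag{4-4}$$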

Registers after the s-box will all be full, as the non-linearity of the substitution stops the influence of the key on the transition value from cancelling over two successive plaintexts.
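The difference between the two cases can be checked numerically. The sketch below assumes an 8-bit register and the AES s-box, with hypothetical function names and two arbitrarily chosen plaintext bytes. It collects the transition value (the XOR of two successive register contents) for every possible key byte, once for a register holding plaintext XOR key and once for a register holding the s-box output.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Build the AES s-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def transition_before_sbox(p1, p2, key):
    """Bits that toggle in a register holding plaintext XOR key."""
    return (p1 ^ key) ^ (p2 ^ key)

def transition_after_sbox(p1, p2, key):
    """Bits that toggle in a register holding the s-box output."""
    return SBOX[p1 ^ key] ^ SBOX[p2 ^ key]

p1, p2 = 0x12, 0xC5  # two arbitrary successive plaintext bytes
before = {transition_before_sbox(p1, p2, k) for k in range(256)}
after = {transition_after_sbox(p1, p2, k) for k in range(256)}

# Before the s-box every key gives the same transition, so nothing leaks;
# after the s-box the transition varies with the key.
print(len(before), len(after))
```

A single distinct value in `before` means the transition carries no key information, while multiple distinct values in `after` mean the transition depends on the key and the register is full.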

In document Novel countermeasures and techniques for differential power analysis (Page 89-94)