
5.4 The Generic Case

5.4.1 Generic Cryptanalysis

This section elaborates on the impact of Assumption 2 on each phase of the cryptanalysis presented in Sect. 5.3 resulting in a generic cryptanalysis.

Setup Phase and Phase 1

The setup phase is independent of the secret randomization and hence remains the same, i.e., the eight $dTMC_i^{(1,j)}$ tables ($i = 0, 1$ and $j = 0, 1, 2, 3$) can still be made key-independent based on Lemma 4. With regard to Phase 1, the attacker is still able to construct four sets $S_l^{(i,j)}$ ($l = 0, 1, 2, 3$) as defined by (5.5) comprising leaked information for each linear input encoding $(L_i^{(1,j)})^{-1}$ for $i = 0, 1$ and $j = 0, 1, 2, 3$; however, the function $f_l^{(i,j)}$ associated with each set $S_l^{(i,j)}$ is no longer known due to the secret randomization. Instead, the associated function can be any element of the known set

\[ \mathcal{S}_f = \left\{ f = S^{-1} \circ \otimes_{mc_1}^{-1} \circ \otimes_{mc_0} \circ S \;\middle|\; (mc_0, mc_1) \in \mathcal{S}_{MC}^{*} \right\} \]

with $\mathcal{S}_{MC}^{*} = \{(01, 02), (02, 03), (03, 01), (01, 01), (01, 03), (03, 02), (02, 01)\}$. The set $\mathcal{S}_{MC}^{*}$ comprises all possible pairs formed out of the four MixColumns coefficients appearing on each row of the $4 \times 4$ matrix $MC$, i.e., out of the set $\{01, 01, 02, 03\}$. Since two MixColumns coefficients are equal to 01 for AES encryption, we have that $|\mathcal{S}_{MC}^{*}| = 7$ and as a result also $|\mathcal{S}_f| = 7$.
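The count of seven pairs can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `MC_ROW` and `S_MC_STAR` are ours, and the pairs are taken as ordered pairs of coefficients from two different positions of one MixColumns row, which is our reading of the definition above.

```python
from itertools import permutations

# One row of the AES MixColumns matrix, viewed as a multiset of coefficients.
MC_ROW = (0x01, 0x01, 0x02, 0x03)

# Ordered pairs taken from two different positions of the row. Building a set
# collapses the duplicates caused by the coefficient 01 appearing twice.
S_MC_STAR = sorted(set(permutations(MC_ROW, 2)))

for mc0, mc1 in S_MC_STAR:
    print(f"({mc0:02x}, {mc1:02x})")
print("number of pairs:", len(S_MC_STAR))
```

Without the repeated coefficient 01 the row would yield twelve ordered pairs; the repetition merges several of them and adds the pair (01, 01).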

Phase 2

The second phase retrieves the secret linear input encodings $(L_i^{(1,j)})^{-1}$ for $i = 0, 1$ and $j = 0, 1, 2, 3$. Originally, this was achieved by using Algorithm 1 of Phase 2 (i.e., the algorithm for finding the desired linear equivalence $(A, B)_d$), which required as inputs one of the four sets $S_l^{(i,j)}$ ($l = 0, 1, 2, 3$) and its associated function $f_l^{(i,j)}$. However, as mentioned above, $f_l^{(i,j)}$ is unknown in the generic case and thus we need to guess $\tilde{f}_l^{(i,j)} \in \mathcal{S}_f$. Now, the question remains: "can we filter out the incorrect guesses $\tilde{f}_l^{(i,j)} \neq f_l^{(i,j)}$ and obtain $(A, B)_d$?". This is discussed in the following, where $\mathcal{S}^{(i,j)} = \{S_0^{(i,j)}, S_1^{(i,j)}, S_2^{(i,j)}, S_3^{(i,j)}\}$.

First, randomly select a set $S \in \mathcal{S}^{(i,j)}$ without knowing the associated function $f$. Given that the chosen two distinct points $x_n \in S$ ($n = 1, 2$) are defined by $(L_i^{(1,j)})^{-1}(x_n) = u_n \,\|\, f(u_n)$, Algorithm 1 finds a linear equivalence if there exist two distinct values $a_n \in \mathbb{F}_2^8 \setminus \{0\}$ for $n = 1, 2$ such that the initial guesses for $A$ become

\[ A(x_n) = a_n \,\|\, \tilde{f}(a_n) = A_s \cdot \left( u_n \,\|\, f(u_n) \right) \quad \text{for } n = 1, 2, \]

for some guess of $\tilde{f} \in \mathcal{S}_f$ and for some $A_s$ (see Property 5). This problem can be reduced to the following problem statement:

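Problem Statement 2. Given $x \in \mathbb{F}_2^8 \setminus \{0\}$, does there exist a $y \in \mathbb{F}_2^8 \setminus \{0\}$ such that

\[ x \,\|\, \tilde{f}(x) = A_s \cdot \left( y \,\|\, f(y) \right), \tag{5.8} \]

for any $A_s$ (see Property 5) and for any pair of functions $(f, \tilde{f}) \in \mathcal{S}_f \times \mathcal{S}_f$?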

Table 5.2 (left entries if applicable) lists the maximum number of $x$-values for which there exists a $y$ satisfying (5.8) for each possible $A_s$ and for any pair $(f, \tilde{f})$. As a result of a certain symmetry within the set $\mathcal{S}_f$ and $A_s$, the entries of 255 in Table 5.2 on both 'diagonals' can be explained by the following:

1. If $\tilde{f} = f$ and $A_s = I_{16}$, where $I_{16}$ denotes the 16-bit identity matrix over $\mathbb{F}_2$, then (5.8) becomes $x \,\|\, f(x) = y \,\|\, f(y)$ such that for each $x \in \mathbb{F}_2^8 \setminus \{0\}$ there exists a $y$ satisfying the equation, i.e., $y = x$. This is considered to be the trivial case; if we guess $f$ correctly, then at least the desired linear equivalence $(A, B)_d$ with $A = (L_i^{(1,j)})^{-1}$ is given as output.

2. If $\tilde{f} = f^{-1}$ and $A_s = \begin{pmatrix} 0_{8 \times 8} & I_8 \\ I_8 & 0_{8 \times 8} \end{pmatrix}$, where $0_{8 \times 8}$ denotes the $8 \times 8$ zero matrix and $I_8$ denotes the 8-bit identity matrix over $\mathbb{F}_2$, then (5.8) becomes $x \,\|\, f^{-1}(x) = f(y) \,\|\, y$ such that for each $x \in \mathbb{F}_2^8 \setminus \{0\}$ there exists a $y$ satisfying the equation, i.e., $y = f^{-1}(x)$. Hence, if we guess the inverse of $f$, then at least the linear equivalence $(A, B)$ with $A = A_s \cdot (L_i^{(1,j)})^{-1}$ is given as output where $A_s$ is as specified above. Let us denote this specific linear equivalence by $(A, B)'_d$ in the following.

Table 5.2: For any pair of functions $(f, \tilde{f}) \in \mathcal{S}_f \times \mathcal{S}_f$, the maximum number of $x \in \mathbb{F}_2^8 \setminus \{0\}$ for which there exists a $y \in \mathbb{F}_2^8 \setminus \{0\}$ satisfying (5.8), taken over all possible $A_s$. Cells with two entries are written as left / right.

| $f$ \ $\tilde{f}$ | (01, 02) | (02, 03) | (03, 01) | (01, 03) | (03, 02) | (02, 01) |
|---|---|---|---|---|---|---|
| (01, 02) | 255 / 3 | 6 | 4 | 4 | 6 | 255 / 3 |
| (02, 03) | 4 | 255 / 3 | 4 | 4 | 255 / 3 | 4 |
| (03, 01) | 4 | 5 | 255 / 6 | 255 / 6 | 5 | 4 |
| (01, 01) | 3 | 4 | 3 | 3 | 4 | 3 |
| (01, 03) | 4 | 5 | 255 / 6 | 255 / 6 | 5 | 4 |
| (03, 02) | 4 | 255 / 3 | 4 | 4 | 255 / 3 | 4 |
| (02, 01) | 255 / 3 | 6 | 4 | 4 | 6 | 255 / 3 |

Excluding the above two cases results in the right entries (if applicable) of Table 5.2. This shows that there are at most six distinct $x$-values for each possible $A_s$ and for any pair $(f, \tilde{f})$ (excluding the above cases) for which there exists a $y$ satisfying (5.8). Observe that the grey-colored entries correspond to the cases discussed in Sect. 5.3.3 to determine the best choice of the set $S$ selected out of $\mathcal{S}^{(i,j)}$ in order to execute Algorithm 1.

Note that in Table 5.2 the identity function $I_8$ (i.e., the function with $(mc_0, mc_1) = (01, 01)$) is left out as a possible guess for $\tilde{f}$. The reason for this omission is that the identity function requires additional guesses during the execution of LE, which is undesirable since it increases the work factor. This was already discussed in Sect. 5.3.3.
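Two structural facts used above can be checked numerically: the element of $\mathcal{S}_f$ with $(mc_0, mc_1) = (01, 01)$ is the identity, and the element for $(mc_1, mc_0)$ is the inverse of the element for $(mc_0, mc_1)$, which is what pairs up the entries on the second 'diagonal' of Table 5.2. The following Python sketch assumes the standard AES S-box and field polynomial; all helper names (`gf_mul`, `make_f`, `swap_halves`, ...) are ours and are not taken from the analysis itself.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse in GF(2^8) by exhaustive search (0 maps to 0)."""
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1) if a else 0

def rotl8(x, s):
    return ((x << s) | (x >> (8 - s))) & 0xFF

def sbox(x):
    """AES S-box: field inversion followed by the affine transformation."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x

def make_f(mc0, mc1):
    """Lookup table of f = S^-1 o (mult. by mc1)^-1 o (mult. by mc0) o S."""
    inv_mc1 = gf_inv(mc1)
    return [INV_SBOX[gf_mul(inv_mc1, gf_mul(mc0, SBOX[x]))] for x in range(256)]

PAIRS = [(1, 2), (2, 3), (3, 1), (1, 1), (1, 3), (3, 2), (2, 1)]
S_F = {pair: make_f(*pair) for pair in PAIRS}

def is_inverse(f, g):
    return all(g[f[x]] == x for x in range(256))

def swap_halves(v):
    """Action of the block-swap matrix [[0, I8], [I8, 0]] on a 16-bit vector."""
    return ((v & 0xFF) << 8) | (v >> 8)

identity_ok = S_F[(1, 1)] == list(range(256))
inverse_ok = all(is_inverse(S_F[(a, b)], S_F[(b, a)]) for a, b in PAIRS)

# Swapping the halves of y || f(y) gives x || f^-1(x) with x = f(y).
f, f_inv = S_F[(1, 2)], S_F[(2, 1)]
graph_f = {(y << 8) | f[y] for y in range(256)}
graph_f_inv = {(x << 8) | f_inv[x] for x in range(256)}
swap_ok = {swap_halves(v) for v in graph_f} == graph_f_inv

print(identity_ok, inverse_ok, swap_ok)
```

The sketch only covers the two special choices of $A_s$ discussed above; it does not enumerate the matrices of Property 5 and therefore does not reproduce the remaining entries of Table 5.2.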

Generic algorithm for finding $(A, B)_d$ and $(A, B)'_d$. Here, we present a generic algorithm for finding the linear equivalences $(A, B)_d$ and $(A, B)'_d$ that eventually yield the secret linear input encoding $(L_i^{(1,j)})^{-1}$. From Table 5.2 and the above observations it follows that if LE is repeated four times for a certain chosen set $S \in \mathcal{S}^{(i,j)}$ and for all six guesses of $\tilde{f} \in \mathcal{S}_f \setminus \{I_8\}$, the outcome is either

1. no solutions, which shows that the chosen set is $S \leftrightarrow (01, 01)$. In this case we need to choose a different set $S^* \in \mathcal{S}^{(i,j)}$ and repeat the whole procedure for this new set; or

2. exactly two solutions, i.e., $(A, B)_d$ and $(A, B)'_d$, out of which we can easily filter out the linear input encoding $(L_i^{(1,j)})^{-1}$ as explained below.

The reason for repeating LE four times is to exclude additional linear equivalences except for $(A, B)_d$ and $(A, B)'_d$. From Table 5.2 it follows that such additional linear equivalences can only occur during at most three executions of LE. Note that it is only required to repeat LE four times if at least one linear equivalence is found during the first execution of LE.

Algorithm 2 gives a detailed description of the whole procedure for finding both linear equivalences $(A, B)_d$ and $(A, B)'_d$ contained within the returned set $\mathcal{S}_{(A,B)}$. Although the attacker cannot distinguish both elements in $\mathcal{S}_{(A,B)}$, he knows that both $A$'s of the found pairs of linear equivalences have the form

\[ A_1 = (L_i^{(1,j)})^{-1} \quad \text{and} \quad A_2 = C \cdot (L_i^{(1,j)})^{-1} \quad \text{with} \quad C = \begin{pmatrix} 0_{8 \times 8} & I_8 \\ I_8 & 0_{8 \times 8} \end{pmatrix}, \]

or vice versa. Hence by verifying whether $A_1 \cdot A_2^{-1}$ or $A_2 \cdot A_1^{-1}$ equals $C$, the attacker is able to retrieve the secret linear input encoding $(L_i^{(1,j)})^{-1}$.

Phase 3

After the setup phase and Phases 1-2, the attacker has retrieved all encodings $(L_i^{(1,j)})^{-1}$ ($i = 0, 1$ and $j = 0, 1, 2, 3$) of the first round. This enables him to extract the round key bytes of the first round as described in Sect. 5.3.4. However, due to the secret randomization, there exists an ambiguity about the order of the round key bytes. Therefore, as was done in the BGE attack, the attacker needs to extract the round key bytes of the second round as well. This can be achieved by repeating the setup phase and Phases 1-2 for the second round. Observe that the generic cryptanalysis presented above can be applied to any two consecutive rounds $r$ and $r + 1$ for some value of $r$ with $1 \leq r \leq 8$ and is not restricted to the first two rounds.

After that, the values of the round key bytes of two consecutive rounds are known, though with an unknown order of the round key bytes associated with each subround and an unknown order of the four subrounds. Phase 4 of the improved BGE attack (Sect. 4.1.3) provides an efficient method to determine the correct order of the round key bytes and to extract the secret AES key.

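Algorithm 2 Finding the linear equivalences $(A, B)_d$ and $(A, B)'_d$

Input: $\mathcal{S}_1 = (S, S)$, $\mathcal{S}_2 = dTMC_i^{(1,j)}$, $\mathcal{S}^{(i,j)}$, $\mathcal{S}_f \setminus \{I_8\}$

Output: $(A, B)_d$ and $(A, B)'_d$

 1: choose $S \in \mathcal{S}^{(i,j)}$
 2: $\mathcal{S}_{(A,B)} \leftarrow \emptyset$
 3: for all $\tilde{f} \in \mathcal{S}_f \setminus \{I_8\}$ do
 4:     select 8 distinct points $x_n^{(i)} \in S$ with $x_n^{(i)} \neq 0$ for $n = 1, 2$ and $0 \leq i \leq 3$
 5:     call search-LE$(x_1^{(0)}, x_2^{(0)}, \tilde{f}, \mathcal{S}_1, \mathcal{S}_2) \rightarrow \mathcal{S}_{LE}$
 6:     if $|\mathcal{S}_{LE}| > 0$ then
 7:         for $i = 1$ to $3$ do
 8:             call search-LE$(x_1^{(i)}, x_2^{(i)}, \tilde{f}, \mathcal{S}_1, \mathcal{S}_2) \rightarrow \mathcal{S}_{LE}^{*}$
 9:             $\mathcal{S}_{LE} \leftarrow \mathcal{S}_{LE} \cap \mathcal{S}_{LE}^{*}$
10:         end for
11:     end if
12:     $\mathcal{S}_{(A,B)} \leftarrow \mathcal{S}_{(A,B)} \cup \mathcal{S}_{LE}$
13: end for
14: if $\mathcal{S}_{(A,B)} = \emptyset$ then
15:     choose $S^* \in \mathcal{S}^{(i,j)}$ with $S \neq S^*$
16:     repeat steps 3–13 with the set $S^*$
17: end if
18: return $\mathcal{S}_{(A,B)}$

where Procedure search-LE is as specified in Algorithm 1.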
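The control flow of Algorithm 2 can be sketched in Python as follows. This is a sketch under stated assumptions, not an implementation of the attack: `search_le` is a hypothetical callback standing in for Algorithm 1 (with $\mathcal{S}_1$ and $\mathcal{S}_2$ assumed to be bound inside it), the guesses are opaque labels, and, as a slight generalization of steps 14-17, the candidate sets are simply tried in order until one yields a result. The stub at the end only imitates the behaviour described above, including a spurious equivalence that survives three executions.

```python
import random

def find_equivalences(candidate_sets, guesses, search_le, runs=4, rng=random):
    """Sketch of Algorithm 2; search_le(x1, x2, f_guess) returns a collection
    of hashable linear-equivalence candidates (hypothetical interface)."""
    for chosen in candidate_sets:                     # steps 1 and 14-16
        found = set()                                 # step 2
        nonzero = [x for x in chosen if x != 0]
        for f_guess in guesses:                       # step 3
            points = rng.sample(nonzero, 2 * runs)    # step 4: 8 distinct points
            pairs = list(zip(points[0::2], points[1::2]))
            result = set(search_le(*pairs[0], f_guess))        # step 5
            if result:                                # step 6
                for x1, x2 in pairs[1:]:              # steps 7-10
                    result &= set(search_le(x1, x2, f_guess))
            found |= result                           # step 12
        if found:
            return found                              # step 18
    return set()

def make_stub_search_le():
    """Stub for illustration only; it does not perform any cryptanalysis."""
    calls = {}
    def search_le(x1, x2, f_guess):
        if x1 < 100:               # points of the set associated with (01, 01)
            return set()
        calls[f_guess] = calls.get(f_guess, 0) + 1
        if f_guess == "f":
            return {"(A,B)_d"}
        if f_guess == "f_inverse":
            return {"(A,B)'_d"}
        if f_guess == "wrong" and calls[f_guess] <= 3:
            return {"spurious"}    # survives at most three executions
        return set()
    return search_le

sets = [range(1, 40), range(100, 140)]
guesses = ["f", "f_inverse", "wrong", "other"]
result = find_equivalences(sets, guesses, make_stub_search_le())
print(sorted(result))
```

Intersecting the results of the four executions is what removes the spurious candidate, while skipping the three extra executions when the first one returns nothing mirrors the remark that LE only needs to be repeated if the first execution finds at least one linear equivalence.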

With regard to the external encodings $IN^{-1}$ and $OUT$, both bijective linear mappings on $\mathbb{F}_2^{128}$, it suffices to say that the attacker is in possession of the AES key (such that he can construct a standard AES encryption/decryption routine instantiated with the extracted key) and furthermore can observe a plain intermediate AES result that gives him access to the raw plaintext and ciphertext. This enables the attacker to determine the image of the external encodings for each $i$-th unit vector $e_i$ in $\mathbb{F}_2^{128}$.
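Since a linear mapping on $\mathbb{F}_2^{128}$ is fully determined by the images of the 128 unit vectors, those images suffice to evaluate the mapping on any input. The Python sketch below illustrates only this linear-algebra step. The `oracle` function is a hypothetical black box for one external encoding, vectors are represented as 128-bit integers, and the randomly generated secret map is linear but not necessarily bijective, which does not affect the recovery.

```python
import random
from functools import reduce
from operator import xor

N = 128

def recover_columns(oracle, n=N):
    """Query the linear map at every unit vector e_i (bit i set)."""
    return [oracle(1 << i) for i in range(n)]

def apply_columns(columns, v):
    """Evaluate the recovered map: XOR the columns selected by the bits of v."""
    out = 0
    for i, col in enumerate(columns):
        if (v >> i) & 1:
            out ^= col
    return out

# Hypothetical secret linear map, used here only to have something to recover.
rng = random.Random(2024)
secret_columns = [rng.getrandbits(N) for _ in range(N)]

def oracle(v):
    """Black-box evaluation of the secret map (assumed to be observable)."""
    selected = (c for i, c in enumerate(secret_columns) if (v >> i) & 1)
    return reduce(xor, selected, 0)

recovered = recover_columns(oracle)
probe = rng.getrandbits(N)
print(recovered == secret_columns, apply_columns(recovered, probe) == oracle(probe))
```

The recovery costs exactly 128 oracle queries per encoding, one for each unit vector.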

In document White-Box Cryptography: Analysis of White-Box AES Implementations (Page 174-178)