Initial State Recovery for Stream Ciphers

6.2 Attacks in One Dimension

6.2.3 Initial State Recovery for Stream Ciphers

Berbain, et al., used a linear approximation (6.6) for finding part of the initial state of the Grain stream cipher [4]. We describe the basic idea first. Let L and n be the length and the block size of the LFSR, respectively. Then the LFSR has 2^{Ln} possible initial states. Let z₁, . . . , z_N be the keystream output from the cipher and denote by Y_t the state of the LFSR at time t ≥ 0. The linear approximation (6.6) can be written as

\[ w \cdot z_t \oplus v \cdot Y_t, \quad \text{for all } t \ge 0, \tag{6.14} \]

where v is padded with zeros if necessary. It has correlation c = c(v; w).

Recall from Section 4.2.2 that Y_t = A^t Y₀, t ≥ 0, where Y₀ = Y is the initial state and A is given by (4.2). Using (3.1) we can write v · Y_t = (A^t)^T v · Y, where T denotes transposition. Denote v(t) = (A^t)^T v. We can rewrite the approximation (6.14) as

\[ w \cdot z_t \oplus v(t) \cdot Y, \]

and it still holds with the same correlation c for all t ≥ 0. We assume as usual that the keystream words z_t, t ≥ 0, are statistically independent. Hence, for given Y, we can draw N statistically independent samples from the population Bernoulli(1/2 + c/2) by computing (6.14) for N consecutive times t.
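The identity v · Y_t = v(t) · Y can be checked numerically. The following is a minimal sketch under assumptions that are not in the text: a binary LFSR (n = 1), states and masks stored as Python integers, and a toy update matrix built by the hypothetical helper `update_rows`, which stands in for the matrix A of (4.2). All function names are illustrative.

```python
import random


def dot(a: int, b: int) -> int:
    """Inner product over GF(2) of two bit vectors stored as ints."""
    return bin(a & b).count("1") & 1


def update_rows(length: int, taps: tuple[int, ...]) -> list[int]:
    """Rows of a toy update matrix A as bitmasks (assumed shape, not (4.2)).

    Cell i of the next state is cell i+1 of the current state; the last
    cell is the XOR of the tapped cells.
    """
    rows = [1 << (i + 1) for i in range(length - 1)]
    rows.append(sum(1 << t for t in taps))
    return rows


def step(rows: list[int], y: int) -> int:
    """One LFSR step, Y_{t+1} = A Y_t."""
    return sum(dot(r, y) << i for i, r in enumerate(rows))


def transpose_times(rows: list[int], v: int) -> int:
    """A^T v: XOR of the rows of A selected by the set bits of v."""
    out = 0
    for i, r in enumerate(rows):
        if (v >> i) & 1:
            out ^= r
    return out


def mask_sequence(rows: list[int], v: int, count: int) -> list[int]:
    """Masks v(0), ..., v(count-1), where v(t) = (A^t)^T v."""
    out = []
    for _ in range(count):
        out.append(v)
        v = transpose_times(rows, v)
    return out


def check_identity(length: int = 16, taps: tuple[int, ...] = (0, 2, 3, 5),
                   steps: int = 64, seed: int = 0) -> bool:
    """Check v . Y_t == v(t) . Y_0 for a random state and a random mask."""
    rng = random.Random(seed)
    rows = update_rows(length, taps)
    y0 = rng.randrange(1 << length)
    v = rng.randrange(1, 1 << length)
    y = y0
    for vt in mask_sequence(rows, v, steps):
        if dot(v, y) != dot(vt, y0):
            return False
        y = step(rows, y)
    return True
```

Calling `check_identity()` steps the toy LFSR forward and compares both sides of the identity at every time t. The point of the rewriting is that the masks v(t) depend only on the LFSR, so they can be computed once and reused for every guess Y.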

We proceed by guessing the initial state Y. We obtain a sequence z₁^Y, . . . , z_N^Y from the cipher and compute the empirical correlation ρ^Y = 2N⁻¹ #{t : w · z_t^Y ⊕ v(t) · Y = 0} − 1. If the guess is correct, then the sample population is Bernoulli(1/2 + c/2). On the other hand, for any wrong guess, the LFSR state bit v(t) · Y and the keystream z_t should not have any correlation, that is, the sample population is Bernoulli(1/2). This is again the Wrong-key Hypothesis, Assumption 6.2.
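The guessing step can be illustrated on a toy model. The sketch below is not the Grain attack. It assumes a hypothetical 8-bit binary LFSR, a one-bit keystream (so w = 1), and a keystream that is simulated as v(t) · Y plus noise so that the approximation holds with correlation c. The helper names and all parameters are illustrative assumptions.

```python
import random


def dot(a: int, b: int) -> int:
    """Inner product over GF(2) of two bit vectors stored as ints."""
    return bin(a & b).count("1") & 1


def next_mask(rows: list[int], v: int) -> int:
    """A^T v for a matrix A stored as a list of row bitmasks."""
    out = 0
    for i, r in enumerate(rows):
        if (v >> i) & 1:
            out ^= r
    return out


def empirical_correlation(bits: list[int], masks: list[int], guess: int) -> float:
    """rho^Y = 2 * #{t : z_t xor v(t).Y = 0} / N - 1 for one guess Y."""
    zeros = sum(1 for z, m in zip(bits, masks) if z ^ dot(m, guess) == 0)
    return 2.0 * zeros / len(bits) - 1.0


def toy_experiment(length: int = 8, taps: tuple[int, ...] = (0, 2, 3, 4),
                   c: float = 0.3, n_samples: int = 2000, seed: int = 1):
    """Rank all 2^length guesses by empirical correlation (toy model)."""
    rng = random.Random(seed)
    # Toy update matrix: shift plus feedback from the tapped cells.
    rows = [1 << (i + 1) for i in range(length - 1)]
    rows.append(sum(1 << t for t in taps))
    secret = rng.randrange(1, 1 << length)

    # Masks v(t) = (A^t)^T v, starting from a hypothetical mask v = 1.
    v, masks = 1, []
    for _ in range(n_samples):
        masks.append(v)
        v = next_mask(rows, v)

    # Simulated keystream bit: equals v(t).Y with probability 1/2 + c/2.
    bits = [dot(m, secret) ^ int(rng.random() >= 0.5 + c / 2) for m in masks]

    scores = {y: empirical_correlation(bits, masks, y)
              for y in range(1 << length)}
    best = max(scores, key=scores.get)
    return secret, best, scores
```

With these toy parameters the correct state should receive a score close to c while the wrong guesses stay near zero, which is the behaviour the Wrong-key Hypothesis predicts. Exhaustive ranking is only possible here because the state has 8 bits.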

We must find the empirical correlation for all possible initial states Y. Then we have 2^{Ln} statistically independent random variables with realised values ρ^Y, Y ∈ F₂^{Ln}. Moreover, since the sample population is the Bernoulli distribution (with either correlation c or 0), we have by Section 3.3.4 that each ρ^Y is approximately normally distributed. For the right guess, the mean is µ_R = c and the variance is σ_R² = 1/N. For all the wrong guesses, the mean is µ_W = 0 and the variance is σ_W² = 1/N.

The problem of determining the right initial state is then the d-sample distinction problem studied in Section 5.3 and we can apply Selçuk’s key ranking theory described in Section 6.2.2. The mark for each initial state Y is the empirical correlation ρ^Y. We can find the data complexity using (6.12).

If we wish to find the right initial state, we have r = 1, and obtain that the data complexity is proportional to

\[ N = \left( \frac{\sqrt{2}\,\Phi^{-1}(P_S) + b}{c} \right)^{2}, \]

where P_S is a fixed success probability and b = Φ⁻¹(1 − 2^{−Ln}).
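To get a feeling for the size of this estimate, the expression can be evaluated numerically. The sketch below is an illustration only: the function name `data_complexity`, the default success probability and the example parameters are assumptions, and the text states the result only up to proportionality.

```python
from statistics import NormalDist


def data_complexity(c: float, state_bits: int, p_success: float = 0.9) -> float:
    """Evaluate ((sqrt(2) * Phi^{-1}(P_S) + b) / c)^2 for r = 1.

    state_bits plays the role of Ln. The value is meaningful only up to
    the proportionality constant left open in the text.
    """
    phi_inv = NormalDist().inv_cdf
    # b = Phi^{-1}(1 - 2^{-Ln}). For large Ln the argument rounds to 1.0 in
    # floating point, so use the symmetry Phi^{-1}(1 - p) = -Phi^{-1}(p).
    b = -phi_inv(2.0 ** -state_bits)
    return ((2 ** 0.5 * phi_inv(p_success) + b) / c) ** 2


if __name__ == "__main__":
    # Hypothetical parameters: an 80-bit state and correlation 2^-10.
    print(data_complexity(c=2.0 ** -10, state_bits=80))
```

The estimate scales as 1/c², and b grows slowly with the size Ln of the state, so the dominant cost of the straightforward approach is the enumeration of the 2^{Ln} states rather than the amount of keystream.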

The problem with this straightforward approach is that the initial state is so large that it is not possible to run through the whole space F₂^{Ln}. Instead, Berbain, et al., proposed the following method, which they called the second LFSR derivation technique. The purpose is to restrict the initial state Y to some sub-state of only, say, M bits. For simplicity, we may assume that we want to determine the first M bits of the LFSR. Then we only have to consider 2^M different Y. We denote by ∆_M the set of Ln-bit vectors whose last Ln − M components are zero. We now show how to find many masks v(t) ∈ ∆_M.

First, we sort the masks v(t) according to their last Ln − M bits. Then it is easy to divide the masks v(t) into groups in which the last Ln − M bits of the masks are the same. Let v(t₁) and v(t₂), where t₁ ≠ t₂, belong to the same group. Then their XOR is in the set ∆_M. By the Piling-up lemma, the approximation obtained by XORing the two corresponding approximations has correlation c². The number of pairs among N different v(t) is the binomial coefficient (N choose 2), which is of order N². The number of pairs whose XOR is in ∆_M is then on average of order 2^{M−Ln} N². The number of masks v(t) ∈ ∆_M corresponding to correlation c is negligible when compared to the number of masks obtained by XOR. We may then omit them from the analysis.

The data complexity is proportional to [4]

\[ N = \frac{2^{(Ln - M)/2}}{c^{2}}. \]
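The grouping step can be sketched as follows. This is an illustration under assumptions that are not in the text: masks are stored as Python integers, the "first M bits" are taken to be the M low-order bits, and the sorting described above is replaced by hash-bucket grouping, which produces the same groups. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairs_in_delta_m(masks: list[int], m_bits: int) -> list[tuple[int, int, int]]:
    """Return (t1, t2, v(t1) xor v(t2)) for all pairs whose XOR lies in Delta_M.

    Two masks XOR into Delta_M exactly when they agree on every bit above
    the lowest m_bits, so it suffices to group masks by those upper bits.
    """
    groups: dict[int, list[int]] = defaultdict(list)
    for t, v in enumerate(masks):
        groups[v >> m_bits].append(t)  # key: the last Ln - M bits

    pairs = []
    for times in groups.values():
        for t1, t2 in combinations(times, 2):
            pairs.append((t1, t2, masks[t1] ^ masks[t2]))
    return pairs


def second_derivation_complexity(c: float, state_bits: int, m_bits: int) -> float:
    """Evaluate 2^((Ln - M)/2) / c^2, up to the proportionality constant."""
    return 2.0 ** ((state_bits - m_bits) / 2) / c ** 2


if __name__ == "__main__":
    toy_masks = [0b1011, 0b1001, 0b0110, 0b0010, 0b1111]
    # Only the first two masks share their upper two bits.
    print(pairs_in_delta_m(toy_masks, m_bits=2))
```

Each returned pair gives one approximation (w · z_{t₁} ⊕ w · z_{t₂}) ⊕ (v(t₁) ⊕ v(t₂)) · Y that involves only the M targeted state bits and has correlation c². The complexity formula follows from requiring about 1/c⁴ such pairs, that is, 2^{M−Ln} N² ≈ c⁻⁴.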

After the M bits of the initial state are found, we may repeat the same procedure for some other initial state bits, provided that we find suitable approximations.


7 MULTIDIMENSIONAL LINEAR CRYPTANALYSIS

7.1 BACKGROUND

In one-dimensional linear cryptanalysis, the analyst tries to find a strong linear approximation such as (6.9) or (6.7). Sometimes it is possible to find several other approximations. That is, the analyst has, say, m approximations of the form

\[ u_i \cdot x \oplus w_i \cdot f(x), \quad i = 1, \ldots, m, \tag{7.1} \]

at their disposal, and each approximation has a non-negligible correlation c_i. The natural question is then whether the analyst can somehow use all these approximations either to reduce the data complexity or to find more information about the cipher. It is also important to know what the best possible method for using all the approximations is.

First Matsui in [30] and then Junod and Vaudenay in [26] used two approximations for key ranking in Alg. 2. In [8], Kaliski and Robshaw considered m approximations of the form (6.9). They presented new versions of Matsui’s Algorithms 1 and 2 assuming the same key mask for all approximations. They showed that the data complexity of finding one key parity bit is reduced when multiple approximations are used. However, they assumed that the approximations are statistically independent. As a different approach, Johansson and Maximov presented an idea of a multidimensional distinguishing attack against the stream cipher Scream [25].

Like Kaliski and Robshaw, Biryukov, et al., also assumed statistical independence, but they let the key mask vary [6]. Using their version of Alg. 1, they could determine m key parity bits with reduced data complexity. With a new version of Alg. 2, they could also determine the last round key. They measured the efficiency of their method using “gain”.

The method by Kaliski and Robshaw can be regarded as a special case of the method by Biryukov, et al., which we call the Biryukov method, for brevity.

In 2004, Baignères, et al., presented a true multidimensional distinguisher that did not rely on the assumption of statistical independence [1]. However, they did not provide a way to determine the p.d. that was needed in the attack. Englund and Maximov tried to solve this problem by determining the p.d. over a whole stream cipher, for example in [21]. However, it is infeasible to compute the p.d. directly if the word size of the cipher is more than 32 bits. The problem of finding the multidimensional approximation in practice remained unsolved. Another open question was how Matsui’s algorithms could be generalised to multiple dimensions.

The next sections give the answers to all these questions. Section 7.2 defines the multidimensional linear approximation. We show how the p.d. can be found efficiently and with no need to consider the whole word size or block size of the cipher. Only the necessary information, i.e., non-uniform behaviour of the cipher, has to be considered.

In Section 7.3 we study the distinguishing attack of Baignères, et al. Sections 7.4 and 7.5 give the generalisations of Matsui’s algorithms. We also give the data, time and memory complexities for the algorithms in multiple dimensions. We summarise our findings on multidimensional Alg. 1 and Alg. 2 in Section 7.6. Finally, we show in Section 7.7 how multiple approximations can be used for making the one-dimensional initial state recovery attack of Section 6.2.3 more efficient.

In document Multidimensional linear cryptanalysis (Page 64-67)