Differential Cryptanalysis - A Salad of Block Ciphers

𝑥_𝑖 𝑥_𝑖+1) (mod 256) .

Hence, the T layer is represented by the $8 \times 8$ block diagonal matrix $A$ with the matrix
\[
\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\]
on the diagonal four times. The permutation of the bundles is called a shuffle. It is the permutation
\[
(0\ 2\ 4\ 6\ 1\ 3\ 5\ 7)
\]
and corresponds to the matrix
\[
B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \pmod{256}.
\]
Hence, the $R$-matrix representing the entire diffusion layer of SAFER is
\[
M = A \cdot B \cdot A \cdot B \cdot A = \begin{pmatrix}
8 & 4 & 4 & 2 & 4 & 2 & 2 & 1 \\
4 & 2 & 2 & 1 & 4 & 2 & 2 & 1 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 \\
4 & 2 & 4 & 2 & 2 & 1 & 2 & 1 \\
2 & 1 & 2 & 1 & 2 & 1 & 2 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \pmod{256}.
\]

Let us now consider the effect of this matrix on the state, interpreted as a vector of length eight over the ring $R$. Since the operator defined by $M$ is linear, in order to determine how differences in the input propagate it suffices to consider the differences as relative to the zero vector, i.e. to study the images of individual vectors. Most vectors $v$ of weight one are mapped to vectors $v \cdot M$ of weight eight, but not all: for instance, $v = (32\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ is mapped to a vector $v \cdot M = (0\ 128\ 128\ 64\ 128\ 64\ 64\ 32)^t$ of weight seven, and the image of $v' = (128\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ has weight just one: $v' \cdot M = (0\ 0\ 0\ 0\ 0\ 0\ 0\ 128)^t$. Intuitively, this means that some inputs or some differences do not diffuse well through the layer. Catastrophic consequences are in fact avoided only by the fact that SAFER uses good S-boxes and the fact that the single bundle difference on the eighth vector element will completely diffuse in the following round.
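The product and the two images quoted above can be recomputed with a few lines of Python. This is only a sketch: the helper names (`mat_mul`, `image`, `weight`) are introduced here for illustration, and vectors are treated as row vectors multiplied from the left, as in the notation $v \cdot M$.

```python
def mat_mul(x, y, mod=256):
    """Product of two integer matrices, reduced modulo `mod`."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) % mod
             for j in range(len(y[0]))] for i in range(len(x))]

# A: block diagonal matrix with four copies of the block (2 1; 1 1).
A = [[0] * 8 for _ in range(8)]
for b in range(4):
    A[2 * b][2 * b], A[2 * b][2 * b + 1] = 2, 1
    A[2 * b + 1][2 * b], A[2 * b + 1][2 * b + 1] = 1, 1

# B: row i carries its single 1 in column SHUFFLE[i].
SHUFFLE = [0, 2, 4, 6, 1, 3, 5, 7]
B = [[int(j == SHUFFLE[i]) for j in range(8)] for i in range(8)]

# M = A * B * A * B * A (mod 256)
M = A
for factor in (B, A, B, A):
    M = mat_mul(M, factor)

def image(v):
    """Row vector v times M, reduced modulo 256."""
    return [sum(v[i] * M[i][j] for i in range(8)) % 256 for j in range(8)]

def weight(v):
    """Number of nonzero bundles."""
    return sum(1 for x in v if x)

print(image([32, 0, 0, 0, 0, 0, 0, 0]))           # weight seven
print(weight(image([128, 0, 0, 0, 0, 0, 0, 0])))  # weight one
```

Any multiple of 32 in the first bundle loses at least the first output bundle, because the top-left entry of $M$ is 8 and $8 \cdot 32 \equiv 0 \pmod{256}$.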

1.8.3.1 Multipermutations and MDS Matrices

The obvious question is: what are the matrices that guarantee the most complete diffusion? The question is somewhat ill posed because a desirable property of any component of a block cipher is its fast evaluation. Hence, a good diffusion matrix must strike the right balance between good diffusion and fast evaluation: a less perfect but much faster diffusion layer could still lead to a cipher that is faster and not less secure than another cipher making use of an ideal, but slower, diffusion layer. Also the question of performance is per se difficult to formalise: for instance, a sparse matrix is not necessarily good if some entries represent elements which are expensive to multiply with.

This said, the first problem remains that of measuring the quality of diffusion and determining optimal matrices; performance considerations, including compromises, come later.

To address this first problem, Serge Vaudenay suggested [Vau94] (generalising previous work by himself and Claus-Peter Schnorr [SV94]) to use multipermutations: given an alphabet $\mathcal{M}$ and integers $n$, $s$, an $(s, n)$-multipermutation over the alphabet $\mathcal{M}$ is a function $f$ from $\mathcal{M}^s$ to $\mathcal{M}^n$ such that two different $(s + n)$-tuples of the form $(x, f(x))$ cannot collide in any $s$ positions. Serge Vaudenay in particular first observed that the PHT in SAFER (and hence the whole diffusion layer) is not a multipermutation.
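The definition can be tested by brute force on small alphabets. The sketch below is illustrative only: `is_multipermutation` and `pht_mod` are names made up here, and the PHT is modelled as the map $(a, b) \mapsto (2a + b, a + b)$ over the integers modulo $q$. For $q = 256$ an exhaustive check is slow in pure Python, so an explicit pair of colliding tuples is shown instead.

```python
from itertools import product

def is_multipermutation(f, alphabet, s):
    """True if no two distinct tuples (x, f(x)) agree in s or more positions."""
    tuples = [x + tuple(f(x)) for x in product(alphabet, repeat=s)]
    for i, t in enumerate(tuples):
        for u in tuples[i + 1:]:
            if sum(a == b for a, b in zip(t, u)) >= s:
                return False
    return True

def pht_mod(q):
    """PHT-like map (a, b) -> (2a + b, a + b) over the integers modulo q."""
    return lambda x: ((2 * x[0] + x[1]) % q, (x[0] + x[1]) % q)

# Over Z/16 (a scaled-down stand-in for Z/256) the property fails ...
print(is_multipermutation(pht_mod(16), range(16), 2))
# ... while over the prime field GF(7) the same formula passes.
print(is_multipermutation(pht_mod(7), range(7), 2))

# Explicit witness over Z/256: the tuples (128, 0, 0, 128) and (0, 0, 0, 0)
# agree in the second and third positions.
pht256 = pht_mod(256)
print((128, 0) + pht256((128, 0)), (0, 0) + pht256((0, 0)))
```

The failure is caused by the even coefficient 2, which is a zero divisor modulo a power of two.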

To construct multipermutations, if the alphabet is representable as a finite field, he suggested to use (the redundancy part of) MDS matrices, i.e. matrices of MDS (maximum distance separable) codes, which are the codes which reach the Singleton bound. In other words, an $n \times s$ matrix $M$ over a finite field $\mathbb{F}$ is an MDS matrix if it is the transformation matrix of a linear transformation $f \colon \mathbb{F}^s \to \mathbb{F}^n$, $x \mapsto Mx$ with the following property: if $x$ and $x^*$ differ in exactly $t$ components, then $f(x)$ and $f(x^*)$ must differ in at least $n - t + 1$ components. The latter property is called perfect diffusion. Vaudenay also showed how to exploit imperfect diffusion for cryptanalysis (as in the case of reduced rounds of SAFER with suboptimal S-boxes, cf. Section 3.8 on page 150).

Now, to see why this is optimal and indeed a desirable cryptographic property, let us assume $s = n$ and consider first the case of a single changed input word. Then the change should spread to all outputs, a property that, as we have seen at the beginning of this section, is not satisfied by the SAFER diffusion layer. If we now change two words, we may always choose the changes so that one of the outputs of the linear transformation is left unchanged (this is a simple linear algebra exercise), so we cannot do better than requiring that at least $n - 1$ outputs are changed.

Note however that the MDS condition, for $s = n$, is stronger than being invertible (i.e. non-singular), as exemplified by the identity matrix, and non-singularity is of course not a necessary condition for being an MDS matrix, since it applies only to square matrices. Non-singularity, however, gives a way to characterise an MDS matrix: Theorem 8 (page 321) of [MS77] states that a matrix is an MDS matrix if and only if every square sub-matrix is non-singular. In particular, an MDS matrix cannot have zero entries.
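The sub-matrix criterion translates directly into an exhaustive test for small matrices. The following sketch makes several assumptions that are not part of the theorem: the field is $\mathbb{F}_{2^8}$ with the Rijndael reduction polynomial $x^8 + x^4 + x^3 + x + 1$, the example matrix is the circulant with first row $(2, 3, 1, 1)$ known from MixColumns, and all function names are illustrative.

```python
from itertools import combinations

def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def gf_det(m):
    """Determinant over GF(2^8) by Gaussian elimination."""
    m = [row[:] for row in m]
    n, det = len(m), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c]), None)
        if pivot is None:
            return 0
        m[c], m[pivot] = m[pivot], m[c]  # sign is irrelevant in characteristic 2
        det = gf_mul(det, m[c][c])
        inv = gf_inv(m[c][c])
        for r in range(c + 1, n):
            factor = gf_mul(m[r][c], inv)
            if factor:
                for k in range(c, n):
                    m[r][k] ^= gf_mul(factor, m[c][k])
    return det

def is_mds(m):
    """Check that every square sub-matrix of m is non-singular."""
    rows, cols = len(m), len(m[0])
    for size in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), size):
            for cs in combinations(range(cols), size):
                if gf_det([[m[r][c] for c in cs] for r in rs]) == 0:
                    return False
    return True

MIXCOLUMNS = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
IDENTITY = [[int(i == j) for j in range(4)] for i in range(4)]
print(is_mds(MIXCOLUMNS), is_mds(IDENTITY))
```

For a $4 \times 4$ matrix only 69 sub-matrices have to be examined; the count grows quickly with the dimension, so this approach is practical only for small matrices.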

The first notable cipher to use MDS matrices for diffusion is Shark [RDP+96], designed by Vincent Rijmen, Joan Daemen, Bart Preneel, Antoon Bosselaers and Erik De Win (cf. Subsection 3.21.2 on page 190). For the design of the diffusion layer, the $m$-bit outputs of the S-boxes are considered as elements of $\mathbb{F}_{2^m}$. The diffusion layer takes $n$ $m$-bit values as input, and gives $n$ $m$-bit outputs. Such a vector of $n$ $m$-bit values represents the state of the cipher. Joan Daemen defines optimal diffusion using the branch number [Dae95]: the branch number $\mathcal{B}_\theta$ of an invertible linear mapping $\theta$ is
\[
\mathcal{B}_\theta = \min_{a \neq 0} \bigl( \operatorname{wt}(a) + \operatorname{wt}(\theta(a)) \bigr),
\]
where $\operatorname{wt}(a)$ is the Hamming weight of $a$ (here $a$ is considered as a tuple of elements over some algebraic structure, the bundles, so the Hamming weight is understood as the number of nonzero elements of the tuple). $\mathcal{B}_\theta$ gives a measure for the worst case diffusion: it is a lower bound for the number of active S-boxes in two consecutive rounds of a linear trail or a differential characteristic.

Note that $\operatorname{wt}(\theta(a)) \leqslant n$ for every choice of $\theta$; taking $\operatorname{wt}(a) = 1$, this implies that $\mathcal{B}_\theta \leqslant n + 1$. An invertible linear mapping $\theta$ for which $\mathcal{B}_\theta = n + 1$ is called optimal. If the vector $a$ represents, for instance, an input differential, we see how this definition of optimality corresponds to the multipermutation property. In fact, it follows directly from the definitions of branch number and of MDS codes that the generator matrix of an MDS code defines an optimal linear transformation $\theta$. Furthermore, this $\theta$ must be invertible.
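For very small $n$ the branch number can be found by exhaustive search. The sketch below compares two maps on pairs of bytes, both chosen here purely for illustration: the PHT $(a, b) \mapsto (2a + b, a + b)$ modulo 256, and the map given by the matrix with rows $(1, 1)$ and $(1, 2)$ over $\mathbb{F}_{2^8}$ (Rijndael polynomial assumed), which is MDS because all its entries and its determinant are nonzero.

```python
from itertools import product

def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def weight(v):
    """Number of nonzero bundles in the tuple v."""
    return sum(1 for x in v if x)

def branch_number(theta, n, q=256):
    """Minimum of wt(a) + wt(theta(a)) over all nonzero a (brute force)."""
    return min(weight(a) + weight(theta(a))
               for a in product(range(q), repeat=n) if any(a))

def pht(a):
    """SAFER-style PHT on one pair of bytes, arithmetic modulo 256."""
    return ((2 * a[0] + a[1]) % 256, (a[0] + a[1]) % 256)

def mds2(a):
    """Map given by the matrix (1 1; 1 2) over GF(2^8)."""
    return (a[0] ^ a[1], a[0] ^ gf_mul(2, a[1]))

print(branch_number(pht, 2))   # below the optimum n + 1 = 3
print(branch_number(mds2, 2))  # reaches the optimum n + 1 = 3
```

The search space has $q^n$ elements, so this direct approach does not scale beyond toy sizes; for real diffusion layers the sub-matrix criterion is the usual tool.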

Other examples of block ciphers that use MDS matrices for diffusion are SQUARE [DKR97] (see Section 3.11 on page 160), Twofish (Section 3.13 on page 164), the AES contest winner Rijndael (Section 3.20 on page 182), Hierocrypt [OMSK00], IDEA NXT (Section 3.23 on page 195), Clefia (Section 3.28 on page 203), Piccolo (Subsection 3.37.2 on page 228), and LED (Subsection 3.37.4 on page 229). MDS matrices are used also in the stream cipher MUGI [WFY+04] and in the cryptographic hash function WHIRLPOOL [BR11a].

It is worth noting that the entries in the MDS matrices are usually chosen to be elements of low Hamming weight, in order to make multiplication by them as inexpensive as possible. This is often done by exhaustive search within certain classes of MDS matrices, such as generator matrices of Reed-Solomon codes. Also, since an MDS matrix cannot have zero entries, the desirable type of sparseness is a small number of entries not equal to one.

Even multiplication by low Hamming weight elements of a finite field can be too expensive for some applications. Therefore some ciphers, such as mCrypton (Section 3.25 on page 198), define their diffusion matrix in a different, ad hoc, way. The corresponding study of the diffusion properties is also ad hoc and the S-boxes have to be chosen carefully.

We shall return to the problem of constructing efficient MDS diffusion layers in Subsubsection 1.8.3.3 on page 51.

Another problem arises with states that consist of many words, for instance 16, as in 128-bit SPNs with 8-bit S-boxes (or with 64-bit SPNs that use 4-bit S-boxes), namely that the diffusion matrix becomes too large: in the examples we just made it would be a $16 \times 16$ matrix. The solution adopted in ciphers such as SQUARE and Rijndael, with 16 words (of one byte each), is to only apply diffusion to each of four blocks of four words independently during a round, and then to simply permute the words in such a way that full diffusion will be completed in the following round. Therefore, instead of the multiplication of a diffusion matrix times a column vector, in such ciphers the diffusion operation is implemented as a multiplication of two matrices: the diffusion matrix and a matrix whose columns are segments of the state. In mathematical notation this is described in Section 3.20 on page 182.
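The two-round behaviour can be observed by following a one-byte difference through the linear part of a Rijndael-like round. The sketch below is a simplified model, not the cipher itself: the S-box layer is left out (a bijective S-box maps nonzero differences to nonzero differences, so it does not change which bytes are active), the word permutation is modelled as a cyclic rotation of row $r$ by $r$ positions, and the $4 \times 4$ matrix is the circulant $(2, 3, 1, 1)$ over $\mathbb{F}_{2^8}$.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_columns(state):
    """Matrix product MIX * state: every column is diffused independently."""
    out = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            for k in range(4):
                out[r][c] ^= gf_mul(MIX[r][k], state[k][c])
    return out

def shift_rows(state):
    """Word permutation: rotate row r to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def linear_round(state):
    return mix_columns(shift_rows(state))

def active(state):
    """Number of nonzero (active) bytes in the state."""
    return sum(1 for row in state for x in row if x)

diff = [[0] * 4 for _ in range(4)]
diff[0][0] = 0x01                      # a single active byte
after_one = linear_round(diff)
after_two = linear_round(after_one)
print(active(after_one), active(after_two))
```

After the first round one full column is active; the permutation then places its four bytes in four different columns, so the second application of the small matrix reaches the whole state.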

1.8.3.2 Types of MDS Matrices

We recall that what we called an MDS matrix $M$ is, formally, the non-systematic (or redundancy) part of the generator matrix of an MDS code. This means that a basis for the corresponding $(n + k)$-dimensional codeword space over a finite field $K$ is given by the rows of the generator matrix $G = (M \mid I_n)$, where $M$ is a $k \times n$ matrix and $I_n$ is the identity $n \times n$ matrix.

Here we are chiefly interested in square MDS matrices, i.e. with $k = n$; however, we must observe that there are further uses in cryptography: the F-function of the block cipher PICARO (Subsection 3.38.1 on page 230), a Feistel network, uses a full generator matrix $G$ of an MDS code to embed an eight-dimensional vector space over $\mathbb{F}_{2^8}$ into a 14-dimensional one, and then the transpose of $G$ to compress back 14 dimensions to eight. In this case the generator matrix has a $6 \times 8$ redundancy part.

There are two ways of constructing MDS matrices: one can start with a known MDS code (for instance, the code used in Shark is a Reed-Solomon code), or search for matrices that satisfy the non-singular sub-matrix condition.

Cauchy matrices are a classic example of MDS matrices. They are of the form
\[
\left( \frac{1}{x_i - y_j} \right)_{0 \leqslant i, j < n}
\]
over a field $K$, where the $x_i$ are pairwise distinct, the $y_j$ are pairwise distinct, and all $x_i - y_j \neq 0$. In general they do not lend themselves readily to optimisation. Amr Youssef, Serge Mister and Stafford Tavares define in [YMT97] a special class of Cauchy matrices for the design of diffusion layers in block ciphers: they construct their matrices $A$ over a binary field $K$ by first choosing the $x_i$'s such that the least significant $r$ bits of $x_i$ are the binary representation of the number $i$, and then putting $y_i = x_i \oplus v$ where $v$ is a nonzero field element such that its least significant $r$ bits are all zero. This matrix satisfies $A^2 = c I_n$ where
\[
c = \Bigl( \bigoplus_{i=0}^{n-1} \frac{1}{x_1 \oplus y_i} \Bigr)^2
\]
over $K$. The matrix $A$ is then normalised by dividing all its entries by $\sqrt{c}$, so that it becomes involutory. Such an $n \times n$ matrix also has only $n$ different entries, which are used for both encryption and decryption, reducing the number of circuits or short programs to implement for the multiplication by constants.
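A small numerical instance may help to see the construction at work. The sketch below rests on choices made here for illustration only: $K = \mathbb{F}_{2^8}$ with the Rijndael polynomial, $n = 4$ (so $r = 2$), $x_i = i$ with all higher bits zero, and $v = 4$. The scalar on the diagonal of $A^2$ is computed as the square of the XOR-sum of one row, so its square root is simply that row sum.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def mat_mul(x, y):
    """Matrix product over GF(2^8)."""
    out = [[0] * len(y[0]) for _ in x]
    for i in range(len(x)):
        for j in range(len(y[0])):
            for k in range(len(y)):
                out[i][j] ^= gf_mul(x[i][k], y[k][j])
    return out

n = 4
v = 0x04                        # nonzero, two least significant bits are zero
xs = list(range(n))             # low two bits of x_i spell i (illustrative choice)
ys = [x ^ v for x in xs]
A = [[gf_inv(x ^ y) for y in ys] for x in xs]

row_sum = 0
for y in ys:
    row_sum ^= gf_inv(xs[1] ^ y)
c = gf_mul(row_sum, row_sum)    # scalar such that A^2 = c * I

IDENTITY = [[int(i == j) for j in range(n)] for i in range(n)]
A_squared = mat_mul(A, A)

scale = gf_inv(row_sum)         # 1 / sqrt(c)
N = [[gf_mul(scale, e) for e in row] for row in A]
print(mat_mul(N, N) == IDENTITY)
```

The entry in position $(i, j)$ depends only on $i \oplus j$, which is why the matrix contains just $n$ distinct values and why the off-diagonal terms of $A^2$ cancel in pairs in characteristic two.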

Vandermonde matrices (see [Yca13] for their history and naming) are matrices where each row is of the form $1, \alpha_i, \alpha_i^2, \ldots, \alpha_i^{n-1}$ for pairwise distinct $\alpha_i$'s. They are closely tied to MDS codes, since they define the Reed-Solomon codes (a Vandermonde matrix need not be MDS itself), and there are very efficient algorithms for multiplication of vectors by them, as this operation amounts to multi-evaluation of a polynomial of degree $n - 1$ at $n$ points. These algorithms are DFT based (cf. Chapter 3 of [Pan01]) and therefore suitable only for large matrices. We are not sure which is the first mention of Vandermonde matrices for the construction of diffusion layers in SPNs: often, in the literature, a 2004 paper by Jérôme Lacan and Jérôme Fimes [LF04] is cited, which however deals with a clever use of Vandermonde matrices to build erasure codes, not with cryptographic applications.

Hadamard matrices $H$ have the property that $H \cdot H^t = n I_n$ (here $H^t$ denotes the transpose of $H$). The first such matrices were originally constructed by James Joseph Sylvester [Syl67] and Jacques Hadamard [Had93] as real matrices with entries equal to $\pm 1$, but over finite fields the latter condition is relaxed. The property $H \cdot H^t = n I_n$ makes them suitable to construct involutory diffusion layers, upon scaling, and they are used for this purpose in Anubis (Subsection 3.21.3 on page 192) and Khazad (Subsection 3.21.2 on page 190).

Mahdi Sajadieh et al. in [SDMO12] construct involutory MDS matrices using Vandermonde matrices over fields $\mathbb{F}_{2^r}$. Their idea is to take $n$ pairwise distinct and non-vanishing values $\alpha_i$ for $0 \leqslant i < n$, a nonzero $\delta$ in $\mathbb{F}_{2^r}$, and to put $\beta_i = \alpha_i \oplus \delta$, with $\delta$ chosen so that all the $\alpha_i$ and $\beta_j$ are pairwise distinct. If $A$ and $B$ are the Vandermonde matrices associated to the $n$-tuples $(\alpha_0, \alpha_1, \ldots, \alpha_{n-1})$ and $(\beta_0, \beta_1, \ldots, \beta_{n-1})$, then $B \cdot A^{-1}$ is an involutory MDS matrix. They then go on to construct $2^d \times 2^d$ Hadamard involutory matrices recursively, starting from slightly modified $4 \times 4$ Vandermonde matrices. Kishan Chand Gupta and Indranil Ghosh Ray [GR13a] show that these matrices can be constructed also starting from Cauchy matrices.
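The involution property of $B \cdot A^{-1}$ is easy to check numerically. In the sketch below everything concrete is an assumption made for the example: the field is $\mathbb{F}_{2^8}$ with the Rijndael polynomial, $n = 4$, $\alpha = (1, 2, 3, 4)$ and $\delta = 8$, which gives $\beta = (9, 10, 11, 12)$. Only the involution and the absence of zero entries are tested; a full MDS test would examine all square sub-matrices.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def mat_mul(x, y):
    """Matrix product over GF(2^8)."""
    out = [[0] * len(y[0]) for _ in x]
    for i in range(len(x)):
        for j in range(len(y[0])):
            for k in range(len(y)):
                out[i][j] ^= gf_mul(x[i][k], y[k][j])
    return out

def mat_inv(m):
    """Gauss-Jordan inversion over GF(2^8); m is assumed to be non-singular."""
    n = len(m)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c])
        aug[c], aug[pivot] = aug[pivot], aug[c]
        inv = gf_pow(aug[c][c], 254)
        aug[c] = [gf_mul(inv, e) for e in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [e ^ gf_mul(f, w) for e, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vandermonde(points):
    """Row i is (1, a_i, a_i^2, ..., a_i^(n-1))."""
    return [[gf_pow(a, j) for j in range(len(points))] for a in points]

alphas = [1, 2, 3, 4]
delta = 8                                   # keeps alphas and betas disjoint
betas = [a ^ delta for a in alphas]
M = mat_mul(vandermonde(betas), mat_inv(vandermonde(alphas)))
IDENTITY = [[int(i == j) for j in range(4)] for i in range(4)]
print(mat_mul(M, M) == IDENTITY)
```

One way to see why the involution holds: $B \cdot A^{-1}$ sends the values of a polynomial at the $\alpha_i$ to its values at the $\alpha_i \oplus \delta$, and shifting the argument by $\delta$ twice is the identity in characteristic two.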

1.8.3.3 Constructing Efficient MDS Matrices

We now focus on the problem of constructing efficient MDS matrices. We have already mentioned that choosing matrices with entries of low Hamming weight is often desirable, but the actual problem is the minimisation of the cost of the multiplication by the whole MDS matrix.

Multiplication of a variable vector or matrix, over a finite field, by a fixed matrix with hard-wired nonzero constant entries has a code or area complexity strongly correlated with the number of entries different from one (the actual values of such entries of course also play a role). Therefore, Pascal Junod and Serge Vaudenay introduce in [JV04b] the following criterion: if $v_1(M)$ is the number of entries equal to one in the matrix $M$ and $c_1(M)$ is the cardinality of the set $C(M)$ of distinct entries in $M$ which are different from one, then the goal is to maximise $v_1(M)$ and to minimise $c_1(M)$. Of course this does not take into account special situations which may make some matrices more efficient than others in some cases: for instance, multiplication by the generator $x$ of the polynomial basis of the field is inexpensive, as is also multiplication by $x^{-1}$; and the structure of the set $C(M)$ is not taken into account, as when some elements are the sum or product of other elements in the set. Junod and Vaudenay then start constructing candidates for MDS matrices from the concept of bi-regularity: a $2 \times 2$ array with nonzero entries in a field $K$ is bi-regular if at least one row and one column have two different entries. It is clear that bi-regularity is a prerequisite for non-singularity. MDS matrices are constructed iteratively by extending bi-regular arrays, and bounds for $v_1(M)$ and $c_1(M)$ are given as a function of the dimensions of the matrices.
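The two counters and the bi-regularity test are simple enough to be written down directly. The sketch below applies them to the circulant $(2, 3, 1, 1)$ used in Rijndael's MixColumns and to a $4 \times 4$ pattern with two placeholder constants, written as the strings `"a"` and `"b"` so that no particular field elements are implied. The function names are illustrative and not taken from [JV04b].

```python
def v1(m):
    """Number of entries equal to one."""
    return sum(1 for row in m for e in row if e == 1)

def c1(m):
    """Number of distinct entries different from one."""
    return len({e for row in m for e in row if e != 1})

def is_biregular(arr):
    """2x2 array: some row and some column each contain two different entries."""
    (p, q), (r, s) = arr
    some_row = p != q or r != s
    some_col = p != r or q != s
    return some_row and some_col

MIXCOLUMNS = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
PATTERN = [["a", 1, 1, 1],
           [1, "a", 1, "b"],
           [1, "b", "a", 1],
           [1, 1, "b", "a"]]

print(v1(MIXCOLUMNS), c1(MIXCOLUMNS))
print(v1(PATTERN), c1(PATTERN))
print(is_biregular([[1, 1], [1, 1]]), is_biregular([[1, 2], [1, 1]]))
```

The all-ones $2 \times 2$ array fails the test, in line with the remark that a singular sub-matrix must be excluded from any MDS candidate.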

To support their point through examples, Junod and Vaudenay consider the $4 \times 4$ MDS matrix $M$ over $\mathbb{F}_{2^8} = \mathbb{F}_2[x]/(x^8 + x^4 + x^3 + x + 1)$ used in Rijndael, i.e. the matrix used in the MixColumns step described in Section 3.20 on page 182. It has $c_1(M) = 2$, which according to [JV04b] is optimal, but $v_1(M) = 8$, whereas $v_1(M) = 9$ (that is, only seven entries different from one) is possible. Multiplication by the Rijndael matrix can be implemented using 15 XORs, four table lookups in one table (to implement multiplication) and using three temporary variables. Junod and Vaudenay show that the family of matrices of the form
\[
\begin{pmatrix}
a & 1 & 1 & 1 \\
1 & a & 1 & b \\
1 & b & a & 1 \\
1 & 1 & b & a
\end{pmatrix}
\]
can be implemented using 10 XORs and seven table lookups in two tables, using two temporary variables. Using the sub-matrix non-singularity criterion, it is easily seen that such a matrix is an MDS matrix over a field extension of $\mathbb{F}_2$ if and only if $1$, $a$, $b$, and $a + b$ are pairwise distinct, $a \neq b^2$, and $a^2 \neq b$. This matrix is at the basis of the diffusion layer in IDEA NXT-64 (Section 3.23 on page 195). Junod and Vaudenay also construct an $8 \times 8$ matrix over $\mathbb{F}_{2^8}$, which is used in IDEA NXT-128. Being MDS, these matrices all have optimal branch numbers, i.e. 5 and 9 respectively.
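The operation count quoted for the Rijndael matrix can be made concrete. The sketch below follows the well-known table-based formulation in which the only lookup table holds multiplication by $x$; it is one possible arrangement, not necessarily the one Junod and Vaudenay had in mind. The names `XTIME`, `mix_column` and `mix_column_reference` are illustrative.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

XTIME = [xtime(i) for i in range(256)]   # the single lookup table

def mix_column(a0, a1, a2, a3):
    """One column times circ(2, 3, 1, 1): 15 XORs and four table lookups."""
    t = a0 ^ a1 ^ a2 ^ a3                # 3 XORs
    b0 = a0 ^ t ^ XTIME[a0 ^ a1]         # 3 XORs, 1 lookup
    b1 = a1 ^ t ^ XTIME[a1 ^ a2]         # 3 XORs, 1 lookup
    b2 = a2 ^ t ^ XTIME[a2 ^ a3]         # 3 XORs, 1 lookup
    b3 = a3 ^ t ^ XTIME[a3 ^ a0]         # 3 XORs, 1 lookup
    return b0, b1, b2, b3

def mix_column_reference(col):
    """Plain matrix-vector product, used only to cross-check mix_column."""
    def mul(c, a):                       # c is 1, 2 or 3
        return {1: a, 2: XTIME[a], 3: XTIME[a] ^ a}[c]
    rows = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
    out = []
    for row in rows:
        acc = 0
        for c, a in zip(row, col):
            acc ^= mul(c, a)
        out.append(acc)
    return tuple(out)

print([hex(b) for b in mix_column(0xDB, 0x13, 0x53, 0x45)])
```

The saving comes from rewriting each output as $2(a_i \oplus a_{i+1}) \oplus a_i \oplus t$, where $t$ is the XOR of the whole column, so that the multiplication by 3 never has to be computed separately.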

A different line of research, followed by several authors during the last few years, and that is particularly advantageous for ciphers whose design criteria are compactness of code and data or of area, is to construct MDS matrices iteratively. The idea is simple: if a matrix $N$ exists with a very compact and sparse representation such that the $k$-th power of $N$ is an MDS matrix $M$, then one can just apply $k$ times the matrix $N$ in place of $M$. (For some reason, such constructions are often called recursive in the literature.)

Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew Robshaw design the diffusion layers of the hash function PHOTON [GPP11] and of the block cipher LED [GPPR12, GPPR11] by constructing their MDS matrix as the power of the companion matrix of an LFSR. Recall that if
\[
y_{n+k} = c_{n-1} y_{n+k-1} + c_{n-2} y_{n+k-2} + \cdots + c_1 y_{k+1} + c_0 y_k \tag{1.3}
\]
is a recursive relation with $c_0, c_{n-1} \neq 0$, then its characteristic polynomial is
\[
g(X) = X^n - \bigl( c_{n-1} X^{n-1} + c_{n-2} X^{n-2} + \cdots + c_1 X + c_0 \bigr) \tag{1.4}
\]
and its companion matrix is the matrix $C$ such that
\[
\underbrace{\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
c_0 & c_1 & c_2 & \cdots & c_{n-1}
\end{pmatrix}}_{C}
\cdot
\begin{pmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{n+k-1} \end{pmatrix}
=
\begin{pmatrix} y_{k+1} \\ \vdots \\ y_{n+k-1} \\ y_{n+k} \end{pmatrix},
\tag{1.5}
\]
which is denoted by $\mathrm{Serial}(c_0, c_1, \ldots, c_{n-1})$ in [GPP11]. The inverse of $C$ has a simple form as well
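As a concrete instance of the iterative idea, the sketch below raises a companion matrix to its fourth power over $\mathbb{F}_{2^4}$. The parameters are assumptions of this example: the coefficients $(4, 1, 2, 2)$ and the reduction polynomial $x^4 + x + 1$ are the ones usually quoted for LED, and should be checked against the cipher's specification before being relied upon. The function names are illustrative.

```python
def gf16_mul(a, b, poly=0x13):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= poly
        b >>= 1
    return r

def serial(*coeffs):
    """Companion matrix: shifted identity on top, coefficients in the last row."""
    n = len(coeffs)
    rows = [[int(j == i + 1) for j in range(n)] for i in range(n - 1)]
    rows.append(list(coeffs))
    return rows

def mat_mul(x, y):
    """Matrix product over GF(2^4)."""
    out = [[0] * len(y[0]) for _ in x]
    for i in range(len(x)):
        for j in range(len(y[0])):
            for k in range(len(y)):
                out[i][j] ^= gf16_mul(x[i][k], y[k][j])
    return out

def mat_vec(m, v):
    """Matrix times column vector over GF(2^4)."""
    out = []
    for row in m:
        acc = 0
        for e, x in zip(row, v):
            acc ^= gf16_mul(e, x)
        out.append(acc)
    return out

N = serial(4, 1, 2, 2)
M = N
for _ in range(3):
    M = mat_mul(M, N)           # M = N^4

state = [0x1, 0x0, 0xA, 0x7]    # arbitrary column of four nibbles
stepped = state
for _ in range(4):
    stepped = mat_vec(N, stepped)   # four cheap LFSR steps

for row in M:
    print([hex(e) for e in row])
print(stepped == mat_vec(M, state))
```

Each application of $N$ only shifts the column and computes one new nibble, which is what makes the serial form attractive when area or code size matters more than latency.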
