๐ฅ๐ ๐ฅ๐+1) (mod 256) .
Hence, the T layer is represented by the $8 \times 8$ block diagonal matrix $A$ with the matrix
\[ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \]
on the diagonal four times. The permutation of the bundles is called a shuffle. It is the permutation
\[ (0\ 2\ 4\ 6\ 1\ 3\ 5\ 7) \]
and corresponds to the matrix
\[ B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \pmod{256}. \]
Hence, the matrix $M$ representing the entire diffusion layer of SAFER is
\[ M = A \cdot B \cdot A \cdot B \cdot A = \begin{pmatrix}
8 & 4 & 4 & 2 & 4 & 2 & 2 & 1 \\
4 & 2 & 2 & 1 & 4 & 2 & 2 & 1 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 \\
4 & 2 & 4 & 2 & 2 & 1 & 2 & 1 \\
2 & 1 & 2 & 1 & 2 & 1 & 2 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \pmod{256}. \]
Let us now consider the effect of this matrix on the state, interpreted as a vector of length eight over the ring $\mathbb{Z}_{256}$. Since the operator defined by $M$ is linear, in order to determine how differences in the input propagate it suffices to consider the differences as relative to the zero vector, i.e. to study the images of individual vectors. Most vectors $v$ of weight one are mapped to vectors $v \cdot M$ of weight eight, but not all: for instance, $v = (32\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ is mapped to a vector $v \cdot M = (0\ 128\ 128\ 64\ 128\ 64\ 64\ 32)^t$ of weight seven, and the image of $v' = (128\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ has weight just one: $v' \cdot M = (0\ 0\ 0\ 0\ 0\ 0\ 0\ 128)^t$. Intuitively, this means that some inputs or some differences do not diffuse well through the layer. Catastrophic consequences are in fact avoided only by the fact that SAFER uses good S-boxes and the fact that the single bundle difference on the eighth vector element will completely diffuse in the following round.
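The computation above is small enough to be checked mechanically. The following is a minimal sketch in plain Python: it rebuilds $A$ and $B$ from their descriptions, multiplies them modulo 256, and evaluates the two example vectors. The helper names (`mat_mul`, `vec_mul`, `weight`) are purely illustrative, and treating the state as a row vector is an assumption about the convention in use.

```python
# Sketch: rebuild the SAFER diffusion matrix M = A*B*A*B*A (mod 256)
# and look at the images of two weight-one vectors.
# Helper names are illustrative; the row-vector convention is an assumption.

MOD = 256

def mat_mul(X, Y):
    """Product of two square integer matrices, reduced modulo 256."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % MOD
             for j in range(n)] for i in range(n)]

def vec_mul(v, X):
    """Row vector times matrix, reduced modulo 256."""
    n = len(v)
    return [sum(v[k] * X[k][j] for k in range(n)) % MOD for j in range(n)]

def weight(v):
    """Number of nonzero bundles of a vector."""
    return sum(1 for x in v if x != 0)

# A: four copies of (2 1; 1 1) on the diagonal.
A = [[0] * 8 for _ in range(8)]
for blk in range(4):
    i = 2 * blk
    A[i][i], A[i][i + 1] = 2, 1
    A[i + 1][i], A[i + 1][i + 1] = 1, 1

# B: permutation matrix of the shuffle (0 2 4 6 1 3 5 7).
SHUFFLE = [0, 2, 4, 6, 1, 3, 5, 7]
B = [[1 if SHUFFLE[i] == j else 0 for j in range(8)] for i in range(8)]

# M = A * B * A * B * A, multiplying from left to right.
M = A
for X in (B, A, B, A):
    M = mat_mul(M, X)

v = [32, 0, 0, 0, 0, 0, 0, 0]
w = [128, 0, 0, 0, 0, 0, 0, 0]
image_v = vec_mul(v, M)
image_w = vec_mul(w, M)

print(M[0])                        # first row of M
print(image_v, weight(image_v))    # image of v and its weight
print(image_w, weight(image_w))    # image of v' and its weight
```

Since the first row and the first column of $M$ coincide, the two example images come out the same under the column-vector convention.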
1.8.3.1 Multipermutations and MDS Matrices
The obvious question is: what are the matrices that guarantee the most complete diffusion? The question is somewhat ill posed because a desirable property of any component of a block cipher is its fast evaluation. Hence, a good diffusion matrix must strike the right balance between good diffusion and fast evaluation: a less perfect but much faster diffusion layer could still lead to a cipher that is faster and no less secure than another cipher making use of an ideal, but slower, diffusion layer. Also the question of performance is per se difficult to formalise: for instance, a sparse matrix is not necessarily good if some entries represent elements which are expensive to multiply with.
This said, the first problem remains that of measuring the quality of diffusion and determining optimal matrices; performance considerations, including compromises, come later.
To address this first problem, Serge Vaudenay suggested [Vau94] (generalising previous work by himself and Claus-Peter Schnorr [SV94]) to use multipermutations: given an alphabet $\mathcal{M}$ and integers $n$, $s$, an $(s, n)$-multipermutation over the alphabet $\mathcal{M}$ is a function $f$ from $\mathcal{M}^s$ to $\mathcal{M}^n$ such that two different $(s + n)$-tuples of the form $(x, f(x))$ cannot collide in any $s$ positions. Serge Vaudenay in particular first observed that the PHT in SAFER (and hence the whole diffusion layer) is not a multipermutation.
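For a small alphabet the definition can be tested exhaustively. The sketch below is only an illustration: the function names are hypothetical, the second map and the field polynomial $x^8 + x^4 + x^3 + x + 1$ are assumed choices. It checks, for every set of $s$ positions, that no two tuples $(x, f(x))$ agree there, and applies this to the 2-PHT over $\mathbb{Z}_{256}$ and to a $2 \times 2$ linear map over $\mathbb{F}_{2^8}$.

```python
from itertools import combinations, product

def gf256_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (an assumed choice)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def is_multipermutation(f, q, s, n):
    """Exhaustive test of the (s, n)-multipermutation property for a map
    f from tuples of length s to tuples of length n over range(q)."""
    tuples = [x + tuple(f(x)) for x in product(range(q), repeat=s)]
    for positions in combinations(range(s + n), s):
        projected = {tuple(t[p] for p in positions) for t in tuples}
        if len(projected) < len(tuples):   # two tuples collide here
            return False
    return True

def pht(x):
    """The 2-PHT (2 1; 1 1) over the integers modulo 256."""
    return ((2 * x[0] + x[1]) % 256, (x[0] + x[1]) % 256)

def field_map(x):
    """The matrix (1 1; 1 2) over GF(2^8), chosen for illustration."""
    return (x[0] ^ x[1], x[0] ^ gf256_mul(2, x[1]))

pht_ok = is_multipermutation(pht, 256, 2, 2)
field_ok = is_multipermutation(field_map, 256, 2, 2)
print(pht_ok, field_ok)
```

The PHT fails the test: the inputs $(0, 0)$ and $(128, 0)$ give the tuples $(0, 0, 0, 0)$ and $(128, 0, 0, 128)$, which agree in two positions.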
To construct multipermutations, if the alphabet is representable as a finite field, he suggested to use (the redundancy part of) MDS matrices, i.e. matrices of MDS (maximum distance separable) codes, which are the codes which reach the Singleton bound. In other words, an $m \times n$ matrix $M$ over a finite field $\mathbb{F}$ is an MDS matrix if it is the transformation matrix of a linear transformation $f \colon \mathbb{F}^n \to \mathbb{F}^m$, $x \mapsto Mx$, with the following property: if $x$ and $x'$ differ in exactly $t$ components, then $f(x)$ and $f(x')$ must differ in at least $m - t + 1$ components. The latter property is called perfect diffusion. Vaudenay also showed how to exploit imperfect diffusion for cryptanalysis (as in the case of reduced rounds of SAFER with suboptimal S-boxes, cf. Section 3.8 on page 150).
Now, to see why this is optimal and indeed a desirable cryptographic property, let us assume $m = n$ and consider first the case of a single changed input word. Then the change should spread to all outputs, a property that, as we have seen at the beginning of this section, is not satisfied by the SAFER diffusion layer. If we now change two words, we may always choose them so that one of the outputs of the linear transformation remains unchanged (this is a simple linear algebra exercise), so we cannot do better than requiring that at least $n - 1$ outputs are changed.
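The linear algebra exercise can be made explicit as follows. The notation $m_{j,i}$ for the entries of the matrix and $\Delta x$, $\Delta y$ for the input and output differences is introduced here only for illustration, and the entries involved are assumed to be nonzero, as is the case for an MDS matrix.

```latex
% Only the first two input words change:
% \Delta x = (\Delta x_1, \Delta x_2, 0, \dots, 0).
\Delta y_j = m_{j,1}\,\Delta x_1 + m_{j,2}\,\Delta x_2 ,
\qquad 1 \leqslant j \leqslant n .
% Fix an output index j with m_{j,1}, m_{j,2} \neq 0,
% pick any \Delta x_1 \neq 0 and set
\Delta x_2 = - m_{j,2}^{-1}\, m_{j,1}\, \Delta x_1 \neq 0
\quad\Longrightarrow\quad \Delta y_j = 0 .
```

With this choice both input words change while output $j$ does not, so at most $n - 1$ outputs can be guaranteed to change.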
Note however that the MDS condition for $m = n$ is stronger than being invertible (i.e. non-singular), as exemplified by the identity matrix, which is invertible but not MDS; moreover, non-singularity cannot serve as a criterion for MDS matrices in general, since it applies only to square matrices. Non-singularity, however, gives a way to characterise an MDS matrix: Theorem 8 (page 321) of [MS77] states that a matrix is an MDS matrix if and only if every square sub-matrix is non-singular. In particular, an MDS matrix cannot have zero entries.
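The sub-matrix criterion translates directly into an exhaustive test for small matrices. The sketch below works over $\mathbb{F}_{2^8}$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used by Rijndael) and is applied, as an assumed example, to the Rijndael MixColumns matrix and to the identity. The function names are illustrative only.

```python
from itertools import combinations

def gf256_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_det(m):
    """Determinant by Laplace expansion along the first row.
    In characteristic 2 the signs play no role."""
    if len(m) == 1:
        return m[0][0]
    d = 0
    for j, x in enumerate(m[0]):
        if x:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            d ^= gf256_mul(x, gf_det(minor))
    return d

def is_mds(m):
    """True if every square sub-matrix of m is non-singular."""
    rows, cols = len(m), len(m[0])
    for k in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                sub = [[m[r][c] for c in cs] for r in rs]
                if gf_det(sub) == 0:
                    return False
    return True

# Example inputs (assumed for illustration).
MIXCOLUMNS = [[2, 3, 1, 1],
              [1, 2, 3, 1],
              [1, 1, 2, 3],
              [3, 1, 1, 2]]
IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

print(is_mds(MIXCOLUMNS), is_mds(IDENTITY))
```

The identity matrix already fails at the $1 \times 1$ sub-matrices, i.e. at its zero entries, even though it is non-singular. The exhaustive test is only practical for small dimensions, since the number of sub-matrices grows quickly.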
The first notable cipher to use MDS matrices for diffusion is Shark [RDP+96], designed by Vincent Rijmen, Joan Daemen, Bart Preneel, Antoon Bosselaers and Erik De Win (cf. Subsection 3.21.2 on page 190). For the design of the diffusion layer, the $m$-bit outputs of the S-boxes are considered as elements of $\mathbb{F}_{2^m}$. The diffusion layer takes $n$ $m$-bit values as input, and gives $n$ $m$-bit outputs. Such a vector of $n$ $m$-bit values represents the state of the cipher. Joan Daemen defines optimal diffusion using the branch number [Dae95]: the branch number $\mathcal{B}_\theta$ of an invertible linear mapping $\theta$ is
\[ \mathcal{B}_\theta = \min_{a \neq 0} \bigl( \operatorname{wt}(a) + \operatorname{wt}(\theta(a)) \bigr), \]
where $\operatorname{wt}(a)$ is the Hamming weight of $a$ (here $a$ is considered as a tuple of elements over some algebraic structure, the bundles, so by Hamming weight we understand the number of nonzero elements of the tuple). $\mathcal{B}_\theta$ gives a measure for the worst case diffusion: it is a lower bound for the number of active S-boxes in two consecutive rounds of a linear trail or a differential characteristic.
Note that $\operatorname{wt}(\theta(a)) \leqslant n$ for every choice of $a$; taking $\operatorname{wt}(a) = 1$, this implies that $\mathcal{B}_\theta \leqslant n + 1$. An invertible linear mapping $\theta$ for which $\mathcal{B}_\theta = n + 1$ is called optimal. If the vector $a$ represents, for instance, an input differential, we see how this definition of optimality corresponds to the multipermutation property. In fact, it follows directly from the definitions of branch number and of MDS codes that the generator matrix of an MDS code defines an optimal linear transformation $\theta$. Furthermore, this $\theta$ must be invertible.
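For very small mappings the branch number can be computed directly from the definition. The following sketch does so by exhaustive search for two maps on pairs of bytes: the 2-PHT of SAFER over $\mathbb{Z}_{256}$, and the matrix with rows $(1\ 1)$ and $(1\ 2)$ over $\mathbb{F}_{2^8}$. The second map, the field polynomial and all function names are assumptions made for the sake of the example.

```python
from itertools import product

def gf256_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (an assumed choice)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def weight(v):
    """Number of nonzero bundles of a tuple."""
    return sum(1 for x in v if x != 0)

def branch_number(theta, q, n):
    """min over nonzero a of wt(a) + wt(theta(a)), by exhaustive search
    over all tuples of length n with entries in range(q)."""
    best = 2 * n
    for a in product(range(q), repeat=n):
        if any(a):
            best = min(best, weight(a) + weight(theta(a)))
    return best

def pht(a):
    """The 2-PHT (2 1; 1 1) over the integers modulo 256."""
    return ((2 * a[0] + a[1]) % 256, (a[0] + a[1]) % 256)

def field_map(a):
    """The matrix (1 1; 1 2) over GF(2^8), chosen for illustration."""
    return (a[0] ^ a[1], a[0] ^ gf256_mul(2, a[1]))

pht_branch = branch_number(pht, 256, 2)
field_branch = branch_number(field_map, 256, 2)
print(pht_branch, field_branch)
```

The PHT reaches only the minimum possible value 2 for an invertible map (the input $(128, 0)$ is sent to $(0, 128)$), whereas the field map attains $n + 1 = 3$ and is therefore optimal in the above sense.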
Other examples of block ciphers that use MDS matrices for diffusion are SQUARE [DKR97] (see Section 3.11 on page 160), Twofish (Section 3.13 on page 164), the AES contest winner Rijndael (Section 3.20 on page 182), Hierocrypt [OMSK00], IDEA NXT (Section 3.23 on page 195), Clefia (Section 3.28 on page 203), Piccolo (Subsection 3.37.2 on page 228), and LED (Subsection 3.37.4 on page 229). MDS matrices are used also in the stream cipher MUGI [WFY+04] and in the cryptographic hash function WHIRLPOOL [BR11a].
It is worth noting that the entries in the MDS matrices are usually chosen to be elements of low Hamming weight, in order to make multiplication by them as inexpensive as possible. This is often done by exhaustive search within certain classes of MDS matrices, such as generator matrices of Reed-Solomon codes. Also, since an MDS matrix cannot have zero entries, the desirable type of sparseness is a small number of entries not equal to one.
Even multiplication by low Hamming weight elements of a finite field can be too expensive for some applications. Therefore some ciphers, such as mCrypton (Section 3.25 on page 198), define their diffusion matrix in a different, ad hoc, way. The corresponding study of the diffusion properties is also ad hoc and the S-boxes have to be chosen carefully.
We shall return to the problem of constructing efficient MDS diffusion layers in Subsubsection 1.8.3.3 on page 51.
Another problem arises with states that consist of many words, for instance 16, as in 128-bit SPNs with 8-bit S-boxes (or in 64-bit SPNs that use 4-bit S-boxes), namely that the diffusion matrix becomes too large: in the examples just given it would be a $16 \times 16$ matrix. The solution adopted in ciphers such as SQUARE and Rijndael, with 16 words (of one byte each), is to apply diffusion only to each of four blocks of four words independently during a round, and then to simply permute the words in such a way that full diffusion will be completed in the following round. Therefore, instead of the multiplication of a diffusion matrix by a column vector, in such ciphers the diffusion operation is implemented as a multiplication of two matrices: the diffusion matrix and a matrix whose columns are segments of the state. In mathematical notation this is described in Section 3.20 on page 182.
1.8.3.2 Types of MDS Matrices
We recall that what we called an MDS matrix $M$ is, formally, the non-systematic (or redundancy) part of the generator matrix of an MDS code. This means that a basis for the corresponding $(m + n)$-dimensional codeword space over a finite field $K$ is given by the rows of the generator matrix $G = (M \mid I_n)$, where $M$ is an $m \times n$ matrix and $I_n$ is the $n \times n$ identity matrix.
Here we are chiefly interested in square MDS matrices, i.e. with $m = n$; however, we must observe that there are further uses in cryptography: the F-function of the block cipher PICARO (Subsection 3.38.1 on page 230), a Feistel network, uses a full generator matrix $G$ of an MDS code to embed an eight-dimensional vector space over $\mathbb{F}_{2^8}$ into a 14-dimensional one, and then the transpose of $G$ to compress 14 dimensions back to eight. In this case the generator matrix has a $6 \times 8$ redundancy part.
There are two ways of constructing MDS matrices: one can start with a known MDS code (for instance, the code used in Shark is a Reed-Solomon code), or search for matrices that satisfy the non-singular sub-matrix condition.
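Cauchy matrices are a classic example of MDS matrices. They are of the form
\[ \left( \frac{1}{x_i - y_j} \right)_{0 \leqslant i, j < n} \]
with pairwise distinct $x_i$, pairwise distinct $y_j$ and all $x_i - y_j \neq 0$ over a field $K$. In general they do not lend themselves readily to optimisation.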
Amr Youssef, Serge Mister and Stafford Tavares define in [YMT97] a special class of Cauchy matrices for the design of diffusion layers in block ciphers: they construct their matrices $A$ over a binary field $K$ by first choosing the $x_i$'s such that the least significant $r$ bits of $x_i$ are the binary representation of the number $i$, and then putting $y_i = x_i \oplus v$ where $v$ is a nonzero field element such that its least significant $r$ bits are all zero. This matrix satisfies $A^2 = c I_n$ where
\[ c = \bigoplus_{k=0}^{n-1} \left( \frac{1}{x_1 \oplus y_k} \right)^2 \]
over $K$. The matrix $A$ is then normalised by dividing all its entries by $\sqrt{c}$, so that it becomes involutory. Such an $n \times n$ matrix also has only $n$ different entries, which are used for both encryption and decryption, reducing the number of circuits or short programs to implement for the multiplication by constants.
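The construction can be tried out on a small case. The sketch below is an illustration under stated assumptions, not the parameters of [YMT97]: it works in $\mathbb{F}_{2^8}$ modulo $x^8 + x^4 + x^3 + x + 1$, takes a $4 \times 4$ matrix, sets $x_i = i$ (so that all higher bits of the $x_i$ are zero) and $v$ = `0x10`. All function names are hypothetical.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x+1 (an assumed choice)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf256_inv(a):
    """Inverse of a nonzero element, computed as a^(2^8 - 2)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r

def mat_mul(X, Y):
    """Product of two square matrices over GF(2^8)."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] ^= gf256_mul(X[i][k], Y[k][j])
    return Z

def cauchy_matrix(r, v):
    """Cauchy matrix (1 / (x_i + y_j)) with x_i = i (an assumption)
    and y_i = x_i XOR v, of size 2^r x 2^r."""
    n = 1 << r
    assert 0 < v < 256 and v % n == 0   # low r bits of v are zero
    x = list(range(n))
    y = [xi ^ v for xi in x]
    return [[gf256_inv(x[i] ^ y[j]) for j in range(n)] for i in range(n)]

A = cauchy_matrix(2, 0x10)
n = len(A)

# In characteristic 2 a sum of squares is the square of the sum,
# so sqrt(c) is simply the XOR of the entries of one row.
sqrt_c = 0
for entry in A[1]:
    sqrt_c ^= entry
c = gf256_mul(sqrt_c, sqrt_c)

identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
scaled_identity = [[c if i == j else 0 for j in range(n)] for i in range(n)]

# Normalise: divide every entry by sqrt(c).
inv_sqrt_c = gf256_inv(sqrt_c)
N = [[gf256_mul(inv_sqrt_c, e) for e in row] for row in A]

print(mat_mul(A, A) == scaled_identity)   # A^2 = c I
print(mat_mul(N, N) == identity)          # N is involutory
print(len({e for row in A for e in row})) # number of distinct entries
```

With these choices the entry in position $(i, j)$ depends only on $i \oplus j$, which is why only $n$ distinct entries occur.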
Vandermonde matrices (see [Yca13] for their history and naming) are matrices where each row is of the form $1, \alpha_i, \alpha_i^2, \dots, \alpha_i^{n-1}$ for pairwise distinct $\alpha_i$'s. They are MDS matrices and there are very efficient algorithms for multiplication of vectors by them, as this operation amounts to multi-evaluation of a polynomial of degree $n - 1$ at $n$ points. These algorithms are DFT based (cf. Chapter 3 of [Pan01]) and therefore suitable only for large matrices. We are not sure which is the first mention of Vandermonde matrices for the construction of diffusion layers in SPNs: often, in the literature, a 2004 paper by Jérôme Lacan and Jérôme Fimes [LF04] is cited, which however deals with a clever use of Vandermonde matrices to build erasure codes, not with cryptographic applications.
Hadamard matrices $H$ have the property that $H \cdot H^t = c I_n$ for some scalar $c$ (here $H^t$ denotes the transpose of $H$). The first such matrices were constructed by James Joseph Sylvester [Syl67] and Jacques Hadamard [Had93] as real matrices with entries equal to $\pm 1$, but over finite fields the latter condition is relaxed. The property $H \cdot H^t = c I_n$ makes them suitable to construct involutory diffusion layers, upon scaling, and they are used for this purpose in Anubis (Subsection 3.21.3 on page 192) and Khazad (Subsection 3.21.2 on page 190).
Mahdi Sajadieh et al. [SDMO12] construct involutory MDS matrices using Vandermonde matrices over fields $\mathbb{F}_{2^m}$. Their idea is to take $n$ pairwise distinct and nonzero values $\alpha_i$ for $0 \leqslant i < n$, a nonzero $\delta$ in $\mathbb{F}_{2^m}$, and to put $\beta_i = \alpha_i + \delta$. If $A$ and $B$ are the Vandermonde matrices associated to the $n$-tuples $(\alpha_0, \alpha_1, \dots, \alpha_{n-1})$ and $(\beta_0, \beta_1, \dots, \beta_{n-1})$, then $B \cdot A^{-1}$ is an involutory MDS matrix. They then go on to construct $2^n \times 2^n$ Hadamard involutory matrices recursively, starting from slightly modified $4 \times 4$ Vandermonde matrices. Kishan Chand Gupta and Indranil Ghosh Ray [GR13a] show that these matrices can be constructed also starting from Cauchy matrices.
1.8.3.3 Constructing Efficient MDS Matrices
We now focus on the problem of constructing efficient MDS matrices. We have already mentioned that choosing matrices with entries of low Hamming weight is often desirable, but the actual problem is the minimisation of the cost of the multiplication by the whole MDS matrix.
Multiplication of a variable vector or matrix, over a finite field, by a fixed matrix with hard-wired nonzero constant entries has a code or area complexity strongly correlated with the number of entries different from one (the actual values of such entries of course also play a role). Therefore, Pascal Junod and Serge Vaudenay introduce in [JV04b] the following criterion: if $v_1(M)$ is the number of entries equal to one in the matrix $M$ and $c_1(M)$ is the cardinality of the set $C(M)$ of distinct entries in $M$ which are different from one, then the goal is to maximise $v_1(M)$ and to minimise $c_1(M)$. Of course this does not take into account special situations which may make some matrices more efficient than others in some cases: for instance, multiplication by the generator $x$ of the polynomial basis of the field is inexpensive, as is multiplication by $x^{-1}$; and the structure of the set $C(M)$ is not taken into account, as when some elements are the sum or product of other elements in the set. Junod and Vaudenay then start constructing candidates for MDS matrices from the concept of bi-regularity: a $2 \times 2$ array with nonzero entries in a field $K$ is bi-regular if at least one row and one column have two different entries. It is clear that bi-regularity is a prerequisite for non-singularity. MDS matrices are constructed iteratively by extending bi-regular arrays, and bounds for $v_1(M)$ and $c_1(M)$ are given as a function of the dimensions of the matrices.
To support their point through examples, Junod and Vaudenay consider the $4 \times 4$ MDS matrix $M$ over $\mathbb{F}_{2^8} = \mathbb{F}_2[x]/(x^8 + x^4 + x^3 + x + 1)$ used in Rijndael, i.e. the matrix used in the MixColumns step described in Section 3.20 on page 182. It has $c_1(M) = 2$, which according to [JV04b] is optimal, but $v_1(M) = 8$, so that eight entries are different from one, whereas MDS matrices with only seven such entries exist. Multiplication by the Rijndael matrix can be implemented using 15 XORs, four table lookups in one table (to implement multiplication) and three temporary variables. Junod and Vaudenay show that the family of matrices of the form
\[ \begin{pmatrix} a & 1 & 1 & 1 \\ 1 & a & 1 & b \\ 1 & b & a & 1 \\ 1 & 1 & b & a \end{pmatrix} \]
can be implemented using 10 XORs and seven table lookups in two tables, using two temporary variables. Using the sub-matrix non-singularity criterion, it is easily seen that such a matrix is an MDS matrix over a field extension of $\mathbb{F}_2$ if and only if $1$, $a$, $b$, and $a + b$ are pairwise distinct, $a \neq b^2$, and $a^2 \neq b$. This matrix is at the basis of the diffusion layer in IDEA NXT-64 (Section 3.23 on page 195). Junod and Vaudenay also construct an $8 \times 8$ matrix over $\mathbb{F}_{2^8}$, which is used in IDEA NXT-128. Being MDS, these matrices all have optimal branch numbers, i.e. 5 and 9 respectively.
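The two quantities of the Junod and Vaudenay criterion are straightforward to count. The short sketch below does so for the Rijndael MixColumns matrix and for a matrix with the same pattern of ones as the family above. The function name `jv_counts` is hypothetical, and the non-one entries of the second matrix are kept as the symbols `"a"` and `"b"`, since only their positions and their number matter here.

```python
def jv_counts(matrix):
    """Return (v1, c1): the number of entries equal to one, and the
    number of distinct entries different from one."""
    entries = [e for row in matrix for e in row]
    v1 = sum(1 for e in entries if e == 1)
    c1 = len({e for e in entries if e != 1})
    return v1, c1

# Rijndael MixColumns matrix, entries written as integers.
MIXCOLUMNS = [[2, 3, 1, 1],
              [1, 2, 3, 1],
              [1, 1, 2, 3],
              [3, 1, 1, 2]]

# A matrix with nine ones and two further values, kept symbolic.
a, b = "a", "b"
FAMILY = [[a, 1, 1, 1],
          [1, a, 1, b],
          [1, b, a, 1],
          [1, 1, b, a]]

print(jv_counts(MIXCOLUMNS))   # (8, 2)
print(jv_counts(FAMILY))       # (9, 2)
```

Both matrices use two distinct non-trivial constants, but the second needs one multiplication fewer, which is reflected in the count of seven table lookups quoted above.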
A different line of research, followed by several authors during the last few years, and particularly advantageous for ciphers whose design criteria are compactness of code and data or of area, is to construct MDS matrices iteratively. The idea is simple: if a matrix $S$ exists with a very compact and sparse representation such that the $r$-th power of $S$ is an MDS matrix $M$, then one can just apply the matrix $S$ $r$ times in place of $M$. (For some reason, such constructions are often called recursive in the literature.)
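As a small sketch of this idea, one may take a sparse $4 \times 4$ matrix over $\mathbb{F}_{2^4}$ whose first three rows only shift the state and whose last row is $(4\ 1\ 2\ 2)$, and check that its fourth power is MDS. These parameters and the reduction polynomial $x^4 + x + 1$ are the ones usually quoted for LED; they are used here as an assumption and should be checked against the specification. The function names are illustrative.

```python
from itertools import combinations

def gf16_mul(a, b, poly=0x13):
    """Multiply in GF(2^4) modulo x^4+x+1 (assumed polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= poly
        b >>= 1
    return r

def mat_mul(X, Y):
    """Product of two square matrices over GF(2^4)."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] ^= gf16_mul(X[i][k], Y[k][j])
    return Z

def gf_det(m):
    """Determinant by Laplace expansion (no signs in characteristic 2)."""
    if len(m) == 1:
        return m[0][0]
    d = 0
    for j, x in enumerate(m[0]):
        if x:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            d ^= gf16_mul(x, gf_det(minor))
    return d

def is_mds(m):
    """True if every square sub-matrix of the square matrix m is non-singular."""
    n = len(m)
    for k in range(1, n + 1):
        for rs in combinations(range(n), k):
            for cs in combinations(range(n), k):
                if gf_det([[m[r][c] for c in cs] for r in rs]) == 0:
                    return False
    return True

# Sparse matrix S: three shift rows and one row of constants (assumed values).
S = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [4, 1, 2, 2]]

# M = S^4, obtained by applying S four times.
M = S
for _ in range(3):
    M = mat_mul(M, S)

for row in M:
    print([format(e, "X") for e in row])
print(is_mds(S), is_mds(M))
```

Applying $S$ costs only four constant multiplications and three additions per step, while $S$ itself, having zero entries, is far from MDS.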
Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew Robshaw design the diffusion layers of the hash function PHOTON [GPP11] and of the block cipher LED [GPPR12, GPPR11] by constructing their MDS matrix as the power of the companion matrix of an LFSR. Recall that if
\[ y_{k+d} = Z_{d-1} y_{k+d-1} + Z_{d-2} y_{k+d-2} + \dots + Z_1 y_{k+1} + Z_0 y_k \tag{1.3} \]
is a recursive relation with $Z_0, Z_{d-1} \neq 0$, then its characteristic polynomial is
\[ p(X) = X^d - (Z_{d-1} X^{d-1} + Z_{d-2} X^{d-2} + \dots + Z_1 X + Z_0) \tag{1.4} \]
and its companion matrix is the matrix $C$ such that
\[ \underbrace{\left( \begin{array}{c|c} 0 & I_{d-1} \\ \hline Z_0 & Z_1 \ \cdots \ Z_{d-1} \end{array} \right)}_{C} \begin{pmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{k+d-1} \end{pmatrix} = \begin{pmatrix} y_{k+1} \\ \vdots \\ y_{k+d-1} \\ y_{k+d} \end{pmatrix}, \tag{1.5} \]
which is denoted by $\mathrm{Serial}(Z_0, Z_1, \dots, Z_{d-1})$ in [GPP11]. The inverse of $C$ has a simple form as well