The state of an SPN can usually be viewed as an array of smaller words or bundles. It is these
bundles that are usually transformed by some non-linear operation such as S-boxes, then combined
with a round key, and permuted among each other or, if they are interpreted as elements of
some algebraic structure (for instance a vector space over a finite field), even combined with
each other using some kind of linear operation.
Suppose an attacker changes a bit in the input. The corresponding S-box ideally changes on
average half of its output bits; this means that, in addition to confusion, an S-box already
performs a limited diffusion process. These altered bits are then further diffused by the
diffusion layer and thus affect the input to one or more S-boxes in the following round. Ideally,
even more S-boxes are affected in the round after that, and so on. The affected S-boxes are called
active S-boxes and, intuitively, to increase both efficiency and security it is desirable that the
number of active S-boxes becomes maximal after as few rounds as possible. In other words, we want
to attain good diffusion. There are different techniques to achieve it, and we shall present the
most significant ones.
1.8.1 Bit Permutations
Several recent ciphers, such as PRESENT (Section 3.29 on page 206) and PRINTcipher (Section 3.32
on page 211), use a simple bit permutation of the S-box outputs to achieve diffusion. Additional
conditions are therefore placed on the S-boxes to improve the avalanche effect (for instance,
a single input difference should not trigger the same single input difference in another
S-box in the next round).
However, these ideas were developed very early in the history of block ciphers, and bit
permutations were commonplace in early designs, such as the Data Encryption Standard (Section 3.2
on page 129) and GOST 28147-89 (Section 3.4 on page 140).
One of the earliest treatments of bit permutation networks in SPNs is due to John Kam and
George Davida [KD79], who introduce the notion of completeness: A bijective function
$f : \{0,1\}^n \to \{0,1\}^n$ (for instance an S-box) is said to be complete if, for every
$i, j \in [0..n-1]$, there exist two $n$-bit vectors $x_1$, $x_2$ such that $x_1$ and $x_2$ differ
only in the $i$-th bit and $f(x_1)$ differs from $f(x_2)$ at least in the $j$-th bit. Similarly,
a keyed cipher is said to be complete if it is a complete function for all keys.
(It is easily seen that the Strict Avalanche Criterion is a strengthening of this property.)
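For small S-boxes the completeness criterion can be checked exhaustively. The following is a minimal sketch and is not taken from [KD79]: the helper name `is_complete` is hypothetical, bit 0 is assumed to be the least significant bit, and the 4-bit S-box of PRESENT serves only as a convenient example input.

```python
def is_complete(sbox, n):
    """Return True if the n-bit S-box (a list of 2**n integers) is complete."""
    for i in range(n):                   # input bit that is flipped
        reached = 0                      # output bits observed to change
        for x in range(1 << n):
            reached |= sbox[x] ^ sbox[x ^ (1 << i)]
        if reached != (1 << n) - 1:      # some output bit j never reacts to bit i
            return False
    return True


# 4-bit S-box of PRESENT, used here purely as an example input.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

if __name__ == "__main__":
    print(is_complete(PRESENT_SBOX, 4))       # a cryptographic S-box
    print(is_complete(list(range(16)), 4))    # identity: bit i only affects bit i
```

The identity mapping fails the test because flipping input bit $i$ only ever changes output bit $i$, whereas for the example S-box every input bit reaches every output bit for at least one input.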
Kam and Davida also show how to construct complete ciphers. The fundamental idea consists
in alternating substitution layers, where the S-boxes are assumed to be complete themselves,
with specially defined wired permutation layers. The output bits of an S-box in a layer should be
wired to input bits of distinct (but not necessarily different) S-boxes at the next layer. The
corresponding graph, with the S-boxes as nodes and the wirings as edges directed from an S-box in
a layer to an S-box in the next layer, should connect each input bit of the SPN to each output bit
through exactly one path; in other words it should be a polytree, i.e. be acyclic. Usually the
S-boxes are either keyed or the key is mixed in before the S-box, one key bit per input bit.
Each SPN output bit may thus be viewed as a tree function of the SPN input bits, where each
tree function is composed of S-boxes. Similarly, the network inputs may be viewed as tree
functions of the network output bits. Therefore, several authors follow Howard M. Heys and
Stafford E. Tavares [HT93, HT95] and refer to SPNs constructed using the Kam and Davida
methodology as tree-structured SPNs or TS-SPNs.
Kam and Davida provide a concrete construction method by means of which a complete SPN
with a block size of $N = n^t$ bits can be built using $t$ substitution layers and $t-1$ bit
permutation layers. Examples for $(n,t) = (4,2)$ and $(3,3)$ are shown in Figure 1.7. However,
there is a problem with their construction: it can be shown that if a single input bit is changed,
the probability of an output bit changing will always be $2^{-t}$. This was shown in [Web85] and
termed avalanche damping (see also [WT85]).

Figure 1.7: Complete Tree-Structure SPNs Following the Kam-Davida Construction (two layers of S-boxes $S_0, \dots, S_3$ and three layers of S-boxes $S_0, \dots, S_8$)
John Kam and George Davida obtained US Patent 4,275,265 on their design in 1981. A similar
idea, where pairs of bits are permuted as bundles, but otherwise in an entirely analogous way,
is described in Thomson-CSF's US Patent 4,751,733 (filed in 1986, lapsed).
It is these ideas that influenced the wiring used in PRESENT (where completeness is achieved
over two rounds, as can be seen from Figure 3.34 on page 206) and, to a lesser extent,
PRINTcipher. The design of composite S-boxes such as those of Khazad (Subsection 3.21.2 on
page 190) and Anubis (Subsection 3.21.3 on page 192), or of ICEBERG (Section 3.24 on page 197) is,
on the other hand, more similar to the scheme of the Thomson-CSF patent.
Simple bit permutations are particularly efficient in hardware, as they amount to just wiring.
Some exceptional cases may also be implemented efficiently in software. However, they suffer
from the problem that an output bit can influence at most a single input bit of the next layer
of the cipher. Diffusion is therefore slower, and more rounds may be necessary, than with more
complex linear diffusion layers. As a consequence, bit permutation based diffusion layers are
more suitable for very compact hardware implementations than for efficient implementations
(mostly in software, but also in hardware).
A historical remark: Kam-Davida constructions have been used repeatedly to build example
ciphers for the purposes of explaining cryptanalysis, for instance in Howard M. Heys's excellent
tutorial on linear and differential cryptanalysis [Hey02], or to create practical examples of
new attacks in order to analyse them, as in [Lea10, BG10, AL12]. However, this SPN is almost
always used or rediscovered without attribution.
1.8.2 Shuffles
The shuffle of the branches of a (generalised) Feistel network is usually just a simple circular
rotation of the branches of the state. A shuffle is therefore just a particular bit permutation.
In the SAFER cipher family (Section 3.8 on page 150) diffusion is achieved by alternating a
layer of Pseudo-Hadamard Transforms with a bundle shuffle; the latter is carefully chosen to
guarantee that after three such layers a change is diffused to all bundles (with some exceptions
that we discussed earlier).
Taking a cue from Kaisa Nyberg's work on GFNs [Nyb96], which includes, for instance, the design
depicted in Figure 1.4 on page 31, Subfigure 6 (b), Tomoyasu Suzaki and Kazuhiko Minematsu
[SM10] consider shuffles of the branches in GFNs, with the goal of minimising the number
of rounds necessary to achieve full diffusion.
Consequently, they consider the setting where a single shuffle is performed per round, as
opposed to the multi-shuffle design of SAFER. We see immediately that, on the one hand, shuffles
are just bit permutations, but on the other hand they are compatible with the partitioning of the
state into branches or bundles. This has two important consequences: all the output bits of an
S-box (or of a Feistel target) influence the input to just one bundle or branch in the following
round; and this influence can be intuitively controlled better, since we do not have to take into
account the influence of single unchanged output bits.
Let $\pi$ be a shuffle of the branches of the Feistel network, where the branches are identified
with the corresponding index set (for a $k$-branch GFN this set would be $[0..k-1]$). Define
the quantity $\mathrm{DR}_i(\pi)$ as the minimum number of rounds necessary to diffuse the $i$-th
sub input block of the first round, $x_i^{(0)}$, to all sub output blocks (of the
$\mathrm{DR}_i(\pi)$-th round), and $\mathrm{DRmax}(\pi)$ to be the maximum of all such
$\mathrm{DR}_i(\pi)$. For a $k$-branch Type 2 GFS it is $\pi(i) = (i+1) \bmod k$ and it is easy
to see that $\mathrm{DRmax}(\pi) = k$. However, for a different shuffle $\pi$ of the branches, the
corresponding $\mathrm{DRmax}(\pi)$ can be different.
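Now, let us first define
$$\mathrm{DRmax}^{\pm}(\pi) := \max\{\mathrm{DRmax}(\pi),\ \mathrm{DRmax}(\pi^{-1})\}$$
(so that both encryption and decryption are taken into consideration) and
$$\mathrm{DRmax}^{*}_{k} := \min\{\mathrm{DRmax}^{\pm}(\pi) : \pi \in \mathfrak{S}([0..k-1])\}$$
where $\mathfrak{S}([0..k-1])$ is the full symmetric group over the set $[0..k-1]$. An exhaustive
search gives $\mathrm{DRmax}^{*}_{4} = 4$, $\mathrm{DRmax}^{*}_{6} = 5$, $\mathrm{DRmax}^{*}_{8} = 6$,
$\mathrm{DRmax}^{*}_{10} = 7$, and $\mathrm{DRmax}^{*}_{k} = 8$ for $k = 12, 14, 16$.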
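The diffusion quantities defined above can be computed by tracking which branches are affected after each round. The sketch below is an illustration under stated assumptions rather than the search procedure of [SM10]: all function names are hypothetical, the round is modelled as a Type 2 GFS step (each even-indexed source branch feeds the odd-indexed target branch next to it) followed by a shuffle that moves branch `i` to position `pi[i]`, every affected F-function is assumed to propagate the difference, and $k$ is assumed to be even.

```python
from itertools import permutations


def diffusion_rounds(pi, start):
    """Rounds until branch `start` influences all branches (None if it never does)."""
    k = len(pi)
    affected = frozenset([start])
    seen = set()
    rounds = 0
    while len(affected) < k:
        if affected in seen:             # the set of affected branches cycles
            return None
        seen.add(affected)
        # F layer: every affected even (source) branch taints its odd neighbour.
        after_f = affected | {i + 1 for i in affected if i % 2 == 0}
        # Shuffle: branch i moves to position pi[i].
        affected = frozenset(pi[i] for i in after_f)
        rounds += 1
    return rounds


def drmax(pi):
    """Maximum of diffusion_rounds over all starting branches."""
    rounds = [diffusion_rounds(pi, i) for i in range(len(pi))]
    return None if None in rounds else max(rounds)


def inverse(pi):
    inv = [0] * len(pi)
    for i, p in enumerate(pi):
        inv[p] = i
    return inv


def drmax_pm(pi):
    """Maximum of drmax for the shuffle and for its inverse."""
    a, b = drmax(pi), drmax(inverse(pi))
    return None if a is None or b is None else max(a, b)


def best_drmax_pm(k):
    """Exhaustive minimum of drmax_pm over all shuffles; only feasible for tiny k."""
    values = (drmax_pm(list(p)) for p in permutations(range(k)))
    return min(v for v in values if v is not None)


def cyclic_shift(k):
    return [(i + 1) % k for i in range(k)]


if __name__ == "__main__":
    for k in (4, 6, 8):
        print(k, drmax(cyclic_shift(k)))   # the cyclic shift needs k rounds
    print(best_drmax_pm(4))                # -> 4
```

In this model the cyclic shift reproduces $\mathrm{DRmax}(\pi) = k$, and the exhaustive minimum for $k = 4$ agrees with the value quoted in the text. The factorial growth of the search space makes `best_drmax_pm` impractical beyond very small $k$.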
Suzaki and Minematsu then searched for optimal block shuffles, and for their optimal $\pi^{*}_{k}$
$$\mathrm{DRmax}(\pi^{*}_{k}) = \mathrm{DRmax}((\pi^{*}_{k})^{-1})$$
holds true. Interestingly, for $k = 8$ a permutation $\pi$ was found such that
$\mathrm{DRmax}(\pi) = 5$ and $\mathrm{DRmax}(\pi^{-1}) = 7$, which is not optimal w.r.t. the above
definition of $\mathrm{DRmax}^{*}_{k}$. A cipher designed around that permutation could have
decryption easier to analyze than encryption! This is an uncommon occurrence, but, for instance,
FROG (Subsection 3.19.3 on page 180) is such a cipher.
All optimum block shuffles $\pi^{*}_{k}$ found by Suzaki and Minematsu also satisfy the property
that any even-indexed input block is mapped to an odd-indexed output block and vice versa, so
that the output of a target branch is permuted to the input of a source branch. Such shuffles are
called even-odd shuffles.
A lower bound for $\mathrm{DRmax}^{*}$ for even-odd shuffles can be derived as follows. For a fixed
one-block input difference, let $n^O_i$, resp. $n^E_i$, be the number of odd-numbered, resp.
even-numbered, sub blocks in the $i$-th round output affected by that input block. Initially one
of $n^E_0$ and $n^O_0$ is $0$ and the other one is $1$. Assuming that the shuffle works ideally,
we have $n^E_i = n^E_{i-1} + n^O_{i-1}$ and $n^O_i = n^E_{i-1}$, and from this we see that
$n^E_i = n^E_{i-1} + n^E_{i-2}$ holds. Hence $\{n^E_i\}_i$ is a Fibonacci sequence. For a GFS with
an even-odd shuffle, if a certain number of rounds is sufficient to achieve the diffusion to all
even output blocks, the full diffusion is achieved by one more round. Therefore, if $i$ is the
smallest integer that satisfies $n^E_i \geq k/2$, then $i+1$ is the lower bound for
$\mathrm{DRmax}^{*}$ for all even-odd shuffles of $k$ blocks (not necessarily achievable). The
sequence $\{n^E_i\}_i$ takes its lowest values with $n^E_0 = 0$ and $n^O_0 = 1$, and then gives the
Fibonacci numbers. Hence $n^E_i \approx \varphi^i/\sqrt{5}$, where $\varphi$ is the golden ratio,
and the lower bound for $\mathrm{DRmax}^{*}$ is roughly
$\log_\varphi(\sqrt{5}\,k/2) \approx 1.44 \log_2 k$ (since $1/\log_2 \varphi \approx 1.44$). The
optimal results mentioned above for even $k$, $4 \leq k \leq 16$, are very close to this bound.
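The Fibonacci argument can be turned into a few lines of code that evaluate the bound for concrete branch counts. This is a minimal sketch; the function name `even_odd_lower_bound` is hypothetical, and the slowest starting case (no affected even block, one affected odd block) is assumed, as in the derivation above.

```python
def even_odd_lower_bound(k):
    """Lower bound on the rounds needed for full diffusion with an even-odd shuffle."""
    n_even, n_odd = 0, 1        # slowest start: the difference sits in one odd block
    i = 0
    while 2 * n_even < k:       # stop at the first i with n_even >= k/2
        # Ideal shuffle: evens gain all affected blocks, odds inherit the evens.
        n_even, n_odd = n_even + n_odd, n_even
        i += 1
    return i + 1                # one more round also reaches all odd blocks


if __name__ == "__main__":
    for k in (4, 6, 8, 10, 12, 14, 16):
        print(k, even_odd_lower_bound(k))
```

For $k = 4, 6, 8$ the bound evaluates to $4, 5, 6$, which coincides with the exhaustive search results quoted earlier; for $k = 10, \dots, 16$ it gives $6, 7, 7, 7$, one round below the values found by search.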
In [SM10] Suzaki and Minematsu show how to use colored de Bruijn graphs to build a block
shuffle for $k = 2^{s+1}$ (for any $s \geq 2$) whose $\mathrm{DRmax}^{*}$ is at most
$2s + 2 = 2 \log_2 k$. This is quite close to the lower bound of roughly $1.44 \log_2 k$ just
derived. For the details of the construction we refer to the paper. The important remark here is
that this gives an upper bound that proves the logarithmic growth of $\mathrm{DRmax}^{*}_{k}$.
The authors also compare their results to those of James Massey for the branch permutation
used for diffusion in the SAFER family, in particular to the Armenian Shuffle used in SAFER+
(cf. Subsection 3.8.3 on page 152). Even though the Armenian Shuffle is also based on a de
Bruijn graph, it is not an even-odd shuffle; but this is not a problem for SAFER+, since it is a
bricklayer cipher, not a generalised Feistel.
It is still an open question whether better shuffle families can be found, i.e. with a smaller
$\mathrm{DRmax}^{*}$, or how this can be achieved by mixing different types of shuffles.
1.8.3 Diffusion Layers Based on Linear Algebra
For simplicity let us assume that the words of the state are elements of a module $V$ over a ring
$R$ ($V$ can be the ring itself) so that its elements can be added to each other and multiplied by
elements of $R$ (scalars). More generally, we can consider modules over rings instead. The state
is thus just an $n$-tuple $v$ of elements of $V$ and we consider the following type of
transformation: $v$ is multiplied by an $n \times n$ matrix $M$ over $R$. Multiplication by $M$
should be invertible, at least in the case where the diffusion is used directly in a "classic" SPN.
Note that the diffusion layer can sometimes be described as a matrix even when the operation
is specified in a different way. For instance, in SAFER (Section 3.8 on page 150) the diffusion
layer (which can be seen in Figure 3.14 on page 151) is constructed from simpler operations over
the ring $R := \mathbb{Z}/256\mathbb{Z}$ of integers modulo 256, but a matrix representation is
clearly possible.
SAFER is perhaps the oldest cipher to use a linear diffusion layer in place of a permutation of
the bits of the state to achieve diffusion, so we want to have a closer look at its design. The
structure of the diffusion layer can be written as TPTPT, where P means permutation (of the
bundles) and T is the layer of pseudo-Hadamard transforms (PHT). T transforms all pairs of
adjacent bundles $x_i, x_{i+1}$ with $i$ even using the following PHT:
$$\begin{pmatrix} x'_i \\ x'_{i+1} \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} x_i \\ x_{i+1} \end{pmatrix} \pmod{256} .$$
Hence, the T layer is represented by the $8 \times 8$ block diagonal matrix $A$ with the matrix
$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ on the diagonal four times. The permutation of the
bundles is called a shuffle. It is the permutation
$$(0\ 2\ 4\ 6\ 1\ 3\ 5\ 7)$$
and corresponds to the matrix
$$B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \pmod{256} .$$
Hence, the matrix over $R$ representing the entire diffusion layer of SAFER is
$$M = A \cdot B \cdot A \cdot B \cdot A = \begin{pmatrix}
8 & 4 & 4 & 2 & 4 & 2 & 2 & 1 \\
4 & 2 & 2 & 1 & 4 & 2 & 2 & 1 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 \\
4 & 2 & 4 & 2 & 2 & 1 & 2 & 1 \\
2 & 1 & 2 & 1 & 2 & 1 & 2 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \pmod{256} .$$
Let us now consider the effect of this matrix on the state, interpreted as a vector of length eight
over the ring $R$. Since the operator defined by $M$ is linear, in order to determine how
differences in the input propagate it suffices to consider the differences as relative to the zero
vector, i.e. to study the images of individual vectors. Most vectors $v$ of weight one are mapped
to vectors $M \cdot v$ of weight eight, but not all: for instance,
$v = (32\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ is mapped to a vector
$M \cdot v = (0\ 128\ 128\ 64\ 128\ 64\ 64\ 32)^t$ of weight seven, and the image of
$v' = (128\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^t$ has weight just one:
$M \cdot v' = (0\ 0\ 0\ 0\ 0\ 0\ 0\ 128)^t$. Intuitively, this means that some inputs or some
differences do not diffuse well through the layer. Catastrophic consequences are in fact avoided
only by the fact that SAFER uses good S-boxes and the fact that the single bundle difference on
the eighth vector element will completely diffuse in the following round.
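The matrix product and the two low-weight images can be reproduced with a few lines of plain Python. This is a sketch for checking the arithmetic, not an implementation of SAFER; the helper names `matmul`, `apply` and `weight` are hypothetical, and vectors are treated as columns, so that the state is multiplied from the left by the matrix.

```python
MOD = 256


def matmul(X, Y):
    """Product of two square matrices over Z/256Z."""
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) % MOD for c in range(n)]
            for r in range(n)]


def apply(M, v):
    """Image of the column vector v under M over Z/256Z."""
    return [sum(m * x for m, x in zip(row, v)) % MOD for row in M]


def weight(v):
    """Number of non-zero bundles."""
    return sum(1 for x in v if x != 0)


# T layer: four copies of the PHT matrix (2 1; 1 1) on the diagonal.
A = [[0] * 8 for _ in range(8)]
for b in range(0, 8, 2):
    A[b][b], A[b][b + 1] = 2, 1
    A[b + 1][b], A[b + 1][b + 1] = 1, 1

# P layer: row r of B selects bundle SHUFFLE[r].
SHUFFLE = [0, 2, 4, 6, 1, 3, 5, 7]
B = [[1 if c == SHUFFLE[r] else 0 for c in range(8)] for r in range(8)]

# Whole diffusion layer TPTPT.
M = matmul(A, matmul(B, matmul(A, matmul(B, A))))

if __name__ == "__main__":
    for row in M:
        print(row)
    for first in (1, 32, 128):
        v = [first] + [0] * 7
        image = apply(M, v)
        print(first, image, weight(image))
```

Scaling the first bundle by 32 wipes out the entry multiplied by 8, and scaling it by 128 wipes out every entry multiplied by an even coefficient, which is exactly the loss of weight described in the text.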
1.8.3.1 Multipermutations and MDS Matrices
The obvious question is: what are the matrices that guarantee the most complete diffusion? The
question is somewhat ill-posed because a desirable property of any component of a block cipher
is its fast evaluation. Hence, a good diffusion matrix must strike the right balance between good
diffusion and fast evaluation: a less perfect but much faster diffusion layer could still lead to a
cipher that is faster and not less secure than another cipher making use of an ideal, but slower,
diffusion layer. Also, the question of performance is per se difficult to formalise: for instance, a
sparse matrix is not necessarily good if some entries represent elements which are expensive
to multiply with.
This said, the first problem remains that of measuring the quality of diffusion and determining
optimal matrices; performance considerations, including compromises, come later.
To address this first problem, Serge Vaudenay suggested [Vau94] (generalising previous work
by himself and Claus-Peter Schnorr [SV94]) to use multipermutations: Given an alphabet
$\mathcal{M}$ and integers $s$, $n$, an $(s, n)$-multipermutation over the alphabet $\mathcal{M}$
is a function $f$ from $\mathcal{M}^s$ to $\mathcal{M}^n$ such that two different $(s+n)$-tuples of
the form $(x, f(x))$ cannot collide in any $s$ positions. Vaudenay in particular first observed
that the PHT in SAFER (and hence the whole diffusion layer) is not a multipermutation.
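For small alphabets the multipermutation property can be tested directly. Two different tuples of the form $(x, f(x))$ collide in $s$ positions exactly when the projection onto those positions is not injective in $x$, which is what the sketch below checks. The names `is_multipermutation` and `pht` are hypothetical, and the comparison modulus 257 is an assumption chosen only because it is a prime close to 256.

```python
from itertools import combinations, product


def is_multipermutation(f, alphabet, s, n):
    """Check the (s, n)-multipermutation property of f by exhaustive search."""
    for positions in combinations(range(s + n), s):
        seen = set()
        for x in product(alphabet, repeat=s):
            full = tuple(x) + tuple(f(x))
            key = tuple(full[p] for p in positions)
            if key in seen:              # two different inputs agree on s positions
                return False
            seen.add(key)
    return True


def pht(modulus):
    """Pseudo-Hadamard transform (a, b) -> (2a + b, a + b) over Z/modulus."""
    def f(x):
        a, b = x
        return ((2 * a + b) % modulus, (a + b) % modulus)
    return f


if __name__ == "__main__":
    print(is_multipermutation(pht(256), range(256), 2, 2))   # modulus used by SAFER
    print(is_multipermutation(pht(257), range(257), 2, 2))   # prime modulus
```

Over $\mathbb{Z}/256\mathbb{Z}$ the inputs $(0, 0)$ and $(128, 0)$ give tuples that agree in the second input and in the first output, because $2 \cdot 128 \equiv 0$. Over the prime field with 257 elements the same map passes the test, since all coefficients and the determinant are invertible there.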
To construct multipermutations, if the alphabet is representable as a finite field, he suggested to
use (the redundancy part of) MDS matrices, i.e. matrices of MDS (maximum distance separable)
codes, which are the codes that reach the Singleton bound. In other words, an $m \times n$ matrix
$A$ over a finite field $\mathbb{F}$ is an MDS matrix if it is the transformation matrix of a
linear transformation $f : \mathbb{F}^n \to \mathbb{F}^m$, $x \mapsto Ax$, with the following
property: if $x$ and $x^*$ differ in exactly $t \geq 1$ components, then $f(x)$ and $f(x^*)$ must
differ in at least $m - t + 1$ components. The latter property is called perfect diffusion.
Vaudenay also showed how to exploit imperfect diffusion for cryptanalysis (as in the case of
reduced rounds of SAFER with suboptimal S-boxes, cf. Section 3.8 on page 150).
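Because the map is linear, the perfect diffusion property only has to be checked for differences: a non-zero difference with $t$ non-zero components must have an image with at least $m - t + 1$ non-zero components. The following sketch does this by brute force. It is restricted to prime fields $\mathbb{Z}/p\mathbb{Z}$ to keep the arithmetic trivial, the name `has_perfect_diffusion` is hypothetical, and the two $2 \times 2$ matrices are illustrative inputs only.

```python
from itertools import product


def weight(v):
    """Number of non-zero components."""
    return sum(1 for x in v if x != 0)


def has_perfect_diffusion(A, p):
    """Brute-force MDS test for an m x n matrix A over the prime field Z/pZ."""
    m, n = len(A), len(A[0])
    for d in product(range(p), repeat=n):        # d plays the role of x - x*
        t = weight(d)
        if t == 0:
            continue
        image = [sum(a * x for a, x in zip(row, d)) % p for row in A]
        if weight(image) < m - t + 1:
            return False
    return True


if __name__ == "__main__":
    print(has_perfect_diffusion([[1, 1], [1, 2]], 5))   # every square submatrix is invertible
    print(has_perfect_diffusion([[1, 1], [1, 1]], 5))   # singular: (1, 4) maps to (0, 0)
```

The search visits all $p^n$ differences, so it is only meant for toy parameters.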
Now, to see why this is optimal and indeed a desirable cryptographic property, let us assume
$m = n$ and consider first the case of a single changed input word. Then the change should spread
to all outputs, a property that, as we have seen at the beginning of this section, is not satisfied
by the SAFER diffusion layer. If we now change two words, we may always choose them so
that one of the outputs of the linear transformation is equal to the corresponding input (this is
a simple linear algebra exercise), so we cannot do better than requiring that at least $n - 1$ inputs