Diffusion - A Salad of Block Ciphers

The state of an SPN can usually be viewed as an array of smaller words or bundles. It is these bundles that are usually transformed by some non-linear operation such as S-boxes, then combined with a round key, and permuted among each other or, if they are interpreted as elements of some algebraic structure (for instance as a vector space over a finite field), even composed with each other using some kind of linear operation.

Suppose an attacker changes a bit in the input. The corresponding S-box ideally changes on average half of its output bits – this means that, besides confusion, an S-box already performs a limited diffusion process. These altered bits are then further diffused by the diffusion layer and thus affect the input to one or more S-boxes in the following round. Ideally, even more S-boxes are affected in the round after that, and so on. The affected S-boxes are called active S-boxes and, intuitively, to increase both efficiency and security it is desirable that the number of active S-boxes becomes maximal after as few rounds as possible. In other words, we want to attain good diffusion. There are different techniques to achieve it, and we shall present the most significant ones.
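The claim that an S-box already diffuses a single-bit change over roughly half of its output bits can be measured directly. The following is a minimal sketch, not taken from the text: the function name `average_flipped_bits` is hypothetical, and the table is the 4-bit PRESENT S-box as published in the cipher's specification (an assumption brought in here purely as a convenient test object).

```python
# 4-bit S-box of PRESENT (values from the public specification; assumed here
# only as an example input, it is not defined in this section).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def average_flipped_bits(sbox, n):
    """Average number of output bits that change when one input bit is flipped.

    The average is taken over all 2^n inputs and all n single-bit flips.
    """
    total = 0
    for x in range(1 << n):
        for i in range(n):
            diff = sbox[x] ^ sbox[x ^ (1 << i)]
            total += bin(diff).count("1")  # Hamming weight of the output difference
    return total / (n * (1 << n))


if __name__ == "__main__":
    avg = average_flipped_bits(PRESENT_SBOX, 4)
    print(f"average flipped output bits: {avg} of 4")
```

The same function can be applied to any bijective lookup table of size 2^n; a value near n/2 corresponds to the "half of its bits" behaviour described above.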

1.8.1 Bit Permutations

Several recent ciphers, such as PRESENT (Section 3.29 on page 206) and PRINTcipher (Section 3.32 on page 211), use a simple bit permutation of the S-box outputs to achieve diffusion. Therefore additional conditions are placed on the S-boxes to improve the avalanche effect. (For instance, single input differences should not trigger the same single input difference in another S-box in the next round.)

However, these ideas were developed very early in the history of block ciphers, and bit permutations were commonplace in early designs, such as the Data Encryption Standard (Section 3.2 on page 129) and GOST 28147-89 (Section 3.4 on page 140).

One of the earliest treatments of bit permutation networks in SPNs is due to John Kam and George Davida [KD79], who introduce the notion of completeness: A bijective function $f : \{0,1\}^n \to \{0,1\}^n$ (for instance an S-box) is said to be complete if, for every $i, j \in [0..n-1]$, there exist two $n$-bit vectors $x_1, x_2$ such that $x_1$ and $x_2$ differ only in the $i$-th bit and $f(x_1)$ differs from $f(x_2)$ at least in the $j$-th bit. Similarly, a keyed cipher is said to be complete if it is a complete function for all keys. (It is easily seen that the Strict Avalanche Criterion is a strengthening of this property.)
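The completeness condition can be tested exhaustively for small functions. The sketch below is an illustration under stated assumptions: `is_complete` is a hypothetical helper name, bit 0 is taken to be the least significant bit, and the PRESENT S-box (from the public specification, not from this section) serves as the example.

```python
# 4-bit S-box of PRESENT, used only as an example (assumption: table taken
# from the public specification).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def is_complete(table, n):
    """Check completeness of an n-bit function given as a lookup table.

    For every input bit i we OR together the output differences
    table[x] ^ table[x with bit i flipped] over all x. The function is
    complete iff, for every i, every output bit j shows up in that OR,
    i.e. some pair differing only in bit i produces a change in bit j.
    """
    full_mask = (1 << n) - 1
    for i in range(n):
        reachable = 0
        for x in range(1 << n):
            reachable |= table[x] ^ table[x ^ (1 << i)]
        if reachable != full_mask:
            return False
    return True


if __name__ == "__main__":
    print("PRESENT S-box complete:", is_complete(PRESENT_SBOX, 4))
    print("identity complete:     ", is_complete(list(range(16)), 4))
```

The identity map is the obvious counterexample: flipping input bit i only ever changes output bit i, so it fails the test for every pair with i ≠ j.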

Kam and Davida also show how to construct complete ciphers. The fundamental idea consists in alternating substitution layers, where the S-boxes are assumed to be complete themselves, with specially defined wired permutation layers. The output bits of each S-box in a layer should be wired to input bits of distinct S-boxes at the next layer (distinct as nodes of the network, but not necessarily different as functions). The corresponding graph, with the S-boxes as nodes and the wirings as edges directed from an S-box in a layer to an S-box in the next layer, should connect each input bit of the SPN to each output bit through exactly one path; in other words, it should be a polytree, i.e. be acyclic. Usually the S-boxes are either keyed or the key is mixed in before the S-box, one key bit per input bit.

Each SPN output bit may thus be viewed as a tree function of the SPN input bits, where each tree function is composed of S-boxes. Similarly, the network inputs may be viewed as tree functions of the network output bits. Therefore, several authors follow Howard M. Heys and Stafford E. Tavares in [HT93, HT95] and refer to SPNs constructed using the Kam and Davida methodology as tree-structured SPNs or TS-SPNs.

Kam and Davida provide a concrete construction method, by means of which a complete SPN with block size of $n = m^t$ bits can be built using $t$ substitution layers and $t - 1$ bit permutation layers. Examples for $(m, t) = (4, 2)$ and $(3, 3)$ are shown in Figure 1.7. However, there is a problem with their construction: it can be shown that if a single input bit is changed, the probability of an output bit changing will always be $2^{-t}$. This was shown in [Web85] and termed avalanche damping (see also [WT85]).

[Figure 1.7: Complete Tree-Structure SPNs Following the Kam-Davida Construction. Two networks are shown: one with two layers of four S-boxes ($S_0, \dots, S_3$ and $S'_0, \dots, S'_3$) and one with three layers of nine S-boxes ($S_0, \dots, S_8$, $S'_0, \dots, S'_8$ and $S''_0, \dots, S''_8$).]
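To make the construction concrete, here is a small sketch of an unkeyed tree-structured SPN for $(m, t) = (4, 2)$, i.e. 16 bits, two substitution layers and one wiring layer, together with an exhaustive completeness check. Several things are assumptions made for illustration only: the wiring sends output bit j of S-box a to input bit a of S-box j (one simple wiring in which every first-layer S-box feeds every second-layer S-box exactly once, not necessarily the wiring drawn in Figure 1.7), every S-box is the PRESENT S-box from the public specification, and all function names are hypothetical.

```python
# Example S-box (PRESENT, from the public specification; an assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
M = 4          # S-box width in bits
N = M * M      # block size m^t with t = 2


def sub_layer(state):
    """Apply the S-box to each of the four 4-bit bundles of a 16-bit state."""
    out = 0
    for a in range(M):
        out |= SBOX[(state >> (M * a)) & 0xF] << (M * a)
    return out


def wire(state):
    """Bit permutation: output bit j of S-box a goes to input bit a of S-box j."""
    out = 0
    for a in range(M):
        for j in range(M):
            if (state >> (M * a + j)) & 1:
                out |= 1 << (M * j + a)
    return out


def ts_spn(x):
    """Two substitution layers separated by one wiring layer (no key mixing)."""
    return sub_layer(wire(sub_layer(x)))


def is_complete(f, n):
    """Exhaustive completeness test for an n-bit function f."""
    table = [f(x) for x in range(1 << n)]
    full_mask = (1 << n) - 1
    for i in range(n):
        reachable = 0
        for x in range(1 << n):
            reachable |= table[x] ^ table[x ^ (1 << i)]
        if reachable != full_mask:
            return False
    return True


if __name__ == "__main__":
    print("one substitution layer complete:", is_complete(sub_layer, N))
    print("two-layer TS-SPN complete:      ", is_complete(ts_spn, N))
```

A single substitution layer cannot be complete, since a flipped bit never leaves its own bundle; adding the wiring and a second layer of complete S-boxes is what makes every output bit reachable from every input bit.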

John Kam and George Davida obtained US Patent 4,275,265 on their design in 1981. A similar idea, where pairs of bits are permuted as bundles, but otherwise in an entirely analogous way, is described in Thomson-CSF’s US Patent 4,751,733 (filed in 1986, lapsed).

It is these ideas that influenced the wiring used in PRESENT (where completeness is achieved over two rounds, as can be seen from Figure 3.34 on page 206) and, to a lesser extent, PRINTcipher. The design of composite S-boxes such as those of Khazad (Subsection 3.21.2 on page 190) and Anubis (Subsection 3.21.3 on page 192), or of ICEBERG (Section 3.24 on page 197) is, on the other hand, more similar to the scheme of the Thomson-CSF patent.

Simple bit permutations are particularly efficient in hardware, as they amount to just wiring. Some exceptional cases may also be implemented efficiently in software. However, they suffer from the problem that an output bit may influence at most a single input bit of the next layer of the cipher; diffusion is therefore slower, and more rounds may be necessary, than with more sophisticated linear diffusion layers. As a consequence, bit permutation based diffusion layers are more suitable to very compact hardware implementations than to efficient implementations (mostly in software, but also in hardware).

A historical remark: Kam-Davida constructions have been used repeatedly to build example ciphers for the purposes of explaining cryptanalysis, for instance in Howard M. Heys’s excellent tutorial on linear and differential cryptanalysis [Hey02], or to create practical examples of new attacks in order to analyse them, as in [Lea10, BG10, AL12]. However, this SPN is almost always used or rediscovered without attribution.

1.8.2 Shuffles

The shuffle of the branches of a (generalised) Feistel network is usually just a simple circular rotation of the branches of the state. A shuffle is therefore just a particular bit permutation. In the SAFER cipher family (Section 3.8 on page 150) diffusion is achieved by alternating a layer of Pseudo-Hadamard Transforms with a bundle shuffle; the latter is carefully chosen to guarantee that after three such layers a change is diffused to all bundles (with some exceptions that we discussed earlier).

Taking a cue from Kaisa Nyberg’s work on GFNs [Nyb96], which includes, for instance, the design depicted in Figure 1.4 on page 31, Subfigure 6 (b), Tomoyasu Suzaki and Kazuhiko Minematsu in [SM10] consider shuffles of the branches in GFNs, with the goal of minimising the number of rounds necessary to achieve full diffusion.

Consequently, they consider the setting where a single shuffle is performed per round – as opposed to the multi-shuffle design of SAFER. We see immediately that, on the one hand, shuffles are just bit permutations, but on the other hand they are compatible with the partitioning of the state into branches or bundles. This has two important consequences: all the output bits of an S-box (or of a Feistel target) influence the input to just one bundle or branch in the following round; and this influence can intuitively be controlled better, since we do not have to take into account the influence of single unchanged output bits.

Let $\pi$ be a shuffle of the branches of the Feistel network, where the branches are identified with the corresponding index set (for a $k$-branch GFN this set would be $[0..k-1]$). Define the quantity $\mathrm{DR}_\pi(j)$ as the minimum number of rounds necessary to diffuse the $j$-th sub input block of the first round, $x_0^{(j)}$, to all sub output blocks (of the $\mathrm{DR}_\pi(j)$-th round), and $\mathrm{DRmax}(\pi)$ to be the maximum of all such $\mathrm{DR}_\pi(j)$. For a $k$-branch Type 2 GFS it is $\pi(j) = (j + 1) \bmod k$ and it is easy to see that $\mathrm{DRmax}(\pi) = k$. However, for a different shuffle $\pi$ of the branches, the corresponding $\mathrm{DRmax}(\pi)$ can be different.

Now, let us first define

$$\mathrm{DRmax}^{\pm}(\pi) := \max\{\mathrm{DRmax}(\pi),\ \mathrm{DRmax}(\pi^{-1})\}$$

(so that both encryption and decryption are taken into consideration) and

$$\mathrm{DRmax}^{*}_{k} := \min\{\mathrm{DRmax}^{\pm}(\pi) : \pi \in \Sigma([0..k-1])\}$$

where $\Sigma([0..k-1])$ is the full symmetric group over the set $[0..k-1]$. An exhaustive search gives $\mathrm{DRmax}^{*}_{4} = 4$, $\mathrm{DRmax}^{*}_{6} = 5$, $\mathrm{DRmax}^{*}_{8} = 6$, $\mathrm{DRmax}^{*}_{10} = 7$, and $\mathrm{DRmax}^{*}_{k} = 8$ for $k = 12, 14, 16$.
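These quantities are easy to compute with a simple dependency-tracking model. The sketch below rests on assumptions that are not spelled out in the text: a Type 2 round is modelled as "every even-indexed branch feeds the F-function whose output is added to the next odd-indexed branch", followed by the shuffle moving branch j to position pi[j]; only the set of affected branches is tracked, so cancellations are ignored. All function names are hypothetical.

```python
from itertools import permutations
from math import inf


def diffusion_rounds(pi, j):
    """DR_pi(j): rounds until a change in input branch j reaches all branches.

    Returns math.inf if full diffusion is never reached (detected when the
    set of affected branches repeats).
    """
    k = len(pi)
    full = frozenset(range(k))
    affected = frozenset([j])
    seen = set()
    rounds = 0
    while affected != full:
        if affected in seen:
            return inf
        seen.add(affected)
        after_f = set(affected)
        for b in affected:
            if b % 2 == 0:          # even branches are F sources,
                after_f.add(b + 1)  # the next odd branch is the target
        affected = frozenset(pi[b] for b in after_f)  # branch b moves to pi[b]
        rounds += 1
    return rounds


def dr_max(pi):
    return max(diffusion_rounds(pi, j) for j in range(len(pi)))


def inverse(pi):
    inv = [0] * len(pi)
    for j, target in enumerate(pi):
        inv[target] = j
    return inv


def dr_max_pm(pi):
    """DRmax^+-: worst case over encryption (pi) and decryption (pi^-1)."""
    return max(dr_max(pi), dr_max(inverse(pi)))


def dr_max_star(k):
    """Exhaustive minimum of DRmax^+- over all k! shuffles (small k only)."""
    return min(dr_max_pm(list(p)) for p in permutations(range(k)))


if __name__ == "__main__":
    for k in (4, 6):
        rotation = [(j + 1) % k for j in range(k)]
        print(k, "cyclic rotation DRmax =", dr_max(rotation))
    print("DRmax*_4 =", dr_max_star(4))
```

The search space grows as k!, so this brute-force loop is only practical for small k; the larger values quoted above come from the search reported in the source, not from this sketch.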

Suzaki and Minematsu then searched for optimal block shuffles, and for their optimal $\pi^{*}_{k}$

$$\mathrm{DRmax}(\pi^{*}_{k}) = \mathrm{DRmax}((\pi^{*}_{k})^{-1})$$

holds true. Interestingly, for $k = 8$ a permutation $\pi$ was found such that $\mathrm{DRmax}(\pi) = 5$ and $\mathrm{DRmax}(\pi^{-1}) = 7$, which is not optimal w.r.t. the above definition of $\mathrm{DRmax}^{*}_{k}$. A cipher designed around that permutation could have decryption easier to analyze than encryption! This is an uncommon occurrence, but, for instance, FROG (Subsection 3.19.3 on page 180) is such a cipher.

All optimum block shuffles $\pi^{*}_{k}$ found by Suzaki and Minematsu also satisfy the property that any even-indexed input block is mapped to an odd-indexed output block and vice versa – so that the output of a target branch is permuted to the input of a source branch. Such shuffles are called even-odd shuffles.

A lower bound for $\mathrm{DRmax}^{*}$ for even-odd shuffles can be derived as follows. For a fixed one block input difference, let $N_i^{o}$, resp. $N_i^{e}$, be the number of odd-numbered, resp. even-numbered, sub blocks in the $i$-th round output affected by that input block. Initially we have that one of $N_0^{e}$ and $N_0^{o}$ is 0 and the other one is 1. Assuming that the shuffle works ideally, we have $N_i^{e} = N_{i-1}^{e} + N_{i-1}^{o}$ and $N_i^{o} = N_{i-1}^{e}$, and from this we see that $N_i^{e} = N_{i-1}^{e} + N_{i-2}^{e}$ holds. Hence $\{N_i^{e}\}_i$ is a Fibonacci sequence. For a GFS with an even-odd shuffle, if a certain number of rounds is sufficient to achieve the diffusion to all even output blocks, the full diffusion is achieved by one more round. Therefore, if $i$ is the smallest integer that satisfies $N_i^{e} \geq k/2$, then $i + 1$ is a lower bound for $\mathrm{DRmax}^{*}$ for all even-odd shuffles for $k$ blocks (not necessarily achievable). The sequence $\{N_i^{e}\}_i$ takes its lower values with $N_0^{e} = 0$ and $N_0^{o} = 1$, and then gives the Fibonacci numbers. Hence $N_i^{e} \approx \varphi^{i}/\sqrt{5}$, where $\varphi$ is the golden ratio, and the lower bound for $\mathrm{DRmax}^{*}$ is roughly $\log_{\varphi}(\sqrt{5}\,k/2) \approx 1.44 \log_2 k$. The optimal results mentioned above for even $k$, $4 \leq k \leq 16$, are very close to this bound.
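The Fibonacci argument translates into a few lines of code. The sketch below iterates the recurrence from the worst-case start $N_0^{e} = 0$, $N_0^{o} = 1$ and compares the resulting bound with the exhaustive-search values quoted earlier in this section; the function name and the dictionary `REPORTED` are hypothetical names introduced for this illustration.

```python
def even_odd_lower_bound(k):
    """Lower bound on DRmax for even-odd shuffles of k branches.

    Iterates N_i^e = N_{i-1}^e + N_{i-1}^o, N_i^o = N_{i-1}^e from the
    worst-case start (0, 1) and returns i + 1 for the smallest i with
    N_i^e >= k/2.
    """
    n_even, n_odd = 0, 1
    i = 0
    while 2 * n_even < k:
        n_even, n_odd = n_even + n_odd, n_even
        i += 1
    return i + 1


# Values of DRmax*_k reported in the text for even k between 4 and 16.
REPORTED = {4: 4, 6: 5, 8: 6, 10: 7, 12: 8, 14: 8, 16: 8}

if __name__ == "__main__":
    for k, optimum in REPORTED.items():
        print(f"k={k:2d}  bound={even_odd_lower_bound(k)}  reported optimum={optimum}")
```

For every listed k the bound is at most one round below the reported optimum, which is what "very close" means here.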

In [SM10] Suzaki and Minematsu show how to use colored de Bruijn graphs to build a block shuffle for $k = 2^{s+1}$ (for any $s \geq 2$) whose $\mathrm{DRmax}$ is at most $2s + 2 = 2 \log_2 k$. This is quite close to the $1.44 \log_2 k$ lower bound just derived. For the details of the construction we refer to the paper. The important remark here is that this gives an upper bound that proves the logarithmic growth of $\mathrm{DRmax}^{*}_{k}$.

The authors also compare their results to those of James Massey for the branch permutation used for diffusion in the SAFER family, in particular to the Armenian Shuffle used in SAFER+ (cf. Subsection 3.8.3 on page 152). Even though the Armenian Shuffle is also based on a de Bruijn graph, it is not an even-odd shuffle – but this is not a problem for SAFER+, since it is a bricklayer cipher, not a generalised Feistel.

It is still an open question whether better shuffle families can be found – i.e. with a smaller $\mathrm{DRmax}$ – or how this can be achieved by mixing different types of shuffles.

1.8.3 Diffusion Layers Based on Linear Algebra

For simplicity let us assume that the words of the state are elements of a module $V$ over a ring $R$ ($V$ can be the ring itself) so that its elements can be added to each other and multiplied by elements of $R$ (scalars). The state is thus just an $n$-tuple $v$ of elements of $V$ and we consider the following type of transformation: $v$ is multiplied by an $n \times n$ matrix $M$ over $R$. Multiplication by $M$ should be invertible, at least in the case where the diffusion is used directly in a “classic” SPN.

Note that the diffusion layer can sometimes be described as a matrix even when this operation is described in a different way. For instance, in SAFER (Section 3.8 on page 150) the diffusion layer (which can be seen in Figure 3.14 on page 151) is constructed from simpler operations over the ring $R := \mathbb{Z}/256\mathbb{Z}$ of integers modulo 256, but a matrix representation is clearly possible.

SAFER is perhaps the oldest cipher to use a linear diffusion layer in place of a permutation of the bits of the state to achieve diffusion, so we want to have a closer look at its design. The structure of the diffusion layer can be written as TPTPT, where P means permutation (of the bundles) and T is the layer of pseudo-Hadamard transforms (PHT). T transforms all pairs of adjacent bundles $x_i, x_{i+1}$ with $i$ even using the following PHT:

$$\begin{pmatrix} x'_i \\ x'_{i+1} \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} x_i \\ x_{i+1} \end{pmatrix} \pmod{256} .$$

Hence, the T layer is represented by the $8 \times 8$ block diagonal matrix $A$ with the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ on the diagonal four times. The permutation of the bundles is called a shuffle. It is the permutation

$$(0\ 2\ 4\ 6\ 1\ 3\ 5\ 7)$$

and corresponds to the matrix

$$B = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \pmod{256} .$$

Hence, the $R$-matrix representing the entire diffusion layer of SAFER is

$$M = A \cdot B \cdot A \cdot B \cdot A = \begin{pmatrix}
8 & 4 & 4 & 2 & 4 & 2 & 2 & 1 \\
4 & 2 & 2 & 1 & 4 & 2 & 2 & 1 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 \\
4 & 2 & 4 & 2 & 2 & 1 & 2 & 1 \\
2 & 1 & 2 & 1 & 2 & 1 & 2 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \pmod{256} .$$

Let us now consider the effect of this matrix on the state, interpreted as a vector of length eight over the ring $R$. Since the operator defined by $M$ is linear, in order to determine how differences in the input propagate it suffices to consider the differences as relative to the zero vector, i.e. to study the images of individual vectors. Most vectors $v$ of weight one are mapped to vectors $v \cdot M$ of weight eight, but not all: for instance, $v = (32\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^{t}$ is mapped to a vector $v \cdot M = (0\ 128\ 128\ 64\ 128\ 64\ 64\ 32)^{t}$ of weight seven, and the image of $v' = (128\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^{t}$ has weight just one: $v' \cdot M = (0\ 0\ 0\ 0\ 0\ 0\ 0\ 128)^{t}$. Intuitively, this means that some inputs or some differences do not diffuse well through the layer. Catastrophic consequences are in fact avoided only by the fact that SAFER uses good S-boxes and the fact that the single bundle difference on the eighth vector element will completely diffuse in the following round.
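The matrix product and the two low-weight images can be reproduced with plain integer arithmetic modulo 256. This is a minimal sketch: it builds $A$ and $B$ as described in this section, treats the state as a row vector multiplied from the left ($v \cdot M$), and uses hypothetical helper names (`mat_mul`, `image`, `weight`).

```python
MOD = 256
PHT = [[2, 1], [1, 1]]
SHUFFLE = [0, 2, 4, 6, 1, 3, 5, 7]  # row r of B has its 1 in column SHUFFLE[r]


def mat_mul(x, y):
    """Product of two matrices (lists of rows) over Z/256Z."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) % MOD
             for j in range(len(y[0]))]
            for i in range(len(x))]


# A: block diagonal matrix with four copies of the PHT matrix.
A = [[0] * 8 for _ in range(8)]
for block in range(0, 8, 2):
    for r in range(2):
        for c in range(2):
            A[block + r][block + c] = PHT[r][c]

# B: permutation matrix of the bundle shuffle.
B = [[1 if SHUFFLE[r] == c else 0 for c in range(8)] for r in range(8)]

# M = A * B * A * B * A, i.e. the TPTPT structure.
M = A
for factor in (B, A, B, A):
    M = mat_mul(M, factor)


def image(v):
    """Image v * M of a row vector v."""
    return mat_mul([v], M)[0]


def weight(v):
    """Number of non-zero bundles."""
    return sum(1 for x in v if x != 0)


if __name__ == "__main__":
    for row in M:
        print(row)
    for first in (1, 32, 128):
        v = [first] + [0] * 7
        print(v, "->", image(v), "weight", weight(image(v)))
```

The loss of weight comes from the even entries in the first row of $M$: multiplying them by 32 or 128 pushes some products to multiples of 256, which vanish in the ring.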

1.8.3.1 Multipermutations and MDS Matrices

The obvious question is: what are the matrices that guarantee the most complete diffusion? The question is somewhat ill posed because a desirable property of any component of a block cipher is its fast evaluation. Hence, a good diffusion matrix must strike the right balance between good diffusion and fast evaluation: a less perfect but much faster diffusion layer could still lead to a cipher that is faster and not less secure than another cipher making use of an ideal, but slower, diffusion layer. The question of performance is also difficult to formalise per se: for instance, a sparse matrix is not necessarily good if some entries represent elements which are expensive to multiply with.

This said, the first problem remains that of measuring the quality of diffusion and determining optimal matrices – performance considerations, including compromises, come later.

To address this first problem, Serge Vaudenay suggested [Vau94] (generalising previous work by himself and Claus-Peter Schnorr [SV94]) to use multipermutations: Given an alphabet $\mathcal{M}$ and integers $n$, $s$, an $(s, n)$-multipermutation over the alphabet $\mathcal{M}$ is a function $f$ from $\mathcal{M}^{s}$ to $\mathcal{M}^{n}$ such that two different $(s + n)$-tuples of the form $(x, f(x))$ cannot collide in any $s$ positions. Serge Vaudenay in particular first observed that the PHT in SAFER (and hence the whole diffusion layer) is not a multipermutation.
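The failure of the PHT to be a $(2, 2)$-multipermutation can be exhibited by a short search. The sketch relies on one observation that is not stated in the text but follows from linearity over $\mathbb{Z}/256\mathbb{Z}$: two tuples $(x, f(x))$ and $(x', f(x'))$ agree exactly in the positions where the difference tuple $(d, f(d))$, with $d = x - x'$, is zero, so it suffices to scan the non-zero differences. The function names are hypothetical.

```python
def pht(x0, x1):
    """2-PHT of SAFER on a pair of bytes: (2*x0 + x1, x0 + x1) mod 256."""
    return ((2 * x0 + x1) % 256, (x0 + x1) % 256)


def colliding_differences():
    """Non-zero input differences d for which (d, pht(d)) has >= 2 zero entries.

    Any such d yields two distinct 4-tuples (x, pht(x)) and (x + d, pht(x + d))
    that agree in at least s = 2 positions, which a (2, 2)-multipermutation
    forbids.
    """
    witnesses = []
    for d0 in range(256):
        for d1 in range(256):
            if (d0, d1) == (0, 0):
                continue
            if ((d0, d1) + pht(d0, d1)).count(0) >= 2:
                witnesses.append((d0, d1))
    return witnesses


if __name__ == "__main__":
    for d in colliding_differences():
        print("difference", d, "-> tuple", d + pht(*d))
```

For example, the inputs $(0, 0)$ and $(128, 0)$ give the tuples $(0, 0, 0, 0)$ and $(128, 0, 0, 128)$, which agree in two positions; the culprit is the coefficient 2, which is not invertible modulo 256.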

To construct multipermutations, if the alphabet is representable as a finite field, he suggested to use (the redundancy part of) MDS matrices, i.e. matrices of MDS (maximum distance separable) codes, which are the codes that reach the Singleton bound. In other words, an $n \times s$ matrix $M$ over a finite field $\mathbb{F}$ is an MDS matrix if it is the transformation matrix of a linear transformation $f : \mathbb{F}^{s} \to \mathbb{F}^{n}$, $x \mapsto Mx$, with the following property: if $x$ and $x^{*}$ differ in exactly $t$ components, then $f(x)$ and $f(x^{*})$ must differ in at least $n - t + 1$ components. The latter property is called perfect diffusion. Vaudenay also showed how to exploit imperfect diffusion for cryptanalysis (as in the case of reduced rounds of SAFER with suboptimal S-boxes, cf. Section 3.8 on page 150).

Now, to see why this is optimal and indeed a desirable cryptographic property, let us assume $s = n$ and consider first the case of a single changed input word. Then the change should spread to all outputs – a property that, as we have seen at the beginning of this section, is not satisfied by the SAFER diffusion layer. If we now change two words, we may always choose them so that one of the outputs of the linear transformation is equal to the corresponding input (this is a simple linear algebra exercise) so we cannot do better than requiring that at least $n - 1$ inputs
