Confusion - A Salad of Block Ciphers

A crucial aspect in the design of a substitution-permutation network (including Feistel ciphers and other types) is the substitution layer: without it, a cipher is in most cases simply a linear operation.

Some ciphers achieve the desired resistance while dispensing with S-boxes altogether. In IDEA (Section 3.6 on page 144) three operations on mutually “incompatible” algebraic structures are combined: multiplication in the multiplicative group of the integers modulo $2^{16}+1 = 65537$, integer addition modulo $2^{16} = 65536$, and bitwise XOR. No two of these three operations satisfy a distributive or associative law with each other. This makes it very difficult to find linear approximations of F-functions obtained by chaining these operations, and it also provides good differential properties. Further examples of such ciphers are RC5 (Section 3.10 on page 158), RC6 (Section 3.15 on page 168), TEA and XTEA (Section 3.12 on page 161), HIGHT (Section 3.27 on page 202), Threefish (Section 3.30 on page 207), and SIMON and SPECK (Section 3.36 on page 222). Much of the work done to assess the strength of ciphers designed around different types of operations is ad hoc. However, Serge Vaudenay’s decorrelation theory, explained in Subsection 1.10.3 on page 64, offers some insights, especially when multiplications are used.
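The incompatibility of the three IDEA operations can be observed numerically. The following Python sketch assumes the 16-bit words on which IDEA operates, with multiplication taken modulo $2^{16}+1$ (the all-zero word standing for $2^{16}$) and addition modulo $2^{16}$. The names `add`, `xor` and `mul` and the sample values are illustrative only.

```python
MASK = 0xFFFF             # 16-bit words
MOD_ADD = 1 << 16         # addition modulus 2^16
MOD_MUL = (1 << 16) + 1   # multiplication modulus 2^16 + 1 (a prime)


def add(a: int, b: int) -> int:
    """Addition modulo 2^16."""
    return (a + b) % MOD_ADD


def xor(a: int, b: int) -> int:
    """Bitwise XOR of two 16-bit words."""
    return a ^ b


def mul(a: int, b: int) -> int:
    """Multiplication modulo 2^16 + 1, where the word 0 encodes 2^16."""
    a = a or MOD_ADD
    b = b or MOD_ADD
    return (a * b % MOD_MUL) & MASK   # the result 2^16 is encoded as 0


a, b, c = 2, 3, 0xFFFF

# Multiplication does not distribute over addition ...
print(mul(a, add(b, c)), add(mul(a, b), mul(a, c)))   # 4 3
# ... nor over XOR ...
print(mul(a, xor(b, c)), xor(mul(a, b), mul(a, c)))   # 65527 65531
# ... and addition and XOR do not associate with each other.
print(add(a, xor(b, c)), xor(add(a, b), c))           # 65534 65530
```

A single counterexample per pair of operations is enough to show that no such law holds in general. It says nothing, of course, about how often the two sides differ.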

Sometimes mutually incompatible algebraic structures are used in conjunction with S-boxes, to increase the nonlinearity provided by the latter. This has been done in GOST (Section 3.4 on page 140), in the SAFER family (Section 3.8 on page 150), Blowfish (Section 3.9 on page 157), the CAST family (Section 3.7 on page 148), and many other ciphers.

In many cases, however, the non-linear components of a block cipher, or at least the most important ones, can be represented as vectorial Boolean functions that map $n$ bits to $m$ bits:

$$F \colon \mathbb{F}_2^n \to \mathbb{F}_2^m, \qquad (x_0, \dots, x_{n-1}) \mapsto (y_0, \dots, y_{m-1}) .$$

Such a function is also called an $(n,m)$-function. It can also be viewed as a vector of $m$ Boolean functions with the same $n$ inputs, which can also be studied independently.

Regardless of how the actual function is implemented, we can consider it as an S-box.

A very thorough discussion of Boolean functions and of vectorial Boolean functions for cryptography can be found in Claude Carlet’s chapters of [CH10], i.e. Chapter 8 [Car10a] and Chapter 9 [Car10b].

In the following sections we shall recall the principal design or selection criteria for S-boxes. To add some confusion (sic) to our treatment, some of these criteria actually deal with diffusion, such as the strict avalanche criterion, but, as we shall see, are ultimately related to confusion properties of the S-box as well.

1.9.1 Properties of a Confusion Function

1.9.1.1 Balancedness

An $(n,m)$-function $F$ is called balanced if every value of $\mathbb{F}_2^m$ is taken by $F$ the same number $2^{n-m}$ of times. In other words, the function is surjective and the distribution of the values is uniform. The balanced $(n,n)$-functions are the permutations of $\mathbb{F}_2^n$.

Balancedness is an important property for nonlinear components of a cipher in order to avoid statistical dependences between plaintext and ciphertext. Per se, balancedness is not a rare property: any surjective affine function (in particular, any nonconstant affine Boolean function) is balanced, but affine S-boxes are useless. On the other hand, it can be tricky to find an S-box that is balanced and also satisfies other desirable cryptographic properties, as we shall see in the following subsections.

An $(n,m)$-function $F$ is balanced if and only if its component functions are balanced, that is, if and only if, for every nonzero $b \in \mathbb{F}_2^m$, the component (Boolean) function

$$F_b \colon x \mapsto \langle b, F(x) \rangle$$

is balanced (i.e. it takes the values $0$ and $1$ each $2^{n-1}$ times). A proof of this result can be found in [LN83] or [Car10b].
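Both characterisations of balancedness are easy to test on a lookup table. The sketch below assumes that an S-box is given as a Python list of $2^n$ integers with $n \geqslant m$. The two 3-bit tables `perm` and `skew` are made-up examples, not taken from any cipher.

```python
from collections import Counter


def is_balanced(sbox, n, m):
    """True if every m-bit value is taken exactly 2^(n-m) times."""
    counts = Counter(sbox)
    return len(sbox) == 1 << n and all(
        counts[y] == 1 << (n - m) for y in range(1 << m)
    )


def dot(u, v):
    """Inner product <u, v> over F_2 of two bit vectors packed in integers."""
    return bin(u & v).count("1") & 1


def components_balanced(sbox, n, m):
    """True if every component x -> <b, F(x)>, b != 0, is 1 exactly 2^(n-1) times."""
    half = 1 << (n - 1)
    return all(
        sum(dot(b, y) for y in sbox) == half for b in range(1, 1 << m)
    )


# Hypothetical 3-bit examples: a permutation and a map that misses the value 7.
perm = [3, 6, 0, 5, 7, 1, 4, 2]
skew = [0, 0, 1, 2, 3, 4, 5, 6]

for name, s in (("perm", perm), ("skew", skew)):
    print(name, is_balanced(s, 3, 3), components_balanced(s, 3, 3))
# perm True True
# skew False False
```

As the equivalence predicts, the two tests agree on both tables.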

1.9.1.2 Algebraic Degree

The S-boxes should be algebraic functions of high degree in order to resist higher-order differential attacks (Subsection 2.1.5 on page 83), which aim at eliminating low-degree functions contributing to confusion in a block cipher, as well as algebraic attacks.

We can write any Boolean function $f \colon \mathbb{F}_2^n \to \mathbb{F}_2$ as a sum of monomials:

$$f(x) = \sum_{u \in \mathbb{F}_2^n} a_u \prod_{i=0}^{n-1} x_i^{u_i} .$$

This is the algebraic normal form (ANF for short, also known as Zhegalkin normal form or Reed-Muller expansion) of the Boolean function $f$. The degree of the function $f$ is the degree of the largest monomial in its ANF. It is a desirable property of an S-box $s$ that any linear combination of its output bits $x \mapsto \langle s(x), a \rangle$ (with $0 \neq a \in \mathbb{F}_2^n$) has as large a degree as possible (or, at least, those linear combinations that are actually used in the cipher).
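The ANF coefficients $a_u$ can be computed from a truth table with the binary Möbius transform. The following is a minimal sketch, assuming truth tables indexed by the integer $x = x_0 + 2x_1 + \dots$ and S-boxes given as lists. The table `CUBE3` (cubing in $\mathbb{F}_{2^3}$ modulo $x^3+x+1$) is just a small example chosen here.

```python
def anf(truth_table):
    """Binary Moebius transform: truth table -> list of ANF coefficients a_u."""
    a = list(truth_table)
    n = len(a).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for x in range(len(a)):
            if x & step:
                a[x] ^= a[x ^ step]
    return a


def algebraic_degree(truth_table):
    """Largest Hamming weight of u with a_u = 1 (-1 for the zero function)."""
    coeffs = anf(truth_table)
    return max((bin(u).count("1") for u, c in enumerate(coeffs) if c), default=-1)


def component_degrees(sbox, m):
    """Degrees of all components x -> <b, S(x)> for nonzero b."""
    def parity(v):
        return bin(v).count("1") & 1
    return {
        b: algebraic_degree([parity(b & y) for y in sbox])
        for b in range(1, 1 << m)
    }


# f(x0, x1, x2) = x0*x1 XOR x2
f = [((x & 1) & (x >> 1 & 1)) ^ (x >> 2 & 1) for x in range(8)]
print(anf(f))               # [0, 0, 0, 1, 1, 0, 0, 0]: monomials x0*x1 and x2
print(algebraic_degree(f))  # 2

CUBE3 = [0, 1, 3, 4, 5, 6, 7, 2]   # x -> x^3 in GF(2^3), modulo x^3 + x + 1
print(component_degrees(CUBE3, 3))
```

Checking every nonzero mask `b`, rather than only the individual output bits, matters because a linear combination of output bits can have lower degree than each of the bits it combines.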

For large $n$, random Boolean functions almost always have algebraic degree at least $n-1$. In fact, the number of Boolean functions of algebraic degree at most $n-2$ equals

$$2^{\sum_{i=0}^{n-2} \binom{n}{i}} = 2^{2^n - n - 1} ,$$

which is a fraction $1/2^{n+1}$ of the set of all $2^{2^n}$ Boolean functions. However, functions of optimal algebraic degree cannot achieve some other desirable characteristics. For instance, we shall see that bent and almost bent functions (defined in Subsection 1.9.1.4 on the facing page) have algebraic degree at most $n/2$ and $(n+1)/2$ respectively.
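For very small $n$ the count of low-degree functions can be confirmed by brute force. The sketch below enumerates all $2^{2^3} = 256$ Boolean functions in $n = 3$ variables. The choice $n = 3$ and all names are ours, and the approach is clearly infeasible beyond $n = 4$.

```python
from itertools import product
from math import comb

n = 3


def algebraic_degree(tt):
    """Degree of the ANF, computed with the binary Moebius transform."""
    a = list(tt)
    for i in range(n):
        for x in range(1 << n):
            if x >> i & 1:
                a[x] ^= a[x ^ (1 << i)]
    return max((bin(u).count("1") for u in range(1 << n) if a[u]), default=-1)


# Count the functions of degree at most n - 2 among all truth tables.
low = sum(
    1 for tt in product((0, 1), repeat=1 << n) if algebraic_degree(tt) <= n - 2
)
total = 1 << (1 << n)
predicted = 1 << sum(comb(n, i) for i in range(n - 1))

print(low, predicted, total // low)   # 16 16 16
```

The last value is $2^{n+1} = 16$, i.e. exactly one function in $2^{n+1}$ has degree at most $n-2$.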

A high algebraic degree is important because lower-degree functions are more easily attacked by means of algebraic attacks (see Section 2.8 on page 123 and in particular Subsection 2.8.4 on page 125) or higher-order differential cryptanalysis (Subsection 2.1.5 on page 83). The use of lower-degree functions can be offset by increasing the complexity of the cipher in other places, for instance by increasing the number of rounds, at the price of worse performance.

The algebraic degree is an affine invariant.

1.9.1.3 Algebraic Immunity

Considering just the degree of a Boolean function is not sufficient, since Boolean functions may have multiples of low degree, and these can be used instead. Such multiples can arise from the way the various bits, and thus their corresponding functions, are combined in the cipher. Hence the concept of algebraic immunity of a Boolean function $f \colon \mathbb{F}_2^n \to \mathbb{F}_2$ was introduced by Willi Meier, Enes Pasalic and Claude Carlet [MPC04] as the minimum degree of all nonzero annihilators of $f$ or $f+1$:

$$\mathcal{AI}(f) := \min \{ \deg(g) \mid f g = 0 \ \lor\ (f+1) g = 0,\ g \neq 0 \} .$$

In other words, only a function of degree at least $\mathcal{AI}(f)$ in the same inputs will “kill” the output of $f$, even once a round constant or a fixed key bit is added to $f$. It is also known [CM03, MPC04] that for any Boolean function $f$ in $n$ variables either $f$ or $f+1$ has a nonzero annihilator of degree at most $\lceil n/2 \rceil$, whence $\mathcal{AI}(f) \leqslant \lceil n/2 \rceil$. Algebraic immunity is an affine invariant.
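For small $n$ the algebraic immunity can be computed by linear algebra over $\mathbb{F}_2$: a function $g$ annihilates $f$ exactly when $g$ vanishes on the support of $f$, which is a linear condition on the ANF coefficients of $g$. The following sketch assumes truth tables given as lists. The helper names and the two test functions are our own choices, and no attempt is made at efficiency.

```python
from itertools import combinations


def monomials_up_to(n, d):
    """Bit masks u of all monomials prod x_i^(u_i) of degree at most d."""
    return [
        sum(1 << i for i in idx)
        for k in range(d + 1)
        for idx in combinations(range(n), k)
    ]


def gf2_rank(rows):
    """Rank over F_2 of row vectors packed into integers."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # clears the leading bit of b in r, if set
        if r:
            basis.append(r)
    return len(basis)


def has_annihilator(support, n, d):
    """True if some nonzero g of degree <= d vanishes on every point of support."""
    mons = monomials_up_to(n, d)
    # Row for the point x: the monomials equal to 1 at x (u is a submask of x).
    rows = [
        sum(1 << j for j, u in enumerate(mons) if u & x == u) for x in support
    ]
    return gf2_rank(rows) < len(mons)


def algebraic_immunity(tt, n):
    """Smallest d such that f or f + 1 has a nonzero annihilator of degree d."""
    ones = [x for x in range(1 << n) if tt[x]]
    zeros = [x for x in range(1 << n) if not tt[x]]
    for d in range(n + 1):
        # g annihilates f iff g vanishes on supp(f); likewise for f + 1.
        if has_annihilator(ones, n, d) or has_annihilator(zeros, n, d):
            return d


maj3 = [int(bin(x).count("1") >= 2) for x in range(8)]   # majority of 3 bits
and3 = [int(x == 7) for x in range(8)]                   # x0*x1*x2
print(algebraic_immunity(maj3, 3), algebraic_immunity(and3, 3))   # 2 1
```

The product $x_0 x_1 x_2$ has the highest possible degree but algebraic immunity only $1$, since it is annihilated by $x_0 + 1$. This illustrates why the degree alone is not a sufficient criterion.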

There are several generalisations of the concept of algebraic immunity to $(n,m)$-functions, some of which are more useful for functions where $m$ is small with respect to $n$ (a common situation in stream ciphers) and some of which are better when $m$ and $n$ are comparable, as in block ciphers. For the latter case a natural generalisation is the component algebraic immunity, defined as the minimal algebraic immunity of the component functions $F_b$ of the S-box $F$ for $b \neq 0$.

1.9.1.4 Nonlinearity

A single biased linear approximation of a block cipher, possibly with a few initial or final rounds omitted, is often sufficient to mount a successful linear attack against it (see Section 2.2 on page 91). Therefore we need to ensure that no such approximation exists. This can be achieved by choosing highly nonlinear S-boxes and then ensuring that the diffusion in the cipher forces all characteristics to cross a sufficiently high minimal number of “active” S-boxes. We consider here the nonlinearity of the S-boxes.

In order to approximate a Boolean function $f \colon \mathbb{F}_2^n \to \mathbb{F}_2$ we can consider its Walsh transform

$$f^{\mathcal{W}}(a) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x) + \langle a, x \rangle}$$

where $f^{\mathcal{W}}(a)$ is called the Walsh coefficient of $f$ at the place $a$. A Boolean function $f$ is balanced if and only if $f^{\mathcal{W}}(0) = 0$. The Walsh transform of $f$ is the Fourier transform of the sign function $f_\chi$ of $f$, defined as $f_\chi(x) = 1$ if $f(x) = 0$ and $f_\chi(x) = -1$ if $f(x) = 1$, or, equivalently, $f_\chi(x) = (-1)^{f(x)}$. Also, note that if

$$\mathcal{L}_a(f) = \{ x \in \mathbb{F}_2^n \mid \langle a, x \rangle = f(x) \}$$

is the set of inputs where $f$ and the linear function $x \mapsto \langle a, x \rangle$ agree, we have

$$f^{\mathcal{W}}(a) = 2\, \#\mathcal{L}_a(f) - 2^n .$$

If the Walsh coefficient of $f$ at place $a$ is positive, resp. negative, it is clear that the larger it is in absolute value, the better the function $f$ will be approximated by the linear function $x \mapsto \langle a, x \rangle$, resp. the affine function $x \mapsto 1 + \langle a, x \rangle$. So we want Walsh coefficients that are close to zero, and such that the largest of them in absolute value is as small as possible. This leads to the definition of nonlinearity of a Boolean function $f \colon \mathbb{F}_2^n \to \mathbb{F}_2$:

$$nl(f) := 2^{n-1} - \frac{1}{2} \max_{a \in \mathbb{F}_2^n} \left| f^{\mathcal{W}}(a) \right| .$$
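The Walsh coefficients of a small Boolean function can be computed with the usual fast butterfly, from which the nonlinearity follows directly. The sketch below assumes truth tables given as lists of length $2^n$. The function names and the example $f(x) = x_0 x_1 \oplus x_2$ are ours.

```python
def walsh(tt):
    """All Walsh coefficients f^W(a) = sum_x (-1)^(f(x) + <a, x>)."""
    w = [1 - 2 * bit for bit in tt]   # sign function (-1)^f(x)
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for j in range(start, start + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def nonlinearity(tt):
    """nl(f) = 2^(n-1) - max_a |f^W(a)| / 2."""
    return len(tt) // 2 - max(abs(c) for c in walsh(tt)) // 2


def agreements(tt, a):
    """#L_a(f): number of inputs where f agrees with x -> <a, x>."""
    return sum(tt[x] == bin(a & x).count("1") & 1 for x in range(len(tt)))


# f(x0, x1, x2) = x0*x1 XOR x2
f = [((x & 1) & (x >> 1 & 1)) ^ (x >> 2 & 1) for x in range(8)]
print(walsh(f))          # [0, 0, 0, 0, 4, 4, 4, -4]
print(nonlinearity(f))   # 2
print(agreements(f, 4))  # 6, in agreement with f^W(4) = 2*6 - 8
```

Here $f^{\mathcal{W}}(0) = 0$ reflects that $f$ is balanced, while the coefficient $4$ at $a = 4$ says that $f$ agrees with the linear function $x \mapsto x_2$ on $6$ of the $8$ inputs.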

It is clear that we aim at finding functions with the highest nonlinearity possible: $nl(f)$ is the minimum Hamming distance between $f$ and the affine functions, so the best affine approximation of $f$ agrees with it on exactly $2^n - nl(f)$ inputs, i.e. with probability $1 - nl(f)/2^n$. Now, Parseval’s Theorem states that $\sum_{a \in \mathbb{F}_2^n} f^{\mathcal{W}}(a)^2 = 2^{2n}$, i.e. the average of the squares of the Walsh coefficients is $2^n$. From this it follows that $\max_{a \in \mathbb{F}_2^n} | f^{\mathcal{W}}(a) | \geqslant 2^{n/2}$, that the “best” nonlinear functions have Walsh coefficients $f^{\mathcal{W}}(a) = \pm 2^{n/2}$ at all places $a$, and, finally, that

$$nl(f) \leqslant 2^{n-1} - 2^{n/2-1} . \qquad (1.8)$$

This bound is tight for $n$ even, and is also called the covering radius bound, since this is the value of the covering radius of the Reed-Muller code of order $1$ for $n$ even. The functions that achieve bound (1.8) are called bent functions because they are as different as possible (in the sense of Hamming weight of the difference), and in fact equidistant, from all linear and affine functions. They were investigated in the 1960s by Oscar Rothaus in research that was not published until 1976 [Rot76]. Clearly, bent functions exist only for $n$ even.
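As a concrete check, the quadratic function $x_0 x_1 \oplus x_2 x_3$ in $n = 4$ variables is a standard example of a bent function. The sketch below evaluates its Walsh spectrum directly from the definition and verifies Parseval’s relation. The name `bent4` is ours.

```python
def walsh_coefficient(f, n, a):
    """Direct evaluation of f^W(a) = sum_x (-1)^(f(x) + <a, x>)."""
    return sum(
        (-1) ** (f(x) ^ (bin(a & x).count("1") & 1)) for x in range(1 << n)
    )


def spectrum(f, n):
    """List of all Walsh coefficients of f."""
    return [walsh_coefficient(f, n, a) for a in range(1 << n)]


def bent4(x):
    """f(x0, x1, x2, x3) = x0*x1 XOR x2*x3."""
    x0, x1, x2, x3 = (x >> i & 1 for i in range(4))
    return (x0 & x1) ^ (x2 & x3)


n = 4
spec = spectrum(bent4, n)
print(sorted(set(spec)))                          # [-4, 4], i.e. +-2^(n/2)
print(sum(c * c for c in spec) == 2 ** (2 * n))   # Parseval: True
print(2 ** (n - 1) - max(map(abs, spec)) // 2)    # 6 = 2^(n-1) - 2^(n/2-1)
```

Since $f^{\mathcal{W}}(0) = \pm 4 \neq 0$, the function is not balanced: it takes the value $1$ on $6$ of the $16$ inputs.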

The concept of Walsh coefficient is easily generalised to $(n,m)$-functions $F$ by simply replacing the single-bit valued $f$ in the original definition with the component functions of $F$:

$$F_b^{\mathcal{W}}(a) = \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle b, F(x) \rangle + \langle a, x \rangle} . \qquad (1.9)$$

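Then, we define the nonlinearity of an $(n,m)$-function $F$ as the minimum of the nonlinearities of all its nonzero component functions, i.e.:

$$nl(F) := 2^{n-1} - \frac{1}{2} \max_{\substack{a \in \mathbb{F}_2^n \\ b \in \mathbb{F}_2^m,\ b \neq 0}} \left| F_b^{\mathcal{W}}(a) \right| .$$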

Since $nl(F)$ is the minimum Hamming distance between a nonzero component function of $F$ and the affine functions, the most likely affine relation between some of the input bits and some of the output bits holds with probability $1 - nl(F)/2^n$.
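The nonlinearity of a small S-box can be obtained by taking the maximum over all input masks $a$ and all nonzero output masks $b$. The following sketch assumes S-boxes given as lists. The table `CUBE3` (cubing in $\mathbb{F}_{2^3}$ modulo $x^3+x+1$) and the identity map are examples picked here for illustration.

```python
def dot(u, v):
    """Inner product over F_2 of two bit vectors packed in integers."""
    return bin(u & v).count("1") & 1


def walsh_component(sbox, n, a, b):
    """F_b^W(a) = sum_x (-1)^(<b, F(x)> + <a, x>)."""
    return sum((-1) ** (dot(b, sbox[x]) ^ dot(a, x)) for x in range(1 << n))


def nonlinearity(sbox, n, m):
    """nl(F): minimum nonlinearity over all nonzero component functions."""
    worst = max(
        abs(walsh_component(sbox, n, a, b))
        for a in range(1 << n)
        for b in range(1, 1 << m)
    )
    return (1 << (n - 1)) - worst // 2


CUBE3 = [0, 1, 3, 4, 5, 6, 7, 2]   # x -> x^3 in GF(2^3), modulo x^3 + x + 1
IDENT = list(range(8))             # a linear map, for contrast

print(nonlinearity(CUBE3, 3, 3), nonlinearity(IDENT, 3, 3))   # 2 0
```

The direct double loop costs on the order of $2^{2n+m}$ operations, which is fine for S-boxes of cryptographic size (up to 8 bits). A fast Walsh transform per component would reduce this further.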

Vladimir Sidelnikov [Sid71] and, independently, Florent Chabaud and Serge Vaudenay [CV94] proved the following bound on the nonlinearity of an $(n,m)$-function with $m \geqslant n-1$:

$$nl(F) \leqslant 2^{n-1} - \frac{1}{2} \sqrt{ 3 \times 2^n - 2 - 2\, \frac{(2^n - 1)(2^{n-1} - 1)}{2^m - 1} } . \qquad (1.10)$$

This is called the Sidelnikov-Chabaud-Vaudenay bound in [Car10b], and it can be tight only if $n = m$ with $n$ odd.
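It may help to see what the bound becomes in the case $n = m$. The following short computation is only a specialisation of (1.10) and uses no additional assumptions:

```latex
\begin{align*}
3 \cdot 2^n - 2 - 2\,\frac{(2^n - 1)(2^{n-1} - 1)}{2^n - 1}
  &= 3 \cdot 2^n - 2 - 2\,(2^{n-1} - 1) \\
  &= 3 \cdot 2^n - 2^n = 2^{n+1} , \\
nl(F) &\leqslant 2^{n-1} - \tfrac{1}{2}\, 2^{(n+1)/2} = 2^{n-1} - 2^{(n-1)/2} .
\end{align*}
```

For $n = m = 3$, for example, this gives $nl(F) \leqslant 4 - 2 = 2$. The right-hand side is an integer only for $n$ odd, consistent with the fact that the bound can be attained only in odd dimension.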

Similarly to the definition of bent Boolean function, we say that a vectorial Boolean function $F$ is bent if (1.8) holds with equality for $nl(F)$. It is easy to see that the notion of bent vectorial function is invariant under addition of affine functions and under composition on the left and on the right with affine automorphisms. By definition, an $(n,m)$-function is bent if and only if all of its nonzero component functions are bent. Bent $(n,m)$-functions exist only if $n$ is even and $m \leqslant n/2$; in particular there are no bent permutations of $\mathbb{F}_2^n$, as proved by Kaisa Nyberg in [Nyb91]. Also, the algebraic degree of a bent function with $n \geqslant 4$ is at most $n/2$, as proved in [Rot76]. More refined bounds on the degree of bent functions have been proved by Xiang-Dong Hou in [Hou00] (the result is also given as Proposition 19 in [Car10a]).

Nyberg suggested two constructions of bent S-boxes, one based on the Maiorana-McFarland construction and one based on Dillon’s construction of difference sets. For more details and references see [Nyb91]. Further references and constructions can be found in [Car10a, Car10b].

Also, already in 1988 Réjane Forré observed that the Walsh transform of a function could be used to verify whether a vectorial Boolean function satisfies the strict avalanche criterion (cf. Subsection 1.9.1.6 on page 62). It turns out that the functions satisfying the SAC to the highest possible order (i.e. functions that are SAC even if an arbitrary number of input bits are fixed) are bent [AT90b]. In other words, they are ideal candidates to achieve good diffusion.

Ideal diffusion properties and resistance to linear as well as differential cryptanalysis (as we shall see in the next subsection) may lead us to the conclusion that bent functions are the perfect choice to construct secure cryptographic primitives. However, they are not balanced, which means that they cannot be used directly to construct invertible S-boxes, and they also “funnel” the input to a half-size output at best, which further complicates the constructions. Therefore a more common approach is to start with bent functions and modify them by augmenting the output in order to obtain balanced functions that still attain high nonlinearity [MS89].

An important relaxation of the bentness condition is given by almost bent (AB) functions. An $(n,n)$-function $F$ is almost bent if it achieves the bound (1.10) with equality, i.e. $F_b^{\mathcal{W}}(a) \in \{0, \pm 2^{(n+1)/2}\}$ at all places $a$ and for all nonzero $b$. Almost bent functions exist only for $n$ odd. They have degree at most $(n+1)/2$ [CCZ98]. The name “almost bent” may be confusing because it may suggest that they are not optimal, but they are: bent $(n,n)$-functions do not exist.

Natalia Tokareva in [Tok11] gives a survey of generalisations of bent functions.

1.9.1.5 Differential Uniformity

To make differential cryptanalysis difficult (Section 2.1 on page 73), for each input difference $\Delta$, the set of output differences $S(x) - S(x + \Delta)$ of an S-box $S$ should have as uniform a distribution as possible, so that even when some output differences occur more often for given input differences, they do not stand out in a particular way. This is measured by the (differential) uniformity.

For an $(n,m)$-function $F$ we define the differential of $F$ at the point $c$ as

$$\Delta_c F(x) := F(x \oplus c) \oplus F(x)$$

(a more general definition for functions over rings is $\Delta_c F(x) := F(x + c) - F(x)$), and the value

$$\delta_F := \max_{\substack{c \in \mathbb{F}_2^n,\ c \neq 0 \\ a \in \mathbb{F}_2^m}} \left| (\Delta_c F)^{-1}(a) \right|$$

is called the (differential) uniformity of $F$.
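In practice $\delta_F$ is read off the difference distribution table, whose entry at $(c, a)$ is the size of the preimage of $a$ under $\Delta_c F$. The sketch below assumes S-boxes given as lists. The 4-bit S-box of PRESENT is used only as a convenient, well-known small example, and the function names are ours.

```python
def ddt(sbox, n, m):
    """table[c][a] = #{x : S(x XOR c) XOR S(x) = a}."""
    table = [[0] * (1 << m) for _ in range(1 << n)]
    for c in range(1 << n):
        for x in range(1 << n):
            table[c][sbox[x ^ c] ^ sbox[x]] += 1
    return table


def differential_uniformity(sbox, n, m):
    """delta_F: largest table entry over all nonzero input differences c."""
    return max(max(row) for row in ddt(sbox, n, m)[1:])


# The 4-bit S-box of PRESENT, used here only as an example.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

table = ddt(PRESENT, 4, 4)
print(table[1])   # [0, 0, 0, 4, 0, 0, 0, 4, 0, 4, 0, 0, 0, 4, 0, 0]
print(differential_uniformity(PRESENT, 4, 4))   # 4
```

Each row of the table sums to $2^n$, and the row for $c = 0$ is excluded because it trivially concentrates all $2^n$ inputs on the output difference $0$.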

Since $\Delta_c F$ is an $(n,m)$-function, we trivially have the bound $\delta_F \geqslant 2^{n-m}$. If this bound is attained, the function $F$ is called perfect nonlinear (PN). This means that each $\Delta_c F$ with $c \neq 0$ is balanced, and conversely if each $\Delta_c F$ with $c \neq 0$ is balanced then $F$ is PN. It was proven by Willi Meier and Othmar Staffelbach in [MS89] that a function is PN if and only if it is bent. This result was generalised to fields of arbitrary characteristic by Kaisa Nyberg in [Nyb90]. This is a strong link between resistance to linear and differential cryptanalysis, and also implies that PN $(n,m)$-functions only exist if $m \leqslant n/2$.
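The equivalence between bent and perfect nonlinear can be observed on a small example. The sketch below takes the bent $(4,1)$-function $x_0 x_1 \oplus x_2 x_3$ (our choice of example) and checks that each of its derivatives at a nonzero point is balanced, so that $\delta_F = 2^{n-m} = 8$.

```python
def derivative(f, c):
    """The function Delta_c f : x -> f(x XOR c) XOR f(x)."""
    return lambda x: f(x ^ c) ^ f(x)


def bent4(x):
    """f(x0, x1, x2, x3) = x0*x1 XOR x2*x3."""
    x0, x1, x2, x3 = (x >> i & 1 for i in range(4))
    return (x0 & x1) ^ (x2 & x3)


def linear(x):
    """f(x) = x0, whose derivatives are all constant."""
    return x & 1


n = 4
weights = [
    sum(derivative(bent4, c)(x) for x in range(1 << n)) for c in range(1, 1 << n)
]
print(weights)   # fifteen times 8: every derivative is balanced

lin_weights = {
    sum(derivative(linear, c)(x) for x in range(1 << n)) for c in range(1, 1 << n)
}
print(sorted(lin_weights))   # [0, 16]: constant derivatives, the opposite extreme
```

For the bent function each derivative $\Delta_c f$ with $c \neq 0$ is a nonconstant affine function, which explains why it is balanced.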

Let us now consider the cryptographically important case $n = m$: all $|(\Delta_c F)^{-1}(a)|$ are even, because $x$ and $x \oplus c$ always give the same value of $\Delta_c F$ (so they cannot be $2^{n-n} = 1$), and not all can be zero, so $\delta_F \geqslant 2$. Functions with $m = n$ attaining this lower bound are called APN (Almost Perfect Nonlinear) functions.

Since in modern cipher design the S-box is usually bijective, bijective APN functions, called APN permutations, are of particular interest. If a function is APN and bijective, then its inverse is also APN.

APN permutations exist, and they are plentiful in odd dimension: if we identify the vector space $\mathbb{F}_2^n$ for odd $n$ with the Galois field $\mathbb{F}_{2^n}$ we can use either cubing or inversion (other exponents can be used as well). As a result, for instance, $S(x) = x^3$ in any binary field of odd degree offers optimal resistance to differential and linear cryptanalysis. This is in part why the MISTY designs (see Subsection 3.18.8 on page 176) use 7- and 9-bit functions in the 16-bit non-linear function. (What these functions gain in immunity to low-order differential and linear attacks they lose to higher-order differential cryptanalysis and algebraic attacks, i.e. they can be described by low-degree equations and solved via a SAT solver.)
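These properties of the cube map can be verified exhaustively in a small field. The sketch below works in $\mathbb{F}_{2^5}$, represented modulo the irreducible polynomial $x^5 + x^2 + 1$. The choice of degree and polynomial, as well as the helper names, are ours.

```python
def gf_mul(a, b, n, poly):
    """Multiply a and b in GF(2^n) = F_2[x]/(poly), elements packed as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n & 1:
            a ^= poly
    return r


def dot(u, v):
    """Inner product over F_2 of two bit vectors packed in integers."""
    return bin(u & v).count("1") & 1


n, poly = 5, 0b100101   # x^5 + x^2 + 1, irreducible over F_2
cube = [gf_mul(x, gf_mul(x, x, n, poly), n, poly) for x in range(1 << n)]

# Differential uniformity: largest number of solutions x of S(x + c) + S(x) = a.
delta = max(
    sum(cube[x ^ c] ^ cube[x] == a for x in range(1 << n))
    for c in range(1, 1 << n)
    for a in range(1 << n)
)

# Walsh spectrum over all input masks a and all nonzero output masks b.
spectrum = {
    sum((-1) ** (dot(b, cube[x]) ^ dot(a, x)) for x in range(1 << n))
    for a in range(1 << n)
    for b in range(1, 1 << n)
}

print(delta)              # 2, i.e. APN
print(sorted(spectrum))   # [-8, 0, 8], i.e. 0 and +-2^((n+1)/2): almost bent
```

Since $\gcd(3, 2^5 - 1) = 1$, the map is also a permutation of the field. Its algebraic degree is only $2$, however, which is the weakness mentioned in the parenthetical remark above.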

The search for APN permutations in even dimensions, which are highly desirable, is more difficult. Until recently it was not known whether they existed at all. At the Fq9 conference in 2009 John Dillon announced an APN permutation on $\mathbb{F}_2^6$ [BDMW10]. It is not known whether there are other examples. Hence, in general, the best one can realistically hope to find for a bijective $(n,n)$-function $F$ with $n$ even is $\delta_F = 4$. There are plenty of functions with this property, for instance field inversion in $\mathbb{F}_{2^n}$. The fact that field inversion in $\mathbb{F}_{2^n}$ has $\delta_F = 2$ for $n$ odd and $\delta_F = 4$ for $n$ even is proved by Kaisa Nyberg in [Nyb93], where she attributes the observation to Lars Knudsen. This result later influenced the choice of the AES S-box (cf. Section 3.20 on page 182).
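The dependence of the differential uniformity of field inversion on the parity of $n$ can be checked exhaustively for small fields. The sketch below computes $x \mapsto x^{2^n-2}$ (inversion, with $0 \mapsto 0$) for $n = 3, \dots, 8$. The irreducible polynomials are our own choice for illustration (the one for $n = 8$ is the AES polynomial), and any other irreducible polynomial of the same degree gives the same result.

```python
def gf_mul(a, b, n, poly):
    """Multiply a and b in GF(2^n) = F_2[x]/(poly), elements packed as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n & 1:
            a ^= poly
    return r


def gf_inv(x, n, poly):
    """x^(2^n - 2): the field inverse, with 0 mapped to 0."""
    result, base, e = 1, x, (1 << n) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base, n, poly)
        base = gf_mul(base, base, n, poly)
        e >>= 1
    return result


def differential_uniformity(sbox):
    """Largest DDT entry over all nonzero input differences."""
    size = len(sbox)
    best = 0
    for c in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x ^ c] ^ sbox[x]] += 1
        best = max(best, max(counts))
    return best


# Irreducible polynomials of degree n, chosen here for illustration.
POLYS = {3: 0b1011, 4: 0b10011, 5: 0b100101,
         6: 0b1000011, 7: 0b10000011, 8: 0x11B}

results = {
    n: differential_uniformity([gf_inv(x, n, p) for x in range(1 << n)])
    for n, p in POLYS.items()
}
print(results)   # {3: 2, 4: 4, 5: 2, 6: 4, 7: 2, 8: 4}
```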

In the same paper, Nyberg studies other power functions as well. She also considers mappings derived from exponential functions in prime fields (such as those used in the SAFER family, cf. Section 3.8 on page 150): she proves (Proposition 7 in [Nyb93]) that a mapping from the set of integers modulo a prime $p$ defined as exponentiation of an element of order $p-1$ in $\mathbb{F}_p$ is differentially $2$-uniform with respect to addition modulo $p$ (the binary differential uniformity is usually different). The type of differential uniformity also determines the principal type of operation in the cipher, and this explains why the main operation in SAFER is modular addition.

There are some important relations between almost bent functions and APN functions:

1. AB functions are APN. To formulate a more precise result, let us first define plateaued functions: an $(n,m)$-function $F$ is called plateaued if, for every nonzero $c \in \mathbb{F}_2^m$, the component function $F_c$ is plateaued, that is, there exists a positive integer $\nu_c$ (called the amplitude of the plateaued Boolean function) such that the Walsh coefficients of $F_c$ take values in $\{0, \pm\nu_c\}$.

Now, an $(n,n)$-function $F$ is AB if and only if it is APN and all its nonzero component functions are
