Practical Cryptanalysis of Full Sprout with TMD Tradeoff Attacks

(1)

Practical Cryptanalysis of Full Sprout

with TMD Tradeoff Attacks

?

Muhammed F. Esgin1,2 and Orhun Kara1,??

1 _{T ¨}_{UB˙ITAK B˙ILGEM UEKAE, Gebze, Kocaeli, Turkey}

{muhammed.esgin, orhun.kara}@tubitak.gov.tr

2

Graduate School of Natural and Applied Sciences, ˙Istanbul S¸ehir University, ˙Istanbul, Turkey

Abstract. The internal state size of a stream cipher is supposed to be at least twice the key length to provide resistance against the conventional Time-Memory-Data (TMD) tradeoff attacks. This well adopted security criterion seems to be one of the main obstacles in designing, particularly, ultra lightweight stream ciphers. At FSE 2015, Armknecht and Mikhalev proposed an elegant design philosophy for stream ciphers as fixing the key and dividing the internal states into equivalence classes where any two different keys always produce non-equivalent internal states. The main concern in the design philosophy is to decrease the internal state size without compromising the security against TMD tradeoff at-tacks. If the number of equivalence classes is more than the cardinality of the key space, then the cipher is expected to be resistant against TMD tradeoff attacks even though the internal state (except the fixed key) is of fairly small length. Moreover, Armknecht and Mikhalev presented a new design, which they call Sprout, to embody their philosophy. In this work, ironically, we mount a TMD tradeoff attack on Sprout within practical limits using 2doutput bits in 271−dencryptions of Sprout along with 2d _{table lookups. The memory complexity is 2}86−d _where

d≤40. In one instance, it is possible to recover the key in 231 encryp-tions and 240_{table lookups if we have 2}40 _{bits of keystream output by}

using tables of 770 Terabytes in total. The offline phase of preparing the tables consists of solving roughly 241.3 _{systems of linear equations with}

20 unknowns and an effort of about 235 encryptions. Furthermore, we mount a guess-and-determine attack having a complexity about 268

en-cryptions with negligible data and memory. We have verified our attacks by conducting several experiments. Our results show that Sprout can be practically broken.

Key words:Sprout, stream cipher, keystream generator, Time Memory Data tradeoff attacks, LFSR, NLFSR, guess-and-determine, divide-and-conquer

?

This paper also appears in the proceedings of Selected Areas in Cryptography (SAC) 2015.

??

(2)

1 Introduction

One of the main design principles for stream ciphers is that internal state size should be at least twice the key size. Otherwise, the cipher will be vulnerable to generic Time-Memory-Data (TMD) tradeoff attacks [16, 3, 11, 7, 6, 5]. This principle makes lightweight stream cipher design more challenging in comparison to designing block ciphers even though the same generic attacks are valid for block ciphers, particularly, when they are used in stream cipher modes such as OFB or CTR mode.

There have been several lightweight block cipher designs appeared in the literature such as Present [8], ITUBee [17], Lblock [21], Prince [9], Ktantan [10], Twine [20] and many more in the last decade. An example that does not follow the common tendency of designing lightweight block ciphers rather than lightweight stream ciphers is the new stream cipher design Sprout, presented at FSE 2015 [2] by Armknecht and Mikhalev. The cipher makes use of a variable internal state of only 80 bits and a fixed key of 80 bits as well. The authors show that a cipher design such as Sprout is resistant to classical TMD tradeoff attacks even though its (variable) internal state size is strictly less than twice the key size.

The property of having a smaller internal state size results from the un-common design principle adopted by the designers of Sprout [2]. According to this principle, the key is also incorporated into the next state function of the cipher as a fixed vector and the internal states including the keys are divided into equivalence classes. Each key is in a different equivalence class. That is, it is impossible to obtain two equivalent internal states with different keys. There-fore, the conventional TMD tradeoff attacks are ineffective since the number of equivalence classes is not less than the cardinality of the key space and one needs to travel almost all the equivalence classes to recover one specific key, rendering the generic tradeoff attacks slower than the exhaustive search.

The design of Sprout is inspired by Grain 128a in [1]. The sizes of both the NLFSR and the LFSR of Grain family are reduced to half of their values and the functions are slightly changed. The design philosophy of output generation and feedback functions is almost conserved except adding a round key function to incorporate key bits for each clock. As a result, a stream cipher of area cost roughly 800 GE is developed whereas Grain needs 1162 GE for the same level of 80-bit security [2].

Sprout has immediately attracted the interest of the cryptology community intensively, indicating that the design philosophy itself introduces several open questions about the security of such ciphers. Even though it is a very recent cipher, there have been a couple of its analyses [18, 19, 4, 12].

The first attack paper is by Lallemand and Naya-Plasencia, and they have shown that it is possible to recover the key with a time complexity equivalent to roughly 270 _{Sprout encryptions by merging the sets of possible LFSRs and}

NLFSRs through a careful sieving [18]. Indeed, the actual workload is 274.51

(3)

one step of exhaustive search in [18], the overall time complexity is finalized as 269.36_{. The memory complexity for leading the values for the registers is 2}46_.

In another analysis, Maitra et al. show that when the whole (variable) internal state is known, the key can be found using a SAT solver [19]. The system of nonlinear equations generated from the output is quite easy to solve. The authors show that it is possible to solve the system with roughly 900 keystream bits in less than half a second on an ordinary PC. Moreover, the authors show that it is still possible to solve the system even though two thirds of LFSR bits are also unknown. However, solving the system takes around one minute this time. Hence, guessing the variables of whole NLFSR and one third of the variables of LFSR and then solving all the corresponding 254 _{systems of equations, it is}

possible to recover the key. On the other hand, the authors do not compare the time complexity of solving 254_{equations with that of exhaustive search. Besides}

their algebraic attack, they give an example of a fault attack as well [19]. The guess-and-determine attack in [19] using a SAT solver is further improved in [4] by Banik. After guessing 50 bits of the state, the remaining bits including key bits can be solved in roughly half a minute on a standard PC by means of Cryptominisat 2.9.5 solver installed in SAGE 5.7. Moreover, Banik observed that the LFSR is expected to be initialized to the all zero vector in one of 240_{random synchronizations. When such an event occurs, the NLFSR is easily}

recovered with the key by guessing 33 bits of its state. The total time complexity is around 267 encryptions with negligible memory and about 242 bits of output. A distinguisher attack is also included by observing that it is possible to produce shifted versions of a keystream up to a factor of 80 with the same key by using different IVs. When the factor for shifting is limited as 210, it is possible to distinguish Sprout output with 257 bits of memory and 232 encryptions [4].

In [12], a related-key chosen-IV distinguisher is shown. However, the designers of the Sprout regard related-key attacks as out of scope since the key is assumed to be fixed.

Time Data Memory

[18] 270 negligible 246

[19] 275 negligible negligible

Sec. 3 in [4] 270 _negligible _negligible

Sec. 5 in [4] 266.7 ₂42 _bits _negligible

Sec. 3 in this work 268 _negligible _negligible

Sec. 4.1 in this work 240_{T Ls}_{+ 2}31 ₂40 _bits ₂46

TMD Tradeoff in this work 2d_{T Ls}_{+ 2}71−d ₂d_bits ₂86−d

(4)

In Table 1, we compare the complexities of known attacks on Sprout. Our straightforward C/C++ implementation of Sprout runs more than 223.5 _clocks

in a second on a standard PC. Thus, we make the assumption that an effort of a second is equivalent to 215 _{Sprout encryptions and that of a minute to 2}21

Sprout encryptions. One may further improve these numbers by a more efficient and performance-oriented implementation. As we show in Appendix A, one clock of Sprout costs about 2−8.33 _{Sprout encryptions.}

Our Contribution: None of the attacks in the literature so far has practi-cal workloads. We introduce a new TMD tradeoff cryptanalysis of full Sprout within the practical bounds where all data, memory and time complexities can be upperbounded by 245_{. We first show that when the internal state is known}

(excluding the key), it is much easier to obtain the secret key when the cipher is run backwards compared to running the cipher forward. Later, guessing the internal states from the keystream bits is described based on a time-memory-data tradeoff approach. For the key-recovery part, our attack is a combination of both guess-and-determine and divide-and-conquer approaches. We show that the key recovery part itself can be transformed to a guess-and-determine attack of a complexity 268 encryptions with negligible data and memory.

Our TMD tradeoff attack is based on a specific occasion where incorporation of key bits into the register can be discarded during the keystream production. So, Sprout behaves like Grain family since the round key function is bypassed. We can compute the internal states in advance for all the predefined occasions and store them with the produced keystream bits in a table. We can produce some keystream bits for these special internal states without knowing the key.

In general, one may expect that the special internal states to be stored are deduced by exhaustively searching them. However, we can deduce these states without a search-and-eliminate mechanism, reducing the time complexity of the precomputation phase dramatically. We show that computing the special internal states which cause to bypass the key bits is as easy as solving some systems of linear equations. Each of our guesses gives a valid solution. So, unlike generic TMD tradeoff attacks where the offline phase is generally as expensive as an exhaustive search, our offline phase of the attack is also in practical limits. To sum up, we explore and exploit a special kind of lack of sampling resistance in Sprout. The objective of this procedure is similar to the one of BSW sampling applied to A5/1 cipher [7].

In the online phase of the attack, we check if a special occasion occurs in the given keystream. The time complexity is 279−d _{Sprout clocks when we have 2}d bits of keystream (not necessarily from the same IV). The memory requirement is about 286−d_{. For example, when} _d_{= 40, we can recover the key in only 2}31

encryptions and 240 _{table lookups by using 2}40 _{bits of keystream with 770}

Ter-abytes of memory. The precomputation cost of preparing the tables is equivalent to solving about 241.32_{systems of linear equations with 20 unknowns and about}

235_{encryptions. We have verified our results by conducting several experiments.}

(5)

(variable) internal state of Sprout is known and introduce a guess-and-determine attack. Section 4 introduces a TMD tradeoff approach to deal with filling the internal state, including how to construct systems of linear equations during the off-line phase of the attack. We conclude the paper with some remarks and propose a solution in Section 5. Some details of Sprout is given in Appendix A and we provide experimental results verifying our attack in Appendix B.

2 High Level Description of Sprout

Sprout [2] is a lightweight stream cipher inspired by Grain family [1, 13, 15, 14]. The (variable) internal state of Sprout consist of an LFSR and an NLFSR, and there is also a fixed key that is used in the state update function. The sizes of LFSR and NLFSR are 40 bits each and the key length is 80 bits. An IV of size 70 bits is also incorporated during the initialization phase. The feedback functions of NLFSR and LFSR, and the nonlinear part of the output function are denoted byg,f andh, respectively (See Figure 1 in the appendix). We follow the notations below throughout the paper.

– t- the clock-cycle number

– ⊕- the XOR operation

– Lt:= (l0t, lt1, . . . , lt39) - state of the LFSR at clock-cyclet

– Nt:= (nt0, nt1, . . . , nt39) - state of the NLFSR at clock-cyclet

– Ct:= (ct0, ct1, . . . , ct8) - state of the counter at clock-cyclet

– K:= (k0, k1, . . . , k79) - the fixed key

– IV := (iv0, iv1, . . . , iv69) - the initialization vector

– k_t∗ - the round key bit generated during clockt

– nt- the output bit of NLFSR during clockt

– lt- the output bit of LFSR during clock t

– zt- the keystream bit generated during clockt

A 9-bit counter is used in the algorithm to count the number of rounds for the initialization phase (which has 320 rounds). After initialization, its first seven bits run cyclically from 0 to 79 and determine the index of the key bit selected at the current time. Moreover, the fourth bit of the counterct

4is involved in the

NLFSR feedback.

The linear relation of the LFSR isl₃₉t+1=f(Lt) =l0t⊕l5t⊕l15t ⊕lt20⊕lt25⊕lt34.

The function g is the nonlinear feedback function for the NLFSR. Its output is XORed with the round key bitk_t∗, the counter bitct

4 and the output of the

LFSR as nt₃₉+1 = g(Nt)⊕k∗t ⊕lt0⊕ct4 where k∗t = ktmod 80·δt with δt := lt

4⊕l21t ⊕lt37⊕nt9⊕nt20⊕nt29. Remark that one clock of Sprout is equivalent to

2−8.33 _{encryption of an exhaustive key search. We give the necessary details in}

(6)

3 A Key Recovery Attack

It is easy to see that Sprout’s next state function is invertible. So, once we obtain the whole state (including the key), we can clock the internal states of the cipher forward and backwards, as well. So, it is straightforward to recover the internal state at clock-cycletfrom the internal state at clock-cyclet+1. We first decrease the counter. Then, for the LFSR feedback, we have

l₀t=l₃₉t+1⊕lt₄+1⊕l₁₄t+1⊕lt₁₉+1⊕l₂₄t+1⊕l₃₃t+1 (1)

andlt_i₊₁=lt_i+1 for 0≤i≤38. For the NLFSR feedback, we have

nt0=kt∗⊕ct4⊕l

t+1 39 ⊕l

t+1 4 ⊕l

t+1 14 ⊕l

t+1 19 ⊕l

t+1 24 ⊕l

t+1 33

⊕nt₃₉+1⊕nt₁₂+1⊕nt₁₈+1⊕n₃₄t+1⊕nt₃₈+1⊕nt₁+1nt₂₄+1

⊕nt₂+1nt₄+1⊕n₆t+1nt₇+1⊕nt₁₃+1nt₂₀+1⊕nt₁₅+1nt₁₇+1⊕nt₂₁+1nt₂₃+1⊕nt₂₅+1nt₃₁+1 ⊕nt₃₂+1nt₃₅+1n₃₆t+1nt₃₇+1⊕nt₉+1nt₁₀+1nt₁₁+1⊕n₂₆t+1nt₂₉+1nt₃₀+1 (2)

where

k_t∗=kt, 0≤t≤79 k_t∗=ktmod 80·(l3t+1⊕l

t+1 20 ⊕l

t+1 36 ⊕n

t+1 8 ⊕n

t+1 19 ⊕n

t+1 28 )

andnt i+1=n

t+1

i for 0≤i≤38 (see [2] or Appendix A). Now, the keystreamzt can be generated while the indextis decreasing.

Maitra et al. has shown in their recent paper that it is possible to recover the key once the (variable) internal state is known by solving a system of nonlinear equations by a SAT Solver in less than half second on a single PC, using roughly 900 bits of keystream sequence [19]. We have a similar problem indeed: We make a guess for the internal state and then, we do not just want to determine the key from the internal state but also we would like to check if our guess is correct simultaneously without recovering the whole set of the key bits. The simple observation below gives us a much faster key recovery and internal state checking mechanism. Indeed, we do not need to solve a system of nonlinear equations. The following property suggests that recovering key from the internal state and output is much easier if we trace backwards through the registers.

Proposition 1. Assume that at timet+ 1, we know the internal states of both registers NLFSR and LFSR, but the whole key is unknown and thatδt= 1. While clocking the registers backwards, when a key bit appears in the keystream for the first time, it will appear as a single unknown inside nt₁−1 in the keystream bit zt−1. This happens before the key bit is incorporated into the feedback of NLFSR

through the g function.

The proof of Proposition 1 is straightforward. Assume that while the cipher is run backwards, at some clock-cycle t+ 1, we guess the value of the whole internal state (excluding the key) andδt= 1. One clock later,nt

(7)

of the formki⊕awhereais a known value obtained from the NLFSR feedback, the LFSR output and the counter. Now, at timet,nt

0 is not incorporated into

the NLFSR feedback function. Thus, ki ⊕a shifts to the position nt1−1 and

nt₀−1 does not depend on ki (it may depend on another key bit but that is not important). Now, at timet−1, we know all the register values except fornt₀−1 andnt₁−1=ki⊕a. But,nt₀−1 is not involved in the output function. So, we can easily determineki aski=zt−1⊕a⊕a0 wherea0 is a known value coming from

the tap points of the output function.

As a result, when we make a guess for the registers, at each clock, we will either have opportunity to check if the keystream bit we generate matches the corresponding output bit, or a key bit will appear as a single unknown and we will determine the key bit from the output. Hence, if the state candidate does not yield a contradiction, we again end up with registers that are completely known except maybe for the first bit nt₀ of NLFSR. However, if a key bit is involved in that term, it will be determined one clock later before going into the feedback. To sum up, continuing the procedure recursively, either we recover a single key bit or we have a check bit for each clock. Let us illustrate this with a simple example.

Example 1. Assume that at timet+ 1 we know the whole internal state but the secret key and let δt = δt−1 = δt−2 = 1 and δt−3 = 0. Let k0, k1, k2 and k3

be the key bits selected in given order. In Table 1, we show how the values of NLFSR bits and keystream bits proceed.

Clock-cycle δi ni0 ni1 n2i ni3 · · · ni39 zi D/C

i=t+ 1 - X X X X X X X C

i=t 1 k0⊕X X X X X X X C

i=t−1 1 k1⊕X k0⊕X X X X X k0⊕X D(k0)

i=t−2 1 k2⊕X k1⊕X X X X X k1⊕X D(k1)

i=t−3 0 X k2⊕X X X X X k2⊕X D(k2)

i=t−4 - ? X X X X X X C

Table 2.Xdenotes a known value, ? a value that is either known or unknown,D(ki)

determining the value ofki and C is a check if the keystream bit generated matches

the actual one.

(8)

be eliminated. So, the average number of clocks for each elimination among 2s possible guesses is

s

X

i=0

2·2s−i 2s =

s

X

i=0

1

2i−1 ≈4 for 21≤s≤40.

We see that we can check if a given state is correct in 4 clocks on aver-age. Moreover, it is possible to mount a guess-and-determine attack by using Proposition 1. We can guess 77 bits of the internal state and determine the three remaining bits that appear as XOR in the output (such as lt

31, lt30, l29t )

since three bits of keystream can be produced without knowing the key. Observe that the bits lt_i for 28 ≥ i ≥ 25 are incorporated into the output as XOR at timet+i−30 for the first time after clocktduring the backward clocking. So, the guess-and-determine attack can be improved by further determining lt_i for 28 ≥ i ≥ 25, if a key bit is not involved in the output along with lt_i (that is, δt+i−29 = 0). If δt+i−29 = 1, then guess lti as well and determine the related key bit. So, we guess at least 73 and at most 77 bits according to the values of δt+i−29. The overall complexity is around 270 encryptions. It can be further

improved by assuming thatδt+i−29 = 0 for 28≥i≥25. In this case, we guess

69 bits and determine 11 bits (lt

36, l35t , l34t , lt33 from δt+i−29 = 0 andlt31, . . . , l25t

from keystream bits). The cipher is clocked 9 times on average to come up with a contradiction (5 clocks during determininglt

31, . . . , lt25and 4 clocks for the key

recovery and checking). Repeat the attack 16 times by using shifted keystreams to fulfill the assumption. Hence, the average complexity is 24_·₂69_·₂3.17_{= 2}76.17

clocks and thus 276.17_·₂−8.33 _{= 2}67.84 _{encryptions of Sprout. The data and}

memory complexities are negligible.

4 A Time-Memory-Data Tradeoff Attack

Recall that, treating the bits of the registers as the terms of a sequence, we denote lt+i+j:=l

t+j

i andnt+i+j:=n t+j

i . We mount an attack on Sprout with 2 d _data

and a time complexity of 279−d _{clocks where}_d _≤_{40 by enhancing the idea of} the guess-and-determine attack given in Section 3. We make use of memory also, having roughly 286−d _entries.

The attack scenario is simple. Assume thatδtis zero for consecutivedclocks. That is, the key bits are not incorporated into the NLFSR duringdconsecutive clocks:t−9, t−8, . . . , t+d−10. Then we can make a guess to the internal state at time t and then check if the guess is correct and the condition is satisfied since we can producedbits of outputs without knowing any key bit.

Assuming thatδt−9=δt−8=· · ·=δt+d−10= 0, we get the followingdlinear

(9)

lt−5 ⊕lt+12 ⊕lt+28 ⊕nt ⊕nt+11 ⊕nt+20 = 0

lt−4 ⊕lt+13 ⊕lt+29 ⊕nt+1 ⊕nt+12 ⊕nt+21 = 0

· · ·

lt+4 ⊕lt+21 ⊕lt+37 ⊕nt+9 ⊕nt+20 ⊕nt+29 = 0

lt+5 ⊕lt+22 ⊕lt+38 ⊕nt+10 ⊕nt+21 ⊕nt+30 = 0

· · ·

lt+d−6⊕lt+d+11⊕lt+d+27⊕nt+d−1⊕nt+d+10⊕nt+d+19= 0

Ifd≤20, we havedlinear equations with at most 80 unknowns since there will be no feedback for NLFSR. Note that we can write an LFSR feedback as a linear equation without adding new unknowns. So, we simply exclude both the linear equations and new unknowns coming from LFSR feedback. However, if d > 20, the new unknowns from the feedback of the NLFSR will appear with some nonlinear equations. The new equations coming from the feedback are

ct₄ ⊕lt ⊕nt+40 ⊕g(Nt) = 0

ct₄+1 ⊕lt+1 ⊕nt+41 ⊕g(Nt+1) = 0

· · ·

ct₄+d−21⊕lt+d−21⊕nt+d+19⊕g(Nt+d−21) = 0,

which are adding d−20 more equations with d−20 new unknowns nt+40,. . .,

nt+d+19. We have 2d−20 equations with 60 +d unknowns. We see that by

carefully choosing 80−d unknowns to be guessed, we mostly come up with 2d−20 linear equations with 2d−20 unknowns. Solving the linear system for each counter set, we can determine 2d−20 unknown bits. That is, we can determine the whole internal state. Then we can produce the output up to d+ 3 bits for all possible counter combinations. See Section 4.1 for solving the system of equations in the most extreme case.

Let us store all the guessed internal states whereδt = 0 for d consecutive clocks with their outputs up to d+ 3 bits for each counter, sorted according to the outputs. Note that we can generate keystream bitszt−10, zt, . . . , zt+d−8 due

to the fact that the output is not affected by the most and the least significant taps of the NLFSR.

(10)

key easily once the internal state is known. Recovering the key from a known internal state is explained in Section 3.

For each output of d+ 3 bits, we have on average 277−2d _{internal states} producing the output. We both check the validity of the internal state and recover the key bits for each candidate. On average, clocking 4 times is enough for the checking. Hence, the time complexity is 4·277−d _{= 2}79−d _{clocks which is} equivalent to 271−d _{encryptions of Sprout along with 2}d _{table lookups.}

4.1 Detailed workload for d= 40

We focus on the extreme case d= 40 (i.e., δt = 0 for 40 consecutivet values) and give the workloads in detail for this case. We need to solve the following systems of equations: a linear systemLS and a nonlinear systemN S.

LS:=                                       

lt−5 ⊕lt+12⊕lt+28⊕nt ⊕nt+11⊕nt+20= 0

lt−4 ⊕lt+13⊕lt+29⊕nt+1 ⊕nt+12⊕nt+21= 0

· · ·

lt+4 ⊕lt+21⊕lt+37⊕nt+9 ⊕nt+20⊕nt+29= 0

lt+5 ⊕lt+22⊕lt+38⊕nt+10⊕nt+21⊕nt+30= 0

· · ·

lt+33⊕lt+50⊕lt+66⊕nt+38⊕nt+49⊕nt+58= 0

lt+34⊕lt+51⊕lt+67⊕nt+39⊕nt+50⊕nt+59= 0

N S:=

                   ct

4 ⊕lt ⊕nt+40⊕g(Nt) = 0 ct₄+1 ⊕lt+1 ⊕nt+41⊕g(Nt+1) = 0

· · ·

ct₄+18⊕lt+18⊕nt+58⊕g(Nt+18) = 0

ct₄+19⊕lt+19⊕nt+59⊕g(Nt+19) = 0

First of all, we can easily write allli’s in LS as linear combinations of lj’s for t ≤ j ≤ t+ 39. Let LS0 be the new system of equations where all li’s for t −5 ≤ i ≤ t −1 and t+ 40 ≤ i ≤ t+ 67 are replaced with lj’s for t ≤j ≤t+ 39 in accordance with the LFSR feedback function. Now denoting L:= (lt, lt+1, . . . , lt+39)T and

B:= (nt⊕nt+11⊕nt+20, nt+1⊕nt+12⊕nt+21, . . . , nt+39⊕nt+50⊕nt+59)T,

we can writeLS0 asM · L=BwhereMis the 40×40 coefficient matrix oflj’s andT _{is the transpose operation.}

The sequencelt−5⊕lt+12⊕lt+28can be also produced by the LFSR of Sprout.

(11)

power of the next state matrix and hence it is invertible. We also have verified on a computer thatMis invertible. Hence,L=M−1_B_{implying we can equate}

eachlj, t≤j ≤t+ 39, to linear combinations of someni’s for t ≤i ≤t+ 59. Plugging in the values oflj’s fort≤j≤t+ 19 inN S, we end up with a system, denoted byN S0, of 20 nonlinear equations in 60 variables (ignoring the counter values for the moment). As a result, the main goal is to find all the solutions of N S0 and store them in a table with their outputs.

It’s expected that there exist 240_{solutions for the system. One approach for}

solving it may be to use a SAT solver. However, this approach is quite inefficient compared to solving a system of linear equations. Having a more detailed look at the equations inN S0, we see that by carefully guessing 40ni values, the system becomes almost linear. To do that, one possible choice for selected ni values, SEC{ni} is as follows:

nt+5 nt+6 nt+8 nt+9 nt+11nt+12nt+14nt+15nt+17nt+18

nt+19nt+21nt+22nt+23nt+25nt+26nt+27nt+29nt+30nt+31

Selection of thesenivalues mostly (but not always) follows the rule thatni+1and

ni+2 should be guessed to make anni value appear as a linear term. Fixing the

values in SEC{ni}, there still exists (at most 3) nonlinear terms nt+51nt+53,

nt+51nt+56 and nt+56nt+57 with probability 1₂. We see some nonlinear terms

when (nt+48, nt+52, nt+54, nt+55)∈S where

S :={(1,1,1,1),(0,1,1,1),(1,0,1,1),(1,1,0,1), (1,1,1,0),(0,0,1,1),(0,1,0,1),(1,1,0,0)}.

It is expected that each 40-bit guess in both cases,

(nt+48, nt+52, nt+54, nt+55)∈S and (nt+48, nt+52, nt+54, nt+55)6∈S

yields 1 solution on average, which we have verified experimentally (see Appendix B). Hence, by solving all the cases where the system is linear, we can obtain around 239 _{solutions. As a result, we can obtain half of the solutions with an}

effort of solving 1₂·240= 239systems of linear equations. If one wishes to obtain all the solutions, then in order to make the system linear, 2 more ni values (e.g., adding nt+51 and nt+57 to SEC{ni} and formingEXSEC{ni}) need to

be guessed. Hence, the effort of finding all the solutions is equivalent to solving

1 2 ·2

42_{+ 2}39 _≈ ₂41.32 _{systems of linear equations. The nonlinear cases can be}

solved using a SAT solver as well. However, this is still less efficient than guessing 2 more bits and solving a linear system.

We can repeat the computations for each candidate of the counter values. For d= 40, there are 39 different values of (ct

4, c

t+1 4 , . . . , c

t+d−21

4 ), which we will refer

(12)

Algorithm 1Creating Tables

foreach choice of 40nivalues inSEC{ni}with (nt+48, nt+52, nt+54, nt+55)∈S do

Nt←Solve(N S0)

foreachCiwhereCi’s denote possible counter arraysdo

FindNCi

t fromNtby plugging in the values inCi

LCi

t ← M

−1_{· B}

Ztt−+3210←(zt−10, . . . , zt+32) // generate keystream for 43 clocks.

Store (Ztt−+3210, N Ci t−9, L

Ci

t−9) in a tableT Ci

Z sorted byZ t+32 t−10

end for end for

foreach choice of 42nivalues inEXSEC{ni}with (nt+48, nt+52, nt+54, nt+55)6∈S

do

Nt←Solve(N S0)

foreachCiwhereCi’s denote possible counter arraysdo

FindNCi

t fromNtby plugging in the values inCi

LCi

t ← M

−1_{· B}

Zt+32

t−10←(zt−10, . . . , zt+32) // generate keystream for 43 clocks.

Store (Zt+32 t−10, N

Ci t−9, L

Ci

t−9) in a tableT Ci

Z sorted byZ t+32 t−10

end for end for

each counter array to produce 43 bits of output for each internal state. Since there are 74 possible counter arrays to produce 43 output bits, we need to store 74 separate tables. The solutions give internal states at thet-th clock. Store the internal states at time (t−9) just to avoid clocking backwards 9 times before mounting the key recovery attack in Section 3.

Algorithm 1 and Algorithm 2 summarizes the full attack. Algorithm 1 is the off-line phase of the attack, related to preparing tables for each counter value. Algorithm 2 gives the set of instructions of how to recover key or come to a contradiction in a formal way for each trial of 43-bit keystream segment. We verified by producing some test values that Algorithm 1 generates internal states ((Nt, Lt) pairs) for which (δt−9, δt−8, . . . , δt+30) = (0,0, . . . ,0) and that

Algorithm 2 finds the correct key when our assumption aboutδtis fulfilled during keystream generation (see Appendix B).

The data complexity is given as D = 240 _{bits of output, not necessarily}

produced by just one IV, and we have 74 tables of each having 240 _{rows. Each}

row contains 80-bit internal state and 3 output bits, indexed by the remaining 40 bits of the output. Hence, the memory requirementM is roughly 770 Terabytes3

(this can be reduced by storing some of the tables in cost of increased data complexity). The precomputation for creating the tables is 241.32 _{row reduction}

operations of at most 20 by 20 matrices along with producing 43 bit outputs for each solution. Since 60 ni values and all the li values are known, 33-bit

3 _{One may choose to store only 60}_n

i values to reduce the memory requirement. In

(13)

Algorithm 2Online Phase of the Attack

Take 240 keystream bits (not necessarily generated using the same IV) foreach 43-bit keystream blockdo

//Cj:= corresponding counter array for the current clock-cycle

if Keystream block exists inTCj Z then

Fill NLFSR and LFSR according to values in TCj Z

while Internal state does not produce a contradictiondo Clock cipher backwards

if keystream is in{0,1}then Check state!

else

Determine key bit value involved end if

end while end if end for

output may be generated by substituting the appropriate values in the output function without needing to clock the whole cipher. In addition, we need to clock backwards 9 times to produce the remaining 10 output bits, which brings an additional effort of about 9·240·2−8.33≈234.84encryptions. The workload of adding an output with its internal state to the appropriate table in a sorted way is negligible. The time complexity during the online phase is 240 table lookups along with 2−3·240·5·2−8.33 ≈231 encryptions. Recall that only 1/8 of the keystreams are in one of the tables. And, if a keystream is found in a table, the corresponding internal state obtained is for timet−9, but we already make use ofzt−10. Hence, a candidate is checked in 1 + 4 = 5 clocks on average.

4.2 Reducing the data complexity

Let∆d

t := (δt, δt+1, . . . , δt+d−1). We focused on the case when∆40t = (0,0, . . . ,0) for some t. However, suppose ∆40

t contains a single 1 at i-th index and ki is incorporated into the NLFSR feedback at clock-cyclet+i. In that case, we can create the tables for ki = 0 and ki = 1 separately (which would double the table size) and apply the same attack. If we do this for each possible indexi,M increases by a factor of 2·40 = 80 andDdecreases by a factor of 40. Generalizing this idea, we can create tables for each possible∆dt. If there aren1s in∆dt, M increases by a factor of 2n· d_n

whileD decreases by a factor of d_n.

5 Conclusion and Discussion

(14)

the internal state satisfying a certain property and then determined the remain-ing taps by solvremain-ing a specific system of linear equations. After storremain-ing all such internal states in tables with their outputs (we can produce some output bits without knowing the key bits, thanks to the special property that the internal states satisfy), we mount a divide-and-conquer attack to recover the key from the given output keystream bits. The complexities indicate that the attack is highly feasible. We have verified our statements by conducting several exper-iments. The detailed experimental results including an implementation of the attack with a small-scale table are given in Appendix B.

Designing ultra lightweight stream ciphers allocating less than 1K GE in hardware with a moderate security level such as 80 bits, is a new challenge, initiated by Armknecht and Mikhalev [2]. Even though their first design attempt does not achieve the required security level, we believe that several other designs will likely appear in the literature soon and some of them will probably be secure enough to be used in the industry.

We claim that producing output during each clocking of the registers is not a convenient design philosophy for ciphers whose internal state sizes are not large enough. Otherwise, they may be prone to guess-and-determine or divide-and-conquer attacks. We think that adopting the design philosophy of Grain family, particularly in bitwise operations and clockwise output generation, has brought the security failure to Sprout. The research question is about the optimization of the rate of the output generation over the register clocking in terms of security versus throughput. This can be considered as a generic question relating to all ultra lightweight stream ciphers. One straightforward countermeasure against all the attacks on Sprout may be decreasing the throughput of Sprout and giving only one bit output in, for instance, 16 clockings of Sprout registers. That output may be the sum of all the 16 bit outputs of the original Sprout.

Acknowledgments. We would like to thank anonymous reviewers for their helpful comments, especially the second reviewer who makes a comprehensive analysis of the paper and a lot of useful suggestions. We would also like to thank Ferhat Karako¸c and Güven Yücetürk for commenting on the paper.

References

1. Martin ˚Agren, Martin Hell, Thomas Johansson, and Willi Meier. Grain-128a: A new version of Grain-128 with optional authentication.Int. J. Wire. Mob. Comput., 5(1):48–59, December 2011.

2. Frederik Armknecht and Vasily Mikhalev. On lightweight stream ciphers with shorter internal states. In Gregor Leander, editor,Fast Software Encryption, vol-ume 9054 ofLecture Notes in Computer Science, pages 451–470. Springer Berlin Heidelberg, 2015.

3. Steve Babbage. Improved exhaustive search attacks on stream ciphers. Security and Detection 1995, European Convention IET, 1995.

(15)

5. Elad Barkan, Eli Biham, and Adi Shamir. Rigorous bounds on cryptanalytic time/memory tradeoffs. In Cynthia Dwork, editor, Advances in Cryptology -CRYPTO 2006, volume 4117 of Lecture Notes in Computer Science, pages 1–21. Springer Berlin Heidelberg, 2006.

6. Alex Biryukov and Adi Shamir. Cryptanalytic time/memory/data tradeoffs for stream ciphers. In Tatsuaki Okamoto, editor, Advances in Cryptology - ASI-ACRYPT 2000, 6th International Conference on the Theory and Application of Cryptology and Information Security, Kyoto, Japan, December 3-7, 2000, Proceed-ings, volume 1976 of Lecture Notes in Computer Science, pages 1–13. Springer, 2000.

7. Alex Biryukov, Adi Shamir, and David Wagner. Real time cryptanalysis of A5/1 on a PC. In Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, and Bruce Schneier, editors,Fast Software Encryption, volume 1978 ofLecture Notes in Computer Sci-ence, pages 1–18. Springer Berlin Heidelberg, 2001.

8. A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Rob-shaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An ultra-lightweight block cipher. In Pascal Paillier and Ingrid Verbauwhede, editors, Cryptographic Hardware and Embedded Systems - CHES 2007, volume 4727 ofLecture Notes in Computer Sci-ence, pages 450–466. Springer Berlin Heidelberg, 2007.

9. Julia Borghoff, Anne Canteaut, Tim G¨uneysu, Elif Bilge Kavun, Miroslav Kneze-vic, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yal¸cın. PRINCE – a low-latency block cipher for pervasive computing applications. In Xiaoyun Wang and Kazue Sako, editors, Advances in Cryptology – ASIACRYPT 2012, volume 7658 of Lecture Notes in Computer Science, pages 208–225. Springer Berlin Hei-delberg, 2012.

10. Christophe De Canni`ere, Orr Dunkelman, and Miroslav Kne˘zevi´c. KATAN and KTANTAN – a family of small and efficient hardware-oriented block ciphers. In Christophe Clavier and Kris Gaj, editors,Cryptographic Hardware and Embedded Systems - CHES 2009, volume 5747 ofLecture Notes in Computer Science, pages 272–288. Springer Berlin Heidelberg, 2009.

11. Jovan Dj. Goli´c. Cryptanalysis of alleged A5 stream cipher. In Walter Fumy, editor,Advances in Cryptology - EUROCRYPT ’97, International Conference on the Theory and Application of Cryptographic Techniques, Konstanz, Germany, May 11-15, 1997, Proceeding, volume 1233 ofLecture Notes in Computer Science, pages 239–255. Springer, 1997.

12. Yonglin Hao. A related-key chosen-iv distinguishing attack on full Sprout stream cipher. Cryptology ePrint Archive, Report 2015/231, 2015. http://eprint.iacr.org/. 13. Martin Hell, Thomas Johansson, Alexander Maximov, and Willi Meier. A stream cipher proposal: Grain-128. InInformation Theory, 2006 IEEE International Sym-posium on, pages 1614–1618, July 2006.

14. Martin Hell, Thomas Johansson, Alexander Maximov, and Willi Meier. The Grain family of stream ciphers. In Matthew Robshaw and Olivier Billet, editors, New Stream Cipher Designs, volume 4986 ofLecture Notes in Computer Science, pages 179–190. Springer Berlin Heidelberg, 2008.

15. Martin Hell, Thomas Johansson, and Willi Meier. Grain: a stream cipher for constrained environments. Int. J. Wire. Mob. Comput., 2(1):86–93, May 2007. 16. Martin E. Hellman. A cryptanalytic time-memory trade-off. IEEE Transactions

(16)

17. Ferhat Karako¸c, H¨useyin Demirci, and A. Emre Harmancı. ITUbee: A software oriented lightweight block cipher. In Gildas Avoine and Orhun Kara, editors, Lightweight Cryptography for Security and Privacy, volume 8162 ofLecture Notes in Computer Science, pages 16–27. Springer Berlin Heidelberg, 2013.

18. Virginie Lallemand and Mar´ıa Naya-Plasencia. Cryptanalysis of full Sprout. In Rosario Gennaro and Matthew Robshaw, editors, Advances in Cryptology – CRYPTO 2015, volume 9215 of Lecture Notes in Computer Science, pages 663– 682. Springer Berlin Heidelberg, 2015.

19. Subhamoy Maitra, Santanu Sarkar, Anubhab Baksi, and Pramit Dey. Key recovery from state information of Sprout: Application to cryptanalysis and fault attack. Cryptology ePrint Archive, Report 2015/236, 2015. http://eprint.iacr.org/. 20. Tomoyasu Suzaki, Kazuhiko Minematsu, Sumio Morioka, and Eita Kobayashi.

twine: A lightweight block cipher for multiple platforms. In Lars R. Knudsen and Huapeng Wu, editors,Selected Areas in Cryptography, volume 7707 ofLecture Notes in Computer Science, pages 339–354. Springer Berlin Heidelberg, 2013. 21. Wenling Wu and Lei Zhang. LBlock: A lightweight block cipher. In Javier Lopez

and Gene Tsudik, editors, Applied Cryptography and Network Security, volume 6715 of Lecture Notes in Computer Science, pages 327–344. Springer Berlin Hei-delberg, 2011.

A

Details of Sprout

k k

k h RKF

g _f

zt

NLFSR LFSR

KEY

ct

4

7

6

2 7

3 3

29

Fig. 1.The High Level Structure of Sprout

RKF in Figure 1 represents the round key function. The LFSR is clocked as lt₃₉+1 = f(Lt) = lt

0⊕lt5⊕lt15⊕lt20⊕l25t ⊕l34t and l

t+1

i = lti+1 for 0 ≤ i ≤ 38.

(17)

is XORed with the round key bitk_t∗, the counter bitct

4 and the output of the

LFSR lt=lt0. So, the feedback of the NLFSRn

t+1

39 is given as follows:

nt₃₉+1=g(Nt)⊕kt∗⊕l t

0⊕c

t

4

=k_t∗⊕l₀t⊕ct₄⊕nt₀⊕n₁₃t ⊕nt₁₉⊕nt₃₅⊕nt₃₉⊕nt₂nt₂₅ ⊕nt₃nt₅⊕n₇tnt₈⊕nt₁₄nt₂₁⊕nt₁₆nt₁₈⊕nt₂₂nt₂₄⊕nt₂₆nt₃₂ ⊕nt₃₃nt₃₆nt₃₇nt₃₈⊕n₁₀t nt₁₁nt₁₂⊕nt₂₇nt₃₀nt₃₁

where

k∗_t =kt, 0≤t≤79 k∗_t =ktmod 80·(lt4⊕l

t

21⊕l

t

37⊕n

t

9⊕n

t

20⊕n

t

29)

withδt:=lt4⊕lt21⊕l37t ⊕nt9⊕nt20⊕nt29. Once the internal state is determined

for clock-cyclet, the keystream bitzt is generated as follows:

zt=nt4l

t

6⊕l

t

8l

t

10⊕l

t

32l

t

17⊕l

t

19l

t

23⊕n

t 4l t 32n t 38

⊕lt₃₀⊕nt₁⊕nt₆⊕nt₁₅⊕nt₁₇⊕nt₂₃⊕nt₂₈⊕nt₃₄

Initialization phase: The feedback registers are initialized as follows:

ni =ivi, for 0≤i≤39 li =iv40+i,for 0≤i≤29 li = 1, for 30≤i≤38 l39= 0

After filling the registers, the cipher is run 320 clocks without producing keystream while the output zt is fed back into both feedback registers such that lt39+1 =

zt⊕f(Lt) andnt39+1=zt⊕k∗t⊕lt0⊕ct4⊕g(Nt). The keystream generator starts generating the keystream after the initialization phase.

The designers of Sprout suggest to generate up to 240 _{keystream bits with one}

(key, IV) pair.

Workload of exhaustive search:To exhaustively search a key, one has to run the initialization phase first (320 clocks), and then generate 80 bits of keystream for a unique match. However, since each keystream bit generated matches the corresponding actual keystream bit one with probability 1₂, 280keys are tried for 1 clock and roughly half of them are eliminated, 279for one more clock and half of the remaining keys are eliminated, and so on. Hence, the average number of clocks per one trial after the initialization step among 280 keys is

79 X

i=0

280−i 280 =

79 X

i=0

1 2i ≈2

As a result, we will assume that clocking the registers once will cost roughly

1 322 ≈2

(18)

B

Experiments

We did not implement the whole attack due to our memory shortage. Instead, we have conducted several experiments to verify our attack. We have solved millions of the systems of linear equations to collect some of the special internal states where δt = 0 for consecutive 40 values of t and stored the solutions in tables. Then, we have accomplished to recover an internal state in the table with the key used in the given keystream sequence.

First of all, we performed a run test onδt values to check if the probability that 40 consecutiveδtvalues vanish simultaneously is around 2−40for the Sprout internals. We chose 30 random (IV, K) pairs, and run the cipher 240 _{clocks for}

each selection. We depict the average number of runs having length i in Table 3. The values for 10 ≤ i ≤ 21 are not given in the table since they are too big to display. However, we can simply say that the empirical result for any 10≤i≤21 does not deviate from the corresponding expected value more than 0.0678 percent.

i i+ 1 i+ 2

Empirical Expected Empirical Expected Empirical Expected

i= 22 65596.66 65536 32802.60 32768 16398.17 16384

i= 25 8202.73 8192 4114.03 4096 2042.5 2048

i= 28 1025.9 1024 509.9 512 259.4 256

i= 31 127.8 128 64.2 64 31.33 32

i= 34 17.63 16 7.7 8 4.13 4

i= 37 1.83 2 1.2 1 0.400 0.500

i= 40 0.333 0.250 0.133 0.125 0 0.062

i= 43 0 0.031 0.067 0.016 0.033 0.008

Table 3.Comparison between the empirical results and the expected number of runs.

Moreover, we have implemented the attack with a small-scale table in Sage 6.5. We found more than 5 million solutions for the system N S0 and generated tables for different counter values. The total size of all the tables is about 215 MB. For a randomly chosen state in the tables, we have found a valid (IV, K) pair generating the state. Then, we mounted the attack on the corresponding keystream sequence and successfully recovered the key. The (IV, K) pair was

IV = 1101010110100000111010011001011101111010 011011100110110011000001001010

K= 1000100100011001111111110000011000111110 0011001110100100010110100010000100101001

(19)

internal state at timet was

Lt= 0100111111110010000010111111011100101111 Nt= 1111011001011111111011111001000101100000

which is in one of the tables. Running the key recovery attack, we have found 62 key bits in 160 clocks, verifying our statements. The remaining bits can be recovered by an exhaustive search.

We also verified the average number of clocks iterated before arriving at a contradiction during the internal-state-check mechanism. We chose 1 million random internal states and ran the key recovery attack for each of them. Each state was clocked backwards once before starting the test since the output at this clock is definitely a check bit. Table 4 shows how many states were eliminated at each clocki. The average number of clocks run for a candidate was about 3.999, as expected (See Section 3).

Clock-cycle number # of states eliminated ati-th clock

i= 1 250071

i= 2 187325

i= 3 140862

i= 4 105620

i= 5 79061

i= 6 59067

i= 7 44714

i= 8 33287

i= 9 25044

i= 10 18590

Average # of

clocks run 3.999· · ·

Table 4.Number of state candidates eliminated at each clocki.

Another experiment was about verifying the assumption that there exists approximately 240 _{solutions for the system} _{N S}0 _{in total. We have solved the}

system 16000 times for each case when

(nt+48, nt+52, nt+54, nt+55)∈S and (nt+48, nt+52, nt+54, nt+55)6∈S.

(20)

# of solutions # of eqns withisolns when (nt+48, nt+52, nt+54, nt+55)∈S

# of eqns withisolns when (nt+48, nt+52, nt+54, nt+55)6∈S

i= 0 6049 6478

i= 1 5198 4376

i= 2 3704 4637

i= 3 671 0

i= 4 344 501

i= 5 9 0

i= 6 20 0