Parametric Trojans for Fault-Injection Attacks on Cryptographic Hardware

(1)

Parametric Trojans for Fault-Injection Attacks on

Cryptographic Hardware

Raghavan Kumar$, Philipp Jovanovice, Wayne Burleson$ and Ilia Poliane

$_{University of Massachusetts Amherst, 01002, USA} e_{University of Passau, 94032, Germany}

{rkumar|burleson}@ecs.umass.edu, {philipp.jovanovic|ilia.polian}@uni-passau.de

Abstract—We propose two extremely stealthy hardware Trojans that facilitate fault-injection attacks in crypto-graphic blocks. The Trojans are carefully inserted to modify the electrical characteristics of predetermined transistors in a circuit by altering parameters such as doping concentration and dopant area. These Trojans are activated with very low probability under the presence of a slightly reduced supply voltage (0.001 for 20% Vdd reduction). We demonstrate the effectiveness of the

Trojans by utilizing them to inject faults into an ASIC implementation of the recently introduced lightweight cipher PRINCE. Full circuit-level simulation followed by differential cryptanalysis demonstrate that the secret key can be reconstructed after around 5 fault-injections.

Keywords-fault-based cryptanalysis, fault injection, hardware Trojans

I. INTRODUCTION

Hardware Trojans are malicious modifications of a circuit by an untrusted third-party manufacturer that aim at manipulating its behavior in an undesired manner [1]. A Trojan may deactivate the circuit (denial-of-service), change its functionality, or establish a hidden side chan-nel through which protected secret information processed by the circuit is leaked. Trojans may be activated by external events (e. g., applying a specific combination of logic values to the circuit inputs) or internal events (for instance, a counter reaching a certain value). In general, Trojans are designed to be stealthy, that is, to escape detection by methods such as testing [2], optical inspection [3] or side-channel analysis [1].

In this paper, we introduce two types of Trojans that are optimized for fault-based attacks on circuits that implement cryptographic functions. In the course of a fault-based attack [4], physical disturbances are introduced into the circuit when it runs the cryptographic algorithm. The fault-affected output values are collected, and differential cryptanalysis is used to derive the secret key. Recently, a number of highly efficient attacks on

state-of-the-art ciphers including AES [5], LED [6], [7], PRESENT [8] and PRINCE [7], [9] have been reported. These attacks need a small number of fault injections (1 for AES and LED64, around 3–4 for LED128 and PRINCE) for successful key recovery. However, fault injection must be precise: both the location and the time of the disturbance have to be well-controlled. Low-cost fault-injection techniques like Vdd reduction or clock

manipulation do not achieve the required accuracy, while highly precise methods such as pinpointed irradiation of desired fault sites by intensive laser light are difficult to perform and require costly equipment [10].

We call the Trojans introduced in this work

MAnufacturing-Process-LEvel Trojans, or MAPLE Tro-jans. They are based on manipulating the voltage-transfer characteristics (VTC) of a specific gate in the circuit by changing the doping concentration or reducing the dopant area within the active area of its transistors. The idea of dopant-level Trojans was recently introduced in [11] and used to manipulate both the functionality and the non-functional properties of the affected gates and memory elements. Our MAPLE Trojans inherit some properties of the Trojan from [11]; in particular, the layout of the circuit is not changed and the Trojans are nearly impossible to detect by optical inspection. However, the Trojans in [11] resulted in a deterministic change of the affected gate’s function, similar to the effect of a stuck-at fault, and can be detected by testing.1

In contrast, our Trojans only slightly shift the Vin-Vout

characteristic of the gate. The resulting faults are rare and therefore highly unlikely to be detected during testing, while at the same time sufficient for a successful fault-based attack.

We demonstrate the fault-based attack on the recent

1

(2)

lightweight cipher PRINCE [12]. The attack [7] works in multiple stages and requires two or three fault injec-tions into the 8th round (out of 10) and several further (between 2 and 11, roughly 3 on average) fault injections into the 9th round of PRINCE. In order to be exploitable for the attack, the injected faults must adhere to several conditions, and the faults in the 8th and the 9th round maynotbe present at the same time. MAPLE Trojans are applied to three inverters that belong to the 8th round of PRINCE and three further inverters from the 9th round. Trojans are activated by a slight reduction of Vdd, with

the probability of activation being around 10−5 for 10% reduction and around 10−3 for 20% reduction.

The multi-stage attack is particularly challenging for the MAPLE Trojan insertion because a fault may be injected in the 8th and in the 9th round at the same time and such double faults are not exploitable. We demonstrate the feasibility of the attack using both a combinational (one clock cycle per encryption) and a se-quential (one cycle per round, 10 cycles per encryption) implementation of the circuit. We simulate the PRINCE circuit incorporating the manipulated six inverters on electrical level and hand the observed outputs to a soft-ware routine that performs the differential cryptanalysis and derives the secret key. In the combinational circuit, the percentage of exploitable faults is around 10%. This means that around 10,000 to 1,000,000 encryptions are sufficient to obtain an exploitable fault, depending on the used amount of Vdd reduction. This is clearly

feasible as one encryption takes one clock cycle. In the sequential implementation, the correlation is much lower, the percentage of exploitable faults is higher but the duration of one encryption is longer, so the overall effort is comparable to the combinational case.

The remainder of the paper is organized as follows. Background on Trojans is provided in the next section. Section III provides details on the MAPLE Trojans. The threat model, detectability methods and the possible countermeasures are provided in section IV. Section V explains the fault attack on the PRINCE circuit. Results are reported in Section VI. Section VII concludes the paper.

II. HARDWARE TROJANS ANDTHEIRDETECTION

Apart from register transfer level (RTL) modifications, Trojans can also be inserted to an IC by the malicious foundry. As the foundry doesn’t have access to the RTL code, the modifications are made in the layout mask and process level [11], [13]. The Trojans proposed in [13] affect the reliability of CMOS circuits by accelerating

wear-out mechanisms such as Negative Bias Tempera-ture Instability (NBTI) and Hot Carrier Injection (HCI) effects. To insert the Trojans, the manufacturing process conditions are slightly altered. The recently reported layout-level Trojans in [11] involve modification of the dopant polarities in the active-area of a transistor and are extremely difficult to discover by functional testing. By applying the Trojan to a random number generator (RNG) from Intel, the authors demonstrate that the entropy of the RNG can be reduced but in a way that is not noticed by the on-chip self-test block.

Most of the hardware Trojans inserted into an IC can be detected using either side-channel or activation mechanisms [1]. As the Trojans typically have para-metric effects on the circuit including changes in delay and power consumption profile, they can be detected by power- and timing-based side-channel analysis (SCA) [2]. These techniques require comparison of the mod-ified chip with a golden chip. So, the effectiveness of these techniques are often affected by increasing process variations in integrated circuits. An IC can have millions of paths, which can make timing-based SCA impractical. Another way to detect Trojans is by functional testing, in which the Trojans are activated by test patterns. One of the problems involved in functional testing is the lack of information about the inserted Trojan, which can increase the complexity involved in test pattern generation.

III. MAPLE TROJANS

In this section, we present efficient techniques for inserting Trojans to an IC to facilitate fault attacks. The proposed hardware Trojans are based on the modification of a logic gate’s electrical characteristics in such a way that the metal, polysilicon layer and active area remain unchanged. This is essential to render them hard to detect by optical inspection, as changes in the above mentioned layers can be detected reliably by inspection [11]. To alter the gate’s electrical characteristics, we exploit doping concentration and dopant area within the active area of a transistor. As the proposed Trojans are either in manufacturing level or layout-level and also introduce changes in electrical characteristics, they fall under the category ofparametric Trojans [1]. We use an inverter as the target logic gate to explain the proposed Trojan insertion techniques.

A. Doping Concentration Manipulation

In this hardware Trojan insertion technique, the Vin

(3)

target logic gate is modified by changing the doping concentration in the channel region. The channel doping concentration of the n-MOS and p-MOS transistors is one of the major factors that determine the threshold voltage of the transistors and therefore the VTC of the logic gate. The malicious foundry could create a Trojan gate, e.g., an inverter with a manipulated VTC, by reducing the doping concentration. This is illustrated in Figure 1. As shown in Figure 4, the threshold voltages of the transistors in an inverter are increased and the inverter exhibits a reduced voltage swing. This results in a shift in switching threshold. We call this type of gate modification TrojanConc and denote the switching threshold of the Trojan gate by Vm. We define switching

threshold as the input voltage (Vin) at which the output

voltage (Vout) is around 0.5 V, although TrojanConc

exhibits a reduced output voltage swing. The shift inVm

will have a significant impact on the gate being driven by the Trojan as explained in section III-C.

Fig. 1. Cross-sectional view of (a) original inverter and (b) Trojan

inverter using doping concentration manipulation

B. Dopant Area Manipulation

A different Trojan inverter can be created by reducing the dopant area within the active area of a transistor [11]. This effectively weakens a transistor and pushes the VTC towards the weakened transistor. For example, by reducing the dopant area of the n-MOS transistor in an inverter, a Trojan with a VTC shown in Figure 4 can be created and we call this type of gate modification as

TrojanArea. A similar Trojan inverter can be created by reducing the dopant area within the p-MOS transistor. The layouts of the normal inverter and TrojanArea are shown in Figure 3. The shift in switching voltage (Vm)

for TrojanAreais shown in Figure 4. In our analysis, we observed that the maximum shift inVmfor the Trojans is

significantly higher than the shift observed due to process variations (3σ ≈150 mV for 45 nm CMOS technology [14]). The resulting VTCs with process variations (both the maximum and minimum bounds) are shown in

Fig-ure 5. Note that the VTC of the Trojan is far from the shifted VTCs due to process variations and hence the faults induced into the circuit will be most likely from the Trojans at slightly reduced Vdd.

Fig. 2. Layout of (a) original inverter and (b) Trojan inverter using

dopant polarity manipulation from [11]

Fig. 3. Layout of (a) original inverter and (b) Trojan inverter using

dopant area manipulation

C. Trojan Activation

The proposed Trojans are activated in the presence of slightly reduced supply voltage. If the supply voltage is noisy enough, then the Trojan inverter will flip its state, when the supply voltage crosses the switching threshold. To ensure that only the Trojan gate flips the state, the switching threshold of the Trojan should be pushed far away from the original threshold. The switching probability ortrigger factor under variousVdd

(4)

0 0.5 1

0 0.2 0.4 0.6 0.8 1

V

out

V_in ∆V_m

No Trojan TrojanConc TrojanArea Dopant Trojan[11]

Fig. 4. Electrical characteristics of the unmodified and Trojan

inverters

0 0.2 0.4 0.6 0.8 1

0 0.2 0.4 0.6 0.8 1

Vout

V_in

Maximum bound Trojan Nominal VTC Minimum Bound

Fig. 5. Impact of process variations on Nominal VTC and the Trojan VTC

model the noisy supply voltage as a Gaussian distribution with a 3σ deviation of ±10% of the meanVdd. We can

infer that the trigger factor is sufficiently low (<10−6) when Vdd is reduced from its nominal value of 1 V to

0.9 V. This means that the Trojan inverter approximately produces a faulty bit (bit-flip) with a probability of10−6. Note that the activation of the MAPLE Trojans pro-posed in this paper is fundamentally different from the technique in [11], even though it works on process level. The manipulation from [11] is based on a complete change of dopant polarities and results in a “stuck-at” behavior, that is, constant voltage at the inverter output independent of the voltage at its input (see the Trojan in Figure 2 and VTC in Figure 4). This behavior results in deterministic fault injection which is detectable by func-tional testing. In contrast, the MAPLE Trojans from this section are activated stochastically, with low probability controlled by the extent of manipulation (actual change in dopant area or doping concentration). This allows fault injection with a rate that is still sufficient for practical

0.0001 0.001 0.01 0.1 1 10 100

0.65 0.7 0.75 0.8 0.85 0.9 0.95

Trigger factor (%)

V_dd

TrojanConc TrojanArea

Fig. 6. Triggering factor of the Trojan inverters

cryptanalysis but makes the detection infeasible, as dis-cussed in Section IV. Also, [11] discusses about Dopant-area Trojans from side-channel perspective (reducing dopant area within the active area of a transistor) and we focus only on transient fault-injection rather than a side-channel perspective. Conventional functional test methods are based on the notion of fault coverage: a fault that did not manifest itself during test is considered to be absent from the circuit. This approach is not applicable in case of non-deterministic faults which show up with low probability, such as the faults due to MAPLE Trojans. We are not aware of earlier Trojan insertion techniques that employ probabilistic behavior to resist detection while still being useful for an actual attack.

As the main objective of our Trojans is to facilitate fault-based attacks in cryptographic circuits, we apply the presented Trojan insertion techniques over a recently introduced block cipher known as PRINCE [12]. The details on fault injection and cryptanalysis are presented in Section V. In the next section, we outline the threat model and discuss possible countermeasures.

IV. THREATMODEL ANDCOUNTERMEASURES

(5)

“back-doors” in security-relevant IP and could use MAPLE Trojans to implement such backdoors. In the following section, we outline the capabilities needed by the manu-facturer and by the attacker to exploit MAPLE Trojans, their detectability and countermeasures.

A. Threat Model

In order to manipulate gates using TrojanConc or

TrojanArea techniques, the malicious manufacturer has to be able to use masks that deviate from the GDS II layout files received from the circuit designer. This is not a limitation, as most manufacturers perform optical proximity correction and other post-processing in order to improve the manufacturability of the circuit, and these modifications are not communicated back to the designer. Realizing TrojanArea requires a slight modifi-cation of one mask and can be done in a very stealthy way (only the engineer working with the mask will know about the modification). ImplementingTrojanConc

requires a substantial modification of dopant concentra-tion. This is technically feasible: the doped areas of the manipulated gates must undergo implanting for a different amount of time than the doped area of all other gates. The same approach is used to implement low-Vt and high-low-Vt transistors within the same design, even though the concentrations used for TrojanConc are far outside the regular specifications (doping concentrations have been reduced by 1000x for synthesizing the Trojan in Figure 4). An additional set of masks and new process steps which are not performed for a Trojan-free circuit are needed to implement TrojanConc. Therefore, the effort is more significant (though realistic), and more employees will know about the manipulation.

State-of-the-art fault-injection attacks require precise knowledge of the location of the fault, that is, the manipulated logic gate(s). For example, in the case study used later in this paper, six inverters known to be driving specific state bits are manipulated while all other gates are left untouched. In this paper, we follow Kerckhoff’s principle, i.e. “the enemy knows the system”, which is usually assumed in security analysis. In practice, the ma-licious manufacturer may not have a functional gate-level description of the circuit and will first have to find out the location of the gate(s) to be manipulated within the GDS II layout. This is a classical reverse-engineering problem which can be solved given sufficient resources. The circuit designer can complicate reverse-engineering by using obfuscation techniques. In general, a manufacturer who knows that the circuit must include a cryptographic block should be able to locate characteristic structures

of such a block within the larger layout.

After the Trojan-affected circuit has been manufac-tured, mounting the attack involves simply running the circuit at a slightly lowered Vdd. As pointed out in

Section III-C, the Trojan will trigger with a rather low probability of < 10−6 for 10% Vdd reduction. This

means that the operation has to be repeated several thou-sand times until a fault-injection takes place. Therefore, the attacker needs the capability to control the Vdd and

to repeat the computation multiple times assuming that the same secret key is used. These capabilities do not require any non-trivial equipment or skills.

B. Detection

The key characteristic of the MAPLE Trojans is their ultra-low detectability by all known means: functional testing, side-channel analysis, and visual inspection. We elaborate on the detectability aspects in detail.

1) Functional testing under nominal Vdd: Since the

Trojans are not activated under nominalVdd, they cannot

be detected during regular post-manufacturing testing.

2) Functional testing under slightly reduced Vdd:

It appears promising to detect the Trojans under their activation conditions, namely slightly reducedVdd.

Low-voltage testing is feasible and is often performed in practice [15]. However, recall that the activation of the Trojans is probabilistic and the likelihood of activation (trigger factor) is as low as10−6 _{for 10%}_V

ddreduction.

It is acceptable for the attacker, who knows the location of the Trojan gate and its expected trigger factor, to perform the well-defined attack several thousand times. In contrast, the party who runs the test (e. g., the system integrator) neither knows whether a MAPLE Trojan or any other Trojan is present in the circuit at all, nor where it might be located, nor whether the trigger factor is

<10−5_,_<₁₀−4_, _<₁₀−6 _{or any other value. Repeating}

the test several thousand times based just on vague suspicions is generally incompatible with economical aspects. Even if the test is applied and no fault is observed, it is not clear whether the circuit is Trojan-free or a Trojan is present but its trigger factor is so low that more repetitions are required. Finally, if several thousands of tests are indeed performed and a faulty effect is observed, it is easily confused with a random transient fault due to noise or radiation, rather than a Trojan.

3) Functional testing under significantly reducedVdd:

(6)

increased by further lowering the operating voltage. However, ifVddis lowered far beyond its nominal value,

the switching delays of the gates on the circuit paths will be increased and the circuit will exhibit regular timing errors. Consequently, if failures are observed, it does not appear feasible to distinguish the effects of the activated Trojan from that of regular failures.

4) Side-channel analysis (parametric test): The ma-nipulated logic gates have a parametric response that is clearly different from regular gates. If one manip-ulated gate would be considered in isolation, it would be easily distinguished from non-manipulated gates by, for instance, IDDQ testing. This is true forTrojanConc

and, in particular, TrojanArea. However, the number of manipulated gates will be low: for example, only six inverters are affected in the case study in this paper, and that block will be likely integrated into an even larger circuit. Therefore, the effect of few manipulated gates will be negligible compared withIDDQdrawn elsewhere

on the chip and cannot be measured in presence of even minimal variability.

In order to illustrate low detectability of MAPLE Trojans by IDDQ testing, we simulated the circuit of

cipher PRINCE used in Section V-D. Figure 7 shows the current drawn from the supply during ten encryptions using random key and plaintext. The peak in the profile corresponds to the latching of key and plaintext into the corresponding registers right before the start of the encryption. The Trojan inverters inserted into PRINCE are under operation between the time duration 8.7 and 8.75ns, and this period is magnified in Figure 8. The current profile shows the supply drawn current for both the unmodified (“Normal”) and Trojan inverters for the possible input patterns (0 → 1 and 1 → 0). It can be seen from the figure that all deviations in drawn current are absolutely minimal and will likely be masked by the IDDQ drawn in other parts of the chip. Note that

the process variations have not been accounted in this analysis and they can further impede the side-channel analysis using IDDQ profiles.

5) Optical inspection: The Trojans are highly im-mune to detection by optical inspection, as the metal, polysilicon and active area remain unchanged [3].

C. Other countermeasures

1) On-chip voltage detectors: Some cryptographic devices are protected by voltage detectors in order to identify fault injection by Vdd manipulation. However,

the activation of the presented Trojans requires only very moderate Vdd reduction to values that are routinely

-0.2 -0.15 -0.1 -0.05 0 0.05 0.1

0 2 4 6 8 10 12

i(V

DD

) A

Time (ns)

Fig. 7. Supply drawn current for PRINCE

-0.0005 -0.0004 -0.0003 -0.0002 -0.0001 0 0.0001 0.0002 0.0003

8700 8710 8720 8730 8740 8750

i(V

DD

) A

Time (ps)

Normal(0->1) Trojan(0->1) Normal(1->0) Trojan(1->0)

Fig. 8. Supply drawn currents for the normal and Trojan inverters

observed in regular operation due to power-supply noise [16]. Consequently, voltage detectors will have to tol-erate power-supply voltages of 10 to 20% below the nominalVdd, which are sufficient for Trojan activation. If

there are no voltage detectors in the circuit, the attacker can increase the probability of activation by further lowering Vdd, as can be inferred from Figure 6.

(7)

P

k0 k1

RC0 R1

k1

RC1

. . . R5

k1

RC5

S M’ S-1 _R-1 6

k1

RC6

. . . R-1 10

k1

RC10_RC 11

k1 k2

C

M S

RCik1

M-1_S-1

RCj

k1

Fig. 9. Layout of PRINCE.

3) Frequent key exchange: The repeated encryptions must all use the same secret key, in order to gather consistent data for cryptanalysis. If the key is frequently exchanged, the attacker may not succeed in performing a sufficient number of encryptions using the Trojan-affected circuit to break the cipher. Moreover, if the attacker does determine the key, this key is only useful to access data that was being processed before the last key exchange. In other words, the attacker could log the encrypted data and, once the key has been determined, apply this key to decrypt these data. As soon as a new key has been generated, a new round of cryptanalysis is required. Note, however, that key distribution may be interrupted or compromised in a scenario where the attacker has physical access to the circuit. This system-level countermeasure also does not identify a Trojan but only alleviates its effect if one is present. Moreover, frequent key distribution can be costly and security is generally traded off for efficiency.

V. TROJAN FAULTATTACK ONPRINCE

In this section we describe the block cipher PRINCE, as specified in [12], report on our hardware design of the cipher, outline the fault-based cryptanalysis of PRINCE using the Multi-Stage Fault Attack algorithm [7] and explain the fault-injection techniques using the proposed MAPLE Trojans.

A. Specification of PRINCE

PRINCE is a 64-bit block cipher with a 128-bit key. Before an encryption (or decryption) is executed, a 64-bit subkey k2 is derived from the user supplied 128-bit

key k0 kk1 via k2 = (k0 ≫1)⊕(k063), where

and ≫ denote non-cyclic and cyclic shift, respectively. The subkeys k0 and k2 are used for input- and

output-whitening. The core of PRINCE is a 10-round block cipher which solely uses k1 as subkey. Figure 9 gives an

overview of the cipher.

Each round Ri and R−1_j with i ∈ {1, . . . ,5} and

j ∈ {6, . . . ,10} consists of four operations: a key addition; an S-layer that applies a combinational SBoxS

shown in Figure 10a; a linear layer which multiplies the

x 0 1 2 3 4 5 6 7 8 9 A B C D E F

S[x] B F 3 2 A C 9 1 6 7 8 0 E 5 D 4

(a)

i Round constantRCi

0–2 0000000000000000,13198a2e03707344,a4093822299f31d0,

3–5 082efa98ec4e6c89,452821e638d01377,be5466cf34e90c6c,

6–8 7ef84f78fd955cb1,85840851f1ac43aa,c882d32f25323c54,

9–11 64a51195e0e3610d,d3b5a399ca0c2399,c0ac29b7c97c50dd

(b)

Fig. 10. PRINCE SBox (a), round constants (b)

state (represented by a 64-bit row vector) by a matrix

M or M0 _{(see the original specification [12] for the}

exact definitions ofM andM0_{); and the addition (bitwise}

XOR) of a round constant (see Figure 10b).

B. Hardware Implementation of PRINCE

We designed a combinational (fully unrolled) imple-mentation of PRINCE that executes a complete encryp-tion or decrypencryp-tion in one round, and a sequential version with one cycle per round. We implemented the circuit using Synopsys DC Compiler and Nangate Open Cell Library(45nm). The gate count and power details of the combinational version, which includes 16 identical SBoxes, are presented in Table I and shown to be highly competitive with the original circuit reported in [12]. We use this circuit for Trojan-based fault injection and cryptanalysis.

TABLE I

ASICIMPLEMENTATION DETAILS FORPRINCE (ENCRYPTION

AND DECRYPTION)

Original design Our design

Area (GE) 8260 8320

Power (mW) 38.5 41.2

C. Differential Fault Analysis of PRINCE

(8)

k1

RCi

SR-1 M’ S-1, k1

RCi+1

SR-1 M’ S-1, k1

RCi+2 r

f f ϕ(f)

0 ϕ(f) 1 ϕ(f) 2 ϕ(f) 3

r + 1

w x y z w x y z ϕ(w) 0 ϕ(w) 1 ϕ(w) 2 ϕ(w) 3 ϕ(x) 2 ϕ(x) 3 ϕ(x) 0 ϕ(x) 1 ϕ(y) 3 ϕ(y) 0 ϕ(y) 1 ϕ(y) 2 ϕ(z) 3 ϕ(z) 0 ϕ(z) 1 ϕ(z) 2

Fig. 11. Fault propagation in PRINCE over twoR−1 rounds [7]

stages are outlined below; refer to [7] for an in-depth discussion.

Stage 0 aims at restricting the keyspace for the expression (k1 ⊕ k2). The fault is injected into the

state of the cipher in the beginning of round R−1₉ . The general scheme of the attack does not specify the means of fault injection, but in this paper we employ the MAPLE Trojans, as described in Section V-D. The state of PRINCE is organized in 4-bit nibbles, and the attack requires that the fault injection is restricted to one nibble, that is, an arbitrary number of bits in the nibble have to be flipped, but all other nibbles must remain unaffected. Based on the obtained faulty ciphertext pair

C0 _{and the fault-free ciphertext} _C_{, the fault equations}

are constructed as follows.

Let i ∈ {0, . . . ,15}, and let vi and vi0 be variables

representing the ith _{nibble of the correct and the faulty}

ciphertext, respectively. Let the variables pi represent

the nibbles of the key and qi the nibbles of the round

constant RC11. The ith nibble of the state of the

fault-free circuit just before the application of the final SBox

S−1 _{of round} _R−1

10 is S(vi ⊕ pi ⊕qi). The Boolean

difference of each such nibble between the fault-free and the faulty circuit isS(vi⊕pi⊕qi)⊕S(v0i⊕pi⊕qi). On the

other hand, if the fault f is injected into the circuit state before the beginning of round R−1₉ , the corresponding Boolean difference of this state will be simply f. If the fault is restricted to one nibble l, it is possible to symbolically propagate the Boolean difference through round R−1₉ to the step in round R−1₁₀ just before SBox application. It can be shown that the value of nibble

i in that state has the shape ϕji(a) for some 4-bit value a = b0 k b1 k b2 k b3. Here, ϕji(a) is equal to a except for the jth

i bit which is set to 0 (e. g.,

ϕ2(a) = b0 k b1 k 0 k b3). The values of indices ji

are derived from the nibble l into which the fault has been injected using the propertis of the matrixM0_{. Their}

values are shown below:

(j0, . . . , j15) =

        

(0,1,2,3,2,3,0,1,3,0,1,2,3,0,1,2), ifl∈ {0,7,10,13}

(3,0,1,2,1,2,3,0,2,3,0,1,2,3,0,1), ifl∈ {1,4,11,14}

(2,3,0,1,0,1,2,3,1,2,3,0,1,2,3,0), ifl∈ {2,5,8,15}

(1,2,3,0,3,0,1,2,0,1,2,3,0,1,2,3), ifl∈ {3,6,8,12}

Sixteen fault equations are obtained by propagating the Boolean difference of the observed fault-free and faulty ciphertext from the circuit outputs backward to

R₁₀−1 and equating them with the Boolean difference f

due to the fault injection propagated forward to the same round as shown in Figure 11:

S(vi⊕pi⊕qi)⊕S(v0i⊕pi⊕qi) =         

ϕ_ji(w), i∈ {0, . . . ,3}

ϕ_ji(x), i∈ {4, . . . ,7}

ϕ_ji(y), i∈ {8, . . . ,11}

ϕ_ji(z), i∈ {12, . . . ,15}

Here,w,x,yandzare unknown (free) 4-bit variables. A solution of this system of 16 fault equations contains these four variables and 16 nibbles of the secret key

p1, . . . , p16. One of the solutions is guaranteed to be the

correct secret key. The keyspace for(k1⊕k2)is restricted

from 264 _{possible values in the beginning by excluding}

values that are inconsistent with the fault equations. This filtering is done in three steps, Evaluation, Inner FilteringandOuter Filtering; the details are omitted here and can be looked up in the Appendix (Section VIII). The size of the restricted keyspace is compared with a user-defined threshold. If the threshold is exceeded, another fault injection followed by another round of filtering is applied. Otherwise, stage 0 is finished and stage 1 starts with a set of key candidates for(k1⊕k2)

as input.

Stage 1 aims at shrinking the keyspace for k1. It

is performed similar to stage 1, with the following distinctions. The fault is injected in the beginning of roundR−1₈ (instead ofR−1₉ ) and the faulty ciphertextC0

is obtained. The analysis is repeated for each (k1⊕k2)

candidate calculated in stage 0. For each such can-didate, the last operations are “peeled off” by using

M(S(C⊕k1⊕k2⊕RC11))⊕RC10 instead of C and

M(S(C0⊕k1⊕k2⊕RC11))⊕RC10instead ofC0. Then,

the same fault equations are constructed and filtering is applied. When the keyspace falls below a user-defined threshold, brute-force search is applied for all remaining

k1 values using the k0 part of the key calculated from

(k1⊕k2). If brute-force search is unsuccessful, the guess

of (k1 ⊕k2) was wrong and stage 1 is repeated with

(9)

D. Fault Injections using MAPLE Trojans

The attack outlined in the previous section requires fault injection in single nibbles of roundsR−1₈ andR−1₉ . Faulty ciphertexts obtained from faults inR−1₉ (R−1₈ ) are used during stage 0 (stage 1) of the algorithm. As the round constants of PRINCE can be implemented through inverters, they are an ideal target for MAPLE Trojans. In order to maximize the probability of a fault injection we choose those nibbles of a round constant with a high number of inverters, namely nibble 5 from RC8 and

nibble 16 from RC9. As can be seen in Figure 10b,

these nibbles have the value d and therefore there are3 Trojan inverters for stage 0 and stage 1.

RC8 M−1 _S−1

k1 c882d32f25323c54

. . .

nibble 6 nibble 5

nibble 5 nibble 4

T

regular

Trojans

Fig. 12. Trojan insertion inRC8 of PRINCE for injecting stage1

faults

The insertion of Trojan inverters into the design is illustrated in Figure 12. In the upper part the layout of round 8is depicted. The lower part shows the section of the round constants (RC8) addition at bit level, where

the MAPLE Trojans are inserted into the nibble with value d. The Trojan inverters are coloured in black and marked with a white “T”. It is important to note that faults may be injected in both stages simultaneously most of the time. However, at certain times the faults occur solely in one of the two stages. In that case we capture the resulting faulty ciphertexts and use them later for our cryptanalysis. Figure 13 shows the trigger factor of the TrojanConcinverters inserted into the combinational version of PRINCE. We observed a similar behaviour forTrojanArea inverters. The probabilities correspond to the case where the faults occur only in one of the two stages and not in both simultaneously.

E. Impact of Sequential Design

All the discussions above are for a combinational design of the PRINCE cipher, which can perform encryp-tions and decrypencryp-tions in a single clock cycle. In order

1e-05 0.0001 0.001 0.01 0.1 1 10 100

0.65 0.7 0.75 0.8 0.85 0.9 0.95

Trigger factor (%)

V_dd

Combinational design Sequential design

Fig. 13. Triggering factor of the Trojan inverters inserted into

PRINCE

to evaluate the trigger factor in a sequential design, we implemented PRINCE such that each round in the cipher takes one clock cycle, with a total of ten clock cycles for encryption and decryption. Though it takes more clock cycles, the frequency of regular operation (∼1.7 GHz) is much higher than the combinational design (∼150 MHz). As the sequential design allows an attacker to manip-ulate the supply voltage over a particular clock cycle, the number of simultaneous bit-flips can be reduced. However, to achieve higher trigger factor, the frequency must be reduced, as switching the supply voltage from one level to another takes considerable amount of time. Recent figures indicate that switching the voltage by ±0.1V takes around 20ns [17]. So, when the cipher is under attack it should be operated at a much lower frequency than the maximum frequency possible in the sequential design. Also, the sequential design does not completely eliminate the simultaneous bit-flips, albeit their likelihood is much lower (∼5%). The activation probabilities ofTrojanConcinserted in sequential design of PRINCE are shown in Figure 13.

VI. EXPERIMENTALRESULTS

In this part we report on the experimental results of the differential fault analysis of PRINCE using MAPLE Trojans. The analysis was executed on a workstation with an AMD Opteron Processor 6172 operating at 2.1GHz. The attack was executed 10000times, in each case on a data set of 50 triples (C, C0, C00) where C

(10)

MAPLE Trojans using SynopsysR NanoSimTM. Table II summarizes the results of the attack and shows that on average between 4 and 5 faults are necessary to successfully reconstruct the 128-bit key.

TABLE II

OVERVIEW ON THE NUMBER OF REQUIRED FAULTS

Stage Min Max Avg Median

0 2 3 2.06 2.0

1 2 11 2.84 3.0

We modified the thresholds for the Multi-Stage Fault Attack algorithm to rather low values:212_{for stage}₀_and

216_{for stage 1. In general, higher thresholds lead to less}

required fault injections but increase the complexity of subsequent post-processing. In our case, fault injections using Trojans are relatively easy to perform; therefore we opted for lower values and observed approximately one more required fault on average, compared with [7]. The run time of the post-processing algorithm, implemented in Python and employing no parallelization, was around 13 seconds.

We also analysed the distributions of the number of remaining keys after one and two fault injections and compared them to the theoretical results. The distribu-tions are shown in Figure 14. The upper graph shows the results for stage 0 and the lower graph for stage 1. The x-axis denotes the logarithms with respect to base 2 of the number of key candidates and the y-axis shows the (rounded) rate how often a particular number of key candidates occurred. There are cases where much more faults (up to 11) are required, but 2 faults were the minimum for every instance. As every exploitable fault requires a reasonable number (104₋₁₀6₎ _{of fault}

injections, a complete attack using 4 – 5 exploitable faults is feasible.

VII. CONCLUSION

We presented two parametric manufacturing process level Trojans and demonstrated their application to a suc-cessful fault-based attack on the state-of-the-art cipher PRINCE. The Trojans are extremely stealthy and nearly impossible to detect by today’s methods. The conducted attack is modeled by a cross-level framework which combines electrical-level simulation of an optimized PRINCE circuit with advanced post-processing based on multi-stage filtering techniques. The considered attack is particularly challenging for Trojan-based fault injection, as two fault locations in different rounds have to be used which might interfere with each other. Even with this

0 5 10 15 20 25 30 35 40 45 0

200 400600 800 1000 1200 1400

rate

1 Fault 2 Faults

0 5 10 15 20 25 30 35 40 45

log2(#keys)

0 200 400600 800 1000 1200 1400

rate

1 Fault 2 Faults

Fig. 14. Analysis results for stage0(upper) and1(lower)

restriction, the effort to inject a fault in a well-defined location is so low that the number of injected faults can be increased in order to reduce the post-processing complexity.

REFERENCES

[1] M. Tehranipoor and F. Koushanfar, “A Survey of Hardware

Trojan Taxonomy and Detection,” IEEE Design & Test of

Computers, vol. 27, no. 1, pp. 10–25, 2010.

[2] B. Cha and S. Gupta, “Trojan Detection Via Delay Measure-ments: A New Approach to Select Paths and Vectors to

Maxi-mize Effectiveness and MiniMaxi-mize Cost,” inDesign Automation

and Test in Europe, 2013, pp. 1265–1270.

[3] S. International, “Circuit Camouflage Technology - SMI IP Pro-tection and Anti-Tamper Technologies,” White Paper Version 1.9.8j, 2012.

[4] A. Barenghi, L. Breveglieri, I. Koren, and D. Naccache, “Fault Injection Attacks on Cryptographic Devices: Theory, Practice

and Countermeasures,”Proc. IEEE, vol. 99, 2012.

[5] M. Tunstall, D. Mukhopadhyay, and S. Ali, “Differential Fault Analysis of the Advanced Encryption Standard Using a Single

Fault,”LNCS, vol. 6633, pp. pp. 224–233, 2011.

[6] P. Jovanovic, M. Kreuzer, and I. Polian, “A Fault Attack on the

LED Block Cipher,” inInt’l Workshop on Constructive

Side-channel Analysis and Secure Design (LNCS 7275), 2012, pp.

120–134.

[7] P. Jovanovic, M. Kreuzer, and I. Polian, “Multi-Stage Fault Attacks on Block Ciphers,” Cryptology ePrint Archive, Report 2013/778, 2013.

[8] N. Bagheri, R. Ebrahimpour, and N. Ghaedi, “New Differential

Fault Analysis of PRESENT,”EURASIP Journal on Advances

in Signal Processing, no. 1, pp. 1–10, 2013.

[9] L. Song and L. Hu, “Differential Fault Attack on the PRINCE Block Cipher,” Cryptology ePrint Archive, Report 2013/043, 2013.

[10] H. Bar-El et al., “The Sorcerer’s Apprentice Guide to Fault

Attacks,”Proc. IEEE, vol. 94, pp. 370–382, 2006.

[11] G. Becker, F. Regazzoni, C. Paar, and W. Burleson, “Stealthy

(11)

and Embedded Systems - CHES 2013, 2013, vol. 8086, pp. 197–214.

[12] J. Borghoffet al., “PRINCE – A Low-Latency Block Cipher for

Pervasive Computing Applications,” inAdvances in Cryptology

– ASIACRYPT 2012, ser. Lecture Notes in Computer Science,

X. Wang and K. Sako, Eds. Springer Berlin Heidelberg, 2012, vol. 7658, pp. 208–225.

[13] Y. Shiyanovskii et al., “Process Reliability Based Trojans

Through NBTI and HCI Effects,” in Adaptive Hardware and

Systems (AHS), 2010 NASA/ESA Conference on, 2010, pp. 215–

222.

[14] “International technology roadmap for semiconductor (itrs),” 2006.

[15] H. Hao and E. McCluskey, “Very-Low-Voltage Testing for

Weak CMOS Logic ICs,” in Test Conference, 1993.

Proceed-ings., International, 1993, pp. 275–284.

[16] I. Polian, “Power Supply Noise: Causes, Effects, and Testing,”

ASP Jour. Low-Power Electronics, vol. 6, no. 2, pp. 326–338,

2010.

[17] W. Kim, M. S. Gupta, G.-Y. Wei, and D. M. Brooks, “Enabling On-Chip Switching Regulators for Multi-Core Processors Using

Current Staggering,” inIn Proceedings of the Work. on

Archi-tectural Support for Gigascale Integration, 2007.

VIII. APPENDIX: PRINCE KEYCANDIDATE

FILTERING

In this part we provide details on the steps done during the cryptanalysis of a correct and faulty ciphertext pair shown in Figure 11.

Evaluation. Each equation Ei is evaluated for all

possible 4-bit values u of nibble candidates associated to the variable pi. When the result of an evaluation

t = Ei(u) has been computed, the tuple (t, u) is

appended to the set Si.

Inner Filtering. In this step we check for all tuples (t, u) ∈Si if the entry t is valid with respect to the bit

pattern ϕji. Those tuples that do not have a valid entry t are discarded, all others are kept.

For example, we take fault equation E0 and assume

that a fault was injected in nibble l = 0. From the definition ofjiabove we see that the0-th entry ofj0 is0.

Moreover, we assume that the tuple (t, u) = (0x7,0x3) is an element of S0 and observe immediately that 0x7

matches the bit pattern ϕj0 = 0 k s1 k s2 k s3. Thus (0x7,0x3)is a valid tuple and0x3a potential candidate for the nibble associated to p0.

Outer Filtering. The idea in the final filtering step is to exploit the fact that the elements of the sets

S4·m, . . . , S4·m+3 are related to each other for a fixed

m∈ {0, . . . ,3}. This is due to the fact that the right-hand sides of the equations E4·m, . . . , E4·m+3 are derived

from a common pre-image. This can be utilized to build conditions for filtering candidates of the nibbles associ-ated to p4·m, . . . , p4·m+3. First we fix m ∈ {0, . . . ,3}

and order the tuples (t4·m+n, u4·m+n)∈S4·m+n

lexico-graphically for all n ∈ {0, . . . ,3}. Then we compute the sets P4·m+n containing the pre-images of all the

values t4·m+n. This is done as follows. After the Inner

Filtering, all valuest4·m+nmatch the bit pattern derived

from ϕ4·m+n. But we do not know if the j4·m+n-th bit

of t4·m+n had value 0 or 1 before it was fixed to 0.

Hence we obviously have two possible values for the pre-images oft4·m+n. One ist4·m+nitself, and the other

has a 1 at bit position j4·m+n. Then we intersect the

pre-image sets P4·m, . . . , P4·m+3 with each other and

obtain a set Gm of pre-image candidates. After that we

check for each gm ∈ Gm if, for every n ∈ {0, . . . ,3},

there is at least one tuple inS4·m+nwhich has the value

ϕi4·m+n(gm) in its first component. If so, gm is a valid pre-image. When all pre-images have been processed, all tuples are deleted from the setsS4·m, . . . , S4·m+3, except

those where the first entry has a valid pre-imagegm.

Finally, after projecting the sets Si to their second

components ui, the Cartesian product over those

projec-tions is computed to get the key candidates fork1⊕k2.

If the number of candidates fork1⊕k2 is small enough,

stage 0 of the attack ends. Otherwise the whole pro-cedure from above is executed again and the resulting candidate set is intersected with the ones from all the previous runs of the procedure. This is repeated until the set of candidates fork1⊕k2 is smaller then a previously

defined threshold τ0.

In the second stage of the attack, candidates fork1are

computed using the previously described configurations for the fault injections. This is repeated until the number of candidates for k1 falls below the specified threshold

valueτ1. As soon as this is the case, the candidates fork1

andk1⊕k2 are used to derive candidates for the subkeys

k2 andk0. Finally, a brute-force search on the Cartesian

product K0 ×K1 is performed to find the actual key