Logic level countermeasures - DPA countermeasures

2.2 DPA countermeasures

2.2.1 Logic level countermeasures

In recent years, many logic styles that counteract SCA attacks have been proposed. The big advantage of counteracting SCA attacks at the logic level is that this approach treats the problem right where it arises. If the basic building blocks, i.e. the logic cells, are resistant against this kind of attack, a designer can build a digital circuit with arbitrary functionality and it will also be resistant against SCA attacks. Having SCA-resistant cells means that hardware as well as software designers no longer need to care about side-channel leakage. This greatly simplifies the design flow of a cryptographic device. Only the designers of the logic cells themselves need to be aware of these risks.

The data dependencies in the dynamic power consumption cause four major sources of leakage:

1. Energy imbalance. Energy imbalance can be measured as the variation in energy consumed by a circuit processing different data.

2. Exposure time. The longer the imbalance is visible, the easier it is to measure.

3. Early propagation. Early propagation is the ability of a gate to fire without waiting for all its inputs. It causes a data-dependent distribution of circuit switching events in time. The effect of early propagation is bounded by half of the clock cycle. One way to avoid early propagation is to balance all paths by inserting buffers in such a way that all inputs of each gate arrive simultaneously.

4. Memory effect. The memory effect is the ability of a CMOS gate to remember its previous state, due to capacitances not being completely discharged.

The basic idea of SCA-resistant logic styles is to design logic cells with a power consumption that is independent of the data they process. Essentially, there exist two approaches to build such cells. The first approach is to design these cells from scratch. This implies that a completely new cell library needs to be designed for every process technology. Examples of such logic styles are SABL [173], RSL [169], DRSL [45], and TDPL [164].

The alternative to this approach is to build secure logic cells based on existing standard cells. In this case, the design effort for new cell libraries is minimal. This is the motivation for logic styles like WDDL [174] or MDPL [143].

The logic families used as countermeasures can be divided into hiding and masked logic styles. Hiding logic adds redundant logic so as to end up with a constant activity when sensitive bits are manipulated. Masked logic does not use special differential routing but instead randomizes the signals on the complementary wire pairs. As a result, the remaining leakage is assumed to be randomized up to the quality of the random numbers provided.

Of course, each of the proposed logic styles also has other pros and cons besides the design effort for the cells. Dual-rail Precharge Logic styles (e.g. SABL, TDPL, WDDL), which belong to the group of hiding logic styles, are for example smaller than masked logic styles (e.g. MDPL, RSL, DRSL). However, the security of Dual-rail Precharge Logic (DPL) styles strongly depends on the balancing of complementary wires in the circuit, while this is not the case for masked logic styles. Design methods to balance complementary wires can be found in [175], [85], and [176].

On the other hand, attacks against these secured logic styles have also been published. Most of them exploit circuit “anomalies” such as glitches and the early propagation effect.

The dual-rail concept can be applied to both ASICs and FPGAs. SABL and SecLib [84, 83] belong to the first group. SecLib is a cell library compatible with standard cells that has been proven to be one of the most secure solutions. However, fabricating an ASIC entails a large cost overhead. A major advantage of FPGA-compatible DPL styles is that they can be incorporated into the common tool flow. The automated design flow generates a secure design from the VHDL netlist, so the digital designer does not need specialized understanding of the methodology. Since FPGA design tools lack the flexibility required for balanced routing, most of the efforts in this direction have led to duplication schemes. These follow the dual-rail concept without precharging the signals: a dual copy of a fully placed-and-routed circuit is made, which should consume the complementary amount of energy with respect to its original counterpart.

In this Ph.D. thesis we focus on solutions that are compatible with existing resources. We describe in more detail the logic families compatible with FPGA.

Dual Rail Precharge Logic

Dual-rail Precharge Logic (DPL) styles aim to consume an equal amount of power for every transition of a node in a circuit. Each logical signal is carried on two complementary wires, and the constant power consumption is achieved by guaranteeing that in every clock cycle exactly one of these two wires is charged and discharged again. Which one of the two wires performs this charge and discharge operation depends on the logical value that the wires represent. The design of secure implementations using DPL is independent of the cryptographic algorithm.

A constant power consumption can only be achieved if the complementary wires have the same capacitive load. Otherwise, the amount of energy needed per clock cycle would depend on which of the two nodes is switched and therefore would be correlated to the logical value. Unfortunately, the requirement to balance the capacitive load of two wires is hard to fulfill in a semi-custom design flow. DPL styles also incur a significant overhead in area and energy consumption.
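The role of load balancing can be illustrated with a toy energy model. The following Python sketch is only an illustration: it assumes that the energy drawn in one cycle is proportional to the capacitance of the wire that is charged, and the function names and capacitance values are hypothetical.

```python
def dual_rail_cycle_energy(bit, c_true=1.0, c_false=1.0):
    """Energy (arbitrary units) drawn in one precharge/evaluate cycle.

    Precharge drives both wires to 0. Evaluation charges exactly one wire:
    the 'true' wire for bit 1, the 'false' wire for bit 0.
    """
    true_wire, false_wire = (1, 0) if bit else (0, 1)
    return true_wire * c_true + false_wire * c_false


def single_rail_energy(prev_bit, bit, c=1.0):
    """A single-rail node only draws charge on a 0 -> 1 transition."""
    return c if (prev_bit == 0 and bit == 1) else 0.0


# Set of distinct energy values observed over all possible data values.
balanced = {dual_rail_cycle_energy(b) for b in (0, 1)}
unbalanced = {dual_rail_cycle_energy(b, c_true=1.0, c_false=1.2) for b in (0, 1)}
single = {single_rail_energy(p, b) for p in (0, 1) for b in (0, 1)}

print("balanced dual-rail:  ", balanced)
print("unbalanced dual-rail:", unbalanced)
print("single-rail:         ", single)
```

With equal loads the dual-rail model yields a single energy value regardless of the data, whereas a mismatch between the two loads, or a single-rail node, yields data-dependent values.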

One major hurdle for DPL is the routing of complementary nets. Any routing asymmetry between complementary nets results in unbalanced parasitic capacitive loading and in a residual power variation between direct and complementary transitions. The goal is to keep routing of the direct and complementary gates close to each other, so that resulting nets are as symmetrical as possible. Proposed routing techniques include “fat wire” [175] and “backend duplication” [85] as countermeasures in the placement and routing to improve the DPA-resistance.

The first DPL style is Sense Amplifier Based Logic (SABL) [173], introduced in 2002. SABL is a full-custom logic style based on two principles. First, it is a Dynamic and Differential Logic (DDL) and therefore has exactly one switching event per cycle, independently of the input value and sequence. Second, during a switching event, it discharges and charges the sum of all the internal node capacitances together with one of the balanced output capacitances. Hence, it discharges and charges a constant capacitance value.

Wave Dynamic Differential Logic (WDDL) [174] emulates the behavior of SABL using CMOS standard cells, requiring less design effort. WDDL defines a secure version of the AND and OR operators; any other logic function in Boolean algebra can be expressed with these two differential operators. WDDL introduces more overhead in area and energy consumption than SABL.

WDDL gates remove the logic cells required to precharge the outputs to 0, and so they do not precharge simultaneously. Instead of a precharge signal that resets the logic, there is a precharge wave: hence the name “Wave Dynamic Differential Logic”.
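The behavior of the two differential operators can be sketched in Python. This is a functional model only: a dual-rail signal is represented as a pair (true rail, false rail), and the helper names are hypothetical. It shows that the gates compute complementary outputs for valid inputs and that precharged inputs (0, 0) propagate as precharged outputs, which is what produces the precharge wave.

```python
from itertools import product

PRECHARGE = (0, 0)  # both rails low


def encode(bit):
    """Dual-rail encoding of a single bit: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails, OR on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """OR on the true rails, AND on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


# Evaluation phase: valid inputs give a valid, correct dual-rail output.
for a, b in product((0, 1), repeat=2):
    assert wddl_and(encode(a), encode(b)) == encode(a & b)
    assert wddl_or(encode(a), encode(b)) == encode(a | b)

# Precharge phase: zeros at the inputs ripple through as zeros.
assert wddl_and(PRECHARGE, PRECHARGE) == PRECHARGE
assert wddl_or(PRECHARGE, wddl_and(PRECHARGE, PRECHARGE)) == PRECHARGE
```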

It is possible to place and route the original gate-level netlist and subsequently take the layout and interchange the AND- and OR-gates. This Divided Wave Dynamic Differential Logic has the same properties as the original WDDL and avoids the requirement of the differential signals to be matched to guarantee constant load capacitance.

Two main factors have been pointed out as leakage sources that can be exploited by DPA attacks [175, 168]:

1. Leakage caused by the difference of loading capacitance between two complementary logic gates in a WDDL gate.

2. Leakage caused by the difference of delay time between the input signals of WDDL gates.

The impact of these leakages has been studied in [103]. The second source is known as the Early Propagation Effect (EPE): the implementation may produce its output value with different time delays. If the output can be determined while an input is not yet known, e.g. an OR function with one known input that is true, information about the value of this input and the output leaks. This EPE-related vulnerability is known as “early evaluation”.
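A small Python sketch can make the OR example concrete. It assumes a WDDL-style differential OR gate modeled on pairs (true rail, false rail), with one input already evaluated and the other still precharged to (0, 0); all names are hypothetical.

```python
PRECHARGE = (0, 0)  # both rails low: no data present yet


def encode(bit):
    """Dual-rail encoding of a single bit: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_or(a, b):
    """Differential OR: OR on the true rails, AND on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def is_valid(pair):
    """A dual-rail pair carries data when exactly one rail is high."""
    return pair[0] != pair[1]


def evaluates_early(a_bit):
    """True if the output is already valid while input b is still precharged."""
    return is_valid(wddl_or(encode(a_bit), PRECHARGE))


print(evaluates_early(1))  # input a is 1: output is decided without b
print(evaluates_early(0))  # input a is 0: output must wait for b
```

In this model the moment at which the output becomes valid depends on the value of the first input, which is the data-dependent timing that an attacker can observe.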

The S-box design method for low power consumption proposed by Morioka et al. [131] is recommended as one technique to reduce this leakage.

According to [134], there are two rules to follow in order to avoid EPE-related vulnerabilities: the evaluation phase starts after all the input signals are valid, and the precharge phase begins either after all the inputs become NULL and the evaluation outputs are memorized, or before the first input becomes NULL (no memory).

Bucci et al. [38] propose Three-phase Dual-rail Precharge Logic (TDPL), a DPL family whose power consumption is insensitive to unbalanced load conditions, thus allowing a semi-custom design flow (automatic place and route) without any constraint on the routing of the complementary wires. An additional discharge phase is performed after the precharge and evaluation steps. It can be implemented as an improvement of the SABL logic with a limited increase in circuit complexity. It shows a reduction of one order of magnitude in the normalized energy deviation with respect to SABL [164].
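The normalized energy deviation (NED) is commonly defined as the difference between the maximum and minimum energy per cycle, divided by the maximum. The Python sketch below assumes this definition; the energy values are hypothetical and serve only to show the calculation.

```python
def normalized_energy_deviation(energies):
    """NED = (max(E) - min(E)) / max(E) over the energy per cycle
    observed for all input transitions."""
    e_max, e_min = max(energies), min(energies)
    return (e_max - e_min) / e_max


# Hypothetical per-cycle energies (arbitrary units) for two gates.
perfectly_balanced = [10.0, 10.0, 10.0, 10.0]
slightly_unbalanced = [8.0, 9.0, 10.0, 9.5]

print(normalized_energy_deviation(perfectly_balanced))
print(normalized_energy_deviation(slightly_unbalanced))
```

A NED of zero corresponds to a gate whose energy consumption does not depend on the processed data at all.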

The implementation of cryptographic algorithms is typically done using a bottom-up approach. A set of secured primitives, e.g. XOR, MUX or ROM, is built using the logic style; these primitives are then used as basic cells for implementing the algorithm.

Balanced Cell-based Dual-rail Logic (BCDL) is a high-speed balanced DPL for FPGAs with global precharge and no early evaluation [134]. It uses a rendezvous cell to indicate that the inputs are valid. The output of this cell and the global precharge signal are the triggers for the evaluation phase to begin. The following precharge starts as soon as the precharge signal changes. It precharges before the first input becomes NULL. The synchronization signals are introduced at LUT level to avoid early evaluation in LUTs. The number of inputs available for logic implementation depends on the presence of the rendezvous signal in the LUT, which reduces the capacity from 6 to 2.
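The synchronization idea can be abstracted in a few lines of Python. This sketch is not the LUT-level BCDL implementation: it is a behavioral model, under the assumption that the output stays NULL while the global precharge signal is active or while any input pair is not yet valid. The names `synchronized_gate`, `prch` and the helpers are hypothetical.

```python
PRECHARGE = (0, 0)  # NULL state: both rails low


def encode(bit):
    """Dual-rail encoding of a single bit: (true rail, false rail)."""
    return (bit, 1 - bit)


def is_valid(pair):
    """A dual-rail pair carries data when exactly one rail is high."""
    return pair[0] != pair[1]


def synchronized_gate(func, inputs, prch):
    """Evaluate func only when precharge is released AND all inputs are valid.

    The validity check plays the role of the rendezvous: the gate does not
    fire on a subset of its inputs, so its evaluation moment does not depend
    on the data values.
    """
    if prch or not all(is_valid(p) for p in inputs):
        return PRECHARGE
    return encode(func(*(true_rail for true_rail, _ in inputs)))


def or2(a, b):
    return a | b


# One input valid, the other still NULL: the gate waits.
print(synchronized_gate(or2, [encode(1), PRECHARGE], prch=0))
# All inputs valid and precharge released: the gate evaluates.
print(synchronized_gate(or2, [encode(1), encode(0)], prch=0))
# Precharge asserted: the output returns to NULL regardless of the inputs.
print(synchronized_gate(or2, [encode(1), encode(0)], prch=1))
```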

DPL-noEE [25] follows a similar approach. Instead of having a global precharge signal, the synchronization is built into the truth tables used to implement a logic function in the LUT. Moreover, DPL styles that avoid EPE are also natively protected against intrusive SCA such as Differential Fault Analysis.

However, according to [128], preventing early propagation in the evaluation phase might not be enough to reduce information leakage. Moreover, both of these proposals lack placement and routing guidelines to reduce leakage. In [128] the authors propose a new evolution of WDDL for FPGAs called Asynchronous WDDL (AWDDL). AWDDL guarantees the disappearance of early propagation in both the evaluation and precharge phases, using a latch inside every LUT by means of a feedback loop. To avoid the routing imbalances of DPL, they present a customized routing algorithm that works on the placement produced by automatic tools under constraints for complementary gate placement. The routing algorithm evaluates all the possible routes for each dual-rail connection, assigning weights according to the signal delays or the interconnection resources used. A final conflict-free route is selected from those available, if satisfiable, and the remaining paths are automatically re-routed.

Precharge-Absorbed Dual-rail Precharge Logic (PA-DPL) is a specific configuration for FPGA LUTs presented in [87]. A 6-input LUT is used to make the delay of the gates constant. They implement an EPE-preventing logic using a compound precharge signal, Prch ∗ Ex. Prch is the precharge signal, similar to other logic proposals, and Ex delays [...] frequency, reducing the duty cycle of the compound signal to approximately 25%. As long as the signal Ex is advancing Prch by ∆t, and the duty cycle of Ex guarantees to cover any early evaluation of the implemented function, EPE problems are prevented. PA-DPL enables the implementation of up to 4-input logic gates or functions, as two of the inputs are reserved for the compound precharge signal. An interleaved automatic placement of the complementary functions is proposed in [88] as an enhancement to avoid EMA attacks. In [89] the authors present a tool to automatically transform the logic from a raw single-rail design in Xilinx Design Language to PA-DPL.

Masked Logic Styles

Besides hiding, the second way to conceal the data-dependent power consumption is to randomize it. We have already presented masking techniques at the algorithm level (see Section 1.3), but they can also be applied at the logic level.

Masking-based logic attempts to remove the correlation between power consumption and secret data by randomizing the power consumption. Masked logic works by XOR-ing signals with a mask bit and later removing the mask with another XOR operation. For example, a 2-input XOR function $g = a \oplus b$ is replaced with $g_m = a_m \oplus b_m$ and $m_g = m_a \oplus m_b$, where $a_m = a \oplus m_a$ and $b_m = b \oplus m_b$ are the masked inputs. We refer to the implementation of $g_m$ as the M-XOR gate. The $m_g$ signal is called its correction mask.
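The correctness of the masked XOR can be checked exhaustively. The Python sketch below assumes that each input is masked with its own mask bit (a_m = a XOR m_a, b_m = b XOR m_b); the function names are hypothetical.

```python
from itertools import product


def m_xor(a_m, b_m):
    """Masked XOR gate: operates on masked inputs only."""
    return a_m ^ b_m


def correction_mask(m_a, m_b):
    """Mask that must be removed from the M-XOR output."""
    return m_a ^ m_b


# For every combination of data bits and mask bits, removing the
# correction mask from the masked output recovers a XOR b.
for a, b, m_a, m_b in product((0, 1), repeat=4):
    a_m, b_m = a ^ m_a, b ^ m_b
    g_m = m_xor(a_m, b_m)
    m_g = correction_mask(m_a, m_b)
    assert g_m ^ m_g == a ^ b
```

The unmasked values a and b never appear inside the gate, and the result is only recovered when the correction mask is applied.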

Masking on the gate level was first considered in the patent US 6295606 of Messerges et al. in 2001. However, the described masked gates are extremely big because they are built based on multiplexers. A different approach was pursued later by Gammel et al. in patent DE 10201449. This patent shows how to mask complex circuits such as crypto cores, arithmetic-logic units, and complete microcontroller systems.

The problem with those masked logic styles is that glitches occur in these circuits. As shown in [169], glitches in masked CMOS circuits reduce the SCA resistance significantly. Therefore, glitches must be considered when introducing an SCA countermeasure based on masking.

Popp and Mangard [143] proposed a new logic style called Masked Dual-rail Precharge Logic (MDPL) that applies random data masking to WDDL gates. Every signal in an MDPL circuit is masked with the same mask m. There are no constraints for the place and route process. All MDPL cells can be built from standard CMOS cells that are commonly available in standard cell libraries.

An MDPL AND gate takes six dual-rail inputs $(a_m, \overline{a_m}, b_m, \overline{b_m}, m, \overline{m})$ and produces two output values $(q_m, \overline{q_m})$. In an MDPL circuit, all signals are precharged to 0 before the next evaluation phase occurs, including the mask signals. The outputs are calculated by a majority gate, which is available in a typical CMOS standard cell library. However, the use of MDPL gates has a significant cost. It implies an average area increase by a factor of 4.5, a reduction of the maximum speed by a factor of 0.58, and an increase of the power consumption by a factor of 4 to 6.
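The use of a majority gate can be checked with a short functional model. The Python sketch below assumes that one majority gate computes the masked output from the three true rails and a second one computes its complement from the three false rails; the function names are hypothetical.

```python
from itertools import product


def maj(x, y, z):
    """3-input majority function."""
    return (x & y) | (x & z) | (y & z)


def mdpl_and(a_m, a_m_n, b_m, b_m_n, m, m_n):
    """Functional model of an MDPL AND gate on dual-rail masked inputs.

    The *_n arguments are the complementary rails.
    """
    return maj(a_m, b_m, m), maj(a_m_n, b_m_n, m_n)


for a, b, m in product((0, 1), repeat=3):
    a_m, b_m = a ^ m, b ^ m
    q_m, q_m_n = mdpl_and(a_m, 1 - a_m, b_m, 1 - b_m, m, 1 - m)
    assert q_m ^ m == a & b   # unmasking the output gives a AND b
    assert q_m_n == 1 - q_m   # the two output rails are complementary

# Precharge: all rails at 0 give a precharged output.
assert mdpl_and(0, 0, 0, 0, 0, 0) == (0, 0)
```

The same mask m is carried by the inputs and by the output, in line with the single-mask scheme described above.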

MDPL gates also suffer from leakage caused by the difference in delay time between their six input signals (Early Propagation Effect), as pointed out in [167]. In order to implement secure logic circuits using MDPL gates, the differences in delay time between the input signals must be adjusted.

Gierlichs [74] points out another possible source of leakage. Studying the switching behavior of a majority gate in detail, he discovered that there are “internal nodes” in the pull-up and pull-down networks, which cannot always be fully charged or discharged, depending on the values and delays of the input signals. This fact induces a kind of memory effect, which can be data-dependent.

To avoid leakages, mask signals need to be unbiased, and the circuit must consume the same power for each value of the unmasked input in a statistical sense. However, real circuits have a large number of higher-order effects that can cause the power consumption to become dependent on the circuit state. Two of them have been studied by Chen et al. [44]: glitches and inter-wire capacitance. When intermediate values contain both the information of the mask and the corresponding masked value, an attack based on the probability density function (PDF) is possible [172, 154].
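The need for an unbiased mask follows from a short calculation. The Python sketch below assumes a single masked bit a_m = a XOR m with P(m = 1) = p and computes how much the expected value of a_m depends on the unmasked bit a; the function name is hypothetical.

```python
from fractions import Fraction


def masked_bit_dependence(p_mask_one):
    """E[a_m | a = 1] - E[a_m | a = 0] for a_m = a XOR m, P(m = 1) = p.

    If a = 0 then a_m = m, so its expected value is p.
    If a = 1 then a_m = 1 - m, so its expected value is 1 - p.
    """
    p = Fraction(p_mask_one)
    return (1 - p) - p


print(masked_bit_dependence(Fraction(1, 2)))  # unbiased mask
print(masked_bit_dependence(Fraction(3, 5)))  # biased mask
```

The difference equals 1 - 2p, so it vanishes only for p = 1/2; any bias in the mask leaves a first-order dependence between the masked value and the secret bit.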

To solve this problem, Suzuki et al. [169] propose Random Switching Logic (RSL), a logic style which does not require complementary operations and avoids glitches in the circuit. Yet, RSL needs a careful timing of enable signals.

RSL replaces traditional logic gates such as NAND, NOR, and XOR gates with their respective RSL versions, which requires a new standard cell library. Each RSL gate has four inputs, as opposed to the two of its traditional counterpart. The two extra inputs are enable and RandomBit. The RandomBit input is used to alter transition probabilities of the circuit and achieve the randomized switching property. The enable signal, on the other hand, is used to suppress spurious transitions. The circuit starts operating when enable is asserted; otherwise, the circuit is driven to logic 0.
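A behavioral model helps to see how the two extra inputs interact. The Boolean form used in the Python sketch below is an assumption chosen to match the description above, not a formula taken from this text: the gate acts as a NAND when the random bit is 0, as a NOR when it is 1, and outputs 0 while enable is low. The name `rsl_nand_nor` is hypothetical.

```python
from itertools import product


def rsl_nand_nor(x, y, r, en):
    """Assumed RSL-style gate: z = NOT( NOT(en) OR x.y OR (x OR y).r )."""
    return 1 - ((1 - en) | (x & y) | ((x | y) & r))


for a, b, r in product((0, 1), repeat=3):
    x, y = a ^ r, b ^ r            # inputs masked with the random bit
    z = rsl_nand_nor(x, y, r, en=1)
    nand = 1 - (a & b)
    assert z == nand ^ r           # output is NAND(a, b) masked with r
    assert rsl_nand_nor(x, y, r, en=0) == 0  # disabled gate is held at 0
```

Under this assumption the gate output is always the masked NAND of the unmasked inputs, while the function actually evaluated on the wires alternates with the random bit.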

Assuming that the RSL enable signal can be created and routed, simulated attacks in [172] show that RSL can still be attacked fairly effectively.

Logic styles that are secure against DPA attacks must avoid early propagation. Otherwise, the power consumption depends on the unmasked data values due to data-dependent evaluation moments. In [45], the logic style Dual-Rail Random Switching Logic (DRSL) is presented. In DRSL, a cell avoids early propagation by delaying the evaluation moment until all input signals of a cell are in a valid differential state.

Popp et al. [142] point out that DRSL does not completely avoid an early propagation effect on the precharge phase. The reason is that the input signals, which arrive at different moments, can still directly precharge the DRSL cell. The propagation delay of the Evaluation-Precharge Detection Unit (EPDU) leads to a time frame in which this can happen. Only after that time frame, the EPDU unconditionally precharges the DRSL cell.

In [142] Popp et al. present an improved MDPL (iMDPL) with respect to the early propagation effect. It includes an EPDU. The price that has to be paid for the improvements in terms of early propagation is a further significant increase of the area requirements of iMDPL cells compared to MDPL. One can expect an increase of the area by a factor of up to 3 compared to original MDPL. This makes it clear that carefully finding out which parts of a design really need to be implemented in DPA-resistant logic is essential to save area.

Family     Mask   Area        Dual-rail placement   Synchro   Pre EPE   Gate Input (LUT-6)
WDDL       no     Medium      together              no        no        6
MDPL       yes    High        together              no        no        2
DRSL       no     High        together              no        yes       2
iMDPL      yes    Very High   together              yes       yes       2
BCDL       no     Medium      together              yes       yes       2 (6)
DPL-noEE   no     Medium      together              yes       yes       3
PA-DPL     no     Medium      interleaved           yes       yes       4

Table 2.3: DPL summary

A significant reduction of the cell size can be achieved by designing new standard cells that implement the functionality of iMDPL. Of course, that has the well-known disadvantage of a greatly increased design and verification effort. Furthermore, a change of the process technology would then mean spending all the effort to design an iMDPL standard cell library again.

In [111], Lin et al. propose a logic design style, called precharge Masked Reed-Muller Logic (PMRML) to overcome the glitch and EPE problems. The PMRML design can be fully realized using common CMOS standard cell libraries. It can be used to implement

In document Securing implementations of feedback-shift-register-based ciphers using compiler optimizations and co-processors (Page 46-53)