Exploring Energy Efficiency of Lightweight Block Ciphers

Academic year: 2020

Subhadeep Banik¹, Andrey Bogdanov¹ and Francesco Regazzoni²

¹ DTU Compute, Technical University of Denmark, Lyngby
{subb,anbog}@dtu.dk

² ALARI, University of Lugano
regazzoni@alari.ch

Abstract. In the last few years, the field of lightweight cryptography has seen an influx in the number of block ciphers and hash functions being proposed. One of the metrics that define a good lightweight design is the energy consumed per unit operation of the algorithm. For block ciphers, this operation is the encryption of one plaintext. By studying the energy consumption model of a CMOS gate, we arrive at the conclusion that the total energy consumed during the encryption operation of an r-round unrolled architecture of any block cipher is a quadratic function in r. We then apply our model to 9 well known lightweight block ciphers, and thereby try to predict the optimal value of r at which an r-round unrolled architecture for a cipher is likely to be most energy efficient. We also try to relate our results to some physical design parameters like the signal delay across a round and algorithmic parameters like the number of rounds taken to achieve full diffusion of a difference in the plaintext/key.

Keywords: AES, Lightweight Block Cipher, Low Power/Energy Circuits.

1 Introduction

TWINE [23]. Possible examples of the second approach are implementations of the Advanced Encryption Standard algorithm (AES) [11], SHA-256 [1], or Keccak [4].

Focusing on block ciphers in particular, it is important to notice that AES still remains the preferred choice for providing security in constrained devices too, even if some lightweight algorithms are now standardized. For this reason, several implementations of AES and its basic transformations (such as S-boxes) targeting low area and low power were proposed in the past, for example, the implementation of Feldhofer et al. [12] and the one of Moradi et al. [19]. The first design is based on an 8-bit datapath, and occupies approximately 3400 Gate Equivalents (GE). The second design features a mixed datapath and requires approximately 2400 GE. The work of Hocquet et al. [15] discusses the silicon implementation of low power AES. The authors showed that by exploiting technological advances and algorithmic optimization, the AES core can consume as little as 740 pJ per encryption.

Despite a large number of previous works targeting area and power, only limited efforts were devoted to the optimization of the energy parameter. Energy and power are, for obvious reasons, correlated parameters. Power is the amount of energy consumed per unit time or simply the rate of energy consumption. More specifically, energy consumption is a measure of the total electrical effort expended during the execution of an operation, and the total energy consumed is essentially the time integral of power. However, being directly linked with the battery life or the amount of electrical work to be harvested, energy, rather than power, would become a more relevant parameter for evaluating the suitability of a design. In fact, energy is a much stricter constraint for future cyber-physical systems as well as for the next generation of implantable devices.

Designing for low energy can be significantly different from designing for low power. Furthermore, there is no guarantee that low power architectures would lead to low energy consumption. For instance, block ciphers implemented using a smaller datapath and aggressively exploiting serialization to reuse components generally have a smaller power consumption than a round based design whose datapath is as wide as the block of the cipher. However, serial implementations have high latency, which can be significantly larger than that of round based designs. As a result, the energy consumed per encryption for serial designs could be much higher than the corresponding figure for round based designs.

Starting with the AES algorithm, in this work, we carry out a complete exploration of the implementation choices of block ciphers concentrating on their energy consumption, discussing and evaluating the design choice of each round transformation, and the best trade-off between datapath and serialization. From the detailed analysis of this exploration, we extract a model for the energy consumption of a circuit, using as reference a number of lightweight algorithms recently proposed.

of efficiency for lightweight designs. The authors present a comprehensive study comparing a number of algorithms using different metrics such as area, throughput, power, and energy, and applying state of the art techniques for reducing power consumption such as voltage scaling. However, the evaluation reported in the paper is at a very high level and concentrates only on a specific implementation, without considering the effects on energy consumption of different design choices, such as the size of the datapath, the amount of serialization, or the effects of architectural optimization applied at each stage of the algorithm. The second work explores area, power, and energy consumption of several recently developed lightweight block ciphers and compares them with the AES algorithm, considering also possible optimization of the non-linear transformation. However, no possible optimization was considered for other transformations, and the effects of other design choices, such as serialization, were not considered in the work. In another work [18], a comparison of the energy consumption of fully and partially unrolled circuits was done with respect to the latency of the circuit.

1.1 Contribution and Organization of the Paper

In this paper we complete the analysis started with these works, looking at all the parameters which might affect the energy consumption of a design. We start with the case of AES and investigate how the variation in (a) the architectural design of the individual components (S-box, MixColumn), (b) the frequency of the clock signal and (c) serializing or unrolling the design can affect the energy consumption. Furthermore, starting with the detailed analysis of our exploration, we build an energy model for any r-round unrolled architecture of block ciphers. We prove that if all other factors are constant, then the total energy consumed per encryption in an r-round unrolled circuit is quadratic in r. We validate our model by estimating the energy consumed by several lightweight algorithms and comparing it with the figures obtained by simulating their implementations.

The remainder of the paper is organized as follows. Section 2 presents, as a motivating example, a detailed study of the AES algorithm from the energy point of view. Section 3 presents our model for estimating the energy consumption of a block cipher, discussing how the contribution of each component is modeled and included in the overall energy consumption equation. Section 4 reports how our model is validated using a number of lightweight algorithms. Section 5 tabulates the final energy figures for all the block ciphers that we have considered, and relates these results to physical parameters like critical path and algorithmic parameters like the minimum number of rounds required to achieve full diffusion of a difference introduced in the plaintext or key. Section 6 concludes the paper.

2 A case study of energy consumption of AES 128

(a) Architecture of S-Box/MixColumn: It is known that the Canright architecture [9] is the most compact representation of the AES S-box in terms of gate area. However, it is unlikely to be the most efficient energy-wise. We then experimented with a lookup table (LUT) based S-box. However, we found that the Decoder-Switch-Encoder (DSE) architecture [5] is the most energy-efficient. We also considered two different variants of the MixColumn architecture. Considering AES MixColumn to be a linear map from $\{0,1\}^{32}$ to $\{0,1\}^{32}$, it can be implemented with 152 xor gates by following the mathematical definition. However, as shown by [20], the outputs of several xor gates can be reused and it is possible to get a compact design in 108 gates. The 108 gate variant is likely to be more energy efficient as it provides a balanced datapath and also uses fewer gates. In Table 1, we present the area and energy per encryption figures for the round based designs for all combinations of the above choices of S-boxes/MixColumns at an operating frequency of 10 MHz (using the standard cell library of the STM 90 nm low leakage process). Clearly the combination of the DSE S-box and the 108 gate MixColumn is optimal in terms of energy efficiency.

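| # | S-box | MixColumn | Area (in GE) | Energy (in pJ) | Energy/bit (in pJ) |
|---|----------|-----------|--------------|----------------|--------------------|
| 1 | LUT | 152 gates | 13836.2 | 797.2 | 6.23 |
| 2 | LUT | 108 gates | 13647.9 | 755.3 | 5.90 |
| 3 | Canright | 152 gates | 8127.9 | 753.6 | 5.89 |
| 4 | Canright | 108 gates | 7872.5 | 708.5 | 5.53 |
| 5 | DSE | 152 gates | 12601.7 | 377.5 | 2.95 |
| 6 | DSE | 108 gates | 12459.0 | 350.7 | 2.74 |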

Table 1: Area, Energy figures for round based AES 128 using different component architectures

consumption, for the round based AES architecture (using the DSE S-box and the 108 gate MixColumn) for frequencies ranging from 100 kHz to 100 MHz. We can see that for frequencies higher than 1 MHz, the energy consumption is more or less invariant with respect to frequency.

[Plot: energy (in pJ, axis from 400 to 1,200) against clock frequency (in Hz, axis from 10^5 to 10^8)]

Fig. 1: Energy consumption for round based AES 128 over a range of clock frequencies

(c) Width of the Data path/Unrolled Design: We performed our experiments with numerous serialized implementations of AES in which the datapath width varies from 8 to 32 to 64 bits. For the 8-bit datapath, we used the serial architecture described in [19]. For the 32-bit serialized datapath we used three architectures, described as follows:

1. A1: In this architecture every round is completed in 9 cycles: 4 for the Substitution operation, 4 for the MixColumn operation and 1 for Shift Row. This architecture takes 94 cycles to complete one encryption.

2. A2: In this architecture every round is completed in 5 cycles: 4 cycles are used for the combined Substitution and MixColumn operations and 1 is used for Shift Row. This architecture takes 54 cycles to complete one encryption.

3. A3: In this architecture every round is completed in 4 cycles. Extra multiplexers are used to ensure that each clock cycle performs the Shift Row, Substitution and MixColumn operations on a given chunk of 32-bit data. This architecture takes 44 cycles to complete one encryption.

Similarly, we used three architectures B1, B2 and B3 for the 64-bit serial design that take 52, 32 and 22 cycles respectively. Thereafter we continue to explore lower latency designs like the round based architecture and the 2, 3, 4, 5, 10 round unrolled architectures.

the standard cell library based on the STM 90nm logic process, at a clock frequency of 10 MHz in Table 2.

| # | Design | Area (in GE) | #Cycles | Energy (pJ) | Energy/bit (pJ) |
|---|-------------|--------------|---------|-------------|-----------------|
| 1 | 8-bit | 2722.0 | 226 | 1913.1 | 14.94 |
| 2 | 32-bit (A1) | 4069.7 | 94 | 1123.3 | 8.77 |
|   | 32-bit (A2) | 4061.8 | 54 | 819.2 | 6.40 |
|   | 32-bit (A3) | 5528.4 | 44 | 801.7 | 6.26 |
| 3 | 64-bit (B1) | 6380.9 | 52 | 1018.7 | 7.96 |
|   | 64-bit (B2) | 6362.6 | 32 | 869.8 | 6.79 |
|   | 64-bit (B3) | 7747.5 | 22 | 616.2 | 4.81 |
| 4 | Round based | 12459.0 | 11 | 350.7 | 2.74 |
| 5 | 2-round | 22842.3 | 6 | 593.6 | 4.64 |
| 6 | 3-round | 32731.9 | 5 | 1043.0 | 8.15 |
| 7 | 4-round | 43641.1 | 4 | 1416.5 | 11.07 |
| 8 | 5-round | 53998.7 | 3 | 1634.4 | 12.77 |
| 9 | 10-round | 101216.7 | 1 | 2129.5 | 16.64 |

Table 2: Area and Energy figures for different AES 128 architectures

We found that the round based implementation of AES 128 is the most energy efficient. Since the serialized architectures take a longer time to complete an encryption operation, it was expected that they would consume more energy, but the fact that the round based design was better in terms of energy than its unrolled counterparts was certainly an interesting result. To understand the reason for this we first need to understand which components of the architecture consume the most energy. A breakdown of this energy consumption, by percentage of the total energy, for the various components is shown in Figure 2.
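The contrast between low power and low energy can be made concrete with the figures in Table 2. The sketch below is only an illustration: it assumes that the energy per encryption equals the average power multiplied by the latency (cycles divided by the 10 MHz clock) and back-computes the implied average power. The function name is ours, not the authors'.

```python
# Back-compute the average power implied by the Table 2 energy figures.
# Assumption: energy = average power x (cycles / clock frequency).

CLOCK_HZ = 10e6  # 10 MHz, the operating frequency used for Table 2

# (design, cycles per encryption, energy per encryption in pJ), from Table 2
TABLE2 = [
    ("8-bit", 226, 1913.1),
    ("32-bit (A3)", 44, 801.7),
    ("64-bit (B3)", 22, 616.2),
    ("Round based", 11, 350.7),
    ("2-round", 6, 593.6),
    ("10-round", 1, 2129.5),
]


def implied_power_uw(cycles, energy_pj, clock_hz=CLOCK_HZ):
    """Average power in microwatts implied by an energy (pJ) and a latency."""
    latency_s = cycles / clock_hz
    return energy_pj * 1e-12 / latency_s * 1e6


for design, cycles, energy_pj in TABLE2:
    power = implied_power_uw(cycles, energy_pj)
    print(f"{design:12s} {cycles:4d} cycles {energy_pj:8.1f} pJ {power:9.1f} uW")
```

Under this assumption the 8-bit design draws the least average power of the designs listed, yet it spends more energy per encryption than every listed design except the fully unrolled one, because its latency is about twenty times that of the round based design.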

We can see that the Substitution layer consumes the largest share of the energy budget in both designs: 24% in the round based design and 36.3% (both rounds combined) in the 2-round unrolled design. However, we also find that in the 2-round unrolled design, the second round functions (Substitution Layer, MixColumn, Add Round Key and round key logic) consume more energy than the first. To understand the reason for this trend, we need to study the energy consumption model of CMOS gates, and start to analyze the situation from there.

3 CMOS Energy Consumption Model

(a) Round based: S.Layer 24%, Reg 20.8%, RKL 15.3%, MC 12.4%, Input Mux 10.9%, ARK 10.3%, Round Mux 4.9%, Other 1.4%

(b) 2-round unrolled: S.Layer1 8%, S.Layer2 28.3%, Reg 8.4%, RKL1 6%, RKL2 12.7%, MC1 4%, MC2 10.4%, Input Mux 6.9%, ARK1 4.4%, ARK2 7.1%, Round Mux 3.7%, Other 0.1%

S.Layer: Substitution Layer, Reg: Registers, MC: MixColumn, ARK: Add Round Key, RKL: Round Key Logic

Fig. 2: Energy shares for the round based and 2-round unrolled AES 128

shrinking of technologies, static power consumption of CMOS is increasing. Nevertheless, static CMOS is likely to continue to be the preferred technology for electronic fabrication in the foreseeable future.

The energy consumption of a static CMOS gate is defined by the following equation:

$$E_{gate} = E_{load} + E_{sc} + E_{leakage}$$

where $E_{sc}$ is the energy due to the short-circuit current. $E_{leakage}$ is the energy consumed due to the sub-threshold leakage current when the transistor is OFF. This component is usually small, but is gaining importance as technology scaling makes the sub-threshold leakage more significant. $E_{load}$ is the energy dissipated in charging and discharging the capacitive load $C_L$ of a gate when output transitions occur.

Hardware implementations of any cryptographic primitive consist of a number of registers and logic blocks connected together as required by the specifications of the algorithm itself. Block ciphers based on SPN or Feistel designs consist in particular of a round function and round key generation logic, which transform a plaintext and key into a ciphertext by iterating the round function for a specific number of rounds. Consider an ideal block cipher $E$ operating on a plaintext space $\{0,1\}^{L_p}$ and a key space $\{0,1\}^{L_k}$. Its hardware implementation, illustrated in Figure 3, would include:

A. A state register (SReg) and a key register (KReg) of $L_p$ and $L_k$ bits respectively, to store the intermediate states produced by the round function and the computed round keys.

C. Depending on the choices of the designer, one or more instances of the round function ($RF_i$) and the round key generation logic ($RK_i$).

D. Additional logic needed to generate control signals, round constants etc.

[Block diagram with labels: Plaintext, Key, Ciphertext, SReg, KReg, RF1, RF2, ..., RFr, RK1, RK2, ..., RKr, K1, K2, ..., Kr]

Fig. 3: Block Cipher Architecture

A designer, depending on the specific requirements of the target application, can implement the algorithm following different strategies. One of the most important decisions in this respect is the number of instances of the round function to be replicated in hardware. The smallest amount of replication happens when a single instance of the round function and the round key logic is instantiated: implementations following this style are called round based architectures. The round based architecture of a block cipher which has to be executed for $R$ rounds can compute the result of an encryption in $R + 1$ clock cycles (1 cycle for the loading of the plaintext/key and the remaining $R$ cycles for executing the $R$ rounds). A designer can instantiate more than one round function, opting for an unrolled architecture. An $r$-round unrolled architecture consists of $r$ instances of the round function and round key logic. Encryption on such a circuit would take $1 + \lceil R/r \rceil$ clock cycles.
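As a quick check of the cycle count, the sketch below evaluates one load cycle plus R/r rounded up, on the assumption that the quotient is rounded up whenever r does not divide R. The helper name is ours. For AES 128 (R = 10) it reproduces the cycle counts reported in Table 2 for the round based and 2 to 5 round unrolled designs.

```python
import math


def cycles_per_encryption(total_rounds, unrolled_rounds):
    """One load cycle plus ceil(R / r) cycles for executing the rounds."""
    return 1 + math.ceil(total_rounds / unrolled_rounds)


# AES 128 has R = 10 rounds; r = 1 corresponds to the round based design.
for r in (1, 2, 3, 4, 5):
    print(r, cycles_per_encryption(10, r))
```

The fully unrolled design is a special case: it needs no intermediate registers and completes an encryption in a single cycle, so this helper does not apply to it.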

The selection of the number of round functions instantiated depends on the specific optimization parameters. For instance, an $r$-round unrolled circuit for high values of $r$ would require a smaller number of clock cycles to compute the ciphertext compared to a round based design. However, its power consumption is usually higher. It is thus interesting to investigate how the value of $r$ affects the energy consumption for a given block cipher. To tackle this problem from a purely analytical point of view, one can make the following observations:

will see a signal that will be switching for around $r\tau_F$ and $r\tau_K$ respectively in every clock cycle before stabilizing. If each of the muxes itself introduces a delay of $\tau_M$, then their outputs will be switching for $r\tau_F + \tau_M$ and $r\tau_K + \tau_M$ in every round. Since in a low leakage environment the energy consumed is essentially a measure of the total number of logic switches, we can assume that the energy consumed in each of the multiplexers is proportional to $r\tau_F + \tau_M$ and $r\tau_K + \tau_M$ respectively. Let $E_{Mux,r}$ denote the total energy drawn per cycle by the multiplexers in an $r$-round unrolled design. Then we can write ($\alpha$ and $\beta$ are constants of proportionality)

$$E_{Mux,r} - E_{Mux,1} = \alpha \cdot [(r\tau_F + \tau_M) - (\tau_F + \tau_M)] + \beta \cdot [(r\tau_K + \tau_M) - (\tau_K + \tau_M)] = (r-1) \cdot (\alpha\tau_F + \beta\tau_K) = (r-1) \cdot dE_{Mux},$$

where $dE_{Mux}$ is therefore the difference between $E_{Mux,r}$ and $E_{Mux,r-1}$, i.e. the difference in energy consumed per cycle in the multiplexers in the $r$ and $r-1$ round unrolled architectures. One can see that $E_{Mux,1}, E_{Mux,2}, \ldots$ form an arithmetic sequence with difference between successive terms equal to $dE_{Mux}$.

2. Similarly, we can derive the energy drawn by the registers. However, registers switch only at the positive/negative edge of the clock (assuming a synchronous design). If $E_{Reg,r}$ is the total energy per cycle drawn by the registers in the $r$-round architecture, we have

$$E_{Reg,r} = E_{Reg,1} + (r-1) \cdot dE_{Reg}.$$

However, the value of the incremental energy $dE_{Reg}$ is generally much smaller than $E_{Reg,1}$.

3. By similar arguments, the energy consumed in each successive logic block $RF_i$, and similarly $RK_i$, is likely to constitute an arithmetic sequence. The total energy drawn per cycle by the $r$ round function blocks, denoted by $E_{RF}$, is

$$E_{RF} = \sum_{i=1}^{r} E_{RF,i} = \sum_{i=1}^{r} \left[ E_{RF,1} + (i-1) \cdot dE_{RF} \right] = rE_{RF,1} + \frac{r(r-1)}{2} \cdot dE_{RF},$$

and similarly, the total energy drawn per cycle by the $r$ round key logic blocks is

$$E_{RK} = rE_{RK,1} + \frac{r(r-1)}{2} \cdot dE_{RK}$$

cycle between successive round function and round key logic blocks respectively. In deriving the above equations we have made the implicit assumption that the capacitive loads driven by the final blocks $RF_r$, $RK_r$ are the same as the ones driven by the previous blocks $RF_i$, $RK_i$ (for $i < r$). This, however, is not always true. For example, $RF_r$ drives the multiplexer in front of the state register, whereas each of the previous $RF_i$ blocks drives the subsequent $RF_{i+1}$. This may result in a small deviation between the actual and the estimated energy consumed in the final block. However, the deviation is usually negligible.

4. The energy drawn by the rest of the logic ($E_{rem}$) may or may not form a sequence with any special property for increasing values of $r$. This would naturally depend on the specific algorithm of the block cipher. The value of this figure is usually a small fraction of the total energy drawn by the circuit: in the set of ciphers we have considered in this work, $E_{rem}$ never exceeded 5% of the total energy budget.

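Summing all the contributions, we can write the total energy $E_r$ consumed per cycle in an $r$-round unrolled circuit as:

$$E_r = E_{Mux,r} + E_{Reg,r} + E_{RK} + E_{RF} + E_{rem}$$

$$= E_{Mux,1} + (r-1) \cdot dE_{Mux} + E_{Reg,1} + (r-1) \cdot dE_{Reg} + r \cdot E_{RF,1} + \frac{r(r-1)}{2} \cdot dE_{RF} + r \cdot E_{RK,1} + \frac{r(r-1)}{2} \cdot dE_{RK} + E_{rem}$$

$E_r$ is a quadratic function in $r$ of the form $Ar^2 + Br + C$, where

$$A = \frac{dE_{RF} + dE_{RK}}{2}, \qquad B = E_{RF,1} + E_{RK,1} + dE_{Reg} + dE_{Mux} - \frac{dE_{RF} + dE_{RK}}{2},$$

$$C = E_{Reg,1} + E_{Mux,1} + E_{rem} - dE_{Reg} - dE_{Mux}.$$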
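The coefficients can be checked numerically. The sketch below is an illustration, not the authors' tooling: it sums the per-component contributions for a given r and compares the result with the quadratic form. The parameter values are those reported for Noekeon in Table 3, in the units used there; the class, field and function names are ours.

```python
from dataclasses import dataclass


@dataclass
class Params:
    e_reg1: float  # registers, round based design
    d_reg: float   # register increment per extra unrolled round
    e_mux1: float  # multiplexers, round based design
    d_mux: float   # multiplexer increment per extra unrolled round
    e_rf1: float   # first round function block
    d_rf: float    # increment between successive round function blocks
    e_rk1: float   # first round key logic block
    d_rk: float    # increment between successive round key logic blocks
    e_rem: float   # remaining logic


def per_cycle_energy(p, r):
    """Sum the component energies of an r-round unrolled design."""
    mux = p.e_mux1 + (r - 1) * p.d_mux
    reg = p.e_reg1 + (r - 1) * p.d_reg
    rf = sum(p.e_rf1 + (i - 1) * p.d_rf for i in range(1, r + 1))
    rk = sum(p.e_rk1 + (i - 1) * p.d_rk for i in range(1, r + 1))
    return mux + reg + rf + rk + p.e_rem


def coefficients(p):
    """Return (A, B, C) of the quadratic A*r^2 + B*r + C."""
    a = (p.d_rf + p.d_rk) / 2
    b = p.e_rf1 + p.e_rk1 + p.d_reg + p.d_mux - a
    c = p.e_reg1 + p.e_mux1 + p.e_rem - p.d_reg - p.d_mux
    return a, b, c


# Noekeon row of Table 3, in column order.
noekeon = Params(3.26, 0.86, 1.67, 1.81, 15.53, 26.98, 0.0, 0.0, 0.88)
a, b, c = coefficients(noekeon)
for r in range(1, 5):
    # The component sum and the quadratic form must agree for every r.
    assert abs(per_cycle_energy(noekeon, r) - (a * r * r + b * r + c)) < 1e-9
print(round(a, 2), round(b, 2), round(c, 2))
```

For these inputs the coefficients agree with the values 13.49, 4.71 and 3.14 that appear in the Noekeon expression of Table 3.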

To compute the total energy $\mathcal{E}_r$ which a particular implementation consumes to perform an encryption, the energy consumed per cycle needs to be multiplied by the number of clock cycles required for the computation, i.e. $1 + \lceil R/r \rceil$:

$$\mathcal{E}_r = E_r \cdot \left( 1 + \left\lceil \frac{R}{r} \right\rceil \right) = (Ar^2 + Br + C) \cdot \left( 1 + \left\lceil \frac{R}{r} \right\rceil \right) \qquad (1)$$

Ignoring the rounding of $R/r$, $\mathcal{E}_r$ is a function of $r$ of the form $\alpha r^2 + \beta r + \gamma + \delta/r$. The analysis for a fully unrolled circuit is slightly different, since such circuits do not need registers to store intermediate values. As a result, the total energy consumed by a fully unrolled design does not contain the $E_{Reg,r}$ component. Also, a fully unrolled circuit takes only a single clock cycle to complete an encryption.
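For reference, expanding the product in Equation (1) makes the four-term form explicit. The expansion below is a sketch that treats R/r as exact, i.e. it ignores any rounding of the cycle count.

```latex
\begin{align*}
(Ar^2 + Br + C)\left(1 + \frac{R}{r}\right)
  &= Ar^2 + Br + C + ARr + BR + \frac{CR}{r} \\
  &= \underbrace{A}_{\alpha}\, r^2
     + \underbrace{(B + AR)}_{\beta}\, r
     + \underbrace{(C + BR)}_{\gamma}
     + \frac{\overbrace{CR}^{\delta}}{r}.
\end{align*}
```

When A and C are positive, the quadratic term penalizes large r while the 1/r term penalizes small r, which is why an intermediate degree of unrolling can minimize the energy for some ciphers.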

4 Application of the model

the parameters $E_{Reg}$, $E_{Mux}$, $E_{RF,1}$, $E_{RK,1}$, $E_{rem}$ and b) the energy differentials $dE_{RF}$, $dE_{RK}$, ... by simulating the energy consumption of the round based and 2-round unrolled designs. Using this data, we predict the energy consumption required for one encryption when the number of unrolled rounds is changed. Thereafter, we determine the value of $r$ which achieves the highest energy efficiency. Finally, we compare our predictions with the actual energy consumption estimated using a well recognized gate level power simulator.

Estimation of power consumption (and, as a consequence, energy consumption) can be carried out at different levels. A designer has to trade the desired precision of the estimation against the time (and the level of circuit detail) required for the simulation. A first approximation of power consumption can be achieved by simply counting the number of switches which each node of the circuit makes during a given time period. This approach is extremely fast. However, the accuracy is very limited, as all the gates are assumed to consume the same amount of power. A better estimation can be obtained by collecting the switching activity of the circuit under test, obtained by simulating a significant and sufficiently large test bench, and annotating it back to the power estimation tool. In this way, the number of switches is combined with the power fingerprint of each gate indicated in the technology library, which produces a more precise estimation. The back-annotation of the switching activity can be carried out in different ways. The first and simpler way consists of annotating only the switching activity at the primary inputs. In this case, the tool estimates the switching of the internal gates. A more precise back-annotation involves annotating the exact amount of switching of each gate, as produced by the simulation of a test bench.

It is worth mentioning that the most precise estimation of power consumption is obtained by simulating a circuit at SPICE level. This simulation, however, requires the availability of technological models (which are often not provided by the foundry) and needs a significant amount of time to be carried out. For this reason, SPICE level simulation is not the preferred way to estimate power consumption.

Power consumption estimation is also affected by the point in the design flow where it is carried out. A post-synthesis netlist contains all the information for estimating the power consumed by the gates; however, it does not have information about the interconnecting wires. To obtain it, it is necessary to complete the placement and the routing of the circuit, which is out of the scope of this work. In this work, we are mainly concerned with the energy consumed by the gates. Hence, we carried out the energy estimation using the following design flow: The design was implemented at RTL level. A functional verification of the VHDL code was done using Mentor Graphics ModelSim. Synopsys Design Compiler was used to synthesize the RTL design. The switching activity of each gate of the circuit was collected by running post-synthesis simulation. The average power was obtained using Synopsys Power Compiler, using the back-annotated switching activity. The energy was then computed as the product of the average power and the total time taken for one encryption.

operating frequency was fixed at 10 MHz since we have already established that at sufficiently high frequencies, the energy consumption of a circuit is invariant with frequency. We selected a set of 9 lightweight block ciphers of different design flavors. We classified them into two categories:

a) Iterated ciphers are those all of whose round functions are similar. In this category we have AES 128, Noekeon, Present, Piccolo, TWINE and Simon 64/96. Such ciphers readily fit the model of energy consumption given by Equation (1).

b) Non-iterated ciphers are those whose round functions are not all similar. For example, in the cipher LED 128, the most significant bits and the least significant bits of the 128-bit key are alternately added to the state after every 4 rounds. So, in a round based design, to account for the addition of the round key once every four cycles, one needs to place a multiplexer/AND gate to filter the key every fourth round. However, in a 2-round unrolled design, this filtering is not needed in the second round function. In a 3-round unrolled design, filtering would be needed in all 3 rounds, whereas in a 4-round unrolled design, filtering is not needed in any of the rounds. So, this is a cipher in which the structure of the round function varies widely from one architecture to another, and we call it a non-iterated cipher. Another example is Prince, in which 3 different types of round functions are used in the design itself. Finally we have the KATAN64 block cipher, which is based on a bitwise shift register. Since the cipher is based on a shift register, its functioning is very different from the existing SPN/Feistel designs. Each round consists of the execution of a few simple Boolean functions over only a limited number of bits of the current state. This is why we do not see a compounding of switching activity across rounds. Such ciphers do not readily fit the model for energy consumption as defined in Equation (1). But the core logic remains the same: rounds further away from the register would consume more energy than the ones closer to it.
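The LED 128 observation can be checked mechanically. The sketch below rests on our own simplifying assumptions: LED 128 runs 48 rounds, key material is added before every round whose index is a multiple of 4, and round instance i of an r-round unrolled design executes rounds i, i + r, i + 2r and so on. An instance needs a filter only if it adds the key in some cycles and not in others. All names are hypothetical.

```python
TOTAL_ROUNDS = 48  # assumed round count for LED 128
KEY_PERIOD = 4     # key added once every 4 rounds, as described above


def needs_filter(instance, unrolled_rounds):
    """True if this round instance adds the key in some cycles only."""
    rounds = range(instance, TOTAL_ROUNDS, unrolled_rounds)
    adds_key = [rnd % KEY_PERIOD == 0 for rnd in rounds]
    return any(adds_key) and not all(adds_key)


def filters(unrolled_rounds):
    """Filter requirement for each of the r round instances (0-indexed)."""
    return [needs_filter(i, unrolled_rounds) for i in range(unrolled_rounds)]


for r in (1, 2, 3, 4):
    print(r, filters(r))
```

Under these assumptions the sketch reproduces the pattern described above: a filter in the round based design, a filter only in the first round function for r = 2, filters in all three round functions for r = 3, and none for r = 4.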

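For all the iterated ciphers in our set we measured the values of $E_{Reg,1}$, $dE_{Reg}$, $E_{Mux,1}$, $dE_{Mux}$, $E_{RF,1}$, $dE_{RF}$, $E_{RK,1}$, $dE_{RK}$ and $E_{rem}$, and formulated the expression for the total energy per encryption $\mathcal{E}_r$ as given in Equation (1). The results are shown in Table 3.

| Cipher | Blocksize/Keysize | $E_{Reg,1}$ | $dE_{Reg}$ | $E_{Mux,1}$ | $dE_{Mux}$ | $E_{RF,1}$ | $dE_{RF}$ | $E_{RK,1}$ | $dE_{RK}$ | $E_{rem}$ | $\mathcal{E}_r$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AES 128 | 128/128 | 6.75 | 1.49 | 3.65 | 3.20 | 32.70 | 8.26 | 16.10 | 8.26 | 0.42 | $(6.13 + 6.03r + 20.48r^2) \cdot (1 + \lceil 10/r \rceil)$ |
| Noekeon | 128/128 | 3.26 | 0.86 | 1.67 | 1.81 | 15.53 | 26.98 | 0.00 | 0.00 | 0.88 | $(3.14 + 4.71r + 13.49r^2) \cdot (1 + \lceil 17/r \rceil)$ |
| Present | 64/80 | 2.99 | 0.27 | 0.58 | 0.36 | 1.47 | 1.59 | 0.10 | 0.00 | 0.20 | $(3.15 + 1.40r + 0.795r^2) \cdot (1 + \lceil 32/r \rceil)$ |
| Piccolo | 64/80 | 1.56 | 0.39 | 0.61 | 1.00 | 3.93 | 7.87 | 0.74 | 0.00 | 0.70 | $(1.48 + 2.13r + 3.93r^2) \cdot (1 + \lceil 25/r \rceil)$ |
| TWINE | 64/80 | 3.08 | 0.37 | 0.76 | 0.67 | 1.56 | 2.16 | 0.48 | 0.25 | 0.42 | $(3.23 + 1.82r + 1.25r^2) \cdot (1 + \lceil 36/r \rceil)$ |
| Simon 64/96 | 64/96 | 3.34 | 0.30 | 0.60 | 0.48 | 1.19 | 0.99 | 0.75 | 0.42 | 0.52 | $(3.68 + 2.01r + 0.71r^2) \cdot (1 + \lceil 42/r \rceil)$ |

Table 3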

Noekeon, when operated in direct mode, does not use a key schedule operation and hence the $E_{RK,1}$ and $dE_{RK}$ parameters are both zero for this cipher. Similarly, in Piccolo, the key schedule consists of selecting different portions of the key depending on the current round number and adding a round constant to it. This functionality can be achieved by a set of multiplexers and xor gates for any $r$-round unrolled architecture, and so $dE_{RK} = 0$ for this cipher. The key schedule of Present is such that extremely slow diffusion occurs in the key path. So, the switching activity of the $RK_1$ block does not necessarily compound the switching activity in $RK_2$, and $E_{RK,1} = 0.1$, $dE_{RK} = 0$ is a reasonable approximation for analyzing designs with fewer than 5 unrolled rounds.

By analyzing the expressions for the total energy per encryption $\mathcal{E}_r$ in Table 3, one can conclude that $r = 2$ is the optimal energy configuration for Present, TWINE and Simon 64/96. For AES 128, Piccolo and Noekeon, $r = 1$ is likely to be optimal in terms of energy. In Figure 4, we compare our estimates of the energy consumption for up to the 4-round unrolled implementation, calculated as per the expressions for $\mathcal{E}_r$ in Table 3, with the actual figures. It can be seen that for Present, TWINE and Simon 64/96, our prediction that $r = 2$ is the optimal energy configuration holds. Similarly, our prediction that the round based architecture is the most energy-efficient for AES 128, Noekeon and Piccolo also holds.
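The predictions can be reproduced from the published expressions. The sketch below is illustrative only: it evaluates the Table 3 expressions for r = 1 to 4, assuming the cycle count is one load cycle plus R/r rounded up, and picks the r with the lowest predicted energy. The dictionary layout and function names are ours.

```python
import math

# cipher: (C, B, A, R) taken from the expressions in Table 3
MODELS = {
    "AES 128": (6.13, 6.03, 20.48, 10),
    "Noekeon": (3.14, 4.71, 13.49, 17),
    "Present": (3.15, 1.40, 0.795, 32),
    "Piccolo": (1.48, 2.13, 3.93, 25),
    "TWINE": (3.23, 1.82, 1.25, 36),
    "Simon 64/96": (3.68, 2.01, 0.71, 42),
}


def predicted_energy(cipher, r):
    """Predicted energy per encryption of the r-round unrolled design."""
    c, b, a, rounds = MODELS[cipher]
    per_cycle = a * r * r + b * r + c
    return per_cycle * (1 + math.ceil(rounds / r))


def best_unrolling(cipher, max_r=4):
    """Degree of unrolling (1..max_r) with the lowest predicted energy."""
    return min(range(1, max_r + 1), key=lambda r: predicted_energy(cipher, r))


for name in MODELS:
    energies = [round(predicted_energy(name, r), 1) for r in range(1, 5)]
    print(f"{name:12s} best r = {best_unrolling(name)}  {energies}")
```

For Present, for example, this gives roughly 176.4, 155.2, 174.1 and 193.2 for r = 1 to 4, against simulated values of 172.3, 155.2, 178.8 and 200.0 pJ in Table 4.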

For the non-iterated ciphers, although it is not possible to model the energy consumption of round unrolled designs, the concept holds that successive round functions consume more energy than the previous ones. The simulation results for all ciphers are given in Table 4. It can be seen that the round based configuration of LED 128 is the most energy-efficient. Prince uses three types of round functions: Forward, Middle and Inverse. We implemented 3 architectures for Prince: the round based, the fully unrolled and a half unrolled design in which the Forward/Middle and the Inverse rounds are executed in one cycle each. Again, the round based design was found to be the most energy efficient. Finally, we experimented with the round based and 2, 4, 8, 16 and 32 round unrolled versions of KATAN64. As can be seen in Table 4, the 16-round version was found to consume the least energy.

5 Discussion

[Figure 4: predicted and actual energy (in pJ) of the r-round unrolled designs for r = 1 to 4. Panels: (a) AES 128, (b) Noekeon, (c) Present, (d) Piccolo, (e) TWINE, (f) Simon 64/96]

signal delay in one round. The figures in Table 4 confirm that the ciphers which have low differential energies across successive unrolled architectures are also those in which a signal experiences low delay across a round. These are also the ciphers in which the 2-round unrolled design is more energy efficient. For example, in Present, the critical path is composed of 1 S-box and 1 xor gate. In Simon 64/96, the critical path includes 3 xor gates and a single AND gate. In TWINE, the critical path is made up of 2 xor gates and an S-box. In all other ciphers, the critical path comprises at least one S-box, multiple xor gates and MixColumn layers.

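| # | Cipher | Blocksize/Keysize | Type | Unrolled Rounds | #Cycles | #Rounds for full diffusion | Area (in GE) | Energy (pJ) | Energy/bit (pJ) | Delay per round (ns) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AES 128 | 128/128 | SPN | 1 | 11 | 2 | 12459.0 | 350.7 | 2.74 | 3.32 |
|   |   |   |   | 2 | 6 |   | 22842.3 | 593.4 | 4.64 |   |
|   |   |   |   | 3 | 5 |   | 32731.9 | 1043.0 | 8.15 |   |
|   |   |   |   | 4 | 4 |   | 43641.1 | 1416.5 | 11.07 |   |
|   |   |   |   | 5 | 3 |   | 53998.7 | 1634.4 | 12.77 |   |
| 2 | Noekeon | 128/128 | SPN | 1 | 18 | 2 | 2348.1 | 339.2 | 2.65 | 3.41 |
|   |   |   |   | 2 | 10 |   | 3890.3 | 583.0 | 4.55 |   |
|   |   |   |   | 3 | 7 |   | 5434.9 | 936.7 | 7.32 |   |
|   |   |   |   | 4 | 6 |   | 6946.6 | 1290.6 | 10.08 |   |
| 3 | LED 128 | 64/128 | SPN | 1 | 50 | 2 | 1830.8 | 656.5 | 10.26 | 5.25 |
|   |   |   |   | 2 | 26 |   | 2864.7 | 1216.8 | 19.01 |   |
|   |   |   |   | 4 | 14 |   | 4780.3 | 1638.0 | 25.59 |   |
| 4 | Present | 64/80 | SPN | 1 | 33 | 4 | 1439.9 | 172.3 | 2.69 | 2.09 |
|   |   |   |   | 2 | 17 |   | 1967.9 | 155.2 | 2.43 |   |
|   |   |   |   | 3 | 12 |   | 2499.3 | 178.8 | 2.79 |   |
|   |   |   |   | 4 | 9 |   | 3000.4 | 200.0 | 3.13 |   |
| 5 | Prince | 64/128 | SPN | 1 | 13 | 2 | 2286.5 | 149.1 | 2.33 | 4.06 |
|   |   |   |   | Half | 3 |   | 8245.9 | 358.4 | 5.60 |   |
|   |   |   |   | Full | 1 |   | 7728.6 | 369.5 | 5.77 |   |
| 6 | Piccolo | 64/80 | Feistel | 1 | 26 | 3 | 1492.0 | 178.1 | 2.78 | 3.28 |
|   |   |   |   | 2 | 14 |   | 2385.5 | 282.8 | 4.42 |   |
|   |   |   |   | 3 | 10 |   | 3268.1 | 419.0 | 6.55 |   |
|   |   |   |   | 4 | 8 |   | 4124.7 | 604.8 | 9.45 |   |
| 7 | TWINE | 64/80 | Feistel | 1 | 37 | 8 | 1408.2 | 218.7 | 3.42 | 3.10 |
|   |   |   |   | 2 | 19 |   | 1902.8 | 214.7 | 3.35 |   |
|   |   |   |   | 3 | 13 |   | 2399.5 | 260.0 | 4.06 |   |
|   |   |   |   | 4 | 10 |   | 2850.8 | 318.0 | 4.97 |   |
| 8 | Simon 64/96 | 64/96 | Feistel | 1 | 43 | 4 | 1480.0 | 255.0 | 3.98 | 2.18 |
|   |   |   |   | 2 | 22 |   | 1948.7 | 212.5 | 3.32 |   |
|   |   |   |   | 3 | 15 |   | 2419.0 | 234.0 | 3.65 |   |
|   |   |   |   | 4 | 12 |   | 2875.7 | 268.8 | 4.20 |   |
| 9 | KATAN64 | 64/80 | Shift register | 1 | 255 |   | 983.8 | 913.6 | 14.28 | 2.04 |
|   |   |   |   | 2 | 128 |   | 1055.4 | 481.9 | 7.53 |   |
|   |   |   |   | 4 | 65 |   | 1194.4 | 269.8 | 4.22 |   |
|   |   |   |   | 8 | 33 |   | 1459.6 | 169.1 | 2.64 |   |
|   |   |   |   | 16 | 17 |   | 1992.4 | 140.1 | 2.19 |   |
|   |   |   |   | 32 | 9 |   | 3058.1 | 167.2 | 2.61 |   |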

Table 4: Area, Energy and related figures for all the ciphers


introduced in any one byte/nibble of the state/key to spread to all the bytes/nibbles of the current state/key. This figure directly controls the amount of switching activity across a round, and as Table 4 suggests, the ciphers with low differential energies are also the ones that take more rounds to achieve complete diffusion.
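The two trends read off Table 4 can be checked mechanically. The following Python sketch copies the relevant Table 4 entries and groups the ciphers by whether the 2-round unrolled design consumes less energy than the round based one. Using the difference E2 - E1 as a stand-in for the differential energy is an assumption made for this illustration, as are the names `ROWS` and `split_by_gain`. Prince is omitted because Table 4 lists no 2-round design for it.

```python
# Values copied from Table 4: (cipher, delay per round in ns,
# rounds for full diffusion, energy in pJ for r = 1, energy in pJ for r = 2).
# KATAN64 has no diffusion entry in the table, hence None.
ROWS = [
    ("AES 128",     3.32, 2,    350.7, 593.4),
    ("Noekeon",     3.41, 2,    339.2, 583.0),
    ("LED 128",     5.25, 2,    656.5, 1216.8),
    ("Present",     2.09, 4,    172.3, 155.2),
    ("Piccolo",     3.28, 3,    178.1, 282.8),
    ("TWINE",       3.10, 8,    218.7, 214.7),
    ("Simon 64/96", 2.18, 4,    255.0, 212.5),
    ("KATAN64",     2.04, None, 913.6, 481.9),
]


def split_by_gain(rows):
    """Partition ciphers by whether the 2-round design uses less energy."""
    gains, losses = [], []
    for name, delay, diffusion, e1, e2 in rows:
        # E2 - E1 is the illustrative stand-in for the differential energy.
        target = gains if e2 < e1 else losses
        target.append((name, delay, diffusion, round(e2 - e1, 1)))
    return gains, losses


gains, losses = split_by_gain(ROWS)
groups = (("2-round design saves energy", gains),
          ("2-round design costs energy", losses))
for label, group in groups:
    print(label)
    for name, delay, diffusion, diff in group:
        print(f"  {name:12s} delay={delay:.2f} ns  "
              f"diffusion={diffusion}  E2-E1={diff:+.1f} pJ")
```

With the Table 4 values, the ciphers in the first group (Present, TWINE, Simon 64/96 and KATAN64) all have a delay per round of at most 3.10 ns, while those in the second group have a delay of at least 3.28 ns and need at most 3 rounds for full diffusion.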

Overall, if we compare the energy consumption of all the ciphers, we find that the 16-round unrolled implementation of KATAN64 consumes the least energy. A round in KATAN64 is composed of extremely simple Boolean equations, and hence unrolling more rounds of KATAN64 does not always increase the switching across the rounds. Among the SPN/Feistel architectures, the round based implementation of Prince consumes the least energy, as it takes only 13 cycles to complete an encryption and does not employ any key schedule operation. A close second is the 2-round unrolled implementation of Present, followed by the round based implementations of Present and Piccolo. Piccolo benefits from the fact that it does not have a key schedule operation, and hence does not expend any energy on writing values to a key register. Next come the 3-round and 4-round unrolled designs of Present and the 2-round unrolled designs of Simon 64/96 and TWINE. It is also interesting to note that if we use a DSE based S-box, the energy/bit figure of round based AES 128 is quite comparable to those of Prince and Present.
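The ordering described above can be reproduced from the energy column of Table 4. The sketch below is only an illustration: `ENERGY_PJ` and `rank_by_energy` are names chosen here, and KATAN64 is left out because the comparison concerns the SPN and Feistel designs.

```python
# Energy per encryption in pJ, copied from Table 4 (SPN and Feistel designs).
# Inner keys are the "Unrolled rounds" labels used in the table.
ENERGY_PJ = {
    "AES 128":     {"1": 350.7, "2": 593.4, "3": 1043.0, "4": 1416.5, "5": 1634.4},
    "Noekeon":     {"1": 339.2, "2": 583.0, "3": 936.7, "4": 1290.6},
    "LED 128":     {"1": 656.5, "2": 1216.8, "4": 1638.0},
    "Present":     {"1": 172.3, "2": 155.2, "3": 178.8, "4": 200.0},
    "Prince":      {"1": 149.1, "Half": 358.4, "Full": 369.5},
    "Piccolo":     {"1": 178.1, "2": 282.8, "3": 419.0, "4": 604.8},
    "TWINE":       {"1": 218.7, "2": 214.7, "3": 260.0, "4": 318.0},
    "Simon 64/96": {"1": 255.0, "2": 212.5, "3": 234.0, "4": 268.8},
}


def rank_by_energy(table, top):
    """Return the `top` lowest-energy (energy, cipher, unrolling) entries."""
    flat = [
        (energy, cipher, unrolling)
        for cipher, designs in table.items()
        for unrolling, energy in designs.items()
    ]
    return sorted(flat)[:top]


for energy, cipher, unrolling in rank_by_energy(ENERGY_PJ, 8):
    print(f"{cipher:12s} unrolled={unrolling:4s} {energy:6.1f} pJ")
```

The first eight entries are round based Prince, 2-round Present, round based Present, round based Piccolo, 3-round and 4-round Present, and then the 2-round designs of Simon 64/96 and TWINE, which matches the order given in the text.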

6 Conclusion

In this paper, we looked at the energy consumption figures of several lightweight ciphers with different degrees of unrolling. By constructing a model of energy consumption, we showed that the total energy consumed in a circuit during an encryption operation has a roughly quadratic relation with the degree of unrolling. We then applied our model to a number of lightweight ciphers to predict the most energy efficient architecture for each design. Finally, we tried to relate the energy consumption of an arbitrarily unrolled architecture of a circuit to physical parameters like the critical path of a single round function and algorithmic parameters like the number of rounds required to achieve full diffusion.

References

1. Descriptions of SHA-256, SHA-384, and SHA-512. Available at http://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf.

2. L. Batina, A. Das, B. Ege, E. B. Kavun, N. Mentens, C. Paar, I. Verbauwhede, T. Yalçin. Dietary Recommendations for Lightweight Block Ciphers: Power, Energy and Area Analysis of Recently Developed Architectures. In RFIDSec 2013, LNCS, vol. 8262, pp. 103-112, 2013.


4. G. Bertoni, J. Daemen, M. Peeters, G. V. Assche. The Keccak Reference. Available at http://keccak.noekeon.org/Keccak-reference-3.0.pdf.

5. G. Bertoni, M. Macchetti, L. Negri, P. Fragneto. Power-efficient ASIC synthesis of cryptographic S-boxes. In Proceedings of the 14th ACM Great Lakes Symposium on VLSI, pp. 277-281, ACM, 2004.

6. A. Bogdanov, L. Knudsen, G. Leander, C. Paar, A. Poschmann, M. Robshaw, Y. Seurin, C. Vikkelsoe. PRESENT: An Ultra-Lightweight Block Cipher. In CHES 2007, LNCS, vol. 4727, pp. 450-466, 2007.

7. J. Borghoff, A. Canteaut, T. Güneysu, E. B. Kavun, M. Knežević, L. R. Knudsen, G. Leander, V. Nikov, C. Paar, C. Rechberger, P. Rombouts, S. S. Thomsen, T. Yalçin. PRINCE - A Low-Latency Block Cipher for Pervasive Computing Applications - Extended Abstract. In Asiacrypt 2012, LNCS, vol. 7658, pp. 208-225, 2012.

8. C. De Cannière, O. Dunkelman, M. Knežević. KATAN and KTANTAN - a family of small and efficient hardware-oriented block ciphers. In CHES 2009, LNCS, vol. 5747, pp. 272-288, 2009.

9. D. Canright. A very compact S-Box for AES. In CHES 2005, LNCS, vol. 3659, pp. 441-455, 2005.

10. J. Daemen, M. Peeters, G. V. Assche, V. Rijmen. Nessie Proposal: NOEKEON. Available at http://gro.noekeon.org/Noekeon-spec.pdf.

11. J. Daemen, V. Rijmen. The design of Rijndael: AES - the Advanced Encryption Standard. Springer-Verlag, 2002.

12. M. Feldhofer, J. Wolkerstorfer, V. Rijmen. AES Implementation on a Grain of Sand. In IEEE Proceedings of Information Security, vol. 152(1), pp. 13-20, 2005.

13. Z. Gong, S. Nikova, Y.W. Law. KLEIN: a new family of lightweight block ciphers. In RFIDSec 2011, LNCS, vol. 7055, pp. 1-18, 2011.

14. J. Guo, T. Peyrin, A. Poschmann, M. J. B. Robshaw. The LED Block Cipher. In CHES 2011, LNCS, vol. 6917, pp. 326-341, 2011.

15. C. Hocquet, D. Kamel, F. Regazzoni, J.-D. Legat, D. Flandre, D. Bol, F.-X. Standaert. Harvesting the potential of nano-CMOS for lightweight cryptography: an ultra-low-voltage 65 nm AES coprocessor for passive RFID tags. Journal of Cryptographic Engineering, vol. 1(1), pp. 79-86, April 2011.

16. D. Hong, J. Sung, S. Hong, J. Lim, S. Lee, B. Ko, C. Lee, D. Chang, J. Lee, K. Jeong, H. Kim, J. Kim, S. Chee. HIGHT: A New Block Cipher Suitable for Low-Resource Device. In CHES 2006, LNCS, vol. 4249, pp. 46-59, 2006.

17. S. Kerckhof, F. Durvaux, C. Hocquet, D. Bol, F.-X. Standaert. Towards Green Cryptography: a Comparison of Lightweight Ciphers from the Energy Viewpoint. In CHES 2012, LNCS, vol. 7428, pp. 390-407, 2012.

18. M. Knežević, V. Nikov, P. Rombouts. Low-Latency Encryption - Is "Lightweight = Light + Wait"? In CHES 2012, LNCS, vol. 7428, pp. 426-446, 2012.

19. A. Moradi, A. Poschmann, S. Ling, C. Paar, H. Wang. Pushing the Limits: A Very Compact and a Threshold Implementation of AES. In Eurocrypt 2011, LNCS, vol. 6632, pp. 69-88, 2011.

20. A. Satoh, S. Morioka, K. Takano, S. Munetoh. A Compact Rijndael Hardware Architecture with S-Box Optimization. In Asiacrypt 2001, LNCS, vol. 2248, pp. 239-254, 2001.

21. K. Shibutani, T. Isobe, H. Hiwatari, A. Mitsuda, T. Akishita, T. Shirai. Piccolo: An Ultra-Lightweight Blockcipher. In CHES 2011, LNCS, vol. 6917, pp. 342-357, 2011.

