
3.3 Computer hardware

3.3.2 Implementation efficiency and different metrics

Performance of FPGA and ASIC implementations is described with three key metrics (dimensions): area, time and power. Other derived metrics are sometimes used, because they make predictions and comparisons between different design options easier.

The primary time metrics of a design are latency, clock period (and its reciprocal, clock frequency) and total time. These terms apply in the same manner to both FPGAs and ASICs. Latency is the time that elapses from the moment the input data is available to the moment the results appear on the outputs [65]. If an algorithm can be realized with a purely combinational circuit (without storage elements), the time complexity equals the delay of the signal along the critical path, where a path is a sequence of interconnects and logic elements. In sequential circuits, the time complexity is given by two parameters: the clock period, which depends on the critical path, and the total time, which is the product of the clock period and the number of clock cycles needed.
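As a minimal sketch of the two sequential-circuit parameters described above, the following Python snippet computes the total time and the clock frequency from a clock period. The function names and the numbers (a 5 ns clock period, 32 cycles) are hypothetical and chosen only for illustration; they do not come from any design in this work.

```python
def total_time_ns(clock_period_ns: float, num_cycles: int) -> float:
    """Total time = clock period * number of clock cycles needed."""
    return clock_period_ns * num_cycles


def clock_frequency_mhz(clock_period_ns: float) -> float:
    """Clock frequency is the reciprocal of the clock period.

    A period in nanoseconds gives a frequency in GHz, hence the factor 1000
    to express the result in MHz.
    """
    return 1000.0 / clock_period_ns


# Hypothetical design: the critical path allows a 5 ns clock period and the
# computation needs 32 clock cycles.
period_ns = 5.0
cycles = 32

print(total_time_ns(period_ns, cycles))   # 160.0 (ns)
print(clock_frequency_mhz(period_ns))     # 200.0 (MHz)
```

For a purely combinational circuit the same helper applies with a single "cycle" whose period is the critical-path delay.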

The throughput measures the amount of data processed per unit of time. Depending on how the data and the time are measured, there are slight variations of throughput. They are presented incrementally, each building on the previous one:

1. data measured in parcels, time measured in clock cycles

$$\mathrm{Tput} = \frac{\#\text{parcels}}{\#\text{clk.cycles}} \quad \left[\frac{\text{parcel}}{\text{cycle}}\right] \tag{3.20}$$

2. data measured in parcels, time measured in seconds

$$\mathrm{Tput} = \frac{\#\text{parcels}}{\#\text{clk.cycles} \cdot \frac{\#\text{seconds}}{\text{clock cycle}}} \quad \left[\frac{\text{parcel}}{\text{second}}\right] \tag{3.21}$$

This form of throughput is obtained by including the clock cycle period, measured in seconds-per-clock cycle.

3. data measured in bits, time measured in seconds

$$\mathrm{Tput} = \frac{\#\text{parcels} \cdot \frac{\#\text{bits}}{\text{parcel}}}{\#\text{clk.cycles} \cdot \frac{\#\text{seconds}}{\text{clock cycle}}} \quad \left[\frac{\text{bits}}{\text{second}}\right] \tag{3.22}$$

This form of throughput is obtained by including the bit-width of the parcel.
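The three throughput variants can be sketched in Python as follows. This is only an illustration of equations (3.20) to (3.22); the function names and the example numbers (64 parcels of 8 bits, 128 cycles, a 10 ns clock period) are assumptions made here, not measurements.

```python
def tput_parcels_per_cycle(parcels: int, cycles: int) -> float:
    """Equation (3.20): parcels per clock cycle."""
    return parcels / cycles


def tput_parcels_per_second(parcels: int, cycles: int,
                            clock_period_s: float) -> float:
    """Equation (3.21): include the clock period (seconds per cycle)."""
    return parcels / (cycles * clock_period_s)


def tput_bits_per_second(parcels: int, bits_per_parcel: int, cycles: int,
                         clock_period_s: float) -> float:
    """Equation (3.22): additionally include the bit-width of the parcel."""
    return parcels * bits_per_parcel / (cycles * clock_period_s)


# Hypothetical run: 64 parcels of 8 bits produced in 128 cycles at a
# 10 ns clock period (100 MHz).
parcels, bits, cycles, period_s = 64, 8, 128, 10e-9

print(tput_parcels_per_cycle(parcels, cycles))             # 0.5
print(tput_parcels_per_second(parcels, cycles, period_s))  # about 5e7
print(tput_bits_per_second(parcels, bits, cycles, period_s))  # about 4e8
```

Each function differs from the previous one by exactly one factor, which mirrors the incremental presentation of the three equations.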

Throughout the literature, throughput is given either in bits-per-cycle (bit/cycle) or in words-per-cycle (word/cycle); the latter corresponds to equation (3.20), with word used as a synonym for parcel or output. For example, during the key initialization algorithm of the WG stream cipher (Subsection 3.2.4), the parcel is an m-bit word, yielding words-per-cycle. During the WG running phase, the parcel is 1 keystream bit, yielding bits-per-cycle. Some literature [44] defines the throughput as bits-per-cycle multiplied by the clock frequency, yielding bits-per-second.

The area complexity in FPGAs is given in terms of the resources used by the design, for example the number of used slices (a collection of LUTs and registers whose concrete configuration depends on the specific FPGA device), LUTs, storage elements, input-output blocks (IOBs), etc. Since both LUTs and registers are contained in slices, the number of slices will be used as the primary area metric for FPGAs. Area complexity for ASICs is measured by the amount of silicon used and can be given either in µm² or in Gate Equivalents (GE). The latter is the area in µm² divided by the area of a two-input NAND gate. GE is the preferred metric over µm², because it is believed to allow very rough comparisons across different fabrication technologies and gate libraries. However, as stressed by [66], the GE metric is technology specific, and direct comparison of area expressed in GEs across different technologies is not possible.
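The GE definition above can be illustrated with a short Python sketch. The two "libraries" and all area values below are purely hypothetical numbers picked for the example; they do not describe any real cell library or synthesis result.

```python
def gate_equivalents(area_um2: float, nand2_area_um2: float) -> float:
    """Area in GE: silicon area divided by the area of a two-input NAND gate."""
    return area_um2 / nand2_area_um2


# Hypothetical synthesis results of the same design in two made-up libraries.
# Library X: design occupies 5000 um^2, NAND2 cell occupies 5.0 um^2.
# Library Y: design occupies 1200 um^2, NAND2 cell occupies 1.0 um^2.
ge_x = gate_equivalents(5000.0, 5.0)
ge_y = gate_equivalents(1200.0, 1.0)

print(ge_x)  # 1000.0
print(ge_y)  # 1200.0
```

In this made-up case the same design yields different GE counts in the two libraries, which is the kind of discrepancy behind the caution expressed in [66].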

Power and energy are becoming more and more significant as metrics for various reasons: they affect battery life, can limit the clock frequency, cause higher temperatures, which in turn reduce the lifetime of the device, and increase the heat dissipated by hand-held devices. In general, total power consumption depends on the number of logic cells in the circuit, the connections between them, the underlying technology and, finally, on the data being processed.

In CMOS circuits, the total power consumption has two components: static power and dynamic power. Dynamic power is proportional to how often the signals change their value and to the clock frequency. It is attributed to the evaluation of logic cell outputs and depends on two factors: the load capacitance of the cell that needs to be charged and the short-circuit current occurring when the output of a cell is switched. The static power is caused by leakage currents and increases with decreasing transistor size. It is roughly proportional to the area [44]. Power and energy are closely related. According to [67], energy is becoming more important for determining the lifetime of battery-operated devices, which are typical targets of lightweight applications. A frequently used metric is energy-per-bit, calculated by dividing the total power consumption by the throughput, both obtained at the same clock frequency [44].
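The energy-per-bit metric can be sketched as below. The function name, the split into static and dynamic power arguments, and the numbers (0.5 mW static, 1.5 mW dynamic, 100 Mbit/s) are assumptions for illustration only; in practice both power and throughput would be obtained at the same clock frequency.

```python
def energy_per_bit_joule(static_power_w: float, dynamic_power_w: float,
                         throughput_bits_per_s: float) -> float:
    """Energy-per-bit = total power / throughput (W / (bit/s) = J/bit)."""
    total_power_w = static_power_w + dynamic_power_w
    return total_power_w / throughput_bits_per_s


# Hypothetical design: 0.5 mW static and 1.5 mW dynamic power while
# producing 100 Mbit/s.
e_bit = energy_per_bit_joule(0.5e-3, 1.5e-3, 100e6)

# Convert to picojoules for readability: roughly 20 pJ/bit.
print(e_bit * 1e12)
```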

Since it is difficult to compare two designs based on more than one metric (for example the clock period and the area), derived metrics are used to measure design efficiency, for example the time-area product or, with power consumption becoming more and more important, the time-area-power product. These two metrics are, just like the clock period and area, “the smaller the better”. However, it is more natural to look for the opposite, the “bigger number”, which is also one of the reasons why frequency is often preferred to clock period. Taking the reciprocal of these two products and keeping throughput in mind, the optimality metrics are derived: the throughput per time-area product $o_1 = \frac{T}{tA} = \frac{Tf}{A}$ and the throughput per time-area-power product $o_2 = \frac{T}{tAP} = \frac{Tf}{AP}$. The value $T$ in $o_1$ and $o_2$ is the throughput measured in parcels-per-cycle (equation (3.20)); $t$ is the clock period, $f = 1/t$ the clock frequency, $A$ the area and $P$ the power. Because power analysis is tedious, power is often approximated by area, giving $\frac{T \cdot f}{A^2}$. This ratio is also preferred to $\frac{Tf}{AP}$ because power analysis is sensitive to differences between cell libraries and to tool configurations [68]. There is yet another viewpoint on these metrics, namely that high throughput comes at the cost of increased area, for example by exploiting the maximum level of parallelism or unrolling an iterative implementation into a pipeline [69], or by increasing the frequency, which in turn increases area and power consumption. Metrics like $\frac{f}{A}$ and $\frac{f}{A^2}$ put the actual improvement achieved by an optimization attempt into better perspective; they emphasize the tradeoffs between throughput and area.
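To make the tradeoff concrete, the sketch below evaluates the optimality metrics for two made-up design points: a serial implementation and an unrolled one. All names and numbers are hypothetical and serve only to show how the ratios behave; they are not results from this work.

```python
def o1(T: float, f: float, A: float) -> float:
    """Throughput per time-area product: T*f / A."""
    return T * f / A


def o2(T: float, f: float, A: float, P: float) -> float:
    """Throughput per time-area-power product: T*f / (A*P)."""
    return T * f / (A * P)


def o_area_squared(T: float, f: float, A: float) -> float:
    """Variant with power approximated by area: T*f / A^2."""
    return T * f / (A * A)


# Hypothetical design points: T in parcels/cycle, f in Hz, A in GE.
serial = {"T": 1.0, "f": 100e6, "A": 1000.0}
unrolled = {"T": 4.0, "f": 80e6, "A": 3500.0}

for name, d in (("serial", serial), ("unrolled", unrolled)):
    raw = d["T"] * d["f"]  # parcels per second
    print(name, raw, o1(**d), o_area_squared(**d))
```

With these assumed numbers the unrolled design has the higher raw throughput, yet both $\frac{Tf}{A}$ and $\frac{Tf}{A^2}$ are lower than for the serial design, because its area grows faster than its throughput.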

In document Automated Design Space Exploration and Datapath Synthesis for Finite Field Arithmetic with Applications to Lightweight Cryptography (Page 67-69)