Evaluation of Reconfigurable Systems - Models, Design Methods and Tools for Improved Partial Dy

pression schemes can also be implemented at low hardware cost for the decompression functionality. The approaches are difficult to compare directly because no set of standard bitstreams exists. Even so, the best known method, by Malik et al. [55], compresses the configuration data to as little as 10 % of its original size. A more recent study on Virtex-4 FPGAs [86] also supports the efficiency of these compression methods.

The authors of the compression schemes also made an important observation that supports the approaches presented in our work. They observed that inter-configuration data compression becomes much more efficient if consecutive configurations contain parts that remain static between them. Later, we describe in detail how this can be achieved with an automated design flow.
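The following minimal Python sketch makes the observation above concrete by compressing a configuration relative to its predecessor. It XORs the two images so that the shared static part becomes runs of zero bytes, then run-length encodes the result. This is an illustrative toy that assumes both configurations are equally sized byte images. It is not the method of Malik et al. [55] or of [86], and the function names (`xor_diff`, `rle_encode`, `apply_diff`) are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (byte value, run length)


def xor_diff(prev: bytes, curr: bytes) -> bytes:
    """Byte-wise XOR of two equally sized configuration images.
    Bytes that are identical in both configurations become 0x00."""
    if len(prev) != len(curr):
        raise ValueError("configurations must have the same length")
    return bytes(a ^ b for a, b in zip(prev, curr))


def rle_encode(data: bytes) -> List[Run]:
    """Plain run-length encoding into (value, count) pairs."""
    runs: List[Run] = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def rle_decode(runs: List[Run]) -> bytes:
    """Expand (value, count) pairs back into a byte string."""
    return bytes(value for value, count in runs for _ in range(count))


def apply_diff(prev: bytes, runs: List[Run]) -> bytes:
    """Recover the next configuration from the previous one and the encoded diff."""
    return xor_diff(prev, rle_decode(runs))


# Hypothetical 1 KiB configuration images: B differs from A in 8 bytes only;
# the remaining bytes play the role of the static part shared by both.
config_a = bytes(range(256)) * 4
buf = bytearray(config_a)
buf[100:108] = b"\xAA" * 8
config_b = bytes(buf)

direct_runs = rle_encode(config_b)                    # compress B on its own
diff_runs = rle_encode(xor_diff(config_a, config_b))  # compress B relative to A
print(len(direct_runs), len(diff_runs))               # → 1017 10
assert apply_diff(config_a, diff_runs) == config_b
```

The larger the static part shared by consecutive configurations, the longer the zero runs in the XOR image and the fewer runs need to be stored.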

2.6 Evaluation of Reconfigurable Systems

There are several approaches to evaluating the efficiency of reconfigurable computing systems. Efficiency can be measured with respect to different system parameters: energy, area, and execution latency. Here we only summarize models that specifically target reconfigurable computing architectures. For a discussion of how reconfiguration overhead is incorporated into runtime management, refer to Section 2.4.4.

2.6.1 Energy Efficiency Models

A major concern in today’s computing systems is energy consumption. In line-powered high-performance computing systems, energy consumption mainly causes thermal design challenges. In battery-powered devices (mobile computing, wireless sensor networks, etc.), energy consumption determines the required battery capacity and the system runtime. Thus, the computational requirements must be fulfilled with a limited amount of energy. In instruction stream processors, both the operations and the data to be processed are controlled by a continuous instruction stream. Hence, the average energy efficiency can be given in million instructions per second per watt (MIPS/W).
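Here is a worked example with hypothetical numbers. A processor sustaining 2000 MIPS while dissipating 4 W achieves 500 MIPS/W. Because 1 W = 1 J/s, the metric is equivalent to instructions per joule. Its reciprocal is the average energy per instruction, written E_instr here (a symbol introduced only for this illustration):

```latex
\frac{2000\ \text{MIPS}}{4\ \text{W}} = 500\ \frac{\text{MIPS}}{\text{W}}
  = 500 \cdot 10^{6}\ \frac{\text{instructions}}{\text{J}}
  = 5 \cdot 10^{8}\ \frac{\text{instructions}}{\text{J}}
\quad\Rightarrow\quad
E_{instr} = \frac{1}{5 \cdot 10^{8}}\ \text{J} = 2\ \text{nJ}
```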

In [41], the reconfigurable architectures energy throughput ratio (RETR) is defined to quantify the energy efficiency of reconfigurable computing. The metric separates the energy spent on reconfiguring the hardware from the energy spent on the data processing itself. It is given by:

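$$\mathrm{RETR} = \frac{E_{ex} + E_{rec}}{T} \qquad (2.1)$$

$$\mathrm{RETR} = \left( \frac{C_{ex}}{(N_a \Phi_{op} u p)^2} + \frac{C_{rec}\,\alpha}{R\, N_a \Phi_{op} u p} \right) \frac{V_{DD}^2}{f_{clk}} \qquad (2.2)$$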

The terms E_ex and E_rec denote the average energy consumption per operation for data processing and reconfiguration, respectively, and T is the throughput of the architecture. The terms in Equation 2.2 are defined as follows:

C_ex and C_rec are the average effective switched capacitance during data processing and reconfiguration, respectively. V_DD is the supply voltage and f_clk is the system's clock frequency. N_a, Φ_op, u and p are the number of available operational resources, the operator's performance, the utilization factor of the operators and the penalty in execution delay caused by reconfiguration. α denotes the effective reconfiguration activity, and R is the ratio of performed operations to reconfigured operations.

In his work, Hinkelmann draws several notable conclusions about the energy efficiency of reconfigurable systems from the metric in Equation 2.2. His main argument is that both the throughput and the reconfiguration of the device must be optimized to increase energy efficiency.
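The following Python sketch evaluates Equation 2.2 for given parameter values. It rests on two assumptions:

- Φ_op is measured in operations per clock cycle.
- p is a throughput factor in (0, 1], where p = 1 means no reconfiguration penalty.

Under these assumptions the throughput is T = N_a·Φ_op·u·p·f_clk, and both terms share the common factor V_DD²/f_clk. The class and function names are illustrative, and the parameter values in the usage lines are normalized and hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class RetrParams:
    c_ex: float    # C_ex: effective switched capacitance, data processing
    c_rec: float   # C_rec: effective switched capacitance, reconfiguration
    v_dd: float    # V_DD: supply voltage
    f_clk: float   # f_clk: clock frequency
    n_a: float     # N_a: number of available operational resources
    phi_op: float  # Phi_op: operator performance (assumed: operations per cycle)
    u: float       # u: utilization factor of the operators
    p: float       # p: reconfiguration delay penalty (assumed: factor in (0, 1])
    alpha: float   # alpha: effective reconfiguration activity
    r: float       # R: performed operations per reconfigured operation


def retr(q: RetrParams) -> float:
    """Energy-throughput ratio per Equation 2.2 in the form
    (C_ex/(N_a*Phi_op*u*p)^2 + C_rec*alpha/(R*N_a*Phi_op*u*p)) * V_DD^2 / f_clk.
    Lower values indicate better energy efficiency."""
    ops_per_cycle = q.n_a * q.phi_op * q.u * q.p
    data_term = q.c_ex / ops_per_cycle ** 2
    rec_term = q.c_rec * q.alpha / (q.r * ops_per_cycle)
    return (data_term + rec_term) * q.v_dd ** 2 / q.f_clk


# Normalized, hypothetical parameter set.
base = RetrParams(c_ex=1.0, c_rec=1.0, v_dd=1.0, f_clk=1.0, n_a=1.0,
                  phi_op=1.0, u=1.0, p=1.0, alpha=1.0, r=1.0)
print(retr(base))                  # → 2.0
print(retr(replace(base, r=2.0)))  # → 1.5  (fewer reconfigurations per operation)
print(retr(replace(base, p=0.5)))  # → 6.0  (execution delayed by reconfiguration)
```

The usage lines mirror two of the conclusions below. Raising R lowers the reconfiguration term, while a reconfiguration delay penalty (p < 1) worsens both terms.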

Hinkelmann’s conclusions are as follows:

Increasing the number of available resources N_a also increases C_ex and C_rec, but more resources allow throughput to be increased. At the same time, the energy efficiency of reconfiguration is expected to decrease.

If the throughput Φ_op of the operators is increased, the overall throughput increases, too. At the same time, reconfiguration takes place more often.

The overall throughput can also be increased if execution is not delayed by reconfiguration. This can be achieved by reconfiguring sparingly, by enabling fast reconfiguration, or by performing reconfiguration and execution in parallel.

Reconfiguration efficiency can be increased if the redundancy in the configuration data is exploited for reconfiguration.

Another method to increase reconfiguration efficiency is to separate the configuration that must be reconfigured frequently from the configuration that remains constant over longer time periods [68].

The granularity of reconfiguration is also important. If it is too high, reconfiguration becomes less efficient when only a few operators must be reconfigured. On the other hand, a low reconfiguration granularity increases the reconfiguration cost per operator, i.e. C_rec is increased.

In multi-context reconfigurable architectures, the configurations are cached on-chip. This increases reconfiguration efficiency in two ways. It decreases the execution delay required for reconfiguration, and it reduces the energy consumed for loading configuration data from off-chip memory.

The number of reconfigurations, and hence the energy for reconfiguration E_rec, also depends on the available resources N_a. If the same functionality is implemented in a device with fewer resources, reconfigurations occur more frequently, which increases E_rec.


2.6.2 Area Efficiency Models

DeHon [24] has developed another approach to measuring computational efficiency. His model focuses on the area efficiency of a reconfigurable computer for general-purpose computing. He proposes the so-called RP-space model, which allows him to compute several efficiency measures. The functional density F_density is defined as the number of gate evaluations N_ge (e.g. of 4-input LUTs) per unit space-time t_cycle · A:

$$F_{density} = \frac{N_{ge}}{t_{cycle} \cdot A} \qquad (2.3)$$

Similarly, DeHon defines the functional diversity, or instruction density, I_density as the number of distinct function descriptions N_instruction that are present per unit area A:

$$I_{density} = \frac{N_{instruction}}{A} \qquad (2.4)$$
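Equations 2.3 and 2.4 can be evaluated directly, as in the Python sketch below. The device figures are hypothetical: 1000 4-input LUTs, a 5 ns cycle and an area of 2.0 in arbitrary units. The multi-context variant assumes, purely for illustration, that its extra configuration memory grows the area to 2.5 units.

```python
def functional_density(n_ge: float, t_cycle: float, area: float) -> float:
    """Equation 2.3: gate evaluations per unit space-time (t_cycle * A)."""
    return n_ge / (t_cycle * area)


def instruction_density(n_instruction: float, area: float) -> float:
    """Equation 2.4: distinct function descriptions per unit area."""
    return n_instruction / area


luts = 1000    # hypothetical number of 4-input LUTs, each evaluated once per cycle
area = 2.0     # hypothetical single-context area (arbitrary units)
print(functional_density(n_ge=luts, t_cycle=5e-9, area=area))  # roughly 1e11
# Single context: one function description per LUT is present on chip.
print(instruction_density(n_instruction=luts, area=area))       # → 500.0
# Hypothetical 4-context variant: four descriptions per LUT, larger area.
print(instruction_density(n_instruction=4 * luts, area=2.5))    # → 1600.0
```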

The RP-space model describes an estimation function for the required device area that depends on several architectural parameters: the number of processing elements, the datapath width, the number of on-chip instructions (or contexts), the size of the instruction word and the data memory. The total area is composed of the device area allocated to interconnect, instruction memory, data memory and control.

The model provides guidelines for the design of reconfigurable architectures if some of the aforementioned parameters are known for an application domain.

These guidelines help to design an architecture such that both the functional density and the instruction density are acceptable for a large range of applications. DeHon proposes the general rule that the instruction memory should account for one half of the processing cell area.

A major advantage of the RP-space model is that it allows the functional density of reconfigurable architectures to be related to that of other general-purpose computing architectures. It is found that the functional density of FPGAs can be up to 100 times better than that of general-purpose processors in regular, highly pipelined computations.

The model only accounts for the functional diversity that originates from the configuration (multi-context) memory inside the architecture. The increase in functional density that can be achieved with runtime reconfiguration is not covered.

2.6.3 Runtime Efficiency Models

Wirthlin et al. [99] investigate the functional density of statically versus runtime reconfigured circuits. To this end, the reconfiguration time t_rec is introduced into the functional density metric:

$$F_{density,rec} = \frac{N_{ge}}{(t_{cycle} + t_{rec}) \cdot A} \qquad (2.5)$$

$$F_{density,rec} = \frac{N_{ge}}{t_{cycle}(1 + f) \cdot A} \quad \text{with} \quad f = \frac{t_{rec}}{t_{cycle}} \qquad (2.6)$$

The equations above suggest that the ratio of reconfiguration time to execution time determines the functional density. Hence, the smaller the reconfiguration time compared to the execution time, the more prominent the increase in functional density. Note that the advantage of a runtime reconfigurable circuit stems from the possibly smaller area and shorter execution time of a task. The theoretical maximum improvement is achieved if the reconfiguration time can be neglected, i.e. F_density,max = lim_{f→0} F_density,rec.
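The following Python sketch of Equations 2.5 and 2.6 uses normalized, hypothetical values (N_ge = 1, t_cycle = 1, A = 1). It shows how the functional density of the runtime reconfigured circuit approaches its maximum as the configuration ratio f goes to zero. The function name is illustrative.

```python
def functional_density_rec(n_ge: float, t_cycle: float, t_rec: float,
                           area: float) -> float:
    """Equation 2.6: functional density including reconfiguration time,
    written with the configuration ratio f = t_rec / t_cycle.
    Algebraically identical to Equation 2.5, N_ge / ((t_cycle + t_rec) * A)."""
    f = t_rec / t_cycle
    return n_ge / (t_cycle * (1.0 + f) * area)


n_ge, t_cycle, area = 1.0, 1.0, 1.0          # normalized, hypothetical values
f_density_max = n_ge / (t_cycle * area)      # limit f -> 0 (t_rec negligible)
for t_rec in (10.0, 1.0, 0.1, 0.0):
    share = functional_density_rec(n_ge, t_cycle, t_rec, area) / f_density_max
    # share equals 1 / (1 + f) and approaches 1 as f shrinks
    print(f"f = {t_rec / t_cycle:g}: {share:.3f} of F_density,max")
```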

More important is the relationship between the functional densities of the statically and the runtime reconfigured circuit. To be more efficient, the runtime reconfigurable circuit must achieve a functional density at least as high as that of the statically configured circuit, i.e. F_density,rec ≥ F_density. Substituting F_density,max yields:

$$\frac{F_{density,max}}{F_{density}} - 1 \ge f \qquad (2.7)$$
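The substitution behind Equation 2.7 can be spelled out in a few steps. By Equation 2.6, F_density,rec = F_density,max/(1 + f) with F_density,max = N_ge/(t_cycle · A). Here F_density denotes the functional density of the static circuit, and all quantities are positive, so multiplying and dividing by them preserves the inequality:

```latex
F_{density,rec} \ge F_{density}
\iff \frac{F_{density,max}}{1 + f} \ge F_{density}
\iff \frac{F_{density,max}}{F_{density}} \ge 1 + f
\iff \frac{F_{density,max}}{F_{density}} - 1 \ge f
```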

In [99], the authors conclude that, for the runtime reconfigured circuit to be more efficient, the configuration ratio f must not exceed the maximum potential improvement in functional density, F_density,max/F_density − 1. They suggest that the greater the advantage of a runtime reconfigurable circuit over a static circuit, the less important the reconfiguration time becomes as a limitation.
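The break-even condition of Equation 2.7 can be checked with a few lines of Python. This is a minimal sketch with hypothetical, normalized numbers; the function names are illustrative and not taken from [99].

```python
def max_configuration_ratio(f_density_max: float, f_density_static: float) -> float:
    """Left-hand side of Equation 2.7: the largest configuration ratio f for which
    the runtime reconfigured circuit is at least as dense as the static one."""
    return f_density_max / f_density_static - 1.0


def rtr_pays_off(f_density_max: float, f_density_static: float,
                 t_rec: float, t_cycle: float) -> bool:
    """True if F_density,rec >= F_density, i.e. f = t_rec / t_cycle satisfies Eq. 2.7."""
    f = t_rec / t_cycle
    return f <= max_configuration_ratio(f_density_max, f_density_static)


# Hypothetical case: without reconfiguration overhead the runtime reconfigured
# circuit would be 4x denser than the static circuit, so f may be at most 3.
print(max_configuration_ratio(4.0, 1.0))                # → 3.0
print(rtr_pays_off(4.0, 1.0, t_rec=2.0, t_cycle=1.0))   # → True  (f = 2)
print(rtr_pays_off(4.0, 1.0, t_rec=5.0, t_cycle=1.0))   # → False (f = 5)
```

The example reflects the conclusion above. A larger density advantage F_density,max/F_density leaves more room for reconfiguration time before runtime reconfiguration stops paying off.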

2.7 Similarity Based Reduction of

In document Models, Design Methods and Tools for Improved Partial Dynamic Reconfiguration (Page 49-52)