An Architect’s View
3.2.2 Power Fundamentals
At the elementary transistor gate (e.g., an inverter) level, total power dissipation can be formulated as the sum of three major components: switching loss, leakage, and short-circuit loss [2–5].
$$\text{Power}_{device} = \tfrac{1}{2}\, C\, V_{dd}\, V_{swing}\, a\, f + I_{leakage} V_{dd} + I_{sc} V_{dd} \tag{3.3}$$
where
$C$ is the output capacitance
$V_{dd}$ is the supply voltage
$f$ is the chip clock frequency
$a$ is the activity factor ($0 < a \le 1$), which determines the device switching frequency
$V_{swing}$ is the maximum voltage swing across the output capacitor, which in general can be less than $V_{dd}$
$I_{leakage}$ is the leakage current
$I_{sc}$ is the short-circuit current
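As a rough illustration of Equation 3.3, the three components can be evaluated numerically. The sketch below is not from the source: the function name `device_power` and all numeric values (10 fF load, 1.8 V supply, 1 GHz clock, and the two currents) are assumptions chosen only to show the relative magnitudes.

```python
def device_power(c, vdd, vswing, a, f, i_leakage, i_sc):
    """Return (switching, leakage, short_circuit, total) power in watts.

    Follows the three-term form of Equation 3.3. Units: farads, volts,
    hertz, amperes. The name and signature are hypothetical.
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("activity factor must satisfy 0 < a <= 1")
    switching = 0.5 * c * vdd * vswing * a * f
    leakage = i_leakage * vdd
    short_circuit = i_sc * vdd
    return switching, leakage, short_circuit, switching + leakage + short_circuit


# Illustrative (assumed) values: 10 fF, 1.8 V, full swing, a = 0.1, 1 GHz,
# 1 nA leakage current, 100 nA short-circuit current.
sw, lk, sc, total = device_power(10e-15, 1.8, 1.8, 0.1, 1e9, 1e-9, 100e-9)
print(f"switching={sw:.3e} W, leakage={lk:.3e} W, short-circuit={sc:.3e} W")
```

With these assumed numbers the switching term is the largest of the three, which matches the older-technology regime the text goes on to describe.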
In the literature, $V_{swing}$ is often approximated to be equal to $V_{dd}$ (or simply $V$ for short), making the switching loss $(1/2)CV^2 a f$. Also, as discussed in Ref. [3], for a prior generation range of $V$, switching loss, $(1/2)CV^2 a f$, was the dominant component, assuming the activity factor to be above a reasonable minimum. So, as a first-order approximation, for the whole chip of the previous generation (e.g., CMOS 180 nm and before), we may formulate the power dissipation to be
$$\text{Power}_{chip} = \frac{1}{2}\left[\sum_i C_i V_i^2 a_i f_i\right] \tag{3.4}$$
where $C_i$, $V_i$, $a_i$, and $f_i$ are unit- or block-specific average values in the most general case; the summation is taken over all blocks or units $i$ at the microarchitecture level (e.g., icache, dcache, integer unit, floating-point unit (FPU), load-store unit, register files, and buses [if not included in individual units]). Also, for the voltage range considered, the operating frequency is roughly proportional to the supply voltage, and the capacitance $C$ remains roughly the same if we keep the same design but scale the voltage. If a single voltage and clock frequency are used for the whole chip, then, because $f \propto V$, the above reduces to
$$\text{Power}_{chip} = V^3\left(\sum_i K_i^{v} a_i\right) = f^3\left(\sum_i K_i^{f} a_i\right) \tag{3.5}$$
If we consider the very worst-case activity factor for each unit $i$, i.e., if $a_i = 1$ for all $i$, then an upper bound on the maximum chip power may be formulated as
$$\text{Max power}_{chip} = K_V V^3 = K_F f^3 \tag{3.6}$$
where $K_V$ and $K_F$ are design-specific constants. Note that an estimation of peak or maximum power is important for determining the packaging and cooling solution required. The larger the maximum power, the more expensive the net cooling solution. Note also that the formulation in Equation 3.6 is overly conservative, as stated. In practice, it is possible to estimate the worst-case achievable maximum for the activity factors. This allows the designers to come up with a tighter bound on maximum power before the packaging decision is made.
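The gap between the block-level sum and the all-ones upper bound can be sketched as follows. The block names, capacitances, and activity factors below are invented for illustration and are not taken from the source; `chip_power` and `max_chip_power` are hypothetical helper names.

```python
def chip_power(blocks):
    """Switching power in watts: 0.5 * sum(C_i * V_i^2 * a_i * f_i).

    `blocks` maps a block name to a tuple (C_i, V_i, a_i, f_i).
    """
    return 0.5 * sum(c * v**2 * a * f for (c, v, a, f) in blocks.values())


def max_chip_power(blocks):
    """Conservative upper bound: force every activity factor to 1."""
    worst = {name: (c, v, 1.0, f) for name, (c, v, _a, f) in blocks.items()}
    return chip_power(worst)


# Assumed example: two blocks sharing one 1.8 V supply and one 1 GHz clock.
blocks = {
    "icache": (2e-9, 1.8, 0.25, 1e9),   # (farads, volts, activity, hertz)
    "fpu":    (4e-9, 1.8, 0.50, 1e9),
}
typical = chip_power(blocks)
bound = max_chip_power(blocks)
print(f"typical = {typical:.2f} W, upper bound = {bound:.2f} W")
```

Replacing the forced value of 1.0 with a measured worst-case achievable activity factor per block would give the tighter bound the text mentions.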
The last equation (Equation 3.6) is what leads to the so-called cube root rule [3], where redesigning a chip to operate at half the voltage (and frequency) results in the power dissipation being lowered to $(1/2)^3$, or one-eighth of the original. This implies the single most efficient method for reducing power dissipation for a processor that has already been designed to operate at high frequency: reduce the voltage (and hence the frequency). There is a limit, however, to how far $V_{dd}$ can be reduced (for a given technology), which has to do with manufacturability and circuit reliability issues. Thus, a combination of microarchitecture and circuit techniques to reduce power consumption, without necessarily employing multiple or variable supply voltages, is of special relevance in the design of robust systems.
In post-180 nm technologies, static (i.e., leakage or standby) power has increasingly become a major, if not the dominating, component of chip power. As discussed in Ref. [6], the three major types of leakage effects are (1) subthreshold, (2) gate, and (3) reverse-biased, drain- and source-substrate junction band-to-band tunneling (BTBT). With technology scaling, each of these leakage components tends to increase drastically. For example, as technology scales downward, the supply voltage ($V_{dd}$) must also scale down to reduce dynamic power and maintain device reliability. However, this requires scaling down the threshold voltage ($V_{th}$) to maintain reasonable gate overdrive and, therefore, performance, which is a function of ($V_{dd} - V_{th}$). Lowering $V_{th}$, in turn, causes substantial increases in leakage current, and therefore standby power, in spite of the lower $V_{dd}$. The subthreshold channel leakage current in an MOS device is governed by the following equation [7]:
$$I_{leakage} = K_w\, W\, 10^{-V_{th}/S} \tag{3.7}$$
where
$K_w$, measured in units of microamps per micron (µA/µm), can be thought of as the width-sensitivity coefficient
$W$ is the device width
$S$ is the subthreshold swing, measured in mV, like $V_{th}$
In Ref. [7], the value of $K_w$ is quoted to be 10. $S$ is a parameter that characterizes the efficiency of a device in turning on or off. It can be shown that the turn-off characteristic of a device is proportional to the thermal voltage ($kT/q$) and depends on the ratio of junction capacitance ($C_j$) to oxide capacitance ($C_{ox}$) [8]. The parameter $S$ can be formulated as
$$S = 2.3\,(kT/q)\,(1 + C_j/C_{ox}) \tag{3.8}$$
This parameter is usually specified in units of millivolts (mV) per decade, and it defines how many millivolts the gate voltage must drop before the drain current is reduced by one decade. The thermal voltage $kT/q$ is equal to 26 mV at room temperature. Thus, at room temperature, the minimum value of $S$ is about 60 mV per decade. This means that an ideal device at room temperature would experience a 10× reduction in drain current for every 60 mV reduction of the gate voltage $V_{gs}$ in the subthreshold region. In the deep submicron era, a typical transistor device has an $S$ value in the range of 85–90 mV per decade.
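The numbers quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the leakage expression is taken to have the commonly quoted form $I = K_w \cdot W \cdot 10^{-V_{th}/S}$ for the equation the text attributes to Ref. [7]; the function names, the 300 K temperature, the 1 µm width, and the 300 mV threshold are illustrative choices, not values from the source.

```python
BOLTZMANN_J_PER_K = 1.380649e-23      # SI exact value
ELECTRON_CHARGE_C = 1.602176634e-19   # SI exact value


def thermal_voltage_mv(temp_k):
    """Thermal voltage kT/q in millivolts."""
    return 1e3 * BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def subthreshold_swing_mv(temp_k, cj_over_cox=0.0):
    """S = 2.3 * (kT/q) * (1 + Cj/Cox), in mV per decade (2.3 ~ ln 10)."""
    return 2.3 * thermal_voltage_mv(temp_k) * (1.0 + cj_over_cox)


def subthreshold_leakage_ua(kw, width_um, vth_mv, s_mv):
    """Assumed form: I = Kw * W * 10^(-Vth / S), in microamps."""
    return kw * width_um * 10.0 ** (-vth_mv / s_mv)


s_ideal = subthreshold_swing_mv(300.0)              # ideal device, Cj/Cox = 0
i_high_vth = subthreshold_leakage_ua(10.0, 1.0, 300.0, 85.0)
i_low_vth = subthreshold_leakage_ua(10.0, 1.0, 215.0, 85.0)  # Vth lowered by S
print(f"ideal S = {s_ideal:.1f} mV/decade")
print(f"leakage ratio after lowering Vth by one S: {i_low_vth / i_high_vth:.1f}")
```

Lowering the threshold by exactly one swing value (85 mV here) raises the modeled leakage by one decade, which is the sense in which a small $V_{th}$ reduction causes a substantial leakage increase.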
Also note that $V_{th}$ (Equation 3.7) is itself a function of temperature ($T$); in fact, $V_{th}$ decreases by 2.5 mV/K as temperature increases. Also, $K_w$ itself is a strong function of temperature ($\propto T^2$). Thus, as $T$ increases, leakage current increases dramatically, both because of its dependence on $T$ and because of the decrease in $V_{th}$. The delay of an inverter gate is given by the alpha-power model [9] as
$$T_g \propto \frac{L_{eff}\, V_{dd}}{\mu(T)\,(V_{dd} - V_{th})^{\alpha}} \tag{3.9}$$
where $\alpha$ is typically around 1.3 and $\mu$ is the mobility of carriers (which is a function of temperature $T$, $\mu(T) \propto T^{-1.5}$). As $V_{th}$ decreases, ($V_{dd} - V_{th}$) increases, so the inverter becomes faster. As $T$ increases, ($V_{dd} - V_{th}$) increases, but $\mu(T)$ decreases [10]. This latter effect dominates; so, with higher temperatures, the logic gates in a processor generally become slower.
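The two competing temperature effects can be compared numerically. The sketch below is an illustration only: it assumes a linear $V_{th}(T)$ with the 2.5 mV/K slope quoted above, mobility normalized to 1 at an assumed 300 K reference, and illustrative values $V_{dd}$ = 1.2 V and $V_{th}$ = 0.3 V. All function names are hypothetical, and the constant $L_{eff}$ factor is dropped because only delay ratios are compared.

```python
def vth_at(vth_ref_v, temp_k, temp_ref_k=300.0, slope_v_per_k=-2.5e-3):
    """Threshold voltage, falling by 2.5 mV per kelvin above the reference."""
    return vth_ref_v + slope_v_per_k * (temp_k - temp_ref_k)


def mobility_rel(temp_k, temp_ref_k=300.0):
    """Carrier mobility relative to the reference, mu(T) ~ T^-1.5."""
    return (temp_k / temp_ref_k) ** -1.5


def gate_delay_rel(vdd_v, vth_ref_v, temp_k, alpha=1.3, temp_ref_k=300.0):
    """Alpha-power delay up to a constant factor (L_eff omitted)."""
    vth = vth_at(vth_ref_v, temp_k, temp_ref_k)
    mu = mobility_rel(temp_k, temp_ref_k)
    return vdd_v / (mu * (vdd_v - vth) ** alpha)


# Assumed operating point: Vdd = 1.2 V, Vth = 0.3 V at 300 K.
cold = gate_delay_rel(1.2, 0.3, 300.0)
hot = gate_delay_rel(1.2, 0.3, 360.0)
print(f"delay at 360 K relative to 300 K: {hot / cold:.3f}")
```

For this assumed operating point the mobility loss outweighs the gain in gate overdrive, so the modeled gate is slower when hot. The balance depends on the chosen $V_{dd}$, $V_{th}$, and $\alpha$, so other values may shift the result.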