Figure 3.20: Supply voltage Vs energy per operation, 16-bit parallel multiplier
3.4 Comparative Analysis with Subthreshold
The subthreshold design technique enables minimum energy computation by aggressively lowering the supply voltage until a minimum energy point is found, where the energy consumed by dynamic switching per clock cycle is equal to the leakage energy consumed per clock cycle [36, 37]. The principles of the technique were given in Chapter 1, Section 1.4.2. The work proposed in this chapter can be considered an alternative, orthogonal approach to the subthreshold technique for achieving low performance operation whilst maximising energy efficiency. This section investigates the subthreshold operation of two of the test circuits, namely the 16-bit parallel multiplier and the ARM Cortex-M0, with the aim of establishing the performance of sub-clock power gating relative to the subthreshold technique. The two test circuits were implemented using the same 90nm technology library used in Section 3.3. However, to be representative of the constraints typically enforced on gate libraries used when designing subthreshold circuits, gates with transistor stacks greater than 3 were banned from synthesis [50, 52]. The experimental flow in Fig. 3.7 was followed to obtain the power results, and the HSpice simulation stage was conducted for a range of supply voltages. For each supply voltage the circuit was first simulated once at a low clock frequency to obtain the critical path length of the circuit, and then again at the corresponding maximum operating frequency. The netlists were full parasitic netlists, but the transistor models were kept ideal: the effects of process variation such as threshold voltage, supply voltage and channel length variation, which can affect both the reliability and performance of a subthreshold circuit [36, 37, 53], were omitted, resulting in ideal subthreshold results. Since a single HSpice simulation could take up to 24 hours, this simplification kept turnaround time reasonable.
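To illustrate why sweeping the supply voltage exposes a minimum energy point, the following is a minimal sketch based on a commonly used first-order model, not on the HSpice flow used in this work. It assumes that dynamic energy scales with Vdd squared and that leakage energy per cycle grows exponentially as Vdd falls, because the cycle time lengthens in the subthreshold region. All parameter values (`c_eff`, `leak_coeff`, `n`, `v_thermal`) are hypothetical and chosen only so that the minimum falls inside the swept range.

```python
import math


def energy_per_op(vdd, c_eff=20e-12, leak_coeff=800.0, n=1.5, v_thermal=0.026):
    """Return (dynamic, leakage) energy per operation in joules.

    First-order model with hypothetical parameters:
      dynamic energy ~ c_eff * vdd^2
      leakage energy ~ leak_coeff * c_eff * vdd^2 * exp(-vdd / (n * v_thermal))
    The exponential term stands in for the longer cycle time at low vdd.
    """
    e_dyn = c_eff * vdd ** 2
    e_leak = leak_coeff * c_eff * vdd ** 2 * math.exp(-vdd / (n * v_thermal))
    return e_dyn, e_leak


def find_min_energy_point(v_min=0.2, v_max=1.0, step=0.01):
    """Sweep vdd from v_min to v_max and return (vdd, total energy) at the minimum."""
    best_vdd, best_energy = None, float("inf")
    steps = int(round((v_max - v_min) / step))
    for i in range(steps + 1):
        vdd = v_min + i * step
        total = sum(energy_per_op(vdd))
        if total < best_energy:
            best_vdd, best_energy = vdd, total
    return best_vdd, best_energy


if __name__ == "__main__":
    vdd, energy = find_min_energy_point()
    e_dyn, e_leak = energy_per_op(vdd)
    print(f"minimum energy point: {vdd * 1e3:.0f} mV, {energy * 1e12:.2f} pJ")
    print(f"  dynamic {e_dyn * 1e12:.2f} pJ, leakage {e_leak * 1e12:.2f} pJ")
```

Below the minimum, the leakage term dominates and total energy rises again, which is the inflection seen in the simulated energy curves. The lower sweep bound of 200mV mirrors the lowest functional voltage reported in this section; everything else in the sketch is an assumption.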
Fig. 3.20 shows the energy per operation against supply voltage of the 16-bit multiplier when using subthreshold design. In Table 3.5, the sub-clock power gating technique using a maximised duty cycle has been compared against the subthreshold technique at three different power/performance points. The first compares SCPG against subthreshold operation at the clock frequency achievable at the minimum operational Vdd of the subthreshold circuit. The minimum voltage at which the subthreshold circuit was still functional was 200mV, where the maximum attainable frequency was 1.7MHz, consuming 4.47pJ per operation. In a sub-clock based design at a clock frequency of 2MHz, the circuit consumes 11.11pJ per operation, which is 2.5x more energy (Table 3.5). The second comparison is made at the clock frequency attainable at the minimum energy point of the subthreshold circuit. HSpice simulation found that the minimum energy point of the subthreshold circuit, 1.7pJ per operation, is obtained at a supply voltage of 310mV, corresponding to an operating frequency of approximately 10MHz. At 10MHz the sub-clock power gated 16-bit multiplier consumes 5.64pJ per operation, representing a 3.3x increase in energy. The final comparison takes the power consumption of 17µW at the minimum energy point as the power budget of the circuit. For the SCPG multiplier, a power budget of 17µW dictates operation between 1-2MHz consuming 14.18pJ per operation; a 5x reduction in performance and a 6.5x increase in energy.

Table 3.5: Comparison of sub-clock power gating relative to subthreshold operation performance points, 16-bit multiplier

                         Subthreshold                  Proposed SCPG-Max
Comparison Point         Freq.    Power    Energy      Freq.    Power    Energy
                         (MHz)    (µW)     (pJ)        (MHz)    (µW)     (pJ)
Freq @ Min. Vdd          1.7      7.6      4.47        2        22.22    11.11
Freq @ Min. Energy       10       17       1.7         10       56.4     5.64
Power @ Min. Energy      10       17       1.7         1-2      14.18    14.18
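The figures in Table 3.5 are linked by the relation energy per operation = average power / clock frequency (µW divided by MHz gives pJ). As a minimal sketch, the snippet below reproduces the energy values and the iso-frequency ratios quoted for the 16-bit multiplier. The dictionary layout and function names are illustrative assumptions; the numbers are taken from Table 3.5.

```python
# (frequency in MHz, power in µW, energy in pJ), copied from Table 3.5.
SUBTHRESHOLD = {
    "min_vdd": (1.7, 7.6, 4.47),
    "min_energy": (10.0, 17.0, 1.7),
}
SCPG = {
    "min_vdd": (2.0, 22.22, 11.11),
    "min_energy": (10.0, 56.4, 5.64),
}


def energy_pj(freq_mhz, power_uw):
    """Energy per operation in pJ: µW / MHz = pJ."""
    return power_uw / freq_mhz


def energy_ratio(point):
    """How many times more energy SCPG uses than subthreshold at a given point."""
    return SCPG[point][2] / SUBTHRESHOLD[point][2]


if __name__ == "__main__":
    for name, (freq, power, energy) in SUBTHRESHOLD.items():
        print(f"subthreshold {name}: {energy_pj(freq, power):.2f} pJ (table: {energy} pJ)")
    for point in ("min_vdd", "min_energy"):
        print(f"SCPG / subthreshold energy at {point}: {energy_ratio(point):.1f}x")
```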
Fig. 3.21 shows the energy per operation at different supply voltages for the Cortex-M0 using subthreshold operation. The same trend is observed as for the 16-bit multiplier, with an inflection at the minimum energy point. Note, however, that the increased density of logic in this circuit pushes the minimum energy point towards a higher supply voltage. This is because the leakage energy of the increased number of gates dominates at a higher clock frequency, a common observation in larger subthreshold circuits [37]. The same three power/performance comparisons made between the subthreshold and SCPG multiplier circuits have been made for the Cortex-M0 and are shown in Table 3.6. The minimum operating voltage of the subthreshold circuit was 200mV, at a clock frequency of 1.4MHz, consuming 31.15pJ per operation. This is a 3.3x lower energy consumption than using a sub-clock based Cortex-M0 at 2MHz, as shown in Table 3.6. For the second comparison, simulation locates the minimum energy point of the subthreshold circuit at a supply voltage of 450mV, corresponding to an operating frequency of 24MHz, consuming 12.01pJ per operation or an average power of 288.24µW. In this case, a direct comparison cannot be made, as a sub-clock power gated Cortex-M0 cannot be operated at a clock frequency of 24MHz. However, the third comparison, which assumes the same power budget as the subthreshold circuit’s minimum energy point, shows that the sub-clock power gated Cortex-M0 can be operated at 5MHz consuming 57.96pJ per operation, a 5x reduction in performance and 4.8x increase in energy, as shown in Table 3.6.

Figure 3.21: Supply voltage Vs energy per operation, Cortex-M0 (x-axis: Supply Voltage (mV); y-axis: Energy per Operation (pJ))

Table 3.6: Comparison of sub-clock power gating relative to subthreshold operation performance points, Cortex-M0

                         Subthreshold                  Proposed SCPG-Max
Comparison Point         Freq.    Power    Energy      Freq.    Power    Energy
                         (MHz)    (µW)     (pJ)        (MHz)    (µW)     (pJ)
Freq @ Min. Vdd          1.4      43.61    31.15       2        22.22    11.11
Freq @ Min. Energy       24       288.24   12.01       -        -        -
Power @ Min. Energy      24       288.24   12.01       5        289.79   57.96
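The iso-power comparison for the Cortex-M0 can be checked with a short calculation. The sketch below takes the two operating points from the "Power @ Min. Energy" row of Table 3.6 and derives the performance reduction and energy increase; the `OperatingPoint` type and the function name are illustrative assumptions, not part of the original flow.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    freq_mhz: float
    power_uw: float
    energy_pj: float


def iso_power_comparison(sub: OperatingPoint, scpg: OperatingPoint) -> dict:
    """Compare an SCPG point against a subthreshold point at a similar power."""
    return {
        # Relative difference between the two power figures (0.01 = 1%).
        "power_mismatch": abs(scpg.power_uw - sub.power_uw) / sub.power_uw,
        # How many times slower the SCPG design runs.
        "slowdown": sub.freq_mhz / scpg.freq_mhz,
        # How many times more energy per operation the SCPG design uses.
        "energy_increase": scpg.energy_pj / sub.energy_pj,
    }


if __name__ == "__main__":
    # Values from the "Power @ Min. Energy" row of Table 3.6.
    subthreshold = OperatingPoint(24.0, 288.24, 12.01)
    scpg = OperatingPoint(5.0, 289.79, 57.96)
    result = iso_power_comparison(subthreshold, scpg)
    print(f"power mismatch:  {result['power_mismatch'] * 100:.2f}%")
    print(f"slowdown:        {result['slowdown']:.1f}x")
    print(f"energy increase: {result['energy_increase']:.1f}x")
```

The two power figures agree to within about 1%, so the comparison is effectively at a fixed power budget; the slowdown of 4.8x is the figure rounded to 5x in the text.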
This analysis of the ideal subthreshold test cases shows, as expected, that the subthreshold technique offers better energy efficiency than sub-clock power gating, since it enables minimum energy computation. As mentioned in Chapter 2, Section 2.5, though, the subthreshold technique has a number of design challenges associated with it due to the ultra-low operating voltages. The circuit is more sensitive to process variations, such as variations in threshold and supply voltages [36, 37], requiring careful design considerations including custom or modified gate libraries [50, 52] and custom tools for characterisation of gate library cells and timing analysis [51–53]. The sub-clock power gating technique, on the other hand, operates sufficiently above the threshold voltage, maintaining greater stability under process variations, and is fully compatible with a standard power gating design flow, using commercially available standard cell libraries and EDA tools with little additional design effort. In addition to the design challenges of using the subthreshold regime, subthreshold circuits are optimised for operation at ultra-low supply voltages and low operating frequencies only. Sub-clock power gating conversely provides a performance/power trade-off. The performance of the circuit can easily be changed between various clock frequencies whilst minimising leakage energy. Additionally, the nOverride signal enables the circuit to achieve normal timing, which can be particularly useful in devices such as the MSP430, which utilises clock frequencies in the range of 32kHz to 8MHz.
3.5 Concluding Remarks
Leakage power is a major concern in current and future nanometer technologies, and its reduction is key to improving the energy efficiency of integrated circuits. In applications that demand low to moderate performance, leakage becomes a major contributor to active mode power because the combinational logic sits idle for part of each clock period. This chapter has addressed this problem with power gating, which can be used to cut leakage power dissipation within the clock period and improve the overall energy efficiency of an integrated circuit operating at low clock frequencies.
This is the first investigation into employing power gating within the clock period to minimise leakage power during the active mode. Simulation of a 16-bit multiplier, an ARM Cortex-M0 microprocessor and the Event Processor [63], using a 90nm technology library, shows that considerable savings are achievable with the sub-clock power gating technique. Power gating is applied only to the combinational logic, while the sequential logic is kept always-on during the active mode. The power gates are controlled by the clock signal, which cuts the power when the clock is high and enables it when the clock is low. It is shown that by taking control of the duty cycle and extending the high phase of the clock, leakage power saving can be maximised by capitalising on all the combinational logic idle time within the clock period. The sub-clock power gating technique is fully compatible with a standard power gating design flow using commercially available gate libraries and EDA tools.
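The effect of extending the high phase of the clock can be sketched with a simple idealised model. It assumes that the combinational logic needs a fixed powered-on time per cycle (`t_active_s`, covering wake-up and evaluation), that leakage is zero while gated, and that the power gates themselves add no energy overhead. These assumptions and all parameter names and values are hypothetical and are not taken from the results of this chapter.

```python
def max_gated_fraction(clock_hz, t_active_s):
    """Largest fraction of the clock period for which the logic can be gated.

    The high (gated) phase is extended until only t_active_s of powered-on
    time remains in the low phase. Returns 0.0 if the period is too short.
    """
    period_s = 1.0 / clock_hz
    if t_active_s >= period_s:
        return 0.0
    return 1.0 - t_active_s / period_s


def leakage_energy_per_cycle(p_leak_w, clock_hz, gated_fraction):
    """Leakage energy per cycle in joules, assuming no leakage while gated."""
    return p_leak_w * (1.0 - gated_fraction) / clock_hz


if __name__ == "__main__":
    p_leak = 10e-6     # hypothetical combinational leakage power, 10 µW
    t_active = 100e-9  # hypothetical powered-on time per cycle, 100 ns
    for f in (100e3, 1e6, 5e6):
        half = leakage_energy_per_cycle(p_leak, f, 0.5)
        best = leakage_energy_per_cycle(p_leak, f, max_gated_fraction(f, t_active))
        print(f"{f / 1e6:>4.1f} MHz: 50% duty {half * 1e12:.2f} pJ, "
              f"maximised duty {best * 1e12:.2f} pJ")
```

Under these assumptions the benefit of a maximised duty cycle grows as the clock frequency falls, since a larger share of each period is idle time that can be gated.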