Figure 4.20: Normalised measured power of ARM Cortex-M0 microprocessor with 10kHz clock at varying duty cycle in SCPG mode, Vdd=0.7V
on the same supply voltage as the fabricated Cortex-M0, and a method to record its oscillating frequency was also implemented. Measuring the oscillator's frequency at 1.2V and then again at 0.7V shows that the propagation delay of the NAND gates increases by 13.87x. This equates to the Cortex-M0 critical path increasing from 5ns to approximately 70ns. The charge-up time of the virtual rails in symmetric virtual rail clamping, measured from the oscilloscope traces, is 45ns. To ensure no timing violations occur during operation, a margin is introduced and the low period of the clock is therefore chosen to be 200ns for the measurements in the following section. Using the clock modulator circuit (Fig. 4.13), a 200ns low period corresponds to an external clock frequency of 5MHz, and n can then be programmed according to the desired clock frequency.
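The timing budget described above can be checked with a short calculation. The following Python sketch uses only the figures quoted in the text; the function names are illustrative, and it assumes that the virtual rail charge-up and the critical path evaluation occur back to back within the clock low period, and that the low period equals one period of the external clock.

```python
# Figures quoted in the text.
DELAY_SCALE = 13.87          # NAND gate delay increase from 1.2V to 0.7V
CRITICAL_PATH_1V2_NS = 5.0   # Cortex-M0 critical path at 1.2V
RAIL_CHARGE_NS = 45.0        # measured virtual rail charge-up time
LOW_PERIOD_NS = 200.0        # chosen clock low period


def scaled_critical_path_ns(path_ns=CRITICAL_PATH_1V2_NS, scale=DELAY_SCALE):
    """Critical path at 0.7V, scaling the 1.2V path by the measured factor."""
    return path_ns * scale


def low_period_margin_ns(low_ns=LOW_PERIOD_NS):
    """Slack left in the low period.

    Assumption: charge-up and evaluation are sequential within the low phase.
    """
    return low_ns - (RAIL_CHARGE_NS + scaled_critical_path_ns())


def external_clock_mhz(low_ns=LOW_PERIOD_NS):
    """External clock frequency.

    Assumption: the low period equals one external clock period.
    """
    return 1e3 / low_ns


if __name__ == "__main__":
    print(f"critical path at 0.7V: {scaled_critical_path_ns():.2f} ns")
    print(f"margin in low period:  {low_period_margin_ns():.2f} ns")
    print(f"external clock:        {external_clock_mhz():.1f} MHz")
```

Under these assumptions the scaled critical path is 69.35ns, which the text rounds to 70ns, and roughly 85ns of the 200ns low period remains as margin.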
4.4.3 Sub-Clock Power Gating with Symmetric Virtual Rail Clamping Analysis
Next, the power consumption of the Cortex-M0 with and without the proposed SCPG technique using symmetric virtual rail clamping is compared over a range of clock frequencies. The Dhrystone benchmark [161] previously used in Chapter 3 is used again. The measured results across five test chips are presented in Fig. 4.21 and show that the proposed SCPG technique achieves lower power consumption at all frequency points up to a clock frequency of just over 400kHz. At all of these frequency points, the energy saved (Esav) from using the proposed SCPG technique exceeds the energy overhead (Eoh) of power gating, resulting in the savings seen. However, as clock frequency increases, Esav
[Plot: Avg. Power (µW) against Clock Frequency (kHz, log scale); series: Power Gating Disabled, Proposed SCPG]
Figure 4.21: Dhrystone - Measured power of ARM Cortex-M0 at varying clock frequency, Vdd=0.7V
Table 4.4: Dhrystone - Average measured power and energy over five test chips with power gating disabled (No-PG) & sub-clock power gating (SCPG)
Clock Freq.   No Power Gating            Proposed SCPG              Saving
(kHz)         Power (µW)   Energy (pJ)   Power (µW)   Energy (pJ)   (%)
0.5           8.18         16351         2.69         5385          67.06
1             8.18         8179          2.70         2702          66.96
2             8.18         4091          2.72         1361          66.72
5             8.20         1639          2.79         558.0         65.97
10            8.22         822.3         2.90         290.0         64.73
20            8.27         413.5         3.12         156.0         62.27
50            8.42         168.3         3.78         75.58         55.11
100           8.66         86.61         4.84         48.42         44.10
200           9.15         45.73         6.73         33.64         26.43
250           9.39         37.55         7.57         30.30         19.31
263.2         9.52         36.18         8.02         30.48         15.73
277.8         9.60         34.56         8.28         29.81         13.73
312.5         9.69         31.00         8.57         27.41         11.57
333.3         9.79         29.37         8.89         26.66         9.22
357.1         9.90         27.73         9.24         25.88         6.67
384.6         10.04        26.09         9.65         25.09         3.86
416.6         10.19        24.46         10.11        24.27         0.79
454.5         10.38        22.83         10.65        23.43         -2.62
500           10.60        21.19         11.28        22.55         -6.42
1000          13.01        13.01         17.58        17.58         -35.13
reduces because of the shorter combinational idle time and eventually becomes comparable to Eoh, resulting in the convergence point around 400kHz in Fig. 4.21. At clock frequencies above this convergence point, Eoh > Esav and the power consumed by the Cortex-M0 when using SCPG exceeds that of the Cortex-M0 without power gating. This maximum applicable clock frequency of sub-clock power gating with symmetric virtual rail clamping is up to 400x higher than the convergence point seen with shut down power gating in Fig. 4.19, which was between 1kHz and 2kHz. In the intended applications of sub-clock power gating, if clock frequencies both above and below 400kHz are required, the processor could be switched to no power gating mode by using the nOverride signal (Fig. 4.4) for clock frequencies above 400kHz.
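As an illustration of where the convergence point lies, the two rows of Table 4.4 that bracket the sign change in the saving column can be linearly interpolated. The sketch below is only an estimate: linear interpolation between measured points is an assumption, and the helper names, including the mode-selection function standing in for a decision to assert nOverride, are hypothetical rather than part of the fabricated design.

```python
def break_even_khz(f_lo, saving_lo, f_hi, saving_hi):
    """Estimate the frequency at which the saving crosses zero.

    Assumption: the saving varies linearly between the two measured points.
    saving_lo must be >= 0 and saving_hi must be <= 0 (percent).
    """
    if saving_lo < 0 or saving_hi > 0 or saving_lo == saving_hi:
        raise ValueError("points must bracket a zero crossing")
    fraction = saving_lo / (saving_lo - saving_hi)
    return f_lo + fraction * (f_hi - f_lo)


def should_disable_power_gating(clock_khz, threshold_khz):
    """Hypothetical policy: bypass SCPG above the break-even frequency."""
    return clock_khz > threshold_khz


# Rows from Table 4.4 either side of the sign change in the saving column.
threshold = break_even_khz(416.6, 0.79, 454.5, -2.62)

if __name__ == "__main__":
    print(f"estimated break-even: {threshold:.1f} kHz")
    for f in (200.0, 500.0):
        print(f, "kHz -> disable power gating:",
              should_disable_power_gating(f, threshold))
```

With the Dhrystone averages this places the break-even at roughly 425kHz, consistent with the convergence "just over 400kHz" observed in Fig. 4.21.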
Five test chips were used for the data shown in Fig. 4.21 in order to compare results across multiple dies and, as can be seen, the measurements all follow the same trend. The spread between plotted points can be explained by die-to-die process variation.
The average power and energy per operation across the five test chips are shown in Table 4.4. The final column states the percentage saving achieved when using the proposed technique. As can be seen, the proposed technique saves up to 67% of the energy compared to operation without power gating, and demonstrates sub-clock power gating's ability to improve energy efficiency for a circuit operating at low clock frequencies. At 455kHz and above, where the saving becomes negative, the processor would need to switch to no power gating with the nOverride signal to remain in the lowest energy mode of operation. The measurements were also repeated at 0.8V to investigate variation with voltage, and the results showed the same trend in power saving.
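The relationship between the power and energy columns of Table 4.4 can be sketched as follows. The tabulated energy values appear consistent with average power divided by clock frequency, i.e. energy per clock cycle; this is an inference from the numbers rather than a stated definition, and small differences arise because the tabulated power is rounded. The function names are illustrative.

```python
def energy_per_cycle_pj(power_uw, clock_khz):
    """Energy per clock cycle in pJ.

    Assumption: energy per operation = average power / clock frequency.
    uW / kHz gives nJ, hence the factor of 1000 to reach pJ.
    """
    return power_uw / clock_khz * 1000.0


def saving_percent(energy_no_pg_pj, energy_scpg_pj):
    """Percentage energy saving of SCPG relative to no power gating."""
    return (1.0 - energy_scpg_pj / energy_no_pg_pj) * 100.0


if __name__ == "__main__":
    # 200kHz row of Table 4.4: 9.15uW without power gating, 6.73uW with SCPG.
    e_no_pg = energy_per_cycle_pj(9.15, 200.0)
    e_scpg = energy_per_cycle_pj(6.73, 200.0)
    print(f"No-PG: {e_no_pg:.2f} pJ, SCPG: {e_scpg:.2f} pJ")
    print(f"saving: {saving_percent(45.73, 33.64):.2f} %")
```

For the 200kHz row this gives values within rounding of the tabulated 45.73pJ, 33.64pJ and 26.43%.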
Wireless Sensor Node Program
To investigate the utility of sub-clock power gating in one of the target applications, a second test program is used. The program is an algorithm found in a real wireless sensor node used in the 'Next Generation Energy Harvesting Electronics' project, which tracks the resonant frequency (between 42Hz and 55Hz) for a vibrational energy harvester [164]. The same five chips that were used with the Dhrystone program were used for the tuning program, and the results are presented in Fig. 4.22. The average power values can be seen in Table 4.5. In the real application, an analogue-to-digital converter (ADC) is used to measure the acceleration on an accelerometer at a rate of 2kHz, and every new reading triggers an algorithmic computation on the core processor. Over a set of 1000 samples the processor is capable of calculating the current frequency of vibration, which is used to set a stepper position on the energy harvester. In the fabricated Cortex-M0, an example set of 1000 samples from a 48Hz vibration source is loaded into the SRAM to emulate obtaining a new reading from the ADC. Per ADC sample, the tuning program loops around a maximum of 85 instructions; therefore, at a sampling rate of 2kHz, the Cortex-M0 could operate at 200kHz without missing a new sample. At 200kHz, the processor consumes 45.33pJ/operation without sub-clock power gating and 33.20pJ/operation with sub-clock power gating, representing an improvement in energy efficiency of approximately 1.4x.
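The clock frequency requirement for the tuning program can be sanity-checked with the short sketch below. It assumes, for illustration only, an average of one clock cycle per instruction; the text does not state the cycle count, and some Cortex-M0 instructions take more than one cycle, so the headroom it reports is an upper bound. The function names are illustrative and the energy figures are taken from the 200kHz row of Table 4.5.

```python
def min_clock_khz(instr_per_sample, sample_rate_khz, cycles_per_instr=1.0):
    """Lowest clock frequency that finishes the loop before the next sample.

    Assumption: cycles_per_instr is an average; 1.0 is an optimistic default.
    """
    return instr_per_sample * sample_rate_khz * cycles_per_instr


def efficiency_gain(energy_no_pg_pj, energy_scpg_pj):
    """Ratio of energy per operation without and with SCPG."""
    return energy_no_pg_pj / energy_scpg_pj


if __name__ == "__main__":
    required = min_clock_khz(85, 2.0)      # 85 instructions at 2kHz sampling
    print(f"required clock: {required:.0f} kHz, "
          f"headroom at 200kHz: {200.0 - required:.0f} kHz")
    print(f"energy efficiency gain: {efficiency_gain(45.33, 33.20):.2f}x")
```

Under the one-cycle-per-instruction assumption, the loop needs 170kHz, so a 200kHz clock leaves some headroom; the energy ratio of 45.33pJ to 33.20pJ is about 1.37, which the text rounds to 1.4x.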
[Plot: Avg. Power (µW) against Clock Frequency (kHz, log scale); series: Power Gating Disabled, Proposed SCPG]
Figure 4.22: Tuning Program - Measured power of ARM Cortex-M0 at varying clock frequency, Vdd=0.7V
Table 4.5: Tuning Program - Average measured power and energy over five test chips with power gating disabled (No-PG) & sub-clock power gating (SCPG)
Clock Freq.   No Power Gating            Proposed SCPG              Saving
(kHz)         Power (µW)   Energy (pJ)   Power (µW)   Energy (pJ)   (%)
0.5           8.06         16117         2.65         5300          67.11
1             8.06         8061          2.66         2660          66.99
2             8.07         4035          2.69         1342          66.72
5             8.10         1619          2.76         551.0         65.97
10            8.12         812.4         2.87         286.6         64.72
20            8.18         408.8         3.08         154.2         62.27
50            8.33         166.5         3.74         74.71         55.14
100           8.57         85.74         4.78         47.84         44.20
200           9.07         45.33         6.64         33.20         26.77
250           9.31         37.26         7.48         29.91         19.72
263.2         9.45         35.90         7.91         30.07         16.23
277.8         9.53         34.30         8.17         29.41         14.26
312.5         9.62         30.78         8.45         27.05         12.14
333.3         9.72         29.18         8.77         26.31         9.83
357.1         9.84         27.56         9.12         25.53         7.34
384.6         9.98         25.94         9.52         24.74         4.60
416.6         10.13        24.32         9.97         23.94         1.59
454.5         10.32        22.71         10.50        23.10         -1.74
500           10.54        21.09         11.12        22.24         -5.44
1000          13.01        13.01         17.31        17.31         -33.06
4.4.4 Ground Bounce Analysis
A potential concern when applying power gating is the ground bounce that is induced on the always-on supply rail [3,42]. As discussed in Chapter 1, Section 1.4.1, this ground