
4.7.3 Bank-Gating Algorithm Comparison

The algorithm that controls when a register bank is disabled affects both the leakage energy of the register file bank and the fraction of bank toggles that are profitable, meaning they exceed the breakeven time. An aggressive algorithm may disable register file banks more frequently, but if those banks are only disabled for a short period of time, the scheme could exhibit a net loss of energy compared to a conventional clock-gated approach.

[Figure 4.13: Gating algorithm comparison for banks of eight registers. Panels: (a) Priority allocation, (b) Fullest allocation. Y-axis: % gate opp (0 to 40); x-axis: individual benchmarks plus F.avg and I.avg; series: imm, rob, wm8.]

Fraction Gated

The first metric affected by the gating algorithm is the average amount of the register file that is disabled. A conservative algorithm will disable fewer banks, keeping some banks enabled to be used by new instructions. Aggressive algorithms will try to disable the register banks as soon as they are empty. Figures 4.13a and 4.13b show the fraction of the register file that is disabled when sweeping the gating algorithm while keeping the allocation algorithm fixed at priority-encoded allocation and fullest allocation, respectively. We do not consider the free-list or most-recent allocations because their power-gating opportunity is significantly constrained by register scattering.

[Figure 4.14: Gating algorithm comparison for banks of eight registers allocated with priority algorithm. Panels: (a) Toggles exceeding breakeven time (y-axis: % breakeven, 0 to 100), (b) Leakage energy per cycle (y-axis: Leakage E/cycle (pJ), 40 to 80). X-axis: individual benchmarks plus F.avg and I.avg; series: imm, rob, wm8, with baseline added in panel (b).]

In terms of fraction gated, the immediate gating algorithm performs best, as banks are disabled as soon as their reference count reaches zero. The disabled bank has the lowest priority to be re-enabled because any active bank will have higher occupancy and be preferred. Watermark-8 disables the smallest fraction because its watermark approach is slower to track changes in program behavior. The ROB-proportional approach performs similarly to the immediate algorithm because ROB occupancy can function as a proxy for register pressure. These trends hold for both Figure 4.13a and Figure 4.13b.
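The immediate policy can be illustrated with a toy model. The sketch below is not the simulator used in this evaluation; the class name, the method names, and the choice to wake the lowest-numbered gated bank are assumptions made for illustration. It tracks one reference count per bank, prefers enabled banks when allocating, and gates a bank the moment its count returns to zero.

```python
class BankedRegisterFile:
    """Toy model (hypothetical): per-bank reference counts with immediate gating."""

    def __init__(self, num_banks=4, regs_per_bank=8):
        self.regs_per_bank = regs_per_bank
        self.count = [0] * num_banks      # live registers held in each bank
        self.gated = [True] * num_banks   # every bank starts power-gated

    def allocate(self):
        """Place one new register and return the index of the bank used."""
        # Assumed priority order: lowest-numbered enabled bank with a free slot.
        for bank, live in enumerate(self.count):
            if not self.gated[bank] and live < self.regs_per_bank:
                self.count[bank] += 1
                return bank
        # No enabled bank has room, so a gated bank is woken as a last resort.
        for bank in range(len(self.count)):
            if self.gated[bank]:
                self.gated[bank] = False
                self.count[bank] = 1
                return bank
        raise RuntimeError("register file is full")

    def release(self, bank):
        """Free one register; gate the bank immediately once it is empty."""
        if self.count[bank] == 0:
            raise ValueError("bank holds no live registers")
        self.count[bank] -= 1
        if self.count[bank] == 0:
            self.gated[bank] = True

    def fraction_gated(self):
        """Fraction of banks that are currently power-gated."""
        return sum(self.gated) / len(self.gated)
```

With four banks of eight registers, nine allocations fill bank 0 and wake bank 1, leaving half of the banks gated. Releasing the single register in bank 1 gates it again at once, with no hysteresis.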

Breakeven and Leakage Energy

[Figure 4.15: Gating algorithm comparison for banks of eight registers allocated with fullest-bank algorithm. Panels: (a) Toggles exceeding breakeven time (y-axis: % breakeven, 0 to 100), (b) Leakage energy per cycle (y-axis: Leakage E/cycle (pJ), 40 to 80). X-axis: individual benchmarks plus F.avg and I.avg; series: imm, rob, wm8, with baseline added in panel (b).]

[Figure 4.16: % Breakeven CDF. Cycles spent disabled in excess of breakeven time for sjeng benchmark and two different allocation and gating configurations. X-axis: gated cycles in excess of T Breakeven (−24 to >120); y-axis: % Toggles (25 to 100); series: fullest,imm; fullest,wm8; most-recent,imm; most-recent,wm8.]

Figures 4.14 and 4.15 apply our energy model from HSPICE simulations to the simulator performance model, associating a per-cycle energy cost for each cycle of power-gating. With this information, we can measure both the fraction of toggles that break even and the per-cycle leakage energy compared to clock-gating the register file bank. These figures show the breakeven time and leakage energy per cycle for each gating algorithm applied with the priority and fullest-bank allocation algorithms. The leakage energy per cycle is measured across the entire workload. The default clock-gating leakage current is applied to cycles when a bank is enabled, as there is still leakage current associated with normal operation of the bank. The delta between clock-gating and power-gating is only applicable during the fraction of the workload when the bank is power-gated.
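The accounting described above can be sketched for a single bank as follows. This is an illustrative reading of the text, not the HSPICE-derived model itself: the function name, the separate per-toggle overhead term, and every numeric value in the usage note are assumptions.

```python
def leakage_energy_per_cycle(total_cycles, gated_cycles, num_toggles,
                             e_clock_gated, e_power_gated, e_toggle):
    """Average leakage energy per cycle for one bank over a whole workload.

    e_clock_gated: leakage energy per cycle while the bank is enabled
    e_power_gated: leakage energy per cycle while the bank is power-gated
    e_toggle:      assumed one-time overhead of a gate/wake pair
    """
    enabled_cycles = total_cycles - gated_cycles
    total_energy = (enabled_cycles * e_clock_gated   # normal clock-gated leakage
                    + gated_cycles * e_power_gated   # reduced leakage while gated
                    + num_toggles * e_toggle)        # cost of switching the bank
    return total_energy / total_cycles
```

For example, with placeholder values of 1000 total cycles, 400 gated cycles, 5 toggles, 1.0 and 0.1 energy units per cycle, and 21.6 units per toggle, the result is 0.748 units per cycle against a clock-gated baseline of 1.0. Because the enabled cycles dominate the sum, differences between gating algorithms are diluted when averaged over the entire workload.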

Figures 4.14a and 4.15a show how the gating algorithm affects bank breakeven rates for register banks allocated using the priority-encoded and fullest-bank algorithms, respectively. Note that these figures show only the efficacy of the gating decision; the reduction in leakage energy cannot be inferred from them. One bank may break even and stay disabled for hundreds of cycles, saving a significant amount of leakage energy compared with clock-gating, while another may miss breaking even by only one cycle without a significant energy cost. In Figure 4.14a, immediate gating is typically the best performer, though watermarked gating works particularly well for lbm and gamess. In these cases, the watermark algorithm's guardband protects against a register bank toggling on and off too frequently. The immediate algorithm does not incorporate hysteresis, but it does take advantage of extra cycles during which a bank can be disabled. In most cases, the watermark guardband is too conservative.
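The gap between breakeven rate and energy saved can be made concrete with a small sketch. It assumes a linear model in which each gated cycle beyond the breakeven time saves a fixed amount of energy and each cycle short of it costs the same amount; the function name and the `saving_per_cycle` parameter are hypothetical. The 24-cycle default is the breakeven time quoted in this section for banks of eight registers.

```python
def summarize_toggles(gated_durations, breakeven=24, saving_per_cycle=1.0):
    """Return (fraction of toggles exceeding breakeven, net energy saved).

    gated_durations: cycles each toggle spent power-gated
    Net saving is measured relative to clock-gating; it is negative for
    toggles that end before the breakeven time.
    """
    profitable = sum(1 for d in gated_durations if d > breakeven)
    fraction = profitable / len(gated_durations)
    net_saving = sum((d - breakeven) * saving_per_cycle for d in gated_durations)
    return fraction, net_saving


# One long toggle and three near misses: low breakeven rate, large saving.
few_long = summarize_toggles([500, 23, 23, 23])
# Four toggles that barely break even: perfect breakeven rate, small saving.
many_short = summarize_toggles([25, 25, 25, 25])
```

Under these assumptions, the first trace breaks even on only 25% of toggles yet saves 473 units, while the second breaks even on every toggle and saves only 4.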

We see the effect of each gating algorithm on leakage energy in Figure 4.14b. This figure shows the leakage energy per cycle for the register file. Integer workloads (right) have lower energy per cycle because a larger fraction of the register file is idle. The effect of the gating algorithm is minimal because this energy is aggregated over the entire execution of the workload, including both when the bank is gated and when the bank is active. These figures show that while the watermark gating algorithm may have more toggles that exceed the breakeven time for some workloads, gating immediately allows a higher fraction of the register file to be power-gated and ultimately has the lowest energy cost.

Figure 4.16 illustrates the interplay of the allocation and gating algorithms and their effect on the register file banks. There is a fixed opportunity to power-gate a register bank resulting from the register occupancy at any given point in the workload. The sjeng workload has a maximum opportunity of approximately 35%, depending on the allocation algorithm (seen in Figure 4.13). Figure 4.16 shows a CDF of the amount of time the register bank is powered down in excess of the breakeven time. The x-axis extends to −24 cycles because it takes 24 cycles for banks of eight registers to break even; the bank is disabled at x = −24 cycles. When the line crosses x = 0, the toggle has finally reached the breakeven time. The time to the right of x = 0 is an energy recovery region, yielding a net reduction in energy compared to clock-gating that bank. Time spent to the left of x = 0 is an energy sink region, where the cost to disable and re-enable the bank exceeds the energy saved relative to clock-gating.
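A CDF of this kind could be computed from a list of per-toggle gated durations roughly as follows. This is a minimal sketch; the function names and the sample durations are assumptions, and only the 24-cycle breakeven time comes from the text.

```python
def excess_cdf(gated_durations, breakeven=24):
    """Return (excess_cycles, cumulative_percent) points for a CDF.

    excess_cycles is the gated duration minus the breakeven time, so a
    toggle that wakes immediately sits at -breakeven and one that exactly
    breaks even sits at 0.
    """
    excess = sorted(d - breakeven for d in gated_durations)
    n = len(excess)
    return [(x, 100.0 * (i + 1) / n) for i, x in enumerate(excess)]


def percent_not_breaking_even(gated_durations, breakeven=24):
    """Percentage of toggles whose gated duration is shorter than breakeven."""
    short = sum(1 for d in gated_durations if d < breakeven)
    return 100.0 * short / len(gated_durations)


# Hypothetical durations (in cycles) for four toggles of one bank.
points = excess_cdf([4, 24, 48, 144])
```

For the sample durations, one of the four toggles falls in the energy sink region to the left of x = 0, so 25% of toggles do not break even.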

For the ‘most-recent’ allocation scheme with watermarked gating (solid line), 80% of the bank power-gate toggles last for less than the breakeven time. This leaves very little opportunity to reduce power. WM8 is the most conservative gating policy, leaving plenty of banks enabled for incoming instructions. This lowers the opportunity for gating, but those banks that are disabled should stay disabled for a longer time. However, with the fullest allocation scheme and immediate power-gating (dashed line), fewer than 50% of toggles fail to break even. The curve is shifted down and to the right, indicating that more time is spent in the energy-recovery region.
