Current Estimation for CMOS
3.2 Previous Work
Besides the simple (and slow) expedient of running Spice on the circuit [21], other researchers have investigated two approaches to current estimation. The first, described in the next section, uses a pattern-independent approach adopted from timing verification. The second, based on probabilistic simulation, modifies techniques originally pioneered for test-pattern generation. These two methods and the timing simulation approach that I use were developed at approximately the same time, providing an interesting contrast in how competing teams can reach different solutions to the same problem.
3.2.1 Timing Analysis
A timing analyzer such as TV [27] or Crystal [46] might be used to derive a current distribution. Tyagi’s estimator HERCULES [58] uses this approach. The primary advantage of the approach is pattern independence; instead of propagating individual real or boolean signals, the analyzer applies all possible values to the circuit at once.
A simple example is shown in Figure 31. Nodes In0 and In1 are assumed to be stable when the clock φ1 rises. Depending on the values of these signals, either a 1 or a 0 will be propagated to the inputs of the inverters. The timing analyzer considers both cases. Possible waveforms at the circuit’s nodes are shown on the right. (Two other possible waveforms, constant 0 and constant 1, are not shown.) The waveforms for the two inverters are simple; the input signal is simply inverted and delayed at the output. For node N2, the potential waveforms are more complicated. There are three possible cases for each transition, depending on whether nodes N0 and N1 change value. The transition starting time, ending time, and slope all vary depending on what values In0 and In1 initially held; the timing analyzer simply reports that a transition did not start before time t2 and was definitely over by time t5.
[Figure 31: Timing Analysis Example. Circuit with inputs In0 and In1, clock φ1, internal nodes N0, N1, and N2, and transistors M1-M8; possible waveforms for N0, N1, and N2 over times t0-t5.]
The next step is to convert this timing information into a current profile for the circuit. For nodes N0 and N1, this is fairly simple; the starting and ending times for the transitions are known and the average current is simply $V_{dd} C / (t_f - t_i)$. If desired, a more accurate current pulse shaping algorithm, such as Ousterhout’s tabular method [46], may be applied. These current pulses are applied to the power buses at the source connection of transistors M1-M4.
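The average-current calculation for a node with known transition times can be sketched as follows. This is only an illustration: the function name and the numeric values (supply voltage, load capacitance, and transition times) are assumptions chosen for the example, not figures taken from TV or HERCULES.

```python
def average_current(vdd: float, c: float, t_i: float, t_f: float) -> float:
    """Average current (A) needed to swing a capacitance c (F) through
    vdd (V) during a transition that starts at t_i and ends at t_f (s)."""
    if t_f <= t_i:
        raise ValueError("transition must end after it starts")
    return vdd * c / (t_f - t_i)


# Hypothetical values: 5 V supply, 100 fF load, transition from 1 ns to 3 ns.
i_avg = average_current(vdd=5.0, c=100e-15, t_i=1e-9, t_f=3e-9)
```

With these assumed values the result is about 0.25 mA. A pulse-shaping method would replace the rectangular pulse implied here with a more realistic waveform of the same total charge.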
Calculating a pulse for node N2 is more complicated. Both the current pulse’s magnitude and its timing depend on the inputs. For the falling edge, the location of the pulse (either through M7 or M8, or through both) is also input dependent. The most conservative approach is to assume that the maximum current, which occurs when both M7 and M8 are on, flows for the entire interval from t2 to t5.
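A minimal sketch of this conservative rule is shown below. The `CurrentPulse` type, the function name, and the per-case times and currents are hypothetical; the sketch only illustrates taking the earliest start, the latest end, and the largest current over all input-dependent cases.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CurrentPulse:
    start: float  # time the pulse begins
    end: float    # time the pulse ends
    amps: float   # pulse magnitude


def conservative_pulse(cases: Iterable[CurrentPulse]) -> CurrentPulse:
    """Collapse all input-dependent cases into one worst-case pulse:
    the maximum current flowing over the whole uncertainty window."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one candidate pulse")
    return CurrentPulse(
        start=min(p.start for p in cases),
        end=max(p.end for p in cases),
        amps=max(p.amps for p in cases),
    )


# Hypothetical falling-edge cases for N2 (arbitrary time and current units).
n2_cases = [
    CurrentPulse(start=2.0, end=5.0, amps=1.0),  # only M7 conducts
    CurrentPulse(start=2.5, end=5.0, amps=1.0),  # only M8 conducts
    CurrentPulse(start=2.0, end=4.0, amps=2.0),  # both M7 and M8 conduct
]
worst_case = conservative_pulse(n2_cases)
```

Note that the resulting pulse carries more charge than any single case does, which is the source of the pessimism.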
In theory, this approach seems attractive because it is pattern independent, giving the designer a worst-case approximation. In practice, it does not work particularly well. When I modified Jouppi’s timing analyzer, TV, to estimate currents, the voltage drops calculated were greater than the supply voltage of the chip. This was due to a number of causes, the most important of which was overestimation of decoder currents. In the 2-bit decoder of Figure 32, there are eight possible transitions for the four output signals; each may go from high to low or low to high. TV assumes they all occur and produces eight current pulses. In practice, at most two of the transitions will occur in a given cycle; one line will rise and one will fall. For this example, TV will overestimate the current by a factor of 4. Circuits with 32-bit datapaths often have 5-bit decoders; for these, the current estimator will be off by a factor of 32. A decoder’s load capacitance is usually fairly large, making the error from just this one source substantial. In the layout, these decoders are placed next to one another and share common power and ground lines, which are generally sized to support only the two decoders active in a cycle. Putting an order of magnitude more than the actual current through these wires produces some horrendous voltage spikes in the supplies.
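The decoder arithmetic above generalizes to any address width. The short sketch below (the function name is illustrative) assumes equal load on every output, counts the transitions a pattern-independent analyzer would assume, and divides by the two that can actually occur in one cycle.

```python
def decoder_overestimate(address_bits: int) -> int:
    """Factor by which a pattern-independent analyzer overcounts the
    output transitions of an n-bit decoder in a single cycle."""
    if address_bits < 1:
        raise ValueError("decoder needs at least one address bit")
    outputs = 2 ** address_bits
    assumed = 2 * outputs  # every output is assumed to both rise and fall
    actual = 2             # at most one line rises and one line falls
    return assumed // actual


factor_2bit = decoder_overestimate(2)  # the 2-bit decoder of the example
factor_5bit = decoder_overestimate(5)  # a 5-bit decoder
```

The factor equals the number of decoder outputs, so the error doubles with each added address bit.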
There are some other sources of error in this approach. As shown in the example, any gate with a fan-in greater than one will have some uncertainty about the timing, location, and magnitude of its current pulses. To be conservative, the timing analyzer must assume that the maximum current occurs for the entire interval. In a synchronous CMOS design, power usage is spread out across the clock cycle; extending the intervals during which a gate draws current leads to greater overlap between current pulses and an overly conservative current estimate.
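The effect of stretched intervals on the peak current can be illustrated with rectangular pulses. In the sketch below, the function name and the pulse values are hypothetical; it sums pulses given as (start, end, amps) tuples and returns the peak of the total.

```python
from typing import Iterable, Tuple

Pulse = Tuple[float, float, float]  # (start, end, amps)


def peak_total_current(pulses: Iterable[Pulse]) -> float:
    """Peak of the sum of rectangular current pulses."""
    events = []
    for start, end, amps in pulses:
        events.append((start, amps))   # current switches on
        events.append((end, -amps))    # current switches off
    # Sort by time; at equal times, turn-offs (negative) come before
    # turn-ons so that abutting pulses are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    total = 0.0
    peak = 0.0
    for _, delta in events:
        total += delta
        peak = max(peak, total)
    return peak


# Two gates that actually switch one after the other (arbitrary units).
actual = [(0.0, 1.0, 1.0), (1.0, 2.0, 1.0)]
# The same gates with each pulse stretched over the whole uncertainty window.
stretched = [(0.0, 2.0, 1.0), (0.0, 2.0, 1.0)]

peak_actual = peak_total_current(actual)
peak_stretched = peak_total_current(stretched)
```

In this assumed case the stretched pulses double the estimated peak even though no individual gate's current was increased.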
Remedying these problems in a pattern independent manner is difficult. For decoders, the designer could label the outputs as having mutually exclusive transitions; the analyzer would then select which transition produces the highest voltage drop. The success of