Circuits Techniques for Dynamic Power Reduction
10.6 Circuit Technology-Dependent Power Reduction
The physical implementation of a given circuit behavior (e.g., a Boolean function) depends on the chosen technology. In other words, the realized circuit topology and the selected circuit components (e.g., cells from a library) may lead to designs with different hardware features (e.g., chain topology vs. tree topology), which in turn affect the circuit capacitance and switching activity. The following paragraphs describe a series of low-power techniques that achieve power savings by reducing the switching activity α or the load capacitance CL (Equation 10.2).
10.6.1 Path Balancing
The way the gates of a logic circuit are interconnected can strongly affect the overall switching activity, and hence the power dissipation. For example, timing skew between the signals in a circuit can cause spurious transitions (glitches), which dissipate extra power. To reduce the spurious activity in a circuit, the delays of all true paths that converge at each gate must be balanced, as depicted in Figure 10.19, where the logic function f = abcd is implemented in two alternative ways (i.e., as a chain structure and as a tree structure). Notice that the tree implementation of f balances the path delays and thereby eliminates the glitches, effectively reducing the total power dissipation.

FIGURE 10.18 (a) Single clock, flip-flop based FSM, and (b) gated-clock version.
FIGURE 10.19 Path balancing for glitching reduction. Panels: balanced path (tree) and unbalanced path (chain), with inputs a, b, c, and d.
Low-Power Electronics Design, Copyright 2005 by CRC Press.
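The chain-versus-tree comparison for f = abcd can be reproduced with a small unit-delay simulation. The sketch below is illustrative only: the gate names (`g1`, `g2`, `f`), the one-time-unit delay per AND gate, and the chosen input transition are assumptions made for the example and are not taken from Figure 10.19.

```python
def count_transitions(netlist, before, after):
    """Unit-delay simulation of a network of two-input AND gates.

    netlist maps each gate name to its two fan-ins and must be listed in
    topological order. The primary inputs switch from `before` to `after`
    at t = 0. Returns the number of output transitions seen at each gate.
    """
    # Steady state for the initial input vector.
    values = dict(before)
    for gate, (x, y) in netlist.items():
        values[gate] = values[x] & values[y]

    values.update(after)                       # all inputs switch together
    counts = {gate: 0 for gate in netlist}
    for _ in range(len(netlist)):              # enough steps to settle
        nxt = dict(values)
        for gate, (x, y) in netlist.items():
            nxt[gate] = values[x] & values[y]  # fan-in values one delay ago
            if nxt[gate] != values[gate]:
                counts[gate] += 1
        values = nxt
    return counts


# Hypothetical netlists for f = abcd (assumed gate names).
chain = {"g1": ("a", "b"), "g2": ("g1", "c"), "f": ("g2", "d")}
tree = {"g1": ("a", "b"), "g2": ("c", "d"), "f": ("g1", "g2")}

before = {"a": 1, "b": 1, "c": 1, "d": 0}      # f = 0
after = {"a": 0, "b": 1, "c": 1, "d": 1}       # f = 0 again

chain_counts = count_transitions(chain, before, after)
tree_counts = count_transitions(tree, before, after)
print("chain:", chain_counts)
print("tree: ", tree_counts)
```

Under these assumptions the output of the chain makes two spurious transitions (0, 1, 0), because the change on d reaches the last gate two delays before the change on a, whereas the output of the tree does not switch at all.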
Path balancing can be achieved before technology mapping by selective collapsing and logic decomposition, or after technology mapping by delay insertion and pin reordering. Selectively collapsing the fan-ins of a node changes the arrival time at the output of that node. Logic decomposition and extraction can be performed to minimize the level difference between the inputs of the nodes that drive highly capacitive nodes. Additionally, by inserting variable-delay buffers, the delays of all paths in the circuit can be made equal. The challenge in delay insertion is to use the minimum number of delay elements to achieve the maximum reduction in glitching activity. Path delays may sometimes also be balanced by an appropriate signal-to-pin assignment. This is possible because the delay characteristics of CMOS gates vary as a function of the input pin that causes the transition at the output.
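As a rough illustration of delay insertion, the sketch below computes how much delay would have to be added on each gate input so that all inputs of a gate arrive together. It is a naive version under stated assumptions: every gate has the same delay, primary inputs arrive at time 0, and the function name `balance_paths` and the netlists are hypothetical. It pads every skewed input and makes no attempt to minimize the number of delay elements, which is the harder problem mentioned above.

```python
def balance_paths(netlist, gate_delay=1):
    """Return (arrival, padding) for a netlist given in topological order.

    netlist maps each gate to the list of its fan-ins. Names that are not
    gates are primary inputs and are assumed to arrive at time 0.
    padding[(source, gate)] is the delay to insert on that connection so
    that every input of the gate arrives at the same time.
    """
    arrival = {}
    padding = {}
    for gate, fanins in netlist.items():
        times = [arrival.get(name, 0) for name in fanins]
        latest = max(times)
        for name, t in zip(fanins, times):
            if t < latest:
                padding[(name, gate)] = latest - t   # early input is slowed
        arrival[gate] = latest + gate_delay           # critical path unchanged
    return arrival, padding


# Hypothetical chain and tree implementations of f = abcd.
chain = {"g1": ["a", "b"], "g2": ["g1", "c"], "f": ["g2", "d"]}
tree = {"g1": ["a", "b"], "g2": ["c", "d"], "f": ["g1", "g2"]}

chain_arrival, chain_padding = balance_paths(chain)
tree_arrival, tree_padding = balance_paths(tree)
print(chain_padding)   # delays needed on inputs c and d of the chain
print(tree_padding)    # the tree is already balanced
```

Because each early input is only delayed up to the latest arrival at its gate, the output arrival time of the circuit is not increased in this model.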
10.6.2 Technology Decomposition
The next step in the logic synthesis of a network is to convert it into an equivalent network that contains only two-input AND/NAND gates and inverters. This step, called technology decomposition, is carried out before the network is mapped onto the current cell library. Therefore, a decomposition scheme that minimizes the total switching activity of the network is a good starting point for power-efficient technology mapping.
Given the switching activity at each input of a node, Tsui et al. [28] suggested a technique for AND decomposition of this node, which reduces the total switching activity in the resulting two-input AND structure under a zero-delay model. The idea is to inject the high switching activity inputs into the decomposition model as late as possible, as shown in Figure 10.20, where two different decomposition structures for the four-input AND gate are depicted.
Note that signal d, which has the highest switching activity, is injected last in configuration A, which implies lower power for this configuration. This technique has been shown to be optimal for dynamic CMOS circuits, and it also produces very good results for static CMOS circuits. In general, the low-power technology decomposition procedure reduces the total switching activity in the circuits by 5% compared with the conventional balanced-tree decomposition method.
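The effect of the injection order can be checked numerically for a chain of two-input AND gates. The sketch below is not the algorithm of Tsui et al. [28]; it only evaluates two fixed orderings under a zero-delay model, assuming spatially and temporally independent inputs so that a node with signal probability p has switching activity 2p(1 - p). The signal probabilities are invented for the example.

```python
def chain_activity(probs):
    """Total switching activity at the gate outputs of an AND chain.

    probs lists the signal probabilities of the inputs in the order in
    which they enter the chain: the first two feed the first gate, and
    each later input is ANDed with the running result.
    Assumes independent inputs and activity = 2 * p * (1 - p) per node.
    """
    total = 0.0
    p = probs[0]
    for q in probs[1:]:
        p *= q                          # probability that the AND output is 1
        total += 2.0 * p * (1.0 - p)    # zero-delay switching activity
    return total


# Assumed probabilities: a, b, c are mostly 1 (low activity, 0.18 each),
# while d toggles most often (p = 0.5, activity 0.5).
p_a = p_b = p_c = 0.9
p_d = 0.5

d_last = chain_activity([p_a, p_b, p_c, p_d])    # d injected last
d_first = chain_activity([p_d, p_a, p_b, p_c])   # d injected first
print(f"d last:  {d_last:.4f}")
print(f"d first: {d_first:.4f}")
```

With these numbers, injecting d last gives the lower total, because the internal nodes then depend only on the quiet inputs. The activity of the final output is the same in both orderings.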
10.6.3 Technology Mapping
Technology mapping refers to the process of binding a given Boolean network to the gates included in a target cell library. Lin and De Man [17], Tiwari et al. [27], and Tsui et al. [28] have proposed design techniques for low power consumption during technology mapping. The main concept is to hide nodes with high switching activity inside the gates, so that they drive smaller load capacitances, as presented in Figure 10.21.

FIGURE 10.20 Technology decomposition for minimizing switching activity.
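The benefit of hiding a high-activity node can be illustrated by comparing the switched capacitance, i.e., the sum of activity times load capacitance over the nodes that remain visible after mapping. The node names, activities, and capacitances below are invented for the example and are not taken from Figure 10.21.

```python
def switched_capacitance(nodes):
    """Sum of (switching activity * load capacitance) over visible nodes.

    nodes maps a node name to an (activity, capacitance) pair.
    """
    return sum(alpha * cap for alpha, cap in nodes.values())


# Mapping 1 (assumed): the high-activity node h is a gate output and
# drives a large external load.
mapping_exposed = {"h": (0.5, 4.0), "l": (0.1, 1.0), "out": (0.2, 2.0)}

# Mapping 2 (assumed): h is absorbed inside a complex gate, so only the
# low-activity node l and the output are visible; l now drives the large load.
mapping_hidden = {"l": (0.1, 4.0), "out": (0.2, 2.0)}

print(switched_capacitance(mapping_exposed))
print(switched_capacitance(mapping_hidden))
```

In this simplified accounting the internal capacitance of the complex gate is ignored, so the comparison only shows the direction of the effect, not its size.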
According to Tsui et al. [28], the whole process consists of two steps. The first step computes the power-delay curves (i.e., power consumption vs. arrival time) of all nodes in the network. The second step produces the mapping solution from these curves and the required times at the primary outputs. This method has been reported to yield an 18% power saving at the expense of a 16% increase in area, without any penalty in network performance. In other words, the power-delay mapper reduces the number of nets with high switching activity at the expense of increasing the number of nets with low switching activity. In addition, it reduces the average load of the network.
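The two steps can be pictured with a toy power-delay curve for a single node. The sketch below is a simplification under stated assumptions: a curve is a list of (arrival time, power) points, one per candidate match, the numbers are invented, and the helper names `prune_inferior` and `pick_match` are hypothetical. It does not reproduce the curve computation of [28]; it only shows how inferior points can be discarded and how a required time selects a point.

```python
def prune_inferior(points):
    """Keep only non-inferior (arrival_time, power) points.

    A point is dropped if some other point is at least as fast and
    consumes no more power.
    """
    kept = []
    best_power = float("inf")
    for t, p in sorted(points):         # increasing arrival time
        if p < best_power:              # cheaper than every faster point
            kept.append((t, p))
            best_power = p
    return kept


def pick_match(curve, required_time):
    """Lowest-power point that meets the required time, or None."""
    feasible = [(t, p) for t, p in curve if t <= required_time]
    return min(feasible, key=lambda tp: tp[1]) if feasible else None


# Assumed candidate matches at one node: (arrival time, power).
candidates = [(3, 10.0), (4, 7.0), (4, 9.0), (5, 8.0), (6, 5.0)]
curve = prune_inferior(candidates)

print(curve)
print(pick_match(curve, required_time=5))
print(pick_match(curve, required_time=2))
```

Relaxing the required time moves the selection along the curve toward slower, lower-power matches, which is the trade-off the power-delay curve is meant to capture.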
Although the approach described above refers to mapping under a zero-delay model, an extension to a real-delay model, resulting in optimum power solutions, is also considered in Tsui et al. [28]. According to [28], every point on the power-delay curve of a given node uniquely defines a mapped subnetwork from the circuit inputs up to that node. The principle is to associate each such point with the probability waveform of the node in the corresponding mapped subnetwork. Thus, the total power cost of a candidate match, owing to both steady-state transitions and hazards, can be calculated from the power-delay curves at the inputs of the gate and the power-delay characteristics of the gate itself.
10.7 Conclusions
Dynamic power consumption is a dominant power component in current and future design technologies. It increases substantially in nanometer technologies because of the growing number of on-chip functions and the continuing trend toward higher clock frequencies. A multi-objective approach for reducing dynamic power consumption should combine multiple supply and threshold voltages with flexible gates from suitable cell libraries and efficient signaling schemes. Two design strategies can be adopted to reduce dynamic power. The first strategy is supply voltage reduction, which achieves substantial power savings because of the quadratic dependence of power on the supply voltage (i.e., P ∝ Vdd²). The second strategy is the reduction of capacitance or switching activity, which is very useful when the design process is fixed. Four different sets of low-power design techniques were presented: circuit techniques based on the principle of parallelism, techniques that use multiple supply voltages and low on-chip voltage swing, and techniques that are circuit technology-dependent and circuit technology-independent. The key challenges in using multiple supply voltages on a chip are minimizing the area cost, placing logic cells under appropriate clustering constraints, and using dual power rails and efficient cell libraries that are capable of assigning the appropriate threshold voltage to each cell.
References
[1] V. Adler and E. Friedman, Repeater design to reduce delay and power in resistive interconnect, Trans. Circuits and Systems — II, Vol. 45, May 1998, pp. 607–616.
[2] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou, Precomputation-based sequential logic optimization for low power, IEEE Trans. on VLSI, Vol. 2, No. 4, pp. 426–435, Dec. 1994.
FIGURE 10.21 Technology mapping for minimizing switching activity. Node labels: p and q.
[3] L. Benini, G. De Micheli, E. Macii, M. Poncino, and R. Scarsi, Symbolic synthesis of clock-gating logic for power optimization of synchronous controllers, ACM Trans. on Design Automation of Electron. Syst., Vol. 4, No. 4, pp. 351–375, Oct. 1999.
[4] L. Benini, G. De Micheli, A. Lioy, E. Macii, G. Odasso, and M. Poncino, Synthesis of power-managed sequential components based on computation kernel extraction, IEEE Trans. on CAD, Vol. 20, No. 9, pp. 1118–1131, Sept. 2001.
[5] L. Benini, P. Siegel, and G. De Micheli, Saving power by synthesizing gated clocks for sequential circuits, IEEE Design Test of Comput., pp. 32–41, Winter 1994.
[6] L. Benini and G. De Micheli, Automatic synthesis of low-power gated-clock finite-state machines, IEEE Trans. on CAD, Vol. 15, No. 6, pp. 630–643, June 1996.
[7] T. Burd and R.W. Brodersen, Energy Efficient Microprocessor Design, Kluwer Academic Publishers, Boston, 2002.
[8] A.P. Chandrakasan and R.W. Brodersen, Low-Power Digital CMOS Design, Kluwer Academic Publishers, Boston, 1995.
[9] W. Chung, T. Lo, and M. Sachdev, A comparative analysis of low-power low-voltage dual-edge triggered flip-flops, Trans. on VLSI Syst., Vol. 10, No. 6, Dec. 2002, pp. 913–918.
[10] R. Golshan and B. Haroun, A novel reduced swing CMOS BUS interface circuit for high-speed low-power VLSI systems, Proc. of Int. Symp. on Circuits and Syst. (ISCAS), 30 May 1994, London, UK, Vol. IV, pp. 351–354.
[11] J. Goodman and A.P. Chandrakasan, Low-power scalable encryption for wireless systems, Wireless Networks, 4, 1998, pp. 55–70.
[12] M. Hiraki et al., Data-dependent logic swing internal bus architecture for ultra low-power LSI’s, IEEE J. Solid-State Circuits, Vol. 30, Apr. 1995, pp. 397–402.
[13] R. Hossain, L. Wronski, and A. Albicki, Low-power design using double edge triggered flip-flops, Trans. on VLSI Syst., Vol. 2, No. 2, June 1994, pp. 261–265.
[14] J.P. Hayes, Computer Architecture and Organization, McGraw-Hill, New York, 1978, p. 382.
[15] T. Kuroda, Low-power CMOS design challenges, IEICE Trans. on Electron., Vol. E84-C, Aug. 2001, pp. 1021–1028.
[16] H.-J. Kwon and K. Lee, A new division algorithm based on lookahead of partial-remainder (LAPR) for high-speed/low-power coding applications, IEEE Trans. of CAS-II, Vol. 46, No. 2, Feb. 1999, pp. 202–209.
[17] B. Lin and H. De Man, Low-power driven technology mapping under timing constraints, Proc. ICCAD, 1993, pp. 421–427.
[18] R.P. Llopis and M. Sachdev, Low-power, testable dual-edge triggered flip-flops, Proc. Int. Symp. Low-Power Electronics and Design, 1996, pp. 341–345.
[19] M. Lowy, Low-power spread spectrum code generator based on parallel shift registers, 1994 IEEE Symp. on Low-Power Electron., San Diego, CA, Oct. 10–12, 1994, pp. 22–23.
[20] J. Monteiro, S. Devadas, and A. Ghosh, Retiming sequential circuits for low power, Proc. ICCAD, Nov. 7–11, Santa Clara, CA, pp. 398–402, 1993.
[21] G. Panigrahi, The implications of electronic serial memories, Computer, July 1977, pp. 18–25.
[22] M. Pedram, Q. Wu, and X. Wu, A new design of double-edge triggered flip-flops, Proc. ASP-DAC ’98 Asian and South Pacific Design Automation Conf., Feb. 10–13, 1998, Yokohama, Japan, pp. 417–421.
[23] C. Piguet, J.-M. Masgonty, V. von Kaenel, and T. Schneider, Logic design for low-voltage/low-power CMOS circuits, 1995 Int. Symp. on Low-Power Design, Dana Point, CA, Apr. 23–26, 1995, pp. 117–122.
[24] C. Piguet, Logic design for low-power CMOS circuits. Invited talk at TENCON ’95, Hong-Kong, Nov. 7–10, 1995, pp. 299–302.
[25] T. Schneider, V. von Kaenel, and C. Piguet, Low-voltage/low-power parallelized logic modules, Proc. PATMOS ’95, Paper S4.2, Oldenburg, Germany, Oct. 4–6, 1995, pp. 147–160.
[26] A.G.M. Strollo, E. Napoli, and C. Cimino, Analysis of power dissipation in double edge-triggered flip-flops, Trans. on VLSI Syst., Vol. 8, No. 5, Oct. 2000, pp. 624–629.
[27] V. Tiwari, P. Ashar, and S. Malik, Technology mapping for low-power in logic synthesis, Integration, the VLSI J., July 1996.
[28] C.-Y. Tsui, M. Pedram, and A. Despain, Power-efficient technology decomposition and mapping under extended power consumption model, IEEE Trans. on CAD, Vol. 13, No. 9, Sept. 1994.
[29] K. Usami and M. Horowitz, Clustered voltage scaling technique for low-power design, Proc. Int. Symp. on Low-Power Design, Apr. 1995, pp. 3–8.
[30] K. Usami and M. Igarashi, Low-power design methodology and applications utilizing dual supply voltages, Proc. Asia and South Pacific Design Automation Conf., Jan. 25–28, 2000, Yokohama, Japan, pp. 123–128.
[31] H. Yamauchi et al., An asymptotically zero power charge-recycling bus architecture for battery-operated ultra-high data rate ULSIs, IEEE J. Solid-State Circuits, Vol. 30, Apr. 1995, pp. 423–431.
[32] H. Zhang, G. Varghese, and J. Rabaey, Low-swing on-chip signaling techniques: effectiveness and robustness, Trans. on VLSI Syst., Vol. 8, No. 3, June 2000, pp. 264–272.
0-8493-1941-2/05/$0.00+$1.50
© 2005 by CRC Press LLC