
Fast and Efficient CMOS Functional Circuits

In document LowPowerElectronics.pdf (Page 131-137)

Low-Power Very Fast Dynamic Logic Circuits

8.4 Fast and Efficient CMOS Functional Circuits

The CMOS functional circuits introduced in this section feature high efficiency and high speed.

High efficiency means a small number of both clocked and logic-operating devices, which results in low power consumption. High speed offers a large delay margin that can be traded for lower power by operating at a lower supply voltage.

8.4.1 Dividers and Ripple Counters

A very fast divider can be constructed simply by connecting the output to the input of the nonclassic nine-transistor TSPC flip-flop depicted in Figure 8.4(a). Its dynamic version is shown in Figure 8.20 and its semi-static version in Figure 8.21. The transistor widths given in Figure 8.20 are optimized for speed in 2-µm CMOS. An 8-bit ripple counter built from the dynamic divide-by-two stage reached an input frequency of 750 MHz [15]. The semi-static divider in Figure 8.21 can be used in a very long ripple counter, where the frequencies become very low in the later stages. Note that the narrow pulse signal (out 1) should be fed to the next stage so that the condition for a dynamic circuit is always satisfied for all stages in the ripple chain, while the 50% duty-cycle signal (out 2) is used for the bit output.
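As a behavioral illustration of the ripple chain (not a circuit-level model), the sketch below treats each divide-by-two stage as a toggle that flips when the output of the previous stage falls. The triggering edge, the bit ordering, and the function names ripple_count and stage_output_frequency are assumptions made for illustration; the text does not specify them.

```python
def ripple_count(num_stages, num_pulses):
    """Return the stage outputs (LSB first) after num_pulses input pulses.

    Assumption: every stage toggles on a falling edge of the stage before it,
    which gives an up-counter. The real triggering edge depends on the circuit.
    """
    bits = [0] * num_stages
    for _ in range(num_pulses):
        for i in range(num_stages):
            bits[i] ^= 1          # this stage toggles
            if bits[i] == 1:      # output rose, so no falling edge is passed on
                break
    return bits


def stage_output_frequency(input_hz, stage_index):
    """Output frequency of stage stage_index (0-based) in a divide-by-two chain."""
    return input_hz / 2 ** (stage_index + 1)


if __name__ == "__main__":
    bits = ripple_count(8, 200)
    print(sum(bit << i for i, bit in enumerate(bits)))   # count held by the chain
    print(stage_output_frequency(750e6, 7))              # last stage of 8, in Hz
```

With a 750 MHz input, the last of eight stages toggles at roughly 2.93 MHz in this model, which illustrates why the later stages of a long chain may need the semi-static version.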

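FIGURE 8.19 Unified connection rules of TSPC and CDPD stages. [Figure not reproduced; stage labels: SP, SN, PP, PN; signal levels: H/L, L/H.]

FIGURE 8.20 A dynamic divide-by-two circuit (D-1/2). Speed-optimized in 2 µm CMOS. [Figure not reproduced; terminals: In, Out, Reset; transistor width annotations as extracted: 24, 12, 5, 42, 34, 24, 20, 20, 14, 5, whose assignment to devices is not recoverable.]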

1941_C08.fm Page 12 Thursday, September 30, 2004 4:42 PM

Copyright 2005 by CRC Press


A differential divider offers differential outputs, which is sometimes quite a useful feature. In favor of speed, two n-SSTC latches can be cascaded to form a divider, instead of a p-SSTC and an n-SSTC latch, by using the differential output signals available from the previous stage. This divider is named an S-1/2 stage and is presented in Figure 8.22(a). For a single-ended input, a semi-static differential divider may be used, named an HS-1/2 stage and presented in Figure 8.22(b). Two divider chains were constructed in IBM’s partially scaled 0.1-µm CMOS process [16]: one built from four D-HS-1/2 stages with a buffer between stage 1 and stage 2, and another built from an HS-1/2 stage followed by three S-1/2 stages. The measured input frequencies reached 16.6 GHz for the dynamic divider chain and 12.5 GHz for the static divider chain [17].

8.4.2 Synchronous Counter

FIGURE 8.21 A semi-static divide-by-two circuit. [Figure not reproduced; terminals: In, Out 1 (to next stage), Out 2 (bit output).]

FIGURE 8.22 Differential divide-by-two circuits. (A) A static differential divide-by-two circuit (S-1/2). (B) A single-in double-out semi-static divide-by-two circuit (HS-1/2).

A TSPC synchronous counter is depicted in Figure 8.23 as an example showing how the carry logic can be arranged in a p-block in favor of speed while using the dynamic divider as the toggle stage with the carry control function embedded [15]. The transistor widths optimized for speed are valid in a 3-µm technology. An 8-bit synchronous counter of this kind in the 3-µm technology was measured to reach a clock rate of 200 MHz. For a very long counter, however, the carry propagation becomes a serious speed bottleneck even with the parallel carry logic depicted in Figure 8.23. A so-called backward carry propagation topology, in contrast to the conventional forward carry propagation, can be used to break this limit [18]. The principle block diagram of a backward carry propagation synchronous counter is presented in Figure 8.24. In a conventional counter, the worst case occurs at output 0111 … 111, where the 0→1 flip of the LSB has to be propagated through the whole chain of “AND” gates to the MSB to enable the next output 1000 … 000. In Figure 8.24, however, when the output is 0111 … 110, the carry propagation is almost finished, and when the 0→1 flip of the LSB arrives, all bits are ready for the next output simultaneously. A more practical architecture, presented in Figure 8.25, mixes backward with forward carry propagation at the lowest area and power penalty. The interface between the two propagation strategies depends on the counter length. Generally, a few bits using the backward carry propagation are enough.
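The idea that the carry can be "almost finished" before the LSB flips can be illustrated with a small functional model. The sketch below splits the toggle enable of each bit into a part that depends only on the bits above the LSB and a final AND with the LSB. This split is an illustrative assumption, not the gate arrangement of Figure 8.24, and the helper names (to_bits, upper_partial_carries, toggle_enables, next_state) are hypothetical.

```python
def to_bits(value, width):
    """Integer -> list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def upper_partial_carries(bits):
    """partial[i] = AND of bits[1] .. bits[i-1]; the LSB is not involved.

    These terms change only when a bit above the LSB changes, so they can
    settle while the LSB is still 0.
    """
    partial = [1] * len(bits)
    for i in range(2, len(bits)):
        partial[i] = partial[i - 1] & bits[i - 1]
    return partial


def toggle_enables(bits):
    """Toggle enable of bit i = AND of all lower bits (bit 0 always toggles)."""
    partial = upper_partial_carries(bits)
    return [1] + [partial[i] & bits[0] for i in range(1, len(bits))]


def next_state(bits):
    """One clock tick of a synchronous binary counter."""
    return [b ^ t for b, t in zip(bits, toggle_enables(bits))]


if __name__ == "__main__":
    almost = to_bits(0b01111110, 8)
    print(upper_partial_carries(almost))   # partial terms before the LSB flips
    worst = to_bits(0b01111111, 8)
    print(next_state(worst))               # next output, LSB first
```

At state 0111 1110 every partial term is already high in this model, so only the final AND with the LSB remains once the LSB flips to one.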

FIGURE 8.23 A bit-slice of a synchronous counter with parallel carry-logic in TSPC.

FIGURE 8.24 A synchronous counter with fully backward carry propagation.

FIGURE 8.25 A synchronous counter mixing backward and forward carry propagations.

[Figures 8.24 and 8.25 not reproduced; recoverable labels: bit outputs b0, b1, …, bk-1, …, bn-1, repeated Carry blocks, a “Backward (m bits)” section and a “Forward” section, and the annotation (td < 2mT clock).]

8.4.3 Nonbinary Divider and Prescaler

A nonbinary divider is usually constructed from a synchronous counter plus decoding logic (i.e., when the output code reaches the target dividing ratio, the counter is reset). Such a topology can offer any dividing ratio; however, a synchronous counter is slow and the decoding logic adds further delay. It was found that an SP stage followed by an SN stage in TSPC becomes a half-transparent register (HT-register): it registers a low input (imposing a clock cycle delay) but performs no register function for a high input (it is transparent). This feature can be used to construct a nonbinary divider [19]. A divide-by-three circuit is depicted in Figure 8.26 along with the waveforms at different nodes. At node b, a symmetric waveform is obtained at a frequency of fin/3, assuming the input clock is symmetric. If (n-2) HT-registers are used, it becomes a divide-by-n circuit. Because no decoding and no carry propagation exist, this circuit can work at the same speed as a 1/2 divider. The long propagation delay due to many cascaded transparent stages can be reduced by a few speed-up transistors; see Yuan and Svensson [19].

FIGURE 8.26 A dynamic nonbinary divider (1/3).

FIGURE 8.27 A divide-by-n/(n-1) prescaler.

[Figures not reproduced; input labeled fin.]

Note that in a nonbinary divider the output is still edge-triggered (i.e., there is no skew between input and output, which could be a problem for a ripple counter). Although a single nonbinary divider can offer any dividing ratio, it is more efficient to cascade two or more nonbinary dividers to achieve a high dividing ratio. For example, a 1/3 divider cascaded with a 1/7 divider becomes a 1/21 divider, and so on. The nonbinary divider is extremely useful for prescalers because the required operating speed is often very high and the dynamic nature is usually not a problem. A dual-modulus prescaler, dividing by either n or (n-1), is presented in Figure 8.27 for the purpose of frequency synthesis, where the “Inv-register” represents the 9-transistor TSPC flip-flop. The control of divide-by-n and divide-by-(n-1) is extremely simple, requiring only a single n-transistor in one of the (n-2) HT-registers, which makes this circuit highly attractive.

When the input of the transistor is high, this HT-register becomes fully transparent. Other techniques in dual-modulus prescalers based on the modification of the nine-transistor TSPC flip-flop can be found in Chang et al. [20] and Yang et al. [21].
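The behavior of the HT-register loop and of the modulus control can be sketched at the clock-cycle level. The model below is an abstraction under stated assumptions, not the transistor-level circuits of Figures 8.26 and 8.27: the loop is taken to be one inverting register fed back through (n-2) HT-registers, each HT-register passes a high input at once and delays a low input by one clock cycle, and the duty cycle of the real waveforms is not modeled. The names divider_output, period, and bypass_one are hypothetical.

```python
def divider_output(n, num_cycles, bypass_one=False):
    """Cycle-level model of an inverting register fed back through HT-registers.

    bypass_one=True makes one HT-register fully transparent, which is the
    assumed effect of the single control transistor.
    """
    k = (n - 2) - (1 if bypass_one else 0)   # HT-registers that still delay
    if k < 0:
        raise ValueError("not enough HT-registers for this setting")
    history = [0] * (k + 1)                  # q[t], q[t-1], ..., q[t-k]
    out = []
    for _ in range(num_cycles):
        # A chain of k HT-registers is high if any of the last k+1 values
        # of q was high: highs pass at once, lows are held one cycle each.
        d = int(any(history))
        q = 1 - d                            # inverting register samples d
        history = [q] + history[:-1]
        out.append(q)
    return out


def period(sequence):
    """Spacing, in clock cycles, between the first two high samples."""
    highs = [i for i, v in enumerate(sequence) if v]
    return highs[1] - highs[0]


if __name__ == "__main__":
    print(period(divider_output(3, 20)))                    # divide-by-n, n = 3
    print(period(divider_output(8, 40, bypass_one=True)))   # divide-by-(n-1), n = 8
```

In this model the output repeats every n cycles with all HT-registers active and every n-1 cycles when one of them is made transparent, matching the two moduli described above.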

8.4.4 Adder and Accumulator

The core part of an adder is the “XOR” logic. A highly efficient pipelined XOR gate in TSPC is shown in Figure 8.28. The basic approach is to implement the XOR function in two steps, in a p-block and an n-p-block respectively, and to embed the logic into latches [22]. The logic diagram is given in the upper part of the figure and the circuit diagram in the lower part. The NAND, OR, and AND functions are embedded in an SN stage, an n-type precharged latch, and a p-type precharged latch, respectively. The connection exactly follows the rule mentioned previously. The efficiency comes from two facts. First, the nonclassic principle is applied to the pair of the SN stage and the precharged p-type latch, so a single SN stage is used to embed the NAND function in favor of speed. Second, both the OR function in the n-type precharged latch and the AND function in the p-type precharged latch use parallel transistors, which also favors speed. The pipelined XOR gate can be directly cascaded to deliver the sum output of a full adder, and the sum can be fed back to its own input for accumulation. An accumulator can therefore be configured efficiently by using the pipelined XOR gate; one of its bit-slices is given in Figure 8.29.

A 24-bit pipelined accumulator in 1.2-µm CMOS for a numerically controlled oscillator based on this topology achieved a clock rate of 700 MHz [23].
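The two-step XOR decomposition and its use in an accumulator can be checked with a short functional sketch. It models only the logic, not the latches or the pipelining. The carry is computed with a plain majority function, which is an assumption made here because the text does not describe the carry circuit, and the names xor_two_step and accumulate are hypothetical.

```python
def xor_two_step(a, b):
    """XOR built from the three functions named in the text.

    Step 1 evaluates NAND(a, b) and OR(a, b); step 2 ANDs the two results.
    """
    nand_ab = 1 - (a & b)
    or_ab = a | b
    return nand_ab & or_ab


def accumulate(acc, increment, width=24):
    """Add increment to acc bit by bit, wrapping at 2**width.

    Assumption: the carry is a majority function of the two operand bits
    and the incoming carry. Pipeline registers are not modeled.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (acc >> i) & 1
        b = (increment >> i) & 1
        s = xor_two_step(xor_two_step(a, b), carry)   # two cascaded XOR gates
        carry = (a & b) | (a & carry) | (b & carry)
        result |= s << i
    return result


if __name__ == "__main__":
    phase = 0
    for _ in range(4):
        phase = accumulate(phase, 0x400000)   # hypothetical tuning word
        print(hex(phase))
```

Feeding the sum back as the next acc value, as in the loop above, is the accumulation step a numerically controlled oscillator relies on.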

FIGURE 8.28 A pipelined XOR gate in TSPC.

FIGURE 8.29 A bit-slice of an accumulator using the pipelined XOR in TSPC.

[Figures 8.28 and 8.29 not reproduced; recoverable labels: inputs A and B, clock φ, intermediate node AB, p-block and n-block sections, XOR, D, bin, Carryin, Carryout, Sum, and Reset.]

8.4.5 Bit-Serial Comparator and Sorter

Dynamic logic may greatly simplify a complex circuit function and minimize the number of devices, resulting in low power without sacrificing speed, or even with improved speed. One example is the bit-serial word-parallel maximum/minimum selector [24] in Figure 8.30. The main part of the selector is an n-type TSPC precharged latch embedding the selecting logic with AND functions (two n-transistors in stack) in parallel. This circuit is shown to emphasize that the precharged node signal can be used to simplify the configuration effectively. This signal is used as a second “clock” for a number of parallel PP stages. Each PP stage receives an input word. The precharged node signal of each PP stage is in turn used for the nMOS input of the flag logic stage, which consists of only an n-transistor and a p-transistor.

The p-transistor sets the flag high before the new input words start, and therefore all flags are high from the beginning. All word inputs start with the MSB and are compared digit by digit. If the digits of all words are the same, whether zero or one, all flags are kept high. If only some of the digits are zero, each input with a zero digit makes the output of its PP stage high and its flag low, which disables this input. In the end, only the maximum input is left. During comparison and selection, the output never stops. A minimum selector can easily be obtained by inverting the input words and, of course, inverting the output again.
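The selection procedure described above can be written as a short functional sketch: one loop iteration per bit position, one flag per word. It models the algorithm only, not the PP stages or the precharged node, and the function names bit_serial_max and bit_serial_min are hypothetical.

```python
def bit_serial_max(words, width):
    """Stream the maximum of the words MSB first, one bit per cycle.

    Returns the selected value and the final flags (1 = still enabled).
    """
    flags = [1] * len(words)          # all candidates enabled at the start
    value = 0
    for pos in range(width - 1, -1, -1):
        digits = [(w >> pos) & 1 for w in words]
        # Output is high if any still-enabled word has a one at this position.
        out = int(any(f and d for f, d in zip(flags, digits)))
        # A word showing zero while the output is one is disabled.
        flags = [int(f and (d or not out)) for f, d in zip(flags, digits)]
        value = (value << 1) | out    # an output bit is produced every cycle
    return value, flags


def bit_serial_min(words, width):
    """Minimum selector: invert the inputs, select the maximum, invert back."""
    mask = (1 << width) - 1
    value, flags = bit_serial_max([w ^ mask for w in words], width)
    return value ^ mask, flags


if __name__ == "__main__":
    print(bit_serial_max([5, 12, 9, 12], 4))
    print(bit_serial_min([5, 12, 9, 12], 4))
```

Words that tie for the maximum keep their flags high to the end, consistent with the remark that equal digits leave all flags unchanged.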

FIGURE 8.30 A bit-serial and word-parallel max/min selector.

FIGURE 8.31 A bit-serial compare-and-swap cell.

A bit-serial and word-parallel compare-and-swap cell [24] is presented in Figure 8.31. The maximum selector is used along with a minimum selector, which uses an inverted flag and logic opposite to that of the maximum selector. When the two digits are equal, both go to the outputs, and when the two digits are different, the upper output will be “one” and the lower output will be “zero.” At the same time, the smaller input will be disabled in the maximum selector, while the larger input will be disabled in the minimum selector. It must be noted that the complete cell is an n-block in TSPC, and to cascade two such cells, a p-type latch must be used in between. An 8-input bit-serial and word-parallel sorter is depicted in Figure 8.32, where each box represents the compare-and-swap cell plus a p-type latch. This pipelined sorter can achieve a very high data throughput.
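A functional sketch of the compare-and-swap cell and of a sorter built from it is given below. The cell follows the description above, with one pair of flags for the maximum selector and one for the minimum selector. The network layout of Figure 8.32 is not recoverable from the text, so an odd-even transposition network is substituted here as an assumption; the names compare_and_swap and sort_descending are hypothetical, and the p-type latches and pipelining are not modeled.

```python
def compare_and_swap(a, b, width):
    """Bit-serial compare-and-swap: returns (upper, lower) = (larger, smaller)."""
    words = (a, b)
    max_flags = [1, 1]                # enabled inputs of the maximum selector
    min_flags = [1, 1]                # enabled inputs of the minimum selector
    upper = lower = 0
    for pos in range(width - 1, -1, -1):
        digits = [(w >> pos) & 1 for w in words]
        hi = int(any(f and d for f, d in zip(max_flags, digits)))
        lo = int(all(d for f, d in zip(min_flags, digits) if f))
        # Smaller input is disabled in the maximum selector,
        # larger input is disabled in the minimum selector.
        max_flags = [int(f and (d or not hi)) for f, d in zip(max_flags, digits)]
        min_flags = [int(f and (not d or lo)) for f, d in zip(min_flags, digits)]
        upper = (upper << 1) | hi
        lower = (lower << 1) | lo
    return upper, lower


def sort_descending(values, width):
    """Odd-even transposition network of compare-and-swap cells (assumed layout)."""
    data = list(values)
    n = len(data)
    for round_index in range(n):
        for i in range(round_index % 2, n - 1, 2):
            data[i], data[i + 1] = compare_and_swap(data[i], data[i + 1], width)
    return data


if __name__ == "__main__":
    print(compare_and_swap(5, 9, 4))
    print(sort_descending([3, 14, 7, 0, 9, 9, 1, 12], 4))
```

Because the larger value always leaves on the upper output, the list ends up in descending order after n rounds for n inputs.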
