Low-Power Very Fast Dynamic Logic Circuits
8.3 High-Throughput CMOS Circuit Techniques
8.3.1 TSPC Pipeline
TSPC flip-flops can be used as edge-triggered elements in a synchronous pipeline. Their short setup time, hold time, and propagation delay contribute to high speed. Complementary logic stages can be placed between two TSPC latches in the pipeline. More efficiently, the logic gates can be embedded within the TSPC latches [8], as depicted in Figure 8.13(a) and Figure 8.13(b). Such a pipeline can be divided into p-blocks and n-blocks, as depicted in Figure 8.13(c). A p-block consists of a p-type latch, which may embed logic, together with the complementary logic stages before and after it; an n-block is the same but built around an n-type latch. The blocks must be connected so that p-type and n-type latches alternate. Feedback is allowed but must follow the same rule, going from p-type to n-type or vice versa. In such a pipeline, the p-blocks are the speed bottlenecks, especially when logic gates are included in the p-blocks or embedded in the p-latch with many stacked p-transistors. Therefore, to achieve high throughput, logic gates are preferably placed in the n-blocks, leaving each p-block as a passing stage or with very simple logic. The nonclassic concept may be used to simplify the p-block to a single SP stage if it is followed directly (or indirectly, after an even number of complementary stages) by an n-type precharged latch. An all-n-logic true-single-phase dynamic CMOS circuit technique was proposed by Gu and Elmasry [12] to speed up the p-block; in it, the logic embedded in a p-type precharged latch uses n-transistors instead of p-transistors.
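The alternation rule described above lends itself to a mechanical check. The following is a minimal Python sketch, not a tool from the text: the block labels "p" and "n", the function names, and the modeling of feedback as (source, destination) index pairs are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def chain_alternates(blocks: Sequence[str]) -> bool:
    """Return True if consecutive blocks are of opposite type ('p' or 'n')."""
    if any(b not in ("p", "n") for b in blocks):
        raise ValueError("block types must be 'p' or 'n'")
    return all(a != b for a, b in zip(blocks, blocks[1:]))


def bad_feedback(blocks: Sequence[str],
                 feedback: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the feedback edges that connect two blocks of the same type."""
    return [(src, dst) for src, dst in feedback if blocks[src] == blocks[dst]]


# Hypothetical four-block pipeline with two feedback paths.
pipeline = ["p", "n", "p", "n"]
print(chain_alternates(pipeline))                 # forward-path check
print(bad_feedback(pipeline, [(3, 0), (2, 0)]))   # (2, 0) joins p to p
```

The sketch checks only block types. It says nothing about timing, which still has to be verified by simulation.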
TABLE 8.1 Comparison of Flip-Flops
Type     No.  Master + Slave          Power (mW)  Delay (ns)
Dynamic  1    p-CVSL + n-CVSL         699.4       0.691
         2    p-C2MOS + n-C2MOS       491.8       0.950
         3    p-Classic + n-Classic   512.4       0.776
         4    (SP + SP) + (PN + SN)   404.3       0.835
         5    SP + (PN + SN)          331.6       0.832
         6    (SP + SP) + (SN + SN)   317.6       0.802
         7    p-DSTC2 + n-DSTC1       313.1       0.717
Static   8    Classic + Classic       668.8       1.008
         9    p-RAM + n-RAM           685.4       0.673
         10   p-DSTC2 + n-SSTC        393.5       0.705
Note: Activity ratio = 0.25 and load = two inverters, in 0.8 μm CMOS.
FIGURE 8.12 Comparison of power-delay products.
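The power-delay products compared in Figure 8.12 can presumably be reproduced from Table 8.1 by multiplying each power figure by the corresponding delay. The short Python sketch below does this with the table's values as listed. The variable and function names are illustrative, and the result is in the table's own units (power unit times ns).

```python
# Rows of Table 8.1: (no., master + slave, power, delay in ns).
TABLE_8_1 = [
    (1, "p-CVSL + n-CVSL", 699.4, 0.691),
    (2, "p-C2MOS + n-C2MOS", 491.8, 0.950),
    (3, "p-Classic + n-Classic", 512.4, 0.776),
    (4, "(SP + SP) + (PN + SN)", 404.3, 0.835),
    (5, "SP + (PN + SN)", 331.6, 0.832),
    (6, "(SP + SP) + (SN + SN)", 317.6, 0.802),
    (7, "p-DSTC2 + n-DSTC1", 313.1, 0.717),
    (8, "Classic + Classic", 668.8, 1.008),
    (9, "p-RAM + n-RAM", 685.4, 0.673),
    (10, "p-DSTC2 + n-SSTC", 393.5, 0.705),
]


def power_delay_product(power: float, delay: float) -> float:
    """Power-delay product in the units of the table (power unit x ns)."""
    return power * delay


pdp = {no: power_delay_product(p, d) for no, _, p, d in TABLE_8_1}
best = min(pdp, key=pdp.get)    # flip-flop with the smallest product
worst = max(pdp, key=pdp.get)   # flip-flop with the largest product

for no, name, _, _ in TABLE_8_1:
    print(f"{no:2d}  {name:24s} {pdp[no]:7.1f}")
print("lowest:", best, "highest:", worst)
```

With these numbers, flip-flop 7 has the lowest product and flip-flop 8 the highest.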
1941_C08.fm Page 8 Thursday, September 30, 2004 4:42 PM
Copyright 2005 by CRC Press
8.3.2 TSPC Double Pipeline
Synchronous elements in a pipeline are usually triggered by a single clock edge. In a double pipeline, however, both edges of the clock are used to achieve high throughput and efficiency [9]. The data rate at the input and output of a double pipeline is twice the clock rate. Internally, each pipeline works as normal, and data can be cross-connected between the two lines as long as the n-to-p or p-to-n rule is followed. As illustrated in Figure 8.14(a), two TSPC pipelines that start and finish with opposite types of blocks can form such a double pipeline; the two input blocks, whose inputs are connected together, and the two output blocks, whose outputs are connected together, become a multiplexer and a demultiplexer, respectively. Because the input and output blocks have to work at double the data rate, they are the speed bottlenecks. It is therefore preferable to configure the double pipeline as shown in Figure 8.14(b) (i.e., with single-stage latches at both ends).
This can be done by using the single-stage full latches [10] depicted in Figure 8.6(a) and Figure 8.6(b); doing so narrows the forbidden windows of low-to-high and high-to-low data transitions by almost half, increasing speed and robustness. To reduce power consumption at a given data rate, a low-swing clock double-edge-triggered flip-flop was proposed by Kim and Kang [13], in which both edges of a low-swing clock trigger a single flip-flop, reducing the overall clock rate and the associated power consumption.
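The rate relationship in a double pipeline can be illustrated with a purely behavioral model. The Python sketch below is an assumption-laden illustration, not a circuit description: it ignores latency, cross-connections between the two lines, and all electrical behavior, and the names `double_pipeline` and `stage` are hypothetical.

```python
from typing import Callable, List, Sequence


def double_pipeline(samples: Sequence[int],
                    stage: Callable[[int], int]) -> List[int]:
    """Behavioral model: two lines share the work, one per clock edge.

    Even-indexed samples are taken by line A on one clock edge and
    odd-indexed samples by line B on the other edge, so each line
    handles one sample per clock cycle while the combined input and
    output run at two samples per cycle.
    """
    line_a = [stage(x) for x in samples[0::2]]   # samples routed to line A
    line_b = [stage(x) for x in samples[1::2]]   # samples routed to line B
    out: List[int] = []
    for i in range(len(samples)):                # interleave back in order
        out.append(line_a[i // 2] if i % 2 == 0 else line_b[i // 2])
    return out


data = [1, 2, 3, 4, 5]
result = double_pipeline(data, lambda x: x + 10)
clock_cycles = (len(data) + 1) // 2   # cycles needed to accept all samples
print(result, clock_cycles)
```

Five samples are accepted in three clock cycles, which reflects the statement that the external data rate is twice the clock rate.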
FIGURE 8.13 Logic embedded in latch stages in a TSPC pipeline. (A) Precharged logic in PP and PN. (B) Complementary logic in SP and SN. (C) A TSPC pipeline.
FIGURE 8.14 TSPC double pipelines. (A) Bottlenecks at both ends. (B) Bottlenecks alleviated.
8.3.3 Clock-and-Data Precharged Circuit Technique
All the preceding circuits aim at high throughput regardless of latency, that is, the number of clock cycles needed to produce a final output. In many applications, however, the decision has to be made in one clock cycle. The clock-and-data precharged dynamic (CDPD) circuit technique may offer an alternative for a fast one-clock-cycle decision and at the same time reduce power consumption [14]. Domino logic is often used for logic calculations with a large depth because the logic parts can be distributed along the domino chain and are all in nMOS. As illustrated in Figure 8.15(a), however, an inverter has to be placed between two precharged stages to prevent an erroneous high output from reaching the next stage at the beginning of evaluation. Moreover, charge sharing may occur between the output node and the intermediate nodes, so extra precharging transistors have to be used. As illustrated in Figure 8.15(b), all the contents of the dashed-line box can be replaced by only three transistors in the CDPD technique, none of which is clocked. This CDPD block is named an H/L (high-to-low) stage, in which the output is precharged to low by a high data input, and the NOR function is fulfilled simply by the two p-transistors. An H/L stage can be followed by an L/H (low-to-high) stage, in which the output is precharged to high by a low data input. An n-type CDPD chain can be formed from the original domino precharged stages with H/L and L/H stages in between, as illustrated in Figure 8.16(a). It needs an odd number of CDPD stages between two domino precharged stages and an even number of CDPD stages between a domino stage and an output latch.
FIGURE 8.15 Domino logic and its equivalent CDPD logic.
FIGURE 8.16 Two types of CDPD chains each ended with an SN latching stage. (A) An n-type CDPD chain. (B) A p-n type CDPD chain.
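The counting rule for an n-type CDPD chain can be checked mechanically. The following Python sketch is one possible way to do so under stated assumptions: the stage labels "DOMINO", "H/L", "L/H", and "LATCH" and the function name `check_n_chain` are hypothetical, and the requirement that adjacent CDPD stages alternate between H/L and L/H is inferred from the description rather than quoted as a formal rule.

```python
from typing import List, Sequence

CDPD = ("H/L", "L/H")


def check_n_chain(stages: Sequence[str]) -> List[str]:
    """Return a list of rule violations for an n-type CDPD chain."""
    errors: List[str] = []
    run: List[str] = []       # CDPD stages seen since the last domino stage
    seen_domino = False
    for pos, stage in enumerate(stages):
        if stage in CDPD:
            # Assumed rule: H/L and L/H stages alternate.
            if run and run[-1] == stage:
                errors.append(f"stage {pos}: {stage} follows {run[-1]}")
            run.append(stage)
        elif stage in ("DOMINO", "LATCH"):
            if seen_domino:
                # Odd count before a domino stage, even before a latch.
                want_odd = stage == "DOMINO"
                if (len(run) % 2 == 1) != want_odd:
                    parity = "odd" if want_odd else "even"
                    errors.append(
                        f"stage {pos}: need an {parity} number of CDPD "
                        f"stages before {stage}, found {len(run)}")
            seen_domino = stage == "DOMINO"
            run = []
        else:
            raise ValueError(f"unknown stage {stage!r}")
    return errors


good = ["DOMINO", "H/L", "DOMINO", "H/L", "L/H", "LATCH"]
bad = ["DOMINO", "H/L", "L/H", "DOMINO"]
print(check_n_chain(good))
print(check_n_chain(bad))
```

A passing check confirms only the stage counts. It does not detect charge sharing or verify the logic function.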
A number of advantages can be cited. First, all domino inverters are removed. Second, the number of clocked devices is minimized. Both reduce unnecessary power consumption. Third, the skewed precharging of CDPD stages effectively reduces the peak current. A p-n type CDPD chain is presented in Figure 8.16(b), and its connection rules are given in the figure.
The p-n type CDPD chain has additional advantages. First, the logic operations are completed in both the high and the low clock periods, so each clock cycle is fully utilized. Second, not only the number of clocked devices but also the number of latch stages is reduced. As indicated in Figure 8.16(b), the latch before the n-type precharged stage is optional and depends only on whether an inversion is needed. Care and skill are needed in designing CDPD stages to avoid erroneous results, as illustrated in Figure 8.17. A NAND function can be simplified in an L/H stage but is used directly in an H/L stage, whereas a NOR function can be simplified in an H/L stage but is used directly in an L/H stage. The wrong connections, which result in charge sharing, should be avoided. In general, complementary gates are simplified differently in an H/L stage and in an L/H stage (see Figure 8.18). In the worst case, complementary gates can be used directly for either an L/H or an H/L stage.
FIGURE 8.17 NAND and NOR gates transferred into L/H or H/L stages.
FIGURE 8.18 Logic gates transferred into either L/H or H/L stages. Columns: original static stages, equivalent H/L stage, equivalent L/H stage.
8.3.4 Unified Connection Rules of TSPC and CDPD Stages
It is important to follow the connection rules when constructing TSPC and CDPD circuits, and computer-aided design (CAD) tools should be able to check the correctness of the circuit connections against the rules. If the connections are correct, the circuit will work, but the target function and speed still have to be checked by simulation. The unified connection rules of TSPC and CDPD stages are illustrated in Figure 8.19. For example, SP→SP→SN→SN represents a TSPC nonprecharged flip-flop, and SP→SP→PN→SN becomes a TSPC precharged flip-flop. Nonclassic flip-flops are represented by the connections SP→PN→SN (positive edge-triggered) and SN→PP→SP (negative edge-triggered). The connection rules between CDPD and TSPC stages are also included.
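As a small illustration of how a CAD tool might recognize the stage sequences named above, the Python sketch below maps each sequence to its description. It is only a lookup of the four examples given in the text, written with "->" between stages; it does not encode the full rule set of Figure 8.19, and the names `KNOWN_FLIP_FLOPS`, `parse`, and `classify` are hypothetical.

```python
from typing import List, Sequence

# Stage sequences named in the text, read from input to output.
KNOWN_FLIP_FLOPS = {
    ("SP", "SP", "SN", "SN"): "TSPC nonprecharged flip-flop",
    ("SP", "SP", "PN", "SN"): "TSPC precharged flip-flop",
    ("SP", "PN", "SN"): "nonclassic flip-flop, positive edge-triggered",
    ("SN", "PP", "SP"): "nonclassic flip-flop, negative edge-triggered",
}


def parse(chain: str) -> List[str]:
    """Split a chain written as 'SP->PN->SN' into its stage names."""
    return [s.strip() for s in chain.split("->")]


def classify(stages: Sequence[str]) -> str:
    """Look up a stage sequence among the examples listed in the text."""
    return KNOWN_FLIP_FLOPS.get(tuple(stages), "not one of the listed examples")


print(classify(parse("SP->SP->PN->SN")))
print(classify(parse("SN->PP->SP")))
print(classify(parse("SP->SN")))
```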