EC6601 VLSI Design Model Qb


Department of Electronics and Communication Engineering

QUESTION BANK

VI-SEMESTER

(Reg-2013)

AUTHORS

1. Mr. M. Yuvaraj,

Assistant Professor, Dept. of ECE, Agni College of Technology.

2. Mrs. A Shifana Parween,

Assistant Professor, Dept. of ECE, Agni College of Technology.

3. Mr. G. Laxmanaa ,


1.

What are the different operating regions for an MOS transistor?

• Cutoff region

• Non-saturated region

• Saturated region

2.

What is Channel-length modulation?

In the ideal model of a MOS transistor in saturation, the current between the drain and source terminals is constant and independent of the voltage applied across those terminals. This is not entirely correct. The effective length of the conductive channel is actually modulated by the applied VDS: increasing VDS causes the depletion region at the drain junction to grow, reducing the length of the effective channel and therefore increasing the current.
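As a rough illustration of this effect, the sketch below uses the common first-order long-channel model, in which the saturation current is scaled by a factor (1 + lambda * VDS). The function name, the parameter names (`k_n`, `lam`) and the numeric values are assumptions chosen for illustration; they are not taken from this question bank.

```python
def saturation_current(k_n, v_gs, v_t, v_ds, lam=0.0):
    """First-order saturation current of an nMOS transistor.

    I_D = (k_n / 2) * (V_GS - V_T)^2 * (1 + lam * V_DS)

    k_n : transconductance parameter in A/V^2 (assumed value)
    lam : channel-length modulation coefficient in 1/V (assumed value)
    The caller is assumed to keep the device in saturation
    (V_DS >= V_GS - V_T).
    """
    v_ov = v_gs - v_t              # overdrive voltage
    if v_ov <= 0:
        return 0.0                 # device is in cutoff
    return 0.5 * k_n * v_ov ** 2 * (1.0 + lam * v_ds)


if __name__ == "__main__":
    for v_ds in (1.0, 1.5, 2.0):
        ideal = saturation_current(100e-6, 1.5, 0.5, v_ds)
        real = saturation_current(100e-6, 1.5, 0.5, v_ds, lam=0.1)
        print(f"VDS={v_ds:.1f} V  ideal={ideal:.2e} A  with CLM={real:.2e} A")
```

With `lam = 0` the current is flat in saturation, while a nonzero `lam` makes it rise with VDS, which is the behaviour described above.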

3.

Define threshold voltage in CMOS.

The threshold voltage, VT, of a MOS transistor can be defined as the voltage applied between the gate and the source below which the drain-to-source current, IDS, effectively drops to zero.

4.

What is Body effect?

The threshold voltage VT is not constant; it varies with the voltage difference between the substrate (body) and the source of the MOS transistor. This effect is called the substrate-bias effect or body effect.
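The dependence of VT on the source-to-substrate voltage can be sketched with the widely used first-order expression VT = VT0 + gamma * (sqrt(|2*phiF| + VSB) - sqrt(|2*phiF|)). This formula and the numeric values below (VT0, gamma, phiF) are assumptions for an nMOS device used only for illustration; they do not come from this question bank.

```python
import math


def threshold_voltage(v_t0, gamma, phi_f, v_sb):
    """Threshold voltage of an nMOS transistor including body effect.

    v_t0  : threshold voltage at V_SB = 0 (V)
    gamma : body-effect coefficient (V^0.5), assumed value
    phi_f : Fermi potential magnitude (V), assumed value
    v_sb  : source-to-substrate voltage (V), assumed >= 0
    """
    two_phi = abs(2.0 * phi_f)
    return v_t0 + gamma * (math.sqrt(two_phi + v_sb) - math.sqrt(two_phi))


if __name__ == "__main__":
    for v_sb in (0.0, 0.5, 1.0):
        print(f"VSB={v_sb:.1f} V -> VT={threshold_voltage(0.45, 0.4, 0.3, v_sb):.3f} V")
```

The threshold equals VT0 when the source and substrate are at the same potential and rises as VSB increases.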

5.

What is Scaling?

Scaling is the proportional adjustment of the dimensions of an electronic device while maintaining its electrical properties, resulting in a device that is either larger or smaller than the unscaled device.

6.

What is Elmore's constant?

In general, most circuits of interest can be represented as an RC tree, i.e., an RC circuit with no loops. The root of the tree is the voltage source and the leaves are the capacitors at the ends of the branches. The Elmore delay model [Elmore48] estimates the delay from a source switching to one of the leaf nodes changing as the sum over each node i of the capacitance Ci on the node, multiplied by the effective resistance Ris on the shared path from the source to the node and the leaf:

tpd = Σi Ris Ci

Application of Elmore delay is best illustrated through examples.
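A minimal sketch of the Elmore sum for an RC tree follows. The data layout (dictionaries keyed by node number, each branch identified by its child node, node 0 as the source) is an assumption made for this example, not a standard interface.

```python
def elmore_delay(parent, resistance, capacitance, leaf):
    """Elmore delay from the source (root) to `leaf` in an RC tree.

    parent[i]      : parent node of node i (None for the root/source)
    resistance[i]  : resistance of the branch between parent[i] and i
    capacitance[i] : capacitance from node i to ground
    """
    def branches_to_root(node):
        # Collect the branches (named by their child node) on the
        # path from `node` back to the source.
        path = set()
        while parent[node] is not None:
            path.add(node)
            node = parent[node]
        return path

    leaf_path = branches_to_root(leaf)
    delay = 0.0
    for node, c_node in capacitance.items():
        # Resistance shared between source->node and source->leaf.
        shared = leaf_path & branches_to_root(node)
        r_shared = sum(resistance[b] for b in shared)
        delay += r_shared * c_node
    return delay


if __name__ == "__main__":
    # Source 0 -- R1 -- node 1 -- R2 -- node 2, with a side branch
    # node 1 -- R3 -- node 3. Values are arbitrary illustrative units.
    parent = {0: None, 1: 0, 2: 1, 3: 1}
    resistance = {1: 1.0, 2: 1.0, 3: 2.0}
    capacitance = {1: 1.0, 2: 1.0, 3: 3.0}
    print(elmore_delay(parent, resistance, capacitance, leaf=2))
```

For the delay to node 2, the side-branch capacitor at node 3 is charged only through the shared resistance R1, so it contributes R1*C3 rather than (R1+R3)*C3.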


7.

Define Static CMOS logic.

The principle of static CMOS logic is that the output is connected to ground through an n-block and to VDD through a dual p-block. While the inputs do not change, the gate consumes only the leakage currents of some transistors. When it is switching, it draws an additional current, which is needed to charge and discharge the internal capacitances and the load. Although the gate's logic function is ideally independent of the transistor channel widths, the widths essentially determine the dynamic behavior: wider transistors will switch a capacitive load faster, but they will also cause a larger input capacitance of the gate. Unless otherwise noted, minimum-width and, of course, minimum-channel-length transistors are assumed. For given capacitances, the transistors' on-state current Ion will limit the switching speed of the gate and, consequently, the maximum clock frequency of a synchronous circuit.

8.

Define Dynamic CMOS logic.

Dynamic logic is distinguished from so-called static logic in that dynamic logic uses a clock signal in its implementation of combinational logic circuits. The usual use of a clock signal is to synchronize transitions in sequential logic circuits. For most implementations of combinational logic, a clock signal is not even needed.


9.

What is meant by transmission gate?

A transmission gate, or analog switch, is an electronic element that selectively blocks or passes a signal level from the input to the output. This solid-state switch consists of an n-channel (nMOS) transistor and a p-channel (pMOS) transistor with separate gates and common source and drain. The control gates are biased in a complementary manner so that both transistors are either on or off.

10.

What are the different types of power dissipation?

There are three types of power dissipation:

• Static power dissipation: Ps = leakage current × supply voltage (Ileak × Vdd).

• Dynamic power dissipation: Pd = CL × Vdd² × fclk.

• Short-circuit power dissipation: Psc = Imean × Vdd.
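The three expressions above can be evaluated directly. The sketch below is only a numerical illustration; the function names and the example values (1 pF load, 1 V supply, 1 GHz clock) are assumptions.

```python
def static_power(i_leak, v_dd):
    """Ps = leakage current * supply voltage."""
    return i_leak * v_dd


def dynamic_power(c_load, v_dd, f_clk):
    """Pd = CL * Vdd^2 * fclk."""
    return c_load * v_dd ** 2 * f_clk


def short_circuit_power(i_mean, v_dd):
    """Psc = mean short-circuit current * supply voltage."""
    return i_mean * v_dd


if __name__ == "__main__":
    # Illustrative values only.
    print("Ps  =", static_power(1e-6, 1.0), "W")
    print("Pd  =", dynamic_power(1e-12, 1.0, 1e9), "W")
    print("Psc =", short_circuit_power(10e-6, 1.0), "W")
```

Because Vdd enters the dynamic term quadratically, halving the supply voltage in this model cuts the dynamic power to a quarter.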

11.

What are synchronizers?

A synchronizer is a circuit that accepts an input that can change at arbitrary times and produces an output aligned to the synchronizer’s clock. Because the input can change during the synchronizer’s aperture, the synchronizer has a nonzero probability of producing a metastable output


12.

State bistability principle

A bistable circuit has two stable states. In absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value. A trigger pulse must be applied to change the state of the circuit. Another common name for a bistable circuit is flip-flop (unfortunately, an edge-triggered register is also referred to as a flip-flop).

13.

Explain about C2MOS latch.

The dynamic latch of Figure 10.17(d) can also be drawn as a clocked tristate. Such a form is sometimes called clocked CMOS (C2MOS). The conventional form, built from an inverter and a transmission gate, is slightly faster because the output is driven through the nMOS and pMOS working in parallel. C2MOS is slightly smaller because it eliminates two contacts.

14.

What is meant by true single phase clocked register?

The True Single-Phase Clocked Register (TSPCR) uses a single clock (without an inverse clock). The basic single-phase positive and negative latches are shown in Figure 7.30. For the positive latch, when CLK is high, the latch is in the transparent mode and corresponds to two cascaded inverters; the latch is non-inverting, and propagates the input to the output. On the other hand, when CLK = 0, both inverters are disabled, and the latch is in hold mode. Only the pull-up networks are still active, while the pull-down circuits are deactivated. As a result of the dual-stage approach, no signal can ever propagate from the input of the latch to the output in this mode. A register can be constructed by cascading positive and negative latches.

15.

Define pipelining.

Pipelining is a popular design technique often used to accelerate the operation of the datapaths in digital processors. The idea is easily explained with the example of Figure 7.40a. The goal of the presented circuit is to compute log(|a - b|), where both a and b represent streams of numbers, that is, the computation must be performed on a large set of input values.

16.

What is meant by Datapath circuits?

A datapath is a collection of functional units, such as arithmetic logic units or multipliers, that perform data processing operations, registers, and buses.[1] Along with the control unit it composes the central processing unit (CPU).

17.

How does a CLA differ from an RCA?

CLA: The carry lookahead adder (CLA) solves the carry delay problem by calculating the carry signals in advance, based on the input signals. It is based on the fact that a carry signal will be generated in two cases: (1) when both bits ai and bi are 1, or (2) when one of the two bits is 1 and the carry-in is 1.

RCA: In the ripple carry adder, the output is known only after the carry generated by the previous stage is produced. Thus, the sum of the most significant bit is only available after the carry signal has rippled through the adder from the least significant stage to the most significant stage. As a result, the final sum and carry bits will be valid after a considerable delay.
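The difference between the two schemes can be sketched in software. In the example below, the generate signal gi = ai AND bi corresponds to case (1) and the propagate signal pi = ai XOR bi to case (2). The function names and the LSB-first bit-list representation are assumptions made for this illustration; the code models the logic equations, not the gate-level timing.

```python
def to_bits(value, width):
    """Return `width` bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Each stage waits for the carry produced by the previous stage."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry


def lookahead_carry(g, p, c_in, i):
    """Carry out of bit i, written only in terms of g, p and c_in:
    c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]p[i-1]...p[0]c_in
    """
    carry = 0
    prefix = 1                      # running product p[i] & ... & p[j+1]
    for j in range(i, -1, -1):
        carry |= prefix & g[j]
        prefix &= p[j]
    return carry | (prefix & c_in)


def carry_lookahead_add(a_bits, b_bits, c_in=0):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c_in] + [lookahead_carry(g, p, c_in, i) for i in range(len(g))]
    sum_bits = [p[i] ^ carries[i] for i in range(len(g))]
    return sum_bits, carries[-1]


if __name__ == "__main__":
    print(ripple_carry_add(to_bits(11, 4), to_bits(6, 4)))
    print(carry_lookahead_add(to_bits(11, 4), to_bits(6, 4)))
```

Both functions return the same sum; the point of the lookahead form is that no carry expression depends on another carry, so in hardware all carries can be computed in parallel.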

18.

List out different high speed adders.

• Carry look ahead adder

• Carry skip adder

• Carry save adder

• Carry select adder

• Carry bypass adder

19.

Define Accumulator.

An accumulator is a register for short-term, intermediate storage of arithmetic and logic data in a computer's CPU (central processing unit). The term "accumulator" is rarely used in reference to contemporary CPUs, having been replaced around the turn of the millennium by the term "register." In modern computers, any register can function as an accumulator.


20.

Draw the generic block diagram of digital processor?

21.

Write the design style classification.

The IC design style can be classified as:

• Full custom design ASICs

• Semi custom design ASICs
   o Standard cell design
   o Gate array design
      - Channeled gate array
      - Channel less gate array

• Programmable ASICs
   o PLDs
   o FPGA

22.

Differentiate between channeled & channel less gate array.

Channeled gate array:

• The channeled gate array was the first to be developed. In a channeled gate array, space is left between the rows of transistors for wiring.

• A channeled gate array is similar to a CBIC. Both use rows of cells separated by channels used for interconnect. One difference is that the space for interconnect between rows of cells is fixed in height in a channeled gate array, whereas the space between rows of cells may be adjusted in a CBIC.

Channel less gate array:

• The channel less gate-array architecture is now more widely used. The routing on a channel less gate array uses rows of unused transistors.

• The key difference between a channel less gate array and a channeled gate array is that there are no predefined areas set aside for routing between cells on a channel less gate array. Instead we route over the top of the gate-array devices. We can do this because we customize the contact layer that defines the connections between metal 1, the first layer of metal, and the transistors.

23.

What is a FPGA?

A field programmable gate array (FPGA) is a programmable logic device that supports implementation of relatively large logic circuits. FPGAs can be used to implement a logic circuit with more than 20,000 gates, whereas a CPLD can implement circuits of up to about 20,000 equivalent gates.

24.

What is an antifuse?

An antifuse is normally high resistance (>100 MΩ). On application of appropriate programming voltages, the antifuse is changed permanently to a low-resistance structure (200-500 Ω).


1. A. Discuss DC transfer characteristics of the CMOS. (8)

DC transfer characteristics

Digital circuits are merely analog circuits used over a special portion of their range. The DC transfer characteristics of a circuit relate the output voltage to the input voltage, assuming the input changes slowly enough that capacitances have plenty of time to charge or discharge. Specific ranges of input and output voltages are defined as valid 0 and 1 logic levels. This section explores the DC transfer characteristics of CMOS gates and pass transistors.

Static CMOS Inverter DC Characteristics

Let us derive the DC transfer function (Vout vs. Vin) for the static CMOS inverter shown in Figure 2.25. We begin with Table 2.2, which outlines various regions of operation for the n- and p-transistors. In this table, Vtn is the threshold voltage of the n-channel device, and Vtp is the threshold voltage of the p-channel device. Note that Vtp is negative. The equations are given both in terms of Vgs/Vds and Vin/Vout. As the source of the nMOS transistor is grounded, Vgsn = Vin and Vdsn = Vout. As the source of the pMOS transistor is tied to VDD, Vgsp = Vin – VDD and Vdsp = Vout – VDD.

The objective is to find the variation in output voltage (Vout) as a function of the input voltage (Vin). This may be done graphically, analytically (see Exercise 2.16), or through simulation [Carr72]. Given Vin, we must find Vout subject to the constraint that Idsn = |Idsp|. For simplicity, we assume Vtp = –Vtn and that the pMOS transistor is 2–3 times as wide as the nMOS transistor so βn = βp. We relax this assumption in Section 2.5.2.

We commence with the graphical representation of the simple algebraic equations described by EQ (2.10) for the two transistors shown in Figure 2.26(a). The plot shows Idsn and Idsp in terms of Vdsn and Vdsp for various values of Vgsn and Vgsp. Figure 2.26(b) shows the same plot of Idsn and |Idsp| now in terms of Vout for various values of Vin. The possible operating points of the inverter, marked with dots, are the values of Vout where Idsn = |Idsp| for a given value of Vin. These operating points are plotted on Vout vs. Vin axes in Figure 2.26(c) to show the inverter DC transfer characteristics. The supply current IDD = Idsn = |Idsp| is also plotted against Vin in Figure 2.26(d), showing that both transistors are momentarily ON as Vin passes through voltages between GND and VDD, resulting in a pulse of current drawn from the power supply.

The operation of the CMOS inverter can be divided into five regions indicated on Figure 2.26(c). The state of each transistor in each region is shown in Table 2.3. In region A, the nMOS transistor is OFF so the pMOS transistor pulls the output to VDD. In region B, the nMOS transistor starts to turn ON, pulling the output down. In region C, both transistors are in saturation. Notice that ideal transistors are only in region C for Vin = VDD/2 and that the slope of the transfer curve in this example is –∞ in this region, corresponding to infinite gain. Real transistors have finite output resistances on account of channel length modulation, described in Section 2.4.2, and thus have finite slopes over a broader region C. In region D, the pMOS transistor is partially ON and in region E, it is completely OFF, leaving the nMOS transistor to pull the output down to GND.

Region  Condition                    p-device   n-device   Output
A       0 <= Vin < Vtn               linear     cutoff     Vout = VDD
B       Vtn <= Vin < VDD/2           linear     saturated  Vout > VDD/2
C       Vin = VDD/2                  saturated  saturated  Vout drops sharply
D       VDD/2 < Vin <= VDD – |Vtp|   saturated  linear     Vout < VDD/2
E       Vin > VDD – |Vtp|            cutoff     linear     Vout = 0
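The region table can be turned into a small lookup. The sketch below assumes the idealized, symmetric inverter of this answer (region C occurs exactly at Vin = VDD/2); the function name and the example voltages are assumptions for illustration.

```python
import math


def inverter_region(v_in, v_dd, v_tn, v_tp):
    """Classify the operating region of an ideal CMOS inverter.

    v_tp is negative, as in the text. Returns a tuple
    (region, p-device state, n-device state).
    """
    half = v_dd / 2.0
    if v_in < v_tn:
        return ("A", "linear", "cutoff")
    if math.isclose(v_in, half):
        return ("C", "saturated", "saturated")
    if v_in < half:
        return ("B", "linear", "saturated")
    if v_in <= v_dd - abs(v_tp):
        return ("D", "saturated", "linear")
    return ("E", "cutoff", "linear")


if __name__ == "__main__":
    # Illustrative values: VDD = 1.0 V, Vtn = 0.3 V, Vtp = -0.3 V.
    for v_in in (0.1, 0.4, 0.5, 0.6, 0.9):
        print(v_in, inverter_region(v_in, 1.0, 0.3, -0.3))
```

For a real inverter, region C spans a small range of Vin rather than a single point, so an exact comparison like this one applies only to the ideal model.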

2. Explain in detail about current voltage characteristics of MOS transistor.

I-V Characteristics

When familiarizing yourself with a new process, a starting point is to plot the current-voltage (I-V) characteristics. Although digital designers seldom make calculations directly from these plots, it is helpful to know the ON current of nMOS and pMOS transistors, how severely velocity-saturated the process is, how the current rolls off below threshold, how the devices are affected by DIBL and body effect, and so forth. These plots are made with DC sweeps, as discussed in Section 8.2.2. Each transistor is 1 µm wide in a representative 65 nm process at 70 °C with VDD = 1.0 V. Figure 8.16 shows nMOS characteristics and Figure 8.17 shows pMOS characteristics.

Figure 8.16(a) plots Ids vs. Vds at various values of Vgs, as was done in Figure 8.5. The saturation current would ideally increase quadratically with Vgs – Vt, but in this plot it shows closer to a linear dependence, indicating that the nMOS transistor is severely velocity saturated (α closer to 1 than 2 in the α-power model). The significant increase in saturation current with Vds is caused by channel-length modulation.

Figure 8.16(b) makes a similar plot for a device with a drawn channel length of twice minimum. The current drops by less than a factor of two because it experiences less velocity saturation. The current is slightly flatter in saturation because channel-length modulation has less impact at longer channel lengths.

Figure 8.16(c) plots Ids vs. Vgs on a semilogarithmic scale for Vds = 0.1 V and 1.0 V. The straight line at low Vgs indicates that the current rolls off exponentially below threshold. The difference in subthreshold leakage at the varying drain voltage reflects the effects of DIBL.


2.a. Discuss the techniques to reduce switching activity in static and dynamic CMOS circuits.

Circuit Families

Static CMOS circuits with complementary nMOS pulldown and pMOS pullup networks are used for the vast majority of logic gates in integrated circuits. They have good noise margins, and are fast, low power, insensitive to device variations, easy to design, widely supported by CAD tools, and readily available in standard cell libraries. When noise does exceed the margins, the gate delay increases because of the glitch, but the gate eventually will settle to the correct answer. Most design teams now use static CMOS exclusively for combinational logic. This section begins with a number of techniques for optimizing static CMOS circuits. Nevertheless, performance or area constraints occasionally dictate the need for other circuit families. The most important alternative is dynamic circuits. However, we begin by considering ratioed circuits, which are simpler and offer a helpful conceptual transition between static and dynamic. We also consider pass transistors, which had their zenith in the 1990s for general-purpose logic and still appear in specialized applications.

Static CMOS

Designers accustomed to AND and OR functions must learn to think in terms of NAND and NOR to take advantage of static CMOS. In manual circuit design, this is often done through bubble pushing. Compound gates are particularly useful to perform complex functions with relatively low logical efforts. When a particular input is known to be latest, the gate can be optimized to favor that input. Similarly, when either the rising or falling edge is known to be more critical, the gate can be optimized to favor that edge. We have focused on building gates with equal rising and falling delays; however, using smaller pMOS transistors can reduce power, area, and delay. In processes with multiple threshold voltages, multiple flavors of gates can be constructed with different speed/leakage power trade-offs.

Bubble Pushing

CMOS stages are inherently inverting, so AND and OR functions must be built from NAND and NOR gates. DeMorgan's law helps with this conversion (A' denotes the complement of A):

(A · B)' = A' + B'
(A + B)' = A' · B'

These relations are illustrated graphically in Figure 9.1. A NAND gate is equivalent to an OR of inverted inputs. A NOR gate is equivalent to an AND of inverted inputs. The same relationship applies to gates with more inputs. Switching between these representations is easy to do on a whiteboard and is often called bubble pushing.

Compound Gates

In general, logical effort of compound gates can be different for different inputs. Figure 9.4 shows how logical efforts can be estimated for the AOI21, AOI22, and a more complex compound AOI gate. The transistor widths are chosen to give the same drive as a unit inverter. The logical effort of each input is the ratio of the input capacitance of that input to the input capacitance of the inverter. For the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than for the two AND terminals (A, B). The parasitic delay is crudely estimated from the total diffusion capacitance on the output node by summing the sizes of the transistors attached to the output.
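The equivalences used in bubble pushing can be checked exhaustively. The short sketch below verifies that a NAND equals an OR of inverted inputs, that a NOR equals an AND of inverted inputs, and that an AND-OR function can be built from NAND gates alone. The function names are assumptions made for this example.

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def demorgan_holds():
    """Check both DeMorgan equivalences for every input combination."""
    for a, b in product([False, True], repeat=2):
        if nand(a, b) != ((not a) or (not b)):
            return False
        if nor(a, b) != ((not a) and (not b)):
            return False
    return True


def and_or_from_nands(a, b, c, d):
    """(a AND b) OR (c AND d) built from three NAND gates.

    Pushing the bubbles of the output OR back to its inputs turns the
    two AND gates and the OR gate into NAND gates.
    """
    return nand(nand(a, b), nand(c, d))


if __name__ == "__main__":
    print("DeMorgan holds:", demorgan_holds())
    print(and_or_from_nands(True, True, False, False))
```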


Dynamic Circuits

Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to the inputs with a single resistive pullup. The drawbacks of ratioed circuits include slow rising transitions, contention on the falling transitions, static power dissipation, and a nonzero VOL. Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor rather than a pMOS that is always ON. Figure 9.21 compares (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters. Dynamic circuit operation is divided into two modes, as shown in Figure 9.22. During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the output Y high. During evaluation, the clock is 1 and the clocked pMOS turns OFF. The output may remain high or may be discharged low through the pulldown network.

Dynamic circuits are the fastest commonly used circuit family because they have lower input capacitance and no contention during switching. They also have zero static power dissipation. However, they require careful clocking, consume significant dynamic power, and are sensitive to noise during evaluation. Clocking of dynamic circuits will be discussed in much more detail in Section 10.5.

In Figure 9.21(c), if the input A is 1 during precharge, contention will take place because both the pMOS and nMOS transistors will be ON. When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation transistor can be added to the bottom of the nMOS stack to avoid contention as shown in Figure 9.23. The extra transistor is sometimes called a foot.

Figure 9.24 shows generic footed and unfooted gates. Figure 9.25 estimates the falling logical effort of both footed and unfooted dynamic gates. As usual, the pulldown transistors' widths are chosen to give unit resistance. Precharge occurs while the gate is idle and often may take place more slowly. Therefore, the precharge transistor width is chosen for twice unit resistance. This reduces the capacitive load on the clock and the parasitic capacitance at the expense of greater rising delays. We see that the logical efforts are very low. Footed gates have higher logical effort than their unfooted counterparts but are still an improvement over static logic. In practice, the logical effort of footed gates is better than predicted because velocity saturation means series nMOS transistors have less resistance than we have estimated. Moreover, logical efforts are also slightly better than predicted because there is no contention between nMOS and pMOS transistors during the input transition. The size of the foot can be increased relative to the other nMOS transistors to reduce logical effort of the other inputs at the expense of greater clock loading.

Like pseudo-nMOS gates, dynamic gates are particularly well suited to wide NOR functions or multiplexers because the logical effort is independent of the number of inputs. Of course, the parasitic delay does increase with the number of inputs because there is more diffusion capacitance on the output node. Characterizing the logical effort and parasitic delay of dynamic gates is tricky because the output tends to fall much faster than the input rises, leading to potentially misleading dependence of propagation delay on fanout [Sutherland99].

A fundamental difficulty with dynamic circuits is the monotonicity requirement. While a dynamic gate is in evaluation, the inputs must be monotonically rising. That is, the input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and remain HIGH, but not start HIGH and fall LOW. Figure 9.26 shows waveforms for a footed dynamic inverter in which the input violates monotonicity. During precharge, the output is pulled HIGH. When the clock rises, the input is HIGH so the output is discharged LOW through the pulldown network, as you would want to have happen in an inverter. The input later falls LOW, turning off the pulldown network. However, the precharge transistor is also OFF so the output floats, staying LOW rather than rising as it would in a normal inverter. The output will remain low until the next precharge step. In summary, the inputs must be monotonically rising for the dynamic gate to compute the correct function. Unfortunately, the output of a dynamic gate begins HIGH and monotonically falls LOW during evaluation. This monotonically falling output X is not a suitable input to a second dynamic gate expecting monotonically rising signals, as shown in Figure 9.27. Dynamic gates sharing the same clock cannot be directly connected.
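The monotonicity problem can be reproduced with a purely behavioural model of a footed dynamic inverter. The sketch below works on sampled 0/1 waveforms and ignores delay, leakage and charge sharing; the function name, the sample sequences and the assumption that the output starts precharged are all illustrative.

```python
def simulate_footed_dynamic_inverter(clock, a):
    """Behavioural model of a footed dynamic inverter.

    clock, a : equal-length sequences of 0/1 samples.
    Returns the output Y at each sample. The output node is assumed
    to start precharged (HIGH).
    """
    y = 1
    outputs = []
    for phi, a_t in zip(clock, a):
        if phi == 0:
            y = 1        # precharge: clocked pMOS pulls Y high
        elif a_t == 1:
            y = 0        # evaluate: pulldown and foot discharge Y
        # Otherwise Y floats and keeps its previous value.
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    clock = [0, 1, 1, 1, 0, 1]
    a     = [1, 1, 0, 0, 0, 0]   # A falls during evaluation (violation)
    print("dynamic Y:", simulate_footed_dynamic_inverter(clock, a))
    print("static  Y:", [1 - bit for bit in a])
```

In the third and fourth samples the input has fallen LOW, yet the dynamic output stays LOW where a static inverter would have returned HIGH; it recovers only at the next precharge.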


2.b. Explain the various Power dissipation in CMOS circuits.

Sources of Power Dissipation

Power dissipation in CMOS circuits comes from two components:

• Dynamic dissipation due to
   ○ charging and discharging load capacitances as gates switch
   ○ “short-circuit” current while both pMOS and nMOS stacks are partially ON

• Static dissipation due to
   ○ subthreshold leakage through OFF transistors
   ○ gate leakage through gate dielectric
   ○ junction leakage from source/drain diffusions
   ○ contention current in ratioed circuits (see Section 9.2.2)

Putting this together gives the total power of a circuit:

Pdynamic = Pswitching + Pshortcircuit
Pstatic = (Isub + Igate + Ijunct + Icontention) × VDD
Ptotal = Pdynamic + Pstatic

Power can also be considered in active, standby, and sleep modes. Active power is the power consumed while the chip is doing useful work. It is usually dominated by Pswitching. Standby power is the power consumed while the chip is idle. If clocks are stopped and ratioed circuits are disabled, the standby power is set by leakage. In sleep mode, the supplies to unneeded circuits are turned off to eliminate leakage. This drastically reduces the sleep power required, but the chip requires time and energy to wake up, so sleeping is only viable if the chip will idle for long enough.

[Gonzalez96] found that roughly one-third of microprocessor power is spent on the clock, another third on memories, and the remaining third on logic and wires. In nanometer technologies, nearly one-third of the power is leakage. High-speed I/O contributes a growing component too. For example, Figure 5.6 shows the active power consumption of Sun's 8-core 84 W Niagara2 processor [Nawathe08]. The cores and other components collectively account for clock, logic, and wires. The next sections investigate how to estimate and minimize each of these components of power.

Dynamic Power

Dynamic power consists mostly of the switching power, given in EQ (5.10). The supply voltage VDD and frequency f are readily known by the designer. To estimate this power, one can consider each node of the circuit. The capacitance of the node is the sum of the gate, diffusion, and wire capacitances on the node. The activity factor can be estimated using techniques described in Section 5.2.1 or measured from logic simulations. The effective capacitance of the node is its true capacitance multiplied by the activity factor. The switching power depends on the sum of the effective capacitances of all the nodes. Activity factors can be heavily dependent on the particular task being executed. For example, a processor in a cell phone will use more power while running video games than while displaying a calendar. CAD tools do a fine job of power estimation when given a realistic workload. Low power design involves considering and reducing each of the terms in switching power.
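The node-by-node estimate described above can be sketched as follows, assuming the usual form Pswitching = (sum of activity factor × capacitance) × VDD² × f. The function name and the node values are hypothetical and chosen only to show the arithmetic.

```python
def switching_power(nodes, v_dd, freq):
    """Estimate switching power from per-node data.

    nodes : iterable of (capacitance_in_farads, activity_factor) pairs
    v_dd  : supply voltage in volts
    freq  : clock frequency in hertz
    """
    # Effective capacitance = true capacitance * activity factor,
    # summed over all nodes of the circuit.
    c_effective = sum(c * alpha for c, alpha in nodes)
    return c_effective * v_dd ** 2 * freq


if __name__ == "__main__":
    # Hypothetical nodes: a rarely switching node and a busy one.
    nodes = [(1e-15, 0.1), (2e-15, 0.5)]
    print(switching_power(nodes, v_dd=1.0, freq=1e9), "W")
```

A node with a large capacitance but a low activity factor can contribute less than a small, frequently toggling node, which is why the estimate is only as good as the workload used to obtain the activity factors.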

As VDD is a quadratic term, it is good to select the minimum VDD that can support the required frequency of operation. Likewise, we choose the lowest frequency of operation that achieves the desired end performance. The activity factor is mainly reduced by putting unused blocks to sleep. Finally, the circuit may be optimized to reduce the overall load capacitance of each section. Dynamic power also includes a short-circuit power component caused by power rushing from VDD to GND when both the pullup and pulldown networks are partially ON while a transistor switches. This is normally less than 10% of the whole, so it can be conservatively estimated by adding 10% to the switching power. Switching power is consumed by delivering energy to charge a load capacitance, then dumping this energy to GND. Intuitively, one might expect that power could be saved by shuffling the energy around to where it is needed rather than just dumping it. Resonant circuits and adiabatic charge-recovering circuits [Maksimovic00, Sathe07] seek to achieve such a goal. Unfortunately, all of these techniques add complexity that detracts from the potential energy savings, and none have

3.a. Explain about Static CMOS circuit.

The most widely used logic style is static complementary CMOS. The static CMOS style is really an extension of the static CMOS inverter to multiple inputs. In review, the primary advantage of the CMOS structure is robustness (i.e., low sensitivity to noise), good performance, and low power consumption with no static power dissipation. Most of those properties are carried over to large fan-in logic gates implemented using a similar circuit topology. The complementary CMOS circuit style falls under a broad class of logic circuits called static circuits in which at every point in time (except during the switching transients), each gate output is connected to either VDD or VSS via a low-resistance path. Also, the outputs of the gates assume at all times the value of the Boolean function implemented by the circuit (ignoring, once again, the transient effects during switching periods). This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes. The latter approach has the advantage that the resulting gate is simpler and faster. Its design and operation are, however, more involved and prone to failure due to an increased sensitivity to noise. In this section, we sequentially address the design of various static circuit flavors including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and pass-transistor logic. The issues of scaling to lower power supply voltages and threshold voltages will also be dealt with.

Complementary CMOS Concept

A static CMOS gate is a combination of two networks, called the pull-up network (PUN) and the pull-down network (PDN) (Figure 6.2). The figure shows a generic N-input logic gate where all inputs are distributed to both the pull-up and pull-down networks. The function of the PUN is to provide a connection between the output and VDD anytime the output of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the output to VSS when the output of the logic gate is meant to be 0. The PUN and PDN networks are constructed in a mutually exclusive fashion such that one and only one of the networks is conducting in steady state. In this way, once the transients have settled, a path always exists between VDD and the output F, realizing a high output (“one”), or, alternatively, between VSS and F for a low output (“zero”). This is equivalent to stating that the output node is always a low-impedance node in steady state.

In constructing the PDN and PUN networks, the following observations should be kept in mind: • A transistor can be thought of as a switch controlled by its gate signal. An NMOS switch is on when the controlling signal is high and is off when the controlling signal is low. A PMOS transistor acts as an inverse switch that is on when the controlling signal is low and off when the controlling signal is high.

• The PDN is constructed using NMOS devices, while PMOS transistors are used in the PUN. The primary reason for this choice is that NMOS transistors produce “strong zeros,” and PMOS

(18)

devices generate “strong ones”. To illustrate this, consider the examples shown in Figure 6.3. In Figure 6.3a, the output capacitance is initially charged to VDD. Two possible discharge scenarios are shown. An NMOS device pulls the output all the way down to GND, while a PMOS lowers the output no further than |VTp| — the PMOS turns off at that point, and stops contributing discharge current. NMOS transistors are hence the preferred devices in the PDN. Similarly, two alternative approaches to charging up a capacitor, with the output initially at GND. A PMOS switch succeeds in charging theoutput all the way to VDD, while the NMOS device fails to raise the output above VDD-VTn. This explains why PMOS transistors are preferentially used in a PUN. A set of construction rules can be derived to construct logic functions

NMOS devices connected in series corresponds to an AND function. With all the inputs high, the series combination conducts and the value at one end of the chain is transferred to the other end. Similarly, NMOS transistors connected in parallel represent an OR function. A conducting path exists between the output and input terminal if at least one of the inputs is high. Using similar arguments, construction rules for PMOS networks can be formulated. A series connection of PMOS conducts if both inputs are low, representing a NOR function (A.B = A+B), while PMOS transistors in parallel implement a NAND (A+B = A·B.• Using De Morgan’s theorems ((A + B) =

A·B and A·B = A + B), it can be shown that the pull-up and pull-down networks of a

complementary CMOS structure are dual networks. This means that a parallel connection of transistors in the pull-up network corresponds to a series connection of the corresponding devices in the pull-down network, and vice versa. Therefore, to construct a CMOS gate, one of the networks (e.g., PDN) is implemented using combinations of series and parallel devices. The other network (i.e., PUN) is obtained using duality principle by walking the hierarchy, replacing series sub-nets with parallel sub-nets, and parallel sub-nets with series sub-nets. The complete CMOS gate is constructed by combining the PDN with the PUN.

• The complementary gate is naturally inverting, implementing only functions such as NAND, NOR, and XNOR. The realization of a non-inverting Boolean function (such as AND, OR, or XOR) in a single stage is not possible, and requires the addition of an extra inverter stage.

• The number of transistors required to implement an N-input logic gate is 2N.

Ratioed Logic Concept

Ratioed logic is an attempt to reduce the number of transistors required to implement a given logic function, at the cost of reduced robustness and extra power dissipation. The purpose of the PUN in complementary CMOS is to provide a conditional path between VDD and the output when the PDN is turned off. In ratioed logic, the entire PUN is replaced with a single unconditional load device that pulls up the output for a high output. Instead of a combination of active pull-down and pull-up networks, such a gate consists of an NMOS pull-down network that realizes the logic function, and a simple load device. Figure 6.27b shows an example of ratioed logic, which uses a grounded PMOS load and is referred to as a pseudo-NMOS gate.

The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1 versus 2N for complementary CMOS). The nominal high output voltage (VOH) for this gate is VDD since the pull-down devices are turned off when the output is pulled high (assuming that VOL is below VTn). On the other hand, the nominal low output voltage is not 0 V since there is a fight between the devices in the PDN and the grounded PMOS load device. This results in reduced noise margins and, more importantly, static power dissipation.

The sizing of the load device relative to the pull-down devices can be used to trade off parameters such as noise margin, propagation delay and power dissipation. Since the voltage swing on the output and the overall functionality of the gate depend upon the ratio between the NMOS and PMOS sizes, the circuit is called ratioed. This is in contrast to the ratioless logic styles, such as complementary CMOS, where the low and high levels do not depend upon transistor sizes.

Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along paths similar to those used for its complementary CMOS counterpart. The value of VOL is obtained by equating the currents through the driver and load devices for Vin = VDD. At this operation point, it is reasonable to assume that the NMOS device resides in linear mode (since the output should ideally be close to 0V), while the PMOS load is saturated. In order to make VOL as small as possible, the PMOS device should be sized much smaller than the NMOS pull-down devices. Unfortunately, this has a negative impact on the propagation delay for charging up the output node since the current provided by the PMOS device is limited.
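The current-balance step can be sketched numerically. The code below assumes the simple long-channel square-law model (NMOS in the linear region, PMOS load in saturation), which is a simplification and not necessarily the device model behind the figures; the gain factors `kn` and `kp` (which fold in W/L) and all numeric values are hypothetical.

```python
import math

def pseudo_nmos_vol(vdd, vtn, vtp_abs, kn, kp):
    """Solve kn*((VDD-VTn)*VOL - VOL**2/2) = (kp/2)*(VDD-|VTp|)**2 for VOL."""
    overdrive_n = vdd - vtn
    overdrive_p = vdd - vtp_abs
    disc = overdrive_n ** 2 - (kp / kn) * overdrive_p ** 2
    if disc < 0:
        raise ValueError("PMOS load too strong for the linear-region assumption")
    # The smaller root keeps the NMOS in the linear region (VOL < VDD - VTn).
    return overdrive_n - math.sqrt(disc)

def static_power_low(vdd, vtp_abs, kp):
    """Static power while the output is low: VDD times the saturated load current."""
    return vdd * 0.5 * kp * (vdd - vtp_abs) ** 2

# Hypothetical numbers, chosen only to show the trend.
VDD, VTN, VTP = 2.5, 0.4, 0.4
KN = 400e-6  # A/V^2
for kp in (100e-6, 50e-6, 25e-6):
    vol = pseudo_nmos_vol(VDD, VTN, VTP, KN, kp)
    power_uw = static_power_low(VDD, VTP, kp) * 1e6
    print(f"kp/kn = {kp / KN:.4f}  VOL = {vol:.3f} V  P_low = {power_uw:.1f} uW")
```

Weakening the load lowers both VOL and the static power, at the price of a smaller pull-up current and hence a slower low-to-high transition.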

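A major disadvantage of the pseudo-NMOS gate is the static power that is dissipated when the output is low through the direct current path that exists between VDD and GND. The static power consumption in the low-output mode is easily derived as $P_{low} = V_{DD} \cdot I_{low}$, where $I_{low}$ is the static current drawn through the PMOS load while the output is low.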

Pass-Transistor Logic

Pass-Transistor Basics

A popular and widely used alternative to complementary CMOS is pass-transistor logic, which attempts to reduce the number of transistors required to implement logic by allowing the primary inputs to drive gate terminals as well as source/drain terminals. This is in contrast to the logic families studied so far, which only allow primary inputs to drive the gate terminals of MOSFETs. Figure 6.33 shows an implementation of the AND function constructed that way, using only NMOS transistors. In this gate, if the B input is high, the top transistor is turned on and copies the input A to the output F. When B is low, the bottom pass transistor is turned on and passes a 0. The switch driven by $\overline{B}$ seems to be redundant at first glance. Its presence is essential to ensure that the gate is static, that is, that a low-impedance path exists to the supply rails under all circumstances, or, in this particular case, when B is low.

The promise of this approach is that fewer transistors are required to implement a given function. For example, the implementation of the AND gate in Figure 6.33 requires 4 transistors (including the inverter required to invert B), while a complementary CMOS implementation would require 6 transistors. The reduced number of devices has the additional advantage of lower capacitance. Unfortunately, as discussed earlier, an NMOS device is effective at passing a 0 but is poor at pulling a node to VDD. When the pass transistor pulls a node high, the output only charges up to VDD-VTn. In fact, the situation is worsened by the fact that the devices

3.b. Explain about Dynamic CMOS Design (8)

Dynamic CMOS Design

It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices. A variety of approaches were presented to reduce the number of transistors required to implement a given logic function including pseudo-NMOS, pass transistor logic, etc. The pseudo-NMOS logic style requires only N + 1 transistors to implement an N input logic gate, but unfortunately it has static power dissipation. In this section, an alternate logic style called dynamic logic is presented that obtains a similar result, while avoiding static power consumption. With the addition of a clock input, it uses a sequence of precharge and conditional evaluation phases.

Dynamic Logic: Basic Principles

The basic construction of an (n-type) dynamic logic gate is shown in Figure 6.52a. The PDN (pull-down network) is constructed exactly as in complementary CMOS. The operation of this circuit is divided into two major phases: precharge and evaluation, with the mode of operation determined by the clock signal CLK.

Precharge

When CLK = 0, the output node Out is precharged to VDD by the PMOS transistor Mp. During that time, the evaluate NMOS transistor Me is off, so that the pull-down path is disabled. The evaluation FET eliminates any static power that would be consumed during the precharge period (that is, static current would flow between the supplies if both the pull-down and the precharge device were turned on simultaneously).

Evaluation

For CLK = 1, the precharge transistor Mp is off, and the evaluation transistor Me is turned on. The output is conditionally discharged based on the input values and the pull-down topology. If the inputs are such that the PDN conducts, then a low resistance path exists between Out and GND and the output is discharged to GND. If the PDN is turned off, the precharged value remains stored on the output capacitance CL, which is a combination of junction capacitances, the wiring capacitance, and the input capacitance of the fan-out gates. During the evaluation phase, the only possible path between the output node and a supply rail is to GND. Consequently, once Out is discharged, it cannot be charged again until the next precharge operation. The inputs to the gate can therefore make at most one transition during evaluation. Notice that the output can be in the high-impedance state during the evaluation period if the pull-down network is turned off. This behavior is fundamentally different from the static counterpart that always has a low resistance path between the output and one of the power rails.

As an example, consider the circuit shown in Figure 6.52b. During the precharge phase (CLK=0), the output is precharged to VDD regardless of the input values since the evaluation device is turned off. During evaluation (CLK=1), a conducting path is created between Out and GND if (and only if) A·B+C is TRUE. Otherwise, the output remains at the precharged state of VDD. The following function is thus realized:

$Out = \overline{CLK} + \overline{(A \cdot B + C)} \cdot CLK$
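The precharge/evaluate behavior described above can be mimicked with a short behavioral model. This is a sketch that ignores leakage, charge sharing and timing; the function name `dynamic_gate` and the assumed initial output value are illustrative choices.

```python
def dynamic_gate(clk_sequence, input_sequence, pdn, initial_out=1):
    """Behavioral model of an n-type dynamic gate.

    pdn(**inputs) returns a truthy value when the pull-down network conducts.
    initial_out is an assumption about the stored charge before the first step.
    """
    out = initial_out
    trace = []
    for clk, inputs in zip(clk_sequence, input_sequence):
        if clk == 0:
            out = 1              # precharge: Mp on, Me off
        elif pdn(**inputs):
            out = 0              # evaluate: PDN conducts, CL is discharged
        # otherwise the value stays stored on CL (high-impedance output)
        trace.append(out)
    return trace

def pdn_ab_plus_c(A, B, C):
    """Pull-down network of the example gate: conducts when A.B + C is true."""
    return (A and B) or C

clk = [0, 1, 1, 0, 1]
inputs = [
    {"A": 0, "B": 0, "C": 0},  # precharge
    {"A": 1, "B": 1, "C": 0},  # evaluate: output discharges
    {"A": 0, "B": 0, "C": 0},  # still evaluating: output cannot recover
    {"A": 0, "B": 0, "C": 0},  # precharge again
    {"A": 0, "B": 0, "C": 0},  # evaluate: PDN off, output stays high
]
print(dynamic_gate(clk, inputs, pdn_ab_plus_c))  # [1, 0, 0, 1, 1]
```

The third step shows why inputs may make at most one transition during evaluation: once discharged, the output stays low until the next precharge.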

A number of important properties can be derived for the dynamic logic gate:

• The logic function is implemented by the NMOS pull-down network. The construction of the PDN proceeds just as it does for static CMOS.

• The number of transistors (for complex gates) is substantially lower than in the static case: N + 2 versus 2N.

• It is non-ratioed. The sizing of the PMOS precharge device is not important for realizing proper functionality of the gate. The size of the precharge device can be made large to improve the low-to-high transition time (of course, at a cost to the high-to-low transition time). There is, however, a trade-off with power dissipation since a larger precharge device directly increases clock-power dissipation.

• It only consumes dynamic power. Ideally, no static current path ever exists between VDD and GND. The overall power dissipation, however, can be significantly higher compared to a static logic gate.

• The logic gates have faster switching speeds. There are two main reasons for this. The first (obvious) reason is the reduced load capacitance attributed to the lower number of transistors per gate and the single-transistor load per fan-in. Second, the dynamic gate does not have short circuit current, and all the current provided by the pull-down devices goes towards discharging the load capacitance.

The low and high output levels VOL and VOH are easily identified as GND and VDD and are not dependent upon the transistor sizes. The other VTC parameters are dramatically different from static gates. Noise margins and switching thresholds have been defined as static quantities that are not a function of time. To be functional, a dynamic gate requires a periodic sequence of precharges and evaluations. Pure static analysis, therefore, does not apply. During the evaluate period, the pull-down network of a dynamic inverter starts to conduct when the input signal exceeds the threshold voltage (VTn) of the NMOS pull-down transistor. Therefore, it is reasonable to set the switching threshold (VM) as well as VIH and VIL of the gate equal to VTn. This translates to a low value for the NML.

Speed and Power Dissipation of Dynamic Logic

The main advantages of dynamic logic are increased speed and reduced implementation area. Fewer devices to implement a given logic function implies that the overall load capacitance is much smaller. The analysis of the switching behavior of the gate has some interesting peculiarities to it. After the precharge phase, the output is high. For a low input signal, no additional switching occurs. As a result, tpLH = 0! The high-to-low transition, on the other hand, requires the discharging of the output capacitance through the pull-down network. Therefore tpHL is proportional to CL and depends on the current-sinking capability of the pull-down network. The presence of the evaluation transistor slows the gate somewhat, as it presents an extra series resistance. Omitting this transistor, while functionally not forbidden, may result in static power dissipation and potentially a performance loss.

The above analysis is somewhat unfair, because it ignores the influence of the precharge time on the switching speed of the gate. The precharge time is determined by the time it takes to charge CL through the PMOS precharge transistor. During this time, the logic in the gate cannot be utilized. However, very often, the overall digital system can be designed in such a way that the precharge time coincides with other system functions. For instance, the precharge of the arithmetic unit in a microprocessor can coincide with the instruction decode. The designer has to be aware of this “dead zone” in the use of dynamic logic, and should carefully consider the pros and cons of its usage, taking the overall system requirements into account.

When evaluating the power dissipation of a dynamic gate, it would appear that dynamic logic presents a significant advantage. There are three reasons for this. First, the physical capacitance is lower since dynamic logic uses fewer transistors to implement a given function. Also, the load seen for each fanout is one transistor instead of two. Second, dynamic logic gates by construction can at most have one transition per clock cycle. Glitching (or dynamic hazards) does not occur in dynamic logic. Finally, dynamic gates do not exhibit short circuit power since the pull-up path is not turned on when the gate is evaluating. While these arguments are generally true, they are offset by other considerations:

(i) the clock power of dynamic logic can be significant, particularly since the clock node has a guaranteed transition on every single clock cycle;

(ii) the number of transistors is higher than the minimal set required for implementing the logic;

(iii) short-circuit power may exist when leakage-combatting devices are added (as will be discussed further);

(iv) and, most importantly, dynamic logic generally displays a higher switching activity due to the periodic precharge and discharge operations. Earlier, the transition probability for a static gate was shown to be $p_0 p_1 = p_0 (1 - p_0)$. For dynamic logic, the output transition probability does not depend on the state (history) of the inputs, but rather on the signal probabilities only. For an n-tree dynamic gate, the output makes a 0→1 transition during the precharge phase only if the output was discharged during the preceding evaluate phase. The 0→1 transition probability for an n-type dynamic gate hence equals $\alpha_{0 \to 1} = p_0$, where $p_0$ is the probability that the output is zero.
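The difference in switching activity can be illustrated by enumerating a truth table. The sketch below assumes independent, uniformly distributed inputs (an assumption, not a claim about any real workload) and uses a 2-input NOR as the example function; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def output_zero_probability(func, n_inputs):
    """p0 for independent, equally likely inputs (assumed distribution)."""
    zeros = sum(1 for bits in product((0, 1), repeat=n_inputs) if func(*bits) == 0)
    return Fraction(zeros, 2 ** n_inputs)

def static_activity(p0):
    """0->1 transition probability of a static gate: p0 * p1."""
    return p0 * (1 - p0)

def dynamic_activity(p0):
    """0->1 transition probability of an n-type dynamic gate: p0."""
    return p0

def nor2(a, b):
    return int(not (a or b))

p0 = output_zero_probability(nor2, 2)
print("p0 =", p0)                                    # 3/4
print("static  activity =", static_activity(p0))     # 3/16
print("dynamic activity =", dynamic_activity(p0))    # 3/4
```

Since $p_0 \ge p_0 (1 - p_0)$ for any probability, the dynamic gate never switches less often than its static counterpart under this model.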

4a. Explain in detail about Pulsed Latches and Resettable Latches

A pulsed latch can be built from a conventional CMOS transparent latch driven by a brief clock pulse. Figure 10.22(a) shows a simple pulse generator, sometimes called a clock chopper or one-shot [Harris01a]. The pulsed latch is faster than a regular flip-flop because it involves a single latch rather than two and because it allows time borrowing. It can also consume less energy, although the pulse generator adds to the energy consumption (and is ideally shared across multiple pulsed latches for energy and area efficiency). The drawback is the increased hold time.

The Naffziger pulsed latch used on the Itanium 2 processor consists of the latch from Figure 10.17(k) driven by even shorter pulses produced by the generator of Figure 10.22(b) [Naffziger02]. This pulse generator uses a fairly slow (weak) inverter to produce a pulse with a nominal width of about one-sixth of the cycle (125 ps for 1.2 GHz operation). When disabled, the internal node of the pulse generator floats high momentarily, but no keeper is required because the duration is short. Of course, the enable signal has setup and hold requirements around the rising edge of the clock, as shown in Figure 10.22(c).

Figure 10.22(d) shows yet another pulse generator used on an NEC RISC processor [Kozu96] to produce substantially longer pulses. It includes a built-in dynamic transmission gate latch to prevent the enable from glitching during the pulse. Many designers consider short pulses risky. The pulse generator should be carefully simulated across process corners and possible RC loads to ensure the pulse is not degraded too badly by process variation or routing. However, the Itanium 2 team found that the pulses could be used just as regular clocks as long as the pulse generator had adequate drive. The quad-core Itanium pulse generator selects between 1- and 3-inverter delay chains using a transmission gate multiplexer [Stackhouse09]. The wider pulse offers more robust latch operation across process and environmental variability and permits more time borrowing, but increases the hold time. The multiplexer select is software-programmable to fix problems discovered after fabrication.

The Partovi pulsed latch in Figure 10.23 eliminates the need to distribute the pulse by building the pulse generator into the latch itself [Partovi96, Draper97]. The weak cross-coupled inverters in the dashed box staticize the circuit, although the latch is susceptible to back-driven output noise on Q or $\overline{Q}$ unless an extra inverter is used to buffer the output. The Partovi pulsed latch was used on the AMD K6 and Athlon [Golden99], but is slightly slower than a simple latch [Naffziger02]. It was originally called an Edge Triggered Latch (ETL), but strictly speaking is a pulsed latch because it has a brief window of transparency.

Resettable Latches and Flip-Flops

Most practical sequencing elements require a reset signal to enter a known initial state on startup and ensure deterministic behavior. Figure 10.24 shows latches and flip-flops with reset inputs. There are two types of reset: synchronous and asynchronous. Asynchronous reset forces Q low immediately, while synchronous reset waits for the clock. Synchronous reset signals must be stable for a setup and hold time around the clock edge, while asynchronous reset is characterized by a propagation delay from reset to output. Synchronous reset simply requires ANDing the input D with the complement of reset. Asynchronous reset requires gating both the data and the feedback to force the reset independent of the clock. The tristate NAND gate can be constructed from a NAND gate in series with a clocked transmission gate. Settable latches and flip-flops force the output high instead of low. They are similar to the resettable elements of Figure 10.24 but replace NAND with NOR and reset with set. A flip-flop can also combine both asynchronous set and reset.
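The two reset styles can be contrasted with a small behavioral sketch. The class names and methods below are hypothetical, an active-high reset is assumed, and setup/hold times and propagation delays are ignored.

```python
class SyncResetFlop:
    """Reset only takes effect on a rising clock edge (D gated by reset)."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, d, reset):
        self.q = 0 if reset else d
        return self.q


class AsyncResetFlop:
    """Reset forces Q low immediately, independent of the clock."""

    def __init__(self):
        self.q = 0
        self.reset = 0

    def set_reset(self, reset):
        self.reset = reset
        if reset:
            self.q = 0  # takes effect without waiting for a clock edge
        return self.q

    def rising_edge(self, d):
        if not self.reset:
            self.q = d
        return self.q


sync_ff, async_ff = SyncResetFlop(), AsyncResetFlop()
sync_ff.rising_edge(d=1, reset=0)   # Q becomes 1
async_ff.rising_edge(d=1)           # Q becomes 1

# Assert reset between clock edges: only the asynchronous flop reacts at once.
print("async Q right after reset:", async_ff.set_reset(1))
print("sync  Q before next edge :", sync_ff.q)
print("sync  Q after next edge  :", sync_ff.rising_edge(d=1, reset=1))
```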

Enabled Latches and Flip-Flops

Sequencing elements also often accept an enable input. When enable en is low, the element retains its state independently of the clock. The enable can be performed with an input multiplexer or clock gating, as shown in Figure 10.26. The input multiplexer feeds back the old state when the element is disabled. The multiplexer adds area and delay. Clock gating does not affect delay from the data input and the AND gate can be shared among multiple clocked elements. Moreover, it significantly reduces power consumption because the clock on the disabled element does not toggle. However, the AND gate delays the clock, potentially introducing clock skew. The skew can be minimized by building the AND gate into the final buffer of the clock distribution network. en must be stable while the clock is high to prevent glitches on the clock, as will be discussed further.

4b. Explain about Master-Slave Based Edge Triggered Register

Master-Slave Based Edge Triggered Register

The most common approach for constructing an edge-triggered register is to use a master-slave configuration as shown in Figure 7.14. The register consists of cascading a negative latch (master stage) with a positive latch (slave stage). A multiplexer-based latch is used in this particular implementation, though any latch can be used to realize the master and slave stages. On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM. During this period, the slave stage is in the hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode. Since QM is constant during the high phase of the clock, the output Q makes only one transition per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect. A negative edge-triggered register can be constructed using the same principle by simply switching the order of the positive and negative latch (i.e., placing the positive latch first).

A complete transistor-level implementation of the master-slave positive edge-triggered register is shown in Figure 7.15. The multiplexer is implemented using transmission gates as discussed in the previous section. When the clock is low ($\overline{CLK} = 1$), T1 is on and T2 is off, and the D input is sampled onto node QM. During this period, T3 is off and T4 is on and the cross-coupled inverters (I5, I6) hold the state of the slave latch. When the clock goes high, the master stage stops sampling the input and goes into a hold mode. T1 is off and T2 is on, and the cross-coupled inverters I3 and I4 hold the state of QM. Also, T3 is on and T4 is off, and QM is copied to the output Q.
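The master-slave behavior described above can be captured in a few lines. This is a purely behavioral sketch (no delays, no transmission-gate detail); the class name `MasterSlaveRegister` and the initial state of 0 are assumptions.

```python
class MasterSlaveRegister:
    """Positive edge-triggered register: negative latch followed by positive latch."""

    def __init__(self):
        self.qm = 0  # master output (assumed initial state)
        self.q = 0   # slave output (assumed initial state)

    def apply(self, clk, d):
        if clk == 0:
            self.qm = d        # master transparent, slave holds
        else:
            self.q = self.qm   # master holds, slave transparent
        return self.q


reg = MasterSlaveRegister()
waveform = [(0, 1), (1, 0), (0, 0), (1, 1)]  # (CLK, D) pairs
outputs = [reg.apply(clk, d) for clk, d in waveform]
print(outputs)  # [0, 1, 1, 0]
```

In the second and fourth steps D changes while the clock is high, yet Q reflects the value D had just before the rising edge.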

Timing Properties of the Multiplexer-Based Master-Slave Register. As discussed earlier, there are three important timing metrics in registers: the set-up time, the hold time and the propagation delay. It is important to understand the factors that affect these timing parameters and develop the intuition to manually estimate them. Assume that the propagation delay of each inverter is tpd_inv and the propagation delay of the transmission gate is tpd_tx. Also assume that the contamination delay is 0 and that the inverter used to derive $\overline{CLK}$ from CLK has a delay equal to 0.

The set-up time is the time before the rising edge of the clock that the input data D must become valid. Another way to ask the question is how long before the rising edge the D input has to be stable such that QM samples the value reliably. For the transmission gate multiplexer-based register, the input D has to propagate through I1, T1, I3 and I2 before the rising edge of the clock. This is to ensure that the node voltages on both terminals of the transmission gate T2 are at the same value. Otherwise, it is possible for the cross-coupled pair I2 and I3 to settle to an incorrect value. The set-up time is therefore equal to 3 * tpd_inv + tpd_tx.

The propagation delay is the time for the value of QM to propagate to the output Q. Note that since we included the delay of I2 in the set-up time, the output of I4 is valid before the rising edge of the clock. Therefore the delay tc-q is simply the delay through T3 and I6 (tc-q = tpd_tx + tpd_inv).

The hold time represents the time that the input must be held stable after the rising edge of the clock. In this case, the transmission gate T1 turns off when the clock goes high and therefore any changes in the D input after the clock goes high are not seen by the input. Therefore, the hold time is 0.
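The three estimates above reduce to simple arithmetic. The sketch below just evaluates them; the function name `register_timing` and the example delay values are hypothetical.

```python
def register_timing(tpd_inv, tpd_tx):
    """Manual timing estimates for the multiplexer-based master-slave register."""
    return {
        "t_su": 3 * tpd_inv + tpd_tx,  # D through I1, T1, I3 and I2
        "t_cq": tpd_tx + tpd_inv,      # QM through T3 and I6
        "t_hold": 0,                   # T1 turns off at the rising edge
    }

# Hypothetical delays in picoseconds, for illustration only.
timing = register_timing(tpd_inv=50, tpd_tx=30)
for name, value in timing.items():
    print(f"{name} = {value} ps")
```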

As mentioned earlier, the drawback of the transmission gate register is the high capacitive load presented to the clock signal. The clock load per register is important since it directly impacts the power dissipation of the clock network. Ignoring the overhead required to invert the clock signal (since the buffer inverter overhead can be amortized over multiple register bits), each register has a clock load of 8 transistors. One approach to reduce the clock load at the cost of robustness is to make the circuit ratioed. Figure 7.18 shows that the feedback transmission gate can be eliminated by directly cross coupling the inverters.

The penalty for the reduced clock load is increased design complexity. The transmission gate (T1) and its source driver must overpower the feedback inverter (I2) to switch the state of the cross-coupled inverter. The sizing requirements for the transmission gates can be derived using a similar analysis as performed for the SR flip-flop. The input to the inverter I1 must be brought below its switching threshold in order to make a transition. If minimum-sized devices are to be used in the transmission gates, it is essential that the transistors of inverter I2 be made even weaker. This can be accomplished by making their channel lengths larger than minimum. Using minimum or close-to-minimum size devices in the transmission gates is desirable to reduce the power dissipation in the latches and the clock distribution network.

Another problem with this scheme is reverse conduction; that is, the second stage can affect the state of the first latch. When the slave stage is on (Figure 7.19), it is possible for the combination of T2 and I4 to influence the data stored in the I1-I2 latch. As long as I4 is a weak device, this is fortunately not a major problem.

5a. Explain the concept of a 4-bit barrel Shifter (8)

Barrel Shifter

Any general purpose n-bit shifter should be able to shift incoming data by up to n - 1 places in a right-shift or left-shift direction. If we now further specify that all shifts should be on an 'end-around' basis, so that any bit shifted out at one end of a data word will be shifted in at the other end of the word, then the problem of right shift or left shift is greatly eased. In fact, a moment's consideration will reveal, for a 4-bit word, that a 1-bit shift right is equivalent to a 3-bit shift left and a 2-bit shift right is equivalent to a 2-bit shift left, etc. Thus we can achieve a capability to shift left or right by zero, one, two, or three places by designing a circuit which will shift right only (say) by zero, one, two, or three places.

The nature of the shifter having been decided on, its implementation must then be considered. Obviously, the first circuit which comes to mind is that of the shift register in Figures 6.38, 6.39 and 6.40. Data could be loaded from the output of the ALU and shifting effected; then the outputs of each stage of the shift register would provide the required parallel output to be returned to the register array (or elsewhere in the general case).

However, there is danger in accepting the obvious without question. Many designers, used to the constraints of TTL, MSI, and SSI logic, would be conditioned to think in terms of such standard arrangements. When designing VLSI systems, it pays to set out exactly what is required to assess the best approach. In this case, the shifter must have:

• input from a four-line parallel data bus;

• four output lines for the shifted data;

• means of transferring input data to output lines with any shift from zero to three bits inclusive.

In looking for a way of meeting these requirements, we should also attempt to take best advantage of the technology; for example, the availability of the switch-like MOS pass transistor and transmission gate.

We must also observe the strategy decided on earlier for the direction of data and control signal flow, and the approach adopted should make this feasible. Remember that the overall strategy in this case is for data to flow horizontally and control signals vertically. A solution which meets these requirements emerges from the days of switch and relay contact based switching networks: the crossbar switch. Consider a direct MOS switch implementation of a 4 x 4 crossbar switch, as in Figure 7.6. The arrangement is quite general and may be readily expanded to accommodate n-bit inputs/outputs. In fact, this arrangement is an overkill in that any input line can be connected to any or all output lines; if all switches are closed, then all inputs are connected to all outputs in one glorious short circuit. Furthermore, 16 control signals (sw00-sw15), one for each transistor switch, must be provided to drive the crossbar switch, and such complexity is highly undesirable.

An adaptation of this arrangement recognizes the fact that we can couple the switch gates together in groups of four (in this case) and also form four separate groups corresponding to shifts of zero, one, two and three bits. The arrangement is readily adapted so that the in-lines also run horizontally (to conform to the required strategy). The resulting arrangement is known as a barrel shifter and a 4 x 4-bit barrel shifter circuit diagram is given in Figure 7.7. The interbus switches have their gate inputs connected in a staircase fashion in groups of four and there are now four shift control inputs which must be mutually exclusive in the active state. CMOS transmission gates may be used in place of the simple pass transistor switches if appropriate.
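The grouped-switch arrangement can be modelled behaviorally as below. This is a sketch under stated assumptions: bits are listed most significant first, switches are ideal, and the function names (`barrel_shift_right`, `one_hot`, `rotate_left`) are hypothetical. It also checks the end-around equivalences mentioned earlier.

```python
def barrel_shift_right(data, shift_select):
    """End-around right shift using an n x n array of ideal switches.

    data: list of bits, most significant bit first (assumed convention).
    shift_select: one control line per shift amount; exactly one may be active.
    """
    n = len(data)
    if len(shift_select) != n or sum(shift_select) != 1:
        raise ValueError("shift controls must be mutually exclusive (one-hot)")
    out = []
    for j in range(n):
        bit = 0
        for s in range(n):
            # Switch (s, j) connects input (j - s) mod n to output j
            # whenever shift line s is active.
            bit |= shift_select[s] & data[(j - s) % n]
        out.append(bit)
    return out

def one_hot(s, n=4):
    """Control word that activates only shift line s."""
    return [1 if i == s else 0 for i in range(n)]

def rotate_left(data, s):
    """Reference end-around left shift, used only for checking."""
    return data[s:] + data[:s]

word = [1, 0, 1, 1]
for s in range(4):
    print(f"shift right by {s}:", barrel_shift_right(word, one_hot(s)))

# A 1-bit right shift equals a 3-bit left shift; 2 right equals 2 left.
assert barrel_shift_right(word, one_hot(1)) == rotate_left(word, 3)
assert barrel_shift_right(word, one_hot(2)) == rotate_left(word, 2)
```

The one-hot check mirrors the requirement that the four shift control inputs be mutually exclusive; activating more than one line would short inputs together in the real circuit.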

The structure of the barrel shifter is clearly one of high regularity and generality and it may be readily represented in stick diagram form. One possible implementation, using simple n-type switches, is given in Figure 7.8. The stick diagram clearly conveys regular topology and allows the choice of a standard cell from which complete barrel shifters of any size may be formed by replication of the standard cell. It should be noted that standard cell boundaries must be carefully chosen to allow for butting together side by side and top to bottom to retain the overall topology. The mask layout for standard cell number 2 (arbitrary choice) of Figure 7.8 may then be set out as in Figure 7.9. Once the standard cell dimensions have been determined, then any n x n barrel shifter may be configured and its outline, or bounding box, arrived at by summing up the dimensions of the replicated standard cell.

The use of simple n-type switches in a CMOS environment might be questioned. Although there will be a degrading of logic 1 levels through n-type switches, this generally does not matter if the shifter is followed by restoring circuitry such as inverters or gate logic. Furthermore, as there will only ever be one n-type switch in series between an input and the corresponding output line, the arrangement is fast.

The minimum size bounding box outline for the 4 x 4-way barrel shifter is given in Figure 7.10. The figure also indicates all inlet and outlet points around the periphery together with the layer on which each is located. This allows ready placing of the shifter within the floor plan (Figure 7.5) and its interconnection with the other subsystems forming the datapath. It also emphasizes the fact that, as in this case, many subsystems need external links to complete their architecture. In this case, the links shown on the right of the bounding box must be made and must be allowed for in interconnections and overall dimensions. This form of representation also allows the subsystem geometric characterization to be that of the bounding box alone for composing higher levels of the system hierarchy.

5b. Explain about Carry Look ahead adder in detail.

Carry-Propagate Addition

N-bit adders take inputs {AN, …, A1}, {BN, …, B1}, and carry-in Cin, and compute the sum {SN, …, S1} and the carry-out of the most significant bit Cout, as shown in Figure 11.9. Long adders use multiple levels of lookahead structures for even more speed.

Carry-Ripple Adder

An N-bit adder can be constructed by cascading N full adders, as shown in Figure 11.11(a); this is called a carry-ripple adder (or ripple-carry adder). The carry-out of bit i, Ci, is the carry-in to bit i + 1. This carry has twice the weight of the sum Si. The delay of the adder is set by the time for the carries to ripple through the N stages, so the $t_{C \to Cout}$ delay should be minimized. This delay can be reduced by omitting the inverters on the outputs, as was done in Figure 11.4(c). Because addition is a self-dual function (i.e., the function of complementary inputs is the complement of the function), an inverting full adder receiving complementary inputs produces true outputs. Figure 11.11(b) shows a carry-ripple adder built from inverting full adders. Every other stage operates on complementary data. The delay of inverting the adder inputs or sum outputs is off the critical ripple-carry path.
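Both the ripple structure and the self-dual property can be checked with a short behavioral sketch. The function names are hypothetical, bits are passed least significant first, and gate delays are not modelled.

```python
from itertools import product

def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)  # majority function
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade of full adders; bit lists are least significant bit first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry-out feeds the next stage
        sums.append(s)
    return sums, carry

# Self-duality: complementing all inputs complements both outputs.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    s_n, cout_n = full_adder(1 - a, 1 - b, 1 - c)
    assert (s_n, cout_n) == (1 - s, 1 - cout)

# 4-bit example: 11 + 6 = 17, i.e. sum bits 0001 with carry-out 1.
print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))  # ([1, 0, 0, 0], 1)
```

The self-duality check is what justifies alternating true and complementary stages when the output inverters are removed.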

Carry Generation and Propagation

This section introduces notation commonly used in describing faster adders. Recall that the P (propagate) and G (generate) signals were defined in Section 11.2.1. We can generalize these signals to describe whether a group spanning bits i…j, inclusive, generates a carry or propagates a carry. A group of bits generates a carry if its carry-out is true independent of the carry-in; it propagates a carry if its carry-out is true when there is a carry-in. These signals can be defined recursively for $i \ge k > j$ as

$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$ (11.4)

with the base case

$G_{i:i} \equiv G_i = A_i \cdot B_i$
$P_{i:i} \equiv P_i = A_i \oplus B_i$ (11.5)

In other words, a group generates a carry if the upper (more significant) portion generates, or if the lower portion generates and the upper portion propagates that carry. The group propagates a carry if both the upper and lower portions propagate the carry. The carry-in must be treated specially. Let us define $C_0 = C_{in}$ and $C_N = C_{out}$. Then we can define generate and propagate signals for bit 0 as

$G_{0:0} \equiv C_{in}$
$P_{0:0} \equiv 0$ (11.6)

Observe that the carry into bit i is the carry-out of bit i–1 and is $C_{i-1} = G_{i-1:0}$. This is an important relationship; group generate signals and carries will be used synonymously in the subsequent sections. We can thus compute the sum for bit i using EQ (11.2) as

$S_i = P_i \oplus G_{i-1:0}$ (11.7)

Hence, addition can be reduced to a three-step process:

1. Computing bitwise generate and propagate signals using EQs (11.5) and (11.6)

2. Combining PG signals to determine group generates $G_{i-1:0}$ for all $N \ge i \ge 1$ using EQ (11.4)

3. Calculating the sums using EQ (11.7)
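The three steps above can be sketched as follows. This is an illustrative model, not a description of any particular adder architecture: the group generates are combined in the simplest (ripple) order, the function name `pg_add` is hypothetical, and text bit i (numbered 1…N) corresponds to integer bit position i–1.

```python
def pg_add(a, b, cin, n):
    """Add two n-bit integers using bitwise PG, group generate, and sum steps."""
    # Step 1: bitwise generate/propagate; index 0 is reserved for the carry-in.
    g = [cin] + [((a >> (i - 1)) & (b >> (i - 1))) & 1 for i in range(1, n + 1)]
    p = [0] + [((a >> (i - 1)) ^ (b >> (i - 1))) & 1 for i in range(1, n + 1)]

    # Step 2: group generates G[i:0] = G[i] + P[i].G[i-1:0] (ripple ordering).
    group_g = [g[0]]
    for i in range(1, n + 1):
        group_g.append(g[i] | (p[i] & group_g[i - 1]))

    # Step 3: sums S[i] = P[i] xor G[i-1:0].
    total = 0
    for i in range(1, n + 1):
        total |= (p[i] ^ group_g[i - 1]) << (i - 1)

    return total, group_g[n]  # (sum bits, carry-out)

# Exhaustive check against integer addition for 4-bit operands.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            expected = a + b + cin
            assert pg_add(a, b, cin, 4) == (expected & 0xF, expected >> 4)

print(pg_add(0b1011, 0b0110, 0, 4))  # (1, 1): 11 + 6 = 17
```

Faster adder architectures differ only in how step 2 is organized; steps 1 and 3 stay the same.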

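These steps are illustrated in Figure 11.12. The first and third steps are routine, so most of the attention in the remainder of this section is devoted to alternatives for the group PG logic with different trade-offs between speed, area, and complexity. Some of the hardware can be shared in the bitwise PG logic, as shown in Figure 11.13.

The critical path of the carry-ripple adder passes from carry-in to carry-out along the carry chain majority gates. As the P and G signals will have already stabilized by the time the carry arrives, we can use them to simplify the majority function into an AND-OR gate:

$C_i = G_i + P_i \cdot C_{i-1}$

Because $C_i = G_{i:0}$, carry-ripple addition can be viewed as the extreme case of group PG logic in which a 1-bit group is combined with an i-bit group: $G_{i:0} = G_i + P_i \cdot G_{i-1:0}$.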

In this extreme, the group propagate signals are never used and need not be computed. Figure 11.14 shows a 4-bit carry-ripple adder. The critical carry path now proceeds through a chain of AND-OR gates rather than a chain of majority gates. Figure 11.15 illustrates the group PG logic for a 16-bit carry-ripple adder, where the AND-OR gates in the group PG network are represented with gray cells. Diagrams like these will be used to compare a variety of adder architectures in subsequent sections. The diagrams use black cells, gray cells, and white buffers defined in Figure 11.16(a) for valency-2 cells. Black cells contain the group generate and