Figure 3.2: Simulated unwanted glitches at different logic depths in a chain of inverters implemented in FTL.
between M1 and the nMOS PDN because all inputs to the nMOS transistors are reset to logic “0” during the reset period. The first generation of FTL exhibits many shortcomings, including excessive power dissipation and reduced noise margin. To mitigate these problems, we propose a new high-performance logic style that we call constant-delay (CD) logic. CD logic provides a local window technique and a self-reset circuit that enable robust logic operation with minimized power consumption while maintaining FTL’s speed advantage. The characteristic that most distinguishes CD logic from previously proposed logic styles is that its delay is, to a first-order approximation, not affected by the logic expression. Unlike SCL, CD logic does not require complementary signals and can be easily integrated with static and dynamic domino logic. CD logic also does not have the constant static power dissipation problem that pseudo-nMOS has. Furthermore, the clock timing requirement of CD logic is not as stringent as that of OPL: CD logic can achieve robust operation with optimal performance as long as the CLK signal arrives earlier than the input signals. This thesis will demonstrate that CD logic 1) has the potential to outperform other logic styles with better energy efficiency, and is particularly suitable for high-performance digital blocks; and 2) is robust under extreme process, voltage, and temperature (PVT) variations.
Domino logic is attractive for high-speed circuits: it is 1.5 to 2 times faster than static CMOS and is therefore widely used in high-performance microprocessors. However, it faces several challenges: monotonicity, leakage, charge sharing, and noise. In Fig. 2 above, X is a dynamic node that holds its value as charge on the node, and sub-threshold leakage may eventually disturb this charge. A keeper circuit is used to reduce the effect of charge leakage, and a foot transistor is used to increase the performance of the circuit.
of these test vectors on the chip for BIST is not possible. In such conditions, DFT modifications are required to reduce the size of the test vector set effectively. Such modifications degrade the performance of embedded array multipliers. Applying highly regular test patterns, which are generated efficiently on the chip without any DFT modifications, leads to an efficient BIST scheme for embedded array multipliers with very high fault coverage. BIST design methods for Iterative Logic Arrays (ILAs) have been proposed in [18, 19]. A customized Test Pattern Generator (TPG) needs to be designed for the method suggested in . Again, this is applicable only to one-dimensional ILAs and requires further modifications for two-dimensional ILAs, which increases hardware complexity. For carry-propagate array multipliers, a test methodology is studied and explained in . Test pattern generation and DFT for ILAs based on graph labeling techniques have also been proposed. Tree multipliers are a better choice for performance-oriented designs. The testability of tree multipliers is on par with that of array multipliers, as the same BIST methodology applies to both . Tree-like parallel multipliers are considered better than array multipliers, but they are generally avoided because of the lower regularity of their structure . A linear programming approach to test pattern generation is proposed in , by which the best possible test vectors are selected from the n-detection test set derived from the weighted defect part level estimation model. Output deviations are used to rank patterns in .
The Parallel Asynchronous Self-Timed Adder (PASTA) performs multi-bit binary addition based on a recursive formulation. PASTA does not need any carry chain propagation. The design includes a completion detection unit and attains logarithmic performance without any speedup circuitry or look-ahead schema. The implementation of PASTA has no limitations due to high fan-outs. As is usual for asynchronous logic, a high fan-in gate is required, which is managed by connecting the transistors in parallel. The simulations were performed in Cadence using a 180 nm technology.
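The recursive formulation can be illustrated in software. The sketch below is a behavioral model only, assuming the usual half-adder recursion (initial sum `a XOR b`, initial carries `a AND b`, then repeated sum/carry updates until no carry remains). The function name `recursive_add` and the `width` parameter are hypothetical, and the model says nothing about the transistor-level design or its timing.

```python
def recursive_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Add two unsigned integers with the half-adder recursion.

    Returns (sum modulo 2**width, number of recursion steps).
    Behavioral sketch only; not a model of the actual circuit.
    """
    mask = (1 << width) - 1
    s = (a ^ b) & mask            # initial sum bits: one half adder per bit
    c = ((a & b) << 1) & mask     # initial carries, moved to the next bit position
    steps = 0
    while c:                      # completion detection: stop when all carries are zero
        s, c = (s ^ c) & mask, ((s & c) << 1) & mask
        steps += 1
    return s, steps


if __name__ == "__main__":
    total, steps = recursive_add(5, 3)
    print(total, steps)
```

The loop condition plays the role of the completion detection unit: the addition is finished as soon as every carry bit is zero, so the number of steps depends on the operands rather than on the word length alone.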
Current sequential circuit designs do not focus on maintaining the state of the flip-flop when moving to sleep mode. However, it is important for flip-flops to maintain their state while they are in sleep mode and to retrieve it after leaving the sleep state. For this, special data retention circuitry is required. This circuitry should neither increase the leakage during sleep mode nor degrade the performance in active mode. It is also essential that it reuse the circuitry and the control signals of the existing design for storing the data. This keeps the circuit simple, as no additional circuitry or control signals are required, and it reduces the extra capacitive load on the critical path, thereby making the circuit faster.
Neural networks, also known as artificial neural networks (ANNs), are information processing systems whose design is inspired by studies of the ability of the human brain to learn from observations and to generalize by abstraction. The fact that neural networks can be trained to learn arbitrary nonlinear input/output relationships from corresponding data has resulted in their use in a number of areas such as pattern recognition, speech processing, control, biomedical engineering, and RF and microwave design. Recently, ANNs have been applied to CMOS analog circuit design and optimization problems as well. Neural networks are first trained to model the electrical behavior of passive and active components and circuits. These trained neural networks, often referred to as neural-network models, can then be used in high-level simulation and design, providing fast and accurate answers to the task they have learnt from their training. Neural networks are effective and efficient alternatives to conventional methods such as numerical modeling methods, which can be highly computationally expensive, analytical methods, which can be difficult to obtain for new devices, or empirical modeling solutions, whose range and accuracy are limited. Neural network
CMOS is the basic building block of many digital circuits. The basic CMOS circuit itself acts as an inverter. It is realized as a PMOS transistor in the pull-up section, whose source is connected to the power supply, and an NMOS transistor in the pull-down section, whose source is connected to ground; the output is taken from the common drain node of the two devices. A CMOS circuit dissipates less power than many earlier logic families such as RTL, TTL and ECL. The power consumption in CMOS is due to the switching activity of the transistors from one state to another, the charging and discharging of the load capacitance, and the frequency of operation.
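The factors listed above are commonly combined in the textbook estimate of dynamic power, P = α · C_L · V_DD² · f. The sketch below evaluates this expression; the supply voltage term is part of the standard formula rather than of the text above, and the function name and the example values are purely illustrative.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Textbook CMOS dynamic power estimate in watts.

    alpha  : switching activity factor (fraction of cycles with a transition)
    c_load : load capacitance in farads
    vdd    : supply voltage in volts
    freq   : clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * freq


if __name__ == "__main__":
    # Illustrative values only: 10 % activity, 1 pF load, 1.0 V supply, 1 GHz clock.
    print(dynamic_power(0.1, 1e-12, 1.0, 1e9))
```

Because the supply voltage enters quadratically, halving it cuts this estimate to one quarter, while activity, capacitance and frequency each scale it linearly.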
However, the chief disadvantage of the domino dynamic logic circuit is its excessive power dissipation, owing to the switching activity and the clock load. To offset the excessive power dissipation of dynamic logic, current design methodologies trade power for performance in the delay-critical sections of the circuit. This can be achieved through a combination of dynamic and static circuit designs and the use of dual supply voltages and dual-Vt transistors. Domino and dynamic logic mainly suffer from charge sharing and race problems. To resolve these drawbacks, we propose a new MT-CMOS dynamic logic circuit that uses multi-threshold MOS transistors to reduce the power dissipation and FTL logic to increase the operating speed of arithmetic circuits. The proposed circuit addresses these issues and minimizes wasted power. In this paper, we propose a low-power, high-performance Ripple Carry Adder (RCA) logic structure that employs new MT-CMOS domino logic blocks.
Abstract: IC development is nowadays a huge industry. There is an almost endless number of consumer products, such as mobile phones, processors, televisions, cameras, refrigerators, ovens and cars, that in one way or another use custom IC components. Integrated circuits can provide anything from analog-to-digital conversion to digital filtering and much more. A digital integrated circuit can be manufactured with a number of different approaches, but they all contain the same basic steps. It all starts with the transistors, wiring and everything else that makes up the circuit being placed in a layout, designed in a CAD (Computer Aided Design) tool, and ends with that layout being physically created on a chip. The way these layouts are created differs depending on the design requirements. A standard cell library contains a collection of components that are standardized at the logic or functional level, and consists of cells or macro-cells based on a unique layout. The economic and efficient accomplishment of an IC design depends heavily on the choice of the library. Therefore, it is important to build a library that fulfills the design requirements. A library of logic cells is the set of building blocks for the ASIC design flow. The library is typically called a standard cell library because of its common interface implementation and regular structure. The library provides the functional building blocks used for synthesis and a layout representation of the cells for place-and-route. It is very important to note that the process of HDL synthesis limits the choice of logic cells to those that are found in the library provided. This guarantees that a physical or layout representation of the cells exists when the design is implemented using place-and-route tools. One way to understand the required layout characteristics of standard cells is to understand their history and the reasons behind their development.
Nowadays, the demand for low-power VLSI with improved performance and efficiency calls for more effective design at the layout, logic, technology and architecture levels. In the logic design of combinational circuits, one has to aim for low power consumption, high performance, small area, high operating speed and high efficiency. The parameters that affect these goals include short-circuit and leakage currents, transition activity and the switched capacitance that determines power dissipation. It therefore becomes vital to choose a proper circuit design and an efficient implementation method, which restricts the formulation of universal rules for optimal logic styles. Investigations mainly consider the full-adder, full-subtractor and multiplier circuits used in
tightly related to the circuit working temperature. Hence, low power consumption is a zero-order constraint for most ICs manufactured today. In fact, higher performance-per-watt is the new mantra for microprocessor chip manufacturers today. In order to achieve high density and high performance, CMOS technology feature size and threshold voltage have been scaling down for decades. Because of this trend, transistor leakage power has increased exponentially. The reduction of the supply voltage is dictated by the need to maintain the electric field constant on the ever-shrinking gate oxide. Unfortunately, to keep transistor speed (proportional to the transistor “on” current) acceptable, the threshold voltage must be reduced too, which results in an exponential increase of the “off” transistor current, i.e. the current constantly flowing through the transistor even when it should be “non-conducting”.
The FFT processor consists of flip-flop, MUX, SRAM and radix-2 blocks. Here, the flip-flop block is replaced by the modified DOMS-FF, which makes the computation of the result much faster than in the existing system. As a result, the FFT processor can be used for high-speed and low-power applications. Twiddle factors are fetched and stored in SRAM memory, and the stored data is used for the next iterations.
The second adder uses complementary pass transistor logic (CPL) and requires 32 transistors with swing restoration. Most CPL gates are complex to interconnect at the layout level, which increases power and delay. For low-power applications, Pass Transistor Logic (PTL) is the most suitable technique, as explained in . The advantage of PTL is that either PMOS or NMOS transistors alone are enough to implement a complete design, so the number of transistors decreases and the input loads are smaller, especially for an NMOS network. PTL also eliminates short-circuit energy dissipation.
In addition to the area requirement, the error rate performance achieved is an important metric of PHY layer ASICs. Therefore, in this section we compare the error rate performance of the proposed PHY layer implementations. In particular, we compare the error rate performance of the soft-output MMSE detection, the soft-output STS SD with two and with five cores, and the LRALD based PHY layer ASICs. The error rate of the hard-output MMSE detection based PHY layer ASIC is of less interest, as the areas of the soft-output and the hard-output MMSE detector based PHY layer ASICs are similar and the error rate of the soft-output MMSE detector based implementation will always be superior to that of the hard-output MMSE detector based PHY layer. Furthermore, the error rate performance of the PHY layer ASIC with ten sphere cores is disregarded, as the area of this implementation is far larger than that of the other implementations, and the error rate performance improvement of five sphere cores over two sphere cores is only significant for a high modulation order and a large number of spatial streams. Hence, for transmissions with fewer than four streams, the PHY layer ASIC with ten sphere cores does not provide any error rate performance gain over the other STS SD based PHY layer implementations. However, the implementation with ten sphere cores may improve the error rate performance for transmissions with four streams, 64-QAM modulation and a code rate of 5/6, compared to implementations STS1 and STS2. The error rate performance was evaluated with Monte-Carlo simulations by transmitting frames with a 1000-byte payload using the HT-MF frame format in the 40 MHz (108 OFDM tones) transmission mode over a TGnD channel model. A regular GI of 0.8 µs is used. Independent of the number of streams for the transmission, the transmitter always uses four transmit antennas and spatial expansion to radiate the signal.
Similar to the transmitter, the receiver always uses four receive chains to receive the transmission. The modulation, the number of streams, and the code rate are altered according to the MCSs used for evaluation. The code itself is always a convolutional code with constraint length seven and the generator polynomials [133, 171] in octal.
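A rate-1/2 encoder with constraint length seven and these two generator polynomials can be sketched as follows. This is a minimal illustration, not the encoder of the evaluated ASICs: the tap ordering (current input bit at the most significant position of the 7-bit register) is an assumption, the name `conv_encode` is hypothetical, and the puncturing needed for higher code rates is not shown.

```python
K = 7                    # constraint length
G0, G1 = 0o133, 0o171    # generator polynomials in octal


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def conv_encode(bits: list[int]) -> list[int]:
    """Rate-1/2 convolutional encoding; emits two output bits per input bit.

    Assumption: the current input bit occupies the MSB of the 7-bit register,
    and the six previous bits follow in order of age.
    """
    state = 0            # six most recent past input bits, newest in bit 5
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.append(parity(reg & G0))
        out.append(parity(reg & G1))
        state = reg >> 1  # shift: the current bit becomes the newest past bit
    return out


if __name__ == "__main__":
    # Impulse response: a single 1 followed by six 0s.
    print(conv_encode([1, 0, 0, 0, 0, 0, 0]))
```

Feeding a single 1 followed by zeros reads out the two generator polynomials bit by bit, interleaved, which is a convenient sanity check for the tap ordering.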
The rapidly increasing complexity of VLSI systems has made it necessary to pay ever more attention to design issues that affect energy consumption. One of the original motivations for CMOS technology was its low energy consumption, and today, there are still no alternatives that approach it in energy efficiency. Nevertheless, energy consumption is more and more often the factor that limits the performance of contemporary CMOS systems.
In this paper, the author presents novel architectures and designs of high-speed, low-power 3-2, 4-2 and 5-2 compressors capable of operating at ultra-low voltages. The power consumption, delay and area of these new compressor architectures are compared with existing and recently proposed compressor architectures and are shown to be better. The proposed architecture emphasizes the use of multiplexers in arithmetic circuits, which results in high-speed and efficient designs. In the proposed architecture these outputs are utilized more efficiently than in existing designs to improve the performance of the compressors. In the proposed novel architectures of the 3-2, 4-2 and 5-2 compressors, the author replaces some XOR blocks with MUX blocks. Because the select bits are available before the input bits arrive, the switching of the transistors is already complete, so the overall delay in the critical path is reduced when MUX blocks are used. The paper concludes by comparing the proposed 3-2, 4-2 and 5-2 compressor architectures with the conventional architectures.
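To illustrate how a MUX can stand in for an XOR block, the following sketch models a 3-2 compressor at the logic level. It is an assumed arrangement for illustration and not necessarily the author's circuit: the sum is selected between the XOR and XNOR of two inputs using the third input as the select line, and the carry is selected between an input and the third input. The names `mux` and `compressor_3_2` are hypothetical.

```python
def mux(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0


def compressor_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """3-2 compressor (full adder) built from one XOR stage and two MUXes.

    Assumed structure for illustration only.
    """
    x = a ^ b                 # first XOR stage
    xn = x ^ 1                # complementary XNOR output
    s = mux(c, x, xn)         # MUX replaces the second XOR: equals a ^ b ^ c
    carry = mux(x, a, c)      # if a == b the carry is a, otherwise it is c
    return s, carry


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, compressor_3_2(a, b, c))
```

In this arrangement the select input of the sum MUX is a primary input, so it can settle while the XOR stage is still evaluating, which matches the timing argument made above.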
consequently time and cost of development. The automatic regeneration of logic levels provides high noise immunity and low sensitivity to variations in transistor characteristics. This provides accurate and reliable data and signal processing.
Digital fuzzy processors are generally designed for multipurpose applications in order to interest as many potential customers as possible. They should thus implement a large and varied number of fuzzy operators, membership functions and inference rules. This makes them rather efficient for a large range of applications, provided that appropriate programming is possible (which presupposes an appropriate internal or external memory). On the other hand, digital realization has some disadvantages. Complex representations of fuzzy vectors and parallel structures are required to obtain accurate and fast processing. Digital implementations of common fuzzy operations unfortunately lead rapidly to complicated, enormous VLSI circuits.
In order to reduce dynamic power, an alternative to the traditional power reduction techniques, named adiabatic switching, has been proposed in recent years. In this approach, the node capacitances are charged and discharged in such a way that only a small amount of energy is wasted and the energy stored on the capacitors is recovered. Various kinds of adiabatic circuits have been proposed in the literature; all of them can be grouped into two fundamental classes: fully adiabatic circuits and quasi-adiabatic or partial energy recovery circuits. Circuits of the first class can, under particular working conditions, consume asymptotically zero energy per operation, but their large area occupation and design complexity make them uncompetitive with traditional CMOS. Circuits of the second class are designed to recover a large portion of the energy stored in the circuit node capacitances. Although some energy is lost, this allows a good trade-off between circuit complexity and area occupation. In this paper, different circuits are presented, and conventional CMOS adder circuits, 2PASCAL, and other adiabatic families are compared. In this work, we analyze the performance of conventional and adiabatic adder circuits in terms of power consumption.
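The benefit of slow, ramped charging can be estimated with two well-known first-order expressions: conventional charging of a capacitance C to a voltage V from a constant supply dissipates ½·C·V², whereas charging through a resistance R with a supply ramp of duration T much longer than RC dissipates roughly (RC/T)·C·V². The sketch below compares the two; the resistance, the ramp time and all example values are assumptions for illustration and do not come from the circuits analyzed here.

```python
def conventional_loss(c: float, v: float) -> float:
    """Energy in joules dissipated when charging C to V from a constant supply."""
    return 0.5 * c * v * v


def adiabatic_loss(r: float, c: float, v: float, t_ramp: float) -> float:
    """First-order energy loss in joules for ramped (adiabatic) charging.

    Valid only when t_ramp is much longer than the time constant r * c.
    """
    return (r * c / t_ramp) * c * v * v


if __name__ == "__main__":
    # Illustrative values: 1 kOhm switch resistance, 10 fF node, 1 V swing, 1 ns ramp.
    r, c, v, t = 1e3, 10e-15, 1.0, 1e-9
    print(conventional_loss(c, v), adiabatic_loss(r, c, v, t))
```

Doubling the ramp time halves the adiabatic loss, which is why fully adiabatic operation approaches zero energy only asymptotically, at the cost of speed.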
The objective of this project is to design high-performance arithmetic circuits that are faster and consume less power, using a new dynamic CMOS logic family, and to analyze its performance for sequential circuits and the effects of cascading. This new dynamic logic family is known as feedthrough logic (FTL). It has two basic structures: high speed (HS0) and low power (LP0). It allows evaluation in a computational block to commence before its evaluation phase begins, and quickly performs a final evaluation as soon as the inputs are valid. This logic family is best suited to arithmetic circuits because their critical path consists of a long chain of cascaded inverting gates, and the major advantage of the logic, its higher speed, is observed upon cascading. We compare 4-bit and 16-bit ripple carry adders in domino logic with versions built from the two basic structures. Experimental results show that the low-power structure provides a smaller power-delay product than domino logic.
Abstract: Addition is the most fundamental arithmetic operation used in digital data path logic systems. Arithmetic and logic units and microprocessors are examples where arithmetic operations are needed for processing data and for calculating addresses, respectively. There are different architectures for building an adder circuit, for example: 1) the carry look-ahead adder (CLA), 2) the carry propagate adder (CPA), 3) the carry save adder (CSA), and 4) the carry select adder (CSLA). Among these architectures, the CSLA performs addition rapidly and is used for fast addition in many data processors. Observation of the carry select adder architecture shows that there is scope for modification in order to significantly reduce the area and power consumed by the circuit. In this work, we propose a simple and efficient modification of the gate-level structure of the CSLA. Based on this, 16-bit and 32-bit square-root CSLAs (SQRT CSLA) have been developed and compared with the regular structure. The proposed design has reduced area and power consumption compared to the regular structure, with a slight increase in delay. The proposed design is evaluated in terms of delay, area and power. The results show that the proposed CSLA design is better than the regular SQRT CSLA. Index Terms: Area and energy efficient, CSLA, Arithmetic operations, SQRT CSLA, Data path logic systems.
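For orientation, the behavior of a regular carry select adder can be sketched in software. The model below covers only the regular structure (each group computes its sum twice, for carry-in 0 and carry-in 1, and the incoming carry selects one result); it does not model the proposed gate-level modification. The group sizes 2, 2, 3, 4, 5 for 16 bits are an assumed square-root style partition, and all names are hypothetical.

```python
def ripple_add(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Ripple carry addition of two LSB-first bit lists; returns (sum bits, carry out)."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(a: int, b: int, width: int = 16,
                     groups: tuple[int, ...] = (2, 2, 3, 4, 5)) -> tuple[int, int]:
    """Behavioral model of a regular carry select adder; returns (sum, carry out).

    The default group sizes are an assumed partition for a 16-bit adder.
    """
    if sum(groups) != width:
        raise ValueError("group sizes must add up to the adder width")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry, pos = [], 0, 0
    for g in groups:
        seg_a, seg_b = a_bits[pos:pos + g], b_bits[pos:pos + g]
        sum0, c0 = ripple_add(seg_a, seg_b, 0)   # precomputed for carry-in 0
        sum1, c1 = ripple_add(seg_a, seg_b, 1)   # precomputed for carry-in 1
        chosen, carry = (sum1, c1) if carry else (sum0, c0)  # MUX on incoming carry
        result.extend(chosen)
        pos += g
    return sum(bit << i for i, bit in enumerate(result)), carry


if __name__ == "__main__":
    print(carry_select_add(12345, 54321))
```

The duplicated ripple adders in every group are the source of the area and power overhead that a modified structure would aim to reduce.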