12.1 Introduction ...12-1 Clock Distribution
12.2 Clocking Considerations in Sequential Systems ...12-2 Clocked Storage Elements • Time Borrowing and Absorption of Clock Uncertainties
12.3 Asynchronous Systems...12-8 12.4 Globally Asynchronous Locally Synchronous
Systems ...12-8 12.5 Conclusion ...12-10 To Probe Further...12-10 References ...12-10
12.1 Introduction
The clock speed has been rising rapidly, doubling every 3 years as plotted in Figure 12.1. Currently, the highest microprocessor clock frequency is slightly above 3 GHz, while that number is changing rapidly upward. It is expected that we will reach 10 GHz in the next 5 years, and by the year 2010, the processors will be running at frequencies beyond 10 GHz. At that clock rate, several challenges may force us to reexamine standard approaches to clocking.
As the clock speed increases, the number of logic levels in the critical pathdiminishes. In today’s high-speed processors, instructions are executed in one cycle, which is driven by a single-phase clock. In addition, the pipeline depth is increasing to 15 or 20 to accommodate the speed increase. Today, 10 levels of logic in the critical path are common; however, the amount of logic between the two stages is decreasing further. Thus, any overhead associated with the clock system and clocking mechanism that is directly and adversely affecting the machine performance is critically important.
At today’s frequencies, the ability to absorb clock skew and to use a faster clocked storage element (CSE) results in direct and significant performance improvements. These improvements are very difficult to obtain through architectural techniques or micro-architecture levels. As the clock frequency reaches 5 to 10 GHz, traditional clocking techniques will be stretched to their limits. New ideas and new ways of designing digital systems are required.
12.1.1 Clock Distribution
The two most important timing parameters affecting the clock signal are: clock skew and clock jitter.
Clock skew is a spatial variation of the clock signal as distributed through the system. It is caused by the various RC characteristics of the clock paths to the various points in the system, as well as different loading of the clock signal at different points on the chip. In addition, we can distinguish between global clock skew and local clock skew. These are both equally important in high-performance system design.
Vojin G. Oklobdzija
University of California—Davis
1941_C12.fm Page 1 Thursday, September 30, 2004 4:50 PM
Copyright 2005 by CRC Press
12-2 Low-Power Electronics Design
Clock jitter is a temporal variation of the clock signal with regard to the reference transition (i.e., reference edge) of the clock signal as illustrated in Figure 12.2.
Clock jitter represents edge-to-edge variation of the clock signal in time. As such, clock jitter can also be classified as long-term jitter and edge-to-edge clock jitter, which defines the clock signal variation between two consecutive clock edges. In the course of high-speed logic design, we are more concerned about edge-to-edge clock jitter because this phenomenon affects the time available for the logic operation.
Typically, the clock signal has to be distributed to several hundreds of thousands of the CSEs. Therefore, the clock signal has the largest fan-out of any node in the design, which requires several levels of amplification. Consequently, the clock system alone can use up to 40 to 50% of the power of the entire very large-scale integration (VLSI) chip [1,9]. We also must assure that every CSE receives the clock signal precisely at the same moment in time.
12.2 Clocking Considerations in Sequential Systems
A traditional view of the finite state machine (FSM) is represented by the Huffman model, which consists of combinational logic (CL) and CSEs. In this model, the next state, which is determined by the present FIGURE 12.1 Clock frequency over the years for various representative machines and processors.
FIGURE 12.2 Clock parameters: period, width, clock skew, and clock jitter.
PIII
1975 1980 1985 1990 1995 2000 Year2005
Nominal Clock Frequency (MHz) 1941_C12.fm Page 2 Thursday, September 30, 2004 4:50 PM
Copyright 2005 by CRC Press
Clocking Multi-GHz Systems 12-3
state and the input (as in the case of Mealy machine), is stored into the CSE by the triggering mechanism of the clock (i.e., edge or level). Following this model, we are used to thinking that the purpose of the CSE is to “hold” or “memorize” the state. This view is further supported by the level sensitive scan design (LSSD) methodology, which uses the storage elements to “scan-out” the state of the machine during the test and debug mode.
We want to present a slightly different view. The purpose of the CSE is to prevent the corruption of the next state as illustrated in Figure 12.3.
This model is broader and can represent wave pipelining [2], for example. In the case of wave pipelining, the signal is blocked from corrupting the present state Sn by a sheer delay of the wire. It simply cannot arrive in time, therefore, no blocking is necessary; however, this model also reveals problems of wave pipelining technique. Ideally, all the signals should arrive at the same point in time, which is not possible. Therefore, the fast-path problem becomes more difficult to control and stringent requirements are necessary. Thus, the system will run the risk of corrupting the state after several cycles.
The case of skew-tolerant domino logic [5,6], illustrated in Figure 12.4, conforms to the model presented in Figure 12.3.
Blocking of the signal is accomplished by the precharge phase of the clock. For example, while clock F2 is “low” (precharge), data from stage 1 cannot be passed onto the stage 2. Only after the precharge phase has elapsed and clock F2 has returned to “high” value can data from the stage 1 be passed to Stage 2. This transfer has to be completed while the clock F1 is “high.” Obviously, the speed of this logic is determined by precise matching of the clocks. This is accomplished by having the clock signal travel along the data-path, while delaying the clock for the amount of time needed in the logic stage generates the local clocks. In some way, this is similar to the clocking used in the early mainframe computers [3].
12.2.1 Clocked Storage Elements
The function of aCSE (flip-flop or latch)is to block the signal path, thus preventing it from corrupting the present state. In addition, it may be used to capture the state information and preserve it as long as it is needed by the digital system. It is not possible to define a storage element without defining its relationship to the clock.
FIGURE 12.3 Different view of an FSM (Huffman model).
Present State: Sn
Inputs (X) Combinational Outputs (Y) Logic
Next State Sn+1 Sn+1 = f (Sn, X ) Path Blocker
(CSE)
Clock
Blocker Transparent
Signal Blocked can not corrupt present state Sn
logic signals adjusted not to arrive
earlyer
Signals blocked
Orderly change of state from Sn to Sn+1
at this point Y = Y(X,Sn) 1941_C12.fm Page 3 Thursday, September 30, 2004 4:50 PM
Copyright 2005 by CRC Press
12-4 Low-Power Electronics Design
12.2.1.1 Timing Parameters
Data and clock inputs of a CSE must satisfy basic timing restrictions to ensure correct operation [4].
Fundamental timing constraints between data and clock inputs are quantified with setup and hold times, as illustrated in Figure 12.5 [8]. Setup and hold times define time intervals during which input has to be stable to ensure correct flip-flop operation. The sum of setup and hold times define the sampling window of the CSE. The sampling window is the period in which the CSE is sampling, and data is not allowed to change.
12.2.1.2 Setup and Hold Time Properties
Failure of the CSE due to the setup and hold time violations is not an abrupt process. This failing behavior is depicted in Figure 12.6.
Two opposing requirements exist with respect to the change of data signal as the locking event is approaching [8]:
FIGURE 12.4 Skew-tolerant domino logic: no explicit latches.
FIGURE 12.5 Setup and hold time behavior as a function of clock-to-output delay [8].
Clock 1 Clock 2 Clock 3
Clock 1
Slow path has to pass its value to the next
stage by now Signal Paths blocked
Transfer of Data from 1 to 2 Pipline stage 1 Pipline stage 2 Pipline stage 3
Clock 3 1941_C12.fm Page 4 Thursday, September 30, 2004 4:50 PM
Copyright 2005 by CRC Press