Statistical Spreadsheets - Design Methodologies

Setup Timings.

Chapter 9: Design Methodologies

9.1.2. Statistical Spreadsheets

The problem with using a spreadsheet based on the worst-case timings is that system speeds are getting so fast that it might be impossible to design a system that works, assuming that all components are worst case throughout all manufacturing processes. Statistically, the conditions of the worst-case timing spreadsheet may happen only once in a million or once in a billion. Subsequently, in extremely high-speed designs where the worst- case spreadsheet shows negative margin, it is beneficial to use a statistical spreadsheet in conjunction with (not in place of) the worst-case timing spreadsheet. The statistical

spreadsheet is used to assess the risk of a timing failure in the worst-case spreadsheet. For example, if the worst-case timing spreadsheet shows negative margin but the statistical spreadsheet shows that the probability of a failure occurring is only 0.00001%, the risk will probably be deemed acceptable. If it is shown that the probability of a failure is 2%, the risk will probably not be acceptable.

The statistical spreadsheet requires that a mean and a standard deviation be obtained for each timing component. The spreadsheets are then constructed with the following equations:

(9.1) (9.2) (9.3) (9.4)

where µ is the timing mean, σ the standard deviation, and k the number of standard

deviations (i.e., k = 3 refers to a 3σ spread). Equations (9.1) and (9.2) are for common-clock buses and (9.3) and (9.4) are for source synchronous buses.

It should be noted that these equations assume that the distributions are approximately normal (Gaussian) and that the timing components are completely independent of each other. Technically, neither of these assumptions is correct in a digital design; however, the approximations are close enough to assess the risk of a failure. If the engineer understands these limitations and uses the worst-case spreadsheet as the primary tool, the statistical equations can be a powerful asset. Furthermore, the equations below represent only one way to implement a statistical spreadsheet, a resourceful engineer may come up with a better method. Equations (9.1) through (9.4) can be compared directly to the equations derived in Chapter 8. However, note that there is no jitter term because the jitter is included in the statistical variation of the cycle time. Notice the tester guard band (Tguard) term is not

handled statistically. This is because the statistical accuracy of the testers is usually not known.

The most significant obstacle in using a statistical timing spreadsheet is determining the correct statistical means and distributions of the individual components. This becomes especially problematic if portions of the system will be obtained from numerous sources. For example, PCB vendors each have a different process, which will provide different mean velocities and impedance values with different distribution shapes. Furthermore, the

statistical data for a given vendor may change. Sometimes, however, the statistical behavior can be estimated. For example, consider a PCB vendor which claims that the impedance of the PCB board will have a nominal value of 50 Ω and will not vary more than ±15%. A reasonable assumption would be to assume a normal distribution and that the ±15% points are at three standard deviations (3σ). Approximations such as this can be used in

conjunction with Monte Carlo simulations to produce statistical approximations of flight time and skew. Alternatively, when uniform distributions are combined in an interconnect simulation using large Monte Carlo analysis, the result usually approximately resembles a normal distribution. Subsequently, the interconnect designer could approximate the statistical parameters of flight time or flight-time skew by running a large Monte Carlo

analysis where all the input variables are varied randomly over uniform distributions bounded by the absolute variations. Since the resultant of the analysis will resemble a normal

distribution, the mean and standard deviation can be approximated by examining the resultant data. Numerous tools, such as Microsoft Excel, Mathematica, and Mathcad, are well suited to the statistical evaluation of large data sets. It is possible that the statistical technique described here may have some flaws; however, it is adequate to approximate the risk of a failure if the worst-case spreadsheet shows negative margin.

Ideally, the designer would be able to obtain accurate numbers for the mean and standard deviation for all or most of the variables in the simulation (i.e., PCB Zo, buffer impedance,

input receiver capacitance, etc.). Monte Carlo analysis (an analysis method available in most simulators in which many variables are varied fairly randomly and the simulated results are observed) is then performed with a large number of iterations. The resultant timing numbers are then inserted into the statistical spreadsheet. Note that similar analysis is required from the silicon design team to determine the mean and standard deviation of the silicon timings (i.e., Tvb, Tva, Tsetup, Thold, and Tjitter).

To determine the risk of failure for a design that produces negative margin in a worst-case timing spreadsheet, the value of k is increased from 1 incrementally until the timing margin is zero. This will determine how many standard deviations from the mean timings are

necessary to break the design and will provide insight into the probability of a failure. For example, if the value of k that produces zero margin is 1, the probability that the design will exhibit positive margin is only 68.26%. In other words, there is a 31.74% chance of a failure (i.e., a design that produces negative timing margin). However, if the value of k that

produces zero margin is 3.09, the probability that the timing margin will be greater or equal to zero is 99.8%, which will produce 0.2% failures. In reality, however, a 3.09σ design will probably produce a 99.9% yield instead of 99.8% simply because the instances that will cause a failure will usually be skewed to one side of the bell curve. When considering the entire distribution, this analysis technique assumes that the probability of a result lying outside the area defined by sigma is equivalent to the percentage of failures. Obviously, this is not necessarily true; however, it is adequate for risk analysis, especially since it leans toward conservatism. Typically, in high-volume manufacturing, a 3.0σ design (actually, 3.09), is considered acceptable, which will usually produce a yield of 99.9%, assuming one-half of the curve.

Figure 9.3 demonstrates how the value of k relates to the probability of an event occurring for a normal distribution. Table 9.1 relates the value of k to the percentage of area not covered, which is assumed to be the percentage of failures.

Table 9.1: Relationship Between k and the Probability That a Result Will Lie Outside the Solution

k 1 2 3 3.09 3.72 4.26

Percent failure 31.74 4.56 0.27 0.2 0.02 0.002

Source: Data from Zwillinger [1996].

Figure 9.3: Relationship between sigma and area under a normal distribution: (a) k = 1

9.2. TIMING METRICS, SIGNAL QUALITY METRICS, AND TEST

In document High-Speed Digital System Design A Handbook of Interconnect Theory and Design Practices (Page 178-181)