
(1)

APPLIED RELIABILITY

Techniques for Reliability Analysis

with Applied Reliability Tools (ART) (an EXCEL Add-In) and JMP® Software

AM216 Class 5 Notes

Santa Clara University

Copyright



David C. Trindade, Ph.D.

STAT-TECH ®

(2)

AM216 Class 5 Notes

• Accelerated Testing (continued from Class 4 Notes)
– Accelerated Test Example (Analysis in JMP)
– Degradation Modeling
– Sample Sizes for Accelerated Testing

• System Models
– Series System
– Parallel System
– Analysis of Complex Systems
– Standby Redundancy

• Defective Subpopulations
– Graphical Analysis
– Mortals and Immortals
– Models
– Case Study
– Class Project Example

• Modeling the Field Reliability

(3)

System Models

Series System

Consider a system made up of n components in series. If the ith component has reliability R_i(t), the system reliability is the product of the individual reliabilities, that is,

$$R_s(t) = R_1(t) \times R_2(t) \times \cdots \times R_n(t)$$

which we denote with the capital “pi” symbol for multiplication

$$R_s(t) = \prod_{i=1}^{n} R_i(t)$$

The system CDF, in terms of the individual CDF’s, is

$$F_s(t) = 1 - \prod_{i=1}^{n} \left[ 1 - F_i(t) \right]$$

The system failure rate is the sum of the individual component failure rates. The system failure rate is higher than the highest individual failure rate.
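As a quick numerical illustration of the series rules, the following minimal Python sketch multiplies component reliabilities and sums component failure rates. The function names and the three failure rates are hypothetical; the 25-component case with R = 0.99 is the one posed in the class project.

```python
import math

def series_reliability(reliabilities):
    """Series system: product of the component reliabilities."""
    return math.prod(reliabilities)

def series_failure_rate(failure_rates):
    """Series system: sum of the component failure rates."""
    return sum(failure_rates)

# 25 components, each with R(t) = 0.99 (class project values).
r_series = series_reliability([0.99] * 25)

# Hypothetical component failure rates, per hour.
component_rates = [1e-6, 5e-6, 2e-6]
h_series = series_failure_rate(component_rates)

print(f"R_s = {r_series:.3f}")
print(f"h_s = {h_series:.1e} per hour (largest component: {max(component_rates):.1e})")
```

The sum of positive rates always exceeds the largest single rate, which is the point made in the last sentence of the slide.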

(4)

System Models

Parallel System

Consider a system made up of n components in parallel. The system CDF is the product of the individual CDF’s, that is,

$$F_s(t) = \prod_{i=1}^{n} F_i(t)$$

The system reliability is

$$R_s(t) = 1 - \prod_{i=1}^{n} \left[ 1 - R_i(t) \right]$$

System failure rates are no longer additive (in fact, the system failure rate is smaller than the smallest individual failure rate), but must be calculated using basic definitions.
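The parallel rule can be sketched the same way. This is a minimal example assuming independent components in active parallel; the function name is hypothetical, and the three components with R = 0.95 are the values posed in the class project.

```python
import math

def parallel_reliability(reliabilities):
    """Active parallel system: 1 minus the product of the component CDFs."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

# Three components, each with R(t) = 0.95 (class project values).
r_parallel = parallel_reliability([0.95, 0.95, 0.95])
print(f"R_s = {r_parallel:.6f}")
```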

(5)

System Failure Rate

Two Parallel Components

A component has CDF F(t) and a failure rate h(t). Two components are used in parallel in a system. Determine the failure rate of the system.

SOLUTION

The CDF for the two components in parallel is F²(t) and the PDF, by differentiation, is 2F(t)f(t). The failure rate of the system is

$$h_s(t) = \frac{f_s(t)}{1 - F_s(t)} = \frac{2F(t)f(t)}{1 - F^2(t)} = \frac{2F(t)f(t)}{[1 - F(t)][1 + F(t)]} = \frac{2F(t)}{1 + F(t)}\, h(t)$$

The result shows that the system failure rate is a

factor 2F/(1+F) times the component failure rate. The smaller the component CDF, the bigger the
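The factor 2F/(1+F) can be checked numerically. The sketch below assumes an exponential component with a hypothetical rate and time, computes the system failure rate from the basic definition f_s/(1 - F_s), and compares it with the factor times the component failure rate.

```python
import math

def parallel_pair_hazard_factor(cdf):
    """Ratio h_s(t) / h(t) for two identical components in active parallel."""
    return 2.0 * cdf / (1.0 + cdf)

# Hypothetical exponential component: rate per hour and evaluation time.
lam, t = 1e-3, 200.0
cdf = 1.0 - math.exp(-lam * t)                 # component CDF F(t)
pdf = lam * math.exp(-lam * t)                 # component PDF f(t)
h_system = 2.0 * cdf * pdf / (1.0 - cdf**2)    # f_s(t) / [1 - F_s(t)]

# For an exponential component, h(t) = lam.
assert math.isclose(h_system, parallel_pair_hazard_factor(cdf) * lam)

for value in (0.01, 0.10, 0.50, 0.90):
    print(f"F = {value:.2f}: h_s/h = {parallel_pair_hazard_factor(value):.3f}")
```

The factor grows from near 0 toward 1 as F increases, so the reduction in failure rate is largest while the component CDF is still small.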

(6)

Class Project

System Models

A) A component has reliability R(t) = 0.99. Twenty-five components in series form a system. Calculate the system reliability.

B) A component has reliability R(t) = 0.95. Three components in parallel form a system. Calculate the system reliability.

(7)

Reliability Block Diagrams

For components in series:

For components in parallel:

A B

(8)

Example of Series-Parallel

System: Big Rig

[Figure: series-parallel reliability block diagram of a big rig (Cab and Trailer), with components labeled A through J]

(9)

Class Project

Complex Systems

A system consists of seven units: A, B, C, D, E, G, H. For the system to function unit A and either unit B or C

and either D and E together or G and H together must

be working. Draw the reliability block diagram for this setup.
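One way to read the stated logic is as three blocks in series: unit A, then B and C in parallel, then two parallel branches (D and E in series, G and H in series). The sketch below assumes independent units and a hypothetical reliability of 0.9 for every unit; it evaluates the closed form from the series and parallel rules and cross-checks it by enumerating all up/down states. All names are illustrative.

```python
from itertools import product

UNITS = "ABCDEGH"

def system_works(up):
    """Structure function: A and (B or C) and ((D and E) or (G and H))."""
    return up["A"] and (up["B"] or up["C"]) and (
        (up["D"] and up["E"]) or (up["G"] and up["H"]))

def system_reliability(r):
    """Closed form from the series and parallel rules (independent units)."""
    b_or_c = 1.0 - (1.0 - r["B"]) * (1.0 - r["C"])
    de_or_gh = 1.0 - (1.0 - r["D"] * r["E"]) * (1.0 - r["G"] * r["H"])
    return r["A"] * b_or_c * de_or_gh

def system_reliability_by_enumeration(r):
    """Add up the probability of every unit state in which the system works."""
    total = 0.0
    for states in product([True, False], repeat=len(UNITS)):
        up = dict(zip(UNITS, states))
        if system_works(up):
            prob = 1.0
            for unit in UNITS:
                prob *= r[unit] if up[unit] else 1.0 - r[unit]
            total += prob
    return total

unit_rel = {unit: 0.9 for unit in UNITS}   # hypothetical unit reliabilities
print(f"closed form: {system_reliability(unit_rel):.5f}")
print(f"enumeration: {system_reliability_by_enumeration(unit_rel):.5f}")
```

The enumeration is only practical for small systems, but it gives an independent check on the block diagram reduction.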

(10)

Standby Versus Active

Redundancy

In contrast to active parallel redundancy, there is standby redundancy, in which the second component is idle until needed. Assuming perfect switching and no degradation of the idle component, standby redundancy results in higher reliability and lower maintenance costs than active parallel redundancy. An illustration, assuming exponentially distributed failure times, is shown below.

System Failure Rates (2 Components)
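The comparison can be sketched numerically for two identical exponential components. The formulas used here are the standard results under the stated assumptions (perfect switching, no degradation while idle): active parallel reliability is 1 - (1 - e^{-λt})², and standby reliability is e^{-λt}(1 + λt). The failure rate value and the evaluation times are hypothetical.

```python
import math

def active_parallel(lam, t):
    """Reliability and failure rate: two exponential units, both running."""
    cdf = 1.0 - math.exp(-lam * t)             # single-unit CDF
    reliability = 1.0 - cdf**2                 # system fails when both fail
    pdf = 2.0 * cdf * lam * math.exp(-lam * t)
    return reliability, pdf / reliability

def standby(lam, t):
    """Reliability and failure rate: second unit idle until the first fails."""
    reliability = math.exp(-lam * t) * (1.0 + lam * t)
    hazard = lam**2 * t / (1.0 + lam * t)
    return reliability, hazard

LAM = 1e-3  # hypothetical component failure rate, per hour
for hours in (100.0, 1000.0, 3000.0):
    r_act, h_act = active_parallel(LAM, hours)
    r_sby, h_sby = standby(LAM, hours)
    print(f"t = {hours:6.0f} h  active: R = {r_act:.4f}, h = {h_act:.2e}  "
          f"standby: R = {r_sby:.4f}, h = {h_sby:.2e}")
```

Both system failure rates start at zero and approach the single-component rate over time, with the standby curve staying below the active one.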

(11)

Series, Parallel Reliability in

ART

(12)

Reliability Experiment

Consider . . .

We test 100 units for 1,000 hours. There are 30 failures by 500 hours, but no more by the end of test.

Question: Are we dealing with two populations or just censored data?

Question: If we continue the test, will we see

(13)

Defect Models

Mortals versus Immortals

The usual assumption in reliability analysis is that all units can fail for a specific mechanism. If a defective subpopulation exists, only the fraction of units containing the defect is susceptible to failure. These are called mortals.

Units without the fatal flaw do not fail. These are called immortals.

The model for the total population of mortals and immortals becomes:

CDF = (fraction mortals) × CDF(mortals)

(14)

Example of a Defective

Subpopulation

A Processing Problem

Suppose we have 25 wafers in a lot, but only two wafers are contaminated with mobile ions due to a processing error.

If components are assembled from the 25 wafers, assuming equal yield per wafer, only 2/25 = 8% of the components can have the fatal “defect” that makes failure possible.

The components from the non-contaminated wafers will not fail for this mechanism since they are defect free; that is, we have a defective

(15)

Spotting a Defective

Subpopulation

Graphical Analysis

Assume that a specified failure mode follows a lognormal distribution.

Plot the data on lognormal graph paper. If instead of following a straight line, the points seem to curve

away from the cumulative percent axis, it’s a signal

that a defective subpopulation may be present. If test is run long enough, expect plot to bend over

(16)

Defective Subpopulations

Graphical Analysis

Plot based on total sample (mortals and immortals).

(17)

Defect Model

Mortals and Immortals

The observed CDF F_obs(t) is

$$F_{obs}(t) = p \, F_m(t)$$

where F_m(t) is the CDF of the mortals and p is the fraction of mortals (units with the fatal defect) in the total sample.

For example, if there are 25% mortals in the population, and the mortal CDF at time t is 40%, then we would expect to observe about

0.25 × 0.40 = 0.10

(18)

Major Computer

Manufacturer Reliability Data

Gate Oxide Fails

Time (hours)      24      48     168    500   1000
Rejects          201      23       1      1      1
Sample Size   58,000  57,392  10,000  2,000  1,999
Censored         407  47,369   7,999      0  1,998

(19)

What Do These

Numbers Mean?

The plus and minus 3 sigma range of the time to failure distribution extends from 33 seconds to 1.66E62 years!

It takes seconds to get to 0.1% cumulative failures, but over 412,000 hours (that is, 47 years) to get to 1.00%!

Assuming everything can fail is misleading and

unnecessary.

(20)

Modeling with

Defective Subpopulations

The same data, assuming 99% of the failures have occurred by 48 hours, can be modeled by a defective subpopulation fraction of 227/58,000 = 0.39% and a lognormal distribution of failure times for the mortals with T₅₀ = 10.6 hours and sigma = 0.68.
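To see what these parameters imply, the following minimal sketch evaluates F_obs(t) = p F_m(t) with a lognormal mortal CDF. The parameter values are the ones quoted above; the function names and the chosen evaluation times are illustrative assumptions.

```python
import math

def lognormal_cdf(t, t50, sigma):
    """Lognormal CDF with median t50 and shape parameter sigma."""
    z = math.log(t / t50) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def observed_cdf(t, p_mortal, t50, sigma):
    """Population CDF when only a fraction p_mortal of the units can fail."""
    return p_mortal * lognormal_cdf(t, t50, sigma)

P_MORTAL = 227 / 58_000      # fraction defective quoted in the notes
T50, SIGMA = 10.6, 0.68      # mortal lognormal parameters quoted in the notes

for hours in (24, 48, 168, 1000):
    f_mortal = lognormal_cdf(hours, T50, SIGMA)
    f_obs = observed_cdf(hours, P_MORTAL, T50, SIGMA)
    print(f"{hours:>5} h: mortal CDF = {f_mortal:.4f}, "
          f"observed CDF = {100 * f_obs:.4f}%")
```

Under this model the observed CDF can never exceed the mortal fraction, which is why the cumulative failure curve flattens instead of continuing to climb.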

Practically 100% of failures occur by 168 hours. Any failures thereafter are probably not related to the

defective subpopulation. For example, handling

(21)

Defective Subpopulation

Models

If we don’t consider mortals vs. immortals, we will incorrectly assume that all units can fail.

Projections of field reliability will be biased

(22)

Statistical Reliability

Analysis and Modeling:

A Case Study

Analysis of Reliability Data

with Failures from a

(23)

Reliability Study

Background

One lot of a device type with initial burn-in results at 168 hours, 125 °C:

Over 50% fallout due to bake recoverable failures

Since other lots, with similar manufacturing, might have escaped to a few customers, we needed to

assess the field impact.

(24)

Reliability Study

Design

Two static stresses:

179 Units: 125 °C ambient

90 Units: 150 °C ambient

30 Units: Control

Frequent readouts at 2, 4, 8, 16, 32, 48, 68, 92,

(25)

Purpose of Study

Reliability Modeling

• Determine if fraction defective (mortals) model applies

• Determine failure distribution (lognormal, parameters)

• Determine if true acceleration is present

• Determine activation energy for acceleration factors

• Determine recovery kinetics with and without bake

- Is 24 hours at 150 °C necessary?

(26)

Modeling Procedure

Statistical Analysis Plan

• Analyze cumulative percent failures plot versus time, both linear and probability plots.

• Estimate fraction mortals for stress cells. Test for significant difference.

• Plot fallout of mortals (reduced sample size) on lognormal probability graph. Check for linearity and equality of slopes.

• Run maximum likelihood analysis. Test for equality of shape factors (sigmas). Estimate

single sigma. Estimate median life T₅₀ for both cells.

(27)

Reliability Study

Bake Recoverable Failures

(28)

Reliability Study

Bake Recoverable Failures

(29)

Reliability Study

Bake Recoverable Failures

[Figure: Probability Plot (Adjusted for Mortals). Horizontal axis: Ln (Time to Failure); vertical axis: Standard Normal Variate Z; series: 150 °C and 125 °C]

(30)

CHOOSE CONF. LIMIT FOR BOUND IN PERCENT: 90

ENTER ANY EXACT TIMES OF FAILURE FOR CELL 1

ENTER START AND ENDPOINT OF ALL READOUT INTERVALS (INCLUDE ZERO’S) SPREAD 2 4 8 16 32 48 68 92 116

ENTER CORRESPONDING NUMBERS OF FAILS PER INTERVAL (INCLUDE ZERO’S) 34 6 21 2 0 0 0 1 0

ENTER TIMES ALL FAILED UNITS WERE REMOVED FROM TEST (INCLUDING END OF TEST) 116

ENTER CORRESPONDING NUMBERS REMOVED 0

ENTER ANY EXACT TIMES OF FAILURE FOR CELL 2

ENTER START AND ENDPOINT OF ALL READOUT INTERVALS (INCLUDE ZERO’S) SPREAD 2 4 8 16 32 48 68 92 116

ENTER CORRESPONDING NUMBERS OF FAILS PER INTERVAL (INCLUDE ZERO’S) 5 0 36 8 42 7 3 4 3

ENTER TIMES ALL FAILED UNITS WERE REMOVED FROM TEST (INCLUDING END OF TEST) 16 116

ENTER CORRESPONDING NUMBERS REMOVED 2 3

MAXIMUM LIKELIHOOD ESTIMATES

                                VARIANCE   VARIANCE    COVARIANCE
CELL   T50     SIGMA   MU       SIGMA      MU          MU SIGMA
1      1.90    1.208   .444     .0322      .0 373e-1   .643e-2
2      15.08   1.060   2.714    .0059      .0 104e-3   .266e-5

ESTIMATE BOUNDS (90 PERCENT CONFIDENCE)

       NUM.      NUM.
CELL   ON TEST   FAIL   T50 LOW   T50 UP   SIGMA LOW   SIGMA UP
1      64        64     1.38      2.63     .909        1.508
2      113       108    12.74     17.86    .933        1.187

WANT EQUAL T50’S OR SIGMAS OR BOTH IN SOME CELLS (Y/N)? Y

CELLS: 1 2

TYPE 1 FOR EQUAL SIGMA’S, 2 FOR EQUAL MU’S, 3 FOR BOTH THE SAME: 1

THE ASSUMPTION OF EQUAL SIGMA’S CAN NOT BE REJECTED AT THE 95 PERCENT LEVEL. UNDER THIS ASSUMPTION, RESULTS LIKE OBSERVED OCCUR ABOUT 41.9 PERCENT OF THE TIME. (THE SMALLER THIS PERCENT, THE LESS LIKELY THE ASSUMPTION.)

MAXIMUM LIKELIHOOD ESTIMATES

                                VARIANCE   VARIANCE    COVARIANCE
CELL   T50     SIGMA   MU       SIGMA      MU          MU SIGMA
1      2.02    1.090   .704     .0051      .0 247e-2   .538e-3
2      15.08   1.090   1.713    .0051      .0 110e-2   .250e-5

ESTIMATE BOUNDS (90 PERCENT CONFIDENCE)

       NUM.      NUM.
CELL   ON TEST   FAIL   T50 LOW   T50 UP   SIGMA LOW   SIGMA UP
1      64        64     1.56      2.63     .972        1.207
2      113       108    12.68     17.54    .972        1.207

(31)

Reliability Study

Bake Recoverable Failures

(32)

Projection to Field Conditions

Acceleration Statistics

• Estimate acceleration factor between two stress cells: AF = 15.08 / 2.02 = 7.465

• Estimate activation energy, based on Tj’s, 35 °C above ambient: E_A = 1.375 eV

• Estimate field T₅₀ based on Tj at 55 °C ambient: field T₅₀ = 18,288 hours

• Using field T₅₀, sigma = 1.090, lognormal distribution:

- project fallout and failure rates for various mortal fractions
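The acceleration statistics above can be reproduced approximately with the Arrhenius relation, AF = exp[(E_A/k)(1/T_low - 1/T_high)], with junction temperatures in kelvin. The sketch below is a minimal illustration, not the original analysis: the Arrhenius form, the value of the Boltzmann constant, and the function names are assumptions, while the T₅₀ values, the 35 °C junction rise, and the ambient temperatures come from the notes.

```python
import math

BOLTZMANN_EV = 8.617e-5   # Boltzmann constant in eV/K (assumed value)
JUNCTION_RISE_C = 35.0    # junction temperature rise above ambient (from notes)

def junction_kelvin(ambient_c):
    """Junction temperature in kelvin for a given ambient in deg C."""
    return ambient_c + JUNCTION_RISE_C + 273.15

def activation_energy(af, t_low_c, t_high_c):
    """Solve AF = exp[(Ea/k)(1/T_low - 1/T_high)] for Ea in eV."""
    inv_diff = 1.0 / junction_kelvin(t_low_c) - 1.0 / junction_kelvin(t_high_c)
    return BOLTZMANN_EV * math.log(af) / inv_diff

def acceleration_factor(ea, t_use_c, t_stress_c):
    """Arrhenius acceleration factor from stress ambient to use ambient."""
    inv_diff = 1.0 / junction_kelvin(t_use_c) - 1.0 / junction_kelvin(t_stress_c)
    return math.exp(ea / BOLTZMANN_EV * inv_diff)

T50_125C, T50_150C = 15.08, 2.02           # hours, from the equal-sigma fit
af_cells = T50_125C / T50_150C
ea = activation_energy(af_cells, 125.0, 150.0)
t50_field = T50_125C * acceleration_factor(ea, 55.0, 125.0)

print(f"AF between cells = {af_cells:.3f}")
print(f"E_A = {ea:.3f} eV")
print(f"field T50 = {t50_field:,.0f} hours")
```

Small differences from the quoted field T₅₀ are expected, since the result depends on the constant used and on where intermediate values are rounded.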

(33)

Projection to Field Use

Bake Recoverable Fails

(34)

A Note of Caution

Analysis When Mortals Are Present

Because the analysis took into account the presence of a defective subpopulation, the parameter estimates were accurate. The two customers, notified of the affected lots, used analysis for

(35)

A Side Benefit

Screening a Wearout Mechanism

Note that it may be possible to screen a wearout failure mechanism if only a subpopulation of the units is mortal for that mechanism and sufficient acceleration is obtainable.

See Trindade paper “Can Burn-in Screen Wearout Mechanism? Reliability Models of Defective Subpopulations - A Case Study” in 29th Annual

(36)

Class Project

Defect Models

50 components are put on stress. Readouts are at 10, 25, 50, 100, 200, 500, and 1,000 hours. The failure counts at the respective readouts are 2, 2, 4, 5, 4, 3, and 0.

1. Estimate the CDF for all units using the table below with n = 50.

2. Plot the data on Weibull probability paper on the next page.

(37)

(38)

(39)

Class Project

Defect Model Estimates

Weibull Parameter Estimates for Mortal Population: Characteristic Life (c) __________

Shape Parameter (m) __________





$$F(t) = 1 - e^{-(t/c)^m}$$

How could we confirm that the Weibull model for the mortal population fits the data? We estimate the CDF at three times and compare to

(40)

Defective Subpopulations in

ART

Enter failure information (readout times, cumulative failures) into columns. Under ART, select Defective

(41)

System Models

A General Model for the

Field Reliability of

Integrated Circuits

(42)

Failure Rate Calculations

Primitive Method

Assumptions

• Constant failure rate

• Single overall activation energy

• Ambient temperatures

(43)

Primitive Method

Problems with Calculations

Example

100 units are stressed for 1,000 hours at 125 °C. Assume no self heating. One unit fails at 10 hours for a mechanism with E_A of 1.0 eV. A second unit fails at 500 hours for a failure mechanism with E_A of 0.5 eV.

Primitive Method Calculation

Overall average activation energy: 0.75 eV

Acceleration Factor (125 °C to 55 °C): AF = 106

IFR (constant) at 55 °C:

(44)

Primitive Method

Comparative Calculation

Individual Analysis by Failure Mechanism

Mechanism 1: E_A = 1.0 eV, AF = 501

IFR (constant) at 55 °C:

[1E9/(10 + 500 + 98 × 1000)]/AF = 20 FITs

Mechanism 2: E_A = 0.5 eV, AF = 22

IFR (constant) at 55 °C:

[1E9/(10 + 500 + 98 × 1000)]/AF = 461 FITs
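The acceleration factors and per-mechanism failure rates can be recomputed as a check. This is a minimal sketch assuming the Arrhenius relation with ambient temperatures (no self heating, as stated in the example) and an assumed value for the Boltzmann constant; the function names are hypothetical. The notes round each AF to an integer before dividing, so the FIT values are computed here with the rounded factors to match.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K (assumed value)

def arrhenius_af(ea, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between stress and use temperatures."""
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return math.exp(ea / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_stress))

def fits_at_use(failures, unit_hours, af):
    """Constant failure rate at use conditions in FITs (fails per 1E9 hours)."""
    return failures * 1e9 / unit_hours / af

# Two failures at 10 and 500 hours, 98 survivors at 1,000 hours.
unit_hours = 10 + 500 + 98 * 1000

for label, ea in (("overall average", 0.75), ("mechanism 1", 1.0), ("mechanism 2", 0.5)):
    print(f"{label}: E_A = {ea} eV, AF = {arrhenius_af(ea, 55.0, 125.0):.1f}")

fits_mech1 = fits_at_use(1, unit_hours, 501)   # rounded AF from the notes
fits_mech2 = fits_at_use(1, unit_hours, 22)    # rounded AF from the notes
print(f"mechanism 1: {fits_mech1:.0f} FITs, mechanism 2: {fits_mech2:.0f} FITs")
```

The two mechanisms differ by more than a factor of 20 in projected failure rate, which a single averaged activation energy cannot represent.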

(45)

Failure Rate Calculations

Later Improved Method

• Early failures (infant mortality) reported separately

• Long-term life modeled with activation energy

specific to failure mechanisms

• Constant failure rate for long term life

• Temperature acceleration calculated with junction

(46)

Later Method

Problems

• Defective subpopulations not adequately

modeled

• Competing failure modes not adequately

modeled with constant failure rate

• Zero rejects and unidentified mechanisms

often not treated

(47)

An Alternative Model

Three categories of possible failures:

Test Escapes

Defective Subpopulations

Competing Failure Mechanisms

The three D’s:

(48)

Non-Functional Test Escapes

• Dead on arrival (DOA)

• Quality issue

• Inadequate testing at manufacturer or damaged after testing prior to customer receipt

• Rejects “discovered” at customer; mistakenly called reliability failures

(49)

Defective Subpopulations

These are proportions of the total population that are at risk of failure. Defective units are called mortals. The ones without the defect are called immortals.

Defective subpopulations are generally associated with processing problems.

There are physical reasons why defective subpopulations should exist.

(50)

Competing Risks

There are failure mechanisms that can affect all units.

We call these mechanisms competing risks

because several different types may exist and any

one can cause the unit to fail.

These mechanisms are typically associated with

design, processing, or material problems.

(51)

General Reliability Model

• Activation energies are specific to failure mechanisms.

• Zero rejects and unidentified mechanisms are included.

• Generates complete bathtub curve!





[Equation garbled in extraction: F_T is expressed in terms of F_e, F_d, and (1 − F_N)]

where

(52)

General Reliability Model In

Use at AMD

(53)

(54)

(55)

Class Project

System Models

A) A component has reliability R(t) = 0.99. Twenty-five components in series form a system. Calculate the system reliability.

R_s(t) = (0.99)^25 = 0.778, or 77.8%

B) A component has reliability R(t) = 0.95

Three components in parallel form a system. Calculate the system reliability.

(56)

Class Project

Complex Systems

A system consists of seven units: A, B, C, D, E, G, H. For the system to function unit A and either unit B or C

and either D and E together or G and H together must

be working. Draw the reliability block diagram for this setup.

Write the equation for the CDF of the system in

(57)

Defect Models

1. Estimate the proportion defective p and the

number of mortals in the sample. Fill in the mortal CDF column in the table below.

2. Plot the data for the mortal subpopulation on

the same sheet of paper. Does the fit look reasonable?

4. Estimate the characteristic life c = T₆₃, the 63rd percentile.

5. Estimate the shape parameter m by drawing a

line perpendicular to the “best fit by eye line”

through the estimation point on the Weibull paper and reading the beta estimation scale.

(58)

Class Project

Defect Model Example

Time   Cum # Fails   CDF Est All Units (%)   CDF Est Mortals (%)
10     2             2/50 = 4%               2/20 = 10%
25     4             4/50 = 8%               4/20 = 20%
50     8             8/50 = 16%              8/20 = 40%
100    13            13/50 = 26%             13/20 = 65%
200    17            17/50 = 34%             17/20 = 85%
500    20            20/50 = 40%             20/20 = 100%
1000   20            20/50 = 40%             20/20 = 100%

n = 50
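The table can be rebuilt from the readout data in a few lines. This minimal sketch assumes, as the table does, that all mortals have failed by the last readout, so the number of mortals equals the final cumulative failure count; the variable names are illustrative.

```python
from itertools import accumulate

readout_hours = [10, 25, 50, 100, 200, 500, 1000]
interval_fails = [2, 2, 4, 5, 4, 3, 0]    # failures found at each readout
n_total = 50

cum_fails = list(accumulate(interval_fails))
n_mortal = cum_fails[-1]          # assumes every mortal has failed by 1,000 h
p_mortal = n_mortal / n_total     # estimated proportion defective

print(f"mortals = {n_mortal}, p = {p_mortal:.2f}")
for hours, cum in zip(readout_hours, cum_fails):
    cdf_all = 100 * cum / n_total
    cdf_mortal = 100 * cum / n_mortal
    print(f"{hours:>5} h  cum fails = {cum:>2}  "
          f"all units = {cdf_all:.0f}%  mortals = {cdf_mortal:.0f}%")
```

If failures were still arriving at the last readout, the final cumulative count would underestimate the number of mortals and the mortal CDF column would be too high.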

(59)

(60)

Class Project

Defect Model Example

Model Check

(61)