Reliability Modeling: The RIAC Guide to Reliability Prediction, Assessment and Estimation


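$$L_I\left[(T_{a1},T_{b1}),\ldots,(T_{aL},T_{bL}) \mid \theta\right] \propto \prod_{i=1}^{L}\left[F(T_{bi} \mid \theta) - F(T_{ai} \mid \theta)\right]$$

$$1 - CL = \sum_{k=0}^{r}\frac{(\lambda t)^{k} e^{-\lambda t}}{k!} = e^{-\lambda t}\left[1 + \lambda t + \cdots + \frac{(\lambda t)^{r-1}}{(r-1)!} + \frac{(\lambda t)^{r}}{r!}\right]$$

$$\lambda = \lambda_b \, e^{-E_a/KT} \, S^{n}$$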
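As a rough illustration of two of the formulas shown here, the Poisson relation between confidence level and failure count and the combined temperature/stress failure rate model, the following Python sketch evaluates each one directly. The function names, argument names and example values are hypothetical, and reading K as Boltzmann's constant in eV/K is an assumption about the intended units rather than something stated on this page.

```python
import math

# Assumption: K in the failure rate formula is Boltzmann's constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def one_minus_confidence(failure_rate, hours, failures):
    """Return 1 - CL: the Poisson probability of seeing `failures` or fewer
    events in `hours` when the true failure rate is `failure_rate`."""
    expected = failure_rate * hours  # this is lambda * t
    return sum(
        expected ** k * math.exp(-expected) / math.factorial(k)
        for k in range(failures + 1)
    )


def failure_rate_model(base_rate, activation_energy_ev, temp_kelvin, stress, exponent):
    """Return lambda = lambda_b * exp(-Ea / (K * T)) * S**n."""
    arrhenius = math.exp(-activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_kelvin))
    return base_rate * arrhenius * stress ** exponent


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(one_minus_confidence(failure_rate=1e-3, hours=1000.0, failures=1))
    print(failure_rate_model(1.0, 0.7, 328.15, stress=0.5, exponent=2.0))
```

With zero observed failures the sum collapses to a single term, e^(-λt), which is a convenient sanity check on the first function.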


Prepared by:

Reliability Information Analysis Center
6000 Flanagan Rd.
Suite 3
Utica, NY 13502-1348

Under Contract to:

Defense Technical Information Center
DTIC-AI
8725 John J. Kingman Rd.
Suite 0944
Fort Belvoir, VA 22060

RIAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center. RIAC is operated by a team of Wyle Laboratories, Quanterion Solutions Inc., the University of Maryland, the Penn State University Applied Research Laboratory and the State University of New York Institute of Technology.


The information and data contained herein have been compiled from government and nongovernment technical reports and from material supplied by various manufacturers and are intended to be used for reference purposes. Neither the United States Government nor the Wyle Laboratories contract team warrant the accuracy of this information and data. The user is further cautioned that the data contained herein may not be used in lieu of other contractually cited references and specifications.

Publication of this information is not an expression of the opinion of The United States Government or of the Wyle Laboratories contract team as to the quality or durability of any product mentioned herein and any use for advertising or promotional purposes of this information in conjunction with the name of The United States Government or the Wyle Laboratories contract team without written permission is expressly prohibited.

ISBN-10: 1-933904-17-8 (Hardcopy)
ISBN-13: 978-1-933904-17-7 (Hardcopy)
ISBN-10: 1-933904-18-6 (PDF Download)
ISBN-13: 978-1-933904-18-4 (PDF Download)


1. REPORT DATE: 31 May 2010

2. REPORT TYPE: Technical

3. DATES COVERED (From - To): N/A

4. TITLE AND SUBTITLE: Reliability Modeling – The RIAC Guide to Reliability Prediction, Assessment and Estimation

5a. CONTRACT NUMBER: HC1047-05-D-4005

5b. GRANT NUMBER: N/A

5c. PROGRAM ELEMENT NUMBER: N/A

5d. PROJECT NUMBER: N/A

5e. TASK NUMBER: N/A

5f. WORK UNIT NUMBER: N/A

6. AUTHORS: William Denson

7. PERFORMING ORGANIZATIONS NAME(S) AND ADDRESS(ES): Reliability Information Analysis Center, 100 Sherman Rd., Suite C101, Utica, NY 13502-1348

8. PERFORMING ORGANIZATION REPORT NUMBER: RPAE

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Defense Technical Information Center, DTIC-AI, 8725 John J. Kingman Rd., STE 0944, Ft. Belvoir, VA 22060; Air Force Research Lab/RISE, 525 Brooks Rd., Rome, NY 13440

10. SPONSORING/MONITOR’S ACRONYM(S): DTIC-AI and AFRL/RISE

11. SPONSORING/MONITOR’S REPORT NUMBERS: N/A

12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release, distribution unlimited.

13. SUPPLEMENTARY NOTES: Hardcopies available from Reliability Information Analysis Center, 100 Sherman Rd., Suite C101, Utica, NY 13502-1348. (Price: $85 US/$95 Non-US). PDF Download available from http://theRIAC.org (Price $70).

14. ABSTRACT: The intent of this book is to provide guidance on modeling techniques that can be used to quantify the reliability of a product or system. In this context, reliability modeling is the process of constructing a mathematical model that is used to estimate the reliability characteristics of a product. There are many ways in which this can be accomplished, depending on the product or system and the type of information that is available, or practical to obtain, to the analyst. This book will review possible approaches, summarize their advantages and disadvantages, and provide guidance on selecting a methodology based on the specific goals and constraints of the analyst. While this book will not discuss the use of specific published methodologies, in cases where examples are provided, tools and methodologies with which the author has personal experience in their development are used, such as life modeling, NPRD, MIL-HDBK-217 and 217Plus.

15. SUBJECT TERMS: Reliability Modeling, Reliability Prediction, Reliability Assessment, Reliability Estimation, NPRD, MIL-HDBK-217, 217Plus

16. SECURITY CLASSIFICATION OF: a. REPORT: UNCLASSIFIED; b. ABSTRACT: UNCLASSIFIED; c. THIS PAGE: UNCLASSIFIED

17. LIMITATION OF ABSTRACT: UNLIMITED

18. NUMBER OF PAGES: 410

19a. NAME OF RESPONSIBLE PERSON: David Nicholls

19b. TELEPHONE NUMBER (include area code): 315.351.4202

Standard Form 298 (Rev. 8/98) Prescribed by ANSI Std. Z39.18


The Reliability Information Analysis Center (RIAC), formerly the Reliability Analysis Center (RAC), is a Department of Defense Information Analysis Center sponsored by the Defense Technical Information Center, managed by the Air Force Research Laboratory (formerly Rome Laboratory), and operated by a team of Wyle Laboratories, Quanterion Solutions, the University of Maryland, the Penn State University Applied Research Laboratory and the State University of New York Institute of Technology. RIAC is chartered to collect, analyze and disseminate reliability, maintainability, quality, supportability and interoperability (RMQSI) information pertaining to systems and products, as well as the components used in them. The RIAC addresses both military and commercial perspectives.

The data contained in the RIAC databases is collected on a continuous basis from a broad range of sources, including testing laboratories, device and equipment manufacturers, government laboratories and equipment users (government and industry). Automatic distribution lists, voluntary data submittals and field failure reporting systems supplement an intensive data solicitation program. Users of RIAC are encouraged to submit their RMQSI data to enhance these data collection efforts.

RIAC publishes documents for its users in a variety of formats and subject areas. While most are intended to meet the needs of RMQSI practitioners, many are also targeted to managers and designers. RIAC also offers RMQSI consulting, training and responses to technical and bibliographic inquiries.

REQUESTS FOR TECHNICAL ASSISTANCE AND INFORMATION ON AVAILABLE RIAC SERVICES AND PUBLICATIONS MAY BE DIRECTED TO:

Reliability Information Analysis Center
100 Sherman Rd.
Suite C101
Utica, NY 13502-1348
General Information: (877) 363-RIAC, (877) 363-7422
Technical Inquiries: (315) 351-4200
Fax: (315) 351-4209
E-Mail: [email protected]
Internet: http://theRIAC.org

ALL OTHER RIAC REQUESTS SHOULD BE DIRECTED TO:

Air Force Research Laboratory
AFRL – Systems and Information Interoperability Branch
Attn: R. Hyle
525 Brooks Road
Rome, NY 13441-4505
Telephone: (315) 330-4857
DSN: 587-4857
Fax: (315) 330-7647
E-Mail: [email protected]

Solutions Incorporated, in support of the prime contractor (Wyle Laboratories) in the operation of the Department of Defense Reliability Information Analysis Center (RIAC) under Contract HC1047-05-D-4005. The Government has a fully paid up perpetual license for free use of and access to this publication and its contents among all the DOD IACs in both hardcopy and electronic versions, without limitation on the number of users or servers. Subject to the rights of the Government, this document (hardcopy and electronic versions) and the content contained within it are protected by U.S. Copyright Law and may not be copied, automated, re-sold, or redistributed to multiple users without the express written permission. The copyrighted work may not be made available on a server for use by more than one person simultaneously without the express written permission. If automation of the technical content for other than personal use, or for multiple simultaneous user access to a copyrighted work is desired, please contact 877.363.RIAC (toll free) or 315.351.4202 for licensing information.

1. INTRODUCTION
1.1. Scope
1.2. Book Organization
1.3. Reliability Program Elements
1.4. The History of Reliability Prediction
1.5. Acronyms
1.6. References
2. GENERAL ASSESSMENT APPROACH
2.1. Define System
2.2. Identify the Purpose of the Model
2.3. Determine the Appropriate Level at Which to Perform the Modeling
2.3.1. Level vs. Data Needed
2.3.2. Using an FMEA as the basis for a reliability model
2.3.3. Model Form vs. Level
2.4. Assess Data Available
2.5. Determine and Execute Appropriate Approach
2.5.1. Empirical
2.5.1.1. Test
2.5.1.2. Field Data
2.5.2. Physics
2.5.2.1. Stress/Strength Modeling
2.5.2.2. First Principles
2.6. Combine Data
2.6.1. Bayesian Inference
2.7. Develop System Model
2.7.1. Monte Carlo Analysis
2.8. References
3. FUNDAMENTAL CONCEPTS
3.1. Reliability Theory Concepts
3.2. Probability concepts
3.2.1. Covariance
3.2.2. Correlation Coefficient
3.2.3. Permutations and Combinations
3.2.4. Mutual Exclusivity
3.2.5. Independent Events
3.2.6. Non-independent (Dependent) Events
3.2.7. Non-independent (Dependent) Events: Bayes Theorem
3.2.8. System Models
3.2.9. K-out-of-N Configurations
3.3. Distributions
3.3.1. Exponential
3.3.2. Weibull
3.3.3. Lognormal
3.4. References
4. DOE-BASED APPROACHES TO RELIABILITY MODELING
4.1. Determine the Feature to be Assessed
4.2. Determine Factors
4.3. Determine the Factor Levels
4.4. Design the Tests
4.5. Perform Tests and Measurements
4.6. Analyze the Data
4.7. Develop the Life Model
4.8. References
5. LIFE DATA MODELING
5.1. Selecting a Distribution
5.2. Parameter Estimation Overview
5.2.1. Closed Form Parameter Approximations
5.2.2. Least Squares Regression
5.2.3. Parameter Estimation Using MLE
5.2.3.1. Brief Historical Remarks
5.2.3.2. Likelihood Function
5.2.3.3. Maximum Likelihood Estimator (MLE)
5.2.4. Confidence Bounds and Uncertainty
5.2.4.1. Confidence Bounds with MLE
5.2.4.2. Confidence Bounds Approximations
5.3. Acceleration Models
5.3.1. Fundamental Acceleration Models
5.3.2. Combined Models
5.3.3. Cumulative Damage Model
5.4. MLE Equations
5.4.1. Likelihood Functions
5.5. References
6. INTERPRETATION OF RELIABILITY ESTIMATES
6.1. Bathtub Curve
6.2. Common Cause vs. Special Cause
6.3. Confidence Bounds
6.3.1. Traditional Techniques for Confidence Bounds
6.3.2. Uncertainty in Reliability Prediction Estimates
6.4. Failure Rate vs pdf
6.5. Practical Aspects of Reliability Assessments
6.6. Weibayes
6.7. Weibull Closure Property
6.8. Estimating Event-Related Reliability
6.9. Combining Different Types of Assessments at Different Levels
6.10. Estimating the Number of Failures
6.11. Calculation of Equivalent Failure Rates
6.12. Failure Rate Units
6.13. Factors to be Considered When Developing Models
6.13.1. Causes of Electronic System Failure
6.13.2. Selection of Factors
6.13.3. Reliability Growth of Components
6.13.4. Relative vs. Absolute Humidity
6.14. Addressing Data with No Failures
6.15. Reliability of Components Used Outside of Their Rating
6.16. References
7. EXAMPLES
7.1. MIL-HDBK-217 Model Development Methodology
7.1.1. Identify Possible Variables
7.1.2. Develop Theoretical Model
7.1.3. Collect and QC Data
7.1.4. Correlation Coefficient Analysis
7.1.5. Stepwise Multiple Regression Analysis
7.1.6. Goodness-of-Fit Analysis
7.1.7. Extreme Case Analysis
7.1.8. Model Validation
7.2. 217Plus Reliability Prediction Models
7.2.1. Background
7.2.2. System Reliability Prediction Model
7.2.2.1. 217Plus Background
7.2.2.2. Methodology Overview
7.2.2.3. System Reliability Model
7.2.2.4. Initial Failure Rate Estimate
7.2.2.5. Process Grading Factors
7.2.2.6. Basis Data for the Model
7.2.2.7. Uncertainty in Traditional Approach Estimates
7.2.2.8. System Failure Causes
7.2.2.9. Environmental Factor
7.2.2.10. Reliability Growth
7.2.2.11. Infant Mortality
7.2.2.12. Combining Predicted Failure Rate with Empirical Data
7.2.3. Development of Component Reliability Models
7.2.3.1. Model Form
7.2.3.2. Acceleration Factors
7.2.3.3. Time Basis of Models
7.2.3.4. Failure Mode to Failure Cause Mapping
7.2.3.5. Derivation of Base Failure Rates
7.2.3.6. Combining the Predicted Failure Rate with Empirical Data
7.2.3.7. Estimating Confidence Levels
7.2.3.8. Using the 217Plus Model in a Top-Down Analysis
7.2.3.9. Capacitor Model Example
7.2.3.10. Default Values
7.2.4. Photonic Model Development Example
7.2.4.1. Introduction
7.2.4.2. Model development methodology and results
7.2.4.3. Uncertainty Analysis
7.2.4.4. Comments on Part Quality Levels
7.2.4.5. Explanation of Failure Rate Units
7.2.5. System-Level Model

8.3.4. Implementation
8.4. How to Perform an FMEA
8.5. Identify System Hierarchy
8.6. Function Analysis
8.7. IPOUND Analysis
8.8. Identify the Severity
8.9. Identify the Possible Effect(s) that Result from Occurrence of Each Failure Mode
8.10. Identify Potential Causes of Each Failure Mode
8.11. Identify Factors for Each Failure Cause
8.11.1. Accelerating Stress(es) or Potential Tests
8.11.2. Occurrence
8.11.2.1. Occurrence Rankings
8.11.3. Preventions
8.11.4. Detections
8.11.5. Detectability
8.12. Calculate the RPN
8.13. Determine Appropriate Corrective Action
8.14. Update the RPN
8.15. Using Quality Function Deployment to Feed the FMEA
8.16. References
9. CONCLUDING REMARKS

List of Figures

FIGURE 1.1-1: PHASES OF A RELIABILITY PROGRAM
FIGURE 1.1-2: RELATIVE COST OF FAILURES VS. PHASE
FIGURE 1.1-3: RELIABILITY PREDICTION, ASSESSMENT AND ESTIMATION
FIGURE 1.1-4: PERCENT OF COMPANIES USING RELIABILITY ENGINEERING TOOLS
FIGURE 1.3-1: EXAMPLE RELIABILITY PROGRAM APPROACH
FIGURE 2.0-1: GENERAL MODELING APPROACH
FIGURE 2.1-1: FAULT TREE REPRESENTATION OF SYSTEM MODEL
FIGURE 2.1-2: FAULT TREE REPRESENTATION TO THE FAILURE CAUSE LEVEL
FIGURE 2.2-1: BREAKDOWN OF POTENTIAL RELIABILITY MODELING PURPOSES
FIGURE 2.3-1: TYPICAL DATA REQUIREMENTS VS. LEVEL OF HIERARCHY
FIGURE 2.3-2: THE BASIC FMEA APPROACH
FIGURE 2.3-3: HIERARCHICAL RELATIONSHIP BETWEEN CAUSE, MODE AND EFFECT
FIGURE 2.3-4: APPROACH TO IDENTIFYING CAUSES
FIGURE 2.3-5: FAULT TREE OF PRODUCT OR SYSTEM
FIGURE 2.3-6: FAULT TREE OF PRODUCT OR SYSTEM WITH CAUSE AS THE LOWEST LEVEL
FIGURE 2.3-7: FAULT TREE OF PRODUCT OR SYSTEM WITH CAUSE ABOVE THE LOWEST LEVEL
FIGURE 2.3-8: FAULT TREE OF PRODUCT OR SYSTEM WITH CAUSE TWO LEVELS ABOVE THE LOWEST LEVEL
FIGURE 2.5-1: BREAKDOWN OF RELIABILITY ASSESSMENT OPTIONS
FIGURE 2.5-2: QUALIFICATION CONCEPTS AND TERMINOLOGY
FIGURE 2.5-3: EVT, DVT AND PVT RELATIONSHIPS
FIGURE 2.5-4: ACCELERATION LEVELS
FIGURE 2.5-5: UNCERTAINTY IN EXTRAPOLATION
FIGURE 2.5-6: ACCELERATION LEVELS
FIGURE 2.5-7: ACCELERATION ALTERNATIVES
FIGURE 2.5-8: RELATIVE LIFETIME VS. STRESS
FIGURE 2.5-9: RELIABILITY REQUIREMENT VS. SMALL POPULATION RELIABILITY INFERENCE
FIGURE 2.5-10: LIFE MODELING METHODOLOGY
FIGURE 2.5-11: IDENTIFICATION OF TEST STRESSES BASED ON THE FMEA
FIGURE 2.5-12: USING THE DESTRUCT LIMIT TO DEFINE THE LIFE TEST MAX STRESS
FIGURE 2.5-13: POSSIBLE STRESS PROFILES
FIGURE 2.5-14: MEASUREMENT POINTS FOR AN INFANT MORTALITY FAILURE CAUSE
FIGURE 2.5-15: MEASUREMENT POINTS FOR A WEAROUT FAILURE CAUSE
FIGURE 2.5-16: ACCELERATION WHEN THE DISTRIBUTIONS FOR AT LEAST TWO STRESSES ARE AVAILABLE
FIGURE 2.5-17: ACCELERATION WHEN THE DISTRIBUTIONS FOR LOW STRESSES ARE NOT AVAILABLE
FIGURE 2.5-18: LIFE MODEL SEQUENCE
FIGURE 2.5-19: DEGRADATION MODELING APPROACH
FIGURE 2.5-20: DEGRADATION DATA EXAMPLE
FIGURE 2.5-21: DEGRADATION DATA CONVERSION TO TIMES TO FAILURE
FIGURE 2.5-22: RELIABILITY ESTIMATES FROM FIELD DATA
FIGURE 2.5-23: FMEA AS A TOOL FOR ASSESSING SIMILARITY
FIGURE 2.5-24: MIL-HDBK-217 PART COUNT EXAMPLE
FIGURE 2.5-25: MIL-HDBK-217 PART STRESS EXAMPLE
FIGURE 2.5-26: TELCORDIA SR-332 (BELLCORE)
FIGURE 2.5-27: RAC PRISM REPLACED BY RIAC 217PLUS
FIGURE 2.5-28: CNET/RDF 2000
FIGURE 2.5-29: CNET/RDF 2000 MODEL EXAMPLE
FIGURE 2.5-30: FIDES
FIGURE 2.5-31: USES OF PROGRAM DATA ELEMENTS
FIGURE 2.5-32: PROGRAM DATABASE STRUCTURE
FIGURE 2.5-33: DATABASE INFORMATION FLOW
FIGURE 2.5-34: HIERARCHY OF MAINTENANCE ACTIONS
FIGURE 2.5-35: CALCULATION OF PART LIFE UNIT
FIGURE 2.5-36: FAILURE TIMES BASED ON OPERATING TIME
FIGURE 2.5-37: FAILURE TIMES BASED ON CALENDAR TIME
FIGURE 2.5-38: FAILURE RATE SIMULATION WITH WEIBULL BETA = 20
FIGURE 2.5-39: FAILURE RATE SIMULATION WITH WEIBULL BETA = 5.0
FIGURE 2.5-40: FAILURE RATE SIMULATION WITH WEIBULL BETA = 2.0
FIGURE 2.5-41: FAILURE RATE SIMULATION WITH WEIBULL BETA = 1.0
FIGURE 2.5-42: FAILURE RATE SIMULATION WITH WEIBULL BETA = 0.5
FIGURE 2.5-44: STRESS/STRENGTH INTERFERENCE
FIGURE 2.5-45: STRESS/STRENGTH INTERFERENCE VS. TIME
FIGURE 2.6-1: 217PLUS APPROACH TO FAILURE RATE ESTIMATION
FIGURE 2.6-3: BAYESIAN INFERENCE OUTLINE
FIGURE 2.7-1: COMBINING SEVEN FAILURE CAUSE DISTRIBUTIONS
FIGURE 2.7-2: POSSIBLE FAULT TREE REPRESENTATION OF A SERIES RELIABILITY BLOCK DIAGRAM
FIGURE 2.7-3: PDF OF NORMAL DISTRIBUTION WITH MEAN OF 10 AND STANDARD DEVIATION OF 3
FIGURE 2.7-4: CUMULATIVE NORMAL DISTRIBUTION WITH MEAN OF 10 AND STANDARD DEVIATION OF 3
FIGURE 2.7-5: VALUE SELECTION FROM A DISTRIBUTION
FIGURE 2.7-6: VALUE SELECTION FROM A WEIBULL DISTRIBUTION
FIGURE 2.7-7: RELIABILITY BLOCK DIAGRAM OF REDUNDANT EXAMPLE
FIGURE 2.7-8: SYSTEM MONTE CARLO EXAMPLE
FIGURE 2.7-9: MONTE CARLO SIMULATION OF EXAMPLE SYSTEM
FIGURE 3.1-1: DISCRETE PROBABILITY DISTRIBUTION
FIGURE 3.1-2: CONTINUOUS PROBABILITY DISTRIBUTION
FIGURE 3.2-1: EXAMPLES OF CORRELATION COEFFICIENTS
FIGURE 3.2-2: VENN DIAGRAM OF MUTUALLY EXCLUSIVE EVENTS
FIGURE 3.2-3: INDEPENDENT EVENTS
FIGURE 3.2-4: FAULT TREE OR GATE
FIGURE 3.2-5: RELIABILITY BLOCK DIAGRAM FOR AN OR GATE
FIGURE 3.2-7: RELIABILITY BLOCK DIAGRAM FOR AN AND GATE
FIGURE 3.2-8: FAULT TREE OF AN AND/OR COMBINATION
FIGURE 3.2-9: RBD OF AND/OR COMBINATION
FIGURE 3.3-1: SHAPES OF FAILURE DENSITY AND RELIABILITY FUNCTIONS OF COMMONLY USED DISCRETE DISTRIBUTIONS (FROM MIL-HDBK-338B)
FIGURE 3.3-2: SHAPES OF FAILURE DENSITY, RELIABILITY AND HAZARD RATE FUNCTIONS FOR COMMONLY USED CONTINUOUS DISTRIBUTIONS (FROM MIL-HDBK-338B)
FIGURE 3.3-3: EXAMPLE PDF PLOTS FOR THE WEIBULL DISTRIBUTION
FIGURE 3.3-4: EXAMPLE HAZARD RATE PLOTS FOR THE WEIBULL DISTRIBUTION
FIGURE 3.3-5: EXAMPLE PROBABILITY PLOTS FOR WEIBULL DISTRIBUTION
FIGURE 3.3-6: EXAMPLE PDF PLOTS FOR THE LOGNORMAL DISTRIBUTION
FIGURE 3.3-7: EXAMPLE HAZARD RATE PLOTS FOR THE LOGNORMAL DISTRIBUTION
FIGURE 3.3-8: EXAMPLE PROBABILITY PLOTS FOR THE LOGNORMAL DISTRIBUTION
FIGURE 4.0-1: THE DOE CONCEPT
FIGURE 4.3-1: POSSIBLE RESPONSE-FACTOR LEVEL RELATIONSHIP
FIGURE 4.4-1: DOE TERMINOLOGY
FIGURE 4.4-2: ONE-FACTOR-AT-A-TIME EXPERIMENTS
FIGURE 4.4-3: STANDARD DOE NOMENCLATURE
FIGURE 4.4-4: POTENTIAL INTERACTIONS
FIGURE 4.6-1: ANALYSIS OF MEANS
FIGURE 4.6-2: LINEARIZATION OF THE ARRHENIUS RELATIONSHIP
FIGURE 4.6-3: OPTIMAL FACTOR SETTINGS
FIGURE 5.4-1: LIKELIHOOD CONTOUR EXAMPLE
FIGURE 6.1-1: BATHTUB CURVE
FIGURE 6.2-1: EXAMPLE OF NON-MONOMODAL DISTRIBUTION
FIGURE 6.2-2: MULTIMODAL DISTRIBUTION EXAMPLE 1
FIGURE 6.2-3: MULTIMODAL DISTRIBUTION EXAMPLE 2
FIGURE 6.2-4: MULTIMODAL DISTRIBUTION EXAMPLE 3
FIGURE 6.2-5: MULTIMODAL DISTRIBUTION EXAMPLE 4
FIGURE 6.2-6: MULTIMODAL DISTRIBUTION EXAMPLE 5
FIGURE 6.2-7: MULTIMODAL DISTRIBUTION EXAMPLE OF POOLED DATA SET
FIGURE 6.2-8: AGE AT DEATH DATA
FIGURE 6.2-9: PDF OF MULTIMODE DISTRIBUTION OF AGES
FIGURE 6.2-10: FAILURE RATE OF AGE DATA
FIGURE 6.2-11: PROBABILITY PLOT OF AGE DATA
FIGURE 6.2-12: SINGLE MODE WEIBULL FIT TO THE AGE DATA
FIGURE 6.3-1: SOURCES OF ERROR IN EMPIRICAL MODELS
FIGURE 6.3-2: CONFIDENCE LEVEL THROUGH PREDICTION, ASSESSMENT AND ESTIMATION
FIGURE 6.6-1: WEIBAYES EXAMPLE
FIGURE 6.13-1: NOMINAL FAILURE CAUSE DISTRIBUTION OF ELECTRONIC SYSTEMS
FIGURE 6.13-2: IPO MODEL
FIGURE 6.13-3: RELATIONSHIP BETWEEN ABSOLUTE AND RELATIVE HUMIDITY
FIGURE 6.14-1: ESTIMATED UPPER BOUND FAILURE RATES VS OPERATING TIME AT 60 AND 90% CONFIDENCE
FIGURE 7.1-1: MIL-HDBK-217 MODEL DEVELOPMENT METHODOLOGY
FIGURE 7.2-1: FAILURE CAUSE DISTRIBUTION OF ELECTRONIC SYSTEMS
FIGURE 7.2-2: OPTICAL AMPLIFIER FAILURE CAUSE DISTRIBUTION
FIGURE 7.2-3: ΠG VS. TIME AND GROWTH RATES
FIGURE 7.2-4: MODEL DEVELOPMENT METHODOLOGY FLOWCHART
FIGURE 7.2-5: DISTRIBUTION OF LOG10 PREDICTED/OBSERVED FAILURE RATE RATIO FOR ALL DATA
FIGURE 7.2-6: DISTRIBUTION OF LOG10 PREDICTED/OBSERVED RATIO FOR FIELD DATA ONLY
FIGURE 7.2-7: DISTRIBUTIONS OF THE PREDICTED/OBSERVED FAILURE RATE RATIO FOR ALL DATA AND FOR FIELD DATA ONLY
FIGURE 7.3-1: TIMES TO FAILURE DISTRIBUTIONS
FIGURE 7.3-2: PROBABILITY OF FAILURE VS. TEMPERATURE AND RELATIVE HUMIDITY AT 50,000 HOURS
FIGURE 7.4-1: APPARENT FAILURE RATE FOR REPLACEMENT UPON FAILURE
FIGURE 7.4-3: EXAMPLE OF PART DETAIL ENTRIES
FIGURE 8.1-1: TWO BASIC TYPES OF FMEA
FIGURE 8.4-1: FMEA PROCESS FLOW
FIGURE 8.7-1: FAILURE CAUSE-MODE EFFECT RELATIONSHIP
FIGURE 8.10-1: FAILURE CAUSE, MODE AND EFFECT HIERARCHY
FIGURE 8.10-2: FAILURE CAUSES
FIGURE 8.11-1: OCCURRENCE DEFINITIONS
FIGURE 8.11-2: OCCURRENCE GUIDELINES
FIGURE 8.11-3: DETECTABILITY DEFINITIONS
FIGURE 8.11-4: LIFE CYCLE VS DETECTABILITY DIMENSION
FIGURE 8.13-1: POTENTIAL CORRECTIVE ACTIONS
FIGURE 8.15-1: QFD-TO-FMEA LINKS
FIGURE 8.15-2: QFD-FMEA

List of Tables

TABLE 1.3-1: RANGES OF POTENTIAL CUSTOMER REACTIONS
TABLE 2.2-1: RELIABILITY ASSESSMENT PURPOSES
TABLE 2.2-2: PROGRAM PHASE VS. RELIABILITY ASSESSMENT PURPOSE
TABLE 2.3-1: EXAMPLES OF INITIAL CONDITIONS, STRESSES AND MECHANISMS
TABLE 2.3-2: RELATIONSHIP BETWEEN CAUSE, MODE AND EFFECT
TABLE 2.5-1: SUMMARY OF RELIABILITY ASSESSMENT OPTIONS
TABLE 2.5-1: SUMMARY OF ASSESSMENT OPTIONS (CONTINUED)
TABLE 2.5-2: RELEVANCY OF APPROACH TO PREDICTION, ASSESSMENT AND ESTIMATION
TABLE 2.5-3: IDENTIFICATION OF APPROPRIATE APPROACHES BASED ON THE PURPOSE
TABLE 2.5-4: RANKING THE ATTRIBUTES OF EMPIRICAL DATA
TABLE 2.5-5: EVT, DVT AND PVT PURPOSE AND APPROACH
TABLE 2.5-6: RELIABILITY DEMONSTRATION EXAMPLE
TABLE 2.5-7: EXAMPLE OF A QUALIFICATION PLAN FOR AN ASSEMBLY
TABLE 2.5-8: QUALIFICATION EXAMPLE FOR A LASER DIODE
TABLE 2.5-9: STRESS PROFILE OPTION ADVANTAGES AND DISADVANTAGES
TABLE 2.5-10: SIMILARITY ANALYSIS
TABLE 2.5-11: DIGITAL CIRCUIT BOARD FAILURE RATES (IN FAILURES PER MILLION PART HOURS)
TABLE 2.5-12: TEST CONDITIONS
TABLE 2.5-13: DATA TO ESTIMATE DIFFUSION RATE
TABLE 2.5-14: PREDICTED LIFETIMES VS. OBSERVED
TABLE 3.1-1: PROBABILITY DISTRIBUTION NOTATION & MATHEMATICAL REPRESENTATIONS
TABLE 3.2-1: COMBINATIONS EXAMPLE
TABLE 3.2-2: COMBINATIONS OF AN OR CONFIGURATION
TABLE 3.2-3: COMBINATIONS OF AN AND CONFIGURATION
TABLE 3.2-4: EXAMPLE OF “K-OUT-OF-N” PROBABILITY CALCULATIONS
TABLE 3.2-5: EXAMPLE OF “2-OUT-OF-3” REQUIRED FOR SUCCESS
TABLE 3.3-1: PROBABILITY DISTRIBUTIONS APPLICABLE TO RELIABILITY ENGINEERING
TABLE 3.3-2: EXPONENTIAL DISTRIBUTION PARAMETERS
TABLE 3.3-3: CONFUSING TERMINOLOGY OF THE WEIBULL DISTRIBUTION
TABLE 3.3-4: WEIBULL DISTRIBUTION PARAMETERS
TABLE 4.3-1: POSSIBLE CONCLUSIONS FOR A NON-LINEAR RESPONSE-FACTOR RELATIONSHIP
TABLE 4.4-1: FULL-FACTORIAL EXAMPLE
TABLE 4.4-2: FULL AND HALF FACTORIAL EXAMPLE FOR CORROSION
TABLE 5.2-1: TERMINOLOGY USED IN PARAMETER ESTIMATION
TABLE 5.2-2: TECHNIQUES FOR PARAMETER ESTIMATION
TABLE 5.2-3: PARAMETERS TYPICALLY ESTIMATED FROM STATISTICAL DISTRIBUTIONS
TABLE 5.2-4: CONFIDENCE BOUNDS FOR THE POISSON DISTRIBUTION
TABLE 5.2-5: CONFIDENCE BOUNDS FOR THE BINOMIAL DISTRIBUTION
TABLE 5.2-6: CONFIDENCE BOUNDS FOR THE EXPONENTIAL DISTRIBUTION
TABLE 5.2-8: CONFIDENCE BOUNDS FOR THE NORMAL DISTRIBUTION
TABLE 5.3-10: CONFIDENCE BOUNDS FOR THE WEIBULL DISTRIBUTION
TABLE 6.1-1: CATEGORIES OF FAILURE EFFECTS
TABLE 6.2-2: BIMODAL POPULATION EXAMPLE 1
TABLE 6.2-3: BIMODAL POPULATION EXAMPLE 2
TABLE 6.1-4: BIMODAL POPULATION EXAMPLE 3
TABLE 6.1-5: BIMODAL POPULATION EXAMPLE 4
TABLE 6.1-6: BIMODAL POPULATION EXAMPLE 5
TABLE 6.1-7: FOUR MODE WEIBULL DISTRIBUTION PARAMETERS
TABLE 6.3-1: FAILURE RATE UNCERTAINTY LEVEL MULTIPLIERS
TABLE 6.9-1: EXAMPLE OF COMBINING DIFFERENT TYPES OF MODELS
TABLE 6.13-1: FACTORS TO BE CONSIDERED IN A RELIABILITY MODEL
TABLE 6.13-2: FAILURE RATE DATA SUMMARY
TABLE 7.1-1: DATA COLLECTED FOR MODEL DEVELOPMENT
TABLE 7.1-2: DATA TRANSFORMS
TABLE 7.1-3: REGRESSION DATA INCLUDING CATEGORICAL VARIABLES
TABLE 7.2-1: UNCERTAINTY LEVEL MULTIPLIER
TABLE 7.2-2: PERCENTAGE OF FAILURES ATTRIBUTABLE TO EACH FAILURE CAUSE
TABLE 7.2-3: WEIBULL PARAMETERS FOR FAILURE CAUSE PERCENTAGES
TABLE 7.2-4: MULTIPLIERS AS A FUNCTION OF PROCESS GRADE
TABLE 7.2-5: EXAMPLE OF FAILURE MODE-TO-FAILURE CAUSE CATEGORY MAPPING
TABLE 7.2-6: CAPACITOR PARAMETERS
TABLE 7.2-7: DEFAULT ENVIRONMENTAL STRESS VALUES
TABLE 7.2-8: DEFAULT OPERATING PROFILE VALUES
TABLE 7.2-9: FAILURE CAUSE SUMMARY FOR CONNECTORS
TABLE 7.2-10: FAILURE MODE TO FAILURE CAUSE CATEGORY FOR CONNECTORS (SC AND FC)
TABLE 7.2-11: FAILURE CAUSE PERCENTAGES FOR CONNECTORS
TABLE 7.2-12: DATA COLLECTED FOR CONNECTORS
TABLE 7.2-13: CATEGORIES OF ACCELERATION MODEL PARAMETERS
TABLE 7.2-14: ACCELERATION MODEL PARAMETERS
TABLE 7.2-15: DEFAULT MODEL PARAMETERS
TABLE 7.2-16: SUMMARY OF PI-FACTOR CALCULATIONS
TABLE 7.2-17: APPLICABILITY OF TEST DATA
TABLE 7.2-18: BASE FAILURE RATES (FAILURES PER MILLION CALENDAR HOURS)
TABLE 7.2-19: PART QUALITY PROCESS GRADE FACTOR QUESTIONS FOR PHOTONIC DEVICE MODELS
TABLE 7.2-20: SUMMARY OF UNCERTAINTY METRICS
TABLE 7.2-21: PARAMETERS FOR THE PROCESS GRADE FACTORS
TABLE 7.2-22: INDEX OF PROCESS GRADE TYPE QUESTIONS
TABLE 7.2-23: DESIGN PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-24: MANUFACTURING PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-25: PART QUALITY PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-26: SYSTEM MANAGEMENT PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-27: CAN NOT DUPLICATE (CND) PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-29: WEAROUT PROCESS GRADE FACTOR QUESTIONS
TABLE 7.2-30: GROWTH PROCESS GRADE FACTOR QUESTIONS
TABLE 7.3-1: PARAMETER LEVELS
TABLE 7.3-2: TEST PLAN SUMMARY
TABLE 7.3-3: LIFE TEST RESULTS
TABLE 7.3-4: TIMES TO FAILURE DISTRIBUTION PARAMETERS
TABLE 7.3-5: ESTIMATED PARAMETER 80% 2-SIDED CONFIDENCE BOUNDS
TABLE 7.4-1: DATA SUMMARIZATION PROCESS
TABLE 7.4-2: TIME AT WHICH ASYMPTOTIC VALUE IS REACHED
TABLE 7.4-3: α/MTTF RATIO AS A FUNCTION OF β
TABLE 7.4-4: PERCENT FAILURE FOR WEIBULL DISTRIBUTION
TABLE 7.4-5: FIELD DESCRIPTIONS
TABLE 7.4-6: APPLICATION ENVIRONMENTS DEFINED IN NPRD
TABLE 8.7-1: FAILURE MODE RELATIONSHIP TO TAGUCHI LOSS FUNCTION
TABLE 8.8-1: DIMENSIONS OF FUNCTIONAL SEVERITY
TABLE 8.8-2: DIMENSIONS OF SEVERITY
TABLE 8.11-1: CATEGORIES OF FAILURE EFFECTS
TABLE 8.11-2: RECOMMENDED DETECTABILITY RATING CRITERIA

1. Introduction

Few engineering techniques have caused as much controversy in the last several decades as reliability prediction. One of the primary reasons is the stochastic nature of reliability. Whereas many engineering disciplines are governed by deterministic processes, reliability is governed by a complex interaction of stochastic processes. As a result, the metrics of interest in other engineering disciplines are generally much more quantifiable by their very nature. While there is a stochastic element in any engineering model, reliability quantification must contend with behavior that is stochastic to an extreme degree.

Many highly respected reliability engineering texts treat the topic of reliability modeling thoroughly and in great detail. Included in these texts are detailed ways to model system reliability using techniques like Failure Modes and Effects Analysis (FMEA), Fault Tree Analysis (FTA), Markov models, fault tolerant design techniques, etc. However, these texts often gloss over a fundamental prerequisite for using such techniques effectively: the ability to quantify the reliability of the constituent components and subsystems comprising the system.

The intent of this book is to provide guidance on reliability modeling techniques that can be used to quantify the reliability of a product or system. In this context, reliability modeling is the process of constructing a mathematical model that is used to estimate the reliability characteristics of an item. There are many ways in which this can be accomplished, depending on the item and the type of information that is available to, or practical to obtain by, the analyst. This book will review possible approaches, summarize their advantages and disadvantages, and provide guidance on selecting a methodology based on specific goals and constraints. While this book will not discuss the use of specific published methodologies, where examples are provided they draw on tools and methodologies that the author has personal experience developing, such as life modeling, NPRD, MIL-HDBK-217 and 217Plus.

The Reliability Information Analysis Center (RIAC) has prepared many documents in the past relating to many different reliability engineering techniques, such as FMEA, FTA, Worst Case Analysis (WCA), etc. However, one noteworthy omission from this list is reliability modeling. This, coupled with (1) the RIAC’s history of providing reliability modeling data and solutions, and (2) the need to objectively address some of the


In years past, DoD contracts would require specific reliability prediction methodologies, usually MIL-HDBK-217, be used. This resulted in system developers having very little flexibility in applying different reliability prediction practices. Since the DoD has not, until very recently, supported updates to MIL-HDBK-217, companies were encouraged to use best practices in quantifying product reliability. The difficult question to be addressed is “what are the best practices that should be used?” This book attempts to provide guidance on selecting an appropriate methodology based on the specific conditions and constraints of the company and its products or systems.

It is hoped that the author’s experience gained by attempting many different reliability assessment approaches, including physics and empirical approaches, can be used to the advantage of the reader in a practical way.

1.1. Scope

The intent of a reliability program is to identify and mitigate failure modes/mechanisms, verify their removal through reliability testing, implement corrective actions for “discovered” failures, and maintain reliability levels after reliability has been designed in. These correspond to the designing-in reliability, reliability growth and ensuring on-going reliability goals, respectively, as illustrated in Figure 1.1-1.


The cost to an organization increases exponentially as a function of when failure causes are discovered, as illustrated in Figure 1.1-2. It is most efficient to discover failure modes and mechanisms as early as possible, when they can be effectively mitigated. If failure modes and mechanisms are discovered late in development or, worse, in the field, organizations can be faced with staggering costs associated with corrective actions.

Figure 1.1-2: Relative Cost of Failures vs. Phase

The use of reliability engineering techniques early in the development cycle of a system is critical to achieving high reliability. An important part of these efforts is the modeling of reliability before the product or system is fielded.

The term “Reliability Prediction” has had a relatively narrow connotation, primarily associated with “handbook” approaches. This document attempts to take a broader view of this topic by investigating the various approaches for quantifying reliability, and their effectiveness when used to achieve specific objectives. For this reason, the book is entitled “Reliability Modeling – the RIAC Guide to Reliability Prediction, Assessment and Estimation”. The definitions of these are:

Prediction - something that is predicted, forecasted

Assessment - to determine the importance, size, or value of

Estimation - A tentative evaluation or rough calculation, as of worth, quantity, or


Predictions are performed very early, before there is any empirical data on the item under analysis. Reliability assessments are made to determine the effects of certain factors on reliability and to identify failure causes. Reliability estimates are made based on empirical data. This book covers all three areas, as illustrated in Figure 1.1-3.

Figure 1.1-3: Reliability Prediction, Assessment and Estimation

Figure 1.1-4 summarizes the results of a benchmarking study of best commercial reliability practices (Reference 9). In this study, reliability predictions were identified by more than 90% of the participants as being an appropriate reliability task during the product/system development life cycle. Approximately 70% of the survey respondents felt that reliability predictions were effective, supporting the proposition that, while generally perceived as beneficial, there are problems associated with their use. This information highlights the importance that organizations often place on assessing and predicting reliability.


Figure 1.1-4: Percent of Companies Using Reliability Engineering Tools

1.2. Book Organization

Chapter 1 of this book presents background information on reliability modeling. The next section of this chapter includes a description of a typical reliability program, the intent of which is to present the elements that should be considered when developing a program, and to highlight how reliability modeling fits into such a program. Also included is a section on the history of reliability prediction, to provide a historical perspective of its evolution.

Chapter 2 covers the primary topic of this book, and includes information on the various ways in which a product can be modeled and guidance on selecting an approach. It presents a generic approach, and describes the elements of this approach.

Chapter 3 presents fundamental concepts of reliability theory, probability and statistics. In many books, these topics are presented first. However, in this book, they are presented after Chapter 2 because they are not the primary topic. Rather, they are presented to provide the foundation for the concepts used in reliability modeling. They are also the foundation for Design of Experiments (DOE) and Life Modeling techniques, which are further detailed in Chapters 4 and 5.


Approaches like using a “Multi-cell”-based designed experiment to generate data from which a life model is developed are presented in Chapter 2. Here, a generic approach to this topic is presented. Since the topic of life modeling is central to reliability modeling, important elements of it are presented in more detail in Chapters 4 and 5. One of the critical aspects of life modeling is reliability testing.

Design of Experiments is a technique to maximize the usefulness of the data resulting from DOE tests, and is the topic of Chapter 4.

Chapter 5 presents information relative to development of the mathematical models that form the basis of the reliability model, and includes information pertaining to parameter estimation.

Chapter 6 presents a variety of topics pertaining to the interpretation of reliability models. This is provided to allow the reader to gain a better appreciation for what can, and cannot, be concluded from a model.

Chapter 7 is a compilation of examples of reliability models. Presented here are the following examples:

1. A typical MIL-HDBK-217 model development process

2. Information on the development of the RIAC’s 217Plus methodology

3. A life modeling example

4. A description of RIAC’s Nonelectronic Parts Reliability Data (NPRD), provided as an example of the use of field data in reliability modeling

These examples are provided to give the reader a better appreciation for the tools, techniques and limitations of various approaches to reliability modeling.

A discussion of FMEA is presented in Chapter 8. Although FMEA is secondary to the primary intent of this book, it can form the basis for many elements of a reliability program, including reliability modeling. Therefore, Chapter 8 is intended to present FMEA concepts in this context, as well as provide practical information on performing FMEAs that this author has found to be useful.


1.3. Reliability Program Elements

To show how reliability modeling fits into a reliability program, this section presents a generic reliability program and describes its various elements. There are many possible approaches to “designing in” reliability. The specific approach used will depend on the needs of the specific organization. Figure 1.3-1 presents one possible approach, and includes the elements that should be included in all approaches. The premise of this approach is to identify the critical parts and materials which warrant detailed attention. Since it is impractical to perform some reliability modeling approaches on all system parts, it is imperative to identify the critical parts which are the highest risk. Since one of the most effective ways to verify the robustness of parts or materials is from experience, an effective reliability program must leverage knowledge gained in the development and deployment of previous systems. It will be shown that reliability assessments impact many of the elements of this approach.


Elements of the reliability program are summarized as follows:

1. Design requirements: The first step in any product development process is the identification of requirements. These requirements include items pertaining to Performance, Reliability (failure rate, life), Maintainability, Diagnostics, and Use Environment and Operational stresses (i.e., mission profiles). Typically, the medium for communicating these requirements is the product specification. While the specification usually contains details regarding the required performance of the product or system, it is often lacking relative to quantifying the reliability attributes required. The following questions should be answered to determine these reliability requirements:

• What is the required failure rate of the item in its useful life?

• What is the service life required?

• What criteria will be used to determine when the requirements are not met?

• Whose responsibility will it be to take corrective action if these requirements are not met?

• What are the operating and environmental profiles expected in field deployed conditions?

A valuable tool to assist in understanding the requirements is Quality Function Deployment (QFD).

The reliability that is considered acceptable will, of course, be specific to the industry, criticality of failure, etc. The specific value may be specified, or it may not be, depending on the industry and the maturity of the product. The range of potential customer reactions to various scenarios is summarized in Table 1.3-1.

Table 1.3-1: Ranges of Potential Customer Reactions

Outcome | Field reliability | Likely customer reaction

Best | No failures | Pleased

 | Failures occur at an acceptable rate | Tolerant

 | Recurring failures, but on a relatively small percent of items | Annoyed

 | Recurring failures on a high percent of items | Angry

Worst | An unexpected failure mechanism is discovered that will affect the entire population, or critical safety related failures |


If the requirement is not specified, an estimate of the requirement must be made so that there is a goal that can be used in the development process.

2. Initial Design: After the product requirements are understood, the design team generally derives an initial, or preliminary, design for the product or system. Inputs to this initial design should be in the form of design rules and a standard parts list. Design rules are the culmination of lessons learned from previous development activities, drawn both from empirical field or test data and from analysis. These design rules should be a living document which is continuously updated based on current information. Effective use of design rules also saves much effort, since attributes which have a reliability history or which have been previously studied do not need to be addressed in detail, thus freeing resources for the study of critical parts.

3. Similarity analysis: Once an initial design is available, a similarity analysis can be performed to identify attributes which are similar to those for which a reliability history is available, and those for which it is not. A FMEA can be a valuable technique for this analysis, and will be discussed later. In this analysis, each reliability attribute identified in the FMEA is reviewed to determine if a reliability history exists or not.

4. Identify attributes that are similar: Similar attributes are those that have a reliability history.

5. Assess robustness of attribute: If the part or attribute does have a history, previous test data or field experience data can be used to assess the robustness of the part or attribute.

6. Identify attributes that are not similar: Attributes that are not similar do not have a reliability history.

7. Perform design analysis: Although any attribute that is potentially different in the new design relative to the previous design must be analyzed, particular attention is given to the attributes that are not similar. Design techniques that are used for this purpose are FMEA, tolerance or worst case analysis, thermal analysis, stress analysis, and reliability predictions.

8. Implement corrective action: From the results of the design analysis, corrective action should be taken to improve the robustness of the design.

9. Identify critical parts/materials: Based on the results of the analysis, critical parts or materials are identified.


10. Model critical parts/materials: Once critical parts are identified, action must be taken to ensure that the parts or materials are robust enough to meet the reliability and durability requirements. More details of the approach used for this purpose will be presented later in the book.

11. Identify effective tests for non-similar attributes: Based on the identification of critical parts and the design analysis that was performed, specific tests that will assess the reliability and durability of the attribute can be determined. Part of the FMEA should include identification of stresses that will accelerate failure of the attribute under analysis; this analysis is therefore important for identifying the appropriate stress tests.

12. Develop a test plan and execute tests: Based on the design analysis performed and the identification of tests for non-similar attributes, a test plan can be determined. In the context of this approach, the goal of these tests is to assess the robustness of the product by subjecting the product to test stresses that are intended to accelerate the critical parts and non-similar attributes to failure. In addition to these tests, other test requirements should be incorporated into this test plan. These additional test requirements include any tests required by the customer, such as qualification or reliability demonstration tests.

13. Document the test results: Once the tests have been performed and the data analyzed, the results should be fully documented, since they subsequently will be used for a variety of purposes.

14. Monitor field reliability: Once the product is deployed, field reliability experience data should be carefully gathered, since it will be used for a variety of purposes. Elements of the data to be gathered include:

1. Product or system deployment history by serial number, including when deployed, when fielded

2. Failure information, including failure date, root failure cause, results of failure analysis

3. Product or system re-deployment information

15. Update reliability database: A database is required to manage the reliability data, and should include both test data and field data. This data can be used to generate a company-specific reliability prediction methodology.


16. Update Design Rules: Data acquired from tests and field surveillance should be used to update the design rules. Field data is probably the most valuable type of data for this purpose since it represents the actual product or system in the intended use environment. The process of maintaining design rules and ensuring that they are used in new designs is the cornerstone of the means by which reliability is improved in a reliability growth process.
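The field data elements listed under item 14, and the reliability database of item 15, might be represented along the following lines. This is only a minimal sketch: the class names, field names and the `operating_hours` field are hypothetical and do not come from any RIAC tool, and the point estimate assumes a constant failure rate (exponential time to failure), which is an assumption made here for illustration rather than a recommendation of the book.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class FailureEvent:
    """One field failure (hypothetical record layout)."""
    failure_date: date
    root_cause: str
    analysis_result: str = ""


@dataclass
class UnitRecord:
    """Deployment and failure history for one serial-numbered unit."""
    serial_number: str
    deployed_on: date
    operating_hours: float  # cumulative field hours; assumed to be tracked
    failures: List[FailureEvent] = field(default_factory=list)
    redeployed_on: List[date] = field(default_factory=list)


def point_estimate_failure_rate(records: List[UnitRecord]) -> float:
    """Failures per operating hour, pooled over all units.

    Assumes a constant failure rate, so the estimate is simply
    total failures divided by total operating hours.
    """
    total_hours = sum(r.operating_hours for r in records)
    if total_hours <= 0:
        raise ValueError("total operating hours must be positive")
    total_failures = sum(len(r.failures) for r in records)
    return total_failures / total_hours
```

For example, two units with 1,000 and 3,000 operating hours and two recorded failures between them give a pooled estimate of 2 / 4,000 = 0.0005 failures per hour. A real database would also need to handle censoring, repair times and differing use environments, none of which this sketch attempts.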

Critical parts are those which may result in a significant risk to the project. This risk can be related to reliability, lifetime, availability, or maintainability. Some of the factors that constitute critical parts are:

• New, unproven technology

• New, unproven manufacturing processes

• Performance limitations: stringent environmental conditions or non-robust design practices

• Reliability limitations: components/materials with life limitations

• Vendors with a past history of delivery, cost performance or reliability problems

• Old technology with availability problems

These critical parts or items warrant additional attention in assessing their reliability, as they generally will represent the greatest reliability risk.

1.4. The History of Reliability Prediction

The term “reliability prediction” has historically been used to denote the process of applying mathematical models and data for the purposes of estimating field reliability of a product or system before empirical data is available on that product or system. This section will review some of the developments in the area of reliability prediction from the 1950’s to the present. While there are several techniques available to reliability practitioners to perform reliability predictions, the discussion inevitably centers around MIL-HDBK-217 due to its historical prominence as a reliability prediction tool.

During World War II, electronic tubes were by far the most unreliable component used in DoD electronic systems. This observation led to various studies and ad hoc groups whose purpose was to identify ways that their reliability, and the reliability of the systems in which they operated, could be improved. One group in the early 1950’s concluded that:

1. There needs to be better reliability data collected from the field

2. Better components need to be developed

3. Quantitative reliability requirements need to be established

4. Reliability needs to be verified by test before full scale production

5. A permanent committee needs to be established to guide the reliability discipline

Item 5, above, was implemented in the form of the Advisory Group on Reliability of Electronic Equipment (AGREE), whose charter was to identify actions that could be taken to provide more reliable electronic equipment. This time period was the advent of the reliability engineering discipline. It soon became clear that the emerging discipline was using several different methods to achieve its goal of higher reliability. One was the identification of root causes of field failure and determination of mitigating actions. Another was the specification of quantitative reliability requirements. The specification of requirements in turn led to the desire to have a means of estimating reliability before an equipment is built and tested so that the probability of achieving its reliability goal could be estimated. This, of course, was the beginning of reliability prediction. The 1950’s also saw much pioneering work in the reliability discipline, including:

• A variety of efforts to improve device reliability through data collection and design

• The establishment of reliability programs

• Symposiums devoted to quality and reliability engineering

• Statistical techniques development such as the Weibull distribution

• Military handbooks that provided guidance on the reliable application of electronic components

In addition to these accomplishments, the 1950’s also included pioneering work in the area of quantitative reliability prediction. In 1956, RCA released TR-1100, “Reliability Stress Analysis for Electronic Equipment”, which presented mathematical models for the estimation of component failure rates. This report turned out to be the predecessor of MIL-HDBK-217.

Several additional early works in the area of reliability prediction were produced in the early 1960’s, including D.R. Erles’ report (Reference 2) and the Erles and Edins paper (Reference 3). In 1962, the first version of MIL-HDBK-217 was published by the Navy. Once issued, MIL-HDBK-217 quickly became the standard by which reliability predictions were performed, and other sources of failure rates gradually disappeared. Part of the reason for the demise of other sources was the fact that MIL-HDBK-217 was often a contractually cited document and defense contractors did not have the option of using other sources of data.


These early sources of failure rates also often included design guidance on the reliable application of electronic components. However, subsequent versions of the documents, primarily MIL-HDBK-217, would delete the application information because it was treated in more detail elsewhere.

By now, the reliability discipline was working under the tenet that reliability was a quantitative discipline that needed quantitative data sources to support its many statistically based techniques, such as allocations and redundancy modeling. However, another branch of the reliability discipline focused on the physical processes by which components were failing. The first symposium devoted to this topic was the “Physics of Failure In Electronics” Symposium sponsored by the Rome Air Development Center (RADC) and IIT Research Institute (IITRI) in 1962¹. This symposium later became known as the International Reliability Physics Symposium (IRPS). In this period of time, the two branches of reliability engineering seemed to be diverging, with the “systems” engineers devoted to the tasks of specifying, allocating, predicting and demonstrating reliability, while the physics-of-failure (PoF) engineers and scientists were devoting their efforts to identifying and modeling the physical causes of failure. Both branches were integral parts of the reliability discipline, and both were hosted at RADC (later to become Rome Laboratory). The physics-based information was necessary to develop part qualification, screening and application requirements, and the “systems” tasks of specifying, allocating, predicting and demonstrating reliability were necessary to ensure that reliability requirements were met. The component research efforts of the 1950’s and 1960’s culminated with the implementation of the “ER” and “TX” families of specifications. This complicated the issue of predicting their reliability because there were now many different combinations of quality levels and environments that needed to be addressed in MIL-HDBK-217.

In the early 1970’s, the responsibility for preparing MIL-HDBK-217 was transferred to RADC, who published revision B in 1974. However, other than the transition to RADC, the 1970’s maintained the status quo in the area of reliability prediction. MIL-HDBK-217 was updated to reflect the technology at that time, but there were few other efforts that changed the manner in which predictions were performed. One exception, however, was that there was a shift in the complexity of the models being developed for MIL-HDBK-217. There were several efforts to develop new and innovative models for reliability prediction. The results of these efforts were extremely complex models that may have been technically sound, but were criticized by the user community as being too complex, too costly, and unrealistic given the low level of detailed design information available at the point in time when the models were needed. RCA, under contract to RADC, had developed PoF-based models which were rejected as unusable, since the detailed design and construction data for microcircuits were simply unavailable to typical model users. These models were never incorporated into MIL-HDBK-217.

¹ IITRI was the original contractor of the Reliability Analysis Center (RAC). In 2005, the RAC contract was awarded as RIAC to the current team of Wyle Labs (prime), Quanterion Solutions Incorporated, the University of Maryland Center for Risk and Reliability, the Pennsylvania State Applied Research Laboratory (ARL), and the State University of New York Institute of Technology (SUNYIT).

While MIL-HDBK-217 was updated again several times in the 1980’s, there were agencies that were developing reliability prediction models unique to their industries. As an example, the automotive industry, under the auspices of the Society of Automotive Engineers (SAE) Reliability Standards Committee, developed a series of models specific to automotive electronics. The SAE committee felt that there were no existing prediction methodologies that were applicable to the specific quality levels and environments of automotive applications. The Bellcore reliability prediction standard is another example of a specific industry developing methodologies for its unique conditions and equipment. It originally was developed by modifying MIL-HDBK-217 to better reflect the conditions of interest of the telecommunications industry. It has since taken on its own identity with models derived from telecommunications equipment and is now used widely within that industry.

The 1980’s also saw explosive growth in integrated circuit technology. Very dense circuits were being fabricated using feature sizes as small as 0.5 microns. This presented unique challenges to reliability modelers. The VHSIC (Very High Speed Integrated Circuit) program was the government’s attempt to leverage the technological advancements of the commercial industry and, at the same time, produce circuits capable of meeting the unique requirements of military applications. From the VHSIC program came the Qualified Manufacturers List (QML) - a qualification methodology that qualified an integrated circuit manufacturing line, unlike the traditional qualification of specific parts. The government realized that it needed a QML-like process if it were to leverage the advancements in commercial technologies and, at the same time, have a timely and effective qualification scheme for military parts. A reliability prediction model was also developed for VHSIC devices in 1989 (Reference 9) in support of a MIL-HDBK-217 update. An interesting observation was made during that study that deviated from the premise on which most of the MIL-HDBK-217 models were based. The traditional approach to developing models was to collect as much field failure rate data as possible, statistically analyze it, and quantify model factors based on the results of the statistical analysis. For integrated circuits, one of the factors that was quantified was inevitably device complexity. This complexity was measured by the number of gates or transistors and was the primary factor on which the models were based. The correlation between failure rate and complexity was strong and could be quantified because the failure rates of circuits were much higher than they are today and the defect rate was directly proportional to the complexity. As technology advanced, the gate or transistor count became so high that it could no longer effectively be used as the measure of complexity in a reliability model. Furthermore, transistor or gate count data was often difficult or impossible to obtain. Therefore, the model developed for VHSIC microcircuits needed another measure of complexity on which to base the model. The best measures, and the ones most highly correlated to reliability, are defect density and silicon area. It can be shown that the failure rate (for small cumulative percent failure) is directly proportional to the product of the area and defect density. However, another factor that is highly correlated to defect density and area is the yield of the die, or the percent of die that are functional upon manufacture. Ideally, a reliability model would use either yield or defect density/area as the primary factor(s) on which to base the model. The problem in using these factors in a model is that they are considered highly proprietary parameters from a market competition viewpoint and, therefore, are rarely released by the manufacturers. Therefore, the single most important driver of reliability cannot be obtained by the user of the device, which is unfortunate because the accuracy of the model suffers. The conflict between the usability of a model and its accuracy has always been a difficult tradeoff to address for model developers.
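The proportionality between failure rate and the product of area and defect density can be illustrated with a short derivation. This is a sketch under assumptions that are not stated in the text above: the number of latent defects on a die is taken to be Poisson distributed with mean $A D$ (die area $A$, latent defect density $D$), and each defect is taken to cause failure independently, with cumulative probability $G(t)$ by time $t$ and density $g(t) = dG/dt$. The symbols $G$, $g$ and $D_0$ are introduced here only for illustration.

```latex
% Assumed model: N ~ Poisson(A D) latent defects per die; each defect has
% independently caused a failure by time t with probability G(t).
\begin{align}
R(t) &= \sum_{n=0}^{\infty} \frac{(A D)^{n} e^{-A D}}{n!}\,\bigl[1 - G(t)\bigr]^{n}
      = e^{-A D\, G(t)} \\
F(t) &= 1 - R(t) \approx A D\, G(t)
      \qquad \text{for } A D\, G(t) \ll 1 \\
\lambda(t) &= -\frac{d}{dt} \ln R(t) = A D\, g(t)
\end{align}
% Under the same Poisson assumption, die yield with yield-defect density D_0
% is Y = e^{-A D_0}, which is why yield tracks the same A-times-D product.
```

Under these assumptions both the cumulative fraction failing (while it is small) and the failure rate scale directly with $A D$, which is consistent with the statement above and with the observation that yield is highly correlated with the same two quantities.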

Much of the literature in the 1990’s on the topic of reliability prediction has centered around the debate as to whether the reliability discipline should focus on PoF-based or empirically-based models (such as MIL-HDBK-217) for the quantification of reliability. In the author’s opinion, many of the primary criticisms of MIL-HDBK-217 stem from the fact that it was often used for purposes for which it was not intended. For example, it was often used as a means by which the reliability of a product was demonstrated. Since its use was contractually required, contractors would try to demonstrate compliance to the specified reliability requirements by “adjusting” factors in the model to make it appear that the reliability would meet requirements. Sometimes these adjustments had a technical basis, and sometimes they did not. Les Gubbins, one of the government’s first project managers for the handbook, once made the analogy that engaging in the use of these adjustment factors is like pushing the needle on your car’s speedometer up, and convincing yourself you’re going faster. This, of course, is not good engineering practice, but rather was done for nontechnical reasons.

Another key development in the area of reliability predictions was related to the implications of acquisition reform. In 1994, Military Specifications and Standards Reform (MSSR) was initiated which decreed the adoption of performance-based
