• No results found

Reliability Engineering Notes

N/A
N/A
Protected

Academic year: 2021

Share "Reliability Engineering Notes"

Copied!
40
0
0

Loading.... (view fulltext now)

Full text

(1)

Roozbeh Izadi-Zamanabadi

Roozbeh Izadi-Zamanabadi

Lecture Notes - Practical

Lecture Notes - Practical

Approach to Reliability, Safety,

Approach to Reliability, Safety,

and Active Fault-tolerance

and Active Fault-tolerance

December 11, 2000

December 11, 2000

Aalborg University

Aalborg University

Department of Control Engineering

Department of Control Engineering

Fredrik Bajers Vej 7C

Fredrik Bajers Vej 7C

DK-9220 Aalborg

DK-9220 Aalborg

Denmark 

(2)

Table of Contents

Table of Contents

1.

1. INTINTRORODUDUCTCTIONION 44

1.1

1.1 TTermerminoinologlogyy .. . . 44

2.

2. RELRELIABIIABILITY LITY AND AND SAFSAFETYETY 66

2.1

2.1 RelReliabiabilitilityy . . . 66 2.1

2.1.1 .1 BasBasic deic definifinitiontions . . . s . . . 66 2.1.2

2.1.2 ConstaConstant nt failurfailure e rate rate model model and and the the expoexponentianential l distribdistributions utions 88 2.1

2.1.3 .3 MeaMean tn time ime betbetweeween fn failuailure wre with ith conconstastant nt faifailure lure raterate . . . 99 2.2

2.2 SafSafety . . . ety . . . 1212 2.3

2.3 HazHazard Aard Analnalysiysis s . . . 1313 2.3

2.3.1 .1 SteSteps in tps in the Hahe Hazarzard and analyalysis psis procrocess ess . . . 1414 2.3

2.3.2 .2 SysSystem motem model . . . . del . . . 1515 2.3

2.3.3 .3 HazHazard lard levevel el . . . 1515 2.4

2.4 HazarHazard anald analysis mysis models odels and teand techniqchniques ues . . . 1717 2.4

2.4.1 .1 FaiFailure lure ModMode ane and Efd Effecfect Anat Analyslysis (Fis (FMEAMEA)) . . . 1717 2.4.2

2.4.2 FailuFailure Mre Mode, ode, EffecEffect at and nd CriticaCritically Ally Analysnalysis (Fis (FMECA) . . MECA) . . . 1818 2.4.3

2.4.3 Fault TFault Tree Anaree Analysis (lysis (FTFTA) . A) . . . 1919 2.4

2.4.4 .4 RisRisk k . . . 2323 2.5

2.5 ReliabReliability anility and safd safety asety assesssessment - ament - an ovn overvieerview w . . . 2626 2.5

2.5.1 .1 RelReliabiabilitility Assy Assessessmenmentt . . . 2626 2.5

2.5.2 .2 SteSteps in rps in reliaeliabilibility asty assessessmesmentnt . . . 2626 2.5

2.5.3 .3 SafSafety Aety Assessessmssment ent . . . 2626 2.5

2.5.4 .4 SteSteps in sps in safeafety asty assessessmesments . . . . nts . . . 2626

3.

3. ACACTIVE TIVE FFAULAULTT-TOLE-TOLERANT RANT SYSTEM SYSTEM DESIGNDESIGN 2828 3.1

3.1 IntrIntroduoductioction n . . . 2828 3.2

3.2 TType oype of fauf faults . . . lts . . . 2828 3.3

3.3 HarHardwadware fre faulault-tot-tolerlerancance e . . . 2929 3.3.1

3.3.1 MechaMechanical nical and Eleand Electricactrical systel systems . ms . . . 3131 3.4

3.4 ReliabReliability coility considensiderationrations for s for standstandby conby configuratfigurations . . . ions . . . 3131 3.5

3.5 SofSoftwatware fare fault-ult-toletoleranrance . . . ce . . . 3535 3.6

3.6 ActivActive faue fault-tolelt-tolerant rant controcontrol sysl system destem design ign . . . 3636 3.6.1

3.6.1 JustifiJustificatiocation for apn for applying plying fault-fault-toleratolerance . . . nce . . . 3636

                                                                                                                                                             

(3)

T

Tabablle e oof f CCoonntetenntts s 33

3.6.2

3.6.2 A proceduA procedure for desigre for designing actning active fauive fault-tolerlt-tolerant (conant (control)trol) sys

(4)

1. INTRODUCTION

1. INTRODUCTION

The fundamental objective of the combined safety and Reliability The fundamental objective of the combined safety and Reliability assess-ment is to identify critical items in the design and the choice of equipassess-ment ment is to identify critical items in the design and the choice of equipment that

that maymay jeopajeopardizerdize safetsafetyy or or avavailabiailabilitylity,, andand theretherebyby to to proviprovidede arguargumentsments for the selection between different options for the system.

for the selection between different options for the system.

Achieving safety and reliability has been one the prime objectives for system Achieving safety and reliability has been one the prime objectives for system designers while designing safety critical system for dec

designers while designing safety critical system for decades. With growingades. With growing environ- environ-mental awareness, concerns, and demands, the scope of the design of reliable (and mental awareness, concerns, and demands, the scope of the design of reliable (and safe) sys

safe) systems has betems has beenen enhanenhancedced to eveto evenn small comsmall componenponentsts as sensoas sensors andrs and actuaactuators.tors. In the past, the normal procedure to address the higher demand for reliability was In the past, the normal procedure to address the higher demand for reliability was to add hardware redundancy that in turn increases the production and maintenance to add hardware redundancy that in turn increases the production and maintenance costs. Active fault-tolerant design is an attempt to achieve higher redundancy while costs. Active fault-tolerant design is an attempt to achieve higher redundancy while minimizing the costs.

minimizing the costs.

In chapter 2 reliability and safety related issues are considered and described. In chapter 2 reliability and safety related issues are considered and described. The idea of introducing this chapter is to provide an overview of the concepts and The idea of introducing this chapter is to provide an overview of the concepts and methods used for reliability and safety assessment.

methods used for reliability and safety assessment.

The focus in chapter 3 is on fault-tolerance concept. Type of possible faults in The focus in chapter 3 is on fault-tolerance concept. Type of possible faults in components and customary methods for applying redundancy is described. Finally, components and customary methods for applying redundancy is described. Finally, the chapter is wrapped up by considering and describing the main subject, which is the chapter is wrapped up by considering and describing the main subject, which is a formal and consistent procedure to design active fault-tolerant systems.

a formal and consistent procedure to design active fault-tolerant systems.

1.1 Terminology

1.1 Terminology

Definition of the used notations in this report is provided in this section. Definition of the used notations in this report is provided in this section.

 Availability:

 Availability: The probability that a The probability that a systesystem m is is perforperforming satisfming satisfactorilactorilyy at time

at time ..

Fault-tolerance:

Fault-tolerance: Ability to tolerate and accommodate for a fault with orAbility to tolerate and accommodate for a fault with or without performance degradation.

without performance degradation.

Fail-operational:

Fail-operational: The component/unit stays operational after one failure.The component/unit stays operational after one failure.

(5)

1

1..1 1 TTeerrmmiinnoollooggy y 55

Fail-safe:

Fail-safe: TheThe compocomponent/unent/unitnit proceprocesses dsses directlyirectly to a predto a predefinedefined safesafe state (actively or passively) after one (or several) fault has state (actively or passively) after one (or several) fault has occurred.

occurred.

 Hazard:

 Hazard: the presence of any process abnormality which representsthe presence of any process abnormality which represents a dangerous or potentially dangerous state.

a dangerous or potentially dangerous state.

 MTBF:

 MTBF: Mean time between failureMean time between failure

 MTTR:

 MTTR: Mean time to repairMean time to repair

 Reliability:

 Reliability: is the characteristic of an item expressed by the probabil-is the characteristic of an item expressed by the probabil-ity that it will pe

ity that it will performrform its requirits requireded functfunctionion in the specin the specifiedified manne

manner r oveover r a a givegiven time n time period and under specifieperiod and under specified d andand assumed conditions.

assumed conditions.

 Risk:

 Risk: (normally) refers to the(normally) refers to thestatistical annual statistical annual frequenfrequencycy of theof the particular hazard state. Ex. once per year, or once per particular hazard state. Ex. once per year, or once per 10000 years.

10000 years.

 Risk

 Risk reductireduction:on: Reduce the probability of hazardous eventsReduce the probability of hazardous events

Safety:

Safety: is freedom from accident or losses.is freedom from accident or losses.

Safety

Safety integriintegrity:ty: is is the probability of the probability of a a safetsafety-relaty-related system ed system satisfsatisfactorilactorilyy performing the required safety functions under all stated performing the required safety functions under all stated conditions within a stated period of time.

conditions within a stated period of time.

Safety Integrity Level:

Safety Integrity Level: AAveveragragee proprobabbabilitilityy ofof faifailurluree toto perperformform itsits desdesignigneded funfunc- c-tion on demand (

tion on demand (SILSIL).).

Potential Hazard states, are most often identified by carrying out hazard and Potential Hazard states, are most often identified by carrying out hazard and operational studies known as

operational studies known as HAZOPHAZOPs.s.

£ £ 

(6)

2. RELIABILITY AND SAFETY

2. RELIABILITY AND SAFETY

2.1 Reliability

2.1 Reliability

Reliability engineering is concerned primarily with failures and failure rate Reliability engineering is concerned primarily with failures and failure rate reduc-tion. The reliability engineering approach to safety thus concentrate on failures as tion. The reliability engineering approach to safety thus concentrate on failures as cause of accidents. Reliability engineers use a variety of techniques to minimize cause of accidents. Reliability engineers use a variety of techniques to minimize component failure (Leveson 1995).

component failure (Leveson 1995).

Definition of reliability embraces the clear-cut criterion for failure, from which Definition of reliability embraces the clear-cut criterion for failure, from which we may judge at what point the system is no longer functioning properly.

we may judge at what point the system is no longer functioning properly.

2.1.1

2.1.1 Basic Basic definitionsdefinitions

Reliability is defined as the probability that a system survives for some specified Reliability is defined as the probability that a system survives for some specified period of time. It may be expressed in term of the random

period of time. It may be expressed in term of the random , the, thetime-to-system- time-to-system- failure

 failure. The probability density function (PDF),. The probability density function (PDF), , has the physical meaning:, has the physical meaning: Probability that failure takes place Probability that failure takes place

at a time between

at a time between andand (2.1)(2.1) for vanishing small

for vanishing small . The cumulative distribution function (CDF),. The cumulative distribution function (CDF), has nowhas now the following meaning:

the following meaning:

(2.2) (2.2) The

Thereliabilityreliabilityis defined as:is defined as:

Probability that a system operates Probability that a system operates without failure for a length of time

without failure for a length of time (2.3)(2.3) Since a system that does not fail for

Since a system that does not fail for must fail at somemust fail at some , one get:, one get:

(2.4) (2.4) or equivalently either or equivalently either ؠؠ    ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ  ¡¡ ؠؠ    È È     ØØ      ؠؠ    ؠؠ · ·  ¡¡ ؠؠ            ؠؠ ¡¡ ؠؠ      ¡¡ ؠؠ    ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µµ    È È     ؠؠ    ؠؠ           

Probability that failure takes place Probability that failure takes place

at a time less

at a time less than or equal tothan or equal toؠؠ

     Ê Ê  ´ ´  ؠؠ µµ    È È     ؠؠ  ؠؠ            ؠؠ      ؠؠ    ؠؠ ؠؠ  ؠؠ Ê Ê  ´ ´  ؠؠ µµ  ½ ½        ´ ´  ؠؠ µ µ 

(7)

2 2..1 1 RReelliiaabbiilliitty y 77 (2.5) (2.5) or or (2.6) (2.6) Following properties are clear:

Following properties are clear:

and

and (2.7)(2.7)

One can see from equation 2.4 that reliability is the complementary cumulative One can see from equation 2.4 that reliability is the complementary cumulative dis-tribu

tribution function (CCDF), tion function (CCDF), that is,that is, . Similarly, since. Similarly, since is theis the probability that the system will fail before

probability that the system will fail before , it is often referred to as the unreli-, it is often referred to as the unreli-ability or failure probunreli-ability, i.e.,

ability or failure probability, i.e., ..

Equation 2.5 may be inverted by differentiation to give the PDF of failure times Equation 2.5 may be inverted by differentiation to give the PDF of failure times in terms of the reliability:

in terms of the reliability:

(2.8) (2.8) One can gain insight into failure mechanisms by examining the behavior of the One can gain insight into failure mechanisms by examining the behavior of the failure rate. The

failure rate. The failure ratefailure rate,, , may be defined in terms of the reliability or the, may be defined in terms of the reliability or the PDF of the time-to-failure as follows. Let

PDF of the time-to-failure as follows. Let be the probability that the systembe the probability that the system will fail at some time

will fail at some time given that it has not yet failed at timegiven that it has not yet failed at time . Thus it is. Thus it is the conditional probability

the conditional probability

(2.9) (2.9) Following the definition of conditional probability we have:

Following the definition of conditional probability we have:

(2.10) (2.10) The numerator of equation 2.10 is just an alternative way of writing the PDF, i.e.: The numerator of equation 2.10 is just an alternative way of writing the PDF, i.e.:

(2.11) (2.11) The De-numerator of equation 2.10 is just R(t) (see equation 2.3). Therefore, by The De-numerator of equation 2.10 is just R(t) (see equation 2.3). Therefore, by combining equations, one obtains:

combining equations, one obtains:

(2.12) (2.12) This quality, the failure rate, is also referred to as the

This quality, the failure rate, is also referred to as thehazard hazard orormortality ratemortality rate.. The

The mosmostt useusedd way way toto exexprepressss the the relreliabiabilitilityy andand thethe faifailurelure PDF PDF is is inin tertermsms ofof faifailurluree rate. To do this, we first eliminate

rate. To do this, we first eliminate from Eq. 2.12 by inserting Eq. 2.8 to obtainfrom Eq. 2.12 by inserting Eq. 2.8 to obtain the failure rate in terms of the reliability,

the failure rate in terms of the reliability,

Ê Ê  ´ ´  ؠؠ µµ  ½ ½          ؠؠ ¼ ¼     ´ ´  ؠؠ ¼ ¼  µ µ   ؠؠ ¼ ¼  Ê Ê  ´ ´  ؠؠ µµ         ½ ½  ؠؠ    ´ ´  ؠؠ ¼ ¼  µ µ   ؠؠ ¼ ¼  Ê Ê  ´´ ¼¼ µµ  ½½    Ê Ê  ´ ´  ½  ½   µµ  ¼ ¼  Ê Ê  ´ ´  ؠؠ µµ  ½ ½        ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ  ؠؠ    ؠؠ    ´ ´  ؠؠ µµ  ½ ½     Ê Ê  ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µµ           ؠؠ Ê Ê  ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ  ¡¡ ؠؠ ؠؠ  ؠؠ · ·  ¡¡ ؠؠ ؠؠ    ´ ´  ؠؠ µ µ  ¡¡ ؠؠ    È È     ؠؠ  ؠؠ · ·  ¡¡ ؠؠ   Ø Ø   ؠؠ       È È     ؠؠ  ؠؠ · ·  ¡¡ ؠؠ    ؠؠ  ؠؠ       È È     ´ ´  ؠؠ  ؠؠ µ µ     ´ ´  ؠؠ  ؠؠ · ·  ¡¡ ؠؠ µ µ     È È     ؠؠ  ؠؠ       È È     ´ ´  ؠؠ  ؠؠ µ µ     ´ ´  ؠؠ  ؠؠ · ·  ¡¡ ؠؠ µ µ        È È     ØØ      ؠؠ  ؠؠ · ·  ¡¡ ؠؠ          ´ ´  ؠؠ µ µ  ¡¡ ØØ       ´ ´  ؠؠ µµ       ´ ´  ؠؠ µ µ  Ê Ê  ´ ´  ؠؠ µ µ        ´ ´  ؠؠ µ µ 

(8)

(2.13) (2.13) Then by multiplying both side of the equation by

Then by multiplying both side of the equation by and integrating between zeroand integrating between zero and

and , we obtain, we obtain

(2.14) (2.14) since

since

exponentiating the results exponentiating the results

(2.15) (2.15) The probability density function

The probability density function is obtained by inserting Eq. 2.15 into Eq.is obtained by inserting Eq. 2.15 into Eq. 2.12 and solve for

2.12 and solve for

(2.16) (2.16) The most used parameter to characterize reliability is the

The most used parameter to characterize reliability is the mean time to failuremean time to failureoror MTTF. It is just the expected or mean value of 

MTTF. It is just the expected or mean value of  of the failure at timeof the failure at time . Hence. Hence MTTF

MTTF (2.17)(2.17)

The MTTF may be written directly in terms of the reliability by substituting Eq. 2.8 The MTTF may be written directly in terms of the reliability by substituting Eq. 2.8 and integration by parts:

and integration by parts: MTTF

MTTF (2.18)(2.18)

The term

The term vanishes atvanishes at . Similarly, from Eq. 2.15, we see that. Similarly, from Eq. 2.15, we see that willwill decay exponentially, since the failure rate

decay exponentially, since the failure rate greater than zero. Thusgreater than zero. Thus as

as . Therefore, we have. Therefore, we have MTTF MTTF

2.1.2

2.1.2 Constant failure ratConstant failure rate model and the exponential distributie model and the exponential distributionsons

Random failures that give rise to the constant failure rate model are the most widely Random failures that give rise to the constant failure rate model are the most widely used basis for desc

used basis for describingribing reliabreliability phenoility phenomena.mena. TheyThey are definedare defined by the assumptby the assumptionion that the rate at which

that the rate at which the system fails is indepethe system fails is independentndent of its age. For of its age. For continuouslycontinuously op- op-erating systems this implies a constant failure rate.

erating systems this implies a constant failure rate.

The constant failure rate model for continuously operating systems leads to an The constant failure rate model for continuously operating systems leads to an ex-ponential distribution. Replacing the time dependent failure rate

ponential distribution. Replacing the time dependent failure rate by a constantby a constant in Eq. 2.16 yields, for the FDP,

in Eq. 2.16 yields, for the FDP,

   ´ ´  ؠؠ µµ       ½ ½  Ê Ê  ´ ´  ؠؠ µ µ      ؠؠ Ê Ê  ´ ´  ؠؠ µ µ      ؠؠ ؠؠ      ؠؠ ¼ ¼     ´ ´  ؠؠ ¼ ¼  µ µ   ؠؠ ¼ ¼        ÐÐ ÒÒ    Ê Ê  ´ ´  ؠؠ µµ    Ê Ê 

´´ ¼¼ µµ  ½½ . The desired expression for reliability can hence be derived by. The desired expression for reliability can hence be derived by  

Ê Ê  ´ ´  ؠؠ µµ   ÜÜ ÔÔ               ؠؠ ¼ ¼     ´ ´  ؠؠ ¼ ¼  µ µ   ؠؠ ¼ ¼           ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ  ::    ´ ´  ؠؠ µµ       ´ ´  ؠؠ µµ  ÜÜ ÔÔ               ؠؠ ¼ ¼     ´ ´  ؠؠ ¼ ¼  µ µ   ؠؠ ¼ ¼              ؠؠ    ؠؠ         ½ ½  ¼ ¼  ØØ    ´ ´  ؠؠ µ µ   ØØ               ½ ½  ¼ ¼  ؠؠ  Ê Ê   ؠؠ  ؠؠ       ØØ Ê Ê  ´ ´  ؠؠ µ µ  ¬ ¬  ¬ ¬  ¬ ¬  ¬ ¬  ½ ½  ¼ ¼  · ·       ½ ½  ¼ ¼  Ê Ê  ´ ´  ؠؠ µ µ   ؠؠ ØØ Ê Ê  ´ ´  ؠؠ µ µ  ؠؠ  ¼ ¼  Ê Ê  ´ ´  ؠؠ µ µ     ´ ´  ؠؠ µ µ  ØØ Ê Ê  ´ ´  ؠؠ µ µ       ¼ ¼  ؠؠ  ½  ½           ½ ½  ¼ ¼  Ê Ê  ´ ´  ؠؠ µ µ   ؠؠ (2.19)(2.19)    ´ ´  ؠؠ µ µ    

(9)

2

2..1 1 RReelliiaabbiilliitty y 99

(2.20) (2.20) Similarly, the CDF becomes

Similarly, the CDF becomes

(2.21) (2.21) and reliability may be written as (see Eq. 2.4)

and reliability may be written as (see Eq. 2.4)

(2.22) (2.22) The MTTF and the variance of the failure times are also given in terms of  The MTTF and the variance of the failure times are also given in terms of  .. From Eq. 2.19 we obtain

From Eq. 2.19 we obtain

MTTF MTTF and the variance is found to be:

and the variance is found to be:

(2.24) (2.24)

2.1.3

2.1.3 Mean time between failMean time between failure with constant fure with constant failure rateailure rate

In many situations failure does not constitute the end of life. Rather, the system In many situations failure does not constitute the end of life. Rather, the system is immediately replaced or repaired and operation continues. In such situations a is immediately replaced or repaired and operation continues. In such situations a number of new pieces of information become important. We may want to know the number of new pieces of information become important. We may want to know the expected number of failures over some specified period of time in order to estimate expected number of failures over some specified period of time in order to estimate the cost of replacement parts. More important, it may be necessary to estimate the the cost of replacement parts. More important, it may be necessary to estimate the probability that more than a specific number of failures

probability that more than a specific number of failures will occur over a periodwill occur over a period of time. Such information allows us to maintain an adequate inventory of repair of time. Such information allows us to maintain an adequate inventory of repair parts.

parts.

In modeling these situations, one restrict his/her attention to the constant failure In modeling these situations, one restrict his/her attention to the constant failure rate approximation. In this the failure rate is often given in terms of the

rate approximation. In this the failure rate is often given in terms of the mean timemean time between failure

between failure (MTBF), as opposed to the mean time to failure or MTTF. In fact(MTBF), as opposed to the mean time to failure or MTTF. In fact they are both the same number if, when a system fails it is assumed to be repaired they are both the same number if, when a system fails it is assumed to be repaired immediately to an as-good-as-new condition.

immediately to an as-good-as-new condition. W

We first conside first considerer the times at whthe times at whichich the failuthe failuresres take platake place,ce, and therand thereforeefore the numbthe numberer that occur within any given span of the time. Suppose that we let

that occur within any given span of the time. Suppose that we let be a discretebe a discrete random variable representing the number of failures that take place between random variable representing the number of failures that take place between and a time

and a time . Let. Let

(2.25) (2.25) be the probability that exactly

be the probability that exactly

we start counting failures at time zero, we must have we start counting failures at time zero, we must have

(2.26) (2.26) (2.27) (2.27)    ´ ´  ؠؠ µµ            ؠؠ       ´ ´  ؠؠ µµ  ½ ½            ؠؠ    Ê Ê  ´ ´  ؠؠ µµ           ؠؠ          ½ ½        (2.23)(2.23)    ¾ ¾   ½ ½      ¾ ¾     Æ Æ  Ò Ò  ؠؠ  ¼ ¼  ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ    È È     Ò Ò     Ò Ò    Ø Ø    

Ò Ò  failures have taken place before timefailures have taken place before time . Clearly, if Ø Ø  . Clearly, if 

 Ô  Ô  ¼ ¼  ´´ ¼¼ µµ  ½½        Ô  Ô  Ò Ò  ´´ ¼¼ µµ  ¼¼     Ò Ò   ¼ ¼     ½ ½     ¾ ¾             ½  ½  

(10)

In addition, at any time In addition, at any time

(2.28) (2.28) For small

For small , let failure, let failure be the be the probaprobability that thebility that the th failure will th failure will taketake place during the time increment between

place during the time increment between andand , given that exactly, given that exactly failuresfailures have

have taken place taken place beforebefore timetime . Then the probab. Then the probabilityility that no failure wilthat no failure will occurl occur duringduring before

before my be written asmy be written as

(2.29) (2.29) Then noting that

Then noting that

(2.30) (2.30) we obtain the simple differential equation

we obtain the simple differential equation

(2.31) (2.31) Using the initial condition, Eq. 2.26, we find

Using the initial condition, Eq. 2.26, we find

(2.32) (2.32) With

With determined, we may now solve successively fordetermined, we may now solve successively for in the following manner.We first observe that if 

in the following manner.We first observe that if  failures have taken place beforefailures have taken place before time

time , the probab, the probabilityility that thethat the th failure will take place betweenth failure will take place between andand is

is . Therefore, since this transition probability is independent of the number of . Therefore, since this transition probability is independent of the number of  previous failures, we may write

previous failures, we may write

The last term accounts for the probability that no failure takes place during

The last term accounts for the probability that no failure takes place during . For. For sufficiently small

sufficiently small we can ignore the possibility of two or more failures takingwe can ignore the possibility of two or more failures taking place.

place.

Using the definition of the derivative once again, we may reduce Eq.

Using the definition of the derivative once again, we may reduce Eq. ????to the dif-to the dif-ferential equation

ferential equation

(2.34) (2.34) with the following solution

with the following solution

½ ½         Ò Ò   ¼ ¼   Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ  ½ ½     ¡¡ ؠؠ  ¡¡ ؠؠ ´ ´  Ò Ò  ·· ½½ µ µ  ؠؠ ؠؠ · ·  ¡¡ ؠؠ Ò Ò  ؠؠ

¡¡ ؠؠ isis . From this we see that the probability that no failures have occurred. From this we see that the probability that no failures have occurred ½ ½      ¡¡ ؠؠ ؠؠ · ·  ¡¡ ؠؠ  Ô  Ô  ¼ ¼  ´ ´  ؠؠ · ·  ¡¡ ؠؠ µµ  ´´ ½½       ¡¡ ؠؠ µ µ   Ô  Ô  ¼ ¼  ´ ´  ؠؠ µ µ         ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ  ÐÐ ÑÑ    ¡¡ ؠؠ    ¼ ¼   Ô  Ô  Ò Ò  ´ ´  ؠؠ · ·  ¡¡ ؠؠ µ µ      Ô  Ô  ¼ ¼  ´ ´  ؠؠ µ µ  ¡¡ ؠؠ     ؠؠ  Ô  Ô  ¼ ¼  ´ ´  ؠؠ µµ        Ô Ô  ¼ ¼  ´ ´  ؠؠ µ µ   Ô  Ô  ¼ ¼  ´ ´  ؠؠ µµ           ؠؠ     Ô  Ô  ¼ ¼  ´ ´  ؠؠ µ µ   Ô  Ô  Ò Ò  ´ ´  ؠؠ µ µ   Ò Ò   ½ ½     ¾ ¾     ¿ ¿         Ò Ò  ؠؠ ´ ´  Ò Ò  ·· ½½ µ µ  ؠؠ ؠؠ · ·  ¡¡ ؠؠ  ¡¡ ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ · ·  ¡¡ ؠؠ µµ     ¡¡ ؠؠ µ µ   Ô  Ô  Ò Ò     ½ ½  ´ ´  ؠؠ µµ ·· ´´ ½½       ¡¡ ؠؠ µ µ   Ô  Ô  Ò Ò  ´ ´  ؠؠ µ µ     (2.33)(2.33) ¡¡ ؠؠ ¡¡ ؠؠ     ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ        Ô Ô  Ò Ò  ´ ´  ؠؠ µµ · ·   Ô Ô  Ò Ò     ½ ½  ´ ´  ؠؠ µ µ      ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ µ µ      Ô  Ô  Ò Ò  ´´ ¼¼ µµ             ؠؠ  Ô  Ô  Ò Ò     ½ ½  ´ ´  ؠؠ ¼ ¼  µ µ      ؠؠ ¼ ¼   ؠؠ ¼ ¼     (2.35)(2.35)

(11)

2

2.1 .1 RReeliliaabbililitity y 1111

But since from Eq. 2.27 But since from Eq. 2.27

(2.36) (2.36) This recursive relationship allows us to calculate the

This recursive relationship allows us to calculate the , insert, insert Eq. 2.32 on the right-hand side and carry out the integral to obtain

Eq. 2.32 on the right-hand side and carry out the integral to obtain

(2.37) (2.37) and for all

and for all the Eq. 2.36 is satisfied bythe Eq. 2.36 is satisfied by

(2.38) (2.38) and these quantities in turn satisfy the initial conditions given by Eqs. 2.26 and and these quantities in turn satisfy the initial conditions given by Eqs. 2.26 and 2.27.

2.27.

The probabilities

The probabilities are the same as the Poisson distribution withare the same as the Poisson distribution with . Thus. Thus it is possible to determine the mean and the variance of the number

it is possible to determine the mean and the variance of the number of eventsof events occurring over a time span

occurring over a time span . Thus the expected number of failures during time. Thus the expected number of failures during time isis

and the variance of  and the variance of  isis

(2.40) (2.40) Since

Since are the probability mass functions of a discrete variableare the probability mass functions of a discrete variable , we must, we must have,

have,

(2.41) (2.41) The number of failures can be related to the mean time between failures by

The number of failures can be related to the mean time between failures by

MTBF

MTBF (2.42)(2.42)

This is the expression that relates

This is the expression that relates and the MTBF assuming a constant failureand the MTBF assuming a constant failure rate.

rate.

In general, the MTBF may be determined from In general, the MTBF may be determined from

MTBF

MTBF (2.43)(2.43)

where

where , the number of failures, is large., the number of failures, is large. The probability that more than

The probability that more than failures have occurred isfailures have occurred is

 Ô  Ô  Ò Ò  ´´ ¼¼ µµ  ¼¼ , we have, we have    Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ            ؠؠ      ؠؠ ¼ ¼   Ô  Ô  Ò Ò     ½ ½  ´ ´  ؠؠ ¼ ¼  µ µ      ؠؠ ¼ ¼   ؠؠ ¼ ¼      Ô  Ô 

Ò Ò  successively. Forsuccessively. For

 Ô  Ô  ½ ½   Ô  Ô  ½ ½  ´ ´  ؠؠ µµ     ØØ        ؠؠ Ò Ò     ¼ ¼   Ô  Ô  ½ ½  ´ ´  ؠؠ µµ    ´ ´   ؠؠ µ µ  Ò Ò  Ò Ò            ؠؠ  Ô  Ô  Ò Ò  ´ ´  ؠؠ µ µ         ؠؠ Ò Ò  ؠؠ ؠؠ    Ò Ò             Ò Ò         ØØ    (2.39)(2.39) Ò Ò     ¾ ¾  Ò Ò      ØØ     Ô  Ô  Ò Ò  ´ ´  ؠؠ µ µ  Ò Ò  ½ ½         Ò Ò   ¼ ¼   Ô  Ô  Ò Ò  ´ ´  ؠؠ µµ  ½ ½     Ò Ò     ؠؠ    Ò Ò     ؠؠ Ò Ò     Ò Ò  Æ Æ 

(12)

(2.44) (2.44) Instead of writing this infinite series, however, we may use Eq. 2.41 to write

Instead of writing this infinite series, however, we may use Eq. 2.41 to write

(2.45) (2.45)

2.2 Safety

2.2 Safety

While using various techniques are often effective in increasing reliability, they do While using various techniques are often effective in increasing reliability, they do not necessarily increase safety. Under some conditions

not necessarily increase safety. Under some conditions they may even reduce safetythey may even reduce safety.. For example, increasing

For example, increasing the burst-pressure to working-presthe burst-pressure to working-pressuresure ratio of a tank intro-ratio of a tank intro-duces new dangers of an explosion or chemical reaction in the event of a rapture. duces new dangers of an explosion or chemical reaction in the event of a rapture. Assuming that reliability and safety are synonymous is hence true in special cases. Assuming that reliability and safety are synonymous is hence true in special cases. In general, safety has a broader scope than failures, and failures may not In general, safety has a broader scope than failures, and failures may not compro-mise safety. There is obviously an overlap between reliability and safety (Leveson mise safety. There is obviously an overlap between reliability and safety (Leveson 1995) (figure 2.1); manly acciden

1995) (figure 2.1); manly accidents may occur without componentts may occur without component failure - the indi-failure - the

indi-Safety

Safety

Reliability

Reliability

Fig. 2.1.

Fig. 2.1.Safety and reliability are overlapping, but are not identical.Safety and reliability are overlapping, but are not identical.

vidua

vidual componenl componentsts where operwhere operatingating exacexactly as specified or intendedtly as specified or intended,, that is, withoutthat is, without fai

failurlure.e. TheThe oppopposiositete is also trueis also true - - ComComponponententss maymay faifail l withwithoutout aa resresultultinging accaccideident.nt. Accidents may be caused by equipment operation outside the parameters and time Accidents may be caused by equipment operation outside the parameters and time limits upon which the reliability analysis are based. Therefore, a system may have limits upon which the reliability analysis are based. Therefore, a system may have high reliability and still have accidents. Safety is an emergent property that arises at high reliability and still have accidents. Safety is an emergent property that arises at the system level when components are operating together.

the system level when components are operating together.

For a safety-related system all aspects of reliability, availability, maintainability, For a safety-related system all aspects of reliability, availability, maintainability, and safety (RAMS) have to be considered as they are relevant to the responsibility and safety (RAMS) have to be considered as they are relevant to the responsibility of

of manufmanufactureacturers and rs and the acceptabthe acceptability of ility of the customersthe customers. . Safety and reliability areSafety and reliability are generally achieved by a combination of 

generally achieved by a combination of  fault prevention fault prevention È È     Ò Ò   Æ Æ        ½ ½         Ò Ò     Æ Æ  ·· ½ ½  ´ ´   ؠؠ µ µ  Ò Ò  Ò Ò            ؠؠ    È È     Ò Ò   Æ Æ      ½ ½     Æ Æ         Ò Ò   ¼ ¼  ´ ´   ؠؠ µ µ  Ò Ò  Ò Ò            ؠؠ    Á Á 

(13)

2.

2.3 3 HHazazarard d AnAnalalysysis is 1313

fault tolerance fault tolerance

fault detection and diagnosis fault detection and diagnosis

autonomous supervision and protection autonomous supervision and protection There are two stages for fault prevention:

There are two stages for fault prevention: fault avoidancefault avoidance andand fault removalfault removal.. Fault avoidance and removal has to be accomplished during the design and test Fault avoidance and removal has to be accomplished during the design and test phase.

phase.

2.3 Hazard Analysis

2.3 Hazard Analysis

Hazard analysis is the essential part of any effective safety program, providing Hazard analysis is the essential part of any effective safety program, providing vis-ibility and coordination (Leveson 1995) (see figure 2.2). Information flows both ibility and coordination (Leveson 1995) (see figure 2.2). Information flows both

Hazard

Hazard

analysis

analysis

Design

Design

Management

Management

Maintenance

Maintenance

Operations

Operations

Training

Training

QA

QA

Test

Test

Fig. 2.2.

Fig. 2.2.Hazard analysis provides visibility and coordination.Hazard analysis provides visibility and coordination.

outward from and back into the hazard analysis. The outward information can help outward from and back into the hazard analysis. The outward information can help designers perform

designers perform trade studies atrade studies and eliminate or nd eliminate or mitigatemitigate hazards and hazards and can help can help qual- qual-ity assurance identify qualqual-ity categories, acceptance tests, required inspections, and ity assurance identify quality categories, acceptance tests, required inspections, and components that need special care. Hazard analysis is a necessary step before components that need special care. Hazard analysis is a necessary step before haz-ards can be eliminated or controlled through design or operational procedures. ards can be eliminated or controlled through design or operational procedures. Per-forming this analysis, however, does not ensure safety.

forming this analysis, however, does not ensure safety. Hazar

Hazard d analyanalysis is sis is perforperformed continuomed continuously througusly throughout the hout the life of life of the system, the system, withwith increasing depth and extent as more information is obtained about the system increasing depth and extent as more information is obtained about the system de-sign. As the project progress, the use of hazard analysis will vary - such as sign. As the project progress, the use of hazard analysis will vary - such as identi-fying hazards, testing basic assumptions and various scenarios about the system fying hazards, testing basic assumptions and various scenarios about the system op-eration, specifying operational and maintenance tasks, planning training programs, eration, specifying operational and maintenance tasks, planning training programs, evaluation potential changes, and evaluating the assumptions and foundation of the evaluation potential changes, and evaluating the assumptions and foundation of the models as the system is used and feedback is obtained.

models as the system is used and feedback is obtained.

Á Á 

Á Á 

(14)

The

The variovariousus hazarhazardd analyanalysissis techntechniquesprovideiquesprovide formaliformalismssms forfor systesystematizimatizingng knoknowl- wl-edge, draw attention to gaps in knowlwl-edge, help prevent important consideration edge, draw attention to gaps in knowledge, help prevent important consideration being missed, and aid in reasoning about systems and determining where being missed, and aid in reasoning about systems and determining where improve-ments are most likely to be effective.

ments are most likely to be effective.

The way of choosing the appropriate hazard analysis process depends on the The way of choosing the appropriate hazard analysis process depends on the goals or purposes of the hazard analysis. The goals of safety analysis are related to goals or purposes of the hazard analysis. The goals of safety analysis are related to three general tasks:

three general tasks: 1.

1. DevelopmentDevelopment: the examination of a new system to identify and assess potential: the examination of a new system to identify and assess potential hazards and eliminate or control them.

hazards and eliminate or control them. 2.

2. Operational managementOperational management: the examination of an existing system to identify: the examination of an existing system to identify and assess hazards in order to improve the level of safety, to formulate a safety and assess hazards in order to improve the level of safety, to formulate a safety manag

managementement policpolicy,y, to to traintrain persopersonnel,nnel, andand to to increaincreasese motivmotivationation forfor efficefficienciencyy and safety of operation.

and safety of operation. 3.

3. CertificationCertification: the examination of a planned or existing system to demonstrate: the examination of a planned or existing system to demonstrate its level of safety in order to be accepted by the authorities or the public. its level of safety in order to be accepted by the authorities or the public. The first two tasks have the common goal of making the system safer (by using The first two tasks have the common goal of making the system safer (by using appropriate techniques to engineer safer systems), while the third task has the goal appropriate techniques to engineer safer systems), while the third task has the goal of con

of convincingvincing managementmanagement or goveor governmentrnment licenser that licenser that an existing an existing design or design or systemsystem is safe (which is also called

is safe (which is also called risk assessmentrisk assessment).).

2.3.1

2.3.1 Steps in the Hazard Steps in the Hazard analysis procesanalysis processs

A hazard analysis consists of the following steps: A hazard analysis consists of the following steps:

1.

1. Definition oDefinition of objectivf objectives.es. 2.

2. DefinitDefinition of sion of scope.cope. 3.

3. DefinitDefinition and ion and descrdescription of iption of the system, system boundarithe system, system boundaries, and informationes, and information to be used in the analysis.

to be used in the analysis. 4.

4. IdentiIdentificatiofication of hazardsn of hazards.. 5.

5. Collection of data (such as hCollection of data (such as historical data, related standards istorical data, related standards and codes of prac-and codes of prac-tice, scientific tests and experimental results).

tice, scientific tests and experimental results). 6.

6. QualitaQualitative rankintive ranking of g of hazahazards based on rds based on their potentiatheir potential l effeeffects (immediate orcts (immediate or protracted) and perhaps their likelihood (qualitative or quantitative).

protracted) and perhaps their likelihood (qualitative or quantitative). 7.

7. Identification oIdentification of caused f caused factors.factors. 8.

8. Identification of preventivIdentification of preventive or corrective e or corrective measures and general design criteriameasures and general design criteria and controls.

and controls. 9.

9. EvalEvaluation of uation of prevpreventientive or ve or correccorrective meastive measures, includinures, including estimates of g estimates of cost.cost. Relative cost ranking may be adequate.

Relative cost ranking may be adequate. 10.

10. VVerification that controls have been implemented correctly and are effective.erification that controls have been implemented correctly and are effective. 11.

11. Quantification of selected, unresolved Quantification of selected, unresolved hazards, including probability ohazards, including probability of occur-f occur-rence, economic impact, potential losses, and costs of preventive or corrective rence, economic impact, potential losses, and costs of preventive or corrective measures.

measures. 12.

(15)

2.

2.3 3 HHazazarard d AnAnalalysysis is 1515

13.

13. Feedback and eFeedback and evaluation of operational exvaluation of operational experience.perience.

Not all of the steps are needed to be performed for every system and for every Not all of the steps are needed to be performed for every system and for every hazard. For new systems designs, usually the first 10 steps are necessary.

hazard. For new systems designs, usually the first 10 steps are necessary.

2.3.2

2.3.2 System System modelmodel

Every system analysis requires some type of model of the system, which may Every system analysis requires some type of model of the system, which may change from a fuzzy idea in the analysts mind to a complex and carefully specified change from a fuzzy idea in the analysts mind to a complex and carefully specified mathematical model. A model is a representation of a system that can be mathematical model. A model is a representation of a system that can be manipu-lated in order to obtain information about the system itself. Modeling any system lated in order to obtain information about the system itself. Modeling any system requires a description of the following:

requires a description of the following:

Structure

Structure: : The interrelaThe interrelationshtionship of ip of the parts the parts along some dimensioalong some dimension(s), such asn(s), such as space, time, relative importance, and logic and decision making properties: space, time, relative importance, and logic and decision making properties:

 Distinguishing

 Distinguishing qualitiesqualities:: a a qualitaqualitativetive descdescriptionription of of the the particuparticularlar varivariablesables andand parameters the characterize the system and distinguish it from similar structure. parameters the characterize the system and distinguish it from similar structure.

 Magnitude

 Magnitude, , probaprobabilitybility, , and and timetime: a description of the variables and parameters: a description of the variables and parameters with the distinguishing qualities in terms of their magnitude or frequency over with the distinguishing qualities in terms of their magnitude or frequency over time and their degree of certainty for the situations or conditions of interest. time and their degree of certainty for the situations or conditions of interest.

2.3.3

2.3.3 Hazard Hazard levellevel

The hazard category or level is characterized by (1)

The hazard category or level is characterized by (1) severityseverity(sometimes also called(sometimes also called

damage

damage) and (2)) and (2)likelihood likelihood (or(orfrequencyfrequency) of occurrence.) of occurrence. Hazard level is often sHazard level is often spec- pec-ifie

ified in formd in form of aof ahazard criticality index matrixhazard criticality index matrixto aid in prioritizationto aid in prioritization as it is as it is shoshownwn in figure 2.3. Hazard criticality index matrix combines hazard severity and hazard in figure 2.3. Hazard criticality index matrix combines hazard severity and hazard probability/frequency.

probability/frequency.

 Hazard consequence classes/categories.

 Hazard consequence classes/categories. Categorizing the hazard severity reflectsCategorizing the hazard severity reflects worst possible consequences. Different categories are used depending on the worst possible consequences. Different categories are used depending on the indus-try or even the used system. Some examples are:

try or even the used system. Some examples are:

Catastrophic – Critical – Marginal – Negligible Catastrophic – Critical – Marginal – Negligible or

or

Major – Medium – Minor Major – Medium – Minor or simply

or simply

Category 1 – Category 2 – Category 3. Category 1 – Category 2 – Category 3. Consequence classes can vary from

Consequence classes can vary from AcceptableAcceptabletoto UnacceptableUnacceptableas it is shown inas it is shown in figu

figurere 2.32.3.. SeSeveverityrity ofof hazhazardard (fo(forr aa gigivevenn proproducductionplanttionplant)) is is conconsidsidereeredd withwith regregardard to:

to:

Environmental damages Environmental damages Production line damages Production line damages Lost production Lost production £ £  £ £  £ £  £ £  £ £  £ £ 

(16)

Frequen Frequency cy of of 

occurrence occurrence Consequence Consequence class class Major  Major  Medium Medium Minor  Minor  Low

Low MediumMedium HighHigh

Unacceptable Unacceptable high risk  high risk  Acceptable Acceptable low risk  low risk  Fig. 2.3.

Fig. 2.3.The hazard level shown in form of hazard criticality index matrix: hazard conse-The hazard level shown in form of hazard criticality index matrix: hazard conse-quence classes versus frequency of occurrence

quence classes versus frequency of occurrence

Lost workforce Lost workforce Lost company image Lost company image Insurance (premium) Insurance (premium) The basis for defining

The basis for defining hazard consequenhazard consequencece classes is the companclasses is the companyy policy in general.policy in general. Hence, definition of hazard consequence classes are not application specific, i.e. Hence, definition of hazard consequence classes are not application specific, i.e. different companies can define different consequence classes for the same case. different companies can define different consequence classes for the same case.

 Hazard likelihood/frequency of occurrence.

 Hazard likelihood/frequency of occurrence. The hazard likelihood of occurrenceThe hazard likelihood of occurrence can be specified either qualitatively or quantitatively. During the design phase of  can be specified either qualitatively or quantitatively. During the design phase of  a system, due to the lack of detailed information, accurate evaluation of hazard a system, due to the lack of detailed information, accurate evaluation of hazard likelihood is not possible. Qualitative evaluation of likelihood is usually the best likelihood is not possible. Qualitative evaluation of likelihood is usually the best that can be done.

that can be done.

  Accepted frequencies of occurrence.

  Accepted frequencies of occurrence. The basis for definingThe basis for defining accepted frequenciesaccepted frequencies of

of occurreoccurrencencedepends on the company policy on the issue. So, the definition of ac-depends on the company policy on the issue. So, the definition of ac-cepte

cepted frequd frequencieenciess of occof occurrencurrencee is notis not appliapplicationcation specispecific (cafic (can van varyry fromfrom compacompanyny to company). Examples of ranking of frequencies of occurrence are:

to company). Examples of ranking of frequencies of occurrence are:

Frequent – Probable – Occasional – Remote – Improbable - Impossible Frequent – Probable – Occasional – Remote – Improbable - Impossible or

or

High – Medium – Low High – Medium – Low By

By spespecifcifyinyingg hazhazardard conconseqsequenuencece catcategegorieoriess andand freqfrequenuencieciess ofof occoccurreurrencence oneone cancan draw the hazard criticality index matrix as shown in figure 2.3.

draw the hazard criticality index matrix as shown in figure 2.3.

Specifying the acceptable Hazard (risk) level is management’s responsibility  Specifying the acceptable Hazard (risk) level is management’s responsibility .. This corresponds to drawing the thick line in the figure. Following issues have This corresponds to drawing the thick line in the figure. Following issues have im-pact on how to specify the acceptable risk levels:

pact on how to specify the acceptable risk levels:

£ £ 

£ £ 

(17)

2.

2.4 4 HaHazarzard d ananalyalysisis s modmodels els and and tectechnhniquiques es 1717

National regulations National regulations

Application of methods specified in standards, Application of methods specified in standards,

Following the best practice applied by major industrial companies, Following the best practice applied by major industrial companies,

By drawing comparison between socially accepted risks (e.g. risk of driving by By drawing comparison between socially accepted risks (e.g. risk of driving by car, airplane,...),

car, airplane,...),

Policy to propagate the company image. Policy to propagate the company image.

2.4

2.4 Hazard analys

Hazard analysis models and technique

is models and techniquess

There exists various methods for performing hazard analysis, each having different There exists various methods for performing hazard analysis, each having different coverage and validity. Hence, several may be required during the life of the project. coverage and validity. Hence, several may be required during the life of the project. It is important to notice that none of these methods alone is superior to all others for It is important to notice that none of these methods alone is superior to all others for every objective and even applicable to all type of systems. Furthermore, very little every objective and even applicable to all type of systems. Furthermore, very little validation of these techniques has been done. The most used techniques, which are validation of these techniques has been done. The most used techniques, which are listed below, are described extensively in (Leveson 1995):

listed below, are described extensively in (Leveson 1995): Checklists,

Checklists, Hazard Indices, Hazard Indices,

Fault tree analysis (FTA), Fault tree analysis (FTA),

Management Oversight and Risk Tree analysis (MORT), Management Oversight and Risk Tree analysis (MORT), Even Tree Analysis

Even Tree Analysis (ET(ETA),A),

Cause-Consequence Analysis (CCa), Cause-Consequence Analysis (CCa),

Hazard and Operability Analysis (HAZOP), Hazard and Operability Analysis (HAZOP), Interface Analysis,

Interface Analysis,

Failure Mode and Effect Analysis (FMEA), Failure Mode and Effect Analysis (FMEA),

Failure Mode Effect, and Critically Analysis (FMEAC), Failure Mode Effect, and Critically Analysis (FMEAC), Fault Hazard Analysis (FHA),

Fault Hazard Analysis (FHA), State Machine Hazard Analysis, State Machine Hazard Analysis,

In this report, FMEA and FMECA are described more in details, as they have been In this report, FMEA and FMECA are described more in details, as they have been applied ((Bøgh, Izadi-Zamanabadi, and Blanke 1995)).

applied ((Bøgh, Izadi-Zamanabadi, and Blanke 1995)).

2.4.1

2.4.1 FailurFailure Mode and Effect Analyse Mode and Effect Analysis (FMEA)is (FMEA)

This technique was developed by reliability engineers to permit them to predict This technique was developed by reliability engineers to permit them to predict equipment reliability. This is a hierarchical, bottom-up, inductive analysis equipment reliability. This is a hierarchical, bottom-up, inductive analysis tech-nique, where the initiating events are failures of individual components.

nique, where the initiating events are failures of individual components.

The objectives of FMEA are as follows (Russomanno, Bonnell, and Bowles 1994) The objectives of FMEA are as follows (Russomanno, Bonnell, and Bowles 1994)

– to provide a systematic examination of potential system failures;to provide a systematic examination of potential system failures;

– to analyze the effect of each failure mode on system operation;to analyze the effect of each failure mode on system operation;

– to access the safety of various systems or components, andto access the safety of various systems or components, and

– to identify corrective actions or design modifications.to identify corrective actions or design modifications.

£ £  £ £  £ £  £ £  £ £  µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ   µ  µ  

(18)

Identify

Identify IndenturIndenturee Level Level

Define Each Item Define Each Item

Define Ground Define Ground Rules Rules Identify Failure Identify Failure Modes Modes

Determine effect of  Determine effect of  Each Item Failure Each Item Failure

Classify Failures Classify Failures

Identify

Identify DetectioDetectionn Methods Methods Identify Identify Compensation Compensation Fig. 2.4.

Fig. 2.4.Major logical steps in the Major logical steps in the FMEA analysiFMEA analysiss

The

The anaanalyslysisis conconsidsidersers thethe efeffecfectsts ofof aa sinsinglegle itemitem faifailurluree onon ovoveraerallll syssystemtem opeoperatirationon and spans a myriad of applications, including electrical, mechanical, hydraulic, and and spans a myriad of applications, including electrical, mechanical, hydraulic, and software domains. Before the FMEA analysis begins, a thorough understanding of  software domains. Before the FMEA analysis begins, a thorough understanding of  the system components, their relationships, and failure modes must be determined the system components, their relationships, and failure modes must be determined and represented in a formalized manner. The indenture level of the analysis (i.e. the and represented in a formalized manner. The indenture level of the analysis (i.e. the item levels that

item levels that describe the relatidescribe the relative complexityve complexity of assembly or of assembly or function)function) must coin-must coin-cide with the stage of design. The indenture level progresses from the most general cide with the stage of design. The indenture level progresses from the most general description; then, properties are added and the indenture level becomes more description; then, properties are added and the indenture level becomes more spe-cialized. The FMEA must be conducted at multiple indenture level; hence a failure cialized. The FMEA must be conducted at multiple indenture level; hence a failure mod

modee can becan be ideidentifintifieded andand tractraceded to the most speto the most speciacializlizeded assassembemblyly forfor easease of e of reprepairair or design modification. Figure 2.4 outlines the major logical steps in the analysis. or design modification. Figure 2.4 outlines the major logical steps in the analysis. The possibl

The possible problems with e problems with the applicathe application of tion of FMEA (also FMECA) are FMEA (also FMECA) are includincludeded in the following:

in the following: 1.

1. Engineers havEngineers have insufficient time for propee insufficient time for proper analysisr analysis 2.

2. AnalysAnalysts have a ts have a poor understpoor understandinanding of g of FMEA and designers are inadequaFMEA and designers are inadequatelytely trained in FMEA.

trained in FMEA. 3.

3. FMEA is perceived as a FMEA is perceived as a difficult, laborious, and mundifficult, laborious, and mundane task.dane task. 4.

4. FMEA knowledge FMEA knowledge tends to be restricted to small groups of spetends to be restricted to small groups of specialists.cialists. 5.

5. ComComputputerierized aids zed aids are needeare needed d to to decdecrearease se the efforthe effort t reqrequireuired d to to proproducduce e aa FMEA.

FMEA.

2.4.2

2.4.2 FailurFailure Mode, Effect and Critically Analye Mode, Effect and Critically Analysis (FMECA)sis (FMECA)

Failure Mode, Effect and Critically Analysis (FMECA) is basically a FMEA with Failure Mode, Effect and Critically Analysis (FMECA) is basically a FMEA with more detailed analysis of the criticality of the failure. In this analysis criticalities more detailed analysis of the criticality of the failure. In this analysis criticalities or priorities are assigned to failure mode effects. Figure 2.5 shows an example of  or priorities are assigned to failure mode effects. Figure 2.5 shows an example of  tables used in this analysis.

(19)

2.

2.4 4 HaHazarzard d ananalyalysisis s modmodels els and and tectechnhniquiques es 1919

Failure Modes an

Failure Modes and Effects Critd Effects CriticallyicallyAnalysisAnalysis S

Suubbssyysstteem m PPrreeppaarreed d bbyy DateDate

Item

Item FailureFailure

Modes

Modes Cause of FailureCause of Failure

Possible

Possible

Effects

Effects PrProbob. . LeLevevell

Possible action to Reduce

Possible action to Reduce

Failure Rate or Effects

Failure Rate or Effects

Fig. 2.5.

Fig. 2.5.A simple FMECA schemeA simple FMECA scheme

2.4.3

2.4.3 Fault TFault Tree Analysis (FTree Analysis (FTA)A)

FTA is primarily a means for analyzing causes of hazards, not identifying hazards. FTA is primarily a means for analyzing causes of hazards, not identifying hazards. The top event in the tree must have been foreseen and thus identified by other The top event in the tree must have been foreseen and thus identified by other tech-niq

niquesues.. FTFTA is,A is, henhence,ce, a top-da top-dowownn seasearchrch metmethodhod.. OncOncee thethe tretreee is consis constructructedted,, it canit can be written

be written as Boolean expressias Boolean expression and on and simplisimplified to fied to show the specific combinatioshow the specific combinationn of basic events that are necessary and sufficient to cause the hazardous event. of basic events that are necessary and sufficient to cause the hazardous event.

Fault-tree evaluation by cut sets.

Fault-tree evaluation by cut sets. The direct evaluation procedures just discussedThe direct evaluation procedures just discussed allo

allow w us us to assess fault to assess fault trees with trees with relatirelatively few branches and basic eventsvely few branches and basic events. . WhenWhen larger trees are considered, both evaluation and interpretation of the results become larger trees are considered, both evaluation and interpretation of the results become more difficult and digital computer codes are invariably employed. Such codes are more difficult and digital computer codes are invariably employed. Such codes are usually formulated in terms of the minimum cut-set methodology discussed in this usually formulated in terms of the minimum cut-set methodology discussed in this section. There are at least two reasons for this. First, the techniques lend themselves section. There are at least two reasons for this. First, the techniques lend themselves well to the computer

well to the computer algorithms, and second, from them a algorithms, and second, from them a good deal of intermediategood deal of intermediate information can be obtained concerning the combination of component failures that information can be obtained concerning the combination of component failures that are pertinent to improvements in system design and operations.

are pertinent to improvements in system design and operations.

The discussion that follows is conveniently divided into qualitative and quantitative The discussion that follows is conveniently divided into qualitative and quantitative analysis. In qualitative analysis information about the logical structure of the tree analysis. In qualitative analysis information about the logical structure of the tree is used to locate weak points and evaluate and improve system design. In is used to locate weak points and evaluate and improve system design. In quantita-tive analysis the same objecquantita-tives are taken further by studying the probabilities of  tive analysis the same objectives are taken further by studying the probabilities of  component failures in relation to system design.

component failures in relation to system design.

Qualitativ

Qualitative e analysis.analysis. In the following the idea of minimum cut-sets is introducedIn the following the idea of minimum cut-sets is introduced and then it is related to th qualitative evaluation of of fault trees. We then and then it is related to th qualitative evaluation of of fault trees. We then dis-cuss briefly how the minimum cut sets are determined for large fault trees. Finally, cuss briefly how the minimum cut sets are determined for large fault trees. Finally, we discuss their use in locating system weak points, particularly possibilities for we discuss their use in locating system weak points, particularly possibilities for common-mode failures.

common-mode failures.

minimum cut-set

minimum cut-set formulatiformulation.on. A minimum cut set is defined as the smallest com-A minimum cut set is defined as the smallest com-bin

binatiationon ofof primprimaryary faifailurelures whichs which,, if theyif they occoccurur,, will cawill cause the topuse the top eveventent to occuto occurr.. It,It, therefore, is a combination (i.e. intersection) of primary failures sufficient to cause therefore, is a combination (i.e. intersection) of primary failures sufficient to cause the top event. It is the smallest combination in that all the failures must take place the top event. It is the smallest combination in that all the failures must take place for the top event to occur. If even one of the failures in the minimum cut set does for the top event to occur. If even one of the failures in the minimum cut set does not happen, the top event will not take place.

(20)

The terms minimum cut set and failure mode are sometimes used The terms minimum cut set and failure mode are sometimes used interchange-ably

ably.. HowHoweveever,r, therethere is is a a subtlesubtle diffedifferencerence that that we we shallshall obserobserveve hereahereafterfter.. InIn reliabreliabil- il-ity calculations a failure mode is a combination of component or other failures that ity calculations a failure mode is a combination of component or other failures that cause the system to fail, regardless of the consequences of the failure. A minimum cause the system to fail, regardless of the consequences of the failure. A minimum cut set is usally more restrictive, for it is the minimum combination of failures that cut set is usally more restrictive, for it is the minimum combination of failures that causes the top event as defined for a particular fault tree. If the top event is defined causes the top event as defined for a particular fault tree. If the top event is defined broadly as system failure, the two are indeed

broadly as system failure, the two are indeed interchangeable.interchangeable. Usually, Usually, howevhowever,er, thethe top event encompasses only the particular subset of system failures that bring about top event encompasses only the particular subset of system failures that bring about a particular safety hazard.

a particular safety hazard.

B

B

C

C

A

A

B

B

C

C

E4

E4

A

A

E3

E3

E

E

1

1

E

E

2

2

T

T

Fig. 2.6.

Fig. 2.6.Example of a fault treeExample of a fault tree

The origin for using the term cut set may be illustrated graphically using the The origin for using the term cut set may be illustrated graphically using the re-duced fault tree in Fig. 2.7. The reliability block diagram corresponding to the tree duced fault tree in Fig. 2.7. The reliability block diagram corresponding to the tree is shown in Fig. 2.6. The idea of a cut set comes originally from the use of such is shown in Fig. 2.6. The idea of a cut set comes originally from the use of such

(21)

2.

2.4 4 HaHazarzard d ananalyalysisis s modmodels els and and tectechnhniquiques es 2121

A A B B C C Fig. 2.7.

Fig. 2.7.Minimum cut-sets on a Minimum cut-sets on a reliabilreliability block diagram.ity block diagram.

diagrams for electric apparatus, where the signal enters at the left and leaves at the diagrams for electric apparatus, where the signal enters at the left and leaves at the right. Thus the minimum cut set is the minimum number of components that must right. Thus the minimum cut set is the minimum number of components that must be cut to prevent the signal flow. There are two minimum cut sets,

be cut to prevent the signal flow. There are two minimum cut sets, consisting of consisting of  components

components andand , and, and , consisting of component, consisting of component ..

a1 a1 a2 a2  b1  b1 a3 a3 a4 a4  b2  b2 c c Fig. 2.8.

Fig. 2.8.Minimum cut-setMinimum cut-sets on s on a reliability block diagram of a a reliability block diagram of a sevseven component system.en component system.

Practice

Practice: Find the minimum cut sets in Fig. 2.8.: Find the minimum cut sets in Fig. 2.8.

For larger systems, particularly those in which the primary failures appear more For larger systems, particularly those in which the primary failures appear more than once in the fault tree, the simple geometrical interpretation becomes than once in the fault tree, the simple geometrical interpretation becomes problem-atical. However, the primary characteristics of the concept remain valid. It permits atical. However, the primary characteristics of the concept remain valid. It permits the logical structure of the fault tree to be presented in a systematic way that is the logical structure of the fault tree to be presented in a systematic way that is amenable to interpretation in terms of the behavior of the minimum cut-sets. amenable to interpretation in terms of the behavior of the minimum cut-sets.

Suppose that the minimum cut sets of a system can be found. The top event, Suppose that the minimum cut sets of a system can be found. The top event, system failure, may then be expressed as the union of these sets, Thus if there are system failure, may then be expressed as the union of these sets, Thus if there are

minimum cut sets, minimum cut sets,

(2.46) (2.46) Each minimum cut set then

Each minimum cut set then consiconsists of sts of the intersecthe intersection of the tion of the minimum numbminimum number of er of  primary failures required to cause the top event. For example, the minimum cut sets primary failures required to cause the top event. For example, the minimum cut sets for the system shown in Fig. 2.8 are

for the system shown in Fig. 2.8 are

Å  Å   ½ ½        Å  Å   ¾ ¾     Æ Æ  Ì Ì     Å  Å   ½ ½     Å  Å   ¾ ¾   ¡¡ ¡¡ ¡¡     Å  Å   Æ Æ    

References

Related documents

National Conference on Technical Vocational Education, Training and Skills Development: A Roadmap for Empowerment (Dec. 2008): Ministry of Human Resource Development, Department

Proposed Framework Form a project team Communicat e objectives to all concerned Confirm the value drivers Define Objective s. Based on the results of data analysis, the

4.1 The Select Committee is asked to consider the proposed development of the Customer Service Function, the recommended service delivery option and the investment required8. It

The aim of this paper is to present the experiences of seven family therapy trainees regarding their personal paths toward the development of

• Follow up with your employer each reporting period to ensure your hours are reported on a regular basis?. • Discuss your progress with

And we didn't, did we, Hiccup?" When my father got to the bit about how Humungous the Hero had appeared out of nowhere after all those years when everybody thought he was dead,

HealthLink SmartForms enable a healthcare provider to share structured patient information in real time with any other healthcare provider. This creates significant efficiencies

National Association for the Education of Young Children & National Association of Early Childhood Specialists in State Departments of