Basic Quantification of the Fault Tree and Associated Data Used

7. Quantitative Evaluations of a Fault Tree

7.1 Basic Quantification of the Fault Tree and Associated Data Used

To quantify the probability of the top event of the FT, a probability must be provided for each basic event (BE) in the fault tree. These BE probabilities are then propagated upward to the top event using the Boolean relationships of the FT. This process was illustrated when the basic gates of the FT were discussed in Chapter 6. The BE probabilities can be propagated upward using binary decision diagrams (BDDs) that represent the FT structure. Alternatively, the minimal cut sets can be generated from the FT and then used to quantify the top event. The minimal cut set generation approach is used by most FT software because of the additional, important information provided by the minimal cut sets.

Since the top event is expressed as the union of the minimal cut sets, the probability of the top event can be approximated as the sum of the individual minimal cut set probabilities, provided these probabilities are small. This is typically an accurate approximation for basic event probabilities below 0.1. The approximation is termed the “rare event approximation.” Other, more accurate approximations can be applied if needed using standard bracketing approaches to compute the probability of a union. However, FT software often uses the simple sum of the minimal cut set probabilities as the standard quantification method. A user must be aware of this, and use a more accurate quantification method, considering the intersection of the minimal cut sets, if minimal cut set probabilities are high (if the sum of the minimal cut set probabilities is greater than 0.1).

Since a minimal cut set is an intersection of BEs, the probability of a minimal cut set is simply the product of the individual BE probabilities, provided the BEs are independent. Thus, the probability of the top event is expressible as the sum of the products of individual BE probabilities. This expression is called the sum of products approximation. It has a relative accuracy of at least 10% (at least two significant figures) if the BE probabilities are less than 0.1. Further, even if some BE probabilities are greater than 0.1, the approximation is accurate if most of the probabilities are less than 0.1. The accuracy can be checked by carrying out a second-order bracketing approximation (i.e., including the intersections between all pairs of minimal cut sets) and comparing the results for the top event. Additionally, some FT codes estimate the accuracy of the sum of products approximation.

In terms of symbols the sum of products expression is given as:

P(Top) = Σ P(Mi) (7.1)

P(Mi) = P(BE1)P(BE2) … P(BEk) (7.2)

where the term “P( )” denotes the probability of the enclosed event, “Top” denotes the top event, “Mi” denotes a particular minimal cut set, “BE” denotes a basic event, and k is the number of basic events in the minimal cut set.
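As an illustrative sketch of Equations (7.1) and (7.2), the following Python fragment computes the sum of products for a small set of minimal cut sets and compares it with a second-order bracketing value. The basic event names, probabilities, and cut sets are hypothetical, and the BEs are assumed to be independent; this is not the algorithm of any particular FT code.

```python
from itertools import combinations
from math import prod

# Hypothetical basic-event probabilities (illustrative values only).
be_prob = {"A": 1e-3, "B": 2e-3, "C": 5e-4, "D": 1e-2}

# Hypothetical minimal cut sets, each a set of basic-event names.
cut_sets = [frozenset({"A", "B"}), frozenset({"C"}), frozenset({"A", "D"})]


def cut_set_prob(events):
    """Equation (7.2): product of the BE probabilities (independent BEs)."""
    return prod(be_prob[e] for e in events)


def sum_of_products(cut_sets):
    """Equation (7.1): sum of the minimal cut set probabilities."""
    return sum(cut_set_prob(m) for m in cut_sets)


def second_order_bound(cut_sets):
    """Subtract the probabilities of all pairwise intersections.

    For independent BEs, the intersection of two cut sets is the
    product over the union of their basic events.
    """
    pairs = sum(cut_set_prob(a | b) for a, b in combinations(cut_sets, 2))
    return sum_of_products(cut_sets) - pairs


upper = sum_of_products(cut_sets)      # never below the exact value
lower = second_order_bound(cut_sets)   # never above the exact value
print(f"sum of products (upper bound): {upper:.6e}")
print(f"second-order lower bound:      {lower:.6e}")
print(f"relative width of the bracket: {(upper - lower) / upper:.2e}")
```

The exact top event probability lies between the two printed values, so a narrow bracket indicates that the simple sum is adequate for the case at hand.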

If the FT is a small one and has relatively few events, then all the minimal cut sets of the FT can be generated. If the FT is large, say more than 100 events, and has AND gates with OR gate inputs, then the number of minimal cut sets for the FT can be too large to be exhaustively generated. For some large FTs that have been constructed, the numbers of minimal cut sets have exceeded one million. Most FT software includes techniques for estimating the total number of minimal cut sets in the FT based on the number of BEs, the number of gates, and the types of gates. When the number of minimal cut sets is large, the number generated can be truncated for the sum in Equation (7.1). In this case, using the probabilities of the BEs and the logic structure of the FT, minimal cut sets are not generated if their probability is below some cutoff value, such as 1×10^-12. The minimal cut set probability can be estimated (bounded) using the logic structure of the FT and probability bounding techniques. FT software documentation describes these techniques.
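The effect of a probability cutoff can be sketched as follows. This simplified Python example filters an already generated list of cut sets, whereas FT software typically applies the cutoff during generation so that the discarded cut sets are never built. The event names, probabilities, and cut sets are hypothetical; only the 1×10^-12 cutoff is taken from the text.

```python
from math import prod

CUTOFF = 1e-12  # cutoff value quoted in the text

# Hypothetical basic-event probabilities and minimal cut sets.
be_prob = {"A": 1e-3, "B": 2e-3, "C": 1e-5, "D": 1e-6, "E": 1e-4}
cut_sets = [("A", "B"), ("C", "D", "E"), ("A", "E"), ("C", "D")]


def truncate(cut_sets, be_prob, cutoff):
    """Split cut sets into those kept and those dropped by the cutoff."""
    kept, dropped = [], []
    for m in cut_sets:
        p = prod(be_prob[e] for e in m)
        (kept if p >= cutoff else dropped).append((m, p))
    return kept, dropped


kept, dropped = truncate(cut_sets, be_prob, CUTOFF)
p_top = sum(p for _, p in kept)          # truncated sum of products
p_dropped = sum(p for _, p in dropped)   # contribution lost to truncation

print(f"kept {len(kept)} cut sets, P(Top) ~ {p_top:.6e}")
print(f"dropped {len(dropped)} cut sets, total {p_dropped:.1e}")
```

In this example only the three-event cut set falls below the cutoff, and its contribution is negligible compared with the retained sum.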

The input data that must be supplied for a BE is usually one of four basic types:

1) a component failure probability in some time interval,
2) an event occurrence probability in some time interval,
3) a component unavailability, and
4) a pure event probability.

To calculate the component failure probability over some interval, a component failure rate (expressed in failures per unit time) and the elapsed mission time must be supplied as input data.

The elapsed mission time is the sum of the entire time over which the component is in an operating and non-operating state. The operating state is defined as a state where the component is subjected to the stresses of actual operation that may involve the transfer of energy, signals or information. The non-operating state is defined as the state when the component has been “shutdown” and is subject only to environmental stresses due to its location within a system. The failure rate can be defined as

λ = λO·d + λN·(1 - d),

where

d = fractional duty cycle (total operating time/total mission time),
λO = the component failure rate in the operating state, and
λN = the contribution to the component failure rate from the non-operating state.
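The duty-cycle weighting above can be written as a short Python helper. The function name and the numerical rates are hypothetical and serve only to show the arithmetic.

```python
def effective_failure_rate(lambda_op, lambda_nonop, duty_cycle):
    """Return lambda = lambda_O * d + lambda_N * (1 - d).

    lambda_op    -- failure rate in the operating state (per hour)
    lambda_nonop -- failure rate contribution in the non-operating state (per hour)
    duty_cycle   -- d, total operating time / total mission time (0 to 1)
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie between 0 and 1")
    return lambda_op * duty_cycle + lambda_nonop * (1.0 - duty_cycle)


# Hypothetical component: operates 25% of the mission time.
rate = effective_failure_rate(lambda_op=1e-5, lambda_nonop=1e-7, duty_cycle=0.25)
print(f"effective failure rate: {rate:.4e} per hour")
```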

The standard assumption is a constant failure rate. This is based on the further assumptions that there is no aging (wearout) and that burn-in of the component was 100% effective and removed all infant mortalities from the population of parts used for assembly, and therefore the failures are purely random.

The foundation of a good analysis is the pedigree of failure rate or event probability data that is assigned to basic events. A good faith effort must be made to obtain the best failure rate data that is available. The uncertainty in failure rate data depends in large part on the applicability of the data (its source). A failure rate should apply to the particular application of a component, its operating environment, and its non-operating environment. The failure rate data hierarchy is as follows:

1. Actual mission data on the component,
2. Actual mission data on a component of similar design,
3. Life test or accelerated test data on the component,
4. Life test or accelerated test data on a similar component,
5. Field or test data from the component supplier,
6. Specialized data base or in-house data base on similar components, and
7. Standard handbooks for reliability data.

The component failure probability P, which is also called the unreliability, is determined from the formula

P = 1 - e^(-λt), (7.3)

where λ is the component failure rate and t is the relevant time interval. For small values of λt (λt<0.1) the above formula for P simplifies to

P ≅ λt. (7.4)
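A brief Python sketch of Equations (7.3) and (7.4) shows how the linear approximation degrades as λt grows. The function names and the sample values of λt are illustrative choices, not part of the source data.

```python
import math


def failure_probability(rate, t):
    """Equation (7.3): P = 1 - exp(-rate * t).

    math.expm1 keeps precision when rate * t is very small.
    """
    return -math.expm1(-rate * t)


def failure_probability_linear(rate, t):
    """Equation (7.4): P ~ rate * t, intended for rate * t < 0.1."""
    return rate * t


# Compare the two expressions for several values of lambda * t.
for lt in (0.001, 0.01, 0.1, 0.5):
    exact = failure_probability(lt, 1.0)
    approx = failure_probability_linear(lt, 1.0)
    rel_err = (approx - exact) / exact
    print(f"lambda*t = {lt:<5}  exact = {exact:.6f}  "
          f"approx = {approx:.6f}  relative error = {rel_err:.1%}")
```

The linear form always overestimates the exact value, and at λt = 0.1 the overestimate is about 5%.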

The units of λ are the failure probability per unit time, e.g. per hour of exposure. For most FT software, the failure rate and time interval can be separately input as data.

Table 7-1 illustrates a sample of component failure rate data used for the Space Shuttle PRA evaluation. The component failure rates are given in the column labeled “Rate” and are in units of per hour. The failure rates are assembled from historical data and from expert inputs. The other columns give attributes of the component failure rate that allow component failure probabilities to be calculated for the appropriate component failures identified in the Shuttle PRA model. Uncertainty information associated with the component failure rate estimates is given in the last three columns for uncertainty propagation.

Table 7-1. Illustrative Component Failure Rate Data

NAME | DESC | EVENT LEVEL | PART NAME | PART NUMBER | SOURCE | RELATED CIL | LOCATE | SUBSYSTEM | NUMBER OF FAILURES | EXPOSURE | UNITS | RATE | DELTA T | PF MISSION | DIST | UNCERTAINTY PARM1 (MEDIAN) | PARM2 (EF)
042BD0101A | APU 1 BURST DISK FAILS TO BURST | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU1SCD | 04-2-BD01-01 | 1R/2 | 04-2 | 0 | 0 | X | 2.55E-05 | 1 | 0.0000255 | L | 6.58E-06 | 15
042BD0101A | APU 2 BURST DISK FAILS TO BURST | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU2SCD | 04-2-BD01-01 | 1R/2 | 04-2 | 0 | 0 | X | 2.55E-05 | 1 | 0.0000255 | L | 6.58E-06 | 15
042BD0101A | APU 3 BURST DISK FAILS TO BURST | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU3SCD | 04-2-BD01-01 | 1R/2 | 04-2 | 0 | 0 | X | 2.55E-05 | 1 | 0.0000255 | L | 6.58E-06 | 15
042BD0102A | APU 1 BURST DISK EXTERNAL LEAKAGE | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU1FLK, PEAPU1SCD | 04-2-BD01-02 | 1/1 | 04-2 | 0 | 0 | H | 2.55E-05 | 217.5 | 0.0055463 | L | 6.58E-06 | 15
042BD0102A | APU 2 BURST DISK EXTERNAL LEAKAGE | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU2FLK, PEAPU2SCD | 04-2-BD01-02 | 1/1 | 04-2 | 0 | 0 | H | 2.55E-05 | 217.5 | 0.0055463 | L | 6.58E-06 | 15
042BD0102A | APU 3 BURST DISK EXTERNAL LEAKAGE | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU3FLK, PEAPU3SCD | 04-2-BD01-02 | 1/1 | 04-2 | 0 | 0 | H | 2.55E-05 | 217.5 | 0.0055463 | L | 6.58E-06 | 15
042BD0103A | APU 1 BURST DISK INTERNAL LEAKAGE OR PREMATURE RUPTURE | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU1SCD | 04-2-BD01-03 | 1R/2 | 04-2 | 0 | 0 | H | 2.55E-05 | 217.5 | 0.0055463 | L | 6.58E-06 | 15
042BD0103A | APU 2 BURST DISK INTERNAL LEAKAGE OR PREMATURE RUPTURE | BE | BURST DISK - SEAL CAVITY DRAIN | 48-6806, ME251-0017-0001 | PEAPU2SCD | 04-2-BD01-03 | 1R/2 | 04-2 | 0 | 0 | H | 2.55E-05 | 217.5 | 0.0055463 | L | 6.58E-06 | 15
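The numerical columns of Table 7-1 can be cross-checked with a few lines of Python. The sketch below uses the rate, time interval, mission failure probability, median, and error factor from the external leakage rows. Two interpretations are assumptions rather than statements from the source: that the distribution code “L” denotes a lognormal distribution, and that the error factor (EF) is the ratio of the 95th percentile to the median.

```python
import math

# Values read from the external leakage rows of Table 7-1.
rate = 2.55e-5        # RATE column, per hour
delta_t = 217.5       # DELTA T column, hours
pf_table = 0.0055463  # PF MISSION column
median = 6.58e-6      # PARM1 (MEDIAN) column
ef = 15.0             # PARM2 (EF) column

# Mission failure probability: linear form and exact exponential form.
pf_linear = rate * delta_t
pf_exact = -math.expm1(-rate * delta_t)

# Assumed lognormal with EF = 95th percentile / median, so that
# sigma = ln(EF) / 1.645 and mean = median * exp(sigma**2 / 2).
sigma = math.log(ef) / 1.645
mean = median * math.exp(sigma ** 2 / 2.0)

print(f"rate * delta_t      = {pf_linear:.7f} (table: {pf_table})")
print(f"1 - exp(-rate * dt) = {pf_exact:.7f}")
print(f"lognormal mean      = {mean:.3e} (table RATE: {rate:.2e})")
```

Under these assumptions the tabulated mission probability matches the linear product of rate and time interval, and the tabulated rate matches the mean of the lognormal distribution defined by the median and error factor.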

The second type of probability data, an event occurrence probability in some time interval, is similar to a component failure probability in some time interval. An event occurrence rate and the time interval must be supplied. The formulas for the probability of the event occurrence are the same as above, with λ now the event occurrence rate. Event occurrence rates and exposure time intervals are used for fire occurrences, rupture occurrences, and other initiating or occurring events for which there are data on event occurrence rates (e.g., in units of per year or per hour) for the event.

The third type of probability data that may be required is an unavailability. This data is supplied for a component that is repairable or checkable. Events in the FT requiring this type of data include cases where a component is out of service and unavailable if called upon to operate. For a component unavailability, a component failure rate and a repair time or test interval must be supplied. The specific data required and the associated expression for the component unavailability depend on the type of component. The two usual expressions for the component unavailability “q” are

q = λ0τ/(1 + λ0τ) ≅ λ0τ for an operating component (7.5)

and

q = (1/2)λsT/(1 + (1/2)λsT) + 1 - e^(-λ0τ) ≅ (1/2)λsT + λ0τ for a standby component (7.6)

In the above formulas, for an operating component λ0 is the operating failure rate and τ is the average repair time. For a standby component λs is the standby component failure rate, T is the test or inspection interval, λ0 is the operating failure rate, and τ is the operating time if the component needs to operate for a time after being called upon. (Note that, strictly speaking, q in Equation 7.6 is really an unavailability (the first term) plus an operational failure probability (the second term).) An example of a standby component would be a battery that is called upon to operate if normal electric power is lost. An example of an operating component would be an operating pump that continuously circulates water through a cooling system. For most FT software, the component failure rate(s), e.g., λ0 and/or λs, and the appropriate time intervals, e.g., τ and/or T, are input. More detailed expressions for the unavailability can be applied for special situations such as when component testing is staggered. FT software documentation frequently describes these other expressions.
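The two unavailability expressions can be sketched in Python as follows. The function names and the pump and battery parameter values are hypothetical; the standby function implements the first-term-plus-second-term form of Equation 7.6 described in the note above.

```python
import math


def unavailability_operating(lambda_op, repair_time):
    """Equation (7.5): q = lambda_0 * tau / (1 + lambda_0 * tau)."""
    x = lambda_op * repair_time
    return x / (1.0 + x)


def unavailability_standby(lambda_sb, test_interval, lambda_op=0.0, op_time=0.0):
    """Equation (7.6): standby unavailability plus operational failure.

    First term:  (1/2) lambda_s T / (1 + (1/2) lambda_s T)
    Second term: 1 - exp(-lambda_0 * tau), failure while running
    """
    x = 0.5 * lambda_sb * test_interval
    return x / (1.0 + x) - math.expm1(-lambda_op * op_time)


# Hypothetical operating pump: 1e-4 failures/h, 10 h average repair time.
q_pump = unavailability_operating(lambda_op=1e-4, repair_time=10.0)

# Hypothetical standby battery: tested every 720 h, must then run for 8 h.
q_battery = unavailability_standby(lambda_sb=1e-6, test_interval=720.0,
                                   lambda_op=1e-5, op_time=8.0)

print(f"operating pump  q = {q_pump:.6e}")
print(f"standby battery q = {q_battery:.6e}")
print(f"approximation (1/2)*lambda_s*T + lambda_0*tau = "
      f"{0.5 * 1e-6 * 720.0 + 1e-5 * 8.0:.6e}")
```

For values this small, the simplified forms on the right-hand sides of Equations 7.5 and 7.6 agree with the full expressions to well within one percent.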

The fourth and last type of probability data that may be required is a pure event probability. A pure event probability is also sometimes called a probability per act or probability per demand. A pure event probability is not decomposed into more basic parameters and can be input for any event. It generally is only input for an event for which a failure rate or occurrence rate per unit time is not recorded. Examples of events for which pure event probabilities are generally input are human errors (input as a probability per demand or per act) and pivotal events, which are generally conditional probabilities. Another example of an event probability would be the probability of a relief valve failing to lift once demanded (i.e., once the system pressure exceeds the lift pressure). Some components have both a per-demand failure probability and a per-hour failure rate. For these components, the demand failure probability needs to be added to the component failure probability computed from the per-hour failure rate.
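For a component with both a demand contribution and a time-related contribution, the addition described above can be sketched in Python. The function name and the numerical values are hypothetical. The simple sum follows the text; it slightly overestimates the probability of the union of the two failure modes, consistent with the rare event approximation.

```python
import math


def total_failure_probability(p_demand, rate_per_hour, hours):
    """Demand failure probability plus time-related failure probability.

    p_demand      -- pure event probability of failure per demand
    rate_per_hour -- failure rate while operating (per hour)
    hours         -- required operating time after the demand
    """
    p_run = -math.expm1(-rate_per_hour * hours)  # 1 - exp(-lambda * t)
    return p_demand + p_run


# Hypothetical component: fails to start with probability 1e-3,
# then must run for 24 h at 1e-5 failures per hour.
p_total = total_failure_probability(p_demand=1e-3, rate_per_hour=1e-5, hours=24.0)
print(f"total failure probability: {p_total:.6e}")
```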

In document Prepared for NASA Office of Safety and Mission Assurance NASA Headquarters Washington, DC 20546 (Page 96-100)