Fault Tree Analysis - State-of-the-Art of Reliability Analysis and Optimisation Techniques

2 Literature Review

2.7 State-of-the-Art of Reliability Analysis and Optimisation Techniques

2.7.4 Fault Tree Analysis

By inspecting the logic between the undesired events that could occur in a system or a mission, Fault Tree Analysis (FTA) allows the root cause of a system or mission failure to be traced back using a systematic top-down approach. Moreover, the probability or frequency of a system or mission failure can be calculated via Boolean algebra. The FTA provides a straightforward and intuitive presentation of the logic between undesired events and is therefore regarded as an effective, systematic, accurate and predictive method for dealing with safety and reliability problems in complex systems, such as the safety issues in a nuclear power plant [71]. A more detailed description of the FTA can be found in [17].

In order to construct a fault tree, the undesired outcome, also known as the ‘Top Event’, should be defined first. The Fault Tree (FT) is then developed based on the various causes of the ‘Top Event’. Following this logic, the tree is continually refined through a series of intermediate events until component failure events, known as basic events, are reached. Here, an intermediate event denotes an event that can be described as a combination of other intermediate events or basic events through logic gates. Once the failure probabilities of the basic events are available, the quantitative analysis of the FT can be conducted to calculate the system failure parameters and event importance measures.

In terms of structure, a fault tree is composed of a variety of events and gates. The ‘Top Event’ and intermediate events are indicated by rectangular boxes, with the corresponding information about the events given in the boxes. Circles are used to represent basic events, labelled with abbreviations or codes standing for failure modes. Basic events cannot be broken down further unless additional information is provided. The relationship between the events is indicated by gates, and gates with different functions are denoted by different symbols. Some events and basic gates that are often used in FTA are listed in Table 2.2 and Table 2.3 [17], respectively, to facilitate understanding.

Once the construction of the fault tree is completed, logical analysis can be conducted. In this process, the cut sets (CSs) are found. A CS is a set of failure events whose simultaneous occurrence leads to the occurrence of the top event. As industrial engineering systems usually have many CSs, the CSs should be simplified by eliminating redundant terms. In order to find the minimal, necessary and sufficient conditions for the occurrence of the top event, the concept of minimal cut sets (MCSs) is introduced. An MCS is a smallest combination of basic events that can cause the undesired top event, i.e. a cut set that is no longer a cut set if any one of its events is removed. Identifying the MCSs is known as the qualitative FT analysis. The FT model is a qualitative model because it shows the logical relations between the events, but it can also be evaluated quantitatively based on the failure probabilities in a system.
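The search for minimal cut sets described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the thesis: the example tree, the gate names (`TOP`, `G1`, `G2`, `G3`) and the function names are hypothetical, only AND and OR gates are supported, and the brute-force expansion is suitable for small trees only.

```python
from itertools import product

# Hypothetical fault tree (not from the source). Each gate maps to
# (gate type, list of inputs); any name that is not a key is a basic event.
TREE = {
    "TOP": ("OR", ["G1", "G2", "C"]),
    "G1": ("AND", ["A", "B"]),
    "G2": ("AND", ["A", "G3"]),
    "G3": ("OR", ["B", "C"]),
}


def cut_sets(node, tree):
    """Return all cut sets of `node` as a list of frozensets of basic events."""
    if node not in tree:  # basic event: it is its own cut set
        return [frozenset([node])]
    gate, inputs = tree[node]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":
        # Any cut set of any input causes the output event.
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # One cut set from every input must occur together.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unsupported gate type: {gate}")


def minimal_cut_sets(node, tree):
    """Remove duplicates and every cut set that contains a smaller cut set."""
    unique = set(cut_sets(node, tree))
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted(minimal, key=lambda cs: (len(cs), sorted(cs)))


if __name__ == "__main__":
    for cs in minimal_cut_sets("TOP", TREE):
        print(sorted(cs))
```

For this hypothetical tree the expansion yields the cut sets {A, B}, {A, B}, {A, C} and {C}. Removing the duplicate and the non-minimal set {A, C}, which contains {C}, leaves the two minimal cut sets {C} and {A, B}.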

Table 2.2 Event symbols

| Event Symbol | Event Name | Meaning of symbol |
| --- | --- | --- |
| Rectangle | Top event or intermediate event | System or component event description. |
| Circle | Basic event | The lowest level fault that cannot be further broken down. It usually represents a component failure mode. |
| Triangle | Transfer symbol | It indicates that this part of the fault tree is developed elsewhere on the fault tree. |

Once the MCSs are identified, quantification of the FT can be conducted using Boolean logic. Selected basic mathematical laws of Boolean algebra are given here as examples. The symbols “.” and “+” are used to represent the logical AND and OR operators respectively [72].

1. Commutative Law:

$A . B = B . A$ (2.4)

$A + B = B + A$ (2.5)

2. Associative Law:

$A . (B . C) = (A . B) . C$ (2.6)

$A + (B + C) = (A + B) + C$ (2.7)

3. Distributive Law:

$(A + B) . (C + D) = A . C + A . D + B . C + B . D$ (2.8)

4. Idempotent Law:

$A + A = A$ (2.9)

$A . A = A$ (2.10)

5. Absorption Law:

$A + A . B = A$ (2.11)
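As a brief worked illustration of how these laws reduce a fault tree expression to its minimal cut sets, consider a hypothetical top event $T = (A + B) . (A + C)$. The expression is an assumption chosen for illustration and is not taken from the source.

```latex
\begin{align*}
T &= (A + B) . (A + C) \\
  &= A . A + A . C + B . A + B . C && \text{distributive law (2.8)} \\
  &= A + A . C + A . B + B . C     && \text{idempotent law (2.10), commutative law (2.4)} \\
  &= A + A . B + B . C             && \text{absorption law (2.11): } A + A . C = A \\
  &= A + B . C                     && \text{absorption law (2.11): } A + A . B = A
\end{align*}
```

The reduced expression shows that this top event has two minimal cut sets, {A} and {B, C}.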

Table 2.3 Gate symbols

| Gate Name | Causal Relation | Valid Number of Inputs |
| --- | --- | --- |
| OR | Output event occurs if at least one of the input events occurs. | ≥ 2 |
| AND | Output event occurs if all input events occur simultaneously. | ≥ 2 |
| NOT | Output event occurs if the input event does not occur. | 1 |
| INHIBIT | Output event only occurs if input event occurs and the condition event exists. | |

It is worth noting that the use of the ‘NOT’ gate in the FT is generally discouraged as it makes the FT non-coherent. Once ‘NOT’ gates are present in a fault tree, they imply that both the failed states and the working states of components can lead to system failure [73]. Traditionally, this is regarded as a sign of poor system design. In addition, ‘NOT’ logic increases the complexity of both qualitative and quantitative analyses because it results in a non-coherent fault tree structure [74]. For non-coherent fault trees, each possible cause of system failure is called a prime implicant set, which is a combination of component failure states and component working states that is both necessary and sufficient to cause the top event.
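The loss of coherence caused by NOT logic can be illustrated with a short sketch. It checks by exhaustive enumeration whether a structure function is monotone, i.e. whether failing an additional component can ever move the system from failed to working. Monotonicity is only one of the usual conditions for coherence, and the two example structure functions and all function names are hypothetical.

```python
from itertools import product


def is_monotone(phi, n):
    """Check that failing one more component never repairs the system.

    `phi` maps a tuple of n component states (1 = failed, 0 = working)
    to the system state (1 = failed, 0 = working).
    """
    for state in product((0, 1), repeat=n):
        for i in range(n):
            if state[i] == 0:
                worse = state[:i] + (1,) + state[i + 1:]
                if phi(worse) < phi(state):
                    return False
    return True


def and_gate(x):
    """Coherent example: the system fails if A and B both fail."""
    return x[0] * x[1]


def a_and_not_b(x):
    """Non-coherent example: the system fails if A fails while B works."""
    return x[0] * (1 - x[1])


if __name__ == "__main__":
    print(is_monotone(and_gate, 2))
    print(is_monotone(a_and_not_b, 2))
```

In the second function the single prime implicant is "A failed and B working", so the failure of B removes the system failure, which is exactly the behaviour that a coherent fault tree excludes.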

In the process of the FTA, importance measures are used as criteria to rank components, basic events, or cut sets based on their contribution to the occurrence of a system failure [75, 76, 77, 78]. Based on the calculated importance values, the top contributors to system unreliability can be readily identified so that system reliability can be improved more effectively. Several component importance measures have been defined based on different interpretations of the concept of component importance. They are briefly described below.

Birnbaum’s Measure of Importance, also known as the criticality function, is defined as the probability that the system is in a critical state for a particular component, meaning that the system will break down if that component fails. The criticality function ($G_i$) for a component i at time t can be expressed as:

$$G_i(q(t)) = \frac{\partial Q_{sys}(t)}{\partial q_i(t)} \qquad (2.12)$$

It is the partial derivative of the system unreliability ($Q_{sys}(t)$) with respect to the failure probability of component i ($q_i(t)$).
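A minimal numerical sketch of Birnbaum’s measure is given below. It assumes statistically independent basic events, in which case the system unreliability is linear in each component failure probability and the partial derivative equals the difference between the system unreliability with component i certainly failed and certainly working. The example system (top event A + B.C), the probabilities and the function names are hypothetical.

```python
from itertools import product


def system_unreliability(top, q):
    """Sum the probabilities of all component state combinations that fail the system.

    `top` maps {component name: failed?} to True if the top event occurs;
    `q` maps component names to failure probabilities (independence assumed).
    """
    names = sorted(q)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        failed = dict(zip(names, states))
        if top(failed):
            p = 1.0
            for name in names:
                p *= q[name] if failed[name] else 1.0 - q[name]
            total += p
    return total


def birnbaum(top, q, component):
    """G_i = Q_sys(q_i = 1) - Q_sys(q_i = 0), valid for independent events."""
    q_failed = {**q, component: 1.0}
    q_working = {**q, component: 0.0}
    return system_unreliability(top, q_failed) - system_unreliability(top, q_working)


def top_event(failed):
    """Hypothetical top event: A fails, or B and C both fail."""
    return failed["A"] or (failed["B"] and failed["C"])


Q = {"A": 0.01, "B": 0.1, "C": 0.2}  # illustrative values only

if __name__ == "__main__":
    for name in sorted(Q):
        print(name, round(birnbaum(top_event, Q, name), 4))
```

For these illustrative values the sketch gives G_A = 0.98, G_B = 0.198 and G_C = 0.099, which agree with the analytical derivatives of Q_sys = q_A + q_B q_C - q_A q_B q_C.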

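The Criticality Measure of Importance ($I_{CM_i}$) additionally takes into account the failure probability of component i itself, on the basis of Birnbaum’s Measure of Importance. It is the probability that the system is in a critical state for component i and that component i has failed, weighted by the system unreliability, which can be obtained by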

$$I_{CM_i} = \frac{G_i(q(t))\, q_i(t)}{Q_{sys}(t)} \qquad (2.13)$$
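A short worked example may help to show how the criticality measure in Equation (2.13) differs from Birnbaum’s measure. The system (top event A + B.C with independent basic events) and the failure probabilities $q_A = 0.01$, $q_B = 0.1$, $q_C = 0.2$ are hypothetical values chosen for illustration.

```latex
\begin{align*}
Q_{sys} &= q_A + q_B q_C - q_A q_B q_C = 0.01 + 0.02 - 0.0002 = 0.0298 \\
G_A &= 1 - q_B q_C = 0.98,      \qquad I_{CM_A} = \frac{0.98 \times 0.01}{0.0298} \approx 0.329 \\
G_B &= (1 - q_A)\, q_C = 0.198, \qquad I_{CM_B} = \frac{0.198 \times 0.1}{0.0298} \approx 0.664 \\
G_C &= (1 - q_A)\, q_B = 0.099, \qquad I_{CM_C} = \frac{0.099 \times 0.2}{0.0298} \approx 0.664
\end{align*}
```

In this example Birnbaum’s measure ranks component A first, whereas the criticality measure ranks B and C above A because A is much less likely to fail.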

The Fussell-Vesely Importance Measure for Cut Sets ($I_C$) ranks the minimal cut sets based on their contribution to the occurrence of the top event. It is the probability of occurrence of minimal cut set i ($P(C_i)$) given that the system has failed, i.e.

$$I_{C_i} = \frac{P(C_i)}{Q_{sys}(q(t))} \qquad (2.14)$$
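Equation (2.14) can be evaluated directly once the minimal cut sets are known. The sketch below assumes independent basic events, computes the top event probability by inclusion-exclusion over the minimal cut sets (practical only when there are few of them), and then ranks the cut sets. The cut sets, probabilities and function names are hypothetical.

```python
from itertools import combinations
from math import prod


def cut_set_probability(events, q):
    """P(C): product of the basic event probabilities (independence assumed)."""
    return prod(q[e] for e in events)


def top_event_probability(mcs, q):
    """Exact Q_sys by inclusion-exclusion over the minimal cut sets."""
    total = 0.0
    for k in range(1, len(mcs) + 1):
        sign = 1.0 if k % 2 else -1.0
        for group in combinations(mcs, k):
            # The joint occurrence of several cut sets is the union of their events.
            total += sign * cut_set_probability(frozenset().union(*group), q)
    return total


def cut_set_importance(mcs, q):
    """I_C_i = P(C_i) / Q_sys for every minimal cut set."""
    q_sys = top_event_probability(mcs, q)
    return {cs: cut_set_probability(cs, q) / q_sys for cs in mcs}


MCS = [frozenset({"A"}), frozenset({"B", "C"})]  # illustrative cut sets
Q = {"A": 0.01, "B": 0.1, "C": 0.2}              # illustrative values only

if __name__ == "__main__":
    for cs, value in cut_set_importance(MCS, Q).items():
        print(sorted(cs), round(value, 3))
```

With these values the cut set {B, C} contributes most to the top event (about 0.671 against about 0.336 for {A}). The values need not sum to one because the cut sets are not mutually exclusive.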

The application of the FTA has been extended to the analysis of phased missions, which are made up of consecutive time intervals, called phases, with distinct objectives. Accordingly, different phased missions will have different failure logic models. The failure of a mission is expressed as the loss of the function of the system during at least one of the phases. Phased missions were first studied by Esary and Ziehms in 1975 [79]. In that study, a mission was regarded as successfully completed only if all phases were completed successfully. In [80], La Band and Andrews developed a method for computing the probability of system failure in each phase by using non-coherent fault trees with NOT gates. This was achieved by combining the causes of system failure by the end of phase 𝑝 with the causes of system success from the start of the mission to the end of phase 𝑝 − 1. An analytical method for obtaining the unreliability of each phase was presented, and the mission unreliability can be obtained by summing the phase unreliabilities. Figure 2.2 shows the fault tree for a system that works successfully from the beginning of the mission to the end of phase 𝑝 − 1 but fails during phase 𝑝. The fault tree of each phase is expressed in terms of the relevant basic failure events that could have occurred in that phase. Assuming that the components are unrepairable, a component found failed in a phase could have failed in any of the phases up to the end of the current phase. This can be expressed using Equation (2.15):

$$A_{i,j} = A_i + A_{i+1} + \cdots + A_j \qquad (2.15)$$

where $A_i$ represents the basic failure event A occurring in phase $i$ and $A_{i,j}$ represents the occurrence of event A in any phase from $i$ to $j$.
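As a hedged illustration of how Equation (2.15) might be quantified: because the components are assumed unrepairable, a given component can fail in at most one phase, so the events $A_i, \ldots, A_j$ are mutually exclusive and their probabilities add. The phase boundaries $t_k$ (phase $k$ running from $t_{k-1}$ to $t_k$), the failure-time distribution $F(t)$ and the constant failure rate $\lambda$ in the last step are notation and assumptions introduced here, not taken from the source.

```latex
\begin{align*}
P(A_{i,j}) &= \sum_{k=i}^{j} P(A_k)
            = \sum_{k=i}^{j} \bigl[ F(t_k) - F(t_{k-1}) \bigr]
            = F(t_j) - F(t_{i-1}) \\
           &= e^{-\lambda t_{i-1}} - e^{-\lambda t_j}
            \qquad \text{if } F(t) = 1 - e^{-\lambda t}
\end{align*}
```

Under these assumptions the probability that the component fails somewhere in phases $i$ to $j$ depends only on the start time of phase $i$ and the end time of phase $j$.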

Figure 2.2 General phase 𝑝 failure fault tree

[...] except that the basic failure modes of the light detection and ranging (LIDAR) system and the camera-based computer vision (CV) system of AGVs were identified by Duran et al. using the FTA in 2013 [15]. In their study, human injury, property damage and vehicle damage were defined as the three top events in the fault trees. The contributions of the LIDAR and CV failures to these top events are illustrated in Figure 2.3 [15]. By further applying probabilistic analysis using a Bayesian Belief Network, they confirmed that the reliability and availability of the AGV LIDAR and CV subsystems are important to system safety. However, their research did not cover all components and subassemblies of the AGVs.

However, both the FTA and FMECA show limitations in practical application. For example, although the top-down analysis of the FTA discloses the causes of failures occurring in a system, the detectability of the ‘failure causes’ is not taken into account. In addition, since the FTA is a deductive top-down method, it is not well suited to identifying all possible basic failure events and the local effects of each failure. By contrast, the FMECA considers the criticality and detectability of ‘failure causes’, but it is geared towards analysing the failure modes of individual components and cannot analyse the reliability of the whole system as the FTA does. This is why the FTA and FMECA are used in combination in the research of this thesis.

In document Enhancing the performance of automated guided vehicles through reliability, operation and maintenance assessment (Page 56-64)