System Failure Data and System Reliability Prediction

A dataset of partially masked and censored system life data contains three types of observations:

1. Censored. The system is taken offline while still in operational condition. In a competing risks situation, i.e. where the components are logically connected in series, this means that none of the components failed.

2. Failure. The system experiences a failure of one of its components. Diagnostic efforts lead to the identification of the faulty component.

3. Masked. The system experiences a failure of one of its components. Diagnostic efforts do not lead to the unique identification of the faulty component. If none of the components in the system could be identified as unfailed, the observation is fully masked. However, if one or more components were determined to be in operational condition when the system was taken offline, the observation is called partially masked.
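The three observation types can be captured in a small record. The following Python sketch is illustrative only: the names `Observation` and `classify`, the integer component labels, and the representation of the diagnostic outcome as a set of candidate components are assumptions introduced here, not part of the source.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Observation:
    """One system life observation (hypothetical layout)."""
    time: float                 # time at which the system was taken offline
    failed: bool                # False means the system was still operational
    candidates: FrozenSet[int]  # components that may have caused the failure


def classify(obs: Observation, n_components: int) -> str:
    """Name the observation type for a series system of n_components."""
    if not obs.failed:
        return "censored"
    if not obs.candidates:
        raise ValueError("a failed system needs at least one candidate component")
    if len(obs.candidates) == 1:
        return "failure"            # faulty component uniquely identified
    if len(obs.candidates) == n_components:
        return "fully masked"       # no component could be ruled out
    return "partially masked"       # some, but not all, components ruled out
```

For a three-component system, a failure at 95 hours with candidate set {0, 2} is classified as partially masked, because component 1 was found to be operational.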

There can be more than one cause for the failure of any equipment or product. Some examples are:

• A semiconductor device can fail at the junction or at a lead,

• A ball bearing can fail because of the failure of a ball, the inner or outer race,

• The lubricating system of a car can fail because of the deteriorated condition of the lubricating oil or due to leakage of the oil through a failing seal.

Most industrial equipment can fail because of the failure of any one of its many constituent components. The failure data of such equipment, if available to product designers or maintenance engineers, is of great use. It can help in identifying the different causes of failure of the product as well as their frequency of occurrence. Consequently, maintenance engineers can use this information to reduce the failure frequency of the equipment in an efficient manner, thereby leading to a more reliable and efficiently operating piece of equipment. In addition, this information can be used to design better devices that use some of the same components in different configurations.

Observations of system failure can play an important role in decision-making during the design phase of new systems. During that period, several alternative prototypes can be designed. The reliability of these alternatives needs to be considered before one of them enters the process development phase, where significant investments in production processes will be made. Estimates of equipment reliability need to be made in order to verify that the alternatives can achieve the system reliability requirements. One way to verify this is by performing a life test. In such a test, a sample of newly designed equipment can be tested and its failure and censoring (unfailed removal) times registered. Statistical techniques can then be applied to determine a statistical model or reliability distribution, hazard rate or other characterization of the equipment population. Although these tests can be performed under normal field conditions, they are generally conducted under extreme conditions, thus accelerating the failure processes and shortening equipment life. Such Accelerated Life Tests (ALT) allow the estimates to be made faster and at lower cost than tests under normal field conditions would. The ALT observations need to be transformed into observations under normal field conditions through some assumed model. This transformation should be such that the resulting observations can serve as a basis for the estimation of equipment reliability under normal field conditions.
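One of the simplest assumed models is a constant acceleration factor, under which every time observed in the accelerated test is multiplied by the same factor to obtain the equivalent time under field conditions. The sketch below illustrates that assumption only. The source does not specify a model, and the function name, the tuple layout and the factor of 8 in the usage note are hypothetical.

```python
from typing import List, Tuple

# Each observation is (time, failed); failed=False marks a censored time.
LifeObservation = Tuple[float, bool]


def to_field_conditions(
    alt_observations: List[LifeObservation],
    acceleration_factor: float,
) -> List[LifeObservation]:
    """Rescale ALT times to field times under a constant acceleration factor.

    Assumption: time under field conditions = factor * time under test
    conditions, for failures and censoring times alike.
    """
    if acceleration_factor <= 0:
        raise ValueError("acceleration_factor must be positive")
    return [(t * acceleration_factor, failed) for t, failed in alt_observations]
```

With a hypothetical factor of 8, a failure at 100 hours and a censored removal at 250 hours in the accelerated test correspond to a failure at 800 hours and a censored removal at 2000 hours in the field. The censoring indicator is carried over unchanged, so the transformed data can be analysed with the same techniques as field data.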

Internal life tests cannot be performed in situations where a sample of the newly designed equipment is unavailable. This is the case when a reliability estimate is needed in advance of prototype development or when such prototypes will not be developed at all. That could occur when building a prototype is expensive, time-consuming or requires resources that are not available. In such cases, the design engineer is forced to predict the reliability of the equipment design, rather than estimate it from test observations.

Design or reliability engineers generally perform the reliability prediction of an equipment design by considering the equipment as a system consisting of many components. System reliability prediction is then achieved by: 1) compiling a bill of materials listing the type and quantity of components, 2) determining their configuration, 3) predicting the component reliabilities in the intended operating environment, and 4) calculating the system reliability [Usher et al, 1990; Usher and Hodgson, 1990].

In the first step, a list of the components that form the system is made. It is relatively easy for a design engineer to produce such a list, as it is a common output of the design process. However, the other steps require more effort to complete.

A piece of equipment or product can be viewed as a system of connected components, any of which can fail. The way components are connected in a system is referred to as its configuration. In a simple system consisting of two components, just two basic configurations are possible: parallel or series. If the components are logically connected in parallel, then the system will function as long as at least one component is in operational state. System failure requires both components to fail. However, if components are logically connected in series, the system will fail as soon as the first of the components fails. In more complicated systems consisting of many components, other configurations, such as x-out-of-y, are possible. Configuration information is readily available from schematic diagrams.
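To make the configurations concrete, the following Python sketch computes system reliability for the series, parallel and x-out-of-y cases. It assumes that components fail independently and, for the x-out-of-y case, that all components share the same reliability. These assumptions and the function names are introduced here for illustration only.

```python
from math import comb, prod
from typing import Sequence


def series_reliability(reliabilities: Sequence[float]) -> float:
    """Series system: every component must survive."""
    return prod(reliabilities)


def parallel_reliability(reliabilities: Sequence[float]) -> float:
    """Parallel system: fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def x_out_of_y_reliability(x: int, y: int, r: float) -> float:
    """System of y identical components that works if at least x work."""
    if not 1 <= x <= y:
        raise ValueError("require 1 <= x <= y")
    if not 0.0 <= r <= 1.0:
        raise ValueError("require 0 <= r <= 1")
    # Sum the binomial probabilities of exactly k working components.
    return sum(comb(y, k) * r**k * (1.0 - r) ** (y - k) for k in range(x, y + 1))
```

Two components with reliabilities 0.9 and 0.8 give a system reliability of 0.72 in series and 0.98 in parallel. The x-out-of-y form contains both as special cases: x = y reproduces the series result and x = 1 the parallel result.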

Once a list of components and their configuration information has been collected, component reliability information is needed in order to perform a system reliability prediction. The calculation of predicted system reliability based on this information can be difficult if the failure processes are interdependent. However, it is much less complicated when independence is assumed. Other simplifying assumptions, such as the series-system assumption, can be made in many cases in the description of the configuration. For complex systems, this assumption is commonly made to make system reliability estimation tractable [Usher and Hodgson, 1990]. Under these assumptions, the calculation of system reliability becomes a relatively easy step [Usher et al, 1991]. Many techniques exist for this estimation process, among them Monte Carlo simulation and Fault Tree Analysis.
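As an illustration of the Monte Carlo approach under the independence and series-system assumptions, the sketch below simulates component lifetimes and takes the system life as the smallest of them. The exponential lifetime distribution, the failure rates and the function names are assumptions made for this example only. The exponential choice is convenient because the simulated value can be checked against a closed form.

```python
import math
import random
from typing import Sequence


def series_reliability_exact(rates: Sequence[float], t: float) -> float:
    """Closed form for independent exponential components in series."""
    return math.exp(-sum(rates) * t)


def series_reliability_mc(
    rates: Sequence[float],
    t: float,
    n_trials: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate P(system life > t) by simulating component lifetimes."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    survived = 0
    for _ in range(n_trials):
        # A series system fails at the first component failure.
        system_life = min(rng.expovariate(rate) for rate in rates)
        if system_life > t:
            survived += 1
    return survived / n_trials
```

For hypothetical failure rates of 0.001, 0.002 and 0.0005 per hour and a mission time of 100 hours, the closed form gives exp(-0.35), approximately 0.705, and the simulated estimate should fall close to that value. The simulation becomes the more useful tool once the lifetimes or the configuration no longer admit a closed form.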

The accurate estimation of component reliabilities under intended operating conditions is arguably the most difficult step in the system reliability prediction process [Usher et al, 1990; Usher and Hodgson, 1990; Usher et al, 1991]. The research efforts described in this document concern methodological support for this step.
