
The 1-parameter Exponential probability density, cumulative failure, reliability and hazard functions are given in Equations (3), (4), (5) and (6).
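For reference, the standard one-parameter exponential forms are written below with a rate parameter λ. The symbol λ is an assumption, and the source's notation may differ: f(t) = λe^(-λt), F(t) = 1 - e^(-λt), R(t) = e^(-λt), and h(t) = f(t)/R(t) = λ. The following minimal Python sketch evaluates all four functions:

```python
import math

# Rate symbol `lam` is an assumed name for the single exponential parameter.

def exp_pdf(t: float, lam: float) -> float:
    """Probability density f(t) = lam * exp(-lam * t), for t >= 0."""
    return lam * math.exp(-lam * t)

def exp_cdf(t: float, lam: float) -> float:
    """Cumulative failure F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)

def exp_reliability(t: float, lam: float) -> float:
    """Reliability R(t) = 1 - F(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def exp_hazard(t: float, lam: float) -> float:
    """Hazard h(t) = f(t) / R(t), which reduces to the constant lam."""
    return exp_pdf(t, lam) / exp_reliability(t, lam)

lam = 0.002  # hypothetical failure rate (failures per hour)
for t in (0.0, 100.0, 500.0):
    print(t, exp_pdf(t, lam), exp_cdf(t, lam), exp_reliability(t, lam), exp_hazard(t, lam))
```

The constant hazard is the defining "memoryless" property of the exponential model. It is also the one-parameter special case of the Weibull model used later, with shape equal to 1.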

2.5 Fully Masked Datasets

If an observed system failure occurred where all components are suspected of causing the failure, the observation is considered fully masked. No information is available on which components certainly did not cause the failure. Datasets consisting entirely of such observations could be analyzed to determine the underlying component distributions. Arts et al. [1997] describe such research, which was performed as part of this research effort. This concerns research in category N/IN/P/CW/N. Some standard software packages have modules that address this problem, and these are briefly discussed as well. The assumptions commonly made in this research category are described next.


2.5.1 Assumptions

1. The system consists of two components connected in series, so that the system fails as soon as either component fails. Simultaneous failure of the two components at exactly the same time cannot occur.

2. One configuration is tested: $D = 1$, and $\forall i: K_i = K$, $J_i = J = \text{constant}$, $J_{ik} = J_k$.

3. The system configuration d has no more than one component of type k. Then $M_k$ becomes meaningless and $OS_{ik}$ cannot exceed 1. Thus, $M_k = 0$ and $OS_{ik} \le 1$ if $\forall i: J_{ik} = 1$. Since $D = 1$ and $\forall i: J_{ik} = 1$, then $k = j$.

4. All observations are system failures with known failure time $t_i$; therefore, no censored observations were made. However, the cause of failure of each observation is fully masked. Then $S = C = 0$ and $M = N$.

5. The probability of masking is independent of the cause of failure.

6. Component life-lengths are two-parameter Weibull-distributed with shape parameter $\beta_k$ and scale parameter $\alpha_k$; thus $\theta_k = (\alpha_k, \beta_k)$. The location parameter present in three-parameter Weibull distributions is therefore set to 0. This assumption, that components have no “safe life” period in which they cannot fail, is common in reliability problems.

7. Upon repair, systems behave as good as new when returned to field service.

2.5.2 Introduction to Fully Masked Datasets

One of the most commonly used techniques for analyzing failure data resulting from competing failure modes acting on a machine or product is the MLE method (described in more detail later). Researchers have used it to estimate:


• the relationship between life and stress for each failure mode,
• the life distribution at any stress with all modes acting, and

• the life distribution that would result if certain failure modes were eliminated.

Moeschberger and David [1971], as well as Nelson [1982], analyzed failure data from a single population with competing failure modes using the Maximum Likelihood Estimation technique. These methods require that the failure mode (cause) be identified. However, Nelson [1982] also showed how the MLE method applies when the failure modes are not identified.

Zhao and Xie [1994] study the use of the EM algorithm on fully masked system life datasets. Their data concern the number of software faults detected per test period. They conclude that maximum likelihood estimation for superposed non-homogeneous Poisson processes fitted to these data does not yield unique solutions when only system test data are used: without additional information, no maximum likelihood estimates of the parameters of the component lifetime distributions can be obtained. Despite extensive research on the theory of competing risks, the theory has not yet been developed to a stage that allows accurate estimation of component lifetime distribution parameters from fully masked system life datasets.
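The non-uniqueness problem is easiest to see in the simplest possible case. This is not Zhao and Xie's NHPP model. It assumes two exponential components in series with hypothetical rates `lam1` and `lam2`. Each fully masked failure time contributes f1(t)R2(t) + f2(t)R1(t) = (λ1 + λ2)e^(-(λ1+λ2)t), so the likelihood depends only on the sum of the rates. Any split of that sum fits the data equally well:

```python
import math

def masked_loglik_exp(times, lam1, lam2):
    """Log-likelihood of fully masked failure times from two exponential
    components in series (hypothetical rates lam1, lam2). Each time
    contributes f1*R2 + f2*R1, because either component may have failed."""
    total = 0.0
    for t in times:
        r1, r2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
        total += math.log(lam1 * r1 * r2 + lam2 * r2 * r1)
    return total

times = [120.0, 340.0, 75.0, 610.0, 220.0]  # hypothetical failure times
a = masked_loglik_exp(times, 0.001, 0.003)
b = masked_loglik_exp(times, 0.002, 0.002)
c = masked_loglik_exp(times, 0.0035, 0.0005)
print(a, b, c)  # equal up to rounding: only lam1 + lam2 is identifiable
```

In this constant-hazard case, no amount of fully masked system data can separate the two rates.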

2.5.3 New Methods

As part of this research, efforts were made to solve the problem of component reliability estimation from fully masked system life datasets. Several methods were developed and implemented in a C program (see [Kernighan and Ritchie, 1988]). The software was used to test the methods, and preliminary results were published [Arts et al., 1997]. That reference shows the use of the MLE method to estimate the distribution parameters of two Weibull failure mode distributions from uncensored, fully masked system life data. The results were compared with those obtained in experimental runs using techniques developed as part of this research. Accurate estimation of distribution parameters from fully masked system life datasets would help improve machine reliability efficiently, from both the manufacturer's and the user's perspective.

Through the MINESS method of Arts et al. [1997], estimates of both the shape and scale Weibull parameters are obtained after allocating masked data points to distributions. A component failure cause is assigned to each system failure observation; this step can be considered the “unmasking” of masked observations. The allocation is based on the relative magnitudes of the failure densities at the masked system failure times, calculated using assumed parameter values. Note that the MINESS method does not attempt to maximize the number of correct assignments of individual observations to failure modes: a larger number of correct cause allocations does not necessarily result in better component distribution estimates.
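The allocation ("unmasking") step can be sketched as follows. This is a minimal illustration, not the published MINESS implementation. It assumes a deterministic rule in which each failure time goes to the mode with the larger Weibull density under the assumed parameters. The function names and parameter values are hypothetical.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Two-parameter Weibull density with scale alpha and shape beta (t > 0)."""
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-z ** beta)

def allocate(times, theta1, theta2):
    """Assign each masked failure time to failure mode 1 or 2, whichever has
    the larger density there under the assumed theta = (alpha, beta).
    The larger-density-wins rule is an illustrative assumption."""
    labels = []
    for t in times:
        f1 = weibull_pdf(t, *theta1)
        f2 = weibull_pdf(t, *theta2)
        labels.append(1 if f1 >= f2 else 2)
    return labels

theta1 = (100.0, 0.8)    # hypothetical early-failure mode
theta2 = (1000.0, 3.0)   # hypothetical wear-out mode
times = [10.0, 50.0, 700.0, 950.0, 1200.0]  # hypothetical masked failure times
print(allocate(times, theta1, theta2))  # → [1, 1, 2, 2, 2]
```

Failure times must be strictly positive, because a shape below 1 makes the density infinite at t = 0.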

The MINESS method assumes different combinations of parameter values and determines their goodness-of-fit. Estimation accuracy is gauged by the total error sum of squares over all observations, which the method minimizes. Preliminary results on simulated data suggest that this method generates relatively accurate parameter estimates. However, testing has been limited in scale, and censored observations have been ignored. Further details on this research effort can be found in Appendix B, “Fully Masked Data”.

As this effort led to the methods for partially masked and censored datasets described in later chapters, the reader is advised to review the appendix first.

Abernethy [1993] notes that failure modes with high hazard rates early in the life of the system will “cover” failure modes that would have occurred later; these later modes might never be observed because the system fails for other reasons first. The WeibullSmith commercial-off-the-shelf (COTS) software was developed based on his research [Fulton Findings, 1993]. It contains a module called “Bi-Weibull” that attempts to determine component reliability functions from fully masked system failure data. Arts et al. [1997] describe some results of applying this module to simulated datasets (see page 226).
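The covering effect can be quantified with a standard competing-risks identity. The probability that mode a fails before mode b is the integral of f_a(t)R_b(t) over t. The sketch below uses hypothetical Weibull parameters and a simple midpoint rule. It shows that a wear-out mode hidden behind an early-failure mode causes only a few percent of observed system failures. It illustrates Abernethy's point and is not the WeibullSmith algorithm.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Two-parameter Weibull density (scale alpha, shape beta), t > 0."""
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-z ** beta)

def weibull_rel(t, alpha, beta):
    """Weibull reliability R(t) = exp(-(t/alpha)^beta)."""
    return math.exp(-(t / alpha) ** beta)

def prob_first(theta_a, theta_b, t_max, n=100_000):
    """P(mode a fails before mode b) = integral of f_a(t) * R_b(t) dt,
    approximated with the midpoint rule on (0, t_max]. Midpoints avoid
    evaluating the density at t = 0, where it is infinite if shape < 1."""
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += weibull_pdf(t, *theta_a) * weibull_rel(t, *theta_b)
    return total * h

early = (100.0, 0.7)    # hypothetical mode with high early hazard
wear = (1000.0, 3.5)    # hypothetical later wear-out mode
p1 = prob_first(early, wear, 5000.0)  # share of failures caused by the early mode
p2 = prob_first(wear, early, 5000.0)  # share caused by the covered wear-out mode
print(p1, p2, p1 + p2)  # p1 + p2 is close to 1; p2 is only a few percent
```

The two probabilities sum to 1 - R1(t_max)R2(t_max), which is essentially 1 here. This provides a convenient check on the integration.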

ReliaSoft Corporation [1997] developed the COTS software Weibull++. Its reference manual devotes a chapter to “Multiple Population (Mixed Weibull) Analysis”, which stresses the importance of describing each failure mode in such a set mathematically and separately. The method is illustrated with a life test in which systems are observed until failure, inspected to determine which failure mode occurred, and segregated into sets by failure mode. It therefore assumes no censored observations and no masking. The sets are a mixture of Weibull distributions, not the result of competing risks acting on the system. The software uses a modified Levenberg-Marquardt algorithm to perform non-linear regression on a Mixed Weibull distribution.
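The distinction between a mixture and competing risks is visible in the system reliability functions. In a mixture, a fraction p of units follows mode 1 and the rest follow mode 2, so R(t) = pR1(t) + (1 - p)R2(t). In a series system with competing risks, every unit is exposed to both modes, so R(t) = R1(t)R2(t). The sketch below uses hypothetical parameters and does not reflect Weibull++ internals:

```python
import math

def weibull_rel(t, alpha, beta):
    """Weibull reliability R(t) = exp(-(t/alpha)^beta)."""
    return math.exp(-(t / alpha) ** beta)

def mixture_rel(t, p, theta1, theta2):
    """Mixed Weibull: a fraction p of units fails by mode 1, the rest by mode 2."""
    return p * weibull_rel(t, *theta1) + (1 - p) * weibull_rel(t, *theta2)

def competing_rel(t, theta1, theta2):
    """Competing risks (series system): each unit survives only if both modes do."""
    return weibull_rel(t, *theta1) * weibull_rel(t, *theta2)

theta1 = (100.0, 0.8)    # hypothetical early-failure mode
theta2 = (1000.0, 3.0)   # hypothetical wear-out mode
p = 0.3                  # hypothetical fraction of units following mode 1
for t in (10.0, 500.0, 2000.0):
    print(t, mixture_rel(t, p, theta1, theta2), competing_rel(t, theta1, theta2))
```

Because R1R2 ≤ min(R1, R2) ≤ pR1 + (1 - p)R2, the competing-risks system never outlives the corresponding mixture. Fitting one model to data generated by the other is therefore a genuine misspecification.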

2.5.4 Maximum Log-likelihood Estimation Method

Previous research on CRE from fully masked datasets of two-component systems used the general MLE method, with the following approach. A fully masked failure observation of a system of two series-connected components must be the result of a failure due to failure mode (i.e., component type) 1 or 2. Therefore, for each observation two cases exist:

1. The first failure mode caused the failure.
2. The second failure mode caused the failure.

The likelihood contribution under the assumption that the first case applies, $L_{i1}$, can be calculated as shown in Equation (12) by multiplying the reliability function value of the surviving component type 2 by the PDF value of the failing component type 1 at time $t_i$.

$$L_{i1} = f_1(t_i)\,R_2(t_i) \qquad (12)$$
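The two cases are mutually exclusive, so a fully masked observation contributes the sum of both case terms: f1(ti)R2(ti) + f2(ti)R1(ti). The sketch below computes the resulting log-likelihood for independent two-parameter Weibull components, consistent with the assumptions of Section 2.5.1. The function names and data are hypothetical. In practice, the function would be handed to a numerical optimizer.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Two-parameter Weibull density (scale alpha, shape beta), t > 0."""
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-z ** beta)

def weibull_rel(t, alpha, beta):
    """Weibull reliability R(t) = exp(-(t/alpha)^beta)."""
    return math.exp(-(t / alpha) ** beta)

def masked_loglik(times, theta1, theta2):
    """Log-likelihood of uncensored, fully masked failure times of a
    two-component series system. Case 1 contributes f1(t) * R2(t);
    case 2 contributes f2(t) * R1(t); the cases are disjoint, so they add."""
    ll = 0.0
    for t in times:
        case1 = weibull_pdf(t, *theta1) * weibull_rel(t, *theta2)
        case2 = weibull_pdf(t, *theta2) * weibull_rel(t, *theta1)
        ll += math.log(case1 + case2)
    return ll

times = [120.0, 340.0, 75.0, 610.0]  # hypothetical failure times
print(masked_loglik(times, (100.0, 0.8), (1000.0, 3.0)))
```

The likelihood is symmetric in (θ1, θ2), so fully masked data can identify the two parameter sets at most up to a swap of labels.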