3.4 Model selection (Step 4)
3.4.2 Testing Applicability of Models
The only way to know for certain which model is the most appropriate one for a specific data set is to apply each model to the data set and evaluate which one fits best. This study does not apply all possible models, only the most popular ones found in the literature. It is, thus, possible that a model which is not reviewed in this study would fit a certain data set better. It is, however, not feasible to review all known models. The process of selecting the best model is presented as a flow diagram in Figure 3.4.
[Flow diagram boxes: Data set obtained from Step 3; Literature review on relevant survival models; Relevant survival models; Obtain parameter estimates; Test which models are applicable to data set; Select most accurate one if more than one is applicable; Residual life predictions.]
Figure 3.4: Flow diagram of Step 4.
When more than one model fits the data set, the model which yields the most accurate results should be used. Guidelines are now provided on when the reviewed models are appropriate and when not.
3.4.2.1 PHM and PWP
The PHM is the most popular and most widely used model, yet it is useless if the data being considered do not adhere to the assumptions on which the model is based. The PWP is used when the data being considered fit the proportionality assumption but come from a repairable system. Section 2.4.1 illustrates survival functions that indicate when the PHM is appropriate and when it is not.
A very simple graphical method can also be used: plot log(− log(R(x))) versus log(x), that is, the log of the cumulative hazard, log(H(x)), versus the log of time. The result needs to be a linear plot. When the covariates are stratified, the cumulative hazard is also stratified, and the plotted lines must be parallel to one another for the proportionality assumption to hold.
[Plot axes: log(H(x)) versus log(x).]
Figure 3.5: PHM goodness of fit.
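The log-log check described above can be sketched numerically. The following is a minimal illustration, not the procedure used in this study: it assumes the cumulative hazard of each stratum is estimated with the Nelson-Aalen estimator, and the function names, the tolerance `tol` and the toy failure times are all hypothetical.

```python
import math

def nelson_aalen(times, events):
    """Return a list of (event time, Nelson-Aalen cumulative hazard) pairs.

    times  -- observed times (failure or censoring), all > 0
    events -- True where the time is a failure, False where it is censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve, total = [], 0.0
    for t in event_times:
        failures = sum(1 for ti, e in zip(times, events) if e and ti == t)
        at_risk = sum(1 for ti in times if ti >= t)
        total += failures / at_risk
        curve.append((t, total))
    return curve

def loglog_slope(times, events):
    """Least-squares slope of log(H(x)) against log(x)."""
    pts = [(math.log(t), math.log(H)) for t, H in nelson_aalen(times, events)]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

def strata_look_parallel(strata, tol=0.25):
    """Compare the log-log slopes of all strata; tol is an arbitrary threshold."""
    slopes = {name: loglog_slope(t, e) for name, (t, e) in strata.items()}
    return slopes, max(slopes.values()) - min(slopes.values()) <= tol

# Made-up failure times for two covariate strata (no censoring).
strata = {
    "z_low": ([1.0, 2.0, 3.0, 4.0], [True, True, True, True]),
    "z_high": ([2.0, 4.0, 6.0, 8.0], [True, True, True, True]),
}
slopes, parallel = strata_look_parallel(strata)
```

A numerical slope comparison only complements the plot; the linearity of each line still has to be judged, for example by inspecting the residuals of the fitted lines.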
A plot of the survival function with stratified covariate values can also be used, as shown in Chapter 2. The survival curves of equipment must not cross if the proportionality assumption is to remain valid (Gorjian et al., 2010b).
Another method to check the proportionality assumption is to again plot the log of the system’s cumulative hazard rate, log(H(x)), together with log(H(x)) multiplied by a constant determined by the estimated parameters (stratifying the cumulative hazard rate), versus time. This method is described in detail by Kumar and Klefsjö (1994). The different strata will display whether the proportionality assumption holds or not. The vertical distance between two plotted curves needs to remain roughly equal for the PHM or PWP to fit the data.
[Plot: log(H(x)) curves for the different strata; the vertical distance between the curves is marked "roughly equal".]
Figure 3.6: Testing proportionality assumption.
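The reason the vertical distance should stay roughly equal can be seen from a short derivation. It is a sketch that assumes the usual exponential form of the PHM with time-independent covariates; the baseline hazard $h_0(x)$, the coefficient vector $\gamma$ and the two covariate settings $z_a$ and $z_b$ are assumed symbols that may differ from the notation used elsewhere in this study.

```latex
\begin{align*}
h(x \mid z) &= h_0(x)\, e^{\gamma' z}
  && \text{(assumed PHM form)} \\
H(x \mid z) &= \int_0^x h(u \mid z)\, du = H_0(x)\, e^{\gamma' z} \\
\log H(x \mid z) &= \log H_0(x) + \gamma' z \\
\log H(x \mid z_a) - \log H(x \mid z_b) &= \gamma' (z_a - z_b)
\end{align*}
```

The last line does not depend on x, so under proportionality the curves of two strata are separated by the same vertical distance at every time.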
When the proportionality assumption is valid for a system and the data come from a repairable system, the PWP model is used. The intensity is then used instead of the hazard rate.
3.4.2.2 AFTM
The AFTM assumes that there is a direct relationship between the effect of the covariates and survival times of a system. This model can be seen as a specific case of the PHM. A simple regression test can be done to establish whether the AFTM is an appropriate model.
If the log(H(x)) versus log(x) plots of the different subjects in the study are linear (if the baseline hazard has the Weibull distribution) and parallel to one another, the AFTM is applicable. Because the AFTM is a special case of the PHM, the PHM will then also be applicable. An AFTM will be used when this occurs because it assumes a direct relationship between the covariates and the failure time, rather than between the covariates and the system hazard. Figure 3.7 illustrates graphically what the plot should look like.
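For a Weibull baseline, the linear and parallel pattern can be worked out explicitly. The sketch below assumes a baseline cumulative hazard $H_0(x) = (x/\eta)^{\beta}$, a PHM coefficient $\gamma$, and an AFTM coefficient $\theta$ with the sign convention shown; these symbols and the single scalar covariate $z$ are illustrative assumptions, not notation taken from this study.

```latex
\begin{align*}
H_0(x) &= \left(\frac{x}{\eta}\right)^{\beta} \\
\text{PHM:}\quad H(x \mid z) &= H_0(x)\, e^{\gamma z}
  \;\Rightarrow\; \log H(x \mid z) = \beta \log x - \beta \log \eta + \gamma z \\
\text{AFTM:}\quad H(x \mid z) &= H_0\!\left(x\, e^{-\theta z}\right)
  \;\Rightarrow\; \log H(x \mid z) = \beta \log x - \beta \log \eta - \beta \theta z
\end{align*}
```

In both cases log(H(x)) is linear in log(x) with the same slope $\beta$ for every value of $z$, so the lines are parallel, and the two forms coincide when $\gamma = -\beta\theta$.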
The log of the cumulative hazard could also be plotted against time itself rather than log time. The hazard should then also be stratified. According to Kumar and Klefsjö (1994), the AFTM fits the data when the horizontal distance between the curves is roughly equal. Figure 3.8 illustrates this concept. The AFTM and the PHM will both be applicable in many cases. Should this happen, both models should be trained with all available data points. The model which recreates the data set most accurately should then be used.
[Plot axes: log(H(x)) versus log(x).]
Figure 3.7: AFTM goodness of fit.
[Plot: log(H(x)) curves for the different strata; the horizontal distance between the curves is marked "roughly equal".]
Figure 3.8: Another goodness of fit for AFTM.
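The horizontal-distance check can be approximated numerically once the stratified log(H(x)) curves are available as sampled points. The sketch below is only one possible way to do this, assuming linear interpolation between sampled points; the function names, the 20% tolerance and the toy curves are hypothetical.

```python
def time_at_level(curve, level):
    """Interpolate the time at which a non-decreasing curve reaches `level`.

    curve -- list of (time, log cumulative hazard) pairs sorted by time
    Returns None when the level lies outside the range of the curve.
    """
    for (t0, y0), (t1, y1) in zip(curve, curve[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    return None

def horizontal_gaps(curve_a, curve_b, levels):
    """Horizontal distance between two curves at each requested level."""
    gaps = []
    for level in levels:
        ta = time_at_level(curve_a, level)
        tb = time_at_level(curve_b, level)
        if ta is not None and tb is not None:
            gaps.append(tb - ta)
    return gaps

def roughly_equal(values, rel_tol=0.2):
    """True when every value lies within rel_tol of the mean (arbitrary threshold)."""
    if not values:
        return False
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= rel_tol * abs(mean) for v in values)

# Made-up log(H(x)) curves for two strata, sampled at three times each.
curve_a = [(1.0, -1.0), (2.0, 0.0), (3.0, 1.0)]
curve_b = [(3.0, -1.0), (4.0, 0.0), (5.0, 1.0)]
gaps = horizontal_gaps(curve_a, curve_b, [-0.5, 0.0, 0.5])
aftm_plausible = roughly_equal(gaps)
```

The levels at which the gap is measured should lie inside the range covered by both curves; levels outside that range are skipped rather than extrapolated.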
3.4.2.3 AHM
This model will most likely be an option when the system does not have an initial hazard of zero. It is a key merit of this model to be able to represent a system that has non-zero hazard at time zero. Pijnenburg (1991) maintains that the AHM is often used when the proportionality assumption does not hold.
A simple example of the baseline hazard having a linear shape can be used to illustrate when an AHM is applicable and when a PHM is applicable. The set of covariate values z is stratified to illustrate the effect. Let $\alpha_s = (\alpha_2, \ldots, \alpha_p)'$ and $z_s = (z_2, \ldots, z_p)'$, let $z_1$ be a discrete variable, and assign it two different values ($z^{*}$ and $\hat{z}$). This results in two hazards:
\[
h(x \mid z_i) =
\begin{cases}
h(x) + \alpha_s' z_s + \alpha_1 z^{*} x & \text{if } z_i = z^{*} \\
h(x) + \alpha_s' z_s + \alpha_1 \hat{z} x & \text{if } z_i = \hat{z},
\end{cases}
\tag{3.4.1}
\]
where x is the gap time (time between failures), or the time over which the curves are considered. This stratification will then indicate whether the additive assumption of the hazard is valid. The assumption is verified if, when the two hazards are plotted, they form two parallel lines shifted some constant apart, as indicated in Figure 3.9 (Pijnenburg, 1991). The solid line is for $z_i = z^{*}$ and the dotted line is for $z_i = \hat{z}$.
[Plot: hazard rate h(x|z) versus gap time x; axis scale factor ·10−3.]
Figure 3.9: Checking additivity.
This is the case when the baseline hazard rate is assumed to be linear, as is done in this study. One sure sign that the model cannot be used is a negative returned hazard, because this is not realistic. The model also cannot be used for a system/component that has a hazard of zero at time zero.
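The two screening criteria described above, parallel hazards a constant apart and no negative hazard, can be sketched as a simple numerical check. This is an illustrative sketch only: the function names, the relative tolerance and the linear baseline with its coefficients are hypothetical and are not estimates from this study.

```python
def additive_check(hazard_a, hazard_b, grid, rel_tol=0.05):
    """Return (parallel, non_negative) for two stratified hazard functions.

    parallel     -- the gap between the hazards is roughly constant over the grid
    non_negative -- neither hazard is negative anywhere on the grid
    rel_tol is an arbitrary illustrative threshold on the spread of the gap.
    """
    gaps = [hazard_b(x) - hazard_a(x) for x in grid]
    mean_gap = sum(gaps) / len(gaps)
    parallel = max(gaps) - min(gaps) <= rel_tol * abs(mean_gap)
    non_negative = all(hazard_a(x) >= 0 and hazard_b(x) >= 0 for x in grid)
    return parallel, non_negative

def baseline(x):
    """Hypothetical linear baseline hazard with made-up coefficients."""
    return 1e-3 + 2e-6 * x

# Gap times from 0 to 1000 in steps of 50 (made-up range).
grid = [50.0 * k for k in range(21)]

# Second stratum: the same baseline shifted up by a made-up constant.
parallel, non_negative = additive_check(
    baseline, lambda x: baseline(x) + 5e-4, grid
)
```

A pair of hazards that differ by a multiplicative factor instead of a constant fails the `parallel` test, which is the situation in which a proportional model would be the more natural candidate.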
3.4.2.4 PCM
The PCM operates on the same assumption as the PHM; thus, if the proportionality assumption of the PHM is valid, the PCM is also appropriate (Sun, 2006). This model was developed to overcome the limitations of the PHM, one of which is that it requires a sufficient amount of data. The PCM is, thus, favoured when only a small amount of data is available. This model was developed for repairable systems, but no indication was found that it could not be used for non-repairable systems. Sun (2006) states that the PCM allows the baseline covariate function to be updated according to newly obtained CM and failure data. This prevents the error in the estimated hazard, which stems from the initial estimate of the baseline covariate function, from accumulating as time progresses, as illustrated in Figure 3.10.
[Plot: hazard rate h(x); curves labelled Trending, PCM and Original.]
Figure 3.10: PCM hazard reduces error.
The PCM would, therefore, work very well when applied to equipment in real time, since the covariate function is constantly updated with the new data received. This model will generally be applicable when the PWP and the PHM are applicable. The models should then be trained using the data set, which is then recreated with each model. The model which best recreates the data set is selected as the most appropriate survival model for the specific case.
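The final selection step can be sketched as follows. The study does not prescribe an accuracy measure at this point, so the root mean squared error used here is an assumption, and the function names and the numbers in the example are made up for illustration.

```python
import math

def rmse(observed, recreated):
    """Root mean squared error between observed and recreated values."""
    if len(observed) != len(recreated):
        raise ValueError("observed and recreated must have the same length")
    squared = sum((o - r) ** 2 for o, r in zip(observed, recreated))
    return math.sqrt(squared / len(observed))

def select_model(observed, recreations):
    """Pick the model whose recreated data set has the lowest RMSE.

    recreations -- dict mapping a model name to its recreated values
    Returns (best model name, dict of RMSE per model).
    """
    scores = {name: rmse(observed, values) for name, values in recreations.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up observed failure times and the values recreated by two fitted models.
observed = [100.0, 200.0, 300.0]
recreations = {
    "PHM": [110.0, 190.0, 310.0],
    "PCM": [100.0, 205.0, 295.0],
}
best, scores = select_model(observed, recreations)
```

Any other error measure could be substituted for `rmse` without changing the selection logic, provided the same measure is applied to every candidate model.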