2.3.3 Fitting Models to Survival Data

When fitting survival models to data sets, three approaches are employed: parametric, semi-parametric and non-parametric. The differences between these approaches are briefly discussed below.

2.3.3.1 Parametric

According to Cox (2006), the parametric approach to fitting survival models assumes that the data come from a particular type of probability distribution. For many survival models, this means that a specific functional form of the hazard function is assumed, since the hazard depends on the PDF. Parametric methods can deliver more accurate results than non-parametric and semi-parametric methods, but only if the assumptions made are correct.
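The dependence of the hazard on the PDF can be made explicit with a short worked equation. The symbol h(x) for the hazard and F(x) for the cumulative distribution function are notational assumptions introduced here; f(x) and R(x) denote the PDF and the reliability function.

```latex
\[
R(x) = 1 - F(x) = \int_{x}^{\infty} f(u)\,du,
\qquad
h(x) = \frac{f(x)}{R(x)} = -\frac{d}{dx}\log R(x).
\]
% Example: for f(x) = \lambda \exp(-\lambda x) one obtains
% R(x) = \exp(-\lambda x) and therefore h(x) = \lambda, a constant hazard.
```

Choosing a PDF therefore fixes the functional form of the hazard, which is why a wrong distributional assumption carries over directly into the estimated hazard.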

Anderson and Keiding (2006) argue that, since parametric models depend on their ability to fit the data rather than on a deep physical motivation, it is important to confirm the adequacy of the selected model. Parametric models have the important advantage of simpler estimation and inference based on the likelihood function: the parameter values are estimated by maximizing the likelihood.
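As a minimal sketch of likelihood maximization, consider the simplest case of an exponential model with right-censored observations. Right censoring, the data values and the function names are assumptions made for illustration and are not taken from the study. An observed failure at time t contributes log f(t) = log(rate) - rate * t to the log-likelihood, while a censored unit contributes log R(t) = -rate * t, which yields the closed-form estimate (number of failures) / (total time on test).

```python
import math


def exponential_log_likelihood(rate, times, observed):
    """Log-likelihood of right-censored data under an exponential model.

    times    -- failure or censoring time of each unit
    observed -- 1 if the failure was observed, 0 if the unit was censored
    """
    n_failures = sum(observed)
    total_time = sum(times)
    # Failures contribute log(rate) - rate * t; censored units only -rate * t.
    return n_failures * math.log(rate) - rate * total_time


def exponential_mle(times, observed):
    """Closed-form maximizer of the log-likelihood above."""
    return sum(observed) / sum(times)


# Hypothetical data: five units, two of them still running when observation ended.
times = [5.0, 8.0, 12.0, 3.0, 20.0]
observed = [1, 1, 0, 1, 0]

rate_hat = exponential_mle(times, observed)
print("estimated rate:", rate_hat)
print("log-likelihood at estimate:",
      exponential_log_likelihood(rate_hat, times, observed))
```

For families without a closed-form solution, such as the Weibull, the same log-likelihood would typically be maximized numerically instead.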

2.3.3.2 Non-Parametric

Non-parametric techniques do not rely on any assumption that the data come from a specific distribution, as mentioned by Anderson and Keiding (2006). This means that no assumptions are made about the distribution parameters of the survival data. A non-parametric approach focuses mainly on estimating the regression coefficients of the selected survival model, and no estimation of distribution parameters is necessary. However, not making these estimations leaves the relevant function unspecified (Cox, 1972). This approach is favoured when only a small amount of data is available and when it is suspected that the data might be distributed in an unusual manner.
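The text above does not name a specific estimator. As one widely used example of a non-parametric technique, the Kaplan-Meier product-limit estimator builds the survival curve directly from the ordered failure times without assuming any distribution. The sketch below is illustrative only; the function name and the data are hypothetical.

```python
def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct observed failure time.

    times    -- failure or censoring time of each unit
    observed -- 1 if the failure was observed, 0 if the unit was censored
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all units that share the same time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - failures / n_at_risk
            curve.append((t, survival))
        # Failed and censored units both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical data: three observed failures and two censored units.
km_curve = kaplan_meier([3.0, 5.0, 8.0, 12.0, 20.0], [1, 1, 1, 0, 0])
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored units never cause a step in the curve; they only reduce the number of units at risk for later failure times.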

2.3.3.3 Semi-Parametric

The semi-parametric approach is a compromise between parametric and non-parametric models. Anderson and Keiding (2006) explain that this approach uses the rigid structure of parametric models while accessing some of the flexibility of the non-parametric models. There is no strict definition for semi-parametric models; thus, any model that is not fully parametric is considered semi-parametric.

Each survival model is classified as non-parametric, semi-parametric or fully parametric according to the manner in which it was developed. The models have also been adapted: a model originally developed as non-parametric, for example, may since have been extended by the same or another researcher so that it can be solved semi-parametrically or parametrically (Balakrishnan and Rao, 2004).

Different survival models are suitable for different applications. These applications and the extensions of the models are discussed in Section 2.4, which also indicates when each model is customarily applied. Choosing the model most applicable in a specific case is discussed in Chapter 3. The distribution families generally used in reliability analysis are discussed in the following section.

2.3.3.4 Parametric Families

In an attempt to deliver more accurate results, assumptions are made in the parametric and semi-parametric cases about the distribution of the data. Generally, the hazard is assumed to belong to one of the distributions reviewed below. These distribution families are parametric in the sense that they have parameters that must be estimated to describe the distribution's characteristics.

As Anderson and Keiding (2006) state, reliability analysis in the engineering field relies on several general parametric PDFs, hereafter referred to as distribution families, that are commonly used to describe the failure characteristics of assets. Most survival models require one of these families to represent the distribution of the system or component hazard. The popular PDFs of the different parametric families are listed below:

1. Exponential Distribution

\[ f(x) = \lambda \exp(-\lambda x) \quad \text{for } x \ge 0, \]

with λ known as the rate parameter. This distribution is used when the hazard of a system can be assumed to be constant, which is also why this family is often not applicable. When log(R(x)) plotted against time yields a relatively linear plot, the exponential distribution can be considered an appropriate distribution to use.

2. Gamma Distribution

\[ f(x) = \frac{x^{k-1} \exp\!\left(-\frac{x}{\theta}\right)}{\theta^{k}\,\Gamma(k)} \quad \text{for } x > 0 \text{ and } k, \theta > 0, \]

with k known as shape parameter and θ as the scale parameter. This is a more flexible distribution than most others and can be used to represent a mixture of exponential distributions.

3. Log-normal Distribution

\[ f(x) = \frac{1}{(2\pi)^{0.5}\,\sigma x} \exp\!\left[-0.5\left(\frac{\log x - \mu}{\sigma}\right)^{2}\right] \quad \text{for } x > 0, \]

with µ known as the location parameter and σ as the scale parameter. If the logarithms of the survival times are approximately normally distributed, this is the appropriate distribution to consider.

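4. Log-logistic Distribution

\[ f(x) = \frac{\delta\lambda(\lambda x)^{\delta-1}}{\left[1 + (\lambda x)^{\delta}\right]^{2}} \quad \text{for } x > 0 \text{ and } \delta, \lambda > 0, \]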

with δ known as the shape parameter and λ as the scale parameter. If log[R(x)/(1 − R(x))] versus log(x) is relatively linear, this is likely the distribution that should be used.

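5. Weibull Distribution

\[ f(x) = \frac{\lambda}{\eta}\left(\frac{x}{\eta}\right)^{\lambda-1} \exp\!\left[-\left(\frac{x}{\eta}\right)^{\lambda}\right] \quad \text{for } x \ge 0, \]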

with λ known as the shape parameter and η as the scale parameter. This is a very flexible distribution and has been used extensively to model physical equipment in reliability analysis. If log(− log(R(x))) versus the log(x) is more or less linear, the Weibull distribution is likely to be the appropriate distribution to utilize.
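The linearity checks quoted for the exponential, log-logistic and Weibull families follow from their reliability functions. A short worked derivation, using the same symbols as the PDFs in the list and assuming R(x) = 1 − F(x), is:

```latex
\begin{align*}
\text{Exponential:}\quad
  & R(x) = \exp(-\lambda x)
  && \Rightarrow\quad \log R(x) = -\lambda x, \\
\text{Weibull:}\quad
  & R(x) = \exp\!\left[-\left(\frac{x}{\eta}\right)^{\lambda}\right]
  && \Rightarrow\quad \log\bigl(-\log R(x)\bigr) = \lambda \log x - \lambda \log \eta, \\
\text{Log-logistic:}\quad
  & R(x) = \frac{1}{1 + (\lambda x)^{\delta}}
  && \Rightarrow\quad \log\frac{R(x)}{1 - R(x)} = -\delta \log \lambda - \delta \log x.
\end{align*}
```

In each case the transformed reliability is a straight line in x or log(x), and the slope recovers the rate or shape parameter of the family.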

These distribution families have all been used regularly in the field of survival analysis. They can be used to develop different parametric or semi-parametric versions of the survival models discussed.
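The graphical checks mentioned for the exponential, Weibull and log-logistic families can be sketched numerically. The example below evaluates the exact reliability function of each family on a small grid, applies the corresponding transformation and fits a least-squares line. All parameter values and names are hypothetical; in practice R(x) would be replaced by an estimate obtained from the data, so the points would be only approximately linear.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical evaluation times.
times = [20.0, 40.0, 60.0, 80.0, 120.0, 160.0]
log_times = [math.log(x) for x in times]

# Exponential: log R(x) versus x is linear with slope -rate.
rate = 0.01
r_exp = [math.exp(-rate * x) for x in times]
exp_slope, _ = least_squares_line(times, [math.log(r) for r in r_exp])

# Weibull: log(-log R(x)) versus log(x) is linear with slope equal to the shape.
shape, scale = 1.8, 100.0
r_wei = [math.exp(-((x / scale) ** shape)) for x in times]
wei_slope, wei_intercept = least_squares_line(
    log_times, [math.log(-math.log(r)) for r in r_wei]
)

# Log-logistic: log(R/(1-R)) versus log(x) is linear with slope -delta.
delta, lam = 2.5, 0.02
r_ll = [1.0 / (1.0 + (lam * x) ** delta) for x in times]
ll_slope, _ = least_squares_line(
    log_times, [math.log(r / (1.0 - r)) for r in r_ll]
)

print("exponential slope:", exp_slope)
print("Weibull slope and intercept:", wei_slope, wei_intercept)
print("log-logistic slope:", ll_slope)
```

With estimated rather than exact reliabilities, the quality of the linear fit under each transformation gives a rough indication of which family is plausible.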

2.4 Survival Models

The models chosen for review are those that are most common in the reliability analysis literature, have been validated by preceding literature and can be used with the data sets for this study. These models reappeared consistently in the literature on survival analysis. The first model reviewed is arguably the most popular and has been widely used over the past two decades.