Chapter 2 Analytical methodologies

2.2.3. ANN application in prognostic modelling

Artificial neural networks are non-linear, semi-parametric models that have been considered as alternatives to conventional prognostic models in the presence of censoring (Lisboa, 2002). The most common applications of neural networks have been for diagnosis or prognosis, that is, to decide to which class k, k ∈ {0, …, K}, in terms of prognostic risk, an individual belongs, using information on a set of p covariate values x = (x_1, …, x_p). The usual approach is to model the conditional probability of observing an individual in a certain class given the covariates:

p(Y = class level | X = x) = f(x, β) (24)

β is the vector of unknown parameters, which are called “regression coefficients” in statistics and weights in neural networks modelling. In neural networks modelling, the input layer corresponds to the covariates. The hidden units are the result of applying the activation function to a weighted sum of the input units plus a constant (w0). The value of a hidden unit

h_j is given by:

h_j = \varphi\left( w_{0j} + \sum_{i=1}^{p} w_{ij} x_i \right) (25)

where φ is the activation function and w_{0j} is the bias.

The value of the output unit y is calculated by applying another activation function, as follows:

y = f(x, w) = \varphi\left( W_0 + \sum_{j=1}^{r} W_j \, \varphi\left( w_{0j} + \sum_{i=1}^{p} w_{ij} x_i \right) \right) (26)
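Equations (25) and (26) can be traced numerically with a minimal sketch of the forward pass. The logistic choice for φ, the function names and the example weights are assumptions made for illustration, not values taken from the cited work.

```python
import math

def logistic(z):
    """Logistic activation, used here as an assumed choice for phi."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_units(x, w, w0):
    """Eq. (25): h_j = phi(w0[j] + sum_i w[i][j] * x[i])."""
    p, r = len(x), len(w0)
    return [logistic(w0[j] + sum(w[i][j] * x[i] for i in range(p)))
            for j in range(r)]

def network_output(x, w, w0, W, W0):
    """Eq. (26): y = phi(W0 + sum_j W[j] * h_j)."""
    h = hidden_units(x, w, w0)
    return logistic(W0 + sum(W[j] * h[j] for j in range(len(h))))

# Hypothetical example: p = 2 covariates, r = 2 hidden units.
x = [0.5, -1.0]
w = [[0.1, -0.2],   # w[i][j]: weight from input i to hidden unit j
     [0.4, 0.3]]
w0 = [0.0, 0.1]     # hidden-unit biases
W = [1.0, -1.0]     # hidden-to-output weights
W0 = 0.2            # output bias
y = network_output(x, w, w0, W, W0)
```

With a logistic output unit, y always lies strictly between 0 and 1, so it can be read as a class probability in the sense of equation (24).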

A large literature on the use of neural networks for other kinds of classification tasks has been published in the area of medicine (Lisboa, 2002). The application of feed-forward neural networks to survival data has been discussed in recent years and is an extension of the previous equation. Here, the output y_k corresponds to the conditional probability of dying in the kth time interval I_k. Data for the nth individual consist of a vector of covariate values x and a vector y = (y_1, …, y_{k_n}), where k_n indicates the interval I_{k_n} in which the individual died or was censored. Thus y_1, …, y_{k_n − 1} are all zero, and y_{k_n} is equal to 1 if the individual died in I_{k_n} and equal to zero if the individual was censored. This implies that the network has a randomly varying number of output nodes, corresponding to those time intervals in which the individual is at risk.
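The target coding described above can be sketched as follows. The function name `encode_targets` and the 1-based interval index `k_n` are hypothetical, chosen only for illustration.

```python
def encode_targets(k_n, died):
    """Return the target vector (y_1, ..., y_{k_n}) for one individual.

    k_n  : 1-based index of the interval in which the individual died
           or was censored (assumed input format).
    died : True if the individual died in that interval, False if censored.
    """
    if k_n < 1:
        raise ValueError("k_n must be at least 1")
    y = [0] * k_n              # y_1, ..., y_{k_n - 1} are all zero
    y[-1] = 1 if died else 0   # last entry marks death (1) or censoring (0)
    return y

# An individual who died in the third interval uses three output nodes.
y_died = encode_targets(3, died=True)       # [0, 0, 1]
# An individual censored in the second interval uses two output nodes.
y_censored = encode_targets(2, died=False)  # [0, 0]
```

The length of the returned vector differs between individuals, which is the varying number of output nodes referred to in the text.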

Some studies have shown that, by treating the time interval as an input variable in a standard feed-forward network with logistic activation and entropy error function, it is possible to estimate smoothed discrete hazards as conditional probabilities of failure. This proposed artificial neural network (ANN) approach can be applied to the estimation of the functional relationships between covariates and time in survival data, to improve model predictivity in the presence of complex prognostic relationships (Biganzoli, Boracchi, Mariani, Marubini, 1998).
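A minimal sketch of the data layout this implies, assuming each individual is replicated once per interval at risk with the interval index appended to the covariates. The helper names and the example covariates are hypothetical, and the exact preprocessing in the cited work may differ.

```python
import math

def expand_person_period(x, k_n, died):
    """Replicate one individual over the intervals 1..k_n at risk.

    Each row is (covariates + [interval index], event indicator), so the
    time interval enters the network as an ordinary input variable.
    """
    rows = []
    for k in range(1, k_n + 1):
        event = 1 if (died and k == k_n) else 0
        rows.append((list(x) + [k], event))
    return rows

def cross_entropy(targets, hazards):
    """Entropy error summed over rows. The hazards are predicted discrete
    hazards, assumed to lie strictly between 0 and 1."""
    return -sum(t * math.log(h) + (1 - t) * math.log(1 - h)
                for t, h in zip(targets, hazards))

# Hypothetical individual with two covariates who died in interval 3.
rows = expand_person_period([62.0, 1.0], k_n=3, died=True)
targets = [event for _, event in rows]   # [0, 0, 1]
```

A network with a single logistic output trained on such rows with the entropy error returns, for each (covariates, interval) pair, an estimate of the conditional probability of failure in that interval.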

There have been other proposals for analysing survival data using feed-forward neural networks, such as using a network with only one output unit, with the number k of the time interval as an additional input, and taking as output the unconditional probability of dying before t_k rather than the conditional probability (Ravdin, Clark, 1992), (Ravdin, Clark, Hilsenbeck, Owens, Vendely, Pandian, McGuire, 1992). Here, time was entered as a predictor, and each patient had as many entries in the model as the number of intervals during which he or she was alive. The intervals were derived from Kaplan-Meier estimates. Coding time as a covariate introduced some bias. This work was one of the first studies to address the use of neural networks for survival analysis using real clinical data, producing accurate estimates of survival for breast cancer patients and raising the important issue of how to deal with censored data in neural network implementations for survival analysis.

Another form of neural network that can be applied to survival data is the so-called “single time-point model”. Here a single time point t is fixed and the network is trained to predict the t-year survival probability. This approach was used by (McGuire, Tandon, Allred, Chamness, Ravdin, Clark, 1992), (Kappen, Neijt, 1993), (Burke, 1994). The common drawback of these approaches is that they do not allow censored observations to be incorporated. Neither omitting the censored observations nor treating them as uncensored is a valid approach.
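The difficulty that censoring poses for single time-point models can be made concrete with a small sketch. The function `label_at_t` and the follow-up records are hypothetical. `None` marks cases in which the status at t was never observed.

```python
def label_at_t(time, died, t):
    """Binary target 'alive at t' for a single time-point model.

    time : follow-up time (same units as t)
    died : True if follow-up ended in death, False if censored
    Returns 1 (alive at t), 0 (dead by t), or None when the individual
    was censored before t, so the status at t is unknown.
    """
    if died:
        return 0 if time <= t else 1   # death observed at or before t, or after t
    return 1 if time >= t else None    # censored: known alive only up to `time`

# Hypothetical follow-up records: (time in years, died?)
records = [(6.2, False), (3.1, True), (2.0, False), (7.5, True)]
labels = [label_at_t(time, died, t=5.0) for time, died in records]
# labels == [1, 0, None, 1]
```

Dropping the `None` case, or recoding it as 0 or 1, corresponds to the two options described above as invalid.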

Another approach to using neural networks in survival analysis is the use of hierarchical neural networks, which predict survival in a stepwise manner. This approach predicts for the first time interval, then for the second interval, and so on. The system produces a survival estimate for patients at each interval, given relevant covariates, and it is able to handle continuous and discrete variables, as well as censored data. Such networks can predict absolute, cumulative survival as well as instantaneous, conditional survival. Ohno-Machado and colleagues (Ohno-Machado, Walker, Musen, 1995) compared three neural network models for survival analysis in AIDS patients. They concluded not only that the hierarchical neural network models could learn infrequent patterns faster than a non-hierarchical model, but also that they provided better accuracy in predicting death in the cohort studied. However, this approach can lead to inconsistent answers, such as giving a higher predicted probability for death in year 1 or 2 than for death in years 1, 2 or 3.

Another approach is to model conditional probabilities:

p(die in ith interval | survive first i − 1 intervals, x) = g(η_i) (27)

where g is usually the logistic function. A patient dying in the ith interval contributes

log( g(η_i) (1 − g(η_{i−1})) … (1 − g(η_1)) )

to the log-likelihood, and a patient lost to follow-up in the ith interval contributes

log( (1 − g(η_{i−1})) … (1 − g(η_1)) ).

The scores η_1, …, η_k are given by the output of a neural network with k linear outputs.
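The likelihood contributions around equation (27) can be sketched as below, assuming g is the logistic function and that the scores η_1, …, η_k are supplied as a plain list (for example, the k linear outputs of a network). The function names are hypothetical.

```python
import math

def logistic(eta):
    """Assumed choice for g in equation (27)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_lik_contribution(etas, i, died):
    """Log-likelihood contribution of one patient.

    etas : scores eta_1..eta_k (etas[0] is eta_1)
    i    : 1-based interval of death or loss to follow-up
    died : True for death in interval i, False for loss to follow-up
    """
    # Surviving intervals 1..i-1: sum of log(1 - g(eta_m)).
    ll = sum(math.log(1.0 - logistic(etas[m])) for m in range(i - 1))
    if died:
        ll += math.log(logistic(etas[i - 1]))   # death in interval i
    return ll

etas = [0.0, 0.0, 0.0]   # g(0) = 0.5 in every interval
ll_died = log_lik_contribution(etas, i=3, died=True)       # log(0.5 ** 3)
ll_censored = log_lik_contribution(etas, i=3, died=False)  # log(0.5 ** 2)
```

Summing these contributions over all patients gives the quantity to be maximised when fitting the network weights.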

Ripley and Ripley (Ripley, Ripley 1998) compared neural networks with linear methods across several approaches: classification of binary outcomes, 1-year period proportional odds, regression, proportional hazards, Weibull survival and log-logistic survival. For almost all of these approaches, the neural network achieved higher specificity, sensitivity and accuracy than the linear methods.

Delen et al. (Delen, Walker, Kadam, 2005) reported a study in which they developed several prediction models for breast cancer survivability, with particular attention to ANNs. They used a binary categorical survival variable in which survival is coded as “1” and non-survival as “0”. They measured the accuracy of the models, which they reported as very good, and compared it with that of other methods. Their conclusion was that ANNs perform better than linear methods such as logistic regression.

In document Prognostic modelling of breast cancer patients: a benchmark of predictive models with external validation (Page 42-45)