Chapter 1: Introduction to Prognostic Factor Studies
1.4 Modelling Survival Data
1.4.3 Nonparametric Methods
The nonparametric methods described in sections 1.4.3.1 and 1.4.3.2 are included for completeness only and are not used within this thesis.
1.4.3.1 Neural Networks
Faraggi and Simon [33], and others [34-37], have proposed a neural network generalisation of the Cox regression model defined by
$$h(t \mid Z) = h_0(t) \exp\{f(Z, w)\},$$
where
$$f(Z, w) = \sum_{j=1}^{H} \alpha_j \Lambda\Big(\beta_{0j} + \sum_{i=1}^{k} \beta_{ij} Z_i\Big),$$
$\Lambda$ is the logistic activation function and $Z = (Z_1, \ldots, Z_k)$ are the values of the $k$ prognostic factors. The weights $w = (\alpha_j, \beta_{0j}, \beta_{ij})$ can be estimated from the data via maximisation of the partial likelihood, although other optimisation procedures are often used. Although the problem of censoring is satisfactorily solved in this approach, there remain problems with potentially serious over-fitting of the data, especially if the number, $H$, of hidden units is large [38].
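As an illustration of the partial likelihood criterion mentioned above, the following is a minimal sketch of how the risk score of a one-hidden-layer network and the corresponding negative log partial likelihood might be computed. The function names, the argument layout and the assumption of no tied event times are choices made for this example only and are not taken from [33].

```python
import numpy as np

def logistic(u):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-u))

def network_score(Z, alpha, beta0, beta):
    """Risk score of a one-hidden-layer network with no output bias.

    Z: (n, k) covariates; beta: (k, H) input-to-hidden weights;
    beta0: (H,) hidden-unit biases; alpha: (H,) hidden-to-output weights.
    (Shapes and names are assumptions made for this sketch.)
    """
    return logistic(Z @ beta + beta0) @ alpha

def neg_log_partial_likelihood(score, time, event):
    """Negative Cox log partial likelihood, assuming no tied event times."""
    order = np.argsort(-time)                  # latest times first
    score, event = score[order], event[order]
    # Running log-sum-exp gives log of sum of exp(score) over each risk set,
    # i.e. over all patients whose observed time is at least the current one.
    log_risk = np.logaddexp.accumulate(score)
    return -np.sum((score - log_risk)[event == 1])
```

Minimising this quantity over the weights, for example with a general-purpose optimiser, corresponds to maximising the partial likelihood; censored patients contribute only through the risk sets.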
In general, feed-forward neural networks with one hidden layer are universal approximators [39] and can consequently approximate any function defined by the conditional probability that a binary outcome $Y$ is equal to one given the prognostic factors $Z$ with arbitrary precision by increasing the number of hidden units. This flexibility can, however, lead to serious over-fitting, which can be compensated for by introducing some weight decay [40, 41], for example by penalising the log-likelihood with a term $\lambda \sum_i w_i^2$ proportional to the sum of squared weights [38]. The smoothness of the resulting function is then controlled by the decay parameter $\lambda$.
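The weight decay idea can be sketched in a few lines. The example below assumes the common quadratic form of the penalty (the decay parameter times the sum of squared weights) and applies it to a negative log-likelihood that is to be minimised; the function name and arguments are hypothetical.

```python
import numpy as np

def penalised_objective(neg_log_lik, weights, decay):
    """Negative log-likelihood plus a quadratic weight-decay penalty.

    weights: iterable of arrays holding all network weights (assumed layout).
    decay: non-negative decay parameter; 0 recovers the unpenalised criterion.
    """
    penalty = sum(np.sum(np.square(w)) for w in weights)
    return neg_log_lik + decay * penalty
```

Larger values of the decay parameter shrink the weights towards zero and so give a smoother fitted function, at the cost of some bias.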
Another form of neural network that has been applied to survival data is the so-called single time point model [42]. As such models are identical to a logistic perceptron, otherwise known as a common logistic regression model, or to a feed-forward neural network with a hidden layer, they correspond to fits of logistic regression models or their generalisations to survival data. In practice, a single time point $t^*$ is fixed and the network is trained to predict the survival probability at $t^*$. The corresponding model is given by
$$P(T > t^* \mid Z) = \Lambda\{f(Z, w)\},$$
where $T$ is the survival time, $f(Z, w)$ is the output of the network for prognostic factors $Z$ and weights $w$, and $\Lambda$ denotes the logistic function, $\Lambda(u) = 1/(1 + e^{-u})$, and is called the activation function [43].
A common drawback of these naïve approaches is that they do not allow censored observations to be incorporated in a straightforward manner, which is closely related to the fact that they are based on unconditional rather than conditional survival probabilities. Neither omitting the censored observations, as suggested by Burke [44], nor treating them as uncensored [43] is a valid approach; both are a serious source of bias. De Laurentiis and Ravdin [42] and Ripley [41] propose imputing estimated conditional survival probabilities for the censored cases from a Cox regression model. Further work is needed in this area.
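The difficulty with censoring in single time point models can be made concrete with a small sketch. Assuming right-censored data recorded as an observed time and an event indicator (1 for an event, 0 for censoring), the hypothetical function below assigns the binary training label at a fixed time point and marks as missing those patients whose status at that time is unknown.

```python
import numpy as np

def single_time_point_labels(time, event, t_star):
    """Binary labels for survival beyond t_star.

    Returns 1.0 if the patient is known to have survived beyond t_star,
    0.0 if the event occurred at or before t_star, and nan if the patient
    was censored at or before t_star, so the status at t_star is unknown.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    labels = np.full(time.shape, np.nan)
    labels[time > t_star] = 1.0
    labels[(time <= t_star) & (event == 1)] = 0.0
    return labels
```

The missing labels are exactly the cases that are either dropped or treated as uncensored in the naïve approaches, and for which the imputation of conditional survival probabilities has been proposed.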
1.4.3.2 Hierarchical Trees
Hierarchical trees are an approach for nonparametric modelling of the relationship between a response variable and several potential prognostic factors [45-49]. The idea of Classification And Regression Trees (CART), a synonym for different types of tree-based analyses, is to construct subgroups that are internally as homogeneous as possible with regard to the outcome and externally as separated as possible. Hence the method leads directly to prognostic subgroups defined by the potential prognostic factors, which are obtained by a recursive tree building algorithm.
The tree building algorithm produces a binary tree with a set of patients, a splitting rule, and the minimal p-value at each interior node. For patients in the resulting final nodes,
various statistics can be computed such as Kaplan-Meier estimates of event-free survival or hazard ratios with respect to specific references or combined groups.
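As an example of a statistic that might be computed within a final node, the following is a minimal sketch of the Kaplan-Meier estimator for the patients in that node. The function name and the coding of the event indicator (1 for an event, 0 for censoring) are assumptions made for illustration.

```python
import numpy as np

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate at each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                    # still under observation
        events = np.sum((time == t) & (event == 1))    # events exactly at t
        s *= 1.0 - events / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

Applying the function separately to the patients in each final node gives one estimated survival curve per prognostic subgroup.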
Unfortunately, prognostic factors are usually measured on different scales, meaning that restriction to a set of pre-specified cutpoints may be useful to overcome the problem that factors allowing more splits have a higher chance of being selected by the tree building algorithm. Due to multiple testing, the algorithm may also be biased in favour of these factors over binary factors with prognostic relevance [38].
To improve the predictive ability of trees, stabilising methods based on resampling have been proposed [50-54]. However, the results are difficult to interpret, which reduces their value for practical applications.
1.4.4 Comparison of Methods
Although traditional statistical methods such as Cox proportional hazards or logistic regression are easy to perform and routinely available in standard software packages, machine learning methods such as hierarchical trees and neural networks are thought to predict more accurately because of greater model-fitting flexibility [55].
Artificial neural networks are popularly used as universal non-linear inference models. However, they suffer from two major drawbacks. The way they work is hidden because of the distributed nature of the representations they form [56], and this makes it difficult to interpret what they do. Worse still, there are no clearly accepted models of generality which makes it difficult to demonstrate reliability when applied to future data.
Cox proportional hazards models are well suited for regression modelling of survival data. They are simple to fit, can accommodate time-dependent covariates, and make no assumption about the distribution of the lifetimes of the baseline population. Although extensions with time-varying regression coefficients exist, the standard model is not flexible enough to deal with time-varying dynamics of covariate effects [57]. Additionally, the Cox model has the advantage over neural networks of providing some insight into which variables are most influential for prognosis. Nevertheless, the assumptions required by the Cox model are unlikely to be satisfied in every dataset, justifying the use of neural networks in certain cases.
Parametric regression models, such as the exponential, Weibull and Gompertz, may involve stronger distributional assumptions than are appropriate, and inference procedures may not be sufficiently robust to departures from these assumptions [30]. This seems particularly to be the case in medical applications in which only limited experimentation in similar situations may have preceded the study in question, or in which data are recorded by a number of individuals. Parametric models are also less flexible than proportional hazards models [38].
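To make the distributional assumption explicit, the following sketch evaluates the log-likelihood of right-censored data under a Weibull model without covariates. The shape and scale parameterisation, with survival function exp{-(t/scale)^shape}, and the function name are assumptions made for this example; a shape of one gives the exponential model.

```python
import numpy as np

def weibull_log_likelihood(time, event, shape, scale):
    """Log-likelihood of right-censored data under a Weibull model.

    Events contribute the log hazard plus the log survival function;
    censored observations contribute the log survival function only.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    log_hazard = np.log(shape / scale) + (shape - 1.0) * np.log(time / scale)
    cum_hazard = (time / scale) ** shape      # equals minus the log survival
    return np.sum(event * log_hazard - cum_hazard)
```

Every observation enters through a fully specified hazard and survival function, which is why inference can be sensitive when the assumed distribution is wrong.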
Although rank tests, encompassing accelerated failure time models, are derived with certain alternatives in mind for which optimum parametric procedures exist, they generally possess greater robustness than the corresponding parametric tests and are generally less sensitive to outliers [30]. In addition, for testing the null hypothesis, these tests generally involve only a small loss in efficiency compared to the parametric procedure when such a procedure is appropriate. Unfortunately, however, accelerated failure time models are difficult to extend to handle time-varying effects.
1.5 Thesis Outline
This chapter has considered prognostic factor studies and statistical models for prediction. The thesis continues with chapters on epilepsy, methods for identifying prognostic factors for epilepsy, prognostic modelling of time to treatment failure and time to 12-month remission for newly diagnosed patients, prognostic modelling of risk of recurrence for patients with a first seizure only, prognostic modelling of risk of recurrence for patients who withdraw their medication, validation methods for prognostic models and more sophisticated mixture modelling methods. Further descriptions of each chapter can be found in sections 1.5.1 to 1.5.10.
1.5.1 Introduction to Epilepsy
In Chapter 2, the condition of epilepsy will be summarised including descriptions of seizure types, such as simple partial seizures and absence seizures, and classifications, such as focal epilepsy or generalised epilepsy. Methods of identification and diagnosis such as electroencephalogram (EEG) and magnetic resonance imaging (MRI) will also be outlined together with potential treatments including resective surgery, antiepileptic drug treatment and the ketogenic diet.
The chapter will conclude with a literature review of prognostic factor studies in epilepsy, which will highlight the dearth of such studies and provide justification for further research in this area. The clinical background of epilepsy, described in this chapter, will inform the terminology and medical concepts used throughout my thesis (Chapters 4 to 11).