Flexible estimators of hazard ratios for exploratory and residual analysis.

(1)

Flexible Estimators of Hazard Ratios for Exploratory

and Residual Analysis

Thesis submitted for the degree of Doctor of Philosophy

by

Angela Susan Winnett

Department of Statistical Science

University College London

University of London

(2)

ProQuest Number: U641833

All rights reserved

INFORMATION TO ALL USERS

The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript

and there are missing pages, these will be noted. Also, if material had to be removed,

a note will indicate the deletion.

uest.

ProQuest U641833

Published by ProQuest LLC(2015). Copyright of the Dissertation is held by the Author.

All rights reserved.

This work is protected against unauthorized copying under Title 17, United States Code.

Microform Edition © ProQuest LLC.

ProQuest LLC

789 East Eisenhower Parkway

P.O. Box 1346

(3)

Abstract

In analysis of censored survival data extensive use is made of models, in particular the Cox

model, which assume a parametric form for the hazard ratio. The assumed form is often inap

propriate and methods are needed for estimating hazard ratios with greater flexibility. Such

methods could be used for model checking, graphical presentation, and for suggesting more

appropriate parametric models. While in areas such as ordinary linear regression there are

clear and well understood methods for checking models and exploring flexible alternatives, in

hazard ratio estimation corresponding approaches are less well developed and both properties

and application are still unclear.

This thesis develops flexible estimators which have desirable theoretical properties, are useful

in practice and are easy to interpret. The practical methodology is developed through an

approach to estimation using Cox model residuals — flexible hazard ratios are estimated as

extensions of the Cox model, using residuals, adjusted, smoothed, and iterated. Theoretical

results are found for the properties of the iterated estimates, leading to practical guidelines

for exploratory and residual analysis in existing software. The methodology is relevant for

applications in a wide range of areas, but particularly for analysis of clinical data. An appli

cation is presented in an analysis of data from the Medical Research Council’s myelomatosis

(4)

Acknowledgements

This PhD was funded by the Im perial Cancer Research Fund and I am very grateful for the

generous support.

I would like to th an k my supervisor P eter Sasieni. Very m any thanks for all the help and inspiration

— particularly when there were so much more im portant things to do.

I am grateful to Jack Cuzick for m any useful comments and suggestions, and to Vern Farewell at

UCL.

M any m any thanks to Rob Edwards for sorting out all my com puting problems, and for answering

all my stupid questions and generally knowing everything. Also thanks to Ina Dan and M arion

Ware a t UCL.

I am grateful to G razia Valsecchi and Stefania Galim berti and the In stitu te di Tumori in Milan.

My visits there were not only very enjoyable and useful bu t also very m otivating.

T hanks to everyone in M aths, Statistics & Epidemiology a t IC R F for being nice to work with,

(5)

C ontents

L ist o f F ig u res 9

L ist o f T ables 13

1 In tr o d u c tio n 15

1 An example in ordinary r e g r e s s io n ... 15

2 N on-param etric and residual analysis in hazard e s tim a tio n ... 17

3 Context and notation ... 19

4 Hazard ratios and partial likelihood ... 21

5 C onstraints on non-par am etric estim ates — s m o o th in g ... 22

6 Models for dependence on tim e and co v a ria te s... 23

7 Contents and outline of t h e s i s ... 24

2 R e v ie w o f re sid u a l a n a ly sis 27 1 In tro d u c tio n ... 27

2 Residuals for the Cox m o d e l... 28

2.1 Crude r e s id u a ls ... 28

2.2 M artingale r e s id u a l s ... 29

2.3 Schoenfeld’s partial r e s i d u a l s ... 29

2.4 Score r e s i d u a l s ... 29

2.5 Generalized re s id u a ls ... 30

2.6 Deviance r e s id u a l s ... 31

3 Assessing the overall fit of the Cox m o d e l ... 31

4 Assessing model adequacy for individuals ... 31

5 Influence r e s i d u a l s ... 32

5.1 A djusted score r e s id u a ls ... 32

5.2 Single measures of case in f lu e n c e ... 32

(6)

6 Assessing the form of the Cox model with respect to c o v a r ia te s ... 33

6.1 S tr a ti f ic a ti o n ... 33

6.2 Arjas p l o t s ... 34

6.3 P lots of crude r e s i d u a l s ... 34

6.4 Sm oothed and transform ed m artingale residuals ... 35

6.5 Cum ulative sums of m artingale re sid u a ls... 36

6.6 C onstructed variable p l o t s ... 37

6.7 Assessing the form of covariate effects with influence re sid u a ls... 37

6.8 H azard s m o o t h i n g ... 38

7 Assessing the proportional hazards a s s u m p t i o n ... 38

7.1 Com paring survival or hazard estim ates in stratified m o d e ls ... 38

7.2 Arjas p l o t s ... 39

7.3 Moving e s tim a te s ... 39

7.4 Including time dependent c o v a r ia te s ... 39

7.5 Aalen p l o t s ... 40

7.6 Schoenfeld r e s i d u a l s ... 40

7.7 Sm oothed adjusted Schoenfeld r e s id u a ls ... 41

7.8 Cum ulative sums of r e s id u a ls ... 42

7.9 Assessing the proportional hazards assum ption with influence residuals . 42 7.10 H azard s m o o t h in g ... 43

8 Inference and goodness of fit tests ... 43

9 C o n c lu s io n s... 43

3 R e v ie w o f n o n -p a r a m etr ic e s tim a te s 45 1 In tro d u c tio n ... 45

2 Hazard estim ation for a homogeneous s a m p l e ... 46

3 E stim ating a constant h a z a r d ... 47

4 N on-param etric proportional hazards m o d e l ... 47

5 Tim e-varying coefficients in the Cox model ... 50

6 Aalen’s additive hazards m o d e l ... 51

7 M ultiplicative models for the hazard r a t i o ... 52

8 Fully non-param etric estim ation ... 52

9 C o n c lu sio n s... 54

4 N o n -p a r a m e tr ic e s tim a te s for h azard ra tio s 55 1 In tro d u c tio n ... 55

(7)

2.1 E s t i m a t i o n ... 56

2.2 Iterative m ethods for finding e s tim a te s ... 58

2.3 C o n siste n c y ... 62

2.4 Sim ulated illu s tra tio n s ... 65

2.5 P roof of theorem 4 . 1 ... 70

2.6 Proof of theorem 4 . 2 ... 77

3 M ultiplicative proportional hazards m o d e l s ... 83

3.1 E s t i m a t i o n ... 83

3.3 Asymptotics ... 88

3.4 Simulated illu s tra tio n s ... 88

4 M ultiplicative hazards m o d e l ... 96

4.1 E s t i m a t i o n ... 96

4.3 Asym ptotics ... 102

5 R e s id u a ls a n d n o n - p a r a m e tr i c e s tim a te s 109 1 In tro d u c tio n ... 109

2 Non-proportional hazards in the Cox m o d e l ... 110

2.1 Residual e s t i m a t e s ... I l l 2.2 Iteration of residual e s tim a te s ... 112

2.3 Theoretical p r o p e r t ie s ... 112

2.4 Im plications for practical a n a ly sis... 116

2.5 F irst-order estim ation and A alen’s additive hazards m o d e l... 117

2.6 C alculation in S -P lu s ... 118

2.7 Further approxim ations for ... 118

3 Non-log-linear form in the Cox model ... 132

3.1 Residual e s t i m a t e s ... 133

3.2 Iteration of residual e s tim a te s ... 134

3.3 Theoretical p r o p e r t i e s ... 134

3.4 Im plications for practical an a ly sis... 136

(8)

3.7 Proofs of theorems 5.2, 5.3 ... 141

4 Non-proportional hazards and non-log-linear form in the Cox m o d e l ... 143

4.1 Residual e s t i m a t e s ... 144

4.2 Iteration of residual e s tim a te s ... 144

4.3 Theoretical p r o p e r t ie s ... 145

4.4 Implications for practical a n a ly sis... 146

4.5 Relationship to m artingale and Schoenfeld residual p l o t s ... 147

4.6 Calculation in S -P lu s ... 148

4.8 An illustration of the effects of using the wrong type of residual analysis . 151 4.9 P roof of theorem 5 . 4 ... 154

6 C o n fid en ce in terva ls 155 1 Introduction . . . . 2 Variance estim ators 155 156 2.1 Variance estim ator for 0 in the Cox model with time-varying coefficients . 156 2.2 Variance estim ator for / in the m ultiplicative proportional hazards model 158 2.3 Variance estim ator for / in the m ultiplicative hazards model ... 159

3 Point wise confidence in te r v a ls ... 161

3.1 Low coverage due to insufficient i t e r a t i o n ... 161

3.2 Low coverage due to sm oothing — u n d e r-s m o o th in g ... 162

4 Global confidence r e g i o n s ... 168

4.1 I l l u s t r a t i o n ... 168

4.2 Global confidence regions using Schefïé’s m e t h o d ... 168

4.3 Illustration continued ... 169

4.4 P roof of equation ( 6 . 1 0 ) ... 170

5 Confidence intervals for the shape of the log hazard ratio ... 172

5.1 I l l u s t r a t i o n ... 172

5.2 Chi-squared confidence r e g i o n s ... 173

5.3 Illustration continued ... 173

5.4 Confidence regions and goodness-of-fit t e s t s ... 174

5.5 Illustration of the ‘power’ of confidence r e g i o n s ... 174

5.6 Visualisation of confidence r e g i o n s ... 175

6 Calculations for variance e s t i m a t e s ... 176

(9)

1 In tro d u c tio n ... 180

2 MRC 4th and 5th myelomatosis t r i a l s ... 181

3 Cox model and residual analysis ... 183

4 A new model based on the residual a n a ly s i s ... 191

A p p e n d ix A S m o o th in g 193 1 In tro d u c tio n ... 193

2 Regression sm oothing — sieve e s t i m a t e s ... 194

3 Polynom ial regression s p l i n e s ... 195

4 Sm oothing splines — roughness penalties ... 197

5 Local and kernel s m o o t h e r s ... 198

6 Sm oothing in two d im e n s io n s ... 198

7 Sm oothing in additive m o d e ls ... 199

A p p e n d ix B I te r a tiv e m e th o d s for so lv in g n o n -lin ea r eq u a tio n s 201

A p p e n d ix C B a c k fittin g an d d ia g o n a lized ite r a tio n s 204

A p p e n d ix D S o m e lin ea r algeb ra 209

A p p e n d ix E In verse fu n ctio n th e o r e m 211

A p p e n d ix F P r o o fs o f lem m a s for c o n s is te n c y 216

A p p e n d ix G P r o o f for e s tim a te s from A a le n ’s a d d itiv e h azard s m o d e l 223

A p p e n d ix H S -P lu s co d e and fu n ctio n s 226

I n d e x o f N o ta tio n 241

(10)

List o f Figures

1.1 Scatter plot of sim ulated d a ta with ordinary least squares linear regression line. . 16

1.2 Residuals from linear model with smoothing spline... 16

1.3 Smooth estim ate of E { Y | 17 4.1 Survival distribution sim ulated in example 1... 66

4.2 E stim ated log hazard ratio for example 1 with n = 500... 66

4.3 E stim ated log hazard ratio for example 1 with n = 10 000... 66

4.4 Survival distribution sim ulated in example 2... 67

4.5 Estim ated log hazard ratio for example 2 with n = 500... 68

4.6 E stim ated log hazard ratio for example 2 with n = 10000... 68

4.7 Survival distributions sim ulated in example 3... 69

4.8 E stim ated log hazard ratios for example 3 w ith n = 500... 69

4.9 E stim ated log hazard ratios for example 3 with n = 10000... 70

4.10 Survival distribution sim ulated in example 6 ... 89

4.11 E stim ated log hazard ratio for example 6 w ith n = 500... 90

4.14 Joint distribution of covariates in d a ta set generated from example 8... 92

4.15 E stim ated log hazard ratios for example 8 with n = 500... 92

4.16 E stim ated log hazard ratio for example 8 with n = 10 000... 92

4.18 Log hazard ratio sim ulated in example 9 ... 104

4.19 Grey scale for example 9... 104

4.21 E stim ated log hazard ratio for example 9 with n = 10 000... 105

4.23 Log hazard ratios sim ulated in example 10... 106

(11)

10

4.25 E stim ated log hazard ratio for example 10 with n = 750... 106

4.26 E stim ated log hazard ratio for example 10 w ith n = 10 000... 106

5.1 E stim ated log hazard ratio from weighted sm ooth of adjusted Schoenfeld residuals for example 1 with n = 500... 120

5.2 E stim ated log hazard ratio from weighted sm ooth of adjusted Schoenfeld residuals for example 1 with n = 10 000... 120

5.3 Log partial likelihood and quadratic approxim ation for example 1... 121

5.5 Log p artial likelihood and quadratic approxim ations for example 4 ... 122

5.6 E stim ated log hazard ratio from weighted sm ooth of adjusted Schoenfeld residuals for example 4 with n = 500... 122

5.7 V(^i) and V in example 1, n = 500... 123

5.8 Survival distributions generated in example 5 ... 123

5.9 V(i) and Ÿ in example 5, n = 500... 124

5.10 Various forms of sm oothed Schoenfeld residuals for example 5... 125

5.11 E stim ated log hazard ratio from weighted sm ooth of sim ultaneously adjusted Schoen feld residuals for example 3 with n = 500... 126

5.12 E stim ated log hazard ratio from weighted sm ooth of simultaneously adjusted Schoen feld residuals for example 3 with n = 10000... 127

5.13 E stim ated log hazard ratios for example 3 w ith n = 500 using weighted sm ooth of sim ultaneously adjusted Schoenfeld residuals, weighted sm ooth of individually adjusted Schoenfeld residuals, and sm oothed Schoenfeld residuals... 128

5.14 E stim ated log hazard ratios for example 3 with n = 10000 using weighted sm ooth of sim ultaneously adjusted Schoenfeld residuals, weighted sm ooth of individually adjusted Schoenfeld residuals, and sm oothed Schoenfeld residuals... 129

5.15 E stim ated log hazard ratio from sm oothed adjusted m artingale residuals for exam ple 6 with n = 500... 138

5.16 E stim ated log hazard ratio from sm oothed adjusted m artingale residuals for exam ple 6 with n = 10000... 138

(12)

11

5.20 Estim ated log hazard ratios from weighted sm ooths of adjusted m artingale residuals

and from weighted additive regression of adjusted m artingale residuals for example

8 with n = 10000... 141

5.21 Estim ated log hazard ratio from sm oothed adjusted m artingale difference residuals for example 9 w ith n = 750... 149

5.24 Log hazard ratio sim ulated in example 9 ... 149

5.27 E stim ated log hazard ratio from sm oothed adjusted m artingale difference residuals for example 10 with n = 10000... 151

5.30 Residual plots for d a ta set sim ulated from example... 11... 153

5.31 Residual plots for d a ta set simulated from example 11, using Gox models with transform ed or tim e-dependent covariates... 153

5.32 Weighted sm ooth of adjusted m artingale difference residuals for example 11. . . . 154

6.1 Variance of estim ates of /3 and m ean of variance estim ates in 1000 sim ulated d a ta sets from example 1... 157

6.2 Variance of estim ates of /3 and mean of variance estim ates in 1000 sim ulated d a ta sets from example 2... 158

6.3 Variance of estim ates of /3 and m ean of variance estim ates in 1000 sim ulated d a ta sets from example 4... 158

6.4 Variance of estim ates of / and m ean of variance estim ates in 1000 sim ulated d a ta sets from example 6... 159

6.5 Variance of estim ates of / and m ean of variance estim ates in 1000 sim ulated d a ta sets from example 9... 160

6.6 Mean of estim ated log hazard ratio in 1000 sim ulated d a ta sets from example 4 with n = 500, and sim ulated log hazard ratio (dashed line)... 163

6.7 M ean of estim ated log hazard ratio in 1000 sim ulated d a ta sets from exam ple 2 with n = 500 w ith the sim ulated log hazard ra tio ... 164

(13)

12

6.9 E stim ated log hazard ratio for example 2 with n = 500, with confidence intervals

and under-sm oothed confidence intervals... 166

6.10 Estim ated log hazard ratio for example 6 with n = 500, with confidence intervals

and under-sm oothed confidence intervals... 166

6.11 E stim ated log hazard ratio for 7th simulated dataset from example 1 with approx

im ate 95% confidence intervals... 169

6.12 E stim ated log hazard ratio for 7th sim ulated dataset from example 1 with approx

im ate 95% global confidence region... 170

6.13 E stim ated log hazard ratio for 2nd sim ulated dataset from example 1 w ith approx

im ate 95% global confidence region... 172

6.14 One hundred functions /? in the 95% confidence region (6.13) for the 2nd sim ulated

d a ta set from example 1 ... 176

6.15 One hundred functions (5 in the 95% confidence region (6.13) for the sim ulated d a ta

set from example 2 ... 177

7.1 Kaplan-M eier estim ates of overall survival in the two treatm ent groups, ABCM and

not ABCM ... 181

7.2 Correlated covariates log2(s/52m) and log2 (serum creatinine)... 183

7.3 F irst iteration residual estim ates of log hazard ratios in Cox model with tim

e-varying coefficients with approxim ate 95% pointwise confidence intervals... 185

7.4 F irst iteration residual estim ates of log hazard ratios in m ultiplicative proportional

hazards model w ith approxim ate 95% pointwise confidence intervals... 186

7.5 E stim ated log hazard ratio for log2 (serum creatinine) from sm oothed unadjusted

Schoenfeld residuals... 189

7.6 F irst iteration estim ate of log hazard ratio for log2(s^2m) from weighted sm ooth of

adjusted m artingale difference residuals... 190

7.7 Grey scale for figure 7.6... 190

C .l Iterations of N ew ton’s m ethod, backfitting and diagonalized iteration for finding

least squares estim ates from sim ulated d a ta w ith p = 2... 207

C.2 Iterations of N ew ton’s m ethod, backfitting and diagonalized iteration for finding

(14)

13

List o f Tables

6.1 P roportion of pointwise confidence intervals which include sim ulated value of fi{t)

in 1000 sim ulated d a ta sets from example 1 with n = 500... 161

6.2 P roportion of pointwise confidence intervals which include sim ulated value of f { t , z)

6.3 P roportion of pointwise confidence intervals which include sim ulated value of P{t)

6.4 Proportion of pointwise confidence intervals which include sim ulated value of /3{t)

in 1000 sim ulated d a ta sets from example 2 w ith n = 500... 164

6.5 Proportion of pointwise confidence intervals which include sim ulated value of / ( z )

in 1000 sim ulated d a ta sets from example 6 w ith n = 500... 165

6.6 P roportion of pointwise confidence intervals with undersm oothing which include

sim ulated value of P{t) in 1000 sim ulated d a ta sets from example 2 w ith n = 500. 167

6.7 Proportion of pointwise confidence intervals w ith undersm oothing which include

sim ulated value of /3{t) in 1000 sim ulated d a ta sets from example 6 with n = 500. 167

6.8 P roportion of pointwise confidence intervals which include sim ulated value of P{t)

for all t in the range of estim ation in 1000 sim ulated d a ta sets from example 1. . 168

6.9 Proportion of global confidence intervals which include sim ulated value of /3{t) for

all t in th e range of estim ation in 1000 sim ulated d a ta sets from example 1. . . . 170

6.10 P roportion of confidence regions (6.13) which include sim ulated function (3 in 1000

sim ulated d a ta sets from example 1... 174

6.11 P roportion of confidence regions which do not include any constant (3 in 1000 sim

ulated d a ta sets from example 1... 175

7.1 Sum m ary of prognostic variables... 182

7.2 Cox proportional hazards model ... 184

7.3 Minimum values of (/3^ - /5)^(T^(/3^)“ (/3^ - /3) for constant indicating inclusion

(15)

14

7.4 Minimum values of (/^ - - / ) for linear / , indicating inclusion in

confidence regions as in (6.13)... 186

7.5 Cox model with tim e-dependent and transform ed covariates resulting from residual

(16)

15

C hapter 1

In trod u ction

Analysis of censored survival data, for example from a clinical trial, typically involves estim at

ing association between covariates and the hazard function. The Cox model (Cox, 1972) allows

estim ation of the hazard ratio as a param etric function of covariates. Although th e Cox model

is sem i-param etric in the sense th a t the baseline hazard function is unspecified, the hazard ratio

has a fixed param etric form with respect to covariates, and is assumed to be constant in time. As

w ith any param etric model, this assumed form is often inappropriate, and m ethods are needed for

checking th e model cissumptions, exploring alternatives, and presenting flexible estim ates. The

next section gives an illustrative example in ordinary regression and the following sections describe

th e aims, approach and context of this thesis, w ith an outline of the content in section 7.

1 A n exam ple in ordinary regression

To illustrate the aims of this thesis, this section describes the simple problem of estim ating the

relationship between the mean of an observable random variable and a continuous covariate.

Typically a regression model is used, with residual analysis to explore w hether a linear or some

other, possibly non-param etric, model for the m ean function is m ost appropriate.

To illustrate this explicitly, d a ta were sim ulated for 500 independent and identically distributed

observations { X , Y ) , X generated from a uniform distribution on [0,1], and Y generated from a

norm al distribution with m ean { X — 1/2)'^I { X > 1/2) and standard deviation 0.1. Figure 1.1

(17)

1. Introduction 16

o

0.0 0.2 0.4 0.6 0.8 1.0

Figure 1.1: S catter plot of simulated d ata with ordinary least squares linear regression line.

o

-a o

o

0.0 0.2 0.4 0.6 0.8 1.0

Figure 1.2: Residuals from linear model with smoothing spline smooth and approxim ate 95%

pointwise confidence intervals. (Note th a t the vertical scale is not the same as in figure 1.1)

The form of the relationship is not clear from the scatterplot, and initially a linear model was

used. A coefficient of 0.21 was estim ated, with standard error 0.02; the estim ated mean is shown

on figure 1.1. Since there is no reason to assume th a t the association is linear, residual analysis

was carried out; figure 1.2 shows the residuals from the fitted linear model plotted against the

covariate. The solid line shows a smoothing spline smooth, with 5 degrees of freedom, and the

dotted lines show approxim ate 95% pointwise confidence intervals for the smooth (see Hastie and

Tibshirani (1990b) for smoothing splines and their standard errors). The residual plot shows the

non-linearity and indicates a more appropriate form. As a result a model might be chosen allowing

different slopes for small and large values of X , and with quadratic term s as well as linear.

Alternatively, a sm ooth but flexible estim ate of the association between the observation Y and

(18)

1. Introduction 17

d

(N d

o

d

(N d

0.0 0.2 0.4 0.6 0.8 1.0

Figure 1.3: Smooth estim ate of E { Y | X ) with approxim ate 95% pointwise confidence intervals.

the d a ta with a sm oothing spline smooth and approxim ate 95% confidence intervals. This figure

also shows the non-linearity, and indicates the underlying form. In fact the sm ooth in figure 1.3

is exactly the same as the smooth in figure 1.2 plus the linear estim ate from the fitted model.

The smooth is a penalized least squares estim ate of the mean, and a consistent estim ator as long

as the underlying mean is a smooth function and the degrees of freedom of the sm oother used to

estim ate it are increased at a suitable rate as the sample size increases (Gu & Qiu, 1994).

These plots and estim ates are all easy to produce in existing statistical packages, in particular

S-Plus, see appendix H.

2 N on-param etric and residual analysis in hazard estim a

tion

The approach to modelling illustrated in section 1 provides simple and intuitive m ethods of explor

ing and estim ating the association between the observation and the covariate. The relationship

between different (param etric and non-param etric) estim ates is clear, results are easy to interpret

and estim ates are easy to calculate in existing software. This thesis is concerned with developing

the corresponding approach to estim ating the relationship between a hazard ratio and covariates

from censored survival data.

F irst, this section describes the problems in this approach when estim ating a hazard ratio from

censored survival d a ta instead of a mean from independent and identically distributed observa

(19)

1. Introduction 18

In addition to any dependence on covariates, a hazard ratio is a function of time, so estim ating the

dependence of the hazard ratio on continuous covariates means estim ating a function of a t leaist

two continuous variables. Therefore a fully non-param etric approach is difficult in practice and

often infeasible. N on-param etric estim ators have been developed in various models which reduce

the dimension of the hazard ratio function by making some assum ptions bu t leaving other aspects

unspecified. As a result there are several ways of estim ating the hazard non-param etrically and it

is difficult to com pare different estim ates and model assumptions. The estim ates are also difficult

to calculate, both in term s of the feasibility of the calculations and the ease with which they can

be found in existing statistical software.

Models such as the Cox model make assum ptions both on the dependence of the hazard on

tim e and on the dependence of the hazard on covariates. Therefore, when checking models and

exploring alternatives, there is more th an one possible alternative and it may not be clear which

of the assum ptions is inappropriate. Residuals from the Cox model are easy to calculate bu t

difficult to interpret. In an ordinary regression model, the system atic variation in the m ean and

the random variation are w ritten as separate components of the observation, V = f { X ) + e where

e has m ean zero, enabling excess system atic variation in the m ean to be analysed using residuals

f { X ) — f { X ) + e. In contrast a hazard function determines both the system atic and random

variation in survival and there is no residual with an obvious interpretation in term s of random

and excess system atic variation. There are several ways of calculating residuals and i t ’s not easy

to see which m ethod should be used, or for what.

In the ordinary regression example in section 1, the two approaches to exploratory analysis for

choosing and checking models (ie. non-param etric estim ation and residual analysis) are basically

the sam e and therefore combine the advantages of each approach. In survival analysis, on the

other hand, the relationship between non-param etric estim ates and residual analysis is not so

clear. N on-param etric estim ates have useful asym ptotic properties, but they are not easy to use

in a practical analysis and they are difficult to interpret in the context of com paring different

models. On the other hand, residuals are easy to calculate and more clearly relate more and less

restrictive models, bu t they have unknown properties; they are not necessarily consistent estim ates

of anything useful and i t ’s not clear w hat they are approxim ating.

This thesis makes use of the relationship between non-param etric estim ation of hazard ratios and

Cox model residual analysis, using residual m ethods to find easy ways of calculating consistent

estim ates, and using the theory for non-param etric estim ates to study the properties of (even

easier) residual plots. The result is estim ators which are easy to calculate, have known and useful

(20)

1. Introduction 19

hazard ratio.

Specifically, the aims of this thesis are

• to review and extend results on consistent non-param etric estim ation of the hazard ratio;

• to find m ethods of calculating consistent non-param etric estim ates th a t are easy to use and

reliable;

• to find m ethods based on residuals available in existing software which approxim ate consis

ten t estim ates w ith useful properties;

• to deduce guidelines for residual analysis and its interpretation.

3 C on text and notation

T he context throughout this thesis will be survival d a ta with right-censoring. Survival tim es for n

individuals will be T i, . . . , T„ which will be equal to either the observed tim e of death or th e time

of censoring; Si is the indicator of Ti being an observed time of death. Individual i has covariate

vector Zi, of length p, which may change over time, although in some places the covariates will

be assum ed to be fixed from time zero. The triples {Zi,Ti,Si) will be independent and identically

distributed.

T he sam ple size n will be assumed to be reasonably big to make exploratory analysis w orth while.

Since the fiexibility of estim ates can only be increased slowly as the sample size increases, very

little fiexibility will be possible if there are less than, say, 100 individuals. Illustrations will be

based on a d a ta set from a study of myelomatosis with ju st over 1000 individuals (chapter 7) and

sim ulated d a ta sets of 500 or 750 individuals.

T he counting process notation of Aalen (1978) will also be used; A^i,. . . , and T i , . . . ,1 ^ are

defined as Ni{t) being equal to 1 if individual i has survival tim e T{ < t and = 1, and zero

otherwise; Yi{t) is 1 if t and zero otherwise. Thus Ni counts the num ber of deaths observed

so far for individual i and Yi indicates whether or not they are at risk. However only one d eath is

allowed per individual, so th a t Ni is always equal to 0 or 1, and individuals are not allowed to be

a t risk a t any tim e after death or censoring.

By this definition Ni is right-continuous with left-hand limits and Yi is left-continuous w ith right-

(21)

1. Introduction 20

also. Let T be the history process for the n individuals, T t consisting of the values of Ni{u), Yi{u)

and Zi{u) for i = 1 , . . . , n and all u < t . Then since Yi and Zi are left-continuous, Yi and Zi are

predictable w ith respect to T .

It is assumed throughout th a t the uncensored survival tim e for individual i has a continuous

distribution with hazard ra te \ i and cum ulative hazard A^, A*(() = Ai(u)du. Also independent

censoring is assumed, in the sense described in Andersen et al. (1991) section II. 1, so th a t the

counting process Ni has intensity process given by

l i m s t io ^ { N i ( t + St) - Ni {t ~ ) | J ^ t -} /S t = Yi{t)Xi{t).

This ensures identifiability of the distribution of Ti in inference from observations on {Ni, Yi), see

for exam ple Fleming & H arrington (1991) page 26. It also means th a t Ni{t) — Yi{u)Xi{u)du is

a local square integrable m artingale — see Andersen et al. (1991) section II.4.

T he h azard function for individual i is and will be assumed to depend on tim e and the covariate

vector for individual i as

Xi{t) = X{t,Zi{t)}.

Note th a t this makes the assum ption th a t if individuals have the same covariate vector a t time t

then they have the same hazard at tim e t. The aim is to estim ate the function A, which has p + 1

variables.

In general it is assumed th a t the survival distributions are continuous so th a t two uncensored

individuals can only have the same survival tim e with probability zero. In practice the m ethods

should be equally useful in d a ta sets with a small num ber of tied survival times, for exam ple due

to rounding. For example ties occur in the myeloma d a ta analysed in chapter 7 when survival

tim es are rounded to whole weeks. For simplicity notation will often apply only to d a ta w ithout

tied survival times; extension to d ata with ties should be fairly obvious.

Let d be the num ber of observed events, which will be indexed by {i), and occur at tim es i(i) <

< ; Z(i) is the covariate for the individual observed to die a t . And let TZ{t) denote the

risk set a t tim e t, 'JZ{t) = {i : Yi{t) = 1}. Then there are two equivalent notations, which will be

convenient in different places; for a function of tim e, / ,

/

Y^f{t)dNi{t) =

YI

f{t{i))]

t=i events (i)

and if f i is a function of tim e for each individual i.

Y,Yi{t)Mt)= YI

(22)

1. Introduction 21

Note this notation assumes th a t there are no ties, a single observed death a t each A further

subscript can be introduced if there are ties, bu t is not used in this thesis.

4 Hazard ratios and partial likelihood

The Cox model (Cox, 1972) formulates the hazard X{t, z) as the product of a baseline hazard Xo{t)

and a hazard ratio. The baseline hazard function Aq is unspecified and is not estim ated, while the

hazard ratio is estim ated according to the param etric form in the Cox model. T hroughout this

thesis a sim ilar form will be used,

X{t, z) = Ao(t) e x p { /(i, z)] (1.1)

where to make the model identifiable, / has some constraint such as / ( ( , 0) = 0 for all t. E stim ates

are developed for the log hazard ratio / , with the baseline hazard Aq rem aining unspecified.

U nder the general model (1.1), Cox’s partial likelihood (Cox, 1975) can be w ritten as

r / f . TT ex p (/(t(i),Z (i))}

observed L t h s «) '

T his can be used w ith various forms for / to estim ate the hazard ratio w ithout estim ating or

restricting the baseline hazard Aq. Thus, although effects to be estim ated m ust necessarily be

restricted by either a param etric form, or some smoothness constraint, the underlying survival

distribution does not affect the estim ation. T he form (1.1) also avoids the problem of negative

hazard estim ates.

One advantage of w riting the hazard as in (1.1) is th a t a m ultiplicative model can be used when

th e dimension of the covariate vector, p, is more th an one,

A(t, z i,...,Z p ) = A o (i)e x p { /i(i,z i) + ■ ■ - + fp{t,Zp)}. (1.2)

In practice, some such model is vital to reduce the dimension of the hazard ratio from p + 1 , and

this form allows the effect of each covariate to act m ultiplicatively on the hazard, which in many

situations may be the m ost biologically plausible. An alternative would be to have th e effects of

covariates acting additively on the hazard function as in Aalen’s model (1989). A lthough this thesis

is based on the m ultiplicative hazards model, m ethods apply even more easily to additive hazards

models (see for example chapter 5 section 2.5). M ethods for choosing between m ultiplicative and

additive hazards have been developed by McKeague & Utikal (1991) and Gr0nnesby & Borgan

(23)

1. Introduction 22

O ther alternatives to (1.1) include accelerated failure tim e models (Kalbfieisch & Prentice, 1980,

chapter 6), (Cox & Oakes, 1984, section 5.2) and param etric survival models such as Weibull

and log-logistic regression models (Kalbfieisch & Prentice, 1980, page 31), (Cox & Oakes, 1984,

section 6.3) and, more generally, the generalized gam m a model (Farewell & Prentice, 1977). Choice

between the sem i-param etric model (1.1) and various param etric models is discussed by M arubini

& Valsecchi (1995, page 290), Gore et al. (1984) and Lin & Spiekerman (1996). Therneau et

al. (1990, appendix 3) discuss the application of model checking m ethods based on m artingale

residuals to param etric models.

E stim ation of the hazard ratio is natural in any context in which the aim is to determ ine the

effect of covariates on survival, ra th e r th an to determ ine the value of the hazard itself. In such a

context estim ation of the hazard X{t,z) for each value of z would in any case need to be followed

by further m ethods to compare hazards for different covariate values. On the other hand, for

clinical interpretation of covariate effects it will norm ally be necessary to eventually estim ate the

baseline hazard function, or ra th e r the cumulative baseline hazard function, as well. This should

in general be possible using an estim ator similar to Breslow’s baseline hazard estim ator for the

Cox model (Breslow, 1974), and is not considered further here.

5 C onstraints on non-param etric estim ates — sm ooth in g

Clearly some constraint is needed when estim ating a continuous function from a finite sample.

To find sensible non-param etric estim ates of continuous functions from finite sam ples, estim ates

are forced to be sm ooth — a num ber of different m ethods of sm oothing are used throughout this

thesis and they are described in appendix A.

T he sm oother the estim ate is made, the lower i t ’s variance will be, bu t th e higher th e bias will be

if the function being estim ated is not really th a t smooth. The aim is to choose a level of sm oothing

th a t keeps both the bias and the variance small, see H astie & Tibshirani (1990b) section 3.3. For

th e purpose of asym ptotics the smoothness should be reduced at a suitable rate as the sam ple size

increases so th a t both th e bias and variance are reduced.

As an alternative to sm oothing, cumulative (or integrated) non-param etric estim ates can be used.

These have the advantage of nicer theoretical properties, in particular faster {y/n) asym ptotic

convergence due to differentiability of the corresponding functional of th e probability m easure

(van der V aart, 1991), and avoid the arbitrariness introduced by choosing a sm oother. They are

(24)

1. Introduction 23

easy to interpret in the context of the form of the relationship between the hazard ra tio and

covariates, and are not very helpful for suggesting and exploring models. The m ethods developed

in this thesis therefore concentrate on estim ating sm oothed hazard ratios and not cum ulative or

integrated hazard ratios.

Asym ptotic considerations govern the ra te a t which the smoothness decreases as the sam ple size

increases. For choosing smoothness for a specific d a ta set, m ethods such as cross-validation might

be used, with the purpose of finding some optim al smoothness depending on the data, as described

in H astie & T ibshirani (1990b) section 3.4. Such m ethods are not developed here and sm oothness

is assum ed throughout to be chosen in advance.

6 M od els for d ependence on tim e and covariates

The log hazard ratios f i { t , z i ) ,. . . , f p { t , Z p ) in (1.2) depend on both tim e and covariates, so non-

param etric estim ation can mean non-param etric in t , in z, or in both t and z. Similarly checking

model assum ptions can mean checking the form with respect to t , checking the form w ith respect

to z, or checking the form with respect to t and z. Therefore throughout this thesis estim ation

will be considered in the context of three different models. Firstly, a model where the log hazard

ratio is param etric (linear) in the covariates, bu t the coefficients are unspecified functions of time,

the ‘Cox model with tim e varying coefficients’,

A(f,z) = Ao(f)exp{/3i(<)zi H h Pp{t)zp}. (1.3)

This model would m ake no assumptions on A if there was only a single binary covariate. It can

be used with a non-linear, but specified, form by first transform ing covariates.

In th e second model, the log hazard ratio is assumed to be constant in time, bu t is an unspecified

function of each covariate, the ‘m ultiplicative proportional hazards m odel’,

A(f,z) = A o(f)exp{ /i(zi) -f • • • -I- f p { z p ) } . (1.4)

Here, hazard ratios th a t are non-constant in tim e can be included, if the form w ith respect to tim e

is specified, using tim e-dependent covariates. Note th a t, confusingly, this model is also called an

additive model due to the additive form in the exponential.

T he th ird model for A assumes nothing but the m ultiplicative form for m ultiple covariates; for

each covariate the log hazard ratio is an unspecified function of both tim e and the covariate, the

‘m ultiplicative hazards m odel’,

(25)

1. Introduction 24

Each model is of course useful in different contexts.

7 C on ten ts and outline o f thesis

The following two chapters are reviews of the literature on residual analysis for th e Cox model

and on non-param etric hazard models. Although there is necessarily considerable overlap between

the two, m ost of the literature approaches the problem either as checking the Cox model, or as

estim ating the hazard or hazard ratio w ithout any use of the Cox model, so the two approaches

are reviewed in separate chapters.

C hapter 4 contains development of non-param etric estim ation in each of the three models (1.3),

(1.4) and (1.5). The Cox-model with time-varying coefficients (1.3) is developed in section 2.

E stim ation is discussed and relevant estim ators from the literature are sum m arised in a common

fram ework in section 2.1, with discussion of iterative m ethods for finding them in section 2.2.

Section 2.3 deals w ith asym ptotics, and contains the main theoretical results for this model —

asym ptotic results are only available in the literature for estim ators th a t would not usually be used

in practice, and here consistency is shown for more useful estim ators using regression sm oothing

and local estim ation. The results are in theorems 4.1 and 4.2 on page 64. E stim ation is illustrated

w ith sim ulated examples in section 2.4.

T he m ultiplicative proportional hazards model (1.4) is developed in section 3 of chapter 4. Again

relevant estim ators from the literature are summ arised in a common framework. Section 3.2 is

concerned with iterative m ethods for finding estim ates; while the usual N ew ton’s m ethod can

involve excessive com putation, theorem 4.3 (page 87) shows th a t an alternative m ethod can be

used which is both simpler and com putationally more feasible — under usual conditions, and

startin g from a suitable initial value, the iteration will arrive at the desired estim ate. Section

3.3 sum m arises results on consistency from the literature, and sim ulated examples are shown in

section 3.4.

Section 4 of chapter 4 develops the m ultiplicative hazards model (1.5). New estim ators are pro

posed in section 4.1, and a feasible iterative m ethod for finding estim ates is found in section 4.2.

Use of the model and presentation of estim ates is illustrated in section 4.4 in sim ulated examples.

C hap ter 5 introduces the use of Cox model residuals and their relation to the non-param etric

models of chapter 4, and the iteration of residual estim ation. C hapter 5 is also divided into

(26)

1. Introduction 25

non-proportional hazards, or the Cox model with tim e-dependent coefficients (1.3). Residual

estim ates, and iterated residual estim ates, are proposed in sections 2.1 and 2.2 — at the first

iteration, the estim ates use a combination of the m ethods in the literature. Section 2.3 contains

theoretical results for the iterated estim ates, specifically th a t under suitable conditions and using

appropriate m ethods, iteration of residual estim ates leads to consistent estim ators of the time-

dependent coefficients, similar to the non-param etric estim ates in chapter 4. Sections 2.4 -2 .7

discuss the im plications of the results for practical residual analysis. It is shown th a t one p articular

m ethod of residual analysis is both easy in existing software (S-Plus) and obtains estim ates with

good, if approxim ate, theoretical properties. Some m ethods th a t are proposed in the recent

lite ratu re are shown to be inferior. These results are illustrated using sim ulations in section 2.8.

Section 3 of chapter 5 gives a similar development for the estim ation of non-log-linear form in the

Cox model, or the multiplicative proportional hazards model (1.4). Estim ates based on (different)

Cox model residuals, and iteration, are proposed in sections 3.1 and 3.2. Again at the first iteration

the estim ates are similar to those suggested in the literature, bu t a new residual m ethod for

m ultiple covariates is introduced in equation (5.14) (page 133). Section 3.3 contains the theoretical

results th a t iteration of residual estim ation leads to the consistent estim ates of chapter 4. The

practical im plications for residual analysis are discussed in section 3.4. Again the theoretical

results show th a t residual analysis as proposed leads to estim ators th a t have desirable theoretical

properties, and are available w ithout further program ming from S-Plus, while some m ethods

recently proposed in the literature have less desirable properties. The m ethods are illustrated in

sim ulations in section 3.6.

Section 4 of chapter 5 develops residual analysis for the im portant situation where neither propor

tional hazards, nor a known functional form for continuous covariates, can be assumed w ithout

checking. This is the most likely situation in m any practical contexts b u t currently hardly any

m ethods are available th a t take this situation into account; those th a t do (see chapter 2) need

developm ent to be useful in practice and involve extensive program m ing and com putational effort.

In section 4.1 a new type of residual estim ate is introduced which can be used to check the Cox

model against a very general alternative, while still presenting inform ation on specifically w hat

alternative is needed. Residual estim ates are related to the non-param etric estim ates of chapter

4, giving theoretical justification for use in a practical analysis. Im plem entation in S-Plus is not

im m ediate for this approach, but still quite reasonable, as discussed in section 4.6. P ractical issues

are discussed w ith sim ulated examples in section 4.7.

C hapter 6 is concerned with confidence intervals and inference for the estim ates of chapters 4

(27)

1. Introduction 26

confidence intervals are m ostly left for future work, with heuristic justification presented in this

chapter. Variance estim ates are reviewed or suggested, discussed, and illustrated in section 2, and

in section 3 the coverage of pointwise confidence intervals is investigated in sim ulation studies.

The problem of low coverage due to bias in sm ooth estim ates is discussed and solutions from

generalized additive models are applied. In section 4 global confidence intervals are developed,

again applying solutions from generalized additive models, while section 5 discusses problem s of

inference for the form of a function and suggests m ethods for visualising global confidence regions.

C hapter 7 is a practical analysis of d a ta from the Medical Research Council’s trials on myeloma.

While these trials have been analysed in the medical literature and the questions they set out

to answer have basically been answered, it is clear th a t there is considerable inform ation to be

found from the d a ta which has not been m ade use of due to a lack of suitable statistical methods.

C hapter 7 presents an analysis of these d a ta using the m ethods developed in this thesis.

Appendices A -E sum m arise general m ethods and results which are used specifically in various

p arts of th e thesis. Since one of the m ain aims of this thesis is to find m ethods of estim ation

which are easily im plemented in existing software, S-Plus code, and where necessary, new S-Plus

(28)

27

C hap ter 2

R ev iew o f residual analysis

1 In trod u ction

T he Cox proportional hazards model (Cox, 1972) has been very widely used to analyse survival

d ata, in p articular in medical applications. Residuals were proposed when the model was first

used, eg. Kay (1977), and have developed a lot since. This chapter gives a review of residual

m ethods developed so far.

T he basic form of the Cox proportional hazards model says th a t the hazard at tim e t for an

individual w ith covariates Z is with

\{t, z) = Xo{t) exp(/)^z) (2.1)

for a constant vector of param eters /3. The log partial likelihood of Cox (1975) is

dNi{t).

^oo / r ^

i = l ”' 0 \ _'-j=l

T he coefficient /3 is usually estim ated by maximizing 1{P)] let P be the m aximum p artial likelihood

estim ate. Also the baseline cumulative hazard function Ao{t) = Ao(u)du is estim ated by the

Breslow estim ate (Breslow, 1972),

" S i

E U

(2.2)

T he p artia l likelihood score function is

(29)

2. Review of residual analysis 28

and the score derivative is

- m = t [ (2.3)

where

5W(t,/3) =

3=1

j = l

= ^ ' £ Y i { t ) Z j ( t ) « ^ e x p { 0 ^ Z j { t ) } , (2.4)

" 3 = 1

and for a column vector z, = z z ^ .

Under the model (2.1) and appropriate conditions, /3 converges asym ptotically to the true /3. Its

variance, X{P)~^, is estim ated by I 0 ) ~ ^ , and Ao(t) converges to Ao{t) (Andersen & Gill, 1982).

T he following section gives a list of the m ain types of residuals th a t have been suggested for the

Cox model; variations of each type have been used for various purposes. Sections 3 - 7 look at

residual analysis for checking various aspects of the model and estim ation.

2 R esiduals for th e Cox m odel

2.1 C rude residuals

The first residuals to be used with the proportional hazards model are based on th e general,

or ‘crude’, residuals of Cox and Snell (Cook & Weisberg, 1982, page 177). T heir use w ith the

proportional hazards model was suggested by Crowley & Hu (1977) and Kay (1977). They are

the fitted values of the cumulative hazard for each individual,

Èi = exp(^^Z<)Âo(Ti) (2.5)

where Aq is the Breslow estim ate (2.2). The residual Ei is equal to the estim ated cum ulative hazard

for individual i evaluated at Ti. The values of the cumulative hazard function, exp(^^Z i)A o(T i),

form a (censored) sam ple from a unit exponential distribution, so under the assumed model, the

estim ated values Ei might be expected to behave like unit exponential variables. They have been

used to assess th e overall fit of the model, as discussed in section 3, and to check the form of the

(30)

having undesirable properties, and being inferior to other types of residual, following (see sections

3 and 6).

2.2 M artin gale residuals

M artingale residuals were first suggested by Barlow & Prentice (1988) as a special case of their

generalized residuals (see section 2.5) and studied more specifically by Therneau, Grambsch &

Flem ing (1990). The m artingale residual Mi is given by

Mi

fOO

= / dNi{u) - yî(u) exp(/3^Zi)dÂo(u) = Si - Èi. Jo

The m artingale residual Mi is then the observed num ber of events for individual i minus the

expected num ber under the fitted model and conditional on the observed covariate value Z*. As

a result they are useful for checking the form of the model with respect to the covariates; several

types of adjusted and transform ed m artingale residuals have also been used, see section 6.

2.3 S ch o en feld ’s partial residuals

Schoenfeld (1982) introduced a type of residuals based on the covariate value corresponding to an

observed event minus the conditional expected value of the covariate under the fitted model;

for observed event (%), with as defined in (2.4). Each residual is a p-vector and there

is a residual for each observed event ra th e r th a n for each individual. In contrast to m artingale

residuals, Schoenfeld’s residuals give the observed minus expected value of the covariate a t the

event times, conditional on the risk sets, and correspondingly they have been used to assess the

form of th e model with respect to time, th a t is, the proportional hazards assum ption, as seen in

section 7.

2.4 Score residuals

A nother type of residual for the Cox model is the score residual . These are defined as

Si = r { Zi(t) - [ d N M - Yi(t)exp{4^Zi(t)}dAo(t)].

(31)

with and defined in (2.4) and where Aq is the Breslow estim ate (2.2) (Therneau et ai,

1990). They are similar to Schoenfeld residuals; each residual Si is a p-vector, bu t they are defined

for individuals not observed events. They are useful for assessing influence, see section 5.

2.5 G eneralized residuals

Barlow and Prentice (1988) suggested a generalized form for residuals for the Cox model, which

includes m artingale residuals, Schoenfeld residuals and score residuals and makes it easier to see

the likely uses and properties of each type of residual. They define a generalized residual for

individual i a t tim e t as

fi{t){dNi{t) - Yi{t) ex p0' ^Z i )dAo{t )} (2.6)

for predictable processes fi, estim ated by fi.

Since dNi{t) and dAo{t) can be non-zero only at observed event times t, this gives a residual for

each individual at each event time — n d residuals. These are then summ ed over the events to

give a generalized m artingale residual for individual i,

rOO

/ (2.7)

Jo

or summed over the individuals to give a residual for (a single) observed event (i).

They can also be summed over both individuals and events a t the same time, or over subsets of

th e individuals or subsets of the events.

If f i is taken to be 1, and summing over the events, then (2.7) is the m artingale residual Mi. If f i is

taken to be Zi and summing over the individuals, then (2.8) is the Schoenfeld residual for event (i).

If /^(i) = Zi{t) — {t, P) / $ ) and summing over the events, then (2.7) is the score residual

Si. From this representation it seems natural to use m artingale residuals (which are generalized

residuals summed over time) to check the form of the model w ith respect to covariates, and to use

Schoenfeld residuals (which axe generalized residuals summed over individuals) to check the form

w ith respect to time. Score residuals include a m easure of the difference between a covariate value

for an individual and an average covariate value, so can be expected to be useful for assessing

(32)

2.6 D ev ia n ce residuals

T herneau, Grambsch & Fleming (1995) also define deviance residuals, m otivated by deviance

residuals from generalized linear models,

di = s ig n (M i)v ^ { -M j - 6^ log(J«

-They are a transform ation of the m artingale residuals which results in a less skewed distribution

and therefore they have been suggested to be used to detect individual outliers, however they have

been little used in practice and have not been found to be particularly useful, see section 4 .

3 A ssessin g th e overall fit o f th e Cox m odel

As m entioned in section 2.1, crude residuals Ei estim ate the cumulative hazards for individuals i

evaluated a t Tj, which form a censored sample from a unit exponential distribution. Consequently,

early developm ents of residual analysis for the Cox model included plotting the survival curve of

the crude residuals as a check of the overall fit of the model (Crowley & Hu, 1977; Kay, 1977).

However Lagakos (1981), Crowley &: Storer (1983) and B altazar-A ban & P ena (1995) show th a t

due to estim ating Aq, and in particular because the function Aq is completely unspecified, the

d istribution of the Ei S is not necessarily w hat would be expected. For example if no covariates

are in the model, Aq can be m ade to ‘fit’ the d a ta exactly; the order statistics of the residuals

are then exactly the order statistics of the (censored) unit exponential distribution, and give no

inform ation about the fit of the model. Furtherm ore even if these plots could indicate th a t the

Cox model is inappropriate, they cannot indicate in w hat way.

4 A ssessin g m odel adequacy for individuals

T herneau, Gram bsch & Fleming (1990) have suggested using deviance residuals to detect individ

ual outliers. A large positive value of the residual indicates an individual dying too soon according

to the model, and a large negative value indicates an individual living too long, while the tra n s

form ation from the m artingale residuals ensures th a t the residuals have a reasonably sym m etric

and even spread if the model applies to each individual (Therneau et al., 1990). However they

have found little use in practice, and in fact indication of individual outliers may be considered of

(33)

form of the model used.

5 Influence residuals

In general regression models influence residuals have been developed using pertu rb atio n of the

model (Cook & Weisberg, 1982); a particularly simple form of influence residual is the jackknife

residuals which measure the difference in the estim ate P due to each observation — /3 —

where is the coefficient from the model fitted w ithout observation i. For the Cox model,

jackknife residuals are not ideal as their calculation involves fitting n + 1 models individually.

Score residuals have been found to approxim ate jackknife residuals, if suitably adjusted, and to

be useful for assessing influence in various forms.

5.1 A d ju sted score residuals

Cain & Lange (1984) and Reid & Crèpeau (1985) suggested scaled score residuals,

îi =

These are derived as approxim ations to the jackknife residuals using a weighted score m ethod by

Cain & Lange and using the empirical influence function by Reid & Crèpeau. Storer & Crowley

(1985) suggest a similar, bu t not identical, approxim ation to A/?i based on one-step augm ented

regression model diagnostics. The jackknife residuals and the two approxim ations are com pared

by T herneau, Grambsch & Fleming (1990). F urther approxim ations are made by P e ttitt & Bin

D aud (1989b).

5.2 S in gle m easures o f case influence

One problem with the adjusted score residuals described above is th a t there is a set of residuals

associated w ith each covariate. Often it is desirable to have a single overall m easure of influence for

each individual. P e ttitt & Bin Daud (1989a) apply the likelihood displacement m ethods of Cook

(1986) for ordinary likelihood models, and with an approxim ation obtain the influence residuals

Sj^îi = —S j ' X 0 ) ~ ^ S i . Again P e ttitt & Bin Daud (1989b) have a further approxim ation. Barlow

(1997) develops this m ethod further to apply to models for multiple failure times and Ccise-cohort

Flexible estimators of hazard ratios for exploratory and residual analysis.