
PhD course: Statistical evaluation of diagnostic and predictive models

Modelling

Thomas Alexander Gerds (University of Copenhagen) March 02-04, 2016


Multiple regression

Multiple regression can be used to exploit the joint predictive power of several or many variables.

Conventional modelling techniques:

- logistic regression for a binary outcome
- Cox regression for a time-to-event (survival) outcome
- Cox regression and Fine-Gray regression in the presence of competing risks

P-values testing the null hypothesis of no association are not a good measure of predictive power.


Example: epo study [1]

Anaemia is a deficiency of red blood cells and/or hemoglobin and an additional risk factor for cancer patients.

Randomized placebo-controlled trial: does treatment with epoetin beta (epo, 300 U/kg) enhance the hemoglobin concentration level and improve survival chances?

Henke et al. 2006 identified the c20 expression (erythropoietin receptor status) as a new biomarker for the prognosis of locoregional progression-free survival.

[1] Henke et al. Do erythropoietin receptors on cancer cells explain unexpected clinical findings? J Clin Oncol, 24(29):4708-4713, 2006.

Treatment

The study includes 149 head and neck cancer patients with a tumor located in the oropharynx (36%), the oral cavity (27%), the larynx (14%) or in the hypopharynx (23%).

One of the treatments was radiotherapy following resection.

            Resection
            Complete   Incomplete   No
Placebo     35         14           25
Epo         36         14           25

Outcome

Blood hemoglobin levels were measured weekly during radiotherapy (7 weeks).

Treatment with epoetin beta was defined successful when the hemoglobin level increased sufficiently. For patient i set

$$Y_i = \begin{cases} 1 & \text{treatment successful} \\ 0 & \text{treatment failed} \end{cases}$$

Exercises

Consider the following subgroup of male patients over 70 years in the active treatment group from the Epo Study:

Treat sex tot.rad.dose HbBase age Y

Epo male 72 16.0 72 1

Epo male 60 12.7 76 1

Epo male 70 9.9 79 0

Epo male 60 10.7 76 0

Compute (by hand without computer) the AUC for discriminating treatment success (Y=1) for:

- baseline hemoglobin level
- age
- total radiation dose

Solutions

Baseline hemoglobin level

100*(I(16>9.9)+I(16>10.7)+I(12.7>9.9)+I(12.7>10.7))/4

[1] 100

Age

100*(I(72>79)+I(72>76)+I(76>79)+0.5*I(76==76))/4

[1] 12.5

Total radiation dose

100*(I(72>70)+I(72>60)+I(60>70)+0.5*I(60==60))/4

[1] 62.5
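The hand calculations above compare every successful patient with every failed patient and count ties as one half. A minimal sketch of the same rule as a reusable function follows; the function name `auc_by_pairs` and the data frame `d` are illustrative and not part of the course material.

```r
# AUC (in percent) as the proportion of (case, control) pairs in which the
# case has the higher marker value; tied pairs count one half.
auc_by_pairs <- function(marker, y) {
  cases    <- marker[y == 1]
  controls <- marker[y == 0]
  # matrix of all pairwise differences: one row per case, one column per control
  diffs <- outer(cases, controls, "-")
  100 * mean((diffs > 0) + 0.5 * (diffs == 0))
}

# The four male Epo patients from the exercise table
d <- data.frame(
  tot.rad.dose = c(72, 60, 70, 60),
  HbBase       = c(16.0, 12.7, 9.9, 10.7),
  age          = c(72, 76, 79, 76),
  Y            = c(1, 1, 0, 0)
)

auc_hb   <- auc_by_pairs(d$HbBase, d$Y)        # 100
auc_age  <- auc_by_pairs(d$age, d$Y)           # 12.5
auc_dose <- auc_by_pairs(d$tot.rad.dose, d$Y)  # 62.5
```

An AUC below 50, as for age, means that in this small subgroup higher values of the marker go with treatment failure rather than success.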

Target

Patient no.   Treatment successful   Predicted probability
1             0                      P₁
2             0                      P₂
3             1                      P₃
4             1                      P₄
5             0                      P₅
6             1                      P₆
7             1                      P₇
...           ...                    ...

Predictors

Age                              min: 41 y, median: 59 y, max: 80 y
Gender                           male: 85%, female: 15%
Baseline hemoglobin              mean: 12.03 g/dl, std: 1.45
Treatment                        epo: 50%, placebo: 50%
Stratum                          complete: 48%, incomplete: 19%, no resection: 34%
Erythropoietin receptor status   neg: 32%, pos: 68%

Logistic regression

Response: treatment successful yes/no

Factor              OddsRatio   StandardError   CI.95           pValue
(Intercept)         0.00        4.01                            <0.0001
Age                 0.97        0.03            [0.91;1.03]     0.2807
Sex:female          4.71        0.84            [0.91;26.02]    0.0657
HbBase              3.25        0.27            [1.99;5.91]     <0.0001
Treatment:Epo       90.92       0.76            [23.9;493.41]   <0.0001
Resection:Incompl   1.75        0.81            [0.36;9.03]     0.4924
Resection:Compl     4.14        0.69            [1.13;17.36]    0.0395
Receptor:positive   5.81        0.66            [1.72;23.39]    0.0076

The model provides general information

In the overall study population, treatment with epo significantly increases the chance (odds) of reaching the target hemoglobin level, by a factor of 90.92 (95% CI: [23.9;493.4], p < 0.0001).

Does that mean everyone should be treated?

The model provides information for a single patient

For example: the predicted probability that a 51-year-old man with complete tumor resection and baseline hemoglobin level 12.6 g/dl reaches the target hemoglobin level (Y_i=1) is

Epo group: 97.4%    Placebo: 29.2%

If a similar patient has baseline hemoglobin level 14.8 g/dl then the model predicts:

Epo group: 99.8%    Placebo: 84.7%
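These individual predictions are tied together by the odds ratios of the model. As a rough check, the sketch below starts from the placebo prediction (29.2%) and applies the rounded odds ratios for treatment (90.92) and baseline hemoglobin (3.25 per g/dl). It assumes a model without interaction terms, as in the table of odds ratios; the helper names `odds` and `from_odds` are illustrative. Because the inputs are rounded, the results agree with the reported predictions only up to rounding.

```r
# Convert between probability and odds
odds      <- function(p) p / (1 - p)
from_odds <- function(o) o / (1 + o)

or_epo <- 90.92   # odds ratio Treatment:Epo (rounded, from the table)
or_hb  <- 3.25    # odds ratio per 1 g/dl baseline hemoglobin (rounded)

# 51-year-old man, complete resection, HbBase 12.6 g/dl, placebo arm
p_placebo <- 0.292

# Same patient in the Epo arm: multiply the odds by the treatment odds ratio
p_epo <- from_odds(odds(p_placebo) * or_epo)

# Same patient with HbBase 14.8 instead of 12.6 g/dl (difference 2.2 g/dl)
hb_factor    <- or_hb^(14.8 - 12.6)
p_placebo_hb <- from_odds(odds(p_placebo) * hb_factor)
p_epo_hb     <- from_odds(odds(p_placebo) * or_epo * hb_factor)

# Predicted probabilities in percent, close to 97.4, 84.7 and 99.8
round(100 * c(p_epo, p_placebo_hb, p_epo_hb), 1)
```

The odds ratio is constant across patients, but the change in predicted probability is not: the same treatment effect moves one patient from about 29% to 97% and another from about 85% to nearly 100%.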


Predictions and Brier score for logistic regression

Patient   Treatment        Predicted           Residual   Brier score
no.       successful Y_i   probability P_i (%) Y_i−P_i    (Y_i−P_i)²
...       ...              ...                 ...        ...
142       0                84.09               -84.09     0.7071
143       0                93.47               -93.47     0.8737
144       0                18.73               -18.73     0.0351
145       0                1.81                -1.81      0.0003
146       0                3.86                -3.86      0.0015
147       1                96.64               3.36       0.0011
148       0                0.5                 -0.5       <0.0001
149       0                11.93               -11.93     0.0142
Mean over all patients                                    0.0869
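The last column can be recomputed directly from the outcome and the predicted probability. A minimal sketch for the eight patients shown in the table (probabilities converted from percent to the 0 to 1 scale); the variable names are illustrative.

```r
# Patients 142 to 149 from the table
y <- c(0, 0, 0, 0, 0, 1, 0, 0)
p <- c(84.09, 93.47, 18.73, 1.81, 3.86, 96.64, 0.5, 11.93) / 100

# Squared residual per patient: contribution to the Brier score
squared_residual <- (y - p)^2
round(squared_residual, 4)

# The Brier score is the average of the squared residuals; here only
# over the eight patients shown, not over the full data set
brier_subset <- mean(squared_residual)
```

Patients 142 and 143 failed despite a high predicted probability and therefore contribute most; the average over these eight patients is not the Brier score of the full study.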

The model behind the table

The linear predictor is the linear combination of log-odds-ratios and the values of the patient's characteristics (predictors):

$$LP = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{HbBase} + \beta_3\,\text{Treat} + \cdots + \beta_7\,\text{epoRec}$$

$$\text{Predicted probability of successful treatment} = \frac{1}{1+\exp(-LP)}$$

- a prediction can be obtained for any given value of the predictor variables (patient characteristics)
- the predicted chance of successful treatment depends on all odds ratios and all variables
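The step from linear predictor to predicted probability can be checked against what `glm` does internally. The sketch below uses simulated data, not the Epo data; the data-generating coefficients are arbitrary assumptions chosen only so that the example runs.

```r
set.seed(1)
n <- 200
# Simulated stand-in data (hypothetical values, not the Epo study)
d <- data.frame(age = rnorm(n, 60, 8), HbBase = rnorm(n, 12, 1.5))
d$Y <- rbinom(n, 1, plogis(-12 + 1 * d$HbBase))

fit <- glm(Y ~ age + HbBase, data = d, family = "binomial")

# Linear predictor: intercept plus the sum of coefficient times predictor value
lp <- as.numeric(model.matrix(fit) %*% coef(fit))

# Inverse logit of the linear predictor
p_manual <- 1 / (1 + exp(-lp))

# The same probabilities as returned by predict()
p_predict <- as.numeric(predict(fit, type = "response"))
```

`exp(coef(fit))` gives the odds ratios, so the table of odds ratios together with the intercept carries the same information as the linear predictor.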

Predicted treatment success probability (logistic regression)

For a treated man with no resection possible and negative epo receptor status.

[Figure: predicted risk (0% to 100%) as a function of age (50 to 70 years) and baseline hemoglobin (9 to 14 g/dl).]

Tools for evaluating prediction accuracy

For each subject we have a predicted risk based on multiple predictors. To evaluate the prediction performance of the logistic regression model we consider the following tools:

- Distribution of risks
- Prediction accuracy: Brier score (lack of calibration and lack of spread of predictions)
- Discrimination: ROC curve, c-index = AUC (lack of spread of predictions)
- Calibration plot: (lack of calibration)
- Re-classification scatterplot: (comparison of risk

Brier score for null model in the Epo study

Patient   Treatment        Predicted           Residual   Brier score
no.       successful Y_i   probability P_i (%) Y_i−P_i    (Y_i−P_i)²
...       ...              ...                 ...        ...
142       0                44.3                -44.3      0.1962
143       0                44.3                -44.3      0.1962
144       0                44.3                -44.3      0.1962
145       0                44.3                -44.3      0.1962
146       0                44.3                -44.3      0.1962
147       1                44.3                55.7       0.3103
148       0                44.3                -44.3      0.1962
149       0                44.3                -44.3      0.1962
Mean over all patients                                    0.247

The predicted probability is the prevalence of patients with treatment success in the data set.
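When every patient receives the prevalence as prediction, the Brier score reduces to prevalence times (1 − prevalence). A minimal sketch with the rounded prevalence of 44.3% from the table; the variable names are illustrative.

```r
prevalence <- 0.443   # proportion of patients with treatment success (rounded)

# Squared residual for a patient whose treatment failed / succeeded
brier_fail    <- (0 - prevalence)^2
brier_success <- (1 - prevalence)^2

# Average over the data set: a fraction `prevalence` of the patients
# succeeded, the remaining patients failed
brier_null <- prevalence * brier_success + (1 - prevalence) * brier_fail

round(100 * brier_null, 1)   # 24.7
```

This value is the benchmark that any model using predictor variables should beat.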


Prevalence model

[Figure: calibration plot, observed proportion against predicted probability of treatment success, both axes from 0% to 100%.]

Performance null model: Brier=24.7, AUC=50.0

Discrete predictors: Gender, resection status and epo

[Figure: calibration plot, observed proportion against predicted probability, both axes from 0% to 100%.]

Model             Brier   AUC
Null model        24.7    50.0
Gender model      24.7    50.3
Resection model   24.0    58.7
Treatment model   13.6    83.7

Continuous predictors: Baseline hemoglobin, Age

[Figure: re-classification plot. Axes: predicted chance (Age model) and predicted chance (Hemoglobin model), both from 0% to 100%. Points are marked as treatment success or treatment failed.]

Continuous predictors: Baseline hemoglobin, Age

[Figure: calibration plot, observed proportion against predicted probability, both axes from 0% to 100%.]

Model                       Brier   AUC
Null model                  24.7    50.0
Age model                   24.7    51.2
Baseline hemoglobin model   19.3    77.2

Continuous predictors: Baseline hemoglobin, Age

[Figure: ROC curves, sensitivity against 1−specificity, both axes from 0% to 100%.]

Model                 Brier   AUC
Null model            24.7    50.0
Age model             24.7    51.2
Baseline hemoglobin   19.3    77.2

Multiple regression models

[Figure: re-classification plot. Axes: predicted chance (excluding receptor status) and predicted chance (including receptor status), both from 0% to 100%. Points are marked as treatment success or treatment failed.]

Multiple regression models

[Figure: calibration plot, observed proportion against predicted event probability, both axes from 0% to 100%.]

Model                   Brier   AUC
Null model              24.7    50.0
All variables           9.6     93.3
All + receptor status   8.7     94.7

Multiple regression models

[Figure: ROC curves, sensitivity against 1−specificity, both axes from 0% to 100%.]

Model                   Brier   AUC
All variables           9.6     93.3
All + receptor status   8.7     94.7

R crash course: logistic regression prediction

Fitting the logistic regression model

library(ModelGood) # predictStatusProb, Roc, calPlot2
logreg <- glm(Y~Treat+HbBase+age,data=Epo,family="binomial")

Predicting probability of Y=1

## for same data
Epo$predrisk <- predictStatusProb(logreg,newdata=Epo)
boxplot(predrisk~Y,Epo)

## for specific values of the predictor variables
ndat=data.frame(Treat=c("Epo","Placebo"),HbBase=c(12,12),age=c(50,50))
predictStatusProb(logreg,newdata=ndat)

ROC, calibration plot, AUC and Brier score

rlogreg <- Roc(list("logistic regression"=logreg),formula=Y~1,data=Epo)
rlogreg
plot(rlogreg,auc=TRUE)
## the calibration plot takes the fitted model, not the Roc object
calPlot2(list("logistic regression"=logreg))

Exercises

Load the in vitro fertilisation data. Variables are described here: http://192.38.117.59/~tag/Teaching/share/data/IVF.html

IVF <- read.table("http://192.38.117.59/~tag/Teaching/share/data/IVF.txt",header=TRUE)

1. Compute the prevalence of responders (y = 1)

2. Fit a logistic regression model with additive effects of the variables age, bmi, cyclelen, antfoll

3. Boxplot the predicted response probabilities in same data and add a line at the prevalence. Boxplot the probabilities conditional on the outcome.

4. Compare the risks of a 25 year old woman to that of a 35 year old woman when both have a BMI of 25, a cycle length of 28 and 19 antral follicles.

5. Plot the ROC curve and the calibration curve for this 4 variable model.

6. Compute the area under the ROC curve and the Brier score and compare the result to the benchmark prediction which ignores the predictor variables.

Solutions 1.

Compute the prevalence of responders (y = 1)

mean(IVF$y)

[1] 0.6690909

Equivalently:

mean(IVF$response=="positive")

Solutions 2.

Fit a logistic regression model with additive effects of the variables age, bmi, cyclelen, antfoll

library(Publish)
fit <- glm(y~age+bmi+cyclelen+antfoll,data=IVF,family="binomial")
# summary(fit)
publish(fit,org=TRUE,units=list(age="years",bmi="kg/m^2",cyclelen="days"))

Variable   Units    OddsRatio   CI.95         p-value
age        years    0.96        [0.88;1.04]   0.33514
bmi        kg/m^2   0.93        [0.85;1.01]   0.10118
cyclelen   days     1.24        [1.05;1.46]   0.01079
antfoll             1.14        [1.09;1.20]   < 0.0001

Solutions 3.

Boxplot the predicted response probabilities in same data and add a line at the prevalence. Boxplot the probabilities conditional on the outcome.

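IVF$pred <- predictStatusProb(fit,newdata=IVF)
boxplot(pred~response,data=IVF,horizontal=TRUE,ylim=c(0,1))
## line at the prevalence (the probability axis is horizontal)
abline(v=mean(IVF$y),col=2)

[Figure: horizontal boxplots of the predicted response probability (0.0 to 1.0) for negative and positive responders.]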

Solutions 4.

Compare the risks of a 25 year old woman to that of a 35 year old woman when both have a BMI of 25, a cycle length of 28 and 19 antral follicles.

ndat <- data.frame(age=c(25,35),bmi=25,cyclelen=28,antfoll=19)
ndat$pred <- round(100*predictStatusProb(fit,newdata=ndat),1)
ndat

  age bmi cyclelen antfoll pred
1  25  25       28      19 71.7
2  35  25       28      19 62.4

Solutions 5

Plot the ROC curve and the calibration curve for this 4 variable model.

library(ModelGood)

plot(Roc(list("logistic regression"=fit)),auc=TRUE)

[Figure: ROC curve, sensitivity against 1−specificity, both axes from 0% to 100%. AUC (%): logistic.regression (78.9)]

Solutions 5

Plot the calibration curve and compute the

library(ModelGood)

calPlot2(list("logistic regression"=fit))

[Figure: calibration plot for logistic.regression, observed proportion against predicted event probability, both axes from 0% to 100%.]

Solutions 6

nullfit <- glm(y~1,data=IVF,family="binomial")

Roc(list("logistic regression"=fit,"reference"=nullfit))

Receiver operating characteristic

Sample size: 275
Response: '0' (n=91) '1' (n=184)

Area under the ROC curve (AUC, higher better):
                      full data
logistic.regression   78.91
reference             NA

Brier score (Brier, lower better):
                      full data
logistic.regression   17.28
reference             22.14

The survival part of the epo study

The locoregional progression-free survival time in the epo study is the time between treatment and whichever comes first: death of the patient or locoregional progression of the tumour.

Epo increases the blood hemoglobin level and thus successful epo treatment should improve the survival chances ...

The role of time

Prediction model timeline

Origin (time 0, baseline): time point at which the patient is provided with the prediction

Horizon (time t, followup): time point attached to the prediction

Lost to followup, or (right) censored, means that patient was not followed until horizon time t.

- In survival analysis, a prediction is a survival function of time.
- At any time, the predicted risk is equal to one minus the chance to survive until this time point.
- The null model, which ignores the predictor variables and predicts the prevalence for all subjects, is obtained with the Kaplan-Meier estimator.
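To make the null model concrete, here is a minimal base-R sketch of the Kaplan-Meier product over event times. The function `km_surv` and the toy data are illustrative assumptions; in practice one would use `prodlim` or `survival::survfit`.

```r
# Kaplan-Meier estimate of the survival probability at time t.
# status: 1 = event observed, 0 = right censored
km_surv <- function(time, status, t) {
  event_times <- sort(unique(time[status == 1 & time <= t]))
  surv <- 1
  for (s in event_times) {
    at_risk <- sum(time >= s)                 # still under observation at s
    events  <- sum(time == s & status == 1)   # events exactly at s
    surv <- surv * (1 - events / at_risk)
  }
  surv
}

# Toy data (hypothetical): five subjects, two of them censored
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)

surv3 <- km_surv(time, status, t = 3)   # 4/5 * 2/3
risk3 <- 1 - surv3                      # null-model predicted risk at t = 3
```

The subject censored at time 2 contributes to the number at risk at time 1 but not at time 3, which is how the estimator uses the partial information from censored subjects.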


Predictors in the epo study

- Age: min: 41 y, median: 59 y, max: 80 y
- Gender: male: 85%, female: 15%
- Baseline hemoglobin level: mean: 12.03 g/dl, std: 1.45
- Treatment arm: epo: 50%, placebo: 50%
- Resection: complete: 48%, incomplete: 19%, no resection: 34%
- Erythropoietin receptor status: neg: 32%, pos: 68%

Effect of epo treatment on prediction

[Figure: locoregional progression-free survival (0% to 100%) against months (0 to 60), by treatment arm.]

Numbers at risk, as printed below the time axis:
Placebo:   74 62 39 34 30 25 23 20 17 11 7
Epo:       75 52 35 30 25 20 18 13 11 5 3

Treatment effect explained by new marker?

[Figure: locoregional progression-free survival (0% to 100%) against months (0 to 60), by treatment arm, in two panels.]

Epo-Receptor: positive, numbers at risk:
Placebo:   50 42 30 26 26 21 18 18 15 10 7
Epo:       51 36 24 20 16 13 12 9 8 3 2

Epo-Receptor: negative, numbers at risk:
Placebo:   24 21 11 9 7 6 5 4 2 2 1
Epo:       24 18 12 11 10 9 7 5 3 3 2

Treatment effect explained by resection status?

[Figure: survival (0% to 100%) against months (0 to 60), by treatment arm, in three panels.]

Resection: none, numbers at risk:
Placebo:   25 7 6 3 2 2
Epo:       25 2 0 0 0 0

Resection: incomplete, numbers at risk:
Placebo:   14 11 7 6 5 0
Epo:       14 8 5 4 2 0

Resection: complete, numbers at risk:
Placebo:   35 21 17 14 10 5
Epo:       36 25 20 14 9 3

Cox regression

Variable                            Units                 HazardRatio   CI.95         p-value
Treat                               Placebo               1.00          [1.00;1.00]   1.00000
                                    Epo                   1.00          [0.51;1.96]   0.99017
epoRec                              0                     1.00          [1.00;1.00]   1.00000
                                    1                     0.69          [0.39;1.21]   0.19199
age                                                       0.98          [0.96;1.00]   0.04769
sex                                 male                  1.00          [1.00;1.00]   1.00000
                                    female                0.88          [0.47;1.65]   0.68916
stratum                             CompleteResection     1.00          [1.00;1.00]   1.00000
                                    IncompleteResection   1.31          [0.75;2.28]   0.34486
                                    noResection           3.25          [2.05;5.18]   < 0.001
Treat(Placebo): epoRec(1 vs 0)                            0.69          [0.39;1.21]   0.19199
Treat(Epo): epoRec(1 vs 0)                                1.39          [0.77;2.51]   0.27682
epoRec(0): Treat(Epo vs Placebo)                          1.00          [0.51;1.96]   0.99017
epoRec(1): Treat(Epo vs Placebo)                          2.02          [1.25;3.26]   0.00413

- last two lines show the effect of treatment separately for the two epo receptor status groups
- main effects have no interpretation
- likelihood ratio test needed to see if effect modification is significant

Predictions of the Cox regression model

The linear predictor is the linear combination of log-hazard-ratios and the values of the patient's characteristics (predictors):

$$LP = \hat\beta_1\,\text{Treat} + \hat\beta_2\,\text{epoRec} + \hat\beta_3\,\text{age} + \cdots + \hat\beta_7\,\text{Treat} * \text{epoRec}$$

The predicted t-years survival probability for a patient with linear predictor LP is obtained with the formula:

$$S(t|LP) = \exp(-\Lambda_0(t)\exp(LP))$$

where Λ₀(t) is the cumulative hazard function (time-dependent) when all predictor variables have value 0.
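The survival formula is easy to evaluate once the baseline cumulative hazard at the horizon is known. The sketch below uses a made-up value for Λ₀(t), not an estimate from the epo study, and an illustrative helper `surv_from_lp`.

```r
# Predicted survival from baseline cumulative hazard and linear predictor
surv_from_lp <- function(cumhaz0, lp) exp(-cumhaz0 * exp(lp))

cumhaz0_t <- 0.3   # hypothetical baseline cumulative hazard at horizon t

# Reference patient: all predictors at 0, so LP = 0
s_ref <- surv_from_lp(cumhaz0_t, lp = 0)

# Patient whose linear predictor corresponds to a hazard ratio of 2
s_hr2 <- surv_from_lp(cumhaz0_t, lp = log(2))

c(reference = s_ref, hazard_ratio_2 = s_hr2)
```

A hazard ratio of 2 squares the reference survival probability, so the same hazard ratio implies different changes in absolute risk depending on the horizon and the baseline hazard.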

R crash course: Cox regression (part I)

Fitting the Cox regression model

library(survival) # coxph, Surv
library(pec)      # predictSurvProb
coxreg<-coxph(Surv(ldfs.time,ldfs.status)~Treat*epoRec+age+stratum,data=Epo)

Predicting 4-year survival

## for same data
Epo$predsurv<-predictSurvProb(coxreg,newdata=Epo,times=48)

## for specific values of the predictor variables
ndat=data.frame(Treat=c("Epo","Placebo"),stratum="CompleteResection",epoRec=factor(1),HbBase=c(12,12),age=c(50,50))
predictSurvProb(coxreg,newdata=ndat,times=48)

Scatterplot of predicted 4-year survival against baseline hemoglobin

Epo$risk48<-predictSurvProb(coxreg,newdata=Epo,times=48)
plot(risk48~HbBase,data=Epo,ylim=c(0,1))

Quantiles of predicted 4-year survival conditional on outcome

qrisk<-Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,metrics=NULL,plots=NULL,summary="riskQuantile",times=48,nullModel=FALSE)
boxplot(qrisk,type="risk")

AUC and Brier score for 4-year predicted survival

rcoxreg<-Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,times=48)
rcoxreg
riskScore:::plot.score.ROC(rcoxreg)

R crash course: Cox regression (part II)

Predicted survival function of time

ndat=data.frame(Treat=c("Epo","Placebo"),stratum="CompleteResection",epoRec=factor(1),HbBase=c(12,12),age=c(50,50))
etimes <- sort(unique(Epo$ldfs.time))
survpred <- predictSurvProb(coxreg,newdata=ndat,times=etimes)
plotPredictSurvProb(coxreg,newdata=ndat,col=1:2)

Time-dependent Brier score and AUC

rtcoxreg <- Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,times=1:48)
riskScore:::plot.score.AUC(rtcoxreg)
riskScore:::plot.score.Brier(rtcoxreg)

Exercises

- Load the GBSG2 data:

library(pec);data(GBSG2);help(GBSG2)

- Take a random subset of the GBSG2 data which contains about 60% of the patients. Name this subset 'GBSG2.learn' and its complement, which contains the remaining 40% of the patients, 'GBSG2.val':

set.seed(17)
learn.id<- sample(1:NROW(GBSG2),size=.6*NROW(GBSG2))
GBSG2.learn=GBSG2[learn.id,]
GBSG2.val=GBSG2[-learn.id,]

In the learning data:

- Fit the overall Kaplan-Meier curve. Read from the graph the estimated probability of surviving 500 days.
- Fit a Cox regression model with the prognostic factors age, tumor size and grade, number of positive lymph nodes, estrogen and progesterone receptors.

In the validation data:

- Plot the 500-day survival predictions for given age
- Boxplot the 500-day survival predictions conditional on outcome
- Plot the Brier score of the Cox regression model against time and compare against the reference (Kaplan-Meier prediction).
- Plot the time-dependent AUC of the Cox regression model against time.

Solutions

km <-prodlim(Hist(time,cens)~1,data=GBSG2.learn)
plot(km)
abline(v=500,col=2)
abline(h=predict(km,times=500),col=2)
text(x=550,y=predict(km,times=450),round(100*predict(km,times=500),1),pos=4,col=2)

[Figure: Kaplan-Meier curve, survival probability (0% to 100%) against time (0 to 2500 days). The estimated probability of surviving 500 days is marked as 86.1%.]

Subjects at risk, as printed below the time axis: 411 385 334 259 211 164 120 70 28 8 1

Solutions

Fit a Cox regression model with the prognostic factors age, tumor size and grade, number of positive lymph nodes, estrogen and progesterone receptors.

library(Publish);library(survival)
fit <- coxph(Surv(time,cens)~age + menostat + tsize + tgrade + pnodes + progrec + estrec,data=GBSG2.learn)
publish(fit,org=TRUE)

age               1.00   [0.98;1.03]   0.88397
menostat   Post   1.00   [1.00;1.00]   1.00000
           Pre    0.94   [0.58;1.53]   0.79777
tsize             1.00   [0.99;1.01]   0.39928
tgrade     I      1.00   [1.00;1.00]   1.00000
           II     2.42   [1.17;4.98]   0.01669
           III    3.09   [1.44;6.65]   0.00389
pnodes            1.05   [1.03;1.07]   < 0.001
progrec           1.00   [1.00;1.00]   0.03033
estrec            1.00   [1.00;1.00]   0.86952

Solutions

In the validation data: Plot the 500-day survival predictions for given age

GBSG2.val$risk500<-1-predictSurvProb(fit,newdata=GBSG2.val,times=500)
plot(risk500~age,data=GBSG2.val,ylim=c(0,1))

[Figure: scatterplot of risk500 (0.0 to 1.0) against age (30 to 80 years).]

Solutions

In the validation data: Boxplot the 500-day survival predictions conditional on outcome

score500 <- Score(list(fit),data=GBSG2.val,formula=Surv(time,cens)~1,times=500,summary="riskQuantile",nullModel=FALSE)
boxplot(score500,type="risk",xlim=c(0,25))

[Figure: boxplots of the predicted risk from the coxph model (0% to 25%) by outcome: overall, event, event-free.]

Solutions

Plot the Brier score of the Cox regression model against time and compare against the reference (Kaplan-Meier prediction).

score.time <-Score(list(fit),data=GBSG2.val,formula=Surv(time,cens)~1,times=seq(0,2000,100),summary="riskQuantile",nullModel=FALSE)
riskScore:::plot.score.Brier(score.time)

[Figure: Brier score (0% to 25%) against time (0 to 2000 days).]

Solutions

Plot the time-dependent AUC of the Cox regression model against time.

riskScore:::plot.score.AUC(score.time)

[Figure: time-dependent AUC (50% to 100%) against time (0 to 2000 days).]

Censored data and competing risks

In survival analysis with no competing risks, the time of the event is not observed when the patient was lost to follow-up (right-censored) before the event occurred.

The Kaplan-Meier method and our estimates of risk quantiles, AUC and Brier score give higher weights to uncensored subjects, to account for the possibility that subjects who were censored earlier also had an event. This method is called inverse probability of censoring weighting (IPCW).

A competing risk is an event after which it is clear that the patient will never experience the event of interest. Here IPCW would be a mistake (bias).
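The weighting idea behind IPCW can be sketched in a few lines of base R for a single prediction horizon. The sketch assumes no competing risks and no ties between event and censoring times; the functions `cens_surv` and `ipcw_weights` and the toy data are illustrative, not the implementation used by the course software.

```r
# Kaplan-Meier estimate G of the probability of being uncensored,
# evaluated at t, or just before t when left_limit = TRUE.
cens_surv <- function(time, status, t, left_limit = FALSE) {
  cens_times <- sort(unique(time[status == 0]))
  cens_times <- if (left_limit) cens_times[cens_times < t] else cens_times[cens_times <= t]
  g <- 1
  for (s in cens_times) {
    g <- g * (1 - sum(time == s & status == 0) / sum(time >= s))
  }
  g
}

# IPCW weights at a horizon:
#   event before horizon    -> 1 / G(just before the event time)
#   still at risk at horizon -> 1 / G(horizon)
#   censored before horizon -> 0 (status at horizon unknown)
ipcw_weights <- function(time, status, horizon) {
  sapply(seq_along(time), function(i) {
    if (time[i] <= horizon && status[i] == 1) {
      1 / cens_surv(time, status, time[i], left_limit = TRUE)
    } else if (time[i] > horizon) {
      1 / cens_surv(time, status, horizon)
    } else 0
  })
}

# Toy data (hypothetical): status 1 = event, 0 = censored
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)
w <- ipcw_weights(time, status, horizon = 3.5)
w        # 1, 0, 4/3, 4/3, 4/3
sum(w)   # the weights add up to the sample size
```

The subject censored at time 2 gets weight zero, and the subjects observed later are weighted up so that they also represent that subject.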


Competing risk

After a competing risk event: speed = 0

After censoring: speed = ?

Competing risks

Examples

- non-cardiovascular mortality for event cardiovascular mortality
- non-cancer mortality for event relapse
- kidney transplant for event kidney failure without transplant
- discharge from ICU for event death in ICU

Pitfall

If a competing risk event is treated in the same way as

loss-to-followup (censored) then one is analysing a hypothetical world in which the competing risk does not exist (bias in this world).

Aim of prediction

In the presence of competing risks the aim of a risk prediction analysis is unchanged: the risk of the event of interest between the time origin and the prediction horizon.


Illustration

What happens:

- A cyclist can have an accident or get injured.

Consequences:

- After the accident the speed is zero.

Decision making:

- To estimate the chances of a cyclist to win a race, one has to combine the speed with the probability of an accident/injury.

Illustration

What happens:

- A patient can die free from cardiovascular disease.

Consequences:

- After death the hazard rate of cardiovascular disease/mortality is zero.

Decision making:

- To estimate the risk of cardiovascular disease/mortality, one has to combine the hazard rate of cardiovascular disease/mortality with the hazard rate of death due to other causes.

Modelling of competing risks data

Several Cox regression models

- One Cox regression for each competing risk (cause of death), which are then combined to predict the event of interest

advantage: model focuses on the biological mechanism behind the different causes

disadvantage: requires modelling the competing risks

Direct regression models

- Fine-Gray regression / Logistic regression / Absolute risk regression

advantage: no model for the other causes needed

disadvantage: requires a model for the censoring times

Cox regression 5-year absolute CVD risk (formula I)

Need linear predictors of cause-specific log-hazard-ratios and patient's characteristics X ... one for each cause:

- Cox regression for CVD hazard:

$$\lambda_1(u|X) = \lambda_{01}(u)\exp(LP_1)$$

- Cox regression for non-CVD hazard:

$$\lambda_2(u|X) = \lambda_{02}(u)\exp(LP_2)$$

Absolute 5-year risk of CVD:

$$\int_0^5 \underbrace{\exp\left(-\int_0^s \{\lambda_1(u|X) + \lambda_2(u|X)\}\,du\right)}_{\text{No event of any cause until } s} \; \underbrace{\lambda_1(s|X)}_{\text{CVD at } s} \, ds.$$
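To see what the integral does, the sketch below evaluates it numerically for one patient with constant cause-specific hazards. The hazard values are made up for illustration; with constant hazards the integral has a closed form, which is used as a check. The last line shows the value one would get by treating the competing event like censoring.

```r
lambda1 <- 0.05   # hypothetical CVD hazard per year for this patient
lambda2 <- 0.10   # hypothetical non-CVD mortality hazard per year
horizon <- 5

# Integrand: probability of no event of any cause until s, times CVD hazard at s
integrand <- function(s) exp(-(lambda1 + lambda2) * s) * lambda1

# Absolute 5-year risk of CVD by numerical integration
risk_formula <- integrate(integrand, lower = 0, upper = horizon)$value

# Closed form available for constant hazards
risk_closed <- lambda1 / (lambda1 + lambda2) *
  (1 - exp(-(lambda1 + lambda2) * horizon))

# Ignoring the competing risk (treating non-CVD death like censoring)
risk_naive <- 1 - exp(-lambda1 * horizon)

c(formula = risk_formula, closed_form = risk_closed, naive = risk_naive)
```

The naive value is larger than the absolute risk because it describes a hypothetical world in which nobody dies from other causes before CVD can occur.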

Cox regression 5-year absolute CVD risk (formula II)

Need linear predictors of cause-specific log-hazard-ratios and patient's characteristics X ... one for event and one for event-free (overall):

- Cox regression for CVD hazard:

$$\lambda_1(u|X) = \lambda_{01}(u)\exp(LP_1)$$

- Cox regression for event-free survival hazard:

$$\lambda_{\text{overall}}(u|X) = \lambda_{0,\text{overall}}(u)\exp(LP_{\text{overall}})$$

Absolute 5-year risk of CVD:

$$\int_0^5 \underbrace{\exp\left(-\int_0^s \lambda_{\text{overall}}(u|X)\,du\right)}_{\text{No event of any cause until } s} \; \underbrace{\lambda_1(s|X)}_{\text{CVD at } s} \, ds.$$

Illustration: TRACE study [3]

The TRACE data contain 1877 patients. Survival of patients after myocardial infarction related to various risk factors:

- status: 0: alive, 9: dead from myocardial infarction, 7: dead from other causes.
- time: Survival time in years.
- chf: Clinical heart pump failure, 1: present, 0: absent.
- diabetes: Diabetes, 1: present, 0: absent.
- vf: Ventricular fibrillation, 1: present, 0: absent.
- wmi: Measure of heart pumping effect based on ultrasound measurements where 2 is normal and 0 is worst.
- sex: 1: female, 0: male.
- age: Age of patient.

[3] Jensen et al. 1997. Does in-hospital ventricular fibrillation affect prognosis after myocardial infarction? European Heart Journal 18, 919-924.

Cox regression (formula I)

CSC(list(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,Hist(time,event)~age+sex+diabetes),data=TRACE)

$`Cause 1`
1 wmi        0.45   [0.38;0.53]   <0.001
2 chf        1.74   [1.50;2.03]   <0.001
3 age        1.06   [1.05;1.07]   <0.001
4 sex        1.10   [0.96;1.27]   0.167
5 diabetes   1.50   [1.24;1.81]   <0.001
6 vf         2.18   [1.75;2.72]   <0.001

$`Cause 2`
1 age        0.94   [0.90;0.99]    0.012
2 sex        3.35   [0.42;26.76]   0.254
3 diabetes   2.36   [0.52;10.84]   0.268

Cox regression (formula II)

CSC(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,survtype="surv")

$`Cause 1`
1 wmi        0.45   [0.38;0.53]   <0.001
2 chf        1.74   [1.50;2.03]   <0.001
3 age        1.06   [1.05;1.07]   <0.001
4 sex        1.10   [0.96;1.27]   0.167
5 diabetes   1.50   [1.24;1.81]   <0.001
6 vf         2.18   [1.75;2.72]   <0.001

$OverallSurvival
1 wmi        0.46   [0.38;0.54]   <0.001
2 chf        1.71   [1.47;1.99]   <0.001
3 age        1.06   [1.05;1.07]   <0.001
4 sex        1.11   [0.97;1.27]   0.143
5 diabetes   1.51   [1.25;1.82]   <0.001
6 vf         2.16   [1.74;2.69]   <0.001

Fine-Gray regression

The linear predictor LP is the linear combination of estimated log-sub-hazard-ratios and the values of the patient's characteristics (predictors):

$$\text{Absolute 5-year risk of CVD} = 1 - \exp[-\exp\{\beta_0(5) + LP\}]$$

- β₀(t) = log cumulative sub-hazard (time-dependent) when all predictor variables have value 0.

IPCW to account for censored data

- Censoring time = time where subject was lost to follow-up
- Kaplan-Meier method applied to censoring times
- Cox regression for censoring times

Fine-Gray regression

FGR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1)

           coef       exp(coef)   se(coef)   z         p-value
wmi        -0.80718   0.4461      0.090987   -8.8714   0.00000000000000
chf        0.55653    1.7446      0.075961   7.3265    0.00000000000024
age        0.05921    1.0610      0.003943   15.0170   0.00000000000000
sex        0.09439    1.0990      0.073493   1.2843    0.20000000000000
diabetes   0.40744    1.5030      0.096645   4.2159    0.00002500000000
vf         0.78392    2.1900      0.134950   5.8090    0.00000000630000

- Regression coefficients = sub-hazard-ratios = hard to interpret
- Direction of effect and p-value are correct

Logistic risk regression

The linear predictor LP is the linear combination of estimated log-odds-ratios and the values of the patient's characteristics (predictors):

$$\text{Absolute 5-year risk of CVD} = \frac{\exp\{\beta_0(5) + LP(5)\}}{1+\exp\{\beta_0(5) + LP(5)\}}$$

- β₀(5) = log-odds (at 5 years) when all predictor variables have value 0.
- Problem: odds(5) = risk(5)/(1-risk(5)) as usual, but here (1-risk(5)) = probability of either being alive or having died due to non-CVD related causes within 5 years

IPCW to account for censored data

- either Kaplan-Meier method applied to censoring times
- or Cox regression for censoring times

Logistic regression

LRR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1,times=5)

Factor     OddsRatio   CI95          p-value
wmi        0.36        [0.27;0.48]   < 0.0001
chf        2.53        [2.01;3.20]   < 0.0001
age        1.0860      [1.07;1.10]   < 0.0001
sex        1.14        [0.90;1.46]   0.2817039
diabetes   2.27        [1.54;3.33]   < 0.0001
vf         2.08        [1.36;3.18]   0.0007231

Example interpretation of coefficients:

- The odds of cardiovascular mortality within 5 years was 2.27 times higher for patients with a history of diabetes.
- The odds of cardiovascular mortality within 5 years was increased by a factor 1.09 for each additional year of age.

Absolute risk regression

The linear predictor LP is the linear combination of estimated log-absolute-risk-ratios and the values of the patient's characteristics (predictors):

$$\text{Absolute 5-year risk of CVD} = \exp\{\beta_0(5) + LP(5)\}$$

- β₀(5) = log-absolute-risk (at 5 years) when all predictor variables have value 0.
- Problem: may not fit very well when there are continuous predictor variables

IPCW to account for censored data

- either Kaplan-Meier method applied to censoring times
- or Cox regression for censoring times
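The three direct regression models differ only in how the quantity β₀(5) + LP is turned into a risk. The sketch below compares the three transformations on the same values, assuming the standard forms of the links: complementary log-log for Fine-Gray, logit for logistic risk regression and log for absolute risk regression. The function names and the values of `eta` are illustrative.

```r
# eta stands for beta0(5) + LP
risk_fine_gray <- function(eta) 1 - exp(-exp(eta))        # complementary log-log link
risk_logistic  <- function(eta) exp(eta) / (1 + exp(eta)) # logit link
risk_arr       <- function(eta) exp(eta)                  # log link

eta <- c(-3, -2, -1)
risks <- cbind(
  eta       = eta,
  fine_gray = risk_fine_gray(eta),
  logistic  = risk_logistic(eta),
  arr       = risk_arr(eta)
)
round(risks, 3)

# The log link is not bounded by 1: a large linear predictor gives an
# impossible "risk", one reason the fit can be poor with continuous predictors
risk_arr(0.5)
```

For small risks the three transformations nearly agree, so the fitted coefficients of the three models have a similar size; they diverge as the risk grows.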

Absolute risk regression

ARR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1,times=5)

wmi        0.696    [0.63;0.77]   < 0.0001
chf        1.643    [1.44;1.87]   < 0.0001
age        1.0328   [1.03;1.04]   < 0.0001
sex        1.042    [0.96;1.13]   0.3183186
diabetes   1.20     [1.09;1.33]   0.0002537
vf         1.282    [1.16;1.41]   < 0.0001

Example interpretation of coefficients:

- The risk of cardiovascular mortality within 5 years was 1.2 times higher for patients with a history of diabetes.

- The risk of cardiovascular mortality within 5 years was increased by a factor 1.03 for each additional year of age.

R crash course: competing risks (part I)

Fitting cause-specific Cox regression

cscfit<-CSC(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE)

Predicted risks for a given time point and all data

TRACE$risk5.cox <-predictEventProb(cscfit,times=5,newdata=TRACE,cause=1)

Predicted risks for specific values of the predictor variables

ndat<- data.frame(wmi=2.5,chf=1,age=60,sex=c(0,1),diabetes=1,vf=0)
predictEventProb(cscfit,times=5,newdata=ndat,cause=1)

Predicted risks as function of time

ndat<- data.frame(wmi=2.5,chf=1,age=60,sex=c(0,1),diabetes=1,vf=0)
## one row per patient in ndat, one column per time point
riskpred <-predictEventProb(cscfit,times=seq(0,5,.1),newdata=ndat,cause=1)
plotPredictEventProb(cscfit,cause=1,newdata=ndat,col=1:2)

Brier score and AUC

Score(list(Cox=cscfit),formula=Hist(time,event)~1,data=TRACE,times=5)

Exercises

Consider the Melanoma data: library(riskRegression); data(Melanoma); help(Melanoma)

I Plot the ROC-curve of tumor thickness as a marker for cancer related death after 5 years. Write a conclusion sentence which includes the AUC.

I Log-transform the variable tumor thickness.

I Fit a combined cause-specic Cox regression (CSC) with all covariates for cause 1 hazards but only sex and age for cause 2 hazards (non-cancer mortality).

I Fit a Fine-Gray regression model (FGR) for cause 1 specic risks with all covariates.

I _{Fit an absolute risk regression model for cause 1 specic risks (ARR)} with all covariates .

I Scatterplot predicted risks of the 5 year cancer-specic mortality obtained with the CSC model to those obtained with the ARR model. I Scatterplot predicted risks of the 5 year cancer-specic mortality

obtained with the FGR model to those obtained with the ARR model.

I Compute Brier score and AUC for 5-year prediction horizon.


Solutions

Plot the ROC-curve of tumor thickness as a marker for cancer related death after 5 years. Write a conclusion sentence which includes the AUC.

[Figure: ROC curve for tumor thickness, Sensitivity versus 1−Specificity, both axes from 0 % to 100 %. AUC (%): AalenJohansen 50.0 [50.0;50.0]; thickness 68.0 [58.9;77.1]]

Solutions

Log-transform the variable tumor thickness. Fit a combined cause-specific Cox regression (CSC) with the covariates epicel + sex + age + logthick for cause 1 hazards but only sex + age for cause 2 hazards (non-cancer mortality).

cscfit <- CSC(list(Hist(time,status)~epicel + sex + age + logthick,
                   Hist(time,status)~age+sex),data=Melanoma)
publish(cscfit)

$`Cause 1`
1 epicel   not present 1.00 [1.00;1.00] 1.0000
2          present     0.48 [0.26;0.89] 0.0186
3 sex      Female      1.00 [1.00;1.00] 1.0000
4          Male        1.78 [1.04;3.03] 0.0350
5 age                  1.02 [1.00;1.03] 0.0806
6 logthick             1.98 [1.43;2.73] <0.001

$`Cause 2`
1 age                  1.08 [1.03;1.13] <0.001
2 sex      Female      1.00 [1.00;1.00] 1.0


Solutions

Fit a Fine-Gray regression model (FGR) for cause 1 specific risks with the covariates epicel + sex + age + logthick.

fgrfit <- FGR(Hist(time,status)~epicel + sex + age + logthick,data=Melanoma,cause=1)

publish(fgrfit,org=TRUE)

                 coef exp(coef) se(coef)       z p-value
epicelpresent -0.8346    0.4341 0.323120 -2.5828 0.00980
sexMale        0.5803    1.7866 0.280391  2.0697 0.03800
age            0.0113    1.0114 0.009884  1.1437 0.25000
logthick       0.6494    1.9144 0.152222  4.2663 0.00002
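The columns of this output are linked by simple relations. As a minimal check, assuming the usual Wald test, the epicel row can be reproduced in base R from the printed coefficient and standard error; the variable names b and se are chosen here for illustration only.

```r
# Values copied from the epicelpresent row of the FGR output
b  <- -0.8346    # regression coefficient (log subdistribution hazard ratio)
se <- 0.323120   # standard error of the coefficient

hr <- exp(b)               # exp(coef): subdistribution hazard ratio
z  <- b / se               # Wald statistic
p  <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal

print(round(c(exp.coef = hr, z = z, p = p), 4))
```

Small differences in the last digit are expected because the inputs are already rounded.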

Solutions

Fit an absolute risk regression model for cause 1 specific risks (ARR) with all covariates.

arrfit <- ARR(Hist(time,status)~epicel + sex + age + logthick,data=Melanoma,cause=1)

publish(arrfit,org=TRUE)

epicelpresent 0.53 [0.30;0.93] 0.02719

sexMale 1.58 [1.01;2.49] 0.04558

age 1.0046 [0.99;1.02] 0.52088

logthick 1.78 [1.40;2.25] < 0.0001


Solutions

Scatterplot predicted risks of the 5 year cancer-specific mortality obtained with the CSC model to those obtained with the ARR model.

[Figure: scatterplot of predicted 5-year risks of cancer-specific mortality for all Melanoma patients. Axis labels: predictEventProb(arrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25) and predictEventProb(cscfit, newdata = Melanoma, cause = 1, times = 5 * 365.25); both axes run from 0 % to 100 %.]

Solutions

Scatterplot predicted risks of the 5 year cancer-specific mortality obtained with the FGR model to those obtained with the ARR model.

[Figure: scatterplot of predicted 5-year risks of cancer-specific mortality for all Melanoma patients. Axis labels: predictEventProb(arrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25) and predictEventProb(fgrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25); both axes run from 0 % to 100 %.]


Solutions

Compute Brier score and AUC for 5-year prediction horizon.

library(cmprsk)

Score(list(CSC=cscfit,FGR=fgrfit,ARR=arrfit),data=Melanoma,formula= Hist(time,status)~1,cause=1,times=5*365.25,metric="brier",plots= NULL)

Metric Brier: Scores

model times Brier se.Brier lower.Brier upper.Brier

1: AalenJohansen 1826 0.174 0.0219 0.131 0.216

2: CSC 1826 0.150 0.0188 0.113 0.187

3: FGR 1826 0.149 0.0187 0.112 0.186

4: ARR 1826 0.149 0.0189 0.112 0.186

Tests

   times model     reference     delta se.delta    lower    upper      p
1:  1826   CSC AalenJohansen -0.023679  0.01125 -0.04573 -0.00163 0.0353
2:  1826   FGR AalenJohansen -0.024496  0.01172 -0.04747 -0.00152 0.0367
3:  1826   ARR AalenJohansen -0.024608  0.01185 -0.04784 -0.00138 0.0379
4:  1826   FGR           CSC -0.000817  0.00136 -0.00347  0.00184 0.5467
5:  1826   ARR           CSC -0.000929  0.00211 -0.00506  0.00320 0.6591
6:  1826   ARR           FGR -0.000112  0.00157 -0.00318  0.00296 0.9429
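The lower and upper columns of the test table appear to be Wald-type 95% confidence limits for the difference in Brier score. A short base R check for the first row (CSC versus AalenJohansen), assuming a normal approximation, is:

```r
# Values copied from row 1 of the Brier score test table
delta <- -0.023679   # Brier(CSC) - Brier(AalenJohansen)
se    <- 0.01125     # standard error of the difference

# 95% confidence limits under a normal approximation
ci <- delta + c(-1, 1) * qnorm(0.975) * se

print(round(ci, 5))
```

The interval lies below zero, which agrees with the reported p-value below 0.05: the CSC model has a lower (better) Brier score than the null model that ignores the covariates.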

Solutions

Compute AUC for 5-year prediction horizon.

library(cmprsk)

Score(list(CSC=cscfit,FGR=fgrfit,ARR=arrfit),data=Melanoma,formula= Hist(time,status)~1,cause=1,times=5*365.25,metric="auc")

Metric AUC: Scores

model times AUC se.AUC lower.AUC upper.AUC

1: AalenJohansen 1826 0.500 0.0000 0.500 0.500

2: CSC 1826 0.750 0.0666 0.619 0.881

3: FGR 1826 0.756 0.0669 0.625 0.887

4: ARR 1826 0.757 0.0660 0.627 0.886

Tests

   times model     reference   delta se.delta    lower  upper        p
1:  1826   CSC AalenJohansen 0.24990  0.06664  0.11928 0.3805 0.000177
2:  1826   FGR AalenJohansen 0.25582  0.06689  0.12472 0.3869 0.000131
3:  1826   ARR AalenJohansen 0.25684  0.06602  0.12744 0.3862 0.000100
4:  1826   FGR           CSC 0.00592  0.00503 -0.00393 0.0158 0.238853
5:  1826   ARR           CSC 0.00694  0.00835 -0.00943 0.0233 0.406064
6:  1826   ARR           FGR 0.00102  0.00660 -0.01191 0.0139 0.876996
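To make the two metrics concrete, here is a base R sketch of what the Brier score and the AUC measure in the simplest setting without censoring. The data are hypothetical toy numbers, and the helper names brier_score and auc_pairs are illustrative. Score additionally handles right-censored data and competing risks through inverse probability of censoring weighting, which this sketch does not attempt.

```r
# Brier score: mean squared difference between outcome (0/1) and predicted risk
brier_score <- function(risk, event) mean((event - risk)^2)

# AUC: proportion of (case, control) pairs in which the case has the higher
# predicted risk; ties count as 1/2
auc_pairs <- function(risk, event) {
  cases    <- risk[event == 1]
  controls <- risk[event == 0]
  mean(outer(cases, controls, function(a, b) (a > b) + 0.5 * (a == b)))
}

# Hypothetical toy data, no censoring
risk  <- c(0.9, 0.8, 0.3, 0.6, 0.2, 0.1)  # predicted 5-year risks
event <- c(1,   1,   1,   0,   0,   0)    # 1 = event before 5 years

brier <- brier_score(risk, event)
auc   <- auc_pairs(risk, event)
print(c(Brier = brier, AUC = auc))
```

A lower Brier score is better, and it reflects both calibration and discrimination; a higher AUC is better, and it reflects discrimination only.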

Dissemination of results

The results of logistic regression, Cox regression or cause-specific Cox regression can be transformed into a nomogram or an Internet calculator in order to allow calculation of predicted risks for new subjects.
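The core of such a calculator is a function that maps predictor values to a predicted risk. A minimal sketch for the logistic regression case follows; the coefficients and the function name risk_calculator are hypothetical and do not come from any model fitted in this course.

```r
# Hypothetical coefficients on the log odds scale; NOT estimates from a real study
coefs <- c("(Intercept)" = -2.0, age = 0.03, diabetes = 0.5)

# Predicted risk = inverse logit of the linear predictor
risk_calculator <- function(newdata, coefs) {
  predictors <- names(coefs)[-1]
  X <- cbind("(Intercept)" = 1, as.matrix(newdata[, predictors, drop = FALSE]))
  as.numeric(plogis(X %*% coefs))
}

# Two new subjects who differ only in diabetes status
new_subjects <- data.frame(age = c(60, 60), diabetes = c(0, 1))
risks <- risk_calculator(new_subjects, coefs)
print(round(risks, 3))
```

For Cox and cause-specific Cox models the calculator would also need the estimated baseline hazards, so publishing regression coefficients alone is not sufficient there.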


Nomogram

50% incidence of clinically indolent PCa versus a rate of 14.6% in those found to have PCa on first biopsy.21 Indolent PCa also appears to be more common in men with free/total PSA 0.15 or greater.8 We did not have sufficient data to incorporate the number of previous negative biopsies or free/total PSA in our modeling but these factors may be useful in future research.

We developed these nomograms for predicting the possibility that a man has an indolent cancer. The base model requires minimal data input, while the full model requires considerable analysis of systematic biopsy. The base model appears to be better calibrated (fig. 2), which means that for a group of patients the mean predicted probability is close to the proportion of men with indolent cancer. However, the full model is more discriminating (table 2), which means that it can better rank individual patients with respect to risk. Ideally, then, one would use the full model to obtain a predicted probability for the patient, and then adjust the prediction to correct for the calibration error. We plan to incorporate this adjustment in our free software, which can be obtained at www.nomograms.org.

New models were developed to distinguish indolent from clinically important cancer with relatively high areas under the ROC curve (0.64 to 0.79). However, these models may be more useful for ruling out, rather than ruling in, indolent cancer, given that they rarely predict with very high probabilities. The nomograms would be useful for counseling the man with low predicted probability of indolent cancer, reassuring him that aggressive therapy appears warranted. Currently, a watchful waiting policy for a man with indolent cancer is determined subjectively based on disease characteristics, health and patient preferences. Our nomogram may provide a useful tool for assisting in this decision. Although the nomogram accurately discriminates between men with

to direct a man toward watchful waiting. For example, an indolent cancer in an older man may be considered ideal for conservative management but an indolent cancer in a young man may still warrant aggressive therapy, since the patient may have a long life expectancy during which the cancer may progress from its present, presumably indolent, state. With this nomogram, we can only hope to provide objective information for use as a basis for this decision, rather than provide an absolute solution to the management dilemma.

CONCLUSIONS

Nomograms have been developed to predict the probability that a man with prostate cancer has an indolent tumor. They each appear to have excellent discriminatory ability and good calibration.

Drs. Kazuho Suyama, Takuji Utsunomiya, Shigehiro Soh and John Gore at Baylor College of Medicine, and Drs. Peter G. Hammerer, Alexander Haese and Rolf-Peter Henke at Hamburg University Hospital helped collect the data.


FIG. 5. Nomogram for full model. Pre.Tx., pretreatment. Clin., clinical. Pri.Bx.Gl, primary biopsy Gleason score. Sec.Bx.Gl, secondary biopsy Gleason score. U/S, ultrasound. Prob., probable.

PREDICTION OF INDOLENT PROSTATE CANCER


Time-dependent predictor variables

I These variables cannot be used in the conventional way, i.e., time-dependent Cox, when the aim is prediction

I The most straightforward approach is landmark analysis:

  - move time origin to a (landmark) time point after the original time origin
  - exclude all subjects who had an event before the new time origin
  - update time-varying predictor variables to the value at or last value before the new time origin
  - compute changes and other parameters of repeated measurements of the time-varying predictor variables until the new time origin
  - proceed as before: fit models, extract predicted risks, evaluate performance
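The data preparation steps of a landmark analysis can be sketched in base R as follows. The two toy data frames (fu with one row per subject, marker with repeated measurements) and all column names are hypothetical. The sketch also drops subjects censored before the landmark, since they are no longer at risk at the new time origin.

```r
# Hypothetical toy data: follow-up (one row per subject) ...
fu <- data.frame(id = 1:4, time = c(0.5, 3.0, 4.2, 6.0), event = c(1, 1, 0, 1))
# ... and repeated marker measurements in long format
marker <- data.frame(id    = c(1, 2, 2, 2, 3, 3, 4, 4),
                     mtime = c(0, 0, 0.8, 2.5, 0, 1.5, 0, 0.9),
                     value = c(5.1, 4.0, 4.6, 5.5, 3.2, 3.0, 6.0, 6.4))

landmark <- 1  # landmark time, same unit as fu$time

# Keep subjects still at risk at the landmark and move the time origin
lm_dat <- fu[fu$time > landmark, ]
lm_dat$time_lm <- lm_dat$time - landmark

# Use only measurements taken at or before the landmark
past  <- marker[marker$mtime <= landmark, ]
past  <- past[order(past$id, past$mtime), ]
last  <- past[!duplicated(past$id, fromLast = TRUE), c("id", "value")]
first <- past[!duplicated(past$id), c("id", "value")]
names(last)[2]  <- "value_lm"    # last value at or before the landmark
names(first)[2] <- "value_base"  # first (baseline) value

# Attach the updated predictor and its change since baseline
lm_dat <- merge(merge(lm_dat, last, by = "id"), first, by = "id")
lm_dat$change <- lm_dat$value_lm - lm_dat$value_base

print(lm_dat)
```

Models are then fitted to lm_dat with time_lm as the time variable, so that predicted risks refer to a horizon counted from the landmark.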


Take home messages

I Risk prediction models can be derived from conventional regression models

I All subjects need to be followed from a common time origin

I Predicted risks are associated with a given time horizon

I Only predictor variables that are known at time zero can be used to predict

I Censored data and competing risks need to be accounted for by choosing appropriate methods