PhD course: Statistical evaluation of diagnostic
and predictive models
Modelling
Thomas Alexander Gerds (University of Copenhagen) March 02-04, 2016
Multiple regression
Multiple regression can be used to exploit the joint predictive power of several or many variables.
Conventional modelling techniques:
- logistic regression for binary outcome
- Cox regression for time-to-event (survival) outcome
- Cox regression and Fine-Gray regression in the presence of competing risks
P-values testing the null hypothesis of no association are not a good measure of predictive power.
Modelling Performance Survival Competing risk Display
Example: epo study
Anaemia is a deficiency of red blood cells and/or hemoglobin and an additional risk factor for cancer patients.
Randomized placebo-controlled trial: does treatment with epoetin beta (epo, 300 U/kg) enhance the hemoglobin concentration level and improve survival chances?
Henke et al. (2006) [1] identified the c20 expression (erythropoietin receptor status) as a new biomarker for the prognosis of locoregional progression-free survival.
[1] Henke et al. Do erythropoietin receptors on cancer cells explain unexpected clinical findings? J Clin Oncol, 24(29):4708-4713, 2006.
Treatment
The study includes 149 head and neck cancer patients with a tumor located in the oropharynx (36%), the oral cavity (27%), the larynx (14%) or the hypopharynx (23%).
One of the treatments was radiotherapy following resection. Number of patients by treatment arm and resection status:
           Resection
           Complete  Incomplete  No
Placebo    35        14          25
Epo        36        14          25
Outcome
Blood hemoglobin levels were measured weekly during radiotherapy (7 weeks).
Treatment with epoetin beta was defined as successful when the hemoglobin level increased sufficiently. For patient i set
$$ Y_i = \begin{cases} 1 & \text{treatment successful} \\ 0 & \text{treatment failed} \end{cases} $$
Exercises
Consider the following subgroup of male patients over 75 years in the active treatment group from the Epo Study:
Treat sex tot.rad.dose HbBase age Y
Epo male 72 16.0 72 1
Epo male 60 12.7 76 1
Epo male 70 9.9 79 0
Epo male 60 10.7 76 0
Compute (by hand without computer) the AUC for discriminating treatment success (Y=1) for:
- baseline hemoglobin level
- age
- total radiation dose
Solutions
Baseline hemoglobin level
100*(I(16>9.9)+I(16>10.7)+I(12.7>9.9)+I(12.7>10.7))/4
[1] 100
Age
100*(I(72>79)+I(72>76)+I(76>79)+0.5*I(76==76))/4
[1] 12.5
Total radiation dose
100*(I(72>70)+I(72>60)+I(60>70)+0.5*I(60==60))/4
[1] 62.5
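The hand computation compares every successful patient with every failed patient, scoring 1 for a concordant pair and 0.5 for a tie. A minimal sketch of the same rule in base R; the helper name `auc_by_hand` is hypothetical and not part of any package used in the course:

```r
# AUC (in %) by pairwise comparison of marker values:
# each (success, failure) pair scores 1 if the success has the higher value,
# 0.5 if the two values are tied and 0 otherwise.
auc_by_hand <- function(marker, y) {
  success <- marker[y == 1]
  failure <- marker[y == 0]
  pairs <- outer(success, failure, function(s, f) (s > f) + 0.5 * (s == f))
  100 * mean(pairs)
}

# Subgroup from the exercise (four male patients in the Epo arm)
d <- data.frame(tot.rad.dose = c(72, 60, 70, 60),
                HbBase = c(16.0, 12.7, 9.9, 10.7),
                age = c(72, 76, 79, 76),
                Y = c(1, 1, 0, 0))
auc_hb <- auc_by_hand(d$HbBase, d$Y)
auc_age <- auc_by_hand(d$age, d$Y)
auc_dose <- auc_by_hand(d$tot.rad.dose, d$Y)
print(c(HbBase = auc_hb, age = auc_age, dose = auc_dose))
```

With only two successes and two failures there are four pairs, so each pair contributes 25 percentage points.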
Target
Patient no.  Treatment successful  Predicted probability
1            0                     P1
2            0                     P2
3            1                     P3
4            1                     P4
5            0                     P5
6            1                     P6
7            1                     P7
···          ···                   ···
Predictors
Age                             min: 41 y, median: 59 y, max: 80 y
Gender                          male: 85%, female: 15%
Baseline hemoglobin             mean: 12.03 g/dl, std: 1.45
Treatment                       epo: 50%, placebo: 50%
Stratum                         complete: 48%, incomplete: 19%, no resection: 34%
Erythropoietin receptor status  neg: 32%, pos: 68%
Logistic regression
Response: treatment successful yes/no
Factor             OddsRatio  StandardError  CI.95          pValue
(Intercept)        0.00       4.01                          <0.0001
Age                0.97       0.03           [0.91;1.03]    0.2807
Sex:female         4.71       0.84           [0.91;26.02]   0.0657
HbBase             3.25       0.27           [1.99;5.91]    <0.0001
Treatment:Epo      90.92      0.76           [23.9;493.41]  <0.0001
Resection:Incompl  1.75       0.81           [0.36;9.03]    0.4924
Resection:Compl    4.14       0.69           [1.13;17.36]   0.0395
Receptor:positive  5.81       0.66           [1.72;23.39]   0.0076
The model provides general information
Treatment with epo significantly increases the chance (odds) of reaching the target hemoglobin level by a factor of 90.92 (95% CI: [23.9;493.4], p<0.0001) in the overall study population.
Does that mean everyone should be treated?
The model provides information for a single patient
For example: the predicted probability that a 51 year old man with complete tumor resection and baseline hemoglobin level 12.6 g/dl reaches the target hemoglobin level (Yi=1) is
Epo group: 97.4%    Placebo: 29.2%
If a similar patient has baseline hemoglobin level 14.8 g/dl then the model predicts:
Epo group: 99.8%    Placebo: 84.7%
Predictions and Brier score for logistic regression
Patient no.  Treatment successful Yi  Predicted probability Pi (%)  Residual Yi−Pi (%)  Brier score (Yi−Pi)²
···          ···                      ···                           ···                 ···
142          0                        84.09                         -84.09              0.7071
143          0                        93.47                         -93.47              0.8737
144          0                        18.73                         -18.73              0.0351
145          0                        1.81                          -1.81               0.0003
146          0                        3.86                          -3.86               0.0015
147          1                        96.64                         3.36                0.0011
148          0                        0.5                           -0.5                <0.0001
149          0                        11.93                         -11.93              0.0142
Brier score (average over all patients): 0.0869
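The last column of the table is the squared residual on the probability scale (percentages divided by 100). A small sketch in base R that recomputes it for the eight displayed patients; the full Brier score would need all patients, which are not shown here:

```r
# Patients 142-149 from the table: outcome Y and predicted probability P (given in %)
Y <- c(0, 0, 0, 0, 0, 1, 0, 0)
P <- c(84.09, 93.47, 18.73, 1.81, 3.86, 96.64, 0.5, 11.93) / 100

residual <- Y - P
brier_i <- residual^2          # per-patient contribution to the Brier score
print(round(brier_i, 4))

# The Brier score of the model is the average of brier_i over ALL patients;
# the value below only covers the eight displayed rows.
brier_shown <- mean(brier_i)
```

Patients 142 and 143 dominate the sum: the model was confident of success and the treatment failed.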
The model behind the table
The linear predictor is the linear combination of log-odds-ratios and the values of the patient's characteristics (predictors):
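$$LP = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{HbBase} + \beta_3\,\text{Treat} + \cdots + \beta_7\,\text{epoRec}$$
$$\text{Predicted probability of successful treatment} = \frac{1}{1+\exp\{-LP\}}$$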
- a prediction can be obtained for any given value of the predictor variables (patient characteristics)
- the predicted chance of successful treatment depends on all odds ratios and all variables
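Because the linear predictor is additive on the log-odds scale, changing one predictor while keeping the others fixed multiplies the odds by the corresponding odds ratio raised to the size of the change. The sketch below applies this to the example of the 51 year old man, using the odds ratio 3.25 for HbBase from the table; the helper `update_probability` is a hypothetical name introduced for illustration:

```r
# Odds ratio for baseline hemoglobin (per g/dl) from the logistic regression table
or_hb <- 3.25

# Update a predicted probability when one predictor changes by `delta` units
# and all other predictors stay fixed: multiply the odds by OR^delta.
update_probability <- function(p, odds_ratio, delta) {
  odds <- p / (1 - p) * odds_ratio^delta
  odds / (1 + odds)
}

# Predictions at 12.6 g/dl reported in the example (Epo: 97.4%, placebo: 29.2%),
# moved to a baseline hemoglobin level of 14.8 g/dl
p_epo <- update_probability(0.974, or_hb, 14.8 - 12.6)
p_placebo <- update_probability(0.292, or_hb, 14.8 - 12.6)
print(round(100 * c(Epo = p_epo, Placebo = p_placebo), 1))
```

The results agree with the reported 99.8% and 84.7% up to the rounding of the published odds ratio and probabilities.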
Predicted treatment success probability (logistic regression)
For a treated man with no resection possible and negative epo receptor status.
[Figure: predicted risk (0% to 100%) as a function of age (50 to 70 years) and baseline hemoglobin (9 to 14 g/dl).]
Tools for evaluating prediction accuracy
For each subject we have a predicted risk based on multiple predictors. To evaluate the prediction performance of the logistic regression model we consider the following tools:
- Distribution of risks
- Prediction accuracy: Brier score (lack of calibration and lack of spread of predictions)
- Discrimination: ROC curve, c-index = AUC (lack of spread of predictions)
- Calibration plot: (lack of calibration)
- Re-classification scatterplot: (comparison of risk
Brier score for null model in the Epo study
Patient no.  Treatment successful Yi  Predicted probability Pi (%)  Residual Yi−Pi (%)  Brier score (Yi−Pi)²
···          ···                      ···                           ···                 ···
142          0                        44.3                          -44.3               0.1962
143          0                        44.3                          -44.3               0.1962
144          0                        44.3                          -44.3               0.1962
145          0                        44.3                          -44.3               0.1962
146          0                        44.3                          -44.3               0.1962
147          1                        44.3                          55.7                0.3103
148          0                        44.3                          -44.3               0.1962
149          0                        44.3                          -44.3               0.1962
Brier score (average over all patients): 0.247
The predicted probability is the prevalence of patients with treatment success in the data set.
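As a check on the table, the Brier score of the null model follows directly from the prevalence. Writing $p$ for the prevalence (here $p = 0.443$), a fraction $p$ of the patients has $Y_i = 1$ and contributes $(1-p)^2$, while a fraction $1-p$ has $Y_i = 0$ and contributes $p^2$:

$$\text{Brier}_{\text{null}} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - p)^2 = p\,(1-p)^2 + (1-p)\,p^2 = p\,(1-p) \approx 0.443 \times 0.557 \approx 0.247$$

The two distinct entries in the last column are $(1-0.443)^2 \approx 0.3103$ and $0.443^2 \approx 0.1962$.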
Prevalence model
[Figure: calibration plot for the prevalence (null) model; observed proportion against predicted probability of treatment success, both from 0% to 100%.]
Performance null model: Brier=24.7, AUC=50.0
Discrete predictors: Gender, resection status and epo
[Figure: calibration plot; observed proportion against predicted probability of treatment success, both from 0% to 100%.]
Model            Brier  AUC
Null model       24.7   50.0
Gender model     24.7   50.3
Resection model  24.0   58.7
Treatment model  13.6   83.7
Continuous predictors: Baseline hemoglobin, Age
[Figure: re-classification plot with predicted chance (Age model) and predicted chance (Hemoglobin model) on the two axes, both from 0% to 100%; points are marked as treatment success or treatment failed.]
Continuous predictors: Baseline hemoglobin, Age
[Figure: calibration plot; observed proportion against predicted probability of treatment success, both from 0% to 100%.]
Model                      Brier  AUC
Null model                 24.7   50.0
Age model                  24.7   51.2
Baseline hemoglobin model  19.3   77.2
Continuous predictors: Baseline hemoglobin, Age
[Figure: ROC curves (sensitivity against 1−specificity, both from 0% to 100%) for the null model, the age model and the baseline hemoglobin model; Brier scores and AUC values as in the table above.]
Multiple regression models
[Figure: re-classification plot with predicted chance (excluding receptor status) and predicted chance (including receptor status) on the two axes, both from 0% to 100%; points are marked as treatment success or treatment failed.]
Multiple regression models
[Figure: calibration plot; observed proportion against predicted event probability, both from 0% to 100%.]
Model                  Brier  AUC
Null model             24.7   50.0
All variables          9.6    93.3
All + receptor status  8.7    94.7
Multiple regression models
[Figure: ROC curves (sensitivity against 1−specificity, both from 0% to 100%) for the model with all variables and the model with all variables plus receptor status; Brier scores and AUC values as in the table above.]
R crash course: logistic regression prediction
Fitting the logistic regression model
library(ModelGood) ## provides predictStatusProb, Roc and calPlot2
logreg <- glm(Y~Treat+HbBase+age,data=Epo,family="binomial")
Predicting probability of Y=1
## for same data
Epo$predrisk <- predictStatusProb(logreg,newdata=Epo)
boxplot(predrisk~Y,Epo)
## for specific values of the predictor variables
ndat <- data.frame(Treat=c("Epo","Placebo"),HbBase=c(12,12),age=c(50,50))
predictStatusProb(logreg,newdata=ndat)
ROC, calibration plot, AUC and Brier score
rlogreg <- Roc(list("logistic regression"=logreg),formula=Y~1,data=Epo)
rlogreg
plot(rlogreg,auc=TRUE)
## calPlot2 takes the fitted model, as in the solutions to the exercises
calPlot2(list("logistic regression"=logreg))
Exercises
Load the in vitro fertilisation data. Variables are described here: http://192.38.117.59/~tag/Teaching/share/data/IVF.html
IVF <- read.table("http://192.38.117.59/~tag/Teaching/share/data/IVF.txt",header=TRUE)
1. Compute the prevalence of responders (y = 1)
2. Fit a logistic regression model with additive effects of the variables age, bmi, cyclelen, antfoll
3. Boxplot the predicted response probabilities in same data and add a line at the prevalence. Boxplot the probabilities conditional on the outcome.
4. Compare the risks of a 25 year old woman to that of a 35 year old woman when both have a BMI of 25, a cycle length of 28 and 19 antral follicles.
5. Plot the ROC curve and the calibration curve for this 4 variable model.
6. Compute the area under the ROC curve and the Brier score and compare the result to the benchmark prediction which ignores the predictor variables.
Solutions 1.
Compute the prevalence of responders (y = 1)
mean(IVF$y)
[1] 0.6690909
Equivalently:
mean(IVF$response=="positive")
Solutions 2.
Fit a logistic regression model with additive effects of the variables age, bmi, cyclelen, antfoll
library(Publish)
fit <- glm(y~age+bmi+cyclelen+antfoll,data=IVF,family="binomial")
# summary(fit)
publish(fit,org=TRUE,units=list(age="years",bmi="kg/m^2",cyclelen="days"))
Variable  Units   OddsRatio  CI.95        p-value
age       years   0.96       [0.88;1.04]  0.33514
bmi       kg/m^2  0.93       [0.85;1.01]  0.10118
cyclelen  days    1.24       [1.05;1.46]  0.01079
antfoll           1.14       [1.09;1.20]  < 0.0001
Solutions 3.
Boxplot the predicted response probabilities in same data and add a line at the prevalence. Boxplot the probabilities conditional on the outcome.
IVF$pred <- predictStatusProb(fit,newdata=IVF)
boxplot(pred~response,data=IVF,horizontal=TRUE,ylim=c(0,1))
[Figure: horizontal boxplots of the predicted response probability (0.0 to 1.0) for negative and positive responders.]
Solutions 4.
Compare the risks of a 25 year old woman to that of a 35 year old woman when both have a BMI of 25, a cycle length of 28 and 19 antral follicles.
ndat <- data.frame(age=c(25,35),bmi=25,cyclelen=28,antfoll=19)
ndat$pred <- round(100*predictStatusProb(fit,newdata=ndat),1)
ndat
age bmi cyclelen antfoll pred
1 25 25 28 19 71.7
2 35 25 28 19 62.4
Solutions 5
Plot the ROC curve and the calibration curve for this 4 variable model.
library(ModelGood)
plot(Roc(list("logistic regression"=fit)),auc=TRUE)
[Figure: ROC curve (sensitivity against 1−specificity, both from 0% to 100%); AUC (%) logistic.regression: 78.9.]
Solutions 5
Plot the calibration curve.
library(ModelGood)
calPlot2(list("logistic regression"=fit))
[Figure: calibration plot for the logistic regression model; observed proportion against predicted event probability, both from 0% to 100%.]
Solutions 6
nullfit <- glm(y~1,data=IVF,family="binomial")
Roc(list("logistic regression"=fit,"reference"=nullfit))
Receiver operating characteristic
Sample size: 275
Response: '0' (n=91) '1' (n=184)
Area under the ROC curve (AUC, higher better):
                     full data
logistic.regression  78.91
reference            NA
Brier score (Brier, lower better):
                     full data
logistic.regression  17.28
reference            22.14
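The Brier score of the reference model can be checked from the response counts alone, because a model that predicts the prevalence p for everyone has Brier score p(1-p). A short sketch in base R using only the counts printed in the output:

```r
# Response counts from the output: '0' (n=91), '1' (n=184)
n_neg <- 91
n_pos <- 184

# Prevalence of responders; this is what the null model predicts for every woman
p <- n_pos / (n_pos + n_neg)

# Brier score of the null model in percent
brier_reference <- 100 * p * (1 - p)
print(round(c(prevalence = p, brier = brier_reference), 4))
```

The fitted model lowers the Brier score from this benchmark to 17.28.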
The survival part of the epo study
The locoregional progression-free survival time in the epo study is the time between treatment and whichever comes first, death of the patient or locoregional progression of the tumour.
Epo increases the blood hemoglobin level and thus successful epo treatment should improve the survival chances . . .
The role of time
[Figure: prediction model timeline. Origin (time 0, baseline): the time point at which the patient is provided with the prediction. Horizon (time t, follow-up): the time point attached to the prediction.]
Lost to follow-up, or (right) censored, means that the patient was not followed until the horizon time t.
- In survival analysis, a prediction is a survival function of time.
- At any time, the predicted risk is equal to one minus the chance to survive until this time point.
- The null model, which ignores the predictor variables and predicts the same risk for all subjects, is obtained with the Kaplan-Meier estimator.
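To make the null model concrete, here is a minimal base R sketch of the Kaplan-Meier estimate at a single horizon. The function `km_surv` and the toy data are hypothetical and only meant to illustrate the product-limit rule; in practice one would use a package function such as the `prodlim` call shown in the solutions:

```r
# Kaplan-Meier estimate of the survival probability at time t.
# time: follow-up times; status: 1 = event observed, 0 = right-censored.
km_surv <- function(time, status, t) {
  event_times <- sort(unique(time[status == 1 & time <= t]))
  factors <- vapply(event_times, function(u) {
    # 1 - (number of events at u) / (number still at risk just before u)
    1 - sum(time == u & status == 1) / sum(time >= u)
  }, numeric(1))
  prod(factors)
}

# Hypothetical toy data: four patients, the second one is censored at time 2
time <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 1)
surv3 <- km_surv(time, status, t = 3)
risk3 <- 1 - surv3   # null-model predicted risk at horizon 3, the same for everyone
print(c(survival = surv3, risk = risk3))
```

The censored patient leaves the risk set without counting as an event, which is why the second factor is 1/2 rather than 1/3.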
Predictors in the epo study
- Age: min: 41 y, median: 59 y, max: 80 y
- Gender: male: 85%, female: 15%
- Baseline hemoglobin level: mean: 12.03 g/dl, std: 1.45
- Treatment arm: epo: 50%, placebo: 50%
- Resection: complete: 48%, incomplete: 19%, no resection: 34%
- Erythropoietin receptor status: neg: 32%, pos: 68%
Effect of epo treatment on prediction
[Figure: Kaplan-Meier curves of locoregional progression-free survival (0% to 100%) against months (0 to 60) by treatment arm.]
Numbers at risk (11 time points from 0 to 60 months):
Placebo: 74 62 39 34 30 25 23 20 17 11 7
Epo:     75 52 35 30 25 20 18 13 11 5 3
Treatment effect explained by new marker?
[Figure: Kaplan-Meier curves of locoregional progression-free survival (0% to 100%) against months (0 to 60) by treatment arm, in two panels: Epo-Receptor positive and Epo-Receptor negative.]
Numbers at risk (11 time points from 0 to 60 months):
Epo-Receptor: positive
Placebo: 50 42 30 26 26 21 18 18 15 10 7
Epo:     51 36 24 20 16 13 12 9 8 3 2
Epo-Receptor: negative
Placebo: 24 21 11 9 7 6 5 4 2 2 1
Epo:     24 18 12 11 10 9 7 5 3 3 2
Treatment effect explained by resection status?
[Figure: Kaplan-Meier curves (0% to 100%) against months (0 to 60) by treatment arm, in three panels: resection none, incomplete and complete.]
Numbers at risk (months 0, 12, 24, 36, 48, 60):
Resection: none
Placebo: 25 7 6 3 2 2
Epo:     25 2 0 0 0 0
Resection: incomplete
Placebo: 14 11 7 6 5 0
Epo:     14 8 5 4 2 0
Resection: complete
Placebo: 35 21 17 14 10 5
Epo:     36 25 20 14 9 3
Cox regression
Variable                          Units                HazardRatio  CI.95        p-value
Treat                             Placebo              1.00         [1.00;1.00]  1.00000
                                  Epo                  1.00         [0.51;1.96]  0.99017
epoRec                            0                    1.00         [1.00;1.00]  1.00000
                                  1                    0.69         [0.39;1.21]  0.19199
age                                                    0.98         [0.96;1.00]  0.04769
sex                               male                 1.00         [1.00;1.00]  1.00000
                                  female               0.88         [0.47;1.65]  0.68916
stratum                           CompleteResection    1.00         [1.00;1.00]  1.00000
                                  IncompleteResection  1.31         [0.75;2.28]  0.34486
                                  noResection          3.25         [2.05;5.18]  < 0.001
Treat(Placebo): epoRec(1 vs 0)                         0.69         [0.39;1.21]  0.19199
Treat(Epo): epoRec(1 vs 0)                             1.39         [0.77;2.51]  0.27682
epoRec(0): Treat(Epo vs Placebo)                       1.00         [0.51;1.96]  0.99017
epoRec(1): Treat(Epo vs Placebo)                       2.02         [1.25;3.26]  0.00413
- the last two lines show the effect of treatment separately for the two epo receptor status groups
- the main effects have no interpretation on their own
- a likelihood ratio test is needed to see if the effect modification is significant
Predictions of the Cox regression model
The linear predictor is the linear combination of log-hazard-ratios and the values of the patient's characteristics (predictors):
$$LP = \hat\beta_1\,\text{Treat} + \hat\beta_2\,\text{epoRec} + \hat\beta_3\,\text{age} + \cdots + \hat\beta_7\,\text{Treat}\cdot\text{epoRec}$$
The predicted t-year survival probability for a patient with linear predictor LP is obtained with the formula:
$$S(t \mid LP) = \exp(-\Lambda_0(t)\exp(LP))$$
where Λ0(t) is the cumulative hazard function (time-dependent) when all predictor variables have value 0.
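The formula implies that a hazard ratio acts as an exponent on the reference survival probability. The sketch below illustrates this in base R; the baseline value (80% survival at the horizon) is a hypothetical number chosen for illustration, while 3.25 is the hazard ratio for no resection from the Cox regression table:

```r
# Survival probability under the Cox model:
# S(t | LP) = exp(-Lambda0(t) * exp(LP))
cox_survival <- function(Lambda0_t, lp) exp(-Lambda0_t * exp(lp))

# Hypothetical baseline: 80% survival at the horizon for a patient with LP = 0
Lambda0_t <- -log(0.80)

# Same patient, but with LP shifted by the log hazard ratio for "no resection"
s_reference <- cox_survival(Lambda0_t, lp = 0)
s_noresection <- cox_survival(Lambda0_t, lp = log(3.25))
print(round(c(reference = s_reference, no_resection = s_noresection), 3))
```

Equivalently, the second value is the first raised to the power 3.25, which is why a hazard ratio alone does not tell how much the predicted survival changes: that also depends on the baseline.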
R crash course: Cox regression (part I)
Fitting the Cox regression model
library(survival) ## coxph, Surv
library(pec) ## predictSurvProb
coxreg <- coxph(Surv(ldfs.time,ldfs.status)~Treat*epoRec+age+stratum,data=Epo)
Predicting 4-year survival
## for same data
Epo$predsurv <- predictSurvProb(coxreg,newdata=Epo,times=48)
## for specific values of the predictor variables
ndat <- data.frame(Treat=c("Epo","Placebo"),stratum="CompleteResection",epoRec=factor(1),HbBase=c(12,12),age=c(50,50))
predictSurvProb(coxreg,newdata=ndat,times=48)
Scatterplot of predicted 4-year survival against baseline hemoglobin
Epo$risk48 <- predictSurvProb(coxreg,newdata=Epo,times=48)
plot(risk48~HbBase,data=Epo,ylim=c(0,1))
Quantiles of predicted 4-year survival conditional on outcome
qrisk <- Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,metrics=NULL,plots=NULL,summary="riskQuantile",times=48,nullModel=FALSE)
boxplot(qrisk,type="risk")
AUC and Brier score for 4-year predicted survival
rcoxreg <- Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,times=48)
rcoxreg
riskScore:::plot.score.ROC(rcoxreg)
R crash course: Cox regression (part II)
Predicted survival function of time
ndat <- data.frame(Treat=c("Epo","Placebo"),stratum="CompleteResection",epoRec=factor(1),HbBase=c(12,12),age=c(50,50))
etimes <- sort(unique(Epo$ldfs.time))
survpred <- predictSurvProb(coxreg,newdata=ndat,times=etimes)
plotPredictSurvProb(coxreg,newdata=ndat,col=1:2)
Time-dependent Brier score and AUC
rtcoxreg <- Score(list(coxreg),formula=Surv(ldfs.time,ldfs.status)~1,data=Epo,times=1:48)
riskScore:::plot.score.AUC(rtcoxreg)
riskScore:::plot.score.Brier(rtcoxreg)
Exercises
- Load the GBSG2 data:
library(pec);data(GBSG2);help(GBSG2)
- Take a random subset of the GBSG2 data which contains about 60% of the patients. Name this subset 'GBSG2.learn' and its complement, which contains the remaining 40% of the patients, 'GBSG2.val':
set.seed(17)
learn.id <- sample(1:NROW(GBSG2),size=.6*NROW(GBSG2))
GBSG2.learn <- GBSG2[learn.id,]
GBSG2.val <- GBSG2[-learn.id,]
In the learning data:
- Fit the overall Kaplan-Meier curve. Read from the graph the estimated probability of surviving 500 days.
- Fit a Cox regression model with the prognostic factors age, tumor size and grade, number of positive lymph nodes, estrogen and progesterone receptors.
In the validation data:
- Plot the 500-day survival predictions for given age
- Boxplot the 500-day survival predictions conditional on outcome
- Plot the Brier score of the Cox regression model against time and compare against the reference (Kaplan-Meier prediction).
- Plot the time-dependent AUC of the Cox regression model against time.
Solutions
library(prodlim) ## prodlim, Hist
km <- prodlim(Hist(time,cens)~1,data=GBSG2.learn)
plot(km)
abline(v=500,col=2)
abline(h=predict(km,times=500),col=2)
text(x=550,y=predict(km,times=450),round(100*predict(km,times=500),1),pos=4,col=2)
[Figure: Kaplan-Meier curve of survival probability (0% to 100%) against time (0 to 2500 days); the estimated probability of surviving 500 days is marked as 86.1%.]
Subjects at risk (11 time points from 0 to 2500 days): 411 385 334 259 211 164 120 70 28 8 1
Solutions
Fit a Cox regression model with the prognostic factors age, tumor size and grade, number of positive lymph nodes, estrogen and progesterone receptors.
library(Publish);library(survival)
fit <- coxph(Surv(time,cens)~age+menostat+tsize+tgrade+pnodes+progrec+estrec,data=GBSG2.learn)
publish(fit,org=TRUE)
Variable  Units  HazardRatio  CI.95        p-value
age              1.00         [0.98;1.03]  0.88397
menostat  Post   1.00         [1.00;1.00]  1.00000
          Pre    0.94         [0.58;1.53]  0.79777
tsize            1.00         [0.99;1.01]  0.39928
tgrade    I      1.00         [1.00;1.00]  1.00000
          II     2.42         [1.17;4.98]  0.01669
          III    3.09         [1.44;6.65]  0.00389
pnodes           1.05         [1.03;1.07]  < 0.001
progrec          1.00         [1.00;1.00]  0.03033
estrec           1.00         [1.00;1.00]  0.86952
Solutions
In the validation data: Plot the 500-day survival predictions for given age
GBSG2.val$risk500 <- 1-predictSurvProb(fit,newdata=GBSG2.val,times=500)
plot(risk500~age,data=GBSG2.val,ylim=c(0,1))
[Figure: scatterplot of risk500 (0.0 to 1.0) against age (30 to 80 years) in the validation data.]
Solutions
In the validation data: Boxplot the 500-day survival predictions conditional on outcome
score500 <- Score(list(fit),data=GBSG2.val,formula=Surv(time,cens)~1,times=500,summary="riskQuantile",nullModel=FALSE)
boxplot(score500,type="risk",xlim=c(0,25))
[Figure: boxplots of the predicted risk from the coxph model (0% to 25%) by outcome: overall, event and event-free.]
Solutions
Plot the Brier score of the Cox regression model against time and compare against the reference (Kaplan-Meier prediction).
score.time <- Score(list(fit),data=GBSG2.val,formula=Surv(time,cens)~1,times=seq(0,2000,100),summary="riskQuantile",nullModel=FALSE)
riskScore:::plot.score.Brier(score.time)
[Figure: Brier score (0% to 25%) against time (0 to 2000 days).]
Solutions
Plot the time-dependent AUC of the Cox regression model against time.
riskScore:::plot.score.AUC(score.time)
[Figure: time-dependent AUC (50% to 100%) against time (0 to 2000 days).]
Censored data and competing risks
In survival analysis with no competing risks, the time of the event is not observed when the patient was lost to follow-up (right-censored) before the event occurred.
The Kaplan-Meier method and our estimates of risk quantiles, AUC and Brier score give higher weights to uncensored subjects to account for the possibility that subjects censored earlier also had an event. This method is called inverse probability of censoring weighting (IPCW).
A competing risk is an event after which it is clear that the patient will never experience the event of interest. Treating such an event as censored and applying IPCW would be a mistake (bias).
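The weighting idea can be sketched in a few lines of base R for the setting without competing risks. Each subject with an observed event before the horizon is weighted by one over the estimated probability of still being uncensored just before its event time, where that probability comes from the Kaplan-Meier method applied to the censoring times. The function `ipcw_risk` and the toy data are hypothetical, and the sketch assumes there are no ties between event and censoring times:

```r
# IPCW estimate of the risk of an event before `horizon`.
# time: follow-up times; status: 1 = event observed, 0 = right-censored.
ipcw_risk <- function(time, status, horizon) {
  # G(s-): Kaplan-Meier probability of being uncensored just before time s
  G_minus <- function(s) {
    cens_times <- sort(unique(time[status == 0 & time < s]))
    prod(vapply(cens_times, function(u) {
      1 - sum(time == u & status == 0) / sum(time >= u)
    }, numeric(1)))
  }
  w <- vapply(time, G_minus, numeric(1))
  # Uncensored events before the horizon are weighted up by 1 / G(T-)
  mean((time <= horizon & status == 1) / w)
}

# Hypothetical toy data: four patients, the second one is censored at time 2
time <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 1)
risk3 <- ipcw_risk(time, status, horizon = 3)
print(risk3)
```

In this toy example the patient with an event at time 3 receives weight 1.5, standing in for the patient censored at time 2 whose later fate is unknown.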
[Figure: cyclist illustration. Competing risk: speed = 0. Censored: speed = ?]
Competing risks
Examples
- non-cardiovascular mortality for event cardiovascular mortality
- non-cancer mortality for event relapse
- kidney transplant for event kidney failure without transplant
- discharge from ICU for event death in ICU
Pitfall
If a competing risk event is treated in the same way as loss to follow-up (censored), then one is analysing a hypothetical world in which the competing risk does not exist, which gives biased results for the real world.
Aim of prediction
In the presence of competing risks the aim of a risk prediction analysis is unchanged: the risk of the event of interest between the time origin and the prediction horizon.
Illustration
What happens:
- A cyclist can have an accident or get injured.
Consequences:
- After the accident the speed is zero.
Decision making:
- To estimate the chances of a cyclist to win a race, one has to combine the speed with the probability of an accident/injury.
Illustration
What happens:
- A patient can die free from cardiovascular disease.
Consequences:
- After death the hazard rate of cardiovascular disease/mortality is zero.
Decision making:
- To estimate the risk of cardiovascular disease/mortality, one has to combine the hazard rate of cardiovascular disease/mortality with the hazard rate of death due to other causes.
Modelling of competing risks data
Several Cox regression models
- One Cox regression for each competing risk (cause of death), which are then combined to predict the event of interest
  advantage: the model focuses on the biological mechanism behind the different causes
  disadvantage: requires modelling the competing risks
Direct regression models
- Fine-Gray regression / Logistic regression / Absolute risk regression
  advantage: no model for the other causes needed
  disadvantage: requires a model for the censoring times
Cox regression 5-year absolute CVD risk (formula I)
Need linear predictors of cause-specific log-hazard-ratios and patient's characteristics X, one for each cause:
- Cox regression for CVD hazard:
$$\lambda_1(u \mid X) = \lambda_{01}(u)\exp(LP_1)$$
- Cox regression for non-CVD hazard:
$$\lambda_2(u \mid X) = \lambda_{02}(u)\exp(LP_2)$$
Absolute 5-year risk of CVD:
$$\int_0^5 \underbrace{\exp\left(-\int_0^s \{\lambda_1(u \mid X) + \lambda_2(u \mid X)\}\,du\right)}_{\text{No event of any cause until } s}\; \underbrace{\lambda_1(s \mid X)}_{\text{CVD at } s}\; ds.$$
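For intuition, the integral can be evaluated numerically when both cause-specific hazards are constant in time. The hazard values below are hypothetical and chosen only for illustration; the sketch also shows the result of ignoring the competing risk, which corresponds to treating non-CVD death as censoring:

```r
# Hypothetical constant cause-specific hazards (per year)
lambda1 <- 0.05   # CVD
lambda2 <- 0.10   # non-CVD death (competing risk)
horizon <- 5

# Formula I: integrate P(no event of any cause until s) * (CVD hazard at s)
integrand <- function(s) exp(-(lambda1 + lambda2) * s) * lambda1
risk_cvd <- integrate(integrand, lower = 0, upper = horizon)$value

# Closed form for constant hazards, used as a check
risk_closed <- lambda1 / (lambda1 + lambda2) *
  (1 - exp(-(lambda1 + lambda2) * horizon))

# Ignoring the competing risk targets a world without non-CVD death
risk_naive <- 1 - exp(-lambda1 * horizon)
print(round(c(absolute = risk_cvd, closed = risk_closed, naive = risk_naive), 3))
```

The naive value is larger than the absolute risk because patients who die from other causes can no longer experience the CVD event.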
Cox regression 5-year absolute CVD risk (formula II)
Need linear predictors of cause-specific log-hazard-ratios and patient's characteristics X, one for the event and one for event-free (overall) survival:
- Cox regression for CVD hazard:
$$\lambda_1(u \mid X) = \lambda_{01}(u)\exp(LP_1)$$
- Cox regression for event-free survival hazard:
$$\lambda_{\text{overall}}(u \mid X) = \lambda_{0,\text{overall}}(u)\exp(LP_{\text{overall}})$$
Absolute 5-year risk of CVD:
$$\int_0^5 \underbrace{\exp\left(-\int_0^s \lambda_{\text{overall}}(u \mid X)\,du\right)}_{\text{No event of any cause until } s}\; \underbrace{\lambda_1(s \mid X)}_{\text{CVD at } s}\; ds.$$
Illustration: TRACE study
The TRACE data [3] contain 1877 patients. Survival of patients after myocardial infarction related to various risk factors:
- status: 0: alive, 9: dead from myocardial infarction, 7: dead from other causes.
- time: Survival time in years.
- chf: Clinical heart pump failure, 1: present, 0: absent.
- diabetes: Diabetes, 1: present, 0: absent.
- vf: Ventricular fibrillation, 1: present, 0: absent.
- wmi: Measure of heart pumping effect based on ultrasound measurements where 2 is normal and 0 is worst.
- sex: 1: female, 0: male.
- age: Age of patient.
[3] Jensen et al. (1997). Does in-hospital ventricular fibrillation affect prognosis after myocardial infarction? European Heart Journal 18, 919-924.
Cox regression (formula I)
CSC(list(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,Hist(time,event)~age+sex+diabetes),data=TRACE)
$`Cause 1`
  Variable  Units  HazardRatio  CI.95         p-value
1 wmi              0.45         [0.38;0.53]   <0.001
2 chf              1.74         [1.50;2.03]   <0.001
3 age              1.06         [1.05;1.07]   <0.001
4 sex              1.10         [0.96;1.27]   0.167
5 diabetes         1.50         [1.24;1.81]   <0.001
6 vf               2.18         [1.75;2.72]   <0.001
$`Cause 2`
  Variable  Units  HazardRatio  CI.95         p-value
1 age              0.94         [0.90;0.99]   0.012
2 sex              3.35         [0.42;26.76]  0.254
3 diabetes         2.36         [0.52;10.84]  0.268
Cox regression (formula II)
CSC(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,survtype="surv")
$`Cause 1`
  Variable  Units  HazardRatio  CI.95        p-value
1 wmi              0.45         [0.38;0.53]  <0.001
2 chf              1.74         [1.50;2.03]  <0.001
3 age              1.06         [1.05;1.07]  <0.001
4 sex              1.10         [0.96;1.27]  0.167
5 diabetes         1.50         [1.24;1.81]  <0.001
6 vf               2.18         [1.75;2.72]  <0.001
$OverallSurvival
  Variable  Units  HazardRatio  CI.95        p-value
1 wmi              0.46         [0.38;0.54]  <0.001
2 chf              1.71         [1.47;1.99]  <0.001
3 age              1.06         [1.05;1.07]  <0.001
4 sex              1.11         [0.97;1.27]  0.143
5 diabetes         1.51         [1.25;1.82]  <0.001
6 vf               2.16         [1.74;2.69]  <0.001
Fine-Gray regression
The linear predictor LP is the linear combination of estimated log-sub-hazard-ratios and the values of the patient's characteristics (predictors):
$$\text{Absolute 5-year risk of CVD} = 1 - \exp[-\exp\{\beta_0(5) + LP\}]$$
- β0(t) = log cumulative sub-hazard (time-dependent) when all predictor variables have value 0.
IPCW to account for censored data
- Censoring time = time at which the subject was lost to follow-up
- Kaplan-Meier method applied to censoring times
- Cox regression for censoring times
Fine-Gray regression
FGR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1)
          coef      exp(coef)  se(coef)  z        p-value
wmi       -0.80718  0.4461     0.090987  -8.8714  0.00000000000000
chf       0.55653   1.7446     0.075961  7.3265   0.00000000000024
age       0.05921   1.0610     0.003943  15.0170  0.00000000000000
sex       0.09439   1.0990     0.073493  1.2843   0.20000000000000
diabetes  0.40744   1.5030     0.096645  4.2159   0.00002500000000
vf        0.78392   2.1900     0.134950  5.8090   0.00000000630000
- Exponentiated regression coefficients = sub-hazard-ratios = hard to interpret
- Direction of effect and p-value are correct
Logistic risk regression
The linear predictor LP is the linear combination of estimated log-odds-ratios and the values of the patient's characteristics (predictors):
$$\text{Absolute 5-year risk of CVD} = \frac{\exp\{\beta_0(5) + LP(5)\}}{1 + \exp\{\beta_0(5) + LP(5)\}}$$
- β0(5) = log-odds (at 5 years) when all predictor variables have value 0.
- Problem: odds(5) = risk(5)/(1-risk(5)) as usual, but here (1-risk(5)) = probability of either being alive or having died due to non-CVD related causes within 5 years
IPCW to account for censored data
- either Kaplan-Meier method applied to censoring times
- or Cox regression for censoring times
Logistic regression
LRR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1,times=5)
Factor    ARR     CI95         p-value
wmi       0.36    [0.27;0.48]  < 0.0001
chf       2.53    [2.01;3.20]  < 0.0001
age       1.0860  [1.07;1.10]  < 0.0001
sex       1.14    [0.90;1.46]  0.2817039
diabetes  2.27    [1.54;3.33]  < 0.0001
vf        2.08    [1.36;3.18]  0.0007231
Example interpretation of coefficients:
- The odds of cardiovascular mortality within 5 years were 2.27 times higher for patients with a history of diabetes.
- The odds of cardiovascular mortality within 5 years increased by a factor of 1.09 for each additional year of age.
Absolute risk regression
The linear predictor LP is the linear combination of estimated log-absolute-risk-ratios and the values of the patient's characteristics (predictors):
$$\text{Absolute 5-year risk of CVD} = \exp\{\beta_0(5) + LP(5)\}$$
- β0(5) = log-absolute-risk (at 5 years) when all predictor variables have value 0.
- Problem: may not fit very well when there are continuous predictor variables
IPCW to account for censored data
- either Kaplan-Meier method applied to censoring times
- or Cox regression for censoring times
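The three direct regression models differ mainly in how the quantity β0(5) + LP is mapped to a risk. The sketch below compares the mappings in base R, assuming the usual complementary log-log form for Fine-Gray regression and the usual logistic form for logistic risk regression; the function names and the value of the linear predictor are hypothetical:

```r
# Each direct regression model turns eta = beta0(5) + LP into a 5-year risk
# through a different inverse link function.
risk_fine_gray <- function(eta) 1 - exp(-exp(eta))   # complementary log-log link
risk_logistic  <- function(eta) plogis(eta)          # logit link
risk_absolute  <- function(eta) exp(eta)             # log link

eta <- log(0.10)   # hypothetical value of beta0(5) + LP
risks <- c(FGR = risk_fine_gray(eta),
           LRR = risk_logistic(eta),
           ARR = risk_absolute(eta))
print(round(risks, 3))

# The log link is not bounded by 1: a large linear predictor gives a "risk" above 100%
exceeds_one <- risk_absolute(0.5) > 1
print(exceeds_one)
```

For small risks the three mappings nearly agree; they diverge as the linear predictor grows, and the log link's lack of an upper bound is one reason an absolute risk regression can fit poorly with continuous predictors.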
Absolute risk regression
ARR(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE,cause=1,times=5)
Factor    ARR     CI95         p-value
wmi       0.696   [0.63;0.77]  < 0.0001
chf       1.643   [1.44;1.87]  < 0.0001
age       1.0328  [1.03;1.04]  < 0.0001
sex       1.042   [0.96;1.13]  0.3183186
diabetes  1.20    [1.09;1.33]  0.0002537
vf        1.282   [1.16;1.41]  < 0.0001
Example interpretation of coefficients:
- The risk of cardiovascular mortality within 5 years was 1.2 times higher for patients with a history of diabetes.
- The risk of cardiovascular mortality within 5 years increased by a factor of 1.03 for each additional year of age.
R crash course: competing risks (part I)
Fitting cause-specic Cox regression
cscfit<-CSC(Hist(time,event)~wmi+chf+age+sex+diabetes+vf,data=TRACE)
Predicted risks for a given time point and all data
TRACE$risk5.cox <-predictEventProb(cscfit,times=5,newdata=TRACE,cause=1)
Predicted risks for specific values of the predictor variables
ndat <- data.frame(wmi=2.5,chf=1,age=60,sex=c(0,1),diabetes=1,vf=0)
predictEventProb(cscfit,times=5,newdata=ndat,cause=1)
Predicted risks as a function of time
ndat <- data.frame(wmi=2.5,chf=1,age=60,sex=c(0,1),diabetes=1,vf=0)
risk.t <- predictEventProb(cscfit,times=seq(0,5,.1),newdata=ndat,cause=1)
plotPredictEventProb(cscfit,cause=1,newdata=ndat,col=1:2)
Brier score and AUC
Score(list(Cox=cscfit),formula=Hist(time,event)~1,data=TRACE,times=5)
Exercises
Consider the Melanoma data: library(riskRegression); data(Melanoma); help(Melanoma)
I Plot the ROC-curve of tumor thickness as a marker for cancer related death after 5 years. Write a conclusion sentence which includes the AUC.
I Log-transform the variable tumor thickness.
I Fit a combined cause-specific Cox regression (CSC) with all covariates for cause 1 hazards but only sex and age for cause 2 hazards (non-cancer mortality).
I Fit a Fine-Gray regression model (FGR) for cause 1 specific risks with all covariates.
I Fit an absolute risk regression model (ARR) for cause 1 specific risks with all covariates.
I Scatterplot predicted risks of the 5-year cancer-specific mortality obtained with the CSC model against those obtained with the ARR model.
I Scatterplot predicted risks of the 5-year cancer-specific mortality obtained with the FGR model against those obtained with the ARR model.
I Compute Brier score and AUC for 5-year prediction horizon.
Solutions
Plot the ROC-curve of tumor thickness as a marker for cancer related death after 5 years. Write a conclusion sentence which includes the AUC.
[Figure: ROC curve, sensitivity against 1−specificity, both axes from 0 % to 100 %. AUC (%): AalenJohansen 50.0 [50.0;50.0]; thickness 68.0 [58.9;77.1]]
Solutions
Log-transform the variable tumor thickness. Fit a combined cause-specific Cox regression (CSC) with the covariates epicel + sex + age + logthick for cause 1 hazards but only sex + age for cause 2 hazards (non-cancer mortality).
cscfit <- CSC(list(Hist(time,status)~epicel + sex + age + logthick,
                   Hist(time,status)~age+sex),data=Melanoma)
publish(cscfit)
$`Cause 1`
  Variable Units       HazardRatio CI.95       p-value
1 epicel   not present 1.00        [1.00;1.00] 1.0000
2          present     0.48        [0.26;0.89] 0.0186
3 sex      Female      1.00        [1.00;1.00] 1.0000
4          Male        1.78        [1.04;3.03] 0.0350
5 age                  1.02        [1.00;1.03] 0.0806
6 logthick             1.98        [1.43;2.73] <0.001
$`Cause 2`
  Variable Units       HazardRatio CI.95       p-value
1 age                  1.08        [1.03;1.13] <0.001
2 sex      Female      1.00        [1.00;1.00] 1.0
Solutions
Fit a Fine-Gray regression model (FGR) for cause 1 specific risks with the covariates epicel + sex + age + logthick.
fgrfit <- FGR(Hist(time,status)~epicel + sex + age + logthick,data=Melanoma,cause=1)
publish(fgrfit,org=TRUE)
              coef     exp(coef)  se(coef)  z        p-value
epicelpresent -0.8346  0.4341     0.323120  -2.5828  0.00980
sexMale        0.5803  1.7866     0.280391   2.0697  0.03800
age            0.0113  1.0114     0.009884   1.1437  0.25000
logthick       0.6494  1.9144     0.152222   4.2663  0.00002
Solutions
Fit an absolute risk regression model (ARR) for cause 1 specific risks with all covariates.
arrfit <- ARR(Hist(time,status)~epicel + sex + age + logthick,data=Melanoma,cause=1)
publish(arrfit,org=TRUE)
Factor ARR CI95 p-value
epicelpresent 0.53 [0.30;0.93] 0.02719
sexMale 1.58 [1.01;2.49] 0.04558
age 1.0046 [0.99;1.02] 0.52088
logthick 1.78 [1.40;2.25] < 0.0001
Solutions
Scatterplot predicted risks of the 5-year cancer-specific mortality obtained with the CSC model against those obtained with the ARR model.
[Figure: scatterplot of predicted 5-year risks, both axes from 0 % to 100 %. Axis labels: predictEventProb(arrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25) and predictEventProb(cscfit, newdata = Melanoma, cause = 1, times = 5 * 365.25)]
Solutions
Scatterplot predicted risks of the 5-year cancer-specific mortality obtained with the FGR model against those obtained with the ARR model.
[Figure: scatterplot of predicted 5-year risks, both axes from 0 % to 100 %. Axis labels: predictEventProb(arrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25) and predictEventProb(fgrfit, newdata = Melanoma, cause = 1, times = 5 * 365.25)]
Solutions
Compute Brier score and AUC for 5-year prediction horizon.
library(cmprsk)
Score(list(CSC=cscfit,FGR=fgrfit,ARR=arrfit),data=Melanoma,formula=Hist(time,status)~1,cause=1,times=5*365.25,metric="brier",plots=NULL)
Metric Brier: Scores
model times Brier se.Brier lower.Brier upper.Brier
1: AalenJohansen 1826 0.174 0.0219 0.131 0.216
2: CSC 1826 0.150 0.0188 0.113 0.187
3: FGR 1826 0.149 0.0187 0.112 0.186
4: ARR 1826 0.149 0.0189 0.112 0.186
Tests
   times model reference     delta     se.delta lower    upper    p
1: 1826  CSC   AalenJohansen -0.023679 0.01125  -0.04573 -0.00163 0.0353
2: 1826  FGR   AalenJohansen -0.024496 0.01172  -0.04747 -0.00152 0.0367
3: 1826  ARR   AalenJohansen -0.024608 0.01185  -0.04784 -0.00138 0.0379
4: 1826  FGR   CSC           -0.000817 0.00136  -0.00347  0.00184 0.5467
5: 1826  ARR   CSC           -0.000929 0.00211  -0.00506  0.00320 0.6591
6: 1826  ARR   FGR           -0.000112 0.00157  -0.00318  0.00296 0.9429
Solutions
Compute AUC for 5-year prediction horizon.
library(cmprsk)
Score(list(CSC=cscfit,FGR=fgrfit,ARR=arrfit),data=Melanoma,formula= Hist(time,status)~1,cause=1,times=5*365.25,metric="auc")
Metric AUC: Scores
model times AUC se.AUC lower.AUC upper.AUC
1: AalenJohansen 1826 0.500 0.0000 0.500 0.500
2: CSC 1826 0.750 0.0666 0.619 0.881
3: FGR 1826 0.756 0.0669 0.625 0.887
4: ARR 1826 0.757 0.0660 0.627 0.886
Tests
   times model reference     delta   se.delta lower    upper  p
1: 1826  CSC   AalenJohansen 0.24990 0.06664   0.11928 0.3805 0.000177
2: 1826  FGR   AalenJohansen 0.25582 0.06689   0.12472 0.3869 0.000131
3: 1826  ARR   AalenJohansen 0.25684 0.06602   0.12744 0.3862 0.000100
4: 1826  FGR   CSC           0.00592 0.00503  -0.00393 0.0158 0.238853
5: 1826  ARR   CSC           0.00694 0.00835  -0.00943 0.0233 0.406064
6: 1826  ARR   FGR           0.00102 0.00660  -0.01191 0.0139 0.876996
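As a reminder of what the two metrics measure, here is a base-R sketch for the simplified case without censoring, where the event status at the prediction horizon is known for every subject. The toy data and the helper names `brier_score` and `auc_binary` are invented for illustration. `Score()` additionally accounts for censoring and competing risks, which this sketch does not attempt.

```r
# Brier score: mean squared difference between event status (0/1) and predicted risk
brier_score <- function(status, risk) mean((status - risk)^2)

# AUC: probability that a randomly chosen case has a higher predicted risk
# than a randomly chosen control, counting ties as 1/2
auc_binary <- function(status, risk) {
  cases    <- risk[status == 1]
  controls <- risk[status == 0]
  cmp <- outer(cases, controls, function(a, b) (a > b) + 0.5 * (a == b))
  mean(cmp)
}

# Invented toy data: event status at the horizon and predicted risks
status <- c(0, 0, 1, 0, 1, 1, 0, 1)
risk   <- c(0.1, 0.3, 0.4, 0.2, 0.8, 0.7, 0.5, 0.6)

brier_score(status, risk)
auc_binary(status, risk)

# A constant prediction cannot discriminate, so its AUC is 0.5 by construction
auc_binary(status, rep(mean(status), length(status)))
```

Lower Brier scores and higher AUC values indicate better predictions. The constant prediction plays the role of the null model (AalenJohansen) in the output above, which is why its AUC is exactly 0.5.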
Dissemination of results
The results of logistic regression, Cox regression or cause-specific Cox regression can be transformed into a nomogram or an Internet calculator in order to allow calculation of predicted risks for new subjects.
Nomogram
50% incidence of clinically indolent PCa versus a rate of 14.6% in those found to have PCa on first biopsy (21). Indolent PCa also appears to be more common in men with free/total PSA 0.15 or greater (8). We did not have sufficient data to incorporate the number of previous negative biopsies or free/total PSA in our modeling but these factors may be useful in future research.
We developed these nomograms for predicting the possibility that a man has an indolent cancer. The base model requires minimal data input, while the full model requires considerable analysis of systematic biopsy. The base model appears to be better calibrated (fig. 2), which means that for a group of patients the mean predicted probability is close to the proportion of men with indolent cancer. However, the full model is more discriminating (table 2), which means that it can better rank individual patients with respect to risk. Ideally, then, one would use the full model to obtain a predicted probability for the patient, and then adjust the prediction to correct for the calibration error. We plan to incorporate this adjustment in our free software, which can be obtained at www.nomograms.org.
New models were developed to distinguish indolent from clinically important cancer with relatively high areas under the ROC curve (0.64 to 0.79). However, these models may be more useful for ruling out, rather than ruling in, indolent cancer, given that they rarely predict with very high probabilities. The nomograms would be useful for counseling the man with low predicted probability of indolent cancer, reassuring him that aggressive therapy appears warranted. Currently, a watchful waiting policy for a man with indolent cancer is determined subjectively based on disease characteristics, health and patient preferences. Our nomogram may provide a useful tool for assisting in this decision. Although the nomogram accurately discriminates between men with
to direct a man toward watchful waiting. For example, an indolent cancer in an older man may be considered ideal for conservative management but an indolent cancer in a young man may still warrant aggressive therapy, since the patient may have a long life expectancy during which the cancer may progress from its present, presumably indolent, state. With this nomogram, we can only hope to provide objective information for use as a basis for this decision, rather than provide an absolute solution to the management dilemma.
CONCLUSIONS
Nomograms have been developed to predict the probability that a man with prostate cancer has an indolent tumor. They each appear to have excellent discriminatory ability and good calibration.
Drs. Kazuho Suyama, Takuji Utsunomiya, Shigehiro Soh and John Gore at Baylor College of Medicine, and Drs. Peter G. Hammerer, Alexander Haese and Rolf-Peter Henke at Hamburg University Hospital helped collect the data.
FIG. 5. Nomogram for full model. Pre.Tx., pretreatment. Clin., clinical. Pri.Bx.Gl, primary biopsy Gleason score. Sec.Bx.Gl, secondary biopsy Gleason score. U/S, ultrasound. Prob., probable.
PREDICTION OF INDOLENT PROSTATE CANCER
1796
Time-dependent predictor variables
I These variables cannot be used in the conventional
way, i.e., time-dependent Cox, when the aim is prediction
I Most straightforward approach is landmark analysis:
move time origin to a (landmark) time point after the original time origin
exclude all subjects who had an event before the new time origin
update time-varying predictor variables to the value at or last value before the new time origin
compute changes and other parameters of repeated measurements of the time-varying predictor variables until the new time origin
proceed as before: fit models, extract predicted risks, evaluate performance
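The landmark steps above can be sketched in base R. All data, variable names and the landmark time of 1 year are invented for illustration; this is a minimal sketch of the data preparation, not the procedure used in any particular study.

```r
# Invented toy data: one row per subject (time-to-event) ...
subjects <- data.frame(id    = c(1, 2, 3, 4),
                       time  = c(2.0, 5.5, 0.8, 7.0),  # years since the original time origin
                       event = c(1, 0, 1, 1))
# ... and repeated measurements of a time-varying marker
visits <- data.frame(id         = c(1, 1, 2, 2, 2, 3, 4, 4),
                     visit_time = c(0, 1.5, 0, 0.9, 3.0, 0, 0, 0.5),
                     marker     = c(10, 12, 8, 9, 11, 15, 7, 6))

landmark <- 1  # ASSUMED landmark: new time origin at 1 year

# Exclude subjects whose follow-up ended (event or censoring) before the landmark
lm_data <- subjects[subjects$time > landmark, ]

# Move the time origin to the landmark
lm_data$time <- lm_data$time - landmark

# Update the marker to the last value measured at or before the landmark
past <- visits[visits$visit_time <= landmark, ]
past <- past[order(past$id, past$visit_time), ]
last_value <- past[!duplicated(past$id, fromLast = TRUE), c("id", "marker")]
lm_data <- merge(lm_data, last_value, by = "id")

# Summarise the history up to the landmark, here as change since the first measurement
first_value <- past[!duplicated(past$id), c("id", "marker")]
names(first_value)[2] <- "marker_first"
lm_data <- merge(lm_data, first_value, by = "id")
lm_data$marker_change <- lm_data$marker - lm_data$marker_first

lm_data
```

Measurements taken after the landmark (for example the visit at 1.5 years of subject 1) are deliberately ignored, so that only information known at the new time zero enters the prediction model.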
Take home messages
I Risk prediction models can be derived from conventional
regression models
I All subjects need to be followed from a common time origin
I Predicted risks are associated with a given time-horizon
I Only predictor variables that are known at time zero can
be used to predict
I Censored data and competing risks need to be accounted
for by choosing appropriate methods