• No results found

Visualizing and analyzing time to event data: Lifting the veil of censoring

N/A
N/A
Protected

Academic year: 2021

Share "Visualizing and analyzing time to event data: Lifting the veil of censoring"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Visualising and analysing

time-to-event data: lifting the veil of

censoring

Patrick Royston

Cancer Group, MRC Clinical

Trials Unit, London

(2)

A poet writes about censored observations:

Last night I saw upon the stair

A little man who wasn't there

He wasn't there again today

Oh, how I wish he'd go away!

From

Antigonish

(1899)

(3)

Outline

Why is censoring of time-to-event data an

issue?

Example in breast cancer

Visualisation of censored data using

model-based imputation

Multiple imputation and analysis of

survival data with missing covariate

observations

(4)

Why is censoring an issue?

You can’t picture the raw data easily

Reliance on Kaplan-Meier plots

ƒ

Exaggerates differences between groups

ƒ

Attracts attention to unreliable survival

estimates at extreme times

Data will be analysed using Cox model

ƒ

Still the almost-automatic choice – although

decent alternatives exist

Time is “forgotten about” in the Cox model

(5)

Results of Cox regression models are

usually expressed as (log) hazard ratios

ƒ

Indirect – not dealing directly with time

ƒ

Can be hard to interpret – different effect on

survival curves at high and low survival probs

ƒ

Particularly difficult for interactions – ‘ratio of

hazard ratios’

Non-proportional hazards

ƒ

Data with long-term follow-up typically have it

ƒ

Modelling and interpretation may be complex

(6)

Example: Primary

node-positive breast cancer

GBSG trial BMFT-2

686 patients, 299 events for

recurrence-free survival (RFS)

Patients assigned to hormonal therapy

(TAM) or not

Visualise the effect of TAM on RFS

Visualise interaction between TAM and ER

(estrogen receptor status)

(7)

Traditional visualisation:

Kaplan-Meier by TAM group

0.00

0.25

0.50

0.75

1.00

S(t)

0

2

4

6

8

Recurrence-free survival time, yr

hormone = No TAM

hormone = TAM

(8)

Dot plot by TAM therapy –

unhelpful with censored data

000000 1011000 01111110 11111111110 11100011110111111111 1111111111111111111 11111111010011111111111 1011110011111111111110000 00110010101111111011011 000110110010110 1010011001011010010111011100 0100101011111110101 010001000111 1011000000111011 0000101010001 101011010110100 1000111001001100000000 11010101 0001010001110000 10000000000101 00100010 01001000001000 0001100000000100 0100010000000 11000000010000 00000000 0000000001 0000001001 00000001 00000000 010 0000 100 0 0 000 11100011 1010 011011 10111111 1000111 1111111111111111 11110 0111101 10011010110000 10100111000 00101001 000101 0101100 1001001100 0101 10000 100001100 11010 0000000111 0000 000000110 0001000000000 00000000000010000 0000001 110000100 010000011 000 000000 0 00010 000 00 000 0

0

2

4

6

8

R

e

cu

rre

n

ce

-f

re

e

su

rvi

va

l t

ime

,

yr

No TAM

TAM

Hormone therapy, 1=no, 2=yes

(9)

How better to visualise

survival times?

To make progress with visualisation, aim to

impute the “missing” part of censored times

Assume a parametric distribution of survival time

Survival times are sometimes approximately

lognormally distributed (Royston 2001a)

ƒ

Can check by using modified Normal Q-Q plot

If lognormal approximation is not good, can

consider Box-Cox transformation of time

(10)

Assessing lognormality:

modified Normal Q-Q plot

Simple transformation of Kaplan-Meier survival curve

.2

.5

1

2

5

10

Recurrence-free survival time (log scale)

-3

-2

-1

0

1

(11)

Normal Q-Q plot by TAM group

.2

.5

1

2

5

10

Recurrence-free survival time (log scale)

-3

-2

-1

0

1

Normal equivalent deviates

(12)

Visualisation of censored data

using imputation

Create m (

1) copies of the data with

censored survival times imputed

Need an imputation model to reflect

ƒ

Distribution of times (e.g. lognormal)

ƒ

Effects of covariates (prognostic factors)

Creating an imputation model:

ƒ

Use

mfp

with cnreg (censored normal regrn.)

to model poss. non-linear effects of covariates

ƒ

E.g.

mfp cnreg lnt x1 x2 x3 x4a x4b x5 x6

x7 hormone, censored(c) select(1)

(13)

Creating the imputed

dataset(s)

Can use the

ice

multiple imputation

command to create the imputations

ƒ

Royston (2004, 2005a, 2005b)

Stata J

ice

varlist

using

filename

[.dta]

[if

exp

] [in

range

] [

weight

],

[m(

#

) cmd(

cmdlist

) cycles(

#

)

boot[(

varlist

)] seed(

#

) dryrun

eq(

eqlist

) passive(

passivelist

)

substitute(

sublist

) dropmissing

(14)

Interval censoring with

ice

gen ll = lnt

gen ul = cond(_d==1, lnt, ln(50))

// chose upper limit of 50 years for

RFS: can use . for

+

(generate FP transformations of

x1

,

x5

,

x6

)

ice x1_1 x2 x3 x4a x4b x5_1 x6_1 x7

hormone ll ul lnt using imputed.dta,

interval(lnt:ll ul) m(10)

(15)

How

interval()

works

Censored obs Upper limit Complete obs -4 -3 -2 -1 0 1 2 3 4

(16)

Code fragment from uvis.ado

`cmd' `yvarlist' `xvars' `wgt',

`options'

...

if "`cmd'"=="intreg" {

tempvar PhiA PhiB

gen `PhiA‘ = cond(missing(`ll'), 0,

norm((`ll'-`xb')/`rmsestar'))

gen `PhiB‘ = cond(missing(`ul'), 1,

norm((`ul'-`xb')/`rmsestar'))

replace `yimp‘ = `xb‘

+`rmsestar'*invnorm(`u'*

(`PhiB'-`PhiA')+`PhiA')

}

(17)

Uses of the

interval()

option

Impute right-, left- or interval-censored

outcomes

ƒ

Response variable in time-to-event studies

Impute when a covariate is sometimes

partly observed, sometimes complete

ƒ

Some observations recorded exactly

ƒ

Others known to be below or above a cutoff

ƒ

E.g. D-dimer in DVT, PgR/ER in breast cancer

Interval censored covariates

(18)

Breast cancer data: visualisation

of time to recurrence

0 0 0000 0 0 0 0000 10 1 101 1010 11110111111010 11011011111 111010111111111000110 11111011111111111111111111111 111110101111111011111011001111111111111 1101111101100111001110111101011111111011111010110110011111111 0100011111010100000111101000110101110011001110110010010010100 100100110111110111111001111110100000010001001000101100110110101000000 0010100000100101011001001011100011100000110000100111001101110110101 10010000101000001000001000100010100000000110100000001011011100100010001011001100010 000100101010000001100000010000000010100000000010100000010000000000000001000000000001000000000000001000100000010 00100000000110010100000000000000011000010001000000000000010001001000000000000 00001000000000010000 1 1 11 11 11111111111 1111111011 1111111110111111 1111111111111111111111111111 11111111111111111111111111110111101 111111111101111111111111111111111111101111111111 100111111111111111111111111111101 01101011011111111010111111111101101011111011 10101011001111111010111111111111101111 001011110101100110001010011110001100111011011010 00101000000010001010100101110110100110000010001000 0000000010001100001000100001000000000000000000110001010000010000100 0000000000000000000000000000000000010000001000000000 00000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000 000000000000000000000000000000000 0000000000000000000000000 000000000000000000000000 000000000000000 0000000 00000000 0

.0

5

.1

.2

.5

1

2

5

10

20

50

Recu

rr

en

ce-fr

e

e sur

v

ival tim

e

, yr

0

1

imputation number

fig_response

(19)

Visualisation: some plots using

the first imputed sample

.2 .5 1 2 5 10 20 50

Recurrence-free survival time (log scale)

No TAM TAM

Hormone therapy

.2 .5 1 2 5 10 20 50

Recurrence-free survival time (log scale)

0 10 20 30 40 50

(20)

Visualisation: treatment by

covariate interaction

.2

.5

1

2

5

10

20

50

Recurrence-free survival time (log scale)

ER neg, no TAM

ER neg, TAM

ER pos, no TAM

ER pos, TAM

(21)

Limitations

Imputed times to event are helpful for

visualisation, but less so for analysis

ƒ

Effectively, such imputations are

extrapolations into the future

ƒ

We don’t

know

the future distribution

ƒ

Estimates of means, SD’s, regression coeffs

etc. are heavily dependent on the distributional

assumptions

ƒ

Potential for bias if assumed distr’n is wrong

Imputed times may be unrealistic

(22)

Other approaches

A reasonably large literature exists

Buckley-James estimation (Buckley & James

1979)

ƒ

Estimates the mean of the censored part

ƒ

Not so good for visualisation

Wei & Tanner (1991)

ƒ

Two algorithms which give multiple imputations of the

censored part

ƒ

Relaxes the normality assumption – samples taken from

the distribution of the residuals

stpm

(Royston 2001b, Royston & Parmar 2002)

(23)

Imputation of survival data with

missing covariate observations

So far, have assumed covariates have complete data

If covariates have

missing data

, need a suitable algorithm

for multiple imputation of all missing values

ƒ

e.g. MICE (

ice

)

To reduce bias, must include the response (time-to-event)

in the imputation model

ƒ

How?

“Standard” approach is to include (censored)

log time

and

the

censoring indicator

in the imputation model

ƒ

No theoretical justification

May be better to

ƒ

Include covariates as usual

ƒ

Impute right-censored times using

ice

with

interval()

option

(24)

Analysis of survival data with

missing covariate observations

Disregard the imputed times in the MI

dataset

ƒ

Except for visualisation purposes

Use original time and censoring indicator

Can analyse the MI dataset using

ƒ

stcox

(Cox regression)

ƒ

streg

(several models available)

ƒ

stpm

(flexible parametric survival models)

(25)

Conclusions

Use of familiar graphical tools with imputed times

to event can give greater insight into censored

survival data

ƒ

Scatter plots, smoothers, etc

Treatment or prognostic effects may be

depressingly small when displayed as scatter

plots of times

ƒ

Much overlap between groups

ƒ

Weak regression relationships

Imputation of times may be helpful in multiple

imputation with missing covariate values

(26)

Some references

• Buckley J, James I (1979) Linear regression with censored data. Biometrika 66: 429-436

• Faucett CL, Schenker N, Taylor JMG (2002) Survival analysis using auxiliary variables via multiple imputation, with application to AIDS clinical trial data. Biometrics 58: 37-47.

• Ma SG (2006) Multiple augmentation with partial missing regressors. Biometrical Journal 48: 83-92 • Pan W (2000) A multiple imputation approach to Cox regression with interval-censored data.

Biometrics 56: 199-203

• Royston P (2001a) The lognormal distribution as a model for survival time in cancer, with an emphasis on prognostic factors. Statistica Neerlandica 55: 89-104

• Royston P (2001b) Flexible alternatives to the Cox model, and more. The Stata Journal 1:1-28. • Royston P, Parmar MKB (2002) Flexible proportional-hazards and proportional-odds models for

censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197

• Royston P (2004) Multiple imputation of missing values. Stata Journal 4: 227-241

• Royston P (2005a) Multiple imputation of missing values: update. Stata Journal 5: 188-201

• Royston P. (2005b) Multiple imputation of missing values: update of ice. Stata Journal 5: 527-536 • Tanner MA, Wing HW (1987) The calculation of posterior distributions by data augmentation. JASA

82: 528-540. [Cited 642 times, WoS 10.9.2006]

• Wei GCG, Tanner MA (1990) Posterior computations for censored regression data. JASA 85: 829-39 • Wei GCG, Tanner MA (1991) Application of multiple imputation to the analysis of censored

References

Related documents

Findings from this study suggests that rainwater harvesting improved soil aggregate stability at Mxheleni site only in Kwa-Zulu Natal; while it was improved across all the study

The data presented in this article are drawn from an Economic and Social Research Council/Department for International Development-funded study of the impacts of mobile phones on

2 Sections Metal Foil Single-sided Metallized Polypropylene Film Polypropylene Film Dielectric Metal Foil Single-sided Metallized Polypropylene Film 2 Sections Polypropylene

Project: “3D modeling of brain tumors using endoneurosonography and artificial neural net- works”A. Supervisor: Flavio Augusto Prieto

It was identified that silver products are an effective treatment for infected chronic wounds, based on statistically significant results regarding wound healing and infection

Due to the variable availability of solar, low power supply wind and heat power, it is convenient to store the energy into a battery pack and then charge the cell phones from that

The end result was that between 1911 and 1912, a decade after Victoria’s death, Victoria Harbour acquired its first two, purpose-designed if fairly modest, dedicated marine passenger

Up to now the DMT has been presented to and used by 89 participants of the Assessment Mission Course (AMC) and 13 participants of the Staff Management Course (SMC) - both courses