• No results found

The Analysis of Censored Covariates in Observational Studies

N/A
N/A
Protected

Academic year: 2020

Share "The Analysis of Censored Covariates in Observational Studies"

Copied!
109
0
0

Loading.... (view fulltext now)

Full text

(1)

JOHNSON, BRENT ALAN. The Analysis of Censored Covariates in Observational Studies. (Under the direction of Anastasios A. Tsiatis.)

(2)

by

Brent Alan Johnson

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

STATISTICS

in the

GRADUATE SCHOOL at

NC STATE UNIVERSITY 2003

Anastasios A. Tsiatis Dennis Boos

Chair of Advisory Committee

(3)

Dedication

(4)

Biography

(5)

Contents

List of Tables vi

List of Figures viii

1 Introduction 1

2 Confounding 5

2.1 What is Confounding? . . . 6

2.2 Adjustment through the Propensity Score . . . 9

3 Estimating Mean Response 14 3.1 Assumptions and Methods . . . 14

3.2 Large Sample Properties . . . 20

3.3 Analysis of the ESPRIT Infusion Trial . . . 22

3.4 Simulation Results . . . 26

4 Estimating Duration-Response Relationships 31 4.1 Methods and Assumptions . . . 31

4.2 The Estimator and Its Properties . . . 33

4.2.1 Motivation . . . 33

4.3 Parametric Inference . . . 38

4.4 Semiparametric Inference . . . 41

4.4.1 Estimation through Weighted Least Squares . . . 44

4.5 Numerical Studies . . . 47

(6)

5 Discussion 53

Bibliography 55

A Proof Theorem 1 59

B Asymptotic Properties of µˆjn 64

B.1 M-estimation Method . . . 65

B.2 Derivation of the Influence Function . . . 68

C Proof of Theorem 2 71 C.1 Preliminaries . . . 71

C.2 Main Proof . . . 72

D Asymptotic Properties of βˆn 82 D.1 Consistency . . . 83

D.2 Asymptotic Variance Calcuations . . . 86

D.2.1 Time-Independent Confounders . . . 86

(7)

List of Tables

3.1 Summary statistics for the ESPRIT trial . . . 23 3.2 Estimated event rates for the ESPRIT trial. T1j denotes the overall

event rate for all patients belonging to interval Ij,T2j denotes the un-censored event rate for patients belonging to Ij, ˆµ(0)j is the proposed estimator assuming no confounding is present, ˆµ(1)j is the proposed es-timator assuming confounding is present through baseline factors only, and ˆµ(2)j is the proposed estimator assuming time-dependent confound-ing. Standard errors are presented in parentheses. . . 26 3.3 Simulation summary of mean response for 1000 Monte Carlo datasets

when treatment-duration data are discrete. T1j is the average Y for Ui Ij and T2j is the average Y for (Ui Ij,i = 1). ˆµj,12 is our estimator assuming confounding is present throughZ1 andZ2. ECP is defined the empirical coverage probability. Estimated standard errors are given in parentheses. . . 28 3.4 Simulation summary of mean response for 1000 Monte Carlo datasets

when the no unmeasured confounder assumption is violated. ˆµjn as-sumes confounding is present only through Z1. Empirical coverage probabilities (ECP) are given in parentheses. . . 30 4.1 Numerical study for Bernoulli response using 1000 Monte Carlo datasets.

ˆ

(8)

4.2 Numerical study for Bernoulli response using 1000 Monte Carlo datasets. ˆ

βJT is our estimator, ˆβLS is MLE from logistic regression using all the subject, while ˆβU C is the MLE using just those uncensored subjects. Empirical coverage probabilities are given in parentheses. Sample size is 250. . . 50 4.3 Duration-response summary for ESPRIT trial data. Observed duration

(9)

List of Figures

(10)

Chapter 1

Introduction

(11)

reflow, or coronary thrombosis immediately discontinue the infusion process to re-ceive appropriate medical attention; we define these protocol-defined adverse events as infusion-terminating events, or more generally as treatment-terminating events.

The main study analysis suggested that the eptifibatide regimen in the study is superior to placebo. The event proportion for the composite endpoint was 10.5% for placebo versus 6.1% for eptifibatide (p=0.0034). A natural follow-up question posed by the investigators involved the length of infusion of eptifibatide that should be rec-ommended for future patients. Because infusion cannot continue after a treatment-terminating event, a recommendation to infuse fort units of time necessarily implies that treatment would be discontinued after drug was administered fort units of time or when a treatment-terminating event occurs, whichever comes first. That is, the recommended infusion length t may be censored by a treatment-terminating event. Thus, more precisely, the investigators were interested in comparing the event propor-tions for different infusion length “policies,” where a “policy” dictates a particular infusion length t that may be possibly censored by events that require treatment termination.

(12)

patients were assigned at random to receive different treatment duration policies at the beginning of the study, estimation of this parameter for a particular policy would be straightforward following an intention-to-treat principle; the average of all responses for patients assigned to the policy would be an unbiased estimator for this mean.

Data from a study that did not randomize patients to different treatment du-ration policies, such as the data from patients in the ESPRIT trial who received eptifibatide, includes the length of infusion for each patient and whether the infusion was terminated because of a protocol-defined adverse event or because of physician discretion. One difficulty in estimating the mean response for different treatment duration policies using the observed data is that, unlike a well-designed randomized treatment duration study, when a patient has their treatment terminated because of an adverse event then we do not know what treatment duration policy was intended for that patient. Also, as in most observational studies, because infusion length was left to the discretion of the physician rather than dictated by design (randomization), patients receiving different durations may not be prognostically similar. It is intu-itively apparent that simply averaging responses among individuals observed to have completed infusion of length t, either including or excluding patients experiencing a treatment-terminating event prior tot, would yield a biased estimator of the desired policy mean.

(13)

a functional relationship of this mean response continuously in time from a sample of observed data such as those in the ESPRIT infusion trial and derive a consistent estimator. Although there are currently no consistent estimators available for esti-mating mean response under these circumstances, numerous authors have developed estimators for estimating causal parameters in observational studies (e.g., Robins et al., 1994; Robins et al., 2000; Hernan et al., 2000; Hernan et al., 2001; Satten et al., 2001). Our work differs from these methods in that our definition of treatment includes the possibility that treatment may be censored or terminated. There has also been some work in the area of censored treatments or censored covariates (e.g., Clayton, 1978; Akritas et al., 1995); where “covariate” refers to infusion length in our problem. The key difference between these methods and that proposed here is the dependence of the outcome on the covariate censoring process. In our problem, the covariate censoring process is expected to be related to the outcome.

(14)

Chapter 2

Confounding

(15)

2.1

What is Confounding?

Confounders are commonly defined as those variables which affect an outcome or endpoint of interest as well as another factor that is itself associated with the out-come of interest. In this dissertation, that “other factor” will always be an exposure or treatment of scientific interest. Let the random vector (Y, D, X) represent the outcome, exposure, and confounding random variable, respectively, defined on the probability space (Ω,A, P). Then, confoundingdescribes the phenonmenon whereX modifies the relationship between Y and D — that is, the relationship between the exposure and outcome changes with different levels (or values) of the confounding factor.

Here, we will assume that primary scientific interest lies in marginal associations between Y and D across levels of X rather than the conditional associations of Y and Dwithin levels of X. Marginal versus conditional inference in regression models has become the focus of intense debate in recent years and we do not intend to continue that discussion here. Moreover, the aforementioned debate tends to focus on what we may call “associational” models rather than “causal” models, whereas this dissertation will focus on and use a particular causal methodology. To give us some concrete ideas about confounding and their role in real-world applications, let us consider the following examples:

(16)

non-exposure to an occupational hazard — chemical 1,3-butadiene — and X denote smok-ing (adapted from Zhao, Vodicka, Sram, and Hemminki, 2000). Initially, we will

assume X only realizes values zero and one for non-smoker and smoker, respectively.

Although the primary goal of this study was to determine the relation between the exposure D and an increase in S, our goal will be to find an association between the exposure D and the incidence of cancer Y over a 30-year period. Because this study was not randomized, it is unreasonable to assume that those workers exposed to D are prognostically similar to those not exposed toD. For example, more than half of those workers exposed toDwere smokers while less than half of the controls (workers not exposed toD) smoked. Since 1,3-butadiene is contained in cigarette smoke, naive statistical analyses may be unable to decide whether an increase in the incidence of cancer over a 30-year period is due to exposure to the occupational hazard or due to the smoking (See Rosenbaum, 2002)

Example 2.2 (Dose-Response Study) Let Y denote a clinical endpoint, D is an experimental treatment, and X is age. Here, we will not yet specify whether Y is continuous or discrete. Initially, we will assume D is a simple binary random vari-able — such as treatment versus placebo or high dose versus low dose — and X is summarized inK levels. Later, we will want to consider continuous random variables for both D and X.

(17)

analysis can sometimes be complicated when subjects receiving different treatments are also different prognostically. Suppose the investigator decides which subjects receive which treatment. An obvious selection bias results and may further bias the conclusions of the clinical study. In a biomedical setting, high doses of a certain drug may be slightly toxic, cause discomfort or minor “adverse events,” and therefore more difficult for older than younger patients to tolerate. As a result, attending physicians may be more likely to “assign” older patients a lower dose and strong, younger patients the higher dose. Then, the association between outcome and treatment dose will be confounded by age.

Example 2.3 (Simulation) Generate a random variable X according the probabil-ity law FX. Next generate a random variable Daccording to the probability law FD|X. Finally, generate a random variable Y according the the probability law FY|D,X.

(18)

2.2

Adjustment through the Propensity Score

LetXbe a random vector of potential confounders andDbe our binary treatment random variable. A propensity score is defined as the probability of treatment or selection probability, e.g. π(X) =P r(D= 1|X). There are basically two approaches to correct for the bias due to confounding using propensity scores which have received much attention in the statistical literature over the past twenty years. The first attempts to control the bias by stratifying on “levels” of the propensity score while the second inversely weights estimating equations by the propensity score.

Before we proceed with our discussion of propensities and related methods, we must first discuss a fundamental difference between the previous methods and the one about to be discussed. In simple linear regression (and all the methods discussed earlier), we often write the treatment effect asE(Y|D= 1)−E(Y|D= 0) =β, where D is a binary treatment random variable and E(Y|D) =α+βD. Propensity score methods will define new random variables Y0 and Y1 (see below) and define a new treatment effect of interest,E(Y1)−E(Y0) =β∗. In general, β 6=β∗. Occassionally, β is referred to as the associational treatment effect while β∗ is referred to as the causal treatment effect. Therefore, when we mention treatment effect from now on, we mean it in the “causal” sense, i.e. β∗.

Definition 2.1 (Counterfactual Response) A potential outcome (or

counterfac-tual response) Yd is the outcome one observes if the experimental unit receives treat-ment D=d. In the case of a binary treatmentD, the potential outcomes are Y1 and

(19)

Assumption A1 (Stable Unit Treatment Value Assumption) Asserts the

ob-served outcome Y is exactly equal to the potential outcome Yd if, in fact, the unit coincidentally receives treatment d, i.e.,

Y =Y1∗D+Y0(1−D),

for a binary random variable D.

Assumption A2 (Strongly Ignorable Treatment Assignment)

P r(D=d|X, Y0∗, Y1) =P r(D=d|X).

Also known as the “no unmeasured confounders assumption.”

Balancing Scores. DefineXas the p-dimensional random vector,X(ω) : Ω <p. Then, define a balancing score as the many-to-one function b(X) :<p → < such

that

P r{X∈A|D=d, b(X)}=P r{X∈A|D=d0, b(X)}

for A ∈ A and d and d0 are two different levels of D. It is straightforward to show that the propensity scoreπ(X) is the coarsest balancing score (Rosenbaum and Rubin, 1983a). Because the propensity score is a balancing score, then we have the desirable result that FX|D=d,π(X) =FX|D=d0(X). Using assumptions A1 and A2, one can show

that within levels of the propensity score, we may derive unbiased estimates of the “causal” treatment effect.

(20)

within strata, then combining the strata-specific measures using a weighted average (or weighted sum) to produre a “bias-free” measure of the treatment effect (when combining measures across strata make sense). The method works well when the stratified statistics all move in the same direction and combining the statistics into one overall measure makes sense.

Inverse Probability of Treatment Weighted (IPTW) Estimators. As we said earlier, our interest lies in estimating the marginal effect of D on Y. The second approach, by Robins and Rotnizky (1995), uses score equations and ideas from missing data to control for the selection bias due to X. Like the previous method, this one intends to draw inference on the “causal” treatment effect defined through the potential outcomes Yd∗, d = 0,1. If we define E(Yd) =µd, d= 0,1, then we are primarily interested in the difference ˆµ1−µˆ0. One can show the following function of observable random variables is ubiased for µd under assumptions A1 and A2:

Y ·I(D=d) πd(X) ,

where πd(X) =P r(D =d|X). Unbiasedness follows from the following argument:

E

½Y ·I(D=d) πd(X)

¾

= E

½Y

d ·I(D=d)

πd(X) ¾

= E

" E

½Y

d ·I(D=d)

πd(X) ¯¯ ¯¯Yd∗,X

¾#

= E

" Yd πd(X)E

½

I(D=d)¯¯¯¯Yd∗,X

¾#

= E

½ Y

d

πd(X)P(D=d|X) ¾

since FD|Y

(21)

where FD|Y

d,X and FD|X are denoting the conditional probability distribution func-tions. There are weaker versions of assumption A2, but this argument gives us the ideas behind the traditional argument used in constructing unbiased estimators for E(Yd), and more generally, for drawing marginal inference on the effect of D on Y in the presence of confounding.

This leads to the following estimating equation for our estimator:

n

X

i=1

(Yi−µd)I(Di =d) πd(Xi) = 0, which may also be written

ˆ

µd=X

i

Yiwi P

iwi

, where wi = I(Di =d) πd(Xi) ,

for πd(Xi) = P r(Di =d|Xi) d= 0,1. Notice that this estimator is simply the usual estimator for mean outcome at leveld inversely weighted by πd(X).

From here, it is easy to define the general result. Namely, defineS(β) =Pni=1si(β) be the usual score equation one would use to measure the association betweenY and D in the absence of confounding, e.g. in a randomized design, where we are defining si(β) as s(Yi, Di, β). Assume we measure potential confounders X on each subject. Then, define the new score S†(β) with si(β) replaced with si(β)/πdi(Xi). Using the new score equations will give the appropriate marginal inference on the potential outcome Yd. S†(β) has mean zero under assumptions A1 and A2 using identical arguments to the ones above.

(22)

and proceed as usual.

(23)

Chapter 3

Estimating Mean Response

So far, we have discussed a research question and some statistical tools useful in approaching such questions. Statistical tools should not be used with impunity and all assumptions should be clearly presented and verified, if possible. Here, we make unverifiable assumptions but make every effort to address those assumptions with the research investigators and, to the extent possible, through the observed data. Now, let us construct an approach to estimate the mean response as a function of infusion length discussed in Chapter 1 by extending the ideas and tools presented in Chapter 2.

3.1

Assumptions and Methods

(24)

order to conceptualize properly the question of interest, it is useful to introduce the idea of potential (counterfactual) random variables (Rubin, 1974; Rosenbaum and Rubin, 1983). Specifically, we define the potential random variable C to represent the time at which a randomly selected individual from our population, if continuously treated, would have a treatment-terminating event. We also define the potential ran-dom process (Y?

t , t≤C), whereYt?denotes the response of the individual, if treatment

were terminated at time t, for t C. In terms of these potential random variables, the policy of treating an individual fortunits of time or until a treatment-terminating event results in the responseYtC, wheret∧Cdenotes the minimum oftand C. This can also be written as

YtC =Yt∗I(C ≥t) +YC∗I(C < t), (3.1) and the parameter of interest is the population mean response for this treatment duration policy, namely, E(YtC).

The variables defined above are referred to as potential random variables, or coun-terfactuals, because, contrary to fact, they may not actually be observed. In contrast, for a randomly selected individual from our population, the observable random vari-ables are (Y, U,∆), where Y denotes the observed response, U denotes the actual treatment duration and ∆ is an indicator variable such that ∆ = 0 if treatment du-ration was stopped due to a treatment-terminating event and ∆ = 1 if stopped due to physician discretion.

(25)

finite values, t1, . . . , tk. In practice, this may necessitate grouping the treatment du-ration data and will be discussed later. However, when treatment is stopped because of an intervening event (∆ = 0) U can take values along a continuum of time. Under these conditions, the specific parameters on which we focus are the population mean responses for the k treatment duration policies, µj =E(Yt

j), j = 1, . . . , k.

A key assumption is that the observed response Y may be written in terms of the potential outcomes as

Y =   

k

X

j=1

Yt

jI(U =tj,∆ = 1)  

+YC∗I(∆ = 0). (3.2)

In words, (3.2) says if a patient experiences a treatment-terminating event, then the observed response is YC; otherwise, the observed response isYt

j corresponding to the realized value of actual treatment duration U.

Along with the data (Y, U,∆), we assume that additional, possibly time-dependent covariate information is also available. Let Z(u) denote the value of a vector of co-variates for an individual at timeu, whereuis measured as time from the individual’s entry into the study. We define ZH(x) to be the history of covariate information up

to and including time x, i.e., {Z(u), u x}, and use ZH

j to denote ZH(tj). We

de-fine the jth discrete cause-specific hazard function to be the conditional probability λj(ZH

j ) = P(U = tj,∆ = 1|U tj, ZjH), j = 1, . . . , k. The key assumption that

allows us to estimate consistently the parameter µj =E(Ytj) from a random sample of observed data is given by

(26)

In words, assumption (3.3) implies that the decision to terminate or continue a pa-tient’s treatment at timetj — given the patient has continuously received treatment up to and including time tj without a treatment-censoring event, and given the pa-tient’s covariate history up to and including time tj — does not depend on future prognosis. Such an assumption is plausible if all of the information about an individ-ual through time tj, which may be prognostic and which an investigator may use to make decisions on treatment duration, are captured in the data ZH

j . In the

epidemi-ological literature, assumption (3.3) is referred to as “no unmeasured confounders.” As we now demonstrate, assumption (3.3) allows us to identify µj from the dis-tribution of observable random variables. This then leads naturally to a consis-tent estimator for µj. The proposed method uses inverse probability weighting to adjust for potential confounders and censoring across treatment duration policies, where the weights are defined through the observable discrete cause-specific hazards, λj( ¯Zj), j = 1, . . . , k. For reasons discussed below, it is convenient to define the following functions of observables:

fj(ZjH) = λj(ZjH)

jY1

m=1

{1−λm(ZmH)} (3.4)

and

Kj(ZjH) =

j

Y

m=1

{1−λm(ZmH)}. (3.5) We also define [U] = maxj{j :tj < U}; therefore ZH

[U] refers to all covariate

informa-tion up to and including time tj just prior to U and K[U](ZH

[U]) will be the product

(27)

Theorem 1 Assuming the probabilities fj(ZH

j )andK[U](Z[HU])are known, the

follow-ing function of observables is unbiased for µj =E(Yt

j) under assumption (3.3): Y ×

  

I(U =tj,∆ = 1) fj(ZH

j )

+ I(U < tj,∆ = 0) K[U](ZH

[U])

 

 (3.6)

A proof of Theorem 1 is given in the Appendix.

Using an argument identical to the one in the proof of Theorem 1, we can show that

E

(Y µj)   

I(U =tj,∆ = 1) fj(ZH

j )

+I(U < tj,∆ = 0) K[U](ZH

[U])

   

= 0. (3.7)

Thus for a sample of data {Yi, Ui,i, ZH(U

i), i= 1, . . . , n}, a natural estimator for

µj would be obtained by solving the simple estimating equation

n

X

i=1

(Yi−µˆjn)   

I(Ui =tj,i = 1) fj(ZH

ij)

+I(Ui < tj,i = 0) K[Ui](ZH

[Ui])  

= 0, (3.8) whereZH

ij refers to the covariate information up to and including time tj for the i-th

individual. This yields the estimator ˆ

µjn= Pn

i=1Yiwij

Pn

i=1wij

, wij = I(Ui =tj,i = 1) fj(ZH

ij)

+I(Ui < tj,i = 0) K[Ui](ZH

[Ui]) ,

which is a weighted average of the responses. Remark: If we view the probability fj(ZH

ij) as the propensity score (Rosenbaum and

Rubin, 1983) for thei-th individual to have treatment terminated at time tj, then, if there were no treatment-terminating events, the estimator would equalPYiwij/Pwij, where wij =I(Ui =tj)/fj(ZH

ij). This is a weighted average of the responses for

(28)

S¨arndal, and Wretman (1983) and Rosenbaum (1987) to adjust for confounding of treatment with baseline time-independent covariates. With censoring, the weighted average also includes responses from individuals who have a treatment-censoring event at timeC < tj. However, their contribution is weighted by

1 K[Ui](ZH

[Ui])

= 1

fj(ZH ij)

fj(ZH ij)

K[Ui](ZH

[Ui]) .

Intuitively, this weight can be viewed as the inverse propensity score multiplied by the conditional probability that the individual would have treatment stopped at time tj given that treatment duration was known to be greater than C. Heuristically, this shows how the response of an individual who has treatment censored at time C contributes “to the right” (Blight, 1970; Turnbull, 1974; Turnbull, 1976) in estimating µj for all {j : tj > C}.

Because (3.8) contains the unknown probabilities fj(ZjH) and K[U](Z[HU]), which are functions of the unknown parameters λj(ZjH), j = 1, . . . , k, these probabilities must be estimated from the data. This requires that we posit a model for the discrete hazards λj(ZjH, γ), j = 1, . . . , k1, as a function of a finite dimensional parameter vector γ. Assuming (3.3) holds, it is straightforward to derive the observed-data likelihood of Di ={Ui,i, ZH(Ui)}, i= 1, . . . , n, as Qni=1L(γ;Di), where

L(γ;Di) =

kY1

j=1

(

λij(γ) 1−λij(γ)

)I(Ui=tj,i=1)

{1−λij(γ)}I(Ui≥tj), (3.9)

andλij(γ) = λj(ZH

ij, γ). The estimator forγis obtained by maximizing this likelihood.

Of course, the exact form of (3.9) depends on how we model λj(ZH

j ) as a function of

(29)

Generalized linear models are often used for their interpretability and general applicability. We consider one such class of generalized linear models, where

λj(ZjH) =Fj+βjtZjH),

j = 1, . . . , k1, and F(t) is the logistic function, i.e. et/(1 +et). This is similar to

the continuation ratio logit model (Agresti, 1990, p.319), in which case (3.9) becomes

kY1

j=1

Y

i∈Rj

exp{j +βt

jZijH)I(Ui =tj,i = 1)}

1 + exp(αj +βt jZijH)

,

where Rj = {i : Ui tj}. This likelihood is similar to Cox’s partial likelihood for continuous time proportional hazards models.

The parametrization above allows for different parameters associated with each time intervaltj. For parsimony, when applicable, we may assume that some of these parameters are the same across the different time intervals.

To summarize, the proposed estimator for µj, j = 1, . . . , k is given by

ˆ µjn=

Pn

i=1Yiwˆij

Pn

i=1wˆij

, wˆij = I(Ui =tj,i = 1) fj(ZH

ij,ˆγ)

+I(Ui < tj,i = 0) K[Ui](ZH

[Ui],ˆγ) ,

where ˆγ is the maximum likelihood estimator which maximizes (3.9).

3.2

Large Sample Properties

(30)

As mentioned previously, this is an approximation to the truth when ∆ = 1 and U is, in fact, continuous and the data are grouped into k categories by partitioning treatment duration into intervals, with tj representing the midpoint of thejth inter-val. A rigorous approach to this problem would assume that E{YtC} is a smooth function of t and that the number of intervals k increases as a function of n. Such a technical development is beyond the scope of the paper and, we believe, would not add additional practical insight into the problem. We make additional comments on the effect of partitioning in the Discussion.

Because the proposed estimator ˆµjnuses the estimated discrete hazardλj(ZH ij,γ),ˆ

it is convenient to define the estimator as the first element in the solution to the system of equations

n

X

i=1

   

ψµj(Yi, Diˆjnˆn) ψγ(Diˆn)

  

= 0, (3.10)

where

ψµj(Yi, Di, µj, γ) = (Yi−µj)   

I(Ui =tj,∆ = 1) fj(ZH

ij;γ)

+I(Ui < lj,∆ = 0) K[Ui](ZH

[Ui];γ)   

ψγ(Di, γ) =

∂γ logL(γ;Di).

Note that we have characterized the proposed estimator as an M-estimator (Huber, 1964), whose asymptotic properties are well known. Hence, under suitable regularity conditions, ˆµjn can be shown to be consistent and asymptotically normal when the model for the discrete hazards (3.9) is correctly specified.

(31)

Stefanski, 1995, sec. A.3.6) and is given by

n−1

n

X

i=1

(Yi−µˆjn)2   

I(Ui =tj,∆ = 1) fj2(ZH

ij; ˆγn)

+I(Ui < tj,∆ = 0) K[2U

i](Z H

[Ui]; ˆγn)   −Hˆi

n ˆ

E(SγSγt)o1Hˆit  ,

where ˆE(SγSt

γ) is a consistent estimator of the Fisher information for γ, and

ˆ

Hi = (Yi−µˆjn)     

I(Ui =tj,∆ = 1)∂fj(ZijH) ∂γ

fj2(ZH ij; ˆγn)

+ I(Ui < tj,∆ = 0)

∂K[Ui](ZH[Ui])

∂γ

K[2U i](Z

H

[Ui]; ˆγn)

    .

The asymptotic results above pertain to the marginal distribution of ˆµjn for a fixedj. It would be straightforward to consider the system of estimating equations

n X i=1                    

ψµ1(Yi, Diˆ1n,ˆγn) ·

· ·

ψµk(Yi, Diˆknˆn) ψγ(Diˆn)

                   

= 0, (3.11)

simultaneously in order to derive the joint asymptotic normal distribution of (ˆµ1n, . . . ,µˆkn). This may be useful, for example, if formal tests of contrasts of mean response for the different treatment duration policies are desired.

3.3

Analysis of the ESPRIT Infusion Trial

(32)

endpoint of death, MI, or revascularization within 30 days of the initiation of treat-ment. The data are discretized by taking tj to be the midpoint of five intervals Ij, namely Ij ={(tj1+tj)/2,(tj+tj+1)/2}for t= (t1, t2, t3, t4, t5) = (16,18,20,22,24), and we redefine Ui =tj for any patient where Ui Ij and ∆i = 1. There are seven patients who completed infusion before 15 hours and are assigned to the first group (Ui = 16) and seven patients who completed infusion after 25 hours and are assigned to the last group (Ui = 24). The inclusion or exclusion of these 14 observations does not appreciably change the results. The frequency for the number of patients who completed infusion at tj is 61, 479, 194, 85, and 111 for j = 1, . . . ,5. The proposed method does not require that we discretize the values of C for patients experiencing an infusion terminating event; a summary of these individuals is as follows: 89 cen-sored before 16 hours, 11 between 16 to 18 hours and six patients between 18 to 20 hours.

Table 3.1: Summary statistics for the ESPRIT trial

Event Within 30 Days

No (N=77) Yes (N=959)

Diabetes 0.17 0.20

PTCA 0.34 0.22

Angina 0.31 0.40

Heparin 0.18 0.13

Weight 83.99 85.17

(33)

treat-ment duration policy. As such, the ESPRIT infusion trial investigators were asked to identify factors that they believed would influence treatment duration. The inves-tigators confirmed that treatment would be discontinued with certainty if a patient experienced a protocol-defined adverse event, but otherwise, could not identify any variables that they believed would affect the decision of the participating physicians in any systematic way to terminate treatment. Thus the investigators believed that that there were no measured or unmeasured confounders. Although it is impossible to confirm whether there are no unmeasured confounders in any dataset, we investigate the sensitivity of our analysis results under different assumptions about measured and unmeasured confounders.

(34)

demonstrate that confounding indeed plays a minor role.

Naive estimators for µj commonly employed in practice include an overall event proportion for all patients belonging to the intervalIj which we denote byT1j, and an uncensored event proportion for all patients completing infusion inIj which we denote byT2j. In Table 3.2, we present the results of the estimated event proportions for the two naive estimators and three estimators proposed in this paper that make different assumptions about the confounding relationship — ˆµ(0)jn assumes no confounding is present, ˆµ(1)jn assumes confounding is present only through baseline covariates, and ˆ

µ(2)jn assumes the possibility of time-dependent confounding; thus, this last estimator includes the two time-dependent covariates related to nadir platelet count in addition to the baseline covariates. We immediately note that for the first interval T11 is much larger than T21. This can be explained by the higher event proportion among censored patients compared to uncensored patients, i.e. 0.189 compared to 0.061, and by the fact that 90 patients were censored at or before 16 hours while only 61 patients completed physician-recommended infusion at 16 hours. As might be expected under these conditions, the proposed estimator ˆµjn is greater than the naive uncensored event proportions,T2j, for all j = 1, . . . , k.

(35)

Table 3.2: Estimated event rates for the ESPRIT trial. T1j denotes the overall event rate for all patients belonging to interval Ij, T2j denotes the uncensored event rate for patients belonging to Ij, ˆµ(0)j is the proposed estimator assuming no confounding is present, ˆµ(1)j is the proposed estimator assuming confounding is present through baseline factors only, and ˆµ(2)j is the proposed estimator assuming time-dependent confounding. Standard errors are presented in parentheses.

j tj (hrs) T1j T2j µˆ(0)j µˆ(1)j µˆ(2)j 1 16 0.140 0.017 0.036(0.017) 0.037(0.016) 0.035(0.012) 2 18 0.047 0.046 0.062(0.010) 0.064(0.010) 0.064(0.010) 3 20 0.068 0.068 0.082(0.017) 0.081(0.017) 0.081(0.017) 4 22 0.050 0.050 0.065(0.022) 0.064(0.022) 0.063(0.022) 5 24 0.115 0.115 0.124(0.028) 0.126(0.035) 0.132(0.037)

rate and could possibly be detrimental to the patients.

3.4

Simulation Results

In this section, we investigate the small sample properties of our estimator and explore the sensitivity of our estimator to the assumption of no unmeasured con-founders (3.2). These simulations assume that patients are assigned to one of a finite number of treatment duration policies at values t = (t1, . . . , t4). Although the pro-posed method allows for time-dependent covariates, for simplicity, we only consider time-independent covariates. This is consistent with our analysis of the ESPRIT data.

(36)

a treatment-censoring random variable C as an exponential{ρ(Z)} random variable, where

ρ(Z) = 0.005·exp(ϕ1Z1),

ϕ1 = 2. The treatment duration data are simulated according to the following algorithm, which is consistent with the assumptions made: Start by letting m= 1.

1. If C < tm, then define U =C and ∆ = 0.

2. For C tm, generate a Bernoulli random variable Qm, the indicator variable for stopping treatment at time tm, with probability λm(Z) where

logitm(Z)}=αm+β1Z1.

3. If Qm = 1, then assign U = tm and ∆ = 1; if Qm = 0 and m < k, then incrementm tom+ 1 and goto Step 1.

4. IfU =tj and ∆ = 1, then generate the corresponding responseY as a Bernoulli random variable with probability π, where

logit(π) =ηj+ζ1Z1,

whereas, if U =C and ∆ = 0, then generate the corresponding response Y as a Bernoulli random variable with probabilityπ, where

(37)

Note that the parameter υ is only present when ∆ = 0. Therefore, roughly speaking, the log odds of Y = 1 are multiplied by exp(υ) when ∆ = 0 vis-a-vis ∆ = 1, thus inducing a dependence of censoring of treatment infusion on the probability of outcome. The population parameter of interestµj is difficult to evaluate analytically; thus, we approximated its value by simulation. Using the above algorithm, we forced treatment duration to be stopped at timetj, if not already censored by replacing Step 2 with “Qm = 0 for m= 1, . . . , j1 andQj = 1”, generating the outcomeY 100,000 times, and then taking the sample average.

The chosen values of the parameters are as follows: α = (1.2,0.75,0), β1 = 0.5, η = (5,4,4,2), ζ1 = 2, υ = 2. For each data set, nominal 95% Wald confidence intervals were constructed using ˆµjn, its standard error from Section 3.2, and a critical value of 1.96. Then, the empirical coverage probability (ECP) is calcu-lated as the number of MC datasets where the trueµjfalls within the Wald confidence interval divided by the total number of MC datasets.

Table 3.3: Simulation summary of mean response for 1000 Monte Carlo datasets when treatment-duration data are discrete. T1j is the average Y for Ui Ij and T2j is the average Y for (Ui ∈Ij,i = 1). ˆµj,12is our estimator assuming confounding is present through Z1 and Z2. ECP is defined the empirical coverage probability. Estimated standard errors are given in parentheses.

(38)

Table 3.3 presents simulation results for estimating mean response as a function of treatment duration policy tj, j = 1, . . . ,4 when our no unmeasured confounders assumption is true. We use a sample size ofn= 1000 and generate 1000 Monte Carlo datasets. Monte Carlo bias is less than 2% in every case and less than 1% in most cases; interval coverage is approximately the nominal level. The naive estimators,T1j and T2j, are biased and do not possess the correct coverage probability.

Next, we investigate the sensitivity of our estimator to deviations from the “no unmeasured confounders” assumption. For this, we include an additional covariateZ2 to represent a potential unmeasured confounder. To induce additional confounding, we define an additional term β2Z2 in the treatment choice model in step 2 of the algorithm and an additional term ζ2Z2 in the prognostic model in step 4 of the algorithm. We fix β1 = 0.5, ζ1 = 2 and let the other parameters, α, η, υ be the same as in the first simulation. Different values of β2 and ζ2 are considered to study the effect of the strength of confounding of the unmeasured variable Z2 on the resulting estimator.

The estimator ˆµjnis computed using only the covariateZ1. We again use a sample size ofn = 1000 and 1000 Monte Carlo datasets. The simulation results are displayed in Table 3.4.

(39)

Table 3.4: Simulation summary of mean response for 1000 Monte Carlo datasets when the no unmeasured confounder assumption is violated. µˆjn assumes confounding is present only through Z1. Empirical coverage probabilities (ECP) are given in parentheses.

ζ2 =1 ζ2 =0.5 tj β2 µj µˆjn(ECP) µj µˆjn(ECP)

-0.5 0.094(0.961) 0.087(0.970)

t1 -0.25 0.090 0.092(0.974) 0.084 0.085(0.961)

0 0.090(0.958) 0.085(0.955)

-0.5 0.126(0.957) 0.118(0.961)

t2 -0.25 0.125 0.126(0.966) 0.117 0.117(0.963)

0 0.126(0.955) 0.117(0.958)

-0.5 0.130(0.907) 0.126(0.942)

t3 -0.25 0.134 0.133(0.924) 0.130 0.127(0.949)

0 0.137(0.973) 0.129(0.954)

-0.5 0.152(0.767) 0.152(0.877)

t4 -0.25 0.172 0.161(0.871) 0.166 0.157(0.917)

(40)

Chapter 4

Estimating Duration-Response

Relationships

We now explore methods to estimate the mean reresponseE(YtC) as a continuous function in t. In particular, we will write

E(YtC) = µ(t, β) (4.1)

through thep-dimensional parameter vectorβ. Our goal is to derive a consistent and asymptotically normal estimator for β.

4.1

Methods and Assumptions

(41)

may not actually be observed. In contrast, for a randomly selected individual from our population, the observable random variables are given by {Y, U,}, where Y denotes the observed response, U denotes the actual treatment duration realizing values along a continuum of time, and ∆ is an indicator variable such that ∆ = 1 when treatment duration was stopped by choice and ∆ = 0 when treatment duration was stopped due to treatment terminating events. For the methods used below, it will be necessary to define the observable stochastic process N(t) =I(U ≤t,∆ = 1). Now, we assume the observed responseY may be written as the following function of potential outcomes Yt∗, t ≤C:

Y = Z

Yu∗dN(u) +YC∗I(∆ = 0). (4.2) Along with the data (Y, U,∆), we assume that additional, possibly time-dependent covariate information is also available. Let Z(u) denote the value of a vector of co-variates for an individual at timeu, whereuis measured as time from the individual’s entry into the study. We define ZH(t) to be the history of covariate information up to and including time t, i.e.,

ZH(t) = {Z(u), u≤t}.

Define the observable cause-specific hazard function to be the conditional probability λ{t, ZH(t)}= lim

h→0h

1P{U (t, t+h],∆ = 1|U t, ZH(t)}.

A key assumption that allows us to estimate consistently the parameter β from a random sample of observed data is given by

(42)

lim

h→0h

1P nU (t, t+dt],∆ = 1|U t, ZH(t), C, Y

x, t≤x≤C

o

. (4.3)

In words, assumption (4.3) implies that the decision to terminate or continue a pa-tient’s treatment at timet— given the patient has continuously received treatment up to and including time t without a treatment-censoring event, and given the patient’s covariate history up to and including time t — does not depend on future prognosis. Such an assumption is plausible if all of the information about an individual through time t, which may be prognostic and which an investigator may use to make deci-sions on treatment duration, are captured in the data ZH(t). In the epidemiological

literature, assumption (4.3) is referred to as “no unmeasured confounders.”

4.2

The Estimator and Its Properties

4.2.1

Motivation

We now want to address how we might estimate ap-dimensional parameter vector β from the observed data{Yi, Ui,i, ZiH(Ui), i= 1, . . . , n}. The solution we propose is motivated by an (intent-to-treat) analysis from a randomized design, that is, a study design and analysis where treatment confounding and censoring play no significant role.

(43)

the L2-normkY−µ(U, β)k. Then, ˆβn solves

n

X

i=1

ψ∗(Yi, Ui, β) = 0, (4.4)

where

ψ∗(Yi, t, β) =µβ(t, β){Yi−µ(t, β)}

and µβ(Ui, β) is the p-dimensional gradient of µ(Ui, β) with respect to β. Other estimating functions or minimization criteria for β will be considered later, namely,

ψ†(Yi, t, β) = µβ(t, β)V ar1(Yi){Yi−µ(t, β)},

where V ar(Yi) is the variance of Yi. The functionψ†(Yi, t, β) corresponds roughly to a scaled version of the score equations for the exponential family of distributions.

The estimator in (4.4) is known to be sensitive to outlying duration values in t and one simple modification would be to restrict the treatment duration data to a certain range, say [0, tu]. Now, this restricted version of (4.4) may also be written as the sum of stochastic integrals for counting processes (Fleming and Harrington, 1991),

n

X

i=1

Z tu

0 ψ

(Y

i, t, β)dNiuc(t) = 0, (4.5)

whereNuc

i (t) =I(Ui ≤t). This representation will be useful in our development that

follows.

(44)

observable random variables, define the cumulative cause-specific hazard function Λ{t, ZH(t)}=Rt

0 λ{u, ZH(u)}du and

K{t, ZH(t)}= exp µ

Λ{t, ZH(t)}

.

Then Theorem 1 from Chapter 3 suggests that the following function of observable random variables has mean zero for fixed t≤tu,

ψ∗(Y, t, β) (

I(U (t, t+dt],∆ = 1) f{t, ZH(t)}dt +

I(U < t,∆ = 0) K{U, ZH(U)}

)

, (4.6)

where

f{t, ZH(t)}=−d

dtK{t, Z

H(t)}.

Thus, putting (4.5) and (4.6) together, we have the following theorem.

Theorem 2 Let h(t) be any non-negative function in t and

H(t) = Z t

0 h(u)du.

Defineµβ(t, β) = ³∂/∂β´µ(t, β)andg{t, ZH(t)}=f{t, ZH(t)}/h(t). We require that

µβ(t, β) and g{t, ZH(t)} be continuous functions in t, and Y

t∧C be a left-continuous

process in t, t C, with finitely many discontinuites. We also require that the ratio g{t, ZH(t)} and K{U, ZH(U)} be bounded away from zero for all t L, for

tu < L < ∞. Finally, assume that when ∆ = 1, the probability that U and C occur in an infintesimal interval ∆t is on the order (∆t)2 (See Appendix C for precise definition). Then, the following function of observables will have mean zero at the

truth β0,

Z tu

0 ψ

(Y, t, β)

"

dN(t) f{t, ZH(t)}dt +

I(U < t,∆ = 0) K{U, ZH(U)}

#

(45)

where N(t) =I(U ≤t,∆ = 1).

We provide a heuristic argument of the proof below. Define Γ as a partition of the closed interval [0, tu], i.e.

Γ ={s0, s1, . . . , sm},

where s0 = 0, sm = tu, si is a partition point. Define norm of Γ, |Γ|, as the longest subinterval of Γ and define (4.7) as ψ(Y, D, β) where D = {U,∆, ZH(U)}

and g{t, ZH(t)}=f{t, ZH(t)}/h(t). The proof begins by rewriting ψ(Y, D, β) as

Z tu

0

ψ∗(Y, t, β)dN(t) g{t, ZH(t)} +

(1∆) K{U, ZH(U)}

Z tu

U

ψ∗(Y, t, β)h(t)dt. (4.8) We show that ψ(Y, D, β) may be written as the limit Riemann-Stieltjes sumsWΓ = Pm

i=1Wi,Γ (as|Γ| approaches zero) where

Wi,Γ =k1i)∆N(si) + 1

K{U, ZH(U)}k2(ξi)∆si,

where i}, i = 1, . . . , m are intermediate points of the i-th subinterval, ∆si = si−si1, and

∆N(si) = N(si)−N(si1) =I(si1 < U ≤si,∆ = 1) k1i) = ψ∗(Y, ξi, β)/g{ξi, ZHi)}

k2i) = ψ∗(Y, ξi, β)I(U < ξi)h(ξi)

Heuristically, we construct a new sequence of random variables WfΓ (see below) which has the property thatWΓ−WfΓ converges to zero almost surely. Namely, define

f

WΓ=Pmi=1Wfi,Γ,

f

(46)

for Wfi,(1)Γ defined by

ψ∗(Ysi, si, β)I(si−1 < U ≤si, C ≥si)h(si)∆si f†{si−, ZH(s

i−)}

and Wfi,(0)Γ given by ψ∗(YC∗, si, β)

"

I(C < s1) +

i

X

k=2

I(U > sk1, sk1 ≤C < sk) K†{sk1, ZH

k−1}

#

∆H(si). and

f†{sj, ZH(sj)} = π(sj)K†{sj1, ZH(sj1)} K†{sj, ZH(sj)} =

j

Y

k=1

{1−π(sk)}

π(sj) = P {sj1 < U ≤sj, C ≥sj|U > sj1, C ≥sj, ZH(sj1),(Yx∗, x≤C)I(C≥sj)o.

In the appendix, we show the difference WΓ −WfΓ converges to zero almost surely. Hence, WfΓ converges toψ(Y, D, β) almost surely. The random variable WfΓ also has the property thatE(WfΓ) = 0 (which is proven in detail in the appendix). Finally, we would like to argue that E(WfΓ) converges to the expectation ofψ(Y, D, β).

Here, we argue that |WfΓ| is bounded by an integrable random variable for all partitions Γ. One can show that this integrable random variable is a constant using our previously stated assumptions: (1) the continuity ofh(t), µ(t, β), µβ(t, β), (2) the left-continuity ofYtC, and (3) that g{t, ZH(t)} and K{t, ZH(t)}are bounded below

(47)

the efficiency of the resulting estimator. One suggestion is to choose H(t) such that the resulting estimator has additional desirable properties, such as minimum variance or optimal convergence rates. We will see that the choice ofH(t) may also be related to the modelling strategy for K{t, ZH(t)}. We will consider two different modelling

strategies: parametric and semiparametric.

4.3

Parametric Inference

In this section, we will explore parametric models forH(t) andK{t, ZH(t)}. First, suppose K{t, ZH(t)}may be parametrically modelled through the q-dimensional pa-rameter γ (which will imply f{t, ZH(t)} is also a function of γ). Under these condi-tions, the observed data likelihood for thei-th individual is

L(Di, γ)∝ µ

f{Ui, ZiH(Ui), γ}iµ

K{Ui, ZiH(Ui), γ}1i

.

One parametric possibility for K{t, ZH(t), γ} is

K{t, ZH(t), γ}= exp µ

−c Z t

0 e

γ0ZH(u) du

,

for some constantc. Estimation ofγ is straightforward through the following observed data likelihood for the i-th individual L(Di, γ) and associated score vector

Sγ(Di, γ) =

∂γ logL(Di, γ).

(48)

appropriate regulariy assumptions hold, the resulting estimator minus the estimand may be written

n1/2(ˆγn−γ0) = n−1/2

n

X

i=1

Vγ1Sγ(Di, γ0) +op(1),

where Vγ is the expected Fisher information. Now, we move to a discussion on H(t). When considering linear models µ(t, β) = β0+β1t, a convenient choice for H(t) in our definition of ψ(Y, D, β) is

Ga,b(t) = Z t

0

xa−1e−x/bdx

Γ(a)ba ,

and Γ(n) = R0∞un−1e−udu. We writea and b as Roman letters to stress the fact that

they are fixed and known parameters. In this case, ψ∗(Y, t, β) may be written as

ψ∗(Y, t, β) =    

1 t

  

(Y −β0−β1t).

Let us simplify ψ(Y, D, β, γ) (including the dependence of γ in our estimating equa-tion) under these conditions.

The integral of the sum is the sum of the integrals where the first part of the summand is

∆ψ(Y, U, β)h(U)

f{U, ZH(U), γ} . (4.9)

Using our new definition of H(t) = Ga,b(t) and ψ∗(Y, t, β), (4.9) is simplified as the

following:

  

1 U

  

(Y −β0−β1U) ∆U

a−1e−U/b

(49)

The second part of the summand is written as (1∆)I(U < tu)

K{U, ZH(U), γ}

Z tu

U

ψ(Y, t, β)dGa,b(t). (4.11) Again, using our definitions of ψ∗(Y, t, β) and Ga,b(t), the integral in (4.11) may be simplified. One piece of the simplification of (4.11) involves the restricted moments E{TjI(t

` T tu)}, where t` is either zero or U. For the Gamma(a, b) family of

distributions, the restricted moments are given as Z tu

t`

sjdGa,b(s) = Γ(a+j)b

j

Γ(a) {Ga+j,b(tu)−Ga+j,b(t`)}. (4.12) Of course, ifj are integers, then Γ(a+j)/Γ(a) is simply (a+j1)·(a+j2)· · ·(a+1)·a. Now, the integral in (4.11) may be written as

      

(Y −β0)Ga,b([U, tu])−β1bGa+1,b([U, tu]) (Y −β0)Ga+1,b([U, tu])(a+ 1)ab2β1Ga+2,b([U, tu])

     

, (4.13) where Ga,b([t`, tu]) =Ga,b(tu)−Ga,b(t`). Thus, we have given an explicit formula for ψ(Y, D, β, γ) through the sum of (4.10) and (4.11). From here, it is straightforward to derive thei-th data influence function for ˆβn, i.e.

n1/2( ˆβn−β0) =n−1/2

n

X

i=1

IF(Yi, Di, θ0) +op(1), where θ= (β0, γ0)0 and

IF(Yi, Di, θ0) =A−1(θ0) ·

ψ(Yi, Di, θ0)−J(θ0)Vγ1Sγ(Di, γ0) ¸

(4.14) where we define

A(θ0) = E (

∂βψ(Yi, Di, θ0) )

J(θ0) = E (

∂γψ(Yi, Di, θ0) )

(50)

Therefore, ˆβn is consistent for its estimand and n1/2( ˆβ

n −β0) is asymptotically

normal with asymptotic variance given by E{IFi0)IFi00)}. Using the proposed Ga,b(t) distribution for H(t) and the subsequent results in (4.10) and (4.11), and a parametric model for γ, we have given a complete characterization for a parametric solution to our estimation problem.

This method is attractive from a computational point of view. Namely, the inte-grals inψ(Y, D, β, γ) have a closed form and methods for estimatingγ via parametric likelihoods are well-established. The asymptotic variance of n1/2( ˆβ

n−β0) is

consis-tently estimated by the usual sample estimator, n−1Pni=1IFiθn)IFi0θn).

4.4

Semiparametric Inference

One suggestion from the previous section was to modelK{t, ZH(t)}parametrically through the hazard

λ{t, ZH(t)}=exp{γ0ZH(t)},

for some constant c. A similar approach uses a semiparametric formulation for the hazard, that is, through proportional hazards:

λ{t, ZH(t)}=λ0(t) exp{γ0ZH(t)}. (4.15)

In this case, f{t, ZH(t), γ} may be written as the product

(51)

Here,

References

Related documents

Published by the Government Accountability (GAO) Office, one of the most visible public reporting efforts of antipsychotic medication treatment among youth was the 2008

Regarding astrocytoma grade IV, the most malignant glioma type, HR increased statistically significant per year of latency for mobile and cordless phone use and wireless phone use

Following progress by this Subgroup’s Working Teams (Data Provenance and Version Control; Data Rescue and Crowd Sourcing) this subgroup will address and plan a workshop to include

Each department would select the poorest schools that are well-managed, and have Section 21 status, and begin to subsidise Reception Year places at those schools at the

The windowpane oyster family Placunidae Rafinesque, 1815 with additional description of Placuna quadrangula (Philipsson, 1788) from India.. Rocktim Ramen Das, Vijay Kumar

Originally the long legged Hungarian Hound was used for hunting big game like buffaloes and later bears, wild boars and lynxes, while the short legged Hound was used