The Analysis of Censored Covariates in Observational Studies

(1)

JOHNSON, BRENT ALAN. The Analysis of Censored Covariates in Observational Studies. (Under the direction of Anastasios A. Tsiatis.)

(2)

by

Brent Alan Johnson

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

STATISTICS

in the

GRADUATE SCHOOL at

NC STATE UNIVERSITY 2003

Anastasios A. Tsiatis Dennis Boos

Chair of Advisory Committee

(3)

Dedication

(4)

Biography

(5)

Contents

List of Tables vi

List of Figures viii

1 Introduction 1

2 Confounding 5

2.1 What is Confounding? . . . 6

2.2 Adjustment through the Propensity Score . . . 9

3 Estimating Mean Response 14 3.1 Assumptions and Methods . . . 14

3.2 Large Sample Properties . . . 20

3.3 Analysis of the ESPRIT Infusion Trial . . . 22

3.4 Simulation Results . . . 26

4 Estimating Duration-Response Relationships 31 4.1 Methods and Assumptions . . . 31

4.2 The Estimator and Its Properties . . . 33

4.2.1 Motivation . . . 33

4.3 Parametric Inference . . . 38

4.4 Semiparametric Inference . . . 41

4.4.1 Estimation through Weighted Least Squares . . . 44

4.5 Numerical Studies . . . 47

(6)

5 Discussion 53

Bibliography 55

A Proof Theorem 1 59

B Asymptotic Properties of µˆ_jn 64

B.1 M-estimation Method . . . 65

B.2 Derivation of the Influence Function . . . 68

C Proof of Theorem 2 71 C.1 Preliminaries . . . 71

C.2 Main Proof . . . 72

D Asymptotic Properties of βˆ_n 82 D.1 Consistency . . . 83

D.2 Asymptotic Variance Calcuations . . . 86

D.2.1 Time-Independent Confounders . . . 86

(7)

List of Tables

3.1 Summary statistics for the ESPRIT trial . . . 23 3.2 Estimated event rates for the ESPRIT trial. T₁_j denotes the overall

event rate for all patients belonging to interval I_j,T₂_j denotes the un-censored event rate for patients belonging to I_j, ˆµ(0)_j is the proposed estimator assuming no confounding is present, ˆµ(1)_j is the proposed es-timator assuming confounding is present through baseline factors only, and ˆµ(2)_j is the proposed estimator assuming time-dependent confound-ing. Standard errors are presented in parentheses. . . 26 3.3 Simulation summary of mean response for 1000 Monte Carlo datasets

when treatment-duration data are discrete. T₁_j is the average Y for U_i ∈ I_j and T₂_j is the average Y for (U_i ∈ I_j,∆_i = 1). ˆµ_j,₁₂ is our estimator assuming confounding is present throughZ₁ andZ₂. ECP is defined the empirical coverage probability. Estimated standard errors are given in parentheses. . . 28 3.4 Simulation summary of mean response for 1000 Monte Carlo datasets

when the no unmeasured confounder assumption is violated. ˆµ_jn as-sumes confounding is present only through Z₁. Empirical coverage probabilities (ECP) are given in parentheses. . . 30 4.1 Numerical study for Bernoulli response using 1000 Monte Carlo datasets.

ˆ

(8)

4.2 Numerical study for Bernoulli response using 1000 Monte Carlo datasets. ˆ

β_JT is our estimator, ˆβ_LS is MLE from logistic regression using all the subject, while ˆβ_{U C} is the MLE using just those uncensored subjects. Empirical coverage probabilities are given in parentheses. Sample size is 250. . . 50 4.3 Duration-response summary for ESPRIT trial data. Observed duration

(9)

List of Figures

(10)

Chapter 1

Introduction

(11)

reflow, or coronary thrombosis immediately discontinue the infusion process to re-ceive appropriate medical attention; we define these protocol-defined adverse events as infusion-terminating events, or more generally as treatment-terminating events.

The main study analysis suggested that the eptifibatide regimen in the study is superior to placebo. The event proportion for the composite endpoint was 10.5% for placebo versus 6.1% for eptifibatide (p=0.0034). A natural follow-up question posed by the investigators involved the length of infusion of eptifibatide that should be rec-ommended for future patients. Because infusion cannot continue after a treatment-terminating event, a recommendation to infuse fort units of time necessarily implies that treatment would be discontinued after drug was administered fort units of time or when a treatment-terminating event occurs, whichever comes first. That is, the recommended infusion length t may be censored by a treatment-terminating event. Thus, more precisely, the investigators were interested in comparing the event propor-tions for different infusion length “policies,” where a “policy” dictates a particular infusion length t that may be possibly censored by events that require treatment termination.

(12)

patients were assigned at random to receive different treatment duration policies at the beginning of the study, estimation of this parameter for a particular policy would be straightforward following an intention-to-treat principle; the average of all responses for patients assigned to the policy would be an unbiased estimator for this mean.

Data from a study that did not randomize patients to different treatment du-ration policies, such as the data from patients in the ESPRIT trial who received eptifibatide, includes the length of infusion for each patient and whether the infusion was terminated because of a protocol-defined adverse event or because of physician discretion. One difficulty in estimating the mean response for different treatment duration policies using the observed data is that, unlike a well-designed randomized treatment duration study, when a patient has their treatment terminated because of an adverse event then we do not know what treatment duration policy was intended for that patient. Also, as in most observational studies, because infusion length was left to the discretion of the physician rather than dictated by design (randomization), patients receiving different durations may not be prognostically similar. It is intu-itively apparent that simply averaging responses among individuals observed to have completed infusion of length t, either including or excluding patients experiencing a treatment-terminating event prior tot, would yield a biased estimator of the desired policy mean.

(13)

a functional relationship of this mean response continuously in time from a sample of observed data such as those in the ESPRIT infusion trial and derive a consistent estimator. Although there are currently no consistent estimators available for esti-mating mean response under these circumstances, numerous authors have developed estimators for estimating causal parameters in observational studies (e.g., Robins et al., 1994; Robins et al., 2000; Hernan et al., 2000; Hernan et al., 2001; Satten et al., 2001). Our work differs from these methods in that our definition of treatment includes the possibility that treatment may be censored or terminated. There has also been some work in the area of censored treatments or censored covariates (e.g., Clayton, 1978; Akritas et al., 1995); where “covariate” refers to infusion length in our problem. The key difference between these methods and that proposed here is the dependence of the outcome on the covariate censoring process. In our problem, the covariate censoring process is expected to be related to the outcome.

(14)

Chapter 2

Confounding

(15)

2.1

What is Confounding?

Confounders are commonly defined as those variables which affect an outcome or endpoint of interest as well as another factor that is itself associated with the out-come of interest. In this dissertation, that “other factor” will always be an exposure or treatment of scientific interest. Let the random vector (Y, D, X) represent the outcome, exposure, and confounding random variable, respectively, defined on the probability space (Ω,A, P). Then, confoundingdescribes the phenonmenon whereX modifies the relationship between Y and D — that is, the relationship between the exposure and outcome changes with different levels (or values) of the confounding factor.

Here, we will assume that primary scientific interest lies in marginal associations between Y and D across levels of X rather than the conditional associations of Y and Dwithin levels of X. Marginal versus conditional inference in regression models has become the focus of intense debate in recent years and we do not intend to continue that discussion here. Moreover, the aforementioned debate tends to focus on what we may call “associational” models rather than “causal” models, whereas this dissertation will focus on and use a particular causal methodology. To give us some concrete ideas about confounding and their role in real-world applications, let us consider the following examples:

(16)

non-exposure to an occupational hazard — chemical 1,3-butadiene — and X denote smok-ing (adapted from Zhao, Vodicka, Sram, and Hemminki, 2000). Initially, we will

assume X only realizes values zero and one for non-smoker and smoker, respectively.

Although the primary goal of this study was to determine the relation between the exposure D and an increase in S, our goal will be to find an association between the exposure D and the incidence of cancer Y over a 30-year period. Because this study was not randomized, it is unreasonable to assume that those workers exposed to D are prognostically similar to those not exposed toD. For example, more than half of those workers exposed toDwere smokers while less than half of the controls (workers not exposed toD) smoked. Since 1,3-butadiene is contained in cigarette smoke, naive statistical analyses may be unable to decide whether an increase in the incidence of cancer over a 30-year period is due to exposure to the occupational hazard or due to the smoking (See Rosenbaum, 2002)

Example 2.2 (Dose-Response Study) Let Y denote a clinical endpoint, D is an experimental treatment, and X is age. Here, we will not yet specify whether Y is continuous or discrete. Initially, we will assume D is a simple binary random vari-able — such as treatment versus placebo or high dose versus low dose — and X is summarized inK levels. Later, we will want to consider continuous random variables for both D and X.

(17)

analysis can sometimes be complicated when subjects receiving different treatments are also different prognostically. Suppose the investigator decides which subjects receive which treatment. An obvious selection bias results and may further bias the conclusions of the clinical study. In a biomedical setting, high doses of a certain drug may be slightly toxic, cause discomfort or minor “adverse events,” and therefore more difficult for older than younger patients to tolerate. As a result, attending physicians may be more likely to “assign” older patients a lower dose and strong, younger patients the higher dose. Then, the association between outcome and treatment dose will be confounded by age.

Example 2.3 (Simulation) Generate a random variable X according the probabil-ity law F_X. Next generate a random variable Daccording to the probability law F_D_|_X. Finally, generate a random variable Y according the the probability law F_Y_|_D,X.

(18)

2.2

Adjustment through the Propensity Score

LetXbe a random vector of potential confounders andDbe our binary treatment random variable. A propensity score is defined as the probability of treatment or selection probability, e.g. π(X) =P r(D= 1|X). There are basically two approaches to correct for the bias due to confounding using propensity scores which have received much attention in the statistical literature over the past twenty years. The first attempts to control the bias by stratifying on “levels” of the propensity score while the second inversely weights estimating equations by the propensity score.

Before we proceed with our discussion of propensities and related methods, we must first discuss a fundamental difference between the previous methods and the one about to be discussed. In simple linear regression (and all the methods discussed earlier), we often write the treatment effect asE(Y|D= 1)−E(Y|D= 0) =β, where D is a binary treatment random variable and E(Y|D) =α+βD. Propensity score methods will define new random variables Y₀∗ and Y₁∗ (see below) and define a new treatment effect of interest,E(Y₁∗)−E(Y₀∗) =β∗. In general, β 6=β∗. Occassionally, β is referred to as the associational treatment effect while β∗ is referred to as the causal treatment effect. Therefore, when we mention treatment effect from now on, we mean it in the “causal” sense, i.e. β∗.

Definition 2.1 (Counterfactual Response) A potential outcome (or

counterfac-tual response) Y_d∗ is the outcome one observes if the experimental unit receives treat-ment D=d. In the case of a binary treatmentD, the potential outcomes are Y₁∗ and

(19)

Assumption A1 (Stable Unit Treatment Value Assumption) Asserts the

ob-served outcome Y is exactly equal to the potential outcome Y_d∗ if, in fact, the unit coincidentally receives treatment d, i.e.,

Y =Y₁∗D+Y₀∗(1−D),

for a binary random variable D.

Assumption A2 (Strongly Ignorable Treatment Assignment)

P r(D=d|X, Y₀∗, Y₁∗) =P r(D=d|X).

Also known as the “no unmeasured confounders assumption.”

Balancing Scores. DefineXas the p-dimensional random vector,X(ω) : Ω→ <p_{. Then, define a balancing score as the many-to-one function} _b(_X_{) :}_<p _{→ <} _such

that

P r{X∈A|D=d, b(X)}=P r{X∈A|D=d0, b(X)}

for A ∈ A and d and d0 are two different levels of D. It is straightforward to show that the propensity scoreπ(X) is the coarsest balancing score (Rosenbaum and Rubin, 1983a). Because the propensity score is a balancing score, then we have the desirable result that F_X_|_D₌_d,π₍_X₎ =F_X_|_D₌_d0_,π₍_X₎. Using assumptions A1 and A2, one can show

that within levels of the propensity score, we may derive unbiased estimates of the “causal” treatment effect.

(20)

within strata, then combining the strata-specific measures using a weighted average (or weighted sum) to produre a “bias-free” measure of the treatment effect (when combining measures across strata make sense). The method works well when the stratified statistics all move in the same direction and combining the statistics into one overall measure makes sense.

Inverse Probability of Treatment Weighted (IPTW) Estimators. As we said earlier, our interest lies in estimating the marginal effect of D on Y. The second approach, by Robins and Rotnizky (1995), uses score equations and ideas from missing data to control for the selection bias due to X. Like the previous method, this one intends to draw inference on the “causal” treatment effect defined through the potential outcomes Y_d∗, d = 0,1. If we define E(Y_d∗) =µ_d, d= 0,1, then we are primarily interested in the difference ˆµ₁−µˆ₀. One can show the following function of observable random variables is ubiased for µ_d under assumptions A1 and A2:

Y ·I(D=d) π_d(X) ,

where π_d(X) =P r(D =d|X). Unbiasedness follows from the following argument:

E

½_Y _·_I(D₌_d) π_d(X)

¾

= E

½_Y_∗

d ·I(D=d)

π_d(X) ¾

= E

" E

½_Y_∗

d ·I(D=d)

π_d(X) ¯¯ ¯¯Y_d∗,X

¾#

= E

" Y_d∗ π_d(X)E

½

I(D=d)¯¯¯¯Y_d∗,X

¾#

= E

½ _Y_∗

d

π_d(X)P(D=d|X) ¾

since F_D_|_Y∗

(21)

where F_D_|_Y∗

d,X and FD|X are denoting the conditional probability distribution func-tions. There are weaker versions of assumption A2, but this argument gives us the ideas behind the traditional argument used in constructing unbiased estimators for E(Y_d∗), and more generally, for drawing marginal inference on the effect of D on Y in the presence of confounding.

This leads to the following estimating equation for our estimator:

n

X

i=1

(Y_i−µ_d)I(Di =d) π_d(X_i) = 0, which may also be written

ˆ

µ_d=X

i

Y_iw_i P

iwi

, where w_i = I(Di =d) π_d(X_i) ,

for π_d(X_i) = P r(D_i =d|X_i) d= 0,1. Notice that this estimator is simply the usual estimator for mean outcome at leveld inversely weighted by π_d(X).

From here, it is easy to define the general result. Namely, defineS(β) =Pn_i₌₁s_i(β) be the usual score equation one would use to measure the association betweenY and D in the absence of confounding, e.g. in a randomized design, where we are defining s_i(β) as s(Y_i, D_i, β). Assume we measure potential confounders X on each subject. Then, define the new score S†(β) with s_i(β) replaced with s_i(β)/π_d_i(X_i). Using the new score equations will give the appropriate marginal inference on the potential outcome Y_d∗. S†(β) has mean zero under assumptions A1 and A2 using identical arguments to the ones above.

(22)

and proceed as usual.

(23)

Chapter 3

Estimating Mean Response

So far, we have discussed a research question and some statistical tools useful in approaching such questions. Statistical tools should not be used with impunity and all assumptions should be clearly presented and verified, if possible. Here, we make unverifiable assumptions but make every effort to address those assumptions with the research investigators and, to the extent possible, through the observed data. Now, let us construct an approach to estimate the mean response as a function of infusion length discussed in Chapter 1 by extending the ideas and tools presented in Chapter 2.

3.1

Assumptions and Methods

(24)

order to conceptualize properly the question of interest, it is useful to introduce the idea of potential (counterfactual) random variables (Rubin, 1974; Rosenbaum and Rubin, 1983). Specifically, we define the potential random variable C to represent the time at which a randomly selected individual from our population, if continuously treated, would have a treatment-terminating event. We also define the potential ran-dom process (Y?

t , t≤C), whereYt?denotes the response of the individual, if treatment

were terminated at time t, for t ≤ C. In terms of these potential random variables, the policy of treating an individual fortunits of time or until a treatment-terminating event results in the responseY_t∗_∧_C, wheret∧Cdenotes the minimum oftand C. This can also be written as

Y_t∗_∧_C =Y_t∗I(C ≥t) +Y_C∗I(C < t), (3.1) and the parameter of interest is the population mean response for this treatment duration policy, namely, E(Y_t∗_∧_C).

The variables defined above are referred to as potential random variables, or coun-terfactuals, because, contrary to fact, they may not actually be observed. In contrast, for a randomly selected individual from our population, the observable random vari-ables are (Y, U,∆), where Y denotes the observed response, U denotes the actual treatment duration and ∆ is an indicator variable such that ∆ = 0 if treatment du-ration was stopped due to a treatment-terminating event and ∆ = 1 if stopped due to physician discretion.

(25)

finite values, t₁, . . . , t_k. In practice, this may necessitate grouping the treatment du-ration data and will be discussed later. However, when treatment is stopped because of an intervening event (∆ = 0) U can take values along a continuum of time. Under these conditions, the specific parameters on which we focus are the population mean responses for the k treatment duration policies, µ_j =E(Y_t∗

j), j = 1, . . . , k.

A key assumption is that the observed response Y may be written in terms of the potential outcomes as

Y =   

k

X

j=1

Y_t∗

jI(U =tj,∆ = 1)  

+YC∗I(∆ = 0). (3.2)

In words, (3.2) says if a patient experiences a treatment-terminating event, then the observed response is Y_C∗; otherwise, the observed response isY_t∗

j corresponding to the realized value of actual treatment duration U.

Along with the data (Y, U,∆), we assume that additional, possibly time-dependent covariate information is also available. Let Z(u) denote the value of a vector of co-variates for an individual at timeu, whereuis measured as time from the individual’s entry into the study. We define ZH_{(x) to be the history of covariate information up}

to and including time x, i.e., {Z(u), u ≤ x}, and use ZH

j to denote ZH(tj). We

de-fine the jth discrete cause-specific hazard function to be the conditional probability λ_j(ZH

j ) = P(U = tj,∆ = 1|U ≥ tj, ZjH), j = 1, . . . , k. The key assumption that

allows us to estimate consistently the parameter µ_j =E(Y_t∗_j) from a random sample of observed data is given by

(26)

In words, assumption (3.3) implies that the decision to terminate or continue a pa-tient’s treatment at timet_j — given the patient has continuously received treatment up to and including time t_j without a treatment-censoring event, and given the pa-tient’s covariate history up to and including time t_j — does not depend on future prognosis. Such an assumption is plausible if all of the information about an individ-ual through time t_j, which may be prognostic and which an investigator may use to make decisions on treatment duration, are captured in the data ZH

j . In the

epidemi-ological literature, assumption (3.3) is referred to as “no unmeasured confounders.” As we now demonstrate, assumption (3.3) allows us to identify µ_j from the dis-tribution of observable random variables. This then leads naturally to a consis-tent estimator for µ_j. The proposed method uses inverse probability weighting to adjust for potential confounders and censoring across treatment duration policies, where the weights are defined through the observable discrete cause-specific hazards, λ_j( ¯Z_j), j = 1, . . . , k. For reasons discussed below, it is convenient to define the following functions of observables:

f_j(Z_jH) = λ_j(Z_jH)

j_Y−1

m=1

{1−λ_m(Z_mH)} (3.4)

and

K_j(Z_jH) =

j

Y

m=1

{1−λ_m(Z_mH)}. (3.5) We also define [U] = max_j{j :t_j < U}; therefore ZH

[U] refers to all covariate

informa-tion up to and including time t_j just prior to U and K_[_U_](ZH

[U]) will be the product

(27)

Theorem 1 Assuming the probabilities f_j(ZH

j )andK[U](Z[HU])are known, the

follow-ing function of observables is unbiased for µ_j =E(Y_t∗

j) under assumption (3.3): Y ×

  

I(U =t_j,∆ = 1) f_j(ZH

j )

+ I(U < tj,∆ = 0) K_[_U_](ZH

[U])

 

 (3.6)

A proof of Theorem 1 is given in the Appendix.

Using an argument identical to the one in the proof of Theorem 1, we can show that

E 

_(Y ₋_µ_j₎   

I(U =t_j,∆ = 1) f_j(ZH

j )

+I(U < tj,∆ = 0) K_[_U_](ZH

[U])

   

_{= 0.} _(3.7)

Thus for a sample of data {Y_i, U_i,∆_i, ZH_(U

i), i= 1, . . . , n}, a natural estimator for

µ_j would be obtained by solving the simple estimating equation

n

X

i=1

(Y_i−µˆ_jn)   

I(U_i =t_j,∆_i = 1) f_j(ZH

ij)

+I(Ui < tj,∆i = 0) K_[_U_i_](ZH

[Ui])  

= 0, (3.8) whereZH

ij refers to the covariate information up to and including time tj for the i-th

individual. This yields the estimator ˆ

µ_jn= P_n

i=1Yiwij

P_n

i=1wij

, w_ij = I(Ui =tj,∆i = 1) f_j(ZH

ij)

+I(Ui < tj,∆i = 0) K_[_U_i_](ZH

[Ui]) ,

which is a weighted average of the responses. Remark: If we view the probability f_j(ZH

ij) as the propensity score (Rosenbaum and

Rubin, 1983) for thei-th individual to have treatment terminated at time t_j, then, if there were no treatment-terminating events, the estimator would equalPY_iw_ij/Pw_ij, where w_ij =I(U_i =t_j)/f_j(ZH

ij). This is a weighted average of the responses for

(28)

S¨arndal, and Wretman (1983) and Rosenbaum (1987) to adjust for confounding of treatment with baseline time-independent covariates. With censoring, the weighted average also includes responses from individuals who have a treatment-censoring event at timeC < t_j. However, their contribution is weighted by

1 K_[_U_i_](ZH

[Ui])

= 1

f_j(ZH ij)

K_[_U_i_](ZH

[Ui]) .

Intuitively, this weight can be viewed as the inverse propensity score multiplied by the conditional probability that the individual would have treatment stopped at time t_j given that treatment duration was known to be greater than C. Heuristically, this shows how the response of an individual who has treatment censored at time C contributes “to the right” (Blight, 1970; Turnbull, 1974; Turnbull, 1976) in estimating µ_j for all {j : t_j > C}.

Because (3.8) contains the unknown probabilities f_j(Z_jH) and K_[_U_](Z_[H_U_]), which are functions of the unknown parameters λ_j(Z_jH), j = 1, . . . , k, these probabilities must be estimated from the data. This requires that we posit a model for the discrete hazards λ_j(Z_jH, γ), j = 1, . . . , k−1, as a function of a finite dimensional parameter vector γ. Assuming (3.3) holds, it is straightforward to derive the observed-data likelihood of D_i ={U_i,∆_i, ZH(U_i)}, i= 1, . . . , n, as Qn_i₌₁L(γ;D_i), where

L(γ;D_i) =

k_Y−1

j=1

(

λ_ij(γ) 1−λ_ij(γ)

)_I₍_U_i₌_t_j_,_∆_i₌₁₎

{1−λ_ij(γ)}I(Ui≥tj)_, _(3.9)

andλ_ij(γ) = λ_j(ZH

ij, γ). The estimator forγis obtained by maximizing this likelihood.

Of course, the exact form of (3.9) depends on how we model λ_j(ZH

j ) as a function of

(29)

Generalized linear models are often used for their interpretability and general applicability. We consider one such class of generalized linear models, where

λ_j(Z_jH) =F(α_j+β_jtZ_jH),

j = 1, . . . , k−1, and F(t) is the logistic function, i.e. et_{/(1 +}_et_{). This is similar to}

the continuation ratio logit model (Agresti, 1990, p.319), in which case (3.9) becomes

k_Y−1

j=1

Y

i∈Rj

exp{(α_j +βt

jZijH)I(Ui =tj,∆i = 1)}

1 + exp(α_j +βt jZijH)

,

where R_j = {i : U_i ≥ t_j}. This likelihood is similar to Cox’s partial likelihood for continuous time proportional hazards models.

The parametrization above allows for different parameters associated with each time intervalt_j. For parsimony, when applicable, we may assume that some of these parameters are the same across the different time intervals.

To summarize, the proposed estimator for µ_j, j = 1, . . . , k is given by

ˆ µ_jn=

P_n

i=1Yiwˆij

P_n

i=1wˆij

, wˆ_ij = I(Ui =tj,∆i = 1) f_j(ZH

ij,ˆγ)

+I(Ui < tj,∆i = 0) K_[_U_i_](ZH

[Ui],ˆγ) ,

where ˆγ is the maximum likelihood estimator which maximizes (3.9).

3.2

Large Sample Properties

(30)

As mentioned previously, this is an approximation to the truth when ∆ = 1 and U is, in fact, continuous and the data are grouped into k categories by partitioning treatment duration into intervals, with t_j representing the midpoint of thejth inter-val. A rigorous approach to this problem would assume that E{Y_t∗_∧_C} is a smooth function of t and that the number of intervals k increases as a function of n. Such a technical development is beyond the scope of the paper and, we believe, would not add additional practical insight into the problem. We make additional comments on the effect of partitioning in the Discussion.

Because the proposed estimator ˆµ_jnuses the estimated discrete hazardλ_j(ZH ij,γ),ˆ

it is convenient to define the estimator as the first element in the solution to the system of equations

n

X

i=1

   

ψ_µ_j(Y_i, D_i,µˆ_jn,γˆ_n) ψ_γ(D_i,γˆ_n)

  

= 0, (3.10)

where

ψ_µ_j(Y_i, D_i, µ_j, γ) = (Y_i−µ_j)   

I(U_i =t_j,∆ = 1) f_j(ZH

ij;γ)

+I(Ui < lj,∆ = 0) K_[_U_i_](ZH

[Ui];γ)   

ψ_γ(D_i, γ) = ∂

∂γ logL(γ;Di).

Note that we have characterized the proposed estimator as an M-estimator (Huber, 1964), whose asymptotic properties are well known. Hence, under suitable regularity conditions, ˆµ_jn can be shown to be consistent and asymptotically normal when the model for the discrete hazards (3.9) is correctly specified.

(31)

Stefanski, 1995, sec. A.3.6) and is given by

n−1

n

X

i=1



(Y_i−µˆ_jn)2   

I(U_i =t_j,∆ = 1) f_j2(ZH

ij; ˆγn)

+I(Ui < tj,∆ = 0) K_[2_U

i](Z H

[Ui]; ˆγn)   −Hˆi

n ˆ

E(S_γS_γt)o−1Hˆ_it  ,

where ˆE(S_γSt

γ) is a consistent estimator of the Fisher information for γ, and

ˆ

H_i = (Y_i−µˆ_jn)     

I(U_i =t_j,∆ = 1)∂fj(ZijH) ∂γ

f_j2(ZH ij; ˆγn)

+ I(Ui < tj,∆ = 0)

∂K[Ui](ZH[Ui])

∂γ

K_[2_U i](Z

H

[Ui]; ˆγn)

    .

The asymptotic results above pertain to the marginal distribution of ˆµ_jn for a fixedj. It would be straightforward to consider the system of estimating equations

n X i=1                    

ψ_µ₁(Y_i, D_i,µˆ₁_n,ˆγ_n) ·

· ·

ψ_µ_k(Y_i, D_i,µˆ_kn,γˆ_n) ψ_γ(D_i,γˆ_n)

                   

= 0, (3.11)

simultaneously in order to derive the joint asymptotic normal distribution of (ˆµ₁_n, . . . ,µˆ_kn). This may be useful, for example, if formal tests of contrasts of mean response for the different treatment duration policies are desired.

3.3

Analysis of the ESPRIT Infusion Trial

(32)

endpoint of death, MI, or revascularization within 30 days of the initiation of treat-ment. The data are discretized by taking t_j to be the midpoint of five intervals I_j, namely I_j ={(t_j₋₁+t_j)/2,(t_j+t_j₊₁)/2}for t= (t₁, t₂, t₃, t₄, t₅) = (16,18,20,22,24), and we redefine U_i =t_j for any patient where U_i ∈ I_j and ∆_i = 1. There are seven patients who completed infusion before 15 hours and are assigned to the first group (U_i = 16) and seven patients who completed infusion after 25 hours and are assigned to the last group (U_i = 24). The inclusion or exclusion of these 14 observations does not appreciably change the results. The frequency for the number of patients who completed infusion at t_j is 61, 479, 194, 85, and 111 for j = 1, . . . ,5. The proposed method does not require that we discretize the values of C for patients experiencing an infusion terminating event; a summary of these individuals is as follows: 89 cen-sored before 16 hours, 11 between 16 to 18 hours and six patients between 18 to 20 hours.

Table 3.1: Summary statistics for the ESPRIT trial

Event Within 30 Days

No (N=77) Yes (N=959)

Diabetes 0.17 0.20

PTCA 0.34 0.22

Angina 0.31 0.40

Heparin 0.18 0.13

Weight 83.99 85.17

(33)

treat-ment duration policy. As such, the ESPRIT infusion trial investigators were asked to identify factors that they believed would influence treatment duration. The inves-tigators confirmed that treatment would be discontinued with certainty if a patient experienced a protocol-defined adverse event, but otherwise, could not identify any variables that they believed would affect the decision of the participating physicians in any systematic way to terminate treatment. Thus the investigators believed that that there were no measured or unmeasured confounders. Although it is impossible to confirm whether there are no unmeasured confounders in any dataset, we investigate the sensitivity of our analysis results under different assumptions about measured and unmeasured confounders.

(34)

demonstrate that confounding indeed plays a minor role.

Naive estimators for µ_j commonly employed in practice include an overall event proportion for all patients belonging to the intervalI_j which we denote byT₁_j, and an uncensored event proportion for all patients completing infusion inI_j which we denote byT₂_j. In Table 3.2, we present the results of the estimated event proportions for the two naive estimators and three estimators proposed in this paper that make different assumptions about the confounding relationship — ˆµ(0)_jn assumes no confounding is present, ˆµ(1)_jn assumes confounding is present only through baseline covariates, and ˆ

µ(2)_jn assumes the possibility of time-dependent confounding; thus, this last estimator includes the two time-dependent covariates related to nadir platelet count in addition to the baseline covariates. We immediately note that for the first interval T₁₁ is much larger than T₂₁. This can be explained by the higher event proportion among censored patients compared to uncensored patients, i.e. 0.189 compared to 0.061, and by the fact that 90 patients were censored at or before 16 hours while only 61 patients completed physician-recommended infusion at 16 hours. As might be expected under these conditions, the proposed estimator ˆµ_jn is greater than the naive uncensored event proportions,T₂_j, for all j = 1, . . . , k.

(35)

Table 3.2: Estimated event rates for the ESPRIT trial. T₁_j denotes the overall event rate for all patients belonging to interval I_j, T₂_j denotes the uncensored event rate for patients belonging to I_j, ˆµ(0)_j is the proposed estimator assuming no confounding is present, ˆµ(1)_j is the proposed estimator assuming confounding is present through baseline factors only, and ˆµ(2)_j is the proposed estimator assuming time-dependent confounding. Standard errors are presented in parentheses.

j t_j (hrs) T₁_j T₂_j µˆ(0)_j µˆ(1)_j µˆ(2)_j 1 16 0.140 0.017 0.036(0.017) 0.037(0.016) 0.035(0.012) 2 18 0.047 0.046 0.062(0.010) 0.064(0.010) 0.064(0.010) 3 20 0.068 0.068 0.082(0.017) 0.081(0.017) 0.081(0.017) 4 22 0.050 0.050 0.065(0.022) 0.064(0.022) 0.063(0.022) 5 24 0.115 0.115 0.124(0.028) 0.126(0.035) 0.132(0.037)

rate and could possibly be detrimental to the patients.

3.4

Simulation Results

In this section, we investigate the small sample properties of our estimator and explore the sensitivity of our estimator to the assumption of no unmeasured con-founders (3.2). These simulations assume that patients are assigned to one of a finite number of treatment duration policies at values t = (t₁, . . . , t₄). Although the pro-posed method allows for time-dependent covariates, for simplicity, we only consider time-independent covariates. This is consistent with our analysis of the ESPRIT data.

(36)

a treatment-censoring random variable C as an exponential{ρ(Z)} random variable, where

ρ(Z) = 0.005·exp(ϕ₁Z₁),

ϕ₁ = −2. The treatment duration data are simulated according to the following algorithm, which is consistent with the assumptions made: Start by letting m= 1.

1. If C < t_m, then define U =C and ∆ = 0.

2. For C ≥ t_m, generate a Bernoulli random variable Q_m, the indicator variable for stopping treatment at time t_m, with probability λ_m(Z) where

logit{λ_m(Z)}=α_m+β₁Z₁.

3. If Q_m = 1, then assign U = t_m and ∆ = 1; if Q_m = 0 and m < k, then incrementm tom+ 1 and goto Step 1.

4. IfU =t_j and ∆ = 1, then generate the corresponding responseY as a Bernoulli random variable with probability π, where

logit(π) =η_j+ζ₁Z₁,

whereas, if U =C and ∆ = 0, then generate the corresponding response Y as a Bernoulli random variable with probabilityπ, where

(37)

Note that the parameter υ is only present when ∆ = 0. Therefore, roughly speaking, the log odds of Y = 1 are multiplied by exp(υ) when ∆ = 0 vis-a-vis ∆ = 1, thus inducing a dependence of censoring of treatment infusion on the probability of outcome. The population parameter of interestµ_j is difficult to evaluate analytically; thus, we approximated its value by simulation. Using the above algorithm, we forced treatment duration to be stopped at timet_j, if not already censored by replacing Step 2 with “Q_m = 0 for m= 1, . . . , j−1 andQ_j = 1”, generating the outcomeY 100,000 times, and then taking the sample average.

The chosen values of the parameters are as follows: α = (−1.2,−0.75,0), β₁ = −0.5, η = (−5,−4,−4,−2), ζ₁ = −2, υ = 2. For each data set, nominal 95% Wald confidence intervals were constructed using ˆµ_jn, its standard error from Section 3.2, and a critical value of 1.96. Then, the empirical coverage probability (ECP) is calcu-lated as the number of MC datasets where the trueµ_jfalls within the Wald confidence interval divided by the total number of MC datasets.

Table 3.3: Simulation summary of mean response for 1000 Monte Carlo datasets when treatment-duration data are discrete. T₁_j is the average Y for U_i ∈ I_j and T₂_j is the average Y for (U_i ∈I_j,∆_i = 1). ˆµ_j,₁₂is our estimator assuming confounding is present through Z₁ and Z₂. ECP is defined the empirical coverage probability. Estimated standard errors are given in parentheses.

(38)

Table 3.3 presents simulation results for estimating mean response as a function of treatment duration policy t_j, j = 1, . . . ,4 when our no unmeasured confounders assumption is true. We use a sample size ofn= 1000 and generate 1000 Monte Carlo datasets. Monte Carlo bias is less than 2% in every case and less than 1% in most cases; interval coverage is approximately the nominal level. The naive estimators,T₁_j and T₂_j, are biased and do not possess the correct coverage probability.

Next, we investigate the sensitivity of our estimator to deviations from the “no unmeasured confounders” assumption. For this, we include an additional covariateZ₂ to represent a potential unmeasured confounder. To induce additional confounding, we define an additional term β₂Z₂ in the treatment choice model in step 2 of the algorithm and an additional term ζ₂Z₂ in the prognostic model in step 4 of the algorithm. We fix β₁ = −0.5, ζ₁ = −2 and let the other parameters, α, η, υ be the same as in the first simulation. Different values of β₂ and ζ₂ are considered to study the effect of the strength of confounding of the unmeasured variable Z₂ on the resulting estimator.

The estimator ˆµ_jnis computed using only the covariateZ₁. We again use a sample size ofn = 1000 and 1000 Monte Carlo datasets. The simulation results are displayed in Table 3.4.

(39)

Table 3.4: Simulation summary of mean response for 1000 Monte Carlo datasets when the no unmeasured confounder assumption is violated. µˆ_jn assumes confounding is present only through Z₁. Empirical coverage probabilities (ECP) are given in parentheses.

ζ₂ =−1 ζ₂ =−0.5 t_j β₂ µ_j µˆ_jn(ECP) µ_j µˆ_jn(ECP)

-0.5 0.094(0.961) 0.087(0.970)

t₁ -0.25 0.090 0.092(0.974) 0.084 0.085(0.961)

0 0.090(0.958) 0.085(0.955)

-0.5 0.126(0.957) 0.118(0.961)

t₂ -0.25 0.125 0.126(0.966) 0.117 0.117(0.963)

0 0.126(0.955) 0.117(0.958)

-0.5 0.130(0.907) 0.126(0.942)

t₃ -0.25 0.134 0.133(0.924) 0.130 0.127(0.949)

0 0.137(0.973) 0.129(0.954)

-0.5 0.152(0.767) 0.152(0.877)

t₄ -0.25 0.172 0.161(0.871) 0.166 0.157(0.917)

(40)

Chapter 4

Estimating Duration-Response

Relationships

We now explore methods to estimate the mean reresponseE(Y_t∗_∧_C) as a continuous function in t. In particular, we will write

E(Y_t∗_∧_C) = µ(t, β) (4.1)

through thep-dimensional parameter vectorβ. Our goal is to derive a consistent and asymptotically normal estimator for β.

4.1

Methods and Assumptions

(41)

may not actually be observed. In contrast, for a randomly selected individual from our population, the observable random variables are given by {Y, U,∆}, where Y denotes the observed response, U denotes the actual treatment duration realizing values along a continuum of time, and ∆ is an indicator variable such that ∆ = 1 when treatment duration was stopped by choice and ∆ = 0 when treatment duration was stopped due to treatment terminating events. For the methods used below, it will be necessary to define the observable stochastic process N(t) =I(U ≤t,∆ = 1). Now, we assume the observed responseY may be written as the following function of potential outcomes Y_t∗, t ≤C:

Y = Z

Y_u∗dN(u) +Y_C∗I(∆ = 0). (4.2) Along with the data (Y, U,∆), we assume that additional, possibly time-dependent covariate information is also available. Let Z(u) denote the value of a vector of co-variates for an individual at timeu, whereuis measured as time from the individual’s entry into the study. We define ZH(t) to be the history of covariate information up to and including time t, i.e.,

ZH(t) = {Z(u), u≤t}.

Define the observable cause-specific hazard function to be the conditional probability λ{t, ZH(t)}= lim

h→0h

−1_P_{_U _∈_{(t, t}₊_h],_{∆ = 1}_|_U _≥_{t, Z}H_(t)_}_.

A key assumption that allows us to estimate consistently the parameter β from a random sample of observed data is given by

(42)

lim

h→0h

−1_P n_U _∈_{(t, t}₊_dt],_{∆ = 1}_|_U _≥_{t, Z}H_{(t), C, Y}∗

x, t≤x≤C

o

. (4.3)

In words, assumption (4.3) implies that the decision to terminate or continue a pa-tient’s treatment at timet— given the patient has continuously received treatment up to and including time t without a treatment-censoring event, and given the patient’s covariate history up to and including time t — does not depend on future prognosis. Such an assumption is plausible if all of the information about an individual through time t, which may be prognostic and which an investigator may use to make deci-sions on treatment duration, are captured in the data ZH_{(t). In the epidemiological}

literature, assumption (4.3) is referred to as “no unmeasured confounders.”

4.2

The Estimator and Its Properties

4.2.1

Motivation

We now want to address how we might estimate ap-dimensional parameter vector β from the observed data{Y_i, U_i,∆_i, Z_iH(U_i), i= 1, . . . , n}. The solution we propose is motivated by an (intent-to-treat) analysis from a randomized design, that is, a study design and analysis where treatment confounding and censoring play no significant role.

(43)

the L2-normkY−µ(U, β)k. Then, ˆβ_n solves

n

X

i=1

ψ∗(Y_i, U_i, β) = 0, (4.4)

where

ψ∗(Y_i, t, β) =µ_β(t, β){Y_i−µ(t, β)}

and µ_β(U_i, β) is the p-dimensional gradient of µ(U_i, β) with respect to β. Other estimating functions or minimization criteria for β will be considered later, namely,

ψ†(Y_i, t, β) = µ_β(t, β)V ar−1(Y_i){Y_i−µ(t, β)},

where V ar(Y_i) is the variance of Y_i. The functionψ†(Y_i, t, β) corresponds roughly to a scaled version of the score equations for the exponential family of distributions.

The estimator in (4.4) is known to be sensitive to outlying duration values in t and one simple modification would be to restrict the treatment duration data to a certain range, say [0, t_u]. Now, this restricted version of (4.4) may also be written as the sum of stochastic integrals for counting processes (Fleming and Harrington, 1991),

n

X

i=1

Z _t_u

0 ψ

∗_(Y

i, t, β)dNiuc(t) = 0, (4.5)

whereNuc

i (t) =I(Ui ≤t). This representation will be useful in our development that

follows.

(44)

observable random variables, define the cumulative cause-specific hazard function Λ{t, ZH_(t)_}₌Rt

0 λ{u, ZH(u)}du and

K{t, ZH(t)}= exp µ

−Λ{t, ZH(t)} ¶

.

Then Theorem 1 from Chapter 3 suggests that the following function of observable random variables has mean zero for fixed t≤t_u,

ψ∗(Y, t, β) (

I(U ∈(t, t+dt],∆ = 1) f{t, ZH_(t)}_dt +

I(U < t,∆ = 0) K{U, ZH_(U₎}

)

, (4.6)

where

f{t, ZH(t)}=−d

dtK{t, Z

H_(t)_}_.

Thus, putting (4.5) and (4.6) together, we have the following theorem.

Theorem 2 Let h(t) be any non-negative function in t and

H(t) = Z _t

0 h(u)du.

Defineµ_β(t, β) = ³∂/∂β´µ(t, β)andg{t, ZH_(t)_}₌_f_{_{t, Z}H_(t)_}_/h(t)_{. We require that}

µ_β(t, β) and g{t, ZH_(t)_} _{be continuous functions in} _t_{, and} _Y∗

t∧C be a left-continuous

process in t, t ≤ C, with finitely many discontinuites. We also require that the ratio g{t, ZH_(t)_} _and _K_{_{U, Z}H_(U₎_} _{be bounded away from zero for all} _t _≤ _L_{, for}

t_u < L < ∞. Finally, assume that when ∆ = 1, the probability that U and C occur in an infintesimal interval ∆t is on the order (∆t)2 (See Appendix C for precise definition). Then, the following function of observables will have mean zero at the

truth β₀,

Z _t_u

0 ψ

∗_{(Y, t, β)}

"

dN(t) f{t, ZH_(t)}_dt +

I(U < t,∆ = 0) K{U, ZH_(U₎}

#

(45)

where N(t) =I(U ≤t,∆ = 1).

We provide a heuristic argument of the proof below. Define Γ as a partition of the closed interval [0, t_u], i.e.

Γ ={s₀, s₁, . . . , s_m},

where s₀ = 0, s_m = t_u, s_i is a partition point. Define norm of Γ, |Γ|, as the longest subinterval of Γ and define (4.7) as ψ(Y, D, β) where D = {U,∆, ZH_(U₎_}

and g{t, ZH_(t)}₌_f{_{t, Z}H_(t)}_{/h(t). The proof begins by rewriting} _{ψ(Y, D, β) as}

Z _t_u

0

ψ∗(Y, t, β)dN(t) g{t, ZH_(t)} +

(1−∆) K{U, ZH_(U₎}

Z _t_u

U

ψ∗(Y, t, β)h(t)dt. (4.8) We show that ψ(Y, D, β) may be written as the limit Riemann-Stieltjes sumsW_Γ = P_m

i=1Wi,Γ (as|Γ| approaches zero) where

W_i,_Γ =k₁(ξ_i)∆N(s_i) + 1−∆

K{U, ZH_(U₎}k2(ξi)∆si,

where {ξ_i}, i = 1, . . . , m are intermediate points of the i-th subinterval, ∆s_i = s_i−s_i₋₁, and

∆N(s_i) = N(s_i)−N(s_i₋₁) =I(s_i₋₁ < U ≤s_i,∆ = 1) k₁(ξ_i) = ψ∗(Y, ξ_i, β)/g{ξ_i, ZH(ξ_i)}

k₂(ξ_i) = ψ∗(Y, ξ_i, β)I(U < ξ_i)h(ξ_i)

Heuristically, we construct a new sequence of random variables Wf_Γ (see below) which has the property thatW_Γ−Wf_Γ converges to zero almost surely. Namely, define

f

W_Γ=Pm_i₌₁Wf_i,_Γ,

f

(46)

for Wf_i,(1)_Γ defined by

ψ∗(Y_s∗_i, s_i, β)I(si−1 < U ≤si, C ≥si)h(si)∆si f†{s_i−, ZH_(s

i−)}

and Wf_i,(0)_Γ given by ψ∗(Y_C∗, s_i, β)

"

I(C < s₁) +

i

X

k=2

I(U > s_k₋₁, s_k₋₁ ≤C < s_k) K†{s_k₋₁, ZH

k−1}

#

∆H(s_i). and

f†{s_j, ZH(s_j)} = π(s_j)K†{s_j₋₁, ZH(s_j₋₁)} K†{s_j, ZH(s_j)} =

j

Y

k=1

{1−π(s_k)}

π(s_j) = P {s_j₋₁ < U ≤s_j, C ≥s_j|U > s_j₋₁, C ≥s_j, ZH(s_j₋₁),(Y_x∗, x≤C)I(C≥s_j)o.

In the appendix, we show the difference W_Γ −Wf_Γ converges to zero almost surely. Hence, Wf_Γ converges toψ(Y, D, β) almost surely. The random variable Wf_Γ also has the property thatE(Wf_Γ) = 0 (which is proven in detail in the appendix). Finally, we would like to argue that E(Wf_Γ) converges to the expectation ofψ(Y, D, β).

Here, we argue that |Wf_Γ| is bounded by an integrable random variable for all partitions Γ. One can show that this integrable random variable is a constant using our previously stated assumptions: (1) the continuity ofh(t), µ(t, β), µ_β(t, β), (2) the left-continuity ofY_t∗_∧_C, and (3) that g{t, ZH_(t)} _and _K{_{t, Z}H_(t)}_{are bounded below}

(47)

the efficiency of the resulting estimator. One suggestion is to choose H(t) such that the resulting estimator has additional desirable properties, such as minimum variance or optimal convergence rates. We will see that the choice ofH(t) may also be related to the modelling strategy for K{t, ZH_(t)_}_{. We will consider two different modelling}

strategies: parametric and semiparametric.

4.3

Parametric Inference

In this section, we will explore parametric models forH(t) andK{t, ZH(t)}. First, suppose K{t, ZH(t)}may be parametrically modelled through the q-dimensional pa-rameter γ (which will imply f{t, ZH(t)} is also a function of γ). Under these condi-tions, the observed data likelihood for thei-th individual is

L(D_i, γ)∝ µ

f{U_i, Z_iH(U_i), γ} ¶_∆_iµ

K{U_i, Z_iH(U_i), γ} ¶₁₋_∆_i

.

One parametric possibility for K{t, ZH_{(t), γ}_} _is

K{t, ZH(t), γ}= exp µ

−c Z _t

0 e

γ0ZH₍_u₎ du

¶ ,

for some constantc. Estimation ofγ is straightforward through the following observed data likelihood for the i-th individual L(D_i, γ) and associated score vector

S_γ(D_i, γ) = ∂

∂γ logL(Di, γ).

(48)

appropriate regulariy assumptions hold, the resulting estimator minus the estimand may be written

n1/2(ˆγ_n−γ₀) = n−1/2

n

X

i=1

V_γ−1S_γ(D_i, γ₀) +o_p(1),

where V_γ is the expected Fisher information. Now, we move to a discussion on H(t). When considering linear models µ(t, β) = β₀+β₁t, a convenient choice for H(t) in our definition of ψ(Y, D, β) is

G_a,b(t) = Z _t

0

xa−1_e−x/b_dx

Γ(a)ba ,

and Γ(n) = R₀∞un−1_e−u_{du. We write}_a _and _b _{as Roman letters to stress the fact that}

they are fixed and known parameters. In this case, ψ∗(Y, t, β) may be written as

ψ∗(Y, t, β) =    

1 t

  

(Y −β0−β1t).

Let us simplify ψ(Y, D, β, γ) (including the dependence of γ in our estimating equa-tion) under these conditions.

The integral of the sum is the sum of the integrals where the first part of the summand is

∆ψ∗(Y, U, β)h(U)

f{U, ZH_{(U), γ}} . (4.9)

Using our new definition of H(t) = G_a,b(t) and ψ∗(Y, t, β), (4.9) is simplified as the

following: _

  

1 U

  

(Y −β0−β1U) ∆U

a−1_e−U/b

(49)

The second part of the summand is written as (1−∆)I(U < t_u)

K{U, ZH_(U_{), γ}}

Z _t_u

U

ψ(Y, t, β)dG_a,b(t). (4.11) Again, using our definitions of ψ∗(Y, t, β) and G_a,b(t), the integral in (4.11) may be simplified. One piece of the simplification of (4.11) involves the restricted moments E{Tj_I(t

` ≤ T ≤ tu)}, where t` is either zero or U. For the Gamma(a, b) family of

distributions, the restricted moments are given as Z _t_u

t`

sjdG_a,b(s) = Γ(a+j)b

j

Γ(a) {Ga+j,b(tu)−Ga+j,b(t`)}. (4.12) Of course, ifj are integers, then Γ(a+j)/Γ(a) is simply (a+j−1)·(a+j−2)· · ·(a+1)·a. Now, the integral in (4.11) may be written as

      

(Y −β₀)G_a,b([U, t_u])−β₁bG_a₊₁_,b([U, t_u]) (Y −β₀)G_a₊₁_,b([U, t_u])−(a+ 1)ab2β₁G_a₊₂_,b([U, t_u])

     

, (4.13) where G_a,b([t_`, t_u]) =G_a,b(t_u)−G_a,b(t_`). Thus, we have given an explicit formula for ψ(Y, D, β, γ) through the sum of (4.10) and (4.11). From here, it is straightforward to derive thei-th data influence function for ˆβ_n, i.e.

n1/2( ˆβ_n−β₀) =n−1/2

n

X

i=1

IF(Y_i, D_i, θ₀) +o_p(1), where θ= (β0, γ0)0 and

IF(Y_i, D_i, θ₀) =A−1(θ₀) ·

ψ(Y_i, D_i, θ₀)−J(θ₀)V_γ−1S_γ(D_i, γ₀) ¸

(4.14) where we define

A(θ₀) = E (

− ∂

∂βψ(Yi, Di, θ0) )

J(θ₀) = E (

− ∂

∂γψ(Yi, Di, θ0) )

(50)

Therefore, ˆβ_n is consistent for its estimand and n1/2_{( ˆ}_β

n −β0) is asymptotically

normal with asymptotic variance given by E{IF_i(θ₀)IF_i0(θ₀)}. Using the proposed G_a,b(t) distribution for H(t) and the subsequent results in (4.10) and (4.11), and a parametric model for γ, we have given a complete characterization for a parametric solution to our estimation problem.

This method is attractive from a computational point of view. Namely, the inte-grals inψ(Y, D, β, γ) have a closed form and methods for estimatingγ via parametric likelihoods are well-established. The asymptotic variance of n1/2_{( ˆ}_β

n−β0) is

consis-tently estimated by the usual sample estimator, n−1Pn_i₌₁IF_i(ˆθ_n)IF_i0(ˆθ_n).

4.4

Semiparametric Inference

One suggestion from the previous section was to modelK{t, ZH(t)}parametrically through the hazard

λ{t, ZH(t)}=c·exp{γ0ZH(t)},

for some constant c. A similar approach uses a semiparametric formulation for the hazard, that is, through proportional hazards:

λ{t, ZH(t)}=λ₀(t) exp{γ0ZH(t)}. (4.15)

In this case, f{t, ZH_{(t), γ}_} _{may be written as the product}

(51)

Here,