Self-consistent estimation of censored quantile regression

(1)

Contents lists available atSciVerse ScienceDirect

Journal of Multivariate Analysis

journal homepage:www.elsevier.com/locate/jmva

Self-consistent estimation of censored quantile regression

Limin Peng

Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd. NE, Atlanta, 30322, USA

a r t i c l e i n f o

Article history: Received 11 June 2011 Available online 21 October 2011 AMS subject classifications: 62N01 62H99 Keywords: Censoring Martingale Regression quantile Self-consistency

Stochastic integral equation

a b s t r a c t

The principle of self-consistency has been employed to estimate regression quantile with randomly censored response. The asymptotic studies for this type of approach was established only recently, partly due to the complex forms of the current self-consistent estimators of censored regression quantiles. Of interest, how the self-consistent estimation of censored regression quantiles is connected to the alternative martingale-based approach still remains uncovered. In this paper, we propose a new formulation of self-consistent censored regression quantiles based on stochastic integral equations. The proposed representation of censored regression quantiles entails a clearly defined estimation procedure. More importantly, it greatly simplifies the theoretical investigations. We establish the large sample equivalence between the proposed self-consistent estimators and the existing estimator derived from martingale-based estimating equations. The connection between the new consistent estimation approach and the available self-consistent algorithms is also elaborated.

1. Introduction

Quantile regression [12] has arisen into a useful regression technique for survival data (i.e. time-to-event data). Compared to traditional survival regression methods, including the Cox proportional hazards model and the accelerated failure time (AFT) model, quantile regression can accommodate a more general relationship between an event time of interest and covariates while providing straightforward interpretations.

LetTandCdenote time to event and time to censoring, and letZdenote ap

×

1 covariate vector with the first component set as 1. Without loss of generality, the censored regression quantile model investigated in this paper takes the form

QT

(τ

|

Z

)

=

exp

{

ZT

β

0

(τ)

}

, τ

∈

(

0

,

1

),

(1)

whereQT

(τ

|

Z

)

≡

inf

{

t

:

Pr

(

T

≤

t

|

Z

)

≥

τ

}

denotes the conditional quantile function ofT givenZ (with the same

definition applied to any other random variable), and

β

₀

(τ)

is ap

×

1 vector of unknown regression coefficients representing covariate effects on the

τ

-th quantile of logT. Model(1)adopts the standard random censoring mechanism. That is,Tand

Care assumed to be independent conditional onZ. Under this independent censoring assumption, the distribution ofCis allowed to depend onZ.

It is worth noting that much previous work on quantile regression with censored data cannot address the regression quantile problem defined above. For example, early efforts by Powell [20,21] require that all censoring times be known or fixed and so do the subsequent work by Fitzenberger [5], Buchinsky and Hahn [3], among others. Other methods, such as those by Ying et al. [25] and Honore et al. [7], demand unconditional independence betweenTandC, which is a stronger assumption than the standard random censorship, and thus cannot be applied here.

Portnoy [18] made the first attempt to tackle the censored regression quantile problem(1)by novelly employing the principle of self-consistency [4]. The principle of self-consistency here, in short, refers to an estimation scenario from

E-mail address:[email protected].

(2)

equaling the estimator to an expression that contains the estimator itself. The estimator proposed in [18] reduces to the Kaplan–Meier estimator [9] in the one-sample case. The initial iterative algorithm was polished by Neocleous et al. [16] to a grid-based estimation procedure. We hereafter refer the estimator defined in [16] to as Portnoy’s self-consistent estimator, denoted by



β

_PSC

(τ)

. Asymptotic studies on the self-consistent estimation of regression quantile were established only until

recently by Portnoy and Lin [19], partly due to the complex representation of the existing estimators. Other subsequent work of [18] includes (but is not limited to) [15], which studied the monotonicity property associated with



β

_PSC

(τ)

, and [24], which

relaxed the global linear assumption in model(1).

Peng and Huang [17] proposed an alternative approach to estimating

β

₀

(τ)

in model(1)by utilizing the martingale structure of randomly censored data. A clearly defined grid-based algorithm was developed based on a set of monotone estimating equations. These estimating equations have appealing stochastic integral representations which greatly facilitate large-sample studies. Peng and Huang [17] derived the closed form of the limit process of their estimator, denoted by



β

_PH

(τ)

.

It was also shown that



β

_PH

(τ)

becomes a Nelson–Aalen type estimator [14,1] when there is no covariate. More recently,

Huang [8] derived a grid-free estimation procedure based on a new concept of quantile calculus. The resulting estimator has the same distribution as Peng and Huang [17]’s estimator.

A brief introduction of the algorithms presented in [16,17] is provided in Section2. Both of these approaches have been implemented forRin the contributed packagequantreg[11]. Comprehensive numerical studies conducted by Koenker [10] demonstrated very similar empirical performance between



β

_PSC

(τ)

and



β

_PH

(τ)

. Though the asymptotic equivalence

between these two estimators can be established in the one-sample case given the fact that the Kaplan–Meier estimator and the Nelson–Aalen estimator are equivalent in the large sample sense, their connection remains unknown in general regression settings.

In this paper, we develop a new framework to study the self-consistent estimation of model(1)withTsubject to standard random censorship. With the principle of self-consistency generally perceived as an estimation strategy that defines an estimator as some function of the observed data and the estimator itself, and then utilizes such a relationship to obtain a consistent well-defined estimator, throughout this paper, the self-consistent estimation of model(1)refers to as the estimation of

β

₀

(

· )

which involves a use of self-consistency principle. Compared to previous work, the proposed method preserves the computational-ease feature, while providing a more direct approach to the asymptotic theory. Of note, the current self-consistent estimators are defined under the assumption that no censoring would occur belowZT

_β

0

(ϵ

l

)

for allZ,

where

ϵ

lis a prespecified constant in

(

0

,

1

)

. This assumption is not likely to incur serious concerns in real data analysis with

ϵ

lselected small enough. Nevertheless, it is practically desirable to eliminate this restriction. In Section3, we formulate

censored regression quantiles based on stochastic integral equations derived from adopting self-consistency principle. The new representation of self-consistent censored regression quantiles entails clearly defined estimation and inference procedures, which avoid the artificial data constraint stated above. In Section4, we show the asymptotic equivalence of the proposed self-consistent estimators to



β

_PH

(τ)

. Therefore, we establish the uniform consistency and weak convergence of the

new estimators with the closed-forms for the limit distributions derived. Furthermore, we elaborate the connection between the proposed estimators and Portnoy’s self-consistent estimator. The results aid in understanding the close proximity between



β

_PH

(τ)

and



β

_PSC

(τ)

observed in previous empirical studies. Monte-Carlo simulations reported in Section5confirm

our theoretical findings.

It is important to note that the proposed framework for self-consistent estimation of censored quantile regression can be extended to survival settings with more complex censoring mechanisms. A few remarks are supplied in Section6. 2. Two existing approaches for censored quantile regression

DefineX

˜

=

T

∧

Cand

δ

=

I

(

T

≤

C

)

, where

∧

is the minimum operator andI

(

· )

is the indicator function. The observed randomly censored data consist ofniid replicates of

(

X

˜

, δ,

Z

)

, denoted by

{

(

X

˜

i

, δ

i

,

Zi

),

i

=

1

, . . . ,

n

}

. We defineX

=

logX

˜

and accordinglyXi

=

logX

˜

i.

2.1. Portnoy’s self-consistent approach

We here outline the grid-based algorithm presented in [16]. A grid of

τ

-values is defined as 0

< τ

1

< τ

2

<

· · ·

< τ

M

<

1.

Definem

(

β

,

i

,

k

)

=

max

{

l

:

1

≤

l

≤

k

−

1

,

ZT

i

β

(τ

l

) <

Xi

≤

ZiT

β

(τ

l+1

)

}

if the set

{

l

:

1

≤

l

≤

k

−

1

,

ZiT

β

(τ

l

) <

Xi

≤

ZT

i

β

(τ

l+1

)

}

is not empty, andm

(

β

,

i

,

k

)

=

k

+

1 otherwise. By this definition,m

(

β

,

i

,

0

)

=

1.

Step1. Compute



β

_PSC

(τ

1

)

by fitting the uncensored quantile regression with data

{

Xi

, δ

i

,

Zi

}

ni=1. It is assumed that all censored

Xi’s are above the hyperplane determined by



β

_PSC

(τ

₁

)

. Setk

=

1.

Step2. Given



β

_PSC

(τ

_l

)

(l

≤

k), obtain



β

_PSC

(τ

_k+1

)

by minimizing the following weighted check function:



δi=1

ρ

τ

(

Xi

−

ZiTb

)

+



δi=0



ˆ

w

k+1,i

ρ

τ

(

Xi

−

ZiTb

)

+

(

1

− ˆ

w

k+1,i

)ρ

τ

(

X∗

−

ZiTb

)



,

(2)

where

w

ˆ

_k+1,i

=

(τ

k+1

−

τ

m(βPSC,i,k)

)/(

1

−

τ

m(βPSC,i,k)

)

ifm

(



β

PSC

,

i

,

k

) <

k

+

1 and 1 otherwise,

ρ

τ

(

u

)

=

u

{

τ

−

I

(

u

<

0

)

}

,

(3)

Step3. Replacekbyk

+

1.

Step4. Repeat steps 2–3 untilk

>

Mor only censored observations remain aboveZ_iT



β

_PSC

(τ

_k−1

)

.

Remark 1. The censored regression quantiles studied in [19] slightly variate from those defined in [16]. First, Portnoy and Lin [19] defined both



β

(τ

₁

)

and



β

(τ

₂

)

by ordinary uncensored regression quantile methods. Second, Portnoy and Lin [19]

used a linear interpolation to define the quantile crossing of a censored observation, while [16] essentially adopted the cadlag version of regression quantiles. Third, Portnoy [19] set

w

ˆ

_k+1,ias 0 instead of 1 whenm

(



β

_PSC

,

i

,

k

)

=

k

+

1. Nevertheless, as

pointed out by Portnoy and Lin [19], ‘‘either definition leads to the same asymptotic distribution’’.

Note that there may involve some ambiguity in determining



β

_PSC

(τ

₁

)

when there is a censoredX_i lying below the

hyperplane Z_iT



β

_PSC

(τ

₁

)

. This complication tends to be more troublesome in bootstrap-based inferences, as the above

phenomenon can occur more often in re-sampled datasets. A detailed discussion on this issue can be found in the Appendix of [18].

2.2. Peng and Huang (2008) [17]’s approach

The estimator proposed by Peng and Huang [17],



β

_PH

(τ)

, is defined as a cadlag approximation to the solution of the

estimating equation, n1/2S_n(PH)

(

β

, τ)

=

0

,

whereS_n(PH)

(

β

, τ)

=

n−1



n i=1Zi



Ni

{

ZiT

β

(τ)

} −



τ 0 Yi

{

ZiT

β

(

u

)

}

dH

(

u

)



withNi

(

x

)

=

I

(

Xi

≤

x

, δ

i

=

1

)

,Yi

(

x

)

=

I

(

Xi

≥

x

)

and

H

(

x

)

= −

log

(

1

−

x

)

. A grid of

τ

-values,G, is defined as 0

=

τ

₀

< τ

₁

< τ

₂

<

· · ·

< τ

_M

=

τ

_U, where

τ

Uis a deterministic

constant subject to some identifiability constraint. Let

∥

G

∥

denote the size ofG, maxk=0,...,M

(τ

k+1

−

τ

k

)

. Without further

mentioning,Gwill be adopted throughout the rest of the paper, even for



β

_PSC

(τ)

. The algorithm for obtaining



β

_PH

(τ)

is as

follows.

Step1. Set exp

{

ZT

i



β

_PH

(τ

₀

)

} =

0 for alliby the definition ofQ_T

(τ

|

Z

)

. Setk

=

0.

Step2. Given exp

{

Z_iT



β

_PH

(τ

_l

)

} =

0 forl

≤

k, obtain



β

_PH

(τ

_k+1

)

as the minimizer of the followingL1-type convex objective

function: lk+1(h)= n  i=1  δiXi−δihTZi  +      X∗−hT n  l=1 (−δlZl)      +      X∗−hT n  r=1  2Zr k  l=0 I[Xr≥ZrTβ_PH(τl)]{H(τl+1)−H(τl)}      , whereX∗_{is an extremely large value.}

Step3. Replacekbyk

+

1 and repeat step 2 untilk

=

Mor no feasible solution can be found for minimizinglk

(

h

)

.

3. The proposed formulation of self-consistent censored regression quantiles

DefineRi

(

x

)

=

I

(

Xi

≤

x

, δ

i

=

0

)

. Without considering covariates, Efron [4] suggested a self-consistent estimating

equation forFlogT

(

t

)

≡

Pr

(

logT

≤

t

)

given by

FlogT

(

t

)

=

n−1 n



i=1



Ni

(

t

)

+

Ri

(

t

)

FlogT

(

t

)

−

FlogT

(

Xi

)

1

−

FlogT

(

Xi

)



.

(3)

The right hand side (RHS) of Eq.(3)can be expressed as a stochastic integral. Specifically, RHS of(3)

=

n−1 n



i=1



Ni

(

t

)

+

Ri

(

t

)



t 0 FlogT

(

t

)

−

FlogT

(

u

)

1

−

FlogT

(

u

)

dRi

(

u

)



=

n−1 n



i=1



Ni

(

t

)

+

Ri

(

t

)

{

1

−

FlogT

(

t

)

}



t 0 Ri

(

u

)

{

1

−

FlogT

(

u

)

}

2 dFlogT

(

u

)



,

where the second equality follows by applying stochastic integral by parts assuming the continuity ofFlogT

(

u

)

. This renders

an alternative form of the self-consistent estimating equation forFlogT

(

t

)

,

FlogT

(

t

)

=

n−1 n



i=1



Ni

(

t

)

+

Ri

(

t

)

{

1

−

FlogT

(

t

)

}



t 0 Ri

(

u

)

{

1

−

FlogT

(

u

)

}

2 dFlogT

(

u

)



.

(4)

Interestingly, the self-consistent estimating Eq.(4), unlike Eq.(3), offers a natural extension to the general regression setting under model(1). Note that model(1)is equivalent to

(4)

Withtreplaced byZT

i

β

(τ)

, Eq.(4)evolves into an estimating equation for

β

0

(τ)

:

n1/2S_n(SC)

(

β

, τ)

=

0

,

(5) whereS_n(SC)

(

β

, τ)

equals n−1 n



i=1 Zi



Ni

{

ZiT

β

(τ)

} +

Ri

{

ZiT

β

(τ)

}

(

1

−

τ)



τ 0 Ri

{

ZiT

β

(

u

)

}

(

1

−

u

)

2 du

−

τ



.

The stochastic integral equation(5)provides a new formulation of censored regression quantiles driven by the use of self-consistency principle. It entails a simple and clearly-defined estimation procedure for

β

₀

(τ)

. The key idea is to approximate the solution to(5)by mimicking Euler’s method for first-order differential equation. Specifically, let



β

_SC

(

· )

denote a cadlag

function, which only jumps at the

τ

-gridG. Setting exp

{

Zi



β

_SC

(

0

)

} =

0 for alli, one can obtain



β

_SC

(τ

_k+1

)

by sequentially

solving the following equation forbfork

=

0

, . . . ,

M

−

1:

n−1/2 n



i=1 Zi



Ni

(

ZiTb

)

+

Ri

(

ZiTb

)



_k



l=0 Ri

{

ZiT



β

_SC

(τ

_l

)

} ·



1

−

τ

_k+1 1

−

τ

_l+1

−

1

−

τ

k+1 1

−

τ

_l





−

τ

_k+1



=

0

.

(6)

It is easy to see that Eq.(6)is a monotone estimating equation [6]. Simple algebraic manipulationsshow that the estimating function in(6)is the minus subgradient of the following weighted check function:

n−1/2





δi=1

ρ

τ

(

Xi

−

ZiTb

)

+



δi=0



˜

w

k+1,i

ρ

τ

(

Xi

−

ZiTb

)

+

(

1

− ˜

w

k+1,i

)ρ

τ

(

X∗

−

ZiTb

)





,

(7) where

w

˜

_k+1,i

=



k l=0Ri

{

ZiT



β

_SC

(τ

_l

)

}



₁₋_τ k+1 1−τl+1

−

1−τk+1 1−τl



,k

=

0

, . . . ,

M

−

1.

Remark 2. As in [17], we set exp

{

Zi



β

_SC

(

0

)

} =

0 for allibecauseQT

(

0

|

Zi

)

=

0 according to the definition of conditional

quantile (i.e.QT

(τ

|

Z

)

≡

inf

{

t

:

Pr

(

T

≤

t

|

Z

≥

τ)

}

). Similarly, the justification of this boundary constraint requires the

smoothness of

β

₀

(τ)

and positive density ofT everywhere between 0 and the

τ

U-th quantile, implied by the regularity

conditions C2(a) and C3(a) in [17] respectively. Note that these assumptions would be violated when the support ofT is bounded away from 0, i.e.

[

c

,

+∞

)

withc

>

0. In such a case, an easy adaptation is to transformT byT

−

cifcis known. Whencis unknown, a practical remedy may be to fit model (1) to max

(

T

−

c′

,

0

)

, wherec′may be chosen as the observed lower bound ofT’s support. This is expected to result in only minimal deviations from the original regression quantiles of interest whenc′_{is close to}_c_.

As shown above, the self-consistent estimator



β

_SC

(

· )

is well defined without imposing additional constraints on the

support ofC. This estimator can be easily computed in a sequential manner like Portnoy’s self-consistent estimator. Another appealing feature is that



β

_SC

(

· )

is formulated based on a stochastic integral equation which provides an easy platform

for conducting large-sample studies. This enables in-depth investigations on the connections between



β

_SC

(

· )

and existing

estimators, including



β

_PH

(

· )

and



β

_PSC

(

· )

. 4. Asymptotic results

Large sample studies for



β

_SC

(

· )

is greatly facilitated by the stochastic integral equation representation of (5). In

Section4.1, we first show the asymptotic equivalence between



β

_PH

(

· )

and



β

_SC

(

· )

. This result immediately implies the

uniform consistency and weak convergence of the new self-consistent estimator



β

_SC

(

· )

, provided the asymptotic properties

established for



β

_PH

(

· )

in [17]. Since the closed-form limit distribution of



β

_PH

(

· )

reduces to that of the Kaplan–Meier estimator

in the one-sample case, the same property applies to



β

_SC

(

· )

.

In Section4.2, we further exploit the relationship between



β

_PSC

(

· )

and



β

_SC

(

· )

, which ultimately uncovers the underlying

connection between



β

_PSC

(

· )

and



β

_PH

(

· )

. Theoretical work conducted here also suggests several variants of



β

_SC

(

· )

. 4.1. Asymptotic equivalence between



β

_PH

(

· )

and



β

_SC

(

· )

The result is formally stated in the following theorem:

Theorem 1.Suppose model(1)holds. Assuming conditions required by Theorem 2 of [17]and condition

(

D1

)

inAppendixare satisfied,

sup τ∈[ν,τU]

∥

n1/2

{



β

_PH

(τ)

−



β

_SC

(τ)

}∥ →

_p0

,

for any

ν

∈

(

0

, τ

U

)

.

(5)

Our proof ofTheorem 1involves a variant of estimating Eq.(5)given by n1/2S_n(OSC)

(

β

, τ)

=

0

, τ

∈

(

0

, τ

U

]

,

(8) where S_n(OSC)

(

β

, τ)

=

n−1 n



i=1 Zi



Ni

{

ZiT

β

(τ)

} +

(

1

−

τ)



τ 0 Ri

{

ZiT

β

(

s

)

}

(

1

−

s

)

2 ds

−

τ



.

A self-consistent estimator of

β

₀

(τ)

,



β

_OSC

(τ)

, can be derived based on Eq.(8)in the same fashion that



β

(SC)

(τ)

is defined based on Eq.(5). Specifically,



β

_OSC

(τ)

is a cadlag process obtained as follows: (i) set exp

{

Z_iT



β

_OSC

(τ

₀

)

} =

0 for alli; (ii) for k

=

0

, . . . ,

M

−

1, sequentially compute



β

_OSC

(τ

_k+1

)

as the minimizer of theL1-type convex objective function,

ℓ

k+1

(

h

)

=

n



i=1



δ

_iX_i

−

δ

_ihTZ_i



+



X∗

−

hT n



l=1

(

−

δ

_lZl

)



+



X∗

−

2Zr n



r+1

(τ

k+1

− ˜

w

k+1,r

)



.

In our arguments,



β

_OSC

(τ)

serves as a bridge to connect



β

_SC

(τ)

and



β

_PH

(τ)

. By theLemmas 2and3stated below, we can

respectively show the asymptotic equivalence between



β

_PH

(τ)

and



β

_OSC

(τ)

and that between



β

_SC

(τ)

and



β

_OSC

(τ)

, thereby

obtaining the result inTheorem 1.

Lemma 2. Suppose that model(1)holds. Let

β

˜

(

· )

and

β

ˇ

(

· )

be cadlag processes that only jump at0

< τ

1

< τ

2

<

· · ·

<

τ

M

< τ

U. Assuming that conditions required by Theorem 2 of [17]and condition

(

D1

)

are satisfied, if n1/2Sn(u)

(

β

˜

, τ)

a

=

0and n1/2_S(u) n

(

β

ˇ

, τ)

a

=

0, then sup τ∈[ν,τU]

∥ ˜

β

(τ)

−

β

₀

(τ)

∥ →

_p0

,

sup τ∈[ν,τU]

∥ ˇ

β

(τ)

−

β

₀

(τ)

∥ →

_p0

,

and sup τ∈[ν,τU]

∥

n1/2

{ ˜

β

(τ)

− ˇ

β

(τ)

}∥ →

_p0

for any

ν

∈

(

0

, τ

U

)

, where the superscript u can be ‘‘PH’’, ‘‘SC’’, and ‘‘OSC’’. Here and in the sequel, a

=

means asymptotic equivalence uniformly in

τ

∈

(

0

, τ

U

]

.

Remark 3. Lemma 2states that a deviation ofo

(

1

)

(uniformly in

τ

) in the estimating functionn1/2Sn(PH)

(

· )

,n1/2Sn(SC)

(

· )

, or

n1/2Sn(OSC)

(

· )

does not affect the uniform consistency and only change the limit process of the resulting estimator byo

(

n−1/2

)

.

Lemma 3. Suppose that model(1)holds. Assuming that conditions required by Theorem 2 of[17]and condition

(

D1

)

are satisfied,

sup τ∈(0,τU]



n−1/2 n



i=1 ZiI

{

Xi

>

ZiT



β

_OSC

(τ)

}



τ 0 Ri

{

ZiT



β

_OSC

(

s

)

}

(

1

−

s

)

2 ds



→

_p0

.

(9)

The technical proofs ofLemmas 2and3, andTheorem 1are relegated toAppendix.

4.2. Connection between



β

_SC

(

· )

and



β

_PSC

(

· )

We start with an introduction of some new notation. DefineAi

(

β

, τ)

= {

u

:

0

≤

u

< τ,

ZiT

β

(

u

−

)

≤

Xi

≤

ZiT

β

(

u

)

}

and

ψ

i

(

β

, τ)

=

sup

{

Ai

(

β

, τ)

} ·

I

(

Ai

(

β

, τ)

is not empty

)

+

τ

·

I

(

Ai

(

β

, τ)

is empty

)

. It is easy to see that

ψ

i

(



β

_s

, τ

_k+1

)

=

τ

m(βs,i,k),

where the subscriptscan be ‘‘PH’’, ‘‘SC’’, ‘‘OSC’’, or ‘‘PSC’’. It is easy to see that

ψ

i

(

β

, τ)

stands for the largest

τ

at which

ZT

i

β

(

· )

crossesXi.

To exploit the connection between



β

_SC

(

· )

and



β

_PSC

(

· )

, it is worthwhile to compare the objective functions adopted in

the corresponding algorithms, given by(2)and(7)respectively. It is observed that both objective functions take a form of weighted sum of check functions, while the major distinction lies with the weight definitions. It is interesting to note that the weight distinction may vanish ifZ_iT



β

_SC

(

· )

are nondecreasing for alli. This is because with nondecreasingZ_iT



β

_SC

(

· )

,

n1/2S_n(SC)

(



β

_SC

, τ)

=

n −1/2 n



i=1 Zi





Ni

{

ZiT



β

_SC

(τ)

} +

Ri

{

ZiT



β

_SC

(τ)

}

(

1

−

τ)



τ ψi(βSC,τ) Ri

{

ZiT



β

_SC

(

u

)

}

(

1

−

u

)

2 du

−

τ





(10)

=

n−1/2 n



i=1 Zi



Ni

{

ZiT



β

_SC

(τ)

} +

R_i

{

Z_iT



β

_SC

(τ)

}

τ

−

ψ

_i

(



β

_SC

, τ)

1

−

ψ

_i

(



β

_SC

, τ)

−

τ



,

(11)

and the estimating function in(11)renders weights,τ−ψi(βSC,τ)

1−ψi(βSC,τ)

(6)

However, the procedure to obtain



β

_SC

(

· )

does not ensure the increasingness ofZ_iT



β

_SC

(

· )

but its uniform convergence to

β

0

(

· )

which satisfies the monotonicity requirement onZiT

β

0

(

· )

for alli. InLemma 4, we claim that the weight contributions

from

τ

less than

ψ

i

(



β

_SC

, τ)

in(7)are negligible even whenZ_iT



β

_SC

(

· )

lacks the monotonicity.Lemma 4is stated below with

proof given inAppendix.

Lemma 4. Suppose model(1)holds. Assuming that conditions required by Theorem 2 of [17]and condition

(

D1

)

are satisfied,

sup τ∈(0,τU]



n−1/2 n



i=1 ZiRi

{

ZiT



β

_SC

(τ)

}

(

1

−

τ)



ψi(βSC,τ) 0 Ri

{

ZiT



β

_SC

(

s

)

}

(

1

−

s

)

2 ds



→

_p0

.

It immediately follows fromLemma 4, the definition of



β

_SC

(

· )

, and(11)that n1/2S_n(MSC)

(



β

_SC

, τ)

a

=

0

,

(12) where S_n(MSC)

(

β

, τ)

=

n−1 n



i=1 Zi



Ni

{

ZiT

β

(τ)

} +

Ri

{

ZiT

β

(τ)

}

τ

−

ψ

_i

(

β

, τ)

1

−

ψ

_i

(

β

, τ)

−

τ



.

Remark 4. The form ofSn(MSC)

(

β

, τ)

bears some similarity with that of McKeague et al. [13]’s estimating function for median

regression. Unlike in [13], we estimate

{

τ

−

ψ

_i

(

β

, τ)

}

/

{

1

−

ψ

_i

(

β

, τ)

}

self-consistently based on the assumed quantile regression model(1)rather than imposing a separate model ofTgivenZ.

Eq.(12)induces another variant of



β

_SC

(

· )

,



β

_MSC

(

· )

, defined as a cadlag solution to

n1/2S_n(MSC)

(

β

, τ)

=

0

.

(13)

More specifically,



β

_MSC

(

· )

jumps only on the gridGwith exp

{

Z_iT



β

_MSC

(

0

)

} =

0 for alliand



β

_MSC

(τ

k+1

)

, given

{



β

_MSC

(τ

l

)

}

kl=0, is

obtained as the solution to

n−1/2 n



i=1 Zi



Ni

(

ZiTb

)

+

Ri

(

ZiTb

)



τ

k+1

−

τ

m(βMSC,i,k) 1

−

τ

_m₍ βMSC,i,k)



−

τ



=

0

.

(14)

Given

∥

G

∥ =

o

(

n−1/2

)

,(12)holds with



β

_SCreplaced by



β

_MSC.

Note that



β

_MSC

(

· )

greatly resembles Portnoy’s estimator



β

_PSC

(

· )

by noting that the estimating function in(14)is the minus

subgradient of the following weighted check function,

n−1/2





δi=1

ρ

τ

(

Xi

−

ZiTb

)

+



δi=0



w

∗ k+1,i

ρ

τ

(

Xi

−

ZiTb

)

+

(

1

−

w

∗ k+1,i

)ρ

τ

(

X ∗

₋

Z_iTb

)





,

where

w

∗

k+1,i

= {

τ

k+1

−

τ

m(βMSC,i,k)

}

/

{

1

−

τ

m(βMSC,i,k)

}

,k

=

0

, . . . ,

M

−

1. The weights for



β

PSC

(

· )

,

w

ˆ

k,i, and the weights

for



β

_MSC

(

· )

,

w

∗_k_,_i, have nearly identical definitions except for the weights used for computing



β

_PSC

(τ

1

)

and



β

_MSC

(τ

1

)

. The

algorithm in [18] or [16] forces all

w

ˆ

₁_,_i’s equal to 1 when obtaining



β

_PSC

(τ

1

)

by uncensored regression quantiles. This may

incur some complication if there are censoredXi’s less thanZiT



β

_PSC

(τ

₁

)

. In contrast, the weights for



β

_MSC

(τ

₁

)

,

w

∗₁_,_i’s, are set

as 0 justified on the basis of estimating Eq.(14). By the similarity and the difference stated above,



β

_MSC

(

· )

may be viewed

as a modified version of



β

_PSC

(

· )

, which avoids the somewhat ad-hoc restriction ofC_i

<

Z_iT

β

₀

(τ

₁

)

as well as the associated

algorithmic issues.

Asymptotic studies for



β

_MSC

(

· )

are similar to those for



β

_PSC

(

· )

presented in [19], involving the main difficulty in analyzing

the term that involves

{

τ

−

ψ

_i

(

β

, τ)

}

/

{

1

−

ψ

_i

(

β

, τ)

}

. Provided the uniform consistency of



β

_MSC

(

· )

, we can write the

‘‘difference in

τ

’’ term inSn(MSC)

(

β

, τ)

, namely

n−1 n



i=1 ZiRi

{

ZiT

β

(τ)

}

τ

−

ψ

_i

(

β

, τ)

1

−

ψ

_i

(

β

, τ)

,

as a Riemann sum by adapting Steps 1–6 in the proof for Theorem 3.1 of [19]. Using similar arguments to their Step 7, we can show that ao

(

1

)

(uniformly in

τ

) deviation inn1/2_S(MSC)

n

(

β

, τ)

has little impact on the limit process of the solution to the

resulting equation. Note that Eq.(12)reveals that



β

_MSC

(

· )

and



β

_SC

(

· )

are both solutions to the equationn1/2S_n(MSC)

(

β

, τ)

a

=

0. The asymptotic equivalence between the proposed self-consistent estimator,



β

_SC

(

· )

, and the modified Portnoy’s estimator,



β

_MSC

(

· )

, then follows from this fact.

(7)

Table 1

Empirical biases×103_{(Bias) and empirical variances}_×₁₀3_{(Var) of} 

βPH(τ),β_SC(τ),β_OSC(τ),β_MSC(τ), andβ_PSC(τ), with C1, C2, and C3 denoting the first, the second, and the third component of the corresponding estimator respectively.

τ β_PH β_OSC β_SC β_MSC β_PSC

Bias Var Bias Var Bias Var Bias Var Bias Var

(I) Log-linear model with iid errors

0.1 C1 3 250 12 251 12 251 12 252 71 272 C2 0 588 6 594 6 594 5 597 2 635 C3 30 188 33 188 34 189 34 188 33 200 0.3 C1 2 87 15 88 15 88 15 88 39 89 C2 24 204 27 206 27 206 27 206 28 206 C3 1 70 1 70 1 70 1 70 4 72 0.5 C1 2 55 13 54 13 54 13 54 32 55 C2 22 139 24 140 24 140 25 140 25 137 C3 5 49 4 50 4 50 4 50 3 51 0.7 C1 23 54 2 53 1 53 1 53 21 53 C2 12 131 14 130 14 130 15 131 8 130 C3 10 46 9 45 9 45 9 45 3 43

(II) Log-linear model with heteroscedastic errors

0.1 C1 13 88 9 88 9 88 9 88 26 92 C2 1 211 3 212 3 212 3 212 4 223 C3 1 74 0 74 0 74 0 74 2 75 0.3 C1 9 49 1 49 1 49 1 50 17 50 C2 1 114 2 115 2 115 2 115 1 117 C3 5 43 4 43 4 43 4 43 4 43 0.5 C1 29 50 12 49 12 49 12 49 5 49 C2 18 117 17 116 17 117 17 117 19 114 C3 8 40 8 40 8 40 8 40 5 40 0.7 C1 43 68 14 68 14 68 15 68 7 65 C2 10 156 5 156 4 156 4 157 6 152 C3 14 51 14 50 15 50 15 50 12 48 5. Simulations

Simulation studies have been conducted to compare Peng and Huang’s estimator



β

_PH

(

· )

and the four types of

self-consistent estimators, including



β

_SC

(

· )

,



β

_OSC

(

· )

,



β

_MSC

(

· )

, and



β

_PSC

(

· )

. For brevity of presentation, we only report results from

two scenarios: (I) log-linear model with iid errors; (ii) log-linear model with heteroscedastic errors. The configurations follow those used in [17] with 25% censoring. Specifically, for scenario (I),T follows the model logT

=

0

.

5Z1

−

0

.

5Z2

+

ϵ

,

where

ϵ

∼

extreme value distribution,Z1

∼

Unif

(

0

,

1

)

, andZ2

∼

Bernoulli

(.

5

)

. The censoring distribution isUnif

(

0

.

1I

(

Z2

=

1

),

3

.

8

)

. For scenario (II),Tis generated from the model logT

=

0

.

5Z1

−

0

.

5Z2

ξ

+

ϵ

, where

ξ

∼

exponential

(

1

)

,

ϵ

∼

N

(

0

,

1

)

,

Z1

∼

Unif

(

0

,

1

)

, andZ2

∼

Bernoulli

(.

5

)

. The censoring timeC

∼

Unif

(

0

.

3I

(

Z2

=

1

),

5

.

2

)

. Under each configuration, we

generate 1000 simulated datasets of sample sizen

=

200.

InTable 1, we present the empirical biases (Bias) and empirical variances (Var) of all five estimators under comparison. As expected, all estimators have small empirical biases, and their empirical variances are very similar, especially those of



β

_OSC

(τ)

,



β

_SC

(τ)

and



β

_PSC

(τ)

.

We also examine the difference among these estimators based on each simulated dataset. Selecting



β

_SC

(τ)

as the

reference, we present inTable 2the empirical 25th percentiles (Diff25) and empirical 75th percentile (Diff75) of



β

_PH

(τ)

−



β

_SC

(τ)

,



β

_OSC

(τ)

−



β

_SC

(τ)

,



β

_MSC

(τ)

−



β

_SC

(τ)

, and



β

_PSC

(τ)

−



β

_SC

(τ)

. Results inTable 2confirm the observation fromTable 1

that the three new self-consistent estimator,



β

_SC

(

· )

,



β

_OSC

(

· )

, and



β

_MSC

(

· )

, are in close proximity; in over 50% of simulations

they coincide with each other. It is interesting to note that Portnoy’s estimator



β

_PSC

(τ)

seems to have a relatively larger

deviation from



β

_SC

(τ)

than does



β

_PH

(τ)

at small

τ

’s. This is likely due to its special treatment on



β

_PSC

(τ

₁

)

. The deviation

of Peng and Huang’s estimator



β

_PH

(τ)

from the three new self-consistent estimators appears to rise as

τ

increases. This

phenomenon is not surprising and shares the same spirit as the increasing cumulative error of Euler’s solution to an ordinary differential equation. In summary, our simulation results provide empirical evidence for the asymptotic equivalence among



β

_PH

(

· )

,



β

_SC

(

· )

,



β

_OSC

(

· )

, and



β

_MSC

(

· )

, and their close connections with



β

_PSC

(

· )

, in addition to the theoretical arguments given

in Section3. 6. Remarks

The principle of self-consistency has been widely adopted in survival analysis as an intuitive way to handle missing information due to censoring and/or truncation. Examples include estimating survival function with randomly censored

(8)

Table 2

Empirical 25th percentiles×103_{(DIFF25) and empirical 75th percentiles}_×₁₀3_{(DIFF75) of}

β_PH−β_SC,β_OSC−β_SC,β_MSC−β_SC, andβ_PSC−β_SC, with C1, C2, and C3 denoting the first, the second, and the third component of the corresponding quantity respectively.

τ β_PH−β_SC β_OSC−β_SC β_MSC−β_SC β_PSC−β_SC

Diff25 Diff75 Diff25 Diff75 Diff25 Diff75 Diff25 Diff75

(I) Log-linear model with iid errors

0.1 C1 0 0 0 0 0 0 −63 0 C2 0 0 0 0 0 0 0 0 C3 0 0 0 0 0 0 −7 4 0.3 C1 0 5 0 0 0 0 −33 0 C2 0 0 0 0 0 0 0 0 C3 0 0 0 0 0 0 −1 9 0.5 C1 0 17 0 0 0 0 −24 0 C2 0 0 0 0 0 0 0 0 C3 −6 6 0 0 0 0 −3 6 0.7 C1 0 39 0 0 0 0 −23 0 C2 −18 12 0 0 0 0 −1 0 C3 −13 14 0 0 0 0 −2 10

(II) Log-linear model with heteroscedastic errors

0.1 C1 0 0 0 0 0 0 −48 0 C2 0 0 0 0 0 0 −11 4 C3 0 0 0 0 0 0 −14 9 0.3 C1 0 2 0 0 0 0 −23 0 C2 0 0 0 0 0 0 −3 4 C3 0 0 0 0 0 0 −6 6 0.5 C1 0 23 0 0 0 0 −24 0 C2 −4 0 0 0 0 0 0 2 C3 −7 7 0 0 0 0 −4 10 0.7 C1 0 48 0 0 0 0 −26 0 C2 −27 12 0 0 0 0 −4 0 C3 −14 16 0 0 0 0 −6 7

data [4], doubly censored data [22], and interval-censored data [23]. In this paper, we present a new representation of self-consistent regression quantiles with randomly censored data based on stochastic integral equations. Such a formulation allows us to carry out self-consistent estimation in a sequential manner, while facilitating asymptotic studies. The new framework can also be extended to other more complex settings where the principle of self-consistency is applicable.

The proposed self-consistent estimator is shown to be asymptotically equivalent to Peng and Huang [17]’s estimator derived based on the martingale structure of the data. This result may be viewed as an extension of Kaplan–Meier and Nelson–Aalen equivalence in the one-sample case to the quantile regression setting. The techniques adopted in this work are conceptually simple and have good adaptability to other self-consistent estimation settings.

It is worth emphasizing that we adopt a grid size ofo

(

n−1/2

₎

_{for all estimators considered in the paper, including}

Portnoy’s estimator



β

_PSC

(

· )

. It is unclear to us whether the theoretical results can hold with a less finer

τ

-grid. The work by

Neocleous and Portnoy [15] suggests that this may be possible in the absence of censoring along with a benefit of guaranteed monotonicity of estimated quantile functions. Investigations along this direction for censored quantile regression may merit future research but are beyond the scope of this paper.

Acknowledgments

The author thanks Professor Stephen Portnoy for his helpful comments on this work. This research was supported by the US National Science Foundation grant DMS-1007660.

Appendix

DefineFX

(

x

|

Z

)

=

pr

(

X

≤

x

|

Z

)

,FX,1

(

x

|

Z

)

=

pr

(

X

≤

x

, δ

=

1

|

Z

)

,FlogT

(

x

|

Z

)

=

pr

(

logT

≤

x

|

Z

)

,fX

(

x

|

Z

)

=

dFX

(

x

|

Z

)/

dx,

fX,1

(

x

|

Z

)

=

dFX,1

(

x

|

Z

)/

dx, andflogT

(

x

|

Z

)

=

dFlogT

(

x

|

Z

)/

dx. LetZdenote the domain ofZ.

Regularity conditions include conditions (C1)–(C6) in [17], and

(D1): (i) sup_x_,_z∈ZfX

(

x

|

z

)

is bounded; (ii) supx,z∈ZflogT

(

x

|

z

)

is bounded; (iii) Each component of E

[

Z⊗2fX

(

ZTb

|

Z

)

]

(

E

[

Z⊗2fX,1

(

ZTb

|

Z

)

]

)

−1is uniformly bounded inb

∈

B

(

d0

)

, whereB

(

d0

)

is the same asB

(

d0

)

defined in [17].

Proof of Lemma 2.In case thaturepresents ‘‘PH’’, Lemma 1 follows immediately from the proofs of Theorems 1–2 in [17]. Very similar arguments can be applied to prove the case withurepresenting ‘‘SC’’, and ‘‘OSC’’ and hence are omitted here.

(9)

Proof of Lemma 3. First, it is easy to note that the LHS of(9) is bounded above by c1n−1/2