On the Convergence of Observed Partial Likelihood under Incomplete Data with Two Class Possibilities

(1)

http://dx.doi.org/10.4236/ojs.2014.42012

On the Convergence of Observed Partial Likelihood under

Incomplete Data with Two Class Possibilities

Tomoyuki Sugimoto

Department of Mathematical Sciences, Hirosaki University, Hirosaki, Japan Email: [email protected]

Received October 21,2013; revised November 21, 2013; accepted November 28, 2013

Copyright © 2014 Tomoyuki Sugimoto. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In accor-dance of the Creative Commons Attribution License all Copyrights © 2014 are reserved for SCIRP and the owner of the intellectual property Tomoyuki Sugimoto. All Copyright © 2014 are guarded by law and by SCIRP as a guardian.

ABSTRACT

In this paper, we discuss the theoretical validity of the observed partial likelihood (OPL) constructed in a Cox- type model under incomplete data with two class possibilities, such as missing binary covariates, a cure-mixture model or doubly censored data. A main result is establishing the asymptotic convergence of the OPL. To reach this result, as it is difficult to apply some standard tools in the survival analysis, we develop tools for weak con-vergence based on partial-sum processes. The result of the asymptotic concon-vergence shown here indicates that a suitable order of the number of Monte Carlo trials is less than the square of the sample size. In addition, using numerical examples, we investigate how the asymptotic properties discussed here behave in a finite sample.

KEYWORDS

Cox’s Regression Model; Logistic Regression Model; Incomplete Binary Data; Partial Likelihood; Partial-Sum Processes; Profile Likelihood

1. Introduction

Although the Cox model [1] is a standard tool for the analysis of time-to-event data, in practice analysts are of-ten confronted with some problems in handling incomplete data beyond the right-censored form, such as inter-val-censored data [2], missing covariates [3-5] or (statistical) structural modelling. The inference for the Cox model under such cases of incomplete data can usually be performed based on the semiparametric profile like-lihood (SPFL) [6] as a generalization of Cox’s partial likelike-lihood [7]. On the other hand, as a substitute for the SPFL method, one can analyse the same data using the imputation method, which yields a sum of partial like-lihoods. By describing the sum of all possible partial likelihoods more exactly, we can formulate the marginal of partial likelihoods [8], that is, the observed partial likelihood (OPL). In this area, the theory for the SPFL has been studied by many authors (e.g., [6,9]). However, to the best of our knowledge, there has not been much de-velopment in mathematical theory for the OPL.

(2)

Generally, it is difficult to investigate computationally to what extent the MC approximations of the OPL are valid, since the exact computation requires a huge number of summands, as the sample size and incomplete in-formation of data are increasing. For this reason, it is worth studying the OPL theoretically. However, it is not easy to complete such a study in one go, because standard tools to study asymptotic properties of Cox’s partial likelihood or the SPFL cannot be applied directly to an objective of the OPL. Therefore in this paper, for the sake of simplicity, we focus on the OPL constructed in incomplete data composed of unobserved two class la-bels. Typical cases of this type occur in a Cox-type model with incomplete data, such as missing binary cova-riates, a cure-mixture model or doubly censored data. As a main result, we establish the asymptotic convergence of the OPL and derive a limit form of the OPL. This result is a foundation or precondition for applying an infi-nite-dimensional Laplace approximation for integral on the baseline hazard. Such a Laplace approximation me-thod will yield the other limit form of the OPL [11], which is useful in discussing the consistency and asymptot-ic normality of the estimators. However, the method is not convenient for showing the convergence of the OPL. For these reasons, it is also valuable to discuss the convergence of the OPL using the arguments employed in this paper.

A matter of interest in practice concerns MC approximations of the OPL. One other significant point is that the result for the convergence of the exact OPL can be easily tailored to the context of the MC approximations. Based on such an argument, we show that a suitable order of the number of MC trials is less than O(n2) Further, in Section 4 we investigate how the asymptotic properties discussed here behave in a finite sample.

In Section 2 we formulate the OPL in incomplete data with two class possibilities, providing several examples of interest; in Section 3 we develop the tools to obtain the main result and show the convergence of the OPL, and in Section 4 we discuss the performances of MC approximations.

2. Observed Full and Partial Likelihoods

2.1. Notations and Motivated Examples

Let Ti min

(

T Ui i

)

∗

= , and i

(

Ti Ui

)

∗

∆ =1 ≤ be the observed survival time and right-censoring indicator of the

-th

i individual, where T i_i∗, = , ,1 n are continuous random variables independent of Ui and 1

( )

⋅ is the indicator function. Suppose that the individuals possess some difference between models or observations identi-fied by the two classes. We define such a class variable by

1 if the -th individual belongs to class 1 0 if the -th individual belongs to class 0

i

i q

i

∗₌ _.

  In the case that qi

∗

expresses the difference between models, assume that the distribution of Ti

∗

follows the proportional hazards model formulated as

( )

_{( )}

( )

_{( )}

( )

(

T ( )

)

0 and

i t t ri ri r Zi

λ =λ  β  β =  β  , __{= , ,}_{0 1} where

λ

0

( )

t is the baseline hazard function,

( )

_{( )}

r x is the function given by

( )1

_{( )}

( )0

_{( )}

( )0

_{( )}

exp , either exp or 0,

r x = x r x = x r x =

( )

i

Z  is the covariate vector ( ) ( ) T

1 b

(Z_i, , Z_i ) from the population of the class , and β is the regression coefficient vector T

1 b

(β, , β ) . As usual, the information on

(

Ti,∆i

)

can be re-expressed using the counting processes _i

( ) (

t =1 T_i≤ ,∆ =t _i 1

)

and at-risk processes _i

( ) (

t =1 T_i ≥t

)

.

In this paper, we consider incomplete data where some of the qi

∗

’s are treated as missing. Let

(

)

( )

(

)

(

)

(

)

is completely observed and

for 0 1 if 1 is complete 1 for 0 1 if 0 is missing

i i

i i i

i

i i

C q

q C q

C q

π ∗

∗ ∗

∗ =

 = = , = ,

 = 

= , = .



  



1 1

Each of these is used to construct the likelihoods. Further, if the event of

{

qi

}

∗₌_

can be expressed by a probability, we use the following notation and assumption

(

)

( )

_{( )}

( )0

_{( )}

(

T

)

{

(

T

)

}

(3)

where Xi is the covariate vector

(

)

T

1 a

1,Xi, , Xi related to qi,

∗

and α is the regression coefficient vector

(

)

T

0 1 a .

α α, , ,α For simplicity, we will write

(

)

T

(

T T

)

T 0 1 , a b

θ= θ θ, ,θ+ = α β, hereafter. Let (q1 qn)

∗₌ ∗_{, ,} ∗

q  be the collection of true qi

∗

’s. In many cases of incomplete data with two class possibilities, the observed full likelihood (OFL) can be generally written as

( )

₍

₎

{ } ( )

(

) ( )

0 0 1n 0

n n

f f

L θ,Λ =

∑

q_{∈ ,} Lπ θ,Λ ;q s q (2.1) with the elements such that

( )

₍

₎

( )

_{( ) ( )}

{

( )

_{( )}

}

( )

₍

₎

( )

0 1 0 0 π

i

i i i i

n q q q q

n

f i i i i i i i

L_π θ, Λ ;q =

∏

₌ p α λ T r β ∆S T; , Λβ

and

( )

n₁ qi, i= =

_∏

q

s s where the space

{ }

0 1 n

, denotes the collection of all the vectors composed of 0 or 1 with length n, q=

(

q1, ,qn

)

expresses one element of

{ }

0 1

n

, , in which there exists one true element q∗,

( )

₍

₎

(

( )

_{( ) ( )}

)

0 exp 0 0 1

i i

S t; ,Λ =β −r β Λ t , = ,

is the survival function of the i-th individual belonging to the class , 0

( )

₀ 0

( )

d t

t λ s s

Λ =

_∫

is the cumulative baseline hazard function, and s is given by either of s=1 or −1 in advance.

In the following three examples, we show how the form of the OFL is related to the representative cases. He-reafter, we will often omit θ when it is clear that a function depends on θ, e.g. r_i( ) =r_i( )

( )

β , p_i( ) =

( )

_{( )}

, i

p α Si( )

(

t;Λ =0

)

Si( )

(

t; ,Λβ 0

)

 

and so on.

Example: Missing Binary Covariates. Let us assume that r( )0

( )

x =r( )1

( )

x but Z_i( )0 ≠Z_i( )1. For example, the first covariate is binary and may be missing, (1 )

i

q

i i

Z ∗ =q∗ and ( )

, 2.

i

q ij ij

Z ∗ =Z j≥ Then, we can write

( )

(

)

(

( )1

)

(

)

(

( )0

)

(qi) _exp T (qi) _exp T ₁ _exp T

i i i i i i

r ∗ β = β Z ∗ =q∗ β Z + −q∗ β Z ,

where T ( ) 1 b 2 .

i

q

i i j j ij

Z q Z

β ∗ β ∗ β

=

= +

∑

In this case, the OFL is

( )

₍

₎

( )

_{( )}

{

_{( )}

( )

_{( )}

}

( )

₍

₎

( )

0 1 0 1 0 0 π

i

n n

f i i i i i i i

L θ ₌ _ _{= ,}p α λ T r β ∆S T β _

 

, Λ =

_∏

∑

   ; , Λ  .



Using the binomial expansion, this can be rewritten as

( )

₍

₎

{ } ( )

( )

( ) ( )

(

)

( )

0 0 1 1 0 0 π

i

i i i i

n

n q q q q

n

f i i i i i i i

L θ p λ T r S T

∆

 

 

 

∈ , =  

, Λ =

∑

_q

∏

;Λ . (2.2)

Example: Cox Cure-Mixture Model. The Cox cure-mixture model [10,12-15] is presumed to hold the propor-tional hazards model for uncured individuals and to be zero-hazard for cured ones. That is, we assume

( )0 ( )1

i i

Z =Z but r( )0

( )

x =0, so that we can write

( )

(

( )1

)

( ) T

exp

i

q

i i i

r ∗ β =q∗ β Z .

We observe that q_i∗=1 . .

(

i e C_i=1

)

if ∆ =i 1 and qi

∗

is missing

(

i e C. . i =0

)

otherwise. The OFL is usually

( )

₍

₎

{

( )

_{( )}

( ) ( )

₍

₎

}

{

( ) ( ) ( )

₍

₎

}

1

1 1 1 0 1 1

0 1 0 0 0

i i

n n

f i i i i i i i i i i

L θ,Λ =

∏

₌ p λ T r S T;Λ ∆ p +p S T;Λ −∆ ; (2.3)

note here that Si( )0

(

t;Λ =0

)

1 for all .t Then, (2.3) can be rewritten in the same form as (2.2).

Example: Doubly Censored Data. In doubly censored data [16,17], left-censored data may be included. Let i

q∗ indicate whether the i-th observation is left-censored or not, the OFL is then

( )

₍

₎

{

_{( )}

( )

}

* ( )

₍

₎

*

{

( )

₍

₎

}

1 *

1 1 1

0 1 0 0 1 0

i i _i i

q _q q

n n

f i i i i i i i

L θ λ T r S T S T

∆ −

=

,Λ =

∏

 ;Λ  − ;Λ  . (2.4) In the phenomenal meaning, the common model is assumed regardless of the type of observations, but we do not define qi

∗

as q_i∗. Here we use the rule r_i( )0 =0 and Zi( )1 =Zi( )0 such that qi

∗

designates the type of model rather than an observation. Under this rule, because we have Si( )0

(

t;Λ =0

)

1 for all t, (2.4) can be expressed as

( )

₍

₎

{ }

( )

( ) ( )

(

)

( )

0 0 1 1 0 0 1

i _i

i i i

n

n q q q q

n

f i i i i i i

L θ _{∈ ,} λ T r ∆S T π

=  

(4)

where note that qi

∗

is defined as missing data (i.e. Ci =0) if the i-th observation is left-censored, and as complete data of q_i∗=1 (i.e. Ci=1) otherwise.

2.2. Observed Partial Likelihood

Let Rn

[ ]

⋅ be the integral operator proposed by [18] to derive the partial likelihood in the Cox model without time-dependent covariates. Without loss of generality, let us suppose that there are no ties. By operating Rn

[ ]

⋅ to the OFL ( )

(

0

)

,

n f

L θ,Λ we have the OPL

( )

_{( )}

{ }0 1n ( )

(

) ( )

,

n n

p p

L θ =

∑

q∈ , Lπ θ;q s q where

( )

_{( )}

( )

₍

₎

( )

_{( )}

{

( )

_{( )}

( )

_{( )}

}

( )

π π 0 1

1 1

π .

i j

i i i

n n

q n

q q q

n n

p n f i i j i i j i

i i

L θ L θ p α r β T r β

∆ =

= =

 

; =q R _ , Λ ;q _=

∏

∑



Let _{( )}

( )

1

(

( )

)

1

(

)

1

log n log n .i

p

p n θ n L θ n i n

∆

− −

=

= +

∏

 To discuss an asymptotic form of p n( )

( )

θ

, we will pre-pare some convenient expressions. First, to pack the expression of π( )qi

i into

( )qi

i

r and ( )qi

i

p , we define

( ) ( )

_{( )}

{ }

( )

{

( )

_{( )}

}

(

)

{ }

( )

{

( )

_{( )}

}

(

)

{ }

( )

{

( )

_{( )}

}

₍

₎

{ }

( )

{

( )

_{( )}

}

1 1 0 0

1

1 1 , , 0, 1,

i i

l h l h

h q l q

i i i i i i i i i

l h l h

i i i i i i i

Z r C q Z r q Z r

C q Z r q Z r h l

β β β

β β

∗ ∗

 

= _ + − _

 

 

+ − _ + − _ =

 

 _

and ( )qi

(

1

)

(

1

)(

1

)

i i i i i i i

X =C −q∗ X + −C −q X.

Using these expressions, let

( )

₍

₎

_{( )}

( )

_{( )}

( )

₍

₎

_{( )}

( ) ( )

_{( )}

( )

₍

₎

( )

(

T

)

0 1

0 T

E E

and E log 1 e

i i i

i i

q q q

n n i i n n i i i

q X

n n i

t t r t t Z r

X α

β β β β

α α

   

, ; = _ _, , ; = _ _,

 

; = _ − + _,

 

q q

q



 



 

 



where E

[ ]

1 1 n n Ri n i Ri

− =

=

∑

_{is an empirical version of the theoretical expectation} E

_{[ ]}

R_i . Further, as an impor-tant definition, let

( )

2n _{{ }}_{0 1}n

(

)

n x x

ν

−

∈ ,

=

∑

∈

 1 

be the n-dimensional version of Minkowski’s measure ν∞. Then, using these notations, p n( )

( )

θ can be

written as

( )

1log

{

n _{{ }}0 1nexp

(

π( )

( )

)

( )

d n

( )

}

p n θ n B n p n θ ν

−

∈ ,

=

_∫q

;q q q ,

  s (2.5)

where

( )

(

)

( )0

(

)

T ( )

( )

{

( )0

(

)

}

( )

π 0 E d 0 log E d

e q_i e

n n i i n n i

p n Z t t t

τ τ

θ_; ₌ _;α ₊ β  ₋ _{, ;}β __ ___,

 

∫

q q  q

    

e

τ is the greatest follow-up time and ₂n ₂ in C1i_.

n

B = ∑= _{The quantity of}

₂

∑ni=1Ci_{is the total number that the} same _{πp n}_{( )}

( )

θ;q ’s are repeated on q∈ ,

{ }

0 1 .n In addition, we define

( )

π_{( )}n( ) π( )n( )

p n θ =p n θ  _p θ;q= _p θ;q



and _{( )}

( )

( )0

(

)

{

T ( )0

(

)

{

( )0

(

)

}

( )0

(

)

}

( )

π + ₀ log d 0

e

p n n n t n t n t t

τ

θ_; ₌ _;α β _{, ;}β∗ ₋ _{, ;}β _{, ;}∗ β∗ _Λ∗ _,

∫

q q q q q

    

which is _{πp n}_{( )}

( )

θ;q in which i

( )

t ’s are replaced by the true intensity

( )

( )1

( )

_{( )}

(

)

_{( )}

( )0

( )

_{( )}

0 0

0 d 1 0 d

t t

i t qi i s ri β s qi i s ri β s

∗∗ ∗ ∗ ∗ ∗ ∗ ∗

Λ =

_∫

 Λ + −

_∫

 Λ ,

where ( )1

(

)

E [

( )

( )qi

( )

qi

( )

],

n t β n i t Zi ri β ∗

∗ ∗

, ;q =  



β

∗ and 0

∗

Λ are the true β and Λ0. In the case of

( )1 i

Z =

( )0 , i

Z such as Cox cure-mixture model or doubly censored data, we have ( )n1

(

t

β

)

( )n1

(

t

β

)

.

∗ ∗ ∗

, ;q = , ;q

(5)

Remark 1. In (2.5), even if we consider a difference or quotient between p n_{( )}

( )

θ and p n( )

( )

θ , we cannot remove the potential increasing factor n in the summands exp(n_π_p_{( )}_n( ))⋅ because of the existence of the oper-ator

Σ

q∈ ,{0 1}n

.

Thus, it may potentially be difficult to apply some of the standard tools in the survival analysis to the asymptotic discussion. For these reasons, our strategy to obtain a limit of p n_{( )}

( )

θ is to regard all the summands of

Σ

q∈ ,{0 1}n as a process on

{ }

0 1 .

n

, We will then derive the result of a weak convergence on

{ }

0 1 ., n



3. Convergence of the Observed Partial Likelihood

We will now discuss how the mean of the log OPL converges to a deterministic function and provide Theorem 1 of the main result. The following conditions are assumed for these discussions.

Conditions A. Let Θ be a compact set of θ which includes

(

T T

)

T ,

θ∗₌ α∗ _,β∗ _whereθ∗_andα∗_{are the} true θ and α. The true baseline function 0

( )

t

∗

Λ is continuous and non-decreasing on t∈ ,

[ ]

0

τ

_e with

( )

0 0 0.

∗

Λ =

A1: X Zi, i( ),i=1,,n 

are i.i.d. vectors from the population of the class =0,1. A2: Pr

(

_i

( )

τ = >_e1

)

0. A3: Λ*0

( )

τ

e < ∞.

A4: E sup_ β∈Θ Z_il( ) ( ) r_i

( )

β  < ∞ =_ ,  0,1 for l=1,, b. A5: EXil < ∞ for l=1,, a. Condition A2 means E[{pi( )0Si( )0

( )

t pi( )1Si( )1

( )

t }SiU

( )

t ] 0

∗ ∗ ₊ ∗ ∗ _>

for all t∈ ,

[ ]

0τ_e because of Pr(_i

( )

t = =1)

( )

E[i t ], where

( )

_{( )}

( )

0

( )

i i

S∗ t =S  t; ,Λβ∗ ∗ and pi( ) pi( )(

α

)

∗ ₌  ∗

and SUi

( )

t is the i-th survival function of right-censoring time under a given ( (qi)).

i i

X Z, ∗ By the compact condition of Θ, we have 0<

( )0

_{( )}

E[pi α <] 1 on α∈Θ as a matter of course. However, in the case that there are no

( )

_{( )}

i

p α as in the ex-ample of doubly censored data, such conditions on α are omitted because ( )n0

(

q;α

)

=0 always.

Theorem 1. Suppose that Conditions A are satisfied. Then, as n→ ∞, _{p n}_{( )}

( )

θ converges almost surely to a deterministic function uniformly on θ∈Θ.

Theorem 1 is proved in Section 3.3. We prepare useful tools for such a proof in Sections 3.1 and 3.2 below. In Section 3.1, we discuss a relation needed to show that two OPL’s converge to the same limit, determining a plan (Lemmas 1 and 2) to obtain Theorem 1. In Section 3.2, following the plan, we provide a tool (Lemma 3) to give a weak convergence of all possible partial-sum processes.

3.1. Relations between Two Observed Partial Likelihoods

Note that the OPL is constructed by an integral on

{ }

0 1, n with the measure νn. Thus, to give two OPL’s with the same limit, it is predicted that the difference between the integrands of two OPL’s should converge weakly to zeros on

{ }

0 1 ,, n for example, by analogy of the dominated convergence theorem. Let

( )

1log

{

n _{{ }}0 1n n

(

π( )

( )

)

( )

d n

( )

}

for and

p θ n B ϕ p θ ν A B n

−

ℵ =

∫q

_{∈ ,} ℵ ;q q q = , ℵ = ,

   _ _

s

be functions that exist around _{p n}_{( )}

( )

θ , where

ϕ

n

( )

x =exp

( )

nx for simplicity. Then, we have the following

lemma about some _πA_p_{( )}ℵ

( )

θ

;q and πBp( )ℵ

( )

θ

;q .

Lemma 1. Suppose that

{ }

( )

(

π π

)

lim sup n 0 1n Ap Bp 0 for all 0

n→∞_θ_∈Θν q∈ , : ℵ θ; −q  ℵ θ;q >ε = ε> , (3.1) with probability 1. Then, as n→ ∞,

( )

as

supθ∈Θ A_pℵ θ −B_pℵ θ → 0,

where →as denotes almost sure convergence. (Proof of Lemma 1). Using Taylor expansions of

(

)

{ } (

1 1

)

1 1 1 1

1 0 0 1 0 1 0 0 1 0

log log , n n n

f − f = f− f − f g −g =n− g − g −g and 1 0 0

(

)

1 0 enh enh enh

n h h

(6)

the difference between Ap_{( )}ℵ

( )

θ

and ( )

( )

B pℵ

θ

 is derived as

( )

{ }

_{{ }} ( )

{

( )

}

( )

1 2

3 0 1 1 2

d

n

A B A B

n pℵ θ − pℵ θ =

∫

_q_{∈ ,} q πpℵ θ; −q πpℵ θ;q q ν q





  _{ } _  

  s

where, with some ξ ξ ξ1, 2, 3_{( )}q ∈ ,

( )

0 1 on

{ }

0 1

n ∈ ,

q and ξl′ = −1 ξl

(

l= , ,1 2 3

( )

q

)

,

( )

(

)

(

( )

)

(

( )

)

(

( )

)

( )

(

( ) ( )

( )

( ) ( )

( )

)

1 1 1 2 2 2

3 3 3

exp exp

and

A B A B

n n

p p p p

A B

n πp πp

ξ θ ξ θ ξ ϕ θ ξ ϕ θ

ϕ ξ θ ξ θ

ℵ ℵ ℵ ℵ

ℵ ℵ

′ ′

= + , = +

′

= ; + ; .

q q q q q

_ _ _ _ _ _

  

Let us assume that 0<B_p_{( )}_ℵ

( )

θ

<A_p_{( )}_ℵ

( )

θ

without loss of generality. Then,

{ }

₂ 1n ₁ and

( )

2 3 d

ν

n

∫

 q q  are bounded on θ∈Θ by some finite values b1 and b2. In fact,

{ }

1

2 1 1 1

n

ξ <

_ _ _and _{( )}

( )

₂ ₂

3 d

ν

n <1

ξ

∫

 q q  are shown by exp

(

B_{( )}

( )

)

exp

(

A_{( )}

( )

)

1.

pℵ θ pℵ θ <

  Therefore, we have

( )

{

( )

}

( )

{ }

( )

(

)

3 2

1 π π

1 2 π π

sup sup d

sup 0 1 0 .

A B A B

n

p p p p

n A B

n p p

b

b b

θ θ

θ

θ θ θ θ ν

ν θ θ

ℵ ℵ ℵ ℵ

ℵ ℵ

− ≤ ; − ;

≤ ∈ , : ; − ; >

∫

q

q q q q q

q q q





  _  

 

s

Applying (3.1) to the above inequality, this lemma is proved. 

Using Lemma 1, for several patterns of _πA_p_{( )}_ℵ and _πB_p_{( )}_ℵ we can investigate whether they converge to the same limit. The important problem is how to show the condition (3.1). For this purpose, we make the use of meaning that a convergence in νn-probability implicated in (3.1), since νn is a probability measure on

{ }

0 1 ., n We have the following lemma to establish the condition (3.1).

Lemma 2. Suppose that

{ }_{0 1} ( )

( )

p

sup n 0

A B

p p

π π

θ∈Θ, ∈ ,q  ℵ θ; −q  ℵ θ;q → , (3.2)

where →_p denotes convergence in probability. Then, (3.1) is established. Remark 2. For simplicity, letting

(

)

A _{( )}

(

)

B _{( )}

(

)

and

( )

(

sup

(

)

,

n p p n n

G θ,q = π ℵ θ; −q π ℵ θ;q fε q =1 θ∈ΘG θ,q >ε

denote

{ }

( )

(

0 1 1

)

{ }0 1n

( )

d

( )

n

n n n n n

Qε =ν q∈ , : fε q = =

∫q

_{∈ ,} fε q ν q

as the area of f_nε

( )

q =1. We can immediately show that (3.2) provides a version of convergence in probability of (3.1), i.e.

(

)

limn Pr Qn 0 1 for all 0

ε _ε

→∞ = = > , (3.3) because it is always satisfied that

{ }

( )

(

)

(

{ }

( )

)

_{{ }}_{0 1}

( )

sup 0 1 0 1 sup sup n

n n

n Gn n Gn Gn

θν q∈ , : θ,q >ε ≤ν q∈ , : θ θ,q >ε ≤ _θ_{∈Θ, ∈ ,}_q θ, .q Thus, we show that the operators of lim and Pr are mutually exchangeable in (3.3) in a proof of Lemma 2.  (Proof of Lemma 2). Note that

{ }

( )

(

0 1 1

)

{ }0 1

( )

d

( ) ( )

1

n n n

Qε ν fε ∞fε ν

∞

∞ _{∈ ,} ∞

= q∈ , : q = =

_∫q

q q ≤

because fn

( )

ε _q

is independent of

(

qn+1,qn+2,

)

due to the n-dimensional projection to

{ }

0 1 n

, from

{ }

0 1, ∞.

From condition (3.2), limits of fn

( )

ε

(7)

{ }0 1

( )

lim n lim n d P 1

n Q n f o

ε ε _ν

∞ ∞

∈ ,

→∞ =

∫

q →∞ q q ≤ .

This shows Pr lim

(

nQn 0

)

1 ε ₌ ₌

via Markov’s inequality such that

(

)

( )

2

Pr limnQn E limn fn d

ε _η ε _ν _η

∞

 

≥ ≤

_∫

_ _ .

q q q

Therefore, condition (3.2) gives (3.1). 

3.2. All Possible Partial-Sum Processes

Note that, by the portmanteau theorem, (3.2) is equivalent to, as n→ ∞,

( )

(

)

( )

(

)

D

(

)

{ }

πAp θ πBp θ 0 on θ 0 1 ,

∞ ℵ ; −q ℵ ;q → , ∈ Θ× ,q

 

where →D denotes convergence in distribution. In this section, we develop a tool to show such a weak convergence on q∈ ,

{ }

0 1∞.

For simplicity, let Ci′ = −1 Ci, qi 1 qi

∗ ∗

′ = − and qi′ = −1 qi. An important key to obtaining (3.2) or a limit of

( )

p n θ

 is a convergence result of ( )_n0

(

t, ;q

β

)

,_n( )1

(

t, ;q

β

∗

)

and( )_n0

( )

q;

α

on q∈ ,

{ }

0 1 .n As representa- tions of more essential terms to consider in the convergence on q∈ ,

{ }

0 1 ,n we denote

( )

(

)

n₁ and

( )

(

( )

)

n₁

( )

n Y t; , =

θ

q

∑

i= q Y ti i ;

θ

n Y t; ,

θ

q′ =

∑

i= q Y ti i′ ; ,

θ

 

where examples of Y ti

( )

;

θ

are

( )

( ) ( )1 1

_{( )}

( ) ( )0 0

, ,

h h

i i i i i i i i i i

C′ t Z r C′ t Z r C X′ and so on. Then, n1 n

(

Y t

( )

;θ ,

)

−

q



can be regarded as En Y ti

( )

θ qi

∗

 ; 

 q , which is the empirical version of the conditional expectation on a unique ∗

q but is calculated letting q be fixed. Let

( )

(

)

( )

E 0 if 0

E 1 if 1

i

i i i

i q

i i i

Y t q q

Y t

Y t q q

θ µ θ θ ∗ ∗ ∗ ∗ ∗   ; =  = ,    ; =   ; =  = ,    

which is the conditional expectation of Y ti

( )

;

θ

on given qi.

∗

Although qi

∗

is treated as unknown in many incomplete problems, this is not the case for terms included in the observed partial likelihood. Hence, for a fixed

{ }

0 1 ,n ∈ ,

q the conditional expectation of n

(

Y t

( )

; ,θ q

)

on

∗

q is E

(

( )

)

₁

(

( )

)

.

i

n

n Y tθ i qiµq∗ Y ti θ ∗

=

 ; , = ;

 q q 

∑

So, letting sn( )l

(

t, ;q β

)

,

( )1

₍

₎

n t, ;β

s q and sn( )0

(

q;α

)

be means to centralize

( )

₍

₎

, l

n t, ;qβ

 ( )1

(

)

n t, ;qβ

 and

( )0

₍

₎

n q;α

 , then

( )

₍

₎

_{( )}

{

( ) ( )

_{( )}

( ) ( )

_{( )}

}

( )

( ) ( )

_{( )}

(

)

(

( )

( ) ( )

_{( )}

)

1 1 0 0

1 1

E

i i

l l l

n i i i i i i i i

n l n l

i i i i i i i i i i

i q i q

t C t q Z r q Z r

n q C t Z r n q C t Z r

β β β

µ∗ β µ∗ β

∗ ∗ − − = =  _′  , ; = _ + _ ′ ′ ′ +

∑

+

∑

,

s q 

 

( )

₍

₎

_{( )}

{

( ) ( )

_{( )}

( ) ( )

_{( )}

}

( )

(

)

( )

_{( )}

(

_{( )}

( )

)

( )

_{( )}

1 1 1 0 0

1 0 1 1 1 1 E i i i i

n i i i i i i i i

q q

n n

i i i i i i i i i i

i q i q

t C t q Z r q Z r

n q C t Z r n q C t Z r

β β β

µ∗ ∗ β µ∗ ∗ β

∗ ∗ − − = =  _′  , ; = _ + _ ′ ′ ′ +

∑

+

∑

,

s q 

 

( )₀

₍

₎

_T

(

T

)

_T ₁

₍

₎

1

E E log 1 e

and i

i

n X

n C q Xi i i n i qi q C Xi i

α

α α  ′∗    α − ₌ ′µ∗ ′

; = _ _− _ + _+ .

 

∑

q s

Example: Missing Binary Covariates. For simplicity, we assume that Ci=1 or 0 occurs independently of

.

i

q∗ Letting ( )

(

( )

)

Pr qi

i i i i

w = C = X Z, ∗ , then

( )

( ) ( )

( )

{

( ) ( ) ( ) ( )

}

( ) ( ) ( ) ( ) ( )

_{( ) ( )}

1 0

1 1 0 0 1

0 1

E E Pr 1 1 E

E E

i

q

i i i i i i i i i i i

l l l U

i i i i i i i i i i i i i i

C q X X C q X Z X w p

C t q Z r q Z r Z r w p S t S t

(8)

while the expectations of terms which may form all possible partial-sums in ( )n0

(

q;α

)

,

( )l

₍

₎

n t, ;q β

 and

( )1

₍

₎

n t, ;q β

 are

(

)

(

)

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( )

( ) ( )

(

)

( ) ( )

_{( )}

( )

0 0 0 0

0 1 1

0

E Pr 0

E 0 E E if 0,

E E if 1,

E 1

E Pr 1

i

i i i i i i i

q

i i i i i i i i

i i i i i

i i i

q

l h l h

i i i i i i i i i i

q

C X X C X q q

X w q X w p p q

X w p p q

X w q

C t Z r Z r C t X Z

µ µ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗   ′ = _ = , _   = _{  }    =         = =       =  _ = _ _ _ _ _ _  ′_   =   ′_ = , , ( ) ( ) ( ) ( )

_{( ) ( )}

( ) ( ) ( ) ( ) ( ) ( )

_{( ) ( )}

( )

0 0 0

0

0 1 1

(1) E if 0 E 0 1. E if 1 E[ ] i i

l h U

i i i i i i

i i

l h U

i i i i i i

i i

q q

Z r w S t S t p

q p

Z r w S t S t p

q p ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗                   _{= ,}  _ _  _ _ = = ,       _{= ,}      

In these calculi, note that the Bayes rule is used, such as

( ) ( )

(

)

( )

_{( )}

(

)

(

)

0 0 0 Pr

E d Pr d Pr .

Pr

i i

i i i i i i i i i i

i

q j X

X w q j X w X q j X w X

q j ∗ ∗ ∗ ∗ =  ₌ ₌ ₌ ₌  

∫

₌

Example: Cox Cure-Mixture Model. In this model, Ci= ∆i is usually assumed. The expectations of terms which may form all possible partial-sums are

(

( )

( ) ( )0 0

)

0

i

l i i i i q C t Z r

µ∗ ′ = and

( )

( ) ( )

(

)

( ) ( )

₍

_{) ( )}

( )

( ) ( )

_{( )}

( ) ( )

_{( )}

( ) ( )

1 1 1 1

1 1 0 0 1 1

0 0

1 1

E Pr 1 1

E 1 E E if 1

E E if 0

E 0

i

q

l l

i i i i i i i i i i i i

q

l l

i i i i i i i i i i

l U

i i i i i i

i i i i

t Z r Z r t X Z q q

Z r Q t q Z r Q t p p q

Z r S t p p q

Z r S t q

µ β β ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗  _ _  ′ ∆ = _ _ − ∆ = , , _ _       ₌  _{ } _ _ _ ₌         =_ =_ ,       ₌  _ = _ _ _ _ _ _   

where 0

( )

(

( )1

)

( )1

( )

Pr 0 1 e d U .

i i i i i i _t i i

Q t = t≤ ,∆ =T X Z, ,q∗= = −

∫

τS∗ s S s Similarly, we have

(

)

( )

( ) ( ) ( ) ( ) 1 1 0 0 0

E 0 E if 1

E Pr 0

E E if 0

i

i i i i i

q

i i i i i i i

q

i i i i

X Q p p q

X X Z q q

X p p q

µ∗ ∗

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗      =          ′ ∆ =  _∆ = , _ = _ _ _ _ .   _{ }_ _ _ _ =

On Weak Convergence. Let

( )

(

( )

)

. i

i i _q i

Y t ;θ =Y t;θ −µ∗ Y t;θ In our application, note that centred

( )

, 1

i

Y t ;

θ

i= , , n are zero-means and mutually independent but are not sampled from an identical distribution. We will therefore discuss the partial-sum processes about Y ti

( )

;

θ

sampled from two populations. Lemma 3 shows that 1

(



( )

)

n

n− Y t; ,θ q converges in probability to zero uniformly on

[

0,τ_e

]

× Θ × ,

{ }

0 1∞; recall that an

n-dimensional element q∈ ,

{ }

0 1n means marginal collection of elements of

{ }

0 1, ∞. In advance, let

(

)

(

)

( )

{

0 0 E 0 0

}

.

t i i

Kδ,θ = t ,θ : Y t ;θ −Y t ;θ <δ

Remark 3. For example, if EY ti

( )

;θ 2< ∞, by Chebyshev’s inequality, we immediately have

( )

(

)

(

1

)

lim_n_→∞sup Pr_q n−_n Y t ; ,θ q >ε = .0

However, a result of interest here is whether Pr and supq can be exchanged, that is, about

( )

(

)

(

1

)

(9)

Incidentally, we cannot obtain the almost sure convergence in this problem, since

( )

(

)

(

( )

)

( )

1

sup_q n−_n Y t ; ,θ q =E_n_1 Y t_i ;θ ≥0 Y t_i ;θ _

is always apart from zero. 

Lemma 3. Let Yi

( )

⋅;

θ

,i= , ,1 n be random elements on D

[ ]

0,

τ

e sampled from one of two distributions (populations) with EY ti

( )

;θ =0 at every

( )

t,θ , where the population to which the i-th element Yi

( )

⋅;

θ

belongs is known and indexed by q_i∗=0 or 1. Suppose that Y1

( )

⋅; , ,

θ

Yn

( )

⋅;

θ

are mutually independent and have EY ti

( )

;θ =0 at every

( )

t,

θ

. If the following three conditions are satisfied,

(i) The class of functions

{

Y ti

( ) ( )

; : , ∈ ,θ tθ

[

0τe

]

× Θ

}

is Glivenko-Cantelli, (ii) E sup _[₀ _]

( )

e i

t∈ ,τ θ, ∈ΘY tθ

 ; < ∞

   and

(iii) ₍ ₎

(

)

( )

0 0 0 0

E sup 1

t i i

t,θ ∈Kδ,θ Y t θ Y tθ O δ

 _; ₋ _; _≤

 

    for every Kt

[ ]

0 e ,

δ

θ τ

, ⊂ , × Θ then, as n→ ∞

( )

(

)

(

)

[ ]

{ }

1

D 0 on 0 0 1

n e

n− Y t ; , →θ q t, , ∈ ,θ q τ × Θ× , ∞.

Lemma 3 is proved in Appendix A.1. The following examples show that the conditions needed in Lemma 3 are satisfied for Y ti

( )

;α =C Xi′ i, Y ti

( )

;β =Ci′i

( )

t Zi( ) ( )1lri1h and

( )

( ) ( )0 0

₍

₎

0 1 .

l h

i i i i i

Y t;β =C′ t Z r l h, = ,

Example 1. Let

( )

(

)

.

i

i i i _q i i

Y t ; =θ C X′ −µ∗ C X′ From Condition A5, we have Condition (ii). Since Y ti

( )

;

θ

is independent of

( )

t,

θ

, Conditions (i) and (iii) are clearly satisfied.

Example 2. Let

( )

( ) ( )

(

( )

( ) ( )

)

, i

l h l h

i i i i i _q i i i i

Y t ;β =C′ t Z  r −µ∗ C′ t Z  r = ,0 1. Conditions (i) and (ii) are shown

by Conditions A1 and A4. Arbitrary

(

t0 0

)

Kt

δ β

β _,

, ∈ satisfies β0−β ≤O

( )

1 δ by Condition A4 and

(

0 0

)

(

0

)

( )

E_Y t_i ;β −Y t_i ;β _≤O 1 δ by Condition A3, where ⋅ means the Euclidean norm for vectors. Hence, Condition (iii) is satisfied.

3.3. Proof of Theorem 1

Consider _{p n}_{( )}

( )

θ ,_{p n}_{( )}

( )

θ and†_{p n}_{( )}

( )

θ in which the random quantities are reduced less than that of _{p n}_{( )}

( )

θ , where _{p n}_{( )}

( )

θ and †_{p n}_{( )}

( )

θ are _{p n}_{( )}

( )

θ in which _π_{p n}_{( )}

( )

θ is replaced by π_{p n}_{( )}

( )

θ;q and †πp n( )

( )

θ;q ,

i.e. _{( )}

( )

_{( )}

( )

( )( )  ( )( ) ( )

( )

( )( ) ( )( )

†

π n π n π n π n

p n p n p n p n

p θ p θ p θ p θ

θ θ θ θ

; = ; ; = ;

= , =

q q q q

   

_    and

( )

(

)

( )

(

)

{

(

)

}

( )

(

)

( )

(

)

( )

(

)

{

( )

(

)

}

( )

(

)

( )

0 T 1 (0) 0

0

π 0

0 1 0 0

† T

0

π 0

log d

log d .

e

n n n n

p n

n n n n

p n

t t t t

τ τ

θ α β β β β

∗ ∗ ∗ ∗

 

; = ; + _ , ; − , ; , ; _ Λ ,

 

; = ; + _ , ; − , ; , ; _ Λ

∫

q s q s q q s q

q s q s q s q s q

 



( ) ( )

p n p n

π and π : It is satisfied that

( )

(0)

(

)

( )

0 as

0 0

sups sEn d i t s n t β d t 0 ∗ ∗ ∗

− , ; Λ →

 

 

∫



∫

 q

and sup _{{ }}_{0 1} T ₀E ( )d

( )

T ₀ ( )1

(

)

d 0

( )

as0

i

s _q s

n

n i i

s Z t t t

β ∞ β β β

∗ ∗ , , ∈ ,q

∫

   −

∫

 , ;q Λ → by Conditions A1 and A4, similar to the standard Cox model (see [19]). Thus,

{ }0 1 π ( )

( )

π ( )

( )

as

sup_θ_{∈Θ, ∈ ,} ∞ _{p n} θ; − _{p n} θ; → 0

q  q  q

is obtained as n→ ∞.

( ) ( )

p n  p n

_π and_π : We have

{ }

( )0

₍

₎

( )0

₍

₎

p 0 1

sup_θ_{∈Θ, ∈ ,} ∞ n ;α − n ;α → 0

q  q s q

using Lemma 3 in Example 1 and applying the strong law of large numbers (SLLN) to logpi( )1

( )

α and T

i i i

C q X

α

_′∗

(10)

( )0

(

)

( )0

(

)

as

supt n t β n t β 0

∗ ∗ ∗ ∗

, ;q −s , ;q →



by applying the SLLN on D

[ ]

0,

τ

_e (see [1]) from Conditions A1 and A4. In addition, we have

{ }

( )1

(

)

( )1

(

)

p 0 1 ,

sup _n _n 0

t t β t β

∞ ∗ ∗

∈ , , ; − , ; →

q  q s q

using Lemma 3 in Example 2. Hence, _{{ }} _{( )}

( )

_{( )}

( )

0 1

sup_θ_{∈Θ, ∈ ,} ∞ _π_{p n} θ; − _π_{p n} θ;

q q q



  converges in probability to zero as n→ ∞.

( ) ( )

p n p n _ _†

and

π π : It satisfies that

( )0

₍

₎

( )0

₍

₎

p

sup_θ_{, ,}_t_q log_n t, ;q β −logs_n t, ;q β → 0

using Lemma 3 in Example 2 and the continuous mapping theorem about log-function. For the latter application, note that sn( )0

(

t, ;q β

)

is bounded away from zero on

[

0τe

]

{ }

0 1

∞

, × , × Θ by Condition A2. Hence,

{ } ( )

( )

†

π π

0 1

sup_θ_{∈Θ, ∈ ,} ∞ _{p n} θ; − _{p n} θ;

q  q  q

converges in probability to zero as n→ ∞.

Applying the above three results to Lemmas 1 and 2, therefore, we obtain

( )

†( )

( )

( ) as as as

supθ p n ( )θ −p n θ → 0, supθ p n θ −p n θ → 0 and supθ p n θ −p n θ → 0,

respectively, so that we conclude

( )

†( )

( )

as

sup_θ_∈Θ _{p n} θ −_{p n} θ → 0 asn→ ∞. (3.4) Although (3.4) shows that the limit of p n_{( )}

( )

θ is equivalent to that of ( )

( )

† p n θ

 , †p n_{( )}

( )

θ still depends on n and qi

∗

’s. We will therefore investigate the limit form of †p n_{( )}

( )

θ further. In discussing a convergence about the form 1

( )

1 _i n

i

i q

n−

∑

₌qµ∗ ⋅ and

( )

1

1 _i n

i

i q

n−

∑

₌q′µ∗ ⋅ included in

( )0 n s and

( )1 n

s of the partial sums, note that they can be written as

( )

(

)

( )

(

)

( )

(

)

( )

(

)

( )

1

1 0

1 1

1 0

1

E 1 1 E 1 0

E 0 1 E 0 0

i

n

i n i i n i i

i q

n

i n i i n i i

i q

n q q q q q

µ µ µ

∗

− ∗ ∗

=

− ∗ ∗

=

   

⋅ = _ = , = _ ⋅ + _ = , = _ ⋅ ,

   

′ ⋅ = _ = , = _ ⋅ + _ = , = _ ⋅ .

∑

1 1

Let v( )h

( )

E

(

qi h qi

)

,h 0 1.

∗

 

= _ = , = _ , = ,

q 1

 _ _

Similarly to Lemma 3 (proof of s2), we show that

(

)

( )

_{( )}

p

E_n_1 q_i = ,h q_i∗=l _→ v_h q asn→ ∞

at arbitrary point q∈ ,

{ }

0 1 ∞. In particular, because of

(

) (

)

(

)

Pr qi h qi Xi Pr qi Xi Pr qi h X qi i ,

∗ ∗ ∗

= , = = = = , =

note that

( )

_{( )}

_{E E}

(

)

_E ( )_Pr

(

)

h i i i i i i i

v = _ _ q = , =h q∗ X __= _p∗ q =h X q, =∗ _

 

q 1

 _  _

and then v₁( )

( )

q +v₀( )

( )

q = Ep_i∗( ) _. We have the following lemma.

Lemma 4. 1

( )

1

( )

1 _i and 1 _i

n n

i i

i q i q

n−

∑

₌qµ∗ ⋅ n−

∑

₌q′µ∗ ⋅ converge in probability to

( )1

_{( ) ( )}

( )0

_{( ) ( )}

( )0

_{( ) ( )}

( )0

_{( ) ( )}

1 1 1 0 and 0 1 0 0

v q µ ⋅ +v q µ ⋅ v q µ ⋅ +v q µ ⋅

uniformly on q∈ ,

{ }

0 1 ∞.

A proof of Lemma 4 is provided briefly in Appendix A.2 since it is similar to Lemma 3. Now, applying Lemma 4 to s( )n0

(

t, ;qβ

)

,sn( )1

(

t, ;qβ

)

and s( )n0

(

q;α

)

, we obtain their limits as

( )

₍

₎

_{( )}

{

( ) ( )

_{( )}

( ) ( )

_{( )}

}

( )

_{( )}

(

_{( )}

( ) ( )

_{( )}

)

( )

_{( )}

(

_{( )}

( ) ( )

_{( )}

)

( )

_{( )}

(

_{( )}

( ) ( )

_{( )}

)

( )

_{( )}

(

_{( )}

( ) ( )

_{( )}

)

0 1 1 0 0

1 1 1 0 1 1

1 1 1 0

1 0 0 0 0 0

0 1 0 0

E

l l

i i i i i i i i

l l

i i i i i i i i

l l

i i i i i i i i

t C t q Z r q Z r

v C t Z r v C t Z r

β β β

µ β µ β

∗ ∗

 _′ 

, ; = _ + _

′ ′

+ +

′ ′

+ + ,

s q

q q

 

 