Weighted estimation of conditional mean function with truncated, censored and dependent data

(1)

Weighted estimation of conditional mean function with

truncated, censored and dependent data

Han-Ying Lianga, Mar´ıa del Carmen Iglesias-P´erezb

a_{School of Mathematical Science, Tongji University, Shanghai 200092, P. R. China} E-mail: [email protected]

b_{Department of Statistics and OR, Escuela de Ingenier´ıa Forestal Organization, Universidad de Vigo,}

Campus A Xunqueira, 36005, Pontevedra, Spain. E-mail: [email protected]

Abstract. By applying the empirical likelihood method, we construct a new weighted estimator

of the conditional mean function for a left-truncated and right-censored model. Assuming that the observations form a stationary α-mixing sequence, we derive weak convergence with a certain rate and prove asymptotic normality of the weighted estimator. The asymptotic normality shows that the weighted estimator preserves the bias, variance, and, more importantly, automatic good boundary behavior of a local linear estimator of the conditional mean function. Also, a Berry-Esseen type bound for the weighted estimator is established. A simulation study is conducted to study the ﬁnite sample behavior of the new estimator and a real data application is provided.

Key words and phrases: Asymptotic normality; Berry-Esseen type bound; conditional mean

func-tion; truncated and censored data; weighted estimator; α-mixing

AMS 2000 subject classifications: 62N01; 62E20

1 Introduction

Regression techniques are commonly used for exploring the relationship between response Y and covariate X. This can be done via estimating the function m(x) = E(ϕ(Y )|X = x), where the (known) transformation ϕ(·) is introduced to include various targets of interest. For example, taking ϕ(y) = yr we get the rth conditional moment (when r = 1, the 1th conditional moment is the regression function), and if we take ϕ(y) = I(y≤ t) then m(x) becomes the conditional distribution function of Y given X = x at t. There is a large literature about the Nadaraya-Watson (NW) estimator of the conditional mean under complete data and different kinds of associations, like mixing processes and Markovian chains, for example, Bercu et al. (2015) for the recursive NW estimator associated with the recursive sliced inverse regression method, Huang and Su (2015) for characterizations based on regression assumptions of order statistics, Yanev and Ahsanullah (2012) for characterizations of Student’s t-distribution via regressions of order statistics, Ling and Wu (2012) for modified kernel regression estimation for functional data, Goegebeur et al. (2014) for nonparametric regression estimation of conditional tails. For more details see Györfi et al. (1989) and Bosq (1998) and the references therein. Masry and Fan (1997) considered estimating E(Y|X = x) for mixing sequences using the local polynomial estimator. For more references

(2)

about non-parametric regression techniques with dependent data see, for example, the bibliographical notes given in Fan and Yao (2003).

In some fields as reliability, survival analysis or economics, right-censored or/and left-truncated data are often encountered. The random variable Y can be regarded as the lifetime of a device, the survival time of a patient or the duration time of some economic variable. El Ghouch and Van Keilegom (2008) discussed for the first time asymptotic normality for an estimator of m(x) based on a transformed variable of Y , for the right-censored model with dependent data; Liang et al. (2011) for local polynomial estimation of the conditional mean function with dependent truncated data; Wang et al. (2013) for local polynomial quasi-likelihood regression with truncated and dependent data; de Uña- Álvarez et al. (2010) for wavelet regression with left-truncated dependent data, which is useful in the presence of discontinuities; Zou and Liang (2017) for wavelet estimation of density for censored data with censoring indicator missing at random. Recently, Liang et al. (2015) constructed the NW and the local linear (LL) estimators of m(x) and its derivatives for the left truncated and right censored (LTRC) model, and established asymptotic normality for the NW and LL estimators of m(x) under dependent observations. It can be seen easily that the implementation of the NW estimate of the regression function is much easier than the LL method, and the estimated values of the regression function are always within the range of the response variable. However, it is well-known that the NW method is inferior to the LL approach due to the limitations such as larger bias, non-adaptation and boundary effects (see, e.g., Fan and Gijbels (1996)). To take the advantages of both the NW method and LL estimate, Hall and Presnell (1999) proposed for the first time a weighted NW estimator of the regression function under the independent and completed samples. Liang (2012) used this method to define a weighted NW estimator for the nonparametric regression with truncated data.

We, in this paper, use empirical likelihood method to construct a new weighted estimator of m(x) for the LTRC model, and investigate weak consistency and asymptotic normality of the weighted estimator under α-mixing observations. Also, a Berry-Esseen type bound for the weighted estimator is established. These results are new, even in independent cases. Theoretical results in Section 3 and simulation study in Section 4 show that the weighted estimator shares the advantages of both the NW method and LL ﬁtting. Speciﬁcally, the new estimator reduces the bias of the NW estimator (matching that of the LL estimator).

Let (X, Y, T, W ) be a random vector, where Y is the lifetime with distribution function (df) F , T is random left truncation time with the df L and W denotes random right censoring time with the df G. Assume that X admits df V (·) and density v(·). In the LTRC model one observes (X, Z, T, δ) if Z ≥ T , where Z = min(Y, W ) and δ = I(Y ≤ W ). When Z < T nothing is observed. Clearly, if Y is independent of W , then Z has the df H = 1− (1 − F )(1 − G). Take θ = P (T ≤ Z), then necessarily, we assume θ > 0. Let (Xi, Zi, Ti, δi), for i = 1, 2,· · · , n, be a stationary random sample

(3)

random variables, as usual in survival analysis. Iglesias-Pérez and González-Manteiga (1999) defined a generalized product-limit estimator of conditional distribution function F (y|x) of Y given X = x for the LTRC data by

b Fn(y|x) = 1 − n ∏ i=1 ( 1−∑_n I(Zi ≤ y)δiBni(x) j=1I(Tj ≤ Zi ≤ Zj)Bnj(x) ) , (1.1) where Bni(x) = K(x−X_h_ni)/ ∑n j=1K( x−Xj

hn ), K denotes a kernel function, and 0 < hn → 0 is a

band-width parameter. Under the i.i.d. setting, Iglesias-Pérez and González-Manteiga (1999) obtained for the first time an almost sure representation and asymptotic normality of bFn(·). Liang et al. (2012)

investigated asymptotic properties of bFn(·) with α-mixing data.

In the sequel,{(Xi, Zi, Ti, δi) =: ζi, 1≤ i ≤ n} is assumed to be a stationary α-mixing sequence of

random vectors. Recall that the sequence{ζk, k≥ 1} is said to be α-mixing if the α-mixing coeﬃcient

α(n) :def= sup

k≥1

sup{|P (A ∩ B) − P (A)P (B)| : A ∈ F_n+k∞ , B∈ F₁k}

converges to zero as n → ∞, where F_lm = σ{ζl, ζl+1, . . . , ζm} denotes the σ-algebra generated by

ζl, ζl+1, . . . , ζm with l ≤ m. Among various mixing conditions used in the literature, the α-mixing is

reasonably weak, and has many practical applications. Many stochastic processes and time series are known to be α-mixing. Withers (1981) obtained various conditions for a linear process to be α-mixing. Also see, e.g., Doukhan (1994), page 99, for more details; and Cai and Kim (2003) for motivation in the scope of survival analysis. In fact, under very mild assumptions linear autoregressive and more generally bilinear time series models are α-mixing with mixing coeﬃcients decaying exponentially, i.e., α(n) = O(ρn) for some 0 < ρ < 1.

The rest of the paper is organized as follows. In the next section we introduce the estimators of m(x). Main results are formulated in Section 3. A simulation study is presented in Section 4, while a real data illustration is given in Section 5. Section 6 gives the proofs of the main results. Some preliminary lemmas, which are used in the proofs of the main results, are collected in Appendix (Section 7).

2 Estimators

In the sequel, for any df Q(y) = P (η≤ y), its density function is denoted by q(y), we denote the left and right support endpoints by aQ= inf{y : Q(y) > 0} and bQ = sup{y : Q(y) < 1}, respectively.

(4)

We observe that V∗(x) = P (X≤ x|T ≤ Z) = θ−1∫_−∞x θ(t)v(t)dt, which gives v∗(x) = θ−1θ(x)v(x). In addition, assume that Y, T and W are conditionally independent at X = x, and F (y|x) and G(y|x) are continuous with respect to y then it is easy to verify that

C(y|x) = θ−1(x)L(y|x)(1 − G(y|x))(1 − F (y|x)) = θ−1(x)L(y|x)(1 − H(y|x))

and H₁∗(y|x) = θ−1(x)∫₀yL(t|x)(1−G(t|x))f(t|x)dt, which gives h∗₁(y|x) = θ−1(x)L(y|x)(1−G(y|x))f(y|x). The estimator of C(y|x) is deﬁned by bCn(y|x) =

∑n

i=1I(Ti ≤ y ≤ Zi)Bni(x).

From now on we only consider classes of the functions ϕ that satisfy the following conditions: (H):(H1) ϕ vanishes outside the interval [τ1, τ2] for some aL(·|x) < τ1 ≤ τ2 < bH(·|x);

(H2) ϕ is a bounded function on [τ1, τ2].

Remark 2.1 If the lifetime Y has no left truncation (i.e., T = 0), then one can choose τ1 = 0 in the condition (H), in this case, (H) reduces to assumption (H) of El Ghouch and Van Keilegom (2008).

Under condition (H), if Λ(·) is a bounded function with bounded support, and the functions v∗(t) and f (y|t) are continuous at t = x we ﬁnd

E{Λan(X− x)[δϕ(Z)C−1(Z|X)(1 − F (Z|X)) − m(x)]

}

→ 0, (2.1)

where Λan(·) = Λ(·/an)/an and 0 < an→ 0.

If m(z) is assumed to have (p + 1)th continuous derivative in a small neighbourhood of x, it can be approximated by a polynomial function as

m(z)≈ m(x) + · · · + m(p)(x)(z− x)p/p! := β0+· · · + βp(z− x)p.

Note that (2.1) suggests that m(x) can be viewed as a nonparametric regression of Y_i∗ = δiϕ(Zi)C−1(Zi|Xi)(1−

F (Zi|Xi)) on Xi = x. Therefore, based on the idea of the local polynomial smoother, the estimator

of (m(x),· · · , m(p)(x)/p!)τ is deﬁned as ( bβ0,· · · , bβp)τ, which minimizes n ∑ i=1 ( b Y_i∗− p ∑ j=0 βj(Xi− x)j )2 Λan(Xi− x), (2.2) where bY_i∗ = δiϕ(Zi) bCn−1(Zi|Xi)(1− bFn(Zi|Xi)).

From (2.2), when p = 0, we obtain the following NW type estimator of m(x) b mN W(x) = n ∑ i=1 b Y_i∗wN W_i (x) with w_iN W(x) = ∑Λan(Xi− x) n j=1Λan(Xj− x) ,

when p = 1, the LL estimator of m(x) is mbLL(x) =

∑n i=1Ybi∗wiLL(x), where w_iLL(x) = Λan(Xi− x){sn2− (Xi− x)sn1} sn0sn2− s2_n1 with snj = n ∑ i=1 (Xi− x)jΛan(Xi− x).

(5)

Remark 2.2 The estimators mbN W(x) and mbLL(x) are constructed for the ﬁrst time in Liang et al.

(2015), who investigated the asymptotic normality of mbN W(x) andmbLL(x) under dependent

assump-tions.

It is easy to see that the LL weights w_iLL(x) satisfy:

n ∑ i=1 w_iLL(x) = 1 and n ∑ i=1 (Xi− x)wLLi (x) = 0. (2.3)

But for the NW type weights wN W_i (x) this moment condition is not fulﬁlled.

In view of the information (2.3), we use the empirical likelihood method to deﬁne the new weighted estimator of m(x) for the LTRC model as follows. We ﬁrst introduce the empirical likelihood function L =∏n

i=1pi(x), where p1(x),· · · , pn(x) are subject to the restrictions:

pi(x)≥ 0, n ∑ i=1 pi(x) = 1 and n ∑ i=1 pi(x)(Xi− x)Λan(Xi− x) = 0. (2.4)

The maximum ofL can be found via Lagrange multipliers. It may be shown that Lmax=

∏n

i=1bpi(x),

where bpi(x) = _n1 ·_1+η(X 1

i−x)Λan(Xi−x), i = 1,· · · , n, and η is the solution of the following equation:

n ∑ i=1 (Xi− x)Λan(Xi− x) 1 + η(Xi− x)Λan(Xi− x) = 0. (2.5)

Equivalently, η is chosen to minimize Ln(η) =

∑n

i=1log{1 + η(Xi− x)Λan(Xi− x)}, or to ﬁnd the root

of the equation L′_n(η) = 0 by using the New-Raphson iteration scheme. The proposed weighted estimator of m(x) is

b mn(x) = n ∑ i=1 b Y_i∗wni(x) with wni(x) = bpi (x)Λan(Xi− x) ∑n j=1bpj(x)Λan(Xj− x) .

Remark 2.3 Although the estimatormbn(x) is considered only for covariate X in univariate case, we

would like to mention that the basic ideas of our methodology hold for multivariate situations.

3 Main Results

Throughout the paper, let C, C1,· · · and c0, c, c1,· · · denote generic ﬁnite positive constants, whose

values may change from line to line, and let Φ(u) stand for the standard normal distribution function and [t] be the integer part of t. An= O(Bn) means|An| ≤ C|Bn| and f(i,j)(x, y) := ∂i+jf (x, y)/∂xi∂yj.

Let I = [x1, x2] be an interval contained in the support of m∗(·) such that infx∈Iϵm∗(x)≥ ϵ0 > 0 for

Iϵ= [x1− ϵ, x2+ ϵ] with chosen small ϵ > 0.

Throughout the paper we assume that α(n) = O(n−λ) for some positive constant λ. In order to formulate the main results, we need the following assumptions.

(6)

(A0) m(x) has 2th continuous derivative for x∈ I.

(A1) K(·) is a Lipschitz-continuous density function with bounded support and ∫_RtjK(t)dt = 0 for j = 1, 2,· · · k0− 1 and k0 > 2.

(A2) (i) For s∈ Iϵ, Y, T and W are conditionally independent at X = s;

(ii) τ1 and τ2 are two real numbers such that aL(·|x) < τ1 ≤ τ2 < bH(·|x) and aL(·|x) < aH(·|x) for

x∈ Iϵ.

(A3) Let k0 > 2. The ﬁrst k0 derivatives of functions θ(s) and v(s) are bounded for s∈ Iϵ, and the

ﬁrst k0 partial derivatives with respect to s of functions L(y|s), G(y|s), F (y|s), l(y|s), g(y|s)

and f (y|s) are bounded for (s, y) ∈ Iϵ× R.

(A4) For all integers j ≥ 1, the joint conditional density v∗_j(·, ·) of X1 and Xj+1 exists onR × R and

satisﬁes v∗_j(s1, s2)≤ C1 for (s1, s2)∈ Iϵ× Iϵ.

(A5) (i) h−2_n (nhn/ ln(n))−(λ−3)/2 = O(1); (ii)

∑_∞

n=1h−2n (nhn/ ln(n))−(λ−3)/2<∞.

(A6) Λ(·) is a symmetric and bounded density function with a bounded support [−1, 1]. (A7) Function m(x) has continuous second order derivatives at x∈ I.

(A8) Assume that nan → ∞, and that the sequence α(n) satisﬁes for positive integers q := qn that

q = o((nan)1/2) and limn→∞(na−1n )1/2α(q) = 0.

Remark 3.1 Conditions (A1)-(A3) and (A6)-(A7) are standard regularity conditions and used

com-monly in the literature, see, e.g., Iglesias-P´erez and Gonz´alez-Manteiga (1999) for conditions (A1)-(A3), El Ghouch and Van Keilegom (2008) for conditions (A6)-(A7); Condition (A4) is mainly tech-nical, which is employed to simplify the calculations of covariances in the proof, these assumptions are redundant for the independent setting; the role of condition (A5) is to obtain the rate of conver-gence of the estimator bFn(y|x). Condition (A8) is used to prove asymptotic normality for an α-mixing

sequence. Set ∆ij = ∫ RsiΛj(s)ds, σ2(x) = θ(x) ∫ R ϕ 2_{(y)f (y}_|x)dy

L(y|x)(1−G(y|x))− m2(x) and Γ2(x) =

θ∆02σ2(x)

θ(x)v(x) .

Theorem 3.1 Let x∈ I and α(n) = O(n−λ) for some λ≥ 2.1. Suppose that conditions (H), (A0)-(A4), (A5)(i) and (A6)-(A7) are satisﬁed. If na1+r0

n = O(1) for some constant r0 ≥ 2, then

b mn(x)− m(x) = a2_n 2 m ′′_(x)∆ 21+ Op((nan)−1/2) + op(a2n) + Op ((_ln(n) nhn )1/2 + hk0 n ) .

Theorem 3.2 Under the assumptions of Theorem 3.1 and Γ2(x) > 0, if (A8) holds, then √ nan { b mn(x)− m(x) − a2_n 2 m ′′_(x)∆ 21+ op(a2n) + Op ((_ln(n) nhn )1/2 + hk0 n )} _D → N(0, Γ2(x)).

(7)

Remark 3.2 (a) It may be seen from Theorem 3.1 that the weighted estimator mbn(x) → m(x) in

probability with a rate, which implies thatmbn(x) is consistent.

(b) From Theorem 3.2, if anln(n)/hn → 0 and nanh2kn0 → 0, it is easy to see that the asymptotic

mean-squared errors (AMSE) ofmbn(x) is AM SE(mbn) = a4nB + (nan)−1θ∆02(θ(x)v(x))−1σ2(x) with B = ∆2₂₁[2−1m′′(x)]2, thus we get the aymptotic optimal bandwidth for the bandwidth an,

aoptn = (θ∆02σ

2_(x)

4Bθ(x)v(x))1/5n−1/5 by minimizing the AMSE of mbn(x). (c) Note that Γ2_{(x) = ∆} 02(v∗(x))−1(σ21(x)− m2(x)), where σ₁2(x) = ∫ Rϕ 2_(y)C−2_(y_{|x)(1 − F (y|x))}2_dH∗ 1(y|x)

and m(x) =∫_Rϕ(y)C−1(y|x)(1−F (y|x))dH₁∗(y|x). Deﬁne the estimators bv∗_n(x) = _nh1

n ∑n i=1K (_x_−X_i hn ) and bH_1n∗ (y|x) =∑n_i=1I(Zi≤ y, δi= 1)Bni(x) of v∗(x) and H1∗(y|x), respectively. Hence, one can get plug-in estimatorsbσ2_1n(x) =∑n_i=1ϕ2(Zi)δi(1− bFn(Zi|x))2Bni(x)

b C2 n(Zi|x) andm(x) =b ∑n i=1 ϕ(Zi)δi(1− bFn(Zi|x))Bni(x) b Cn(Zi|x)

of σ2₁(x) and m(x), respectively. Hence, a plug-in estimator of Γ2(x) is bΓ2_n(x) = bσ21n(x)− bm2(x)

bv∗ n(x)

∫

RΛ2(t)dt. (d) For the LL estimator mbLL(x) of m(x), Liang et al. (2015) proved that

√ nan { b mLL(x)− m(x) − a2_n 2 m ′′_(x)∆ 21+ op(a2n) + Op ((_ln(n) nhn )1/2 + hk0 n )} _D → N(0, Γ2(x)). Obviously, the estimators mbLL(x) and mbn(x) of m(x) have same asymptotic normality, i.e.,

b

mn(x) matches both the asymptotic bias and the asymptotic variance of mbLL(x).

In order to give a Berry-Esseen type bound formbn(x) to assess the quality of the normal

approx-imation in Theorem 3.2, we need the following additional assumption.

(B) µ := µn and ν := νn are positive integers such that µ + ν≤ n, µ/n → 0 and νµ−1 → 0.

Put γ1n = (na5n)1/2, γ2n = (anln(n)/hn)1/4+ (nanhn2k0)1/4, γ3n = a−1n (ln(n)/(nan))(λ−1)/2, γ4n = νµ−1a−ς/(2+ς)n u(µ), γ5n= (µ/n)βa−ς(1+β)/(2+ς)n , γ6n= nµ−1α(ν) and u(µ) =

∑_∞

i=µας/(2+ς)(i).

Theorem 3.3 Let x ∈ I, α(n) = O(n−λ) and ∑∞_n=1_a1

n(

ln(n) nan)

(λ−1)/2 _< _{∞ for some λ > (2 + ς)/ς,} where ς > 0. Suppose that conditions (H), (A0)-(A4), (A5)(ii), (A6)-(A7) and (B) are satisﬁed, and that Γ2(x) > 0. Let γin → 0 (i = 1, · · · , 6) for 0 < 2β < ς and β ≤ [ςλ − (2 + ς)]/[2λ + (2 + ς)]. If

na1+r0

n = O(1) and ln(n)/(nan)1−2/r0 = O(1) for some constant r0 > 2, then for (20λ− 1)/[λ(10λ −

1)]≤ ρ < 1 we have sup u P((nan)1/2(mbn(x)− m(x)) Γ(x) ≤ u ) − Φ(u) = O((µ−1ν)1/3+ (n−1µ)1/3+ a(1_n−ρ)/3 +(ln(n)/(nan))1/4+ γ1n+ γ2n+ γ3n+ γ4n1/3+ γ5n+ γ6n1/4 ) .

(8)

Remark 3.3 The assumptions γin→ 0 (i = 1, · · · , 6) in Theorem 3.3 can be satisﬁed by appropriate

choosing for an, hn, p and q when λ is large enough (note that, if we replace α(n) = O(n−λ) by

the exponential decay α(n) = O(ρn) for some 0 < ρ < 1, which has been used by some authors (see Doukhan (1994)), then λ can be arbitrarily large).

4 Simulation Study

In this section, we conduce a simulated study to investigate the ﬁnite sample performance of the new estimator mbn(x) (denoted by NWP) in the case ϕ(y) = y, and compare it with that ofmbN W(x) and

b

mLL(x) (results by Liang et al (2015)). In particular, (i) we compare the global mean squared errors

of estimatorsmbn(x), mbN W(x) and mbLL(x) for diﬀerent sample sizes, truncation rates, percentage of

censoring and bandwidth choices, and we explore their graphical ﬁt to the true underlying curve; (ii) we examine how good the asymptotic normality of the new estimator is by Normal-Probability-plots ofmbn(x) at the speciﬁc point x = 1.

In order to obtain an α-mixing observed sequence {Xi, Zi, Ti, δi}, we generate the data as in the

simulated study by Liang et al. (2015), which is as follows.

(1) Drawing of the ﬁrst observation (X1, Z1, T1, δ1) in the ﬁnal sample. Step 1. Draw e1 ∼ N(0, 1), take X1 = 0.5e1;

Step 2. Compute Y1and W1, respectively, from the model Y1 = sin(πX1)+ϕ1(1+0.3 cos(πX1))ϵ1, W1 = sin(πX1) + 0.5ϕ2(1 + 0.3 cos(πX1)) + ϕ3(1 + 0.3 cos(πX1))˜ϵ1, where both ϵ1 and ˜ϵ1 are N (0, 1) random variables, ϵ1, ˜ϵ1and X1are mutually independent, and ϕi (i = 1, 2, 3) are chosen

(see below) to control the percentage of censoring. Take Z1 = min(Y1, W1), δ1= I(Y1≤ W1); Step 3. Draw independently T1 ∼ N(µ, 1), where µ is adapted in order to get diﬀerent values of θ. If Z1 < T1, reject the datum (X1, Z1, T1, δ1) and go back to Step 2; do this until Z1≥ T1.

(2) Drawing of the second observation (X2, Z2, T2, δ2) in the ﬁnal sample.

Step 4. Draw X2 according to the AR(1) model X2 = ρX1 + 0.5e2, where e2 ∼ N(0, 1) is

independent of X1, and |ρ| < 1 is some constant, which is chosen to control the dependence of

the observations;

Step 5. Compute Y2and W2, respectively, from the model Y2 = sin(πX2)+ϕ1(1+0.3 cos(πX2))ϵ2, W2 = sin(πX2) + 0.5ϕ2(1 + 0.3 cos(πX2)) + ϕ3(1 + 0.3 cos(πX2))˜ϵ2, where both ϵ2 and ˜ϵ2 are N (0, 1) random variables, and ϵ2, ˜ϵ2 and X2 are mutually independent. Take Z2= min(Y2, W2), δ2 = I(Y2 ≤ W2);

Step 6. Draw independently T2 ∼ N(µ, 1). If Z2 < T2, reject the datum (X2, Z2, T2, δ2) and go

(9)

By replicating the process (2) above, we generate the observed data (Xi, Zi, Ti, δi), i = 1,· · · , n.

The generating process shows that Xi= ρXi−1+ 0.5ei, Yi = sin(πXi) + ϕ1(1 + 0.3 cos(πXi))ϵi, Wi =

sin(πXi) + 0.5ϕ2(1 + 0.3 cos(πXi)) + ϕ3(1 + 0.3 cos(πXi))˜ϵi, Zi= min(Yi, Wi), δi = I(Yi ≤ Wi), where

ei ∼ N(0, 1), ϵi ∼ N(0, 1), ˜ϵi ∼ N(0, 1) and Ti ∼ N(µ, 1), and everything is distributed conditionally

on Zi ≥ Ti. Note that the α-mixing property of the observable Xi is immediately transferred to the

(Xi, Zi, Ti, δi). Hence, the true underlying regression function is m(x) =E(Y |X = x) = sin(πx).

For the proposed estimators, we employ the kernel K(x) = Λ(x) = 15₁₆(1− x2₎2_I(_{|x| ≤ 1). In}

addition, the parameters ϕi (i = 1, 2, 3) allow to control the percentage of censoring (PC) which is

given by PC =P (Yi > Wi|Xi= x) ϕ1=ϕ3=0.7 = 1− Φ (₅√₂ 14 ϕ2 ) =        10%, when ϕ2 = 2.537, 30%, when ϕ2 = 1.038, 50%, when ϕ2 = 0.

4.1 Comparison of consistency among the estimators of m(x)

To compare the global performance of the estimatorsmbn(x) , mbN W(x) and mbLL(x), we compute for

the estimatormbh,aof m the global mean square error (GMSE) along M = 500 Monte Carlo trials and

a grid of bandwidths h := hn and a := an; the GMSE is deﬁned as

GM SE(h, a) = 1 M n M ∑ l=1 n ∑ k=1 [mbh,a(Xk,l)− m(Xk,l)]2,

where Xk,l is the k-th datum in the l-th trial. The minimal values of GM SE(h, a) along the grid, and

the corresponding bandwidths minimizing the error, are reported in Table 1. Speciﬁcally, for n = 100, hnand an range from 0.05 to 0.80 with a step 0.01; for n = 300, hn ranges from 0.05 to 0.80 with step

0.01 and anranges from 0.05 to 0.60 with step 0.01. When hn= 0.05 was obtained, then the grid was

expanded.

In Table 1 we see, for the new estimator mbn, that the minimum GMSE decreases as the sample

size n or the no truncation proportion θ increase, and that it increases as the dependence of the observations increases (ρ increases) or as the percentage of censoring PC increases. All these features were expected, and they were also observed for NW and LL estimators by Liang et al. (2015). More interestingly, we can appreciate how the new estimatormbnoutperforms the NW estimator in most of

the cases and how the local linear smoother outperforms both of them. It is also seen in Table 1 that the bandwidth andecreases as the sample size increases or the percentage of censoring decreases, and

that, in many cases, an for the new estimator is between the anfor the NW estimator and the an for

the LL estimator. The bandwidth hn is very similar for the three estimators and it increases as the

(10)

Table 1: Minimum GMSE’s of mbn, mbN W and mbLL along M = 500 Monte Carlo trials, and

corre-sponding optimal bandwidths, for several truncation rates and percentage of censoring (PC).

ρ θ PC n mbn an hn mbN W an hn mbLL an hn 0.1 60% 10% 100 5.9923× 10−2 0.36 0.04 6.2453× 10−2 0.32 0.03 6.0215× 10−2 0.41 0.04 300 3.0086× 10−2 0.33 0.07 3.1718× 10−2 0.29 0.06 2.9861× 10−2 0.37 0.10 50% 100 9.7991× 10−2 0.45 0.48 9.3781× 10−2 0.43 0.49 9.4899× 10−2 0.52 0.48 300 5.4613× 10−2 0.37 0.41 5.2431× 10−2 0.41 0.68 5.2705× 10−2 0.44 0.45 90% 10% 100 4.3330× 10−2 0.40 0.22 4.4696× 10−2 0.36 0.34 4.1301× 10−2 0.46 0.26 300 2.0164× 10−2 0.32 0.24 2.0731× 10−2 0.31 0.45 1.9873× 10−2 0.36 0.23 50% 100 8.5158× 10−2 0.46 0.72 8.6258× 10−2 0.42 0.67 8.3940× 10−2 0.51 0.71 300 4.5092× 10−2 0.38 0.53 4.3951× 10−2 0.39 0.68 4.3736× 10−2 0.47 0.66 0.5 60% 10% 100 6.0644× 10−2 0.39 0.05 6.2637× 10−2 0.36 0.05 5.9977× 10−2 0.44 0.07 300 3.1768× 10−2 0.34 0.09 3.2710× 10−2 0.31 0.09 3.1498× 10−2 0.35 0.09

In Figure 1, we plot the theoretical curve m(x) = sin(πx) together with the estimators mbn(x)

(denoted by mN W P), mbN W(x) and mbLL(x), respectively, averaged along the 500 Monte Carlo trials

for the two sample sizes, ρ = 0.1, P C = 10%, θ = 60%, an and hn from Table 1. From this Figure it

is seen that all estimators perform better with a larger n, and that the new NWP estimator and the LL estimator ﬁt better the theoretical curve specially at the boundaries.

−1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0

Regression function and its estimators (n=100) m(x) NWP estimator LL estimator NW estimator −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0

Regression function and its estimators (n=300) m(x)

NWP estimator LL estimator NW estimator

Figure 1. Regression function m(x) = sin(πx) and its estimators averaged along 500 Monte Carlo trials with ρ = 0.1,

(11)

4.2 Asymptotic normality

We explore how good is the asymptotic normal approximation of the estimator mbn(x) at x = 1.

Observe that the regression function in x = 1 is sin(π) = 0. In Figure 2, we plot the histograms and the Q-Q plots of mbn(x) for ρ = 0.1, P C = 10%, θ = 60%, an and hn from Table 1, based on

M = 250 replications with sample size n = 100 and 300, respectively. From Figure 2, it is seen that the normality in the distribution of the estimators is acceptable for the two sample sizes n, and it is better when n increases.

Histogram (n=100) new estimator at x=1 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 2.0 2.5 Histogram (n=300) new estimator at x=1 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 2.0 2.5 −3 −2 −1 0 1 2 3 −0.5 0.0 0.5 1.0 new estimator (n=100) Normal Quantiles Empir ical Quantiles −3 −2 −1 0 1 2 3 −0.5 0.0 0.5 new estimator (n=300) Normal Quantiles Empir ical Quantiles

Figure 2. The histograms ofmbn(x) at x = 1, averaged along 250 Monte Carlo trials with ρ = 0.1, P C = 10%,

θ = 60%, anand hnfrom Table 1, n =100 and 300, respectively.

5 Real data application

In this section, we apply the new NWP regression estimator to the analysis of Spanish unemployment data, and we compare its performance with the performance of NW and LL estimators which were studied in Liang et al. (2015).

(12)

of married women living in Galicia (NW of Spain), recruited by means of quarterly inquires at the individuals’ homes. Several covariates such as age or education are present too. The data are left-truncated because the sampled information corresponds to those women unemployed by the inquiry date. Moreover, limitation in the following-up period results in right-censoring; specifically, 56% of the unemployment times are right-censored. Additionally, the survey is conducted for periods, so that the women are grouped according to the inquiry date (38 clusters). Some dependence among the unemployment times within each cluster is expected because the specific economic situation in the corresponding period. This dependence can be modeled using the α-mixing condition assumed in this work. See de Uña- Álvarez and Iglesias-Pérez (2010) and Liang et al. (2015) for further description of this data set.

A regression analysis on the covariate ’age (in years) when entering the unemployed stock’ (X) was performed. NW and LL estimators of the conditional mean function m(x) were computed by Liang et al. (2015) using the biweight kernel and their bandwidths being the minimizers of the cross-validation (CV) criterion introduced in their Section 3 (with r = 3). Speciﬁcally, hn = 50 for both

estimators, and an took values 10.8 for the NW estimator and 10.1 for the LL estimator. Here, the

new NWP estimator was computed with the same kernel, and bandwidths hn= 50 and an= 10.5, for

comparative purposes. An outlier corresponding to X = 15 was removed for the three estimations. The estimators are displayed in Figure 3. The new NWP estimator is closer to the LL estimator, taking values higher than the NW estimator at the left boundary. The three estimators provide roughly the same regression curve for X between 23 and 53 years, showing a decreasing trend of the unemployment time from 20 to 30 years, and an increasing trend between 30 and 50 years. This indicates that women with intermediate ages (around 30) at the beginning of unemployment are the ones with the shortest expected unemployment times. This is in agreement with previous results (de Uña- Álvarez and Iglesias-Pérez (2010); Liang et al. (2015)).

6 Proof of Main Results

Set Θj = _n1

∑n i=1χ

j

ni(x) and χni(x) = (Xi− x)Λan(Xi− x).

Lemma 6.1 Let α(n) = O(n−λ) for some λ > 0. Suppose that (A3), (A4) and (A6) are satisﬁed. (i) Let nan→ ∞ and λ ≥ 2.1. If na1+rn 0 = O(1) for some constant r0≥ 2, then

Θ1 = a2n(v∗(x))′∆21+ Op((a_nn)1/2) + O(an3), Θ2 = anv∗(x)∆22 + Op((a_nn)1/2) + O(a3n), η =

Op((nan)−1/2+ an) and max1≤i≤n|ηχni(x)| = op(1);

(ii) Let∑∞_n=1_a1

n(

ln(n) nan)

(λ−1)/2_<_{∞ for some λ > 3. If na}1+r0

n = O(1) and ln(n)/(nan)1−2/r0 = O(1)

for some constant r0 > 2, then

Θ1 = a2n(v∗(x))′∆21 + O((anln(n)_n )1/2) + O(a3n) a.s., Θ2 = anv∗(x)∆22 + O((anln(n)_n )1/2) + O(a3_n) a.s., η = O((ln(n)_na

n)

1/2_{+ a}

(13)

20 30 40 50 60 70

0

50

100

150

age when entering the unemployed stock (years)

umemplo

yment time (months)

NW estimator LL estimator NWP estimator uncensored data censored data

Figure 3. Estimators LL (dashed line) and NW (dotted line) based on CV bandwidths and NWP estimator (solid

line) with hn= 50 and an= 10.5. Uncensored and censored data are showed (black and white circles respectively).

Proof. (i) We prove only Θ1 = a2n(v∗(x))′∆21+ Op((an/n)1/2) + O(a3n), the proof related to Θ2 is

similar. Note that Θ1= EΘ1+ Op(

√

Var(Θ1)). So, we need to evaluate that EΘ1 and Var(Θ1).

In view of (A3) and (A6), we have EΘ1 = 1 an E(Xi− x)Λ (_X i− x an ) = a2_n(v∗(x))′∆21+ O(a3n). (6.1)

Write Var(Θ1) = 1_nVar(χn1(x)) + _n22

∑

1≤i<j≤nCov(χni(x), χnj(x)) := Θ11+ 2Θ12. From (A3) and

(A6), it is easy to verify that Θ11= O(an/n).

For i < j, applying (A3), (A4) and (A6) we have|Cov(χni(x), χnj(x))| = O(a2n). On the other hand,

from Lemma 7.1 (take p = q = 20λ) it follows that|Cov(χni(x), χnj(x))| ≤ C[α(j −i)]1−1/(10λ)a1/(10λ)n .

Then Θ12 = O(an/n). Therefore Var(Θ1) = O(an/n), which, together with (6.1), implies that Θ1 = a2_n(v∗(x))′∆21+ Op((an/n)1/2) + O(a3n).

From (2.5), it is easy to see that

0 =1 n n ∑ i=1 χni(x) 1 + ηχni(x) =_n1 n ∑ i=1 χni(x) ( 1− ηχni(x) 1 + ηχni(x) )_≥ |η|Θ2 1+max1≤i≤n|ηχni(x)|− |Θ1|. (6.2)

Hence, applying the proof of Lemma 3 in Owen (1990), it follows that η = Op(an+ (nan)−1/2) and

(14)

(ii) From (6.1) we have Θ1 = 1 nan n ∑ i=1 [ (Xi− x)Λ (_X i− x an ) − E((Xi− x)Λ (_X i− x an ))] + a2_n(v∗(x))′∆21+ O(a3n).

Using Lemmas 7.5 and 7.6, one can prove that 1 nan n ∑ i=1 [ (Xi− x)Λ (_X_i_{− x} an ) − E((Xi− x)Λ (_X_i_{− x} an ))] = O ((_a_n_ln(n) n )1/2) a.s.

Similarly, one can verify that Θ2= 1 na2 n n ∑ i=1 (Xi− x)2Λ2 (_X_i_{− x} an ) = anv∗(x)∆22+ O ((_a_n_ln(n) n )1/2) + O(a3_n) a.s. Hence, from ln(n)_na n → 0 and (6.2) we have |η|

1+max1≤i≤n|ηχni(x)| = O(an + (

ln(n) nan)

1/2_{) a.s. and η =} O(an+ (ln(n)_na_n)1/2) a.s., further we have max1≤i≤n|ηχni(x)| = o(1) a.s.

Proof of Theorem 3.1. Let v_n∗(x) =∑n_i=1bpi(x)Λan(Xi− x) and write

b mn(x)− m(x) = 1 v_n∗(x) {∑n i=1 ( bY_i∗− Y_i∗)bpi(x)Λan(Xi− x) + n ∑ i=1 (Y_i∗− m(Xi))bpi(x)Λan(Xi− x) + n ∑ i=1 (m(Xi)− m(x))bpi(x)Λan(Xi− x) } :=(v∗_n(x))−1[R1n(x) + R2n(x) + R3n(x)].

It suﬃces to show that v∗_n(x)→ vP ∗(x) = θ−1θ(x)v(x), R1n(x) = Op

( (ln(n)/(nhn))1/2+ hkn0), R2n(x) = Op((nan)−1/2+ a3n) and R3n(x) = _2θ1a2nθ(x)v(x)m′′(x)∆21+ op(a2n). (I) We prove v_n∗(x) = v∗(x) + op(1) = θ−1θ(x)v(x) + op(1). From bpi(x) = _n1 ·_1+ηχ1_ni_(x) = _n1(1−_1+ηχηχni_ni(x)_(x)) we have v_n∗(x) = 1 nan n ∑ i=1 Λ (_X i− x an ) − η nan n ∑ i=1 χni(x) 1 + ηχni(x) Λ (_X i− x an ) := D1n(x)− D2n(x).

From (A3) and (A6) it follows that ED1n(x) =

∫

Λ(u)v∗(x + anu)du = v∗(x) + o(1). Similarly to the

arguments as in (i) of the proof of Lemma 6.1 it is easy to verify that Var(D1n(x)) → 0. Therefore, D1n(x) = v∗(x) + op(1). Note that |D2n(x)| ≤ |η| 1− max1≤i≤n|ηχni(x)|· 1 na2 n n ∑ i=1 |Xi− x|Λ2 (_X i− x an ) and E{_na12 n ∑n i=1|Xi−x|Λ2( Xi−x an )} = ∫

|u|Λ2_(u)v∗_(x+a

nu)du = O(1). Then D2n(x) = Op((nan)−1/2+

an) = op(1) by Lemma 6.1(i).

(II) We prove R1n(x) = Op

(

(15)

Since x ∈ I, [x − Can, x + Can] ∈ Iϵ for large n and 0 < C∗ ≤ C(y|x) ≤ C∗ < ∞ for (x, y) ∈

Iϵ× [τ1, τ2] by C(y|x) = θ−1(x)L(y|x)(1 − H(y|x)) and (A2). Then from (A6) we have |R1n(x)| ≤ 1 1− max1≤i≤n|ηχni(x)|· 1 nan { sup x∈Iϵ sup τ1≤y≤τ2 | bFn(y|x) − F (y|x)| n ∑ i=1 δi|ϕ(Zi)|Λ(X_ai−x_n ) C(Zi|Xi)

+ supx∈Iϵsupτ1≤y≤τ2| bCn(y|x) − C(y|x)|

C_∗− sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x) − C(y|x)| n

∑

i=1

δi|ϕ(Zi)|Λ(X_ai−x_n )(1− F (Zi|Xi))

C(Zi|Xi)

+supx∈Iϵsupτ1≤y≤τ2| bCn(y|x) − C(y|x)|| bFn(y|x) − F (y|x)|

C_∗− sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x) − C(y|x)|

n ∑ i=1 δi|ϕ(Zi)|Λ(X_ai−x_n ) C(Zi|Xi) } . Note that E(_na1 n ∑n i=1 δi|ϕ(Zi)|Λ(Xi−x_an ) C(Zi|Xi) ) = O(1), E(_na1 n ∑n i=1 δi|ϕ(Zi)|Λ(Xi−x_an )(1−F (Zi|Xi)) C(Zi|Xi) ) = O(1).

There-fore, using Lemma 6.1(i) and Lemma 7.3, it follows that R1n(x) = Op

(

(ln(n)/(nhn))1/2+ hkn0).

(III) We prove R3n(x) = _2θ1 a2nθ(x)v(x)m′′(x)∆21+ op(a2n).

In view of (2.4) we have R3n(x) = 1₂

∑n

i=1m′′(Xi∗)(Xi− x)2bpi(x)Λan(Xi− x), where Xi∗ is between

Xi and x, and from bpi(x) = _n1(1−_1+ηχηχni(x)

ni(x)) we write R3n(x) = 1 2nan n ∑ i=1 m′′(X_i∗)(Xi− x)2Λ (_X i− x an )[ 1− ηχni(x) 1 + ηχni(x) ] := R31n(x)− R32n(x). (6.3) We observe that E ( ₁ 2nan n ∑ i=1 m′′(Xi)(Xi− x)2Λ (_X i− x an )) = 1 2θa 2 nθ(x)v(x)m′′(x) ∫ Rs 2_{Λ(s)ds + o(a}2 n).

Using the similar arguments as those employed in the proof of Lemma 6.1(i) one can verify that Var ( ₁ 2nan n ∑ i=1 m′′(Xi)(Xi− x)2Λ (_X i− x an )) = O(a3_n/n).

Hence, from (A0) and nan→ ∞ we have R31n(x) = _2θ1 a2nθ(x)v(x)m′′(x)∆21+ op(a2n).

From _2na1 n ∑n i=1|m′′(Xi∗)|(Xi − x)2Λ( Xi−x an ) = Op(a 2

n), using Lemma 6.1 we have |R32n(x)| = op(a2n). Therefore, from (6.3) we obtain that R3n(x) = _2θ1a2nθ(x)v(x)m′′(x)∆21+ op(a2n).

(IV) We prove R2n(x) = Op((nan)−1/2+ a3n).

From bpi(x) = _n1{1 − ηχni(x) + η2χ2ni(x)− η3_χ3 ni(x) 1+ηχni(x)}, we write R2n(x) = 1 nan n ∑ i=1 (Y_i∗− m(Xi))Λ (_X i− x an )[ 1− ηχni(x) + η2χ2ni(x)− η3χ3_ni(x) 1 + ηχni(x) ] :=R21n(x)− R22n(x) + R23n(x)− R24n(x).

It is easy to verify that _na1

n

∑n

i=1E{(Yi∗− m(Xi))Λ(X_ai−x_n )χlni(x)} = 0 for l = 0, 1, 2. Note that

Var(R21n(x)) = 1 na2 n E [ (Y_i∗− m(Xi))Λ (_X i− x an )]2

(16)

+ 2 n2_a2 n ∑ 1≤i<j≤n Cov ( (Y_i∗− m(Xi))Λ (_X i− x an ) , (Y_j∗− m(Xj))Λ (_X j− x an )) . (6.4)

Since Y_i∗ and m(·) are bounded by (H), from (H), (A3), (A4) and (A6) it follows that _na12

nE [ (Y_i∗− m(Xi))Λ (_X i−x an )]2

= O((nan)−1) and similarly to the arguments as those employed in the proof of

Lemma 6.1(i) we have 2 n2_a2 n ∑ 1≤i<j≤n Cov((Y_i∗− m(Xi))Λ (_X_i_{− x} an ) , (Y_j∗− m(Xj))Λ (_X_j_{− x} an ))_{= o((na} n)−1).

Therefore Var(R21n(x)) = O((nan)−1), which gives that R21n(x) = Op((nan)−1/2).

Similarly to the arguments as for (6.4) one can verify that Var{_na1

n

∑n

i=1(Yi∗−m(Xi))Λ(X_ai−x_n )χlni(x)} =

O((nan)−1) for l = 1, 2. Then from Lemma 6.1 we have

R22n(x) = Op((an/n)1/2+ (nan)−1), R23n(x) = Op((a3n/n)1/2+ (nan)−3/2).

From (H), (A3) and (A6), it is easy to verify that _na14

n

∑n

i=1|(Yi∗− m(Xi))(Xi− x)3|Λ4(X_ai−x_n )} =

Op(1). Then, in view of Lemma 6.1 it follows that |R24n(x)| = Op(a3n+ (nan)−3/2).

Proof of Theorem 3.2. From the proof of Theorem 3.1 it suﬃces to show that √ nanR21n(x)→ ND ( 0, ∆02θ−1θ(x)v(x)σ2(x) ) .

In fact, set Ξi(x) = a−1/2n (Yi∗−m(Xi))Λ(X_ai−x_n ). Then EΞi(x) = 0 and√nanR21n(x) = n−1/2

∑n

i=1Ξi(x).

Note that (A8) implies that there exists a sequence of positive integers δn → ∞ such that δnq =

o((nan)1/2), δn(na−1n )1/2α(q)→ 0. Let π := πn= [_p+qn ] and p := pn= [(nan)1/2/δn]. Then

q/p→ 0, πα(q) → 0, πq/n → 0, p/n → 0, p/(nan)1/2→ 0.

Applying Bernstein’s big-block and small-block procedure, Following the proof line as in Liang et al.

(2015), one can prove the conclusion.

Proof of Theorem 3.3. From the proof of Theorem 3.1 one can write

(nan)1/2(mbn(x)− m(x)) Γ(x) = v∗(x) v_n∗(x)· (nan)1/2 v∗(x)Γ(x) · 1 n {_∑n i=1 (Y_i∗− m(Xi))Λan(Xi− x) + n ∑ i=1 ( bY_i∗− Y_i∗)Λan(Xi− x) 1 + ηχni(x) − η n ∑ i=1 (Y_i∗− m(Xi))Λan(Xi− x)χni(x) 1 + ηχni(x) +1 2 n ∑ i=1 [m′′(X_i∗)(Xi− x)2Λan(Xi− x) − E(m′′(Xi∗)(Xi− x)2Λan(Xi− x))] −η 2 n ∑ i=1 m′′(X_i∗)(Xi− x)2Λan(Xi− x)χni(x) 1 + ηχni(x) +1 2 n ∑ i=1 E(m′′(X_i∗)(Xi− x)2Λan(Xi− x)) } := v ∗_(x) v∗_n(x)[I1n(x) + I2n(x)− I3n(x) + I4n(x)− I5n(x) + I6n(x)],

(17)

where X_i∗ is between Xi and x.

Let γ7n = (ln(n)/(nan))1/4 + a1/2n , γ8n = (ln(n)/(nan))1/3 + (na7n)1/4 + a 2/3

n , γ9n = a4/3n and

γ10n = (a4nln(n))1/4+ (na7n)1/4. Using Lemma 7.7 we have

Then, it suﬃces to show that P(v ∗ n(x) v∗(x)− 1 > cγ7n )

= O(γ7n+ γ3n), P (|I2n(x)| > cγ2n) = O(γ2n), P (|I3n(x)| > cγ8n) = O(γ8n), P (|I4n(x)| > cγ9n) = O(γ9n), P (|I5n(x)| > cγ10n) = O(γ10n), I6n(x) = O((na5n)1/2)

and sup_u|P (I1n(x)≤ u) − Φ(u)| = O

( (µ−1ν)1/3_{+ (n}−1_µ)1/3_{+ a}(1−ρ)/3 n + γ4n1/3+ γ5n+ γ6n1/4 ) . (V) We prove P (|v∗n(x) v∗(x) − 1| > cγ7n) = O(γ7n+ γ3n). Write v∗_n(x)− v∗(x) = 1 nan n ∑ i=1 [ Λ (_X_i_{− x} an ) − E(Λ (_X_i_{− x} an ))] + [ ₁ an E ( Λ (_X₁_{− x} an )) − v∗(x) ] − η nan n ∑ i=1 χni(x) 1 + ηχni(x) Λ (_X i− x an ) .

Note that (A3) and (A6) imply that|_a1

nE(Λ(

X1−x

an ))− v

∗_(x)_{| ≤ Ca}2

n, and from Lemma 6.1(ii) we have

E_naη n ∑n i=1 χni(x) 1+ηχni(x)Λ (_X i−x an ) ≤ C ( (ln(n)/(nan))1/2 + an )

. Following the proof line as in Lemma 6.1(ii), applying Lemmas 7.5-7.6 we have

P ( ₁ nan n ∑ i=1 [ Λ (_X i− x an ) − E(Λ (_X i− x an ))]_{> c} 0 (_ln(n) nan )1/4) ≤ C an (_ln(n) nan )(λ−1)/2 = O(γ3n).

(VI) We prove P (|I2n(x)| > cγ2n) = O(γ2n). From the proof in (II) above we have |I2n(x)| = (nan)1/2 v∗(x)Γ(x)|R1n(x)| ≤ (nan)1/2 v∗(x)Γ(x)(1− max1≤i≤n|ηχni(x)|) · 1 nan { sup x∈Iϵ sup τ1≤y≤τ2 | bFn(y|x) − F (y|x)| n ∑ i=1 δi|ϕ(Zi)|Λ(X_ai−x_n ) C(Zi|Xi)

+ supx∈Iϵsupτ1≤y≤τ2| bCn(y|x) − C(y|x)|

C_∗− sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x) − C(y|x)| n

∑

i=1

δi|ϕ(Zi)|Λ(X_ai−x_n )(1− F (Zi|Xi))

C(Zi|Xi)

+supx∈Iϵsupτ1≤y≤τ2| bCn(y|x) − C(y|x)|| bFn(y|x) − F (y|x)|

C_∗− sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x) − C(y|x)|

n ∑ i=1 δi|ϕ(Zi)|Λ(X_ai−x_n ) C(Zi|Xi) } .

(18)

Lemma 6.1(ii) and Lemma 7.3 we obtain that P (|I2n(x)| > cγ2n) = O(γ2n) with γ2n= (anln(n)/hn)1/4+

(nanh2kn0)1/4.

(VII) We prove P (|I3n(x)| > cγ8n) = O(γ8n). Write I3n(x) = (nan)1/2 v∗(x)Γ(x) [ _η nan n ∑ i=1 (Y_i∗− m(Xi))Λ (_X i− x an ) χni(x) − η2 nan n ∑ i=1 (Y_i∗− m(Xi))Λ (_X i− x an ) χ2_ni(x) + η 3 nan n ∑ i=1 (Y_i∗− m(Xi))Λ(X_ai−x_n )χ3ni(x) 1 + ηχni(x) ] :=I31n(x)− I32n(x) + I33n(x). (6.5)

From (IV) above, we have E{_na1

n

∑n

i=1(Yi∗−m(Xi))Λ(X_ai−x_n )χlni(x)}2= O((nan)−1) for l = 1, 2. Then,

using Lemma 6.1(ii), it follows that EI_31n2 (x) = O(ln(n)_na

n + a 2 n) and EI32n2 (x) = O(( ln(n) nan ) 2_{+ a}4 n), which imply that P (|I31n(x)− I32n(x)| > c((ln(n)/(nan))1/3+ a2/3n )) = O((ln(n)/(nan))1/3+ a2/3n )). (6.6)

Note that E|I33n(x)| ≤ E{ (nan)

1/2_|η|3 v∗(x)Γ(x)(1−max1≤i≤n|ηχni(x)|)· 1 na4 n ∑n i=1|(Yi∗−m(Xi))(Xi−x)3|Λ4(X_ai−x_n )} =

O(ln3/2(n)/(nan)) + (na7n)1/2), which gives that

P (|I33n(x)| > c((ln3/2(n)/(nan))1/2+ (na7n)1/4)) = O((ln3/2(n)/(nan))1/2+ (na7n)1/4). (6.7)

From (6.5)-(6.7) and (ln(n)/(nan))1/3 > (ln3/2(n)/(nan))1/2 we ﬁnd

P (|I3n(x)| > c((ln(n)/(nan))1/3+ (na7n)1/4+ a2/3n )) = O((ln(n)/(nan))1/3+ (na7n)1/4+ a2/3n ).

(VIII) We evaluate|I6n(x)|, and prove that P (|I4n(x)| > cγ9n) = O(γ9n) and P (|I5n(x)| > cγ10n) = O(γ10n). Write EI_4n2 (x) = 1 4(v∗(x)Γ(x))2 { ₁ an Var ( m′′(X_i∗)(Xi− x)2Λ (_X i− x an )) + 2 nan ∑ 1≤i<j≤n Cov ( m′′(X_i∗)(Xi− x)2Λ (_X i− x an ) , m′′(X_j∗)(Xj − x)2Λ (_X j − x an ))} .

The proof in (III) shows that I6n(x) = O((na5n)1/2), and EI4n2 (x) = O(a4n), which gives that P (|I4n(x)| > ca4/3n ) = O(a4/3n ). Note that using Lemma 6.1(ii) we obtain that E|I5n(x)| = O((a4nln(n))1/2 +

(na7_n)1/2), which gives that P (|I5n(x)| > c((a4nln(n))1/4+ (na7n)1/4)) = O((a4nln(n))1/4+ (na7n)1/4).

(IX) We verify sup_u|P (I1n(x) ≤ u) − Φ(u)| = O((µ−1ν)1/3 + (n−1µ)1/3 + a(1n−ρ)/3 + γ4n1/3 + γ5n+ γ6n1/4). Let ξi(x) = (v∗(x)Γ(x))−1a−1/2n (Y_i∗ − m(Xi))Λ(X_ai−x_n ). Then Eξi(x) = 0 and I1n(x) = n−1/2∑n_i=1ξi(x). Put w := wn= [_µ+νn ] and deﬁne

dmn(x) = km∑+µ−1 i=km ξi(x), d′mn(x) = lm∑+ν−1 i=lm ξi(x), d′′wn(x) = n ∑ i=w(µ+ν)+1 ξi(x),

(19)

where km= (m− 1)(µ + ν) + 1, lm = (m− 1)(µ + ν) + µ + 1, m = 1, · · · , w. Then I1n(x) = n−1/2 {_∑w m=1 dmn(x) + w ∑ m=1 d′_mn(x) + d′′_wn(x) } := n−1/2{Ω′_n(x) + Ω′′_n(x) + Ω′′′_n(x)}.

n−1E(Ω′′_n(x))2 = O(τ1n), n−1E(Ω′′′n(x))2 = O(τ2n), (6.8)

sup

u |P (n −1/2_Ω′

n(x)≤ u) − Φ(u)| = O(µ−1ν + n−1µ + an1−ρ+ γ5n+ γ6n1/4). (6.9)

(i) We verify (6.8). Note that 1 nE(Ω ′′ n(x))2= 1 n w ∑ m=1 lm∑+ν−1 i=lm Var(ξi(x)) + 2 n w ∑ m=1 ∑ lm≤i<j≤lm+ν−1 Cov(ξi(x), ξj(x)) + 2 n ∑ 1≤i<j≤w Cov(d′_in(x), d′_jn(x)). (6.10)

The proof of of Theorem 3.2 shows that Var(ξi(x)) = (v∗(x)Γ(x))−2Var(Ξ(x)) = 1, and|Cov(ξi(x), ξj(x))| ≤

C min{an, [α(j− i)]1−1/(10λ)a−(1−1/(10λ))n

}

for i < j. Let cn= a−ρn for (20λ− 1)/[λ(10λ − 1)] ≤ ρ < 1,

then 1 n ∑ 1≤i<j≤n |Cov(ξi(x), ξj(x))| ≤ C { cnan+ a−(1−1/(10λ))n c−(λ−11/10)n } = O(a1_n−ρ). (6.11)

Using Lemma 7.1 again we have 1

n

∑

1≤i<j≤w

Cov(d′_in(x), d′_jn(x)) ≤ Cνµ −1a−ς/(2+ς)_n u(µ) = O(γ4n). (6.12)

From (6.10)-(6.12) it follows that n−1E(Ω′′_n(x))2= O(νµ−1+ a1n−ρ+ γ4n) = O(τ1n) and

1 nE(Ω ′′′ n(x))2= 1 n n ∑ i=w(µ+ν)+1 Eξ_i2(x) + 2 n ∑ w(µ+ν)+1≤i<j≤n Cov(ξi(x), ξj(x)) = O(τ2n).

(ii) We prove (6.9). Let emn(x), m = 1, 2,· · · , w be independent random variables where the

distri-bution of emn(x) is the same as that of dmn(x) for m = 1, 2,· · · , w. Put Un(x) = n−1/2

(20)

From Eξ_i2(x) = 1 and (6.11) it follows that s2_n = 1 + O(µ−1ν + n−1µ + a1n−ρ), which implies that

s2_n→ 1 and

sup

u |Φ(u/sn

)− Φ(u)| = O(|s2_n− 1|) = O(µ−1ν + n−1µ + a1_n−ρ). (6.14) By Berry-Esseen inequality (see Petrov (1995), page 154, Theorem 5.7), for l > 2 there exists some constant C > 0 such that

sup u |P (n −1/2_U n(x)≤ u) − Φ(u/sn)| ≤ C nl/2_sl n w ∑ m=1 E|emn(x)|l. (6.15)

Taking l = 2(1 + β), µ = ς− 2β, then l + µ = 2 + ς. Note that β ≤ [ςλ − (2 + ς)]/[2λ + (2 + ς)] implies that λ≥ (1 + β)(2 + ς)/(ς − 2β) = l(l + µ)/2µ. Then, using Lemma 7.4 (take p = l and q = l + µ) and E|ξ1(x)|2+ς ≤ Ca−ς/2n , we have ∑w m=1E|emn(x)|l = O ( nµβa−ς(1+β)/(2+ς)n )

, which, together with (6.15), yields that sup u |P (n −1/2_U n(x)≤ u) − Φ(u/sn)| = O ( n−(1+β)nµβa−ς(1+β)/(2+ς)_n )= O(γ5n). (6.16)

Assume that φ(t) and ψ(t) are the characteristic functions of n−1/2Ω′_n(x) and n−1/2Un(x), respectively.

Using Lemma 7.8, we have |φ(t) − ψ(t)| ≤ C|t|α1/2_(ν)n−1/2∑w m=1 { E∑km+µ−1 i=km ξi(x) 2}1/2 . From Eξ_i2(x) = 1 and |Cov(ξi(x), ξj(x))| ≤ C min

{ an, [α(j − i)]1−1/(10λ)a−(1−1/(10λ))n } for i < j we have E∑km+µ−1 i=km ξi(x) 2

= O(µ). Thus J1n = O(Υ(w2n−1µα(ν))1/2) = O(Υ(nµ−1α(ν))1/2) = O(Υγ6n1/2).

which yields that J2n= O(γ5n+ 1/Υ). Choose Υ = γ6n−1/4. Then from (6.17) we have

sup

u |P (n −1/2_Ω′

n(x)≤ u) − P (n−1/2Un(x)≤ u)| = O(γ5n+ γ6n1/4). (6.18)

Therefore, from (6.13), (6.14), (6.16) and (6.18) we have sup

u |P (n −1/2_Ω′

n(x)≤ u) − Φ(u)| = O(µ−1ν + n−1µ + a1n−ρ+ γ5n+ γ6n1/4).

(21)

7 Appendix

In this section, we give some preliminary Lemmas, which have been used in Section 6. Let{Zi, i≥ 1}

be a sequence of α-mixing real random variables with the mixing coeﬃcients{α(k)}.

Lemma 7.1 (Hall and Heyde (1980), Corollary A.2, page 278) Suppose that X and Y are

ran-dom variables such that E|X|p <∞, E|Y |q<∞, where p, q > 1, p−1+ q−1< 1. Then |EXY − EXEY | ≤ 8∥X∥p∥Y ∥q

{

sup

A∈σ(X),B∈σ(Y )

|P (AB) − P (A)P (B)|}1−p−1−q−1.

Lemma 7.2 (Volkonskii and Rozanov (1959)) Let V1,· · · , Vmbe α-mixing random variables

mea-surable with respect to the σ-algebraFj1

i1,· · · , F

jm

im, respectively, with 1≤ i1< j1 <· · · < jm≤ n, il+1−

jl ≥ w ≥ 1 and |Vj| ≤ 1 for l, j = 1, 2, · · · , m. Then |E(

∏m

j=1Vj)−

∏m

j=1EVj| ≤ 16(m − 1)α(w),

where F_ab = σ{Vi, a≤ i ≤ b} and α(w) is the mixing coeﬃcient.

Lemma 7.3 Let α(n) = O(n−λ) for some λ > 2. Suppose that conditions (A1)-(A4) are satisﬁed. (a) If (A5)(i) holds, then sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x) − C(y|x)| = Op(max{(ln(n)/(nhn))1/2, hkn0})

and sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bFn(y|x) − F (y|x)| = Op

(

max{(ln(n)/(nhn))1/2, hkn0

}) .

(b) If (A5)(ii) holds, then sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bCn(y|x)−C(y|x)| = O(max{(ln(n)/(nhn))1/2, hkn0}) a.s.

and sup_x_∈I_ϵsup_τ₁_≤y≤τ₂| bFn(y|x) − F (y|x)| = O

(

max{(ln(n)/(nhn))1/2, hkn0

}) a.s.

Proof. The proof of Lemma 7.3 comes from Theorem 2.1 of Liang et al. (2012).

Lemma 7.4 (Shao and Yu (1996), Theorem 4.1) Let 2 < p < q ≤ ∞. Assume that EZi = 0

and α(n) = O(n−γ) for γ > 0. Then there exists Q = Q(p, q, γ) < ∞ such that E|∑n_i=1Zi|p ≤

Qnp/2max1≤i≤n∥Zi∥pq if γ ≥ pq/[2(q − p)].

Lemma 7.5 (Liebscher (2001), Proposition 5.1) Assume that EZi= 0 and|Zi| ≤ S < ∞ a.s. (i =

1, 2,· · · , n). Set DN = max1≤j≤2NVar(

∑j

i=1Zi). Then, for n, N ∈ N, 0 < N ≤ n/2, ε > 0,

P(∑n_i=1Zi> ε ) ≤ 4 exp{−ε2 16 ( nN−1DN +1₃εSN )₋₁} + 32S_εnα(N ).

Lemma 7.6 (Liebscher (1996), Lemma 2.3) Assume α(n) ≤ C1n−q for some q > 1, C1 > 0. Let sup₁_{≤i,j≤n,i̸=j}|Cov(Zi, Zj)| := R∗(n) < ∞ be satisﬁed. Moreover, let Rm(n) < ∞ for some

m, 2q/(q− 1) < m ≤ ∞, where Rm(n) = sup1≤i≤n(E|Zi|m)1/m, for 1 ≤ m < ∞, and R∞(n) =

sup₁_≤i≤ness sup_w_∈Ω|Zi|. Then Var

( ∑n i=1Zi

)

≤ n{C2(q, m)(Rm(n))2m/(q(m−2))(R∗(n))1−m/(q(m−2))+

R2₂(n)}holds with C2(γ, m) := 20q_q_−1−2q/m−40q/mC11/q.

Lemma 7.7 Let X, V and Y1,· · · , Ym be random variables, then for positive numbers a, w1, · · · , wm

(22)

Proof. The ﬁrst inequality is a consequence of Michel and Pfanzagl (1971) and the second one follows

from Lemma 3.1 of Liang and Fan (2009).

Lemma 7.8 (Yang and Li (2006)) Let p and q be positive integers. Set ηr =

∑(r−1)(p+q)+p j=(r−1)(p+q)+1Zj

for 1 ≤ r ≤ w. If s > 0, r > 0 with 1/s + 1/r = 1, then there exists constant C > 0 such that |E exp(it∑w r=1ηr)− ∏w r=1E exp(itηr)| ≤ C|t|α1/s(q) ∑w r=1∥ηr∥r.

Acknowledgments. The authors are grateful to the Editor, an associate Editor, and two anonymous

referees for providing a detailed list of comments which greatly improved the presentation of the paper. The ﬁrst author was supported by the National Natural Science Foundation of China (11671299). The second author was supported by the Grant MTM2014-55966-P of the Spanish Ministry of Economy and Competitiveness.

References

[1] Bercu, B., Nguyen, T. M. N., Saracco, J. (2015). On the asymptotic behaviour of the recur-sive Nadaraya-Watson estimator associated with the recurrecur-sive sliced inverse regression method. Statistics 49(3), 660-679.

[2] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. Springer, New York.

[3] Cai, J., Kim, J. (2003). Nonparametric quantile estimation with correlated failure time data. Lifetime Data Analysis 9, 357-371.

[4] de Uña- Álvarez, J., Iglesias-Pérez, M. C. (2010). Nonparametric estimation of a conditional dis-tribution from length-biased data. Ann. Inst. Statist. Math. 62, 323-341.

[5] de U˜na- ´Alvarez, J., Liang, H. Y., Rodr´ıguez-Casal, A. (2010). Nonlinear wavelet estimator of the regression function under left truncated dependent data. J. Nonparametric Statist. 22(3), 319-344.

[6] Doukhan. P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics, vol. 85. Springer, Berlin.

[7] El Ghouch, A., Van Keilegom, I. (2008). Non-parametric regression with dependent censored data. Scand. J. Statist. 35(2), 228-247.

[8] Fan, J., Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. Chapman & Hall, London.