• No results found

Adaptive group bridge estimation for high dimensional partially linear models

N/A
N/A
Protected

Academic year: 2020

Share "Adaptive group bridge estimation for high dimensional partially linear models"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

R E S E A R C H

Open Access

Adaptive group bridge estimation for

high-dimensional partially linear models

Xiuli Wang and Mingqiu Wang

*

*Correspondence: [email protected] School of Statistics, Qufu Normal University, Jingxuan West Road, Qufu, 273165, P.R. China

Abstract

This paper studies group selection for the partially linear model with a diverging number of parameters. We propose an adaptive group bridge method and study the consistency, convergence rate and asymptotic distribution of the global adaptive group bridge estimator under regularity conditions. Simulation studies and a real example show the finite sample performance of our method.

MSC: 62E20; 62J07; 62F12

Keywords: adaptive group bridge; high dimension; partially linear model

1 Introduction

Consider the following model:

Y= xTβ+f(U) +ε, ()

where x = (xT

,xT, . . . ,xTpn)

Tis a covariate vector withx

j= (Xjk,k= , . . . ,dj)Tbeing adj× vector corresponding to thejth group in the linear part,β= (βTj,j= , . . . ,pn)T withβj being thedj× vector of regression coefficients,f is an unknown function ofU, and εis the random error with mean zero. Without loss of generality,U is scaled to [, ]. Furthermore, (x,U) andεare independent.

Variable selection for high-dimensional data is a hot and important issue. Penalized regression methods have been widely used in the literature such as [–], and so on. Among these methods, bridge regression including lasso and ridge as two well-known special cases has been studied by many authors (e.g., [–]). [] studied adaptive bridge estimation for high-dimensional linear models. In addition, group structure of variables arise always in many contemporary statistical modeling problems. [] proposed a group bridge method which not only effectively removes unimportant groups, but also maintains the flexibility of selecting variables within identified groups. [] investigated an adaptive choice of the penalty order in group bridge regression.

The aforementioned model () is just the partially linear model that originated from []. The partially linear model is a common semiparametric model enjoying the inter-pretability and flexibility. Our contributions in this paper include: () we propose an adap-tive group bridge method to achieve the group selection for a high-dimensional partially linear model; () we consider the choice of indexγ in the adaptive group bridge and use

(2)

leave-one-observation-out cross-validation (CV) to implement this choice. It can signif-icantly reduce the computational burden; () we give the consistency, convergence rate and asymptotic distribution of the adaptive group bridge estimator which is the global minimizer of the objective function.

The rest of the article is organized as follows. Section  gives the adaptive group bridge method. In Section , we show the assumptions and asymptotic results for the global adap-tive group bridge estimator. Section  shows computational algorithm and selection of tuning parameters. Simulation studies and real data are presented in Section . Section  gives a short discussion. Technical proofs are relegated to Appendix.

2 Adaptive group bridge in the partially linear model

Suppose that we have a collection of independent observations{(xi,Ui,Yi), in}from model (). That is,

Yi= xTiβ+f(Ui) +εi, i= , . . . ,n, ()

whereε, . . . ,εnare i.i.d. random errors with mean zero and finite varianceσ<∞. To obtain an estimate of functionf(·), we employ a B-spline basis. DenoteSnas the space of polynomial splines of degreem≥. Let{Bk(u), ≤kqn}be a normalized B-spline basis withBk∞≤, where · ∞is the sup norm. Then, for anyfnSn, we have

fn(u) = qn

j=

Bj(ujB(u)Tα.

Under some smoothness conditions, the nonparametric functionf can well be approxi-mated by functions inSn.

Consider the following adaptive group bridge penalized objective function:

n

i=

Yi– xTi β– B(Ui)Tα

+ pn

j=

λjβ, ()

where λj, j= , . . . ,pn, are the tuning parameters, and · denotes the L norm on

the Euclidean space. Let Y = (Y, . . . ,Yn)T, X= (Xijk, in, ≤jpn, kdj) =

(x, . . . , xn)Tand Z = (B(U), . . . , B(Un))T. Then () can be changed into

Ln(β,α) =Y–Xβ– Zα+ pn

j=

λjβ. ()

For someβ, the optimalαminimizingLn(·) meets the partial differential equation

∂Ln(β,α)/∂α= ,

namely,

(3)

LetH= Z(ZTZ)–ZT, note thatHis a projection matrix. We can rewrite the expression () as follows:

Qn(β) =(IH)(Y –Xβ)

+ pn

j=

λjβ. ()

For some fixedγ > , defineβˆ=arg minQn(β), thenβˆ is called the adaptive group bridge estimator. Ifβˆ is obtained, then the estimatorαˆ can be achieved. Thus we can get the estimator of the nonparametric part, namely,ˆfn(u) = B(u)Tα.ˆ

3 Asymptotic properties

In this section, we show the oracle property of the parametric part. For convenience of the statement, we first give some notations. Define g(u) =E(x|U=u) andx˜= x –E(x|U). Let(u) be the conditional covariance matrix ofx˜, i.e.,(u) =cov(x˜|U=u). Denote as the unconditional covariance matrix ofx˜, i.e., =E[(U)]. The corresponding sample version is G = (g(U), . . . , g(Un))Twith g(Ui) =E(xi|Ui) andX= (x˜, . . . ,x˜n)Twithx˜i= xiE(xi|Ui).

Let the true parameter be β= (βT, . . . ,βTpn)T T

,βT)T. Let A={≤jpn : βj = }be the index set of the nonzero groups. Without loss of generality, we assume that coefficients of the firstkngroup are nonzero, i.e.,A={, , . . . ,kn}. Let|A|=knbe the cardinality of the setA, which is allowed to increase withn. Forj∈/A,βj= . De-fineβ= (βTj,jA)T,β

= (βTj,j∈/A)T. Letd∗=max≤jpndj,ϕn=max{λj,jA}and ϕn=min{λj,j∈/A}.

Corresponding to the partition ofβ, denoteβˆ= (βˆT(),βˆT())Tand decompose

X= (XX), G= (GG), X= (XX), =

 

 

.

The following conditions are required for the B-spline approximation of functionf.

(C) The distribution ofUis absolutely continuous, and its density is bounded away

fromand∞.

(C) (Hölder conditions off(·)andgj(·), wheregjis thejth component ofg) Letl,δand Mbe real constants such that <δ≤andM> .f(·)andgj(·)belong to a class of functionsH,

H= h:h(l)(u) –h(l)(u)≤M|u–u|δ,for≤u,u≤

,

where <lm– andr=l+δ.

The following part lists all the reasonable conditions which are necessary to attain the asymptotic results.

(A) Letλmax( )andλmin( )be the largest and smallest eigenvalue of , respectively.

There exist constantsτandτsuch that

(4)

(A) There exist constants <b<b<∞such that

b≤min βj, ≤jkn

≤max βj, ≤jkn

b.

(A) n–XT(IH)X P ;E[tr(XT(IH)X)] =O(npn).

(A) d∗=O(),pn/n→andn–ϕnkn→.

(A) (a)ϕnkn//( √

npn+npnqnr)→; (b)ϕn(

n–p

n+√pnqnr)γ–/n→ ∞.

(A) For every≤jpnand≤kdj,E[XjkE(Xjk|U)]is bounded. Furthermore,

E(ε)is bounded.

Conditions (A) and (A) are commonly used. Condition (A) holds under some condi-tions. The proof can be found in Lemmas  and  in []. Condition (A) is used to obtain the consistency of the estimator. Condition (A) is needed in the proof of convergence rate. Condition (A) is necessary to attain the asymptotic distribution.

Theorem .(Consistency) Suppose thatγ > and conditions(A)-(A)hold,then

ˆββ=OP

n–dpn+qn–r+n–ϕnkn

,

namely, ˆββP .

Theorem . implies that under some conditions the estimators converge to the true values of parameters.

Theorem .(Convergence rate) Suppose that conditions(A)-(A)hold,then

ˆββ=OP

n–p

n+√pnqnr

.

This theorem shows that the adaptive group bridge can give the optimal convergence rate withpn→ ∞.

Theorem .(Oracle property) Suppose that <γ < ,n–k

nqn→and nq–n r→.If conditions(A)-(A)are satisfied,then we have

(i) Pr(βˆ()= )→,n→ ∞; (ii) Letu

n=nωTn(XT(IH)X)– (XT(IH)X)–ωnwithωnbeing some kn

j=dj-vector withωn= ,then

n/u–n ωTn(βˆ()β)→D N,σ.

This theorem states that the adaptive group bridge performs as well as the oracle [].

4 Computational algorithm and selection of tuning parameters

4.1 Computational algorithm

(5)

We take the initial valueβ(). Here the ordinary least square estimate is chosen as the initial valueβ(). The penalty termpλj(βj) =λjβ can be approximated as

pλj

βjpλjβ

()

j +   p

λjβ

()

j /β

()

j βj–β

()

j

,

whenβ()j > . The following iterative expression ofβcan be obtained:

β()=XT(IH)X+n

λ,γ

β()–XT(IH)Y, ()

where

λ,γ

β()=diag

p λj(β

()

j )

β()j Idj,j= , . . . ,pn

,

withIdjbeing adj×djunit matrix. If someβ

()

j is smaller than –, then we setβ

()

j = . The finial estimate can be obtained iteratively by formula () until the convergence is achieved.

4.2 Selection of the tuning parameters

For our method,qn,γ, andλj(j= , . . . ,pn) should be chosen. For convenience, cubic spline basis (m= ) is used. We setqn= . Simulation results demonstrate that this choice per-forms quite well. There are also many tuning parameters that should be chosen. In fact, we only need to select one tuning parameter by settingλj=λ/β()j . We use ‘leave-one-observation-out’ cross-validation (CV) to selectλandγ. Due to the convergence of the algorithm, we have

ˆ

β=XT(IH)X+,γ(β)ˆ

–XT(IH)Y,

whereβˆ is obtained based on the whole data set. Note that it is the solution of the ridge regression

Y∗–X∗β+nβ,γ(β)β,ˆ ()

where Y∗= (IH)Y andX∗= (IH)X. Let Y∗= (y, . . . ,yn)TandX= (x

, . . . , xn)T. The CV error is

CV(λ,γ) =  n

n

i=

yi – xiTβˆ–i,

whereβˆ–iis achieved by solving () without theith observation. The computation of the CV error is intensive, so we will use the following formula, which can be proved similar to []:

CV(λ,γ) =  n

n

i=

(6)

where Dii is the (i,i)th diagonal element of (IH)X[XT(IH)X+,γ(β)]ˆ –XT(I

H). It is obvious that this method can significantly reduce the computational bur-den.

5 Simulation studies and application

In this section, we investigate the finite sample performance of the adaptive group bridge method through simulations and a real data application.

5.1 Monte Carlo simulations

We simulate  datasets consisting ofnobservations from the following partially linear model:

Yi= pn

j=

xTijβj+cos(πUi) +εi, i= , . . . ,n,

wheren= , and the errorεiN(,σ) withσ= ., , . We consider that there are pngroups withpn= , ,  and each group consists of three variables. The true values of parametersβT = (., , .),βT = (, –, ),βT = (., ., .),βT=· · ·=βTpn= (, , ). Uifollows the uniform distribution on [, ]. To generate covariate x = (xT,xT, . . . ,xTpn)

T

withxj= (Xjk,k= , , )T, we first simulateR, . . . ,Rpn independently from the standard

normal distribution. Next, simulateZj,j= , . . . ,pn, from a multivariate normal distribu-tion with the mean zero andCov(Zj,Zl) = .|jl|. Then the covariates are generated as Xjk= (Zj+R(j–)+k)/

,j= , . . . ,pn,k= , , .

We compare the adaptive group bridge (AGB) with the group lasso (GL) and the group bridge (GB). The following three performance measures are calculated:

. Lloss of parametric estimate, which is defined asββ. . Average number of nonzero groups identified by the method (NN).

. Average number of nonzero groups identified by the method that are truly nonzero (NNT).

Group selection results are depicted in Table . The numbers in the parentheses in the columns labeled ‘NN’ and ‘NNT’ are the corresponding sample standard deviations based on the  runs. Boxplots of theLlosses under different settings are given in Figures -.

From Table , we can have the following observations:

() Both GB and AGB perform better than GL for all settings. All these three methods can retain all the true nonzero groups, but GL always keeps more redundant groups that are unrelated with the response than both GB and AGB.

() AGB performs much better for largerσandpn. Whenpn= for AGB, groups

selected for the caseσ= are about .% lower than that for the caseσ= .. While groups selected for GB decrease by .% in the same situation.

() Forpn= , GB performs better than AGB, but the stability of GB is bad forσ= . Figures - presentLlosses with varyingσ andpn. We can see that the performances

of estimates are similar for GB and AGB. Forpn=  and , both GB and AGB perform better than GL. However, when pn= , the median ofLlosses for all these three are

(7)
[image:7.595.116.481.95.320.2]

Table 1 Group selection results

pn Method σ= 0.5 σ= 1 σ= 4

NN NNT NN NNT NN NNT

10 GL 7.80 3 7.59 3 6.02 3

(1.231) (0) (2.396) (0) (1.255) (0)

GB 4.64 3 4.80 3 4.55 3

(1.259) (0) (1.356) (0) (1.258) (0)

AGB 5.22 3 5.00 3 4.56 3

(1.605) (0) (1.735) (0) (1.131) (0)

30 GL 20.47 3 11.49 3 14.46 3

(2.504) (0) (4.464) (0) (2.022) (0)

GB 11.04 3 10.96 3 10.18 3

(3.643) (0) (3.784) (0) (2.350) (0)

AGB 13.06 3 10.64 3 9.57 3

(5.510) (0) (5.921) (0) (2.046) (0)

50 GL 33.17 3 16.48 3 22.37 3

(3.223) (0) (5.018) (0) (3.308) (0)

GB 17.23 3 17.79 3 15.96 3

(4.608) (0) (3.952) (0) (3.284) (0)

AGB 19.25 3 15.67 3 15.68 3

(8.437) (0) (8.385) (0) (3.684) (0)

[image:7.595.119.478.395.685.2]
(8)
[image:8.595.118.477.82.371.2] [image:8.595.117.477.420.711.2]

Figure 2 Boxplots ofL2loss forpn= 30.

(9)
[image:9.595.117.478.95.276.2]

Table 2 Estimates of the wage data

Variable Description GL GB AGB

edu Number of years of education 0.0694 0.0668 0.0635

south 1 = southern region, 0 = other –0.0723 –0.0679 –0.0490

sex 1 = Female, 0 = Male –0.1999 –0.1983 –0.2031

union 1 = union member, 0 = nonmember 0.1951 0.1934 0.2030

race 1 = other, 0 = White –0.0559 –0.0585 –0.0582

1 = Hispanic, 0 = White –0.0537 –0.0615 –0.0614

occup 1 = management, 0 = other 0.1874 0.2173 0.2516

1 = sales, 0 = other –0.0797 –0.0809 –0.0721

1 = clerical, 0 = other 0.0166 0.0262 0.0430

1 = service, 0 = other –0.1171 –0.1173 –0.1104

1 = professional, 0 = other 0.1533 0.1768 0.2061

sector 1 = manufacturing, 0 = other 0.0848 0.0912 0.0994

1 = construction, 0 = other 0.0546 0.0622 0.0674

marr 1 = married, 0 = other 0.0000 0.0000 0.0000

5.2 Wage data analysis

The workers’ wage data from Berndt[] contains a random sample of  observations on  variables sampled from the current population survey of . It provides information on wages and other characteristics of the workers, including continuous variables: the number of years of education, years of work experience, age and nominal variables: race, sex, region of residence, occupational status, sector, marital status and union membership. Our goal is to study the important factors for the wage, so it is reasonable to use our proposed method for these data.

From the residual plot, we can easily see that the variance of wages is not a constant. So thelogtransformation is used to stabilize the variance of wages. Due to the multicollinear-ity problem between age and experience, we need to get rid of either age or experience. Here we remove the age variable from the model. Xie and Huang [] analyzed these data without considering the transformation ofY. Furthermore, they did not consider group selection of factors. Similar to Xie and Huang [], we fit these data using a partially linear model withUbeing ‘years of work experience’.

Table  reports estimated regression coefficients of GL, GB and AGB. All these three methods exclude marital status. We use the first  observations as a training dataset to select and fit the model, and use the rest of  observations as a testing dataset to evalu-ate the prediction ability of the selected model. The prediction performance is measured by the median of{|yiyˆi|,i= , , . . . , }for GL, GB and AGB using the testing data, respectively. Hereyi’s are those  observations in the testing dataset andyˆi’s are corre-sponding prediction values. The median absolute prediction errors of GL, GB and AGB are ., . and ., respectively. Therefore, we can conclude that the AGB gives the smallest prediction error, so it is an attractive technique in group selection.

6 Discussion

(10)

Appendix

Proof of Theorem. By the definition ofβ, it is easy to getˆ

(IH)(Y –Xβ)ˆ + pn

j=

λj ˆβ≤(IH)(Y –Xβ) 

+ pn

j=

λjβ,

that is,

(IH)(Y –Xβ)ˆ –(IH)(Y –Xβ)≤ pn

j=

λjβ.

As Y =Xβ+f(U) +εwithf(U) = (f(U), . . . ,f(Un))Tandε= (ε, . . . ,εn)T, we can rewrite

the upper inequality as follows:

(IH)X(βˆ–β)– f(U) +εT(IH)X(βˆ–β)≤ pn

j=

λjβj

γ

.

Let

an=n–/

XT(IH)X/(βˆβ

),

bn=n–/

XT(IH)X–/XT(IH)f(U) +ε.

Then we have

an≤

anbn+bn

≤ n

pn

j=

λjβ+ bn.

Since|A|=kn, under condition (A),

n

pn

j=

λjβ =O

ϕnkn n

.

While

bn=  n

f(U) +εT(IH)XXT(IH)X–XT(IH)f(U) +ε

≤  nε

TAε+nf(U)

TAf(U), ()

where

A= (IH)XXT(IH)X–XT(IH).

For the first term on the right-hand side of (),

E

nε

TAε

=σ

n tr

(11)

Thus

n–εTAε=OP

n–dpn

. ()

For the second term on the right-hand side of (), by conditions (C) and (C),

E

nf(U)

TAf(U)

≤ 

nE λmax (IH)X

XT(IH)X–

XT(IH)

×trf(U)T(IH)f(U)

=  nE

f(U)T(IH)f(U)=Oqn–r. ()

Combining ()-(),

bn=OP

n–dpn+q–n r

.

By conditions (A) and (A),

Ean=  nE

(βˆ–β)TXT(IH)X(βˆ–β)

=E

(βˆ–β)T

nX

T(IH)X

(βˆ–β)

+E(βˆ–β)T (βˆ–β)

τ

E ˆββ

.

Therefore

ˆββ=OP

n–dpn+qn–r+n–ϕnkn

.

Under condition (A), we have

ˆββP .

Proof of Theorem . Letμn=n–p

n+qnr +

n–ϕn

kn, we can choose a sequence

{rn,rn> }which satisfiesrn→. PartitionR

pn

j=dj\{}into shells{S

nj:j= , , . . .}, where Snj={β: j–rnββ< jrn}. For an arbitrary fixed constantL∈R+, if ˆββis

larger than Lrn,βˆis in one of the shells withjL, we have

Pr ˆββ ≥Lrn

=

l>L,lr n>Lμn

Pr(βˆ∈Snl)

+

l>L,lrnLμn

Pr(βˆ ∈Snl) (Lis an arbitrary constant),

where

l>L,lrn>Lμ n

Pr(βˆ∈Snl)≤Pr ˆββ ≥L–μ

n

(12)

and

l>L,lr n≤Lμn

Pr(βˆ∈Snl)

=

l>L,lrnLμn

Pr

ˆ

βSnl,nτ

+

l>L,lr n≤Lμn

Pr

ˆ

βSnl,n> τ

,

wheren=n–XT(IH)X– . By condition (A),

l>L,lr n≤Lμn

Pr

ˆ

βSnl,n>τ 

≤Pr

n>τ 

=o().

Therefore,

Pr ˆββ ≥Lrn

=o() +

l>L,lrnLμn

Pr

inf

βSnl

Qn(β) –Qn(β)

< ,nτ

.

Since

Qn(β) –Qn(β)

=(IH)X(β–β)– f(U) +εT(IH)X(β–β)

+ pn j= λj

βjγβjγ

≥(IH)X(β–β)– f(U) +εT(IH)X(β–β)

+ kn j= λj

βjγβ

= In+ In+ In.

For In,

In≥ inf

βSnl

ββ

,

for allβSnl, there existsββ≥l–rn, therefore In≥l–rn. For In, we have

|In|=

kn

j=

λjγβ∗j

γ–

βjβj

ϕnγ

kn j= β∗ j γ–

(13)

whereβj is betweenβjandβj. By condition (A) and since we only need to considerβ withβSnl, lrn≤Lμn, there exists a constantC>  such that

|In| ≤Cϕnγ

kn

j=

βjβjCϕnkn/γββ.

So for allβSnlsuch that|In| ≤Cϕnkn/γlrn, by the Markov inequality, we have

Pr

inf

βSnl

Qn(β) –Qn(β)

≤

≤Pr

sup

βSnl

|In| ≥l–rnCϕnkn/γlrn

E(supβSnl|In|)

l–rnCϕnkn/γlrn .

Using the Cauchy-Schwarz inequality, we have

E

sup

βSnl

|In|

≤Ef(U) +εT(IH)XXT(IH)f(U) +ε/

×E

sup

βSnl

ββ /

≤l+/rn

EεT(IH)XXT(IH

+Ef(U)T(IH)XXT(IH)f(U)/,

where

EεT(IH)XXT(IH=σEtr(IH)XXT(IH)=O(np n)

and

Ef(U)T(IH)XXT(IH)f(U)

EtrXT(IH)Xtrf(U)T(IH)f(U)

=O(npn)O

nq–n r=Onpnq–n r

.

Accordingly,

E

sup

βSnl

|In|

Clrn

npn+npnqnr

.

Then we can get

l>L,lrnLμ n

Pr(βˆ∈Snl)l>L

Clrn(npn+npnqnr) l–rnCϕnkn/γlrn

(14)

We choosern= (√pn/n+√pnqnr), we have

l>L,lrnLμn

Pr(βˆ∈Snl) =

l>L

C

τl––Cϕnkn/γ/( √

npn+npnqnr) .

By condition (A)(a)ϕnkn//( √

npn+npnqnr)→, for sufficiently largen,

l––Cτ–λjkn// √

npn+npnM

rg

n

≥l–.

Thus

l>L,lrnLμ n

Pr(βˆ∈Snl)l>L

C

l– ≤C –(L–).

LetL→ ∞, then

l>L,lrnLμn

Pr(βˆ∈Snl)→.

Hence

ˆββ=OP

n–p

n+√pnqnr

.

Proof of Theorem. (i) By Theorem ., for sufficiently largeC,βˆ lies in the ball{β:

ββvnC}with probability converging to , wherevn=

n–p

n+√pnqnr. Letβ()=

β+vnνandβ()=β+vnν=vnνwithν=ν+ν≤C. Let

Vn(ν,ν) =Qn(β(),β()) –Qn(β, ) =Qn(β+vnν,vnν) –Qn(β, ).

Then βˆ andβˆ can be attained by minimizingVn(ν,ν) over νC, except on an

event with probability converging to zero. We only need to show that, for someνandν

withνC, ifν> ,

PrVn(ν,ν) –Vn(ν, ) > 

→, n→ ∞.

Some simple calculations show that

Vn(ν,ν) –Vn(ν, ) =vn(IH)Xν 

+ vn(Xν)T(IH)(Xν)

– vn

f(U) +εT(IH)(Xν) +

j∈/A

λjvnν

= IIn+ IIn+ IIn+ IIn.

For the first two terms IInand IIn,

IIn+ IIn≥–vn(IH)Xν 

= –nvnCoP() +τ

(15)

For IIn, we have

Ef(U) +εT(IH)Xν

≤ Ef(U)T(IH)XννTXT(IH)f(U)

+εT(IH)XννTXT(IH

CE

trXT(IH)X

trf(U)T(IH)f(U)

+σEtrXT(IH)X

=Onpnq–nr+npn

.

Thus we have

IIn=vn

np/n qnr+n/p/n OP().

For IIn, by  <γ< ,

j∈/A

vnν /γ

j∈/A

vnνj /

=vnν.

Accordingly,

IIn≥ϕnvγnνγ.

By condition (A)(b), for someν> , we have

PrVn(ν,ν) –Vn(ν, ) > 

→.

(ii) Letωnbe some kn

j=dj-vector withωn= . By Theorem .(i), with probability tending to , we have the following result:

∂Qn(β())

∂β()

β()=βˆ()

=XT(IH)X(βˆ()–β) –XT(IH)

f(U) +ε+ξn= ,

whereξn= (λγ ˆβγ–βˆ

T

, . . . ,λknγ ˆβkn

γ–βˆT

kn)

T. We consider the limit distribution

n–/ωTn –/ XT(IH)X

(βˆ()β)

=n–/ωTn –/ XT(IH)f(U) +n–/ωTn –/ XT(IH

n–/ωTn –/ ξn

=Jn+Jn+Jn.

ForJn,

Jn=n–ωTn –/ XT(IH)f(U)=OP

(16)

ForJn, by conditions (A) and (A), we have

EJnn–τ–ϕnγ

kn

j=

E ˆβj(γ–)=On–ϕnkn

.

ForJn,

Jn=n–/ωTn –/ GT(IH)ε+n–/ωTn –/ XTε

n–/ωTn –/ XTHε

=Kn+Kn+Kn.

Under conditions (C) and (C),

EKn=n–ωTn –/ EGT(IH)εεT(IH)G

–/

 ωn

=Oknq–n r

.

By condition (A), we have

EKn=n–ωTn –/ EXTHεεTHX

–/

 ωn

=On–knqn

.

Now we focus onKn

Kn=n–/ωTn –/ XTε

=√ n

n

i=

sniεi.

First,

E(sniεi) = ;

Var

n

i=

sniεi

= n

i=

Var(sniεi) =σ.

Next we verify the conditions of the Lindeberg-Feller central limit. For any> ,

n

i=

Esniεi|sniεi|>

=nEsnε|snε|>

nEsnε/Pr|snε|>

/

.

By condition (A),

Esnε=n–E ωTn –/ x–E(x|U)

x–E(x|U)

T –/

 ωn

n–ρminωnωTn

ρmax –E x–E(x|U)

T

x–E(x|U)

(17)

n–ρminωnωTn

ρmax –kndkn

j=

dj

k=

EXjkE(Xjk|U)

=Oknn–

and

P|snε|>

≤ 

E(snε) 

= σ

n –ωT

n –/ E x–E(x|U)

x–E(x|U)

T –/

 ωn

= σ

n

–=On–.

Thus we have

n

i=

Esniεi|sniεi|>

=Onknn–n–/

=o().

This means thatKn

D

N(,σ). Using Slutsky’s theorem, we have

n–/ωTn –/ XT(IH)X

(βˆ()β)→D N,σ.

Letu

n=nωTn(XT(IH)X)– (XT(IH)X)–ωn, then

n/u–nωTn(βˆ()β)→D N,σ.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 11401340).

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 5 January 2017 Accepted: 16 June 2017

References

1. Tibshirani, R: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B58, 267-288 (1996)

2. Frank, I, Friedman, J: A statistical view of some chemometrics regression tools. Technometrics35, 109-148 (1993) 3. Fan, J, Li, R: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc.96,

1348-1360 (2001)

4. Zou, H: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc.101, 1418-1429 (2006)

5. Zou, H, Hastie, T: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B67, 301-320 (2005) 6. Fu, W: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat.7, 397-416 (1998) 7. Knight, K, Fu, W: Asymptotics for lasso-type estimators. J. Comput. Graph. Stat.28, 1356-1378 (2000) 8. Liu, Y, Zhang, H, Park, C, Ahn, J: Support vector machines with adaptivelqpenalty. Comput. Stat. Data Anal.51,

6380-6394 (2007)

9. Huang, J, Horowitz, J, Ma, S: Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Stat.36, 587-613 (2008)

(18)

11. Chen, Z, Zhu, Y, Zhu, C: Adaptive bridge estimation for high-dimensional regression models. J. Inequal. Appl.2016, 258 (2016)

12. Huang, J, Ma, S, Xie, H, Zhang, C: A group bridge approach for variable selection. Biometrika96, 339-355 (2009) 13. Park, C, Yoon, Y: Bridge regression: adaptivity and group selection. J. Stat. Plan. Inference141, 3506-3519 (2011) 14. Engle, R, Granger, C, Rice, J, Weiss, A: Semiparametric estimates of the relation between weather and electricity sales.

J. Am. Stat. Assoc.81, 310-320 (1986)

15. Xie, H, Huang, J: Scad-penalized regression in high-dimensional partially linear models. Ann. Stat.37, 673-696 (2009) 16. Donoho, D, Johnstone, I: Ideal spatial adaptation by wavelet shrinkage. Biometrika81, 425-455 (1994)

17. Wang, L, Li, H, Huang, J: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc.103, 1556-1569 (2008)

Figure

Table 1 Group selection results
Figure 2 Boxplots of L2 loss for pn = 30.
Table 2 Estimates of the wage data

References

Related documents

T he purpose of the present problem is to study the effect of porous medium on unsteady laminar free convective flow of dusty viscous fluid along a moving porous hot vertical plate

Holistic Development – Kotahitanga Empowerment – Whakamana Relationships – Ngà hononga.. Family and

When you return to the Dashboard, you will see the hostname of the Domain Controller or other Windows machine that you installed the Connector on the 'System Settings &gt; Sites

Cultivation of mango, pomegranate and banana is adversely affected due to damage by one or more of the above documented pests but a few pests such as Sigatoka leaf spot and thrips

Stand establishment is the most risky period of the entire irrigated pasture production cycle. By following several recommended practices, a grower can greatly reduce the risk

Dietary Supplements that are regulated as foods and not drugs; that do not contain genetically modified components, or sourced from GMO materials; do not employ nanotechnology

Similarly with gatekeeper liability, if corporate auditors, attorneys, and underwriters are made liable for the financial fraud of the corporations to whom they provide