Penalized empirical likelihood based variable selection


PENALIZED EMPIRICAL LIKELIHOOD BASED VARIABLE SELECTION

by

© Tharshanna Nadarajah

A thesis submitted to the School of Graduate Studies in partial fulfilment of the requirements for the Degree of Master of Science in Statistics

Department of Mathematics and Statistics
Memorial University of Newfoundland


Abstract

Variable selection is an important topic in high-dimensional statistical modeling, especially in generalized linear models. Several variable selection procedures have been developed in the literature, including the sequential approach, prediction-error approach, and information-theoretic approach. All of these are computationally expensive. A new method based on penalized likelihood has been lauded for its computational efficiency and stability. In this approach the variable selection and the estimation of the coefficients are carried out simultaneously. The parametric likelihood is a crucial component, but in many situations a well-defined parametric likelihood is not easy to construct. To overcome this problem, Variyath (2006) proposed a penalized-empirical-likelihood (PEL) based variable selection where empirical likelihood is constructed based on a set of estimating equations. We investigate the asymptotic properties of the new method, and develop an algorithm for estimating the parameters. Our simulation studies show that when a parametric model is available, PEL-based variable selection gives results similar to those achieved by parametric-likelihood variable selection. The former method outperforms the latter when the parametric model is misspecified. We extend our approach to variable selection in Cox's proportional hazard model.


Acknowledgements

I would like to express my appreciation to my supervisor Dr. Asokan Mulayath Variyath, for giving me the opportunity to work with him. His continuous guidance, support and patience over the last two years have been invaluable.

I gratefully acknowledge the financial support provided by Memorial University of Newfoundland's School of Graduate Studies, the Department of Mathematics & Statistics, and my supervisor in the form of graduate fellowships and teaching assistantships. I would like to thank all the faculty in our department for their support. It would not have been possible for me to complete the requirements of the graduate program without their guidance.

I express my profound appreciation to my parents, brothers, wife Premni, and beautiful son Kavevarman, for their encouragement, understanding, and patience even during the difficult times.

who directly or indirectly encouraged me in the Master's program and contributed to

Finally, this thesis is dedicated to my parents and the teachers who have supported me ever since the beginning of my studies.


List of Tables

1.1 Response and covariates of doctor-visit data
Simulation results for linear regression model
Linear regression model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
5.3 Simulation results for Poisson regression model
Poisson regression: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Simulation results for logistic regression model
Logistic regression model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Estimates of Poisson regression coefficients, with their standard errors in parentheses, for model identified by different variable selection methods
Simulation results for Cox's proportional hazards model
Cox's proportional hazards model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Simulation results for Cox's proportional hazards model
Cox's proportional hazards model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Estimates of Cox's proportional hazards model coefficients for full model
Estimates of regression coefficients in Cox's proportional hazards model


List of Figures

2.1 $L_p$ penalty function
2.2 SCAD and HARD penalty functions


Chapter 1

Introduction

1.1 Background of Variable Selection

Variable selection is an important topic in statistical modeling, especially in generalized linear models (GLM). In practice, a large number of covariates, $(X_1, X_2, \ldots, X_p)$, are believed to have an influence on the response variable $y$ of interest. However, some covariates have no influence or a weak influence, and a regression model that includes all the covariates is not advisable. Excluding the unimportant covariates results in a simpler model with better interpretive and predictive value.

The problem of identifying a submodel that adequately models the response is generally referred to as the variable selection problem. Statistically speaking, variable selection is a way to reduce the complexity of the model, in some cases by accepting a small amount of bias to improve the precision. The main advantages of selecting a subset of the covariates are:

• The interpretation of a large model can be difficult.

• The prediction accuracy may be improved by dropping redundant and irrelevant variables.

• Knowing which variables are significant gives insight into the nature of the prediction problem and allows a better understanding of the final model.

• It is cheaper to measure a reduced set of variables.

For example, consider the doctor-visit data from the Australian health survey of 1977-78, which is discussed in detail by Cameron and Trivedi (1998). The data set consists of a response variable (the number of doctor visits in the previous two weeks by an adult) and twelve covariates, including health indicators and general factors, which are listed in Table 1.1. Our goal is to model the relationship between the response and the covariates. The model with all covariates is not interesting since it is difficult to interpret and will have poor prediction precision. We aim to find a simpler model that gives a reasonable description of the data-generating mechanism. The initial analysis of and variable selection for this data set are discussed in Chapter 5. In the next subsection we will discuss commonly used regression models and estimation procedures where variable selection is considered important.

Variable        Description
y - Dvisits     Number of doctor visits in previous two weeks
X1 - Sex        1 if female; 0 if male
X2 - Age        Age in years divided by 100
X3 - Agesq      Age squared
X4 - Income     Annual income in Australian dollars divided by 1000
X5 - Levyplus   1 if covered by private health insurance; 0 otherwise
X6 - Freepoor   1 if covered by government because low income, recent immigrant, unemployed; 0 otherwise
X7 - Freerepa   1 if covered free by government because elderly, disability pension, invalid veteran, or family of deceased veteran; 0 otherwise
X8 - Illness    Number of illnesses in previous 2 weeks, with 5 or more coded as 5
X9 - Actdays    Number of days of reduced activity in previous 2 weeks due to illness or injury
X10 - Hscore    General health questionnaire score using Goldberg's method; high score indicates bad health
X11 - Chcond1   1 if chronic condition(s) but not limited in activity; 0 otherwise
X12 - Chcond2   1 if chronic condition(s) and limited in activity; 0 otherwise

Table 1.1: Response and covariates of doctor-visit data

1.1.1 Linear Models

Linear models have been the mainstay of statistics for the past thirty years and remain one of our most commonly used statistical tools. In linear models, the data are modeled using linear functions of the covariates, and the unknown parameters are estimated from the data. For a given data set $\{y_i; x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ units/subjects, a linear regression model assumes that the relationship between the response variable $y_i$ and the $p$-dimensional regressors $x_i$ is linear. Thus, the model has the form

$$y = X\beta + \epsilon, \tag{1.1}$$

where $\epsilon$ is the error term, $X$ is an $n \times p$ matrix of covariate values, and $\beta$ is a vector of unknown parameters to be estimated. A violation of the linearity assumption between the response and the explanatory variables or the distributional assumption of the random error may increase the model variation. The method of least squares is the most popular method for estimating the regression parameters. This approach minimizes the residual sum of squares,

$$RSS(\beta) = \sum_{i=1}^{n} (y_i - x_i^T\beta)^2.$$

In matrix form, the residual sum of squares can be written

$$RSS(\beta) = (y - X\beta)^T (y - X\beta).$$

Hence, the ordinary least-squares estimate of $\beta$ is given by

$$\hat\beta = (X^T X)^{-1} X^T y,$$

and the fitted values at the training inputs are

$$\hat y = X\hat\beta = X (X^T X)^{-1} X^T y. \tag{1.2}$$
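As a hedged illustration of the least-squares estimate and fitted values above, the following Python sketch works through a small made-up data set. The data, the variable names, and the use of NumPy are assumptions for illustration and are not part of the thesis.

```python
import numpy as np

# Made-up data (an assumption for illustration): y = 1 + 2*x exactly.
x = np.arange(5.0)
X = np.column_stack([np.ones_like(x), x])   # n x p design matrix with an intercept column
y = 1.0 + 2.0 * x

# Normal-equations form of the OLS estimate, (X^T X)^{-1} X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer equivalent based on a least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                        # fitted values at the training inputs
rss = float((y - y_hat) @ (y - y_hat))      # residual sum of squares in matrix form
```

Both routes give the same estimate here; the solver-based route is usually preferred when the columns of the design matrix are nearly collinear.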

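If we assume that $\epsilon \sim N(0, \sigma^2 I_n)$, then the likelihood function of $y$ can be written

$$L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right\}.$$

Let $\ell(\beta, \sigma^2) = \log L(\beta, \sigma^2)$, then the partial derivative of $\ell(\beta, \sigma^2)$ with respect to $\beta$ is

$$\frac{\partial \ell(\beta, \sigma^2)}{\partial \beta} = \frac{1}{\sigma^2} X^T (y - X\beta).$$

Setting this derivative to zero shows that the maximum likelihood estimate coincides with the least-squares estimate of $\beta$.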

1.1.2 Generalized Linear Model (GLM)

Generalized linear models are defined by Nelder and Wedderburn (1972). They include linear regression models, logistic and probit models for categorical responses, and log-linear models. For all these models, a linear relationship is assumed between the response variable $y$ and covariates $X$ through some link function. The conditional expectation of $y$ given $X$ is specified as

$$\mu = E(y \mid X) = g(X\beta), \tag{1.4}$$

where $g(\cdot)$ is a known link function and $\beta$ is the vector of regression parameters. A GLM includes a random component specifying the conditional distribution of the response variable $y$ given the explanatory variable. Traditionally, the random component is a member of an exponential-family distribution such as the Gaussian, binomial, Poisson, gamma, or inverse-Gaussian. The estimation proceeds by defining a measure of goodness-of-fit between the observed data and the fitted values generated by the model. The parameter estimates are the values that minimize the goodness-of-fit criterion. We primarily estimate the parameters by maximizing the likelihood for the observed data. The log-likelihood based on a set of independent observations $y_1, y_2, \ldots, y_n$ is

$$\ell(\mu; y) = \sum_{i=1}^{n} \log f(y_i; \mu_i),$$

where $f(y_i; \mu_i)$ denotes the density or probability function of $y_i$. The goodness-of-fit criterion is

$$D(y; \mu) = 2\ell(y; y) - 2\ell(\mu; y);$$

it is called the scaled deviance. Note that $\ell(y; y)$ is the maximum likelihood for an exact fit in which the fitted values are equal to the observed data, and it does not depend on the parameters. Maximizing $\ell(\mu; y)$ is equivalent to minimizing $D(y; \mu)$ with respect to $\mu$, subject to the constraints imposed by the model.
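To make the scaled deviance concrete, the sketch below evaluates it for a Poisson response, where it reduces to $2\sum_i \{y_i \log(y_i/\mu_i) - (y_i - \mu_i)\}$. The choice of the Poisson family (with dispersion one), the function name, and the counts are assumptions made only for this illustration.

```python
import math

def poisson_deviance(y, mu):
    """Scaled deviance 2*l(y; y) - 2*l(mu; y) for Poisson responses (dispersion 1)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y*log(y/mu) is taken as 0 when y == 0, which is its limiting value.
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

y = [0, 1, 3, 2, 4]                           # hypothetical counts
saturated = poisson_deviance(y, y)            # exact fit: fitted values equal the data
mean_only = poisson_deviance(y, [sum(y) / len(y)] * len(y))   # intercept-only fit
```

The exact fit has zero deviance, and any constrained model has a positive deviance, so minimizing the deviance over the model's fitted values is the same as maximizing the likelihood.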


1.1.3 Quasi-Likelihood (QL)

When there is insufficient information about the data for us to specify a parametric model, quasi-likelihood is often used. In this situation we can develop the statistical analysis based on approximations to the likelihood, and we concentrate on cases where the observations are independent. Suppose we have a vector of independent responses, $y$, with mean $\mu$ and diagonal covariance matrix $\sigma^2 V(\mu)$. We assume that $\mu$ is a function of covariates and some regression parameters $\beta$. To construct the quasi-likelihood, we start by looking at a single component $y$ of the response vector. Under the above conditions, the function

$$U = \frac{y - \mu}{\sigma^2 V(\mu)}$$

has the following properties:

$$E(U) = 0, \qquad V(U) = \frac{1}{\sigma^2 V(\mu)}, \qquad \text{and} \qquad -E\left(\frac{\partial U}{\partial \mu}\right) = \frac{1}{\sigma^2 V(\mu)}.$$

Most of the first-order asymptotic theory concerned with the likelihood is based on these properties. It is therefore not surprising that

$$Q(\mu; y) = \int_{y}^{\mu} \frac{y - t}{\sigma^2 V(t)} \, dt$$

behaves like a log-likelihood function for $\mu$; this is called the quasi-likelihood. The quasi-likelihood for complete data is

$$Q(\mu; y) = \sum_{i=1}^{n} Q(\mu_i; y_i).$$

The quasi-deviance function for a single observation can be written

$$D(y; \mu) = -2\sigma^2 Q(\mu; y) = 2\int_{\mu}^{y} \frac{y - t}{V(t)} \, dt.$$

The quasi-likelihood estimating equations for the regression parameters $\beta$ are obtained by differentiating $Q(\mu; y)$. They can be written in the form $U(\beta) = 0$, where

$$U(\beta) = D^T V^{-1} (y - \mu) / \sigma^2$$

is called the quasi-score function and $D$ is the derivative of $\mu(\beta)$ with respect to $\beta$. The Newton-Raphson method is widely used to estimate the parameters.
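The quasi-score equations can be solved numerically by a Newton-type iteration. The sketch below uses Fisher scoring (Newton-Raphson with the expected information $D^T V^{-1} D$) under two assumptions that are not taken from the thesis: a log link, $\mu = \exp(X\beta)$, and the variance function $V(\mu) = \mu$. The data and function names are hypothetical.

```python
import numpy as np

def quasi_score(beta, X, y, sigma2=1.0):
    """Quasi-score U(beta) = D^T V^{-1} (y - mu) / sigma^2 (log link, V(mu) = mu assumed)."""
    mu = np.exp(X @ beta)
    D = mu[:, None] * X                  # derivative of mu(beta) with respect to beta
    return D.T @ ((y - mu) / mu) / sigma2

def fit_quasi(X, y, n_iter=50, tol=1e-10):
    """Fisher scoring for U(beta) = 0; sigma^2 cancels in the update step."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # assumes the first column of X is the intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        D = mu[:, None] * X
        info = D.T @ (D / mu[:, None])   # D^T V^{-1} D
        step = np.linalg.solve(info, D.T @ ((y - mu) / mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

x = np.arange(5.0)
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 4.0, 7.0, 12.0])    # hypothetical responses
beta_hat = fit_quasi(X, y)
```

Only the mean and variance functions enter the iteration, which is why no full distributional assumption is needed.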

1.2 Variable Selection Methods

The main objective of variable selection methods is to identify a simpler adequate model that is easier to interpret than the full model. In linear models, the submodel relates the response variable $y$ to a subset of components of $X$ in the form

$$y = X_{(s)} \beta_{(s)} + \epsilon,$$

where $X_{(s)}$ is a subset of the components of $X$, $\beta_{(s)}$ is a vector of the corresponding regression parameters, and $s \subseteq \{1, 2, \ldots, p\}$. The variable selection problem is to find the best subset $s$ such that the submodel is optimal according to some criterion that gives a good description of the data-generating mechanism. Several methods have been developed in the literature for the identification of the best submodel. These methods can be broadly classified into four categories: sequential approaches, prediction-error approaches, information-theoretic approaches, and penalized-likelihood approaches. In the next section we will discuss existing variable selection procedures and their advantages and disadvantages.

1.2.1 Sequential Approaches

The sequential approaches were developed in the early 1960s when computing resources were limited. In these approaches, only some of the possible submodels are evaluated to identify the best model. In the forward-selection approach, we start with an intercept model and add the variables one at a time. At each step, each variable that is not already in the model is tested for inclusion, and the most significant variable is added to the model. This process continues until none of the remaining variables are significant when added to the model or there are no more variables. Because of the complexity that arises from the nature of this procedure, it is essentially impossible to control the error rate. Moreover, the addition of a new variable may change the significance of one or more variables already included in the model.

An alternative approach is backward elimination. In this approach, we start with a model with all the variables of interest. Then the least important variable is dropped, provided it is not significant. We continue this process by successively re-fitting reduced models and applying the same rule until all the variables remaining in the model are statistically significant. Backward elimination also has drawbacks. Sometimes variables that are dropped would be significant in the final reduced model. This suggests that a compromise between forward selection and backward elimination should be considered.

Efroymson (1960) proposed a stepwise-regression approach that is a combination of the above two approaches. This method uses forward selection, but after the addition of each variable, backward elimination is applied to potentially remove variables already in the model. Stepwise regression does not guarantee to find an optimal submodel. The sequential approaches are computationally less demanding than the other approaches.
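A minimal sketch of forward selection for a linear model is given below. It is only one possible reading of the procedure: the entry rule (a partial F statistic compared with a fixed threshold of 4.0), the helper names, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y, columns):
    """Residual sum of squares of the least-squares fit on an intercept plus the given columns."""
    Z = np.column_stack([np.ones(len(y)), X[:, np.asarray(columns, dtype=int)]])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.sum((y - Z @ beta) ** 2))

def forward_selection(X, y, f_to_enter=4.0):
    """Add one variable at a time while the best candidate's partial F exceeds f_to_enter."""
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)               # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        trial = {j: rss(X, y, selected + [j]) for j in candidates}
        best = min(trial, key=trial.get)        # candidate giving the largest RSS reduction
        df = n - len(selected) - 2              # residual df after adding the candidate
        f_stat = (current - trial[best]) / (trial[best] / df)
        if f_stat < f_to_enter:
            break                               # no remaining variable is significant enough
        selected.append(best)
        current = trial[best]
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = forward_selection(X, y)
```

Because each step conditions on the variables chosen so far, the procedure evaluates only a small fraction of the $2^p$ possible submodels, which is the source of both its speed and its lack of optimality guarantees.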

1.2.2 Prediction-Error Approach

Another approach to variable selection is to choose the submodel with the best ability to predict a future response. Methods using the prediction-error approach, such as cross-validation and bootstrap, are computationally intensive. Cross-validation has been well studied as a basis for model selection by Stone (1974). In cross-validation, we compute the prediction error of all submodels. We split the data into $K$ parts of roughly equal sizes and estimate the prediction error for one part of the data based on the fitted submodel using the remaining $(K-1)$ parts. We then combine all $K$ estimates of the prediction error for each submodel. The submodel with the minimum prediction error is selected.

Let $\kappa: \{1, 2, \ldots, n\} \mapsto \{1, 2, \ldots, K\}$ be an indexing function that indicates the partition to which each observation is allocated by the randomization. The case $K = n$ is known as leave-one-out cross-validation; the resulting estimators are approximately unbiased for the true prediction error, but they can have a high variance and the computational burden is also high. In general, five- or ten-fold cross-validation is recommended (see Breiman and Spector, 1992; Kohavi, 1995).
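The $K$-fold procedure can be sketched as follows for a linear submodel with squared-error loss. The loss function, the function name `kfold_cv_error`, and the simulated data are assumptions for illustration; the array `fold` plays the role of the indexing function that allocates each observation to a part.

```python
import numpy as np

def kfold_cv_error(X, y, columns, K=5, seed=0):
    """K-fold cross-validated mean squared prediction error of a linear submodel."""
    n = len(y)
    cols = np.asarray(columns, dtype=int)
    rng = np.random.default_rng(seed)
    fold = rng.permutation(n) % K            # random allocation of observations to K parts
    sq_err = 0.0
    for k in range(K):
        test = fold == k
        train = ~test
        Z_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        Z_test = np.column_stack([np.ones(test.sum()), X[test][:, cols]])
        beta = np.linalg.lstsq(Z_train, y[train], rcond=None)[0]   # fit without part k
        sq_err += np.sum((y[test] - Z_test @ beta) ** 2)           # predict part k
    return sq_err / n

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
cv_true = kfold_cv_error(X, y, [0, 2])       # submodel with the two active covariates
cv_null = kfold_cv_error(X, y, [])           # intercept-only submodel
cv_wrong = kfold_cv_error(X, y, [1])         # submodel with an irrelevant covariate
```

Using the same seed for every submodel keeps the partition fixed, so differences in the estimated error reflect the submodels rather than the random split.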

Bickel and Freedman (1982) suggested that conditional bootstrap be used for variable selection. The bootstrap is a general tool for assessing statistical accuracy. Suppose we wish to fit a model to a set of training data. The basic idea is to randomly draw data sets with replacement from the training data, each of the same size as the original training set. This procedure is repeated a large number of times. Then we refit the model to each of the bootstrap sample sets and examine the behavior of the fits. These methods are computer-intensive and tend to be impractical if we have to fit more than 15-20 models or if the sample size is large. However, cross-validation offers an interesting alternative for model selection. In some situations the prediction error is not well defined (for example, in generalized linear models) and therefore these methods are not applicable.

1.2.3 Information-Theoretic Approach

In this section, we briefly introduce the most commonly used information-theoretic model selection approaches: the Akaike information criterion (AIC) and Bayesian information criterion (BIC). These methods are applicable when a well-defined parametric model is available. We will also discuss nonparametric versions of AIC and BIC.

Akaike Information Criterion (AIC)

Kullback and Leibler (1951) introduced the Kullback-Leibler (K-L) "distance" or "information" between two models. Let $f$ and $g$ be continuous distribution functions, then the K-L information between models $f$ and $g$ is defined to be

$$I(f, g) = \int f(x) \log\left\{ \frac{f(x)}{g(x \mid \theta)} \right\} dx.$$

The notation $I(f, g)$ denotes the distance from $g$ to $f$. However, the K-L distance cannot be computed without full knowledge of both $f$ and the parameter $\theta$ for each candidate model $g_i(x \mid \theta)$. Akaike (1973, 1974) found a simple relationship between the K-L distance and Fisher's maximized log-likelihood function. Akaike also found a rigorous way to estimate the K-L information, based on the empirical log-likelihood function at its maximum point. We represent the full model with $p$ parameters as

$$\text{model}(p): f(y, X, \beta), \quad \beta = (\beta_1, \beta_2, \ldots, \beta_p).$$

Akaike formulates the problem of statistical model identification as the selection of a submodel $f(y, X, \beta_s)$, where the particular restricted model is defined by the constraints $\beta_{s+1} = \beta_{s+2} = \cdots = \beta_p = 0$, so that

$$\text{model}(s): f(y, X, \beta_s), \quad \beta_s = (\beta_1, \beta_2, \ldots, \beta_s, 0, \ldots, 0),$$

where $s$ is the number of parameters and $\beta_s$ lies in a subspace of $\mathbb{R}^p$. Let $\hat\beta_s$ be the maximum likelihood estimate under model($s$), then the log-likelihood function is given by $\ell(\hat\beta_s)$, and the AIC of the submodel is defined to be

$$AIC(s) = -2\ell(\hat\beta_s) + 2k,$$

where $k$ is the cardinality of $s$. Under this criterion we choose the model with the minimum AIC value.

Bayesian Information Criterion (BIC)

Schwarz (1978) suggested using a Bayesian approach to the model selection problem. This method results in a criterion that is similar to AIC. It is based on the penalized log-likelihood function evaluated at the maximum likelihood estimate for the model. The penalty term in the BIC obtained by Schwarz (1978) is the AIC penalty term $2k$ multiplied by $\frac{1}{2}\log(n)$, where $n$ is the sample size. Similarly to AIC, the BIC of a submodel is defined to be

$$BIC(s) = -2\ell(\hat\beta_s) + k\log(n).$$

The submodel with the minimum BIC value is selected. It has been observed that minimizing AIC does not produce asymptotically consistent estimates of the correct model. In contrast, BIC is consistent.
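The two criteria differ only in their penalty terms, which the following sketch makes explicit for a Gaussian linear model. The residual sums of squares are hypothetical numbers, and replacing $\sigma^2$ by its maximum likelihood estimate $RSS/n$ in the log-likelihood is an assumption made for this illustration.

```python
import math

def gaussian_neg2_loglik(rss, n):
    """-2 * maximized Gaussian log-likelihood, with sigma^2 replaced by its MLE rss / n."""
    return n * (math.log(2.0 * math.pi * rss / n) + 1.0)

def aic(neg2_loglik, k):
    return neg2_loglik + 2.0 * k

def bic(neg2_loglik, k, n):
    return neg2_loglik + k * math.log(n)

n = 100
# Hypothetical residual sums of squares for two nested submodels.
models = {"small": {"k": 2, "rss": 120.0}, "large": {"k": 5, "rss": 112.0}}
scores = {}
for name, m in models.items():
    dev = gaussian_neg2_loglik(m["rss"], n)
    scores[name] = (aic(dev, m["k"]), bic(dev, m["k"], n))   # (AIC, BIC) pair
```

With these numbers the larger submodel has the smaller AIC while the smaller submodel has the smaller BIC, reflecting the heavier BIC penalty $k\log(n)$ whenever $\log(n) > 2$.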

Mallows' $C_k$ Criterion

Mallows' $C_k$ is a technique for model selection in regression proposed by Mallows (1973, 1995). The $C_k$ statistic is a criterion to assess the fit when models with different numbers of parameters are being compared. The Mallows criterion for a submodel $s$ is

$$C_k = \frac{RSS(s)}{\hat\sigma^2} - n + 2k,$$

where $RSS(s)$ is the residual sum of squares, $\hat\sigma^2$ is the error-variance estimate from the full model, and $k$ is the cardinality of $s$. Usually $C_k$ is plotted against $k$ for the collection of subset models of various sizes under consideration. Acceptable models (minimizing the total bias of the predicted values) are those for which $C_k$ approaches the value $k$.
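As a small numerical check of why acceptable models have a criterion value near $k$, the sketch below assumes the usual form of the statistic, $RSS(s)/\hat\sigma^2 - n + 2k$, with $\hat\sigma^2$ estimated from the full model. The function name and the numbers are hypothetical.

```python
def mallows_ck(rss_sub, sigma2_full, n, k):
    """Mallows' statistic RSS(s) / sigma^2 - n + 2k (usual form, assumed here)."""
    return rss_sub / sigma2_full - n + 2 * k

n, k, sigma2_full = 50, 3, 2.0
# An unbiased submodel has E[RSS(s)] = sigma^2 * (n - k), which gives a value of exactly k.
ck_unbiased = mallows_ck(sigma2_full * (n - k), sigma2_full, n, k)
# A submodel that omits important covariates has an inflated RSS and a value well above k.
ck_biased = mallows_ck(150.0, sigma2_full, n, k)
```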

In summary, the information-theoretic approaches are based on strong parametric model assumptions. In GLMs and QL, the model is frequently specified by a set of estimating equations and we may not have fully specified parametric assumptions. Hence, these methods cannot be used directly. One solution is to use a nonparametric likelihood. A further drawback of the information-theoretic approach is the computational burden of fitting all possible submodels. In the next section, we discuss the empirical-likelihood-based information-theoretic approach for variable selection proposed by Variyath, Chen, and Abraham (2010).

Empirical-Likelihood-Based Information-Theoretic Approach

Variyath, Chen, and Abraham (2010) developed an information-theoretic approach to variable selection based on a nonparametric likelihood, for use when a well-defined parametric model is not available. They replaced the parametric likelihood by the empirical likelihood and investigated the use of empirical-likelihood-based AIC and BIC. The empirical-likelihood-based AIC is defined to be

$$EAIC(s) = W(\hat\beta_s) + 2k,$$

where $W(\hat\beta_s) = 2\ell_{EL}(\hat\beta_s)$ is the empirical-likelihood ratio function for the submodel. Similarly, the empirical-likelihood-based BIC is defined to be

$$EBIC(s) = W(\hat\beta_s) + k\log(n).$$

The best model is identified as the model with the minimum value of EAIC (or EBIC) over all possible submodels. More details of the empirical likelihood are given in Chapter 3. Variyath, Chen, and Abraham (2010) show that the empirical and parametric likelihood-based AIC and BIC have similar first-order asymptotic properties. Their simulation studies show that when a parametric likelihood exists, the two methods have similar performance. The empirical-likelihood-based approach is superior when the parametric model is misspecified.

In the information-theoretic approach a complete evaluation of all the submodels is necessary. As the number of covariates increases, the computational burden becomes more severe. To avoid the evaluation of all the submodels, a new penalized-likelihood approach has been developed.

1.2.4 Penalized-Likelihood Approach

The idea of penalization is very useful in statistical modeling, particularly in high-dimensional variable selection. Most traditional variable selection procedures such as AIC, Mallows' $C_k$, and BIC use a fixed penalty based on the size of the model. However, all these procedures use either stepwise or subset-selection procedures to select the variables. These selection procedures make the methods computationally intensive and unstable. To overcome the inefficiencies of traditional variable selection procedures, Fan and Li (2001) proposed a unified approach via nonconcave penalized least squares. This method automatically and simultaneously selects variables and estimates their coefficients. The least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996, 1997) is another variant of the penalized-likelihood approach. Fan and Li (2001) applied the penalized-likelihood approach to linear regression, robust linear regression, and generalized linear models. They show that the proposed penalized-likelihood estimator with the smoothly clipped absolute deviation (SCAD) penalty function (defined in Chapter 2) outperforms all the subset and information-theoretic variable selection procedures in terms of computational cost and stability. The SCAD improves the LASSO by reducing the estimation bias. Furthermore, they show that the SCAD possesses oracle properties with a proper choice of the tuning parameters. The true regression coefficients that are zero are automatically shrunk to zero, and the remaining coefficients are simultaneously estimated. Hence, the SCAD and its properties are ideal for variable selection, at least from a theoretical point of view. This encourages us to investigate SCAD properties in a nonparametric-likelihood setting.

1.2.5 Motivation for New Approach

Several methods have been developed to select the best submodel. The sequential approaches are computationally less demanding as the number of covariates increases, but the identification of the optimal model is not guaranteed. The simplest and most widely used variable selection method is cross-validation. In some situations the prediction error is not well defined, for example in generalized linear models, which limits the application of this technique. Information-theoretic variable selection methods such as AIC and BIC are based on the parametric likelihood. These two criteria cannot be applied without full knowledge of the parametric model. If the model is not well defined, we can use empirical-likelihood-based AIC and BIC. In some situations, the number of possible submodels is large, and the computational cost becomes substantial if all the submodels must be evaluated. Methods based on penalized likelihood such as LASSO and SCAD have superior computational efficiency, and SCAD satisfies the oracle properties. The parametric likelihood is a crucial component of these methods. As discussed earlier, the parametric model is not well defined in many cases, limiting the application of the methods. We investigate the properties of SCAD in a nonparametric setting, where instead of the parametric likelihood, we use the empirical likelihood based on a set of estimating equations.

1.3 Proposed Approach to Variable Selection

Likelihood methods play a major role in statistical analysis. They can be used to address the problems arising when the data are incompletely observed, distorted, or sampled with a bias. They can be used to pool information from different data sources. One problem with parametric likelihood inference is the risk of model mis-specification. Such mis-specification can cause likelihood-based estimates to be inefficient. To avoid the risk of model mis-specification, a nonparametric method can be used. Instead of parametric likelihood, we use nonparametric empirical likelihood in the penalized-likelihood variable selection approach.

1.3.1 Empirical Likelihood (EL)

Owen (1988) introduced the empirical likelihood. Empirical likelihood is a nonparametric method of statistical inference. It allows us to use likelihood methods without assuming that the data come from a known distribution. The empirical likelihood method combines the reliability of nonparametric methods with the flexibility and effectiveness of the likelihood approach.

Let $y_1, y_2, \ldots, y_n$ be a random sample from a cumulative distribution function $F$, and let

$$p_i = \Pr(Y = y_i) = F(y_i) - F(y_i-)$$

be the probability mass assigned to $y_i$. The empirical likelihood function defined by Owen (1988) is

$$L(F) = \prod_{i=1}^{n} p_i.$$

Maximizing

$$R(F) = \log\{L(F)\} = \sum_{i=1}^{n} \log(p_i)$$

subject to $\sum_{i=1}^{n} p_i = 1$ leads to $\hat p_i = 1/n$. The maximum empirical likelihood estimate of $F$ is therefore the empirical distribution function based on the random sample,

$$F_n(y) = \frac{1}{n} \sum_{i=1}^{n} I(y_i \le y),$$

where $I(\cdot)$ is the indicator function.

Statistical inference on the parameters can be based on the profile empirical likelihood. For example, if we are interested in inference on the mean, say $\mu$, we define the profile empirical log-likelihood for $\mu$ to be

$$\ell(\mu) = \sup\left\{ \sum_{i=1}^{n} \log(p_i) : p_i > 0, \; i = 1, 2, \ldots, n; \; \sum_{i=1}^{n} p_i = 1, \; \sum_{i=1}^{n} p_i (y_i - \mu) = 0 \right\}.$$

Owen (1988, 1990, 2001) proved that the empirical likelihood ratio function has an asymptotic $\chi^2$ distribution when $\mu = \mu_0$, the true value. This result is useful for inference on the parameters, such as testing hypotheses and constructing a confidence region for $\mu$. Note that there is no need to estimate a scale parameter in the construction of the confidence interval, and the confidence regions are not necessarily symmetric because of the data-driven approach. Because of these properties, the EL method has become popular in the statistical literature and has been extended to linear regression models (Owen, 1991; Chen, 1993, 1994), general estimating equations (Qin and Lawless, 1994), survival analysis (Thomas and Grunkemeier, 1975; Li, 1995; Murphy, 1995), survey sampling (Chen and Qin, 1993; Chen, Sitter, and Wu, 2002) and time series (Monti, 1997).
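The profile empirical likelihood for a mean can be computed with a one-dimensional root search. The sketch below relies on the standard Lagrange-multiplier form of the solution, $p_i = 1/[n\{1 + \lambda (y_i - \mu)\}]$ with $\lambda$ solving $\sum_i (y_i - \mu)/\{1 + \lambda (y_i - \mu)\} = 0$; the bisection solver, the function name, and the data are assumptions made for illustration.

```python
import math

def el_ratio_mean(y, mu):
    """Return (-2 log EL ratio, weights p_i, lambda) for the mean mu."""
    z = [yi - mu for yi in y]
    if min(z) >= 0 or max(z) <= 0:
        raise ValueError("mu must lie strictly inside the range of the data")

    def g(lam):
        # Strictly decreasing in lam on the interval where all weights stay positive.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    lo, hi = -1.0 / max(z), -1.0 / min(z)      # open interval keeping every p_i > 0
    width = hi - lo
    lo, hi = lo + 1e-10 * width, hi - 1e-10 * width
    for _ in range(200):                       # bisection on the decreasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    n = len(y)
    p = [1.0 / (n * (1.0 + lam * zi)) for zi in z]
    stat = 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
    return stat, p, lam

y = [1.2, 2.5, 3.1, 4.8, 5.0, 6.3]             # hypothetical sample
stat_at_mean, _, _ = el_ratio_mean(y, sum(y) / len(y))
stat_3, p_3, _ = el_ratio_mean(y, 3.0)
stat_2, _, _ = el_ratio_mean(y, 2.0)
```

At the sample mean the weights are all $1/n$ and the statistic is zero; it grows as the hypothesized mean moves away from the sample mean, and comparing it with a $\chi^2_1$ quantile gives a confidence interval without estimating a scale parameter.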


1.3.2 Penalized Empirical Likelihood (PEL)

As discussed earlier, penalized-likelihood-based variable selection can be applied only when we have a well-defined parametric model. When we are not sure about the parametric model, but the parameters can be estimated by a set of estimating equations, we can use an EL based on a set of estimating equations. So we propose to replace the parametric likelihood by the empirical likelihood to define a nonparametric version of the penalized likelihood method. We discuss the asymptotic properties of the regression estimates, and we develop an algorithm for estimating the parameters. Our simulation studies show that when a parametric model is available, PEL-based variable selection gives results similar to those achieved by parametric-likelihood variable selection. The former method outperforms the latter when the parametric model is misspecified. We extend our approach to Cox's proportional hazard model. We also apply our method to an Australian health survey and a lung-cancer data set.

1.4 Outline of the Thesis

The main objective of this thesis is to make a contribution to variable selection. We mainly focus on penalized-empirical-likelihood variable selection. In Chapter 2 we briefly discuss variable selection via the nonconcave penalized likelihood proposed by Fan and Li (2001). In Chapter 3, we introduce the empirical likelihood and its characteristics. We describe our penalized-empirical-likelihood variable selection and discuss its asymptotic properties. The algorithm is given in Chapter 4. In Chapter 5 we provide simulation studies to compare the performance of empirical-likelihood variable selection with penalized-parametric-likelihood SCAD, in the context of linear regression, Poisson regression, and logistic regression. We also apply our method to the Australian health survey. In Chapter 6, we discuss the implementation of PEL in Cox's proportional hazard model. Our concluding remarks are given in Chapter 7.

Chapter 2

Variable Selection via Nonconcave Penalized Likelihood

A new class of variable selection methods based on a nonconcave penalized-likelihood approach was proposed by Fan and Li (2001) and Tibshirani (1996). These methods are superior to traditional methods because of their computational efficiency and stability. The variable selection and the estimation of the regression parameters are carried out simultaneously. That is, insignificant variables are removed by estimating their regression parameters as zero. These methods work reasonably well in high-dimensional problems. In this chapter, we will introduce the penalized-likelihood variable selection proposed by Fan and Li (2001) in the context of a linear model.

Consider the linear model

$$y_i = x_i^T \beta + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $x_i \in \mathbb{R}^p$ is a vector of covariates and $\beta \in \mathbb{R}^p$ a vector of parameters. We assume that the collected data $\{(x_i, y_i)\}$ are independent samples and $y_i \mid x_i$ has density $f(y_i; x_i^T\beta)$. A general form of the penalized likelihood proposed by Fan and Li (2001) is defined by

$$\ell_P(\beta) = \sum_{i=1}^{n} \ell(y_i; x_i^T\beta) - n \sum_{j=1}^{p} p_\delta(|\beta_j|), \tag{2.1}$$

where $\ell(y_i; x_i^T\beta)$ is the conditional log-likelihood of $y_i \mid x_i$, $p_\delta(\cdot)$ is a penalty function, and $\delta$ is the tuning parameter.

In linear regression models, if the columns of the design matrix $X$ are orthonormal then it is easy to show that the best-subset selection method and the stepwise elimination method are equivalent to penalized least-squares estimations with the HARD thresholding penalty proposed by Fan (1997) and Antoniadis (1997). This penalty is

$$p_\delta(|\theta|) = \delta^2 - (|\theta| - \delta)^2 I(|\theta| < \delta).$$

For a large value of $|\theta|$, the HARD thresholding penalty does not overpenalize. The LASSO, which uses the $L_1$ penalty $p_\delta(|\theta|) = \delta|\theta|$, was proposed by Donoho and Johnstone (1994) in the wavelet setting and extended by Tibshirani (1996) to general likelihood settings. The penalty function used in ridge regression is the $L_2$ penalty, $p_\delta(|\theta|) = \delta|\theta|^2$. According to Fan and Li (2001), a good penalty function should result in an estimator with the following three oracle properties:

1. Unbiasedness: To avoid unnecessary modeling bias, the estimator is nearly unbiased when the true unknown parameter is large.

2. Sparsity: This is a thresholding rule that automatically sets small estimated coefficients to zero to reduce the model complexity.

3. Continuity: This property eliminates unnecessary variation in the model prediction.

However, the penalty functions $L_1$, $L_2$, and HARD do not satisfy all three conditions.

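A simple penalty function satisfying all three is the SCAD penalty proposed by Fan (1997). Its first derivative is

$$p'_\delta(\theta) = \delta \left\{ I(\theta \le \delta) + \frac{(a\delta - \theta)_+}{(a - 1)\delta} I(\theta > \delta) \right\} \quad \text{for some } a > 2 \text{ and } \theta > 0. \tag{2.2}$$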
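The penalties discussed above can be evaluated directly. In the sketch below, the SCAD derivative follows (2.2), the SCAD penalty itself is obtained by integrating that derivative from zero (a piecewise closed form derived here, not quoted from the thesis), and the HARD penalty follows its definition. The function names are hypothetical; the default $a = 3.7$ is the value suggested by Fan and Li (2001).

```python
def scad_derivative(theta, delta, a=3.7):
    """First derivative of the SCAD penalty, as in (2.2), for theta > 0."""
    if theta <= delta:
        return delta
    return max(a * delta - theta, 0.0) / (a - 1.0)

def scad_penalty(theta, delta, a=3.7):
    """SCAD penalty obtained by integrating (2.2) from 0 to |theta|."""
    t = abs(theta)
    if t <= delta:                       # linear part, same as the L1 penalty
        return delta * t
    if t <= a * delta:                   # quadratic transition
        return -(t * t - 2.0 * a * delta * t + delta * delta) / (2.0 * (a - 1.0))
    return (a + 1.0) * delta * delta / 2.0   # constant: no extra penalty on large coefficients

def hard_penalty(theta, delta):
    """HARD thresholding penalty delta^2 - (|theta| - delta)^2 I(|theta| < delta)."""
    t = abs(theta)
    return delta ** 2 - (t - delta) ** 2 * (t < delta)

values = [(t, scad_penalty(t, 1.0), hard_penalty(t, 1.0)) for t in (0.0, 0.5, 1.0, 2.0, 5.0)]
```

Both penalties level off for large arguments, which is the source of the near-unbiasedness for large coefficients; the SCAD penalty reaches its plateau smoothly, whereas the HARD penalty has a kink in its derivative at $\delta$.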

Necessary conditions for the unbiasedness, sparsity, and continuity of the SCAD penalty have been proved by Antoniadis and Fan (2001).

Figure 2.1: $L_p$ penalty function (curves shown for $p = 1$, $p = 2$, and $p = 0.3$)

As shown in Figs. 2.1 and 2.2, all the penalty functions are singular at the origin, satisfying $p'_\delta(0+) > 0$. This is the necessary condition for sparsity in variable selection. As shown in Fig. 2.2, the HARD and SCAD penalties are constant when $\beta$ is large, indicating that there is no excessive penalization for large regression coefficients. However, SCAD is smoother than HARD and hence yields a continuous estimator.

Figure 2.2: SCAD and HARD penalty functions

Let $\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ be the true value of $\beta$. Without loss of generality, we assume that $\beta_{20} = 0$ and all components of $\beta_{10}$ are nonzero. Let $I(\beta_0)$ be the Fisher information matrix and let $I_1(\beta_{10}, 0)$ be the Fisher information given $\beta_{20} = 0$. Under some regularity conditions, Fan and Li (2001) show that the estimate of the regression parameter based on the SCAD penalty, $\hat\beta = (\hat\beta_1^T, \hat\beta_2^T)^T$, satisfies the oracle properties for a certain choice of the tuning parameters $(\delta, a)$, since $\hat\beta_2 \xrightarrow{p} 0$ and $\sqrt{n}(\hat\beta_1 - \beta_{10}) \xrightarrow{d} N(0, I_1^{-1}(\beta_{10}, 0))$.

The SCAD penalty function involves two unknown parameters, $\delta$ and $a$. In practice, we could search for the best pair $(\delta, a)$ over a two-dimensional grid using cross-validation (CV) or generalized cross-validation (GCV; Craven and Wahba, 1979). However, this would be computationally expensive. From a Bayesian point of view, Fan and Li (2001) suggested setting $a = 3.7$ and using GCV to select the best value of $\delta$.

2.1 Local Quadratic Approximations and Standard Errors

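The penalty function $p_\delta(|\beta_j|)$ is irregular at the origin and does not have continuous second-order derivatives at some points, so special care is needed in the application of the Newton-Raphson algorithm. Fan and Li (2001) locally approximate the SCAD penalty function by quadratic functions as follows. Suppose our initial value $\beta_0$ is close to the maximizer of (2.1). If $\beta_{j0}$ is very close to zero, then set $\hat\beta_j = 0$; otherwise, the penalty $p_\delta(|\beta_j|)$ can be locally approximated by a quadratic function via

\[ [p_\delta(|\beta_j|)]' = p'_\delta(|\beta_j|)\,\mathrm{sgn}(\beta_j) \approx \{p'_\delta(|\beta_{j0}|)/|\beta_{j0}|\}\,\beta_j \]

when $\beta_j \ne 0$. In other words,

\[ p_\delta(|\beta_j|) \approx p_\delta(|\beta_{j0}|) + \frac{1}{2}\{p'_\delta(|\beta_{j0}|)/|\beta_{j0}|\}(\beta_j^2 - \beta_{j0}^2) \quad \text{for } \beta_j \approx \beta_{j0}. \]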

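A disadvantage of this approximation is that once a coefficient has been shrunk to zero, it will stay at zero. However, this method significantly reduces the computational burden. Now we assume that the first two partial derivatives of the log-likelihood function are continuous, so that it is a smooth function with respect to $\beta$. The first term in (2.1) can be locally approximated by a quadratic function via Taylor's expansion. The maximization problem (2.1) can be reduced to a quadratic maximization problem and the Newton-Raphson algorithm can be used. Therefore, (2.1) can be locally approximated by

\[ \ell(\beta_0) + \nabla\ell(\beta_0)^T(\beta - \beta_0) + \frac{1}{2}(\beta - \beta_0)^T \nabla^2\ell(\beta_0)(\beta - \beta_0) - \frac{1}{2}\, n\,\beta^T \Sigma_\delta(\beta_0)\,\beta, \tag{2.3} \]

where $\nabla\ell(\beta_0) = \partial\ell(\beta_0)/\partial\beta$ and $\nabla^2\ell(\beta_0) = \partial^2\ell(\beta_0)/\partial\beta\,\partial\beta^T$.

The quadratic maximization problem (2.3) is solved via the Newton-Raphson algorithm. In this algorithm, the update at the $(k+1)$th iteration is

\[ \beta^{k+1} = \beta^k - \left[\nabla^2\ell(\beta^k) - n\Sigma_\delta(\beta^k)\right]^{-1}\left[\nabla\ell(\beta^k) - nU_\delta(\beta^k)\right], \]

where

\[ \Sigma_\delta(\beta^k) = \mathrm{diag}\left\{ \frac{p'_\delta(|\beta_1^k|)}{|\beta_1^k|}, \ldots, \frac{p'_\delta(|\beta_p^k|)}{|\beta_p^k|} \right\}, \qquad U_\delta(\beta^k) = \Sigma_\delta(\beta^k)\,\beta^k. \]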
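The update above can be illustrated for one concrete likelihood. The sketch below is an assumption-laden example rather than the procedure used in this thesis: it takes the Gaussian log-likelihood $\ell(\beta) = -\frac{1}{2}\|y - X\beta\|^2$ (so $\nabla\ell = X^T(y - X\beta)$ and $\nabla^2\ell = -X^TX$), starts from the least-squares fit, and sets coefficients to zero once they fall below a hypothetical threshold `zero_tol`. The names `lqa_newton` and `scad_derivative` are illustrative only.

```python
import numpy as np

def scad_derivative(b, delta, a=3.7):
    """SCAD derivative (2.2) for b >= 0."""
    b = np.asarray(b, dtype=float)
    tail = np.maximum(a * delta - b, 0.0) / ((a - 1.0) * delta) * (b > delta)
    return delta * ((b <= delta) + tail)

def lqa_newton(X, y, delta, a=3.7, tol=1e-8, zero_tol=1e-6, max_iter=100):
    """Newton-Raphson with local quadratic approximation of the SCAD penalty."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # initial value beta_0
    active = np.ones(p, dtype=bool)
    for _ in range(max_iter):
        # Coefficients very close to zero are set to zero and stay there.
        active &= np.abs(beta) > zero_tol
        beta[~active] = 0.0
        if not active.any():
            break
        Xa, ba = X[:, active], beta[active]
        sigma = np.diag(scad_derivative(np.abs(ba), delta, a) / np.abs(ba))
        grad = Xa.T @ (y - Xa @ ba) - n * sigma @ ba    # grad(l) - n U_delta
        hess = -Xa.T @ Xa - n * sigma                   # hess(l) - n Sigma_delta
        new = ba - np.linalg.solve(hess, grad)
        converged = np.max(np.abs(new - ba)) < tol
        beta[active] = new
        if converged:
            break
    beta[np.abs(beta) <= zero_tol] = 0.0
    return beta
```

With an orthogonal design such as `X = 2 * np.eye(4)`, a coefficient whose least-squares value is below `delta` is shrunk geometrically to zero, while coefficients larger than `a * delta` are left unpenalized.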

The sandwich formula for the standard errors of the estimated parameters is available immediately because this method estimates the parameters and selects the variables at the same time. The standard errors of the estimated parameters are obtained from this formula.

Fan and Li (2001) conducted a series of Monte-Carlo simulations in linear regression, robust regression, and logistic regression and showed that penalized-likelihood variable selection using the SCAD penalty performs better than the LASSO, HARD, and information-theoretic approaches.


Chapter 3

Variable Selection via Penalized

Empirical Likelihood

The empirical likelihood method is a powerful inference tool with promising applications in many areas of statistics. In this chapter, we briefly introduce the basic concept of empirical likelihood. We then discuss the penalized-empirical-likelihood approach to variable selection.

3.1 Empirical Likelihood (EL)

We first outline the empirical likelihood as discussed by Owen (1988, 1990). For a given random sample $y_1, y_2, \ldots, y_n$ from an unknown distribution function $F(y)$, the empirical likelihood function of $F$ is defined to be

\[ L_n(F) = \prod_{i=1}^{n} p_i, \]

where $p_i = F(\{y_i\}) = \Pr(Y_i = y_i)$. Without any further information about $F$, the empirical likelihood is maximized by the empirical distribution function

\[ F_n(y) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le y), \]

where $I(\cdot)$ is the indicator function and the inequality is expressed componentwise.

In general, it is more common to work with the empirical log-likelihood

\[ \ell_n(F) = \sum_{i=1}^{n} \log(p_i) \tag{3.1} \]

subject to the constraints $\sum_{i=1}^{n} p_i = 1$ and $p_i > 0$, $i = 1, 2, \ldots, n$. Suppose we want to investigate inference on the parameters under the assumption that $F$ is a member of a nonparametric distribution family $\mathcal{F}$, say $\mu = T(F)$ for some functional $T$ of the distribution. Inference for the parameter $\mu$ can be obtained using the likelihood approach if we know the likelihood value at $\mu$. For a given value of $\mu$, the notion of profile likelihood is to find the $F \in \mathcal{F}$ at which the empirical likelihood attains its maximum value among the set with $T(F) = \mu$. The profile empirical likelihood function is defined to be

\[ L_n(\mu) = \sup\{ L_n(F) \mid T(F) = \mu,\ F \in \mathcal{F} \}. \]

We can construct likelihood inference on $\mu$ based on $L_n(\mu)$. This likelihood has similar properties to its parametric counterpart. Since $L_n(\mu) \le n^{-n}$, it is convenient to standardize $L_n(\mu)$ by defining the likelihood ratio function to be

\[ R(\mu) = n^{n} L_n(\mu), \]

and it is easily shown that this can be written as

\[ R(\mu) = \prod_{i=1}^{n} n p_i. \]

The likelihood ratio function has a maximum value of 1. For simplicity, we can perform inference on any function $F$ using the population mean $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$, via the profile empirical likelihood. The profile empirical log-likelihood for $\mu$ is defined as

\[ \ell(\mu) = \sup\left\{ \sum_{i=1}^{n} \log(p_i) : p_i > 0,\ i = 1, 2, \ldots, n;\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i (y_i - \mu) = 0 \right\}. \]

We can compute $\ell(\mu)$ by maximizing $\sum_{i=1}^{n} \log(p_i)$ by the Lagrange multiplier method under the above constraints. The Lagrange multiplier method is very effective for this constrained maximization problem. Define the Lagrangian function $G$, where $\lambda$ (vector-valued) and $\gamma$ are Lagrange multipliers. By setting the partial derivative of $G$ with respect to $p_i$ to zero, we get

\[ p_i = \frac{1}{n\{1 + \lambda^T (y_i - \mu)\}}, \quad i = 1, 2, \ldots, n, \]

and the Lagrange multiplier $\lambda = \lambda(\mu)$ is the solution of

\[ \sum_{i=1}^{n} \frac{y_i - \mu}{1 + \lambda^T (y_i - \mu)} = 0. \]

Therefore, we can write the profile empirical log-likelihood function as

\[ \ell(\mu) = -n\log(n) - \sum_{i=1}^{n} \log\left(1 + \lambda^T(\mu)(y_i - \mu)\right). \]

Now we define the profile empirical log-likelihood ratio function to be

\[ W(\mu) = -\sum_{i=1}^{n} \log(n p_i) = \sum_{i=1}^{n} \log\left[1 + \lambda^T(\mu)(y_i - \mu)\right]. \]

Owen (1990) showed that, when $\mu_0$ is the true population mean, $2W(\mu_0) \xrightarrow{d} \chi^2_d$ as $n \to \infty$, similar to the parametric likelihood ratio function of Wilks (1938). This result is useful for hypothesis tests on the parameter $\mu$ and for the construction of confidence regions of the form

\[ \{ \mu : 2W(\mu) \le \chi^2_d(1 - \alpha) \}, \]

where $\chi^2_d(1 - \alpha)$ is the $(1 - \alpha)$th quantile of the chi-square distribution with $d$ degrees of freedom. This is different from the confidence intervals based on a normal approximation.
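For a scalar mean ($d = 1$), the quantities in this section can be computed with a few lines of code. The sketch below is illustrative only: it finds $\lambda(\mu)$ by bisection, which works because $\sum_i (y_i - \mu)/\{1 + \lambda(y_i - \mu)\}$ is strictly decreasing in $\lambda$ on the interval where every $1 + \lambda(y_i - \mu)$ is positive. The function name `el_mean` is hypothetical, and bisection is a substitute chosen for simplicity, not the algorithm used later in this thesis.

```python
import numpy as np

def el_mean(y, mu, n_iter=200):
    """Profile empirical log-likelihood ratio W(mu) for a scalar mean.

    Returns (W, lam, p), where p holds the implied weights p_i.
    """
    g = np.asarray(y, dtype=float) - mu
    if g.min() >= 0.0 or g.max() <= 0.0:
        raise ValueError("mu must lie strictly inside the range of the data")
    # Feasible lambdas keep 1 + lam * g_i > 0 for every i.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        score = np.sum(g / (1.0 + mid * g))     # decreasing in lambda
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = 1.0 / (len(g) * (1.0 + lam * g))
    W = np.sum(np.log1p(lam * g))
    return W, lam, p
```

At the sample mean the multiplier is zero, all weights equal $1/n$, and $W = 0$; moving $\mu$ away from the sample mean increases $W$, and $2W(\mu)$ can then be compared with a chi-square quantile.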

3.2 Penalized Empirical Likelihood based Variable Selection

Owen (1991) first considered EL for linear models. EL confidence regions for regression coefficients in linear models were studied by Chen (1994). We consider a linear model of the following form:

\[ y_i = x_i\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n, \]

where $x_i \in \mathbb{R}^p$ is a vector of covariates and $\beta \in \mathbb{R}^p$ a vector of parameters. We assume that the $y_i \mid x_i$ are conditionally independent. We also assume that the error terms $\varepsilon_i$ are independent and identically distributed with mean zero and finite variance $\sigma^2$. Thus, $E(y_i \mid x_i) = x_i\beta$ is the conditional mean function and $\mathrm{Var}(y_i \mid x_i) = \sigma^2$.

Following Owen (1991) and Qin and Lawless (1994), we can extend the empirical likelihood inferences for linear models based on a set of estimating functions $g(y, X, \beta)$. Assume that the generalized linear model is defined by $E[g(y_i, x_i, \beta)] = 0$. In general, $g$ is a $p \times 1$ vector of estimating functions. The profile empirical log-likelihood function of $\beta$ is defined by

\[ \ell(\beta) = \sup\left[ \sum_{i=1}^{n} \log(p_i) : p_i > 0,\ i = 1, 2, \ldots, n;\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(y_i, x_i, \beta) = 0 \right]. \]

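Using the Lagrange multiplier method discussed in Section 3.1, we can define the Lagrangian function $G$, where $\lambda$ (vector valued) and $\gamma$ are Lagrange multipliers. Setting the partial derivative of $G$ with respect to $p_i$ equal to zero gives

\[ p_i = \frac{1}{n\{1 + \lambda^T g(y_i, x_i, \beta)\}}, \quad i = 1, 2, \ldots, n, \]

where the Lagrange multiplier $\lambda = \lambda(\beta)$ is the solution of

\[ \sum_{i=1}^{n} \frac{g(y_i, x_i, \beta)}{1 + \lambda^T g(y_i, x_i, \beta)} = 0. \]

This leads to the profile empirical log-likelihood function

\[ \ell(\beta) = -n\log(n) - \sum_{i=1}^{n} \log\left(1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\right) \tag{3.3} \]

and the profile empirical log-likelihood ratio function is defined to be

\[ W(\beta) = -\sum_{i=1}^{n} \log(n p_i) = \sum_{i=1}^{n} \log\left(1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\right). \tag{3.5} \]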

Now we define the penalized empirical likelihood estimator of $\beta$ as the maximizer of

\[ L(\beta) = -n\log(n) - \sum_{i=1}^{n} \left[\log\left(1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\right)\right] - n\sum_{j=1}^{p} p_\delta(|\beta_j|) = \ell(\beta) - n\sum_{j=1}^{p} p_\delta(|\beta_j|) \tag{3.6} \]

with respect to $\beta$, where $p_\delta(\cdot)$ is the penalty function. We can use any of the penalty functions discussed in Chapter 2. Variyath (2006) first introduced the PEL, but reported some computational issues with over-penalization. We use the continuously differentiable smoothly clipped absolute deviation (SCAD) penalty function with two unknown tuning parameters $(\delta, a)$ proposed by Fan and Li (2001) and defined in (2.2). In the next section we discuss the distributional properties of the penalized empirical likelihood estimates of $\beta$ derived by Variyath (2006). The algorithm for the penalized empirical likelihood is discussed in the next chapter.

3.3 Distributional Properties

Variyath (2006) stated and proved theorems in connection with PEL; we reproduce them here. Let $\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ be the true value of $\beta$, with vector lengths of $k$ and $p - k$ respectively. Without loss of generality, we assume that $\beta_{20} = 0$ and all components of $\beta_{10}$ are nonzero. Let $I(\beta_0)$ be the Fisher information matrix and let $I_1(\beta_{10}, 0)$ be the Fisher information given $\beta_{20} = 0$. Under some regularity conditions, our penalized empirical likelihood SCAD estimator $\hat\beta = (\hat\beta_1^T, \hat\beta_2^T)^T$ satisfies the oracle properties for a certain choice of the tuning parameters $(\delta, a)$. Hence, it is easy to prove that $\hat\beta_2 \xrightarrow{p} 0$ and $\sqrt{n}(\hat\beta_1 - \beta_{10}) \xrightarrow{d} N(0, I_1^{-1}(\beta_{10}, 0))$. The following theorem proves the existence of a local maximizer of the penalized empirical likelihood $L(\beta)$.

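Theorem 3.3.1 (Variyath, 2006) Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating functions for $\beta \in \mathbb{R}^p$ such that for each $i = 1, 2, \ldots, n$,

\[ E\{g_i(\beta_0)\} = 0 \]

for some $\beta_0$. Also assume that

(i) $V = E\{g(\beta_0)\, g^T(\beta_0)\}$ is positive definite,

(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,

(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,

(iv) there exists some function $G(y, X)$ such that in a neighborhood of $\beta_0$,

\[ \left|\frac{\partial g(\beta)}{\partial\beta}\right| < G(y, X), \qquad \|g(y, X, \beta)\|^3 < G(y, X), \]

such that $E[G(y, X)] < \infty$. The tuning parameter $\delta$ is chosen as a function of $n$ such that $\max\{p''_{\delta_n}(|\beta_{j0}|) : \beta_{j0} \ne 0\} \to 0$ as $n \to \infty$. Then there exists a local maximizer $\hat\beta$ of $L(\beta)$ such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + b_n)$, where $b_n = \max\{p'_{\delta_n}(|\beta_{j0}|) : \beta_{j0} \ne 0\}$.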

Theorem 3.3.1 shows that for an appropriate choice of $\delta_n$, there exists a root-$n$ consistent penalized empirical likelihood estimator. The following lemma shows that this estimator must have the sparsity property $\hat\beta_2 = 0$.

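Lemma 3.3.2 (Variyath, 2006) Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating function for $\beta \in \mathbb{R}^p$ such that, for each $i = 1, 2, \ldots, n$,

\[ E\{g_i(\beta_0)\} = 0 \]

for some $\beta_0$. Also assume that

(i) $V = E\{g(\beta_0)\, g^T(\beta_0)\}$ is positive definite,

(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,

(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,

(iv) there exists some function $G(y, X)$ such that in a neighborhood of $\beta_0$,

\[ \left|\frac{\partial g(\beta)}{\partial\beta}\right| < G(y, X), \qquad \|g(y, X, \beta)\|^3 < G(y, X), \]

such that $E[G(y, X)] < \infty$. Assume that

\[ \liminf_{n \to \infty}\ \liminf_{\beta \to 0+}\ p'_{\delta_n}(\beta)/\delta_n > 0. \tag{3.7} \]

If $\delta_n \to 0$ and $\sqrt{n}\,\delta_n \to \infty$, then with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(n^{-1/2})$ and any constant $C$,

\[ L\{(\beta_1, 0)\} = \max_{\|\beta_2\| \le C n^{-1/2}} L\{(\beta_1, \beta_2)\}. \]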

Using the above lemma, one can prove the following theorem on the asymptotic normality of the empirical likelihood estimate.

Theorem 3.3.3 (Variyath, 2006) In addition to the conditions of Theorem 3.3.1 and Lemma 3.3.2, suppose that $\partial^2 g(\beta)/\partial\beta\,\partial\beta^T$ is continuous in $\beta$ in a neighborhood of the true value $\beta_0$ and is bounded by some integrable function $G(y, X)$. Then

where $\hat\beta$ is the penalized empirical likelihood estimate of $\beta$ and

\[ \Sigma = \left[ E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T \left\{ E\{g(\beta_0)\, g^T(\beta_0)\} \right\}^{-1} E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\} \right]^{-1}. \]

3.4 Penalized Adjusted Empirical Likelihood

Computation of $W(\beta)$ for a given value of $\beta$ may lead to some technical problems. The solution for $\lambda$ must satisfy $1 + \lambda^T(\beta)\, g(y_i, x_i, \beta) > 0$ for all $i = 1, \ldots, n$. A necessary and sufficient condition for its existence is that the vector $0$ is an inner point of the convex hull of $\{g(y_i, x_i, \beta),\ i = 1, \ldots, n\}$. The true parameter value $\beta_0$ is the unique solution of $E[g(y, X, \beta)] = 0$. Under some moment conditions on $g(y, X, \beta)$ (Owen, 2001), the convex hull of $\{g(y_i, x_i, \beta),\ i = 1, \ldots, n\}$ contains $0$ as its inner point with probability 1 as $n \to \infty$. When $\beta$ is not close to $\beta_0$, or when $n$ is small, there is a considerable chance that the solution of (3.4) does not exist. To avoid this problem, Chen, Variyath and Abraham (2008) introduced the adjusted empirical likelihood.

Denote $g_i(\beta) = g(y_i, x_i, \beta)$ and $\bar g_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} g_i(\beta)$ for any given $\beta$. For some positive constant $a_n$, define

\[ g_{n+1}(\beta) = -\frac{a_n}{n}\sum_{i=1}^{n} g_i(\beta) = -a_n\, \bar g_n(\beta). \]

Now the adjusted profile empirical log-likelihood ratio function is defined as

\[ W^*(\beta) = -\sup\left[ \sum_{i=1}^{n+1} \log[(n+1)p_i] : p_i > 0,\ i = 1, 2, \ldots, n+1;\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i\, g_i(\beta) = 0 \right] = \sum_{i=1}^{n+1} \log\left[1 + \lambda^T(\beta)\, g_i(\beta)\right], \]

with $\lambda = \lambda(\beta)$ being the solution of

\[ \sum_{i=1}^{n+1} \frac{g_i(\beta)}{1 + \lambda^T(\beta)\, g_i(\beta)} = 0. \]

Note that now $0$ always lies inside the convex hull of $\{g(y_i, x_i, \beta),\ i = 1, \ldots, n+1\}$. The adjusted empirical log-likelihood ratio function is well defined after adding the pseudo-value $g_{n+1}(\beta)$. For a wide range of $a_n$, $W^*(\beta)$ has the same first-order asymptotic properties as $W(\beta)$ (see Chen et al., 2008). We extend this idea to penalized adjusted empirical likelihood to avoid the technical problem of non-existence of a solution to (3.4) for any given value of $\beta$.
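The effect of the pseudo-value is easy to see numerically. The following minimal sketch, assuming NumPy and a hypothetical helper name `add_pseudo_value`, appends $g_{n+1}(\beta) = -a_n \bar g_n(\beta)$ to an array whose rows are the $g_i(\beta)$. The value of `a_n` is left to the caller; no particular choice is implied here.

```python
import numpy as np

def add_pseudo_value(g, a_n):
    """Append the pseudo-value g_{n+1} = -a_n * mean(g_i) as an extra row.

    g   : (n, p) array whose rows are g_i(beta)
    a_n : positive constant (supplied by the caller)
    """
    g = np.asarray(g, dtype=float)
    g_bar = g.mean(axis=0)
    return np.vstack([g, -a_n * g_bar])

# Scalar example in which every g_i is positive, so 0 is outside the
# convex hull of the original points and the EL weights do not exist.
g = np.array([[1.0], [2.0], [3.0]])
g_adj = add_pseudo_value(g, a_n=1.0)   # a_n = 1.0 is an arbitrary choice
```

In the example, the appended row equals $-2$, so the augmented set has points on both sides of zero and the Lagrange multiplier equation has a solution.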

Now we define the penalized adjusted empirical likelihood estimator of $\beta$ as the maximizer of

(3.8)

with respect to $\beta$, where $p_\delta(\cdot)$ is the penalty function defined in (2.2). This adjustment is particularly useful because, even for some undesirable values of $\beta$ and the tuning parameters, the proposed algorithm guarantees a solution. We can now show that the penalized adjusted empirical likelihood has the same asymptotic properties as the penalized empirical likelihood detailed in Section 3.3. We state and prove the following theorems and lemma to show that the penalized adjusted empirical likelihood estimates have the oracle properties.

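Theorem 3.4.1 Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating functions for $\beta \in \mathbb{R}^p$ such that for each $i = 1, 2, \ldots, n$,

\[ E\{g_i(\beta_0)\} = 0 \]

for some $\beta_0$. Also assume that

(i) $V = E\{g(\beta_0)\, g^T(\beta_0)\}$ is positive definite,

(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,

(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,

(iv) there exists some function $G(y, X)$ such that in a neighborhood of $\beta_0$,

\[ \left|\frac{\partial g(\beta)}{\partial\beta}\right| < G(y, X), \qquad \|g(y, X, \beta)\|^3 < G(y, X), \]

such that $E[G(y, X)] < \infty$. The tuning parameter $\delta$ is chosen as a function of $m$ such that $\max\{p''_{\delta_m}(|\beta_{j0}|) : \beta_{j0} \ne 0\} \to 0$ as $m \to \infty$, where $m = n + 1$. Then there exists a local maximizer $\hat\beta$ of $L^*(\beta)$ such that $\|\hat\beta - \beta_0\| = O_p(m^{-1/2} + b_m)$, where $b_m = \max\{p'_{\delta_m}(|\beta_{j0}|) : \beta_{j0} \ne 0\}$.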

Let $a_m = m^{-1/2} + b_m$. It is sufficient to show that for any $\epsilon > 0$, there exists a large enough $C$ such that

\[ \Pr\left\{ \sup\left[ L^*(\beta_0 + a_m u);\ \|u\| = C \right] < L^*(\beta_0) \right\} \ge 1 - \epsilon. \tag{3.9} \]

This implies that for large $m$, with probability at least $1 - \epsilon$, there exists a local maximizer in the ball $\{\beta_0 + a_m u;\ \|u\| \le C\}$. Hence, there exists a local maximizer such that $\|\hat\beta - \beta_0\| = O_p(a_m)$. Let

\[ D_m(u) = L^*(\beta_0 + a_m u) - L^*(\beta_0). \]

Then

\[ D_m(u) = \{\ell^*(\beta_0 + a_m u) - \ell^*(\beta_0)\} - \{p_\delta(\beta_0 + a_m u) - p_\delta(\beta_0)\} \le \{\ell^*(\beta_0 + a_m u) - \ell^*(\beta_0)\} - m\sum_{j=1}^{k} \{p_\delta(|\beta_{j0} + a_m u_j|) - p_\delta(|\beta_{j0}|)\}, \]

where $k$ is the number of components in $\beta_{10}$. The Lagrange multiplier $\lambda(\beta_0)$ can be expressed as

\[ -\ell(\beta_0) = \sum_{i=1}^{m} \log\{1 + \lambda^T(\beta_0)\, g_i(\beta_0)\} + o_p(1) = \sum_{i=1}^{m} \lambda^T(\beta_0)\, g_i(\beta_0) - \frac{1}{2}\sum_{i=1}^{m} \left[\lambda^T(\beta_0)\, g_i(\beta_0)\right]^2 + o_p(1) = \frac{m}{2}\, \bar g_m^T(\beta_0)\, V_m^{-1}(\beta_0)\, \bar g_m(\beta_0) + o_p(1). \]

Now, letting

(3.10)

It can easily be shown that $\Sigma$ is the asymptotic variance of $\sqrt{m}(\hat\beta - \beta_0)$, and so the representation is similar to the normalized parametric likelihood. By the central limit theorem, $\bar g_m(\beta_0)$ is $O_p(m^{-1/2})$; thus the first term on the right-hand side of (3.10) is of order $O_p(m^{1/2} a_m) = O_p(m a_m^2)$. By selecting a large $C$, the second term dominates the first term uniformly in $\|u\| = C$. The third term is bounded by

This is also dominated by the second term in (3.10). Hence, by choosing a sufficiently large value of $C$, (3.9) holds. This completes the proof.

Theorem 3.4.1 shows that for an appropriate choice of $\delta_m$, there exists a root-$m$ consistent penalized empirical likelihood estimator. The following lemma shows that this estimator must have the sparsity property $\hat\beta_2 = 0$.

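Lemma 3.4.2 Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating function for $\beta \in \mathbb{R}^p$ such that, for each $i = 1, 2, \ldots, n$,

\[ E\{g_i(\beta_0)\} = 0 \]

for some $\beta_0$. Also assume that

(i) $V = E\{g(\beta_0)\, g^T(\beta_0)\}$ is positive definite,

(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,

(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,

(iv) there exists some function $G(y, X)$ such that in a neighborhood of $\beta_0$,

\[ \left|\frac{\partial g(\beta)}{\partial\beta}\right| < G(y, X), \qquad \|g(y, X, \beta)\|^3 < G(y, X), \]

such that $E[G(y, X)] < \infty$. Assume that

\[ \liminf_{m \to \infty}\ \liminf_{\beta \to 0+}\ p'_{\delta_m}(\beta)/\delta_m > 0, \tag{3.11} \]

where $m = n + 1$. If $\delta_m \to 0$ and $\sqrt{m}\,\delta_m \to \infty$, then with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(m^{-1/2})$ and any constant $C$,

\[ L^*\{(\beta_1, 0)\} = \max_{\|\beta_2\| \le C m^{-1/2}} L^*\{(\beta_1, \beta_2)\}. \]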

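Following Fan and Li (2001) in proving this lemma, it is sufficient to show that for $\beta_1$ satisfying $\beta_1 - \beta_{10} = O_p(m^{-1/2})$, for some small $\epsilon_m = C m^{-1/2}$, and for $j = k+1, \ldots, p$,

\[ \frac{\partial L^*(\beta)}{\partial\beta_j} < 0 \ \text{ for } 0 < \beta_j < \epsilon_m, \qquad \frac{\partial L^*(\beta)}{\partial\beta_j} > 0 \ \text{ for } -\epsilon_m < \beta_j < 0. \tag{3.12} \]

Due to the condition on $p'_{\delta_m}(|\beta|)$, the task is equivalent to showing that, uniformly in $\beta$,

That is, the slope around the true value of $\beta$ is low compared to the slope of the penalty. Now

Since $\beta_1 - \beta_{10} = O_p(m^{-1/2})$, it is simple to show that we still have

Hence,

uniformly in both $i = 1, 2, \ldots, m$ and $\beta$. Thus we have

\[ \left|\frac{\partial \ell^*(\beta)}{\partial\beta_j}\right| \le \|\lambda^T(\beta)\| \sum_{i=1}^{m} \left\|\frac{\partial g_i(\beta)}{\partial\beta_j}\right\| [1 + o_p(1)] = O_p(m^{-1/2})\, O_p(m)\, [1 + o_p(1)] = O_p(m^{1/2}). \]

Using the above results, for each component of $\beta$ we have

Using the assumption (3.11), $\sqrt{m}\,\delta_m \to \infty$ and $\delta_m \to 0$, the sign of the derivative is completely determined by that of $\beta_j$. Hence (3.12) holds. This completes the proof.

Using the above lemma, we can prove the following theorem on the asymptotic normality of the adjusted empirical likelihood estimate.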

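Theorem 3.4.3 In addition to the conditions of Theorem 3.4.1 and Lemma 3.4.2, suppose the second derivative of each component of $g$, say $g^{[k]}$, namely $\partial^2 g^{[k]}/\partial\beta\,\partial\beta^T$, a $p \times p$ matrix with the $(i, j)$th entry $\partial^2 g^{[k]}/\partial\beta_i\,\partial\beta_j$, is continuous in $\beta$ in a neighborhood of $\beta_0$ and is bounded by some integrable function $G(y, X)$. Then

where $\hat\beta$ is the penalized empirical likelihood estimate of $\beta$ and

\[ \Sigma = \left[ E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T \left\{ E\{g(\beta_0)\, g^T(\beta_0)\} \right\}^{-1} E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\} \right]^{-1}. \]

Due to the sparsity property given in Lemma 3.4.2, it is seen that the penalized adjusted empirical likelihood estimator with a proper tuning parameter $\delta_m$ maximizes $L^*\{(\beta_1, 0)\}$ with respect to $\beta_1$. Hence,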

For notational simplicity, we do not differentiate $\beta$ and $\beta_1$ for the rest of the proof. That is, we present our proof as if $k = p$. If we expand these functions at

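$(\beta = \beta_0, \lambda = 0)$, we have

\[ L_{1,m}(\hat\beta, \hat\lambda) = L_{1,m}(\beta_0, 0) + \frac{\partial L_{1,m}(\beta_0, 0)}{\partial\beta}(\hat\beta - \beta_0) + \frac{\partial L_{1,m}(\beta_0, 0)}{\partial\lambda}(\hat\lambda - 0) + o_p(\delta_m) = 0, \]

\[ L_{2,m}(\hat\beta, \hat\lambda) = L_{2,m}(\beta_0, 0) + \frac{\partial L_{2,m}(\beta_0, 0)}{\partial\beta}(\hat\beta - \beta_0) + \frac{\partial L_{2,m}(\beta_0, 0)}{\partial\lambda}(\hat\lambda - 0) + o_p(\delta_m) = 0, \]

where $\delta_m = \|\hat\beta - \beta_0\| + \|\hat\lambda\|$. The partial derivatives in the above expansions are

\[ \frac{\partial L_{1,m}(\beta_0, 0)}{\partial\beta} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial g_i(\beta_0)}{\partial\beta} \to E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}, \]

\[ \frac{\partial L_{1,m}(\beta_0, 0)}{\partial\lambda} = -\frac{1}{m}\sum_{i=1}^{m} g_i(\beta_0)\, g_i^T(\beta_0) \to -E\{g(\beta_0)\, g^T(\beta_0)\}, \]

\[ \frac{\partial L_{2,m}(\beta_0, 0)}{\partial\beta} = p''_{\delta_m}(|\beta_0|), \]

\[ \frac{\partial L_{2,m}(\beta_0, 0)}{\partial\lambda} = \frac{1}{m}\sum_{i=1}^{m} \left\{\frac{\partial g_i(\beta_0)}{\partial\beta}\right\}^T \to E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T. \]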

Since $L_{1,m}(\beta_0, 0) = \bar g_m(\beta_0) = O_p(m^{-1/2})$, we can easily show that $\delta_m = O_p(m^{-1/2})$. When $p''_{\delta_m}(|\beta_1|) \to 0$ as $m \to \infty$, the limiting distribution of $\hat\beta_1 - \beta_{10}$ will be asymptotically normal, i.e.,

and $S_{22} = -\Sigma^{-1}$ is the $(2, 2)$th element of $S_m^{-1}$, assuming $p''_{\delta_m}(|\beta_1|) = 0$. This completes the proof.

Chapter 4

Numerical Algorithm

To implement our method, we need an efficient numerical algorithm. Variyath (2006) reported some computational issues with over-penalization that resulted in high bias. We maximize the PEL with respect to $\beta$ using a modified Newton-Raphson algorithm. At each iteration of the Newton-Raphson method, we compute the Lagrange multiplier for an updated value of $\beta$. Chen, Sitter, and Wu (2002) proposed a modified Newton-Raphson algorithm for computing the Lagrange multiplier for a given value of the parameter. This method is numerically stable, which is useful in this application. The numerical algorithms given in Sections 4.1 and 4.2 can be easily extended to penalized adjusted empirical likelihood by adding a pseudo-value $g_{n+1}(\beta) = -a_n\, \bar g_n(\beta)$.

4.1 Computation of Lagrange Multiplier

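The Lagrange multiplier $\lambda$ is estimated by solving the equation

\[ \sum_{i=1}^{n} \frac{g_i(\beta)}{1 + \lambda^T g_i(\beta)} = 0 \]

for a given set of vectors $g_i(\beta)$, $i = 1, 2, \ldots, n$. Note that the above equation is the derivative of $R$ with respect to $\lambda$ for a given $\beta$, where

(4.1)

In the empirical likelihood problem, the solution must satisfy the condition that

\[ 1 + \lambda^T g_i(\beta) > 0, \quad i = 1, 2, \ldots, n. \]

The modified Newton-Raphson algorithm for estimating $\lambda$ for a given value of $\beta$ is

1. Set $\lambda^c = 0$, $c = 0$, $\gamma^c = 1$, $\epsilon = 10^{-8}$, and $\beta = \beta_0$.

2. Let $R_\lambda$ and $R_{\lambda\lambda}$ be the first and second partial derivatives of $R$ given in (4.1) with respect to $\lambda$. Compute $R_\lambda$ and $R_{\lambda\lambda}$ for $\lambda = \lambda^c$ and let $\Delta(\lambda^c) = -[R_{\lambda\lambda}]^{-1} R_\lambda$. If $\|\Delta(\lambda^c)\| < \epsilon$, stop the algorithm and report $\lambda^c$; otherwise continue.

3. Calculate $\delta^c = \gamma^c \Delta(\lambda^c)$. If $1 + (\lambda^c - \delta^c)^T g_i(\beta) \le 0$ for some $i$, let $\gamma^c = \gamma^c/2$ and go to Step 2.

4. Set $\lambda^{c+1} = \lambda^c - \delta^c$, $c = c + 1$, and $\gamma^{c+1} = (c + 1)^{-1/2}$, and go to Step 2.

The step-halving in Step 3 guarantees that $p_i > 0$ and that the optimization is carried out in the right direction.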
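The steps above can be sketched as follows. This is a minimal illustration assuming NumPy, not a reproduction of the thesis code. The sign convention is an assumption: the sketch takes $R = \sum_i \log\{1 + \lambda^T g_i(\beta)\}$, so that $R_\lambda = \sum_i g_i/(1 + \lambda^T g_i)$ and $R_{\lambda\lambda} = -\sum_i g_i g_i^T/(1 + \lambda^T g_i)^2$, and uses the Newton step $[R_{\lambda\lambda}]^{-1} R_\lambda$ with the update $\lambda - \gamma\,\mathrm{step}$. The function name `lagrange_multiplier` is hypothetical.

```python
import numpy as np

def lagrange_multiplier(g, eps=1e-8, max_iter=500):
    """Damped Newton-Raphson for the EL Lagrange multiplier.

    g : (n, p) array whose rows are g_i(beta) for a fixed beta.
    Assumes 0 lies inside the convex hull of the rows of g.
    """
    g = np.asarray(g, dtype=float)
    n, p = g.shape
    lam = np.zeros(p)                       # Step 1: lambda = 0, gamma = 1
    gamma = 1.0
    for c in range(max_iter):
        w = 1.0 + g @ lam                   # 1 + lambda^T g_i
        scaled = g / w[:, None]
        r_lam = scaled.sum(axis=0)          # first derivative of R
        r_lamlam = -scaled.T @ scaled       # second derivative of R
        step = np.linalg.solve(r_lamlam, r_lam)
        if np.linalg.norm(step) < eps:      # Step 2: convergence check
            break
        delta = gamma * step                # Step 3: damped step
        while np.any(1.0 + g @ (lam - delta) <= 0.0):
            gamma /= 2.0                    # halve until all weights positive
            delta = gamma * step
        lam = lam - delta                   # Step 4: update lambda and gamma
        gamma = (c + 1) ** -0.5
    return lam
```

Given the returned multiplier, the weights follow as `1 / (n * (1 + g @ lam))` and the log-likelihood ratio as `np.sum(np.log1p(g @ lam))`.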

4.2 Algorithm for Optimizing Penalized Empirical Likelihood

Let $\hat\lambda(\beta)$ be the estimated value of $\lambda$ for a given $\beta$. We maximize the PEL defined in (3.6) over $\beta$. We use the modified Newton-Raphson algorithm proposed by Fan and Li (2001). Note that the penalty function $p_\delta(|\beta_j|)$ is irregular at the origin and may not have a second derivative at some points, so special care is needed in the application of the Newton-Raphson algorithm. Here too, the penalty function is locally approximated as detailed in Section 2.1, as proposed by Fan and Li (2001). We assume that the profile empirical log-likelihood function is smooth with respect to $\beta$, so that its first two partial derivatives are continuous. Thus, the first term in the profile empirical log-likelihood can be locally approximated via Taylor's expansion. Therefore, the maximization problem can be reduced to a quadratic maximization problem, and the Newton-Raphson algorithm can be used. The modified Newton-Raphson algorithm for estimating $\beta$ uses a quadratic approximation of the profile empirical log-likelihood function. An algorithm for optimizing the penalized empirical likelihood, similar to that in Fan and Li (2001), is as follows:

1. Set $\beta = \beta_0$ and $\epsilon = 10^{-8}$.

2. Let $\hat\lambda = \hat\lambda(\beta)$ be the estimated value of $\lambda$.

3. The parameter $\beta$ is computed iteratively and the solution at the $(k+1)$th iteration is given by

where $W(\beta)$ is the profile empirical log-likelihood ratio function defined in (3.5),

\[ W_\beta = \frac{\partial W(\beta)}{\partial\beta}, \qquad W_{\beta\beta} = \frac{\partial^2 W(\beta)}{\partial\beta\,\partial\beta^T}, \]

\[ \Sigma_\delta(\beta^k) = \mathrm{diag}\left\{ \frac{p'_\delta(|\beta_1^k|)}{|\beta_1^k|}, \ldots, \frac{p'_\delta(|\beta_p^k|)}{|\beta_p^k|} \right\}, \qquad U_\delta(\beta^k) = \Sigma_\delta(\beta^k)\,\beta^k. \]

Note that to compute $W_\beta$ and $W_{\beta\beta}$, we need to estimate the Lagrange multiplier $\hat\lambda(\hat\beta)$ as per Section 4.1.

4. If $\min|\beta^{(k+1)} - \beta^{(k)}| < \epsilon$, stop the algorithm and report $\beta^{(k+1)}$; otherwise set $k = k + 1$ and return to Step 2.

We examine the simplified expressions for $W_\beta$ and $W_{\beta\beta}$ as follows. Let $R_\beta$, $R_{\beta\beta}$, and $R_{\beta\lambda}$ be the first and second partial derivatives of (4.1) with respect to $\beta$ and $\lambda$.

Now the first derivative of $W(\beta)$ with respect to $\beta$ is

Note that for $\lambda = \hat\lambda(\hat\beta)$, $R_\lambda = 0$. Therefore,

(4.3)

Similarly, the second derivative of $W(\beta)$ with respect to $\beta$ is

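\[ W_{\beta\beta} = \sum_{i=1}^{n} \frac{\{1 + \hat\lambda^T(\hat\beta)\, g_i(\beta)\}\left\{ \left[\frac{\partial\hat\lambda}{\partial\beta}\right][g_i(\beta)]^T + 2\, g_i'(\beta)\left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T + g_i''(\beta)\, \hat\lambda(\hat\beta)^T \right\}}{\{1 + \hat\lambda^T(\hat\beta)\, g_i(\beta)\}^2} - \sum_{i=1}^{n} \frac{\left\{ \left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T g_i(\beta) + g_i'(\beta)\, \hat\lambda(\hat\beta) \right\}\left\{ \left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T g_i(\beta) + g_i'(\beta)\, \hat\lambda(\hat\beta) \right\}^T}{\{1 + \hat\lambda^T(\hat\beta)\, g_i(\beta)\}^2}. \]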

Following Owen (2001), a local quadratic approximation to $R$ leads to

(4.4)

Optimization over $\beta$ is easier if $W_{\beta\beta}$ is negative definite. The second term in (4.4) is negative semidefinite, but the first term $R_{\beta\beta}$ might not be.

4.3 Selection of Thresholding Parameters

The SCAD penalty function involves two unknown parameters, $\delta$ and $a$. In practice, we could search for the best pair $(\delta, a)$ over a two-dimensional grid using cross-validation (CV; Stone, 1974) or generalized cross-validation (GCV; Craven and Wahba, 1979). However, this is computationally expensive. From the Bayesian point of view, Fan and Li (2001) suggested using $a = 3.7$, and this value is used throughout our simulation studies. Let the empirical likelihood ratio function evaluated at $\hat\beta$ and $\hat\lambda(\hat\beta)$ be

\[ W(\hat\beta) = \sum_{i=1}^{n} \log\left(1 + \hat\lambda(\hat\beta)^T g_i(\hat\beta)\right). \]

Then, we define the GCV criterion to be

\[ \mathrm{GCV}(\delta) = \frac{W(\hat\beta)}{n\,[1 - e(\delta)/n]^2}, \]

where $e(\delta)$ is the effective number of regression coefficients given by

(4.5)

where $W_{\beta\beta}(\hat\beta)$ is the second derivative of the profile empirical likelihood function with respect to $\beta$ (see (4.4)) evaluated at $\hat\beta$, and $\mathrm{tr}$ denotes the trace of a matrix. We choose the tuning parameter $\delta$ to minimize $\mathrm{GCV}(\delta)$.

4.4 Standard Error Formula

The standard errors for the estimated regression parameters can be estimated directly because we are estimating the parameters and selecting the variables at the same time. Following the conventional technique in the likelihood setting, the corresponding sandwich formula can be used as an estimator for the covariance matrix of the estimates $\hat\beta$.

Chapter 5

Simulation Studies

We conducted a performance analysis based on a series of Monte-Carlo simulations in linear regression, Poisson regression, and logistic regression, and also applied our method to a real-data example. In the simulation studies we compare our method with the penalized-likelihood SCAD method. Our performance measures for these comparisons are the median of the relative model error (MRME), the average number of estimated zero coefficients that are initially set to zero, and the average number of zero coefficients that are not initially set to zero. We also compare the estimated values of the nonzero coefficients and the corresponding standard errors.

Median Relative Model Error (MRME)

Following Tibshirani (1996), we compare the median of the relative model error (Fan and Li, 2001) rather than the mean relative model error because of the instability of the best-subset variable selection. The model error for the linear model is defined by

\[ \mathrm{ME}(\hat\beta) = (\hat\beta - \beta)^T E(X^T X)(\hat\beta - \beta). \]

The error for the selected model is compared to the error of the full model. For each variable selection method, we computed the median of the relative model error, and this is reported in the simulation studies.

5.1 Linear Regression Model

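\[ y_i = x_i\beta + \sigma\varepsilon_i \tag{5.1} \]

with $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is a vector of covariates and $p = 8$. The components of $X$ and $\varepsilon$ are standard normal, the correlation between $x_i$ and $x_j$ is $0.5^{|i-j|}$, and $\sigma = 1$. The least-squares estimate of $\beta$ is given by

\[ \hat\beta = \left[\sum_{i=1}^{n} x_i^T x_i\right]^{-1} \sum_{i=1}^{n} x_i^T y_i = (X^T X)^{-1} X^T y. \]

The estimating equation for $\beta$ is given by

\[ g(\beta) = \sum_{i=1}^{n} x_i^T [y_i - x_i\beta] = 0 \]

and the first derivative of the estimating equation $g(\beta)$ with respect to $\beta$ is

\[ g'(\beta) = -\sum_{i=1}^{n} x_i^T x_i. \tag{5.3} \]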
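One replication of this design can be generated as follows. This is a minimal sketch assuming NumPy; the function names, the seed, and the use of a multivariate normal draw to obtain the $0.5^{|i-j|}$ correlation are illustrative choices, not details taken from the thesis.

```python
import numpy as np

def simulate_linear(n, beta, sigma=1.0, rho=0.5, seed=0):
    """Draw (X, y) from y_i = x_i beta + sigma * eps_i with correlated covariates."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    idx = np.arange(p)
    corr = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), corr, size=n)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y

def estimating_functions(X, y, beta):
    """Rows are the individual terms x_i^T (y_i - x_i beta) of g(beta)."""
    return X * (y - X @ beta)[:, None]

def least_squares(X, y):
    """Solve the normal equations (X^T X) beta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X, y = simulate_linear(60, beta_true)
beta_ls = least_squares(X, y)
g_rows = estimating_functions(X, y, beta_ls)
```

The rows of `g_rows` sum to zero at the least-squares estimate, which is the estimating equation $g(\beta) = 0$; the same rows are the $g_i(\beta)$ that enter the empirical likelihood.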

components of $X$ and $\varepsilon$ being standard normal. This is the model used by Tibshirani (1996). Our penalized-empirical-likelihood SCAD (PELSCAD) is compared only with SCAD, since Fan and Li (2001) reported that SCAD performs better than LASSO and other information-theoretic approaches. Following Tibshirani (1996) and Fan and Li (2001), the performance of these methods was assessed based on the MRME and the number of zero coefficients. We also repeated the entire study with sample size $n = 100$. The MRME results are reported in Table 5.1, which also reports the average number of zero and nonzero coefficients. The column labeled "Correct" gives the average number of estimated zero coefficients that were initially set to zero, and the column labeled "Incorrect" gives the average number of zero coefficients that were not initially set to zero. The estimated values of the nonzero coefficients and the corresponding standard errors are reported in Table 5.2. From Table 5.1 we see that for $n = 60$ the MRME of SCAD is slightly smaller than that of PELSCAD, and for both methods the average number of zero coefficients is close to the target of five. When the sample size increases to 100, the MRME of PELSCAD is low compared to that of SCAD. The average number of zero coefficients is again close to five. This clearly indicates that both methods perform well when a parametric model is available.

                      MRME %    Avg. no. of zero coefficients
                                Correct      Incorrect
n = 60, σ = 1
  SCAD                35.57     4.61         0.0
  PELSCAD             36.52     4.61         0.0
n = 100, σ = 1
  SCAD                41.50     4.85         0.0
  PELSCAD             34.55     4.95         0.0

Table 5.1: Simulation results for linear regression model

Method                β1               β2               β5
n = 60, σ = 1
  SCAD                3.015 (0.167)    1.474 (0.195)    2.003 (0.136)
  PELSCAD             3.002 (0.163)    1.496 (0.170)    1.999 (0.141)
n = 100, σ = 1
  SCAD                3.027 (0.139)    1.442 (0.185)    2.003 (0.104)
  PELSCAD             2.999 (0.120)    1.499 (0.124)    1.999 (0.104)