PENALIZED EMPIRICAL LIKELIHOOD
BASED VARIABLE SELECTION
by
© Tharshanna Nadarajah
A thesis submitted to the School of Graduate Studies in partial fulfilment of the requirements for the Degree of
Master of Science in Statistics
Department of Mathematics and Statistics, Memorial University of Newfoundland
Abstract
Variable selection is an important topic in high-dimensional statistical modeling, especially in generalized linear models. Several variable selection procedures have been developed in the literature, including the sequential approach, prediction-error approach, and information-theoretic approach. All of these are computationally expensive. A new method based on penalized likelihood has been lauded for its computational efficiency and stability. In this approach the variable selection and the estimation of the coefficients are carried out simultaneously. The parametric likelihood is a crucial component, but in many situations a well-defined parametric likelihood is not easy to construct. To overcome this problem, Variyath (2006) proposed a penalized-empirical-likelihood (PEL) based variable selection where empirical likelihood is constructed based on a set of estimating equations. We investigate the asymptotic properties of the new method, and develop an algorithm for estimating the parameters. Our simulation studies show that when a parametric model is available, PEL-based variable selection gives results similar to those achieved by parametric-likelihood variable selection. The former method outperforms the latter when the parametric model is misspecified. We extend our approach to variable selection in Cox's proportional hazard model.
Acknowledgements
I would like to express my appreciation to my supervisor Dr. Asokan Mulayath Variyath, for giving me the opportunity to work with him. His continuous guidance, support and patience over the last two years have been invaluable.
I gratefully acknowledge the financial support provided by Memorial University of Newfoundland's School of Graduate Studies, the Department of Mathematics & Statistics, and my supervisor in the form of graduate fellowships and teaching assistantships. I would like to thank all the faculty in our department for their support. It would not have been possible for me to complete the requirements of the graduate program without their guidance.
I express my profound appreciation to my parents, brothers, wife Premni, and beautiful son Kavevarman for their encouragement, understanding, and patience even during the difficult times.
who directly or indirectly encouraged me in the Master's program and contributed to
Finally, this thesis is dedicated to my parents and the teachers who have supported me ever since the beginning of my studies.
Contents
Acknowledgements
List of Figures
1.1 Background of Variable Selection
1.1.2 Generalized Linear Model (GLM)
1.1.3 Quasi-Likelihood (QL)
Prediction-Error Approach
Information-Theoretic Approach
Penalized-Likelihood Approach
Motivation for New Approach
Proposed Approach to Variable Selection
1.3.1 Empirical Likelihood (EL)
1.3.2 Penalized Empirical Likelihood (PEL)
2 Variable Selection via Nonconcave Penalized Likelihood
2.1 Local Quadratic Approximations and Standard Errors
3 Variable Selection via Penalized Empirical Likelihood
Empirical Likelihood (EL)
Penalized Empirical Likelihood based Variable Selection
Distributional Properties
Penalized Adjusted Empirical Likelihood
4 Numerical Algorithm
Computation of Lagrange Multiplier
4.2 Algorithm for Optimizing Penalized Empirical Likelihood
Selection of Thresholding Parameters
Standard Error Formula
Linear Regression Model
Poisson Regression Model
Logistic Regression Model
Australian Health Survey
6 Variable Selection for Cox's Proportional Hazard Model
Proportional Hazards Model
Simulation Studies
Lung Cancer Example
List of Tables
1.1 Response and covariates of doctor-visit data
Simulation results for linear regression model
Linear regression model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
5.3 Simulation results for Poisson regression model
Poisson regression: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Simulation results for logistic regression model
Logistic regression model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Estimates of Poisson regression coefficients, with their standard errors in parentheses, for model identified by different variable selection methods
Simulation results for Cox's proportional hazards model
Cox's proportional hazards model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Simulation results for Cox's proportional hazards model
Cox's proportional hazards model: Estimates of nonzero coefficients with corresponding standard errors in parentheses
Estimates of Cox's proportional hazards model coefficients for full model
Estimates of regression coefficients in Cox's proportional hazards model
List of Figures
$L_p$ penalty function
SCAD and HARD penalty functions
Chapter 1
Introduction
1.1 Background of Variable Selection
Variable selection is an important topic in statistical modeling, especially in generalized linear models (GLM). In practice, a large number of covariates, $(X_1, X_2, \ldots, X_p)$, are believed to have an influence on the response variable $y$ of interest. However, some covariates have no influence or a weak influence, and a regression model that includes all the covariates is not advisable. Excluding the unimportant covariates results in a simpler model with better interpretive and predictive value.

The problem of identifying a submodel that adequately models the response is generally referred to as the variable selection problem. Statistically speaking, variable selection is a way to reduce the complexity of the model, in some cases by accepting a small amount of bias to improve the precision. The main advantages of selecting a submodel are:

• The interpretation of a large model can be difficult.
• The prediction accuracy may be improved by dropping redundant and irrelevant variables.
• Knowing which variables are significant gives insight into the nature of the prediction problem and allows a better understanding of the final model.
• It is cheaper to measure a reduced set of variables.

For example, consider the doctor-visit data from the Australian health survey of 1977-78, which is discussed in detail by Cameron and Trivedi (1998). The data set consists of a response variable (the number of doctor visits in the previous two weeks by an adult) and twelve covariates, including health indicators and general factors, which are listed in Table 1.1. Our goal is to model the relationship between the response and the covariates. The model with all covariates is not interesting since it is difficult to interpret and will have poor prediction precision. We aim to find a simpler model that gives a reasonable description of the data-generating mechanism. The initial analysis of and variable selection for this data set are discussed in Chapter 5. In the next subsection we will discuss commonly used regression models and estimation procedures where variable selection is considered important.
Variable           Description
y - Dvisits        Number of doctor visits in previous two weeks
X1 - Sex           1 if female, 0 if male
X2 - Age           Age in years divided by 100
X3 - Agesq         Age squared
X4 - Income        Annual income in Australian dollars divided by 1000
X5 - Levyplus      1 if covered by private health insurance; 0 otherwise
X6 - Freepoor      1 if covered by government because low income, recent immigrant, unemployed; 0 otherwise
X7 - Freerepa      1 if covered free by government because elderly, disability pension, invalid veteran, or family of deceased veteran; 0 otherwise
X8 - Illness       Number of illnesses in previous 2 weeks, with 5 or more coded as 5
X9 - Actdays       Number of days of reduced activity in previous 2 weeks due to illness or injury
X10 - Hscore       General health questionnaire score using Goldberg's method; high score indicates bad health
X11 - Chcond1      1 if chronic condition(s) but not limited in activity; 0 otherwise
X12 - Chcond2      1 if chronic condition(s) and limited in activity; 0 otherwise

Table 1.1: Response and covariates of doctor-visit data
1.1.1 Linear Models

Linear models have been the mainstay of statistics for the past thirty years and remain one of our most commonly used statistical tools. In linear models, the data are modeled using linear functions of the covariates, and the unknown parameters are estimated from the data. For a given data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ units/subjects, a linear regression model assumes that the relationship between the response variable $y_i$ and the $p$-dimensional regressors $x_i$ is linear. Thus, the model has the form
\[ y = X\beta + \epsilon, \tag{1.1} \]
where $\epsilon$ is the error term, $X$ is an $n \times p$ matrix of covariate values, and $\beta$ is a vector of unknown parameters to be estimated. A violation of the linearity assumption between the response and the explanatory variables or the distributional assumption of the random error may increase the model variation. The method of least squares is the most popular method for estimating the regression parameters. This approach minimizes the residual sum of squares,
\[ \mathrm{RSS}(\beta) = \sum_{i=1}^{n} (y_i - x_i^T \beta)^2. \]
In matrix form, the residual sum of squares can be written
\[ \mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta). \]
Hence, the ordinary least-squares estimate of $\beta$ is given by
\[ \hat{\beta} = (X^T X)^{-1} X^T y, \]
and the fitted values at the training inputs are
\[ \hat{y} = X\hat{\beta} = X (X^T X)^{-1} X^T y. \tag{1.2} \]
If we assume that $\epsilon \sim N(0, \sigma^2 I_n)$, then the likelihood function of $y$ can be written
\[ L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right\}. \]
Let $\ell(\beta, \sigma^2) = \log L(\beta, \sigma^2)$; then the partial derivative of $\ell(\beta, \sigma^2)$ with respect to $\beta$ is
\[ \frac{\partial \ell(\beta, \sigma^2)}{\partial \beta} = \frac{1}{\sigma^2} X^T (y - X\beta). \]
Setting this derivative to zero shows that the maximum likelihood estimate coincides with the least-squares estimate of $\beta$.
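The least-squares computation described above can be sketched in a few lines of code. The following is a minimal illustration in pure Python, not the implementation used in this thesis; the function names (`solve_linear_system`, `ols_estimate`, `residual_sum_of_squares`) and the toy data are assumptions introduced only for this example. It forms the normal equations $X^T X \beta = X^T y$ and solves them by Gaussian elimination, assuming that $X^T X$ is nonsingular.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols_estimate(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares estimate obtained from the normal equations X'X beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


def residual_sum_of_squares(X: Sequence[Sequence[float]], y: Sequence[float],
                            beta: Sequence[float]) -> float:
    """RSS(beta) = sum_i (y_i - x_i' beta)^2."""
    return sum((yi - sum(xj * bj for xj, bj in zip(row, beta))) ** 2
               for row, yi in zip(X, y))


# Hypothetical toy data: an intercept column plus one covariate, with y = 1 + 2x exactly.
X_demo = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_demo = [1.0, 3.0, 5.0, 7.0]
beta_demo = ols_estimate(X_demo, y_demo)  # close to [1.0, 2.0]
```

In practice a QR or Cholesky factorization from a numerical library would usually be preferred to explicit elimination on $X^T X$; the explicit version is shown only because it mirrors the formula for $\hat{\beta}$.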
1.1.2 Generalized Linear Model (GLM)

Generalized linear models were defined by Nelder and Wedderburn (1972). They include linear regression models, logistic and probit models for categorical responses, and log-linear models. For all these models, a linear relationship is assumed between the response variable $y$ and covariates $X$ through some link function. The conditional expectation of $y$ given $X$ is specified as
\[ \mu = E(y \mid X) = g(X\beta), \tag{1.4} \]
where $g(\cdot)$ is a known link function and $\beta$ is the vector of regression parameters. A GLM includes a random component specifying the conditional distribution of the response variable $y$ given the explanatory variables. Traditionally, the random component is a member of an exponential-family distribution such as the Gaussian, binomial, Poisson, gamma, or inverse-Gaussian. The estimation proceeds by defining a measure of goodness-of-fit between the observed data and the fitted values generated by the model. The parameter estimates are the values that minimize the goodness-of-fit criterion. We primarily estimate the parameters by maximizing the likelihood for the observed data. The log-likelihood based on a set of independent observations $y_1, y_2, \ldots, y_n$ is
\[ \ell(\mu; y) = \sum_{i=1}^{n} \log f(y_i; \mu_i). \]
The goodness-of-fit criterion is
\[ D(y; \mu) = 2\ell(y; y) - 2\ell(\mu; y); \]
it is called the scaled deviance. Note that $\ell(y; y)$ is the maximum likelihood for an exact fit in which the fitted values are equal to the observed data, and it does not depend on the parameters. Maximizing $\ell(\mu; y)$ is equivalent to minimizing $D(y; \mu)$ with respect to $\mu$, subject to the constraints imposed by the model.
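As a concrete instance of the scaled deviance, the sketch below evaluates $D(y; \mu) = 2\ell(y; y) - 2\ell(\mu; y)$ for a Poisson response. The choice of the Poisson family, the function names, and the numbers are assumptions made for illustration; the code is a sketch rather than a definitive implementation. It assumes that every fitted mean is strictly positive and uses the convention $0 \log 0 = 0$ for the saturated fit.

```python
import math
from typing import Sequence


def poisson_loglik(mu: Sequence[float], y: Sequence[float]) -> float:
    """Poisson log-likelihood l(mu; y) = sum_i [y_i log(mu_i) - mu_i - log(y_i!)]."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention 0 * log(0) = 0, needed when the saturated fit has mu_i = y_i = 0.
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi - math.lgamma(yi + 1.0)
    return total


def scaled_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """D(y; mu) = 2 l(y; y) - 2 l(mu; y), where l(y; y) is the exact (saturated) fit."""
    return 2.0 * poisson_loglik(y, y) - 2.0 * poisson_loglik(mu, y)


# Hypothetical counts and fitted means (the fitted means must be positive).
y_demo = [0.0, 1.0, 3.0]
mu_demo = [0.5, 1.5, 2.0]
deviance_demo = scaled_deviance(y_demo, mu_demo)
```

The deviance is zero when the fitted values reproduce the data exactly and positive otherwise, which is why minimizing it is equivalent to maximizing $\ell(\mu; y)$.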
1.1.3 Quasi-Likelihood (QL)

When there is insufficient information about the data for us to specify a parametric model, quasi-likelihood is often used. In this situation we can develop the statistical analysis based on approximations to the likelihood, and we concentrate on cases where the observations are independent. Suppose we have a vector of independent responses, $y$, with mean $\mu$ and diagonal covariance matrix $\sigma^2 V(\mu)$. We assume that $\mu$ is a function of covariates and some regression parameters $\beta$. To construct the quasi-likelihood, we start by looking at a single component $y$ of the response vector. Under the above conditions, the function
\[ U = U(\mu; y) = \frac{y - \mu}{\sigma^2 V(\mu)} \]
has the following properties:
\[ E(U) = 0, \qquad V(U) = \frac{1}{\sigma^2 V(\mu)}, \qquad \text{and} \qquad -E\left( \frac{\partial U}{\partial \mu} \right) = \frac{1}{\sigma^2 V(\mu)}. \]
Most of the first-order asymptotic theory concerned with the likelihood is based on these properties. It is therefore not surprising that
\[ Q(\mu; y) = \int_{y}^{\mu} \frac{y - t}{\sigma^2 V(t)} \, dt \]
behaves like a log-likelihood function for $\mu$; this is called the quasi-likelihood. The quasi-likelihood for complete data is
\[ Q(\mu; y) = \sum_{i=1}^{n} Q(\mu_i; y_i). \]
The quasi-deviance function for a single observation can be written
\[ D(y; \mu) = -2\sigma^2 Q(\mu; y) = 2 \int_{\mu}^{y} \frac{y - t}{V(t)} \, dt. \]
The quasi-likelihood estimating equations for the regression parameters $\beta$ are obtained by differentiating $Q(\mu; y)$. They can be written in the form $U(\beta) = 0$, where
\[ U(\beta) = D^T V^{-1} (y - \mu) / \sigma^2 \]
is called the quasi-score function and $D$ is the derivative of $\mu(\beta)$ with respect to $\beta$. The Newton-Raphson method is widely used to estimate the parameters.
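To make the quasi-score equation $U(\beta) = 0$ concrete, the sketch below solves it for a single regression parameter. Several choices here are assumptions made only for illustration and are not taken from the thesis: the mean model $\mu_i = \exp(\beta x_i)$, the variance function $V(\mu) = \mu$, the function name `fit_quasi_score`, and the use of Fisher scoring (a common variant of Newton-Raphson in which the derivative of the quasi-score is replaced by its expectation, $D^T V^{-1} D / \sigma^2$). The dispersion $\sigma^2$ cancels in the update, so it does not appear in the code.

```python
import math
from typing import Sequence


def fit_quasi_score(x: Sequence[float], y: Sequence[float], beta0: float = 0.0,
                    tol: float = 1e-10, max_iter: int = 100) -> float:
    """Solve U(beta) = 0 by Fisher scoring for mu_i = exp(beta * x_i), V(mu) = mu.

    Assumes at least one x_i is nonzero so that the expected information is positive.
    """
    beta = beta0
    for _ in range(max_iter):
        mu = [math.exp(beta * xi) for xi in x]
        # D_i = d mu_i / d beta = x_i * mu_i and V(mu_i) = mu_i.
        score = sum(xi * mi * (yi - mi) / mi for xi, yi, mi in zip(x, y, mu))
        info = sum((xi * mi) ** 2 / mi for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data generated without noise from beta = 0.5, so the root is 0.5.
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [math.exp(0.5 * xi) for xi in x_demo]
beta_demo = fit_quasi_score(x_demo, y_demo)
```

Only the mean and variance functions enter the iteration, which is the point of the quasi-likelihood construction: no full distributional assumption is needed to obtain the estimate.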
1.2 Variable Selection Methods
The main objective of variable selection methods is to identify a simpler adequate model that is easier to interpret than the full model. In linear models, the submodel relates the response variable $y$ to a subset of components of $X$ in the form
\[ y = X_{(s)} \beta_{(s)} + \epsilon, \]
where $X_{(s)}$ is a subset of the components of $X$, $\beta_{(s)}$ is a vector of the corresponding regression parameters, and $s \subseteq \{1, 2, \ldots, p\}$. The variable selection problem is to find the best subset $s$ such that the submodel is optimal according to some criterion that gives a good description of the data-generating mechanism. Several methods have been developed in the literature for the identification of the best submodel. These methods can be broadly classified into four categories: sequential approaches, prediction-error approaches, information-theoretic approaches, and penalized-likelihood approaches. In the next section we will discuss existing variable selection procedures and their advantages and disadvantages.
1.2.1 Sequential Approaches

The sequential approaches were developed in the early 1960s when computing resources were limited. In these approaches, only some of the possible submodels are evaluated to identify the best model. In the forward-selection approach, we start with an intercept model and add the variables one at a time. At each step, each variable that is not already in the model is tested for inclusion, and the most significant variable is added to the model. This process continues until none of the remaining variables are significant when added to the model or there are no more variables. Because of the complexity that arises from the nature of this procedure, it is essentially impossible to control the error rate. Moreover, the addition of a new variable may change the significance of one or more variables already included in the model.

An alternative approach is backward elimination. In this approach, we start with a model with all the variables of interest. Then the least important variable is dropped, provided it is not significant. We continue this process by successively refitting reduced models and applying the same rule until all the variables remaining in the model are statistically significant. Backward elimination also has drawbacks. Sometimes variables that are dropped would be significant in the final reduced model. This suggests that a compromise between forward selection and backward elimination should be considered.

Efroymson (1960) proposed a stepwise-regression approach that is a combination of the above two approaches. This method uses forward selection, but after the addition of each variable, backward elimination is applied to potentially remove variables already in the model. Stepwise regression does not guarantee to find an optimal submodel. The sequential approaches are computationally less demanding than the other approaches.
1.2.2 Prediction-Error Approach

Another approach to variable selection is to choose the submodel with the best ability to predict a future response. Methods using the prediction-error approach, such as cross-validation and bootstrap, are computationally intensive. Cross-validation has been well studied as a basis for model selection by Stone (1974). In cross-validation, we compute the prediction error of all submodels. We split the data into $K$ parts of roughly equal sizes and estimate the prediction error for one part of the data based on the fitted submodel using the remaining $(K - 1)$ parts. We then combine all $K$ estimates of the prediction error for each submodel. The submodel with the minimum prediction error is selected.

Let $k: \{1, 2, \ldots, n\} \mapsto \{1, 2, \ldots, K\}$ be an indexing function that indicates the partition to which each observation is allocated by the randomization. Writing $\hat{y}_i^{-k(i)}$ for the prediction of $y_i$ from the submodel fitted with part $k(i)$ removed, the cross-validation estimate of the prediction error under squared-error loss is
\[ \mathrm{CV} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i^{-k(i)} \right)^2. \]
The case $K = n$ is known as leave-one-out cross-validation. In this case the estimators are approximately unbiased for the true prediction error, but they can have a high variance and the computational burden is also high. In general, five- or ten-fold cross-validation is recommended (see Breiman and Spector, 1992; Kohavi, 1995).

Bickel and Freedman (1982) suggested that conditional bootstrap be used for variable selection. The bootstrap is a general tool for assessing statistical accuracy. Suppose we wish to fit a model to a set of training data. The basic idea is to randomly draw data sets with replacement from the training data, each of the same size as the original training set. This procedure is repeated a large number of times. Then we refit the model to each of the bootstrap sample sets and examine the behavior of the fits. These methods are computer-intensive and tend to be impractical if we have to fit more than 15-20 models or if the sample size is large. However, cross-validation offers an interesting alternative for model selection. In some situations the prediction error is not well defined (for example, in generalized linear models) and therefore these methods are not applicable.
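The K-fold procedure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the thesis: the two candidate submodels (an intercept-only model and a simple linear regression on one covariate), the squared-error loss, the function names, and the toy data are all hypothetical. The function `fold_assignment` plays the role of the indexing function $k(\cdot)$.

```python
import random
from typing import List, Sequence, Tuple


def fold_assignment(n: int, n_folds: int, seed: int = 0) -> List[int]:
    """Indexing function k(i): randomly allocate each observation to one of K folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    k = [0] * n
    for position, i in enumerate(order):
        k[i] = position % n_folds
    return k


def fit_line(x: Sequence[float], y: Sequence[float], use_slope: bool) -> Tuple[float, float]:
    """Least-squares fit of y on x; with use_slope=False only the intercept is fitted."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    if not use_slope:
        return y_bar, 0.0
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cv_prediction_error(x: Sequence[float], y: Sequence[float], n_folds: int,
                        use_slope: bool, seed: int = 0) -> float:
    """K-fold cross-validation estimate of the squared prediction error."""
    n = len(x)
    k = fold_assignment(n, n_folds, seed)
    total = 0.0
    for fold in range(n_folds):
        train = [i for i in range(n) if k[i] != fold]
        held_out = [i for i in range(n) if k[i] == fold]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train], use_slope)
        total += sum((y[i] - intercept - slope * x[i]) ** 2 for i in held_out)
    return total / n


# Hypothetical data with a clear linear trend and a small alternating disturbance.
x_demo = [float(i) for i in range(20)]
y_demo = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x_demo)]
err_with_slope = cv_prediction_error(x_demo, y_demo, 5, use_slope=True)
err_intercept_only = cv_prediction_error(x_demo, y_demo, 5, use_slope=False)
```

The submodel with the smaller cross-validated error would be selected; here that is the model containing the covariate. With many candidate submodels the same loop is repeated for each one, which is the source of the computational burden noted above.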
1.2.3 Information-Theoretic Approach

In this section, we briefly introduce the most commonly used information-theoretic model selection approaches: the Akaike information criterion (AIC) and Bayesian information criterion (BIC). These methods are applicable when a well-defined parametric model is available. We will also discuss nonparametric versions of AIC and BIC.

Akaike Information Criterion (AIC)

Kullback and Leibler (1951) introduced the Kullback-Leibler (K-L) "distance" or "information" between two models. Let $f$ and $g$ be continuous distribution functions; then the K-L information between models $f$ and $g$ is defined to be
\[ I(f, g) = \int f(x) \log\left( \frac{f(x)}{g(x \mid \theta)} \right) dx. \]
The notation $I(f, g)$ denotes the distance from $g$ to $f$. However, the K-L distance cannot be computed without full knowledge of both $f$ and the parameter $\theta$ for each candidate model $g_i(x \mid \theta)$. Akaike (1973, 1974) found a simple relationship between the K-L distance and Fisher's maximized log-likelihood function. Akaike also found a rigorous way to estimate the K-L information, based on the empirical log-likelihood function at its maximum point. We represent the full model with $p$ parameters as
\[ \text{model}(p): f(y, X, \beta), \qquad \beta = (\beta_1, \beta_2, \ldots, \beta_p)^T. \]
Akaike formulates the problem of statistical model identification as the selection of a submodel $f(y, X, \beta_s)$, where the particular restricted model is defined by the constraints $\beta_{s+1} = \beta_{s+2} = \cdots = \beta_p = 0$, so that
\[ \text{model}(s): f(y, X, \beta_s), \qquad \beta_s = (\beta_1, \beta_2, \ldots, \beta_s, 0, \ldots, 0)^T, \]
where $s$ is the number of parameters and $\beta_s$ belongs to a subspace of $\mathbb{R}^p$. Let $\hat{\beta}_s$ be the maximum likelihood estimate under model($s$); then the log-likelihood function is given by
\[ \ell(\hat{\beta}_s) = \sum_{i=1}^{n} \log f(y_i, x_i, \hat{\beta}_s). \]
The AIC of the submodel is defined to be
\[ \mathrm{AIC}(s) = -2\ell(\hat{\beta}_s) + 2k, \]
where $k$ is the cardinality of $s$. Under this criterion we choose the model with the minimum AIC value.
Bayesian Information Criterion (BIC)

Schwarz (1978) suggested using a Bayesian approach to the model selection problem. This method results in a criterion that is similar to AIC. It is based on the penalized log-likelihood function evaluated at the maximum likelihood estimate for the model. The penalty term in the BIC obtained by Schwarz (1978) is the AIC penalty term $k$ multiplied by $\tfrac{1}{2}\log(n)$, where $n$ is the sample size. Similarly to AIC, the BIC of a submodel is defined to be
\[ \mathrm{BIC}(s) = -2\ell(\hat{\beta}_s) + k \log(n). \]
The submodel with the minimum BIC value is selected. It has been observed that minimizing AIC does not produce asymptotically consistent estimates of the correct model. In contrast, BIC is consistent.
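The two criteria differ only in the penalty applied to the maximized log-likelihood, which the following sketch makes explicit. The function names, the candidate submodels, and their log-likelihood values are hypothetical and chosen only to show a case in which AIC and BIC disagree; the code assumes the maximized log-likelihood of each submodel has already been computed.

```python
import math
from typing import Dict, Tuple


def aic(loglik: float, k: int) -> float:
    """AIC(s) = -2 l(beta_hat_s) + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """BIC(s) = -2 l(beta_hat_s) + k log(n)."""
    return -2.0 * loglik + k * math.log(n)


def select_submodel(candidates: Dict[str, Tuple[float, int]], n: int,
                    criterion: str = "aic") -> str:
    """Return the label of the candidate with the smallest criterion value.

    `candidates` maps a label to (maximized log-likelihood, number of parameters k).
    """
    def score(item: Tuple[str, Tuple[float, int]]) -> float:
        loglik, k = item[1]
        return aic(loglik, k) if criterion == "aic" else bic(loglik, k, n)

    return min(candidates.items(), key=score)[0]


# Hypothetical maximized log-likelihoods for three nested submodels, with n = 50.
candidates_demo = {
    "x1": (-120.0, 1),
    "x1+x2": (-110.0, 2),
    "x1+x2+x3": (-108.5, 3),
}
choice_by_aic = select_submodel(candidates_demo, n=50, criterion="aic")
choice_by_bic = select_submodel(candidates_demo, n=50, criterion="bic")
```

Because $\log(n) > 2$ once $n \ge 8$, BIC charges more per parameter than AIC in all but very small samples, so it tends to select the smaller submodel when the gain in log-likelihood is modest, as in this example.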
Mallows's $C_k$ Criterion

Mallows's $C_k$ is a technique for model selection in regression proposed by Mallows (1973, 1995). The $C_k$ statistic is a criterion to assess the fit when models with different numbers of parameters are being compared. The Mallows criterion for a submodel $s$ is
\[ C_k = \frac{\mathrm{RSS}(s)}{\hat{\sigma}^2} - n + 2k, \]
where $\mathrm{RSS}(s)$ is the residual sum of squares, $k$ is the cardinality of $s$, and $\hat{\sigma}^2$ is an estimate of the error variance, usually obtained from the full model. Usually $C_k$ is plotted against $k$ for the collection of subset models of various sizes under consideration. Acceptable models (minimizing the total bias of the predicted values) are those for which $C_k$ approaches the value $k$.
In summary, the information-theoretic approaches are based on strong parametric model assumptions. In GLMs and QL, the model is frequently specified by a set of estimating equations and we may not have fully specified parametric assumptions. Hence, these methods cannot be used directly. One solution is to use a nonparametric likelihood. A further drawback of the information-theoretic approach is the computational burden of fitting all possible submodels. In the next section, we discuss the empirical-likelihood-based information-theoretic approach for variable selection proposed by Variyath, Chen, and Abraham (2010).
Empirical-Likelihood-Based Information-Theoretic Approach

Variyath, Chen, and Abraham (2010) developed an information-theoretic approach to variable selection based on a nonparametric likelihood, for use when a well-defined parametric model is not available. They replaced the parametric likelihood by the empirical likelihood and investigated the use of empirical-likelihood-based AIC and BIC. The empirical-likelihood-based AIC is defined to be
\[ \mathrm{EAIC}(s) = W(\hat{\beta}_s) + 2k, \]
where $W(\hat{\beta}_s) = 2\ell_{EL}(\hat{\beta}_s)$ is the empirical-likelihood ratio function for the submodel. Similarly, the empirical-likelihood-based BIC is defined to be
\[ \mathrm{EBIC}(s) = W(\hat{\beta}_s) + k \log(n). \]
The best model is identified as the model with the minimum value of EAIC (or EBIC) over all possible submodels. More details of the empirical likelihood are given in Chapter 3. Variyath, Chen, and Abraham (2010) show that the empirical and parametric likelihood-based AIC and BIC share the same first-order asymptotic properties. Their simulation studies show that when a parametric likelihood exists, the two methods have similar performance. The empirical-likelihood-based approach is superior when the parametric model is misspecified.

In the information-theoretic approach a complete evaluation of all the submodels is necessary. As the number of covariates increases, the computational burden becomes more severe. To avoid the evaluation of all the submodels, a new penalized-likelihood approach has been developed.
1.2.4 Penalized-Likelihood Approach

The idea of penalization is very useful in statistical modeling, particularly in high-dimensional variable selection. Most traditional variable selection procedures such as AIC, Mallows's $C_k$, and BIC use a fixed penalty based on the size of the model. However, all these procedures use either stepwise or subset-selection procedures to select the variables. These selection procedures make the procedures computationally intensive and unstable. To overcome the inefficiencies of traditional variable selection procedures, Fan and Li (2001) proposed a unified approach via nonconcave penalized least squares. This method automatically and simultaneously selects variables and estimates their coefficients. The least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996, 1997) is another variant of the penalized-likelihood approach. Fan and Li (2001) applied the penalized-likelihood approach to linear regression, robust linear regression, and generalized linear models. They show that the proposed penalized-likelihood estimator with the smoothly clipped absolute deviation (SCAD) penalty function (defined in Chapter 2) outperforms all the subset and information-theoretic variable selection procedures in terms of computational cost and stability. The SCAD improves the LASSO by reducing the estimation bias. Furthermore, they show that the SCAD possesses oracle properties with a proper choice of the tuning parameters. The true regression coefficients that are zero are automatically shrunk to zero, and the remaining coefficients are simultaneously estimated. Hence, the SCAD and its properties are ideal procedures for variable selection, at least from a theoretical point of view. This encourages us to investigate SCAD properties in a nonparametric-likelihood setting.
1.2.5 Motivation for New Approach

Several methods have been developed to select the best submodel. The sequential approaches are computationally less demanding as the number of covariates increases, but the identification of the optimal model is not guaranteed. The simplest and most widely used variable selection method is cross-validation. In some situations the prediction error is not well defined, for example in generalized linear models, which limits the application of this technique. Information-theoretic variable selection methods such as AIC and BIC are based on the parametric likelihood. These two criteria cannot be applied without full knowledge of the parametric model. If the model is not well defined, we can use empirical-likelihood-based AIC and BIC. In some situations, the number of possible submodels is large, and the computational cost becomes substantial if all the submodels must be evaluated. Methods based on penalized likelihood such as LASSO and SCAD have superior computational efficiency, and SCAD also satisfies the oracle properties. The parametric likelihood is a crucial component of these methods. As discussed earlier, the parametric model is not well defined in many cases, limiting the application of the methods. We investigate the properties of SCAD in a nonparametric setting, where instead of the parametric likelihood, we use the empirical likelihood based on a set of estimating equations.
1.3 Proposed Approach to Variable Selection
Likelihood methods play a major role in statistical analysis. They can be used to address the problems arising when the data are incompletely observed, distorted, or sampled with a bias. They can be used to pool information from different data sources. One problem with parametric likelihood inference is the risk of model misspecification. Such misspecification can cause likelihood-based estimates to be inefficient. To avoid the risk of model misspecification, a nonparametric method can be used. Instead of parametric likelihood, we use nonparametric empirical likelihood in the penalized-likelihood variable selection approach.
1.3.1 Empirical Likelihood (EL)

Owen (1988) introduced the empirical likelihood. Empirical likelihood is a nonparametric method of statistical inference. It allows us to use likelihood methods without assuming that the data come from a known distribution. The empirical likelihood method combines the reliability of nonparametric methods with the flexibility and effectiveness of the likelihood approach.

Let $y_1, y_2, \ldots, y_n$ be a random sample from a cumulative distribution function $F$, and let
\[ p_i = \Pr(Y = y_i) = F(y_i) - F(y_i-) \]
be the probability mass assigned to $y_i$. The empirical likelihood function defined by Owen (1988) is
\[ L(F) = \prod_{i=1}^{n} p_i. \]
Maximizing
\[ \ell(F) = \log\{L(F)\} = \sum_{i=1}^{n} \log(p_i) \]
subject to $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$ leads to $\hat{p}_i = 1/n$. The empirical distribution function based on a random sample is
\[ F_n(y) = \frac{1}{n} \sum_{i=1}^{n} I(y_i \le y), \]
where $I(\cdot)$ is the indicator function. The maximum empirical likelihood estimate of $F$ is therefore $F_n$.

Statistical inference on the parameters can be based on the profile empirical likelihood. For example, if we are interested in inference on the mean, say $\mu$, we define the profile empirical log-likelihood for $\mu$ to be
\[ \ell(\mu) = \sup\left\{ \sum_{i=1}^{n} \log(p_i) : p_i > 0, \; i = 1, 2, \ldots, n; \; \sum_{i=1}^{n} p_i = 1, \; \sum_{i=1}^{n} p_i (y_i - \mu) = 0 \right\}. \]
Owen (1988, 1990, 2001) proved that the empirical likelihood ratio function has an asymptotic $\chi^2$ distribution when $\mu = \mu_0$, the true value. This result is useful for inference on the parameters, such as testing hypotheses and constructing a confidence region for $\mu$. Note that there is no need to estimate a scale parameter in the construction of the confidence interval, and the confidence regions are not necessarily symmetric because of the data-driven approach. Because of these properties, the EL method has become popular in the statistical literature and has been extended to linear regression models (Owen, 1991; Chen, 1993, 1994), general estimating equations (Qin and Lawless, 1994), survival analysis (Thomas and Grunkemeier, 1975; Li, 1995; Murphy, 1995), survey sampling (Chen and Qin, 1993; Chen, Sitter, and Wu, 2002) and time series (Monti, 1997).
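The profile empirical likelihood for a mean can be computed with a one-dimensional search, as the following sketch shows. It relies on the standard Lagrange-multiplier form of the solution, $p_i = 1 / [n\{1 + \lambda (y_i - \mu)\}]$, where $\lambda$ solves $\sum_i (y_i - \mu) / \{1 + \lambda (y_i - \mu)\} = 0$; this form is not derived in this section, and the function name, the bisection solver, and the data are assumptions made only for illustration. The function returns the empirical log-likelihood ratio statistic $2 \sum_i \log\{1 + \lambda (y_i - \mu)\}$ together with the weights $p_i$.

```python
import math
from typing import List, Sequence, Tuple


def el_ratio_for_mean(y: Sequence[float], mu: float,
                      n_iter: int = 200) -> Tuple[float, List[float]]:
    """Empirical log-likelihood ratio statistic and weights p_i for a candidate mean mu."""
    z = [yi - mu for yi in y]
    if min(z) >= 0.0 or max(z) <= 0.0:
        raise ValueError("mu must lie strictly inside the range of the data")
    # All weights are positive only if 1 + lam * z_i > 0 for every i.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)
    margin = 1e-10 * (hi - lo)
    lo += margin
    hi -= margin

    def g(lam: float) -> float:
        # Strictly decreasing in lam on (lo, hi); its root is the Lagrange multiplier.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    n = len(y)
    weights = [1.0 / (n * (1.0 + lam * zi)) for zi in z]
    statistic = 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
    return statistic, weights


# Hypothetical sample; its mean is 4.0, so the statistic is essentially zero at mu = 4.0.
y_demo = [1.0, 2.0, 3.0, 4.0, 10.0]
stat_at_mean, _ = el_ratio_for_mean(y_demo, 4.0)
stat_at_three, weights_at_three = el_ratio_for_mean(y_demo, 3.0)
```

A confidence region for $\mu$ can then be formed from the values of $\mu$ whose statistic stays below a $\chi^2_1$ quantile (about 3.84 at the 95% level). Because the region is defined by the data through the weights, it need not be symmetric about the sample mean.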
1.3.2 Penalized Empirical Likelihood (PEL)

As discussed earlier, penalized-likelihood-based variable selection can be applied only when we have a well-defined parametric model. When we are not sure about the parametric model, but the parameters can be estimated by a set of estimating equations, we can use an EL based on a set of estimating equations. So we propose to replace the parametric likelihood by the empirical likelihood to define a nonparametric version of the penalized likelihood method. We discuss the asymptotic properties of the regression estimates, and we develop an algorithm for estimating the parameters. Our simulation studies show that when a parametric model is available, PEL-based variable selection gives results similar to those achieved by parametric-likelihood variable selection. The former method outperforms the latter when the parametric model is misspecified. We extend our approach to Cox's proportional hazards model. We also apply our method to an Australian health survey and a lung-cancer data set.
1.4 Outline of the Thesis
The main objective of this thesis is to make a contribution to variable selection. We mainly focus on penalized-empirical-likelihood variable selection. In Chapter 2 we briefly discuss variable selection via the nonconcave penalized likelihood proposed by Fan and Li (2001). In Chapter 3, we introduce the empirical likelihood and its characteristics. We describe our penalized-empirical-likelihood variable selection and discuss its asymptotic properties. The algorithm is given in Chapter 4. In Chapter 5 we provide simulation studies to compare the performance of empirical-likelihood variable selection with penalized-parametric-likelihood SCAD, in the context of linear regression, Poisson regression, and logistic regression. We also apply our method to the Australian health survey. In Chapter 6, we discuss the implementation of PEL in Cox's proportional hazard model. Our concluding remarks are given in Chapter 7.
Chapter 2
Variable Selection via Nonconcave Penalized Likelihood
A new class of variable selection methods based on a nonconcave penalized-likelihood approach was proposed by Fan and Li (2001) and Tibshirani (1996). These methods are superior to traditional methods because of their computational efficiency and stability. The variable selection and the estimation of the regression parameters are carried out simultaneously. That is, insignificant variables are removed by estimating their regression parameters as zero. These methods work reasonably well in high-dimensional problems. In this chapter, we will introduce the penalized-likelihood variable selection proposed by Fan and Li (2001) in the context of a linear model. Consider the linear regression model
\[ y_i = x_i^T \beta + \epsilon_i, \qquad i = 1, 2, \ldots, n, \]
where $x_i \in \mathbb{R}^p$ is a vector of covariates and $\beta \in \mathbb{R}^p$ a vector of parameters. We assume that the collected data $\{(x_i, y_i)\}$ are independent samples and $y_i \mid x_i$ has density $f(y_i; x_i^T \beta)$. A general form of the penalized likelihood proposed by Fan and Li (2001) is defined by
\[ \ell_P(\beta) = \sum_{i=1}^{n} \ell(y_i; x_i^T \beta) - n \sum_{j=1}^{p} p_\delta(|\beta_j|), \tag{2.1} \]
where $\ell(y_i; x_i^T \beta)$ is the conditional log-likelihood of $y_i \mid x_i$, $p_\delta(\cdot)$ is a penalty function, and $\delta$ is the tuning parameter.
In linear regression models, if the columns of the design matrix $X$ are orthonormal then it is easy to show that the best-subset selection method and the stepwise elimination method are equivalent to penalized least-squares estimations with the HARD thresholding penalty proposed by Fan (1997) and Antoniadis (1997). This penalty is
\[ p_\delta(|\theta|) = \delta^2 - (|\theta| - \delta)^2 I(|\theta| < \delta). \]
For a large value of $|\theta|$, the HARD thresholding penalty does not overpenalize. The $L_1$ penalty, $p_\delta(|\theta|) = \delta|\theta|$, yields the soft-thresholding rule proposed by Donoho and Johnstone (1994) in the wavelet setting and extended by Tibshirani (1996) to general likelihood settings. The penalty function used in ridge regression is the $L_2$ penalty,
\[ p_\delta(|\theta|) = \delta|\theta|^2. \]
According to Fan and Li (2001), a good penalty function should result in an estimator with the following three oracle properties:

1. Unbiasedness: To avoid unnecessary modeling bias, the estimator is nearly unbiased when the true unknown parameter is large.
2. Sparsity: This is a thresholding rule that automatically sets small estimated coefficients to zero to reduce the model complexity.
3. Continuity: This property eliminates unnecessary variation in the model prediction.

However, the penalty functions $L_1$, $L_2$, and HARD do not satisfy all three conditions.
A simple penalty function satisfying all three is the SCAD penalty proposed by Fan (1997). Its first derivative is
$$p'_\delta(\theta) = \delta\left\{ I(\theta \le \delta) + \frac{(a\delta - \theta)_+}{(a-1)\delta}\, I(\theta > \delta) \right\} \quad \text{for some } a > 2 \text{ and } \theta > 0. \qquad (2.2)$$
Necessary conditions for the unbiasedness, sparsity, and continuity of the SCAD penalty have been proved by Antoniadis and Fan (2001). This penalty function
Figure 2.1: $L_p$ penalty functions ($p = 1$, $p = 2$, $p = 0.3$)
As shown in Figs. 2.1 and 2.2, all the penalty functions are singular at the origin, satisfying $p'_\delta(0+) > 0$. This is the necessary condition for sparsity in variable selection. As shown in Fig. 2.2, the HARD and SCAD penalties are constant when $\beta$ is large, indicating that there is no excessive penalization for large regression coefficients. However, SCAD is smoother than HARD and hence yields a continuous estimator.
Figure 2.2: SCAD and HARD penalty functions
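The penalties compared in Figures 2.1 and 2.2 are easy to evaluate numerically. The following is a minimal sketch, not code from the thesis: the function names are hypothetical, and the closed form in `scad_penalty` is obtained by integrating the derivative (2.2) from zero, with the default $a = 3.7$.

```python
def hard_penalty(theta, delta):
    """HARD thresholding penalty: delta^2 - (|theta| - delta)^2 * I(|theta| < delta)."""
    t = abs(theta)
    if t < delta:
        return delta ** 2 - (t - delta) ** 2
    return delta ** 2


def scad_derivative(theta, delta, a=3.7):
    """First derivative of the SCAD penalty, equation (2.2), for theta >= 0."""
    if theta <= delta:
        return delta
    return max(a * delta - theta, 0.0) / (a - 1.0)


def scad_penalty(theta, delta, a=3.7):
    """SCAD penalty, i.e. the integral of (2.2) from 0 to |theta|."""
    t = abs(theta)
    if t <= delta:
        return delta * t                      # linear (L1-like) part near the origin
    if t <= a * delta:
        return -(t ** 2 - 2.0 * a * delta * t + delta ** 2) / (2.0 * (a - 1.0))
    return (a + 1.0) * delta ** 2 / 2.0       # constant part: no extra penalty for large |theta|
```

Both penalties are flat for large arguments, which is the numerical counterpart of the statement that large coefficients are not over-penalized.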
Let $\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ be the true value of $\beta$. Without loss of generality, we assume that $\beta_{20} = 0$ and all components of $\beta_{10}$ are nonzero. Let $I(\beta_0)$ be the Fisher information matrix and let $I_1(\beta_{10}, 0)$ be the Fisher information given $\beta_{20} = 0$. Under some regularity conditions, Fan and Li (2001) show that the estimate of the regression parameter based on the SCAD penalty, $\hat\beta = (\hat\beta_1^T, \hat\beta_2^T)^T$, satisfies the oracle properties for a certain choice of tuning parameter $(\delta, a)$, since $\hat\beta_2 \xrightarrow{p} 0$ and $\sqrt{n}(\hat\beta_1 - \beta_{10}) \xrightarrow{d} N(0, I_1^{-1}(\beta_{10}, 0))$.
The SCAD penalty function involves two unknown parameters, $\delta$ and $a$. In practice, we could search for the best pair $(\delta, a)$ over a two-dimensional grid using cross-validation (CV) or generalized cross-validation (GCV; Craven and Wahba, 1979). However, this would be computationally expensive. From a Bayesian point of view, Fan and Li (2001) suggested setting $a = 3.7$ and using GCV to select the best value of $\delta$.
2.1 Local Quadratic Approximations and Standard Errors
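The penalty function $p_\delta(|\beta_j|)$ is irregular at the origin and does not have continuous second-order derivatives at some points. Special care is needed in the application of the Newton-Raphson algorithm. Fan and Li (2001) locally approximate the SCAD penalty function by quadratic functions as follows. Suppose our initial value $\beta_0$ is close to the maximizer of (2.1). If $\beta_{j0}$ is very close to zero, then set $\hat\beta_j = 0$; otherwise, the penalty $p_\delta(|\beta_j|)$ can be locally approximated by a quadratic function via
$$[p_\delta(|\beta_j|)]' = p'_\delta(|\beta_j|)\,\mathrm{sgn}(\beta_j) \approx \{p'_\delta(|\beta_{j0}|)/|\beta_{j0}|\}\,\beta_j$$
when $\beta_j \neq 0$. In other words,
$$p_\delta(|\beta_j|) \approx p_\delta(|\beta_{j0}|) + \tfrac{1}{2}\{p'_\delta(|\beta_{j0}|)/|\beta_{j0}|\}(\beta_j^2 - \beta_{j0}^2).$$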
A disadvantage of this approximation is that once a coefficient has been shrunk to zero, it will stay at zero. However, this method significantly reduces the computational burden. Now we assume that the first two partial derivatives of the log-likelihood function are continuous, so that it is a smooth function with respect to $\beta$. The first term in (2.1) can be locally approximated by a quadratic function via Taylor's expansion. The maximization problem (2.1) can be reduced to a quadratic maximization problem and the Newton-Raphson algorithm can be used. Therefore, (2.1) can be locally approximated by
$$\ell(\beta_0) + \nabla\ell(\beta_0)^T(\beta - \beta_0) + \tfrac{1}{2}(\beta - \beta_0)^T \nabla^2\ell(\beta_0)(\beta - \beta_0) - \tfrac{1}{2}\, n\,\beta^T \Sigma_\delta(\beta_0)\,\beta, \qquad (2.3)$$
where $\nabla\ell(\beta_0) = \dfrac{\partial \ell(\beta_0)}{\partial\beta}$ and $\nabla^2\ell(\beta_0) = \dfrac{\partial^2 \ell(\beta_0)}{\partial\beta\,\partial\beta^T}$.
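The quadratic maximization problem (2.3) is solved via the Newton-Raphson algorithm. In this algorithm, the update at the $(k+1)$th iteration is
$$\beta^{k+1} = \beta^k - [\nabla^2\ell(\beta^k) - n\Sigma_\delta(\beta^k)]^{-1}[\nabla\ell(\beta^k) - nU_\delta(\beta^k)],$$
where
$$\Sigma_\delta(\beta^k) = \mathrm{diag}\left[\frac{p'_\delta(|\beta_1^k|)}{|\beta_1^k|}, \ldots, \frac{p'_\delta(|\beta_p^k|)}{|\beta_p^k|}\right], \qquad U_\delta(\beta^k) = \Sigma_\delta(\beta^k)\,\beta^k.$$
The sandwich formula for the standard errors of the estimated parameters is available immediately because this method estimates the parameters and selects the variables at the same time. Following Fan and Li (2001), the standard errors of the estimated nonzero parameters $\hat\beta_1$ are given by
$$\widehat{\mathrm{cov}}(\hat\beta_1) = \{\nabla^2\ell(\hat\beta_1) - n\Sigma_\delta(\hat\beta_1)\}^{-1}\, \widehat{\mathrm{cov}}\{\nabla\ell(\hat\beta_1)\}\, \{\nabla^2\ell(\hat\beta_1) - n\Sigma_\delta(\hat\beta_1)\}^{-1}.$$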
Fan and Li (2001) conducted a series of Monte Carlo simulations in linear regression, robust regression, and logistic regression and showed that the penalized-likelihood variable selection using the SCAD penalty performs better than the LASSO, HARD, and information-theoretic approaches.
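The local quadratic approximation of Section 2.1 can be illustrated in the simplest possible setting. The sketch below is an assumption-laden toy, not the thesis code: it takes a single coefficient, a log-likelihood $-(z - \theta)^2/2$ with $n = 1$, and the SCAD derivative (2.2); the function names and tolerances are hypothetical.

```python
def scad_derivative(theta, delta, a=3.7):
    """First derivative of the SCAD penalty, equation (2.2), for theta >= 0."""
    if theta <= delta:
        return delta
    return max(a * delta - theta, 0.0) / (a - 1.0)


def lqa_scalar(z, delta, a=3.7, eps=1e-8, zero_tol=1e-6, max_iter=10000):
    """Maximize -(z - theta)^2 / 2 - p_delta(|theta|) by local quadratic approximation."""
    theta = z  # start from the unpenalized maximizer
    for _ in range(max_iter):
        if abs(theta) < zero_tol:
            return 0.0  # a coefficient shrunk (numerically) to zero stays at zero
        weight = scad_derivative(abs(theta), delta, a) / abs(theta)  # Sigma_delta entry
        gradient = (z - theta) - weight * theta   # gradient of the quadratic surrogate
        hessian = -(1.0 + weight)                 # Hessian of the quadratic surrogate
        new_theta = theta - gradient / hessian    # Newton-Raphson update
        if abs(new_theta - theta) < eps:
            return new_theta
        theta = new_theta
    return theta
```

With $\delta = 1$, a large input such as $z = 5$ is left unchanged, while a small input such as $z = 0.5$ is driven to zero, which mirrors the unbiasedness and sparsity properties discussed above.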
Chapter 3
Variable Selection via Penalized
Empirical Likelihood
The empirical likelihood method is a powerful inference tool with promising applications in many areas of statistics. In this chapter, we briefly introduce the basic concept of empirical likelihood. We then discuss the penalized-empirical-likelihood

3.1 Empirical Likelihood (EL)
We first outline the empirical likelihood as discussed by Owen (1988, 1990). For a given random sample $y_1, y_2, \ldots, y_n$ from an unknown distribution function $F(y)$, the empirical likelihood function of $F$ is defined to be
$$L_n(F) = \prod_{i=1}^{n} p_i,$$
where $p_i = F(\{y_i\}) = \Pr(Y_i = y_i)$. Without any further information about $F$, the empirical likelihood is maximized by the empirical distribution function
$$F_n(y) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le y),$$
where $I(\cdot)$ is the indicator function and the inequality is expressed componentwise.
In general, it is more common to work with the empirical log-likelihood
$$\ell_n(F) = \sum_{i=1}^{n} \log(p_i) \qquad (3.1)$$
subject to the constraints $\sum_{i=1}^{n} p_i = 1$ and $p_i > 0$, $i = 1, 2, \ldots, n$. Suppose we want to investigate inference on the parameters under the assumption that $F$ is a member of a nonparametric distribution family $\mathcal{F}$, say $\mu = T(F)$ for some functional $T$ of the distribution. Inference for the parameter $\mu$ can be obtained using the likelihood approach, if we know the likelihood value at $\mu$. For a given value of $\mu$, the population $F \in \mathcal{F}$
notion of profile likelihood is to find the $F$ at which the empirical likelihood attains the maximum value among the set of $T(F) = \mu$. The profile empirical likelihood function is defined to be
$$L_n(\mu) = \sup\{L_n(F) \mid T(F) = \mu,\ F \in \mathcal{F}\}.$$
We can construct the likelihood inference on $\mu$ based on $L_n(\mu)$. This likelihood has similar properties to its parametric counterpart. Since $L_n(\mu) \le n^{-n}$, it is convenient to standardize $L_n(\mu)$ by defining the likelihood ratio function to be
$$R(F) = n^n L_n(\mu),$$
and it is easily shown that this can be written as
$$R(F) = \prod_{i=1}^{n} n p_i.$$
The likelihood ratio function has a maximum value of 1. For simplicity, we can perform inference on any function $F$ using the population mean $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$, via the profile empirical likelihood. The profile empirical log-likelihood for $\mu$ is defined by
$$\ell(\mu) = \sup\left\{\sum_{i=1}^{n}\log(p_i) : p_i > 0,\ i = 1, 2, \ldots, n;\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i(y_i - \mu) = 0\right\}.$$
We can compute $\ell(\mu)$ by maximizing $\sum_{i=1}^{n}\log(p_i)$ by the Lagrange multiplier method under the above constraints. The Lagrange multiplier method is very effective for this constrained maximization problem. Define
$$G = \sum_{i=1}^{n}\log(p_i) - n\lambda^T\sum_{i=1}^{n} p_i(y_i - \mu) + \gamma\left(\sum_{i=1}^{n} p_i - 1\right),$$
where $\lambda$ (vector-valued) and $\gamma$ are Lagrange multipliers. By setting the partial derivative of $G$ with respect to $p_i$ to zero, we get
$$\hat p_i = \frac{1}{n\{1 + \lambda^T(y_i - \mu)\}}, \quad i = 1, 2, \ldots, n,$$
and the Lagrange multiplier $\lambda = \lambda(\mu)$ is the solution of
$$\sum_{i=1}^{n}\frac{y_i - \mu}{1 + \lambda^T(y_i - \mu)} = 0.$$
Therefore, we can write the profile empirical log-likelihood function as
$$\ell(\mu) = -n\log(n) - \sum_{i=1}^{n}\log\{1 + \lambda^T(\mu)(y_i - \mu)\}.$$
Now we define the profile empirical log-likelihood ratio function to be
$$W(\mu) = -\sum_{i=1}^{n}\log(n\hat p_i) = \sum_{i=1}^{n}\log[1 + \lambda^T(\mu)(y_i - \mu)].$$
Owen (1990) showed that, when $\mu_0$ is the true population mean, $2W(\mu_0) \xrightarrow{d} \chi^2_d$ as $n \to \infty$, similar to the parametric likelihood ratio function of Wilks (1938). This result is useful for hypothesis tests on the parameter $\mu$ and for the construction of confidence regions of the form
$$\{\mu : 2W(\mu) \le \chi^2_d(1 - \alpha)\},$$
where $\chi^2_d(1 - \alpha)$ is the $(1 - \alpha)$th quantile of the chi-square distribution with $d$ degrees of freedom. This is different from the confidence intervals based on a normal approximation.
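For a scalar mean ($d = 1$) the profile empirical log-likelihood ratio of Section 3.1 can be computed in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the Lagrange multiplier is found by plain bisection (a simpler substitute for the Newton-type solvers used later), exploiting the fact that the score in $\lambda$ is strictly decreasing on the interval where all $1 + \lambda(y_i - \mu)$ are positive.

```python
import math


def el_lambda(z, iters=200):
    """Solve sum z_i / (1 + lam * z_i) = 0 for a scalar lam by bisection."""
    if not (min(z) < 0.0 < max(z)):
        raise ValueError("mu must lie strictly inside the range of the data")
    lo, hi = -1.0 / max(z), -1.0 / min(z)   # open interval where all 1 + lam * z_i > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid                         # score still positive: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_log_ratio(y, mu):
    """Return W(mu) = sum log(1 + lam * (y_i - mu)) and the weights p_i."""
    z = [yi - mu for yi in y]
    lam = el_lambda(z)
    weights = [1.0 / (len(y) * (1.0 + lam * zi)) for zi in z]
    w = sum(math.log(1.0 + lam * zi) for zi in z)
    return w, weights


def in_confidence_region(y, mu, chi2_quantile=3.841):
    """Check 2 W(mu) <= chi-square quantile (3.841 is the 95% point for d = 1)."""
    w, _ = el_log_ratio(y, mu)
    return 2.0 * w <= chi2_quantile
```

Scanning `in_confidence_region` over a grid of candidate values of `mu` traces out an empirical likelihood confidence interval for the mean.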
3.2 Penalized Empirical Likelihood based Variable Selection
Owen (1991) first considered EL for linear models. EL confidence regions for regression coefficients in linear models were studied by Chen (1994). We consider a linear model of the following form
$$y_i = x_i\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $x_i \in \mathbb{R}^p$ is a vector of covariates and $\beta \in \mathbb{R}^p$ a vector of parameters. We assume that the $y_i \mid x_i$ are conditionally independent. We also assume that the error terms $\varepsilon_i$ are independent and identically distributed with mean zero and finite variance $\sigma^2$. Thus, $E(y_i \mid x_i) = x_i\beta$ is the conditional mean function and $\mathrm{Var}(y_i \mid x_i) = \sigma^2$.

Following Owen (1991) and Qin and Lawless (1994), we can extend the empirical likelihood inferences for linear models based on a set of estimating functions $g(y, x, \beta)$. Assume that the generalized linear model is defined by $E[g(y_i, x_i, \beta)] = 0$. In general, $g$ is a $p \times 1$ vector of estimating functions. The profile empirical log-likelihood function of $\beta$ is defined by
$$\ell(\beta) = \sup\left[\sum_{i=1}^{n}\log(p_i) : p_i > 0,\ i = 1, 2, \ldots, n;\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(y_i, x_i, \beta) = 0\right].$$
Using the Lagrange multiplier method discussed in Section 3.1, we can define
$$G = \sum_{i=1}^{n}\log(p_i) - n\lambda^T\sum_{i=1}^{n} p_i\, g(y_i, x_i, \beta) + \gamma\left(\sum_{i=1}^{n} p_i - 1\right),$$
where $\lambda$ (vector-valued) and $\gamma$ are Lagrange multipliers. Setting the partial derivative of $G$ with respect to $p_i$ equal to zero gives
$$\hat p_i = \frac{1}{n\{1 + \lambda^T g(y_i, x_i, \beta)\}}, \quad i = 1, 2, \ldots, n,$$
where the Lagrange multiplier $\lambda = \lambda(\beta)$ is the solution of
$$\sum_{i=1}^{n}\frac{g(y_i, x_i, \beta)}{1 + \lambda^T g(y_i, x_i, \beta)} = 0.$$
This leads to the profile empirical log-likelihood function
$$\ell(\beta) = -n\log(n) - \sum_{i=1}^{n}\log\{1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\} \qquad (3.3)$$
and the profile empirical log-likelihood ratio function is defined to be
$$W(\beta) = -\sum_{i=1}^{n}\log(n\hat p_i) = \sum_{i=1}^{n}\log\{1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\}. \qquad (3.5)$$
Now we define the penalized empirical likelihood estimator of $\beta$ as the maximizer of
$$L(\beta) = -n\log(n) - \sum_{i=1}^{n}\log\{1 + \lambda^T(\beta)\, g(y_i, x_i, \beta)\} - n\sum_{j=1}^{p} p_\delta(|\beta_j|) = \ell(\beta) - n\sum_{j=1}^{p} p_\delta(|\beta_j|) \qquad (3.6)$$
with respect to $\beta$, where $p_\delta(\cdot)$ is the penalty function. We can use any of the penalty functions discussed in Chapter 2. Variyath (2006) first introduced the PEL, but reported some computational issues with over-penalization. We use the continuously differentiable smoothly clipped absolute deviation (SCAD) penalty function with two unknown tuning parameters $(\delta, a)$ proposed by Fan and Li (2001) and defined in (2.2). In the next section we will discuss the distributional properties of the penalized empirical likelihood estimates of $\beta$ derived by Variyath (2006). The algorithm for the penalized empirical likelihood will be discussed in the next chapter.
3.3 Distributional Properties
Variyath (2006) stated and proved theorems in connection with PEL; we reproduce them here. Let $\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ be the true value of $\beta$, with vector lengths of $k$ and $p - k$ respectively. Without loss of generality, we assume that $\beta_{20} = 0$ and all components of $\beta_{10}$ are nonzero. Let $I(\beta_0)$ be the Fisher information matrix and let $I_1(\beta_{10}, 0)$ be the Fisher information given $\beta_{20} = 0$. Under some regularity conditions, our penalized empirical likelihood SCAD estimator $\hat\beta = (\hat\beta_1^T, \hat\beta_2^T)^T$ satisfies the oracle properties for a certain choice of the tuning parameters $(\delta, a)$. Hence, it is easy to prove that $\hat\beta_2 \xrightarrow{p} 0$ and $\sqrt{n}(\hat\beta_1 - \beta_{10}) \xrightarrow{d} N(0, I_1^{-1}(\beta_{10}, 0))$. The following theorem proves the existence of a local maximizer of the penalized empirical likelihood $L(\beta)$.

Theorem 3.3.1 (Variyath, 2006) Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating functions for $\beta \in \mathbb{R}^p$ such that for each $i = 1, 2, \ldots, n$,
$$E\{g_i(\beta_0)\} = 0$$
for some $\beta_0$. Also assume that
(i) $V = E\{g(\beta_0)g^T(\beta_0)\}$ is positive definite,
(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,
(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,
(iv) there exists some function $G(y, x)$ such that in a neighborhood of $\beta_0$, $|\partial g(\beta)/\partial\beta| < G(y, x)$ and $\|g(y, x, \beta)\|^3 < G(y, x)$, with $E[G(y, x)] < \infty$.
The tuning parameter $\delta_n$ is chosen as a function of $n$ such that $\max(p''_{\delta_n}(|\beta_{j0}|) : \beta_{j0} \neq 0) \to 0$ as $n \to \infty$. Then there exists a local maximizer $\hat\beta$ of $L(\beta)$ such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + b_n)$, where $b_n = \max(p'_{\delta_n}(|\beta_{j0}|) : \beta_{j0} \neq 0)$.

Theorem 3.3.1 shows that for an appropriate choice of $\delta_n$, there exists a root-$n$ consistent penalized empirical likelihood estimator. The following lemma shows that this estimator must have the sparsity property $\hat\beta_2 = 0$.
Lemma 3.3.2 (Variyath, 2006) Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating function for $\beta \in \mathbb{R}^p$ such that, for each $i = 1, 2, \ldots, n$,
$$E\{g_i(\beta_0)\} = 0$$
for some $\beta_0$. Also assume that
(i) $V = E\{g(\beta_0)g^T(\beta_0)\}$ is positive definite,
(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,
(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,
(iv) there exists some function $G(y, x)$ such that in a neighborhood of $\beta_0$, $|\partial g(\beta)/\partial\beta| < G(y, x)$ and $\|g(y, x, \beta)\|^3 < G(y, x)$, with $E[G(y, x)] < \infty$.
(3.7)
If $\delta_n \to 0$ and $\sqrt{n}\,\delta_n \to \infty$, then with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(n^{-1/2})$ and any constant $C$,

Using the above lemma, one can prove the following theorem on the asymptotic normality of the empirical likelihood estimate.
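Theorem 3.3.3 (Variyath, 2006) In addition to the conditions of Theorem 3.3.1 and Lemma 3.3.2, suppose that $\partial^2 g(\beta)/\partial\beta\,\partial\beta^T$ is continuous in $\beta$ in a neighborhood of the true value $\beta_0$ and is bounded by some integrable function $G(y, x)$. Then

where $\hat\beta$ is the penalized empirical likelihood estimate of $\beta$ and
$$\Sigma = \left[E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T \left\{E\{g(\beta_0)g^T(\beta_0)\}\right\}^{-1} E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}\right]^{-1}.$$

3.4 Penalized Adjusted Empirical Likelihood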
Computation of $W(\beta)$ for a given value of $\beta$ may lead to some technical problems. The solution for $\lambda$ must satisfy $1 + \lambda^T(\beta)\, g(y_i, x_i, \beta) > 0$ for all $i = 1, \ldots, n$. A necessary and sufficient condition for its existence is that the vector $0$ is an inner point of the convex hull of $\{g(y_i, x_i, \beta),\ i = 1, \ldots, n\}$. The true parameter value $\beta_0$ is the unique solution of $E[g(y, x, \beta)] = 0$. Under some moment conditions on $g(y, x, \beta)$ (Owen, 2001), the convex hull of $\{g(y_i, x_i, \beta_0),\ i = 1, \ldots, n\}$ contains $0$ as its inner point with probability tending to 1 as $n \to \infty$. When $\beta$ is not close to $\beta_0$, or when $n$ is small, there is a considerable chance that the solution of (3.4) does not exist. To avoid this problem, Chen, Variyath and Abraham (2008) introduced the adjusted empirical likelihood.
Denote $g_i(\beta) = g(y_i, x_i, \beta)$ and $\bar g_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} g_i(\beta)$ for any given $\beta$. For some positive constant $a_n$, define
$$g_{n+1}(\beta) = -\frac{a_n}{n}\sum_{i=1}^{n} g_i(\beta) = -a_n\,\bar g_n(\beta).$$
Now the adjusted profile empirical log-likelihood ratio function is defined as
$$W^*(\beta) = -\sup\left[\sum_{i=1}^{n+1}\log[(n+1)p_i] : p_i > 0,\ i = 1, 2, \ldots, n+1;\ \sum_{i=1}^{n+1} p_i = 1,\ \sum_{i=1}^{n+1} p_i\, g_i(\beta) = 0\right] = \sum_{i=1}^{n+1}\log[1 + \lambda^T(\beta)\, g_i(\beta)],$$
with $\lambda = \lambda(\beta)$ being the solution of
$$\sum_{i=1}^{n+1}\frac{g_i(\beta)}{1 + \lambda^T(\beta)\, g_i(\beta)} = 0.$$
Note that now $0$ always lies inside the convex hull of $\{g_i(\beta),\ i = 1, \ldots, n+1\}$. The adjusted empirical log-likelihood ratio function is well defined after adding the pseudo-value $g_{n+1}(\beta)$. For a wide range of $a_n$, $W^*(\beta)$ has the same first-order asymptotic properties as $W(\beta)$ (see Chen et al., 2008). We extend this idea to the penalized empirical likelihood, to avoid the technical problem of non-existence of a solution to (3.4) for any given value of $\beta$.
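The effect of the pseudo-value can be seen in a small scalar example. The sketch below is illustrative only: the function names are hypothetical, the default $a_n = \max(1, \log(n)/2)$ is an assumed choice rather than one stated in this section, and the multiplier is found by bisection instead of the algorithm of Chapter 4.

```python
import math


def adjusted_values(g, a_n=None):
    """Append the pseudo-value g_{n+1} = -a_n * mean(g) to scalar estimating-function values."""
    n = len(g)
    if a_n is None:
        a_n = max(1.0, math.log(n) / 2.0)  # assumed default, not taken from the text
    g_bar = sum(g) / n
    return list(g) + [-a_n * g_bar]


def zero_in_hull(g):
    """In one dimension, 0 is an inner point of the convex hull iff min(g) < 0 < max(g)."""
    return min(g) < 0.0 < max(g)


def log_ratio(g, iters=200):
    """W = sum log(1 + lam * g_i), with lam solving sum g_i / (1 + lam * g_i) = 0."""
    if not zero_in_hull(g):
        raise ValueError("0 is outside the convex hull: no solution for lambda")
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return sum(math.log(1.0 + lam * gi) for gi in g)
```

When every $g_i(\beta)$ has the same sign, `log_ratio(g)` fails, whereas `log_ratio(adjusted_values(g))` returns a finite value, which is the practical benefit of the adjustment.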
Now we define the penalized adjusted empirical likelihood estimator of $\beta$ as the maximizer of
$$L^*(\beta) = \ell^*(\beta) - m\sum_{j=1}^{p} p_\delta(|\beta_j|) \qquad (3.8)$$
with respect to $\beta$, where $\ell^*(\beta)$ is the adjusted profile empirical log-likelihood, $m = n + 1$, and $p_\delta(\cdot)$ is the penalty function defined in (2.2). This adjustment is particularly useful because even for some undesirable values of $\beta$ and tuning parameters, the proposed algorithm guarantees a solution. Now, we can show that the penalized adjusted empirical likelihood has the same asymptotic properties as the penalized empirical likelihood detailed in Section 3.3. We state and prove the following theorems and lemma to show that the penalized adjusted empirical likelihood estimates have oracle properties.
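Theorem 3.4.1 Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating functions for $\beta \in \mathbb{R}^p$ such that for each $i = 1, 2, \ldots, n$,
$$E\{g_i(\beta_0)\} = 0$$
for some $\beta_0$. Also assume that
(i) $V = E\{g(\beta_0)g^T(\beta_0)\}$ is positive definite,
(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,
(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,
(iv) there exists some function $G(y, x)$ such that in a neighborhood of $\beta_0$, $|\partial g(\beta)/\partial\beta| < G(y, x)$ and $\|g(y, x, \beta)\|^3 < G(y, x)$, with $E[G(y, x)] < \infty$.
The tuning parameter $\delta_m$ is chosen as a function of $m$ such that $\max(p''_{\delta_m}(|\beta_{j0}|) : \beta_{j0} \neq 0) \to 0$ as $m \to \infty$, where $m = n + 1$. Then there exists a local maximizer $\hat\beta$ of $L^*(\beta)$ such that $\|\hat\beta - \beta_0\| = O_p(m^{-1/2} + b_m)$, where $b_m = \max(p'_{\delta_m}(|\beta_{j0}|) : \beta_{j0} \neq 0)$.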
Let $\alpha_m = m^{-1/2} + b_m$. It is sufficient to show that for any $\epsilon > 0$, there exists a large enough $C$ such that
$$\Pr\left\{\sup_{\|u\| = C} L^*(\beta_0 + \alpha_m u) < L^*(\beta_0)\right\} \ge 1 - \epsilon. \qquad (3.9)$$
This implies that for large $m$, with probability at least $1 - \epsilon$, there exists a local maximizer in the ball $\{\beta_0 + \alpha_m u : \|u\| \le C\}$. Hence, there exists a local maximizer such that $\|\hat\beta - \beta_0\| = O_p(\alpha_m)$. Let
$$D^*_m(u) = L^*(\beta_0 + \alpha_m u) - L^*(\beta_0).$$
Since $p_\delta(0) = 0$ and $p_\delta(\cdot) \ge 0$,
$$D^*_m(u) = \{\ell^*(\beta_0 + \alpha_m u) - \ell^*(\beta_0)\} - m\sum_{j=1}^{p}\{p_\delta(|\beta_{j0} + \alpha_m u_j|) - p_\delta(|\beta_{j0}|)\} \le \{\ell^*(\beta_0 + \alpha_m u) - \ell^*(\beta_0)\} - m\sum_{j=1}^{k}\{p_\delta(|\beta_{j0} + \alpha_m u_j|) - p_\delta(|\beta_{j0}|)\},$$
where $k$ is the number of components in $\beta_{10}$. The Lagrange multiplier $\lambda(\beta_0)$ can be expressed as

$$-\ell^*(\beta_0) = \sum_{i=1}^{m}\log\{1 + \lambda^T(\beta_0)\, g_i(\beta_0)\} + o_p(1) = \sum_{i=1}^{m}\lambda^T(\beta_0)\, g_i(\beta_0) - \frac{1}{2}\sum_{i=1}^{m}[\lambda^T(\beta_0)\, g_i(\beta_0)]^2 + o_p(1) = \frac{m}{2}\,\bar g_m^T(\beta_0)\, V_m^{-1}(\beta_0)\,\bar g_m(\beta_0) + o_p(1).$$
Now, letting
(3.10)
It can easily be shown that $\Sigma$ is the asymptotic variance of $\sqrt{m}(\hat\beta - \beta_0)$, and so the representation is similar to the normalized parametric likelihood. By the central limit theorem, $\bar g_m(\beta_0)$ is $O_p(m^{-1/2})$; thus the first term on the right-hand side of (3.10) is of order $O_p(m^{1/2}\alpha_m) = O_p(m\alpha_m^2)$. By selecting a large $C$, the second term dominates the first term uniformly in $\|u\| = C$. The third term is bounded by

This is also dominated by the second term in (3.10). Hence, by choosing a sufficiently large value of $C$, (3.9) holds. This completes the proof.

Theorem 3.4.1 shows that for an appropriate choice of $\delta_m$, there exists a root-$m$ consistent penalized empirical likelihood estimator. The following lemma shows that this estimator must have the sparsity property $\hat\beta_2 = 0$.
Lemma 3.4.2 Suppose $(y_i, x_i)$, $i = 1, 2, \ldots, n$, is a set of independent and identically distributed random vectors. Let $g_i(\beta) = g(y_i, x_i, \beta)$ be the estimating function for $\beta \in \mathbb{R}^p$ such that, for each $i = 1, 2, \ldots, n$,
$$E\{g_i(\beta_0)\} = 0$$
for some $\beta_0$. Also assume that
(i) $V = E\{g(\beta_0)g^T(\beta_0)\}$ is positive definite,
(ii) $\partial g(\beta)/\partial\beta$ is continuous in $\beta$ in a neighborhood of $\beta_0$,
(iii) the rank of $E\{\partial g(\beta)/\partial\beta\}$ is $p$ in a neighborhood of $\beta_0$,
(iv) there exists some function $G(y, x)$ such that in a neighborhood of $\beta_0$, $|\partial g(\beta)/\partial\beta| < G(y, x)$ and $\|g(y, x, \beta)\|^3 < G(y, x)$, with $E[G(y, x)] < \infty$.

where $m = n + 1$. If $\delta_m \to 0$ and $\sqrt{m}\,\delta_m \to \infty$, then with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(m^{-1/2})$ and any constant $C$,
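Following Fan and Li (2001) in proving this lemma, it is sufficient to show that for $\beta$ satisfying $\beta_1 - \beta_{10} = O_p(m^{-1/2})$, for some small $\epsilon_m = Cm^{-1/2}$, and for $j = k+1, \ldots, p$,
$$\frac{\partial L^*(\beta)}{\partial\beta_j} < 0 \ \text{ for } 0 < \beta_j < \epsilon_m, \qquad \frac{\partial L^*(\beta)}{\partial\beta_j} > 0 \ \text{ for } -\epsilon_m < \beta_j < 0. \qquad (3.12)$$
Due to the condition on $p'_{\delta_m}(|\beta|)$, the task is equivalent to showing that, uniformly in $\beta$,

That is, the slope around the true value of $\beta$ is low compared to the slope of the penalty. Now

Since $\beta_1 - \beta_{10} = O_p(m^{-1/2})$, it is simple to show that we still have

Hence,

uniformly in both $i = 1, 2, \ldots, m$ and $\beta$. Thus we have
$$\left|\frac{\partial\ell^*(\beta)}{\partial\beta_j}\right| \le \|\lambda^T(\beta)\|\sum_{i=1}^{m}\left\|\frac{\partial g_i(\beta)}{\partial\beta_j}\right\|[1 + o_p(1)] = O_p(m^{-1/2})\,O_p(m)\,[1 + o_p(1)] = O_p(m^{1/2}).$$
Using the above results, for each component of $\beta$ we have

Using the assumption (3.11), $\sqrt{m}\,\delta_m \to \infty$ and $\delta_m \to 0$, the sign of the derivative is completely determined by that of $\beta_j$. Hence (3.12) holds. This completes the proof.

Using the above lemma, we can prove the following theorem on the asymptotic normality of the adjusted empirical likelihood estimate.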
Theorem 3.4.3 In addition to the conditions of Theorem 3.4.1 and Lemma 3.4.2, suppose that for each component of $g$, say $g^{[k]}$, the second derivative $\partial^2 g^{[k]}/\partial\beta\,\partial\beta^T$, a $p \times p$ matrix with $(i, j)$th entry $\partial^2 g^{[k]}/\partial\beta_i\,\partial\beta_j$, is continuous in $\beta$ in a neighborhood of $\beta_0$ and is bounded by some integrable function $G(y, x)$. Then

where $\hat\beta$ is the penalized empirical likelihood estimate of $\beta$ and
$$\Sigma = \left[E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T \left\{E\{g(\beta_0)g^T(\beta_0)\}\right\}^{-1} E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}\right]^{-1}.$$
Due to the sparsity property given in Lemma 3.4.2, it is seen that the penalized adjusted empirical likelihood estimator with proper tuning parameter $\delta_m$ maximizes $L^*\{(\beta_1, 0)\}$ with respect to $\beta_1$. Hence,
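For notational simplicity, we present the rest of the proof as if $k = p$. If we expand these functions at $(\beta = \beta_0, \lambda = 0)$, we have
$$L^*_{1,m}(\hat\beta, \hat\lambda) = L^*_{1,m}(\beta_0, 0) + \left[\frac{\partial L^*_{1,m}(\beta_0, 0)}{\partial\beta}\right](\hat\beta - \beta_0) + \left[\frac{\partial L^*_{1,m}(\beta_0, 0)}{\partial\lambda}\right](\hat\lambda - 0) + o_p(d_m) = 0,$$
$$L^*_{2,m}(\hat\beta, \hat\lambda) = L^*_{2,m}(\beta_0, 0) + \left[\frac{\partial L^*_{2,m}(\beta_0, 0)}{\partial\beta}\right](\hat\beta - \beta_0) + \left[\frac{\partial L^*_{2,m}(\beta_0, 0)}{\partial\lambda}\right](\hat\lambda - 0) + o_p(d_m) = 0,$$
where $d_m = \|\hat\beta - \beta_0\| + \|\hat\lambda\|$. The partial derivatives in the above expansions are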
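$$\frac{\partial L^*_{1,m}(\beta_0, 0)}{\partial\beta} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial g_i(\beta_0)}{\partial\beta} \to E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\},$$
$$\frac{\partial L^*_{1,m}(\beta_0, 0)}{\partial\lambda} = -\frac{1}{m}\sum_{i=1}^{m} g_i(\beta_0)\,g_i^T(\beta_0) \to -E\{g(\beta_0)g^T(\beta_0)\},$$
$$\frac{\partial L^*_{2,m}(\beta_0, 0)}{\partial\beta} = p''_{\delta_m}(|\beta_0|),$$
$$\frac{\partial L^*_{2,m}(\beta_0, 0)}{\partial\lambda} = \frac{1}{m}\sum_{i=1}^{m}\left\{\frac{\partial g_i(\beta_0)}{\partial\beta}\right\}^T \to E\left\{\frac{\partial g(\beta_0)}{\partial\beta}\right\}^T.$$
Since $L^*_{1,m}(\beta_0, 0) = \bar g_m(\beta_0) = O_p(m^{-1/2})$, we can easily show that $d_m = O_p(m^{-1/2})$. When $p'_{\delta_m}(|\beta_1|) \to 0$ as $m \to \infty$, $\hat\beta_1 - \beta_{10}$ is asymptotically normal, with asymptotic variance determined by the $(2, 2)$th element of $S_m^{-1}$, assuming $p''_{\delta_m}(|\beta_1|) = 0$. This completes the proof.

Chapter 4
Numerical Algorithm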
To implement our method, we need an efficient numerical algorithm. Variyath (2006) reported some computational issues with over-penalization that resulted in high bias. We maximize the PEL with respect to $\beta$ using a modified Newton-Raphson algorithm. At each iteration of the Newton-Raphson method, we compute the Lagrange multiplier for an updated value of $\beta$. Chen, Sitter, and Wu (2002) proposed a modified Newton-Raphson algorithm for computing the Lagrange multiplier for a given value of the parameter. This method is numerically stable, which is useful in this application. The numerical algorithm given in Sections 4.1 and 4.2 can be easily extended to the penalized adjusted empirical likelihood by adding a pseudo-value $g_{n+1}(\beta) = -a_n\,\bar g_n(\beta)$.
4.1 Computation of Lagrange Multiplier
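The Lagrange multiplier $\lambda$ is estimated by solving the equation
$$\sum_{i=1}^{n}\frac{g_i(\beta)}{1 + \lambda^T g_i(\beta)} = 0$$
for a given set of vectors $g_i(\beta)$, $i = 1, 2, \ldots, n$. Note that the above equation is the derivative of $R$ with respect to $\lambda$ for a given $\beta$, where
$$R = \sum_{i=1}^{n}\log\{1 + \lambda^T g_i(\beta)\}. \qquad (4.1)$$
In the empirical likelihood problem, the solution must satisfy the condition that
$$1 + \lambda^T g_i(\beta) > 0, \quad i = 1, 2, \ldots, n.$$
The modified Newton-Raphson algorithm for estimating $\lambda$ for a given value of $\beta$ is
1. Set $\lambda^0 = 0$, $c = 0$, $\gamma^0 = 1$, $\epsilon = 10^{-8}$, and $\beta = \beta_0$.
2. Let $R_\lambda$ and $R_{\lambda\lambda}$ be the first and second partial derivatives of $R$ given in (4.1) with respect to $\lambda$, which are given by
$$R_\lambda = \sum_{i=1}^{n}\frac{g_i(\beta)}{1 + \lambda^T g_i(\beta)}, \qquad R_{\lambda\lambda} = -\sum_{i=1}^{n}\frac{g_i(\beta)\,g_i^T(\beta)}{\{1 + \lambda^T g_i(\beta)\}^2}.$$
Compute $R_\lambda$ and $R_{\lambda\lambda}$ for $\lambda = \lambda^c$ and let $\Delta(\lambda^c) = [R_{\lambda\lambda}]^{-1} R_\lambda$. If $\|\Delta(\lambda^c)\| < \epsilon$, stop the algorithm and report $\lambda^c$; otherwise continue.
3. Calculate $\delta^c = \gamma^c\,\Delta(\lambda^c)$. If $1 + (\lambda^c - \delta^c)^T g_i(\beta) \le 0$ for some $i$, let $\gamma^c = \gamma^c/2$ and repeat Step 3.
4. Set $\lambda^{c+1} = \lambda^c - \delta^c$, $c = c + 1$, and $\gamma^{c+1} = (c+1)^{-1/2}$, and go to Step 2.
Step 3 guarantees that $p_i > 0$ and that the optimization is carried out in the right direction.

4.2 Algorithm for Optimizing Penalized Empirical Likelihood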
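Because the algorithm of this section calls the Lagrange multiplier computation of Section 4.1 at every iteration, a minimal sketch of that inner step may help fix ideas. The code below is an illustration under simplifying assumptions, not the thesis implementation: it treats a scalar estimating function (so that no matrix inversion is needed), takes the Newton direction as $R_{\lambda\lambda}^{-1}R_\lambda$ with update $\lambda \leftarrow \lambda - \gamma\Delta$, and uses a hypothetical function name and iteration cap.

```python
def el_lambda_newton(g, eps=1e-8, max_iter=1000):
    """Damped Newton-Raphson with step halving for a scalar Lagrange multiplier.

    g holds the values g_i(beta) at a fixed beta; the result approximately
    solves sum g_i / (1 + lam * g_i) = 0 with all 1 + lam * g_i > 0.
    """
    lam, gamma, c = 0.0, 1.0, 0
    for _ in range(max_iter):
        r1 = sum(gi / (1.0 + lam * gi) for gi in g)              # R_lambda
        r2 = -sum(gi * gi / (1.0 + lam * gi) ** 2 for gi in g)   # R_lambda_lambda (negative)
        delta = r1 / r2                                          # full Newton step
        if abs(delta) < eps:
            return lam
        step = gamma * delta
        # Halve the step until every implied weight p_i stays positive.
        while any(1.0 + (lam - step) * gi <= 0.0 for gi in g):
            gamma /= 2.0
            step = gamma * delta
        lam -= step
        c += 1
        gamma = (c + 1) ** -0.5                                  # damping schedule
    return lam
```

The step-halving loop always terminates because the current value of `lam` is itself feasible, so a sufficiently small step cannot leave the feasible region.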
Let $\hat\lambda(\beta)$ be the estimated value of $\lambda$ for a given $\beta$. We maximize the PEL defined in (3.6) over $\beta$. We use the modified Newton-Raphson algorithm proposed by Fan and Li (2001). Note that the penalty function $p_\delta(|\beta_j|)$ is irregular at the origin and may not have a second derivative at some points. Special care is needed in the application of the Newton-Raphson algorithm. Here too, the penalty function is locally approximated as detailed in Section 2.1, as proposed by Fan and Li (2001). We assume that the profile empirical log-likelihood function is smooth with respect to $\beta$ so that its first two partial derivatives are continuous. Thus, the first term in the profile empirical log-likelihood can be locally approximated via Taylor's expansion. Therefore, the maximization problem can be reduced to a quadratic maximization problem, and the Newton-Raphson algorithm can be used. The modified Newton-Raphson algorithm for estimating $\beta$ uses a quadratic approximation of the profile empirical log-likelihood function. An algorithm for optimizing the penalized empirical likelihood, similar to that in Fan and Li (2001), is as follows:
1. Set $\beta = \beta_0$ and $\epsilon = 10^{-8}$.
2. Let $\hat\lambda = \lambda(\beta)$ be the estimated value of $\lambda$.
3. The parameter $\beta$ is computed iteratively and the solution at the $(k+1)$th iteration is given by
$$\beta^{k+1} = \beta^k - [W_{\beta\beta}(\beta^k) + n\Sigma_\delta(\beta^k)]^{-1}[W_\beta(\beta^k) + nU_\delta(\beta^k)],$$
where $W(\beta)$ is the profile empirical log-likelihood ratio function defined in (3.5),
$$W_\beta = \frac{\partial W(\beta)}{\partial\beta}, \qquad W_{\beta\beta} = \frac{\partial^2 W(\beta)}{\partial\beta\,\partial\beta^T},$$
$$\Sigma_\delta(\beta^k) = \mathrm{diag}\left[\frac{p'_\delta(|\beta_1^k|)}{|\beta_1^k|}, \ldots, \frac{p'_\delta(|\beta_p^k|)}{|\beta_p^k|}\right], \qquad U_\delta(\beta^k) = \Sigma_\delta(\beta^k)\,\beta^k.$$
Note that to compute $W_\beta$ and $W_{\beta\beta}$, we need to estimate the Lagrange multiplier $\hat\lambda(\beta)$ as per Section 4.1.
4. If $\min|\beta^{(k+1)} - \beta^{(k)}| < \epsilon$, stop the algorithm and report $\beta^{(k+1)}$; otherwise set $k = k + 1$ and go to Step 2.
We examine the simplified expressions for $W_\beta$ and $W_{\beta\beta}$ as follows. Let $R_\beta$, $R_{\beta\beta}$, and $R_{\beta\lambda}$ be the first and second partial derivatives of (4.1) with respect to $\beta$ and $\lambda$.
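Now the first derivative of $W(\beta)$ with respect to $\beta$ is
$$W_\beta = R_\beta + \left[\frac{\partial\hat\lambda(\beta)}{\partial\beta}\right]^T R_\lambda.$$
Note that for $\lambda = \hat\lambda(\beta)$, $R_\lambda = 0$. Therefore,
$$W_\beta = R_\beta. \qquad (4.3)$$
Similarly, the second derivative of $W(\beta)$ with respect to $\beta$ is
$$W_{\beta\beta} = \sum_{i=1}^{n}\frac{\{1 + \hat\lambda^T(\beta)\,g_i(\beta)\}\left\{\left[\frac{\partial^2\hat\lambda}{\partial\beta\,\partial\beta^T}\right][g_i(\beta)]^T + 2g_i'(\beta)\left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T + g_i''(\beta)\,\hat\lambda(\beta)^T\right\}}{\{1 + \hat\lambda^T(\beta)\,g_i(\beta)\}^2} - \sum_{i=1}^{n}\frac{\left\{\left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T g_i(\beta) + g_i'(\beta)\,\hat\lambda(\beta)\right\}\left\{\left[\frac{\partial\hat\lambda}{\partial\beta}\right]^T g_i(\beta) + g_i'(\beta)\,\hat\lambda(\beta)\right\}^T}{\{1 + \hat\lambda^T(\beta)\,g_i(\beta)\}^2}.$$
Following Owen (2001), a local quadratic approximation to $R$ leads to
$$W_{\beta\beta} = R_{\beta\beta} - R_{\beta\lambda}\,R_{\lambda\lambda}^{-1}\,R_{\lambda\beta}. \qquad (4.4)$$
Because maximizing the PEL amounts to minimizing $W(\beta)$ plus the penalty, optimization over $\beta$ is easier if $W_{\beta\beta}$ is positive definite. Since $R_{\lambda\lambda}$ is negative definite, the second term in (4.4) is positive semidefinite, but the first term $R_{\beta\beta}$ might not be.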
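To make the interplay between the outer update for $\beta$ and the inner computation of $\hat\lambda(\beta)$ concrete, here is a toy sketch for a single parameter. Everything in it is an assumption chosen for illustration: the estimating function is $g_i(\beta) = y_i - \beta$ (a scalar mean), for which $W_\beta = -n\hat\lambda$ and $W_{\beta\beta} = n\sum_i (1+\hat\lambda g_i)^{-2} / \sum_i g_i^2(1+\hat\lambda g_i)^{-2}$ follow by implicit differentiation; the outer step is written as a Newton step that minimizes $W(\beta) + n\,p_\delta(|\beta|)$ under the local quadratic approximation; the inner solver is bisection rather than the algorithm of Section 4.1; and all names are hypothetical.

```python
def scad_derivative(theta, delta, a=3.7):
    """First derivative of the SCAD penalty, equation (2.2), for theta >= 0."""
    if theta <= delta:
        return delta
    return max(a * delta - theta, 0.0) / (a - 1.0)


def el_lambda(g, iters=200):
    """Scalar Lagrange multiplier by bisection (stand-in for Section 4.1)."""
    if not (min(g) < 0.0 < max(g)):
        raise ValueError("0 is outside the convex hull of the g_i")
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pel_mean(y, delta, a=3.7, eps=1e-8, zero_tol=1e-6, max_iter=500):
    """Penalized empirical likelihood estimate of a scalar mean with SCAD penalty."""
    n = len(y)
    beta = sum(y) / n                       # unpenalized EL estimate as starting value
    for _ in range(max_iter):
        if abs(beta) < zero_tol:
            return 0.0                      # coefficient shrunk to zero stays at zero
        g = [yi - beta for yi in y]
        lam = el_lambda(g)                  # inner step: lambda-hat(beta)
        s0 = sum(1.0 / (1.0 + lam * gi) ** 2 for gi in g)
        s2 = sum(gi * gi / (1.0 + lam * gi) ** 2 for gi in g)
        w_b = -n * lam                      # W_beta for g_i = y_i - beta
        w_bb = n * s0 / s2                  # W_beta_beta, positive here
        sigma = scad_derivative(abs(beta), delta, a) / abs(beta)
        new_beta = beta - (w_b + n * sigma * beta) / (w_bb + n * sigma)
        if abs(new_beta - beta) < eps:
            return new_beta
        beta = new_beta
    return beta
```

In this toy setting a sample mean well above $a\delta$ is returned unchanged, while a sample mean that is small relative to $\delta$ is shrunk exactly to zero.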
4.3 Selection of Thresholding Parameters
The SCAD penalty function involves two unknown parameters, $\delta$ and $a$. In practice, we could search for the best pair $(\delta, a)$ over a two-dimensional grid using cross-validation (CV; Stone, 1974) or generalized cross-validation (GCV; Craven and Wahba, 1979). However, this is computationally expensive. From the Bayesian point of view, Fan and Li (2001) suggested using $a = 3.7$, and this value will be used throughout our simulation studies. Let the empirical likelihood ratio function evaluated at $\hat\beta$ and $\hat\lambda(\hat\beta)$ be
$$W(\hat\beta) = \sum_{i=1}^{n}\log\{1 + \hat\lambda(\hat\beta)^T g_i(\hat\beta)\}.$$
Then, we define the GCV criterion to be
$$\mathrm{GCV}(\delta) = \frac{W(\hat\beta)}{n[1 - e(\delta)/n]^2},$$
where $e(\delta)$ is the effective number of regression coefficients given by
$$e(\delta) = \mathrm{tr}\left\{[W_{\beta\beta}(\hat\beta) + n\Sigma_\delta(\hat\beta)]^{-1}\,W_{\beta\beta}(\hat\beta)\right\}, \qquad (4.5)$$
where $W_{\beta\beta}(\hat\beta)$ is the second derivative of the profile empirical likelihood function with respect to $\beta$ (see (4.4)) evaluated at $\hat\beta$, and tr denotes the trace of a matrix. We choose the tuning parameter $\delta$ to minimize $\mathrm{GCV}(\delta)$.
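The selection of $\delta$ by GCV amounts to a one-dimensional grid search. The sketch below is a hedged illustration: it assumes that the numerator of the criterion is $W(\hat\beta)$ and that the effective number of parameters takes the Fan and Li (2001) form $\mathrm{tr}\{[W_{\beta\beta} + n\Sigma_\delta]^{-1}W_{\beta\beta}\}$, shown here only in its scalar version; `fit` is a hypothetical user-supplied function that refits the model for a given $\delta$.

```python
def effective_parameters(w_bb, n, sigma):
    """Scalar version of e(delta) = tr{[W_bb + n * Sigma]^{-1} W_bb}."""
    return w_bb / (w_bb + n * sigma)


def gcv(w_hat, e, n):
    """GCV(delta) = W(beta_hat) / (n * [1 - e(delta) / n]^2)."""
    return w_hat / (n * (1.0 - e / n) ** 2)


def select_delta(candidates, fit):
    """Return the candidate delta with the smallest GCV.

    fit(delta) is assumed to return the tuple (w_hat, e, n) for that delta.
    """
    return min(candidates, key=lambda d: gcv(*fit(d)))
```

A typical call would pass a short grid such as `[0.1, 0.2, 0.3]` as `candidates`, keeping $a = 3.7$ fixed inside `fit`.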
4.4 Standard Error Formula
The standard errors for the estimated regression parameters can be estimated directly because we are estimating the parameters and selecting the variables at the same time. Following the conventional technique in the likelihood setting, the corresponding sandwich formula can be used as an estimator for the covariance matrix of the estimates $\hat\beta$:
Chapter 5
Simulation Studies
We conducted a performance analysis based on a series of Monte Carlo simulations in linear regression, Poisson regression, and logistic regression and also applied our method to a real-data example. In the simulation studies we compare our method with the penalized-likelihood SCAD method. Our performance measures for these comparisons are the median of the relative model error (MRME), the average number of coefficients estimated as zero whose true value is zero, and the average number of coefficients estimated as zero whose true value is nonzero. We also compare the estimated values of the nonzero coefficients and the corresponding standard errors.
Median Relative Model Error (MRME)
Following Tibshirani (1996), we compare the median of the relative model error (Fan and Li, 2001) rather than the mean relative model error because of the instability of the best-subset variable selection. The model error for the linear model is defined by
$$\mathrm{ME}(\hat\beta) = (\hat\beta - \beta)^T E(X^T X)(\hat\beta - \beta).$$
The error for the selected model is compared to the error of the full model. For each variable selection method, we computed the median of the relative model error, and this is reported in the simulation studies.
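The model error and its median ratio are straightforward to compute once the fitted coefficients from each replication are available. The following sketch makes two assumptions that are not spelled out above: $E(X^T X)$ is taken per observation, so that for mean-zero covariates it equals their covariance matrix, and that matrix has the $\rho^{|i-j|}$ form used in the linear regression simulations; the function names are hypothetical.

```python
from statistics import median


def ar1_covariance(p, rho=0.5):
    """Covariance matrix with entries rho^|i-j| (unit variances)."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def model_error(beta_hat, beta, cov):
    """ME = (beta_hat - beta)^T cov (beta_hat - beta)."""
    d = [bh - b for bh, b in zip(beta_hat, beta)]
    p = len(d)
    return sum(d[i] * cov[i][j] * d[j] for i in range(p) for j in range(p))


def mrme(selected_fits, full_fits, beta, cov):
    """Median over replications of ME(selected model) / ME(full model)."""
    ratios = [
        model_error(sel, beta, cov) / model_error(full, beta, cov)
        for sel, full in zip(selected_fits, full_fits)
    ]
    return median(ratios)
```

Values of `mrme` below 1 indicate that the selected model typically has a smaller model error than the full model.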
5.1 Linear Regression Model
We consider the linear regression model
$$y_i = x_i\beta + \sigma\varepsilon_i \qquad (5.1)$$
with $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is a vector of covariates and $p = 8$. The components of $x$ and $\varepsilon$ are standard normal, the correlation between $x_i$ and $x_j$ is $0.5^{|i-j|}$, and $\sigma = 1$. The least-squares estimate of $\beta$ is given by
$$\hat\beta = \left[\sum_{i=1}^{n} x_i^T x_i\right]^{-1}\sum_{i=1}^{n} x_i^T y_i = (X^T X)^{-1} X^T y.$$
The estimating equation for $\beta$ is given by
$$g(\beta) = \sum_{i=1}^{n} x_i^T[y_i - x_i\beta] = 0$$
and the first derivative of the estimating equation $g(\beta)$ with respect to $\beta$ is
$$g'(\beta) = -\sum_{i=1}^{n} x_i^T x_i. \qquad (5.3)$$
components of $X$ and $\varepsilon$ being standard normal. This is the model used by Tibshirani (1996). Our penalized-empirical-likelihood SCAD (PELSCAD) is compared only with SCAD since Fan and Li (2001) reported that SCAD performs better than LASSO and other information-theoretic approaches. Following Tibshirani (1996) and Fan and Li (2001), the performance of these methods was assessed based on MRME and the number of zero coefficients. We also repeated the entire study with sample size
in Table 5.1. It also reports the average number of zero and nonzero coefficients. The column labeled "Correct" gives the average number of coefficients estimated as zero whose true value is zero, and the column labeled "Incorrect" gives the average number of coefficients estimated as zero whose true value is nonzero. The estimated values of the nonzero coefficients and the corresponding standard errors are reported in Table 5.2. From Table 5.1 we see that for $n = 60$ the MRME of SCAD is slightly smaller than that of PELSCAD, and for both methods the average number of zero coefficients is close to the target of five. When the sample size increases to 100, the MRME of PELSCAD is lower than that of SCAD. The average number of zero coefficients is again close to five. This indicates that both methods perform well when a parametric model is available.
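The simulation design of model (5.1) can be reproduced along the following lines. This is a sketch under stated assumptions rather than the code used for the reported results: covariates with correlation $0.5^{|i-j|}$ are generated through a first-order autoregressive recursion, the seed and function names are hypothetical, and `estimating_functions` returns the per-observation terms $x_i^T(y_i - x_i\beta)$ whose sum is $g(\beta)$.

```python
import random

BETA = [3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]


def simulate_linear(n, beta, sigma=1.0, rho=0.5, seed=0):
    """Draw (X, y) from y_i = x_i beta + sigma * eps_i with corr(x_ij, x_ik) = rho^|j-k|."""
    rng = random.Random(seed)
    p = len(beta)
    scale = (1.0 - rho ** 2) ** 0.5       # keeps every covariate at unit variance
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0)]
        for _j in range(1, p):
            x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
        X.append(x)
        mean = sum(xj * bj for xj, bj in zip(x, beta))
        y.append(mean + sigma * rng.gauss(0.0, 1.0))
    return X, y


def estimating_functions(X, y, beta):
    """Per-observation estimating functions g_i(beta) = x_i^T (y_i - x_i beta)."""
    out = []
    for x, yi in zip(X, y):
        resid = yi - sum(xj * bj for xj, bj in zip(x, beta))
        out.append([xj * resid for xj in x])
    return out
```

The rows returned by `estimating_functions` are the $g_i(\beta)$ vectors that the empirical likelihood computations of Chapter 4 take as input.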
                          Avg. no. of zero coefficients
Method       MRME (%)     Correct     Incorrect
n = 60, σ = 1
SCAD         35.57        4.61        0.0
PELSCAD      36.52        4.61        0.0
n = 100, σ = 1
SCAD         41.50        4.85        0.0
PELSCAD      34.55        4.95        0.0

Table 5.1: Simulation results for linear regression model
Method       β1               β2               β5
n = 60, σ = 1
SCAD         3.015 (0.167)    1.474 (0.195)    2.003 (0.136)
PELSCAD      3.002 (0.163)    1.496 (0.170)    1.999 (0.141)
n = 100, σ = 1
SCAD         3.027 (0.139)    1.442 (0.185)    2.003 (0.104)
PELSCAD      2.999 (0.120)    1.499 (0.124)    1.999 (0.104)