An Optimization Approach for the Parameter Estimation of the Nonlinear Mixed Effects Models

(Under the direction of John F. Monahan.)

Nonlinear mixed-effects models (NLMM) have received a great deal of attention in the statistical literature in recent years because of the flexibility they offer in handling the unbalanced repeated-measurements data that arise in different areas of investigation, such as pharmacokinetics. We concentrate here on maximum likelihood estimation for the parameters in nonlinear mixed-effects models. A rather complex numerical issue for maximum likelihood estimation in nonlinear mixed-effects models is the evaluation of a multiple integral that, in most cases, does not have a closed-form expression. We restrict our attention in this article to numerical methods that are based on approximations to the likelihood. Several numerical approximations for the likelihood have been proposed, such as first-order linearization (FOL), Laplace approximation, Importance Sampling, and Gaussian Quadrature (GQ). In addition, for a general optimization problem, iterative methods are usually required to update the parameter estimates. A large number of parameter updating methods have been developed, such as Newton-Raphson, Steepest Descent, Stochastic optimization, etc. Many current optimization algorithms implement a Newton iterative method to update the parameter estimates in NLMM. The objective of this thesis is to propose an optimization approach for the parameter estimation in nonlinear [...] parameter estimates in NLMM. In Chapter 1, we describe the model and introduce several likelihood approximations and parameter updating procedures for these models. The proposed optimization approach is illustrated in Chapter 2. In order to compare this approach to the other optimization methods, simulations are performed and conclusions are drawn based on simulation results in Chapter 3. Some future [...]

by

Jing Wang

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Statistics

in the

GRADUATE SCHOOL

at

NC STATE UNIVERSITY

2004

Professor John F. Monahan

Chair of Advisory Committee

Professor David A. Dickey Professor Marie Davidian


To my parents


Biography

Jing Wang was born in Tonghua, China, to parents Yurong Ma and Weijia Wang on October 22, 1971. She earned her Bachelor of Science degree in Mathematics in May of 1993 and a Master of Science degree in Probability and Statistics in May of 1996, both at the Northeast Normal University in China. She began her Ph.D. studies at North Carolina State University in August 2000, and her thesis defense was July 21, 2004. After graduating, she will begin a career at Louisiana State University as an assistant [...]

Acknowledgements

Thanks to all of the following people. Without you this would not have been possible.

To my advisor, Dr. John Monahan, for your instruction and patience.

To my committee members, Professors Marie Davidian, Bibhuti Bhattacharyya, and David Dickey, for actually reading all of this.

To the friends I have made at NCSU, Luna, Weiwei Wu, and Jimmy Doi.

Special thanks to my parents and the rest of my family for encouraging me [...]

Contents

List of Figures

List of Tables

1 Introduction

1.1 Approximation methods

1.1.1 Comparing approximation methods

1.2 A numerical approach for the parameter estimation of NLMM

2 Stochastic Approximation

2.1 RMSA

2.2 KWSA

2.3 Generalized RMSA

2.4 Simultaneous Perturbation SA (SPSA)

2.5 The application of IS and SPSA in the MLE of a NLMM

3 [...]

3.1 Example

3.2 Designs and Simulations

3.3 Some Implementation Issues

3.3.1 Choosing starting values by ad hoc method

3.3.2 Improving ad hoc estimates by LSI method

3.3.3 Scaling

3.3.4 Algorithm Factor Issues

3.3.5 A numerical study on algorithm factors

3.3.6 Algorithm Form

3.3.7 Results on algorithm factors

3.3.8 Optimization method proc nlmixed

3.4 Compare SPSAIS to proc nlmixed

3.5 A large sample size problem

3.5.1 Conclusions

3.5.2 Two additional types of relative error

3.5.3 Algorithm Variance

4 Future work

4.1 More applications

4.2 A quadratic optimization

Bibliography

A ANOVA Code and Results

A.1 Algorithm factors

A.2 Tukey's multiple comparisons tests

A.3 Relative squares error

A.4 Maximum relative squares error

B SPSAIS

List of Figures

2.1 Describe the relationship between the sign and the slope in RMSA

3.1 The trunk circumference of the orange tree over 1582 days

3.2 Four profiles of the simulated data

3.3 The plots for 16 parameter settings

List of Tables

3.1 MLE of the parameters $\beta_1$, $\beta_2$, $\beta_3$, $\sigma^2$, and $\sigma_b^2$, by nlme

3.2 Design for the Parameter Settings

3.3 ad hoc starting estimates in the first parameter setting

3.4 LSI starting estimates in the first parameter setting

3.5 An overall design on parameter setting and algorithm factors

3.6 The treatment means of the resultant $R_{AE}$s from IS, AS, and SS

3.7 The number of function calls requested by nlmix(IS)(i)

3.8 Treatment means of the $R_{AE}$s resulting from methods

3.9 Tukey's multiple comparisons tests for methods

3.10 ad hoc starting estimates for the 1st parameter setting

3.11 LSI starting estimates for the 1st parameter setting

3.12 The number of function calls requested by nlmix(IS)(i)

3.13 Treatment means of the resultant $R_{AE}$s from IS, AS, and SS

3.14 Treatment means of the $R_{AE}$s resulting from methods

3.16 Treatment means of the resultant $R_{AE}$s from SS levels

3.17 Treatment means of the $R_{SE}$s from methods in smaller sample case

3.18 Treatment means of the $R_{MSE}$s from methods in smaller sample case

3.19 Treatment means of the $R_{SE}$s from methods in larger sample case

Chapter 1

Introduction

Nonlinear mixed-effects models provide a powerful and useful tool for analyzing repeated-measures data that arise in different areas of investigation, such as economics and pharmacokinetics. Repeated-measures data are generated by observing a number of subjects repeatedly under varying conditions. Nonlinear mixed-effects models are mixed-effects models in which the intrasubject model relating the response variable to time is nonlinear in the parameters. We consider here the model proposed by Lindstrom and Bates (1990). This model can be viewed as a hierarchical model that in some ways generalizes both the linear mixed effects model of Laird and Ware (1982) and the usual nonlinear model for independent data (Bates and Watts, 1988). In the first stage the jth observation on the ith subject is modeled as

$$y_{ij} = f(\phi_i, x_{ij}) + \epsilon_{ij}, \qquad i = 1, \ldots, N; \quad j = 1, \ldots, N_i,$$

where $f$ is a nonlinear function of a subject-specific parameter vector $\phi_i$ and the predictor or covariate vector $x_{ij}$, $\epsilon_{ij}$ is a normally distributed noise term, $N$ is the total number of subjects, and $N_i$ is the number of observations on the ith subject. In the second stage, the subject-specific parameter vector is modeled as

$$\phi_i = \phi_i(\beta, b_i) = A_i \beta + B_i b_i, \qquad b_i \sim N(0, D),$$

where $\beta$ is a $p$-dimensional vector of fixed population parameters, $b_i$ is an $m_0$-dimensional random effects vector associated with the ith subject (not varying with $j$), the matrices $A_i$ and $B_i$ are design matrices of size $r \times p$ and $r \times m_0$ for the fixed and random effects, respectively, and $D$ is a covariance matrix. This is a general form of the nonlinear mixed-effects model. We often write the model for the ith individual's entire response vector $y_i$. This is accomplished by letting

$$y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iN_i} \end{pmatrix}, \qquad \epsilon_i = \begin{pmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots \\ \epsilon_{iN_i} \end{pmatrix}, \qquad \text{and} \qquad f(\phi_i) = \begin{pmatrix} f(\phi_i, x_{i1}) \\ f(\phi_i, x_{i2}) \\ \vdots \\ f(\phi_i, x_{iN_i}) \end{pmatrix}.$$

Then

$$y_i = f(\phi_i) + \epsilon_i \tag{1.1}$$

where $\epsilon_i \sim N(0, \Sigma_i)$ is composed of random error variables and $\Sigma_i$ is a positive [...]

We can write the $N$ individual models as one by letting

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad f(\phi) = \begin{pmatrix} f(\phi_1) \\ f(\phi_2) \\ \vdots \\ f(\phi_N) \end{pmatrix},$$

$\tilde{D} = \mathrm{diag}(D, D, \ldots, D)$, and $\Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2, \ldots, \Sigma_N)$. Now, the overall model can be written as

$$y \mid (x, b) \sim N(f(\phi), \Sigma), \qquad \phi = A\beta + Bb,$$

$$b \sim N(0, \tilde{D}),$$

$$A = (A_1, A_2, \ldots, A_N)^T, \qquad B = \mathrm{diag}(B_1, B_2, \ldots, B_N), \qquad b = (b_1, b_2, \ldots, b_N)^T.$$

To have a better understanding of the nonlinear mixed-effects model, we consider a simple example. This example, given by Hartford and Davidian (2000), concerns a study of the pharmacokinetics of a drug given as an intravenous bolus. The one-compartment model is given in the following form

$$y_{ij} = f(x_{ij}, \phi_i) + e_{ij} = \frac{100}{V_i} \exp\left(-\frac{Cl_i\, x_{ij}}{V_i}\right) + e_{ij},$$

$$(e_i \mid \phi_i) \sim N(0, \sigma^2 I_{N_i}),$$

$$Cl_i = \exp\left(\beta_1 + \beta_2 \frac{a_i}{100} + b_{1i}\right),$$

$$V_i = \exp(\beta_3 + b_{2i}),$$

$$b_i = (b_{1i}, b_{2i})^T$$

[...] may provide a suitable characterization for within-subject plasma concentration at the time $x_{ij}$ for subject $i$, where $Cl_i$ and $V_i$ represent subject-specific clearance and volume of distribution, respectively, $a_i$ is a covariate for each individual, and the covariance matrix is

$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}.$$

The fixed parameter vector is $\beta = (\beta_1, \beta_2, \beta_3)^T$ and the random effect vector is $b_i = (b_{1i}, b_{2i})^T$. Thus, the subject-specific parameter vector is modeled as

$$\phi_i = \begin{pmatrix} \log Cl_i \\ \log V_i \end{pmatrix} = \begin{pmatrix} 1 & a_i/100 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b_{1i} \\ b_{2i} \end{pmatrix}.$$

The matrices

$$A_i = \begin{pmatrix} 1 & a_i/100 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{and} \qquad B_i = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

may be thought of as the two "design" matrices corresponding to $\phi_i$.
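As a rough illustration of the two-stage structure, the following Python sketch simulates one subject from the one-compartment example. The parameter values, the diagonal form assumed for $D$, and the function name `simulate_subject` are assumptions made only for this illustration; they are not taken from Hartford and Davidian (2000).

```python
import math
import random


def simulate_subject(beta, d_diag, sigma, a_i, times, rng, dose=100.0):
    """Simulate one subject's concentrations from the one-compartment model.

    beta   : (beta1, beta2, beta3), fixed effects on the log scale
    d_diag : (D11, D22), variances of b_1i and b_2i (D assumed diagonal here)
    sigma  : standard deviation of the within-subject error e_ij
    a_i    : subject-level covariate
    times  : sampling times x_ij
    """
    # Second stage: subject-specific log-clearance and log-volume.
    b1 = rng.gauss(0.0, math.sqrt(d_diag[0]))
    b2 = rng.gauss(0.0, math.sqrt(d_diag[1]))
    clearance = math.exp(beta[0] + beta[1] * a_i / 100.0 + b1)
    volume = math.exp(beta[2] + b2)
    # First stage: mean concentration plus normal measurement error.
    return [
        dose / volume * math.exp(-clearance * t / volume) + rng.gauss(0.0, sigma)
        for t in times
    ]


rng = random.Random(1)
# Hypothetical parameter values, chosen only to make the example run.
y = simulate_subject((0.5, 0.2, 1.0), (0.04, 0.04), 0.1, a_i=70.0,
                     times=[0.5, 1.0, 2.0, 4.0], rng=rng)
print(y)
```

Setting both variances in `d_diag` and `sigma` to zero reduces the output to the deterministic curve $100\,V^{-1}\exp(-Cl\,x/V)$, which is a convenient check on the mean function.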

Combining the unknown fixed parameters into one vector and letting $\theta = (\beta, [\mathrm{vech}(D)]^T, [\mathrm{vech}(\Sigma)]^T)$, the maximum likelihood estimate for $\theta$ can be found by maximizing the likelihood

$$L(y \mid \theta) = \prod_{i=1}^N L_i(y_i \mid \theta) = \prod_{i=1}^N \int p(y_i, b_i \mid x_i, \theta)\,db_i = \prod_{i=1}^N \int p(y_i \mid b_i, x_i, \theta)\,\pi(b_i)\,db_i \tag{1.2}$$

where $L_i(y_i \mid \theta)$ is the likelihood for the ith individual, $p(y_i, b_i \mid x_i, \theta)$ is the joint density of $Y_i \mid x_i$ and $b_i$, $p(y_i \mid b_i, x_i, \theta)$ is the conditional density of $Y_i \mid x_i$ given $b_i$, and $\pi(b_i)$ is the density function of the random effect $b_i$.

1.1 Approximation methods

Because the expectation function of $(y_i, b_i \mid x_i, \theta)$, that is, $f$ in (1.1), is nonlinear in $b_i$, there is no closed-form expression for the likelihood function $L$ in (1.2). This nonlinearity makes the calculation of the maximum likelihood estimates of the parameters very difficult. Several approximations have been proposed for estimating the likelihood. We describe briefly here four different approximations for the likelihood that are in common use for the nonlinear mixed-effects model (1.1): First-Order Linearization (FOL) of the expectation function $f$ around the expected value of the random effects (Sheiner and Beal, 1980; Vonesh and Carter, 1992), Laplace approximation (Pinheiro and Bates, 1995), Importance Sampling (Pinheiro and Bates, 1995), and [...] between these approximations on their computational and statistical properties.

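First Order Linearization (FOL)

According to (1.2), the integral that we want to evaluate for the marginal distribution of $y_i$ in model (1.1) for the ith subject is

$$L_i(y_i \mid \theta) = \int (2\pi)^{-(N_i + m_0)/2}\, |D|^{-1/2}\, |\Sigma_i|^{-1/2} \exp[-q(\beta, D, \Sigma_i, y_i, b_i)/2]\,db_i \tag{1.3}$$

where

$$q(\beta, D, \Sigma_i, y_i, b_i) = (y_i - f(\phi_i(\beta, b_i)))^T \Sigma_i^{-1} (y_i - f(\phi_i(\beta, b_i))) + b_i^T D^{-1} b_i. \tag{1.4}$$

Sheiner and Beal (1980) approximate the integral in (1.3) by approximating the integrand with a Taylor series expansion for $f(\phi_i)$ about $b_i = 0$ before integration:

$$y_i = f(\phi_i(\beta, b_i)) + \Sigma_i^{1/2} e_i \approx f(\phi_i(\beta, 0)) + \left[\partial f(\phi_i(\beta, b_i))/\partial b_i \big|_{b_i = 0}\right] b_i + \Sigma_i^{1/2} e_i$$

where $e_i \sim N(0, I_{N_i})$. Sheiner and Beal proceed by constructing the likelihood from this approximate model, where $b_i$ and $\epsilon_i$ are normally distributed, as

$$L(y \mid \theta) \approx \prod_{i=1}^N \int (2\pi)^{-N_i/2} |\Sigma_i|^{-1/2} \exp\left[-\tfrac{1}{2} \{y_i - f(\phi_i(\beta, 0)) - Z_i(\beta, 0) b_i\}^T \Sigma_i^{-1} \{y_i - f(\phi_i(\beta, 0)) - Z_i(\beta, 0) b_i\}\right] (2\pi)^{-m_0/2} |D|^{-1/2} \exp\left[-\tfrac{1}{2} b_i^T D^{-1} b_i\right] db_i$$

which has the following closed-form expression after completing the square

$$L(y \mid \theta) \approx \prod_{i=1}^N (2\pi)^{-N_i/2} |V_i|^{-1/2} \exp\left[-\tfrac{1}{2} \{y_i - f(\phi_i(\beta, 0))\}^T V_i^{-1} \{y_i - f(\phi_i(\beta, 0))\}\right]$$

where $Z_i = \partial f(\phi_i(\beta, b_i))/\partial b_i \big|_{b_i = 0}$ and $V_i = V_i(\beta, 0, \mathrm{vech}(D)) = \Sigma_i + Z_i D Z_i^T$. For simplicity, we denote $f(\phi_i(\beta, b_i))$ by $f_i(\beta, b_i)$ in the following discussion.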
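The closed form above says that, under the linearization, $y_i$ is treated as normal with mean $f_i(\beta, 0)$ and covariance $\Sigma_i + Z_i D Z_i^T$. The following Python sketch evaluates that log-density for one subject under two simplifying assumptions that are not part of the text: a scalar random effect with variance `d`, and $\Sigma_i = \sigma^2 I$. The function name and arguments are hypothetical.

```python
import math


def fol_loglik_subject(y, f0, z, sigma2, d):
    """First-order (FOL) log-likelihood for one subject.

    Assumes a scalar random effect b_i ~ N(0, d) and Sigma_i = sigma2 * I.
    f0 : model function evaluated at b_i = 0, one value per observation
    z  : derivative of the model function with respect to b_i at b_i = 0
    Under the linearization, y_i ~ N(f0, sigma2 * I + d * z z^T).
    """
    n = len(y)
    resid = [yi - fi for yi, fi in zip(y, f0)]
    rr = sum(r * r for r in resid)
    zz = sum(zi * zi for zi in z)
    zr = sum(zi * r for zi, r in zip(z, resid))
    # Matrix determinant lemma for |sigma2 * I + d * z z^T|.
    log_det = n * math.log(sigma2) + math.log1p(d * zz / sigma2)
    # Sherman-Morrison formula for resid^T V^{-1} resid.
    quad = rr / sigma2 - d * zr * zr / (sigma2 * (sigma2 + d * zz))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# With one observation the covariance is simply sigma2 + d * z^2.
print(fol_loglik_subject([1.0], [0.5], [2.0], sigma2=0.3, d=0.4))
```

The rank-one formulas avoid forming or inverting $V_i$ explicitly; with a vector-valued random effect one would instead factor $V_i$ directly.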

Laplace

Lindstrom and Bates (1990) attempt to improve on the first-order population-averaged approximation by expanding $q$ around $\hat{b}_i$, the non-zero estimate of $b_i$ corresponding to the best linear unbiased predictor in the linear case. This approximation, along with a Gaussian posterior approximation used by Laird and Louis (1982) and Stiratelli, Laird and Ware (1984), forms the motivation for the Lindstrom and Bates algorithm. Alternatively, the Lindstrom and Bates algorithm can be derived by Laplace's approximation,

$$L(y \mid \theta) = \prod_{i=1}^N L_i(y_i \mid \theta) = \prod_{i=1}^N \int \exp\{N_i l_i(b_i)\}\,db_i \approx \prod_{i=1}^N (2\pi/N_i)^{m_0/2}\, |r_i(\hat{b}_i)|^{-1/2} \exp\{N_i l_i(\hat{b}_i)\} \tag{1.5}$$

where $l_i(b_i) = \frac{1}{N_i} \ln[p(y_i \mid b_i)\,\pi(b_i)]$, $\hat{b}_i$ is the estimate of $b_i$ obtained by maximizing $l_i(b_i)$, and $r_i(\hat{b}_i) = -\partial^2 l_i(b_i)/\partial b_i \partial b_i^T \big|_{b_i = \hat{b}_i}$.

The above approximation in (1.5) is called the "corrected" Laplace's (laplace) approximation by Hartford (2001) because the exact second derivative of the integrand for the ith subject is taken with respect to the random effect $b_i$. Another version of Laplace's approximation to the likelihood for the ith subject, which Hartford calls [...] Bates (1995) by considering a second order Taylor expansion of $q$ in (1.4) around $\hat{b}_i$:

$$q(\beta, D, \Sigma_i, y_i, b_i) \approx q(\beta, D, \Sigma_i, y_i, \hat{b}_i) + \frac{1}{2} [b_i - \hat{b}_i]^T \left[\frac{\partial^2 q(\beta, D, \Sigma_i, y_i, b_i)}{\partial b_i \partial b_i^T}\right]_{b_i = \hat{b}_i} [b_i - \hat{b}_i]$$

where the linear term vanishes because $\partial q(\beta, D, \Sigma_i, y_i, b_i)/\partial b_i \big|_{b_i = \hat{b}_i} = 0$. For simplicity, denote $q(\beta, D, \Sigma_i, y_i, b_i)$ by $q_i(\beta, b_i)$.

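Consider an approximation to $\partial^2 q_i(\beta, b_i)/\partial b_i \partial b_i^T$ at $b_i = \hat{b}_i$:

$$\frac{\partial^2 q_i(\beta, b_i)}{\partial b_i \partial b_i^T}\bigg|_{b_i = \hat{b}_i} = -\frac{\partial^2 f_i(\beta, b_i)}{\partial b_i \partial b_i^T}\bigg|_{b_i = \hat{b}_i} \Sigma_i^{-1} [y_i - f_i(\beta, \hat{b}_i)] + \left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \Sigma_i^{-1} \frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\bigg|_{b_i = \hat{b}_i} + D^{-1}$$

$$\approx \left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \Sigma_i^{-1} \frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\bigg|_{b_i = \hat{b}_i} + D^{-1} = Q_i.$$

The last expression is obtained by ignoring the term

$$\frac{\partial^2 f_i(\beta, b_i)}{\partial b_i \partial b_i^T}\bigg|_{b_i = \hat{b}_i} \Sigma_i^{-1} [y_i - f_i(\beta, \hat{b}_i)],$$

since at $\hat{b}_i$ its contribution is usually negligible compared to that of

$$\left[\frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\right]^T \Sigma_i^{-1} \frac{\partial f_i(\beta, b_i)}{\partial b_i^T}\bigg|_{b_i = \hat{b}_i}.$$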

Hartford (2000) calls this version "ulaplace" because the integrand is approximated by a second order Taylor expansion instead of the exact second derivative on $b_i$ for the ith subject. The ulaplace approximation has the advantage of requiring only the first-order partial derivatives of the model function with respect to the random effects, which are usually available from the estimation of $\hat{b}_i$ [...] appropriate when curvature is small. Therefore, according to (1.4),

$$q(\beta, D, \Sigma_i, y_i, b_i) \approx q(\beta, D, \Sigma_i, y_i, \hat{b}_i) + \frac{1}{2} [b_i - \hat{b}_i]^T Q_i [b_i - \hat{b}_i]$$

$$= (y_i - f_i(\beta, \hat{b}_i))^T \Sigma_i^{-1} (y_i - f_i(\beta, \hat{b}_i)) + \hat{b}_i^T D^{-1} \hat{b}_i + \frac{1}{2} [b_i - \hat{b}_i]^T Q_i [b_i - \hat{b}_i].$$

The resulting approximation for the likelihood using Laplace's approximation is

$$L(y \mid \theta) \approx \prod_{i=1}^N (2\pi)^{-N_i/2}\, |D|^{-1/2}\, |\Sigma_i|^{-1/2}\, |Q_i|^{-1/2} \exp\{-q(\beta, D, \Sigma_i, y_i, \hat{b}_i)/2\}.$$

Importance Sampling

Importance Sampling is a numerical integration technique that takes advantage of the fact that any integral can be thought of as an expectation. Consider the problem of estimating the multiple integral

$$I = \int v_1(x)\,dx, \qquad x \in R^n. \tag{1.6}$$

We suppose that $v_1 \in L_2(x)$. The basic idea of this technique consists of concentrating the distribution of the sample points in the parts of the region of $R^n$ that are of most "importance" instead of spreading them out evenly. We can represent the integral (1.6) by

$$I = \int \frac{v_1(x)}{v_X(x)}\, v_X(x)\,dx = E\left(\frac{v_1(X)}{v_X(X)}\right) \tag{1.7}$$

where the expectation $E(\cdot)$ is taken with respect to $v_X(x)$. Here $X$ is any random vector with p.d.f. $v_X(x)$, such that $v_X(x) > 0$ for each $x \in R^n$; $v_X(x)$ is called the Importance Sampling density. It is obvious from (1.7) that $u = v_1(X)/v_X(X)$ is an unbiased estimator of $I$, with the variance

$$\mathrm{var}(u) = \int \frac{v_1^2(x)}{v_X(x)}\,dx - I^2.$$

In order to estimate the integral (1.7) we take a sample $X_1, \ldots, X_n$ from the p.d.f. $v_X(x)$ and substitute its values in the sample-mean formula

$$\hat{I} = \frac{1}{n} \sum_{i=1}^n \frac{v_1(x_i)}{v_X(x_i)}.$$
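The sample-mean formula can be sketched in a few lines of Python. The integrand $\exp(-x^2/2)$, the normal sampling density with standard deviation 1.5, and all function names below are assumptions chosen for illustration; the exact value of the integral, $\sqrt{2\pi}$, is printed alongside for comparison.

```python
import math
import random


def importance_sampling(v1, draw, density, n, rng):
    """Estimate the integral of v1 by averaging v1(X) / v_X(X) over n draws."""
    total = 0.0
    for _ in range(n):
        x = draw(rng)
        total += v1(x) / density(x)
    return total / n


def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def integrand(x):
    # The integral of this function over the real line is sqrt(2 * pi).
    return math.exp(-0.5 * x * x)


rng = random.Random(0)
estimate = importance_sampling(
    integrand,
    draw=lambda r: r.gauss(0.0, 1.5),
    density=lambda x: normal_pdf(x, 1.5),
    n=20000,
    rng=rng,
)
print(estimate, math.sqrt(2.0 * math.pi))
```

If the sampling density is proportional to the integrand (here, a standard normal), every ratio $v_1(x_i)/v_X(x_i)$ equals the integral itself and the estimator has zero variance, which is the sense in which the ideal importance distribution matches the function being integrated.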

Importance Sampling provides a simple and efficient way of performing Monte Carlo integration. The critical step for the success of this method is the choice of an importance distribution from which the sample is drawn and importance weights are calculated. Ideally this distribution corresponds to the density that we are trying to integrate, but in practice one uses an easily sampled approximation. In the NLMM case, the likelihood in (1.2) is an integral over the random effects, say,

$$L(y \mid \theta) = \prod_{i=1}^N L_i(y_i \mid \theta) = \prod_{i=1}^N \int p(y_i \mid b_i, x_i, \theta)\,\pi(b_i)\,db_i.$$

In order to estimate the above integral, we would like to use Importance Sampling to find the distribution for the random effects $b_i$ for the ith subject. Since the random parameters $b_i$ are assumed to be normally distributed, if the maximum likelihood estimate of $b_i$, say $\hat{b}_i$, and the estimate of the Hessian matrix $H_i$ are computed from the integrand $p(y_i \mid b_i, \theta)\,\pi(b_i)$, then the Importance Sampling distribution can be viewed as $N(\hat{b}_i, H_i^{-1})$. In other words, the density of $N(\hat{b}_i, H_i^{-1})$ should have a shape similar to that of the integrand $p(y_i \mid b_i, \theta)\,\pi(b_i)$ [...] the function that we want to integrate is, up to a multiplicative constant, equal to $\exp(-q(\beta, D, \Sigma_i, y_i, b_i)/2)$. As shown in the "uncorrected" Laplace approximation, the Hessian matrix $H_i$ can be approximated by a second order Taylor expansion on $b_i$, say $Q_i$, and the integrand is, up to a multiplicative constant, approximately equal to a $N(\hat{b}_i, Q_i^{-1})$ density. This gives us a natural choice for the importance distribution. Letting $I(b_i)$ denote this Importance Sampling density and $N_{IS}$ denote the number of importance samples to be drawn, in practice one such sample can be generated by selecting a vector $z_k^*$ with distribution $N(0, I_{m_0})$ and calculating the sample of random effects as $b_{ik}^* = \hat{b}_i + Q_i^{-1/2} z_k^*$, $k = 1, \ldots, N_{IS}$. Then the likelihood of $y$ approximated by Importance Sampling is

$$L(y \mid \theta) = \prod_{i=1}^N L_i(y_i \mid \theta) = \prod_{i=1}^N \int p(y_i \mid b_i, \theta)\,\pi(b_i)\,db_i$$

$$= \prod_{i=1}^N \int \frac{1}{(2\pi)^{(N_i + m_0)/2} |D|^{1/2} |\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2} \{y_i - f(\beta, b_i)\}^T \Sigma_i^{-1} \{y_i - f(\beta, b_i)\} - \tfrac{1}{2} b_i^T D^{-1} b_i\right] db_i$$

$$= \prod_{i=1}^N \int \frac{1}{(2\pi)^{N_i/2} |D|^{1/2} |Q_i|^{1/2} |\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2} \{y_i - f(\beta, b_i)\}^T \Sigma_i^{-1} \{y_i - f(\beta, b_i)\} - \tfrac{1}{2} b_i^T D^{-1} b_i + \tfrac{1}{2} (b_i - \hat{b}_i)^T Q_i (b_i - \hat{b}_i)\right] I(b_i)\,db_i$$

$$\approx \prod_{i=1}^N \frac{1}{(2\pi)^{N_i/2} |D|^{1/2} |Q_i|^{1/2} |\Sigma_i|^{1/2}} \cdot \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \exp\left[-\tfrac{1}{2} \{y_i - f(\beta, b_{ik}^*)\}^T \Sigma_i^{-1} \{y_i - f(\beta, b_{ik}^*)\} - \tfrac{1}{2} b_{ik}^{*T} D^{-1} b_{ik}^* + \tfrac{1}{2} (b_{ik}^* - \hat{b}_i)^T Q_i (b_{ik}^* - \hat{b}_i)\right].$$

Similar to the Laplace approximation method, there are two versions of the Importance Sampling approximation (Hartford, 2000). The above method implements the idea of centering and scaling and is called the "uncorrected" Importance Sampling method (imp) because the Hessian matrix $H_i$ on $b_i$ is approximated by $Q_i$, a second-order Taylor expansion of the Hessian matrix $H_i$ on $b_i$. The corrected version (uimp) replaces $Q_i$ with the Hessian matrix $H_i$.
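The centering-and-scaling idea can be sketched for a single subject as follows. This is a minimal Python illustration under assumptions that are not part of the text: a scalar random effect with variance `d`, $\Sigma_i = \sigma^2 I$, and an importance density $N(\hat{b}_i, 1/q)$ whose centre `b_hat` and precision `q` are supplied by the caller. The function names are hypothetical.

```python
import math
import random


def is_likelihood_subject(y, mean_fn, beta, sigma2, d, b_hat, q, n_is, rng):
    """Importance Sampling estimate of one subject's marginal likelihood.

    Assumes a scalar random effect b_i ~ N(0, d) and Sigma_i = sigma2 * I.
    mean_fn(beta, b) returns the model means f_i(beta, b) as a list.
    b_hat and q are the centre and precision of the importance density
    N(b_hat, 1/q); they are taken as given (computed elsewhere).
    """
    n = len(y)
    log_const = -0.5 * (n * math.log(2.0 * math.pi * sigma2)
                        + math.log(2.0 * math.pi * d))
    total = 0.0
    for _ in range(n_is):
        z = rng.gauss(0.0, 1.0)
        b = b_hat + z / math.sqrt(q)  # b* = b_hat + Q^{-1/2} z*
        means = mean_fn(beta, b)
        ss = sum((yi - mi) ** 2 for yi, mi in zip(y, means))
        log_integrand = log_const - 0.5 * ss / sigma2 - 0.5 * b * b / d
        log_density = 0.5 * math.log(q / (2.0 * math.pi)) - 0.5 * z * z
        total += math.exp(log_integrand - log_density)
    return total / n_is


def random_intercept(beta, b):
    # Model that is linear in b: every observation has mean beta + b.
    return [beta + b] * 3


y = [1.2, 0.7, 1.9]
beta, sigma2, d = 1.0, 0.5, 0.8
q = len(y) / sigma2 + 1.0 / d                      # exact precision here
b_hat = sum(yi - beta for yi in y) / sigma2 / q    # exact conditional mode
approx = is_likelihood_subject(y, random_intercept, beta, sigma2, d,
                               b_hat, q, 50, random.Random(0))
print(approx)
```

In this linear example the integrand is exactly proportional to the importance density, so every importance weight is the same and the estimate reproduces the exact marginal likelihood for any number of samples. For a model that is nonlinear in $b_i$ the weights vary, and the accuracy depends on `n_is`.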

Gaussian Quadrature

Gaussian Quadrature is used to approximate integrals of functions with respect to a given kernel by a weighted average of the integrand evaluated at predetermined abscissas. The weights and abscissas used in Gaussian Quadrature rules for the most common kernels can be obtained from the tables of Abramowitz and Stegun (1964, Chapter 25) or by using an algorithm proposed by Golub (1973); a further description is given in Monahan (2001, Chapter 10) or Davis and Rabinowitz (1984). The idea is similar to Importance Sampling. In the univariate case, suppose $\int h(x)\,dx$ is to be numerically integrated, where $h(x)$ can be written as $h(x) = f(x)w(x)$ and $w(x)$ is the weight function. The integral can be approximated by

$$\int h(x)\,dx = \int f(x)\,w(x)\,dx \approx \sum_{i=1}^{N_{GQ}} \omega_i f(x_i)$$

where $\omega_i$ are weights, $x_i$ are abscissas, and $N_{GQ}$ is the number of the quadrature [...] in the nonlinear mixed-effects model we can transform the problem into successive applications of simple one-dimensional Gaussian Quadrature rules. Letting $z_j^*$, $\omega_j$, $j = 1, \ldots, N_{GQ}$, denote respectively the abscissas and weights for the (one-dimensional) Gaussian Quadrature rule with $N_{GQ}$ points based on the $N(0, 1)$ kernel, we have

$$L(y \mid \theta) = \prod_{i=1}^N \int (2\pi)^{-(N_i + m_0)/2}\, |D|^{-1/2}\, |\Sigma_i|^{-1/2} \exp\left\{-\tfrac{1}{2} (y_i - f_i(\beta, b_i))^T \Sigma_i^{-1} (y_i - f_i(\beta, b_i))\right\} \exp(-b_i^T D^{-1} b_i/2)\,db_i$$

$$= \prod_{i=1}^N \int (2\pi)^{-N_i/2}\, |\Sigma_i|^{-1/2} \exp\left\{-\tfrac{1}{2} (y_i - f_i(\beta, D^{1/2} z^*))^T \Sigma_i^{-1} (y_i - f_i(\beta, D^{1/2} z^*))\right\} (2\pi)^{-m_0/2} \exp(-z^{*T} z^*/2)\,dz^* \qquad \text{by scaling } b_i = D^{1/2} z^*$$

$$\approx \prod_{i=1}^N (2\pi)^{-N_i/2}\, |\Sigma_i|^{-1/2} \sum_{j_1 = 1}^{N_{GQ}} \cdots \sum_{j_{m_0} = 1}^{N_{GQ}} \prod_{k=1}^{m_0} \omega_{j_k} \exp\left[-\tfrac{1}{2} (y_i - f_i(\beta, D^{1/2} z^*_{j_1, \ldots, j_{m_0}}))^T \Sigma_i^{-1} (y_i - f_i(\beta, D^{1/2} z^*_{j_1, \ldots, j_{m_0}}))\right] \tag{1.9}$$

where $z^*_{j_1, \ldots, j_{m_0}} = (z^*_{j_1}, \ldots, z^*_{j_{m_0}})^T$.
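The product-rule construction in (1.9) can be sketched with a fixed three-point rule. The Python fragment below is an illustration only: it hard-codes the three-point Gauss-Hermite abscissas and weights for the $N(0, 1)$ kernel, with the weights normalized to sum to one (an assumption about the normalization, which the text does not spell out), and it approximates a generic expectation rather than the NLMM likelihood. The name `gq_expectation` is hypothetical.

```python
import itertools
import math

# Three-point Gauss-Hermite rule for the N(0, 1) kernel, with weights
# normalized to sum to one (exact for polynomials up to degree five).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gq_expectation(g, dim):
    """Approximate E[g(z)] for z ~ N(0, I_dim) with a product rule.

    Returns the approximation and the number of evaluations of g.
    """
    total = 0.0
    n_eval = 0
    for idx in itertools.product(range(len(NODES)), repeat=dim):
        z = [NODES[j] for j in idx]
        weight = 1.0
        for j in idx:
            weight *= WEIGHTS[j]   # product of one-dimensional weights
        total += weight * g(z)
        n_eval += 1
    return total, n_eval


# E[z_1^2 * z_2^4] = 1 * 3 = 3 for independent standard normals.
value, n_eval = gq_expectation(lambda z: z[0] ** 2 * z[1] ** 4, dim=2)
print(value, n_eval)
```

The evaluation count grows as the number of points raised to the dimension (9 evaluations for two dimensions, 27 for three with this rule). To apply the same grid to a random effect with covariance $D$, one would evaluate the integrand at $D^{1/2} z$ as in (1.9).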

The Gaussian Quadrature rule in this case can be viewed as a deterministic version of Monte Carlo integration in which random samples of $b_i$ are generated from the $N(0, D)$ distribution. The samples $z_j^*$ and the weights $\omega_j$ are fixed beforehand, whereas in Monte Carlo integration they vary randomly. Because Importance Sampling tends to be much more efficient than simple Monte Carlo integration, we also consider the equivalent of Importance Sampling in the Gaussian Quadrature context, which is denoted by Adaptive Gaussian Quadrature (Pinheiro and Bates 1995). In this approach the grid of abscissas in the $b_i$ scale is centered around the conditional modes $\hat{b}_i$ rather than [...] of $z^*$. The Adaptive Gaussian Quadrature approximation very closely resembles Importance Sampling. The basic difference is that the former uses fixed abscissas and weights, but the latter allows them to be determined by a pseudo-random mechanism. It is also interesting to note that Laplace is one-point Gaussian Quadrature because in this case $z_1^* = 0$ and $\omega_1 = 1$. The Adaptive Gaussian Quadrature also gives the exact likelihood when the model function is linear in $b$, but that is not true in general for the Gaussian Quadrature approximation (1.9).

The methods FOL, Laplace, Importance Sampling approximation, and Adaptive Gaussian Quadrature give exact results when the model function $f$ is linear in $b$.

1.1.1 Comparing approximation methods

Pinheiro and Bates (1995) construct a simulation experiment to compare the performance of methods including FOL, Laplace's approximation, Importance Sampling, Gaussian Quadrature, and the new version of Adaptive Gaussian Quadrature for estimating the parameters of nonlinear mixed effects models. They conclude that the FOL approximation gives accurate and reliable estimation results for the log-likelihood function in some nonlinear mixed-effects models such as the one proposed by Lindstrom and Bates (1990). The main advantage of this approximation is its computational efficiency. In other cases, however, they find the FOL method not to be accurate. They conclude that, because it is simpler computationally, it could be used for starting values [...]

Adaptive Gaussian Quadrature methods give the best mix of efficiency and accuracy. The drawback of the Laplace approximation is that it is just a one-point estimation. In other words, we cannot improve the accuracy with more numerical effort since only one abscissa is used for each individual. Quadrature rules rely on a deterministic approximation to an integral as a weighted sum of the integrand evaluated at a specially chosen set of values, or abscissas, where the weights are also specially chosen. The approximation thus requires the integrand to be evaluated at each abscissa, and then these values are weighted and summed. The accuracy of the rule for approximating the true value of the integral is predicated on the number of abscissas, $N_{GQ}$, which may be chosen by the user. The more abscissas we have, the better the approximation that will be achieved. For $m = 1$ dimensional integrals, it is not too computationally burdensome to carry out such numerical integration, as the abscissas need only be chosen in one dimension. The approximation often works well for $N_{GQ}$ as small as 5 or 10 (Pinheiro and Bates 1995, Hartford 2000). However, for $m > 1$, abscissas must be chosen in each dimension, and the integrand must be evaluated at each combination; for example, for $m = 3$ and $N_{GQ} = 10$, there are $10^3 = 1000$ function evaluations to perform. Thus, for larger $m$, the computational challenge increases greatly. Besides, in the context of maximizing the integrand $\prod_{i=1}^N \int p(y_i \mid b_i, \Sigma_i, \beta, D)\,\pi(b_i)\,db_i$, some sort of optimization scheme, such as a Newton-Raphson iterative approach, would have to be invoked to update the parameter estimates. This would of course require that [...]

turn would require $n$ $m$-dimensional integrals to be evaluated at the current iterate for the parameters. It should be clear that the computational burden could become overwhelming. If one reduced $N_{GQ}$ to address this burden, accuracy of the integration (and hence accuracy of the evaluated likelihood) would be compromised. This problem, wherein computational challenges become overwhelming in higher dimensions, is often called the curse of dimensionality. Moreover, the Gaussian Quadrature approximation will perform poorly if the random effects are not close to normal. The Importance Sampling approximation gives reliable estimation results, comparable to those of the Gaussian Quadrature and Laplace approximations. The main advantage of the Importance Sampling approximation is its versatility in handling distributions other than normal, for both the random effects and the subject-specific error term. For example, it would be rather straightforward to adapt the Importance Sampling integration to handle a multivariate t or logistic distribution for the random effects, but that would not be a trivial task for either the FOL, Laplace, or Gaussian Quadrature approximations. Also, by using the Importance Sampling approximation, the likelihood can be approximated to any specified level of accuracy by increasing the number of importance samples. In contrast to the Gaussian Quadrature method, in which the abscissas are determined by a quadrature rule, the methods based on Importance Sampling are random. This randomness leads to random input for an optimization algorithm, which results in different values of the likelihood when re-evaluated at the same [...]

one would expect the result for the approximation to the likelihood to be slightly different. This randomness of the Importance Sampling method causes some numerical difficulties for the optimization algorithm used to obtain the maximum likelihood estimates, because the stochastic variability associated with different importance samples overwhelms the numerical variability of the likelihood for small changes in the parameter values (used to calculate numerical derivatives). To avoid this problem, the same importance samples are used throughout the calculations (Pinheiro and Bates, 1995). In other words, only one importance sample is used throughout the entire search, and thus it requires a large number of abscissas, which turns out to be very computationally inefficient. Hence, the Importance Sampling approximation is considerably less computationally efficient than the FOL, Laplace, or Gaussian Quadrature approximation. Nonetheless, the Importance Sampling approximation is the most promising for high accuracy among all the numerical methods we discussed above. The reason is that it not only can handle distributions other than normal but also can improve the accuracy of the estimation by putting in more numerical effort, provided that it is affordable to the algorithm.

Optimization Algorithms

An iterative algorithm for updating the parameter estimates requires:

(1) starting values or estimates (see later discussion on how to get starting values and [...]

(2) a method for updating from iteration $t$ to iteration $t+1$,

(3) a rule for deciding when to stop.

Basically, most optimizations for finding an extreme point of the function $f(\theta)$, denoted by $\hat{\theta}$, are variations on Newton's method

$$\theta_{t+1} = \theta_t - a_t H^{-1}(\theta_t) \nabla f(\theta_t)$$

where $a_t$ is called the step size or gain in the tth iterative step, and $\nabla f$ and $H$ are the gradient and Hessian matrix of the objective function $f$, respectively. Specifically, note that:

if $H = I$ and $a_t = a$ is a constant, then we have the method of steepest descent,

if $a_t = 1$, then we have the "pure" Newton method,

in cases where the objective function is observed with added noise, stochastic approximation (SA) may be appropriate. A general SA algorithm has the form

$$\theta_{t+1} = \theta_t - a_t g(\theta_t) \tag{1.10}$$

where $g(\theta_t)$ is the gradient function observed with error.
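The relationship between the Newton update and steepest descent can be seen in a one-parameter sketch. The Python fragment below is an illustration under stated assumptions: a scalar parameter, a made-up objective $f(\theta) = \theta - \log\theta$ with minimum at $\theta = 1$, and a hypothetical helper `iterate`. It uses a fixed number of iterations in place of a stopping rule.

```python
def iterate(theta, grad, step, n_iter, hess=None):
    """Run theta <- theta - step * H^{-1} * grad(theta) for a scalar theta.

    With hess=None the Hessian is replaced by 1 (steepest descent);
    with step=1 and the true second derivative this is the pure Newton method.
    """
    for _ in range(n_iter):
        h = 1.0 if hess is None else hess(theta)
        theta = theta - step * grad(theta) / h
    return theta


# Illustrative objective f(theta) = theta - log(theta), minimized at theta = 1.
def grad(theta):
    return 1.0 - 1.0 / theta


def hess(theta):
    return 1.0 / theta ** 2


newton = iterate(0.5, grad, step=1.0, n_iter=10, hess=hess)
descent = iterate(0.5, grad, step=0.1, n_iter=500)
print(newton, descent)
```

Both runs approach the same minimizer; the Newton iteration needs far fewer steps but requires the second derivative at every iteration, which is the cost that gradient-only recursions such as (1.10) avoid.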

Since analytic methods are often too costly in human or computer effort, some gradient approximation schemes are usually used, such as the finite difference (FD). In the case of SA, Spall (1992) introduces the simultaneous perturbation (SP) for the gradient approximation. It turns out that SP is an efficient approximation because [...]

Finite Difference (FD)

In the case of SA, direct measurements or evaluations of the gradient $\nabla f(\theta_t)$ are usually not available if the objective function is very complicated. One major approximation to the gradient is the Finite Difference (FD). The FD formula for approximating the gradient of an objective $f$ for the mth parameter, say $\theta_m$, is

$$\nabla f_m(\theta_m) = \frac{f(\theta + c_m \delta_m) - f(\theta - c_m \delta_m)}{2 c_m}, \qquad m = 1, \ldots, p,$$

where $p$ is the dimension of the parameter vector $\theta$, or the number of the parameters to be estimated, $c_m$ is a small positive scalar, and $\delta_m$ is a vector with a one in the mth place and zeros elsewhere. Each component of $\theta$ is perturbed one at a time, and each component of the gradient estimate is formed by the ratio of two differences. This method is motivated directly from the definition of a gradient as a vector of $p$ partial derivatives, each constructed as the limit of the ratio of a change in the function value over a corresponding change in one component of the argument vector.
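A minimal Python sketch of the central finite-difference formula follows. The test objective, the common perturbation size `c` for all components, and the call-counting wrapper are assumptions made for illustration; the wrapper is there only to make the cost of $2p$ function evaluations visible.

```python
def fd_gradient(f, theta, c):
    """Central finite-difference gradient; costs 2 * len(theta) evaluations."""
    grad = []
    for m in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[m] += c    # theta + c * delta_m
        minus[m] -= c   # theta - c * delta_m
        grad.append((f(plus) - f(minus)) / (2.0 * c))
    return grad


class CountingObjective:
    """Illustrative objective that records how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return theta[0] ** 2 + 3.0 * theta[0] * theta[1]


objective = CountingObjective()
g = fd_gradient(objective, [1.0, 2.0], c=1e-4)
print(g, objective.calls)
```

For this quadratic objective the analytic gradient at $(1, 2)$ is $(8, 3)$, and the two-parameter problem uses four function evaluations.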

Simultaneous Perturbation Difference (SP)

Spall (1992) provides one form for the gradient approximation using simultaneous perturbation in the SA case. The simultaneous perturbation difference approximation (SP) has all elements of $\theta$ randomly perturbed together, but each component of the gradient is formed from a ratio involving the individual components in the [...] approximation to the gradient function $g(\theta)$ has the following form

$$g(\theta) = \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c\Delta} \tag{1.11}$$

where the division by $\Delta$ is taken elementwise and $\Delta$ is a vector composed of mutually independent mean-zero random variables $\{\Delta_1, \Delta_2, \ldots, \Delta_p\}$ with $\Delta_m$ independent of $\theta_0, \theta_1, \ldots, \theta_p$, $m = 1, \ldots, p$. The $\Delta_m$ components are chosen randomly according to the conditions. The two common choices of $\Delta$ are:

1) $\Delta_m$ are i.i.d. $\pm 1$ with equal probability (symmetric Bernoulli distribution).

2) $\Delta$ is uniformly distributed on the unit sphere.

Therefore, the mth element of the gradient approximation by SP is

$$g_m(\theta) = \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c\Delta_m}.$$

Note that this estimate differs from the usual FD gradient approximation since the numerator is the same for all elements of the vector, and thus the whole gradient approximation requires only two evaluations, instead of the $2p$ evaluations required by a general FD approximation. The name "simultaneous perturbation" as applied to (1.11) arises from the fact that all elements of the vector $\theta$ are being varied [...]
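The two-evaluation property of the SP approximation can be sketched in Python. The fragment below assumes the symmetric Bernoulli choice for the perturbation; the function name `sp_gradient`, the optional `delta` argument, and the linear test function are assumptions made for illustration.

```python
import itertools
import random


def sp_gradient(f, theta, c, rng, delta=None):
    """Simultaneous perturbation gradient estimate from two evaluations of f."""
    if delta is None:
        # Symmetric Bernoulli perturbation: each component is +1 or -1.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)  # same numerator for every component
    return [diff / (2.0 * c * d) for d in delta]


def linear(theta):
    # Illustrative objective with gradient (1, 2) everywhere.
    return 1.0 * theta[0] + 2.0 * theta[1]


# One random draw: a noisy estimate of the gradient.
single = sp_gradient(linear, [0.3, -0.2], c=0.1, rng=random.Random(0))

# Averaging over all four sign patterns recovers the gradient here.
patterns = list(itertools.product((-1.0, 1.0), repeat=2))
averaged = [0.0, 0.0]
for pattern in patterns:
    g = sp_gradient(linear, [0.3, -0.2], 0.1, None, delta=list(pattern))
    averaged = [a + gi / len(patterns) for a, gi in zip(averaged, g)]
print(single, averaged)
```

A single SP estimate is generally far from the true gradient in any one component; it is the average over perturbations that is close, which is why the estimate is used inside a recursion with decaying gains rather than on its own.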

1.2 A numerical approach for the parameter estimation of NLMM

Among all the approximations for the likelihood, Importance Sampling is the most promising for high accuracy because the precision can be improved by putting in more numerical work or abscissas. Of all optimization techniques for updating the parameter estimates, the stochastic recursive procedure can be regarded as an economical and efficient updating algorithm because it avoids the calculation of the Hessian matrix. The key quantity in the stochastic updating formula is the approximation for the gradient of the objective function. It is shown that SP is a very efficient gradient approximation method because it saves many evaluations of the objective function at each iteration and the noise does not present an obstacle to convergence. We propose here an optimization approach for the parameter estimation for NLMM which approximates the likelihood using Importance Sampling and updates the parameter estimates using the SA algorithm with the SP gradient approximation.

As discussed above, the gradient $\nabla f(\theta)$ of the objective function $f(\theta)$ can be approximated by the SP method by $g(\theta)$ in (1.11),

$$g(\theta) = \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c\Delta},$$

thus the SA optimization with SP gradient approximation has the form of

$$\theta_{t+1} = \theta_t - a_t \frac{f(\theta_t + c_t \Delta_t) - f(\theta_t - c_t \Delta_t)}{2 c_t \Delta_t} \tag{1.12}$$

and the mth element of the parameter, say $\theta_m$, can be updated in the tth iterative step by

$$\theta_{(t+1)m} = \theta_{tm} - a_t \frac{f(\theta_t + c_t \Delta_t) - f(\theta_t - c_t \Delta_t)}{2 c_t \Delta_{tm}}. \tag{1.13}$$

In the case of NLMM, our objective function is the likelihood $L(\theta)$. Therefore, substituting $f$ with $L$ in the formulae (1.12) and (1.13) gives us the recursive algorithm for updating the parameter estimates of NLMM, that is

$$\theta_{t+1} = \theta_t - a_t \frac{L(\theta_t + c_t \Delta_t) - L(\theta_t - c_t \Delta_t)}{2 c_t \Delta_t} \tag{1.14}$$

or

$$\theta_{(t+1)m} = \theta_{tm} - a_t \frac{L(\theta_t + c_t \Delta_t) - L(\theta_t - c_t \Delta_t)}{2 c_t \Delta_{tm}}. \tag{1.15}$$
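The recursion can be sketched as a short loop. The Python fragment below is an illustration, not the algorithm developed in this thesis: it minimizes a made-up noisy quadratic standing in for a negative log-likelihood, uses symmetric Bernoulli perturbations, and adopts the gain sequences $a_t = a/(t+1)^{0.602}$ and $c_t = c/(t+1)^{0.101}$, which are decay exponents commonly used as defaults in the SPSA literature and are assumptions here. All names and numerical values are hypothetical.

```python
import random


def spsa_minimize(loss, theta0, n_iter, rng, a=0.2, c=0.1,
                  alpha=0.602, gamma=0.101):
    """Minimize a (possibly noisy) loss with the SP form of the SA recursion."""
    theta = list(theta0)
    for t in range(n_iter):
        a_t = a / (t + 1) ** alpha   # decaying gain a_t
        c_t = c / (t + 1) ** gamma   # decaying perturbation size c_t
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)  # two evaluations per iteration
        theta = [th - a_t * diff / (2.0 * c_t * d)
                 for th, d in zip(theta, delta)]
    return theta


noise = random.Random(1)


def noisy_quadratic(theta):
    # Hypothetical stand-in for a negative log-likelihood observed with error.
    return sum((th - 1.0) ** 2 for th in theta) + noise.gauss(0.0, 0.01)


estimate = spsa_minimize(noisy_quadratic, [0.0, 0.0, 0.0], 2000,
                         random.Random(0))
print(estimate)
```

Each iteration costs two evaluations of the objective regardless of the number of parameters. In the NLMM setting each such evaluation would itself be a likelihood approximation, and the choice of `a` and `c` would need tuning to the scale of the parameters.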

However, since there is no closed-form expression for the likelihood function in NLMM, or there are no direct measurements available for $L$, the task of estimating the parameters of NLMM is made very difficult. We will discuss how we overcome this difficulty in Chapter 2. To investigate the performance of our optimization approach, we apply it to a simple nonlinear mixed-effects model fitted to a real data set and make comparisons among different optimizations, including the proposed approach and the existing ones.

The optimization for the parameter estimates of NLMM has been implemented in the software package SAS proc nlmixed and the S-Plus or S function nlme. SAS proc

(36)

Quadrature, Adaptive Gaussian Quadrature, and Importance Sampling, the principal ones being Adaptive Gaussian Quadrature and a first-order Taylor series approximation. Laplace approximation is an option; it is equivalent to Adaptive Gaussian Quadrature with only one quadrature point. There are several iterative techniques for updating the parameter estimates available in proc nlmixed that work well in various circumstances. They include the Trust Region method (TRUREG), the Newton-Raphson method with Line Search (NEWRAP), the Newton-Raphson method with Ridging (NRRIDG), Quasi-Newton methods (QUANEW), the Double-Dogleg method (DBLDOG), Conjugate Gradient methods (CONGRA), and the Nelder-Mead Simplex method (NMSIMP). Some of these optimization techniques require the calculation of the gradient or Hessian matrix and some do not. The finite difference method (FD) is the main method for the gradient and Hessian approximation required by the recursive procedures implemented in proc nlmixed. Since each of the optimizers requires different derivatives, some computational efficiencies can be gained. Each optimization method employs one or more convergence criteria that determine when it has converged. Details on these optimization techniques are available in the chapter "Nonlinear Mixed Effects Model" of the SAS Online Manual (version 8.2). Throughout this article, we use the optimization method proc nlmixed to compare with the proposed one. For convenience, we name the optimization algorithm of proc nlmixed using Importance Sampling, FOL, GQ, and Laplace for the likelihood approximation as nlmix(IS), nlmix(FOL),

(37)

Chapter 2

Stochastic Approximation

Stochastic approximation (SA) is used for the maximization problem, that is, for maximizing an objective function $f$ of the parameter $\theta$. For convenience, we substitute the notation $f$ with $L$, the likelihood, which is the objective function of interest throughout this article. Other common names for $L$ are performance measure, measure-of-effectiveness, fitness function, or criterion (Spall, 2004). Spall (2004) summarizes that stochastic optimization algorithms apply where

(I) there is random noise in the measurements of $L(\theta)$, say, $L(\theta) + \varepsilon(\theta)$, where $\varepsilon$ represents the random noise terms with $E(\varepsilon(\theta)) = 0$;

and/or

(II) there is a random (Monte Carlo) choice made in the search direction as the

(38)

2.1 RMSA

Stochastic approximation (SA) is a cornerstone of stochastic optimization. Robbins and Monro (1951) first introduced SA, which is called RMSA, as a general root-finding method in the univariate case when only noisy measurements of the underlying function are available. In other words, SA was first designed to find the $\theta^\star$ that solves the one-dimensional equation

$$L(\theta) = 0$$

with the noisy measurements, denoted by $L^\star$,

$$L^\star_t = L_t + \varepsilon_t, \quad \text{or} \quad L^\star(\theta_t) = L(\theta_t) + \varepsilon(\theta_t), \qquad t = 1, \ldots, T,$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t$ are i.i.d. random variables with mean zero and variance $\sigma^2$. Robbins and Monro (1951) define a (nonstationary) Markov chain $\{\theta_t\}$ by taking $\theta_1$ to be an arbitrary constant and defining

$$\theta_{t+1} = \theta_t - a_t L^\star(\theta_t),$$

where $\{a_t\}$ is a gain sequence of positive real numbers satisfying the well-known R-M conditions

$$a_t \to 0, \qquad \sum_t a_t = \infty, \qquad \sum_t a_t^2 < \infty.$$
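A minimal sketch of the RMSA recursion may help fix ideas. It assumes a hypothetical increasing linear function with root 3, observed with $N(0,1)$ noise, and the gain $a_t = 1/t$, which satisfies the three R-M conditions above. The function and variable names are illustrative only.

```python
import random

def robbins_monro(noisy_measure, theta1, n_iter, gain):
    """Iterate theta <- theta - a_t * L_star(theta) from the start value theta1."""
    theta = theta1
    for t in range(1, n_iter + 1):
        theta = theta - gain(t) * noisy_measure(theta)
    return theta

rng = random.Random(42)

def noisy_line(theta):
    # Hypothetical function 2 * (theta - 3) observed with N(0, 1) noise.
    return 2.0 * (theta - 3.0) + rng.gauss(0.0, 1.0)

# a_t = 1/t tends to zero, has a divergent sum and a convergent sum of squares.
root = robbins_monro(noisy_line, theta1=0.0, n_iter=5000, gain=lambda t: 1.0 / t)
```

Because the function is increasing, a positive measurement moves the iterate down and a negative one moves it up, which is the sign convention of the recursion.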

Details on how to choose an appropriate gain sequence $\{a_t\}$ in practice for the SA algorithm will be discussed in Chapter 3. In the $p = 1$ dimensional case, the typical choice of the gain $a_t$ is $\frac{a_0}{t_0 + t}\,\frac{1}{b_t}$ (Chung, 1954), where $a_0$, $t_0$ are two integers, and $b_t$

(39)

estimate of the slope. In addition, in the $p = 1$ dimensional case, we need to be careful about the sign in RMSA. The plot in Figure 2.1 shows how the sign of the RMSA is determined by the slope of the objective function. Observing $L^\star(\theta_t)$ positive or slope positive suggests decreasing $\theta_t$, and observing $L^\star(\theta_t)$ negative or slope negative suggests increasing $\theta_t$.

Figure 2.1: The relationship between the sign and the slope in RMSA. (Axes: theta, L(theta).)

Under certain conditions Robbins and Monro (1951) show that $\{\theta_t\}$ is a consistent estimator of $\theta^\star$. In other words, $\theta_t$ converges to the root $\theta^\star$ in probability, i.e., $\theta_t \xrightarrow{p} \theta^\star$. In some applications it is more convenient to make a group

(40)

of observations will then be

$$L^\star_{(t,1)}, \ldots, L^\star_{(t,r)} \qquad (2.1)$$

Let $\bar{L}^\star_t$ be the arithmetic mean of the values (2.1); then the RMSA algorithm can be changed into

$$\theta_{t+1} = \theta_t - a_t \bar{L}^\star_t$$

2.2 KWSA

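Beginning with the paper of Robbins and Monro, much work has been done in stochastic approximation. Under the stimulus of Robbins and Monro's method, Kiefer and Wolfowitz (1952) established an analogous recursive procedure for finding the maximum of a univariate function $L(\theta)$ with the noisy measurements

$$L^\star_t = L_t + \varepsilon_t, \quad \text{or} \quad L^\star(\theta_t) = L(\theta_t) + \varepsilon(\theta_t), \qquad t = 1, \ldots, T,$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t$ are i.i.d. random variables with mean zero and variance $\sigma^2$. They present a scheme whereby, starting from an arbitrary point $\theta_1$, one obtains successively $\theta_2, \theta_3, \ldots$ such that $\theta_t$ converges to $\theta^\star$ in probability as $t \to \infty$. The recursive scheme can be described as follows:

$$\theta_{t+1} = \theta_t + a_t\,\frac{L^\star(\theta_t + c_t) - L^\star(\theta_t - c_t)}{2c_t},$$

where $\{a_t\}$ and $\{c_t\}$ are gain sequences of positive numbers such that

$$c_t \to 0, \qquad \sum a_t = \infty, \qquad \sum a_t c_t < \infty, \qquad \sum a_t^2 c_t^{-2} < \infty.$$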

(41)

$\{a_t\}$ and $\{c_t\}$ in practice will be discussed in Chapter 3. Under regularity conditions on $L(\theta)$ they prove that $\theta_t$ converges to the maximum $\theta^\star$ in probability. The above recursive procedure is called Kiefer-Wolfowitz stochastic approximation (KWSA).

As discussed earlier, maximizing $L(\theta)$ can be converted to finding the root of the gradient function of $L(\theta)$, say, $\partial L(\theta)/\partial\theta = 0$. Therefore, KWSA is in fact a special case of RMSA with noisy measurements of the gradient function $\partial L(\theta)/\partial\theta$ obtained by taking the finite-difference ratio $\frac{L^\star(\theta + c) - L^\star(\theta - c)}{2c}$. In other words, the usual finite difference is used to approximate the gradient function at each iteration. In contrast to the RMSA procedure, KWSA is a gradient-free optimization since there is no need to postulate the existence of the derivative of $L(\theta)$ (indeed, $L(\theta)$ can be discrete).
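The univariate KWSA recursion can be illustrated with a short sketch. The assumptions are a hypothetical concave objective with its maximum at 2, observed with Gaussian noise, and the gains $a_t = 1/t$ and $c_t = t^{-1/3}$, which satisfy the usual Kiefer-Wolfowitz conditions. The step is an ascent step because the goal is a maximum; all names are illustrative.

```python
import random

def kiefer_wolfowitz(noisy_f, theta1, n_iter, a=1.0, c=1.0):
    """Climb a noisy univariate objective using two-sided finite differences."""
    theta = theta1
    for t in range(1, n_iter + 1):
        a_t = a / t                    # step gain
        c_t = c / t ** (1.0 / 3.0)     # finite-difference half width
        slope = (noisy_f(theta + c_t) - noisy_f(theta - c_t)) / (2.0 * c_t)
        theta = theta + a_t * slope    # ascent step toward the maximum
    return theta

rng = random.Random(3)

def noisy_objective(theta):
    # Hypothetical objective -(theta - 2)^2 observed with N(0, 0.1^2) noise.
    return -(theta - 2.0) ** 2 + rng.gauss(0.0, 0.1)

maximizer = kiefer_wolfowitz(noisy_objective, theta1=0.0, n_iter=5000)
```

Only values of the objective are used. No derivative is ever evaluated, which is the sense in which the procedure is gradient-free.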

2.3 Generalized RMSA

Blum (1954) extends the RMSA procedure to a multivariate version of the Robbins-Monro procedure. Now both $0$ and $\theta$ are $p$-dimensional vectors, and a multidimensional RMSA algorithm for finding the root of the function $L(\theta)$ is constructed as

$$\theta_{t+1} = \theta_t - a_t L^\star(\theta_t)$$

and the $m$th variable of the parameter vector $\theta$, say $\theta_m$, can be updated in the $t$th step of an RMSA algorithm by

$$\theta_{(t+1)m} = \theta_{tm} - a_t L^\star_m(\theta_t)$$

(42)

Generalized KWSA

Correspondingly, a multivariate KWSA for maximizing $L(\theta)$ is

$$\theta_{(t+1)m} = \theta_{tm} + a_t\,\frac{L^\star(\theta_t + c_t\delta_m) - L^\star(\theta_t - c_t\delta_m)}{c_t} \qquad (2.2)$$

Under appropriate conditions, the iteration in (2.2) converges to $\theta^\star$ almost surely (Blum 1954, Kushner and Clark 1978, Fabian 1971, or Kushner and Yin 1997).

2.4 Simultaneous Perturbation SA (SPSA)

As discussed earlier, for a maximization problem, there are two methods for approximating the gradient function, that is, FD and SP. KWSA is in fact an algorithm for maximizing the objective function using the FD approximation for the gradient. According to (1.11), an SP approximation for the gradient is

$$g(\theta) = \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c\Delta}$$

and the parameter vector $\theta$ can be updated at the $t$th iteration by the following iterative function

$$\theta_{t+1} = \theta_t + a_t\,\frac{L(\theta_t + c_t\Delta_t) - L(\theta_t - c_t\Delta_t)}{2c_t\Delta_t}$$

where $\Delta_t$ is a vector composed of mutually independent mean-zero random variables $\{\Delta_{t1}, \Delta_{t2}, \ldots, \Delta_{tp}\}$ with $\Delta_{tm}$ independent of $\theta_0, \theta_1, \ldots, \theta_p$; $m = 1, \ldots, p$. The $\Delta_{ti}$

(43)

in Chapter 3 the details on how to choose the gain sequences $\{a_t\}$ and $\{c_t\}$ in practice for the SPSA algorithm. For convenience, we call the SA algorithm using SP for the gradient approximation simultaneous perturbation SA, denoted by SPSA.

As discussed earlier, the gradient approximated by SPSA requires only two evaluations of $L(\theta)$, instead of $2p$ evaluations, at each iteration. The measurement savings per iteration, of course, provide the potential for SPSA to achieve large savings (over FDSA, KWSA) in the total number of measurements required to estimate $\theta$ when $p$ is large. This potential is only realized if the number of iterations required for effective convergence to $\theta^\star$ does not increase in a way that cancels the measurement savings per gradient approximation at each iteration. Spall (1988, 1992) presents conditions for convergence of SPSA ($\theta_t \xrightarrow{a.s.} \theta^\star$) using the differential equation approach as discussed in Ljung (1977) and Kushner and Clark (1978) in the context of the R-M algorithm. The most interesting theoretical results in Spall (1992), and those that most justify the use of SPSA, are the asymptotic efficiency conclusions that follow from conditions given in Spall (1992) showing that

$$t^{\beta/2}(\theta_t - \theta^\star) \xrightarrow{d} N(\mu, \Sigma) \qquad (2.3)$$

where $\xrightarrow{d}$ denotes convergence in distribution, $\beta > 0$, and $\mu$ and $\Sigma$ are a mean vector and covariance matrix. Spall (1992) uses this asymptotic normality result in expression (2.3) (together with a parallel result for FDSA) to establish the relative efficiency of SPSA. This efficiency depends on the shape of $L(\theta)$, the values for $a_t$, $c_t$, and the distribution of $\Delta_{tm}$ and measurement noise terms $\varepsilon(\theta)$

(44)

expression that can be used to characterize the relative efficiency. However, as discussed in Spall (1992, Sect. 4) and Chin (1997), in most practical problems SPSA will be asymptotically more efficient than FDSA. In particular, by equating the asymptotic mean-squared errors $E(\theta - \theta^\star)^2$ in SPSA and FDSA, they find

$$\frac{\text{no. of } L^\star(\theta) \text{ values in SPSA}}{\text{no. of } L^\star(\theta) \text{ values in FDSA}} \to \frac{1}{p} \qquad (2.4)$$

as the number of measurements of $L$ in both procedures gets large. Hence expression (2.4) implies that the $p$-fold savings per iteration (gradient approximation) translates directly into a $p$-fold savings in the overall optimization process.
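The per-iteration part of this saving can be checked by counting function evaluations. The sketch below is illustrative only: the objective is a hypothetical quadratic, the SP perturbations are assumed to be symmetric $\pm 1$ draws, and the class and function names are made up. It shows the count for one gradient approximation, not the asymptotic statement (2.4).

```python
import random

class CountingObjective:
    """Wraps an objective and counts how many times it is evaluated."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return self.f(theta)

def fd_gradient(f, theta, c):
    """Two-sided finite differences: two evaluations per coordinate."""
    grad = []
    for m in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[m] += c
        minus[m] -= c
        grad.append((f(plus) - f(minus)) / (2.0 * c))
    return grad

def sp_gradient(f, theta, c, rng):
    """Simultaneous perturbation: two evaluations in total."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [th + c * d for th, d in zip(theta, delta)]
    minus = [th - c * d for th, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]

def toy_objective(theta):
    # Hypothetical smooth objective used only to count evaluations.
    return -sum(th * th for th in theta)

p = 5
theta = [1.0] * p
fd_obj = CountingObjective(toy_objective)
sp_obj = CountingObjective(toy_objective)
fd_grad = fd_gradient(fd_obj, theta, 0.01)
sp_grad = sp_gradient(sp_obj, theta, 0.01, random.Random(0))
ratio = sp_obj.calls / fd_obj.calls  # evaluations per gradient, SP over FD
```

For $p = 5$ the FD gradient uses 10 evaluations and the SP gradient uses 2, so the ratio per gradient approximation is $1/p$.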

Overall, of the stochastic optimization techniques, KWSA, FDSA, and SPSA may be called "gradient-free" stochastic algorithms. In contrast, RMSA may be called a "gradient-based" algorithm. "Gradient-free" in this circumstance refers to the case where the gradient $\partial L(\theta)/\partial\theta$ of the likelihood $L(\theta)$ is not readily available or not directly measured (even with noise). The gradient-free stochastic algorithms exhibit convergence properties similar to the gradient-based stochastic algorithms while requiring only objective function measurements. The gradient-based algorithms rely on direct measurements of the gradient of the objective function with respect to the parameters being optimized. These measurements typically yield an unbiased estimate of the gradient. The main advantage of the gradient-free algorithms is that they do not require knowledge of the gradient function, which in many cases is

(45)

2.5 The application of IS and SPSA in the MLE of a NLMM

In the case of NLMM, the maximum likelihood estimates of the parameters can be obtained by maximizing the likelihood $L(\theta)$. When $L$ and the gradient $\partial L(\theta)/\partial\theta$ are observed directly, there are, of course, many methods for finding the maximum $\hat{\theta}$ (e.g., steepest descent, Newton-Raphson). However, in the NLMM case, as discussed in Section 1.1, it is difficult to get direct measurements of the likelihood, which is given in an integral form

$$L(\theta) = \int p(y \mid b, \theta)\,\phi(b)\,db,$$

and of the corresponding gradient for a complex nonlinear mixed-effects model, because the expectation function is nonlinear in $b$ and there is no closed-form expression for the above likelihood function. But, as we mentioned, $L$ can be approximated by taking the sample mean of the integrand at importance samples, say,

$$L(\theta) \approx L_{IS}(\theta) = \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} p(y_i \mid b_{ik}, \theta)\,\phi(b_{ik})/I(b_{ik})$$

from equation (1.8). If $E(L_{IS}(\theta)) = L(\theta)$, that is, $L_{IS}$ is an unbiased estimate of $L$, then $L_{IS}$ can be considered to be the noisy measurements of the likelihood $L$, say, $L^\star$ in the framework of SA. Letting

$$\varepsilon = L_{IS}(\theta) - L(\theta),$$

(46)

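then we have

$$E(\varepsilon) = E[L_{IS}(\theta) - L(\theta)] = L(\theta) - L(\theta) = 0.$$

In fact, since the importance samples are drawn independently across subjects, the expectation of $L_{IS}$ can be written as

$$\begin{aligned}
E(L_{IS}(\theta)) &= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} E\{p(y_i \mid b_{ik}, \theta)\,\phi(b_{ik})/I(b_{ik})\} \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \int \{p(y_i \mid b_{ik}, \theta)\,\phi(b_{ik})/I(b_{ik})\}\,I(b_{ik})\,db_{ik} \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \int p(y_i \mid b_{ik}, \theta)\,\phi(b_{ik})\,db_{ik} \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} L_i(\theta) \\
&= \prod_{i=1}^{N} L_i(y_i \mid \theta) = L(y \mid \theta).
\end{aligned}$$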
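The unbiasedness can be checked numerically on a case where the integral is known. The sketch below does not use the NLMM of this work. It assumes a hypothetical single-subject linear model $y \mid b \sim N(\theta + b, 1)$ with $b \sim N(0, 1)$, for which the marginal likelihood is the $N(\theta, 2)$ density at $y$, and an assumed importance density $I(b) = N(0, 1.5^2)$. All names are illustrative.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def is_likelihood(y, theta, n_is, rng, imp_sd=1.5):
    """Sample mean of p(y|b,theta) * phi(b) / I(b) over draws b ~ I."""
    total = 0.0
    for _ in range(n_is):
        b = rng.gauss(0.0, imp_sd)            # draw from the importance density
        weight = normal_pdf(b, 0.0, 1.0) / normal_pdf(b, 0.0, imp_sd)
        total += normal_pdf(y, theta + b, 1.0) * weight
    return total / n_is

rng = random.Random(7)
y_obs, theta = 1.0, 0.0
estimate = is_likelihood(y_obs, theta, 20000, rng)
# Closed form for this toy model: y ~ N(theta, 2).
exact = normal_pdf(y_obs, theta, math.sqrt(2.0))
```

With several subjects the per-subject estimates would be multiplied, as in the product over $i$ in the formula for $L_{IS}$. Increasing the number of importance samples reduces the noise of each measurement.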

We have shown above that $L_{IS}$ is an unbiased estimate of $L$, and therefore the values of $L_{IS}$ are noisy measurements of the likelihood. Consequently, according to (1.14), SA with Importance Sampling likelihood approximation and SP gradient approximation can be formulated as

$$\theta_{t+1} = \theta_t + a_t\,\frac{L_{IS}(\theta_t + c_t\Delta_t) - L_{IS}(\theta_t - c_t\Delta_t)}{2c_t\Delta_t} \qquad (2.5)$$

where $\Delta_t$ is a vector with $p$ mutually independent variables in the $t$th stochastic step. For simplicity, we call the algorithm given in (2.5) Simultaneous Perturbation Stochastic Approximation Using Importance Sampling for likelihood approximation, denoted by SPSAIS.

(47)

A numerical experimental study is performed on the importance sample size for data generated from a nonlinear model in Chapter 3. We expect the accuracy of the parameter estimation to be improved by using more importance samples. In addition, we will make comparisons between SPSAIS and other optimization methods such as

(48)

Chapter 3

Application of SPSAIS to NLMM

3.1 Example

Consider the orange tree data given by Draper and Smith (1981). This example was used by Pinheiro & Bates (2000, Ch. 8.2) to illustrate how a logistic growth curve model can be fit using the S-Plus/R routine nlme. These data consist of seven measurements of the trunk circumference (in millimeters) on each of five orange trees over 1582 days, ranging from $t = 118$ to $1582$. The seven time points are $t = 118, 484, 664, 1004, 1231, 1372,$ and $1582$. A plot of this function over 1582 days for a particular choice of $\theta = (\theta_1, \theta_2, \theta_3)$ is given in Figure 3.1. Lindstrom and Bates (1990) and Pinheiro and Bates (1995) propose the following logistic nonlinear

(49)

Figure 3.1: The trunk circumference of the orange tree over 1582 days. (Axes: time, circumference; curves labeled by tree, 1 to 5.)

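$$y_{ij} = \frac{\theta_1 + b_i}{1 + \exp[-(x_{ij} - \theta_2)/\theta_3]} + \varepsilon_{ij}$$

where $y_{ij}$ represents the $j$th measurement on the $i$th tree ($i = 1, \ldots, 5$; $j = 1, \ldots, 7$); $x_{ij}$ is the corresponding day; $\theta_1$, $\theta_2$, $\theta_3$ are the fixed-effects parameters; $b_i$ are the random-effects parameters assumed to be i.i.d. $N(0, \sigma_b^2)$; and $\varepsilon_{ij}$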

(50)

assumed to be i.i.d. $N(0, \sigma^2)$ and independent of $b_i$. This model has a logistic form, and the random effects $b_i$ enter the model linearly. Pinheiro and Bates estimate the parameters using a Gaussian Quadrature approximation for the loglikelihood with 200 abscissas. The results of the estimation are presented in Table 3.1.

Table 3.1: MLE of the parameters $\theta_1$, $\theta_2$, $\theta_3$, $\sigma^2$, and $\sigma_b^2$ by nlme

            $\theta_1$   $\theta_2$   $\theta_3$   $\sigma_b^2$   $\sigma^2$
GQ(200)     192.293      727.074      348.074      1003.25        61.49
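For readers who want to experiment with the logistic model, the following sketch generates one artificial data set from it. It is an illustration only and not the simulation code of this study: the function names, the seed, and the use of the Table 3.1 estimates as generating values are assumptions made for the example.

```python
import math
import random

# The seven measurement days of the orange tree data.
DAYS = [118, 484, 664, 1004, 1231, 1372, 1582]

def logistic_mean(x, theta, b_i):
    """(theta1 + b_i) / (1 + exp(-(x - theta2) / theta3))."""
    theta1, theta2, theta3 = theta
    return (theta1 + b_i) / (1.0 + math.exp(-(x - theta2) / theta3))

def simulate_trees(theta, var_b, var_e, n_trees=5, seed=0):
    """One data set: a row per tree, a column per measurement day."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_trees):
        b_i = rng.gauss(0.0, math.sqrt(var_b))       # random effect of the tree
        row = [logistic_mean(x, theta, b_i) + rng.gauss(0.0, math.sqrt(var_e))
               for x in DAYS]                         # add measurement error
        data.append(row)
    return data

theta_hat = (192.293, 727.074, 348.074)
sample = simulate_trees(theta_hat, var_b=1003.25, var_e=61.49)
```

At $x = \theta_2$ the mean curve reaches half of its asymptote $\theta_1 + b_i$, which gives a quick sanity check on the parameterization.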

3.2 Designs and Simulations

To compare how SPSAIS works on this logistic model with the optimization proc nlmixed as discussed in Chapter 2, we simulate data and employ the optimizations SPSA and proc nlmixed on these data. The comparison criterion is based on the relative absolute error, which is defined as the averaged ratio of the absolute difference between the estimated parameter value $\hat{\theta}_m$ and the true parameter value $\theta^\star_m$ to the true parameter value, that is,

$$RAE = \frac{1}{p} \sum_{m=1}^{p} \frac{|\hat{\theta}_m - \theta^\star_m|}{\theta^\star_m}$$

where $p$ is the total number of the parameters. For simplicity, we denote the relative absolute error by RAE. Why is it necessary to define such a value? The reason is that no current algorithm can give a "perfect" maximum likelihood estimation for NLMM

(51)

the nonlinearity of the expectation function of the random effects and thus there is no closed or analytical form for the likelihood. In this sense, we make comparisons between parameter estimates as if we were treating them as "competing" parameter estimates or "real" MLE. From this point of view, a quantity defined as such a relative absolute error makes sense. Apparently, a "better" optimization has a relatively small RAE in most cases.
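The relative absolute error is simple to compute. The sketch below uses made-up estimates and true values purely to show the arithmetic. The function name is hypothetical, and the true values are assumed to be positive, as they are for the parameters of this model.

```python
def relative_absolute_error(estimates, truths):
    """Average over parameters of |estimate - truth| / truth."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    ratios = [abs(e - t) / t for e, t in zip(estimates, truths)]
    return sum(ratios) / len(ratios)

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
truths = [200.0, 600.0, 600.0, 1000.0, 60.0]
estimates = [210.0, 570.0, 630.0, 900.0, 66.0]
rae = relative_absolute_error(estimates, truths)
```

Here the five relative errors are 0.05, 0.05, 0.05, 0.10, and 0.10, so the RAE is 0.07.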

In order to examine whether all optimizations in the preceding discussion work for this logistic model, we design several parameter settings, generate 20 data sets under each parameter setting, and apply all optimizations to these data. The parameter settings are designed so that they represent as wide a range of profiles as possible. The parameter estimates from the preceding orange tree example in Table 3.1 provide us with the reference for the choice of parameter settings. Since $\theta_1$ takes the role of a scaling parameter, it is set to a constant, for example 200, close to the estimate given in Table 3.1, and we set two levels, high and low, for each of the other four parameters. Therefore, we have a factorial experimental design of size $2 \times 2 \times 2 \times 2 = 16$. Listed in Table 3.2 is the structure of these sixteen parameter settings.

Table 3.2: Design for the Parameter Settings

         $\theta_1$   $\theta_2$   $\theta_3$   $\sigma_b^2$   $\sigma^2$
high     200          600          600          1000           60
low      200          300          300          600            10
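The sixteen settings implied by Table 3.2 can be enumerated mechanically. The sketch below is only a convenience illustration. The dictionary keys are hypothetical names, and the ordering (last factor varying fastest) is an assumption of the example.

```python
from itertools import product

THETA1 = 200  # held constant across all settings

# High and low levels of the four varied parameters, as in Table 3.2.
LEVELS = {
    "theta2": (600, 300),
    "theta3": (600, 300),
    "var_b": (1000, 600),
    "var_e": (60, 10),
}

# Full 2 x 2 x 2 x 2 factorial design.
settings = [(THETA1, t2, t3, vb, ve)
            for t2, t3, vb, ve in product(*LEVELS.values())]
```

The first tuple is (200, 600, 600, 1000, 60) and the last is (200, 300, 300, 600, 10).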

(52)

data based solely on the levels of the fixed parameters, ignoring first the random effects and random error term in the model. The plots given in Figure 3.2 show the data profiles determined by the values of $\theta_2$ and $\theta_3$, as given in Table 3.2. We can see from Figure 3.2 that the profiles determined by parameters (200,600,600) and (200,300,600) reveal a linear shape, with the latter having a bigger rate, and the profiles determined by parameters (200,600,300) and (200,300,300) reveal a polynomial shape, with the latter having a bigger rate. Basically, this indicates that the generated data represent 4 different profiles. Adding the random effects and random error terms back onto the model, we get 16 profiles showing the variation among individuals at different time points. The corresponding plots for a single sample from each of the 16 parameter combinations are given in Figures 3.3 and 3.4. Note that the design points $x_{ij}$

(53)

Figure 3.2: Four profiles of the simulated data. (Panels: theta (200,600,600); theta (200,600,300); theta (200,300,600); theta (200,300,300). Axes: t, y.)

(54)

Figure 3.3: The plots for 16 parameter settings. (Panels: theta(200,600,600,1000,60); theta(200,600,600,1000,10); theta(200,600,600,600,60); theta(200,600,600,600,10); theta(200,600,300,1000,60); theta(200,600,300,1000,10); theta(200,600,300,600,60); theta(200,600,300,600,10). Axes: time, circumference; curves labeled 1 to 5.)

(55)

Figure 3.4: The plots for 16 parameter settings. (Panels: theta(200,300,600,1000,60); theta(200,300,600,1000,10); theta(200,300,600,600,60); theta(200,300,600,600,10); theta(200,300,300,1000,60); theta(200,300,300,1000,10); theta(200,300,300,600,60); theta(200,300,300,600,10). Axes: time, circumference; curves labeled 1 to 5.)

(56)

3.3 Some Implementation Issues

There are several important issues associated with the implementation of the SPSAIS optimization algorithm in terms of the choice of the starting values, scaling, and algorithm factors in the algorithm. All of these issues play an important role in determining whether SPSAIS can be started, and/or whether the algorithm will converge.

3.3.1 Choosing starting values by ad hoc method

Because nonlinear models can take many different forms, there is no single "all-purpose" or "automatic" approach for identifying a sensible choice of starting values. However, for models with particular features, ad hoc methods can be constructed for starting estimates.

For example, for the logistic model

$$y_{ij} = f(x_{ij}, \theta) = \frac{\theta_1 + b_i}{1 + \exp[-(x_{ij} - \theta_2)/\theta_3]} + \varepsilon_{ij}$$

where $b_i$ are i.i.d. $N(0, \sigma_b^2)$ and $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$, an ad hoc method can be derived to form starting estimates for the fixed parameter $\theta$ and the variances of the random terms, say $\sigma_b^2$ and $\sigma^2$. The procedures for obtaining ad hoc starting estimates for this logistic model are as follows. The individual mixed effects

$$\theta_{1i} = \theta_1 + b_i$$

can be viewed as the response of the $i$th individual at "$x_{ij} = \infty$". In other words,

$$\lim_{t \to \infty} E(y_{ij}) = \theta_{1i}$$