Nonlinear Mixed Effects Models (Under the direction of John F. Monahan.)
Nonlinear mixed-effects models (NLMM) have received a great deal of attention
in the statistical literature in recent years because of the flexibility they offer in
handling the unbalanced repeated-measurements data that arise in different areas of
investigation, such as pharmacokinetics. We concentrate here on maximum likelihood
estimation for the parameters in nonlinear mixed-effects models. A rather complex
numerical issue for maximum likelihood estimation in nonlinear mixed-effects models
is the evaluation of a multiple integral that, in most cases, does not have a
closed-form expression. We restrict our attention in this article to numerical methods that
are based on approximations to the likelihood. Several numerical approximations for
the likelihood have been proposed, such as first-order linearization (FOL), the Laplace
approximation, Importance Sampling, and Gaussian Quadrature (GQ). In addition,
as for any general optimization problem, iterative methods are usually required to update
the parameter estimates. A large number of parameter updating methods
have been developed, such as Newton-Raphson, steepest descent, and stochastic
optimization. Many current optimization algorithms implement a Newton iterative
method to update the parameter estimates in NLMM. The objective of this thesis
is to propose an optimization approach for the parameter estimation in nonlinear
parameter estimates in NLMM. In Chapter 1, we describe the model and introduce
several likelihood approximations and parameter updating procedures for these
models. The proposed optimization approach is illustrated in Chapter 2. In order to
compare this approach to the other optimization methods, simulations are performed
and conclusions are drawn based on simulation results in Chapter 3. Some future
by
Jing Wang
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Statistics
in the
GRADUATE SCHOOL
at
NC STATE UNIVERSITY
2004
Professor John F. Monahan
Chair of Advisory Committee
Professor David A. Dickey Professor Marie Davidian
Nonlinear Mixed Effects Models
Copyright 2004
by
To my parents
Biography
Jing Wang was born in Tonghua, China, to parents Yurong Ma and Weijia Wang on
October 22, 1971. She earned her Bachelor of Science degree in Mathematics in May
of 1993 and a Master of Science degree in Probability and Statistics in May of 1996, both
at the Northeast Normal University in China. She began her Ph.D. studies at North
Carolina State University in August 2000, and her thesis defense was July 21, 2004.
After graduating, she will begin a career at Louisiana State University as an assistant
Acknowledgements
Thanks to all of the following people. Without you this would not have been possible.
To my advisor, Dr. John Monahan, for your instruction and patience.
To my committee members, Professors Marie Davidian, Bibhuti Bhattacharyya,
and David Dickey, for actually reading all of this.
To the friends I have made at NCSU, Luna, Weiwei Wu, and Jimmy Doi.
Special thanks to my parents and the rest of my family for encouraging me
Contents
List of Figures
List of Tables
1 Introduction
1.1 Approximation methods
1.1.1 Comparing approximation methods
1.2 A numerical approach for the parameter estimation of NLMM
2 Stochastic Approximation
2.1 RMSA
2.2 KWSA
2.3 Generalized RMSA
2.4 Simultaneous Perturbation SA (SPSA)
2.5 The application of IS and SPSA in the MLE of a NLMM
3.1 Example
3.2 Designs and Simulations
3.3 Some Implementation Issues
3.3.1 Choosing starting values by ad hoc method
3.3.2 Improving ad hoc estimates by LSI method
3.3.3 Scaling
3.3.4 Algorithm Factor Issues
3.3.5 A numerical study on algorithm factors
3.3.6 Algorithm Form
3.3.7 Results on algorithm factors
3.3.8 Optimization method proc nlmixed
3.4 Compare SPSAIS to proc nlmixed
3.5 A large sample size problem
3.5.1 Conclusions
3.5.2 Two additional types of relative error
3.5.3 Algorithm Variance
4 Future work
4.1 More applications
4.2 A quadratic optimization
Bibliography
A ANOVA Code and Results
A.1 Algorithm factors
A.2 Tukey's multiple comparisons tests
A.3 Relative squares error
A.4 Maximum relative squares error
B SPSAIS
List of Figures
2.1 Describe the relationship between the sign and the slope in RMSA.
3.1 The trunk circumference of the orange tree over 1582 days.
3.2 Four profiles of the simulated data.
3.3 The plots for 16 parameter settings
List of Tables
3.1 MLE of the parameters $\beta_1$, $\beta_2$, $\beta_3$, $\sigma^2$, and $\sigma_b^2$, by nlme
3.2 Design for the Parameter Settings
3.3 ad hoc starting estimates in the first parameter setting
3.4 LSI starting estimates in the first parameter setting
3.5 An overall design on parameter setting and algorithm factors
3.6 The treatment means of the resultant R AEs from IS, AS, and SS
3.7 The number of function calls requested by nlmix(IS)(i)
3.8 Treatment means of the R AEs resulting from methods
3.9 Tukey's multiple comparisons tests for methods
3.10 ad hoc starting estimates for the 1st parameter setting
3.11 LSI starting estimates for the 1st parameter setting
3.12 The number of function calls requested by nlmix(IS)(i)
3.13 Treatment means of the resultant R AEs from IS, AS, and SS
3.14 Treatment means of the R AEs resulting from methods
3.16 Treatment means of the resultant R AEs from SS levels
3.17 Treatment means of the R SEs from methods in smaller sample case
3.18 Treatment means of the R MSEs from methods in smaller sample case
3.19 Treatment means of the R SEs from methods in larger sample case
Chapter 1
Introduction
Nonlinear mixed-effects models provide a powerful and useful tool for analyzing
repeated-measures data that arise in different areas of investigation, such as economics
and pharmacokinetics. Repeated-measures data are generated by observing a number
of subjects repeatedly under varying conditions. Nonlinear mixed-effects models are
mixed-effects models in which the intrasubject model relating the response variable
to time is nonlinear in the parameters. We consider here the model proposed by
Lindstrom and Bates (1990). This model can be viewed as a hierarchical model that
in some ways generalizes both the linear mixed-effects model of Laird and Ware (1982)
and the usual nonlinear model for independent data (Bates and Watts, 1988). In the
first stage the jth observation on the ith subject is modeled as
$$y_{ij} = f(\phi_i, x_{ij}) + \epsilon_{ij}, \qquad i = 1, \ldots, N; \quad j = 1, \ldots, N_i$$
where $f$ is a nonlinear function of a subject-specific parameter vector $\phi_i$ and the
predictor or covariate vector $x_{ij}$, $\epsilon_{ij}$ is a normally distributed noise term, $N$ is the
total number of subjects, and $N_i$ is the number of observations on the ith subject. In
the second stage, the subject-specific parameter vector is modeled as
$$\phi_i = \phi_i(\beta, b_i) = A_i \beta + B_i b_i, \qquad b_i \sim N(0, D)$$
where $\beta$ is a $p$-dimensional vector of fixed population parameters, $b_i$ is an $m_0$-dimensional
random effects vector associated with the ith subject (not varying with j), the
matrices $A_i$ and $B_i$ are design matrices of size $r \times p$ and $r \times m_0$ for the fixed and
random effects, respectively, and $D$ is a covariance matrix. This is a general form of
the nonlinear mixed-effects model. We often write the model for the ith individual's
entire response vector $y_i$. This is accomplished by letting
$$y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iN_i} \end{pmatrix}, \qquad
\epsilon_i = \begin{pmatrix} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots \\ \epsilon_{iN_i} \end{pmatrix}, \qquad \text{and} \qquad
f(\phi_i) = \begin{pmatrix} f(\phi_i, x_{i1}) \\ f(\phi_i, x_{i2}) \\ \vdots \\ f(\phi_i, x_{iN_i}) \end{pmatrix}.$$
Then
$$y_i = f(\phi_i) + \epsilon_i \tag{1.1}$$
where $\epsilon_i \sim N(0, \Lambda_i)$ is composed of random error variables and $\Lambda_i$ is a positive
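We can write the $N$ individual models as one by letting
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad
f(\phi) = \begin{pmatrix} f(\phi_1) \\ f(\phi_2) \\ \vdots \\ f(\phi_N) \end{pmatrix},$$
$\tilde{D} = \mathrm{diag}(D, D, \ldots, D)$, and $\Lambda = \mathrm{diag}(\Lambda_1, \Lambda_2, \ldots, \Lambda_N)$. Now, the overall model can be
written as
$$y \mid (x, b) \sim N(f(\phi), \Lambda), \qquad \phi = A\beta + Bb, \qquad b \sim N(0, \tilde{D}),$$
$$A = (A_1^T, A_2^T, \ldots, A_N^T)^T, \qquad B = \mathrm{diag}(B_1, B_2, \ldots, B_N), \qquad b = (b_1^T, b_2^T, \ldots, b_N^T)^T.$$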
To have a better understanding of the nonlinear mixed-effects model, we consider
a simple example. This example, given by Hartford and Davidian (2000),
concerns a study of the pharmacokinetics of a drug given as an intravenous bolus.
The one-compartment model given in the following form
$$y_{ij} = f(x_{ij}, \phi_i) + e_{ij} = \frac{100}{V_i} \exp\left(-\frac{Cl_i \, x_{ij}}{V_i}\right) + e_{ij}$$
$$(e_i \mid \phi_i) \sim N(0, \sigma^2 I_{N_i})$$
$$Cl_i = \exp\left(\beta_1 + \beta_2 \frac{a_i}{100} + b_{1i}\right), \qquad V_i = \exp(\beta_3 + b_{2i})$$
$$b_i = (b_{1i}, b_{2i})^T$$
may provide a suitable characterization for within-subject plasma concentration at the
time $x_{ij}$ for subject i, where $Cl_i$ and $V_i$ represent subject-specific clearance and volume
of distribution, respectively, $a_i$ is a covariate for each individual, and the covariance
matrix is
$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}.$$
The fixed parameter vector is $\beta = (\beta_1, \beta_2, \beta_3)^T$ and
the random effect vector is $b_i = (b_{1i}, b_{2i})^T$. Thus, the subject-specific parameter vector
is modeled as
$$\phi_i = \begin{pmatrix} \log Cl_i \\ \log V_i \end{pmatrix}
= \begin{pmatrix} 1 & a_i/100 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} b_{1i} \\ b_{2i} \end{pmatrix}$$
The matrices
$$A_i = \begin{pmatrix} 1 & a_i/100 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{and} \qquad B_i = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
may be thought of as the two "design" matrices corresponding to $\phi_i$.
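The one-compartment example can be made concrete with a short simulation. The following is only a minimal sketch under stated assumptions: the function names, the numerical parameter values, and the choice of independent random effects (a diagonal $D$) are illustrative and do not come from Hartford and Davidian (2000) or from this thesis.

```python
import math
import random


def mean_concentration(x, cl, v, dose=100.0):
    """One-compartment IV bolus mean response: (dose / V) * exp(-Cl * x / V)."""
    return dose / v * math.exp(-cl * x / v)


def subject_parameters(beta, a_i, b_i):
    """Map beta = (beta1, beta2, beta3), covariate a_i and random effects
    b_i = (b_1i, b_2i) to (Cl_i, V_i), i.e. phi_i = A_i beta + B_i b_i on the log scale."""
    log_cl = beta[0] + beta[1] * a_i / 100.0 + b_i[0]
    log_v = beta[2] + b_i[1]
    return math.exp(log_cl), math.exp(log_v)


def simulate_subject(beta, a_i, times, sd_b, sigma, rng):
    """Simulate one subject's response vector y_i.

    Assumption of this sketch: the two random effects are drawn independently,
    which corresponds to a diagonal covariance matrix D.
    """
    b_i = (rng.gauss(0.0, sd_b[0]), rng.gauss(0.0, sd_b[1]))
    cl, v = subject_parameters(beta, a_i, b_i)
    return [mean_concentration(x, cl, v) + rng.gauss(0.0, sigma) for x in times]


# Hypothetical values chosen only for illustration.
rng = random.Random(1)
y = simulate_subject(beta=(-1.0, 0.5, 1.0), a_i=70.0,
                     times=[0.5, 1.0, 2.0, 4.0, 8.0],
                     sd_b=(0.2, 0.1), sigma=0.5, rng=rng)
```

Because $Cl_i$ and $V_i$ enter through exponentials, the mean response is nonlinear in $b_i$, which is what prevents the likelihood from having a closed form.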
Combining the unknown fixed parameters into one vector and letting
$$\theta = (\beta^T, [\mathrm{vech}(D)]^T, [\mathrm{vech}(\Lambda)]^T)^T,$$
the maximum likelihood estimate for $\theta$ can be found by maximizing the likelihood
$$L(y \mid \theta) = \prod_{i=1}^{N} L_i(y_i \mid \theta)
= \prod_{i=1}^{N} \int p(y_i, b_i \mid x_i, \theta) \, db_i
= \prod_{i=1}^{N} \int p(y_i \mid b_i, x_i, \theta) \, \pi(b_i) \, db_i \tag{1.2}$$
where $L_i(y_i \mid \theta)$ is the likelihood for the ith individual, $p(y_i, b_i \mid x_i, \theta)$ is the joint
density of $Y_i \mid x_i$ and $b_i$, $p(y_i \mid b_i, x_i, \theta)$ is the conditional density of $Y_i \mid x_i$ given $b_i$,
and $\pi(b_i)$ is the density function of the random effect $b_i$.
1.1 Approximation methods
Because the expectation function of $(y_i, b_i \mid x_i, \theta)$, that is, $f$ in (1.1), is nonlinear in
$b_i$, there is no closed-form expression for the likelihood function $L$ in (1.2). This
nonlinearity makes the calculation of the maximum likelihood estimate of the parameter
$\theta$ very difficult. Several approximations have been proposed for estimating the
likelihood. We describe briefly here four different approximations for the likelihood that
are in common use in the nonlinear mixed-effects model (1.1): First-Order
Linearization (FOL) of the expectation function $f$ around the expected value of the random
effects (Sheiner and Beal, 1980; Vonesh and Carter, 1992), Laplace approximation
(Pinheiro and Bates, 1995), Importance Sampling (Pinheiro and Bates, 1995), and
between these approximations on their computational and statistical properties.
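First Order Linearization (FOL)
According to (1.2), the integral that we want to evaluate for the marginal
distribution of $y_i$ in model (1.1) for the ith subject is
$$L_i(y_i \mid \theta) = \int (2\pi)^{-(N_i + m_0)/2} |D|^{-1/2} |\Lambda_i|^{-1/2}
\exp[-q(\beta, D, \Lambda_i, y_i, b_i)/2] \, db_i \tag{1.3}$$
where
$$q(\beta, D, \Lambda_i, y_i, b_i) = (y_i - f(\phi_i(\beta, b_i)))^T \Lambda_i^{-1} (y_i - f(\phi_i(\beta, b_i))) + b_i^T D^{-1} b_i \tag{1.4}$$
Sheiner and Beal (1980) approximate the integral in (1.3) by approximating the
integrand with a Taylor series expansion for $f(\phi_i)$ about $b_i = 0$ before integration:
$$y_i = f(\phi_i(\beta, b_i)) + \Lambda_i^{1/2} e_i
\approx f(\phi_i(\beta, 0)) + \left[ \frac{\partial f(\phi_i(\beta, b_i))}{\partial b_i} \Big|_{b_i = 0} \right] b_i + \Lambda_i^{1/2} e_i$$
where $e_i \sim N(0, I_{N_i})$. Sheiner and Beal proceed by constructing the likelihood from
this approximate model, where $b_i$ and $\epsilon_i$ are normally distributed, as
$$L(y \mid \theta) \approx \prod_{i=1}^{N} \int (2\pi)^{-N_i/2} |\Lambda_i|^{-1/2}
\exp\left[ -\frac{1}{2} \{y_i - f(\phi_i(\beta, 0)) - Z_i(\beta, 0) b_i\}^T \Lambda_i^{-1} \{y_i - f(\phi_i(\beta, 0)) - Z_i(\beta, 0) b_i\} \right]
(2\pi)^{-m_0/2} |D|^{-1/2} \exp\left[ -\frac{1}{2} b_i^T D^{-1} b_i \right] db_i$$
which has the following closed-form expression after completing the square:
$$L(y \mid \theta) \approx \prod_{i=1}^{N} (2\pi)^{-N_i/2} |V_i|^{-1/2}
\exp\left[ -\frac{1}{2} \{y_i - f(\phi_i(\beta, 0))\}^T V_i^{-1} \{y_i - f(\phi_i(\beta, 0))\} \right]$$
where $Z_i = \partial f(\phi_i(\beta, b_i)) / \partial b_i \big|_{b_i = 0}$ and $V_i = V_i(\beta, 0, \mathrm{vech}(D)) = \Lambda_i + Z_i D Z_i^T$. For
simplicity, we denote $f(\phi_i(\beta, b_i))$ by $f_i(\beta, b_i)$ in the following discussion.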
Laplace
Lindstrom and Bates (1990) attempt to improve on the first-order
population-averaged approximation by expanding $q$ around $\hat{b}_i$, the non-zero estimate of $b_i$
corresponding to the best linear unbiased predictor in the linear case. This
approximation, along with a Gaussian posterior approximation used by Laird and Louis (1982)
and Stiratelli, Laird and Ware (1984), forms the motivation for the Lindstrom and
Bates algorithm. Alternatively, the Lindstrom and Bates algorithm can be derived
by Laplace's approximation,
$$L(y \mid \theta) = \prod_{i=1}^{N} L_i(y_i \mid \theta) = \prod_{i=1}^{N} \int \exp\{N_i l_i(b_i)\} \, db_i
\approx \prod_{i=1}^{N} (2\pi/N_i)^{m_0/2} \, |r_i(\hat{b}_i)|^{-1/2} \exp\{N_i l_i(\hat{b}_i)\} \tag{1.5}$$
where $l_i(b_i) = \frac{1}{N_i} \ln[p(y_i \mid b_i) \pi(b_i)]$, $\hat{b}_i$ is the estimate of $b_i$ obtained by maximizing
$l_i(b_i)$, and $r_i(\hat{b}_i) = -\partial^2 l_i(b_i) / \partial b_i \partial b_i^T \big|_{b_i = \hat{b}_i}$.
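To make (1.5) concrete, the scalar case ($m_0 = 1$) can be sketched in a few lines. This is an illustrative sketch, not the implementation used in the thesis: the function `laplace_integral`, the Newton search for the mode, and the finite-difference derivatives are assumptions made here, and `h` plays the role of $N_i l_i(b_i)$.

```python
import math


def laplace_integral(h, b_start, step=1e-4, max_iter=50):
    """Laplace approximation to the integral of exp(h(b)) over the real line (scalar b).

    The mode b_hat of h is located by Newton's method with finite-difference
    derivatives; the integral is approximated by
    sqrt(2*pi / -h''(b_hat)) * exp(h(b_hat)).
    """

    def first_derivative(b):
        return (h(b + step) - h(b - step)) / (2.0 * step)

    def second_derivative(b):
        return (h(b + step) - 2.0 * h(b) + h(b - step)) / step ** 2

    b_hat = b_start
    for _ in range(max_iter):
        curvature = second_derivative(b_hat)
        if curvature >= 0.0:
            raise ValueError("h is not locally concave; Newton step undefined")
        update = first_derivative(b_hat) / curvature
        b_hat -= update
        if abs(update) < 1e-10:
            break

    curvature = second_derivative(b_hat)
    value = math.sqrt(2.0 * math.pi / -curvature) * math.exp(h(b_hat))
    return value, b_hat


def log_integrand(b):
    """Log of an unnormalized Gaussian kernel with mean 1 and variance 0.25."""
    return -((b - 1.0) ** 2) / (2.0 * 0.25)


approx, mode = laplace_integral(log_integrand, b_start=0.0)
```

For a Gaussian integrand such as `log_integrand` the approximation reproduces the exact value $\sqrt{2\pi \cdot 0.25}$ up to rounding; for a non-Gaussian integrand it is only as good as the quadratic approximation of `h` around its mode.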
The above approximation in (1.5) is called the "corrected" Laplace's (laplace)
approximation by Hartford (2001) because the exact second derivative of the integrand
for the ith subject is taken with respect to the random effect $b_i$. Another version
of Laplace's approximation to the likelihood for the ith subject which Hartford calls
Bates (1995) by considering a second-order Taylor expansion of $q$ in (1.4) around $\hat{b}_i$:
$$q(\beta, D, \Lambda_i, y_i, b_i) \approx q(\beta, D, \Lambda_i, y_i, \hat{b}_i)
+ \frac{1}{2} [b_i - \hat{b}_i]^T \left[ \frac{\partial^2 q(\beta, D, \Lambda_i, y_i, b_i)}{\partial b_i \partial b_i^T} \right] \Big|_{b_i = \hat{b}_i} [b_i - \hat{b}_i]$$
where the linear term vanishes because $\partial q(\beta, D, \Lambda_i, y_i, b_i) / \partial b_i \big|_{b_i = \hat{b}_i} = 0$. For
simplicity, denote $q(\beta, D, \Lambda_i, y_i, b_i)$ by $q_i(\theta, b_i)$.
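Consider an approximation to $\partial^2 q_i(\theta, b_i) / \partial b_i \partial b_i^T$ at $b_i = \hat{b}_i$:
$$\frac{\partial^2 q_i(\theta, b_i)}{\partial b_i \partial b_i^T} \Big|_{b_i = \hat{b}_i}
= -\frac{\partial^2 f_i(\beta, b_i)}{\partial b_i \partial b_i^T} \Big|_{b_i = \hat{b}_i} [y_i - f_i(\beta, \hat{b}_i)]
+ \left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right]^T \left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right] \Big|_{b_i = \hat{b}_i} + D^{-1}
\approx \left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right]^T \left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right] \Big|_{b_i = \hat{b}_i} + D^{-1}
= Q_i$$
The second-to-last expression is obtained by ignoring the term
$\frac{\partial^2 f_i(\beta, b_i)}{\partial b_i \partial b_i^T} \big|_{b_i = \hat{b}_i} [y_i - f_i(\beta, \hat{b}_i)]$, since at $\hat{b}_i$
its contribution is usually negligible compared to that of
$\left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right]^T \left[ \frac{\partial f_i(\beta, b_i)}{\partial b_i^T} \right] \big|_{b_i = \hat{b}_i}$.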
Hartford (2000) calls this version "ulaplace" because the integrand is
approximated by a second-order Taylor expansion instead of the exact second derivative on
$b_i$ for the ith subject. The ulaplace approximation has the advantage of requiring
only the first-order partial derivatives of the model function with respect to the
random effects, which are usually available from the estimation of $\hat{b}_i$
appropriate when curvature is small. Therefore, according to (1.4),
$$q(\beta, D, \Lambda_i, y_i, b_i) \approx q(\beta, D, \Lambda_i, y_i, \hat{b}_i) + \frac{1}{2} [b_i - \hat{b}_i]^T Q_i [b_i - \hat{b}_i]
= (y_i - f_i(\beta, \hat{b}_i))^T \Lambda_i^{-1} (y_i - f_i(\beta, \hat{b}_i)) + \hat{b}_i^T D^{-1} \hat{b}_i + \frac{1}{2} [b_i - \hat{b}_i]^T Q_i [b_i - \hat{b}_i]$$
The resulting approximation for the likelihood using Laplace's approximation is
$$L(y \mid \theta) \approx \prod_{i=1}^{N} (2\pi)^{-N_i/2} |D|^{-1/2} |\Lambda_i|^{-1/2} |Q_i|^{-1/2}
\exp\{-q(\beta, D, \Lambda_i, y_i, \hat{b}_i)/2\}$$
Importance Sampling
Importance Sampling is a numerical integration technique that takes advantage of
the fact that any integral can be thought of as an expectation. Consider the problem
of estimating the multiple integral
$$I = \int v_1(x) \, dx, \qquad x \in R^n \tag{1.6}$$
We suppose that $v_1 \in L_2(x)$. The basic idea of this technique consists of concentrating
the distribution of the sample points in the parts of the region of $R^n$ that are most
"important" instead of spreading them out evenly. We can represent the integral
(1.6) by
$$I = \int \frac{v_1(x)}{v_X(x)} v_X(x) \, dx = E\left( \frac{v_1(X)}{v_X(X)} \right) \tag{1.7}$$
where the expectation $E(\cdot)$ is taken with respect to $v_X(x)$. Here $X$ is any random
vector with p.d.f. $v_X(x)$, such that $v_X(x) > 0$ for each $x \in R^n$;
$v_X(x)$ is called the Importance Sampling density. It is obvious from (1.7) that
$u = v_1(X)/v_X(X)$ is an unbiased estimator of $I$, with the variance
$$\mathrm{var}(u) = \int \frac{v_1^2(x)}{v_X(x)} \, dx - I^2.$$
In order to estimate the integral (1.7) we take a sample $X_1, \ldots, X_n$ from the p.d.f.
$v_X(x)$ and substitute its values in the sample-mean formula
$$\hat{I} = \frac{1}{n} \sum_{i=1}^{n} \frac{v_1(x_i)}{v_X(x_i)}.$$
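The sample-mean formula can be tried on an integral whose value is known. The sketch below is illustrative only: it estimates $\int \exp(-x^2/2)\,dx = \sqrt{2\pi}$ using a $N(0, 1.5^2)$ importance density. The function names and the choice of density are assumptions of this example, not part of the thesis.

```python
import math
import random


def importance_sampling(v1, draw, density, n, rng):
    """Estimate the integral of v1 by averaging v1(x) / v_X(x) over n draws from v_X."""
    total = 0.0
    for _ in range(n):
        x = draw(rng)
        total += v1(x) / density(x)
    return total / n


PROPOSAL_SD = 1.5  # a proposal wider than the integrand keeps the weights bounded


def proposal_density(x):
    """Density of N(0, PROPOSAL_SD^2), playing the role of v_X."""
    return math.exp(-0.5 * (x / PROPOSAL_SD) ** 2) / (PROPOSAL_SD * math.sqrt(2.0 * math.pi))


def integrand(x):
    """The function v_1 to be integrated; its exact integral is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)


estimate = importance_sampling(
    integrand,
    lambda rng: rng.gauss(0.0, PROPOSAL_SD),
    proposal_density,
    n=20000,
    rng=random.Random(0),
)
```

The estimate varies with the random seed; its variance is the expression for $\mathrm{var}(u)$ above divided by the number of draws, so a density shaped more like the integrand gives a tighter estimate for the same effort.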
Importance Sampling provides a simple and efficient way of performing Monte Carlo
integration. The critical step for the success of this method is the choice of an
importance distribution from which the sample is drawn and importance weights are
calculated. Ideally this distribution corresponds to the density that we are trying to
integrate, but in practice one uses an easily sampled approximation. In the NLMM case,
the likelihood in (1.2) is an integral over the random effects, say,
$$L(y \mid \theta) = \prod_{i=1}^{N} L_i(y_i \mid \theta) = \prod_{i=1}^{N} \int p(y_i \mid b_i, x_i, \theta) \, \pi(b_i) \, db_i$$
In order to estimate the above integral, we would like to use Importance Sampling to
find the distribution for the random effects $b_i$ for the ith subject. Since the random
parameters $b_i$ are assumed to be normally distributed, if the maximum likelihood
estimate of $b_i$, say $\hat{b}_i$, and the estimate of the Hessian matrix $H_i$ are computed from the
integrand $p(y_i \mid b_i, \theta) \pi(b_i)$, then the Importance Sampling distribution can be viewed
as $N(\hat{b}_i, H_i^{-1})$. In other words, the density of $N(\hat{b}_i, H_i^{-1})$ should have a shape
similar to that of the integrand $p(y_i \mid b_i, \theta) \pi(b_i)$
the function that we want to integrate is, up to a multiplicative constant, equal to
$\exp(-q(\beta, D, \Lambda_i, y_i, b_i)/2)$. As shown in the "uncorrected" Laplace approximation,
the Hessian matrix $H_i$ can be approximated by a second-order Taylor expansion
on $b_i$, say, $Q_i$, and the integrand is, up to a multiplicative constant, approximately
equal to a $N(\hat{b}_i, Q_i^{-1})$ density. This gives us a natural choice for the importance
distribution. Letting $I(b_i)$ denote this Importance Sampling density and $N_{IS}$ denote
the number of importance samples to be drawn, in practice one such sample can be
generated by selecting a vector $z_k^*$ with distribution $N(0, I_m)$ and calculating the
sample of random effects as $b_{ik}^* = \hat{b}_i + Q_i^{-1/2} z_k^*$, $k = 1, \ldots, N_{IS}$. Then the likelihood
of $y$ approximated by Importance Sampling is
$$L(y \mid \theta) = \prod_{i=1}^{N} L_i(y_i \mid \theta) = \prod_{i=1}^{N} \int p(y_i \mid b_i, \theta) \, \pi(b_i) \, db_i$$
$$= \prod_{i=1}^{N} \int \frac{1}{(2\pi)^{(N_i + m_0)/2} |D|^{1/2} |\Lambda_i|^{1/2}}
\exp\left[ -\frac{1}{2} \{y_i - f_i(\beta, b_i)\}^T \Lambda_i^{-1} \{y_i - f_i(\beta, b_i)\} - \frac{1}{2} b_i^T D^{-1} b_i \right] db_i$$
$$= \prod_{i=1}^{N} \int \frac{1}{(2\pi)^{N_i/2} |D|^{1/2} |Q_i|^{1/2} |\Lambda_i|^{1/2}}
\exp\left[ -\frac{1}{2} \{y_i - f_i(\beta, b_i)\}^T \Lambda_i^{-1} \{y_i - f_i(\beta, b_i)\} - \frac{1}{2} b_i^T D^{-1} b_i + \frac{1}{2} (b_i - \hat{b}_i)^T Q_i (b_i - \hat{b}_i) \right] I(b_i) \, db_i$$
$$\approx \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \exp\Big\{ -\frac{1}{2} (y_i \cdots$$
Similar to the Laplace approximation method, there are two versions of the Importance
Sampling approximation (Hartford, 2000). The above method implements the idea
of centering and scaling and is called the "uncorrected" Importance Sampling method
(imp) because the Hessian matrix $H_i$ on $b_i$ is approximated by $Q_i$, a second-order Taylor
expansion of the Hessian matrix $H_i$ on $b_i$. The corrected version (uimp) replaces $Q_i$
with the Hessian matrix $H_i$.
Gaussian Quadrature
Gaussian Quadrature is used to approximate integrals of functions with respect
to a given kernel by a weighted average of the integrand evaluated at predetermined
abscissas. The weights and abscissas used in Gaussian Quadrature rules for the most
common kernels can be obtained from the tables of Abramowitz and Stegun (1964,
Chapter 25) or by using an algorithm proposed by Golub (1973); a further description
is given in Monahan (2001, Chapter 10) or Davis and Rabinowitz (1984). The idea
is similar to Importance Sampling. In the univariate case, suppose $\int h(x) \, dx$ is to be
numerically integrated, where $h(x)$ can be written as $h(x) = f(x) w(x)$ and $w(x)$ is
the weight function. The integral can be approximated by
$$\int h(x) \, dx = \int f(x) w(x) \, dx \approx \sum_{i=1}^{N_{GQ}} \omega_i f(x_i)$$
where $\omega_i$ are weights, $x_i$ are abscissas, and $N_{GQ}$ is the number of the quadrature
in the nonlinear mixed-effects model we can transform the problem into successive
applications of simple one-dimensional Gaussian Quadrature rules. Letting $z_j^*$, $\omega_j$,
$j = 1, \ldots, N_{GQ}$ denote respectively the abscissas and weights for the (one-dimensional)
Gaussian Quadrature rule with $N_{GQ}$ points based on the $N(0, 1)$ kernel, we have
$$L(y \mid \theta) = \prod_{i=1}^{N} \int (2\pi)^{-(N_i + m_0)/2} |D|^{-1/2} |\Lambda_i|^{-1/2}
\exp\left\{ -\frac{1}{2} (y_i - f_i(\beta, b_i))^T \Lambda_i^{-1} (y_i - f_i(\beta, b_i)) \right\}
\exp(-b_i^T D^{-1} b_i / 2) \, db_i$$
$$= \prod_{i=1}^{N} \int (2\pi)^{-N_i/2} |\Lambda_i|^{-1/2}
\exp\left\{ -\frac{1}{2} (y_i - f_i(\beta, D^{1/2} z^*))^T \Lambda_i^{-1} (y_i - f_i(\beta, D^{1/2} z^*)) \right\}
\exp(-z^{*T} z^* / 2) \, dz^* \qquad \text{by scaling } b_i = D^{1/2} z^*$$
$$\approx \prod_{i=1}^{N} (2\pi)^{-N_i/2} |\Lambda_i|^{-1/2} \sum_{j_1 = 1}^{N_{GQ}} \cdots \sum_{j_q = 1}^{N_{GQ}} \prod_{k=1}^{q} \omega_{j_k}
\exp\left[ -\frac{1}{2} (y_i - f_i(\beta, D^{1/2} z^*_{j_1, \ldots, j_q}))^T \Lambda_i^{-1} (y_i - f_i(\beta, D^{1/2} z^*_{j_1, \ldots, j_q})) \right] \tag{1.9}$$
where $z^*_{j_1, \ldots, j_q} = (z^*_{j_1}, \ldots, z^*_{j_q})^T$.
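The product rule in (1.9) can be illustrated with a tiny fixed rule. The sketch below assumes the standard three-point Gaussian Quadrature rule for the $N(0, 1)$ kernel (abscissas $0, \pm\sqrt{3}$ with weights $2/3, 1/6, 1/6$); these values and the function names are introduced here for illustration and are not taken from the thesis.

```python
import itertools
import math

# Three-point rule for the N(0, 1) kernel (assumed here for illustration).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gauss_hermite_expectation(g, q):
    """Approximate E[g(z)] for z ~ N(0, I_q) by the q-fold product rule.

    Every combination (j_1, ..., j_q) of one-dimensional abscissas is visited,
    and its weight is the product of the one-dimensional weights.
    """
    total = 0.0
    for index in itertools.product(range(len(NODES)), repeat=q):
        z = [NODES[j] for j in index]
        weight = math.prod(WEIGHTS[j] for j in index)
        total += weight * g(z)
    return total


def grid_size(n_gq, q):
    """Number of integrand evaluations needed by the product rule."""
    return n_gq ** q


second_moment = gauss_hermite_expectation(lambda z: z[0] ** 2, q=1)
cross_moment = gauss_hermite_expectation(lambda z: z[0] ** 2 * z[1] ** 2, q=2)
evaluations = grid_size(10, 3)
```

The nested loop over all index combinations is what drives the cost: the number of integrand evaluations grows as $N_{GQ}^q$ per subject.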
The Gaussian Quadrature rule in this case can be viewed as a deterministic version of
Monte Carlo integration in which random samples of $b_i$ are generated from the $N(0, D)$
distribution. The samples $z_j^*$ and the weights $\omega_j$ are fixed beforehand, but in Monte
Carlo integration they vary randomly. Because Importance Sampling tends to be
much more efficient than simple Monte Carlo integration, we also consider the
equivalent of Importance Sampling in the Gaussian Quadrature context, which is denoted
by Adaptive Gaussian Quadrature (Pinheiro and Bates 1995). In this approach the
grid of abscissas in the $b_i$ scale is centered around the conditional modes $\hat{b}_i$ rather than
of $z^*$. The Adaptive Gaussian Quadrature approximation very closely resembles
Importance Sampling. The basic difference is that the former uses fixed abscissas and
weights, but the latter allows them to be determined by a pseudo-random mechanism.
It is also interesting to note that Laplace is one-point Adaptive Gaussian Quadrature because
in this case $z_1^* = 0$ and $\omega_1 = 1$. The Adaptive Gaussian Quadrature also gives the
exact likelihood when the model function is linear in $b$, but that is not true in general
for the Gaussian Quadrature approximation (1.9).
The methods FOL, Laplace, Importance Sampling approximation, and Adaptive
Gaussian Quadrature give exact results when the model function $f$ is linear in $b$.
1.1.1 Comparing approximation methods
Pinheiro and Bates (1995) construct a simulation experiment to compare the
performance of methods including FOL, Laplace's approximation, Importance Sampling,
Gaussian Quadrature, and the new version of Adaptive Gaussian Quadrature for
estimating the parameters of nonlinear mixed-effects models. They conclude that the FOL
approximation gives accurate and reliable estimation results for the log-likelihood
function in some nonlinear mixed-effects models such as the one proposed by Lindstrom
and Bates (1990). The main advantage of this approximation is its computational
efficiency. In other cases, however, they find the FOL method not to be accurate. They
conclude that, because it is simpler computationally, it could be used for starting values
Adaptive Gaussian Quadrature methods give the best mix of efficiency and accuracy.
The drawback of the Laplace approximation is that it is just a one-point estimation.
In other words, we cannot improve the accuracy with more numerical effort since only
one abscissa is used for each individual. Quadrature rules rely on a deterministic
approximation to an integral as a weighted sum of the integrand evaluated at a specially
chosen set of values, or abscissas, where the weights are also specially chosen. The
approximation thus requires the integrand to be evaluated at each abscissa, and then
these values are weighted and summed. The accuracy of the rule for approximating
the true value of the integral is predicated on the number of abscissas, $N_{GQ}$, which
may be chosen by the user. The more abscissas we have, the better the
approximation will be. For $m = 1$ dimensional integrals, it is not too computationally
burdensome to carry out such numerical integration, as the abscissas need only be
chosen in one dimension. The approximation often works well for $N_{GQ}$ as small as 5 or
10 (Pinheiro and Bates 1995, Hartford 2000). However, for $m > 1$, abscissas must be
chosen in each dimension, and the integrand must be evaluated at each combination;
for example, for $m = 3$ and $N_{GQ} = 10$, there are $10^3 = 1000$ function evaluations to
perform. Thus, for larger $m$, the computational challenge increases greatly. Besides,
in the context of maximizing the likelihood $\prod_{i=1}^{N} \int p(y_i \mid b_i, \Lambda_i, \beta, D) \, \pi(b_i) \, db_i$, some sort
of optimization scheme, such as a Newton-Raphson iterative approach, would have to
be invoked to update the parameter estimates. This would of course require that
turn would require $n$ $m$-dimensional integrals to be evaluated at the current iterate
for the parameters. It should be clear that the computational burden could become
overwhelming. If one reduced $N_{GQ}$ to address this burden, accuracy of the integration
(and hence accuracy of the evaluated likelihood) would be compromised. This
problem, wherein computational challenges become overwhelming in higher dimensions,
is often called the curse of dimensionality. Moreover, the Gaussian Quadrature
approximation will perform poorly if the random effects are not close to normal.
The Importance Sampling approximation gives reliable estimation results, comparable to those of the
Gaussian Quadrature and Laplace approximations. The main advantage of the
Importance Sampling approximation is its versatility in handling distributions other than
normal, for both the random effects and the subject-specific error term. For
example, it would be rather straightforward to adapt the Importance Sampling integration
to handle a multivariate t or logistic distribution for the random effects, but that
would not be a trivial task for the FOL, Laplace, or Gaussian Quadrature
approximations. Also, by using the Importance Sampling approximation, the likelihood
can be approximated to any specified level of accuracy by increasing the number of
importance samples. In contrast to the Gaussian Quadrature method, with the abscissas
being determined by a quadrature rule, the methods based on Importance Sampling
are random. This randomness leads to random input for an optimization
algorithm, which results in different values of the likelihood when re-evaluated at the same
one would expect the result for the approximation to the likelihood to be slightly
different. This randomness of the Importance Sampling method causes some numerical
difficulties for the optimization algorithm used to obtain the maximum likelihood
estimates, because the stochastic variability associated with different importance samples
overwhelms the numerical variability of the likelihood for small changes in the
parameter values (used to calculate numerical derivatives). To avoid this problem, the
same importance samples are used throughout the calculations (Pinheiro and Bates,
1995). In other words, only one importance sample is used throughout the entire
search, and thus it requires a large number of abscissas, which turns out to be very
computationally inefficient. Hence, the Importance Sampling approximation is
considerably less computationally efficient than the FOL, Laplace, or Gaussian Quadrature
approximations. Nonetheless, the Importance Sampling approximation is the most promising
for high accuracy among all the numerical methods we discussed above. The reason
is that it not only can handle distributions other than normal but also can improve
the accuracy of the estimation by putting in more numerical effort, provided that it is
affordable to the algorithm.
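The effect of reusing one importance sample throughout the search can be seen on a toy problem. The sketch below is an illustration under simplifying assumptions, not the thesis's procedure: it uses a single observation with $y \mid b \sim N(\exp(\theta + b), 1)$ and $b \sim N(0, 1)$, and it takes the prior of $b$ as the importance density instead of the $N(\hat{b}_i, Q_i^{-1})$ choice described earlier. All names and values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_is(theta, y, draws):
    """Monte Carlo estimate of L(theta) = integral of p(y | b, theta) pi(b) db.

    `draws` are samples of b from its N(0, 1) prior, which serves as the
    importance density in this simplified sketch.
    """
    return sum(normal_pdf(y, math.exp(theta + b)) for b in draws) / len(draws)


rng = random.Random(42)
fixed_draws = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Re-evaluating with the same draws gives exactly the same value ...
first = likelihood_is(0.3, 1.2, fixed_draws)
second = likelihood_is(0.3, 1.2, fixed_draws)

# ... while a fresh sample gives a slightly different value at the same theta.
fresh = likelihood_is(0.3, 1.2, [rng.gauss(0.0, 1.0) for _ in range(500)])
```

With `fixed_draws` held constant the approximate likelihood is a deterministic function of `theta`, so finite-difference derivatives behave sensibly; with fresh draws at every call, the Monte Carlo noise can exceed the change caused by a small step in `theta`.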
Optimization Algorithms
An iterative algorithm for updating the parameter estimates requires:
(1) starting values or estimates (see later discussion on how to get starting values and
(2) a method for updating from iteration $t$ to iteration $t + 1$,
(3) a rule for deciding when to stop.
Basically, most optimizations for finding an extreme point of the function $f(\theta)$,
denoted by $\hat{\theta}$, are variations on Newton's method
$$\theta_{t+1} = \theta_t - a_t H^{-1}(\theta_t) \nabla f(\theta_t)$$
where $a_t$ is called the step size or gain in the tth iterative step, and $\nabla f$ and $H$ are the
gradient and Hessian matrix of the objective function $f$, respectively. Specifically,
note that:
if $H = I$ and $a_t = a$ is a constant, then we have the method of steepest descent,
if $a_t = 1$, then we have the "pure" Newton method,
in cases where the objective function is observed with added noise, stochastic
approximation (SA) may be appropriate. A general SA algorithm has the form
$$\theta_{t+1} = \theta_t - a_t g(\theta_t) \tag{1.10}$$
where $g(\theta_t)$ is the gradient function observed with error.
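The special cases of the update rule can be compared on a scalar quadratic. This is a minimal sketch under stated assumptions: the objective $f(\theta) = (\theta - 3)^2$, the helper name `iterate`, and the gain values are chosen here for illustration only.

```python
def iterate(theta, gradient, inverse_hessian, gain, n_steps):
    """Apply theta <- theta - a_t * H^{-1}(theta) * grad f(theta) for scalar theta."""
    for t in range(n_steps):
        theta = theta - gain(t) * inverse_hessian(theta) * gradient(theta)
    return theta


def gradient(theta):
    """Gradient of the illustrative objective f(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


# "Pure" Newton: a_t = 1 and the exact inverse Hessian 1/2.
newton = iterate(0.0, gradient, lambda th: 0.5, lambda t: 1.0, n_steps=1)

# Steepest descent: H = I and a constant gain a = 0.1.
steepest = iterate(0.0, gradient, lambda th: 1.0, lambda t: 0.1, n_steps=50)
```

On a quadratic objective the Newton step reaches the minimizer in one iteration, while steepest descent with a constant gain approaches it geometrically. Neither variant copes with a gradient observed with error, which is the setting of the SA form (1.10).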
Since analytic methods are often too costly in human or computer effort, some
gradient approximation schemes are usually used, such as the finite difference (FD).
In the case of SA, Spall (1992) introduces the simultaneous perturbation (SP) for the
gradient approximation. It turns out that SP is an efficient approximation because
Finite Difference (FD)
In the case of SA, the direct measurements or evaluations of the gradient $\nabla f(\theta_t)$
are usually not available if the objective function is very complicated. One major
approximation to the gradient is Finite Difference (FD). The FD formula for
approximating the gradient of an objective $f$ for the mth parameter, say $\theta_m$, is
$$\nabla f_m(\theta_m) = \frac{f(\theta + c_m \delta_m) - f(\theta - c_m \delta_m)}{2 c_m}, \qquad m = 1, \ldots, p$$
where $p$ is the dimension of the parameter vector $\theta$, or the number of the parameters
to be estimated, $c_m$ is a small positive scalar, and $\delta_m$ is a vector with a one in the mth
place and zeros elsewhere. Each component of $\theta$ is perturbed one at a time, and each
component of the gradient estimate is formed by the ratio of two differences. This
method is motivated directly from the definition of a gradient as a vector of $p$ partial
derivatives, each constructed as the limit of the ratio of a change in the function value
over a corresponding change in one component of the argument vector.
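The FD formula translates directly into code. The following sketch is illustrative: it uses the same perturbation size $c$ for every component, and the objective and the evaluation counter are hypothetical, added only to show that $2p$ function evaluations are needed.

```python
def fd_gradient(f, theta, c=1e-5):
    """Central finite-difference gradient: one pair of evaluations per component."""
    gradient = []
    for m in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[m] += c   # theta + c * delta_m
        minus[m] -= c  # theta - c * delta_m
        gradient.append((f(plus) - f(minus)) / (2.0 * c))
    return gradient


calls = {"n": 0}


def objective(theta):
    """Illustrative objective; the counter records how often it is evaluated."""
    calls["n"] += 1
    return theta[0] ** 2 + 3.0 * theta[1]


grad = fd_gradient(objective, [1.0, 2.0])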
Simultaneous Perturbation Difference (SP)
Spall (1992) provides one form for the gradient approximation using simultaneous
perturbation in the SA case. The simultaneous perturbation difference approximation
(SP) has all elements of $\theta$ randomly perturbed together, but each component of
the gradient is formed from a ratio involving the individual components in the
approximation to the gradient function $g(\theta)$ has the following form
$$g(\theta) = \Delta \, \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c} \tag{1.11}$$
where $\Delta$ is a vector composed of mutually independent mean-zero random
variables $\{\Delta_1, \Delta_2, \ldots, \Delta_p\}$ with $\Delta_m$ independent of $\theta_0, \theta_1, \ldots, \theta_p$; $m = 1, \ldots, p$. The $\Delta_m$
components are chosen randomly according to the conditions. The two common
choices of $\Delta$ are:
1) $\Delta_m$ are i.i.d. $\pm 1$ with equal probability (symmetric Bernoulli distribution).
2) $\Delta$ is uniformly distributed on the unit sphere.
Therefore, the mth element of the gradient approximation by SP is
$$g_m(\theta) = \Delta_m \, \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c}$$
Note that this estimate differs from the usual FD gradient approximation since the
numerator is the same for all elements of the vector, and thus the whole
gradient approximation requires only two evaluations of $f$, instead of the $2p$ evaluations
required by a general FD approximation. The name "simultaneous perturbation" as
applied to (1.11) arises from the fact that all elements of the vector $\theta$ are being varied
1.2 A numerical approach for the parameter estimation of NLMM
Among all the approximations for the likelihood, Importance Sampling is the most
promising for high accuracy because the precision can be improved by putting in more
numerical work or abscissas. Of all optimization techniques for updating the
parameter estimates, the stochastic recursive procedure can be regarded as an economic
and efficient updating algorithm because it avoids the calculation of the Hessian
matrix. The key quantity in the stochastic updating formula is the approximation for the
gradient of the objective function. It is shown that SP is a very efficient gradient
approximation method because it saves many evaluations of the objective function
at each iteration and the noise does not present an obstacle to convergence. We
propose here an optimization approach for the parameter estimation for NLMM which
approximates the likelihood using Importance Sampling and updates the parameter
estimates using the SA algorithm with SP gradient approximation.
As discussed in 1.2, the gradient $\nabla f(\theta)$ of the objective function $f(\theta)$ can be
approximated by the SP method as $g(\theta)$ in (1.11)
$$g(\theta) = \Delta \, \frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c}$$
thus the SA optimization with SP gradient approximation has the form
$$\theta_{t+1} = \theta_t - \Delta_t a_t \frac{f(\theta_t + c_t \Delta_t) - f(\theta_t - c_t \Delta_t)}{2 c_t} \tag{1.12}$$
and the mth element of the parameter, say $\theta_m$, can be updated in the tth iterative
step by
$$\theta_{(t+1)m} = \theta_{tm} - \Delta_{tm} a_t \frac{f(\theta_t + c_t \Delta_t) - f(\theta_t - c_t \Delta_t)}{2 c_t} \tag{1.13}$$
In the case of NLMM, our objective function is the likelihood $L(\theta)$. Therefore,
substituting $f$ with $L$ in the formulae (1.12) and (1.13) gives us the recursive algorithm
for updating the parameter estimates of NLMM, that is
$$\theta_{t+1} = \theta_t - \Delta_t a_t \frac{L(\theta_t + c_t \Delta_t) - L(\theta_t - c_t \Delta_t)}{2 c_t} \tag{1.14}$$
or
$$\theta_{(t+1)m} = \theta_{tm} - \Delta_{tm} a_t \frac{L(\theta_t + c_t \Delta_t) - L(\theta_t - c_t \Delta_t)}{2 c_t}. \tag{1.15}$$
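The recursion can be sketched as a short loop. The code below is a hedged illustration, not the SPSAIS program of Appendix B: it minimizes a generic loss (for likelihood maximization one might pass a negative log-likelihood), it uses symmetric Bernoulli $\pm 1$ perturbations, and the decaying gain sequences $a_t = a/(t+1)^{0.602}$ and $c_t = c/(t+1)^{0.101}$ are a common choice in the SPSA literature rather than values stated in this chapter. All names and defaults are assumptions.

```python
import random


def spsa_minimize(loss, theta, a=0.1, c=0.1, alpha=0.602, gamma=0.101,
                  n_iter=2000, rng=None):
    """Stochastic approximation with a simultaneous perturbation gradient.

    Each iteration uses only two evaluations of `loss`, whatever the
    dimension of theta.
    """
    rng = rng or random.Random(0)
    theta = list(theta)
    for t in range(n_iter):
        a_t = a / (t + 1) ** alpha          # step size (gain) at iteration t
        c_t = c / (t + 1) ** gamma          # perturbation size at iteration t
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        difference = (loss(plus) - loss(minus)) / (2.0 * c_t)
        # For +/-1 perturbations, multiplying by delta_m equals dividing by it.
        theta = [th - a_t * d * difference for th, d in zip(theta, delta)]
    return theta


def quadratic_loss(theta):
    """Illustrative noise-free loss with minimizer (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


solution = spsa_minimize(quadratic_loss, [0.0, 0.0])
```

In practice the loss would be a noisy evaluation, such as an Importance Sampling approximation of the likelihood, and the gains `a` and `c` would need tuning to the scale of the parameters.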
However, since there is no closed-form expression for the likelihood function in NLMM,
or there are no direct measurements available for $L$, the task for the parameter
estimates of NLMM is made very difficult. We will discuss how we overcome this difficulty
in Chapter 2. To investigate the performance of our optimization approach, we apply
this approach to a simple nonlinear mixed-effects model for fitting a real data set and
make comparisons among different optimizations including the proposed approach
and the existing ones.
The optimization for the parameter estimates of NLMM has been implemented
in the software package SAS proc nlmixed and the Splus or S function nlme. SAS proc
Quadrature, Adaptive Gaussian Quadrature, and Importance Sampling, the principal
ones being Adaptive Gauss quadrature and a first-order Taylor series approximation.
Laplace approximation is an option; it is equivalent to Adaptive Gaussian Quadrature
with only one quadrature point. There are several iterative techniques for updating
the parameter estimates available in proc nlmixed that work well in various
circumstances. They include the Trust Region method (TRUREG), Newton-Raphson method
with Line Search (NEWRAP), Newton-Raphson method with Ridging (NRRIDG),
Quasi-Newton methods (QUANEW), Double-Dogleg method (DBLDOG), Conjugate
Gradient methods (CONGRA), and Nelder-Mead Simplex method (NMSIMP). Some
of these optimization techniques require the calculation of the gradient or Hessian
matrix and some do not. The finite difference method (FD) is the main method for
the gradient and Hessian approximation required by the recursive procedures
implemented in proc nlmixed. Since each of the optimizers requires different derivatives,
some computational efficiencies can be gained. Each optimization method employs
one or more convergence criteria that determine when it has converged. Details
on these optimization techniques are available in the chapter "Nonlinear Mixed Effects
Model" of the SAS Online Manual (version 8.2). Throughout this article, we use the
optimization method proc nlmixed to compare with the proposed one. For convenience,
we name the optimization algorithm of proc nlmixed using Importance Sampling,
FOL, GQ, and Laplace for the likelihood approximation as nlmix(IS), nlmix(FOL),
Chapter 2
Stochastic Approximation
Stochastic approximation (SA) is used for the maximization problem, that is, for maximizing an objective function $f$ of the parameter $\theta$. For convenience, we replace the notation $f$ with $L$, the likelihood, which is the objective function of interest throughout this article. Other common names for $L$ are performance measure, measure-of-effectiveness, fitness function, or criterion (Spall, 2004). Spall (2004) summarizes that stochastic optimization algorithms apply where
(I) there is random noise in the measurements of $L(\theta)$, say, $L(\theta) + \varepsilon(\theta)$, where $\varepsilon$ represents the random noise terms with $E(\varepsilon(\theta)) = 0$;
and/or
(II) there is a random (Monte Carlo) choice made in the search direction as the algorithm iterates toward a solution.
2.1 RMSA
Stochastic approximation (SA) is a cornerstone of stochastic optimization. Robbins and Monro (1951) first introduced SA, which is called RMSA, as a general root-finding method in the univariate case when only noisy measurements of the underlying function are available. In other words, SA was first designed to find the $\theta^{\star}$ that solves the one-dimensional equation
\[ L(\theta) = 0 \]
with the noisy measurements, denoted by $L^{\star}$,
\[ L^{\star}_t = L_t + \varepsilon_t, \quad \text{or} \quad L^{\star}(\theta_t) = L(\theta_t) + \varepsilon(\theta_t), \qquad t = 1, \ldots, T, \]
where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t$ are i.i.d. random variables with mean zero and variance $\sigma^2$. Robbins and Monro (1951) define a (nonstationary) Markov chain $\{\theta_t\}$ by taking $\theta_1$ to be an arbitrary constant and defining
\[ \theta_{t+1} = \theta_t - a_t L^{\star}(\theta_t) \]
where $\{a_t\}$ is a gain sequence of positive real numbers satisfying the well-known R-M conditions
\[ a_t \to 0, \qquad \sum_t a_t = \infty, \qquad \sum_t a_t^2 < \infty. \]
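The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test function $L(\theta) = 2(\theta - 1)$, the N(0, 1) noise, the gain $a_t = a_0/t$, and the names `rmsa` and `noisy_l` are hypothetical choices, not taken from the original work.

```python
import random


def rmsa(noisy_l, theta1, a0=1.0, n_iter=5000):
    """Robbins-Monro recursion: theta_{t+1} = theta_t - a_t * L*(theta_t)."""
    theta = theta1
    for t in range(1, n_iter + 1):
        # a_t = a0 / t satisfies a_t -> 0, sum a_t = inf, sum a_t^2 < inf.
        a_t = a0 / t
        theta = theta - a_t * noisy_l(theta)
    return theta


rng = random.Random(0)


def noisy_l(theta):
    # Hypothetical example: L(theta) = 2 * (theta - 1) has its root at 1
    # and is observed with additive N(0, 1) noise.
    return 2.0 * (theta - 1.0) + rng.gauss(0.0, 1.0)


theta_hat = rmsa(noisy_l, theta1=5.0)
print(theta_hat)
```

Because the slope of this example function is positive, the minus sign in the recursion pushes $\theta_t$ toward the root; the estimate should settle close to 1 even though every single measurement is noisy.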
Details on how to choose an appropriate gain sequence $\{a_t\}$ in practice for the SA algorithm will be discussed in Chapter 3. In the $p = 1$ dimensional case, the typical choice of the gain is
\[ a_t = \frac{a_0}{t_0 + t}\,\frac{1}{b_t} \]
(Chung, 1954), where $a_0$ and $t_0$ are two integers and $b_t$ is an estimate of the slope. In addition, in the $p = 1$ dimensional case, we need to be careful about the sign in RMSA. The plot in Figure 2.1 shows how the sign in RMSA is determined by the slope of the objective function. Observing $L^{\star}(\theta_t)$ positive, or a positive slope, suggests decreasing $\theta_t$, and observing $L^{\star}(\theta_t)$ negative, or a negative slope, suggests increasing $\theta_t$.

Figure 2.1: The relationship between the sign and the slope in RMSA. [Plot of L(theta) against theta.]

Under certain conditions Robbins and Monro (1951) show that $\{\theta_t\}$ is a consistent estimator of $\theta^{\star}$. In other words, $\theta_t$ converges to the root $\theta^{\star}$ in probability, i.e., $\theta_t \xrightarrow{p} \theta^{\star}$. In some applications it is more convenient to make a group of $r$ observations at each step. The $t$th group of observations will then be
\[ L^{\star}_{(t,1)}, \ldots, L^{\star}_{(t,r)}. \tag{2.1} \]
Let $\bar{L}^{\star}_t$ be the arithmetic mean of the values (2.1); then the RMSA algorithm can be changed into
\[ \theta_{t+1} = \theta_t - a_t \bar{L}^{\star}_t. \]
2.2 KWSA
Beginning with the paper of Robbins and Monro, much work has been done in stochastic approximation. Under the stimulus of Robbins and Monro's method, Kiefer and Wolfowitz (1952) establish an analogous recursive procedure for finding the maximum of a univariate function $L(\theta)$ with the noisy measurements
\[ L^{\star}_t = L_t + \varepsilon_t, \quad \text{or} \quad L^{\star}(\theta_t) = L(\theta_t) + \varepsilon(\theta_t), \qquad t = 1, \ldots, T, \]
where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_t$ are i.i.d. random variables with mean zero and variance $\sigma^2$. They present a scheme whereby, starting from an arbitrary point $\theta_1$, one obtains successively $\theta_2, \theta_3, \ldots$ such that $\theta_t$ converges to $\theta^{\star}$ in probability as $t \to \infty$. The recursive scheme can be described as follows:
\[ \theta_{t+1} = \theta_t - a_t\, \frac{L^{\star}(\theta_t + c_t) - L^{\star}(\theta_t - c_t)}{2c_t} \]
where $\{a_t\}$ and $\{c_t\}$ are gain sequences of positive numbers such that
\[ c_t \to 0, \qquad \sum_t a_t = \infty, \qquad \sum_t a_t c_t < \infty, \qquad \sum_t a_t^2 c_t^{-2} < \infty. \]
Details on how to choose $\{a_t\}$ and $\{c_t\}$ in practice will be discussed in Chapter 3. Under regularity conditions on $L(\theta)$ they prove that $\theta_t$ converges to the maximum $\theta^{\star}$ in probability. The above recursive procedure is called Kiefer-Wolfowitz stochastic approximation (KWSA).
As discussed earlier, maximizing $L(\theta)$ can be converted to finding the root of the gradient function of $L(\theta)$, say, $\partial L(\theta)/\partial\theta = 0$. Therefore, KWSA is in fact a special case of RMSA with noisy measurements of the gradient function $\partial L(\theta)/\partial\theta$ obtained by taking the finite-difference ratio
\[ \frac{L^{\star}(\theta + c) - L^{\star}(\theta - c)}{2c}. \]
In other words, the usual finite difference is used to approximate the gradient function at $\theta$ at each iteration. In contrast to the RMSA procedure, KWSA is a gradient-free optimization since there is no need to postulate the existence of the derivative of $L(\theta)$ (indeed, $L(\theta)$ can be discrete).
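A minimal sketch of the KWSA recursion follows. It implements the update exactly as written above, with the minus sign, so each step moves against the estimated finite-difference slope; for that reason it is demonstrated on a criterion whose stationary point is a minimum (as with a negative log-likelihood). The quadratic criterion, the noise level, the gains $a_t = a_0/t$ and $c_t = c_0/t^{1/3}$, and the function names are all assumptions made for illustration.

```python
import random


def kwsa(noisy_l, theta1, a0=0.5, c0=1.0, n_iter=5000):
    """Kiefer-Wolfowitz recursion with a two-sided finite difference."""
    theta = theta1
    for t in range(1, n_iter + 1):
        a_t = a0 / t
        # c_t -> 0, sum a_t c_t < inf and sum a_t^2 / c_t^2 < inf all hold.
        c_t = c0 / t ** (1.0 / 3.0)
        slope = (noisy_l(theta + c_t) - noisy_l(theta - c_t)) / (2.0 * c_t)
        theta = theta - a_t * slope
    return theta


rng = random.Random(1)


def noisy_l(theta):
    # Hypothetical criterion (theta - 2)^2 observed with N(0, 0.5^2) noise;
    # only function values are used, never derivatives.
    return (theta - 2.0) ** 2 + rng.gauss(0.0, 0.5)


theta_hat = kwsa(noisy_l, theta1=-3.0)
print(theta_hat)
```

Each iteration costs two noisy evaluations of the criterion, which is the univariate version of the $2p$ evaluations needed by the multivariate finite-difference scheme.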
2.3 Generalized RMSA
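Blum (1954) extends the RMSA procedure to a multivariate version of the Robbins-Monro procedure. Now both $0$ and $\theta$ are $p$-dimensional vectors, and a multidimensional RMSA algorithm for finding the root of the function $L(\theta)$ is constructed as
\[ \theta_{t+1} = \theta_t - a_t L^{\star}(\theta_t) \]
and the $m$th variable of the parameter vector $\theta$, say $\theta_m$, can be updated in the $t$th step of an RMSA algorithm by
\[ \theta_{(t+1)m} = \theta_{tm} - a_t L^{\star}_m(\theta_t), \]
where $L^{\star}_m$ denotes the $m$th component of $L^{\star}$.
Generalized KWSA
Correspondingly, a multivariate KWSA for maximizing $L(\theta)$ is
\[ \theta_{(t+1)m} = \theta_{tm} - a_t\, \frac{L^{\star}(\theta_t + c_t\delta_m) - L^{\star}(\theta_t - c_t\delta_m)}{c_t} \tag{2.2} \]
where $\delta_m$ denotes the $m$th unit vector. Under appropriate conditions, the iteration in (2.2) converges to $\theta^{\star}$ almost surely (Blum 1954, Kushner and Clark 1978, Fabian 1971, or Kushner and Yin 1997).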
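The coordinate-wise update (2.2) can be sketched as below. The code follows the recursion as written (minus sign, division by $c_t$), so it is shown on a hypothetical criterion with a minimum; the target vector, noise level, gain sequences, and helper names are assumptions for illustration only. A counter records how many noisy evaluations are spent.

```python
import random


def fdsa(noisy_l, theta1, a0=0.25, c0=1.0, n_iter=2000):
    """Multivariate finite-difference SA: one pair of evaluations per coordinate."""
    theta = list(theta1)
    p = len(theta)
    for t in range(1, n_iter + 1):
        a_t = a0 / t
        c_t = c0 / t ** (1.0 / 3.0)
        new_theta = list(theta)
        for m in range(p):
            plus = list(theta)
            minus = list(theta)
            plus[m] += c_t    # theta_t + c_t * delta_m
            minus[m] -= c_t   # theta_t - c_t * delta_m
            diff = noisy_l(plus) - noisy_l(minus)
            new_theta[m] = theta[m] - a_t * diff / c_t
        theta = new_theta
    return theta


TARGET = [1.0, -2.0, 0.5]          # hypothetical optimum
counter = {"calls": 0}             # number of noisy evaluations used
noise = random.Random(3)


def noisy_l(theta):
    counter["calls"] += 1
    value = sum((th - tg) ** 2 for th, tg in zip(theta, TARGET))
    return value + noise.gauss(0.0, 0.1)


theta_hat = fdsa(noisy_l, [0.0, 0.0, 0.0])
print(theta_hat, counter["calls"])
```

With $p = 3$ parameters and 2000 iterations the sketch spends $2p \times 2000 = 12000$ evaluations, which is the per-iteration cost that the simultaneous perturbation approach is designed to reduce.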
2.4 Simultaneous Perturbation SA (SPSA)
As discussed earlier, for a maximization problem, there are two methods for approximating the gradient function, namely FD and SP. KWSA is in fact an algorithm for maximizing the objective function using the FD approximation for the gradient. According to (1.11), an SP approximation for the gradient is
\[ g(\theta) = \Delta\,\frac{f(\theta + c\Delta) - f(\theta - c\Delta)}{2c} \]
and the parameter vector $\theta$ can be updated at the $t$th iteration by the iterative function
\[ \theta_{t+1} = \theta_t - \Delta_t\, a_t\, \frac{L(\theta_t + c_t\Delta_t) - L(\theta_t - c_t\Delta_t)}{2c_t} \]
where $\Delta_t$ is a vector composed of mutually independent mean-zero random variables $\{\Delta_{t1}, \Delta_{t2}, \ldots, \Delta_{tp}\}$ with $\Delta_{tm}$ independent of $\theta_0, \theta_1, \ldots, \theta_p$; $m = 1, \ldots, p$. The $\Delta_{ti}$ [...]. The details on how to choose the gain sequences $\{a_t\}$ and $\{c_t\}$ in practice for the SPSA algorithm are given in Chapter 3. For convenience, we call the SA algorithm using SP for the gradient approximation simultaneous perturbation SA, denoted by SPSA.
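The SP update above can be sketched as follows. Several choices here are assumptions rather than the author's settings: the components of $\Delta_t$ are drawn as independent $\pm 1$ values (a common choice for which multiplying by $\Delta_{tm}$ and dividing by it coincide), the gains are $a_t = a_0/t$ and $c_t = c_0/t^{1/6}$, and the noisy quadratic criterion is a hypothetical stand-in for a likelihood-based criterion. As in the recursion, the step carries a minus sign, so the example has a minimum.

```python
import random


def spsa(noisy_l, theta1, n_iter=5000, a0=0.5, c0=1.0, seed=1):
    """Simultaneous perturbation SA: two evaluations per iteration for any p."""
    rng = random.Random(seed)
    theta = list(theta1)
    p = len(theta)
    for t in range(1, n_iter + 1):
        a_t = a0 / t
        c_t = c0 / t ** (1.0 / 6.0)
        # Assumed perturbation distribution: independent +/-1 components.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(p)]
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        diff = (noisy_l(plus) - noisy_l(minus)) / (2.0 * c_t)
        # All p coordinates share the same scalar difference.
        theta = [th - d * a_t * diff for th, d in zip(theta, delta)]
    return theta


TARGET = [1.0, -2.0, 0.5]          # hypothetical optimum
counter = {"calls": 0}             # number of noisy evaluations used
noise = random.Random(2)


def noisy_l(theta):
    counter["calls"] += 1
    value = sum((th - tg) ** 2 for th, tg in zip(theta, TARGET))
    return value + noise.gauss(0.0, 0.1)


theta_hat = spsa(noisy_l, [0.0, 0.0, 0.0])
print(theta_hat, counter["calls"])
```

The evaluation counter ends at exactly two calls per iteration regardless of the dimension of `theta`, which is the property the text attributes to the SP gradient approximation.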
As discussed earlier, the gradient approximated by SPSA requires only two evaluations of $L(\theta)$, instead of $2p$ evaluations, at each iteration. The measurement savings per iteration, of course, provide the potential for SPSA to achieve large savings (over FDSA, KWSA) in the total number of measurements required to estimate $\theta$ when $p$ is large. This potential is only realized if the number of iterations required for effective convergence to $\theta^{\star}$ does not increase in a way that cancels the measurement savings per gradient approximation at each iteration. Spall (1988, 1992) presents conditions for convergence of SPSA ($\theta_t \xrightarrow{a.s.} \theta^{\star}$) using the differential equation approach as discussed in Ljung (1977) and Kushner and Clark (1978) in the context of the R-M algorithm. The most interesting theoretical results in Spall (1992), and those that most justify the use of SPSA, are the asymptotic efficiency conclusions that follow from conditions given in Spall (1992) showing that
\[ t^{\beta/2}(\theta_t - \theta^{\star}) \xrightarrow{d} N(\mu, \Sigma) \tag{2.3} \]
where $\xrightarrow{d}$ denotes convergence in distribution, $\beta > 0$, and $\mu$ and $\Sigma$ are a mean vector and covariance matrix. Spall (1992) uses this asymptotic normality result in expression (2.3) (together with a parallel result for FDSA) to establish the relative efficiency of SPSA. This efficiency depends on the shape of $L(\theta)$, the values for $a_t$ and $c_t$, and the distribution of $\Delta_{tm}$ and of the measurement noise terms $\varepsilon(\theta)$. There is no single expression that can be used to characterize the relative efficiency. However, as discussed in Spall (1992, Sect. 4) and Chin (1997), in most practical problems SPSA will be asymptotically more efficient than FDSA. In particular, by equating the asymptotic mean-squared errors $E(\hat{\theta} - \theta^{\star})^2$ in SPSA and FDSA, they find
\[ \frac{\text{no. of } L^{\star}(\theta) \text{ values in SPSA}}{\text{no. of } L^{\star}(\theta) \text{ values in FDSA}} \to \frac{1}{p} \tag{2.4} \]
as the number of measurements of $L$ in both procedures gets large. Hence expression (2.4) implies that the $p$-fold savings per iteration (gradient approximation) translates directly into a $p$-fold savings in the overall optimization process.
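As a worked instance of this ratio, take the logistic model used later in this work, which has $p = 5$ parameters ($\theta_1$, $\theta_2$, $\theta_3$, $\sigma_b^2$, $\sigma^2$). The comparison below assumes, purely for illustration, that FDSA and SPSA are run for the same number $T$ of iterations:
\[ \text{FDSA: } 2p\,T = 10\,T \text{ evaluations}, \qquad \text{SPSA: } 2\,T \text{ evaluations}, \qquad \frac{2T}{2pT} = \frac{1}{p} = \frac{1}{5}. \]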
Overall, among the stochastic optimization techniques, KWSA, FDSA, and SPSA may be called "gradient-free" stochastic algorithms. In contrast, RMSA may be called a "gradient-based" algorithm. "Gradient-free" in this context refers to the case where the gradient $\partial L(\theta)/\partial\theta$ of the likelihood $L(\theta)$ is not readily available or not directly measured (even with noise). The gradient-free stochastic algorithms exhibit convergence properties similar to those of the gradient-based stochastic algorithms while requiring only objective function measurements. The gradient-based algorithms rely on direct measurements of the gradient of the objective function with respect to the parameters being optimized. These measurements typically yield an unbiased estimate of the gradient. The main advantage of the gradient-free algorithms is that they do not require knowledge of the gradient function, which in many cases is not readily available.
2.5 The application of IS and SPSA in the MLE of a NLMM
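In the case of NLMM, the maximum likelihood estimates of the parameters can be obtained by maximizing the likelihood $L(\theta)$. When $L$ and the gradient $\partial L(\theta)/\partial\theta$ are observed directly, there are, of course, many methods for finding the maximum $\hat{\theta}$ (e.g., steepest descent, Newton-Raphson). However, in the NLMM case, as discussed in Section 1.1, it is difficult to get direct measurements of the likelihood, which is given in the integral form
\[ L(\theta) = \int p(y \mid b; \theta)\, \pi(b)\, db, \]
and of the corresponding gradient for a complex nonlinear mixed-effects model, because the expectation function is nonlinear in $b$ and there is no closed-form expression for the above likelihood function. But as we mentioned, $L$ can be approximated by taking the sample mean of the integrand at importance samples, say,
\[ L(\theta) \approx L_{IS}(\theta) = \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \frac{p(y_i \mid b_{ik}; \theta)\, \pi(b_{ik})}{I(b_{ik})} \]
from equation (1.8), where $I$ denotes the importance density from which the $b_{ik}$ are drawn. If $E(L_{IS}(\theta)) = L(\theta)$, that is, if $L_{IS}$ is an unbiased estimate of $L$, then $L_{IS}$ can be considered to be the noisy measurement of the likelihood $L$, say $L^{\star}$, in the framework of SA. Letting $\varepsilon(\theta) = L_{IS}(\theta) - L(\theta)$, we have
\[ E(\varepsilon) = E[L_{IS}(\theta) - L(\theta)] = L(\theta) - L(\theta) = 0. \]
In fact, since the importance samples are drawn independently across individuals, the expectation of $L_{IS}$ can be written as
\begin{align*}
E(L_{IS}(\theta)) &= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} E\left[ \frac{p(y_i \mid b_{ik}; \theta)\, \pi(b_{ik})}{I(b_{ik})} \right] \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \int \left\{ \frac{p(y_i \mid b_{ik}; \theta)\, \pi(b_{ik})}{I(b_{ik})} \right\} I(b_{ik})\, db_{ik} \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} \int p(y_i \mid b_{ik}; \theta)\, \pi(b_{ik})\, db_{ik} \\
&= \prod_{i=1}^{N} \frac{1}{N_{IS}} \sum_{k=1}^{N_{IS}} L_i(\theta) \\
&= \prod_{i=1}^{N} L_i(y_i \mid \theta) = L(y \mid \theta).
\end{align*}
We show above that $L_{IS}$ is an unbiased estimate of $L$, and therefore the values of $L_{IS}$ are noisy measurements of the likelihood. Consequently, according to (1.14), SA with Importance Sampling likelihood approximation and SP gradient approximation can be formulated as
\[ \theta_{t+1} = \theta_t - \Delta_t\, a_t\, \frac{L_{IS}(\theta_t + c_t\Delta_t) - L_{IS}(\theta_t - c_t\Delta_t)}{2c_t} \tag{2.5} \]
where $\Delta_t$ is a vector with $p$ mutually independent variables in the $t$th stochastic step. For simplicity, we call the algorithm given in (2.5) Simultaneous Perturbation Stochastic Approximation Using Importance Sampling for likelihood approximation, denoted by SPSAIS.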
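The unbiasedness argument can be checked numerically on a case where the integral is known. The sketch below does not use the nonlinear model of this work: it assumes a toy linear-normal model, $y_i \mid b_i \sim N(b_i, 1)$ with $b_i \sim N(0, \sigma_b^2)$, chosen only because its marginal likelihood has the closed form $y_i \sim N(0, 1 + \sigma_b^2)$. The importance density $I = N(0, 2^2)$, the data values, and the function names are likewise hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def is_likelihood(ys, sigma_b, n_is, rng, imp_sd=2.0):
    """L_IS = prod_i (1/N_IS) sum_k p(y_i | b_ik) pi(b_ik) / I(b_ik)."""
    likelihood = 1.0
    for y in ys:
        total = 0.0
        for _ in range(n_is):
            b = rng.gauss(0.0, imp_sd)  # draw b_ik from I = N(0, imp_sd^2)
            weight = normal_pdf(b, 0.0, sigma_b) / normal_pdf(b, 0.0, imp_sd)
            total += normal_pdf(y, b, 1.0) * weight
        likelihood *= total / n_is
    return likelihood


ys = [0.7, -1.2, 2.0]   # hypothetical observations, one per individual
sigma_b = 1.5
estimate = is_likelihood(ys, sigma_b, n_is=50000, rng=random.Random(0))
# Closed form for this toy model: each y_i is N(0, 1 + sigma_b^2) marginally.
exact = math.prod(normal_pdf(y, 0.0, math.sqrt(1.0 + sigma_b ** 2)) for y in ys)
print(estimate, exact)
```

Repeating the estimate with different seeds scatters the values around the exact likelihood, which is the sense in which $L_{IS}$ acts as a noisy measurement of $L$; a larger `n_is` narrows that scatter.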
A numerical experimental study of the importance sample size is performed for data generated from a nonlinear model in Chapter 3. We expect the accuracy of the parameter estimation to improve as more importance samples are used. In addition, we will make comparisons between SPSAIS and other optimization methods.
Chapter 3
Application of SPSAIS to NLMM
3.1 Example
Consider the orange tree data given by Draper and Smith (1981). This example was used by Pinheiro & Bates (2000, Ch. 8.2) to illustrate how a logistic growth curve model can be fit using the S-Plus/R routine nlme. These data consist of seven measurements of the trunk circumference (in millimeters) on each of five orange trees over 1582 days, ranging from t = 118 to 1582. The seven time points are t = 118, 484, 664, 1004, 1231, 1372, and 1582. A plot of this function over 1582 days for a particular choice of $\theta = (\theta_1, \theta_2, \theta_3)$ is given in Figure 3.1.

Figure 3.1: The trunk circumference of the orange tree over 1582 days. [Plot of circumference against time for trees 1 to 5.]

Lindstrom and Bates (1990) and Pinheiro and Bates (1995) propose the following logistic nonlinear mixed-effects model:
\[ y_{ij} = \frac{\theta_1 + b_i}{1 + \exp[-(x_{ij} - \theta_2)/\theta_3]} + \varepsilon_{ij} \]
where $y_{ij}$ represents the $j$th measurement on the $i$th tree ($i = 1, \ldots, 5$; $j = 1, \ldots, 7$); $x_{ij}$ is the corresponding day; $\theta_1$, $\theta_2$, $\theta_3$ are the fixed-effects parameters; $b_i$ are the random-effects parameters, assumed to be i.i.d. $N(0, \sigma_b^2)$; and $\varepsilon_{ij}$ are assumed to be i.i.d. $N(0, \sigma^2)$ and independent of $b_i$. This model has a logistic form, and the random effects $b_i$ enter the model linearly. Pinheiro and Bates estimate the parameters using the Gaussian Quadrature approximation for the loglikelihood with 200 abscissas. The results of the estimation are presented in Table 3.1.
Table 3.1: MLE of the parameters $\theta_1$, $\theta_2$, $\theta_3$, $\sigma^2$, and $\sigma_b^2$ by nlme

           theta_1    theta_2    theta_3    sigma_b^2    sigma^2
GQ(200)    192.293    727.074    348.074    1003.25      61.49
3.2 Designs and Simulations
To compare how SPSAIS works on this logistic model with the proc nlmixed optimization discussed in Chapter 2, we simulate data and apply both SPSAIS and proc nlmixed to these data. The comparison criterion is the relative absolute error, defined as the average, over the parameters, of the ratio of the absolute difference between the estimated parameter value $\hat{\theta}_m$ and the true parameter value $\theta^{\star}_m$ to the true parameter value, that is,
\[ RAE = \frac{1}{p} \sum_{m=1}^{p} \frac{|\hat{\theta}_m - \theta^{\star}_m|}{\theta^{\star}_m} \]
where $p$ is the total number of parameters. For simplicity, we denote the relative absolute error by RAE. Why is it necessary to define such a value? The reason is that no current algorithm can give a "perfect" maximum likelihood estimate for NLMM, because of the nonlinearity of the expectation function in the random effects, and thus there is no closed or analytical form for the likelihood. In this sense, we compare parameter estimates as if we were treating them as "competing" parameter estimates or "real" MLEs. From this point of view, a quantity defined as such a relative absolute error makes sense. A "better" optimization has a relatively small RAE in most cases.
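The RAE criterion is a direct average and can be computed as in the short sketch below. The function name and the numbers in the example are hypothetical: the "true" values are one setting of the kind used in the simulation design, and the "estimates" are made-up values chosen only to make the arithmetic easy to follow, not results from this study.

```python
def relative_absolute_error(estimates, truths):
    """RAE = (1/p) * sum_m |estimate_m - truth_m| / truth_m."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    p = len(truths)
    # The true values are assumed positive, as in the formula above.
    return sum(abs(e - t) / t for e, t in zip(estimates, truths)) / p


# Hypothetical true values (theta_1, theta_2, theta_3, sigma_b^2, sigma^2)
truths = [200.0, 600.0, 600.0, 1000.0, 60.0]
# Made-up estimates, for arithmetic illustration only.
estimates = [210.0, 570.0, 630.0, 900.0, 66.0]

rae = relative_absolute_error(estimates, truths)
print(rae)
```

Here the five relative errors are 0.05, 0.05, 0.05, 0.10, and 0.10, so the RAE is their mean, 0.07. Because each term is scaled by its own true value, parameters on very different scales contribute comparably.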
In order to examine whether all optimizations in the preceding discussion work for this logistic model, we design several parameter settings, generate 20 data sets under each parameter setting, and apply all optimizations to these data. The parameter settings are designed so that they represent as wide a range of profiles as possible. The parameter estimates from the preceding orange tree example in Table 3.1 provide the reference for the choice of parameter settings. Since $\theta_1$ takes the role of a scaling parameter, it is set to a constant, for example 200, as suggested by Table 3.1, and we set two levels, high and low, for each of the other four parameters. Therefore, we have a factorial experimental design of size $2 \times 2 \times 2 \times 2 = 16$. Table 3.2 lists the structure of these sixteen parameter settings.
Table 3.2: Design for the Parameter Settings

        theta_1    theta_2    theta_3    sigma_b^2    sigma^2
high    200        600        600        1000         60
low     200        300        300        600          10
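The factorial design in Table 3.2 and the data-generating model can be sketched as follows. The sketch assumes that simulated data use five individuals measured at the seven orange tree time points; that choice, together with the function names and the seed, is an assumption made for illustration and is not confirmed by the text.

```python
import itertools
import math
import random

# Assumed design points: the seven orange tree measurement days.
TIMES = [118, 484, 664, 1004, 1231, 1372, 1582]
# (high, low) levels from Table 3.2; theta_1 is held fixed at 200.
LEVELS = {
    "theta2": (600, 300),
    "theta3": (600, 300),
    "sigma2_b": (1000, 600),
    "sigma2": (60, 10),
}


def parameter_settings(theta1=200):
    """All 2 x 2 x 2 x 2 = 16 combinations of the four two-level factors."""
    combos = itertools.product(*LEVELS.values())
    return [(theta1, t2, t3, sb2, s2) for t2, t3, sb2, s2 in combos]


def simulate(setting, n_trees=5, rng=None):
    """One data set from the logistic model with a random asymptote."""
    rng = rng or random.Random()
    theta1, theta2, theta3, sigma2_b, sigma2 = setting
    data = []
    for i in range(1, n_trees + 1):
        b_i = rng.gauss(0.0, math.sqrt(sigma2_b))   # random effect
        for x in TIMES:
            mean = (theta1 + b_i) / (1.0 + math.exp(-(x - theta2) / theta3))
            y = mean + rng.gauss(0.0, math.sqrt(sigma2))  # within-tree error
            data.append((i, x, y))
    return data


settings = parameter_settings()
data = simulate(settings[0], rng=random.Random(0))
print(len(settings), settings[0], len(data))
```

Calling `simulate` 20 times per setting with different seeds would mirror the 20 data sets per setting described above.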
We first examine the data based solely on the levels of the fixed parameters, ignoring the random effects and the random error term in the model. The plots given in Figure 3.2 show the data profiles determined by the values of $\theta_2$ and $\theta_3$, as given in Table 3.2. We can see from Figure 3.2 that the profiles determined by parameters (200,600,600) and (200,300,600) reveal a linear shape, with the latter having a bigger rate, and the profiles determined by parameters (200,600,300) and (200,300,300) reveal a polynomial shape, with the latter having a bigger rate. Basically, this indicates that the generated data represent 4 different profiles. Adding the random effects and random error terms back onto the model, we get 16 profiles showing the variation among individuals at different time points. The corresponding plots for a single sample from each of the 16 parameter combinations are given in Figures 3.3 and 3.4. Note that the design points $x_{ij}$ [...]
Figure 3.2: Four profiles of the simulated data. [Four panels of y against t, for theta = (200,600,600), (200,600,300), (200,300,600), and (200,300,300).]
Figure 3.3: The plots for 16 parameter settings. [Eight panels of circumference against time for individuals 1 to 5, for theta = (200,600,600,1000,60), (200,600,600,1000,10), (200,600,600,600,60), (200,600,600,600,10), (200,600,300,1000,60), (200,600,300,1000,10), (200,600,300,600,60), and (200,600,300,600,10).]
Figure 3.4: The plots for 16 parameter settings. [Eight panels of circumference against time for individuals 1 to 5, for theta = (200,300,600,1000,60), (200,300,600,1000,10), (200,300,600,600,60), (200,300,600,600,10), (200,300,300,1000,60), (200,300,300,1000,10), (200,300,300,600,60), and (200,300,300,600,10).]
3.3 Some Implementation Issues
There are several important issues associated with the implementation of the SPSAIS optimization algorithm: the choice of the starting values, scaling, and the algorithm factors. All of these issues play an important role in determining whether SPSAIS can be started and whether the algorithm will converge.
3.3.1 Choosing starting values by ad hoc method
Because nonlinear models can have many different forms, there is no single "all purpose" or "automatic" approach for identifying a sensible choice of starting values. However, for models with particular features, ad hoc methods can be constructed for starting estimates.
For example, for the logistic model
\[ y_{ij} = f(x_{ij}, \theta) = \frac{\theta_1 + b_i}{1 + \exp[-(x_{ij} - \theta_2)/\theta_3]} + \varepsilon_{ij} \]
where $b_i$ are i.i.d. $N(0, \sigma_b^2)$ and $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$, an ad hoc method can be derived to form starting estimates for the fixed parameter $\theta$ and the variances of the random terms, say $\sigma_b^2$ and $\sigma^2$. The procedures for obtaining ad hoc starting estimates for this logistic model are as follows. The individual mixed effects $\theta_{1i} = \theta_1 + b_i$ can be viewed as the response of the $i$th individual at "$x_{ij} = \infty$". In other words,
\[ \lim_{t \to \infty} E(y_{ij}) = \theta_{1i} \]