• No results found

Nonlinear Statistical Models

N/A
N/A
Protected

Academic year: 2021

Share "Nonlinear Statistical Models"

Copied!
12
0
0

Loading.... (view fulltext now)

Full text

(1)

Nonlinear Statistical Models

Earlier in the course, we considered the general linear statistical model (GLM):

yi =β0+β1x1i+· · ·+βkxki +εi, i =1, . . . ,n, written in matrix form as: y =Xβ +ε , OR       y1 .. . yn       =         1 x11 · · · xp1,1 1 x12 · · · xp1,2 .. . ... ... 1 x1n · · · xp1,n                 β0 β1 .. . βp1         +         ε1 ε2 .. . εn         .

Specifically, we discussed estimation of model parameters, measurements of uncertainty in these estimates and the predicted values, and the use of confidence intervals or significance tests to assess the significance of parameters. Through the matrix formulation, we saw that in linear models (model is alinear function of the parameters), the estimates of linear functions of the response values and their standard errors have simple closed-form expressions and can thus be analytically computed. Unfortunately, this will notbe the case for nonlinear models.

Definition: As before, consider a set ofn observed values on a response variable y assumed to be dependent on k

explanatory variables x =(x1, . . . ,xk). This situation can be described through the following nonlinear statistical model:

yi = f(x0

i,θ) +εi, i=1, . . . ,n ,

where x0

i =[x1i,x2i, . . . ,xki]is the row vector of

observations on kexplanatory variables x1, . . . ,xk for the

ithobservational unit,θ =(θ1, . . . ,θp)is a vector of p

parameters, andεi is theitherror term. In a matrix setting, this model can be expressed as:

        y1 y2 .. . yn         | {z } y =         f(x0 1,θ) f(x0 2,θ) .. . f(x0 n,θ)         | {z } f(θ) +         ε1 ε2 .. . εn         | {z } ε .

• Typically, we assumeεi iid∼ (0,σ2) and that the explanatory variables in x are fixed.

• If f(θ)=Xθ, where X is the design matrix as defined earlier, then the model reverts to a linear model.

(2)

• As Leonid has argued, nonlinear models commonly arise as solutions to a system of differential equations which describe some dynamic process in a particular application. Some common examples of nonlinear models are given below:

1. Exponential Growth/Decay Model: yi =αeβxi +ε

i, where x is typically time.

This arises as a solution of the differential equation: d E(y)

d x =βE(y).

2. Two-term Exponential Model:

yi= θ1

θ1θ2 ”

eθ2xieθ1xi—+ε

i.

This model results when two processes lead to exponential growth or decay. For example, if we are monitoring the concentration of a drug in the bloodstream, which enters the bloodstream from muscle tissue and is removed by the kidneys, then this model is a solution to the differential equation:

d E(y)

d x =θ1E(ym)−θ2E(y),

where ymis the concentration of the drug in the muscle tissue and y is the concentration in the blood.

3. Mitscherlich Model: yi=α”1−eβ(xi+δ)—+ε

i. As an example of where this model is used, let y

be the yield of a crop, let x be the amount of some nutrient, and letα be the maximum

attainable yield. Then if we assume the increase in yield is proportional to the difference between αand the actual yield, we get the following differential equation, to which the model is a solution: d E(y)

d x =β

αE(y),

whereδis the equivalent nutrient value of the soil.

4. Logistic Growth Model:

yi = M u

u+ (Mu)∗exp(−kti) +εi .

This model results from a differential equation where we assume the population growth rate decays as the population increases, so that it

(3)

reaches acarrying capacity M. The form of this differential equation was discussed in some detail in the past weeks, and is given by:

d E(y) d t =kE(y) 1− E(y) M , E(y|t =0) =u.

where M is the carrying capacity,u is the

population size at time 0, and kcontrols the rate of growth. The form of this model is shown as a fit to the AIDS data (1981-1992) below.

! " # "$ % & ' % ( % & & )

Fitting Nonlinear Models via Least Squares : In principle, fitting a nonlinear statistical model using least squares is no different than fitting a linear one. That is, for the model

yi= f(x0

i,θ) +εi, we seek to find that parameter vectorθb

that minimizes the sum of the squared errors:

Q(θ|y,X) = n X i=1 ε2i = n X i=1 [yif(x0 i,θ)] 2 = [y f(θ)]0[y f(θ)] =ε0ε where f(θ)is then×1 vector of f(x0

i,θ)values.

DifferentiatingQ with respect to theθj (in an effort to minimizeQ), setting the resulting equations to zero and evaluating atθ =θbgives: ∂Q(θ|y,X) ∂ θj θ=θb =−2 n X i=1 [yif(x0 i,θb)]   f(x 0 i,θb) ∂θbj  =0,

j=1, . . . ,p. These resulting equations are p nonlinear equations in the pparameters, which in general cannot be solved analytically to obtain explicit solutions for θ. As a result, numerical methods are used to solve these

nonlinear systems. In MatLab, there are several numerical methods that can be used, although the default is a

(4)

generalization of the standard Gauss-Newton method, known as the Levenberg-Marquardt Algorithm.

Gauss-Newton Method: This method, like others that may be used, is iterative in nature, and requires starting values for the parameters. These starting values are then

continually adjusted to home in on the least squares

solution. The method is conducted via the following steps: 1. Consider taking a 1st-order Taylor expansion of f(θ)

about some starting parameter vectorθ0:

f(θ)≈ f(θ0) +F(θ0)(θ

θ0), (1)

where F(θ0)is the n×pmatrix of partial derivatives evaluated atθ0 and the n data values x0

i, as given below: F(θ0)=              [f(x0 1,θ 0)] ∂ θ1 [f(x0 1,θ 0)] ∂ θ2 · · · [f(x 0 1,θ 0)] ∂ θp [f(x0 2,θ 0)] ∂ θ1 [f(x0 2,θ 0)] ∂ θ2 · · · [f(x0 2,θ 0)] ∂ θp .. . ... ... [f(x0 n,θ0)] ∂ θ1 [f(x0 n,θ0)] ∂ θ2 · · · [f(x 0 n,θ0)] ∂ θp              .

2. Calculate f(θ0) and F(θ0). Recognizing f(θ)as an

approximatelylinearized model inθ from equation

(1), linear least squares is used to update the starting values and obtain the solution at the next iteration. 3. Repeat steps 1 & 2 until the change in the parameters

is below some tolerance.

Some Key Results: Based on this same Taylor expansion, Gallant (1987) established the following asymptotic (large sample) results:

1. If we assumeεN(0,σ2I), then the least squares parameter vectorθbisapproximatelynormally distributed with meanθ and variance matrix Var(θb) =(F0F)−1σ2, written:

b

θ ∼• N,(F0F)−1σ2] .

• In practice, we compute F(θb)as an estimate of F(θ)and estimate σ2 with MSE=SSE/(np) to give the asymptotic variance-covariance matrix for

b θ as:

(5)

• This gives an approximate 100(1−α)% confidence interval for a given parameterθi as:

b θi±t1α,np·s(θb)ii = θbi±t1α,np·ÆMSE(Fb0Fb)−1 i i , where(bF0Fb)−1 i i is the i th diagonal entry of(Fb0Fb)−1.

2. Again assuming the errorsεN(0,σ2I), then: Var(by) =F(F0F)−1F0σ2, which can be estimated as: s2(by)=dVar(by) =”Fb(Fb0Fb)−1Fb0—·MS E.

• This technique of quantifying uncertainty through the use of a first-order Taylor series expansion is known as the Delta Methodand was the most commonly used technique for finding approximate standard errors before the advent of modern computing.

• Gallant (1987) illustrated several cases where these approximations are poor, giving confidence intervals and hypothesis tests which are completely wrong. • Some more modern techniques for estimating

standard errors and subsequently performing statistical inferences in nonlinear models are the method of bootstrapping, andMarkov chain Monte Carlo (MCMC) methods.

Example: Data were collected on the number of big horn sheep on the Mount Everts winter range in Yellowstone between 1965 and 1999 (obtained from Marty Kardos). Consider the time plot of these data below. The population suffered a die-off resulting from the infectious kerato-conjunctivitis in 1983. Marty indicates that this is fairly typical for bighorn sheep populations: logistic-like population growth up to a carrying capacity followed by disease-related die-offs, and subsequent recovery. For now, consider modeling the initial population growth from 1965 to 1982 using a logistic growth model. Here, we will fit the model, and use it to illustrate the calculation of standard errors for the parameter estimates from this nonlinear model. * + , - * + . / * + . - * + 0 / * + 0 - * + + / * + + - 1 / / / / - / * / / * - / 1 / / 1 - / 2 3 45 6 7 8 9 : ; < = > ? @ A < ; B C A : : D EF G H I JK L M IN OPQ RF M H S R L L T UF G H IN V W X Y Z [ W X X X

(6)

The logistic growth model is a 3-parameter nonlinear model of the form:

yi = M u

u+ (Mu)exp(−kti) +εi, where:

yi = the number of bighorn sheep at time ti,

M = the carrying capacity of the population,

u = the number of sheep at time 0 (1964 here),

k = a parameter representing the growth rate. Fitting this model to the 1965-1982 data with nonlinear least squares using the nlinfitfunction in MatLab gives the following fitted model:

byi = Mbub b

u+ (Mb −ub)exp(−bkti), where Mb =221.0513, ub=27.4643, bk=0.3279, as computed using the Gauss-Newton method outlined earlier. The fitted model is shown in overlay below. Is this a good model? \ ] ^ _ \ ] ^ ^ \ ] ^ ` \ ] a b \ ] a c \ ] a _ \ ] a ^ \ ] a ` \ ] ` b \ ] ` c b d b \ b b \ d b c b b c d b e f gh i j k l m n o p q r s t o n u v t m m w xy z { | } ~  { € €  ‚ | ƒ ~ „ … † ‡ ˆ ‰ Š ‹ ‡ ˆ Œ  Ž  y „ {  | z y … „y ‘ ’y„

The code for fitting the logistic model is given below and can be found on the course webpage under the files sheep.mandlogistic.m. The first file is the driver program and the logistic.mfile defines the logistic function.

(7)

% ============================================= % % Fits a logistic growth model to the 1965-1982 %

% sheep data, and plots the fitted model. %

% ============================================= % sheep82.year = sheep.year(sheep.year<=1982);

% Take only years before 1983

sheep82.count = sheep.count(sheep.year<=1982); % Take only cases before 1983

time = sheep82.year-1964;

% Defines the year-1964 variable

beta = [221 27 0.33]; % Parameter starting values [betahat,resid,J] = nlinfit(time, ...

sheep82.count,@logistic,beta);

% Performs NLS, returning the parameter

% estimates, residuals, & Jacobian

yhat = sheep82.count-resid; % Computes predicted counts

plot(sheep82.year,sheep82.count,’ko’,... sheep82.year,yhat);

% Plots sheep counts vs. year w/ fitted curve xlabel(’Year’,’fontsize’,14);

ylabel(’Number of Bighorn Sheep’,’fontsize’,14); title(’Bighorn Sheep Counts: 1965-1982’,...

’fontsize’,14,’fontweight’,’b’);

• I tried a few different things to try to improve the fit, including a “Weibull-like” modification to the logistic growth model, where I added a parameter in the exponent of the model yielding ageneralized logistic growth model:

yi = M u

u+ (Mu)exp[−ktγi] +εi , whereγis a parameter controlling the degree of curvature in the logistic shape. Ifγ=1, this reverts to the usual logistic growth model.

• This change is “Weibull-like” in the sense that we have added a parameter in the exponent of an exponential function, much like the addition of theγ-parameter in going from exponential growth to a Weibull growth model.

• This “Weibull-like” curvature-adjustment to the logistic model has a corresponding differential equation to which it is the solution. This equation has the form:

d E(y) d t =γt γ−1ku1 u M ‹ ,

(8)

where M is the carrying capacity,u is the population size at time 0, kcontrols the rate of growth, andγis this curvature piece.

• When parameterized this way, it turns out that the resulting parameter estimates for k &γare unstable because they are highly correlated. An alterative, but equivalent way to parameterize this model is as follows:

yi = M u

u+ (Mu)exp[−(kti)γ] +εi .

This is the form of the model we will use for the remainder of our treatment of this generalized logistic model.

• Essentially,γcontrols thesteepnessof the S-shape in the logistic model, where larger values ofγcorrespond to steeper curves. Since our logistic model for the sheep data appears to not be steep enough, we would expect this parameter to be larger than 1.

• Fitting this generalized logistic model produced the fitted curve and parameter estimates as given on the next page. “ ” • – “ ” • • “ ” • — “ ”˜ ™ “ ”˜ š “ ”˜ – “ ”˜ • “ ”˜ — “ ” — ™ “ ” — š • ™ — ™ “ ™ ™ “ š ™ “ – ™ “ • ™ “ — ™ š ™ ™ š š ™ š – ™ › œ ž Ÿ   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª ¥ ¤ « ¬ ª £ £ ­ ®¯ ° ± ² ³ ´ µ ± ¶ ¶ · ¸ ² ¹ ´ º » ¼ ½ ¾ ¿ À Á ½ ¾  ÃÄ Å ¯º ± Æ ¶ ´¶ ³ Ç È¯ É ¶ Ê Ë ² ° ¯ » º¯ Ì Í¯º b M = 207.6399 b u = 67.9731 bk = 0.1380 b γ = 3.4542

Does this seem like a better fit?

The MatLab code used to fit this generalized logistic model is shown below:

(9)

% ===================================== %

% Fits a generalized logistic growth %

% model to the 1965-1982 sheep data, %

% and plots the fitted mode in overlay. % % ===================================== % beta2 = [221 27 0.3 2];

% Parameter starting values

[betahat2,resid2,J2] = nlinfit(time, ... sheep82.count,@genlogistic,beta2); % Performs NLS, returning the parameter

% estimates, residuals, & Jacobian

yhat2 = sheep82.count-resid2; % Computes predicted counts

plot(sheep82.year,sheep82.count,’ko’,... sheep82.year,yhat2);

% Plots the sheep counts vs. year with

% fitted curve in overlay

xlabel(’Year’,’fontsize’,14); ylabel(’Number of Bighorn Sheep’,

’fontsize’,14);

title(’Bighorn Sheep Counts (1965-1982) with Generalized Logistic Fit’,’fontsize’,14,... ’fontweight’,’b’);

Which of these parameters do you expect to be the least stable (i.e.: which do you expect to have the highest

variability?) To get the standard errors of these parameters inθ =(M,u,k,γ), we need to compute the Delta Method approximate variance matrix, given as:

s2(θb)=Var(Ó θb) =(Fb0Fb)−1·MSE .

To compute the Fb-matrix, we first compute the partial derivatives of

fi(θ) = f(x0

i,θ) =

M u

u+ (Mu)exp[−(kti)γ]

with respect to M,u, k, and γrespectively. Doing so with repeated use of the product (quotient) rule gives:

fi(θ) ∂M = u2{1−exp[−(kti)γ]} {u+ (Mu)exp[−(kti)γ]}2, fi(θ) ∂u = M2exp[−(kti)γ] {u+ (Mu)exp[−(kti)γ]}2, fi(θ) ∂k = M u(Mu−1i exp[−(kti)γ] {u+ (Mu)exp[−(kti)γ]}2 , fi(θ) ∂ γ = M u(Mu)(kti)γexp[−(kti)γ]log(kti) {u+ (Mu)exp[−(kti)γ]}2 .

(10)

Plugging the least squares estimates θb= (Mb,ub,bk,γ)b into these derivatives yields the Fb-matrix shown below: Fb=             [f(t1,θb)] ∂Mb [f(t1,θb)] ∂ub [f(t1,θb)] bk [f(t1,θb)] ∂γb [f(t2,θb)] ∂Mb [f(t2,θb)] ∂ub [f(t2,θb)] bk [f(t2,θb)] ∂γb .. . ... ... ... [f(t17,θb)] ∂Mb [f(t17,θb)] ∂ub [f(t17,θb)] bk [f(t17,θb)] ∂γb             =                        0.0001 1.0004 0.0012 −0.0969 0.0053 1.0161 0.0553 −1.9492 0.0153 1.0415 0.1532 −3.6359 0.0371 1.0816 0.3439 −5.0945 0.0825 1.1242 0.6710 −5.0515 0.1729 1.1274 1.1459 −1.5678 0.3349 1.0106 1.6292 6.4654 0.5652 0.7222 1.7487 15.1716 0.7904 0.3683 1.2833 16.5367 0.9293 0.1276 0.6181 10.3187 0.9834 0.0303 0.1980 3.9937 0.9973 0.0050 0.0427 0.9987 0.9997 0.0006 0.0061 0.1615 1.0000 0.0000 0.0006 0.0165 1.0000 0.0000 0.0000 0.0010 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000                        ,

where the actual values were computed in MatLab. Finally, computing MSE·(Fb0Fb)−1 to get the

variance-covariance matrix of the parameter estimates, extracting the diagonal entries of this

4x4 matrix, and taking square roots yields the standard errors of the parameter estimates. The MatLab code to do this is given below:

% ============================================= %

% Computes the Delta Method parameter estimate %

% standard errors for a 4-parameter generalized %

% logistic model. %

% ============================================= % n = length(sheep82.year); % Sample size

p = length(beta2); % No. model parameters

mse = sum(resid2.^2)/(n-p);

% Computes the mean squared error (MSE) Mhat = betahat2(1); uhat = betahat2(2); khat = betahat2(3); ghat = betahat2(4);

% Renames the three parameter estimates den = (uhat+(Mhat-uhat)*exp(-(khat* ...

time).^ghat)).^2;

(11)

dfdM = uhat^2*(1-exp(-(khat*time).^... ghat))./den;

% Computes df/dM (nx1 vector)

dfdu = Mhat^2*exp(-(khat*time).^ghat)./den; % Computes df/du (nx1 vector)

dfdk = Mhat*uhat*(Mhat-uhat).*(time.^... ghat).*ghat.*khat.^(ghat-1).*... exp(-(khat*time).^ghat)./den; % Computes df/dk (nx1 vector) dfdg = Mhat*uhat*(Mhat-uhat).*(khat*... time).^ghat.*log(khat*time).*... exp(-(khat*time).^ghat)./den; % Computes df/dg (nx1 vector) Fhat = [dfdM dfdu dfdk dfdg];

% Combines derivatives into nx4 matrix varmat = inv(Fhat’*Fhat).*mse;

% Inverts estimated (F’F)*MSE matrix separ = sqrt(diag(varmat));

% Standard errors are square root

% diagonal elements of varmat

tstar = tinv(1-.025/p,n-p); % Computes Bonferroni t-value

ci = [betahat2’-tstar*separ betahat2’+ ... tstar*separ];

% Computes Bonferroni CIs for the 3 parameters

betahat2 = 207.640 67.973 0.1380 3.4542

separ = 4.178 7.166 0.0091 0.8093

• This provides approximate Bonferroni-adjusted

simultaneous 95% confidence intervals for the four

parameters (using t1.025/4(np) =t.99375(17−4) = tinv(.99375,13)=2.8961) as: b M±t.99375(np)·SE(Mb) =207.640±2.8961(4.178) = (195.54, 219.74). b u±t.99375(np)·SE(ub) =67.973±2.8961(7.166) = (47.22, 88.73). bk±t.99375(np)·SE(bk) =0.1380±2.8961(0.0091) = (0.1116, 0.1645). b γ±t.99375(np)·SE(bγ) =3.4542±2.8961(0.8093) = (1.11, 5.80). • In MatLab, the Delta Method of computing standard

(12)

and can be easily obtained after calling the nlinfit function, by typing:

ci2 = nlparci(betahat2,resid2,J2)

% Computes NLS Delta Method CIs

This returns approximate 95% individualconfidence intervals for the three model parameters based on the Delta Method standard errors.

• We will next explore the use of bootstrapping as an alternative to using Delta Method-based standard errors. Both techniques have their problems and consider parameter uncertainty for each parameter

separately rather than jointly. Later, we will briefly

explore the use of Markov chain Monte Carlo (MCMC) methods which account for the dependence among model parameters in measuring uncertainty.

References

Related documents

between gender inequality in terms of education and life expectancy and the risk of domestic terrorism. The results do not indicate such support for the hypothesized

There are infinitely many principles of justice (conclusion). 24 “These, Socrates, said Parmenides, are a few, and only a few of the difficulties in which we are involved if

This essay asserts that to effectively degrade and ultimately destroy the Islamic State of Iraq and Syria (ISIS), and to topple the Bashar al-Assad’s regime, the international

From design through construction Pathway is a remodeling company with a focus on healthy homes.. Healthy Homes (continued) Red Room Sundance

The purpose of the integration is to strengthen the ability of the learners to read, write and memorize the Qur’an in a conducive learning atmosphere and to introduce secular

The statute creates a legal fiction that the former spouse or domestic partner has predeceased the insured, the former spouse or domestic partner “having died at the time of entry

Moreover, they may either be established by the members themselves or through mandate; and in that case, are often set up when the network first forms, to stimulate its growth through

Real user transactions across physical, virtual, cloud and mainframe Mobile-to-Mainframe View of Customer Interaction with CEM.. • Customer-centric visibility into application,