Regression Analysis 10.1
10.1.2 Nonlinear Regression
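A nonlinear regression model, i.e., a model that is nonlinear in the regression parameters $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)$, is used to estimate $\boldsymbol{\beta}$ from the assumed nonlinear relationship between $y$ and the inputs $x_j$. In general form, the relation can be expressed for a sample of $n$ observations by the following set of regression equations (Gallant, 1975, p. 73):

$$y_i = f(\mathbf{x}_i, \boldsymbol{\beta}) + \varepsilon_i \quad \forall i: i = 1, 2, \ldots, n \tag{10.1.56}$$

The term $y_i$ is the value of the dependent variable for the $i$th of $n$ observations, $f(\mathbf{x}, \boldsymbol{\beta})$ represents the expectation function, $\mathbf{x}_i$ is the $(k+1)$-dimensional row vector of the $i$th inputs (i.e., including a constant), $\boldsymbol{\beta}$ is a $(p+1)$-dimensional vector of unknown parameters, and $\varepsilon_i$ is the error term for the $i$th observation, with the same properties as in linear regression (i.e., $\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma_\varepsilon^2 \mathbf{I}_n)$). The general nonlinear model can also be expressed in matrix form as follows (Gallant, p. 73):

$$\mathbf{y} = \mathbf{f}(\mathbf{X}, \boldsymbol{\beta}) + \boldsymbol{\varepsilon} \tag{10.1.57}$$

where

$$\mathbf{y}' = (y_1, y_2, \ldots, y_n) \tag{10.1.58}$$

$$\mathbf{f}'(\boldsymbol{\beta}) = [f(\mathbf{x}_1, \boldsymbol{\beta}), f(\mathbf{x}_2, \boldsymbol{\beta}), \ldots, f(\mathbf{x}_n, \boldsymbol{\beta})] \tag{10.1.59}$$

$$\boldsymbol{\varepsilon}' = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) \tag{10.1.60}$$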
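The likelihood of the general nonlinear model, $L(\boldsymbol{\beta}, \sigma_\varepsilon^2)$, can be written as shown below (Fox, p. 463):

$$L(\boldsymbol{\beta}, \sigma_\varepsilon^2) = \frac{1}{(2\pi\sigma_\varepsilon^2)^{n/2}} \exp\left[-\frac{1}{2\sigma_\varepsilon^2} SSE_{NL}(\boldsymbol{\beta})\right] \tag{10.1.61}$$

The function $SSE_{NL}(\boldsymbol{\beta})$⁸⁸ denotes the sum of squared errors for nonlinear regression and can be written explicitly as follows (Fox, p. 463):

$$SSE_{NL}(\boldsymbol{\beta}) = \sum_{i=1}^{n} [y_i - f(\mathbf{x}_i, \boldsymbol{\beta})]^2 = [\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})]'[\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})] = \|\mathbf{y} - \mathbf{f}(\boldsymbol{\beta})\|^2 \tag{10.1.62}$$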
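As a minimal illustration of Eq. (10.1.62), the sketch below evaluates the sum of squared errors for a two-parameter exponential expectation function. The model $f(x, \boldsymbol{\beta}) = \beta_0 e^{\beta_1 x}$, the data values, and the names `expectation` and `sse_nl` are assumptions made for this example only; they are not taken from the cited sources.

```python
import numpy as np

def expectation(x, beta):
    """Hypothetical expectation function f(x, beta) = beta0 * exp(beta1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def sse_nl(beta, x, y):
    """Sum of squared errors ||y - f(beta)||^2, as in Eq. (10.1.62)."""
    residual = y - expectation(x, beta)
    return float(residual @ residual)

# Illustrative, noise-free data generated from beta = (2.0, -0.5)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(-0.5 * x)

sse_at_truth = sse_nl(np.array([2.0, -0.5]), x, y)   # the generating value fits exactly
sse_elsewhere = sse_nl(np.array([1.5, -0.3]), x, y)  # any other value gives a larger SSE
```

Because the data here are generated without noise, the sum of squares vanishes at the generating parameter value and is strictly positive elsewhere; with real data the minimum would be positive.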
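As in the case of the general linear model, the objective is to maximize $L(\boldsymbol{\beta}, \sigma_\varepsilon^2)$ by minimizing $SSE_{NL}$. Subsequently, $SSE_{NL}$ can be differentiated to derive the normal equations as indicated below (Fox, p. 464):

$$\frac{\partial SSE_{NL}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2 \sum_{i=1}^{n} [y_i - f(\mathbf{x}_i, \boldsymbol{\beta})] \frac{\partial f(\mathbf{x}_i, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} \tag{10.1.63}$$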
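The normal equations are obtained by setting these partial derivatives to 0 and replacing the unknown parameters $\boldsymbol{\beta}$ with the vector of nonlinear least squares estimates $\hat{\boldsymbol{\beta}}$. The normal equations can also be represented in matrix form as follows (Fox, p. 464):

$$[\mathbf{F}(\hat{\boldsymbol{\beta}}, \mathbf{X})]'[\mathbf{y} - \mathbf{f}(\hat{\boldsymbol{\beta}}, \mathbf{X})] = \mathbf{0} \tag{10.1.64}$$

The term $\mathbf{F}(\hat{\boldsymbol{\beta}}, \mathbf{X})$ is the matrix of derivatives whose entry in the $i$th row and $j$th column is (Fox, p. 464):

$$F_{i,j} = \frac{\partial f(\hat{\boldsymbol{\beta}}, \mathbf{x}_i)}{\partial \hat{\beta}_j} \tag{10.1.65}$$

In nonlinear regression models, the derivatives of the expectation function with respect to the parameters in $\hat{\boldsymbol{\beta}}$ depend on at least one of the parameters in $\hat{\boldsymbol{\beta}}$. Note that in linear regression the derivatives are not functions of the $\beta$'s. The normal equations are therefore themselves nonlinear in $\hat{\boldsymbol{\beta}}$ and generally have no closed-form solution, so more advanced, iterative methods are required to compute $\hat{\boldsymbol{\beta}}$. The following subsection describes the methods of estimating $\boldsymbol{\beta}$.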
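As a brief worked illustration of this distinction, the derivatives of a nonlinear and a linear expectation function can be compared. The exponential model used here is an assumption chosen for the example and does not come from the cited sources.

```latex
\begin{align*}
\text{nonlinear: } f(x_i, \boldsymbol{\beta}) &= \beta_0 e^{\beta_1 x_i}, &
\frac{\partial f}{\partial \beta_0} &= e^{\beta_1 x_i}, &
\frac{\partial f}{\partial \beta_1} &= \beta_0 x_i e^{\beta_1 x_i}, \\
\text{linear: } f(x_i, \boldsymbol{\beta}) &= \beta_0 + \beta_1 x_i, &
\frac{\partial f}{\partial \beta_0} &= 1, &
\frac{\partial f}{\partial \beta_1} &= x_i.
\end{align*}
```

Both derivatives of the exponential model involve the parameters, so the entries of the derivative matrix change with every trial value of the parameters, whereas the derivatives of the linear function are fixed by the data alone.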
10.1.2.1.1 Methods of Computing Nonlinear Least Squares Estimators
The computation of the nonlinear least squares estimates starts with the linearization of the nonlinear function and continues with the application of the least squares method to the linearized relation. The expectation function is linearized by taking the Taylor series expansion of $f(\mathbf{x}_i, \boldsymbol{\beta})$ about the point $\boldsymbol{\beta}_0' = [\beta_{0,0}, \beta_{1,0}, \ldots, \beta_{p,0}]$ and dropping the second and higher order terms of the series, as shown below (Draper & Smith, 1981, p. 462):

$$f(\mathbf{x}_i, \boldsymbol{\beta}) \approx f(\mathbf{x}_i, \boldsymbol{\beta}_0) + \sum_{j=0}^{p} \left[\frac{\partial f(\mathbf{x}_i, \boldsymbol{\beta})}{\partial \beta_j}\right]_{\boldsymbol{\beta} = \boldsymbol{\beta}_0} (\beta_j - \beta_{j,0}) \quad \forall i: i = 1, 2, \ldots, n \tag{10.1.66}$$

The zero subscript of $\boldsymbol{\beta}_0$ in Eq. (10.1.66) indicates the initial (zeroth) iteration, i.e., the chosen starting value of $\boldsymbol{\beta}$.

⁸⁸ The subscript "NL" in $SSE_{NL}$ stands for nonlinear regression; it distinguishes this function from the $SSE$ introduced in Subsection 10.1.1 for linear regression.
The common methods of computing nonlinear least squares estimators are stated to be Hartley's modified Gauss-Newton method and Marquardt's algorithm (Gallant, p. 76). The remainder of this section outlines how linearization and iteration are combined in these two methods, as they would be carried out in a routine computer calculation.
Hartley's modified Gauss-Newton method
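The Gauss-Newton method is based on substituting the first-order Taylor series expansion of $\mathbf{f}(\boldsymbol{\beta})$ about a trial ($T$) parameter value $\boldsymbol{\beta}_T$ into the formula for $SSE_{NL}(\boldsymbol{\beta})$ (Gallant, p. 76):

$$SSE_{NL}(\boldsymbol{\beta}) \approx \|\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}_T) - \mathbf{F}(\boldsymbol{\beta}_T)(\boldsymbol{\beta} - \boldsymbol{\beta}_T)\|^2 \tag{10.1.67}$$

The approximating sum of squares in Eq. (10.1.67) can be minimized by linear least squares. To this end, the terms of the general nonlinear regression model are replaced by the corresponding terms evaluated at $\boldsymbol{\beta}_0$, given below (Draper & Smith, p. 462):

$$f_i^0 = f(\mathbf{x}_i, \boldsymbol{\beta}_0) \tag{10.1.68}$$

$$\gamma_j^0 = \beta_j - \beta_{j,0} \tag{10.1.69}$$

$$F_{i,j}^0 = \left[\frac{\partial f(\mathbf{x}_i, \boldsymbol{\beta})}{\partial \beta_j}\right]_{\boldsymbol{\beta} = \boldsymbol{\beta}_0} \tag{10.1.70}$$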
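Subsequently, the substitution yields an approximate model that has the form of a linear regression model, as represented below (Draper & Smith, p. 463):

$$y_i - f_i^0 = \sum_{j=0}^{p} \gamma_j^0 F_{i,j}^0 + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{10.1.71}$$

or in vector form as

$$\mathbf{y}_0 = \mathbf{F}_0 \boldsymbol{\gamma}_0 + \boldsymbol{\varepsilon} \tag{10.1.72}$$

where $\mathbf{y}_0 = \mathbf{y} - \mathbf{f}_0$ and $\boldsymbol{\gamma}_0$ collects the differences defined in Eq. (10.1.69). Hence, the estimate of $\boldsymbol{\gamma}_0$, i.e., $\hat{\boldsymbol{\gamma}}_0$, can be computed by the least squares method as follows (Draper & Smith, p. 463):

$$\hat{\boldsymbol{\gamma}}_0 = (\mathbf{F}_0'\mathbf{F}_0)^{-1}\mathbf{F}_0'\mathbf{y}_0 = (\mathbf{F}_0'\mathbf{F}_0)^{-1}\mathbf{F}_0'(\mathbf{y} - \mathbf{f}_0) \tag{10.1.73}$$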
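The parameter value $\boldsymbol{\beta}_M$ that minimizes the approximating sum of squares at the $T$th trial can be expressed as given below in Eqs. (10.1.74) and (10.1.75) (Gallant, p. 76):

$$\boldsymbol{\beta}_M = \boldsymbol{\beta}_T + \hat{\boldsymbol{\gamma}}_T \tag{10.1.74}$$

$$\boldsymbol{\beta}_M = \boldsymbol{\beta}_T + [\mathbf{F}'(\boldsymbol{\beta}_T)\mathbf{F}(\boldsymbol{\beta}_T)]^{-1}\mathbf{F}'(\boldsymbol{\beta}_T)[\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}_T)] \tag{10.1.75}$$

The iterative solution process for the approximating sum of squares proposed by Hartley proceeds as follows (Gallant, p. 76):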
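1. 0th Iteration: Choose a starting estimate $\boldsymbol{\beta}_0$ and compute

$$\mathbf{D}_0 = [\mathbf{F}'(\boldsymbol{\beta}_0)\mathbf{F}(\boldsymbol{\beta}_0)]^{-1}\mathbf{F}'(\boldsymbol{\beta}_0)[\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}_0)] \tag{10.1.76}$$

Then, find a $\lambda_0$ between 0 and 1 such that

$$SSE_{NL}(\boldsymbol{\beta}_0 + \lambda_0\mathbf{D}_0) \le SSE_{NL}(\boldsymbol{\beta}_0) \tag{10.1.77}$$

2. 1st Iteration: Let $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_0 + \lambda_0\mathbf{D}_0$ and compute

$$\mathbf{D}_1 = [\mathbf{F}'(\boldsymbol{\beta}_1)\mathbf{F}(\boldsymbol{\beta}_1)]^{-1}\mathbf{F}'(\boldsymbol{\beta}_1)[\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}_1)] \tag{10.1.78}$$

Then, find a $\lambda_1$ between 0 and 1 such that

$$SSE_{NL}(\boldsymbol{\beta}_1 + \lambda_1\mathbf{D}_1) \le SSE_{NL}(\boldsymbol{\beta}_1) \tag{10.1.79}$$

3. 2nd Iteration: Let $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_1 + \lambda_1\mathbf{D}_1$

⋮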
A practical way of choosing the step length $\lambda_i$ at each iteration $i$ is to take the largest number in the sequence $\lambda_i = (0.8)^j$, $j = 0, 1, 2, \ldots$, for which $SSE_{NL}(\boldsymbol{\beta}_i + \lambda_i\mathbf{D}_i) < SSE_{NL}(\boldsymbol{\beta}_i)$ (Gallant, p. 76). See Gallant (p. 76) for other methods of choosing $\lambda_i$. The iterations continue until they are terminated by a stopping rule such as

$$\|\boldsymbol{\beta}_i - \boldsymbol{\beta}_{i+1}\| < \epsilon(\|\boldsymbol{\beta}_i\| + \tau) \tag{10.1.80}$$

and simultaneously

$$|SSE_{NL}(\boldsymbol{\beta}_i) - SSE_{NL}(\boldsymbol{\beta}_{i+1})| < \epsilon(SSE_{NL}(\boldsymbol{\beta}_i) + \tau) \tag{10.1.81}$$

where $\epsilon > 0$ and $\tau > 0$ are preset tolerance limits, e.g., $\epsilon = 10^{-5}$ and $\tau = 10^{-3}$ (Gallant, p. 76).
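The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the cited authors: the exponential expectation function, the noise-free data, the starting value, the cap of 60 trial step lengths, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def f(x, beta):
    """Hypothetical expectation function f(x, beta) = beta0 * exp(beta1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    """n x 2 derivative matrix F(beta): columns are df/dbeta0 and df/dbeta1."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def sse(beta, x, y):
    """SSE_NL(beta) = ||y - f(beta)||^2."""
    r = y - f(x, beta)
    return float(r @ r)

def modified_gauss_newton(x, y, start, eps=1e-5, tau=1e-3, max_iter=100):
    beta = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        F = jacobian(x, beta)
        # Direction D_i = (F'F)^{-1} F'[y - f(beta_i)], obtained by linear least squares
        D, *_ = np.linalg.lstsq(F, y - f(x, beta), rcond=None)
        old_sse = sse(beta, x, y)
        # Step length: largest lambda in 0.8**j (j = 0, 1, 2, ...) that lowers the SSE
        for j in range(60):
            candidate = beta + 0.8 ** j * D
            new_sse = sse(candidate, x, y)
            if new_sse < old_sse:
                break
        else:
            return beta  # no improving step found; keep the current value
        # Stopping rule of Eqs. (10.1.80) and (10.1.81)
        small_step = np.linalg.norm(beta - candidate) < eps * (np.linalg.norm(beta) + tau)
        small_drop = abs(old_sse - new_sse) < eps * (old_sse + tau)
        beta = candidate
        if small_step and small_drop:
            break
    return beta

# Illustrative, noise-free data generated from beta = (2.0, -0.5)
x = np.arange(5.0)
y = 2.0 * np.exp(-0.5 * x)
beta_hat = modified_gauss_newton(x, y, start=(1.5, -0.3))
```

The direction is computed with `numpy.linalg.lstsq` rather than by forming and inverting the cross-product matrix explicitly; the two are algebraically equivalent when the derivative matrix has full column rank, and the least squares solver is numerically more stable.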
Marquardt's algorithm
Marquardt's algorithm is another method that approximately minimizes $SSE_{NL}(\boldsymbol{\beta})$, using the update shown below (Gallant, p. 77):

$$\boldsymbol{\beta}_\delta = \boldsymbol{\beta}_T + [\mathbf{F}'(\boldsymbol{\beta}_T)\mathbf{F}(\boldsymbol{\beta}_T) + \delta\mathbf{I}]^{-1}\mathbf{F}'(\boldsymbol{\beta}_T)[\mathbf{y} - \mathbf{f}(\boldsymbol{\beta}_T)] \tag{10.1.82}$$

Marquardt's algorithm rests on the fact that, for all $\delta$ sufficiently large, $\boldsymbol{\beta}_\delta$ is an improvement, i.e., $SSE_{NL}(\boldsymbol{\beta}_\delta)$ is smaller than $SSE_{NL}(\boldsymbol{\beta}_T)$, under appropriate conditions (Gallant, p. 77). The initial value $\delta_0$ is commonly set to some small number, e.g., $10^{-8}$ (Fox, p. 466). If the $(i+1)$th iteration results in $SSE_{NL}(\boldsymbol{\beta}_{i+1}) < SSE_{NL}(\boldsymbol{\beta}_i)$, then the new value $\boldsymbol{\beta}_{i+1}$ is accepted and the next iteration is started with $\delta_{i+2} = \delta_{i+1}/10$; if, however, $SSE_{NL}(\boldsymbol{\beta}_{i+1}) > SSE_{NL}(\boldsymbol{\beta}_i)$, then $\delta_{i+1}$ is increased by a factor of ten and the step is tried again (Fox, p. 466). Marquardt's procedure behaves like the Gauss-Newton method when $\delta$ is small. Note that Marquardt's algorithm is stated to be more difficult to implement than the Gauss-Newton method, since both the conditioning factor $\delta$ and the step factor $\lambda$ must be manipulated (Bates & Watts, p. 81). See Bates and Watts (p. 81) for more information.
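A minimal sketch of the accept/reject scheme described above is given below. It assumes the update adds the damped increment to the current parameter value, uses the factor-of-ten adjustment of the conditioning factor quoted from Fox, and leaves the step factor fixed at one. The exponential expectation function, the data, the termination by an upper bound on the conditioning factor (`max_delta`), and all names are assumptions of this example rather than part of the cited descriptions.

```python
import numpy as np

def f(x, beta):
    """Hypothetical expectation function f(x, beta) = beta0 * exp(beta1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    """n x 2 derivative matrix F(beta): columns are df/dbeta0 and df/dbeta1."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def sse(beta, x, y):
    """SSE_NL(beta) = ||y - f(beta)||^2."""
    r = y - f(x, beta)
    return float(r @ r)

def marquardt(x, y, start, delta=1e-8, max_iter=200, max_delta=1e12):
    beta = np.asarray(start, dtype=float)
    current_sse = sse(beta, x, y)
    for _ in range(max_iter):
        F = jacobian(x, beta)
        # Damped normal equations: (F'F + delta * I) step = F'[y - f(beta)]
        A = F.T @ F + delta * np.eye(beta.size)
        candidate = beta + np.linalg.solve(A, F.T @ (y - f(x, beta)))
        candidate_sse = sse(candidate, x, y)
        if candidate_sse < current_sse:
            # Improvement: accept the new value and relax the damping
            beta, current_sse = candidate, candidate_sse
            delta /= 10.0
        else:
            # No improvement: increase the damping tenfold and try again
            delta *= 10.0
            if delta > max_delta:  # assumed termination rule for this sketch
                break
    return beta

# Illustrative, noise-free data generated from beta = (2.0, -0.5)
x = np.arange(5.0)
y = 2.0 * np.exp(-0.5 * x)
beta_hat = marquardt(x, y, start=(1.5, -0.3))
```

With a small conditioning factor the linear system is nearly the Gauss-Newton system, which is why the first accepted steps coincide closely with Gauss-Newton steps in this example.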
Gallant (p. 78) notes that neither method is guaranteed to converge to the minimizing value of $\boldsymbol{\beta}$ from a given starting value. Failure to converge may depend both on the distance of the starting value from the correct answer and on the extent of overparameterization in the response function relative to the data. When convergence fails, it is recommended to find better starting values or to use a similar response function with fewer parameters. Further, when convergence is achieved, it is suggested to try several reasonable starting values and check whether the iterations converge to the same answer for each of them.
10.1.2.1.2 Statistical Properties of Nonlinear Least Squares Estimators
Nonlinear regression inference is carried out by using the linear approximation of the nonlinear model (discussed in Subsection 10.1.2.1.1) to reduce the problem to the linear case and then applying the linear model inference results by analogy (Bates & Watts, p. 52). Note that the use of the approximation leads to approximate (asymptotic) results rather than exact ones. In particular, for a finite sample size the calculated standard error is only an approximation; it improves as the sample size increases and becomes exact only in the limit of an infinitely large sample.
Two of the nonlinear model inferences that can be considered in analogy with linear model inferences are given in the following. See Gallant (pp. 78-81) for more information on hypothesis testing and confidence intervals for nonlinear regression models, and see Subsection 10.1.1.2 for the analogous linear model inferences.
An approximate $100(1 - \alpha)\%$ confidence interval for $\beta_j$ with approximate standard error $ASE(\hat{\beta}_j)$ can be expressed by the confidence statement given below (Graybill & Iyer, 1994, p. 610):

$$C\left[\hat{\beta}_j - t_{n-p-1}^{1-\alpha/2}\, ASE(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{n-p-1}^{1-\alpha/2}\, ASE(\hat{\beta}_j)\right] \approx 1 - \alpha \tag{10.1.83}$$

An approximate hypothesis test at the $\alpha$ level of significance can be written as represented below:

$$H_0: \beta_j = b \tag{10.1.84}$$

$$H_1: \beta_j \ne b \tag{10.1.85}$$

where $b$ is any specified number. The test can be performed as follows (Graybill & Iyer, p. 610):
1. Compute $t_0 = \dfrac{\hat{\beta}_j - b}{ASE(\hat{\beta}_j)}$,
2. Reject $H_0$ if $|t_0| > t_{n-p-1}^{1-\alpha/2}$.
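The interval and the test above can be sketched numerically as follows. The section does not state how the approximate standard error is obtained; the sketch assumes the usual asymptotic formula, in which the estimated error variance $SSE_{NL}(\hat{\boldsymbol{\beta}})/(n-p-1)$ multiplies the diagonal of $[\mathbf{F}'\mathbf{F}]^{-1}$ evaluated at the estimate. The exponential model, the data with fixed perturbations, the plain Gauss-Newton loop used to obtain the estimate, and all names are assumptions of this example. The critical value 2.776 is the tabulated 0.975 quantile of the t distribution with 4 degrees of freedom.

```python
import numpy as np

def f(x, beta):
    """Hypothetical expectation function f(x, beta) = beta0 * exp(beta1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def jacobian(x, beta):
    """n x 2 derivative matrix F(beta): columns are df/dbeta0 and df/dbeta1."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def approximate_inference(x, y, beta_hat, t_crit, b=0.0):
    """Approximate standard errors, confidence limits and t tests of H0: beta_j = b."""
    n, n_par = x.size, beta_hat.size                  # n_par = p + 1
    F = jacobian(x, beta_hat)
    residual = y - f(x, beta_hat)
    s2 = float(residual @ residual) / (n - n_par)     # SSE_NL / (n - p - 1)
    # Assumed asymptotic formula: ASE_j = sqrt(s2 * [(F'F)^{-1}]_jj)
    ase = np.sqrt(s2 * np.diag(np.linalg.inv(F.T @ F)))
    lower = beta_hat - t_crit * ase
    upper = beta_hat + t_crit * ase
    t0 = (beta_hat - b) / ase
    reject = np.abs(t0) > t_crit
    return ase, lower, upper, t0, reject

# Illustrative data: exponential decay plus small fixed perturbations
x = np.arange(6.0)
noise = np.array([0.02, -0.03, 0.01, 0.02, -0.02, 0.01])
y = 2.0 * np.exp(-0.5 * x) + noise

# Least squares estimate from plain Gauss-Newton iterations (no step shortening)
beta_hat = np.array([1.5, -0.3])
for _ in range(25):
    F = jacobian(x, beta_hat)
    step, *_ = np.linalg.lstsq(F, y - f(x, beta_hat), rcond=None)
    beta_hat = beta_hat + step

T_CRIT = 2.776  # tabulated t quantile for 1 - alpha/2 = 0.975 and n - p - 1 = 4
ase, lower, upper, t0, reject = approximate_inference(x, y, beta_hat, T_CRIT)
```

Here `reject` holds one decision per parameter for the hypothesis that the parameter equals `b`; passing a different `b` tests another specified value.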
10.1.2.2 Nonlinear Regression Diagnostics
As in linear regression, the assumptions underlying a nonlinear regression should be checked for their validity. The assumptions in nonlinear regression models are listed below (Ritz & Streibig, 2008, p. 55):
1. The mean function is correct,
2. The errors are homoscedastic (have constant variance),
3. The errors are normally distributed,
4. The errors are not autocorrelated.
It can be inferred that the techniques previously mentioned for linear regression diagnostics can be applied similarly to nonlinear regression. See Chapter 5 (pp. 55-70) and Chapter 6 (pp. 73-91) of Ritz and Streibig for information on the corresponding diagnostic tests and on remedies for model violations in nonlinear regression models, respectively.