
Regression Analysis 10.1

10.1.2 Nonlinear Regression

A nonlinear regression model is one that is nonlinear in the regression parameters $\boldsymbol{\theta} = (\theta_0, \theta_1, \theta_2, \ldots, \theta_k)$; it is used to estimate $\boldsymbol{\theta}$ under an assumed nonlinear relationship between $\boldsymbol{y}$ and $\boldsymbol{x}_i$. For a sample of $n$ observations, the relation can be expressed in general form by the following set of regression equations (Gallant, 1975, p. 73):

$$y_i = f(\boldsymbol{x}_i, \boldsymbol{\theta}) + \epsilon_i \quad \forall i: i = 1, 2, \ldots, n \tag{10.1.56}$$

The term $y_i$ is the value of the dependent variable for the $i$th of $n$ observations, $f(\boldsymbol{x}, \boldsymbol{\theta})$ is the expectation function, $\boldsymbol{x}_i$ is the $(k+1)$-dimensional row vector of inputs for the $i$th observation (including a constant), $\boldsymbol{\theta}$ is a $(k+1)$-dimensional vector of unknown parameters, and $\epsilon_i$ is the error term for the $i$th observation, with the same properties as in linear regression (i.e. $\boldsymbol{\epsilon} \sim N_n(\boldsymbol{0}, \sigma_\epsilon^2 \boldsymbol{I}_n)$). The general nonlinear model can also be expressed in matrix form as follows (Gallant, p. 73):

$$\boldsymbol{y} = f(\boldsymbol{x}, \boldsymbol{\theta}) + \boldsymbol{\epsilon} \tag{10.1.57}$$

where

$$\boldsymbol{y}' = (y_1, y_2, \ldots, y_n) \tag{10.1.58}$$

$$f'(\boldsymbol{\theta}) = [f(\boldsymbol{x}_1, \boldsymbol{\theta}), f(\boldsymbol{x}_2, \boldsymbol{\theta}), \ldots, f(\boldsymbol{x}_n, \boldsymbol{\theta})] \tag{10.1.59}$$

$$\boldsymbol{\epsilon}' = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n) \tag{10.1.60}$$

The likelihood of the general nonlinear model, $l(\boldsymbol{\theta}, \sigma_\epsilon^2)$, can be written as follows (Fox, p. 463):

$$l(\boldsymbol{\theta}, \sigma_\epsilon^2) = \frac{1}{(2\pi\sigma_\epsilon^2)^{n/2}} \exp\left[-\frac{1}{2\sigma_\epsilon^2} SSE_{NL}(\boldsymbol{\theta})\right] \tag{10.1.61}$$

The function $SSE_{NL}(\boldsymbol{\theta})$ (see footnote 88) denotes the error sum of squares for nonlinear regression and can be written explicitly as follows (Fox, p. 463):

$$SSE_{NL}(\boldsymbol{\theta}) = \sum_{i=1}^{n} [y_i - f(\boldsymbol{x}_i, \boldsymbol{\theta})]^2 = [\boldsymbol{y} - f(\boldsymbol{\theta})]'[\boldsymbol{y} - f(\boldsymbol{\theta})] = \|\boldsymbol{y} - f(\boldsymbol{\theta})\|^2 \tag{10.1.62}$$
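As a minimal illustration of Eqs. (10.1.61) and (10.1.62), the Python sketch below evaluates the error sum of squares and the logarithm of the likelihood. The two-parameter expectation function $f(x, \boldsymbol{\theta}) = \theta_1 \exp(\theta_2 x)$, the data and all function names are assumptions chosen for illustration; they do not come from the cited sources.

```python
import math

def f(x, theta):
    """Hypothetical expectation function: theta1 * exp(theta2 * x)."""
    return theta[0] * math.exp(theta[1] * x)

def sse_nl(theta, xs, ys):
    """Error sum of squares, Eq. (10.1.62)."""
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def log_likelihood(theta, sigma2, xs, ys):
    """Logarithm of the likelihood in Eq. (10.1.61)."""
    n = len(ys)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse_nl(theta, xs, ys) / (2 * sigma2)

# Noise-free data generated from theta = (2.0, 0.5), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

print(sse_nl([2.0, 0.5], xs, ys))   # zero at the generating parameters
print(sse_nl([1.5, 0.5], xs, ys))   # positive elsewhere
```

For a fixed error variance, the parameter value with the smaller error sum of squares has the larger likelihood, which is why maximizing the likelihood over $\boldsymbol{\theta}$ amounts to minimizing $SSE_{NL}$.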

As in the case of the general linear model, the objective is to maximize $l(\boldsymbol{\theta}, \sigma_\epsilon^2)$ by minimizing $SSE_{NL}(\boldsymbol{\theta})$. To derive the normal equations, $SSE_{NL}$ is differentiated with respect to $\boldsymbol{\theta}$ (Fox, p. 464):

$$\frac{\partial SSE_{NL}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -2 \sum_{i=1}^{n} [y_i - f(\boldsymbol{x}_i, \boldsymbol{\theta})] \frac{\partial f(\boldsymbol{x}_i, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \tag{10.1.63}$$

The normal equations are obtained by setting these partial derivatives to zero and replacing the unknown parameters $\boldsymbol{\theta}$ with the vector of nonlinear least squares estimates $\hat{\boldsymbol{\theta}}$. In matrix form they read (Fox, p. 464):

$$[\boldsymbol{F}(\hat{\boldsymbol{\theta}}, \boldsymbol{x})]'[\boldsymbol{y} - f(\hat{\boldsymbol{\theta}}, \boldsymbol{x})] = \boldsymbol{0} \tag{10.1.64}$$

The term $\boldsymbol{F}(\hat{\boldsymbol{\theta}}, \boldsymbol{x})$ is the matrix of derivatives whose entry in the $i$th row and $j$th column is (Fox, p. 464):

$$F_{i,j} = \frac{\partial f(\hat{\boldsymbol{\theta}}, \boldsymbol{x}_i)}{\partial \hat{\theta}_j} \tag{10.1.65}$$

In nonlinear regression models, at least one of the derivatives of the expectation function with respect to the parameters depends on at least one of the parameters in $\hat{\boldsymbol{\theta}}$, whereas in linear regression the derivatives are not functions of the $\beta$'s. Nonlinear regression therefore requires more advanced (iterative) methods for computing $\hat{\boldsymbol{\theta}}$. The following subsection describes the methods of estimating $\boldsymbol{\theta}$.
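The dependence of the derivatives on the parameters can be made concrete with a small sketch. It assumes a hypothetical expectation function $f(x, \boldsymbol{\theta}) = \theta_1 \exp(\theta_2 x)$ and illustrative data, neither taken from the cited sources, and evaluates the left-hand side of Eq. (10.1.64) at two parameter values.

```python
import math

def f(x, theta):
    """Hypothetical expectation function: theta1 * exp(theta2 * x)."""
    return theta[0] * math.exp(theta[1] * x)

def jacobian_row(x, theta):
    """One row of F: derivatives of f with respect to theta1 and theta2."""
    e = math.exp(theta[1] * x)
    return [e, theta[0] * x * e]   # both entries depend on theta

def normal_equations_lhs(theta, xs, ys):
    """Left-hand side F'(y - f) of Eq. (10.1.64), evaluated at theta."""
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        residual = y - f(x, theta)
        row = jacobian_row(x, theta)
        g[0] += row[0] * residual
        g[1] += row[1] * residual
    return g

# Noise-free data generated from theta = (2.0, 0.5), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

print(normal_equations_lhs([2.0, 0.5], xs, ys))   # zero vector: normal equations hold
print(normal_equations_lhs([1.5, 0.4], xs, ys))   # nonzero away from the solution
```

For a linear model such as $f = \beta_0 + \beta_1 x$ the corresponding row would be $[1, x]$, which does not involve the parameters; here the row changes with $\boldsymbol{\theta}$, so the equations cannot be solved in one linear step.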

10.1.2.1.1 Methods of Computing Nonlinear Least Squares Estimators

The nonlinear normal equations are solved by first linearizing the nonlinear function and then applying the least squares method to the linearized relation. The expectation function is linearized by a Taylor series expansion of $f(\boldsymbol{x}_i, \boldsymbol{\theta})$ about the point $\boldsymbol{\theta}_0' = [\theta_{1,0}, \theta_{2,0}, \ldots, \theta_{p,0}]$, where $p = k + 1$ is the number of parameters, with the second and higher order terms of the series dropped (Draper & Smith, 1981, p. 462):

$$f(\boldsymbol{x}_i, \boldsymbol{\theta}) \approx f(\boldsymbol{x}_i, \boldsymbol{\theta}_0) + \sum_{j=1}^{k+1} \left[\frac{\partial f(\boldsymbol{x}_i, \boldsymbol{\theta})}{\partial \theta_j}\right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0} (\theta_j - \theta_{j,0}) \quad \forall i: i = 1, 2, \ldots, n \tag{10.1.66}$$

The zero subscript of $\boldsymbol{\theta}_0$ in Eq. (10.1.66) denotes the initial (zeroth) iteration, i.e. the chosen starting value of $\boldsymbol{\theta}$.

88 The subscript "$NL$" in $SSE_{NL}$ stands for nonlinear regression and distinguishes it from the $SSE$ of linear regression introduced in Subsection 10.1.1.
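For instance, taking the hypothetical two-parameter expectation function $f(x_i, \boldsymbol{\theta}) = \theta_1 \exp(\theta_2 x_i)$ (an illustrative choice, not one made in the cited sources), the first-order expansion of Eq. (10.1.66) would read:

```latex
f(x_i, \boldsymbol{\theta}) \approx \theta_{1,0}\, e^{\theta_{2,0} x_i}
  + e^{\theta_{2,0} x_i}\,(\theta_1 - \theta_{1,0})
  + \theta_{1,0}\, x_i\, e^{\theta_{2,0} x_i}\,(\theta_2 - \theta_{2,0})
```

The right-hand side is linear in $\theta_1$ and $\theta_2$, which is what allows linear least squares to be applied at each iteration.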

The common methods of computing nonlinear least squares estimators are stated to be Hartley's modified Gauss-Newton method and Marquardt's algorithm (Gallant, p. 76). This section describes how each of them combines linearization with an iterative process suited to routine computer calculation.

Hartley's modified Gauss-Newton method

The Gauss-Newton method substitutes the first-order Taylor series expansion of $f(\boldsymbol{\theta})$ about a trial ($T$) parameter value $\boldsymbol{\theta}_T$ into the formula for $SSE_{NL}(\boldsymbol{\theta})$ (Gallant, p. 76):

$$SSE_{NL}(\boldsymbol{\theta}_T) = \|\boldsymbol{y} - f(\boldsymbol{\theta}_T) - F(\boldsymbol{\theta}_T)(\boldsymbol{\theta} - \boldsymbol{\theta}_T)\|^2 \tag{10.1.67}$$

The approximating sum of squares in Eq. (10.1.67) can be minimized by linear least squares. To do so, the terms of the general nonlinear regression model are replaced by the following quantities evaluated at $\boldsymbol{\theta}_0$ (Draper & Smith, p. 462):

$$f_i^0 = f(\boldsymbol{x}_i, \boldsymbol{\theta}_0) \tag{10.1.68}$$

$$b_j^0 = \theta_j - \theta_{j,0} \tag{10.1.69}$$

$$F_{i,j}^0 = \left[\frac{\partial f(\boldsymbol{x}_i, \boldsymbol{\theta})}{\partial \theta_j}\right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0} \tag{10.1.70}$$

The substitution yields an approximate model in the form of a linear regression (Draper & Smith, p. 463):

$$y_i - f_i^0 = \sum_{j=1}^{p} b_j^0 F_{i,j}^0 + \epsilon_i, \quad i = 1, 2, \ldots, n \tag{10.1.71}$$

or, in vector form with $\boldsymbol{y}_0 = \boldsymbol{y} - \boldsymbol{f}_0$,

$$\boldsymbol{y}_0 = \boldsymbol{F}_0 \boldsymbol{b}_0 + \boldsymbol{\epsilon} \tag{10.1.72}$$

Hence the estimate $\hat{\boldsymbol{b}}_0$ of $\boldsymbol{b}_0$ can be computed by the least squares method as follows (Draper & Smith, p. 463):

$$\hat{\boldsymbol{b}}_0 = (\boldsymbol{F}_0'\boldsymbol{F}_0)^{-1}\boldsymbol{F}_0'\boldsymbol{y}_0 = (\boldsymbol{F}_0'\boldsymbol{F}_0)^{-1}\boldsymbol{F}_0'(\boldsymbol{y} - \boldsymbol{f}_0) \tag{10.1.73}$$

After $T$ iterations, the parameter value $\boldsymbol{\theta}_M$ that minimizes the approximating sum of squares is given by Eqs. (10.1.74) and (10.1.75) (Gallant, p. 76):

$$\boldsymbol{\theta}_M = \boldsymbol{\theta}_T + \hat{\boldsymbol{b}}_T \tag{10.1.74}$$

$$\boldsymbol{\theta}_M = \boldsymbol{\theta}_T + [F'(\boldsymbol{\theta}_T)F(\boldsymbol{\theta}_T)]^{-1}F'(\boldsymbol{\theta}_T)[\boldsymbol{y} - f(\boldsymbol{\theta}_T)] \tag{10.1.75}$$

The iterative solution process for the approximating sum of squares proposed by Hartley proceeds as follows (Gallant, p. 76):

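1. 0th Iteration: Choose a starting estimate $\boldsymbol{\theta}_0$ and compute

$$\boldsymbol{D}_0 = [F'(\boldsymbol{\theta}_0)F(\boldsymbol{\theta}_0)]^{-1}F'(\boldsymbol{\theta}_0)[\boldsymbol{y} - f(\boldsymbol{\theta}_0)] \tag{10.1.76}$$

Then find a $\lambda_0$ between 0 and 1 such that

$$SSE_{NL}(\boldsymbol{\theta}_0 + \lambda_0 \boldsymbol{D}_0) \le SSE_{NL}(\boldsymbol{\theta}_0) \tag{10.1.77}$$

2. 1st Iteration: Let $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_0 + \lambda_0 \boldsymbol{D}_0$ and compute

$$\boldsymbol{D}_1 = [F'(\boldsymbol{\theta}_1)F(\boldsymbol{\theta}_1)]^{-1}F'(\boldsymbol{\theta}_1)[\boldsymbol{y} - f(\boldsymbol{\theta}_1)] \tag{10.1.78}$$

Then find a $\lambda_1$ between 0 and 1 such that

$$SSE_{NL}(\boldsymbol{\theta}_1 + \lambda_1 \boldsymbol{D}_1) \le SSE_{NL}(\boldsymbol{\theta}_1) \tag{10.1.79}$$

3. 2nd Iteration: Let $\boldsymbol{\theta}_2 = \boldsymbol{\theta}_1 + \lambda_1 \boldsymbol{D}_1$

⋮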

A practical way of choosing the step length $\lambda_l$ at iteration $l$ is to take the largest number in the sequence $a_q = (0.8)^q$, $q = 0, 1, 2, \ldots$, for which $SSE_{NL}(\boldsymbol{\theta}_l + a_q \boldsymbol{D}_l) < SSE_{NL}(\boldsymbol{\theta}_l)$ (Gallant, p. 76). See Gallant (p. 76) for other methods of choosing $\lambda_l$. The iterations continue until a stopping rule is satisfied, such as

$$\|\boldsymbol{\theta}_l - \boldsymbol{\theta}_{l+1}\| < \varepsilon(\|\boldsymbol{\theta}_l\| + \tau) \tag{10.1.80}$$

and simultaneously

$$|SSE_{NL}(\boldsymbol{\theta}_l) - SSE_{NL}(\boldsymbol{\theta}_{l+1})| < \varepsilon(SSE_{NL}(\boldsymbol{\theta}_l) + \tau) \tag{10.1.81}$$

where $\varepsilon > 0$ and $\tau > 0$ are preset tolerance limits, e.g. $\varepsilon = 10^{-5}$ and $\tau = 10^{-3}$ (Gallant, p. 76).
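The iteration, the step-length choice and the stopping rule described above can be combined in a short Python sketch. It is only an illustration under stated assumptions: the two-parameter expectation function $f(x, \boldsymbol{\theta}) = \theta_1 \exp(\theta_2 x)$, the data, the function names and the iteration caps are hypothetical, and the $2 \times 2$ system is solved by Cramer's rule rather than by a general linear solver.

```python
import math

def f(x, theta):
    """Hypothetical expectation function: theta1 * exp(theta2 * x)."""
    return theta[0] * math.exp(theta[1] * x)

def jac_row(x, theta):
    """Derivatives of f with respect to theta1 and theta2."""
    e = math.exp(theta[1] * x)
    return [e, theta[0] * x * e]

def sse(theta, xs, ys):
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def gn_direction(theta, xs, ys):
    """D = (F'F)^{-1} F'(y - f) for two parameters (Cramer's rule)."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        r1, r2 = jac_row(x, theta)
        res = y - f(x, theta)
        a11 += r1 * r1
        a12 += r1 * r2
        a22 += r2 * r2
        g1 += r1 * res
        g2 += r2 * res
    det = a11 * a22 - a12 * a12
    return [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def hartley(theta0, xs, ys, eps=1e-5, tau=1e-3, max_iter=100):
    theta = list(theta0)
    for _ in range(max_iter):
        d = gn_direction(theta, xs, ys)
        old = sse(theta, xs, ys)
        # Step length: largest a_q = 0.8**q that strictly reduces the SSE.
        new_theta = theta
        for q in range(60):
            lam = 0.8 ** q
            trial = [t + lam * di for t, di in zip(theta, d)]
            if sse(trial, xs, ys) < old:
                new_theta = trial
                break
        new = sse(new_theta, xs, ys)
        step = norm([a - b for a, b in zip(theta, new_theta)])
        # Stopping rule of Eqs. (10.1.80) and (10.1.81).
        done = step < eps * (norm(theta) + tau) and abs(old - new) < eps * (old + tau)
        theta = new_theta
        if done:
            break
    return theta

# Noise-free data generated from theta = (2.0, 0.5), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
theta_hat = hartley([1.0, 0.3], xs, ys)
print(theta_hat)
```

From the starting value used here the full Gauss-Newton step increases the error sum of squares, so the first iteration has to shorten the step; this is the situation the step-length rule is meant to handle.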

Marquardt's algorithm

Marquardt's algorithm is another method that minimizes the approximating sum of squares $SSE_{NL}(\boldsymbol{\theta}_T)$, using the update (Gallant, p. 77):

$$\boldsymbol{\theta}_\delta = \boldsymbol{\theta}_T + [F'(\boldsymbol{\theta}_T)F(\boldsymbol{\theta}_T) + \delta \boldsymbol{I}]^{-1}F'(\boldsymbol{\theta}_T)[\boldsymbol{y} - f(\boldsymbol{\theta}_T)] \tag{10.1.82}$$

Marquardt's algorithm rests on the fact that, under appropriate conditions, $\boldsymbol{\theta}_\delta$ is an improvement for all $\delta$ sufficiently large, i.e. $SSE_{NL}(\boldsymbol{\theta}_\delta)$ is smaller than $SSE_{NL}(\boldsymbol{\theta}_T)$ (Gallant, p. 77). The initial value $\delta_0$ is commonly set to some small number, e.g. $10^{-8}$ (Fox, p. 466). If the $(l+1)$th iteration yields $SSE_{NL}(\boldsymbol{\theta}_{l+1}) < SSE_{NL}(\boldsymbol{\theta}_l)$, the new value $\boldsymbol{\theta}_{l+1}$ is accepted and the next iteration starts with $\delta_{l+2} = \delta_{l+1}/10$; if, however, $SSE_{NL}(\boldsymbol{\theta}_{l+1}) > SSE_{NL}(\boldsymbol{\theta}_l)$, the current $\delta$ is increased by a factor of ten and the step is tried again (Fox, p. 466). When $\delta$ is small, the Marquardt procedure behaves much like Gauss-Newton. Note that Marquardt's algorithm is stated to be more difficult to implement than Gauss-Newton, since both the conditioning factor $\delta$ and the step factor $\lambda$ must be manipulated (Bates & Watts, p. 81).

See Bates and Watts (p. 81) for more information.
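A minimal sketch of the accept/reject scheme for $\delta$ described above is given below. It assumes the same hypothetical expectation function $f(x, \boldsymbol{\theta}) = \theta_1 \exp(\theta_2 x)$ and illustrative data used for demonstration only; it adjusts $\delta$ alone (no separate step factor $\lambda$), and the stopping tolerance and the cap on $\delta$ are simple assumptions, not part of the cited descriptions.

```python
import math

def f(x, theta):
    """Hypothetical expectation function: theta1 * exp(theta2 * x)."""
    return theta[0] * math.exp(theta[1] * x)

def jac_row(x, theta):
    """Derivatives of f with respect to theta1 and theta2."""
    e = math.exp(theta[1] * x)
    return [e, theta[0] * x * e]

def sse(theta, xs, ys):
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def marquardt_update(theta, delta, xs, ys):
    """theta + (F'F + delta*I)^{-1} F'(y - f) for two parameters."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        r1, r2 = jac_row(x, theta)
        res = y - f(x, theta)
        a11 += r1 * r1
        a12 += r1 * r2
        a22 += r2 * r2
        g1 += r1 * res
        g2 += r2 * res
    a11 += delta
    a22 += delta
    det = a11 * a22 - a12 * a12
    return [theta[0] + (a22 * g1 - a12 * g2) / det,
            theta[1] + (a11 * g2 - a12 * g1) / det]

def marquardt(theta0, xs, ys, delta=1e-8, max_iter=200, tol=1e-12):
    theta = list(theta0)
    current = sse(theta, xs, ys)
    for _ in range(max_iter):
        trial = marquardt_update(theta, delta, xs, ys)
        trial_sse = sse(trial, xs, ys)
        if trial_sse < current:
            # Improvement: accept the trial value and relax delta.
            improvement = current - trial_sse
            theta, current = trial, trial_sse
            delta /= 10.0
            if improvement < tol:      # illustrative stopping rule (assumption)
                break
        else:
            # No improvement: keep theta, increase delta tenfold and retry.
            delta *= 10.0
            if delta > 1e10:           # illustrative cap (assumption)
                break
    return theta

# Noise-free data generated from theta = (2.0, 0.5), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
theta_hat = marquardt([1.0, 0.3], xs, ys)
print(theta_hat)
```

With the small initial $\delta$ the first trial is essentially a full Gauss-Newton step; when that step fails to reduce the error sum of squares, $\delta$ grows until a shorter step is accepted.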

Gallant (p. 78) notes that neither method is guaranteed to converge to $\boldsymbol{\theta}_M$ from a given starting value. Failure to converge may depend both on the distance of the starting value from the correct answer and on the extent of over-parameterization of the response function relative to the data. When the iterations fail to converge, it is recommended to find better starting values or to use a similar response function with fewer parameters. When they do converge, it is suggested to try several reasonable starting values and check whether the iterations converge to the same answer from each of them.

10.1.2.1.2 Statistical Properties of Nonlinear Least Squares Estimators

Inference in nonlinear regression is carried out by using the linear approximation discussed in Subsection 10.1.2.1.1 to reduce the problem to the linear case and then applying linear model inference results by analogy (Bates & Watts, p. 52). Because of this approximation, the results are approximate (asymptotic) rather than exact. In particular, the standard error is exact only in the limit of an infinitely large sample; for a finite sample, the calculated standard error is an approximation that improves as the sample size grows.

Two nonlinear model inferences that parallel their linear model counterparts are presented below. See Gallant (pp. 78-81) for more information on hypothesis testing and confidence intervals in nonlinear regression models, and Subsection 10.1.1.2 for the analogous linear model inferences.

An approximate $100(1 - \alpha)\%$ confidence interval for $\theta_j$, based on the approximate standard error $ASE(\hat{\theta}_j)$, is given by the following confidence statement (Graybill & Iyer, 1994, p. 610):

$$C\left[\hat{\theta}_j - t_{n-k-1}^{1-\alpha/2} ASE(\hat{\theta}_j) \le \theta_j \le \hat{\theta}_j + t_{n-k-1}^{1-\alpha/2} ASE(\hat{\theta}_j)\right] \approx 1 - \alpha \tag{10.1.83}$$

An approximate hypothesis test at significance level $\alpha$ can be stated as

$$H_0: \theta_j = q \tag{10.1.84}$$

$$H_1: \theta_j \ne q \tag{10.1.85}$$

where $q$ is any specified number. The test is performed as follows (Graybill & Iyer, p. 610):

1. Compute $t_0 = \dfrac{\hat{\theta}_j - q}{ASE(\hat{\theta}_j)}$,

2. Reject $H_0$ if $|t_0| > t_{n-k-1}^{1-\alpha/2}$.
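The interval and the test can be sketched in a few lines of Python. The estimate, its approximate standard error and the degrees of freedom below are hypothetical numbers chosen for illustration, the function names are assumptions, and the critical value is taken from a $t$ table rather than computed.

```python
def confidence_interval(theta_hat_j, ase_j, t_crit):
    """Approximate interval of Eq. (10.1.83): estimate -/+ t_crit * ASE."""
    half_width = t_crit * ase_j
    return theta_hat_j - half_width, theta_hat_j + half_width

def t_test(theta_hat_j, ase_j, q, t_crit):
    """Return (t0, reject) for H0: theta_j = q against H1: theta_j != q."""
    t0 = (theta_hat_j - q) / ase_j
    return t0, abs(t0) > t_crit

# Hypothetical inputs: estimate 0.50, ASE 0.10, n - k - 1 = 8 degrees of freedom.
# 2.306 is the tabulated 0.975 quantile of the t distribution with 8 d.f.
lower, upper = confidence_interval(0.50, 0.10, 2.306)
t0, reject = t_test(0.50, 0.10, 0.0, 2.306)
print(lower, upper, t0, reject)
```

With these numbers the 95% interval does not contain zero and $|t_0|$ exceeds the critical value, so the interval and the test lead to the same conclusion about $H_0: \theta_j = 0$.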

10.1.2.2 Nonlinear Regression Diagnostics

As in linear regression, the assumptions underlying a nonlinear regression should be checked for validity. The assumptions of nonlinear regression models are listed below (Ritz & Streibig, 2008, p. 55):

1. The mean function is correct,

2. The errors are homoscedastic (have constant variance),

3. The errors are normally distributed,

4. The errors are not autocorrelated.

The techniques described earlier for linear regression diagnostics can therefore be applied to nonlinear regression in a similar way. See Ritz and Streibig, Chapter 5 (pp. 55-70) for the corresponding diagnostic tests and Chapter 6 (pp. 73-91) for remedies for model violations in nonlinear regression models.


The Box-Jenkins Method of Time Series Analysis