ADEQUACY CHECKING FOR REGRESSION MODELS - Kun-Nan Chen and Ming-Ju Chen CONTENTS


6.6 ADEQUACY CHECKING FOR REGRESSION MODELS

Once a response surface has been constructed from a set of design points and measured responses, the response model has to be checked for adequacy before proceeding to optimization of the function. This section describes some of the most common tests for assessing the suitability of the fitted model. All these tests may be integrated into the analysis of variance (ANOVA) to examine the adequacy of the regression models.

6.6.1 TEST FOR SIGNIFICANCE OF THE LINEAR REGRESSION MODEL

Whether a linear relationship exists between the response variable and a subset of the input factors can be determined by the test for significance of regression. This test decides whether to reject a null hypothesis stating that none of the regression variables (input factors) contributes significantly to the model, i.e., $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$. Rejection of the hypothesis leads to the conclusion that the response is linearly related to at least one of the factors. The total sum of squares (SST), regression sum of squares (SSR), and error sum of squares (SSE), respectively, are defined as


FIGURE 6.4 Contour plots for second-order response surfaces: (a) maximum surface, (b) saddle system, (c) stationary ridge, and (d) rising ridge.

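$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (6.26)$$

$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad (6.27)$$

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (6.28)$$

where

$y_i$ is the ith observation

$\bar{y}$ is the average value of all observations

$\hat{y}_i$ is the predicted response

$n$ is the total number of observations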

The total sum of squares can be partitioned into a regression sum of squares and an error sum of squares: SST = SSR + SSE. Further, a test statistic is defined as follows:

$$F_0 = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n - p)} = \frac{\mathrm{MSR}}{\mathrm{MSE}} \qquad (6.29)$$

where

$p$ denotes the number of regression coefficients

MSR and MSE are called the regression and error mean squares, respectively

Denoting $\gamma$ as the significance level of the test, if the computed value of the test statistic in Equation 6.29, $f_0$, is greater than $f_{\gamma,k,n-p}$, whose value may be looked up from tables of the F distribution or calculated by computer software, the hypothesis should be rejected, and a linear relationship between the response function and at least one of the input factors is confirmed.
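The quantities in Equations 6.26 through 6.29 can be sketched numerically as follows. This is a minimal illustration rather than the authors' procedure: it assumes the model contains an intercept (so that p = k + 1), uses NumPy's least-squares solver for the fit, and the function name `regression_f_test` and the data are hypothetical.

```python
import numpy as np

def regression_f_test(X, y):
    """Return SST, SSR, SSE and F0 = MSR / MSE for a least-squares fit.

    X: (n, k) array of input factors WITHOUT an intercept column.
    y: (n,) array of observed responses.
    Assumption: the model has an intercept, so p = k + 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])      # model matrix with intercept
    p = k + 1                                 # number of regression coefficients
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef                          # predicted responses
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)            # Eq. 6.26
    ssr = np.sum((y_hat - y_bar) ** 2)        # Eq. 6.27
    sse = np.sum((y - y_hat) ** 2)            # Eq. 6.28
    f0 = (ssr / k) / (sse / (n - p))          # Eq. 6.29
    return {"SST": sst, "SSR": ssr, "SSE": sse, "F0": f0, "df": (k, n - p)}

# Made-up data with two input factors, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.2, 3.6, 7.5, 7.1, 10.8, 12.3]
res = regression_f_test(np.column_stack([x1, x2]), y)
print(res)
```

The critical value $f_{\gamma,k,n-p}$ is not computed here; if SciPy is available, `scipy.stats.f.ppf(1 - gamma, k, n - p)` returns the corresponding upper percentage point of the F distribution.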

6.6.2 TEST FOR SIGNIFICANCE ON SUBSETS OF THE REGRESSION COEFFICIENTS

Passing the test for significance of the full regression model described above is often not enough to conclude that the model is appropriate. Tests on the individual terms are required to determine the significance of the linear, interaction, quadratic, and other terms in the regression model. Remember that a response function with nonlinear terms in the input factors can always be rewritten in the form of a standard linear regression model. The coefficient vector $\boldsymbol{\beta}$ in Equation 6.12 can be partitioned into two parts as follows:

$$\boldsymbol{\beta} = [\boldsymbol{\beta}_1 \;\; \boldsymbol{\beta}_2]^T \qquad (6.30)$$

where

$\boldsymbol{\beta}_1$ contains the coefficients of the terms to be tested for significance

Suppose $\boldsymbol{\beta}_1$ is a vector of dimension $m \times 1$; then $\boldsymbol{\beta}_2$ is a vector of dimension $(p - m) \times 1$, and the terms to be tested involve independent variables $x_1, x_2, \ldots, x_m$. Now, given a model with the $\boldsymbol{\beta}_2$ coefficients, testing the significance of adding the $x_1, x_2, \ldots, x_m$ terms to the model amounts to testing the hypothesis $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, and an appropriate test statistic is (Montgomery and Runger, 1999)

$$F_0 = \frac{\mathrm{SSR}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2)/m}{\mathrm{MSE}} \qquad (6.31)$$

where $\mathrm{SSR}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2) = \mathrm{SSR}(\boldsymbol{\beta}) - \mathrm{SSR}(\boldsymbol{\beta}_2)$. If the calculated value of the test statistic in Equation 6.31, $f_0$, is greater than $f_{\gamma,m,n-p}$, the hypothesis $H_0$ should be rejected. In that case, at least one of the input factors $x_1, x_2, \ldots, x_m$ contributes significantly to the regression model. This significance test for a subset of the regression coefficients, also known as the partial F-test, can be used to determine the contribution of each input factor $x_i$ by treating it as the last variable to be added to the regression model, i.e., $\mathrm{SSR}(\beta_i \mid \beta_0, \beta_1, \ldots, \beta_{i-1}, \beta_{i+1}, \ldots, \beta_k)$, $i = 1, 2, \ldots, k$.
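The partial F-test of Equation 6.31 can be sketched by fitting the full model and the reduced model (the one without the tested terms) and comparing their regression sums of squares. The sketch below is illustrative only: the function names `fit_sums` and `partial_f_test` and the data are hypothetical, and it assumes that the intercept column stays in the reduced model so that Equation 6.27 can be used for both fits.

```python
import numpy as np

def fit_sums(A, y):
    """Least-squares fit of y on the columns of A; return (SSR, SSE)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    ssr = np.sum((y_hat - y.mean()) ** 2)   # Eq. 6.27
    sse = np.sum((y - y_hat) ** 2)          # Eq. 6.28
    return ssr, sse

def partial_f_test(A_full, y, tested_cols):
    """Partial F statistic for the columns listed in tested_cols.

    A_full: (n, p) model matrix INCLUDING the intercept column.
    tested_cols: indices of the m columns whose coefficients form beta_1.
    Assumption: the intercept column is not among tested_cols.
    """
    A_full = np.asarray(A_full, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = A_full.shape
    m = len(tested_cols)
    keep = [j for j in range(p) if j not in tested_cols]
    ssr_full, sse_full = fit_sums(A_full, y)          # SSR(beta)
    ssr_reduced, _ = fit_sums(A_full[:, keep], y)     # SSR(beta_2)
    extra_ss = ssr_full - ssr_reduced                 # SSR(beta_1 | beta_2)
    mse = sse_full / (n - p)                          # MSE of the full model
    f0 = (extra_ss / m) / mse                         # Eq. 6.31
    return {"extra_SS": extra_ss, "F0": f0, "df": (m, n - p)}

# Made-up data: is the quadratic term worth adding to a straight line?
x = np.arange(8, dtype=float)
y = np.array([1.1, 1.9, 5.2, 9.8, 17.1, 26.2, 36.9, 50.3])
A = np.column_stack([np.ones_like(x), x, x ** 2])     # columns: 1, x, x^2
out = partial_f_test(A, y, tested_cols=[2])
print(out)
```

Calling `partial_f_test` once per column, each time with a single index in `tested_cols`, corresponds to treating each term as the last one added to the model.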

6.6.3 LACK-OF-FIT TEST

The lack-of-fit test is used to check the integrity of the regression model and to determine whether the order of the model is correct. The statistical hypothesis for the lack-of-fit test is $H_0$: the linear regression model is correct. We can test this hypothesis by first splitting the error sum of squares into two portions, $SSE = SSE_P + SSE_L$, in which $SSE_P$ is the sum of squares due to pure error and $SSE_L$ is the sum of squares due to lack of fit of the model. To perform the lack-of-fit test, we must have at least one set of repeated observations on the response. Suppose we have q sets of repeated trials that contain $r_1, r_2, \ldots, r_q$ observations. Then, the sum of squares due to pure error can be calculated by

$$SSE_P = \sum_{i=1}^{q} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_i)^2 \qquad (6.32)$$

where

$y_{ij}$ is the jth observation in the ith set containing repeated trials

$\bar{y}_i$ stands for the average value of all $r_i$ repeat observations, and there are $n - q$ degrees of freedom for $SSE_P$

The sum of squares due to lack of fit, having $q - 2$ degrees of freedom (the value for a straight-line model with two coefficients; $q - p$ in general), can now be computed by $SSE_L = SSE - SSE_P$. Finally, the statistic for the lack-of-fit test is (Montgomery and Runger, 1999):

$$F_0 = \frac{SSE_L/(q - 2)}{SSE_P/(n - q)} = \frac{MSE_L}{MSE_P} \qquad (6.33)$$

If the computed value $f_0$ is greater than $f_{\gamma,q-2,n-q}$, the hypothesis $H_0$ should be rejected because the model does not adequately fit the data, which means the model should be abandoned and a more suitable model should be sought. On the other hand, if the computed value leads to the acceptance of the hypothesis $H_0$, the model is probably an appropriate one. However, it is a common practice to incorporate multiple tests, including those that will be discussed later, to strengthen the confidence in the adequacy of the model.
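The lack-of-fit computation in Equations 6.32 and 6.33 can be sketched for the simplest case, a straight-line model in one factor, where the lack-of-fit term has q - 2 degrees of freedom. The sketch assumes that every observation belongs to one of q distinct factor levels (so the pure-error degrees of freedom are n - q); the function name `lack_of_fit_test` and the data are hypothetical.

```python
import numpy as np

def lack_of_fit_test(x, y):
    """Lack-of-fit F statistic for a straight-line fit y = b0 + b1 * x.

    Assumption: observations sharing the same x value form one set of
    repeated trials, and q is the number of distinct x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ coef) ** 2)               # error sum of squares
    levels = np.unique(x)
    q = len(levels)
    # Eq. 6.32: squared deviations from each set's own mean
    sse_p = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
    sse_l = sse - sse_p                             # lack-of-fit sum of squares
    f0 = (sse_l / (q - 2)) / (sse_p / (n - q))      # Eq. 6.33
    return {"SSE": sse, "SSE_P": sse_p, "SSE_L": sse_l,
            "F0": f0, "df": (q - 2, n - q)}

# Made-up data: two repeats at each of five levels, with visible curvature.
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [2.1, 1.9, 4.2, 3.8, 8.9, 9.3, 16.2, 15.8, 24.9, 25.3]
lof = lack_of_fit_test(x, y)
print(lof)
```

Because the made-up responses follow a curve rather than a line, the lack-of-fit mean square is much larger than the pure-error mean square, which is the situation in which $H_0$ would be rejected.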

6.6.4 COEFFICIENTS OF MULTIPLE DETERMINATION

The coefficient of multiple determination $R^2$ (Myers and Montgomery, 1995) is a measure of the amount of predictability for the response $y$ by the fitted response model $\hat{y}$, both evaluated using the independent variables $x_1, x_2, \ldots, x_k$, and the coefficient is defined as

R2¼SSR

In document Optimization in Food Engineering (Page 149-152)