Kun-Nan Chen and Ming-Ju Chen
6.6 ADEQUACY CHECKING FOR REGRESSION MODELS
Once a response surface has been constructed from a set of design points and measured responses, the response model must be checked for adequacy before proceeding to optimization of the function. This section describes some of the most common tests for assessing the suitability of the fitted model. All of these tests may be integrated into the analysis of variance (ANOVA) to examine the adequacy of the regression model.
6.6.1 TEST FOR SIGNIFICANCE OF THE LINEAR REGRESSION MODEL
Whether a linear relationship exists between the response variable and a subset of the input factors can be determined by the test for significance of regression. This test decides whether to reject a null hypothesis stating that none of the regression variables (input factors) contributes significantly to the model, i.e., $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$. Rejection of the hypothesis leads to the conclusion that the response is linearly related to at least one of the factors. The total sum of squares (SST), regression sum of squares (SSR), and error sum of squares (SSE), respectively, are defined as
FIGURE 6.4 Contour plots for second-order response surfaces: (a) maximum surface, (b) saddle system, (c) stationary ridge, and (d) rising ridge.
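$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{6.26}$$
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \tag{6.27}$$
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6.28}$$
where
$y_i$ is the $i$th observation
$\bar{y}$ is the average value of all observations
$\hat{y}_i$ is the predicted response
$n$ is the total number of observations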
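The three sums in Equations 6.26 through 6.28 can be computed directly from the observations and the fitted values. The following is a minimal sketch, assuming a single input factor and a least-squares straight line; the helper names `fit_line` and `sums_of_squares` and the data values are hypothetical and chosen only for illustration.

```python
def fit_line(x, y):
    """Least-squares straight line y = b0 + b1*x for one input factor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) as defined in Equations 6.26 to 6.28."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return sst, ssr, sse


# Hypothetical observations (not taken from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
sst, ssr, sse = sums_of_squares(y, y_hat)
print(sst, ssr, sse)
```

The same `sums_of_squares` helper applies to any fitted model, since it needs only the observed and predicted responses.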
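The total sum of squares can be partitioned into a regression sum of squares and an error sum of squares: $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$. Further, a test statistic is defined as follows:
$$F_0 = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n - p)} = \frac{\mathrm{MSR}}{\mathrm{MSE}} \tag{6.29}$$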
where
p denotes the number of the regression coefficients
MSR and MSE are called the regression and error mean squares, respectively
Let $\gamma$ denote the significance level of the test. If the computed value of the test statistic in Equation 6.29, $f_0$, is greater than $f_{\gamma,k,n-p}$, whose value may be looked up in tables of the $F$ distribution or calculated by computer software, the hypothesis should be rejected, and a linear relationship between the response function and at least one of the input factors is confirmed.
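The decision rule based on Equation 6.29 can be sketched in a few lines. This is an illustration under stated assumptions: the function name `regression_f_statistic` is hypothetical, the sums of squares are made-up values for a model with one factor and an intercept, and the critical value is entered by hand from an $F$ table rather than computed.

```python
def regression_f_statistic(ssr, sse, n, k, p):
    """Test statistic of Equation 6.29: F0 = (SSR/k) / (SSE/(n - p))."""
    msr = ssr / k          # regression mean square
    mse = sse / (n - p)    # error mean square
    return msr / mse


# Hypothetical values: n = 5 observations, k = 1 factor, p = 2 coefficients.
ssr, sse = 39.601, 0.107
n, k, p = 5, 1, 2

f0 = regression_f_statistic(ssr, sse, n, k, p)

# Tabulated upper 5% point of the F distribution with (1, 3) degrees of
# freedom, entered by hand (assumption: significance level 0.05).
f_critical = 10.13

reject_h0 = f0 > f_critical
print(f0, reject_h0)
```

Passing the critical value in as a number keeps the sketch free of external dependencies; statistical software can supply it instead of a table.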
6.6.2 TEST FOR SIGNIFICANCE ON SUBSETS OF THE REGRESSION COEFFICIENTS
Passing the test for significance of the full regression model described above is often not enough to conclude that the model is appropriate. Tests on the individual terms are required to determine the significance of the linear, interaction, quadratic, and other terms in the regression model. Recall that a response function with nonlinear terms in the input factors can always be rewritten in the form of a standard linear regression model. The coefficient vector $\boldsymbol{\beta}$ in Equation 6.12 can be partitioned into two parts as follows:
$$\boldsymbol{\beta} = [\boldsymbol{\beta}_1 \;\; \boldsymbol{\beta}_2]^T \tag{6.30}$$
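where
$\boldsymbol{\beta}_1$ contains the coefficients of the terms to be tested for significance
Suppose $\boldsymbol{\beta}_1$ is a vector of dimension $m \times 1$; then $\boldsymbol{\beta}_2$ is a vector of dimension $(p - m) \times 1$, and the terms to be tested involve the independent variables $x_1, x_2, \ldots, x_m$. Now, given a model with the $\boldsymbol{\beta}_2$ coefficients, testing the significance of adding the $x_1, x_2, \ldots, x_m$ terms to the model amounts to testing the hypothesis $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, and an appropriate test statistic is (Montgomery and Runger, 1999)
$$F_0 = \frac{\mathrm{SSR}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2)/m}{\mathrm{MSE}} \tag{6.31}$$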
where $\mathrm{SSR}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2) = \mathrm{SSR}(\boldsymbol{\beta}) - \mathrm{SSR}(\boldsymbol{\beta}_2)$. If the calculated value of the test statistic in Equation 6.31, $f_0$, is greater than $f_{\gamma,m,n-p}$, the hypothesis $H_0$ should be rejected. As a result, at least one of the input factors $x_1, x_2, \ldots, x_m$ contributes significantly to the regression model. This significance test for a subset of the regression coefficients, also known as the partial $F$-test, can be used to determine the contribution of each input factor $x_i$ by treating it as the last variable to be added to the regression model, i.e., by computing $\mathrm{SSR}(\beta_i \mid \beta_0, \beta_1, \ldots, \beta_{i-1}, \beta_{i+1}, \ldots, \beta_k)$, $i = 1, 2, \ldots, k$.
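The partial $F$-test of Equation 6.31 can be illustrated by fitting the full and reduced models and differencing their regression sums of squares. The sketch below is one possible arrangement, not the authors' implementation: it assumes every model contains an intercept, solves the normal equations with a small hand-written Gaussian elimination, and uses hypothetical names (`solve`, `fit_sums`, `partial_f`) and made-up data in which the response depends on `x1` but not on `x2`.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sums(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; return (SSR, SSE)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    y_hat = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ssr, sse


def partial_f(tested, retained, y):
    """Equation 6.31: F0 = [SSR(full) - SSR(reduced)] / m / MSE(full)."""
    n = len(y)
    m = len(tested)
    p = 1 + len(tested) + len(retained)   # intercept counted as a coefficient
    ssr_full, sse_full = fit_sums(retained + tested, y)
    ssr_reduced, _ = fit_sums(retained, y)
    mse = sse_full / (n - p)
    return ((ssr_full - ssr_reduced) / m) / mse


# Hypothetical data: y depends on x1; x2 is an unrelated factor.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.1, -0.2, 0.15, -0.05, 0.05, 0.1, -0.15, 0.0]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]

# Each factor is treated as the last variable added to the model.
f_x1 = partial_f(tested=[x1], retained=[x2], y=y)
f_x2 = partial_f(tested=[x2], retained=[x1], y=y)
print(f_x1, f_x2)
```

Each statistic would be compared with $f_{\gamma,1,n-p}$; for these data $n - p = 5$, and the tabulated 5% value for (1, 5) degrees of freedom is 6.61.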
6.6.3 LACK-OF-FIT TEST
The lack-of-fit test is used to check the integrity of the regression model and to determine whether the order of the model is correct. The statistical hypothesis for the lack-of-fit test is $H_0$: the linear regression model is correct. This hypothesis can be tested by first splitting the error sum of squares into two portions: $\mathrm{SSE} = \mathrm{SSE}_P + \mathrm{SSE}_L$, in which $\mathrm{SSE}_P$ is the sum of squares due to pure error and $\mathrm{SSE}_L$ is the sum of squares due to lack of fit of the model. To perform the lack-of-fit test, we must have at least one set of repeated observations on the response. Suppose we have $q$ sets of repeated trials that contain $r_1, r_2, \ldots, r_q$ observations. Then, the sum of squares due to pure error can be calculated by
$$\mathrm{SSE}_P = \sum_{i=1}^{q} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_i)^2 \tag{6.32}$$
where
$y_{ij}$ is the $j$th observation in the $i$th set containing repeated trials
$\bar{y}_i$ stands for the average value of all $r_i$ repeat observations, and there are $n - q$ degrees of freedom for $\mathrm{SSE}_P$
The sum of squares due to lack of fit, having $q - 2$ degrees of freedom, can now be computed by $\mathrm{SSE}_L = \mathrm{SSE} - \mathrm{SSE}_P$. Finally, the statistic for the lack-of-fit test is (Montgomery and Runger, 1999):
$$F_0 = \frac{\mathrm{SSE}_L/(q - 2)}{\mathrm{SSE}_P/(n - q)} = \frac{\mathrm{MSE}_L}{\mathrm{MSE}_P} \tag{6.33}$$
If the computed value $f_0$ is greater than $f_{\gamma,q-2,n-q}$, the hypothesis $H_0$ should be rejected: the model does not adequately fit the data, which means the model should be abandoned and a more suitable model should be sought. On the other hand, if the computed value does not lead to rejection of the hypothesis $H_0$, the model is probably an appropriate one. However, it is common practice to incorporate multiple tests, including those discussed later, to strengthen confidence in the adequacy of the model.
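The lack-of-fit computation in Equations 6.32 and 6.33 can be sketched as follows. The example assumes a straight-line model in one factor, with every observation belonging to one of the $q$ groups of repeated trials, so that the degrees of freedom $q - 2$ and $n - q$ from the text apply as written. The function name `lack_of_fit` and the data, which follow a roughly quadratic trend so that a straight line fits poorly, are hypothetical.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Return (SSE, SSE_P, SSE_L, F0) for a straight-line fit of y on x."""
    n = len(y)

    # Least-squares straight line and its error sum of squares (Equation 6.28).
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group the repeated observations taken at the same factor setting.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    q = len(groups)

    # Pure-error sum of squares (Equation 6.32).
    ssep = 0.0
    for obs in groups.values():
        mean = sum(obs) / len(obs)
        ssep += sum((yi - mean) ** 2 for yi in obs)

    # Lack-of-fit sum of squares and the statistic of Equation 6.33.
    ssel = sse - ssep
    f0 = (ssel / (q - 2)) / (ssep / (n - q))
    return sse, ssep, ssel, f0


# Hypothetical data: two repeated trials at each of four settings.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
y = [1.1, 0.9, 4.1, 3.9, 9.2, 8.8, 16.1, 15.9]

sse, ssep, ssel, f0 = lack_of_fit(x, y)

# Tabulated upper 5% point of F with (2, 4) degrees of freedom, entered by hand.
f_critical = 6.94
print(f0, f0 > f_critical)
```

For these data the statistic far exceeds the tabulated value, so the straight-line hypothesis would be rejected and a higher-order model would be the natural next candidate.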
6.6.4 COEFFICIENTS OF MULTIPLE DETERMINATION
The coefficient of multiple determination $R^2$ (Myers and Montgomery, 1995) is a measure of the amount of predictability for the response $y$ by the fitted response model $\hat{y}$, both evaluated using the independent variables $x_1, x_2, \ldots, x_k$, and the coefficient is defined as
R2¼SSR