A Test for Lack of Fit of a Linear Multiresponse Model


A. I. Khuri Department of Statistics University of Florida Gainesville, FL 32611

This article introduces a multivariate test for lack of fit of a linear multiresponse model. The test requires that replicated observations be taken at some points in the experimental region. A procedure is presented for detecting which responses are responsible for lack of fit whenever the multivariate test is significant. An example is given to illustrate the procedure.

KEY WORDS: Multivariate lack-of-fit test; Union-intersection principle; Roy’s largest-root test; Subset selection.

1. INTRODUCTION

The problem of lack of fit of a postulated model has traditionally been investigated in connection with single, or univariate, linear response models (e.g., see Draper and Herzberg 1971, Draper and Smith 1981, Box and Draper 1982, and Shelton et al. 1983). Little attention has been given to this problem in situations involving the fitting of a multiresponse model when observations on several response variables are available. It has been customary in such situations to check the adequacy of fit of each response model individually and apart from the remaining responses. Since the responses can be correlated, lack of fit in one response may influence the fit of the other responses. Hence, the assumption that the responses can be analyzed independently, which is implicit in the approach of several univariate analyses, is not correct. A more appropriate approach would be to use multivariate techniques to comprehensively assess the adequacy of the multiresponse model in a manner that is compatible with the multivariate character of the response data. Box and Draper (1965) alluded to the usefulness of such an approach in conjunction with their determinant criterion for the Bayesian estimation of common parameters from several responses.

In this article I present a multivariate test for lack of fit of a linear multiresponse model. The test, which is given in Section 2, is based on Roy’s union-intersection principle (Morrison 1976, chaps. 4 and 5) and provides a new contribution to the methods used for multiresponse models. We shall assume that replicated observations on all the responses are available at some points in the experimental region. We shall also assume that all of the responses under investigation can be represented by polynomial regression models not necessarily of the same degree or makeup (in terms of the input variables that influence the responses). This allows for the possibility of having different design matrices for the responses as in McDonald (1975); he used a similar approach in his development of tests for the general linear hypothesis under a linear multiresponse model. In Section 3 a method is proposed for the selection of subsets of the responses that are influential contributors to lack of fit whenever the multivariate test is significant. A numerical example is given in Section 4.

2. A MULTIVARIATE LACK-OF-FIT TEST

Let $y_1, y_2, \ldots, y_r$ be a set of response variables that can be measured at each of $N$ settings of a group of $k$ input variables, $x_1, x_2, \ldots, x_k$. Consider the models

$$E(\mathbf{y}_i) = \mathbf{X}_i \boldsymbol{\beta}_i, \qquad i = 1, 2, \ldots, r, \tag{2.1}$$

where $\mathbf{y}_i$ is an $N \times 1$ vector of observations on the $i$th response, $E(\mathbf{y}_i)$ denotes the mean vector of $\mathbf{y}_i$, $\mathbf{X}_i$ is an $N \times p_i$ matrix of rank $p_i$ whose elements are known polynomial functions (squares, cubes, mixed products, etc.) of the settings of the input variables, and $\boldsymbol{\beta}_i$ is a $p_i \times 1$ vector of unknown constant parameters. A multivariate formulation of the models given in (2.1) is

$$E(\mathbf{Y}) = \mathbf{X}\mathbf{B}, \tag{2.2}$$

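where $\mathbf{Y} = [\mathbf{y}_1 : \mathbf{y}_2 : \cdots : \mathbf{y}_r]$ is the $N \times r$ matrix of observations, $\mathbf{X} = [\mathbf{X}_1 : \mathbf{X}_2 : \cdots : \mathbf{X}_r]$ is a matrix of rank $p$, and $\mathbf{B} = \operatorname{diag}(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_r)$ is a block-diagonal matrix. We assume that the rows of $\mathbf{Y}$ are independent observations from multivariate normal populations with a common nonsingular variance-covariance matrix $\boldsymbol{\Sigma}$ of order $r \times r$. We also assume that the design used to fit the models in (2.1) allows for repeated runs to be taken on all $r$ responses at some points in the experimental region. Without loss of generality, we consider that the repeated runs are taken at each of the first $n$ points of the design ($1 \le n < N$).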

Model (2.2) was called the multiple-design multivariate linear model by Srivastava (1967) (also see Roy and Srivastava 1964 and McDonald 1975). In this article I refer to it as a linear multiresponse model.

The Development of the Multivariate Lack-of-Fit Test

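Let $\mathbf{c} = [c_1, c_2, \ldots, c_r]'$ be an arbitrary nonzero $r \times 1$ vector. From (2.2) we obtain the model

$$E(\mathbf{y}_c) = \mathbf{X}\boldsymbol{\beta}_c, \tag{2.3}$$

where $\mathbf{y}_c = \mathbf{Y}\mathbf{c}$ is a vector of $N$ observations on the univariate response $y_c = \sum_{i=1}^{r} c_i y_i$, and $\boldsymbol{\beta}_c = \mathbf{B}\mathbf{c}$. Note that $\mathbf{y}_c$, being a linear combination of normally distributed random vectors, has the multivariate normal distribution with a variance-covariance matrix given by

$$\operatorname{var}(\mathbf{y}_c) = \sigma_c^2 \mathbf{I}_N, \tag{2.4}$$

where $\sigma_c^2 = \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c}$ and $\mathbf{I}_N$ is the identity matrix of order $N \times N$.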

Clearly, the multiresponse model (2.2) is correct if and only if the univariate models (2.3) are correct for all $\mathbf{c} \neq \mathbf{0}$. Hence testing for lack of fit of the model in (2.2) is equivalent to testing for lack of fit of the models in (2.3) that correspond to all nonzero values of the vector $\mathbf{c}$. But $\mathbf{y}_c$ has the variance-covariance matrix given in (2.4) and, by the choice of design, contains replicated observations on the univariate response $y_c$ at each of the first $n$ design points. Hence the usual lack-of-fit test described in Draper and Smith (1981, chap. 2) can be applied to model (2.3). This test is based on the partitioning of the residual sum of squares, denoted by $SS_E(\mathbf{c})$, from the analysis of the fitted model (2.3) into two sums of squares: $SS_{LF}(\mathbf{c})$, the sum of squares due to lack of fit of the model, and $SS_{PE}(\mathbf{c})$, the sum of squares due to pure error. Since $\mathbf{X}$ is in general a less than full column-rank matrix, the residual sum of squares is given by the formula

$$SS_E(\mathbf{c}) = \mathbf{c}'\mathbf{Y}'[\mathbf{I}_N - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}']\mathbf{Y}\mathbf{c}, \tag{2.5}$$

where $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse of $\mathbf{X}'\mathbf{X}$ (e.g., see Searle 1971, p. 170). Note that $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$ is invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$ and is equal to $\mathbf{X}_0(\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0'$, where $\mathbf{X}_0$ is a matrix of order $N \times p$ and rank $p$, whose columns form a basis for the column space of $\mathbf{X}$, which, if we recall, is of rank $p$ (see McDonald 1975, p. 462). Formula (2.5) can then be written as

$$SS_E(\mathbf{c}) = \mathbf{c}'\mathbf{Y}'[\mathbf{I}_N - \mathbf{X}_0(\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0']\mathbf{Y}\mathbf{c}. \tag{2.6}$$

The pure-error sum of squares, $SS_{PE}(\mathbf{c})$, is calculated by using the replicated observations on the univariate response $y_c$. If $\nu_i$ repeated observations are taken at the $i$th design point ($i = 1, 2, \ldots, n$), then $SS_{PE}(\mathbf{c})$ can be expressed as

$$SS_{PE}(\mathbf{c}) = \mathbf{c}'\mathbf{Y}'\mathbf{K}\mathbf{Y}\mathbf{c}, \tag{2.7}$$

where

$$\mathbf{K} = \operatorname{diag}(\mathbf{K}_1, \mathbf{K}_2, \ldots, \mathbf{K}_n, \mathbf{0}) \tag{2.8}$$

is a block-diagonal matrix of order $N \times N$, with $\mathbf{0}$ being a zero matrix of order $(N - \sum_{i=1}^{n} \nu_i) \times (N - \sum_{i=1}^{n} \nu_i)$, and

$$\mathbf{K}_i = \mathbf{I}_{\nu_i} - (1/\nu_i)\mathbf{J}_{\nu_i}, \qquad i = 1, 2, \ldots, n, \tag{2.9}$$

where $\mathbf{I}_{\nu_i}$ and $\mathbf{J}_{\nu_i}$ are the identity matrix and the matrix of ones, respectively, both of order $\nu_i \times \nu_i$. The use of the $\mathbf{K}_i$ matrices in (2.7) amounts to computing the sum of the squared deviations of the $\nu_i$ observations at the $i$th repeat-runs site ($i = 1, 2, \ldots, n$) from their mean. By pooling the sums of squares from all of the repeat-runs sites, we obtain the pure-error sum of squares. The number of degrees of freedom for the pure-error sum of squares, denoted by $\nu_{PE}$, is thus equal to

$$\nu_{PE} = \sum_{i=1}^{n} (\nu_i - 1). \tag{2.10}$$

The lack-of-fit sum of squares, $SS_{LF}(\mathbf{c})$, is obtained by subtracting $SS_{PE}(\mathbf{c})$ from $SS_E(\mathbf{c})$ and can be written as

$$SS_{LF}(\mathbf{c}) = \mathbf{c}'\mathbf{Y}'[\mathbf{I}_N - \mathbf{X}_0(\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0' - \mathbf{K}]\mathbf{Y}\mathbf{c}. \tag{2.11}$$

Since $SS_E(\mathbf{c})$ has $N - p$ degrees of freedom (Searle 1971, p. 176), the number of degrees of freedom for the lack-of-fit sum of squares, denoted by $\nu_{LF}$, is given by

$$\nu_{LF} = N - p - \sum_{i=1}^{n} (\nu_i - 1). \tag{2.12}$$
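As a small illustration of (2.10) and (2.12), the following Python sketch computes the two degrees-of-freedom counts from the number of runs, the rank of the design matrix, and the replicate counts. The function name `lack_of_fit_df` is hypothetical; the numbers in the usage lines are those of the example in Section 4.

```python
def lack_of_fit_df(n_runs, rank_x, replicate_counts):
    """Return (nu_PE, nu_LF) following (2.10) and (2.12).

    n_runs           -- N, the total number of runs
    rank_x           -- p, the rank of the design matrix X
    replicate_counts -- the numbers of repeated observations at the repeat-runs sites
    """
    nu_pe = sum(v - 1 for v in replicate_counts)
    nu_lf = n_runs - rank_x - nu_pe
    return nu_pe, nu_lf


# Setting of Section 4: N = 31 runs, p = 21, one replicated site with 5 runs.
nu_pe, nu_lf = lack_of_fit_df(31, 21, [5])
print(nu_pe, nu_lf)  # 4 6
```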

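I shall denote the matrices of the quadratic forms in (2.11) and (2.7) by $\mathbf{G}_1$ and $\mathbf{G}_2$ and label them as the lack-of-fit and pure-error matrices, respectively. Thus,

$$\mathbf{G}_1 = \mathbf{Y}'[\mathbf{I}_N - \mathbf{X}_0(\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0' - \mathbf{K}]\mathbf{Y} \tag{2.13}$$

and

$$\mathbf{G}_2 = \mathbf{Y}'\mathbf{K}\mathbf{Y}. \tag{2.14}$$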
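The matrices in (2.8), (2.13), and (2.14) can be assembled directly. The following is a minimal sketch that assumes NumPy; it is not the author's code, and the function names and toy data are hypothetical. Instead of requiring the replicated runs to occupy the first rows, it takes the row indices of each repeat-runs site, which gives the same $\mathbf{K}$ up to a reordering of the runs. The projector $\mathbf{X}_0(\mathbf{X}_0'\mathbf{X}_0)^{-1}\mathbf{X}_0'$ is obtained through the pseudoinverse, so $\mathbf{X}$ itself may be rank deficient.

```python
import numpy as np


def pure_error_projector(n_total, replicate_groups):
    """Build K of (2.8)-(2.9); replicate_groups lists the row indices of each site."""
    K = np.zeros((n_total, n_total))
    for rows in replicate_groups:
        v = len(rows)
        # K_i = I - (1/v) J centers the v observations of one site about their mean.
        K[np.ix_(rows, rows)] = np.eye(v) - np.ones((v, v)) / v
    return K


def lack_of_fit_matrices(Y, X, replicate_groups):
    """Return (G1, G2) of (2.13) and (2.14) for an N x r response matrix Y."""
    n_total = Y.shape[0]
    K = pure_error_projector(n_total, replicate_groups)
    # X @ pinv(X) is the orthogonal projector onto the column space of X,
    # which equals X0 (X0'X0)^{-1} X0' for any basis X0 of that space.
    P = X @ np.linalg.pinv(X)
    G1 = Y.T @ (np.eye(n_total) - P - K) @ Y
    G2 = Y.T @ K @ Y
    return G1, G2


# Toy data (hypothetical): a straight line fitted to two responses,
# each observed twice at x = 0, 1, 2.
x = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
Y = np.array([[1.0, 0.1], [1.2, -0.1], [2.9, 0.9],
              [3.1, 1.1], [5.1, 4.0], [4.9, 4.2]])
G1, G2 = lack_of_fit_matrices(Y, X, [[0, 1], [2, 3], [4, 5]])
print(np.round(G1, 4))
print(np.round(G2, 4))
```

In the toy data the second response is roughly quadratic in $x$, so its diagonal entry in $\mathbf{G}_1$ is large relative to its entry in $\mathbf{G}_2$, whereas the first response is nearly linear.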

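For a given $\mathbf{c} \neq \mathbf{0}$, the usual lack-of-fit test of model (2.3), at a level of significance $\alpha$, is to compare the ratio

$$F(\mathbf{c}) = \frac{SS_{LF}(\mathbf{c})/\nu_{LF}}{SS_{PE}(\mathbf{c})/\nu_{PE}} \tag{2.15}$$

with $F_{\alpha, \nu_{LF}, \nu_{PE}}$, the upper $100\alpha\%$ point of the central $F$ distribution with $\nu_{LF}$ and $\nu_{PE}$ degrees of freedom (see Draper and Smith 1981). Lack of fit is declared to be significant if $F(\mathbf{c}) \ge F_{\alpha, \nu_{LF}, \nu_{PE}}$, or if

$$\frac{\mathbf{c}'\mathbf{G}_1\mathbf{c}}{\mathbf{c}'\mathbf{G}_2\mathbf{c}} \ge \frac{\nu_{LF}}{\nu_{PE}} F_{\alpha, \nu_{LF}, \nu_{PE}}, \tag{2.16}$$

that is, for large values of the statistic $\mathbf{c}'\mathbf{G}_1\mathbf{c}/\mathbf{c}'\mathbf{G}_2\mathbf{c}$. On the other hand, small values of this statistic give no reason to doubt the adequacy of the model.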
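The ratio in (2.15) depends on the data only through the two quadratic forms $\mathbf{c}'\mathbf{G}_1\mathbf{c}$ and $\mathbf{c}'\mathbf{G}_2\mathbf{c}$. The sketch below, in plain Python, evaluates $F(\mathbf{c})$ for a chosen $\mathbf{c}$. The helper names are hypothetical, and the matrices and degrees of freedom are the values reported for the example of Section 4 (Table 3), used here only as convenient inputs.

```python
def quadratic_form(c, G):
    """Return c'Gc for a vector c and a square matrix G given as nested lists."""
    r = len(c)
    return sum(c[i] * G[i][j] * c[j] for i in range(r) for j in range(r))


def lack_of_fit_F(c, G1, G2, nu_lf, nu_pe):
    """F(c) of (2.15), using SS_LF(c) = c'G1c and SS_PE(c) = c'G2c."""
    return (quadratic_form(c, G1) / nu_lf) / (quadratic_form(c, G2) / nu_pe)


# Lack-of-fit and pure-error matrices as reported in Table 3 (Section 4).
G1 = [[7.4682, 165.124, -47.2333],
      [165.124, 67825.0, 1248.49],
      [-47.2333, 1248.49, 535.706]]
G2 = [[0.5, -37.5, 0.5],
      [-37.5, 3465.2, -14.6],
      [0.5, -14.6, 18.8]]
nu_lf, nu_pe = 6, 4

F_y1 = lack_of_fit_F([1, 0, 0], G1, G2, nu_lf, nu_pe)   # first response alone
F_sum = lack_of_fit_F([1, 1, 0], G1, G2, nu_lf, nu_pe)  # the combination y1 + y2
print(round(F_y1, 3), round(F_sum, 3))
```

Because both quadratic forms scale with the square of $\mathbf{c}$, the value of $F(\mathbf{c})$ is unchanged when $\mathbf{c}$ is multiplied by a nonzero constant; only the direction of $\mathbf{c}$ matters.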

The construction of a multivariate lack-of-fit test for the multiresponse model (2.2) is based on a general principle known as Roy’s union-intersection principle (see Morrison 1976, chaps. 4 and 5) and can be described as follows: The multiresponse model (2.2) is correct if and only if the univariate models (2.3) are correct for all nonzero vectors $\mathbf{c}$. It follows that model (2.2) can be declared to be inadequate if at least one of the models in (2.3) is deemed inadequate according to inequality (2.16) for some $\mathbf{c} \neq \mathbf{0}$. This occurs when $\max_{\mathbf{c}} (\mathbf{c}'\mathbf{G}_1\mathbf{c}/\mathbf{c}'\mathbf{G}_2\mathbf{c})$ is large and exceeds a certain critical value. But

$$\max_{\mathbf{c}} \frac{\mathbf{c}'\mathbf{G}_1\mathbf{c}}{\mathbf{c}'\mathbf{G}_2\mathbf{c}} = e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1),$$

where $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)$ denotes the largest eigenvalue of the $r \times r$ matrix $\mathbf{G}_2^{-1}\mathbf{G}_1$ (see Morrison 1976, chap. 5, and Roy et al. 1971, chap. 4). If $\lambda_\alpha$ denotes the upper $100\alpha\%$ point of the distribution of $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)$ when model (2.2) is correct, then a significant lack of fit can be detected at the $\alpha$ level if

$$e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) \ge \lambda_\alpha. \tag{2.17}$$

The lack-of-fit test given by inequality (2.17) is a multivariate analog of the usual lack-of-fit test for a single response and is known as Roy’s largest-root test. Note that since the rank of the matrix $\mathbf{K}$ in formula (2.14) is $\nu_{PE}$, the pure-error degrees of freedom, the $r \times r$ matrix $\mathbf{G}_2$ will be positive definite, hence nonsingular, with probability 1 if $r \le \nu_{PE}$ (see Roy et al. 1971, p. 35, and McDonald 1975, p. 463). Tables for the critical value $\lambda_\alpha$ are available in Roy et al. (1971) and Morrison (1976); those for two and three responses are in Foster and Rees (1957) and Foster (1957), respectively.
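The identity between the maximized ratio and the largest eigenvalue can be checked numerically. The sketch below assumes NumPy and uses the matrices reported in Table 3 of Section 4; the function name `largest_root` is hypothetical. It solves the eigenproblem of $\mathbf{G}_2^{-1}\mathbf{G}_1$ without forming the inverse explicitly and then evaluates the ratio at the eigenvector and at a few arbitrary vectors.

```python
import numpy as np

# Lack-of-fit and pure-error matrices as reported in Table 3 (Section 4).
G1 = np.array([[7.4682, 165.124, -47.2333],
               [165.124, 67825.0, 1248.49],
               [-47.2333, 1248.49, 535.706]])
G2 = np.array([[0.5, -37.5, 0.5],
               [-37.5, 3465.2, -14.6],
               [0.5, -14.6, 18.8]])


def largest_root(G1, G2):
    """Return (e_max, c_star): the largest eigenvalue of G2^{-1} G1 and its eigenvector."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(G2, G1))
    k = int(np.argmax(eigvals.real))
    return float(eigvals[k].real), eigvecs[:, k].real


def ratio(c):
    """The statistic c'G1c / c'G2c of (2.16)."""
    return float(c @ G1 @ c) / float(c @ G2 @ c)


e_max, c_star = largest_root(G1, G2)
print(round(e_max, 3))          # compare with the value reported in Section 4
print(round(ratio(c_star), 3))  # the ratio attains e_max at the eigenvector
for c in ([1, 0, 0], [0, 1, 0], [1, 1, 1], [1, -1, 2]):
    print(c, round(ratio(np.array(c, dtype=float)), 3))  # never exceeds e_max
```

The eigenvector returned by NumPy is normalized to unit euclidean length, so it may differ from a published eigenvector by a scale factor and a sign; the ratio is unaffected by either.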

In addition to Roy’s largest-root test, other multivariate test statistics could have been used to test lack of fit of model (2.2). These include Wilks’s likelihood ratio, $W = \det(\mathbf{G}_2)/\det(\mathbf{G}_1 + \mathbf{G}_2)$; Pillai’s trace, $\operatorname{tr}[\mathbf{G}_1(\mathbf{G}_1 + \mathbf{G}_2)^{-1}]$; and Hotelling-Lawley’s trace, $\operatorname{tr}(\mathbf{G}_2^{-1}\mathbf{G}_1)$. For a detailed discussion concerning these tests, the interested reader is referred to Muirhead (1982, chap. 10). Roy’s largest-root test is preferred in this article because, unlike the latter three tests, it is conducive to the method of selecting subsets of the responses contributing to lack of fit, as will be seen in the next section.
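All four statistics are functions of the eigenvalues of $\mathbf{G}_2^{-1}\mathbf{G}_1$, which the following NumPy sketch makes explicit. The function name is hypothetical, the input matrices are those reported in Table 3 of Section 4, and no critical values are computed here.

```python
import numpy as np


def multivariate_statistics(G1, G2):
    """Roy, Wilks, Pillai, and Hotelling-Lawley statistics computed from G1 and G2."""
    lam = np.linalg.eigvals(np.linalg.solve(G2, G1)).real
    return {
        "roy": float(lam.max()),
        "wilks": float(np.linalg.det(G2) / np.linalg.det(G1 + G2)),
        "pillai": float(np.trace(G1 @ np.linalg.inv(G1 + G2))),
        "hotelling_lawley": float(np.trace(np.linalg.solve(G2, G1))),
    }


# Lack-of-fit and pure-error matrices as reported in Table 3 (Section 4).
G1 = np.array([[7.4682, 165.124, -47.2333],
               [165.124, 67825.0, 1248.49],
               [-47.2333, 1248.49, 535.706]])
G2 = np.array([[0.5, -37.5, 0.5],
               [-37.5, 3465.2, -14.6],
               [0.5, -14.6, 18.8]])

stats = multivariate_statistics(G1, G2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Writing $\ell_1, \ldots, \ell_r$ for the eigenvalues of $\mathbf{G}_2^{-1}\mathbf{G}_1$, the statistics satisfy $W = \prod_i 1/(1 + \ell_i)$, Pillai's trace $= \sum_i \ell_i/(1 + \ell_i)$, and Hotelling-Lawley's trace $= \sum_i \ell_i$, whereas Roy's statistic uses only the largest $\ell_i$.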

3. THE SELECTION OF SUBSETS OF THE RESPONSES CONTRIBUTING TO LACK OF FIT

The mere significance of the multivariate lack-of-fit test given in (2.17) does not indicate which responses have contributed to lack of fit. This is a problem of multiple comparisons, as in the analysis of variance, but with respect to responses rather than treatments.

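One approach to this problem would be to inspect the eigenvector of $\mathbf{G}_2^{-1}\mathbf{G}_1$ associated with its largest eigenvalue, $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)$. If this vector is denoted by $\mathbf{c}^* = [c_1^*, c_2^*, \ldots, c_r^*]'$, then

$$\max_{\mathbf{c}} \frac{\mathbf{c}'\mathbf{G}_1\mathbf{c}}{\mathbf{c}'\mathbf{G}_2\mathbf{c}} = e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) = \frac{\mathbf{c}^{*\prime}\mathbf{G}_1\mathbf{c}^*}{\mathbf{c}^{*\prime}\mathbf{G}_2\mathbf{c}^*}. \tag{3.1}$$

The second equality in (3.1) is true because the optimal vector $\mathbf{c}^*$ must satisfy the equation

$$(\mathbf{G}_1 - e_{\max}\mathbf{G}_2)\mathbf{c}^* = \mathbf{0}, \tag{3.2}$$

where $e_{\max}$ in (3.2) is an abbreviation for $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)$ (see Roy 1957, p. 142). The elements of $\mathbf{c}^*$ produce the linear combination

$$\mathbf{y}_{c^*} = \sum_{i=1}^{r} c_i^* \mathbf{y}_i, \tag{3.3}$$

which consists of responses believed to have some influence on lack of fit. Since the responses may be measured in different units, the linear combination in (3.3) can be expressed in terms of standardized response variables as

$$\mathbf{y}_{c^*} = \sum_{i=1}^{r} d_i^* \mathbf{z}_i, \tag{3.4}$$

where $\mathbf{z}_i = \mathbf{y}_i/\|\mathbf{y}_i\|$ with $\|\mathbf{y}_i\|$ being the euclidean norm $(\mathbf{y}_i'\mathbf{y}_i)^{1/2}$ of the vector $\mathbf{y}_i$ of observations on the $i$th response, and $d_i^* = c_i^* \|\mathbf{y}_i\|$ ($i = 1, 2, \ldots, r$).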
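The rescaling in (3.4) is a one-line computation. As an illustration, the plain-Python sketch below applies it to the eigenvector and the response norms reported in Section 4 and then divides by the largest absolute coefficient so that the relative sizes are easy to read. The variable names are hypothetical.

```python
# Eigenvector c* and euclidean norms of y1, y2, y3 as reported in Section 4.
c_star = [3.2659, 0.0385, -0.0904]
norms = [27.60, 5929.27, 517.49]

# d_i* = c_i* ||y_i||, the coefficients of the standardized responses in (3.4).
d_star = [c * n for c, n in zip(c_star, norms)]

# Rescale so that the largest absolute coefficient equals 1.
largest = max(abs(d) for d in d_star)
relative = [d / largest for d in d_star]

print([round(d, 3) for d in d_star])
print([round(d, 3) for d in relative])
```

Although $c_1^*$ is by far the largest element of $\mathbf{c}^*$, the first response has the smallest norm, so after standardization it is the second response that carries the largest coefficient.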

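If the $i$th response variable does not contribute significantly to lack of fit, then the coefficient $d_i^*$ in (3.4) is expected to be close to zero. Consequently, large absolute values of the $d_i^*$ coefficients are associated with responses suggested to be influential with respect to lack of fit. These responses may be selected for further investigation to determine if they suffice by themselves to produce a significant lack of fit. More specifically, let $S = \{y_{i_1}, y_{i_2}, \ldots, y_{i_s}\}$ be a nonempty subset of responses ($1 \le i_1 < i_2 < \cdots < i_s \le r$). Let us compute $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)$ after removing all those responses that do not belong to $S$, that is, after deleting the corresponding rows and columns of $\mathbf{G}_1$ and $\mathbf{G}_2$, and denote the resulting value by $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$. This amounts to maximizing the ratio $\mathbf{c}'\mathbf{G}_1\mathbf{c}/\mathbf{c}'\mathbf{G}_2\mathbf{c}$ over only those vectors $\mathbf{c}$ whose elements are zero for the responses outside $S$. It follows that for any nonempty subset $S$ of the $r$ responses, we must have

$$e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) \ge e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S. \tag{3.5}$$

The subset $S$ would then suffice to produce a significant lack of fit, just like the entire set of responses, if

$$e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S \ge \lambda_\alpha. \tag{3.6}$$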

It is easy to see that whenever (3.6) holds for a nonempty subset $S$, inequality (2.17) holds also. Consequently, if $A$ and $A_S$ are the events

$$A = \{e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) \ge \lambda_\alpha\}, \qquad A_S = \{e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S \ge \lambda_\alpha\},$$

then $A_S$ will be contained in $A$ for all nonempty subsets $S$ of the $r$ responses. Hence, the union $\bigcup_S A_S$ of all events such as $A_S$ will also be contained in $A$. It follows that

$$P\Bigl(\bigcup_S A_S\Bigr) \le P(A) = \alpha. \tag{3.7}$$

Inequality (3.7) states that when all nonempty subsets of the $r$ responses are examined, the probability of falsely detecting a significant subset cannot exceed the value $\alpha$. Thus the use of the same critical value $\lambda_\alpha$ in these simultaneous tests of significance helps control the Type I family error rate at a level not exceeding the value $\alpha$.

In summary, an examination of the elements of the eigenvector corresponding to the largest eigenvalue of $\mathbf{G}_2^{-1}\mathbf{G}_1$, combined with an inspection of the values of $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$, can be helpful in pinpointing subsets of the responses that contribute significantly to lack of fit.

4. A NUMERICAL EXAMPLE

Richert et al. (1974) investigated the effects of heating temperature ($x_1$), pH level ($x_2$), redox potential ($x_3$), sodium oxalate ($x_4$), and sodium lauryl sulfate ($x_5$) on foaming properties of whey protein concentrates (WPC). These products are of considerable interest to the food industry because of their potential value as functional food ingredients. Measurements were made on three responses, namely, $y_1$ = whipping time (the total elapsed time required to produce peaks of foam formed during whipping of a liquid sample), $y_2$ = maximum overrun [overrun is determined by weighing 5-oz. paper cups of foam and unwhipped liquid sample and calculating by the following expression: % overrun = 100 (weight of liquid - weight of foam)/weight of foam], and $y_3$ = percent soluble protein. The design used was a central composite design that consisted of a 1/2 fraction of a $2^5$ factorial design, 10 axial points, and 5 center point replications. The original and coded levels of the five input variables can be found in Table 1. The design, in coded form, and the multiresponse data are given in Table 2. The fitted model for each of the three responses is a quadratic model of the form

$$E(y) = \beta_0 + \sum_{i=1}^{5} \beta_i x_i + \sum_{i=1}^{5} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j.$$

Table 1. The Original and Coded Levels of the Input Variables (column headings are the coded levels)

| Variable | -2 | -1 | 0 | 1 | 2 |
|---|---|---|---|---|---|
| Heating temperature (°C), $x_1$ | 65.0 | 70.0 | 75.0 | 80.0 | 85.0 |
| pH, $x_2$ | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 |
| Redox potential (volt), $x_3$ | -.025 | .075 | .175 | .275 | .375 |
| Sodium oxalate (molar), $x_4$ | .05 | .0375 | .025 | .0125 | .0 |

Table 2. Experimental Design (coded) and the Multiresponse Data

| $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $y_1$ (min) | $y_2$ (%) | $y_3$ (%) |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 3.5 | 1,179 | 104 |
| 0 | 0 | 0 | 0 | 0 | 3.5 | 1,183 | 107 |
| 0 | 0 | 0 | 0 | 0 | 4.0 | 1,120 | 104 |
| 0 | 0 | 0 | 0 | 0 | 3.5 | 1,180 | 101 |
| 0 | 0 | 0 | 0 | 0 | 3.0 | 1,195 | 103 |
| -1 | -1 | -1 | -1 | 1 | 4.75 | 1,082 | 81.4 |
| 1 | -1 | -1 | -1 | -1 | 4.0 | 824 | 69.6 |
| -1 | 1 | -1 | -1 | -1 | 5.0 | 953 | 105 |
| 1 | 1 | -1 | -1 | 1 | 9.5 | 759 | 81.2 |
| -1 | -1 | 1 | -1 | -1 | 4.0 | 1,163 | 80.8 |
| 1 | -1 | 1 | -1 | 1 | 5.0 | 839 | 76.3 |
| -1 | 1 | 1 | -1 | 1 | 3.0 | 1,343 | 103 |
| 1 | 1 | 1 | -1 | -1 | 7.0 | 736 | 76.9 |
| -1 | -1 | -1 | 1 | -1 | 5.25 | 1,027 | 87.2 |
| 1 | -1 | -1 | 1 | 1 | 5.0 | 836 | 74.0 |
| -1 | 1 | -1 | 1 | 1 | 3.0 | 1,272 | 98.5 |
| 1 | 1 | -1 | 1 | -1 | 6.5 | 825 | 94.1 |
| -1 | -1 | 1 | 1 | 1 | 3.25 | 1,363 | 95.9 |
| 1 | -1 | 1 | 1 | -1 | 5.0 | 855 | 76.8 |
| -1 | 1 | 1 | 1 | -1 | 2.75 | 1,284 | 100 |
| 1 | 1 | 1 | 1 | 1 | 5.0 | 851 | 104 |
| -2 | 0 | 0 | 0 | 0 | 3.75 | 1,283 | 100 |
| 2 | 0 | 0 | 0 | 0 | 11.0 | 651 | 50.5 |
| 0 | -2 | 0 | 0 | 0 | 4.5 | 1,217 | 71.2 |
| 0 | 2 | 0 | 0 | 0 | 4.0 | 982 | 101 |
| 0 | 0 | -2 | 0 | 0 | 5.0 | 884 | 85.8 |
| 0 | 0 | 2 | 0 | 0 | 3.75 | 1,147 | 103 |
| 0 | 0 | 0 | -2 | 0 | 3.75 | 1,081 | 104 |
| 0 | 0 | 0 | 2 | 0 | 4.75 | 1,036 | 89.4 |
| 0 | 0 | 0 | 0 | -2 | 4.0 | 1,213 | 105 |
| 0 | 0 | 0 | 0 | 2 | 3.5 | 1,103 | 113 |

In this case, the matrix $\mathbf{X}$ that appears in (2.2) is of order $31 \times 63$ and is partitioned as $\mathbf{X} = [\mathbf{X}_1 : \mathbf{X}_2 : \mathbf{X}_3]$ with $\mathbf{X}_1 = \mathbf{X}_2 = \mathbf{X}_3$; hence the rank of $\mathbf{X}$ is $p = 21$. The matrix $\mathbf{K}$ defined in (2.8) and (2.9) is equal to $\mathbf{K} = \operatorname{diag}(\mathbf{K}_1, \mathbf{0})$, where $\mathbf{K}_1 = \mathbf{I}_5 - (1/5)\mathbf{J}_5$ and $\mathbf{0}$ is a zero matrix of order $26 \times 26$. The pure-error and lack-of-fit degrees of freedom are, respectively, $\nu_{PE} = 4$ and $\nu_{LF} = 6$. The lack-of-fit and pure-error matrices defined in (2.13) and (2.14), respectively, are given in Table 3.

Table 3. The Lack-of-Fit ($\mathbf{G}_1$) and Pure-Error ($\mathbf{G}_2$) Matrices

$$\mathbf{G}_1 = \begin{bmatrix} 7.4682 & 165.124 & -47.2333 \\ 165.124 & 67{,}825.0 & 1{,}248.49 \\ -47.2333 & 1{,}248.49 & 535.706 \end{bmatrix}, \qquad \mathbf{G}_2 = \begin{bmatrix} .5 & -37.5 & .5 \\ -37.5 & 3{,}465.2 & -14.6 \\ .5 & -14.6 & 18.8 \end{bmatrix}$$

The value of Roy’s largest-root test statistic given in (2.17) is $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) = 245.518$. The critical value $\lambda_\alpha$ for this test can be obtained from the generalized beta distribution table given in Foster (1957). This table gives values of $x_\alpha$, the upper $100\alpha\%$ point of the distribution of the largest eigenvalue of the matrix $(\mathbf{G}_1 + \mathbf{G}_2)^{-1}\mathbf{G}_1$ when model (2.2) is correct. The relationship between $\lambda_\alpha$ and $x_\alpha$ is given by $\lambda_\alpha = x_\alpha/(1 - x_\alpha)$. At the $\alpha = .10$ level of significance, $x_\alpha = .9884$; hence $\lambda_\alpha = 85.21$. Consequently, a significant lack of fit can be detected at the 10% level.

In order to assess the contributions of the various responses to lack of fit, I follow the procedure outlined in Section 3. Here the eigenvector $\mathbf{c}^*$ corresponding to the eigenvalue $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1) = 245.518$ is $\mathbf{c}^* = [3.2659, .0385, -.0904]'$. The euclidean norms of the response vectors $\mathbf{y}_1$, $\mathbf{y}_2$, and $\mathbf{y}_3$ are $\|\mathbf{y}_1\| = 27.60$, $\|\mathbf{y}_2\| = 5{,}929.27$, and $\|\mathbf{y}_3\| = 517.49$. Thus, in terms of the standardized response variables, $\mathbf{z}_1$, $\mathbf{z}_2$, and $\mathbf{z}_3$, the linear combination (3.4) is written as

$$\mathbf{y}_{c^*} = 90.139\,\mathbf{z}_1 + 228.277\,\mathbf{z}_2 - 46.781\,\mathbf{z}_3 \sim .395\,\mathbf{z}_1 + \mathbf{z}_2 - .205\,\mathbf{z}_3, \tag{4.1}$$

where $\sim$ indicates that the two linear combinations are proportional.

From the size of the absolute values of the coefficients in (4.1), we can see that the responses $y_1$ and $y_2$ are the main contributors to lack of fit, with the latter being more influential than the former. Values of $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$ were subsequently computed for all nonempty subsets $S$ of the three responses. The results are given in Table 4. In addition to the subset of all three responses, the only other significant subset at the $\alpha = .10$ level is the subset $S = \{y_1, y_2\}$, which supports my earlier finding. I conclude that the responses $y_1$ and $y_2$ together produce a significant lack of fit, whereas the other two pairs of responses, $\{y_1, y_3\}$ and $\{y_2, y_3\}$, do not appear to contribute as much to lack of fit. When considered individually, none of the three responses is sufficient to produce a significant lack of fit.

Table 4. Values of $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$ for All Nonempty Subsets of the Three Responses

| Subset ($S$) | $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$ | Critical value ($\lambda_{.10}$) |
|---|---|---|
| $y_1, y_2, y_3$ | 245.518* | 85.21 |
| $y_1, y_2$ | 214.307* | 85.21 |
| $y_1, y_3$ | 45.532 | 85.21 |
| $y_2, y_3$ | 32.107 | 85.21 |
| $y_1$ | 14.936 | 85.21 |
| $y_2$ | 19.573 | 85.21 |
| $y_3$ | 28.495 | 85.21 |

*Significant at the 10% level.
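The subset comparison can be retraced from the two matrices of Table 3 alone. The sketch below assumes NumPy and is an illustration rather than the original computation. It converts the tabled point $x_\alpha = .9884$ into $\lambda_\alpha$ and, for every nonempty subset of the three responses, takes the largest eigenvalue after keeping only the rows and columns of $\mathbf{G}_1$ and $\mathbf{G}_2$ that belong to the subset. The helper name `e_max_subset` is hypothetical.

```python
from itertools import combinations

import numpy as np

# Lack-of-fit and pure-error matrices as reported in Table 3.
G1 = np.array([[7.4682, 165.124, -47.2333],
               [165.124, 67825.0, 1248.49],
               [-47.2333, 1248.49, 535.706]])
G2 = np.array([[0.5, -37.5, 0.5],
               [-37.5, 3465.2, -14.6],
               [0.5, -14.6, 18.8]])

# Critical value: lambda_alpha = x_alpha / (1 - x_alpha), with x_alpha = .9884.
x_alpha = 0.9884
lam_alpha = x_alpha / (1.0 - x_alpha)


def e_max_subset(G1, G2, subset):
    """Largest eigenvalue of G2^{-1} G1 restricted to the responses in `subset`."""
    idx = np.ix_(subset, subset)
    return float(np.linalg.eigvals(np.linalg.solve(G2[idx], G1[idx])).real.max())


results = {}
for size in range(3, 0, -1):
    for subset in combinations(range(3), size):
        results[subset] = e_max_subset(G1, G2, list(subset))

print(f"critical value: {lam_alpha:.2f}")
for subset, value in results.items():
    labels = ", ".join(f"y{i + 1}" for i in subset)
    flag = "*" if value >= lam_alpha else ""
    print(f"{{{labels}}}: {value:.3f}{flag}")
```

The conversion of $x_\alpha$ reflects the fact that an eigenvalue $\theta$ of $(\mathbf{G}_1 + \mathbf{G}_2)^{-1}\mathbf{G}_1$ and the corresponding eigenvalue $\ell$ of $\mathbf{G}_2^{-1}\mathbf{G}_1$ are related by $\ell = \theta/(1 - \theta)$.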

Note that the value of $e_{\max}(\mathbf{G}_2^{-1}\mathbf{G}_1)_S$ for each of the individual-response subsets is the ratio of the lack-of-fit sum of squares to the pure-error sum of squares that results from the analysis of each of the three fitted response models. Hence, if each such ratio is multiplied by $\nu_{PE}/\nu_{LF} = 4/6 = 2/3$, we obtain the value of the $F$ statistic that would result from applying the univariate lack-of-fit test to each of the three responses. For $y_1$, for example, this gives $14.936 \times 2/3 \approx 9.96$.

All of the numerical results given in this example were obtained by using PROC MATRIX in the Statistical Analysis System (SAS) computer package.
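For readers without access to SAS, the computation can be retraced with NumPy. The following is a sketch under stated assumptions, not the original PROC MATRIX program: the data are transcribed from Table 2, the basis $\mathbf{X}_0$ is taken to be the 21-column matrix of the quadratic model, and all variable names are hypothetical. If the transcription is correct, $\mathbf{G}_2$ agrees with Table 3, and $\mathbf{G}_1$ and the largest root should agree with the reported values up to rounding.

```python
from itertools import combinations

import numpy as np

# Table 2, one run per row: coded x1..x5, then y1, y2, y3.
# Rows 0-4 are the five center-point replicates.
TABLE_2 = """
0 0 0 0 0 3.5 1179 104
0 0 0 0 0 3.5 1183 107
0 0 0 0 0 4.0 1120 104
0 0 0 0 0 3.5 1180 101
0 0 0 0 0 3.0 1195 103
-1 -1 -1 -1 1 4.75 1082 81.4
1 -1 -1 -1 -1 4.0 824 69.6
-1 1 -1 -1 -1 5.0 953 105
1 1 -1 -1 1 9.5 759 81.2
-1 -1 1 -1 -1 4.0 1163 80.8
1 -1 1 -1 1 5.0 839 76.3
-1 1 1 -1 1 3.0 1343 103
1 1 1 -1 -1 7.0 736 76.9
-1 -1 -1 1 -1 5.25 1027 87.2
1 -1 -1 1 1 5.0 836 74.0
-1 1 -1 1 1 3.0 1272 98.5
1 1 -1 1 -1 6.5 825 94.1
-1 -1 1 1 1 3.25 1363 95.9
1 -1 1 1 -1 5.0 855 76.8
-1 1 1 1 -1 2.75 1284 100
1 1 1 1 1 5.0 851 104
-2 0 0 0 0 3.75 1283 100
2 0 0 0 0 11.0 651 50.5
0 -2 0 0 0 4.5 1217 71.2
0 2 0 0 0 4.0 982 101
0 0 -2 0 0 5.0 884 85.8
0 0 2 0 0 3.75 1147 103
0 0 0 -2 0 3.75 1081 104
0 0 0 2 0 4.75 1036 89.4
0 0 0 0 -2 4.0 1213 105
0 0 0 0 2 3.5 1103 113
"""
data = np.array([[float(t) for t in row.split()]
                 for row in TABLE_2.strip().splitlines()])
D, Y = data[:, :5], data[:, 5:]
N = data.shape[0]

# X0: intercept, 5 linear, 5 pure quadratic, and 10 cross-product columns.
columns = [np.ones(N)] + [D[:, i] for i in range(5)] + [D[:, i] ** 2 for i in range(5)]
columns += [D[:, i] * D[:, j] for i, j in combinations(range(5), 2)]
X0 = np.column_stack(columns)
p = int(np.linalg.matrix_rank(X0))

K = np.zeros((N, N))
K[:5, :5] = np.eye(5) - np.ones((5, 5)) / 5   # K_1 = I_5 - (1/5) J_5
P = X0 @ np.linalg.pinv(X0)                   # projector onto the column space of X0
G1 = Y.T @ (np.eye(N) - P - K) @ Y            # lack-of-fit matrix (2.13)
G2 = Y.T @ K @ Y                              # pure-error matrix (2.14)
nu_pe = 5 - 1
nu_lf = N - p - nu_pe
e_max = float(np.linalg.eigvals(np.linalg.solve(G2, G1)).real.max())

print("p =", p, " nu_PE =", nu_pe, " nu_LF =", nu_lf)
print("G1 =\n", np.round(G1, 3))
print("G2 =\n", np.round(G2, 3))
print("largest root =", round(e_max, 3))
```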

5. CONCLUDING REMARKS

When fitting a multiresponse model of the form given in (2.2) one is concerned about the adequacy of the model in representing the true means of all the $r$ responses within the region of experimentation. In addition to these responses, certain linear combinations of the responses can be of interest (if such combinations are meaningful from the technical point of view). These linear combinations are represented by the univariate model given in (2.3) for particular values of the nonzero vector $\mathbf{c}$. In a multiresponse situation it is not only necessary to check the adequacy of each response model individually but also to check the adequacy of linear combinations of the models. Thus there can be a large number of univariate models of the form (2.3) whose adequacy must be checked. It would be erroneous to carry out univariate lack-of-fit tests for that purpose, for the number of the tests and the correlations among the responses (which are disregarded in such a univariate procedure) would lead to a great increase in the probability of Type I error for the family of tests (family error rate).


the responses that can be determined by the nonzero vector $\mathbf{c}$. When the multivariate test is significant, there must exist at least one nonzero value of $\mathbf{c}$ for which the model (2.3) is declared to be inadequate. In other words, there must exist at least one subset of the responses that, jointly, is responsible for lack of fit. Note that the responses may not individually display a significant lack of fit. The combined effect of lack-of-fit variations from more than one response, however, may be significant. The procedure described in Section 3 can be used to characterize the extent of the contributions of various subsets of the responses to lack of fit.

ACKNOWLEDGMENTS

I thank a referee and an associate editor for their comments and suggestions that helped to improve the presentation of this article, particularly in Sections 3 and 4.

[Received March 1984. Revised December 1984.]

REFERENCES

Box, G. E. P., and Draper, N. R. (1965), “The Bayesian Estimation of Common Parameters From Several Responses,” Biometrika, 52, 355-365.

Box, G. E. P., and Draper, N. R. (1982), “Measures of Lack of Fit For Response Surface Designs and Predictor Variable Transformations,” Technometrics, 24, 1-8.

Draper, N. R., and Herzberg, A. M. (1971), “On Lack of Fit,” Technometrics, 13, 231-241.

Draper, N. R., and Smith, H. (1981), Applied Regression Analysis (2nd ed.), New York: John Wiley.

Foster, F. G. (1957), “Upper Percentage Points of the Generalized Beta Distribution II,” Biometrika, 44, 441-453.

Foster, F. G., and Rees, D. H. (1957), “Upper Percentage Points of the Generalized Beta Distribution I,” Biometrika, 44, 237-247.

McDonald, L. (1975), “Tests For the General Linear Hypothesis Under the Multiple Design Multivariate Linear Model,” Annals of Statistics, 3, 461-466.

Morrison, D. F. (1976), Multivariate Statistical Methods (2nd ed.), New York: McGraw-Hill.

Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory, New York: John Wiley.

Richert, S. H., Morr, C. V., and Cooney, C. M. (1974), “Effect of Heat and Other Factors Upon Foaming Properties of Whey Protein Concentrates,” Journal of Food Science, 39, 42-48.

Roy, S. N. (1957), Some Aspects of Multivariate Analysis, New York: John Wiley.

Roy, S. N., Gnanadesikan, R., and Srivastava, J. N. (1971), Analysis and Design of Certain Quantitative Multiresponse Experiments, New York: Pergamon Press.

Roy, S. N., and Srivastava, J. N. (1964), “Hierarchical and p-Block Multiresponse Designs and Their Analysis,” in Mahalanobis Dedicatory Volume: Contributions to Statistics, eds. C. R. Rao, D. B. Lahiri, K. R. Nair, P. Pant, and S. S. Shrikhande, Calcutta: Indian Statistical Institute, pp. 419-428.

Searle, S. R. (1971), Linear Models, New York: John Wiley.

Shelton, J. T., Khuri, A. I., and Cornell, J. A. (1983), “Selecting Check Points for Testing Lack of Fit in Response Surface Models,” Technometrics, 25, 357-365.