
Analysis of Variance (ANOVA)

In document Introduction to Design of Experiments (Page 130-133)

Statistical Concepts for Designed Experiments

5.2.6 Analysis of Variance (ANOVA)

5.2.6.1 Principles of Analysis of Variance

Analysis of Variance (ANOVA) consists of finding the sources of variation of the responses. Suppose that the responses have been calculated with a postulated model

$$y_i = f(x_1, x_2, x_3, \ldots, x_n) + e_i$$

by using the method of least squares; that is, by minimizing the sum of the squares of the errors. In this case, the calculated responses are written $\hat{y}_i$ and the errors as $e_i$. These theoretical errors take particular values, written as $r_i$, and called residuals. The residuals are therefore particular values of the errors. We have

$$\hat{y}_i = f(x_1, x_2, x_3, \ldots, x_n)$$

With the new notation, the equation giving the response can be written as:

$$y_i = \hat{y}_i + r_i$$

Classical analysis of variance uses not only the responses themselves but also the difference between the responses and their mean, $(y_i - \bar{y})$ or $(\hat{y}_i - \bar{y})$. This difference is designated as "errors about the mean." In the case of calculated responses, we can also say, "corrected for the mean."

In the case of the method of least squares, the mean of the observed responses is equal to the mean of the responses calculated with the postulated model. Therefore, if $\bar{y}$ is the mean of the responses,

$$y_i - \bar{y} = \hat{y}_i - \bar{y} + r_i$$

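Squaring both sides of the equation and summing over all the observations gives the following relation (the cross-product term $2\sum(\hat{y}_i - \bar{y})\,r_i$ vanishes for a least-squares fit of a model that includes a constant term):

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum r_i^2 \qquad (5.2)$$

This is the fundamental relation of analysis of variance. The left side is the sum of squares of the errors around the mean of the observed responses. This sum decomposes into two pieces: the sum of squares of the errors around the mean of the responses calculated with the model, and the sum of the squares of the residuals.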
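Relation (5.2) can be checked numerically. The sketch below is only an illustration: the data are hypothetical (a 2^2 factorial design run twice, with invented responses), and the first-degree model `y = a0 + a1*x1 + a2*x2 + e` is an assumed postulated model, not one taken from the text.

```python
import numpy as np

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

# Assumed postulated model: y = a0 + a1*x1 + a2*x2 + e
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients

y_hat = X @ coef   # calculated responses
r = y - y_hat      # residuals
y_bar = y.mean()   # mean of the observed responses

ss_total = np.sum((y - y_bar) ** 2)      # left side of (5.2)
ss_model = np.sum((y_hat - y_bar) ** 2)  # calculated responses about the mean
ss_resid = np.sum(r ** 2)                # sum of squares of the residuals

print("left side :", ss_total)
print("right side:", ss_model + ss_resid)
```

The two printed values should agree up to rounding error, and the mean of `y_hat` should equal `y_bar`, as stated for the method of least squares.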

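The sum of squares of the residuals is the smallest value that the sum of squares of the errors can take. Therefore, writing $\Delta$ for the lack of fit and $\sigma$ for the experimental error,

$$\sum r_i^2 = \text{Minimum of } \sum e_i^2 = \text{Minimum of } \sum (\Delta + \sigma)_i^2$$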

Dividing the sum of squares of the residuals by the number of degrees of freedom of the residuals gives the variance of the residuals. The variance of the residuals, $V(r_i)$, is therefore the smallest variance of the errors, $V(e)$. Therefore,

$$V(r_i) = \text{Minimum of } V(e) = \frac{1}{n-p} \sum_{i=1}^{n} r_i^2$$
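As a minimal sketch of this calculation, the function below divides the sum of squares of the residuals by n − p. The function name `residual_variance`, the data, and the first-degree model are all hypothetical choices made for illustration.

```python
import numpy as np

def residual_variance(X, y):
    """Return (V(r), n - p) for a least-squares fit of y on the columns of X."""
    n, p = X.shape  # n responses, p coefficients
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef  # residuals
    return np.sum(r ** 2) / (n - p), n - p

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

# Assumed postulated model: y = a0 + a1*x1 + a2*x2 + e  (p = 3)
X = np.column_stack([np.ones_like(x1), x1, x2])
v_resid, df_resid = residual_variance(X, y)
print(f"V(r) = {v_resid:.4f} with {df_resid} degrees of freedom")
```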

This is the minimum value of the variance of the errors that is the generally adopted standard for evaluating the importance of a coefficient. The variance of the coefficients is calculated by the following general formula, which is used by computers:

$$V(a_i) = K\,V(e) = K\,V(r_i)$$

which can be simplified with factorial or polynomial models to:

)

In summary, the variance of the residuals from the analysis of variance is used to calculate the variance of the coefficients. The variance of the coefficients, in turn, provides the standard for testing whether a coefficient is significant or not.
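The text does not define the constant K. In the usual least-squares setting, the variance of coefficient $a_i$ is the residual variance multiplied by the i-th diagonal element of $(X^T X)^{-1}$, where X is the model matrix. The sketch below assumes that this is what K stands for; the data and the model are hypothetical.

```python
import numpy as np

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

# Assumed postulated model: y = a0 + a1*x1 + a2*x2 + e
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ coef
v_resid = np.sum(r ** 2) / (n - p)  # variance of the residuals, V(r)

# Assumption: K_i is the i-th diagonal element of (X^T X)^-1.
K = np.diag(np.linalg.inv(X.T @ X))
var_coef = K * v_resid       # V(a_i) = K * V(r)
se_coef = np.sqrt(var_coef)  # standard error of each coefficient

for name, a, s in zip(["a0", "a1", "a2"], coef, se_coef):
    print(f"{name} = {a:.3f} +/- {s:.3f}")
```

For this balanced two-level design every element of `K` equals 1/n, which may be the kind of simplification the text has in mind for factorial models.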

Note 1

The n observed responses are completely independent, i.e., there is no mathematical relationship among them. They therefore have n degrees of freedom. One degree of freedom is used to calculate the mean. The variance of the errors around the mean of the observed responses, $y_i - \bar{y}$, therefore has n – 1 degrees of freedom.

The calculation of p coefficients takes only p – 1 degrees of freedom because the mean has already been calculated. The variance of the errors around the mean of the calculated responses, $\hat{y}_i - \bar{y}$, therefore has p – 1 degrees of freedom.

There remain n – 1 – (p – 1) = n – p degrees of freedom for calculation of the variance of the residuals.
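As a worked illustration with hypothetical numbers (not taken from the text), consider n = 8 observed responses and a model with p = 3 coefficients. The bookkeeping of degrees of freedom is then:

$$\underbrace{n - 1}_{7} \;=\; \underbrace{p - 1}_{2} \;+\; \underbrace{n - p}_{5}$$

That is, 7 degrees of freedom for the observed responses corrected for the mean, 2 for the model corrected for the mean, and 5 for the residuals.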

Note 2

These rules and statistical tests apply only to purely random variables. The lack of fit is not a random quantity but a systematic error, so the statistical rules of errors are not applicable to it. However, it is often assumed that the lack of fit is of the same order of magnitude as the experimental error. It is a good idea to verify this assumption. The measure of experimental error, $V(\sigma)$, is obtained from repetitions. The measure of the residuals, $V(r)$, is taken from the analysis of variance. The lack of fit $V(\Delta)$ can therefore be calculated from:

$$V(r) = V(\Delta) + V(\sigma)$$

By comparing the experimental error variance to the lack of fit, it is possible to see if this hypothesis is valid.
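One common way to carry out this comparison is the standard lack-of-fit decomposition in sums of squares: the residual sum of squares is split into a pure-error part (from repeated runs) and a lack-of-fit part. This is offered as a sketch of that standard technique, not necessarily the exact calculation the authors intend; the data and the model are hypothetical.

```python
import numpy as np

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

# Assumed postulated model: y = a0 + a1*x1 + a2*x2 + e
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_resid = np.sum((y - X @ coef) ** 2)
df_resid = n - p

# Pure (experimental) error: scatter of the repeated runs about their own mean.
groups = {}
for point, value in zip(zip(x1, x2), y):
    groups.setdefault(point, []).append(value)
ss_pure = sum(np.sum((np.array(v) - np.mean(v)) ** 2) for v in groups.values())
df_pure = sum(len(v) - 1 for v in groups.values())

# Lack of fit: what remains of the residual sum of squares.
ss_lof = ss_resid - ss_pure
df_lof = df_resid - df_pure

ms_pure = ss_pure / df_pure
ms_lof = ss_lof / df_lof
print(f"experimental error mean square: {ms_pure:.4f} ({df_pure} DF)")
print(f"lack-of-fit mean square:        {ms_lof:.4f} ({df_lof} DF)")
```

If the lack-of-fit mean square is much larger than the experimental error mean square, the assumption that the two are of the same order of magnitude is doubtful for the chosen model.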

Note 3

Do not confuse the errors, the residuals, and the deviations from the mean.

The errors are the differences, e, between the observed responses and the mathematical model postulated before calculating the coefficients by the method of least squares:

$$y_i = f(x_1, x_2, x_3, \ldots, x_n) + e_i$$

The residuals are the differences, ri, between the observed responses and the predicted responses with the coefficients obtained by the method of least squares:

$$r_i = y_i - \hat{y}_i$$

The errors around the mean are the differences, $(y_i - \bar{y})$ or $(\hat{y}_i - \bar{y})$, between the responses and the mean of the responses.

Note 4

The mean square of the residuals is obtained by dividing the sum of squares of the residuals by its corresponding number of degrees of freedom. It is therefore a measurement analogous to the variance. Consequently, the square root of this mean square is analogous to a standard deviation. This is why the expressions residual variance and standard deviation of the residuals are used for these quantities. However, since these errors are not entirely random, some authors prefer to use the term mean square of the residuals instead of residual variance, and square root of the mean square of the residuals instead of standard deviation of the residuals. The expression square root of the mean square of the residuals is a little long, so the abbreviation RMSE (root mean square error) is usually used.

When the model is well fit, the RMSE is used to calculate the error on the coefficients of the postulated model.

Regardless of the units used for the measurements, long calculations are inevitable. Luckily, computers carry out all the calculations and provide the results of the analysis of variance in tabular form. The only element chosen by the experimenter is the a priori model used to calculate the responses. The results of the analysis depend on the model choice. You are invited to redo the calculations of the analysis of variance with different postulated models to see the influence of the model choice on the RMSE.
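The influence of the model choice on the RMSE can be explored with a few lines of code. The sketch below fits two assumed postulated models, one with main effects only and one that adds an interaction term, to the same hypothetical data. The helper name `rmse` and all the numbers are invented for illustration.

```python
import numpy as np

def rmse(X, y):
    """Square root of the mean square of the residuals for a least-squares fit."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return np.sqrt(np.sum(r ** 2) / (n - p))

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

ones = np.ones_like(x1)
# Model A (assumed): y = a0 + a1*x1 + a2*x2 + e
X_main = np.column_stack([ones, x1, x2])
# Model B (assumed): y = a0 + a1*x1 + a2*x2 + a12*x1*x2 + e
X_inter = np.column_stack([ones, x1, x2, x1 * x2])

rmse_main = rmse(X_main, y)
rmse_inter = rmse(X_inter, y)
print(f"RMSE, main effects only: {rmse_main:.4f}")
print(f"RMSE, with interaction:  {rmse_inter:.4f}")
```

With these invented data the RMSE is smaller for the model with the interaction term; with other data, adding a term can also raise the RMSE, because each extra coefficient removes one degree of freedom from the residuals.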

5.2.6.2 Presentation of the Analysis of Variance (ANOVA)

Software, even spreadsheets, can construct ANOVA tables. The simplest of these tables has five columns (source of variation, degrees of freedom (DF), sum of squares, mean square, and F-ratio) and four rows (column titles, model corrected for the mean, residuals, and observed responses corrected for the mean), similar to Table 5.5. The first column shows the sources of variation. The second column shows the DF of each sum of squares. Note that the sum of the DF of the model and of the residuals is equal to the DF of the observed responses.

The third column gives the sums of squares of the errors around the mean. Note that the sum of squares of the observed responses (corrected for the mean) is equal to the sum of the other two rows. The mean squares of the fourth column are the sums of squares divided by their DF. Note that the square root of the mean square of the residuals (the RMSE) serves as the standard against which the coefficients are tested. It is therefore a very important statistic. Finally, the fifth column shows the F-ratio, which is the ratio of the mean square of the model to the mean square of the residuals. This ratio is used to calculate a p-value: the probability of obtaining an F-ratio at least this large if the factors had no real effect on the response. In other words, if the F-ratio is high (small p-value), the variations of the observed responses are likely due to variations in the factors. If the F-ratio is near 1 (large p-value), the variations explained by the model are comparable to those of the residuals, and there is no evidence that the factors have an effect. The p-value corresponding to the F-ratio is also shown.

Table 5.5 Analysis of variance (ANOVA) table
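A table with the layout described above can be assembled in a few lines. The sketch below is an illustration only: the data are hypothetical, the first-degree model is an assumed postulated model, and the values it prints are not those of Table 5.5.

```python
import numpy as np

# Hypothetical data: a 2^2 factorial design run twice (coded levels -1/+1).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10.2, 14.1, 11.9, 18.3, 9.8, 13.7, 12.4, 17.6])

# Assumed postulated model: y = a0 + a1*x1 + a2*x2 + e
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
y_bar = y.mean()

# Sums of squares (all corrected for the mean) and their degrees of freedom.
ss_model = float(np.sum((y_hat - y_bar) ** 2))
ss_resid = float(np.sum((y - y_hat) ** 2))
ss_total = float(np.sum((y - y_bar) ** 2))
df_model, df_resid, df_total = p - 1, n - p, n - 1

# Mean squares and F-ratio.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid

print(f"{'Source':<10}{'DF':>4}{'Sum of squares':>16}{'Mean square':>14}{'F-ratio':>10}")
print(f"{'Model':<10}{df_model:>4}{ss_model:>16.4f}{ms_model:>14.4f}{f_ratio:>10.2f}")
print(f"{'Residuals':<10}{df_resid:>4}{ss_resid:>16.4f}{ms_resid:>14.4f}")
print(f"{'Total':<10}{df_total:>4}{ss_total:>16.4f}")
```

The p-value is omitted to keep the sketch dependent on NumPy alone. If SciPy is available, it could be obtained from the survival function of the F distribution, for example `scipy.stats.f.sf(f_ratio, df_model, df_resid)`.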
