1. The t-test for the relationship between the response variable $y$ and a particular predictor variable $x_i$, in the presence of the other predictor variables, $x_{(i)}$, where $x_{(i)} = x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m$ denotes the set of all predictors not including $x_i$.
2. The F-test for the significance of the regression as a whole.
3. The confidence interval for the slope, $\beta_i$, of the $i$th predictor variable.
4. The confidence interval for the mean of the response variable $y$ given a set of particular values for the predictor variables $x_1, x_2, \ldots, x_m$.
5. The prediction interval for a random value of the response variable $y$ given a set of particular values for the predictor variables $x_1, x_2, \ldots, x_m$.
INFERENCE IN MULTIPLE REGRESSION 101
t-Test for the Relationship Between $y$ and $x_i$
The hypotheses for a t-test between $y$ and $x_i$ are given by
• $H_0$: $\beta_i = 0$
• $H_a$: $\beta_i \neq 0$
The models implied by these hypotheses are given by:
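• Under $H_0$: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{i-1} x_{i-1} + \beta_{i+1} x_{i+1} + \cdots + \beta_m x_m + \varepsilon$
• Under $H_a$: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{i-1} x_{i-1} + \beta_i x_i + \beta_{i+1} x_{i+1} + \cdots + \beta_m x_m + \varepsilon$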
Note that the only difference between the two models is the presence or absence of the $i$th term. All other terms are the same in both models. Therefore, interpretations of the results for this t-test must include some reference to the other predictor variables being held constant.
Under the null hypothesis, the test statistic $t = b_i / s_{b_i}$ follows a t-distribution with $n - m - 1$ degrees of freedom, where $s_{b_i}$ refers to the standard error of the slope for the $i$th predictor variable. We proceed to perform the t-test for each of the predictor variables in turn, using the results displayed in Table 3.1.
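t-Test for the Relationship Between Nutritional Rating and Sugars
• $H_0$: $\beta_1 = 0$; model: $y = \beta_0 + \beta_2(\text{fiber}) + \varepsilon$.
• $H_a$: $\beta_1 \neq 0$; model: $y = \beta_0 + \beta_1(\text{sugars}) + \beta_2(\text{fiber}) + \varepsilon$.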
• In Table 3.1, under “Coef” in the “Sugars” row is found the value of $b_1$, $-2.2090$.
• Under “SE Coef” in the “Sugars” row is found the value of $s_{b_1}$, the standard error of the slope for sugar content. Here $s_{b_1} = 0.1633$.
• Under “T” is found the value of the t-statistic, that is, the test statistic for the t-test,
$$
t = \frac{b_1}{s_{b_1}} = \frac{-2.2090}{0.1633} = -13.53
$$
• Under “P” is found the p-value of the t-statistic. Since this is a two-tailed test, this p-value takes the form p-value $= P(|t| > |t_{\text{obs}}|)$, where $t_{\text{obs}}$ represents the value of the t-statistic observed from the regression results. Here p-value $= P(|t| > |t_{\text{obs}}|) = P(|t| > 13.53) \approx 0.000$, although of course no continuous p-value ever equals precisely zero.
The p-value method is used, whereby the null hypothesis is rejected when the p-value of the test statistic is small. Here we have p-value $\approx 0.000$, which is smaller than any reasonable threshold of significance. Our conclusion is therefore to reject the null hypothesis. The interpretation of this conclusion is that there is evidence for a linear relationship between nutritional rating and sugar content in the presence of fiber content.
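The sugars calculation can be checked with a short Python sketch. This is only an illustration under stated assumptions, not the software that produced Table 3.1: the function names are hypothetical, the 74 degrees of freedom are the value of n − m − 1 reported for this regression, and the two-tailed area is approximated with a simple midpoint rule rather than a statistical library.

```python
import math


def t_density(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - 0.5 * (df + 1) * math.log1p(t * t / df))


def two_sided_p_value(t_obs, df, steps=20000):
    """Approximate P(|t| > |t_obs|) by integrating the upper tail.

    The substitution t = |t_obs| + z / (1 - z) maps the infinite tail
    onto 0 <= z < 1, which is then integrated with the midpoint rule.
    """
    t0 = abs(t_obs)
    h = 1.0 / steps
    tail = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        t = t0 + z / (1.0 - z)
        tail += t_density(t, df) / (1.0 - z) ** 2
    return 2.0 * tail * h  # two tails of a symmetric distribution


# Values read from the "Sugars" row of Table 3.1.
b1 = -2.2090      # "Coef"
s_b1 = 0.1633     # "SE Coef"
df_error = 74     # n - m - 1 for this regression

t_obs = b1 / s_b1
p_value = two_sided_p_value(t_obs, df_error)
print(f"t = {t_obs:.2f}, two-tailed p-value = {p_value:.3g}")
```

The printed p-value is tiny but not exactly zero, which is the point made above about continuous p-values.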
SPH SPH
JWDD006-03 JWDD006-Larose November 25, 2005 17:26 Char Count= 0
102 CHAPTER 3 MULTIPLE REGRESSION AND MODEL BUILDING
t-Test for the Relationship Between Nutritional Rating and Fiber Content
• $H_0$: $\beta_2 = 0$; model: $y = \beta_0 + \beta_1(\text{sugars}) + \varepsilon$.
• $H_a$: $\beta_2 \neq 0$; model: $y = \beta_0 + \beta_1(\text{sugars}) + \beta_2(\text{fiber}) + \varepsilon$.
• In Table 3.1, under “Coef” in the “Fiber” row is found $b_2 = 2.8408$.
• Under “SE Coef” in the “Fiber” row is found the standard error of the slope for fiber content, $s_{b_2} = 0.3032$.
• Under “T” is found the test statistic for the t-test,
$$
t = \frac{b_2}{s_{b_2}} = \frac{2.8408}{0.3032} = 9.37
$$
• Under “P” is found the p-value of the t-statistic. Again, p-value $\approx 0.000$.
Thus, our conclusion is again to reject the null hypothesis. We interpret this to mean that there is evidence for a linear relationship between nutritional rating and fiber content in the presence of sugar content.
F-Test for the Significance of the Overall Regression Model
Next we introduce the F-test for the significance of the overall regression model. Figure 3.4 illustrates the difference between the t-test and the F-test. One may apply a separate t-test for each predictor $x_1$, $x_2$, or $x_3$, examining whether a linear relationship exists between the target variable $y$ and that particular predictor. On the other hand, the F-test considers the linear relationship between the target variable $y$ and the set of predictors (e.g., $\{x_1, x_2, x_3\}$) taken as a whole.
Figure 3.4 The F-test considers the relationship between the target and the set of predictors, taken as a whole. (Diagram: a separate t-test links each of $x_1$, $x_2$, and $x_3$ to $y$; a single F-test links the set $\{x_1, x_2, x_3\}$ to $y$.)
The hypotheses for the F-test are given by
• $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_m = 0$.
• $H_a$: At least one of the $\beta_i$ does not equal 0.
The null hypothesis asserts that there is no linear relationship between the target variable $y$ and the set of predictors, $x_1, x_2, \ldots, x_m$. Thus, the null hypothesis states that the coefficient $\beta_i$ for each predictor $x_i$ exactly equals zero, leaving the null model to be
• Model under $H_0$: $y = \beta_0 + \varepsilon$
The alternative hypothesis does not assert that the regression coefficients all differ from zero. For the alternative hypothesis to be true, it is sufficient for a single, unspecified regression coefficient to differ from zero. Hence, the alternative hypothesis for the F-test does not specify a particular model, since it would be true if any, some, or all of the coefficients differed from zero.
As shown in Table 3.2, the F-statistic consists of a ratio of two mean squares: the mean square regression (MSR) and the mean square error (MSE). A mean square represents a sum of squares divided by the degrees of freedom associated with that sum of squares statistic. Since the sums of squares are always nonnegative, so are the mean squares. To understand how the F-test works, we should consider the following. The MSE is always a good estimate of the overall variance $\sigma^2$ (see model assumption 2), regardless of whether or not the null hypothesis is true. (In fact, recall that we use the standard error of the estimate, $s = \sqrt{\mathrm{MSE}}$, as a measure of the usefulness of the regression, without reference to an inferential model.) Now, the MSR is also a good estimate of $\sigma^2$, but only on the condition that the null hypothesis is true. If the null hypothesis is false, MSR overestimates $\sigma^2$.
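In symbols, the quantities in this paragraph can be written as follows. The names SSR and SSE for the regression and error sums of squares are assumed notation (they are not defined in this section), and the degrees of freedom $m$ and $n - m - 1$ are those of the F distribution used for this test.

$$
\mathrm{MSR} = \frac{\mathrm{SSR}}{m}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n - m - 1}, \qquad
F = \frac{\mathrm{MSR}}{\mathrm{MSE}}, \qquad
s = \sqrt{\mathrm{MSE}}
$$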
So consider the value of $F = \mathrm{MSR}/\mathrm{MSE}$ with respect to the null hypothesis. Suppose that MSR and MSE are close to each other, so that the value of $F$ is small (near 1.0). Since MSE is always a good estimate of $\sigma^2$, and MSR is only a good estimate of $\sigma^2$ when the null hypothesis is true, the circumstance that MSR and MSE are close to each other will occur only when the null hypothesis is true. Therefore, when the value of $F$ is small, this is evidence that the null hypothesis is true.
However, suppose that MSR is much greater than MSE, so that the value of $F$ is large. MSR is large (overestimates $\sigma^2$) when the null hypothesis is false. Therefore, when the value of $F$ is large, this is evidence that the null hypothesis is false. Accordingly, for the F-test, we shall reject the null hypothesis when the value of the test statistic $F$ is large.
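The behavior of MSR and MSE described above can be illustrated with a small simulation. The sketch below is a hypothetical setup, not an analysis of the cereals data: it uses a single predictor (m = 1) so that least squares can be written with the standard library alone, and the error standard deviation, slope values, sample size, number of replications, and seed are all arbitrary assumptions chosen for illustration.

```python
import random


def mean_squares(x, y):
    """Return (MSR, MSE) for a least-squares line with one predictor."""
    n, m = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)           # regression SS
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # error SS
    return ssr / m, sse / (n - m - 1)


def average_mean_squares(beta1, sigma=2.0, n=30, reps=2000, seed=1):
    """Average MSR and MSE over many simulated samples with true slope beta1."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # fixed predictor values on [0, 1]
    total_msr = total_mse = 0.0
    for _ in range(reps):
        y = [1.0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        msr, mse = mean_squares(x, y)
        total_msr += msr
        total_mse += mse
    return total_msr / reps, total_mse / reps


# True error variance is sigma**2 = 4 in both runs.
msr_null, mse_null = average_mean_squares(beta1=0.0)  # null hypothesis true
msr_alt, mse_alt = average_mean_squares(beta1=5.0)    # null hypothesis false
print(f"H0 true : mean MSR = {msr_null:.2f}, mean MSE = {mse_null:.2f}")
print(f"H0 false: mean MSR = {msr_alt:.2f}, mean MSE = {mse_alt:.2f}")
```

In both runs the average MSE should land near the true variance of 4. The average MSR should do so only when the slope is zero, and should be far larger when it is not.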
When the null hypothesis is true, the observed F-statistic, $F = F_{\text{obs}} = \mathrm{MSR}/\mathrm{MSE}$, follows an $F_{m,\,n-m-1}$ distribution. Since all F-values are nonnegative, the F-test is a right-tailed test. Thus, we will reject the null hypothesis when the p-value is small, where the p-value is the area in the tail to the right of the observed F-statistic. That is, p-value $= P(F_{m,\,n-m-1} > F_{\text{obs}})$, and we reject the null hypothesis when $P(F_{m,\,n-m-1} > F_{\text{obs}})$ is small.
F-Test for the Relationship Between Nutritional Rating and {Sugar and Fiber} Taken Together
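• $H_0$: $\beta_1 = \beta_2 = 0$; model: $y = \beta_0 + \varepsilon$.
• $H_a$: At least one of $\beta_1$ and $\beta_2$ does not equal zero.
• The model implied by $H_a$ is not specified, and may be any one of the following:
    $y = \beta_0 + \beta_1(\text{sugars}) + \varepsilon$
    $y = \beta_0 + \beta_2(\text{fiber}) + \varepsilon$
    $y = \beta_0 + \beta_1(\text{sugars}) + \beta_2(\text{fiber}) + \varepsilon$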
• In Table 3.1, under “MS” in the “Regression” row of the “Analysis of Variance” table, is found the value of MSR, the mean square regression, MSR = 6058.9.
• Under “MS” in the “Residual Error” row of the “Analysis of Variance” table is found the value of MSE, the mean-squared error, MSE = 38.9.
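• Under “F” in the “Regression” row of the “Analysis of Variance” table is found the value of the test statistic,
$$
F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{6058.9}{38.9} = 155.73
$$
(The ratio of the rounded mean squares shown here is about 155.76; the tabled value of 155.73 presumably reflects mean squares carried to more decimal places.)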
• The degrees of freedom for the F-statistic are given in the column marked “DF,” so that we have $m = 2$ and $n - m - 1 = 74$.
• Under “P” in the “Regression” row of the “Analysis of Variance” table is found the p-value of the F-statistic. Here, the p-value is $P(F_{m,\,n-m-1} > F_{\text{obs}}) = P(F_{2,74} > 155.73) \approx 0.000$, although again no continuous p-value ever equals precisely zero.
This p-value of approximately zero is less than any reasonable threshold of significance. Our conclusion is therefore to reject the null hypothesis. The interpretation of this conclusion is the following. There is evidence for a linear relationship between nutritional rating on the one hand, and the set of predictors, sugar content and fiber content, on the other. More succinctly, we may simply say that the overall regression model is significant.
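The F-test arithmetic for this example can be sketched in Python as a check. The function names are hypothetical, and the sketch relies on one special case: when the numerator degrees of freedom equal 2, as they do here, the right-tail area of the F distribution has the closed form $(1 + 2F/d)^{-d/2}$, where $d$ is the error degrees of freedom. For any other number of predictors this shortcut does not apply, and a statistical library would be needed instead.

```python
def f_statistic(msr, mse):
    """Ratio of mean square regression to mean square error."""
    return msr / mse


def f_right_tail_two_numerator_df(f_obs, df_error):
    """P(F > f_obs) for an F distribution with 2 and df_error degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_obs / df_error) ** (-df_error / 2.0)


# Values read from the "Analysis of Variance" part of Table 3.1.
msr, mse = 6058.9, 38.9
df_regression, df_error = 2, 74   # m and n - m - 1
f_table = 155.73                  # value reported under "F"

# The rounded mean squares reproduce the tabled F only approximately.
f_from_rounded = f_statistic(msr, mse)
p_value = f_right_tail_two_numerator_df(f_table, df_error)
print(f"F from rounded mean squares = {f_from_rounded:.2f}")
print(f"p-value for F = {f_table} on ({df_regression}, {df_error}) df: {p_value:.3g}")
```

As with the t-tests, the computed p-value is extremely small but strictly positive, in line with the decision to reject the null hypothesis.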