Hypothesis Tests and Confidence Intervals in Multiple Regression

7.1 Multiple Choice

1) The confidence interval for a single coefficient in a multiple regression

A) makes little sense because the population parameter is unknown.

B) should not be computed because there are other coefficients present in the model.

C) contains information from a large number of hypothesis tests.

D) should only be calculated if the regression $R^2$ is identical to the adjusted $R^2$.

Answer: C

2) The following linear hypotheses can be tested using the F-test, with the exception of

A) $\beta_2 = 1$ and $\beta_3 = \beta_4/\beta_5$.

B) $\beta_2 = 0$.

C) $\beta_1 + \beta_2 = 1$ and $\beta_3 = -2\beta_4$.

D) $\beta_0 = \beta_1$ and $\beta_1 = 0$.

Answer: A

3) The formula for the standard error of the regression coefficient, when moving from one explanatory variable to two explanatory variables,

A) stays the same.

B) changes, unless the second explanatory variable is a binary variable.

C) changes.

D) changes, unless you test for a null hypothesis that the additional regression coefficient is zero.

Answer: C

4) All of the following are examples of joint hypotheses on multiple regression coefficients, with the exception of

A) $H_0: \beta_1 + \beta_2 = 1$

B) $H_0: \beta_3/\beta_2 = 1$ and $\beta_4 = 0$

C) $H_0: \beta_2 = 0$ and $\beta_3 = 0$

D) $H_0: \beta_1 = -\beta_2$ and $\beta_1 + \beta_2 = 1$

Answer: A

5) When testing joint hypotheses, you should

A) use t-statistics for each hypothesis and reject the null hypothesis if all of the restrictions fail.

B) use the F-statistic and reject all of the hypotheses if the statistic exceeds the critical value.

C) use t-statistics for each hypothesis and reject the null hypothesis once the statistic exceeds the critical value for a single hypothesis.

D) use the F-statistic and reject at least one of the hypotheses if the statistic exceeds the critical value.

Answer: D

6) The overall regression F-statistic tests the null hypothesis that

A) all slope coefficients are zero.

B) all slope coefficients and the intercept are zero.

C) the intercept in the regression and at least one, but not all, of the slope coefficients is zero.

D) the slope coefficient of the variable of interest is zero, but that the other slope coefficients are not.

Answer: A

7) For a single restriction (q = 1), the F-statistic

A) is the square root of the t-statistic.

B) has a critical value of 1.96.

C) will be negative.

D) is the square of the t-statistic.

Answer: D

8) The homoskedasticity-only F-statistic is given by the following formula

A) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$

B) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{restricted}/(n - k_{unrestricted} - 1)}$

C) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$

D) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/(q - 1)}{SSR_{unrestricted}/(n - k_{unrestricted})}$

Answer: A

9) All of the following are correct formulae for the homoskedasticity-only F-statistic, with the exception of

A) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$

B) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{restricted}/(n - k_{restricted} - 1)}$

C) $F = \dfrac{SSR_{restricted} - SSR_{unrestricted}}{SSR_{unrestricted}} \times \dfrac{n - k_{unrestricted} - 1}{q}$

D) $F = \left(\dfrac{SSR_{restricted}}{SSR_{unrestricted}} - 1\right) \times \dfrac{n - k_{unrestricted} - 1}{q}$

Answer: B
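The algebraic equivalence of the correct forms in questions 8 and 9 can be checked numerically. The sketch below is illustrative only: the SSR values, sample size, and number of regressors are hypothetical and were chosen to keep the arithmetic simple, and the restricted regressor count used for option B is assumed to be the unrestricted count minus q.

```python
import math


def f_homoskedastic(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from restricted and unrestricted SSRs."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs (not taken from any question in this test bank).
ssr_r, ssr_u, q, n, k_u = 120.0, 100.0, 2, 104, 3

# Form A: ratio of the per-restriction SSR increase to the unrestricted error variance.
form_a = f_homoskedastic(ssr_r, ssr_u, q, n, k_u)
# Form C: same quantity written as a product.
form_c = (ssr_r - ssr_u) / ssr_u * (n - k_u - 1) / q
# Form D: same quantity using the SSR ratio.
form_d = (ssr_r / ssr_u - 1) * (n - k_u - 1) / q
# Form B (the incorrect option): sign flipped and restricted quantities in the denominator.
k_r = k_u - q  # assumption: each restriction removes one regressor
form_b = ((ssr_u - ssr_r) / q) / (ssr_r / (n - k_r - 1))

print(form_a, form_c, form_d, form_b)
```

With these inputs forms A, C, and D coincide, while form B is negative, which an F-statistic can never be because imposing restrictions cannot lower the SSR.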

10) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is calculated

A) by dividing the estimate by its standard error.

B) from the square root of the F-statistic.

C) by multiplying the p-value by 1.96.

D) using the adjusted $R^2$ and the confidence interval.

Answer: A

11) To test joint linear hypotheses in the multiple regression model, you need to

A) compare the sums of squared residuals from the restricted and unrestricted model.

B) use the heteroskedasticity-robust F-statistic.

12) The homoskedasticity-only F-statistic is given by the following formula

A) $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$

B) $F = \dfrac{(1 - R^2_{unrestricted})/q}{R^2_{unrestricted}/(n - k_{unrestricted} - 1)}$

C) $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$

D) $F = \dfrac{(R^2_{unrestricted} - R^2_{unrestricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$

Answer: A

13) Let $R^2_{unrestricted}$ and $R^2_{restricted}$ be 0.4366 and 0.4149, respectively. The difference between the unrestricted and the restricted model is that you have imposed two restrictions. There are 420 observations. The F-statistic in this case is

A) 4.61

B) 8.01

C) 10.34

D) 7.71

Answer: B
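The arithmetic behind question 13 can be sketched as follows. The question does not state the number of regressors in the unrestricted model; the value `k_unrestricted = 3` below is an assumption, chosen because it reproduces the stated answer of 8.01.

```python
def f_from_r2(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic written in terms of the two R^2 values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# k_unrestricted = 3 is an assumption; the question does not give it.
f_stat = f_from_r2(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
print(round(f_stat, 2))  # 8.01
```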

14) If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one, then you should

A) subtract 1 from the estimated coefficient, divide the difference by the standard error, and check if the resulting ratio is larger than 1.96.

B) add and subtract 1.96 from the slope and check if that interval includes 1.

C) see if the slope coefficient is between 0.95 and 1.05.

D) check if the adjusted $R^2$ is close to 1.

Answer: A

15) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution you can

A) safely assume that your regression results are significant.

B) reject the null hypothesis.

C) reject the assumption that the error terms are homoskedastic.

D) conclude that most of the actual values are very close to the regression line.

Answer: B

16) If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then

A) a series of t-tests may or may not give you the same conclusion.

B) the regression is always significant.

C) all of the hypotheses are always simultaneously rejected.

D) the F-statistic must be negative.

17) When your multiple regression function includes a single omitted variable regressor, then

A) use a two-sided alternative hypothesis to check the influence of all included variables.

B) the estimator for your included regressors will be biased if at least one of the included variables is correlated with the omitted variable.

C) the estimator for your included regressors will always be biased.

D) lower the critical value to 1.645 from 1.96 in a two-sided alternative hypothesis to test the significance of the coefficients of the included variables.

Answer: B

18) A 95% confidence set for two or more coefficients is a set that contains

A) the sample values of these coefficients in 95% of randomly drawn samples.

B) integer values only.

C) the same values as the 95% confidence intervals constructed for the coefficients.

D) the population values of these coefficients in 95% of randomly drawn samples.

Answer: D

19) When there are two coefficients, the resulting confidence sets are

A) rectangles.

B) ellipses.

C) squares.

D) trapezoids.

Answer: B

20) When testing the null hypothesis that two regression slopes are zero simultaneously, then you cannot reject the null hypothesis at the 5% level, if the ellipse contains the point

A) (-1.96, 1.96).

B) (0, 1.96).

C) (0, 0).

D) $(1.96^2, 1.96^2)$.

Answer: C

21) The OLS estimators of the coefficients in multiple regression will have omitted variable bias

A) only if an omitted determinant of $Y_i$ is a continuous variable.

B) if an omitted variable is correlated with at least one of the regressors, even though it is not a determinant of the dependent variable.

C) only if the omitted variable is not normally distributed.

D) if an omitted determinant of $Y_i$ is correlated with at least one of the regressors.

Answer: D

22) At a mathematical level, if the two conditions for omitted variable bias are satisfied, then

A) $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) \neq 0$.

B) there is perfect multicollinearity.

C) large outliers are likely: $X_{1i}, X_{2i}, \ldots, X_{ki}$ and $Y_i$ have infinite fourth moments.

D) $(X_{1i}, X_{2i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are not i.i.d. draws from their joint distribution.

24) The general answer to the question of choosing the scale of the variables is

A) dependent on your whim.

B) to make the regression results easy to read and to interpret.

C) to ensure that the regression coefficients always lie between -1 and 1.

D) irrelevant because regardless of the scale of the variable, the regression coefficient is unaffected.

Answer: B

25) If the estimates of the coefficients of interest change substantially across specifications,

A) then this can be expected from sample variation.

B) then you should change the scale of the variables to make the changes appear to be smaller.

C) then this often provides evidence that the original specification had omitted variable bias.

D) then choose the specification for which your coefficient of interest is most significant.

Answer: C

26) You have estimated the relationship between test scores and the student-teacher ratio under the assumption of homoskedasticity of the error terms. The regression output is as follows: TestScore = 698.9 - 2.28 × STR, and the standard error on the slope is 0.48. The homoskedasticity-only "overall" regression F-statistic for the hypothesis that the regression $R^2$ is zero is approximately

A) 0.96

B) 1.96

C) 22.56

D) 4.75

Answer: C
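Question 26 relies on the fact from question 7 that, with a single restriction, the F-statistic is the square of the t-statistic. A minimal sketch of that calculation, using only the slope and standard error given in the question (the variable names are illustrative):

```python
slope = -2.28          # estimated coefficient on STR
standard_error = 0.48  # homoskedasticity-only standard error of the slope

t_stat = slope / standard_error  # about -4.75
f_stat = t_stat ** 2             # with q = 1, F equals t squared

print(round(t_stat, 2), round(f_stat, 2))
```

With one regressor, the overall F-test has a single restriction, so squaring the slope's t-statistic gives the value in option C.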

27) Consider a regression with two variables, in which $X_{1i}$ is the variable of interest and $X_{2i}$ is the control variable. Conditional mean independence requires

A) $E(u_i \mid X_{1i}, X_{2i}) = E(u_i \mid X_{2i})$

28) The homoskedasticity-only F-statistic and the heteroskedasticity-robust F-statistic typically are

A) the same

B) different

C) related by a linear function

D) a multiple of each other (the heteroskedasticity-robust F-statistic is 1.96 times the homoskedasticity-only F-statistic)

Answer: B

29) Consider the following regression output where the dependent variable is test scores and the two explanatory variables are the student-teacher ratio and the percent of English learners: TestScore = 698.9 - 1.10 × STR - 0.650 × PctEL. You are told that the t-statistic on the student-teacher ratio coefficient is 2.56. The standard error therefore is approximately

A) 0.25

B) 1.96

C) 0.650

D) 0.43

Answer: D

30) The critical value of $F_{4,\infty}$ at the 5% significance level is

A) 3.84

B) 2.37

C) 1.94

D) Cannot be calculated because in practice you will not have an infinite number of observations

Answer: B
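The large-sample critical values used in this chapter (2.37 for $F_{4,\infty}$, and 3.00 and 4.61 for $F_{2,\infty}$) can be reproduced from the fact that $F_{q,\infty}$ is a chi-squared variable with q degrees of freedom divided by q. The sketch below uses only the standard library; it relies on the closed-form chi-squared CDF that holds for an even number of degrees of freedom and finds the quantile by bisection, so it is a sketch for even q only. The function names are illustrative.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of a chi-squared variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1.0 - math.exp(-half) * partial_sum


def f_critical_large_sample(q, alpha):
    """Critical value of F_{q,inf}: the chi-squared(q) quantile divided by q."""
    lo, hi = 0.0, 200.0
    for _ in range(200):  # bisection on the CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, q) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 / q


print(round(f_critical_large_sample(4, 0.05), 2))  # 2.37
print(round(f_critical_large_sample(2, 0.05), 2))  # 3.0
print(round(f_critical_large_sample(2, 0.01), 2))  # 4.61
```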

7.2 Essays and Longer Questions

1) The F-statistic with q = 2 restrictions when testing for the restrictions $\beta_1 = 0$ and $\beta_2 = 0$ is given by the following formula:

$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right)$$

Discuss how this formula can be understood intuitively.

Answer: For the case when there is no correlation between the two explanatory variables, the formula reduces to a simple average of the squared t-statistics, i.e., $F = \frac{1}{2}(t_1^2 + t_2^2)$. The $F_{2,\infty}$ distribution is the distribution of a random variable with a chi-squared distribution with 2 degrees of freedom, divided by 2. Equivalently, the $F_{2,\infty}$ distribution is the distribution of the average of 2 squared standard normal random variables. Because the t-statistics are uncorrelated by assumption, they are independent standard normal random variables under the null hypothesis. If either $\beta_1$ or $\beta_2$ is nonzero (or both), then either $t_1^2$ or $t_2^2$ or both will be large. This leads to a large F-statistic, and hence a rejection of the null hypothesis.
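The formula for the F-statistic with two restrictions can be written as a small function to see how it behaves. The t-statistics and correlations below are hypothetical values chosen for illustration; `rho` stands for the estimated correlation between the two t-statistics.

```python
def f_two_restrictions(t1, t2, rho):
    """F-statistic for two restrictions from the two t-statistics and their correlation."""
    if abs(rho) >= 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


# Hypothetical t-statistics, for illustration only.
t1, t2 = 2.0, 1.0

f_uncorrelated = f_two_restrictions(t1, t2, rho=0.0)  # average of squared t's: 2.5
f_correlated = f_two_restrictions(t1, t2, rho=0.5)    # adjusts for the correlation: 2.0

print(f_uncorrelated, f_correlated)
```

With `rho = 0` the result is exactly the average of the squared t-statistics, which is the special case discussed in the answer.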

2) The cost of attending your college has once again gone up. Although you have been told that education is an investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not pleased. One of the administrators at your university/college does not make the situation better by telling you that you pay more because the reputation of your institution is better than that of others. To investigate this hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings. Next you perform the following regression (standard errors in parentheses):

$$\widehat{Cost} = \underset{(2{,}058.63)}{7{,}311.17} + \underset{(664.58)}{3{,}985.20} \times Reputation - \underset{(0.13)}{0.20} \times Size + \underset{(2{,}154.85)}{8{,}406.79} \times Dpriv - \underset{(1{,}121.92)}{416.38} \times Dlibart - \underset{(1{,}007.86)}{2{,}376.51} \times Dreligion$$

(c) You want to test simultaneously the hypotheses that the coefficients on Size and Dlibart are both zero. Your regression package returns the F-statistic of 1.23. Can you reject the null hypothesis?

(d) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes

$$\widehat{Cost} = \underset{(1{,}772.35)}{5{,}450.35} + \underset{(590.49)}{3{,}538.84} \times Reputation + \underset{(875.51)}{10{,}935.70} \times Dpriv - \underset{(1{,}180.57)}{2{,}783.31} \times Dreligion$$

$R^2 = 0.72$, SER = 3,792.68

Why do you think that the effect of attending a private institution has increased now?

(e) You make a final attempt to bring the effect of Size back into the equation by forcing the assumption of homoskedasticity onto your estimation. The results are as follows:

$$\widehat{Cost} = \underset{(1{,}985.17)}{7{,}311.17} + \underset{(593.65)}{3{,}985.20} \times Reputation - \underset{(0.07)}{0.20} \times Size + \underset{(1{,}423.59)}{8{,}406.79} \times Dpriv - \underset{(1{,}096.49)}{416.38} \times Dlibart - \underset{(989.23)}{2{,}376.51} \times Dreligion$$

$R^2 = 0.72$, SER = 3,682.02

Calculate the t-statistic on the Size coefficient and perform the hypothesis test that its coefficient is zero. Is this test reliable? Explain.

Answer: (a) The coefficient on liberal arts colleges is not significantly different from zero. All other coefficients are statistically significant at conventional levels, with the exception of the Size coefficient, which carries a t-statistic of 1.54 in absolute value, and hence is not statistically significant at the 5% level (using a one-sided alternative hypothesis).

(b) Using a one-sided alternative hypothesis, the p-value is 6.2 percent. Variables should not be eliminated simply on grounds of a statistical test. The sign of the coefficient is as expected, and its magnitude makes it important. It is best to leave the variable in the regression and let the reader decide whether or not this is convincing evidence that the size of the university matters.

(c) The critical value for $F_{2,\infty}$ is 3.00 (5% level) and 4.61 (1% level). Hence you cannot reject the null hypothesis in this case.

(d) Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these variables had negative coefficients.

(e) Although the coefficient would be statistically significant in this case, the test is unreliable and should not be used for statistical inference. There is no theoretical suggestion here that the errors might be homoskedastic. Since the two sets of standard errors are quite different here, you should use the more reliable ones, i.e., the heteroskedasticity-robust standard errors.
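The numbers quoted in the answers for the Size coefficient can be reproduced with a few lines of arithmetic. This is a sketch that uses the standard normal approximation, with the normal CDF built from `math.erf`; the variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


size_coefficient = -0.20
robust_se = 0.13         # heteroskedasticity-robust standard error
homoskedastic_se = 0.07  # homoskedasticity-only standard error from part (e)

t_robust = size_coefficient / robust_se            # about -1.54
p_one_sided = normal_cdf(t_robust)                 # lower-tail p-value, about 0.062
t_homoskedastic = size_coefficient / homoskedastic_se  # about -2.86

print(round(t_robust, 2), round(p_one_sided, 3), round(t_homoskedastic, 2))
```

The t-statistic only crosses the 5% critical value when the smaller homoskedasticity-only standard error is used, which is the reason the answer to part (e) treats that test as unreliable.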

3) In the multiple regression model with two explanatory variables

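$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

the OLS estimators for the three parameters are as follows (small letters refer to deviations from means, as in $z_i = Z_i - \bar{Z}$):

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$

$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$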

You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from 1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS estimates are given below:

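$\sum_{i=1}^{n} Y_i = 33.33$; $\sum_{i=1}^{n} X_{1i} = 2.025$; $\sum_{i=1}^{n} X_{2i} = 17.313$

$\sum_{i=1}^{n} y_i^2 = 8.3103$; $\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$

$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$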

The heteroskedasticity-robust standard errors of the two slope coefficients are 1.99 (for population growth) and 0.23 (for the saving rate). Calculate the 95% confidence interval for both coefficients. How many standard deviations are the coefficients away from zero?

Answer: The 95% confidence interval for the population growth coefficient is (-16.85, -9.05), and the 95% confidence interval for the saving rate coefficient is (0.94, 1.84). The population growth coefficient has a t-statistic of -6.51, and the saving rate coefficient has a t-statistic of 6.04. These t-statistics measure how many standard errors each estimate lies away from zero.
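The slope estimates, confidence intervals, and t-statistics in this answer follow directly from the sums given in the question. A minimal sketch of the calculation (variable names are illustrative; the 1.96 critical value assumes the large-sample normal approximation):

```python
# Sums of squares and cross-products in deviations from means, as given in the question.
sum_y_x1 = -0.2304
sum_y_x2 = 1.5676
sum_x1_sq = 0.0122
sum_x2_sq = 0.6422
sum_x1_x2 = -0.0520

# Common denominator of both slope estimators.
denominator = sum_x1_sq * sum_x2_sq - sum_x1_x2 ** 2

beta1_hat = (sum_y_x1 * sum_x2_sq - sum_y_x2 * sum_x1_x2) / denominator  # population growth
beta2_hat = (sum_y_x2 * sum_x1_sq - sum_y_x1 * sum_x1_x2) / denominator  # saving rate

se1, se2 = 1.99, 0.23  # heteroskedasticity-robust standard errors from the question
z = 1.96               # 5% two-sided critical value, standard normal

ci1 = (beta1_hat - z * se1, beta1_hat + z * se1)
ci2 = (beta2_hat - z * se2, beta2_hat + z * se2)
t1 = beta1_hat / se1
t2 = beta2_hat / se2

print(round(beta1_hat, 2), [round(b, 2) for b in ci1], round(t1, 2))
print(round(beta2_hat, 2), [round(b, 2) for b in ci2], round(t2, 2))
```

Using the unrounded saving-rate estimate gives a t-statistic of about 6.05; the 6.04 in the answer results from rounding the coefficient to 1.39 before dividing by 0.23.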

4) A subsample from the Current Population Survey is taken, on weekly earnings of individuals, their age, and their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for females and is 0 otherwise. The results were:

$$\widehat{Earn} = \underset{(9.44)}{570.70} - \underset{(13.52)}{170.72} \times Female, \quad R^2 = 0.084, \; SER = 282.12$$

(a) Perform a difference in means test and indicate whether or not the difference in the mean salaries is significantly different. Justify your choice of a one-sided or two-sided alternative test. Are these results evidence enough to argue that there is discrimination against females? Why or why not? Is it likely that the errors are normally distributed in this case? If not, does that present a problem to your test?

(b) Test for the significance of the age and gender coefficients. Why do you think that age plays a role in earnings determination?

Answer: (a) The t-statistic is -12.63, while the critical value is -1.64. The difference is therefore statistically significant. A one-sided alternative was chosen since the claim is that females make less than males. This represents little evidence of discrimination, since attributes of males and females have not been included. Given that earnings are not normally distributed, the errors will not be normally distributed either, and assuming that they are would result in problematic inference.

(b) The t-statistics are 9.36 for the age coefficient, and -13.00 for the gender coefficient. Both of these values are greater than the (absolute) critical value from the standard normal distribution (1.64). Hence you can reject the null hypothesis that these coefficients are zero. Age proxies “on the job training.” A better proxy that has been used frequently in the past is the Mincer experience variable

5) You have collected data from Major League Baseball (MLB) to find the determinants of winning. You have a general idea that both good pitching and strong hitting are needed to do well. However, you do not know how much each of these contributes separately. To investigate this problem, you collect data for all MLB teams during the 1999 season. Your strategy is to first regress the winning percentage on pitching quality ("Team ERA"), second to regress the same variable on some measure of hitting ("OPS", On-base Plus Slugging percentage), and third to regress the winning percentage on both.

Summary of the Distribution of Winning Percentage, On Base plus Slugging Percentage, and Team Earned Run Average for MLB in 1999

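| Variable | Average | Standard deviation | 10% | 25% | 40% | 50% (median) | 60% | 75% | 90% |
|---|---|---|---|---|---|---|---|---|---|
| Team ERA | 4.71 | 0.53 | 3.84 | 4.35 | 4.72 | 4.78 | 4.91 | 5.06 | 5.25 |
| OPS | 0.778 | 0.034 | 0.720 | 0.754 | 0.769 | 0.780 | 0.790 | 0.798 | 0.820 |
| Winning Percentage | 0.50 | 0.08 | 0.40 | 0.43 | 0.46 | 0.48 | 0.49 | 0.59 | 0.60 |

The columns from 10% to 90% are percentiles.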

The results are as follows:

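$$\widehat{Winpct} = \underset{(0.08)}{0.94} - \underset{(0.017)}{0.100} \times teamera, \quad R^2 = 0.49, \; SER = 0.06$$

$$\widehat{Winpct} = \underset{(0.17)}{-0.68} + \underset{(0.221)}{1.513} \times ops, \quad R^2 = 0.45, \; SER = 0.06$$

$$\widehat{Winpct} = \underset{(0.08)}{-0.19} - \underset{(0.008)}{0.099} \times teamera + \underset{(0.126)}{1.490} \times ops, \quad R^2 = 0.92, \; SER = 0.02$$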

(a) Use the t-statistic to test for the statistical significance of the coefficient.

(b) There are 30 teams in MLB. Does the small sample size worry you here when testing for significance?

Answer: (a) The t-statistics for team ERA and OPS are -12.38 and 11.83. Both of these are highly significant.

6) In the process of collecting weight and height data from 29 female and 81 male students at your university, you also asked the students for the number of siblings they have. Although it was not quite clear to you initially what you would use that variable for, you construct a new theory that suggests that children who have more siblings come from poorer families and will have to share the food on the table. Although a friend tells you that this theory does not pass the “straight-face” test, you decide to hypothesize that peers with many siblings will weigh less, on average, for a given height. In addition, you believe that the muscle/fat tissue composition of male bodies suggests that females will weigh less, on average, for a given height. To test these theories, you perform the following regression:

$$\widehat{Studentw} = \underset{(44.01)}{-229.92} - \underset{(5.52)}{6.52} \times Female + \underset{(2.25)}{0.51} \times Sibs + \underset{(0.62)}{5.58} \times Height$$

$R^2 = 0.50$, SER = 21.08

where Studentw is in pounds, Height is in inches, Female takes a value of 1 for females and is 0 otherwise, Sibs is the number of siblings (heteroskedasticity-robust standard errors in parentheses).

(a) Carrying out hypothesis tests using the relevant t-statistics to test your two claims separately, is there strong evidence in favor of your hypotheses? Is it appropriate to use two separate tests in this situation?

(b) You also perform an F-test on the joint hypothesis that the two coefficients for females and siblings are zero. The calculated F-statistic is 0.84. Find the critical value from the F-table. Can you reject the null hypothesis? Is it possible that one of the two parameters is zero in the population, but not the other?

(c) You are now a bit worried that the entire regression does not make sense and therefore also test for the height coefficient to be zero. The resulting F-statistic is 57.25. Does that prove that there is a relationship between weight and height?

Answer: (a) The t-statistics for gender and number of siblings are -1.18 and 0.23, respectively. Neither coefficient is statistically significant at conventional levels. If you wanted to test the two hypotheses simultaneously, then you should use an F-test.

(b) The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Hence you cannot reject the null hypothesis. The hypothesis is that both coefficients are zero, and this cannot be rejected. Had you rejected the null hypothesis, then the alternative hypothesis states that one or both of the restrictions do not hold.

(c) Although you cannot prove anything in this context with certainty, there is a very high probability that there is a relationship between height and weight in the population, given the sample result. The critical value from the F-table is 3.78 at the 1% level.
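The t-statistics in part (a) and the decision in part (b) can be checked with a short calculation. The estimates, standard errors, F-statistic, and critical value are taken from the question and answer; the dictionary layout and variable names are illustrative.

```python
# (estimate, heteroskedasticity-robust standard error) for each regressor.
coefficients = {
    "Female": (-6.52, 5.52),
    "Sibs": (0.51, 2.25),
    "Height": (5.58, 0.62),
}

# t-statistic for H0: coefficient = 0 is the estimate divided by its standard error.
t_stats = {name: est / se for name, (est, se) in coefficients.items()}

joint_f = 0.84        # reported F-statistic for Female = 0 and Sibs = 0
critical_5pct = 3.00  # F_{2,inf} critical value at the 5% level
reject_joint = joint_f > critical_5pct

for name, t in t_stats.items():
    print(name, round(t, 2))
print("reject joint null:", reject_joint)
```

Neither the individual t-statistics for Female and Sibs nor the joint F-statistic exceeds its critical value, while the Height coefficient is about nine standard errors from zero.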

7) You have collected data for 104 countries to address the difficult questions of the determinants for differences in the standard of living among the countries of the world. You recall from your macroeconomics lectures that the neoclassical growth model suggests that output per worker (per capita income) levels are determined by, among others, the saving rate and population growth rate. To test the predictions of this growth model, you run the following regression:

$$\widehat{RelPersInc} = \underset{(0.068)}{0.339} - \underset{(3.177)}{12.894} \times n + \underset{(0.229)}{1.397} \times S_K, \quad R^2 = 0.621, \; SER = 0.177$$

where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate, 1980-1990, and $S_K$ is the average investment share of GDP from 1960 to 1990 (remember investment equals saving). Numbers in parentheses are heteroskedasticity-robust standard errors.

(a) Calculate the t-statistics and test whether or not each of the population parameters are significantly different from zero.

(b) The overall F-statistic for the regression is 79.11. What is the critical value at the 5% and 1% level? What is your decision on the null hypothesis?

(c) You remember that human capital in addition to physical capital also plays a role in determining the standard of living of a country. You therefore collect additional data on the average educational attainment in years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression
