Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

5.1 Multiple Choice

1) Heteroskedasticity means that

A) homogeneity cannot be assumed automatically for the model.

B) the variance of the error term is not constant.

C) the observed units have different preferences.

D) agents are not all rational.

Answer: B

2) With heteroskedastic errors, the weighted least squares estimator is BLUE. You should use OLS with heteroskedasticity-robust standard errors because

A) this method is simpler.

B) the exact form of the conditional variance is rarely known.

C) the Gauss-Markov theorem holds.

D) your spreadsheet program does not have a command for weighted least squares.

Answer: B

3) When estimating a demand function for a good where quantity demanded is a linear function of the price, you should

A) not include an intercept because the price of the good is never zero.

B) use a one-sided alternative hypothesis to check the influence of price on quantity.

C) use a two-sided alternative hypothesis to check the influence of price on quantity.

D) reject the idea that price determines demand unless the coefficient is at least 1.96.

Answer: B

4) The t-statistic is calculated by dividing

A) the OLS estimator by its standard error.

B) the slope by the standard deviation of the explanatory variable.

C) the estimator minus its hypothesized value by the standard error of the estimator.

D) the slope by 1.96.

Answer: C

5) The confidence interval for the sample regression function slope

A) can be used to conduct a test about a hypothesized population regression function slope.

B) can be used to compare the value of the slope relative to that of the intercept.

C) adds and subtracts 1.96 from the slope.

D) allows you to make statements about the economic importance of your estimate.

Answer: A

6) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution, you can

A) reject the null hypothesis.

B) safely assume that your regression results are significant.

C) reject the assumption that the error terms are homoskedastic.

D) conclude that most of the actual values are very close to the regression line.

Answer: A

7) Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi being i.i.d., and Xi and ui having finite fourth moments), the OLS estimator for the slope and intercept

A) has an exact normal distribution for n > 15.

B) is BLUE.

C) has a normal distribution even in small samples.

D) is unbiased.

Answer: D

8) In general, the t-statistic has the following form:

A) (estimate - hypothesized value) / (standard error of estimate)

B) estimator / (standard error of estimator)

C) (estimator - hypothesized value) / (standard error of estimator)

D) (estimator - hypothesized value) / (standard error of estimator)

Answer: C

9) Consider the following regression line: TestScore = 698.9 – 2.28 × STR. You are told that the t-statistic on the slope coefficient is 4.38. What is the standard error of the slope coefficient?

A) 0.52

B) 1.96

C) -1.96

D) 4.38

Answer: A
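The arithmetic behind question 9 can be checked with a short sketch. It assumes the null hypothesis value is zero, so that t = slope / SE; the function name `standard_error_from_t` is illustrative and does not come from the source.

```python
def standard_error_from_t(estimate, t_stat, hypothesized=0.0):
    """Back out the standard error from t = (estimate - hypothesized) / SE."""
    # A standard error is positive, so only the magnitudes matter here.
    return abs(estimate - hypothesized) / abs(t_stat)


slope = -2.28   # slope of TestScore on STR, from the question
t_stat = 4.38   # t-statistic as quoted in the question (magnitude)
se = standard_error_from_t(slope, t_stat)
print(round(se, 2))  # 0.52
```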

10) Imagine that you were told that the t-statistic for the slope coefficient of the regression line TestScore = 698.9 – 2.28 × STR was 4.38. What are the units of measurement for the t-statistic?

A) points of the test score

B) number of students per teacher

C) TestScore / STR

D) standard deviations

Answer: D

11) The construction of the t-statistic for a one- and a two-sided hypothesis

A) depends on the critical value from the appropriate distribution.

B) is the same.

C) is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the two-sided hypothesis (using a 5% probability for the Type I error).

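13) The 95% confidence interval for $\beta_1$ is the interval

A) $(\beta_1 - 1.96\,SE(\beta_1),\ \beta_1 + 1.96\,SE(\beta_1))$.

B) $(\hat{\beta}_1 - 1.645\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.645\,SE(\hat{\beta}_1))$.

C) $(\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1))$.

D) $(\hat{\beta}_1 - 1.96,\ \hat{\beta}_1 + 1.96)$.

Answer: C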

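14) The 95% confidence interval for $\beta_0$ is the interval

A) $(\beta_0 - 1.96\,SE(\beta_0),\ \beta_0 + 1.96\,SE(\beta_0))$.

B) $(\hat{\beta}_0 - 1.645\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.645\,SE(\hat{\beta}_0))$.

C) $(\hat{\beta}_0 - 1.96\,SE(\hat{\beta}_0),\ \hat{\beta}_0 + 1.96\,SE(\hat{\beta}_0))$.

D) $(\hat{\beta}_0 - 1.96,\ \hat{\beta}_0 + 1.96)$.

Answer: C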

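15) The 95% confidence interval for the predicted effect of a general change in X is

A) $(\beta_1 \Delta x - 1.96\,SE(\beta_1) \times \Delta x,\ \beta_1 \Delta x + 1.96\,SE(\beta_1) \times \Delta x)$.

B) $(\hat{\beta}_1 \Delta x - 1.645\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.645\,SE(\hat{\beta}_1) \times \Delta x)$.

C) $(\hat{\beta}_1 \Delta x - 1.96\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.96\,SE(\hat{\beta}_1) \times \Delta x)$.

D) $(\hat{\beta}_1 \Delta x - 1.96,\ \hat{\beta}_1 \Delta x + 1.96)$.

Answer: C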
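As a rough illustration of question 15, the sketch below scales both the slope estimate and its standard error by the change in X. The slope (-2.28) and standard error (0.52) are taken from the TestScore regression used elsewhere in this section; the change of -2 students per teacher and the function name are assumptions made only for the example.

```python
def effect_confidence_interval(slope, se, delta_x, z=1.96):
    """95% interval (by default) for the predicted effect slope * delta_x."""
    center = slope * delta_x
    # abs() keeps the half-width positive when delta_x is negative.
    half_width = z * se * abs(delta_x)
    return center - half_width, center + half_width


# Hypothetical change: reduce the student-teacher ratio by 2.
low, high = effect_confidence_interval(slope=-2.28, se=0.52, delta_x=-2.0)
print(round(low, 2), round(high, 2))
```

The interval is centered on the predicted change in test scores (4.56 points here) and its width grows in proportion to the size of the change in X.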

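16) The homoskedasticity-only estimator of the variance of $\hat{\beta}_1$ is

A) $\dfrac{s_{\hat{u}}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$.

B) $\dfrac{s_{\hat{u}}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$.

C) $\dfrac{s_{\hat{u}}^2}{\sum_{i=1}^{n} (X_i^2 - \bar{X})}$.

D) $\dfrac{1}{n} \times \dfrac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}$.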
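To make the formulas in question 16 concrete, the following sketch computes the homoskedasticity-only variance estimate, the residual variance divided by the sum of squared deviations of X, next to the heteroskedasticity-robust expression that keeps the squared residuals inside the sum. The data are made up for illustration and the function name is hypothetical.

```python
def ols_slope_variances(x, y):
    """Return (slope, homoskedasticity-only variance, robust variance)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)  # sum of (X_i - X_bar)^2
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: s_u^2 / sum of (X_i - X_bar)^2
    s2_u = sum(u * u for u in resid) / (n - 2)
    var_homosk = s2_u / sxx

    # Robust: (1/n) * [1/(n-2) * sum (X_i - X_bar)^2 u_i^2] / [(1/n) * sxx]^2
    numerator = sum((d * u) ** 2 for d, u in zip(dx, resid)) / (n - 2)
    var_robust = (1 / n) * numerator / (sxx / n) ** 2
    return b1, var_homosk, var_robust


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
b1, var_homosk, var_robust = ols_slope_variances(x, y)
print(b1, var_homosk ** 0.5, var_robust ** 0.5)
```

The slope estimate is the same under either formula; only the estimated variance, and hence the standard error, differs.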

17) One of the following steps is not required as a step to test for the null hypothesis:

A) compute the standard error of $\hat{\beta}_1$.

B) test for the errors to be normally distributed.

C) compute the t-statistic.

D) compute the p-value.

Answer: B

18) Finding a small value of the p-value (e.g. less than 5%)

A) indicates evidence in favor of the null hypothesis.

B) implies that the t-statistic is less than 1.96.

C) indicates evidence against the null hypothesis.

D) will only happen roughly one in twenty samples.

Answer: C
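The link between the t-statistic and the p-value can be sketched as follows, assuming the large-sample standard normal approximation used throughout this section. The function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


print(two_sided_p_value(1.96))  # close to 0.05
print(two_sided_p_value(4.38))  # far below 0.05
```

A larger absolute t-statistic gives a smaller p-value, which is why a small p-value counts as evidence against the null hypothesis.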

19) The only difference between a one- and two-sided hypothesis test is

A) the null hypothesis.

B) dependent on the sample size n.

C) the sign of the slope coefficient.

D) how you interpret the t-statistic.

Answer: D

20) A binary variable is often called a

A) dummy variable.

B) dependent variable.

C) residual.

D) power of a test.

Answer: A

21) The error term is homoskedastic if

A) var(u_i | X_i = x) is constant for i = 1,…, n.

B) var(u_i | X_i = x) depends on x.

C) X_i is normally distributed.

D) there are no outliers.

Answer: A

22) In the presence of heteroskedasticity, and assuming that the usual least squares assumptions hold, the OLS estimator is

A) efficient.

B) BLUE.

C) unbiased and consistent.

D) unbiased but not consistent.

Answer: C

24) If the errors are heteroskedastic, then

A) OLS is BLUE.

B) WLS is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality.

C) LAD is BLUE if the conditional variance of the errors is known up to a constant factor of proportionality.

D) OLS is efficient.

Answer: B

25) The homoskedastic normal regression assumptions are all of the following with the exception of:

A) the errors are homoskedastic.

B) the errors are normally distributed.

C) there are no outliers.

D) there are at least 10 observations.

Answer: D

26) Using the textbook example of 420 California school districts and the regression of test scores on the student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula. When calculating the t-statistic, the recommended procedure is to

A) use the homoskedasticity-only formula because the t-statistic becomes larger

B) first test for homoskedasticity of the errors and then make a decision

C) use the heteroskedasticity-robust formula

D) make a decision depending on how much different the estimate of the slope is under the two procedures

Answer: C

27) Consider the estimated equation from your textbook

TestScore=698.9 - 2.28 STR, R2= 0.051, SER = 18.6

(10.4) (0.52)

The t-statistic for the slope is approximately

A) 4.38

B) 67.20

C) 0.52

D) 1.76

Answer: A

28) You have collected data for the 50 U.S. states and estimated the following relationship between the change in the unemployment rate from the previous year (Δur) and the growth rate of the respective state real GDP (gy).

The results are as follows

Δur = 2.81 - 0.23 gy, R2 = 0.36, SER = 0.78

(0.12) (0.04)

Assuming that the estimator has a normal distribution, the 95% confidence interval for the slope is approximately the interval

A) [2.57, 3.05]

B) [-0.31, 0.15]

C) [-0.31, -0.15]

D) [-0.33, -0.13]

Answer: C
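The interval in question 28 follows from the usual rule of the estimate plus or minus 1.96 standard errors. A minimal sketch, with an illustrative function name:

```python
def confidence_interval(estimate, se, z=1.96):
    """Large-sample confidence interval: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


# Slope and standard error from the unemployment/GDP growth regression.
low, high = confidence_interval(-0.23, 0.04)
print(round(low, 2), round(high, 2))  # -0.31 -0.15
```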

29) Using 143 observations, assume that you had estimated a simple regression function and that your estimate for the slope was 0.04, with a standard error of 0.01. You want to test whether or not the estimate is statistically significant. Which of the following possible decisions is the only correct one:

A) you decide that the coefficient is small and hence most likely is zero in the population

B) the slope is statistically significant since it is four standard errors away from zero

C) the response of Y given a change in X must be economically important since it is statistically significant

D) since the slope is very small, so must be the regression R2.

Answer: B

30) You extract approximately 5,000 observations from the Current Population Survey (CPS) and estimate the following regression function:

ahe = 3.32 + 0.45 Age, R2 = 0.02, SER = 8.66

(1.00) (0.04)

where ahe is average hourly earnings, and Age is the individual’s age. Given the specification, your 95% confidence interval for the effect of changing age by 5 years is approximately

A) [$1.96, $2.54]

B) [$2.32, $4.32]

C) [$1.35, $5.30]

D) cannot be determined given the information provided

Answer: A

5.2 Essays and Longer Questions

1) (Continuation from Chapter 4) Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of children and their parents towards the end of the 19th century. It is from this study that the name “regression” originated. You decide to update his findings by collecting data from 110 college students, and estimate the following relationship:

Studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0

(7.2) (0.10)

where Studenth is the height of students in inches, and Midparh is the average of the parental heights. Values in parentheses are heteroskedasticity robust standard errors. (Following Galton’s methodology, both variables were adjusted so that the average female height was equal to the average male height.)

(a) Test for the statistical significance of the slope coefficient.

(b) If children, on average, were expected to be of the same height as their parents, then this would imply two hypotheses, one for the slope and one for the intercept.

(i) What should the null hypothesis be for the intercept? Calculate the relevant t-statistic and carry out the hypothesis test at the 1% level.

(ii) What should the null hypothesis be for the slope? Calculate the relevant t-statistic and carry out the hypothesis test at the 5% level.

(c) Can you reject the null hypothesis that the regression R2 is zero?
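The source gives no worked answer for this question, so the following is only a sketch of the t-statistics involved. The hypothesized values (0 for the intercept and 1 for the slope in part (b)) are one reading of "children are the same height as their parents", and the standard normal critical values are an assumption based on the large-sample approach used in this section.

```python
from statistics import NormalDist


def t_statistic(estimate, hypothesized, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / se


t_slope_zero = t_statistic(0.73, 0.0, 0.10)  # part (a): slope equal to 0
t_intercept = t_statistic(19.6, 0.0, 7.2)    # part (b)(i): intercept equal to 0
t_slope_one = t_statistic(0.73, 1.0, 0.10)   # part (b)(ii): slope equal to 1

crit_1pct = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-sided 1% critical value
crit_5pct = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided 5% critical value

print(round(t_slope_zero, 2), round(t_intercept, 2), round(t_slope_one, 2))
print(abs(t_intercept) > crit_1pct, abs(t_slope_one) > crit_5pct)
```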

2) (Requires Appendix) (Continuation from Chapter 4) At a recent county fair, you observed that at one stand people’s weight was forecasted, and were surprised by the accuracy (within a range). Thinking about how the person could have predicted your weight fairly accurately (despite the fact that she did not know about your “heavy bones”), you think about how this could have been accomplished. You remember that medical charts for children contain 5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct an experiment with 110 of your peers. You collect the data and calculate the following sums:

$\sum_{i=1}^{n} Y_i = 17,375$, $\sum_{i=1}^{n} X_i = 7,665.5$, $\sum_{i=1}^{n} y_i^2 = 94,228.8$, $\sum_{i=1}^{n} x_i^2 = 1,248.9$, $\sum_{i=1}^{n} x_i y_i = 7,625.9$

where the height is measured in inches and weight in pounds. (Small letters refer to deviations from means as in $z_i = Z_i - \bar{Z}$.)

(a) Calculate the homoskedasticity-only standard errors and, using the resulting t-statistic, perform a test on the null hypothesis that there is no relationship between height and weight in the population of college students.

(b) What is the alternative hypothesis in the above test, and what level of significance did you choose?

(c) Statistics and econometrics textbooks often ask you to calculate critical values based on some level of significance, say 1%, 5%, or 10%. What sort of criteria do you think should play a role in determining which level of significance to choose?

(d) What do you think the relationship is between testing for the significance of the slope and whether or not the regression R2 is zero?

Answer: (a) The formula for the homoskedasticity-only standard errors requires knowledge of the residual variance. But $s_{\hat{u}}^2 = \frac{1}{n-2} SSR$, and SSR = TSS - ESS. Given the result in (2b), SSR = 47,604.7, and hence $s_{\hat{u}}^2 = 440.78$. The SER is 21.00. Dividing by the square root of the variation in X then results in the homoskedasticity-only standard error of the slope, which is 0.594. The t-statistic is 10.29, which rejects the null hypothesis of no relationship.

(b) The alternative hypothesis should be one-sided, since there is strong prior knowledge that taller people weigh more, on average. Given the size of the t-statistic, the null hypothesis can be rejected at any reasonable level of significance.

(c) Clearly the levels should not be picked arbitrarily, but should depend on the cost involved with the size and the power of the test. Consider a person who was accused of murder. In that case, the null hypothesis is that he is innocent. The size of the test would be the probability of letting an innocent person go to the electric chair, while (1-power of the test) gives the probability of letting a murderer go free. There are obviously vastly different costs attached to each error, and these will determine the levels chosen.

(d) If the slope in a regression function is zero, then there is no relationship between the two variables involved. Hence testing for the significance of the regression slope is the same as testing whether or not the regression R2 is zero.
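The calculation in part (a) can be retraced from the sums given in the question. This is a sketch only: it recomputes every step from the unrounded sums, so intermediate values such as SSR may differ slightly from the rounded figures quoted in the answer. Variable names are illustrative.

```python
import math

n = 110
sum_y2 = 94228.8  # sum of squared deviations of weight (TSS)
sum_x2 = 1248.9   # sum of squared deviations of height
sum_xy = 7625.9   # sum of cross products of deviations

slope = sum_xy / sum_x2             # OLS slope
ess = slope * sum_xy                # explained sum of squares
ssr = sum_y2 - ess                  # SSR = TSS - ESS
s2_u = ssr / (n - 2)                # residual variance
ser = math.sqrt(s2_u)               # standard error of the regression
se_slope = ser / math.sqrt(sum_x2)  # homoskedasticity-only SE of the slope
t_stat = slope / se_slope           # t-statistic for a zero slope

print(round(slope, 2), round(ser, 2), round(se_slope, 3), round(t_stat, 2))
```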

3) You have obtained measurements of height in inches of 29 female and 81 male students (Studenth) at your university. A regression of the height on a constant and a binary variable (BFemme), which takes a value of one for females and is zero otherwise, yields the following result:

Studenth = 71.0 – 4.84×BFemme , R2 = 0.40, SER = 2.0

(0.3) (0.57)

(a) What is the interpretation of the intercept? What is the interpretation of the slope? How tall are females, on average?

(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level.

(c) Is it likely that the error term is homoskedastic here?

Answer: (a) The intercept gives you the average height of males, which is 71 inches in this sample. The slope tells you by how much shorter females are, on average (almost 5 inches). The average height of females is therefore approximately 66 inches.

(b) The t-statistic for the difference in means is -8.49. For a one-sided test, the critical value is –2.33. Hence the difference is statistically significant.

(c) It is safer to assume that the variances for males and females are different. In the underlying sample the standard deviation for females was smaller.
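Parts (a) and (b) of this answer can be checked with a few lines. The sketch assumes the standard normal approximation for the one-sided critical value, as the answer does.

```python
from statistics import NormalDist

intercept = 71.0  # average height of males in the sample
slope = -4.84     # coefficient on the binary variable BFemme
se_slope = 0.57

avg_female_height = intercept + slope  # about 66 inches
t_stat = slope / se_slope              # t-statistic for the difference in means
critical = NormalDist().inv_cdf(0.01)  # left-tail critical value at the 1% level
reject = t_stat < critical             # one-sided test: females shorter than males

print(round(avg_female_height, 2), round(t_stat, 2), round(critical, 2), reject)
# 66.16 -8.49 -2.33 True
```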

4) (Continuation from Chapter 4, number 3) You have obtained a subsample of 1744 individuals from the Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression, using heteroskedasticity-robust standard errors, yielded the following result:

Earn = 239.16 + 5.20×Age, R2 = 0.05, SER = 287.21

(20.24) (0.57)

where Earn and Age are measured in dollars and years respectively.

(a) Is the relationship between Age and Earn statistically significant?

(b) The variance of the error term and the variance of the dependent variable are related. Given the distribution of earnings, do you think it is plausible that the distribution of errors is normal?

Answer: (a) The t-statistic on the slope is 9.12, which is above the critical value from the standard normal distribution for any reasonable level of significance.

(b) Since the earnings distribution is highly skewed, it is not reasonable to assume that the error distribution is normal.

5) (Continuation from Chapter 4, number 5) You have learned in one of your economics courses that one of the determinants of per capita income (the “Wealth of Nations”) is the population growth rate. Furthermore you also found out that the Penn World Tables contain income and population data for 104 countries of the world. To test this theory, you regress the GDP per worker (relative to the United States) in 1990 (RelPersInc) on the difference between the average population growth rate of that country (n) and the U.S. average population growth rate (nus) for the years 1980 to 1990. This results in the following regression output:

RelPersInc = 0.518 – 18.831×(n – nus), R2 = 0.522, SER = 0.197

(0.056) (3.177)

(a) Is there any reason to believe that the variance of the error terms is homoskedastic?

(b) Is the relationship statistically significant?

Answer: (a) There are vast differences in the size of these countries, both in terms of the population and GDP. Furthermore, the countries are at different stages of economic and institutional development. Other factors vary as well. It would therefore be odd to assume that the errors would be homoskedastic.

(b) The t-statistic is -5.93, making the relationship statistically significant, i.e., we can reject the null hypothesis that the slope is zero.

6) You recall from one of your earlier lectures in macroeconomics that the per capita income depends on the savings rate of the country: those who save more end up with a higher standard of living. To test this theory, you collect data from the Penn World Tables on GDP per worker relative to the United States (RelProd) in 1990 and the average investment share of GDP from 1980-1990 (SK), remembering that investment equals saving. The regression results in the following output:

RelProd = –0.08 + 2.44×SK, R2 = 0.46, SER = 0.21

(0.04) (0.38)

(a) Interpret the regression results carefully.

(b) Calculate the t-statistics to determine whether the two coefficients are significantly different from zero. Justify the use of a one-sided or two-sided test.

(c) You accidentally forget to use the heteroskedasticity-robust standard errors option in your regression package and estimate the equation using homoskedasticity-only standard errors. This changes the results as follows:

RelProd = -0.08 + 2.44×SK, R2 = 0.46, SER = 0.21

(0.04) (0.26)

You are delighted to find that the coefficients have not changed at all and that your results have become even more significant. Why haven’t the coefficients changed? Are the results really more significant? Explain.

(d) Upon reflection you think about the advantages of OLS with and without homoskedasticity-only standard errors. What are these advantages? Is it likely that the error terms would be heteroskedastic in this situation?

Answer: (a) An increase in the saving rate of 0.1, or from 0.15 to 0.25, results in an increase in relative GDP per worker of 0.244, or from 0.5 to roughly 0.75. (Taiwan had a value of 0.5 for RelProd in 1990, while Sweden was at 0.77.) There is no interpretation for the intercept. The regression explains 46 percent of the variation in GDP per worker relative to the United States.

(b) The t-statistics are 2.00 and 6.42 for the intercept and slope respectively. You should use a two-sided test for the intercept, since there are no prior expectations on whether it should be positive or negative. Hence the intercept is statistically significant at the 5 percent level, but not at the 1 percent level. Since we expect a positive sign on the slope, we should conduct a one-sided test. The critical values suggest

versus heteroskedasticity in the textbook, it is safer to conduct inference under the assumption of heteroskedasticity.

(d) In the presence of homoskedasticity in addition to the least squares assumptions in the text, OLS is BLUE (Gauss-Markov theorem). If the errors are heteroskedastic, then the GLS estimator (weighted least squares) is BLUE if the form of heteroskedasticity is known, which rarely occurs in practice. Since economic theory does not suggest, in general, that errors are homoskedastic, it is safer to assume that they are not. This avoids invalid statistical inference.
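The point of part (c) of this question can be seen numerically: the slope estimate is identical in both regressions, and only the standard error in the denominator of the t-statistic changes. A short sketch using the two standard errors reported in the question:

```python
slope = 2.44      # same OLS estimate in both regressions
se_robust = 0.38  # heteroskedasticity-robust standard error
se_homosk = 0.26  # homoskedasticity-only standard error

t_robust = slope / se_robust
t_homosk = slope / se_homosk

print(round(t_robust, 2), round(t_homosk, 2))  # 6.42 9.38
```

The larger t-statistic under the homoskedasticity-only formula reflects a smaller standard error, not stronger evidence; if the errors are heteroskedastic, that standard error is not a valid basis for inference.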

7) Carefully discuss the advantages of using heteroskedasticity-robust standard errors over standard errors calculated under the assumption of homoskedasticity. Give at least five examples where it is very plausible to assume that the errors display heteroskedasticity.

Answer: There are virtually no examples where economic theory suggests that the errors are homoskedastic. Hence the maintained hypothesis should be that they are heteroskedastic. Using homoskedasticity-only standard errors when in truth heteroskedasticity-robust standard errors should be used results in false inference. What makes this worse is that homoskedasticity-only standard errors are typically smaller than heteroskedasticity-robust standard errors, resulting in t-statistics that are too large, and hence rejection of the null hypothesis too often. There is an alternative GLS estimator, weighted least squares, which is BLUE, but requires knowledge of how the error variance depends on X, e.g. on X or X². Answers will vary by student regarding the examples, but earnings functions, cross-country beta-convergence regressions, consumption functions, sports regressions involving teams from markets with varying population size, weight-height relationships for children, etc., are all good candidates.

8) (Requires Appendix material from Chapters 4 and 5) Shortly before you are making a group presentation on the testscore/student-teacher ratio results, you realize that one of your peers forgot to type all the relevant information on one of your slides. Here is what you see:

TestScore = 698.9 – STR R2 = 0.051, SER = 18.6

(9.47) (0.48)

In addition, your group member explains that he ran the regression in a standard spreadsheet program, and that, as a result, the standard errors in parentheses are homoskedasticity-only standard errors.

(a) Find the value for the slope coefficient.

(b) Calculate the t-statistic for the slope and the intercept. Test the hypothesis that the intercept and the slope are different from zero.
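One possible route to part (a) is sketched below; it is not given in the source. It assumes n = 420 districts (the figure quoted for this textbook example in multiple-choice question 26) and uses the homoskedasticity-only relation SE = SER / sqrt(sum of squared deviations of X) together with R2 = ESS/TSS. These quantities fix only the magnitude of the slope; the negative sign is an assumption carried over from the TestScore regression shown earlier in this section.

```python
import math

n = 420          # assumed sample size (California school districts)
r2 = 0.051
ser = 18.6
se_slope = 0.48  # homoskedasticity-only standard error of the slope
intercept, se_intercept = 698.9, 9.47

sum_x2 = (ser / se_slope) ** 2  # sum of squared deviations of STR
ssr = ser ** 2 * (n - 2)        # SER^2 = SSR / (n - 2)
tss = ssr / (1 - r2)            # R2 = 1 - SSR / TSS
ess = tss - ssr                 # ESS = slope^2 * sum_x2
slope = -math.sqrt(ess / sum_x2)  # sign assumed negative

t_slope = slope / se_slope
t_intercept = intercept / se_intercept
print(round(slope, 2), round(t_slope, 2), round(t_intercept, 2))
```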
