1) In nonlinear models, the expected change in the dependent variable for a change in one of the explanatory variables is given by A) Y = f(X1 + X1, X2,... Xk). B) Y = f(X1 + X1, X2 + X2,..., Xk+ Xk)- f(X1, X2,...Xk). C) Y = f(X1 + X1, X2,..., Xk)- f(X1, X2,...Xk). D) Y = f(X1 + X1, X2,..., Xk)- f(X1, X2,...Xk). Answer: C
2) The interpretation of the slope coefficient in the model Yi = 0 + 1 ln(Xi) + ui is as follows: A) a 1% change in X is associated with a 1 % change in Y.
B) a 1% change in X is associated with a change in Y of 0.01 1.
C) a change in X by one unit is associated with a 1 100% change in Y. D) a change in X by one unit is associated with a 1 change in Y. Answer: B
3) The interpretation of the slope coefficient in the model ln(Yi) = 0 + 1Xi + ui is as follows: A) a 1% change in X is associated with a 1 % change in Y.
B) a change in X by one unit is associated with a 100 1 % change in Y. C) a 1% change in X is associated with a change in Y of 0.01 1. D) a change in X by one unit is associated with a 1 change in Y. Answer: B
4) The interpretation of the slope coefficient in the model ln(Yi) = 0 + 1 ln(Xi)+ ui is as follows: A) a 1% change in X is associated with a 1 % change in Y.
B) a change in X by one unit is associated with a 1 change in Y. C) a change in X by one unit is associated with a 100 1 % change in Y. D) a 1% change in X is associated with a change in Y of 0.01 1. Answer: A
5) In the case of regression with interactions, the coefficient of a binary variable should be interpreted as follows: A) there are really problems in interpreting these, since the ln(0) is not defined.
B) for the case of interacted regressors, the binary variable coefficient represents the various intercepts for the case when the binary variable equals one.
C) first set all explanatory variables to one, with the exception of the binary variables. Then allow for each of the binary variables to take on the value of one sequentially. The resulting predicted value indicates the effect of the binary variable.
D) first compute the expected values of Y for each possible case described by the set of binary variables. Next compare these expected values. Each coefficient can then be expressed either as an expected value or as the difference between two or more expected values.
7) An example of the interaction term between two independent, continuous variables is A) Yi = 0 + 1Xi + 2Di + 3(Xi × Di) + ui.
B) Yi = 0 + 1X1i + 2X2i + ui.
C) Yi = 0 + 1D1i + 2D2i + 3 (D1i × D2i) + ui. D) Yi = 0 + 1X1i + 2X2i + 3(X1i × X2i) + ui. Answer: D
8) Including an interaction term between two independent variables, X1 and X2, allows for the following except: A) the interaction term lets the effect on Y of a change in X1 depend on the value of X2.
B) the interaction term coefficient is the effect of a unit increase in X1 and X2 above and beyond the sum of the individual effects of a unit increase in the two variables alone.
C) the interaction term coefficient is the effect of a unit increase in (X1 × X2).
D) the interaction term lets the effect on Y of a change in X2 depend on the value of X1. Answer: C
9) A nonlinear function
A) makes little sense, because variables in the real world are related linearly.
B) can be adequately described by a straight line between the dependent variable and one of the explanatory variables.
C) is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a line in four dimensions.
D) is a function with a slope that is not constant. Answer: C
10) An example of a quadratic regression model is A) Yi = 0 + 1X + 2Y2 + ui.
B) Yi = 0 + 1ln(X) + ui. C) Yi = 0 + 1X + 2X2 + ui. D) Y2i = 0 + 1X + ui. Answer: C
11) (Requires Calculus) In the equation TestScore= 607.3 + 3.85 Income – 0.0423Income2, the following income level results in the maximum test score
A) 607.3. B) 91.02. C) 45.50.
D) cannot be determined without a plot of the data. Answer: C
12) To decide whether Yi = 0 + 1X + ui or ln(Yi) = 0 + 1X + ui fits the data better, you cannot consult the regression R2 because
A) ln(Y) may be negative for 0<Y<1.
B) the TSS are not measured in the same units between the two models.
C) the slope no longer indicates the effect of a unit change of X on Y in the log-linear model. D) the regression R2 can be greater than one in the second model.
13) You have estimated the following equation:
TestScore= 607.3 + 3.85 Income – 0.0423 Income2 ,
where TestScore is the average of the reading and math scores on the Stanford 9 standardized test administered to 5th grade students in 420 California school districts in 1998 and 1999. Income is the average annual per capita income in the school district, measured in thousands of 1998 dollars. The equation
A) suggests a positive relationship between test scores and income for most of the sample. B) is positive until a value of Income of 610.81.
C) does not make much sense since the square of income is entered.
D) suggests a positive relationship between test scores and income for all of the sample. Answer: A
14) A polynomial regression model is specified as: A) Yi = 0 + 1Xi + 2X2i+ ··· + rXri + ui.
B) Yi = 0 + 1Xi + 21Xi + ··· + 1rXi + ui.
C) Yi = 0 + 1Xi + 2Y2i+ ··· + rYri + ui. D) Yi = 0 + 1X1i + 2X2 + 3 (X1i × X2i) + ui. Answer: A
15) For the polynomial regression model,
A) you need new estimation techniques since the OLS assumptions do not apply any longer. B) the techniques for estimation and inference developed for multiple regression can be applied. C) you can still use OLS estimation techniques, but the t-statistics do not have an asymptotic normal
distribution.
D) the critical values from the normal distribution have to be changed to 1.962, 1.963, etc. Answer: B
16) To test whether or not the population regression function is linear rather than a polynomial of order r,
A) check whether the regression R2 for the polynomial regression is higher than that of the linear regression. B) compare the TSS from both regressions.
C) look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the polynomial regression should be used.
D) use the test of (r-1) restrictions using the F-statistic. Answer: D
17) The best way to interpret polynomial regressions is to A) take a derivative of Y with respect to the relevant X.
B) plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.
18) The exponential function
A) is the inverse of the natural logarithm function.
B) does not play an important role in modeling nonlinear regression functions in econometrics. C) can be written as exp(ex).
D) is ex, where e is 3.1415…. Answer: A
19) The following are properties of the logarithm function with the exception of A) ln(1/ x)= -ln(x).
B) ln(a + x) = ln(a) + ln(x). C) ln(ax)= ln(a) + ln(x). D) ln(xa) a ln(x). Answer: B
20) The binary variable interaction regression
A) can only be applied when there are two binary variables, but not three or more. B) is the same as testing for differences in means.
C) cannot be used with logarithmic regression functions because ln(0) is not defined.
D) allows the effect of changing one of the binary independent variables to depend on the value of the other binary variable.
Answer: D
21) In the regression model Yi = 0 + 1Xi + 2Di + 3(Xi × Di) + ui, where X is a continuous variable and D is a binary variable, 3
A) indicates the slope of the regression when D=1.
B) has a standard error that is not normally distributed even in large samples since D is not a normally distributed variable.
C) indicates the difference in the slopes of the two regressions. D) has no meaning since (Xi × Di) = 0 when Di = 0.
Answer: C
22) In the regression model Yi = 0 + 1Xi + 2Di + 3(Xi × Di) + ui, where X is a continuous variable and D is a binary variable, 2
A) is the difference in means in Y between the two categories. B) indicates the difference in the intercepts of the two regressions. C) is usually positive.
D) indicates the difference in the slopes of the two regressions. Answer: B
23) In the regression model Yi = 0 + 1Xi + 2Di + 3(Xi × Di) + ui, where X is a continuous variable and D is a binary variable, to test that the two regressions are identical, you must use the
A) t-statistic separately for 2 = 0, 2 = 0.
B) F-statistic for the joint hypothesis that 0 = 0, 1 = 0. C) t-statistic separately for 3 = 0.
D) F-statistic for the joint hypothesis that 2 = 0, 3= 0. Answer: D
24) In the model Yi = 0 + 1X1 + 2X2 + 3(X1 × X2) + ui, the expected effect Y X1 is A) 1 + 3X2. B) 1. C) 1 + 3. D) 1 + 3X1. Answer: A
25) In the log-log model, the slope coefficient indicates A) the effect that a unit change in X has on Y.
B) the elasticity of Y with respect to X. C) Y / X.
D) Y
X× YX.
Answer: B
26) In the model ln(Yi) = 0 + 1Xi + ui, the elasticity of E(Y|X) with respect to X is A) 1X
B) 1
C) 1X
0+ 1X
D) cannot be calculated because the function is non-linear Answer: A
27) Assume that you had estimated the following quadratic regression model
TestScore= 607.3 + 3.85 Income - 0.0423 Income2. If income increased from 10 to 11 ($10,000 to $11,000), then the predicted effect on testscores would be
A) 3.85 B) 3.85-0.0423
C) Cannot be calculated because the function is non-linear D) 2.96
Answer: D
28) Consider the polynomial regression model of degree Yi = 0 + 1Xi + 2 X2i+ ...+ rX r
i + ui. According to the null hypothesis that the regression is linear and the alternative that is a polynomial of degree r corresponds to
A) H0: r = 0 vs. r 0 B) H0: r = 0 vs. 1 0
C) H0: 3 = 0, ..., r = 0, vs. H1: all j 0, j = 3, ..., r
D) H0: 2 = 0, 3 = 0 ..., r = 0, vs. H1: at least one j 0, j = 2, ..., r Answer: D
30) Consider the population regression of log earnings [Yi, where Yi = ln(Earningsi)] against two binary variables: whether a worker is married (D1i, where D1i=1 if the ith person is married) and the worker’s gender (D2i, where D2i=1 if the ith person is female), and the product of the two binary variables Yi = 0 + 1D1i + 2D2i +
3(D1i×D2i) + ui. The interaction term
A) allows the population effect on log earnings of being married to depend on gender B) does not make sense since it could be zero for married males
C) indicates the effect of being married on log earnings
D) cannot be estimated without the presence of a continuous variable Answer: A
8.2 Essays and Longer Questions
1) Females, it is said, make 70 cents to the dollar in the United States. To investigate this phenomenon, you collect data on weekly earnings from 1,744 individuals, 850 females and 894 males. Next, you calculate their average weekly earnings and find that the females in your sample earned $346.98, while the males made $517.70. (a) Calculate the female earnings in percent of the male earnings. How would you test whether or not this difference is statistically significant? Give two approaches.
(b) A peer suggests that this is consistent with the idea that there is discrimination against females in the labor market. What is your response?
(c) You recall from your textbook that additional years of experience are supposed to result in higher earnings. You reason that this is because experience is related to “on the job training.” One frequently used measure for (potential) experience is “Age-Education-6.” Explain the underlying rationale. Assuming, heroically, that education is constant across the 1,744 individuals, you consider regressing earnings on age and a binary variable for gender. You estimate two specifications initially:
Earn = 323.70 + 5.15 × Age – 169.78 × Female, R2=0.13, SER=274.75
(21.18) (0.55) (13.06)
Ln(Earn) = 5.44 + 0.015 × Age – 0.421 × Female, R2=0.17, SER=0.75
(0.08) (0.002) (0.036)
where Earn are weekly earnings in dollars, Age is measured in years, and Female is a binary variable, which takes on the value of one if the individual is a female and is zero otherwise. Interpret each regression carefully. For a given age, how much less do females earn on average? Should you choose the second specification on grounds of the higher regression R2?
(d) Your peer points out to you that age-earning profiles typically take on an inverted U-shape. To test this idea, you add the square of age to your log-linear regression.
Ln(Earn) = 3.04 + 0.147 × Age – 0.421 × Female – 0.0016 Age2,
(0.18) (0.009) (0.033) (0.0001)
R2 =0.28, SER=0.68
Interpret the results again. Are there strong reasons to assume that this specification is superior to the previous one? Why is the increase of the Age coefficient so large relative to its value in (c)?
(e) What other factors may play a role in earnings determination?
Answer: (a) Female earnings are at 67 percent of male earnings. The difference in means test described in section 3.4 of the text. The t-statistic for comparison of two means is given in equation (3.20), which is one way to test for statistical significance. The alternative is to run a regression of earnings on a constant and a binary variable, which takes on the value of one for females and is zero otherwise. Using a t-test on the slope of the binary variable amounts to the same test as the difference in means (section 4.7 in the text). (b) Differences in attributes of the individuals, such as education, ability, and tenure with an employer, have not been taken into account. Hence, in itself, this is weak evidence, at best, for discrimination.
individual’s life. Females earn approximately 42.1 percent less than males at a given age. Again, the intercept should not be interpreted. The regression explains 17 percent of the variation in the log of earnings. You should not prefer this specification over the linear one on grounds of the higher regression
R2 since these cannot be compared as a result of the difference in the units of measurement of the
dependent variable.
(d) The coefficient on the added variable is statistically significant and has resulted in a substantial increase in the regression R2. The increase in the Age coefficient is due to the fact that earnings increase more initially than later in life or, mathematically speaking, it compensates for the negative coefficient on
Age2, which lowers earnings as individuals become older.
(e) Students’ answers will differ, but education, ability, regional differences, race, and professional choice are often mentioned.
2) An extension of the Solow growth model that includes human capital in addition to physical capital, suggests that investment in human capital (education) will increase the wealth of a nation (per capita income). To test this hypothesis, you collect data for 104 countries and perform the following regression:
RelPersInc= 0.046 – 5.869 × gpop + 0.738 × SK + 0.055 × Educ, R2=0.775, SER = 0.1377
(0.079) (2.238) (0.294) (0.010)
where RelPersInc is GDP per worker relative to the United States, gpop is the average population growth rate, 1980 to1990, sK is the average investment share of GDP from 1960 to1990, and Educ is the average educational attainment in years for 1985. Numbers in parentheses are for heteroskedasticity-robust standard errors. (a) Interpret the results and indicate whether or not the coefficients are significantly different from zero. Do the coefficients have the expected sign?
(b) To test for equality of the coefficients between the OECD and other countries, you introduce a binary variable (DOECD), which takes on the value of one for the OECD countries and is zero otherwise. To conduct the test for equality of the coefficients, you estimate the following regression:
RelPersInc = -0.068 – 0.063 × gpop + 0.719 × SK + 0.044 × Educ,
(0.072) (2.271) (0.365) (0.012)
0.381 × DOECD – 8.038 × (DOECD × gpop)- 0.430 × (DOECD × SK)
(0.184) (5.366) (0.768)
+0.003 × (DOECD × Educ), R2=0.845, SER = 0.116 (0.018)
Write down the two regression functions, one for the OECD countries, the other for the non -OECD countries. The F- statistic that all coefficients involving DOECD are zero, is 6.76. Find the corresponding critical value from the F table and decide whether or not the coefficients are equal across the two sets of countries.
(c) Given your answer in the previous question, you want to investigate further. You first force the same slopes across all countries, but allow the intercept to differ. That is, you reestimate the above regression but set
DOECD×gpop = DOECD×SK = DOECD×Educ = 0. The t-statistic for DOECD is 4.39. Is the coefficient, which
was 0.241, statistically significant?
(d) Your final regression allows the slopes to differ in addition to the intercept. The F-statistic for
DOECD×gpop = DOECD×SK = DOECD×Educ = 0 is 1.05. What is your decision? Each one of the t-statistics
is also smaller than the critical value from the standard normal table. Which test should you use? (e) Looking at the tests in the two previous questions, what is your conclusion?
Answer: (a) A one percentage point decrease in the population growth rate increases GDP per worker relative to the United States by roughly 0.06. An increase in the investment share of 0.1 results in an increase of GDP per worker relative to the United States by approximately 0.07. For every additional year of average educational attainment, the increase is 0.055. The intercept should not be interpreted. The regression explains 77.5 percent of the variation in relative productivity. All coefficients are significantly different from zero at conventional levels. All coefficients carry the expected sign.
(b) The regression for the non-OECD countries is
RelPerInc = -0.068 – 0.063 × gpop + 0.719 × SK + 0.044 × Educ.
For the OECD countries we get
RelPerInc= 0.313 – 8.101 × gpop + 0.289 × SK + 0.047 × Educ.
The critical value is 3.32 at the 1% level and hence you can reject the null hypothesis that the coefficients are equal.
(c) Answer: Given the critical value, the coefficient is statistically significant, that is, you can reject
DOECD = 0.
(d) Given the critical value of 3.78 at the 1% level, you cannot reject the null hypothesis that the additional coefficients are all zero. The F-test is the proper procedure to use when testing for simultaneous restrictions.
(e) There is evidence that the slopes can be set equal. However, there seems to be a level difference between the two groups of countries.
3) You have been asked by your younger sister to help her with a science fair project. During the previous years she already studied why objects float and there also was the inevitable volcano project. Having learned regression techniques recently, you suggest that she investigate the weight-height relationship of 4th to 6th graders. Her presentation topic will be to explain how people at carnivals predict weight. You collect data for roughly 100 boys and girls between the ages of nine and twelve and estimate for her the following relationship:
Weight= 45.59 + 4.32 × Height4, R2 = 0.55, SER = 15.69
(3.81) (0.46)
where Weight is in pounds, and Height4 is inches above 4 feet. (a) Interpret the results.
(b) You remember from the medical literature that females in the adult population are, on average, shorter than males and weigh less. You also seem to have heard that females, controlling for height, are supposed to weigh less than males. To see if this relationship holds for children, you add a binary variable ( DFY) that takes on the value one for girls and is zero otherwise. You estimate the following regression function:
Weight= 36.27 + 17.33 × DFY + 5.32 × Height4 – 1.83 × (DFY × Height4),
Boys Weight Boys Height Girls Weight Girls Height
9-year-old 60 52 60 49
10-year-old 70 54 70 52
11-year-old 77 56 80 57
12-year-old 87 58.5 92 60
Insert two height/weight measures each for boys and girls and see how accurate your predictions are.
(d) The F-statistic for testing that the intercept and slope for boys and girls are identical is 2.92. Find the critical values at the 5% and 1% level, and make a decision. Allowing for a different intercept with an identical slope results in a t-statistic for DFY of (–0.35). Having identical intercepts but different slopes gives a t-statistic on (DFYHeight4) of (–0.35) also. Does this affect your previous conclusion?
(e) Assume that you also wanted to test if the relationship changes by age. Briefly outline how you would specify the regression including the gender binary variable and an age binary variable ( Older) that takes on a value of one for eleven to twelve year olds and is zero otherwise. Indicate in a table of two rows and two columns how the estimated relationship would vary between younger girls, older girls, younger boys, and older boys.
Answer: (a) For every inch above 4 feet, children of that age group gain roughly 4 pounds. A student who is 4 feet tall, weighs approximately 45.5 pounds. The regression explains 55 percent of the weight variation in children of that age group.
(b) Shorter girls weight more than boys, and taller boys weigh more than girls on average. Given your prior expectations, this is somewhat unexpected. The coefficients involving the binary variable are statistically significant at conventional levels. The regressions for boys is
Weight = 36.27 + 5.32 × Height4.
For girls it is
(d) The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Hence you cannot reject equality of the two coefficients. The previous conclusion is unaffected since the test was for both hypotheses to hold simultaneously. The t-statistics indicate that imposing the equality and testing for either the slope or the intercept to be significantly different between boys and girls, does not result in a different coefficient either.
(e) Weight= 0 + 1DFY + 2Height4 + 3 (DFY × Height4) + 4Older + 5(Older × Height4) + u
Younger Older
Boys ^
0 +^2 Height4 (^0 +^4) + (^2 +^5) Height4 Girls (^
0 +^1) + (^2 +^3) Height4 (^0 +^1 +^4) + (^2 +^3 +^5) Height4
4) You have learned that earnings functions are one of the most investigated relationships in economics. These typically relate the logarithm of earnings to a series of explanatory variables such as education, work experience, gender, race, etc.
(a) Why do you think that researchers have preferred a log-linear specification over a linear specification? In