SIMPLE REGRESSION ANALYSIS
2.6 Interpretation of a Regression Equation
There are two stages in the interpretation of a regression equation. The first is to turn the equation into words so that it can be understood by a noneconometrician. The second is to decide whether this literal interpretation should be taken at face value or whether the relationship should be investigated further.
Both stages are important. We will leave the second until later and concentrate for the time being on the first. It will be illustrated with an earnings function, hourly earnings in 1992, EARNINGS, measured in dollars, being regressed on schooling, S, measured as highest grade completed, for the 570 respondents in EAEF Data Set 21. The Stata output for the regression is shown below. The scatter diagram and regression line are shown in Figure 2.8.
. reg EARNINGS S
Source   |         SS    df          MS     Number of obs =     570
---------+-----------------------------     F(  1,   568) =   65.64
Model    | 3977.38016     1  3977.38016     Prob > F      =  0.0000
Residual | 34419.6569   568  60.5979875     R-squared     =  0.1036
---------+-----------------------------     Adj R-squared =  0.1020
Total    | 38397.0371   569  67.4816117     Root MSE      =  7.7845
------------------------------------------------------------------------
EARNINGS |      Coef.  Std. Err.         t  P>|t|   [95% Conf. Interval]
---------+--------------------------------------------------------------
S        |   1.073055   .1324501     8.102  0.000    .8129028   1.333206
_cons    |  -1.391004   1.820305    -0.764  0.445   -4.966354   2.184347
------------------------------------------------------------------------
For the time being, ignore everything except the column headed “Coef.” in the bottom half of the table. This gives the estimates of the coefficient of S and the constant, and thus the following fitted equation:
$$\widehat{EARNINGS} = -1.39 + 1.07S. \tag{2.43}$$
Interpreting it literally, the slope coefficient indicates that, as S increases by one unit (of S), EARNINGS increases by 1.07 units (of EARNINGS). Since S is measured in years, and EARNINGS is measured in dollars per hour, the coefficient of S implies that hourly earnings increase by $1.07 for every extra year of schooling.
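The literal reading of the slope can be checked numerically. The following is a minimal sketch, assuming the rounded coefficients of equation (2.43); the function name `predicted_earnings` and the two schooling levels compared are illustrative choices, not part of the original text.

```python
# Fitted equation (2.43), using the rounded coefficients reported in the text.
B1 = -1.39  # constant, in dollars per hour
B2 = 1.07   # slope, in dollars per hour per year of schooling


def predicted_earnings(years_of_schooling: float) -> float:
    """Predicted hourly earnings in dollars for a given level of schooling."""
    return B1 + B2 * years_of_schooling


# Two hypothetical individuals who differ by exactly one year of schooling.
twelve_years = predicted_earnings(12)
thirteen_years = predicted_earnings(13)
difference = thirteen_years - twelve_years

print(f"S = 12: ${twelve_years:.2f} per hour")
print(f"S = 13: ${thirteen_years:.2f} per hour")
print(f"Difference: ${difference:.2f} per hour")
```

Whichever pair of adjacent schooling levels is chosen, the difference in predicted earnings equals the slope coefficient, because the fitted relationship is linear.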
Figure 2.8. A simple earnings function
What about the constant term? Strictly speaking, it indicates the predicted level of EARNINGS when S is 0. Sometimes the constant will have a clear meaning, but sometimes not. If the sample values of the explanatory variable are a long way from 0, extrapolating the regression line back to 0 may be dangerous. Even if the regression line gives a good fit for the sample of observations, there is no guarantee that it will continue to do so when extrapolated to the left or to the right.
In this case a literal interpretation of the constant would lead to the nonsensical conclusion that an individual with no schooling would have hourly earnings of –$1.39. In this data set, no individual had less than six years of schooling and only three failed to complete elementary school, so it is not surprising that extrapolation to 0 leads to trouble.
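The danger of extrapolation can be made concrete with a small sketch that flags predictions outside the range of schooling observed in the sample. The lower bound of six years comes from the text; the upper bound of 20 is an assumption read off the horizontal axis of Figure 2.8, and the function name `predict_with_range_check` is hypothetical.

```python
B1, B2 = -1.39, 1.07   # rounded coefficients from equation (2.43)
SAMPLE_MIN_S = 6       # lowest schooling level the text reports for the sample
SAMPLE_MAX_S = 20      # ASSUMPTION: upper end of the schooling axis in Figure 2.8


def predict_with_range_check(s):
    """Return (predicted earnings, True if s lies within the sample range)."""
    prediction = B1 + B2 * s
    in_range = SAMPLE_MIN_S <= s <= SAMPLE_MAX_S
    return prediction, in_range


for s in (0, 6, 12, 20):
    value, in_range = predict_with_range_check(s)
    note = "within sample range" if in_range else "extrapolation"
    print(f"S = {s:2d}: predicted EARNINGS = {value:6.2f}  ({note})")
```

The prediction for S = 0 is the constant itself, and the check marks it as an extrapolation, which is why its negative value should not be read literally.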
Interpretation of a Linear Regression Equation
This is a foolproof way of interpreting the coefficients of a linear regression
$$\hat{Y}_i = b_1 + b_2 X_i$$
when Y and X are variables with straightforward natural units (not logarithms or other functions).
The first step is to say that a one-unit increase in X (measured in units of X) will cause a b2 unit increase in Y (measured in units of Y). The second step is to check to see what the units of X and Y actually are, and to replace the word "unit" with the actual unit of measurement. The third step is to see whether the result could be expressed in a better way, without altering its substance.
The constant, b1, gives the predicted value of Y (in units of Y) for X equal to 0. It may or may not have a plausible meaning, depending on the context.
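The first two steps of this recipe are mechanical enough to sketch in code. The helper below is purely illustrative (the name `interpret_slope` and its parameters are assumptions, not anything defined in the text); the third step, finding a better wording, still needs human judgment.

```python
def interpret_slope(b2, x_unit, y_unit, x_name="X", y_name="Y"):
    """Steps 1 and 2: state the one-unit effect, then substitute the real units."""
    return (
        f"When {x_name} increases by one {x_unit}, "
        f"predicted {y_name} changes by {b2:+g} {y_unit}."
    )


# Applied to the earnings function of this section (rounded slope from 2.43).
sentence = interpret_slope(
    1.07, "year", "dollars per hour", x_name="S", y_name="EARNINGS"
)
print(sentence)
```

A reader would then apply the third step by hand, for example rephrasing the output as "hourly earnings increase by $1.07 for every extra year of schooling".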
[Figure 2.8: scatter diagram with fitted regression line. Horizontal axis: years of schooling (highest grade completed), 0 to 20. Vertical axis: hourly earnings ($), -10 to 80.]
It is important to keep three things in mind when interpreting a regression equation. First, b1 is only an estimate of β1 and b2 is only an estimate of β2, so the interpretation is really only an estimate. Second, the regression equation refers only to the general tendency for the sample. Any individual case will be further affected by the random factor. Third, the interpretation is conditional on the equation being correctly specified.
In fact, this is a naïve specification of an earnings function. We will reconsider it several times in later chapters. You should be undertaking parallel experiments using one of the other EAEF data sets on the website.
Having fitted a regression, it is natural to ask whether we have any means of telling how accurate our estimates are. This very important issue will be discussed in the next chapter.
Exercises
Note: Some of the exercises in this and later chapters require you to fit regressions using one of the EAEF data sets on the website (http://econ.lse.ac.uk/ie/). You will need to download the EAEF regression exercises manual and one of the 20 data sets.
2.1* The table below shows the average rates of growth of GDP, g, and employment, e, for 25 OECD countries for the period 1988–1997. The regression output shows the result of regressing e on g. Provide an interpretation of the coefficients.
Average Rates of Employment Growth and GDP Growth, 1988–1997
Country    Employment   GDP    Country         Employment   GDP
Australia        1.68  3.04    Korea                 2.57  7.73
Austria          0.65  2.55    Luxembourg            3.02  5.64
Belgium          0.34  2.16    Netherlands           1.88  2.86
Canada           1.17  2.03    New Zealand           0.91  2.01
Denmark          0.02  2.02    Norway                0.36  2.98
Finland         -1.06  1.78    Portugal              0.33  2.79
France           0.28  2.08    Spain                 0.89  2.60
Germany          0.08  2.71    Sweden               -0.94  1.17
Greece           0.87  2.08    Switzerland           0.79  1.15
Iceland         -0.13  1.54    Turkey                2.02  4.18
Ireland          2.16  6.40    United Kingdom        0.66  1.97
Italy           -0.30  1.68    United States         1.53  2.46
Japan            1.06  2.81
. reg e g
Source   |         SS    df          MS     Number of obs =      25
---------+-----------------------------     F(  1,    23) =   33.22
Model    | 14.2762167     1  14.2762167     Prob > F      =  0.0000
Residual | 9.88359869    23  .429721682     R-squared     =  0.5909
---------+-----------------------------     Adj R-squared =  0.5731
Total    | 24.1598154    24  1.00665898     Root MSE      =  .65553
------------------------------------------------------------------------
e        |      Coef.  Std. Err.         t  P>|t|   [95% Conf. Interval]
---------+--------------------------------------------------------------
g        |   .4846863   .0840907     5.764  0.000    .3107315   .6586411
_cons    |  -.5208643   .2707298    -1.924  0.067   -1.080912    .039183
------------------------------------------------------------------------
2.2 Calculate by hand a regression of the data for p on the data for e in Exercise 1.3, first using all 12 observations, then excluding the observation for Japan, and provide an economic interpretation. (Note: You do not need to calculate the regression coefficients from scratch, since you have already performed most of the arithmetical calculations in Exercise 1.3.)
2.3 Fit an educational attainment function parallel to that in Exercise 2.1, using your EAEF data set, and give an interpretation of the coefficients.
2.4 Fit an earnings function parallel to that discussed in Section 2.6, using your EAEF data set, and give an interpretation of the coefficients.
2.5* The output below shows the result of regressing the weight of the respondent in 1985, measured in pounds, against his or her height, measured in inches, using EAEF Data Set 21. Provide an interpretation of the coefficients.
. reg WEIGHT85 HEIGHT
Source   |         SS    df          MS     Number of obs =     550
---------+-----------------------------     F(  1,   548) =  343.00
Model    | 245463.095     1  245463.095     Prob > F      =  0.0000
Residual | 392166.897   548  715.633025     R-squared     =  0.3850
---------+-----------------------------     Adj R-squared =  0.3838
Total    | 637629.993   549  1161.43897     Root MSE      =  26.751
------------------------------------------------------------------------
WEIGHT85 |      Coef.  Std. Err.         t  P>|t|   [95% Conf. Interval]
---------+--------------------------------------------------------------
HEIGHT   |   5.399304   .2915345    18.520  0.000    4.826643   5.971966
_cons    |  -210.1883   19.85925   -10.584  0.000   -249.1979  -171.1788
------------------------------------------------------------------------
2.6 Two individuals fit earnings functions relating EARNINGS to S as defined in Section 2.6, using EAEF Data Set 21. The first individual does it correctly and obtains the result found in Section 2.6:
$$\widehat{EARNINGS} = -1.39 + 1.07S$$
The second individual makes a mistake and regresses S on EARNINGS, obtaining the following result:
$$\hat{S} = 12.255 + 0.097\,EARNINGS$$
From this result the second individual derives
$$\widehat{EARNINGS} = -126.95 + 10.36S$$
Explain why this equation is different from that fitted by the first individual.
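As an aside on the arithmetic only (it does not answer the question posed), the second individual's derived equation appears to come from solving the fitted equation for EARNINGS and treating the result as a fitted earnings function. A sketch of that algebra follows, where the symbols c_1 and c_2 for the coefficients of the regression of S on EARNINGS are introduced here for illustration and do not appear in the text:

```latex
\begin{align*}
\hat{S} &= c_1 + c_2\,EARNINGS
  && \text{(second individual's regression)} \\
EARNINGS &= -\frac{c_1}{c_2} + \frac{1}{c_2}\,\hat{S}
  && \text{(solved for } EARNINGS\text{)} \\
-\frac{12.255}{0.097} &\approx -126.3, \qquad \frac{1}{0.097} \approx 10.31
  && \text{(using the rounded coefficients)}
\end{align*}
```

The small gap between these values and the reported -126.95 and 10.36 presumably arises because the exercise used unrounded coefficients; that is an assumption, since the unrounded values are not given.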
2.7* Derive, with a proof, the coefficients that would have been obtained in Exercise 2.5 if weight and height had been measured in metric units. (Note: one pound is 454 grams, and one inch is 2.54 cm.)
2.8* A researcher has data on the aggregate expenditure on services, Y, and aggregate disposable personal income, X, both measured in $ billion at constant prices, for each of the U.S. states and fits the equation
Yi = β1 + β2Xi + ui
The researcher initially fits the equation using OLS regression analysis. However, suspecting that tax evasion causes both Y and X to be substantially underestimated, the researcher adopts two alternative methods of compensating for the under-reporting:
1. The researcher adds $90 billion to the data for Y in each state and $200 billion to the data for X.
2. The researcher increases the figures for both Y and X in each state by 10 percent.
Evaluate the impact of the adjustments on the regression results.
2.9* A researcher has international cross-section data on aggregate wages, W, aggregate profits, P, and aggregate income, Y, for a sample of n countries. By definition,
Yi = Wi + Pi
The regressions
$$\hat{W}_i = a_1 + a_2 Y_i$$
$$\hat{P}_i = b_1 + b_2 Y_i$$
are fitted using OLS regression analysis. Show that the regression coefficients will automatically satisfy the following equations:
a2 + b2 = 1
a1 + b1 = 0
Explain intuitively why this should be so.
2.10* Derive from first principles the least squares estimator of β2 in the model Yi = β2Xi + ui
2.11 Derive from first principles the least squares estimator of β1 in the even more primitive model Yi = β1 + ui
(In other words, Y consists simply of a constant plus a disturbance term. First define RSS and then differentiate).