IAPRI Quantitative Analysis Capacity Building Series
Multiple regression analysis
& interpreting results
How important is R-squared?
R-squared   Published in Agricultural Economics
0.45        Best article of the year, 2008
???         Best article of the year, 2009
0.21        Best article of the year, 2010
Session 3 Topics
• Multiple regression analysis
  ◦ What does it mean?
  ◦ Why is it important?
  ◦ How is it done and how are results interpreted?
  ◦ What are the hazards?
Multiple Regression Analysis
• What does it mean?
  ◦ Multivariate analysis/statistics
  ◦ “Ceteris paribus”
  ◦ “All else equal”
  ◦ “Controlling for”
Multiple Regression Analysis
• Why does it matter?
  $y = \alpha + \beta_1 x_1 + u$, implying $E(u \mid x_1) = E(u) = 0$ and $\mathrm{Corr}(u, x_1) = 0$
  ◦ What if $u = \beta_2 x_2 + \varepsilon$?
  ◦ If $\mathrm{Corr}(x_1, x_2) \neq 0$, then $\mathrm{Corr}(u, x_1) \neq 0$
  ◦ Results are biased
• If $E(\varepsilon \mid x_1, x_2) = 0$ (and other conditions), we can estimate with multiple regressors:
  $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
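The bias described above can be illustrated with a small simulation. This is only a sketch: the data-generating values (`alpha`, `beta1`, `beta2`, the 0.5 link between `x1` and `x2`, the sample size) are assumptions chosen for illustration, and the helper functions `ols_simple` and `ols_two` are hypothetical names, not part of any package.

```python
import random

def ols_simple(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def ols_two(x1, x2, y):
    """Intercept and slopes from regressing y on x1 and x2 (normal equations on demeaned data)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Assumed (hypothetical) data-generating process
rng = random.Random(0)
alpha, beta1, beta2 = 1.0, 2.0, 3.0
n = 20000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]   # Corr(x1, x2) != 0 by construction
y = [alpha + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

_, b1_short = ols_simple(x1, y)                # x2 omitted: it sits in the error term
_, b1_long, b2_long = ols_two(x1, x2, y)       # x2 included

print(f"omitting x2:  slope on x1 = {b1_short:.2f}")
print(f"including x2: slope on x1 = {b1_long:.2f}, slope on x2 = {b2_long:.2f}")
```

Under these assumptions the short regression should land near 2 + 3 × 0.5 = 3.5 rather than the true 2, while the regression with both regressors should recover values close to 2 and 3.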
Multiple Regression Analysis
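• Consider maize yield (mzyield) and basal fertilizer (basaprate), both in kg/ha
  $mzyield = \alpha + \beta_1\, basaprate + u$

. reg mzyield basaprate

      Source |       SS       df       MS              Number of obs =    8648
-------------+------------------------------           F(  1,  8646) = 1526.38
       Model |  2.1590e+09     1  2.1590e+09           Prob > F      =  0.0000
    Residual |  1.2229e+10  8646  1414446.51           R-squared     =  0.1501
-------------+------------------------------           Adj R-squared =  0.1500
       Total |  1.4388e+10  8647  1663962.69           Root MSE      =  1189.3

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   5.254685   .1344979    39.07   0.000     4.991037    5.518333
       _cons |    1335.84   14.57861    91.63   0.000     1307.262    1364.417
------------------------------------------------------------------------------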
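To see how the summary statistics in this output relate to one another, the sketch below recomputes them from the reported sums of squares, coefficient and standard error. The numbers are copied from the output; the variable names are illustrative, and small differences from the printed values are expected because the inputs are rounded.

```python
# Values copied from the Stata output for `reg mzyield basaprate`
model_ss = 2.1590e9      # Model sum of squares
total_ss = 1.4388e10     # Total sum of squares
n_obs = 8648             # Number of obs
k = 1                    # number of regressors (basaprate only)
coef, std_err = 5.254685, 0.1344979

# R-squared: share of the variation in mzyield explained by the model
r_squared = model_ss / total_ss

# Adjusted R-squared penalizes for the number of regressors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

# t statistic for the basaprate coefficient
t_stat = coef / std_err

# With a single regressor, the F statistic equals the squared t statistic
f_stat = t_stat ** 2

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"t             = {t_stat:.2f}")
print(f"F             = {f_stat:.2f}")
```

The coefficient is estimated very precisely (a large t statistic) even though the model explains only about 15% of the variation in yield, which bears on the earlier question of how important R-squared is.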
Multiple Regression Analysis
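• Top dressing (topaprate) also determines yield and is correlated with basaprate; both are in kg/ha
  $mzyield = \alpha + \beta_1\, basaprate + \beta_2\, topaprate + \varepsilon$

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524
------------------------------------------------------------------------------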
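The fall in the basaprate coefficient from about 5.25 to about 1.90 once topaprate is added is what the omitted-variable logic would predict. As a rough sketch: the auxiliary slope $\tilde{\delta}_1$ is a symbol introduced here and is not reported in either output, and the two regressions differ by one observation, so the relation holds only approximately.

```latex
% \tilde{\beta}_1              : basaprate slope when topaprate is omitted (5.254685)
% \hat{\beta}_1, \hat{\beta}_2 : slopes when both regressors are included
% \tilde{\delta}_1             : slope from regressing topaprate on basaprate
%                                (assumed here, implied rather than reported)
\[
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\,\tilde{\delta}_1
\quad\Longrightarrow\quad
\tilde{\delta}_1 \approx \frac{5.254685 - 1.897807}{3.62044} \approx 0.93
\]
```

Read this way, the estimates would be consistent with each additional kg/ha of basal fertilizer going together with roughly 0.93 kg/ha more top dressing, so the simple regression attributed part of the top dressing effect to basaprate.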
Multiple Regression Analysis
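$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
• $\alpha$ is the intercept
• $\beta_1, \dots, \beta_k$ are slope parameters (usually)
[Figure: line plot of y against x, marking the intercept $\alpha$ and the slope $\beta_1$]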
Multiple Regression Analysis
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
• $\alpha$ is the intercept
• $\beta_1, \dots, \beta_k$ are slope parameters (usually)
• $u$ is the unobserved error or disturbance term
• $y$ is the dependent, explained, response or predicted variable
• $x_1, \dots, x_k$ are the independent, explanatory, control or predictor variables, or regressors
How is it done?
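• OLS finds the $\alpha$ and $\beta$ parameters that minimize the sum of squared residuals:
  $\sum_{i=1}^{n} \left( y_i - \alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2$
• Minimize the “noise”
• Squared, so positive and negative residuals don’t offset each other
• Gives us the estimates $\hat{\beta}$ and the predicted values $\hat{y}$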
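The minimization can be sketched for the one-regressor case (k = 1), where the solution has a simple closed form. The five data points below are made up for illustration, and `ssr` is a hypothetical helper name; this is not the routine any particular package uses internally.

```python
def ssr(alpha, beta, x, y):
    """Sum of squared residuals for the candidate line y = alpha + beta * x."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))

# Made-up data for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS solution with one regressor
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar

# Predicted values and residuals
y_hat = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Nudging either parameter away from the OLS estimates raises the SSR
best = ssr(alpha_hat, beta_hat, x, y)
for d_alpha, d_beta in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert ssr(alpha_hat + d_alpha, beta_hat + d_beta, x, y) > best

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}, SSR = {best:.3f}")
```

The residuals from the fitted line sum to (numerically) zero, which is one reason raw residuals cannot serve as the criterion and squared residuals are used instead.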
Ceteris Paribus Interpretation
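$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
• $\beta_j$ is the partial, or ceteris paribus, effect of $x_j$ on $y$
• Change $x_1$ only: $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$
• Change $x_2$ only: $\Delta\hat{y} = \hat{\beta}_2 \Delta x_2$
• Share of total change attributable to $x_1$: $\hat{\beta}_1 \Delta x_1 / \Delta\hat{y}$, where $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$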
Ceteris Paribus Interpretation
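• Now, how do we interpret the coefficient estimate for basaprate?

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701    90.14   0.000     1286.336    1343.524
------------------------------------------------------------------------------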
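One way to read these estimates is to plug them into the ceteris paribus formulas. In the sketch below the coefficients are copied from the output, while the fertilizer changes `d_basal` and `d_top` are hypothetical values chosen only for illustration.

```python
# Coefficient estimates copied from `reg mzyield basaprate topaprate`
beta_basal = 1.897807   # kg/ha of maize per kg/ha of basal fertilizer
beta_top = 3.62044      # kg/ha of maize per kg/ha of top dressing

# Hypothetical changes in application rates (assumptions, kg/ha)
d_basal = 100.0
d_top = 50.0

# Change basaprate only, holding topaprate fixed
effect_basal = beta_basal * d_basal

# Change topaprate only, holding basaprate fixed
effect_top = beta_top * d_top

# Both change: predicted yield changes add up
total_change = effect_basal + effect_top

# Share of the predicted change attributable to basal fertilizer
share_basal = effect_basal / total_change

print(f"basal only: {effect_basal:.1f} kg/ha")
print(f"top only:   {effect_top:.1f} kg/ha")
print(f"both:       {total_change:.1f} kg/ha (basal share {share_basal:.1%})")
```

Per unit, the basaprate estimate says that one more kg/ha of basal fertilizer is associated with about 1.9 kg/ha more maize, holding top dressing constant.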