IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

(1)

IAPRI Quantitative Analysis Capacity Building Series

Multiple regression analysis

& interpreting results

(2)

How important is R-squared?

R-squared   Published in Agricultural Economics
0.45        Best article of the year, 2008
???         Best article of the year, 2009
0.21        Best article of the year, 2010

(3)

Session 3 Topics

• Multiple regression analysis
  - What does it mean?
  - Why is it important?
  - How is it done and how are results interpreted?
  - What are the hazards?

(4)

Multiple Regression Analysis

• What does it mean?
  - Multivariate analysis/statistics
  - “Ceteris paribus”
  - “All else equal”
  - “Controlling for”

(5)

Multiple Regression Analysis

• Why does it matter?
  $y = \alpha + \beta_1 x_1 + u$, with $E(u \mid x_1) = E(u) = 0$, implying $\mathrm{Corr}(u, x_1) = 0$
  - What if $u = \beta_2 x_2 + \varepsilon$?
  - If $\mathrm{Corr}(x_1, x_2) \neq 0$, then $\mathrm{Corr}(u, x_1) \neq 0$
  - Results are biased
• If $E(\varepsilon \mid x_1, x_2) = 0$ (and other conditions hold), we can estimate with multiple regressors:
  $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

(6)

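Multiple Regression Analysis

• Consider maize yield (mzyield) and basal fertilizer (basaprate), both kg/ha

$\text{mzyield} = \alpha + \beta_1 \text{basaprate} + u$

. reg mzyield basaprate

      Source |       SS       df       MS          Number of obs =    8648
-------------+------------------------------       F(  1,  8646) = 1526.38
       Model |  2.1590e+09     1  2.1590e+09       Prob > F      =  0.0000
    Residual |  1.2229e+10  8646  1414446.51       R-squared     =  0.1501
-------------+------------------------------       Adj R-squared =  0.1500
       Total |  1.4388e+10  8647  1663962.69       Root MSE      =  1189.3

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   5.254685   .1344979   39.07   0.000     4.991037    5.518333
       _cons |    1335.84   14.57861   91.63   0.000     1307.262    1364.417
------------------------------------------------------------------------------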

(7)

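Multiple Regression Analysis

• Top dressing (topaprate) determines yield and is correlated with basaprate, both kg/ha

$\text{mzyield} = \alpha + \beta_1 \text{basaprate} + \beta_2 \text{topaprate} + \varepsilon$

. reg mzyield basaprate topaprate

      Source |       SS       df       MS          Number of obs =    8647
-------------+------------------------------       F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09       Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34       R-squared     =  0.1628
-------------+------------------------------       Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58       Root MSE      =  1180.5

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747    5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663   11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701   90.14   0.000     1286.336    1343.524
------------------------------------------------------------------------------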

(8)

Multiple Regression Analysis

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$

• $\alpha$ is the intercept
• $\beta_1, \dots, \beta_k$ are slope parameters (usually)

(9)

[Figure: regression line of $y$ on $x$, showing the intercept $\alpha$ and the slope $\beta_1$]

(10)

Multiple Regression Analysis

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$

• $\alpha$ is the intercept
• $\beta_1, \dots, \beta_k$ are slope parameters (usually)
• $u$ is the unobserved error or disturbance term
• $y$ is the dependent, explained, response or predicted variable
• $x_1, \dots, x_k$ are the independent, explanatory, control or predictor variables, or regressors

(11)

How is it done?

• OLS finds the β parameters that minimize:
  $\sum_{i=1}^{n} \left( y_i - \alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2$
• Minimize the “noise”
• Squared, so residuals don’t offset each other
• Gives us $\hat{\beta}$ and predicted values
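As a rough illustration of what OLS does, the sketch below solves the normal equations $(X'X)\hat{\beta} = X'y$, whose solution minimizes the sum of squared residuals above, on simulated data. It is plain Python with no dependencies; the helper names `ols` and `ssr`, the "true" parameter values and the sample size are illustrative assumptions, not anything from the IAPRI data. In practice one would use Stata's `reg`, as in these slides.

```python
import random

def ols(y, X):
    """OLS estimates [alpha, beta_1, ..., beta_k] from the normal equations
    (X'X) b = X'y. X holds one tuple of regressor values per observation;
    a column of ones is prepended for the intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y]
    A = [row + [v] for row, v in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity)")
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def ssr(y, X, b):
    """Sum of squared residuals for coefficients b = [alpha, beta_1, ...]."""
    return sum((yi - b[0] - sum(bj * xj for bj, xj in zip(b[1:], r))) ** 2
               for yi, r in zip(y, X))

random.seed(1)
n = 2000
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
# Hypothetical "true" model: y = 2 + 1.5*x1 - 0.8*x2 + u
y = [2 + 1.5 * x1 - 0.8 * x2 + random.gauss(0, 1) for x1, x2 in X]

b = ols(y, X)
print("alpha, beta_1, beta_2 estimates:", [round(v, 3) for v in b])
# Moving a coefficient away from the OLS solution raises the SSR
print(ssr(y, X, b) < ssr(y, X, [b[0], b[1] + 0.05, b[2]]))
```

The estimates should land close to the assumed values (2, 1.5, -0.8), and any perturbation of them increases the sum of squared residuals, which is exactly the sense in which OLS "minimizes the noise".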

(12)

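Ceteris Paribus Interpretation

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$

• $\hat{\beta}_1$ is the partial effect, or ceteris paribus effect, of $x_1$ on $y$
• Change $x_1$ only: $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$
• Change $x_2$ only: $\Delta\hat{y} = \hat{\beta}_2 \Delta x_2$
• Share of total change attributable to $x_1$: $\hat{\beta}_1 \Delta x_1 / \Delta\hat{y}$, where $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$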
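A worked example of these formulas, plugging in the estimates from the `reg mzyield basaprate topaprate` output. The 50 kg/ha increases in basal and top-dressing rates are hypothetical values chosen only for illustration, and the arithmetic simply applies the estimated coefficients; on its own it does not establish a causal effect.

```python
# Coefficient estimates from ". reg mzyield basaprate topaprate" (kg/ha)
b_basal = 1.897807
b_top = 3.62044

# Hypothetical changes in application rates, kg/ha
d_basal = 50.0
d_top = 50.0

d_yield_basal = b_basal * d_basal              # change x1 only
d_yield_top = b_top * d_top                    # change x2 only
d_yield_total = d_yield_basal + d_yield_top    # change both

share_basal = d_yield_basal / d_yield_total
print(f"basal only: {d_yield_basal:.1f} kg/ha")
print(f"top only:   {d_yield_top:.1f} kg/ha")
print(f"both:       {d_yield_total:.1f} kg/ha")
print(f"share attributable to basal: {share_basal:.1%}")
```

Under these assumed changes the predicted yield rises by about 276 kg/ha, of which roughly 34% is attributable to the basal application. Because the model is linear, the answer does not depend on the starting application rates.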

(13)

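Ceteris Paribus Interpretation

• Now, how do we interpret the coefficient estimate for basaprate?

$\text{mzyield} = \alpha + \beta_1 \text{basaprate} + \beta_2 \text{topaprate} + u$

. reg mzyield basaprate topaprate

      Source |       SS       df       MS          Number of obs =    8647
-------------+------------------------------       F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09       Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34       R-squared     =  0.1628
-------------+------------------------------       Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58       Root MSE      =  1180.5

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747    5.90   0.000     1.267106    2.528508
   topaprate |    3.62044   .3157663   11.47   0.000     3.001463    4.239418
       _cons |    1314.93   14.58701   90.14   0.000     1286.336    1343.524
------------------------------------------------------------------------------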

(14)

Ceteris Paribus Interpretation

• “According to these results, a one unit change in $x_1$ will result in a $\hat{\beta}_1$ unit change in $y$, all else equal.”
• “The ceteris paribus effect of a one unit change in $x_1$ is a $\hat{\beta}_1$ unit change in $y$.”
• “Holding $x_2$ constant, a one unit change in $x_1$ results in a $\hat{\beta}_1$ unit change in $y$.”

(15)

Key Assumptions

• Linear in parameters
• Random sample
• Zero conditional mean
• No perfect collinearity (variation in data)
• Homoskedastic errors


(17)

Perfect Collinearity

• Variable is a linear function of one or more others
• No variation in one variable (collinear with the intercept)

(18)

Can’t estimate the slope parameter if there is no variation in x

Source: Wooldridge (2002)

(19)

Perfect Collinearity

• Variable is a linear function of one or more others
• No variation in one variable (collinear with the intercept)
• Perfect correlation between 2 binary variables
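Under perfect collinearity the matrix $X'X$ in the OLS normal equations is singular (its determinant is zero), so there is no unique set of estimates. The sketch below checks this for the three situations listed above, using tiny made-up columns and exact fractions so that "zero" really means zero; the helper names `xtx` and `det` and all data values are illustrative assumptions. The last case shows that a nonlinear function such as $x_1^2$ does not cause perfect collinearity.

```python
from fractions import Fraction

def xtx(columns):
    """X'X for a design matrix built from an intercept plus the given columns."""
    n = len(columns[0])
    cols = [[1] * n] + [list(c) for c in columns]
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    A = [[Fraction(v) for v in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d

x1 = [1, 2, 3, 4, 5]
linear = [3 + 2 * v for v in x1]     # exact linear function of x1
constant = [5, 5, 5, 5, 5]           # no variation: collinear with the intercept
male = [1, 0, 1, 0, 0]
female = [1 - m for m in male]       # male + female equals the intercept column
squared = [v ** 2 for v in x1]       # nonlinear function of x1

cases = [
    ("x2 linear in x1", [x1, linear]),
    ("x2 constant", [x1, constant]),
    ("male and female dummies", [male, female]),
    ("x2 = x1 squared", [x1, squared]),
]
for label, cols in cases:
    print(f"{label:25s} det(X'X) = {det(xtx(cols))}")
```

The first three cases print a determinant of 0; the squared case does not. Stata handles the singular cases by dropping one of the offending variables and noting that it was omitted because of collinearity.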

(20)

Other hazards

• Multi-collinearity
• Including irrelevant variables
• Omitting relevant variables

(21)

Multi-Collinearity

• Highly correlated variables
• Variable is a nonlinear function of others
• What’s the problem?
• Efficiency losses
• Schmidt thumb rule
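One way to see the efficiency loss: under the key assumptions above (including homoskedasticity), the standard textbook formula for the sampling variance of a slope estimate is $\mathrm{Var}(\hat{\beta}_j) = \sigma^2 / [SST_j (1 - R_j^2)]$, where $SST_j$ is the total sample variation in $x_j$ and $R_j^2$ is the R-squared from regressing $x_j$ on the other regressors. The factor $1/(1 - R_j^2)$ is often called the variance inflation factor. The sketch below assumes exactly two regressors, so $R_1^2$ is just the squared correlation between $x_1$ and $x_2$; the function name is illustrative, and it does not implement the Schmidt thumb rule mentioned on the slide.

```python
def variance_inflation(corr):
    """1 / (1 - R_1^2) for two regressors, where R_1^2 = corr(x1, x2)**2.
    Holding sigma^2 and SST_1 fixed, Var(beta_1 hat) scales by this factor."""
    return 1.0 / (1.0 - corr ** 2)

for corr in (0.0, 0.5, 0.9, 0.99):
    print(f"corr(x1, x2) = {corr:4.2f}: Var(beta_1 hat) is "
          f"{variance_inflation(corr):6.2f} times its value with corr = 0")
```

Moderate correlation costs little, but as the correlation approaches 1 the variance grows without bound, which is why highly correlated regressors produce imprecise (though still unbiased) estimates.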

(22)

Including Irrelevant Variables

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

• Suppose $x_3$ has no effect on $y$ ($\beta_3 = 0$), but the key assumptions are satisfied (overspecified)
• OLS is an unbiased estimator of $\beta_3$, even if $\beta_3$ is zero
• Estimates of $\beta_1$ and $\beta_2$ will be less efficient (when $x_3$ is correlated with $x_1$ or $x_2$)
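A small Monte Carlo sketch can illustrate both claims. The design is an assumption made for illustration: $x_3$ has no effect on $y$ ($\beta_3 = 0$) but is strongly correlated with $x_1$, and the true $\beta_1$ is 2. Across repeated samples, the overspecified estimates of $\beta_3$ should average close to zero and those of $\beta_1$ close to 2, while the spread of $\hat{\beta}_1$ is noticeably larger than in the correctly specified model. The plain-Python `ols` helper is a hypothetical stand-in for Stata's `reg`.

```python
import random
import statistics

def ols(y, X):
    """OLS coefficients [alpha, beta_1, ...] via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [u - f * v for u, v in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(7)
n, reps = 200, 300
b1_right, b1_over, b3_over = [], [], []
for _ in range(reps):
    x1 = [random.gauss(0, 1) for _ in range(n)]
    x2 = [random.gauss(0, 1) for _ in range(n)]
    x3 = [v + 0.3 * random.gauss(0, 1) for v in x1]  # irrelevant, but correlated with x1
    y = [1 + 2 * a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]  # beta_3 = 0
    right = ols(y, list(zip(x1, x2)))       # correctly specified
    over = ols(y, list(zip(x1, x2, x3)))    # overspecified
    b1_right.append(right[1])
    b1_over.append(over[1])
    b3_over.append(over[3])

print("mean of beta_3 hat (true value 0):", round(statistics.mean(b3_over), 3))
print("mean of beta_1 hat, correct / over:",
      round(statistics.mean(b1_right), 3), round(statistics.mean(b1_over), 3))
print("sd of beta_1 hat, correct / over:  ",
      round(statistics.stdev(b1_right), 3), round(statistics.stdev(b1_over), 3))
```

Both models center on the true $\beta_1$ (no bias), but the overspecified one is less precise, which is the efficiency cost of including an irrelevant but correlated variable.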

(23)

Omitting Relevant Variables

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$

• Suppose we omit $x_2$ (underspecifying)
• OLS is generally biased

(24)

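Omitting Relevant Variables

• Estimate $\tilde{y} = \tilde{\alpha} + \tilde{\beta}_1 x_1$ when the true model is $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + u$
• And let $\tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1$ be the regression of $x_2$ on $x_1$
• It can be shown that: $E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$
• Omitted Variable Bias: the term $\beta_2 \tilde{\delta}_1$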
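A small simulation can make the omitted variable bias formula concrete. In any sample, OLS algebra gives the exact identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\hat{\beta}_1$ and $\hat{\beta}_2$ come from the regression that includes $x_2$; taking expectations leads to the result above. The parameter values ($\beta_1 = 2$, $\beta_2 = 3$, $x_2 = 0.5 x_1 +$ noise) are hypothetical; with $\beta_2 > 0$ and $\mathrm{Corr}(x_1, x_2) > 0$ the short regression should be biased upward, toward $2 + 3 \times 0.5 = 3.5$. The helper `centered_cross` is an illustrative name.

```python
import random

def centered_cross(u, v):
    """Sum of (u_i - mean of u) * (v_i - mean of v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

random.seed(3)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]                       # Corr(x1, x2) > 0
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]  # beta_2 > 0

s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)

# Long regression of y on x1 and x2 (slopes from the centered normal equations)
D = s11 * s22 - s12 ** 2
b1_hat = (s22 * s1y - s12 * s2y) / D
b2_hat = (s11 * s2y - s12 * s1y) / D

# Short regression of y on x1 only, and auxiliary regression of x2 on x1
b1_tilde = s1y / s11
d1_tilde = s12 / s11

print("short-regression slope b1_tilde:", round(b1_tilde, 3))
print("b1_hat + b2_hat * d1_tilde:     ", round(b1_hat + b2_hat * d1_tilde, 3))
print("long-regression slope b1_hat:   ", round(b1_hat, 3))
```

The first two lines agree to floating-point precision, the short-regression slope lands near 3.5, and the long regression recovers a slope near the assumed true value of 2.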

(25)

Multiple Regression Analysis

Direction of the bias in $\tilde{\beta}_1$ when $x_2$ is omitted:

                 Corr(x1, x2) > 0    Corr(x1, x2) < 0
$\beta_2 > 0$    Positive bias       Negative bias
$\beta_2 < 0$    Negative bias       Positive bias

Source: Wooldridge, 2002, page 92

(26)

Omitting Relevant Variables

• More generally, all OLS estimates will usually be biased, even if just one explanatory variable is correlated with the omitted variable(s)
• The direction of the bias is less clear

(27)

Multiple Regression Analysis

• Goodness of fit
  - $R^2$ is the share of the sample variation in $y$ explained by the regressors
  - $R^2$ never decreases when we add variables
  - Usually, it will increase regardless of relevance
• “Adjusted $R^2$” accounts for this by penalizing additional regressors
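The goodness-of-fit figures in the Stata output can be reproduced from its ANOVA table: $R^2$ = Model SS / Total SS, and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $k$ is the number of regressors (not counting the intercept). The inputs below are copied from the `reg mzyield basaprate topaprate` output; because Stata displays the sums of squares rounded, the match is only to about four decimal places. The function names are illustrative.

```python
def r_squared(ss_model, ss_total):
    """Share of total sample variation explained by the model."""
    return ss_model / ss_total

def adjusted_r_squared(r2, n, k):
    """Penalizes R^2 for the number of regressors k (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from ". reg mzyield basaprate topaprate"
ss_model, ss_total = 2.3418e9, 1.4387e10
n, k = 8647, 2

r2 = r_squared(ss_model, ss_total)
adj = adjusted_r_squared(r2, n, k)
print(f"R-squared     = {r2:.4f}")   # Stata reports 0.1628
print(f"Adj R-squared = {adj:.4f}")  # Stata reports 0.1626
```

With $n$ in the thousands and only two regressors the penalty is tiny; the adjustment matters more in small samples or when many regressors are added.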

(28)

Next time: Interpreting results

• Binary regressors
• Other categorical regressors
• Categorical regressors as a series of binary regressors
• Quadratic terms
• Other interactions
• Average Partial Effects

(29)

Session materials developed by Bill Burke with input from Nicole Mason. January 2012.

