Estimation: Multiple Regression

Dr. Patrick Toche

References:

† Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western, Cengage Learning, 6th edition (2015), 978-1305270107.

† James H. Stock and Mark W. Watson, Introduction to Econometrics, Addison Wesley Longman, 3rd edition (2015), 9788483229675.

Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton University Press, 1st edition (2009), 978-8942008148.

These notes follow on from the Single Regression notes. This version is incomplete and will be expanded; please check back later for an update.

Learning Objectives

1. Develop a multiple regression model.

2. Interpret the regression coefficients.

3. Calculate a predicted value of the dependent variable using a multiple regression equation.

4. Interpret and report the results of multiple linear regression analysis.

5. Determine which independent variables are most important in predicting a dependent variable.

6. Interpret the role of categorical independent variables.

Multiple Regression

• Gauss-Markov theorem: Under the classical assumptions, the OLS estimator is BLUE (Best Linear Unbiased Estimator): within the class of linear unbiased estimators, it has the smallest variance.

• Classical assumptions:

1. The independent variables are measured without error.

2. The independent variables are linearly independent: No variable can be expressed as a linear combination of the others.

3. The error has zero mean conditional on the explanatory variables.

4. The errors are uncorrelated: If the errors follow a predictable pattern, there is autocorrelation, e.g. serial correlation.

5. The variance of the errors is constant across observations (homoskedasticity): If the variance follows a predictable pattern, there is heteroskedasticity.

• Gauss-Markov does not assume the errors are normal or iid (independent and identically distributed); it only assumes they have zero mean, are uncorrelated, and are homoskedastic.
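As a compact restatement, the assumptions and the theorem can be written in matrix notation. The symbols below (y, X, β, ε, σ², and the alternative estimator β̃) are notation assumed here for illustration and do not appear in the slides.

```latex
% Model and OLS estimator (X is n x (p+1), with full column rank by assumption 2)
y = X\beta + \varepsilon, \qquad
\hat{\beta} = (X^\top X)^{-1} X^\top y

% Assumption 3 (zero conditional mean); assumptions 4 and 5 (uncorrelated, homoskedastic errors)
\mathbb{E}[\varepsilon \mid X] = 0, \qquad
\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n

% Consequences: unbiasedness and the variance of the OLS estimator
\mathbb{E}[\hat{\beta} \mid X] = \beta, \qquad
\operatorname{Var}(\hat{\beta} \mid X) = \sigma^2 (X^\top X)^{-1}

% Gauss-Markov: for any other linear unbiased estimator \tilde{\beta} = C y,
\operatorname{Var}(\tilde{\beta} \mid X) - \operatorname{Var}(\hat{\beta} \mid X)
\ \text{is positive semidefinite.}
```

Normality of ε appears nowhere in these conditions, which is why the theorem holds without it.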

Degrees of Freedom

Estimates of population parameters are based on sample data.

The number of independent pieces of information provided by the sample is called the degrees of freedom.

• In the context of linear regression, the degrees of freedom are df = n − p − 1, where n is the sample size and p + 1 is the number of parameters to be estimated (p slopes and 1 intercept). In a single linear regression (p = 1), there are n − 2 degrees of freedom.

• Approximate intuition: Deviations from the sample mean are biased downwards (small samples often miss outliers). Adjusting n down corrects the bias. The number of parameters to be estimated sets a minimum sample size below which additional sample values have no “freedom” to improve estimates. The more parameters are fitted, the greater the bias to be corrected.
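A minimal sketch of the degrees-of-freedom arithmetic follows. The function names and the toy residuals are hypothetical; the sample size of 83 with 3 slopes or 1 slope matches the house-price regressions reported in these notes.

```python
import math


def residual_df(n_obs: int, n_slopes: int) -> int:
    """Degrees of freedom left after fitting n_slopes slopes and one intercept."""
    return n_obs - n_slopes - 1


def residual_std_error(residuals, n_slopes: int) -> float:
    """Square root of RSS / df: dividing by df rather than n corrects the downward bias."""
    df = residual_df(len(residuals), n_slopes)
    if df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    rss = sum(e * e for e in residuals)
    return math.sqrt(rss / df)


print(residual_df(83, 3))  # multiple regression with 3 slopes: 79
print(residual_df(83, 1))  # single regression: 81

# Toy residuals (made up for illustration) from a fit with one slope.
toy_residuals = [1.0, -2.0, 0.5, 0.5, 1.0, -1.0]
print(residual_std_error(toy_residuals, n_slopes=1))
```

Dividing the residual sum of squares by n instead of df would give a smaller number, which is the downward bias the adjustment removes.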

Multiple Regression: Price and Size of House Sales

Call:
lm(formula = Price ~ Size + Beds + Baths, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-458105 -127923  -67537   92551  607742

Coefficients:
              Estimate Std. Error t value     Pr(>|t|)
(Intercept)  146817.28   85751.71   1.712       0.0908 .
Size            221.38      35.19   6.291 0.0000000164 ***
Beds         -52862.71   35755.95  -1.478       0.1433
Baths         27600.71   46382.98   0.595       0.5535
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 221700 on 79 degrees of freedom
Multiple R-squared:  0.4374, Adjusted R-squared:  0.416
F-statistic: 20.47 on 3 and 79 DF,  p-value: 0.0000000006526
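Several entries of this summary can be recomputed from the others, which helps when reading the output. The sketch below uses only the rounded figures printed above, plus the sample size of 83 implied by 79 residual degrees of freedom and 4 estimated parameters. The variable names are hypothetical, and small discrepancies in the last digit are possible because the inputs are rounded.

```python
# (estimate, standard error) pairs copied from the coefficient table.
coefficients = {
    "(Intercept)": (146817.28, 85751.71),
    "Size": (221.38, 35.19),
    "Beds": (-52862.71, 35755.95),
    "Baths": (27600.71, 46382.98),
}
n_obs, n_slopes = 83, 3
r_squared = 0.4374

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in coefficients.items()}

# Residual degrees of freedom: n - p - 1
df_resid = n_obs - n_slopes - 1

# Adjusted R-squared penalizes R-squared for the number of slopes.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F-statistic for the null that all slopes are zero.
f_statistic = (r_squared / n_slopes) / ((1 - r_squared) / df_resid)

for name, t in t_values.items():
    print(f"{name:12s} t = {t:7.3f}")
print(f"df = {df_resid}, adjusted R2 = {adj_r_squared:.3f}, F = {f_statistic:.2f}")
```

Only Size has a t value large enough in absolute terms to be significant at conventional levels, in line with the significance stars in the table.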

                                      Price
                           (1)                     (2)

Constant               146,817.3*               65,930.3
                       (85,751.7)              (60,993.6)

Size                     221.4***                202.4***
                         (35.2)                  (26.4)

Beds                   −52,862.7
                       (35,755.9)

Baths                   27,600.7
                       (46,383.0)

N                          83                      83
R²                         0.4                     0.4
Adjusted R²                0.4                     0.4
Residual Std. Error   221,665.3 (df = 79)     222,137.3 (df = 81)
F Statistic           20.5*** (df = 3; 79)    58.8*** (df = 1; 81)

Standard errors in parentheses.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.

Multiple Regression: Marginal Residuals

[Figure: Actual Versus Predicted Values. Model: OLS, log(Price) ∼ Size. Horizontal axis: Size (sqft); vertical axis: House sales price ($). Legend: actual (Y), predicted (Yp).]

[Figure: marginal residual plot. Horizontal axis: Bathrooms (count).]

[Figure: marginal residual plot. Horizontal axis: Bedrooms (count).]

[Figure: Quantile-Quantile Plot of the regression residuals in OLS: log(Price) ∼ Size + Beds + Baths. Horizontal axis: normal distribution; vertical axis: standardized residuals.]

Multiple Regression: Confidence & Prediction Intervals

[Figure: Confidence & Prediction Intervals at the 95% confidence level. Model: OLS, log(Price) ∼ Size. Horizontal axis: Size (sqft); vertical axis: House sales price (log $). Legend: confidence, prediction.]
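For a regression with a single regressor, the two kinds of interval differ by one term under the square root. The notation below (x₀ for the point of interest, ŷ₀ for the fitted value there, s for the residual standard error, t for the Student quantile) is assumed here for illustration and is not taken from the slides.

```latex
% 95% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{n-2,\,0.975}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% 95% prediction interval for a new observation at x_0
\hat{y}_0 \pm t_{n-2,\,0.975}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The extra 1 accounts for the error of the new observation itself, so the prediction band is always wider than the confidence band and does not shrink to zero as n grows.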

Estimators

• M-estimators: A broad class of estimators obtained as the extremum of sums of functions of the data. The general principle, suggested by Peter Huber in 1964, is to minimize an objective function ρ for a vector of parameters θ:

$$\min_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta)$$

Closed-form solutions rarely exist, so M-estimators are typically computed by iterative algorithms and can be difficult to compute. Examples include:

Least Squares: minimizes the sum of the squared residuals from the sample data (given a linear model). Admits a closed form.

Maximum Likelihood: chooses the parameter value that maximizes the likelihood L(θ | x₁, ..., x_n), i.e. the probability of the sample data being generated by that parameter value (given the structure of the model). It is often convenient to work with the natural logarithm of the likelihood function (the log-likelihood).
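To illustrate the iterative computation, here is a minimal sketch of an M-estimator of location. It assumes the Huber loss for ρ, a unit scale, and the conventional tuning constant k = 1.345, none of which come from the slides. The function name and the toy data are hypothetical, and the algorithm is plain iterative reweighting rather than any particular library's routine.

```python
import statistics


def huber_location(sample, k=1.345, tol=1e-12, max_iter=200):
    """Location M-estimate under the Huber loss, by iteratively reweighted averaging.

    Residuals within k of the current estimate get weight 1 (squared loss);
    larger residuals get weight k / |residual| (absolute loss), which caps
    the influence of outliers. Scale is assumed to be 1.
    """
    mu = statistics.median(sample)  # robust starting value
    for _ in range(max_iter):
        weights = [1.0 if abs(x - mu) <= k else k / abs(x - mu) for x in sample]
        new_mu = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [1.0, 1.2, 0.9, 1.1, 1.0, 10.0]  # toy sample with one outlier

huber = huber_location(data)
least_squares = huber_location(data, k=float("inf"))  # pure squared loss

print(f"Huber estimate:         {huber:.3f}")
print(f"Least-squares estimate: {least_squares:.3f}")
print(f"Sample mean:            {statistics.mean(data):.3f}")
```

With k set to infinity every weight equals 1 and the iteration collapses to the closed-form least-squares answer, the sample mean. With a finite k the outlier is downweighted and the estimate stays close to the bulk of the data.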

• L-estimators: A class of estimators expressed as linear combinations of order statistics of the measurements. The kth order statistic of a statistical sample is its kth-smallest value.

L-estimators are often simple to compute and robust. Popular L-estimators have a high breakdown point, that is, the fraction of the measurements which can be arbitrarily changed without causing the resulting estimate to tend to infinity. Examples include the median, the quantiles, and the interquartile range. In modern econometrics, L-estimators have largely been replaced by M-estimators.

Example: Least Absolute Deviations (LAD) minimizes the sum of the absolute residuals. Also known as the L₁-estimator, a special case of an L_p-norm estimator.
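The idea of a linear combination of order statistics can be sketched in a few lines. The function names, weight schemes, and toy samples below are hypothetical and chosen only for illustration.

```python
def l_estimate(sample, weights):
    """Weighted sum of the order statistics (the sorted sample values)."""
    if len(sample) != len(weights):
        raise ValueError("need exactly one weight per order statistic")
    order_stats = sorted(sample)
    return sum(w * x for w, x in zip(weights, order_stats))


def median_weights(n):
    """All weight on the middle order statistic (or split over the two middle ones)."""
    weights = [0.0] * n
    if n % 2 == 1:
        weights[n // 2] = 1.0
    else:
        weights[n // 2 - 1] = weights[n // 2] = 0.5
    return weights


def trimmed_mean_weights(n, k):
    """Equal weights after dropping the k smallest and k largest values."""
    if 2 * k >= n:
        raise ValueError("cannot trim the whole sample")
    kept = n - 2 * k
    return [0.0] * k + [1.0 / kept] * kept + [0.0] * k


data = [1.0, 1.2, 0.9, 1.1, 1.0, 10.0]
contaminated = [1.0, 1.2, 0.9, 1.1, 1.0, 1e6]  # same sample, wilder outlier

median_clean = l_estimate(data, median_weights(len(data)))
median_dirty = l_estimate(contaminated, median_weights(len(contaminated)))
trimmed = l_estimate(data, trimmed_mean_weights(len(data), 1))
mean_dirty = l_estimate(contaminated, [1.0 / len(contaminated)] * len(contaminated))

print(median_clean, median_dirty, trimmed, mean_dirty)
```

The median puts zero weight on the extreme order statistics, so replacing 10.0 by 1e6 leaves it unchanged. The sample mean, which is also a linear combination of order statistics but with equal weights, is dragged away by the same change.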

• L_p-norm estimators: Given a vector of parameters θ, minimize:

$$\min_{\theta} \left( \sum_{i=1}^{n} |\rho(x_i, \theta)|^p \right)^{1/p}$$

where in the simplest cases ρ is linear in the regression residuals.

Special cases include L₂, which yields the sample mean and is used in OLS regressions, and L₁, which yields the sample median and is used in LAD regressions.

L_p estimators with 1 < p < 2 have robustness and efficiency properties intermediate between those of the sample mean and the sample median.
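The special cases can be checked numerically in the simplest setting, a location model where ρ(x_i, θ) = x_i − θ. The sketch below is illustrative only: the function name and toy data are hypothetical, and the ternary search is one simple way to minimize a convex function of a single parameter, not a method taken from the slides. The outer power 1/p is dropped because it does not change the minimizer.

```python
import statistics


def lp_location(sample, p, iterations=200):
    """Minimize sum(|x_i - theta|^p) over theta by ternary search (requires p >= 1)."""
    if p < 1:
        raise ValueError("the objective is only convex for p >= 1")

    def objective(theta):
        return sum(abs(x - theta) ** p for x in sample)

    lo, hi = min(sample), max(sample)  # the minimizer lies within the data range
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) <= objective(m2):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2


data = [1.0, 1.2, 0.9, 1.1, 10.0]  # toy sample with one outlier, odd size

l1 = lp_location(data, p=1)
l15 = lp_location(data, p=1.5)
l2 = lp_location(data, p=2)

print(f"p = 1:   {l1:.4f}  (median = {statistics.median(data):.4f})")
print(f"p = 1.5: {l15:.4f}")
print(f"p = 2:   {l2:.4f}  (mean   = {statistics.mean(data):.4f})")
```

On this sample the estimate for p = 1.5 falls between the median and the mean, matching the intermediate robustness described above.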

• Errors-in-variables estimators: Account for measurement errors in the independent variables. Example: Total Least Squares (TLS), a multivariate extension of Deming regression.
