Estimation: Multiple Regression


Dr. Patrick Toche

References:

† Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, South-Western, Cengage Learning, 6th edition (2015), 978-1305270107.

† James H. Stock and Mark W. Watson, Introduction to Econometrics, Addison Wesley Longman, 3rd edition (2015), 9788483229675.

Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton University Press, 1st edition (2009), 978-8942008148.

These notes are a follow-up on Single Regression. This version is incomplete and will be expanded. Please check back later for an update.

Learning Objectives

1. Develop a multiple regression model.

2. Interpret the regression coefficients.

3. Calculate a predicted value of the dependent variable using a multiple regression equation.

4. Interpret and report the results of multiple linear regression analysis.

5. Determine which independent variables are most important in predicting a dependent variable.

6. Interpret the role of categorical independent variables.


Multiple Regression

• The term “regression” goes back to Francis Galton, who described “regression towards mediocrity” (now called regression to the mean) in a study of the heights of parents and their children.

• Multiple regression typically estimates the conditional expectation of the dependent variable given the independent variables, i.e. the arithmetic mean of the dependent variable when the independent variables are held fixed.

• Multiple quantile regression estimates conditional quantiles (for example, the conditional median) of the dependent variable given the independent variables.

• In both cases a function of the independent variables, called the regression function, is estimated.

• The performance of regression analysis depends on the form of the data-generating process and on the validity of the assumptions made about this process. If the assumptions are incorrect, regression results can be misleading, especially if the estimated effects are small and the sample size is small.

Multiple Regression

• Gauss-Markov theorem: Under the classical assumptions, the OLS estimator is BLUE, the Best Linear Unbiased Estimator: among all linear unbiased estimators, it has the smallest variance.

• Classical assumptions:

1. The independent variables are measured without error.

2. The independent variables are linearly independent: No variable can be expressed as a linear combination of the others (no perfect multicollinearity).

3. The error has zero mean conditional on the explanatory variables.

4. The errors are uncorrelated: If the errors are correlated across observations, there is autocorrelation, e.g. serial correlation in time series.

5. The variance of the errors is constant across observations (homoskedasticity): If the variance follows a predictable pattern, there is heteroskedasticity.

• The Gauss-Markov theorem does not require the errors to be normal or iid (independent and identically distributed); it only requires that they have mean zero, are uncorrelated and are homoskedastic.
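The point that unbiasedness does not rest on normality can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: the true intercept and slope, the sample size, the fixed grid of regressor values and the uniform error distribution are all assumptions chosen for this example and do not come from these notes.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


rng = random.Random(0)                   # fixed seed for reproducibility
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0    # assumed "population" values
x = [i / 10 for i in range(50)]          # regressor held fixed across samples

slopes = []
for _ in range(2000):
    # Uniform errors: mean zero, constant variance, uncorrelated, not normal.
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + rng.uniform(-1.0, 1.0) for xi in x]
    slopes.append(ols_simple(x, y)[1])

mean_slope = sum(slopes) / len(slopes)
print(f"true slope: {TRUE_SLOPE}, average OLS slope: {mean_slope:.3f}")
```

Averaged over many samples, the estimated slope should sit close to the assumed true slope even though the errors are uniform rather than normal.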


Multiple Regression

• Depending on context, the “dependent variable” is sometimes called “explained variable”, “predicted variable”, “outcome variable”, “regressand”, “response variable”.

• Depending on context, an “independent variable” is sometimes called “explanatory variable”, “predictor variable”, “covariate”, “regressor”, “control variable”.

• An independent variable that may affect the dependent or independent variables, but is not the focus of the study, is usually referred to as a control variable. Including control variables may help improve prediction and goodness of fit. Omitting a control variable that affects the dependent variable and has non-zero covariance with one or more of the included independent variables creates an “omitted variable bias.”
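Omitted variable bias can be made concrete with simulated data. The following is a minimal sketch under assumed values: the coefficients (1, 2, 3), the dependence of the control x2 on x1 (0.5) and the sample size are hypothetical choices for illustration. With these values, the regression that omits x2 should give a slope on x1 near 2 + 3 × 0.5 = 3.5 instead of 2.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the variance."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)


rng = random.Random(1)
n = 5000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Control variable correlated with x1 (assumed: x2 = 0.5 * x1 + noise).
x2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x1]
# Assumed data-generating process: y = 1 + 2*x1 + 3*x2 + error.
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

# "Short" regression of y on x1 only: the control x2 is omitted.
slope_short = cov(x1, y) / cov(x1, x1)

# "Long" regression of y on x1 and x2: solve the two normal equations
# (in deviations from means) by Cramer's rule.
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 ** 2
slope_long_x1 = (s1y * s22 - s2y * s12) / det
slope_long_x2 = (s2y * s11 - s1y * s12) / det

print(f"slope on x1, x2 omitted:  {slope_short:.2f}")
print(f"slope on x1, x2 included: {slope_long_x1:.2f}")
print(f"slope on x2:              {slope_long_x2:.2f}")
```

The gap between the two slopes on x1 equals the effect of the omitted variable on y multiplied by the slope from regressing the omitted variable on x1.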

Degrees of Freedom

Estimates of population parameters are based on sample data.

The number of independent pieces of information provided by the sample is called the degrees of freedom.

• In the context of linear regression, the degrees of freedom are df = n − p − 1, where n is the sample size and p + 1 the number of parameters to be estimated (p slopes and 1 intercept). In a simple linear regression (p = 1), there are n − 2 degrees of freedom.

• Approximate intuition: Squared deviations from the sample mean are biased downwards as an estimate of the population variance, because the sample mean is itself fitted to the data. Dividing by n − 1 instead of n corrects the bias. Each parameter to be estimated uses up one piece of information, so the number of parameters sets a minimum sample size below which additional sample values have no “freedom” to improve the estimates. The more parameters are fitted, the greater the bias to be corrected.
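Both points can be illustrated numerically. The sketch below first applies df = n − p − 1 to a sample of n = 83 with three slopes and with one slope, then simulates the downward bias of the variance estimate that divides by n. The standard normal population, the sample size of 5 and the number of replications are assumptions made for this example.

```python
import random


def residual_df(n, p):
    """Degrees of freedom n - p - 1 for a regression with p slopes and an intercept."""
    return n - p - 1


print(residual_df(83, 3))  # three slopes and an intercept
print(residual_df(83, 1))  # one slope and an intercept

# Simulation: average variance estimate over many small samples drawn from
# a population whose true variance is 1.
rng = random.Random(2)
n, reps = 5, 20000
avg_div_n = 0.0
avg_div_n_minus_1 = 0.0
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((v - m) ** 2 for v in sample)   # sum of squared deviations
    avg_div_n += ss / n / reps               # biased: divides by n
    avg_div_n_minus_1 += ss / (n - 1) / reps # corrected: divides by n - 1

print(f"dividing by n:     {avg_div_n:.3f}")
print(f"dividing by n - 1: {avg_div_n_minus_1:.3f}")
```

With n = 5 the uncorrected average should be near (n − 1)/n = 0.8, while the corrected average should be near the true variance of 1.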


Multiple Regression: Price and Size of House Sales

Dependent variable: Price

                              (1)                     (2)
Constant               146,817.3*               65,930.3
                       (85,751.7)              (60,993.6)
Size                       221.4***                202.4***
                          (35.2)                  (26.4)
Beds                   −52,862.7
                       (35,755.9)
Baths                   27,600.7
                       (46,383.0)
N                             83                      83
R²                           0.4                     0.4
Adjusted R²                  0.4                     0.4
Residual Std. Error    221,665.3 (df = 79)     222,137.3 (df = 81)
F Statistic                 20.5*** (df = 3; 79)    58.8*** (df = 1; 81)

Standard errors in parentheses.

*** Significant at the 1 percent level.

** Significant at the 5 percent level.

* Significant at the 10 percent level.
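The estimates in the table can be used to compute a predicted price and to check which coefficients are large relative to their standard errors. The sketch below copies the coefficients and standard errors from the table; the example house (2,000 sqft, 3 bedrooms, 2 bathrooms) and the helper names `predict` and `t_statistics` are hypothetical and introduced only for illustration.

```python
# (coefficient, standard error) pairs copied from the regression table.
MODEL_1 = {
    "Constant": (146817.3, 85751.7),
    "Size": (221.4, 35.2),
    "Beds": (-52862.7, 35755.9),
    "Baths": (27600.7, 46383.0),
}
MODEL_2 = {
    "Constant": (65930.3, 60993.6),
    "Size": (202.4, 26.4),
}


def predict(model, **regressors):
    """Predicted price: intercept plus coefficient times value for each regressor."""
    price = model["Constant"][0]
    for name, value in regressors.items():
        price += model[name][0] * value
    return price


def t_statistics(model):
    """Ratio of each coefficient to its standard error."""
    return {name: coef / se for name, (coef, se) in model.items()}


# Hypothetical house, not taken from the data set.
house = {"Size": 2000, "Beds": 3, "Baths": 2}
price_1 = predict(MODEL_1, **house)
price_2 = predict(MODEL_2, Size=house["Size"])

print(f"model (1) prediction: {price_1:,.1f}")
print(f"model (2) prediction: {price_2:,.1f}")
for name, t in t_statistics(MODEL_1).items():
    print(f"{name:>8}: t = {t:.2f}")
```

In model (1), only Size has a t-statistic well above 2 in absolute value, which matches the significance stars in the table: holding bedrooms and bathrooms fixed, one more unit of Size is associated with a price that is higher by about 221.4.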

Multiple Regression: Marginal Residuals

[Figure, three panels: actual versus predicted values (Y, Yp) of the house sales price, on a log scale from 12 to 14, plotted against Size (sqft), Bathrooms (count) and Bedrooms (count). Model label on the first panel: OLS: log(Price) ∼ Size.]

Multiple Regression: Total Residuals

[Figure: Quantile-quantile plot of the standardized residuals against the normal distribution. Regression residuals in OLS: log(Price) ∼ Size + Beds + Baths.]

Multiple Regression: Confidence & Prediction Intervals

[Figure: Confidence and prediction intervals at the 95% confidence level for OLS: log(Price) ∼ Size. Horizontal axis: Size (sqft), 1,000 to 4,000. Vertical axis: house sales price (log $), 100,000 to 3,000,000.]


Estimators

• M-estimators: A broad class of estimators obtained as the extremum of sums of functions of the data. The general principle, suggested by Peter Huber in 1964, is to minimize an objective function ρ for a vector of parameters θ:

\[ \min_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta) \]

Closed-form solutions rarely exist, so M-estimators are typically computed by iterative algorithms and can be difficult to compute. Examples include:

Least Squares minimizes the sum of the squared residuals from the sample data (given a linear model). It admits a closed-form solution.

Maximum Likelihood chooses the parameter value under which the observed sample data are most probable (given the structure of the model), that is, it maximizes the likelihood function L(θ | x₁, …, x_n). It is often convenient to work with the natural logarithm of the likelihood function (the log-likelihood).
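As an illustration of an M-estimator without a closed form, the sketch below estimates a location parameter with Huber's ρ function, which is quadratic for small residuals and linear for large ones, using iteratively reweighted averages. Several choices here are assumptions made for the example and not part of these notes: the tuning constant k = 1.345, a scale fixed at 1, the median as starting value and the toy data.

```python
import statistics


def huber_location(data, k=1.345, tol=1e-10, max_iter=1000):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Observations within k of the current estimate get weight 1; observations
    further away get weight k / |residual|, which limits their influence.
    The scale is assumed to be 1 (no scale estimate is computed).
    """
    theta = statistics.median(data)  # starting value
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - theta)
            weights.append(1.0 if r <= k else k / r)
        new_theta = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Toy data with one gross outlier (hypothetical values).
data = [2.1, 2.5, 2.9, 3.0, 3.4, 50.0]
huber = huber_location(data)

print(f"mean:   {statistics.mean(data):.3f}")
print(f"median: {statistics.median(data):.3f}")
print(f"Huber:  {huber:.3f}")
```

The sample mean, which is the least-squares M-estimate of location, is pulled far towards the outlier, whereas the Huber estimate stays close to the bulk of the data.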

Estimators

• L-estimators: A class of estimators expressed as a linear combination of order statistics of the measurements. The kth order statistic of a statistical sample is its kth-smallest value.

L-estimators are often simple to compute and robust. Popular L-estimators such as the median have a high breakdown point, that is, the largest fraction of the measurements that can be changed arbitrarily without causing the estimate to become arbitrarily large. Examples include the median, the quantiles and the interquartile range. In modern econometrics, L-estimators have largely been superseded by M-estimators.

Related example: Least Absolute Deviations (LAD) minimizes the sum of the absolute residuals; with an intercept only, its solution is the sample median. It is also known as the L₁-norm estimator, a special case of an L_p-norm estimator.
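The definition in terms of order statistics and the robustness claim can both be checked in a few lines. This is a minimal sketch: the helper names and the toy data are hypothetical, and the "corrupted" sample simply replaces the largest value with a gross outlier.

```python
import statistics


def order_statistic(sample, k):
    """Return the k-th smallest value of the sample (k = 1 is the minimum)."""
    return sorted(sample)[k - 1]


def median_as_l_estimator(sample):
    """Median written as a linear combination of order statistics."""
    n = len(sample)
    if n % 2 == 1:
        # Weight 1 on the middle order statistic.
        return order_statistic(sample, (n + 1) // 2)
    # Weight 1/2 on each of the two middle order statistics.
    return 0.5 * order_statistic(sample, n // 2) + 0.5 * order_statistic(sample, n // 2 + 1)


clean = [2.1, 2.5, 2.9, 3.0, 3.4, 3.8, 4.0]
corrupted = clean[:-1] + [4000.0]  # one measurement changed arbitrarily

median_clean = median_as_l_estimator(clean)
median_corrupted = median_as_l_estimator(corrupted)

print(f"median: clean {median_clean}, corrupted {median_corrupted}")
print(f"mean:   clean {statistics.mean(clean):.2f}, corrupted {statistics.mean(corrupted):.2f}")
```

Changing a single observation leaves the median untouched but moves the mean by a large amount, which is what a high breakdown point means in practice.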


Estimators

• L_p-norm estimators: Given a vector of parameters θ, minimize:

\[ \min_{\theta} \left( \sum_{i=1}^{n} \left| \rho(x_i, \theta) \right|^{p} \right)^{1/p} \]

where in the simplest cases ρ is linear in the regression residuals.

Special cases include p = 2, which yields the sample mean and underlies OLS regression, and p = 1, which yields the sample median and underlies LAD regression.

L_p estimators with 1 < p < 2 have robustness and efficiency properties intermediate between those of the sample mean and the sample median.
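These special cases can be verified numerically for a location parameter, taking ρ(x_i, θ) = x_i − θ. The sketch below minimizes the L_p objective by ternary search, which is adequate here because the objective is convex in θ for p ≥ 1. The search routine, the toy data and the choice p = 1.5 for the intermediate case are assumptions made for illustration.

```python
def lp_objective(theta, data, p):
    """L_p norm of the residuals x_i - theta."""
    return sum(abs(x - theta) ** p for x in data) ** (1.0 / p)


def lp_location(data, p, iterations=200):
    """Minimize the L_p objective over theta by ternary search on [min, max]."""
    lo, hi = min(data), max(data)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if lp_objective(m1, data, p) < lp_objective(m2, data, p):
            hi = m2   # the minimum lies to the left of m2
        else:
            lo = m1   # the minimum lies to the right of m1
    return (lo + hi) / 2


# Toy data (hypothetical): sample mean is 4, sample median is 3.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
est_p1 = lp_location(data, 1.0)
est_p15 = lp_location(data, 1.5)
est_p2 = lp_location(data, 2.0)

print(f"p = 1:   {est_p1:.4f}")
print(f"p = 1.5: {est_p15:.4f}")
print(f"p = 2:   {est_p2:.4f}")
```

The outer power 1/p does not change the minimizer, so the same θ is obtained by minimizing the sum of |x_i − θ|^p directly.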

• Errors-in-variables estimators: Account for measurement errors in the independent variables. Example: Total Least Squares (TLS), a multivariate extension of Deming regression.