Dr. Patrick Toche
These notes are a follow-up to the notes on Single Regression. This version is incomplete and will be expanded. Please check back later for an update.
Learning Objectives
1. Develop a multiple regression model.
2. Interpret the regression coefficients.
3. Calculate a predicted value of the dependent variable using a multiple regression equation.
4. Interpret and report the results of a multiple linear regression analysis.
5. Determine which independent variables are most important in predicting the dependent variable.
6. Interpret the role of categorical independent variables.
Multiple Regression
• Gauss-Markov theorem: Under the classical assumptions, the OLS estimators are BLUE: Best in the class of Linear Unbiased Estimators, where "best" means that they have the smallest variance among all linear unbiased estimators.
• Classical assumptions:
1. The explanatory variables are measured without error.
2. The independent variables are linearly independent (no perfect multicollinearity): no variable can be expressed as a linear combination of the others.
3. The error has zero mean conditional on the explanatory variables.
4. The errors are uncorrelated with each other: if the errors follow a predictable pattern, there is autocorrelation (also called serial correlation, common in time-series data).
5. The variance of the errors is constant across observations (homoskedasticity): if the variance follows a predictable pattern, there is heteroskedasticity.
• Gauss-Markov does not assume that the errors are normal or iid (independent and identically distributed); it only assumes that they have mean zero, are uncorrelated, and are homoskedastic.
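A minimal Monte Carlo sketch of what the theorem claims, under illustrative assumptions that are not from these notes: a single fixed regressor x = 1, …, 20, true coefficients β0 = 1 and β1 = 2, and uniform (hence non-normal) errors with mean 0 and variance 1. It compares the OLS slope with the Wald grouping estimator (the difference in mean y between the upper and lower halves of x, divided by the corresponding difference in mean x), which is also linear in y and unbiased when x is fixed. Both should average close to 2, and OLS should show the smaller sampling variance.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for a single regressor: S_xy / S_xx."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def grouping_slope(x, y):
    """Wald grouping estimator: linear in y and unbiased when x is fixed.
    Assumes x is sorted, so the first half holds the smaller values."""
    h = len(x) // 2
    dy = statistics.fmean(y[h:]) - statistics.fmean(y[:h])
    dx = statistics.fmean(x[h:]) - statistics.fmean(x[:h])
    return dy / dx

def simulate(reps=10_000, beta0=1.0, beta1=2.0, seed=42):
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]      # fixed, sorted regressor
    half_width = 3 ** 0.5                     # uniform(-sqrt(3), sqrt(3)): mean 0, variance 1
    ols, grp = [], []
    for _ in range(reps):
        # iid non-normal errors: mean zero, uncorrelated, homoskedastic
        y = [beta0 + beta1 * xi + rng.uniform(-half_width, half_width) for xi in x]
        ols.append(ols_slope(x, y))
        grp.append(grouping_slope(x, y))
    return ols, grp

ols, grp = simulate()
print(f"OLS:      mean {statistics.fmean(ols):.4f}, variance {statistics.variance(ols):.6f}")
print(f"Grouping: mean {statistics.fmean(grp):.4f}, variance {statistics.variance(grp):.6f}")
```

In theory the variances are σ²/S_xx = 1/665 ≈ 0.0015 for OLS and σ²(1/10 + 1/10)/10² = 0.002 for the grouping estimator: the errors are far from normal, yet OLS stays unbiased and has the smaller variance.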
Degrees of Freedom
Estimates of population parameters are based on sample data.
The number of independent pieces of information provided by the sample is called the degrees of freedom.
• In the context of linear regression, the degrees of freedom are df = n − p − 1, where n is the sample size and p + 1 is the number of parameters to be estimated (p slopes and 1 intercept). In a single linear regression (p = 1), there are n − 2 degrees of freedom.
• Approximate intuition: Squared deviations measured from estimated values (the sample mean, or the fitted regression line) are biased downwards, because the estimates are chosen to make those deviations as small as possible. Adjusting n down to n − p − 1 corrects the bias. The number of parameters to be estimated sets a minimum sample size: with only p + 1 observations the fitted model passes through every point and all residuals are zero, so only observations beyond the first p + 1 have the “freedom” to improve the estimate of the error variance. The more parameters are fitted, the greater the bias to be corrected.
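A minimal simulation sketch of the degrees-of-freedom correction, assuming normal errors with σ² = 1, n = 10 observations and p = 2 regressors held fixed across replications (all hypothetical values). It fits OLS by solving the normal equations X'X b = X'y and compares two estimators of the error variance: SSR/n and SSR/(n − p − 1). On average the first should land near (n − p − 1)/n = 0.7 and the second near the true value 1.

```python
import random
import statistics

def solve(a, b):
    """Solve a x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]       # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ssr_ols(X, y):
    """Fit OLS through the normal equations X'X b = X'y; return the sum of squared residuals."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))

rng = random.Random(0)
n, p, sigma = 10, 2, 1.0                                  # hypothetical design
# intercept column plus two fixed regressors
X = [[1.0, rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
beta = [1.0, 0.5, -0.3]

naive, adjusted = [], []
for _ in range(10_000):
    y = [sum(bj * xj for bj, xj in zip(beta, row)) + rng.gauss(0, sigma) for row in X]
    ssr = ssr_ols(X, y)
    naive.append(ssr / n)                                 # divides by n: biased downwards
    adjusted.append(ssr / (n - p - 1))                    # divides by df: unbiased for sigma^2

print(f"mean of SSR/n:       {statistics.fmean(naive):.3f}")
print(f"mean of SSR/(n-p-1): {statistics.fmean(adjusted):.3f}")
```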
Multiple Regression: Price and Size of House Sales
Call:
lm(formula = Price ~ Size + Beds + Baths, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-458105 -127923  -67537   92551  607742

Coefficients:
             Estimate Std. Error t value     Pr(>|t|)
(Intercept) 146817.28   85751.71   1.712       0.0908 .
Size           221.38      35.19   6.291 0.0000000164 ***
Beds        -52862.71   35755.95  -1.478       0.1433
Baths        27600.71   46382.98   0.595       0.5535
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 221700 on 79 degrees of freedom
Multiple R-squared:  0.4374,  Adjusted R-squared:  0.416
F-statistic: 20.47 on 3 and 79 DF,  p-value: 0.0000000006526
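The t values, the adjusted R² and the F statistic in this output can be reproduced from the other printed numbers. A short sketch using only values transcribed from the summary, with the sample size recovered from the residual degrees of freedom (79 = n − 3 − 1, so n = 83). The formulas are t = estimate / standard error, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), and F = (R²/p) / ((1 − R²)/(n − p − 1)).

```python
# Numbers transcribed from the R summary of lm(Price ~ Size + Beds + Baths)
coefs = {                       # name: (estimate, standard error)
    "(Intercept)": (146817.28, 85751.71),
    "Size": (221.38, 35.19),
    "Beds": (-52862.71, 35755.95),
    "Baths": (27600.71, 46382.98),
}
r2 = 0.4374                     # Multiple R-squared
p = 3                           # number of slope coefficients
df_resid = 79                   # residual degrees of freedom, n - p - 1
n = df_resid + p + 1            # sample size: 83

# t value = estimate / standard error (test of H0: coefficient = 0)
t_values = {name: est / se for name, (est, se) in coefs.items()}

# Adjusted R^2 penalizes R^2 for the number of regressors
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F test of H0: all slope coefficients are zero
f_stat = (r2 / p) / ((1 - r2) / df_resid)

for name, t in t_values.items():
    print(f"{name:12s} t = {t:7.3f}")
print(f"Adjusted R-squared = {adj_r2:.3f}, F = {f_stat:.2f} on {p} and {df_resid} DF")
```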
Dependent variable: Price
                        (1)                      (2)
Constant                146,817.3*               65,930.3
                        (85,751.7)               (60,993.6)
Size                    221.4***                 202.4***
                        (35.2)                   (26.4)
Beds                    −52,862.7
                        (35,755.9)
Baths                   27,600.7
                        (46,383.0)
N                       83                       83
R²                      0.4                      0.4
Adjusted R²             0.4                      0.4
Residual Std. Error     221,665.3 (df = 79)      222,137.3 (df = 81)
F Statistic             20.5*** (df = 3; 79)     58.8*** (df = 1; 81)
Standard errors in parentheses.
***Significant at the 1 percent level.
**Significant at the 5 percent level.
*Significant at the 10 percent level.
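To illustrate how a predicted value is calculated from these equations, the sketch below plugs a hypothetical house (2,000 sqft, 3 bedrooms, 2 bathrooms; illustrative values, not an observation in the data) into each column of the table, using the unrounded estimates from the R summary for model (1). Each slope in model (1) is a partial effect: holding Beds and Baths fixed, one extra square foot is associated with a price about $221 higher.

```python
# Coefficients transcribed from the table; model (1) uses the unrounded R estimates
MODEL_1 = {"const": 146817.28, "Size": 221.38, "Beds": -52862.71, "Baths": 27600.71}
MODEL_2 = {"const": 65930.3, "Size": 202.4}

def predict(coefs, **regressors):
    """Predicted value: intercept + sum of (slope * value) over the supplied regressors."""
    return coefs["const"] + sum(coefs[name] * value for name, value in regressors.items())

# Hypothetical house (illustrative values): 2,000 sqft, 3 bedrooms, 2 bathrooms
pred_1 = predict(MODEL_1, Size=2000, Beds=3, Baths=2)
pred_2 = predict(MODEL_2, Size=2000)
print(f"Model (1): ${pred_1:,.2f}")
print(f"Model (2): ${pred_2:,.2f}")
```

Model (1) predicts about $486,191 and model (2) about $470,730; the gap comes from the bedroom and bathroom terms and from the different Size slope once those terms are included.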
Multiple Regression: Marginal Residuals
[Figure: Actual Versus Predicted values. x-axis: Size (sqft); y-axis: House sales price ($); legend: Actual (Y) / Predicted (Yp), OLS: log(Price) ∼ Size]
[Figure: Actual Versus Predicted values. x-axis: Bathrooms (count); y-axis: House sales price ($); legend: Actual (Y) / Predicted (Yp), OLS: log(Price) ∼ Size]
[Figure: Actual Versus Predicted values. x-axis: Bedrooms (count); y-axis: House sales price ($); legend: Actual (Y) / Predicted (Yp), OLS: log(Price) ∼ Size]
[Figure: Quantile-Quantile Plot of the regression residuals in OLS: log(Price) ∼ Size + Beds + Baths. x-axis: normal distribution; y-axis: standardized residuals]
Multiple Regression: Confidence & Prediction Intervals
[Figure: Confidence & Prediction Intervals at the 95% confidence level for OLS: log(Price) ∼ Size. x-axis: Size (sqft); y-axis: House sales price (log $), from 100,000 to 3,000,000; bands: Confidence, Prediction]
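A minimal sketch of the two bands for a single regressor, on synthetic data that loosely mimic log price against size (the data-generating values are hypothetical). The confidence interval covers the mean of y at x0, with standard error s·sqrt(1/n + (x0 − x̄)²/S_xx); the prediction interval covers a new observation and adds the error variance, s·sqrt(1 + 1/n + (x0 − x̄)²/S_xx), so it is always wider. For simplicity the sketch uses the normal quantile instead of the t quantile with n − 2 degrees of freedom, a reasonable approximation only when n is large.

```python
import random
import statistics
from statistics import NormalDist

def ci_and_pi(x, y, x0, level=0.95):
    """Return (yhat, confidence interval, prediction interval) at x0 for simple OLS."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = (ssr / (n - 2)) ** 0.5                  # residual standard error, n - 2 df
    yhat = b0 + b1 * x0
    # Large-sample approximation: normal quantile in place of the t(n - 2) quantile
    z = NormalDist().inv_cdf(0.5 + level / 2)
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    se_mean = s * leverage ** 0.5               # uncertainty about the regression line
    se_new = s * (1 + leverage) ** 0.5          # plus the error of a new observation
    ci = (yhat - z * se_mean, yhat + z * se_mean)
    pi = (yhat - z * se_new, yhat + z * se_new)
    return yhat, ci, pi

# Hypothetical data: log price rising linearly in size, noise sd 0.3
rng = random.Random(1)
size = [rng.uniform(800, 4000) for _ in range(83)]
log_price = [11.5 + 0.0005 * sz + rng.gauss(0, 0.3) for sz in size]

yhat, ci, pi = ci_and_pi(size, log_price, x0=2000)
print(f"fit {yhat:.3f}  CI [{ci[0]:.3f}, {ci[1]:.3f}]  PI [{pi[0]:.3f}, {pi[1]:.3f}]")
```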
Estimators
• M-estimators: A broad class of estimators obtained as the extremum of sums of functions of the data. The general principle, suggested by Peter Huber in 1964, is to minimize an objective function ρ over a vector of parameters θ:
$$\min_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta)$$
Closed-form solutions rarely exist, so M-estimators can be difficult to compute and are typically obtained by iterative algorithms. Examples include:
Least Squares minimizes the sum of the squared residuals from the sample data (given a linear model), so ρ is the squared residual. It admits a closed form.
Maximum Likelihood maximizes the probability (density) of the observed sample as a function of the parameters (given the structure of the model), that is, the likelihood $L(\theta \mid x_1, \dots, x_n)$. It is often convenient to work with the natural logarithm of the likelihood function (the log-likelihood); for independent observations with density f, maximizing the log-likelihood is the same as minimizing the sum of ρ(x_i, θ) = −log f(x_i, θ).
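To make the iterative computation concrete, here is a sketch of Huber's M-estimator of location, where ρ(x_i, θ) is Huber's loss applied to x_i − θ: quadratic for small residuals and linear for large ones. It is computed by iteratively reweighted least squares (IRLS), one common algorithm for M-estimators. The tuning constant k = 1.345 and the choice to hold the scale fixed at the normalized median absolute deviation (MAD/0.6745) are conventional assumptions of this sketch, not something these notes specify; the data are hypothetical.

```python
import statistics

def huber_location(data, k=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location computed by iteratively reweighted least squares.

    Solves sum(psi(x_i - theta)) = 0 for Huber's psi, with the scale held fixed
    at the normalized median absolute deviation (an assumption of this sketch).
    """
    theta = statistics.median(data)                       # robust starting value
    mad = statistics.median(abs(x - theta) for x in data)
    scale = mad / 0.6745 if mad > 0 else 1.0              # MAD rescaled to estimate sigma under normality
    c = k * scale                                          # where the loss switches from quadratic to linear
    for _ in range(max_iter):
        # IRLS weights psi(r)/r: 1 inside [-c, c], c/|r| outside
        weights = [1.0 if abs(x - theta) <= c else c / abs(x - theta) for x in data]
        new_theta = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]                 # hypothetical data with one outlier
print("mean  ", statistics.fmean(sample))
print("median", statistics.median(sample))
print("Huber ", round(huber_location(sample), 4))
```

The outlier drags the mean far from the bulk of the data, while the Huber estimate stays close to the median because large residuals receive small weights.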
• L-estimators: A class of estimators expressed as a linear combination of order statistics of the measurements; the kth order statistic of a sample is its kth-smallest value.
L-estimators are often simple to compute and robust. Popular L-estimators have a high breakdown point: the fraction of the measurements that can be arbitrarily changed without causing the estimate to tend to infinity. Examples include the median, the quantiles, and the interquartile range. In modern econometrics, L-estimators have largely been replaced by M-estimators.
Example: Least Absolute Deviations (LAD) minimizes the sum of the absolute residuals; in the location model its solution is the sample median, an L-estimator. LAD is also known as the L1-estimator, a special case of an Lp-norm estimator.
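The sketch below writes three L-estimators as weighted sums of order statistics and checks their robustness on hypothetical data in which 2 of 10 values are replaced by gross errors. The mean (which a single bad value can ruin) and the 10% trimmed mean (breakdown point 1/10) are carried away by the two bad values, while the median (breakdown point close to 1/2) does not move.

```python
def l_estimate(data, weights):
    """Generic L-estimator: weighted sum of the order statistics x_(1) <= ... <= x_(n)."""
    return sum(w * x for w, x in zip(weights, sorted(data)))

def trimmed_mean_weights(n, g):
    """Equal weights on the middle n - 2g order statistics, zero on the g smallest and g largest."""
    return [0.0] * g + [1.0 / (n - 2 * g)] * (n - 2 * g) + [0.0] * g

def median_weights(n):
    """Weights that select the middle order statistic (average of the two middle ones if n is even)."""
    w = [0.0] * n
    if n % 2:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w

clean = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.9]   # hypothetical measurements
corrupted = clean[:-2] + [1e6, 1e9]                            # 2 of 10 values replaced by gross errors
n = len(clean)
estimators = {
    "mean": [1.0 / n] * n,
    "10% trimmed mean": trimmed_mean_weights(n, 1),
    "median": median_weights(n),
}
results = {name: (l_estimate(clean, w), l_estimate(corrupted, w)) for name, w in estimators.items()}
for name, (before, after) in results.items():
    print(f"{name:>16}: clean {before:.3f}, corrupted {after:.3f}")
```

With a single gross error the trimmed mean would survive, since the bad value is trimmed away; two overwhelm it, consistent with its breakdown point of 1/10.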
• Lp-norm estimators: Given a vector of parameters θ, minimize:
$$\min_{\theta} \left( \sum_{i=1}^{n} \lvert \rho(x_i, \theta) \rvert^{p} \right)^{1/p}$$
where, in the simplest cases, ρ is linear in the regression residuals.
Special cases include the sample mean (L2, p = 2), used in OLS regressions, and the sample median (L1, p = 1), used in LAD regressions.
Lp-estimators with 1 < p < 2 have robustness and efficiency properties intermediate between those of the sample mean and the sample median.
• Errors-in-variables estimators: Account for measurement errors in the independent variables. Example: Total Least Squares (TLS), a multivariate extension of Deming regression.
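A minimal sketch of the Lp-norm estimator from the first bullet above, in the simplest location case where ρ(x_i, θ) = x_i − θ is the residual from a constant-only model. Because the objective is convex in θ for p ≥ 1, a simple ternary search over [min x, max x] finds the minimizer; the search method and the data are illustrative choices. The outer power 1/p is monotone, so it does not change the minimizer. With one outlying value, p = 2 should return the sample mean, p = 1 the sample median, and p = 1.5 a value in between.

```python
def lp_objective(data, theta, p):
    """(sum |x_i - theta|^p)^(1/p): the Lp norm of the residuals x_i - theta."""
    return sum(abs(x - theta) ** p for x in data) ** (1.0 / p)

def lp_location(data, p, iters=200):
    """Minimize the Lp objective over theta by ternary search.

    For p >= 1 the objective is convex in theta and its minimizer lies between
    min(data) and max(data), so shrinking that bracket is enough.
    """
    lo, hi = min(data), max(data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if lp_objective(data, m1, p) < lp_objective(data, m2, p):
            hi = m2          # a minimizer lies at or to the left of m2
        else:
            lo = m1          # a minimizer lies at or to the right of m1
    return (lo + hi) / 2

sample = [1.0, 2.0, 3.0, 4.0, 100.0]     # hypothetical data: median 3, mean 22
for p in (1.0, 1.5, 2.0):
    print(f"p = {p}: theta = {lp_location(sample, p):.4f}")
```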