• No results found

Multiple Linear Regression - Quantitative Predictors

N/A
N/A
Protected

Academic year: 2021

Share "Multiple Linear Regression - Quantitative Predictors"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Multiple Linear Regression - Quantitative

Predictors

(2)

Introduction

Previously, we introduced multiple linear regression, which allows us to model an outcome variable using multiple predictors:

y = β0+ β1x1+ β2x2+ . . . + βkxk + 

I When the predictor xj is a dummy variable, we can view βj as

a modification of the model’s intercept

I When the predictor xj is a numeric variable, βj is the model’s

slope in the jth dimension

I This is easiest to visualize when the model contains two

numeric predictors, as the corresponding slopes will form a regression plane

(3)

Introduction

Previously, we introduced multiple linear regression, which allows us to model an outcome variable using multiple predictors:

y = β0+ β1x1+ β2x2+ . . . + βkxk + 

I When the predictor xj is a dummy variable, we can view βj as

a modification of the model’s intercept

I When the predictor xj is a numeric variable, βj is the model’s

slope in the jth dimension

I This is easiest to visualize when the model contains two

numeric predictors, as the corresponding slopes will form a regression plane

(4)

Regression Planes

For the Ames housing data, the estimated regression plane below displays the model:

SalePrice ~ Gr.Liv.Area + TotRms.AbvGrd

Gr.Liv.Area T otRms .Ab vGrd SalePr ice TotRms.AbvGrd Gr .Liv.Area SalePr ice

(5)

Regression Planes

The summary function will provide us the estimated slope in each dimension

## ## Call:

## lm(formula = SalePrice ~ Gr.Liv.Area + TotRms.AbvGrd, data = ah) ##

## Residuals:

## Min 1Q Median 3Q Max

## -572457 -28568 -2882 20536 348406 ##

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 38534.235 4973.439 7.748 1.38e-14 *** ## Gr.Liv.Area 146.511 3.922 37.356 < 2e-16 *** ## TotRms.AbvGrd -11057.878 1273.236 -8.685 < 2e-16 *** ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ##

## Residual standard error: 56630 on 2351 degrees of freedom ## Multiple R-squared: 0.5436, Adjusted R-squared: 0.5432 ## F-statistic: 1400 on 2 and 2351 DF, p-value: < 2.2e-16

(6)

Adjusted vs. Unadjusted Effects

I Notice the negative slope in the “TotRms.AbvGrd” dimension, does this mean that having more rooms is expected to decrease a home’s sale price?

I No, it’s essential to recognize that this slope is an adjusted

effect

I According to our model, having more rooms decreases a home’s sale price if the square footage remains unchanged

I This should make sense, since adjustment would imply the

home has smaller rooms

I For reference, the slope in the simple linear regression model

(7)

Adjusted vs. Unadjusted Effects

I Notice the negative slope in the “TotRms.AbvGrd” dimension, does this mean that having more rooms is expected to decrease a home’s sale price?

I No, it’s essential to recognize that this slope is an adjusted

effect

I According to our model, having more rooms decreases a home’s sale price if the square footage remains unchanged

I This should make sense, since adjustment would imply the

home has smaller rooms

I For reference, the slope in the simple linear regression model

(8)

Adjusted vs. Unadjusted Effects

We can further understand the adjusted vs. unadjusted effect of “TotRms.AbvGrd” using a scatterplot matrix:

plot(ah[,c("SalePrice","Gr.Liv.Area","TotRms.AbvGrd")])

SalePrice

1000

4000

0e+00 2e+05 4e+05 6e+05

1000 2000 3000 4000 5000 Gr.Liv.Area 0e+00 4e+05 2 4 6 8 10 12 14 2 6 10 14 TotRms.AbvGrd

I Multiple regression provides a method for isolating the effect of each variable

(9)

Connection to Stratification

I We’ve previously discussed using stratification to deal with confounding variables

I Both stratification and multiple regression work by holding the

confounding variable constant in order to isolate the impact of the explanatory variable of interest

I Additionally, stratification is sort of like a cross-section of the regression plane

I Within a given cross-section, the confounding variable is held at

a fixed value

I Unless the model includes an interaction, we don’t even need to

worry about which cross-section - the slope of the primary explanatory variable will be same

(10)

Connection to Stratification

I We’ve previously discussed using stratification to deal with confounding variables

I Both stratification and multiple regression work by holding the

confounding variable constant in order to isolate the impact of the explanatory variable of interest

I Additionally, stratification is sort of like a cross-section of the regression plane

I Within a given cross-section, the confounding variable is held at

a fixed value

I Unless the model includes an interaction, we don’t even need to

worry about which cross-section - the slope of the primary explanatory variable will be same

(11)

Adding More Predictors

I As we’ve seen, two numeric predictors will result in a regression

plane

I Adding a categorical predictor will shift the y -intercept of this

plane, leading to parallel planes for each of the variable’s category

I Adding another numeric predictor is not something we can visualize, but the overall concepts are the same

I Least squares will estimate a separate slope in each dimension

that isolates the impact of that variable

I In any case, when interpreting an estimated coefficient it is essential to recognize its effect has been adjusted for all other

(12)

Adding More Predictors

I As we’ve seen, two numeric predictors will result in a regression

plane

I Adding a categorical predictor will shift the y -intercept of this

plane, leading to parallel planes for each of the variable’s category

I Adding another numeric predictor is not something we can visualize, but the overall concepts are the same

I Least squares will estimate a separate slope in each dimension

that isolates the impact of that variable

I In any case, when interpreting an estimated coefficient it is essential to recognize its effect has been adjusted for all other

(13)

Adding More Predictors

I As we’ve seen, two numeric predictors will result in a regression

plane

I Adding a categorical predictor will shift the y -intercept of this

plane, leading to parallel planes for each of the variable’s category

I Adding another numeric predictor is not something we can visualize, but the overall concepts are the same

I Least squares will estimate a separate slope in each dimension

that isolates the impact of that variable

I In any case, when interpreting an estimated coefficient it is essential to recognize its effect has been adjusted for all other

(14)

Closing Remarks

I We’ve now discussed multiple regression, at a conceptual level, for categorical and numeric variables

I Our focus has been on understanding adjusted effects

I Next week we’ll look more closely at choosing variables that are worth including in a model, as well as some additional details regarding how certain data-points can influence the overall model

References

Related documents

That final editing having been made, attached as approved by the Supreme Court is the “descriptive material” on dispute resolution alternatives to conventional divorce litigation,

SOP-MB-023* Detection of Norovirus GI and GII in vegetables, shellfish, water and surfaces (based on ISO 15216-2:2019. Microbiology of food and animal feed. Horizontal method

LSXR 610 HL High/Low Mount 360° Sensor with High/Low Dimming Control Option LSXR 610 P High/Low Mount 360° Sensor with On/Off Photocell Option LSXR 610 ADC High/Low Mount 360°

Reporting. 1990 The Ecosystem Approach in Anthropology: From Concept to Practice. Ann Arbor: University of Michigan Press. 1984a The Ecosystem Concept in

To see the effect of having a scholarship, independent of the effect of getting a college degree, do control for college degree...

HEJ Research Institute of Chemistry, University of Karachi; Pakistan Council of Scientific and Industrial Research (PCSIR) Laboratories, Peshawar, Lahore and Karachi; Pakistan

The Lithuanian authorities are invited to consider acceding to the Optional Protocol to the United Nations Convention against Torture (paragraph 8). XII-630 of 3

Late Glacial to Mid-Holocene (15,500 to 7,000 Cal Years BP) Environmental Variability: Evidence of Lake Eutrophication and Variation in Vegetation from Ostracod Faunal Sucession