Lecture 16: Generalized Additive Models
http://polisci.msu.edu/jacoby/icpsr/regress3
Regression III:
Advanced Methods
Bill Jacoby
Michigan State University
Goals of the Lecture
• Introduce Additive Models
– Explain how they extend from simple nonparametric regression (i.e., local polynomial regression)
– Discuss estimation using backfitting
– Explain how to interpret their results
• Conclude with some examples of Additive Models applied to real social science data
Limitations of the Multiple Nonparametric Models
• Recall that the general nonparametric model (both the lowess smooth and the smoothing spline) takes the following form:
$$Y_i = f(x_{i1}, x_{i2}, \ldots, x_{ik}) + \varepsilon_i$$
• As this form shows, the multiple nonparametric model allows all possible interactions between the independent variables in their effects on Y: we specify a jointly conditional functional form
• This model is ideal under the following circumstances:
1. There are no more than two predictors
2. The pattern of nonlinearity is complicated and thus cannot be easily modelled with a simple transformation or polynomial regression
3. The sample size is sufficiently large
Limitations of the Multiple Nonparametric Models (2)
• However, the general nonparametric model becomes unstable and impossible to interpret as we add more explanatory variables
1. For example, in the lowess case, as the number of variables increases, the window span must become wider to ensure that each local regression has enough cases. This process can create significant bias (the curve becomes too smooth)
2. It is impossible to interpret general nonparametric regression when there are more than two explanatory variables: there are no coefficients, and we cannot graph effects in more than three dimensions
• These limitations lead us to additive models
Additive Regression Models
• Additive regression models essentially apply local regression to low-dimensional projections of the data
• The nonparametric additive regression model is
$$Y_i = \alpha + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_k(x_{ik}) + \varepsilon_i$$
The $f_j$ are arbitrary functions estimated from the data; the errors ε are assumed to have constant variance and a mean of 0
• Additive models estimate the regression surface by combining a collection of one-dimensional functions
• The estimated functions $f_j$ are the analogues of the coefficients in linear regression
• The assumption that the contribution of each covariate is additive is analogous to the assumption in linear regression that each component is estimated separately
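Additive Regression Models (2)
• Recall that the linear regression model is
$$Y_i = \alpha + B_1 x_{i1} + B_2 x_{i2} + \cdots + B_k x_{ik} + \varepsilon_i$$
where the $B_j$ represent linear effects
• For the additive model we model Y as an additive combination of arbitrary functions of the Xs:
$$Y_i = \alpha + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_k(x_{ik}) + \varepsilon_i$$
• The $f_j$ represent arbitrary functions that can be estimated by lowess or smoothing splines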
Additive Regression Models (3)
• Now comes the question: How do we find these arbitrary functions?
• If the X’s were completely independent (which will not be the case), we could simply estimate each functional form using a nonparametric regression of Y on each of the X’s separately
– Similarly, in linear regression, when the X’s are completely uncorrelated the partial regression slopes are identical to the marginal regression slopes
• Since the X’s are related, however, we need to proceed in another way, in effect removing the effects of the other predictors, which are unknown before we begin
• We use a procedure called backfitting to find each curve, controlling for the effects of the others
Estimation and Backfitting
• Suppose that we had a two-predictor additive model:
$$Y_i = \alpha + f_1(x_{i1}) + f_2(x_{i2}) + \varepsilon_i$$
• If we (unrealistically) knew the partial regression function $f_2$ but not $f_1$, we could rearrange the equation in order to solve for $f_1$:
$$Y_i - f_2(x_{i2}) = \alpha + f_1(x_{i1}) + \varepsilon_i$$
• In other words, smoothing $Y_i - f_2(x_{i2})$ against $x_{i1}$ produces an estimate of $\alpha + f_1(x_{i1})$.
• Simply put, knowing one function allows us to find the other. In the real world, however, we do not know either, so we must begin with preliminary estimates
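The rearrangement above can be checked on simulated data. The R sketch below is only an illustration under stated assumptions: the data, the functions f1 and f2, and the loess settings (span = 0.3, degree = 2) are hypothetical choices, not part of the lecture. It pretends that f2 is known, subtracts it from Y, and smooths what remains against x1.

```r
# Simulated illustration; every name and value here is hypothetical.
set.seed(1)
n     <- 500
x1    <- runif(n, -2, 2)
x2    <- 0.5 * x1 + rnorm(n)           # x2 is correlated with x1
f1    <- function(x) sin(2 * x)        # the "unknown" partial function
f2    <- function(x) 0.5 * x^2         # the function we pretend to know
alpha <- 3
y     <- alpha + f1(x1) + f2(x2) + rnorm(n, sd = 0.3)

# Remove the known contribution of x2, then smooth the rest against x1
partial <- y - f2(x2)
d       <- data.frame(partial = partial, x1 = x1)
fit     <- loess(partial ~ x1, data = d, span = 0.3, degree = 2)

# The smooth estimates alpha + f1(x1); compare it with the truth
est   <- fitted(fit)
truth <- alpha + f1(x1)
agreement <- cor(est, truth)
agreement
```

Because the simulation knows the true f1, the correlation between the smooth and alpha + f1(x1) gives a rough check that smoothing the adjusted response recovers the partial function even though x1 and x2 are correlated.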
Estimation and Backfitting (2)
1. We start by expressing the variables in mean-deviation form so that the partial regressions sum to zero, thus eliminating the individual intercepts
2. We then take preliminary estimates of each function from a least-squares regression of Y on the X’s:
$$Y_i - \bar{Y} = B_1 (x_{i1} - \bar{x}_1) + B_2 (x_{i2} - \bar{x}_2) + E_i$$
3. These estimates are then used as step (0) in an iterative estimation process:
$$\hat{f}_1^{(0)}(x_{i1}) = B_1 (x_{i1} - \bar{x}_1), \qquad \hat{f}_2^{(0)}(x_{i2}) = B_2 (x_{i2} - \bar{x}_2)$$
4. We then find the partial residuals for $X_1$, which remove from Y its linear relationship to $X_2$ but retain the relationship between Y and $X_1$
Estimation and Backfitting (3)
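The partial residuals for $X_1$ are then
$$E_{i[1]}^{(1)} = (Y_i - \bar{Y}) - B_2 (x_{i2} - \bar{x}_2) = E_i + B_1 (x_{i1} - \bar{x}_1)$$
where $B_1$ and $B_2$ are the least-squares slopes and $E_i$ the least-squares residuals from the preliminary regression
5. The same procedure as in step 4 is applied to $X_2$
6. Next we smooth these partial residuals against their respective X’s, providing a new estimate of each $f_j$:
$$\hat{f}_j^{(1)} = S_j E_{[j]}^{(1)}$$
where $S_j$ is the (n × n) smoother transformation matrix for $X_j$, which depends only on the configuration of the $x_{ij}$ for the j-th predictor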
Estimation and Backfitting (4)
• This process of finding new estimates of the functions by smoothing the partial residuals is repeated until the partial functions converge
– That is, when the estimates of the smooth functions stabilize from one iteration to the next, we stop
• When this process is done, we obtain estimates of $s_j(X_{ij})$ for every value of $X_j$
• More importantly, we will have reduced a multiple regression to a series of two-dimensional partial regression problems, making interpretation easy:
– Since each partial regression is only two-dimensional, the functional forms can be plotted on two-dimensional plots showing the partial effects of each $X_j$ on Y
– In other words, perspective plots are no longer necessary unless we include an interaction between two smoother terms
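The backfitting steps can be sketched in a few lines of base R. This is a minimal illustration, not the routine that any particular package uses: the function name backfit, its arguments, the choice of loess as the smoother, and the simulated data are all assumptions made for the example. One design choice to note is that each partial residual uses the most recent estimates of the other functions, which is a common way to run the iteration.

```r
# Hypothetical helper: backfitting with loess as the smoother.
backfit <- function(y, X, span = 0.3, tol = 1e-6, max_iter = 50) {
  k     <- ncol(X)
  alpha <- mean(y)
  ystar <- y - alpha                              # step 1: mean-deviation form
  Xc    <- scale(X, center = TRUE, scale = FALSE)
  b     <- coef(lm(ystar ~ Xc - 1))               # step 2: least-squares start
  f     <- sweep(Xc, 2, b, "*")                   # step 3: f_j^(0) = B_j * x_j
  for (iter in seq_len(max_iter)) {
    f_old <- f
    for (j in seq_len(k)) {
      # steps 4-5: partial residuals remove the other current estimates
      e_j <- ystar - rowSums(f[, -j, drop = FALSE])
      # step 6: smooth the partial residuals against x_j
      d   <- data.frame(e = e_j, x = X[, j])
      s_j <- fitted(loess(e ~ x, data = d, span = span, degree = 2))
      f[, j] <- s_j - mean(s_j)                   # keep each function centered
    }
    if (max(abs(f - f_old)) < tol) break          # stop once estimates stabilize
  }
  list(alpha = alpha, f = f, iterations = iter)
}

# Usage on simulated data (hypothetical functions and values)
set.seed(2)
n  <- 400
x1 <- runif(n, -2, 2)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 3 + sin(2 * x1) + 0.5 * x2^2 + rnorm(n, sd = 0.3)
fit <- backfit(y, cbind(x1 = x1, x2 = x2))
fitted_y <- fit$alpha + rowSums(fit$f)            # additive fitted values
```

Plotting fit$f[, 1] against x1 and fit$f[, 2] against x2 gives the two-dimensional partial-regression plots described above.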
Interpreting the Effects
• A plot of $X_j$ versus $s_j(X_j)$ shows the relationship between $X_j$ and Y holding constant the other variables in the model
• Since Y is expressed in mean-deviation form, the smooth term $s_j(X_j)$ is also centered, and thus each plot represents how Y changes relative to its mean with changes in $X_j$
• Interpreting the scale of the graphs then becomes easy:
– The value of 0 on the Y-axis is the mean of Y
– As the line moves away from 0 in a negative direction we subtract the distance from the mean when determining the fitted value; as it moves in a positive direction we add the distance. For example, if the mean is 45, and for a particular X-value (say x=15) the curve is at $s_j(X_j)=4$, the fitted value of Y, controlling for all other explanatory variables, is 45+4=49
– If there are several nonparametric relationships, we can add together the effects from the separate graphs for any particular observation to find its fitted value of Y
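The additive reading of the plots can be verified numerically. The sketch below assumes the mgcv package and uses simulated data (the variables x1, x2, y and their functional forms are hypothetical). It extracts the centered smooth terms for each observation and checks that the intercept plus their sum reproduces the fitted values.

```r
library(mgcv)

# Simulated data; the functional forms are hypothetical
set.seed(3)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y  <- 45 + 10 * sin(2 * pi * x1) + 20 * (x2 - 0.5)^2 + rnorm(n)

m <- gam(y ~ s(x1) + s(x2))

# One column per smooth: the centered values s_j(x_ij) read off each plot
smooth_terms <- predict(m, type = "terms")
intercept    <- coef(m)[["(Intercept)"]]

# Fitted value = intercept + sum of the smooth contributions
manual <- intercept + rowSums(smooth_terms)
check  <- all.equal(as.numeric(manual), as.numeric(fitted(m)))
check
```

In a model like this one, with an identity link and only centered smooth terms, the intercept coincides with the sample mean of y, which is why 0 on the vertical axis of each plot corresponds to the mean.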
Additive Regression Models in R:
Example: Canadian prestige data
• Here we use the Canadian Prestige data to fit an additive model of prestige regressed on income and education
• In R we use the gam function (for generalized additive models) found in the mgcv package
– The gam function in mgcv fits only smoothing splines (local polynomial regression can be done in S-PLUS)
– The formula takes the same form as in the glm function, except that we now have the option of including both parametric terms and smoothed estimates
– A smooth is fit to any variable wrapped in an s(variable) term
• The simple R-script is as follows:
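A minimal sketch of what such a script might look like is given here. It assumes that the data are the Prestige data frame distributed with the carData package (with columns prestige, income and education); the object name prestige.gam is a hypothetical choice.

```r
library(mgcv)                          # provides gam() and s()

# Assumption: the Canadian prestige data come from the carData package
data(Prestige, package = "carData")

# Additive model: a smooth of income plus a smooth of education
prestige.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)
prestige.gam
```

A parametric term could be added to the same formula without the s() wrapper if a linear effect were wanted for one of the predictors.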
Additive Regression Models in R:
Example: Canadian prestige data (2)
• The summary function returns tests for each smooth, the degrees of freedom for each smooth, and an adjusted R-squared for the model. The deviance can be obtained with the deviance(model) command
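As a hedged illustration of these commands, the sketch below refits the model (again assuming the Prestige data from the carData package and the hypothetical object name prestige.gam) and pulls the pieces mentioned above out of the summary object.

```r
library(mgcv)
data(Prestige, package = "carData")    # assumed data source
prestige.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)

fit.summary <- summary(prestige.gam)
fit.summary$s.table                    # effective df and approximate test per smooth
fit.summary$r.sq                       # adjusted R-squared
model.deviance <- deviance(prestige.gam)
model.deviance
```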
Additive Regression Models in R:
Example: Canadian prestige data (3)
• Again, as with other nonparametric models, we have no slope parameters to investigate (we do have an
intercept, however)
• A plot of the regression surface is necessary
Additive Regression Models in R:
Example: Canadian prestige data (4)
Additive Model:
• We can see the nonlinear relationship of both education and income with prestige, but there is no interaction between them, i.e., the slope for income is the same at every value of education
• We can compare this model to the general nonparametric regression model
[Figure: perspective plot of the fitted additive model; axes: Income (5000 to 25000), Education (8 to 14), Prestige (20 to 80)]
Additive Regression Models in R:
Example: Canadian prestige data (5)
General Nonparametric Model:
• This model is quite similar to the additive model, but there are some nuances, particularly in the mid-range of income, that are not picked up by the additive model because it does not allow the X’s to interact
[Figure: perspective plot of the fitted general nonparametric model; axes: Income (5000 to 25000), Education (8 to 14), Prestige (20 to 80)]
Additive Regression Models in R:
Example: Canadian prestige data (6)
• Perspective plots can also be made automatically using the persp.gam function. These graphs include a 95% confidence region
[Figure: four perspective plots of the fitted surface over income (5000 to 25000) and education (8 to 14), with the vertical axis running from 20 to 80]
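A sketch of how such perspective plots might be drawn follows. It rests on two assumptions: the Prestige data come from the carData package, and the plotting function is vis.gam, which recent mgcv releases provide for this purpose (persp.gam may not be available in them). The viewing angles are arbitrary choices.

```r
library(mgcv)
data(Prestige, package = "carData")    # assumed data source
prestige.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)

# Fitted surface over income and education
vis.gam(prestige.gam, view = c("income", "education"),
        theta = 40, phi = 20, ticktype = "detailed")

# With se > 0, surfaces at plus and minus that many standard errors
# are drawn around the fit (se = 2 is roughly a 95% band)
vis.gam(prestige.gam, view = c("income", "education"),
        se = 2, theta = 40, phi = 20)
```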
Additive Regression Models in R:
Example: Canadian prestige data (7)
• Since the slices of the additive regression surface in the direction of one predictor (holding the other constant) are parallel, we can graph each partial-regression function separately
• This is the benefit of the additive model: we can draw as many plots as there are variables, allowing us to easily visualize the relationships
• In other words, a multidimensional regression has been reduced to a series of two-dimensional partial-regression plots
• To get these in R:
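One way to draw the partial-regression plots is the plot method for fitted gam objects in mgcv. The sketch below is an assumption about how this could be done rather than the script used in the lecture; it again takes the Prestige data from the carData package.

```r
library(mgcv)
data(Prestige, package = "carData")    # assumed data source
prestige.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)

# One panel per smooth term, both on a single page, with standard-error bands
plot(prestige.gam, pages = 1, se = TRUE)

# A single term can be drawn on its own with select
plot(prestige.gam, select = 1, se = TRUE)
```

The vertical axis label of each panel reports the term and its estimated degrees of freedom, in the form s(variable, edf).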
Additive Regression Models in R:
Example: Canadian prestige data (8)
[Figure: partial-regression plots of s(income, 3.12) against income (0 to 25000) and s(education, 3.18) against education (6 to 16)]
Additive Regression Models in R:
Example: Canadian prestige data (9)
[Figure: partial-regression plots of s(income, 3.12) against income (0 to 25000) and s(education, 3.18) against education (6 to 16), with vertical axes from -20 to 20]