Poisson regression
6.3 Example: Poisson model
6.3.1 Coefficient parameterization
This example comes from Medicare hospital length-of-stay data from the state of Arizona. The data are limited to only one diagnostic group (DRG 112).
Patient data have been randomly selected from the original data to be part of the medpar data set. Relevant medpar variables for this example include:
RESPONSE
los: length of stay, a count of the days each patient spent in the hospital
PREDICTORS
hmo: Patient belongs to a Health Maintenance Organization (1), or private pay (0)
white: Patient identifies themselves as primarily Caucasian (1) in comparison to non-white (0)
type: A three-level factor predictor related to the type of admission:
1 = elective (referent), 2 = urgent, and 3 = emergency
Program code and results are displayed in both Stata and R.
. glm los hmo white type2 type3, fam(poi)

Generalized linear models              No. of obs      =       1495
Optimization     : ML                  Residual df     =       1490
                                       Scale parameter =          1
Deviance         =  8142.666001        (1/df) Deviance =   5.464877
Pearson          =  9327.983215        (1/df) Pearson  =   6.260391
. . .
                                       AIC             =   9.276131
Log likelihood   = -6928.907786        BIC             =  -2749.057
------------------------------------------------------------------------
         |                 OIM
     los |      Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
     hmo |  -.0715493    .023944    -2.99   0.003  -.1184786     -.02462
   white |   -.153871   .0274128    -5.61   0.000  -.2075991    -.100143
   type2 |   .2216518   .0210519    10.53   0.000   .1803908    .2629127
   type3 |   .7094767    .026136    27.15   0.000   .6582512    .7607022
   _cons |   2.332933   .0272082    85.74   0.000   2.279606     2.38626
------------------------------------------------------------------------
The variance-covariance matrix can be displayed as
. matrix list e(V)

symmetric e(V)[5,5]
                  los:        los:        los:        los:        los:
                  hmo       white       type2       type3       _cons
  los:hmo   .00057331
los:white  -.00003093   .00075146
los:type2   .00003196   .00005636   .00044318
los:type3   .00007959   .00002412    .0001065   .00068309
los:_cons  -.00006879  -.00068476  -.00015703   -.0001355   .00074028
The standard errors are contained in the above matrix as the square root of the main diagonal elements. For example, the standard error of hmo is:
. di sqrt(.00057331)
.02394389
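For readers working in R, the same calculation can be sketched for all five parameters at once. The block below is an illustration rather than part of the original analysis: it re-enters the lower triangle of e(V) by hand from the listing above (the object names nm, V, and se are our own), rebuilds the symmetric matrix, and takes the square root of the main diagonal.

```r
# Parameter names in the order used by the e(V) listing
nm <- c("hmo", "white", "type2", "type3", "_cons")
V  <- matrix(0, 5, 5, dimnames = list(nm, nm))

# Lower triangle of e(V), typed row by row from the listing. Filling the
# upper triangle column by column places the same values symmetrically.
V[upper.tri(V, diag = TRUE)] <- c(
   .00057331,
  -.00003093,  .00075146,
   .00003196,  .00005636,  .00044318,
   .00007959,  .00002412,  .0001065,   .00068309,
  -.00006879, -.00068476, -.00015703, -.0001355,  .00074028)

# Mirror the upper triangle into the lower triangle
V <- V + t(V) - diag(diag(V))

# Standard errors are the square roots of the diagonal elements
se <- sqrt(diag(V))
round(se, 7)
```

The rounded values should agree with the Std. Err. column of the model output. With a fitted glm object in R, sqrt(diag(vcov(poi))) would be the more direct route.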
The coefficient and standard error of a predictor may also be extracted from the table of parameter estimates. For hmo, both statistics can be displayed more conveniently using the code:
. di "hmo: coefficient = " _b[hmo] " hmo: SE = " _se[hmo]
hmo: coefficient = -.07154931 hmo: SE = .02394396
As discussed in Chapter 2, the confidence intervals of the coefficients are determined from their standard errors by the formula given in equation 2.6. To repeat here for clarification, the confidence intervals are
$$\beta_k \pm z_{\alpha/2}\,\mathrm{SE}(\beta_k) \qquad (6.32)$$
where $z_{\alpha/2}$ is a quantile from the normal distribution. For a $100(1-\alpha)\%$ confidence interval with $\alpha = 0.05$, that is, a 95% interval, $z_{\alpha/2}$ has a value of 1.96. $\beta_k$ is the coefficient on the kth predictor, and $\mathrm{SE}(\beta_k)$ is its standard error.
Given the above formula, the 95% confidence limits for hmo, $\beta_{hmo} \pm z_{\alpha/2}\,\mathrm{SE}(\beta_{hmo})$, are
−0.0715493 − 1.96 ∗ 0.02394396 = −0.11847946
−0.0715493 + 1.96 ∗ 0.02394396 = −0.02461914
which agree, to rounding, with the values given in the model output above. Confidence intervals for the other predictors are determined in the same manner. We shall find, however, that the methods discussed here cannot be used to determine the standard error for exponentiated coefficients, i.e. incidence rate ratios. IRR standard errors and confidence intervals will be addressed in the next section.
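The interval can also be computed with the exact normal quantile rather than the rounded value 1.96. The following R lines are a minimal sketch, assuming only the hmo coefficient and standard error printed in the model output; the object names b_hmo, se_hmo, z, and ci are ours.

```r
b_hmo  <- -0.0715493            # hmo coefficient from the model output
se_hmo <-  0.023944             # hmo standard error from the model output

alpha <- 0.05                   # for a 95% confidence interval
z     <- qnorm(1 - alpha / 2)   # normal quantile z_{alpha/2}, about 1.96

# Lower and upper Wald confidence limits: beta -/+ z * SE
ci <- b_hmo + c(-1, 1) * z * se_hmo
round(ci, 5)
```

The limits should match the Stata interval for hmo to about six decimal places. Note that confint() applied to a glm object in R reports profile-likelihood limits, which is why the R limits in Table 6.8 differ slightly from the Wald limits computed here; confint.default() should return the Wald version.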
The R code for estimating the same Poisson model as above, together with relevant output, is given in Table 6.8.

Table 6.8 R: Poisson model on Medpar data
library(COUNT)    # medpar is distributed in the COUNT package
data(medpar)
attach(medpar)
poi <- glm(los ~ hmo + white + factor(type), family=poisson, data=medpar)
summary(poi)
confint(poi)
vcov(poi)    # Variance-Covariance matrix; not displayed below
---
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.33293    0.02721  85.744  < 2e-16 ***
hmo          -0.07155    0.02394  -2.988  0.00281 **
white        -0.15387    0.02741  -5.613 1.99e-08 ***
type 2        0.22165    0.02105  10.529  < 2e-16 ***
type 3        0.70948    0.02614  27.146  < 2e-16 ***

                                 2.5 %      97.5 %
(Intercept)                  2.2792058  2.38586658
hmo                         -0.1187331 -0.02486953
white                       -0.2072290 -0.09976522
factor(type)Urgent Admit     0.1802390  0.26276400
factor(type)Emergency Admit  0.6579525  0.76041000

The dependent or response variable of a Poisson model is a count. The natural log of the expected or fitted value, $\ln(\mu)$, is the canonical link function. Therefore, the Poisson model estimates the log of the expected count, given the values of the respective explanatory predictors. Coefficients represent the difference in the expected log-count of one level compared with another. For a binary predictor, we have $\ln(\mu_1) - \ln(\mu_0)$, which is the same as $\ln(\mu_1/\mu_0)$.
Table 6.9 provides an interpretation of the Poisson binary and factor coefficients from the above displayed models.
Prior to viewing Table 6.9, it should be recalled that coefficients of any regression model are slopes, and are to be interpreted like the slopes of any linear equation. One of the basic concepts learned in beginning courses in algebra is the meaning of the slope of a line to describe the rate of change of a function. Given the simple linear equation,
$$y = \frac{3}{5}x + 5 \qquad (6.33)$$
the slope is 3/5 and the intercept is 5, the value of y when x = 0. The slope can be interpreted as: for each 5-unit increase in x, y increases by 3. If the slope is negative, it would be interpreted as: for each 5-unit increase in x, y decreases by 3. If the slope is 3 rather than 3/5, a denominator of 1 is implied, so that the interpretation would be: for a unit increase in x, y increases by 3. In fact, it is this last interpretation that is the standard manner of interpreting regression coefficients. Converting the above slope, or coefficient, from 3/5 to 0.6, it would then be interpreted as: for every one-unit increase in x, y increases by 0.6.
The above linear model is that which is assumed for Gaussian or normal linear regression models. The linear predictor, $y|x$, or $x\beta$, is identical to the predicted fit, $E(y)$, $\hat{y}$, or $\mu$. For models that require a link function to establish a linear relationship between the fit and linear predictor, such as the log link for Poisson models, the change in y given a change in x is based on the link. If equation 6.33 is a Poisson model, the slope is interpreted as: for a one-unit change in x, the log of the expected value of y changes by 3/5.
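The role of the log link can be made concrete with a few lines of R. The sketch below is an illustration under stated assumptions: it uses the rounded coefficients from the Stata output earlier in this section, and it compares two hypothetical white patients admitted as elective who differ only in hmo. The object names are ours.

```r
# Coefficients copied from the model output in this section
b <- c(cons = 2.332933, hmo = -0.0715493, white = -0.153871,
       type2 = 0.2216518, type3 = 0.7094767)

# Linear predictor (log of expected los) for a white, elective-admission patient
eta_private <- b[["cons"]] + b[["white"]]                # hmo = 0
eta_hmo     <- b[["cons"]] + b[["white"]] + b[["hmo"]]   # hmo = 1

# On the log scale the two patients differ by exactly the hmo coefficient
eta_hmo - eta_private

# Back-transforming with exp() gives the expected counts in days
mu_private <- exp(eta_private)
mu_hmo     <- exp(eta_hmo)

# ln(mu1) - ln(mu0) is the same quantity as ln(mu1/mu0)
log(mu_hmo / mu_private)
```

Both displayed differences reproduce the hmo coefficient, which is the sense in which a Poisson coefficient is a slope on the log-count scale rather than on the count scale.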
Table 6.9 Interpretation of Poisson coefficients
hmo: The log of the expected length of stay is expected to be 0.07 log-days lower for HMO than for private pay patients, with the remaining predictor values held constant.
white: The log of the expected length of stay is expected to be 0.15 log-days lower for patients who identified themselves as white than for patients having another racial background, with the remaining predictor values held constant.
type2: The log of the expected length of stay is expected to be 0.22 log-days higher for patients admitted as urgent compared with patients admitted as elective, with the remaining predictor values held constant.
type3: The log of the expected length of stay is expected to be 0.71 log-days higher for patients admitted as emergency compared with patients admitted as elective, with the remaining predictor values held constant.
_cons: The log of the expected length of stay is 2.33 log-days when a patient is private pay, non-white, and admitted to the hospital as an elective patient. That is, hmo and white are both 0 in value and type1 has a value of 1.

Recall that for a binary predictor, a coefficient reflects the change in value of y given x = 1 compared with x = 0. However, if we need to know the coefficient on x = 0 compared with x = 1, we simply reverse the signs. Therefore, for the model we have been using as an example, we can model los on non-white private pay patients, as well as on type of admission as before, by constructing the following modification.
Table 6.10 R: Model with inversed binary predictors
private <- medpar$hmo==0
nonwhite <- medpar$white==0
poi0 <- glm(los ~ private + nonwhite + factor(type), family=poisson, data=medpar)
summary(poi0)
confint(poi0)

. gen byte private = hmo==0
. gen byte nonwhite = white==0
. glm los private nonwhite type2 type3, fam(poi) nohead
------------------------------------------------------------------------
         |                 OIM
     los |      Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
 private |   .0715493    .023944     2.99   0.003     .02462    .1184786
nonwhite |    .153871   .0274128     5.61   0.000    .100143    .2075991
   type2 |   .2216518   .0210519    10.53   0.000   .1803908    .2629127
   type3 |   .7094767    .026136    27.15   0.000   .6582512    .7607022
   _cons |   2.107513   .0222731    94.62   0.000   2.063858    2.151167
------------------------------------------------------------------------
The coefficient and standard error values of private and nonwhite are identical to those of hmo and white in the earlier model, except that the signs of the coefficients are reversed. The value of the intercept has also changed, from 2.33 to 2.11, because the reference group is now white HMO patients admitted as elective. This fact allows for an easy interpretation of both levels of a binary predictor.
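The new intercept can be checked by hand. Because private = 1 − hmo and nonwhite = 1 − white, substituting these into the linear predictor of the original model moves the hmo and white coefficients into the constant term. The R lines below are a small sketch of that arithmetic, using the rounded coefficients from the first model; the object names are ours.

```r
# Coefficients from the original model (hmo and white coded 1)
cons  <-  2.332933
hmo   <- -0.0715493
white <- -0.153871

# Reversing a 0/1 predictor flips the sign of its coefficient
private  <- -hmo
nonwhite <- -white

# The intercept moves to the new reference group: the original linear
# predictor evaluated at hmo = 1, white = 1, elective admission
cons_reversed <- cons + hmo + white

c(private = private, nonwhite = nonwhite, cons = cons_reversed)
```

The computed constant should agree with the _cons value of 2.107513 reported for the reversed model, up to rounding of the inputs.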
Continuous predictors have an interpretation similar to that of binary predictors, except that the predictor has many levels, or units, to relate rather than two.
The logic of the difference in log-count for continuous predictors is:
$$\beta_k = \ln(\mu_{k+1}) - \ln(\mu_k) \qquad (6.34)$$
where k is a given value of the continuous predictor and k + 1 is the next higher value.
Consider an example taken from the German health registry for the years 1984–8 (rwm5yr). A Poisson model can be constructed for the number of visits a patient makes to a doctor during the year (docvis) given their age (age), where age takes the values of 25 through 64. After a patient reaches 65 their visits are recorded in another registry. The logic of the coefficient on age is therefore:
$$\beta_{age} = \ln(\mu_{age+1}) - \ln(\mu_{age}) \qquad (6.35)$$
for any two contiguous ages in the data. $\beta_{age}$ is the coefficient on age.
The number of visits to the physician (docvis) is modeled on gender (female) and educational level. We wish to see if gender and education level have a bearing on the numbers of doctor visits during 1984–8.
Response  = docvis   : count from 0-121
Predictor = female   : (1=female; 0=male)
            edlevel1 : Not HS grad
            edlevel2 : HS grad
            edlevel3 : Univ/Coll
            edlevel4 : Grad school
panel id  = id       : individual
time      = year     : 1984-1988
DATA
       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1984 |      3,874       19.76       19.76
       1985 |      3,794       19.35       39.10
       1986 |      3,792       19.34       58.44
       1987 |      3,666       18.70       77.14
       1988 |      4,483       22.86      100.00
------------+-----------------------------------
      Total |     19,609      100.00

    edlevel |      Freq.     Percent        Cum.
------------+-----------------------------------
Not HS grad |     15,433       78.70       78.70   [Reference]
    HS grad |      1,153        5.88       84.58
  Coll/Univ |      1,733        8.84       93.42
Grad School |      1,290        6.58      100.00
------------+-----------------------------------
      Total |     19,609      100.00
Ignoring the fact that there are multiple observations of each patient based on the year that visits were recorded, we have the following Poisson model.
Table 6.11 R: German health data: 1984–8
detach(medpar)
data(rwm5yr)
attach(rwm5yr)
rwmpoi <- glm(docvis ~ age, family=poisson, data=rwm5yr)
summary(rwmpoi)
confint(rwmpoi)

. glm docvis age, fam(poi) nohead
------------------------------------------------------------------------
         |                 OIM
  docvis |      Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
     age |    .023908   .0003616    66.12   0.000   .0231993    .0246168
   _cons |   .0728969   .0173841     4.19   0.000   .0388246    .1069691
------------------------------------------------------------------------
The coefficient on age is 0.024, which can be interpreted as:
For a one-year increase in age, the log of the expected number of doctor visits increases by 0.024. That is, for each additional year of age, there is an increase in the expected log-count of doctor visits of 0.024. On the count scale this corresponds to a factor of exp(0.024) ≈ 1.024, or an increase of about 2.4% in expected visits.
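The constant difference between contiguous ages in equation 6.35 can be verified numerically. The following R sketch assumes only the rounded intercept and age coefficient from the output above; the function eta() and the other object names are ours, introduced for illustration.

```r
b0    <- 0.0728969   # intercept from the docvis-on-age output
b_age <- 0.023908    # coefficient on age

# Linear predictor, ln(mu), as a function of age
eta <- function(age) b0 + b_age * age

# Difference in log expected visits between contiguous ages:
# the same value wherever in the age range it is evaluated
eta(41) - eta(40)
eta(61) - eta(60)

# Ratio of expected counts for a one-year increase, and the
# corresponding percent change in expected visits
exp(b_age)
(exp(b_age) - 1) * 100
```

The first two lines of output reproduce the coefficient, and the last two show why a log-count difference of 0.024 is read as roughly a 2.4% increase in the expected count.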
The intercept for this model represents the value of the linear predictor, xβ, when age = 0, which is not in the data, and which makes little sense in the context of this study. Statisticians typically center a continuous predictor when a 0 value has little meaning. There are a number of ways that centering can be executed, but the most common is to subtract the mean value of the continuous predictor from the actual value of the predictor for a given observation. This is called mean-centering, or deviation scoring. The intercept of a model with a mean-centered predictor is the value of the linear predictor at the mean value of the predictor. Other centering values, such as the median, are occasionally used. Centering is most commonly employed for interactions in which multicollinearity is a problem. However, not all statisticians agree on the value of centering for reducing multicollinearity.
By centering, the correlation between a multiplicative term (interaction or polynomial) and its constituent variables is lessened. However, it is simply wise to use centered continuous predictors when interpretation is made more meaningful as a result. A mean-centered age may be calculated as shown in Table 6.12.
Table 6.12 R: Model with centered age
mage <- mean(age)
cage <- age - mage
mage
mean(cage)
crwm <- glm(docvis ~ cage, family=poisson)
summary(crwm)
confint(crwm)

. qui sum age, meanonly      // quietly summarize age
. gen mage = r(mean)         // mage = mean of age, 43.78566
. gen cage = age - mage      // cage = centered value of age
. mean age cage

Mean estimation                     Number of obs    =   19609
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         age |   43.78566   .0801855      43.62849    43.94283
        cage |  -1.05e-07   .0801855     -.1571705    .1571703
--------------------------------------------------------------
Comparing the mean values of age and centered age clearly indicates that the latter has a mean of essentially zero, as we would expect. Modeling the mean-centered age results in
. glm docvis cage, fam(poi) nohead
------------------------------------------------------------------------
         |                 OIM
  docvis |      Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
    cage |    .023908   .0003616    66.12   0.000   .0231993    .0246168
   _cons |   1.119726   .0041508   269.76   0.000   1.111591    1.127861
------------------------------------------------------------------------
Note the identical coefficient and standard error values for age and cage. Parameterizing age to its centered value has no bearing on the coefficient; however, the value of the intercept differs. Likewise, the coefficients of other non-centered predictors will stay the same as well. Only when there are interactions do the values of the main effects terms differ between centered and uncentered models. Note also that the predicted counts for models with uncentered predictors and with centered predictors are identical.
Interpretation of a centered predictor relates to the deviation of its value from the mean. For the example of age given above, the centered value of age can be interpreted as:
For a one-year increase from the mean of age, the log of the expected number of doctor visits increases by 0.024. That is, for each one-year increase from the mean of age, there is an increase in the expected log-count of doctor visits of 0.024, which corresponds to a factor of exp(0.024) ≈ 1.024, or about 2.4%, on the count scale.
In data with a number of continuous predictors, all of which need to be centered, the following Stata program may prove useful. Execution of the code will create and label centered variables automatically. Centered variables will have a leading c, followed by the name of the original variable. In this instance, the
three centered variables will have the names cweight, cheight, and cage. The code may easily be amended to name the variables in alternative ways.
. foreach var of varlist weight height age {
      summ `var', meanonly
      gen c`var' = `var' - r(mean)
      label var c`var' "centered `var'"
  }
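An R counterpart of the loop can be sketched as follows. The data frame dat and its three rows of values are hypothetical, invented only so that the code runs on its own; the column names mirror the Stata example, and the leading-c naming rule is the same.

```r
# Hypothetical data frame; values are made up for illustration
dat <- data.frame(weight = c(60, 72, 85),
                  height = c(160, 175, 182),
                  age    = c(30, 45, 52))

for (v in c("weight", "height", "age")) {
  # New column c<name> holds the mean-centered values of <name>
  dat[[paste0("c", v)]] <- dat[[v]] - mean(dat[[v]])
}

names(dat)
colMeans(dat[c("cweight", "cheight", "cage")])
```

Each centered column should have a mean of zero up to floating-point error. Unlike the Stata version, this sketch does not attach variable labels.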
These three predictors may be more easily centered by using the user-authored commands mcenter (Simmons, 2004) and center (Jann, 2007). Type:
. ssc install mcenter
or
. ssc install center
at the Stata command line to obtain the appropriate code.
To reiterate, recall that the value of the intercept is the value of the linear predictor when the values of all explanatory predictors are zero. When having a zero value for a continuous predictor makes little interpretative sense, then centering is the preferred method. The value of the mean-centered intercept is the value of the linear predictor at the mean value of the continuous predictor.
For example, 1.12 is the value of the linear predictor when age is at its mean, which is 43.79.
To see this, quietly run the non-centered model and calculate the linear predictor based on the mean value of age.
. qui glm docvis age, fam(poi) nohead      // quietly run noncentered model
. di _b[_cons] + _b[age]*43.78566
or
. di .0728969 + .023908*43.78566
1.1197245
The value is the same as the intercept of the mean-centered model.