

In document Negative Binomial Regression (Page 156-161)

Poisson regression

6.7 Parameterization as a rate model

6.7.2 Synthetic Poisson with offset

It may be helpful to construct a synthetic Poisson regression model with an offset. Recall from Table 6.1 that the offset is added to the calculation of the linear predictor but is subtracted from the working response just prior to the iterative estimation of parameters. In constructing a synthetic rate-parameterized Poisson model, however, the offset is only added to the calculation of the linear predictor.
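The estimation side of this description can be sketched in Python. What follows is a minimal IRLS (Fisher scoring) fit of a log-link Poisson model with an offset, using only the standard library. The helper names `solve`, `poisson_deviance`, and `poisson_irls`, the starting values, and the deviance-based stopping rule are illustrative assumptions, not the code of Table 6.1. The sketch shows where the offset enters: it is subtracted from the working response before each weighted least-squares step and added back into the linear predictor afterward.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poisson_deviance(y, mu):
    """Poisson deviance 2*sum[y*ln(y/mu) - (y - mu)], taking y*ln(y/mu) = 0 when y = 0."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))


def poisson_irls(X, y, offset, tol=1e-10, max_iter=50):
    """Hypothetical helper: log-link Poisson GLM with an offset, fitted by IRLS.

    X is a list of rows that already contain a leading 1.0 for the intercept;
    offset is the log exposure, e.g. ln(off).
    """
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    mu = [(yi + ybar) / 2.0 for yi in y]  # starting fitted values
    eta = [math.log(mi) for mi in mu]
    dev_old = poisson_deviance(y, mu)
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response; the offset is subtracted before the WLS step.
        z = [eta[i] + (y[i] - mu[i]) / mu[i] - offset[i] for i in range(n)]
        # Poisson variance with a log link gives IRLS weights w_i = mu_i.
        xtwx = [[sum(mu[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(mu[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwz)
        # Linear predictor; the offset is added back in.
        eta = [sum(X[i][j] * beta[j] for j in range(p)) + offset[i] for i in range(n)]
        mu = [math.exp(e) for e in eta]
        dev = poisson_deviance(y, mu)
        if abs(dev_old - dev) < tol:
            break
        dev_old = dev
    return beta


if __name__ == "__main__":
    # Intercept-only check: the MLE is ln(sum(y) / sum(exposure)) = ln(20 / 8).
    y, t = [3, 5, 10, 2], [1, 2, 4, 1]
    b = poisson_irls([[1.0]] * 4, y, [math.log(ti) for ti in t])
    print(b[0], math.log(2.5))
```

The intercept-only check is a useful sanity test. With only a constant in the model, the fitted rate must equal total counts divided by total exposure.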

A synthetic offset may be randomly generated or specified by the user. For this example I will create an area offset whose value increases by 100 with each successive block of 10,000 observations in the 50,000-observation data set, giving offsets of 100, 200, 300, 400, and 500. The shortcut code used to create this variable is the line marked "creation of offset" below. The commented lines that follow it generate the same offset as that single-line command; this extended code shows more explicitly what is being done.
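To confirm that the one-line shortcut and the five commented `gen`/`replace` lines produce the same variable, here is a small Python mirror of both. The function names are hypothetical. Stata's `int()` truncates toward zero, which matches Python's floor division `//` for the non-negative arguments used here, and observation numbers are 1-based like Stata's `_n`.

```python
from collections import Counter


def shortcut_offset(obs):
    """Mirror of Stata's gen off = 100+100*int((_n-1)/10000); obs is 1-based like _n."""
    return 100 + 100 * ((obs - 1) // 10000)


def block_offset(obs):
    """Mirror of the commented gen/replace ... in a/b lines."""
    for last_obs, value in [(10000, 100), (20000, 200), (30000, 300),
                            (40000, 400), (50000, 500)]:
        if obs <= last_obs:
            return value
    raise ValueError("observation number outside 1..50000")


off = [shortcut_offset(obs) for obs in range(1, 50001)]
assert off == [block_offset(obs) for obs in range(1, 50001)]
print(sorted(Counter(off).items()))  # each offset value occurs 10,000 times
```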

We expect the resultant model to have approximately the same parameter values as the earlier model, but different standard errors. Modeling the data without the offset option produces parameter estimates similar to those obtained when an offset is employed, but with an inflated intercept. The poisson_syn function in the R COUNT package can also be used to generate such data.

Table 6.19 Synthetic rate Poisson data

* poio_rng.do
clear
set obs 50000
set seed 4744
gen off = 100+100*int((_n-1)/10000)   // creation of offset
* gen off=100 in 1/10000              // lines duplicate single line above
* replace off=200 in 10001/20000
* replace off=300 in 20001/30000
* replace off=400 in 30001/40000
* replace off=500 in 40001/50000
gen loff = ln(off)                    // log offset prior to model entry
gen x1 = rnormal()
gen x2 = rnormal()
gen py = rpoisson(exp(2 + 0.75*x1 - 1.25*x2 + loff))
glm py x1 x2, fam(poi) off(loff)      // added offset option
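The `rpoisson()` line adds `loff` directly to the linear predictor. Because exp(xb + ln(off)) = off * exp(xb), this multiplies the expected count by the offset and leaves the expected count per unit of `off` unchanged. The Python sketch below makes that visible. The predictor values 0.5 and -0.2 are arbitrary illustrations, not from the book.

```python
import math

B0, B1, B2 = 2.0, 0.75, -1.25  # coefficients used in Table 6.19


def poisson_mean(x1, x2, off):
    """Mean passed to rpoisson(): exp(xb + ln(off))."""
    return math.exp(B0 + B1 * x1 + B2 * x2 + math.log(off))


for off in (100, 200, 300, 400, 500):
    mu = poisson_mean(0.5, -0.2, off)
    # exp(xb + ln(off)) == off * exp(xb), so mu / off is the same for every block
    print(off, round(mu, 2), round(mu / off, 4))
```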

The results of the rate-parameterized Poisson algorithm given in Table 6.19 (the R version is given in Table 6.20) are displayed below:

Generalized linear models                  No. of obs      =      50000
Optimization     : ML                      Residual df     =      49997
                                           Scale parameter =          1
Deviance         =  49847.73593            (1/df) Deviance =   .9970145
Pearson          =  49835.24046            (1/df) Pearson  =   .9967646
                                           AIC             =   10.39765
Log likelihood   = -259938.1809            BIC             =  -491108.7
------------------------------------------------------------------------------
             |                 OIM
          py |      Coef.   Std. Err.        z    P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .7500656    .0000562  13346.71   0.000   .7499555    .7501758
          x2 |  -1.250067    .0000576  -2.2e+04   0.000   -1.25018   -1.249954
       _cons |   1.999832    .0001009  19827.16   0.000   1.999635     2.00003
        loff |   (offset)
------------------------------------------------------------------------------
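The AIC and BIC in this header can be recomputed from other reported quantities. The values shown are consistent with a per-observation AIC of (-2 LL + 2k)/n and a deviance-based BIC of D - df * ln(n). Here k counts the estimated coefficients (x1, x2, and _cons, so 3); the offset is not counted. The sketch below assumes those definitions. The function names are hypothetical.

```python
import math


def glm_aic(loglik, k, n):
    """Per-observation AIC: (-2*LL + 2*k) / n."""
    return (-2.0 * loglik + 2.0 * k) / n


def glm_bic(deviance, resid_df, n):
    """Deviance-based BIC: D - residual df * ln(n)."""
    return deviance - resid_df * math.log(n)


print(round(glm_aic(-259938.1809, 3, 50000), 5))    # 10.39765
print(round(glm_bic(49847.73593, 49997, 50000), 1))  # -491108.7
```

The same two formulas also reproduce the AIC and BIC of the FASTRAK model later in this section.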

Similar code logic is used in R.

Table 6.20 R: Poisson with rate parameterization using R

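library(COUNT)   # the COUNT package provides poisson_syn()
oset <- rep(1:5, each=10000, times=1)*100
loff <- log(oset)
sim.data <- poisson_syn(nobs = 50000, off = loff, xv = c(2, .75, -1.25))
# "+ loff" enters the log offset as a covariate whose coefficient is estimated
# (expected to be near 1); "+ offset(loff)" would instead fix it at exactly 1
poir <- glm(py ~ . + loff, family=poisson, data = sim.data)
summary(poir)
confint(poir)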

Relevant model R output is displayed as:

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.0002055  0.0002715    7367   <2e-16 ***
x1           0.7501224  0.0003612    2077   <2e-16 ***
x2          -1.2507453  0.0003705   -3375   <2e-16 ***

A display of the offset table is shown as:

. table(off)
off
  100   200   300   400   500
10000 10000 10000 10000 10000

The resultant parameter estimates are nearly identical to the specified values.

We expect the parameter estimates of the rate-parameterized model (the model with an offset) to closely approximate those of the standard model.

However, we also expect the standard errors to differ. Moreover, if the rate data were modeled without declaring an offset, the intercept would be greatly inflated.
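A back-of-envelope calculation shows roughly how large that inflation should be for the data of Table 6.19. By construction there, the offset is independent of x1 and x2. So E[py | x1, x2] = mean(off) * exp(2 + 0.75 x1 - 1.25 x2). This still has a log-linear form, so a Poisson fit without the offset should target the same slopes but an intercept of 2 + ln(mean(off)). The sketch below assumes that independence and equal block sizes. The figure is a large-sample target, not an estimate reported in the book.

```python
import math

offsets = [100, 200, 300, 400, 500]  # equal-sized blocks, as in Table 6.19
b0 = 2.0                             # intercept used to generate py

# Without the offset, the intercept absorbs ln(mean(off)); slopes are unaffected.
implied_intercept = b0 + math.log(sum(offsets) / len(offsets))
print(round(implied_intercept, 3))  # 7.704
```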

6.7.3 Example

An example of a Poisson model parameterized with an offset is provided below. The data are from the Canadian National Cardiovascular Disease registry called FASTRAK. They have been grouped by covariate patterns from individual observations and stored in a file called fasttrakg. The response is die, which is a count of the number of deaths of patients having a specific pattern of predictors. The predictors are anterior, which indicates whether the patient has had a previous anterior myocardial infarction as opposed to an inferior site infarct; hcabg, whether the patient has a history of having had a CABG (coronary artery bypass graft) procedure as opposed to a PTCA (percutaneous transluminal coronary angioplasty); and Killip class, a summary indicator of the cardiovascular health of the patient, with increasing values indicating increased disability. Killip class enters the model as the indicators kk2, kk3, and kk4, with level 1 as the reference. The number of observations sharing the same pattern of covariates is recorded in the variable cases. This value is log-transformed and entered into the model as an offset.

Table 6.21 R: Poisson model of Fasttrak data

rm(list=ls())
library(COUNT)   # the COUNT package supplies the fasttrakg data
data(fasttrakg)
attach(fasttrakg)
lncases <- log(cases)
poioff <- glm(die ~ anterior + hcabg + kk2 + kk3 + kk4 + offset(lncases),
              family=poisson, data=fasttrakg)
exp(coef(poioff))
exp(confint(poioff))

. glm die anterior hcabg kk2-kk4, fam(poi) eform lnoffset(cases)

Generalized linear models                  No. of obs      =         15
Optimization     : ML                      Residual df     =          9
                                           Scale parameter =          1
Deviance         =  10.93195914            (1/df) Deviance =   1.214662
Pearson          =  12.60791065            (1/df) Pearson  =   1.400879
. . .
                                           AIC             =    4.93278
Log likelihood   = -30.99584752            BIC             =  -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359      2.6848
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
         kk2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
         kk3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
         kk4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       cases |  (exposure)
------------------------------------------------------------------------------
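The IRR columns are tied to the underlying log-scale coefficients. The reported numbers are consistent with a delta-method standard error, SE(IRR) = IRR * SE(b), and with a confidence interval built on the log scale and then exponentiated. The sketch below recovers each interval from the IRR and its standard error under that assumption. The function name `irr_ci` is hypothetical. Small discrepancies in the last digit come from the rounding of the displayed inputs.

```python
import math


def irr_ci(irr, se_irr, z=1.959964):
    """95% CI for an incidence rate ratio from its IRR and delta-method SE.

    b = ln(IRR) and SE(b) = SE(IRR) / IRR; the interval is exp(b -/+ z * SE(b)).
    """
    b = math.log(irr)
    se_b = se_irr / irr
    return math.exp(b - z * se_b), math.exp(b + z * se_b)


lo, hi = irr_ci(1.963766, 0.3133595)  # anterior
print(round(lo, 4), round(hi, 4))
```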

The Pearson dispersion statistic has a relatively low value of 1.40. However, given a total observation base of 5,388 patients, the 40% excess dispersion may indicate a lack of model fit. We shall delay this discussion until the next chapter, where we deal specifically with models for overdispersed data.
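The two dispersion statistics in the header are the Pearson chi-square and the deviance, each divided by the residual degrees of freedom. For this model there are 15 covariate patterns and 6 estimated parameters (five predictors plus the intercept), so the residual df is 9. The arithmetic below simply reproduces the reported values.

```python
pearson_chi2 = 12.60791065
deviance = 10.93195914
resid_df = 15 - 6  # covariate patterns minus estimated parameters

pearson_dispersion = pearson_chi2 / resid_df
deviance_dispersion = deviance / resid_df
print(round(pearson_dispersion, 6), round(deviance_dispersion, 6))  # 1.400879 1.214662
```

A Pearson dispersion of exactly 1 would indicate equidispersion. The excess over 1 (about 0.40 here) is the "40%" referred to above.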

Table 6.22 Interpretation of rate parameterized Poisson

anterior: A patient is about twice as likely to die within 48 hours of admission if they have sustained an anterior rather than an inferior infarct, holding the other predictor values constant.

hcabg: A patient is about twice as likely to die within 48 hours of admission if they have a previous history of having had a CABG rather than a PTCA, holding the other predictor values constant.

kk2: A patient rated as Killip level 2 is some two and a half times more likely to die within 48 hours of admission than is a patient rated at Killip level 1, holding the other predictor values constant.

kk3: A patient rated as Killip level 3 is some three times more likely to die within 48 hours of admission than is a patient rated at Killip level 1, holding the other predictor values constant.

kk4: A patient rated as Killip level 4 is over 12 times more likely to die within 48 hours of admission than is a patient rated at Killip level 1, holding the other predictor values constant.
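Because the offset is ln(cases), each IRR multiplies the expected number of deaths per case, which is the death rate within a covariate pattern. The shorthand "twice as likely" for anterior therefore corresponds to a death rate roughly 96% higher. The sketch below converts the reported IRRs into percent changes in that rate. The dictionary and function name are illustrative; the values are copied from the output above.

```python
# IRRs copied from the Stata output for the FASTRAK model
IRRS = {"anterior": 1.963766, "hcabg": 1.937465,
        "kk2": 2.464633, "kk3": 3.044349, "kk4": 12.33746}


def percent_change(irr):
    """Percent change in the expected death rate for a one-unit change in the predictor."""
    return (irr - 1.0) * 100.0


for name, irr in IRRS.items():
    print(f"{name:9s} rate x {irr:5.2f}  ({percent_change(irr):+.0f}%)")
```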

Summary

The Poisson model is the traditional basic count response model. We discussed the derivation of the model and how the basic Poisson algorithm can be amended to allow estimation of rate models, that is, estimation of counts per defined area or per time period. The rate parameterization of the Poisson model is also appropriate for modeling counts that are weighted by person-years.

A central distributional assumption of the Poisson model is the equality of the Poisson mean and variance. This assumption is rarely met with real data. Usually the variance exceeds the mean, resulting in what is termed overdispersion. Underdispersion, in which the variance is less than the nominal mean, also occurs, but rarely in practice. Overdispersion is, in fact, the norm, and it gives rise to a variety of other models that are extensions of the basic Poisson model.

Negative binomial regression is nearly always thought of as the model that is to be used instead of Poisson when overdispersion is present in the data. Because overdispersion is so central to the modeling of counts, we next address it, and investigate how we determine if it is real or only apparent.

Overdispersion

This chapter can be considered a continuation of the previous one. Few real-life Poisson data sets are truly equidispersed. Overdispersion to some degree is inherent to the vast majority of Poisson data. Thus, the real question concerns the amount of overdispersion in a particular model: is it sufficient to require a model other than Poisson? This is the foremost question we address in this chapter, together with how we differentiate between real and apparent overdispersion.
