
Incidence rate ratio parameterization

In document Negative Binomial Regression (Page 129-136)

Poisson regression

6.3 Example: Poisson model

6.3.2 Incidence rate ratio parameterization

These three predictors may be more easily centered by using the user-authored commands mcenter (Simmons, 2004) and center (Jann, 2007). Type:

. ssc install mcenter

or

. ssc install center

at the Stata command line to obtain the appropriate code.

To reiterate, recall that the value of the intercept is the value of the linear predictor when the values of all explanatory predictors are zero. When having a zero value for a continuous predictor makes little interpretative sense, then centering is the preferred method. The value of the mean-centered intercept is the value of the linear predictor at the mean value of the continuous predictor.

For example, 1.12 is the linear predictor value of age at its mean, which is 43.79.

To see this, quietly run the non-centered model and calculate the linear predictor based on the mean value of age.

. qui glm docvis age, fam(poi) nohead    /// quietly run noncentered model
. di _b[_cons] + _b[age]*43.78566

or

. di .0728969 + .023908*43.78566
1.1197245

The value is the same as the intercept of the mean-centered model.
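The equality described above can be checked numerically. The following is a minimal R sketch that uses simulated data rather than the rwm5yr data; the variable names, sample size, and simulation parameters are all hypothetical and chosen only for illustration.

```r
# Sketch: the intercept of a mean-centered Poisson model equals the
# linear predictor of the non-centered model evaluated at the mean.
# All data here are simulated (assumption), not the book's data.
set.seed(1)
n   <- 500
age <- round(runif(n, 25, 64))             # hypothetical ages
y   <- rpois(n, exp(0.07 + 0.024 * age))   # hypothetical visit counts

fit_raw <- glm(y ~ age, family = poisson)  # non-centered model
age_c   <- age - mean(age)                 # mean-centered predictor
fit_ctr <- glm(y ~ age_c, family = poisson)

# Linear predictor of the non-centered model at the mean of age
lp_at_mean <- unname(coef(fit_raw)["(Intercept)"] +
                     coef(fit_raw)["age"] * mean(age))
int_ctr    <- unname(coef(fit_ctr)["(Intercept)"])

print(c(lp_at_mean = lp_at_mean, centered_intercept = int_ctr))
```

The two printed values should agree up to the convergence tolerance of the fitting algorithm, and the slope is unaffected by centering.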

6.3.2 Incidence rate ratio parameterization

Log-counts are a bit difficult to handle for practical situations, therefore most statisticians prefer working with rate ratios, as described in Chapter 2. We shall use the medpar data set for examples in this section.

In the parameterization that follows, the coefficients are exponentiated to assess the relationship between the response and predictors as incidence rate ratios (IRR). For a model with a single predictor, the IRR is identical to the value we can calculate by hand using the methods described in Chapter 2. A factored predictor, like the levels of type below, would also have the same values that we can calculate using the methods of Chapter 2, given that it is the only predictor in the model. For models with two or more predictors, the predictors act as adjusters for each other. For example, in the model above, white and the levels of type are kept at a constant value as adjusters for hmo; hmo and the levels of type are adjusters for white, and so forth. As adjusters, their presence alters the value of the predictor of interest, particularly if the adjusters are significant predictors. That is, statistically significant adjusters, or confounders, change the value of the predictor of interest from what it would be if no adjusters were present in the model. A statistical model is required to determine its adjusted value. Each significant predictor in the model may statistically be regarded as a predictor of interest. Correspondingly, dropping a highly non-significant predictor from a model only slightly changes the values of the coefficients of the remaining predictors.
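The single-predictor case can be illustrated with a short R sketch. It uses simulated data (an assumption, not the medpar data) and compares the exponentiated coefficient of a binary predictor with the ratio of the two group means computed by hand.

```r
# Sketch: with one binary predictor, the Poisson IRR equals the ratio
# of the observed group means. Data are simulated for illustration.
set.seed(2)
x <- rbinom(400, 1, 0.4)               # hypothetical binary predictor
y <- rpois(400, exp(1.0 + 0.3 * x))    # hypothetical counts

fit <- glm(y ~ x, family = poisson)

irr_model <- unname(exp(coef(fit)["x"]))         # IRR from the model
irr_hand  <- mean(y[x == 1]) / mean(y[x == 0])   # ratio of group means

print(c(model = irr_model, by_hand = irr_hand))
```

Once a second predictor is added, the model-based IRR is adjusted for it and will generally no longer match the simple ratio of means.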

The interpretation of a Poisson coefficient differs from its interpretation when it is parameterized as a rate ratio. A table of incidence rate ratios and their corresponding standard errors and confidence intervals for our first model can be constructed using the following code:

rm(list=ls())
library(COUNT)    # assumed source of the medpar data set
data(medpar)
attach(medpar)

poi <- glm(los ~ hmo+white+type2+type3, family=poisson, data=medpar)

. glm los hmo white type2 type3, fam(poi) nohead eform

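------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
       type2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
       type3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
------------------------------------------------------------------------------

Using R, the rate ratios may be calculated from the previous R model fit by exponentiating the coefficients. Recall that poi was the name given the R model.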

exp(coef(poi))

(Intercept)         hmo       white       type2       type3
 10.3081316   0.9309504   0.8573826   1.2481366   2.0329271

Note that the intercept is also exponentiated. Most software applications do not display this value since it is not a rate ratio. The intercept is not compared with any other level or value, and may therefore be ignored. The following command may be used to obtain the confidence intervals of the rate ratios:

exp(confint(poi))

                2.5 %     97.5 %
(Intercept) 9.7689185 10.8684770
hmo         0.8880448  0.9754372
white       0.8128335  0.9050499
type2       1.1975035  1.3005198
type3       1.9308350  2.1391531
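A Stata-style IRR table can also be assembled in R from any fitted Poisson model. The sketch below is one possible way to do so; the helper name irr_table and the simulated data are hypothetical. It uses confint.default(), which gives Wald limits, whereas confint() on a glm fit is based on profile likelihood, which would account for small differences between the R and Stata interval limits.

```r
# Sketch: build a table of IRRs, delta-method standard errors, z, p,
# and exponentiated Wald confidence limits from a Poisson glm fit.
# irr_table is a hypothetical helper, not part of any package.
irr_table <- function(fit) {
  b  <- coef(fit)
  se <- sqrt(diag(vcov(fit)))          # coefficient standard errors
  ci <- exp(confint.default(fit))      # Wald limits, then exponentiated
  cbind(IRR         = exp(b),
        "Std. Err." = exp(b) * se,     # delta-method SE of the IRR
        z           = b / se,
        "P>|z|"     = 2 * pnorm(-abs(b / se)),
        ci)
}

# Simulated data for illustration only
set.seed(3)
x1 <- rbinom(300, 1, 0.5)
x2 <- rbinom(300, 1, 0.3)
y  <- rpois(300, exp(2 - 0.07 * x1 + 0.2 * x2))
fit <- glm(y ~ x1 + x2, family = poisson)

tab <- irr_table(fit)
print(round(tab, 4))
```

The z and p values are those of the coefficients themselves, as in the Stata output; only the point estimate, standard error, and interval limits are moved to the ratio scale.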

The Stata and R values are similar, as would be expected. The interpretation of the rate ratios is much more intuitive than the interpretation of model coefficients; however, rate ratios are nevertheless based on the coefficient notion of the difference in the log of expected counts. Recalling simple algebra,

$$\beta_k = \ln(\mu_1) - \ln(\mu_0) = \ln\left(\frac{\mu_1}{\mu_0}\right) \tag{6.36}$$

for binary predictors, and

$$\beta_k = \ln(\mu_{k+1}) - \ln(\mu_k) = \ln\left(\frac{\mu_{k+1}}{\mu_k}\right) \tag{6.37}$$

for continuous predictors.

Note that the relationship of the two values of the predictor is in ratio form.

Exponentiating the rightmost term in the two equations above gives us the rate ratio as understood in Chapter 2:

BINARY IRR

$$\exp(\beta_k) = \frac{\mu_1}{\mu_0} \tag{6.38}$$

CONTINUOUS IRR

$$\exp(\beta_k) = \frac{\mu_{k+1}}{\mu_k} \tag{6.39}$$

Although we shall discuss this notion in more detail in Section 6.7, when counts are collected over a specified time period or area, we call the counts a rate. Hence the name rate ratio. The incidence rate is the rate at which counts enter a time period or area. Hence the term incidence rate ratio, which is also simply a variety of risk ratio. Different names are given to the same process by various authors, but they mean the same. The area or time period may be specified as one, in which case the model is estimated without an offset. Again, we address this notion in more detail in Section 6.7.

Table 6.13 Frequency interpretation of Poisson incidence rate ratios

hmo: HMO patients are expected to stay in the hospital 7% fewer days than are private pay patients, with the remaining predictor values held constant. (0.93 − 1)

white: White patients are expected to stay in the hospital 14% fewer days than are non-white patients, with the remaining predictor values held constant. (0.857 − 1)

type2: Urgent patients are expected to stay in the hospital 25% more days than are elective patients, with the remaining predictor values held constant. (1.248 − 1)

type3: Emergency patients are expected to stay some 103% more days in the hospital than are elective patients, with the remaining predictor values held constant. (2.03 − 1)

In the case of our first example, counts are days stayed in the hospital; in the second, counts are numbers of visits to the doctor each year. The interpretation of the incidence rate ratios of the first model is given in Tables 6.13 and 6.14. Both frequency and expected count interpretations are given since both are commonly found in statistical literature. Which interpretation is used for a particular study depends on the nature of the study report, as well as on personal preference. Neither makes a statistical difference, although I tend to prefer the frequency interpretation.
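The percentages used in the frequency interpretation follow directly from the rate ratios as 100 × (IRR − 1). A brief R sketch, using the IRR values reported in the model output for this example:

```r
# Sketch: convert the reported incidence rate ratios to percent change.
irr <- c(hmo = 0.9309504, white = 0.8573826,
         type2 = 1.2481366, type3 = 2.0329271)

pct_change <- 100 * (irr - 1)   # negative = fewer days, positive = more
print(round(pct_change, 1))
```

Rounded to whole percentages, these values correspond to the 7% and 14% decreases and the 25% and 103% increases given in the frequency interpretation.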

Recall that we earlier discussed how the regression coefficient changes when the inverse of a binary predictor is estimated, compared with a model with x = 1. When, for example, we wish to estimate the coefficient of private pay (hmo = 0) rather than hmo (hmo = 1) using the present working example, we need only reverse the sign of the coefficient of hmo. However, since all exponentiated coefficients are positive, this same relationship does not obtain for values of IRR. Instead, the IRR for the inverse of a binary predictor is obtained by inverting the value of the IRR.

The IRR values of hmo and white in the example model are 0.9309504 and 0.8573826 respectively. Inverting these IRR values yields,

. di 1/exp(_b[hmo])    /// or 1/.9309504
1.0741711

and

. di 1/exp(_b[white])    /// or 1/.8573826
1.1663405

The model of los on private, nonwhite, and type of admission, with exponentiated coefficients, can be given as

R

private <- 1 - hmo
nonwhite <- 1 - white

poi0 <- glm(los ~ private+nonwhite+type2+type3, family=poisson, data=medpar)

exp(coef(poi0))
exp(confint(poi0))

. glm los private nonwhite type2 type3, fam(poi) nohead eform

------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     private |   1.074171   .0257199     2.99   0.003     1.024926    1.125783
    nonwhite |    1.16634   .0319726     5.61   0.000     1.105329     1.23072
       type2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
       type3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
------------------------------------------------------------------------------

which verifies the inverse relationship of exponentiated coefficients of binary predictors. As a rough check, when an IRR is close to 1.0, the amount by which the IRR of x = 1 falls below (or exceeds) 1.0 is approximately the amount by which the IRR of x = 0 exceeds (or falls below) 1.0.

For the present example, the IRR of hmo is 0.931, which is approximately 0.07 less than 1.0. Adding the same value of 0.07 to 1.0 produces 1.07, which is approximately the IRR value of private. Likewise for white and non-white (0.857 and 1.166), where the IRR values differ from 1.0 by roughly 0.14 and 0.17 respectively. The shortcut becomes less accurate as the IRR moves away from 1.0; the exact relationship, which holds for all binary predictors, is the reciprocal.
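To compare the exact reciprocal with the shortcut of adding the same distance from 1.0, the following R sketch uses the two IRR values reported for hmo and white. The object names are illustrative only.

```r
# Sketch: exact inverse IRR versus the "same distance from 1" shortcut.
irr <- c(hmo = 0.9309504, white = 0.8573826)

exact    <- 1 / irr         # IRR of the reversed coding (private, nonwhite)
shortcut <- 1 + (1 - irr)   # add the distance below 1.0 back onto 1.0

print(round(rbind(exact, shortcut), 4))
```

The reciprocal reproduces the private and nonwhite values in the model output, while the shortcut drifts further from them as the IRR moves away from 1.0.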

In order to examine the interpretation of a continuous predictor, we turn to the German health data example in rwm5yr, for which we earlier modeled the number of doctor visits on age, a continuous predictor. The incidence rate ratio of age may be interpreted by exponentiating its Poisson coefficient of 0.023908.

. di exp(.023908)
1.0241961

For a one-year increase in age, the rate of visits to the doctor increases by 2.4%, with the remaining predictor values held constant.

Table 6.14 Expected count interpretation of Poisson incidence rate ratios

hmo: HMO patients have an expected decrease in the number of days in the hospital by a factor of 0.93 compared with private pay patients, with the remaining predictor values held constant.

white: White patients have an expected decrease in the number of days in the hospital by a factor of 0.86 compared with non-white patients, with the remaining predictor values held constant.

type2: Urgent patients have an expected increase in the number of days in the hospital by a factor of 1.25 compared with elective patients, with the remaining predictor values held constant.

type3: Emergency patients have an expected increase in the number of days in the hospital by a factor of 2 compared with elective patients, with the remaining predictor values held constant. OR Emergency patients are expected to stay in the hospital twice as long as elective patients, other predictors being constant.

The rate ratio for a 10-year increase in age is obtained by raising the incidence rate ratio 1.024 to the power of 10.

. di 1.0241961^10
1.2700803

This same value may be directly calculated from the coefficient as:

. di exp(.023908*10)
1.2700801

Any non-incremental increase or decrease in the value of a continuous predictor uses the same logic for obtaining rate ratios.
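The same logic can be written as a small R helper. This is a sketch; the function name irr_change is hypothetical, and the coefficient is the value of age reported in the text.

```r
# Sketch: rate ratio for an arbitrary change (delta) in a continuous
# predictor, computed from its Poisson coefficient.
irr_change <- function(beta, delta) exp(beta * delta)

b_age <- 0.023908              # Poisson coefficient of age from the text

print(irr_change(b_age, 1))    # one-year increase
print(irr_change(b_age, 10))   # ten-year increase
print(irr_change(b_age, -5))   # five-year decrease

# Equivalent to raising the one-unit IRR to the power delta
print(exp(b_age)^10)
```

A negative delta gives the rate ratio for a decrease, which is the reciprocal of the ratio for the corresponding increase.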

The expected count interpretation of exponentiated Poisson coefficients, or incidence rate ratios, is given in Table 6.14.

The expected count interpretation of the incidence or risk ratio of age in predicting the number of doctor visits in the German health registry can be given as:

For a one-year increase in age, the expected number of visits to a doctor increases by a factor of 1.024 for a given year, with the other predictor values held constant. That is, for each additional year of age, patients are expected to have 2.4% more visits to a doctor, other predictor values being constant.

On the surface, the Medicare model appears acceptable. However, the Pearson dispersion value, displayed above, is 6.26, far exceeding unity. Reiterating, the dispersion is defined as the ratio of the Pearson statistic to the residual degrees of freedom, that is, the number of observations less the number of estimated parameters. In this case we have 9327.983/1490 = 6.260. Such a value for a model consisting of some 1,500 observations is clearly excessive.

The dispersion statistic in the Medicare model output has been indicated with a “<” to the immediate right of the statistic. We will discover, however, that even if the dispersion statistic exceeds unity, the model may nevertheless be only apparently overdispersed, not overdispersed in fact. In this case, though, the model is indeed overdispersed. Lagrange multiplier and Z test results indicate overdispersion (not shown). We discuss overdispersion, its varieties, and additional tests in the next chapter.
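The Pearson dispersion statistic can be computed from any fitted glm object in R. The following is a minimal sketch; the helper name pearson_dispersion is hypothetical, and the fitted example uses simulated overdispersed counts rather than the medpar data.

```r
# Sketch: Pearson dispersion = Pearson chi-square / residual df.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Value reported in the text for the Medicare model
reported <- 9327.983 / 1490
print(round(reported, 3))

# Simulated example (assumption): negative binomial counts fitted
# with a Poisson model, which should show dispersion above 1
set.seed(4)
x <- rbinom(500, 1, 0.5)
y <- rnbinom(500, mu = exp(1 + 0.5 * x), size = 1)
fit <- glm(y ~ x, family = poisson)

disp <- pearson_dispersion(fit)
print(disp)
```

A value near 1 is what an adequately fitting Poisson model would produce; values well above 1, as in the simulated example, point to possible overdispersion.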

The standard errors of incidence rate ratios are not found in the variance-covariance matrix as they are for model coefficients. Rather, IRR standard errors are determined using the delta method. The method specifies that the standard error is calculated using the following formula:

$$SE_{\text{delta}} = \exp(\beta) \times SE(\beta) \tag{6.40}$$
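The formula can be motivated by the usual first-order form of the delta method. The following is a brief sketch of that reasoning, where g is notation introduced here for the transformation applied to the coefficient:

$$\operatorname{Var}\{g(\hat\beta)\} \approx \left[g'(\hat\beta)\right]^2 \operatorname{Var}(\hat\beta)$$

With $g(\beta) = \exp(\beta)$, the derivative is $g'(\beta) = \exp(\beta)$, so that

$$SE\{\exp(\hat\beta)\} \approx \exp(\hat\beta) \times SE(\hat\beta)$$

which agrees with equation 6.40.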

Employing the formula for the IRR of predictor hmo, we have

IRR of hmo

. di exp(-.0715493)
.93095038

SE of hmo IRR

. di exp(-.0715493) * .02394396
.02229064

which are identical to the values displayed in the model output above.

What of the IRR confidence intervals? The values displayed in the above model output are 0.8882708 and 0.9756806. If we apply the same formula for determining the confidence intervals we used for model coefficients, we have:

. di .93095038 - 1.96*.02229064
.88726073

and

. di .93095038 + 1.96*.02229064
.97464003

The values are close, and some software packages use this method. However, it is incorrect. It is generally considered preferable to simply exponentiate the coefficient confidence intervals. Doing so produces the following:

. di exp(-.1184786)
.88827082

. di exp(-.02462)
.9756806

Both values are identical to the model output. This same method is used for all exponentiated coefficient confidence intervals, including logistic regression odds ratios. The delta method for calculating standard errors is used in a variety of contexts – marginal effects standard errors being one.
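The calculations in this section can be collected into a short R sketch. It uses the coefficient and standard error of hmo quoted in the text; the object names are illustrative, and qnorm(0.975) is used in place of the rounded value 1.96.

```r
# Sketch: delta-method SE and two ways of forming a 95% interval
# for the IRR of hmo, using values quoted in the text.
b  <- -0.0715493      # Poisson coefficient of hmo
se <- 0.02394396      # its standard error
z  <- qnorm(0.975)    # about 1.96

irr    <- exp(b)
se_irr <- irr * se    # delta-method standard error of the IRR

# Symmetric interval around the IRR (the method described as incorrect)
ci_symmetric <- irr + c(-1, 1) * z * se_irr

# Preferred: exponentiate the limits of the coefficient interval
ci_exponentiated <- exp(b + c(-1, 1) * z * se)

print(c(IRR = irr, SE = se_irr))
print(rbind(symmetric = ci_symmetric, exponentiated = ci_exponentiated))
```

The exponentiated limits cannot fall below zero and are asymmetric about the IRR, which suits a ratio measure.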
