

In document Negative Binomial Regression (Page 35-38)

The concept of risk

2.1 Risk and 2×2 tables

The notion of risk lies at the foundation of the modeling of counts. In this chapter we discuss the technical meaning of risk and risk ratio, and how to interpret the estimated incidence rate ratios that are displayed in the model output of Poisson and negative binomial regression. In the process, we also discuss the related notion of risk difference, as well as odds and odds ratios, which are generally understood with respect to logistic regression models.

Risk is an exposure to the chance or probability of some outcome, typically thought of as a loss or injury. In epidemiology, risk refers to the probability of a person or group becoming diseased given some set of attributes or characteristics. In more general terms, the risk that an individual with a specified condition will experience a given outcome is the probability that the individual actually experiences the outcome. It is the proportion of individuals with the risk factor who experience the outcome. In epidemiological terms, risk is therefore a measure of the probability of the incidence of disease. The attribute or condition upon which the risk is measured is termed a risk factor, or exposure.

Using these terms then, risk is a summary measure of the relationship of disease (outcome) to a specified risk factor (condition). The same logic of risk applies in insurance, where the term applies more globally to the probability of any type of loss.

Maintaining the epidemiological interpretation, relative risk is the ratio of the probability of disease for a given risk factor compared with the probability of disease for those not having the risk factor. It is therefore a ratio of two ratios, and is often simply referred to as the risk ratio, or, when referencing counts, the incidence rate ratio (IRR). Parameter estimates calculated when modeling counts are generally expressed in terms of incidence risk or rate ratios.


Table 2.1 Partial Titanic data as grouped

obs  survive  cases  class  sex  age
  1       14     31      3    0    0
  2       13     13      2    0    0
  3        1      1      1    0    0
  4       13     48      3    1    0
. . .

It is perhaps easier to understand the components of risk and risk ratios by constructing a table of counts. We begin simply with a 2×2 table, with the response on the vertical axis and risk factor on the horizontal. The response, which in regression models is also known as the dependent variable, can represent a disease outcome, or any particular occurrence.

For our example we shall use data from the 1912 Titanic survival log. These data are used again for an example of a negative binomial model in Chapter 9, but we shall now simply express the survival statistics in terms of a table. The response term, or outcome, is survived, indicating that the passenger survived.

Risk factors include class (i.e. whether the passenger paid for a first-, second-, or third-class ticket), sex, and age. Like survived, age is binary (i.e. a value of 1 indicates that the passenger is an adult, and 0 indicates a child). Sex is also binary (i.e. a value of 1 indicates that the passenger is a male, and 0 indicates a female).

Before commencing, however, it may be wise to mention how count data are typically stored for subsequent analysis. First, data may be stored by observations. In this case there is a single record for each Titanic passenger. Second, data may be grouped. Using the Titanic example, grouped data tell us the number of passengers who survived for a given covariate pattern. A covariate pattern is a particular set of unique values for all explanatory predictors in a model. Consider the partial table shown in Table 2.1.

The first observation indicates that 14 passengers survived from a total of 31 who were third-class female children passengers. The second observation tells us that all second-class female children passengers survived. So did all first-class female children. It is rather obvious that being a third-class passenger presents a higher risk of dying, in particular for boys.
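The comparison among these rows can be made concrete by dividing survivors by cases for each covariate pattern. The following is a minimal sketch in base R that uses only the four rows displayed in Table 2.1, typed in by hand. The data frame name `grouped` and the columns `risk` and `risk_death` are illustrative names, not part of the COUNT package.

```r
# The four rows shown in Table 2.1, entered by hand (illustrative data frame).
grouped <- data.frame(
  survive = c(14, 13, 1, 13),
  cases   = c(31, 13, 1, 48),
  class   = c(3, 2, 1, 3),
  sex     = c(0, 0, 0, 1),   # 0 = female, 1 = male
  age     = c(0, 0, 0, 0)    # 0 = child
)

# Risk of survival for each covariate pattern: survivors / passengers.
grouped$risk <- grouped$survive / grouped$cases

# Risk of dying is the complement of the risk of survival.
grouped$risk_death <- 1 - grouped$risk

print(grouped)
```

Among these four rows, third-class boys (13 of 48) have the lowest proportion surviving, followed by third-class girls (14 of 31).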

There are 1,316 observations in the titanic data set we are using (I have dropped crew members from the data). In grouped format, the dataset is reduced to 12 observations – with no loss of information.
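The claim that grouping loses no information can be checked on a small scale. The sketch below, in base R, expands the four rows shown in Table 2.1 into one record per passenger and then collapses them back. It is an illustration under the assumption that only the listed covariates matter; the names `grouped`, `individual`, `regrouped`, and `expand_pattern` are hypothetical, and the full 12-row data set is not reproduced here.

```r
# The four rows shown in Table 2.1, entered by hand.
grouped <- data.frame(
  survive = c(14, 13, 1, 13),
  cases   = c(31, 13, 1, 48),
  class   = c(3, 2, 1, 3),
  sex     = c(0, 0, 0, 1),
  age     = c(0, 0, 0, 0)
)

# Expand one grouped row into one record per passenger:
# `survive` records with survived = 1, the remainder with survived = 0.
expand_pattern <- function(i) {
  g <- grouped[i, ]
  data.frame(
    survived = rep(c(1, 0), times = c(g$survive, g$cases - g$survive)),
    class = g$class, sex = g$sex, age = g$age
  )
}
individual <- do.call(rbind, lapply(seq_len(nrow(grouped)), expand_pattern))

# Collapse back: survivors and passengers per covariate pattern.
surv_sum <- aggregate(survived ~ class + sex + age, data = individual, FUN = sum)
n_cases  <- aggregate(survived ~ class + sex + age, data = individual, FUN = length)
regrouped <- data.frame(surv_sum[, c("class", "sex", "age")],
                        survive = surv_sum$survived,
                        cases   = n_cases$survived)

print(nrow(individual))   # one row per passenger in these four patterns
print(regrouped)
```

The regrouped counts match the original rows, which is the sense in which the grouped format carries the same information as the observation-level format.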

Table 2.2 R: Basic tabulation of Titanic data: survived on age

library(COUNT)     # use for remainder of book
data(titanic)      # use for remainder of chapter
attach(titanic)    # use for remainder of chapter
library(gmodels)   # must be pre-installed
CrossTable(survived, age, prop.t=FALSE, prop.r=FALSE, prop.c=FALSE, prop.chisq=FALSE)

Third, we may present the data in terms of tables, but in doing so only two variables are compared at a time. For example, we may have a table of survived and age, given as follows:

. tab survived age

           | Age (Child vs Adult)
  Survived |     child     adults |     Total
-----------+----------------------+----------
        no |        52        765 |       817
       yes |        57        442 |       499
-----------+----------------------+----------
     Total |       109      1,207 |     1,316
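When only the cell counts are at hand, the same tabulation can be rebuilt without the raw data. The following is a small base R sketch that enters the four counts from the table above and appends the margins; the object names `tab` and `tab_totals` are illustrative.

```r
# Cell counts from the tabulation of survived on age
# (rows: survived, columns: age).
tab <- matrix(c(52, 765,
                57, 442),
              nrow = 2, byrow = TRUE,
              dimnames = list(survived = c("no", "yes"),
                              age = c("child", "adults")))

# Append row and column totals; addmargins() labels them "Sum".
tab_totals <- addmargins(as.table(tab))
print(tab_totals)
```

The "Sum" row and column should agree with the Total row and column of the tabulation.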

This form of table may be given in paradigm form as:

               x
           0       1
       +---------------+
     0 |   A       B   |  A+B
   y   |               |
     1 |   C       D   |  C+D
       +---------------+
          A+C     B+D

Many epidemiological texts prefer to express the relationship of risk factor and outcome or response by having x on the vertical axis and y on the horizontal. Moreover, you will also find the 1 listed before the 0, unlike the above table. However, since the majority of software applications display table results as above, we shall employ the same format here. Be aware, though, that the relationships of A, B, C, and D we give are valid only for this format; make the appropriate adjustment when other paradigms are used. It is the logic of the relationships that is of paramount importance.
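One way to make the adjustment is to convert a table from the other layout into the format used here before reading off A, B, C, and D. The sketch below is a base R illustration, assuming a hypothetical input matrix `epi` laid out in the epidemiological style (x in rows, y in columns, level 1 listed first) and filled with the survived-by-age counts given earlier, with x = age and y = survived.

```r
# Hypothetical epidemiology-style layout: exposure x in rows, outcome y in
# columns, with level 1 listed before level 0.
epi <- matrix(c(442, 765,
                 57,  52),
              nrow = 2, byrow = TRUE,
              dimnames = list(x = c("1", "0"), y = c("1", "0")))

# Transpose so that y is in rows, then index by name to list level 0 first.
std <- t(epi)[c("0", "1"), c("0", "1")]

# Cells in the notation of the paradigm table (row = y, column = x).
A <- std["0", "0"]; B <- std["0", "1"]
C <- std["1", "0"]; D <- std["1", "1"]

print(std)
```

Indexing by level name rather than by position is a deliberate choice here: it keeps the cell labels correct whichever order the source table uses.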

Using the definition of risk and risk ratio discussed above, we have the following relationships.

The risk of y given x = 1:

$$\frac{D}{B+D} \tag{2.1}$$

The risk of y given x = 0:

$$\frac{C}{A+C} \tag{2.2}$$

The risk ratio (relative risk) of y given x = 1 compared with x = 0:

$$\frac{D/(B+D)}{C/(A+C)} = \frac{D(A+C)}{C(B+D)} = \frac{AD+CD}{BC+CD} \tag{2.3}$$
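Relationships (2.1) to (2.3) translate directly into code. The following is a minimal base R sketch; the function names are illustrative, and the counts used for the check are arbitrary values chosen for the example, not taken from the Titanic data.

```r
# Risk of y = 1 within each level of x, using the cell labels A, B, C, D.
risk_x1 <- function(A, B, C, D) D / (B + D)   # equation (2.1)
risk_x0 <- function(A, B, C, D) C / (A + C)   # equation (2.2)

# Risk ratio of y for x = 1 compared with x = 0: left-hand side of (2.3).
risk_ratio <- function(A, B, C, D) {
  risk_x1(A, B, C, D) / risk_x0(A, B, C, D)
}

# Expanded form on the right-hand side of (2.3).
risk_ratio_expanded <- function(A, B, C, D) {
  (A * D + C * D) / (B * C + C * D)
}

# Arbitrary illustrative counts (not from the Titanic data).
A <- 40; B <- 30; C <- 10; D <- 20
rr <- risk_ratio(A, B, C, D)
rr_expanded <- risk_ratio_expanded(A, B, C, D)
print(c(rr, rr_expanded))
```

Both forms return the same value for any table with nonzero C, A + C, and B + D, which is a quick numerical check on the algebra in (2.3).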

We use the same paradigm with the values from the titanic data of survived on age. Recall that adults have the value of age = 1, children of age = 0.

                 0          1
  Survived |   child     adults |
-----------+--------------------+
      0 no |      52        765 |
           |                    |
     1 yes |      57        442 |
-----------+--------------------+
     Total |     109      1,207

The risk of survival given that a passenger is an adult: 442/1207 = 0.36619718

The risk of survival given that a passenger is a child: 57/109 = 0.52293578

The risk ratio (relative risk) of survival for an adult compared with a child:

$$\frac{442/1207}{57/109} = \frac{0.36619718}{0.52293578} = 0.7002718$$

This value may be interpreted as:

The likelihood of survival was 30% less for adults than for children.

Or, since 1/0.70027 = 1.42802,

The likelihood of survival was 43% greater for children than for adults.
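The arithmetic behind these two statements can be reproduced from the four cell counts alone. The following base R sketch uses the counts from the survived-by-age table; the variable names are illustrative.

```r
# Counts from the survived-by-age table (age: 0 = child, 1 = adult).
A <- 52;  B <- 765   # did not survive: children, adults
C <- 57;  D <- 442   # survived:        children, adults

risk_adult <- D / (B + D)   # 442/1207
risk_child <- C / (A + C)   # 57/109

# Risk ratio in each direction; one is the reciprocal of the other.
rr_adult_vs_child <- risk_adult / risk_child
rr_child_vs_adult <- 1 / rr_adult_vs_child

# Percentage difference in the probability of survival implied by each ratio.
pct_adult <- 100 * (rr_adult_vs_child - 1)   # roughly -30
pct_child <- 100 * (rr_child_vs_adult - 1)   # roughly +43

print(round(c(risk_adult, risk_child, rr_adult_vs_child, rr_child_vs_adult), 5))
```

Note that the two percentages are not mirror images: a ratio of about 0.70 corresponds to a 30% reduction, while its reciprocal of about 1.43 corresponds to a 43% increase, because each is expressed relative to a different reference group.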
