Defining Causal Effects

Occupational epidemiologists are often interested in trying to understand whether the effect of an intervention on a specific occupational exposure will delay or prevent the occurrence of an undesirable health-related outcome. In the present case, asbestos is a well-known human carcinogen (International Agency for Research on Cancer 1987). It is also, however, an extremely versatile mineral with desirable physical properties and many potential applications in commerce, industry, and research (Alleman and Mossman 1997). Thus, ideally, epidemiologic research on the detrimental health effects of asbestos exposure would be geared towards determining optimal permissible exposure levels in an occupational setting.

Here, we outline a framework for defining what we mean by “the effect of an intervention.” A policy maker interested in decreasing the rate of lung cancer mortality in a given occupation might wish to compare the survival function in a well-defined group of workers exposed to a given level of asbestos to the survival function in the exact same group of workers exposed to a lower level of asbestos over the same time period. In other words, ceteris paribus, what would the rates of lung cancer mortality have been in this cohort under two different levels of exposure to asbestos? Let us consider a particular individual in this cohort, denoted by $i$. Let us further denote this individual’s exposure history from their first year at work to their year of death as $\bar{x}_t \equiv \{x_0, \ldots, x_t\}$, where the overbar on the $x$ denotes the entire exposure history (the $x$ process) up to time $t$. That is, $\bar{x}_t$ is the set of exposure values documenting the individual’s entire exposure history from the time they began work until they died at time $t$. We refer to this set of variables denoting an exposure history as a regime. We define $T_i^{\bar{x}}$ as the time at which individual $i$ would have died had she been exposed to the regime defined by $\bar{x}_t \equiv \{x_0, \ldots, x_t\}$. Similarly, we define $T_i^{\bar{x}'}$ as the time at which this individual would have died had she been exposed to a different exposure regime defined by $\bar{x}'_t$. We define the “intervention” as the act of changing the exposure regime from $\bar{x}_t$ to $\bar{x}'_t$.

For example, let $x_j$ be the exposure value at time $j$, the last time point before the individual’s cumulative exposure would surpass some desired cutpoint, denoted $a$. Then we can define the intervention “limit the cumulative amount of asbestos to which an individual is exposed to some level $a$” as setting $\bar{x}_t \equiv \{x_0, x_1, \ldots, x_t\}$ to $\bar{x}'_t \equiv \{x'_0 = x_0, x'_1 = x_1, \ldots, x'_j = x_j, x'_{j+1} = 0, \ldots, x'_t = 0\}$, where $\sum_{k \le t} x'_k \le a$. Using the resulting variables $T_i^{\bar{x}}$ and $T_i^{\bar{x}'}$, we can assess whether capping the cumulative asbestos exposure at $a$ will lengthen the time to lung cancer mortality by an appreciable amount. Following Rubin (2005), for a given population $S$ we define a causal effect as any comparison of the ordered sets $\{T_i^{\bar{x}}, i \in S\}$ and $\{T_i^{\bar{x}'}, i \in S\}$. Note that this definition is not the same as comparing the ordered sets $\{T_i^{\bar{x}}, i \in S_0\}$ and $\{T_i^{\bar{x}'}, i \in S_1\}$, $S_0 \neq S_1$. The importance of this last clarification will become clear in the following paragraphs.
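The cumulative-cap intervention described above can be illustrated with a minimal Python sketch. The function name `cap_cumulative_exposure`, the annual exposure values, and the cutpoint are hypothetical and chosen only for illustration; the sketch assumes non-negative exposures recorded at discrete time points.

```python
def cap_cumulative_exposure(history, a):
    """Map an observed regime to the capped regime.

    Exposures are kept through time j, the last time point before the
    running total would exceed the cutpoint `a`, and set to zero afterwards.
    """
    capped = []
    running_total = 0.0
    cap_reached = False
    for x in history:
        if not cap_reached and running_total + x <= a:
            running_total += x
            capped.append(x)
        else:
            # From time j + 1 onward, exposure is set to zero.
            cap_reached = True
            capped.append(0.0)
    return capped


# Hypothetical annual exposures x_0, ..., x_3 and a hypothetical cutpoint.
observed = [1.0, 2.0, 3.0, 1.0]
capped = cap_cumulative_exposure(observed, a=4.0)
print(capped)  # [1.0, 2.0, 0.0, 0.0]
```

In this toy case $j = 1$: adding the third exposure would push the cumulative total past the cutpoint, so that exposure and every later one are set to zero, and the capped total (3.0) stays at or below $a$.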

The variables $T_i^{\bar{x}}$ and $T_i^{\bar{x}'}$ are potential outcomes: outcomes that would have been observed under a potential exposure regime (possibly counter to the fact) $\bar{x}_t$ and $\bar{x}'_t$, respectively. Potential outcomes such as $T_i^{\bar{x}}$ and $T_i^{\bar{x}'}$ can be thought of as baseline variables similar to race or gender because they do not depend on subject $i$’s actual exposure regime. Instead, the exposure regime can be thought of as the function which determines which of the (possibly many) potential outcomes is observed. For example, if individual $i$’s observed exposure history is $\bar{X}_t = \bar{x}_t$, then (provided $\bar{x}_t$ is well-defined) the observed time to lung cancer mortality is $T_i = T_i^{\bar{x}}$. Consequently, all other potential outcomes for individual $i$ are missing: we cannot observe what would have happened to the same individual under different exposure regimes. This is a problem because our aim is to compare the ordered sets of potential outcomes (or some function thereof) for every unit $i$ in the set of individuals defined by $S$ under two different exposures.
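The way the experienced regime selects which potential outcome is observed can be sketched in a few lines of Python. The workers, ages at death, and regime labels below are entirely hypothetical; the helper `observe` is an illustrative name, not part of any library.

```python
# Hypothetical potential ages at death for three workers under two regimes.
potential_outcomes = {
    "worker_1": {"x": 61.0, "x_prime": 66.0},
    "worker_2": {"x": 70.0, "x_prime": 70.0},
    "worker_3": {"x": 58.0, "x_prime": 64.0},
}

# Hypothetical regime each worker actually experienced.
observed_regime = {"worker_1": "x", "worker_2": "x_prime", "worker_3": "x"}


def observe(potential, regime):
    """Reveal the outcome under the experienced regime; mask the rest as None."""
    return {r: (t if r == regime else None) for r, t in potential.items()}


observed_data = {
    worker: observe(potential_outcomes[worker], observed_regime[worker])
    for worker in potential_outcomes
}

for worker, row in observed_data.items():
    print(worker, row)
```

Each row of `observed_data` contains exactly one revealed value and one missing value, which is the data structure that any analysis of real cohort data has to work with.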

This problem is known as the fundamental problem of causal inference (Holland 1986). Each potential outcome is observable, but we can never actually observe more than one potential outcome for a given individual. Thus, for a given set of individuals defined by $S$, we can never estimate the causal effect defined as a comparison of the ordered sets $\{T_i^{\bar{x}}, i \in S\}$ and $\{T_i^{\bar{x}'}, i \in S\}$. Rather, we must make certain assumptions that will give us the desired information on both $T_i^{\bar{x}}$ and $T_i^{\bar{x}'}$ for $i \in S$. Put another way, because epidemiologic data can only provide information on the ordered sets $\{T_i^{\bar{x}}, i \in S_0\}$ and $\{T_i^{\bar{x}'}, i \in S_1\}$ for $S_0 \subset S$, $S_1 \subset S$, $S_0 \cap S_1 = \emptyset$, we require certain assumptions to make the information obtained from $\{T_i^{\bar{x}}, i \in S_0\}$ comparable to the information obtained in $\{T_i^{\bar{x}}, i \in S\}$ (similarly for $T_i^{\bar{x}'}$ and $S_1$).
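A toy calculation may help show why a comparison across disjoint subgroups need not agree with a comparison over the same population $S$. All numbers below are hypothetical, and the sketch assumes (unrealistically) that both potential outcomes are known for every worker so that the two comparisons can be placed side by side. The difference in medians is used here only as one possible comparison.

```python
from statistics import median

# Hypothetical potential ages at death for six workers. In real data only one
# value per worker would ever be observed.
t_x = [55.0, 60.0, 65.0, 70.0, 75.0, 80.0]        # under the original regime
t_x_prime = [57.0, 62.0, 67.0, 72.0, 77.0, 82.0]  # under the intervention regime

# Hypothetical assignment: workers in s0 experienced the original regime,
# workers in s1 experienced the intervention regime.
s0 = [3, 4, 5]
s1 = [0, 1, 2]

# Comparison over the same population S (needs both potential outcomes).
causal_difference = median(t_x_prime) - median(t_x)

# Comparison across disjoint subgroups (all that observed data provide).
naive_difference = median(t_x_prime[i] for i in s1) - median(t_x[i] for i in s0)

print(causal_difference)  # 2.0
print(naive_difference)   # -13.0
```

In this constructed example every worker would live two years longer under the intervention, yet the subgroup comparison has the opposite sign because the workers who experienced the original regime happen to be the longer-lived ones.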

Finally, although defining our causal effect as any comparison of the ordered sets $\{T_i^{\bar{x}}, i \in S\}$ and $\{T_i^{\bar{x}'}, i \in S\}$ helps acquire a general understanding of what we mean by “causal effect,” it is too general to be of relevance to practicing epidemiologists. To make our definition more specific, we define $\mathbf{T}_i$ as the vector of potential outcomes $\{T_i^{\bar{x}}, T_i^{\bar{x}'}\}$ and denote the joint density of $\mathbf{T}$ given observed covariates $Z_i$ and observed exposure $X_i$ as $f(\mathbf{t} \mid Z_i, X_i)$. For a given population $S$, this joint density defines any causal comparison of interest. For example, one may be interested in the proportion of individuals who would not have lived past a certain age under both exposure regimes $\bar{x}_t$ and $\bar{x}'_t$ (the “doomed”) compared to the proportion who would not have lived past that same age in the absence of the intervention (i.e., under exposure regime $\bar{x}_t$). Epidemiologists typically forgo such contrasts because the required information on the joint distribution $f(\mathbf{t} \mid Z_i, X_i)$ is not identifiable with epidemiologic data (Greenland and Robins 1986; Imbens and Rubin 1997). Consequently, as will be done in the present study, it is usually sufficient to define and compare the marginal densities of $T^{\bar{x}}$ and $T^{\bar{x}'}$, denoted $f_{T^{\bar{x}}}(t \mid Z_i, X_i)$ and $f_{T^{\bar{x}'}}(t \mid Z_i, X_i)$, respectively. One can then define a causal effect as any contrast between the marginals $f_{T^{\bar{x}}}(t \mid Z_i, X_i)$ and $f_{T^{\bar{x}'}}(t \mid Z_i, X_i)$. In particular, for the present study, we define our causal estimand of interest as

$$\psi = \frac{Q_{T^{\bar{x}'}}(p)}{Q_{T^{\bar{x}}}(p)}, \tag{7.1}$$

where $Q_{T^{\bar{x}'}}(\cdot) \equiv F^{-1}_{T^{\bar{x}'}}(\cdot)$ and $Q_{T^{\bar{x}}}(\cdot) \equiv F^{-1}_{T^{\bar{x}}}(\cdot)$ denote the quantile (or inverse cumulative distribution) functions for argument $(\cdot)$, taken over the marginal distributions of $T^{\bar{x}'}$ and $T^{\bar{x}}$, respectively. This estimand is a measure of the relative survival time (i.e., the survival time ratio) comparing $T^{\bar{x}'}$ and $T^{\bar{x}}$. It is a summary of the horizontal distance between two survival curves at any given quantile $p$ (e.g., the median). Defining our estimand as a survival time ratio, rather than a hazard ratio (which compares attributes of the vertical space between two survival curves), is preferable because: (i) the survival time ratio has a direct physical interpretation that is more intuitive (Reid 1994); and (ii) the hazard ratio has a built-in selection bias (Hernán 2010).
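As a rough numerical illustration of the survival time ratio, the sketch below evaluates the ratio of two empirical quantile functions at a chosen $p$. The survival times are hypothetical, the helper names are illustrative, and the empirical inverse CDF is only one of several possible quantile definitions. The sketch also assumes that samples from both marginal distributions are available, which in practice requires the identifying assumptions discussed earlier.

```python
import math


def empirical_quantile(times, p):
    """Inverse empirical CDF: the smallest t with F_n(t) >= p, for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(times)
    index = math.ceil(p * len(ordered)) - 1
    return ordered[index]


def survival_time_ratio(times_x_prime, times_x, p):
    """Ratio of the p-th quantile under the intervention to that without it."""
    return empirical_quantile(times_x_prime, p) / empirical_quantile(times_x, p)


# Hypothetical survival times (years) under each regime.
times_x = [10.0, 20.0, 30.0, 40.0]
times_x_prime = [15.0, 30.0, 45.0, 60.0]

psi_median = survival_time_ratio(times_x_prime, times_x, p=0.5)
print(psi_median)  # 1.5
```

A value of 1.5 at $p = 0.5$ would be read as the median survival time under the intervention regime being 1.5 times the median survival time under the original regime, which is the horizontal-distance interpretation given above.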
