• No results found

differences. Alternatively, inverse-propensity weighting and regression adjustment can be combined to gain efficiency of an adjustment procedure (see Hirano & Imbens, 2001).

2.4.2 Double Robustness

To conclude this survey of adjustment methods for quasi-experimental designs, we must finally mention a special class of dual-modeling strategies. These recently suggested estimators for the average total ef-fect were not developed to gain efficiency of the adjustment procedure, but to make the resulting analysis more robust against the misspecification of either the assignment model or the outcome model. These so-called doubly robust procedures (see Lunceford & Davidian, 2004; Bang & Robins, 2005; Kang & Schafer, 2007a) are also known in the missing value literature (see, e. g., J. M. Robins & Rotnitzky, 1995; R. J. A. Little &

An, 2004; Carpenter, Kenward, & Vansteelandt, 2006). Kang and Schafer (2007a) discuss the performance of different doubly robust strategies for the estimation of the average total effect. Until today, there has been an ongoing discussion about the usefulness of doubly robust estimation methods in the literature (see, for example, the comments to Kang & Schafer, 2007a, by J. M. Robins, Sued, Lei-Gomez, & Rotnitzky, 2007;

Ridgeway & McCaffrey, 2007; Tan, 2007; Tsiatis & Davidian, 2007). The variance of the ATE–estimator is increased for doubly robust methods compared to the procedures based either solely on the assignment or solely on the outcome model (see, e. g., Freedman & Berk, 2008). Hence, doubly robust estimators are clearly suboptimal, if one of the parameterizations of the function form of the outcome or the assignment model is (precisely) correct. For a detailed description of the different doubly robust ATE–estimators and their theoretical foundation see, e. g., Lunceford and Davidian (2004), Tan (2006), Tsiatis (2006), and Glynn and Quinn (2010).

To analyze the data from the described quasi-experimental design, Pohl et al. (submitted) applied a weighted covariate-treatment regression with a selection of covariates and weights according to a propen-sity score subclassification. This analysis can be seen as double robust, provided the covariate-treatment regression is assumed to be Z –conditionally unbiased with the selection of potential confounders included as covariates Z .

2.5 Practical Issues Concerning the Adjustment

As described above, the average total effect can be estimated unbiasedly if a) the covariate-treatment regres-sion is unbiased and b) either the covariate-treatment regresregres-sions (outcome modeling) or the propensity score model (assignment modeling) is precisely correctly parameterized (see subsection 1.1.8). However, in practice some additional properties of the adjustment methods, which we will highlight in this subsection, are relevant.

2.5 Practical Issues Concerning the Adjustment 40

2.5.1 Balancing Check

A key concept for the practical application of adjustment methods based on the propensity score is the inspection of the achieved balance of the covariate-distributions of the matched samples or the weighted covariates. Rosenbaum and Rubin (1983b) have shown that the covariates are independent of the treatment variable conditional on the true propensity, i. e.,

Z ⊥⊥ X |φ. (2.19)

(see also Steyer et al., in press). This balancing property of the true propensity (which relies on large sam-ples for the estimated propensity score) can be used to evaluate the specification of the assignment model used for the estimation of the propensity score bπ, provided Z -conditional independence of X and the true outcomes. If Equation (2.19) is not fulfilled for an estimated propensity score, the assignment model for πj= P(X = j |Z ) cannot be specified correctly. Obviously, the propensity score πj = P(X = j |Z ) is modeled without taking the outcome variable Y of the study into account (see above). Hence, it is possible (and also strongly suggested, see, e. g., Dehejia, 2005) that different specifications of the assignment model be tried as often as necessary to obtain trustable estimates of the propensity score in terms of balance (see also Yanovitzkya, Zanutto, & Hornik, 2005, for a step-by-step example).

According to Equation (2.19), treated and untreated units have identical distributions of the covariates Z = (Z1,..., ZK) for a given value of the true propensity score. Therefore, we expect the distributions of the covariates conditional on the propensity score π = c to be equal between the treatment groups X = j and X = k for all values of c between 0 and 1 (see Schafer & Kang, 2008). For the univariate distribution this implies that, for instance, the first two central moments are equal conditional on the values π = c:

E (Z |π = c, X = j ) = E(Z |π = c, X = k) Var (Z |π = c, X = j ) = Var (Z |π = c, X = k).

(2.20)

In order to analyze the achieved balance simultaneously for many values c of the estimated propensity score b

π under realistic sample sizes, Rosenbaum and Rubin (1984) suggest an approach based on propensity score subclassification (i. e., a two-way analysis of variance with treatment variable X times πsas an indicator for the propensity subclass, conducted separately for each covariate). Alternatively, the weighted covariates can be analyzed to investigate the balance property, as described, for instance, by Steyer et al. (in press).

The assignment model itself is — from a substantive point of view — not of interest for the estima-tion of average total effects. This fact has motivated the conclusion that the assignment model need not be

2.5 Practical Issues Concerning the Adjustment 41

parsimonious and might include numerous covariates, interactions and nonlinear terms (see, e. g., Shaha, Laupacisa, Huxa, & Austina, 2005). Hence, given that the assignment model itself is only used as a device for estimating the (individual) propensity scores, statistical significance of regression coefficients in the logistic regression are usually not of great interest for the model building process (see subsection 2.5); Rosenbaum and Rubin (1984) therefore suggest using a liberal inclusion criterium for the (stepwise) regression. Fur-thermore, as pointed out by Imai, King, and Stuart (2008), if the balance is checked for matched samples, the sample size of the comparison group (related to the number of controls dropped through the matching process) can interfere with the test statistic used for the balance check.

Although the balance check is an important step for the specification of the assignment model, it is often found to be poorly implemented or even omitted in applied studies using propensity score techniques (see, e. g., Austin, 2008). Nevertheless, basic criteria for applied researchers to check the achieved balance are provided by Rubin (2001). In line with these rules, Pohl et al. (submitted) checked the specification of the propensity score model for the observational study (described as an example in section 2.1) according to the following two main balance metrics: The absolute standardized difference in the covariates’ means (Rubin, 1973; Rosenbaum & Rubin, 1985; Haviland, Nagin, & Rosenbaum, 2007) and the variance ratio of the covariates between treatment groups.

2.5.2 Regression Diagnostic

The corresponding counterpart to the balance check for the specification of the assignment model is the application of regression diagnostics for an evaluation of the parameterization of the outcome model. If the functional form of the covariate-treatment regression is misspecified, the ATE–estimator will be biased even if the covariate-treatment regression is Z -conditional unbiased. Therefore, regression diagnostics is of major importance for the outcome modeling approach.

Various regression diagnostics are known in the literature: For instance, Steyer (2003) describes a test statistic based on a comparison of a parametric regression model (with a functional form assumption) to a saturated model, which is obtained by creating indicator variables for each distinct pattern of values of the regressors. Therefore, this strategy can be applied for sufficiently large samples and for discrete covariates. If no saturated parametrization is possible to test the functional form assumption of the covariate-treatment regression, the specification can be evaluated by comparing different parameterizations of a regression, for example, a simple linear functional form against an extended model including additional power terms (a strategy known as RESET test developed by Ramsey, 1969).

According to Long and Trivedi (1992), possible misspecifications can be categorized as first order mis-specifactions regarding the conditional expectation of the residual [E (ε|X ) 6= 0], as second order

misspec-2.5 Practical Issues Concerning the Adjustment 42

ifications regarding the structure of the error variances [Var (ε|X ) = E(ε2|X ) 6= σ2I ] or as third order mis-specifications regarding the distribution of the error terms. This classification is of interest for the imple-mentation of generalized analysis of covariance because opposed to the general linear model some of the studied structural equation models with nonlinear constraints are with respect to the ATE–estimator not robust against second order misspecifications.

Most commonly, the specification of a regression model is judged based on a (visual) inspection of the residuals, a strategy known as residual analysis (see, for example, Belsley, Kuh, & Welsch, 1980, Kutner, Nachtsheim, Neter, & Li, 2005, Kang & Schafer, 2007b, and Sheather, 2009, as well as Tan, 2007, for a dis-cussion of differences in model checking between outcome regression and propensity score specification).

Since the residuals of a true regression are uncorrelated with any functions of the regressors and have an expectation of zero conditional on (functions of) the regressors, the inspection of (standardized) residual plots can help to detect inappropriate specifications of the covariate-treatment regression (see, for example, S. Weisberg, 2005).

2.5.3 Overlap, Common Support & Extrapolation

Within the Rubin Causal Model, substantial overlapping distributions of the covariates or the propensity scores between treatment groups is discussed as a desirable property for causal inference from observa-tional data. For example, Rubin (2001, p.176) warns, that “when there are some treated subjects with propen-sity scores outside the range of the control subjects, no inference can be drawn concerning the effect of treat-ment exposure for these treated subjects from the data set without invoking heroic modeling assumptions based on extrapolation.”

Without a parametric model assumption for the regression E (Y |X ,π) overlapping distributions are technically needed for some of the propensity score approaches (see, for instance, the propensity score subclassification described above). For adjustment methods based on parametric covariate-treatment re-gressions — like the analysis of covariance and the moderated multiple regression approach — sufficiently overlapping distributions (of covariates or propensity scores) among treated and untreated units are tech-nically unnecessary. If the functional form assumption of the regression E (Y |X , Z ) is precisely correct (and if the covariate-treatment regression is Z –conditionally unbiased), and if the individual treatment proba-bilities are different from zero and different from one for each treatment condition and for each unit [that means for the two group case 0 < P(X = 1|U = u) < 1 for each unit U = u], then the average total effect based on extrapolation is well defined and can be unbiasedly estimated.

Additionally, as Kang and Schafer (2007b, p. 575) summarized, extrapolation is a common phenomenon for the estimation of causal effects from quasi-experimental data: “All of our methods extrapolate. The

as-2.5 Practical Issues Concerning the Adjustment 43

sumption of ignorability is itself an extrapolation.” Nevertheless, (residual) diagnosis for the functional form assumption of the covariate-treatment regression can only be performed empirically within the observed range of data (see above). Furthermore, the average total effect is estimated as the expectation over the un-conditional distribution of the covariates. It seems therefore reasonable to conclude that the extrapolation might make the estimator of the average total effect unstable and vulnerable to misspecifications of the utilized parametric regression model (see, e. g., King & Zeng, 2006).

It should be noted that this is not necessarily a major drawback. If a missing overlap is observed by checking the empirical distribution of the weighted covariates or the estimated propensity scores, one pos-sible solution might be the estimation of the average total effect of the treated (provided that at least the estimated distribution for the treated group overlaps with the distribution of the control group). Further-more, similar to the common strategy for propensity score based analysis, the target population for which the average total effects can be estimated from a concrete observational study might be reduced for the outcome modeling approach as well as for the overlapping cases (Morgan & Winship, 2007).

2.5.4 Unmeasured Confounders

The different adjustment methods reviewed in this section are based on the assumption of unbiasedness of the covariate-treatment regression E (Y |X , Z ) and are therefore restricted to measured potential con-founders. Unmeasured confounders or omitted covariates are also treated as a problem of selection based on unobservable variables (see, e. g., Heckman & Robb, 1985). Often, data on important confounders are not available in practical applications. Angrist and Krueger (1999) suggest checking the sensitivity of the estimated treatment effects to changes in the set of included covariates Z ≡ (Z1,..., ZK) as a hint that unob-served covariates would change estimates further.

An elaborated strategy that deals with the violations of the fundamental unbiasedness assumption of the covariate-treatment regression is an approach known as sensitivity analysis (see, e. g., Rosenbaum &

Rubin, 1983a; Lin, Psaty, & Kronmal, 1998, or Rosenbaum, 2005, for an introduction). Sensitivity analysis addresses the question of how hidden biases due to unmeasured confounders of various magnitudes might alter the final interpretation of the results from an observational study. This is performed with respect to the qualitative conclusions drawn based on the estimated average total effect (see Rosenbaum, 2002b, for an extensive presentation of the various methods and their theoretical foundation).

2.5.5 Measurement Error

We will now briefly discuss the most distinct feature of generalized analysis of covariance implemented as structural equation model with nonlinear constraints: The flexibility to take the measurement error of covariates into account for the estimation of a latent covariate-treatment regression. As recently

demon-2.5 Practical Issues Concerning the Adjustment 44

strated by T. D. Cook, Steiner, and Pohl (2009), the reliability of the selected covariates (see section 1.1.7) is important to obtain unbiased estimates of the average total effect. Various adjustment methods, for in-stance, matching on covariates are prone to measurement error (Shadish & Cook, 2009). The same is true for stratifying on observed covariates.

For the outcome modeling it is well known that estimates of regression coefficients in manifest re-gression models are underestimated if regressors are measured with measurement error (see, for example, Lord, 1960; Cochran, 1968b; Degracie & Fuller, 1972; H. I. Weisberg, 1979; Cochran, 1983; Fuller, 1987; An-grist & Krueger, 1999). Therefore, biased (also known as attenuated ) regression coefficients might restrict the use of the discussed adjustment methods to purely descriptive purposes when the biased regression coefficients lead to biased ATE–estimates. Hence, covariates measured with error (also known as fallible co-variates) can be seen as a major drawback, as summarized by Bentler (1991, p. 159): “The standard method for doing this, analysis of covariance (ANCOVA), may fail to give an unbiased treatment effect because if the control variables are fallible, the control is at the level of an observed variable rather than at the level of the true characteristic” (see also the discussion in T. D. Cook & Campbell, 1979, ch. 4, and Huitema, 1980, p.

149–154, as well as, e. g., Bini, Monari, Piccolo, & Salmaso, 2010, for measurement error and value added modeling). Similar effects of measurement error are also well known for the analysis of change without explicitly referring to the estimation of causal effects (see, e. g., Cribbie & Jamieson, 2000; Jamieson, 2004;

Cribbie & Jamieson, 2004).

Different adjustment methods have been suggested for dealing with measurement error. For instance, Sörbom (1978) suggested an alternative to the analysis of covariance without interactions based on the group-specific regression (see also Arbuckle, 2006, for a technical description of this method as well as Aiken, Stein, & Bentler, 1994, for an application). Furthermore, West, Biesanz, and Pitts (2000) describe an analysis of covariance that corrects for unreliability of the covariates, without the specification of a mea-surement model.

Generalized analysis of covariance for the estimation of average total effects implemented as struc-tural equation model with nonlinear constraints can be applied straight forward if covariates are measured with error and an appropriate measurement model can be formulated and identified (see Steyer & Partchev, 2008, and Steyer et al., in press). It will not be necessary to consider measurement models for latent covari-ates with respect to the research questions focused on this thesis. Nevertheless, we will give a summary of subsequent research questions dealing with measurement error of covariates in section 5.4.2.