graphics, personality measures, educational background and others (“The [...] study was designed to have a rich set of covariates potentially related to treatment choice and outcome”, Shadish et al., 2008a, p. 1340).
As mentioned above (see also section 1.1.7), these covariates should be related to students’ decisions about whether to select the math or the language course (assignment model) and also to the expected treatment effect of the intervention (outcome model) by substantive theory.
2.2 Review of Adjustment Methods
The literature contains various adjustment procedures for the estimation of average total effects, whose theoretical foundation was summarized in subsection 1.1.8. In order to relate the generalized analysis of covariance studied in this thesis to the different research traditions, a short review of different adjustment techniques is given in this section.
We have already described the application of fully saturated modeling of the covariate-treatment regression. If all covariates are discrete (and provided that the covariate-treatment regression is unbiased), the adjusted means as well as their difference can be estimated without further assumptions from the conditional prima facie effects. This approach is known as stratifying on observed covariates (discussed, e. g., by J. Robins & Greenland, 1986; Morgan & Harding, 2006). A similar approach without any functional form assumptions is to match units on the observed covariates Z, as described for instance by Cochran (1953) [see, e. g., Rossi, Freeman, & Lipsey, 1999, ch. 13 for empirical applications]. In practice, this nonparametric adjustment method becomes infeasible in finite samples if the dimensionality of the covariates in Z is large, because exact matching pairs can then no longer be found in the sample (known as the data sparseness problem, see, e. g., Morgan & Winship, 2007).
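The stratification estimator described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited study: the function name `stratified_ate` and the toy `records` are assumptions made for this example, and the covariate is taken to be a single discrete variable with a binary treatment.

```python
from collections import defaultdict

def stratified_ate(records):
    """Adjusted effect from (z, x, y) triples with discrete z and x in {0, 1}.

    Averages the conditional prima facie effects E(Y|X=1,Z=z) - E(Y|X=0,Z=z)
    over the observed distribution of Z (hypothetical helper for illustration).
    """
    cells = defaultdict(lambda: [0.0, 0, 0.0, 0])  # z -> [sum_y0, n0, sum_y1, n1]
    n = 0
    for z, x, y in records:
        cell = cells[z]
        cell[2 * x] += y       # running sum of outcomes in cell (z, x)
        cell[2 * x + 1] += 1   # number of units in cell (z, x)
        n += 1
    ate = 0.0
    for z, (s0, n0, s1, n1) in cells.items():
        if n0 == 0 or n1 == 0:
            # Data sparseness: no treated or no control unit in this stratum.
            raise ValueError(f"stratum {z!r} lacks treated or control units")
        prima_facie = s1 / n1 - s0 / n0        # conditional prima facie effect
        ate += (n0 + n1) / n * prima_facie     # weight by relative stratum size
    return ate

# Toy data (made up): treatment is more frequent in the stratum with high outcomes.
records = [(0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 1, 3.0),
           (1, 0, 5.0), (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0)]
print(stratified_ate(records))
```

In this toy data set the unadjusted difference of group means is 5.25 - 2.0 = 3.25, whereas the stratum effects are 2 and 1 with equal stratum weights. The `ValueError` branch marks the point at which the approach fails when a stratum contains only treated or only control units.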
To circumvent this problem, the literature describes alternatives based on distance measures between units with different (multivariate) values of the covariates, for instance, matching based on the Mahalanobis distance (Rubin, 1980). In addition, matching procedures differ with respect to the algorithmic process of matching, whether the matching is performed with or without replacement, the number of matched control units for each treated unit, as well as other technical features (see, e. g., Gu & Rosenbaum, 1993, for a technical presentation of different matching procedures).
The nonparametric matching on observed covariates is substantially extended through the inclusion of a balancing score based on the assignment modeling approach by Rosenbaum and Rubin (1985). We will discuss the resulting propensity score matching and related methods in section 2.3 (see, e. g., Sekhon, 2009, for a recent survey of matching methods as well as Rubin, 2006, for a collection of relevant work). Note that although matching was originally developed as a nonparametric method without assumptions about the
functional form of a regression, this is not generally true for propensity score based adjustment methods when the Z-conditional propensities have to be estimated (see section 1.1.8 above).
2.2.1 Traditional Analysis of Covariance
An alternative approach to circumvent the data sparseness problem is to assume a functional form for the covariate-treatment regression. In the most common version (labeled traditional analysis of covariance, ANCOVA) the regression of the outcome variable Y on the covariates Z and the treatment variable X is assumed to be linear (e. g., Cochran, 1957; Rao, 1973; T. D. Cook & Campbell, 1979; Maxwell, Delaney, & O’Callaghan, 1993; Cohen, Cohen, West, & Aiken, 2003, as well as Rubin, 2006, ch. 4):
$$E(Y \mid X, Z) = \gamma_{00} + \gamma_{01} \cdot Z + \gamma_{10} \cdot X, \qquad \varepsilon \equiv Y - E(Y \mid X, Z). \tag{2.1}$$
According to the decomposition introduced in subsection 1.1.8 [see Equation (1.26)], the average total effect can be obtained as a regression coefficient, i. e., $ATE = E(g_1(Z)) = E(\gamma_{10}) = \gamma_{10}$. Typically, the covariate-treatment regression [Equation (2.1)] is estimated by ordinary least-squares. Hence, the traditional ANCOVA is sometimes also simply called OLS regression when applied to the estimation of treatment effects (see, e. g., in econometrics Verbeek, 2004, and also Guo & Fraser, 2010).
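For a single covariate and a binary treatment, the OLS estimate of the treatment coefficient in Equation (2.1) has a closed form: the difference of the group means of Y, adjusted by the pooled within-group slope times the difference of the group means of Z. The sketch below uses that closed form; the function name `ancova_effect` and the noise-free toy data are assumptions made for illustration only.

```python
def ancova_effect(z, x, y):
    """OLS estimates (g10, g01) for E(Y|X,Z) = g00 + g01*Z + g10*X, x in {0, 1}.

    Hypothetical helper: uses the closed form for one covariate, in which the
    coefficient of Z is the pooled within-group slope.
    """
    groups = {0: [], 1: []}
    for zi, xi, yi in zip(z, x, y):
        groups[xi].append((zi, yi))

    def mean(values):
        return sum(values) / len(values)

    zbar = {j: mean([zi for zi, _ in g]) for j, g in groups.items()}
    ybar = {j: mean([yi for _, yi in g]) for j, g in groups.items()}
    # Pooled within-group cross-products and sums of squares
    szy = sum((zi - zbar[j]) * (yi - ybar[j]) for j, g in groups.items() for zi, yi in g)
    szz = sum((zi - zbar[j]) ** 2 for j, g in groups.items() for zi, _ in g)
    g01 = szy / szz
    # Adjusted mean difference: coefficient of the treatment variable
    g10 = (ybar[1] - ybar[0]) - g01 * (zbar[1] - zbar[0])
    return g10, g01

# Toy data (made up): y = 2 + 0.5*z + 3*x, with larger z values in the treated group.
z = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2 + 0.5 * zi + 3 * xi for zi, xi in zip(z, x)]
g10, g01 = ancova_effect(z, x, y)
print(g10, g01)
```

Here the unadjusted mean difference is 4.0, while the adjusted coefficient recovers the value 3 that generated the data, because the groups differ by 2 units in their covariate means and the slope is 0.5.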
Statistical Inference The hypothesis of no average total effect in the ANCOVA model is $H_0\colon \gamma_{10} = 0$. Additional assumptions are necessary to test this hypothesis for covariate-treatment regressions estimated by ordinary least-squares. If these assumptions (particularly with respect to the distribution of the residuals ε, see section 3.2.2) are met, an F-test based on the general linear hypothesis can be applied, for instance (Steyer, 2003, see also subsection 3.2.3).
Example Applying this method to the quasi-experimental example means identifying the treatment effect of each treatment (compared to the other) as the regression coefficient $\gamma_{10}$ [according to Equation (1.26), the effect function $g_1(Z)$], with the pre-treatment covariates forming the multivariate covariate $Z \equiv (Z_1, \ldots, Z_K)$ [included with $\gamma_{00}$ and $\gamma_{0k}$ in the intercept function], i. e., for K covariates the traditional ANCOVA model is $E(Y \mid X, Z) = \gamma_{00} + \sum_{k=1}^{K} (\gamma_{0k} \cdot Z_k) + \gamma_{10} \cdot X$. If the assumption of unbiasedness of the covariate-treatment regression as well as the linearity and additivity assumptions of this parameterization are met, $\hat{\gamma}_{10}$ is an unbiased estimator of the average total effect. For the estimation of the treatment effect of the math training, the math post-test is used as the outcome variable Y and the math pre-test is one of the covariates. In the same way, the language pre-test is used as one of the covariates for the estimation of the average total effect of the language training, whereas the language post-test is used as the outcome variable. Further covariates used in the study reported by Pohl et al. (submitted) for the estimation of both treatment effects
are demographic variables (gender, age, marital status, major area of study), prior academic achievement (high school grades), topic preference (of math or language) and psychological variables (positive and negative affect). Including additional indicator variables for dummy codings of categorical covariates, Pohl et al. (submitted) used an outcome model according to Equation (2.1) with K = 33 covariates.
2.2.2 Analysis of Covariance Without Linearity
The traditional analysis of covariance assumes a linear relation of covariate(s) and outcome (see also subsection 1.1.5). Generalizations exist in the literature which relax this assumption of linearity but still assume additivity (i. e., parallel curves). For example, R. J. A. Little, An, Johanns, and Giordani (2000) applied an extended ANCOVA model without the assumption of linearity for the estimation of an adjusted average total effect. They summarized different data analysis techniques under the following regression equation
$$E(Y \mid X, Z) = \gamma_{00} + g(Z) + \gamma_{10} \cdot X, \qquad \varepsilon \equiv Y - E(Y \mid X, Z), \tag{2.2}$$
where $g(Z)$ is a “smooth” nonlinear function of the covariates Z. R. J. A. Little et al. (2000) implemented their method based on cubic splines with fixed knots, an analysis that can be conducted with standard program packages for multiple regression if the sample size is large enough. The estimation of $g(Z)$ is based on a polynomial regression model, with additional terms like $Z^2$ and $Z^3$. These polynomials can be understood as new covariates, and the parameterization of the covariate-treatment regression is linear in these covariates. The specification of Equation (2.2) implies that there is no interaction between covariates and the treatment variable because X does not enter the nonlinear function $g(Z)$. In other words, because of additivity of effects (see, e. g., Hastie & Tibshirani, 1990), the parameter $\gamma_{10}$ can still be interpreted as the average total effect, provided that the covariate-treatment regression is unbiased and that the functional form assumption is fulfilled for Equation (2.2) with a specified function $g(Z)$.
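The idea that powers of Z enter as additional covariates in an otherwise linear regression can be sketched as follows. This is an illustration with a plain cubic polynomial, not the fixed-knot spline implementation of Little et al. (2000); the helper `ols`, the toy data and the generating coefficients are assumptions made for this example.

```python
def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Hypothetical pure-Python helper (Gauss-Jordan elimination with partial
    pivoting); adequate for small, well-conditioned illustrations only.
    """
    p = len(design[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in design) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(design, y))]
         for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(p):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [u - f * v for u, v in zip(a[row], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Toy data (made up): nonlinear but additive, y = 1 + z - 2 z^2 + 0.5 z^3 + 1.5 x.
z = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1 + zi - 2 * zi**2 + 0.5 * zi**3 + 1.5 * xi for zi, xi in zip(z, x)]

# Z, Z^2 and Z^3 are treated as three covariates; X enters additively.
design = [[1.0, zi, zi**2, zi**3, float(xi)] for zi, xi in zip(z, x)]
gamma = ols(design, y)
gamma_10 = gamma[4]  # coefficient of the treatment variable
print(gamma_10)
```

Because X does not interact with any of the polynomial terms, the coefficient of X is read off directly as the adjusted effect, which in this noise-free example is the generating value 1.5 up to rounding error.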
Statistical Inference As long as the covariate-treatment regression in Equation (2.2) does not include covariate-treatment interactions and if the regression is estimated by ordinary least-squares, the hypothesis of no average total effect can be tested similarly to the traditional analysis of covariance under the assumptions mentioned above.³ A test statistic for parallelism of the non-parametric regression curves has been developed by Young and Bowman (1995) and was implemented in R by Bowman and Azzalini (2007).
³ Note that a generalization of the regression model of Equation (2.2) was presented by J. M. Robins, Mark, and Newey (1992) as semi-parametric causal regression model (i. e., under the additional assumption of strong ignorability), as an application of semiparametric regression modeling as suggested by Robinson (1988). We do not present semiparametric regression modeling here. For a discussion with respect to nonparametric analysis of covariance see Akritas, Arnold, and Du (2000). Nevertheless, note that the regression presented in Equation (2.2) is a special case of the parameterization of generalized analysis of covariance presented in subsection 1.2 (see also Steyer et al., in press).
Example For the empirical example used in this section to illustrate the different adjustment methods, the function $g(Z)$ could be applied as a flexible model for the regression of the math or language post-test on all pretest measures, i. e., on the multivariate covariates $Z \equiv (Z_1, \ldots, Z_K)$. For this conditioning of the outcome variable Y on the selected confounders Z with a common nonlinear regression, the estimated coefficient $\hat{\gamma}_{10}$ still has the interpretation as an estimator of the adjusted total treatment effect. Unfortunately, nonlinear analysis of covariance was applied neither by Shadish et al. (2008a) nor by Pohl et al. (submitted), probably because the sample sizes were too small (relative to the large number of covariates).
2.2.3 Moderated Regression and Mean-Centering
The regression in Equation (2.2) allows nonlinear dependencies, which are assumed to be parallel between treatment and control group. In order to obtain a model without the assumption of either parallel regression lines or parallel regression curves, a model without additivity can be formulated for the covariate-treatment regression. For the simplest case of two linear regressions conditional on X = j , this model is algebraically equivalent to a multiple regression model with interaction terms, also known as moderated regression (e. g., Cohen & Cohen, 1983). From Equation (2.1) we obtain the parameterization for one covariate and a linear relation within each treatment condition by adding the product term Z · X as an additional regressor:
$$E(Y \mid X, Z) = \gamma_{00} + \gamma_{01} \cdot Z + \gamma_{10} \cdot X + \gamma_{11} \cdot Z \cdot X, \qquad \varepsilon \equiv Y - E(Y \mid X, Z). \tag{2.3}$$
Due to the interaction, the average total effect no longer equals a single regression coefficient. This follows immediately from Equation (2.3), which fits into the decomposition presented in Equation (1.26) with $g_0(Z) = \gamma_{00} + \gamma_{01} \cdot Z$ and $g_1(Z) = \gamma_{10} + \gamma_{11} \cdot Z$. Hence, the average total effect for a simple model with two treatment groups and one covariate is
$$\begin{aligned}
ATE_{10} &= E(g_1(Z)) \\
&= E(\gamma_{10} + \gamma_{11} \cdot Z) \\
&= \gamma_{10} + \gamma_{11} E(Z).
\end{aligned} \tag{2.4}$$
For moderated regression models, an often suggested procedure is to “mean-center” the covariates (see, e. g., Aiken & West, 1996). The appealing improvement of mean-centered covariates is the simple interpretation of $\gamma_{10}$ as average total effect, if mean-centering yields covariates $Z^{*}$ with an unconditional expectation of zero, i. e., with $E(Z^{*}) = 0$ (see also Judd, Kenny, & McClelland, 2001, for the suggestion to center covariates in within-subject designs, Angrist & Pischke, 2009, for a similar suggestion regarding the analysis
of data from a regression discontinuity design, as well as Wooldridge, 2001, for an extended formulation of this idea including the centering for functions of the covariates):
$$\begin{aligned}
ATE_{10} &= \gamma_{10} + \gamma_{11} \cdot 0 \\
&= \gamma_{10}.
\end{aligned} \tag{2.5}$$
Furthermore, a similar simplification can be obtained for the average treatment effect of the treated, if conditional mean-centering yields E (Z∗|X = 1) = 0.
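The effect of unconditional mean-centering can be made explicit by substituting $Z = Z^{*} + E(Z)$ into Equation (2.3). The starred coefficients $\gamma_{00}^{*}$ and $\gamma_{10}^{*}$ below are labels introduced only for this sketch and denote the intercept and the treatment coefficient of the regression on the centered covariate:

```latex
\begin{aligned}
E(Y \mid X, Z)
  &= \gamma_{00} + \gamma_{01} \cdot \bigl(Z^{*} + E(Z)\bigr)
     + \gamma_{10} \cdot X + \gamma_{11} \cdot \bigl(Z^{*} + E(Z)\bigr) \cdot X \\
  &= \underbrace{\bigl(\gamma_{00} + \gamma_{01} E(Z)\bigr)}_{\gamma_{00}^{*}}
     + \gamma_{01} \cdot Z^{*}
     + \underbrace{\bigl(\gamma_{10} + \gamma_{11} E(Z)\bigr)}_{\gamma_{10}^{*}} \cdot X
     + \gamma_{11} \cdot Z^{*} \cdot X .
\end{aligned}
```

Under this relabeling, the slope and the interaction coefficient are unchanged, while the treatment coefficient of the centered model equals $\gamma_{10} + \gamma_{11} E(Z)$, i. e., the average total effect of Equation (2.4) expressed in the coefficients of the uncentered model.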
Statistical Inference Unconditional inference about the average total effect estimated from covariate-treatment regressions with interaction terms (moderated regression models) is discussed in detail in section 3.2.5. To weaken the assumptions of ordinary least-squares regressions, we will introduce different implementations within the framework of structural equation modeling. With respect to the mean-centering approach, note that Equation (2.4) and Equation (2.5) refer to the true population mean of the covariates.
Example To apply the mean-centering approach to the given example, a moderated regression with mean-centered covariates $Z^{*} \equiv (Z_1^{*}, \ldots, Z_K^{*})$ can be specified by the appropriate linear transformation of each pre-treatment covariate $Z_k$. With an increasing number of covariates the model becomes more complex, as an interaction term is also included for each additional covariate.⁴ For mean-centered covariates, the regression coefficient $\hat{\gamma}_{10}$ is an unbiased estimator of the average total effect, provided that the covariate-treatment regression is Z-conditionally unbiased and that the functional form of $E(Y \mid X, Z)$ is specified correctly. For moderated regression models without mean-centered covariates, the average total effect is estimated as average distance (see section 3.2.4).
Neither Shadish et al. (2008a) nor Pohl et al. (submitted) applied a specification of $E(Y \mid X, Z)$ that included covariate-treatment interactions. Nevertheless, empirical applications of the mean-centering approach for the estimation of causal effects can be found, e. g., in Brand and Halaby (2006) as well as in Zanutto (2006).
⁴ Note that even if all covariate-treatment interactions are included, we still rest on assumptions, for example, that no higher-order interaction terms, e. g., interactions between covariates, are necessary to capture the functional form of $E(Y \mid X, Z)$ correctly.
2.2.4 Prediction / Regression Estimates
The average total effect can be identified without mean-centering as the average of the differences between regression predictions, a procedure recently suggested by Schafer and Kang (2008) as an alternative to the analysis of covariance. Regression predictions, i. e., regression estimates, are well known in the survey literature (see, e. g., Cochran, 1977; Lohr, 1999). The predicted scores used in this approach are obtained from J separate group-specific covariate regressions
$$E^{X=j}(Y \mid Z) = \beta_{0j} + \beta_{1j} Z, \qquad \varepsilon \equiv Y - E^{X=j}(Y \mid Z), \tag{2.6}$$
as $y_{ij} = \beta_{0j} + \beta_{1j} z_i$, using the case-specific value $z_i$ of the covariate Z. For each treatment condition j a predicted score $y_{ij}$ is assigned to each case i, i. e., for the comparison of J = 2 treatment groups each unit receives two predicted scores, one under X = 1 and one under X = 0, regardless of the observed treatment assignment. Finally, the average total effect is computed as the mean of the differences between the two predicted scores:
$$ATE_{10} = \frac{1}{N} \sum_{i=1}^{N} (y_{i1} - y_{i0}). \tag{2.7}$$
The sum in Equation (2.7) is taken over all individuals i = 1, ..., N in the sample; consequently, the observed outcomes under treatment and the observed outcomes under control are replaced by the predicted scores as well.⁵ By estimating separate regression models for the treatment group and for the control group, all interactions between the covariates and the treatment variable are included by default. This follows from the fact that the regression coefficients for the J covariate regressions are not constrained to be equal.
Regression estimates, i. e., the estimation of the average total effect as the difference between predicted scores, applies very generally to different kinds of regression models and can therefore be extended very flexibly to model nonlinearities in the covariate-treatment regression. A similar suggestion was made by Wooldridge (2001, p. 609), who pointed out that the conditional regressions $r_j(Z) \equiv E(Y \mid Z, X = j)$ for each X = j are non-parametrically identified, i. e., these conditional expectations depend entirely on “observables”. Hence, when $r_0(Z)$ for treatment X = 0 and $r_1(Z)$ for treatment X = 1 are known, the ATE is identified as
$$ATE_{10} = \frac{1}{N} \sum_{i=1}^{N} \bigl( r_1(Z = z_i) - r_0(Z = z_i) \bigr), \tag{2.9}$$
(see also Imbens, 2004).
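The prediction approach of Equations (2.6), (2.7) and (2.9) can be sketched for one covariate and two groups as follows. The function names `fit_line` and `regression_estimate` and the noise-free toy data are assumptions made for illustration; any other regression model could take the place of the group-specific linear fits.

```python
def fit_line(z, y):
    """Closed-form simple linear regression; returns the fitted function r(z)."""
    n = len(z)
    zbar, ybar = sum(z) / n, sum(y) / n
    slope = (sum((a - zbar) * (b - ybar) for a, b in zip(z, y))
             / sum((a - zbar) ** 2 for a in z))
    intercept = ybar - slope * zbar
    return lambda v: intercept + slope * v

def regression_estimate(z, x, y):
    """Mean difference of group-specific predictions over all N units."""
    r = {}
    for j in (0, 1):
        # Fit the covariate regression separately within treatment group j.
        zj = [zi for zi, xi in zip(z, x) if xi == j]
        yj = [yi for yi, xi in zip(y, x) if xi == j]
        r[j] = fit_line(zj, yj)
    # Every unit receives both predictions, whatever its observed treatment.
    return sum(r[1](zi) - r[0](zi) for zi in z) / len(z)

# Toy data (made up) with an interaction: y = 1 + 2 z + x * (0.5 + 1.0 z).
z = [0.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * zi + xi * (0.5 + 1.0 * zi) for zi, xi in zip(z, x)]
print(regression_estimate(z, x, y))
```

In this example the difference of the two fitted lines is 0.5 + z and the sample mean of z is 2.5, so the estimate agrees with the form of Equation (2.4), 0.5 + 1.0 · 2.5 = 3.0, without any centering of the covariate.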
⁵ Note that for most implementations of the prediction approach, “the average of the predicted treated outcome for the treated”, i. e., $\sum_{i=1}^{N} (x_i y_{i1})$, “is equal to the average observed outcome for the treated”, i. e., $\sum_{i=1}^{N} (x_i y_i)$ [see Imbens, 2004, p. 12]. Accordingly, for the simple models with linear parameterized intercept and effect functions considered in this thesis, the average total effect in Equation (2.7) is equivalent to
$$ATE_{10} = \frac{1}{N} \sum_{i=1}^{N} \bigl( x_i (y_i - y_{i0}) - (1 - x_i)(y_i - y_{i1}) \bigr), \tag{2.8}$$
i. e., this approach is equivalent to simple mean imputation (see Schafer & Kang, 2008).
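The equivalence of Equations (2.7) and (2.8) can be checked numerically. The sketch below assumes group-specific least-squares lines with an intercept, for which the residuals sum to zero within each group; the function names and the toy data are made up for illustration.

```python
def fit_line(z, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(z)
    zbar, ybar = sum(z) / n, sum(y) / n
    slope = (sum((a - zbar) * (b - ybar) for a, b in zip(z, y))
             / sum((a - zbar) ** 2 for a in z))
    return ybar - slope * zbar, slope

def both_estimates(z, x, y):
    """Return the estimates according to Equation (2.7) and Equation (2.8)."""
    coef = {}
    for j in (0, 1):
        zj = [zi for zi, xi in zip(z, x) if xi == j]
        yj = [yi for yi, xi in zip(y, x) if xi == j]
        coef[j] = fit_line(zj, yj)

    def pred(j, zi):
        return coef[j][0] + coef[j][1] * zi

    n = len(z)
    # Equation (2.7): both potential outcomes are predicted for every unit.
    est_27 = sum(pred(1, zi) - pred(0, zi) for zi in z) / n
    # Equation (2.8): only the unobserved outcome is replaced by a prediction.
    est_28 = sum(xi * (yi - pred(0, zi)) - (1 - xi) * (yi - pred(1, zi))
                 for zi, xi, yi in zip(z, x, y)) / n
    return est_27, est_28

# Toy data (made up): outcomes scatter around the group-specific lines.
z = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 5.0, 6.0]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2.0, 2.9, 4.2, 4.8, 5.1, 6.3, 7.8, 9.4]
est_27, est_28 = both_estimates(z, x, y)
print(est_27, est_28)
```

The two values agree up to floating-point error because, within each group, the sum of the fitted values equals the sum of the observed outcomes. For regression models without this property the two formulas can differ.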