small effect (d = 0.2), a medium effect (d = 0.5) and a large effect (d = 0.8) were used for comparing the statistical power of the different implementations of generalized analysis of covariance. Furthermore, to investigate the small sample performance of the adjustment procedures, a condition with no true average total effect (d = 0) was also included in simulation study II. The different levels of the regression coefficients and the different effect sizes used for the data generation are summarized in Table 4.3.
To manipulate the factor heterogeneity of residual variances, the variances of the two normally distributed random variables ε_{δ10} and ε_{X=0} were varied using the values given in Table 4.4. In all conditions of the Monte Carlo simulation, a constant value of 0.5 was used for the residual variance Var(ζ) [see Equation (4.3) for the exact meanings of these terms].
Table 4.4: Residual variances used for data generation in simulations I and II

| Simulation Study | Residual Variance | Selected Values for the Data Generation |
|---|---|---|
| I | Var(ε_{δ10}) | 0.5, 1, 2.5, 5 |
| | Var(ε_{X=0}) | 0.5, 1, 2.5, 5 |
| | Var(ζ) | 0.5 |
| II | Var(ε_{δ10}) | 0.5, 2.5, 5 |
| | Var(ε_{X=0}) | 0.5 |
| | Var(ζ) | 0.5 |
This method of data generation for the outcome model resulted in a larger variance of the outcome variable Y in the treatment group (indicated by X = 1) than in the control group (X = 0) across all generated datasets. This matters because it makes the two conditions with unequal treatment probabilities substantively different: with P(X = 1) = 0.2 the group with the larger outcome variance is the smaller group, whereas with P(X = 1) = 0.8 it is the larger group.
Table 4.5 summarizes all parameters used for data generation. Whenever the factor heterogeneity of between-group residual variances is mentioned in the result presentation, it refers to the ratio of Var(ε_{X=0}) to Var(ε_{δ10}) as used for the data generation in the outcome model. Furthermore, we will use the phrase amount of confounding to distinguish the results obtained under the two different levels of the regression parameter γ_01. Equal group sizes [P(X = 1) = 0.5] and unequal group sizes [P(X = 1) = 0.2 or P(X = 1) = 0.8] are obtained from the coefficients α_0 and α_1 of the assignment model (see Table 4.2). Finally, note that the selection of the two α–parameters also determines the factor dependency of X and Z, labeled as R²_{Y|Z} within the result sections.
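As a small worked illustration of the heterogeneity factor, the following sketch computes the ratio of Var(ε_{X=0}) to Var(ε_{δ10}) for every pair of residual variances listed in Table 4.4. The function and variable names are hypothetical and chosen for this illustration only; exact fractions are used so that equal ratios are recognized as equal.

```python
from fractions import Fraction
from itertools import product

# Residual variances from Table 4.4, kept as strings so that Fraction
# can represent them exactly (no floating-point noise in the ratios).
VAR_EPS_X0 = {"I": ["0.5", "1", "2.5", "5"], "II": ["0.5"]}
VAR_EPS_DELTA10 = {"I": ["0.5", "1", "2.5", "5"], "II": ["0.5", "2.5", "5"]}


def heterogeneity_ratios(study):
    """Map each (Var(eps_X=0), Var(eps_delta10)) pair to its ratio."""
    pairs = product(VAR_EPS_X0[study], VAR_EPS_DELTA10[study])
    return {(x0, d10): Fraction(x0) / Fraction(d10) for x0, d10 in pairs}


for study in ("I", "II"):
    ratios = heterogeneity_ratios(study)
    distinct = sorted(set(ratios.values()))
    print(f"Simulation {study}: {len(ratios)} pairs, distinct ratios:",
          ", ".join(str(float(r)) for r in distinct))
```

Under these values, the 16 pairs of simulation I yield nine distinct ratios between 0.1 and 10, while the three pairs of simulation II yield the ratios 0.1, 0.2 and 1.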
Table 4.5: Summary of the parameters used for data generation in simulations I and II

| Category | Parameter | Description |
|---|---|---|
| Sample Size | N | Number of observations |
| Regression Coefficients | γ_00 / β_00 | Intercept of the covariate-treatment regression in the control group |
| | γ_01 / β_01 | Slope of the covariate-treatment regression in the control group |
| | γ_10 = β_10 − β_00 | Main effect (average total effect when no covariate-treatment interaction is present) |
| | γ_11 = β_11 − β_01 | Covariate-treatment interaction (difference between the slopes of the group-specific covariate-treatment regressions) |
| | β_10 = γ_10 + γ_00 | Intercept of the covariate-treatment regression in the treatment group |
| | β_11 = γ_11 + γ_01 | Slope of the covariate-treatment regression in the treatment group |
| | α_0 | Intercept for the assignment model |
| | α_1 | Slope for the assignment model |
| Expectations | E(Z) = µ_Z | Expectation of the covariate |
| Variances | Var(Z) = σ²_Z = 1 | Variance of the covariate |
| Residual Variances | Var(ε_{X=0}) | Residual variance for the covariate-treatment regression in the control group |
| | Var(ε_{X=1}) | Residual variance for the covariate-treatment regression in the treatment group |
| | Var(ε_{δ10}) | Residual variance for the individual total effect |
| | Var(ζ) | Additional residual variance of the outcome model (uncorrelated with the residual variance for the covariate-treatment regression in the control group) |
| Covariances | Cov(ε_{X=0}, ε_{X=1}) = 0 | Covariance of the residuals for the treatment group-specific regressions of Y on Z |
| | Cov(ε_{X=0}, ε_{δ10}) = 0 | Covariance of the residual for the regression of Y on Z in the control group with the residual of the regression of δ_10 on Z |

4.3 Design of the Simulation Studies

The simulation study was conducted in a fully crossed design, with N_Rep = 1000 replications of each combination of the varied independent parameters. For the first part of the simulation study (bias of the ATE–estimators, standard error bias of the ATE–estimators, and empirical type-I-error rate for the test of the hypothesis ATE = 0), 9216 cells were generated by combining the factor sample size (4 levels, see Table 4.1), the dependency of X and Z and the group size (12 combinations of α_0 and α_1 for the assignment model, see Table 4.2), the regression coefficients of the outcome model (2 · 6 combinations of γ_01 and γ_11, see Table 4.3), and the residual variances of the outcome model [4 · 4 different pairs of Var(ε_{X=0}) and Var(ε_{δ10}), see Table 4.4].
The statistical power and the sample size requirements of the final models (simulation II) were studied in 8640 different cells. According to Table 4.1, the sample size was manipulated with 10 different levels, crossed with 12 combinations of α_0 and α_1 for the assignment model. Furthermore, the outcome model was generated for 2 · 3 different combinations of γ_01 and γ_11 (see Table 4.3) and 3 different residual variances Var(ε_{δ10}) [see Table 4.4]. Finally, 4 different values of the effect size d were used (see Table 4.3).
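The cell counts of the two fully crossed designs can be checked by multiplying the numbers of factor levels. The sketch below enumerates the cells from the level counts stated in the text; the factor labels and function name are hypothetical, and level indices stand in for the actual values listed in Tables 4.1 to 4.4.

```python
from itertools import product

N_REP = 1000  # replications per cell, as stated in the text

# Number of levels per factor. Only the counts matter for this check;
# the labels are illustrative and not taken from the original tables.
DESIGN = {
    "I": {
        "sample_size": 4,
        "assignment_alpha0_alpha1": 12,
        "gamma01_gamma11": 2 * 6,
        "residual_variance_pairs": 4 * 4,
    },
    "II": {
        "sample_size": 10,
        "assignment_alpha0_alpha1": 12,
        "gamma01_gamma11": 2 * 3,
        "var_eps_delta10": 3,
        "effect_size_d": 4,
    },
}


def enumerate_cells(level_counts):
    """Return every cell of the fully crossed design as a tuple of level indices."""
    return list(product(*(range(k) for k in level_counts.values())))


for study, factors in DESIGN.items():
    cells = enumerate_cells(factors)
    print(f"Simulation {study}: {len(cells)} cells, "
          f"{len(cells) * N_REP} generated datasets")
```

This reproduces 4 · 12 · 12 · 16 = 9216 cells for simulation I and 10 · 12 · 6 · 3 · 4 = 8640 cells for simulation II, that is, 9,216,000 and 8,640,000 datasets with 1000 replications per cell.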
For each cell in the first and second part of the Monte Carlo simulation, the following implementations of generalized analysis of covariance were applied to estimate the average total effects and to test the hypothesis ATE = 0 in the N_Rep simulated datasets:
• Two tests of the hypothesis of no average total effect implemented with the general linear hypothesis, either based on the estimated empirical mean of the covariate in the linear hypothesis, or with the true expectation of the covariate (see subsection 3.2.3 for details)
• The hypothesis ATE = 0 tested with the help of the general linear hypothesis and based on the mean-centering procedure, but with heteroskedasticity–adjusted variance-covariance matrices (HC3 and HC4 correction, see page 58 in subsection 3.2.2.1 for details)
• A test statistic for the estimated average total effect, obtained as a regression estimate and computed with the corresponding adjusted standard errors given by Schafer and Kang (2008) [see page 58 in subsection 3.2.2.1 for details]
• The application of the predictive simulation approach suggested by Gelman and Hill (2007) [see page 61 in subsection 3.2.3 for details]
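To make the role of the covariate mean in the first approach concrete, the following sketch computes the point estimate of the average total effect as γ_10 + γ_11 · E(Z), using the definitions γ_10 = β_10 − β_00 and γ_11 = β_11 − β_01 from Table 4.5. It is an illustration under stated assumptions, not the implementation used in the study: the logistic form of the assignment model, the form of the outcome model, and all parameter values (including µ_Z = 1) are hypothetical, since Equation (4.3) and Tables 4.1 to 4.3 are not reproduced here. Standard errors and the hypothesis tests themselves are not covered.

```python
import math
import random
from statistics import fmean


def simulate(n, rng, g00=0.0, g01=1.0, g10=0.5, g11=0.5, mu_z=1.0,
             a0=0.0, a1=0.5, var_x0=0.5, var_d10=0.5, var_zeta=0.5):
    """Draw (z, x, y) triples under an ASSUMED data-generating model."""
    data = []
    for _ in range(n):
        z = rng.gauss(mu_z, 1.0)                     # Var(Z) = 1
        p = 1.0 / (1.0 + math.exp(-(a0 + a1 * z)))   # assumed logistic assignment
        x = 1 if rng.random() < p else 0
        y = (g00 + g01 * z
             + rng.gauss(0.0, math.sqrt(var_x0))
             + rng.gauss(0.0, math.sqrt(var_zeta)))
        if x == 1:                                   # add individual total effect
            y += g10 + g11 * z + rng.gauss(0.0, math.sqrt(var_d10))
        data.append((z, x, y))
    return data


def fit_line(points):
    """Ordinary least squares for y = b0 + b1 * z; returns (b0, b1)."""
    z_bar = fmean(z for z, _ in points)
    y_bar = fmean(y for _, y in points)
    sxx = sum((z - z_bar) ** 2 for z, _ in points)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in points)
    b1 = sxy / sxx
    return y_bar - b1 * z_bar, b1


def estimate_ate(data, mu_z=None):
    """Point estimate g10 + g11 * E(Z) from group-specific regressions of Y on Z."""
    b00, b01 = fit_line([(z, y) for z, x, y in data if x == 0])
    b10, b11 = fit_line([(z, y) for z, x, y in data if x == 1])
    g10, g11 = b10 - b00, b11 - b01
    if mu_z is None:          # plug in the empirical mean of the covariate
        mu_z = fmean(z for z, _, _ in data)
    return g10 + g11 * mu_z


rng = random.Random(2024)
sample = simulate(20000, rng)
print("true ATE under the assumed values:", 0.5 + 0.5 * 1.0)
print("estimate with empirical mean of Z:", round(estimate_ate(sample), 3))
print("estimate with true E(Z):", round(estimate_ate(sample, mu_z=1.0), 3))
```

The two calls to `estimate_ate` differ only in whether the empirical mean or the true expectation of Z is plugged in, which mirrors the two variants of the general linear hypothesis listed above.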
Furthermore, for each generated dataset the hypothesis ATE = 0 was tested with the Wald–test of the non-linear constraint based on the following structural equation models:
• The simple multi-group model with fixed group size, where the mean of the treatment variable is assumed to be a known number (either from the data generation as the true population value or as the estimated empirical mean of the treatment variable X, see subsection 3.3.3.1 for details)
• The elaborated multi-group model as an extension of the simple multi-group model, where the group size is incorporated as an additional estimated model parameter with the KNOWNCLASS–option of Mplus (see subsection 3.3.3.2 for details)
• The approximated multi-group model with augmented variance-covariance matrix of parameter estimates (see subsection 3.3.3.3 for details)
• The simple single group model (with interaction) [see section 3.3.4.1 for details]
• The elaborated single group model, where the implied variance structure is modeled with the help of the random slope approach (see section 3.3.4.2 for details)
To save computational time, the following methods were applied only to the datasets generated in the first part of the simulation study: the simple multi-group model with fixed group size utilizing the true group size P(X = 1) known from the data generation, and the general linear hypothesis / moderated regression approach with a mean-centered covariate, with and without heteroskedasticity–adjusted variance-covariance matrices.