General Linear Hypothesis and Mean-Centering

4.5 Results for the General Linear Model

4.5.1 General Linear Hypothesis and Mean-Centering

According to the research question formulated in subsection 3.4.1, the results of the general linear model are reported here to demonstrate the expected bias of the standard errors of the ATE–estimator caused by the stochasticity of the covariates as well as to provide empirical evidence that mean-centering of covariates does not change the statistical properties of the ordinary least-squares estimated average total effect esti-mator (e. g., the standard error). Furthermore, we chose the robustness of the general linear model to het-erogeneity of residual variance for unequal group sizes as one of the central themes for the simulation study I (see the reasearch question in subsection 3.4.2). Hence, it will be interesting to see if the two violations of the assumptions of the general linear model (i. e., the violation of the fixed-X assumption and the violation of the heteroskedasticity assumption) can be distinguished by contrasting the effects of different parameter constellations of the Monte Carlo simulation.

First of all, note that we obtained the same results for the mean-centering approach and for the pro-cedures based on the general linear hypothesis with respect to the absolute bias of the ATE–estimator [B( dATE10), as defined in Equation (4.6)] and moreover with respect to the relative bias of the standard er-ror of the ATE–estimator [RB£

S.E.( dd ATE10)¤

, as defined in Equation (4.7)] and accordingly also with respect to the empirical type-I-error rates (i. e., the observed rejection frequencies, hRF) for tests of the hypothesis ATE10= 0.⁶

Absolute Bias of the ATE–Estimator The ATE–estimator of the GLH / mean-centering approach was found to be unbiased for all studied conditions of simulation study I.⁷Figure 4.1 presents the B( dATE10) for the GLH / mean-centering approach based on the estimated mean of the covariate on the y–axis, compared to

6Negligible differences between the mean-centering approach and the general linear hypothesis — if one method was implemented based on theGLMimplementation of the general linear model inR, and the other method was implemented based on theLM imple-mentation of the general linear model inR— are reported as additional Figure 3 on page 14, Figure 4 on page 15, Figure 5 on page 16, Figure 6 on page 17, Figure 7 on page 18 and Figure 8 on page 19 of the digital appendix. All results reported in this result section are based on theLMimplementation ofR(see Chambers & Hastie, 1991).

7The absolute biases of the ATE–estimator obtained for the GLH / mean-centering approach are given in the additional Table 1 on page 121, Table 2 on page 122 and Table 3 on page 123 of the digital appendix. Furthermore, the additional Table 4 on page 124 of the digital appendix presents the B ( dATE10) averaged for all conditions of the simulation study I, grouped by different sample sizes. Apart from random fluctuations, the absolute biases of the average total effect estimator decrease as expected for increasing sample sizes.

4.5 Results for the General Linear Model 117

Absolute Bias B( dATE10) of the ATE–Estimator

GLH / Mean-Centering with Estimated Mean of the Covariate vs.

Approximated Multi-Group Model, Grouped by Sample Size

−0.10 0.00 0.10

−0.100.000.050.10

N=100

Approximated Multi−

Group Model

GLH with estimated Z

−0.10 0.00 0.10

−0.100.000.050.10

N=250

Approximated Multi−

Group Model

GLH with estimated Z

−0.10 0.00 0.10

−0.100.000.050.10

N=400

Approximated Multi−

Group Model

GLH with estimated Z

−0.10 0.00 0.10

−0.100.000.050.10

N=1000

Approximated Multi−

Group Model

GLH with estimated Z

Figure 4.1: Absolute bias of ATE–estimator: Scatter plots for a comparison of the GLH / mean-centering approach (estimated mean of the covariate) and the approximated multi-group model, grouped by sample size N

the B( dATE10) for the approximated multi-group structural equation model on the x–axis. The scatter plots are grouped by different values of the sample size N , that means that each sample size condition of the simulation study I (N = 100, N = 250, N = 400, and N = 1000) is plotted separately as a chart. Each symbol in the four charts represents the B( dATE10) for one condition of the simulation study I, located in the chart according to the observed absolute biases of the two mentioned ATE–estimators. It is obvious that both methods are comparable with respect to the absolute bias of the average total effect estimator because all symbols are located perfectly on the diagonal line for each of the presented sample size conditions.

A comparison of the B( dATE10) for two different sample sizes (N = 100 top, versus N = 1000, bottom) is summarized in Figure 4.2. This figure presents the biases for 288 conditions of simulation study I with R²_{X |Z}= 0.75 and γ01= 0.75 for the GLH / mean-centering approach, condensed as level plots.⁸The B( dATE10) is different from zero for conditions with small sample sizes (obvious from a comparison of the first and the third row in the upper part of the figure with the same two rows in the lower part of the figure), especially for unequal group sizes with a treatment probability of P(X = 1) = 0.2.

Replacing the estimated mean of the covariate, i. e., bµZ, with the expectation of the covariate, i. e., E (Z ), yields different biases for the ATE–estimator, especially for data generated with large interaction ef-fects. This observation is presented as scatter plots in Figure 4.3.⁹ Each of the six charts in this figure

8Quantitative results, for instance, the observed bias of the ATE–estimator, are encoded in these level plots by using a color gradient.

The corresponding color key for mapping the printed colors to the observed values is given on the right side of each plot. Level plots are used often to present results because they are useful for judging patterns of variability, see Sarkar, 2008, for more information about level plots and their generation withR.

9Furthermore, the additional Figure 9 on page 20 of the digital appendix compares the observed distribution of B ( dATE10) based on the estimated mean of the covariate in the left column and based on the true expectation of the covariate in the right column,

4.5 Results for the General Linear Model 118

Absolute Bias B( dATE10) of the ATE–Estimator GLH / Mean-Centering with Estimated Mean of the Covariate,

N = 100 vs. N = 1000 [R²_{X |Z}= 0.75 and γ01= 5]

Figure 4.2: Absolute bias of ATE–estimator: Level plots for the GLH / mean-centering approach based on the estimated mean of the covariate [N = 100 vs. N = 1000; R²_{X |Z}= 0.75, γ01= 5 and γ11= 0.75]

presents the B( dATE10) for the GLH / mean-centering procedure with bµZ on the x–axis, and for the GLH /

conditional on the value of the interaction parameter γ11used for data generation (rows). Obviously, the differences between the to GLH / mean-centering procedures increase with the interaction term γ11.

4.5 Results for the General Linear Model 119

Absolute Bias B( dATE10) of the ATE–Estimator

GLH / Mean-Centering with True Expectation of the Covariate vs. GLH / Mean-Centering with Estimated Mean of the Covariate, Grouped by Interaction

−0.10 0.00 0.10

−0.100.000.050.10

γ11=0.5