
CHAPTER 2. EFFECT OF CORRELATION ON THE ESTIMATION OF

2.3 Simulations

Within education data, many different multilevel data situations (longitudinal and hierarchical) are common. This article applies joint estimation methods to the education context. In this section we compare a simultaneous approach with an independent approach for both types of situations, considering several two-level scenarios. The goal is to evaluate the gains in efficiency from using a simultaneous modeling approach over a univariate approach. We emulate a standardized testing scenario in which each student is measured on at least two different tests. We consider different correlations between the random effects as well as different sample sizes.

2.3.1 Hierarchical Models

In the context of cluster randomized trials, the treatment is applied at the highest level. Within the two-level paradigm, we will consider both longitudinal and hierarchical situations that occur in educational research where the treatment is applied at the highest level. While it is not required that the covariates be the same for each model, in this simulation the models were generated using the same covariates with different values for the fixed effects. The values for the variances were selected to be similar to the observed values in the example data set examined in this article.

2.3.1.1 Two-Level

A two-level hierarchical model is considered for both outcomes. The covariates and parameters were selected to closely mimic reasonable effects that may be seen when analyzing standardized test scores. The variables are a linear grade level (G), a binary treatment variable (T), and two other binary covariates meant to mimic demographic indicator variables (x3, x4). The binary variables are generated using a Bernoulli random number

class level and all students in that class are exposed to the treatment. Equal numbers of units are randomized to treatment and control. Since each classroom is assumed to teach one grade, a random grade effect was not included. The model for both outcomes is as follows:

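\[
\begin{aligned}
y_{1ij} &= \beta_0 + b_{0i} + \beta_1 G_{1i} + \beta_2 T_{1i} + \beta_3 x_{3ij} + \beta_4 x_{4ij} + e_{1ij} \\
y_{2ij} &= \beta_5 + b_{1i} + \beta_6 G_{2i} + \beta_7 T_{2i} + \beta_8 x_{3ij} + \beta_9 x_{4ij} + e_{2ij}
\end{aligned}
\tag{2.12}
\]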

where $i = 1, \ldots, n$ indexes the classrooms and $j = 1, \ldots, s$ indexes the students in each classroom, for a total of $ns$ unique students providing $2ns$ test scores. $G_i$ and $T_i$ are classroom-level fixed effects. $G_i$ takes the values 0, 1, 2, 3, where each one-unit increase emulates an increase of one grade level. Each $T_i$ represents the presence or absence of a treatment. $x_3$ and $x_4$ are simulated binary variables at the student level emulating the presence of various learning indicators. The random effects are distributed as $(b_{0i}, b_{1i})' \sim \mathrm{MVN}(0, D)$ and the errors as $(e_{1ij}, e_{2ij})' \sim \mathrm{MVN}(0, \Sigma)$. The random effects are independent of the errors. The covariance matrices take the form

\[
D = \begin{pmatrix} 20 & 20\rho \\ 20\rho & 20 \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix} 256 & 256\rho \\ 256\rho & 256 \end{pmatrix}
\]

where correlations of $\rho$ = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 are considered. The covariances were selected to be similar to empirical covariances observed from univariate models. The parameter values for the fixed effects were set at $\beta$ = (150, 15, 4, 7, 2, 150, 15, 4, 4, 6), based on estimates from the example data set in Section 2.5. Simulations were conducted for $n = 50$ and $s = 15$. The sample size was selected to reduce computation time; a sample of 50 teachers with 15 students in each class seemed reasonably close to values that might be observed in a real study. To evaluate the gain from increasing the number of teachers or the classroom size, additional simulations were conducted with the number of teachers doubled ($n = 100$) or the number of students doubled ($s = 30$). The correlation was fixed at 0.6 in the additional simulations.
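The data-generating process for model (2.12) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: the function name `simulate_hierarchical` and its return layout are hypothetical, the Bernoulli probabilities 0.3 and 0.5 are borrowed from Section 2.3.2.1 because the values used here are not recoverable from the text, and grade levels are assumed to be drawn uniformly from 0 to 3.

```python
import numpy as np

def simulate_hierarchical(n=50, s=15, rho=0.6, seed=0):
    """Draw one data set from model (2.12): n classrooms with s students each."""
    rng = np.random.default_rng(seed)

    # Fixed effects as stated in the text: beta_0..beta_4 and beta_5..beta_9.
    beta1 = np.array([150.0, 15.0, 4.0, 7.0, 2.0])
    beta2 = np.array([150.0, 15.0, 4.0, 4.0, 6.0])

    # Covariance matrices D (random intercepts) and Sigma (errors).
    corr = np.array([[1.0, rho], [rho, 1.0]])
    D = 20.0 * corr
    Sigma = 256.0 * corr

    # Classroom-level covariates.
    grade = rng.integers(0, 4, size=n)          # ASSUMPTION: uniform over 0..3
    treat = rng.permutation(np.arange(n) % 2)   # equal split (n even), random order
    b = rng.multivariate_normal(np.zeros(2), D, size=n)   # rows are (b_0i, b_1i)

    # Student-level covariates and correlated errors.
    x3 = rng.binomial(1, 0.3, size=(n, s))      # ASSUMPTION: p = 0.3
    x4 = rng.binomial(1, 0.5, size=(n, s))      # ASSUMPTION: p = 0.5
    e = rng.multivariate_normal(np.zeros(2), Sigma, size=(n, s))  # shape (n, s, 2)

    G = grade[:, None]   # broadcast classroom values across students
    T = treat[:, None]
    y1 = (beta1[0] + b[:, [0]] + beta1[1] * G + beta1[2] * T
          + beta1[3] * x3 + beta1[4] * x4 + e[..., 0])
    y2 = (beta2[0] + b[:, [1]] + beta2[1] * G + beta2[2] * T
          + beta2[3] * x3 + beta2[4] * x4 + e[..., 1])
    return {"grade": grade, "treat": treat, "x3": x3, "x4": x4,
            "y1": y1, "y2": y2}

data = simulate_hierarchical(n=50, s=15, rho=0.6, seed=1)
print(data["y1"].shape, data["y2"].shape)
```

Calling the function with `n=100` or `s=30` reproduces the two enlarged designs described above, and looping over `rho` covers the correlation grid.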

2.3.2 Longitudinal Models

Since most interventions in education and other social sciences are not one-time events, but instead are a process that is administered over time, it is common for multiple measures to be recorded throughout the duration of the study. Simulations in this article are conducted to evaluate the performance of the simultaneous estimation approach with two outcome variables. The covariates are the same for each model, including covariates at the lowest and highest levels. Only a balanced design with the treatment assigned at the highest level is considered. The values for the variances were selected to be similar to the observed values in the example data set examined in this article.

2.3.2.1 Two-Level

A two-level longitudinal model includes only the repeated measures and the group experimental units as the two sources of variation. The treatment variable is a student-level variable that is time dependent. At time 0, all students are considered to be unexposed to the treatment. At all future times, half the students are assigned to the treatment group and half are assigned to the control group. Two time-independent covariates were created to emulate student demographic covariates. These were again created using a Bernoulli distribution, with probabilities 0.3 and 0.5 respectively. A random grade level effect is included to describe improvements due to time. These choices led to the following model for the two outcomes:

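\[
\begin{aligned}
y_{1ij} &= \beta_0 + b_{0i} + (\beta_1 + b_{1i}) G_{1ij} + \beta_2 T_{1ij} + \beta_3 x_{3ij} + \beta_4 x_{4ij} + e_{1ij} \\
y_{2ij} &= \beta_5 + b_{2i} + (\beta_6 + b_{3i}) G_{2ij} + \beta_7 T_{2ij} + \beta_8 x_{3ij} + \beta_9 x_{4ij} + e_{2ij}
\end{aligned}
\tag{2.13}
\]

where $i = 1, \ldots, n$ indexes the $n$ students, each having $j = 1, \ldots, r$ repeated scores on each outcome. In the completely balanced case there are $2nr$ test scores. The random effects are distributed as $(b_{0i}, b_{1i}, b_{2i}, b_{3i})' \sim \mathrm{MVN}(0, D)$ and the errors as $(e_{1ij}, e_{2ij})' \sim \mathrm{MVN}(0, \Sigma)$. The random effects are independent of the errors. The covariance matrices take the form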

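\[
D = \begin{pmatrix}
240 & 53.67\rho & 244.95\rho & 48.99\rho \\
53.67\rho & 12 & 54.77\rho & 10.95\rho \\
244.95\rho & 54.77\rho & 250 & 50\rho \\
48.99\rho & 10.95\rho & 50\rho & 10
\end{pmatrix},
\qquad
\Sigma = \begin{pmatrix} 256 & 256\rho \\ 256\rho & 256 \end{pmatrix}
\]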

where correlations of $\rho$ = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 are considered. The parameter values for the fixed effects were set at $\beta$ = (150, 15, 4, 7, 2, 150, 15, 4, 4, 6). Simulations were conducted with $n = 50$ and $r = 4$. Additional simulations were performed with the correlation fixed at 0.6 but with either the number of repeated measures doubled ($r = 8$) or the number of subjects doubled ($n = 100$). The values for the fixed effects were selected to be similar to the values in the example in this article.
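The off-diagonal coefficients of the 4 x 4 matrix D are consistent with a common correlation applied to the variances 240, 12, 250, and 10, that is, $D_{kl} = \rho\sqrt{D_{kk} D_{ll}}$ (for example, $\sqrt{240 \cdot 12} \approx 53.67$). The sketch below rebuilds D that way and draws one data set from model (2.13). It is an illustration under assumptions rather than the original simulation code: the function names are hypothetical, the grade variable is assumed to equal the measurement occasion (0, 1, ..., r - 1) for both outcomes, and the treatment indicator is assumed to be shared by both outcomes.

```python
import numpy as np

def random_effects_cov(variances, rho):
    """Covariance matrix with the given variances and one common correlation rho."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    corr = np.full((sd.size, sd.size), float(rho))
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(sd, sd)

def simulate_longitudinal(n=50, r=4, rho=0.6, seed=0):
    """Draw one data set from model (2.13): n students with r occasions each."""
    rng = np.random.default_rng(seed)
    D = random_effects_cov([240, 12, 250, 10], rho)
    Sigma = random_effects_cov([256, 256], rho)

    b = rng.multivariate_normal(np.zeros(4), D, size=n)            # (n, 4)
    e = rng.multivariate_normal(np.zeros(2), Sigma, size=(n, r))   # (n, r, 2)

    G = np.tile(np.arange(r), (n, 1))           # ASSUMPTION: grade = occasion index
    group = rng.permutation(np.arange(n) % 2)   # half the students in each arm
    T = group[:, None] * (G > 0)                # nobody is exposed at time 0

    # Time-independent student covariates, probabilities as stated in the text.
    x3 = rng.binomial(1, 0.3, size=n)[:, None]
    x4 = rng.binomial(1, 0.5, size=n)[:, None]

    y1 = (150 + b[:, [0]] + (15 + b[:, [1]]) * G + 4 * T
          + 7 * x3 + 2 * x4 + e[..., 0])
    y2 = (150 + b[:, [2]] + (15 + b[:, [3]]) * G + 4 * T
          + 4 * x3 + 6 * x4 + e[..., 1])
    return {"D": D, "G": G, "T": T, "y1": y1, "y2": y2}

# With rho = 1 the entries equal the coefficients that multiply rho in D.
print(np.round(random_effects_cov([240, 12, 250, 10], 1.0), 2))
out = simulate_longitudinal(n=50, r=4, rho=0.6, seed=1)
print(out["y1"].shape)
```

Because D is a common-correlation matrix rescaled by standard deviations, it stays positive definite for every value of ρ in the grid, so the multivariate normal draw is well defined throughout.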
