
2.3. How Should Reclassification Standards be Set?

Setting reclassification standards requires both defining ELP and deciding what type of evidence is necessary to determine that it has been achieved. The task is both complicated and high-stakes, because it affects which students are reclassified and how those students go on to perform as RFEP students.

As a baseline, Wolf and Farnsworth (2013) stress the importance of establishing the validity of the ELP assessment itself, as reclassification can only be as fair and as valid as the scores that form its primary (or perhaps only) criterion. In particular, they emphasize the importance of (1) appropriately articulating the ELP construct in terms of the language knowledge and skills necessary for academic contexts, and (2) properly aligning ELP standards with ELP assessments, including establishing correspondence between ELP standards and academic standards. They recommend that states collect evidence to support the reliability, construct validity, and consequences of the ELP assessment’s use for reclassification purposes. They also encourage states to consider impact data for different standards and models to see how they affect the number of students who transition, how long students spend as ELs, and how reclassified students go on to perform. In a study of ELs from various grade levels in three different states, Kim and Herman (2009) explored these types of questions by estimating the predicted content score at the reclassification cut point and comparing this predicted value to the average performance of non-ELs.
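The comparison described above can be illustrated with a minimal sketch. It assumes a simple least-squares regression of content scores on ELP scores, which is an assumption made here for illustration and not necessarily the model that Kim and Herman (2009) fit. The function names and arguments are hypothetical.

```python
from statistics import mean


def predicted_content_at_cut(elp_scores, content_scores, elp_cut):
    """Predict the content score at the ELP cut point.

    Assumption: a simple linear regression of content on ELP scores,
    fit by ordinary least squares to the EL sample.
    """
    mean_elp = mean(elp_scores)
    mean_content = mean(content_scores)
    sxx = sum((x - mean_elp) ** 2 for x in elp_scores)
    sxy = sum(
        (x - mean_elp) * (y - mean_content)
        for x, y in zip(elp_scores, content_scores)
    )
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    return mean_content + slope * (elp_cut - mean_elp)


def gap_to_non_el_mean(predicted_score, non_el_content_scores):
    """Difference between the predicted score at the cut and the non-EL mean.

    A negative value means students at the cut are predicted to score
    below the average non-EL student.
    """
    return predicted_score - mean(non_el_content_scores)
```

A gap near zero would suggest that students at the cut point are predicted to perform about as well as the average non-EL, whereas a large negative gap would suggest that the cut may be set too low for that content area.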

Beyond the ELP assessment itself, states must decide which scores to use, both from the ELP assessment and possibly from other sources. In some states, ELP and reclassification are essentially synonymous, while in others, achieving ELP is a trigger to look for further evidence to make a reclassification decision. Ragan and Lesaux (2006) studied the identification and reclassification criteria used in the 10 states and 10 districts with the largest EL enrollments at the time, and found that eight of the ten states and five of eight districts considered additional criteria in exit decisions, such as academic test scores, grades, or teacher or committee recommendation. Some states, such as California and Massachusetts, recommend that districts consult other measures for students who have met the state’s ELP standard (California Department of Education, 2015; Massachusetts Department of Elementary and Secondary Education & DePascale, 2012), whereas others, such as Oregon, Arizona, and the state used for this study, rely solely on ELP scores as a rule.

The use of additional scores can affect reclassification rates, as the California examples referenced in section 2.2 suggest. To use an example from a different state, Carroll and Bailey (2015) illustrate that a stepwise decision process, in which ELP scores serve as the first indicator for reclassification considerations, can also limit which students have the opportunity to transition. In particular, they found that a proportion of ELs in their sample of 875 demonstrated proficient or advanced performance on all of their state’s academic content assessments, but fell short of meeting ELPA performance criteria. Such students, whose numbers varied from 2 to 34 individuals depending on the ELPA decision rule used, would not have been considered for reclassification in their state, since ELPA scores are considered first. The authors suggest, in response, that states should consider both language and content performance simultaneously, rather than sequentially, to ensure that students who may be eligible for reclassification are not overlooked.
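The difference between sequential and simultaneous consideration can be made concrete with a short sketch. The record structure, field names, and performance-level labels below are hypothetical, and the single-cut ELPA rule stands in for whatever decision rule a state actually uses.

```python
from dataclasses import dataclass, field

# Assumption: content performance levels that count as meeting the standard.
MEETS_STANDARD = {"proficient", "advanced"}


@dataclass
class Student:
    student_id: str
    elpa_score: float
    # Hypothetical mapping of subject -> performance level.
    content_levels: dict = field(default_factory=dict)


def meets_all_content(student):
    """True if the student has content results and all meet the standard."""
    levels = student.content_levels.values()
    return bool(student.content_levels) and all(
        level in MEETS_STANDARD for level in levels
    )


def sequential_review(students, elpa_cut):
    """ELPA is screened first; only students at or above the cut are reviewed."""
    return [s for s in students if s.elpa_score >= elpa_cut]


def simultaneous_review(students, elpa_cut):
    """Language and content are considered together; either can trigger review."""
    return [
        s for s in students
        if s.elpa_score >= elpa_cut or meets_all_content(s)
    ]


def overlooked(students, elpa_cut):
    """Students reviewed under the simultaneous rule but not the sequential one."""
    reviewed = {s.student_id for s in sequential_review(students, elpa_cut)}
    return [
        s for s in simultaneous_review(students, elpa_cut)
        if s.student_id not in reviewed
    ]
```

Under these assumptions, `overlooked` returns the group that Carroll and Bailey (2015) describe: students who meet every content standard but are never considered because their ELPA score falls below the cut.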

Cook, Linquanti, Chinen, and Jung (2012) propose a different solution, which is to examine the relationships between ELP scores and content scores, rather than the scores themselves. They specifically recommend that:

Researchers can define “English language proficient” as the point at which EL students’ academic content achievement assessed using English becomes less related to their ELP. That is, there is a point at which EL students have sufficient English language skills to adequately function in English on content assessments; accordingly, there should be observable decreases in the relationship between the two assessments. At or beyond this point is where the ELP performance standard might be considered… (Cook et al., 2012, p. 8)

They propose three quantitative methods for empirically identifying this turning point using real data: decision consistency, logistic regression, and descriptive box plots. All three models require sorting or grouping ELs based on their level of ELP, and then plotting the language-content relationships for these different groups. For decision consistency, they plot the percentage of students scoring in the proficient range in both ELP and content for different levels of ELP, and recommend setting the cut score at or around the level of ELP that has the highest percentage of proficient-proficient agreement. For logistic regression, they predict the probability of scoring in the proficient range on each content assessment, conditional on ELP score. They recommend setting the cut score at the point where the probability of meeting the ELA standard is at or above chance (0.5). For the descriptive box plots, they recommend creating box plots of content performance for each ELP performance level, and setting the cut score at or around the ELP level that is closest to being centered on the content performance cut score (i.e., half the students in the ELP level score above the content cut, and half score below).
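Two of these methods, the logistic regression and the box plot criterion, can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure Cook et al. (2012) implemented: the logistic model has a single predictor (ELP score) and is fit by Newton-Raphson, "centered on the content cut" is read as the ELP level whose median content score is closest to the cut, and all function names are hypothetical.

```python
import math
from statistics import median


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(elp_scores, content_proficient, iterations=25):
    """Fit P(proficient) = sigmoid(b0 + b1 * ELP) by Newton-Raphson.

    content_proficient holds 0/1 indicators. Assumes the two outcome
    groups overlap on ELP score (no perfect separation).
    Returns (b0, b1) on the original ELP scale.
    """
    mean_x = sum(elp_scores) / len(elp_scores)
    centred = [x - mean_x for x in elp_scores]  # centre for stability
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(centred, content_proficient):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_x, b1


def elp_score_at_probability_half(b0, b1):
    """ELP score at which the predicted probability equals 0.5."""
    return -b0 / b1


def level_centred_on_content_cut(content_by_elp_level, content_cut):
    """ELP level whose median content score is closest to the content cut."""
    return min(
        content_by_elp_level,
        key=lambda level: abs(median(content_by_elp_level[level]) - content_cut),
    )
```

For the logistic model, the predicted probability equals 0.5 where b0 + b1 * ELP = 0, so the candidate cut is -b0 / b1. The box plot criterion uses the median because an ELP level is centered on the content cut exactly when half of its students score above the cut and half below.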

Cook et al. (2012) illustrate all three of their methods using two-year data samples for grades 4, 7, or 10 from three different states (n-sizes ranged from 1,120 to 2,563 depending on the grade-level and state). Among other things, their illustration demonstrates both that the predicted point of divergence does occur in all samples, and, equally important, that its placement varies from state to state and grade-level to grade-level based on factors such as the state’s ELP standards, content standards, and the linguistic complexity of the test forms themselves. They also note that their design is premised on the fact that academic content cut scores are typically non-negotiable when ELP cut scores are set; thus, the resulting cut scores should be considered to maximize desirable outcomes conditional on academic content scores. Were both cuts being set concurrently, it is possible that both could be placed more optimally.

Having set standards for ELP and reclassification, the next step for any state or policymaker is to collect evidence for the validity of this standard. In this context, three particularly important validity considerations are (1) which students meet this standard, (2) how long it takes them to get there, on average, and (3) how they fare after they have met it and transitioned out of the EL subgroup. Due to the scope of the current study, only (1) and (2) will be discussed in depth, in the final section of this literature review.

First, however, I pause briefly to discuss some methodological considerations for how reclassification should be studied at all.