How Should ELP Be Defined and Measured?

Because of the way NCLB is phrased, states report anywhere from two to nine different scores from their ELP assessments. Currently, the most common model (used in 42 states) is to report separate scores for the four domains of reading, writing, listening, and speaking, plus a composite comprehension score (a combination of listening and reading) and an overall proficiency score (a combination, sometimes weighted, of all four domains) (Faulkner-Bond, Shin, Wang, Zenisky, & Moyer, 2013). Although states may report all of these scores to students and teachers, they may use only some of them for the specific purposes of ELP standard setting and reclassification. Thus, the first decision a state must make in designing a reclassification process is which ELP measures to consider for reclassification, and how to combine them.

One option is to combine all ELP subscores into one overall proficiency score and use this score as the sole indicator of students’ ELP. Using an overall score represents a compensatory approach, which is premised on the idea that relative weaknesses in certain language domains can be tolerated if they are offset by strengths in other language domains. If treating the domains as interchangeable in this way does not seem believable (conceptually or in practice), states may instead make the overall proficiency score a weighted average of the four domains, giving more weight to domains that are shown or believed to be more important for academic success. In Texas, for example, 75 percent of a student’s overall ELP level is based on their performance on the state’s ELP reading test (Texas Education Agency & Pearson, 2012). One advantage of the compensatory approach is that the overall score is based on the most items and is thus the most reliable score from a decision consistency standpoint. In other words, students are less likely to be classified differently were they to retest.
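To make the compensatory logic concrete, the sketch below computes an overall score as a weighted average of the four domain scores. It is a minimal illustration, not any state’s scoring procedure: the student’s domain scores are invented, and only the 75 percent reading share comes from the Texas example above; how the remaining 25 percent is split across listening, speaking, and writing is a hypothetical assumption.

```python
from typing import Dict, Optional

DOMAINS = ("listening", "speaking", "reading", "writing")


def composite_score(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the four domain scores (a compensatory composite).

    With no weights, every domain counts equally, so a high score in one
    domain offsets a low score in another one-for-one.
    """
    if weights is None:
        weights = {d: 1.0 for d in DOMAINS}
    total_weight = sum(weights[d] for d in DOMAINS)
    return sum(scores[d] * weights[d] for d in DOMAINS) / total_weight


# Hypothetical student: strong oral skills, weak reading.
student = {"listening": 4, "speaking": 4, "reading": 2, "writing": 3}

# Hypothetical weights: reading carries 75% (15 of 20); the rest is assumed.
reading_heavy = {"listening": 1, "speaking": 1, "reading": 15, "writing": 3}

print(composite_score(student))                 # 3.25
print(composite_score(student, reading_heavy))  # 2.35
```

Under equal weights, the student’s listening and speaking scores partly offset the weak reading score; under the reading-heavy weighting, the same profile produces a much lower composite, which is the intended effect of weighting domains by their presumed academic importance.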

The next step along this continuum is what Carroll and Bailey (2015) refer to as a combination decision rule, under which students who meet an overall ELP standard also must score above certain minima in each domain to be reclassified; for example, a student must have an overall ELP level of “proficient” and must not score in the lowest performance level for any language domain. Finally, if it is believed or shown that students need to demonstrate mastery in all domains to be truly proficient, states may use a conjunctive decision rule, under which students must earn proficient scores on all ELP subtests to be eligible for reclassification. A drawback of combination and conjunctive models is that they require making decisions based on subtests that may be relatively short (e.g., 15 to 20 items). Because test length directly affects reliability and the standard error of measurement, there is a higher chance that students might hit or miss a cut score because of random error rather than actual linguistic proficiency (or a lack thereof). This can blur the distinction between students who are and are not reclassified; that is, students whose performance is more or less indistinguishable may end up on either side of the cut score simply because of measurement error.
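The reliability argument can be illustrated with two standard classical test theory formulas: the Spearman-Brown prophecy formula, which predicts how reliability changes when a test is shortened, and the standard error of measurement, SEM = SD × √(1 − reliability). This sketch assumes normally distributed measurement error and, for simplicity, the same score SD for the full test and the subtest; the item counts, reliability, SD, cut score, and true score are all hypothetical values, not properties of any real ELP test.

```python
import math
from statistics import NormalDist


def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability after changing test length by `length_ratio`."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


def standard_error(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def prob_pass(true_score: float, cut: float, sem: float) -> float:
    """P(observed score >= cut), assuming observed = true + N(0, sem) error."""
    return 1 - NormalDist(mu=true_score, sigma=sem).cdf(cut)


# Hypothetical values: an 80-item full test vs. a 20-item domain subtest,
# both assumed to be reported with SD = 10 and a cut score of 50.
FULL_ITEMS, SUBTEST_ITEMS = 80, 20
FULL_RELIABILITY = 0.92
SCALE_SD, CUT = 10.0, 50.0
TRUE_SCORE = 52.0  # a student whose true proficiency is just above the cut

sub_reliability = spearman_brown(FULL_RELIABILITY, SUBTEST_ITEMS / FULL_ITEMS)
for label, rel in (("full test", FULL_RELIABILITY), ("subtest", sub_reliability)):
    sem = standard_error(SCALE_SD, rel)
    p = prob_pass(TRUE_SCORE, CUT, sem)
    print(f"{label}: reliability={rel:.2f}, SEM={sem:.2f}, "
          f"P(falls below cut by chance)={1 - p:.2f}")
```

With these assumed numbers, the shorter subtest has lower predicted reliability and a larger SEM, so the same student is noticeably more likely to land below the cut on any given administration through error alone.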

Although standard-setting panels may consider the practical impact of different proficiency models when making their final decisions, few published studies have directly compared the impact of different proficiency definitions on student outcomes and performance. In a descriptive study of 875 ELs and 92 non-ELs in grade 5, Carroll and Bailey (2015) illustrated how decision rules can lead to sizable differences in the number and proportion of EL students who are recommended for reclassification. In particular, they found that conjunctive rules, which impose minimum performance requirements on all subtests (e.g., students must meet a cut score on every domain subtest), tend to identify the most students as non-proficient, whereas compensatory models that focus only on the overall proficiency score identify the fewest non-proficient students.
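The three decision rules can be compared side by side on the same scores. In this minimal sketch, the 1-5 level scale, the cut levels, the use of an unweighted mean as the overall level, and the five students are all hypothetical; it illustrates the mechanics of the rules, not Carroll and Bailey’s data or any state’s policy.

```python
from typing import Callable, Dict

Levels = Dict[str, int]  # domain -> performance level on a 1-5 scale

OVERALL_CUT = 4   # hypothetical "proficient" overall level
DOMAIN_CUT = 4    # hypothetical "proficient" level for each domain subtest
LOWEST_LEVEL = 1  # hypothetical lowest performance level


def overall_level(levels: Levels) -> float:
    # Hypothetical: the overall level is the unweighted mean of domain levels.
    return sum(levels.values()) / len(levels)


def compensatory(levels: Levels) -> bool:
    # Only the overall level matters.
    return overall_level(levels) >= OVERALL_CUT


def combination(levels: Levels) -> bool:
    # Overall proficient AND no domain in the lowest performance level.
    return compensatory(levels) and min(levels.values()) > LOWEST_LEVEL


def conjunctive(levels: Levels) -> bool:
    # Proficient on every domain subtest.
    return all(level >= DOMAIN_CUT for level in levels.values())


def profile(listening: int, speaking: int, reading: int, writing: int) -> Levels:
    return {"listening": listening, "speaking": speaking,
            "reading": reading, "writing": writing}


students = {
    "A": profile(5, 5, 4, 4),
    "B": profile(5, 5, 5, 1),  # one very weak domain
    "C": profile(5, 5, 3, 3),  # strong oral skills, weaker literacy
    "D": profile(3, 3, 3, 3),
    "E": profile(4, 4, 4, 4),
}

RULES: Dict[str, Callable[[Levels], bool]] = {
    "compensatory": compensatory,
    "combination": combination,
    "conjunctive": conjunctive,
}

for name, rule in RULES.items():
    n = sum(rule(lv) for lv in students.values())
    print(f"{name}: {n} of {len(students)} reclassified")
```

This prints 4, 3, and 2 reclassified students for the compensatory, combination, and conjunctive rules, respectively. In this sketch each rule is at least as strict as the one before it, so the count of students flagged as non-proficient rises in the same direction as the pattern Carroll and Bailey (2015) describe.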

Some studies with California data have also evaluated which scores tend to be the limiting factor for students’ reclassification at different grade levels. California uses a complex model in which students must meet three separate criteria: (1) an overall ELP score at or above the state’s performance level 4 (out of 5), (2) no domain subtest score below level 3 (out of 5), and (3) performance at level 2 (out of 4) or higher on the state’s ELA assessment. In a longitudinal study of over 200,000 ELs from the Los Angeles Unified School District (LAUSD), Thompson (2012) found that the ELP reading subtest was the most likely barrier to reclassification for students in her sample up through grade 5, after which the ELA content assessment became the limiting score. Robinson (2011) reached a similar conclusion in a study of 39,736 California ELs (all from one unnamed district): he also observed that the ELP reading score held back the largest proportion of potentially eligible students in fourth, fifth, and sixth grade (between 30 and 40 percent of students), whereas the ELA assessment limited between 40 and 50 percent of eligible students in grades 7 through 10. Umansky and Reardon (2014) likewise observed that the ELA content assessment became the limiting factor for reclassification starting in 6th grade in their study of 5,423 California ELs from one district.
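The idea of a “limiting factor” can be sketched by checking each of the three California criteria separately and tallying which ones students fail. The thresholds follow the description above (treating criterion 1 as level 4 or higher); the `Student` record, the sample students, and the two tallies (every unmet criterion, and criteria that are a student’s only barrier) are hypothetical, and the cited studies may operationalize “limiting factor” differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Student:
    overall_elp: int         # overall ELP level, 1-5
    domains: Dict[str, int]  # ELP domain subtest levels, 1-5
    ela: int                 # ELA content assessment level, 1-4


def unmet_criteria(s: Student) -> List[str]:
    """Return the reclassification criteria this student does not meet."""
    unmet = []
    if s.overall_elp < 4:                    # (1) overall ELP at level 4 or higher
        unmet.append("overall ELP")
    for domain, level in s.domains.items():  # (2) no domain below level 3
        if level < 3:
            unmet.append(f"ELP {domain}")
    if s.ela < 2:                            # (3) ELA at level 2 or higher
        unmet.append("ELA content")
    return unmet


def dom(listening: int, speaking: int, reading: int, writing: int) -> Dict[str, int]:
    return {"listening": listening, "speaking": speaking,
            "reading": reading, "writing": writing}


# Hypothetical students (not drawn from any cited study).
students = [
    Student(4, dom(4, 4, 2, 3), 2),  # misses only the reading floor
    Student(4, dom(4, 4, 3, 3), 1),  # misses only the ELA criterion
    Student(5, dom(5, 5, 4, 4), 3),  # meets all three criteria
    Student(3, dom(3, 3, 2, 3), 1),  # misses several criteria
    Student(4, dom(4, 4, 2, 4), 2),  # misses only the reading floor
]

results = [unmet_criteria(s) for s in students]
all_barriers = Counter(c for unmet in results for c in unmet)
sole_barriers = Counter(unmet[0] for unmet in results if len(unmet) == 1)

print("reclassified:", sum(not unmet for unmet in results))
print("every unmet criterion:", all_barriers.most_common())
print("only barrier:", sole_barriers.most_common())
```

Counting only students who miss a single criterion comes closer to asking which score stands between an otherwise eligible student and reclassification; counting every unmet criterion also includes students who would remain ELs for several reasons at once.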

Findings like these underscore the impact, intended or unintended, of using a conjunctive model rather than a compensatory one. For example, since neither […] students, this suggests that, had California used a compensatory model rather than its conjunctive one, many more students would have been reclassified than under the current system. This is neither good nor bad per se, but it does illustrate how different ways of defining ELP can affect students’ time in the EL subgroup. These findings also point to the impact of incorporating content assessment scores into reclassification decisions, in addition to ELP scores; I discuss this idea further in section

In document Who is like Whom? Reclassification and Performance Patterns for Different Groupings of English Learners (Page 31-34)