Use of the Value-added Method for the Identification of Educational Effectiveness Factors


3.3 The Use of Value-Added in Educational Effectiveness Research

3.3.2 Use of the Value-added Method for the Identification of Educational Effectiveness Factors

Of the 100 studies identified in the educational effectiveness research survey detailed above, 20 studies used this approach to identify school-level effectiveness factors and 29 examined factors at other levels. This first group of studies examined the association between measured educational effectiveness factors and pupil performance after controlling for other (non-school) pupil characteristics. These studies do not seek to estimate the performance of individual schools or the distribution of school differences, and do not use value-added scores as an outcome in their own right. Their methodology does, however, rest on the assumption that, once non-school factors are controlled for, the variance in outcomes predominantly reflects (or at least contains) differences attributable to variable school effectiveness.
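As an illustrative sketch only, the kind of model underlying this approach can be written as a two-level value-added model. The notation is assumed here and is not taken from any of the studies reviewed; individual studies differ in the controls and levels they include.

```latex
% Pupil i in school j; all symbols are illustrative assumptions.
\begin{align*}
y_{ij} &= \beta_0 + \beta_1 a_{ij} + \boldsymbol{\beta}_2^{\top}\mathbf{x}_{ij}
          + \gamma f_j + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)
\end{align*}
```

Here $y_{ij}$ is the outcome, $a_{ij}$ is prior attainment, $\mathbf{x}_{ij}$ collects the other non-school pupil characteristics, $f_j$ is the measured effectiveness factor and $u_j$ is the remaining school-level residual. The coefficient of interest is $\gamma$. Reading it as the effect of the factor relies on the assumption stated above: that the controls have removed the non-school influences on $y_{ij}$.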

The first example discussed here is Chapman and Muijs (2013), who examined the practice of federating (i.e. partnering) schools. They obtained a sample of federated schools and used a statistical matching approach to find statistically similar non-federated schools, which were then compared using multi-level modelling techniques (see the previous chapter). The title, ‘Does school-to-school collaboration promote school improvement? A study of the impact of school federations on student outcomes’, is explicitly causal (Chapman and Muijs, 2013, p. 351) and this claim is repeated in the text of the paper. They found that ‘federation is positively related to performance in the years following federation’ (p. 382) and concluded that ‘federations can have a positive impact on student outcomes and federation impact is strongest where the aim of the federation is to raise educational standards by federating higher and lower attaining schools’ (p. 385). Chapman and Muijs (2013) noted, however, that their conclusion should be treated with caution and observed the difficulties of reaching causal conclusions from correlational evidence. They pointed out that ‘the possibility that differences found reflect non-measured variables cannot be fully discounted’ (p. 358).

Next we look at de Bilde et al. (2013), who also examined the influence of school types, this time comparing the results of alternative and traditional mainstream schools in a longitudinal study. Growth curve analysis was used to model the rates of change from the 3rd year of kindergarten until the 3rd grade (pupil ages ranged from 53 to 87 months) on two measured outcomes: enjoyment and independent participation. Their results showed that there was no difference in enjoyment between alternative and mainstream schools and that equivalent pupils in alternative schools were actually rated lower in terms of independent participation by their teachers. When it came to drawing conclusions from this, de Bilde et al. (2013, p. 229) explicitly ruled out the possibility of causal interpretations, stating, “…although sometimes we referred to the term effect, the correlational nature of the data does not allow for causal interpretations.”

Another example is Melhuish et al. (2013), who looked at the effects of preschools. As in the studies above, the effects of several preschool types (and of not attending preschool) were examined after controlling for a range of non-school factors including family socio-economic status, birth weight, developmental problems and home learning environment. They concluded that certain types of preschool provision, especially higher-quality provision, have a positive impact on pupil performance and called for the expansion of high-quality preschool provision. The question of causal attribution was raised in relation to the observational nature of the study. In this case the authors made a judgement on the extent to which the results were likely to be confounded by unmeasured differences and the limitations of the measures used. This study had a large range of control variables, which proved decisive in favour of their causal conclusion.

Similar studies have been conducted to examine teacher-level effectiveness factors. Vanlaar et al. (2013), for example, measured various aspects of classroom practice and examined how these relate to reading comprehension. They used a multi-level model with a repeated measures design, ‘controlling for student characteristics’ and estimating differential as well as average effects of classroom practices. This is another example of a study in which the limitations and value of the value-added design are clearly noted:

To truly determine whether there is a different effect of certain class-level variables on low- and high-risk students, an experimental design with a control group would result in more certainty. This study was nonetheless useful as it relied on the use of a longitudinal design and a large sample to indicate which variables may be worth studying through such an experimental design.

These studies are all examples of what could be characterised as following an ‘educational effectiveness approach’ (Isac et al., 2013, p. 29). There are differences in the exact statistical models used, how the statistical controls are implemented (e.g. Chapman and Muijs, 2013, who included a separate statistical matching process), whether outcomes are tracked in longitudinal data, as well as many other differences in detail and emphasis. Nevertheless, these studies all share a common approach and the authors have had to consider the limitations associated with this, choosing how to interpret and present the results. There are two key general threats to the strength of the conclusions reached in these papers which stem from limitations of the value-added method. The first is whether the measured variables have done a sufficient job of ruling out confounding factors (i.e. whether the estimates are unbiased). It is possible that the estimates are biased by unmeasured non-school factors which have not been controlled for (omitted variable bias), and likely that schools or teachers adopting certain practices (such as choosing to federate) are somehow different to other schools, leading to a form of selection bias. The second major limitation is whether the relationships identified can be interpreted as causal. Drawing causal conclusions from correlational evidence is a well-known problem (Shadish et al., 2002). Some of the studies reviewed have made strong recommendations for practice based on value-added evidence; others have positioned it more as a step within a larger research process which is capable of identifying factors warranting further study.
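The omitted variable and selection problem can be illustrated with a small simulation. The sketch below is hypothetical and uses invented parameters rather than data from any of the studies reviewed. It assumes that an unmeasured school characteristic both raises outcomes and makes a school more likely to adopt a practice whose true effect is zero. The function names (`ols`, `simulate_bias`) and all coefficient values are assumptions made for illustration.

```python
import random


def ols(columns, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate_bias(n_schools=2000, true_effect=0.0, seed=0):
    """Return the practice coefficient without and with the unmeasured control."""
    rng = random.Random(seed)
    ones = [1.0] * n_schools
    # Hypothetical unmeasured school characteristic (e.g. leadership capacity).
    unmeasured = [rng.gauss(0, 1) for _ in range(n_schools)]
    # Measured control, e.g. mean prior attainment of the intake.
    prior = [rng.gauss(0, 1) for _ in range(n_schools)]
    # Selection: schools high on the unmeasured characteristic adopt more often.
    adopts = [1.0 if u + rng.gauss(0, 1) > 0 else 0.0 for u in unmeasured]
    outcome = [
        0.6 * p + 0.5 * u + true_effect * a + rng.gauss(0, 0.5)
        for p, u, a in zip(prior, unmeasured, adopts)
    ]
    # Value-added style estimate: controls for the measured variable only.
    naive = ols([ones, adopts, prior], outcome)[1]
    # Infeasible benchmark: also controls for the unmeasured characteristic.
    adjusted = ols([ones, adopts, prior, unmeasured], outcome)[1]
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_bias()
    print(f"practice coefficient, measured controls only: {naive:.2f}")
    print(f"practice coefficient, unmeasured factor added: {adjusted:.2f}")
```

Under these assumptions the first estimate is clearly positive even though the practice has no effect, while the second is close to zero. In real studies the second model cannot be fitted, because the relevant characteristic is by definition unmeasured. This is why a large set of controls can reduce, but cannot rule out, this form of bias.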

In document The validity, interpretation and use of school value-added measures (Page 53-56)