Chapter 2 – Exploration of evaluation methodology
2.5 What is the purpose of evaluation?
Evaluations can serve a wide variety of purposes, not least because of the range of audiences with a potential interest in evaluation findings (Weiss, 1986). I shall explore the issues relating to different audiences in the next section below, but it is first worth examining some of the debate around the general purposes of
evaluation.
Evaluations are often characterised as being either formative, providing learning for improvement within a programme, or summative, providing an overall
assessment of success or failure. As Stake describes this distinction, “When the cook tastes the soup, that's formative evaluation; when the guest tastes it, that's summative evaluation.” (cited in Scriven, 1991: 19). These categories of formative and summative evaluation are closely paralleled by Chelimsky’s (1997: 11-14) notion of evaluation for developmental purposes, providing feedback on a policy to improve it, and evaluation for accountability, to provide evidence to decision
makers about the success of a policy, in order to inform decisions.
Whilst this distinction between summative evaluation for accountability and
formative evaluation for development is enormously useful in considering what the purpose of an evaluation might be and therefore how it should be designed, there is also a risk that it becomes too readily equated with a simplistic divide between policy-makers and practitioners. As Weiss suggests:
"The assumption is that by providing 'the facts,' evaluation assists decision-makers to make wise choices among future courses of action. Careful and unbiased data on the consequences of programs should improve decision-making." (Weiss, 1993: 93-4)
However, as the previous sections have suggested, the idea that evaluation can provide straightforward black and white assessments of success or failure is significantly challenged by the complexities of social policy. Moreover, even for those who are less cynical about the possibility of summative assessments of policies, there is a substantial body of literature questioning the notion of a rational, linear policy-making process from summative evaluation to
evidence-based decisions. As noted earlier in relation to policy complexity, some elements of policy may be shaped from the bottom up (Sabatier, 1997; Lipsky, 1997). There has also been a long-running debate about the extent to which the policy-making process is rational or more incremental, developing gradually rather than making wholesale changes at clear decision points (Simon, 1947; Lindblom, 1959; Etzioni, 1967).
Thus it may make more sense to see the boundary between formative and summative evaluation as somewhat blurred, since the process of policy
development relies as much on learning and adaptation within an evolving policy framework as it does on broad assessments of ‘what works’. As Majone
expresses it:
"The real challenge for evaluation research and policy analysis is to develop methods of assessment that emphasise learning and adaptation rather than expressing summary judgements of pass or fail." (Majone, 1988, cited in Weiss, 1998: 20)
This is clearly reflected in Weiss' notion of the 'enlightenment' model of research utilisation, whereby studies gradually percolate into policy circles over time, rather than the 'knowledge-driven' model where research has direct and instant effects on policy (Weiss, 1986: 31). Similarly, although Cook (1997: 40) argues that talk of a 'crisis' in evaluation, caused by the realisation that policy makers do not always use findings in a linear, rational process, is over-blown, he also points to the 'educational' use of evaluation findings.
This analysis, which blurs the lines between summative and formative evaluations, and between accountability and developmental purposes, chimes with theory-based evaluation approaches. Thus Connell and Kubisch (1998: 38) argue that this is exactly what a Theory of Change approach attempts to do. The suggestion is that such an approach serves a formative purpose by sharpening the planning and implementation of initiatives, focusing discussion on outcomes and theories of how they can be achieved, but also clarifies the measurement of outcomes at various stages and addresses issues of attribution, thus providing summative evaluation data. Likewise, Pawson and Tilley (1997: 207) suggest that Realist
Evaluation could operate in a similar fashion, cutting across the boundary between summative and formative evaluation, as the evaluator feeds back findings on effective context-mechanism-outcome configurations to policy makers and
practitioners alike. By contrast, experimental approaches which do not explore the
‘black box’ of policy implementation would seem to struggle to provide formative feedback, although the suggestion from Ludwig et al (2011) about the possibility of using experimental approaches to test particular mechanisms within an overall theory perhaps opens a formative door, potentially utilising experimentation within a theory-based framework.
This possibility of providing both formative and summative findings from a theory-based evaluation opens up a further question, however, relating to the previous section’s discussion of the importance of context. Whilst a strictly formative evaluation may be of use within a particular programme, summative evaluation is generally of use to the extent that it can enable lessons to be transferred,
facilitating the expansion of a programme into new contexts, or providing learning for other programmes. Hence it is important to consider the extent to which
evaluations may provide generalisable findings, or as Chelimsky (1997: 11-14) expresses it, the generation of knowledge which goes beyond the immediate accountability and development purposes relating to the specific policy or programme.
As Pawson and Tilley (1997) themselves point out, there is a significant challenge in generalising evaluation findings, since even a very clear finding from one study that a policy appears to work will not mean that the 'same' policy can be
transferred to new locations, new points in time, or new groups of people:
“The social world (bless it) will always conspire to throw up changes, differences, and apparent anomalies from trial to trial” (Pawson and Tilley, 1997: 116)
However, both Theories of Change and Realist Evaluation attempt to address this challenge, in different ways. For ToC approaches, the key factor is the strength of the overall model and the level of detail, which helps to identify how it may apply or differ in a new context. As Granger expresses it:
"Armed with a strong theory, evaluators are better prepared to
anticipate and then examine how between-site variations may shape effects." (Granger, 1998: 240)
Hence, although a particular Theory of Change may not be straightforwardly applicable in another context, a strong model should enable both policy makers and practitioners to make reasonable decisions about extending or amending a programme.
For Realist Evaluation, Pawson and Tilley emphasise the importance of
'cumulation' of findings with regard to specific context-mechanism-outcome (CMO) configurations (1997: 115). Rather than attempting to 'pile the bricks' of
experimental studies on whole policies, the Realist Synthesis approach is to explore CMO configurations using evidence from a range of studies, to provide robust theories that can potentially be applied across different policy areas (Pawson, 2006).
Neither of these approaches to generalisation of evaluation findings is entirely unproblematic, however. As Milligan et al suggest, drawing on their experience of using ToC methodologies, there can be significant challenges in developing theories with sufficient depth:
"Without the detailed steps that are currently missing from the theories, it will be difficult to produce the compelling evidence stakeholders need in allocating resources among promising initiatives." (Milligan et al, 1998: 83)
The issue for generalisation from Realist Evaluation may be less a lack of detail in the theory than the risk that “learning more and more about less and less” (Pawson and Tilley, 1997: 198) creates theories so specific to narrow contexts that they are of little use, despite Pawson’s emphasis on ‘middle-range theory’.
One possible solution to each of these difficulties is to follow Blamey and
Mackenzie’s suggestion of combining them:
“there is no obvious reason for believing that Theories of Change and Realistic Evaluation could not coexist within the one programme evaluation, with the former providing broad strategic learning about implementation theory and the latter bearing down on smaller and more promising elements of embedded programme theory.” (Blamey and Mackenzie, 2007: 451)
Thus the idea is that a strong theory of change, fortified by a range of CMO
configurations which have been tested across various policy areas, would provide a good basis for generalising findings into new settings by highlighting what should work and what may need adjustment. This could augment the potential for ToC methodology in particular to offer formative feedback through the evaluation process and for each of the approaches to offer summative findings in relation to particular programmes. In order to assess the potential merits or difficulties of this suggestion for the purposes of studying community participation policy, however, it is necessary to address the final fundamental question, which relates to the
interests being served by evaluations.