2.3 Frameworks for robust evaluation
2.3.1 Storey’s Six Steps
Storey (1998)[14] expressed concern about the state of evaluation, looking specifically at the small business and entrepreneurship field.
“Given the huge variety of schemes, the diversity of countries in which the schemes are found, and the often inflated claims on the part of those administering the schemes for their effectiveness, it is disappointing that the academic community has been rather slow in seeking to address this area. Perhaps even more seriously, where the issues have been addressed by small business academics, the methods of evaluation employed have rarely been at the intellectual frontier.”
Storey (1998, pp. 3-4)
This concern led Storey to examine the evaluation process as a whole. His first published work on the matter (Storey, 1998) addresses two elements he saw as essential to evaluation. The first is clear objectives, which can be understood as SMART[15] objectives. The second is a robust evaluation approach, whose robustness can be measured against a scale of available approaches.
14 The initial working paper was published in 1998, and its ideas were taken up in several later works by Storey, which are cited interchangeably in the context of the “Six Steps to Heaven” approach, e.g. Storey (2000).
15 SMART is a widely used acronym for objectives that are specific, measurable, assignable (later often given as achievable), realistic and timed (Doran, 1981). The concept is returned to below.
In more detail, Storey (1998) describes clearly specified objectives as a precondition for any sort of evaluation. He gives examples of vague outcomes: “creating an enterprising society” (see Cowie’s [2012] SBS example earlier), “maximising SMEs contribution to economic development”, “enhancing competitiveness” and “creating jobs”. Arguably the last of these is easier to measure than the others, but it is not specific: how many jobs are sought?[16] Without clarity about the intended outcomes, it is impossible to measure them, and therefore impossible to determine an intervention’s impact and effectiveness.
Koning and Snijders (1992; cited in Storey, 1998) highlighted this lack of defined objectives in entrepreneurship and small business policy. Setting out to compare SME policies across EU countries, they found that the only comparable measure was the number of policies introduced in particular policy areas, which is hardly an insightful measure.
Storey devised a framework in which each ‘step’ represents a level of sophistication of the evaluation approach applied. The Six Steps can be summarised as follows, starting at the most basic level. The first three steps rest on descriptive analysis of assisted firms only: Step 1 considers scheme take-up alone, Step 2 adds recipients’ opinions, and Step 3 adds recipients’ views on the impact of the assistance. These approaches are clearly limited, as they rely solely on observing the group that received a particular intervention. Steps 1-3 are therefore described as “monitoring”. The UNDP (2009) distinguishes between monitoring that simply tracks the progress of implementing an intervention or programme, and monitoring as a continuous process in which frequent feedback is collected on progress towards specified goals. Either would be covered by Steps 1-3. However, given the observed lack of clearly articulated intervention objectives in enterprise policy[17], most such monitoring would inevitably be limited to the simpler tracking approach, without a view of what would ultimately define policy success.
16 The argument can be taken further. It is questionable to what extent job creation is the actual aim of policy-makers: is policy really motivated by creating new jobs (and measured against that aim), or is the real motivation a reduction in unemployment, bearing in mind that jobs created in one firm may mean jobs lost elsewhere among its competitors? Moreover, “creating new jobs” says nothing about the quality of those jobs, and jobs in some sectors are clearly more desirable and ‘future-proof’ than others.
Steps 4-6, by contrast, are referred to as “evaluation” steps; evaluation differs from monitoring in its more rigorous and robust assessment of an activity (UNDP, 2009). Step 4 introduces a control group, showing how non-assisted firms fared compared with assisted ones. Step 5 refines the choice of control group by using matched firms, that is, non-assisted firms with characteristics similar to those of the assisted firms, so that a difference in outcomes between the two groups can more credibly be attributed to the intervention. Even with matching, however, differences between the groups are likely to remain, and assisted and non-assisted firms may differ systematically, a problem known as selection bias: firms that receive assistance may be inherently different from those that do not, for example more proactive in researching their options, younger, or concentrated in particular locations. Step 6, “heaven” in Storey’s (1998) framework, is reached by an evaluation that accounts for and corrects such selection bias.[18]
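To make the grading logic of the scale concrete, the Python sketch below encodes the Six Steps as a simple checklist. It is illustrative only: the hypothetical `EvaluationDesign` class, its field names, and the rule that an evaluation is graded by the highest step whose feature it exhibits are assumptions made here, not a scoring procedure prescribed by Storey (1998).

```python
from dataclasses import dataclass

# Labels paraphrase the summary of Storey's (1998) Six Steps given above.
STEPS = {
    1: "Take-up of the scheme",
    2: "Recipients' opinions",
    3: "Recipients' views on the impact of the assistance",
    4: "Comparison with a control group of non-assisted firms",
    5: "Comparison with a control group of matched firms",
    6: "Correction for selection bias",
}


@dataclass
class EvaluationDesign:
    """Hypothetical checklist of what an evaluation actually did."""
    reports_take_up: bool = False
    surveys_opinions: bool = False
    surveys_perceived_impact: bool = False
    uses_control_group: bool = False
    uses_matched_controls: bool = False
    corrects_selection_bias: bool = False


def storey_step(design: EvaluationDesign) -> int:
    """Return the highest step whose feature is present (0 if none).

    Assumption: grading by the most sophisticated element present.
    """
    features = [
        design.reports_take_up,
        design.surveys_opinions,
        design.surveys_perceived_impact,
        design.uses_control_group,
        design.uses_matched_controls,
        design.corrects_selection_bias,
    ]
    reached = [step for step, present in enumerate(features, start=1) if present]
    return max(reached, default=0)


def category(step: int) -> str:
    """Steps 1-3 count as monitoring, Steps 4-6 as evaluation."""
    if step == 0:
        return "unclassified"
    return "monitoring" if step <= 3 else "evaluation"


# Feedback from participants only, with no comparison group:
survey_only = EvaluationDesign(reports_take_up=True, surveys_opinions=True)
print(storey_step(survey_only), category(storey_step(survey_only)))  # prints: 2 monitoring
```

Under this encoding, adding a comparison group of any kind moves a study from monitoring into evaluation, while the quality of that comparison decides between Steps 4, 5 and 6.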
Kapareliotis and Zarkada (2012) provide an example of a short-term study that would be categorised as a Step 2 (and rather weak) evaluation. Among other things, they set out to evaluate the impact of a female entrepreneurship training programme in Greece, which is described as having served as the foundation for a large-scale national programme. Their evaluation was built entirely on facilitators’ and participants’ feedback collected immediately after course completion. No specific results were reported for most of their detailed evaluation objectives, including measures of “the degree to which female students could see themselves as potential entrepreneurs, having unique characteristics but being as competent as their male counterparts” and “[participants] attitude towards business ownership”. The difference the training made was therefore not assessed (assessing it would have constituted Step 3).
17 Fortunately, Storey (1998) and subsequent works have prompted some change, and clearly defined objectives are now more common in enterprise policy (certainly in the UK). One example is the Growth Voucher Programme, which uses a randomised controlled trial design, in which eligible firms are randomly assigned to the assisted or non-assisted group, and has articulated, if unquantified, objectives (see BIS, 2014b, p. 5).
18 The limitation to six analytical steps has not escaped criticism; see, for example, Roper and Hart (2003).
As Kapareliotis and Zarkada (2012) illustrate, many evaluations produce dubious results because they fail to take into account the observable and unobservable differences between treated and untreated groups (Greene and Storey, 2007).
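Simulated data make it easy to see why a naive comparison with untreated firms can mislead. The Python sketch below is purely illustrative: the data-generating process, the effect size `TRUE_EFFECT` and all helper functions are hypothetical and are not drawn from any of the studies cited. An unobserved trait (‘proactiveness’) raises both the chance that a firm takes up assistance and its growth. A simple assisted-versus-non-assisted comparison (in the spirit of Step 4) therefore overstates the effect; nearest-neighbour matching on one observed characteristic, firm size (in the spirit of Step 5), removes only the size-related part of that bias; random assignment, as in a randomised controlled trial, approximately recovers the true effect. Model-based selection corrections such as Heckman’s are not shown.

```python
import bisect
import random
from statistics import fmean

# Hypothetical effect of assistance on growth (arbitrary units).
TRUE_EFFECT = 2.0


def simulate_firms(n, randomised, seed=1):
    """Return (size, treated, growth) tuples for n hypothetical firms.

    Size is observed by the evaluator. 'proactive' is not observed, yet it
    raises both take-up of assistance and growth: the source of selection
    bias in the non-randomised case.
    """
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        size = rng.gauss(0, 1)
        proactive = rng.gauss(0, 1)  # unobserved by the evaluator
        if randomised:
            treated = rng.random() < 0.5  # assignment by lottery
        else:
            # self-selection: larger or more proactive firms are more
            # likely to seek assistance
            treated = size + proactive + rng.gauss(0, 1) > 0
        growth = size + proactive + TRUE_EFFECT * treated + rng.gauss(0, 1)
        firms.append((size, treated, growth))
    return firms


def naive_difference(firms):
    """Mean growth of assisted firms minus that of all non-assisted firms."""
    assisted = [g for _, t, g in firms if t]
    others = [g for _, t, g in firms if not t]
    return fmean(assisted) - fmean(others)


def matched_difference(firms):
    """Pair each assisted firm with the non-assisted firm closest in size
    (nearest neighbour, with replacement) and average the growth gaps."""
    controls = sorted((s, g) for s, t, g in firms if not t)
    sizes = [s for s, _ in controls]
    gaps = []
    for s, t, g in firms:
        if not t:
            continue
        i = bisect.bisect_left(sizes, s)
        # the nearest control lies just below or just above the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sizes)]
        j = min(candidates, key=lambda k: abs(sizes[k] - s))
        gaps.append(g - controls[j][1])
    return fmean(gaps)


if __name__ == "__main__":
    observational = simulate_firms(20_000, randomised=False)
    trial = simulate_firms(20_000, randomised=True)
    print(f"true effect:      {TRUE_EFFECT:.2f}")
    print(f"naive comparison: {naive_difference(observational):.2f}")
    print(f"matched on size:  {matched_difference(observational):.2f}")
    print(f"randomised trial: {naive_difference(trial):.2f}")
```

With these settings the naive estimate should lie well above `TRUE_EFFECT`, the matched estimate between the two, and the randomised estimate close to the true value; the exact figures depend on the seed. Matching fails to close the gap here precisely because the trait driving self-selection is unobservable.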
By comparison, Cumming and Fischer (2012) conducted a far more robust assessment of the non-profit Innovation Synergy Centre in Markham (Ontario, Canada). Founded in 2003, the centre acts as a “one-stop shop”, linking the advisory services of “experienced consultants and business professionals” with those seeking advice, described as “senior managers of established businesses”. Expressly referring to Storey’s (2000) work on the Six Steps methodology, the authors phrase all their research hypotheses so as to allow for selection effects and endogeneity, and accordingly estimate several models using Heckman’s selection-correction regression. Their findings indicate a significant link between the hours of advice provided and sales growth (and success in obtaining financing), whereas the impact on patents and alliance formation appears more doubtful once endogeneity is allowed for.
Cumming and Fischer’s (2012) work is one example of direct reference to the Six Steps framework. The framework has been widely cited by enterprise researchers discussing the quality of evaluations and has come to represent the gold standard for enterprise policy evaluation. Its reach has no doubt been extended by the OECD’s adoption of the Six Steps framework in its 2007 guidelines “for the evaluation of SME and entrepreneurship policies and programmes”.[19]
By articulating a memorable grading scale for evaluations, Storey created a widely applied framework that encourages a more critical appraisal of both existing evaluations and the design of planned ones. Arguably, this has led to wider use of techniques that construct a counterfactual and address selection bias. The OECD (2007), the World Bank (2010) and Cook et al. (2008)[20] all provide fairly recent guides to what good evaluation should look like. The former two refer expressly to Storey’s framework, while Cook et al. (2008) also distinguish clearly between monitoring and impact assessment. The assessments of Business Link[21] by BERR (2007)[22] and by Mole et al. (2008, 2009, 2011) are all examples of evaluations adopting this type of framework.