Chapter 2: Literature Review
2.4 Evaluating Payments for Environmental Services
According to Leeuw and Vaessen, impact evaluation is the assessment of “the
positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended” (Leeuw
is linked to a weak evidence base concerning the impacts of various interventions, such as the impacts of protected areas and community forests on environmental conservation and poverty reduction (Andam et al., 2008; Ferraro, 2009a; Pattanayak et al., 2009; Andam et al., 2010; Pattanayak et al., 2010). Knowledge of what works and what does not work in market-based conservation interventions such as PES programs is vital for the public, conservation practitioners and policy makers to improve their conservation decisions when conservation funds are scarce (Ferraro and Pattanayak, 2006; Vaessen et al., 2007; Margoluis et al., 2009).
To capture what works and what does not, the conservation field can adopt methods from development economics, education and public health, where they are widely credited with transforming those fields, to conduct rigorous impact evaluations of conservation interventions (Glewwe and Kremer, 2006; Glewwe et al., 2011). These methods estimate the effect of an intervention against a counterfactual, that is, what would have happened in its absence (Ferraro and Pattanayak, 2006; White, 2006; Ferraro, 2009a; White, 2009). Approaches widely considered rigorous for assessing counterfactual outcomes or impacts of an intervention include those that employ experimental and quasi-experimental research designs (Baker, 2000; White, 2006; White, 2009; Bamberger et al., 2010). Experimental research design relies on data collected before and after the intervention from subjects randomly assigned to treatment and control groups (White, 2006; White, 2009; Khandker et al., 2010). Both groups are drawn at random from the eligible sample units. The control group, which does not receive the treatment, serves as the counterfactual or comparison group for the treated group, since its characteristics are assumed not to differ systematically from those of the treated group (White, 2009; Khandker et al., 2010). Experimental design is widely regarded as the gold standard for intervention evaluation. However, although it provides strong evidence of causality and strong internal validity, it is often impractical because of ethical issues, high implementation costs and weak external validity: the findings cannot be generalized to other settings because of the artificial settings in which experiments are implemented (Vaessen et al., 2007; Margoluis et al., 2009).
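The logic of random assignment described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the units, the outcome scale, the sample size and the value of TRUE_EFFECT are all hypothetical and are not taken from any of the cited studies.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0        # assumed treatment effect, chosen for illustration

# Each hypothetical unit has an outcome it would show without the intervention.
baseline = [rng.gauss(10.0, 1.0) for _ in range(10_000)]

# Random assignment: shuffle the unit indices and treat the first half.
indices = list(range(len(baseline)))
rng.shuffle(indices)
treated_ids = set(indices[: len(indices) // 2])

# Treated units receive the assumed effect; control units do not.
treated = [y + TRUE_EFFECT for i, y in enumerate(baseline) if i in treated_ids]
control = [y for i, y in enumerate(baseline) if i not in treated_ids]

# Because assignment is random, the control mean stands in for the
# counterfactual, and the difference in means estimates the effect.
estimate = mean(treated) - mean(control)
print(f"estimated effect: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment does not depend on any characteristic of the units, the two groups differ only by chance, so the estimate should lie close to the assumed effect in a sample of this size.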
Quasi-experimental research design is the second-best option for conducting impact evaluation when randomization of treatment is impossible. It relies on observational data from units exposed and units not exposed to the intervention (with-without approach) or on data collected before and after the intervention (before-after approach) (Khandker et al., 2010). Before-after and with-without quasi-experimental designs are commonly used in the field of conservation policy (Ferraro and Simpson, 2002; Pagiola, 2005; Ferraro and Pattanayak, 2006; Pattanayak et al., 2009; Khandker et al., 2010; Brown et al., 2011). However, on their own these comparisons are considered inadequate, because before-after comparisons are subject to time-trend bias and with-without comparisons to selection bias (Margoluis et al., 2009; Khandker et al., 2010). Statistical methods such as difference-in-differences estimation or propensity score matching are used to construct comparison groups that reduce these biases (Ho et al., 2007; Margoluis et al., 2009; Austin, 2011). Compared to experimental design, quasi-experimental research design is easier to establish and can generate results with moderately high external validity, which permits some generalizability of findings (Vaessen et al., 2007; Margoluis et al., 2009).
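The contrast between the before-after, with-without and difference-in-differences comparisons can be sketched with a small numerical example. The forest cover figures below are invented for illustration, and the function name diff_in_diff is hypothetical; the sketch assumes that treated and control plots would have followed the same trend in the absence of the intervention.

```python
from statistics import mean

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical forest cover (%) on three treated and three control plots.
treated_before = [60.0, 62.0, 58.0]
treated_after = [57.0, 59.0, 55.0]
control_before = [50.0, 52.0, 48.0]
control_after = [44.0, 46.0, 42.0]

# Before-after: mixes the intervention effect with the overall time trend.
before_after = mean(treated_after) - mean(treated_before)

# With-without: mixes the effect with pre-existing differences between groups.
with_without = mean(treated_after) - mean(control_after)

# Difference-in-differences: removes both, under the common-trend assumption.
did = diff_in_diff(treated_before, treated_after, control_before, control_after)

print(f"before-after: {before_after:+.1f}")
print(f"with-without: {with_without:+.1f}")
print(f"difference-in-differences: {did:+.1f}")
```

In this invented data set the before-after comparison gives -3 points, which suggests the intervention reduced forest cover, and the with-without comparison gives +13 points, which overstates the gain because treated plots started with more forest. The difference-in-differences estimate of +3 points reflects that treated plots lost 3 points of cover while control plots lost 6.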
Frequently, the mean differences (i.e. environmental impacts) estimated by studies that use a rigorous quasi-experimental research design (i.e. with an appropriate counterfactual) are lower than those inferred from simple comparisons with an inappropriate counterfactual group, such as comparison areas or non-treated groups that differ from the treated areas or groups (Joppa and Pfaff, 2011). High impacts are often reported by studies that do not control for selection bias caused by non-random factors, such as those that influence the selection of areas where interventions are implemented, group assignment, or the decision to sign up for a program. For example, treatment areas such as protected areas may be located in more remote areas away from roads and population centres, at higher elevations or on soil types that are marginal for agriculture (Andam et al., 2008; Pfaff et al., 2008; Joppa and Pfaff, 2010; Joppa and Pfaff, 2011). Because of these characteristics, treatment areas would be expected to have lower deforestation rates regardless of the protected area intervention itself (Joppa and Pfaff, 2011).
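The way non-random siting inflates a simple comparison can be shown with a small worked example. This is a sketch only: the parcels and deforestation rates are invented, and exact matching on a single hypothetical covariate (remoteness) is used as a simplified stand-in for the matching methods applied in the cited studies.

```python
from statistics import mean

# Hypothetical parcels: (remote, protected, deforestation rate).
# Protected parcels are mostly remote, where deforestation is low anyway.
parcels = [
    (False, True, 0.25),
    (True, True, 0.05), (True, True, 0.05), (True, True, 0.05),
    (False, False, 0.30), (False, False, 0.30), (False, False, 0.30),
    (True, False, 0.10),
]

def naive_difference(rows):
    """Mean rate on protected parcels minus mean rate on all unprotected ones."""
    treated = [y for _, p, y in rows if p]
    untreated = [y for _, p, y in rows if not p]
    return mean(treated) - mean(untreated)

def matched_difference(rows):
    """Compare only parcels with the same remoteness, then average the
    within-stratum differences weighted by the number of protected parcels."""
    total, n_treated = 0.0, 0
    for stratum in (False, True):
        treated = [y for r, p, y in rows if r == stratum and p]
        untreated = [y for r, p, y in rows if r == stratum and not p]
        if not treated or not untreated:
            continue  # no comparable parcels in this stratum
        total += len(treated) * (mean(treated) - mean(untreated))
        n_treated += len(treated)
    return total / n_treated

naive = naive_difference(parcels)
matched = matched_difference(parcels)
print(f"naive: {naive:+.2f}, matched: {matched:+.2f}")
```

With these invented figures the naive comparison suggests that protection lowers the deforestation rate by about 0.15, whereas comparing like with like gives a reduction of about 0.05. The remainder of the naive difference reflects where protected parcels are located rather than the effect of protection.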
In addition, experimental and quasi-experimental research designs encounter spillover effects when the direct or indirect effects of the intervention leak from the treatment group into the control group. They can also lack compliance from the implementing agency, which might compromise the study for fear that findings of negative impacts or inadequate positive evidence could jeopardise future funding support (Chen et al., 2009; Prowse and Snilstveit, 2010). Furthermore, while experimental and quasi-experimental designs can rigorously generate evidence of the effectiveness of an intervention, they are less able to explain why and how an intervention works or does not work, and under what circumstances, in order to inform improvements or revisions (Bamberger et al., 2010). To address these concerns, quantitative methods need to be combined with qualitative methods that can answer questions that experimental and quasi-experimental methods cannot.
Evaluation studies that use a qualitative evaluation design focus on the sampling framework rather than on how exposed and non-exposed subjects compare (Vaessen et al., 2007; Margoluis et al., 2009). Frequently, a stratified purposeful sampling method is used, whereby subjects that vary along some dimension are sampled within strata to facilitate comparison (Margoluis et al., 2009). Another qualitative sampling method is extreme or deviant case sampling, which aims to learn from highly unusual cases of interest such as outstanding successes and notable failures (Margoluis et al., 2009). Theory-based or operational construct sampling is another qualitative sampling method, which samples subjects on the basis of their potential to manifest a theoretical construct, so as to elaborate and examine that construct (Margoluis et al., 2009).