
INDICATORS 5.1 Introduction

5.2. Causal and effect indicators

For something to function as an indicator, it must reliably covary with the underlying state that it is standing in for, and this requires a causal relationship with the target state. Correlation on its own is insufficient – many different and unrelated factors may correlate under particular test conditions (Borsboom, Mellenbergh, & van Heerden, 2004). Borsboom et al. (2004) argue that validity is grounded in causation – a test is valid when there is a causal link between the target and the indicator. Otherwise, measurement is simply not taking place. Having a causal relationship between indicators and the target also gives some other epistemic benefits (Markus & Borsboom, 2013) – it gives us increased reason to believe the correlation will persist under change (holding fixed relevant background conditions), and reason to have confidence in our predictions.

Sidestepping as much as possible the literature on the relation between correlation and causation, we can generally assume that when there is a reliable correlation between two variables A and B, it is either because A causes B, B causes A or there is a common cause for both A and B: “a covariance structure model implies potential nonzero covariances among measured variables if (a) there is a correlational, direct, or indirect path between the measured variables or (b) the measured variables share a common source variable or correlated source variables” (MacCallum & Browne, 1993, p. 539). When we are looking for indicator measures, they will stand in one of these three relationships with the target state – they will either be a cause of the target, an effect of the target, or a mutual effect of a common cause. These three categories of indicators will have different features, both mathematically and pragmatically. Here I will focus on the first two categories, the ‘causal’ and ‘effect’ indicators, as the ‘common cause’ type is likely to be much less common. Animal welfare science commonly uses both causal and effect indicators.

Bollen & Lennox (1991) differentiate between “indicators that influence, and those that are influenced by, latent variables” (1991, p. 305) – causal and effect indicators. Effect indicators are those that stand causally ‘downstream’ from the target state. Changes in the indicator are a result of changes in the target. They can be characterised by an equation such as $Y_i = \lambda_{i1}\eta_1 + \varepsilon_i$, where $Y_i$ is the indicator, $\eta_1$ is the target state, $\varepsilon_i$ is the level of measurement error and $\lambda_{i1}$ is a coefficient representing the level of effect of the target on the indicator (Figure 5.1). These indicators are thus determined by the underlying state we want to measure.
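The effect-indicator equation can be illustrated with a small simulation. This is only a sketch under assumed values: the loading (0.8), the two noise levels, the sample size and the function names are illustrative choices and are not taken from Bollen & Lennox (1991). It shows that an effect indicator covaries with the target because it is generated from it, and that more measurement error weakens the observed correlation.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_effect_indicator(targets, loading, noise_sd, rng):
    """Generate Y = loading * eta + epsilon for each target value eta."""
    return [loading * eta + rng.gauss(0.0, noise_sd) for eta in targets]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
eta = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical target values

# Same assumed loading, two assumed levels of measurement error.
precise = simulate_effect_indicator(eta, loading=0.8, noise_sd=0.2, rng=rng)
noisy = simulate_effect_indicator(eta, loading=0.8, noise_sd=2.0, rng=rng)

r_precise = pearson(eta, precise)
r_noisy = pearson(eta, noisy)
print(f"low-error indicator vs target:  r = {r_precise:.2f}")
print(f"high-error indicator vs target: r = {r_noisy:.2f}")
```

Both correlations are positive, but the high-error indicator tracks the target less closely, which is the sense in which the error term limits how well an effect indicator stands in for the target.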

Figure 5.1: Path diagram of effect indicators (modified from Bollen & Lennox, 1991, p. 306)

Effect indicators stand downstream from the target state; they are effects of this state. These measures will covary with the target state because changes in the target will cause changes in the measures. There are many examples of the use of effect indicators in science. In medicine, an effect indicator might be white blood cell count – it is an indicator of infection, since white blood cells increase as a result of the presence of foreign micro-organisms. In animal welfare science these are what I referred to in Chapter Two simply as ‘indicators’, and are often referred to in the animal welfare literature as ‘animal-based’ measures (e.g. Botreau, Bracke, et al., 2007) or ‘output’ measures (e.g. Kagan et al., 2015). They are physiological and behavioural indicators that are used to measure changes in welfare, where it is assumed that a change in the indicator reflects a change in the underlying subjective experience. Examples include measurements of blood cortisol levels or approach and withdrawal behaviour towards a particular stimulus. These indicators change as welfare changes.

By contrast, causal indicators stand causally ‘upstream’ from the target state, where changes in the indicator are a cause of changes in the target. Causal indicators are characterised by a more complex equation of the form $\eta_1 = g_{11}\chi_1 + g_{12}\chi_2 + \dots + g_{1n}\chi_n + \zeta_1$, where each $\chi$ is an indicator, $\eta_1$ is the target state, each $g$ is a coefficient representing the level of effect of that indicator on the target and $\zeta_1$ is a variable representing error or additional causal factors (Figure 5.2). The crucial difference here is that the indicators are determining the target variable rather than determined by it. Although both types of indicators will correlate with the target state, with effect indicators we are observing the effects of an underlying state, while with causal indicators we are observing the causes of that state.

Causal indicators stand upstream from the target state; they are themselves causes of this state. They covary with it, as changes in the indicator will create changes in the target. Some authors (e.g. Markus & Borsboom, 2013) do not wish to admit causal indicators as true measures at all, as they define measurement as occurring in a single causal direction. In their view, causal indicators predict the target rather than measure it. For my purposes, as long as there is a reliable correlation between the indicator and the target, with an underlying causal relationship, this is sufficient for it to serve as an indicator (even if not a true ‘measure’ in this formal sense), as knowledge of the value of the indicator tells us about the value of the target.

An example of the use of causal indicators in ecology is the use of rainfall measures to estimate biodiversity, as the level of rainfall will affect the type and abundance of species in an area. In the case of animal welfare, these form what I have called ‘conditions’ for welfare (see Chapter Two), sometimes also known as ‘provisions’ for welfare; those things that will cause changes in the subjective states that compose welfare. These are also referred to in the animal welfare literature as ‘environment-based’ indicators (e.g. Botreau, Bracke, et al., 2007) or ‘input’ measures (e.g. Kagan et al., 2015). We can measure changes in the conditions for welfare in order to infer changes in the state of welfare. For example, some of the conditions for welfare will be the presence of adequate food and water, freedom from disease and adequate mental stimulation. Direct measurements of these conditions can serve as proxies for measurement of welfare itself. These types of indicators are commonly used in animal welfare assessments. Several different measurement frameworks are used to assess the welfare of animals under particular husbandry conditions, such as the Welfare Quality® (Botreau, Veissier, & Perny, 2009) or Five Domains (Mellor, 2016) frameworks, which will be discussed in more detail in Chapter Seven. These sorts of measures are less commonly used in animal welfare science itself.

The third set of indicators would be those that covary with the target state due to the presence of a common cause. For example, we might think that the visual presence of lightning could be used as an indicator for thunder, as both will correlate as a result of the common cause of an electrical discharge. There do not seem to be any examples of this sort of measure being used in animal welfare science, but it is the sort of thing that could potentially be used here, or in other areas that use proxy measures, such as in conservation biology. For example, using population levels of a particular species as a surrogate for overall biodiversity is likely to be of this type, as the environmental conditions that affect biodiversity will also affect the numbers of the surrogate species, and thus they will covary – an example of such a common cause in this case could be water availability.

There is a difference in how we validate causal and effect indicators. Bollen & Lennox (1991) examine some of the common ‘guidelines’ in use for the selection and validation of indicators. Importantly, the types of procedures that can validate indicators will differ between causal and effect indicators, particularly those that rely on measures of correlation between different indicators. The conventional wisdom has been that indicators that are positively correlated with the same concept should be positively correlated with one another. Additionally, there has been disagreement about what level of correlation should be considered ideal. Bollen & Lennox examined these claims mathematically and found that effect indicators should always be positively correlated (that is, a negative or zero correlation suggests they are not measuring the same thing) and that the correlation is best when it is as high as possible (as it is a direct reflection of the correlation between each indicator and the target). For example, when considering indicators of animal welfare, we would expect to find a correlation between the change in blood cortisol levels and stress-related behaviour, as they are both effects of the common cause of welfare.

By contrast, for causal indicators there is no reason to expect any particular correlation between indicators, as they can work independently, and they thus need not correlate with one another. There is no reason to think that two causes of the same state will covary. For example, when considering animal welfare, there is no reason to think that the availability of food and water would necessarily have any relationship with access to a social group. Importantly, this means that while effect indicators can be, in part, validated through measures of correlation with one another, causal indicators can only be validated through embedding in a model which also contains effect indicators; a point which will be further explored in Section 5.5.
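The contrast between the two cases can be checked numerically. The sketch below uses assumed loadings, coefficients and noise levels, and it makes the two causal indicators independent by construction, which is an assumption of the example rather than a claim about real welfare conditions. Under these assumptions, two effect indicators of the same target correlate with one another at roughly the product of their correlations with the target, whereas the two causal indicators are uncorrelated with each other even though each correlates with the target.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 20000

# Effect indicators: both are driven by the same target (assumed loadings).
eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
y1 = [0.8 * e + rng.gauss(0.0, 0.6) for e in eta]
y2 = [0.6 * e + rng.gauss(0.0, 0.8) for e in eta]
r_y1_y2 = pearson(y1, y2)
r_product = pearson(y1, eta) * pearson(y2, eta)

# Causal indicators: drawn independently (an assumption), then combined
# into the target with assumed coefficients plus a disturbance term.
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
eta_caused = [0.7 * a + 0.5 * b + rng.gauss(0.0, 0.5) for a, b in zip(x1, x2)]
r_x1_x2 = pearson(x1, x2)
r_x1_eta = pearson(x1, eta_caused)
r_x2_eta = pearson(x2, eta_caused)

print(f"effect indicators with each other: {r_y1_y2:.2f} (product: {r_product:.2f})")
print(f"causal indicators with each other: {r_x1_x2:.2f}")
print(f"causal indicators with target:     {r_x1_eta:.2f}, {r_x2_eta:.2f}")
```

Inter-indicator correlation is therefore informative about effect indicators but says nothing either way about whether a set of causal indicators belongs to the same target.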

The second claim they examined regarded whether it was necessary for validation to select a variety of indicators. The claim is that selection of a diversity of indicators will ‘capture different facets’ of the target, and thus use of these different indicators is necessary for complete and valid measurement of the target. They found that this will be true only for causal indicators. In the case of effect indicators there is no reason to require use of diverse indicators, as removal of particular indicators would have no significant impact on the measurement of the target variable. If any single effect indicator is providing a measure of the target variable, the magnitude of change in this indicator will be representative of the magnitude of change in the target, regardless of how many other effect indicators are also used. For cases where this is not true, it is because we have a multi-dimensional concept (a construct or composite target) which can then be broken down and analysed in terms of each individual facet. Take as an example the composite target of health. Because this is composed of a large number of different components, such as cardiac functioning and immune response, no single measure will be sufficient to capture the entire target, and many must be used in conjunction. But in typical single-target cases, while having multiple effect indicators will help in reliability (as failures in any one indicator will not ruin the results), and each indicator should be correlated with the target, which ‘facets’ they each measure will not matter.
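The reliability point for effect indicators can be sketched in a simulation. The example assumes four interchangeable effect indicators with equal loadings and equal, independent errors, and it combines them by simple averaging; the averaging rule and all numerical values are assumptions of this sketch, not a procedure proposed in the text.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 5000
eta = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical target values

# Four effect indicators of the same target, each with its own independent error.
indicators = [[e + rng.gauss(0.0, 1.0) for e in eta] for _ in range(4)]

single = indicators[0]
averaged = [fmean(values) for values in zip(*indicators)]

r_single = pearson(eta, single)
r_averaged = pearson(eta, averaged)
print(f"one indicator vs target:         r = {r_single:.2f}")
print(f"mean of four indicators vs target: r = {r_averaged:.2f}")
```

The single indicator already tracks the target; adding further indicators only averages out error, and it does not matter which ‘facet’ each one is taken to capture.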

However, this is not the case for causal indicators. When using causal indicators, it is important that all relevant factors are included in the model. The removal of even one causal factor will have a strong impact on the measurement of the variable, as they are all necessarily contributing to changes in the target. Consider measurement of welfare: if we were to try to measure welfare through causal indicators (welfare conditions), we might include things like stocking density, food availability and social interactions. We would then measure the level of all these variables to determine welfare level. However, if we left out an important contributing condition, such as presence of injury, our results would be inaccurate. We might look at an animal with lots of food and a soft place to sleep, concluding it has good welfare, but have failed to take into account the strong negative effect of pain. Only by including all causal indicators will we get an accurate measure of the target.
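The injury example can be made concrete with a toy calculation. The weights, the condition names and the scoring function below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not drawn from any welfare assessment framework.

```python
def welfare_score(conditions, weights):
    """Weighted sum over only those conditions that appear in `weights`."""
    return sum(weight * conditions[name] for name, weight in weights.items())


# Assumed contribution of each condition to welfare (injury counts against it).
full_weights = {"food": 1.0, "bedding": 0.5, "injury": -2.0}
reduced_weights = {k: w for k, w in full_weights.items() if k != "injury"}

# An animal with plenty of food and a soft place to sleep, but injured.
animal = {"food": 1.0, "bedding": 1.0, "injury": 1.0}

score_without_injury = welfare_score(animal, reduced_weights)  # 1.0 + 0.5
score_full = welfare_score(animal, full_weights)               # 1.0 + 0.5 - 2.0

print(score_without_injury)  # 1.5
print(score_full)            # -0.5
```

Leaving out the injury term flips the assessment from negative to positive, even though every condition that was measured was measured correctly.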

The important points to come out of this are as follows:

• Effect indicators can be, in part, validated through measures of correlation with one another

• Causal indicators can only be validated through embedding in a model which also contains effect indicators

• Causal indicators must all be measured to give a reliable measure of the target

It may seem that all of this speaks against the use of causal indicators altogether, a conclusion with which many authors seem to agree (e.g. Markus & Borsboom, 2013). However, in many cases (particularly within animal welfare assessment), the causal indicators are easier to observe than the effect indicators and can be used for quick large-scale assessments for which effect indicators would be impractical. For example, trying to do behavioural and physiological assessments on even a small sample of the animals on a farm is going to take far longer than looking for the causal husbandry variables which will impact all the animals and drawing conclusions based on these.

As causal and effect indicators stand in different relations to the target state, each is going to have its own features and drawbacks. What is important is that we accurately identify which type of indicator is operating, in order to use it correctly. As I will detail in Section 5.5, correctly identifying which type of indicator we are using will be crucial for the process of validation.

5.3. Validation

The validity of a test or measure refers to whether or not it is really measuring what it purports to – whether the observed data are actually tracking the intended phenomenon. Validation of indicators is thus testing to ensure that the indicators are tracking the right target state – that the values of, and changes in, the indicators correlate with changes in the target. In particular, we need to establish that one of the types of causal relationships discussed above holds between the indicator and the target. The process of validation will vary depending on what type of target we are talking about, and in this section I will apply some of the discussion of validation to the different categories of targets I introduced earlier. I will show that for hidden targets such as welfare, there is a particular problem for validating the indicators.

In some cases, we may use the presence of adequate predictions as a form of validation (Markus & Borsboom, 2013). The idea is that, if we are able to make such predictions from measurements of the indicators, this gives us reason to think that we are measuring the correct target. The success of the predictions is best explained by the validity of the indicators. Similarly, Bringmann & Eronen (2016) suggest that the success of theories that are built using the measurements will add to our confidence in the validity of the measurements. When using the measures, we work with the assumption that the measures are valid, and if the theory is successful, in terms of explanatory and predictive power, this supports the assumption. It is very unlikely we will have accurate predictions based on invalid measures. “What increases confidence in the validity of measurements is the success of the theories that are based on them, and what justifies the success of those theories is their explanatory and predictive power. Testing the latter need not involve the same types of measurements whose validity is in question” (Bringmann & Eronen, 2016, p. 36).

There are two ways in which predictions might be seen as a form of validation. The first is that if, using our measurements, we are able to make further predictions (e.g. that given a certain measurement of physiological variable x, we should see behaviour y), then this gives us confidence our measures are valid. This method is not necessarily strong, as there can be other explanations for the success of predictions. Although it may form one strand of evidence (and may be part of a robustness analysis, as described further on), it does not seem sufficient to stand alone for validation. The second is that the predictions about the measures themselves – the inputs and outputs of our causal model about animal welfare – are accurate; so that targeted interventions on input give the expected outputs. This would give us confidence in the content of the model, and is a similar process to the one I will propose in Section 5.5.

For ‘composite’ targets, there is not really a unique problem of validation. As the target state is simply an aggregate of the measured indicators, it is going to be true by definition that the indicators are measuring the target. There may be separate problems of deciding which features to include within the composite, but this is not a validation issue. All that may be required is some modelling to determine the relative weights of the contribution of different indicators to the target and to decide on which aggregation function to use. For example, think of a simple case – the composite ‘bachelor’. Bachelorhood is not a natural property, but rather a construct of sex (male) and marital status (single). We do not need to validate to know whether measuring sex and marital status will measure the composite; it must be true. In some cases, our indicators may be indicators of the components of the composite, rather than themselves being components (e.g. using blood pressure as an indicator of cardiac function, which is a component of the composite ‘health’). In these cases, we would need to ensure that these were themselves valid indicators. Here, validation of the indicators would proceed according to one of the other two categories (‘hidden’ or ‘difficult’), depending on the particular example.

For ‘difficult’ targets, validation is relatively straightforward through direct measurement of the target. It is necessary to establish a causal link between the target and the indicator. Borsboom et al. (2004) describe the process of validation as requiring the establishment of a reliable correlation, and providing a theoretical explanation for the causal pathway between the target attribute and the measurement outcomes. This involves first determining the causal direction (whether we have a causal or an effect indicator). This can be done through using theory – embedding within a theoretical framework that explains the causal connections between the target and the indicators (Bringmann & Eronen, 2016; Lindenmayer & Likens, 2011) – or through testing to look for timing and direction of effect.

The second step is establishing a reliable correlation by measuring both the target and the indicator under a range of conditions and (preferably) interventions, where particular conditions will be deliberately varied to alter the target, and the indicator will be checked to ensure it tracks these changes. Where interventions or manipulations are not possible, we can try to use ‘natural’ experiments; using the results of natural change or randomness (Markus & Borsboom, 2013). If we see a reliable correlation between the target and indicator under a range of conditions, we have good reason to think that there is a valid causal connection. What we require is correlation over a range of interventions (Markus & Borsboom, 2013).
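This second step can be sketched as a simulated experiment. Everything here is an assumption made for illustration: the five intervention levels, the effect sizes and noise levels, the correlation threshold of 0.5, and the hypothetical helper `tracks_target`. The sketch varies a condition, records a directly measured target alongside two candidate indicators, and checks which candidate tracks the target across the range of interventions.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def tracks_target(indicator, target, threshold=0.5):
    """Crude check: is the correlation with the target at least `threshold`?"""
    return abs(pearson(indicator, target)) >= threshold


rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Deliberately vary a condition over five levels, 200 observations per level.
levels = [level for level in range(5) for _ in range(200)]

# The condition changes the target, which is measured directly.
target = [2.0 * level + rng.gauss(0.0, 0.5) for level in levels]

# Candidate A is an effect of the target; candidate B is unrelated noise.
candidate_a = [1.5 * t + rng.gauss(0.0, 1.0) for t in target]
candidate_b = [rng.gauss(0.0, 1.0) for _ in levels]

a_is_valid = tracks_target(candidate_a, target)
b_is_valid = tracks_target(candidate_b, target)
print(f"candidate A tracks the target: {a_is_valid}")
print(f"candidate B tracks the target: {b_is_valid}")
```

A single threshold on one correlation is a deliberate simplification; the text asks for correlation that holds over a range of interventions, which in practice would mean repeating such a check under different manipulations.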

We thus have a change in a condition (either induced experimentally or tracked naturally), which causes a change in the target state (which we can track through direct measurement), and which then also causes a change in the indicator (which we track through measurement of the indicator) (Figure 5.3). When we observe such correlation between measures of the target