Chapter I: Literature Review
1 Introduction
1.4 Issues in task design and frequentist approach to data analysis
Attentional set-shifting tasks in animals have been used extensively to study behavioural flexibility. But how do we know whether animals have learned what the researcher intended during discrimination learning in these tasks? In all animal-learning studies, we can only infer whether animals have learned the correct stimulus-reward association from the pattern of their behavioural choices. The behavioural learning criterion adopted in attentional set-shifting tasks is typically based on a frequentist approach using inferential statistics. For example, a criterion of six correct choices in a row, tested against a null hypothesis of random responding (p = 0.5^6 ≈ 0.0156), was adopted to determine whether a rat has learned to find the reward-associated stimulus in each stage of the 7-stage task (Birrell & Brown, 2000). However, there are other criteria under which the rat might be judged to have learned a discrimination. Suppose a rat makes five correct choices followed by one error, then again five correct choices followed by one error (Table 1.2). Although the chance of randomly choosing the correct bowl on at least 10 out of 12 trials is less than 2% (0.019), such a 10-out-of-12 performance would not satisfy the 6-correct-choices-in-a-row criterion. This raises two issues: (1) how big the ‘window’ over which performance is considered should be, given that windows that are too large will not detect learning well and windows that are too small are prone to statistical errors; and (2) whether a subject (animal or human) must perform perfectly before we conclude that it has learned the contingencies in the task.
Table 1.2: An example of a rat’s choice results in a learning stage.
Trial: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15-20
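The two probabilities quoted above can be checked with a few lines of Python. This is only a minimal sketch: it assumes two bowls with an equal chance of a correct choice (p = 0.5) under random responding, and the function names (run_probability, binomial_tail, longest_correct_run) are illustrative and do not come from any cited study.

```python
from math import comb

def run_probability(n, p=0.5):
    """Probability that n specified trials are all correct under random choice."""
    return p ** n

def binomial_tail(k, n, p=0.5):
    """Probability of at least k correct choices in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def longest_correct_run(choices):
    """Length of the longest unbroken run of correct (truthy) choices."""
    best = run = 0
    for correct in choices:
        run = run + 1 if correct else 0
        best = max(best, run)
    return best

# Assumed coding of the example in the text: 1 = correct, 0 = error.
# Five correct, one error, five correct, one error (12 trials, 10 correct).
example = [1] * 5 + [0] + [1] * 5 + [0]

print(run_probability(6))             # six in a row by chance: 0.5^6
print(binomial_tail(10, 12))          # at least 10 correct out of 12 by chance
print(longest_correct_run(example))   # longest run is 5, so 6-in-a-row is not met
```

Under these assumptions the 12-trial example is statistically unlikely under random responding, yet its longest run of correct choices is only five, which is why it fails the 6-in-a-row criterion.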
Another issue is that the 6-correct-choices-in-a-row criterion (or similar criteria; Brigman et al., 2005; Brooks et al., 2012; Floresco et al., 2006; see Table 1.1) takes a classical null-versus-alternative hypothesis-testing approach. The rationale behind this approach is that, if the observed data would be very unlikely were the null hypothesis true (e.g., a probability of less than 5%), then the null hypothesis is rejected. The problem is that once the null hypothesis is rejected, there are often multiple alternative hypotheses consistent with the data. Deciding which of these alternatives is correct is beyond the scope of null hypothesis testing. For the 6-correct-choices-in-a-row criterion, ‘randomly choosing a bowl’ is the null hypothesis, and ‘using the reward-associated stimulus for bowl choice’ is treated as the (only) alternative hypothesis. However, this is a potential error in statistical reasoning, because a rat may follow other response patterns that amount to neither correct responding nor random responding. For example, a rat may always choose the bowl on the same side. The traditional ‘null-versus-alternative’ hypothesis-testing approach therefore does not fully describe the behaviour. The consequence is that, while the criterion may indicate learning, it discards other information about the animal’s choices. This could lead either to more trials being presented even after the animal has learned the discrimination (a ‘false negative’) or to a stage being finished even though the animal has not learned the discrimination (a ‘false positive’). For example, if a rat chooses correctly in six consecutive trials with the side choices ‘left-left-right-left-right-left’, the rat might have been alternating sides over the last five trials rather than using reward-associated information, leading to a ‘false positive’. In this case, the rat would probably be under-trained and need more training.
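To make the side-alternation example concrete, the sketch below checks a sequence of side choices against two reward-irrelevant strategies. It is an illustration only: the helper names (is_side_bias, alternating_suffix_length) are hypothetical, and a real analysis would need to weigh such patterns against the stimulus-based explanation rather than simply flag them.

```python
def is_side_bias(sides):
    """True if every choice was made on the same side."""
    return len(sides) > 0 and len(set(sides)) == 1

def alternating_suffix_length(sides):
    """Number of most recent trials over which the side strictly alternated."""
    if not sides:
        return 0
    length = 1
    # Walk backwards while each choice differs from the one before it.
    for i in range(len(sides) - 1, 0, -1):
        if sides[i] != sides[i - 1]:
            length += 1
        else:
            break
    return length

# Side choices from the example in the text; all six were also correct choices.
sides = ["left", "left", "right", "left", "right", "left"]

print(is_side_bias(sides))               # False: both sides were chosen
print(alternating_suffix_length(sides))  # 5: the last five trials alternate
```

Here the six correct choices meet the criterion, but the last five are equally well described by simple alternation, so the criterion alone cannot separate the two explanations.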
The problem of false positives grows as the number of trials increases. I ran a computer simulation to estimate how likely a virtual rat that is responding randomly is to make six consecutive correct choices by chance (Figure 1.3). The simulation shows that the likelihood of a false positive rises quickly with the number of trials, indicating that false positives may often occur when the 6-in-a-row criterion is used to judge rats’ learning, particularly when rats find the discrimination hard to learn. This is disproportionately problematic for studies of animals with impaired behavioural flexibility: because such animals take longer to learn, false positives become more likely, thereby reducing the apparent magnitude of the impairment.
Figure 1.3: The likelihood of a false positive increases with the number of trials under two different learning criteria. For the 6-in-a-row criterion (red curve), the false-positive likelihood for a specific trial number k was estimated by generating 5000 sequences of k binary (i.e., ‘correct’/‘incorrect’) values, with each value generated randomly. A sequence is considered a ‘false positive’ if it contains six consecutive correct values anywhere in the sequence. The false-positive likelihood is the ratio of the number of ‘false positive’ sequences to the total number of sequences (5000). A similar simulation was performed for the 8-correct-in-10 criterion (green curve). The simulation result for the 6-in-a-row criterion (red curve) is confirmed by a recursive formula (black curve) described by Fazekas et al. (2010).
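The simulation procedure described in the caption can be sketched as follows for the 6-in-a-row criterion. This is a minimal reconstruction under stated assumptions, not the original simulation code: each trial is assumed to be correct with probability 0.5, and the exact check uses a standard dynamic-programming recursion over the length of the current run, which is not necessarily the formula of Fazekas et al. (2010). All function names are illustrative.

```python
import random

def has_run(seq, run_len=6):
    """True if seq contains run_len consecutive correct (truthy) values."""
    run = 0
    for correct in seq:
        run = run + 1 if correct else 0
        if run >= run_len:
            return True
    return False

def simulated_false_positive(k, run_len=6, n_seq=5000, p=0.5, rng=None):
    """Fraction of n_seq random sequences of length k that meet the criterion."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_seq):
        seq = [rng.random() < p for _ in range(k)]
        hits += has_run(seq, run_len)
    return hits / n_seq

def exact_false_positive(k, run_len=6, p=0.5):
    """Exact probability of at least one run of run_len correct in k trials."""
    # state[j]: probability that the criterion is not yet met and the
    # sequence currently ends in exactly j consecutive correct choices.
    state = [1.0] + [0.0] * (run_len - 1)
    reached = 0.0
    for _ in range(k):
        new_state = [0.0] * run_len
        for j, prob in enumerate(state):
            new_state[0] += prob * (1 - p)      # an error resets the run
            if j + 1 == run_len:
                reached += prob * p             # criterion met on this trial
            else:
                new_state[j + 1] += prob * p    # the run grows by one
        state = new_state
    return reached

# Compare the simulated and exact values at a few trial numbers.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
for k in (6, 20, 50):
    print(k, simulated_false_positive(k, rng=rng), exact_false_positive(k))
```

With exactly six trials the exact value is 0.5^6, and it can only grow as more trials are added, which is the trend the figure illustrates.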
Moreover, the traditional hypothesis-testing approach can only help us decide when animals have learned; it does not allow us to determine which pattern of responding, whether based on perceptual stimuli or on spatial locations, predominates in each learning stage for each rat. Before establishing the correct stimulus-reward association, a rat may have tried other non-random but reward-irrelevant spatial patterns or stimulus characteristics as the basis for its bowl choice (see Section 2.3 for detail). Knowing these details for each rat would help researchers understand more deeply the course of learning when animals solve discrimination problems, and explore differences in learning processes between individuals and groups (e.g., the effects of different lesions or drug treatments). An approach based on null-hypothesis p-values cannot provide such detailed information.