AT THE EDGE VS THE CENTRE OF THE COHERENCE INTERVAL

Experiments 1 to 4 provided strong evidence that people's responses to a range of inferences of differing complexity are coherent at above chance levels. Experiments 5 to 7 move beyond this qualitative finding to investigate the degree to which responses are coherent above chance levels for these inferences, and whether people are more coherent for some inferences than for others. For example, is coherence higher for simpler one-premise inferences than for two-premise inferences? And does coherence differ for valid and invalid inferences?

The latter question is important for investigating whether p-validity plays a role over and above coherence. As mentioned earlier, the definition of p-validity necessarily rests on that of coherence, and this makes it difficult to test whether p-validity plays an independent role. But if coherence differs for valid and invalid inferences, then this difference is not something that can be explained from within coherence, whereas it would be accounted for naturally by p-validity.

Such quantitative comparisons of coherence become possible when the chance rate is held constant between inferences and conditions. This was done in Experiments 5 to 7 by using a binary response format, which rendered the chance rate equal to 50% in all cases. The precise task given to participants using this response format differed between Experiments 5 and 7.

Experiment 5 investigated questions about relative coherence between individual inferences, and between groups of inferences, such as those that are valid or invalid, and those of type A, B or C. But in addition, the experiment specifically studied the role of the location of the probability of the conclusion relative to the coherence interval. This was done by making two comparisons. The first was whether the probability of the conclusion was (a) clearly inside, or clearly outside, the interval, or alternatively (b) at the interval edge. The second was whether the probability of the conclusion was (a) inside or (b) outside of the interval.

Suppose people are sensitive to coherence, but their subjective "scale" for degrees of belief is coarser than a point probability, as suggested by Figures 5.10 and 5.14 of Experiments 3 and 4, respectively. Then one can predict that people's judgments would tend to be more coherent when a conclusion probability was clearly inside, or clearly outside, the relevant coherence interval than when it was at the interval edge. If people tend to be coherent but have rather "coarse" degrees of belief, then this would have implications for the development of algorithmic level accounts in reasoning and decision making. No specific prediction was made for whether participants would be more, or less, coherent when assessing whether the conclusion probability was inside, or outside, the coherence interval.

Participants received an inference task in which the probabilities of both the premises and the conclusion were given by the experimenter. The task was to judge whether or not the probability of the conclusion was consistent with the probabilities of the premises.

In contrast to Experiments 1 to 4, the task in this experiment was purely deductive. In Experiments 1 to 4 participants had been asked to generate their own conclusion probabilities, and these conclusion probabilities were constrained deductively by coherence. But coherence only constrained responses to a given interval, and with the exception of the inferences of DM and nDM, this interval was wider than a point value. Given that the task instructions asked for a point value as a response, it was up to inductive criteria or chance to narrow down the interval to a specific point. In the present experiment, in contrast, the question participants were asked could be fully answered using only the deductive constraints of coherence.
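The binary judgment described here can be illustrated with a minimal sketch, assuming that the coherent answer is "yes" exactly when the stated conclusion probability falls within the coherence interval fixed by the premise probabilities. The function name and the tolerance parameter are hypothetical and are not part of the experimental software.

```python
def coherent_response(conclusion: float, lower: float, upper: float,
                      tol: float = 1e-9) -> str:
    """Return the coherent answer to the consistency question.

    `lower` and `upper` are the bounds of the coherence interval for the
    conclusion; a point-valued interval has lower == upper.
    `tol` is a hypothetical tolerance that guards against floating-point noise.
    """
    inside = (lower - tol) <= conclusion <= (upper + tol)
    return "yes" if inside else "no"


# Illustration with an interval of [.8, 1]:
print(coherent_response(0.90, 0.8, 1.0))
print(coherent_response(0.30, 0.8, 1.0))
```

Because the answer depends only on the interval bounds, no inductive criterion is needed to settle it, which is the sense in which the task is purely deductive.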

Method

Participants

A total of 136 participants from English-speaking countries completed the online experiment in exchange for approximately £5 per hour. Participants accessed the experiment through the online platform Prolific Academic. Three participants were excluded because they indicated at the end of the experiment that they had not taken part seriously but had just "clicked through". A further 24 were excluded because they failed one or both of two catch trials designed to check whether participants were reading the materials. The final sample consisted of 109 participants. None of them had trial reaction times of 3 seconds or less, and they all indicated having at least "good" English language skills. Their median age was 31 years (range: 18-73), and most reported having some college education: around 26.5% indicated having finished 12th grade, 12% reported having a technical/applied degree, 44% reported an undergraduate degree, and 17.5% a postgraduate degree. Participants' median rating of the difficulty of the experiment was 74%.

Design and materials

Experiments 5, 6, and 7 investigated the same 12 inferences as Experiments 3 and 4. These inferences are reproduced in Table 6.1. Each inference was presented in three premise probability conditions. For the one-premise inferences, these were the probabilities of 1, .8 and .5, which were taken as possible instantiations of "certain", "high", and "medium" degrees of belief. For the two-premise inferences, both premises always had the same probability in order to simplify the task for participants. These premise probabilities were matched to those for the one-premise inferences not in terms of their numerical value, but in terms of the sum of their uncertainty (with uncertainty = 1 - probability; Adams, 1998). This implies that the conditions for the one-premise inferences with a premise probability of 1, .8, and .5 corresponded to the conditions for the two-premise inferences in which both premises had probabilities of 1, .9, and .75, respectively. For example, for a two-premise inference with premise probabilities of .9, the sum of the uncertainties of the premises is (1 - .9) + (1 - .9) = .2, which is equal to the uncertainty of a one-premise inference with a premise probability of .8. For ease of exposition, the results section refers to a condition with a premise probability of 1, .8, and .5, even though strictly speaking this is only true for the one-premise inferences. For the two-premise inferences the conditions of 1, .8, and .5 really refer to the complement of their uncertainty sum rather than to their premise probability.
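The matching by uncertainty sum can be checked with a short sketch. It assumes only the definition given above (uncertainty = 1 - probability) and that both premises of a two-premise inference share the same probability. The function name is hypothetical.

```python
def matched_premise_probability(one_premise_probability: float,
                                n_premises: int = 2) -> float:
    """Premise probability that gives n equally probable premises the same
    uncertainty sum as a single premise with the given probability."""
    total_uncertainty = 1.0 - one_premise_probability
    # Split the total uncertainty evenly across the premises.
    return 1.0 - total_uncertainty / n_premises


for p in (1.0, 0.8, 0.5):
    print(p, round(matched_premise_probability(p), 4))
```

Rounded to two decimals, the three conditions map onto the two-premise probabilities of 1, .9, and .75 stated in the text.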

Table 6.1. The inferences investigated in Experiments 5 to 7.

Type / Name                              Validity   Form

A. One-premise, equivalence and contradiction
 1. De Morgan (DM)                       1          not(p & q) ∴ not-p or not-q
 2. not De Morgan (nDM)                  0          p & q ∴ not-p or not-q

B. One-premise, valid in only one direction, left to right, or right to left
 3. and-elimination (&E)                 1          p & q ∴ p
 4. and-introduction (&I)                0          p ∴ p & q
 5. and-to-or (&Or)                      1          p & q ∴ p or q
 6. or-to-and (Or&)                      0          p or q ∴ p & q
 7. if-to-or (IfOr)                      1          if not-p then q ∴ p or q
 8. or-to-if (OrIf)                      0          p or q ∴ if not-p then q

C. Two-premise, conditional syllogisms
 9. Modus ponens (MP)                    1          if p then q, p ∴ q
10. Modus tollens (MT)                   1          if p then q, not-q ∴ not-p
11. Affirmation of the consequent (AC)   0          if p then q, q ∴ p
12. Denial of the antecedent (DA)        0          if p then q, not-p ∴ not-q

Note. "1" = "valid", "0" = "invalid", "∴" = "therefore".

Within each premise probability condition, each inference was presented five times, with five different conclusion probabilities. These conclusion probabilities are represented as blue dots in Figure 6.1. The vertical black lines in the figure represent the coherence intervals for each premise probability condition. Notice that for the premise probabilities used, Figure 6.1 illustrates a clear difference between the valid and invalid inferences, with the intervals for the probability-preserving valid inferences being restricted to higher values, and those of the invalid inferences taking low or more probabilistically uninformative values. Figure 6.1 also shows that the standardisation with respect to premise probability came at the cost of a lack of standardisation with respect to interval width. An exception was that all the intervals for the inferences of type B had equal width when the probability of the premise was .5.

Figure 6.1. The conclusion probabilities used in Experiment 5 for each inference. The dots represent the conclusion probabilities, and the vertical lines represent the coherence intervals for each premise probability condition.
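Several of the coherence intervals can be reproduced with a short sketch. It rests on two assumptions that are not stated in this section: that the probability of a conditional "if p then q" is the conditional probability P(q|p), and that the intervals for MP and DA follow from the law of total probability with P(q|not-p) left free to vary between 0 and 1. All function names are hypothetical. MT and AC are left out because their bounds cannot be derived from the information given here.

```python
from typing import Tuple

Interval = Tuple[float, float]


def one_premise_interval(kind: str, premise: float) -> Interval:
    """Coherence interval for the one-premise inferences (types A and B)."""
    if kind == "equivalence":      # DM: conclusion is equivalent to the premise
        return (premise, premise)
    if kind == "contradiction":    # nDM: conclusion is the negation of the premise
        return (1.0 - premise, 1.0 - premise)
    if kind == "valid":            # &E, &Or, IfOr: premise entails conclusion
        return (premise, 1.0)
    if kind == "invalid":          # &I, Or&, OrIf: conclusion entails premise
        return (0.0, premise)
    raise ValueError(f"unknown kind: {kind}")


def mp_interval(p_conditional: float, p_antecedent: float) -> Interval:
    """MP: P(q) = P(q|p)P(p) + P(q|not-p)P(not-p), with P(q|not-p) in [0, 1]."""
    lower = p_conditional * p_antecedent
    return (lower, lower + (1.0 - p_antecedent))


def da_interval(p_conditional: float, p_not_antecedent: float) -> Interval:
    """DA: the interval for P(not-q) is the complement of the MP interval for P(q)."""
    q_lower, q_upper = mp_interval(p_conditional, 1.0 - p_not_antecedent)
    return (1.0 - q_upper, 1.0 - q_lower)


print(mp_interval(0.9, 0.9))
print(da_interval(0.75, 0.75))
```

Under these assumptions, the sketch agrees with the intervals listed for these inferences in Table 6.2, for example [.81, .91] for MP and [.0625, .8125] for DA, up to floating-point rounding.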

The conclusion probabilities were chosen so as to lie at the centre of the interval, at its inner or outer edge, or clearly outside of the interval. In the latter case they were placed at the upper or lower end of the probability scale. Sometimes this rationale for varying conclusion probabilities relative to the location of the interval required fewer than five conclusion probabilities to be instantiated. For example, the cases in which the coherence interval equalled either the point value of 1 or the unit interval required only three conclusion probabilities: one equal to the point value or to the centre of the unit interval, respectively; one next to the point or at the top of the unit interval; and one at the bottom of the interval, around zero. In such cases further items were added in order to nonetheless have five items for each combination of inference with premise probability condition. Conversely, there were two cases (out of 36) that required more than five conclusion probabilities to be instantiated. One can see these in Figure 6.1. For inference 9 (MP), the case in which the premise probability condition was .5 (i.e. the case in which the sum of premise uncertainties was .5, because each premise probability was .75) would have required two additional conclusion probabilities to cover the inner and outer edge of the lower end of the coherence interval. In this case priority was given to covering the upper end of the interval, in order to allow an assessment of overconfidence for MP. Overconfidence for this inference, unlike underconfidence, cannot be measured in the binary approach to reasoning, but it can be in the probabilistic approach.

The second case in which five conclusion probabilities were not enough to cover all relevant positions of the interval was for inference 12 (DA) when the premise probability condition was .5 (i.e. the condition in which the probability of each premise was .75, and so the sum of premise uncertainties was .5). Figure 6.1 shows that here one further conclusion probability would have been necessary to cover the inner edge of the lower bound of the coherence interval. In this case the outer edge of the lower bound was given priority, because the inner edge of the lower bound is already covered in the condition in which premise probability is .8.

Overall, by fixing the number of conclusion probabilities for each combination of inference with premise probability to five in all cases, it was possible to capture the vast majority of relevant locations on the probability scale, while limiting the number of irrelevant additional items, and preventing some conditions from being more salient than others merely because of differences in their frequency of occurrence. A higher saliency for some conditions than others based on their frequency of occurrence could have led participants to process the oddball items with heightened attention, possibly leading to higher coherence (cf. the effect of working memory load in Experiments 3 and 4). Differences in coherence due to the logical form of the inferences would then be confounded with differences in coherence due to saliency, a problem that is avoided by equating the number of conclusion probabilities used across conditions.

Table 6.2 provides more detailed information on the conclusion probabilities used. The first entry listed for each condition shows that one of the conclusion probabilities was always a point value equal to the centre of the coherence interval. The remaining four conclusion probabilities were randomly selected for each participant and condition out of a range of five values, determined by the location of the coherence interval for the respective condition, and by the upper and lower ends of the probability scale. For example, suppose the lower bound of a coherence interval for some combination of inference and premise probabilities was .6. Then to capture the inner edge of the lower bound of this interval, a random number between .61 and .65 would be selected. And to capture the outer edge of this lower interval bound, a random number between .55 and .59 would be selected. Hence, the edges of intervals were captured by taking a random number within five percentage points of either side of the edge. An exception was when the edge of an interval was very near the lower or upper bound of the probability scale. This was the case for example for inference 12 (DA) in the premise probability condition of .8 (i.e. the condition in which each premise probability was .9, so that the sum of premise uncertainties was .2; see Figure 6.1). In this case the five percentage point range for the outer edge of the upper interval bound would have overlapped with the five percentage point range for the upper end of the scale. Therefore instead of using two overlapping ranges, a single five percentage point range was used, which was centred between the upper edge of the coherence interval and the upper end of the probability scale.
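The sampling of edge items can be sketched as follows. This is a minimal illustration, assuming that values were drawn in whole percentage points, 1 to 5 points above or below an interval bound. It does not model the exception for bounds near the ends of the probability scale. The function name and the "above"/"below" labels are hypothetical.

```python
import random


def sample_edge_probability(bound: float, side: str, rng: random.Random) -> float:
    """Draw a probability 1 to 5 percentage points above or below a bound.

    For a lower interval bound, "above" gives the inner edge and "below" the
    outer edge; for an upper bound the roles are reversed.
    """
    if side not in ("above", "below"):
        raise ValueError("side must be 'above' or 'below'")
    offset = rng.randint(1, 5)                 # whole percentage points
    bound_pct = round(bound * 100)
    pct = bound_pct + offset if side == "above" else bound_pct - offset
    if not 0 <= pct <= 100:
        raise ValueError("edge range falls outside the probability scale")
    return pct / 100


rng = random.Random(0)                         # seeded only for reproducibility
inner = sample_edge_probability(0.6, "above", rng)   # between .61 and .65
outer = sample_edge_probability(0.6, "below", rng)   # between .55 and .59
print(inner, outer)
```

Working in integer percentage points keeps the sampled values on the same two-decimal grid as the ranges reported in Table 6.2.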

Table 6.2. The conclusion probabilities used in Experiment 5 for each inference and premise probability condition.

Inference                    P(premise)  Coherence interval  P(conclusion)

1 (DM)                       1           1                   1, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          .8                  .8, [.95,.99], [.81,.85], [.75,.79], [.01,.05]
                             .5          .5                  .5, [.95,.99], [.51,.55], [.45,.49], [.01,.05]
2 (nDM)                      1           0                   0, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          .2                  .2, [.95,.99], [.21,.25], [.15,.19], [.01,.05]
                             .5          .5                  .5, [.95,.99], [.51,.55], [.45,.49], [.01,.05]
3 (&E), 5 (&Or), 7 (IfOr)    1           1                   1, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          [.8,1]              .9, [.95,.99], [.81,.85], [.75,.79], [.01,.05]
                             .5          [.5,1]              .75, [.95,.99], [.51,.55], [.45,.49], [.01,.05]
4 (&I), 6 (Or&), 8 (OrIf)    1           [0,1]               .5, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          [0,.8]              .4, [.95,.99], [.81,.85], [.75,.79], [.01,.05]
                             .5          [0,.5]              .25, [.95,.99], [.51,.55], [.45,.49], [.01,.05]
9 (MP)                       1           1                   1, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          [.81,.91]           .86, [.94,.98], [.76,.80], [.01,.05], [.31,.35]
                             .5          [.5625,.8125]       .69, [.95,.99], [.82,.86], [.76,.80], [.01,.05]
10 (MT)                      1           1                   1, [.95,.99], [.01,.05], [.64,.68], [.31,.35]
                             .8          [.88,1]             .94, [.95,.99], [.89,.93], [.83,.87], [.01,.05]
                             .5          [.66,1]             .83, [.95,.99], [.67,.71], [.61,.65], [.01,.05]
11 (AC)                      1           [0,1]               .5, [.95,.99], [.64,.68], [.31,.35], [.01,.05]
                             .8          [0,1]               .5, [.95,.99], [.64,.68], [.31,.35], [.01,.05]
                             .5          [0,1]               .5, [.95,.99], [.64,.68], [.31,.35], [.01,.05]
12 (DA)                      1           [0,1]               .5, [.95,.99], [.64,.68], [.31,.35], [.01,.05]
                             .8          [.01,.91]           .46, [.94,.98], [.86,.90], [.02,.06], [.31,.35]
                             .5          [.0625,.8125]       .44, [.95,.99], [.82,.86], [.76,.80], [.01,.05]

Note. Conclusion probabilities without brackets denote the centre of the respective coherence interval. Conclusion probabilities in square brackets represent the minimum and maximum of a range of values from which a number was drawn randomly for each participant and condition.

With 12 inferences, 3 premise probability conditions and 5 conclusion probabilities in each premise probability condition, the experiment had 12 × 3 × 5 = 180 trials, plus two catch trials to check whether participants were paying attention. The catch trials were similar in format to the regular trials, but the text for the premises and conclusion of the inferences was replaced with text stating that they were control trials to make sure participants were paying attention. Participants were asked not to respond, and were told that the experiment would continue automatically on the next page in a few seconds. The catch trials remained on screen for 8 seconds.

On each trial, the inference was introduced through a short context story in which a protagonist expressed a given degree of belief in the premises and in the conclusion. Participants' task was to indicate whether or not the likelihood that the protagonist assigned to the conclusion was consistent with the likelihood the protagonist assigned to the premise or premises. Participants were asked to provide their answers using the arrows on their keyboard, the left arrow standing for "no" and the right arrow for "yes". On each trial a picture of two arrows was presented: a red arrow pointing to the left with the word "no" written on it, and a green arrow pointing to the right displaying the word "yes". These arrow pictures were sensitive to mouse clicks, so that it was also possible to respond using a mouse.

The context stories were pseudonaturalistic. That is, they described concrete but fictitious situations in which it would be difficult to draw on world knowledge to judge the probabilities of the events involved. Further, the stories aimed to convey that the conclusions of the inferences were important or consequential, and that at the same time careful thinking as opposed to jumping to conclusions was called for. The reason for this was as follows. One of the purposes of the experiment was to compare coherence for valid and invalid inferences. But it seems implausible that the distinction between deduction and induction would be relevant in all contexts. A context in which it may become relevant is when it is worth being "conservative" because of a higher risk of drawing a conclusion that goes beyond what follows necessarily from the premises. The frame below shows a sample trial for inference 1 (DM) and a premise probability of .8.

The meanings of "premise" and "conclusion" were explained in the instructions. The experiment used six context stories on a range of topics: the research report of a team of zoologists, the murder of a member of parliament, an injured patient in an emergency hospital, a water dam with cracks, a robot mission to Mars, and a cholera epidemic. The full description of the context stories can be found in Appendix D.

Each context story was randomly assigned to two of the twelve inferences for each participant. Within each participant the 15 trials in which each inference was involved referred to the same overall scenario, but the events described in the scenario varied slightly between premise probability conditions and between the two inferences to which the scenario was assigned. For example, in the case of the sample scenario shown in the frame, the inference made reference to different patients in each premise probability condition (patients P. M., H. D., and R. S.), and the doctor who expressed the beliefs about the premise and conclusion probabilities differed between inferences (Miriam and Leslie). These changes were introduced to avoid carry-over effects between trials, or an attempt to establish coherence across trials when only coherence within a trial was assessed.
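One way to realise the assignment of six stories to twelve inferences, with each story used for exactly two inferences, is to shuffle the inferences and pair them off. This is only an illustrative sketch; the randomisation procedure actually used is not described here. The short story labels and the function name are hypothetical.

```python
import random
from typing import Dict

# Hypothetical short labels for the six context stories.
STORIES = ["zoologists", "murder", "hospital", "dam", "mars", "cholera"]
INFERENCES = ["DM", "nDM", "&E", "&I", "&Or", "Or&",
              "IfOr", "OrIf", "MP", "MT", "AC", "DA"]


def assign_stories(rng: random.Random) -> Dict[str, str]:
    """Map each inference to a story so that every story covers two inferences."""
    shuffled = INFERENCES[:]
    rng.shuffle(shuffled)
    # Consecutive pairs of shuffled inferences share one story.
    return {inference: STORIES[i // 2] for i, inference in enumerate(shuffled)}


assignment = assign_stories(random.Random())   # a fresh draw per participant
for inference in INFERENCES:
    print(inference, assignment[inference])
```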

Imagine you are part of a team of doctors who are working in an emergency hospital. Several patients are brought in by the ambulance with a variety of severe injuries. It is important that you act carefully on these cases because a wrong diagnosis could be fatal. You are reviewing their files with Miriam, another doctor in the team.

Based on the information gathered until now:

Premise: Miriam thinks it’s 80% likely that:

Patient P. M. does not have both a liver injury and a kidney injury.

Conclusion: Miriam thinks it’s 16% likely that:

Patient P. M. does not have a liver injury or he does not have a kidney injury.

Is the likelihood assigned to the conclusion consistent with the likelihood assigned to the premise?

Procedure

After the instructions and an example, participants went through 15 practice trials involving a different scenario and different inferences from those of the main experiment. The practice trials included five trials with the inference p, therefore not-p and a premise probability of 1, which were used to assess whether participants understood the instructions. At the end of the experiment participants provided demographic information and indicated whether they had taken part seriously or had just "clicked through". The last page contained debriefing information. The entire session took on average 18 minutes to complete.

Results and discussion

Figure 6.2 shows the proportion of "yes" and "no" responses to the contradiction p, therefore not-p with a premise probability of 1, used in the practice trials. The coherent response was "yes" for the second conclusion probability displayed on the x axis: probability 0, and "no" for