Part II of this thesis reports eight experiments designed to investigate the effect of outcome knowledge and information order
DATA REANALYSED TO INCLUDE 'DRAWS'

                  Believed       Believed did    Overall
                  happened       not happen

Groups I, II, III, V Combined (N = 119)
Groups I, III, IV, V Combined (N = 119) (in parentheses)
  +               78 (78)        58 (58)         76 (69)
  -               41 (41)        61 (61)         43 (50)
  % +             65.5 (65.5)    48.7 (48.7)     63.9 (58.0)

Group VI (N = 27)
USSR: No before, shortly after.
  +               20             10              16
  -               7              17              11
  % +             73.9           37.0            59.3

Group VII (N = 27)
CHINA: No before, long after.
  +               16             13              14
  -               11             14              13
  % +             59.2           48.1            51.2

Group VIII (N = 37)
CHINA: No before, long after.
  +               18             22              24
  -               21             15              13
  % +             48.6           59.4            64.9

Groups VI, VIII Combined (N = 64)
Groups VII, VIII Combined (N = 64) (in parentheses)
  +               38 (34)        32 (35)         40 (38)
  -               26 (30)        32 (31)         24 (26)
The third, and most serious, problem stems from the previous two. It concerns the cases where foresight estimates equal hindsight ones and where subjects have equal numbers of (+) and (-). Fischhoff & Beyth, as outlined earlier, allocate both types of occurrence a (0) and then drop the data or subject from further analysis. This is unfortunate for the following reasons. Subjects, when making hindsight judgments, are explicitly instructed to give "the same probabilities you gave then (two weeks ago). If you cannot remember the probability you assigned then, give the probability that you would have given to each of the various outcomes on the eve of President Nixon's trip...." (p5). Thus accurately remembered or reconstructed probabilities must be regarded as non-hypothesis-supporting, and hence as evidence against hindsight bias. Such instances should be assigned a (-) and included in the analysis. They should not be assigned a (0) and dropped from the analysis.
Strictly speaking, the hypothesis being tested by Fischhoff and Beyth is that "the remembered or reconstructed probability of an event will tend to be larger than the probability originally assigned to it if the event is believed to have occurred, and smaller if it is believed not to have occurred" (p3). Dropping cases where f = h and where (+) = (-) gives a misleading picture of the percentage of hypothesis-supporting subjects.
Table 3.3 shows the effects of reanalysing Fischhoff and Beyth's data with cases of foresight = hindsight counting against the hypothesis of hindsight bias, i.e. assigned a minus sign. The figures are obtained by taking an experimental group, adding the number of (+) and (-) together and then subtracting this sum from the total number of subjects (N) in that group. The resulting figure is then added to the non-hypothesis-supporting category. The percentage of hypothesis-supporting (+) subjects is worked out by simply dividing the number in that category by the total N and multiplying by 100.
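The arithmetic of this re-analysis can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the figure of 26 original (-) subjects is an assumed value chosen so that the example reproduces the 78, 41 and 65.5 per cent entries in the first column of Table 3.3. It is not taken from Fischhoff and Beyth's paper.

```python
def percent_excluding_draws(plus, minus):
    """Percentage of (+) subjects when draws are dropped from the analysis."""
    return round(100.0 * plus / (plus + minus), 1)


def reanalyse_with_draws(plus, minus, n):
    """Count draws as (-) and express (+) as a percentage of the full N.

    plus  -- number of hypothesis-supporting subjects
    minus -- number of non-hypothesis-supporting subjects (draws excluded)
    n     -- total number of subjects in the group
    """
    draws = n - (plus + minus)          # subjects previously dropped
    if draws < 0:
        raise ValueError("(+) and (-) together exceed N")
    minus_with_draws = minus + draws    # draws now count against the hypothesis
    percent_plus = round(100.0 * plus / n, 1)
    return plus, minus_with_draws, percent_plus


# Assumed illustration: 78 (+), 26 (-) and 15 draws in a group of N = 119.
print(percent_excluding_draws(78, 26))     # draws dropped
print(reanalyse_with_draws(78, 26, 119))   # draws counted as (-)
```

With these assumed counts, the percentage of hypothesis-supporting subjects falls from 75.0 when draws are dropped to 65.5 when they are counted as (-).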
The percentages shown in Table 3.3 are lower than those in Table 3.2. Whereas Table 3.2 shows that two-thirds of subjects exhibited hindsight bias, Table 3.3 reveals a figure of less than 60 per cent overall. For outcomes believed to have happened, Fischhoff and Beyth (Table 3.2) report three-quarters of the subjects as giving higher assessments in hindsight. In contrast, Table 3.3 shows this figure to be less than two-thirds. For outcomes believed not to have happened, Table 3.2 shows 57 per cent of subjects giving lower assessments in hindsight; the re-analysed data (Table 3.3) show this figure to be less than 50 per cent. Table 3.3 also shows only three of the eight experimental groups to have greater than 60 per cent hypothesis-supporting subjects for outcomes believed to have occurred. The table also shows only one of the eight experimental groups to exceed this figure for outcomes believed not to have occurred. In contrast, Fischhoff & Beyth (Table 3.2) report seven and four out of eight for outcomes believed to have and not to have occurred respectively. Re-analysing the data in this way shows the extent and strength of hindsight bias to be considerably less than Fischhoff & Beyth claim.
Unfortunately, inferential statistical analysis cannot be carried out on these re-analysed data in the way Fischhoff and Beyth analysed theirs. The sign test is inappropriate and consequently a Z-score cannot be obtained.
The fourth and final criticism is a general one: although it is desirable to compute some overall index of foresight/hindsight differences, it is also necessary to find out how many estimates for each question in the various groups are individually different. This could be achieved simply by comparing individual likelihood assessments using, for example, the Wilcoxon T-test. Such an analysis would provide more detailed information concerning the magnitude of hindsight bias and provide what is lacking from Fischhoff & Beyth's analysis.
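As a rough sketch of the kind of comparison suggested here, the Wilcoxon T statistic for paired foresight and hindsight estimates can be computed as follows. The function name and the six pairs of estimates are hypothetical and purely illustrative; they are not data from any of the studies reviewed. The sketch follows the classical procedure, which discards zero differences before ranking; how foresight = hindsight cases ought to be treated in this context is a separate decision that the sketch does not settle.

```python
def wilcoxon_t(foresight, hindsight):
    """Return (T, W_plus, W_minus, n) for paired estimates.

    T is the smaller of the two signed rank sums; n is the number of
    non-zero differences (zero differences are discarded, as in the
    classical Wilcoxon procedure).
    """
    diffs = [h - f for f, h in zip(foresight, hindsight) if h != f]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        mean_rank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mean_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus, len(ordered)


# Hypothetical percentage estimates from six subjects on one question.
foresight = [30, 50, 20, 60, 40, 10]
hindsight = [45, 50, 25, 55, 60, 20]
print(wilcoxon_t(foresight, hindsight))
```

The resulting T would then be compared with the tabled critical value for the given n. Unlike a count of signs, the statistic reflects the size as well as the direction of each difference.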
3.3.2. Summary.
The problems discussed concerning Fischhoff & Beyth's procedure for analysing their data are that they (a) use a method which provides a very weak criterion as to what is to be considered a case of hindsight bias;
(b) incorrectly analyse the data using this procedure, ignoring what should be regarded as non-hypothesis-supporting instances;
(c) give no indication as to the strength or magnitude of the bias. Such shortcomings mean that one should be cautious about the strength of the bias. Re-analysis of the data (Table 3.3) shows the bias to be less extensive than claimed by Fischhoff & Beyth. In conclusion, stronger empirical support for hindsight bias is required before we can feel confident in its existence outside of the laboratory.
3.4. EXPERIMENTS USING FACTUAL MATERIAL
The two final experiments to be reviewed in this chapter both make use of factual material. As we shall see, Fischhoff (1977) and Wood (1978), in contrast to the previous studies, provide the soundest evidence for the existence of hindsight bias, and give some indication of its extent.
3.4.1. Fischhoff (1977).
The two experiments reported in this paper attempt, firstly, to demonstrate hindsight bias and, secondly, to discover whether the bias can be reduced or eliminated. Both experiments make use of factual material, taken from a wide range of areas such as history, music, geography and literature. Subjects were presented with a word or statement and two definitions; they had to indicate which they thought correct. For example, Aladdin's nationality was (a) Persian or (b) Chinese. Subjects had to assign, to one of the alternatives, a subjective assessment of its being correct.
Experiment 1 consisted of three treatments. For each treatment