2.3 Solution Strategies to Scoring Challenges: Empirical Scoring

2.3.3 Extending the Rationale of Consensus Scoring

2.3.3.2 Wisdom of the Crowd Effect

Groups of independent individuals have been hypothesized to provide accurate predictions and have been suggested as an alternative to the judgments of single, even highly able, individuals. The WoC effect has commonly been applied to everyday judgments and has been addressed in popular science literature (Surowiecki, 2004) as well as in scientific literature (Mannes, Soll, & Larrick, 2014). Probably the most famous example of WoC in the psychological literature was described by Francis Galton (1907) in his article Vox Populi. He described his visit to a fat stock exhibition where he collected estimates of an ox’s weight given by farmers and casual visitors. The median of N = 787 data points was used as an estimate of the ox’s true weight. This estimate turned out to be very close to the true weight: the difference was only nine pounds. Galton expressed surprise that "democratic" judgments could be so trustworthy. Galton has frequently been cited as marking the point where the consensus idea entered social science research; the idea itself, however, is much older.

Applications of WoC Effect. WoC has been especially popular in economic and psychological applications (Gaissmaier & Marewski, 2011; Lee, Zhang, & Shi, 2011; Yi, Steyvers, Lee, & Dry, 2012). Since Surowiecki’s best-seller of 2004, the number of scientific studies of the WoC effect has increased. However, the aggregation of judgments or forecasts has a long history in psychological, economic, and statistical research. Clemen (1989) provided an overview of different methods, as well as applications of aggregated judgments. The WoC effect is also known under terms such as "swarm intelligence", "collective intelligence", or "aggregated group judgements" (Krause, James, Faria, Ruxton, & Krause, 2011; Nofer & Hinz, 2014; Surowiecki, 2004). However, the literature on the quality of group judgments in general covers various sorts of judgment. One very important classifying attribute is whether judgments are given independently and aggregated mechanically, or result from some kind of group process that precludes independence. WoC studies usually refer to the first case, whereas studies of group decision-making focus on interacting groups (Gigone & Hastie, 1997; Solomon, 2006). Crowds usually consist of different judges, but studies have also considered the WoC effect within individual respondents (Vul & Pashler, 2008).

WoC effects have been investigated for judgments of group-related as well as group-unrelated events. On the one hand, crowds have been used to predict events in which members of the group might themselves take part. For instance, Gaissmaier and Marewski (2011) investigated the quality of election forecasts based on the recognition heuristic and WoC effects. As these authors showed, WoC generally performed better than, or at least as well as, prediction models based on recognition of parties and traditional polls. In economics, WoC has been used to make stock predictions. Hill and Ready-Campbell (2011) showed that WoC can outperform a famous stock index (S&P 500). Predictions could be further improved by applying an algorithm that reduces the crowd to specific experts. In a similar study, Nofer and Hinz (2014) investigated whether internet crowds are better at making stock predictions than professional experts. Based on an analysis of data across several years, the authors concluded that crowd predictions provided higher returns than the predictions of experts. On the other hand, WoC has also been successfully used to predict events that are not group-related. For example, the WoC effect has been used to predict the results of sports events (Herzog & Hertwig, 2011), to improve estimates of the correct price in The Price Is Right (Lee et al., 2011), to solve complex problems (Yi et al., 2012), to estimate quantities (Krause et al., 2011), and to predict natural phenomena (Hueffer, Fonseca, Leiserowitz, & Taylor, 2013).

Most studies of the WoC effect report results that support it. Only a few studies criticize the WoC effect or identify limiting factors. Among these are Simmons, Nelson, Galak, and Frederick (2010) as well as Stephen and Lee (2010), who showed that crowds can make unwise decisions in sports betting when biased point spreads are available, even when the crowd knows that they are biased. In addition, Lorenz, Rauhut, Schweitzer, and Helbing (2011) showed that social influence, in the form of information provided by others, could undermine the WoC effect.

Definition and Preconditions of WoC Effect. How did most studies come to the conclusion that the WoC effect exists? To define a group judgment as wise, attributes of the judgment must be stated that allow a precise evaluation of its wisdom. First, an objective criterion must be available against which the group judgment can be compared. The group judgment needs to constitute a sufficiently accurate estimate of the target value. For instance, the ox’s weight measured with a calibrated scale is the target value in Galton’s example. Second, group judgments are compared to other estimation procedures to make explicit relative to which procedures the crowd is considered wise (Mannes et al., 2014). Most commonly, the group judgment is compared to (1) a randomly selected member of the group or (2) an expert (i.e., an individual with high ability). Estimators based on historical events or similar procedures can also take the role of the expert (Hueffer et al., 2013). Third, the method of aggregating group judgments must be specified. Most often, measures of central tendency are used, but other methods have also been used, for example, in economics (Clemen, 1989; Merkle & Steyvers, 2011; Turner, Steyvers, Merkle, Budescu, & Wallsten, 2014).

According to Davis-Stober et al. (2014), the WoC effect can be defined as follows, incorporating the three aspects described above: "A crowd is wise if a linear aggregate, for example a mean, of its members’ judgments is closer to the target value than a randomly, but not necessarily uniformly, sampled member of the crowd." (p. 1).
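The comparison in this definition can be made concrete with a minimal sketch. It assumes absolute error as the distance measure and uniform sampling of members, which is only one of the cases the definition allows; the function name, the simulated judgments, and the target value of 1200 are hypothetical and are not taken from any of the cited studies.

```python
import random
import statistics


def compare_crowd_to_members(judgments, target):
    """Compare the crowd mean with individual members, using absolute error."""
    aggregate_error = abs(statistics.mean(judgments) - target)
    member_errors = [abs(j - target) for j in judgments]
    return {
        # Error of the linear aggregate (here: the arithmetic mean)
        "aggregate_error": aggregate_error,
        # Expected error of a member sampled uniformly at random
        "mean_member_error": statistics.mean(member_errors),
        # Proportion of members whose own error exceeds the aggregate's error
        "share_of_members_beaten": sum(e > aggregate_error for e in member_errors)
        / len(member_errors),
    }


# Hypothetical crowd: 500 independent, unbiased judgments around the target
rng = random.Random(0)
target = 1200.0
judgments = [rng.gauss(target, 80.0) for _ in range(500)]
result = compare_crowd_to_members(judgments, target)
print(result)
```

Note that under uniform sampling and absolute error, the error of the mean can never exceed the average member error (triangle inequality), so the share of members beaten by the aggregate is usually the more informative quantity in this sketch.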

The WoC effect is usually bound to specific preconditions. However, these preconditions are seldom mentioned in applications, and the effect is hardly ever defined (Davis-Stober et al., 2014). Preconditions for the WoC effect are stated relatively consistently as (1) knowledge about the event, and (2) diversity and independence of judgments (Davis-Stober et al., 2014; Galton, 1907; Larrick et al., 2012; Nofer & Hinz, 2014; Surowiecki, 2004). Some authors additionally state motivation of respondents as a precondition (Nofer & Hinz, 2014):

• Knowledge: Larrick et al. (2012) state that judges should possess a level of expertise in the sense that they need to have experience with the event or that they are educated in the specific area. Although knowledge is a precondition of key importance, the precondition is not precisely defined insofar as cut-off levels of knowledge are elusive.

• Diversity and independence: Judgments need to be given independently of each other; that is, the judgment of one person should not influence the judgments of other individuals (Nofer & Hinz, 2014). As a definitional characteristic, this aspect distinguishes the WoC effect from group decision-making. However, the essential idea behind the independence precondition is a statistical assumption about measurement error. Each measurement is distorted by error that can be either systematic or random. Averaging across different measurement (data) points has been shown to average out random error (Eysenck, 1939). However, systematic error is not eliminated by averaging. Non-independent judgments are prone to such systematic error, for instance error based on social influence.
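The difference between random and systematic error under averaging can be illustrated with a small simulation. This is a sketch under assumed conditions: judgments are modeled as target plus a shared bias plus independent Gaussian noise, and the function name and all numeric values are hypothetical.

```python
import random
import statistics


def error_of_mean(n_judges, bias, noise_sd, target=100.0, seed=0):
    """Signed error of the crowd mean for judgments = target + bias + noise."""
    rng = random.Random(seed)
    judgments = [target + bias + rng.gauss(0.0, noise_sd) for _ in range(n_judges)]
    return statistics.mean(judgments) - target


# Random error only (bias = 0) versus an added systematic error (bias = 5)
for n in (10, 1000, 100000):
    unbiased = error_of_mean(n, bias=0.0, noise_sd=20.0)
    biased = error_of_mean(n, bias=5.0, noise_sd=20.0)
    print(f"n={n:>6}  random only: {unbiased:+.2f}  with shared bias: {biased:+.2f}")
```

With a growing number of judges the error of the mean shrinks toward zero when only random error is present, whereas it settles near the shared bias when a systematic component is added.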

Concerning diversity, Larrick et al. (2012) hypothesize that judges should have different perspectives on the event, should use different cues for judging, and should therefore differ in their errors. According to Nofer and Hinz (2014), diversity is advantageous because a diverse group provides more alternatives as sources of information and knowledge. According to Larrick et al. (2012), diversity and independence cannot be distinguished, as dependence results in less diversity. Davis-Stober et al. (2014) defined diversity as the highest possible negative correlation between respondents and concluded on the basis of their mathematical considerations that not independence but maximal negative dependence improves the WoC effect.
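Why negative dependence could be more favorable than independence can be seen in a simple equicorrelation model. The model is an illustrative assumption and not the derivation given by Davis-Stober et al. (2014): each judgment is written as $x_i = T + b + e_i$, with target value $T$, a shared systematic error $b$, and random errors $e_i$ that have mean zero, common variance $\sigma^2$, and common pairwise correlation $\rho$.

```latex
\begin{align}
\operatorname{Var}(\bar{x})
  &= \frac{1}{n^2}\Bigl[\, n\sigma^2 + n(n-1)\,\rho\,\sigma^2 \Bigr]
   = \frac{\sigma^2}{n}\bigl[\, 1 + (n-1)\rho \,\bigr], \\
\operatorname{E}\bigl[(\bar{x} - T)^2\bigr]
  &= b^2 + \frac{\sigma^2}{n}\bigl[\, 1 + (n-1)\rho \,\bigr],
  \qquad -\frac{1}{n-1} \le \rho \le 1 .
\end{align}
```

Under these assumptions, the expected squared error of the mean equals $b^2 + \sigma^2/n$ for independent errors ($\rho = 0$), reaches its minimum $b^2$ at the most negative admissible correlation $\rho = -1/(n-1)$, and stays at $b^2 + \sigma^2$ for perfectly correlated errors ($\rho = 1$). The systematic component $b^2$ is unaffected by $n$ and $\rho$.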

Although the aspect of diversity is mentioned frequently, its theoretical foundation is still questionable. It is not evident why different perspectives of individuals ought to foster WoC effects. In addition, conflating diversity and independence is misleading from a psychometric perspective. The definition of independence as zero correlation between respondents or of diversity as maximum negative correlation between respondents is rather unusual in psychometric theory. Usually, independence is an assumption in psychometric models; however, this assumption does allow respondents to correlate positively. The assumption of local stochastic independence means that no latent variable other than the assumed one should influence responses. Hence, responses of individuals show zero correlations only if the level of the latent variable is held constant. Although local stochastic independence constitutes a psychometrically reasonable assumption as a definitional aspect of the WoC effect, this assumption is seemingly unnecessary for group judgments to be wise (Gigone & Hastie, 1997). In addition, theories of expertise and agreement among individuals indicate that experts converge on a common truth, which causes their judgments to correlate (Batchelder & Romney, 1988; Legree et al., 2005). Hence, it remains unclear why diversity or zero correlations should offer any advantage.
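The difference between local stochastic independence and zero correlation can be written out formally. The notation is an assumption introduced here for illustration: $X_i$ denotes the response of respondent $i$ and $\theta$ the single latent variable assumed by the model.

```latex
\begin{align}
P(X_1 = x_1, \dots, X_n = x_n \mid \theta)
  &= \prod_{i=1}^{n} P(X_i = x_i \mid \theta), \\
\operatorname{Cov}(X_i, X_j)
  &= \underbrace{\operatorname{E}\bigl[\operatorname{Cov}(X_i, X_j \mid \theta)\bigr]}_{=\,0}
   + \operatorname{Cov}\bigl(\operatorname{E}[X_i \mid \theta],\, \operatorname{E}[X_j \mid \theta]\bigr).
\end{align}
```

The first line states local stochastic independence; the second is the law of total covariance. The conditional covariance vanishes, but the remaining term is non-negative whenever both conditional expectations increase with $\theta$, so responses can correlate positively across levels of the latent variable without violating the assumption.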

2.3.4 Concluding Comments. Having introduced the measurement of SI and

In document An Investigation of Empirical Scoring Methods for Ability Measurement (Page 62-67)