
The Gaussian Partial Information Model

4.3.1

Motivating Examples

A central component of the partial information models is the structure of the information overlap that is assumed to hold among the individual forecasters. It therefore behooves us to begin with some simple examples to show that the optimal aggregate is not well defined without assumptions on the information structure among the forecasters.

Example 4.3.1. Consider a basket containing a fair coin and a two-headed coin. Two forecasters are asked to predict whether a coin chosen at random is in fact two-headed. Before making their predictions, the forecasters observe the result of a single flip of the chosen coin. Suppose the flip comes out HEADS. Based on this observation, the correct Bayesian probability estimate is 2/3. If both forecasters see the result of the same coin flip, the optimal aggregate is again 2/3. On the other hand, if they observe different (conditionally independent) flips of the same coin, both of which come out HEADS, the optimal aggregate is 4/5.
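The two aggregates in Example 4.3.1 follow from Bayes' rule. The short sketch below reproduces them with exact rational arithmetic; the function name `posterior_two_headed` is introduced here for illustration only and does not appear in the original text.

```python
from fractions import Fraction

def posterior_two_headed(num_heads):
    """P(coin is two-headed | num_heads conditionally independent flips, all HEADS)."""
    prior = Fraction(1, 2)                       # each coin is equally likely to be drawn
    like_two_headed = Fraction(1)                # a two-headed coin always lands HEADS
    like_fair = Fraction(1, 2) ** num_heads      # a fair coin lands HEADS with prob. 1/2
    numerator = prior * like_two_headed
    return numerator / (numerator + prior * like_fair)

# Each forecaster sees one HEADS; if they saw the same flip, nothing more is learned.
same_flip = posterior_two_headed(1)
# If they saw two different flips, the aggregator effectively has two HEADS.
different_flips = posterior_two_headed(2)

print(same_flip, different_flips)   # prints: 2/3 4/5
```

In both scenarios each forecaster reports 2/3, so the reports alone cannot tell the aggregator which of the two posteriors applies.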

In this example, it is not possible to distinguish between the two different information structures simply based on the given predictions, and neither 2/3 nor 4/5 can be said to be a better choice for the aggregate forecast. Therefore, we conclude that it is necessary to incorporate an assumption as to the structure of the information overlap, and that the details must be informed by the particular instance of the problem. The next example shows that even if the forecasters observe marginally independent events, further details in the structure of information can still greatly affect the optimal aggregate forecast.

Example 4.3.2. Let $\Omega = \{A, B, C, D\} \times \{0, 1\}$ be a probability space with eight points. Consider a measure $\mu$ that assigns probabilities $\mu(A,1) = a/4$, $\mu(A,0) = (1-a)/4$, $\mu(B,1) = b/4$, $\mu(B,0) = (1-b)/4$, and so forth. Define two events

\[
S_1 = \{(A,0),(A,1),(B,0),(B,1)\}, \qquad S_2 = \{(A,0),(A,1),(C,0),(C,1)\}.
\]

Therefore, $S_1$ is the event that the first coordinate is $A$ or $B$, and $S_2$ is the event that the first coordinate is $A$ or $C$. Consider two forecasters and suppose Forecaster $i$ observes $S_i$. Therefore the $i$th forecaster's information set is given by the $\sigma$-field $\mathcal{F}_i$ containing $S_i$ and its complement. Their $\sigma$-fields are independent. Now, let $G$ be the event that the second coordinate is 1. Forecaster 1 reports $p_1 = P(G \mid \mathcal{F}_1) = (a+b)/2$ if $S_1$ occurs; otherwise, $p_1 = (c+d)/2$. Forecaster 2, on the other hand, reports $p_2 = P(G \mid \mathcal{F}_2) = (a+c)/2$ if $S_2$ occurs; otherwise, $p_2 = (b+d)/2$. If $\varepsilon$ is added to $a$ and $d$ but subtracted from $b$ and $c$, the forecasts $p_1$ and $p_2$ do not change, and each of the four possible pairs of forecasts still has probability 1/4. Therefore all observables are invariant under this perturbation. If Forecasters 1 and 2 report $(a+b)/2$ and $(a+c)/2$, respectively, then the aggregator knows, by considering the intersection $S_1 \cap S_2$, that the first coordinate is $A$. Consequently, the optimal aggregate forecast is $a$, which is most definitely affected by the perturbation.
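The invariance claimed in Example 4.3.2 can be checked by direct enumeration. The sketch below uses illustrative values for $a, b, c, d$ and $\varepsilon$ that are assumptions of this example, not values from the text; the helper `forecast_table` is likewise hypothetical.

```python
from fractions import Fraction as F

def forecast_table(a, b, c, d):
    """For each value of the first coordinate, return (p1, p2, oracle forecast)."""
    p_given = {"A": a, "B": b, "C": c, "D": d}   # P(G | first coordinate)
    table = {}
    for letter, p in p_given.items():
        p1 = (a + b) / 2 if letter in ("A", "B") else (c + d) / 2   # S1 = {A, B}
        p2 = (a + c) / 2 if letter in ("A", "C") else (b + d) / 2   # S2 = {A, C}
        table[letter] = (p1, p2, p)
    return table

a, b, c, d = F(5, 10), F(3, 10), F(7, 10), F(2, 10)   # illustrative values only
eps = F(1, 10)

base = forecast_table(a, b, c, d)
perturbed = forecast_table(a + eps, b - eps, c - eps, d + eps)

for letter in "ABCD":
    # The first two entries (the reported forecasts) agree; the third (the oracle) does not.
    print(letter, base[letter], perturbed[letter])
```

With these values the reported pair $(p_1, p_2)$ is identical before and after the perturbation for every first coordinate, while the forecast of an aggregator who can identify the first coordinate shifts by $\pm\varepsilon$.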

This example shows that the aggregation problem can be affected by the fine structure of information overlap. It is, however, unlikely that the structure can ever be known with the precision postulated in this simple example. Therefore it is necessary to make reasonable assumptions that yield plausible yet generic information structures.

4.3.2

Gaussian Partial Information Model

The central component of the Gaussian model is a pool of information particles. Each particle, which can be interpreted as representing the smallest unit of information, is either positive or negative. The positive particles provide evidence in favor of the event $A$, while the negative particles provide evidence against $A$. Therefore, if the overall sum (integral) of the positive particles is larger than that of the negative particles, the event $A$ happens; otherwise, it does not. Each forecaster, however, observes only the sum of some subset of the particles. Based on this sum, the forecaster makes a probability estimate for $A$. This is made concrete in the following model that represents the pool of information with the unit interval and generates the information particles from a Gaussian process.

The Gaussian Model. Identify the pool of information with the unit interval $S = [0,1]$. Consider a centered Gaussian process $\{X_B\}$ that is defined on a probability space $(\Omega, \mathcal{F}, P)$ and indexed by the Borel subsets $B \subseteq S$ such that $\operatorname{Cov}(X_B, X_{B'}) = |B \cap B'|$. Such a process can be constructed, for example, by considering a standard Brownian motion process $Y(t)$ on $[0,1]$, and defining $X_B$ as the variation of $Y$ over $B$, that is, $X_B = \int_B dY(t)$. Let $A$ denote the event that the sum of all the information is positive: $A := \{X_S > 0\}$. For each $i = 1, \dots, N$, let $B_i$ be some Borel subset of $S$, and define the corresponding $\sigma$-field as $\mathcal{F}_i := \sigma(X_{B_i})$. Forecaster $i$ then predicts $p_i := E(1_A \mid \mathcal{F}_i)$.
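The Brownian-motion construction can be approximated on a grid to see the covariance rule $\operatorname{Cov}(X_B, X_{B'}) = |B \cap B'|$ in action. The following is a minimal sketch under stated assumptions: the grid size `K`, the number of draws, and the two interval-shaped sets `B1` and `B2` are all chosen here for illustration and are not part of the model definition.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100            # number of grid cells on [0, 1] (discretization is an assumption)
n_draws = 20_000   # Monte Carlo sample size

# Brownian increments: each cell of length 1/K contributes an independent N(0, 1/K).
increments = rng.normal(0.0, np.sqrt(1.0 / K), size=(n_draws, K))

midpoints = (np.arange(K) + 0.5) / K
B1 = midpoints < 0.6                          # B1 = [0, 0.6),   |B1| = 0.6
B2 = (midpoints >= 0.4) & (midpoints < 0.9)   # B2 = [0.4, 0.9), |B2| = 0.5

# X_B is the sum of the increments that fall inside B.
X_S = increments.sum(axis=1)
X_B1 = increments[:, B1].sum(axis=1)
X_B2 = increments[:, B2].sum(axis=1)

empirical = np.cov(np.vstack([X_S, X_B1, X_B2]))
expected = np.array([[1.0, 0.6, 0.5],
                     [0.6, 0.6, 0.2],    # |B1 ∩ B2| = |[0.4, 0.6)| = 0.2
                     [0.5, 0.2, 0.5]])

print(np.round(empirical, 2))
print(expected)
```

The empirical covariance matrix should agree with the lengths of the pairwise intersections up to Monte Carlo error.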

The Gaussian model can be motivated by recalling the interpreted signal model of Broomell and Budescu (2009). They assume that Forecaster $i$ forms an opinion based on

\[
L_i(Z_1, \dots, Z_r),
\]

where each $L_i$ is a linear function of observable quantities or cues $Z_1, \dots, Z_r$ that determine the outcome of $A$. If the observables (or any linear combination of them) are independent and have small tails, then as $r \to \infty$, the joint distribution of the linear combinations $L_1, \dots, L_N$ will be asymptotically Gaussian. Therefore, given that the number of cues in a real-world setup is likely to be large, it makes sense to model the forecasters' observations as jointly Gaussian. The remaining component, namely the covariance structure of the joint distribution, is then motivated by the partial information framework. Of course, other distributions, such as the $t$-distribution, could be considered. However, given that both the multivariate and conditional Gaussian distributions have simple forms, the Gaussian model offers potentially the cleanest entry into the issues at hand.

Overall, modeling the forecasters' predictions with a Gaussian distribution is rather common. For instance, Di Bacco et al. (2003) consider a model of two forecasters whose estimated log-odds follow a joint Gaussian distribution. The predictions are assumed to be based on different information sets; hence, the model can be viewed as a partial information model. Unfortunately, as a specialization of the partial information framework, this model is fairly narrow due to its detailed assumptions and extensive computations. The end result is a rather restricted aggregator of two probability forecasts. In contrast, the Gaussian model sustains flexibility by specializing the framework only as much as is necessary. The following enumeration provides further interpretation and clarifies which aspects of the model are essential and which have little or no impact.

(i) Interpretations. It is not necessary to assume anything about the source of the information. For instance, the information could stem from survey research, records, books, interviews, or personal recollections. All these details have been abstracted away.

(ii) Information Sets. The set $B_i$ holds the information used by Forecaster $i$, and the covariance $\operatorname{Cov}(X_{B_i}, X_{B_j}) = |B_i \cap B_j|$ represents the information overlap between Forecasters $i$ and $j$. Consequently, the complement of $B_i$ holds information not used by Forecaster $i$. No assumption is necessary as to whether this information was unknown to Forecaster $i$ or known but not used in the forecast.

(iii) Pool of Information. First, the pool of information potentially available to the forecasters is the white noise on $S = [0,1]$. The unit interval serves only as a convenient way to specify the sets $B_i$. The exact choice is not relevant, and any other set could have been used. The unit interval, however, is a natural starting point that links the information structure to many known results in combinatorics and geometry (see, e.g., Proposition 4.3.3). Second, there is no sense of time or ranking of information within the pool. Instead, the pool is a collection of information, where each piece of information has an a priori equal chance to contribute to the final outcome. Quantitatively, information is parametrized by the length measure on $S$.

(iv) Invariant Transformations. From the empirical point of view, the exact identities of the individual sets $B_i$ are irrelevant. All that matters are the covariances $\operatorname{Cov}(X_{B_i}, X_{B_j}) = |B_i \cap B_j|$. The explicit sets $B_i$ are only useful in the analysis, e.g., when computing the oracular aggregator.

(v) Scale Invariance. The model is invariant under rescaling, that is, under replacing $S$ by $[0, \lambda]$ and $B_i$ by $\lambda B_i$. Therefore, the actual scale of the model (e.g., the fact that the covariances of the variables $X_B$ are bounded by one) is not relevant.

(vi) Specific vs. General Model. A specific model requires a choice of an event $A$ and Borel sets $B_i$. This might be done in several ways: a) by choosing them in advance, according to some criterion; b) by estimating the parameters $P(A)$, $|B_i|$, and $|B_i \cap B_j|$ from data; or c) by using a Bayesian model with a prior distribution on the unknown parameters. This paper focuses mostly on a) and b) but discusses c) briefly in Section 5.5. Section 4.4 provides one result, namely Proposition 4.4.2, that holds for any (nonrandom) choices of the sets $B_i$.

(vii) Choice of Target Event. There is one substantive assumption in this model, namely the choice of the half-space for the event $A$. Choosing $\{X_S > t\}$ for some $t \in \mathbb{R}$ makes the prior probability equal to $P(A) = 1 - \Phi(t)$. The current paper defers the analysis of $t \neq 0$ to future work and focuses on the centered model for simplicity. Furthermore, choosing $t = 0$ implies a prior probability $P(A) = 1/2$, which seems as uninformative as possible and therefore provides a natural starting point. Note that specifying a prior distribution for $A$ cannot be avoided as long as the model depends on a probability space. This includes essentially any probability model for forecast aggregation.

Figure 4.1: Illustration of Information Distribution among $N$ Forecasters. The bars leveled horizontally with Forecaster $i$ represent the information set $B_i$.

Figure 4.2: Marginal Distribution of $p_i$ under Different Levels of $\delta_i$. The more the forecaster knows, the more the forecasts are concentrated around the extreme points zero and one.

4.3.3

Preliminary Observations

The Gaussian process exhibits additive behavior that aligns well with the intuition of an information pool. To see this, consider a finite partition of the full information,

\[
\Bigl\{ C_v := \bigcap_{i \in v} B_i \setminus \bigcup_{i \notin v} B_i \;:\; v \subseteq \{1, \dots, N\} \Bigr\}.
\]

Each subset $C_v$ represents a set of information particles such that $B_i = \bigcup_{v \ni i} C_v$ and $X_{B_i} = \sum_{v \ni i} X_{C_v}$. Therefore $X_B$ can be regarded as the sum of the particles in the subset $B \subseteq S$, and different $X_B$'s relate to each other through the particles they share. The relevant variables are summarized by a multivariate Gaussian distribution:

\[
\begin{pmatrix} X_S \\ X_{B_1} \\ \vdots \\ X_{B_N} \end{pmatrix}
\sim \mathcal{N}\left( 0,\;
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
=
\begin{pmatrix}
1 & \delta_1 & \delta_2 & \cdots & \delta_N \\
\delta_1 & \delta_1 & \rho_{1,2} & \cdots & \rho_{1,N} \\
\delta_2 & \rho_{2,1} & \delta_2 & \cdots & \rho_{2,N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\delta_N & \rho_{N,1} & \rho_{N,2} & \cdots & \delta_N
\end{pmatrix}
\right),
\tag{4.2}
\]

where $|B_i| = \delta_i$ is the amount of information used by Forecaster $i$, and $|B_i \cap B_j| = \rho_{ij} = \rho_{ji}$ is the amount of information overlap between Forecasters $i$ and $j$. One possible instance of this setup is illustrated in Figure 4.1. Note that $B_i$ does not have to be a contiguous subset of $S$. Instead, each forecaster can use any Borel measurable subset of the full information.

Under the Gaussian model, the sub-matrix $\Sigma_{22}$ is sufficient for the information structure. Therefore the exact identities of the Borel sets do not matter, and learning about the information among the forecasters is equivalent to estimating a covariance matrix under several restrictions. In particular, if the information in $\Sigma_{22}$ can be translated into a diagram such as Figure 4.1, the matrix $\Sigma_{22}$ is called coherent. This property is made precise in the following proposition. The proofs of this and other propositions are deferred to Appendix A of the Supplementary Material.

Proposition 4.3.3. The overlap structure $\Sigma_{22}$ is coherent if and only if $\Sigma_{22} \in \mathrm{COR}(N) := \operatorname{conv}\{ x x' : x \in \{0,1\}^N \}$, where $\operatorname{conv}\{\cdot\}$ denotes the convex hull and $\mathrm{COR}(N)$ is known as the correlation polytope. It is described by $2^N$ vertices in dimension $\dim(\mathrm{COR}(N)) = \binom{N+1}{2}$.
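One direction of Proposition 4.3.3 can be illustrated numerically: a convex combination of the vertex matrices $xx'$ is realized by laying the partition cells $C_v$ end to end on $[0,1]$, with the weight of each vertex playing the role of the cell length $|C_v|$. The sketch below is an illustration under that assumption, not the proof from the appendix; the random weights and the helper `overlap` are hypothetical.

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(1)

# One vertex x in {0,1}^N per subset v of forecasters; x[i] = 1 means i observes cell C_v.
vertices = np.array(list(itertools.product([0, 1], repeat=N)))
weights = rng.dirichlet(np.ones(len(vertices)))   # candidate cell lengths, summing to 1

# Convex combination of the vertex matrices x x'.
sigma22 = sum(w * np.outer(x, x) for w, x in zip(weights, vertices))

# Place the cells consecutively on [0, 1]; B_i is the union of the cells with x[i] = 1.
edges = np.concatenate([[0.0], np.cumsum(weights)])

def overlap(i, j):
    """Length of B_i ∩ B_j, computed from the cell boundaries."""
    return sum(edges[k + 1] - edges[k]
               for k, x in enumerate(vertices) if x[i] and x[j])

rebuilt = np.array([[overlap(i, j) for j in range(N)] for i in range(N)])

print(np.round(sigma22, 3))
print(np.allclose(sigma22, rebuilt))
```

The diagonal of `sigma22` holds the $\delta_i$ and the off-diagonal entries hold the $\rho_{ij}$, so the matrix corresponds to an explicit diagram of the kind shown in Figure 4.1.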

The correlation polytope has a very complex description in terms of half-spaces. In fact, complete descriptions of the facets of $\mathrm{COR}(N)$ are only known for $N \le 7$ and conjectured for $\mathrm{COR}(8)$ and $\mathrm{COR}(9)$ (Ziegler, 2000). Fortunately, previous literature has introduced both linear and semidefinite relaxations of $\mathrm{COR}(N)$ (Laurent et al., 1997). Such relaxations, together with modern optimization techniques and sufficient data, can be used to estimate the information structure very efficiently. This, however, is not in the scope of this paper and is therefore left for subsequent work.

The multivariate Gaussian distribution (4.2) relates to the forecasts by

\[
p_i = P(A \mid \mathcal{F}_i) = P(X_S > 0 \mid X_{B_i}) = \Phi\left( \frac{X_{B_i}}{\sqrt{1 - \delta_i}} \right). \tag{4.3}
\]
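Equation (4.3) follows from standard Gaussian conditioning applied to the first two coordinates of (4.2). A brief sketch of the steps, assuming $0 < \delta_i < 1$:

```latex
\begin{aligned}
E(X_S \mid X_{B_i})
  &= \frac{\operatorname{Cov}(X_S, X_{B_i})}{\operatorname{Var}(X_{B_i})}\, X_{B_i}
   = \frac{\delta_i}{\delta_i}\, X_{B_i} = X_{B_i}, \\
\operatorname{Var}(X_S \mid X_{B_i})
  &= \operatorname{Var}(X_S) - \frac{\operatorname{Cov}(X_S, X_{B_i})^2}{\operatorname{Var}(X_{B_i})}
   = 1 - \frac{\delta_i^2}{\delta_i} = 1 - \delta_i, \\
P(X_S > 0 \mid X_{B_i})
  &= 1 - \Phi\!\left( \frac{0 - X_{B_i}}{\sqrt{1 - \delta_i}} \right)
   = \Phi\!\left( \frac{X_{B_i}}{\sqrt{1 - \delta_i}} \right).
\end{aligned}
```

In words, the forecaster's best guess of the total $X_S$ is the observed sum $X_{B_i}$, and the remaining uncertainty is the amount of unseen information, $1 - \delta_i$.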

The marginal density of $p_i$,

\[
m(p_i \mid \delta_i) = \sqrt{\frac{1 - \delta_i}{\delta_i}} \exp\left\{ \Phi^{-1}(p_i)^2 \left( 1 - \frac{1}{2\delta_i} \right) \right\},
\]

has very intuitive behavior: it is uniform on $[0,1]$ if $\delta_i = 1/2$; otherwise it is symmetric about $p_i = 1/2$, with a minimum there when $\delta_i > 1/2$ (a U-shaped density) and a maximum there when $\delta_i < 1/2$ (a unimodal density). As $\delta_i \to 0$, $p_i$ converges to a point mass at 1/2. On the other hand, as $\delta_i \to 1$, $p_i$ converges to a correct forecast whose distribution has atoms of weight 1/2 at zero and one. Therefore a forecaster with no information "withdraws" from the problem by predicting a non-informative probability 1/2, while a forecaster with full information always predicts the correct outcome with absolute certainty. Figure 4.2 illustrates the marginal distribution when $\delta_i$ is equal to 0.3, 0.5, and 0.7.
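The shape of the marginal density can be checked by simulation. The sketch below draws $X_{B_i} \sim N(0, \delta_i)$, maps the draws through (4.3), and compares histograms across the three values of $\delta_i$ mentioned in the text. The sample size, the ten-bin histogram, and the function names are choices made for this illustration only.

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
cdf = np.vectorize(std_normal.cdf)          # Phi, applied elementwise
inv_cdf = np.vectorize(std_normal.inv_cdf)  # Phi^{-1}, applied elementwise

def marginal_density(p, delta):
    """Marginal density m(p | delta) of a single forecast."""
    z = inv_cdf(p)
    return np.sqrt((1 - delta) / delta) * np.exp(z**2 * (1 - 1 / (2 * delta)))

def simulate_forecasts(delta, n, rng):
    """Draw X_B ~ N(0, delta) and return the forecasts given by equation (4.3)."""
    x = rng.normal(0.0, np.sqrt(delta), size=n)
    return cdf(x / np.sqrt(1 - delta))

rng = np.random.default_rng(0)
bins = np.linspace(0.0, 1.0, 11)
results = {}
for delta in (0.3, 0.5, 0.7):
    p = simulate_forecasts(delta, 100_000, rng)
    hist, _ = np.histogram(p, bins=bins, density=True)
    results[delta] = hist
    print(delta, np.round(hist, 2))
```

For $\delta_i = 0.5$ the histogram should be approximately flat, for $\delta_i = 0.3$ it should peak in the middle bins, and for $\delta_i = 0.7$ the mass should pile up in the bins next to zero and one.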