Entropic prior and Maximum Entropy method

2.5 The prior

2.5.9 Entropic prior and Maximum Entropy method

Another approach searches the least informative model compatible with the data using a prior based on Boltzmann’s definition of entropySE2 _{(or equivalently, Shannon’s notion of infor-}

mation, seeShannon,1948),

P(s_|p) = exp(αSE), (2.31) and maximizing the resulting posterior distribution, beingαsome constant, andsthe so-called hidden image (or signal). This inference procedure is called the Maximum Entropy method (MEM) (Frieden,1972;Gull,1989;Gull & Daniell,1978;Hobson et al.,1998;Jaynes,1963,

1968;Maisinger et al.,1997;Skilling,1989). For a review seeNarayan & Nityananda(1986). From now on we will represent the underlying signal bys in the framework of MEM. The MEM can be considered as MAP estimation with an entropic prior.

The particular expression for the entropy depends on the statistical formulation of the non- informative prior. Let us think of a positive signal as a grid with q cells, with each cell i

having a certain intensity valuesi,i = 1, . . . , q, with an uncertainty on each value given by

±α−1_{. Then we define some discrete quanta}_n

i on each cell related to the intensity through

the uncertainty: ni=αsi. The signal can be guessed by distributing theniquantas in the grid.

In this way, the image is modeled in this way analogously to the energy configuration space of a thermo-dynamical system. If we further demand each cell to be iid, the number of ways this object can occur is given by the multiplicity

W = Nq!

n1!n2!. . . nq!

, (2.32)

The generalization to the multidimensional case leads to the following matrix form: Jij(θ) ≡ h∂logP(d|θ)

∂θi

∂logP(d|θ)

∂θj

i₍_d_|_θ₎(see appendixA.8).

1_{Here the autocorrelation matrix}_S _{is represented in k-space. We will discuss this in further detail in section} (3.3).

2.5 The prior

withNq being the total amount of quantas to be distributed in all cells (Nq = Pini). The

probability of any particular result is then given by the multinomial distribution

P(s′ _|p) =W q−Nq_. _(2.33) Sutton & Wandelt(2006) propose to sample from the multiplicity function directly to perform reconstructions in radioastronomy. By using Stirling’s formula for the factorials (n!_∼nne−n) we can write

logP(s′ _|p) =₋αX

s′_ilogs′_i+const. (2.34) Comparing this expression with eq. (2.31), we recover Shannon’s definition of entropy (S₊E =

is′ilogs′i)3. The expression that is commonly used for the entropy is a generalization of

Shannon’s formula by Skilling that can be derived based only on consistency arguments within probabilistic information theory for positive and additive distributions (PADs) (Skilling,1989). This generalization implies the definition of a Lebesgue measure (m) for the integral of some function of the hidden image to represent the entropy

S₊E(s′ _|p) =X

s′_i₋mi−si′log s′i/mi i, (2.35)

here in its discretized form. Skilling’s expression for the entropy can also be derived by con- sidering a team of monkeys throwing balls atqcells at random with Poissonian expectationµi:

P(n_|µ) =Q_iµni

i e−µi/ni!, whereni =αsi and µ=αmi (Skilling, 1989). For a review on

further expressions for the entropy seeMolina et al.(2001).

The global maximum of SE over sin the absence of further constraints is found to be

s′=m. Consequently, mcan also be thought of as a prior model for the image. However, this expression for the entropy will allow reconstructing positive signals only. Zaroubi et al.

(1995) propose to define s′=ρ and m=ρ₀, to avoid the possibility of having a negative distribution fors.

According toGull & Skilling(1990) the MEM can be extended to reconstruct distributions, which can be either positive or negative, as in the case of density fluctuations. Such distributions can be described as the difference between two subsidiary positive distributions (PADs)

s=u₋v, (2.36)

relative to a common modelm1

S_±E(u,v _|p) =X i h ui−2mi−uilog(ui/mi) i +X i h vi−2mi−vilog(vi/mi) i . (2.37)

One can see from eq. (2.36) that∂S_±E/∂u=₋∂S_±E/∂v, hence yielding

uv =m2. (2.38)

The “+” symbol inSE+denotes that the definition is only valid for positive signalss′.

1_{The “±” symbol in}_SE

2. BAYESIAN APPROACH TO SIGNAL RECONSTRUCTION

From the relations given by eqs. (2.36) and (2.38), it is easy to derive

u= 1

2(w+s), (2.39)

v = 1

2(w−s), (2.40)

withwi= (s2i + 4m2i)1/2. Using these expressions, the total entropy can be rewritten as

SE_±(s_|p) =X i h wi−2mi−silog (wi+si)/2mi i . (2.41)

The Maximum Entropy method gives a non-linear estimator of the underlying signal that one wants to reconstruct. This method is especially interesting to study deviations from Gaussianity (Hobson et al.,1998;Maisinger et al.,1997). It is equivalent to maximizeχ2_{with a Lagrangian}

multiplier, which includes a penalty function given by the entropy. Maximum Entropy in this context searches the hidden image that adds the least additional information to the data.

The quantity we need to maximize is given by

QE(s_|p) =αSE(s_|p) + logL₍_s_|_d_,_p₎_, _(2.42)

where thelogL_{is given by eq. (}_2.12_{) or eq. (}_A.29_{). The equation we want to solve is}

∇QE(s_|p) = 0. (2.43) In section (3.2), different iterative algorithms to solve this non-linear problem will be dis- cussed. The required expressions for the gradient of QE and its curvature for positive and positive/negative expressions of the entropy (eqs. 2.35and 2.41) and for both Gaussian and Poissonian likelihoods are presented in appendixA.10.

Note that in the limit of low density fluctuations, i.e. in the linear regime, the expression of the entropy reduces to the quadratic entropy (eventually with an offset of the origin ofs),

SE(s_|p)_{≃ −}P_is2_i/2mi. This expression is very similar to a Gaussian prior for the signal

with a variance given bym. In that case Maximum Entropy leads to the Wiener-filter.

In document Kitaura Joyanes, Francisco Shu (2007): Cosmic cartography: Bayesian reconstruction of the cosmological large-scale structure. Dissertation, LMU München: Fakultät für Physik (Page 48-50)