2.3.4 Maximum Entropy Method
The Maximum Entropy Method (MEM) is a technique that aims to extract the greatest amount of information from a measurement as justified by the signal-to-noise ratio of the data (Starck et al., 2002). The MEM retains all known information about a system, subject to the applied constraints, by determining the least biased image. In this way, unknown information is approximated in an unbiased manner.
In physics, entropy is defined as a measure of disorder in a system. Mathematical entropy requires a broader definition. Shannon founded the field of information theory in his ground-breaking publication ‘A mathematical theory of communication’ (1949). This gave rise to a new definition of entropy as a measure of the degree of uncertainty in a system. Shannon postulated that, by applying entropy to an information source, the minimum channel capacity required to reliably transmit the source as encoded binary digits can be determined.
Entropy is a measure that assigns a positive weight to all possible configurations that are not excluded by the given information or constraints (Shannon, 1949). This form of entropy is written as:
$$S(X) = -\sum_{x} \mathrm{Pr}(x) \log_2 \mathrm{Pr}(x) \tag{2.32}$$
where $\mathrm{Pr}(x)$ = probability that $X$ is in state $x$
$\mathrm{Pr}(x) \log_2 \mathrm{Pr}(x) = 0$ if $\mathrm{Pr}(x)$ is 0
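As a small illustration of Equation 2.32, the following sketch evaluates the Shannon entropy of a discrete distribution in bits. The function name `shannon_entropy_bits` and the example distributions are illustrative assumptions, not part of the original text; only the Python standard library is used.

```python
import math


def shannon_entropy_bits(probabilities):
    """Return S(X) = -sum_x Pr(x) * log2(Pr(x)), measured in bits."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Convention from Equation 2.32: a state with Pr(x) = 0 contributes 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical example distributions (not taken from the source).
fair_coin = shannon_entropy_bits([0.5, 0.5])      # two equally likely states
four_states = shannon_entropy_bits([0.25] * 4)    # four equally likely states
certain = shannon_entropy_bits([1.0, 0.0])        # outcome known in advance
```

A uniform distribution gives the largest entropy for a given number of states, while a distribution with a single certain outcome has zero entropy, which matches the reading of entropy as a degree of uncertainty.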
Jaynes (1957) proposed that this form of entropy could be used for radio interferometric image deconvolution and showed that the only unbiased configuration is the solution that has maximum entropy. It has further been shown that maximum entropy is the only consistent method of selecting a solution which does not introduce correlations in the image beyond those which are required by the original data (Johnson and Shore, 1980, 1983; Livesey and Skilling, 1985).
For the application of image restoration, a statistical model for the imaging process must be developed to allow the definition of an entropy measure. This requires a discrete representation of the object in terms of pixels. The object is divided into $N$ pixels, each with area $\Delta A$ and containing a particular radiance that can be considered as a random emission of photons with energy $e$. If $r_i$ is the average rate of emission of photons from the $i$th pixel, then the average radiance of the $i$th pixel is described by:
$$f_i = \frac{r_i e}{\Delta A} \tag{2.33}$$
The probability that a photon was emitted from the $i$th pixel, given that it was emitted from the object, is:
$$\mathrm{Pr}_i = \frac{r_i}{\sum_j r_j} = \frac{f_i}{\sum_j f_j} = \frac{f_i}{F} \tag{2.34}$$
where $F$ = total intensity
The entropy of the discrete probability distribution is defined as:
$$S = -\sum_{i=1}^{N} \mathrm{Pr}_i \log \mathrm{Pr}_i = -\sum_{i=1}^{N} \frac{f_i}{F} \log \frac{f_i}{F} \tag{2.35}$$
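The pixel-based entropy of Equation 2.35 can be sketched as follows. This is a minimal illustration, assuming the image is a small nested list of non-negative pixel values and that the unspecified logarithm is the natural logarithm; the function name `image_entropy` and the two test images are hypothetical.

```python
import math


def image_entropy(pixel_values):
    """Return S = -sum_i (f_i / F) * log(f_i / F) for a non-negative image."""
    flat = [value for row in pixel_values for value in row]
    if any(value < 0 for value in flat):
        raise ValueError("pixel values must be non-negative")
    total = sum(flat)  # F, the total intensity
    if total <= 0:
        raise ValueError("image must contain some positive intensity")
    entropy = 0.0
    for f in flat:
        if f > 0:  # empty pixels contribute nothing
            p = f / total  # Pr_i = f_i / F
            entropy -= p * math.log(p)  # natural log is an assumption here
    return entropy


# Hypothetical 2x2 images with the same total intensity F = 4.
flat_image = [[1.0, 1.0], [1.0, 1.0]]      # photons equally likely from any pixel
point_source = [[4.0, 0.0], [0.0, 0.0]]    # all photons come from one pixel

flat_entropy = image_entropy(flat_image)
point_entropy = image_entropy(point_source)
```

The featureless image has the largest entropy (here $\ln 4$), while the point source has zero entropy, since there is no uncertainty about which pixel emitted a given photon.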
This model describes the uncertainty as to which pixel emitted a given photon. Described more generally, this form of entropy has been proposed alongside other definitions in the image domain, each having unique attributes and advantages under different circumstances. These include:
Burg (1975):
$$S_b(O(x, y)) = -\sum_{\text{pixels}} \ln O(x, y) \tag{2.36}$$
Frieden (1975):
$$S_f(O(x, y)) = -\sum_{\text{pixels}} O(x, y) \ln O(x, y) \tag{2.37}$$
Gull and Skilling (1991):
$$S_g(O(x, y)) = \sum_{\text{pixels}} \left[ O(x, y) - m - O(x, y) \ln \frac{O(x, y)}{m} \right] \tag{2.38}$$
where $m$ = background model
The major advantage of Gull and Skilling’s definition is that entropy has a maximum of zero when O equals the background model m. This is the form of entropy that has achieved the most success in image deconvolution and continues to be developed in new applications.
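The three image-domain definitions can be compared numerically with a short sketch. It assumes images are flattened to lists of strictly positive pixel values, and it writes the Gull and Skilling entropy with the sign convention under which it is non-positive and reaches zero at $O = m$, as stated above. All function names and example values are illustrative assumptions.

```python
import math


def burg_entropy(image):
    """S_b = -sum ln(O) over pixels (pixels must be strictly positive)."""
    return -sum(math.log(o) for o in image)


def frieden_entropy(image):
    """S_f = -sum O * ln(O) over pixels (pixels must be strictly positive)."""
    return -sum(o * math.log(o) for o in image)


def gull_skilling_entropy(image, model):
    """S_g = sum [O - m - O * ln(O / m)]; non-positive, zero when O == m."""
    return sum(o - m - o * math.log(o / m) for o, m in zip(image, model))


# Hypothetical flat background model and a perturbed image.
model = [2.0, 2.0, 2.0]
perturbed = [1.0, 2.0, 3.0]

entropy_at_model = gull_skilling_entropy(model, model)
entropy_perturbed = gull_skilling_entropy(perturbed, model)
```

Each pixel term $O - m - O \ln(O/m)$ has derivative $-\ln(O/m)$ with respect to $O$, which vanishes only at $O = m$, so any departure from the background model lowers the entropy below zero.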
Returning to Bayes’ theorem (Equation 2.20) allows the evaluation of the probability of finding the original image $O$ given the data $I$ under a maximum entropy framework. $\mathrm{Pr}(I|O)$ is the conditional probability of finding the data $I$ given the original image $O$, which essentially represents the distribution of the noise. For uncorrelated Gaussian noise with variance $\sigma_I^2$, it is given by:
$$\mathrm{Pr}(I|O) = \exp\left( -\sum_{\text{pixels}} \frac{(I - P \ast O)^2}{2\sigma_I^2} \right) \tag{2.39}$$
Without any knowledge of $O$ other than it being positive, applying the maximum entropy principle leads to:
$$\mathrm{Pr}(O) = \exp(\alpha S(O)) \tag{2.40}$$
where $\alpha$ = Lagrange multiplier
$S(O)$ = entropy of image $O$
Again, $\mathrm{Pr}(I)$ is independent of $O$ and can thus be considered a constant. After substitution and taking logarithms, up to an additive constant:
$$\ln \mathrm{Pr}(O|I) = \alpha S(O) - \sum_{\text{pixels}} \frac{(I - P \ast O)^2}{2\sigma_I^2} \tag{2.41}$$
This consists of the entropy of the image and a quantity corresponding to $\chi^2$, which can be used to measure the statistical distance between the data and the model prediction. The solution can be found by minimising:
$$J(O) = \sum_{\text{pixels}} \frac{(I - P \ast O)^2}{2\sigma_I^2} - \alpha S(O) = \frac{\chi^2}{2} - \alpha S(O) \tag{2.42}$$
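To make the objective of Equation 2.42 concrete, the sketch below evaluates $J(O)$ for a one-dimensional signal. It is an illustration under several assumptions that are not in the original text: the image is one-dimensional, the convolution $P \ast O$ is circular and computed directly rather than by FFT, the noise level is a single scalar `sigma`, and $S(O)$ is taken to be the Gull and Skilling entropy relative to a flat model. All names and numerical values are hypothetical.

```python
import math


def circular_convolve(psf, image):
    """Direct 1-D circular convolution, standing in for P * O."""
    n = len(image)
    return [
        sum(psf[k] * image[(i - k) % n] for k in range(len(psf)))
        for i in range(n)
    ]


def gull_skilling_entropy(image, model):
    """S = sum [O - m - O * ln(O / m)]; zero when the image equals the model."""
    return sum(o - m - o * math.log(o / m) for o, m in zip(image, model))


def mem_objective(data, psf, image, sigma, alpha, model):
    """J(O) = chi^2 / 2 - alpha * S(O) for a candidate image O."""
    predicted = circular_convolve(psf, image)
    chi2 = sum(((d - p) / sigma) ** 2 for d, p in zip(data, predicted))
    return 0.5 * chi2 - alpha * gull_skilling_entropy(image, model)


# Hypothetical noise-free test case: blur a known signal, then score candidates.
psf = [0.6, 0.2, 0.2]                     # assumed PSF, normalised to sum to 1
truth = [1.0, 4.0, 1.0, 1.0]              # assumed true object
data = circular_convolve(psf, truth)      # simulated observation I
model = [sum(truth) / len(truth)] * len(truth)  # flat background model m

j_truth = mem_objective(data, psf, truth, sigma=0.1, alpha=1.0, model=model)
j_flat = mem_objective(data, psf, model, sigma=0.1, alpha=1.0, model=model)
```

In this example the true object fits the data exactly, so its cost comes only from the entropy term, whereas the flat model has zero entropy penalty but a large $\chi^2$. The multiplier `alpha` sets the balance between the two terms; a practical deconvolution would minimise $J(O)$ iteratively rather than compare two candidates.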
Skilling and Bryan (1984) developed an operational maximum entropy deconvolution algorithm that performed well but was limited by the computational capabilities of its time. As computer hardware progressed, such methods were superseded by more accurate and more computationally demanding ones. The Pyramid Maximum Entropy Method introduced the concept of multiresolution image analysis into maximum entropy deconvolution (Bontekoe et al., 1994). This was a breakthrough for the MEM, as the multiresolution interpretation allowed significant features to be resolved at different image resolutions and then recombined to produce the final image. However, the Pyramid Maximum Entropy Method suffered from some major drawbacks, such as the multiresolution image reconstruction, the need to determine a default background model, and the reliance on user-defined reconstruction parameter estimation. The Multiscale Entropy method resolved many of these issues and showed that multiresolution image analysis was indeed beneficial, but that the appropriate mathematical tool for implementing it was the wavelet transform (Starck, 1996).
Chapter 4 investigates the Multiscale Entropy deconvolution of MODIS Aqua ocean colour imagery and shows that instrumental PSF effects can significantly impact the quality of recorded satellite data. Wavelet transforms and optimal step size estimation are combined with customised techniques including multi-detector FFT convolution and detector-saturated radiometric correction to produce an accurate and robust MODIS deconvolution implementation.