We consider a linear unmixing model for the observed spectra, $\mathbf{z}_n \in \mathbb{R}^{D \times 1}$, $n = 1, \dots, N_z$, of a hyperspectral image with $N_z$ pixels, $D$ bands, and additive noise, $\mathbf{e}_n \in \mathbb{R}^{D \times 1}$,
$$\mathbf{z}_n = \mathbf{F}\mathbf{s}_n + \mathbf{e}_n, \qquad (4.1)$$
where $\mathbf{F} \in \mathbb{R}_+^{D \times K}$ contains the endmembers and $\mathbf{s}_n \in [0, 1]^{K \times 1}$ the corresponding abundances. The number of endmembers is denoted by $K$. Comparing Eq. (4.1) with the BFL model in Section 2.3.3 reveals the similarity between linear unmixing and the linear latent feature model. Thus, the endmembers can be considered as positive-valued features and the abundances as constrained feature coefficients. In particular, the abundances are required to fulfill the additivity constraint, i.e., $\sum_{k=1}^{K} s_{n,k} = 1$, and the positivity constraint, i.e., $s_{n,k} \geq 0$, with $n = 1, \dots, N_z$ and $k = 1, \dots, K$, as they represent the fractions with which the endmembers occur in each pixel. As in BFL, the noise, $\mathbf{e}_n$, is assumed to be i.i.d. Gaussian distributed, i.e., $p(\mathbf{e} \mid \sigma_z) = \mathcal{N}_{\mathbf{e}}(\mathbf{0}, \sigma_z^2 \mathbf{I})$ with variance $\sigma_z^2$. Although this model does not capture correlated noise, it has been widely used in unmixing models, e.g., in [135, 157].
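The mixing model of Eq. (4.1) can be illustrated with a minimal sketch in plain Python. The function name `mix_pixel` and all numerical values are hypothetical and chosen only for illustration; the check on the abundances mirrors the additivity and positivity constraints stated above.

```python
import random

def mix_pixel(F, s, sigma_z, rng):
    """Return z = F s + e with e ~ N(0, sigma_z^2 I).

    F is a D x K endmember matrix stored as a list of D rows,
    s is a length-K abundance vector.
    """
    if any(s_k < 0.0 for s_k in s) or abs(sum(s) - 1.0) > 1e-9:
        raise ValueError("abundances must be nonnegative and sum to one")
    return [
        sum(f_dk * s_k for f_dk, s_k in zip(row, s)) + rng.gauss(0.0, sigma_z)
        for row in F
    ]

# Hypothetical toy values: D = 3 bands, K = 2 endmembers.
F = [[0.2, 0.9],
     [0.4, 0.7],
     [0.8, 0.1]]
s = [0.25, 0.75]
rng = random.Random(0)
z_clean = mix_pixel(F, s, sigma_z=0.0, rng=rng)   # noise-free mixture F s
z_noisy = mix_pixel(F, s, sigma_z=0.01, rng=rng)  # one noisy observation
```

With `sigma_z=0.0` the function returns the exact convex combination of the two endmember spectra, which makes the role of the abundances as mixing fractions explicit.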
As explained, HSU can be considered as a feature learning problem. Hence, we can model the unmixing task by means of the BNFL framework, yielding the BNU algorithm. Thus, we assume that the endmember matrix, $\mathbf{F}$, is composed of a binary activation matrix $\mathbf{A} \in \{0, 1\}^{D \times K}$ and a weighting matrix $\mathbf{W} \in \mathbb{R}_+^{D \times K}$ as suggested in [75] and detailed in Section 2.3.3, i.e.,
$$\mathbf{F} = \mathbf{A} \odot \mathbf{W}, \qquad (4.2)$$
where $\odot$ represents the element-wise matrix multiplication. Following a nonparametric approach, we model $\mathbf{A}$ as an IBP. Note that samples drawn from an IBP can be dense, even though the IBP models a sparse matrix. This is explained in Section 2.3.3.1.
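As a small illustration of the factorization in Eq. (4.2), the following sketch forms the endmember matrix as the element-wise product of a binary activation matrix and a positive weight matrix. The helper name `hadamard` and the toy matrices are hypothetical.

```python
def hadamard(A, W):
    """Element-wise product F = A ⊙ W of two equally sized D x K matrices."""
    if len(A) != len(W) or any(len(a) != len(w) for a, w in zip(A, W)):
        raise ValueError("A and W must have the same shape")
    return [[a * w for a, w in zip(a_row, w_row)] for a_row, w_row in zip(A, W)]

# Hypothetical toy values: D = 3 bands, K = 2 endmembers.
A = [[1, 0],
     [1, 1],
     [0, 1]]          # binary activations
W = [[0.5, 0.3],
     [0.2, 0.8],
     [0.6, 0.4]]      # positive weights
F = hadamard(A, W)    # entries with A[d][k] == 0 are switched off
```

An entry of the endmember matrix is nonzero only where the corresponding activation is one, so the sparsity pattern is carried entirely by the activation matrix and the magnitudes by the weights.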
In the following, we detail the components of the proposed hierarchical Bayesian nonparametric model for spectral unmixing.
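4.4 Bayesian Nonparametric Unmixing Model 59
[Figure 4.1 graphic; node labels: $\alpha_a$, $\beta_a$, $\mathbf{A}$, $\mathbf{W}$, $\mathbf{z}$, $\mathbf{s}$, $N$, $\sigma_z^2$, $\alpha_\sigma$, $\beta_\sigma$]
Figure 4.1. Graphical model of the hierarchical Bayesian Nonparametric Unmixing (BNU) model. Only the spectra $\mathbf{z}_n$, $n = 1, \dots, N_z$, are observed; the other variables are latent and need to be inferred.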
4.4.1 Observation Likelihood
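We assume that the observed data is conditionally independently distributed and corrupted by additive Gaussian noise. Hence, the likelihood is given as
$$p(\mathbf{Z} \mid \mathbf{W}, \mathbf{A}, \mathbf{S}, \sigma_z^2) = \prod_{n=1}^{N_z} \mathcal{N}_{\mathbf{z}_n}\big((\mathbf{A} \odot \mathbf{W})\,\mathbf{s}_n, \sigma_z^2 \mathbf{I}\big), \qquad (4.3)$$
with $\mathbf{Z} = [\mathbf{z}_1 \dots \mathbf{z}_{N_z}]$ and $\mathbf{S} = [\mathbf{s}_1 \dots \mathbf{s}_{N_z}]$.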
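The likelihood of Eq. (4.3) is usually evaluated in the log domain. The following is a minimal sketch, assuming that the pixel spectra and abundance vectors are stored as lists of vectors and the two D x K matrices as lists of rows; the function name `log_likelihood` is hypothetical.

```python
import math

def log_likelihood(Z, A, W, S, sigma2_z):
    """Log of Eq. (4.3): sum over pixels of log N(z_n; (A ⊙ W) s_n, sigma2_z I).

    Z and S are lists of pixel vectors (length D and K, respectively);
    A and W are D x K matrices stored as lists of D rows.
    """
    D = len(A)
    # Endmember matrix F = A ⊙ W.
    F = [[a * w for a, w in zip(a_row, w_row)] for a_row, w_row in zip(A, W)]
    sq_err = 0.0
    for z, s in zip(Z, S):
        mean = [sum(f * s_k for f, s_k in zip(row, s)) for row in F]
        sq_err += sum((z_d - m_d) ** 2 for z_d, m_d in zip(z, mean))
    n_obs = len(Z) * D  # total number of scalar observations
    return -0.5 * n_obs * math.log(2.0 * math.pi * sigma2_z) - sq_err / (2.0 * sigma2_z)
```

Because the noise is isotropic and the pixels are conditionally independent, the log-likelihood depends on the data only through the total squared reconstruction error.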
In practice, the pixels of the image may also be affected by varying lighting conditions, which can lead to scattering effects that are not captured by a Gaussian noise model. Deriving a suitable model for these effects is challenging, and inference is likely to be intractable. To analyze the effect of varying lighting conditions, we simulate multiplicative noise on the abundances in Section 4.6.
4.4.2 Prior for the Noise Variance
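The variable $\sigma_z^2$ denotes the variance of the Gaussian noise. Hence, a suitable prior for $\sigma_z^2$ is the Inverse-Gamma distribution, which is conjugate to a Gaussian likelihood with unknown variance, with parameters $\alpha_\sigma$ and $\beta_\sigma$,
$$p(\sigma_z^2 \mid \alpha_\sigma, \beta_\sigma) = \mathrm{IGa}_{\sigma_z^2}(\alpha_\sigma, \beta_\sigma).$$
Further, we assume that $\alpha_\sigma$ and $\beta_\sigma$ follow Gamma distributions with $p(\alpha_\sigma) = \mathrm{Ga}_{\alpha_\sigma}\big(h^{(1)}_{\alpha_\sigma}, h^{(2)}_{\alpha_\sigma}\big)$ and $p(\beta_\sigma) = \mathrm{Ga}_{\beta_\sigma}\big(h^{(1)}_{\beta_\sigma}, h^{(2)}_{\beta_\sigma}\big)$, respectively, where $h^{(1)}_{\alpha_\sigma}$, $h^{(2)}_{\alpha_\sigma}$, $h^{(1)}_{\beta_\sigma}$ and $h^{(2)}_{\beta_\sigma}$ are the corresponding hyperparameters.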
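The hierarchical prior on the noise variance in Section 4.4.2 can be sketched by ancestral sampling. The parameterizations are assumptions, since the text does not state them: the Gamma hyperpriors are taken as (shape, rate) pairs and the Inverse-Gamma as (shape, scale). The function names and hyperparameter values are hypothetical.

```python
import random

def sample_inverse_gamma(alpha, beta, rng):
    """Draw x ~ IGa(alpha, beta) (shape alpha, scale beta) as the reciprocal of a Gamma draw."""
    # random.gammavariate takes (shape, scale); a Gamma with rate beta has scale 1 / beta.
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

def sample_noise_variance(h_alpha, h_beta, rng):
    """Ancestral sample: alpha_sigma, beta_sigma ~ Gamma, then sigma_z^2 ~ IGa.

    h_alpha and h_beta are assumed (shape, rate) pairs of the two Gamma hyperpriors.
    """
    alpha_sigma = rng.gammavariate(h_alpha[0], 1.0 / h_alpha[1])
    beta_sigma = rng.gammavariate(h_beta[0], 1.0 / h_beta[1])
    return sample_inverse_gamma(alpha_sigma, beta_sigma, rng)

rng = random.Random(0)
# Hypothetical hyperparameter values, chosen only for illustration.
sigma2_z = sample_noise_variance((5.0, 1.0), (5.0, 1.0), rng)
```

This draws from the prior only; it is not the posterior sampling step of the inference scheme.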
60 Chapter 4: Hyperspectral Unmixing via Bayesian Nonparametric Feature Learning
4.4.3 Prior for the Abundances
Recalling that the prior on $\mathbf{S}$ needs to fulfill the additivity and positivity constraints, we use independent Dirichlet distributions to model the columns of $\mathbf{S}$. Thus, the abundances of the pixels are assumed to be i.i.d., yielding
$$p(\mathbf{S}) = \prod_{n=1}^{N_z} p(\mathbf{s}_n) = \prod_{n=1}^{N_z} \mathrm{Dir}_{s_{n,1}, \dots, s_{n,K}}(\alpha_{s,1}, \dots, \alpha_{s,K}).$$
We set the hyperparameters $\alpha_{s,k} = 1$ for $k = 1, \dots, K$, making the prior uniform under the additivity and positivity constraints. The uniform distribution has the advantage that no preferences on the different endmembers are imposed. Moreover, this assumption allows for efficient sampling as explained in Section 4.5.
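To make the abundance prior concrete, the following sketch draws abundance vectors from a Dirichlet distribution by normalizing independent Gamma variables, a standard construction. With all concentration parameters equal to one, the draws are uniform on the simplex. The function name `sample_dirichlet` and the sizes are hypothetical, and the sketch samples from the prior only, not from the conditional used in Section 4.5.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw s ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) variables."""
    g = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(g)
    return [g_k / total for g_k in g]

rng = random.Random(0)
K, N_z = 3, 5  # hypothetical sizes
# One abundance vector per pixel, i.e., the columns s_n of S.
S = [sample_dirichlet([1.0] * K, rng) for _ in range(N_z)]
```

Every draw is nonnegative and sums to one by construction, so both constraints on the abundances hold without any additional projection step.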
4.4.4 Prior for the Endmember Weights and Activations
Following [134], we choose the distance prior for the endmember weights, $\mathbf{W}$, with hyperparameter $\gamma_w$. This prior can be interpreted as a probabilistic version of the volume regularization proposed in [139], which is based on the Euclidean distance. Further, we use the Heaviside step function to express the positivity constraint on the endmembers, yielding the prior of $\mathbf{W}$ [134],
$$p(\mathbf{W}) \propto \exp\Big\{-\gamma_w \sum_{k=1}^{K} \Big\| \mathbf{w}_k - \frac{1}{K} \sum_{k'=1}^{K} \mathbf{w}_{k'} \Big\|_2^2 \Big\}\, H(\mathbf{W}), \qquad (4.4)$$
where $\mathbf{w}_k$ is the $k$th column of $\mathbf{W} \in \mathbb{R}_+^{D \times K}$.
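The unnormalized log-density of the distance prior in Eq. (4.4) can be sketched as follows. Two assumptions are made here: each $\mathbf{w}_k$ is treated as one of the $K$ length-$D$ weight vectors of the D x K matrix (consistent with the index $k = 1, \dots, K$), and the Heaviside factor is taken to act element-wise with $H(0) = 1$. The function name `log_distance_prior` is hypothetical.

```python
import math

def log_distance_prior(W, gamma_w):
    """Unnormalized log of Eq. (4.4); W is a D x K matrix stored as a list of D rows."""
    # Heaviside factor H(W): zero prior mass if any weight is negative
    # (assumed to act element-wise, with H(0) = 1).
    if any(w < 0.0 for row in W for w in row):
        return -math.inf
    K = len(W[0])
    total = 0.0
    for row in W:                  # accumulate the squared distances band by band
        mean_d = sum(row) / K      # d-th entry of the mean weight vector
        total += sum((w - mean_d) ** 2 for w in row)
    return -gamma_w * total
```

The value is largest when all weight vectors coincide with their mean and decreases as they spread out, which matches the interpretation as a volume-type regularizer scaled by $\gamma_w$.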
A drawback of this prior is that the hyperparameter $\gamma_w$ cannot be inferred from the observations. The normalization of the prior $p(\mathbf{W})$ is unknown, but it is required to derive the conditional $p(\gamma_w)$ needed for sampling. Thus, $\gamma_w$ needs to be set a priori. Despite this drawback, we choose this prior as it has shown good performance in the experiments.
As explained, the feature activation matrix, $\mathbf{A}$, is modeled as an IBP, as described in Section 2.3.3.1.