
4.2 Inference by MAP Estimation via the EM Algorithm

4.2.4 The IRLS Interpretation of the Algorithm

Here we derive the FOCUSS algorithm and the algorithm proposed by Figueiredo in the light of the well-known Iteratively Reweighted Least Squares (IRLS) approach used in statistics to fit generalised linear models (see Nelder's original paper [101]). As Dempster already noted in [24], the EM algorithm often leads to IRLS solutions, a fact also mentioned in the thorough discussion paper by Green [46].

Equation (4.4) can be written as the weighted least squares problem

$$\hat{s} = \arg\min_{s} \; \frac{1}{2}\lVert x - As \rVert^2 + \lambda\, s^T W s, \qquad (4.6)$$

where we use $\lambda = \sigma_\epsilon^2 \lambda_c$, with $\lambda_c$ being the scale parameter of the prior.
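For a fixed weighting matrix, (4.6) is an ordinary regularised least squares problem with a closed-form solution. The minimal sketch below assumes NumPy, a diagonal $W = \mathrm{diag}(w)$ and made-up random data. It solves the normal equations and checks that the gradient vanishes at the result. `weighted_ls` is a hypothetical helper name and does not come from the source.

```python
import numpy as np

def weighted_ls(A, x, w, lam):
    """Minimise 0.5*||x - A s||^2 + lam * s^T diag(w) s for a fixed weight vector w.

    Setting the gradient -A^T (x - A s) + 2*lam*diag(w) s to zero gives the
    normal equations (A^T A + 2*lam*diag(w)) s = A^T x.
    """
    lhs = A.T @ A + 2.0 * lam * np.diag(w)
    return np.linalg.solve(lhs, A.T @ x)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))   # hypothetical 5 x 8 dictionary
x = rng.standard_normal(5)        # hypothetical observation
w = rng.uniform(0.5, 2.0, 8)      # hypothetical positive weights (diagonal of W)
lam = 0.1

s_hat = weighted_ls(A, x, w, lam)
grad = -A.T @ (x - A @ s_hat) + 2.0 * lam * w * s_hat
print(np.max(np.abs(grad)))       # close to zero at the minimiser
```

In an IRLS scheme this solve is repeated, with `w` recomputed from the current estimate before each step.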

For equation (4.6) to have the same fixed points as equation (4.4), the weights have to be chosen so that both objective functions have the same gradient at the fixed points. This is exactly the approach taken in the derivation of the FOCUSS algorithm, which can therefore be seen as an IRLS algorithm.

Again, the weighting matrix $W(s)$ is a function of $s$ and has to be re-evaluated in each iteration. The particular form of $W(s)$ has to be chosen such that the two optimisation problems have the same fixed points. This is done by setting $s^T W(s)\, s = f(s)$, which leads to exactly the same solution as the FOCUSS algorithm.
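The relation $s^T W(s)\, s = f(s)$ can be checked numerically for the $L_p$ weights $W(s) = \mathrm{diag}(|s_n|^{p-2})$ used by FOCUSS, taking $f(s) = \sum_n |s_n|^p$. The sketch below assumes NumPy and arbitrary non-zero coefficients, and `lp_weights` is a hypothetical helper. It also exposes a subtlety: $2W(s)s$ equals the gradient of $f$ only up to the constant factor $2/p$, which can be absorbed into the regularisation parameter.

```python
import numpy as np

def lp_weights(s, p):
    """Diagonal of W(s) = diag(|s_1|^(p-2), ..., |s_N|^(p-2)); requires s_n != 0."""
    return np.abs(s) ** (p - 2.0)

s = np.array([0.7, -1.3, 2.0, -0.2])   # hypothetical non-zero coefficients
p = 0.8

w = lp_weights(s, p)
quad = s @ (w * s)                      # s^T W(s) s
f = np.sum(np.abs(s) ** p)              # sum_n |s_n|^p
print(quad, f)                          # the two values agree

# Gradient comparison: 2 W(s) s versus the derivative of sum |s_n|^p.
grad_f = p * np.abs(s) ** (p - 1.0) * np.sign(s)
print(2.0 * w * s / grad_f)             # every entry equals 2/p
```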

The FOCUSS and IRLS algorithms that use the $L_p$ (quasi-)norm or, equivalently, a generalised Gaussian prior, compute the weighting matrix iteratively as $W(s) = \mathrm{diag}(|s_1|^{p-2}, \ldots, |s_N|^{p-2})$. Intuitively, one would expect that in the limit $p \to 0$ the algorithm would use an $L_0$ norm. This, however, is not true. To determine the exact form of the regularisation term, and therefore the marginalised prior used when $p$ becomes zero, the general regularised minimisation problem has to be considered:

$$\hat{s} = \arg\min_{s} \; \frac{1}{2\sigma_\epsilon^2}(x - As)^T(x - As) + \lambda_c f(s),$$

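the gradient of which is $-\frac{1}{\sigma_\epsilon^2}(x - As)^T A + \lambda_c \frac{\partial}{\partial s} f(s)$. The weighted least squares equivalent is

$$\hat{s} = \arg\min_{s} \; \frac{1}{2\sigma_\epsilon^2}(x - As)^T(x - As) + s^T W s,$$

with a gradient of $-\frac{1}{\sigma_\epsilon^2}(x - As)^T A + 2 s^T W$.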
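The sketch below iterates the weighted least squares step with the $L_p$ weights. It is written in a form that only uses $W^{-1} = \mathrm{diag}(|s_n|^{2-p})$, similar in spirit to regularised FOCUSS-type updates, so coefficients that reach zero cause no division by zero. It assumes NumPy, and `irls_focuss`, the fixed iteration count and the test data are hypothetical choices. No convergence check is made. With $W = \mathrm{diag}(|s_n|^{p-2})$ the fixed points match the penalty $\lambda \frac{2}{p}\sum_n |s_n|^p$ for $p \neq 0$. For $p = 1$ and $A = I$ the iteration should therefore reproduce soft thresholding at $2\lambda$.

```python
import numpy as np

def irls_focuss(A, x, lam, p, s0, n_iter=200):
    """IRLS iteration for 0.5*||x - A s||^2 + lam * s^T W s with W(s) = diag(|s_n|^(p-2)).

    Each step solves the weighted least squares problem for the current weights.
    Writing D = W^{-1} = diag(|s_n|^(2-p)), the solution of
    (A^T A + 2 lam W) s = A^T x can be computed as
    s = D A^T (A D A^T + 2 lam I)^{-1} x, which stays finite when s_n = 0.
    """
    s = np.asarray(s0, dtype=float)
    M = A.shape[0]
    for _ in range(n_iter):
        d = np.abs(s) ** (2.0 - p)                    # diagonal of D = W^{-1}
        G = (A * d) @ A.T + 2.0 * lam * np.eye(M)     # A D A^T + 2 lam I
        s = d * (A.T @ np.linalg.solve(G, x))         # D A^T G^{-1} x
    return s

# Hypothetical check: A = I, p = 1, 2*lam = 1, so each entry is soft-thresholded at 1.
A = np.eye(3)
x = np.array([3.0, 0.5, -3.0])
s = irls_focuss(A, x, lam=0.5, p=1.0, s0=x)
print(np.round(s, 6))   # approximately [ 2.  0. -2.]
```

Working with $W^{-1}$ also means the linear system is of size $M$ (the number of observations) rather than $N$, which matters for overcomplete dictionaries.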

With the weights obtained by setting $p = 0$ in the IRLS algorithm, or in the EM algorithm with a Jeffreys hyperprior, we can write

$$2 s^T W = \lambda_c \frac{\partial}{\partial s} f(s) = 2\,[\,s_1^{-1}, s_2^{-1}, \ldots, s_N^{-1}\,].$$
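This identity can be checked numerically. The sketch below assumes NumPy, arbitrary non-zero coefficients and $\lambda_c = 1$. It compares $2 s^T W$ for $W = \mathrm{diag}(|s_n|^{-2})$ with the signed analytic gradient $2/s_n$ and with a central finite-difference gradient of $\sum_n \log |s_n|^2$.

```python
import numpy as np

def log_penalty(s):
    """f(s) = sum_n log |s_n|^2, the regulariser reached for p = 0."""
    return np.sum(np.log(s ** 2))

s = np.array([0.4, -1.5, 2.2])          # hypothetical non-zero coefficients
W = np.diag(np.abs(s) ** -2.0)          # p = 0 weights: diag(|s_n|^-2)
lhs = 2.0 * s @ W                       # 2 s^T W (row vector)

# Central finite differences of f (lambda_c = 1).
h = 1e-6
num_grad = np.array([
    (log_penalty(s + h * e) - log_penalty(s - h * e)) / (2 * h)
    for e in np.eye(len(s))
])
print(lhs, 2.0 / s, num_grad)           # all three vectors agree
```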

By integration it can be found that, for $\lambda_c = 1$, the regularisation term minimised in this case is

$$f(s) = \sum_n \log |s_n|^2,$$

which is one of the regularisation terms proposed by Rao et al. in [71] and [112], who termed it Gaussian entropy². This regularisation term tends to $-\infty$ when any of the $s_n$ goes to zero, so the objective is unbounded below and no global optimum of the optimisation problem exists. Furthermore, once a coefficient $s_n$ becomes small enough, it is forced to zero very rapidly [20]. In practice, this behaviour is advantageous when very sparse solutions are required, and the algorithm was found to converge quickly. However, the results were clearly influenced by the starting conditions, as there is no unique solution.

² The relation to Shannon entropy, however, is not clear. The term seems to stem from Donoho's paper [27], in which he called all sparsity-enforcing measures entropy.
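Both effects are visible in the simplest case of one coefficient with $A = 1$. This plain-Python sketch uses hypothetical values $x = 3$ and $\lambda = 0.5$ (so $2\lambda = 1$), and the helper name is an assumption. The non-zero fixed points of the $p = 0$ update then satisfy $s^2 - 3s + 1 = 0$. The larger root $(3+\sqrt{5})/2$ attracts starts above the smaller root $(3-\sqrt{5})/2 \approx 0.382$, while any positive start below the smaller root is driven to zero. Near zero, each step maps $s$ to roughly $3s^2$, which gives the very rapid collapse.

```python
import math

def irls_p0_scalar(x, lam, s0, n_iter=100):
    """p = 0 IRLS update for a single coefficient with A = 1.

    With W = s^-2 the weighted problem 0.5*(x - s)^2 + lam * W * s^2 is solved by
    s_new = s^2 * x / (s^2 + 2*lam), written in this form so that s = 0 is allowed.
    """
    s = s0
    for _ in range(n_iter):
        s = s * s * x / (s * s + 2.0 * lam)
    return s

x, lam = 3.0, 0.5
s_big = irls_p0_scalar(x, lam, s0=3.0)    # converges to (3 + sqrt(5)) / 2, approx. 2.618
s_small = irls_p0_scalar(x, lam, s0=0.3)  # collapses to (numerically) zero
print(s_big, s_small, (3 + math.sqrt(5)) / 2)
```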

The EM algorithm leads to exactly the same algorithm as the IRLS approach when both algorithms assume a Laplacian prior distribution for $s$. For $p = 0$, the IRLS and FOCUSS algorithms use the logarithmic regularisation term, which can be seen to be the negative log prior of a parameter-free, improper distribution of the form $\prod_{n=1}^{N} s_n^{-2}$. For other values of $p$, the equivalence of the IRLS algorithm and the EM algorithm shows that the IRLS algorithm with a generalised Gaussian prior leads to the same maxima of the posterior as a scale mixture of Gaussians with a hyperprior on $1/\sigma_{s_n}^2$ of the form $\sigma_{s_n} p_\alpha(\alpha/2)$, where $p_\alpha(\alpha/2)$ is a symmetric alpha-stable distribution of $1/\sigma_{s_n}^2$. This suggests that the IRLS algorithm can also be interpreted as an EM algorithm for the priors of the generalised Gaussian family; however, an exact proof of this is not yet available. This also raises the question of whether the generalised Gaussian family converges to $p(s_n) \propto s_n^{-2}$ in the limit as the exponent $p$ goes to zero.
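For the Laplacian case, the EM/IRLS link can be made concrete. Write the Laplacian prior as a Gaussian scale mixture, $s_n \mid \tau_n \sim \mathcal{N}(0, \tau_n)$, with an exponential hyperprior of rate $\gamma^2/2$ on the variance $\tau_n$. The E-step then needs $E[1/\tau_n \mid s_n]$, which equals $\gamma/|s_n|$. The M-step is a weighted least squares problem with $W = \mathrm{diag}(\gamma/(2|s_n|))$, i.e. the $p = 1$ IRLS weights up to a constant. This parameterisation is an assumption (a standard one, not taken from the source), and the helper name is hypothetical. The sketch assumes NumPy and checks the expectation by numerical quadrature.

```python
import numpy as np

def posterior_mean_precision(s, gamma, n_grid=20001):
    """E[1/tau | s] for s | tau ~ N(0, tau), tau ~ Exponential(rate gamma^2 / 2).

    This mixture has the Laplacian marginal (gamma/2) exp(-gamma |s|).
    The expectation is computed by quadrature on a log-spaced grid tau = exp(u).
    """
    u = np.linspace(-20.0, 8.0, n_grid)
    tau = np.exp(u)
    # Unnormalised posterior density of tau given s, times the Jacobian d tau = tau du.
    post = tau ** -0.5 * np.exp(-s**2 / (2.0 * tau) - 0.5 * gamma**2 * tau) * tau
    return np.sum(post / tau) / np.sum(post)

gamma = 2.0
for s in (0.3, 1.0, 2.5):
    # Numerical E-step value next to the closed form gamma / |s|.
    print(s, posterior_mean_precision(s, gamma), gamma / abs(s))
```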

Convergence is a key issue when dealing with iterative algorithms. The convergence of the FOCUSS algorithm has been proven in the noiseless case, both for $p = 0$ and for $p \le 1$, $p \neq 0$, in [110] and [112]. The preceding discussion of the equivalence of the FOCUSS and EM algorithms for certain priors and regularisation terms allows the convergence properties of the EM algorithm [90] to be applied to the FOCUSS algorithm in the noisy model. The rate of convergence of the FOCUSS algorithm in the noiseless case was also investigated in [110]; the rate of convergence in the noisy case is analysed in [20].
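One practical consequence of the EM interpretation is monotone descent. For $0 < p \le 2$ the penalty is concave in $s_n^2$, so each reweighted step minimises a quadratic upper bound on the objective, and the noisy-model objective should never increase. The sketch below records the objective along the iterations. It assumes NumPy, random made-up data and the penalty $\lambda \frac{2}{p}\sum_n |s_n|^p$ that the weights $|s_n|^{p-2}$ match at the fixed points. This majorisation argument is standard background, not the proof given in [90] or [20], and the helper names are hypothetical.

```python
import numpy as np

def lp_objective(A, x, s, lam, p):
    """0.5*||x - A s||^2 + lam * (2/p) * sum |s_n|^p, the penalty matched by
    W(s) = diag(|s_n|^(p-2)) at the fixed points (p != 0)."""
    r = x - A @ s
    return 0.5 * r @ r + lam * (2.0 / p) * np.sum(np.abs(s) ** p)

def irls_trace(A, x, lam, p, n_iter=50):
    """Run the IRLS/FOCUSS update and record the objective after each step."""
    s = np.linalg.lstsq(A, x, rcond=None)[0]      # minimum-norm starting point
    M = A.shape[0]
    values = [lp_objective(A, x, s, lam, p)]
    for _ in range(n_iter):
        d = np.abs(s) ** (2.0 - p)                # diagonal of W^{-1}
        G = (A * d) @ A.T + 2.0 * lam * np.eye(M)
        s = d * (A.T @ np.linalg.solve(G, x))
        values.append(lp_objective(A, x, s, lam, p))
    return s, np.array(values)

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 20))                 # hypothetical noisy-model data
x = rng.standard_normal(10)
s, values = irls_trace(A, x, lam=0.1, p=1.0)
print(np.all(np.diff(values) <= 1e-10))           # expected: True (objective never increases)
```

The small tolerance only allows for floating-point round-off. A clearly positive difference would indicate an error in the weights or the solve.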

Source: Bayesian modelling of music: algorithmic advances and experimental studies of shift invariant sparse coding, pages 70-73.