Sampling from the Modified Rayleigh Distribution


6.2.2 Sampling from the Modified Rayleigh Distribution

In the above sampling scheme it is necessary to draw samples from the posterior distribution of $s_n$ conditioned upon the others. As the modified Rayleigh distribution is a conjugate prior for the Gaussian mean, this distribution is also of the modified Rayleigh form with the parameters given above. It is therefore necessary to implement a method that allows us to draw samples from this distribution for different parameters. Due

CHAPTER 6. GIBBS SAMPLING APPROXIMATION ₉₉

to the non-standard form of this distribution a direct sampling scheme is not obvious. It is instructive to write the modified Rayleigh distribution in the form:

$$\frac{1}{Z_{mR}}\, s\, e^{-(s-\mu)^2/2\sigma_R^2} = \frac{1}{Z_{mR}} \left[ (s-\mu)\, e^{-(s-\mu)^2/2\sigma_R^2} + \mu\, e^{-(s-\mu)^2/2\sigma_R^2} \right]. \qquad (6.5)$$

For $s > \mu$ and $\mu > 0$, the modified Rayleigh distribution can be understood as a mixture distribution of a Gaussian and a shifted Rayleigh distribution. However, the modified Rayleigh distribution is defined for values greater than zero, while the shifted Rayleigh distribution is defined for values greater than $\mu$, as it would be negative for values smaller than $\mu$. We therefore propose a hybrid sampling strategy if $\mu > 0$, which first determines whether the value is greater or smaller than $\mu$. We have

$$p(s > \mu) = p_1 = \frac{1}{Z_{mR}} \left( \sigma_{mR} + 0.5\,\mu \sqrt{2\pi\sigma_{mR}} \right) \quad \text{and} \quad p(s < \mu) = p_2 = 1 - p_1.$$
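The expression for $p_1$ can be checked by integrating the two terms of (6.5) over $s > \mu$. The short derivation below is a sketch under one assumption that is our reading of the notation rather than something stated here: $\sigma_{mR}$ is taken to be the variance-like parameter that appears as $\sigma_R^2$ in the exponent of (6.5).

```latex
% Assumption: \sigma_{mR} plays the role of \sigma_R^2 in (6.5).
\begin{align*}
\int_{\mu}^{\infty} (s-\mu)\, e^{-(s-\mu)^2/2\sigma_{mR}}\, ds
  &= \Big[ -\sigma_{mR}\, e^{-(s-\mu)^2/2\sigma_{mR}} \Big]_{\mu}^{\infty}
   = \sigma_{mR}, \\
\int_{\mu}^{\infty} \mu\, e^{-(s-\mu)^2/2\sigma_{mR}}\, ds
  &= \mu \cdot \tfrac{1}{2} \sqrt{2\pi\sigma_{mR}}
   = 0.5\,\mu \sqrt{2\pi\sigma_{mR}} .
\end{align*}
```

Adding the two masses and dividing by $Z_{mR}$ reproduces $p_1$. The first mass belongs to the shifted Rayleigh term and the second to the half of the Gaussian term lying above $\mu$.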

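With probability $p_1$, $s > \mu$, which means that we can sample from:

$$p(s \mid s > \mu) = \mu + \left[ \frac{\sigma_{mR}}{Z_{mR}} \right] \sigma_{mR}^{-1}\, s\, e^{-0.5\,\sigma_{mR}^{-1} s^2} + \left[ \frac{0.5\,\mu \sqrt{2\pi\sigma_{mR}}}{Z_{mR}} \right] 2 \sqrt{\frac{\sigma_{mR}^{-1}}{2\pi}}\, e^{-0.5\,\sigma_{mR}^{-1} s^2},$$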

which is a mixture of a rectified Gaussian and a shifted Rayleigh distribution with mixing probabilities given in the square brackets. For $s < \mu$ we know that the distribution is bounded from above by $\frac{1}{Z_{mR}}\, \mu\, e^{-(s-\mu)^2/2\sigma_R^2}$, as $\frac{1}{Z_{mR}}\, (s-\mu)\, e^{-(s-\mu)^2/2\sigma_R^2}$ is negative. We can therefore use a simple rejection sampler to draw samples for $0 < s < \mu$ when $\mu > 0$.
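The hybrid strategy for $\mu > 0$ can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used for the experiments. The function names `prob_above_mu` and `sample_modified_rayleigh_pos` are hypothetical, `var` is assumed to play the role of $\sigma_{mR}$ (treated as a variance), $Z_{mR}$ is computed in closed form from the two terms of (6.5), and the two mixing weights are renormalised by their sum so that they add to one once we have conditioned on $s > \mu$.

```python
import math
import random


def prob_above_mu(mu, var):
    """p1 = P(s > mu) for mu > 0; `var` is assumed to stand for sigma_mR."""
    # Z_mR = integral over s > 0 of s * exp(-(s - mu)^2 / (2 * var)),
    # obtained by integrating the two terms of (6.5) separately.
    normal_cdf = 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0 * var)))
    z_mr = (var * math.exp(-mu * mu / (2.0 * var))
            + mu * math.sqrt(2.0 * math.pi * var) * normal_cdf)
    return (var + 0.5 * mu * math.sqrt(2.0 * math.pi * var)) / z_mr


def sample_modified_rayleigh_pos(mu, var, rng):
    """Draw one sample from s * exp(-(s - mu)^2 / (2 * var)), s > 0, mu > 0."""
    if mu <= 0.0:
        raise ValueError("this sketch only covers mu > 0")
    sd = math.sqrt(var)
    rayleigh_mass = var                                       # shifted Rayleigh part
    gauss_mass = 0.5 * mu * math.sqrt(2.0 * math.pi * var)    # half-Gaussian part

    if rng.random() < prob_above_mu(mu, var):
        # Upper region s > mu: mixture of a shifted Rayleigh and a half Gaussian.
        if rng.random() * (rayleigh_mass + gauss_mass) < rayleigh_mass:
            u = 1.0 - rng.random()                            # u in (0, 1], avoids log(0)
            return mu + sd * math.sqrt(-2.0 * math.log(u))    # Rayleigh by inversion
        return mu + abs(rng.gauss(0.0, sd))

    # Lower region 0 < s < mu: rejection sampling under the Gaussian envelope
    # mu * exp(-(s - mu)^2 / (2 * var)); the acceptance probability is s / mu.
    while True:
        s = mu - abs(rng.gauss(0.0, sd))
        if rng.random() * mu < s:                             # never accepts s <= 0
            return s
```

Passing a seeded generator, for example `sample_modified_rayleigh_pos(1.0, 1.0, random.Random(0))`, makes the draws reproducible. Proposals that fall at or below zero are rejected automatically, because the acceptance ratio $s/\mu$ is then not positive.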

If $\mu < 0$ we see from equation (6.5) that the second term becomes negative while the first term is positive for all $s > 0$. The distribution is then bounded from above by a shifted Rayleigh distribution which can be used for rejection sampling. This might not be a good strategy in general as it can lead to a very high rejection rate for certain parameter values. For our experiments, however, this was found to be of no great concern.
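A minimal sketch of the rejection sampler for $\mu < 0$ is given below. It is an illustration under stated assumptions, not the implementation used for the experiments. The name `sample_modified_rayleigh_neg` is hypothetical and `var` is assumed to be the variance-like parameter of the distribution. As a design choice of this sketch, the shifted Rayleigh proposal is restricted to $s > 0$ by inverting its tail probability, so that no proposals are wasted on negative values.

```python
import math
import random


def sample_modified_rayleigh_neg(mu, var, rng):
    """Draw one sample from s * exp(-(s - mu)^2 / (2 * var)), s > 0, mu < 0."""
    if mu >= 0.0:
        raise ValueError("this sketch only covers mu < 0")
    a = -mu  # the proposal variable r = s - mu must exceed a for s > 0
    while True:
        # Rayleigh proposal for r conditioned on r > a, by inverting the
        # tail probability exp(-r^2 / (2 * var)).
        u = 1.0 - rng.random()                                # u in (0, 1]
        r = math.sqrt(a * a - 2.0 * var * math.log(u))
        s = r - a
        # Target over envelope is s / (s - mu) = s / r, which is at most 1.
        if rng.random() * r < s:
            return s
```

The acceptance probability $s/(s-\mu)$ becomes small when $|\mu|$ is large relative to the spread of the distribution, which is where the high rejection rate mentioned above shows up.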

6.2.3 Random Subset Selection

The probabilistic model used in this chapter leads to a posterior for $u$ that is multi-modal.² Furthermore, for many of the problems of interest here the dimension of this state space is very high. As the sampler has to be able to draw samples from modes that are often far apart and that carry much of the probability mass, and as the states between these modes often have very low probability, we generally require a large number of samples to be drawn.

In order to improve the performance of the Gibbs sampler, different approaches could be adopted. We tried several methods, but most of these did not offer significant advantages. However, as these methods can be of interest for related applications, they have been included in appendix B. In order to significantly reduce the computational requirements we instead resort to subset selection. For example, the subset selection step introduced in chapter 4 could be used, and we found this to work well in practice. However, the Gibbs sampler then does not have the chance to explore certain parts of the distribution.

An improvement on the deterministic subset selection procedure, which asymptotically explores the full probability space and therefore asymptotically draws samples from the correct distribution, is to use a random subset selection during each Gibbs cycle. The method that we found worked best for the problems under study used a combination of both approaches. We used the subset selection algorithm to select a fixed subset at the beginning of each Gibbs run. In each Gibbs cycle we then

² We use the term multi-modal here for discrete state spaces. In order to be able to talk about modes, we need to define a neighbourhood for each point. In the discrete state space used here, such a neighbourhood can be defined based on Levenshtein distance (more commonly known as edit distance). With such a definition, points that differ from any point by only a single indicator variable $u_n$ constitute its neighbourhood. A mode is then a point with


added a random selection of further features to this initial set. In each Gibbs step we sample from $p(u_n \mid s_{\hat{n} \neq n}, u_{\hat{n} \neq n}, x, \theta)$ and $p(s_n \mid s_{\hat{n} \neq n}, u, x, \theta)$.

The order in which we sample from these distributions can be chosen at random and it is not necessary to cycle through all $n$ before returning to any one coefficient. The sampler is still guaranteed to converge to the stationary distribution if we ensure that we sample from each coefficient with non-zero probability. The random subset selection method can be seen as a way to specify a random set of subscripts $n$. If we combine the random subset selection method with the fixed subset selection as proposed here, we use a fixed set of subscripts, say $\{n_1, n_2, \cdots, n_{W_1}\}$. In each cycle of the sampler we further select a random set of indices from the remaining set, i.e. $\{n_{W_1+1}, n_{W_1+2}, \cdots, n_{W_1+W_2}\}$. Each Gibbs cycle then samples from $p(u_{n_k} \mid s_{\hat{n} \neq n_k}, u_{\hat{n} \neq n_k}, x, \theta)$ and $p(s_{n_k} \mid s_{\hat{n} \neq n_k}, u, x, \theta)$, where $n_k$ runs over a random permutation of the indices $n_1$ to $n_{W_1+W_2}$. This method ensures that a fixed region of the probability space is explored by the sampler, but enables the sampler to asymptotically explore the complete space.
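The index bookkeeping for one Gibbs cycle can be sketched as follows. This is an illustrative sketch only. The function name `gibbs_cycle_order` and its parameters are hypothetical, coefficients are assumed to be indexed `0` to `num_coeffs - 1`, and the fixed subset is assumed to have been chosen beforehand and to contain no duplicates. The conditional sampling of $u_{n_k}$ and $s_{n_k}$ itself is not shown.

```python
import random


def gibbs_cycle_order(fixed_indices, num_coeffs, num_random, rng):
    """Return the visiting order for one Gibbs cycle.

    fixed_indices: the W1 indices chosen once at the start of the run.
    num_random:    W2, the number of extra indices drawn afresh in each cycle.
    """
    fixed = set(fixed_indices)
    remaining = [n for n in range(num_coeffs) if n not in fixed]
    # Draw the W2 extra indices without replacement from the remaining set.
    extra = rng.sample(remaining, min(num_random, len(remaining)))
    order = list(fixed_indices) + extra
    rng.shuffle(order)  # random permutation of the W1 + W2 indices
    return order
```

Calling `gibbs_cycle_order([2, 5, 7], 20, 4, random.Random(1))` returns seven distinct indices that always include 2, 5 and 7. Because every remaining index has a non-zero chance of being drawn in each cycle, repeated cycles eventually visit all coefficients.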

In document Bayesian modelling of music: algorithmic advances and experimental studies of shift invariant sparse coding (Page 99-102)