Chapter 6 Examples: Earthquakes, Fingerprints (and briefly Galax-
6.2 Two-Dimensional Examples
6.2.2 Stanford and Raftery’s Simulated Example
A simulated data set used in Stanford and Raftery [2000] is shown in Figure 6.2(a). We include it here to facilitate comparison with the methods proposed by Stanford and Raftery. The data consist of 200 signal points and 200 noise points and is based on a family of two fibres each of length 157. The original data set, consisting of points over a [−1.5,2.5]×[−1.5,1.5] window, were scaled and translated to lie in a 200×150 window.
The BDMCMC was run for 60,000 units of algorithm time, the first 30,000 of which
(a) Simulated data. (b) A random sample from the BDMCMC output. Fibres are represented by curves, pluses indicate points allocated to signal and crosses indicate points allocated to noise in this sample.
(c) Estimate of the density of signal points found by smoothing a series of samples of fi- bres (darker areas indicate higher densities). Pluses indicate points allocated to signal and crosses indicate points allocated to noise in at least 50% of samples. The size of points repre- senting the data has been reduced to enhance the clarity of the density estimate.
(d) Estimate of the clustering of the signal points - different clusters are distinguished by varying symbol, crosses indicate noise. Esti- mated by considering how often pairs of points are associated with the same fibre across a number of samples.
Figure 6.3: Trace plot of the number of fibres against algorithmic time. Following consideration of this plot it was decided that the first 30,000 algorithmic seconds should be discarded.
were discarded. This rather long burn-in was chosen following consideration of the trace plot of the number of fibres, see Figure 6.3. This plot may suggest that insufficient mixing is occurring as the chain appears to traverse a region of the posterior distribution with 3-4 fibres, and then after 25,000 algorithmic seconds, move to a region favouring 2-3 fibres. However, from consideration of fibre samples and also of the posterior density, we have concluded that the chain, if run for longer, would not return to the original region favouring 3-4 fibres. Trace plots of other statistics such as the total length of the fibres and the total number of noise points show no evidence of a lack of convergence or poor mixing.
Samples were taken at a rate of 0.033 per unit of time. The initial state was a randomly sampled set ofκ= 2 fibres. Other hyperparameters were chosen as follows: dispersion parameter σdisp= 3; signal probability hyperparameters αsignal = 1 and βsignal = 1; density parameter η = 0.64; mean half-fibre length λ= 78.5; and the
Dirichlet parameter αDir = 1.
Figures 6.2(b) to 6.2(d) show that our model fits the data very well. The two fibres in the sample in Figure 6.2(b) compare favourably with the principal curves fitted in Stanford and Raftery [2000].
Naturally, noise points that lie near a fibre will frequently be associated with the signal component during the course of the BDMCMC. Most points sufficiently close to a fibre are associated to signal in at least 50% of samples as is evident in Figure 6.2(c). However, in an individual sample a random subset of these points will be noise (see Figure 6.2(b)), reflecting how noise is included in the model - as the superposition of a homogeneous Poisson process. This is in contrast to the work of Stanford and Raftery [2000] where the emphasis is on fitting a principal curve to the points, for this reason they use the approximation that all points that belong to a dense cluster are signal points.
As in the previous example, we estimate properties of the posterior distribution. Posterior means and highest posterior density intervals are given in Table 6.2.
Posterior Probabilities for Number of Fibres
Number of Fibres 2 3
Posterior Probability 0.78 0.22
Other Properties Conditioned on the Number of Fibres Number of Fibres Posterior Mean 50% HPD Interval 95% HPD Interval
Number of Noise Points 2 195.81 [194,202] [181,209]
3 195.09 [191,199] [180,206]
95th Percentile of Point to Fibre Distances
2 7.46 [7.11,7.65] [6.60,8.47]
3 7.48 [7.08,7.53] [6.80,8.22]
Total Length of Fibres 2 317.44 [312,320] [306,331]
3 319.02 [309,318] [305,337]
Table 6.2: Results for Stanford and Raftery’s Simulated Example: First sub-table gives posterior probabilities on the number of fibres, while the second gives posterior means and 50% and 95% HPD (highest posterior density) intervals for a selection of properties of the posterior distribution conditional on the number of fibres. The simulated data consists of 200 signal points and 200 noise points over a 200×150 window, and is based on a family of two fibres each of length 157. The dispersion parameter σdisp is set to 3 and the prior mean probability that a point is noise is
0.5. Posterior probabilities only given if non-zero to rounding error.
In this example, more points are associated to signal than expected. This is partly due to the high intensity of noise points, and also explained by a slight bias in the length of the fibres. The posterior statistics on the lengths of the fibres arguably suggest that the extension of fibres beyond their known length (of 157) is supported by the high intensity of noise points. This extrapolation is sometimes beneficial, particularly for fibre reconstruction in areas of missing data. Here the extrapola- tion is less desirable as it suggests there is evidence for fibres in the background noise.
The extrapolation of fibres into less dense regions of points can be reduced by choosing a higher Dirichlet parameter αDir for the distribution of anchor points
along the fibres. This decreases the posterior density of fibres lying through point clusters of non-constant intensity. The drawback of increasingαDir is that proposed
moves are more frequently rejected. This is because (A) no proposals account for the distribution of anchor points along fibres, and (B) a large value of αDir leads
to a multimodal anchor point distribution with most of the probability weighted around the modes. Hence, the proposal of a state with low posterior density is more likely.