
Chapter 6 Examples: Earthquakes, Fingerprints (and briefly Galaxies)

6.3 Three-Dimensional Examples
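The simulated helix data summarised in Table 6.5 are described only by their length (80.8), the number of signal and noise points, the perturbation variance and the window size. The Python sketch below generates a pattern with those properties. It is an illustration under stated assumptions, not the simulation code actually used: the helix radius (4) and number of turns (3) are hypothetical choices, and the vertical rise is solved for so that the helix has length 80.8 and fits inside the 20×20×40 window. The function names are also hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def helix_length(radius: float, rise: float, t_max: float) -> float:
    """Arc length of the helix (r cos t, r sin t, rise * t) for t in [0, t_max]."""
    return t_max * math.hypot(radius, rise)


def simulate_helix_pattern(
    n_signal: int = 97,
    n_noise: int = 20,
    length: float = 80.8,
    radius: float = 4.0,  # hypothetical: the helix radius is not given in the text
    turns: float = 3.0,   # hypothetical: the number of turns is not given in the text
    window: Tuple[float, float, float] = (20.0, 20.0, 40.0),
    sd: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Point], List[Point], float]:
    rng = random.Random(seed)
    t_max = 2.0 * math.pi * turns
    speed = length / t_max  # arc length per unit of the parameter t
    if speed <= radius:
        raise ValueError("radius too large for the requested length and number of turns")
    rise = math.sqrt(speed ** 2 - radius ** 2)  # vertical rise per radian
    cx, cy = window[0] / 2.0, window[1] / 2.0
    z0 = (window[2] - rise * t_max) / 2.0  # centre the helix vertically
    signal: List[Point] = []
    for _ in range(n_signal):
        # For a circular helix, uniform t gives points uniform in arc length.
        t = rng.uniform(0.0, t_max)
        centre = (cx + radius * math.cos(t), cy + radius * math.sin(t), z0 + rise * t)
        # Perturb each coordinate with independent N(0, sd^2) noise.
        signal.append(tuple(c + rng.gauss(0.0, sd) for c in centre))
    # Noise points are uniform over the whole window.
    noise: List[Point] = [
        tuple(rng.uniform(0.0, w) for w in window) for _ in range(n_noise)
    ]
    return signal, noise, rise
```

Because the perturbation is Gaussian, a few signal points may fall slightly outside the window; how the original simulation treated such points is not stated here.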

Figure 6.9: Empirical estimate of the signal point process density viewed from 3 angles. Points associated with noise in more than 50% of samples are indicated by ×; other points are denoted by +.

Posterior Probabilities for Number of Fibres

Number of Fibres        7
Posterior Probability   0.99

Other Properties Conditioned on the Number of Fibres

Number of Fibres   Posterior Mean   50% HPD Interval   95% HPD Interval

Number of Noise Points
7                  13.97            [13, 14]           [13, 15]

95th Percentile of the Distances from Signal Points to Fibres
7                  2.56             [2.43, 2.65]       [2.25, 2.90]

Total Length of Fibres
7                  88.66            [80, 86]           [72, 113]

Table 6.5: Results for Simulated Helix Data. The first sub-table gives posterior probabilities on the number of fibres, while the second gives posterior means and 50% and 95% HPD (highest posterior density) intervals for a selection of properties of the posterior distribution, conditional on the number of fibres. The data consist of 97 points, perturbed by a multivariate normal distribution with variance 1, from a helix of length 80.8. Twenty noise points, uniformly distributed over the 20×20×40 window, were superimposed on the signal point pattern. The dispersion parameter σdisp is set to 1 and the prior mean probability that a point is noise is 0.091.

6.3.2 Application: Galaxies

The second 3-dimensional data set consists of the locations of galaxies as analysed in Stoica et al. [2007]. The original data come from the 2dF Galaxy Redshift Survey (Colless et al., 2001), a 3-dimensional map of 221,000 galaxies. However, the analysis here is restricted to a subset of 124 galaxies from the database, because running the BDMCMC algorithm on the full data set for the same 40,000 units of algorithm time would take an estimated 40 weeks. Stoica et al. [2007] identified three cuboidal samples, or ‘bricks’, in the 2dF Galaxy Redshift Survey, each with an approximately constant intensity of galaxies. We use a portion of the first brick (NGP150), specifically W = [0,40]×[30,60]×[0,10], where galaxy positions are given relative to the lower left corner of the brick. One reason for choosing this particular subset is that it exhibits neither 2-dimensional walls of galaxies nor dense spherical clusters of galaxies. These structures do appear in maps of galaxies, but our model does not include them. Cosmic walls, that is, 2-dimensional surfaces embedded in 3-dimensional space, are expected to be challenging mathematical objects to model. It is unclear exactly how to identify a random surface from a field of orientations, or from a field of vectors normal to a random surface, because the linear integration techniques used to identify fibres do not extend naturally to 2-dimensional surfaces. For more information on the various cosmic structures, see Martínez and Saar [2002].
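Restricting the survey to the window W and summarising the resulting subset takes only a few lines. The Python sketch below assumes galaxy positions are already available as (x, y, z) tuples in the brick's own coordinate system; the loading step and the coordinate units are not described here, the windows are treated as closed intervals, and the helper names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Sequence[Tuple[float, float]]

# W = [0,40] x [30,60] x [0,10], relative to the lower left corner of brick NGP150.
W: Box = ((0.0, 40.0), (30.0, 60.0), (0.0, 10.0))


def in_box(p: Point, box: Box) -> bool:
    """True if every coordinate of p lies in the corresponding closed interval."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))


def restrict_to_box(points: Iterable[Point], box: Box) -> List[Point]:
    """Keep only the points that fall inside the cuboidal window."""
    return [p for p in points if in_box(p, box)]


def box_volume(box: Box) -> float:
    """Volume of the cuboid, i.e. the product of its side lengths."""
    volume = 1.0
    for lo, hi in box:
        volume *= hi - lo
    return volume


def mean_intensity(n_points: int, box: Box) -> float:
    """Average number of points per unit volume in the window."""
    return n_points / box_volume(box)
```

For the 124 galaxies in the 40×30×10 window, `mean_intensity(124, W)` gives 124 / 12,000, roughly 0.0103 galaxies per unit volume in the brick's coordinate units.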

The first 13,000 units of algorithm time (of a total of 40,000) were discarded as burn-in. Samples were then taken every 0.013 units of algorithm time. The initial state was a randomly sampled set of κ = 6 fibres. The other hyperparameters were chosen as follows: dispersion parameter σdisp = 2; signal probability hyperparameters αsignal = 3 and βsignal = 1; density parameter η = 1.88; mean half-fibre length λ = 10; and Dirichlet parameter αDir = 1.
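The run settings above imply some simple arithmetic, sketched below in Python. Two points are assumptions rather than statements from this section: "a rate of 0.013 units of algorithm time" is read as one sample every 0.013 units, and the signal probability is assumed to have a Beta(αsignal, βsignal) prior, in which case the prior mean probability that a point is noise is βsignal / (αsignal + βsignal). The function names are hypothetical.

```python
def retained_sample_count(total_time: float = 40_000.0,
                          burn_in: float = 13_000.0,
                          spacing: float = 0.013) -> int:
    """Number of samples recorded after burn-in, if one is taken every `spacing` units."""
    if not 0.0 <= burn_in < total_time or spacing <= 0.0:
        raise ValueError("invalid sampling schedule")
    return int((total_time - burn_in) / spacing)


def prior_mean_noise_probability(alpha_signal: float, beta_signal: float) -> float:
    """Prior mean noise probability, ASSUMING the signal probability has a
    Beta(alpha_signal, beta_signal) prior (not confirmed in this section)."""
    return beta_signal / (alpha_signal + beta_signal)
```

Under these assumptions about 2.08 million samples are retained from the 27,000 post-burn-in units, and αsignal = 3, βsignal = 1 would give a prior mean noise probability of 0.25.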

The data are presented in Figure 6.10.

An example of the clustering of points in one sample is shown in Figure 6.11. The data appear to fall into approximately 6 clusters of points. However, the empirical estimate of the density of signal points (see Figure 6.12) shows more clearly how close some of these clusters lie to one another, and suggests that 5 is a better estimate of the number of fibres, even though 6 fibres has the highest posterior probability (Table 6.6).
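Figures 6.11 and 6.12 and Table 6.6 summarise the BDMCMC output in two ways: points are marked as noise if they are allocated to noise in more than 50% of samples, and the posterior of the number of fibres is the fraction of samples with each fibre count. The Python sketch below assumes a hypothetical data layout for the sampler output (a samples-by-points Boolean noise-allocation matrix and a list of per-sample fibre counts); it does not reproduce the density estimate itself.

```python
from collections import Counter
from typing import Dict, List, Sequence


def noise_frequency(noise_flags: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of samples in which each point is allocated to noise.

    noise_flags[s][i] is True if point i is allocated to noise in sample s.
    """
    n_samples = len(noise_flags)
    n_points = len(noise_flags[0])
    return [sum(sample[i] for sample in noise_flags) / n_samples
            for i in range(n_points)]


def plot_symbols(noise_flags: Sequence[Sequence[bool]],
                 threshold: float = 0.5) -> List[str]:
    """'×' for points allocated to noise in more than `threshold` of samples, else '+'."""
    return ["×" if f > threshold else "+" for f in noise_frequency(noise_flags)]


def posterior_number_of_fibres(fibre_counts: Sequence[int]) -> Dict[int, float]:
    """Empirical posterior of the number of fibres from per-sample counts."""
    counts = Counter(fibre_counts)
    return {k: counts[k] / len(fibre_counts) for k in sorted(counts)}
```

The threshold comparison is strict, matching "more than 50% of samples" in the figure captions.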

Implementing all of the additional moves described in Section 5.3 in 3 dimensions should improve the mixing of the BDMCMC algorithm. Nevertheless, these early results indicate that our model extends well to 3 dimensions.
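Tables 6.5 and 6.6 report 50% and 95% HPD (highest posterior density) intervals. This section does not say how these were computed from the samples; a common sample-based approach, sketched below in Python as an assumption, takes the shortest interval that contains the required fraction of the sorted draws. This coincides with the HPD interval when the posterior is unimodal. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing at least `mass` of the sorted draws.

    Matches the HPD interval for a unimodal posterior; for a multimodal
    posterior the HPD region may instead be a union of intervals.
    """
    if not samples or not 0.0 < mass <= 1.0:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]
```

Applied to integer-valued quantities such as the number of noise points, the endpoints are themselves sample values, which is consistent with the integer intervals in the tables.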

Figure 6.10: Subset of galaxy data taken from the 2dF Galaxy Redshift Survey (Colless et al., 2001), specifically the galaxies located in the window [0,40]×[30,60]×[0,10] of the brick of galaxies NGP150 identified by Stoica et al. [2007].

Figure 6.11: Clustering of points in one sample. Curves represent fibres and different symbols represent different clusters.

Figure 6.12: Empirical estimate of the density of signal points; darker areas indicate higher densities. Points allocated to noise in more than 50% of samples are represented by ×, while other points are represented by +.

Posterior Probabilities for Number of Fibres

Number of Fibres        4      5      6      7
Posterior Probability   0.01   0.32   0.57   0.10

Other Properties Conditioned on the Number of Fibres

Number of Fibres   Posterior Mean   50% HPD Interval   95% HPD Interval

Number of Noise Points
5                  8.05             [7, 8]             [6, 15]
6                  7.32             [6, 7]             [5, 9]
7                  6.09             [5, 6]             [4, 9]

95th Percentile of the Distances from Signal Points to Fibres
5                  5.39             [5.06, 5.50]       [4.69, 6.15]
6                  5.07             [4.87, 5.29]       [4.43, 5.94]
7                  5.12             [4.81, 5.25]       [4.67, 6.04]

Total Length of Fibres
5                  51.20            [45, 51]           [45, 59]
6                  54.62            [50, 56]           [46, 62]
7                  56.37            [55, 60]           [48, 65]

Table 6.6: Results for Galaxy Data. The first sub-table gives posterior probabilities on the number of fibres, while the second gives posterior means and 50% and 95% HPD (highest posterior density) intervals for a selection of properties of the posterior distribution, conditional on the number of fibres. The data are the galaxies located in the window [0,40]×[30,60]×[0,10] of the brick of galaxies NGP150 identified by Stoica et al. [2007]. There are 124 points in a 40×30×10 window, the dispersion parameter σdisp is set to 2 and the prior mean probability that a point is noise is

6.4 Conclusions

This chapter demonstrates our approach to making inferences on the underlying curvilinear structure of point patterns by applying it to four planar data sets. Also presented are preliminary results on two 3-dimensional data sets.

These examples show that the curvature bias in the field of orientations on sampled fibres can affect the estimated number of fibres, by causing the model to favour several short fibre segments over a single long fibre. This is most apparent in the first simulated example. To estimate, from the posterior distribution, the number of fibres that generated a point pattern, a bias-corrected estimator weighted towards the lower end of the posterior distribution of the number of fibres may therefore be appropriate.
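The form of such a bias-corrected estimator is left open here. One simple, purely illustrative possibility (our assumption, not a method given in this chapter) is to report a lower posterior quantile of the number of fibres instead of the posterior mode. The Python sketch below uses the galaxy posterior from Table 6.6; the 25% level and the function names are hypothetical choices.

```python
from typing import Dict


def posterior_mode(probs: Dict[int, float]) -> int:
    """Number of fibres with the highest posterior probability."""
    return max(probs, key=probs.get)


def posterior_quantile(probs: Dict[int, float], q: float) -> int:
    """Smallest number of fibres whose cumulative posterior probability reaches q."""
    cumulative = 0.0
    for k in sorted(probs):
        cumulative += probs[k]
        if cumulative >= q:
            return k
    return max(probs)  # guards against rounding when q is close to 1


# Posterior probabilities for the galaxy data, from Table 6.6.
GALAXY_POSTERIOR = {4: 0.01, 5: 0.32, 6: 0.57, 7: 0.10}
```

Here the posterior mode and median are both 6 fibres, whereas the 25% quantile is 5, the value suggested by the density estimate in Figure 6.12. Any such quantile level would need calibrating, for example against simulated data with a known number of fibres.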

The examples in this chapter provide evidence that our approach can be applied to data exhibiting various types of fibre structure. For example, fingerprint pores lie close to the centres of ridge lines, which in turn lie almost parallel to each other on the fingertip, yet the flexible model can be fitted both to these data and to the densely clustered data of earthquake epicentres.

In the following chapter, we return to the discussion of tensors, analysing the robustness of the local orientation estimate provided by the tensor method, and introducing a new measure of anisotropy (the degree to which a tensor deviates from isotropy).

In document An orientation field approach to modelling fibre generated spatial point processes (Page 134-140)