
3.3 Results

3.3.1 Without convolution of the error density

In this section, I present the results of my learning of the gravitational mass density parameters (ρ₁, . . . , ρ_N_R) and the state space pdf parameters (f₂, . . . , f_N_E), where the learning is undertaken using the method described in Section 2.2 of Chapter 2, using the data D_PNe and D_GC, without convolving the error density with the likelihood. Recall that the convolution of the error density with the likelihood was discussed in Section 2.6 of Chapter 2.

Figure 3.3: Data cleaning of the Globular Clusters and Planetary Nebulae data. Top row: in red are the data points that are omitted from the observed datasets on GCs and PNe. These data points are removed for having large error bars in V₃ (in the GC case, top left), for having too high a value of |V₃| (in the PNe case, top right), or for having too high a value of X₁² + X₂². Bottom row: final datasets that are used to learn the ρ and f parameters.

I undertook extensive experimentation with the parameters of the Metropolis-within-Gibbs algorithm that I implemented with the D_GC and D_PNe data sets. Such parameters included the parameters of the proposal densities from which each of the ρ_i and f_j parameters was sampled; in particular, I experimented with the proposal variances (which I also refer to as the jump-scales) towards securing convergence of the ρ_i and f_j parameters, where i = 1, . . . , N_R; j = 2, . . . , N_E. The value of the pdf in the first energy bin (i.e. f₁) is not learnt as it is fixed to 0, given that the energy in this bin is by definition extreme (close to 0). The traces for the f parameters are thus shown from f₂ to f₉.

Figure 3.4: ρ_i parameters learnt along with the 95% HPD credible regions (left panels), and f_j parameters learnt with the respective 95% HPD credible regions (right panels), using the PNe data, from an MCMC chain that implements a Normal proposal density with samples rejected if positivity/monotonicity is violated (top panels), and from another chain that implements a Truncated Normal proposal density [Robert, 1997] (lower panels). Here i = 1, . . . , 28; j = 2, . . . , 9. The modes of the learnt parameters are shown in red filled circles.

I found that the convergence of the ρ_i parameters was more easily and robustly attained than that of the f_j parameters. The latter were relatively more sensitive to changes made in the jump-scales (i.e. variances of the proposal densities), as well as to implemented changes that were solely relevant to the ρ_i parameters. For instance, even when the variance of the proposal density of the ρ_i parameters is changed so slightly that the traces of the ρ_i parameters do not display any visually discernible change, the traces of some of the f_j parameters can indicate a changed dispersion and quality of mixing. In order to achieve convergence of the f_j parameters ∀ j ∈ {2, . . . , N_E}, I needed to experiment with all the jump-scales, while ensuring that the state space is comprehensively explored (as evidenced by the ergodicity of chains initiated from diverse seeds).
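The role of the jump-scales can be illustrated with a minimal sketch of a Metropolis-within-Gibbs sampler. Everything in it is an assumption made for illustration: `log_target` is a hypothetical stand-in for the log-posterior (it is not the likelihood of Chapter 2), the monotonicity constraint is assumed to mean "non-increasing", and the three-component parameter vector and the jump-scale values are arbitrary. Proposals are drawn from a Normal density centred on the current value, and any proposal that violates positivity or monotonicity is rejected outright.

```python
import numpy as np

def is_valid(theta):
    """Assumed constraints: all components positive and non-increasing."""
    return bool(np.all(theta > 0) and np.all(np.diff(theta) <= 0))

def log_target(theta):
    """Hypothetical stand-in for the log-posterior (NOT the thesis's likelihood)."""
    centre = np.array([3.0, 2.0, 1.0])
    return -0.5 * np.sum(((theta - centre) / 0.3) ** 2)

def metropolis_within_gibbs(theta0, jump_scales, n_iter, rng):
    """Update one component at a time with a Normal random-walk proposal."""
    theta = np.array(theta0, dtype=float)
    chain = np.empty((n_iter, theta.size))
    accepted = np.zeros(theta.size)
    for t in range(n_iter):
        for k in range(theta.size):
            proposal = theta.copy()
            proposal[k] += jump_scales[k] * rng.standard_normal()
            if not is_valid(proposal):      # constraint violated: reject outright
                continue
            log_ratio = log_target(proposal) - log_target(theta)
            if np.log(rng.uniform()) < log_ratio:
                theta = proposal
                accepted[k] += 1
        chain[t] = theta                    # store the state after a full sweep
    return chain, accepted / n_iter

rng = np.random.default_rng(0)
chain, acc_rate = metropolis_within_gibbs([3.0, 2.0, 1.0], [0.2, 0.2, 0.2], 5000, rng)
```

Re-running the sketch with different entries in the jump-scale list changes the per-component acceptance rates in `acc_rate` and the dispersion of the columns of `chain`, which is the kind of sensitivity that the traces are inspected for.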

Figure 3.4 displays the 95% HPD credible regions on the learnt ρ_i and f_j parameters (for i = 1, . . . , N_R; j = 2, . . . , N_E), in the left and right columns respectively, from 2 runs based on the PNe data D_PNe. The first run (top row of Figure 3.4) was designed with a Normal proposal density, where samples not respecting the positivity or monotonicity conditions were rejected, while the second run (bottom row of Figure 3.4) was designed with a Truncated Normal proposal density as per Robert [1997]. Traces of the first nine ρ parameters, and of the f_j parameters, from this latter run (bottom row of Figure 3.4) using the Truncated Normal proposal density are shown in Figures 3.5 and 3.6 respectively.
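The difference between the two proposal schemes can be made concrete with a short sketch. It draws from a Normal density restricted to an interval by inverse-CDF sampling, which is used here only as a simple stand-in and is not claimed to be the scheme of Robert [1997]. The bounds `lo` and `hi`, the scale and the function names are hypothetical. Since the normalising mass of a Truncated Normal depends on where it is centred, the proposal is not symmetric, and the acceptance ratio needs the correction term computed by `hastings_correction`.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard Normal, used for its cdf and inverse cdf

def sample_truncnorm(mean, scale, lo, hi, rng):
    """Draw from Normal(mean, scale^2) restricted to [lo, hi] by inverse-CDF."""
    p_lo = STD.cdf((lo - mean) / scale)
    p_hi = STD.cdf((hi - mean) / scale)
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf's argument strictly in (0, 1)
    x = mean + scale * STD.inv_cdf(u)
    return min(max(x, lo), hi)            # guard against round-off at the bounds

def log_truncnorm_pdf(x, mean, scale, lo, hi):
    """Log-density of the Truncated Normal at x (assumes lo <= x <= hi)."""
    mass = STD.cdf((hi - mean) / scale) - STD.cdf((lo - mean) / scale)
    z = (x - mean) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2 * math.pi)) - math.log(mass)

def hastings_correction(current, proposed, scale, lo, hi):
    """log q(current | proposed) - log q(proposed | current)."""
    return (log_truncnorm_pdf(current, proposed, scale, lo, hi)
            - log_truncnorm_pdf(proposed, current, scale, lo, hi))

rng = random.Random(1)
lo, hi = 0.0, 2.0   # hypothetical bounds, e.g. positivity below and a neighbour above
draws = [sample_truncnorm(0.1, 0.5, lo, hi, rng) for _ in range(1000)]
corr = hastings_correction(0.1, 1.0, 0.5, lo, hi)
```

Every draw respects the bounds by construction, so no proposal is wasted on a constraint violation; the price is the non-zero correction `corr` that is added to the log acceptance ratio.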

Figure 3.7 shows the modes of the ρ and f parameters, along with their accompanying 95% HPD credible regions, learnt using D_GC (modal values in black triangles), over-plotted on the results learnt using D_PNe (modes in red circles). I note the similarity in the ρ_i parameters (Figure 3.7, left plot). The f_j parameters (Figure 3.7, right plot) exhibit similar behaviour as well, with slight differences in the sixth and ninth energy bins, which can be accounted for by the slight differences in the HPD credible regions of the learnt ρ_i; i = 1, . . . , N_R; j = 2, . . . , N_E.

Figure 3.5: Traces of the ρ₁, . . . , ρ₉ parameters from a PNe run. The rest of the ρ parameters also exhibit clear convergence.

Figure 3.6: Traces of the f₂, . . . , f₉ parameters from a PNe run.

Some of the individual traces in Figure 3.6 do not show clear convergence. I discuss the cause of this in detail in Section 3.3.3, but indicate here why this should not be surprising. I learn the ρ₁, . . . , ρ_N_R parameters with uncertainty, and this causes uncertainty in the computation of the gravitational potential, which, in the discretised version of the Poisson equation, is related to the ρ₁, . . . , ρ_N_R parameters as stated in Definition 9. Uncertainty in the gravitational potential Φ(R) (or in its discretised version) in turn causes uncertainty in the computation of the energy e (per unit mass), which is given by the sum of the kinetic energy per unit mass (‖V‖²/2) and the gravitational potential. Thus, uncertainty in the computation of the potential induces uncertainty in the domain variable, energy e, of the sought state space pdf f(e), which translates to uncertainty in the identification of the j-th energy-bin within which I seek the j-th component of the discretised version f = (f₁, . . . , f_N_E)^T of f(e). It is this very mis-identification of the j-th energy-bin that causes some sampled values of f_j′ in some iterations to be mis-identified as the pdf parameter f_j, where j′ ≠ j. This is true ∀ j = 1, . . . , N_E.
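The mechanism by which an uncertain potential shifts a data point between energy-bins can be sketched numerically. All numbers below are hypothetical and dimensionless, the bin edges and their ordering are arbitrary, and the two potential values merely stand for the potentials implied by two different sampled ρ vectors; none of this reproduces the binning actually used in this chapter.

```python
import numpy as np

def energy(speed, potential):
    """Energy per unit mass: kinetic term plus gravitational potential."""
    return 0.5 * speed**2 + potential

def energy_bin(e, edges):
    """1-based index j of the bin [edges[j-1], edges[j]) that contains e."""
    return int(np.searchsorted(edges, e, side="right"))

# Hypothetical values chosen only for illustration.
edges = np.linspace(-1.0, 0.0, 10)   # 9 equal-width energy bins on [-1, 0]
speed = 0.4
phi_a = -0.60                        # potential implied by one sampled rho vector
phi_b = -0.66                        # potential implied by another sampled rho vector

j_a = energy_bin(energy(speed, phi_a), edges)
j_b = energy_bin(energy(speed, phi_b), edges)
```

With these numbers the same phase space point falls in bin 5 under the first potential and in bin 4 under the second, so a pdf value sampled for "bin j" does not refer to the same physical energy range in both iterations.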

To remedy this situation, I need to identify which energy-bin each sampled pdf parameter really belongs to in each iteration, given that the j-th energy-bin in a given iteration may encompass different physical energy values than in another iteration. The pdf parameters are then reallocated to the correct energy-bin in every iteration. As stated above, this reallocation is discussed in detail in Section 3.3.3.

Indeed, it is the uncertainty in the learning of the ρ₁, . . . , ρ_N_R parameters that induces uncertainty in identifying which energy-bin, marked by a given index j ∈ {1, . . . , N_E}, a given energy value corresponds to. It follows that if I minimise the uncertainty in my learning of the ρ₁, . . . , ρ_N_R parameters, this problem of mistaken placing of a learnt pdf parameter in the energy-bins will be mitigated. One way to accomplish this minimisation in the uncertainties of the ρ₁, . . . , ρ_N_R parameters is to tighten the priors on these parameters.

Figure 3.7: ρ₁, . . . , ρ₂₈ (left) and f₂, . . . , f₉ (right) parameters learnt along with the respective 95% HPD credible regions, using the data on the GC sample in NGC4494 (parameter modes shown in black triangles). Corresponding parameters learnt using the PNe data are over-plotted (modes shown in red circles).

Thus, for example, if the prior density on ρ_i is tightened around the modal value of this parameter that is learnt in a run such as the chains described in the current sub-section, then the resulting pdf parameter traces will be much better converged, and the need for reallocation of the pdf parameters to the correctly identified energy-bin will diminish. At the same time, tightening the prior on ρ_i will imply that the 95% HPD interval learnt on this parameter will diminish in width. We will see this happening when we undertake such prior-tightening to combat the uncertainty in learning both the state space pdf and its domain variable, namely, energy.
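The link between a tighter prior and a narrower 95% HPD interval can be seen in a toy setting. The sketch below is an assumption-laden illustration and not the model of this chapter: it uses a conjugate Normal-Normal update for a single mean with known noise, and estimates the HPD interval from posterior samples as the shortest interval that contains 95% of them (adequate for a unimodal posterior). The function names and all numerical values are hypothetical.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))             # number of points the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]    # width of every window of k sorted points
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def normal_posterior(data_mean, n, sigma, prior_mean, prior_sd):
    """Conjugate Normal-Normal update for a mean with known noise sd `sigma`."""
    precision = 1.0 / prior_sd**2 + n / sigma**2
    mean = (prior_mean / prior_sd**2 + n * data_mean / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)

rng = np.random.default_rng(2)
widths = {}
for label, prior_sd in [("wide", 5.0), ("tight", 0.1)]:
    m, s = normal_posterior(data_mean=1.0, n=10, sigma=1.0,
                            prior_mean=1.0, prior_sd=prior_sd)
    lo, hi = hpd_interval(rng.normal(m, s, size=20000))
    widths[label] = hi - lo                # width of the estimated 95% HPD interval
```

In this toy case the interval under the tight prior is markedly narrower than under the wide one, which mirrors the qualitative behaviour expected of the ρ_i intervals once their priors are tightened.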

The need for reallocation stems from the fact that all parameters are learnt with uncertainty, as indeed is desired in Bayesian statistics; thus the pdf parameters are learnt with uncertainty, and the domain variable of the pdf is also uncertain, owing to the uncertainty in the learning of the ρ_i parameters. This is not an artefact of the choice of proposal or prior in the inference, but follows directly from the nesting of the sought gravitational mass density function ρ(‖X‖) inside the domain of the pdf f(e). Such an embedding of ρ(·) into the support of the state space pdf is in fact what makes the gravitational mass density learnable in this information-sparse context in the first place. Indeed, it may be argued that it is the gravitational mass density parameters that we were interested in to begin with, and that we should confine our attention to the inference of only these parameters. However, my interests are in exploring the inference as a whole. Hence we undertake the detailed study in Section 3.3.

In document Bayesian learning in the absence of training data, by embedding model parameters into support of the likelihood: applications to astrophysics (Page 74-81)