Bayesian Markov Chain Monte Carlo - The evolution of galaxy structures from a Bayesian perspect

2.2.2 The sky background

The sky background is a flat plane along the x- and y- image dimensions.

PHI

has the capability of including the sky background value in the fitting process. Méndez-Abreu et al. (2008a) shown that the most significant contributors to parameter miscalculation where systematic errors in the sky background and PSF FWHM. Improper sky subtraction was found to affect the the disc surface brightness profile in a two-component model. The circumvent these systematic errors Méndez-Abreu et al. (2008a) advised a careful sky subtraction pre-fit.

2.2.3 Convolution with the PSF

A careful analysis of the point spread function (PSF) is needed to perform robust photometric decompositions. This has been shown in many previous studies. Méndez-Abreu et al. (2008) found that errors of 2% in the PSF FWHM can produce variations of up to 10% in theRe and nbulge

parameters. (Gadotti, 2009) showed that to reliably retrieve the structural properties of bulges the effective radius, Re has to larger than 80% of the PSF full width half maximum. Our algorithm

uses a Fast Fourier Transform (FFT) to convolve the PSF with the model image. There are alterna- tive convolution techniques, however, we find the FFT convolution is fastest and thus used in the remainder of this paper.

2.3 Bayesian Markov Chain Monte Carlo

Bayesian statistics puts probabilities at the forefront. The evidence about the true state of the world is expressed in terms of Bayesian probabilities which can be thought of as a degree of knowledge. Bayesian methods use the combination of expert scientific information of a model characterised by its parameter vectorθalong with data,D, to obtain probabilities; this is described in Bayes Theorem,

p(θ_|D) = R L(D|θ)p(θ)

L(D|θ)p(θ)dθ, (2.7)

wherep(θ|D)is the posterior probability distribution of having a set of parametersθgiven the data

D; L(D|θ)is the likelihood function or the probability of the data given θ; and p(θ) is the prior probability of the parameter vector θ. We are free to assign any form of p(θ) that we feel best represents our understanding of the model. The prior distribution expresses our knowledge and prejudices about the relative likelihoods of the models or of specific parameter values of θ. The denominator of Equation 2.7 is just a normalisation factor to ensure the probability is of unity when

summed over all possible models, it is sometimes referred to as the probability of the data.

Bayesian methods are restricted by the need to perform integrations analytically. For a high- dimensional parameter space such as the models used to describe the surface brightness distribution of galaxies, the characterisation of the posterior distribution would become infinitely difficult. In recent decades, approximate Bayesian analysis has been performed by using numerical integration or sampling-based estimation methods,i.e, Monte Carlo methods. By generating repeated states by a first-order Markovian process, a Markov-chain Monte Carlo (MCMC) will asymptotically converge to the posterior distribution. Such sampling provides probabilities relating toθ.

The goal of a MCMC in Bayesian inference is to maximize the unnormalized joint posterior distribution and collect samples of the target distributions, which are marginal posterior distributions, later to be used for inference. In the simple case where we might have a model with two or three parameters, one can analytically calculate the posterior probability by forming a grid of parameter values to explore all the possible values. This is a common method in Astronomy, however, for a high dimensional parameter space such as fitting a Sérsic and an exponential surface brightness profile to a spiral galaxy image ( 9 free parameters), this would take a significant amount of time. We can use MCMC to explore this high-dimensional parameter space more efficiently than ever before due to improvements in the computational method.

2.3.1 Likelihood & priors

If we assume that a large number of photons are being detected in the CDD then the measurement noise can be considered to be Gaussian with a mean zero and covariance matrixC. It then follows that the likelihood can also be written as the Gaussian probability density function (PDF) for the difference between measurements and observations (Tamminen, 1999):

p(D|θ) = 1 (2π)N/2₍_det_C₎1/2× exp₋1 2(D−f(x;θ)) T_C−1₍_D −f(x;θ)) (2.8)

where f(x;θ)is the model function which consists of known quantities x (i.e constants, control variables etc.) and the unknown parametersθ andN is the total number of pixels. If we assume that the measurement errors (ε_i = d_i₋ f(x_i;θ)) are normally distributed and independent such thatε_i _∼N(0,σ2), the likelihood for a certain measurement gets the form

2.3. Bayesian Markov Chain Monte Carlo p(d_i|θ) = 1 (2πσ2_i)1/2exp − 1 2σ2_i(di−f(xi;θ)) 2 . (2.9)

Since the error terms are considered to be independent, the combined likelihood of all the measurements can be written as a product

p(D_|θ) =L(D_|θ) = N Y i=1 1 (2πσ2_i)1/2exp − 1 2χ 2 . (2.10)

whereχ2 takes the form

χ2₌ N X i=1 (d_i₋ f(x_i;θ))2 σ2 i . (2.11)

For practical reasons the likelihood is calculated in its logarithmic form so as to increase the numerical accuracy. The final form of the likelihood used for the remainder of this paper is

lnL(D_|θ) =₋χ 2 2 − N X i=1 lnσ_i₋ N 2ln 2π. (2.12)

As mentioned above, the prior distribution describes our a priori (previous) knowledge about the model, or more specificallyθ. Where we know some values to be more probable than others, a carefully selected prior distribution can emphasise this knowledge. However, biases may arise if the prior distribution isinformative e.g.a Gaussian PDF with a narrow width. For the purpose of galaxy image analysis, where the parameter space is known to have many local minima, it is inadvisable to use such an informative prior.

If we do not have any prior knowledge or if we do not want to impart a bias into the fitting process then anuninformativeprior can be used. The best example of an uninformative prior distribution is a uniform distribution: p(θ) =U(a,b) =      1 b−a forθ ∈[a,b] 0 otherwise . (2.13)

whereaand bare parameter limits set by the user. Thus it can be considered that p(θ)is constant within the limits a and b, and consequently returning to a maximum likelihood estimation in a practical sense.

In the case of photometric decompositions of galaxy images there are some physical motivated limits on the parameters that we should include in their priors. For example, negative or very large

Table 2.1: Input parameters and priors used in the MCMC code.

Individual parameters

Parameter Symbol Prior Range

Effective intensity I_e Uniform in log(I_e) log(I_e/counts)_∈U[0.01, 10] Effective radius Re Uniform in log(Re) log(Re/pi x els)∈U[0.01, 10]

Sérsic index n Uniform inn n∈U[0.4, 8]

Central intensity I₀ Uniform in log(I₀) log(I₀/counts)_∈U[0.01, 10] Scale length h Uniform in log(h) log(h/pi x els)∈U[0.01, 10] Axial ratio q Uniform inq q∈U[0.2, 1]

Position angle PA Uniform in PA PA/degrees_∈U[₋360, 360] Central coordinates x0,y0 Uniform inx0&y0 x0&y0∈U[0,Simage]

Combined parameters

Effective radius/scale length R_e/h Uniform inR_e/h R_e/h∈U[0.05, 1.678] Bulge-to-total ratio B/T Uniform inB/T B/T∈U[0.01, 1] Bulge-to-total ratio(<Re) B/T(<Re) Uniform inB/T(<Re) B/T(<Re)∈U[1,−]

# of crossing points N_x Delta funtion (see text)

values for the radius are unphysical. Physical values for the Sérsic index, n, are within the range 0.5<n<8 as larger values will produce non-physical concentrations of stellar light in the centres of galaxies. A clear cut-off point can be put in place to prevent non-physical parameter valuese.g

negative radii. Tables 2.1 list the prior functions used for each parameter of each component. Figures 2.2 and 2.3 present the effect of varying each of the different model parameters on the intensity profiles of two-component galaxies. We define a two-component system as a combination between an inner Sérsic with an outer exponential surface brightness profile. For fitting, we wish to consider all possible combinations where the inner regions are dominated by the inner (bulge) component and vice versa for the outer (disc) regions. While this sounds straightforward, for an inner Sérsic profile withn>1 and negatively sloped exponential disc profile, at some (large) radius the inner component will again dominate over the outer. The definition of where a galaxy’s outer edge is can be difficult to quantify. However, for photometric decompositions we work only with the current image we are fitting, thus, so long as the outer profile is dominant before the object itself vanishes into the background noise our canonical definition of the galaxy is met. In the case where the object is fit with a single component all the above priors relating to the structural parameters apply i.e. keeping all sizes and intensities positive along with the range in Sérsic index values and geometric properties.

The reversal of the bulge and disc in two-component fittings is a problem we want to avoid. This is where the initially assigned bulge component migrates to fit the disc and the disc to the bulge (Allen et al. 2006). Figure 2.2 and 2.3 show this in practice when theB/T is very low or when the

2.4. The MCMC engine

In document The evolution of galaxy structures from a Bayesian perspective (Page 62-66)