The Present-Day Mass Function - The Grand Challenge

5 The Grand Challenge

5.3 The Present-Day Mass Function

The IMF describes the distribution of initial true masses for a population of stars. There is no consensus about the exact shape of this function (the main alternatives are due to Chabrier (2003), Kroupa (2002), Miller and Scalo (1979) and Salpeter (1955)). In our case (where we neglect a key element of the stellar evolution past the stellar regime), we will not be inferring the IMF but the PDMF. Nevertheless, for the purpose of illustrating the method, we will use the broken power law proposed by Kroupa (2002), which is often assumed by the astronomical community (although it should be borne in mind that this is an analytical prescription for the IMF and not for the PDMF). We establish the hypothesis that the PDMF can be expressed as

$$\xi(m; \theta) = c_j\, m^{-\theta_j}, \qquad M_j < m \le M_{j+1}, \quad j = 1, 2, 3, \tag{5.4}$$

where $\theta = (\theta_1, \theta_2, \theta_3)$ is the parameter to infer, and the $M_j$ are the mass limits that define the support of the broken power law. These will be held fixed in our model for the sake of simplicity, but could be inferred from the data as well. Finally, the $c_j$ are computed for each $\theta$ such that the function $\xi(m; \theta)$ is continuous at the boundaries $M_j$ and represents a proper PDF. This involves solving the simple system of equations:

$$c_j\, M_{j+1}^{-\theta_j} = c_{j+1}\, M_{j+1}^{-\theta_{j+1}}, \quad j = 1, 2, \qquad \sum_{j=1}^{3} \int_{M_j}^{M_{j+1}} c_j\, m^{-\theta_j}\, dm = 1.$$
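As a minimal sketch of how the coefficients $c_j$ could be obtained in practice, the continuity conditions fix the ratios between consecutive coefficients and the normalisation condition fixes the overall scale. The function names and the mass limits used below (`LIMITS`) are illustrative assumptions; the text does not state the values of $M_j$.

```python
import numpy as np

# Hypothetical mass limits M_1..M_4 (solar masses); not taken from the text.
LIMITS = (0.1, 0.5, 1.0, 100.0)

def power_law_integral(lo, hi, theta):
    """Closed-form integral of m**(-theta) over [lo, hi] (handles theta == 1)."""
    if np.isclose(theta, 1.0):
        return np.log(hi / lo)
    return (hi ** (1.0 - theta) - lo ** (1.0 - theta)) / (1.0 - theta)

def pdmf_coefficients(theta, limits=LIMITS):
    """Coefficients c_j that make the broken power law continuous and normalised."""
    theta = np.asarray(theta, dtype=float)
    limits = np.asarray(limits, dtype=float)
    n = len(theta)
    c = np.ones(n)  # provisional scale: c_1 = 1
    for j in range(n - 1):
        # Continuity at M_{j+1}: c_j M^(-theta_j) = c_{j+1} M^(-theta_{j+1})
        c[j + 1] = c[j] * limits[j + 1] ** (theta[j + 1] - theta[j])
    total = sum(c[j] * power_law_integral(limits[j], limits[j + 1], theta[j])
                for j in range(n))
    return c / total  # rescale so the PDF integrates to one

def pdmf_pdf(m, theta, limits=LIMITS):
    """Evaluate xi(m; theta); zero outside the support [M_1, M_4]."""
    m = np.asarray(m, dtype=float)
    theta = np.asarray(theta, dtype=float)
    limits = np.asarray(limits, dtype=float)
    c = pdmf_coefficients(theta, limits)
    # Index of the segment (M_j, M_{j+1}] that contains each mass.
    j = np.clip(np.searchsorted(limits, m, side="left") - 1, 0, len(theta) - 1)
    inside = (m >= limits[0]) & (m <= limits[-1])
    return np.where(inside, c[j] * m ** (-theta[j]), 0.0)
```

Because each segment integral has a closed form, no numerical solver is needed: the system reduces to a recursion for the ratios followed by a single division.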

In order to explain what hierarchical or multi-level models are, we will start with a very simple model depicted in Figure 5.4. It shows how the observed masses, $\hat{m}_i$, are related to the true masses, $m_i$, for each star.

In Figure 5.4, circles represent random variables and squares denote quantities held fixed in the analysis. Arrows denote conditional probabilistic dependence between variables, while grey nodes represent random variables that are inferred from Gaia data. The number in square brackets in each node denotes the dimensionality of the parameter. The $N$ in the corner indicates the number of equivalent variables in the plate (in our case, one for each mass in the data set).

[Figure panels: MCMC chains for the PDMF parameters $\theta_1$, $\theta_2$ and $\theta_3$. (a) Chain in 1300 iterations without any burn-in. (b) Chain in 1000 iterations after a burn-in of 300 iterations.]

[Figure 5.4 diagram: nodes $\eta$ [2], $\theta$ [3], $m$ [1], $\hat{m}$ [1] and $\hat{\sigma}$ [1]; plate of size $N$.]

Figure 5.4: Hierarchical model using plate notation. The circles represent random variables and the squares refer to fixed quantities. Arrows denote the existence of a statistical dependence. Grey nodes represent measured random variables; for instance, $\hat{m}_i$ is the observed mass of the $i$-th star. The big rectangle (plate) represents the repetition of the variables inside. The value $N$ in the corner is the number of these repetitions and matches the number of sources (stars) with true ($m_i$) and observed ($\hat{m}_i$) masses. $\eta$ is the hyperparameter that governs the prior probability distribution of the slopes, $p(\theta|\eta)$. Finally, the number inside the brackets represents the dimension of each variable.

Let us begin the description of the plate notation with the two right-most arrows. Under the hypothesis that the measurement uncertainties are Gaussian, this probabilistic relationship would be

$$p_i(\hat{m}_i \mid \mathcal{D}) = \log\mathcal{N}(\hat{m}_i;\, m_i, \hat{\sigma}_i), \tag{5.5}$$

where $\hat{m}_i$ and $\hat{\sigma}_i$ are the mean and the standard deviation of the distribution given in Equation 5.1. Both $\hat{m}_i$ and $\hat{\sigma}_i$ are estimates from the posterior distribution of masses delivered by the FLAME team. In our case, and for the sake of simplicity in the explanation, we consider the sample mean and the sample standard deviation as estimators. Of course, other functional relationships can be used to represent the uncertainty in the measurements. This is especially true for masses, because they are obtained through convoluted procedures whereby uncertainties can be far from Gaussian. This is the case for evolved stars, which can have very different masses and ages and yet be characterised by the same observed properties (in astronomical terminology, loci in the Hertzsprung-Russell diagram where several evolutionary tracks cross). This results in multi-modal uncertainties if correctly inferred (via, for example, MCMC techniques).

In the plate example, we assume that the PDMF is modelled as a function of three parameters, $\xi(m_i; \theta_1, \theta_2, \theta_3)$, that yields the probability density of generating a star of mass $m_i$. The number 3 between square brackets inside the node denotes exactly this dimension of three in the parameters. Hence, the middle arrow and its two connected nodes represent the PDMF.

In our HBM, the vector of model parameters $\theta$ is itself treated as a random variable, the distribution of which we aim to infer. By inferring the probability distribution of $\theta$ given the data $\mathcal{D}$, we infer the PDMF. As $\theta$ is a random variable in our model, we need to specify a PDF for it. Since $\theta$ does not depend on any other model parameter or random variable ($\eta$ is held fixed), the PDF for $\theta$ is its a priori or prior distribution. The arrow that joins the fixed value of $\eta$ with the parameter $\theta$ thus represents the conditional probabilistic relationship that encodes the prior distribution $p(\theta|\eta)$ of possible slopes $(\theta_1, \theta_2, \theta_3)$ of the PDMF. $\eta$ is the hyperparameter that governs this prior distribution. It is specified based on a priori knowledge of the values of $\theta$ that are reasonably consistent with the estimated stellar masses in general. For the purpose of the example below, we consider $\eta = (0, 5)$, which means that each $\theta_j$ is drawn from a uniform distribution between 0 and 5.

Summarizing, we have a probabilistic latent model to explain the distribution of the observed data, and a generative model for the observed masses $\hat{m}_i$:

1. draw three slopes according to a uniform distribution with PDF $p(\theta|\eta)$;
2. for each star, draw a true mass according to the PDF $\xi(m_i; \theta)$;
3. for each star, draw an observed mass according to the PDF $\log\mathcal{N}(\hat{m}_i; m_i, \hat{\sigma}_i)$.
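The three generative steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for this work: the mass limits in `LIMITS` are hypothetical, true masses are drawn by inverse-CDF sampling within a randomly chosen segment, and the log-normal is parameterised so that its median equals the true mass with a dispersion `sigma` in log space.

```python
import numpy as np

# Hypothetical mass limits M_1..M_4 (solar masses); not taken from the text.
LIMITS = np.array([0.1, 0.5, 1.0, 100.0])

def segment_integrals(theta, limits):
    """Integral of m**(-theta_j) over each segment [M_j, M_{j+1}]."""
    lo, hi = limits[:-1], limits[1:]
    a = 1.0 - theta
    near_one = np.isclose(a, 0.0)          # theta_j == 1 needs the log form
    safe_a = np.where(near_one, 1.0, a)
    return np.where(near_one, np.log(hi / lo),
                    (hi ** safe_a - lo ** safe_a) / safe_a)

def segment_weights(theta, limits):
    """Probability mass of each segment of the continuous broken power law."""
    c = np.ones(len(theta))
    for j in range(len(theta) - 1):        # continuity at M_{j+1}
        c[j + 1] = c[j] * limits[j + 1] ** (theta[j + 1] - theta[j])
    w = c * segment_integrals(theta, limits)
    return w / w.sum()

def draw_true_masses(theta, limits, size, rng):
    """Inverse-CDF sampling: pick a segment, then invert its truncated power law."""
    seg = rng.choice(len(theta), size=size, p=segment_weights(theta, limits))
    lo, hi, a = limits[seg], limits[seg + 1], 1.0 - theta[seg]
    u = rng.uniform(size=size)
    near_one = np.isclose(a, 0.0)
    safe_a = np.where(near_one, 1.0, a)
    power = (lo ** safe_a + u * (hi ** safe_a - lo ** safe_a)) ** (1.0 / safe_a)
    return np.where(near_one, lo * (hi / lo) ** u, power)

def simulate(n_stars, eta=(0.0, 5.0), sigma=0.1, limits=LIMITS, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: draw the three slopes from the uniform prior p(theta | eta).
    theta = rng.uniform(eta[0], eta[1], size=len(limits) - 1)
    # Step 2: draw one true mass per star from xi(m; theta).
    m_true = draw_true_masses(theta, limits, n_stars, rng)
    # Step 3: draw one observed mass per star from a log-normal around m_true.
    m_obs = rng.lognormal(mean=np.log(m_true), sigma=sigma)
    return theta, m_true, m_obs
```

Calling `simulate(10_000)` returns one set of slopes together with the true and observed masses they generate; fixing `theta` instead of drawing it would give a test set with known slopes.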

The main goal of astronomers is to move in the inverse direction: from observations to model parameters. In our example, from the masses and their uncertainties, $\{\hat{m}_i, \hat{\sigma}_i\}_{i=1}^{N}$, to the parameter $\theta$. This is a classical problem in science: inferring model parameters from observed data. That is exactly what Bayesian inference provides: the posterior probability is nothing but the probability distribution of the model parameters given the observed data. We obtain the posterior, up to a normalisation constant, by multiplying the so-called likelihood (i.e. $p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta, \eta)$, the generative model we have just exemplified) by the prior $p(\theta|\eta)$.

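Assuming a sample of independent observations $\mathcal{M}$, the posterior distribution can be written as

$$\begin{aligned}
p(\theta \mid \mathcal{D}, \eta) &\propto p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta, \eta)\, p(\theta \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \eta) \\
&= p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta)\, p(\theta \mid \eta) \\
&= \prod_{i=1}^{N} p(\hat{m}_i \mid \hat{\sigma}_i, \theta)\; p(\theta \mid \eta),
\end{aligned}$$

and, taking logarithms,

$$\log p(\theta \mid \mathcal{D}, \eta) = \sum_{i=1}^{N} \log p(\hat{m}_i \mid \hat{\sigma}_i, \theta) + \log p(\theta \mid \eta) - C, \tag{5.6}$$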

where $C = \log p(\mathcal{D})$ is a constant (the log-evidence) that can, for our purposes, be ignored (we will only need to compute it explicitly in order to compare and select amongst alternative PDMF models such as Chabrier (2003), Kroupa (2002), etc.).

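Let us now focus on the likelihood $p(\hat{m}_i \mid \hat{\sigma}_i, \theta)$. We only have observed masses available, not the true ones, so every true (unknown) mass $m_i$ will also be a model parameter. This is a severe drawback in the Gaia context of billions of sources. We avoid this problem by marginalising the likelihood over the true mass:

$$\begin{aligned}
p(\hat{m}_i \mid \hat{\sigma}_i, \theta) &= \int p(\hat{m}_i, m \mid \hat{\sigma}_i, \theta)\, dm \\
&= \int p(\hat{m}_i \mid m, \hat{\sigma}_i, \theta)\, p(m \mid \hat{\sigma}_i, \theta)\, dm \\
&= \int p(\hat{m}_i \mid m, \hat{\sigma}_i)\, p(m \mid \theta)\, dm.
\end{aligned}$$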

In order to obtain the posterior probability in Equation 5.6, we must first evaluate the integral above for every observed mass. We approximate the integral by using the trapezoidal rule.
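The following sketch shows one way the marginalised likelihood and the log-posterior of Equation 5.6 (up to the constant $C$) could be evaluated with the trapezoidal rule. Several choices here are assumptions made for illustration only: the mass limits in `LIMITS`, the logarithmically spaced integration grid and its size `n_grid`, and the log-normal parameterisation (median at the true mass, $\hat{\sigma}_i$ taken as the dispersion in log space).

```python
import numpy as np

# Hypothetical mass limits M_1..M_4 (solar masses); not taken from the text.
LIMITS = np.array([0.1, 0.5, 1.0, 100.0])

def pdmf_pdf(m, theta, limits=LIMITS):
    """Continuous, normalised broken power law xi(m; theta) for m in the support."""
    theta = np.asarray(theta, dtype=float)
    c = np.ones(len(theta))
    for j in range(len(theta) - 1):          # continuity at M_{j+1}
        c[j + 1] = c[j] * limits[j + 1] ** (theta[j + 1] - theta[j])
    norm = 0.0
    for j, th in enumerate(theta):           # closed-form segment integrals
        lo, hi = limits[j], limits[j + 1]
        norm += c[j] * (np.log(hi / lo) if np.isclose(th, 1.0)
                        else (hi ** (1 - th) - lo ** (1 - th)) / (1 - th))
    j = np.clip(np.searchsorted(limits, m, side="left") - 1, 0, len(theta) - 1)
    return c[j] / norm * m ** (-theta[j])

def lognormal_pdf(x, median, sigma):
    """Log-normal density of x with the given median and log-space dispersion."""
    z = (np.log(x) - np.log(median)) / sigma
    return np.exp(-0.5 * z ** 2) / (x * sigma * np.sqrt(2.0 * np.pi))

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def log_posterior(theta, m_hat, sigma_hat, eta=(0.0, 5.0), limits=LIMITS, n_grid=2000):
    """log p(theta | D, eta) up to the additive constant C."""
    theta = np.asarray(theta, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    if np.any(theta < eta[0]) or np.any(theta > eta[1]):
        return -np.inf                        # outside the uniform prior
    grid = np.geomspace(limits[0], limits[-1], n_grid)   # true-mass grid
    prior_m = pdmf_pdf(grid, theta, limits)               # p(m | theta)
    # p(m_hat_i | m, sigma_hat_i) on an (N, n_grid) array.
    like = lognormal_pdf(m_hat[:, None], grid[None, :], sigma_hat[:, None])
    marginal = trapezoid(like * prior_m[None, :], grid)   # one integral per star
    log_prior = -len(theta) * np.log(eta[1] - eta[0])
    with np.errstate(divide="ignore"):
        return float(np.sum(np.log(marginal)) + log_prior)
```

A function with this signature can be passed to an MCMC sampler such as emcee through its `log_prob_fn` argument, with `m_hat` and `sigma_hat` supplied as extra arguments. The intermediate array has $N \times$ `n_grid` elements, so for large samples the stars would need to be processed in batches.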

In order to test the hierarchical model with a small simulated data set, 10,000 masses were drawn from a PDMF PDF with $\theta = (1.3, 2.3, 2.3)$. For each true mass obtained, we drew 50 samples from a log-normal distribution that reproduces the expected output from the FLAME Work Package, at least during the first cycles of data processing. All the log-normal distributions considered (one per star) were centred at the true mass and had a standard deviation equal to 0.1.

[Figure 5.5 corner plot. Panel titles: θ1 = 1.29 +0.01/−0.01, θ2 = 2.28 +0.05/−0.05, θ3 = 2.33 +0.09/−0.09.]

Figure 5.5: Samples drawn from the posterior distribution in Equation 5.6 by the emcee algorithm (Foreman-Mackey et al., 2013). Blue lines represent the true values $\theta = (1.3, 2.3, 2.3)$. Dashed lines represent the quantiles 0.16, 0.5 and 0.84 for each $\theta_j$. The title above each 1-D histogram shows the 0.5 quantile with the upper and lower errors supplied by the quantiles. The 2-D plots show the contour lines for levels 0.11, 0.39, 0.67 and 0.86. See Foreman-Mackey (2016) for more information.

[Figure 5.6 plot. x-axis: log(m), with ticks at log(m1), log(m2), log(m3) and log(m4); y-axis: density; legend: True PDMF PDF, Posterior mean PDMF PDF.]

Figure 5.6: The true and estimated PDMF PDFs are shown in green and red, respectively. The red ribbon represents the 3σ confidence band for the estimated PDMF PDF. Ticks mark the masses that delimit the intervals of the PDMF PDF support. Note that the figure shows the PDMF PDF with a change of variable to log(m) instead of m, in order to improve the visualization.

Figures 5.5 and 5.6 show the posterior samples obtained by applying the emcee algorithm (Foreman-Mackey et al., 2013) to the hierarchical model described above. The dashed lines in Figure 5.5 represent the quantiles 0.16, 0.50 and 0.84 of the posterior samples. These quantiles result in the following estimates: $\hat{\theta}_1 = 1.29^{+0.01}_{-0.01}$, $\hat{\theta}_2 = 2.28^{+0.05}_{-0.05}$ and $\hat{\theta}_3 = 2.33^{+0.09}_{-0.09}$. The bulk of the marginal posterior distribution for $\theta_1$ seems to be biased with respect to the true value, but this displacement is only five thousandths from the maximum a posteriori value, which is adequate for the purposes of this test example. Figure 5.6 compares the true PDMF PDF (green) with the estimated posterior mean PDMF PDF (red), shown with a 3σ confidence band. Note that the figure shows the PDMF PDF as a function of log(m) instead of m, for the sake of clarity.
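Estimates of the form "median with upper and lower errors" follow directly from the 0.16, 0.50 and 0.84 quantiles of the chain once the burn-in has been discarded. The sketch below illustrates the computation; the function name `summarise_chain` is hypothetical and the chain is a synthetic stand-in generated from arbitrary Gaussians, not the samples behind Figure 5.5.

```python
import numpy as np

def summarise_chain(chain, burn_in=0, quantiles=(0.16, 0.5, 0.84)):
    """Median and upper/lower errors per parameter.

    chain: array of shape (n_steps, n_params); the first burn_in steps are dropped.
    """
    kept = np.asarray(chain, dtype=float)[burn_in:]
    lo, med, hi = np.quantile(kept, quantiles, axis=0)
    return med, hi - med, med - lo

# Synthetic stand-in for a sampler's output (arbitrary values, for illustration only).
rng = np.random.default_rng(0)
fake_chain = rng.normal(loc=[1.0, 2.0, 3.0], scale=[0.1, 0.2, 0.3], size=(1300, 3))

median, upper, lower = summarise_chain(fake_chain, burn_in=300)
for j in range(3):
    print(f"theta_{j + 1} = {median[j]:.2f} +{upper[j]:.2f} -{lower[j]:.2f}")
```

For a Gaussian marginal posterior the two errors coincide with one standard deviation, which is why the 0.16 and 0.84 quantiles are the conventional choice.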

5.4 The Present-Day Age Distribution

In document Architecture, techniques and models for enabling Data Science in the Gaia Mission Archive (Page 130-139)