5 The Grand Challenge
5.3 The Present-Day Mass Function
The IMF describes the distribution of initial true masses for a population of stars. There is no consensus about the exact shape of this function; the main alternatives are due to Salpeter (1955), Miller and Scalo (1979), Kroupa (2002) and Chabrier (2003). In our case (where we neglect a key element of the stellar evolution past the stellar regime), we will not be inferring the IMF but the PDMF. Nevertheless, for the purpose of illustrating the method, we will use the broken power law proposed by Kroupa (2002), which is often assumed by the astronomical community, although it should be borne in mind that this is an analytical prescription for the IMF and not for the PDMF. We hypothesise that the PDMF can be expressed as
\[
\xi(m; \theta) = c_j\, m^{-\theta_j}, \qquad M_j < m \le M_{j+1}, \quad j = 1, 2, 3, \tag{5.4}
\]
where $\theta = (\theta_1, \theta_2, \theta_3)$ is the parameter to infer, and the $M_j$ are the mass limits that define the support of the broken power law. These will be held fixed in our model for the sake of simplicity, but could be inferred from the data as well. Finally, the $c_j$ are computed for each $\theta$ such that the function $\xi(m; \theta)$ is continuous at the boundaries $M_j$ and represents a proper PDF. This involves solving the simple system of equations:
\[
c_j\, M_{j+1}^{-\theta_j} = c_{j+1}\, M_{j+1}^{-\theta_{j+1}}, \quad j = 1, 2, \qquad \sum_{j=1}^{3} \int_{M_j}^{M_{j+1}} c_j\, m^{-\theta_j}\, dm = 1.
\]
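As a minimal sketch of how this system can be solved, the Python snippet below provisionally sets the first coefficient to one, propagates the continuity condition to the remaining coefficients, and then rescales all of them so that the total integral equals one. The function names and the mass limits `M_LIMITS = (0.1, 0.5, 1.0, 100.0)` are illustrative assumptions, not values taken from the text.

```python
import numpy as np


def power_integral(a, b, theta):
    """Integral of m**(-theta) between a and b (log form when theta == 1)."""
    if np.isclose(theta, 1.0):
        return np.log(b / a)
    return (b ** (1.0 - theta) - a ** (1.0 - theta)) / (1.0 - theta)


def pdmf_coefficients(theta, M):
    """Coefficients c_j that make the broken power law continuous and normalised."""
    theta = np.asarray(theta, dtype=float)
    M = np.asarray(M, dtype=float)
    c = np.ones(theta.size)
    # Continuity at M_{j+1}: c_{j+1} = c_j * M_{j+1}**(theta_{j+1} - theta_j)
    for j in range(theta.size - 1):
        c[j + 1] = c[j] * M[j + 1] ** (theta[j + 1] - theta[j])
    # Normalisation: rescale so that the piecewise integrals add up to one
    total = sum(c[j] * power_integral(M[j], M[j + 1], theta[j])
                for j in range(theta.size))
    return c / total


def pdmf_pdf(m, theta, M):
    """Evaluate xi(m; theta); zero outside the support (M_1, M_last]."""
    m = np.asarray(m, dtype=float)
    theta = np.asarray(theta, dtype=float)
    M = np.asarray(M, dtype=float)
    c = pdmf_coefficients(theta, M)
    # 0-based segment index j such that M[j] < m <= M[j + 1]
    j = np.clip(np.searchsorted(M, m, side="left") - 1, 0, theta.size - 1)
    inside = (m > M[0]) & (m <= M[-1])
    safe_m = np.where(inside, m, 1.0)  # avoid invalid powers outside the support
    return np.where(inside, c[j] * safe_m ** (-theta[j]), 0.0)


M_LIMITS = (0.1, 0.5, 1.0, 100.0)   # hypothetical mass limits (assumption)
THETA = (1.3, 2.3, 2.3)             # slopes used in the simulated example
coeffs = pdmf_coefficients(THETA, M_LIMITS)
```

Because the continuity conditions only fix the ratios between consecutive coefficients, a single global rescaling is enough to satisfy the normalisation condition.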
In order to explain what hierarchical or multi-level models are, we will start with a very simple model depicted in Figure 5.4. It shows how the observed masses, $\hat{m}_i$, are related to the true masses, $m_i$, for each star.
In Figure 5.4, circles represent random variables and squares denote quantities held fixed in the analysis. Arrows denote the existence of a conditional probabilistic dependence between variables, while grey nodes represent random variables that are inferred from Gaia data. The number in square brackets in each node denotes the dimensionality of the parameter. The $N$ in the corner indicates the number of equivalent variables in the plate (in our case, one for each mass in the data set).

[Figure panels: (a) Chain for PDMF parameter $\theta_1$ in 1300 iterations without any burn-in. (b) Chain for PDMF parameter $\theta_1$ in 1000 iterations after a burn-in of 300 iterations.]

[Figure panels: (a) Chain for PDMF parameter $\theta_2$ in 1300 iterations without any burn-in. (b) Chain for PDMF parameter $\theta_2$ in 1000 iterations after a burn-in of 300 iterations.]

[Figure panels: (a) Chain for PDMF parameter $\theta_3$ in 1300 iterations without any burn-in. (b) Chain for PDMF parameter $\theta_3$ in 1000 iterations after a burn-in of 300 iterations.]

[Figure 5.4 diagram: nodes $\eta$ [2], $\theta$ [3], $m_i$ [1], $\hat{m}_i$ [1] and $\hat{\sigma}_i$ [1]; plate of size $N$.]

Figure 5.4: Hierarchical model using plate notation. The circles represent random variables and the squares refer to fixed quantities. Arrows denote the existence of a statistical dependence. Grey nodes represent measured random variables; for instance, $\hat{m}_i$ is the observed mass of the $i$-th star. The big rectangle (plate) represents the repetition of the variables inside. The value $N$ in the corner is the number of these repetitions and matches the number of sources/stars with true ($m_i$) and observed ($\hat{m}_i$) masses. $\eta$ is the hyperparameter that governs the prior probability distribution of the slopes, $p(\theta|\eta)$. Finally, the number inside the brackets represents the dimension of each variable.
Let us begin the description of the plate notation with the two right-most arrows. Under the hypothesis that the measurement uncertainties are Gaussian, this probabilistic relationship would be
\[
p_i(\hat{m}_i \mid D) = \log\mathcal{N}(\hat{m}_i;\, m_i, \hat{\sigma}_i), \tag{5.5}
\]
where $\hat{m}_i$ and $\hat{\sigma}_i$ are the mean and the standard deviation of the distribution given in Equation 5.1. Both $\hat{m}_i$ and $\hat{\sigma}_i$ are estimates from the posterior distribution of masses delivered by the FLAME team. In our case, and for the sake of simplicity in the explanation, we consider the sample mean and the sample standard deviation as estimators. Of course, other functional relationships can be used to represent the uncertainty in the measurements. This is especially true for masses because they are obtained through convoluted procedures whereby uncertainties can be far from Gaussian. This is the case for evolved stars, which can have very different masses and ages and yet be characterised by the same observed properties (in astronomical terminology, loci in the Hertzsprung-Russell diagram where several evolutionary tracks cross). This results in multi-modal uncertainties if correctly inferred (via, for example, MCMC techniques).

In the plate example, we assume that the PDMF is modelled as a function of three parameters, $\xi(m_i; \theta_1, \theta_2, \theta_3)$, that yields the probability density of generating a star of mass $m_i$. The number 3 between square brackets inside the node denotes exactly this dimension of three in the parameters. Hence, the middle arrow and its two connected nodes represent the PDMF.
In our HBM, the vector of model parameters $\theta$ is itself treated as a random variable, the distribution of which we aim to infer. By inferring the probability distribution of $\theta$ given the data $D$, we infer the PDMF. Since $\theta$ is a random variable in our model, we need to specify a PDF for it. As $\theta$ does not depend on any other model parameter or random variable ($\eta$ is held fixed), the PDF for $\theta$ is its a priori or prior distribution. The arrow that joins the fixed value of $\eta$ with the parameter $\theta$ therefore represents the conditional probabilistic relationship that encodes the prior distribution $p(\theta|\eta)$ of possible slopes $(\theta_1, \theta_2, \theta_3)$ of the PDMF. The hyperparameter $\eta$ governs this prior and is specified based on a priori knowledge of the values of $\theta$ that are reasonably consistent with the estimated stellar masses in general. For the purpose of the example below, we consider $\eta = (0, 5)$, which means that each slope $\theta_j$ is drawn from a uniform distribution between 0 and 5.
Summarizing, we have a probabilistic latent model to explain the distribution of the observed data, and a generative model for the observed masses $\hat{m}_i$:
1. draw three slopes according to a uniform distribution with PDF $p(\theta|\eta)$.
2. for each star, draw a true mass according to the PDF $\xi(m_i; \theta)$.
3. for each star, draw an observed mass according to the PDF $\log\mathcal{N}(\hat{m}_i; m_i, \hat{\sigma}_i)$.
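The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the mass limits `M_LIMITS` are hypothetical, the true masses are drawn by inverse-CDF sampling (one possible choice among several), and the log-normal is parameterised so that its median is the true mass and `sigma_obs` is the standard deviation of the logarithm of the mass. None of these choices are specified in the text.

```python
import numpy as np


def sample_true_masses(theta, M, size, rng):
    """Draw true masses from the broken power law by inverse-CDF sampling."""
    theta = np.asarray(theta, dtype=float)
    M = np.asarray(M, dtype=float)
    e = 1.0 - theta                      # exponents of the antiderivative
    flat = np.isclose(e, 0.0)            # segments with theta == 1
    e_safe = np.where(flat, 1.0, e)
    # Unnormalised coefficients that keep the function continuous (c_1 = 1)
    c = np.ones(theta.size)
    for j in range(theta.size - 1):
        c[j + 1] = c[j] * M[j + 1] ** (theta[j + 1] - theta[j])
    lo, hi = M[:-1], M[1:]
    # Probability mass contained in each segment
    area = np.where(flat, np.log(hi / lo), (hi ** e_safe - lo ** e_safe) / e_safe)
    weights = c * area / np.sum(c * area)
    # Pick a segment for each star, then invert the CDF inside that segment
    seg = rng.choice(theta.size, size=size, p=weights)
    u = rng.uniform(size=size)
    a, b, es, fl = lo[seg], hi[seg], e_safe[seg], flat[seg]
    power_draw = (a ** es + u * (b ** es - a ** es)) ** (1.0 / es)
    log_draw = a * (b / a) ** u
    return np.where(fl, log_draw, power_draw)


def simulate(n_stars, eta, M, sigma_obs, rng):
    """Run the three generative steps once and return (theta, m_true, m_obs)."""
    theta = rng.uniform(eta[0], eta[1], size=3)            # step 1: slopes
    m_true = sample_true_masses(theta, M, n_stars, rng)    # step 2: true masses
    # Step 3: observed masses; median = m_true, log-space width = sigma_obs
    # (this parameterisation is an assumption)
    m_obs = rng.lognormal(mean=np.log(m_true), sigma=sigma_obs)
    return theta, m_true, m_obs


rng = np.random.default_rng(42)
M_LIMITS = (0.1, 0.5, 1.0, 100.0)   # hypothetical mass limits (assumption)
theta, m_true, m_obs = simulate(1000, (0.0, 5.0), M_LIMITS, 0.1, rng)
```

Inference then runs this process in reverse: only `m_obs` and its uncertainty are available, and the slopes have to be recovered from them.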
The main goal of astronomers is to move in the inverse direction: from observations to model parameters. In our example, this means going from the masses and their uncertainties, $\{\hat{m}_i, \hat{\sigma}_i\}_{i=1}^{N}$, to the parameter $\theta$. This is a classical problem in science: inferring model parameters from observed data. That is exactly what Bayesian inference provides: the posterior probability is nothing but the probability distribution of the model parameters given the observed data. We obtain the posterior, up to a normalisation constant, by multiplying the so-called likelihood $p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta, \eta)$ (i.e. the generative model we have just exemplified) by the prior $p(\theta|\eta)$.
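Assuming that the observations are independent, the posterior distribution can be written as
\[
\begin{aligned}
p(\theta \mid D, \eta) &\propto p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta, \eta)\; p(\theta \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \eta) \\
&= p(\{\hat{m}_i\}_{i=1}^{N} \mid \{\hat{\sigma}_i\}_{i=1}^{N}, \theta)\; p(\theta \mid \eta) \\
&= \prod_{i=1}^{N} p(\hat{m}_i \mid \hat{\sigma}_i, \theta)\; p(\theta \mid \eta),
\end{aligned}
\]
and, taking logarithms,
\[
\log p(\theta \mid D, \eta) = \sum_{i=1}^{N} \log p(\hat{m}_i \mid \hat{\sigma}_i, \theta) + \log p(\theta \mid \eta) - C, \tag{5.6}
\]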
where C = log p(D) is a constant (the log-evidence) that can, for our purposes, be ignored (we will only need to compute it explicitly in order to compare and select amongst alternative PDMF models like Chabrier (2003); Kroupa (2002), etc).
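Let us now focus on the likelihood $p(\hat{m}_i \mid \hat{\sigma}_i, \theta)$. We only have observed masses available, not the true ones, so every true (unknown) mass $m_i$ will also be a model parameter. This is a severe drawback in the Gaia context of billions of sources. We avoid this problem by marginalising the likelihood:
\[
\begin{aligned}
p(\hat{m}_i \mid \hat{\sigma}_i, \theta) &= \int p(\hat{m}_i, m \mid \hat{\sigma}_i, \theta)\, dm \\
&= \int p(\hat{m}_i \mid m, \hat{\sigma}_i, \theta)\; p(m \mid \hat{\sigma}_i, \theta)\, dm \\
&= \int p(\hat{m}_i \mid m, \hat{\sigma}_i)\; p(m \mid \theta)\, dm.
\end{aligned}
\]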
In order to obtain the posterior probability in equation 5.6, we must first evaluate the integral above for every mass observed. We approximate the integral by using the trapezoidal rule.
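A minimal sketch of this computation is given below, assuming a logarithmically spaced grid of true masses, a log-normal measurement model whose median is the true mass, and a uniform prior between the two components of $\eta$. The function names, the grid size `n_grid` and the numerical (rather than analytical) normalisation of the PDMF are illustrative choices, not the implementation used in this work.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def pdmf_on_grid(grid, theta, M):
    """Broken power law on a grid inside [M_1, M_last], normalised numerically."""
    c = np.ones(theta.size)
    for j in range(theta.size - 1):          # continuity at the break masses
        c[j + 1] = c[j] * M[j + 1] ** (theta[j + 1] - theta[j])
    j = np.clip(np.searchsorted(M, grid, side="left") - 1, 0, theta.size - 1)
    pdf = c[j] * grid ** (-theta[j])
    return pdf / trapezoid(pdf, grid)


def lognormal_pdf(x, median, sigma):
    """Log-normal density with the given median and log-space width."""
    z = (np.log(x) - np.log(median)) / sigma
    return np.exp(-0.5 * z ** 2) / (x * sigma * np.sqrt(2.0 * np.pi))


def log_posterior(theta, m_obs, sigma_obs, M, eta=(0.0, 5.0), n_grid=2000):
    """Log-posterior of Equation 5.6 up to the constant C."""
    theta = np.asarray(theta, dtype=float)
    M = np.asarray(M, dtype=float)
    m_obs = np.asarray(m_obs, dtype=float)
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    if np.any(theta < eta[0]) or np.any(theta > eta[1]):
        return -np.inf                                   # outside the uniform prior
    log_prior = -theta.size * np.log(eta[1] - eta[0])
    grid = np.geomspace(M[0], M[-1], n_grid)             # grid of true masses m
    p_m = pdmf_on_grid(grid, theta, M)                   # p(m | theta)
    # p(m_obs_i | m, sigma_i) for every star (rows) and grid mass (columns)
    like = lognormal_pdf(m_obs[:, None], grid[None, :], sigma_obs[:, None])
    marginal = trapezoid(like * p_m[None, :], grid)      # one integral per star
    if np.any(marginal <= 0.0):
        return -np.inf
    return np.sum(np.log(marginal)) + log_prior
```

A function of this form is what a sampler such as emcee would typically receive as its log-probability callable. The grid must be fine enough to resolve the narrowest measurement distribution, otherwise the trapezoidal approximation degrades.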
In order to test the hierarchical model with a small simulated data set, 10,000 masses were drawn from a PDMF PDF with $\theta = (1.3, 2.3, 2.3)$. For each true mass obtained, we drew 50 samples from a log-normal distribution that reproduces the expected output from the FLAME Work Package, at least during the first cycles of data processing. All the log-normal distributions considered (one per star) were centred at the true mass and had a standard deviation equal to 0.1.

[Figure 5.5 plot: corner plot of $\theta_1$, $\theta_2$ and $\theta_3$, with panel titles $\theta_1 = 1.29^{+0.01}_{-0.01}$, $\theta_2 = 2.28^{+0.05}_{-0.05}$ and $\theta_3 = 2.33^{+0.09}_{-0.09}$.]

Figure 5.5: Samples drawn from the posterior distribution in Equation 5.6 by the emcee algorithm (Foreman-Mackey et al., 2013). Blue lines represent the true values $\theta = (1.3, 2.3, 2.3)$. Dashed lines represent the quantiles 0.16, 0.5 and 0.84 for each $\theta_i$. The title above each 1-D histogram shows the 0.5 quantile with the upper and lower errors supplied by the quantiles. The 2-D plots show the contour lines for levels 0.11, 0.39, 0.67 and 0.86. See Foreman-Mackey (2016) for more information.

[Figure 5.6 plot: density against log(m); legend entries: log(m1), log(m2), log(m3), log(m4), True PDMF PDF, Posterior mean PDMF PDF.]

Figure 5.6: The true and estimated PDMF PDFs are shown in green and red, respectively. The red ribbon represents the 3σ confidence band for the estimated PDMF PDF. Ticks mark the masses that bound the intervals of the PDMF PDF support. Note that the figure shows the PDMF PDF with a change of variable to log(m) instead of m, in order to improve the visualization.
Figures 5.5 and 5.6 show the posterior samples obtained by applying the emcee algorithm (Foreman-Mackey et al., 2013) to the hierarchical model described above. The dashed lines in Figure 5.5 represent the quantiles 0.16, 0.50 and 0.84 of the posterior samples. These quantiles result in the following estimates: $\hat{\theta}_1 = 1.29^{+0.01}_{-0.01}$, $\hat{\theta}_2 = 2.28^{+0.05}_{-0.05}$ and $\hat{\theta}_3 = 2.33^{+0.09}_{-0.09}$. The bulk of the marginal posterior distribution for $\theta_1$ seems to be biased with respect to the true value, but this displacement is only five thousandths from the maximum a posteriori value, which is acceptable for the purposes of this test example. Figure 5.6 shows a comparison between the true PDMF PDF (green) and the estimated posterior mean PDMF PDF (red) with a confidence band of 3σ. Note that the figure shows the PDMF PDF as a function of log(m) instead of m, for the sake of clarity.
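Estimates of this form (the 0.5 quantile with upper and lower errors given by the 0.84 and 0.16 quantiles) can be computed from a chain as in the following sketch. The chain used here is a synthetic Gaussian stand-in generated only to make the example runnable; it is not the set of posterior samples behind Figure 5.5. The function name and the `(n_steps, n_params)` layout are assumptions.

```python
import numpy as np


def summarise_chain(chain, burn_in=0, quantiles=(0.16, 0.5, 0.84)):
    """Median and asymmetric errors per parameter from an (n_steps, n_params) chain."""
    kept = np.asarray(chain, dtype=float)[burn_in:]      # discard the burn-in steps
    lo, med, hi = np.quantile(kept, quantiles, axis=0)
    return med, hi - med, med - lo


# Synthetic stand-in for a chain (NOT the samples shown in Figure 5.5)
rng = np.random.default_rng(0)
fake_chain = rng.normal(loc=[1.29, 2.28, 2.33],
                        scale=[0.01, 0.05, 0.09],
                        size=(50000, 3))
median, upper, lower = summarise_chain(fake_chain, burn_in=300)
for name, m, up, dn in zip(("theta1", "theta2", "theta3"), median, upper, lower):
    print(f"{name} = {m:.2f} +{up:.2f} -{dn:.2f}")
```

Reporting the upper and lower errors separately keeps any asymmetry of the marginal posterior visible, which a single standard deviation would hide.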