Chapter 2 Observational, data reduction & statistical analysis techniques
2.5 Statistical techniques
This section describes some important aspects of two statistical approaches used throughout this thesis: the Monte Carlo method and the Markov chain Monte Carlo method. Both are powerful tools for estimating the statistical significance of model fits to data, and for determining the values and statistical uncertainties of model parameters. Which method is best suited to a given problem depends on the data, on the model being fitted, and on the complexity of both.
2.5.1 Monte Carlo method
The Monte Carlo method is a relatively simple approach for evaluating the expectation value of a function or model parameter: many samples are drawn randomly from a distribution, and these samples are used to approximate the expectation value. For example, the statistical uncertainties of model parameters can be estimated by repeatedly fitting the model to the data while slightly perturbing the data according to the uncertainties of the individual data points. Monte Carlo analysis is therefore based on repetition in order to explore a certain distribution. Note also that, during Monte Carlo analysis, the original data may be altered in order to arrive at useful results for the model.

For purposes of illustration, consider the ingress and egress features of white dwarf eclipses in light curves of cataclysmic variables. In Chapters 5 and 6 a Monte Carlo approach is used to determine the statistical uncertainties of the parameters in a simple model used to fit these features. The best fit of the model to the data can simply be found by χ² minimisation. Uncertainties on the model parameters can be determined by repeating this fit numerous times, say 1000 times. In every fit each individual data point is replaced by a value drawn from a normal distribution whose mean and standard deviation are equal to the value and uncertainty of that data point. In addition, another 1000 fits are performed in which the start and end points of the data included in the fit are varied by a few. The resulting (generally Gaussian) distributions for each model parameter not only describe the values of the parameters more accurately than a straight fit to the data, they also describe the uncertainties in these parameters better. This is a particularly good approach for fitting white dwarf eclipses of cataclysmic variables because the inherent flickering present in these systems shows itself as random jitter in the data points.
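The perturb-and-refit procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in Chapters 5 and 6: a straight line stands in for the eclipse model, the data are synthetic, and the function name `monte_carlo_fit` and its arguments are hypothetical. The variation of the start and end points of the fitted data is not included.

```python
import numpy as np

def monte_carlo_fit(x, y, yerr, n_trials=1000, seed=0):
    """Refit a straight line to perturbed copies of the data.

    The straight line is a hypothetical stand-in for the eclipse model.
    Returns an array of shape (n_trials, 2) holding (slope, intercept).
    """
    rng = np.random.default_rng(seed)
    results = np.empty((n_trials, 2))
    for k in range(n_trials):
        # Draw every data point from N(value, uncertainty).
        y_pert = rng.normal(loc=y, scale=yerr)
        # Weighted least squares, which is chi-squared minimisation
        # for a linear model; np.polyfit expects weights of 1/sigma.
        results[k] = np.polyfit(x, y_pert, deg=1, w=1.0 / yerr)
    return results

# Synthetic data, for illustration only.
data_rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
yerr = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + data_rng.normal(0.0, yerr)

samples = monte_carlo_fit(x, y, yerr)
slope_mean, intercept_mean = samples.mean(axis=0)
slope_std, intercept_std = samples.std(axis=0, ddof=1)
print(f"slope     = {slope_mean:.3f} +/- {slope_std:.3f}")
print(f"intercept = {intercept_mean:.3f} +/- {intercept_std:.3f}")
```

The mean and standard deviation of each column of `samples` then serve as the parameter estimate and its statistical uncertainty.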
2.5.2 Markov chain Monte Carlo simulation
The Markov chain Monte Carlo method (MCMC; see for example Gilks et al., 1996; Mackay, 2003; Gelman et al., 2014) is a numerical method that often relies on Bayesian inference. One uses information known to be true and derives logical conclusions from it using Bayes' theorem. The known information D generally consists of observed data, while the unknown M is the model, possibly including missing data. Bayes' theorem describes the probability distribution of M given that D is true, denoted as P(M|D), as follows:
\begin{equation}
P(M|D) = \frac{P(D|M)\,P(M)}{P(D)} . \tag{2.1}
\end{equation}
Here P(M|D) is the posterior distribution of the model M assuming that the data D is true, P(M) and P(D) are the a priori distributions of the model and data, and P(D|M) is the probability of observing the data given that the model is true. Determining these probability distributions analytically quickly becomes difficult, especially in high dimensions. In such cases, MCMC is a robust numerical alternative.
The Monte Carlo and Markov chain Monte Carlo techniques share (parts of) their name because both are based on drawing many samples from certain parameter distributions. This is done to evaluate, for example, a statistical distribution or a target distribution such as the posterior probability of a model's parameters given some data (maximising P(M|D) is approximately equivalent to minimising χ², and exactly so for Gaussian uncertainties and a flat prior). A significant difference between the two approaches is that in MCMC the data themselves are never altered, whereas altering the data may be justified in the Monte Carlo method. MCMC is especially useful for problems with many dimensions, even thousands or more, where the random exploration of Monte Carlo techniques could take longer than the age of the Universe.
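The link between the posterior and χ² can be made explicit under two assumptions that are not part of the text above: the data points have independent Gaussian uncertainties, and the prior P(M) is flat. The symbols d_i (data values), m_i (model predictions) and σ_i (uncertainties) are introduced here purely for illustration.

```latex
\begin{align*}
P(D|M) &\propto \prod_i \exp\!\left[-\frac{(d_i - m_i)^2}{2\sigma_i^2}\right]
        = \exp\!\left(-\frac{\chi^2}{2}\right),
\qquad
\chi^2 = \sum_i \frac{(d_i - m_i)^2}{\sigma_i^2}, \\
\ln P(M|D) &= -\frac{\chi^2}{2} + \ln P(M) + \text{const}.
\end{align*}
```

The second line follows from taking the logarithm of Eq. 2.1, with ln P(D) and the Gaussian normalisation absorbed into the constant. When ln P(M) is itself constant, the maximum of P(M|D) coincides with the minimum of χ².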
In contrast to the Monte Carlo approach described above, in MCMC simulations the samples are not drawn independently, but through the use of a Markov chain. In such chains, each next sample x_{i+1} is drawn based only on the current sample x_i and a proposal density distribution at this position for all parameters. One has to choose an initial sample to start the chain from, but after a warm-up phase, also called the burn-in phase, the new samples do not depend on the original starting position. This is illustrated in Fig. 2.9, which shows the value of three parameters throughout a chain. After the warm-up phase, the chain has converged to a stationary distribution in all dimensions, which is also called the target or posterior distribution. These posterior distributions can be used to estimate means, variances, expectation values, correlations between parameters, and so on for the individual parameters of the model. Besides removing the warm-up phase, one can choose to discard all but every nth draw (thinning), thereby reducing correlations between subsequent samples in the chain and the possible effects these may have on the final distributions.

The Markov chain itself can be constructed by the general Metropolis-Hastings algorithm or by a specialised version thereof. In this class of algorithms each proposed step, from draw x_i to draw x_{i+1}, is accepted with a certain probability. If the step is accepted, the newest addition to the chain is x_{i+1}. If the step is not accepted, x_i is added to the chain again. Next a new draw x_{i+2} is proposed and evaluated for acceptance, and so on.
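The accept-or-repeat logic can be illustrated with a random-walk Metropolis sampler, the simplest special case of Metropolis-Hastings, in which the proposal is symmetric. The sketch below is an assumption-laden example rather than the implementation used in this thesis: the function names, the Gaussian proposal, the toy target distribution, and the warm-up and thinning values are all hypothetical choices.

```python
import numpy as np

def metropolis(log_prob, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    Returns the chain, shape (n_steps, n_dim), and the acceptance fraction.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_prob(x)
    chain = np.empty((n_steps, x.size))
    n_accepted = 0
    for i in range(n_steps):
        # Propose a step that depends only on the current sample.
        proposal = x + step_size * rng.standard_normal(x.size)
        logp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < logp_new - logp:
            x, logp = proposal, logp_new
            n_accepted += 1
        # If the step was rejected, the current sample is stored again.
        chain[i] = x
    return chain, n_accepted / n_steps

def log_gaussian(x):
    # Hypothetical target: unit Gaussian centred on 1 in each dimension.
    return -0.5 * np.sum((x - 1.0) ** 2)

chain, acc = metropolis(log_gaussian, x0=[10.0, -10.0],
                        n_steps=20000, step_size=1.0)

burn_in, thin = 2000, 10              # illustrative values only
posterior = chain[burn_in::thin]      # drop warm-up, keep every 10th draw
print("mean:", posterior.mean(axis=0))
print("std: ", posterior.std(axis=0))
print("acceptance fraction:", acc)
```

The slice `chain[burn_in::thin]` applies both post-processing steps described above: it removes the warm-up phase and discards all but every nth draw.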
Figure 2.9: Normalised parameter values as a function of draw number in a Markov chain Monte Carlo simulation, illustrating convergence of the chain and the warm-up phase (to the left of the vertical dashed line).

Where used in this thesis, MCMC has been implemented using the Python package emcee (Foreman-Mackey et al., 2013), which is based on ensemble Markov chains (Goodman & Weare, 2010). In this case, there is not one sample x that is evolved throughout the chain, but an ensemble of samples, called walkers, each starting from a different position. The number of walkers in the ensemble should always be at least twice the number of free parameters in the model used to fit the data. Generally, a Markov chain explores the parameter space more effectively with 100 to 200 walkers. One cycle of draws through all walkers constitutes one step for the ensemble and one step in the Markov chain. In the implementation used throughout this thesis, the proposed draw for a given walker is determined by stretching or compressing along the straight line between the walker and the position of a random other walker in the set. The so-called stretch factor is an additional parameter that has to be set. It should always be larger than unity, to ensure that the walkers move around and fully explore the parameter space; the default value used here is 2. The acceptance probability for the individual walkers is still determined by the Metropolis-Hastings algorithm. The ensemble of walkers as a whole contains information about the multi-dimensional target distribution, and is especially useful when exploring highly skewed parameter distributions.
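To make the stretch move concrete, the following is a minimal NumPy sketch written after the description in Goodman & Weare (2010). It is not the emcee source code, and it does not use the emcee API; the function name `stretch_move_sampler`, the toy target and all numerical settings are assumptions made for illustration.

```python
import numpy as np

def stretch_move_sampler(log_prob, p0, n_steps, a=2.0, seed=0):
    """Ensemble sampler using the stretch move with stretch factor a.

    p0 has shape (n_walkers, n_dim). Returns the chain, with shape
    (n_steps, n_walkers, n_dim), and the acceptance fraction per walker.
    """
    rng = np.random.default_rng(seed)
    pos = np.array(p0, dtype=float)
    n_walkers, n_dim = pos.shape
    logp = np.array([log_prob(p) for p in pos])
    chain = np.empty((n_steps, n_walkers, n_dim))
    n_accepted = np.zeros(n_walkers)
    for step in range(n_steps):
        for k in range(n_walkers):
            # Pick a random other walker j != k.
            j = rng.integers(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
            # Stretch or compress along the line joining the two walkers.
            proposal = pos[j] + z * (pos[k] - pos[j])
            logp_new = log_prob(proposal)
            # The acceptance rule carries an extra factor z**(n_dim - 1).
            log_ratio = (n_dim - 1) * np.log(z) + logp_new - logp[k]
            if np.log(rng.uniform()) < log_ratio:
                pos[k], logp[k] = proposal, logp_new
                n_accepted[k] += 1
        chain[step] = pos
    return chain, n_accepted / n_steps

# Hypothetical target: unit Gaussian centred on (1, -2).
mu = np.array([1.0, -2.0])

def log_target(p):
    return -0.5 * np.sum((p - mu) ** 2)

# 20 walkers for 2 parameters, each starting at a different position.
p0 = np.random.default_rng(1).normal(0.0, 1.0, size=(20, 2))
chain, acc = stretch_move_sampler(log_target, p0, n_steps=2000)

flat = chain[500:].reshape(-1, 2)     # drop warm-up, merge all walkers
print("mean:", flat.mean(axis=0))
print("mean acceptance fraction:", acc.mean())
```

Because each proposal is built from the positions of the walkers themselves, no step size has to be tuned per parameter; the stretch factor `a` is the only setting that controls the size of the proposed moves.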
When fitting a model to a set of data using an MCMC approach, it may be the case that the data do not constrain all model parameters equally well. In this case one can choose to constrain a given parameter by adding a prior to the evaluation criteria used to determine whether a step is accepted or not. This prior can be Gaussian or flat, with or without a hard cutoff, and so on, where the choice is motivated by additional information that is available about this particular parameter. For example, if one is fitting a model that includes a temperature as a free parameter, one can safely constrain this temperature to be always positive on a Kelvin scale.
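In practice a prior is usually added to the logarithm of the likelihood, so that a hard cutoff becomes a log-prior of minus infinity and the step is always rejected. The sketch below illustrates this for the temperature example; the second parameter (`amplitude`), its Gaussian prior, and the function names are hypothetical and serve only to show the two kinds of prior side by side.

```python
import numpy as np

def log_prior(theta):
    """Log-prior for theta = (temperature in K, amplitude).

    Both parameters and all numerical values are illustrative.
    """
    temperature, amplitude = theta
    # Flat prior with a hard cutoff: temperature must be positive.
    if temperature <= 0.0:
        return -np.inf
    # Gaussian prior on the amplitude: mean 1.0, standard deviation 0.2.
    return -0.5 * ((amplitude - 1.0) / 0.2) ** 2

def log_posterior(theta, log_likelihood):
    """Log-posterior up to a constant: log-prior plus log-likelihood."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        # Excluded by the prior, so the likelihood need not be evaluated.
        return -np.inf
    return lp + log_likelihood(theta)

# A constant likelihood isolates the effect of the prior.
flat_likelihood = lambda theta: 0.0
print(log_posterior((-5.0, 1.0), flat_likelihood))    # excluded by cutoff
print(log_posterior((8000.0, 1.0), flat_likelihood))  # at the prior peak
print(log_posterior((8000.0, 1.4), flat_likelihood))  # 2 sigma from peak
```

A function such as `log_posterior` is what the sampler evaluates at every proposed step, so the prior enters the acceptance decision automatically.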
One important issue that comes up when using MCMC methods is the so-called acceptance fraction, which is the number of accepted steps divided by the total number of steps in the chain. Remember that if a certain draw is not accepted, the previous draw is repeated in the chain before a new draw is attempted. If the acceptance fraction of a chain is too low, the chain has not been able to explore the full parameter space effectively, as it moved around slowly and may even have been stuck for a long time in a certain place. On the other hand, if the acceptance fraction is too high, practically every proposed step has been accepted and the chain cannot converge to the posterior distribution. Typically, in MCMC simulations that explore five or more dimensions the acceptance fraction should be ∼0.25 (Gelman et al., 2014). In ensemble MCMC, one can manipulate the acceptance fraction of a chain by varying the stretch factor.
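Because a rejected draw repeats the previous sample, the acceptance fraction can be recovered from a stored chain by counting how often consecutive samples differ. The helper below is a hypothetical utility, not part of any package mentioned in this chapter, and it assumes a continuous proposal distribution so that an accepted step always changes the sample.

```python
import numpy as np

def acceptance_fraction(chain):
    """Fraction of steps in which the sample changed.

    chain has shape (n_steps, n_dim). A rejected proposal repeats the
    previous row, so changed rows correspond to accepted steps.
    """
    chain = np.asarray(chain, dtype=float)
    changed = np.any(chain[1:] != chain[:-1], axis=1)
    return changed.mean()

# Toy one-parameter chain: three of the eight transitions are accepted.
toy_chain = np.array([[0.0], [0.0], [0.3], [0.3], [0.3],
                      [0.1], [0.4], [0.4], [0.4]])
print(acceptance_fraction(toy_chain))
```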
Another essential aspect of running an MCMC simulation lies in deciding when a chain has fully converged and represents the posterior distribution correctly. The choice is somewhat subjective, and may be limited by the available time and computational power. However, there are several ways to check convergence. One can run multiple chains, each with a different initial starting point. After the respective warm-up phases, the chains should all converge on the same distribution. One extremely long chain will also be able to fully explore the available parameter space and converge on the target distribution, but this approach may not be feasible for practical reasons such as those mentioned.
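One common way to quantify whether several chains agree is the potential scale reduction factor of Gelman & Rubin, which compares the variance between chains with the variance within chains. This diagnostic is not named in the text above and is offered here only as one possible check; the function name and the synthetic chains are assumptions made for illustration.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for a single parameter.

    chains has shape (n_chains, n_samples), with warm-up already removed.
    Values close to 1 suggest that the chains sample the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    between = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)

# Synthetic chains: independent draws stand in for real MCMC output.
rng = np.random.default_rng(3)
agreeing = rng.normal(0.0, 1.0, size=(4, 1000))
disagreeing = agreeing.copy()
disagreeing[0] += 3.0      # one chain sits in a different place

r_good = gelman_rubin(agreeing)
r_bad = gelman_rubin(disagreeing)
print(f"agreeing chains:    R = {r_good:.3f}")
print(f"disagreeing chains: R = {r_bad:.3f}")
```

A value well above unity, as for the second set of chains, indicates that at least one chain has not yet reached the distribution sampled by the others.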
2.6 Conclusions
I have discussed the various techniques used throughout this thesis to obtain, reduce and analyse data. This covered CCDs, the current standard basis of most astronomical instrumentation, and somewhat more technical details of a number of instruments that were used to obtain the photometric and spectroscopic observations presented in this thesis. Later chapters may use data obtained with instruments not discussed in this chapter; where necessary, these are briefly discussed in place, accompanied by appropriate references. In this chapter I have also introduced two statistical techniques that are useful in performing fits of models to data sets, as will become clear in the following chapters.