Fitting models to data - Long period exoplanets from photometric transit surveys

Chapter 2 Methods

2.3 Fitting models to data

In the case of exoplanetary lightcurves, each datapoint consists of a precisely known independent variable (time, $x_i$) and a measured quantity (flux, $y_i$). Whatever the scenario, we can construct a model that we think best describes the physics of the changing light from the system. In a noiseless system, and for a perfect model, every datapoint could be determined exactly from some function $f(x)$. In reality this is impossible, and we must instead find the model that best fits the data given the uncertainty intrinsic to each flux value (Hogg et al., 2010).

In astronomy each flux measurement ($y_i$) has an intrinsic uncertainty ($\sigma_{y,i}$). In most cases this scatter is dominated by "shot" noise from the source, which is uncorrelated between datapoints ("white" noise). It arises from the random arrival of photons at the detector: all photometric equipment (including our eyes) effectively counts photons, so the uncertainty follows Poisson statistics. The uncertainty is therefore the square root of the number of detector counts $N$, and the shot noise from an astronomical source scales with the square root of its brightness:

$$\sigma_w = \sqrt{N} \propto \sqrt{f_0 \, 10^{-0.4\,(m_* - m_0)}}$$

where $m_*$ is the magnitude of the source and $f_0$ is the flux of a reference source of magnitude $m_0$. Because the signal grows as $N$ while the noise grows only as $\sqrt{N}$, brighter sources have a higher signal-to-noise ratio. As well as the source itself, all photometry also includes shot noise from the sky coincident with the source, and dark current and read-out noise from the detector. Calibrations such as flat fielding, bias and dark frames also contribute uncertainties. The measured flux may also include correlated noise between datapoints, which we explore in section 1.2.5.
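The square-root scaling above can be sketched numerically. This minimal Python sketch assumes the Pogson relation for converting a magnitude into expected counts, and a simplified noise budget in which the independent terms listed above (source and sky shot noise, dark current, read noise) add in quadrature. The function names and all numbers are hypothetical illustrations, not values for any particular instrument.

```python
import numpy as np

def counts_from_magnitude(m_star, m0, f0):
    """Expected detector counts for a source of magnitude m_star, assuming (hypothetically)
    that a reference source of magnitude m0 yields f0 counts (Pogson relation)."""
    return f0 * 10.0 ** (-0.4 * (m_star - m0))

def shot_noise(n_counts):
    """Poisson (shot) noise: the standard deviation of a count N is sqrt(N)."""
    return np.sqrt(n_counts)

def total_noise(n_star, n_pix, sky_per_pix, dark_per_pix, read_noise):
    """Simplified noise budget (assumption): independent terms added in quadrature.
    Sky and dark counts are Poisson, so they add their counts to the variance;
    read noise (per pixel, in counts) adds its square."""
    variance = n_star + n_pix * (sky_per_pix + dark_per_pix + read_noise ** 2)
    return np.sqrt(variance)

# Hypothetical example: a 12.5 mag star, with 1e6 counts expected for a 10th-magnitude star
n = counts_from_magnitude(12.5, m0=10.0, f0=1e6)      # 2.5 mag fainter -> 10x fewer counts
sigma_shot = shot_noise(n)
sigma_total = total_noise(n, n_pix=50, sky_per_pix=100.0, dark_per_pix=5.0, read_noise=10.0)
fractional_noise = sigma_total / n                     # noise relative to the source flux
```

Multiplying the counts by 10 raises the shot noise by only a factor of $\sqrt{10}$, so the fractional noise falls as $1/\sqrt{N}$.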

To determine the model which most closely predicts the data, the "Chi-Squared" statistic is classically used:

$$\chi^2 = \sum_{i=1}^{N} \frac{\left(y_i - f(x_i)\right)^2}{\sigma_{y,i}^2} \qquad (2.2)$$

This is the sum of the squared differences between the measured $y_i$ and their model-predicted values, each scaled by the corresponding uncertainty. Minimising $\chi^2$ with respect to the model parameters (usually iteratively) therefore gives the best-fitting parameters for a given model and set of datapoints.
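Equation 2.2 translates directly into code. The sketch below assumes a hypothetical toy "box" transit model with a single free parameter (the depth) and synthetic data with known white noise. In place of an iterative optimiser it uses a brute-force grid search, which is easy to read but scales poorly to many parameters; in practice a gradient-based or Levenberg-Marquardt routine would normally be used.

```python
import numpy as np

def chi_squared(y, model, sigma):
    """Equation 2.2: squared residuals, each scaled by its uncertainty, summed."""
    return np.sum(((y - model) / sigma) ** 2)

def box_transit(t, depth, t0=0.0, duration=0.2):
    """Hypothetical toy model: normalised flux of 1 outside transit, 1 - depth inside."""
    in_transit = np.abs(t - t0) < duration / 2.0
    return np.where(in_transit, 1.0 - depth, 1.0)

# Synthetic lightcurve with a known depth and white noise (fixed seed for reproducibility)
rng = np.random.default_rng(42)
t = np.linspace(-1.0, 1.0, 1000)                 # time, arbitrary units
sigma = np.full_like(t, 1e-3)                    # per-point flux uncertainty
y = box_transit(t, depth=0.01) + rng.normal(0.0, sigma)

# Brute-force minimisation: evaluate chi^2 for a grid of trial depths, keep the smallest
depths = np.linspace(0.0, 0.02, 2001)
chi2 = np.array([chi_squared(y, box_transit(t, d), sigma) for d in depths])
best_depth = depths[np.argmin(chi2)]
```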

More useful for comparing models is the reduced chi-squared, the chi-squared per degree of freedom, $\chi^2_R = \chi^2/\nu$, where the number of degrees of freedom is determined by the number of model parameters $N_{par}$ and the number of datapoints $N$ as $\nu = N - N_{par}$.

For simple models with many datapoints, the reduced chi-squared tends to the mean squared deviation between model and data per measurement, in units of the uncertainties. Values of $\chi^2_R$ close to 1.0 suggest a good fit. Values well below 1.0 suggest overfitting, or that the errorbars are overestimated; values well above 1.0 suggest a poor model or underestimated errorbars. To compare two models of the same data, one can use the F-distribution: for the same dataset and similar models, the test statistic simplifies to the ratio of the chi-squared values of the two models, which indicates whether one model fits significantly better than the other. These chi-squared statistics are only valid if the uncertainties on the datapoints are Gaussian and independent.
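As an illustration of the reduced chi-squared and the model comparison described above, the sketch below fits a constant (one parameter) and a straight line (two parameters) to synthetic data drawn from a straight line, using NumPy's weighted least-squares `np.polyfit` (whose `w` argument takes $1/\sigma$ for Gaussian uncertainties). The data and models are hypothetical. Only the simple ratio of reduced chi-squared values is computed; turning it into a significance would require the F-distribution itself (for example `scipy.stats.f`), which is not shown here.

```python
import numpy as np

def reduced_chi_squared(y, model, sigma, n_par):
    """chi^2 per degree of freedom, with nu = N - N_par."""
    chi2 = np.sum(((y - model) / sigma) ** 2)
    nu = y.size - n_par
    return chi2 / nu

# Hypothetical data drawn from a straight line with Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 500)
sigma = np.full_like(x, 0.5)
y = 2.0 + 0.3 * x + rng.normal(0.0, sigma)

# Weighted least-squares fits; np.polyfit expects weights w = 1/sigma
coeff_const = np.polyfit(x, y, 0, w=1.0 / sigma)   # model A: constant, 1 parameter
coeff_line = np.polyfit(x, y, 1, w=1.0 / sigma)    # model B: straight line, 2 parameters

chi2r_const = reduced_chi_squared(y, np.polyval(coeff_const, x), sigma, n_par=1)
chi2r_line = reduced_chi_squared(y, np.polyval(coeff_line, x), sigma, n_par=2)

# Ratio of reduced chi^2 values: well above 1 favours the straight line
f_ratio = chi2r_const / chi2r_line
```

The straight line gives $\chi^2_R$ near 1, while the constant model, which cannot follow the slope, gives a value several times larger.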

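It is also useful to determine how well a model fits in a probabilistic manner. For a datapoint $y_i$ normally distributed about the model value $f(x_i)$, the probability distribution is:

$$P(y_i \mid f) = \frac{1}{\sqrt{2\pi}\,\sigma_{y,i}} \exp\left(-\frac{\left(y_i - f(x_i)\right)^2}{2\sigma_{y,i}^2}\right) \qquad (2.3)$$

We seek the model parameters that maximise the probability of our observations. The "likelihood" is the probability of obtaining the data $\{y_i\}_{i=1}^{N}$ given the model $f$, the fixed observed values of $x_i$ and $\sigma_{y,i}$, and any background information $I$. For independent datapoints it is the product of the individual probabilities:

$$\mathcal{L} = P(\{y_i\}_{i=1}^{N} \mid f, I) = \prod_{i=1}^{N} P(y_i \mid f) \qquad (2.4)$$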
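Equations 2.3 and 2.4 can be written directly as a product of Gaussian densities. This minimal sketch assumes independent datapoints, and the toy data are hypothetical. It also shows a practical reason to work with the logarithm of the likelihood: for thousands of datapoints the raw product underflows to zero in double precision.

```python
import numpy as np

def gaussian_prob(y, model, sigma):
    """Equation 2.3: Gaussian probability density of each y_i about its model value."""
    return np.exp(-((y - model) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def likelihood(y, model, sigma):
    """Equation 2.4: product of the individual densities (assumes independent datapoints)."""
    return np.prod(gaussian_prob(y, model, sigma))

# Three hypothetical measurements and their model predictions
y_small = np.array([1.0, 2.1, 2.9])
f_small = np.array([1.0, 2.0, 3.0])
L_small = likelihood(y_small, f_small, 0.1)

# Many datapoints: with sigma = 1 each density is at most ~0.4, so the product underflows to 0.0
rng = np.random.default_rng(0)
y_big = rng.normal(0.0, 1.0, size=5000)
L_big = likelihood(y_big, np.zeros_like(y_big), 1.0)
```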

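Taking the logarithm removes the exponential term; collecting the terms that do not depend on the model into a constant $K = -\sum_{i=1}^{N} \ln\left(\sqrt{2\pi}\,\sigma_{y,i}\right)$ gives:

$$\ln\mathcal{L} = K - \sum_{i=1}^{N} \frac{\left(y_i - f(x_i)\right)^2}{2\sigma_{y,i}^2} = K - \frac{1}{2}\chi^2 \qquad (2.5)$$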

Therefore, for Gaussian uncertainties, maximising the likelihood is equivalent to minimising the chi-squared. For non-Gaussian statistics, the likelihood can instead be built from the appropriate probability distribution, and maximising it still returns the best-fitting model.
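The equivalence in Equation 2.5 can be checked numerically. This sketch uses hypothetical synthetic data with a constant model $f(x) = c$. It evaluates the log-likelihood as a sum of log-densities (avoiding the underflow of the raw product), confirms that it equals $K - \chi^2/2$ on a grid of trial values, and confirms that the maximum-likelihood and minimum-$\chi^2$ values of $c$ coincide.

```python
import numpy as np

def log_likelihood(y, model, sigma):
    """Natural log of Equation 2.4: a sum of log-densities instead of a product."""
    return np.sum(-0.5 * ((y - model) / sigma) ** 2 - np.log(np.sqrt(2.0 * np.pi) * sigma))

def chi_squared(y, model, sigma):
    """Equation 2.2."""
    return np.sum(((y - model) / sigma) ** 2)

rng = np.random.default_rng(0)
sigma = np.full(200, 0.05)
y = 0.7 + rng.normal(0.0, sigma)                    # hypothetical data from a constant, c = 0.7

K = -np.sum(np.log(np.sqrt(2.0 * np.pi) * sigma))   # model-independent constant in Eq. 2.5

levels = np.linspace(0.5, 0.9, 401)                 # trial values of the constant model
lnL = np.array([log_likelihood(y, c, sigma) for c in levels])
chi2 = np.array([chi_squared(y, c, sigma) for c in levels])
```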

In many scenarios we also have information, from past experience or theoretical considerations, about what the parameters should be; these are "priors" on the model parameters. To compute the most probable model given both the datapoints and the priors, we can turn to Bayes' theorem:

$$P(f \mid \{y_i\}_{i=1}^{N}, I) = \frac{P(\{y_i\}_{i=1}^{N} \mid f, I)\, P(f \mid I)}{P(\{y_i\}_{i=1}^{N} \mid I)} \qquad (2.6)$$

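Here $P(\{y_i\}_{i=1}^{N} \mid f, I)$ is the likelihood function and $P(f \mid I)$ encodes our prior knowledge of the model parameters; the left-hand side is the posterior probability of the model. The denominator, $P(\{y_i\}_{i=1}^{N} \mid I)$, is the probability of the datapoints given their positions and errors, independent of the model parameters; it can therefore be ignored as a normalising constant.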
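A grid evaluation makes the role of each term in Equation 2.6 concrete. In this sketch (hypothetical data, a single parameter `theta`, and a hypothetical Gaussian prior) the unnormalised posterior is the likelihood times the prior, computed in logs. The denominator is obtained by summing over the grid; dividing by it only rescales the distribution so that it integrates to one, which is why it can be ignored when searching for the best-fitting parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.1
y = 0.5 + rng.normal(0.0, sigma, size=20)        # hypothetical repeated measurements

theta = np.linspace(0.0, 1.0, 1001)              # grid of trial parameter values
dtheta = theta[1] - theta[0]

# ln(likelihood), dropping the constant K, for a constant model f = theta
log_like = np.array([-0.5 * np.sum(((y - t) / sigma) ** 2) for t in theta])
# ln(prior): hypothetical Gaussian prior with mu = 0.4, sigma = 0.2 (constant terms dropped)
log_prior = -0.5 * ((theta - 0.4) / 0.2) ** 2

log_post = log_like + log_prior                  # ln(numerator of Bayes' theorem)
unnorm = np.exp(log_post - log_post.max())       # subtract the maximum to avoid underflow
evidence = np.sum(unnorm) * dtheta               # proportional to the denominator
posterior = unnorm / evidence                    # normalised PDF on the grid

theta_map = theta[np.argmax(posterior)]          # unchanged by the normalisation
theta_mean = np.sum(theta * posterior) * dtheta
```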

The priors on model parameters are themselves probability distributions, guided by past experiments or theoretical limits, that are applied to each model parameter. Typical priors include a uniform prior (constant between two end points and zero outside), a Gaussian prior (normally distributed around an expected value $\mu$ with standard deviation $\sigma$), or more complex distributions. These are normalised such that the total probability for each parameter integrates to one. The result is a posterior probability density function (PDF) that encodes the probability of each combination of model parameter values, given the datapoints and the priors.
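The two common priors named above can be written as normalised densities. This sketch implements a uniform and a Gaussian prior and checks numerically that each integrates to one on a fine grid; the bounds and parameter values are hypothetical. In a real fit they are usually evaluated as log-priors, with $\ln 0 = -\infty$ marking values outside a uniform prior's range.

```python
import numpy as np

def uniform_prior(theta, low, high):
    """Uniform prior: density 1/(high - low) between the end points, zero outside."""
    theta = np.asarray(theta, dtype=float)
    inside = (theta >= low) & (theta <= high)
    return np.where(inside, 1.0 / (high - low), 0.0)

def gaussian_prior(theta, mu, sigma):
    """Gaussian prior centred on the expected value mu with standard deviation sigma."""
    theta = np.asarray(theta, dtype=float)
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Numerical check that each prior integrates to one (simple Riemann sum on a fine grid)
grid = np.linspace(-10.0, 10.0, 200001)
dx = grid[1] - grid[0]
area_uniform = np.sum(uniform_prior(grid, low=0.0, high=2.0)) * dx
area_gaussian = np.sum(gaussian_prior(grid, mu=1.0, sigma=0.5)) * dx
```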

The former two methods (chi-squared minimisation and log-likelihood maximisation) produce, for a given model, the best-fitting parameters. If the parameter distributions are perfectly Gaussian and uncorrelated, their uncertainties can also be estimated directly. However, most models have correlated or asymmetric parameter distributions, so we need a numerical way to estimate the uncertainties on (and covariances between) model parameters. The most common method is Markov Chain Monte Carlo (MCMC).
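For the idealised Gaussian case mentioned above, a standard shortcut for a single parameter is that the $1\sigma$ interval spans the values where $\chi^2$ rises by 1 above its minimum. The sketch below (hypothetical repeated measurements of a constant with known $\sigma$) recovers the analytic uncertainty $\sigma/\sqrt{N}$ this way. It relies on $\chi^2$ being quadratic in the parameter and does not handle correlated or asymmetric distributions, which is where MCMC becomes necessary.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 0.2
y = 1.0 + rng.normal(0.0, sigma, size=100)      # hypothetical: 100 measurements of a constant

c = np.linspace(0.8, 1.2, 4001)                 # grid of trial values, step 1e-4
# chi^2 for every trial value at once: rows are trial values, columns are datapoints
chi2 = np.sum(((y[None, :] - c[:, None]) / sigma) ** 2, axis=1)

best = c[np.argmin(chi2)]
within_1sigma = c[chi2 <= chi2.min() + 1.0]     # region where delta chi^2 <= 1
half_width = 0.5 * (within_1sigma.max() - within_1sigma.min())
analytic = sigma / np.sqrt(y.size)              # expected 1-sigma error on the mean
```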

In document Long period exoplanets from photometric transit surveys (Page 43-45)