Richards et al features - Variable Star Classification

5.2 Variable Star Classification

5.2.2 Richards et al. features

Richards et al. utilised a large number of features in their machine-learned classification models for the All-Sky Automated Survey (ASAS) (Richards et al., 2012). Whilst most of their features have already been discussed in the variability indices and Fourier decomposition subchapters, they also introduced a number of novel features that characterise the performance of period estimation methods, in addition to descriptors of the phase positions of eclipse features for eclipsing binary classification. Eclipsing binaries are not well modelled by sinusoids, so the Fourier decomposition is of limited use: higher harmonic components are required for good fits, yet these parameters are heavily regularised.

There are five eclipsing binary features which have been calculated from the Skycam light curves by making use of the PolyFit algorithm discussed in chapter 6. This allows the phases of the maxima and minima to be more carefully isolated than if extracted from the Fourier harmonic model. They are defined as:

• Eclipse Max Delta Mags

This feature is the absolute value of the magnitude difference between the two maxima of the light curve phase-folded at 2× the candidate period. For an eclipsing binary, these maxima correspond to the primary and secondary eclipses. For most eclipsing binaries this value will be non-zero as the eclipses have differing depths, whereas for a light curve folded at 2× the true period the two maxima should be the same, producing a feature of zero.

• Eclipse Min Delta Mags

This feature is the absolute value of the magnitude difference between the two minima of the light curve phase-folded at 2× the candidate period. It measures the difference in out-of-eclipse brightness between the interval from the primary to the secondary eclipse and the interval from the secondary back to the primary eclipse. For most eclipsing binaries this is expected to be zero unless an additional effect is present, such as the distortion wave of an RS Canum Venaticorum variable.

• Eclipse Phase Ratio

This feature is used to characterise the eccentricity of an eclipsing binary based on the phases of the primary and secondary eclipses. It is defined as the ratio between the phase difference of the first minimum and first maximum, associated with the primary eclipse, and the phase difference of the second minimum and second maximum, associated with the secondary eclipse. This feature takes on values near unity for highly symmetrical eclipsing binaries. Values diverging from unity suggest the presence of either an eccentric eclipsing binary or some other class of non-eclipsing variable.

• Reference Phase

This feature records the location of the reference phase, the phase of the brightest out-of-eclipse magnitude preceding the primary eclipse. It describes the performance of the PolyFit algorithm at fitting the phase-folded light curve.

• Period Double Ratio

The Period Double Ratio is a new feature we have defined which is also designed to assist in the identification of light curves where the period estimation method has produced a period at half of the true astrophysical period. The period estimated by the GRAPE method is fine-tuned through the use of the Variance Ratio Periodogram, a multi-harmonic period estimation method which can correctly model non-sinusoidal signals such as the eclipsing binary light curve. The Period Double Ratio is defined as the ratio between the variance ratio of the candidate period and the variance ratio of twice the candidate period. This calculation is shown in equation 5.29.

$$P_{\mathrm{rat}} = \frac{V_{\mathrm{rat}}(P)}{V_{\mathrm{rat}}(2P)} \tag{5.29}$$

where $P_{\mathrm{rat}}$ is the Period Double Ratio and $V_{\mathrm{rat}}(x)$ is the variance ratio calculated for a four harmonic Fourier fit determined by a candidate period $x$. If this feature is greater than unity, the better fitting model is the one generated by the candidate period, indicating the light curve is either a pulsating star or a very close contact binary. On the other hand, if the feature is less than unity, the light curve is likely an eclipsing binary with a better fitting multi-harmonic model at twice the candidate period.
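The ratio in equation 5.29 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variance ratio is not defined in this section, so the sketch takes it to be the fraction of the light curve variance explained by an unweighted least-squares four harmonic Fourier fit, which preserves the property that a larger value means a better fit. The function names `fourier_design`, `variance_ratio` and `period_double_ratio` are hypothetical and are not taken from the pipeline.

```python
import numpy as np


def fourier_design(t, period, n_harmonics=4):
    """Design matrix of a constant plus sine/cosine pairs up to n_harmonics."""
    phase = 2.0 * np.pi * np.asarray(t, dtype=float) / period
    columns = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(k * phase))
        columns.append(np.cos(k * phase))
    return np.column_stack(columns)


def variance_ratio(t, mag, period, n_harmonics=4):
    """ASSUMED definition: fraction of variance explained by the harmonic fit."""
    mag = np.asarray(mag, dtype=float)
    design = fourier_design(t, period, n_harmonics)
    # Ordinary (unweighted, unregularised) least squares, for simplicity.
    coeffs, *_ = np.linalg.lstsq(design, mag, rcond=None)
    resid = mag - design @ coeffs
    return 1.0 - np.var(resid) / np.var(mag)


def period_double_ratio(t, mag, period, n_harmonics=4):
    """Equation 5.29: V_rat(P) / V_rat(2P) for a candidate period P."""
    v_single = variance_ratio(t, mag, period, n_harmonics)
    v_double = variance_ratio(t, mag, 2.0 * period, n_harmonics)
    return v_single / v_double
```

Under this assumed definition, a signal with unequal alternating minima folded at half its true period returns a value below unity, because the fit at twice the candidate period can reproduce the alternation whereas the fit at the candidate period cannot.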


Richards et al. also introduced a set of features designed to describe the performance of the Fourier harmonic model in fitting the light curves (Richards et al., 2012). Using statistics that quantify the normality and scatter of the residuals of the fit, calculated by equation 5.30, light curves with complexity beyond the modelling capability of the harmonic models can be identified. These features have been adopted from the classification methods of Dubath et al. and Kim et al. (Dubath et al., 2011; Kim et al., 2011).

$$r_i = m_i - \hat{m}_i \tag{5.30}$$

where $r_i$ is the residual of the $i$th data point, $m_i$ is the magnitude of the $i$th data point and $\hat{m}_i$ is the predicted magnitude of the $i$th data point as determined by the harmonic Fourier model. There are two features that quantify the statistics of the residuals of the harmonic Fourier model, defined as:

• Residual Normality

This feature is produced by applying the Anderson-Darling test discussed above to the set of residuals calculated by equation 5.30. For light curves consisting of signal and Gaussian noise, subtracting a good-fitting harmonic model should leave purely Gaussian noise, for which the test will indicate a highly normal distribution. For real light curves such as the SkycamT light curves, the correlated noise does reduce the effectiveness of this test but it might still be of some use to the classifiers.

• Residual Raw Scatter

The Residual Raw Scatter is the ratio of the spread of the residual magnitudes, measured by their median absolute deviation, to the amplitude of the initial light curve. If the residuals have values across a large range of magnitudes, it suggests either a poor fit by the harmonic model or a large noise contribution to the light curve. As the light curves are assumed to have similar noise statistics, the classifier will interpret the relative values of this feature as a measure of the goodness-of-fit of the harmonic model. Residual Raw Scatter is defined by equation 5.31.

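$$r_{\mathrm{MAD}} = \frac{\mathrm{med}\left(\left|r - \mathrm{med}(r)\right|\right)}{\mathrm{amp}} \tag{5.31}$$

where $r_{\mathrm{MAD}}$ is the Residual Raw Scatter, $\mathrm{med}$ is the median operation, $r$ is the vector of residual magnitudes and $\mathrm{amp}$ is the amplitude of the initial light curve.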
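The two residual features can be illustrated with a short Python sketch of equations 5.30 and 5.31. It rests on several assumptions: the model magnitudes are supplied by the caller rather than produced by the harmonic fit, the amplitude is passed in as `amp` because this section does not state how it is measured, and the normality feature is represented by the plain Anderson-Darling A² statistic for a normal distribution with estimated mean and standard deviation, without any small-sample correction. The function names are hypothetical.

```python
import math

import numpy as np


def residuals(mag, model_mag):
    """Equation 5.30: observed magnitudes minus model magnitudes."""
    return np.asarray(mag, dtype=float) - np.asarray(model_mag, dtype=float)


def anderson_darling_a2(r):
    """Anderson-Darling A^2 statistic against a normal distribution.

    Smaller values indicate residuals closer to Gaussian. The mean and
    standard deviation are estimated from the residuals themselves.
    """
    r = np.sort(np.asarray(r, dtype=float))
    n = r.size
    z = (r - r.mean()) / r.std(ddof=1)
    # Standard normal CDF evaluated with the error function.
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1])))


def residual_raw_scatter(r, amp):
    """Equation 5.31: median absolute deviation of the residuals over amp."""
    r = np.asarray(r, dtype=float)
    return np.median(np.abs(r - np.median(r))) / amp
```

The median absolute deviation is used in place of a simple range because it is insensitive to a small number of outlying residuals.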

There is one final feature named Squared Differences over Variance. This feature is defined as the sum of the squared magnitude differences in successive measurements divided by the variance of the light curve (Kim et al., 2011). It is equivalent to the unnormalised variability index, so it is much more sensitive to changes in well-sampled light curves. The normalised variability index is likely more useful for light curve classification, but this feature is implemented regardless and can be removed by feature selection operations. Equation 5.32 gives the calculation of this feature.

$$\mathrm{sdv} = \frac{1}{\sigma^2} \sum_{i=1}^{N-1} \left(m_{i+1} - m_i\right)^2 \tag{5.32}$$

where sdv is the Squared Differences over Variance feature, σ is the standard deviation of the magnitudes of the light curve, N is the total number of data points in the light curve and m_i is the magnitude value of the ith data point.
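Equation 5.32 translates directly into a few lines of Python. The sketch below assumes the magnitudes are already ordered in time and uses the population variance (NumPy's default) for σ², since the section does not say whether the sample or population form is intended. The function name is hypothetical.

```python
import numpy as np


def squared_differences_over_variance(mag):
    """Equation 5.32: sum of squared successive differences over the variance.

    `mag` is assumed to be sorted by observation time.
    """
    mag = np.asarray(mag, dtype=float)
    successive = np.diff(mag)  # m_{i+1} - m_i for i = 1 .. N-1
    # ASSUMPTION: population variance (ddof=0) is used for sigma^2.
    return np.sum(successive ** 2) / np.var(mag)
```

Because the sum is not divided by the number of points, the value grows with the length of the light curve, which is the sensitivity to sampling noted above.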

In document An Automated Pipeline for Variability Detection and Classification for the Small Telescopes Installed at the Liverpool Telescope (Page 186-189)