5.4 Methodology

5.4.4 Verification of the Method Using SDSS S82 Classification Information

To test the efficacy of the selection and classification method, detailed testing was carried out on the training set, and especially on the S82 area, using PS1 3π lightcurves. The training set’s labels from S82 (Schneider et al. 2007; Sesar et al. 2010) and the Draco dSph (Kinemuchi et al. 2008) serve as the “ground truth” to quantify the purity and completeness of the classifications. Purity and completeness are always quantified with respect to a threshold on the class probability pRRLyrae or pQSO, respectively. Given a threshold on the score and knowing the true class of each source in the training set, one can measure the fraction of true RR Lyrae (or QSO) that are recovered, the completeness, as well as the fraction of sources in the obtained sample that are true RR Lyrae (or QSO), the purity.

For either of the two categories, say RR Lyrae, one can define a candidate sample S by the choice of a minimum pRRLyrae. On the basis of the S82 and Draco dSph ground truth, the completeness and the purity of this sample can be defined (see Section 4.2.4). Here, the purity is the fraction of sources in S that are actual RR Lyrae stars, and the completeness is the fraction of all actual RR Lyrae stars that are contained in S. As the minimum pRRLyrae is raised, one would expect the completeness to decrease monotonically and the purity to increase nearly monotonically. For the QSOs and any other class, analogous definitions apply. Depending on context, a sample S is described either by a cut on pRRLyrae or pQSO, or by the corresponding purity and completeness of this sample as determined on the training set.
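The two definitions can be made concrete with a minimal sketch. The function name `purity_completeness` and the six toy sources below are illustrative assumptions, not part of the original analysis; the sample S is taken to be all sources with a class probability at or above the threshold.

```python
def purity_completeness(scores, is_target, threshold):
    """Return (purity, completeness) of the sample S = {score >= threshold}.

    scores    : class probabilities, e.g. pRRLyrae, one per source
    is_target : ground-truth flags (True if the source really is in the class)
    """
    # Ground-truth flags of the sources that pass the cut, i.e. the sample S.
    in_sample = [truth for s, truth in zip(scores, is_target) if s >= threshold]
    n_true_in_sample = sum(in_sample)
    n_true_total = sum(is_target)
    # Purity: fraction of S that is truly in the class (undefined for empty S).
    purity = n_true_in_sample / len(in_sample) if in_sample else float("nan")
    # Completeness: fraction of all true class members that ended up in S.
    completeness = n_true_in_sample / n_true_total if n_true_total else float("nan")
    return purity, completeness


# Toy data (invented for illustration): six sources, three true RR Lyrae.
scores = [0.95, 0.80, 0.40, 0.30, 0.10, 0.05]
is_rrlyrae = [True, True, False, True, False, False]
purity, completeness = purity_completeness(scores, is_rrlyrae, threshold=0.27)
print(purity, completeness)
```

In this toy case four sources pass the cut and three of them are true RR Lyrae, so the purity is 3/4 while all three true RR Lyrae are recovered.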

To estimate purity and completeness, a 10-fold stratified cross-validation is again used. The model is trained on 9 subsets with a balanced ratio of sources from each class, and purity and completeness are calculated when applying this model to the tenth subset. The whole procedure is repeated 9 more times, each time with a different held-out subset. The average over the 10 evaluations is adopted as the purity and completeness. The dependence of purity and completeness on the chosen training set can be estimated from their spread across the 10 individual runs.
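The cross-validation procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used here: the random forest classifier is replaced by a hypothetical one-feature stand-in (`fit_midpoint`), the fold assignment is a plain round-robin per class, and the data are invented.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the members of this class out to the folds in turn.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def purity_completeness(scores, truth, threshold):
    picked = [y for s, y in zip(scores, truth) if s >= threshold]
    n_true = sum(truth)
    purity = sum(picked) / len(picked) if picked else float("nan")
    completeness = sum(picked) / n_true if n_true else float("nan")
    return purity, completeness


def fit_midpoint(train_x, train_y):
    """Stand-in classifier (assumption): cut halfway between the class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    midpoint = 0.5 * (mean(pos) + mean(neg))
    return lambda x: 1.0 if x > midpoint else 0.0


def cross_validate(features, labels, fit, k=10, threshold=0.5):
    """Train on k-1 folds, evaluate on the held-out fold, repeat k times."""
    per_fold = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [score(features[i]) for i in held_out]
        truth = [labels[i] for i in held_out]
        per_fold.append(purity_completeness(scores, truth, threshold))
    purities, completenesses = zip(*per_fold)
    # Mean over the k runs is the estimate, the standard deviation its spread.
    return {"purity": (mean(purities), stdev(purities)),
            "completeness": (mean(completenesses), stdev(completenesses))}


# Toy data (invented): 20 class members and 80 contaminants, well separated.
labels = [1] * 20 + [0] * 80
features = [1.0 + 0.01 * i for i in range(20)] + [0.005 * i for i in range(80)]
result = cross_validate(features, labels, fit_midpoint)
print(result)
```

Because the toy classes are perfectly separable, every fold returns the same purity and completeness and the spread is zero. With a real classifier and real data, the spread across folds is what reflects the dependence on the chosen training set.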

For all purity-completeness plots in the following, a step size of 0.001 in pRRLyrae, pQSO was chosen. Tables of purity and completeness with a step size of 0.01 are given in the Table Appendix, Section B.1.
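A purity-completeness curve on such a threshold grid could be computed along the following lines. This is a sketch only: the function name `purity_completeness_curve` and the toy scores are assumptions, and thresholds for which no source is selected are skipped because the purity is undefined there.

```python
def purity_completeness_curve(scores, truth, step=0.001):
    """Evaluate purity and completeness on a regular grid of thresholds.

    Returns a list of (threshold, purity, completeness) tuples.
    """
    n_steps = round(1 / step)
    n_true = sum(truth)
    curve = []
    for i in range(n_steps + 1):
        # Build each threshold from an integer index to avoid the rounding
        # drift that repeated addition of `step` would accumulate.
        threshold = i / n_steps
        picked = [y for s, y in zip(scores, truth) if s >= threshold]
        if not picked:
            continue  # empty sample: purity undefined, skip this threshold
        curve.append((threshold, sum(picked) / len(picked), sum(picked) / n_true))
    return curve


# Toy data (invented for illustration).
scores = [0.95, 0.80, 0.40, 0.30, 0.10, 0.05]
truth = [1, 1, 0, 1, 0, 0]
curve = purity_completeness_curve(scores, truth, step=0.001)
print(curve[0], curve[-1])
```

Passing `step=0.01` instead would give the coarser sampling used for the tables.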

Fig. 5.8 shows purity-completeness curves (see Section 4.2.4), i.e. the trade-off between purity and completeness with respect to the total cross-matched sources. These values are calculated for all sources in the training set, irrespective of brightness. The purity-completeness curves are given not only for the full feature set, but also for various subsets of the features: PS1 3π variability and color, PS1 3π variability only, and PS1 3π and WISE color only. For comparison, the case of using all features in the training set is given for PV2 as a dashed line. The left column refers to QSO classification, the right one to RR Lyrae classification. The figure shows that, as expected, the purity is maximal at small completeness, while the completeness can be maximized only at severe expense to the purity. Which compromise between completeness and purity should be made in sample selection depends in detail on the science question, but the top panels of Fig. 5.8 suggest that the purity increases only slightly when the completeness is lowered below 80%. This may be a sensible threshold for an inclusive sample whenever PS1 lightcurves and mean colors, as well as WISE colors, are available. At the top of the horizontal axis, the relation between completeness and pRRLyrae, pQSO is indicated.

The different lines in the upper panels of Fig. 5.8 illustrate the relative importance of the different pieces of data that may enter the classification: the classification was not only carried out with the full feature set from Table 5.7, but also tested for the cases where only color-related or only variability-related information was used. The calculated feature importance, as given above, also highlights the high importance of the variability features.

Fig. 5.8 shows that the variability information is indispensable for defining a sample with a sensible combination of purity and completeness, for RR Lyrae as well as for QSO. These different purity-completeness curves also indicate what purity and completeness one might expect when a particular source lacks some of the information used as features, for instance a detection in WISE or in particular PS1 3π bands.

Given that the training set is finite in size, the purity and completeness will depend in detail on the chosen training sample. The individual lines in the lower panels of Fig. 5.8 reflect different samplings of the training set. For a training set of the size available in S82, the effect is noticeable, but small.

[Figure 5.8: purity (vertical axis) versus completeness (horizontal axis), both from 0 to 1, for QSO (left column) and RR Lyrae (right column). The top axis gives the corresponding pQSO (tick labels 1.0, 0.88, 0.8, 0.68, 0.50, 0.004) or pRRLyrae (tick labels 1.0, 0.92, 0.82, 0.64, 0.3, 0.001). Legend: PS1 3π variability, PS1 3π + WISE color; PS1 3π variability, PS1 color; PS1 variability only; PS1 + WISE color only; PV2: PS1 3π variability, PS1 3π + WISE color.]

Figure 5.8 Trade-off between purity and completeness with respect to total cross-matched sources for different pieces of information provided to the RFC. The upper panels show purity-completeness curves when PS1 variability and PS1 + WISE colors, PS1 variability and colors only, PS1 variability only, PS1 + WISE colors are provided. There is a limited purity and completeness that can be achieved with variability only (yellow line). As expected, using all features (blue line) gives the best result quantified by purity and completeness.

The lower panels show the impact of the training-set stochasticity, illustrated by the dependence of purity and completeness on the chosen training-set sources (presuming PS1 variability and PS1 + WISE colors are provided). The trade-off between purity and completeness is plotted for 10 different randomly selected training sets, as well as for their mean (thick dark blue line, the same as the blue line in the upper panels). At the top of the horizontal axis the relation between completeness and pRRLyrae, pQSO is given. For RR Lyrae, with only 458 S82 RR Lyrae and 269 Draco dSph RR Lyrae in the training set, the stochasticity is noticeable. In contrast, for QSO with 9045 sources in the training set, it is negligible.

Tables of purity and completeness for the case of using all features (blue line) can be found in the Table Appendix, Table B.1 for QSOs and B.3 for RR Lyrae, respectively.

Dependence of Purity and Completeness on Source Brightness

The result shown in Fig. 5.8 is integrated over a range of distances (roughly 14.5 < rP1 < 22, or ∼5−120 kpc for RR Lyrae). Since it is reasonable to expect variations in purity and completeness as a function of distance (or magnitude), a more detailed analysis is needed. It is likely that the classification becomes more uncertain as sources get fainter and light curves become sparser and noisier.

To quantify the heliocentric distance dependence of the RR Lyrae classification, the training set’s ground truth used for verification was split into sources at ∼40 kpc (14.5 < rP1 < 18.5) and at ∼80 kpc (19.7 < rP1 < 20.7). The completeness and purity obtained were then compared to those from the full sample, spanning roughly 14.5 < rP1 < 22. The rP1 = 18.5 mag brightness cut was used because the vast majority of halo RR Lyrae stars are located within that magnitude range (Sesar et al. 2010).
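The brightness split can be illustrated by evaluating purity and completeness separately within each magnitude range. The sketch below is an assumption-laden toy: the helper name `purity_completeness_by_bin`, the eight sources, and the treatment of the range limits as strict inequalities are all illustrative choices, not taken from the actual pipeline.

```python
# Magnitude ranges in rP1 as quoted in the text (distance labels from the text).
MAG_BINS = {
    "all": (14.5, 22.0),
    "~40 kpc": (14.5, 18.5),
    "~80 kpc": (19.7, 20.7),
}


def purity_completeness_by_bin(r_mag, scores, truth, threshold, bins=MAG_BINS):
    """Purity and completeness of the cut score >= threshold per magnitude bin."""
    results = {}
    for name, (bright, faint) in bins.items():
        # Restrict the ground truth to sources inside this magnitude range.
        inside = [(s, y) for r, s, y in zip(r_mag, scores, truth)
                  if bright < r < faint]
        picked = [y for s, y in inside if s >= threshold]
        n_true = sum(y for _, y in inside)
        purity = sum(picked) / len(picked) if picked else float("nan")
        completeness = sum(picked) / n_true if n_true else float("nan")
        results[name] = (purity, completeness)
    return results


# Toy sources (invented): apparent magnitude, class probability, true class.
r_mag = [15.0, 17.0, 18.0, 18.2, 20.0, 20.3, 20.5, 21.5]
scores = [0.90, 0.50, 0.40, 0.10, 0.60, 0.30, 0.20, 0.35]
truth = [1, 1, 0, 1, 1, 0, 1, 0]
results = purity_completeness_by_bin(r_mag, scores, truth, threshold=0.27)
for name, (purity, completeness) in results.items():
    print(name, round(purity, 2), round(completeness, 2))
```

Applying one fixed threshold to every bin mirrors the comparison made in the text, where the same pRRLyrae cut is evaluated at different distances.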

The resulting purity and completeness are given in Fig. 5.9 as well as in Tables B.3 to B.5. At a heliocentric distance of ∼40 kpc, for a completeness of 0.8, a purity of 0.86 can be reached, using a pRRLyrae threshold of 0.27. At ∼80 kpc, the same threshold results in a completeness of 0.8 and a purity of 0.8. For a threshold of 0.06, the sample completeness is 0.98 and the purity 0.62 for sources at ∼40 kpc, while the completeness is 0.88 and the purity 0.52 for sources at ∼80 kpc. Because distant sources are of particular interest here, the following thresholds are used for further analysis: sources selected using pRRLyrae > 0.27 are referred to as “likely RR Lyrae”, whereas those selected using pRRLyrae ≥ 0.06 are referred to as “possibly RR Lyrae”.
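As a small illustration, the two thresholds can be turned into a labeling function. The function name `rrlyrae_label` is hypothetical. The inequalities follow the text (strict for 0.27, inclusive for 0.06). Returning only the most restrictive label is a design choice of this sketch, since every "likely" source also satisfies the "possibly" cut.

```python
LIKELY_RRLYRAE = 0.27    # sample selected with pRRLyrae > 0.27
POSSIBLY_RRLYRAE = 0.06  # sample selected with pRRLyrae >= 0.06


def rrlyrae_label(p_rrlyrae):
    """Return the most restrictive candidate label for a class probability."""
    if p_rrlyrae > LIKELY_RRLYRAE:
        return "likely RR Lyrae"
    if p_rrlyrae >= POSSIBLY_RRLYRAE:
        return "possibly RR Lyrae"
    return None  # below both thresholds: not a candidate


for p in (0.50, 0.27, 0.06, 0.01):
    print(p, rrlyrae_label(p))
```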

To quantify the magnitude dependence of the QSO classification, a subsample of the training set’s QSOs was used, selected by 14.5 < rP1 < 20. The completeness and purity obtained were then compared to those obtained from the full sample. As for RR Lyrae, a higher purity at the same level of completeness can be reached for brighter QSOs. The resulting purity and completeness are given in Fig. 5.10, as well as in Tables B.1 and B.2.

Again using a threshold that results in a completeness of 0.8, a purity of 0.8 can be reached with pQSO ≥ 0.56. Using pQSO ≥ 0.31 instead results in a sample purity of 0.75 and a completeness of 0.88.
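Choosing the threshold that corresponds to a target completeness can be sketched as a scan over the threshold grid from the strict end downwards. The function name `threshold_for_completeness` and the toy data are assumptions made for illustration; the grid step of 0.001 matches the one used for the plots.

```python
def threshold_for_completeness(scores, truth, target, step=0.001):
    """Return the largest grid threshold whose completeness reaches `target`."""
    n_steps = round(1 / step)
    n_true = sum(truth)
    # Scan from the strictest cut downwards; completeness can only grow as the
    # threshold is lowered, so the first hit is the largest valid threshold.
    for i in range(n_steps, -1, -1):
        threshold = i / n_steps
        recovered = sum(y for s, y in zip(scores, truth) if s >= threshold)
        if recovered / n_true >= target:
            return threshold
    return 0.0


# Toy data (invented): eight sources, five of them true class members.
scores = [0.95, 0.80, 0.70, 0.56, 0.40, 0.31, 0.20, 0.10]
truth = [1, 1, 0, 1, 1, 0, 1, 0]
threshold = threshold_for_completeness(scores, truth, target=0.8)
print(threshold)
```

The purity at the returned threshold then follows from the ground truth of the selected sample, which is how pairs such as the quoted threshold, completeness and purity values are related.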

For further analysis, the following thresholds are used:

Sources of the sample that can be selected using pQSO ≥ 0.56 are referred to as “likely QSO”,

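[Figure 5.9: RR Lyrae purity (vertical axis) versus completeness (horizontal axis), both from 0 to 1, with the corresponding pRRLyrae on the top axis (tick labels 1.0, 0.92, 0.82, 0.64, 0.3, 0.001). Curves: all (~5-120 kpc); 14.5 < rP1 < 18.5 (~40 kpc); 19.7 < rP1 < 20.7 (~80 kpc).]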

Figure 5.9 Trade-off between RR Lyrae purity and completeness with respect to total cross-matched sources for different brightness limits. The equivalent heliocentric distance for each range of apparent magnitude is indicated. At the top of the horizontal axis the relation between completeness and pRRLyrae is given. At the bright end, 14.5 < rP1 < 18.5, a significantly higher purity at the same completeness can be reached than for fainter sources. The purity-completeness curve integrated over the full magnitude range is indicated as a thick line. Tables of purity and completeness for the different brightness limits can be found in the Table Appendix, Tables B.3 to B.5.

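[Figure 5.10: QSO purity (vertical axis) versus completeness (horizontal axis), both from 0 to 1, with the corresponding pQSO on the top axis (tick labels 1.0, 0.88, 0.8, 0.68, 0.50, 0.004). Curves: all (~14.5 < rP1 < 22); 14.5 < rP1 < 20.]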

Figure 5.10 Trade-off between QSO purity and completeness with respect to total cross-matched sources for different brightness limits. At the top of the horizontal axis the relation between completeness and pQSO is given. As for RR Lyrae, a higher purity at the same level of completeness can be reached for brighter QSOs. The purity-completeness curve integrated over the full magnitude range is indicated as a thick line. Tables of purity and completeness for the different brightness limits can be found in the Table Appendix, Tables ?? to B.2.