4.4 Towards a galaxy cluster selection function for eROSITA
4.4.2 Source classification, completeness and contamination
The estimation of the cluster selection function in X-ray surveys requires careful source classification and an accurate determination of the sample completeness and contamination. To estimate these quantities for eROSITA, this work adopts the methodology of Pacaud et al. (2006). This approach consists of three steps. First, the output parameter space of the maximum likelihood fitting is explored to set point-like and extended source selection criteria. Second, the source detection efficiency is determined. Third, the contamination by spurious or misclassified sources is estimated.
Point-like sources
AGNs are the dominant extragalactic population at X-ray wavelengths. The goal of this study is to determine the galaxy cluster selection function. Even so, estimating the point-like detection efficiency and contamination helps to control the systematics in the detection and characterisation of the extended source population.
When running the eSASS detection pipeline over the “AGN+Background” simulations of both sky regions, a high spurious detection rate is found: 17 ± 1 and 18 ± 2 spurious point-like detections per square degree for Equatorial and Intermediate fields, respectively (see “AGN+Background” fields in Table 4.7). This contamination level is too high and a different strategy has to be followed in order to reduce it. Moreover, these simulations are contaminated by some spurious sources classified as extended: 0.7 ± 0.3 and 3.0 ± 0.5 spurious extended detections per square degree for Equatorial and Intermediate fields, respectively.
Figure 4.18: Determination of the eSASS pipeline selection criteria for point-like sources. The selection is performed in the count rate − minimum detection likelihood plane for both simulated sky regions of the point-like simulations (“AGN+Background”): Equatorial (left) and Intermediate (right). Simulated AGN are displayed as blue squares and spurious point-like detections as black triangles. The dashed line on the minimum detection likelihood defines the point-like source sample.

Figure 4.19: Determination of the eSASS pipeline selection criteria for extended sources. The selection is performed in the count rate − minimum detection likelihood plane and in the extent - extension likelihood plane for both simulated sky regions: Equatorial (left) and Intermediate (right). Simulated galaxy clusters are displayed as green diamonds, spurious extended detections as black triangles, and misclassified AGN as blue squares. The dashed lines define the optimal parameters for the extended source samples.

Exploring the ermldet output parameter space of the “AGN+Background” simulations shows that a simple threshold of 10 on the minimum detection likelihood removes most of the spurious point-like sources in the Equatorial field (dashed line in the left panel of Fig. 4.18). Choosing a similar threshold is harder for the Intermediate field. There, many spurious sources are detected, and they are difficult to disentangle from the true sources. In this case, the output parameter space of the “AGN+Background+Clusters” simulations is investigated instead. This is justified because the goal is to obtain extended source samples with low contamination. With a threshold of 20 on the minimum detection likelihood, the extended sources are much less contaminated by spurious detections (dashed line in the top-right panel of Fig. 4.19). The same value also works for the point-like-only simulations (right panel of Fig. 4.18). The parameters that remove most of the spurious detections are adopted as the optimal values.
The eSASS task ermldet was then run again on the erbox output (in map mode) of the “AGN+Background” simulations, this time with the optimized parameters (see optimal parameters in Table 4.6). The new runs give the following spurious detection rates for the Equatorial and Intermediate fields, respectively (see “AGN+Background” fields with optimal parameters in Table 4.7):
- point-like: 0.1 ± 0.1 and 1.5 ± 0.4 spurious detections per square degree;
- extended: 0.7 ± 0.3 and 0.8 ± 0.3 spurious detections per square degree.

These contamination levels are considerably lower than before, particularly for point-like sources.
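The contamination figures above are densities of spurious detections per square degree after a cut on the detection likelihood. The sketch below is a minimal, hypothetical illustration of how such a density could be computed from cross-matched catalogues. It is not the code used in this work. It makes three assumptions:
- each simulated realisation provides the detection likelihood of every detection of one class (point-like or extended);
- each realisation also provides a flag marking detections without an input counterpart;
- the quoted uncertainty is the sample standard deviation over the realisations.

The function and argument names are invented for illustration.

```python
import numpy as np


def spurious_density(det_like, is_spurious, area_deg2, min_det_like):
    """Mean and scatter of the spurious-detection density (per deg^2).

    Hypothetical inputs, one entry per simulated realisation:
      det_like    -- detection likelihoods of all detections of one class
                     (e.g. point-like) in that simulation
      is_spurious -- True where a detection has no input counterpart
    area_deg2     -- sky area covered by one simulation, in deg^2
    min_det_like  -- threshold on the minimum detection likelihood
    """
    rates = []
    for dl, sp in zip(det_like, is_spurious):
        kept = np.asarray(dl, dtype=float) >= min_det_like  # apply the cut
        n_spurious = np.count_nonzero(kept & np.asarray(sp, dtype=bool))
        rates.append(n_spurious / area_deg2)
    rates = np.asarray(rates)
    # Assumed uncertainty: sample standard deviation over the realisations
    return rates.mean(), rates.std(ddof=1)
```

The same function can be applied separately to the point-like and extended catalogues of each field, before and after changing the threshold.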
The top panels of Fig. 4.20 show the resulting detection efficiency for AGN (i.e. point-like sources) as a function of input flux. The efficiency is the ratio of cross-identified objects to input sources. The displayed error is the standard deviation over the 15 simulations of each sky region. Point-like sources reach 90% completeness at a flux of ∼ 1.5 × 10⁻¹⁴ erg s⁻¹ cm⁻² in the Equatorial field and ∼ 1.2 × 10⁻¹⁴ erg s⁻¹ cm⁻² in the Intermediate field. The large error bars at bright fluxes mainly reflect the low number density of bright sources, as given by the AGN log N − log S distribution.
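The efficiency defined above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the pipeline code:
- for each simulation, the input fluxes are available together with a flag telling whether each input source was cross-identified;
- the flux binning is arbitrary;
- flux bins are sorted in increasing order;
- the "90% completeness flux" is taken as the lowest bin centre from which every brighter non-empty bin reaches 90%.

The function names are hypothetical.

```python
import numpy as np


def detection_efficiency(input_flux, matched, bins):
    """Per-bin detection efficiency, averaged over simulations.

    input_flux -- list (one entry per simulation) of input-source fluxes
    matched    -- list of booleans, True if the input source was cross-identified
    bins       -- flux bin edges in erg s^-1 cm^-2
    Returns the mean efficiency and its standard deviation per bin.
    """
    effs = []
    for flux, ok in zip(input_flux, matched):
        flux = np.asarray(flux, dtype=float)
        ok = np.asarray(ok, dtype=bool)
        n_in, _ = np.histogram(flux, bins=bins)
        n_det, _ = np.histogram(flux[ok], bins=bins)
        with np.errstate(invalid="ignore", divide="ignore"):
            effs.append(n_det / n_in)  # NaN where a bin has no input source
    effs = np.asarray(effs)
    return np.nanmean(effs, axis=0), np.nanstd(effs, axis=0)


def completeness_flux(centres, eff, level=0.9):
    """Lowest bin centre from which every brighter (non-empty) bin reaches `level`."""
    centres = np.asarray(centres, dtype=float)
    eff = np.asarray(eff, dtype=float)
    valid = ~np.isnan(eff)  # skip bins without input sources
    centres, eff = centres[valid], eff[valid]
    for i in range(len(eff)):
        if np.all(eff[i:] >= level):
            return centres[i]
    return np.nan
```

Requiring all brighter bins to stay above the level guards against a single noisy faint bin that happens to cross 90%.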
The dashed lines in the top panels of Fig. 4.20 show the point-source flux limit predicted in the eROSITA Science Book (Merloni et al. 2012) for the corresponding exposure times. These predictions are based on an AGN model completely different from the one used in the synthetic simulations. Moreover, as presented in Section 4.1, the predicted flux limit corresponds to a 5σ detection within an aperture of 60″ diameter. A direct comparison is therefore difficult.
The middle panels of Fig. 4.20 show the differential flux distributions for the Equatorial and Intermediate fields. This representation allows conservative point-source flux limits to be set: ∼ 2 × 10⁻¹⁴ erg s⁻¹ cm⁻² for the Equatorial field and ∼ 1.5 × 10⁻¹⁴ erg s⁻¹ cm⁻² for the Intermediate field. Below these fluxes, the sample incompleteness becomes important. The log N − log S functions in the bottom panels of Fig. 4.20 confirm these limiting fluxes and the incompleteness below them.
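Comparing the input and detected number counts is what reveals where incompleteness sets in. The sketch below is a minimal, hypothetical illustration of the two representations used in Fig. 4.20: the differential counts dN/dS and the integral counts N(≥S) per square degree. It is not the code behind the figure. The flux binning, the evaluation of N(≥S) at the lower bin edges and the function name are assumptions.

```python
import numpy as np


def number_counts(flux, area_deg2, bins):
    """Differential dN/dS and integral N(>=S) per deg^2 for one catalogue.

    flux      -- fluxes of the sources (input or detected), erg s^-1 cm^-2
    area_deg2 -- area covered by the catalogue, in deg^2
    bins      -- flux bin edges; N(>=S) is evaluated at the lower edges
    """
    flux = np.asarray(flux, dtype=float)
    n, edges = np.histogram(flux, bins=bins)
    dn_ds = n / np.diff(edges) / area_deg2  # sources per unit flux per deg^2
    # Cumulative counts: number of sources at or above each lower bin edge
    n_cum = np.array([np.count_nonzero(flux >= s) for s in edges[:-1]]) / area_deg2
    return dn_ds, n_cum
```

Running it once on the input fluxes and once on the fluxes of the cross-identified sources gives the two curves whose divergence marks the limiting flux.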
Extended sources
Selecting extended sources is complicated because the selection has to deal both with spurious detections characterised as extended and with misclassified point-like detections. Extended sources are usually objects of low, spatially extended surface brightness, which also makes them difficult to detect. As with point-like sources, the aim is to find a region of the ermldet output parameter space where most of the simulated extended sources are recovered while the contamination stays at a reasonable level. This matters because a primary goal of eROSITA is to use galaxy cluster counts to constrain dark energy. That goal requires a pure and complete galaxy cluster sample.
Figure 4.20: Point-like source completeness analysis for both simulated sky regions: Equatorial (left) and Intermediate (right). Top panels: point-like detection efficiency as a function of input flux. The dashed line indicates the Merloni et al. (2012) flux prediction for a secure point-like detection. Middle panels: differential number counts as a function of input flux. Bottom panels: integral number of point-like sources as a function of input flux. In the middle and bottom panels, the continuous histogram shows the input distribution and the blue squares show the distribution detected by eSASS. The error is given by the standard deviation over the 15 simulations.
As a first step, the source contamination rate in the “AGN+Background+Clusters” simulations of the Equatorial and Intermediate fields is analysed (see “AGN+Background+Clusters” fields in Table 4.7). The spurious detection rates for the Equatorial and Intermediate sky regions, respectively, are:
- extended: 0.6 ± 0.2 and 2.0 ± 0.3 false sources per square degree;
- point-like: 18 ± 4 and 20 ± 2 false sources per square degree.

Two points stand out. First, the number of spurious extended sources in the Intermediate field is rather high. Such contamination cannot be tolerated in a galaxy cluster survey, since reliable cluster samples are a necessary condition for cosmological tests. Second, the number of spurious point-like sources is slightly higher than in the “AGN+Background” simulations. This happens because some galaxy clusters, especially those with low surface brightness or a very large extent, are easily split into more than one source. The detection algorithm then misclassifies these fragments as point-like, which increases the number of false point-like sources.
The ermldet output parameter space of the “AGN+Background+Clusters” simulations is shown in Fig. 4.19. It reveals that, in the Equatorial field, the threshold adopted for point sources (minimum detection likelihood of 10) also reduces the contamination of the extended source sample. This threshold excludes a number of extended sources, but a trade-off between sample completeness and contamination has to be made. For the simulated Intermediate field, the threshold on the minimum detection likelihood (20) is complemented by thresholds on the extent (3.5) and on the extension likelihood (7), which further reduce the number of spurious and misclassified sources. Running ermldet with these optimized parameters yields the following spurious detection rates for the Equatorial and Intermediate sky regions, respectively (see “AGN+Background+Clusters” fields in Table 4.7):
- extended: 0.5 ± 0.1 and 0.4 ± 0.2 false sources per square degree;
- point-like: 0.7 ± 0.7 and 1.7 ± 0.3 false sources per square degree.
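The field-dependent criteria described above can be written as a simple boolean selection. The sketch below is a hypothetical illustration and makes several assumptions:
- the cuts act on sources that ermldet has already flagged as extended;
- the cuts are inclusive (≥);
- the argument names `det_like`, `extent` and `ext_like` stand for the corresponding ermldet output columns.

None of this is taken from the eSASS configuration. Only the threshold values (10, or 20, 3.5 and 7) come from the text, and the extent is left in whatever units ermldet reports.

```python
import numpy as np

# Threshold values quoted in the text. Whether the cuts are inclusive (>=)
# and how they are stored in the pipeline configuration are assumptions.
CUTS = {
    "equatorial": {"det_like": 10.0, "extent": None, "ext_like": None},
    "intermediate": {"det_like": 20.0, "extent": 3.5, "ext_like": 7.0},
}


def select_extended(flagged_extended, det_like, extent, ext_like, field):
    """Boolean mask of candidates kept in the extended-source sample.

    flagged_extended -- True where ermldet classified the source as extended
    det_like, extent, ext_like -- ermldet output parameters (hypothetical names)
    field            -- "equatorial" or "intermediate"
    """
    cuts = CUTS[field]
    mask = np.asarray(flagged_extended, dtype=bool).copy()
    mask &= np.asarray(det_like, dtype=float) >= cuts["det_like"]
    if cuts["extent"] is not None:
        mask &= np.asarray(extent, dtype=float) >= cuts["extent"]
    if cuts["ext_like"] is not None:
        mask &= np.asarray(ext_like, dtype=float) >= cuts["ext_like"]
    return mask
```

Keeping the thresholds in a per-field table makes the completeness/contamination trade-off explicit: tightening a value in `CUTS` lowers contamination at the cost of excluding some true clusters.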
Figure 4.21 shows the extent - extension likelihood plane of the “AGN+Background+Clusters” simulations. The left panels of Fig. 4.21 show all the sources classified as extended by ermldet: true, spurious and misclassified. In the Equatorial field simulations, an uncontaminated sample can be obtained by selecting only extended sources with an extension likelihood greater than 170. No such sample can be defined for the Intermediate field simulations, because spurious detections are spread over the whole extent - extension likelihood plane.
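A cut such as the extension likelihood threshold of 170 in the Equatorial field can be motivated by asking above which value no contaminant remains. The sketch below is a hypothetical illustration of that idea and does not claim to reproduce how the value of 170 was chosen. It assumes the extended candidates from all realisations are pooled and labelled as true clusters or contaminants (spurious or misclassified). It returns the empirical minimum cut for that particular set of simulations, together with the number of true clusters that survive it.

```python
import numpy as np


def clean_ext_like_cut(ext_like, is_true_cluster):
    """Return (cut, n_kept): the lowest extension-likelihood value above
    which no spurious or misclassified source remains, and how many true
    clusters survive a strict `ext_like > cut` selection."""
    ext_like = np.asarray(ext_like, dtype=float)
    truth = np.asarray(is_true_cluster, dtype=bool)
    contaminants = ext_like[~truth]
    # With no contaminants at all, any cut (even none) gives a clean sample
    cut = contaminants.max() if contaminants.size else -np.inf
    n_kept = np.count_nonzero(truth & (ext_like > cut))
    return cut, n_kept
```

When contaminants populate the whole plane, as in the Intermediate field, the returned cut lies above almost every true cluster and `n_kept` collapses. This is the situation in which no clean sample can be defined.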
The middle panels of Fig. 4.21 show only the correctly detected galaxy clusters, colour coded by input core radius. The core radius values are spread over the whole extent - extension likelihood plane. Still, clusters with small core radii tend to have a smaller measured extent, while clusters with large core radii tend to have a larger one. The right panels of Fig. 4.21 show the same clusters colour coded by input flux. These plots show that the clusters detected and correctly identified are mostly bright (> 2 × 10⁻¹⁴ erg s⁻¹ cm⁻²). Fainter clusters are absent from these plots, but this does not mean they are not detected. They may simply have been misclassified, and they could be recovered by changing the ermldet input parameters. Changing the parameters, however, has consequences, for example a higher contamination level.
Figure 4.22 shows the detection efficiency of extended sources for the Equatorial and Intermediate field simulations. The top panels display the detection efficiency as a function of input flux for each simulated core radius. The dashed lines show the 7σ flux limit within a 3′ diameter aperture that the eROSITA Science Book (Merloni et al. 2012) predicts for a secure detection of extended sources. This theoretical flux is a source detection limit rather than a threshold for classifying a source as extended. The extended source detection efficiency measured in this work, by contrast, results from applying two thresholds, one on source detection and one on extension. A direct comparison between the predicted and measured flux limits is therefore not straightforward. Galaxy clusters with fluxes > 5 × 10⁻¹⁴ erg s⁻¹ cm⁻² and core radii > 80″ have a detection probability above 50% in both simulated sky fields. At lower fluxes and smaller core radii, the detection efficiency decreases rapidly.
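A compact way to summarise each efficiency curve in the top panels of Fig. 4.22 is the flux at which it first reaches 50%. The sketch below is a hypothetical helper, not part of the analysis described here. It assumes the efficiency of one core-radius curve is tabulated on flux bins sorted in increasing order, with no empty (NaN) bins. It interpolates linearly in log10(flux), an arbitrary but common choice for quantities spanning decades in flux.

```python
import numpy as np


def flux_at_efficiency(flux, eff, level=0.5):
    """Flux at which the efficiency first reaches `level`.

    Interpolates linearly in log10(flux) between the two bracketing bins.
    Assumes `flux` is sorted in increasing order and `eff` has no NaNs.
    """
    flux = np.asarray(flux, dtype=float)
    eff = np.asarray(eff, dtype=float)
    above = np.flatnonzero(eff >= level)
    if above.size == 0:
        return np.nan  # the level is never reached
    i = above[0]
    if i == 0:
        return flux[0]  # already above the level in the faintest bin
    # eff[i - 1] < level <= eff[i], so the denominator below is positive
    x0, x1 = np.log10(flux[i - 1]), np.log10(flux[i])
    y0, y1 = eff[i - 1], eff[i]
    return 10 ** (x0 + (level - y0) * (x1 - x0) / (y1 - y0))
```

Calling it once per simulated core radius gives a 50% flux for each curve, which makes the trend of efficiency with cluster size easy to tabulate.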
Figure 4.21: Determination of the eSASS pipeline selection criteria for extended sources with optimal (low-contamination) ermldet parameters. The extent - extension likelihood plane is shown for both simulated sky regions: Equatorial (top) and Intermediate (bottom). Left panels: simulated galaxy clusters are displayed as green diamonds, spurious extended detections as black triangles, and misclassified AGN as blue squares. Middle panels: only the detected input galaxy clusters (green diamonds in the left panels) are displayed; the colours show the different simulated core radii (in arcsec). Right panels: only the detected input galaxy clusters are displayed; the colours show the different simulated input fluxes (in units of erg s⁻¹ cm⁻²). The dashed line in the top panels defines the

Figure 4.22: Extended source detection efficiency of the eSASS pipeline in the Equatorial (∼ 1.6 ks exposure, left) and Intermediate (∼ 4 ks, right) simulated fields. Top panels: as a function of input flux for each simulated core radius value. Bottom panels: as a function of input flux and core radius.

The bottom panels of Fig. 4.22 show the galaxy cluster detection probability as a function of input flux and core radius for both simulated sky regions. In general, detecting and characterising extended sources with fluxes below 1 × 10⁻¹⁴ erg s⁻¹ cm⁻² seems rather difficult in both simulated fields. This does not necessarily mean that such galaxy clusters are not detected at all. It mainly implies that they do not satisfy the chosen extended source criteria. The Equatorial simulations have the lowest exposure time (∼ 1.6 ks). In them, ermldet has difficulty detecting and characterising low-surface-brightness sources (< 5 × 10⁻¹⁴ erg s⁻¹ cm⁻²) and very extended sources (rc > 70″). Owing to the longer exposure time, the extended source detection efficiency in the Intermediate field simulations is moderately higher than in the Equatorial field. The improvement is most noticeable for sources with large core radii, at least for the brightest galaxy clusters.
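The two-dimensional detection probability in the bottom panels of Fig. 4.22 is the fraction of input clusters recovered in each cell of the flux vs. core radius plane. The sketch below is a minimal, hypothetical illustration, not the code used for the figure. It assumes the inputs are the input fluxes and core radii of all simulated clusters, together with a flag telling whether each cluster passes the extended-source criteria. The bin edges are left to the user.

```python
import numpy as np


def detection_probability_map(flux, core_radius, detected, flux_bins, rc_bins):
    """Fraction of input clusters recovered in each (flux, core radius) cell.

    flux, core_radius -- input cluster properties (erg s^-1 cm^-2, arcsec)
    detected          -- True if the cluster passes the extended-source criteria
    Cells without input clusters are returned as NaN.
    """
    flux = np.asarray(flux, dtype=float)
    core_radius = np.asarray(core_radius, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    n_in, _, _ = np.histogram2d(flux, core_radius, bins=[flux_bins, rc_bins])
    n_det, _, _ = np.histogram2d(
        flux[detected], core_radius[detected], bins=[flux_bins, rc_bins]
    )
    with np.errstate(invalid="ignore", divide="ignore"):
        return n_det / n_in  # rows: flux bins, columns: core-radius bins
```

Because the result is indexed by both flux and core radius, it makes explicit that two clusters with the same flux but different sizes can have very different detection probabilities. A one-dimensional flux limit cannot capture this.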
As stated in Pacaud et al. (2006), and confirmed by the results of this work, the galaxy cluster detection efficiency is not a simple function of cluster flux. It depends strongly on both the cluster flux and its size (or morphology). According to these results, flux-limited samples can be incomplete, because cluster detection depends on cluster size. For bright sources, the detection probability in the Intermediate field somewhat resembles that of a flux-limited sample. Towards fainter fluxes, however, the picture becomes as complex as for the Equatorial field. The next section compares a flux-limited sample with a sample described by the results obtained here.