
4.4 Reconstruction Algorithms

4.4.10 Boosted Decision Trees

The boosted decision tree (BDT) [HSS+07] is one technique to combine several different variables into a single quality criterion. The BDT has to be “trained” on background and signal data. The training data are split into two parts by a cut on the variable with the best background-to-signal separation power, placed at the optimal cut position. Each part is split again with the variable that is most powerful in that data region. Iterating this procedure on each part forms a decision tree, each of whose leaves contains either mostly signal or mostly background events. The classification of an event x by a single tree i is therefore

$$
h_i(x) = \begin{cases} -1 & \text{if } x \text{ ends up in a background leaf,} \\ +1 & \text{if } x \text{ ends up in a signal leaf.} \end{cases} \tag{4.44}
$$
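The tree growing described above and the per-tree decision of Eq. (4.44) can be illustrated with a short Python sketch. It assumes the Gini index p(1 − p), a common separation measure for BDTs, as the "separation power"; the text does not state which criterion was used. All names (`gini`, `best_split`, `Node`, `classify`) are hypothetical, and sending events with x[var] < cut to the left daughter is an arbitrary convention. Applying `best_split` recursively to each daughter until a stopping criterion is met (not shown) would yield one tree.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def gini(w_sig: float, w_bkg: float) -> float:
    """Gini index p(1 - p) of a node with summed signal/background weights."""
    total = w_sig + w_bkg
    if total == 0.0:
        return 0.0
    p = w_sig / total
    return p * (1.0 - p)


def best_split(events: List[List[float]], is_signal: List[bool],
               weights: List[float]) -> Optional[Tuple[int, float, float]]:
    """Scan all variables and cut positions; return (var, cut, gain) of the best cut.

    Events with x[var] < cut go to the left daughter, all others to the right.
    gain = W*gini(parent) - W_L*gini(left) - W_R*gini(right), W = summed weight.
    """
    s_tot = sum(w for w, s in zip(weights, is_signal) if s)
    b_tot = sum(w for w, s in zip(weights, is_signal) if not s)
    parent = (s_tot + b_tot) * gini(s_tot, b_tot)
    best: Optional[Tuple[int, float, float]] = None
    for var in range(len(events[0])):
        # Sort events by this variable so every cut position is visited once.
        order = sorted(range(len(events)), key=lambda k: events[k][var])
        s_left = b_left = 0.0
        for pos in range(len(order) - 1):
            k = order[pos]
            if is_signal[k]:
                s_left += weights[k]
            else:
                b_left += weights[k]
            lo, hi = events[k][var], events[order[pos + 1]][var]
            if lo == hi:
                continue  # no cut fits between two identical values
            s_right, b_right = s_tot - s_left, b_tot - b_left
            gain = (parent - (s_left + b_left) * gini(s_left, b_left)
                    - (s_right + b_right) * gini(s_right, b_right))
            if best is None or gain > best[2]:
                best = (var, 0.5 * (lo + hi), gain)
    return best


@dataclass
class Node:
    """Internal node (var, cut, left, right) or leaf (label = +1 or -1)."""
    var: int = -1
    cut: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0  # +1 signal leaf, -1 background leaf, 0 = internal node


def classify(tree: Node, x: List[float]) -> int:
    """h_i(x) of Eq. (4.44): follow the cuts down to a leaf and return its label."""
    node = tree
    while node.label == 0:
        node = node.left if x[node.var] < node.cut else node.right
    return node.label
```

For example, a tree with a cut at 0.5 on the first variable followed by a cut at 3.0 on the second returns +1 only for events above both cuts.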

The weights of the events misclassified by tree i are multiplied by a factor $\alpha_i$ (hence “boosted”), and a new tree i + 1 is trained on the reweighted data. The factor $\alpha_i$ is calculated by the boosting algorithm (AdaBoost was used here; see Hoecker et al. [HSS+07] for details). Repeating this procedure creates a “forest” of N decision trees.
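A single boosting step can be sketched as follows. The sketch assumes the textbook AdaBoost rule α = (1 − ε)/ε, where ε is the weighted fraction of misclassified training events, and a renormalization that keeps the total event weight fixed; the actual implementation referenced in [HSS+07] may expose further tuning options that are not reproduced here. The function name `adaboost_reweight` is hypothetical.

```python
from typing import List, Tuple


def adaboost_reweight(weights: List[float], is_signal: List[bool],
                      h: List[int]) -> Tuple[float, List[float]]:
    """One AdaBoost step after training tree i.

    h[k] is the tree's output (+1 or -1) for event k. Returns the boost factor
    alpha = (1 - err) / err and the new event weights, in which misclassified
    events are multiplied by alpha and the summed weight is kept constant.
    """
    total = sum(weights)
    # An event is misclassified if the tree's verdict disagrees with its truth label.
    wrong = [(hk == 1) != s for hk, s in zip(h, is_signal)]
    err = sum(w for w, bad in zip(weights, wrong) if bad) / total
    if not 0.0 < err < 0.5:
        raise ValueError("AdaBoost needs 0 < err < 0.5 (better than random)")
    alpha = (1.0 - err) / err
    boosted = [w * alpha if bad else w for w, bad in zip(weights, wrong)]
    norm = total / sum(boosted)
    return alpha, [w * norm for w in boosted]
```

After this step the misclassified events carry exactly half of the total weight, which is what forces the next tree to concentrate on them.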

To calculate the BDT score of an event, one counts in how many trees the event ends up as signal and in how many it ends up as background. Each tree i is weighted by $\ln\alpha_i$, the logarithm of the boosting factor with which its misclassified events were reweighted. The event classification is then given as

$$
s_{\mathrm{BDT}}(x) = \frac{1}{N} \sum_{i \in \text{forest}} \ln(\alpha_i)\, h_i(x). \tag{4.45}
$$
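Eq. (4.45) translates directly into code. In this minimal sketch the boost factors α_i and the per-tree decisions h_i(x) are assumed to be given as plain lists; the function name `bdt_score` is hypothetical.

```python
import math
from typing import List


def bdt_score(alphas: List[float], h_values: List[int]) -> float:
    """Eq. (4.45): s_BDT(x) = (1/N) * sum_i ln(alpha_i) * h_i(x).

    alphas[i] is the boost factor of tree i and h_values[i] = h_i(x) in {+1, -1}.
    """
    if len(alphas) != len(h_values):
        raise ValueError("need exactly one h_i(x) per tree")
    n = len(alphas)
    return sum(math.log(a) * h for a, h in zip(alphas, h_values)) / n
```

An event classified as signal by every tree receives the largest possible score, the mean of the ln α_i; an event classified as background everywhere receives its negative.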

5 Cascade Analysis

The goal of the analysis is to find extraterrestrial neutrinos using neutrino-induced cascades. Electron neutrinos generate cascades in both neutral and charged current interactions. Following the flux prediction discussed in Sec. 2.2, it is a natural choice to optimize the analysis for an $E^{-2}$ electron neutrino spectrum. Atmospheric muons and atmospheric neutrinos from pion and kaon decays are considered as background. A possible component of the atmospheric neutrino spectrum generated by mesons containing charm quarks has not yet been detected and is therefore considered as another interesting signal.

Because the number of expected signal events is low, it is crucial to avoid experimenter bias. Therefore the analysis is performed in a blind manner (see Klein and Roodman [KR05]). The cuts of the analysis were optimized on Monte Carlo simulations. Around 10 % of the measured data were used as a “burn sample” to check and verify the simulation and are not used further in the search.

In this chapter the data and analysis technique are described. The results, including the systematic uncertainty calculation, are discussed in Chapter 6.

5.1 Data Samples

The analysis uses data taken during 2008 and 2009 with the 40-string configuration of IceCube (see Sec. 3.1). Only data runs with all strings operational, the in-ice detector component running, and no known issues¹ are considered. These runs were selected from a data run summary [R+10] compiled by the collaboration. As a further quality criterion, the rates after the off-line filtering (see Sec. 5.2.2) are used. It was found that the DAQ had problems during data taking in runs 111150, 113219 and 113241: events were recorded multiple times, so these runs were removed from the analysis entirely. The rates of the remaining runs are consistent with each other and follow the seasonal variations (see Fig. 5.1). Runs 112764, 112763 and 111770 were already lost at an early processing level² and are not included. Gaps of more than half a minute between consecutive events were observed. Such gaps made up less than 10 % of the run time in a few runs and less than 0.05 % of the total data-taking time (see Fig. 5.2); they were nevertheless taken into account in the livetime calculation. All runs whose run number ends in zero form the burn sample, to which the background simulation is compared; they are not used further in the analysis. This has the advantage that the burn-sample runs are well distributed over the full measurement time and thus include the seasonal variations (see Sec. 6.2). The burn sample has a livetime of 35 days; the remaining data have a livetime of 332 days.

¹ Common issues are, e.g., very short runs (less than 10 min) and an artificial light source switched on in the detector, see [R+07].

² Likely some failed copying or writing step during a collaboration-wide processing. It was not found worth
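The run bookkeeping described above (burn sample = runs whose number ends in zero; gaps longer than half a minute removed from the livetime) can be sketched in Python. The helper names, the input format (a sorted list of event times per run, in seconds) and the exact gap treatment (the whole gap is subtracted, and the run start and stop are treated like events) are assumptions for illustration, not the collaboration's actual livetime code.

```python
from typing import Dict, List, Tuple

GAP_THRESHOLD_S = 30.0  # gaps longer than half a minute are excluded


def is_burn_sample(run_id: int) -> bool:
    """Runs whose run number ends in zero form the burn sample."""
    return run_id % 10 == 0


def run_livetime(event_times: List[float], t_start: float, t_stop: float) -> float:
    """Run duration minus every inter-event gap longer than GAP_THRESHOLD_S.

    Assumption: event_times are sorted (seconds), and the run start and stop
    are treated like events, so long gaps at either end are removed as well.
    """
    times = [t_start] + event_times + [t_stop]
    dead = sum(b - a for a, b in zip(times, times[1:]) if b - a > GAP_THRESHOLD_S)
    return (t_stop - t_start) - dead


def split_livetime(runs: Dict[int, float]) -> Tuple[float, float]:
    """Sum per-run livetimes separately for the burn sample and the analysis sample."""
    burn = sum(t for r, t in runs.items() if is_burn_sample(r))
    rest = sum(t for r, t in runs.items() if not is_burn_sample(r))
    return burn, rest
```

Selecting by the last digit of the run number, rather than by a contiguous time range, is what spreads the burn sample over all seasons.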

The Monte Carlo simulations used in this work are listed in Tab. 5.1. The “basic MC” is used for the cut optimization described in this chapter. As described in Sec. 4, single atmospheric muon events and double and triple coincident events were simulated separately. Together they form the atmospheric muon prediction ($\mu_{\mathrm{atm}}$). As higher-energy events are a more important background for this analysis, the single muon simulation applies weights in order to over-sample higher energies (see Sec. 4.2). The neutrino data sets for electron, muon and tau neutrinos are generated with an $E^{-1}$ flux. After reweighting, they were used for the signal estimation and the atmospheric neutrino prediction. There is no well-established flux normalization for the signal; it was nevertheless desirable to calculate event numbers, so that comparisons with the background and between different stages of the analysis are possible. Therefore the $E^{-2}$ signal flux was normalized with $\Phi_0 = 5\cdot10^{-7}\,E^{-2}\ \mathrm{GeV\,s^{-1}\,sr^{-1}\,cm^{-2}}$, which corresponds to the limit on an all-flavor neutrino flux found in a diffuse search in five years of AMANDA data³ [A+11e]. For the conventional atmospheric neutrino flux ($\nu^{\mathrm{atm}}_{e,\mu}$), i.e. the flux generated by pion and kaon decay, the Bartol model [BGL+04] was used for reweighting the electron and muon neutrino data sets. For the component of the neutrino flux generated by mesons containing charm quarks, the so-called prompt flux ($\nu^{\mathrm{prompt}}_{e,\mu}$), the model from Sarcevic [ERS08] was applied. Compare Sec. 2.2.2 for a description of the atmospheric neutrino fluxes. For the neutrino cross sections, CTEQ5 [L+00] was used in the main analysis. However, the only available tau neutrino data set uses CSS [CS08] neutrino cross sections (see Sec. 4.1.1 and 6.2). The standard ice model for all simulation data sets is AHA (see Sec. 3.2).
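The reweighting of an $E^{-1}$-generated data set to a different spectrum can be sketched as standard importance weighting. This minimal Python sketch handles only the energy-spectrum part of the weight; the interaction probability, generation area and solid angle that a full simulation weight contains are deliberately left out, and the energy range 10² to 10⁸ GeV and all function names are hypothetical. Only the normalization value 5·10⁻⁷ is taken from the text.

```python
import math
import random
from typing import List


def sample_e_minus_1(n: int, e_min: float, e_max: float,
                     rng: random.Random) -> List[float]:
    """Draw n energies (GeV) from a pdf proportional to E^-1 on [e_min, e_max]."""
    # Inverse-CDF sampling: an E^-1 spectrum is uniform in log(E).
    return [e_min * (e_max / e_min) ** rng.random() for _ in range(n)]


def reweight_to_power_law(energies: List[float], e_min: float, e_max: float,
                          phi0: float, gamma: float) -> List[float]:
    """Per-event weights turning an E^-1 sample into a phi0 * E^-gamma spectrum.

    With the normalized generation pdf g(E) = 1 / (E * ln(e_max / e_min)),
    w_k = phi0 * E_k^-gamma / (N * g(E_k)), so that sum_k w_k * f(E_k)
    estimates the integral of phi0 * E^-gamma * f(E) over [e_min, e_max].
    """
    n = len(energies)
    log_range = math.log(e_max / e_min)
    return [phi0 * e ** (-gamma) * e * log_range / n for e in energies]


if __name__ == "__main__":
    rng = random.Random(1)
    energies = sample_e_minus_1(200_000, 1e2, 1e8, rng)  # hypothetical range
    w = reweight_to_power_law(energies, 1e2, 1e8, phi0=5e-7, gamma=2.0)
    # The two numbers should agree within the Monte Carlo statistical uncertainty.
    print(sum(w), 5e-7 * (1 / 1e2 - 1 / 1e8))
```

Because the generation spectrum is harder than the $E^{-2}$ target, high-energy events receive small weights while still being well populated, which is the usual motivation for generating with $E^{-1}$.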

Separate single-muon and electron-neutrino simulation sets were used for the BDT training (see Sec. 5.2.3). The systematic checks in Sec. 6.2 use MC with the SPICE1 ice model (see Sec. 3.2) instead of AHA, with DOM efficiencies 10 % above and below their nominal values, and with CSS cross sections instead of CTEQ5.

The flasher runs 111739 and 111741 were used for a cross-check of the analysis and are described in some detail in Sec. 6.1.

In document Search for neutrino-induced cascade events in the IceCube detector (Page 56-58)