Tau Identification using Boosted Decision Trees

3.3 Tau Identification Techniques

3.3.3 Tau Identification using Boosted Decision Trees

The Boosted Decision Trees (BDT) Tau ID is described in detail in [126]. This method makes use of all the variables defined in 3.3.1 as input to a multi-variable technique which produces a

RE M Rtrk 1/ ftrk,1

Loose Cut 0.063 − 2.04e−4· pT+ 0.663/pT 0.134 − 2.72e−4· pT+ 0.0328/pT -

Medium Cut 0.0522 − 1.71e−4· pT+ 0.715/pT 0.134 − 2.72e−4· pT+ 0.0328/pT 8.33

Tight Cut 0.0306 − 1.03e−4_{· p}

T+ 0.819/pT 0.134 − 2.72e−4· pT+ 0.0328/pT 7.14

Table 3.3: Criteria for Safe Cuts Tau ID forτ1candidates with pT< 80 GeV. RE M Rtrk 1/ ftrk,1

Loose Cut 0.055 0.113 -

Medium Cut 0.0475 0.113 8.33

Tight Cut 0.0325 0.113 7.14

Table 3.4: Criteria for Safe Cuts Tau ID forτ1candidates with pT≥ 80 GeV.

RE M Rtrk 1/ ftrk,1

Loose Cut 0.179 − 5.17e−4· pT+ 0.339/pT 0.0565 − 1.44e−4· pT+ 0.695/pT 4.55

Medium Cut 0.162 − 4.62e−4· pT+ 0.447/pT 0.0303 − 9.9e−5· pT+ 0.810/pT 3.33

Tight Cut 0.0833 − 2.15e−4· pT+ 0.930/pT 0.0146 − 7.21e−5· pT+ 0.879/pT 2.5

Table 3.5: Criteria for Safe Cuts Tau ID forτ3candidates with pT< 80 GeV. RE M Rtrk 1/ ftrk,1

Loose Cut 0.142 0.0536 4.55

Medium Cut 0.13 0.0325 3.33

Tight Cut 0.0777 0.0198 2.5

Table 3.6: Criteria for Safe Cuts Tau ID forτ3candidates with pT≥ 80 GeV.

continuous discriminant output that later on is used to measure to what degree the τ candidate resembles a real tau jet.

The power of the BDT method relies on its higher discriminant power when compared to the individual variables used as input. In simple cut-based techniques, any candidate which fails the first cut is immediately discarded from the analysis sample. When using Decision Trees, candidates which fail a single cut are not immediately discarded from the sample, but are re- examined by the algorithm using all remaining variables. It is a multiple cut technique that allows to recover the signal candidates which would otherwise be lost in a simple cut method and remove background that would normally pass.

Decision trees are first constructed or “trained” using a sample of known signal and background composition called the training sample:

• The first step selects the variable and optimal cut value which provides the best signal to background separation and divide the sample in two subsets: those events that pass the cut and those that do not pass it.

• Then this process is repeated iteratively with the remaining variables until all have been used, or all subsets are pure signal or pure background or the number of events is too few to continue.

• Each decision point in the tree in which a variable and cut value are provided and the candidate is determined to either pass or fail it is called a node.

• At the end, if a subset has purity greater than 1/2 (or any defined cut value), then it is

labelled as signal and if the purity is less than 1/2, it is a labelled as background. Events are

classified signal if they land on a signal set and background if they land on a background set. The resulting tree is a decision tree and the ending data set are their leaves.

• For each event, the decision tree result (or classifier value) Di is equal to the purity of

the leaf on which the testing terminates. The more signal-like an event is, the closer its classifier value is to 1.

The key element of the BDT training process is therefore the criterion used to determine the optimal cut for each variable at each node. BDT training for τ identification uses the Gini index, defined as

iGINI= p(1 − p), (3.10)

where p is the signal purity at each node and (1-p) is the background purity. The purity of a sample at a given node is defined as

p= P sws P sws+ Pbwb , (3.11) whereP

swsandPbwbare the weighted sums of the signal and background events contained

within the node. Before training begins, events are weighted by the cross section for the sample divided by the number of events in the sample. Thus, the Gini index can be written as

iGINI= p(1 − p) = P sws Pbwb (P sws+ Pbwb)2 . (3.12)

The Gini index has a minimum of zero that occurs if p = 1 (all signal) or p = 0 (all back-

ground). The optimal cut minimizes the iGINI index at each node, therefore maximizing the

background purity, 1-p, on one side of the cut and maximizing the signal purity, p, on the other side of the cut. The optimization procedure to find the appropriate variable and cut value is well documented in [127].

Decision trees are a powerful but unstable tool to separate signal from background events. Small changes in the training sample can give a large change in the tree and the results. Also, as the number of nodes increases, the number of events available in each node decreases and

statistical fluctuations can affect the decision procedure and produce the tree to branch in a way

that does not truly correspond to the object under study.

A common way to address this weakness is through averaging several decision trees in order to determine the final probability that a candidate is signal. In addition to improving stability, the process generally improves performance, providing a better signal to background ratio of the sample. The averaging technique used for τ identification, known as “boosting”, uses the adaptive boosting algorithm AdaBoost [128], and proceed as follows:

• After passing a decision tree, if a training event with weight wi is misclassified, i.e, a signal event lands on a background leaf or a background event lands on a signal leaf, then the weight of that event is increased (boosted), in order to increase their importance during the next training. The weights of correctly classified candidates are not changed.

• The boosting parameter used to reweight the misclassified objects for the ith decision tree is defined by αi= β ln 1 − i i ! , (3.13)

where iis the error fraction of the ith decision tree which is the sum of the weights of the

misclassified event over the sum of the weights of all objects:

P wi,misclasified

P wi

, (3.14)

and β is a parameter set by the user which scales the amount by which the weights are

boosted/suppressed in the next tree. For τ identification it is set at β = 0.2.

• The new training sample with the new weights is used to build a new decision tree, in

which the order of the variables and the optimized cut values may differ from the previous

one. The new tree focus more on difficult cases and less on easy cases.

• The process is reiterated until no events in the training sample are misclassified.

The final classifier value, or BDT score, of an event after passing through the Boosted Deci-

sion Tree is the weighted sum of the individual tree results Di:

BDTscore= Ntrees X i

αi· Di. (3.15)

High scores mean the object is most likely signal and low scores that it is most likely background. By choosing a particular value of the score on which to cut, one can select a desired fraction of the signal or a desired signal to background ratio.

Boosted Decision Trees are widely used in high energy physics, and have already shown their

effectiveness, such as in the first evidence for single top quark production analysis by the D60

Collaboration [129].

For τ lepton identification, the BDT discriminant was trained using jets from data as back-

ground and τ leptons from Z → ττ MC as signal. For each τ candidate a BDT score (BDTj) is

calculated that has values that are more closer to 1.0 the more the candidate is like a real τ, and closer to 0.0 the less the candidate resembles a τ.

The cut in the BDTjhas to be such that it maximizes the acceptance on real τ leptons while

minimizing that of false τ jets. The optimized cut value for 2010 analysis was found to be

BDTj> 0.6 and it was increased to BDTj> 0.7 for 2011 analysis.

To remove electrons identified as τ leptons that may not have been identified as an electron we

use an additional criteria, relying on a Boosted Decision Tree output (BDTe) trained to separate

background. The most effective variables to construct the BDTeare E/p, the EM fraction and

the cluster-based shower width. As with any other BDT output, BDTetends to be near 1 (0) if the

τ candidate is a τ lepton (electron). The medium working point corresponds to 85% efficiency

for Z → ττ. This is an official veto supported by the ATLAS Tau Working Group which relies

4

Tau identification studies with early ATLAS

data.

In the Standard Model the top quark decays almost 100% of the times to a W boson and a b quark. Signal events produced from top quark pair production in the lepton plus tau channel:

tt→W(→e/µ+ νe/µ)bW(→τ+ ντ)b (4.1)

will exhibit the following experimental final state topology in the ATLAS detector (see Figure

4.1): two jets with high transverse energy ET, one isolated lepton (electron or muon) with high

transverse momentum (pT) from one W boson decay, one hadronically decaying τ from the other

Wboson decay and large missing transverse energy (Emiss

T ) from the undetected neutrinos.

The backgrounds affecting the t¯t(`, τ) signal can be classified in three types:

1. t¯t dileptonic and semileptonic decays where a jet or a lepton has been falsely identified as a tau jet

• tt→W(→e/µ+ νe/µ)bW(→qq0)b

• tt→W(→e/µ+ νe/µ)bW(→e/µ+ νe/µ)b

2. Non t¯t events with real tau leptons

• Z →τ+τ−_events.

• Diboson production WW, WZ, ZZ, where one boson decays into a lepton and the other into a tau.

• Single top, mainly from the Wt(→ Wb) production channel

3. Non t¯t events with a jet misidentified as a tau candidate

Figure 4.1: Scheme of a t¯t interaction decaying into two leptons. The final state in the detector is cha- racterized by two b-jets with high transverse energy ET, two isolated leptons (one of which will be an

hadronically decayingτ in the t¯t → ` + τ channel) and large missing transverse energy (Emiss

T ) from the

undetected neutrinos.

• W+ jet events

• Multi-jet events where one jet has been identified as a lepton and the other one as a tau jet.

The selection of t¯t → `+ τ events, described in Section 4.1, is based on kinematical require-

ments optimized for the 2010 data samples. After applying the event selection, the dominant

backgrounds are W+ jet and top pair production in the lepton plus jets channel. After applying

b-tagging t¯t lepton plus jets becomes the dominant background. Therefore, τ lepton identifica-

tion is the critical discriminator between signal and background. The two different identification

criteria described in Section 3.3, safe cuts and Boosted Decision Trees, are studied using 35

pb−1 of integrated luminosity data collected in 2010 to explore the feasibility of t¯t → `+ τ

measurements. The t¯t → `+ τ background from misidentified jets is estimated from data using

a simple matrix method technique described in Section 4.3.

4.1 t¯t →

` + τ event selection

The following event selection has been optimized to maximize the signal significance using Monte Carlo samples:

• Data event cleaning is needed in order to ensure the quality of object reconstruction. The Data Quality (DQ) selection within ATLAS is based upon inspection of a standard set of distributions that leads to a Data Quality assessment which is encoded in the so-called

DQ flags. DQ flags are issued for each detector, trigger, and for each physics object (jets, electrons, muons, etc.) to be reconstructed. Hence the state of the ATLAS detector, from hardware to physics object reconstruction, is expressed through the DQ flags, which are saved per luminosity block [130]. Luminosity blocks that pass the DQ selections are included in a Good Run List (GRL). Events not included in the GRL are discarded from the analysis.

• In order to ensure the event originated from the collision and to reject the non-collision background events, an event is required to have a primary vertex with at least five tracks

and is discarded if any jet with pT > 20 GeV fails the jet quality selections designed to

reject jets arising from out-of-time activity or calorimeter noise [130].

• There must be one, and only one, isolated electron (muon) and no identified muons (elec-

trons) for the e(µ)+ τ channel. This requirement vastly reduces the multi-jet background.

• The selected isolated lepton must be matched to a single lepton trigger: a single-electron

trigger with a pT threshold of 15 GeV or a single-muon trigger with a pT threshold be-

tween 10 and 13 GeV .

• There must be at least one loose τ candidate, as defined in Section 3.2.

• There must be at least two jets not overlapping with a τ candidate. In the case of two τ

candidates, we chose τ1over τ3as a τ candidate. If both are either τ1 or τ3, at least one

jet is required to be not overlapping with either τ candidate, and both candidates are kept.

• A cut on the transverse missing energy, Emiss

T > 30 GeV, is applied to reduce the QCD

multi-jet background. Multi-jet events have low Emiss

T since the only neutrinos produced

come from secondary decays inside the jets.

• A cut on the scalar sum of the lepton pT, the pTof the jets, the pTof the τ candidate, and

E_Tmiss, HT > 200 GeV, is applied to reduce the W + jets background.

• There must be at least one jet identified as b-jet (≥ 1 b-tag) by applying the b-tagging

requirement SV0 weight ≥ 5.85. The efficiency of this cut is 50% for b-jets from t¯t

decays.

cut t¯t(`, τ) t¯t(`+jets) t¯t(``0₎ _Z_+jets _W_+jets _{t(→ Wb)q} _Total _Data

Isolated e 49 ± 1 442 ± 2 83 ± 1 15627 ± 39 133705 ± 104 16 ± 0 149922 ± 111 320346 Trigger 48 ± 1 441 ± 2 82 ± 1 15514 ± 39 132806 ± 104 16 ± 0 148907 ± 111 315910 ≥ 1 τ candidate 23 ± 0 152 ± 1 17 ± 0 1333 ± 11 6378 ± 20 2 ± 0 7906 ± 23 27874 Njet≥ 2 20 ± 0 148 ± 1 14 ± 0 281 ± 5 841 ± 5 1 ± 0 1305 ± 7 3360 Emiss T > 30 GeV 17 ± 0 118 ± 1 12 ± 0 75 ± 3 567 ± 4 1 ± 0 791 ± 5 1069 P ET> 200 GeV 17 ± 0 117 ± 1 12 ± 0 60 ± 2 391 ± 3 1 ± 0 598 ± 4 708 ≥ 1 bjet 12 ± 0 79 ± 1 8 ± 0 2 ± 0 18 ± 1 0 ± 0 120 ± 1 166

Table 4.1: e+ τ1Cut Flow. t¯t(`, τ) are the expected signal events. t¯t(``0) are t¯t dilepton events with only

one lepton reconstructed as a lepton and a lepton reconstructed as aτ. t¯t(`+jets) are semileptonic events with one lepton and a jet reconstructed as aτ. After the b-jet requirement t¯t → ` + jet is the dominant background.

The number of MC and data events passing each step of the event selection are shown in

Table 4.1 for τ1candidates and Table 4.2 for τ3candidates for the e+ τ channel. The excess of

data over prediction is attributed to the QCD multi-jet background, which is in not included in

the MC simulation since QCD MC suffers from low statistics (especially in the e + τ channel)

and the detector simulation is insufficiently precise to model the rate of jets falsifying isolated leptons (electrons or muons) and tau candidates.

cut t¯t(`, τ) t¯t(`+jets) t¯t(``0₎ _Z_+jets _W_+jets _{t(→ Wb)q} _Total _Data

Isolated e 49 ± 1 442 ± 2 83 ± 1 15627 ± 39 133705 ± 104 16 ± 0 149922 ± 111 320346 Trigger 48 ± 1 441 ± 2 82 ± 1 15514 ± 39 132806 ± 104 16 ± 0 148907 ± 111 315910 ≥ 1 τ candidate 14 ± 0 189 ± 1 29 ± 0 1189 ± 11 8930 ± 24 5 ± 0 10356 ± 26 46406 Njet≥ 2 11 ± 0 179 ± 1 21 ± 0 289 ± 5 903 ± 5 1 ± 0 1404 ± 7 3817 Emiss T > 30 GeV 9 ± 0 142 ± 1 19 ± 0 67 ± 2 612 ± 4 1 ± 0 850 ± 5 1273 P ET> 200 GeV 9 ± 0 141 ± 1 18 ± 0 55 ± 2 456 ± 3 1 ± 0 680 ± 4 917 ≥ 1 bjet 7 ± 0 100 ± 1 13 ± 0 2 ± 0 26 ± 1 1 ± 0 148 ± 1 237

Table 4.2: e+ τ3Cut Flow. t¯t(`, τ) are the expected signal events. t¯t(``0) are t¯t dilepton events with only

Figure 4.2 shows the lepton and Emiss

T transverse mass distribution (Mt) before applying the

b-tagging requirement. The agreement between data and MC is quite good at high Mt, while at

low Mtthere is a significant excess in data. Since the QCD multi-jet background is expected to

have lower Mtvalues, the disagreement between data and MC, which is is larger at the first steps

of the event selection and gets smaller from the selection of at least one tau candidate onward, can be safely attributed to the multi-jet background.

After applying the event selections, the dominant background is W+ jets, followed by t¯t →

` + jet. The b-jet identification highly suppresses the W+ jets background. The dominant back-

ground after all event selections and b-tagging are applied is t¯t → `+ jet, where a jet has been

misidentified as a τ candidate. The t¯t semileptonic background constitutes 66% of the expected

background for τ1and 68% for τ3.

At this stage, the only discerning tool between signal and background is τ identification.

In document Dirigido por: Susana Cabrera Urbán. Tesis Doctoral Junio Facultat de Física Departament de Física Atòmica Molecular i Nuclear (Page 68-76)