Discriminating Variables - Boosted Decision Trees

4.5 Boosted Decision Trees

4.5.3 Discriminating Variables

Variables describing basic-object kinematics, composite-object kinematics, angular correlations, and event topology are considered as input variables for the BDT-classifier training. Previous iterations of this analysis [21, 107] used 37 variables, a choice that was partially inspired by [225, 226]. In this analysis, the set of variables is further reduced to 11 well-discriminating variables. This subset is chosen such that the overall signal-to-background separation of the BDT classifier remains nearly constant while the analysis itself becomes less complex.

This subsection is organized as follows. First, a short introduction to an indicator of signal-to-background separation power is given. After that, all 11 input variables used for the BDT training are discussed. Finally, all input variables and their separation power are summarized in table 4.6.

Definition of separation power. The receiver operating characteristic (ROC) curve (cf. fig. 4.13) illustrates the performance of a binary decision criterion at various thresholds of a particular variable. In the context of signal-to-background separation, the ROC curve is evaluated in terms of “signal efficiency” vs. “background rejection”. The binary decision criteria are “value is larger than” or “value is less than” a certain threshold.

The area under the receiver-operating-characteristic curve (AUC) serves as a key indicator that represents the “separation power” of a particular variable or classifier output. Since either of the two decision criteria can be chosen, the values of the AUC lie within the interval [0.5, 1.0]. A value of 0.5, equivalent to the diagonal in the signal-efficiency vs. background-rejection plot (cf. fig. 4.13), means no separation power at all, while AUC → 1.0 for well-separating variables.

The following should be kept in mind when using the AUC as a performance indicator.

The AUC is insensitive to symmetric variables, whereas BDTs are sensitive to symmetric variables during tree building. In order to calculate meaningful AUC values for the (symmetric) pseudo-rapidity distributions, the absolute distributions |η| are taken (cf. table 4.6). In the following figures, the AUC is always calculated w.r.t. the shown distribution, i.e. no additional transformations are applied.

The AUC is calculated w.r.t. the sum of (weighted) background contributions. This makes it in particular insensitive to variables in which the dominant background contributions, tt and W-boson-plus-jets events, have shapes that “envelop” the signal contribution. As an example, the sum of the transverse energies of all jets (fig. 4.19) appears to be a less powerful variable with an AUC of 50–55%. However, tt events have a harder spectrum than t-channel-signal events, and a much harder spectrum than W-boson-plus-jets events. A variable with a low overall AUC might still separate very well in tree nodes in which the background contribution is dominated by one particular process.

[Figure 4.13 plot: signal efficiency (x-axis, 0 to 1) vs. background rejection (y-axis, 0.2 to 1); MVA method: BDT, 82.12%.]

Figure 4.13: Exemplary receiver-operating-characteristic curve as obtained for a BDT training with TMVA [207]. The area under the curve is used as a measure of separation power.
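The AUC definition and the caveat about symmetric variables can be illustrated numerically. The following is a minimal sketch and not the TMVA implementation behind the figures: the function name `weighted_auc`, the simple threshold scan, and the toy η values are assumptions made for illustration. The sketch integrates background rejection over signal efficiency for the criterion “value is larger than the threshold” and, assuming that the better of the two decision directions is reported, maps the result into [0.5, 1.0].

```python
import numpy as np

def weighted_auc(sig, bkg, w_sig=None, w_bkg=None):
    """Area under the signal-efficiency vs. background-rejection curve."""
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    w_sig = np.ones_like(sig) if w_sig is None else np.asarray(w_sig, dtype=float)
    w_bkg = np.ones_like(bkg) if w_bkg is None else np.asarray(w_bkg, dtype=float)

    # Scan every observed value as a threshold for the cut "value > threshold".
    cuts = np.unique(np.concatenate([sig, bkg]))
    eff = np.array([w_sig[sig > c].sum() for c in cuts]) / w_sig.sum()
    rej = np.array([w_bkg[bkg <= c].sum() for c in cuts]) / w_bkg.sum()

    # Prepend the "no cut" point: all signal kept, no background rejected.
    eff = np.concatenate([[1.0], eff])
    rej = np.concatenate([[0.0], rej])

    # Trapezoidal integration of the rejection over the (decreasing) efficiency.
    auc = np.sum(0.5 * (rej[1:] + rej[:-1]) * (eff[:-1] - eff[1:]))

    # Assumption: the better of "larger than" / "less than" is reported.
    return max(auc, 1.0 - auc)

# Toy values (made up): forward signal jets, central background jets.
eta_sig = np.array([-2.6, -2.4, 2.4, 2.6])
eta_bkg = np.array([-0.3, -0.1, 0.1, 0.3])

auc_eta = weighted_auc(eta_sig, eta_bkg)                      # signed eta
auc_abs_eta = weighted_auc(np.abs(eta_sig), np.abs(eta_bkg))  # |eta|
```

In this toy case the signed η yields an AUC of 0.5 although the two shapes differ strongly, whereas |η| separates the samples perfectly.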

Discussion of input variables. The distinctive feature of t-channel production, the pseudo-rapidity of the light-jet hypothesis, is one of the most discriminating variables (fig. 4.14). Signal events have a jet that can be close to the beamline and that most probably has a pseudo-rapidity of |η| ≈ 2.5. Jets from background events are mostly found in the central part of the detector; their η distribution peaks at η ≈ 0.

In t-channel-signal events, the light-jet candidate often also carries a large amount of transverse momentum, because it balances the heavy top quark. The jet originating from the b quark of the top-quark decay usually has a large pT as well, due to the large top-quark mass. Nevertheless, the leading jet is still often close to the beamline (fig. 4.15). In combination with the light-jet η distribution, the leading-jet η adds valuable information about the pT ordering of all jets in an event.

[Figure 4.14 plots: left panel “Electron, 2 jets, 1 btags” (t-channel × 6.4), right panel “Muon, 2 jets, 1 btags” (t-channel × 5.0); legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.14: Shape comparison of the pseudo-rapidity η distribution of the light jet in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

[Figure 4.15 plots: x-axis η; left panel “Electron, 2 jets, 1 btags” (t-channel × 6.4), right panel “Muon, 2 jets, 1 btags” (t-channel × 5.0); legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.15: Shape comparison of the leading-jet η distribution in the “2 jets, 1 btag” signal category for events with electron (left) or muon (right) final states. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

Another input variable is the cosine of the angle between the reconstructed W boson and the W-boson-plus-leading-jet system (cf. [227]),

$$\cos^{*}(\text{W boson}, \text{leading jet}) := \cos\angle\left(\vec{p}_{W}^{\,(P_{W}+P_{\text{jet1}})},\ \left(\vec{p}_{W}+\vec{p}_{\text{jet1}}\right)^{\text{lab}}\right). \tag{4.12}$$

Here, jet1 refers to the leading jet, and W refers to the W boson. $\vec{p}_i$ is the momentum vector of particle i, and $P_i$ is the four-momentum of particle i. The superscript lab refers to the laboratory frame, and the superscript $(P_W + P_{\text{jet1}})$ refers to the rest frame of W boson plus leading jet. If both particles are back-to-back and $|\vec{p}_W| > |\vec{p}_{\text{jet1}}|$, then cos* → 1, while cos* → −1 for $|\vec{p}_W| < |\vec{p}_{\text{jet1}}|$. If the W-boson movement ($\vec{p}_W$) in the center-of-mass system of W boson and leading jet is orthogonal to the movement of the center-of-mass system in the lab frame, then cos* → 0 [227]. Thus, this variable is sensitive not only to the relative directions of W boson and leading jet, but also to their absolute-momentum ordering.
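The definition in eq. (4.12) can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code: four-vectors are stored as NumPy arrays (E, px, py, pz) in natural units, and the helper names `four_vector`, `boost_to_rest_frame`, and `cos_star` as well as the toy momenta are hypothetical.

```python
import numpy as np

def four_vector(px, py, pz, mass):
    """Build (E, px, py, pz) from a momentum vector and a mass (c = 1)."""
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return np.array([energy, px, py, pz], dtype=float)

def boost_to_rest_frame(p4, frame_p4):
    """Lorentz-boost p4 into the rest frame of the system frame_p4."""
    energy, mom = p4[0], p4[1:]
    beta = frame_p4[1:] / frame_p4[0]       # velocity of the system
    beta2 = beta @ beta
    if beta2 == 0.0:                        # system is already at rest
        return p4.copy()
    gamma = 1.0 / np.sqrt(1.0 - beta2)
    beta_dot_p = beta @ mom
    mom_new = mom + ((gamma - 1.0) * beta_dot_p / beta2 - gamma * energy) * beta
    return np.concatenate([[gamma * (energy - beta_dot_p)], mom_new])

def cos_star(w_p4, jet1_p4):
    """Cosine of the angle between the W momentum in the (W + jet1) rest
    frame and the momentum of the (W + jet1) system in the lab frame.
    Undefined (NaN) if the system is at rest in the lab frame."""
    system = w_p4 + jet1_p4
    w_rest = boost_to_rest_frame(w_p4, system)[1:]
    system_lab = system[1:]
    norm = np.linalg.norm(w_rest) * np.linalg.norm(system_lab)
    return float(w_rest @ system_lab / norm)

# Toy back-to-back configurations along the z axis (values in GeV, made up).
m_w = 80.4
cos_w_harder = cos_star(four_vector(0, 0, 100, m_w), four_vector(0, 0, -40, 0.0))
cos_jet_harder = cos_star(four_vector(0, 0, 40, m_w), four_vector(0, 0, -100, 0.0))
```

The two toy configurations reproduce the limiting cases quoted above: +1 when the W boson carries the larger momentum and −1 when the leading jet does.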

The angle between the reconstructed W boson and the W-boson-plus-leading-jet system is a variable that describes the event topology. It separates well between signal and background processes. Figure 4.16 shows the resulting distribution for simulated events. The distribution shows a distinctive peak at cos* → −1 for signal events and a much more even distribution for background events. About 33% of all t-channel events are found in the peak at cos* ≈ −1, with a steeply falling spectrum towards 0 and a nearly constant spectrum between 0 and 1. The spectrum for W-boson-plus-jets events is much smoother, with about 15% of all events in the peak at cos* ≈ −1, while the spectrum is nearly flat for tt events. The angular correlation mostly vanishes for tt events, since two high-pT jets arise from the b quarks in the top-quark decays, and because of the combinatorics of the two top quarks (and two W bosons). A small “bias” towards low values occurs due to the jet pT ordering used in the definition of cos*.

[Figure 4.16 plots: x-axis cos*(W boson, leading jet) from −1 to 1, y-axis Events / 0.08. Left panel: Electron, 2 jets, 1 btags, 1.56 fb⁻¹ at √s = 7 TeV, t-channel (× 6.4), AUC: 73.2%. Right panel: Muon, 2 jets, 1 btags, 1.17 fb⁻¹ at √s = 7 TeV, t-channel (× 5.0), AUC: 69.8%. Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.16: Shape comparison of the cosine of the angle between the reconstructed W boson and W-boson-plus-leading-jet system. Shown are events with electron final states (left) or muon final states (right) in the “2 jets, 1 btag” signal category.


Another variable that describes the topology of an event is the sphericity S. The following definition is given in ref. [226]. The sphericity tensor $S^{\alpha\beta}$ (3×3) is defined as

$$S^{\alpha\beta} = \frac{\sum_{\text{jets},\, l^{\pm}} p_i^{\alpha}\, p_i^{\beta}}{\sum_{\text{jets},\, l^{\pm}} |\vec{p}_i|^{2}}, \qquad \alpha, \beta = x, y, z, \tag{4.13}$$

in which the sums take into account all reconstructed (accepted) jets in an event and the charged lepton. $p_i^{\alpha}$ refers to the α-component of the momentum vector of particle i. The normalized eigenvalues of the sphericity tensor, λ₁, λ₂, and λ₃, are calculated and sorted in descending order,

$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \quad \text{with} \quad \lambda_1 + \lambda_2 + \lambda_3 = 1. \tag{4.14}$$

The sphericity S is a linear combination of the eigenvalues λ₂ and λ₃ and is calculated as

$$S = \frac{3}{2}\,(\lambda_2 + \lambda_3) \quad \text{with} \quad 0 \leq S \leq 1. \tag{4.15}$$
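Equations (4.13) to (4.15) translate directly into a short numerical routine. The following is a minimal sketch: the function name `sphericity` and the input layout (one row of (px, py, pz) per accepted jet or charged lepton) are assumptions made for illustration.

```python
import numpy as np

def sphericity(momenta):
    """Sphericity S = 3/2 (lambda_2 + lambda_3) of a set of momentum vectors.

    momenta: array-like of shape (n_objects, 3) holding (px, py, pz) of all
    accepted jets and the charged lepton.
    """
    p = np.asarray(momenta, dtype=float)
    # Sphericity tensor, eq. (4.13): sum_i p_i^a p_i^b / sum_i |p_i|^2.
    tensor = p.T @ p / np.sum(p * p)
    # The tensor is symmetric with unit trace, so its eigenvalues are real
    # and already sum to one, eq. (4.14). Sort them in descending order.
    eigenvalues = np.sort(np.linalg.eigvalsh(tensor))[::-1]
    # Eq. (4.15).
    return 1.5 * (eigenvalues[1] + eigenvalues[2])

# Limiting cases with made-up momenta:
s_pencil = sphericity([[0.0, 0.0, 50.0], [0.0, 0.0, -50.0]])        # collinear
s_isotropic = sphericity([[30.0, 0, 0], [0, 30.0, 0], [0, 0, 30.0]])  # isotropic
```

A collinear, pencil-like configuration gives S = 0, and equal momenta along three orthogonal axes give S = 1, matching the bounds in eq. (4.15).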

The expected sphericity distribution for simulated events at a center-of-mass energy of √s = 7 TeV is shown in fig. 4.17. The energy of t-channel events is mostly clustered in one direction, i.e. these events are highly aspherical. In background processes like tt production, the energy flow is more spherical and more evenly distributed over all three spatial dimensions than in t-channel events.

[Figure 4.17 plots: x-axis Sphericity from 0 to 1, y-axis Events / 0.04. Left panel: Electron, 2 jets, 1 btags, 1.56 fb⁻¹ at √s = 7 TeV, t-channel (× 6.4), AUC: 69.4%. Right panel: Muon, 2 jets, 1 btags, 1.17 fb⁻¹ at √s = 7 TeV, t-channel (× 5.0), AUC: 69.9%. Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.17: Shape comparison of the sphericity distributions in the “2 jets, 1 btag” signal category for events with electron (left) or muon (right) final states. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

The next category of input variables consists of variables that are related to the hadronic activity in the event. These are the sum of energies of all jets,

$$H(\text{jets}) = \sum_{\text{jets}} E_i,$$

the sum of transverse energies of all jets,

$$H_T(\text{jets}) = \sum_{\text{jets}} E_{T,i},$$

the angular separation between the two leading-pT jets,

$$\Delta R(\text{jet1}, \text{jet2}) = \sqrt{(\eta_{\text{jet1}} - \eta_{\text{jet2}})^{2} + (\phi_{\text{jet1}} - \phi_{\text{jet2}})^{2}},$$

and the mass of the hadronic final state (HFS), i.e. the mass of the composite N-jets system,

$$\text{mass(HFS)} = \text{mass}\Big(\sum_{\text{jets}} P_i\Big).$$

The mass of the HFS is also referred to as “dijet mass” in case of categories with exactly two jets.
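The four hadronic-activity variables defined above can be computed from the jet four-momenta as in the following sketch. Several choices here are assumptions and not taken from the analysis: jets are given as rows (E, px, py, pz), the transverse energy is taken as E_T = E sin θ, the azimuthal difference is wrapped into [−π, π), and the function name `hadronic_variables` is hypothetical.

```python
import numpy as np

def hadronic_variables(jets):
    """Return H(jets), H_T(jets), Delta R(jet1, jet2) and mass(HFS).

    jets: array-like of shape (n_jets, 4) with rows (E, px, py, pz);
    at least two jets that are not exactly along the beam axis.
    """
    jets = np.asarray(jets, dtype=float)
    energy, px, py, pz = jets.T
    pt = np.hypot(px, py)
    p = np.sqrt(pt**2 + pz**2)

    h = energy.sum()                     # sum of jet energies
    h_t = np.sum(energy * pt / p)        # assumption: E_T = E * sin(theta)

    # Angular separation of the two jets with the highest pT.
    j1, j2 = np.argsort(pt)[::-1][:2]
    eta = np.arctanh(pz / p)             # pseudo-rapidity
    phi = np.arctan2(py, px)
    d_eta = eta[j1] - eta[j2]
    # Assumption: wrap the azimuthal difference into [-pi, pi).
    d_phi = (phi[j1] - phi[j2] + np.pi) % (2.0 * np.pi) - np.pi
    delta_r = np.hypot(d_eta, d_phi)

    # Invariant mass of the summed four-momentum of all jets.
    total = jets.sum(axis=0)
    mass2 = total[0]**2 - np.sum(total[1:]**2)
    mass_hfs = np.sqrt(max(mass2, 0.0))

    return h, h_t, delta_r, mass_hfs

# Toy event (made up): two massless 50 GeV jets, back-to-back in the
# transverse plane.
h, h_t, delta_r, mass_hfs = hadronic_variables([[50.0, 50.0, 0.0, 0.0],
                                                [50.0, -50.0, 0.0, 0.0]])
```

For this toy event, H = H_T = 100 GeV, ΔR = π, and the dijet mass equals 100 GeV.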

Since the light jet is close to the beamline in t-channel events, the separation power of the variables related to the hadronic activity is enhanced. Jets in signal events tend to have higher energies than jets in background events, as shown by the H(jets) distribution in fig. 4.18.

Furthermore, higher dijet masses (mass(HFS)) are generated in t-channel events than in background events (cf. fig. 4.21). The dijet mass is highly correlated with the other variables describing the hadronic activity, e.g. with the angular separation ∆R. These correlations are taken into account by the BDT training. Additional separation power is gained from these correlations, as long as they differ between signal and background processes. This is the case for variables related to the hadronic activity, since their separation power is driven by the forward jet in t-channel events.

The angular separation between both jets (∆R(jet1, jet2)) is much broader for signal events than for background events (cf. fig. 4.20). The distribution peaks at ∆R ≈ 3 for both signal and background events; the cut-off at 0.5 is due to the jet-clustering parameter. W-boson-plus-jets events have a smaller second peak at ∆R ≈ 0.8. These jets are expected to be two narrow jets originating from a radiated gluon.

Figure 4.19 shows the H_T(jets) distribution, which at first glance does not separate very well. However, jets in tt events usually have more transverse energy than jets in t-channel events, and much more transverse energy than jets in W-boson-plus-jets events. Thus, only the summed (weighted) background contribution is balanced against the signal events, while the individual processes can still be separated from each other.


[Figure 4.18 plots: x-axis Σ jets: E (TeV) from 0 to 2, y-axis Events / 0.08 TeV. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.18: Shape comparison of the sum-of-energies distributions of all jets for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category.

[Figure 4.19 plots: x-axis in GeV. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.19: Shape comparison of the sum-of-transverse-energies distributions for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category.

[Figure 4.20 plots: x-axis ∆R(jet1, jet2) from 0 to 7, y-axis Events / 0.23. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.20: Shape comparison of the angular separation ∆R between the two leading jets for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

[Figure 4.21 plots: x-axis dijet mass (TeV) from 0 to 1, y-axis Events / 0.04 TeV. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.21: Shape comparison of the dijet-mass distribution for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

[Figure 4.22 plots: left panel “Electron, 2 jets, 1 btags” (t-channel × 6.4), right panel “Muon, 2 jets, 1 btags” (t-channel × 5.0); legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.22: Shape comparison of the lepton-pT distribution between signal and background events in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

The transverse momentum of the lepton (fig. 4.22) is much softer in t-channel events than in W-boson-plus-jets or tt events. Leptons from tt processes have the hardest transverse-momentum spectrum, since they are highly boosted in the transverse plane.

The reconstructed b-tagged-top-quark mass (fig. 4.23) peaks at ≈ 170 GeV/c² for both t-channel and tt events, while it peaks much lower, at ≈ 140 GeV/c², for W-boson-plus-jets events. The mass distribution of the reconstructed top-quark candidate has a broad tail towards high reconstructed masses for tt and W-boson-plus-jets events. For t-channel events, the distribution is much narrower, as mostly a correct combination of W boson and b-tagged-jet hypothesis is picked in these events.

The best-top-quark-mass distribution (fig. 4.24) is biased, by definition, to be close to the input value of 172.0 GeV/c² for all processes. The obtained distribution is narrower than the mass distribution of the b-tagged-top-quark candidate. This best-top-quark candidate is apparently less powerful for reconstructing a meaningful top-quark mass, but it provides important information about the assignment of the correct jet hypothesis due to its high correlations with the b-tagged-top-quark mass and other jet variables.

[Figure 4.23 plots: x-axis btagged-top-quark mass (GeV) from 100 to 600, y-axis Events / 20 GeV. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.23: Shape comparison of the reconstructed b-tagged-top-quark-mass distributions for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

[Figure 4.24 plots: x-axis best-top-quark mass (GeV) from 100 to 600, y-axis Events / 20 GeV. Left panel: Electron, 2 jets, 1 btags, t-channel (× 6.4). Right panel: Muon, 2 jets, 1 btags, t-channel (× 5.0). Legend: t-channel; tt, s-channel, tW; W/Z + jets, Diboson; QCD Multijet.]

Figure 4.24: Shape comparison of the reconstructed best-top-quark mass distributions, i.e. the mass reconstructed with the jet that yields a mass closest to 172.0 GeV/c², for events in the electron (left) or muon (right) “2 jets, 1 btag” signal category. Simulated background contributions are normalized to the SM prediction, and signal events are normalized to the same area.

Correlations among input variables. Figure 4.25 shows the linear correlation coefficients among all BDT-input variables. Most variables are only moderately correlated or even uncorrelated with each other. Meaningful correlations exist between the reconstructed top-quark-mass hypotheses and among the jet-separation variables (ΣE_T, ΣE, ∆R). Furthermore, the sphericity is (anti-)correlated with the dijet mass, the angular separation between both jets, the sum of energies of both jets, and cos*(W boson, leading jet). cos*(W boson, leading jet) is also correlated with the dijet mass and the sum of energies of both jets. Light-jet η and leading-jet η are correlated as well.

Linear correlations among variables for background events are of similar size to those for signal events, with a slightly different pattern, which can partially be explained by the more complex alternation of jet hypotheses in background events.
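Linear correlation coefficients in percent, as quoted in this paragraph, can be obtained from a weighted covariance matrix. The sketch below is an illustration only and does not reproduce the TMVA output: the function name `correlation_percent`, the optional per-event weights, and the toy inputs are assumptions. Signal and background samples would be passed separately to obtain the two matrices.

```python
import numpy as np

def correlation_percent(variables, weights=None):
    """Linear correlation matrix in percent.

    variables: array-like of shape (n_variables, n_events)
    weights:   optional non-negative per-event weights, shape (n_events,)
    """
    values = np.asarray(variables, dtype=float)
    cov = np.cov(values, aweights=weights)   # (weighted) covariance matrix
    sigma = np.sqrt(np.diag(cov))            # standard deviations
    return 100.0 * cov / np.outer(sigma, sigma)

# Toy inputs (made up): y is fully correlated with x, z fully anti-correlated.
x = np.array([1.0, 2.0, 3.0, 4.0])
corr = correlation_percent([x, 2.0 * x + 1.0, -x])
```

The diagonal entries are 100% by construction, and the toy matrix contains +100% for the (x, y) pair and −100% for the (x, z) pair.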

[Figure 4.25 plots: two correlation matrices with a color scale from −100 to 100, labeled “Linear correlation coefficients [%]”. Both axes list the input variables: dijet mass, best-top-quark mass, b-tagged-top-quark mass, leading-jet η, light-jet η, H_T(jets), H(jets), ∆R(jet1, jet2), Sphericity, Lepton pT, cos*(W boson, leading jet).]

Figure 4.25: Linear correlation coefficients for BDT-input variables for signal (left) and background events (right). The coefficients are shown, as an example, for events in the muon “2 jets, 1 btag” category.


Summary of discussion of input variables. All 11 input variables and their separation power are summarized in table 4.6. The most discriminating variables, as measured by the AUC, are the pseudo-rapidity of the jet that is closest to the beamline in an event, the cosine of the angle between the reconstructed W boson and the W-boson-plus-leading-jet system (cos*(W boson, leading jet)), and the sphericity of the event. For both signal and background events, most of the input variables are linearly uncorrelated with each other, or they have relatively low correlation coefficients. Input variables that are related to the hadronic activity in an event are correlated with each other, but they add significant discrimination power to the BDT training precisely because of these correlations.

Performance [AUC in %]

Input variable | “2 jets, 1 btag”, e | “2 jets, 1 btag”, µ | “3 jets, 1 btag”, e | “3 jets, 1 btag”, µ
η of the light jet | 73.0 | 73.3 | 74.8 | 74.9
Cosine of the angle between the rec. W boson and W-boson-plus-leading-jet system | 73.2 | 69.8 | 73.4 | 71.1
Sphericity | 69.4 | 69.9 | 72.8 | 74.0
Sum of the energies of all jets | 67.9 | 70.7 | 70.9 | 72.0
Dijet mass of the b-tagged-jet plus light-jet candidates | 65.3 | 67.9 | 70.4 | 71.0
Angular separation ∆R between leading two jets | 66.6 | 66.8 | 68.5 | 67.8
η of the leading jet | 60.2 | 59.1 | 61.8 | 60.4
Mass of the b-tagged-top-quark candidate | 64.8 | 56.4 | 59.9 | 58.0
Lepton pT | 61.7 | 56.9 | 59.8 | 57.3
Sum of the transverse energies of all jets | 55.0 | 50.8 | 55.9 | 54.3
Mass of the best-top-quark candidate | 55.4 | 51.5 | 54.4 | 53.5

Table 4.6: Input variables used for the Boosted-Decision-Tree trainings. Separate Boosted Decision Trees are trained for the electron “2 jets, 1 btag”, muon “2 jets, 1 btag”, electron “3 jets, 1 btag”, and muon “3 jets, 1 btag” categories.

In document Measurement of the t-channel single top quark production cross section and the CKM matrix element V tb with the CMS experiment (Page 117-127)