
Chapter 7 Boosted Decision Trees

7.3 Training

7.3.1 Final Variable List

The output of the procedure described above forms the final variable list, which is used to train the BDTs. Each variable is discussed in turn below, with attention paid to any unique properties and to the origin of its discriminating power. The final variable list is summarised in Table 7.1.

$m_{\tau\tau}^{\text{MMC}}$

The invariant mass of the ditau system, as reconstructed by the MMC tool. The MMC tool is described in Section 5.2.7. Including the ditau mass may bias the BDTs to prefer resonances with a reconstructed mass around 125 GeV. This means the analysis specifically targets the recently discovered boson, rather than being a more general search.

One alternative would be to remove the mass from the BDTs entirely and then perform a two-dimensional fit to the mass and BDT score. Although investigated extensively, this approach was found to be significantly less sensitive than the one-dimensional fit to the BDT score. The primary reason is that the mass information allows the BDTs to distinguish more readily between signal and the irreducible $Z \to \tau\tau$ background.

The variable distribution for the VBF and boosted categories can be seen in the upper-left plot of Figure 7.1 and Figure 7.3 respectively.

$m_T$

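The transverse mass of the light lepton and the $E_T^{\text{miss}}$. This was defined above with the formula

$$m_T = \sqrt{2\, p_T^{\ell}\, E_T^{\text{miss}}\, (1 - \cos\Delta\phi)}, \tag{5.3}$$

where $\Delta\phi$ is the gap in $\phi$ between the lepton and the $E_T^{\text{miss}}$.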

The variable exhibits particularly good separation between signal events and $W \to \ell\nu$ events where a jet fakes the hadronic tau. In such events, the $E_T^{\text{miss}}$ and lepton both originate from the decay of the $W^{\pm}$ boson and tend to be emitted back-to-back in the transverse plane. This means $\Delta\phi$ is large and hence, from Equation 5.3, $m_T$ is large. For signal events the $E_T^{\text{miss}}$ is made up of three neutrinos from two tau decays, meaning the angle between the $E_T^{\text{miss}}$ and the lepton is typically smaller and the $m_T$ is reduced.
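As an illustration, Equation 5.3 can be evaluated with a minimal Python sketch. The function and argument names are assumptions made for this example and are not taken from the analysis code. For fixed lepton $p_T$ and $E_T^{\text{miss}}$, the value grows from zero at $\Delta\phi = 0$ to its maximum at $\Delta\phi = \pi$.

```python
import math


def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of the lepton and ETmiss (Equation 5.3).

    Inputs are assumed to share one energy unit (e.g. GeV) and the
    angles are in radians. The names are illustrative only.
    """
    # cos() is periodic, so the phi difference needs no wrapping here.
    dphi = lep_phi - met_phi
    m2 = 2.0 * lep_pt * met * (1.0 - math.cos(dphi))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, m2))


# Same magnitudes, different opening angles in phi.
for dphi in (0.0, math.pi / 2.0, math.pi):
    print(dphi, transverse_mass(40.0, dphi, 40.0, 0.0))
```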

The variable distribution for the VBF and boosted categories can be seen in the upper-right plot of Figure 7.1 and Figure 7.3 respectively.

$\Delta R_{\tau\ell}$

The angular separation between the tau and the lepton, defined using the formula

$$\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}. \tag{7.1}$$

For resonance decays, such as $H \to \tau\tau$ and $Z \to \tau\tau$, $\Delta R_{\tau\ell}$ tends to be small. This is because the tau and lepton originate from the same parent particle. No such restriction applies to the other backgrounds, so these events are more spread across the $\Delta R_{\tau\ell}$ spectrum. This provides reasonable separation between signal events and the fakes background, for both the VBF and boosted categories.
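Equation 7.1 can be sketched in Python as follows. The helper names are hypothetical, and wrapping $\Delta\phi$ into $[-\pi, \pi)$ is an assumption of this sketch (the usual convention), not something stated in the text.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation of Equation 7.1."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Two objects either side of the phi = +/-pi boundary are close together,
# which the wrapping accounts for.
print(delta_r(0.0, 3.0, 0.0, -3.0))
```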

The variable distribution for the VBF and boosted categories can be seen in the middle-left plot of Figure 7.1 and Figure 7.3 respectively.

$E_T^{\text{miss}}$ $\phi$ Centrality

The use of continuous variables is preferred over boolean flags as they give the BDTs improved discriminatory powers.

The $E_T^{\text{miss}}$ $\phi$ centrality has a maximum value of $\sqrt{2}$ when the $E_T^{\text{miss}}$ is directly between the lepton and $\tau$ in $\phi$. It is equal to 1 if the $E_T^{\text{miss}}$ has a $\phi$ coordinate identical to the lepton or $\tau$ and is $< 1$ elsewhere.

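The equation for the $E_T^{\text{miss}}$ $\phi$ centrality ($C_\phi$) is

$$A = \frac{\sin(\phi_{E_T^{\text{miss}}} - \phi_\tau)}{\sin(\phi_\ell - \phi_\tau)}, \tag{7.2a}$$

$$B = \frac{\sin(\phi_\ell - \phi_{E_T^{\text{miss}}})}{\sin(\phi_\ell - \phi_\tau)}, \tag{7.2b}$$

$$C_\phi = \frac{A + B}{\sqrt{A^2 + B^2}}. \tag{7.2c}$$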

In $H \to \tau\tau$ (and $Z \to \tau\tau$) decays, the $E_T^{\text{miss}}$ will typically be located between the hadronic tau and the lepton. This is because the two tau decays, which are the origin of the hadronic tau and lepton, both contribute to the $E_T^{\text{miss}}$. The $E_T^{\text{miss}}$ $\phi$ centrality therefore tends towards high values for signal events, especially in the boosted category, and so gives separation between the signal and the non-resonant backgrounds.
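The limiting values quoted above can be checked with a short Python sketch of Equations 7.2a to 7.2c. The function name and the handling of the degenerate case (lepton and tau collinear or back-to-back in $\phi$) are assumptions of this example.

```python
import math


def met_phi_centrality(phi_met, phi_lep, phi_tau):
    """ETmiss phi centrality, Equations 7.2a-7.2c (angles in radians)."""
    denom = math.sin(phi_lep - phi_tau)
    if denom == 0.0:
        # The formula is undefined when the lepton and tau are collinear
        # or exactly back-to-back in phi; raising is a choice made here.
        raise ValueError("lepton and tau are degenerate in phi")
    a = math.sin(phi_met - phi_tau) / denom  # Equation 7.2a
    b = math.sin(phi_lep - phi_met) / denom  # Equation 7.2b
    return (a + b) / math.hypot(a, b)        # Equation 7.2c


# Lepton at phi = 1.0 and tau at phi = 0.0, with the ETmiss placed
# midway between them, on the lepton, on the tau, and outside the pair.
for phi_met in (0.5, 1.0, 0.0, 2.0):
    print(phi_met, met_phi_centrality(phi_met, 1.0, 0.0))
```

With these inputs the midway case gives $\sqrt{2}$, the two aligned cases give 1, and the outside case gives a value below 1, matching the description of the variable.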

The variable distribution for the VBF and boosted categories can be seen in the middle-right plot of Figure 7.1 and Figure 7.3 respectively.

Dijet Variables ($m_{j_1,j_2}$, $\eta_{j_1} \times \eta_{j_2}$ and $\Delta\eta(j_1, j_2)$)

VBF events are characterised by two high-$p_T$ jets in opposite halves of the detector. In VBF events, two quarks each emit a vector boson and the two bosons fuse to produce the Higgs. The two quarks are from protons travelling in opposite directions in the LHC and are the source of the two high-$p_T$ jets. This origin helps explain the high momenta of the jets as well as their positions in opposite detector hemispheres. The three dijet variables aim to quantify such features. They tend to take on more extreme values for VBF signal events, providing reasonable discrimination against all backgrounds. The dijet mass ($m_{j_1,j_2}$) and $\eta$ gap ($\Delta\eta(j_1, j_2)$) are typically large while the $\eta$ product ($\eta_{j_1} \times \eta_{j_2}$) tends towards large negative values.
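A minimal Python sketch of the three dijet variables is given below. It assumes massless jets, so that $m^2 = 2 p_{T,1} p_{T,2} (\cosh\Delta\eta - \cos\Delta\phi)$, and it takes the $\eta$ gap as an absolute value. Both choices, and all names, are assumptions of this example rather than the definitions used in the analysis.

```python
import math


def dijet_variables(pt1, eta1, phi1, pt2, eta2, phi2):
    """Return (dijet mass, eta product, eta gap) for the two leading jets.

    Jets are treated as massless (an assumption of this sketch).
    """
    deta = eta1 - eta2
    dphi = phi1 - phi2  # cos() is periodic, so no wrapping is needed
    m2 = 2.0 * pt1 * pt2 * (math.cosh(deta) - math.cos(dphi))
    mass = math.sqrt(max(0.0, m2))
    return mass, eta1 * eta2, abs(deta)


# A VBF-like topology: forward jets in opposite hemispheres.
print(dijet_variables(60.0, 2.5, 0.3, 45.0, -2.0, 2.9))
# A more central pair in the same hemisphere, for comparison.
print(dijet_variables(60.0, 0.8, 0.3, 45.0, 0.2, 2.9))
```

In the first case the mass and $\eta$ gap are large and the $\eta$ product is negative, which is the pattern the dijet variables are designed to pick out.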

The variable distributions for the VBF category can be seen in the lower-left and lower-right plots of Figure 7.1 and the upper-left plot of Figure 7.2.

$\ell$ $\eta$ Centrality

The use of continuous variables is preferred over boolean flags as they give the BDTs improved discriminatory powers.

The $\ell$ $\eta$ centrality is at a maximum value of 1 when the lepton is directly between the two leading jets in $\eta$. This drops to $1/e$ when the lepton is aligned with one of the two leading jets and decreases further elsewhere. The exact formula is

$$C_\eta = \exp\left[ \frac{-1}{\left(\eta_{j_1} - \frac{\eta_{j_1} + \eta_{j_2}}{2}\right)^2} \left(\eta_\ell - \frac{\eta_{j_1} + \eta_{j_2}}{2}\right)^2 \right]. \tag{7.3}$$

VBF signal events typically have higher values, giving some separation against all backgrounds. This is because the Higgs, and hence the lepton, is generally produced centrally in the detector in VBF events. This is coupled with the fact that the two jets are expected to be in opposite halves of the detector. These two factors increase the chance that the lepton will be located between the two jets in $\eta$.
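Equation 7.3 can be sketched in Python to confirm the quoted values of 1 and $1/e$. The function name and the error raised when the two jets share the same $\eta$ are assumptions of this example.

```python
import math


def lepton_eta_centrality(eta_lep, eta_j1, eta_j2):
    """Lepton eta centrality of Equation 7.3."""
    midpoint = 0.5 * (eta_j1 + eta_j2)
    half_gap = eta_j1 - midpoint
    if half_gap == 0.0:
        # Undefined when both jets have the same eta; raising is a
        # choice made for this sketch.
        raise ValueError("the two jets have identical eta")
    return math.exp(-((eta_lep - midpoint) / half_gap) ** 2)


# Jets at eta = +2 and -2: lepton midway, on a jet, and outside the jets.
for eta_lep in (0.0, 2.0, 3.0):
    print(eta_lep, lepton_eta_centrality(eta_lep, 2.0, -2.0))
```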

The variable distribution for the VBF category can be seen in the upper-right plot of Figure 7.2.

$p_T^{\text{total}}$

The magnitude of the vector sum of the $p_T$ of the tau, lepton and two leading jets plus the $E_T^{\text{miss}}$. The combination of the tau and lepton $p_T$s and the $E_T^{\text{miss}}$ approximately reconstructs the Higgs candidate. Coupled with the two leading jets, this accounts for all the constituents of VBF production. Hence, the $p_T$ of the Higgs and two jets should balance and the vector sum should be around zero. This gives separation between VBF events and the background processes.
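The balance argument can be illustrated with a short Python sketch that sums the transverse momentum vectors component by component. The function signature and the example values are hypothetical and chosen only so that the system balances exactly.

```python
import math


def pt_total(objects, met, met_phi):
    """Magnitude of the vector pT sum of the visible objects plus ETmiss.

    `objects` is an iterable of (pt, phi) pairs for the tau, the lepton
    and the two leading jets (illustrative interface).
    """
    px = met * math.cos(met_phi) + sum(pt * math.cos(phi) for pt, phi in objects)
    py = met * math.sin(met_phi) + sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py)


# Higgs candidate (tau + lepton + ETmiss) recoiling against two jets.
visible = [(30.0, 0.0), (20.0, 0.0), (40.0, math.pi), (30.0, math.pi)]
print(pt_total(visible, 20.0, 0.0))  # close to zero for a balanced system
```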

The variable distribution for the VBF category can be seen in the lower plot of Figure 7.2.

$\sum |p_T|$

Sum of the $p_T$s of the tau, lepton and jets (if any) of the event. This is effectively a measure of the total activity in the event and is hence expected to be higher for events where the Higgs is boosted by a recoiling jet. The main sensitivity of this variable is due to its correlation with $\Delta R_{\tau\ell}$. More heavily boosted events (and therefore those with a higher $\sum |p_T|$) have a lower $\Delta R_{\tau\ell}$, as the boost makes the decay products more collimated. However, the Higgs has a higher mass than the $Z^0$ boson and this additional energy in Higgs decays decreases the collimating effect. This allows signal events to be more readily distinguished from the $Z \to \tau\tau$ background. The other backgrounds typically have fewer high-$p_T$ objects, providing some additional separation.
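For concreteness, the scalar sum can be written as a short Python sketch. The function name is hypothetical, and which jets enter the sum (for example, any $p_T$ threshold) is not specified here, so the caller is assumed to pass only the jets that qualify.

```python
def scalar_sum_pt(tau_pt, lep_pt, jet_pts=()):
    """Scalar sum of the pT magnitudes of the tau, lepton and jets.

    `jet_pts` may be empty, matching "jets (if any)" in the text.
    """
    return abs(tau_pt) + abs(lep_pt) + sum(abs(pt) for pt in jet_pts)


print(scalar_sum_pt(30.0, 25.0))                # event with no jets
print(scalar_sum_pt(30.0, 25.0, [80.0, 40.0]))  # event with two jets
```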

The variable distribution for the boosted category can be seen in the lower-left plot of Figure 7.3.

$p_T^{\ell} / p_T^{\tau}$

The $p_T$ ratio of the lepton and tau. In an $H \to \tau_\ell \tau_{\text{had}}$ event, the hadronically decaying tau's signature is a hadronic jet and a neutrino. The leptonically decaying tau produces a lepton alongside two neutrinos. The additional neutrino in the leptonic case suggests that this $p_T$ ratio should typically be slightly below one. This feature provides separation between signal events and the reducible backgrounds.

The variable distribution for the boosted category can be seen in the lower-right plot of Figure 7.3.