5.5 Intermezzo: statistical analysis
5.6.1 The fit model
As mentioned throughout this work, a profile likelihood ratio fit to data is performed.
The distributions chosen for the signal regions of both the DIL and SL (resolved and boosted) channels are the classification BDT outputs, with the binning of the input distributions optimized to maximize the analysis sensitivity. Among the various control regions, only CR^{≥6j}_{t¯t+≥1c} and CR^{5j}_{t¯t+≥1c} use as discriminant the HT variable, defined as the scalar sum of the pT of all the jets, whereas all the others enter as a single-bin distribution, i.e. only the total number of events is used as input. The HT distribution is not used in the other CRs because studies on the blinded dataset showed pulls and constraints of some NPs beyond what is considered acceptable.
Only one signal strength parameter, common to both channels, is used in the fit. The t¯t + ≥ 1b and t¯t + ≥ 1c backgrounds, the two most important ones, are each assigned a free-floating normalization factor, κttb and κttc respectively, which are constrained only by the fit to data and allow the fit to absorb normalization mismodelling of the corresponding backgrounds. This is necessary because of the discrepancy between the observed data yields and the MC prediction, especially in the regions where the t¯t+HF component is predominant, as is known from previous studies [162, 163] and is visible in Figures 5.18a and 5.18c. In this way the signal extraction is not biased by a general underestimation of the predicted backgrounds.
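The role of the signal strength and of the two free-floating normalization factors can be summarized with a schematic binned likelihood. The expression below is only a sketch of the form commonly used for this kind of fit: the symbols n_i (observed events in bin i), S_i and B_i (predicted signal and background yields), θ (nuisance parameters for the systematic uncertainties) and the Gaussian constraint terms G are assumptions introduced here for illustration and are not taken from the text.

```latex
\mathcal{L}(\mu, \kappa_{ttb}, \kappa_{ttc}, \boldsymbol{\theta})
  = \prod_{i \in \mathrm{bins}}
    \mathrm{Pois}\!\left( n_i \,\middle|\,
      \mu\, S_i(\boldsymbol{\theta})
      + \kappa_{ttb}\, B_i^{t\bar{t}+\geq 1b}(\boldsymbol{\theta})
      + \kappa_{ttc}\, B_i^{t\bar{t}+\geq 1c}(\boldsymbol{\theta})
      + B_i^{\mathrm{other}}(\boldsymbol{\theta}) \right)
    \times \prod_{j} G(\theta_j)

q_\mu = -2 \ln
  \frac{\mathcal{L}\big(\mu, \hat{\hat{\kappa}}_{ttb}, \hat{\hat{\kappa}}_{ttc}, \hat{\hat{\boldsymbol{\theta}}}\big)}
       {\mathcal{L}\big(\hat{\mu}, \hat{\kappa}_{ttb}, \hat{\kappa}_{ttc}, \hat{\boldsymbol{\theta}}\big)}
```

In this notation the single hats denote the values that maximize the likelihood globally, while the double hats denote the values that maximize it for a fixed μ. The factors κttb and κttc carry no constraint term, which is what "free-floating" means here.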
The scheme used to incorporate the various sources of systematic uncertainties in the likelihood definition is of equal importance. The origins of these uncertainties include both experimental and theoretical sources, such as the reconstruction and identification of leptons and jets or the modelling of the signal and background processes.
[Figure 5.18, panels (a) to (d): event yields per region (Events / bin, logarithmic scale) with a Data / Pred. ratio panel, ATLAS, √s = 13 TeV, 36.1 fb−1. (a) Single Lepton, Pre-Fit; (b) Single Lepton, Post-Fit; (c) Dilepton, Pre-Fit; (d) Dilepton, Post-Fit. Legend: Data, t¯tH, t¯t + light, t¯t + ≥1c, t¯t + ≥1b, t¯t + V, Non-t¯t, Total unc.]
Figure 5.18: Comparison of predicted and observed event yields in each of the control and signal regions, in the single-lepton (top) and dilepton (bottom) channels before (left) and after (right) the fit to the data.
The systematic uncertainties can affect both the normalization and the shape of the various samples considered in the search, with the exception of the luminosity and cross-section uncertainties, which affect only the normalization. Nevertheless, the normalization uncertainties can and do modify the relative fractions of the different samples, which changes the shape of the total prediction for the final discriminant distribution under consideration.
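The last point can be illustrated with a toy example. The sketch below uses two hypothetical three-bin templates (the numbers are invented for illustration only) and shows that scaling one sample by a constant factor leaves its own shape untouched but changes the shape of the summed prediction.

```python
import numpy as np

def unit_shape(template):
    """Return the template normalized to unit area."""
    return template / template.sum()

# Hypothetical templates, for illustration only (not analysis yields).
ttb = np.array([10.0, 20.0, 30.0])
ttlight = np.array([60.0, 30.0, 10.0])

# A pure normalization variation of the ttb sample (+50 %).
ttb_varied = 1.5 * ttb

# The shape of the varied sample itself is unchanged ...
own_shape_unchanged = np.allclose(unit_shape(ttb), unit_shape(ttb_varied))

# ... but the shape of the total prediction is not.
nominal_total_shape = unit_shape(ttb + ttlight)
varied_total_shape = unit_shape(ttb_varied + ttlight)
total_shape_changed = not np.allclose(nominal_total_shape, varied_total_shape)
```
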
Individual sources of systematic uncertainty are considered uncorrelated with each other, whilst each source has a correlated effect across the boosted, single-lepton and dilepton channels, their regions and their samples. Furthermore, most of the experimental uncertainties are decomposed into several orthogonal components.
Lastly, if a systematic source changes the normalization or the shape of a sample in a given region by less than 0.5%, it is removed for that specific sample and region. This procedure, called pruning, is employed to simplify the model and speed up its evaluation. Detailed comparisons have been carried out and showed that the procedure does not change the results.
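A minimal sketch of such a pruning decision is given below. It assumes that the normalization and shape components are tested separately against the 0.5% threshold, and that the shape effect is measured as the largest relative per-bin change once the varied template has been rescaled to the nominal yield. Both choices, as well as the function names, are assumptions made for illustration; the text does not specify the exact definition used in the analysis.

```python
import numpy as np

PRUNING_THRESHOLD = 0.005  # 0.5 %, the value quoted in the text

def normalization_effect(nominal, varied):
    """Relative change of the total yield."""
    return abs(varied.sum() - nominal.sum()) / nominal.sum()

def shape_effect(nominal, varied):
    """Largest relative per-bin change at fixed total yield (assumed metric)."""
    rescaled = varied * nominal.sum() / varied.sum()
    filled = nominal > 0
    return np.max(np.abs(rescaled[filled] - nominal[filled]) / nominal[filled])

def prune(nominal, varied, threshold=PRUNING_THRESHOLD):
    """Return (keep_normalization, keep_shape) for one sample in one region."""
    keep_norm = bool(normalization_effect(nominal, varied) >= threshold)
    keep_shape = bool(shape_effect(nominal, varied) >= threshold)
    return keep_norm, keep_shape

# Hypothetical templates, for illustration only.
nominal = np.array([100.0, 200.0, 100.0])
tiny_variation = nominal * 1.002                    # 0.2 % yield change, no shape change
shape_variation = np.array([110.0, 200.0, 90.0])    # same yield, 10 % bin migration

decision_tiny = prune(nominal, tiny_variation)
decision_shape = prune(nominal, shape_variation)
```

With these toy inputs the first variation would be dropped entirely, while the second would be kept as a shape-only uncertainty.
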
Experimental uncertainties
The uncertainties related to the object reconstruction have been described in Chapter 3, thus only a brief description is reported here. The uncertainty on the total integrated luminosity for the combined 2015+2016 dataset is 2.1%. It is derived following a methodology similar to the one detailed in Ref. [155], from a calibration of the luminosity scale using x–y beam-separation scans performed in August 2015 and May 2016.
A variation in the pileup reweighting of MC events is included to cover the uncertainty in the ratio of the predicted and measured inelastic cross-sections in the fiducial volume defined by MX > 13 GeV, where MX is the mass of the hadronic system [156].
The jet energy scale uncertainty is derived by combining several pieces of information and is factorized into eight independent components. Further sources are considered, which are related to the jet flavour composition, pileup corrections and η-intercalibration, high-pT jets, jet energy resolution and the efficiency of the pileup suppression cut, as described in Section 3.3, for a total of 21 independent jet-related systematic uncertainties.
Correction factors that calibrate the efficiency of the flavour tagging algorithm to correctly identify the three flavour components are used in the analysis, and the uncertainties on these correction factors are considered as well. The pseudo-continuous b-tagging introduces complications due to the simultaneous use of several working points. The b-tagging efficiencies and mis-tag rates are first measured for the four working points separately and later combined in the calibration of the whole MV2c10 discriminant distribution, taking care to account for the correlations among the various MV2c10 bins. The uncertainties are then factorized into 30 independent components associated with the b-jet tagging efficiency, 15 components for the c-jets and 80 for the light-jets.
Lepton identification, isolation and reconstruction efficiencies, as well as trigger efficiencies and lepton momentum scale and resolution, have systematic uncertainties associated with them. These are measured in data, as explained in Section 3.2, and account for a total of 24 independent sources.
Lastly, uncertainties in the scale and resolution of the missing energy soft term are considered, for a total of three additional sources of systematic uncertainty.
Signal and t¯t modelling uncertainties
Two independent sources of uncertainty are associated with the t¯tH cross-section: the QCD scale uncertainty and the PDF+αS one [166–171], amounting to +5.8%/−9.2% (scale) and ±3.6% (PDF). In addition, uncertainties on the theoretical Higgs boson branching fractions are considered, which amount to 2.2% for the b¯b decay mode [166]. The last uncertainty on the t¯tH signal is associated with the choice of the parton shower and hadronization model, derived by comparing the nominal prediction to the one obtained with events generated by MADGRAPH5_aMC@NLO interfaced to Herwig++.
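An asymmetric normalization uncertainty such as the scale uncertainty quoted above has to be mapped onto a single nuisance parameter. The sketch below uses a piecewise-exponential interpolation, which is one common choice in profile likelihood fits; whether the analysis uses exactly this form is not stated in the text, and the function name and signature are hypothetical.

```python
def normalization_factor(theta, up, down):
    """Yield scale factor for a nuisance parameter value theta.

    up and down are the fractional +1 sigma and -1 sigma effects, both given
    as positive numbers (e.g. 0.058 and 0.092 for +5.8 % / -9.2 %).
    Piecewise-exponential interpolation is an assumption of this sketch.
    """
    if theta >= 0:
        return (1.0 + up) ** theta
    return (1.0 - down) ** (-theta)

# Scale uncertainty on the ttH cross-section as quoted in the text.
scale_up = normalization_factor(+1.0, up=0.058, down=0.092)
scale_down = normalization_factor(-1.0, up=0.058, down=0.092)
scale_nominal = normalization_factor(0.0, up=0.058, down=0.092)
```

By construction the factor is 1 at θ = 0, reproduces the quoted up and down effects at θ = ±1, and stays positive for any value of θ.
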
A 6% normalization uncertainty is considered for the inclusive t¯t production cross-section at NNLO+NNLL [133], which includes the effects from varying the factorization and renormalization scales, the PDF, αS and the top quark mass. This is the only systematic uncertainty that is correlated among the three t¯t + ≥ 1b, t¯t + ≥ 1c and t¯t + light categories.
The other t¯t modelling uncertainties either affect only one of the three t¯t + jets components or are considered uncorrelated among them, given that the t¯t + light component profits from precise measurements in data, while this is not the case for the other two components. In addition, the mass difference between the b- and c-quarks contributes to a difference between the two processes and to the choice of flavour scheme used for the PDF (4FS vs 5FS). The normalizations of t¯t + ≥ 1b and t¯t + ≥ 1c are allowed to float freely in the fit.
A comparison between the nominal sample and the SHERPA5F one provides the uncertainty associated with the choice of the inclusive t¯t generator for the simulation of the hard scatter, even though it is obtained by varying both the generator and the parton shower and hadronization model. In order to have a fair comparison, the SHERPA5F sample, along with all the other alternative samples, underwent the same reweighting procedure described in Section 5.2.2, i.e. the subcategories of the t¯t + ≥ 1b sample are scaled to match the predictions of SHERPA4F. Furthermore, the alternative samples are reweighted in such a way that their t¯t + ≥ 1b and t¯t + ≥ 1c fractions match those in the nominal sample.
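The fraction matching described above can be sketched as a set of per-category weights. The example below assumes that the total yield of the alternative sample is preserved by the reweighting and uses invented yields; the function name, the category labels and the numbers are hypothetical and serve only to illustrate the idea.

```python
def fraction_weights(nominal_yields, alternative_yields):
    """Per-category weights that give the alternative sample the nominal fractions.

    The total alternative yield is kept unchanged (an assumption of this sketch).
    """
    nominal_total = sum(nominal_yields.values())
    alternative_total = sum(alternative_yields.values())
    return {
        category: (nominal_yields[category] / nominal_total)
        / (alternative_yields[category] / alternative_total)
        for category in nominal_yields
    }

# Hypothetical yields, for illustration only.
nominal = {"ttb": 100.0, "ttc": 150.0, "ttlight": 750.0}
alternative = {"ttb": 60.0, "ttc": 90.0, "ttlight": 350.0}

weights = fraction_weights(nominal, alternative)
reweighted = {cat: alternative[cat] * weights[cat] for cat in alternative}
```

After the reweighting, any remaining difference between the nominal and the alternative templates reflects the kinematic modelling within each category rather than the flavour composition, which is instead handled by the free-floating normalization factors.
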
Similarly to what was done for the calibration of the Jet Vertex Charge, the parton shower and hadronization model uncertainty is derived by comparing the nominal POWHEG+PYTHIA8 sample with the predictions from POWHEG interfaced with Herwig7, whereas the uncertainty in the modelling of initial and final state radiation is assessed with two alternative POWHEG+PYTHIA8 samples with “up” and “down” variations. As an example, Figure 5.19 shows the effect of the generator uncertainty and of the uncertainty associated with the choice of parton shower model on the t¯t + ≥ 1b templates in the SR^{≥6j}_1 region.
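Uncertainties obtained from the comparison of two samples provide only one variation, so a second one has to be constructed. The sketch below assumes that the difference between the alternative and the nominal template is mirrored around the nominal one, which is consistent with the symmetric ±1σ effects displayed in Figure 5.19. The clipping of negative bin contents to zero, the function name and the toy templates are assumptions of this illustration.

```python
import numpy as np

def two_point_variation(nominal, alternative):
    """Build +1 sigma and -1 sigma templates from a two-sample comparison.

    The +1 sigma template is the alternative prediction; the -1 sigma template
    mirrors the same shift around the nominal one (assumed symmetrization).
    """
    shift = alternative - nominal
    up = nominal + shift
    down = np.clip(nominal - shift, 0.0, None)  # avoid negative yields
    return up, down

# Hypothetical templates, for illustration only.
nominal = np.array([100.0, 200.0, 300.0])
alternative = np.array([110.0, 190.0, 330.0])

up, down = two_point_variation(nominal, alternative)

# Relative effect on the total yield of the +1 sigma variation.
relative_norm_effect = (up.sum() - nominal.sum()) / nominal.sum()
```
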
Given the difficulties of describing the t¯t + ≥ 1c background from a theoretical point of view and the poor experimental guidance, an ad hoc uncertainty is applied to this background. The systematic uncertainty is derived by comparing the nominal sample to an NLO sample with t¯t + c¯c in the matrix element, including massive c-quarks (effectively a 3F scheme), produced with MADGRAPH5_aMC@NLO interfaced to Herwig++, as described in Ref. [182]. This uncertainty is related to the choice between the t¯t + c¯c ME calculation and the prediction from an inclusive t¯t sample, where the c-jets are mainly produced in the parton shower.
Due to the importance of the t¯t + ≥ 1b background, several additional specific uncertainties have been considered. Different descriptions of this process can be obtained either with a dedicated NLO calculation of the t¯t + ≥ 1b ME in a 4F scheme generator or with the nominal POWHEG+PYTHIA8 inclusive t¯t (5F) sample. A comparison of the two samples is used to derive the corresponding systematic uncertainty.
t¯t + B and t¯t + ≥ 3b subcomponents of the SHERPA4F sample are taken into account. They are derived by varying parameters internal to the SHERPA generator, as well as by considering two alternative PDF sets; these uncertainties are the ones contributing to the uncertainty band shown in Figure 5.3 for the SHERPA4F prediction. Additionally, a 50% normalization uncertainty is assigned to the t¯t + ≥ 3b process, given that the discrepancy between the 4F and the 5F predictions is not covered by the aforementioned systematic uncertainties.
Lastly, a 50% normalization uncertainty is considered on the MPI contribution, based on studies of different underlying-event sets of tuned parameters, as the fraction of this subcategory is not fixed in the alternative samples.
[Figure 5.19, panels (a) and (b): number of events versus classification BDT output (range −1 to 1) for the t¯t + ≥ 1b template in SR^{≥6j}_1, showing the nominal prediction and the ±1σ variations, with a lower panel giving (Syst. − Nom.)/Nom. in %. (a) ttb_Gen: ±6.9%; (b) ttb_PS: ±15.1%.]
Figure 5.19: Effect of the generator (a) and parton shower and hadronization (b) systematic uncertainties on the t¯t + ≥ 1b background template in the SR^{≥6j}_1 signal region.
Modelling of the other backgrounds
Among the non-t¯t processes, the W/Z+jets, single top and fakes backgrounds are the most important ones, even though they represent a minor fraction of the total background.
Two uncertainties are assigned to the W+jets cross-section: an overall 40% normalization uncertainty and an additional 30% normalization uncertainty only for events with heavy-flavour jets, which is uncorrelated between events with two and with more than two such jets. The Z+jets background has a 35% uncertainty, applied as uncorrelated between events with different jet multiplicities. These uncertainties are based on variations of the factorization and renormalization scales, as well as of the matching parameters in the SHERPA simulation.
The cross-sections of the three single-top production modes, namely the s-channel, the t-channel and the Wt-channel, are each assigned a +5%/−4% uncertainty [141–143]. An uncertainty in the amount of interference between Wt and t¯t production at NLO [145] is assessed by comparing the default “diagram removal” scheme to the alternative “diagram subtraction” scheme. The last two uncertainties on the single-top production are related to the choice of parton shower and hadronization model on one side and the amount of radiation on the other, for both the Wt and t-channels, for a total of four systematic uncertainties. They are evaluated by comparing the nominal samples with ad hoc samples that use alternative settings, in full analogy with what is done for the t¯t sample.
A 50% normalization uncertainty is assumed for the diboson background, which includes both the uncertainty on the inclusive cross-section and on additional jet production [150].
A 50% normalization uncertainty is assigned to the overall prediction of the fakes background, uncorrelated between the electron+jets and muon+jets channels, between regions with 5 and 6 jets, and between the resolved and boosted channels. In the dilepton channel, a 25% uncertainty is assigned to this background, correlated across lepton flavours and all analysis regions.
The uncertainty on the t¯t+W/Z NLO cross-section prediction is 15% [183]. Additional modelling uncertainties related to the choice of the matrix element generator, parton shower and hadronization are evaluated, as usual, by comparing the nominal t¯tV samples to alternative ones generated with SHERPA. A generic 50% normalization uncertainty is assigned to the t¯tt¯t background. The backgrounds from tZ, t¯tWW, tHjb and WtH are each assigned two normalization uncertainties related to PDF and scale variations, while tWZ is assigned a single cross-section uncertainty that accounts for both the scale and PDF effects.