Multijet background estimation from the same-sign sample

5.7 Background estimation

5.7.3 Multijet background estimation from the same-sign sample

The multijet background was not simulated with Monte Carlo but instead estimated from control regions in the data.

A common method for constructing a data-driven background model is to scale the data in a control region (or sideband) by an appropriate weight, measured from the ratio of events in another pair of control regions. It is often called the “ABCD method”, after the labels of the four control regions used in the estimate. Essentially, it applies a single-bin scale factor to a data sample that is expected to resemble the background being modeled. It is important that the variables used to select the control regions be largely uncorrelated, so that the model of the background is unbiased⁵⁸. Examples of uses of the ABCD method are plentiful in ATLAS, especially in

⁵⁸ The variables used to define the ABCD regions need to be uncorrelated for the background sample to be modeled, but contaminations in the control regions that are not the background of interest can have correlations so long as

Table 5.9: Scale factors for the jet to tau fake rate obtained in Z+jets events. The fake rate was about 3–7% in the 1-prong case and about 2–3% in the 3-prong case [180].

| Number of vertices | 1-prong medium tau | 3-prong tight tau |
|---|---|---|
| 1, 2 | 0.949 ± 0.220 | 0.855 ± 0.280 |
| > 2 | 0.626 ± 0.240 | 1.151 ± 0.436 |

Table 5.10: The predicted number of W+jets events in the signal region after all cuts, comparing estimates from the tau-by-tau scale factor and $k_W$ methods [180].

| Channel | Sample | Tau fake rate scale factors | $k_W$ |
|---|---|---|---|
| $\mu\tau_h$ | $W \to \ell\nu$ | 10.8 ± 0.8 (stat.) ± 2.6 (syst.) | 9.3 ± 0.7 (stat.) ± 2.0 (syst.) |
| $\mu\tau_h$ | $W \to \tau\nu$ | 4.1 ± 1.0 (stat.) ± 1.1 (syst.) | 3.6 ± 0.8 (stat.) ± 0.8 (syst.) |
| $e\tau_h$ | $W \to \ell\nu$ | 6.6 ± 0.6 (stat.) ± 1.6 (syst.) | 4.8 ± 0.4 (stat.) ± 1.2 (syst.) |
| $e\tau_h$ | $W \to \tau\nu$ | 2.0 ± 0.6 (stat.) ± 0.5 (syst.) | 1.5 ± 0.4 (stat.) ± 0.4 (syst.) |

| Channel | Sample | $\tau$ fake rate scale factor | $k_W$ |
|---|---|---|---|
| muon | $W \to \ell\nu$ | 10.8 ± 0.8 (stat.) ± 2.6 (syst.) | 9.3 ± 0.7 (stat.) ± 2.0 (syst.) |
| muon | $W \to \tau\nu$ | 4.1 ± 1.0 (stat.) ± 1.1 (syst.) | 3.6 ± 0.8 (stat.) ± 0.8 (syst.) |
| electron | $W \to \ell\nu$ | 6.6 ± 0.6 (stat.) ± 1.6 (syst.) | 4.8 ± 0.4 (stat.) ± 1.2 (syst.) |
| electron | $W \to \tau\nu$ | 2.0 ± 0.6 (stat.) ± 0.5 (syst.) | 1.5 ± 0.4 (stat.) ± 0.4 (syst.) |

Table 16: Numbers of events in the signal region after all cuts and after application of the $\tau$ fake rate scale factor or $k_W$ factor.

Figure 15: Schematic diagram of the control regions for the main multijet background estimation method (regions A, B, C and D, arranged by opposite sign versus same sign and isolated versus non-isolated).

• A: signal region with isolated lepton and the opposite sign requirement

• B: control region with isolated lepton and the opposite sign requirement reversed

• C: control region with inverted lepton isolation requirement and the opposite sign requirement

• D: control region with the opposite sign and isolation requirements inverted.

The four regions are illustrated schematically in Figure 15. This method takes advantage of the fact that the signal was composed almost exclusively of isolated leptons whose charges were opposite to the $\tau$ candidates’ charges, and therefore signal contributions could effectively be excluded in all control regions B, C and D.

All four regions had the same cuts applied except for the opposite sign and isolation requirements, keeping this method simple and reducing the number of systematic uncertainties. In each of the control regions an estimate for the number of QCD events was obtained by correcting for the $Z \to \ell\ell$, $t\bar{t}$ and diboson contributions as predicted from MC and for the $W \to \ell\nu$ and $W \to \tau\nu$ contributions by correcting the MC predictions using the $k_W$ normalisation factors discussed in Section 8.1:

$$N^{i}_{\mathrm{QCD}} = N^{i}_{\mathrm{Data}} - N^{i}_{Z\to\tau\tau} - N^{i}_{Z\to\ell\ell} - N^{i}_{t\bar{t},\,\mathrm{diboson}} - k_W\,\bigl(N^{i}_{W\to\ell\nu} + N^{i}_{W\to\tau\nu}\bigr), \quad \text{for } i = B, C, D \tag{10}$$

The leptons from the backgrounds $W \to \ell\nu$, $W \to \tau\nu$ and $Z \to \ell\ell$ were typically very isolated, like the signal, as discussed in Section 6.4. From Monte Carlo estimates this left regions C and D ∼99% QCD pure. These QCD-rich regions were used to measure the OS/SS ratio $R_{\mathrm{OS/SS}}$ for QCD, expected to be very close to unity.

Figure 4: Schematic diagram of the control regions for the main multijet background estimation method.

control region is defined, passing all cuts but requiring a lepton and a $\tau$ candidate of the same sign. The ratio of opposite-sign to same-sign events, $R_{\mathrm{OS/SS}}$, is calculated in separate control regions of inverted isolation, after subtracting all non-multijet backgrounds. It was found to be 1.1 ± 0.2 (stat.) ± 0.1 (syst.) for the muon channel and 1.2 ± 0.2 (stat.) ± 0.2 (syst.) for the electron channel. The estimate for the opposite-sign multijet background in the signal region is obtained by scaling the observed number of events in the primary control region with this ratio, after non-multijet background subtraction.

This method is limited by the low statistics in the primary control region. In the electron channel, the multijet estimate is 2.7 ± 2.4 (stat.) ± 0.7 (syst.) events, while in the muon channel 2.1 ± 2.4 (stat.) ± 0.4 (syst.) events are obtained. The estimated number of multijet events is thus in statistical agreement with the estimate obtained from the main method.

6 Systematic uncertainties

Several possible sources of systematic uncertainty on the background estimation have been studied. The systematic uncertainties can broadly be divided into two categories: those affecting the Monte Carlo predictions due to the imperfect modelling of the data by the simulations, and those arising from the methods used to perform the data-driven multijet background estimation. For the first category the $\tau$ candidate fake rate is the most important, followed by the energy scale uncertainty. For the multijet background estimation the statistical uncertainty on the number of events in the control regions gives a larger contribution to the total uncertainty than the systematic uncertainties on the method itself. All of the systematic uncertainties are summarized in Tables 4 and 5.

6.1 Systematic Uncertainties on Monte Carlo Predictions

The systematic uncertainties considered for the Monte Carlo predictions are described in the following. All of these uncertainties are applied to the $Z$ and $t\bar{t}$ samples, while only the energy scale uncertainty is applied to the $W$ samples, as these have been rescaled as described in Section 5.1 and thus are not susceptible to the other systematic uncertainties.

Lepton trigger efficiency: A systematic uncertainty of 2% is assigned to the muon trigger efficiency in the Monte Carlo predictions for the $Z$ and $t\bar{t}$ backgrounds to the muon channel, by taking the difference

Figure 5.28: Diagrams of the control regions for two ABCD methods for estimating the multijet background. The figure on the left shows the regions for the primary estimate. The figure on the right shows regions for the cross-check method [180].

first observations and measurements because it can be implemented simply and performs well in low-count scenarios by grouping the counts into only four bins to determine the normalization⁵⁹.

Two complementary ABCD methods, using different control regions, were used to estimate the multijet background. The first method took advantage of the fact that the multijet background is effectively symmetric between the samples with opposite-sign (OS) and same-sign (SS) charges for the lepton and tau candidate. This property is observed in dijet Monte Carlo samples⁶⁰ as well as the data. These samples were then divided into those that pass or fail the lepton isolation requirements, giving four regions, {A, B, C, D}, shown in Figure 5.28. A multijet-rich control region is defined to contain the events that fail the lepton isolation requirements, denoted by the union of regions C and D. The OS/SS ratio, $R_{\mathrm{OS/SS}}$, is measured in this control region and applied as a weight to the SS sample that passes lepton isolation (B), to predict the multijet background normalization in the signal region (A).

Stated more explicitly, the method relies on the assumption that the OS/SS ratio is the same among multijet events with isolated and non-isolated lepton candidates:

$$\frac{N^{A}_{\mathrm{multijet}}}{N^{B}_{\mathrm{multijet}}} = \frac{N^{C}_{\mathrm{multijet}}}{N^{D}_{\mathrm{multijet}}},$$

where $N$ is the number of multijet events in four statistically independent regions, denoted {A, B, C, D} and defined as follows:

• A: signal region with isolated lepton and opposite-sign tau candidate

• B: control region with isolated lepton and same-sign tau candidate

• C: control region with non-isolated lepton and opposite-sign tau candidate

• D: control region with non-isolated lepton and same-sign tau candidate.

they can be modeled and are preferably small so they can be subtracted. For example, the $Z \to \ell\ell$ background is obviously OS biased, as is the W+jets background.

⁵⁹ Another example of the use of the ABCD method can be found in the first ATLAS $W \to \tau\nu$ cross section measurement [189], as discussed briefly in Section 4.4.2.
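The region definitions above can be sketched as a small classifier. This is only an illustration: the function name `abcd_region`, the event tuples, and the use of a boolean isolation flag are assumptions made here, not part of the analysis code.

```python
from collections import Counter


def abcd_region(lepton_charge, tau_charge, lepton_isolated):
    """Return 'A', 'B', 'C' or 'D' for one lepton + tau-candidate pair.

    Assumes charges are given as +1/-1 and that the isolation decision
    has already been made upstream (hypothetical inputs).
    """
    opposite_sign = lepton_charge * tau_charge < 0
    if lepton_isolated:
        return "A" if opposite_sign else "B"
    return "C" if opposite_sign else "D"


# Hypothetical events: (lepton charge, tau charge, lepton passes isolation)
events = [
    (+1, -1, True),
    (-1, -1, True),
    (+1, -1, False),
    (+1, +1, False),
    (-1, +1, True),
]

# Count how many of the toy events land in each region.
counts = Counter(abcd_region(*event) for event in events)
print(dict(counts))
```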

Regions B, C, and D are nearly signal free, and the regions C and D are very multijet pure. The contamination from other electroweak processes is estimated with Monte Carlo and subtracted in each control region:

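$$N^{i}_{\mathrm{multijet}} = N^{i}_{\mathrm{data}} - N^{i}_{Z\to\tau\tau} - N^{i}_{Z\to\ell\ell,\,t\bar{t},\,\mathrm{diboson}} - k_W\,\bigl(N^{i}_{W\to\ell\nu} + N^{i}_{W\to\tau\nu}\bigr), \quad \text{for } i = B, C, D.$$

In each of the control regions an estimate for the number of multijet events was obtained by correcting for the $Z \to \ell\ell$, $t\bar{t}$ and diboson contributions as predicted from MC, and for the W+jets ($W \to \ell\nu$ and $W \to \tau\nu$) contributions by correcting the MC predictions using the $k_W$ normalisation factors discussed previously.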
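As a minimal sketch of the subtraction in a single control region, the function below mirrors the terms of the formula one by one. The function name, argument names and all yields are hypothetical placeholders chosen for illustration; they are not values from the analysis.

```python
def multijet_yield(n_data, n_ztautau, n_zll_ttbar_diboson, n_w_lnu, n_w_taunu, k_w):
    """Estimate the multijet yield in one control region (B, C or D).

    Electroweak and top contributions are taken from MC; the W+jets MC
    prediction is rescaled by the normalisation factor k_w before subtraction.
    """
    return n_data - n_ztautau - n_zll_ttbar_diboson - k_w * (n_w_lnu + n_w_taunu)


# Hypothetical yields for one region (illustrative only).
n_multijet_b = multijet_yield(
    n_data=40.0,
    n_ztautau=2.0,
    n_zll_ttbar_diboson=1.5,
    n_w_lnu=6.0,
    n_w_taunu=2.0,
    k_w=0.5,
)
print(n_multijet_b)
```

In practice the same function would be evaluated once per region, with the yields of that region.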

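The leptons from the backgrounds $W \to \ell\nu$, $W \to \tau\nu$ and $Z \to \ell\ell$ are typically very well isolated, like $Z \to \tau\tau$. From Monte Carlo, it is estimated that regions C and D are ≈99% multijet pure. These multijet-rich regions were used to measure the OS/SS ratio, $R_{\mathrm{OS/SS}}$, for multijet events:

$$R_{\mathrm{OS/SS}} = \frac{N^{C}_{\mathrm{multijet}}}{N^{D}_{\mathrm{multijet}}} = \begin{cases} 1.07 \pm 0.04\ (\mathrm{stat.}) \pm 0.04\ (\mathrm{syst.}) & \mu\tau_h\ \text{channel} \\ 1.07 \pm 0.07\ (\mathrm{stat.}) \pm 0.07\ (\mathrm{syst.}) & e\tau_h\ \text{channel.} \end{cases}$$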

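As expected, it is consistent with 1. This measured $R_{\mathrm{OS/SS}}$ was then used to scale the multijet estimate from region B to give the prediction in region A:

$$N^{A}_{\mathrm{multijet}} = \frac{N^{C}_{\mathrm{multijet}}}{N^{D}_{\mathrm{multijet}}}\, N^{B}_{\mathrm{multijet}} = R_{\mathrm{OS/SS}}\, N^{B}_{\mathrm{multijet}}.$$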

This yielded the numbers for each region shown in Table 5.11. The expected number of multijet events in the signal region A is

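$$N^{A}_{\mathrm{multijet}} = \begin{cases} 24 \pm 6\ (\mathrm{stat.}) \pm 3\ (\mathrm{syst.}) & \mu\tau_h\ \text{channel} \\ 23 \pm 6\ (\mathrm{stat.}) \pm 3\ (\mathrm{syst.}) & e\tau_h\ \text{channel.} \end{cases}$$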
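The OS/SS scaling that leads to such an estimate can be sketched as follows. This is an illustration under stated assumptions: the yields are hypothetical, and the statistical uncertainty is propagated by treating each background-subtracted yield as an independent Poisson count, which ignores the uncertainty of the subtracted MC. The text does not state how the quoted uncertainties were actually propagated.

```python
import math


def os_ss_ratio(n_c, n_d):
    """R_OS/SS = N_C / N_D with a simple Poisson error estimate (assumption)."""
    ratio = n_c / n_d
    ratio_err = ratio * math.sqrt(1.0 / n_c + 1.0 / n_d)
    return ratio, ratio_err


def predict_region_a(n_b, n_c, n_d):
    """Scale the region-B multijet yield by R_OS/SS to predict region A."""
    ratio, ratio_err = os_ss_ratio(n_c, n_d)
    n_a = ratio * n_b
    # Relative errors of the ratio and of N_B added in quadrature.
    n_a_err = n_a * math.sqrt((ratio_err / ratio) ** 2 + 1.0 / n_b)
    return n_a, n_a_err


# Hypothetical multijet yields after subtraction (illustrative only).
n_a, n_a_err = predict_region_a(n_b=25.0, n_c=400.0, n_d=400.0)
print(f"N_A = {n_a:.1f} +/- {n_a_err:.1f} (stat.)")
```

With few events in region B, the 1/N_B term dominates the statistical uncertainty of the prediction in this sketch.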

This gave the normalization of the multijet background estimate in the signal region. The shapes of kinematic distributions for the multijet background were modeled with the SS events in data (region B if following the isolation requirement), corrected for contamination with MC and scaled to this normalization. This model was used as the primary estimate of the multijet background.
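The shape model described above can be sketched as a histogram operation: take the region-B data, subtract the MC contamination bin by bin, and rescale to the expected region-A normalization. The function name, the bin contents and the normalization used here are hypothetical, and plain Python lists stand in for whatever histogram class an analysis would actually use.

```python
def multijet_template(data_b, mc_b, n_a_expected):
    """Build a multijet template from same-sign data (region B).

    data_b: per-bin event counts observed in region B
    mc_b: per-bin non-multijet contamination predicted by MC
    n_a_expected: multijet normalization expected in region A
    """
    subtracted = [d - m for d, m in zip(data_b, mc_b)]
    total = sum(subtracted)
    if total <= 0:
        raise ValueError("no multijet events left after MC subtraction")
    scale = n_a_expected / total
    return [scale * s for s in subtracted]


# Hypothetical binned inputs (illustrative only).
template = multijet_template(
    data_b=[10.0, 8.0, 4.0, 2.0],
    mc_b=[1.0, 1.0, 0.5, 0.5],
    n_a_expected=24.0,
)
print(template)
```

Individual bins can become negative after subtraction when the data count is small; this sketch leaves them as they are rather than choosing a treatment.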

In document A Search for New Physics in High-Mass Ditau Events in the Atlas Detector (Page 155-158)