
Statistical methods

2.1 Hypothesis testing

Searching for new phenomena generally involves comparing distributions extracted from real data with distributions obtained from an idealised model. It is therefore necessary to define a figure of merit that quantifies the agreement between the two. The approach followed in particle physics is called hypothesis testing. The method relies on the definition of a number of statistical hypotheses: a null hypothesis H₀, which is generally associated with the absence of new physics, and one or more alternate hypotheses H_µ that take into account BSM interactions, such as charged Higgs boson production. A single alternate hypothesis can be used when the BSM process is fully defined by the model under investigation. However, this analysis makes no assumption regarding the cross-section of charged Higgs boson production. It is therefore necessary to test a range of signal hypotheses, covering all possible cross-sections. This is achieved by defining the signal strength:

$$\mu = \frac{\sigma_x}{\sigma_{\mathrm{ref}}}, \qquad (2.1)$$

where σ_x can be any cross-section and σ_ref is a reference value, typically 1 pb. With this definition, H_µ can be regarded as a continuous spectrum of signal hypotheses that approaches the background-only hypothesis H₀ as σ_x tends to 0.

The level of agreement between data and predictions for a given signal strength is quantified by computing the p-value, p_µ. The p-value is the probability, assuming that H_µ is true, of observing a deviation from H_µ at least as extreme as the one measured in data. When p_µ is lower than a certain threshold, the hypothesis is rejected. The p-value is defined as:

$$p_\mu = \int_{t_{\mathrm{obs}}}^{\infty} f(t \mid H_\mu)\, \mathrm{d}t, \qquad (2.2)$$

where t is a test statistic, i.e. a function of the observables of the analysed sample, f(t|H_µ) is the distribution predicted for this function under H_µ, and t_obs is the value of the test statistic measured in data. The significance Z = Φ⁻¹(1 − p_µ), where Φ is the cumulative distribution of a standard Gaussian, is often preferred to the p-value to quantify the level of disagreement between data and predictions. The significance corresponds to a quantile of the standard Gaussian, i.e. the number of standard deviations at which Φ is equal to 1 − p_µ. In particle physics, the common approach is to require an observed significance of at least Z = 5 to reject the background-only hypothesis and Z = 1.64 to reject an alternate hypothesis. These correspond to p₀ = 2.87 · 10⁻⁷ and p_µ = 0.05 respectively. A schematic representation of the relation between the p-value and the test statistic is given in Figure 2.1.

Figure 2.1: Graphical representation of the p-value obtained by the measurement of a test statistic t for a certain hypothesis H_µ.
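The conversion between a p-value and the significance Z can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and do not come from the analysis software. The inverse is evaluated as −Φ⁻¹(p), which is equivalent to Φ⁻¹(1 − p) by symmetry of the Gaussian but avoids loss of precision for very small p.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, width 1


def significance_from_p(p: float) -> float:
    """Z = Phi^-1(1 - p), computed as -Phi^-1(p) to keep precision for tiny p."""
    return -_STD_NORMAL.inv_cdf(p)


def p_from_significance(z: float) -> float:
    """One-sided tail probability p = 1 - Phi(Z), computed as Phi(-Z)."""
    return _STD_NORMAL.cdf(-z)
```

With these definitions, the thresholds quoted above are recovered to the stated precision: p = 2.87 · 10⁻⁷ gives Z ≈ 5 and p = 0.05 gives Z ≈ 1.64.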

A powerful test statistic can be defined by using likelihood functions. Given a binned distribution of a kinematic variable for which both measured and expected results are available, the likelihood L is defined as:

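$$L(\mu, \boldsymbol{\theta}) = \prod_{i=1}^{N} \frac{\left(\mu s_i(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta})\right)^{n_i}}{n_i!}\, e^{-\left(\mu s_i(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta})\right)} \prod_{\theta_j \in \boldsymbol{\theta}} P(\theta_j). \qquad (2.3)$$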

The first term is the product of the Poisson probabilities for all bins i, where N is the total number of bins, each with n_i measured entries. The expected number of entries is given by the sum of the expected number of background entries b_i and signal entries s_i (normalised to 1 pb). The signal is scaled by the signal strength µ. Both s_i and b_i depend on θ, the array of nuisance parameters (NPs). These parameters quantify the impact of the systematic uncertainties related to the signal and background modelling, as well as experimental uncertainties arising from the detector response and resolution. The second term is the product of the probability density functions P(θ_j) of all the NPs, called penalty terms. Gaussian PDFs are generally used for systematic uncertainties that can assume both positive and negative values, log-normal distributions for cross-section uncertainties, which must be positive, and Gamma PDFs for the statistical uncertainties associated with the bins of the kinematic variable distribution [84]. As their name suggests, they penalise large deviations of the NPs from their nominal values.
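To make the structure of the likelihood concrete, the sketch below evaluates ln L for a binned Poisson model with a single Gaussian-constrained NP. The way the NP enters the model is an assumption made purely for illustration (it shifts the background normalisation by a hypothetical 10% per unit of θ and leaves the signal untouched); the real analysis uses many NPs acting on both signal and background.

```python
import math


def log_likelihood(mu, theta, observed, signal, background, bkg_rel_unc=0.10):
    """ln L(mu, theta) for a binned Poisson model with one Gaussian NP.

    Illustrative assumption: theta scales the background normalisation by
    bkg_rel_unc per unit of theta; the signal does not depend on theta.
    """
    log_l = 0.0
    for n_i, s_i, b_i in zip(observed, signal, background):
        expected = mu * s_i + b_i * (1.0 + bkg_rel_unc * theta)
        if expected <= 0.0:
            return -math.inf  # the Poisson mean must be positive
        # ln of the Poisson term: n ln(nu) - nu - ln(n!)
        log_l += n_i * math.log(expected) - expected - math.lgamma(n_i + 1.0)
    # ln of the standard-Gaussian penalty term P(theta)
    log_l += -0.5 * theta ** 2 - 0.5 * math.log(2.0 * math.pi)
    return log_l
```

Working with ln L rather than L turns the products into sums and avoids numerical underflow when the number of bins is large. Pulling θ away from 0 lowers ln L through the penalty term unless the Poisson terms improve by more than the penalty costs.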

The optimal values of µ and θ, as well as their uncertainties, are not known a priori and need to be extracted from a fit of the predicted distributions to the observed data, which maximises the likelihood. The agreement between data and predictions for a specific signal strength can be obtained from the likelihood ratio:

$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\boldsymbol{\theta}}})}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})}, \qquad (2.4)$$

where $\hat{\hat{\boldsymbol{\theta}}}$ is the array of NPs that maximises the likelihood for the considered value of µ (the conditional estimate), while $\hat{\mu}$ and $\hat{\boldsymbol{\theta}}$ are the unconditional values that maximise L. Although λ(µ) is already a suitable test statistic, it is commonly rearranged as:

$$t_\mu = -2 \ln \lambda(\mu). \qquad (2.5)$$

With this formulation, good agreement between data and H_µ corresponds to values of t_µ close to 0. Depending on the purpose of the measurement, the test statistic can be subject to further modifications. For example, the version used in this analysis to set upper limits on the cross-section of H⁺ production is:

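$$\tilde{t}_\mu = \begin{cases} -2\ln\tilde{\lambda}(\mu) & \hat{\mu} < \mu \\ 0 & \hat{\mu} \geq \mu \end{cases} = \begin{cases} -2\ln\dfrac{L(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu))}{L(0, \hat{\hat{\boldsymbol{\theta}}}(0))} & \hat{\mu} < 0 \\ -2\ln\dfrac{L(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu))}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})} & 0 \leq \hat{\mu} < \mu \\ 0 & \hat{\mu} \geq \mu \end{cases} \qquad (2.6)$$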

With respect to Eq. 2.5, $\tilde{t}_\mu$ assumes a non-negative signal, and therefore negative values of $\hat{\mu}$ are treated as $\hat{\mu} = 0$. Furthermore, the test statistic is set to 0 for $\hat{\mu} \geq \mu$, because when setting an upper limit a fitted signal strength above the tested value is not regarded as evidence against that signal hypothesis.
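The case structure of Eq. 2.6 can be illustrated with a deliberately simplified model: a single bin with n observed entries, expected signal s and background b, and no nuisance parameters. These simplifications are assumptions of this sketch, not of the analysis. Without NPs the maximum-likelihood estimate is available in closed form, µ̂ = (n − b)/s, so no numerical fit is needed.

```python
import math


def poisson_log_l(n, expected):
    """ln of the Poisson probability of n entries for a given expectation (> 0)."""
    return n * math.log(expected) - expected - math.lgamma(n + 1.0)


def t_tilde(mu, n, s, b):
    """One-sided test statistic of Eq. 2.6 for one bin and no NPs (s, b > 0)."""
    mu_hat = (n - b) / s          # unconditional maximum-likelihood estimate
    if mu_hat >= mu:
        return 0.0                # data at least as signal-like as H_mu
    mu_ref = max(mu_hat, 0.0)     # a negative mu_hat is treated as mu_hat = 0
    log_ratio = poisson_log_l(n, mu * s + b) - poisson_log_l(n, mu_ref * s + b)
    return -2.0 * log_ratio
```

For example, with s = 5 and b = 10, observing n = 15 gives µ̂ = 1, so the hypothesis µ = 1 yields a test statistic of exactly 0, while observing n = 5 gives a negative µ̂ and the denominator of the ratio is evaluated at µ = 0.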

As already mentioned, a signal hypothesis would conventionally be excluded when the corresponding p-value is sufficiently small (< 0.05). This approach is however not appropriate when signal and background are poorly separated, because downward fluctuations of the background could lead to the rejection of valid signal hypotheses. This is particularly true if the expected number of signal events is small. A different quantity is therefore used for this purpose:

$$\mathrm{CL}_s = \frac{p_\mu}{p_0}. \qquad (2.7)$$

A signal hypothesis is rejected when CL_s < 0.05, which avoids excluding regions of the phase space in which both p_µ and p₀ are small. The signal strength values for which CL_s = 0.05 define the 95% confidence level (CL) limits [86].
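The protection offered by CL_s against downward background fluctuations can be seen in a single-bin counting experiment. The sketch below is a toy under stated assumptions: the observed count n_obs itself is used as the ordering variable, so that both p_µ and p₀ in Eq. 2.7 are taken as the probability of observing n_obs or fewer entries under the respective hypothesis, and there are no nuisance parameters. The function names and the bisection bracket are illustrative.

```python
import math


def poisson_cdf(n, expected):
    """P(N <= n) for a Poisson-distributed N with the given expectation."""
    term = math.exp(-expected)    # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= expected / k      # P(N = k) obtained from P(N = k - 1)
        total += term
    return total


def cls(mu, n_obs, s, b):
    """CLs = p_mu / p_0, both taken as P(N <= n_obs) under each hypothesis."""
    p_mu = poisson_cdf(n_obs, mu * s + b)
    p_0 = poisson_cdf(n_obs, b)
    return p_mu / p_0


def upper_limit(n_obs, s, b, alpha=0.05, mu_max=1000.0, tol=1e-6):
    """Signal strength at which CLs = alpha, found by bisection."""
    lo, hi = 0.0, mu_max          # CLs(0) = 1 > alpha; CLs decreases with mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, n_obs, s, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With s = 1, b = 3 and n_obs = 0, the p-value alone would already fall below 0.05 at µ = 0, since e⁻³ ≈ 0.0498, so every signal hypothesis would be excluded because of a background fluctuation. In the same situation CL_s reduces to e^(−µs), independent of b, and the 95% CL limit is µ = −ln(0.05)/s ≈ 3.0.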

Fitting the likelihood function to data leads to the observed limit. It is however important to be able to predict the expected limit, either under the background-only hypothesis or for a given signal model, before looking at the observed data. When the sample under analysis is large enough, both the likelihood ratio and the distribution of the test statistic can be approximated by analytical forms. A single dataset, named the Asimov dataset, can then be used to estimate the median sensitivity of the measurement. This dataset is constructed in such a way that, when it is used to evaluate the parameter estimators, their "true" values are obtained, suppressing statistical fluctuations [83]. This corresponds to setting the content of each bin to the expected amount µs_i + b_i, the nuisance parameters to 0 and the signal strength to the value of interest (0 for the background-only case). The general procedure to set upper limits on a given signal hypothesis is to first compute the expected limit under the background-only hypothesis, and then compare this result with the observed limit computed on data, finally quantifying the deviation between the two.
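The defining property of the Asimov dataset, namely that fitting it returns the injected parameter values, can be verified in a toy model without nuisance parameters. The sketch below builds the dataset from per-bin signal and background templates and recovers µ̂ by solving d(ln L)/dµ = 0 with bisection. The templates, the function names and the bisection bracket are assumptions of this example; the bracket must keep µs_i + b_i positive in every bin.

```python
def asimov_dataset(mu, signal, background):
    """Bin contents set to their expectation mu * s_i + b_i (no fluctuations)."""
    return [mu * s_i + b_i for s_i, b_i in zip(signal, background)]


def fit_mu(observed, signal, background, lo=-0.5, hi=10.0, tol=1e-9):
    """Maximum-likelihood estimate of mu (no NPs) by bisection on d(ln L)/d(mu)."""

    def score(mu):
        # Derivative of the binned Poisson log-likelihood with respect to mu.
        return sum(s * (n / (mu * s + b) - 1.0)
                   for n, s, b in zip(observed, signal, background))

    while hi - lo > tol:          # the score decreases monotonically with mu
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a usage example, with hypothetical templates signal = [1.0, 3.0, 2.0] and background = [20.0, 15.0, 8.0], fitting asimov_dataset(1.5, signal, background) returns µ̂ ≈ 1.5, and fitting the background-only Asimov dataset returns µ̂ ≈ 0. Note that Asimov bin contents are generally non-integer, which is harmless because the likelihood is only evaluated, never sampled.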

In document Search for charged Higgs bosons decaying into top and bottom quarks with single-lepton final states using pp collisions collected at a centre-of-mass energy of 13 TeV by the ATLAS detector (Page 45-48)