Bayesian Evaluation - Statistical Inference

4.6 Statistical Inference

4.6.1 Bayesian Evaluation

The Bayesian approach to interpreting measured data in terms of one or more parameters of interest is well described in the literature. In particular, this section follows the descriptions given in refs. [11, 205, 208, 209, 228, 229].

General formulation of Bayes’ theorem. Bayes’ theorem relates the conditional probability $P(A|B)$, i.e. the probability to observe $A$ given $B$,

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}, \tag{4.16}$$

with the conditional probability $P(B|A)$, i.e. the probability to observe $B$ given $A$, the “degree of belief” in $A$, which is referred to as $P(A)$, and the degree of belief in $B$, which is referred to as $P(B)$ [209]. $B$ refers to measured data and is fixed a priori with constant $P(B) \neq 0$. For a total set of events $S$, which is divided into the exclusive sets $A_1, \ldots, A_n$, and in which $B$ is any event or subset of $S$, the probability to observe any set $A_x$ given $B$ becomes [209, 228]

$$P(A_x|B) = \frac{P(B|A_x) \times P(A_x)}{\sum_i P(B|A_i) \times P(A_i)}. \tag{4.17}$$
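As an illustration of eq. 4.17, the following minimal Python sketch evaluates the posterior probabilities for a small number of exclusive sets. The function name `posterior` and all numbers are hypothetical and chosen only for illustration; they are not taken from the analysis.

```python
def posterior(priors, likelihoods):
    """Return P(A_x|B) for every x, following eq. 4.17.

    priors[i]      -- P(A_i), the prior probability of the exclusive set A_i
    likelihoods[i] -- P(B|A_i), the probability to observe B given A_i
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerator of eq. 4.17 for each A_x.
    joint = [p_b_given_a * p_a for p_b_given_a, p_a in zip(likelihoods, priors)]
    # Denominator of eq. 4.17: the sum over all exclusive sets, i.e. P(B).
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("P(B) must be non-zero")
    return [j / evidence for j in joint]


# Hypothetical numbers, chosen only for illustration.
example_priors = [0.5, 0.3, 0.2]
example_likelihoods = [0.1, 0.4, 0.7]
example_posterior = posterior(example_priors, example_likelihoods)
```

The denominator is the same for every $A_x$, so the posterior probabilities sum to one by construction.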


Parametric, continuous distributions. In the following, Bayes’ theorem is applied to parametric, continuous distributions. The probability $P(A_x|B)$ is identified with the posterior-probability-density function $P(\mu|\vec{x})$ for the parameter of interest $\mu$ with respect to the measured data $\vec{x}$, and is given by [209]

$$P(\mu|\vec{x}) = \frac{P(\vec{x}|\mu) \times \pi(\mu)}{\int P(\vec{x}|\mu) \times \pi(\mu) \, \mathrm{d}\mu}. \tag{4.18}$$

In particular, the outcome of the experiment with measured data $\vec{x}$ depends on the parameter $\mu$, which is unknown a priori.

$\vec{x}$ can be either a single measurement or a set of data points. The parameter $\mu$ is referred to as the “parameter of interest”, i.e. the parameter that is to be estimated. In principle, $\mu$ can be a vector of parameters; however, only one parameter of interest is necessary for this analysis, and $\mu$ is taken to be of dimension one without loss of generality.

In the case of continuous distributions, the sum in eq. 4.17 becomes an integral over the parameter of interest $\mu$. The probability $P(B|A_x)$ becomes a probability-density function $P(\vec{x}|\mu)$ to obtain a certain measurement $\vec{x}$ for a given parameter of interest $\mu$ (cf. [11, 209]). In particular, $P(\vec{x}|\mu)$ encodes the outcome of the experiment or analysis under a set of known parameters (cf. [209]).

For a fixed set of data points $\vec{x}$, $P(\vec{x}|\mu)$ becomes the “likelihood function” $L(\mu|\vec{x})$, which is no longer a probability-density function (cf. [209]):

$$L(\mu|\vec{x}) = P(\vec{x}|\mu). \tag{4.19}$$

The likelihood function $L(\mu|\vec{x})$ is a function of the parameter $\mu$ for a fixed $\vec{x}$, and is characteristic of the experiment or analysis⁵. Thus, the posterior-probability-density function $P(\mu|\vec{x})$ is given by (cf. [209, 228])

$$P(\mu|\vec{x}) = \frac{L(\mu|\vec{x}) \times \pi(\mu)}{\int L(\mu|\vec{x}) \times \pi(\mu) \, \mathrm{d}\mu}. \tag{4.20}$$

The construction of the likelihood function is discussed in detail in the next subsection (4.6.2).

$\pi(\mu)$ refers to the prior-probability distribution for the parameter of interest $\mu$, i.e. the knowledge about $\mu$ before the actual measurement is performed (cf. [11, 209]).
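To make eq. 4.20 concrete, the sketch below evaluates a posterior density on a grid for a toy single-bin counting experiment. The Poisson likelihood, the yields, the grid range, and all function names are assumptions made for this illustration and are not the statistical model of the analysis; the prior is taken to be constant for non-negative $\mu$.

```python
import math


def poisson_likelihood(mu, n_obs, s_exp, b_exp):
    """Toy L(mu|x): Poisson probability of n_obs for a mean mu*s_exp + b_exp."""
    mean = mu * s_exp + b_exp
    # Evaluate in log space to avoid overflow of the factorial.
    log_l = n_obs * math.log(mean) - mean - math.lgamma(n_obs + 1)
    return math.exp(log_l)


def flat_prior(mu):
    """Assumed prior: zero for mu < 0, constant for mu >= 0."""
    return 1.0 if mu >= 0.0 else 0.0


def posterior_on_grid(n_obs, s_exp, b_exp, mu_max=5.0, n_points=2001):
    """Return (grid, density) with the density normalized as in eq. 4.20."""
    step = mu_max / (n_points - 1)
    grid = [i * step for i in range(n_points)]
    unnorm = [poisson_likelihood(mu, n_obs, s_exp, b_exp) * flat_prior(mu)
              for mu in grid]
    # Trapezoidal rule for the denominator of eq. 4.20.
    norm = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / norm for u in unnorm]


# Hypothetical yields: 150 observed, 100 expected signal, 50 expected background.
grid, density = posterior_on_grid(n_obs=150, s_exp=100.0, b_exp=50.0)
mode = grid[density.index(max(density))]
```

With a flat prior the posterior is proportional to the likelihood, so in this toy setup the mode of the density coincides with the maximum-likelihood value of $\mu$ up to the grid spacing.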

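More generally, Bayes’ theorem relates the probability of a specific theory given measured data to the prior probability of the theory and the predicted outcome of the experiment based on that theory [11, 209]:

$$P(\text{theory}|\text{data}) \propto P(\text{data}|\text{theory}) \times \pi(\text{theory}). \tag{4.21}$$

The statistical model of this analysis, described by the likelihood function, depends not only on the parameter of interest $\mu$, but also on a number of additional nuisance parameters $\vec{\theta} = (\theta_1, \ldots, \theta_n)$. The prior-probability and posterior-probability distributions then also depend on the nuisance parameters $\vec{\theta}$, and eq. 4.20 becomes

$$\begin{aligned} P(\mu, \vec{\theta}|\vec{x}) &= \frac{L(\mu, \vec{\theta}|\vec{x}) \times \pi(\mu, \vec{\theta})}{\int L(\mu, \vec{\theta}|\vec{x}) \times \pi(\mu, \vec{\theta}) \, \mathrm{d}\vec{\theta} \, \mathrm{d}\mu} \\ &= \frac{L(\mu, \vec{\theta}|\vec{x}) \times \pi(\mu) \times \pi(\vec{\theta})}{\int L(\mu, \vec{\theta}|\vec{x}) \times \pi(\mu) \times \pi(\vec{\theta}) \, \mathrm{d}\vec{\theta} \, \mathrm{d}\mu}. \end{aligned} \tag{4.22}$$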

⁵ The arguments of $L(\mu|\vec{x})$ are interchanged w.r.t. $P(\vec{x}|\mu)$ to emphasize that $L$ is a function of $\mu$.

Here, the prior-probability distributions for the nuisance parameters are referred to as $\pi(\vec{\theta})$.

If $\vec{\theta}$ and $\mu$ are independent of each other, which is the case in this analysis, the joint prior-probability distribution of $\vec{\theta}$ and $\mu$ factorizes:

$$\pi(\mu, \vec{\theta}) = \pi(\mu) \times \pi(\vec{\theta}). \tag{4.23}$$

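A posterior-probability distribution $P(\mu|\vec{x})$ that is independent of $\vec{\theta}$ is obtained by integrating over all nuisance parameters $\vec{\theta}$ (cf. [209])

$$\begin{aligned} P(\mu|\vec{x}) &= \int P(\mu, \vec{\theta}|\vec{x}) \, \mathrm{d}\vec{\theta} \\ &= \frac{1}{C} \times \int L(\mu, \vec{\theta}|\vec{x}) \times \pi(\mu) \times \pi(\vec{\theta}) \, \mathrm{d}\vec{\theta} \\ &\propto L_m(\mu|\vec{x}) \times \pi(\mu). \end{aligned} \tag{4.24}$$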

$C$ is a constant that normalizes $P(\mu|\vec{x})$ to a probability-density function and can be omitted in this analysis. The marginal-likelihood function $L_m(\mu|\vec{x})$ is defined as (cf. [209])

$$L_m(\mu|\vec{x}) = \int L(\mu, \vec{\theta}|\vec{x}) \, \pi(\vec{\theta}) \, \mathrm{d}\vec{\theta}. \tag{4.25}$$

This integration is also referred to as “marginalization”. In this analysis, the integration is performed numerically using a Metropolis-Hastings Markov-chain-Monte-Carlo (MCMC) algorithm [211, 212].
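The idea of marginalizing with a Metropolis-Hastings chain can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation used in the analysis: the model is a hypothetical single-bin counting experiment with one nuisance parameter $\theta$ that scales the background and has a standard-normal prior, the prior for $\mu$ is constant for non-negative $\mu$, the proposal is a symmetric Gaussian random walk, and all names, yields, and step sizes are invented for the example.

```python
import math
import random


def log_posterior(mu, theta, n_obs=150, s_exp=100.0, b_exp=50.0, rel_unc=0.1):
    """Unnormalized log of L(mu, theta|x) * pi(mu) * pi(theta) for a toy model."""
    if mu < 0.0:
        return -math.inf  # assumed prior for mu vanishes for mu < 0
    mean = mu * s_exp + b_exp * (1.0 + rel_unc * theta)
    if mean <= 0.0:
        return -math.inf  # a Poisson mean must be positive
    log_l = n_obs * math.log(mean) - mean   # Poisson term, constants dropped
    log_prior_theta = -0.5 * theta * theta  # standard-normal prior, constants dropped
    return log_l + log_prior_theta


def metropolis_hastings(n_steps, step_mu=0.1, step_theta=0.5, seed=1):
    """Random-walk Metropolis-Hastings; returns the chain of mu values."""
    rng = random.Random(seed)
    mu, theta = 1.0, 0.0  # starting point inside the allowed region
    current = log_posterior(mu, theta)
    chain = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the acceptance probability reduces
        # to the ratio of the posterior densities.
        mu_new = mu + rng.gauss(0.0, step_mu)
        theta_new = theta + rng.gauss(0.0, step_theta)
        proposed = log_posterior(mu_new, theta_new)
        accept_prob = math.exp(min(0.0, proposed - current))
        if rng.random() < accept_prob:
            mu, theta, current = mu_new, theta_new, proposed
        # Recording only mu integrates out theta, as in eq. 4.24.
        chain.append(mu)
    return chain


chain = metropolis_hastings(n_steps=20000)
samples = chain[2000:]  # discard the first steps as burn-in
posterior_mean = sum(samples) / len(samples)
```

Because the chain samples the joint distribution of $\mu$ and $\theta$, a histogram of the recorded $\mu$ values approximates the marginalized posterior without the integral over $\theta$ being computed explicitly.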

The information that is contained in the posterior-probability distribution $P(\mu|\vec{x})$ is summarized with two quantities. The median $\hat{\mu}$ of the posterior-probability distribution $P(\mu|\vec{x})$ is used as the estimate of the parameter of interest in this analysis. The median is an unbiased estimator for the measurements that are presented in this analysis (cf. sec. 6.4) and its calculation is numerically stable. The uncertainty of $\hat{\mu}$ is estimated with the Bayesian-central-68%-confidence interval $[\mu_1, \mu_2]$, which is constructed by [230]

$$\int_{-\infty}^{\mu_1} P(\mu|\vec{x}) \, \mathrm{d}\mu = \frac{1 - \mathrm{C.L.}}{2} = \int_{\mu_2}^{\infty} P(\mu|\vec{x}) \, \mathrm{d}\mu, \tag{4.26}$$

in which the (Bayesian) confidence level is C.L. = 0.68.
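Given samples drawn from the posterior, the median and the central interval of eq. 4.26 reduce to quantiles of the sample. The sketch below shows one way to compute them; the helper names, the linear-interpolation rule, and the evenly spaced toy "samples" are assumptions for illustration only.

```python
def quantile(sorted_samples, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    if not sorted_samples:
        raise ValueError("need at least one sample")
    pos = q * (len(sorted_samples) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_samples) - 1)
    frac = pos - lower
    return sorted_samples[lower] * (1.0 - frac) + sorted_samples[upper] * frac


def summarize(samples, cl=0.68):
    """Return (median, mu_1, mu_2) following eq. 4.26."""
    ordered = sorted(samples)
    tail = (1.0 - cl) / 2.0  # probability left in each tail
    return (quantile(ordered, 0.5),
            quantile(ordered, tail),
            quantile(ordered, 1.0 - tail))


# Hypothetical, evenly spaced "samples" on [0, 1], used only to check the arithmetic.
toy_samples = [i / 1000 for i in range(1001)]
median, mu_1, mu_2 = summarize(toy_samples)
```

For C.L. = 0.68 each tail holds a probability of 0.16, so $\mu_1$ and $\mu_2$ are the 16% and 84% quantiles of the posterior.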

The parameter of interest, the prior-probability distribution, and the construction of the likelihood function will be discussed in the following paragraphs.

Parameter of interest. In this analysis, the parameter of interest $\mu$ corresponds to the signal strength, which is defined as

$$\mu = \frac{\sigma_{t\text{-channel}}^{\text{meas.}}}{\sigma_{t\text{-channel}}^{\text{SM}}}. \tag{4.27}$$

Here, $\sigma_{t\text{-channel}}^{\text{meas.}}$ refers to the measured cross section, and $\sigma_{t\text{-channel}}^{\text{SM}}$ refers to the SM prediction⁶.

⁶ The SM prediction is given in section 3.5.1.


Prior-probability distribution $\pi(\mu)$. The prior-probability distribution for the parameter of interest $\pi(\mu)$ is chosen such that it is uniformly distributed (“flat”) in the parameter of interest $\mu$ within the interval $[0, \infty)$ (cf. [11])

$$\pi(\mu) = \begin{cases} 0 & \mu < 0 \\ 1 & \mu \geq 0. \end{cases} \tag{4.28}$$

In particular, this prior probability is flat in terms of the t-channel cross section in this analysis, and, therefore, flat for the Poisson means of t-channel events.

From a physics point of view, one could argue that it is more “natural” to use a prior-probability distribution that is flat in the fundamental parameter $|V_{tb}|$, rather than using a prior-probability distribution that is flat in the measured cross section $\sigma_{t\text{-channel}} \propto |V_{tb}|^2$. A prior-probability distribution that is flat in $|V_{tb}|$ can be defined as

$$\pi_{\text{cross check}}(\mu) = \begin{cases} 0 & \mu < 0 \\ \dfrac{1}{\sqrt{\mu}} & \mu \geq 0, \end{cases} \tag{4.29}$$

since $\mu \propto \sigma_{t\text{-channel}} \propto |V_{tb}|^2$.
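The $1/\sqrt{\mu}$ form of eq. 4.29 follows from a change of variables. The short derivation below is a sketch under the assumption that $\mu = k\,|V_{tb}|^2$ with a constant $k > 0$; the symbols $k$, $v$, and $\pi_v$ are introduced here for illustration and do not appear in the original text.

```latex
% Assumption: \mu = k\,|V_{tb}|^2 with a constant k > 0, and a prior \pi_v(v)
% that is constant in v \equiv |V_{tb}| for v \geq 0.
\begin{align*}
  \pi(\mu)\,\mathrm{d}\mu &= \pi_v(v)\,\mathrm{d}v
  && \text{(probability is conserved under the change of variables)} \\
  \pi(\mu) &= \pi_v(v)\,\left|\frac{\mathrm{d}v}{\mathrm{d}\mu}\right|,
  \qquad v = \sqrt{\mu / k} \\
  &= \pi_v(v)\,\frac{1}{2\sqrt{k\,\mu}} \\
  &\propto \frac{1}{\sqrt{\mu}} \qquad (\mu > 0).
\end{align*}
```

The constant factors drop out because the prior enters the posterior only up to normalization.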

This prior probability is used as a cross check for the $|V_{tb}|$ measurement. The $|V_{tb}|$ measurement is performed twice, once with a prior-probability distribution that is flat in $|V_{tb}|^2$ (eq. 4.28) and once with a prior-probability distribution that is flat in $|V_{tb}|$ (eq. 4.29). The comparison of the obtained posterior distributions gives information about the “objectiveness” of the chosen prior-probability distribution, i.e. how sensitive the observed result is to a variation of the prior-probability distribution for the parameter of interest (cf. [11]). The impact of the choice of the prior-probability distribution on the $|V_{tb}|$ measurement is discussed in section 7.2 (“Results”).

The prior-probability distributions for the nuisance parameters $\pi(\vec{\theta})$ are discussed in the next section. They are directly related to the interpretation of systematic uncertainties, which are incorporated into the likelihood function as additional nuisance parameters.

In document Measurement of the t-channel single top quark production cross section and the CKM matrix element V tb with the CMS experiment (Page 130-133)