Validation - Systematic Event Generator Tuning with Professor

the desired quality of the fits — a subjective prioritising of particular generator aspects which cannot be avoided and usually requires some iteration.

3.3 Validation

Before parameterising real MC-generated data, the parameterisation algorithm had to be tested, first, for robustness against the distribution of the anchor points and for its behaviour when dealing with data that do not perfectly fit the parameterising polynomial. Second, one is advised to check that the GoF returned from the parameterisation resembles the GoF obtained directly from the MC-generated data.

The Professor GoF function can be influenced by several observable weight combinations, w_O, and also by the number of runs used for the parameterisation. This offers possibilities to check the minima for systematics due to improper parameterisation or overtuning to a specific set of observable weights.

In addition, the minimization results obtained from quadratic interpolations were compared to those obtained from cubic interpolations. So far, no significant difference has been found between the best tuning estimates, though the cubic interpolation describes the generator response better in regions that are far away from the minimum.

3.3.1 Robustness of the parameterisation algorithm

The basic functioning of the polynomial parameterisation was tested with input data generated from a second-order polynomial with random coefficients; the known coefficients were then compared to those of the resulting parameterisations. After this, the robustness of the parameterisation was tested against error-smeared data and data from non-second-order distributions. For this, input data were generated using second- to fourth-order polynomials, in particular polynomials of the form

$$f(\vec{p}) = (\vec{p} - \vec{m}_1)^2 (\vec{p} - \vec{m}_2)^2 + \vec{a} \cdot \vec{p}, \qquad (13)$$

and were smeared using a Gaussian error. Then, the unsmeared original polynomial and the parameterisation were evaluated at 10000 randomly located points, and a simple χ²/N_df and pulls were calculated as a GoF measure, where the pulls were calculated as follows:

$$p = \sum_{i=1}^{10000} \frac{f_\mathrm{unsmeared}(\vec{x}_i) - f_\mathrm{param}(\vec{x}_i)}{\sigma}$$


(a) without oversampling (b) with oversampling

Figure 4: Example pull distributions: Parameterisations of data generated with a smeared fourth-order polynomial, equation (13), in 7 dimensions were compared to the unsmeared polynomial. The parameterisations in (a) were created using the minimal number of anchor points, N_min(7) = 37. Those in (b) use N_min(7) + 6 = 43 anchor points. One can clearly observe that the pull distribution narrows when using additional sample points.

with the $\vec{x}_i$ being the test points, $f_\mathrm{unsmeared}$ and $f_\mathrm{param}$ the unsmeared polynomial and the parameterisation, respectively, and σ the width of the Gaussian error distribution. Finally, a Gaussian distribution was fitted to the pull histogram. All of this was done for different dimensions n of the parameter space and different numbers of sample points N = N_min(n), N_min(n) + 2, .... Using N_min(n) sample points resulted in observed χ²/N_df values covering several orders of magnitude and in broad, in the low-dimensional case even biased, pull distributions. Using additional sample points reduced this unwanted behaviour: for example, in the case of a 7-dimensional parameter space and data generated from a fourth-order polynomial, the average width of the pull distribution fell from 7.9 (N_min(7)) to 3.2 (N_min(7) + 6) and the range of observed χ²/N_df from O(10–10³) (N_min(7)) to O(1–10) (N_min(7) + 6). This discourages parameterisations based on the minimal number of sample points, as outlined in section 3.3.3. Examples of pull distributions are given in Figure 4.

Third, the influence of the distribution of the sample points in the parameter hypercube on the parameterisation quality was tested. A total of 5000 parameterisations based on error-smeared data were performed. χ²/N_df values were computed as above, together with four different measures of the distance between the anchor points:

• average and minimal Cartesian distance between the anchor points

• average and minimal distance between the projections of the anchor points on the parameter axes
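The four distance measures listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the anchor points are drawn uniformly in a unit hypercube, and the projected distance is taken per parameter axis and pooled over all axes, which is one possible reading of the second bullet.

```python
import itertools
import math
import random


def pairwise_cartesian(points):
    """Euclidean distances between all pairs of anchor points."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


def pairwise_projected(points):
    """Distances between the projections of all pairs on each parameter axis."""
    dims = len(points[0])
    return [abs(a[d] - b[d])
            for a, b in itertools.combinations(points, 2)
            for d in range(dims)]


def distance_measures(points):
    """Return the four spread measures for one anchor-point sample."""
    cart = pairwise_cartesian(points)
    proj = pairwise_projected(points)
    return {
        "avg_cartesian": sum(cart) / len(cart),
        "min_cartesian": min(cart),
        "avg_projected": sum(proj) / len(proj),
        "min_projected": min(proj),
    }


# Hypothetical sample: 12 anchor points in a 3-dimensional unit hypercube.
rng = random.Random(1)
anchors = [[rng.random() for _ in range(3)] for _ in range(12)]
measures = distance_measures(anchors)
```

Each parameterisation would then contribute one (distance measure, χ²/N_df) pair per measure, which is the kind of input a 2D histogram needs.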

The χ²/N_df values and distance measures were filled into 2D histograms. For the low-dimensional cases, a dependence of the GoF on the averaged distances was found for anchor point samples that left larger regions of the parameter space uncovered. This problem could easily be solved by oversampling, whereas the more relevant, high-dimensional cases did not show this dependence. The dimension of the parameter space for this test ranged from P = 1 to 10 and the number of anchor points from N = N_min(P) to N_min(P) + 10.
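The first two robustness checks of this section (recovering known coefficients from exact second-order input, then fitting Gaussian-smeared input with extra anchor points) can be sketched as follows. This is a minimal sketch, not the Professor implementation: it assumes NumPy, uses `numpy.linalg.lstsq` (an SVD-based least-squares solver) as a stand-in for the fit, and the names `design_matrix`, `fit_quadratic` and `n_coeffs` are hypothetical. `n_coeffs` simply counts the coefficients of a full second-order polynomial and is not claimed to match the N_min convention used in the text.

```python
import itertools
import numpy as np


def design_matrix(points):
    """Columns 1, p_i and p_i * p_j (i <= j) of a full second-order polynomial."""
    points = np.atleast_2d(points)
    n_dim = points.shape[1]
    cols = [np.ones(len(points))]
    cols += [points[:, i] for i in range(n_dim)]
    cols += [points[:, i] * points[:, j]
             for i, j in itertools.combinations_with_replacement(range(n_dim), 2)]
    return np.column_stack(cols)


def fit_quadratic(points, values):
    """Least-squares fit of the polynomial coefficients (SVD-based solver)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(points), values, rcond=None)
    return coeffs


rng = np.random.default_rng(42)
n_dim = 2
n_coeffs = 1 + n_dim + n_dim * (n_dim + 1) // 2  # number of fit coefficients
true_coeffs = rng.normal(size=n_coeffs)

# Step 1: exact second-order input; the fit should return the known coefficients.
anchors = rng.uniform(0.0, 1.0, size=(n_coeffs + 6, n_dim))  # 6 extra points
exact = design_matrix(anchors) @ true_coeffs
recovered = fit_quadratic(anchors, exact)

# Step 2: Gaussian-smeared input; compare the fit with the unsmeared polynomial
# at 10000 random test points, in units of the smearing width sigma.
sigma = 0.05
smeared = exact + rng.normal(0.0, sigma, size=len(anchors))
noisy_fit = fit_quadratic(anchors, smeared)
test_points = rng.uniform(0.0, 1.0, size=(10000, n_dim))
scaled = design_matrix(test_points) @ (true_coeffs - noisy_fit) / sigma
```

Here `scaled` holds per-point deviations divided by σ; how these are combined into the pull of the text depends on the definition given with the pull formula. Changing the number of extra anchor points and repeating the fit for many random seeds is one way to study the effect of oversampling.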

3.3.2 Tune verification

As mentioned in section 2.5, it is useful to visualise Professor tunes along lines in parameter space, in particular lines which intersect both the estimated best tune and an alternative or default configuration. Professor provides a program to do this scan. This is useful to verify that the GoF really behaves as parameterised, and to ensure that the chosen point really is close to a GoF optimum.
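A line scan of this kind can be sketched as below. The sketch is illustrative only: `line_points`, `scan` and `toy_gof` are hypothetical names, and the toy GoF stands in for either the parameterised GoF or the GoF obtained directly from MC runs at the scan points. It does not reproduce the interface of the Professor scan program.

```python
def line_points(start, end, n_steps=11, overshoot=0.2):
    """Points on the straight line through `start` and `end`, slightly extended.

    The line parameter t is 0 at `start` and 1 at `end`.
    """
    ts = [-overshoot + i * (1.0 + 2.0 * overshoot) / (n_steps - 1)
          for i in range(n_steps)]
    return [(t, [s + t * (e - s) for s, e in zip(start, end)]) for t in ts]


def scan(gof, start, end, **kwargs):
    """Evaluate a goodness-of-fit function at each point along the line."""
    return [(t, gof(p)) for t, p in line_points(start, end, **kwargs)]


def toy_gof(p):
    """Toy stand-in for a GoF function with its optimum at (1.0, 2.0)."""
    return (p[0] - 1.0) ** 2 + 4.0 * (p[1] - 2.0) ** 2


default_point = [0.0, 0.0]   # hypothetical default configuration
best_tune = [1.0, 2.0]       # hypothetical estimated best tune
profile = scan(toy_gof, default_point, best_tune, n_steps=13, overshoot=0.25)
t_min, gof_min = min(profile, key=lambda item: item[1])
```

Running the same scan once with the parameterised GoF and once with the GoF computed from dedicated MC runs gives two profiles that can be compared point by point.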

To reduce the risk that a minimum returned by the numerical minimisation is a local minimum, prof-tune can perform several minimisations with different starting points.
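The idea of restarting the minimisation from several starting points can be illustrated with a toy example. Everything here is an assumption made for illustration: the one-dimensional `toy_gof` with one local and one global minimum, the plain gradient descent used as minimiser, and the evenly spaced starting points. It says nothing about which minimiser or starting-point strategy prof-tune actually uses.

```python
def toy_gof(x):
    """One-dimensional stand-in with a local and a global minimum."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def descend(f, x0, step=0.01, n_iter=5000, h=1e-6):
    """Plain gradient descent with a central-difference derivative."""
    x = x0
    for _ in range(n_iter):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def multi_start(f, n_starts=5, lo=-2.0, hi=2.0):
    """Minimise from evenly spaced starting points and keep the best result."""
    starts = [lo + i * (hi - lo) / (n_starts - 1) for i in range(n_starts)]
    results = [descend(f, x0) for x0 in starts]
    return min(results, key=f), results


best, all_minima = multi_start(toy_gof)
```

Starting points on the right-hand side end up in the shallower local minimum near x = 0.96, while the others reach the deeper minimum near x = -1.04. Keeping the result with the lowest GoF value selects the latter.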

A tighter tune, either Professor or grid-scan based, could be performed based on the correspondence between the true and parameterised GoF in the tune region.

3.3.3 Tuning stability

The Professor system offers two different ways to get an estimate of the stability of the minimum found.

One can benefit from oversampling the parameter space with respect to the feasibility of the SVD, in such a way that numerous run combinations¹ may be chosen for different parameterisations simply by omitting a fraction of all the available Monte Carlo runs. In order to reduce correlations between run combinations, we usually choose this fraction to be about one-third. This is clearly a compromise between the quality of the parameterisation and the degree of correlation introduced by choosing several run combinations.
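Drawing such run combinations can be sketched as follows. The function name, its parameters and the integer run identifiers are hypothetical; `min_runs` stands for whatever minimal number of runs the parameterisation needs and has to be supplied by the user.

```python
import math
import random


def run_combinations(run_ids, n_combinations=100, omit_fraction=1.0 / 3.0,
                     min_runs=1, seed=0):
    """Draw distinct subsets of runs, each omitting about `omit_fraction` of them."""
    rng = random.Random(seed)
    n_keep = len(run_ids) - round(omit_fraction * len(run_ids))
    if n_keep < min_runs:
        raise ValueError("too few runs left for a parameterisation")
    if math.comb(len(run_ids), n_keep) < n_combinations:
        raise ValueError("not enough distinct run combinations available")
    seen = set()
    while len(seen) < n_combinations:
        # Sorting makes the subset independent of the order it was drawn in.
        seen.add(tuple(sorted(rng.sample(run_ids, n_keep))))
    return [list(combo) for combo in sorted(seen)]


runs = list(range(60))            # hypothetical run identifiers
combos = run_combinations(runs)   # 100 subsets of 40 runs each
```

Each subset would then be parameterised and minimised separately, so that the spread of the resulting minima can be inspected parameter by parameter.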

The outcome of all minimizations can be displayed parameter-wise such as in Figure 71. We observe that the minimization result derived from all available Monte Carlo runs always lies in the center of the χ²/N_df distribution, illustrating that certain interpolations fit the data better than others but that using all the information available gives a good description on average.

¹ Usually, minimisations are performed based on about 100 run combinations. The run combinations

Instead of varying the parameterisation it is also possible to influence the GoF function. This can be done by independently applying a weight to each observable included in the tuning. This more or less subjective approach is justified by two facts. Firstly, we certainly do not expect the generator's response function to be a simple polynomial, and secondly, we know that the models are incapable of reproducing certain observables at all. In Professor it is possible to investigate the stability of the tuning under a change of weights, again by comparing the outcome of the minimizations (Figure 72). So far no strong dependence on the observable weights has been found.
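The weight-stability check can be illustrated with a toy model. The sketch assumes a GoF of the form χ²(p) = Σ_O w_O Σ_b (f_b(p) − R_b)² / Δ_b², with one weight per observable O, and a single parameter p with a linear response f_b(p) = a_b + c_b p in every bin, so that the minimum can be written in closed form. The GoF form, the observable names and all numbers are assumptions made for illustration only.

```python
def best_parameter(bins, weights):
    """Analytic minimum of a weighted chi2 for a linear one-parameter response.

    Each bin is (observable, a, c, ref, err) with prediction a + c * p,
    reference value ref and uncertainty err.
    """
    num = den = 0.0
    for obs, a, c, ref, err in bins:
        w = weights[obs] / err ** 2
        num += w * c * (ref - a)
        den += w * c * c
    return num / den


# Hypothetical bins of two observables that prefer slightly different p.
bins = [
    ("obs_A", 1.0, 2.0, 5.2, 0.2),
    ("obs_A", 0.5, 1.0, 2.4, 0.1),
    ("obs_B", 3.0, -1.0, 1.1, 0.1),
]
weight_sets = [
    {"obs_A": 1.0, "obs_B": 1.0},
    {"obs_A": 2.0, "obs_B": 1.0},
    {"obs_A": 1.0, "obs_B": 0.0},
]
results = [best_parameter(bins, w) for w in weight_sets]
spread = max(results) - min(results)
```

A small `spread` relative to the parameter range would indicate that the tune is stable under the tested weight changes. In a realistic setting the minimum has to be found numerically for each weight set.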


4 The Underlying Event

In this section an introduction to the phenomenology of the underlying event is given. Furthermore, the strategies used at the Tevatron experiment CDF for the direct measurement of the underlying event characteristics are explained. A detailed description of the model setup considered for tuning is presented in section 5, followed by an explanation (and justification) of the observables that went into the tuning procedure with Professor.

4.1 Probing the Proton structure

I will briefly summarise the experimental techniques used to probe the proton structure.

Deep inelastic scattering

To our current knowledge, free quarks do not exist: due to the mechanism of confinement, they appear only in compound objects called hadrons. Numerous experiments have been conducted in order to unveil the proton structure. Deep inelastic scattering (DIS) experiments, such as those at HERA, use electrons or positrons as probes. Since these do not carry colour charge, they interact with the quarks inside the proton only by exchange of photons or Z⁰ bosons (see Figure 5). This makes them the preferred choice for probing the charge distribution of hadrons and therefore for the extraction of quark and also gluon density functions.

[Figure 5: diagram with labels γ, q = k′ − k, k, k′, p, p′]
