In addition to the hardware and trigger system of the LHCb detector, there is also considerable software infrastructure that is necessary for the analysis of LHCb data.
4.9.1 Simulation
The use of simulated, or Monte-Carlo (MC), data is an integral part of almost every
analysis performed in particle physics. One can generate what signal would be observed in the detector given some physical process, a description of the detector and read-out chain, and some analysis procedure for identifying that signal. This information can then be used to determine efficiencies, set limits on speculative processes, study systematic uncertainties, constrain background yields and shapes, and investigate potential biases.
In the LHCb Gauss framework, simulated data is produced using a chain of
specialised programs [98,99]. The first step in the chain is the event generation, where
the results of generic 7 and 8 TeV pp interactions are produced using Pythia 8 [76].
This first simulates the result of the ‘hard’ process by using parton distribution functions that describe the relative composition of the protons as a function of the momentum of the incoming proton, and generates outgoing partons. These partons generate showers which eventually form colour neutral hadron ‘strings’, due to QCD confinement. These strings then fragment into the primary hadrons. At LHCb the
last part of this procedure is often repeated many times until a desired b- or c-hadron
is produced. This hadron is then decayed using EvtGen [100] according to either a
set of user-specified intermediate states or via branching fractions according to the Particle Data Group (PDG) tables [7].
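The repeat-until-signal step and the choice of a decay mode by branching fraction can be illustrated with a toy sketch. This is not the Pythia 8 or EvtGen interface: the functions `generate_until_signal` and `choose_decay_mode`, the `hadronise` and `is_signal` callables, and the decay-mode labels are all hypothetical names introduced here for illustration only.

```python
import random


def generate_until_signal(hadronise, is_signal, max_tries=100_000):
    """Repeat a (hypothetical) hadronisation step until a signal hadron appears.

    hadronise: callable returning a list of hadron labels for one attempt.
    is_signal: callable returning True for the desired b- or c-hadron.
    Returns the accepted hadron list and the number of attempts needed.
    """
    for n_tries in range(1, max_tries + 1):
        hadrons = hadronise()
        if any(is_signal(h) for h in hadrons):
            return hadrons, n_tries
    raise RuntimeError("no signal hadron produced within max_tries attempts")


def choose_decay_mode(branching_fractions, rng):
    """Pick one decay mode with probability proportional to its branching fraction.

    branching_fractions: dict mapping a mode label to its branching fraction.
    rng: a random.Random instance, passed in so the choice is reproducible.
    """
    modes = list(branching_fractions)
    weights = [branching_fractions[m] for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]
```

Passing the random number generator explicitly keeps the toy reproducible, which mirrors the need for reproducible seeds in large MC productions.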
The next step in the chain is the LHCb detector simulation, where the propagation of all generated particles and their interaction with the detector material are simulated. This simulated detector is read
out using Boole, which simulates the various data acquisition electronics present in
the real LHCb detector. Events are reconstructed using Brunel, and the L0 and
high-level triggers are simulated by the Moore package. Both the reconstruction and
trigger software are identical in simulation to those used in data-taking.
4.9.2 The Worldwide LHC Computing Grid
The Worldwide LHC Computing Grid was designed for the processing and storage requirements of the approximately 50 petabytes (PB) of data generated annually by the LHC experiments, and consists of thirteen ‘tier-1’ sites that provide several PB each of redundant data-storage capacity, connected to the central CERN ‘tier-0’ data-centre by 10 Gbps fibre-optic links. In addition to these storage nodes there are also around 100 smaller ‘tier-2’ sites connected via high-speed public network links, which along with the tier-1 sites provide compute nodes used for physics analysis and simulation. As of the end of 2016, these resources totalled 334 PB of storage and 620 000 logical CPU cores, around 10% of which are allocated for LHCb use.
At LHCb, the Grid is used for the storage of collision data and MC, centralised pre-selection (stripping) campaigns and MC productions, and the execution of user analysis jobs [103].
4.9.3 Stripping
To avoid the CPU cost of every user analysis running over all raw collision data files, and the storage cost of saving multiple copies of the same
event, LHCb implements a centralised pre-selection procedure known as stripping.
These selections, known as stripping lines and specified by analysts or physics working groups, are
run in bulk over the raw data, and the candidates that are flagged as passing one or more lines are duplicated into a user accessible area. Many of these selection algorithms save only information pertaining to the signal event to conserve storage space. Such pre-selections are also applied to centrally produced MC. In addition to this, it is possible to generate MC such that events not passing a specific stripping line are discarded, which helps to conserve storage space for large MC production
requests. This is known as filtered MC, which is utilised in the analysis of the
B+→π+π+π− decay in Chapter 7.
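The flagging logic described above can be sketched as follows. This is a minimal illustration and not the LHCb stripping framework: the function `run_stripping`, the dictionary-based candidate representation, and any line names used with it are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, float]
StrippingLine = Callable[[Candidate], bool]


def run_stripping(
    candidates: List[Candidate],
    lines: Dict[str, StrippingLine],
) -> List[Tuple[Candidate, List[str]]]:
    """Keep every candidate that passes at least one line.

    Each kept candidate is stored once, together with the names of all the
    lines it passed, so that no event is saved more than once.
    """
    selected = []
    for cand in candidates:
        passed = [name for name, line in lines.items() if line(cand)]
        if passed:
            selected.append((cand, passed))
    return selected
```

Filtered MC corresponds to applying the same function at production time with a single line and discarding everything that is not returned.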
4.9.4 PIDCalib
Charged particle identification is essential to the LHCb physics programme; however, the efficiency of the RICH detectors in particular is not well replicated by MC, mostly because it is highly correlated with the event multiplicity, which is itself not well replicated by the event generators. This necessitates a data-driven approach based on high-statistics control modes that can be reliably identified without the use of the PID
system: D∗+ → D0(K−π+)π+ decays for pions and kaons, where the Cabibbo-favoured decay of the D0 or D̄0 is tagged by the charge of the slow pion from the flavour-specific D∗+ decay; proton samples are derived from Λ → pπ− decays, where final states consistent with K0S → π+π− are vetoed, supplemented by a comparatively small sample of inclusive Λc+ → pK−π+ decays for high-momentum tracks; and electron and muon samples are derived from inclusive J/ψ → µ+µ−, e+e− decays, with the efficiency calculated using the tag-and-probe method.
Figure 4.12: Particle identification calibration samples for 2015 Run 2 data [105].
Left: Proton calibration sample distribution in momentum and pseudorapidity, where the red bands indicate the boundaries between different high-level trigger
lines optimised for each region. Right: Invariant mass distribution of the D∗+→
D0(K−π+)π+ decay used for pion and kaon calibration samples, where the dashed
lines indicate the components of the D0 signal model.
The invariant mass distributions of these data samples are fitted to extract sWeights [106], in order to perform background subtraction, and the samples are then binned in
variables that are correlated with PID efficiency (usually p, pT, and the number of
reconstructed tracks in the event). The desired PID requirements are then applied to these data and the efficiency can be extracted on a per-bin basis.
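The per-bin efficiency extraction can be sketched as a ratio of sWeighted histograms. This is a minimal illustration under stated assumptions and not the PIDCalib package interface: the function name `binned_pid_efficiency`, its arguments, and the choice of two binning variables (momentum and track multiplicity) are hypothetical.

```python
import numpy as np


def binned_pid_efficiency(p, n_tracks, sweights, passed, p_edges, n_edges):
    """Background-subtracted PID efficiency in bins of momentum and multiplicity.

    p, n_tracks: per-track momentum and event track multiplicity.
    sweights:    per-track sWeights from the invariant mass fit.
    passed:      per-track booleans, True if the PID requirement is satisfied.
    Returns an array of shape (len(p_edges) - 1, len(n_edges) - 1); bins with
    no positive sWeighted yield are set to NaN.
    """
    sweights = np.asarray(sweights, dtype=float)
    passed = np.asarray(passed, dtype=float)
    bins = [p_edges, n_edges]
    # Sum of sWeights per bin before and after the PID requirement.
    total, _, _ = np.histogram2d(p, n_tracks, bins=bins, weights=sweights)
    passing, _, _ = np.histogram2d(p, n_tracks, bins=bins, weights=sweights * passed)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, passing / total, np.nan)
```

The resulting table can then be used to look up an efficiency for each simulated signal track according to the bin it falls in.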
4.9.5 Decay tree re-fitting
In order to execute as fast as possible, the online reconstruction does not assume anything about the particle content of the decay being reconstructed. However, once the requirements from the trigger and some loose pre-selection have been applied, it is possible to re-fit the entire decay chain according to specific intermediate and final-state particle hypotheses. In doing so, improved resolution on the intermediate decay vertices and unconstrained invariant masses can be obtained.
In LHCb, this fit is performed with the ‘Decay tree fitter’ algorithm [107], which applies a Kalman filter to iteratively re-fit the entire decay chain. This makes use of internal constraints, such as the requirement that decay products originate from the same vertex, and also of exact external mass constraints, which are implemented
in the Kalman filter by the method of Lagrange multipliers. This is of particular use in decays involving a photon or neutral pion, where conventional unconstrained fitting is not possible. Re-fitting a decay chain with an incorrect mass hypothesis also results in a decrease in the vertex fit quality, which can be exploited to reject background of this kind.
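The idea of imposing an exact mass constraint with a Lagrange multiplier can be illustrated on a much simpler problem than the full decay chain. The sketch below is not the Decay tree fitter and contains no Kalman filter or vertex information: it is a linearised least-squares fit of two daughter momenta with a single mass constraint, and the function names, the six-parameter layout and the fixed daughter masses are assumptions made for illustration.

```python
import numpy as np


def invariant_mass_sq(x, m1, m2):
    """Squared invariant mass of two particles; x = (p1x, p1y, p1z, p2x, p2y, p2z)."""
    p1, p2 = x[:3], x[3:]
    e_tot = np.sqrt(m1**2 + p1 @ p1) + np.sqrt(m2**2 + p2 @ p2)
    p_tot = p1 + p2
    return e_tot**2 - p_tot @ p_tot


def mass_sq_gradient(x, m1, m2):
    """Gradient of the squared invariant mass with respect to the six momenta."""
    p1, p2 = x[:3], x[3:]
    e1 = np.sqrt(m1**2 + p1 @ p1)
    e2 = np.sqrt(m2**2 + p2 @ p2)
    e_tot, p_tot = e1 + e2, p1 + p2
    return np.concatenate([2 * e_tot * p1 / e1 - 2 * p_tot,
                           2 * e_tot * p2 / e2 - 2 * p_tot])


def mass_constrained_fit(x0, cov, m1, m2, mass, n_iter=20):
    """Minimise (x - x0)^T cov^-1 (x - x0) subject to m_inv(x) = mass.

    The constraint h(x) = m_inv^2(x) - mass^2 is linearised around the
    current estimate and the Lagrange multiplier is solved for in closed
    form; the procedure is iterated because h is non-linear in x.
    """
    x0 = np.asarray(x0, dtype=float)
    cov = np.asarray(cov, dtype=float)
    x = x0.copy()
    for _ in range(n_iter):
        h = invariant_mass_sq(x, m1, m2) - mass**2
        grad = mass_sq_gradient(x, m1, m2)
        # Multiplier for the linearised constraint h + grad . (x_new - x) = 0.
        lam = (h + grad @ (x0 - x)) / (grad @ cov @ grad)
        x = x0 - cov @ grad * lam
    return x
```

The fitted momenta move away from the measured values only along the direction fixed by the covariance matrix and the constraint gradient, so well-measured components are adjusted least.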
In addition to improving the mass resolution for ‘cascade’ decays (such as the
decays described in Chapter 5 that involve an intermediate Λ→ pπ− decay), a
b-hadron mass constraint ensures that in fits to quasi-two-body decay amplitudes in
the Dalitz plot (such as those described in Chapter 7), all decays exist within the kinematic boundaries.