Graph Clustering Algorithms - Reconstruction techniques for fine grained neutrino detectors

Chapter 9 Conclusion

A.3 Graph Clustering Algorithms

One approach to extracting information from a 3D point cloud that was considered was to build a graph (in the mathematical sense of a connected network of nodes with possibly weighted edges) where each node was a point in the data set, and each edge was weighted by the distance between the nodes it connects.

Given such a graph, it is possible to analyse the data in a number of ways, including determining the shortest path between two nodes using standard shortest path algorithms, for example Dijkstra’s algorithm [83]. Using the networkx1_Python

library for graph manipulation, this could be trivially applied to data from the TrackGen orLatte simulations.

Given two extreme points in the event (e.g. a point close to the origin, and a point near the edge of the event) the shortest path algorithm provides a preliminary clustering of a track, containing those hits that lie on the shortest path between the ends of the track, however it is strongly dependent on a good seed point. Use of this approach in conjunction with the feature detection algorithms outlined in chapter 4.5 may give reasonable seed points for this technique. There still remains the issue of assigning hits surrounding the core to the track that was found, and the cellular automaton once again outperforms in this area.

No substantial analysis of this approach was performed, in part because the cellular automaton provided a promising approach to track reconstruction, and because there are many possible algorithms that could be run against a network or graph of spatial data, and it was not clear which possibilities would give the best results. The use of this technique to cluster or classify showers in a liquid Argon volume was brieﬂy considered, but again no signiﬁcant characterisation of the properties or mechanics of such an algorithm was performed.

B

Continued Use of the Latte Framework

Following on from the work in this thesis, the Latte framework has seen continued use by others at the University of Warwick. This chapter describes some of these developments and puts them in the context of the work presented in this thesis.

B.1 Cellular Automaton

The characterisation and assessment of the CA algorithm for track ﬁnding in liquid Argon has continued, resulting in a draft paper accepted for publication in Eur. Phys. J. C [2].

In the paper, further optimisation of the CA algorithm for muon and proton tracks in liquid Argon yields the set of parameter values shown in table B.1. The CA algorithm itself remains identical to the implementation described in this thesis, but the merging strategy was changed to use information local to the endpoints of each cluster, giving rise to thestitching parameters listed.

The CA was applied to two datasets from 0.77GeV µν interactions, each

containing1000simulated events. The first set hadµ+p(CCQE) final states, while the second set had µ+p+π+ (CC1π) final states. The reconstruction efficiency and purity for each particle species of interest are shown in table B.2 for the CCQE

Smoothing CA CA search radius Stitching Stitching Min. cluster radius angle / max. radius angle num. end-points hit count

6mm 20◦ 3mm / 12mm 30◦ 5 5

Table B.1: Summary of the optimal reconstruction parameters for the CA algorithm applied to CCQE events. Data from [2].

Num. recon. Muon Muon Proton Proton clusters eﬃciency (%) purity (%) eﬃciency (%) Purity (%)

N = 2 98.3 92.4 99.0 95.9

N >1 92.7 92.3 98.8 95.5

N >0 93.1 91.7 98.8 95.1

Table B.2: Mean eﬃciency and purity measured at the hit level for muons and protons from 0.77GeV νµ interactions. A proton range cut at 25 hits was

applied based on the Monte Carlo truth data. Data from [2].

Num. recon. Muon Muon Proton Proton Pion Pion clusters efficiency purity efficiency Purity efficiency purity

(%) (%) (%) (%) (%) (%)

N >2 96.9 86.9 98.3 90.1 92.6 85.5

N >0 97.3 85.6 98.5 89.0 93.1 84.1

Table B.3: Mean eﬃciency and purity measured at the hit level for muons, protons and pions from 0.77GeV νµ interactions. A proton range cut at 25 hits

was applied based on the Monte Carlo truth data. Data from [2].

events, and table B.3 for the CC1π events. In both cases a proton range cut was applied based on the truth data, requiring protons to have more than25hits, equiv- alent to a 2.5cm track, or a10MeV minimum ionising particle travelling through liquid Argon.

The paper also evaluates the application of the CA algorithm to track-shower discrimination, using data from the clusters found by the CA as input to a Boosted Decision Tree (BDT) based on the TMVA [81] package. The three input variables are derived from the CA output as follows:

1. Ratio of transverse to logitudinal extent of the cluster

2. Ratio of Coulomb energy1 to the longitudinal extent of the cluster

3. Number of clusters produced by the CA; important for low energy, where the geometric variables do not provide good discrimination between muon tracks and electron showers.

The BDT was trained with a signal consisting of electron showers with energies between 10MeV and 3GeV and a background consisting of muons in the same energy range. The optimal cut was deﬁned to be the one that maximises the

1_{The Coulomb energy of a cluster is deﬁned here as the sum over all hits of the charge in each hit}

ratio of signal to total event rate. This yields a 97.3% signal eﬃciency (i.e. a 3.7% contamination) for discriminating electron showers from muon tracks, across all of the energies sampled.

These results demonstrate that the cellular automaton algorithm is ﬁt for its original purpose, but can also be applied to scenarios for which it was never designed. Therefore, as a track ﬁnder and as a general purpose clustering tool, the CA is a valuable addition to the toolkit provided by the Latte framework.

In document Reconstruction techniques for fine grained neutrino detectors (Page 142-146)