Description
REDuced VIrtual Detector (REDVID) is a simulation framework and a synthetic data generator written in Python. As a reduced simulator, REDVID simulates the propagation of subatomic particles in a virtual detector model with a given geometry, inspired by the detectors installed at the Large Hadron Collider (LHC). The simulation model is complexity-reduced and is intended for generating source data to train Machine Learning (ML) algorithms and to perform ML-assisted solution exploration. The data is in the form of hit point coordinates in space and trajectory function parameters.
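As a concrete (and deliberately naive) illustration of the reduced setting, the sketch below intersects a straight track from the origin with concentric barrel-like layers to obtain hit coordinates. The function and geometry are our own simplification for illustration, not REDVID's actual code.

```python
import math

def linear_track_hits(direction, layer_radii, z_half):
    """Intersect a straight track from the origin with concentric
    cylindrical layers (a toy stand-in for a reduced barrel geometry).

    direction: (dx, dy, dz) 3-vector, need not be unit length;
    layer_radii: increasing layer radii; z_half: detector half-length.
    Returns one (x, y, z) hit per layer the track actually reaches.
    """
    dx, dy, dz = direction
    rho = math.hypot(dx, dy)    # transverse component of the direction
    hits = []
    for r in layer_radii:
        if rho == 0.0:          # track along the beam axis: no barrel hits
            break
        t = r / rho             # parameter where the track crosses radius r
        z = t * dz
        if abs(z) > z_half:     # track exits through the end cap first
            break
        hits.append((t * dx, t * dy, z))
    return hits
```

Hit-coordinate smearing, as in the noisy sample events below, would then amount to adding small random offsets to each returned coordinate.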
Sample events
A few events from simulations with varying recipes are shown for demonstration purposes. The plots below vary in track definitions. The track count is limited to five to improve legibility. From left to right, these plots depict the full event view, the hit points view, and the tracks view, respectively. Note the incorporated detector model geometry as depicted in Figure 1.
The following plots show this virtual detector in a 90-degree rotated orientation. Note that the Z-axis runs through the detector. We keep the scale down in these examples for legibility, i.e., we generate a low number of tracks per event.
3D space, noisy hit coordinates, linear tracks
A sample event with five linear tracks starting at the geometric origin and being randomly directed. The randomisation of these tracks follows the first track randomisation protocol, i.e., Protocol 1 - Last layer hit guarantee. Refer to our relevant publications for further details on track randomisation protocols. Different views for this event are depicted in Figure 2.
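For intuition only, a draw in the spirit of a "last layer hit guarantee" might constrain the polar angle so that a straight track from the origin reaches the outermost layer radius before leaving the detector half-length. The sketch below is a hypothetical reading of such a constraint, not Protocol 1's actual definition; see the publications for the real protocols.

```python
import math
import random

def draw_direction_last_layer(r_last, z_half, rng):
    """Sample a unit direction whose straight-line extrapolation from
    the origin crosses radius r_last within |z| <= z_half (hypothetical
    sketch of a last-layer-hit constraint, not REDVID's protocol)."""
    theta_min = math.atan2(r_last, z_half)  # shallower tracks exit the end cap
    theta = rng.uniform(theta_min, math.pi - theta_min)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```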
3D space, noisy hit coordinates, helical uniform tracks
This example event goes a step further in complexity compared to events with linear tracks. Helical uniform tracks do not occur in realistic settings. However, the generated datasets are of independent value for research. Figure 3 depicts such an event with five helical uniform tracks.
3D space, noisy hit coordinates, helical expanding tracks
Helical expanding tracks are the closest type REDVID can generate to real-world tracks. Other complexity-increasing features do not directly influence a track's formation principles. For instance, all these examples have hit point coordinate smearing enabled. Figure 4 showcases an event with five helical expanding tracks, following the same track randomisation protocol as earlier.
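A minimal parametrisation in this spirit, with the transverse radius growing along the curve while z advances linearly, can be sketched as follows. This is purely illustrative; the actual track equations are given in the publications. A uniform helix is the growth = 0 special case.

```python
import math

def helical_expanding_point(t, r0, growth, pitch, phi0=0.0):
    """Point at curve parameter t on a toy 'expanding' helix: the
    transverse radius grows linearly with t (illustrative only)."""
    r = r0 + growth * t               # expanding transverse radius
    x = r * math.cos(phi0 + t)
    y = r * math.sin(phi0 + t)
    z = pitch * t                     # uniform advance along the beam axis
    return (x, y, z)
```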
Feature set
REDVID is highly configurable: many features can be tweaked through the main configuration file according to user requirements. We provide the available and planned features in Table 1, without exhaustive descriptions. Current availability is indicated using status markers:

- Orange: Limited selection
- Red: Under development
| Category | Features |
|---|---|
| Execution config | Anchor path; Multiple output modes; Automated execution parallelism; Automated large job division; Automated batch processing; Automated batch processing parallelism; Performance monitoring; Visualisations; Import/load spawned detectors; Dataset coordinate system |
| Experiment config | Custom/auto experiment ID; Event count; Fixed track count; Variable track count with range; Track direction, designated/random; Shift over the Z-axis |
| Experiment config (2D tracks) | Slope limits; y-intercept limits |
| Experiment config (3D tracks) | Track randomisation protocols; Sub-detector track aggregation; Track type: Linear; Track type: Helical uniform; Track type: Helical expanding; Track type: Multiple types; Track level: Primary tracks; Track level: Secondary tracks; Early terminating tracks; Jet track type: Linear; Jet track level: Primary jets; Jet track level: Secondary jets |
| Experiment config (Hits) | Hit point calculation methods; Hit point smearing; Hit point recording probability; Holes (unrecorded hits) |
| Geometry features | Custom/auto detector ID; Dimension; Detector space Cartesian axis boundaries; Detector space Spherical boundaries |
| Geometry config (2D) | Origin coordinates: (x, y); Sub-det. presence: Pixel, SS, LS; Sub-det. layer count, per type; Sub-det. centre coordinates; Sub-det. layer distance; Sub-det. outer radius; Sub-det. outer-inner radii delta |
| Geometry config (3D) | Origin smearing; Origin smearing type; Origin coordinates: (r, θ, z); Sub-det. presence: Pixel, SS, LS, Barrel; Sub-det. layer count, per type; Sub-det. centre coordinates; Sub-det. layer distance; Sub-det. outer radius; Sub-det. outer-inner radii delta; Sub-det. end z; Sub-det. end-start z delta |
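For a feel of how such configuration knobs combine into a recipe, a sketch is given below. All key names are hypothetical and invented for illustration; the real parameter names and structure are defined in the shipped configuration file.

```python
# Hypothetical recipe sketch -- all key names are illustrative and do
# NOT reflect REDVID's actual configuration file.
recipe = {
    "execution": {
        "parallelism": True,            # automated execution parallelism
        "visualisations": False,        # skip plotting for large runs
    },
    "experiment": {
        "event_count": 10_000,
        "track_count_range": (1, 10),   # variable track count with range
        "track_type": "helical_expanding",
        "randomisation_protocol": 1,    # e.g. last layer hit guarantee
    },
    "hits": {
        "smearing": True,               # hit point coordinate smearing
        "recording_probability": 0.98,  # values < 1.0 introduce holes
    },
}
```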
Code repository
The code is open-source and publicly available. Refer to the included configuration file for a complete list of available parameters and their effects. To understand the overall functionality and usage of the tool, refer to the provided README, RELEASE NOTES, and documentation, as well as the related publication [1].
Datasets
Collections of representative example datasets, generated using the REDVID simulation framework, contain complexity-reduced subatomic particle collision event data for linear [2] and helical [3] tracks. Particle trajectory information and hit coordinates from interactions with reduced-order virtual detector models are included. The data are generated in a 3D domain and follow the cylindrical coordinate system for hit point coordinates in space and trajectory function parameters.
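Since hits are stored in cylindrical coordinates, consumers typically map them to Cartesian space for plotting or as model input. A minimal helper (the function name is ours, not part of the datasets' tooling):

```python
import math

def cyl_to_cart(r, theta, z):
    """Map a cylindrical (r, theta, z) hit coordinate, as stored in
    the datasets, to Cartesian (x, y, z)."""
    return (r * math.cos(theta), r * math.sin(theta), z)
```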
Each of the five included tarballs corresponds to a different data generation recipe. While all recipes include 10000 collision events, the number of tracks per event varies from 1 to 10000, as reflected in the tarball names.
The datasets are intended to be used as synthesised input for research involving ML-assisted pipeline design exploration, as well as ML model design exploration, e.g., Neural Architecture Search (NAS). To understand the data and their generation in detail, refer to the provided README, as well as the related publication [1]. Further details regarding the ML research incorporating these datasets are available in our Connecting The Dots 2023 (CTD 2023) proceedings paper [5].
Publications
Publications and contributions about REDVID
Abstract
Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is exceptionally computationally challenging and fielded solutions, relying on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description and the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID) as a complexity-reduced detector model and particle collision event simulator combo. REDVID is intended as a simulation-in-the-loop, to both generate synthetic data efficiently and to simplify the challenge of ML model design. The fully parametric nature of our tool, with regards to system-level configuration, while in contrast to physics-accurate simulations, allows for the generation of simplified data for research and education, at different levels. Resulting from the reduced complexity, we showcase the computational efficiency of REDVID by providing the computational cost figures for a multitude of simulation benchmarks. As a simulation and a generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available. Data generated using REDVID has enabled rapid development of multiple novel ML model designs, which is currently ongoing.
Abstract
An example, representative data set is generated using the REDuced VIrtual Detector (REDVID) simulation framework and contains complexity-reduced subatomic particle collision event data. Particle trajectory information and hit coordinates from interactions with reduced-order virtual detector models are included. The data are generated in a 3D domain and follow the cylindrical coordinate system for hit point coordinates in space and trajectory function parameters.

The included five tarballs each belong to a different data generation recipe. While all recipes include 10000 collision events, the number of tracks included in events varies from 1 track per event to 10000 tracks per event. This is noticeable from the tarball names.

The data set is intended to be used as synthesised input for research involving ML-assisted pipeline design exploration, as well as ML model design exploration, e.g., Neural Architecture Search (NAS). To understand the data and its generation in detail, refer to the provided README file, as well as the related publication.
Abstract
Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking, in its current form, is exceptionally computationally challenging. Fielded solutions, relying on traditional algorithms, do not scale linearly and pose a major limitation for the HL-LHC era. Machine Learning (ML) assisted solutions are a promising answer. Current ML model design practice is predominantly ad hoc. We aim for a methodology for automated search of ML model designs, consisting of complexity reduced descriptions of the main problem, forming a complexity spectrum. As the main pillar of such a method, we provide the REDuced VIrtual Detector (REDVID) as a complexity-aware detector model and particle collision event simulator. Through a multitude of configurable dimensions, REDVID is capable of simulations throughout the complexity spectrum. REDVID can also act as a simulation-in-the-loop, to both generate synthetic data efficiently and to simplify the challenge of ML model design evaluation. Starting from the simplistic end of the spectrum, lesser designs can be eliminated in a systematic fashion, early on. REDVID is not bound by real detector geometries and can simulate arbitrary detector designs. As a simulation and a generative tool for ML-assisted solution design, REDVID is open-source and reference data sets are publicly available. It has enabled rapid development of novel ML models.
Publications and contributions using REDVID
Abstract
Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate the effectiveness of this data in guiding the development of optimal network architectures. Additionally, we investigate the application of image segmentation networks for this task, exploring their potential for accurate track reconstruction. Moreover, we approach the task from a different perspective by treating it as a hit sequence to track sequence translation problem. Specifically, we explore the utilisation of Transformer architectures for tracking purposes. Our preliminary findings are covered in detail. By considering this novel approach, we aim to uncover new insights and potential advancements in track reconstruction. This research sheds light on previously unexplored methods and provides valuable insights for the field of particle track reconstruction and hit clustering in HEP.
Abstract
Inspired by the recent successes of language modelling and computer vision machine learning techniques, we study the feasibility of repurposing these developments for particle track reconstruction in the context of high energy physics. In particular, drawing from developments in the field of language modelling we showcase the performance of multiple implementations of the transformer model, including an autoregressive transformer with the original encoder-decoder architecture, and encoder-only architectures for the purpose of track parameter classification and clustering. Furthermore, in the context of computer vision we study a U-net style model with submanifold convolutions, treating the event as an image and highlighting those pixels where a hit was detected.

We benchmark these models on simplified training data utilising a recently developed simulation framework, REDuced VIrtual Detector (REDVID). These data include noisy linear and helical track definitions, similar to those observed in particle detectors from major LHC collaborations such as ATLAS and CMS. We find that the proposed models can be used to effectively reconstruct particle tracks on this simplified dataset, and we compare their performances both in terms of reconstruction efficiency and runtime. As such, this work lays the necessary groundwork for developments in the near future towards such novel machine learning strategies for particle tracking on more realistic data.
Abstract
Track reconstruction is a crucial part of High Energy Physics (HEP) experiments. Traditional methods for the task scale poorly, making machine learning and deep learning appealing alternatives. Following the success of transformers in the field of language processing, we investigate the feasibility of training a Transformer to translate detector signals into track parameters. We study and compare different architectures. Firstly, an autoregressive Transformer model with the original encoder-decoder architecture which reconstructs a particle's trajectory given a few initial hits. Secondly, an encoder-only architecture used as a classifier, producing a class label for each hit in an event, given pre-defined bins within the track parameter space. Lastly, an encoder-only model with the purpose of regressing track parameter values for each hit in an event, followed by clustering.

The Transformer models are benchmarked on simplified datasets generated by the recently developed simulation framework REDuced VIrtual Detector (REDVID) as well as a subset of the TrackML data. The preliminary results of the proposed models show promise for the application of these deep learning techniques on more realistic data for particle reconstruction.

This work has been previously presented at the following conferences: Connecting The Dots 2023 (https://indico.cern.ch/event/1252748/contributions/5521505/), NNV 2023 (https://indico.nikhef.nl/event/4510/contributions/18909/), and ML4Jets2023 (https://indico.cern.ch/event/1253794/contributions/5588602/).
Abstract
Particle track reconstruction is a fundamental aspect of experimental analysis in high-energy particle physics. Conventional methodologies for track reconstruction are suboptimal in terms of efficiency in anticipation of the High Luminosity phase of the Large Hadron Collider. This has motivated researchers to explore the latest developments in deep learning for their scalability and potential enhanced inference efficiency.

We assess the feasibility of three Transformer-inspired model architectures for hit clustering and classification. The first model uses an encoder-decoder architecture to reconstruct a track auto-regressively, given the coordinates of the first few hits. The second model employs an encoder-only architecture as a classifier, using predefined labels for each track. The third model, also utilising an encoder-only configuration, regresses track parameters, and subsequently assigns clusters in the track parameter space to individual tracks.

We discuss preliminary studies on a simplified dataset, showing high success rates for all models under consideration, alongside our latest results using the TrackML dataset from the 2018 Kaggle challenge. Additionally, we present our journey in the adaptation of models and training strategies, addressing the trade-offs among training efficiency, accuracy, and the optimisation of sequence lengths within the memory constraints of the hardware at our disposal.
Abstract
Track reconstruction is a crucial part of High Energy Physics experiments. Traditional methods for the task, relying on Kalman Filters, scale poorly with detector occupancy. In the context of the upcoming High Luminosity-LHC, solutions based on Machine Learning (ML) and deep learning are very appealing. We investigate the feasibility of training multiple ML architectures to infer track-defining parameters from detector signals, for the application of offline reconstruction. We study and compare three Transformer model designs, as well as a U-Net architecture. We describe in detail the two most promising approaches and benchmark the pipelines for physics performance and inference speed on methodically simplified datasets, generated by the recently developed simulation framework, REDuced VIrtual Detector (REDVID). Our second batch of simplified datasets are derived from the TrackML dataset. Our preliminary results show promise for the application of such deep learning techniques on more realistic data for tracking, as well as efficient elimination of solutions.
Abstract
High-Energy Physics experiments are facing a multi-fold data increase with every new iteration. This is certainly the case for the upcoming High-Luminosity LHC upgrade. Such increased data processing requirements forces revisions to almost every step of the data processing pipeline. One such step in need of an overhaul is the task of particle track reconstruction, a.k.a., tracking. A Machine Learning-assisted solution is expected to provide significant improvements, since the most time-consuming step in tracking is the assignment of hits to particles or track candidates. This is the topic of this paper. We take inspiration from large language models. As such, we consider two approaches: the prediction of the next word in a sentence (next hit point in a track), as well as the one-shot prediction of all hits within an event. In an extensive design effort, we have experimented with three models based on the Transformer architecture and one model based on the U-Net architecture, performing track association predictions for collision event hit points. In our evaluation, we consider a spectrum of simple to complex representations of the problem, eliminating designs with lower metrics early on. We report extensive results, covering both prediction accuracy (score) and computational performance. We have made use of the REDVID simulation framework, as well as reductions applied to the TrackML data set, to compose five data sets from simple to complex, for our experiments. The results highlight distinct advantages among different designs in terms of prediction accuracy and computational performance, demonstrating the efficiency of our methodology. Most importantly, the results show the viability of a one-shot encoder-classifier based Transformer solution as a practical approach for the task of tracking.
Abstract
This artefact includes five individual data sets containing particle collision data in virtual detector setups. These data sets are utilised for Machine Learning (ML) model design and training within the publication “TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era”. Three of the data sets are generated using the REDuced VIrtual Detector (REDVID) simulation framework. The other two are reduced versions of the TrackML data set. The full TrackML data set is simulated using Pythia 8 event generator. Refer to the provided README file for further details.
Abstract
TrackFormers is a machine learning framework for track reconstruction in particle physics experiments. It leverages transformer- and U-Net-inspired deep learning architectures to predict particle tracks from hit data. This repository contains 4 directories corresponding to the 4 models described in the paper TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era. EncDec, EncCla, and EncReg are transformer-based models, whereas U-Net is, as the name suggests, a U-Net model. Refer to the provided README file for further details.
Abstract
High-Energy Physics experiments are rapidly escalating in generated data volume, a trend that will intensify with the upcoming High-Luminosity LHC upgrade. This surge in data necessitates critical revisions across the data processing pipeline, with particle track reconstruction being a prime candidate for improvement. In our previous work, we introduced "TrackFormers", a collection of Transformer-based one-shot encoder-only models that effectively associate hits with expected tracks. In this study, we extend our earlier efforts by incorporating loss functions that account for inter-hit correlations, conducting detailed investigations into (various) Transformer attention mechanisms, and a study on the reconstruction of higher-level objects. Furthermore we discuss new datasets that allow the training on hit level for a range of physics processes. These developments collectively aim to boost both the accuracy, and potentially the efficiency of our tracking models, offering a robust solution to meet the demands of next-generation high-energy physics experiments.
Abstract
High-Energy Physics experiments are rapidly escalating in generated data volume, a trend that will intensify with the upcoming High-Luminosity LHC upgrade. This surge in data necessitates critical revisions across the data processing pipeline, with particle track reconstruction being a prime candidate for improvement. In our previous work, we introduced "TrackFormers", a collection of Transformer-based one-shot models that effectively associate hits with expected tracks. In this study, we extend our earlier efforts of model development by incorporating loss functions that account for inter-hit correlations, conducting detailed investigations into (various) Transformer attention mechanisms, and a study on the reconstruction of higher-level objects. Furthermore, we discuss new datasets that allow the training on hit level for a range of physics processes. These developments collectively aim to boost both the accuracy and the potential efficiency of our tracking models, offering a robust solution to meet the demands of next-generation high-energy physics experiments.