REDVID Collision Event Data – Tracks and Hits
dr. ir. Uraz Odyurt
Introduction
REDuced VIrtual Detector (REDVID) is a simulation framework and a synthetic data generator written in Python. The generator simulates the propagation of subatomic particles, inspired by the detectors installed at the Large Hadron Collider (LHC). The simulation model is complexity-reduced and is intended for generating source data for Machine Learning (ML) algorithms.
For further information, refer to the REDVID website.
Detector geometry
A detector’s geometry consists of multiple layers of sub-detectors of different shapes, belonging to different categories. Each category defines the relevant shape of a sub-detector. However, dimensions and the placement of different layers, relative to the detector origin, will vary within each category.
Such details, as well as the particulars defining the presence or absence of sub-detector categories, are provided in one or more configuration files.
2D structure
As a result of the oversimplified design of this 2D variant, sub-detector categories have minute differences. In essence, all sub-detector layers are circles centred at the origin. Although the simulator supports filled or partially filled circle designs, the only filled sub-detector is the innermost circle, i.e., the Pixel sub-detector.
The following sub-detector categories are available for a 2D variant:
- Pixel
- Short-strip
- Long-strip
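Because every 2D layer is a circle centred at the origin, the ideal crossing points of a straight track with a layer follow from elementary geometry. The sketch below is illustrative only: `line_circle_intersections` is a hypothetical helper, not part of REDVID, and it ignores the random noise that the simulator adds to recorded hits. The parameter names `slope` and `y_intercept` mirror the track headers of the 2D data; `radius` is an assumed layer radius.

```python
import math


def line_circle_intersections(slope, y_intercept, radius):
    """Intersect the line y = slope * x + y_intercept with a circle of the
    given radius centred at the origin.

    Hypothetical helper for illustration; returns 0, 1 or 2 (x, y) points.
    """
    # Substituting the line into x^2 + y^2 = radius^2 gives
    # a * x^2 + 2 * slope * y_intercept * x + (y_intercept^2 - radius^2) = 0
    a = 1.0 + slope * slope
    # Quarter of the quadratic's discriminant, which is enough for the roots.
    discriminant = a * radius * radius - y_intercept * y_intercept
    if discriminant < 0:
        return []  # the line misses the layer
    root = math.sqrt(discriminant)
    signs = (1.0,) if root == 0.0 else (1.0, -1.0)  # tangent line: one point
    points = []
    for sign in signs:
        x = (-slope * y_intercept + sign * root) / a
        points.append((x, slope * x + y_intercept))
    return points
```

A track passing through the origin crosses each layer at two opposite points; which of the two corresponds to a recorded hit depends on the direction of travel, which this sketch does not model.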
3D structure
The 3D variant is a big step towards closing the gap with the real-world detector apparatus. The available sub-detectors take the form of disks and cylinders or, generally speaking, cylinders, since a disk is a special case of the cylinder shape.
The following sub-detector categories are available for a 3D variant:
- Pixel
- Short-strip
- Long-strip
- Barrel
The Pixel and the Barrel categories are cylindrical in shape, with the Pixel being a filled shape and the Barrel being only a shell without end-caps. These two categories are centred at the origin.
Both the Short-strip and the Long-strip categories consist of sub-detectors in the shape of disks. These disks are centred on the Z-axis. There are multiple such disks, which are mirrored relative to the XY-plane. In other words, each disk has an identical twin on the opposite side of the XY-plane. The shape, the size and the orientation of these pairs are the same.
All sub-detector shapes, regardless of their category, are positioned around the Z-axis, i.e., the Z-axis goes through their centre.
Data description
The generated data includes information on geometry, tracks and hits from experiments. While the information on the geometry is included, the bulk of the data set covers data for tracks and hits. There are differences in the data generated for the 2D and the 3D structures.
Tracks and hits belonging to the experiments performed on the 2D structure are given by means of function coefficients and point coordinates in the Cartesian coordinate system, respectively. Additionally, conversion to the polar coordinate system is included.
Tracks and hits belonging to the experiments performed on the 3D structure, on the other hand, are defined as parameters of line equations and point coordinates in the cylindrical coordinate system, respectively.
Folder structure
The generated folder structure, holding the data files, is as follows:
ANCHOR_PATH
|-- detector_<detector_id>
    |-- detector_<detector_type>
        |-- experiment_<detector_type>_<experiment_tag>
        |   |-- events_all
        |   |   |-- hits_<detector_type>_events_all.csv
        |   |   |-- hits_and_tracks_<detector_type>_events_all.csv
        |   |   |-- hits_and_tracks_polar_<detector_type>_events_all.csv
        |   |   |-- tracks_<detector_type>_events_all.csv
        |   |-- events_individual
        |   |   |-- event_<detector_type>_<event_id>
        |   |   |   |-- hits_<detector_type>_<event_id>.csv
        |   |   |   |-- tracks_<detector_type>_<event_id>.csv
        |   |   |-- ...
        |   |-- report
        |       |-- dataset_<detector_type>_report.txt
        |-- geometry_<detector_type>
            |-- geometry_<detector_type>.csv
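As a minimal sketch of navigating this tree, the all-events files can be located by file name rather than by a hard-coded path, so the lookup does not depend on the exact nesting depth. The helper name `find_events_all_files` and its return layout are hypothetical, not part of REDVID; only the file names come from the listing above, and the value of `detector_type` must be taken from the actual data set.

```python
from pathlib import Path


def find_events_all_files(anchor_path, detector_type):
    """Locate the all-events CSV files anywhere below anchor_path.

    Hypothetical helper: exact file names are built from detector_type and
    searched recursively, so the result is a dict of sorted path lists.
    """
    anchor = Path(anchor_path)
    names = {
        "hits": f"hits_{detector_type}_events_all.csv",
        "tracks": f"tracks_{detector_type}_events_all.csv",
        "hits_and_tracks": f"hits_and_tracks_{detector_type}_events_all.csv",
    }
    # Exact names (no wildcards) keep "hits_..." from also matching
    # "hits_and_tracks_..." files.
    return {kind: sorted(anchor.rglob(name)) for kind, name in names.items()}
```

Each list holds one path per experiment found below the anchor path.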
File combinations
The generated data is saved in multiple CSV files, with the same data being replicated in three file combinations. The user can opt for any of these combinations and will end up with a complete set. These file combinations are as follows:
- All per-event files, i.e., every hits_<detector_type>_<event_id>.csv and tracks_<detector_type>_<event_id>.csv file under the events_individual folder tree.
- The files hits_<detector_type>_events_all.csv and tracks_<detector_type>_events_all.csv, residing inside the events_all folder.
- The file hits_and_tracks_<detector_type>_events_all.csv, residing inside the events_all folder.
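For the first combination, the per-event files have to be gathered before use. The following is a sketch using only the Python standard library; `read_rows` and `read_per_event_rows` are hypothetical helper names, and the code assumes comma-separated files whose first row holds the column titles.

```python
import csv
from pathlib import Path


def read_rows(csv_path):
    """Read one CSV file into a list of dicts keyed by column title."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def read_per_event_rows(events_individual_dir, kind):
    """Concatenate the rows of all per-event files of one kind.

    kind is 'hits' or 'tracks'. Files are visited in lexicographic path
    order, which is not necessarily numeric event order. All values are
    returned as strings, exactly as stored in the files.
    """
    rows = []
    for path in sorted(Path(events_individual_dir).rglob(f"{kind}_*.csv")):
        rows.extend(read_rows(path))
    return rows
```

Calling `read_per_event_rows(<path to events_individual>, "hits")` yields one list covering all events, comparable in content to the corresponding all-events file.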
Conversions to polar coordinate system for the 2D structure
A post-generation step converts the original hit point coordinates from the Cartesian system into the polar coordinate system, and converts the track line slope to an angle given in radians.
The extra headers generated as a result of this step are appended as new columns to the most comprehensive data file, hits_and_tracks_<detector_type>_events_all.csv. The resulting extended CSV file is saved as hits_and_tracks_polar_<detector_type>_events_all.csv in the same location.
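The conversion can be sketched with the Python standard library as follows. This is not REDVID's own code: the function names are hypothetical, and the angle conventions (`atan2` for hits, `atan` for the track slope) are assumptions that may differ from the ranges used in the published files.

```python
import math


def to_polar(hit_x, hit_y):
    """Return (hit_r, hit_theta) for a Cartesian hit point.

    hit_theta is in radians, in the range (-pi, pi] (assumed convention).
    """
    hit_r = math.hypot(hit_x, hit_y)      # distance from the origin
    hit_theta = math.atan2(hit_y, hit_x)  # angle of the hit vector
    return hit_r, hit_theta


def slope_to_angle(slope):
    """Return the angle of a line with the given slope, in radians.

    math.atan yields values in (-pi/2, pi/2), so the direction of travel
    along the line is not encoded (assumed convention).
    """
    return math.atan(slope)
```

For example, a hit at (0, 2) maps to a radius of 2 and an angle of pi/2, while a slope of 1 maps to an angle of pi/4.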
Data headers for the 2D structure
Data headers, i.e., CSV column titles, apply to all CSV files. Different CSV files include different subsets of the headers, depending on the contained data. These headers are as follows:
1. event_id - An incremental identifier for events belonging to an experiment, which is unique within the scope of the experiment.
   Type => integer
2. sub_detector_id - An incremental identifier for different sub-detector layers belonging to a geometry, which is unique within the scope of the geometry.
   Type => integer
3. sub_detector_type - The type of the sub-detector layer recording a hit, which can be one of three available types, pixel, short-strip, or long-strip.
   Type => string
4. track_id - An incremental identifier for tracks belonging to an event, which is unique within the scope of the event.
   Type => integer
5. track_type - Indicates the type of function defining the track in terms of polynomial degree. At the moment, all tracks are ‘linear’.
   Type => string
6. coefficient_1 - The first track polynomial function coefficient. Not applicable.
   Type => float
7. coefficient_2 - The second track polynomial function coefficient. Not applicable.
   Type => float
8. slope - The third track polynomial function coefficient (coefficient_3), i.e., slope.
   Type => float
9. y_intercept - The fourth track polynomial function coefficient (coefficient_4), i.e., y-intercept.
   Type => float
10. hit_id - An incremental identifier for hits belonging to an event, which is unique within the scope of the event.
    Type => integer
11. hit_x - The X coordinate of the hit, in the Cartesian coordinate system.
    Type => float
12. hit_y - The Y coordinate of the hit, in the Cartesian coordinate system.
    Type => float
13. track_theta (polar) - The angle of the track line, in radians.
    Type => float
14. hit_r (polar) - The vector radius, or in other words, the distance of the hit from the origin.
    Type => float
15. hit_theta (polar) - The angle of the hit vector, in radians. As a result of the random noise added during data generation, this value is slightly different from the track_theta value.
    Type => float
The header inclusion map for the different files is as follows:
- hits_<detector_type>_<event_id>.csv - 1, 2, 3, 4, 10, 11, 12
- tracks_<detector_type>_<event_id>.csv - 1, 4, 5, 6, 7, 8, 9
- hits_<detector_type>_events_all.csv - 1, 2, 3, 4, 10, 11, 12
- tracks_<detector_type>_events_all.csv - 1, 4, 5, 6, 7, 8, 9
- hits_and_tracks_<detector_type>_events_all.csv - 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
- hits_and_tracks_polar_<detector_type>_events_all.csv - 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
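The inclusion map for the 2D structure can be expressed as a small lookup table, for instance to check the columns of a loaded file. The sketch below assumes that the CSV column titles match the header names listed above exactly and appear in the listed order; the names `HEADERS_2D`, `INCLUSION_2D` and `expected_columns` are hypothetical.

```python
# Header names of the 2D structure, in the order of the list above
# (position 1 is event_id, position 15 is hit_theta).
HEADERS_2D = [
    "event_id", "sub_detector_id", "sub_detector_type", "track_id",
    "track_type", "coefficient_1", "coefficient_2", "slope", "y_intercept",
    "hit_id", "hit_x", "hit_y", "track_theta", "hit_r", "hit_theta",
]

# Header positions (1-based) per file kind, copied from the inclusion map.
INCLUSION_2D = {
    "hits": [1, 2, 3, 4, 10, 11, 12],
    "tracks": [1, 4, 5, 6, 7, 8, 9],
    "hits_and_tracks": list(range(1, 13)),
    "hits_and_tracks_polar": list(range(1, 16)),
}


def expected_columns(file_kind):
    """Return the header names expected in a file of the given kind."""
    return [HEADERS_2D[position - 1] for position in INCLUSION_2D[file_kind]]
```

The per-event and all-events variants of the hits and tracks files share the same entries, so a single key covers both.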
Data headers for the 3D structure
Data headers, i.e., CSV column titles, apply to all CSV files. Different CSV files include different subsets of the headers, depending on the contained data. These headers are as follows:
1. event_id - An incremental identifier for events belonging to an experiment, which is unique within the scope of the experiment.
   Type => integer
2. sub_detector_id - An incremental identifier for different sub-detector layers belonging to a geometry, which is unique within the scope of the geometry.
   Type => integer
3. sub_detector_type - The type of the sub-detector layer recording a hit, which can be one of three available types, pixel, short-strip, or long-strip.
   Type => string
4. track_id - An incremental identifier for tracks belonging to an event, which is unique within the scope of the event.
   Type => integer
5. track_type - Indicates the type of function defining the track in terms of polynomial degree. Available types are ‘linear’, ‘helical_uniform’ and ‘helical_expanding’.
   Type => string
6. r_0 or radial_const - The r coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for r. r_0 and radial_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
   Type => float
7. theta_0 or azimuthal_const - The theta coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for theta. theta_0 and azimuthal_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
   Type => float
8. z_0 or pitch_const - The z coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for z. z_0 and pitch_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
   Type => float
9. r_d - The r coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. r_d is applicable to the ‘linear’ track type.
   OR,
   radial_coeff - The coefficient affecting the radius rate in the helical track. radial_coeff is applied to the free variable in the equation for r. radial_coeff is applicable to the ‘helical_expanding’ track type.
   Type => float
10. theta_d - The theta coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. theta_d is applicable to the ‘linear’ track type.
    OR,
    azimuthal_coeff - The coefficient affecting the clockwise/counter-clockwise extrusion direction of the helical track. azimuthal_coeff is applied to the free variable in the equation for theta. azimuthal_coeff is applicable to the ‘helical_expanding’ track type.
    Type => float
11. z_d - The z coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. This value will be 1 or -1, depending on which side of the XY-plane the track is being directed to. z_d is applicable to the ‘linear’ track type.
    OR,
    pitch_coeff - The coefficient affecting the pitch rate in the helical track. pitch_coeff is applied to the free variable in the equation for z. pitch_coeff is applicable to the ‘helical_expanding’ track type.
    Type => integer
12. hit_id - An incremental identifier for hits belonging to an event, which is unique within the scope of the event.
    Type => integer
13. hit_r - The r coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float
14. hit_theta - The theta coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float
15. hit_z - The z coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float
The header inclusion map for the different files is as follows:
- hits_<detector_type>_<event_id>.csv - 1, 2, 3, 4, 12, 13, 14, 15
- tracks_<detector_type>_<event_id>.csv - 1, 4, 5, 6, 7, 8, 9, 10, 11
- hits_<detector_type>_events_all.csv - 1, 2, 3, 4, 12, 13, 14, 15
- tracks_<detector_type>_events_all.csv - 1, 4, 5, 6, 7, 8, 9, 10, 11
- hits_and_tracks_<detector_type>_events_all.csv - 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
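Since 3D hits are stored as (r, theta, z) tuples, many downstream tools will first need them in Cartesian form. The following is a minimal sketch of that standard conversion; `hit_to_cartesian` is a hypothetical helper, and the code assumes that hit_theta is given in radians, which is stated for the 2D data but not explicitly for the 3D data.

```python
import math


def hit_to_cartesian(hit_r, hit_theta, hit_z):
    """Convert a cylindrical (r, theta, z) hit to Cartesian (x, y, z).

    Assumes hit_theta is in radians; z is shared by both systems.
    """
    x = hit_r * math.cos(hit_theta)
    y = hit_r * math.sin(hit_theta)
    return x, y, hit_z
```

A hit on the positive X-axis, e.g., `hit_to_cartesian(2.0, 0.0, 5.0)`, keeps its radius as the X coordinate and its z value unchanged.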
Report
A textual report file is generated and saved under the report folder within an experiment’s folder tree. The automatically composed report lists many of the detector-specific configuration fields, taken directly from the relevant configuration file(s). Not every configuration field is needed by the user, so only the fields that facilitate understanding of the data are included. Besides the set of important experiment parameters, basic statistical information regarding the data set is also included. These fields are most meaningful when dealing with variable event conditions, e.g., a variable number of tracks.
Usage and citation
If you use this data set in your research or any publication, we kindly request you to cite the following paper:
@misc{Odyurt:2023:REDVID,
author = {Odyurt, Uraz and Swatman, Stephen Nicholas and Varbanescu, Ana-Lucia and
Caron, Sascha},
title = {Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven
Physics Research},
year = {2023},
eprint = {2309.03780},
archivePrefix = {arXiv},
doi = {10.48550/arXiv.2309.03780}
}
We put significant effort into curating and providing this data set and proper citation helps acknowledge and support the continued development of this resource.
Support
Note that this data set is being shared on an “as is” basis, without any express or implied warranties or obligations of support. While we have made efforts to ensure the accuracy and completeness of the data, we cannot guarantee its fitness for any particular purpose or provide any form of ongoing support.
As the creators and sharers of this data set, we are unable to offer any dedicated support or assistance in working with or analysing the data. We do not commit to responding to inquiries, fixing issues, or providing additional documentation or guidance related to this data set. Should you encounter any challenges or have questions, we recommend referring to the existing documentation.
Roadmap
Confidential
Authors and acknowledgement
The REDVID simulation framework and the generated data sets are authored by:
- dr. ir. Uraz Odyurt - Radboud University; Nikhef
The collaborating team includes:
- dr. Sascha Caron - Radboud University; Nikhef
- dr. ir. Ana-Lucia Varbanescu - University of Twente; University of Amsterdam
- dr. Roel Aaij - Nikhef
Previous collaborating members:
- MSc Stephen Nicholas Swatman - University of Amsterdam; CERN
Licence
The data set is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, as shown above.
If you have any questions regarding the licence or usage of the data set, please contact the authors.
Note: The licence applies only to the data set itself and not to any third-party content or software that may be included with the data set. Please review any licences or terms of use associated with those components separately.
Project status
As of January 2024, the project is under active development.