REDVID Collision Event Data – Tracks and Hits

dr. ir. Uraz Odyurt

Introduction

REDuced VIrtual Detector (REDVID) is a simulation framework and a synthetic data generator written in Python. The generator simulates the propagation of subatomic particles through a virtual detector inspired by the detectors installed at the Large Hadron Collider (LHC). The simulation model is complexity-reduced and is intended for generating source data for Machine Learning (ML) algorithms.

For further information, refer to the REDVID website.

Detector geometry

A detector’s geometry consists of multiple layers of sub-detectors of different shapes, belonging to different categories. Each category defines the relevant shape of a sub-detector. However, dimensions and the placement of different layers, relative to the detector origin, will vary within each category.

Such details, as well as the particulars defining the presence or absence of sub-detector categories, are provided in one or more configuration files.

2D structure

As a result of the oversimplified design of this 2D variant, sub-detector categories have minute differences. In essence, all sub-detector layers are circles centred at the origin. Although the simulator supports filled or partially filled circle designs, the only filled sub-detector is the innermost circle, i.e., the Pixel sub-detector.

The following sub-detector categories are available for a 2D variant:

3D structure

The 3D variant is a big step towards closing the gap with the real-world detector apparatus. The available sub-detectors take the form of disks and cylinders or, generally speaking, cylinders, since a disk is a special case of the cylinder shape.

The following sub-detector categories are available for a 3D variant:

The Pixel and the Barrel categories are cylindrical, with the Pixel being a filled shape and the Barrel being only a shell without end-caps. These two categories are centred at the origin.

Both the Short-strip and the Long-strip categories consist of sub-detectors in the shape of disks. These disks are centred on the Z-axis. There are multiple such disks, which are mirrored relative to the XY-plane. In other words, each disk has an identical twin on the opposite side of the XY-plane. The shape, the size and the orientation of these pairs are the same.

All sub-detector shapes, regardless of their category, are positioned around the Z-axis, i.e., the Z-axis goes through their centre.

Data description

The generated data includes information on geometry, tracks and hits from experiments. While the information on the geometry is included, the bulk of the data set covers data for tracks and hits. There are differences in the data generated for the 2D and the 3D structures.

Tracks and hits belonging to the experiments performed on the 2D structure are given by means of function coefficients and point coordinates in the Cartesian coordinate system, respectively. Additionally, conversion to the polar coordinate system is included.

Tracks and hits belonging to the experiments performed on the 3D structure, on the other hand, are defined as parameters of line equations and point coordinates in the cylindrical coordinate system, respectively.

Folder structure

The generated folder structure, holding the data files, is as follows:

ANCHOR_PATH
|-- detector_<detector_id>
    |-- detector_<detector_type>
        |-- experiment_<detector_type>_<experiment_tag>
            |-- events_all
                |-- hits_<detector_type>_events_all.csv
                |-- hits_and_tracks_<detector_type>_events_all.csv
                |-- hits_and_tracks_polar_<detector_type>_events_all.csv
                |-- tracks_<detector_type>_events_all.csv
            |-- events_individual
                |-- event_<detector_type>_<event_id>
                    |-- hits_<detector_type>_<event_id>.csv
                    |-- tracks_<detector_type>_<event_id>.csv
                |-- ...
            |-- report
                |-- dataset_<detector_type>_report.txt
        |-- geometry_<detector_type>
            |-- geometry_<detector_type>.csv
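
The tree above can be navigated programmatically. The following is a minimal sketch, not part of REDVID itself: the helper names `events_all_file` and `event_files` are hypothetical, and the argument values shown in the usage note are made-up placeholders rather than real identifiers from a data set.

```python
from pathlib import Path


def events_all_file(anchor, detector_id, detector_type, experiment_tag, kind="hits"):
    """Build the path of one aggregated CSV file (hypothetical helper).

    kind is the file-name prefix from the tree above: "hits", "tracks",
    "hits_and_tracks" or "hits_and_tracks_polar".
    """
    return (
        Path(anchor)
        / f"detector_{detector_id}"
        / f"detector_{detector_type}"
        / f"experiment_{detector_type}_{experiment_tag}"
        / "events_all"
        / f"{kind}_{detector_type}_events_all.csv"
    )


def event_files(anchor, detector_id, detector_type, experiment_tag, event_id):
    """Build the hits and tracks paths of a single event (hypothetical helper)."""
    event_dir = (
        Path(anchor)
        / f"detector_{detector_id}"
        / f"detector_{detector_type}"
        / f"experiment_{detector_type}_{experiment_tag}"
        / "events_individual"
        / f"event_{detector_type}_{event_id}"
    )
    return (
        event_dir / f"hits_{detector_type}_{event_id}.csv",
        event_dir / f"tracks_{detector_type}_{event_id}.csv",
    )
```

For example, `events_all_file("data", "A", "2d", "x1", kind="tracks")` would point at `tracks_2d_events_all.csv` inside the `events_all` folder, assuming a data set with those placeholder values exists.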

File combinations

The generated data is saved in multiple CSV files, with the same data being replicated in three file combinations. The user can opt for any of these combinations and will end up with a complete set. These file combinations are as follows:

Conversions to polar coordinate system for the 2D structure

A post-generation step converts the original hit point coordinates from the Cartesian system into the polar coordinate system, and converts the slope of the track line into an angle given in radians.

The extra headers generated as a result of this step are appended as new columns to the most comprehensive data file, hits_and_tracks_<detector_type>_events_all.csv. The resulting extended CSV file is saved as hits_and_tracks_polar_<detector_type>_events_all.csv in the same location.
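
The conversion can be illustrated with a short sketch. This is not the generator's own code: the function name `to_polar` is hypothetical, and the angle convention (angles measured from the positive X-axis, `atan` for the line slope and `atan2` for the hit vector) is an assumption, as this document does not state which convention REDVID uses.

```python
import math


def to_polar(hit_x, hit_y, slope):
    """Return (track_theta, hit_r, hit_theta) for one 2D hit.

    Assumption: angles are measured from the positive X-axis; the exact
    convention used by the generator is not stated in this document.
    """
    track_theta = math.atan(slope)        # angle of the track line, in (-pi/2, pi/2)
    hit_r = math.hypot(hit_x, hit_y)      # distance of the hit from the origin
    hit_theta = math.atan2(hit_y, hit_x)  # angle of the hit vector, in [-pi, pi]
    return track_theta, hit_r, hit_theta
```

Under these assumptions, a hit at (3.0, 4.0) lies at a distance of 5.0 from the origin, and a slope of 1.0 corresponds to a track angle of pi/4 radians.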

Data headers for the 2D structure

Data headers, i.e., CSV column titles, apply to all CSV files. Different CSV files include different subsets of the headers, depending on the contained data. These headers are as follows:

  1. event_id - An incremental identifier for events belonging to an experiment, which is unique within the scope of the experiment.
    Type => integer

  2. sub_detector_id - An incremental identifier for different sub-detector layers belonging to a geometry, which is unique within the scope of the geometry.
    Type => integer

  3. sub_detector_type - The type of the sub-detector layer recording a hit, which can be one of three available types, pixel, short-strip, or long-strip.
    Type => string

  4. track_id - An incremental identifier for tracks belonging to an event, which is unique within the scope of the event.
    Type => integer

  5. track_type - Indicates the type of function defining the track in terms of polynomial degree. At the moment, all tracks are ‘linear’.
    Type => string

  6. coefficient_1 - The first track polynomial function coefficient. Not applicable.
    Type => float

  7. coefficient_2 - The second track polynomial function coefficient. Not applicable.
    Type => float

  8. slope - The third track polynomial function coefficient (coefficient_3), i.e., slope.
    Type => float

  9. y_intercept - The fourth track polynomial function coefficient (coefficient_4), i.e., y-intercept.
    Type => float

  10. hit_id - An incremental identifier for hits belonging to an event, which is unique within the scope of the event.
    Type => integer

  11. hit_x - The X coordinate of the hit, in the Cartesian coordinate system.
    Type => float

  12. hit_y - The Y coordinate of the hit, in the Cartesian coordinate system.
    Type => float

  13. track_theta (polar) - The slope angle of the track line, in radians.
    Type => float

  14. hit_r (polar) - The vector radius, or in other words, the distance of the hit from the origin.
    Type => float

  15. hit_theta (polar) - The angle of the hit vector, in radians. As a result of the random noise added during data generation, this value differs slightly from the track_theta value.
    Type => float
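
As an illustration of how these headers fit together, the sketch below computes how far each hit lies from its track line. It assumes the linear track has the form y = slope * x + y_intercept and that the listed columns appear together in one file; both are assumptions, the sample rows are made up, and the function name `residuals` is hypothetical.

```python
import csv
import io

# Made-up sample rows using a subset of the headers listed above.
SAMPLE = """event_id,track_id,slope,y_intercept,hit_id,hit_x,hit_y
0,0,2.0,0.0,0,1.0,2.5
0,0,2.0,0.0,1,2.0,4.0
"""


def residuals(csv_text):
    """Yield (hit_id, residual), where residual = hit_y - (slope * hit_x + y_intercept)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings, so cast them to the documented types.
        predicted_y = float(row["slope"]) * float(row["hit_x"]) + float(row["y_intercept"])
        yield int(row["hit_id"]), float(row["hit_y"]) - predicted_y
```

A non-zero residual is expected for real data, since noise is added to the hit coordinates during generation.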

The header inclusion map for the different files is as follows:

Data headers for the 3D structure

Data headers, i.e., CSV column titles, apply to all CSV files. Different CSV files include different subsets of the headers, depending on the contained data. These headers are as follows:

  1. event_id - An incremental identifier for events belonging to an experiment, which is unique within the scope of the experiment.
    Type => integer

  2. sub_detector_id - An incremental identifier for different sub-detector layers belonging to a geometry, which is unique within the scope of the geometry.
    Type => integer

  3. sub_detector_type - The type of the sub-detector layer recording a hit, which can be one of three available types, pixel, short-strip, or long-strip.
    Type => string

  4. track_id - An incremental identifier for tracks belonging to an event, which is unique within the scope of the event.
    Type => integer

  5. track_type - Indicates the type of function defining the track in terms of polynomial degree. Available types are ‘linear’, ‘helical_uniform’ and ‘helical_expanding’.
    Type => string

  6. r_0 or radial_const - The r coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for r. r_0 and radial_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
    Type => float

  7. theta_0 or azimuthal_const - The theta coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for theta. theta_0 and azimuthal_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
    Type => float

  8. z_0 or pitch_const - The z coordinate of the (r, theta, z) tuple defining the point P_0, used in a track’s parametric set of equations. The value will represent origin smearing for z. z_0 and pitch_const are applicable to ‘linear’ and ‘helical_expanding’ track types, respectively.
    Type => float

  9. r_d - The r coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. r_d is applicable to the ‘linear’ track type.
    OR,
    radial_coeff - The coefficient affecting the radius rate in the helical track. radial_coeff is applied to the free variable in the equation for r. radial_coeff is applicable to the ‘helical_expanding’ track type.
    Type => float

  10. theta_d - The theta coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. theta_d is applicable to the ‘linear’ track type.
    OR,
    azimuthal_coeff - The coefficient affecting the clockwise/counter-clockwise extrusion direction of the helical track. azimuthal_coeff is applied to the free variable in the equation for theta. azimuthal_coeff is applicable to the ‘helical_expanding’ track type.
    Type => float

  11. z_d - The z coordinate of the (r, theta, z) tuple defining the direction vector V_d, used in a track’s parametric set of equations. This value will be 1 or -1, depending on which side of the XY-plane the track is being directed to. z_d is applicable to the ‘linear’ track type.
    OR,
    pitch_coeff - The coefficient affecting the pitch rate in the helical track. pitch_coeff is applied to the free variable in the equation for z. pitch_coeff is applicable to the ‘helical_expanding’ track type.
    Type => integer

  12. hit_id - An incremental identifier for hits belonging to an event, which is unique within the scope of the event.
    Type => integer

  13. hit_r - The r coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float

  14. hit_theta - The theta coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float

  15. hit_z - The z coordinate of the (r, theta, z) tuple defining the recorded hit point on the relevant sub-detector.
    Type => float
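
Hits in the 3D structure are given in cylindrical coordinates, which many plotting and ML tools do not accept directly. The sketch below applies the standard cylindrical-to-Cartesian conversion. The function name `hit_to_cartesian` is hypothetical, and it is an assumption that hit_theta is expressed in radians and measured in the XY-plane.

```python
import math


def hit_to_cartesian(hit_r, hit_theta, hit_z):
    """Convert a cylindrical (r, theta, z) hit to Cartesian (x, y, z).

    Assumption: hit_theta is in radians, measured in the XY-plane.
    """
    x = hit_r * math.cos(hit_theta)
    y = hit_r * math.sin(hit_theta)
    return x, y, hit_z  # z is unchanged between the two systems
```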

The header inclusion map for the different files is as follows:

Report

A textual report file is generated and saved under the report folder within an experiment’s folder tree. The automatically composed report lists many of the detector-specific configuration fields directly from the information provided in the relevant configuration file(s). Not every configuration field is needed by the user, so only the fields that facilitate understanding of the data are included. Besides the set of important experiment parameters, basic statistical information regarding the data set is also included. These fields are most meaningful when dealing with variable event conditions, e.g., a variable number of tracks.

Usage and citation

If you use this data set in your research or any publication, we kindly request you to cite the following paper:

@misc{Odyurt:2023:REDVID,
  author = {Odyurt, Uraz and Swatman, Stephen Nicholas and Varbanescu, Ana-Lucia and 
    Caron, Sascha},
  title = {Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven 
    Physics Research}, 
  year = {2023},
  eprint = {2309.03780},
  archivePrefix = {arXiv},
  doi = {10.48550/arXiv.2309.03780}
}

We put significant effort into curating and providing this data set, and proper citation helps acknowledge and support the continued development of this resource.

Support

Note that this data set is being shared on an “as is” basis, without any express or implied warranties or obligations of support. While we have made efforts to ensure the accuracy and completeness of the data, we cannot guarantee its fitness for any particular purpose or provide any form of ongoing support.

As the creators and sharers of this data set, we are unable to offer any dedicated support or assistance in working with or analysing the data. We do not commit to responding to inquiries, fixing issues, or providing additional documentation or guidance related to this data set. Should you encounter any challenges or have questions, we recommend referring to the existing documentation.

Roadmap

Confidential

Authors and acknowledgement

The REDVID simulation framework and the generated data sets are authored by:

The collaborating team includes:

Previous collaborating members:

Licence

The data set is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, as shown above.

If you have any questions regarding the licence or usage of the data set, please contact the authors.

Note: The licence applies only to the data set itself and not to any third-party content or software that may be included with the data set. Please review any licences or terms of use associated with those components separately.

Project status

As of January 2024, the project is under active development.