pRNGs for host/device

Andrei Gheata edited this page Nov 18, 2020 · 5 revisions

This page describes the general features and requirements that contribute to a "quality score" for future pRNG implementations usable in simulation across host and accelerators. The content can also be used to create a "master" issue for specific pRNG implementation tasks.

Period and statistics quality

As a general remark, simulation on GPU will exploit track-level parallelism and is expected to deal with O(10^6 - 10^7) tracks concurrently. Each track undergoes stochastic physics processes, which are sampled using sequences of random numbers (variates). A general requirement is that these sequences do not overlap across the batch of events produced by a given simulation. In addition, the sequences must have good statistical quality, i.e. de-correlation properties demonstrated by passing stringent test batteries such as DIEHARD or BigCrush.

  • Both the pRNG period and statistical quality are essential criteria for selecting a good candidate for GPU porting (to be quantified)

Repeatability

An important feature is sequence repeatability on the same hardware configuration, i.e. the ability to reproduce the same sequence when starting from the same seed. In addition, the heterogeneous use case requires that the same sequence is repeatable on both host and device. This is a prerequisite for reproducibility of results per algorithm across host/device.

  • Repeatability host/device implies that the implementations have to be numerically compatible (same or compatible source code) and produce identical sequences when starting from an arbitrary seed.
  • In a scenario where simulation of a track starts on the host and continues on the device (or vice versa), moving a pRNG state between host<->device must allow the same sequence to continue exactly as if the state had not been moved.
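
The second requirement can be illustrated with a minimal host-only sketch. Everything in it is an assumption made for illustration: `RngState`, `next_u64`, `draw_in_place` and `draw_with_move` are hypothetical names, SplitMix64 is used only as a stand-in generator (it is not a candidate proposed on this page), and `std::memcpy` stands in for a real host<->device transfer. The point is that a trivially copyable state advanced with integer-only arithmetic continues the same sequence after a byte-wise copy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical state type: one 64-bit word (SplitMix64 is only a stand-in).
struct RngState {
  std::uint64_t s;
};
static_assert(std::is_trivially_copyable<RngState>::value,
              "the state must be byte-copyable between host and device");

// Advance the state and return the next 64-bit variate.
// Only integer operations are used, so the result does not depend on
// floating-point settings of the platform.
inline std::uint64_t next_u64(RngState &st) {
  st.s += 0x9E3779B97F4A7C15ULL;
  std::uint64_t z = st.s;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Draw n variates without ever moving the state.
inline std::vector<std::uint64_t> draw_in_place(std::uint64_t seed, int n) {
  RngState st{seed};
  std::vector<std::uint64_t> out;
  for (int i = 0; i < n; ++i) out.push_back(next_u64(st));
  return out;
}

// Draw n variates, byte-copying the state to a second location after the
// first k draws. The memcpy stands in for a host<->device transfer.
inline std::vector<std::uint64_t> draw_with_move(std::uint64_t seed, int n, int k) {
  assert(k >= 0 && k <= n);
  RngState here{seed};
  std::vector<std::uint64_t> out;
  for (int i = 0; i < k; ++i) out.push_back(next_u64(here));
  RngState there;
  std::memcpy(&there, &here, sizeof(RngState));
  for (int i = k; i < n; ++i) out.push_back(next_u64(there));
  return out;
}
```

Comparing `draw_in_place(seed, n)` with `draw_with_move(seed, n, k)` for any `0 <= k <= n` is the kind of check a host/device repeatability test would perform, with one half of the draws executed on each side.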

State size

pRNGs use an internal state S_n to produce a single variate, after which the state is advanced to S_{n+1}. In a scenario aiming for reproducible simulation, each track has to store such a state in order to produce an independent and reproducible stream of variates. The state size therefore becomes very important when handling millions of tracks concurrently on the GPU.

  • For similar pRNG performance, the version with the smaller state size is to be favoured. Generators whose state exceeds (100?) bytes are prohibitive for the GPU use case.
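
To make the order of magnitude concrete, here is a small arithmetic sketch. The helper names `state_memory_bytes` and `max_tracks_for_budget` are hypothetical, the 100-byte figure is the tentative limit quoted above, and 2496 bytes corresponds to the 624 32-bit words of an MT19937 state, used here only as a well-known example of a large state.

```cpp
#include <cassert>
#include <cstdint>

// Total memory needed to keep one pRNG state per concurrently handled track.
constexpr std::uint64_t state_memory_bytes(std::uint64_t state_bytes,
                                           std::uint64_t n_tracks) {
  return state_bytes * n_tracks;
}

// Number of tracks that fit in a given memory budget for a given state size.
inline std::uint64_t max_tracks_for_budget(std::uint64_t budget_bytes,
                                           std::uint64_t state_bytes) {
  assert(state_bytes > 0);
  return budget_bytes / state_bytes;
}

// 10^7 tracks with a 100-byte state already occupy 10^9 bytes (1 GB).
static_assert(state_memory_bytes(100, 10000000ULL) == 1000000000ULL,
              "100 B x 10^7 tracks = 1 GB");

// The same number of tracks with a 2496-byte state would need about 25 GB.
static_assert(state_memory_bytes(2496, 10000000ULL) == 24960000000ULL,
              "2496 B x 10^7 tracks = 24.96 GB");
```

By the same arithmetic, an 8-byte state for 10^7 tracks needs only 80 MB, which shows why small-state generators are favoured when performance is comparable.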

Initialization time and uniform variate generation time

In general, a pRNG state needs to be initialised from a given seed, except when the state is the seed itself. The seeding time is important for GPU simulation because it is paid once per primary track. It matters even more if the generator does not support the skip-ahead feature (see below), in which case it is also paid once per generated secondary track.

  • The generator initialisation time should not be prohibitive (>?), especially for generators not supporting skip-ahead

More importantly, the time to generate a uniformly distributed variate by advancing the current state is paid at every step of every track, multiplied by the number of possible physics processes in the current region.

  • The generation time should be benchmarked against standard generators (e.g. cuRAND). This is an essential criterion for algorithms to be considered.
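
A host-side timing harness for the per-variate generation cost might look like the sketch below. It is an assumption-laden illustration: `ns_per_variate` is a hypothetical helper, the standard-library engines `std::mt19937_64` and `std::minstd_rand` mentioned afterwards serve only as host baselines, and a device benchmark against cuRAND would need an equivalent kernel-based measurement that is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Measure the average wall-clock time, in nanoseconds, to produce one variate.
// The variates are summed into `checksum` so that the compiler cannot discard
// the loop; the checksum also allows a repeatability check between runs.
template <class Gen>
double ns_per_variate(Gen &gen, std::uint64_t n, std::uint64_t &checksum) {
  assert(n > 0);
  std::uint64_t acc = 0;
  const auto t0 = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < n; ++i) {
    acc += static_cast<std::uint64_t>(gen());
  }
  const auto t1 = std::chrono::steady_clock::now();
  checksum = acc;
  return std::chrono::duration<double, std::nano>(t1 - t0).count() /
         static_cast<double>(n);
}
```

For example, calling `ns_per_variate` on a `std::mt19937_64` and on a `std::minstd_rand`, each seeded identically across runs, gives two numbers to compare, and equal checksums across runs confirm that the timed sequences were the same. Absolute timings depend on compiler flags and hardware, so only ratios measured under the same conditions are meaningful.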

Skip ahead

To obtain reproducible sequences for the particles spawned by a single primary, the most reliable method is to use consecutive non-overlapping streams of the same sequence, assigning the next stream to the next generated secondary track. This is called "skip-ahead" and works as follows: the primary track uses the stream of states starting at S_0. The first generated secondary skips ahead by a fixed (very large) number of states N, while its siblings skip ahead by 2N, 3N, .... One needs to keep track of the total number of skipped streams down to the current generation of secondaries.

  • The time to skip ahead is very important, since it is multiplied by the total number of particles produced in the simulation
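
As an illustration of why the skip-ahead cost depends on the generator, the sketch below implements the scheme for a 64-bit linear congruential generator, where jumping n states costs O(log n) multiplications instead of n steps. This is only an assumed example: the LCG (with the multiplier and increment of Knuth's MMIX generator) is chosen because its jump is easy to write down, not because it meets the quality criteria above, and `step`, `skip_ahead` and `stream_start` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative 64-bit LCG: S_{n+1} = A * S_n + C (mod 2^64).
constexpr std::uint64_t kA = 6364136223846793005ULL;
constexpr std::uint64_t kC = 1442695040888963407ULL;

// Advance the state by exactly one position.
inline std::uint64_t step(std::uint64_t s) { return kA * s + kC; }

// Jump n positions in O(log n) by repeatedly squaring the affine map
// s -> a*s + c and composing the maps selected by the bits of n.
inline std::uint64_t skip_ahead(std::uint64_t s, std::uint64_t n) {
  std::uint64_t a_acc = 1, c_acc = 0;  // accumulated map, starts as identity
  std::uint64_t a = kA, c = kC;        // map for 2^k steps, starts at k = 0
  while (n > 0) {
    if (n & 1) {
      // Compose: apply the current 2^k-step map after the accumulated map.
      a_acc = a * a_acc;
      c_acc = a * c_acc + c;
    }
    // Square the map: a*(a*s + c) + c = a^2 * s + (a + 1) * c.
    c = (a + 1) * c;
    a = a * a;
    n >>= 1;
  }
  return a_acc * s + c_acc;
}

// Starting state of the k-th stream when streams are `stride` states apart:
// k = 0 is the primary, k = 1 the first secondary, and so on.
inline std::uint64_t stream_start(std::uint64_t s0, std::uint64_t k,
                                  std::uint64_t stride) {
  assert(stride > 0);
  return skip_ahead(s0, k * stride);
}
```

With a hypothetical stride of N = 2^40, `stream_start(s0, 1, N)`, `stream_start(s0, 2, N)`, ... give the initial states of successive secondaries. Generators without such a closed-form jump have to pay either n single steps or a fresh seeding per secondary, which is what makes this criterion discriminating.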

Resources

  • VecMath repository
  • Vectorization of random number generation and reproducibility of concurrent particle transport simulation - ACAT 2019 paper

... to be continued, corrected ...