Remove hard limit on events #1233
From @amandalund :
My follow up:
@amandalund Why doesn't partitioning by charge help? 😕 (I don't think it helps GPU+G4 since we don't yet pass that option through, right?) And is this after #1405? So these are just the regression problems with and without "merge event"?
Right, toggling It helps a bit in some cases, but more often seems to hurt... I'm not really sure why that is yet.
EDIT: Same plot as above but with 4x the number of track slots:
Ok @sethrj good news: I played around with different numbers of track slots, and increasing this number does improve performance. I added plots above using 4x as many track slots. With more slots the partitioning does help, and it continues to help relatively more the larger the state size gets (though the overall performance gets worse past a certain point), so it could be that with a larger state we get less mixing for longer.
@amandalund To maintain the memory limits (especially with CMS and on V100) I have the runner script reduce the number of track slots when
EDIT: and on an unrelated note, do you mind setting
2^18 track slots and 1300 primaries per thread (I know for celer-sim we do that division in the
Oh ok that's great, I was afraid that was with the 2^20 number.
@amandalund I think we can make a "multiple simultaneous event ring buffer" work by trading off storage space for kernel launches. Suppose our event IDs are just the index into "simultaneous events". Then we could put a loop around the primary/secondary construction kernels, which are super fast.
Using a loop instead of increasing resources makes the book-keeping trivial and reduces the resource requirements for initialization. We can also "prioritize" which events are completed first by reordering the event IDs before the loop so that the higher-priority initializers are created last and put onto the top of the initializer stack.
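As a rough host-side illustration of the loop-plus-reordering idea above, here is a minimal C++ sketch. It is not Celeritas code: `Initializer`, `construct_initializers`, and `build_stack` are hypothetical names, the "kernel" is a plain function, and the priority values are an assumed input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a track initializer: it only records its slot.
struct Initializer
{
    std::size_t event_slot;
};

// Hypothetical stand-in for the primary/secondary construction kernels:
// pushes `count` initializers for one event slot onto the top of the stack.
inline void construct_initializers(std::vector<Initializer>& stack,
                                   std::size_t event_slot,
                                   std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        stack.push_back(Initializer{event_slot});
    }
}

// Loop over event slots (one cheap launch per slot) instead of sizing the
// buffers for every event at once. A larger `priority[slot]` means "finish
// sooner": those slots are constructed last, so they end up on top.
inline std::vector<Initializer>
build_stack(std::vector<std::size_t> const& pending,
            std::vector<int> const& priority)
{
    assert(pending.size() == priority.size());

    // Reorder the event slot indices by ascending priority before the loop
    std::vector<std::size_t> order(pending.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(),
                     order.end(),
                     [&priority](std::size_t a, std::size_t b) {
                         return priority[a] < priority[b];
                     });

    std::vector<Initializer> stack;
    for (std::size_t slot : order)
    {
        construct_initializers(stack, slot, pending[slot]);
    }
    return stack;
}
```

Because tracks are popped from the back of the stack, the slot with the highest priority is the first one whose initializers get consumed.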
Problems:
- num_events * (num_threads - 1) / num_threads zeros for its track counter.
- max_event raises an assertion.

We could simplify reproducibility by requiring all tracks in flight to be from the same event (i.e., the usual way we integrate into Geant4). We could store the single event ID on the "core state" object since we don't need access to it on GPU. (Or, as of #1447, we just use the "unique event ID" to reset the RNG reproducibly.)
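To make the single-event option concrete, here is a minimal sketch of keeping the event ID on a host-side state object and reseeding from it at the start of each event. It uses the standard library's `std::mt19937_64` and `std::seed_seq` purely as stand-ins; `CoreStateSketch` and `begin_event` are hypothetical names, and this is not how the Celeritas RNG is actually reset.

```cpp
#include <cstdint>
#include <random>

// Hypothetical host-side state: one event in flight, so a single ID suffices
// and never needs to be copied to the GPU.
struct CoreStateSketch
{
    std::uint64_t unique_event_id = 0;
    std::mt19937_64 rng;  // stand-in engine, not the Celeritas RNG
};

// Reset the engine from (global seed, unique event ID) so that replaying the
// same event gives the same random stream no matter what ran before it.
inline void begin_event(CoreStateSketch& state,
                        std::uint32_t global_seed,
                        std::uint64_t unique_event_id)
{
    state.unique_event_id = unique_event_id;
    std::seed_seq seq{global_seed,
                      static_cast<std::uint32_t>(unique_event_id),
                      static_cast<std::uint32_t>(unique_event_id >> 32)};
    state.rng.seed(seq);
}
```

Calling `begin_event` again with the same seed and event ID restores the same stream, which is the property the reproducibility argument relies on.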
If we want to run multiple events simultaneously, the "event IDs" should be more like event slots so we can have up to N events in flight simultaneously, and once that event finishes we send an end-of-event "action" and let the event slot manage a new unique event. This methodology could let us have a single CPU thread + GPU simultaneously handle multiple Geant4 workers.
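The event-slot bookkeeping could look roughly like the following host-side sketch. Everything here is an assumption made for illustration: `EventSlots`, `begin_event`, `finish_track`, and `no_slot` are hypothetical names, and the "end-of-event action" is reduced to a boolean return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when all N slots are occupied (hypothetical convention)
constexpr std::size_t no_slot = static_cast<std::size_t>(-1);

// Up to N events in flight: each slot holds one unique event until its last
// track finishes, and is then free to take a new unique event.
class EventSlots
{
  public:
    explicit EventSlots(std::size_t num_slots) : slots_(num_slots) {}

    // Bind a unique event to the first free slot; returns no_slot if full.
    std::size_t begin_event(std::uint64_t unique_id, std::size_t num_tracks)
    {
        assert(num_tracks > 0);
        for (std::size_t i = 0; i < slots_.size(); ++i)
        {
            if (!slots_[i].busy)
            {
                slots_[i].busy = true;
                slots_[i].unique_id = unique_id;
                slots_[i].active_tracks = num_tracks;
                return i;
            }
        }
        return no_slot;
    }

    // Mark one track in the slot as done. Returns true when it was the last
    // one, which is where an end-of-event action would be sent.
    bool finish_track(std::size_t slot)
    {
        assert(slot < slots_.size() && slots_[slot].busy);
        Slot& s = slots_[slot];
        assert(s.active_tracks > 0);
        if (--s.active_tracks > 0)
        {
            return false;
        }
        s.busy = false;
        return true;
    }

    // Unique event ID most recently bound to the slot
    std::uint64_t unique_id(std::size_t slot) const
    {
        assert(slot < slots_.size());
        return slots_[slot].unique_id;
    }

  private:
    struct Slot
    {
        bool busy = false;
        std::uint64_t unique_id = 0;
        std::size_t active_tracks = 0;
    };
    std::vector<Slot> slots_;
};
```

A real version would also need to count secondaries as they are created; the sketch only tracks a fixed number of tracks per event to keep the slot lifecycle visible.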