Regarding the kernels, the quickest way would be to wrap all kernel launches in C functions that can be called from Fortran, with a void** argument type for every array.
Memory is trickier:
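To make the wrapper idea concrete, here is a minimal sketch of what one such C-callable entry point could look like. The names `scale_array_wrapper` and `launch_scale_kernel` are hypothetical, and the "kernel" is a plain CPU loop standing in for a real CUDA launch so that the example stays self-contained. It assumes the Fortran side declares the interface with `bind(C)` and passes a `type(c_ptr)` by reference, which is what arrives as `void**` in C.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tuned GPU kernel launch. In the real code this
// would be a CUDA launch (kernel<<<grid, block>>>(...)) acting on device memory.
static void launch_scale_kernel(double* data, std::size_t n, double factor)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

// C-callable wrapper. Fortran passes arguments by reference by default, so
// scalars arrive as pointers and a type(c_ptr) array handle arrives as void**.
extern "C" void scale_array_wrapper(void** array, const int* n, const double* factor)
{
    assert(array != nullptr && *array != nullptr);
    assert(n != nullptr && *n >= 0);
    assert(factor != nullptr);

    // Recover the typed pointer from the opaque handle before launching.
    double* data = static_cast<double*>(*array);
    launch_scale_kernel(data, static_cast<std::size_t>(*n), *factor);
}
```

Keeping every array argument as an opaque `void**` means the Fortran interface blocks do not depend on whether the pointer refers to host or device memory; that decision stays inside the C layer.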
The F90 code runs on the CPU. Should we transfer the input arrays to the GPU ourselves, or assume they are already there? Note that the OpenACC directives transfer the data every time step.
Intermediate Fortran allocations could either be replaced with allocations from our GPU memory pool via the C layer (which yields the fastest code), or we could allocate and transfer after each Fortran allocation, as the OpenACC code currently does.
The same applies to deallocations.
For maintainability, it would be useful to set up a 'mirroring' administration that connects each CPU array with its GPU equivalent.
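As a rough illustration of the 'mirroring' administration, the sketch below keeps a table from host array addresses to their GPU counterparts behind a small C interface. All names (`mirror_acquire`, `mirror_lookup`, `mirror_release`, `pool_allocate`, `pool_free`) are hypothetical, and the pool functions are host `malloc`/`free` stand-ins for the real GPU memory pool, so nothing here touches an actual device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical stand-ins for the GPU memory pool; the real code would hand
// out device memory here instead of host memory.
static void* pool_allocate(std::size_t bytes) { return std::malloc(bytes); }
static void pool_free(void* ptr) { std::free(ptr); }

struct Mirror_entry
{
    void* device_ptr;
    std::size_t bytes;
};

// Table connecting each registered CPU array to its GPU equivalent.
static std::unordered_map<const void*, Mirror_entry>& mirror_table()
{
    static std::unordered_map<const void*, Mirror_entry> table;
    return table;
}

// Return the mirror of host_ptr, allocating it from the pool on first use.
extern "C" void* mirror_acquire(const void* host_ptr, std::size_t bytes)
{
    assert(host_ptr != nullptr);
    auto& table = mirror_table();
    auto it = table.find(host_ptr);
    if (it != table.end())
    {
        // This sketch assumes a host array keeps its size while registered.
        assert(it->second.bytes == bytes);
        return it->second.device_ptr;
    }
    void* device_ptr = pool_allocate(bytes);
    table.emplace(host_ptr, Mirror_entry{device_ptr, bytes});
    return device_ptr;
}

// Return the mirror of host_ptr, or a null pointer if none is registered.
extern "C" void* mirror_lookup(const void* host_ptr)
{
    auto& table = mirror_table();
    auto it = table.find(host_ptr);
    return it == table.end() ? nullptr : it->second.device_ptr;
}

// Free the mirror of host_ptr; returns 1 if one existed and 0 otherwise.
extern "C" int mirror_release(const void* host_ptr)
{
    auto& table = mirror_table();
    auto it = table.find(host_ptr);
    if (it == table.end())
        return 0;
    pool_free(it->second.device_ptr);
    table.erase(it);
    return 1;
}
```

In this arrangement the Fortran code would call `mirror_acquire` right after each `allocate` and `mirror_release` right before each `deallocate`. Whether data is also copied at those points is the open question raised above and is deliberately left out of the sketch.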
The memory solution above does not take into account automatic allocations inside subroutines in the Fortran code. If we want to replace those with our pool allocations, more changes need to be made to the Fortran code.
In this issue we can discuss a strategy to insert our tuned kernels into the rte-rrtmgp original source code.