Break Kernels based on shell-type and using CUDA Streams #8

Open
Ali-Tehrani opened this issue Nov 4, 2024 · 0 comments
Comments

Ali-Tehrani commented Nov 4, 2024

To greatly increase performance with minimal coding changes, it may be worthwhile to break up the computation of the kernels (e.g., computing atomic orbitals, derivatives of atomic orbitals, second derivatives, etc.) based on the shell type. Currently, all functions are limited by the maximum of 255 registers per thread, which reduces the number of active threads; based on profiling, I have observed that at most 8 warps run concurrently. Additionally, breaking up the kernels can reduce branch divergence and allow better compiler optimizations.

If the kernels are broken up into compute_atomic_orbitals<S>, compute_atomic_orbitals<P>, etc., then the S-type specialization can use fewer registers and more threads can be active at a time. Further, using CUDA Streams would let the different shell-type kernels run at the same time. This approach could also eliminate the if-statements by utilizing template specialization:

#define STYPE 0
#define PTYPE 1
// ... one constant per shell type

template<int ShellType>
__global__ void compute_atomic_orbitals(/* ... */) {
    if constexpr (ShellType == STYPE) {
        // compute s-type orbitals; the other branches are compiled out
    } else if constexpr (ShellType == PTYPE) {
        // compute p-type orbitals
    }
}
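
As a rough host-side sketch of the streams part (assuming compute_atomic_orbitals is launched as a __global__ kernel; the launch configuration and argument lists below are placeholders, not the existing signatures), each specialization could get its own stream so the lighter S-type work overlaps with the heavier shell types:

// Hypothetical launch code (sketch only): one CUDA stream per shell type.
cudaStream_t streams[2];
for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

dim3 block(128);                                     // placeholder block size
dim3 grid_s((n_s_points + block.x - 1) / block.x);   // n_s_points: placeholder count
dim3 grid_p((n_p_points + block.x - 1) / block.x);   // n_p_points: placeholder count

compute_atomic_orbitals<STYPE><<<grid_s, block, 0, streams[0]>>>(/* s-shell args */);
compute_atomic_orbitals<PTYPE><<<grid_p, block, 0, streams[1]>>>(/* p-shell args */);

for (int i = 0; i < 2; ++i) {
    cudaStreamSynchronize(streams[i]);
    cudaStreamDestroy(streams[i]);
}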

To implement this for evaluate_scalar_quantity, you'll either need to change it so that it takes an array of function pointers (one entry per shell type: the S-type function, the P-type function, etc.), or, simpler to write but harder to understand, use templates. The CUDA streams would be added here as well; a rough sketch of the function-pointer route follows.
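
Purely as an illustration of the function-pointer variant (the signature of evaluate_scalar_quantity and of the kernels below is a placeholder, not what is currently in the repository), all specializations could share one signature, be collected in a host-side array indexed by shell type, and be launched on separate streams:

// Hypothetical sketch: kernel pointers indexed by shell type, one stream each.
using OrbitalKernel = void (*)(const double*, double*, int);   // placeholder signature

OrbitalKernel kernels[] = {
    compute_atomic_orbitals<STYPE>,
    compute_atomic_orbitals<PTYPE>,
    // ... remaining shell types
};

void evaluate_scalar_quantity(const double* d_in, double* d_out, int n_points) {
    constexpr int n_types = sizeof(kernels) / sizeof(kernels[0]);
    cudaStream_t streams[n_types];
    dim3 block(128), grid((n_points + block.x - 1) / block.x);  // placeholder config

    for (int t = 0; t < n_types; ++t) {
        cudaStreamCreate(&streams[t]);
        kernels[t]<<<grid, block, 0, streams[t]>>>(d_in, d_out, n_points);
    }
    for (int t = 0; t < n_types; ++t) {
        cudaStreamSynchronize(streams[t]);
        cudaStreamDestroy(streams[t]);
    }
}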
