CUDA Kernel Parameters #11

Open
Ali-Tehrani opened this issue Nov 17, 2024 · 0 comments

Comments

@Ali-Tehrani
Collaborator

CUDA 11.7 introduced the `__grid_constant__` annotation for `const` kernel parameters. It keeps such a parameter in constant (kernel-parameter) memory even when the kernel takes its address, instead of having the compiler copy it into thread-local memory. This makes it much easier to pass the basis set as a structured kernel argument, similar to GBasis, rather than using the approach taken in cuGBasis. However, threads read constant memory through a cache, one cache line at a time (see Section 3.4 in "Dissecting Turing T4 GPU"), so the approach cuGBasis takes still reaches near-optimal performance because it exploits spatial locality. Based on Figure 3.9, this locality improves latency by roughly 100-200 clock cycles.
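A minimal sketch of the idea in CUDA C++, written so the host-side part also compiles as plain C++: a small basis-set description is bundled into a struct and passed to the kernel by value as a `const __grid_constant__` parameter. `BasisParams`, `kMaxPrims`, and `eval_density` are hypothetical names, not taken from GBasis or cuGBasis, and the kernel only evaluates one contracted s-type Gaussian at precomputed squared distances `r2`. The kernel is guarded by `__CUDACC__`, so only `nvcc` compiles it.

```cpp
// Hypothetical basis-set bundle; the field names and kMaxPrims are illustrative
// and not taken from GBasis or cuGBasis.
constexpr int kMaxPrims = 64;

struct BasisParams {
    int    n_prims;               // number of primitives actually used (<= kMaxPrims)
    double exponents[kMaxPrims];  // Gaussian exponents
    double coeffs[kMaxPrims];     // contraction coefficients
};

// Passed by value, the whole struct must fit in the kernel-parameter space
// (4096 bytes before CUDA 12.1).
static_assert(sizeof(BasisParams) <= 4096, "BasisParams exceeds the 4 KB parameter limit");

#if defined(__CUDACC__)
// Requires CUDA 11.7+ and compute capability 7.0+. With __grid_constant__, taking
// the address of `basis` does not make the compiler copy it into local memory.
__global__ void eval_density(const __grid_constant__ BasisParams basis,
                             const double* r2, double* out, int n_points) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points) return;
    const BasisParams* b = &basis;  // no thread-local copy is made
    double sum = 0.0;
    for (int k = 0; k < b->n_prims; ++k) {
        sum += b->coeffs[k] * exp(-b->exponents[k] * r2[i]);  // contracted s-type Gaussian
    }
    out[i] = sum;
}
#endif
```

A launch would then look like `eval_density<<<blocks, threads>>>(basis, d_r2, d_out, n_points)`, with the host filling `basis` directly instead of copying it to a separate `__constant__` symbol first.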

However, before CUDA 12.1, kernel parameters, which reside in the GPU's constant memory, were limited to 4096 bytes, i.e. only 512 double-precision numbers.

Starting with CUDA 12.1 (see here), on Volta and newer architectures, kernel parameters can be up to 32,764 bytes (4,095 double-precision numbers). This is still less than the 64 KB of constant memory, but it would make it easier to write kernels that pass other kinds of parameters this way.
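The byte-to-double conversions above can be checked at compile time. A small C++ sketch, assuming an 8-byte IEEE-754 `double`; the constant and function names are illustrative:

```cpp
#include <cstddef>

// Size limits quoted above, in bytes.
constexpr std::size_t kLegacyParamBytes = 4096;       // kernel parameters before CUDA 12.1
constexpr std::size_t kLargeParamBytes  = 32764;      // kernel parameters, CUDA 12.1+ on Volta and newer
constexpr std::size_t kConstantBytes    = 64 * 1024;  // total __constant__ memory

// Whole double-precision values that fit in a byte budget; integer division
// discards a trailing partial value (32,764 / 8 = 4095.5).
constexpr std::size_t doubles_that_fit(std::size_t bytes) {
    return bytes / sizeof(double);
}

static_assert(doubles_that_fit(kLegacyParamBytes) == 512, "4096 bytes hold 512 doubles");
static_assert(doubles_that_fit(kLargeParamBytes) == 4095, "32,764 bytes hold 4095 doubles");
static_assert(doubles_that_fit(kConstantBytes) == 8192, "64 KB holds 8192 doubles");
```

The last line makes the gap explicit: even with the larger limit, the parameter block holds about half as many doubles as the full 64 KB of constant memory.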

This would be beneficial for storing the promolecular coefficients and exponents as kernel parameters, leaving constant memory for the atomic coordinates, which makes it significantly easier to support more atoms. For example, each atom has roughly 20-27 coefficients, each with a matching exponent, for both the s-type and p-type functions, for a total of 4 × (20-27) = 80-108 numbers per atom.
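To make this budget concrete, here is a hypothetical C++ layout for one atom type's promolecular parameters. It assumes one coefficient and one exponent per s-type and p-type Gaussian, that atoms of the same element share parameters, and that every atom type is padded to the upper end of the 20-27 range. `PromolElement`, `PromolTable`, and the helper functions are illustrative names, not part of any existing code. Under these assumptions, the 32,764-byte parameter block holds 51 atom types at 20 Gaussians per type but only 37 at 27, while 64 KB of constant memory could hold the (x, y, z) coordinates of up to 2,730 atoms if nothing else were stored there.

```cpp
#include <cstddef>

// Hypothetical padded layout: every atom type stores kMaxPrims s-type and
// kMaxPrims p-type Gaussians, each as a (coefficient, exponent) pair.
constexpr int kMaxPrims = 27;  // upper end of the 20-27 range

struct PromolElement {
    double s_coeffs[kMaxPrims];
    double s_exps[kMaxPrims];
    double p_coeffs[kMaxPrims];
    double p_exps[kMaxPrims];
};

constexpr std::size_t kParamBytes    = 32764;      // kernel-parameter limit, CUDA 12.1+
constexpr std::size_t kConstantBytes = 64 * 1024;  // total __constant__ memory

// Numbers stored per atom type: coefficient + exponent for each s and p Gaussian.
constexpr std::size_t numbers_per_element(std::size_t n_prims) { return 4 * n_prims; }

// Atom types whose parameters fit in one kernel-parameter block (unpadded sizes).
constexpr std::size_t elements_in_params(std::size_t n_prims) {
    return kParamBytes / (numbers_per_element(n_prims) * sizeof(double));
}

// Upper bound on atoms whose (x, y, z) coordinates fit in __constant__ memory,
// ignoring anything else that may live there.
constexpr std::size_t atoms_in_constant() {
    return kConstantBytes / (3 * sizeof(double));
}

// A table of padded atom types that still fits in the 32,764-byte parameter block.
constexpr int kMaxElements = static_cast<int>(elements_in_params(kMaxPrims));
struct PromolTable {
    PromolElement elements[kMaxElements];
};
static_assert(sizeof(PromolTable) <= kParamBytes, "PromolTable exceeds kernel-parameter limit");
```

In a CUDA kernel, `PromolTable` could then be taken as a `const __grid_constant__` parameter while the coordinates live in a `__constant__` array; whether that outperforms the cuGBasis layout in practice would need to be measured.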
