CUDA Kernel Parameters #11

Ali-Tehrani · 2024-11-17T15:26:26Z

CUDA 11.7 introduced __grid_constant__, which lets a const kernel parameter be read directly from the kernel-parameter space in constant memory instead of from a per-thread copy. This makes it much easier to pass the basis set as a kernel argument, similar to GBasis, rather than following the approach taken in cuGBasis. However, threads read constant memory through a cache, one cache line at a time (see Section 3.4 in "Dissecting Turing T4 GPU"), so the cuGBasis approach still reaches close to optimal performance because it exploits spatial locality. Based on Figure 3.9, there is a latency improvement of 100-200 clock cycles.
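As a rough illustration of what passing a small basis set as a kernel argument could look like, here is a minimal sketch. The `Shell` struct, the `contract` helper and the `eval_shell` kernel are hypothetical names made up for this example and are not taken from GBasis or cuGBasis; it assumes CUDA 11.7 or newer and a device of compute capability 7.0 or higher.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical container for one contracted shell; the size 8 is illustrative.
struct Shell {
    int    n;         // number of primitives actually used
    double coeff[8];  // contraction coefficients
    double expon[8];  // Gaussian exponents
};

// Sums c[k] * exp(-a[k] * r2) over the first n primitives.
// It takes pointers, so the kernel below has to form addresses into its parameter.
__host__ __device__ double contract(const double* c, const double* a, int n, double r2)
{
    double v = 0.0;
    for (int k = 0; k < n; ++k) v += c[k] * exp(-a[k] * r2);
    return v;
}

// __grid_constant__ requires a const-qualified, non-reference parameter.
// Compile with e.g.: nvcc -arch=sm_70 example.cu
__global__ void eval_shell(const __grid_constant__ Shell shell,
                           const double* r2, double* out, int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_points)
        out[i] = contract(shell.coeff, shell.expon, shell.n, r2[i]);
}

int main()
{
    Shell shell{};
    shell.n = 2;
    shell.coeff[0] = 1.0;  shell.expon[0] = 0.5;
    shell.coeff[1] = 0.25; shell.expon[1] = 2.0;

    const int n_points = 4;
    double h_r2[n_points]  = {0.0, 0.5, 1.0, 2.0};
    double h_out[n_points] = {};

    double *d_r2 = nullptr, *d_out = nullptr;
    cudaMalloc(&d_r2, sizeof(h_r2));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_r2, h_r2, sizeof(h_r2), cudaMemcpyHostToDevice);

    eval_shell<<<1, 32>>>(shell, d_r2, d_out, n_points);  // struct is passed by value
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Compare each device value with the same sum evaluated on the host.
    for (int i = 0; i < n_points; ++i)
        std::printf("r2 = %.2f  device = %.12f  host = %.12f\n", h_r2[i], h_out[i],
                    contract(shell.coeff, shell.expon, shell.n, h_r2[i]));

    cudaFree(d_r2);
    cudaFree(d_out);
    return 0;
}
```

Every thread in a warp reads the same `shell.coeff[k]` at the same time, which is the uniform access pattern the constant cache is designed for.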

However, the parameters of a CUDA kernel function, which reside in the constant memory of the GPU, were limited to only 4096 bytes, which means they could store only 512 double-precision numbers.

Starting with CUDA 12.1 (see here), and on the Volta architecture and higher, kernel functions can have up to 32,764 bytes of parameters (4095 double-precision numbers). This is still less than the 64 kilobytes of constant memory, but it would make coding and storing other kinds of parameters easier.
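The byte counts above can be checked at compile time. This is a small host-only sketch; the constant and function names are made up for illustration.

```cuda
#include <cstddef>
#include <cstdio>

// Limits quoted in the text above.
constexpr std::size_t kOldParamLimitBytes = 4096;       // kernel parameters before CUDA 12.1
constexpr std::size_t kNewParamLimitBytes = 32764;      // CUDA 12.1+, Volta and newer
constexpr std::size_t kConstantMemBytes   = 64 * 1024;  // constant memory

// Number of whole double-precision values that fit in `bytes`.
constexpr std::size_t doubles_that_fit(std::size_t bytes)
{
    return bytes / sizeof(double);
}

static_assert(sizeof(double) == 8, "the arithmetic assumes 8-byte doubles");
static_assert(doubles_that_fit(kOldParamLimitBytes) == 512, "4096 / 8");
static_assert(doubles_that_fit(kNewParamLimitBytes) == 4095, "32764 / 8, with 4 bytes left over");
static_assert(doubles_that_fit(kConstantMemBytes) == 8192, "65536 / 8");

int main()
{
    std::printf("old parameter limit: %zu doubles\n", doubles_that_fit(kOldParamLimitBytes));
    std::printf("new parameter limit: %zu doubles\n", doubles_that_fit(kNewParamLimitBytes));
    std::printf("constant memory:     %zu doubles\n", doubles_that_fit(kConstantMemBytes));
    return 0;
}
```

The new parameter limit is not a multiple of 8, so 4 of the 32,764 bytes cannot hold a double.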

It would be beneficial to store the promolecular coefficients/exponents as kernel parameters and to use constant memory for the atomic coordinates, which would make it significantly easier to add more atoms. For example, each atom has roughly 20-27 coefficients for each of the S-type and P-type functions, which, counting an exponent for each coefficient, gives a total of 80-108 numbers per atom.
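One possible layout for this split is sketched below. Everything here is an assumption made for illustration: the struct and kernel names, the caps of 30 parameter sets and 1024 atoms, and the S-type-only Gaussian sum, which is a stand-in and not the promolecular formula used in cuGBasis. It assumes CUDA 12.1 or newer, a sufficiently recent driver, and compute capability 7.0 or higher.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kMaxCoeffs = 27;    // upper end of the 20-27 range
constexpr int kMaxTypes  = 30;    // hypothetical number of parameter sets
constexpr int kMaxAtoms  = 1024;  // hypothetical cap on atoms held in constant memory

struct AtomParams {               // 4 * 27 = 108 doubles = 864 bytes
    double s_coeff[kMaxCoeffs], s_expon[kMaxCoeffs];
    double p_coeff[kMaxCoeffs], p_expon[kMaxCoeffs];
};
struct PromolParams { AtomParams type[kMaxTypes]; };  // 30 * 864 = 25,920 bytes

static_assert(sizeof(PromolParams) > 4096, "too large for the pre-12.1 limit");
static_assert(sizeof(PromolParams) + 64 <= 32764, "leave room for the other arguments");

__constant__ double d_coords[3 * kMaxAtoms];  // 24,576 bytes of the 64 KB
__constant__ int    d_type[kMaxAtoms];        //  4,096 bytes of the 64 KB

// Illustrative S-type-only sum over atoms; compile with e.g. nvcc -arch=sm_70.
__global__ void s_type_sum(const __grid_constant__ PromolParams params,
                           int n_atoms, int n_coeffs,
                           const double* points, double* out, int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points) return;
    double x = points[3 * i], y = points[3 * i + 1], z = points[3 * i + 2];
    double total = 0.0;
    for (int a = 0; a < n_atoms; ++a) {
        double dx = x - d_coords[3 * a];
        double dy = y - d_coords[3 * a + 1];
        double dz = z - d_coords[3 * a + 2];
        double r2 = dx * dx + dy * dy + dz * dz;
        const AtomParams& p = params.type[d_type[a]];
        for (int k = 0; k < n_coeffs; ++k)
            total += p.s_coeff[k] * exp(-p.s_expon[k] * r2);
    }
    out[i] = total;
}

int main()
{
    static PromolParams h_params{};           // zero-initialised, about 25 KB
    h_params.type[0].s_coeff[0] = 1.0;
    h_params.type[0].s_expon[0] = 1.0;

    double h_coords[6] = {0, 0, 0,  1, 0, 0}; // two atoms, both of type 0
    int    h_type[2]   = {0, 0};
    cudaMemcpyToSymbol(d_coords, h_coords, sizeof(h_coords));
    cudaMemcpyToSymbol(d_type, h_type, sizeof(h_type));

    double h_point[3] = {0, 0, 0}, h_out = 0.0;
    double *d_point = nullptr, *d_out = nullptr;
    cudaMalloc(&d_point, sizeof(h_point));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_point, h_point, sizeof(h_point), cudaMemcpyHostToDevice);

    s_type_sum<<<1, 32>>>(h_params, 2, 1, d_point, d_out, 1);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);

    // Host reference for this input: exp(0) + exp(-1).
    std::printf("device = %.12f  host = %.12f\n", h_out, 1.0 + std::exp(-1.0));
    cudaFree(d_point);
    cudaFree(d_out);
    return 0;
}
```

With this split, adding atoms only grows `d_coords` and `d_type`, while the 25,920-byte parameter struct stays the same size.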
