CUDA 11.7 introduced __grid_constant__ so that const kernel parameters can be read directly from constant memory. This makes it much easier to pass the basis set as a kernel argument, similar to GBasis, rather than following the approach currently taken in cuGBasis. However, threads read constant memory through a cache, one cache line at a time (see Section 3.4 in "Dissecting Turing T4 GPU"), so the current cuGBasis approach still achieves high performance because it exploits spatial locality. Based on Figure 3.9, the improvement in latency is about 100-200 clock cycles.
However, the parameters of a CUDA kernel function, which reside in the constant memory of the GPU, were limited to 4096 bytes in total, which means they could hold only 512 double-precision numbers.
Starting with CUDA 12.1 (see here), on the Volta architecture and newer, kernel functions can take up to 32,764 bytes of parameters (4095 double-precision numbers). This is still less than the 64 kilobytes of constant memory, but it would make coding and storing other kinds of parameters easier.
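To make the two limits concrete, here is a minimal sketch that sizes a by-value parameter struct against them. All names (ParamBlock, kOldLimitBytes, kNewLimitBytes, max_doubles, sum_params) are hypothetical and not taken from cuGBasis. The host-side part is plain C++. The kernel is only compiled under nvcc, is untested here, and is assumed to need an sm_70 or newer target (for example `-arch=sm_70`).

```cpp
#include <cstddef>

// Kernel-parameter size limits quoted above, in bytes.
constexpr std::size_t kOldLimitBytes = 4096;   // before CUDA 12.1
constexpr std::size_t kNewLimitBytes = 32764;  // CUDA 12.1+, Volta and newer

// Number of whole double-precision values that fit in a byte budget.
constexpr std::size_t max_doubles(std::size_t bytes) {
    return bytes / sizeof(double);
}

// Hypothetical parameter block that is passed to a kernel by value.
template <std::size_t N>
struct ParamBlock {
    double values[N];
};

static_assert(max_doubles(kOldLimitBytes) == 512, "old limit holds 512 doubles");
static_assert(max_doubles(kNewLimitBytes) == 4095, "new limit holds 4095 doubles");

// The limit applies to all kernel parameters together, so one slot is left
// free here for the output pointer taken by the kernel below.
static_assert(sizeof(ParamBlock<4094>) + sizeof(double*) <= kNewLimitBytes,
              "block plus output pointer fits the larger limit");
static_assert(sizeof(ParamBlock<4094>) > kOldLimitBytes,
              "the same block would not fit the old limit");

#ifdef __CUDACC__
// Illustrative kernel: the block is a const, by-value __grid_constant__
// parameter, so threads read it from the kernel-parameter space.
template <std::size_t N>
__global__ void sum_params(const __grid_constant__ ParamBlock<N> p, double* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        double s = 0.0;
        for (std::size_t i = 0; i < N; ++i) s += p.values[i];
        *out = s;  // single thread writes the sum of all stored values
    }
}
#endif
```

With this layout a launch would look like `sum_params<4094><<<1, 32>>>(block, d_out)`, where `block` is a host-side `ParamBlock<4094>` and `d_out` is a device pointer. Both names are again only illustrative.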
This would allow the promolecular coefficients and exponents to be stored as kernel parameters, leaving the constant memory for atomic coordinates and making it significantly easier to add more atoms. For example, each atom has roughly 20-27 coefficients, with matching exponents, for each of the S-type and P-type functions, giving a total of 80-108 numbers per atom.
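As a rough check of what these numbers imply, the sketch below computes how many per-atom parameter sets fit in each kernel-parameter budget. It assumes the parameters are stored as plain doubles with no padding or extra metadata. The names kParamBytesOld, kParamBytesNew and atoms_that_fit are hypothetical.

```cpp
#include <cstddef>

// Kernel-parameter budgets discussed above, in bytes.
constexpr std::size_t kParamBytesOld = 4096;   // before CUDA 12.1
constexpr std::size_t kParamBytesNew = 32764;  // CUDA 12.1+, Volta and newer

// Number of per-atom parameter sets that fit in a budget when each set
// needs `doubles_per_atom` double-precision values (assumption: no padding).
constexpr std::size_t atoms_that_fit(std::size_t budget_bytes,
                                     std::size_t doubles_per_atom) {
    return (budget_bytes / sizeof(double)) / doubles_per_atom;
}

// Old limit: 512 doubles.
static_assert(atoms_that_fit(kParamBytesOld, 108) == 4, "512 / 108 -> 4");
static_assert(atoms_that_fit(kParamBytesOld, 80) == 6, "512 / 80 -> 6");

// New limit: 4095 doubles.
static_assert(atoms_that_fit(kParamBytesNew, 108) == 37, "4095 / 108 -> 37");
static_assert(atoms_that_fit(kParamBytesNew, 80) == 51, "4095 / 80 -> 51");
```

Under these assumptions the old limit holds the parameters of only 4-6 atoms, while the new one holds 37-51, before accounting for any other kernel arguments that share the same budget.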