Not surprisingly, the bottleneck is the atomic-to-molecular-orbital transformation, which is computed via matrix multiplication.
In the case of 786 basis functions (dipeptide) and 1 million points, this is a 786 x 786 matrix multiplied by a 786 x 1,000,000 matrix. The 786 x 1,000,000 array of atomic orbital values ends up being largely sparse, which motivates breaking the matrices up into sub-matrices.
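To put those sizes in perspective, here is a rough back-of-the-envelope count for the dense product. It assumes double precision and counts one multiply plus one add per inner-product term. Neither assumption is stated above, so treat the numbers as an estimate rather than a measurement.

```python
# Back-of-the-envelope cost of the dense product, using the sizes quoted above.
M = 786                # basis functions (dipeptide)
N = 1_000_000          # grid points
BYTES_PER_DOUBLE = 8   # assumption: values are stored in double precision

# An (M x M) by (M x N) product needs M * M * N multiply-add pairs.
flops = 2 * M * M * N

# Storage for one M x N array of atomic orbital values.
ao_bytes = M * N * BYTES_PER_DOUBLE

print(f"{flops:.3e} floating-point operations")
print(f"{ao_bytes / 1e9:.3f} GB for the M x N array")
```

Because the operation count scales with the square of the number of basis functions kept, shrinking the set of basis functions for a block of points pays off quadratically.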
This is in all cases more expensive (about 50, 70, and 186 ms) than computing the atomic orbitals and their derivatives, which takes around 16-48 ms.
The following algorithm is probably the most efficient.
Let M be the number of basis functions, N the number of points, and K the number of atoms.
Computing the atomic orbitals takes about 10 ms, so during that step also fill a boolean array of size NxK: if any atomic orbital centered on an atom is non-zero at a point, set the corresponding entry of the NxK array to one. This can be done in one instruction using a ternary operator, for which PTX has an assembly instruction, [slct](https://docs.nvidia.com/cuda/parallel-thread-execution/#comparison-and-selection-instructions-slct). My guess is that it is a fixed-latency operation.
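A minimal CPU-side sketch of filling the NxK array from the orbital values, written in plain Python for readability rather than as a GPU kernel. The names `atom_mask`, `atom_of_basis`, and the tolerance `tol` are hypothetical. The tolerance stands in for "non-zero" and is an assumption.

```python
def atom_mask(ao_values, atom_of_basis, num_atoms, tol=1e-10):
    """Return mask[p][a] = 1 if any basis function on atom a is non-zero at point p.

    ao_values[i][p] is basis function i evaluated at point p.
    atom_of_basis[i] is the index of the atom that basis function i sits on.
    tol is an assumed threshold for treating a value as non-zero.
    """
    num_points = len(ao_values[0])
    mask = [[0] * num_atoms for _ in range(num_points)]
    for i, row in enumerate(ao_values):
        a = atom_of_basis[i]
        for p, value in enumerate(row):
            # Ternary select: set the flag to 1, or keep whatever is already there.
            mask[p][a] = 1 if abs(value) > tol else mask[p][a]
    return mask


# Tiny illustration: 3 basis functions on 2 atoms, 3 points.
ao_values = [
    [0.5, 0.0, 0.0],  # basis 0, on atom 0
    [0.0, 0.0, 0.0],  # basis 1, on atom 0
    [0.0, 0.2, 0.0],  # basis 2, on atom 1
]
mask = atom_mask(ao_values, atom_of_basis=[0, 0, 1], num_atoms=2)
print(mask)
```

In a kernel the same select would sit next to the orbital evaluation, so the mask costs one extra write per point and atom.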
The array can be filled by assuming that the atomic orbitals of most atoms do not extend past a couple of bond lengths, roughly 5-7 angstrom.
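The distance assumption can be sketched as a cutoff test between each point and each atom. The function name `cutoff_mask` and the default of 6.0 angstrom are hypothetical. The default is simply a value inside the 5-7 angstrom range mentioned above and would need tuning.

```python
import math


def cutoff_mask(points, atom_coords, cutoff=6.0):
    """Return mask[p][a] = 1 if point p lies within `cutoff` of atom a.

    Coordinates are (x, y, z) tuples in angstrom. The default cutoff is an
    assumed value picked from the 5-7 angstrom range.
    """
    return [
        [1 if math.dist(p, a) <= cutoff else 0 for a in atom_coords]
        for p in points
    ]


# Two atoms 10 angstrom apart, three points along the line joining them.
atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
points = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(cutoff_mask(points, atoms))
```

A distance test does not need the orbital values at all, but it is only as good as the cutoff: if the cutoff is too small, diffuse basis functions get dropped from the density.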
Then split the points and the boolean array into subsets N_1xK, N_2xK, ..., N_7xK. I chose seven because most atoms wouldn't bond past seven, but this number should be optimized. For each subset N_1, N_2, ..., figure out the atoms whose atomic orbitals are non-zero. Allocate the smaller MO coefficient and atomic orbital arrays, write a quick copy kernel that transfers the entries for those atoms into the smaller arrays (this should take around 20 ms), then calculate the electron density as usual.
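The steps above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the actual kernel. It assumes the density at a point is the quadratic form `sum_ij ao[i][p] * P[i][j] * ao[j][p]` for some M x M matrix `P` built from the MO coefficients. The names `density_dense`, `density_chunked`, and `chunk_size` are hypothetical.

```python
def density_dense(P, ao):
    """rho[p] = sum_ij ao[i][p] * P[i][j] * ao[j][p] over every basis function."""
    M, N = len(ao), len(ao[0])
    return [
        sum(ao[i][p] * P[i][j] * ao[j][p] for i in range(M) for j in range(M))
        for p in range(N)
    ]


def density_chunked(P, ao, mask, atom_of_basis, chunk_size):
    """Same density, but each chunk of points only uses the atoms flagged in mask."""
    M, N = len(ao), len(ao[0])
    rho = [0.0] * N
    for start in range(0, N, chunk_size):
        pts = range(start, min(start + chunk_size, N))
        # Atoms flagged for at least one point in this chunk.
        active = {a for p in pts for a, flag in enumerate(mask[p]) if flag}
        # Basis functions that sit on those atoms.
        idx = [i for i in range(M) if atom_of_basis[i] in active]
        if not idx:
            continue  # no atom contributes, density stays zero
        # Gather step (the role of the copy kernel): smaller matrix and AO rows.
        P_small = [[P[i][j] for j in idx] for i in idx]
        ao_small = [[ao[i][p] for p in pts] for i in idx]
        rho[start:start + len(pts)] = density_dense(P_small, ao_small)
    return rho


# Toy system: 3 basis functions on 2 atoms, 4 points.
atom_of_basis = [0, 0, 1]
P = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.5]]
ao = [
    [0.5, 0.3, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.7],
]
mask = [[1, 0], [1, 0], [0, 1], [0, 1]]  # points 0-1 see atom 0, points 2-3 see atom 1

rho_dense = density_dense(P, ao)
rho_chunked = density_chunked(P, ao, mask, atom_of_basis, chunk_size=2)
print(rho_dense)
print(rho_chunked)
```

The two results agree because the discarded rows are exactly zero on each chunk. With a distance-based mask the agreement is only approximate. How well the split works also depends on points in a chunk being close together in space, which is an assumption about how the grid is ordered.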