I created a simple Kernel Tuner script (sw_source_adding_kernel.py). The advantage of this kernel is that it does not require realistic input, so np.zeros() or np.random.random() arrays work fine as input. The only disadvantage is that you can't do correctness checking with Kernel Tuner.
I also tuned the kernel for several values of nlay, which frequently varies in e.g. our LES runs. For nlay={64, 128, 192, 256}, the fastest configurations always use block_size_x=32, and performance is not sensitive to the choice of block_size_y.
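For illustration, a sweep over nlay like the one described above could be organized roughly as follows. This is a minimal sketch, not the contents of sw_source_adding_kernel.py: the helper names `enumerate_configs` and `sweep`, the `time_fn` callback, and the parameter ranges in the usage note are all hypothetical. In a real run, `time_fn` would wrap the actual kernel timing (for example a call into Kernel Tuner).

```python
from itertools import product


def enumerate_configs(tune_params):
    """Return every combination of the tunable parameters as a dict."""
    names = list(tune_params)
    return [dict(zip(names, values)) for values in product(*tune_params.values())]


def sweep(nlay_values, tune_params, time_fn):
    """For each nlay, time all configurations and keep the fastest one.

    time_fn(nlay, config) is a hypothetical callback that returns the
    measured kernel time for one configuration; it is an assumption of
    this sketch, not part of any library.
    """
    best = {}
    for nlay in nlay_values:
        timed = [(time_fn(nlay, cfg), cfg) for cfg in enumerate_configs(tune_params)]
        # min() on the time component; ties resolve to the first config seen.
        best[nlay] = min(timed, key=lambda t: t[0])[1]
    return best
```

Calling `sweep([64, 128, 192, 256], {"block_size_x": [...], "block_size_y": [...]}, time_fn)` returns one best configuration per nlay, which makes it easy to check whether the preferred block_size_x stays the same across layer counts.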
The old configuration was already close to optimal: the best configuration found by the kernel tuner is only 8% faster.
I'm starting with this kernel.