Single-precision support for CUDA variants #91

Merged

reuterbal merged 4 commits into develop from nams-cuda-sp-single-dir

Jul 18, 2024

Contributor

MichaelSt98 commented Jul 16, 2024


Tested via:

./cloudsc-bundle build --clean --build-dir=build-sp-cuda-hdf5 --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --single-precision
./cloudsc-bundle build --clean --build-dir=build-dp-cuda-hdf5 --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu
./cloudsc-bundle build --clean --build-dir=build-sp-cuda-serialbox --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --single-precision --with-serialbox
./cloudsc-bundle build --clean --build-dir=build-dp-cuda-serialbox --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --with-serialbox

Then, in each build directory (cd build-...), source the environment (. ./env.sh) and run:

./bin/dwarf-cloudsc-c-cuda 1 65536 32
./bin/dwarf-cloudsc-c-cuda-hoist 1 65536 32
./bin/dwarf-cloudsc-c-cuda-k-caching 1 65536 32


single precision CUDA (via preprocessor macro(s))

67d1265

MichaelSt98 force-pushed the nams-cuda-sp-single-dir branch from 555366a to 67d1265

July 16, 2024 11:06

MichaelSt98 requested a review from reuterbal

July 16, 2024 11:32

reuterbal reviewed

Collaborator

reuterbal left a comment

Thanks, this looks great, definitely worth doing over replicating source for single precision. I've left a few remarks where things seemed a bit off, otherwise very happy with this.

NB: I haven't tested this myself yet, which I would do before merging.

src/cloudsc_cuda/cloudsc/cloudsc_validate.cu Outdated

Comment on lines 63 to 64

		// # pragma omp parallel for default(shared) private(b, bsize, jk) \
		// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)

Collaborator

reuterbal Jul 16, 2024

This should likely be removed entirely?

src/cloudsc_cuda/cloudsc/cloudsc_validate.cu Outdated

Comment on lines 102 to 103

		// # pragma omp parallel for default(shared) private(b, bsize, jl, jk) \
		// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)

Collaborator

reuterbal Jul 16, 2024

Same

src/cloudsc_cuda/cloudsc/cloudsc_validate.cu Outdated

Comment on lines 144 to 145

		// # pragma omp parallel for default(shared) private(b, bsize, jl, jk, jm) \
		// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)

Collaborator

reuterbal Jul 16, 2024

ditto
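
For context, the commented-out pragmas in these three comments combine min, max and sum reductions in a single threaded loop. A minimal sketch of that pattern follows; the function name validate_1d, the stats_t struct and the error metric are assumptions for illustration and are not taken from cloudsc_validate.cu.

```c
#include <float.h>

/* Hypothetical container for the reduced values (not from the PR). */
typedef struct { double minval, maxval, maxerr, errsum; } stats_t;

/* Sketch: compare a field against a reference with OpenMP reductions.
   Without OpenMP enabled the pragma is ignored and the loop runs serially. */
stats_t validate_1d(const double *field, const double *ref, int n)
{
  double zminval = DBL_MAX, zmaxval = -DBL_MAX;
  double zmaxerr = 0.0, zerrsum = 0.0;

  /* Each thread works on private copies of the reduction variables,
     which are combined when the loop finishes. */
  #pragma omp parallel for default(shared) \
    reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum)
  for (int i = 0; i < n; i++) {
    double err = field[i] - ref[i];
    if (err < 0.0) err = -err;               /* absolute error */
    if (field[i] < zminval) zminval = field[i];
    if (field[i] > zmaxval) zmaxval = field[i];
    if (err > zmaxerr) zmaxerr = err;
    zerrsum += err;
  }

  stats_t s = { zminval, zmaxval, zmaxerr, zerrsum };
  return s;
}
```

Keeping the pragma active only affects the host-side validation loop; the result should be the same with or without threading, up to floating-point summation order.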

src/cloudsc_cuda/cloudsc/dtype.h Outdated

Collaborator

reuterbal Jul 16, 2024

Probably needs include guards:

#ifndef CLOUDSC_DTYPE_H
#define CLOUDSC_DTYPE_H

...

#endif
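
As a rough illustration of how such a guarded header could select the working precision through a preprocessor macro, consider the sketch below. The macro name SINGLE and the helper dtype_bytes are assumptions for illustration; the actual contents of dtype.h are not shown in this thread.

```c
/* Hypothetical sketch of a guarded precision header; not the actual dtype.h. */
#ifndef CLOUDSC_DTYPE_H
#define CLOUDSC_DTYPE_H

#include <stddef.h>

/* SINGLE is an assumed macro name; the real build may use a different one. */
#ifdef SINGLE
typedef float dtype;
#else
typedef double dtype;
#endif

/* Illustrative helper (not from the PR): size in bytes of the working type. */
static inline size_t dtype_bytes(void) { return sizeof(dtype); }

#endif /* CLOUDSC_DTYPE_H */
```

With the guard in place, the header can be included both directly and indirectly (for example through another header) in the same translation unit without the helper being defined twice.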

src/cloudsc_cuda/cloudsc/cloudsc_driver.cu

@@ -12,6 +12,7 @@
 #include <omp.h>
 #include "mycpu.h"
+// #include "dtype.h"

Collaborator

reuterbal Jul 16, 2024

This include should probably be active here. Was it commented out because, without the include guards, it caused duplicate definitions?

Contributor Author

MichaelSt98 Jul 16, 2024

The include is within cloudsc_driver.h, which is itself included in cloudsc_driver.cu.

src/cloudsc_cuda/cloudsc/load_state.cu

Collaborator

reuterbal Jul 16, 2024

Should probably have a #include "dtype.h"?

src/cloudsc_cuda/cloudsc/load_state.cu Outdated


-#pragma omp parallel for default(shared) private(b, i, buf_start_idx, buf_idx)
+#pragma omp parallel for default(shared) private(b, l, i, buf_start_idx, buf_idx)

Collaborator

reuterbal Jul 16, 2024

The l appears unused?

src/cloudsc_cuda/cloudsc/load_state.cu Outdated


-#pragma omp parallel for default(shared) private(b, i, buf_start_idx, buf_idx)
+#pragma omp parallel for default(shared) private(b, l, i, buf_start_idx, buf_idx)

Collaborator

reuterbal Jul 16, 2024

The l appears to be unused?

src/cloudsc_cuda/cloudsc/load_state.cu Outdated

+  dtype (*buffer)[nlev][nlon] = (dtype (*)[nlev][nlon]) buffer_in;
+  dtype (*field)[nclv][nlev][nproma] = (dtype (*)[nclv][nlev][nproma]) field_in;
   #pragma omp parallel for default(shared) private(b, buf_start_idx, buf_idx, l, i)

Collaborator

reuterbal Jul 16, 2024

This should probably also have c as private?
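
To illustrate why every loop index declared outside the parallel region has to appear in the private clause, here is a minimal sketch of a blocked copy in the same style. The function name expand_blocks, the use of double instead of dtype, and the wrap-around padding of the last block are assumptions for illustration, not the code in load_state.cu.

```c
/* Hypothetical sketch: copy a [nclv][nlev][nlon] buffer into
   [nblocks][nclv][nlev][nproma] blocks. Names and layout are assumed. */
void expand_blocks(int nlon, int nlev, int nclv, int nproma, int nblocks,
                   const double *buffer_in, double *field_in)
{
  const double (*buffer)[nlev][nlon] = (const double (*)[nlev][nlon]) buffer_in;
  double (*field)[nclv][nlev][nproma] = (double (*)[nclv][nlev][nproma]) field_in;
  int b, c, l, i, buf_idx;

  /* b is the parallel loop index and is made private automatically.
     c, l, i and buf_idx are declared outside the region, so they must be
     listed as private; otherwise all threads would share and race on them. */
  #pragma omp parallel for default(shared) private(c, l, i, buf_idx)
  for (b = 0; b < nblocks; b++) {
    for (c = 0; c < nclv; c++) {
      for (l = 0; l < nlev; l++) {
        for (i = 0; i < nproma; i++) {
          /* Assumed padding rule: wrap around to fill a partial last block. */
          buf_idx = (b * nproma + i) % nlon;
          field[b][c][l][i] = buffer[c][l][buf_idx];
        }
      }
    }
  }
}
```

Omitting c from the list would leave the inner loop counter shared between threads, which can produce skipped or repeated iterations that are hard to reproduce.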

src/cloudsc_cuda/cloudsc/load_state.cu Outdated

 }
 void load_and_expand_1d_int(serialboxSerializer_t *serializer, serialboxSavepoint_t* savepoint,
     const char *name, int nlon, int nproma, int ngptot, int nblocks, int *field)
 {
-  int buffer[nlon];
+  double buffer[nlon];

Collaborator

reuterbal Jul 16, 2024

This should load ints, why does the buffer have to be double?

Contributor Author

MichaelSt98 Jul 16, 2024

There is no reason; I am not sure why I changed that.
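
A small sketch of why the staging buffer's element type should match the stored data: if a reader writes raw ints into the buffer, indexing that memory as double would reinterpret the bytes rather than convert the values. The helper read_ints stands in for the serializer call and, like expand_1d_int and the wrap-around padding, is an assumption for illustration only.

```c
#include <string.h>

/* Stand-in for a serializer read (hypothetical): copies nlon raw ints to dst. */
static void read_ints(const int *src, void *dst, int nlon)
{
  memcpy(dst, src, (size_t)nlon * sizeof(int));
}

/* Sketch: stage data in an int buffer, then expand into nproma-sized blocks. */
void expand_1d_int(const int *src, int nlon, int nproma, int nblocks, int *field)
{
  int buffer[nlon];  /* element type matches the stored data */
  int b, i;

  read_ints(src, buffer, nlon);
  for (b = 0; b < nblocks; b++) {
    for (i = 0; i < nproma; i++) {
      /* Assumed padding rule: wrap around to fill a partial last block. */
      field[b * nproma + i] = buffer[(b * nproma + i) % nlon];
    }
  }
}
```

Integer fields are unaffected by the single-precision switch, so only the floating-point loaders would need the dtype-typed buffers.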

MichaelSt98 added 3 commits

July 17, 2024 07:11


re-enable OpenMP pragmas for CPU threading (cloudsc_cuda/cloudsc/cloudsc_validate.cu)

a767965

introduce header/include guards for 'dtype.h'

fd05472

fix openmp pragmas and type of buffer

3b79fc4

MichaelSt98 requested a review from reuterbal

July 18, 2024 09:40

reuterbal approved these changes

Collaborator

reuterbal left a comment

Looks great; I tested building and running on AC. Many thanks!

reuterbal changed the title from "CUDA SP" to "Single-precision support for CUDA variants"

reuterbal merged commit 63448cc into develop

18 checks passed

reuterbal deleted the nams-cuda-sp-single-dir branch

July 18, 2024 10:29
