
Single-precision support for CUDA variants #91

Merged: 4 commits merged into develop on Jul 18, 2024

Conversation

@MichaelSt98 (Contributor) commented Jul 16, 2024

Tested via:

  • ./cloudsc-bundle build --clean --build-dir=build-sp-cuda-hdf5 --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --single-precision
  • ./cloudsc-bundle build --clean --build-dir=build-dp-cuda-hdf5 --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu
  • ./cloudsc-bundle build --clean --build-dir=build-sp-cuda-serialbox --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --single-precision --with-serialbox
  • ./cloudsc-bundle build --clean --build-dir=build-dp-cuda-serialbox --arch=arch/ecmwf/hpc2020/nvhpc/22.11 --with-cuda --with-gpu --with-serialbox

Then cd into each build directory (cd build-...), source the environment (. ./env.sh), and run:

  • ./bin/dwarf-cloudsc-c-cuda 1 65536 32
  • ./bin/dwarf-cloudsc-c-cuda-hoist 1 65536 32
  • ./bin/dwarf-cloudsc-c-cuda-k-caching 1 65536 32

@MichaelSt98 MichaelSt98 force-pushed the nams-cuda-sp-single-dir branch from 555366a to 67d1265 Compare July 16, 2024 11:06
@MichaelSt98 MichaelSt98 requested a review from reuterbal July 16, 2024 11:32
@reuterbal (Collaborator) left a comment

Thanks, this looks great; definitely worth doing over replicating the source for single precision. I've left a few remarks where things seemed a bit off, otherwise I'm very happy with this.

NB: I haven't tested this myself yet, which I would do before merging.

Comment on lines 63 to 64
// # pragma omp parallel for default(shared) private(b, bsize, jk) \
// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)
This should likely be removed entirely?

Comment on lines 102 to 103
// # pragma omp parallel for default(shared) private(b, bsize, jl, jk) \
// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)
Same

Comment on lines 144 to 145
// # pragma omp parallel for default(shared) private(b, bsize, jl, jk, jm) \
// reduction(min:zminval) reduction(max:zmaxval,zmaxerr) reduction(+:zerrsum,zsum)
ditto

@reuterbal (Collaborator), on dtype.h:
This probably needs include guards:

#ifndef CLOUDSC_DTYPE_H
#define CLOUDSC_DTYPE_H

...

#endif

@@ -12,6 +12,7 @@

#include <omp.h>
#include "mycpu.h"
// #include "dtype.h"
This should probably be here, but it has likely caused problems because of duplicate definitions without the include guards?

@MichaelSt98 (Contributor, author) replied:
The include is within cloudsc_driver.h and cloudsc_driver.h is included in cloudsc_driver.cu

@reuterbal (Collaborator):
Should probably have a #include "dtype.h"?


#pragma omp parallel for default(shared) private(b, i, buf_start_idx, buf_idx)
#pragma omp parallel for default(shared) private(b, l, i, buf_start_idx, buf_idx)
The l appears unused?


#pragma omp parallel for default(shared) private(b, i, buf_start_idx, buf_idx)
#pragma omp parallel for default(shared) private(b, l, i, buf_start_idx, buf_idx)
The l appears to be unused?


dtype (*buffer)[nlev][nlon] = (dtype (*)[nlev][nlon]) buffer_in;
dtype (*field)[nclv][nlev][nproma] = (dtype (*)[nclv][nlev][nproma]) field_in;

#pragma omp parallel for default(shared) private(b, buf_start_idx, buf_idx, l, i)
This should probably also have c as private?

}

void load_and_expand_1d_int(serialboxSerializer_t *serializer, serialboxSavepoint_t* savepoint,
const char *name, int nlon, int nproma, int ngptot, int nblocks, int *field)
{
int buffer[nlon];
double buffer[nlon];
This should load ints, why does the buffer have to be double?

@MichaelSt98 (Contributor, author) replied:
There is no reason ... not sure why I changed that.

@MichaelSt98 MichaelSt98 requested a review from reuterbal July 18, 2024 09:40
@reuterbal (Collaborator) left a comment
Looks great and tested building and running on AC. Many thanks!

@reuterbal reuterbal changed the title CUDA SP Single-precision support for CUDA variants Jul 18, 2024
@reuterbal reuterbal merged commit 63448cc into develop Jul 18, 2024
18 checks passed
@reuterbal reuterbal deleted the nams-cuda-sp-single-dir branch July 18, 2024 10:29