Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove non-standard SIZEOF from GPU subtree #161

Merged
merged 4 commits into from
Oct 16, 2024

Conversation

samhatfield
Copy link
Collaborator

Resolves #160.

Needs testing on Nvidia and AMD platforms.

Copy link
Collaborator

@wdeconinck wdeconinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this up!

src/trans/gpu/algor/ext_acc.F90 Outdated Show resolved Hide resolved
@@ -98,7 +98,7 @@ SUBROUTINE FTDIR(ALLOCATOR,HFTDIR,PREEL_REAL,PREEL_COMPLEX,KFIELD)
PREEL_COMPLEX => PREEL_REAL
#else
CALL ASSIGN_PTR(PREEL_COMPLEX, GET_ALLOCATION(ALLOCATOR, HFTDIR%HREEL_COMPLEX),&
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*SIZEOF(PREEL_COMPLEX(1)),KIND=C_SIZE_T))
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T))
& 1_C_SIZE_T, 1_C_SIZE_T*KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8)

It might be fine, not sure about the type of NLENGTF, but make sure you are not introducing new potential overflows. The value you compute is going to be 8X larger than before. Since you are changing this lines anyway, might be a good point to make those computations in JPIB or C_SIZE_T.

Copy link
Collaborator

@wdeconinck wdeconinck Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that the iso_c_binding function c_sizeof returns already bytes in c_size_t and storage_size also accepts a KIND argument alternatively.
So either:

  • C_SIZEOF(PREEL_COMPLEX(1))
  • (STORAGE_SIZE(PREEL_COMPLEX(1),KIND=C_SIZE_T) / 8_C_SIZE_T)
  • INT(STORAGE_SIZE(PREEL_COMPLEX(1)) / 8, KIND=C_SIZE_T)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are four possible solutions:

  • 1_C_SIZE_T*KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8)
  • KFIELD*D%NLENGTF*C_SIZEOF(PREEL_COMPLEX(1))
  • KFIELD*D%NLENGTF*(STORAGE_SIZE(PREEL_COMPLEX(1),KIND=C_SIZE_T) / 8_C_SIZE_T)
  • KFIELD*D%NLENGTF*INT(STORAGE_SIZE(PREEL_COMPLEX(1)) / 8, KIND=C_SIZE_T)

But for all of these, the type of the expression is ambiguous because KFIELD and D%NLENGTF are not C_SIZE_T. Only the original expression

INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T))

returns a value that is definitely C_SIZE_T. So what's the benefit of changing it? Something I'm missing?

Copy link
Collaborator

@lukasm91 lukasm91 Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, what you are missing is my initial thought process, which was: I thought that develop already includes fixes t hat I know did in (#162), which it does not, so I am fine with any solution. But: What I wanted to say initially is that the intermediate KFIELD*D%NLENGTF*STORAGE_SIZE(...) is 8 times larger than before, so a overflow is more likely than it was before, so we have to make sure that this expression does not overflow, by casing to C_SIZE_T early enough. But if you don't see overflows, you probably can just leave it like this and we can discuss it in #162.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to C_SIZEOF which I hope will at least not introduce more chances for overflow than currently exist.

src/trans/gpu/internal/ftdir_mod.F90 Outdated Show resolved Hide resolved
@samhatfield samhatfield force-pushed the samhatfield/remove_sizeof branch 3 times, most recently from 08f7ae3 to 265a66c Compare October 16, 2024 13:22
@samhatfield
Copy link
Collaborator Author

Okay, I have switched from STORAGE_SIZE to C_SIZEOF, since the latter returns an answer in bytes and is closer to what we were doing before. I've added some casting here and there to try and keep things consistent. Let me know what you think @lukasm91 @wdeconinck.

@samhatfield
Copy link
Collaborator Author

Note: still to be tested. I'm on it.

@samhatfield samhatfield force-pushed the samhatfield/remove_sizeof branch from c815d34 to 5cd712e Compare October 16, 2024 15:02
@samhatfield samhatfield marked this pull request as ready for review October 16, 2024 16:34
@samhatfield
Copy link
Collaborator Author

Successfully tested on LUMI-G up to 8 nodes.

@samhatfield
Copy link
Collaborator Author

And successfully tested on JEDI up to 8 nodes.

Note that PR #158 has broken the benchmark program on both Nvidia platforms I use currently: HPC2020/AC (A100) and JEDI (GH200). I have to revert that commit for now. Something we need to investigate @lukasm91. I'll post the error message here in a bit.

@samhatfield samhatfield merged commit cc10e4c into develop Oct 16, 2024
21 checks passed
@samhatfield samhatfield deleted the samhatfield/remove_sizeof branch October 16, 2024 17:17
@samhatfield
Copy link
Collaborator Author

This error

+ srun --cpus-per-task=72 /p/project1/jureap5/jube-suites/ectrans/ectrans_gpu/000004/000002_execute/work/build/build/bin/ectrans-benchmark-gpu-sp --norms --truncation 316 --niter 100 --nlev 137 --nfld 1 --vordiv --uvders --scders --meminfo
line 156: cudaLaunchKernel returned status 98: invalid device function

line 156: cudaLaunchKernel returned status 98: invalid device function

line 156: cudaLaunchKernel returned status 98: invalid device function

line 156: cudaLaunchKernel returned status 98: invalid device function

srun: error: jpbot-001-01: tasks 1-2: Exited with exit code 1
srun: error: jpbot-001-01: tasks 0,3: Exited with exit code 1

@marsdeno is also aware and is investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants