-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove non-standard SIZEOF from GPU subtree #161
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this up!
src/trans/gpu/internal/ftdir_mod.F90
Outdated
@@ -98,7 +98,7 @@ SUBROUTINE FTDIR(ALLOCATOR,HFTDIR,PREEL_REAL,PREEL_COMPLEX,KFIELD) | |||
PREEL_COMPLEX => PREEL_REAL | |||
#else | |||
CALL ASSIGN_PTR(PREEL_COMPLEX, GET_ALLOCATION(ALLOCATOR, HFTDIR%HREEL_COMPLEX),& | |||
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*SIZEOF(PREEL_COMPLEX(1)),KIND=C_SIZE_T)) | |||
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T)) | |
& 1_C_SIZE_T, 1_C_SIZE_T*KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8) |
It might be fine, not sure about the type of NLENGTF, but make sure you are not introducing new potential overflows. The value you compute is going to be 8X larger than before. Since you are changing this lines anyway, might be a good point to make those computations in JPIB or C_SIZE_T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that the iso_c_binding function c_sizeof returns already bytes in c_size_t and storage_size also accepts a KIND argument alternatively.
So either:
C_SIZEOF(PREEL_COMPLEX(1))
(STORAGE_SIZE(PREEL_COMPLEX(1),KIND=C_SIZE_T) / 8_C_SIZE_T)
INT(STORAGE_SIZE(PREEL_COMPLEX(1)) / 8, KIND=C_SIZE_T)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are four possible solutions:
1_C_SIZE_T*KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8)
KFIELD*D%NLENGTF*C_SIZEOF(PREEL_COMPLEX(1))
KFIELD*D%NLENGTF*(STORAGE_SIZE(PREEL_COMPLEX(1),KIND=C_SIZE_T) / 8_C_SIZE_T)
KFIELD*D%NLENGTF*INT(STORAGE_SIZE(PREEL_COMPLEX(1)) / 8, KIND=C_SIZE_T)
But for all of these, the type of the expression is ambiguous because KFIELD
and D%NLENGTF
are not C_SIZE_T
. Only the original expression
INT(KFIELD*D%NLENGTF*STORAGE_SIZE(PREEL_COMPLEX(1))/8,KIND=C_SIZE_T))
returns a value that is definitely C_SIZE_T
. So what's the benefit of changing it? Something I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, what you are missing is my initial thought process, which was: I thought that develop already includes fixes t hat I know did in (#162), which it does not, so I am fine with any solution. But: What I wanted to say initially is that the intermediate KFIELD*D%NLENGTF*STORAGE_SIZE(...)
is 8 times larger than before, so a overflow is more likely than it was before, so we have to make sure that this expression does not overflow, by casing to C_SIZE_T early enough. But if you don't see overflows, you probably can just leave it like this and we can discuss it in #162.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to C_SIZEOF
which I hope will at least not introduce more chances for overflow than currently exist.
08f7ae3
to
265a66c
Compare
Okay, I have switched from |
Note: still to be tested. I'm on it. |
c815d34
to
5cd712e
Compare
Successfully tested on LUMI-G up to 8 nodes. |
And successfully tested on JEDI up to 8 nodes. Note that PR #158 has broken the benchmark program on both Nvidia platforms I use currently: HPC2020/AC (A100) and JEDI (GH200). I have to revert that commit for now. Something we need to investigate @lukasm91. I'll post the error message here in a bit. |
This error
@marsdeno is also aware and is investigating. |
Resolves #160.
Needs testing on Nvidia and AMD platforms.