Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix overflows up to at least tco1279 #162

Merged
merged 6 commits into from
Dec 13, 2024

Conversation

lukasm91
Copy link
Collaborator

This is one more change that Olivier ported from my branch to his branch, and now I am taking it to develop.

The upper end of the sizes that this allows is certainly more "theoretical" because we can't run simulations with it, but even for real use-cases, we might see overflows without this, and those can be very difficult to spot. With one rank, we can grow to very large domains, with multiple ranks, at some point we run into overflows of the MPI buffers, those are properly diagnosed.

This will conflict with #161 , but should be trivial to resolve.

@lukasm91 lukasm91 changed the title Fix overflows up to tco1279 Fix overflows up to at least tco1279 Oct 16, 2024
@lukasm91 lukasm91 changed the base branch from main to develop October 17, 2024 06:30
@lukasm91
Copy link
Collaborator Author

lukasm91 commented Dec 5, 2024

@samhatfield This is the second last PR from the original GPU branch that is missing and I propose for integration. This should be much more straight forward than the previous one. I am using this for a while now without issues, and it is very useful because the overflows are very hard to find, and it is nice to be able experimenting on a single node/GPU.

@samhatfield
Copy link
Collaborator

Noted, thanks Lukas. I'll try to find time to look at this in the next week.

Copy link
Collaborator

@samhatfield samhatfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some minor comments.

A common pattern in this PR is something like

2_JPIB*D%NLENGT0B*KF_LEG*C_SIZEOF(PFBUF(1))

which involves a mixture of integer types. I don't usually do this in Fortran so I'm not sure what the rules are for casting in these cases. Can someone reassure me that it's all fine? Here we have JPIB * JPIM * C_SIZE_T.

@samhatfield samhatfield added enhancement New feature or request gpu labels Dec 9, 2024
@samhatfield samhatfield merged commit 094eca2 into ecmwf-ifs:develop Dec 13, 2024
11 of 12 checks passed
@lukasm91 lukasm91 deleted the fix-overflow branch January 20, 2025 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request gpu
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants