Don't run multi GPU test if enough GPU's aren't present #209

Merged 1 commit on Jul 31, 2020

PDoakORNL
Contributor

This will prevent the multi-GPU tests from running if multiple GPUs are not present.

@PDoakORNL PDoakORNL requested review from biddisco and gbalduzz July 30, 2020 17:23
@efdazedo
Do you mean the ringG algorithm would fail if the number of GPUs is less than 3? One would think the ringG algorithm with non-blocking MPI isend/irecv would work even if there is only 1 MPI rank. I think it is also possible for multiple MPI ranks to share the same GPU.

Contributor

@gbalduzz gbalduzz left a comment

I agree with having a hotfix before a more generic implementation, but I find it confusing to rely on an unrelated property for the switch.

if (DCA_HAVE_CUDA)
EXECUTE_PROCESS(COMMAND bash -c "nvidia-smi -L | awk 'BEGIN { num_gpu=0;} /GPU/ { num_gpu++;} END { printf(\"%d\", num_gpu) }'"
Contributor

@gbalduzz gbalduzz Jul 31, 2020

The ring algorithm should be independent of the number of GPUs on the node. Rather, the technical issue is that we rely on CUDA-aware MPI. Can we have a check similar to hostname==summit for the moment? I am assuming that automatically detecting whether MPI is CUDA-aware is more complicated.

Contributor Author

The cvd script will definitely have undefined behavior if there is not 1 GPU per rank.
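
The GPU-count guard discussed in this thread can be sketched in plain bash. This is only an illustration, not the code merged in this PR: the function names `count_gpus`, `enough_gpus`, and `detect_gpu_count` are hypothetical, and the sketch assumes that `nvidia-smi -L` prints one line per device beginning with `GPU <index>:`.

```shell
#!/usr/bin/env bash

# Count GPUs in `nvidia-smi -L` style text read from stdin.
# Assumption: each device appears on its own line starting with "GPU <index>:".
count_gpus() {
  awk '/^GPU [0-9]+:/ { n++ } END { printf "%d", n + 0 }'
}

# Succeed only when there is at least one GPU per rank.
# $1 = number of GPUs, $2 = number of MPI ranks the test would launch.
enough_gpus() {
  [ "$1" -ge "$2" ]
}

# Ask the driver only if nvidia-smi is installed; otherwise report zero GPUs.
detect_gpu_count() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | count_gpus
  else
    printf "0"
  fi
}
```

A test driver could then call `enough_gpus "$(detect_gpu_count)" 3` and skip the multi-GPU test when it fails. Note that this checks only the device count; it says nothing about whether the MPI library is CUDA-aware.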

The Open MPI FAQ has an entry titled "Can I tell at compile time or runtime whether I have CUDA-aware support?". It seems to test for MPIX_CUDA_AWARE_SUPPORT:
https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-dev

Just FYI
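
As a rough sketch of a check that could run from a script, the same Open MPI FAQ describes querying `ompi_info` for the `mpi_built_with_cuda_support` parameter. The function names below are hypothetical, the exact output line is an assumption taken from that FAQ, and the check applies to Open MPI builds only; it reports build-time support, not whether CUDA-aware transfers work on a given node.

```shell
#!/usr/bin/env bash

# Read `ompi_info --parsable --all` output from stdin and succeed if it
# reports that the library was built with CUDA support.
# Assumption: the relevant line ends in "mpi_built_with_cuda_support:value:true".
is_cuda_aware() {
  grep -q 'mpi_built_with_cuda_support:value:true'
}

# Run the query against the installed Open MPI, failing if ompi_info is absent.
mpi_is_cuda_aware() {
  command -v ompi_info >/dev/null 2>&1 || return 1
  ompi_info --parsable --all 2>/dev/null | is_cuda_aware
}
```

A build script could call `mpi_is_cuda_aware` and enable the ring-algorithm tests only when it succeeds, instead of keying on the hostname or the GPU count.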

Contributor Author

Added #210, and with that I think this should go in so that #208 and #206 can pass CI and be merged.

@PDoakORNL PDoakORNL dismissed gbalduzz’s stale review July 31, 2020 16:10

Issue added to address the dependency on GPU-aware MPI

@PDoakORNL PDoakORNL merged commit f3ac453 into CompFUSE:master Jul 31, 2020
gbalduzz added a commit to gbalduzz/DCA that referenced this pull request Aug 3, 2020
…PU_tests"

This reverts commit f3ac453, reversing
changes made to 0460fba.
@gbalduzz gbalduzz mentioned this pull request Aug 3, 2020
3 participants