Hi @dsroberts,
I noticed that the containerized environment on hh5 uses an OpenMPI build located at /g/data/hh5/public/apps/openmpi/4.1.6. However, on inspecting the build, it is actually version 4.1.7.
Since the default OpenMPI on Gadi is now also 4.1.7, that by itself shouldn’t make any difference. The problem is that there are no Conda packages available for OpenMPI 4.1.7, and most of the existing packages were built linking against 4.1.6.
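For reference, this is roughly how I checked both sides of the mismatch; the exact binary locations are assumptions on my part rather than something from the xp65 scripts:

```bash
# What does the "4.1.6" build under hh5 actually report?
# (bin/ompi_info under the build prefix is an assumption)
/g/data/hh5/public/apps/openmpi/4.1.6/bin/ompi_info --version

# What do the conda-forge packages in the environment expect?
conda list openmpi                       # packaged OpenMPI version
"$CONDA_PREFIX/bin/ompi_info" --version  # version of the environment's own OpenMPI, if installed
```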
Here’s what I’ve tried so far (a rough sketch of both follows the list):

- Loading the Gadi OpenMPI module before the Singularity module in the .common_v3 script.
- Linking to a copy of the OpenMPI 4.1.6 build in /g/data/xp65/public/apps/openmpi/4.1.6. (This is just a copy for now; I may need to do a proper build there.)
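For context, this is roughly what those two attempts look like. The module names and the bind-path handling are my own sketch, not a verbatim copy of .common_v3:

```bash
# Attempt 1: load Gadi's OpenMPI before Singularity so the host MPI is picked up first
# (module names are assumptions; use whatever .common_v3 actually loads)
module load openmpi/4.1.7
module load singularity

# Attempt 2: keep a copy of the 4.1.6 tree under xp65 and bind it into the container
# in place of the hh5 path (a proper build may still be needed there)
export SINGULARITY_BINDPATH="/g/data/xp65/public/apps/openmpi/4.1.6${SINGULARITY_BINDPATH:+,$SINGULARITY_BINDPATH}"
```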
Unfortunately, we’re still encountering intermittent issues, including:

- Problems with OpenFabrics and InfiniBand (see the sketch after this list).
- Significant slowdowns or errors due to missing library links.
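One diagnostic angle for the OpenFabrics/InfiniBand problems (again, my own sketch, not something currently in the xp65 setup) is to check which transports the build supports and, if needed, steer OpenMPI towards UCX and away from the legacy openib BTL:

```bash
# Prefer the UCX PML and skip the legacy openib BTL, which the OpenFabrics
# warnings usually point at (whether this is appropriate on Gadi's fabric is
# exactly what I'm unsure about)
export OMPI_MCA_pml=ucx
export OMPI_MCA_btl="^openib"

# Confirm which transports this OpenMPI build was compiled with
ompi_info | grep -Ei "ucx|openib"
```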
I’m curious about the rationale behind your current implementation. I understand you’re busy with your new role, but any insights or advice would be greatly appreciated.
Thanks for your help!