Issues with openmpi #158

Open
rbeucher opened this issue Jan 20, 2025 · 0 comments

Hi @dsroberts,

I noticed that the containerized environment on hh5 is using an OpenMPI build located at /g/data/hh5/public/apps/openmpi/4.1.6. However, upon checking the build, it appears to actually be version 4.1.7.
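For reference, here is a minimal sketch of how the mismatch between the directory name and the actual build could be confirmed. It assumes the build ships `bin/ompi_info` under its prefix and that `ompi_info --version` prints a line containing an `x.y.z` version string. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_openmpi_version(text: str) -> Optional[str]:
    """Return the first x.y.z version string found in text, or None."""
    match = re.search(r"(\d+\.\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_openmpi_version(prefix: str) -> Optional[str]:
    """Ask the ompi_info binary under `prefix` which version it reports.

    Assumption: the build has an executable at <prefix>/bin/ompi_info.
    """
    result = subprocess.run(
        [f"{prefix}/bin/ompi_info", "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_openmpi_version(result.stdout)


# Example call (not executed here; needs access to the hh5 filesystem):
# installed_openmpi_version("/g/data/hh5/public/apps/openmpi/4.1.6")
```

Comparing the returned string with the version in the directory name would show whether the path label and the build disagree.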

Since the default OpenMPI on Gadi is now also version 4.1.7, there shouldn’t be any inherent difference. However, I’ve encountered an issue: there are no Conda packages available for OpenMPI 4.1.7, and most existing packages were built with links to version 4.1.6.

Here’s what I’ve tried so far:

- Loading the Gadi OpenMPI module before the Singularity module in the .common_v3 script.
- Linking to a copy of the OpenMPI 4.1.6 build in /g/data/xp65/public/apps/openmpi/4.1.6. (This is a copy for now, and I may need to perform a proper build here.)

Unfortunately, we’re still encountering intermittent issues, including:

- Problems with OpenFabrics and InfiniBand.
- Significant slowdowns or errors due to missing library links.

I’m curious about the rationale behind your current implementation. I understand you’re busy with your new role, but any insights or advice would be greatly appreciated.
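To narrow down the missing library links mentioned above, one option is to scan `ldd` output for unresolved entries. This is a rough sketch only: it assumes `ldd` is available in the environment being checked (it would have to run inside the container to reflect what the containerized packages see), and the helper names are hypothetical.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibmpi.so.40 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary or shared object and list unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

Running `check_binary` on an MPI-linked library from one of the Conda packages would show whether it still expects files from a build that is not on the library path.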

Thanks for your help!
