Amesos2: KLU2 re-indexing operation causes throw from Map ctor in MueLu solve #13774

Closed
MalachiTimothyPhillips opened this issue Feb 3, 2025 · 5 comments
Labels: client: Sierra, pkg: Amesos2, pkg: MueLu, type: bug


Bug Report

@trilinos/amesos2 @iyamazaki

#13740 introduces a Map ctor call that ends up hitting the following error in a Sierra/Fuego test that uses MueLu:

  /sierra/build/linux_rh7/nightly/intel_trilinos_master/objs/tpls/spack/spack/.spack/stage/spack-stage-trilinos-2025.02.02-hcx7s2trdqx6ncqfewvuau5py6eebw5h/spack-src/packages/tpetra/core/src/Tpetra_Map_def.hpp:841:
  Throw number = 1
  Throw test that evaluated to true: minAllGID_ < indexBase_
  Tpetra::Map constructor (noncontiguous): Minimum global ID = 0 over all process(es) is less than the given indexBase = 2005.

Specifically, setting gather_supported=false just before the branch below works around the issue:

if (gather_supported) {
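
For readers less familiar with the check quoted in the error message, here is a minimal standalone sketch of the condition `minAllGID_ < indexBase_`. It does not use Tpetra: `GO` and `checkMinGidAgainstIndexBase` are hypothetical names made up for this illustration, and the sketch looks at a single list of GIDs, whereas the real check takes the minimum over all processes.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using GO = long long;  // stand-in for a global ordinal type (assumption)

// Hypothetical helper, not a Tpetra function: returns the minimum GID when it
// is consistent with indexBase and throws otherwise, mirroring the condition
// "minAllGID_ < indexBase_" from the error message.
GO checkMinGidAgainstIndexBase(const std::vector<GO>& gids, GO indexBase) {
  assert(!gids.empty());
  const GO minGid = *std::min_element(gids.begin(), gids.end());
  if (minGid < indexBase) {
    throw std::invalid_argument(
        "Minimum global ID = " + std::to_string(minGid) +
        " is less than the given indexBase = " + std::to_string(indexBase) +
        ".");
  }
  return minGid;
}
```

With GIDs starting at 2005 and indexBase 2005 the check passes. With GIDs starting at 0 and indexBase 2005 it throws, which is the combination reported above.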

MalachiTimothyPhillips added the client: Sierra, pkg: Amesos2, and type: bug labels Feb 3, 2025

github-actions bot commented Feb 3, 2025

Automatic mention of the @trilinos/muelu team


MalachiTimothyPhillips commented Feb 3, 2025

@iyamazaki By the way, you can also reproduce this by running KLU2 on the actual matrix. Its minimum GID is > 0, so I'm pretty sure a unit test with a single row whose GID is non-zero would reproduce the issue.

Here are the relevant matrix files for the error we're observing on np=4:

matrix-files-for-13774.tar.gz
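
To illustrate why a single row with a non-zero GID might be enough, here is a small standalone sketch. It rests on an assumption that this thread does not confirm: that the re-indexing produces GIDs starting at 0 while the indexBase of the original map (2005 in the error above) is carried over. `GO`, `reindexFromZero`, and `consistentWithIndexBase` are hypothetical names for this illustration and are not Amesos2 or Tpetra functions.

```cpp
#include <algorithm>
#include <vector>

using GO = long long;  // stand-in for a global ordinal type (assumption)

// Hypothetical helper: replace each GID by its rank in sorted order, so the
// result uses the values 0..n-1. Assumes the input GIDs are unique.
std::vector<GO> reindexFromZero(const std::vector<GO>& gids) {
  std::vector<GO> sorted(gids);
  std::sort(sorted.begin(), sorted.end());
  std::vector<GO> out;
  out.reserve(gids.size());
  for (GO g : gids) {
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), g);
    out.push_back(static_cast<GO>(it - sorted.begin()));
  }
  return out;
}

// Hypothetical helper: true when no GID in the list is below indexBase,
// i.e. the condition "min GID < indexBase" from the error would not fire.
bool consistentWithIndexBase(const std::vector<GO>& gids, GO indexBase) {
  return gids.empty() ||
         *std::min_element(gids.begin(), gids.end()) >= indexBase;
}
```

Under that assumption, a one-row case with original GID 2005 re-indexes to GID 0. The re-indexed list is consistent with indexBase 0 but not with the original indexBase 2005, which matches the message "Minimum global ID = 0 ... is less than the given indexBase = 2005".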

@MalachiTimothyPhillips
Author

Maybe the error stems from here?

auto tmpMap = rcp (new contiguous_map_type (numDoFs, nRows, indexBase, rowComm));

@iyamazaki
Contributor

Thank you, @MalachiTimothyPhillips. I see the issue, and will look into fixing it.

@MalachiTimothyPhillips
Author

Closing as #13790 has been merged.

Thank you, @iyamazaki!
