New asici0xy nodes brought online 2022-08-16 in PR testing missing system-installed BLAS and LAPACK #10893

Closed
bartlettroscoe opened this issue Aug 17, 2022 · 4 comments
Labels:
  • Framework tasks (used internally by Framework team)
  • PA: Framework (Issues that fall under the Trilinos Framework Product Area)
  • stage: in review (Primary work is completed and now is just waiting for human review and/or test feedback)

Comments

bartlettroscoe (Member) commented Aug 17, 2022

CC: @e10harvey, @fryeguy52, @srbdev, @rppawlo, @jhux2, @csiefer2

Internal issues

Description

As shown in this query, the new 'ascic0xy' nodes, which seem to have come online just today (2022-08-16), appear to be broken for the GCC builds:

  • rhel7_sems-gnu-7.2.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
  • rhel7_sems-clang-10.0.0-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
  • rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables

These are showing the global build error:

ninja: error: '/usr/lib64/liblapack.so', needed by 'packages/kokkos-kernels/perf_test/batched/KokkosKernels_KokkosBatched_Test_BlockJacobi.exe', missing and no known rule to make it

And the clang build:

  • rhel7_sems-clang-10.0.0-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables

likely has similar build errors, since it has hundreds of not-run tests, but those build errors are not being reported to CDash for some reason.

It seems the intel-17 builds are not having problems because they use -mkl for BLAS and LAPACK.

This is obviously bringing down PR builds. (I am never going to get PR #10808 merged at this rate.)
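A missing system library like this could, in principle, be caught before a node is added to the PR pool. The sketch below is a hypothetical pre-flight check, not part of Trilinos or its PR tooling. It assumes the GCC and clang builds need `/usr/lib64/liblapack.so` (the path in the ninja error above) and `/usr/lib64/libblas.so` (an assumed path). The function name `missing_libs` is illustrative only.

```python
from pathlib import Path

# Hypothetical list of system libraries the non-MKL PR builds link against.
# /usr/lib64/liblapack.so comes from the ninja error above; the libblas.so
# path is an assumption.
REQUIRED_LIBS = (
    "/usr/lib64/libblas.so",
    "/usr/lib64/liblapack.so",
)


def missing_libs(paths=REQUIRED_LIBS, root="/"):
    """Return the entries of `paths` that do not exist under `root`.

    `root` lets the check run against a mounted node image or a test
    directory instead of the local filesystem.
    """
    base = Path(root)
    # Strip the leading "/" so each path is resolved relative to `root`.
    return [p for p in paths if not (base / p.lstrip("/")).exists()]
```

On a node where both files are present, `missing_libs()` returns an empty list; a non-empty result would be a reason to keep the node out of the PR pool until BLAS and LAPACK are installed.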

bartlettroscoe added the Framework tasks and PA: Framework labels Aug 17, 2022
bartlettroscoe changed the title from "New asici0xy nodes brought online 2022-08-16 in PR testing appear to be broken for gcc builds" to "New asici0xy nodes brought online 2022-08-16 in PR testing missing system-installed BLAS and LAPACK" Aug 17, 2022
bartlettroscoe (Member, Author):

"And the build clang build ... likely has similar build errors as well since it has hundreds of not-run tests but those build errors are not being reported to CDash for some reason."

FYI, the reason the clang-10.0.0 builds shown here are not showing any build errors (just lots of "not run" and "failing" tests) is that they use CMake 3.17.1, which predates Kitware's fix to ctest for reporting global build errors (errors not associated with a specific target) to CDash. In contrast, the gnu-7.2.0 and gnu-8.3.0 builds use CMake 3.19.1, which has that fix and therefore shows the global failure:

ninja: error: '/usr/lib64/liblapack.so', needed by 'packages/teuchos/numerics/test/BLAS/TeuchosNumerics_BLAS_ROTG_test.exe', missing and no known rule to make it

bartlettroscoe (Member, Author):

FYI: @e10harvey said these nodes got added by accident to the pool of PR builds and he has taken them down.

Putting this GitHub issue in review.

bartlettroscoe added the stage: in review label Aug 17, 2022

bartlettroscoe (Member, Author):

FYI: I have put AT: RETEST on PR #10808 as these failures were the only ones causing problems.

bartlettroscoe (Member, Author):

As (not) shown in this query, no PR builds have been running on these defective 'ascic0xy' nodes. Therefore, I think it is safe to close this now.
