Framework: Some PR build errors showing up as strange 'cmake --build . --config Release -- -j29 -k 0' errors #10823
Comments
CC: @tcclevenger And this error just took out a PR testing iteration for PR #10751 shown in the build: |
CC: @fryeguy52 FYI: I searched the Trilinos 'develop' branch as of commit 1cd5bae:
and I searched for the code that generates this command:
by running:
The closest match above is:
That name
But looking at the line from
Well, that does not match the signature:
It seems that file So I am stumped as to how this command is getting run as part of Trilinos PR testing. I will see if I can reproduce these errors myself (word is that we should be able to, which I will try now). |
CC: @csiefer2, @e10harvey NOTE: The builds that show these errors are similar to those that show "6" build errors reported in #10836 in that they show zero tests "Not Run", "Fail" and "Pass". These are a little harder to search for on CDash but this query seems to select them. Looking over this set of builds, we see different types of errors reported for the command:
These look like real build errors in Trilinos but they are not being reported correctly with each package. Instead, they are just reported for the outer
Here are some examples of different build errors reported:
1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h:
NOTE: All of the examples above are from the builds
2. MueLu_Test_ETI.hpp ISO C++ forbids declaration of ‘type name’ with no type:
NOTE: All of the examples above are from the builds
3. ninja: error: loading 'build.ninja': No such file or directory:
4. No error output: |
Note, we see the error: 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h above being cleanly reported on the 'vortex' builds with the Thyra package shown here impacting PRs #10834, #10802, #10801, and #10751. What I think is happening is that the same build error for the 'ascic' builds with the 'gnu-7.2.0' and 'gnu-8.3.0' builds is getting reported through the command We need to see if we can reproduce this build error on one of the 'ascic' builds locally. |
FYI: I am trying to reproduce the 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h error for the build:
on the machine 'hpws055'. |
FYI: I tried to reproduce the build error 1.EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h for the build:
from the machine 'hpws055' and I was not successful in doing so. All of Thyra built just fine, including the executable
It appears you can't reproduce Trilinos PR builds on HPWS machines at SNL :-( I will try reproducing on a real 'ascicgpu' machine.
Attempt to reproduce 'EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h' build error with 'rhel7_sems-gnu-7.2.0-openmpi-1.10.1' build on 'hpws055'
Details:
Trying to reproduce the build error "EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h" on the machine 'hpws055'. The repo version is:
Doing the configure, build, and test with:
Well, everything built but I got a bunch of test failures. It seems the problem is:
Hum, it seems you can't reproduce Trilinos PR build test results from an HPWS machine :-( |
FYI: I tried to reproduce the build error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h for the build:
from the machine 'ascicgpu17' and I was not successful in doing so. All of Thyra built just fine, including the executable
Attempt to reproduce 'EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h' build error with 'rhel7_sems-gnu-7.2.0-openmpi-1.10.1' build on 'ascicgpu17'
Details:
Trying to reproduce the build error "EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h" on the machine 'ascicgpu17'. The repo version is:
Doing the configure, build, and test with:
|
FYI: There is independent confirmation in new issue #10842 of the error 1. EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h. I will move my analysis of this error over to that issue. NOTE: My current hypothesis is that an older version of Trilinos from a couple of weeks ago showed this error but has since been fixed on 'develop'. I will test that hypothesis out and document findings in #10842. |
FYI: There is another clue in #10842 (comment). It seems that you might see the error 1. EpetraOperatorWrapper_UnitTests.cpp missing Trilinos_Util_CrsMatrixGallery.h when running out of disk space. |
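Since low disk space is one suspected trigger, a quick free-space check before starting a build can help rule it out. This is only a minimal sketch: the function name `has_enough_space` and the 20 GiB default threshold are assumptions made for illustration, not anything taken from the Trilinos PR drivers.

```python
import shutil


def has_enough_space(path=".", min_free_gib=20.0):
    """Return True if the filesystem holding `path` has at least min_free_gib GiB free.

    The 20 GiB default is an arbitrary illustrative threshold, not a measured
    requirement of any Trilinos build.
    """
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= min_free_gib
```

Calling `has_enough_space("/path/to/build/dir")` on the build directory before configuring would flag a nearly full disk early, so a missing-header error is less likely to be misread as a real source problem.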
FYI: I made my last very careful effort to reproduce the EpetraOperatorWrapper_UnitTests.cpp cannot open Trilinos_Util_CrsMatrixGallery.h error in #10842 (comment) for the 'vortex' build for PR #10808 and I was not able to do so (i.e. it passed the build). |
Note that this issue is also tracking what was reported in #10906. "In some PR testing, compile failures are erroneously showing up under the subproject Zoltan2Sphyx." |
FYI: Still no XML files being archived in the Jenkins jobs to allow us to debug what is causing this behavior. See TRILINOSHD-188. |
FYI: We are still seeing a bunch of these cases where errors are reported to Zoltan2Sphynx as seen here over the last 2 days with 7 PR iterations showing failures: |
FYI: See #10836 (comment) and #10836 (comment). |
CC: @e10harvey, @zackgalbreath FYI: The problem of reporting the global which shows a build error in the example object file:
Why is that build error not being reported along with the Compadre? The Build.xml file archived in: shown here is given below. What is strange about these two build errors is that they are for the same Compadre build error:
and the Build.xml file shows two entries for the same build error. It is almost like the ctest -S process is running the build twice: once with launchers turned on and a follow up build with launchers turned off. The second failure for the global cmake --build command entry in the XML file shows:
This is so strange. |
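To spot this kind of double reporting without reading the XML by hand, one could group the failure entries in a Build.xml file by their error text. The sketch below assumes the file holds `Failure` elements with `Action/TargetName` and `Result/StdErr` children; the embedded sample is hypothetical data written for this example, not the archived file discussed above, and real files may be laid out differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed sample; a real Build.xml has many more fields.
SAMPLE_BUILD_XML = """\
<Site>
  <Build>
    <Failure type="Error">
      <Action><TargetName>example_target</TargetName></Action>
      <Result><StdErr>example.cpp:1: error: something failed</StdErr></Result>
    </Failure>
    <Failure type="Error">
      <Action></Action>
      <Result><StdErr>example.cpp:1: error: something failed</StdErr></Result>
    </Failure>
  </Build>
</Site>
"""


def duplicated_failures(xml_text):
    """Map each stderr text that appears in more than one Failure to its targets."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for failure in root.iter("Failure"):
        stderr = (failure.findtext("Result/StdErr") or "").strip()
        # Entries with no target name stand in for the global build command.
        target = failure.findtext("Action/TargetName") or "<no target>"
        groups[stderr].append(target)
    return {err: targets for err, targets in groups.items() if len(targets) > 1}


if __name__ == "__main__":
    for err, targets in duplicated_failures(SAMPLE_BUILD_XML).items():
        print(f"{len(targets)} entries for: {err} -> {targets}")
```

Any error text that shows up once with a target and once without one is a candidate for the duplicate entry described above.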
FYI: The behavior described above turns out to be a CTest defect. For details and to follow the fix, see: Unfortunately, I think that means we will need to upgrade CMake/CTest on all client machines to fix this which will require waiting for CMake 3.25.0 in Jan 2023 (or perhaps a patch release of CMake 3.24). Update: The fix is going to come out in CMake 3.24.3! |
FYI: The fix for this is in CMake 3.24.3 (released 2022-11-01) . (See SNL Kitware #209). Next: Install CMake 3.24.3 everywhere and use with Trilinos PR builds ... |
With the upgrade of CMake 3.24.3 for all of the Trilinos PR builds yesterday, this should be resolved (see TRILINOSHD-228). For example, we are only seeing build errors for actual targets in the PR builds over the last day shown here and we see just the build error for the target: Using a version of CMake between versions 3.19 and 3.24.2 (inclusive), we would have seen that same error showing up along with the entire Closing this as complete. Boy, that was a hard one to diagnose. But the fact that Kitware was willing to patch CMake 3.24.3, SEMS was willing to install CMake 3.24.3, and the Trilinos Framework team was willing and able to upgrade all of the PR builds is what allowed this to be fixed relatively quickly. |
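For anyone checking whether a client machine still runs an affected CMake, the version range can be tested mechanically. This is a rough sketch: it takes 3.19 as the first affected version from the comment above and 3.24.3 as the first fixed release, and the lower bound is left as a parameter because the issue summary names 3.18 as the release that introduced the defect. The function names are made up for this example.

```python
import re

FIRST_AFFECTED = (3, 19, 0)  # per the comment above; the issue summary says 3.18
FIRST_FIXED = (3, 24, 3)     # release that carries the CTest fix


def parse_cmake_version(text):
    """Extract (major, minor, patch) from text such as 'cmake version 3.24.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text, first_affected=FIRST_AFFECTED):
    """Return True if the version falls in [first_affected, FIRST_FIXED)."""
    version = parse_cmake_version(version_text)
    return first_affected <= version < FIRST_FIXED
```

Feeding it the first line printed by `cmake --version` on each build machine gives a quick inventory of which clients still need the upgrade.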
Bug Report
@trilinos/framework
Next Action Status
This is due to a defect in CTest introduced in CMake 3.18. The fix for this is in CMake 3.24.3 (released 2022-11-01) . (See SNL Kitware #209). Next: Install CMake 3.24.3 everywhere and use with Trilinos PR builds ...
Internal issues
Description
As shown in this query showing:
the new Trilinos Framework GenConfig build
rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
is failing with a build error reported in the Zoltan2Sphynx package showing: returning error code 1.
As you can see, this is currently failing in the "Master Merge" builds for promotion PRs #10820 and #10797, so this error has nothing to do with a given PR branch; it is impacting 'develop' and will impact everyone's PRs. The reason I saw it is that it took out my last PR iteration #10813 (comment) for PR #10813.
Steps to Reproduce
Run a PR build.