Solve segmentation faults for FULLMG_CYCLE #1362
- In CTurbSSTSolver.cpp, consider the FULLMG_CYCLE scenario
- In CIntegration.hpp, fix the description of the variable "Convergence_FullMG"
You should be able to use CConfig::SetFinestMesh in CDriver::Geometrical_Preprocessing_FVM to initialize this to the coarsest grid that was produced. The threshold to stop agglomeration should stay, but you can expose it as a config option if you want.
By the way, thank you for fixing this 👍. The reason the agglomeration stops earlier in parallel than in serial is that the algorithm is less effective there (the shape of the MPI partitions is not very nice).
@suargi Just a service note on the hybrid_regression_AD.py reg tests that fail: They seem to sometimes fail due to mood swings or idk. So if you Re-run them in the |
Your suggestion is also valid, but in my opinion it would be more coherent, in terms of code structure, to reduce the finest mesh level whenever we modify the number of multigrid levels. So in CMultiGridGeometry::CMultiGridGeometry add this line of code
|
Thank you @TobiKattmann. I will keep it in mind!
In that case you can modify CConfig::SetMGLevels to also set the FinestLevel. That way it is always up to date, even if we use it somewhere else.
Btw, please change one of the regressions to cover this feature.
That sounds great. After introducing those fixes I can definitely create a regression test to cover the full multigrid feature. Nevertheless, there is still the segmentation fault problem when using mpirun. I have to delve into that.
Co-authored-by: Guillermo Suarez <[email protected]>
Commit a458251 deals with the second issue as proposed by Pedro (#1362 (comment)).
Let me recall the other sub-issues reported by @suargi: Third issue
I was able to partially reproduce this with @suargi's help, by modifying Edit: My version are Fourth issue
I had a look at the following part of the output:
So I think that MGLEVEL=6 is too high, it makes the coarsest mesh have too few points. We should choose e.g. MGLEVEL=4 for testing. The |
In CConfig::SetPostprocessing there is: if (Restart) MGCycle = V_CYCLE; I don't know what the purpose of that is, or whether we should warn the user when we deviate from what the cfg file asks for.
I can see why FULL MG would not be useful for restarts, but ruling out W_CYCLE seems unnecessary.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is still a relevant issue please comment on it to restart the discussion. Thank you for your contributions. |
Do we need much more work on this PR?
The following tasks have been finished:
The following optional tasks remain:
FULLMG_CYCLE produces segmentation faults in different ways:
This PR tries to fix these segmentation faults.
Proposed Changes
First issue
In CTurbSSTSolver::CTurbSSTSolver(), neither the residual nor the solution is initialized when a FULLMG_CYCLE is used, which produces a segmentation fault. I have added a line of code to handle the FULLMG_CYCLE scenario.
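To make the first issue concrete, here is a minimal, self-contained sketch of the kind of guard involved. The names ToyTurbSolver, IsInitialized and the MGCycle enum are hypothetical stand-ins, not SU2's actual classes, and the assumption that storage is otherwise allocated only on the finest level is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not SU2's real types.
enum class MGCycle { V_CYCLE, W_CYCLE, FULLMG_CYCLE };

struct ToyTurbSolver {
  std::vector<double> solution;
  std::vector<double> residual;

  ToyTurbSolver(std::size_t nPoint, unsigned iMesh, MGCycle cycle) {
    assert(nPoint > 0);
    // Assumption: without full multigrid, only the finest level (iMesh == 0)
    // needs its own storage. Full multigrid starts iterating on the coarse
    // levels, so every level must be allocated in that case.
    const bool allocate = (iMesh == 0) || (cycle == MGCycle::FULLMG_CYCLE);
    if (allocate) {
      solution.assign(nPoint, 0.0);
      residual.assign(nPoint, 0.0);
    }
  }

  // True when both arrays can be indexed safely.
  bool IsInitialized() const { return !solution.empty() && !residual.empty(); }
};
```

With only the `iMesh == 0` test, a coarse-level solver under FULLMG_CYCLE would be left with empty arrays, and the first access to them is the kind of out-of-bounds read that shows up as a segmentation fault.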
Second issue
This issue is not related to MPI per se but to domain partitioning.
In CMultiGridGeometry::CMultiGridGeometry(), the ratio between the number of points in the finest grid and in a given coarse grid level is computed. If this ratio is below 2.5 (I do not know why we make this evaluation, nor why the threshold is exactly 2.5), a multigrid level is removed without warning the user; see lines 629-632. For the few cases I have tested, certain grid levels have a ratio below 2.5 when running in parallel and are therefore removed. This does not happen when running in serial.
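The effect of such a threshold can be sketched as follows. This is an illustration, not the code at the lines cited above: CountUsableCoarseLevels is a hypothetical helper, and it assumes the ratio is taken between each coarse level and the level it was agglomerated from, which may differ from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. pointsPerLevel[0] is the finest grid and each later
// entry is the point count of the next coarser level. Returns how many coarse
// levels survive the coarsening-ratio check.
unsigned CountUsableCoarseLevels(const std::vector<std::size_t>& pointsPerLevel,
                                 double minRatio = 2.5) {
  assert(!pointsPerLevel.empty() && pointsPerLevel[0] > 0);
  unsigned kept = 0;
  for (std::size_t i = 1; i < pointsPerLevel.size(); ++i) {
    if (pointsPerLevel[i] == 0) break;  // nothing left to agglomerate
    // Assumption: compare against the parent (next finer) level.
    const double ratio = static_cast<double>(pointsPerLevel[i - 1]) /
                         static_cast<double>(pointsPerLevel[i]);
    if (ratio < minRatio) break;  // agglomeration too weak: drop this and all coarser levels
    ++kept;
  }
  return kept;
}
```

For example, point counts of 1000, 300, 150, 50 keep only one coarse level, because the second coarsening step has a ratio of 2.0. A partitioned run that agglomerates less effectively can therefore end up with fewer levels than the config file requested.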
In CMultiGridIntegration::MultiGrid_Iteration(), the index of the "finest grid" is then required, and the system of equations is meant to be solved at that "finest grid" level. By default, this index is set to the number of multigrid levels specified by the user in the config file. If multigrid levels have been removed, the "finest level" mesh that CMultiGridIntegration::MultiGrid_Iteration() tries to access does not exist, which produces a segmentation fault.
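A small sketch of the mismatch, and of one way to keep the two values consistent, is given below. ToyConfig and SolutionAtFinestMesh are hypothetical and only mimic the idea of updating the starting level whenever the number of levels changes; they are not SU2's CConfig API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not SU2's real types.
enum class MGCycle { V_CYCLE, W_CYCLE, FULLMG_CYCLE };

struct ToyConfig {
  MGCycle cycle = MGCycle::V_CYCLE;
  unsigned nMGLevels = 0;   // number of coarse levels actually available
  unsigned finestMesh = 0;  // level on which the solve currently takes place

  // Assumption: full multigrid starts on the coarsest level, so the starting
  // index must follow every change to the number of levels.
  void SetMGLevels(unsigned n) {
    nMGLevels = n;
    finestMesh = (cycle == MGCycle::FULLMG_CYCLE) ? n : 0u;
  }
};

// levels.size() equals nMGLevels + 1 when config and geometry agree.
std::vector<double>& SolutionAtFinestMesh(std::vector<std::vector<double>>& levels,
                                          const ToyConfig& config) {
  // A stale index (e.g. 3 when only levels 0..2 exist) would read past the end.
  assert(config.finestMesh < levels.size());
  return levels[config.finestMesh];
}
```

If the geometry constructor reduces the level count from 3 to 2 and calls only a setter that leaves finestMesh untouched, the index still points at level 3, which is the out-of-range access described above.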
To solve this issue:
What are your thoughts?
After solving the previous issues, SU2 still produces a segmentation fault with Full MG when launched with mpirun, depending on the number of cores used. The problem is not present when using mpiexec. I traced it back and found that in CFVMFlowSolverBase<V, FlowRegime>::Friction_Forces the method GetNormal_Neighbor() from CVertex returns a nonexistent point. Before I delve into the problem, it would be nice if you could corroborate these findings. My versions of mpirun and mpiexec are 3.1.3 and 3.3, respectively.
Additional Work
Additionally, I have corrected the description of the variable Convergence_FullMG in CIntegration.hpp. Related to this variable: what is the definition of convergence for the full multigrid?
Furthermore, Convergence_FullMG is set to false in CIntegration::CIntegration() and never updated, i.e., there is no evaluation of whether the full multigrid has converged. Consequently, the function SetProlongated_Solution() in CMultiGridIntegration::MultiGrid_Iteration() is never executed.
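The consequence of a flag that is never updated can be sketched like this. ToyFullMG and its members are hypothetical, and the residual-tolerance criterion is only a placeholder assumption, since the definition of full multigrid convergence is exactly the open question here.

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; not SU2's real class.
struct ToyFullMG {
  unsigned finestMesh;             // current level; starts at the coarsest
  bool convergenceFullMG = false;  // mirrors a flag initialized to false
  unsigned prolongations = 0;      // counts solution transfers to a finer level

  explicit ToyFullMG(unsigned coarsestLevel) : finestMesh(coarsestLevel) {}

  // Placeholder criterion (assumption): converged once the residual drops
  // below a tolerance on the current level.
  void UpdateConvergence(double residual, double tolerance) {
    assert(tolerance > 0.0);
    convergenceFullMG = residual < tolerance;
  }

  // One outer iteration: move to the next finer level only when flagged.
  void Iterate() {
    if (convergenceFullMG && finestMesh > 0) {
      ++prolongations;  // stands in for the prolongation of the solution
      --finestMesh;
      convergenceFullMG = false;  // the new level has not converged yet
    }
  }
};
```

Without any call that sets the flag, Iterate() can run indefinitely and the prolongation branch is never taken, which matches the behavior described above.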
PR Checklist