Failed run once setting up files for the spinup run #24

Closed
FariborzDaneshvar-NOAA opened this issue Aug 18, 2023 · 10 comments

@FariborzDaneshvar-NOAA
Collaborator

FariborzDaneshvar-NOAA commented Aug 18, 2023

A test run for Dorian 2019 (with the OFCL track) got stuck at the "Setting up the model ..." step, and the runs did not launch (DependencyNeverSatisfied). Here is the content of the slurm.out file for the failed step; note the message "The GAHM asymmetric data structure has more than 4 iSotachs in cycle 59."

slurmstepd: error: TMPDIR [/lustre/.tmp] is not writeable
slurmstepd: error: Setting TMPDIR to /tmp
+ pushd /lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d/setup/ensemble.dir/spinup
/lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d/setup/ensemble.dir/spinup ~/ondemand-storm-workflow/singularity/scripts
+ mkdir -p outputs
+ mpirun -np 36 singularity exec --bind /lustre /lustre/singularity_images//solve.sif pschism_PAHM_TVD-VL 4
 
---------- MODEL PARAMETERS ----------
   title                = 
   bestTrackFileName(1) = hurricane-track.dat
   meshFileType         = 
   meshFileName         = 
   meshFileForm         = 
 
   gravity              = 9.81000 m/s^2
   rhoWater             = 1000.00000 kg/m^3
   rhoAir               = 1.14780 kg/m^3
   backgroundAtmPress   = 1013.25000 mbar
   windReduction        = 0.90
 
   refDateTime          = 
   refYear              = 2019
   refMonth             = 08
   refDay               = 22
   refHour              = 12
   refMin               = 00
   refSec               = 00
   refDateSpecified     = T
 
   begDateTime          = 
   begYear              = 2019
   begMonth             = 08
   begDay               = 22
   begHour              = 12
   begMin               = 00
   begSec               = 00
   begDateSpecified     = T
 
   endDateTime          = 5000-01-01 00:00:00
   endYear              = 5000
   endMonth             = 01
   endDay               = 01
   endHour              = 00
   endMin               = 00
   endSec               = 00
   endDateSpecified     = T
 
   unitTime             = S
   outDT                = -999999.00000 s
   mdOutDT              = -999999.00000 s
   begSimTime           = 0.00000 s
   mdBegSimTime         = 0.00000 s
   begSimSpecified      = T
   endSimTime           = 94051108800.00000 s
   mdEndSimTime         = 94051108800.00000 s
   endSimSpecified      = T
   nOutDT               = -999999
 
   outFileName          = 
   ncShuffle            = 0
   ncDeflate            = 0
   ncDLevel             = 0
   ncVarNam_Pres        = P
   ncVarNam_WndX        = uwnd
   ncVarNam_WndY        = vwnd
 
   modelType            =         10
---------- MODEL PARAMETERS ----------
 
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called ::                                                   : The GAHM asymmetric data structure has more than 4 iSotachs in cycle 59.
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 0 on
node sorooshmani-nhccolab2-00005-1-0001 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.
--------------------------------------------------------------------------

Run directory on NHC_COLAB_2 cluster: /lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d

@FariborzDaneshvar-NOAA
Collaborator Author

Here are the first few lines of the hurricane track files:

  • Downloaded from stormevent (/lustre/hurricanes/dorian_2019_.../nhc_track/hurricane-track.dat):
AL, 05, 2019082412, 03, OFCL,   0, 103N,  474W,  30, 1010, TD,  34, NEQ,    0,    0,    0,    0, 1014,    0,  30,  35,   0,    ,   0, SRS,   0,   0,           ,,,,,,,,
AL, 05, 2019082418, 03, OFCL,   0, 106N,  486W,  30, 1009, TD,  34, NEQ,    0,    0,    0,    0, 1014,    0,  30,  40,   0,    ,   0, SRS, 284,   6,           ,,,,,,,,
AL, 05, 2019082500, 03, OFCL,   0, 108N,  499W,  35, 1008, TS,  34, NEQ,   20,    0,    0,   20, 1014,    0,  10,  45,   0,    ,   0, DAZ, 279,   7,           ,,,,,,,,
AL, 05, 2019082506, 03, OFCL,   0, 109N,  510W,  35, 1008, TS,  34, NEQ,   20,    0,    0,   20, 1014,    0,  10,  45,   0,    ,   0, ESB, 275,   6,           ,,,,,,,,
AL, 05, 2019082512, 03, OFCL,   0, 111N,  523W,  35, 1008, TS,  34, NEQ,   20,    0,    0,   20, 1014,    0,  10,  45,   0,    ,   0, JLB, 279,   7,           ,,,,,,,,
AL, 05, 2019082518, 03, OFCL,   0, 114N,  535W,  40, 1006, TS,  34, NEQ,   30,    0,    0,   30, 1015,    0,  10,  50,   0,    ,   0, JLB, 284,   6,           ,,,,,,,,
AL, 05, 2019082600, 03, OFCL,   0, 116N,  547W,  45, 1003, TS,  34, NEQ,   40,   30,   20,   40, 1012,    0,  10,  55,   0,    ,   0, RJP, 280,   6,           ,,,,,,,,
AL, 05, 2019082606, 03, OFCL,   0, 118N,  558W,  50, 1002, TS,  34, NEQ,   40,   30,   20,   40, 1012,    0,  10,  60,   0,    ,   0, JPC, 281,   6,           ,,,,,,,,
AL, 05, 2019082606, 03, OFCL,   0, 118N,  558W,  50, 1002, TS,  50, NEQ,   20,    0,    0,    0, 1012,    0,  10,  60,   0,    ,   0, JPC, 281,   6,           ,,,,,,,,
AL, 05, 2019082612, 03, OFCL,   0, 121N,  571W,  50, 1002, TS,  34, NEQ,   40,   30,   20,   40, 1012,    0,  10,  60,   0,    ,   0, SRS, 283,   7,           ,,,,,,,,
  • In the spinup directory (/lustre/hurricanes/dorian_2019_.../setup/ensemble.dir/spinup/hurricane-track.dat):
AL, 05, 2019082406,   , BEST,   0, 103N,  464W,  25, 1011, TD,  34, NEQ,    0,    0,    0,    0, 1015,  130,  30,  35,   0,   L,   0,    ,   0,   0,     INVEST,,,,,,,,
AL, 05, 2019082412,   , BEST,   0, 104N,  475W,  30, 1010, TD,  34, NEQ,    0,    0,    0,    0, 1014,  120,  30,   0,   0,   L,   0,    , 275,   6,       FIVE,,,,,,,,
AL, 05, 2019082418,   , BEST,   0, 106N,  487W,  35, 1008, TS,  34, NEQ,   30,    0,    0,   30, 1014,  120,  30,  40,   0,   L,   0,    , 280,   6,       FIVE,,,,,,,,
AL, 05, 2019082500,   , BEST,   0, 108N,  499W,  35, 1008, TS,  34, NEQ,   30,    0,    0,   30, 1014,  120,  30,  45,   0,   L,   0,    , 280,   6,     DORIAN,,,,,,,,
AL, 05, 2019082506,   , BEST,   0, 110N,  510W,  35, 1008, TS,  34, NEQ,   30,    0,    0,   30, 1014,  120,  30,  45,   0,   L,   0,    , 281,   6,     DORIAN,,,,,,,,
AL, 05, 2019082512,   , BEST,   0, 112N,  523W,  40, 1007, TS,  34, NEQ,   30,    0,    0,   30, 1013,  120,  30,  45,   0,   L,   0,    , 279,   7,     DORIAN,,,,,,,,
AL, 05, 2019082518,   , BEST,   0, 114N,  535W,  45, 1007, TS,  34, NEQ,   30,    0,    0,   30, 1014,  120,  30,  50,   0,   L,   0,    , 280,   6,     DORIAN,,,,,,,,
AL, 05, 2019082600,   , BEST,   0, 116N,  547W,  45, 1007, TS,  34, NEQ,   40,   30,   30,   40, 1012,  120,  30,  55,   0,   L,   0,    , 280,   6,     DORIAN,,,,,,,,
AL, 05, 2019082606,   , BEST,   0, 119N,  560W,  45, 1006, TS,  34, NEQ,   40,   30,   30,   40, 1012,  120,  30,  60,   0,   L,   0,    , 283,   7,     DORIAN,,,,,,,,
AL, 05, 2019082612,   , BEST,   0, 122N,  572W,  45, 1006, TS,  34, NEQ,   40,   30,   30,   40, 1012,  120,  30,  60,   0,   L,   0,    , 284,   6,     DORIAN,,,,,,,,
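To check a track file for the condition GAHM reports (more than 4 isotachs in one cycle), a small script can group ATCF records by cycle and list the isotach thresholds (the RAD column) that each cycle carries. This is a hypothetical stdlib sketch, not part of the workflow: it assumes the standard ATCF column order (date in column 3, TAU in column 6, RAD in column 12, counting from 1) and treats a cycle as a (date, TAU) pair, which may not match how PaHM numbers its cycles. Several rows per cycle are normal (for example, the 34 kt and 50 kt rows at 2019082606 in the OFCL file above); a repeated threshold or more than four rows is what gets flagged.

```python
from collections import defaultdict


def parse_atcf_line(line):
    """Split one ATCF record into whitespace-stripped, comma-separated fields."""
    return [field.strip() for field in line.split(",")]


def isotach_counts(lines):
    """Map each (date, TAU) cycle to the isotach thresholds (RAD column) it lists."""
    cycles = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = parse_atcf_line(line)
        # 0-based indices: 2 = YYYYMMDDHH, 5 = TAU, 11 = RAD (34/50/64 kt isotach)
        key = (fields[2], int(fields[5]))
        cycles[key].append(int(fields[11]))
    return dict(cycles)


def suspicious_cycles(lines, max_isotachs=4):
    """Return cycles that repeat an isotach threshold or exceed max_isotachs rows."""
    flagged = {}
    for key, rads in isotach_counts(lines).items():
        if len(rads) > max_isotachs or len(set(rads)) < len(rads):
            flagged[key] = rads
    return flagged


if __name__ == "__main__":
    # Hypothetical usage: pass the lines of hurricane-track.dat
    import sys

    with open(sys.argv[1]) as track_file:
        for cycle, rads in suspicious_cycles(track_file.readlines()).items():
            print(cycle, rads)
```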

@SorooshMani-NOAA SorooshMani-NOAA self-assigned this Aug 21, 2023
@SorooshMani-NOAA
Collaborator

The issue is that in the spinup case, where we are not supposed to have any track, the workflow somehow picks up the best track and uses it. The duplication problem is actually in the best track file downloaded directly from the ATCF webpage. I need to find out why the best track is used and get rid of it.

@FariborzDaneshvar-NOAA
Collaborator Author

@SorooshMani-NOAA Thanks for looking into it.

@SorooshMani-NOAA
Collaborator

@FariborzDaneshvar-NOAA this issue should now be fixed for OFCL. I'm still working on removing the duplicates from the best track. If you rerun the workflow for OFCL (past forecast), it should work fine. I tested a 7-member ensemble for Dorian 2019 and it went through (the spinup run was successful).
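One way to remove such duplicates, assuming they are repeated records for the same cycle and isotach threshold, is to keep only the first record for each (date, TAU, RAD) key while preserving the file order. The function below is a hypothetical sketch of that idea; the actual fix in the workflow is not shown in this thread.

```python
def drop_duplicate_records(lines):
    """Keep only the first ATCF record per (date, TAU, RAD) key, preserving order.

    Assumption: a "duplicate" is any later record that repeats the date,
    forecast hour (TAU) and isotach threshold (RAD) of an earlier one.
    """
    seen = set()
    kept = []
    for line in lines:
        if not line.strip():
            continue  # drop blank lines
        fields = [field.strip() for field in line.split(",")]
        # 0-based indices: 2 = YYYYMMDDHH, 5 = TAU, 11 = RAD
        key = (fields[2], fields[5], fields[11])
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

Keeping the first occurrence is an arbitrary choice; if later records are corrections, keeping the last one per key may be more appropriate.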

@SorooshMani-NOAA
Collaborator

@saeed-moghimi-noaa When trying Dorian 2019 with the workflow, I noticed an issue. To decide when to start perturbing the track, the workflow needs a rough estimate of the landfall time, so I take the US shapefile and intersect it with the track. For the Dorian best track, the track doesn't seem to intersect the US shape at all, so I was wondering if you have any suggestions for improving the logic.

One option is to perturb before landfall on any country. I didn't do that because a storm in the Gulf sometimes makes landfall on another country and then again on the US coast, and we'd like to perturb before the US landfall. Please let me know what you think.
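The landfall estimate described in this comment can be sketched with a ray-casting point-in-polygon test: walk the track fixes in time order and return the first one that falls inside the land polygon. This is a stdlib-only illustration under assumptions: the workflow intersects the track with a US shapefile, whereas here a hypothetical rectangle stands in for the coastline and only the fixes themselves are tested. A track that never enters the polygon returns None, which is the situation reported for the Dorian best track.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside polygon, a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a ray cast eastward from the point cross the edge (x1, y1)-(x2, y2)?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def first_fix_over_land(track, land_polygon):
    """Return the time of the first (time, lon, lat) fix inside land_polygon, or None."""
    for time, lon, lat in track:
        if point_in_polygon(lon, lat, land_polygon):
            return time
    return None


if __name__ == "__main__":
    # Hypothetical land box and track, for illustration only
    land = [(-82.0, 25.0), (-80.0, 25.0), (-80.0, 30.0), (-82.0, 30.0)]
    track = [("t0", -78.0, 26.0), ("t1", -79.5, 26.5), ("t2", -80.5, 27.0)]
    print(first_fix_over_land(track, land))
```

Because only the fixes are tested, a landfall that happens between two fixes is missed; intersecting each track segment with the polygon boundary would be more robust.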

@SorooshMani-NOAA
Collaborator

@FariborzDaneshvar-NOAA I added a temporary fix for this. Both the best track and the official track should now work for all storms (including Ian and Dorian); please let me know if you notice any issues.

@saeed-moghimi-noaa
Collaborator

Hi @SorooshMani-NOAA, please discuss this with our friends from NHC; perhaps they have a specific way of handling it. Thanks.

@SorooshMani-NOAA
Collaborator

SorooshMani-NOAA commented Aug 24, 2023

I asked this question in the NHC meeting, and they said that they use a subjective approach. In cases where there is no actual landfall (Marco 2020, Dorian 2019, ...), they take the point of closest approach as the landfall and then calculate the perturbation location. For the 25 storms used for skill assessment (https://github.com/saeed-moghimi-noaa/Next-generation-psurge-tasks/issues/14), there is a fixed table of lead times.

We can take this table and put it in our workflow when we run these storms.
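The fallback NHC described, using the point of closest approach when there is no actual landfall, could be approximated objectively by picking the track fix with the smallest great-circle distance to the coastline. The sketch below is hypothetical: the coastline is assumed to be a list of (lon, lat) vertices, distances use the haversine formula on a spherical Earth (radius 6371 km), and only fix-to-vertex distances are considered. It does not reproduce NHC's subjective choice or the fixed lead-time table.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_approach(track, coastline):
    """Return (time, distance_km) of the (time, lon, lat) fix nearest any coastline vertex."""
    best_time, best_dist = None, math.inf
    for time, lon, lat in track:
        dist = min(haversine_km(lon, lat, clon, clat) for clon, clat in coastline)
        if dist < best_dist:
            best_time, best_dist = time, dist
    return best_time, best_dist
```

In the workflow this would only be needed when the landfall search finds no intersection; the returned time could then serve as the reference for when to start perturbing.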

@FariborzDaneshvar-NOAA
Collaborator Author

FariborzDaneshvar-NOAA commented Aug 24, 2023

@SorooshMani-NOAA Thanks for the fix. New runs of Ian and Dorian with the OFCL track completed successfully. The only problem was a memory issue during the post-processing step for Dorian, which I documented here: Memory issue for combining results #111

@SorooshMani-NOAA
Collaborator

Since the spinup issues are resolved, I'm closing this ticket.


3 participants