[bug] MPAS crashing after restart from GETKF analyses #309

Open
SamuelDegelia-NOAA opened this issue Mar 4, 2025 · 3 comments

@SamuelDegelia-NOAA
Contributor

Current behavior

The rrfs-workflow PR #665 added functionality to cycle ensemble DA using GETKF. The GETKF tasks complete successfully, but the fcst step crashes in the 0300 UTC cycle. The error is:

----------------------------------------------------------------------
Beginning MPAS-atmosphere Error Log File for task       1 of      40
    Opened at 2025/03/03 22:26:38
----------------------------------------------------------------------

ERROR: Error in compute_layer_mean: pressure should increase with index

Steps to Reproduce

On Hera, run the workflow using the default 27 May retro case with exp.ens_conus12km and the following settings:

export DO_IODA="true"
export DO_JEDI="true"

Expected behavior

The forecast step should not crash after GETKF analysis.

@SamuelDegelia-NOAA
Contributor Author

SamuelDegelia-NOAA commented Mar 4, 2025

More error information from /scratch2/BMC/zrtrr/Samuel.Degelia/rrfs-workflow-getkf/experiment/stmp/20240527/rrfs_fcst_03_v2.0.9/enkf/mem001/fcst_03/log.atmosphere.0000.out:

 --- subroutine MPAS_to_phys - pressure(1) < pressure(2):
 i      =16
 latCell=40.0822
 lonCell=287.934
 1 16 1 17.9503 99877.4 -99876.5 0.945312 0.310608E-03 287.190 10.5356 0.107672E-01
 1 16 2 20.2955 99616.7 -99609.7 7.00781 0.128606E-02 289.898 18.8474 0.119644E-01
 1 16 3 22.9392 99322.8 -99315.1 7.69531 0.137200E-02 290.819 19.4275 0.119640E-01
 1 16 4 25.9173 98991.8 -98984.5 7.24219 0.131206E-02 291.234 19.1206 0.118730E-01
 1 16 5 29.2696 98619.2 -98612.6 6.57031 0.122087E-02 291.550 18.6055 0.117649E-01
 1 16 6 33.0401 98200.1 -98194.1 6.08594 0.115428E-02 291.943 18.2269 0.116568E-01
 1 16 7 37.2760 97729.4 -97724.0 5.47656 0.106815E-02 292.451 17.7125 0.115488E-01
 1 16 8 42.0294 97201.2 -97196.3 4.96094 0.994358E-03 293.000 17.2578 0.114164E-01
 1 16 9 47.3568 96609.3 -96604.7 4.57812 0.937638E-03 293.602 16.9046 0.112603E-01
 1 16 10 53.3181 95946.9 -95942.4 4.57031 0.935042E-03 294.196 16.9336 0.110838E-01
 1 16 11 59.9785 95207.0 -95202.0 4.92969 0.985420E-03 294.693 17.3322 0.109144E-01
 1 16 12 67.4069 94381.8 -94376.3 5.48438 0.106096E-02 295.121 17.8873 0.107633E-01
 1 16 13 75.6739 93463.5 -93457.8 5.68750 0.108780E-02 295.613 18.1086 0.106243E-01
 1 16 14 84.8527 92443.8 -92438.1 5.68750 0.108543E-02 296.121 18.1362 0.104795E-01
 1 16 15 95.0166 91314.5 -91309.1 5.35938 0.103867E-02 296.684 17.8659 0.102251E-01
 1 16 16 106.238 90067.2 -90062.0 5.28906 0.102692E-02 297.322 17.8358 0.976315E-02
 1 16 17 118.587 88694.1 -88688.1 5.94531 0.111404E-02 298.045 18.4854 0.877078E-02
 1 16 18 132.128 87187.4 -87180.7 6.73438 0.121618E-02 298.851 19.2122 0.755007E-02
 1 16 19 146.916 85540.6 -85533.0 7.60938 0.132289E-02 299.686 19.9411 0.633912E-02
 1 16 20 162.997 83748.1 -83740.6 7.51562 0.130846E-02 300.545 19.9294 0.535421E-02
 1 16 21 180.400 81805.7 -81800.1 5.65625 0.106601E-02 301.367 18.4283 0.483054E-02
 1 16 22 199.135 79711.3 -79711.2 0.125000 0.681103E-04 301.918 6.14951 0.580696E-02
 1 16 23 219.188 77464.9 NaN NaN -0.929689E-03 302.412 NaN 0.700290E-02
 1 16 24 240.521 75069.2 NaN NaN -0.165907E-02 302.874 NaN 0.802890E-02
 1 16 25 263.064 72529.8 NaN NaN -0.209259E-02 303.378 NaN 0.878596E-02
 1 16 26 286.714 69855.5 NaN NaN -0.191919E-02 304.019 NaN 0.901553E-02
 1 16 27 311.335 67058.1 NaN NaN -0.951302E-03 304.954 NaN 0.803168E-02
 1 16 28 336.757 64153.1 NaN NaN -0.662192E-04 306.555 NaN 0.594188E-02

The fifth and sixth columns printed are pressure_base and pressure_p, respectively. The output shows unexpected negative values of pressure_p for the first 22 levels, followed by NaNs from level 23 upward.
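A minimal sketch of the kind of column check that fails here, in Python. The helper `check_column` and its 50000 Pa surface threshold are hypothetical (not taken from MPAS or rrfs-workflow). It assumes, as stated above, that the fifth and sixth log columns are pressure_base and pressure_p, and that full pressure is their sum. With the logged rows, that sum is below 10 Pa at every level where it is finite; level 1 (about 0.9 Pa) is lower than level 2 (about 7.0 Pa), which would be consistent with the pressure(1) < pressure(2) message if MPAS_to_phys compares full pressure.

```python
import math

# (level, pressure_base [Pa], pressure_p [Pa]) for cell i=16, copied from the
# log above. Levels 4-21 are omitted, so only adjacent levels are compared.
ROWS = [
    (1, 99877.4, -99876.5),
    (2, 99616.7, -99609.7),
    (3, 99322.8, -99315.1),
    (22, 79711.3, -79711.2),
    (23, 77464.9, float("nan")),
]


def check_column(rows, min_surface_pa=50000.0):
    """Hypothetical sanity check for one column ordered from the surface up.

    Returns a list of human-readable problem descriptions (empty if OK).
    """
    problems = []
    if not rows:
        return problems
    totals = []
    for level, base, pert in rows:
        total = base + pert  # assumed full pressure = base state + perturbation
        totals.append((level, total))
        if math.isnan(total):
            problems.append(f"level {level}: total pressure is NaN")
        elif total <= 0.0:
            problems.append(f"level {level}: non-positive total pressure {total:.1f} Pa")
    # The lowest level should carry a realistic near-surface pressure.
    first_level, first_total = totals[0]
    if not math.isnan(first_total) and first_total < min_surface_pa:
        problems.append(
            f"level {first_level}: total pressure {first_total:.1f} Pa is implausibly low"
        )
    # Pressure should decrease upward, i.e. with increasing level index.
    for (lev_a, p_a), (lev_b, p_b) in zip(totals, totals[1:]):
        if lev_b != lev_a + 1 or math.isnan(p_a) or math.isnan(p_b):
            continue  # only compare adjacent levels with finite values
        if p_a <= p_b:
            problems.append(
                f"levels {lev_a}->{lev_b}: pressure does not decrease "
                f"({p_a:.1f} <= {p_b:.1f} Pa)"
            )
    return problems


for problem in check_column(ROWS):
    print(problem)
```

A healthy column (positive perturbation, pressure falling with height) returns an empty list, so the same helper can be run over every cell of a file to locate where the bad values start.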

However, the values in mpasin.nc, which is used to initialize MPAS, look reasonable. Here is the first level at the same grid cell:

file = /scratch2/BMC/zrtrr/Samuel.Degelia/rrfs-workflow-getkf/experiment/stmp/20240527/rrfs_fcst_03_v2.0.9/enkf/mem001/fcst_03/mpasin.nc
pressure_p = 1536.5333251953125
pressure_base = 99877.421875

Instead, the bad values appear to come from the mpasout file valid at the forecast initialization time, which is written after MPAS has started. Here are those values:

file = /scratch2/BMC/zrtrr/Samuel.Degelia/rrfs-workflow-getkf/experiment/stmp/20240527/rrfs_fcst_03_v2.0.9/enkf/mem001/fcst_03/mpasout.2024-05-27_03.00.00.nc
pressure_p = -99876.4765625
pressure_base = 99877.421875
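Adding the two fields makes the difference between the files concrete. This is a quick arithmetic sketch in Python, assuming full pressure is pressure_base + pressure_p (the MPAS base-state/perturbation split); the function name `total_pressure_pa` is hypothetical. With the first-level values above, mpasin.nc gives about 101414 Pa (roughly 1014 hPa, a plausible surface pressure), while the mpasout file gives about 0.9 Pa.

```python
# First-level values at the same grid cell, copied from the two files above.
PRESSURE_BASE = 99877.421875  # same value in mpasin.nc and mpasout
PRESSURE_P = {
    "mpasin.nc": 1536.5333251953125,
    "mpasout.2024-05-27_03.00.00.nc": -99876.4765625,
}


def total_pressure_pa(base_pa, pert_pa):
    """Assumed full pressure (Pa) from base state plus perturbation."""
    return base_pa + pert_pa


for name, pert in PRESSURE_P.items():
    print(f"{name}: {total_pressure_pa(PRESSURE_BASE, pert):.1f} Pa")
```

Because pressure_base is identical in both files, the entire discrepancy sits in pressure_p, which points at how the perturbation is handled between reading mpasin.nc and writing the first mpasout.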

It is therefore unclear whether this restart problem is caused by GETKF or by something else, although the crash does not occur when the ensemble perturbations are not updated by GETKF.

@SamuelDegelia-NOAA SamuelDegelia-NOAA self-assigned this Mar 4, 2025
@guoqing-noaa
Collaborator

@SamuelDegelia-NOAA Thanks a lot for debugging this!
I think it is related to GETKF DA. As you mentioned, if we don't update the ensemble members, they run without a problem.

This might be related to the JEDIVAR issue @chunhuazhou and @spanNOAA have been debugging. There are some suspicious aircraft temperatures at 10 km with a QC flag of NaN.

@SamuelDegelia-NOAA
Contributor Author

FYI, fixing the QC markers for the aircar data did not resolve this GETKF fcst crash. I will continue debugging.


3 participants