Difference in the structure of Xarray Dataset for PSurge Data (grib2io vs. cfgrib) #163

Open
ShaneMill1 opened this issue Dec 23, 2024 · 4 comments · May be fixed by #165

@ShaneMill1

I am attempting to ingest PSurge GRIB files into xarray using grib2io. The data is located here:

https://slosh.nws.noaa.gov/psurge/data.co/2024/2024101012_Milton_Adv22/data/

When reading the data with grib2io, we get the following Xarray dataset structure:

ds
<xarray.Dataset>
Dimensions:                   (leadTime: 17, duration: 17, y: 5505, x: 8577)
Coordinates:
    refDate                   datetime64[ns] ...
  * leadTime                  (leadTime) timedelta64[ns] 0 days 06:00:00 ... ...
    valueOfFirstFixedSurface  float64 ...
    percentileValue           int64 ...
  * duration                  (duration) timedelta64[ns] 0 days 06:00:00 ... ...
    latitude                  (y, x) float64 ...
    longitude                 (y, x) float64 ...
    validDate                 (leadTime) datetime64[ns] ...
Dimensions without coordinates: y, x
Data variables:
    SURGE                     (leadTime, duration, y, x) float32 ...
Attributes:
    engine:   grib2io

For comparison, this is what the xarray dataset looks like when using cfgrib:

Dimensions:            (step: 17, y: 5505, x: 8577)
Coordinates:
    time               datetime64[ns] ...
  * step               (step) timedelta64[ns] 0 days 12:00:00 ... 8 days 12:0...
    heightAboveGround  float64 ...
    latitude           (y, x) float64 ...
    longitude          (y, x) float64 ...
    valid_time         (step) datetime64[ns] ...
Dimensions without coordinates: y, x
Data variables:
    surge              (step, y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          14
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2024-12-23T20:08 GRIB to CDM+CF via cfgrib-0.9.1...

It appears to me that grib2io may be adding an extra dimension (either leadTime or duration is being added but shouldn't be). I believe this is meant to be three-dimensional data, as cfgrib shows, but grib2io sees it as four-dimensional data.
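For anyone wanting to reproduce the comparison, here is a minimal sketch that diffs the dimension names of the two datasets. It works on plain mappings like the ones returned by `ds.sizes`, with the sizes copied from the two reprs above. The helper name `compare_dims` is hypothetical, and treating cfgrib's `step` as the counterpart of grib2io's `leadTime` is an assumption made only for illustration (the two coordinates do not start at the same value in the output above).

```python
def compare_dims(sizes_a, sizes_b, aliases=None):
    """Return (dims only in a, dims only in b) after renaming b's dims via aliases."""
    aliases = aliases or {}
    # Rename dims of the second dataset so equivalent axes share a name.
    renamed_b = {aliases.get(name, name): size for name, size in sizes_b.items()}
    only_a = {k: v for k, v in sizes_a.items() if k not in renamed_b}
    only_b = {k: v for k, v in renamed_b.items() if k not in sizes_a}
    return only_a, only_b


# Sizes copied from the two dataset reprs; with real datasets, pass ds.sizes.
grib2io_sizes = {"leadTime": 17, "duration": 17, "y": 5505, "x": 8577}
cfgrib_sizes = {"step": 17, "y": 5505, "x": 8577}

# Assumption: cfgrib's "step" plays the role of grib2io's "leadTime".
only_grib2io, only_cfgrib = compare_dims(
    grib2io_sizes, cfgrib_sizes, aliases={"step": "leadTime"}
)
print(only_grib2io)  # {'duration': 17}
print(only_cfgrib)   # {}
```

Under that assumption, `duration` is the only dimension that grib2io has and cfgrib does not.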

We would prefer to use grib2io instead of cfgrib because grib2io contains the local tables that allow other surge-related variables, such as TCSRG20, to be read into xarray (cfgrib shows these variables as undefined).

It would be useful to know if what I have described here is expected behavior of grib2io or if this is indeed a bug. Thanks!

@EricEngle-NOAA EricEngle-NOAA self-assigned this Jan 6, 2025
@EricEngle-NOAA
Collaborator

Hi @ShaneMill1. Can you provide the syntax you used to create the grib2io-based Xarray dataset?

@ShaneMill1
Author

Hey @EricEngle-NOAA, here is the full syntax of opening the Xarray dataset:

Python 3.11.11 | packaged by conda-forge | (main, Dec  5 2024, 14:17:24) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> import grib2io
>>> 
>>> ds=xr.open_dataset('2024101012_Milton_Adv22_e70_cum_dat.grb',engine='grib2io')
>>> ds
<xarray.Dataset> Size: 2GB
Dimensions:                   (leadTime: 2, duration: 2, y: 5505, x: 8577)
Coordinates:
    refDate                   datetime64[ns] 8B ...
  * leadTime                  (leadTime) timedelta64[ns] 16B 3 days 08:00:00 ...
    valueOfFirstFixedSurface  float64 8B ...
    percentileValue           int64 8B ...
  * duration                  (duration) timedelta64[ns] 16B 3 days 08:00:00 ...
    latitude                  (y, x) float64 378MB ...
    longitude                 (y, x) float64 378MB ...
    validDate                 (leadTime) datetime64[ns] 16B ...
Dimensions without coordinates: y, x
Data variables:
    TCSRG70                   (leadTime, duration, y, x) float32 755MB ...
Attributes:
    engine:   grib2io
>>> 

@EricEngle-NOAA EricEngle-NOAA added the bug Something isn't working label Jan 23, 2025
@EricEngle-NOAA
Collaborator

@ShaneMill1 - finally getting back around to answering some of your questions about this. What you are showing is indeed a bug in the grib2io xarray backend, but not because of the additional dimension. grib2io allows certain Grib2Message attributes to be coordinates/dimensions, and duration is one of them.

The bug shown here is that leadTime and duration are both dimensions of size 2. This would imply that there are 4 messages in the file, and we know that is not the case.
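To illustrate the arithmetic, here is a rough sketch in plain Python (not grib2io's actual backend logic). If leadTime and duration are treated as independent dimensions, the resulting grid has one slot per combination of unique values, which can exceed the number of messages. The helper name `dense_grid_report` and the timedelta values are hypothetical; only the first value (3 days 08:00:00) appears in the output above, and the assumption that the file holds two messages is mine.

```python
from datetime import timedelta


def dense_grid_report(pairs):
    """Summarize the grid implied by treating leadTime and duration as separate dims.

    pairs: list of (leadTime, duration) tuples, one per message.
    """
    lead_times = sorted({lead for lead, _ in pairs})
    durations = sorted({dur for _, dur in pairs})
    return {
        "shape": (len(lead_times), len(durations)),
        "slots": len(lead_times) * len(durations),  # messages a full grid would need
        "filled": len(set(pairs)),                  # distinct messages actually present
    }


# Hypothetical file with two messages whose leadTime and duration both differ.
pairs = [
    (timedelta(days=3, hours=8), timedelta(days=3, hours=8)),
    (timedelta(days=4, hours=6), timedelta(days=4, hours=6)),
]
report = dense_grid_report(pairs)
print(report)  # {'shape': (2, 2), 'slots': 4, 'filled': 2}
```

In this hypothetical case a 2 x 2 grid has four slots but only two messages to fill them, which is the mismatch described above.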

I have a fix ready for this.

@ShaneMill1
Author

Thanks for the update!

@EricEngle-NOAA EricEngle-NOAA linked a pull request Jan 23, 2025 that will close this issue