Skip to content

Add cfgrib_compat flag and changes to backend#89

Open
rachtsingh wants to merge 5 commits intompiannucci:mainfrom
rachtsingh:add-compat-backend
Open

Add cfgrib_compat flag and changes to backend#89
rachtsingh wants to merge 5 commits intompiannucci:mainfrom
rachtsingh:add-compat-backend

Conversation

@rachtsingh
Copy link

@rachtsingh rachtsingh commented Nov 7, 2025

Following up on conversation from #88

This:

  • add a cfgrib_compat argument that can be passed to the xarray backend
  • adds a values_dtype argument like in cfgrib (default is np.float32)
  • removes projection coordinates like cfgrib does when cfgrib_compat=True
  • squeezes dimensions of size 1 when cfgrib_compat=True
  • changes the definition of time to fcast_time (i.e. when the forecast was initialized) and adds a separate valid_time when cfgrib_compat=True

So there's one breaking change in there, though I'm not sure how many people are working with float64 data in GRIB (since most data is 12-24 bits integer encoded).

Here's the updated comparison, with cfgrib_compat=True:

cfgrib:

<xarray.Dataset> Size: 38MB
Dimensions:            (y: 1059, x: 1799)
Coordinates:
    time               datetime64[ns] 8B ...
    step               timedelta64[ns] 8B ...
    heightAboveGround  float64 8B ...
    latitude           (y, x) float64 15MB ...
    longitude          (y, x) float64 15MB ...
    valid_time         datetime64[ns] 8B ...
Dimensions without coordinates: y, x
Data variables:
    t2m                (y, x) float32 8MB ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2025-11-07T19:34 GRIB to CDM+CF via cfgrib-0.9.15.1/ecCodes-2.44.0

and gribberish:

<xarray.Dataset> Size: 38MB
Dimensions:     (y: 1059, x: 1799)
Coordinates:
    time        datetime64[s] 8B ...
    step        timedelta64[s] 8B ...
    valid_time  datetime64[s] 8B ...
    latitude    (y, x) float64 15MB ...
    longitude   (y, x) float64 15MB ...
Dimensions without coordinates: y, x
Data variables:
    tmp         (y, x) float32 8MB ...
Attributes:
    GRIB_edition:    2
    GRIB_centre:     7
    GRIB_subCentre:  0
    Conventions:     CF-1.7
    institution:     GRIB centre 7
    history:         2025-11-07T19:34 GRIB to CDM+CF via gribberish-0.23.0

@mpiannucci
Copy link
Owner

float64 to float32

The resolution is passed in the grib message, which allows float64 i believe so its hard to assume float32 for everything safely

@rachtsingh
Copy link
Author

Ok, got it, fixed - I reverted that change and now we convert on the fly (by default to float32 like in cfgrib, but we can pass float64, i.e. no conversion).

I found this comment in eccodes explaining their behavior:

GRIB files can encode their values in a number of ways including pretty much any number of bits per value (e.g. 3, 11, 18, 24) and these cannot be expressed as numpy arrays. When we get the array of values from the GIRB file via the ecCodes library, we call an ecCodes function to do so, and this function decodes from the file and always returns an array of float64, regardless of what the encoding in the original GRIB message was; then we convert to float32. There is also now a function to return the values as a float32 array, which in fact cfgrib really should be calling in the first place! So since there usually is not a numpy way of expressing the encoding in the GRIB file, the only choice we have is to ask ecCodes for float32 or float64 (it does the dirty work of doing the conversion); then we can always convert to another numpy representation if needed. (The current behaviour is inefficient because we ask ecCodes for a float64 array, and then we cast to a float32 array.)

@rachtsingh
Copy link
Author

@mpiannucci I wanted to bump this to see if it's still the direction you want to take this? I'm working separately on a GRIB table/template crate that I think will be useful, but that's not ready yet. No rush to merge/review, but I didn't want to do extra work if it's inline with the goal.

@mpiannucci
Copy link
Owner

Hi @rachtsingh, let me take a day or two to reivew this and think it through and ill get back to you

@mpiannucci
Copy link
Owner

Okkkk @rachtsingh i like most of these changes and I actually want to make some of them the defaults:

  • the forecast time and step dimensions are better than what I have now, I would like to make this the default behavior even without the flags
  • I'm not sure I understand the change to the coordinates, as I understand it hrrr is lambert projected and having x and y is valuable for not having to interpolate later, can project instead.
  • thanks for adding the edition and Centre attrs
  • squeezing dimensions is fine as default as well as long as we can still see time coordinate values

After this feedback I'm not sure if we need the flag and can instead update the library as such with a new major version? (Assuming we can look deeper into the lambert coorsinat system stuff).

What do you think?

@mpiannucci
Copy link
Owner

Basically I think this are all great changes hit I rely on the x and y coordinates and the projection information from hrrr for production data so I'd like to keep those if they don't hurt anything

@rachtsingh
Copy link
Author

Hey sorry @mpiannucci, I have been kind of busy with some family stuff so no opportunity to respond or polish this up further. Will respond as soon as I can. I have a deeper proposal for dealing with other stuff (shortName, cfVarName) in eccodes that I think is pretty nice but will require even more explanation / buy-in, so needs some time to test and write up.

@mpiannucci
Copy link
Owner

No problem,, @rachtsingh take care!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants