Consider setting time and platform as dimensions #235

@melodb

I used a function like this:

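import pandas as pd
import xarray as xr

def restore_mindex(
    data: xr.Dataset,
    index_var: str = "index",
    platform_var: str = "platform",
    indicator_var: str = "number_of_identifier",
) -> xr.Dataset:
    # Map each observation's integer platform index to its platform label.
    platform_arr = data[platform_var].values[data[indicator_var].values.astype("int")]
    # Pair each observation's platform label with its time value.
    mindex = pd.MultiIndex.from_arrays([platform_arr, data.time.values])
    # Attach the MultiIndex as the `index_var` coordinate.
    return data.assign_coords(xr.Coordinates.from_pandas_multiindex(mindex, index_var))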

to make index a coordinate that is a stack of time and platform coordinates (and also creates dimensions for time and platform). I'm not 100% sure that this will let you select by time and platform; you might need to call .unstack("index") first (which might introduce NaNs if different platforms have different time values). We use something similar in OpenGHG Inversions to work with our old output format, and I have found the time and site (or platform) coordinates useful.
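To make the selection and NaN points concrete, here is a minimal sketch on toy data. It assumes the MultiIndex levels are explicitly named "platform" and "time" (the function above leaves them unnamed), and the variable name mf, the site code "MHD", and the values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: two platforms with different observation times.
platforms = np.array(["MHD", "MHD", "TAC"])
times = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"])

# Assumption: naming the levels so xarray exposes them as "platform" and "time".
mindex = pd.MultiIndex.from_arrays([platforms, times], names=["platform", "time"])
coords = xr.Coordinates.from_pandas_multiindex(mindex, "index")
ds = xr.Dataset({"mf": ("index", [1.0, 2.0, 3.0])}, coords=coords)

# Selecting on a level name works on the stacked "index" dimension.
tac = ds.sel(platform="TAC")

# Unstacking gives real "platform" and "time" dimensions;
# (platform, time) pairs with no observation are filled with NaN.
unstacked = ds.unstack("index")
n_missing = int(unstacked["mf"].isnull().sum())
```

With named levels, selecting by platform appears to work without unstacking; unstacking is only needed if you want time and platform as separate dimensions, at the cost of the NaN padding.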

Also the platform_arr replaces the indices with the labels, but you could just use data[indicator_var].values instead if you want the indices.

(If you already have time and platform, then ds.stack(index=("time", "platform")).dropna("index") will give you an index coordinate like fluxy uses now. You can't save stacked dimensions/coordinates like this to netCDF, so either you need to unstack (and add NaNs), or convert to something like the format in the CDL templates.)
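A rough sketch of that round trip, assuming a small dataset that already has time and platform dimensions. The variable name mf, the site code "MHD", and the values are hypothetical; reset_index is one way to turn the stacked levels into plain coordinates before writing, not necessarily the layout the CDL templates use.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical gridded data: TAC has no observation at the second time.
ds = xr.Dataset(
    {"mf": (("time", "platform"), [[1.0, 2.0], [3.0, np.nan]])},
    coords={
        "time": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "platform": ["MHD", "TAC"],
    },
)

# Stack into one "index" dimension and drop (time, platform) pairs with no data.
stacked = ds.stack(index=("time", "platform")).dropna("index")

# A MultiIndex cannot be written to netCDF directly. Resetting it leaves plain
# "time" and "platform" coordinates along "index", which can be serialized.
flat = stacked.reset_index("index")
```

After the reset, flat can be written out (for example with flat.to_netcdf(...)), and a MultiIndex can be rebuilt on load with a helper like restore_mindex above.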

Since assimilation_flag isn't a dimension, you'd need to use .where in that case anyway. But potentially you could do ds_all[m].sel(time=slice(start_date, end_date)).sel(platform="TAC") (if, say, "TAC" == ds_all[m]["platform"][site_index]).
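As an illustration of combining the two, the sketch below selects by label on time and platform and then filters on the flag with .where. The dataset layout, the variable name mf, the site code "MHD", and the convention that assimilation_flag == 1 means "assimilated" are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with time and platform as dimensions.
ds = xr.Dataset(
    {
        "mf": (("time", "platform"), np.arange(6.0).reshape(3, 2)),
        "assimilation_flag": (("time", "platform"), [[1, 1], [0, 0], [1, 1]]),
    },
    coords={
        "time": pd.date_range("2020-01-01", periods=3),
        "platform": ["MHD", "TAC"],
    },
)

# Label-based selection works because time and platform are dimensions.
subset = ds.sel(time=slice("2020-01-01", "2020-01-02")).sel(platform="TAC")

# assimilation_flag is a data variable, not a dimension, so filter with .where.
# Assumption: a flag value of 1 marks assimilated observations.
assimilated = subset["mf"].where(subset["assimilation_flag"] == 1, drop=True)
```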

Originally posted by @brendan-m-murphy in #234 (comment)
