I used a function like this:
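
    import pandas as pd
    import xarray as xr

    def restore_mindex(
        data: xr.Dataset,
        index_var: str = "index",
        platform_var: str = "platform",
        indicator_var: str = "number_of_identifier",
    ) -> xr.Dataset:
        # Look up the platform label for each observation from its integer index.
        platform_arr = data[platform_var].values[data[indicator_var].values.astype("int")]
        # Pair each observation's platform label with its time.
        mindex = pd.MultiIndex.from_arrays([platform_arr, data.time.values])
        # Attach the MultiIndex as the `index_var` coordinate.
        return data.assign_coords(xr.Coordinates.from_pandas_multiindex(mindex, index_var))
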
This makes index a coordinate that is a stack of the time and platform coordinates (and also creates dimensions for time and platform). I'm not 100% sure that this will let you select by time and platform; you might need to call .unstack("index") first (which might introduce NaNs if different platforms have different time values). We use something similar in OpenGHG Inversions to work with our old output format, and I have found it useful to have the time and site (or platform) coordinates.
Also, platform_arr replaces the indices with the labels, but you could just use data[indicator_var].values instead if you want the indices.
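A minimal sketch of the selection question, on made-up values. It assumes the MultiIndex levels are given explicit names (the function above leaves them unnamed), and the variable name mf is hypothetical. With named levels, selecting on a single level seems to work directly, and unstacking gives separate platform and time dimensions padded with NaNs:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy observations: one platform label and one time per entry (made-up values).
platform_arr = np.array(["MHD", "TAC", "MHD", "TAC"])
times = pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03"])

# Assumption: name the levels so they can be used as selection keys.
mindex = pd.MultiIndex.from_arrays([platform_arr, times], names=["platform", "time"])
coords = xr.Coordinates.from_pandas_multiindex(mindex, "index")

# "mf" is a hypothetical data variable along the stacked "index" dimension.
ds = xr.Dataset({"mf": ("index", [1.0, 2.0, 3.0, 4.0])}).assign_coords(coords)

# Select on one level of the stacked index.
tac = ds.sel(platform="TAC")

# Unstack into separate platform and time dimensions; missing pairs become NaN.
wide = ds.unstack("index")
```

Here the platforms do not share all their times, so the unstacked array has gaps where a platform has no observation.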
(If you already have time and platform, then ds.stack(index=("time", "platform")).dropna("index") will give you an index coordinate like fluxy uses now. You can't save stacked dimensions/coordinates like this to netCDF, so either you need to unstack (and add NaNs), or convert to something like the format in the CDL templates.)
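A rough sketch of that round trip on a toy dataset (the values and the variable name mf are made up). The last step uses reset_index, which is one possible way to get a flat layout that has no MultiIndex; it is not necessarily the layout used in the CDL templates:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2020-01-01", periods=3, freq="D")
ds = xr.Dataset(
    # Hypothetical variable with gaps where a platform has no observation.
    {"mf": (("time", "platform"), np.array([[1.0, np.nan], [2.0, 5.0], [np.nan, 6.0]]))},
    coords={"time": times, "platform": ["MHD", "TAC"]},
)

# Stack time and platform into a single "index" dimension and drop the gaps.
stacked = ds.stack(index=("time", "platform")).dropna("index")

# Option 1: unstack again, which reintroduces NaNs for the missing pairs.
wide = stacked.unstack("index")

# Option 2 (assumption): turn the MultiIndex levels into plain coordinates
# along "index", so that no MultiIndex is left on the dataset.
flat = stacked.reset_index("index")
```

After the reset, time and platform are ordinary one-dimensional coordinates along index, with one entry per remaining observation.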
Since assimilation_flag isn't a dimension, you'd need to use .where in that case anyway. But potentially you could do ds_all[m].sel(time=slice(start_date, end_date)).sel(platform="TAC") (if, say, "TAC" == ds_all[m]["platform"][site_index]).
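A small sketch of how the two fit together, assuming a dataset that already has time and platform dimensions. Here ds stands in for ds_all[m], mf is a hypothetical variable, and the flag values (with 1 meaning "assimilated") are made up:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2020-01-01", periods=4, freq="D")
ds = xr.Dataset(
    {
        "mf": (("time", "platform"), np.arange(8.0).reshape(4, 2)),
        # Assumption: 1 marks an assimilated observation, 0 a non-assimilated one.
        "assimilation_flag": (
            ("time", "platform"),
            np.array([[1, 0], [1, 1], [0, 0], [1, 1]]),
        ),
    },
    coords={"time": times, "platform": ["MHD", "TAC"]},
)

start_date, end_date = "2020-01-02", "2020-01-03"

# Dimensions with coordinates can be selected with .sel.
tac = ds.sel(time=slice(start_date, end_date)).sel(platform="TAC")

# assimilation_flag is a data variable, so it is filtered with .where instead.
assimilated = tac.where(tac["assimilation_flag"] == 1, drop=True)
```

With drop=True, the times where the condition is False are removed rather than being filled with NaN.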
Originally posted by @brendan-m-murphy in #234 (comment)