When reading the conv-adpupa dataset (radiosondes), loading more than one day at a time via ds.sel(time=slice(...)) triggers a problem. With the pandas backend it surfaces only as a FutureWarning from pd.concat and the load still completes, but with the dask backend it raises a fatal exception ("Unsupported cast from double to null using function cast_null").
Python 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
IPython 9.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import nnja_ai
...: from nnja_ai import DataCatalog
...: # Monkeypatch _get_auth_args to instead supply the 'trust_env' variable to the network connection,
...: # allowing the connection to open on behind-proxy machines
...: nnja_ai.io._get_auth_args = lambda x : {'session_kwargs': {'trust_env' : True}}
In [2]: catalog = DataCatalog()
...: sonde_ds = catalog['conv-adpupa-NC002001']
...: print(sonde_ds.info())
Loading manifest for dataset 'conv-adpupa-NC002001'...
Dataset 'conv-adpupa-NC002001': ADP Upper-air data; Rawinsonde - fixed land
Tags: adpupa, upper air, global, station data, fixed land, radiosonde, rawinsonde
Files: 5445 files in manifest
Variables: 265
In [3]: tstart = 'T00:00Z'
...: tend = 'T23:59Z'
...: day1 = '2024-01-01'
...: day2 = '2024-01-02'
In [4]: # Pandas
...: for (name, tslice) in (('01/01',(day1+tstart,day1+tend)),
...:                        ('02/02',(day2+tstart,day2+tend)),
...:                        ('01/02',(day1+tstart,day2+tend))):
...:     foo = sonde_ds.sel(time=slice(*tslice)).load_dataset(backend='pandas')
...:     print(f'{name}: {len(foo)} entries')
...:
01/01: 1185 entries
02/02: 1211 entries
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/site-packages/nnja_ai/io.py:148: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
return pd.concat(
01/02: 2396 entries
In [5]: # Dask
...: for (name, tslice) in (('01/01',(day1+tstart,day1+tend)),
...:                        ('02/02',(day2+tstart,day2+tend)),
...:                        ('01/02',(day1+tstart,day2+tend))):
...:     try:
...:         foo = sonde_ds.sel(time=slice(*tslice)).load_dataset(backend='dask').compute()
...:         print(f'{name}: {len(foo)} entries')
...:     except Exception as e:
...:         print(f'{name} Exception: {e}')
...:
01/01: 1185 entries
02/02: 1211 entries
01/02 Exception: Unsupported cast from double to null using function cast_null
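As a possible stopgap, since single-day loads succeed with both backends in the session above, a multi-day range could be split into one-day slices that are loaded separately. The sketch below only does the date arithmetic; `daily_slices` is a hypothetical helper (not part of nnja_ai), and whether the per-day results then combine cleanly has not been tested here.

```python
from datetime import date, timedelta


def daily_slices(first_day, last_day, tstart="T00:00Z", tend="T23:59Z"):
    """Yield one (start, end) timestamp string pair per calendar day, inclusive.

    Hypothetical helper, not part of nnja_ai. The default suffixes mirror
    the tstart/tend values used in the session above.
    """
    day = date.fromisoformat(first_day)
    last = date.fromisoformat(last_day)
    while day <= last:
        stamp = day.isoformat()
        yield stamp + tstart, stamp + tend
        day += timedelta(days=1)


slices = list(daily_slices("2024-01-01", "2024-01-02"))
for start, end in slices:
    print(start, end)

# Assumed usage with the objects from the session above (not executed here):
# frames = [sonde_ds.sel(time=slice(*s)).load_dataset(backend='dask').compute()
#           for s in daily_slices(day1, day2)]
```

Each resulting frame covers a single day, which is the case that loads without error above. Concatenating the frames afterwards may still emit the same pandas FutureWarning seen with the pandas backend.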