
First model configuration with inference run from inference artifact#10

Merged
khintz merged 27 commits into dmidk:main from leifdenby:feat/forecast-inference-dataset-creation
Nov 6, 2025

Conversation

@leifdenby (Contributor) commented May 14, 2025

This PR contains changes to implement the first model configuration that, from an inference artifact, is able to produce a gridded zarr forecast dataset. In detail, the modified `entry.sh` script, which serves as the container image entrypoint, does the following:

  1. creates an inference DANRA datastore config and neural-lam config from the configurations in the inference artifact
  2. calls mllam-data-prep to create an inference dataset with the inference datastore config
  3. calls neural-lam to produce an inference output dataset; this is the transformed data structure similar to the training datasets (i.e. stacked spatial coordinates, variables stacked along feature coordinates, etc.)
  4. calls mllam-data-prep to invert the inference dataset structure back to a gridded forecast zarr dataset with separate variables

Steps 3 and 4 require upstream changes to mllam-data-prep and neural-lam, which are detailed in the model configuration README. I suggest we merge this model configuration now; I can then work on getting these upstream changes in afterwards, and update the model configuration pyproject.toml file to point to the main branches of neural-lam and mllam-data-prep, rather than the development branches it currently points to:
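Step 4 (inverting the stacked inference output back to a gridded dataset) can be sketched in plain xarray on synthetic data. The dimension and coordinate names below mirror those in the execution log further down, but the sizes and feature set are made up for illustration; this is a conceptual sketch, not the actual mllam-data-prep implementation:

```python
import numpy as np
import xarray as xr

# Miniature stand-in for the stacked inference output of step 3: one array
# with a stacked spatial coordinate (grid_index) and variables stacked along
# a feature coordinate. Sizes and feature names are illustrative only.
nx, ny, nt = 3, 2, 4
features = ["t2m", "u10m"]
x = np.arange(nx, dtype=float)
y = np.arange(ny, dtype=float)

da = xr.DataArray(
    np.random.rand(nt, nx * ny, len(features)),
    dims=("elapsed_forecast_duration", "grid_index", "state_feature"),
    coords={
        "x": ("grid_index", np.repeat(x, ny)),
        "y": ("grid_index", np.tile(y, nx)),
        "state_feature": features,
    },
)

# Step 4 conceptually: rebuild the (x, y) grid from the stacked index and
# split the feature dimension into separate data variables.
ds_gridded = (
    da.set_index(grid_index=("x", "y"))
    .unstack("grid_index")
    .to_dataset(dim="state_feature")
)
print(list(ds_gridded.data_vars))  # ['t2m', 'u10m']
print(ds_gridded["t2m"].dims)      # ('elapsed_forecast_duration', 'x', 'y')
```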

```toml
# from pyproject.toml
mllam-data-prep = { git = "https://github.com/leifdenby/mllam-data-prep", rev = "feat/inference-cli-args" }
neural-lam = { git = "https://github.com/leifdenby/neural-lam", rev = "dev/first-inference-image" }
```
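Once the upstream changes land, the follow-up would presumably drop the `rev` pins and point at the upstream repositories, along these lines (hypothetical; the exact form depends on how the changes are merged):

```toml
# from pyproject.toml, after upstreaming (hypothetical)
mllam-data-prep = { git = "https://github.com/mllam/mllam-data-prep" }
neural-lam = { git = "https://github.com/mllam/neural-lam" }
```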
[Screenshot 2025-09-26 at 09:20:28]
Full execution output
 ✝  mlwm-deployment/configurations/surface-dummy-model_DINI   feat/forecast-inference-dataset-creation±  ./entry.sh
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
2025-09-26 09:14:25.912 | DEBUG    | __main__:_prepare_inference_dataset_zarr:197 - Opened stats dataset:  Size: 416B
Dimensions:                        (state_feature: 5, static_feature: 2)
Coordinates:
    static_feature_source_dataset  (static_feature) object 16B ...
  * state_feature                  (state_feature) object 40B 'pres_seasurfac...
  * static_feature                 (static_feature) object 16B 'lsm' 'orography'
    static_feature_long_name       (static_feature) object 16B ...
    state_feature_long_name        (state_feature) object 40B ...
    state_feature_source_dataset   (state_feature) object 40B ...
    static_feature_units           (static_feature) object 16B ...
    state_feature_units            (state_feature) object 40B ...
Data variables:
    state__train__diff_std         (state_feature) float64 40B ...
    state__train__diff_mean        (state_feature) float64 40B ...
    static__train__std             (static_feature) float64 16B ...
    state__train__mean             (state_feature) float64 40B ...
    state__train__std              (state_feature) float64 40B ...
    static__train__mean            (static_feature) float64 16B ...
Attributes:
    schema_version:   v0.5.0
    dataset_version:  v0.1.0
    created_on:       2025-05-15T16:58:37
    created_with:     mllam-data-prep (https://github.com/mllam/mllam-data-prep)
    mdp_version:      v0.6.0
    creation_config:  dataset-version: v0.1.0\nextra:\n  projection:\n    cla...
2025-09-26 09:14:25.912 | DEBUG    | __main__:_prepare_inference_dataset_zarr:199 - Loading training datastore config from inference_artifact/configs/danra.datastore.yaml
2025-09-26 09:14:25.916 | INFO     | __main__:_create_inference_datastore_config:100 - Overwriting input path for danra_surface with https://object-store.os-api.cci1.ecmwf.int/danra/v0.6.0dev1/single_levels.zarr/ previously https://object-store.os-api.cci1.ecmwf.int/mllam-testdata/danra_cropped/v0.2.0/single_levels.zarr
2025-09-26 09:14:25.916 | INFO     | __main__:_create_inference_datastore_config:100 - Overwriting input path for danra_static with https://object-store.os-api.cci1.ecmwf.int/danra/v0.5.0/single_levels.zarr/ previously https://object-store.os-api.cci1.ecmwf.int/mllam-testdata/danra_cropped/v0.2.0/single_levels.zarr
2025-09-26 09:14:25.917 | INFO     | __main__:_create_inference_datastore_config:149 - Replaced time dimension with ['analysis_time', 'elapsed_forecast_duration'] for state
2025-09-26 09:14:25.917 | INFO     | __main__:_create_inference_datastore_config:149 - Replaced time dimension with ['analysis_time', 'elapsed_forecast_duration'] for forcing
2025-09-26 09:14:25.917 | INFO     | mllam_data_prep.create_dataset:create_dataset:169 - Loading dataset danra_surface from https://object-store.os-api.cci1.ecmwf.int/danra/v0.6.0dev1/single_levels.zarr/
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/ops/loading.py:21: FutureWarning: In a future version, xarray will not decode the variable 'elapsed_forecast_duration' into a timedelta64 dtype based on the presence of a timedelta-like 'units' attribute by default. Instead it will rely on the presence of a timedelta64 'dtype' attribute, which is now xarray's default way of encoding timedelta64 values.
To continue decoding into a timedelta64 dtype, either set `decode_timedelta=True` when opening this dataset, or add the attribute `dtype='timedelta64[ns]'` to this variable on disk.
To opt-in to future behavior, set `decode_timedelta=False`.
  ds = xr.open_zarr(fp)
2025-09-26 09:14:26.818 | INFO     | mllam_data_prep.create_dataset:create_dataset:183 - Extracting selected variables from dataset danra_surface
2025-09-26 09:14:26.823 | INFO     | mllam_data_prep.create_dataset:create_dataset:229 - Mapping dimensions and variables for dataset danra_surface to state
2025-09-26 09:14:28.346 | INFO     | mllam_data_prep.create_dataset:create_dataset:169 - Loading dataset danra_static from https://object-store.os-api.cci1.ecmwf.int/danra/v0.5.0/single_levels.zarr/
2025-09-26 09:14:28.884 | INFO     | mllam_data_prep.create_dataset:create_dataset:183 - Extracting selected variables from dataset danra_static
2025-09-26 09:14:28.885 | INFO     | mllam_data_prep.create_dataset:create_dataset:229 - Mapping dimensions and variables for dataset danra_static to static
2025-09-26 09:14:29.580 | INFO     | mllam_data_prep.create_dataset:_merge_dataarrays_by_target:76 - Merging dataarrays for target variable `state`
2025-09-26 09:14:29.588 | INFO     | mllam_data_prep.create_dataset:_merge_dataarrays_by_target:76 - Merging dataarrays for target variable `static`
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:109: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge(dataarrays, join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:109: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge(dataarrays, join="exact")
2025-09-26 09:14:31.075 | INFO     | mllam_data_prep.create_dataset:create_dataset:262 - Chunking dataset with {'analysis_time': 1}
2025-09-26 09:14:31.229 | INFO     | mllam_data_prep.create_dataset:create_dataset:270 - Setting splitting information to define `['train', 'val', 'test']` splits along dimension `time`
2025-09-26 09:14:31.238 | INFO     | mllam_data_prep.create_dataset:create_dataset:305 - Adding pre-computed statistics to dataset
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/create_dataset.py:307: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds, ds_stats], join="exact")
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/zarr/core/dtype/npy/string.py:248: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=9, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/zarr/core/dtype/npy/string.py:248: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=5, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/zarr/core/dtype/npy/string.py:248: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=15, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/zarr/api/asynchronous.py:233: ZarrUserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
2025-09-26 09:14:37.051 | INFO     | __main__:_prepare_inference_dataset_zarr:225 - Saved inference dataset to inference_workdir/danra.datastore.zarr
2025-09-26 09:14:37.081 | INFO     | __main__:_create_inference_config:246 - Saved inference config to inference_workdir/config.yaml
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
The loaded datastore contains the following features:
 state   : pres_seasurface r2m t2m u10m v10m
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/neural_lam/datastore/mdp.py:214: UserWarning: no forcing data found in datastore
  warnings.warn("no forcing data found in datastore")
 static  : lsm orography
With the following splits (over time):
 train   : 1549281600000000000 to 1549281600000000000
 val     : 1549281600000000000 to 1549281600000000000
 test    : 1549281600000000000 to 1549303200000000000
Writing graph components to inference_workdir/graph/multiscale
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch_geometric/utils/convert.py:249: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:256.)
  data[key] = torch.tensor(value)
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Seed set to 42
The loaded datastore contains the following features:
 state   : pres_seasurface r2m t2m u10m v10m
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/neural_lam/datastore/mdp.py:214: UserWarning: no forcing data found in datastore
  warnings.warn("no forcing data found in datastore")
 static  : lsm orography
With the following splits (over time):
 train   : 1549281600000000000 to 1549281600000000000
 val     : 1549281600000000000 to 1549281600000000000
 test    : 1549281600000000000 to 1549303200000000000
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/neural_lam/datastore/mdp.py:214: UserWarning: no forcing data found in datastore
  warnings.warn("no forcing data found in datastore")
Loaded graph with 523770 nodes (464721 grid, 59049 mesh)
Edges in subgraphs: m2m=527096, g2m=875571, m2g=1858884
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
wandb: Currently logged in as: leifdenby (mllam) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.22.0
wandb: Run data is saved locally in ./wandb/run-20250926_091515-z1cwfia2
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run eval-test-graph_lam-4x2-09_26_09-6510
wandb: ⭐️ View project at https://wandb.ai/mllam/neural_lam
wandb: 🚀 View run at https://wandb.ai/mllam/neural_lam/runs/z1cwfia2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W926 09:15:16.635975000 ProcessGroupGloo.cpp:545] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/neural_lam/datastore/mdp.py:294: UserWarning: no forcing data found in datastore
  warnings.warn("no forcing data found in datastore")
Restoring states from the checkpoint path at ./inference_artifact/checkpoint.pkl
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:445: The dirpath has changed from '/Users/B280936/git-repos/mllam/neural-lam/saved_models/train-graph_lam-4x2-05_15_17-2301' to '/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/saved_models/eval-test-graph_lam-4x2-09_26_09-6510', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
Loaded model weights from the checkpoint at ./inference_artifact/checkpoint.pkl
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Testing DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]
…:128: RuntimeWarning: 'mllam_data_prep.recreate_inputs' found in sys.modules after import of package 'mllam_data_prep', but prior to execution of 'mllam_data_prep.recreate_inputs'; this may result in unpredictable behaviour
2025-09-26 09:15:43.393 | WARNING  | __main__:recreate_inputs:127 - Target output variable static for input dataset danra_static not found in dataset, skipping
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/mllam_data_prep/recreate_inputs.py:85: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge(dataarrays, join="exact")
2025-09-26 09:15:43.441 | INFO     | __main__:main:321 - Saving input dataset danra_surface to ./inference_workdir/outputs/danra_surface.zarr with chunks={}
/Users/B280936/git-repos/mllam/mlwm-deployment/configurations/surface-dummy-model_DINI/.venv/lib/python3.11/site-packages/zarr/api/asynchronous.py:233: ZarrUserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
Renaming ./inference_workdir/outputs/danra_surface.zarr to ./inference_workdir/outputs/single_levels.zarr
 ✝  mlwm-deployment/configurations/surface-dummy-model_DINI   feat/forecast-inference-dataset-creation±  uvx zarrdump ./inference_workdir/outputs/single_levels.zarr
Installed 17 packages in 153ms
 Size: 56MB
Dimensions:                    (elapsed_forecast_duration: 6, x: 789, y: 589)
Coordinates:
    analysis_time              datetime64[ns] 8B ...
  * elapsed_forecast_duration  (elapsed_forecast_duration) timedelta64[ns] 48B ...
    time                       (elapsed_forecast_duration) datetime64[ns] 48B ...
  * x                          (x) float64 6kB -1.999e+06 ... -2.925e+04
  * y                          (y) float64 5kB -6.095e+05 ... 8.605e+05
Data variables:
    pres_seasurface            (elapsed_forecast_duration, x, y) float32 11MB ...
    r2m                        (elapsed_forecast_duration, x, y) float32 11MB ...
    t2m                        (elapsed_forecast_duration, x, y) float32 11MB ...
    u10m                       (elapsed_forecast_duration, x, y) float32 11MB ...
    v10m                       (elapsed_forecast_duration, x, y) float32 11MB ...
Attributes:
    recreated_from:       ./inference_workdir/outputs/inference_output.zarr
    recreation_config:    dataset-version: v0.1.0\nextra:\n  projection:\n   ...
    source_dataset_name:  danra_surface
    created_by:           mllam_data_prep.recreate_inputs
    created_on:           2025-09-26T07:15:43.441570+00:00
    mdp-version:          0.6.1
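The train/val/test split boundaries in the output above are printed as raw nanosecond epoch integers. A quick numpy conversion (values copied from the log) makes them readable and shows the test split covers a 6-hour window:

```python
import numpy as np

# Split boundaries as printed in the log above (nanoseconds since epoch)
t_start_ns = 1549281600000000000
t_end_ns = 1549303200000000000

t_start = np.datetime64(t_start_ns, "ns")
t_end = np.datetime64(t_end_ns, "ns")
print(t_start)  # 2019-02-04T12:00:00.000000000
print(t_end)    # 2019-02-04T18:00:00.000000000
print((t_end - t_start) / np.timedelta64(1, "h"))  # 6.0
```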

@leifdenby leifdenby changed the title cli for building forecast inference datasets First inference run from inference artifact Sep 26, 2025
@leifdenby leifdenby changed the title First inference run from inference artifact First model configuration with inference run from inference artifact Sep 26, 2025
@leifdenby leifdenby marked this pull request as ready for review September 26, 2025 07:22
@leifdenby leifdenby requested a review from khintz September 26, 2025 07:22
@khintz (Contributor) commented Sep 26, 2025

> Steps 3. and 4. require upstream changes to mllam-data-prep and neural-lam, which are detailed in the model configuration README. I suggest we merge this model configuration now and then I can work on getting these upstream changes in after, and then update the model configuration pyproject.toml file to point to main branches of neural-lam and mllam-data-prep, rather than the development branches that it currently points to

Agree. Will review with this in mind.

@khintz (Contributor) left a review

Minor things and some questions.
Great work!

```dockerfile
COPY pyproject.toml .
COPY *.yaml ./
COPY entry.sh ./
COPY src/ ./src
```
@khintz (Contributor):
Do you expect we will have more under src/ in the longer term? Just wondering if it would give a better overview to just have create_inference_dataset.py in the root of each configuration?

@leifdenby (Author):

We could do that, yes; maybe that is better. It was just that using src/ follows the convention that Python scripts for a package (here called surface-dummy-model_DINI) should reside in a subdirectory, named src/ by default.

Comment on lines +3 to +12
The model configuration in this directory is a dummy model that was trained on
surface variables from DANRA, only 10 days of data and only trained 10
epochs. It is intended only as a demonstration of the inference pipeline and is
expected to give very poor results.

## Upstream package change requirements

Relative to the `main` branch on both github.com/mllam/mllam-data-prep and
github.com/mllam/neural-lam, a number of pieces of functionality are currently
required to run this configuration:
@khintz (Contributor):

In general, is there a reason for the line breaks here?

@leifdenby (Author):

I just find it easier to read; I can remove them if you prefer :) I think this might be a pre-commit markdown default too, but I'm not sure...

Comment thread: configurations/surface-dummy-model_DINI/entry.sh (Outdated)
# b) include the statistics from the training dataset and
# c) set the dimensions in the configuration to have `analysis_time` and
# `elapsed_forecast_duration` instead of just `time`.
uv run python src/create_inference_dataset.py
@khintz (Contributor):

I liked that it was easy to see the command needed to create the inference data, but I am also sure you have a good reason for creating the Python script instead. Is it to handle configs and chunking?

@leifdenby (Author) commented Nov 5, 2025

Yes, there are quite a few steps to this, so I thought it was best to have it in an isolated Python script. I added env vars to make it clear what this script depends on in leifdenby@5f56aef#diff-e44c8d78f75e8f0f19d6f563bb5cfa93328cd98925a52573ed2706e0a370e8a7R10-R86 and leifdenby@62bd766#diff-e44c8d78f75e8f0f19d6f563bb5cfa93328cd98925a52573ed2706e0a370e8a7R89

```python
danra_surface=f"{S3_BUCKET_URL}/v0.6.0dev1/single_levels.zarr/",
danra_static=f"{S3_BUCKET_URL}/v0.5.0/single_levels.zarr/",
)
ANALYSIS_TIME = "2019-02-04T12:00"
```
@khintz (Contributor):

This will work for the POC, but we should think about adding this as an argument in "non-POCs". Maybe add a comment.

@leifdenby (Author):

Yes, I made this an argument that can be changed at runtime in leifdenby@5f56aef
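A minimal sketch of making the analysis time overridable at runtime; the default is taken from the snippet above, but using an `ANALYSIS_TIME` environment variable as the override mechanism is an assumption for illustration (the actual change is in the linked commit):

```python
import os

# Default copied from the reviewed snippet; the ANALYSIS_TIME env var as the
# runtime override mechanism is an illustrative assumption.
ANALYSIS_TIME = os.environ.get("ANALYSIS_TIME", "2019-02-04T12:00")
print(ANALYSIS_TIME)
```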

@leifdenby (Author) commented:

Thanks for your review @khintz! After disabling W&B I have also been able to build an image and run inference with the DANRA forecast data I put on EWC, on super-juice, inside a container 🥳

I'm working on some further improvements to this PR to address your comments, and then I will reply to them.

@leifdenby (Author) commented Sep 30, 2025

I'm getting there, but here are some further things I need to fix:

  • the projection info needs to be updated from the training datastore config. We shouldn't really have hardcoded the projection info in the datastore config; rather, it should be read from the source data, but that is what we have for now
  • there is a bug in neural-lam doing ds.sel(time=ds.splits.t_start.item()) where t_start.item() becomes an int if it is a np.datetime64[ns] rather than a np.datetime64[s]
  • there is no python3.11 manylinux wheel for torch on ohm.dmi.dk, and so because zarr requires python>=3.11 we have to use zarr2
  • write dev notes for ohm (on using venv in squashed with uv, and .env)
  • fix warning in mllam-data-prep feature for merging stats with the training dataset when long names differ
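The t_start.item() bug in the second bullet can be reproduced in isolation; this is standard numpy behaviour, since datetime.datetime cannot represent nanosecond precision and datetime64[ns] scalars therefore fall back to a raw integer:

```python
import datetime
import numpy as np

t_s = np.datetime64("2019-02-04T12:00", "s")
t_ns = np.datetime64("2019-02-04T12:00", "ns")

# .item() on second precision round-trips to a datetime.datetime ...
assert isinstance(t_s.item(), datetime.datetime)
# ... but on nanosecond precision it degrades to a plain int, which then
# silently changes the meaning of ds.sel(time=...)
assert isinstance(t_ns.item(), int)
print(t_ns.item())  # 1549281600000000000
```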

- expose container build program with env var so we can use docker on
  DGX Spark
- use nvidia docker registry for ARM nvidia image on ARM platforms
- install with pip directly into the system python site-packages (with
  overwrite) while setting a torch version constraint (otherwise the
  dependency install overwrites the pytorch install)
- optionally use pip directly inside container (rather than via uv) to
  ensure we use system python without venv
@khintz (Contributor) commented Nov 6, 2025

With Spark, should we ignore the issues on ohm for now?

@khintz (Contributor) commented Nov 6, 2025

As agreed, we merge now and continue from main.

@khintz khintz merged commit c43d522 into dmidk:main Nov 6, 2025
4 checks passed
