7 changes: 7 additions & 0 deletions .gitignore
@@ -4,3 +4,10 @@ __pycache__/
build/
dist/
.venv/
.ipynb_checkpoints/
src/mlcast/modules/.ipynb_checkpoints/
src/mlcast/models/ldcast/context/.ipynb_checkpoints/
src/mlcast/models/ldcast/diffusion/.ipynb_checkpoints/
src/mlcast/models/ldcast/.ipynb_checkpoints/
src/mlcast/models/ldcast/autoenc/.ipynb_checkpoints/
src/mlcast/models/ldcast/blocks/.ipynb_checkpoints/
89 changes: 24 additions & 65 deletions README.md
@@ -1,83 +1,42 @@
# mlcast
# MLCast implementation of LDCast

<!-- SPDX-License-Identifier: Apache-2.0 OR BSD-3-Clause -->
See the main branch https://github.com/mlcast-community/mlcast for context.

The MLCast Community is a collaborative effort bringing together meteorological services, research institutions, and academia across Europe to develop a unified Python package for AI-based nowcasting. This is an initiative of the E-AI WG6 (Nowcasting) of EUMETNET.
## Code structure

This repo contains the `mlcast` package for machine learning-based weather nowcasting.
There is one main `LDCast` class, subclassing the `NowcastingModelBase` class. There are three main nets in LDCast:
- the autoencoder
- the conditioner
- the denoiser

## Project Status
Each group of nets that must be trained at once gets its own subclass of `NowcastingLightningModule`. This gives two subclasses here (see the sketch below):
- the autoencoder (encoder + decoder) is trained on its own, so there is one subclass of `NowcastingLightningModule` called `Autoencoder`
- the conditioner and the denoiser are trained together, so they are combined into one neural network (the `LatentDiffusionNet` class), whose training is handled by the `LatentDiffusion` subclass of `NowcastingLightningModule`
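
A minimal sketch of how these pieces compose, following `docs/ldcast.md` (import paths are taken from the docs and may change while the package is under development):

```python
import torch
import pytorch_lightning as pl  # torch and pl are referenced by the `as_class` entries in config.yaml
from omegaconf import OmegaConf

from mlcast.models.ldcast.ldcast import LDCast

# The config uses a custom resolver; see docs/ldcast.md for details.
OmegaConf.register_new_resolver("as_class", lambda class_name: eval(class_name))
config = OmegaConf.load("config.yaml")

# LDCast (a NowcastingModelBase) composes the Autoencoder, the
# LatentDiffusion module (conditioner + denoiser) and a sampler.
ldcast = LDCast.from_config(config)
```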

⚠️ **Under Development** - This package is currently in early development stages and not usable by end users. The API and functionality are subject to change.
## Documentation

## Installation
```bash
# Install from pypi
pip install mlcast
```
See the `docs` folder for documentation on the main `LDCast` class, the autoencoder, and the latent diffusion part.

or
```bash
# Install from source
git clone https://github.com/mlcast-community/mlcast
cd mlcast
uv pip install -e .
## TO DO

# For development
uv pip install -e ".[dev]"
```
Reorganize the `LatentDiffusion` class? At the moment, `LatentDiffusionNet.forward` is never called during inference, because the inference process is quite different from training (see `docs/ldm.md`). It might be a bit clearer to implement explicitly distinct training and inference step methods in the `LatentDiffusion` class (that said, `AutoencoderKLNet.forward` is never called during inference either).

## Project Structure
The `timesteps` variable sometimes refers to the timesteps of the diffusion process (= 1000 during training) and sometimes to the nowcasting timesteps (where each timestep is 5 minutes). It would be better to use different names.

```
mlcast/
├── src/mlcast/ # Main package source code
│ ├── __init__.py # Package initialization and version
│ ├── data/ # Data loading and preprocessing
│ │ ├── zarr_datamodule.py # PyTorch Lightning data module for Zarr
│ │ └── zarr_dataset.py # PyTorch dataset for Zarr arrays
│ ├── models/ # Lightning model implementations
│ │ └── base.py # Abstract base classes for nowcasting models
│ └── modules/ # Pure PyTorch neural network modules
│ └── convgru_modules.py # ConvGRU encoder-decoder modules
├── examples/ # Example scripts and notebooks
│ └── scripts/
│ └── simple_train.py # Basic training example
├── pyproject.toml # Project metadata and dependencies
├── LICENSE # Apache 2.0 license
└── README.md # This file
```
We might integrate this code within the Hugging Face Diffusers library.

## Development
What mainly remains is to write the code of the main `LDCast` class (in `ldcast.py`).

This project uses `uv` for dependency management. To set up the development environment:
It would be nice to rewrite the PLMS sampler, as it is a little messy.

```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
Implement parametrizations other than `'eps'`.

# Install dependencies
uv sync
Use `ZarrDataModule` and `ZarrDataset`!

# Run pre-commit hooks
uv run pre-commit install
```
Add the computation of the EMA loss during LDM training, and change the `LDCast.predict` method so that the EMA weights are automatically used during inference.

## Contributing
Add the input and output shapes of the nets in the code (and in the docs).

Please feel free to raise issues or PRs if you have any suggestions or questions.
Understand which parameters can be changed freely, and which have to be adapted when others change.

## Links to presentations for discussion about the API

- [2025/02/04 first design discussions](https://docs.google.com/presentation/d/1oWmnyxOfUMWgeQi0XyX4fX9YDMX1vl6h/edit?usp=drive_link&rtpof=true&sd=true)

## License

This project is dual-licensed under either:

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* BSD 3-Clause License ([LICENSE-BSD](LICENSE-BSD) or https://opensource.org/licenses/BSD-3-Clause)

at your option.

See [LICENSE](LICENSE) for more details.
Make the implementation of the `AutoencoderDataset` more efficient? (see `docs/autoencoder.md`)
94 changes: 94 additions & 0 deletions config.yaml
@@ -0,0 +1,94 @@
model:
  autoencoder:
    optimizer_class: "${as_class: 'torch.optim.AdamW'}"
    optimizer_kwargs:
      lr: 0.001
      betas: [0.5, 0.9]
      weight_decay: 0.001
    lr_scheduler:
      class: "${as_class: 'torch.optim.lr_scheduler.ReduceLROnPlateau'}"
      kwargs:
        patience: 3
        factor: 0.25
      extra:
        monitor: 'val/rec_loss'
        frequency: 1
        interval: 'epoch'
    antialiaser:
      use: True
      kwargs: {}
    encoder: {}
    decoder: {}
    net_kwargs:
      hidden_width: &autoencoder_hidden_width 32
    loss:
      kl_weight: 0.01
    trainer:
      max_epochs: 200
      accelerator: 'gpu'
      log_every_n_steps: 5
      callbacks: "${as_class: '[pl.callbacks.EarlyStopping(\"val/loss_epoch\", patience=6, verbose=True, check_finite=False)]'}"
      strategy: 'ddp'
      num_nodes: 1
      sync_batchnorm: True
    dataloader:
      batch_size: 1
      num_workers: 0
      persistent_workers: False

  ldm:
    conditioner:
      autoencoder_dim: *autoencoder_hidden_width
      output_patches: &output_patches 5
      cascade_depth: 3
      embed_dim: 128
      analysis_depth: 4
    denoiser:
      in_channels: *autoencoder_hidden_width
      model_channels: 256
      out_channels: *autoencoder_hidden_width
      num_res_blocks: 2
      attention_resolutions: [1, 2]
      dims: 3
      channel_mult: [1, 2, 4]
      num_heads: 8
      num_timesteps: *output_patches
      context_ch: [128, 256, 512]  # should be equal to conditioner.cascade_dims?
    ema:
      use: True
      kwargs:
        store_device: 'cuda'
    optimizer_class: "${as_class: 'torch.optim.AdamW'}"
    optimizer_kwargs:
      lr: 0.0001
      betas: [0.5, 0.9]
      weight_decay: 0.001
    lr_scheduler:
      class: "${as_class: 'torch.optim.lr_scheduler.ReduceLROnPlateau'}"
      kwargs:
        patience: 3
        factor: 0.25
      extra:
        monitor: 'val/loss'  # is actually the EMA loss, since the EMA weights are used for validation
        frequency: 1
        interval: 'epoch'
    scheduler: {}  # diffusion scheduler
    trainer:
      max_epochs: 200
      accelerator: 'gpu'
      log_every_n_steps: 5
      callbacks: "${as_class: '[pl.callbacks.EarlyStopping(\"val/loss_epoch\", patience=6, verbose=True, check_finite=False)]'}"
      strategy: 'ddp'
      num_nodes: 1
      sync_batchnorm: True
    dataloader:
      batch_size: 1
      num_workers: 0
      persistent_workers: False

sampled_radar_dataset:
  zarr_path: '/scratch/martinbo/MLCast/radklim.zarr'
  csv_path: '/scratch/martinbo/MLCast/LDCastTraining/indexes_radklim/sampled_datacubes_2001-01-01-2001-01-01_24x256x256_3x16x16_1500000.csv'
  steps: 24
  augment: False
  data_var: 'RR'
80 changes: 80 additions & 0 deletions docs/autoencoder.md
@@ -0,0 +1,80 @@
# Autoencoder documentation

1. [Autoencoder class](#autoencoder-class)
2. [Tensor shapes](#tensor-shapes)
3. [Encoding and decoding](#encoding-and-decoding)
4. [Loading original weights](#loading-original-weights)
5. [Antialiasing](#antialiasing)
6. [Autoencoder training dataset](#autoencoder-training-dataset)
7. [Background on variational autoencoders](#background-on-variational-autoencoders)

## Autoencoder class

The `Autoencoder` class is a subclass of `NowcastingLightningModule` and takes two main arguments:
- the `net` (an instance of `AutoencoderKLNet` for LDCast), which is the neural network of the autoencoder, containing the encoder and the decoder
- the `loss` (an instance of `AutoencoderLoss` for LDCast)

Options for the optimizer and the learning rate scheduler can be passed as well.

An instance can be created from a `dict` containing the configuration, based on the architecture of LDCast's autoencoder:
```python
from mlcast.models.ldcast.autoenc.autoencoder import Autoencoder
autoencoder = Autoencoder.from_config(config)
```

## Tensor shapes

The autoencoder encodes sequences of radar images (not image by image). The number of radar images encoded at once is given by `autoenc_time_ratio` and was set to 4 in the original code (and kept here). `Conv3d` layers are used for the encoding, so input tensors have shape
```
(batch_size, n_channels, autoenc_time_ratio, height, width)
```
`n_channels` is always 1 for radar images.

In latent space, the tensors have shape `(batch_size, 32, n, 64, 64)`, where 32 is the `hidden_width` of the autoencoder and `n` is the number of consecutive encoded radar images divided by `autoenc_time_ratio`. **I should still clarify which of these parameters can be changed freely, and how they affect other shapes. Can `autoencoder.net` encode e.g. 8 images at once (in which case `n` would be 2)?**
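
As a quick sanity check, here is a hedged sketch (it assumes `hidden_width=32`, a spatial downsampling factor of 4, and that `encode` returns the latent mean by default, as described below):

```python
import torch

inputs = torch.randn(1, 1, 4, 256, 256)  # one sequence of 4 radar images
latent = autoencoder.net.encode(inputs)
print(latent.shape)  # expected: torch.Size([1, 32, 1, 64, 64]), i.e. n = 1
```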


## Encoding and decoding

Doing the following
```python
import torch
inputs = torch.randn(1, 1, 4, 256, 256, device = 'cuda') # fake sample
autoencoder(inputs)
```
is equivalent to `autoencoder.net(inputs)` and computes the whole forward pass through the `net` (encoding + decoding). To encode only, one needs to do
```python
autoencoder.net.encode(inputs)
```
If `encoded` is an encoded sample, it can be decoded as
```python
autoencoder.net.decode(encoded)
```
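
Putting the two together, a round trip should reconstruct (an approximation of) the input. A minimal sketch under the API above:

```python
encoded = autoencoder.net.encode(inputs)
reconstructed = autoencoder.net.decode(encoded)
# assumption: decoding restores the original input shape
assert reconstructed.shape == inputs.shape
```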

## Loading original weights

The original weights can be loaded directly as
```python
autoenc_weights_fn = '/path/to/original/autoencoder/weights'
autoencoder.net.load_state_dict(torch.load(autoenc_weights_fn))
```

## Antialiasing

As in the original code, antialiasing is applied by default (by an `Antialiaser` object) to the inputs before they are fed to the `net`.

## Autoencoder training dataset

Gabriele's code produces a dataset whose samples are sequences of `steps` images (`steps` is usually set to 24, giving 4 input images and 20 ground-truth images).

But the autoencoder needs samples that are sequences of only 4 images, so each sample of the `SampledRadarDataset` needs to be divided into 6 samples. This is done by the `AutoencoderDataset`. Its samples are tuples `(x, y)` where `y = x`, since we want the autoencoder to reconstruct the sequences.

**The current implementation of this class is not the most efficient: when iterating over the `AutoencoderDataset`, each sample of the `SampledRadarDataset` is loaded 6 times.**
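
For illustration, here is a hypothetical sketch of the windowing idea (this is **not** the actual `AutoencoderDataset` implementation, and the axis layout of the base samples is an assumption):

```python
from torch.utils.data import Dataset

class WindowedDataset(Dataset):  # hypothetical name
    """Splits each `steps`-long sequence into non-overlapping `window`-step samples."""

    def __init__(self, base_dataset, steps=24, window=4):
        self.base = base_dataset
        self.window = window
        self.per_sample = steps // window  # 6 windows when steps=24

    def __len__(self):
        return len(self.base) * self.per_sample

    def __getitem__(self, idx):
        base_idx, win = divmod(idx, self.per_sample)
        seq = self.base[base_idx]  # the full sequence is loaded each time
        start = win * self.window
        # assumption: time is the third-to-last axis, as in (channels, time, height, width)
        x = seq[..., start:start + self.window, :, :]
        return x, x  # y = x: the autoencoder reconstructs its input
```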

## Background on variational autoencoders

The autoencoder used in LDCast is a variational autoencoder. Here is some background on that kind of autoencoder.

Source: https://medium.com/@jpark7/finally-a-clear-derivation-of-the-vae-kl-loss-4cb38d2e47b3.

Variational autoencoders encode the data as a normal distribution in latent space: each sample is represented by the mean and the standard deviation of that distribution. Decoding produces a new sample that resembles the original but is not exactly the same. The degree to which the decoded samples are forced to resemble the originals is tuned by the `kl_weight` parameter of the KL loss.
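
For reference, the standard VAE objective (which the loss here presumably follows) combines a reconstruction term with the KL term weighted by `kl_weight`; for a diagonal Gaussian posterior with mean $\mu$ and standard deviation $\sigma$, the KL term has a closed form:

```math
\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \mathtt{kl\_weight}\cdot D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\Vert\,\mathcal{N}(0, I)\big),
\qquad
D_{\mathrm{KL}} = -\tfrac{1}{2}\sum_{j}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```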

When using the encoded sample (for example to produce a condition with the conditioner), only the mean is used. In the original code, `autoencoder.net.encode` returned a tuple `(mean, log_var)`, so one had to select the mean with `autoencoder.net.encode(x)[0]`, which was not very clear. I replaced this by adding a `return_log_var` keyword to `autoencoder.net.encode`.
57 changes: 57 additions & 0 deletions docs/ldcast.md
@@ -0,0 +1,57 @@
# Main LDCast class documentation

1. [LDCast class](#ldcast-class)
2. [Inference](#inference)
3. [Loading/saving weights](#loadingsaving-weights)
4. [Training](#training)

## LDCast class

The `LDCast` class is a subclass of `NowcastingModelBase` and takes three arguments:
- the `ldm` (typically, an instance of `LatentDiffusion`)
- the `autoencoder` (typically, an instance of `Autoencoder`)
- the `sampler`

An instance can be created from a `dict` containing the configuration, based on the architecture of LDCast:
```python
from mlcast.models.ldcast.ldcast import LDCast
ldcast = LDCast.from_config(config)
```
A config very close to the one used in the original code is provided in `config.yaml`. It should be loaded as
```python
from omegaconf import OmegaConf
OmegaConf.register_new_resolver("as_class", lambda class_name: eval(class_name))
config = OmegaConf.load('config.yaml')
```
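
The `as_class` resolver simply `eval`s the string, so any names used inside the config (such as `torch` or `pl`) must be in scope in the loading script. A hedged sketch (the access path assumes the nesting used in `config.yaml`):

```python
import torch
import pytorch_lightning as pl  # referenced by the callbacks entries in config.yaml
from omegaconf import OmegaConf

OmegaConf.register_new_resolver("as_class", lambda class_name: eval(class_name))
config = OmegaConf.load('config.yaml')

# Resolution happens on access: this returns the class torch.optim.AdamW.
opt_cls = config.model.autoencoder.optimizer_class
```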

## Inference

Predictions can be produced with
```python
import torch
inputs = torch.randn(1, 1, 4, 256, 256, device = 'cuda') # fake data
ldcast.predict(inputs)
```
**Do not use this for the moment: the EMA weights (if used) are not yet automatically applied during inference.**

## Loading/saving weights
To load from a folder containing the weights of the autoencoder, the denoiser, and the conditioner in separate files (and possibly EMA weights):
```python
ldcast.load('/path/to/folder')
```
To save in a folder:
```python
ldcast.save('/path/to/folder')
```

## Training

If `sampled_radar_dataset` is a `SampledRadarDataset` built with Gabriele's code (https://github.com/DSIP-FBK/ConvGRU-Ensemble/blob/main/convgru_ensemble/datamodule.py), the autoencoder can be trained with
```python
ldcast.fit_autoencoder(sampled_radar_dataset)
```
and the LDM can be trained with
```python
ldcast.fit_ldm(sampled_radar_dataset)
```
Keyword arguments can be passed to the trainer and the dataloader through the `trainer_kwargs` and `dataloader_kwargs` keywords.
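
For example (a hedged sketch: the accepted keys mirror the usual Lightning `Trainer` and PyTorch `DataLoader` arguments):

```python
ldcast.fit_autoencoder(
    sampled_radar_dataset,
    trainer_kwargs={'max_epochs': 50, 'accelerator': 'gpu'},
    dataloader_kwargs={'batch_size': 4, 'num_workers': 8},
)
```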