Docs/hello world danra notebook (#69) #202
Conversation
@sadamov @joeloskarsson
Thanks @Jayant-kernel for your work on this. I am having a look at the notebook right now. Could you use the original PR template in the meantime, please? The current one seems to be missing some bullet points.
This is certainly going in the right direction, thanks again. What I really like is that the full notebook takes ~5 mins on CPU. But the data fetching and the model training fail with the current default parameters. Just wanted to emphasize again that in our org it's fine to be slower but more accurate, and that PRs are tested by human hands before submission.
Three general points:
- have you already considered how you would keep this notebook up to date? So that for each release of neural-lam or mllam-data-prep the code cells still work (e.g. integrate notebook execution into a test)?
  a. one such change will be the introduction of weather-model-graphs for graph creation: #184
- it would be great if you could visualise the input data, the graph, and the model predictions, similar to this notebook here: https://github.com/joeloskarsson/neural-lam-dev/blob/research/docs/reproduce_paper_sample.md
- please fix the pre-commit errors
Here are some remarks from my initial walkthrough:
- Before the cell that runs "import os", advise the user to choose a pre-installed Python version, according to the pyproject.toml file, that provides ipykernel
- Since the notebook is in ./docs/notebooks, you first have to cd into the repo root (ROOT); otherwise the cells will not run from ROOT
- convert uv install into a code cell
- The mllam-data-prep CLI expects the config as a positional argument, not --config
- for the model training/eval you need to set --val_steps_to_log 1, because the default setting would validate ar_steps > 1, which we are not training on. You also need to set --num_workers to a higher value like 8 instead of 0, because by default neural-lam uses persistent workers and 0 is not allowed.
- you can also link this paper, which actually describes the research with DANRA: https://arxiv.org/abs/2504.09340
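A training invocation with those flags might look like this (a sketch only: the module path and config argument are assumptions based on this thread, not verified defaults):

```shell
# Hypothetical training command; --val_steps_to_log and --num_workers are the
# flags discussed above, everything else is a placeholder.
python -m neural_lam.train_model \
    --config_path config.yaml \
    --val_steps_to_log 1 \
    --num_workers 8
```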
Now about the next steps: this issue is scheduled for version v0.7.0: https://github.com/mllam/neural-lam/milestone/12. That means we can merge it only after v0.6.0 is released. We discuss these milestones every month in the dev meeting, and they can be adjusted.
The work on this could be part of a GSoC proposal for the project about structured docs.
@sadamov Changes made.
I also added a note about future maintainability (notebook execution test in CI) and the upcoming weather-model-graphs change. Please take another look when you have time, happy to make more changes.
Nice work @Jayant-kernel! One small suggestion: it might be helpful to add a note in the notebook about the minimum required version of
…edictions, version note, fix graph viz (mllam#202)
Hey @sadamov, another round of fixes: the checkpoint cell now picks the most recently modified .ckpt (not an arbitrary first one).
I just ran this notebook on my 7 year-old laptop and it works perfectly! 🤯 this is great!
Since it is a bit tricky to review notebooks, I pushed some smaller changes directly into your branch. Please have a look. We should keep the notebook rendered, so that other users can refer to the expected output.
there are currently two major open points that I would like to see addressed:
- The plotting
  a. the input data should be a 2D horizontal plot of at least one of the 4 state features (atmospheric variables u100m, v100m, r2m, t2m) and not a 1D line plot
  b. the graph should be plotted and visualised with neural_lam/plot_graph.py
  c. the prediction plots fail to be displayed (and saved to disk?) please double-check the paths
- CI/CD integration: in some form or another we should make sure that this notebook stays up to date with the codebase. Unlike tests, you are allowed to break the notebook cells, but we need to notice when they are broken and fix them in each future PR. Maybe it could become a new entry in the PR template? Not sure, open to your suggestions.
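A minimal sketch of the 2D plot requested in point a, using synthetic data in place of the zarr datastore (the variable name and grid are illustrative only, not the notebook's actual arrays):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the cell also runs in CI
import matplotlib.pyplot as plt

# Synthetic stand-in for one state feature (here t2m) on an x/y grid; in the
# notebook these arrays would come from the zarr datastore instead.
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 40)
rng = np.random.default_rng(0)
t2m = rng.normal(280.0, 5.0, size=(y.size, x.size))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x, y, t2m, shading="auto")
fig.colorbar(mesh, ax=ax, label="t2m [K]")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("2D horizontal plot of one state feature")
fig.savefig("t2m_map.png")
```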
Thanks for the review @sadamov! All requested changes are pushed. Formatting: updated the PR template & fixed the pre-commit issues.
@Jayant-kernel @sadamov I have been following the PR, it looks great! The checkpoint-loading error happens because PyTorch 2.6 changed the default of torch.load's weights_only argument to True. Though the cleaner fix is probably in train_model.py's checkpoint loading code rather than the notebook itself; worth checking if there's already an issue open for this.
@Mani212005 were you actually able to run the notebook? For me the file was corrupted and I couldn't even open it. I pushed a quick fix to remove the artifacts that caused the not-opening issue. Now, however, I see a lot of other characters that were introduced in one of the recent commits, e.g.
@sadamov I diffed the notebook history locally. What I can confirm from the file itself is:
Reverting back to e1bb57a is on the safer side. And afaik the PyTorch 2.6 checkpoint-loading error is a separate issue.
@sadamov I saw the notebook source directly from the GitHub diff, not by running it locally. The PyTorch error I flagged was visible in the captured cell outputs embedded in the .ipynb JSON from a previous run.
Force-pushed badfe4b to 44196b2:
- Revert to e1bb57a baseline to remove LLM-introduced encoding artifacts
- Replace 1D line plot in cell 12 with proper 2D pcolormesh map of state features using x/y grid coordinates from the zarr datastore
- Add PyTorch 2.6 fix cell before eval: registers argparse.Namespace as a safe global so torch.load does not fail with weights_only=True
- Clear stale error outputs from eval and prediction display cells
Force-pushed 44196b2 to 42ad106.
@sadamov please re-review
sadamov left a comment
The notebook is working again. But previous comments have still not been addressed. For example, the plot of the graph object is missing and the plot of the prediction fails. Please also address the comment about torch > 2.6 from above.
Once you are sure you have addressed all open remarks, please update the changelog.
Hi, I would like to work on this issue. Could you please assign it to me?
@yukthagangadhari5 @Jayant-kernel is already working on this. But maybe he would appreciate some help on the visualizations (input data, graphs, predictions). Best to ask them directly here.
Thanks for the suggestion! I'd be happy to help with the visualizations. @Jayant-kernel would you like help adding plots for the input data, graphs, or model predictions in the notebook? Let me know what would be most useful.
Hi @Jayant-kernel, I am interested in picking up this issue as part of the documentation generation project. I have the required skills in Python and Shell. Before I get started, are there any specific documentation frameworks (like Sphinx or pdoc) the team prefers, or should I start by doing some research and proposing a solution? I would love to be assigned to this!
… notebook
- Add cell that runs neural_lam.plot_graph (--datastore_config_path, --graph 1level, --save graph_viz.html) to produce an interactive Plotly HTML visualisation of the message-passing graph (addresses reviewer request in mllam#202)
- Add cell that displays the saved graph_viz.html inline via IFrame
- Replace broken PNG-glob prediction display with a cell that loads the example_pred_*.pt / example_target_*.pt tensors saved by the wandb logger and renders a side-by-side Ground Truth vs Prediction pcolormesh plot inline
- Add CHANGELOG entry under [unreleased] / Added for PR mllam#202

Fixes remaining open points from sadamov's review on PR mllam#202.
Hi @sadamov, addressed all remaining open points from your last review:
- Graph visualisation: added two cells after the graph-creation step; the first runs neural_lam.plot_graph with --datastore_config_path, --graph 1level, and --save graph_viz.html
- Prediction display: loads the example_pred_*.pt / example_target_*.pt tensors saved by the wandb logger under wandb/
- CHANGELOG: entry added under [unreleased]
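For reference, the described plot_graph invocation would look something like this (the datastore config filename is a placeholder, not a verified path):

```shell
# Render the 1level message-passing graph to an interactive Plotly HTML file;
# danra.datastore.yaml stands in for the notebook's actual datastore config.
python -m neural_lam.plot_graph \
    --datastore_config_path danra.datastore.yaml \
    --graph 1level \
    --save graph_viz.html
```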
sadamov left a comment
Thanks for adding these plots, here are some issues I had when running the latest notebook additions:
- cell 13: could we retrieve the units directly from the xarray dataset here, because the coords do have units attrs. That would make it even more instructive. (e.g. x_units = ds.coords["x"].attrs.get("units", ""))
- cell 19: for me the graph html does not render in vscode (maybe related to security settings), could we use this instead: with open("graph_viz.html") as f: display(HTML(f.read()))
- cell 26: the eval fails because of the way torch loads checkpoints, e.g. with torch=2.10.0 (_pickle.UnpicklingError: Weights only load failed. This file can still be loaded; to do so you have two options, do those steps only if you trust the source of the checkpoint.). This issue will soon be addressed in #240, but until then we need another way to make it work here in this notebook. I can see the issue: cell 24 already registers the safe globals for the notebook kernel, but the eval command in cell 25 runs as a subprocess (!{sys.executable} -m ...), so the fix doesn't carry over. You can use a sitecustomize.py, for example.
- cell 27: is currently broken even with the cell 26 fix, failing with RecursionError: maximum recursion depth exceeded. But instead of fixing it I propose to stay closer to https://github.com/joeloskarsson/neural-lam-dev/blob/research/docs/reproduce_paper_sample.md for the final eval and visualizations. So only generate one example_pred, save everything as zarr, visualize the test_rmse.pdf directly from the output, and also visualize the predictions based on plots generated during eval. Then there is no need to do the manual .pt loading and such. Let me know if you think some approaches in the notebook linked here could be improved, very open to suggestions.
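The sitecustomize.py route mentioned for cell 26 could be sketched as a single notebook cell like this (the file location and environment-variable handling are assumptions, not the project's actual fix):

```python
import os
import pathlib

# Python's site module auto-imports sitecustomize from any sys.path entry, so
# writing this file and putting its directory on PYTHONPATH applies the
# safe-globals registration to subprocesses started with !{sys.executable}.
fix = """\
import argparse
try:
    import torch
    torch.serialization.add_safe_globals([argparse.Namespace])
except ImportError:
    pass
"""
pathlib.Path("sitecustomize.py").write_text(fix)
os.environ["PYTHONPATH"] = (
    os.getcwd() + os.pathsep + os.environ.get("PYTHONPATH", "")
)
```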
and two more general remarks:
- cell 8: when we use !{sys.executable} the first time, could you add a comment saying something along the lines of: "# sys.executable ensures we use the same Python as this notebook's kernel"
- for the future maintainability we were talking about, I see diverging approaches. What do you think is the best way forward?
  - automate pre-commits and github workflows to nbmake the full notebook and ensure its compatibility
  - add a note to the PR template that tells authors to check if their changes break the notebooks
  - do nothing and fix the notebooks when broken. The argument here is that breaking changes to the core functionalities of the repo are rare.
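The first maintainability option could be wired up with the nbmake pytest plugin; a sketch (the notebook path is assumed):

```shell
# nbmake executes notebooks under pytest, failing the run if any cell errors.
pip install nbmake
pytest --nbmake docs/notebooks/hello_world_danra.ipynb
```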
@Jayant-kernel doing some housekeeping: there are many people interested in contributing to this PR. Are you still working on it, or is it okay if somebody else takes over?
@sadamov I can help integrate this into a new PR if @Jayant-kernel is not active.
Superseded by #577
Describe your changes
This PR adds a brief onboarding Jupyter notebook (hello_world_danra.ipynb) to address #69.
It uses the existing DANRA 100m winds example configuration to walk new users through:
- mllam-data-prep preprocessing
- (1level) graph generation

This is a draft to get feedback on the notebook content, defaults (e.g., 1level vs hierarchical), and location before finalizing.

Issue Link
Addresses #69
Type of change
Checklist before requesting a review
Checklist for reviewers
Author checklist after completed review
Checklist for assignee