Add DANRA tutorial notebook with pytest-nbmake (#69) by Sharkyii · Pull Request #577 · mllam/neural-lam

Sharkyii · 2026-04-03T13:02:13Z

Describe your changes

Added hello_world_danra.ipynb, an end-to-end tutorial demonstrating neural-lam training on a small DANRA dataset (data prep → graph creation → 1-epoch CPU training → evaluation). This is taken from #202, credits: @Jayant-kernel

Enabled notebook CI using pytest-nbmake: notebooks under docs/notebooks/ now run as pytest tests. A conftest.py fixture pre-creates danra.datastore.zarr via MDPDatastore to avoid runtime downloads, and the notebook skips data preparation if the datastore already exists. Dev dependencies now include nbmake>=1.5.0 and ipykernel>=6.0.0.
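
The skip-if-present behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in the notebook or in conftest.py: `ensure_datastore`, `build`, and `fake_build` are hypothetical names, and a plain directory stands in for the store that the real fixture creates via MDPDatastore.

```python
import os
import tempfile


def ensure_datastore(store_path, build):
    """Run `build(store_path)` only when the store is missing.

    Returns True if the store was built, False if an existing one was reused.
    """
    if os.path.exists(store_path):
        return False
    build(store_path)
    return True


def fake_build(path):
    # Hypothetical stand-in for the real (expensive) data preparation step.
    os.makedirs(path)


with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "danra.datastore.zarr")
    first_call = ensure_datastore(store, fake_build)   # store missing: builds it
    second_call = ensure_datastore(store, fake_build)  # store present: skips
    print(first_call, second_call)
```

A session-scoped pytest fixture could call such a helper once before any notebook runs, so the notebooks themselves always take the "already exists" path in CI.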

Issue Link

Solves #69

Type of change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
I have performed a self-review of my code
For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
I have updated the README to cover introduced code changes
I have added tests that prove my fix is effective or that my feature works
I have given the PR a name that clearly describes the change, written in imperative form (context).
I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

the code is readable
the code is well tested
the code is documented (including return types and parameters)
the code is easy to maintain

Author checklist after completed review

I have added a line to the CHANGELOG describing this change, in a section
reflecting type of change (add section where missing):
- added: when you have added new functionality
- changed: when default behaviour of the code has been changed
- fixes: when your contribution fixes a bug
- maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

PR is up to date with the base branch
the tests pass
(if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
Once the PR is ready to be merged, squash commits and merge the PR.

Sharkyii · 2026-04-03T13:40:23Z

@sadamov once free have a look here :)

sadamov · 2026-04-10T09:27:30Z

Okay I organized the hello_world issue and PRs:

- #69 (Add DANRA tutorial notebook with pytest-nbmake) now covers DANRA exclusively -> assigned to @Sharkyii
- #577 (Add DANRA tutorial notebook with pytest-nbmake (#69)) is the official and only PR for the DANRA notebook
- #587 (Add COSMO example notebook for onboarding) is a new issue targeting only COSMO -> I would like to assign it to @info-gallary and @PalaniappanR02 to collaborate
- #392 (Add COSMO example notebook for onboarding (#587)) is the existing COSMO notebook, which now closes #587

If there was some oversight let me know, tried my best to look through all previous comms.

sadamov

Thanks for taking over here @Sharkyii. We are almost done! Please see inline suggestions for CHANGELOG.md, pyproject.toml, and the workflow, plus the notebook notes below.

Make sure to fully render the notebook before the next review. Also track the other PRs that implement model weight loading (which fixes the workaround below) and the implementation of WMG, if it lands.

Notebook — please address before merging:

Cell 17 (graph visualisation): The current cell runs plot_graph via CLI and embeds the full graph_viz.html inline. The HTML is ~300 MB (g2m: 12 716 edges, m2g: 30 720 edges serialised as inline JSON) — this makes the notebook hang on render. Replace both the CLI cell and the display cell with a single Python cell that calls the plot_graph API directly, writes the full HTML for browser use, and displays a lightweight filtered view inline (M2M edges + mesh nodes only, <1 MB):

import os

import numpy as np
from IPython.display import HTML, display
from neural_lam import utils
from neural_lam.config import load_config_and_datastore
from neural_lam.plot_graph import plot_graph as _plot_graph

# Load the example datastore and scale grid coordinates by their largest absolute value.
config_path = "tests/datastore_examples/mdp/danra_100m_winds/config.yaml"
_, datastore = load_config_and_datastore(config_path=config_path)
xy = datastore.get_xy("state", stacked=True)
grid_pos = xy / np.max(np.abs(xy))

# Load the stored graph from the datastore's graph directory.
graph_dir = os.path.join(datastore.root_path, "graph", "1level")
hierarchical, graph_ldict = utils.load_graph(graph_dir_path=graph_dir)

# Write the full figure to disk for viewing in a browser.
fig = _plot_graph(grid_pos=grid_pos, hierarchical=hierarchical, graph_ldict=graph_ldict)
fig.write_html("graph_viz.html", include_plotlyjs="cdn")
print("Full interactive graph saved to graph_viz.html — open in a browser.")

# Keep only the lightweight traces (M2M edges and mesh nodes) for the inline view.
fig.data = tuple(t for t in fig.data if t.name in {"M2M", "Mesh nodes"})
display(HTML(fig.to_html(include_plotlyjs="cdn", full_html=False)))

Cell 23 (sitecustomize workaround): sitecustomize.py only registers argparse.Namespace. PyTorch 2.6 also rejects all neural_lam.config dataclasses when loading a checkpoint with weights_only=True, causing _pickle.UnpicklingError. Extend the safe globals list:

import argparse

import torch
from neural_lam.config import (
    DatastoreSelection, ManualStateFeatureWeighting, NeuralLAMConfig,
    OutputClamping, TrainingConfig, UniformFeatureWeighting,
)

# Allowlist argparse.Namespace and the neural_lam.config dataclasses for weights_only=True loading.
torch.serialization.add_safe_globals([
    argparse.Namespace, DatastoreSelection, ManualStateFeatureWeighting,
    NeuralLAMConfig, OutputClamping, TrainingConfig, UniformFeatureWeighting,
])

Cell 24 (eval command): --processor_layers 2 is set in the training cell but absent from the eval cell. The default is 4, so eval fails with RuntimeError: Missing key(s) in state_dict. Add --processor_layers 2 to the eval command.
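
One way to keep the training and eval cells from drifting apart is to build both commands from a single list of architecture flags. The sketch below is only an illustration: `ARCH_FLAGS` and `build_command` are hypothetical names, and the `<train args>` / `<eval args>` entries are placeholders for whatever each notebook cell actually passes.

```python
import shlex

# Flags that define the model architecture must be identical for training and eval.
ARCH_FLAGS = ["--processor_layers", "2"]


def build_command(extra_args):
    """Return a train_model command line that always carries ARCH_FLAGS."""
    base = ["python", "-m", "neural_lam.train_model"]
    return base + ARCH_FLAGS + list(extra_args)


# Placeholders only: substitute the real per-cell arguments here.
train_cmd = build_command(["<train args>"])
eval_cmd = build_command(["<eval args>"])

print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

Because both commands share `ARCH_FLAGS`, changing the number of processor layers in one place updates training and eval together.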

Cell 25 (eval output display): The cell searches for test_rmse.pdf and pred_*.png — neither matches what eval actually writes. All outputs land as PNGs in wandb/latest-run/files/media/images/: metric plots as test_rmse_*.png, example predictions as {var}_example_*.png. No forecast zarr is produced. Replace with:

import glob
import os

from IPython.display import Image, display

img_dir = "wandb/latest-run/files/media/images"

# Metric plots are written as test_rmse_*.png; show the first one found.
rmse_plots = sorted(glob.glob(os.path.join(img_dir, "test_rmse_*.png")))
if rmse_plots:
    print("RMSE scorecard:", rmse_plots[0])
    display(Image(filename=rmse_plots[0]))
else:
    print("test_rmse plot not found — check eval output above.")

# Example predictions are written as {var}_example_*.png; show at most two.
example_plots = sorted(glob.glob(os.path.join(img_dir, "*_example_*.png")))
if example_plots:
    n_show = min(2, len(example_plots))
    print(f"Showing {n_show} of {len(example_plots)} prediction plot(s):")
    for p in example_plots[:n_show]:
        print(" ", p)
        display(Image(filename=p))
else:
    print("No prediction plots found — check eval output above.")

Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>

Sharkyii · 2026-04-22T10:13:51Z

@sadamov I was not able to test this because of an issue on my Linux install (I dual-boot Windows and Linux). It is fixed now, so I can continue with this.
I will be back on this soon.

Sharkyii · 2026-04-25T07:10:03Z

For cell 24:
PyTorch 2.6 changed torch.load to use weights_only=True by default. When loading a checkpoint, it refuses to unpickle argparse.Namespace objects (which are stored inside the checkpoint by PyTorch Lightning) unless they're explicitly allowlisted via torch.serialization.add_safe_globals.
See the modified train_model.py in the PR.
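
The allowlisting idea can be illustrated with the standard library alone. The sketch below is an analogy built on `pickle.Unpickler.find_class`, not PyTorch's actual implementation: `AllowlistUnpickler` and `restricted_loads` are hypothetical names, and the payload merely imitates a checkpoint that stores an argparse.Namespace.

```python
import argparse
import io
import pickle


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals named in an explicit allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowlisted")


def restricted_loads(data, allowed=()):
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# Imitation of a checkpoint that carries hyperparameters as a Namespace.
payload = pickle.dumps({"hparams": argparse.Namespace(processor_layers=2)})

try:
    restricted_loads(payload)  # nothing allowlisted: loading is refused
    rejected = False
except pickle.UnpicklingError:
    rejected = True

# Allowlisting the class makes the same payload loadable.
loaded = restricted_loads(payload, allowed=[("argparse", "Namespace")])
print(rejected, loaded["hparams"].processor_layers)
```

In PyTorch, torch.serialization.add_safe_globals plays the role of the `allowed` argument here.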

Sharkyii · 2026-04-25T07:15:20Z

codespell................................................................Failed

hook id: codespell
exit code: 65

docs/notebooks/hello_world_danra.ipynb:412: fO ==> of, for, to, do, go
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:412: unx ==> unix
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:930: Ot ==> To, Of, Or, Not, It
docs/notebooks/hello_world_danra.ipynb:930: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:930: fOf ==> for
docs/notebooks/hello_world_danra.ipynb:930: nd ==> and, 2nd
docs/notebooks/hello_world_danra.ipynb:930: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:948: OInt ==> point, pint, joint, lint
docs/notebooks/hello_world_danra.ipynb:948: FO ==> OF, FOR, TO, DO, GO
docs/notebooks/hello_world_danra.ipynb:948: tE ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:948: bu ==> by, be, but, bug, bun, bud, buy, bum
docs/notebooks/hello_world_danra.ipynb:965: fO ==> of, for, to, do, go
docs/notebooks/hello_world_danra.ipynb:965: buI ==> buoy, buy
docs/notebooks/hello_world_danra.ipynb:965: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:965: bU ==> by, be, but, bug, bun, bud, buy, bum
docs/notebooks/hello_world_danra.ipynb:965: FO ==> OF, FOR, TO, DO, GO
docs/notebooks/hello_world_danra.ipynb:965: wHe ==> when, we

Because of this, I am disabling codespell for the notebook in the pre-commit run.

Add notebook CI tests with pytest-nbmake and HelloWorld.ipynb

29d2405

Sharkyii changed the title ~~Add notebook ci nbmake~~ Add CI-tested DANRA tutorial notebook with pytest-nbmake Apr 3, 2026

Sharkyii closed this Apr 3, 2026

Sharkyii deleted the add-notebook-ci-nbmake branch April 3, 2026 13:18

Sharkyii reopened this Apr 3, 2026

Sharkyii changed the title ~~Add CI-tested DANRA tutorial notebook with pytest-nbmake~~ DANRA tutorial notebook with pytest-nbmake Apr 6, 2026

sadamov self-requested a review April 10, 2026 08:43

This was referenced Apr 10, 2026

Add DANRA tutorial notebook with pytest-nbmake #69

Open

Docs/hello world danra notebook 69 #202

Closed

sadamov self-assigned this Apr 10, 2026

sadamov added the documentation Improvements or additions to documentation label Apr 10, 2026

sadamov added this to the v0.7.0 milestone Apr 10, 2026

This was referenced Apr 10, 2026

Add COSMO example notebook for onboarding #587

Open

Add COSMO example notebook for onboarding (#587) #392

Open

sadamov linked an issue Apr 13, 2026 that may be closed by this pull request

Add DANRA tutorial notebook with pytest-nbmake #69

Open

sadamov removed a link to an issue Apr 13, 2026

Add DANRA tutorial notebook with pytest-nbmake #69

Open

sadamov changed the title ~~DANRA tutorial notebook with pytest-nbmake~~ DANRA tutorial notebook with pytest-nbmake (#69) Apr 13, 2026

sadamov changed the title ~~DANRA tutorial notebook with pytest-nbmake (#69)~~ Add DANRA tutorial notebook with pytest-nbmake (#69) Apr 13, 2026

sadamov linked an issue Apr 13, 2026 that may be closed by this pull request

Add DANRA tutorial notebook with pytest-nbmake #69

Open

Add notebook timeout and selective CI execution

fd789f2

Sharkyii force-pushed the add-notebook-ci-nbmake branch from 68046d5 to fd789f2 April 15, 2026 12:58

Sharkyii and others added 4 commits April 15, 2026 18:46

Merge branch 'main' into add-notebook-ci-nbmake

677a4c9

Fix notebook CI by creating danra.datastore.zarr fixture, Add docs/no…

82df71c

…tebooks/conftest.py with session fixture to create zarr datastore

Fix trailing whitespace in workflow file

ca9b827

Merge branch 'main' of github.com:mllam/neural-lam into pr/Sharkyii/577

52e3df1

sadamov requested changes Apr 21, 2026

Comment thread CHANGELOG.md

Comment thread CHANGELOG.md

Comment thread pyproject.toml Outdated

Comment thread pyproject.toml Outdated

Comment thread .github/workflows/install-and-test.yml Outdated

Sharkyii and others added 2 commits April 22, 2026 14:32

Update pyproject.toml

dd97010

Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>

Update pyproject.toml

f9847b2

Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>

Update .github/workflows/install-and-test.yml

5a9171f

Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>

sadamov self-requested a review April 22, 2026 09:43

sadamov and others added 3 commits April 23, 2026 05:58

Merge branch 'main' into add-notebook-ci-nbmake

cbc118a

fix checkpoint loading and notebook graph viz for PyTorch 2.6

32c8d81

Merge branch 'main' into add-notebook-ci-nbmake

f90d952

Sharkyii added 2 commits April 25, 2026 12:48

fixing pre-commit error

f1a0f7c

fix precommit

5c198ae

Sharkyii wants to merge 14 commits into mllam:main from Sharkyii:add-notebook-ci-nbmake
