
Add DANRA tutorial notebook with pytest-nbmake (#69) #577

Open
Sharkyii wants to merge 14 commits into mllam:main from Sharkyii:add-notebook-ci-nbmake

@Sharkyii
Contributor

@Sharkyii Sharkyii commented Apr 3, 2026

Describe your changes

Added hello_world_danra.ipynb, an end-to-end tutorial demonstrating neural-lam training on a small DANRA dataset (data prep → graph creation → 1-epoch CPU training → evaluation). The notebook is taken from #202; credit goes to @Jayant-kernel.

Enabled notebook CI using pytest-nbmake: notebooks under docs/notebooks/ now run as pytest tests. A conftest.py fixture pre-creates danra.datastore.zarr via MDPDatastore to avoid runtime downloads, and the notebook skips data prep if the datastore already exists. Dev dependencies are updated with nbmake>=1.5.0 and ipykernel>=6.0.0.
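
The "skip data prep if it already exists" behaviour can be sketched roughly as below. This is only an illustration of the pattern, not the code in this PR: `ensure_datastore` and `fake_build` are hypothetical names, and the real fixture builds the store via MDPDatastore, which this sketch does not call.

```python
import tempfile
from pathlib import Path


def ensure_datastore(zarr_path, build):
    """Run `build(path)` only when the store is missing; return True if it ran."""
    path = Path(zarr_path)
    if path.exists():
        # Store was pre-created (e.g. by a test fixture), so skip data prep.
        return False
    build(path)
    return True


def fake_build(path):
    # Hypothetical stand-in for the real data-prep step.
    path.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "danra.datastore.zarr"
    first = ensure_datastore(store, fake_build)   # store missing, build runs
    second = ensure_datastore(store, fake_build)  # store present, build skipped
```

With nbmake installed, the notebooks themselves would typically be collected with something like `pytest --nbmake docs/notebooks/`.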

Issue Link

Solves #69

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug
    • maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • (if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
  • author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@Sharkyii Sharkyii changed the title Add notebook ci nbmake Add CI-tested DANRA tutorial notebook with pytest-nbmake Apr 3, 2026
@Sharkyii Sharkyii closed this Apr 3, 2026
@Sharkyii Sharkyii deleted the add-notebook-ci-nbmake branch April 3, 2026 13:18
@Sharkyii Sharkyii reopened this Apr 3, 2026
@Sharkyii
Contributor Author

Sharkyii commented Apr 3, 2026

@sadamov when you are free, please have a look here :)

@Sharkyii Sharkyii changed the title Add CI-tested DANRA tutorial notebook with pytest-nbmake DANRA tutorial notebook with pytest-nbmake Apr 6, 2026
@sadamov sadamov self-requested a review April 10, 2026 08:43
@sadamov sadamov self-assigned this Apr 10, 2026
@sadamov sadamov added the documentation Improvements or additions to documentation label Apr 10, 2026
@sadamov sadamov added this to the v0.7.0 milestone Apr 10, 2026
@sadamov
Collaborator

sadamov commented Apr 10, 2026

Okay I organized the hello_world issue and PRs:

If there was some oversight let me know, tried my best to look through all previous comms.

@sadamov sadamov linked an issue Apr 13, 2026 that may be closed by this pull request
@sadamov sadamov changed the title DANRA tutorial notebook with pytest-nbmake DANRA tutorial notebook with pytest-nbmake (#69) Apr 13, 2026
@sadamov sadamov changed the title DANRA tutorial notebook with pytest-nbmake (#69) Add DANRA tutorial notebook with pytest-nbmake (#69) Apr 13, 2026
@sadamov sadamov linked an issue Apr 13, 2026 that may be closed by this pull request
@Sharkyii Sharkyii force-pushed the add-notebook-ci-nbmake branch from 68046d5 to fd789f2 Compare April 15, 2026 12:58
Collaborator

@sadamov sadamov left a comment

Thanks for taking over here @Sharkyii. We are almost done! Please see inline suggestions for CHANGELOG.md, pyproject.toml, and the workflow, plus the notebook notes below.

Make sure to fully render the notebook before the next review. Also keep track of the other PRs that implement model weight loading (which fixes the workaround below) and the implementation of WMG, if it lands.

Notebook — please address before merging:

Cell 17 (graph visualisation): The current cell runs plot_graph via CLI and embeds the full graph_viz.html inline. The HTML is ~300 MB (g2m: 12 716 edges, m2g: 30 720 edges serialised as inline JSON) — this makes the notebook hang on render. Replace both the CLI cell and the display cell with a single Python cell that calls the plot_graph API directly, writes the full HTML for browser use, and displays a lightweight filtered view inline (M2M edges + mesh nodes only, <1 MB):

import os

import numpy as np
from IPython.display import HTML, display

from neural_lam import utils
from neural_lam.config import load_config_and_datastore
from neural_lam.plot_graph import plot_graph as _plot_graph

# Load the datastore and scale grid coordinates so the largest absolute value is 1
config_path = "tests/datastore_examples/mdp/danra_100m_winds/config.yaml"
_, datastore = load_config_and_datastore(config_path=config_path)
xy = datastore.get_xy("state", stacked=True)
grid_pos = xy / np.max(np.abs(xy))

# Load the previously created graph
graph_dir = os.path.join(datastore.root_path, "graph", "1level")
hierarchical, graph_ldict = utils.load_graph(graph_dir_path=graph_dir)

# Write the full interactive graph to disk for viewing in a browser
fig = _plot_graph(grid_pos=grid_pos, hierarchical=hierarchical, graph_ldict=graph_ldict)
fig.write_html("graph_viz.html", include_plotlyjs="cdn")
print("Full interactive graph saved to graph_viz.html; open in a browser.")

# Keep only the M2M edges and mesh nodes for the lightweight inline view
fig.data = tuple(t for t in fig.data if t.name in {"M2M", "Mesh nodes"})
display(HTML(fig.to_html(include_plotlyjs="cdn", full_html=False)))
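
One way to guard against embedding an oversized view again is to check the size of the HTML string before displaying it. The sketch below is an assumption-laden illustration, not part of the suggested cell: `html_size_mb` and `safe_to_embed` are hypothetical helpers, and the 1 MB limit simply mirrors the "<1 MB" target mentioned above.

```python
def html_size_mb(html: str) -> float:
    """Size of the UTF-8 encoded HTML in megabytes (10**6 bytes)."""
    return len(html.encode("utf-8")) / 1_000_000


def safe_to_embed(html: str, limit_mb: float = 1.0) -> bool:
    """True when the HTML is small enough to display inline in a notebook."""
    return html_size_mb(html) < limit_mb


# Synthetic strings standing in for fig.to_html(...) output.
small_view = "<div>" + "x" * 1_000 + "</div>"
full_view = "x" * 2_000_000

small_ok = safe_to_embed(small_view)
full_ok = safe_to_embed(full_view)
```

In the notebook, the inline `display(...)` call could be wrapped in such a check, falling back to printing the path of the saved HTML file.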

Cell 23 (sitecustomize workaround): sitecustomize.py only registers argparse.Namespace. PyTorch 2.6 also rejects all neural_lam.config dataclasses when loading a checkpoint with weights_only=True, causing _pickle.UnpicklingError. Extend the safe globals list:

import argparse

import torch

from neural_lam.config import (
    DatastoreSelection, ManualStateFeatureWeighting, NeuralLAMConfig,
    OutputClamping, TrainingConfig, UniformFeatureWeighting,
)

# Allowlist every class stored in the checkpoint so weights_only=True can load it
torch.serialization.add_safe_globals([
    argparse.Namespace, DatastoreSelection, ManualStateFeatureWeighting,
    NeuralLAMConfig, OutputClamping, TrainingConfig, UniformFeatureWeighting,
])
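
To illustrate why each class has to be listed, here is a standard-library analogy of allowlist-based unpickling. It is not PyTorch's implementation; `AllowlistUnpickler` and `restricted_loads` are hypothetical names, and the sketch only shows the general idea that a restricted loader rejects any global it has not been told to trust.

```python
import argparse
import io
import pickle


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only globals named in an explicit allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        # Map (module, qualified name) -> class for every allowlisted type.
        self._allowed = {(c.__module__, c.__qualname__): c for c in allowed}

    def find_class(self, module, name):
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"global '{module}.{name}' is not allowlisted"
            ) from None


def restricted_loads(data, allowed=()):
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


payload = pickle.dumps(argparse.Namespace(processor_layers=2))

# Without an allowlist entry the Namespace is rejected.
try:
    restricted_loads(payload)
    rejected = False
except pickle.UnpicklingError:
    rejected = True

# With the class allowlisted, loading succeeds.
ns = restricted_loads(payload, allowed=[argparse.Namespace])
```

The same reasoning applies to the config dataclasses: any class that appears in the checkpoint but is missing from the list would be expected to trigger the unpickling error.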

Cell 24 (eval command): --processor_layers 2 is set in the training cell but absent from the eval cell. The default is 4, so eval fails with RuntimeError: Missing key(s) in state_dict. Add --processor_layers 2 to the eval command.
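
A possible way to keep the two cells from drifting apart is to define the architecture flags once and reuse them for both commands. This is a sketch under assumptions: the `python -m neural_lam.train_model` entry point and the `<eval flags>` placeholder are illustrative only, and only `--processor_layers 2` is taken from the notebook.

```python
# Architecture flags that must be identical for training and evaluation.
ARCH_FLAGS = ["--processor_layers", "2"]

# Assumed entry point; adjust to whatever the notebook cells actually call.
BASE = ["python", "-m", "neural_lam.train_model"]


def build_command(base, extra=()):
    """Combine the base command, the shared architecture flags and extras."""
    return [*base, *ARCH_FLAGS, *extra]


train_cmd = build_command(BASE)
# "<eval flags>" is a placeholder for the evaluation-specific options.
eval_cmd = build_command(BASE, extra=["<eval flags>"])
```

Because both commands are built from `ARCH_FLAGS`, changing the number of processor layers in one place changes it for training and evaluation together.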

Cell 25 (eval output display): The cell searches for test_rmse.pdf and pred_*.png — neither matches what eval actually writes. All outputs land as PNGs in wandb/latest-run/files/media/images/: metric plots as test_rmse_*.png, example predictions as {var}_example_*.png. No forecast zarr is produced. Replace with:

import glob
import os

from IPython.display import Image, display

# Eval writes all plots as PNGs into the latest wandb run directory
img_dir = "wandb/latest-run/files/media/images"

# Metric plots: show the first RMSE scorecard if present
rmse_plots = sorted(glob.glob(os.path.join(img_dir, "test_rmse_*.png")))
if rmse_plots:
    print("RMSE scorecard:", rmse_plots[0])
    display(Image(filename=rmse_plots[0]))
else:
    print("test_rmse plot not found; check eval output above.")

# Example predictions: show at most two
example_plots = sorted(glob.glob(os.path.join(img_dir, "*_example_*.png")))
if example_plots:
    n_show = min(2, len(example_plots))
    print(f"Showing {n_show} of {len(example_plots)} prediction plot(s):")
    for p in example_plots[:n_show]:
        print(" ", p)
        display(Image(filename=p))
else:
    print("No prediction plots found; check eval output above.")

Comment thread CHANGELOG.md
Comment thread CHANGELOG.md
Comment thread pyproject.toml Outdated
Comment thread pyproject.toml Outdated
Comment thread .github/workflows/install-and-test.yml Outdated
Sharkyii and others added 2 commits April 22, 2026 14:32
Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>
Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>
Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>
@sadamov sadamov self-requested a review April 22, 2026 09:43
@Sharkyii
Contributor Author

@sadamov I was not able to test this because of an issue on my Linux install (I dual-boot Windows and Linux). It is fixed now, so I can continue working on this.
I will be back to this soon.

@Sharkyii
Contributor Author

For cell 24:
PyTorch 2.6 changed torch.load to use weights_only=True by default. When loading a checkpoint, it refuses to unpickle argparse.Namespace objects (which PyTorch Lightning stores inside the checkpoint) unless they are explicitly allowlisted via torch.serialization.add_safe_globals.
See the modified train_model.py in the PR.

@Sharkyii
Contributor Author

codespell................................................................Failed

  • hook id: codespell
  • exit code: 65

docs/notebooks/hello_world_danra.ipynb:412: fO ==> of, for, to, do, go
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:412: unx ==> unix
docs/notebooks/hello_world_danra.ipynb:412: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:930: Ot ==> To, Of, Or, Not, It
docs/notebooks/hello_world_danra.ipynb:930: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:930: fOf ==> for
docs/notebooks/hello_world_danra.ipynb:930: nd ==> and, 2nd
docs/notebooks/hello_world_danra.ipynb:930: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:948: OInt ==> point, pint, joint, lint
docs/notebooks/hello_world_danra.ipynb:948: FO ==> OF, FOR, TO, DO, GO
docs/notebooks/hello_world_danra.ipynb:948: tE ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:948: bu ==> by, be, but, bug, bun, bud, buy, bum
docs/notebooks/hello_world_danra.ipynb:965: fO ==> of, for, to, do, go
docs/notebooks/hello_world_danra.ipynb:965: buI ==> buoy, buy
docs/notebooks/hello_world_danra.ipynb:965: te ==> the, be, we, to
docs/notebooks/hello_world_danra.ipynb:965: bU ==> by, be, but, bug, bun, bud, buy, bum
docs/notebooks/hello_world_danra.ipynb:965: FO ==> OF, FOR, TO, DO, GO
docs/notebooks/hello_world_danra.ipynb:965: wHe ==> when, we

Because of this, I am disabling codespell for the notebook in the pre-commit run.
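
The flagged tokens look like fragments of encoded cell output rather than prose, although that is an assumption and not something the log confirms. If so, one alternative to disabling the check entirely would be to spell-check only the cell sources. The sketch below extracts them from a notebook already parsed into a dict (nbformat v4 layout); `notebook_source_text` is a hypothetical helper, not an existing tool.

```python
def notebook_source_text(nb):
    """Join the source of markdown and code cells, ignoring cell outputs."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") not in ("markdown", "code"):
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n".join(chunks)


# Tiny in-memory notebook with a fake encoded output attached to a code cell.
example_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Hello\n", "DANRA tutorial"]},
        {
            "cell_type": "code",
            "source": "print(1)",
            "outputs": [{"data": {"image/png": "fOteunx"}}],
        },
    ]
}

text = notebook_source_text(example_nb)
```

The extracted text could then be written to a temporary file and passed to the spell checker, so that real typos in the tutorial prose are still caught.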

