
Docs/hello world danra notebook 69 #202

Closed
Jayant-kernel wants to merge 14 commits into mllam:main from
Jayant-kernel:docs/hello-world-danra-notebook-69

Conversation

@Jayant-kernel
Contributor

Describe your changes

This PR adds a brief onboarding Jupyter notebook (hello_world_danra.ipynb) to address #69.
It uses the existing DANRA 100m winds example configuration to walk new users through:

  • CPU-safe environment setup
  • mllam-data-prep preprocessing
  • Single-level (1level) graph generation
  • CPU model training (1 epoch)
  • Evaluation and W&B offline logging
This is a draft to get feedback on the notebook content, defaults (e.g., 1level vs hierarchical), and location before finalizing.

Issue Link

Addresses #69

Type of change

  • Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch
  • I have performed a self-review of my code
  • I have given the PR a name that clearly describes the change, written in imperative form.
  • I have requested a reviewer and an assignee.

Checklist for reviewers

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • the PR is assigned to the next milestone
  • author has added an entry to the changelog

@Jayant-kernel
Contributor Author

@sadamov @joeloskarsson
Could you please review my draft and let me know whether it is okay?

@sadamov sadamov self-requested a review February 21, 2026 09:37
@sadamov
Collaborator

sadamov commented Feb 21, 2026

Thanks @Jayant-kernel for your work on this. I am having a look at the notebook right now. Could you in the meantime use the original PR template please? The current one seems to be missing some bullet points.

Collaborator

@sadamov sadamov left a comment


This is certainly going in the right direction, thanks again. What I really like is that the full notebook takes ~5 min on CPU. But the data fetching and the model training fail with the current default parameters. Just wanted to emphasize again that in our org it's fine to be slower but more accurate, and that PRs are tested by human hands before submission.

Three general points:

  1. have you already considered how you would keep this notebook up to date, so that the code cells still work for each release of neural-lam or mllam-data-prep (e.g. integrate notebook execution in a test)?
    a. one such change will be the introduction of weather-model-graphs for graph creation: #184
  2. it would be great if you could visualize the input data, the graph, and the model predictions, similar to this notebook here: https://github.com/joeloskarsson/neural-lam-dev/blob/research/docs/reproduce_paper_sample.md
  3. please fix the pre-commit errors

Here are some remarks from my initial walkthrough:

  • Before the cell that runs "import os", advise the user to choose a pre-installed Python version that matches the pyproject.toml constraints and provides ipykernel
  • Since the notebook is in ./docs/notebooks, you first have to cd into the repo root (ROOT); otherwise the cells will not run from ROOT
  • convert uv install into a code cell
  • The CLI expects config as a positional argument, not --config for mllam-data-prep
  • for the model training/eval you need to set --val_steps_to_log 1, because the default setting would validate ar_steps > 1, which we are not training on. You also need to set --num_workers to a higher value like 8 instead of 0, because neural-lam uses persistent workers by default, and 0 is not allowed.
  • you can also link this paper, actually describing the research with DANRA: https://arxiv.org/abs/2504.09340
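Put together, the CLI fixes above might look roughly like the sketch below. Only the positional config argument, --val_steps_to_log 1, and --num_workers 8 come from this review; the config filename, the entry points, and the elided training arguments ("...") are placeholders, not confirmed invocations:

```shell
# Preprocessing: the config is a positional argument, not --config
# ("example.danra.yaml" is a placeholder config name)
mllam-data-prep example.danra.yaml

# Training/eval: --val_steps_to_log 1 because the default would validate
# ar_steps > 1; --num_workers 8 because persistent workers disallow 0.
# The remaining required arguments are elided here.
python -m neural_lam.train_model ... --val_steps_to_log 1 --num_workers 8
```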

Now about the next steps: this issue is scheduled for version v0.7.0: https://github.com/mllam/neural-lam/milestone/12. That means we can merge it only after v0.6.0 is released. We discuss these milestones every month in the dev-meeting and they can be adjusted.
The work on this could be part of a GSoC proposal for the project about structured docs.

@sadamov sadamov added documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers labels Feb 21, 2026
@sadamov sadamov added this to the v0.7.0 milestone Feb 21, 2026
@sadamov sadamov assigned sadamov and Jayant-kernel and unassigned sadamov Feb 21, 2026
@Jayant-kernel
Contributor Author

@sadamov
Thanks for the detailed review — I’ve updated the draft notebook based on your comments.

Changes made

  • switched to the original PR template
  • added Python version / ipykernel note
  • added repo-root cd fix so cells run from docs/notebooks/
  • converted install steps into runnable code cells
  • fixed mllam_data_prep CLI usage (positional config arg)
  • fixed training/eval flags (--val_steps_to_log 1, --num_workers 8)
  • added basic visualizations (input data, graph preview, predictions)
  • added DANRA paper reference
  • fixed pre-commit issues

I also added a note about future maintainability (notebook execution test in CI) and the upcoming weather-model-graphs transition (#184).

Please take another look when you have time — happy to make more changes.

@varunsiravuri
Contributor

Nice work @Jayant-kernel! One small suggestion: it might be helpful to add a note in the notebook about the minimum required version of mllam-data-prep, so new users don't run into compatibility issues.
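A minimal version-guard cell along these lines could surface that requirement early. This is a sketch, not code from the PR: the 0.6.0 floor is taken from a later comment in this thread, and the naive numeric parsing is a simplification (pre-release strings such as "0.6.0rc1" fall through to "unknown"):

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical guard: minimum version per a later comment in this thread.
MIN_VERSION = (0, 6, 0)
try:
    raw = version("mllam-data-prep")
    # Compare only the first three numeric components of the version string.
    parts = tuple(int(p) for p in raw.split(".")[:3])
    status = "ok" if parts >= MIN_VERSION else "too old"
except PackageNotFoundError:
    status = "not installed"
except ValueError:
    status = "unknown"  # non-numeric component, e.g. a pre-release tag

print(f"mllam-data-prep: {status}")
```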

@Jayant-kernel
Contributor Author

Hey @sadamov, another round of fixes:

  • Checkpoint now picks the most recently modified .ckpt (not an arbitrary first match)
  • Prediction plots display inline after eval
  • Added mllam-data-prep >= 0.6.0 version note
  • Graph cell now lists the actual .pt files instead of loading a non-existent graph.pt
  • Fixed --config → positional arg in Scaling Tips

Happy to keep working.

@sadamov sadamov assigned sadamov and unassigned Jayant-kernel Feb 24, 2026
@Jayant-kernel Jayant-kernel marked this pull request as ready for review February 24, 2026 17:44
@sadamov sadamov self-requested a review February 26, 2026 16:11
Collaborator

@sadamov sadamov left a comment


I just ran this notebook on my 7 year-old laptop and it works perfectly! 🤯 this is great!

since it is a bit tricky to review notebooks, I pushed some smaller changes directly into your branch. please have a look. we should keep the notebook rendered, so that other users can refer to the expected output.

there are currently two major open points that I would like to see addressed:

  1. The plotting
    a. the input data should be a 2D horizontal plot of at least one of the 4 state features (atmospheric variables u100m v100m r2m t2m) and not a 1D-lineplot
    b. the graph should be plotted and vizualised with neural_lam/plot_graph.py
    c. the prediction plots fail to be displayed (and saved to disk?) please double check paths
  2. CI/CD integration: In some form or another we should make sure that this notebook keeps up to date with the codebase. But unlike tests you are allowed to break the notebook cells. But we need to realize that they are broken and fix it for each PR in the future. Maybe it could become a new entry in PR-temple? not sure, open to your suggestions.

@Jayant-kernel
Contributor Author

Thanks for the review @sadamov! All requested changes are pushed:

  • Formatting: updated the PR template & fixed pre-commit issues.
  • Notebook: added a kernel version warning and a cd-to-repo-root cell.
  • Setup: converted the uv install into a code cell.
  • CLI: fixed mllam-data-prep positional config args.
  • Plotting: replaced 1D plots with a 2D map & added graph visualization via plot_graph.py.
  • Performance: tuned training defaults for reliable CPU execution.
  • CI: happy to add an optional GH action to validate notebook runs for freshness.

Ready for another look!

@Jayant-kernel Jayant-kernel requested a review from sadamov March 2, 2026 08:41
@Mani212005
Contributor

@Jayant-kernel @sadamov I have been following the PR, it looks great!
One blocking issue I noticed while reviewing: the evaluation cell (Section 5) fails with a PyTorch 2.6 compatibility error visible in the captured output:

_pickle.UnpicklingError: Unsupported global: GLOBAL argparse.Namespace was not an allowed global by default.

This is because PyTorch 2.6 changed the default of weights_only in torch.load from False to True, which breaks loading checkpoints containing argparse.Namespace objects. A notebook-level workaround would be:

import argparse
import torch.serialization
torch.serialization.add_safe_globals([argparse.Namespace])

Though the cleaner fix is probably in train_model.py's checkpoint loading code rather than the notebook itself — worth checking if there's already an issue open for this.

@sadamov
Collaborator

sadamov commented Mar 6, 2026

@Mani212005 were you actually able to run the notebook? For me the file was corrupted and I couldn't even open it. I pushed a quick fix to remove the artifacts that caused the not-opening issue. Now, however, I see a lot of other characters that were introduced in one of the recent commits, e.g. ”œâ”€â”€. @Jayant-kernel it looks a lot like an LLM went rogue here. Maybe it's best to revert to commit e1bb57a and restart from there?

@kshirajahere
Contributor

@sadamov I diffed the notebook history locally.

what i can confirm from the file itself is:

  • e1bb57a is valid json and parses cleanly
  • ad32d03 is the commit that makes the notebook invalid json
  • the current tip badfe4b restores JSON validity, but the notebook content still looks damaged relative to e1bb57a

Reverting back to e1bb57a is the safer option. And afaik the PyTorch 2.6 checkpoint-loading error is a separate issue

@Mani212005
Contributor

@sadamov I saw the notebook source directly from the GitHub diff, not by running it locally. The PyTorch error I flagged was visible in the captured cell outputs embedded in the .ipynb JSON from a previous run.

@Jayant-kernel Jayant-kernel force-pushed the docs/hello-world-danra-notebook-69 branch from badfe4b to 44196b2 Compare March 6, 2026 08:56
- Revert to e1bb57a baseline to remove LLM-introduced encoding artifacts
- Replace 1D line plot in cell 12 with proper 2D pcolormesh map of
  state features using x/y grid coordinates from the zarr datastore
- Add PyTorch 2.6 fix cell before eval: registers argparse.Namespace
  as a safe global so torch.load does not fail with weights_only=True
- Clear stale error outputs from eval and prediction display cells
@Jayant-kernel Jayant-kernel force-pushed the docs/hello-world-danra-notebook-69 branch from 44196b2 to 42ad106 Compare March 6, 2026 08:57
@Jayant-kernel
Contributor Author

@sadamov ready for re-review

Collaborator

@sadamov sadamov left a comment


The notebook is working again, but previous comments still have not been addressed. For example, the plot of the graph object is missing and the plot of the prediction fails. Please also address the comment about torch > 2.6 from above.

Once you are sure you have addressed all open remarks, please update the changelog.

@yukthagangadhari5

Hi, I would like to work on this issue. Could you please assign it to me?

@sadamov
Collaborator

sadamov commented Mar 8, 2026

@yukthagangadhari5 @Jayant-kernel is already working on this, but maybe he would appreciate some help on the visualizations (input data, graphs, predictions). Best to ask them directly here

@yukthagangadhari5

Thanks for the suggestion! I’d be happy to help with the visualizations.

@Jayant-kernel would you like help adding plots for the input data, graphs, or model predictions in the notebook? Let me know what would be most useful.

@nsg365

nsg365 commented Mar 10, 2026

Hi @Jayant-kernel , I am interested in picking up this issue as part of the documentation generation project. I have the required skills in Python and Shell.

Before I get started, are there any specific documentation frameworks (like Sphinx or pdoc) the team prefers, or should I start by doing some research and proposing a solution? I would love to be assigned to this!

… notebook

- Add cell that runs neural_lam.plot_graph (--datastore_config_path,
  --graph 1level, --save graph_viz.html) to produce an interactive
  Plotly HTML visualisation of the message-passing graph (addresses
  reviewer request in mllam#202)
- Add cell that displays the saved graph_viz.html inline via IFrame
- Replace broken PNG-glob prediction display with a cell that loads
  the example_pred_*.pt / example_target_*.pt tensors saved by the
  wandb logger and renders a side-by-side Ground Truth vs Prediction
  pcolormesh plot inline
- Add CHANGELOG entry under [unreleased] / Added for PR mllam#202

Fixes remaining open points from sadamov's review on PR mllam#202.
@Jayant-kernel
Contributor Author

Hi @sadamov — addressed all remaining open points from your last review:

  • Graph visualisation: added two cells after the graph-creation step. The first runs neural_lam.plot_graph with --datastore_config_path, --graph 1level, and --save graph_viz.html; the second displays the interactive Plotly HTML inline via IFrame.
  • Prediction plots: replaced the broken PNG-glob approach. The new cell loads the example_pred_*.pt / example_target_*.pt tensors saved by the wandb logger under wandb/ and renders a side-by-side Ground Truth vs Prediction inline plot for the first AR step.
  • PyTorch 2.6 (flagged by @Mani212005): torch.serialization.add_safe_globals([argparse.Namespace]) is already in place before eval.
  • CHANGELOG: entry added under [unreleased].

@sadamov sadamov self-requested a review March 10, 2026 19:04
Collaborator

@sadamov sadamov left a comment


Thanks for adding these plots. Here are some issues I had when running the latest notebook additions:

  • cell 13: could we retrieve the units directly from the xarray dataset here, since the coords do have units attrs? That would make it even more instructive (e.g. x_units = ds.coords["x"].attrs.get("units", ""))
  • cell 19: for me the graph HTML does not render in VS Code (maybe related to security settings); could we use this instead: with open("graph_viz.html") as f: display(HTML(f.read()))
  • cell 26: the eval fails because of the way torch loads checkpoints, e.g. with torch=2.10.0. This issue will soon be addressed in #240, but until then we need another way to make it work here in this notebook. I can see the issue: cell 24 already registers the safe globals for the notebook kernel, but the eval command in cell 25 runs as a subprocess (!{sys.executable} -m ...), so the fix doesn't carry over. You can use a sitecustomize.py, for example.
    _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. 
    
  • cell 27: is currently broken even with cell 26 fixed, with RecursionError: maximum recursion depth exceeded. But instead of fixing it, I propose to stay closer to https://github.com/joeloskarsson/neural-lam-dev/blob/research/docs/reproduce_paper_sample.md for the final eval and visualizations. So: only generate one example_pred, save everything as zarr, visualize the test_rmse.pdf directly from the output, and also visualize the predictions based on plots generated during eval. Then there is no need to do the manual .pt loading and such. Let me know if you think some approaches in the notebook linked here could be improved; very open to suggestions.

and two more general remarks:

  • cell 8: when we use !{sys.executable} the first time, could you add a comment saying something along the lines of: "# sys.executable ensures we use the same Python as this notebook's kernel"
  • for the future maintainability we were talking about, I see diverging approaches. What do you think is the best way forward?
    1. automate pre-commits and GitHub workflows to nbmake the full notebook and ensure its compatibility
    2. add a note to the PR template that tells authors to check if their changes break the notebooks
    3. do nothing and fix the notebooks when broken; the argument here is that breaking changes to the core functionalities of the repo are rare.
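For option 1, a minimal GitHub Actions sketch using nbmake (a pytest plugin that executes notebooks and fails on any cell error) might look like the fragment below. The workflow name, trigger, Python version, and notebook path are placeholders, not settings from this repo:

```yaml
# .github/workflows/notebooks.yml (hypothetical; paths and versions are placeholders)
name: notebook-smoke-test
on:
  pull_request:
jobs:
  run-notebook:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install nbmake pytest .
      # nbmake executes the notebook top-to-bottom and fails on any cell error
      - run: pytest --nbmake docs/notebooks/hello_world_danra.ipynb
```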

@sadamov
Collaborator

sadamov commented Mar 31, 2026

@Jayant-kernel doing some housekeeping: there are many people interested in contributing to this PR. Are you still working on it, or is it okay if somebody else takes over?

@Sharkyii
Contributor

@sadamov i can help integrate this into a new PR if @Jayant-kernel is not active.
Will apply all suggestions mentioned up here!

@sadamov
Collaborator

sadamov commented Apr 10, 2026

superseded by #577

@sadamov sadamov closed this Apr 10, 2026