Add DANRA tutorial notebook with pytest-nbmake (#69) #577
Sharkyii wants to merge 14 commits into mllam:main from
Conversation
@sadamov once you're free, have a look here :)

Okay, I organized the hello_world issue and PRs.
If I overlooked something, let me know; I tried my best to look through all previous comments.
Force-pushed 68046d5 to fd789f2
…tebooks/conftest.py with session fixture to create zarr datastore
Thanks for taking over here @Sharkyii. We are almost done! Please see inline suggestions for CHANGELOG.md, pyproject.toml, and the workflow, plus the notebook notes below.
Make sure to fully render the notebook before the next review. Also track the other PRs that implement model weight loading (which fixes the workaround below) and the implementation of WMG if it lands.
Notebook — please address before merging:
Cell 17 (graph visualisation): The current cell runs plot_graph via CLI and embeds the full graph_viz.html inline. The HTML is ~300 MB (g2m: 12 716 edges, m2g: 30 720 edges serialised as inline JSON) — this makes the notebook hang on render. Replace both the CLI cell and the display cell with a single Python cell that calls the plot_graph API directly, writes the full HTML for browser use, and displays a lightweight filtered view inline (M2M edges + mesh nodes only, <1 MB):
import os

import numpy as np
from IPython.display import HTML, display

from neural_lam.config import load_config_and_datastore
from neural_lam import utils
from neural_lam.plot_graph import plot_graph as _plot_graph

config_path = "tests/datastore_examples/mdp/danra_100m_winds/config.yaml"
_, datastore = load_config_and_datastore(config_path=config_path)
xy = datastore.get_xy("state", stacked=True)
grid_pos = xy / np.max(np.abs(xy))
graph_dir = os.path.join(datastore.root_path, "graph", "1level")
hierarchical, graph_ldict = utils.load_graph(graph_dir_path=graph_dir)
fig = _plot_graph(grid_pos=grid_pos, hierarchical=hierarchical, graph_ldict=graph_ldict)

# Write the full interactive graph for browser use.
fig.write_html("graph_viz.html", include_plotlyjs="cdn")
print("Full interactive graph saved to graph_viz.html; open it in a browser.")

# Keep only the lightweight traces for inline display (<1 MB).
fig.data = tuple(t for t in fig.data if t.name in {"M2M", "Mesh nodes"})
display(HTML(fig.to_html(include_plotlyjs="cdn", full_html=False)))

Cell 23 (sitecustomize workaround): sitecustomize.py only registers argparse.Namespace. PyTorch 2.6 also rejects all neural_lam.config dataclasses when loading a checkpoint with weights_only=True, causing _pickle.UnpicklingError. Extend the safe globals list:
import argparse

import torch

from neural_lam.config import (
    DatastoreSelection, ManualStateFeatureWeighting, NeuralLAMConfig,
    OutputClamping, TrainingConfig, UniformFeatureWeighting,
)

torch.serialization.add_safe_globals([
    argparse.Namespace, DatastoreSelection, ManualStateFeatureWeighting,
    NeuralLAMConfig, OutputClamping, TrainingConfig, UniformFeatureWeighting,
])

Cell 24 (eval command): --processor_layers 2 is set in the training cell but absent from the eval cell. The default is 4, so eval fails with RuntimeError: Missing key(s) in state_dict. Add --processor_layers 2 to the eval command.
Cell 25 (eval output display): The cell searches for test_rmse.pdf and pred_*.png — neither matches what eval actually writes. All outputs land as PNGs in wandb/latest-run/files/media/images/: metric plots as test_rmse_*.png, example predictions as {var}_example_*.png. No forecast zarr is produced. Replace with:
import glob
import os

from IPython.display import Image, display

img_dir = "wandb/latest-run/files/media/images"

# Metric plots are written as test_rmse_*.png.
rmse_plots = sorted(glob.glob(os.path.join(img_dir, "test_rmse_*.png")))
if rmse_plots:
    print("RMSE scorecard:", rmse_plots[0])
    display(Image(filename=rmse_plots[0]))
else:
    print("test_rmse plot not found; check eval output above.")

# Example predictions are written as {var}_example_*.png.
example_plots = sorted(glob.glob(os.path.join(img_dir, "*_example_*.png")))
if example_plots:
    n_show = min(2, len(example_plots))
    print(f"Showing {n_show} of {len(example_plots)} prediction plot(s):")
    for p in example_plots[:n_show]:
        print(" ", p)
        display(Image(filename=p))
else:
    print("No prediction plots found; check eval output above.")

Co-authored-by: sadamov <45732287+sadamov@users.noreply.github.com>
@sadamov I was not able to test it because of an issue on my Linux install; it is fixed now, so I can continue with this. [I am dual-booting Windows + Linux]
For cell 24:
codespell................................................................Failed
docs/notebooks/hello_world_danra.ipynb:412: fO ==> of, for, to, do, go

Due to this, codespell is disabled for the notebook in the pre-commit run.
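One way to wire that up (a sketch; the hook id matches the upstream codespell pre-commit hook, but the rev and the repo's existing config are assumptions) is an exclude pattern in .pre-commit-config.yaml:

```yaml
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6  # assumed; pin to whatever the repo already uses
    hooks:
      - id: codespell
        # Skip the tutorial notebook, whose embedded outputs trip codespell.
        exclude: ^docs/notebooks/hello_world_danra\.ipynb$
```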
Describe your changes
Added hello_world_danra.ipynb, an end-to-end tutorial demonstrating neural-lam training on a small DANRA dataset (data prep → graph creation → 1-epoch CPU training → evaluation). This is taken from #202; credit to @Jayant-kernel.
Enabled notebook CI using pytest-nbmake: notebooks under docs/notebooks/ now run as pytest tests, with a conftest.py fixture pre-creating danra.datastore.zarr via MDPDatastore to avoid runtime downloads, and the notebook skips data prep if the datastore already exists. Dev dependencies updated with nbmake>=1.5.0 and ipykernel>=6.0.0.
Issue Link
Solves #69
Type of change
Checklist before requesting a review
pull with --rebase option if possible).
Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee