feat: add plotting subpackage for radar dataset diagnostics #30
Adds mlcast_datasets.plotting with 10 modules covering:

- Domain overview map with spatial coverage overlay
- Monthly precipitation climatology (boxplot)
- Precipitation statistics: mean/max/std maps + value histogram
- Sample precipitation event maps
- Spatial data coverage heatmap
- Temporal completeness heatmap + yearly timestep bar chart
- Summary metadata table (returns pd.DataFrame, saves as CSV)

Also makes plotting an optional install: pip install 'mlcast-datasets[plotting]' (cartopy, dask, matplotlib, numpy, pandas).

Removes numpy, pandas, jupyter-server, ipykernel, xarray, cartopy, matplotlib, tqdm from mandatory core dependencies. Docs-only packages (jupyter-server, ipykernel) moved to [dependency-groups].docs.

Adds [tool.isort] profile = "black" to pyproject.toml to resolve isort/black pre-commit hook conflict.
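For readers less familiar with optional extras, here is a minimal sketch of how a package could check that its optional plotting dependencies are importable before using them. The helper names `missing_plotting_deps` and `require_plotting_deps` and the error message are hypothetical and not taken from this PR; only the dependency list mirrors the extras named above.

```python
import importlib.util

# Import names of the packages listed for the [plotting] extra in this PR.
PLOTTING_DEPS = ("cartopy", "dask", "matplotlib", "numpy", "pandas")


def missing_plotting_deps(deps=PLOTTING_DEPS):
    """Return the subset of `deps` that the import system cannot find."""
    return [name for name in deps if importlib.util.find_spec(name) is None]


def require_plotting_deps(deps=PLOTTING_DEPS):
    """Raise ImportError with an install hint if any dependency is missing."""
    missing = missing_plotting_deps(deps)
    if missing:
        raise ImportError(
            "Missing optional plotting dependencies: "
            + ", ".join(missing)
            + ". Install them with: pip install 'mlcast-datasets[plotting]'"
        )
```

A guard like this, called at the top of a plotting module, would turn a bare `ModuleNotFoundError` into a message that points users to the extra. Whether the PR does anything similar is not stated here.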
leifdenby
left a comment
Could you add docstrings throughout? I will give it a thorough review once I have those :)
Convert all 21 functions across 10 files in the plotting subpackage from one-liner or Google-style docstrings to full numpydoc format with Parameters, Returns, and Raises sections.
Demonstrates all plotting functions with small sample sizes for CI. Includes install instructions for the plotting extra.
This looks great @franchg :) I have made PR #37, which ensures that we build notebooks in CI for PRs (that doesn't happen now; I had overlooked that). It will also comment with a link to the preview build. Maybe we could merge that first, and then use it to check how long your notebooks take to build?
View preview of built jupyterbooks on https://mlcast-community.github.io/mlcast-datasets/pr-preview/pr-30/
The rendered notebooks look great @franchg, but they increase the jupyterbook build time from ~4 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24443479747/job/71413893785) to ~15 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24993116111/job/73183302255). Maybe we need to think about how we can reduce the long-running computations a bit? Otherwise we need to work out how to execute the notebook build closer to the data (i.e. on an EWC host).
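One possible way to cut notebook run time, sketched under assumptions, is to subsample timesteps evenly before computing diagnostics. The helper `evenly_spaced_indices` below is hypothetical and not part of this PR; with xarray, the resulting indices could be passed to `Dataset.isel(time=...)`.

```python
def evenly_spaced_indices(n_total, n_samples):
    """Return up to `n_samples` evenly spaced indices into range(n_total)."""
    if n_total <= 0 or n_samples <= 0:
        return []
    if n_samples >= n_total:
        # Fewer timesteps than requested samples: use all of them.
        return list(range(n_total))
    step = n_total / n_samples  # step > 1, so indices are strictly increasing
    return [int(i * step) for i in range(n_samples)]
```

Even spacing keeps samples spread across the whole record, which matters for diagnostics such as monthly climatology, although statistics like the maximum would then only be approximate.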

Summary
- Adds `mlcast_datasets.plotting` with 7 modules for radar dataset diagnostics:
  - `domain_map`: domain overview map with spatial coverage overlay
  - `monthly_cycle`: monthly precipitation climatology boxplot
  - `precipitation_stats`: mean/max/std maps + value histogram
  - `sample_precipitation`: precipitation event snapshot grid
  - `spatial_coverage`: data coverage fraction heatmap
  - `temporal_coverage`: monthly completeness heatmap + yearly bar chart
  - `summary_table`: metadata summary returning `pd.DataFrame` (CSV-saveable)
- Makes plotting an optional install: `pip install 'mlcast-datasets[plotting]'`
- Adds `[tool.isort] profile = "black"` to resolve pre-commit hook conflict

Test plan
- `uv run pytest src/mlcast_datasets/tests/ -q`: all tests pass
- `uv pip install -e .`: installs without plotting extras
- `python -c "import mlcast_datasets"`: core import works
- `uv pip install -e ".[plotting]"`: installs plotting extras
- `python -c "from mlcast_datasets.plotting import plot_domain_map"`: plotting import works