feat: add plotting subpackage for radar dataset diagnostics #30

Open
franchg wants to merge 10 commits into main from feat/plotting-subpackage

@franchg franchg commented Feb 18, 2026

Summary

  • Adds mlcast_datasets.plotting with 7 modules for radar dataset diagnostics:
    • domain_map — domain overview map with spatial coverage overlay
    • monthly_cycle — monthly precipitation climatology boxplot
    • precipitation_stats — mean/max/std maps + value histogram
    • sample_precipitation — precipitation event snapshot grid
    • spatial_coverage — data coverage fraction heatmap
    • temporal_coverage — monthly completeness heatmap + yearly bar chart
    • summary_table — metadata summary returning pd.DataFrame (CSV-saveable)
  • Makes plotting an optional install: pip install 'mlcast-datasets[plotting]'
  • Adds [tool.isort] profile = "black" to resolve pre-commit hook conflict
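
With plotting as an optional install, the subpackage ideally fails with a clear message when the extras are missing. The following is only a minimal sketch of such a guard: `require_plotting_deps` is a hypothetical helper that is not taken from this PR, and the default module names are simply the packages listed for the plotting extra.

```python
import importlib.util

# Hypothetical constant and helper for illustration; the PR may handle this differently.
PLOTTING_EXTRA = "mlcast-datasets[plotting]"


def require_plotting_deps(
    modules=("cartopy", "dask", "matplotlib", "numpy", "pandas"),
):
    """Raise ImportError with an install hint if any listed module is missing."""
    # find_spec returns None for a top-level module that cannot be found.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"Missing optional dependencies: {', '.join(missing)}. "
            f"Install them with: pip install '{PLOTTING_EXTRA}'"
        )
```

Calling such a guard at the top of the plotting subpackage would keep `import mlcast_datasets` working without the extras while still giving users an actionable error when they reach for a plotting function.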

Test plan

  • uv run pytest src/mlcast_datasets/tests/ -q — all tests pass
  • uv pip install -e . — installs without plotting extras
  • python -c "import mlcast_datasets" — core import works
  • uv pip install -e ".[plotting]" — installs plotting extras
  • python -c "from mlcast_datasets.plotting import plot_domain_map" — plotting import works

Adds mlcast_datasets.plotting with 10 modules covering:
- Domain overview map with spatial coverage overlay
- Monthly precipitation climatology (boxplot)
- Precipitation statistics: mean/max/std maps + value histogram
- Sample precipitation event maps
- Spatial data coverage heatmap
- Temporal completeness heatmap + yearly timestep bar chart
- Summary metadata table (returns pd.DataFrame, saves as CSV)

Also makes plotting an optional install:
  pip install 'mlcast-datasets[plotting]'
(cartopy, dask, matplotlib, numpy, pandas)

Removes numpy, pandas, jupyter-server, ipykernel, xarray, cartopy,
matplotlib, tqdm from mandatory core dependencies. Docs-only packages
(jupyter-server, ipykernel) moved to [dependency-groups].docs.

Adds [tool.isort] profile = "black" to pyproject.toml to resolve
isort/black pre-commit hook conflict.

franchg commented Feb 18, 2026

EXAMPLE OF USAGE

import mlcast_datasets
from mlcast_datasets.plotting import (
    generate_summary_table,
    plot_domain_map,
    plot_monthly_cycle,
    plot_precipitation_stats,
    plot_sample_precipitation,
    plot_spatial_coverage,
    plot_temporal_coverage,
)

cat = mlcast_datasets.open_catalog()
ds = cat.precipitation.it_dpc_sri_5min.to_dask()

_ = plot_domain_map(ds.sel(time=slice("2025-01-01", None)), n_coverage_samples=1000)
[image: plot_domain_map output]

_ = plot_spatial_coverage(ds, n_samples=10000)
[image: plot_spatial_coverage output]

_ = plot_monthly_cycle(ds, n_samples=10000)
[image: plot_monthly_cycle output]

_ = plot_sample_precipitation(ds, time_slice=slice("2023-07-01", "2023-07-02"), time_spacing_hours=1)
[image: plot_sample_precipitation output]

_ = plot_temporal_coverage(ds)
[image: plot_temporal_coverage output]

_ = plot_precipitation_stats(ds.sel(time=slice("2020-01-01", None)))
[images: plot_precipitation_stats output, two figures]

generate_summary_table(ds)
|    | Property            | Value                                              |
|----|---------------------|----------------------------------------------------|
| 0  | Time range          | 2010-01-01 to 2025-12-31                           |
| 1  | Total timesteps     | 1,039,785                                          |
| 2  | Missing timesteps   | 15,530 (1.5%)                                      |
| 3  | Grid dimensions     | 1400 × 1200 pixels                                 |
| 4  | Spatial resolution  | 1 km                                               |
| 5  | Data variable       | RR (Total precipitation rate)                      |
| 6  | Units               | kg m-2 h-1                                         |
| 7  | Data type           | float32                                            |
| 8  | Uncompressed volume | 7.0 TB                                             |
| 9  | Compressed volume   | N/A                                                |
| 10 | Compression ratio   | N/A                                                |
| 11 | Temporal frequency  | 15min (2010-01-01-2014-06-25), 10min (2014-06-... |
| 12 | CRS                 | Transverse Mercator                                |
| 13 | License             | CC-BY-SA-4.0                                       |
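
As a rough sanity check, two of the figures in the table above can be reproduced from the other rows. This is a sketch only: it assumes the uncompressed volume is the size of one dense float32 array in decimal terabytes and that the missing percentage is taken relative to the total timestep count. How `generate_summary_table` actually computes these values is not shown in this thread.

```python
# Values copied from the summary table.
total_timesteps = 1_039_785
missing_timesteps = 15_530
ny, nx = 1400, 1200      # grid dimensions in pixels (axis order assumed)
bytes_per_value = 4      # float32

# Size of one dense array in decimal terabytes (1 TB = 1e12 bytes, assumed).
uncompressed_tb = total_timesteps * ny * nx * bytes_per_value / 1e12

# Assumption: percentage relative to the total timestep count.
missing_pct = 100 * missing_timesteps / total_timesteps

print(f"{uncompressed_tb:.1f} TB, {missing_pct:.1f}% missing")
# prints "7.0 TB, 1.5% missing"
```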

@leifdenby leifdenby self-requested a review February 24, 2026 13:50

@leifdenby leifdenby left a comment

Could you add docstrings throughout? I will give it a thorough review once I have those :)

Comment thread src/mlcast_datasets/plotting/_map_helpers.py
franchg and others added 3 commits March 2, 2026 13:57
Convert all 21 functions across 10 files in the plotting subpackage
from one-liner or Google-style docstrings to full numpydoc format
with Parameters, Returns, and Raises sections.
Demonstrates all plotting functions with small sample sizes for CI.
Includes install instructions for the plotting extra.
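
For readers unfamiliar with the numpydoc layout mentioned in the commit message above, here is a small illustrative example with Parameters, Returns, and Raises sections. The function `coverage_fraction` is hypothetical and is not one of the 21 functions in the PR.

```python
def coverage_fraction(valid_count, total_count):
    """
    Compute the fraction of valid samples.

    Parameters
    ----------
    valid_count : int
        Number of samples containing valid (non-missing) data.
    total_count : int
        Total number of samples inspected.

    Returns
    -------
    float
        Fraction of valid samples, between 0 and 1.

    Raises
    ------
    ValueError
        If ``total_count`` is not positive or ``valid_count`` is out of range.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= valid_count <= total_count:
        raise ValueError("valid_count must be between 0 and total_count")
    return valid_count / total_count
```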
@franchg franchg requested a review from leifdenby March 2, 2026 13:46
@leifdenby

This looks great @franchg :)

I have opened PR #37, which ensures that we build notebooks in CI for PRs (that doesn't happen now; I had overlooked it) and also posts a comment with a link to the preview build. Maybe we could merge that first and then use it to check how long your notebooks take to build?

@leifdenby leifdenby added this to the v0.3.0 milestone Apr 10, 2026
@leifdenby leifdenby modified the milestones: v0.3.0, v0.4.0 Apr 14, 2026
@github-actions

View preview of built jupyterbooks on https://mlcast-community.github.io/mlcast-datasets/pr-preview/pr-30/
(preview is automatically rebuilt and uploaded on later commits)

@leifdenby

The rendered notebooks look great @franchg, but they increase the execution time of the jupyterbook build from ~4 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24443479747/job/71413893785) to ~15 min (https://github.com/mlcast-community/mlcast-datasets/actions/runs/24993116111/job/73183302255).

Maybe we need to think about how we can reduce the long-running computations a bit? Otherwise we need to work out how to execute the notebook build closer to the data (i.e. on an EWC host).
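
One possible direction for cutting the build time, sketched under assumptions: use a smaller sample size when the notebooks run in CI and spread the sampled timesteps evenly over the record. Both helpers below are hypothetical and not part of this PR, and whether sampling is the actual bottleneck has not been established in this thread.

```python
import os


def pick_n_samples(full=10_000, reduced=200, env=None):
    """Return a smaller sample size when running under CI."""
    env = os.environ if env is None else env
    # GitHub Actions sets CI=true by default; other runners may differ.
    return reduced if env.get("CI", "").lower() == "true" else full


def evenly_spaced_indices(n_total, n_samples):
    """Return up to n_samples indices spread evenly over range(n_total)."""
    if n_total <= 0 or n_samples <= 0:
        return []
    n = min(n_total, n_samples)
    step = n_total / n
    return [int(i * step) for i in range(n)]
```

The chosen value could then be passed as the `n_samples` argument that calls such as `plot_spatial_coverage(ds, n_samples=...)` already accept in the usage example. The default sizes here are placeholders and would need tuning against the real build.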
