Temporal aggregation infrastructure (Steps 1-3) #15

Draft
espg wants to merge 2 commits into main from magg_temporal


@espg espg commented Apr 7, 2026

Summary

Adds the infrastructure for temporal aggregation pipelines, enabling magg to support use cases beyond spatial binning (e.g., computing storm summary statistics from reanalysis data). Implements Steps 1–3 from the implementation plan on #12.

Motivated by the aggregation patterns in antarctic_AR_dataset, which performs the same orchestration/dispatch, but for temporal reduction of gridded fields under spatial masks.

What's done

  • pipeline.type config field — configs can now declare themselves as spatial (default, backward compatible), temporal, or event. Validation branches per type: spatial pipelines validate grid/source/function; temporal pipelines validate collections, spatial_func, and temporal_reducer references.

  • Shared orchestration (orchestrate.py) — extracted the Lambda dispatch machinery (ThreadPoolExecutor + boto3 invoke + retry + log parsing + cost estimation) that was duplicated between magg and antarctic_AR_dataset. invoke_lambda.py refactored to use it.

  • Generalized auth: get_s3_credentials(daac=) accepts any DAAC (NSIDC, GES_DISC, etc.); get_nsidc_s3_credentials kept as alias.

  • Temporal building blocks (temporal.py) — ported from antarctic_AR_dataset:

    • 5 streaming accumulators: Max, Min, Sum, WeightedMean, FirstLandfallCapture
    • 6 per-timestep spatial functions: max, min, weighted_sum, weighted_mean, max_gradient, min_over_levels
    • specs_from_config() bridge to convert YAML config to internal spec dicts
  • merra2_storm.yaml built-in config — 15 storm metrics matching the antarctic_AR_dataset registry, fully validated by the temporal config path.

  • Tests — 187 pass (88 new). Covers accumulators, spatial functions with synthetic xarray grids, config roundtrips, temporal validation, orchestrate pure functions.
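
To make the accumulator / spatial-function split concrete, here is a minimal sketch of the streaming pattern. It is not the code in temporal.py: the class names, the update()/result() interface, and masked_max() are hypothetical, and plain nested lists stand in for xarray grids so the example runs without dependencies.

```python
import math
from typing import List, Optional


class MaxAccumulator:
    """Hypothetical streaming max: keeps only the running maximum."""

    def __init__(self) -> None:
        self.value: Optional[float] = None

    def update(self, x: Optional[float]) -> None:
        # Skip missing inputs (None or NaN) so they never poison the result.
        if x is None or math.isnan(x):
            return
        if self.value is None or x > self.value:
            self.value = x

    def result(self) -> Optional[float]:
        return self.value


class WeightedMeanAccumulator:
    """Hypothetical streaming weighted mean: tracks two running sums."""

    def __init__(self) -> None:
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def update(self, x: Optional[float], weight: float = 1.0) -> None:
        if x is None or math.isnan(x):
            return
        self.weighted_sum += x * weight
        self.total_weight += weight

    def result(self) -> Optional[float]:
        # No valid input yet: report None instead of dividing by zero.
        if self.total_weight == 0.0:
            return None
        return self.weighted_sum / self.total_weight


def masked_max(field: List[List[float]], mask: List[List[bool]]) -> Optional[float]:
    """Per-timestep spatial function: max of `field` where `mask` is True."""
    values = [
        v
        for field_row, mask_row in zip(field, mask)
        for v, m in zip(field_row, mask_row)
        if m and not math.isnan(v)
    ]
    return max(values) if values else None


# Two synthetic timesteps on a 2x2 grid; the mask excludes the top-right cell.
mask = [[True, False], [True, True]]
timesteps = [
    [[1.0, 9.0], [2.0, 3.0]],
    [[4.0, 9.0], [float("nan"), 0.5]],
]

peak = MaxAccumulator()
mean = WeightedMeanAccumulator()
for field in timesteps:
    step_max = masked_max(field, mask)  # reduce space first ...
    peak.update(step_max)               # ... then fold into time
    mean.update(step_max)

print(peak.result(), mean.result())
```

Each timestep is reduced to a scalar before it reaches an accumulator, so memory use stays constant no matter how many timesteps an event spans.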

What remains

- process_storm() — the main temporal processing function (analogous to process_morton_cell() for spatial). Reads MERRA-2 from S3 via xarray, applies spatial funcs per timestep, feeds streaming accumulators, returns scalar dict. To be ported/adapted from artools/cloud/worker.py.
- Temporal Lambda handler: deployment/aws/temporal_handler.py wrapping process_storm() with serialization/deserialization of event payloads (base64 storm masks, etc.).
- Temporal orchestrator CLI — a main() equivalent that loads storm catalogs, builds granule indices, and dispatches temporal Lambda invocations using orchestrate.dispatch_lambda().

The above was the original plan, written before #13 landed. Now that runner.py provides the composable agg() API with pluggable backends, the remaining work is:

  • process_event() — the temporal equivalent of process_morton_cell(). Reads gridded data from S3 via xarray, applies spatial funcs per timestep, feeds streaming accumulators, returns scalar dict per event. Ported/adapted from artools/cloud/worker.py. Named process_event() (not process_storm()) since it generalizes beyond storms.
  • Extend agg() in runner.py — branch on get_pipeline_type(config) == "temporal" to call _run_temporal_local() / _run_temporal_lambda(), which iterate over events (not morton cells) and call process_event(). The existing CLI (invoke_lambda.py) would work for temporal pipelines with no changes — just pass --config merra2_storm.yaml.
  • DRY up runner.py Lambda dispatch: _run_lambda() / _invoke_lambda_cell() duplicate the retry + ThreadPoolExecutor logic that orchestrate.dispatch_lambda() now provides. Should refactor to use it.
  • Derived variable convention — how to handle computed variables like _rainfall = PRECCU + PRECLS in config (currently special-cased in artools worker).
  • Event pipeline type (pipeline.type: event) — dynamic geometry, lifecycle metadata. Deferred until temporal works end-to-end.
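
A rough sketch of how the agg() branch could look, under stated assumptions: get_pipeline_type() and process_event() are named in the plan above, but the signatures, the config layout (a nested "pipeline" mapping), and run_temporal_local() shown here are hypothetical stand-ins, and the worker body is a placeholder rather than the real xarray logic.

```python
from typing import Any, Dict, List, Optional

VALID_TYPES = ("spatial", "temporal", "event")


def get_pipeline_type(config: Dict[str, Any]) -> str:
    """Return pipeline.type, defaulting to 'spatial' for older configs."""
    ptype = config.get("pipeline", {}).get("type", "spatial")
    if ptype not in VALID_TYPES:
        raise ValueError(f"unknown pipeline.type: {ptype!r}")
    return ptype


def process_event(event: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, float]:
    """Placeholder worker: the real one would read gridded data and
    feed streaming accumulators. Returns one scalar dict per event."""
    return {"n_timesteps": float(len(event["timesteps"]))}


def run_temporal_local(
    config: Dict[str, Any], events: List[Dict[str, Any]]
) -> List[Dict[str, float]]:
    # Temporal pipelines iterate over events, not morton cells.
    return [process_event(event, config) for event in events]


def agg(
    config: Dict[str, Any],
    events: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, float]]:
    """Hypothetical entry point that branches on the pipeline type."""
    ptype = get_pipeline_type(config)
    if ptype == "temporal":
        return run_temporal_local(config, events or [])
    if ptype == "spatial":
        raise NotImplementedError("spatial path omitted from this sketch")
    raise NotImplementedError("event pipelines are deferred")


config = {"pipeline": {"type": "temporal"}}
events = [{"id": "storm-1", "timesteps": [0, 1, 2]}]
results = agg(config, events=events)
print(results)
```

Defaulting to "spatial" when the field is absent is what keeps existing configs working unchanged.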

Closes #12 when all remaining items are complete.

Test plan

  • All 187 existing + new tests pass (pytest tests/)
  • Backward compatibility: default_config("atl06") works unchanged, defaults to spatial
  • merra2_storm.yaml loads and validates as temporal pipeline
  • Accumulator edge cases: empty, NaN, None inputs
  • Spatial functions tested with synthetic 3×3 xarray grids
  • Orchestrate cost estimation and log parsing verified with synthetic data
  • Integration test with actual MERRA-2 data (deferred — requires S3 access)
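
For reference, the kind of pure function the log-parsing and cost-estimation tests exercise can be sketched as below. This is not the code in orchestrate.py: the function names are hypothetical, the sample line only mimics the Lambda REPORT format with made-up numbers, and the per-GB-second price is passed in by the caller (the value used here is a placeholder, not a quoted AWS rate).

```python
import re
from typing import Dict, Optional

# Matches the billed duration and memory size fields of a Lambda REPORT line.
REPORT_RE = re.compile(
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB"
)


def parse_report_line(line: str) -> Optional[Dict[str, int]]:
    """Extract billed duration (ms) and memory size (MB), or None if absent."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def estimate_cost_usd(billed_ms: int, memory_mb: int, usd_per_gb_second: float) -> float:
    """Compute cost as GB-seconds times a caller-supplied price."""
    gb_seconds = (billed_ms / 1000.0) * (memory_mb / 1024.0)
    return gb_seconds * usd_per_gb_second


# Synthetic log line; values are invented for the example.
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 1999.20 ms\tBilled Duration: 2000 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 300 MB"
)

report = parse_report_line(sample)
cost = estimate_cost_usd(
    report["billed_ms"], report["memory_mb"], usd_per_gb_second=0.00002
)
print(report, cost)
```

Keeping both functions free of boto3 calls is what lets them be verified with synthetic data only.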

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>