Python API for notebook/JupyterHub use (#13) by espg · Pull Request #14 · englacial/zagg

espg · 2026-04-07T20:47:29Z

Summary

Extract orchestration logic from CLI scripts into magg.runner.agg() Python API
Enable pipeline execution from Jupyter notebooks and JupyterHub environments
CLI (python -m magg) becomes a thin wrapper around agg()
Supports backend="local" (ThreadPoolExecutor) and backend="lambda" (AWS Lambda)
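
For readers unfamiliar with the local backend, it can be pictured as a thread-pool fan-out over cells. This is a minimal sketch and not the code in magg.runner: `process_cell` and `run_local` are hypothetical names, and the per-cell work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_cell(cell_id):
    # Placeholder for the real per-cell aggregation (hypothetical).
    return {"cell": cell_id, "status": "ok"}


def run_local(cell_ids, max_workers=4):
    """Fan cells out over a thread pool and collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_cell, c): c for c in cell_ids}
        for fut in as_completed(futures):
            cell = futures[fut]
            try:
                results.append(fut.result())
            except Exception as exc:
                # Record the failure instead of aborting the whole run.
                results.append({"cell": cell, "status": "error", "error": str(exc)})
    # Completion order is nondeterministic, so sort for a stable summary.
    return sorted(results, key=lambda r: r["cell"])
```

Threads suit this kind of workload when each cell spends most of its time waiting on I/O; whether that holds for the actual aggregation is not stated here.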

API

from magg import load_config, agg

config = load_config("atl06.yaml")

# Local processing
results = agg(config, catalog="catalog.json", store="./output.zarr", max_cells=5)

# Lambda backend
results = agg(config, catalog="catalog.json", store="s3://bucket/out.zarr", backend="lambda")

# Dry run
results = agg(config, catalog="catalog.json", store="./out.zarr", dry_run=True)

Remaining work (issue #13)

Notebook example demonstrating JupyterHub usage (operator-managed + BYOC)
Refactor invoke_lambda.py to use agg(backend="lambda") internally
Verify credential flows in JupyterHub environments (env vars, uploaded files)
Future backends: Step Functions (backend="sfn"), Lithops (backend="lithops")

Test plan

Validation tests (missing catalog, missing store, unknown backend, lambda requires s3://)
Dry run tests (summary, max_cells, morton_cell, invalid cell)
Cell selection tests (all, max_cells, morton_cell, invalid)
Config fallback tests (catalog from config, store from config)
Full test suite passes (142 tests)
End-to-end notebook test on JupyterHub
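
The validation and config-fallback cases listed above could be exercised against logic along these lines. This is a sketch under assumptions: `resolve_args` is a hypothetical helper, the config is treated as dict-like, and `ValueError` is an assumed error type; the real agg() may structure this differently.

```python
def resolve_args(config, catalog=None, store=None, backend="local"):
    """Apply config fallbacks, then validate (hypothetical helper)."""
    # Explicit arguments win; otherwise fall back to the config
    # (assumed to be dict-like for this sketch).
    catalog = catalog or config.get("catalog")
    store = store or config.get("store")

    if not catalog:
        raise ValueError("catalog is required (argument or config)")
    if not store:
        raise ValueError("store is required (argument or config)")
    if backend not in ("local", "lambda"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "lambda" and not store.startswith("s3://"):
        raise ValueError("backend='lambda' requires an s3:// store")
    return catalog, store, backend
```

Doing all checks before any work starts means a bad argument fails fast, which matters most for the Lambda backend where invocations cost money.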

🤖 Generated with Claude Code

espg · 2026-04-07T21:50:40Z

Note on HTTPS base path mapping

The driver=https support uses a base URL mapping stored in catalog metadata to rewrite S3 URLs to HTTPS URLs at runtime. Here's how it works:

At catalog build time, _extract_base_urls() grabs both the S3 and HTTPS URLs from CMR for the first granule that has both, then derives the common suffix to extract the divergent prefixes:

s3://nsidc-cumulus-prod-protected  →  s3_base
https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected  →  https_base

These are stored once in catalog.metadata, not per-granule. The actual catalog entries remain S3 URLs only — no size increase.
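
One way to arrive at the two bases shown above is to compare the URL paths segment by segment from the end. The sketch below is illustrative only: `split_bases` is a hypothetical name, the actual _extract_base_urls() may work differently, and the granule path in the usage note is made up.

```python
from urllib.parse import urlsplit


def split_bases(s3_url, https_url):
    """Return (s3_base, https_base) by dropping the shared trailing path segments."""
    s3, web = urlsplit(s3_url), urlsplit(https_url)
    s3_parts = s3.path.strip("/").split("/")
    web_parts = web.path.strip("/").split("/")

    # Count trailing path segments that are identical in both URLs.
    shared = 0
    while (shared < min(len(s3_parts), len(web_parts))
           and s3_parts[-1 - shared] == web_parts[-1 - shared]):
        shared += 1

    def base(url, parts):
        # Keep scheme + host/bucket plus whatever path segments are not shared.
        kept = parts[:len(parts) - shared]
        return "/".join([f"{url.scheme}://{url.netloc}", *kept])

    return base(s3, s3_parts), base(web, web_parts)
```

For a made-up key such as `path/to/granule.h5` under the bucket and host above, this yields `s3://nsidc-cumulus-prod-protected` and `https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected`.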

At runtime, when driver=https, the runner rewrites each URL by replacing s3_base with https_base. This is a single string substitution, derived from CMR data rather than hardcoded.
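
The runtime rewrite amounts to something like the following. This is a sketch, not the runner's code: `to_https` is a hypothetical name, and anchoring the match at the start of the URL (with a `/` boundary) is an assumption on top of the plain substitution described above.

```python
def to_https(url, s3_base, https_base):
    """Swap the S3 base for the HTTPS base; return other URLs unchanged."""
    rest = url[len(s3_base):]
    # Require a path boundary so "s3://bucket" does not match "s3://bucket-other".
    if url.startswith(s3_base) and (rest == "" or rest.startswith("/")):
        return https_base + rest
    return url
```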

Assumption: all granules in a catalog share the same base path. This holds today because a catalog is built from a single CMR query (one short_name + version + provider), so all granules come from the same DAAC bucket. If we ever mix data products within a single catalog (e.g., ATL06 + ATL08 from different providers, or cross-DAAC queries), this assumption would break and we'd need per-granule or per-provider base mappings. I can't think of a realistic case where this happens — catalogs are product-specific by design — but noting it here for future reference.
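
If the shared-base assumption ever needs to be checked rather than trusted, a cheap guard at catalog build time could look like this. It is a sketch only: `granules_outside_base` is a hypothetical name and nothing like it is claimed to exist in magg.

```python
def granules_outside_base(urls, s3_base):
    """Return URLs not under s3_base; an empty list means the assumption holds."""
    prefix = s3_base.rstrip("/") + "/"
    return [u for u in urls if not u.startswith(prefix)]
```

A non-empty result would be the signal that per-granule or per-provider base mappings are needed.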

espg · 2026-04-07T22:21:54Z

It feels like the best path for the issue in the above comment would be to switch the catalogs to a GeoParquet format in the future.

add Python API, extract orchestration logic from CLIs into magg.runne…

cf5b182

…r.agg()

backend selection for s3 vs https, demo notebook for local vs lambda

cdab458

espg added 3 commits April 7, 2026 15:30

working local / lambda agg function and notebook; plots

f39d7fb

updated notebooks / plots

a076171

updated custom agg notebook (for full outputs)

fd1174d

espg merged commit e1da439 into main Apr 7, 2026
8 checks passed

espg deleted the magg_deployment branch April 7, 2026 23:46

espg merged 5 commits into main from magg_deployment
