# Python API for notebook/JupyterHub use (#13)

## Summary

All processing orchestration is currently locked inside CLI `main()` functions (`__main__.py`, `invoke_lambda.py`). This makes it impossible to run pipelines from a Jupyter notebook without shelling out. We need a Python API that exposes the same functionality as importable functions.

## Motivation

Two JupyterHub scenarios drive this:

1. **Operator-managed hub** (CryoCloud, Pangeo, institutional) — operator pre-configures AWS + Earthdata credentials as env vars. Users `import magg` and call functions directly.
2. **BYOC (bring your own compute)** on a free hub — user provides their own AWS credentials (env vars or uploaded file) and fans out to Lambda in their own account.

In both cases, users need a Python API, not a CLI.

## Proposed API

```python
from magg import load_config, run

config = load_config("atl06.yaml")

# Local processing (zero AWS infrastructure needed)
results = run(config, catalog="catalog.json", store="./output.zarr", max_cells=5)

# Lambda backend (requires deployed Lambda + AWS creds)
results = run(config, catalog="catalog.json", store="s3://bucket/out.zarr", backend="lambda")
```

### `magg.run()` signature

```python
def run(
    config: PipelineConfig,
    *,
    catalog: str | None = None,      # path to catalog JSON (overrides config.catalog)
    store: str | None = None,        # store path (overrides config.output.store)
    backend: str = "local",          # "local" or "lambda"
    max_cells: int | None = None,    # limit cells (for testing)
    morton_cell: str | None = None,  # process a specific cell
    max_workers: int | None = None,  # concurrency (default: 4 local, 1700 lambda)
    overwrite: bool = False,         # overwrite existing Zarr template
    dry_run: bool = False,           # preview without processing
    # Lambda-specific
    function_name: str | None = None,  # Lambda function name (env var fallback)
    region: str = "us-west-2",
) -> dict:
    """Run the aggregation pipeline."""
```

Returns a results dict with summary statistics: cells processed, total observations, errors, and timing.

### Backend dispatch

- `backend="local"`: `ThreadPoolExecutor`, processes cells in-process. Current `__main__.py` logic.
- `backend="lambda"`: Direct Lambda invocation via boto3. Current `invoke_lambda.py` logic.
- Future: `backend="sfn"` (Step Functions), `backend="lithops"`.
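To make the local path concrete, here is a minimal sketch of a thread-pool fan-out that collects the kind of summary described above. It is not the current `__main__.py` code: `process_cell`, `run_local_sketch`, and the result keys (`cells_processed`, `total_obs`, `errors`, `elapsed_s`) are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_cell(cell: str) -> int:
    """Hypothetical stand-in for per-cell work; returns an observation count."""
    if not cell:
        raise ValueError("empty cell id")
    return len(cell)


def run_local_sketch(cells, max_workers=4, process=process_cell):
    """Fan cells out over a thread pool and collect a summary dict."""
    start = time.perf_counter()
    processed, total_obs, errors = 0, 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its cell so failures can be attributed.
        futures = {pool.submit(process, cell): cell for cell in cells}
        for future in as_completed(futures):
            cell = futures[future]
            try:
                total_obs += future.result()
                processed += 1
            except Exception as exc:  # one bad cell should not abort the run
                errors.append({"cell": cell, "error": str(exc)})
    return {
        "cells_processed": processed,
        "total_obs": total_obs,
        "errors": errors,
        "elapsed_s": time.perf_counter() - start,
    }
```

Recording per-cell failures in the summary, rather than raising on the first one, keeps a long run useful even when a few cells fail. Whether the real API should behave that way is an open design choice.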

### Credential handling

No custom credential logic needed — existing libraries handle it:
- **Earthdata**: `earthaccess.login()` reads `EARTHDATA_USERNAME`/`EARTHDATA_PASSWORD` env vars, `~/.netrc`, or prompts interactively
- **AWS**: `boto3` reads `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars, `~/.aws/credentials`, or IAM role

JupyterHub operators set env vars; BYOC users set them in their notebook session.
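A notebook user may want a quick pre-flight check before calling `run()`. The sketch below only inspects the environment variables named above; it does not look at `~/.netrc`, `~/.aws/credentials`, or IAM roles, so a reported gap is a hint rather than proof that login will fail. The helper name `missing_env_credentials` is hypothetical and not part of `magg`.

```python
import os

# Environment variable names taken from the credential list above.
CREDENTIAL_ENV_VARS = {
    "earthdata": ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD"),
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
}


def missing_env_credentials(environ=None):
    """Return {service: [unset variable names]} for services with gaps."""
    env = os.environ if environ is None else environ
    missing = {}
    for service, names in CREDENTIAL_ENV_VARS.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[service] = unset
    return missing
```

An empty dict means both services have their variables set in the current session.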

## Implementation plan

1. Extract orchestration logic from `__main__.py` and `invoke_lambda.py` into a new `magg.runner` module
2. `run()` dispatches to `_run_local()` or `_run_lambda()` based on `backend`
3. CLIs become thin wrappers: parse args → `load_config()` → `run()`
4. Tests for the Python API
5. Notebook example demonstrating JupyterHub usage
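Step 3 of the plan could look roughly like the sketch below. `load_config` and `run` are stubs here so the example runs on its own; in the real package they would be imported from `magg`. The flag names (`--catalog`, `--store`, `--backend`, `--max-cells`, `--dry-run`) are assumptions that mirror the proposed keyword arguments, not the existing CLI.

```python
import argparse


def load_config(path):
    """Stub standing in for magg.load_config."""
    return {"path": path}


def run(config, **options):
    """Stub standing in for magg.run; echoes its inputs."""
    return {"config": config, "options": options}


def build_parser():
    parser = argparse.ArgumentParser(prog="magg")
    parser.add_argument("config", help="path to the pipeline YAML")
    parser.add_argument("--catalog")
    parser.add_argument("--store")
    parser.add_argument("--backend", choices=["local", "lambda"], default="local")
    parser.add_argument("--max-cells", type=int)
    parser.add_argument("--dry-run", action="store_true")
    return parser


def main(argv=None):
    # Thin wrapper: parse args, load the config, delegate everything to run().
    args = build_parser().parse_args(argv)
    config = load_config(args.config)
    return run(
        config,
        catalog=args.catalog,
        store=args.store,
        backend=args.backend,
        max_cells=args.max_cells,
        dry_run=args.dry_run,
    )
```

Because `main()` accepts an explicit `argv`, the same entry point can be exercised from tests without touching `sys.argv`.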

## Interaction with other work

- Builds on #8 (config-driven pipeline) — `run()` takes a `PipelineConfig`
- Uses `open_store()` from `magg.store` for backend-agnostic store creation
- Uses `get_child_order()`, `get_store_path()` config helpers
- Lambda function name configurable via `MAGG_LAMBDA_FUNCTION_NAME` env var (already implemented)
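As an illustration of the last point, the sketch below resolves the function name (explicit argument first, then `MAGG_LAMBDA_FUNCTION_NAME`) and invokes it synchronously. The helper names and the payload shape are hypothetical. `client` is assumed to behave like `boto3.client("lambda")`, whose `invoke()` accepts `FunctionName`, `InvocationType`, and `Payload`, and returns a streaming `Payload` body.

```python
import json
import os


def resolve_function_name(function_name=None, environ=None):
    """Prefer the explicit argument, then fall back to the env var."""
    env = os.environ if environ is None else environ
    name = function_name or env.get("MAGG_LAMBDA_FUNCTION_NAME")
    if not name:
        raise ValueError(
            "no Lambda function name: pass function_name or set "
            "MAGG_LAMBDA_FUNCTION_NAME"
        )
    return name


def invoke_cell(client, function_name, payload):
    """Invoke the function for one cell and decode its JSON reply.

    `client` is injected so a stub can replace boto3 in tests.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous call
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = json.loads(response["Payload"].read())
    # Lambda sets "FunctionError" when the handler itself failed.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda reported an error: {body}")
    return body
```

In a notebook the client would typically come from `boto3.client("lambda", region_name=region)`.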

## Future backends

- **Step Functions**: CDK-deployed state machine, triggered via `boto3.client('stepfunctions').start_execution()`
- **Lithops**: auto-deploys Lambda function via container image, `fexec.map()` for dispatch
- Backend interface should be simple enough that adding new backends is straightforward
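One way to keep the backend interface small is a name-to-callable registry, sketched below. This is one possible design rather than a decided one: `register_backend`, `dispatch`, and the stub runner bodies are hypothetical, and only the `_run_local`/`_run_lambda` names come from the implementation plan.

```python
from typing import Callable, Dict

BackendFn = Callable[..., dict]
_BACKENDS: Dict[str, BackendFn] = {}


def register_backend(name: str):
    """Decorator that registers a runner under a backend name."""
    def decorator(fn: BackendFn) -> BackendFn:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("local")
def _run_local(config, **options) -> dict:
    # Placeholder body; the real runner would process cells in-process.
    return {"backend": "local", "cells_processed": 0}


@register_backend("lambda")
def _run_lambda(config, **options) -> dict:
    # Placeholder body; the real runner would invoke Lambda via boto3.
    return {"backend": "lambda", "cells_processed": 0}


def dispatch(config, backend: str = "local", **options) -> dict:
    """Look up the named backend and delegate, failing loudly on typos."""
    try:
        runner = _BACKENDS[backend]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS))
        raise ValueError(
            f"unknown backend {backend!r}; expected one of: {known}"
        ) from None
    return runner(config, **options)
```

With this shape, adding `"sfn"` or `"lithops"` later would mean registering one more function, with no change to `dispatch()`.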

