2 changes: 1 addition & 1 deletion .github/workflows/docs-preview-comment.yml
@@ -85,7 +85,7 @@ jobs:
         run: |
           npm install -g surge

-          SURGE_DOMAIN="magg-pr-${{ needs.check-comment.outputs.pr_number }}.surge.sh"
+          SURGE_DOMAIN="zagg-pr-${{ needs.check-comment.outputs.pr_number }}.surge.sh"

           cd site
           surge . $SURGE_DOMAIN --token ${{ secrets.SURGE_TOKEN }} || {
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -74,7 +74,7 @@ jobs:
         run: |
           npm install -g surge

-          SURGE_DOMAIN="magg-pr-${{ github.event.pull_request.number }}.surge.sh"
+          SURGE_DOMAIN="zagg-pr-${{ github.event.pull_request.number }}.surge.sh"

           cd docs-html
           surge . $SURGE_DOMAIN --token ${{ secrets.SURGE_TOKEN }} || {
4 changes: 2 additions & 2 deletions .github/workflows/lambda-build.yml
@@ -4,13 +4,13 @@ on:
   push:
     branches: [lambda, main]
     paths:
-      - 'src/magg/**'
+      - 'src/zagg/**'
       - 'deployment/aws/**'
       - 'pyproject.toml'
       - '.github/workflows/lambda-build.yml'
   pull_request:
     paths:
-      - 'src/magg/**'
+      - 'src/zagg/**'
       - 'deployment/aws/**'
       - 'pyproject.toml'
       - '.github/workflows/lambda-build.yml'
118 changes: 118 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,118 @@
+name: Publish
+
+on:
+  push:
+    tags:
+      - "*.*.*"
+  workflow_dispatch:
+    inputs:
+      tag:
+        description: 'Tag to publish'
+        required: true
+        type: string
+
+jobs:
+  test:
+    uses: ./.github/workflows/test.yml
+
+  build:
+    name: Build distribution
+    needs: test
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.ref }}
+          fetch-depth: 0
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install build
+        run: python3 -m pip install build --user
+      - name: Build sdist and wheel
+        run: python3 -m build
+      - name: Store distribution packages
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+  publish-to-testpypi:
+    name: Publish to TestPyPI
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: testpypi
+      url: https://test.pypi.org/p/zagg
+    permissions:
+      id-token: write
+
+    steps:
+      - name: Download dists
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
+          verbose: true
+
+  publish-to-pypi:
+    name: Publish to PyPI
+    needs: publish-to-testpypi
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/zagg
+    permissions:
+      id-token: write
+
+    steps:
+      - name: Download dists
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+
+  github-release:
+    name: Create GitHub Release
+    needs: publish-to-pypi
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: main
+          fetch-depth: 0
+      - name: Get tag name
+        id: get_tag
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "tag=${{ inputs.tag }}" >> $GITHUB_OUTPUT
+          else
+            echo "tag=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
+          fi
+      - name: Create GitHub Release
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          TAG_NAME="${{ steps.get_tag.outputs.tag }}"
+          cat > release_notes.md <<EOF
+          ## Installation
+
+          \`\`\`bash
+          pip install zagg==$TAG_NAME
+          \`\`\`
+          EOF
+          gh release create "$TAG_NAME" \
+            --repo '${{ github.repository }}' \
+            --title "Release $TAG_NAME" \
+            --notes-file release_notes.md || echo "Release already exists"
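
The `Get tag name` step in `publish.yml` takes the release tag from one of two places: the `tag` input on a manual `workflow_dispatch` run, or the pushed ref with its `refs/tags/` prefix stripped. The branching can be sketched in Python for reasoning about it outside of Actions. `resolve_release_tag` is a hypothetical helper name used only for illustration; it is not part of this PR.

```python
def resolve_release_tag(event_name: str, input_tag: str, github_ref: str) -> str:
    """Mirror the shell branching of the 'Get tag name' step (hypothetical helper)."""
    if event_name == "workflow_dispatch":
        # Manual runs take the tag from the required `tag` input.
        return input_tag
    # Tag pushes: same effect as the shell expansion ${GITHUB_REF#refs/tags/},
    # which strips the prefix only when it is present.
    return github_ref.removeprefix("refs/tags/")


print(resolve_release_tag("push", "", "refs/tags/1.2.3"))
print(resolve_release_tag("workflow_dispatch", "0.4.0", "refs/heads/main"))
```

`str.removeprefix` needs Python 3.9 or newer, which is compatible with the 3.12 interpreter the workflow sets up.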
3 changes: 2 additions & 1 deletion .github/workflows/test.yml
@@ -5,6 +5,7 @@ on:
     branches: [main, lambda]
   pull_request:
     types: [assigned, opened, synchronize, reopened]
+  workflow_call:

 jobs:
   test:
@@ -26,4 +27,4 @@ jobs:
         run: uv sync --extra test

       - name: Run tests with pytest
-        run: uv run pytest --cov=magg --cov-report=xml --cov-report=term
+        run: uv run pytest --cov=zagg --cov-report=xml --cov-report=term
5 changes: 4 additions & 1 deletion .gitignore
@@ -17,11 +17,14 @@ env/
 venv/
 .venv/
 *.egg-info/
-src/magg.egg-info/
+src/zagg.egg-info/
 dist/
 build/
 uv.lock

+# hatch-vcs generated version file
+src/zagg/_version.py
+
 # IDE
 .vscode/
 .idea/
22 changes: 11 additions & 11 deletions README.md
@@ -1,10 +1,10 @@
-# magg - Multi-resolution Aggregation
+# zagg - Multi-resolution Aggregation

 Aggregate point observations to multi-resolution grids using HEALPix spatial indexing and serverless compute.

 ## Overview

-magg aggregates sparse point data (e.g., ICESat-2 ATL06 elevation measurements) to gridded products using HEALPix/morton spatial indexing. Processing runs in parallel on AWS Lambda — each worker handles one spatial cell independently, writing to a shared [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) store following the [DGGS convention](https://github.com/zarr-conventions/dggs).
+zagg aggregates sparse point data (e.g., ICESat-2 ATL06 elevation measurements) to gridded products using HEALPix/morton spatial indexing. Processing runs in parallel on AWS Lambda — each worker handles one spatial cell independently, writing to a shared [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) store following the [DGGS convention](https://github.com/zarr-conventions/dggs).

 ## Features

@@ -22,10 +22,10 @@ Query NASA's CMR to build a mapping of spatial cells to granule S3 URLs.

 ```bash
 # ICESat-2 convenience — cycle number computes dates automatically:
-uv run python -m magg.catalog --cycle 22 --parent-order 6
+uv run python -m zagg.catalog --cycle 22 --parent-order 6

 # General — explicit date range and spatial polygon:
-uv run python -m magg.catalog \
+uv run python -m zagg.catalog \
     --start-date 2024-01-06 --end-date 2024-04-07 \
     --short-name ATL06 \
     --polygon my_region.geojson \
@@ -61,20 +61,20 @@ Processing reads a pipeline config YAML (data source, aggregation, output store)

 ```bash
 # Local processing (write to local Zarr):
-uv run python -m magg --config atl06.yaml --catalog catalog.json --store ./output.zarr
+uv run python -m zagg --config atl06.yaml --catalog catalog.json --store ./output.zarr

 # Local processing (write to S3):
-uv run python -m magg --config atl06.yaml --catalog catalog.json --store s3://bucket/output.zarr
+uv run python -m zagg --config atl06.yaml --catalog catalog.json --store s3://bucket/output.zarr

 # Lambda dispatch (requires deployed Lambda function):
 uv run python deployment/aws/invoke_lambda.py \
     --config atl06.yaml --catalog catalog.json

 # Test with a few cells:
-uv run python -m magg --config atl06.yaml --catalog catalog.json --max-cells 5
+uv run python -m zagg --config atl06.yaml --catalog catalog.json --max-cells 5

 # Dry run:
-uv run python -m magg --config atl06.yaml --catalog catalog.json --dry-run
+uv run python -m zagg --config atl06.yaml --catalog catalog.json --dry-run
 ```

 The store path and output grid parameters are defined in the YAML config (`output.store`, `output.grid.child_order`) and can be overridden via `--store` on the command line.
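
The override rule in that README line (the `--store` flag wins over `output.store` from the YAML config) could look roughly like the sketch below. It assumes the config has already been parsed into a nested dict with an `output.store` key, as the README describes. `resolve_store` is a hypothetical name and is not the project's own `get_store_path`, whose signature is not shown in this diff.

```python
from typing import Any, Optional


def resolve_store(config: dict[str, Any], cli_store: Optional[str] = None) -> str:
    """Pick the output store: CLI value first, then `output.store` from the config.

    Hypothetical helper for illustration only.
    """
    if cli_store:
        # An explicit --store on the command line takes precedence.
        return cli_store
    store = config.get("output", {}).get("store")
    if not store:
        # Assumption: a run without any store configured should fail early.
        raise ValueError("no store given: set output.store or pass --store")
    return store


config = {"output": {"store": "s3://bucket/output.zarr", "grid": {"child_order": 12}}}
print(resolve_store(config))
print(resolve_store(config, "./output.zarr"))
```

The `child_order` value in the example dict is made up; only the key names come from the README.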
@@ -92,9 +92,9 @@ Adjust `GRID_SPACING` in the notebook to control output resolution (default 2 km
 ## Project Structure

 ```
-magg/
-├── src/magg/ # Main package (cloud-agnostic)
-│ ├── __main__.py # Local processing runner (python -m magg)
+zagg/
+├── src/zagg/ # Main package (cloud-agnostic)
+│ ├── __main__.py # Local processing runner (python -m zagg)
 │ ├── config.py # YAML pipeline configuration
 │ ├── processing.py # Core aggregation pipeline
 │ ├── catalog.py # CMR query + catalog building
18 changes: 9 additions & 9 deletions deployment/LAMBDA_DEPLOYMENT.md
@@ -9,8 +9,8 @@ for testing. The target architecture is **arm64 / py3.12** (20% cheaper per GB-s
 - **Runtime**: python3.11
 - **Architecture**: x86_64
 - **Layer**: `xagg-dependencies:1` (x86_64, py3.11, h5coro==0.0.8)
-- **Function code**: `lambda_handler.py` + `magg/` package + obstore/zarr/pydantic/pyyaml
-- **Role**: `magg-lambda-execution` (scoped to `xagg` bucket)
+- **Function code**: `lambda_handler.py` + `zagg/` package + obstore/zarr/pydantic/pyyaml
+- **Role**: `zagg-lambda-execution` (scoped to `xagg` bucket)

 ### What's in the layer vs function code
### What's in the layer vs function code

@@ -19,7 +19,7 @@ numpy, pandas, h5coro, mortie, healpy, earthaccess, boto3, astropy, shapely, cra
 fastparquet, requests, s3fs, and transitive deps.

 **Function code** (20MB unzipped):
-`lambda_handler.py`, `magg/` package, obstore, zarr, pydantic-zarr, pyyaml, pydantic,
+`lambda_handler.py`, `zagg/` package, obstore, zarr, pydantic-zarr, pyyaml, pydantic,
 pydantic-core, typeguard, typing_inspect, annotated-types.

 ---
@@ -67,7 +67,7 @@ cd /tmp/layer_build && zip -qr /tmp/lambda_layer_arm64.zip python/

 # Publish
 aws lambda publish-layer-version \
-  --layer-name magg-deps-arm64 \
+  --layer-name zagg-deps-arm64 \
   --compatible-runtimes python3.12 \
   --compatible-architectures arm64 \
   --zip-file fileb:///tmp/lambda_layer_arm64.zip \
@@ -77,7 +77,7 @@ aws lambda publish-layer-version \
 aws lambda update-function-configuration \
   --function-name process-morton-cell \
   --runtime python3.12 \
-  --layers "arn:aws:lambda:us-west-2:429435741471:layer:magg-deps-arm64:1" \
+  --layers "arn:aws:lambda:us-west-2:429435741471:layer:zagg-deps-arm64:1" \
   --region us-west-2

 # Then update code with arm64 arch
@@ -95,7 +95,7 @@ option because Linux ARM64 runners have had issues building some of our deps.

 Key findings:
 - `macos-15` runners use M1 Apple Silicon (ARM64), 3 CPUs, 7GB RAM
-- Free for public repos (englacial/magg is public), $0.062/min for private
+- Free for public repos (englacial/zagg is public), $0.062/min for private
 - Docker is NOT available on macOS ARM64 runners (Apple Virtualization limitation)
 - `pip install --platform manylinux2014_aarch64 --only-binary=:all:` works from macOS
   to cross-compile Lambda layers — no Docker needed
@@ -116,13 +116,13 @@ Then unzip wheels into the build directory. This is what we're doing now for tes

 ## Deploying Updated Function Code (no layer change)

-When only `lambda_handler.py` or `magg/` package code changes (no new deps):
+When only `lambda_handler.py` or `zagg/` package code changes (no new deps):

 ```bash
 # Build zip
 rm -rf /tmp/lambda_build && mkdir -p /tmp/lambda_build
 cp deployment/aws/lambda_handler.py /tmp/lambda_build/
-cp -r src/magg /tmp/lambda_build/magg
+cp -r src/zagg /tmp/lambda_build/zagg

 # Add deps not in layer (skip native ones if already unpacked)
 pip install --target /tmp/lambda_build --no-deps \
@@ -186,7 +186,7 @@ deps from the layer into the function code (or vice versa).
 ### Scripts
 - `deployment/aws/build_layer_v14.sh` — x86_64 layer build (runs in AL2023 Docker container)
 - `deployment/aws/build_arm64_layer.sh` — arm64 layer build (runs in manylinux Docker container)
-- `deployment/aws/build_function.sh` — function code build (handler + magg + non-layer deps)
+- `deployment/aws/build_function.sh` — function code build (handler + zagg + non-layer deps)

 ### CI/CD
 - `.github/workflows/lambda-build.yml` — builds both layer + function for x86_64 and arm64,
6 changes: 3 additions & 3 deletions deployment/aws/build_function.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Build Lambda function code zip (handler + magg package + non-layer deps)
+# Build Lambda function code zip (handler + zagg package + non-layer deps)
 #
 # Usage:
 # ./build_function.sh # auto-detect arch and python
@@ -45,9 +45,9 @@ echo "============================================================"

 # --- Copy our code ---
 echo ""
-echo "Copying handler and magg package..."
+echo "Copying handler and zagg package..."
 cp "$REPO_ROOT/deployment/aws/lambda_handler.py" "$BUILD_DIR/"
-cp -r "$REPO_ROOT/src/magg" "$BUILD_DIR/magg"
+cp -r "$REPO_ROOT/src/zagg" "$BUILD_DIR/zagg"

 # --- Install function-level dependencies ---
 # These are packages NOT in the Lambda layer.
8 changes: 4 additions & 4 deletions deployment/aws/deploy.sh
@@ -11,8 +11,8 @@

 set -e

-FUNCTION_NAME="${MAGG_LAMBDA_FUNCTION_NAME:-process-morton-cell}"
-S3_BUCKET="${MAGG_S3_BUCKET:-xagg}"
+FUNCTION_NAME="${ZAGG_LAMBDA_FUNCTION_NAME:-process-morton-cell}"
+S3_BUCKET="${ZAGG_S3_BUCKET:-xagg}"
 REGION="us-west-2"
 ARCH="arm64"
 FUNCTION_ONLY=false
@@ -28,8 +28,8 @@ for arg in "$@"; do
 done

 case "$ARCH" in
-    arm64) RUNTIME="python3.12"; LAYER_NAME="magg-deps-arm64" ;;
-    x86_64) RUNTIME="python3.11"; LAYER_NAME="magg-deps-x86_64" ;;
+    arm64) RUNTIME="python3.12"; LAYER_NAME="zagg-deps-arm64" ;;
+    x86_64) RUNTIME="python3.11"; LAYER_NAME="zagg-deps-x86_64" ;;
 esac

 echo "============================================================"
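
After this change, `deploy.sh` reads `ZAGG_LAMBDA_FUNCTION_NAME` and `ZAGG_S3_BUCKET` and no longer consults the old `MAGG_*` names, so environments that still export the old variables fall back to the defaults. The defaulting and the architecture lookup can be sketched in Python as below. `resolve_deploy_settings` and `ARCH_SETTINGS` are hypothetical names for illustration, and raising on an unknown architecture is an assumption: the shell `case` shown in the hunk has no fallback branch.

```python
import os
from typing import Mapping

# Architecture -> (runtime, layer name), copied from the `case "$ARCH"` block.
ARCH_SETTINGS = {
    "arm64": ("python3.12", "zagg-deps-arm64"),
    "x86_64": ("python3.11", "zagg-deps-x86_64"),
}


def resolve_deploy_settings(arch: str = "arm64",
                            env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical Python equivalent of the variable setup in deploy.sh."""
    if arch not in ARCH_SETTINGS:
        # Assumption: fail loudly instead of leaving the runtime unset.
        raise ValueError(f"unsupported architecture: {arch}")
    runtime, layer_name = ARCH_SETTINGS[arch]
    return {
        # `or` mirrors ${VAR:-default}: the default applies when VAR is unset or empty.
        "function_name": env.get("ZAGG_LAMBDA_FUNCTION_NAME") or "process-morton-cell",
        "s3_bucket": env.get("ZAGG_S3_BUCKET") or "xagg",
        "region": "us-west-2",
        "runtime": runtime,
        "layer_name": layer_name,
    }


# The old variable name is ignored, so the default function name is used.
print(resolve_deploy_settings("arm64", {"MAGG_LAMBDA_FUNCTION_NAME": "old-name"}))
```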
10 changes: 5 additions & 5 deletions deployment/aws/invoke_lambda.py
@@ -2,7 +2,7 @@
 """
 Production Lambda orchestrator with cost reporting.

-Thin wrapper around magg.agg(backend="lambda") that adds verbose progress
+Thin wrapper around zagg.agg(backend="lambda") that adds verbose progress
 output, architecture-based cost calculation, and results JSON export.

 Usage:
@@ -18,8 +18,8 @@

 import boto3

-from magg.config import default_config, get_store_path, load_config
-from magg.runner import agg
+from zagg.config import default_config, get_store_path, load_config
+from zagg.runner import agg

 # Lambda pricing (us-west-2)
 # https://aws.amazon.com/lambda/pricing/
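
The pricing constants themselves sit outside the hunks shown here. As a rough illustration of what an architecture-based cost calculation involves, the sketch below multiplies GB-seconds by a per-architecture rate and adds a per-request charge. The prices are assumptions taken from the public AWS Lambda pricing page and should be checked against the link above. `estimate_lambda_cost` is a hypothetical name, not a function from `invoke_lambda.py`.

```python
# Assumed on-demand prices in USD; verify against the AWS pricing page.
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}
PRICE_PER_REQUEST = 0.20 / 1_000_000


def estimate_lambda_cost(invocations: int, avg_duration_s: float,
                         memory_mb: int, arch: str = "arm64") -> float:
    """Estimate compute plus request cost for a batch of invocations (hypothetical helper)."""
    # Billed compute is duration times configured memory, expressed in GB-seconds.
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND[arch] + invocations * PRICE_PER_REQUEST


for arch in ("x86_64", "arm64"):
    print(arch, round(estimate_lambda_cost(1000, 2.0, 1024, arch), 6))
```

With these assumed rates the arm64 figure comes out about 20% lower on the compute portion, which matches the "20% cheaper per GB-s" note in `LAMBDA_DEPLOYMENT.md`.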
@@ -122,8 +122,8 @@ def main():
     parser.add_argument("--output-dir", default=".", help="Directory for output results JSON")
     parser.add_argument(
         "--function-name",
-        default=os.environ.get("MAGG_LAMBDA_FUNCTION_NAME", "process-morton-cell"),
-        help="Lambda function name (default: env MAGG_LAMBDA_FUNCTION_NAME or 'process-morton-cell')",
+        default=os.environ.get("ZAGG_LAMBDA_FUNCTION_NAME", "process-morton-cell"),
+        help="Lambda function name (default: env ZAGG_LAMBDA_FUNCTION_NAME or 'process-morton-cell')",
     )
     args = parser.parse_args()