
feat: add OlmoEarth v1/v1.1 embedder (S2 L2A 12-band, FlexiViT) #85

Open
amrit110 wants to merge 4 commits into cybergis:main from amrit110:feat/olmoearth


amrit110 (Contributor) commented Jun 9, 2026

Summary

  • Adds olmoearth as a new on-the-fly embedder for the OlmoEarth foundation model family (Allen AI), trained on the Major TOM dataset
  • Supports all 7 released variants: nano, tiny, base, large (v1) and nano_v1_1, tiny_v1_1, base_v1_1 (v1.1), with embedding dims of 128 (nano), 192 (tiny), 768 (base), and 1024 (large)
  • Fetches all 12 S2 L2A bands from GEE in OlmoEarth band-set order, applies per-band mean±2σ normalization, and encodes with the FlexiViT encoder
  • Supports both pooled (global mean or max over the spatial, temporal, and band-set token dims) and grid (spatial token map D×H'×W') output modes
  • Configurable patch_size (1–8, default 4) and image_size (default 256, matching training tile size)
  • True batch inference: inputs of the same shape are grouped and encoded in a single forward pass
  • olmoearth-pretrain-minimal added as an optional extra (pip install rs-embed[olmoearth])
  • 47 unit tests (all passing, no regressions in the 705-test suite)
  • Full docs page at docs/models/olmoearth.md; catalog/reference tables updated
  • Also fixes two pre-existing broken anchor links in prithvi.md and satmae.md
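The shape-grouped batching bullet can be illustrated with a minimal sketch, assuming inputs are NumPy arrays and the encoder is any callable that accepts a stacked batch. `batched_forward` and `fake_encoder` are hypothetical names for illustration, not the rs-embed or OlmoEarth API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np


def batched_forward(
    inputs: List[np.ndarray],
    forward: Callable[[np.ndarray], np.ndarray],
) -> List[np.ndarray]:
    """Call `forward` once per group of same-shaped inputs; return outputs in input order."""
    # Bucket input indices by array shape: only equal shapes can be stacked.
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, x in enumerate(inputs):
        groups[x.shape].append(i)

    outputs: List[Optional[np.ndarray]] = [None] * len(inputs)
    for idxs in groups.values():
        # One forward pass per shape group: (B, C, H, W) -> (B, ...).
        batch = np.stack([inputs[i] for i in idxs], axis=0)
        result = forward(batch)
        # Scatter per-sample results back to their original positions.
        for j, i in enumerate(idxs):
            outputs[i] = result[j]
    return outputs  # type: ignore[return-value]


if __name__ == "__main__":
    calls = []

    def fake_encoder(batch: np.ndarray) -> np.ndarray:
        calls.append(batch.shape)
        return batch.mean(axis=(2, 3))  # hypothetical stand-in for an encoder: (B, C)

    chips = [np.ones((12, 8, 8)), np.zeros((12, 4, 4)), np.full((12, 8, 8), 2.0)]
    embs = batched_forward(chips, fake_encoder)
    print(calls)  # [(2, 12, 8, 8), (1, 12, 4, 4)]
```

Three chips with two distinct shapes need only two forward passes, and results come back in the caller's original order.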

Test plan

  • All 47 OlmoEarth-specific tests pass
  • Full test suite: 705 passed, no regressions
  • Integration test: nano v1 and nano v1.1 models load and produce correct embedding shapes
  • API functions verified: list_models(), describe_model(), Model class
  • Pre-commit hooks pass (ruff check + format + prettier)
  • Docs build with no warnings (mkdocs build --strict)
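For the integration test's shape checks, the expected pooled and grid outputs can be worked out with a small NumPy sketch. It assumes the encoder emits Sentinel-2 tokens laid out as (B, H', W', T, S, D), with H' = W' = image_size // patch_size, T timesteps, S = 3 band sets, and D the embedding dim. This token layout and the `reduce_tokens` helper are assumptions for illustration, not the PR's code.

```python
import numpy as np

# Embedding dims per model size as listed in the PR summary.
EMBED_DIM = {"nano": 128, "tiny": 192, "base": 768, "large": 1024}


def reduce_tokens(tokens: np.ndarray, mode: str = "pooled", op: str = "mean") -> np.ndarray:
    """Reduce an assumed (B, H', W', T, S, D) token tensor to a pooled or grid embedding.

    H', W': spatial token grid; T: timesteps; S: band sets; D: embedding dim.
    """
    reducer = np.mean if op == "mean" else np.max
    if mode == "pooled":
        # Global pool over spatial (1, 2), temporal (3) and band-set (4) axes -> (B, D).
        return reducer(tokens, axis=(1, 2, 3, 4))
    if mode == "grid":
        # Pool over time and band sets only -> (B, H', W', D), then put D first.
        return np.transpose(reducer(tokens, axis=(3, 4)), (0, 3, 1, 2))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    image_size, patch_size = 256, 4
    side = image_size // patch_size  # 64 tokens per side (assumed)
    tokens = np.random.rand(2, side, side, 1, 3, EMBED_DIM["nano"]).astype(np.float32)
    print(reduce_tokens(tokens, "pooled").shape)  # (2, 128)
    print(reduce_tokens(tokens, "grid").shape)    # (2, 128, 64, 64)
```

Under these assumptions, a smaller patch_size gives a finer grid (patch_size 2 on a 256 tile would give 128×128 tokens) at a higher compute cost.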

amrit110 added 3 commits June 8, 2026 20:49
Adds a new on-the-fly embedder for the OlmoEarth foundation model family
(Allen AI) trained on the Major TOM dataset. Supports all 7 released
variants: nano/tiny/base/large (v1) and nano/tiny/base (v1.1).

Key implementation details:
- Fetches 12 S2 L2A bands from GEE in OlmoEarth band-set order
  (10 m: B2/B3/B4/B8, 20 m: B5/B6/B7/B8A/B11/B12, 60 m: B1/B9)
- Per-band normalization via OlmoEarth COMPUTED strategy (mean±2σ)
- FlexiViT encoder accepts configurable patch_size (1-8, default 4)
  and image_size (default 256, the training tile size)
- Timestamps derived automatically from the temporal midpoint date
- Both pooled (global mean/max over spatial+temporal+band-set dims)
  and grid (spatial token map D×H'×W') output modes supported
- True batch inference with shape-grouped forward passes
- olmoearth-pretrain-minimal added as an optional extra [olmoearth]
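A minimal sketch of the band reordering and normalization described above. The band order is taken from this commit message. The normalization assumes the COMPUTED strategy rescales each band linearly so that mean - 2σ maps to 0 and mean + 2σ maps to 1; the mean/std values in the example are placeholders, not OlmoEarth's statistics. `reorder_bands` and `normalize_mean_2sigma` are hypothetical helpers.

```python
from typing import List

import numpy as np

# Band-set order described in this PR: 10 m, then 20 m, then 60 m bands.
BAND_SETS = {
    "10m": ["B2", "B3", "B4", "B8"],
    "20m": ["B5", "B6", "B7", "B8A", "B11", "B12"],
    "60m": ["B1", "B9"],
}
OLMOEARTH_BAND_ORDER = [b for bands in BAND_SETS.values() for b in bands]


def reorder_bands(stack: np.ndarray, src_order: List[str]) -> np.ndarray:
    """Reorder a (C, H, W) stack whose channels follow `src_order` into band-set order."""
    idx = [src_order.index(b) for b in OLMOEARTH_BAND_ORDER]
    return stack[idx]


def normalize_mean_2sigma(stack: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Rescale each band so mean - 2*std -> 0 and mean + 2*std -> 1 (assumed formula)."""
    lo = mean - 2.0 * std
    hi = mean + 2.0 * std
    return (stack - lo[:, None, None]) / (hi - lo)[:, None, None]


if __name__ == "__main__":
    # Numeric band order, e.g. as a GEE export might list them (assumption).
    gee_order = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B9", "B11", "B12"]
    raw = np.random.randint(0, 10000, size=(12, 64, 64)).astype(np.float32)
    x = reorder_bands(raw, gee_order)
    # Placeholder statistics, NOT OlmoEarth's published values.
    mean = np.full(12, 1500.0, dtype=np.float32)
    std = np.full(12, 1000.0, dtype=np.float32)
    print(normalize_mean_2sigma(x, mean, std).shape)  # (12, 64, 64)
```

Note that the statistics must be indexed in the same band-set order as the reordered stack.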

Also fixes two pre-existing broken anchor links in prithvi.md and satmae.md.
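The timestamp bullet in the commit message reduces to picking the middle day of the requested date window. A minimal sketch, assuming the window is given as two `datetime.date` values and that an odd half-day rounds down to the earlier date; `temporal_midpoint` is a hypothetical name, and how the PR converts this date into OlmoEarth's timestamp input is not shown here.

```python
from datetime import date, timedelta


def temporal_midpoint(start: date, end: date) -> date:
    """Middle day of the [start, end] window; odd spans round down to the earlier day."""
    return start + timedelta(days=(end - start).days // 2)


if __name__ == "__main__":
    print(temporal_midpoint(date(2024, 6, 1), date(2024, 8, 31)))  # 2024-07-16
```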

amrit110 (Contributor, Author) commented Jun 9, 2026

@Dinghye, please review!


Dinghye (Collaborator) commented Jun 9, 2026

@amrit110 crazy! I will check this today. Thank you soooo much!
