2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,8 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio

### Added

- **OlmoEarth v1/v1.1 embedder (`olmoearth`).** Adds support for the [OlmoEarth](https://huggingface.co/collections/allenai/olmoearth) foundation model family from Allen AI, trained on the Major TOM dataset. All 7 released variants are supported: `nano`, `tiny`, `base`, `large` (v1) and `nano_v1_1`, `tiny_v1_1`, `base_v1_1` (v1.1), with embedding dimensions 128/192/768/1024. The adapter fetches all 12 Sentinel-2 L2A bands from GEE in OlmoEarth's native band-set order, applies per-band mean±2σ normalization (OlmoEarth COMPUTED strategy), and encodes with the FlexiViT encoder. Both `pooled` and `grid` output modes are supported. `patch_size` (default 4) and `image_size` (default 256) are configurable via `model_config` or environment variables. Requires the `olmoearth-pretrain-minimal` package: `pip install rs-embed[olmoearth]`.

- **GEE fetch statistics reporting in `export_batch`.** When `show_progress=True`, a `[gee_fetch]` summary line is now printed to stderr after each prefetch chunk completes, reporting the total planned, completed, and failed fetches, cache hits, and the most recently processed point/sensor. This gives users visibility into GEE quota consumption, cache reuse, and whether runtime is dominated by fetching or inference. No output is emitted when `show_progress=False` or when no GEE provider is involved (e.g. precomputed models). The underlying `FetchStats` class in `tools/progress.py` is thread-safe and accumulates counts across chunks.

### Fixed
1 change: 1 addition & 0 deletions README.md
@@ -125,6 +125,7 @@ This is a convenience index with basic model info only (for quick scanning / lin
| `terrafm` | S2 12-band / S1 VV-VH | 10m | [ICLR 2026](https://arxiv.org/abs/2506.06281) | [link](https://github.com/mbzuai-oryx/TerraFM) |
| `thor` | S2 10-band | 10m | [arXiv 2026](https://arxiv.org/abs/2601.16011) | [link](https://github.com/FM4CS/THOR) |
| `agrifm` | S2 time series (10-band) | 10m | [RSE 2026](https://www.sciencedirect.com/science/article/pii/S0034425726000040) | [link](https://github.com/flyakon/AgriFM) |
| `olmoearth` | S2 L2A 12-band | 10m | [arXiv 2025](https://arxiv.org/abs/2511.13655) | [link](https://huggingface.co/collections/allenai/olmoearth) |

Resolution here means the default provider/source fetch resolution used by the adapter, not the final resized tensor shape seen by the model.

1 change: 1 addition & 0 deletions docs/models.md
@@ -57,6 +57,7 @@ Some detail-page filenames still use older names for compatibility, but the cano
| `anysat` | S2 10-band time series | 768 | 10m | multi-frame | JEPA; `s2_dates` DOY side input | [detail](models/anysat.md) |
| `galileo` | S2 10-band time series | 128 | 10m | multi-frame | nano default; month tokens | [detail](models/galileo.md) |
| `agrifm` | S2 10-band time series | 1024 | 10m | multi-frame | Video Swin; fixed `T` frame stack | [detail](models/agrifm.md) |
| `olmoearth` | S2 L2A 12-band | 128–1024 | 10m | single composite | FlexiViT; 4 sizes (nano/tiny/base/large); requires `[olmoearth]` extra | [detail](models/olmoearth.md) |

---

213 changes: 213 additions & 0 deletions docs/models/olmoearth.md
@@ -0,0 +1,213 @@
# OlmoEarth (`olmoearth`)

## Quick Facts

| Field | Value |
| -------------------- | --------------------------------------------------------------------------------------------------------- |
| Model ID | `olmoearth` |
| Family / Backbone | OlmoEarth v1/v1.1 — FlexiViT encoder (ViT-style) trained on the Major TOM dataset |
| Adapter type | `on-the-fly` |
| Model config keys | `variant` (default: `nano`), `patch_size` (default: `4`), `image_size` (default: `256`) |
| Training alignment | High (S2 L2A 12-band; native 10 m resolution; per-band mean±2σ normalization matches training pipeline) |

!!! success "OlmoEarth In 30 Seconds"
    OlmoEarth is a **multi-modal geospatial foundation model** from Allen AI, trained on the Major TOM dataset with Sentinel-2 L2A as the primary modality. It uses a FlexiViT encoder that accepts variable patch sizes, enabling flexible spatial resolution trade-offs. In `rs-embed`, the adapter fetches all **12 S2 L2A bands** and encodes them in a single forward pass.

Key characteristics:

- All 12 S2 L2A bands in the OlmoEarth band-set order (10 m → 20 m → 60 m groups)
- Per-band normalization using OlmoEarth's COMPUTED strategy (mean ± 2σ)
- 4 size variants in v1 (`nano`/`tiny`/`base`/`large`) and 3 in v1.1 (`nano_v1_1`/`tiny_v1_1`/`base_v1_1`)
- `patch_size` controls the spatial token density (1–8); default `4` matches the official inference example
- Input image resized to `image_size` (default 256) before encoding
- Requires `olmoearth-pretrain-minimal` (`pip install rs-embed[olmoearth]`)

---

## Input Contract

| Field | Value |
| --------------------- | ---------------------------------------------------------------------------------- |
| Backend | provider only (`gee` / `auto`) |
| `TemporalSpec` | `range` or `year` (normalized via shared helper; year → full year composite) |
| Default collection | `COPERNICUS/S2_SR_HARMONIZED` |
| Default bands (order) | `B2, B3, B4, B8, B5, B6, B7, B8A, B11, B12, B1, B9` |
| Default fetch | `scale_m=10`, `cloudy_pct=30`, `composite="median"` |
| `input_chw` | `CHW`, `C=12` in the band order above, raw SR DN `0..10000` |
| Side inputs | Timestamps, derived automatically from the temporal midpoint; none required from the user |

The band order matches OlmoEarth's internal `Modality.SENTINEL2_L2A` definition:
three band sets (10 m, 20 m, 60 m) totaling 12 channels.
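
The grouping can be written out explicitly. The sketch below assumes the three band sets follow the native Sentinel-2 resolutions (four 10 m bands, six 20 m bands, two 60 m bands) in the order listed in the table above; the dictionary and helper names are illustrative and are not identifiers from `rs-embed` or `olmoearth-pretrain-minimal`.

```python
# Band-set grouping inferred from the default band order in the Input Contract table.
# The names below are illustrative, not rs-embed identifiers.
S2_L2A_BAND_SETS = {
    "10m": ["B2", "B3", "B4", "B8"],
    "20m": ["B5", "B6", "B7", "B8A", "B11", "B12"],
    "60m": ["B1", "B9"],
}


def flat_band_order(band_sets):
    """Concatenate the band sets in insertion order (10 m, 20 m, 60 m)."""
    return [band for bands in band_sets.values() for band in bands]


BAND_ORDER = flat_band_order(S2_L2A_BAND_SETS)
print(len(BAND_ORDER), BAND_ORDER)
```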

---

## Preprocessing Pipeline

```mermaid
flowchart LR
INPUT["S2 12-band raw DN"] --> NORM["Per-band mean±2σ\nnormalization"]
NORM --> RESIZE["Resize to image_size\n(default 256×256)"]
RESIZE --> SAMPLE["Build MaskedOlmoEarthSample\n(B=1, H, W, T=1, C=12)"]
SAMPLE --> ENC["FlexiViT encoder\npatch_size=4 (default)"]
ENC --> POOL["Pool over T×BandSets\n→ (B, H', W', D)"]
POOL --> OUTPUT{Output mode}
OUTPUT -- pooled --> VEC["Global mean/max\n→ (D,) vector"]
OUTPUT -- grid --> GRID["Spatial token map\n(D, H', W')"]
```
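
The normalization step can be sketched in NumPy. This is a minimal illustration that assumes a linear mapping from `[mean − 2σ, mean + 2σ]` to `[0, 1]` with clipping, as described under Notes and Caveats; the function name and the statistics are placeholders, not identifiers or values from `olmoearth-pretrain-minimal`.

```python
import numpy as np


def normalize_mean_2sigma(chw, mean, std):
    """Map each band from [mean - 2*std, mean + 2*std] to [0, 1], clipping outliers."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    lo = mean - 2.0 * std
    hi = mean + 2.0 * std
    scaled = (chw.astype(np.float32) - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)


# Toy 2-band input with placeholder statistics (NOT the OlmoEarth training statistics).
toy = np.array(
    [
        [[0.0, 1000.0], [2000.0, 10000.0]],
        [[500.0, 500.0], [1500.0, 2500.0]],
    ]
)
out = normalize_mean_2sigma(toy, mean=[1000.0, 1500.0], std=[500.0, 500.0])
print(out)
```

In the toy input, the value `10000` in the first band lies above `mean + 2σ` and is therefore clipped to `1.0` rather than dropped.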

---

## Architecture Concept

```mermaid
flowchart LR
S2["S2 L2A\n12 bands\n3 band sets"] --> PE["FlexiViT\npatch embed\n(patch_size 1–8)"]
TS["Timestamps\n(day, month, year)"] --> TE["Temporal + month\nembeddings"]
PE --> ATTN["Transformer\nencoder\n(depth by variant)"]
TE --> ATTN
ATTN --> OUT["tokens:\n(B, H', W', T, S, D)"]
OUT --> MEAN["Mean over T, S"]
MEAN --> RESULT["Spatial grid\n(D, H', W')"]
```

The encoder output is a 6-D tensor `(B, H', W', T=1, S, D)` where `S` is the number of band sets (3 for v1, 1 for v1.1 due to the linear patch embedding change). All pooling is applied after the encoder.
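
The two reductions can be illustrated on a random stand-in for the encoder output. The sketch below is shape bookkeeping only: it ignores token masking, which the real pooled path handles inside OlmoEarth's own pooling code, and the helper names are hypothetical.

```python
import numpy as np


def tokens_to_grid(tokens):
    """Average a (B, H', W', T, S, D) token tensor over T and S, returning (B, D, H', W')."""
    spatial = tokens.mean(axis=(3, 4))          # (B, H', W', D)
    return np.transpose(spatial, (0, 3, 1, 2))  # (B, D, H', W')


def tokens_to_pooled(tokens, pooling="mean"):
    """Reduce every non-batch, non-feature axis, returning (B, D)."""
    if pooling == "mean":
        return tokens.mean(axis=(1, 2, 3, 4))
    if pooling == "max":
        return tokens.max(axis=(1, 2, 3, 4))
    raise ValueError(f"unknown pooling: {pooling!r}")


# Random stand-in for encoder output: nano (D=128), v1 (S=3), 64x64 token grid.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1, 64, 64, 1, 3, 128)).astype(np.float32)
grid = tokens_to_grid(tokens)[0]
pooled = tokens_to_pooled(tokens)[0]
print(grid.shape, pooled.shape)
```

For a v1.1 variant the same code applies with `S=1`, which is why both versions end up with the same output shapes after pooling.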

---

## Model-specific Settings

### `variant`

Selects the model size and version. Weights are automatically downloaded from Hugging Face on first use.

| Variant      | Version | Encoder Dim | Depth | Hugging Face Repo                  |
| ------------ | ------- | ----------- | ----- | ---------------------------------- |
| `nano` | v1 | 128 | 4 | `allenai/OlmoEarth-v1-Nano` |
| `tiny` | v1 | 192 | 12 | `allenai/OlmoEarth-v1-Tiny` |
| `base` | v1 | 768 | 12 | `allenai/OlmoEarth-v1-Base` |
| `large` | v1 | 1024 | 24 | `allenai/OlmoEarth-v1-Large` |
| `nano_v1_1` | v1.1 | 128 | 4 | `allenai/OlmoEarth-v1_1-Nano` |
| `tiny_v1_1` | v1.1 | 192 | 12 | `allenai/OlmoEarth-v1_1-Tiny` |
| `base_v1_1` | v1.1 | 768 | 12 | `allenai/OlmoEarth-v1_1-Base` |

!!! note "v1 vs v1.1 architecture difference"
    v1 uses a Conv2D-based patch embedding, producing 3 separate band-set token groups per spatial location.
    v1.1 uses a linear patch embedding (`use_linear_patch_embed=True`) that merges band sets into a single token stream. Both versions produce the same output dimensionality after pooling.

Short aliases are accepted: `nano_11`, `tiny_11`, `base_11` for v1.1 variants; `nano_v1`, `tiny_v1`, `base_v1`, `large_v1` for v1 variants.
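
As a rough picture of how aliases and variants relate, the lookup below copies the repository names and encoder dimensions from the table above. The dictionary and function names are hypothetical, and the adapter's actual resolution logic may differ (for example in how it normalizes case or whitespace).

```python
# Values copied from the variant table; names of the dicts and function are hypothetical.
VARIANTS = {
    "nano": ("allenai/OlmoEarth-v1-Nano", 128),
    "tiny": ("allenai/OlmoEarth-v1-Tiny", 192),
    "base": ("allenai/OlmoEarth-v1-Base", 768),
    "large": ("allenai/OlmoEarth-v1-Large", 1024),
    "nano_v1_1": ("allenai/OlmoEarth-v1_1-Nano", 128),
    "tiny_v1_1": ("allenai/OlmoEarth-v1_1-Tiny", 192),
    "base_v1_1": ("allenai/OlmoEarth-v1_1-Base", 768),
}

ALIASES = {
    "nano_v1": "nano",
    "tiny_v1": "tiny",
    "base_v1": "base",
    "large_v1": "large",
    "nano_11": "nano_v1_1",
    "tiny_11": "tiny_v1_1",
    "base_11": "base_v1_1",
}


def resolve_variant(name):
    """Return (canonical_name, hf_repo, embedding_dim) for a variant name or alias."""
    key = ALIASES.get(name, name)
    if key not in VARIANTS:
        raise ValueError(f"unknown OlmoEarth variant: {name!r}")
    repo, dim = VARIANTS[key]
    return key, repo, dim


print(resolve_variant("base_11"))
```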

### `patch_size`

Controls the spatial patch size for the FlexiViT encoder. Smaller values produce more spatial tokens (higher resolution) at the cost of longer inference time.

| `patch_size` | Tokens (256×256 image) | Note |
| ------------ | ---------------------- | --------------------------------- |
| `4` | 64 × 64 = 4096 | Default; more spatially detailed |
| `8` | 32 × 32 = 1024 | Faster; coarser spatial grid |
| `2` | 128 × 128 = 16384 | Very detailed; significantly slower |

### `image_size`

Target pixel size for the resize step. The fetched patch is always resized to `(image_size, image_size)` before encoding. Must be divisible by `patch_size`.

Default: `256` (matching the OlmoEarth training tile size).
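
The relationship between `image_size`, `patch_size`, and the token count can be checked with a few lines of Python. The helper below is a hypothetical illustration of the constraints stated on this page (patch sizes 1–8 and divisibility), not the adapter's own validation code.

```python
def token_grid(image_size=256, patch_size=4):
    """Return (tokens_per_side, total_tokens) for a square input."""
    if not 1 <= patch_size <= 8:
        raise ValueError(f"patch_size must be in 1..8, got {patch_size}")
    if image_size % patch_size != 0:
        raise ValueError(
            f"image_size={image_size} is not divisible by patch_size={patch_size}"
        )
    side = image_size // patch_size
    return side, side * side


for p in (2, 4, 8):
    print(p, token_grid(256, p))
```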

---

## Output Semantics

### Pooled (`OutputSpec.pooled()`)

The encoder output `(B, H', W', T=1, S, D)` is pooled over all spatial, temporal, and band-set dimensions via the OlmoEarth built-in `pool_unmasked_tokens()`. This produces a `(D,)` vector.

`pooling="mean"` (default) computes mean; `pooling="max"` computes max over token positions.

### Grid (`OutputSpec.grid()`)

Returns a `(D, H', W')` spatial token map as an `xarray.DataArray` with dimensions `(d, y, x)`. The temporal (T=1) and band-set (S) dimensions are averaged out; only the spatial token grid is retained.

Grid size depends on `image_size` and `patch_size`:
```
H' = W' = image_size // patch_size
```
For the defaults (`image_size=256`, `patch_size=4`), this gives a `64 × 64` grid.

---

## Environment Variables

| Variable | Default | Effect |
| -------------------------------- | -------- | --------------------------------------------------- |
| `RS_EMBED_OLMOEARTH_VARIANT` | `nano` | Default model variant when `model_config` not given |
| `RS_EMBED_OLMOEARTH_PATCH_SIZE` | `4` | Default patch size when `model_config` not given |
| `RS_EMBED_OLMOEARTH_IMAGE_SIZE` | `256` | Default image resize target |
| `RS_EMBED_OLMOEARTH_FETCH_WORKERS` | `8` | Parallel GEE fetch workers for batch calls |
| `RS_EMBED_OLMOEARTH_BATCH_SIZE` | `4` (CPU) / `16` (CUDA) | Inference batch size for `get_embeddings_batch_from_inputs` |
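
The precedence implied by the table (an explicit `model_config` value first, then the environment variable, then the built-in default) can be sketched as follows. The helper and its dictionaries are hypothetical and only cover the three model settings; the adapter's real parsing and error handling may differ.

```python
import os

# Built-in defaults and env var names taken from the table above.
_DEFAULTS = {"variant": "nano", "patch_size": 4, "image_size": 256}
_ENV_NAMES = {
    "variant": "RS_EMBED_OLMOEARTH_VARIANT",
    "patch_size": "RS_EMBED_OLMOEARTH_PATCH_SIZE",
    "image_size": "RS_EMBED_OLMOEARTH_IMAGE_SIZE",
}


def resolve_setting(key, model_config=None, environ=None):
    """Resolve one setting: model_config, then environment variable, then default."""
    environ = os.environ if environ is None else environ
    if model_config and key in model_config:
        return model_config[key]
    raw = environ.get(_ENV_NAMES[key])
    default = _DEFAULTS[key]
    if raw is None or raw == "":
        return default
    # Cast the string from the environment to the type of the default (str or int).
    return type(default)(raw)


env = {"RS_EMBED_OLMOEARTH_PATCH_SIZE": "8"}
print(resolve_setting("patch_size", environ=env))
print(resolve_setting("patch_size", {"patch_size": 2}, environ=env))
```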

---

## Installation

OlmoEarth requires an additional package not included in the base `rs-embed` install:

```bash
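# Quote the extra so shells such as zsh do not expand the brackets
pip install "rs-embed[olmoearth]"
# or install the dependency directly
uv pip install olmoearth-pretrain-minimal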
```

---

## Usage Examples

```python
import rs_embed as rs
from rs_embed.core.specs import BBox, TemporalSpec, OutputSpec

# Pooled embedding with default nano variant
emb = rs.get_embedding(
    "olmoearth",
    spatial=BBox(minlon=-2.0, minlat=6.0, maxlon=-1.9, maxlat=6.1),
    temporal=TemporalSpec.year(2022),
    output=OutputSpec.pooled(),
)
print(emb.data.shape)  # (128,) for nano

# Use base variant
emb_base = rs.get_embedding(
    "olmoearth",
    spatial=BBox(minlon=-2.0, minlat=6.0, maxlon=-1.9, maxlat=6.1),
    temporal=TemporalSpec.year(2022),
    output=OutputSpec.pooled(),
    model_config={"variant": "base"},
)
print(emb_base.data.shape)  # (768,) for base

# Grid embedding (spatial token map)
emb_grid = rs.get_embedding(
    "olmoearth",
    spatial=BBox(minlon=-2.0, minlat=6.0, maxlon=-1.9, maxlat=6.1),
    temporal=TemporalSpec.year(2022),
    output=OutputSpec.grid(),
    model_config={"variant": "nano", "patch_size": 8},
)
print(emb_grid.data.shape)  # (128, 32, 32) for nano with patch_size=8

# Class-based API for repeated calls
from rs_embed.model import Model
from rs_embed.core.specs import PointBuffer

model = Model("olmoearth", model_config={"variant": "tiny"})
embeddings = model.get_embeddings_batch([
    PointBuffer(lon=-1.95, lat=6.05, buffer_m=1000),
    PointBuffer(lon=-2.10, lat=6.20, buffer_m=1000),
], temporal=TemporalSpec.year(2022))
```

---

## Notes and Caveats

- The OlmoEarth normalizer clips each band to `mean ± 2σ` and then rescales it to `[0, 1]`. Out-of-range values are clipped, not discarded.
- `patch_size` is a **model input** (FlexiViT accepts variable patch sizes), not a preprocessing hyperparameter. Different `patch_size` values may produce embeddings with different spatial characteristics.
- The `large` variant is only available in v1 (no v1.1 large release at time of writing).
- Weights are cached by `huggingface_hub` in the default HF cache directory.
2 changes: 1 addition & 1 deletion docs/models/prithvi.md
@@ -19,7 +19,7 @@
In `rs-embed`, its most important characteristics are:

- **required** temporal (`year, day_of_year`) and location (`lat, lon`) side inputs auto-derived by the adapter: see [Input Contract](#input-contract)
- 30 m default `sensor.scale_m`, not the more common S2 10 m default — a frequent source of silent drift: see [Reproducibility Notes](#reproducibility-notes)
- 30 m default `sensor.scale_m`, not the more common S2 10 m default — a frequent source of silent drift: see [Environment Variables / Tuning Knobs](#environment-variables-tuning-knobs)
- `resize` vs `pad` preprocessing changes token geometry and should be treated as part of the experiment, not as a cosmetic knob: see [Environment Variables / Tuning Knobs](#environment-variables-tuning-knobs)

---
2 changes: 1 addition & 1 deletion docs/models/satmae.md
@@ -17,7 +17,7 @@
In `rs-embed`, its most important characteristics are:

- RGB-only (`B4,B3,B2`); raw SR is converted to `uint8` before model preprocessing: see [Preprocessing Pipeline](#preprocessing-pipeline)
- token path is always used (`mask_ratio=0.0`), and any CLS token is auto-removed before pooling/grid: see [Output Semantics](#output-semantics)
- token path is always used (`mask_ratio=0.0`), and any CLS token is auto-removed before pooling/grid: see [Reference](#reference)
- checkpoint selection via `RS_EMBED_SATMAE_ID` (Hugging Face model ID) — default targets the fMoW large checkpoint: see [Environment Variables / Tuning Knobs](#environment-variables-tuning-knobs)

---
4 changes: 3 additions & 1 deletion docs/models_reference.md
@@ -25,7 +25,7 @@ Read this section before comparing any model that accepts `TemporalSpec.range(..

For most on-the-fly adapters, `TemporalSpec.range(start, end)` means "filter imagery in `[start, end)` and build one composite patch for model input," usually with `median` and optionally `mosaic` through `SensorSpec.composite`.

The multi-frame adapters `agrifm`, `anysat`, and `galileo` instead split the requested range into sub-windows and composite one frame per bin. Current single-composite adapters include `remoteclip`, `satmae`, `satmaepp`, `satmaepp_s2_10b`, `scalemae`, `wildsat`, `prithvi`, `terrafm`, `terramind`, `dofa`, `fomo`, `thor`, and `satvision`.
The multi-frame adapters `agrifm`, `anysat`, and `galileo` instead split the requested range into sub-windows and composite one frame per bin. Current single-composite adapters include `remoteclip`, `satmae`, `satmaepp`, `satmaepp_s2_10b`, `scalemae`, `wildsat`, `prithvi`, `terrafm`, `terramind`, `dofa`, `fomo`, `thor`, `satvision`, and `olmoearth`.

### Multi-frame Semantics

@@ -68,6 +68,7 @@ Use this table to avoid unfair comparisons between plain image encoders and adap
| `thor` | Yes (`S1`/`S2`) | Yes (select one modality per call: `s1` or `s2`) | No | No hard extra metadata (optional S1 options: orbit, linear/DB path) |
| `agrifm` | No (this adapter path) | No | No extra side tensor, but temporal stack `[T,C,H,W]` required | Temporal coverage is important (no separate metadata tensor) |
| `satvision` | No (this adapter path) | No | No separate side tensor | Yes: strict 14-channel order/calibration schema (band semantics) |
| `olmoearth` | Yes (multi-modal architecture) | S2 L2A only in this adapter | Yes (image + mask + timestamps; all derived automatically) | No hard extra metadata (timestamps derived from temporal midpoint) |

In practice, the most obviously multi-input models here are `prithvi` (image plus temporal and location coordinates), `anysat` (time series plus `s2_dates`), `galileo` (image-derived tensors plus masks and `months`), `dofa` (image plus wavelengths), and `scalemae` (image plus `input_res_m`).

@@ -93,6 +94,7 @@ This table only lists env vars that materially change model input construction o
| `thor` | `RS_EMBED_THOR_IMG`, `RS_EMBED_THOR_NORMALIZE`, plus modality and sensor-side options (`s2`/`s1`) |
| `agrifm` | `RS_EMBED_AGRIFM_IMG`, `RS_EMBED_AGRIFM_NORM`, `RS_EMBED_AGRIFM_FRAMES` |
| `satvision` | `RS_EMBED_SATVISION_TOA_IMG`, `RS_EMBED_SATVISION_TOA_NORM`, channel-index and calibration env keys |
| `olmoearth` | `RS_EMBED_OLMOEARTH_VARIANT`, `RS_EMBED_OLMOEARTH_IMAGE_SIZE`, `RS_EMBED_OLMOEARTH_PATCH_SIZE` |

### Practical Guidance

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -95,6 +95,7 @@ nav:
- AnySat: models/anysat.md
- Galileo: models/galileo.md
- AgriFM: models/agrifm.md
- OlmoEarth: models/olmoearth.md
- API:
- Overview: api.md
- Specs & Data Structures: api_specs.md
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -68,9 +68,13 @@ terramind = [
# TerraMind still loads its backbone through the TerraTorch registry.
"terratorch==1.2.1",
]
olmoearth = [
"olmoearth-pretrain-minimal>=0.0.5",
]
full = [
"matplotlib>=3.10",
"terratorch==1.2.1",
"olmoearth-pretrain-minimal>=0.0.5",
]
dev = [
"pytest>=7.4",
1 change: 1 addition & 0 deletions src/rs_embed/embedders/catalog.py
@@ -21,6 +21,7 @@
"thor": ("onthefly_thor", "THORBaseEmbedder"),
"agrifm": ("onthefly_agrifm", "AgriFMEmbedder"),
"satvision": ("onthefly_satvision_toa", "SatVisionTOAEmbedder"),
"olmoearth": ("onthefly_olmoearth", "OlmoEarthEmbedder"),
}

MODEL_ALIASES: dict[str, str] = {