Merged
`dev-docs/specs/2026-05-12-cog-block-cache-design.md` (130 additions, 0 deletions)
@@ -0,0 +1,130 @@
# COG block-aligned header cache

**Date:** 2026-05-12
**Issue:** [#500](https://github.com/developmentseed/deck.gl-raster/issues/500)
**Status:** Design — supersedes [`2026-05-05-geotiff-readahead-cache-design.md`](2026-05-05-geotiff-readahead-cache-design.md) (and the unmerged PR [#509](https://github.com/developmentseed/deck.gl-raster/pull/509)).

## Background

The original design (the sequential exponential read-ahead cache, then a frozen-after-open variant) tried to optimize *steady-state* tile rendering by bulk-loading `TileOffsets` / `TileByteCounts` arrays for each IFD. That moved the cost to *open time*. On a real 200 GB Vermont COG, that's tens of MB downloaded before any tile renders — even though the initial view is at an overview level whose primary-image arrays are never used.

geotiff.js takes the opposite approach. Each `fromUrl` call requests just 1024 bytes (the header plus the first IFD pointer), but its `BlockedSource` pads this to one 64 KiB block, so the first underlying HTTP request is a full block rather than 1024 bytes. `getImage(i)` parses only that IFD's entries; tile-array values are wrapped in a `DeferredArray` that records just their file offset and count. When a tile is requested, `DeferredArray.get(index)` fetches the specific 4–8 byte offset entry (and, separately, the byte-count entry) through the same block-aligned `BlockedSource`. Adjacent entries of the same array sit in the same 64 KiB block, so after the first lookup in a region, further per-tile lookups cost effectively no extra HTTP requests. Because the block cache lives in the source layer rather than in the TIFF parser, any lazy per-entry reader built on top of it benefits automatically, cogeotiff's included.

Our implementation uses the same shape: cogeotiff's `image.getTileSize(idx)` reads `TileOffsets[idx]` and `TileByteCounts[idx]` lazily through the wrapped header source, and our chunk cache turns those 4–8 byte reads into shared 64 KiB block fetches. cogeotiff's initial header read is `DefaultReadSize` bytes (16 KiB by default); `SourceChunk` pads that up to the chunk size, so the actual wire request is one block.
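The block-aligned lookup described above can be sketched in a few lines. This is a minimal illustration, not geotiff.js's `BlockedSource` or chunkd's `SourceChunk`: `FetchRange`, `BlockReader`, and `tileOffset` are hypothetical names, the cache is unbounded for brevity, and entries are assumed to be classic-TIFF 4-byte little-endian values (BigTIFF would use 8-byte entries). A 4-byte read for `TileOffsets[idx]` is widened to its enclosing 64 KiB block, and the block is memoized, so neighbouring entries cost no further requests.

```typescript
/** Hypothetical range reader: each call stands for one HTTP range request. */
type FetchRange = (offset: number, length: number) => Promise<Uint8Array>;

const BLOCK_SIZE = 64 * 1024;

/** Serves small reads from memoized, block-aligned fetches (unbounded, for brevity). */
class BlockReader {
  private blocks = new Map<number, Promise<Uint8Array>>();

  constructor(private fetchRange: FetchRange) {}

  /** Fetch block `index` once; concurrent callers share the same request. */
  private block(index: number): Promise<Uint8Array> {
    let pending = this.blocks.get(index);
    if (!pending) {
      pending = this.fetchRange(index * BLOCK_SIZE, BLOCK_SIZE);
      this.blocks.set(index, pending);
    }
    return pending;
  }

  /** Read `length` bytes at `offset`, touching only the blocks that cover them. */
  async read(offset: number, length: number): Promise<Uint8Array> {
    const out = new Uint8Array(length);
    const first = Math.floor(offset / BLOCK_SIZE);
    const last = Math.floor((offset + length - 1) / BLOCK_SIZE);
    for (let i = first; i <= last; i++) {
      const block = await this.block(i);
      const blockStart = i * BLOCK_SIZE;
      const from = Math.max(offset, blockStart);
      const to = Math.min(offset + length, blockStart + block.length);
      out.set(block.subarray(from - blockStart, to - blockStart), from - offset);
    }
    return out;
  }
}

/** Lazily read TileOffsets[idx] from a classic TIFF (4-byte little-endian entries). */
async function tileOffset(reader: BlockReader, arrayOffset: number, idx: number): Promise<number> {
  const bytes = await reader.read(arrayOffset + idx * 4, 4);
  return new DataView(bytes.buffer, bytes.byteOffset, 4).getUint32(0, true);
}
```

A read that straddles a block boundary simply touches two blocks; every other lookup in the region is served from memory.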

That design lines up with how cogeotiff was built to be used — `image.init(true)` already loads only "important tags" (dimensions, tile size, georeferencing, GeoKeys) and defers everything else.

## Goals

1. **Low first-paint latency on huge COGs.** Opening a 200 GB COG should cost roughly one HTTP request, not tens of MB of metadata downloads.
2. **Bounded steady-state cost per tile.** After warmup, per-tile metadata reads should be effectively free (served from block cache).
3. **Bounded memory.** The cache must evict; never grow without bound.
4. **No header / tile cache crossover.** Tile data bytes must not pollute the header cache. Header bytes must not have to share space with tile data.

## Solution

Use chunkd's built-in `SourceChunk` + `SourceCache` middleware with a fixed 64 KiB block size and a bounded, approximately-LRU cache. Drop all the bespoke read-ahead machinery from the previous design.

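```ts
import { SourceCache, SourceChunk, SourceView } from "@chunkd/middleware";
import { SourceHttp } from "@chunkd/source-http";
import { Tiff } from "@cogeotiff/core";

// Excerpt of the proposed GeoTIFF.fromUrl body; `url` and `signal` are its arguments.
// `source` doubles as the uncached dataSource; `view` is the block-cached headerSource.
const source = new SourceHttp(url);
source.metadata = { size: Infinity }; // #524 workaround
const view = new SourceView(source, [
  new SourceChunk({ size: 64 * 1024 }),
  new SourceCache({ size: 8 * 1024 * 1024 }), // ~128 blocks
]);

const tiff = await Tiff.create(view, { signal });
tiff.options = undefined; // disable leader-bytes path
```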

### Why fixed 64 KiB blocks?

- Matches the default block size of geotiff.js's `BlockedSource`, which is widely used across the GeoTIFF ecosystem.
- One block holds 8,192 BigTIFF tile-offset entries (8 bytes each) or 16,384 classic-TIFF entries (4 bytes each). A viewport's worth of adjacent tile lookups almost always hits a single cached block.
- No tunable that has to be right per file. Pathological cases (huge metadata regions, far-offset probes) all degrade gracefully — they just cost more block fetches.
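The per-block capacity is plain division (65,536 / 8 = 8,192 and 65,536 / 4 = 16,384). The hypothetical helper below, a sketch for illustration only, counts how many distinct blocks a rectangular viewport needs, assuming tiles are numbered row-major (left to right, top to bottom), as TIFF stores them.

```typescript
/** Byte layout of one TileOffsets or TileByteCounts array (hypothetical helper type). */
interface EntryArray {
  fileOffset: number; // where the array starts in the file
  entrySize: 4 | 8; // classic TIFF = 4 bytes, BigTIFF = 8 bytes
}

const BLOCK_SIZE = 64 * 1024;

/** Distinct 64 KiB blocks needed to read the entries for a rectangle of tiles. */
function blocksForViewport(
  arr: EntryArray,
  tilesAcross: number,
  rows: [number, number], // inclusive tile-row range
  cols: [number, number], // inclusive tile-column range
): Set<number> {
  const blocks = new Set<number>();
  for (let r = rows[0]; r <= rows[1]; r++) {
    for (let c = cols[0]; c <= cols[1]; c++) {
      const idx = r * tilesAcross + c; // row-major tile index
      const start = arr.fileOffset + idx * arr.entrySize;
      // An entry can straddle a boundary, so record both its first and last byte's block.
      blocks.add(Math.floor(start / BLOCK_SIZE));
      blocks.add(Math.floor((start + arr.entrySize - 1) / BLOCK_SIZE));
    }
  }
  return blocks;
}
```

With 1,000 tiles per row, a 4 × 10 viewport of BigTIFF entries touches one block; with 5,000 tiles per row, the same viewport spans two. That is the graceful degradation described above: more block fetches, not a failure.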

### Why LRU eviction?

The previous design's sequential cache *never evicted*; for long-running sessions or large files, that is a memory leak. `SourceCache` is a two-generation cache: when `cacheA` overflows, it becomes `cacheB` and the old `cacheB` is dropped. That is not strict LRU, but it is bounded and approximately recency-aware in practice.
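A two-generation policy of this kind can be sketched as follows. This is a generic illustration, not chunkd's `SourceCache` source: `TwoGenerationCache` is a hypothetical name, sizes are counted in bytes, each generation is capped at half the budget, and promoting a hit from the old generation back into the current one is an assumption of the sketch (it is what makes the policy approximately recency-aware).

```typescript
/**
 * Illustrative two-generation cache (not chunkd's source). Each generation
 * is capped at half of `maxBytes`; a single oversized entry may exceed that.
 */
class TwoGenerationCache<K> {
  private current = new Map<K, Uint8Array>();
  private previous = new Map<K, Uint8Array>();
  private currentBytes = 0;

  constructor(private readonly maxBytes: number) {}

  get(key: K): Uint8Array | undefined {
    const hit = this.current.get(key);
    if (hit) return hit;
    const old = this.previous.get(key);
    if (old) {
      // Assumed promotion: a recently used entry moves to the current
      // generation, so it survives the next flip. This approximates LRU.
      this.previous.delete(key);
      this.set(key, old);
    }
    return old;
  }

  set(key: K, value: Uint8Array): void {
    const existing = this.current.get(key);
    if (existing) {
      this.currentBytes -= existing.byteLength;
      this.current.delete(key);
    }
    if (this.currentBytes + value.byteLength > this.maxBytes / 2) {
      // Overflow: the current generation becomes the old one, and the
      // previous old generation is dropped wholesale.
      this.previous = this.current;
      this.current = new Map();
      this.currentBytes = 0;
    }
    this.current.set(key, value);
    this.currentBytes += value.byteLength;
  }
}
```

Eviction happens a whole generation at a time, which keeps bookkeeping trivial: there is no per-entry recency list to maintain.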

### Why disable cogeotiff's leader-bytes path?

cogeotiff auto-detects the GDAL ghost option `BLOCK_LEADER=SIZE_AS_UINT4` at `Tiff.create()` time. If present, `TiffImage.getTileSize()` still reads `TileOffsets[idx]` to locate the tile, but skips the `TileByteCounts` lookup and instead fetches the 4 bytes just before the tile data, where GDAL stores the tile's size. The comment in cogeotiff explains the intent: *"This fetch will generally load in the bytes needed for the image too provided the image size is less than the size of a chunk."* That assumption breaks for tiles larger than one block, which are common: an uncompressed 256×256 tile with three 8-bit bands is 196,608 bytes (192 KiB), three times the 64 KiB block size. When it breaks, the result is:

> **Review comment (Member Author):** How does it know where the tile data is though? Does it fetch the offset separately?

1. A 64 KiB chunk fetch near the tile, populated into the header cache, evicting metadata.
2. The actual tile fetch via `dataSource` still has to fetch the whole tile.

So the optimization actively hurts. Setting `tiff.options = undefined` after `Tiff.create()` removes it. `getTileSize` then always takes the explicit `TileOffsets` / `TileByteCounts` path, which goes through cogeotiff's lazy per-entry mechanism — served by our header cache, never touching tile data. cogeotiff core only reads `tiff.options` from this one location, so no other behavior is affected.
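The difference between the two paths can be sketched as a simplified paraphrase of the decision `getTileSize` makes. This is not cogeotiff's code (see the `getTileSize` link under References): `ReadUint`, `TileArrays`, and `tileSize` are hypothetical, and sparse-tile handling is omitted. Both paths read `TileOffsets[idx]` first; they differ only in where the byte count is read from, and therefore in which region of the file ends up in the header cache.

```typescript
/** Hypothetical: read an unsigned little-endian integer of `size` bytes at `offset`. */
type ReadUint = (offset: number, size: number) => Promise<number>;

/** File offsets of the two per-tile arrays for one image (hypothetical shape). */
interface TileArrays {
  offsetsAt: number; // start of TileOffsets
  countsAt: number; // start of TileByteCounts
  entrySize: 4 | 8; // classic TIFF or BigTIFF
}

/**
 * Simplified paraphrase of the two getTileSize paths. `leaderBytes` is 4
 * for BLOCK_LEADER=SIZE_AS_UINT4 and undefined once tiff.options is cleared.
 */
async function tileSize(read: ReadUint, arrays: TileArrays, idx: number, leaderBytes?: number) {
  // Both paths first look up where the tile starts.
  const offset = await read(arrays.offsetsAt + idx * arrays.entrySize, arrays.entrySize);
  if (leaderBytes) {
    // Leader path: the size sits just before the tile data, so this read
    // lands among tile bytes rather than among other metadata.
    return { offset, imageSize: await read(offset - leaderBytes, leaderBytes) };
  }
  // Explicit path: the size comes from TileByteCounts, next to other metadata.
  return { offset, imageSize: await read(arrays.countsAt + idx * arrays.entrySize, arrays.entrySize) };
}
```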

### Why separate `dataSource` and `headerSource`?

The split (already present in our `GeoTIFF.fromUrl`) keeps tile data out of the header cache:

- `dataSource` = raw `SourceHttp` — used by [`packages/geotiff/src/fetch.ts`](../../packages/geotiff/src/fetch.ts) for tile data reads via `geotiff.dataSource.fetch(...)`. No caching, no chunking. Each tile is one HTTP range request.
- `headerSource` = the wrapped `SourceView` — passed to `Tiff.create()`. All of cogeotiff's reads (IFD parsing, lazy tag fetches, lazy per-tile offset/bytecount entries) go through this. Block-cached.
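The routing can be sketched with two hypothetical pieces: `ByteSource`, an interface shaped like a range read of `length` bytes at `offset`, and `TileLookup`, standing in for `getTileSize` served through the block-cached `headerSource`. Only the small offset and byte-count entries go through the header cache; the tile bytes themselves are one direct request on `dataSource`.

```typescript
/** Hypothetical source shape: a range read of `length` bytes at `offset`. */
interface ByteSource {
  fetch(offset: number, length: number): Promise<ArrayBuffer>;
}

/** Hypothetical: resolves a tile's location and size via the header source. */
type TileLookup = (idx: number) => Promise<{ offset: number; imageSize: number }>;

/** Read one tile: metadata through the cached header path, bytes straight from dataSource. */
async function readTileBytes(
  lookup: TileLookup,
  dataSource: ByteSource,
  idx: number,
): Promise<ArrayBuffer> {
  const { offset, imageSize } = await lookup(idx); // served from the header block cache
  return dataSource.fetch(offset, imageSize); // one uncached range request; never enters the header cache
}
```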

### Why drop the eager `TileOffsets`/`TileByteCounts` prefetch?

`prefetchTags` currently bulk-fetches both arrays for the primary image. On a 200 GB COG with millions of tiles, each array alone is ~8 MB. The deferred approach lets cogeotiff lazily fetch individual entries through the block cache, and the entries for a viewport's adjacent tiles usually share one block.
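A back-of-the-envelope comparison makes the trade-off concrete. The image here is illustrative, not a measured Vermont file (1,000 × 1,000 tiles with BigTIFF 8-byte entries), and the helpers are hypothetical.

```typescript
const BLOCK_SIZE = 64 * 1024;

/** Size of one TileOffsets or TileByteCounts array: one entry per tile. */
function tileArrayBytes(tileCount: number, entrySize: 4 | 8): number {
  return tileCount * entrySize;
}

/** Number of 64 KiB blocks needed to cover `bytes` contiguous bytes. */
function blocksFor(bytes: number): number {
  return Math.ceil(bytes / BLOCK_SIZE);
}

// Illustrative image, not a measured file: 1,000 x 1,000 tiles, BigTIFF entries.
const perArray = tileArrayBytes(1000 * 1000, 8); // bytes in one array
const eagerBytes = 2 * perArray; // both arrays, downloaded at open time
const lazyBytes = 2 * BLOCK_SIZE; // typical viewport: one block per array
```

Eager prefetch downloads 16 MB for the two arrays before the first tile renders; the lazy path for a typical viewport downloads two blocks (128 KiB).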

Other tags in `prefetchTags` stay — they're small and needed to decode tiles:

- `SamplesPerPixel`, `BitsPerSample`, `SampleFormat`
- `Photometric`, `Predictor`, `PlanarConfiguration`
- `ColorMap` (for paletted)
- `GdalNoData`, `GdalMetadata`
- `LercParameters` (for LERC compression)

These tag values are typically <10 KB total per IFD. Loading them at open lets us return a fully-formed `GeoTIFF` without per-tile latency for tag lookups.

## What gets removed

Compared to the unmerged PR [#509](https://github.com/developmentseed/deck.gl-raster/pull/509):

- `packages/geotiff/src/source/readahead-cache.ts` (entire file — `SequentialBlockCache`, `SourceReadaheadCache`, `freeze()` lifecycle).
- `packages/geotiff/src/source/concurrency.ts` (`mutex()` helper — no longer needed).
- `packages/geotiff/src/source/` directory itself (becomes empty).
- `Overview.ensureTagsLoaded()` bulk-prefetch path.
- The `prefetch`, `multiplier`, `maxGap` options on `GeoTIFF.fromUrl`.

Net code change versus current `main`: small. We're adding ~5 lines to `fromUrl`, dropping 2 lines from `prefetchTags`, and undoing the `[SourceChunk, SourceCache]` → `[SourceReadaheadCache]` replacement that PR #509 made. Nothing more.

## API

`GeoTIFF.fromUrl(url, options)` signature:

```ts
static async fromUrl(
  url: string | URL,
  options: {
    /** AbortSignal for the header reads. */
    signal?: AbortSignal;
    /** Bytes per chunk for the header cache. Defaults to 64 KiB. */
    chunkSize?: number;
    /** Total cache size in bytes. Defaults to 8 MiB. */
    cacheSize?: number;
  } = {},
): Promise<GeoTIFF>
```
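One way to read the two knobs together: the cache holds `cacheSize / chunkSize` blocks, which with the defaults is 8 MiB / 64 KiB = 128 blocks, matching the "~128 blocks" comment in the solution snippet. The helper below is hypothetical and not part of the proposed API; it only applies the documented defaults.

```typescript
interface HeaderCacheOptions {
  chunkSize?: number;
  cacheSize?: number;
}

const DEFAULT_CHUNK_SIZE = 64 * 1024; // 64 KiB
const DEFAULT_CACHE_SIZE = 8 * 1024 * 1024; // 8 MiB

/** Fill in the documented defaults and report how many whole blocks the cache holds. */
function resolveHeaderCache(options: HeaderCacheOptions = {}) {
  const chunkSize = options.chunkSize ?? DEFAULT_CHUNK_SIZE;
  const cacheSize = options.cacheSize ?? DEFAULT_CACHE_SIZE;
  return { chunkSize, cacheSize, blocks: Math.floor(cacheSize / chunkSize) };
}
```

For example, the `cacheSize: 64 * 1024 * 1024` used by the Vermont comparison app yields 1,024 blocks at the default chunk size.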

`chunkSize` and `cacheSize` are kept exposed (vs. hidden) because the previous design exposed similar knobs and removing all of them is gratuitously breaking. Defaults are tuned for the typical case; users almost never need to touch them.

## Tests

- **Unit:** `prefetchTags` no longer fetches `TileOffsets` / `TileByteCounts`. Add an assertion to the existing test that exercises this path: the returned `CachedTags` either has `tileOffsets: undefined` / `tileByteCounts: undefined` or no longer has those fields.
- **Integration:** open a fixture through `SourceFile` + the same `[SourceChunk, SourceCache]` stack used by `fromUrl`, verify it works end-to-end (read width/height/transform, fetch a tile).
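- **Integration:** open a fixture written with GDAL's `BLOCK_LEADER=SIZE_AS_UINT4` option, then set `tiff.options = undefined`; assert that `image.getTileSize(0)` takes the `TileOffsets`/`TileByteCounts` path. (Indirect: count the underlying source fetches and verify that the 4-byte leader-bytes read does not appear.) The fixture must carry leader bytes; otherwise the leader path is never taken and the assertion passes trivially.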
- **Regression:** `fromurl.test.ts` (the #524 workaround test) still passes after the option-shape change.
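For the indirect leader-bytes check, a recording wrapper is enough. This is a hypothetical test helper: `ByteSource` is assumed to be shaped like a range read of `length` bytes at `offset`, and a leader read is identified as a 4-byte read ending exactly at a known tile offset. Because `SourceChunk` pads every read to a block, the recorder has to sit between cogeotiff and the chunking layer to see the 4-byte read as such.

```typescript
/** Hypothetical source shape: a range read of `length` bytes at `offset`. */
interface ByteSource {
  fetch(offset: number, length: number): Promise<ArrayBuffer>;
}

interface ReadRecord {
  offset: number;
  length: number;
}

/** Wrap a source so every fetch is logged before being forwarded. */
function recording(inner: ByteSource): { source: ByteSource; log: ReadRecord[] } {
  const log: ReadRecord[] = [];
  return {
    log,
    source: {
      fetch(offset, length) {
        log.push({ offset, length });
        return inner.fetch(offset, length);
      },
    },
  };
}

/** True if any logged read looks like a BLOCK_LEADER=SIZE_AS_UINT4 size probe for this tile. */
function sawLeaderRead(log: ReadRecord[], tileOffset: number): boolean {
  return log.some((r) => r.length === 4 && r.offset + r.length === tileOffset);
}
```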

## Out of scope

- **Background pre-warming** of unvisited overviews. Easy to layer on later (call `image.fetch(TiffTag.TileOffsets)` from a `requestIdleCallback`).
- **Custom block-cache middleware.** `SourceChunk` + `SourceCache` from `@chunkd/middleware` is sufficient. No reason to roll our own.
- **Tunable cache replacement policy.** `SourceCache`'s two-generation eviction is good enough for now.

## References

- Reference implementation: [geotiff.js `BlockedSource`](https://github.com/geotiffjs/geotiff.js/blob/master/src/source/blockedsource.js)
- cogeotiff `getTileSize`: [`tiff.image.ts:568-596`](https://github.com/blacha/cogeotiff/blob/c489ebab2136a779a705bf1dedebbc250e17a747/packages/core/src/tiff.image.ts#L568-L596)
- cogeotiff `ImportantTags` (auto-loaded by `init(true)`): in `@cogeotiff/core/build/tiff.image.js:8-17`
- Previous design (superseded): [`2026-05-05-geotiff-readahead-cache-design.md`](2026-05-05-geotiff-readahead-cache-design.md)
`examples/vermont-cog-comparison/src/App.tsx` (1 addition, 12 deletions)
```diff
@@ -334,19 +334,8 @@ export default function App() {
       inFlightRef.current.add(url);
       void (async () => {
         try {
-          // Pad each tunable to (the file's known header size) OR a
-          // generic 16 MB default for files that haven't been measured.
-          // Vermont COGs scale wildly (3-band 30 cm = 60 MB header,
-          // 1-band yearly = ~3 MB), so a per-file value is a big win.
-          // - prefetch sizes the initial Tiff read,
-          // - chunkSize >= prefetch so the read fits in one source chunk
-          //   (otherwise SourceChunk splits it into chunkSize-aligned pieces).
-          // - cacheSize >= chunkSize to actually retain the header chunk.
-          const headerBytes = file.headerByteLength ?? 16 * 1024 * 1024;
           const gt = await GeoTIFF.fromUrl(url, {
-            chunkSize: headerBytes,
-            cacheSize: Math.max(headerBytes, 16 * 1024 * 1024),
-            prefetch: headerBytes,
+            cacheSize: 64 * 1024 * 1024,
           });
           setGeotiffs((prev) => new Map(prev).set(url, gt));
         } catch (err) {
```
`examples/vermont-cog-comparison/src/vt-imagery.ts` (0 additions, 68 deletions)
```diff
@@ -29,15 +29,6 @@ export type VTFile = {
   bands: BandCount;
   /** Group identifier used for `<optgroup>` separation in the UI. */
   category: FileCategory;
-  /**
-   * Total bytes occupied by the TIFF header (= offset where tile data begins).
-   *
-   * When known, lets us request the exact range in a single HTTP call when
-   * opening the COG. When `undefined` the loader falls back to a generic
-   * default. Measure with `GeoTIFF.getHeaderByteLength()` and add to
-   * {@link HEADER_BYTE_LENGTHS} below.
-   */
-  headerByteLength?: number;
 };
 
 /**
@@ -99,64 +90,6 @@ const YEARLY_FILENAMES = [
   "1994_50cm_LeafOFF_1Band.tif",
 ] as const;
 
-/**
- * Measured header byte lengths for files we've opened. Pulled from the
- * `GeoTIFF.getHeaderByteLength()` log; see [README](../README.md). Add
- * an entry whenever a new file's header length is observed so the loader
- * can fetch the exact range in one request.
- *
- * Keyed by VTFile.id (filename without `.tif`).
- */
-const HEADER_BYTE_LENGTHS: Record<string, number> = {
-  STATEWIDE_2025_30cm_LeafON_3Band: 60_998_796,
-  STATEWIDE_2024_30cm_LeafOFF_4Band: 27_985_606,
-  STATEWIDE_2023_30cm_LeafON_4Band: 29_757_726,
-  "STATEWIDE_2021-2022_30cm_LeafOFF_4Band": 27_985_604,
-  STATEWIDE_2021_60cm_LeafON_4Band: 7_445_474,
-  "STATEWIDE_2011-2015_50cm_LeafOFF_4Band": 10_014_005,
-  "STATEWIDE_2006-2010_50cm_LeafOFF_1Band": 10_012_861,
-  "STATEWIDE_1994-2000_50cm_LeafOFF_1Band": 10_012_861,
-  "STATEWIDE_1974-1992_100cm_LeafOFF_1Band": 3_507_513,
-  "2023_15cm_LeafOFF_4Band": 73_284_297,
-  "2022_30cm_LeafOFF_4Band": 18_243_651,
-  "2021_30cm_LeafOFF_4Band": 5_955_273,
-  "2020_15cm_LeafOFF_4Band": 10_516_770,
-  "2019_30cm_LeafOFF_4Band": 26_110_689,
-  "2019_15cm_LeafOFF_4Band": 28_574_141,
-  "2018_30cm_LeafOFF_4Band": 12_708_843,
-  "2018_15cm_LeafOFF_4Band": 41_931_939,
-  "2017_30cm_LeafOFF_4Band": 3_127_309,
-  "2017_15cm_LeafOFF_4Band": 4_943_211,
-  "2016-2019_30cm_LeafOFF_4Band": 27_985_838,
-  "2016_30cm_LeafOFF_4Band": 4_399_707,
-  "2016_15cm_LeafOFF_4Band": 14_061_893,
-  "2015_50cm_LeafOFF_4Band": 1_124_768,
-  "2014_50cm_LeafOFF_4Band": 1_657_062,
-  "2013_50cm_LeafOFF_4Band": 1_492_374,
-  "2013_30cm_LeafOFF_4Band": 563_878,
-  "2013_20cm_LeafOFF_4Band": 934_966,
-  "2013_15cm_LeafOFF_4Band": 5_870_274,
-  "2012_50cm_LeafOFF_4Band": 2_283_359,
-  "2011_50cm_LeafOFF_4Band": 1_539_159,
-  "2010_50cm_LeafOFF_1Band": 1_122_683,
-  "2009_50cm_LeafOFF_1Band": 753_547,
-  "2009_30cm_LeafOFF_3Band": 477_338,
-  "2008_50cm_LeafOFF_1Band": 1_107_847,
-  "2008_30cm_LeafON_3Band": 7_464_402,
-  "2007_50cm_LeafOFF_1Band": 1_210_867,
-  "2006_50cm_LeafOFF_1Band": 3_188_535,
-  "2006_15cm_LeafOFF_3Band": 544_454,
-  "2004_16cm_LeafOFF_3Band": 5_226_902,
-  "2001_15cm_LeafOFF_3Band": 100_696,
-  "2000_50cm_LeafOFF_1Band": 1_124_491,
-  "1999_50cm_LeafOFF_1Band": 3_137_297,
-  "1998_50cm_LeafOFF_1Band": 665_395,
-  "1998_13cm_LeafOFF_1Band": 35_737,
-  "1996_50cm_LeafOFF_1Band": 749_887,
-  "1995_50cm_LeafOFF_1Band": 2_537_357,
-  "1994_50cm_LeafOFF_1Band": 1_538_035,
-};
-
 const FILENAME_PATTERN =
   /^(STATEWIDE_)?(\d{4}(?:-\d{4})?)_(\d+)cm_Leaf(OFF|ON)_(\d)Band\.tif$/;
 
@@ -182,7 +115,6 @@ function parseVTFilename(filename: string, category: FileCategory): VTFile {
     url: `${BASE_URL}/${filename}`,
     bands: bandCount,
     category,
-    headerByteLength: HEADER_BYTE_LENGTHS[id],
   };
 }
 
```