From 694ced58e2c7cec66e81441e18cdec88a4ab4eba Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:23:22 -0400
Subject: [PATCH 01/12] docs: spec for COG block-aligned header cache

Supersedes the earlier read-ahead cache design. Uses chunkd's existing
SourceChunk + SourceCache (64 KiB blocks, 8 MiB LRU) instead of a custom
sequential cache; drops the eager TileOffsets/TileByteCounts prefetch in
favor of cogeotiff's lazy per-entry reads through the block cache;
disables cogeotiff's GDAL leader-bytes path so the header cache stays
free of image-data bytes.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../2026-05-12-cog-block-cache-design.md | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 dev-docs/specs/2026-05-12-cog-block-cache-design.md

diff --git a/dev-docs/specs/2026-05-12-cog-block-cache-design.md b/dev-docs/specs/2026-05-12-cog-block-cache-design.md
new file mode 100644
index 00000000..1ee06ca4
--- /dev/null
+++ b/dev-docs/specs/2026-05-12-cog-block-cache-design.md
@@ -0,0 +1,128 @@
+# COG block-aligned header cache
+
+**Date:** 2026-05-12
+**Issue:** [#500](https://github.com/developmentseed/deck.gl-raster/issues/500)
+**Status:** Design — supersedes [`2026-05-05-geotiff-readahead-cache-design.md`](2026-05-05-geotiff-readahead-cache-design.md) (and the unmerged PR [#509](https://github.com/developmentseed/deck.gl-raster/pull/509)).
+
+## Background
+
+The original design (the sequential exponential read-ahead cache, then a frozen-after-open variant) tried to optimize *steady-state* tile rendering by bulk-loading `TileOffsets` / `TileByteCounts` arrays for each IFD. That moved the cost to *open time*. On a real 200 GB Vermont COG, that's tens of MB downloaded before any tile renders — even though the initial view is at an overview level whose primary-image arrays are never used.
+
+geotiff.js takes the opposite approach. Each `fromUrl` call fetches just 1024 bytes (header + first IFD pointer). `getImage(i)` reads only that IFD's entries; tile-array values are wrapped in a `DeferredArray` that holds only their file offset + count. Per-tile reads fetch a single 4–8 byte entry through `BlockedSource`, a fixed-block LRU that coalesces adjacent entries into one block. The block cache lives inside the source layer; cogeotiff's lazy per-entry reads benefit from it automatically.
+
+That design lines up with how cogeotiff was built to be used — `image.init(true)` already loads only "important tags" (dimensions, tile size, georeferencing, GeoKeys) and defers everything else.
+
+## Goals
+
+1. **Low first-paint latency on huge COGs.** Opening a 200 GB COG should make ~one HTTP request, not tens of MB worth.
+2. **Bounded steady-state cost per tile.** After warmup, per-tile metadata reads should be effectively free (served from block cache).
+3. **Bounded memory.** The cache must evict; never grow without bound.
+4. **No header / tile cache crossover.** Tile data bytes must not pollute the header cache. Header bytes must not have to share space with tile data.
+
+## Solution
+
+Use chunkd's built-in `SourceChunk` + `SourceCache` middleware with a fixed 64 KiB block size and an LRU-ish cache. Drop all the bespoke read-ahead machinery from the previous design.
+
+```ts
+const source = new SourceHttp(url);
+source.metadata = { size: Infinity }; // #524 workaround
+const view = new SourceView(source, [
+  new SourceChunk({ size: 64 * 1024 }),
+  new SourceCache({ size: 8 * 1024 * 1024 }), // ~128 blocks
+]);
+
+const tiff = await Tiff.create(view, { signal });
+tiff.options = undefined; // disable leader-bytes path
+```
+
+### Why fixed 64 KiB blocks?
+
+- Matches geotiff.js's default. Proven in practice across the GeoTIFF ecosystem.
+- One block holds ~8000 BigTIFF tile-offset entries (8 bytes each) or ~16000 classic-TIFF entries (4 bytes each). A viewport's worth of adjacent tile lookups almost always hits a single cached block.
+- No tunable that has to be right per file. Pathological cases (huge metadata regions, far-offset probes) all degrade gracefully — they just cost more block fetches.
+
+### Why LRU eviction?
+
+The previous design's sequential cache *never evicted*. For long-running sessions or large files, that's a memory leak. `SourceCache` is a two-generation cache (cacheA flips to cacheB on overflow, cacheB drops) — not strict LRU but bounded and approximately recency-aware in practice.
+
+### Why disable cogeotiff's leader-bytes path?
+
+cogeotiff auto-detects the GDAL ghost option `BLOCK_LEADER=SIZE_AS_UINT4` at `Tiff.create()` time. If present, `TiffImage.getTileSize()` skips the `TileByteCounts` lookup and instead fetches 4 bytes just before the tile data. The comment in cogeotiff explains the intent: *"This fetch will generally load in the bytes needed for the image too provided the image size is less than the size of a chunk."* But that assumption breaks for tiles larger than the block size (very common — many COG tiles are 256×256×3 bytes ≈ 200 KB, well above 64 KiB). When it breaks, the result is:
+
+1. A 64 KiB chunk fetch near the tile, populated into the header cache, evicting metadata.
+2. The actual tile fetch via `dataSource` still has to fetch the whole tile.
+
+So the optimization actively hurts. Setting `tiff.options = undefined` after `Tiff.create()` removes it. `getTileSize` then always takes the explicit `TileOffsets` / `TileByteCounts` path, which goes through cogeotiff's lazy per-entry mechanism — served by our header cache, never touching tile data. cogeotiff core only reads `tiff.options` from this one location, so no other behavior is affected.
+
+### Why separate `dataSource` and `headerSource`?
+
+The split (already present in our `GeoTIFF.fromUrl`) keeps tile data out of the header cache:
+
+- `dataSource` = raw `SourceHttp` — used by [`packages/geotiff/src/fetch.ts`](../../packages/geotiff/src/fetch.ts) for tile data reads via `geotiff.dataSource.fetch(...)`. No caching, no chunking. Each tile is one HTTP range request.
+- `headerSource` = the wrapped `SourceView` — passed to `Tiff.create()`. All of cogeotiff's reads (IFD parsing, lazy tag fetches, lazy per-tile offset/bytecount entries) go through this. Block-cached.
+
+### Why drop the eager `TileOffsets`/`TileByteCounts` prefetch?
+
+`prefetchTags` currently bulk-fetches both arrays for the primary image. On a 200 GB COG with millions of tiles, that array alone is ~8 MB. The deferred approach lets cogeotiff lazy-fetch individual entries through the block cache; adjacent entries in a viewport hit one block.
+
+Other tags in `prefetchTags` stay — they're small and needed to decode tiles:
+
+- `SamplesPerPixel`, `BitsPerSample`, `SampleFormat`
+- `Photometric`, `Predictor`, `PlanarConfiguration`
+- `ColorMap` (for paletted)
+- `GdalNoData`, `GdalMetadata`
+- `LercParameters` (for LERC compression)
+
+These tag values are typically <10 KB total per IFD. Loading them at open lets us return a fully-formed `GeoTIFF` without per-tile latency for tag lookups.
+
+## What gets removed
+
+Compared to the unmerged PR [#509](https://github.com/developmentseed/deck.gl-raster/pull/509):
+
+- `packages/geotiff/src/source/readahead-cache.ts` (entire file — `SequentialBlockCache`, `SourceReadaheadCache`, `freeze()` lifecycle).
+- `packages/geotiff/src/source/concurrency.ts` (`mutex()` helper — no longer needed).
+- `packages/geotiff/src/source/` directory itself (becomes empty).
+- `Overview.ensureTagsLoaded()` bulk-prefetch path.
+- The `prefetch`, `multiplier`, `maxGap` options on `GeoTIFF.fromUrl`.
+
+Net code change versus current `main`: small. We're adding ~5 lines to `fromUrl`, dropping 2 lines from `prefetchTags`, and undoing the `[SourceChunk, SourceCache]` → `[SourceReadaheadCache]` replacement that PR #509 made. Nothing more.
+
+## API
+
+`GeoTIFF.fromUrl(url, options)` signature:
+
+```ts
+static async fromUrl(
+  url: string | URL,
+  options: {
+    /** AbortSignal for the header reads. */
+    signal?: AbortSignal;
+    /** Bytes per chunk for the header cache. Defaults to 64 KiB. */
+    chunkSize?: number;
+    /** Total cache size in bytes. Defaults to 8 MiB. */
+    cacheSize?: number;
+  } = {},
+): Promise<GeoTIFF>
+```
+
+`chunkSize` and `cacheSize` are kept exposed (vs. hidden) because the previous design exposed similar knobs and removing all of them is gratuitously breaking. Defaults are tuned for the typical case; users almost never need to touch them.
+
+## Tests
+
+- **Unit:** `prefetchTags` no longer fetches `TileOffsets` / `TileByteCounts` (add an assertion against the existing test that exercises this path; verify the returned `CachedTags` has `tileOffsets: undefined` / `tileByteCounts: undefined` or removes those fields).
+- **Integration:** open a fixture through `SourceFile` + the same `[SourceChunk, SourceCache]` stack used by `fromUrl`, verify it works end-to-end (read width/height/transform, fetch a tile).
+- **Integration:** open a fixture, then disable `tiff.options`; assert that `image.getTileSize(0)` takes the `TileOffsets`/`TileByteCounts` path. (Indirect: count underlying source fetches and verify the leader-bytes 4-byte read does not appear.)
+- **Regression:** `fromurl.test.ts` (the #524 workaround test) still passes after the option-shape change.
+
+## Out of scope
+
+- **Background pre-warming** of unvisited overviews. Easy to layer on later (call `image.fetch(TiffTag.TileOffsets)` from a `requestIdleCallback`).
+- **Custom block-cache middleware.** `SourceChunk` + `SourceCache` from `@chunkd/middleware` is sufficient. No reason to roll our own.
+- **Tunable cache replacement policy.** `SourceCache`'s two-generation eviction is good enough for now.
+
+## References
+
+- Reference implementation: [geotiff.js `BlockedSource`](https://github.com/geotiffjs/geotiff.js/blob/master/src/source/blockedsource.js)
+- cogeotiff `getTileSize`: [`tiff.image.ts:568-596`](https://github.com/blacha/cogeotiff/blob/c489ebab2136a779a705bf1dedebbc250e17a747/packages/core/src/tiff.image.ts#L568-L596)
+- cogeotiff `ImportantTags` (auto-loaded by `init(true)`): in `@cogeotiff/core/build/tiff.image.js:8-17`
+- Previous design (superseded): [`2026-05-05-geotiff-readahead-cache-design.md`](2026-05-05-geotiff-readahead-cache-design.md)

From cbce75fa841db7676be617a4752374a8fbf3e168 Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:27:11 -0400
Subject: [PATCH 02/12] refactor(geotiff): drop eager TileOffsets/TileByteCounts prefetch

cogeotiff lazily fetches individual entries from these arrays via the
header source on first access. With a block-aligned header cache (next
commit), adjacent per-tile lookups hit the same 64 KiB block.

The eager bulk fetch downloaded tens of MB on huge COGs (e.g. Vermont)
before any tile could render, all of which was wasted work when the
initial view was at an overview level that didn't use the primary
image's arrays.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/geotiff/src/ifd.ts | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/packages/geotiff/src/ifd.ts b/packages/geotiff/src/ifd.ts
index c15c4724..1d5776b7 100644
--- a/packages/geotiff/src/ifd.ts
+++ b/packages/geotiff/src/ifd.ts
@@ -18,8 +18,6 @@ export interface CachedTags {
   predictor: Predictor;
   sampleFormat: TiffTagType[TiffTag.SampleFormat];
   samplesPerPixel: TiffTagType[TiffTag.SamplesPerPixel];
-  tileByteCounts: TiffTagType[TiffTag.TileByteCounts] | null;
-  tileOffsets: TiffTagType[TiffTag.TileOffsets] | null;
 }
 
 /** Pre-fetch TIFF tags for easier visualization. */
@@ -48,8 +46,6 @@ export async function prefetchTags(
     predictor,
     sampleFormat,
     samplesPerPixel,
-    tileByteCounts,
-    tileOffsets,
   ] = await Promise.all([
     image.fetch(TiffTag.BitsPerSample, { signal }),
     image.fetch(TiffTag.ColorMap, { signal }),
@@ -64,11 +60,6 @@ export async function prefetchTags(
     image.fetch(TiffTag.Predictor, { signal }),
     image.fetch(TiffTag.SampleFormat, { signal }),
     image.fetch(TiffTag.SamplesPerPixel, { signal }),
-    // Pre-fetch tile offsets and byte counts. If we don't prefetch them,
-    // TiffImage.getTileSize will have to fetch them for each tile, which
-    // results in many redundant requests.
-    image.fetch(TiffTag.TileByteCounts, { signal }),
-    image.fetch(TiffTag.TileOffsets, { signal }),
   ]);
 
   const missingTag: (tagName: string) => never = (tagName: string) => {
@@ -108,8 +99,6 @@ export async function prefetchTags(
     // https://web.archive.org/web/20240329145340/https://www.awaresystems.be/imaging/tiff/tifftags/sampleformat.html
     sampleFormat: sampleFormat ?? [SampleFormat.Uint],
     samplesPerPixel,
-    tileByteCounts,
-    tileOffsets,
   };
 }
 

From 8930919377d15f0d9ad40eb1b19b420068d4c60a Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:28:06 -0400
Subject: [PATCH 03/12] feat(geotiff): disable cogeotiff leader-bytes path so header cache stays metadata-only

cogeotiff auto-detects GDAL ghost option BLOCK_LEADER=SIZE_AS_UINT4 at
Tiff.create() time. When set, TiffImage.getTileSize fetches 4 bytes near
the tile data instead of reading TileByteCounts. The intent is that the
fetch's chunk also contains the tile, but tiles are often larger than
the chunk size, so the optimization pollutes the header cache with
image-data chunks and evicts metadata. cogeotiff core only reads
tiff.options here; nulling it after creation is safe.
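
As a rough illustration of the size mismatch described above (a sketch only, not code from this patch; `blocksTouched` is a hypothetical helper and the offsets are made up), the block arithmetic can be checked in a few lines of TypeScript:

```typescript
// Hypothetical helper (not part of cogeotiff or chunkd): count how many
// fixed-size blocks the byte range [offset, offset + length) touches.
function blocksTouched(offset: number, length: number, blockSize: number): number {
  if (length <= 0) return 0;
  const first = Math.floor(offset / blockSize);
  const last = Math.floor((offset + length - 1) / blockSize);
  return last - first + 1;
}

const BLOCK_SIZE = 64 * 1024; // header-cache block size used in this series
const TILE_BYTES = 256 * 256 * 3; // uncompressed 256x256 RGB tile: 196,608 bytes

// A block-aligned tile of this size spans three blocks.
const alignedBlocks = blocksTouched(0, TILE_BYTES, BLOCK_SIZE);
// An unaligned one (illustrative offset) spills into a fourth.
const unalignedBlocks = blocksTouched(1000, TILE_BYTES, BLOCK_SIZE);
// The 4-byte leader read (illustrative offset) lands in a single block.
const leaderBlocks = blocksTouched(999996, 4, BLOCK_SIZE);

console.log({ alignedBlocks, unalignedBlocks, leaderBlocks });
```

Under these assumptions the single block pulled in by the leader read covers at most a third of such a tile, so the tile still has to be fetched in full from the data source.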

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/geotiff/src/geotiff.ts | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts
index 513681bc..e564de7d 100644
--- a/packages/geotiff/src/geotiff.ts
+++ b/packages/geotiff/src/geotiff.ts
@@ -111,6 +111,14 @@ export class GeoTIFF {
       defaultReadSize: prefetch,
       signal,
     });
+    // Disable cogeotiff's GDAL leader-bytes path so `TiffImage.getTileSize`
+    // always reads from TileOffsets/TileByteCounts through the header source.
+    // The leader-bytes optimization assumes a tile fits in one chunk, which
+    // breaks for typical 256x256x3 tiles (~200 KB) vs. our 64 KiB blocks.
+    // Without this, the leader read pulls image-data bytes into the header
+    // cache, evicting metadata. cogeotiff core only reads `tiff.options` in
+    // that one path, so nulling it here is safe.
+    tiff.options = undefined;
     return GeoTIFF.fromTiff(tiff, dataSource, { signal });
   }
 

From bf724c71d86c7bc156014c7b58f5b64d8e563e9d Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:29:20 -0400
Subject: [PATCH 04/12] feat(geotiff)!: 64 KiB block-aligned LRU header cache in fromUrl

Replaces the per-call prefetch tuning with a fixed-block cache matching
geotiff.js's BlockedSource. cogeotiff's lazy per-entry reads now hit a
shared 64 KiB block when adjacent (the typical case for tile-offset
arrays). LRU eviction keeps memory bounded at 8 MiB by default.

Breaking: drops the `prefetch` option on `GeoTIFF.fromUrl`. `prefetch`
remains available on `GeoTIFF.open` for direct callers that need to
control cogeotiff's defaultReadSize.
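
To make the "adjacent entries share a block" claim concrete, here is a small sketch (illustrative only, not code from this patch; `blockOfEntry`, the 1000-tile-wide grid, and the array starting at file offset 0 are all assumptions):

```typescript
const BLOCK_SIZE = 64 * 1024; // default chunkSize in this commit
const CACHE_SIZE = 8 * 1024 * 1024; // default cacheSize in this commit

const entriesPerBlockBigTiff = BLOCK_SIZE / 8; // 8-byte offsets
const entriesPerBlockClassic = BLOCK_SIZE / 4; // 4-byte offsets
const blocksInCache = CACHE_SIZE / BLOCK_SIZE;

// Hypothetical helper: index of the block holding entry `index` of a tag
// array that starts at `arrayOffset` in the file.
function blockOfEntry(arrayOffset: number, index: number, entryBytes: number): number {
  return Math.floor((arrayOffset + index * entryBytes) / BLOCK_SIZE);
}

// Illustrative viewport: 4x4 tiles at rows/cols 500..503 of a row-major
// grid that is 1000 tiles wide, BigTIFF entries, array at file offset 0.
const tilesAcross = 1000;
const blocks = new Set<number>();
for (let row = 500; row < 504; row++) {
  for (let col = 500; col < 504; col++) {
    blocks.add(blockOfEntry(0, row * tilesAcross + col, 8));
  }
}
const distinctBlocks = blocks.size;

console.log({ entriesPerBlockBigTiff, entriesPerBlockClassic, blocksInCache, distinctBlocks });
```

In this made-up layout all 16 offset lookups fall in one 64 KiB block; a viewport that straddles a block boundary would need two.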

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/geotiff/src/geotiff.ts | 43 ++++++++++++++-------------------
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts
index e564de7d..d0d71d13 100644
--- a/packages/geotiff/src/geotiff.ts
+++ b/packages/geotiff/src/geotiff.ts
@@ -237,25 +237,28 @@ export class GeoTIFF {
   /**
    * Create a new GeoTIFF from a URL.
    *
+   * Wraps the HTTP source with a fixed-size block-aligned LRU cache tuned for
+   * TIFF metadata. cogeotiff's lazy per-entry reads (for tile offsets, byte
+   * counts, and other tag values) are served by the block cache; adjacent
+   * entries within a single block hit one underlying request. Tile data reads
+   * bypass the cache and go straight to the raw HTTP source.
+   *
    * @param url The URL of the GeoTIFF to open.
-   * @param options Optional parameters for chunk size and cache size.
-   * @param options.chunkSize The minimum size for each request made to the source while reading header metadata. Defaults to 32KB.
-   * @param options.cacheSize The size of the cache for recently accessed header chunks. Currently no caching is applied to data fetches. Defaults to 1MB.
-   * @param options.prefetch Number of bytes to prefetch when reading TIFF tags and IFDs. Defaults to 32KB, which is enough for most tags and small IFDs. Increase if you have many tags or large IFDs.
+   * @param options Optional parameters.
+   * @param options.chunkSize Bytes per chunk for the header cache. Defaults to 64 KiB (matches geotiff.js's BlockedSource).
+   * @param options.cacheSize Total cache size in bytes. Defaults to 8 MiB (~128 blocks at the default chunk size).
    * @param options.signal An optional {@link AbortSignal} to cancel the header reads.
    * @returns A Promise that resolves to a GeoTIFF instance.
    */
   static async fromUrl(
     url: string | URL,
     {
-      chunkSize = 1024 * 1024,
-      cacheSize = 10 * 1024 * 1024,
-      prefetch = 32 * 1024,
+      chunkSize = 64 * 1024,
+      cacheSize = 8 * 1024 * 1024,
       signal,
     }: {
       chunkSize?: number;
       cacheSize?: number;
-      prefetch?: number;
       signal?: AbortSignal;
     } = {},
   ): Promise<GeoTIFF> {
@@ -269,33 +272,23 @@ export class GeoTIFF {
     // In a browser, `Content-Range` is only readable when the server lists it in
     // `Access-Control-Expose-Headers` (S3 does not by default), so the
     // `Content-Length` fallback — the length of a single *chunk*, not the file —
-    // gets recorded as the file size. `@chunkd/middleware`'s chunk layer then
-    // rejects any later read past that bogus size with
-    // "SourceError: Request outside of bounds".
+    // gets recorded as the file size. Reads past that bogus size would then be
+    // rejected as out-of-bounds.
     //
     // Seed `metadata` ourselves so `SourceHttp` never records a size (it only
     // fills in `metadata` while it is still null), treating the source as having
     // unbounded length. Remove once the upstream fix lands.
     source.metadata = { size: Number.POSITIVE_INFINITY };
 
-    // Figure out optimal defaults in light of
-    // https://github.com/blacha/cogeotiff/issues/1431
-    // Defaulting to 32KB chunks is too small for tile data.
-    // https://github.com/developmentseed/deck.gl-raster/issues/294
-
-    // read files in chunks
-    const chunk = new SourceChunk({ size: chunkSize });
-    // 10MB cache for recently accessed chunks
-    const cache = new SourceCache({ size: cacheSize });
-
-    const view = new SourceView(source, [chunk, cache]);
+    const view = new SourceView(source, [
+      new SourceChunk({ size: chunkSize }),
+      new SourceCache({ size: cacheSize }),
+    ]);
 
     return await GeoTIFF.open({
-      // Use raw source for tile data to avoid unnecessary copying through the
-      // cache and chunk layers.
+      // Tile data reads bypass the header cache (raw source).
       dataSource: source,
       headerSource: view,
-      prefetch,
       signal,
     });
   }

From 5a676deb0abf6375751d75be1a74ebe7cc50eae3 Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:30:12 -0400
Subject: [PATCH 05/12] test(geotiff): integration test for block-aligned header cache

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/geotiff/tests/block-cache.test.ts | 86 ++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 packages/geotiff/tests/block-cache.test.ts

diff --git a/packages/geotiff/tests/block-cache.test.ts b/packages/geotiff/tests/block-cache.test.ts
new file mode 100644
index 00000000..a9adb529
--- /dev/null
+++ b/packages/geotiff/tests/block-cache.test.ts
@@ -0,0 +1,86 @@
+import { SourceCache, SourceChunk } from "@chunkd/middleware";
+import type { Source } from "@chunkd/source";
+import { SourceView } from "@chunkd/source";
+import { SourceFile } from "@chunkd/source-file";
+import { describe, expect, it } from "vitest";
+import { GeoTIFF } from "../src/geotiff.js";
+import { fixturePath } from "./helpers.js";
+
+/** Wrap a Source to record every underlying fetch (offset + length). */
+function instrument(source: Source): {
+  source: Source;
+  fetches: () => Array<{ offset: number; length: number | undefined }>;
+} {
+  const log: Array<{ offset: number; length: number | undefined }> = [];
+  const wrapped: Source = {
+    type: source.type,
+    url: source.url,
+    metadata: source.metadata,
+    head: source.head.bind(source),
+    fetch: async (offset, length, options) => {
+      log.push({ offset, length });
+      return source.fetch(offset, length, options);
+    },
+  };
+  return { source: wrapped, fetches: () => log };
+}
+
+describe("block-aligned header cache", () => {
+  const path = fixturePath("uint8_rgb_deflate_block64_cog", "rasterio");
+
+  it("opens a fixture through SourceChunk + SourceCache", async () => {
+    const file = new SourceFile(path);
+    const { source, fetches } = instrument(file);
+    const view = new SourceView(source, [
+      new SourceChunk({ size: 64 * 1024 }),
+      new SourceCache({ size: 8 * 1024 * 1024 }),
+    ]);
+
+    const tiff = await GeoTIFF.open({
+      dataSource: file,
+      headerSource: view,
+    });
+
+    expect(tiff.width).toBeGreaterThan(0);
+    expect(tiff.height).toBeGreaterThan(0);
+    expect(fetches().length).toBeGreaterThan(0);
+
+    // Every underlying fetch must be aligned to 64 KiB boundaries because
+    // SourceChunk pads requests up to chunkSize.
+    for (const { offset, length } of fetches()) {
+      expect(offset % (64 * 1024)).toBe(0);
+      if (length !== undefined) {
+        expect(length).toBeLessThanOrEqual(64 * 1024);
+      }
+    }
+  });
+
+  it("does not pull image-data bytes through the header cache after open", async () => {
+    // Tiff.create() in GeoTIFF.open disables `tiff.options`, so getTileSize
+    // takes the explicit TileOffsets/TileByteCounts path — not the
+    // leader-bytes path that would fetch 4 bytes adjacent to image data.
+    const file = new SourceFile(path);
+    const { source, fetches } = instrument(file);
+    const view = new SourceView(source, [
+      new SourceChunk({ size: 64 * 1024 }),
+      new SourceCache({ size: 8 * 1024 * 1024 }),
+    ]);
+
+    const tiff = await GeoTIFF.open({
+      dataSource: file,
+      headerSource: view,
+    });
+
+    expect(tiff.tiff.options).toBeUndefined();
+
+    const fetchesAfterOpen = fetches().length;
+
+    // Trigger a tile metadata lookup through cogeotiff's lazy path.
+    await tiff.image.getTileSize(0);
+
+    // At most 2 new underlying chunk fetches (TileOffsets + TileByteCounts
+    // blocks); often 0 if the relevant chunks are already cached from open.
+    const newFetches = fetches().slice(fetchesAfterOpen);
+    expect(newFetches.length).toBeLessThanOrEqual(2);
+  });
+});

From 9dd030d757b0e441f5eb9923e977195b89e846f6 Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 13:44:02 -0400
Subject: [PATCH 06/12] refactor(geotiff): rename source -> dataSource in tile-fetch path; explicit one-block prefetch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses PR review feedback on #529:

- Rename the `source` parameter on the vendored getTile/getBytes helpers
  to `dataSource`. Functionally unchanged — every caller already passes
  `self.dataSource` — but the new name makes it impossible to confuse
  with the header source that cogeotiff uses internally for the
  TileOffsets/TileByteCounts lookups.
- Expand the doc comments on those helpers to explain the header-vs-data
  split explicitly.
- Pass `prefetch: chunkSize` from `GeoTIFF.fromUrl` to `GeoTIFF.open`, so
  the very first cogeotiff read is exactly one block. SourceChunk would
  pad it anyway, but being explicit keeps the intent local.
- Update the spec to clarify how per-tile offset/bytecount lookups work
  and note that we explicitly fetch a full block on the first read
  (cogeotiff's DefaultReadSize is 16 KiB).
- Add a TODO referencing the upstream issue (to be filed) tracking a
  cleaner opt-out for cogeotiff's GDAL leader-bytes path.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../2026-05-12-cog-block-cache-design.md | 4 +-
 packages/geotiff/src/fetch.ts | 47 ++++++++++---------
 packages/geotiff/src/geotiff.ts | 7 +++
 3 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/dev-docs/specs/2026-05-12-cog-block-cache-design.md b/dev-docs/specs/2026-05-12-cog-block-cache-design.md
index 1ee06ca4..b69b762c 100644
--- a/dev-docs/specs/2026-05-12-cog-block-cache-design.md
+++ b/dev-docs/specs/2026-05-12-cog-block-cache-design.md
@@ -8,7 +8,9 @@
 
 The original design (the sequential exponential read-ahead cache, then a frozen-after-open variant) tried to optimize *steady-state* tile rendering by bulk-loading `TileOffsets` / `TileByteCounts` arrays for each IFD. That moved the cost to *open time*. On a real 200 GB Vermont COG, that's tens of MB downloaded before any tile renders — even though the initial view is at an overview level whose primary-image arrays are never used.
 
-geotiff.js takes the opposite approach. Each `fromUrl` call fetches just 1024 bytes (header + first IFD pointer). `getImage(i)` reads only that IFD's entries; tile-array values are wrapped in a `DeferredArray` that holds only their file offset + count. Per-tile reads fetch a single 4–8 byte entry through `BlockedSource`, a fixed-block LRU that coalesces adjacent entries into one block. The block cache lives inside the source layer; cogeotiff's lazy per-entry reads benefit from it automatically.
+geotiff.js takes the opposite approach. Each `fromUrl` call requests just 1024 bytes (header + first IFD pointer), but its `BlockedSource` pads this up to one 64 KiB block — so the first underlying HTTP request is a full block, not 1024 bytes. `getImage(i)` reads only that IFD's entries; tile-array values are wrapped in a `DeferredArray` that holds only their file offset + count. When a tile is requested, `DeferredArray.get(index)` fetches the specific 4–8 byte offset entry (and separately the byte-count entry) through the same block-aligned `BlockedSource` — adjacent entries from the same array all live in the same 64 KiB block, so per-tile lookups cost ~0 HTTP requests after the first one in a region. The block cache lives inside the source layer; cogeotiff's lazy per-entry reads benefit from it automatically.
+
+Our implementation uses the same shape: cogeotiff's `image.getTileSize(idx)` reads `TileOffsets[idx]` and `TileByteCounts[idx]` lazily through the wrapped header source, and our chunk cache turns those 4–8 byte reads into shared 64 KiB block fetches. We explicitly pass `prefetch: chunkSize` to `Tiff.create` so the very first byte read is a full block too (cogeotiff's default `DefaultReadSize` is only 16 KiB).
 
 That design lines up with how cogeotiff was built to be used — `image.init(true)` already loads only "important tags" (dimensions, tile size, georeferencing, GeoKeys) and defers everything else.
 
diff --git a/packages/geotiff/src/fetch.ts b/packages/geotiff/src/fetch.ts
index 60fb2dab..db378ae5 100644
--- a/packages/geotiff/src/fetch.ts
+++ b/packages/geotiff/src/fetch.ts
@@ -298,23 +298,24 @@ async function fetchBandSeparateTileBytes(
 }
 
 /**
- * Load a tile into a ArrayBuffer
+ * Load a tile into an ArrayBuffer.
  *
- * if the tile compression is JPEG, This will also apply the JPEG compression tables to the resulting ArrayBuffer see {@link getJpegHeader}
+ * If the tile compression is JPEG, this will also apply the JPEG compression
+ * tables to the resulting ArrayBuffer (see `image.getJpegHeader`).
  *
- * Though this function lives upstream in @cogeotiff/core, we vendor it here so
- * that we can use a custom fetch.
- *
- * This is to separate the source used for fetching header/IFD data (which is
- * typically small and benefits from caching) from the source used for fetching
- * tile data (which can be large and should avoid unnecessary copying through
- * cache layers).
+ * Though this function lives upstream in @cogeotiff/core, we vendor it here
+ * so we can route the *tile data* read through a separate source. The tile's
+ * byte range is looked up via `image.getTileSize(idx)`, which inside cogeotiff
+ * uses the source that was passed to `Tiff.create` (our header source — cached
+ * for small repeated reads). The actual tile bytes are then fetched from
+ * `dataSource`, which is the raw HTTP source with no caching: tile data is
+ * large and read once, so caching it would just evict header metadata.
  */
 async function getTile(
   image: TiffImage,
   x: number,
   y: number,
-  source: Pick,
+  dataSource: Pick,
   options?: { signal?: AbortSignal },
 ): Promise<{
   bytes: ArrayBuffer;
@@ -344,26 +345,30 @@ async function getTile(
     );
   }
 
+  // image.getTileSize() reads TileOffsets[idx] and TileByteCounts[idx] from
+  // the header source (cogeotiff's lazy per-entry path, served by the chunk
+  // cache). It does NOT read tile data — only the 4–8 byte offset/count
+  // entries.
   const { offset, imageSize } = await image.getTileSize(idx);
 
-  return getBytes(image, offset, imageSize, source, options);
+  // The actual tile bytes go through dataSource (uncached HTTP).
+  return getBytes(image, offset, imageSize, dataSource, options);
 }
 
-/** Read image bytes at the given offset.
- *
- * Though this function lives upstream in @cogeotiff/core, we vendor it here so
- * that we can use a custom fetch.
+/**
+ * Read image bytes at the given offset from `dataSource`.
  *
- * This is to separate the source used for fetching header/IFD data (which is
- * typically small and benefits from caching) from the source used for fetching
- * tile data (which can be large and should avoid unnecessary copying through
- * cache layers).
+ * Though this function lives upstream in @cogeotiff/core, we vendor it here
+ * so we can route reads through the data source (uncached) rather than the
+ * header source (cached) that cogeotiff would use by default. Tile data is
+ * large and read once; caching it would evict header metadata and inflate
+ * memory.
  */
 async function getBytes(
   image: TiffImage,
   offset: number,
   byteCount: number,
-  source: Pick,
+  dataSource: Pick,
   options?: { signal?: AbortSignal },
 ): Promise<{
   bytes: ArrayBuffer;
@@ -373,7 +378,7 @@ async function getBytes(
     return null;
   }
 
-  const bytes = await source.fetch(offset, byteCount, options);
+  const bytes = await dataSource.fetch(offset, byteCount, options);
   if (bytes.byteLength < byteCount) {
     throw new Error(
       `Failed to fetch bytes from offset:${offset} wanted:${byteCount} got:${bytes.byteLength}`,
diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts
index d0d71d13..bbe2e9ad 100644
--- a/packages/geotiff/src/geotiff.ts
+++ b/packages/geotiff/src/geotiff.ts
@@ -118,6 +118,9 @@ export class GeoTIFF {
     // Without this, the leader read pulls image-data bytes into the header
     // cache, evicting metadata. cogeotiff core only reads `tiff.options` in
     // that one path, so nulling it here is safe.
+    //
+    // TODO: replace this with a cleaner opt-out once upstream supports one
+    // (see https://github.com/blacha/cogeotiff/issues/ — TBD).
     tiff.options = undefined;
     return GeoTIFF.fromTiff(tiff, dataSource, { signal });
   }
@@ -289,6 +292,10 @@ export class GeoTIFF {
       // Tile data reads bypass the header cache (raw source).
       dataSource: source,
       headerSource: view,
+      // Read a full block on cogeotiff's first byte fetch. SourceChunk would
+      // pad a smaller request up to chunkSize anyway, but being explicit keeps
+      // the intent local and survives middleware changes.
+      prefetch: chunkSize,
       signal,
     });
   }

From 733efe63639d0b9b67ae7239127484665f63c510 Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Tue, 12 May 2026 14:10:57 -0400
Subject: [PATCH 07/12] feat(geotiff): opt-in debug option to log dataSource fetches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds `debug?: boolean` (off by default) to `GeoTIFF.open` and
`GeoTIFF.fromUrl`. When enabled, the tile-fetch path logs each
`dataSource.fetch` call to the console with a `data`/`mask` label,
offset, and length. Useful for diagnosing per-request behavior against
the browser network panel — e.g. surfacing the tiny mask-tile requests
that motivated the option in the first place.

Threaded through `HasTiffReference` so both `GeoTIFF` and `Overview`
paths log.

A future change can coalesce adjacent (data, mask) tile pairs into a
single range request; until then, this option is the easiest way to
observe the current behavior.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 packages/geotiff/src/fetch.ts | 38 +++++++++++++++++++++++++++-----
 packages/geotiff/src/geotiff.ts | 33 +++++++++++++++++++++++----
 packages/geotiff/src/overview.ts | 5 +++++
 3 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/packages/geotiff/src/fetch.ts b/packages/geotiff/src/fetch.ts
index db378ae5..bfe1409c 100644
--- a/packages/geotiff/src/fetch.ts
+++ b/packages/geotiff/src/fetch.ts
@@ -34,6 +34,9 @@ interface HasTiffReference extends HasTransform {
 
   /** The nodata value for the image, if any. */
   readonly nodata: number | null;
+
+  /** When true, the tile-fetch path logs each dataSource fetch to the console. */
+  readonly debug: boolean;
 }
 
 export async function fetchTile(
@@ -53,7 +56,10 @@ export async function fetchTile(
   const tileFetch = fetchCogBytes(self, x, y, { signal });
   const maskFetch =
     self.maskImage != null
-      ? getTile(self.maskImage, x, y, self.dataSource, { signal })
+      ? getTile(self.maskImage, x, y, self.dataSource, {
+          signal,
+          debug: self.debug ? { label: "mask" } : undefined,
+        })
       : Promise.resolve(null);
 
   const [tileBytes, maskBytes] = await Promise.all([tileFetch, maskFetch]);
@@ -149,6 +155,13 @@ export async function fetchTiles(
 type GetBytesResponse = { bytes: ArrayBuffer; compression: Compression };
 type ByteRange = Awaited>;
 
+/**
+ * Opt-in debug tag for {@link getTile} / {@link getBytes}. When present,
+ * each underlying `dataSource.fetch` call is logged to the console with the
+ * tag's `label`, the offset, and the byte count. When absent, no logging.
+ */
+type DebugTag = { label: string };
+
 async function decodeMask(
   mask: GetBytesResponse,
   maskImage: TiffImage,
@@ -237,9 +250,15 @@ async function fetchCogBytes(
     signal?: AbortSignal;
   } = {},
 ): Promise {
+  const debug: DebugTag | undefined = self.debug
+    ? { label: "data" }
+    : undefined;
   switch (self.cachedTags.planarConfiguration) {
     case PlanarConfiguration.Contig: {
-      const tile = await getTile(self.image, x, y, self.dataSource, { signal });
+      const tile = await getTile(self.image, x, y, self.dataSource, {
+        signal,
+        debug,
+      });
       if (tile === null) {
         throw new Error(`Tile at (${x}, ${y}) not found`);
       }
@@ -280,6 +299,9 @@ async function fetchBandSeparateTileBytes(
     signal?: AbortSignal;
   } = {},
 ): Promise {
+  const debug: DebugTag | undefined = self.debug
+    ? { label: "data" }
+    : undefined;
   const byteRanges = await findBandSeparateTileByteRanges(self, x, y);
   const buffers = byteRanges.map(async ({ offset, imageSize }) => {
     const tile = await getBytes(
@@ -287,7 +309,7 @@ async function fetchBandSeparateTileBytes(
       offset,
       imageSize,
       self.dataSource,
-      { signal },
+      { signal, debug },
     );
     if (tile === null) {
       throw new Error(`Tile at (${x}, ${y}) not found`);
@@ -316,7 +338,7 @@ async function getTile(
   x: number,
   y: number,
   dataSource: Pick,
-  options?: { signal?: AbortSignal },
+  options?: { signal?: AbortSignal; debug?: DebugTag },
 ): Promise<{
   bytes: ArrayBuffer;
   compression: Compression;
@@ -369,7 +391,7 @@ async function getBytes(
   offset: number,
   byteCount: number,
   dataSource: Pick,
-  options?: { signal?: AbortSignal },
+  options?: { signal?: AbortSignal; debug?: DebugTag },
 ): Promise<{
   bytes: ArrayBuffer;
   compression: Compression;
@@ -378,6 +400,12 @@ async function getBytes(
     return null;
   }
 
+  if (options?.debug !== undefined) {
+    console.log(
+      `[geotiff dataSource] ${options.debug.label}: offset=${offset} length=${byteCount}`,
+    );
+  }
+
   const bytes = await dataSource.fetch(offset, byteCount, options);
   if (bytes.byteLength < byteCount) {
     throw new Error(
diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts
index bbe2e9ad..cd2b2f2b 100644
--- a/packages/geotiff/src/geotiff.ts
+++ b/packages/geotiff/src/geotiff.ts
@@ -70,6 +70,15 @@ export class GeoTIFF {
   /** Parsed GDALMetadata tag, if present. */
   readonly gdalMetadata: GDALMetadata | null;
 
+  /**
+   * When true, log each {@link dataSource} fetch (image tile data and mask
+   * tile data) to the console with offset/length and a `data`/`mask` label.
+   * Useful for diagnosing per-request behavior against the browser network
+   * panel. Enable via the `debug` option on {@link GeoTIFF.open} or
+   * {@link GeoTIFF.fromUrl}.
+ */ + readonly debug: boolean; + private constructor( tiff: Tiff, image: TiffImage, @@ -79,6 +88,7 @@ export class GeoTIFF { cachedTags: CachedTags, dataSource: Pick, gdalMetadata: GDALMetadata | null, + debug: boolean, ) { this.tiff = tiff; this.image = image; @@ -88,6 +98,7 @@ export class GeoTIFF { this.cachedTags = cachedTags; this.dataSource = dataSource; this.gdalMetadata = gdalMetadata; + this.debug = debug; } /** @@ -99,14 +110,22 @@ export class GeoTIFF { * @param options.headerSource The source used to construct the TIFF. This is typically a layered source with caching and chunking, to optimise access to TIFF tags and IFDs. * @param options.prefetch Number of bytes to prefetch when reading TIFF tags and IFDs. Defaults to 32KB, which is enough for most tags and small IFDs. Increase if you have many tags or large IFDs. * @param options.signal An optional {@link AbortSignal} to cancel the header reads. + * @param options.debug When true, the returned GeoTIFF logs each tile/mask data fetch to the console. Off by default. */ static async open(options: { dataSource: Pick; headerSource: Source; prefetch?: number; signal?: AbortSignal; + debug?: boolean; }): Promise { - const { dataSource, headerSource, prefetch = 32 * 1024, signal } = options; + const { + dataSource, + headerSource, + prefetch = 32 * 1024, + signal, + debug, + } = options; const tiff = await Tiff.create(headerSource, { defaultReadSize: prefetch, signal, @@ -122,7 +141,7 @@ export class GeoTIFF { // TODO: replace this with a cleaner opt-out once upstream supports one // (see https://github.com/blacha/cogeotiff/issues/ — TBD). tiff.options = undefined; - return GeoTIFF.fromTiff(tiff, dataSource, { signal }); + return GeoTIFF.fromTiff(tiff, dataSource, { signal, debug }); } /** @@ -133,13 +152,14 @@ export class GeoTIFF { * * @param dataSource A source for fetching tile data. This is separate from the source used to construct the TIFF to allow for separate caching implementations. 
* @param options.signal An optional {@link AbortSignal} to cancel header tag reads. + * @param options.debug When true, the returned GeoTIFF logs each tile/mask data fetch to the console. */ static async fromTiff( tiff: Tiff, dataSource: Pick, - options: { signal?: AbortSignal } = {}, + options: { signal?: AbortSignal; debug?: boolean } = {}, ): Promise { - const { signal } = options; + const { signal, debug = false } = options; const images = tiff.images; if (images.length === 0) { throw new Error("TIFF does not contain any IFDs"); @@ -198,6 +218,7 @@ export class GeoTIFF { cachedTags, dataSource, gdalMetadata, + debug, ); const overviews: Overview[] = dataEntries.map(([key, dataImage]) => { @@ -251,6 +272,7 @@ export class GeoTIFF { * @param options.chunkSize Bytes per chunk for the header cache. Defaults to 64 KiB (matches geotiff.js's BlockedSource). * @param options.cacheSize Total cache size in bytes. Defaults to 8 MiB (~128 blocks at the default chunk size). * @param options.signal An optional {@link AbortSignal} to cancel the header reads. + * @param options.debug When true, the returned GeoTIFF logs each tile/mask data fetch to the console with offset/length and a `data`/`mask` label. Off by default. * @returns A Promise that resolves to a GeoTIFF instance. */ static async fromUrl( @@ -259,10 +281,12 @@ export class GeoTIFF { chunkSize = 64 * 1024, cacheSize = 8 * 1024 * 1024, signal, + debug, }: { chunkSize?: number; cacheSize?: number; signal?: AbortSignal; + debug?: boolean; } = {}, ): Promise { const source = new SourceHttp(url, {}); @@ -297,6 +321,7 @@ export class GeoTIFF { // the intent local and survives middleware changes. 
prefetch: chunkSize, signal, + debug, }); } diff --git a/packages/geotiff/src/overview.ts b/packages/geotiff/src/overview.ts index 195d9229..17740b57 100644 --- a/packages/geotiff/src/overview.ts +++ b/packages/geotiff/src/overview.ts @@ -70,6 +70,11 @@ export class Overview { return this.geotiff.nodata; } + /** Inherits the {@link GeoTIFF.debug} flag from the parent. */ + get debug(): boolean { + return this.geotiff.debug; + } + /** The number of tiles in the x and y directions */ get tileCount(): TiffImageTileCount { return this.image.tileCount; From 1635159710c4d4eb298e2bcb0f7dc73722115789 Mon Sep 17 00:00:00 2001 From: Kyle Barron Date: Tue, 12 May 2026 15:20:31 -0400 Subject: [PATCH 08/12] update issue link --- packages/geotiff/src/geotiff.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts index cd2b2f2b..cc51618a 100644 --- a/packages/geotiff/src/geotiff.ts +++ b/packages/geotiff/src/geotiff.ts @@ -139,7 +139,7 @@ export class GeoTIFF { // that one path, so nulling it here is safe. // // TODO: replace this with a cleaner opt-out once upstream supports one - // (see https://github.com/blacha/cogeotiff/issues/ — TBD). 
+ // https://github.com/blacha/cogeotiff/issues/1467 tiff.options = undefined; return GeoTIFF.fromTiff(tiff, dataSource, { signal, debug }); } From 6007dfcd9af72f090311e9c1ca4e263a95accd6c Mon Sep 17 00:00:00 2001 From: Kyle Barron Date: Tue, 12 May 2026 15:21:27 -0400 Subject: [PATCH 09/12] clean up vt example --- examples/vermont-cog-comparison/src/App.tsx | 13 +--- .../vermont-cog-comparison/src/vt-imagery.ts | 68 ------------------- 2 files changed, 1 insertion(+), 80 deletions(-) diff --git a/examples/vermont-cog-comparison/src/App.tsx b/examples/vermont-cog-comparison/src/App.tsx index a3886752..bbe79645 100644 --- a/examples/vermont-cog-comparison/src/App.tsx +++ b/examples/vermont-cog-comparison/src/App.tsx @@ -334,19 +334,8 @@ export default function App() { inFlightRef.current.add(url); void (async () => { try { - // Pad each tunable to (the file's known header size) OR a - // generic 16 MB default for files that haven't been measured. - // Vermont COGs scale wildly (3-band 30 cm = 60 MB header, - // 1-band yearly = ~3 MB), so a per-file value is a big win. - // - prefetch sizes the initial Tiff read, - // - chunkSize >= prefetch so the read fits in one source chunk - // (otherwise SourceChunk splits it into chunkSize-aligned pieces). - // - cacheSize >= chunkSize to actually retain the header chunk. - const headerBytes = file.headerByteLength ?? 
16 * 1024 * 1024; const gt = await GeoTIFF.fromUrl(url, { - chunkSize: headerBytes, - cacheSize: Math.max(headerBytes, 16 * 1024 * 1024), - prefetch: headerBytes, + cacheSize: 64 * 1024 * 1024, }); setGeotiffs((prev) => new Map(prev).set(url, gt)); } catch (err) { diff --git a/examples/vermont-cog-comparison/src/vt-imagery.ts b/examples/vermont-cog-comparison/src/vt-imagery.ts index 7eab4644..9d394b18 100644 --- a/examples/vermont-cog-comparison/src/vt-imagery.ts +++ b/examples/vermont-cog-comparison/src/vt-imagery.ts @@ -29,15 +29,6 @@ export type VTFile = { bands: BandCount; /** Group identifier used for `` separation in the UI. */ category: FileCategory; - /** - * Total bytes occupied by the TIFF header (= offset where tile data begins). - * - * When known, lets us request the exact range in a single HTTP call when - * opening the COG. When `undefined` the loader falls back to a generic - * default. Measure with `GeoTIFF.getHeaderByteLength()` and add to - * {@link HEADER_BYTE_LENGTHS} below. - */ - headerByteLength?: number; }; /** @@ -99,64 +90,6 @@ const YEARLY_FILENAMES = [ "1994_50cm_LeafOFF_1Band.tif", ] as const; -/** - * Measured header byte lengths for files we've opened. Pulled from the - * `GeoTIFF.getHeaderByteLength()` log; see [README](../README.md). Add - * an entry whenever a new file's header length is observed so the loader - * can fetch the exact range in one request. - * - * Keyed by VTFile.id (filename without `.tif`). 
- */ -const HEADER_BYTE_LENGTHS: Record = { - STATEWIDE_2025_30cm_LeafON_3Band: 60_998_796, - STATEWIDE_2024_30cm_LeafOFF_4Band: 27_985_606, - STATEWIDE_2023_30cm_LeafON_4Band: 29_757_726, - "STATEWIDE_2021-2022_30cm_LeafOFF_4Band": 27_985_604, - STATEWIDE_2021_60cm_LeafON_4Band: 7_445_474, - "STATEWIDE_2011-2015_50cm_LeafOFF_4Band": 10_014_005, - "STATEWIDE_2006-2010_50cm_LeafOFF_1Band": 10_012_861, - "STATEWIDE_1994-2000_50cm_LeafOFF_1Band": 10_012_861, - "STATEWIDE_1974-1992_100cm_LeafOFF_1Band": 3_507_513, - "2023_15cm_LeafOFF_4Band": 73_284_297, - "2022_30cm_LeafOFF_4Band": 18_243_651, - "2021_30cm_LeafOFF_4Band": 5_955_273, - "2020_15cm_LeafOFF_4Band": 10_516_770, - "2019_30cm_LeafOFF_4Band": 26_110_689, - "2019_15cm_LeafOFF_4Band": 28_574_141, - "2018_30cm_LeafOFF_4Band": 12_708_843, - "2018_15cm_LeafOFF_4Band": 41_931_939, - "2017_30cm_LeafOFF_4Band": 3_127_309, - "2017_15cm_LeafOFF_4Band": 4_943_211, - "2016-2019_30cm_LeafOFF_4Band": 27_985_838, - "2016_30cm_LeafOFF_4Band": 4_399_707, - "2016_15cm_LeafOFF_4Band": 14_061_893, - "2015_50cm_LeafOFF_4Band": 1_124_768, - "2014_50cm_LeafOFF_4Band": 1_657_062, - "2013_50cm_LeafOFF_4Band": 1_492_374, - "2013_30cm_LeafOFF_4Band": 563_878, - "2013_20cm_LeafOFF_4Band": 934_966, - "2013_15cm_LeafOFF_4Band": 5_870_274, - "2012_50cm_LeafOFF_4Band": 2_283_359, - "2011_50cm_LeafOFF_4Band": 1_539_159, - "2010_50cm_LeafOFF_1Band": 1_122_683, - "2009_50cm_LeafOFF_1Band": 753_547, - "2009_30cm_LeafOFF_3Band": 477_338, - "2008_50cm_LeafOFF_1Band": 1_107_847, - "2008_30cm_LeafON_3Band": 7_464_402, - "2007_50cm_LeafOFF_1Band": 1_210_867, - "2006_50cm_LeafOFF_1Band": 3_188_535, - "2006_15cm_LeafOFF_3Band": 544_454, - "2004_16cm_LeafOFF_3Band": 5_226_902, - "2001_15cm_LeafOFF_3Band": 100_696, - "2000_50cm_LeafOFF_1Band": 1_124_491, - "1999_50cm_LeafOFF_1Band": 3_137_297, - "1998_50cm_LeafOFF_1Band": 665_395, - "1998_13cm_LeafOFF_1Band": 35_737, - "1996_50cm_LeafOFF_1Band": 749_887, - "1995_50cm_LeafOFF_1Band": 2_537_357, - 
"1994_50cm_LeafOFF_1Band": 1_538_035, -}; - const FILENAME_PATTERN = /^(STATEWIDE_)?(\d{4}(?:-\d{4})?)_(\d+)cm_Leaf(OFF|ON)_(\d)Band\.tif$/; @@ -182,7 +115,6 @@ function parseVTFilename(filename: string, category: FileCategory): VTFile { url: `${BASE_URL}/${filename}`, bands: bandCount, category, - headerByteLength: HEADER_BYTE_LENGTHS[id], }; } From 4013a348dcba8b5f01ea85d07caa7237acc2309b Mon Sep 17 00:00:00 2001 From: Kyle Barron Date: Tue, 12 May 2026 15:35:52 -0400 Subject: [PATCH 10/12] refactor(geotiff)!: drop redundant prefetch option from GeoTIFF.open MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the typical fromUrl path, prefetch was just coupled to chunkSize via SourceChunk padding — small requests get padded up to one block, large ones fetch multiple blocks. The option added no behavior over what the chunking middleware already provides, and exposed a knob that callers almost never need to tune. Direct GeoTIFF.open callers who want a specific initial fetch size can compose a SourceChunk of the desired block size into their headerSource; the option is the right tool for that job, not a separate dial on open. cogeotiff's default DefaultReadSize (16 KiB) is now used for the very first read; SourceChunk pads it to chunkSize transparently. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-05-12-cog-block-cache-design.md | 2 +- packages/geotiff/src/geotiff.ts | 21 ++++++------------- 2 files changed, 7 insertions(+), 16 deletions(-) diff --git a/dev-docs/specs/2026-05-12-cog-block-cache-design.md b/dev-docs/specs/2026-05-12-cog-block-cache-design.md index b69b762c..02517377 100644 --- a/dev-docs/specs/2026-05-12-cog-block-cache-design.md +++ b/dev-docs/specs/2026-05-12-cog-block-cache-design.md @@ -10,7 +10,7 @@ The original design (the sequential exponential read-ahead cache, then a frozen- geotiff.js takes the opposite approach. 
Each `fromUrl` call requests just 1024 bytes (header + first IFD pointer), but its `BlockedSource` pads this up to one 64 KiB block — so the first underlying HTTP request is a full block, not 1024 bytes. `getImage(i)` reads only that IFD's entries; tile-array values are wrapped in a `DeferredArray` that holds only their file offset + count. When a tile is requested, `DeferredArray.get(index)` fetches the specific 4–8 byte offset entry (and separately the byte-count entry) through the same block-aligned `BlockedSource` — adjacent entries from the same array all live in the same 64 KiB block, so per-tile lookups cost ~0 HTTP requests after the first one in a region. The block cache lives inside the source layer; cogeotiff's lazy per-entry reads benefit from it automatically. -Our implementation uses the same shape: cogeotiff's `image.getTileSize(idx)` reads `TileOffsets[idx]` and `TileByteCounts[idx]` lazily through the wrapped header source, and our chunk cache turns those 4–8 byte reads into shared 64 KiB block fetches. We explicitly pass `prefetch: chunkSize` to `Tiff.create` so the very first byte read is a full block too (cogeotiff's default `DefaultReadSize` is only 16 KiB). +Our implementation uses the same shape: cogeotiff's `image.getTileSize(idx)` reads `TileOffsets[idx]` and `TileByteCounts[idx]` lazily through the wrapped header source, and our chunk cache turns those 4–8 byte reads into shared 64 KiB block fetches. cogeotiff's first byte read uses its default `DefaultReadSize` (16 KiB); `SourceChunk` pads it up to the chunk size, so the actual wire request is one block. That design lines up with how cogeotiff was built to be used — `image.init(true)` already loads only "important tags" (dimensions, tile size, georeferencing, GeoKeys) and defers everything else. 
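Reviewer note on the spec paragraph above: the claim that adjacent `TileOffsets` entries share one 64 KiB block can be checked with a little arithmetic. The sketch below is illustrative only. `blocksTouched` and the `arrayStart` value are hypothetical names and numbers, not part of chunkd or cogeotiff, and the offset is not measured from any Vermont file.

```typescript
// Block size used by the header cache in this patch series (64 KiB).
const BLOCK_SIZE = 64 * 1024;

/** Index of the fixed-size block that contains `byteOffset`. */
function blockIndex(byteOffset: number, blockSize: number = BLOCK_SIZE): number {
  return Math.floor(byteOffset / blockSize);
}

/**
 * Hypothetical helper: which blocks are touched when reading the given
 * entries of a tile-offset array that starts at `arrayStart` in the file.
 * `entryBytes` is 8 for BigTIFF and 4 for classic TIFF.
 */
function blocksTouched(
  arrayStart: number,
  entryBytes: number,
  tileIndices: number[],
  blockSize: number = BLOCK_SIZE,
): Set<number> {
  const blocks = new Set<number>();
  for (const idx of tileIndices) {
    const first = arrayStart + idx * entryBytes;
    const last = first + entryBytes - 1;
    // An entry can straddle a block boundary, so record every block it spans.
    for (let b = blockIndex(first, blockSize); b <= blockIndex(last, blockSize); b++) {
      blocks.add(b);
    }
  }
  return blocks;
}

// Assumed array start (hypothetical): 1,000,000 bytes into the file.
const viewportTiles = Array.from({ length: 100 }, (_, i) => i);
const touched = blocksTouched(1_000_000, 8, viewportTiles);
console.log(`blocks touched: ${[...touched].join(", ")}`);
console.log(`BigTIFF entries per block: ${BLOCK_SIZE / 8}`);
```

Under these assumptions, a viewport's worth of neighbouring entries resolves to a single cached block, which is the behaviour the spec relies on. Entries far apart in the array land in different blocks and cost one extra fetch each.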
diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts index cc51618a..7a1c4d0f 100644 --- a/packages/geotiff/src/geotiff.ts +++ b/packages/geotiff/src/geotiff.ts @@ -107,27 +107,22 @@ export class GeoTIFF { * This creates and initialises the underlying Tiff, then classifies IFDs. * * @param options.dataSource A source for fetching tile data. This is separate from the source used to construct the TIFF to allow for separate caching implementations. - * @param options.headerSource The source used to construct the TIFF. This is typically a layered source with caching and chunking, to optimise access to TIFF tags and IFDs. - * @param options.prefetch Number of bytes to prefetch when reading TIFF tags and IFDs. Defaults to 32KB, which is enough for most tags and small IFDs. Increase if you have many tags or large IFDs. + * @param options.headerSource The source used to construct the TIFF. This is typically a layered source with caching and chunking, to optimise access to TIFF tags and IFDs. Callers who want to control the initial read size should compose a `SourceChunk` of the desired block size; cogeotiff's default `defaultReadSize` (16 KiB) gets padded up by the chunking layer anyway. * @param options.signal An optional {@link AbortSignal} to cancel the header reads. * @param options.debug When true, the returned GeoTIFF logs each tile/mask data fetch to the console. Off by default. */ static async open(options: { dataSource: Pick; headerSource: Source; - prefetch?: number; signal?: AbortSignal; debug?: boolean; }): Promise { - const { - dataSource, - headerSource, - prefetch = 32 * 1024, - signal, - debug, - } = options; + const { dataSource, headerSource, signal, debug } = options; + // We use cogeotiff's default read size; in the typical fromUrl path, + // SourceChunk pads any small request up to the block size anyway, so + // tuning this independently of the chunk size is rarely useful. 
const tiff = await Tiff.create(headerSource, { - defaultReadSize: prefetch, + defaultReadSize: Tiff.DefaultReadSize, signal, }); // Disable cogeotiff's GDAL leader-bytes path so `TiffImage.getTileSize` @@ -316,10 +311,6 @@ export class GeoTIFF { // Tile data reads bypass the header cache (raw source). dataSource: source, headerSource: view, - // Read a full block on cogeotiff's first byte fetch. SourceChunk would - // pad a smaller request up to chunkSize anyway, but being explicit keeps - // the intent local and survives middleware changes. - prefetch: chunkSize, signal, debug, }); From dde5173c3ce081459e910b3c3553bcf109c62417 Mon Sep 17 00:00:00 2001 From: Kyle Barron Date: Tue, 12 May 2026 15:36:48 -0400 Subject: [PATCH 11/12] fix(geotiff): make HasTiffReference.debug optional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Overview doesn't expose debug — only the primary GeoTIFF does. Drop the required-property constraint so Overview satisfies the interface without its own getter. Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/geotiff/src/fetch.ts | 2 +- packages/geotiff/src/overview.ts | 5 ----- 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/packages/geotiff/src/fetch.ts b/packages/geotiff/src/fetch.ts index bfe1409c..deba5f7b 100644 --- a/packages/geotiff/src/fetch.ts +++ b/packages/geotiff/src/fetch.ts @@ -36,7 +36,7 @@ interface HasTiffReference extends HasTransform { readonly nodata: number | null; /** When true, the tile-fetch path logs each dataSource fetch to the console. */ - readonly debug: boolean; + readonly debug?: boolean; } export async function fetchTile( diff --git a/packages/geotiff/src/overview.ts b/packages/geotiff/src/overview.ts index 17740b57..195d9229 100644 --- a/packages/geotiff/src/overview.ts +++ b/packages/geotiff/src/overview.ts @@ -70,11 +70,6 @@ export class Overview { return this.geotiff.nodata; } - /** Inherits the {@link GeoTIFF.debug} flag from the parent. 
*/ - get debug(): boolean { - return this.geotiff.debug; - } - /** The number of tiles in the x and y directions */ get tileCount(): TiffImageTileCount { return this.image.tileCount; From f081f6c5eb38cf0facbb59278ab0c909b386490d Mon Sep 17 00:00:00 2001 From: Kyle Barron Date: Tue, 12 May 2026 15:41:02 -0400 Subject: [PATCH 12/12] refactor(geotiff): make debug field internal (_debug) and drop Overview accessor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses PR review: - Class field debug -> _debug, marked @internal. The user-facing option on GeoTIFF.open/fromUrl stays named 'debug'. - Remove Overview.debug getter — overview tile fetches don't need to log; the primary GeoTIFF's _debug is the only opt-in surface. - HasTiffReference._debug? is optional so Overview (without it) still satisfies the interface. - Drop the explicit defaultReadSize parameter by switching from Tiff.create to 'new Tiff(...).init({ signal })', letting the constructor default to Tiff.DefaultReadSize. Co-Authored-By: Claude Opus 4.7 (1M context) --- packages/geotiff/src/fetch.ts | 14 +++++++++----- packages/geotiff/src/geotiff.ts | 29 +++++++++++++++-------------- packages/geotiff/src/overview.ts | 5 +++++ 3 files changed, 29 insertions(+), 19 deletions(-) diff --git a/packages/geotiff/src/fetch.ts b/packages/geotiff/src/fetch.ts index deba5f7b..940b53c9 100644 --- a/packages/geotiff/src/fetch.ts +++ b/packages/geotiff/src/fetch.ts @@ -35,8 +35,12 @@ interface HasTiffReference extends HasTransform { /** The nodata value for the image, if any. */ readonly nodata: number | null; - /** When true, the tile-fetch path logs each dataSource fetch to the console. */ - readonly debug?: boolean; + /** + * Internal: when true, the tile-fetch path logs each dataSource fetch to + * the console. Set via `GeoTIFF.open({ debug: true })`. 
+ * @internal + */ + readonly _debug?: boolean; } export async function fetchTile( @@ -58,7 +62,7 @@ export async function fetchTile( self.maskImage != null ? getTile(self.maskImage, x, y, self.dataSource, { signal, - debug: self.debug ? { label: "mask" } : undefined, + debug: self._debug ? { label: "mask" } : undefined, }) : Promise.resolve(null); @@ -250,7 +254,7 @@ async function fetchCogBytes( signal?: AbortSignal; } = {}, ): Promise { - const debug: DebugTag | undefined = self.debug + const debug: DebugTag | undefined = self._debug ? { label: "data" } : undefined; switch (self.cachedTags.planarConfiguration) { @@ -299,7 +303,7 @@ async function fetchBandSeparateTileBytes( signal?: AbortSignal; } = {}, ): Promise { - const debug: DebugTag | undefined = self.debug + const debug: DebugTag | undefined = self._debug ? { label: "data" } : undefined; const byteRanges = await findBandSeparateTileByteRanges(self, x, y); diff --git a/packages/geotiff/src/geotiff.ts b/packages/geotiff/src/geotiff.ts index 7a1c4d0f..3c427c0f 100644 --- a/packages/geotiff/src/geotiff.ts +++ b/packages/geotiff/src/geotiff.ts @@ -71,13 +71,15 @@ export class GeoTIFF { readonly gdalMetadata: GDALMetadata | null; /** - * When true, log each {@link dataSource} fetch (image tile data and mask - * tile data) to the console with offset/length and a `data`/`mask` label. - * Useful for diagnosing per-request behavior against the browser network - * panel. Enable via the `debug` option on {@link GeoTIFF.open} or - * {@link GeoTIFF.fromUrl}. + * Internal: when true, log each `dataSource` fetch (image tile data and + * mask tile data) to the console with offset/length and a `data`/`mask` + * label. Enable via the `debug` option on {@link GeoTIFF.open} or + * {@link GeoTIFF.fromUrl}. Read by the tile-fetch path; not part of the + * public API surface. 
+ * + * @internal */ - readonly debug: boolean; + readonly _debug: boolean; private constructor( tiff: Tiff, @@ -98,7 +100,7 @@ export class GeoTIFF { this.cachedTags = cachedTags; this.dataSource = dataSource; this.gdalMetadata = gdalMetadata; - this.debug = debug; + this._debug = debug; } /** @@ -118,13 +120,12 @@ export class GeoTIFF { debug?: boolean; }): Promise { const { dataSource, headerSource, signal, debug } = options; - // We use cogeotiff's default read size; in the typical fromUrl path, - // SourceChunk pads any small request up to the block size anyway, so - // tuning this independently of the chunk size is rarely useful. - const tiff = await Tiff.create(headerSource, { - defaultReadSize: Tiff.DefaultReadSize, - signal, - }); + // Construct + init in two steps so we don't have to pass cogeotiff's + // `defaultReadSize` ourselves (the constructor defaults it to + // `Tiff.DefaultReadSize` when no options are provided). In the typical + // fromUrl path, SourceChunk pads any small request up to the block size + // anyway, so tuning this independently of the chunk size is rarely useful. + const tiff = await new Tiff(headerSource).init({ signal }); // Disable cogeotiff's GDAL leader-bytes path so `TiffImage.getTileSize` // always reads from TileOffsets/TileByteCounts through the header source. // The leader-bytes optimization assumes a tile fits in one chunk, which diff --git a/packages/geotiff/src/overview.ts b/packages/geotiff/src/overview.ts index 195d9229..8af8c306 100644 --- a/packages/geotiff/src/overview.ts +++ b/packages/geotiff/src/overview.ts @@ -70,6 +70,11 @@ export class Overview { return this.geotiff.nodata; } + /** Inherits the {@link GeoTIFF._debug} flag from the parent. */ + get _debug(): boolean { + return this.geotiff._debug; + } + /** The number of tiles in the x and y directions */ get tileCount(): TiffImageTileCount { return this.image.tileCount;
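
Reviewer note: the opt-in logging added to `getBytes` in this series can be exercised in isolation. The following is a minimal sketch under stated assumptions, not the code in `fetch.ts`. `FetchSource`, `MemorySource`, `fetchWithDebug`, and the injectable `log` parameter are hypothetical stand-ins introduced for illustration; the real code takes a chunkd source and calls `console.log` directly.

```typescript
// Hypothetical minimal source shape; chunkd's real Source interface has more members.
interface FetchSource {
  fetch(
    offset: number,
    length: number,
    options?: { signal?: AbortSignal },
  ): Promise<ArrayBuffer>;
}

// Same shape as the DebugTag type introduced in this patch series.
type DebugTag = { label: string };

/** In-memory stand-in for an uncached data source (assumption, for testing only). */
class MemorySource implements FetchSource {
  constructor(private readonly data: ArrayBuffer) {}

  async fetch(offset: number, length: number): Promise<ArrayBuffer> {
    // ArrayBuffer.prototype.slice clamps to the buffer end, so reads past the
    // end come back short, like a truncated range response would.
    return this.data.slice(offset, offset + length);
  }
}

/**
 * Sketch of the logging and short-read check: log only when a debug tag is
 * present, then fail loudly if fewer bytes than requested come back.
 */
async function fetchWithDebug(
  dataSource: FetchSource,
  offset: number,
  byteCount: number,
  options: { signal?: AbortSignal; debug?: DebugTag } = {},
  log: (message: string) => void = console.log,
): Promise<ArrayBuffer> {
  if (options.debug !== undefined) {
    log(
      `[geotiff dataSource] ${options.debug.label}: offset=${offset} length=${byteCount}`,
    );
  }
  const bytes = await dataSource.fetch(offset, byteCount, {
    signal: options.signal,
  });
  if (bytes.byteLength < byteCount) {
    throw new Error(
      `Failed to fetch bytes from offset:${offset} wanted:${byteCount} got:${bytes.byteLength}`,
    );
  }
  return bytes;
}
```

Injecting `log` is a design choice of this sketch only: it lets a test capture the message without patching `console`. Passing `{ debug: { label: "mask" } }` produces one log line per fetch, and omitting `debug` produces none, which matches the off-by-default behaviour described in the commit message.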