Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -28,3 +28,5 @@ slides/lightning-talk/index_files/
 docs/notebooks/data/
 
 dev-docs/plans/
+
+issue-drafts/
2 changes: 1 addition & 1 deletion ARCHITECTURE.md
@@ -55,7 +55,7 @@ src/lazycogs/
 7. Calls `compute_output_grid()` to get the output affine transform and dimensions (width, height). No eager coordinate arrays are produced.
 8. Creates a single `MultiBandStacBackendArray` (a dataclass) with shape `(band, time, y, x)` holding all the parameters needed to materialise any chunk later, then wraps it in one `xarray.core.indexing.LazilyIndexedArray`. This avoids `xr.concat` (used internally by `ds.to_array()`), which would eagerly load `LazilyIndexedArray`-backed objects.
 9. Uses `rasterix.RasterIndex` for spatial indexing, but materialises the x/y coordinate variables eagerly as numpy arrays so chunked scalar spatial selections compute reliably.
-10. Constructs the `xr.DataArray` directly from the 4-D variable. If `chunks` is provided, calls `.chunk(chunks)` to convert to a dask-backed array; otherwise the `LazilyIndexedArray` remains in play so narrow slices (e.g. a single pixel) translate to minimal I/O. When output nodata is known, the returned array sets `da.attrs["_FillValue"]` and `da.encoding["_FillValue"]` for downstream serialization. When unknown, no `_FillValue` metadata is attached.
+10. Constructs the `xr.DataArray` directly from the 4-D variable. If `chunks` is provided, calls `.chunk(chunks)` to convert to a dask-backed array; otherwise the `LazilyIndexedArray` remains in play so narrow slices (e.g. a single pixel) translate to minimal I/O. Spatial metadata is serialized for both GeoZarr-style consumers and GDAL/rioxarray consumers: `spatial:transform` stays in affine coefficient order, while `spatial_ref.attrs["GeoTransform"]` uses GDAL geotransform order. When output nodata is known, the returned array sets `da.attrs["_FillValue"]` without duplicating it in `da.encoding`, which keeps rioxarray export paths compatible with xarray CF encoding. When unknown, no `_FillValue` metadata is attached.
 11. Keeps lazy runtime state on the backing array rather than in `da.attrs`. This lets xarray operations such as `sortby()` and deep copies clone metadata safely without trying to pickle live objects like `DuckdbClient`.
 
 ## Explain: dry-run read estimator
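Step 10 stores the same six grid coefficients in two different orders. The following is a minimal pure-Python sketch of that mapping. It assumes `spatial:transform` holds the coefficients as `[a, b, c, d, e, f]` with `x = a*col + b*row + c` and `y = d*col + e*row + f` (the ordering used by the `affine` package), and that `GeoTransform` is written as a space-separated string in GDAL order `(c, a, b, f, d, e)`. The helper names are hypothetical, not lazycogs functions.

```python
from typing import Sequence, Tuple

# Hypothetical helpers for illustration; lazycogs does not necessarily expose these names.
Coeffs = Tuple[float, float, float, float, float, float]


def affine_to_gdal(coeffs: Sequence[float]) -> Coeffs:
    """Reorder affine coefficients (a, b, c, d, e, f) into GDAL order (c, a, b, f, d, e)."""
    a, b, c, d, e, f = coeffs
    return (c, a, b, f, d, e)


def gdal_to_affine(gt: Sequence[float]) -> Coeffs:
    """Inverse reordering: GDAL (c, a, b, f, d, e) back to affine (a, b, c, d, e, f)."""
    c, a, b, f, d, e = gt
    return (a, b, c, d, e, f)


def pixel_to_world_affine(coeffs: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply the affine convention: x = a*col + b*row + c, y = d*col + e*row + f."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


def pixel_to_world_gdal(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply the GDAL convention: X = GT0 + col*GT1 + row*GT2, Y = GT3 + col*GT4 + row*GT5."""
    return (gt[0] + col * gt[1] + row * gt[2], gt[3] + col * gt[4] + row * gt[5])


def geotransform_attr(coeffs: Sequence[float]) -> str:
    """Format affine coefficients as a space-separated GDAL GeoTransform string (assumed format)."""
    return " ".join(str(v) for v in affine_to_gdal(coeffs))


# Example grid: 10 m north-up pixels, upper-left corner at (500000, 4600000).
spatial_transform = [10.0, 0.0, 500000.0, 0.0, -10.0, 4600000.0]
gt = affine_to_gdal(spatial_transform)
print(gt)                                    # (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
print(geotransform_attr(spatial_transform))  # 500000.0 10.0 0.0 4600000.0 0.0 -10.0
# Both conventions place pixel (col=3, row=2) at the same world coordinate.
print(pixel_to_world_affine(spatial_transform, 3, 2) == pixel_to_world_gdal(gt, 3, 2))  # True
```

Only the storage order differs. Copying the affine list verbatim into `GeoTransform` would make GDAL read the pixel width `10.0` as the x origin, which is why the two attributes cannot share one order.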
12 changes: 9 additions & 3 deletions README.md
@@ -118,11 +118,11 @@ asks you to pass `dtype=` explicitly.
 When you omit `nodata=`:
 
 - if sampled bands all agree on one scalar nodata sentinel, the returned
-`DataArray` sets `attrs["_FillValue"]` and `encoding["_FillValue"]`, and
-masked mosaic output materializes with that same sentinel instead of zero
+`DataArray` sets `attrs["_FillValue"]`, and masked mosaic output materializes
+with that same sentinel instead of zero
 - if sampled bands disagree, `open()` raises `ValueError` and asks you to pass
 `nodata=` explicitly
-- if sampled bands have no nodata sentinel, no `_FillValue` encoding is
+- if sampled bands have no nodata sentinel, no `_FillValue` metadata is
 attached and `0` remains only an implementation fill value for uncovered
 regions
 - if later chunk reads encounter a conflicting source nodata value, compute
@@ -131,6 +131,12 @@ When you omit `nodata=`:
 Explicit `dtype=` and `nodata=` stay authoritative even when source assets are
 heterogeneous.
 
+`lazycogs.open()` also attaches CF/rioxarray-compatible spatial metadata. The
+GeoZarr-style `spatial:transform` attribute stays in affine coefficient order,
+while `spatial_ref.attrs["GeoTransform"]` is written in GDAL geotransform order
+so sliced 2D images and 3D band stacks can be read by rioxarray without repairing
+the transform metadata.
+
 Float-only mosaic methods such as `MeanMethod`, `MedianMethod`, and
 `StdevMethod` auto-promote inferred integer outputs to `float32`. If you pass
 an explicit integer `dtype=` with one of those methods, `open()` raises and
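The README's rules for an omitted `nodata=` amount to a small decision function over the sampled per-band sentinels. The sketch below is hypothetical (`infer_output_nodata` and `fill_value_attrs` are not lazycogs APIs). It also adds assumptions the diff does not state: NaN sentinels count as equal to each other, an empty or all-`None` sample leaves nodata unknown, and a mix of `None` and a concrete sentinel counts as disagreement.

```python
import math
from typing import Any, Dict, Optional, Sequence


def _nodata_key(value: Optional[float]) -> Any:
    """Normalize a sentinel so NaN compares equal to NaN (assumption of this sketch)."""
    if value is None:
        return None
    return "nan" if math.isnan(value) else value


def infer_output_nodata(sampled: Sequence[Optional[float]]) -> Optional[float]:
    """Hypothetical helper: derive output nodata from per-band sampled sentinels."""
    keys = {_nodata_key(v) for v in sampled}
    if not keys or keys == {None}:
        # No sentinel anywhere (or nothing sampled): output nodata stays unknown.
        return None
    if len(keys) > 1:
        # Also covers a None/value mix, which this sketch treats as disagreement.
        raise ValueError(
            f"sampled bands disagree on nodata {sorted(map(repr, keys))}; "
            "pass nodata= explicitly"
        )
    (key,) = keys
    return math.nan if key == "nan" else key


def fill_value_attrs(nodata: Optional[float]) -> Dict[str, Any]:
    """Advertise known nodata via attrs only; unknown nodata gets no _FillValue at all."""
    return {} if nodata is None else {"_FillValue": nodata}


print(fill_value_attrs(infer_output_nodata([-9999, -9999])))  # {'_FillValue': -9999}
print(fill_value_attrs(infer_output_nodata([None, None])))    # {}
try:
    infer_output_nodata([0, 255])
except ValueError as err:
    print(err)
```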
8 changes: 5 additions & 3 deletions docs/guides/dtype-nodata.md
@@ -40,10 +40,12 @@ When you omit `nodata=`:
 - if sampled bands all have `nodata=None`, lazycogs leaves output nodata unknown
 - if sampled bands disagree, `open()` raises and asks you to pass `nodata=` explicitly
 
-When nodata is known, the returned `DataArray` advertises it in both:
+When nodata is known, the returned `DataArray` advertises it with:
 
 - `da.attrs["_FillValue"]`
-- `da.encoding["_FillValue"]`
+
+lazycogs intentionally does not duplicate `_FillValue` into `da.encoding` because
+that collides with xarray's CF encoding step during rioxarray exports.
 
 ## What output nodata means
 
@@ -102,7 +104,7 @@ If a later asset conflicts with the inferred output contract, compute raises ins
 
 ```python
 da.dtype
-da.encoding.get("_FillValue")
+da.attrs.get("_FillValue")
 ```
 
 Those two values tell you most of what lazycogs has promised about the returned array.
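The collision the guide mentions can be illustrated with a simplified model of the CF encoding step. This is not xarray's code. It assumes the encoder moves `encoding["_FillValue"]` into the serialized attributes and refuses to overwrite a `_FillValue` already present in `attrs`, modeled on xarray's "failed to prevent overwriting existing key" error. `encode_fill_value` is a hypothetical name.

```python
from typing import Any, Dict, Tuple

Attrs = Dict[str, Any]


def encode_fill_value(attrs: Attrs, encoding: Attrs) -> Tuple[Attrs, Attrs]:
    """Simplified model of a CF encode step for _FillValue (not xarray's actual code).

    Assumption: the encoder moves encoding["_FillValue"] into the serialized
    attributes and refuses to overwrite a _FillValue already present in attrs.
    """
    out_attrs = dict(attrs)        # copy so the caller's dicts are not mutated
    out_encoding = dict(encoding)
    if "_FillValue" in out_encoding:
        if "_FillValue" in out_attrs:
            raise ValueError(
                "failed to prevent overwriting existing key _FillValue in attrs"
            )
        out_attrs["_FillValue"] = out_encoding.pop("_FillValue")
    return out_attrs, out_encoding


# New layout: _FillValue only in attrs, so the encode step passes it through.
print(encode_fill_value({"_FillValue": -9999}, {}))  # ({'_FillValue': -9999}, {})

# Old layout: _FillValue in both attrs and encoding, so the step fails.
try:
    encode_fill_value({"_FillValue": -9999}, {"_FillValue": -9999})
except ValueError as err:
    print("collision:", err)
```

Under that model, the attrs-only layout passes through unchanged, while the earlier layout that set both `da.attrs` and `da.encoding` fails.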