a5px is a Rust-backed R package that streams GeoTIFF / Cloud-Optimised GeoTIFF rasters from local files or cloud object stores and aggregates their pixels into A5 pentagonal DGGS cells in a single pass.
The output is a tibble (or Arrow table, or Parquet file on disk) keyed
by a5R::a5_cell, ready for SQL-style queries against the resulting
embeddings or summary statistics. On a 12-band Sentinel-2 COG it is
roughly 7x faster than gdal raster zonal-stats end-to-end and
11x faster than a hand-rolled terra + dplyr pipeline.
a5px is not on CRAN. Install the development version from GitHub:
# install.packages("pak")
pak::pak("belian-earth/a5px")

You will need a working Rust toolchain (cargo and rustc >= 1.85) and a recent version of a5R in your library.
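To check the Rust toolchain requirement before installing, you can parse the output of `rustc --version` in base R. This is a sketch, not part of a5px. The helper name `rustc_meets_minimum()` is hypothetical, and it assumes `rustc --version` prints a line of the form `rustc 1.85.0 (...)`.

```r
# Hypothetical helper (not part of a5px): does a `rustc --version` line
# report at least the given minimum version?
rustc_meets_minimum <- function(version_line, minimum = "1.85") {
  # Extract the first dotted version number, e.g. "1.85.0"
  ver <- regmatches(version_line,
                    regexpr("[0-9]+\\.[0-9]+(\\.[0-9]+)?", version_line))
  if (length(ver) == 0L) return(FALSE)  # no version found
  package_version(ver) >= package_version(minimum)
}

# Query the installed toolchain, if rustc is on the PATH
if (nzchar(Sys.which("rustc"))) {
  line <- system2("rustc", "--version", stdout = TRUE)
  rustc_meets_minimum(line[1])
}
```

If `Sys.which("rustc")` returns an empty string, rustc is not on the PATH that R sees, and the install will fail regardless of version.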
A complete read of a public Cloud-Optimised GeoTIFF (the EOx 2020 Sentinel-2 cloudless mosaic, RGB) into A5 cells, then a quick visualisation:
library(a5px)
#> Loading required package: a5R
url <- "https://s2downloads.eox.at/demo/EOxCloudless/2020/rgb_corrected_geodetic/3/0/0.tif"
EOx <- a5_read_raster(
url,
resolution = 5L,
stat = "mean"
)
EOx
#> # A tibble: 7,928 × 4
#> cell band_01 band_02 band_03
#> <a5_cell> <dbl> <dbl> <dbl>
#> 1 3066000000000000 5.08 12.8 15.5
#> 2 04d6000000000000 55.7 50.0 23.6
#> 3 5276000000000000 251. 174. 94.5
#> 4 bd3e000000000000 253. 255. 254.
#> 5 00ca000000000000 23.2 32.3 28.5
#> 6 2dce000000000000 30.9 28.8 20.6
#> 7 1e1e000000000000 14.8 25.4 30.0
#> 8 8316000000000000 1.90 9.64 9.96
#> 9 8d02000000000000 66.6 52.8 25.5
#> 10 3a56000000000000 27.1 54.5 27.0
#> # ℹ 7,918 more rows
# Render the three bands as RGB and view on the globe
a5view::a5_view(EOx,
fill = a5view::cells_rgb(EOx$band_01, EOx$band_02, EOx$band_03),
fill_identity = TRUE,
border = "#ffffff",
border_width = 0.25,
globe = TRUE,
opacity = 1,
zoom = 2,
lng = 0,
lat = 44)

# 1. tibble path: one numeric column per band; convenient for interactive use in R
tbl <- a5_read_raster(src, resolution, stat, bands, threads, io_concurrency,
as_vector = FALSE)
# 2. arrow Table path: cell + FixedSizeList<float, n_bands> value column
arr <- a5_read_raster_arrow(src, resolution, stat, bands, threads,
io_concurrency, value_type = "float64")
# 3. Rust-direct Parquet path: skips the R-side Arrow round-trip entirely
a5_raster_to_parquet(src, dest, resolution, stat, bands, value_type,
compression, threads, io_concurrency)
# Convenience writer for either (1) or (2)
a5_write_parquet(x, dest, compression = "zstd", ...)

All three readers share the same arguments:

- src: path or URL. Supported schemes: local paths, file://, http(s)://, s3://, gs://, az://. Cloud reads stream byte ranges; the full file is never materialised.
- resolution: A5 resolution (0–30); see a5R::a5_cell_area().
- stat: one or more of "mean", "sum", "count", "min", "max". A vector emits one column per (band, stat) pair on the tibble path and one FixedSizeList column per stat on the Arrow / Parquet paths.
- bands: NULL (all), integer indices (1-based), or character band names matched against the GDAL DESCRIPTION tag.
- threads, io_concurrency: tile-level concurrency.
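Why `sum` and `count` make aggregates mergeable: the mean of a cell that straddles two separately read rasters cannot be recovered from the two partial means, but it can be recovered exactly from the partial sums and counts (and `min` / `max` merge by taking the min of mins and max of maxes). The base-R sketch below uses plain data frames with made-up cell IDs and values; the `B02__sum` / `B02__count` column names follow the naming in the example that follows, and `merge_partials()` is a hypothetical helper, not an a5px function.

```r
# Two partial aggregates for the same band, e.g. from two adjacent rasters
# read separately. Cell IDs and values are illustrative only.
tile_a <- data.frame(
  cell = c("c1", "c2", "c3"),
  B02__sum = c(10, 20, 30), B02__count = c(2, 4, 3),
  B02__min = c(4, 3, 8),    B02__max = c(6, 7, 12)
)
tile_b <- data.frame(
  cell = c("c3", "c4"),
  B02__sum = c(60, 8), B02__count = c(5, 1),
  B02__min = c(7, 8),  B02__max = c(14, 8)
)

# Hypothetical helper: combine partial aggregates cell by cell.
merge_partials <- function(...) {
  all_rows <- do.call(rbind, list(...))
  by_cell <- split(all_rows, all_rows$cell)
  out <- do.call(rbind, lapply(by_cell, function(d) {
    data.frame(
      cell = d$cell[1],
      B02__sum = sum(d$B02__sum),      # sums add
      B02__count = sum(d$B02__count),  # counts add
      B02__min = min(d$B02__min),      # min of mins
      B02__max = max(d$B02__max)       # max of maxes
    )
  }))
  out$B02__mean <- out$B02__sum / out$B02__count  # exact combined mean
  rownames(out) <- NULL
  out
}

merged <- merge_partials(tile_a, tile_b)
merged
```

For cell `c3` the partial means are 10 and 12, and averaging them would give 11; the merged mean from sums and counts is 90 / 8 = 11.25, which is what a single read of both rasters would report.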
# sum and count let you merge multi-tile aggregates without re-reading
agg <- a5_read_raster(src, resolution = 14L,
stat = c("mean", "sum", "count"))
# columns: cell, B02__mean, B02__sum, B02__count, B03__mean, ...

For planar (INTERLEAVE=BAND) COGs, which are common for embedding rasters such as Alpha Earth Foundations (AEF), requesting bands = 1:8 fetches only the byte ranges of those bands. Reading 8 of 64 bands from a 2.8 GB AEF tile cuts end-to-end time from 62 s to 24 s.
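Why a band subset saves I/O on planar files: the TIFF 6.0 specification stores TileOffsets plane by plane when PlanarConfiguration = 2 (GDAL's INTERLEAVE=BAND), so every band's tiles are separately addressable and a reader can skip the planes it does not need. The base-R sketch below computes which TileOffsets entries a reader would fetch for a band subset. It illustrates the file layout only, not a5px's actual fetch planner; `planar_tile_indices()` is a hypothetical helper and the 16 x 16 tile grid is made up.

```r
# 1-based indices into the TileOffsets / TileByteCounts arrays needed to read
# a subset of bands from a planar (PlanarConfiguration = 2) tiled TIFF.
# Per TIFF 6.0, all tiles of plane 1 come first, then all tiles of plane 2, ...
planar_tile_indices <- function(bands, tiles_per_band,
                                tiles = seq_len(tiles_per_band)) {
  # Band b, tile t maps to entry (b - 1) * tiles_per_band + t
  as.vector(outer(tiles, (bands - 1L) * tiles_per_band, `+`))
}

tiles_per_band <- 16L * 16L  # illustrative 16 x 16 tile grid
n_bands <- 64L

needed <- planar_tile_indices(1:8, tiles_per_band)
length(needed) / (n_bands * tiles_per_band)  # 0.125: one eighth of the tiles
```

With pixel-interleaved (INTERLEAVE=PIXEL) files every tile holds all bands, so no such skipping is possible.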
url <- "https://data.source.coop/.../tile.tiff"
a5_raster_to_parquet(
url, "embeddings.parquet",
resolution = 14L,
bands = 1:8,
value_type = "float32",
compression = "zstd",
threads = 16L,
io_concurrency = 16L
)

a5px_set_threads(8)                  # global default
a5px_get_threads()                   # current setting
options(a5px.threads = 8)            # picked up at .onLoad
Sys.setenv(A5PX_NUM_THREADS = "8")   # also picked up at .onLoad
Sys.setenv(A5PX_PROFILE = "1")       # dump stage timings to stderr

Supported:

- Formats: tiled GeoTIFF and Cloud-Optimised GeoTIFF; ZSTD / Deflate / LZW / JPEG / uncompressed.
- CRS resolution: EPSG codes, WKT in citation fields, and reconstruction from explicit GeoKey parameters. Custom centred projections written by GDAL (+proj=laea +lon_0=... +lat_0=..., Albers, LCC, polar stereographic, etc.) are recognised without needing an EPSG code.
- NoData: the dataset-wide TIFFTAG_GDAL_NODATA value is honoured, with NaN-safe comparison.

Not yet supported:

- Strip-based GeoTIFFs (these fail with a clear error message).
- Per-band nodata: a GeoTIFF spec limitation; will land with VRT support.
- Inverse (cell-driven) mode for rasters where pixel_area > cell_area.
- Antimeridian-crossing rasters.
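On the NoData point: NaN-safe comparison matters because GDAL allows TIFFTAG_GDAL_NODATA to be "nan", and NaN never compares equal to anything, itself included, so a naive `x == nodata` mask silently keeps every NaN pixel. A minimal base-R sketch of the idea follows; `is_nodata()` is a hypothetical helper for illustration, whereas a5px performs this check on the Rust side.

```r
# Hypothetical helper: flag pixels equal to a nodata value, treating a NaN
# nodata value specially because NaN == NaN is NA in R (false in IEEE 754).
is_nodata <- function(x, nodata) {
  if (is.nan(nodata)) {
    is.nan(x)                 # NaN sentinel: match NaN pixels
  } else {
    !is.na(x) & x == nodata   # ordinary sentinel; NaN pixels never match
  }
}

px <- c(0, 12.5, NaN, 3)
px == NaN                         # NA NA NA NA: a naive comparison masks nothing
mean(px[!is_nodata(px, NaN)])     # mean of 0, 12.5 and 3
```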

Rust dependencies:

- async-tiff + object_store for streamed reads.
- proj4rs for pure-Rust CRS reprojection; proj4wkt for the WKT-only fallback.
- a5 for cell math.
- arrow-array + parquet for the Rust-direct Parquet writer.
- extendr-api for the R bindings.
Built on the a5 Rust crate by Felix Palmer, and on a5R for the cell type and Arrow interop. Streaming I/O and TIFF parsing courtesy of the async-tiff team at Development Seed.
