Summary
Two related issues prevent computing expressions across two datasets (e.g., forecast.temp - target.temp):
- Alias syntax doesn't parse for datasets: FROM dataset1 a, dataset2 b fails with a parse error.
- Multi-variable SELECT from datasets is not supported: even within a single dataset, SELECT u_wind - v_wind FROM climate fails.
Reproduction
Issue 1: Alias syntax parse failure
REGISTER DATASET climate FROM 'zarr:///path/to/tutorial_climate.zarr';
-- Expected: compute self-difference (should be all zeros)
-- Actual: parse error
SELECT a.temperature - b.temperature FROM climate a, climate b;
-- Error: query error: unexpected trailing tokens after statement: Ident("a")
The parser doesn't recognize FROM <dataset_name> <alias>: it ends the statement after the dataset name and rejects the alias as an unexpected trailing token.
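As a rough illustration of the missing grammar rule, the sketch below parses `name [alias] (, name [alias])*` from a token slice. The `Token` and `SourceRef` types and the `parse_from_sources` function are hypothetical and are not taken from stmt.rs; the sketch also assumes keywords such as WHERE arrive as their own token kinds rather than as identifiers.

```rust
/// Hypothetical token type; the real tokens in stmt.rs may look different.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Comma,
    Semicolon,
}

/// One entry of the FROM clause: a source name and an optional alias.
#[derive(Debug, PartialEq)]
struct SourceRef {
    name: String,
    alias: Option<String>,
}

/// Parses `name [alias] (, name [alias])*`.
/// Returns the sources and the number of tokens consumed.
fn parse_from_sources(tokens: &[Token]) -> Result<(Vec<SourceRef>, usize), String> {
    let mut sources = Vec::new();
    let mut pos = 0;
    loop {
        let name = match tokens.get(pos) {
            Some(Token::Ident(n)) => n.clone(),
            other => return Err(format!("expected source name, found {:?}", other)),
        };
        pos += 1;
        // An identifier directly after the source name is read as its alias
        // instead of being left behind as a trailing token.
        let alias = match tokens.get(pos) {
            Some(Token::Ident(a)) => {
                pos += 1;
                Some(a.clone())
            }
            _ => None,
        };
        sources.push(SourceRef { name, alias });
        // A comma continues the source list; anything else ends it.
        match tokens.get(pos) {
            Some(Token::Comma) => pos += 1,
            _ => break,
        }
    }
    Ok((sources, pos))
}
```

With the tokens for `climate a, climate b;` this yields two sources with aliases `a` and `b` and stops before the semicolon.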
Issue 2: Multi-variable expressions from datasets
-- Expected: element-wise difference between u_wind and v_wind arrays
-- Actual: error
SELECT u_wind - v_wind FROM climate;
-- Error: not implemented: multi-variable SELECT from datasets is not yet supported (found: u_wind, v_wind)
When a dataset has multiple data variables, the query planner can't resolve expressions that reference more than one variable.
Expected Behavior
Cross-dataset queries
REGISTER DATASET era5 FROM 'zarr:///path/to/era5.zarr';
REGISTER DATASET hres FROM 'zarr:///path/to/hres.zarr';
-- Should align on shared dimensions (lat, lon, time) and compute element-wise difference
SELECT a.temperature - b.temperature FROM era5 a, hres b;
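The issue doesn't say how alignment should treat coordinates present in only one dataset. The sketch below assumes inner-join semantics on a single integer coordinate (for example a time step) and is meant only to make the expected result concrete. `Series` and `aligned_difference` are hypothetical names, not part of arrdb.

```rust
use std::collections::HashMap;

/// Hypothetical 1-D variable: values indexed by an integer coordinate.
struct Series {
    coords: Vec<i64>,
    values: Vec<f64>,
}

/// Computes `a - b` on the coordinates both series share, in `a`'s order.
/// Coordinates found in only one series are dropped (assumed inner join).
fn aligned_difference(a: &Series, b: &Series) -> Vec<(i64, f64)> {
    // Index b's values by coordinate so each lookup is O(1).
    let b_by_coord: HashMap<i64, f64> = b
        .coords
        .iter()
        .copied()
        .zip(b.values.iter().copied())
        .collect();
    a.coords
        .iter()
        .zip(a.values.iter())
        .filter_map(|(coord, a_val)| {
            b_by_coord.get(coord).map(|b_val| (*coord, a_val - b_val))
        })
        .collect()
}
```

A real implementation would align on several dimensions (lat, lon, time) at once; the single-coordinate case only shows the matching step.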
Multi-variable expressions
-- Should compute element-wise difference between two variables sharing the same dimensions
SELECT u_wind - v_wind FROM climate;
-- Wind speed from components
SELECT sqrt(u_wind * u_wind + v_wind * v_wind) FROM climate;
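The intended semantics of the two queries above can be sketched as an element-wise zip over variables that share a grid. The `Dataset` alias and the `zip_vars` helper are hypothetical and use a flattened in-memory layout purely for illustration; they say nothing about how arrdb stores or plans these expressions.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory dataset: variable name -> flattened values on a shared grid.
type Dataset = HashMap<String, Vec<f64>>;

/// Applies `op` cell by cell to two variables of the same dataset.
fn zip_vars(
    ds: &Dataset,
    left: &str,
    right: &str,
    op: impl Fn(f64, f64) -> f64,
) -> Result<Vec<f64>, String> {
    let l = ds
        .get(left)
        .ok_or_else(|| format!("unknown variable: {}", left))?;
    let r = ds
        .get(right)
        .ok_or_else(|| format!("unknown variable: {}", right))?;
    // Both variables must cover the same cells for an element-wise result.
    if l.len() != r.len() {
        return Err(format!(
            "shape mismatch: {} has {} cells, {} has {}",
            left,
            l.len(),
            right,
            r.len()
        ));
    }
    Ok(l.iter().zip(r.iter()).map(|(x, y)| op(*x, *y)).collect())
}
```

Here `zip_vars(&ds, "u_wind", "v_wind", |u, v| u - v)` corresponds to the difference query, and passing `|u, v| (u * u + v * v).sqrt()` corresponds to the wind-speed query.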
Context
Cross-dataset arithmetic is essential for forecast evaluation metrics:
MAE = avg_cells(abs(forecast.temp - target.temp))
RMSE = sqrt(avg_cells((forecast.temp - target.temp)^2))
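For reference, the two metrics reduce to the following once forecast and target values are aligned cell by cell. This is a plain-Rust sketch over flattened slices; the `mae` and `rmse` functions are illustrative and assume `avg_cells` is an unweighted mean over all cells.

```rust
/// Mean absolute error over all cells; `None` if the inputs are empty or differ in length.
fn mae(forecast: &[f64], target: &[f64]) -> Option<f64> {
    if forecast.is_empty() || forecast.len() != target.len() {
        return None;
    }
    let sum: f64 = forecast
        .iter()
        .zip(target)
        .map(|(f, t)| (f - t).abs())
        .sum();
    Some(sum / forecast.len() as f64)
}

/// Root mean squared error over all cells; `None` if the inputs are empty or differ in length.
fn rmse(forecast: &[f64], target: &[f64]) -> Option<f64> {
    if forecast.is_empty() || forecast.len() != target.len() {
        return None;
    }
    let sum_sq: f64 = forecast
        .iter()
        .zip(target)
        .map(|(f, t)| (f - t).powi(2))
        .sum();
    Some((sum_sq / forecast.len() as f64).sqrt())
}
```

For a forecast of [1, 2, 3, 8] against a target of [1, 2, 3, 4], the only error is 4 in one of four cells, so MAE is 1 and RMSE is sqrt(16 / 4) = 2.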
The array-level query system (REGISTER ARRAY) does support multi-source queries with aliases (FROM arr1 a, arr2 b) via the join logic in crates/arrdb-query/src/planner/logical.rs:209-240. The gap is in the dataset-to-array resolution layer — dataset queries need to expand variable references into the underlying array sources.
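One way to picture the missing resolution step: each `alias.variable` reference is mapped to the array that backs that variable, after which the existing multi-source array path could take over. The sketch below is a guess at the shape of that mapping. The `Catalog` type, the `resolve_var` function, and array ids such as `era5/temperature` are all hypothetical and do not reflect how arrdb actually names or stores dataset variables.

```rust
use std::collections::HashMap;

/// Hypothetical catalog: dataset name -> (variable name -> underlying array id).
type Catalog = HashMap<String, HashMap<String, String>>;

/// Resolves a reference such as `a.temperature` to an array id, given the
/// alias -> dataset bindings collected from the FROM clause.
fn resolve_var(
    catalog: &Catalog,
    aliases: &HashMap<String, String>,
    qualifier: &str,
    variable: &str,
) -> Result<String, String> {
    // If the qualifier is not a known alias, treat it as a dataset name.
    let dataset = aliases
        .get(qualifier)
        .map(String::as_str)
        .unwrap_or(qualifier);
    let vars = catalog
        .get(dataset)
        .ok_or_else(|| format!("unknown dataset: {}", dataset))?;
    vars.get(variable)
        .cloned()
        .ok_or_else(|| format!("dataset {} has no variable {}", dataset, variable))
}
```

With aliases `a -> era5` and `b -> hres`, the expression `a.temperature - b.temperature` would resolve to two distinct array sources, which is the input shape the array-level join logic already handles.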
Files
crates/arrdb-query/src/parser/stmt.rs — FROM clause parsing for datasets may not handle aliases
crates/arrdb-query/src/planner/logical.rs:90-106 — multi-source plan building (works for arrays)
crates/arrdb-exec/src/session.rs — dataset query path needs multi-variable expression support