Summary
GROUP BY <dimension> in aggregate queries always returns a single row instead of one row per group. The executor's FinalAggOp ignores the group_dims field entirely, reducing all chunks into a single aggregate value.
Reproduction
-- Register the tutorial dataset (4 variables, 365 timesteps, 4 pressure levels)
REGISTER DATASET climate FROM 'zarr:///path/to/tutorial_climate.zarr';
-- Expected: 4 rows (one per pressure level: 1000, 850, 500, 250)
-- Actual: 1 row with a single global average
SELECT avg_cells(temperature) FROM climate GROUP BY level;
-- Output: AVG(temperature) | 246.90143670341035
-- Expected: 365 rows (one per day)
-- Actual: 1 row
SELECT avg_cells(temperature) FROM climate GROUP BY time;
-- Output: AVG(temperature) | 246.90143670341035
Root Cause
The physical planner correctly passes group_dims into PhysicalPlan::FinalAggregate (crates/arrdb-query/src/planner/physical.rs:148-152), but the field never reaches the executor: the compiler drops it when constructing FinalAggOp (crates/arrdb-exec/src/agg.rs). FinalAggOp then iterates over all chunks, accumulates partials, and merges them into a single scalar result.
// crates/arrdb-exec/src/compiler.rs:78-86
PhysicalPlan::FinalAggregate { input, aggs, .. } => {
// `group_dims` is in the `..` — completely ignored
let child = compile(actual_input, ctx)?;
Ok(Box::new(FinalAggOp::new(child, aggs.clone())))
}
FinalAggOp::new doesn't accept group dimensions at all.
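For illustration, here is a minimal sketch of what group-aware accumulation could look like for AVG. It is not the arrdb implementation: AvgPartial, GroupKey, and final_aggregate_grouped are hypothetical names, and it assumes each chunk's partial can be tagged with the coordinate values of the grouped dimensions (integer coordinates here, for simplicity).

```rust
use std::collections::BTreeMap;

/// Hypothetical partial state for AVG: running sum and cell count.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct AvgPartial {
    pub sum: f64,
    pub count: u64,
}

impl AvgPartial {
    /// Fold another partial into this one.
    pub fn merge(&mut self, other: AvgPartial) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Final average, or None if no cells were accumulated.
    pub fn finish(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Hypothetical group key: one coordinate value per dimension in group_dims.
pub type GroupKey = Vec<i64>;

/// Merge per-chunk partials into one accumulator per group key,
/// instead of collapsing everything into a single scalar.
/// Rows come back sorted by group key (BTreeMap iteration order).
pub fn final_aggregate_grouped<I>(partials: I) -> Vec<(GroupKey, Option<f64>)>
where
    I: IntoIterator<Item = (GroupKey, AvgPartial)>,
{
    let mut groups: BTreeMap<GroupKey, AvgPartial> = BTreeMap::new();
    for (key, partial) in partials {
        groups.entry(key).or_default().merge(partial);
    }
    groups
        .into_iter()
        .map(|(key, partial)| (key, partial.finish()))
        .collect()
}
```

With an empty group key for every partial, this degenerates to the current single-row behavior, so an ungrouped query could plausibly share the same code path.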
Expected Behavior
SELECT avg_cells(temperature) FROM climate GROUP BY level should return 4 rows:
level | AVG(temperature)
------+-----------------
1000  | 271.23
850   | 258.45
500   | 233.12
250   | 224.81
Impact
This blocks per-dimension aggregation queries, which are needed for:
- Per-lead-time forecast evaluation metrics (EWB benchmarking)
- Time-series aggregation (GROUP BY MONTH(time), GROUP BY YEAR(time))
- Any dimensional reduction that isn't a full collapse
Files
crates/arrdb-exec/src/agg.rs — FinalAggOp needs group-aware accumulation
crates/arrdb-exec/src/compiler.rs:78-86 — needs to pass group_dims to the operator
crates/arrdb-exec/src/result.rs — ExecutionResult needs a grouped-aggregate variant
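As a rough sketch of the third item, a grouped-aggregate result could sit alongside the existing scalar one. The real ExecutionResult is not shown in this issue, so AggResult below is a hypothetical stand-in, and its field names and the integer group keys are assumptions.

```rust
/// Hypothetical stand-in for the aggregate portion of ExecutionResult.
#[derive(Debug, Clone, PartialEq)]
pub enum AggResult {
    /// Full collapse: a single value, as returned today.
    Scalar { column: String, value: f64 },
    /// One row per group: the key holds one coordinate per grouped dimension.
    Grouped {
        group_dims: Vec<String>,
        column: String,
        rows: Vec<(Vec<i64>, f64)>,
    },
}

impl AggResult {
    /// Number of output rows this result represents.
    pub fn row_count(&self) -> usize {
        match self {
            AggResult::Scalar { .. } => 1,
            AggResult::Grouped { rows, .. } => rows.len(),
        }
    }

    /// Header line: grouped dimensions first, then the aggregate column.
    pub fn header(&self) -> String {
        match self {
            AggResult::Scalar { column, .. } => column.clone(),
            AggResult::Grouped { group_dims, column, .. } => {
                let mut cols = group_dims.clone();
                cols.push(column.clone());
                cols.join(" | ")
            }
        }
    }
}
```

Keeping the scalar variant separate means existing ungrouped queries would not need to change their output handling.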