GROUP BY returns single row — executor ignores group dimensions #1

@jayendra13

Summary

GROUP BY <dimension> in aggregate queries always returns a single row instead of one row per group. The executor's FinalAggOp ignores the group_dims field entirely, reducing all chunks into a single aggregate value.

Reproduction

-- Register the tutorial dataset (4 variables, 365 timesteps, 4 pressure levels)
REGISTER DATASET climate FROM 'zarr:///path/to/tutorial_climate.zarr';

-- Expected: 4 rows (one per pressure level: 1000, 850, 500, 250)
-- Actual: 1 row with a single global average
SELECT avg_cells(temperature) FROM climate GROUP BY level;
-- Output: AVG(temperature) | 246.90143670341035

-- Expected: 365 rows (one per day)
-- Actual: 1 row
SELECT avg_cells(temperature) FROM climate GROUP BY time;
-- Output: AVG(temperature) | 246.90143670341035

Root Cause

The physical planner correctly passes group_dims into PhysicalPlan::FinalAggregate (crates/arrdb-query/src/planner/physical.rs:148-152), but the executor's FinalAggOp (crates/arrdb-exec/src/agg.rs) never uses them. It simply iterates all chunks, accumulates partials, and merges into a single scalar result.

// crates/arrdb-exec/src/compiler.rs:78-86
PhysicalPlan::FinalAggregate { input, aggs, .. } => {
    // `group_dims` is in the `..` — completely ignored
    let child = compile(actual_input, ctx)?;
    Ok(Box::new(FinalAggOp::new(child, aggs.clone())))
}

FinalAggOp::new doesn't accept group dimensions at all.
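A minimal sketch of what group-aware accumulation could look like, not the actual arrdb code. `AvgPartial`, `ChunkPartial`, and `final_aggregate` are hypothetical names, and the sketch assumes each chunk's partial can be tagged with a single group coordinate (that is, chunks do not straddle group boundaries). The idea is to key the partials by group and merge within each key, instead of merging everything into one scalar:

```rust
use std::collections::BTreeMap;

/// Partial state for an average: running sum and cell count.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct AvgPartial {
    sum: f64,
    count: u64,
}

impl AvgPartial {
    /// Combine another partial into this one.
    fn merge(&mut self, other: AvgPartial) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Final average, or None if no cells were seen.
    fn finalize(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Hypothetical: one partial per chunk, tagged with the chunk's
/// coordinate along each group dimension (e.g. [1000] for level).
struct ChunkPartial {
    group_key: Vec<i64>,
    partial: AvgPartial,
}

/// Merge partials per group key. With `grouped == false` every chunk
/// maps to the empty key, which reproduces the single-row behavior
/// described in this issue.
fn final_aggregate(chunks: &[ChunkPartial], grouped: bool) -> Vec<(Vec<i64>, f64)> {
    let mut groups: BTreeMap<Vec<i64>, AvgPartial> = BTreeMap::new();
    for c in chunks {
        let key = if grouped { c.group_key.clone() } else { Vec::new() };
        groups.entry(key).or_default().merge(c.partial);
    }
    // BTreeMap iteration yields groups in ascending key order.
    groups
        .into_iter()
        .filter_map(|(key, p)| p.finalize().map(|avg| (key, avg)))
        .collect()
}
```

Under these assumptions, the ungrouped path is just the grouped path with an empty key, so a real fix might not need two separate operators. Chunks that span several group values would need to emit one partial per group value rather than one per chunk.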

Expected Behavior

SELECT avg_cells(temperature) FROM climate GROUP BY level should return 4 rows:

level | AVG(temperature)
------+-----------------
1000  | 271.23
 850  | 258.45
 500  | 233.12
 250  | 224.81

Impact

This blocks per-dimension aggregation queries, which are needed for:

  • Per-lead-time forecast evaluation metrics (EWB benchmarking)
  • Time-series aggregation (GROUP BY MONTH(time), GROUP BY YEAR(time))
  • Any dimensional reduction that isn't a full collapse

Files

  • crates/arrdb-exec/src/agg.rs: FinalAggOp needs group-aware accumulation
  • crates/arrdb-exec/src/compiler.rs:78-86: needs to pass group_dims to the operator
  • crates/arrdb-exec/src/result.rs: ExecutionResult needs a grouped-aggregate variant
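
For the last item, one possible shape for a grouped-aggregate result is sketched below. The real `ExecutionResult` in result.rs is not shown in this issue, so `AggResult` and its fields are hypothetical stand-ins, and the group coordinates are assumed to be representable as `f64`:

```rust
/// Hypothetical result shape; the real `ExecutionResult` may differ.
#[derive(Debug, Clone, PartialEq)]
enum AggResult {
    /// Full collapse: one value per aggregate expression.
    Scalar {
        agg_names: Vec<String>,
        values: Vec<f64>,
    },
    /// One row per group: (group coordinates, aggregate values).
    Grouped {
        group_dims: Vec<String>,
        agg_names: Vec<String>,
        rows: Vec<(Vec<f64>, Vec<f64>)>,
    },
}

impl AggResult {
    /// Number of output rows: always 1 for a full collapse.
    fn row_count(&self) -> usize {
        match self {
            AggResult::Scalar { .. } => 1,
            AggResult::Grouped { rows, .. } => rows.len(),
        }
    }

    /// Column headers: group dimensions first, then aggregates.
    fn header(&self) -> Vec<String> {
        match self {
            AggResult::Scalar { agg_names, .. } => agg_names.clone(),
            AggResult::Grouped { group_dims, agg_names, .. } => group_dims
                .iter()
                .chain(agg_names.iter())
                .cloned()
                .collect(),
        }
    }
}
```

Keeping the scalar case as its own variant would leave existing callers untouched. Alternatively, a scalar result could be modeled as a grouped result with no group dimensions and exactly one row.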
