IR compiler re-walks AST to resolve types already computed during compilation #7

@jayendra13

Problem

compile_node in crates/arrdb-ir/src/compiler.rs returns only a RegId — it discards the dtype of the compiled register. When downstream code needs to know a register's type (e.g., to decide whether to insert a Cast), it calls resolve_type(operand, schema) which re-walks the entire AST subtree from scratch.

This happens in at least two places:

  1. Unary float-only ops (compiler.rs:101) — after compile_node(operand) returns, resolve_type(operand) re-walks the operand subtree to check if a Cast to Float64 is needed.
  2. emit_scalar_binop (compiler.rs:152) — calls resolve_type(buf_expr) to determine the buffer dtype for promotion, even though the subtree was already compiled.

Example

For sqrt(count + temp):

  • compile_node(count + temp) internally calls resolve_type on count and temp to compute promotion → first walk
  • Back in the UnaryOp arm, resolve_type(count + temp) re-resolves the same subtree → second walk

Same function, same subtree, same answer — computed twice because the first result was never stored.
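The double walk can be made visible with a toy model. The sketch below is not the arrdb code: `Expr`, `DType`, the visit counter, and the simplified `compile_node`/`resolve_type` signatures are all assumptions made for illustration. It only mirrors the call pattern described above and counts how many nodes `resolve_type` touches.

```rust
use std::cell::Cell;

#[derive(Clone, Copy, Debug, PartialEq)]
enum DType { Int64, Float64 }

// Hypothetical, heavily simplified AST; a Column carries its dtype
// directly instead of looking it up in a schema.
enum Expr {
    Column(DType),
    Add(Box<Expr>, Box<Expr>),
    Sqrt(Box<Expr>),
}

// Walks the subtree to compute its type, bumping `visits` once per node.
fn resolve_type(expr: &Expr, visits: &Cell<u32>) -> DType {
    visits.set(visits.get() + 1);
    match expr {
        Expr::Column(dt) => *dt,
        Expr::Add(l, r) => {
            let (lt, rt) = (resolve_type(l, visits), resolve_type(r, visits));
            if lt == DType::Float64 || rt == DType::Float64 {
                DType::Float64
            } else {
                DType::Int64
            }
        }
        Expr::Sqrt(_) => DType::Float64,
    }
}

// Mimics the current pattern: compile the children, then call
// resolve_type on them again because no dtype was returned.
fn compile_node(expr: &Expr, visits: &Cell<u32>) {
    match expr {
        Expr::Column(_) => {}
        Expr::Add(l, r) => {
            compile_node(l, visits);
            compile_node(r, visits);
            // Promotion check: first walk over `l` and `r`.
            let _ = (resolve_type(l, visits), resolve_type(r, visits));
        }
        Expr::Sqrt(operand) => {
            compile_node(operand, visits);
            // Cast check: second walk over the whole operand subtree.
            let _ = resolve_type(operand, visits);
        }
    }
}

// Compiles sqrt(count + temp) and returns the number of resolve_type visits.
fn count_visits() -> u32 {
    let visits = Cell::new(0);
    let expr = Expr::Sqrt(Box::new(Expr::Add(
        Box::new(Expr::Column(DType::Int64)),   // count
        Box::new(Expr::Column(DType::Float64)), // temp
    )));
    compile_node(&expr, &visits);
    visits.get()
}
```

In this model the operand `count + temp` has three nodes, yet `count_visits()` reports five `resolve_type` visits: two from the promotion check inside the `Add` arm and three more from the re-walk in the `Sqrt` arm.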

Proposed fix

Track output dtype per register in CompileContext:

```rust
struct CompileContext<'a> {
    schema: &'a ArraySchema,
    ops: Vec<ExprOp>,
    next_reg: u16,
    reg_dtypes: Vec<DType>,  // indexed by RegId
}
```

Each compile_node call records the output dtype when allocating a register. Then type lookups become O(1):

```rust
let src = compile_node(operand, ctx)?;
let input_dt = &ctx.reg_dtypes[src.0 as usize];  // no re-walk
```

This eliminates the redundant resolve_type calls on already-compiled subtrees, including the two sites listed above.
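A minimal, self-contained sketch of the bookkeeping this implies, under stated assumptions: `alloc_reg`, `dtype_of`, `promote`, and `compile_sqrt_of_sum` are hypothetical names, `DType` is reduced to two variants, and the `schema` and `ops` fields are left out to keep the example runnable on its own. The real compiler may allocate registers differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType { Int64, Float64 }

#[derive(Clone, Copy, Debug, PartialEq)]
struct RegId(u16);

#[derive(Default)]
struct CompileContext {
    next_reg: u16,
    reg_dtypes: Vec<DType>, // indexed by RegId
}

impl CompileContext {
    // Hypothetical helper: allocate a register and record its dtype in
    // the same step, so reg_dtypes.len() always equals next_reg.
    fn alloc_reg(&mut self, dtype: DType) -> RegId {
        let id = RegId(self.next_reg);
        self.next_reg += 1;
        self.reg_dtypes.push(dtype);
        id
    }

    // O(1) lookup that stands in for resolve_type on a compiled subtree.
    fn dtype_of(&self, reg: RegId) -> DType {
        self.reg_dtypes[reg.0 as usize]
    }
}

// Assumed promotion rule for this sketch: any Float64 operand wins.
fn promote(a: DType, b: DType) -> DType {
    if a == DType::Float64 || b == DType::Float64 {
        DType::Float64
    } else {
        DType::Int64
    }
}

// Models compiling sqrt(lhs + rhs). Returns the operand's dtype and
// whether a Cast to Float64 would be needed before the sqrt.
fn compile_sqrt_of_sum(lhs: DType, rhs: DType) -> (DType, bool) {
    let mut ctx = CompileContext::default();
    let l = ctx.alloc_reg(lhs);
    let r = ctx.alloc_reg(rhs);
    let sum_dt = promote(ctx.dtype_of(l), ctx.dtype_of(r));
    let sum = ctx.alloc_reg(sum_dt);
    // The unary arm reads the recorded dtype instead of re-walking the AST.
    let input_dt = ctx.dtype_of(sum);
    (input_dt, input_dt != DType::Float64)
}
```

Routing every allocation through a single helper is one way to keep `reg_dtypes` and `next_reg` in sync; pushing to the vector by hand at each call site would work too, but makes it easier to forget an entry.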

Impact

Low: resolve_type is pure and the ASTs are small. This is an efficiency cleanup, not a correctness issue.

Labels: enhancement
