
compute: Add WASM-compiled scalar expression evaluation (milestone 1)#35250

Draft
antiguru wants to merge 10 commits into MaterializeInc:main from antiguru:compiled_mse


@antiguru

Introduce the mz-expr-compiler crate, which compiles a subset of MirScalarExpr trees to WebAssembly modules that operate on columnar data.
This is milestone 1: an end-to-end proof compiling a + b for Int64 columns.

The crate has four modules:

  • analyze — determines whether an expression tree is compilable (milestone 1: Column, Literal(Int64), CallBinary(AddInt64))
  • codegen — emits a WASM module via wasm-encoder with a row loop that reads columnar Int64 inputs, performs i64.add, and writes columnar output with null propagation
  • columnar — columnar buffer types (ColumnBatch, TypedColumn, ResultColumn) with row-to-column and column-to-row conversions
  • engine — wraps wasmtime to compile and instantiate the generated WASM, exposing ExprEngine::compile() and CompiledExpr::evaluate()
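To make the milestone 1 gate in `analyze` concrete, here is a minimal sketch of an "all nodes must be supported" check. `Expr` is a hypothetical, heavily simplified stand-in for `MirScalarExpr`, and the `collect_columns` signature is an assumption for illustration, not the crate's actual API.

```rust
/// Hypothetical, heavily simplified stand-in for `MirScalarExpr`.
#[derive(Debug, Clone)]
pub enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// An Int64 literal.
    LiteralInt64(i64),
    /// A literal of a type milestone 1 cannot handle.
    LiteralString(String),
    /// Models `CallBinary(AddInt64)`.
    AddInt64(Box<Expr>, Box<Expr>),
    /// Any other binary function; not compilable in milestone 1.
    OtherBinary(Box<Expr>, Box<Expr>),
}

/// Accepts a tree only if every node is in the milestone 1 subset:
/// Column, Literal(Int64), and CallBinary(AddInt64).
pub fn is_compilable(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::LiteralInt64(_) => true,
        Expr::AddInt64(a, b) => is_compilable(a) && is_compilable(b),
        // A single unsupported node anywhere sends the whole tree to the interpreter.
        Expr::LiteralString(_) | Expr::OtherBinary(_, _) => false,
    }
}

/// Collects the distinct input columns (in first-use order) that the
/// generated code would have to read.
pub fn collect_columns(expr: &Expr, out: &mut Vec<usize>) {
    match expr {
        Expr::Column(c) => {
            if !out.contains(c) {
                out.push(*c);
            }
        }
        Expr::AddInt64(a, b) | Expr::OtherBinary(a, b) => {
            collect_columns(a, out);
            collect_columns(b, out);
        }
        Expr::LiteralInt64(_) | Expr::LiteralString(_) => {}
    }
}
```

The recursive "every child must pass" structure means nested sums such as `(a + b) + 5` are accepted, while a mixed tree falls back entirely rather than being partially compiled.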

The compute layer integration adds an ENABLE_COMPILED_EXPRESSIONS dyncfg (default: false) and a try_compile_mfp_expressions hook in as_collection_core() that attempts WASM compilation when the flag is enabled.
Actual batch evaluation replacing the interpreter is a follow-up.

Known limitation: the generated WASM uses wrapping i64.add without overflow detection.
The interpreter uses checked_add and returns NumericFieldOverflow.
Overflow detection will be added in milestone 2.
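The divergence described in this limitation can be reproduced on the host with plain Rust integer methods. In this sketch the interpreter's error is modeled as a string rather than the real `EvalError::NumericFieldOverflow`.

```rust
/// Interpreter-style addition: overflow is an error. The error is a string
/// here; the real interpreter returns `EvalError::NumericFieldOverflow`.
pub fn interpreted_add(a: i64, b: i64) -> Result<i64, &'static str> {
    a.checked_add(b).ok_or("NumericFieldOverflow")
}

/// Milestone 1 compiled semantics: WASM `i64.add` is two's-complement
/// wrapping, so overflow silently produces a (wrong) value.
pub fn compiled_add_m1(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}
```

For in-range inputs both agree; at `i64::MAX + 1` the interpreter errors while the compiled path returns `i64::MIN`.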

Tests added: 19 tests in mz-expr-compiler — analysis unit tests, codegen validation, columnar round-trip tests, engine integration tests, and proptest differential tests comparing compiled vs interpreted on random Int64 data.

🤖 Generated with Claude Code

Introduce the `mz-expr-compiler` crate, which compiles a subset of
`MirScalarExpr` trees to WebAssembly modules that operate on columnar
data. This is milestone 1: an end-to-end proof compiling `a + b` for
Int64 columns.

The crate has four modules:
* `analyze` — determines whether an expression tree is compilable
  (milestone 1: Column, Literal(Int64), CallBinary(AddInt64))
* `codegen` — emits a WASM module via `wasm-encoder` with a row loop
  that reads columnar Int64 inputs, performs i64.add, and writes
  columnar output with null propagation
* `columnar` — columnar buffer types (`ColumnBatch`, `TypedColumn`,
  `ResultColumn`) with `rows_to_columns` / `columns_to_rows` conversions
* `engine` — wraps `wasmtime` to compile and instantiate the generated
  WASM, exposing `ExprEngine::compile()` and `CompiledExpr::evaluate()`
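As a rough illustration of the `columnar` conversions and the null-propagating row loop that `codegen` emits, the sketch below models both on the host. `Int64Column` (a value buffer plus a validity buffer) is a hypothetical layout, not the crate's `TypedColumn`; values under NULL slots are written as 0 here, whereas the real generated code may leave arbitrary bits.

```rust
/// Hypothetical columnar layout for one Int64 column: dense values plus a
/// parallel validity buffer (false = NULL).
#[derive(Debug, Clone, PartialEq)]
pub struct Int64Column {
    pub values: Vec<i64>,
    pub valid: Vec<bool>,
}

/// Transposes row-major rows (each of length `arity`) into one column per field.
pub fn rows_to_columns(rows: &[Vec<Option<i64>>], arity: usize) -> Vec<Int64Column> {
    let mut cols: Vec<Int64Column> = (0..arity)
        .map(|_| Int64Column {
            values: Vec::with_capacity(rows.len()),
            valid: Vec::with_capacity(rows.len()),
        })
        .collect();
    for row in rows {
        for (col, datum) in cols.iter_mut().zip(row.iter()) {
            col.values.push(datum.unwrap_or(0)); // placeholder under NULL
            col.valid.push(datum.is_some());
        }
    }
    cols
}

/// Transposes columns back into rows, restoring NULLs from validity.
pub fn columns_to_rows(cols: &[Int64Column]) -> Vec<Vec<Option<i64>>> {
    let n = cols.first().map_or(0, |c| c.values.len());
    (0..n)
        .map(|i| {
            cols.iter()
                .map(|c| c.valid[i].then_some(c.values[i]))
                .collect::<Vec<_>>()
        })
        .collect()
}

/// Host model of the generated `a + b` row loop: NULL in either input gives
/// NULL output; otherwise a wrapping add, as milestone 1's `i64.add` does.
pub fn add_columns(a: &Int64Column, b: &Int64Column) -> Int64Column {
    let n = a.values.len();
    let mut out = Int64Column { values: Vec::with_capacity(n), valid: Vec::with_capacity(n) };
    for i in 0..n {
        let is_null = !a.valid[i] || !b.valid[i];
        out.values.push(if is_null { 0 } else { a.values[i].wrapping_add(b.values[i]) });
        out.valid.push(!is_null);
    }
    out
}
```

Keeping validity separate from values lets the loop read every slot unconditionally and decide nullness with a cheap boolean combination.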

The compute layer integration adds:
* `ENABLE_COMPILED_EXPRESSIONS` dyncfg (default: false) in
  `compute-types`
* A `try_compile_mfp_expressions` hook in `as_collection_core()` that
  attempts WASM compilation when the flag is enabled and logs the result.
  Actual batch evaluation is a follow-up.

Known limitation: the generated WASM uses wrapping i64.add without
overflow detection; the interpreter uses checked_add and returns
NumericFieldOverflow. Overflow detection will be added in milestone 2.

Tests: 19 tests in total, comprising 7 analysis unit tests, 3 codegen tests,
2 columnar round-trip tests, 4 engine integration tests, and 3 proptest
differential tests comparing compiled vs interpreted on random data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

antiguru and others added 9 commits February 27, 2026 15:58
* Align wasm-encoder version with wasmtime's internal dependency (0.244.0)
  to avoid duplicate crate in cargo deny
* Add gimli and linux-raw-sys to deny.toml skip list for wasmtime deps
* Add wasmtime crates as wrappers for the banned `log` crate
* Refactor proptests to use closure form with #[mz_ore::test] to satisfy
  the test-attribute lint
* Regenerate workspace-hack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the `try_compile_mfp_expressions` compilation probe in
`as_collection_core()` with an actual `MfpEvaluator` dispatch that
routes between the interpreter and WASM-compiled evaluation.

Add `CompiledExprSession` for per-row WASM evaluation of a single
expression through a cached instance, and `CompiledMfp` which wraps
an `MfpPlan` and dispatches each expression to WASM or the interpreter.
The temporal bounds and projection logic remains interpreted.

The `MfpEvaluator` enum in the flat_map closure selects between
`MfpPlan::evaluate` (interpreted) and `CompiledMfp::evaluate`
(compiled) based on the `ENABLE_COMPILED_EXPRESSIONS` feature flag.
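The dispatch pattern described here, where a variant is chosen once and matched per row, can be sketched with an enum. `InterpretedPlan` and `CompiledPlan` are hypothetical stand-ins for `MfpPlan` and `CompiledMfp`, and falling back to the interpreter when no compiled plan exists is an assumption of this sketch; the per-expression fallback inside `CompiledMfp` is not modeled.

```rust
/// Hypothetical stand-in for the interpreted `MfpPlan` path.
pub struct InterpretedPlan;
/// Hypothetical stand-in for `CompiledMfp`.
pub struct CompiledPlan;

impl InterpretedPlan {
    fn evaluate(&self, row: &[i64]) -> (&'static str, Option<i64>) {
        ("interpreted", row[0].checked_add(row[1]))
    }
}

impl CompiledPlan {
    fn evaluate(&self, row: &[i64]) -> (&'static str, Option<i64>) {
        ("compiled", row[0].checked_add(row[1]))
    }
}

/// Model of the `MfpEvaluator` enum: picked once when the operator is built,
/// then matched on for every row inside the flat_map closure.
pub enum MfpEvaluator {
    Interpreted(InterpretedPlan),
    Compiled(CompiledPlan),
}

impl MfpEvaluator {
    /// `compiled` is `None` when compilation was not possible (assumed fallback).
    pub fn new(flag_enabled: bool, compiled: Option<CompiledPlan>) -> Self {
        match (flag_enabled, compiled) {
            (true, Some(c)) => MfpEvaluator::Compiled(c),
            _ => MfpEvaluator::Interpreted(InterpretedPlan),
        }
    }

    pub fn evaluate(&self, row: &[i64]) -> (&'static str, Option<i64>) {
        match self {
            MfpEvaluator::Interpreted(p) => p.evaluate(row),
            MfpEvaluator::Compiled(c) => c.evaluate(row),
        }
    }
}
```

An enum keeps the closure monomorphic and avoids a boxed trait object; the per-row cost is a single predictable branch.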

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The WASM codegen for AddInt64 previously emitted a bare i64.add which
wraps on overflow, while the interpreter uses checked_add and returns
EvalError::NumericFieldOverflow. This caused the compiled path to
silently produce wrong results on overflow.

Add inline overflow detection after i64.add in emit_add_int64. Two new
i64 locals (local_a, local_b) save operands before the add via
local.tee. After the add, the standard signed overflow check
((a ^ result) & (b ^ result)) < 0 detects when both operands share a
sign but the result differs. The check is guarded by !is_null to skip
garbage values from null propagation.

Tighten assert_compiled_matches_interpreted in the proptests to require
exact error agreement between compiled and interpreted paths, removing
the lenient blocks that previously tolerated wrapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expand the expr-compiler from only AddInt64 to all Int64 binary
arithmetic (sub, mul, div, mod), bitwise (and, or, xor), and unary
(neg, bitnot, abs) operations.

Infrastructure changes:
* Error codes instead of boolean error flag: the WASM error byte now
  carries a discriminant (0=ok, 1=NumericFieldOverflow, 2=DivisionByZero,
  3=Int64OutOfRange) that the host maps to the appropriate EvalError.
* Fix local clobbering in nested expressions: operands are saved to
  locals only after both children are evaluated, preventing inner calls
  from overwriting local_a/local_b before the outer operation reads them.
* Fix null/error precedence: each fallible operation saves and restores
  is_null around child evaluation so that errors in sibling subtrees
  propagate even when another subtree produces null, matching the
  interpreter's semantics where errors take precedence over nulls.
* Add CallUnary support to is_compilable, collect_columns, and emit_expr.
* Introduce EmitLocals struct and preamble/postamble helpers to reduce
  parameter threading across emit functions.
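A host-side sketch of two of these changes follows: decoding the error byte using the listed discriminants, and the "errors beat NULLs" precedence. `EvalError` and `Eval` are local stand-ins, not the real `mz_expr` types, and panicking on an unknown code is this sketch's choice.

```rust
/// Local stand-in for the relevant `EvalError` variants.
#[derive(Debug, Clone, PartialEq)]
pub enum EvalError {
    NumericFieldOverflow,
    DivisionByZero,
    Int64OutOfRange,
}

/// Maps the per-row WASM error byte to a host error.
pub fn decode_error(code: u8) -> Result<(), EvalError> {
    match code {
        0 => Ok(()),
        1 => Err(EvalError::NumericFieldOverflow),
        2 => Err(EvalError::DivisionByZero),
        3 => Err(EvalError::Int64OutOfRange),
        other => panic!("unknown error code {other}"),
    }
}

/// One evaluated datum: `Err` is an evaluation error, `Ok(None)` is SQL NULL.
pub type Eval = Result<Option<i64>, EvalError>;

/// Interpreter-style binary add: both children are resolved first, so an
/// error in either child wins over a NULL in the other.
pub fn add_int64(a: Eval, b: Eval) -> Eval {
    let (a, b) = (a?, b?);
    match (a, b) {
        (Some(x), Some(y)) => x
            .checked_add(y)
            .map(Some)
            .ok_or(EvalError::NumericFieldOverflow),
        _ => Ok(None), // NULL propagates only when no error occurred
    }
}
```

The compiled path reproduces this ordering by saving and restoring `is_null` around each child, so a NULL sibling cannot mask an error raised elsewhere in the tree.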

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add milestone 3 of the WASM expression compiler: comparison operators,
Bool as a WASM output type, and compiled predicate evaluation.

Type inference (`infer_type`) determines expression output types from
input column types, enabling generic comparison operators (Eq, NotEq, Lt,
Lte, Gt, Gte) to be compiled when both operands are Int64. WASM
comparisons return i32, widened to i64 via `i64.extend_i32_u` to maintain
the stack convention. Bool values use i64 encoding (0=false, 1=true).

`is_compilable` now accepts `input_types` to gate comparisons on operand
types. Bool unary `Not` uses `i64.eqz` + extend. `Datum::True`/`False`
compile to `i64.const 1`/`0`. Host-side decoding maps i64 results to
`Datum::True`/`Datum::False` or `TypedColumn::Bool` based on inferred
output type.
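The Bool encoding can be modeled on the host as follows. The functions mirror what `i64.lt_s` / `i64.eqz` followed by `i64.extend_i32_u` compute. `Datum` is a local stand-in, and decoding any nonzero value as true is an assumption of this sketch.

```rust
/// `i64.lt_s` leaves an i32 that is 0 or 1; `i64.extend_i32_u` zero-extends
/// it so every value on the stack stays i64 (0 = false, 1 = true).
pub fn lt_int64(a: i64, b: i64) -> i64 {
    let flag = u32::from(a < b); // i32 result of the comparison
    i64::from(flag) // i64.extend_i32_u
}

/// Bool `Not`: `i64.eqz` (i32 result) followed by the same extension.
pub fn not_bool(v: i64) -> i64 {
    i64::from(u32::from(v == 0))
}

/// Local stand-in for the boolean datums the host produces.
#[derive(Debug, PartialEq)]
pub enum Datum {
    True,
    False,
    Null,
}

/// Host-side decoding of an i64 Bool result plus its null flag.
pub fn decode_bool(value: i64, is_null: bool) -> Datum {
    if is_null {
        Datum::Null
    } else if value != 0 {
        Datum::True
    } else {
        Datum::False
    }
}
```

Widening to i64 keeps one stack type throughout codegen, so comparisons can feed into `Not`, `And`, or `Or` without per-type special cases.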

`CompiledMfp::try_new` accepts `input_types` and compiles predicates
alongside expressions. `evaluate_inner` uses compiled predicate sessions,
falling back to the interpreter for non-compiled predicates. The compute
call site passes `&[]` for now, so comparisons cannot be type-checked and
are not compiled there yet; existing behavior is unchanged (no regressions).

Added 10 proptests covering all comparison operators, Not, nested
comparisons with arithmetic, and comparisons with literals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…inference

Add WASM-compiled support for VariadicFunc::And/Or with three-valued SQL
null logic (FALSE dominates AND, TRUE dominates OR), UnaryFunc::IsNull/
IsTrue/IsFalse as null-consuming operations that never produce null
output, and Datum::True/False input handling.
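The three-valued semantics can be stated compactly in Rust, with `None` as SQL NULL. This sketch covers values only; it does not model evaluation errors or the WASM local save/restore.

```rust
/// SQL AND over any number of children: FALSE dominates; otherwise any
/// NULL makes the result NULL.
pub fn sql_and(args: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for a in args {
        match a {
            Some(false) => return Some(false),
            None => saw_null = true,
            Some(true) => {}
        }
    }
    if saw_null { None } else { Some(true) }
}

/// SQL OR: TRUE dominates; otherwise any NULL makes the result NULL.
pub fn sql_or(args: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for a in args {
        match a {
            Some(true) => return Some(true),
            None => saw_null = true,
            Some(false) => {}
        }
    }
    if saw_null { None } else { Some(false) }
}

/// Null-consuming predicates: they accept NULL input and never return NULL.
pub fn is_null(v: Option<bool>) -> bool {
    v.is_none()
}
pub fn is_true(v: Option<bool>) -> bool {
    v == Some(true)
}
pub fn is_false(v: Option<bool>) -> bool {
    v == Some(false)
}
```

For example, `(NULL OR TRUE) AND (FALSE OR FALSE)` evaluates to `TRUE AND FALSE`, which is `FALSE`.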

Add infer_input_types_from_mfp() to derive column types from expression
usage patterns (e.g., AddInt64 implies Int64 operands), replacing the
hardcoded &[] at the compute call site so comparisons and predicates
can now compile to WASM at runtime.
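Usage-based inference can be pictured as a tree walk that records the type each operator implies for the columns it consumes. The mini `Expr`, `ScalarType`, and `infer_input_types` below are hypothetical illustrations rather than the PR's `infer_input_types_from_mfp`, and keeping the first inferred type on conflict is an assumption.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScalarType {
    Int64,
    Bool,
}

/// Hypothetical mini expression tree, just enough to show the idea.
pub enum Expr {
    Column(usize),
    AddInt64(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
}

/// Records, per column, the type implied by the operator that consumes it
/// (e.g. `AddInt64` implies Int64). Generic comparisons imply nothing.
pub fn infer_input_types(expr: &Expr, out: &mut BTreeMap<usize, ScalarType>) {
    fn require(e: &Expr, ty: ScalarType, out: &mut BTreeMap<usize, ScalarType>) {
        if let Expr::Column(c) = e {
            out.entry(*c).or_insert(ty); // first inference wins (assumed)
        }
    }
    match expr {
        Expr::Column(_) => {}
        Expr::AddInt64(a, b) => {
            require(a, ScalarType::Int64, out);
            require(b, ScalarType::Int64, out);
            infer_input_types(a, out);
            infer_input_types(b, out);
        }
        Expr::Not(a) => {
            require(a, ScalarType::Bool, out);
            infer_input_types(a, out);
        }
        Expr::Lt(a, b) => {
            infer_input_types(a, out);
            infer_input_types(b, out);
        }
    }
}
```

In `(c0 + c1) < c2`, columns 0 and 1 are inferred as Int64, but column 2 is only compared, so it stays unknown and a comparison gated on it would remain interpreted.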

The And/Or codegen saves and restores tracking locals on the WASM stack
around each child evaluation to support arbitrary nesting depth.

Proptests cover all new operations including nested And-of-Or and
Or-of-And combinations, plus deterministic edge-case tests for null
dominance semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uation

Add mfp_eval benchmark with 7 scenarios (arithmetic, predicates, boolean
logic, null filtering) across 3 batch sizes, comparing the per-row
throughput of CompiledMfp against SafeMfpPlan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add eval_batch() that processes N rows in a single WASM call instead of
one call per row. The method reuses the existing WASM instance, grows
memory automatically when the batch exceeds capacity, and writes input
data in column-major layout matching the generated WASM function's
expectations.
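To illustrate what "column-major layout" and "grows memory automatically" involve, here is a host-side sketch of serializing a batch into a byte buffer and computing how many 64 KiB WASM pages to add. The exact layout (per column: `n` little-endian i64 values, then `n` validity bytes) is a hypothetical choice for illustration and may not match the generated function's ABI.

```rust
/// WASM linear memory grows in 64 KiB pages.
pub const WASM_PAGE: usize = 65_536;

/// Bytes needed for a batch under the assumed layout: per column,
/// `n` i64 values (8 bytes each) followed by `n` validity bytes.
pub fn batch_bytes(num_cols: usize, n: usize) -> usize {
    num_cols * (n * 8 + n)
}

/// Additional pages required so memory can hold `needed` bytes.
pub fn pages_to_grow(current_pages: usize, needed: usize) -> usize {
    needed.div_ceil(WASM_PAGE).saturating_sub(current_pages)
}

/// Serializes equal-length columns into one column-major buffer.
pub fn write_batch(cols: &[Vec<Option<i64>>]) -> Vec<u8> {
    let n = cols.first().map_or(0, |c| c.len());
    let mut buf = Vec::with_capacity(batch_bytes(cols.len(), n));
    for col in cols {
        // Value region: WASM memory is little-endian.
        for v in col {
            buf.extend_from_slice(&v.unwrap_or(0).to_le_bytes());
        }
        // Validity region: 1 = present, 0 = NULL.
        for v in col {
            buf.push(u8::from(v.is_some()));
        }
    }
    buf
}
```

Writing whole columns contiguously lets one WASM call stream through each input sequentially, which is what removes the per-row call overhead.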

Benchmarks show 8-17x speedup over interpreted evaluation and 15-40x
over per-row compiled evaluation, confirming WASM call overhead was the
bottleneck in the per-row path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>