compute: Add WASM-compiled scalar expression evaluation (milestone 1)#35250
Draft
antiguru wants to merge 10 commits into MaterializeInc:main from
Introduce the `mz-expr-compiler` crate, which compiles a subset of `MirScalarExpr` trees to WebAssembly modules that operate on columnar data. This is milestone 1: an end-to-end proof compiling `a + b` for Int64 columns.

The crate has four modules:

* `analyze`: determines whether an expression tree is compilable (milestone 1: Column, Literal(Int64), CallBinary(AddInt64))
* `codegen`: emits a WASM module via `wasm-encoder` with a row loop that reads columnar Int64 inputs, performs `i64.add`, and writes columnar output with null propagation
* `columnar`: columnar buffer types (`ColumnBatch`, `TypedColumn`, `ResultColumn`) with `rows_to_columns` / `columns_to_rows` conversions
* `engine`: wraps `wasmtime` to compile and instantiate the generated WASM, exposing `ExprEngine::compile()` and `CompiledExpr::evaluate()`

The compute layer integration adds:

* `ENABLE_COMPILED_EXPRESSIONS` dyncfg (default: false) in `compute-types`
* A `try_compile_mfp_expressions` hook in `as_collection_core()` that attempts WASM compilation when the flag is enabled and logs the result. Actual batch evaluation is a follow-up.

Known limitation: the generated WASM uses wrapping `i64.add` without overflow detection, whereas the interpreter uses `checked_add` and returns `NumericFieldOverflow`. Overflow detection will be added in milestone 2.

Tests: 19 tests, including 7 analysis unit tests, 3 codegen tests, 2 columnar round-trip tests, 4 engine integration tests, and 3 proptest differential tests comparing compiled vs interpreted on random data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
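As a rough illustration of the row loop described for `codegen`, the following plain-Rust sketch models the same semantics on the host side. The `Int64Column` type and `add_columns` function are hypothetical stand-ins, not the crate's `TypedColumn` or its generated WASM; the sketch only assumes one null flag per row and wrapping addition, as stated in the known limitation.

```rust
/// Hypothetical stand-in for a columnar Int64 buffer:
/// one value and one null flag per row.
#[derive(Debug, Clone, PartialEq)]
pub struct Int64Column {
    pub values: Vec<i64>,
    /// `true` means the row is NULL; the matching entry in `values` is ignored.
    pub nulls: Vec<bool>,
}

/// Row loop for `a + b` with null propagation: a NULL in either input
/// yields a NULL output row. Addition wraps on overflow, mirroring the
/// milestone 1 behaviour of a bare `i64.add`.
pub fn add_columns(a: &Int64Column, b: &Int64Column) -> Int64Column {
    assert_eq!(a.values.len(), b.values.len());
    let n = a.values.len();
    let mut out = Int64Column {
        values: vec![0; n],
        nulls: vec![false; n],
    };
    for row in 0..n {
        if a.nulls[row] || b.nulls[row] {
            out.nulls[row] = true;
        } else {
            out.values[row] = a.values[row].wrapping_add(b.values[row]);
        }
    }
    out
}
```

Feeding `i64::MAX` and `1` through this loop wraps to `i64::MIN`, which is the divergence from `checked_add` that the known limitation calls out.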
* Align `wasm-encoder` version with wasmtime's internal dependency (0.244.0) to avoid a duplicate crate in `cargo deny`
* Add `gimli` and `linux-raw-sys` to the `deny.toml` skip list for wasmtime deps
* Add wasmtime crates as wrappers for the banned `log` crate
* Refactor proptests to use the closure form with `#[mz_ore::test]` to satisfy the test-attribute lint
* Regenerate workspace-hack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the `try_compile_mfp_expressions` compilation probe in `as_collection_core()` with an actual `MfpEvaluator` dispatch that routes between the interpreter and WASM-compiled evaluation.

Add `CompiledExprSession` for per-row WASM evaluation of a single expression through a cached instance, and `CompiledMfp`, which wraps an `MfpPlan` and dispatches each expression to WASM or the interpreter. The temporal bounds and projection logic remain interpreted.

The `MfpEvaluator` enum in the `flat_map` closure selects between `MfpPlan::evaluate` (interpreted) and `CompiledMfp::evaluate` (compiled) based on the `ENABLE_COMPILED_EXPRESSIONS` feature flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The WASM codegen for AddInt64 previously emitted a bare `i64.add`, which wraps on overflow, while the interpreter uses `checked_add` and returns `EvalError::NumericFieldOverflow`. This caused the compiled path to silently produce wrong results on overflow.

Add inline overflow detection after `i64.add` in `emit_add_int64`. Two new i64 locals (`local_a`, `local_b`) save the operands before the add via `local.tee`. After the add, the standard signed overflow check `((a ^ result) & (b ^ result)) < 0` detects when both operands share a sign but the result's sign differs. The check is guarded by `!is_null` to skip garbage values from null propagation.

Tighten `assert_compiled_matches_interpreted` in the proptests to require exact error agreement between the compiled and interpreted paths, removing the lenient blocks that previously tolerated wrapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
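The signed overflow check in this commit can be tried out in plain Rust. This is a minimal sketch of the bit trick itself, not the emitted WASM; the function names `add_overflows` and `checked_add_via_bits` are made up for illustration.

```rust
/// Returns true when `a + b` overflows i64.
/// `r` is the wrapped sum; overflow happened exactly when both operands
/// have a sign that differs from the sign of `r`, i.e. when the sign bit
/// of `(a ^ r) & (b ^ r)` is set.
pub fn add_overflows(a: i64, b: i64) -> bool {
    let r = a.wrapping_add(b);
    ((a ^ r) & (b ^ r)) < 0
}

/// Checked addition built on the bit trick; intended to agree with
/// the standard library's `i64::checked_add`.
pub fn checked_add_via_bits(a: i64, b: i64) -> Option<i64> {
    if add_overflows(a, b) {
        None
    } else {
        Some(a.wrapping_add(b))
    }
}
```

Operands of opposite sign can never overflow: one of `a ^ r` or `b ^ r` then has a clear sign bit, so the AND is non-negative.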
Expand the expr-compiler from only AddInt64 to all Int64 binary arithmetic (sub, mul, div, mod), bitwise (and, or, xor), and unary (neg, bitnot, abs) operations.

Infrastructure changes:

* Error codes instead of a boolean error flag: the WASM error byte now carries a discriminant (0=ok, 1=NumericFieldOverflow, 2=DivisionByZero, 3=Int64OutOfRange) that the host maps to the appropriate `EvalError`.
* Fix local clobbering in nested expressions: operands are saved to locals only after both children are evaluated, preventing inner calls from overwriting `local_a`/`local_b` before the outer operation reads them.
* Fix null/error precedence: each fallible operation saves and restores `is_null` around child evaluation so that errors in sibling subtrees propagate even when another subtree produces null, matching the interpreter's semantics where errors take precedence over nulls.
* Add `CallUnary` support to `is_compilable`, `collect_columns`, and `emit_expr`.
* Introduce an `EmitLocals` struct and preamble/postamble helpers to reduce parameter threading across emit functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
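The error-byte mapping listed in this commit might look roughly like the following on the host side. The discriminant values come from the commit message; `SketchEvalError` and `decode_error_byte` are hypothetical names, and the real `EvalError` variants may carry payloads that this sketch leaves out.

```rust
/// Hypothetical, payload-free mirror of the three error kinds named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchEvalError {
    NumericFieldOverflow,
    DivisionByZero,
    Int64OutOfRange,
}

/// Decodes the per-row error byte written by the WASM code.
/// Returns `None` for a discriminant outside the documented range,
/// which the host would presumably treat as a bug (an assumption).
pub fn decode_error_byte(code: u8) -> Option<Result<(), SketchEvalError>> {
    match code {
        0 => Some(Ok(())),
        1 => Some(Err(SketchEvalError::NumericFieldOverflow)),
        2 => Some(Err(SketchEvalError::DivisionByZero)),
        3 => Some(Err(SketchEvalError::Int64OutOfRange)),
        _ => None,
    }
}
```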
Add milestone 3 of the WASM expression compiler: comparison operators, Bool as a WASM output type, and compiled predicate evaluation.

Type inference (`infer_type`) determines expression output types from input column types, enabling generic comparison operators (Eq, NotEq, Lt, Lte, Gt, Gte) to be compiled when both operands are Int64. WASM comparisons return i32, widened to i64 via `i64.extend_i32_u` to maintain the stack convention. Bool values use i64 encoding (0=false, 1=true).

`is_compilable` now accepts `input_types` to gate comparisons on operand types. Bool unary `Not` uses `i64.eqz` + extend. `Datum::True`/`False` compile to `i64.const 1`/`0`. Host-side decoding maps i64 results to `Datum::True`/`Datum::False` or `TypedColumn::Bool` based on the inferred output type.

`CompiledMfp::try_new` accepts `input_types` and compiles predicates alongside expressions. `evaluate_inner` uses compiled predicate sessions, falling back to the interpreter for non-compiled predicates. The compute call site passes `&[]` for now (no regressions).

Added 10 proptests covering all comparison operators, Not, nested comparisons with arithmetic, and comparisons with literals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
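The i64 encoding of Bool values can be modelled in plain Rust as below. This is only a sketch of the described stack convention; the function names are hypothetical, and treating any non-zero value as true in `decode_bool` is an assumption, since the commit only specifies 0 and 1.

```rust
/// Models a signed less-than comparison followed by `i64.extend_i32_u`:
/// the comparison yields an i32 (0 or 1), which is zero-extended to i64.
pub fn lt_i64(a: i64, b: i64) -> i64 {
    let cmp_i32 = (a < b) as i32;
    cmp_i32 as u32 as i64
}

/// Models Bool `Not` as "equals zero" followed by the same widening.
pub fn not_bool(x: i64) -> i64 {
    ((x == 0) as i32) as u32 as i64
}

/// Host-side decoding of the i64 encoding (0 = false, 1 = true).
/// Assumption: any non-zero value is treated as true.
pub fn decode_bool(x: i64) -> bool {
    x != 0
}
```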
…inference

Add WASM-compiled support for `VariadicFunc::And`/`Or` with three-valued SQL null logic (FALSE dominates AND, TRUE dominates OR), `UnaryFunc::IsNull`/`IsTrue`/`IsFalse` as null-consuming operations that never produce null output, and `Datum::True`/`False` input handling.

Add `infer_input_types_from_mfp()` to derive column types from expression usage patterns (e.g., AddInt64 implies Int64 operands), replacing the hardcoded `&[]` at the compute call site so comparisons and predicates can now compile to WASM at runtime.

The And/Or codegen saves and restores tracking locals on the WASM stack around each child evaluation to support arbitrary nesting depth. Proptests cover all new operations, including nested And-of-Or and Or-of-And combinations, plus deterministic edge-case tests for null dominance semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
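The three-valued logic described here can be written down compactly using `Option<bool>`, with `None` standing for SQL NULL. This is a reference sketch of the semantics only, not the emitted WASM; `and3`, `or3`, and `is_null` are illustrative names.

```rust
/// Variadic SQL AND: FALSE dominates, otherwise NULL if any input is NULL,
/// otherwise TRUE.
pub fn and3(inputs: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for v in inputs {
        match v {
            Some(false) => return Some(false),
            None => saw_null = true,
            Some(true) => {}
        }
    }
    if saw_null { None } else { Some(true) }
}

/// Variadic SQL OR: TRUE dominates, otherwise NULL if any input is NULL,
/// otherwise FALSE.
pub fn or3(inputs: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for v in inputs {
        match v {
            Some(true) => return Some(true),
            None => saw_null = true,
            Some(false) => {}
        }
    }
    if saw_null { None } else { Some(false) }
}

/// Null-consuming test: always returns a non-null Bool.
pub fn is_null(v: Option<bool>) -> bool {
    v.is_none()
}
```

For example, `and3(&[None, Some(false)])` is `Some(false)` because FALSE dominates, while `and3(&[None, Some(true)])` stays `None`.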
…uation

Add an `mfp_eval` benchmark with 7 scenarios (arithmetic, predicates, boolean logic, null filtering) across 3 batch sizes, comparing per-row throughput of `CompiledMfp` against `SafeMfpPlan`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `eval_batch()`, which processes N rows in a single WASM call instead of one call per row. The method reuses the existing WASM instance, grows memory automatically when the batch exceeds capacity, and writes input data in a column-major layout matching the generated WASM function's expectations.

Benchmarks show an 8-17x speedup over interpreted evaluation and 15-40x over per-row compiled evaluation, confirming that WASM call overhead was the bottleneck in the per-row path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
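To make "column-major layout" concrete, here is a small sketch of packing a batch of rows so that each column's values are contiguous. The exact memory layout used by `eval_batch()` (offsets, null bitmaps, alignment) is not given in this commit, so `to_column_major` and its flat `Vec<i64>` buffer are assumptions for illustration only.

```rust
/// Packs `rows` (each with `num_cols` Int64 values) into one flat buffer
/// in column-major order: all of column 0, then all of column 1, and so on.
/// The value for row `r`, column `c` lands at index `c * rows.len() + r`.
pub fn to_column_major(rows: &[Vec<i64>], num_cols: usize) -> Vec<i64> {
    let n = rows.len();
    let mut buf = vec![0i64; n * num_cols];
    for (r, row) in rows.iter().enumerate() {
        assert_eq!(row.len(), num_cols, "every row must have num_cols values");
        for c in 0..num_cols {
            buf[c * n + r] = row[c];
        }
    }
    buf
}
```

With this layout a single call can walk each input column sequentially for all N rows, which fits the commit's observation that per-call overhead dominated the per-row path.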
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce the `mz-expr-compiler` crate, which compiles a subset of `MirScalarExpr` trees to WebAssembly modules that operate on columnar data. This is milestone 1: an end-to-end proof compiling `a + b` for Int64 columns.

The crate has four modules:

* `analyze`: determines whether an expression tree is compilable (milestone 1: Column, Literal(Int64), CallBinary(AddInt64))
* `codegen`: emits a WASM module via `wasm-encoder` with a row loop that reads columnar Int64 inputs, performs `i64.add`, and writes columnar output with null propagation
* `columnar`: columnar buffer types (`ColumnBatch`, `TypedColumn`, `ResultColumn`) with row-to-column and column-to-row conversions
* `engine`: wraps `wasmtime` to compile and instantiate the generated WASM, exposing `ExprEngine::compile()` and `CompiledExpr::evaluate()`

The compute layer integration adds an `ENABLE_COMPILED_EXPRESSIONS` dyncfg (default: false) and a `try_compile_mfp_expressions` hook in `as_collection_core()` that attempts WASM compilation when the flag is enabled. Actual batch evaluation replacing the interpreter is a follow-up.

Known limitation: the generated WASM uses wrapping `i64.add` without overflow detection. The interpreter uses `checked_add` and returns `NumericFieldOverflow`. Overflow detection will be added in milestone 2.

Tests added: 19 tests in `mz-expr-compiler`, covering analysis unit tests, codegen validation, columnar round-trip tests, engine integration tests, and proptest differential tests comparing compiled vs interpreted on random Int64 data.

🤖 Generated with Claude Code