Make ScalarFn array validity lazy when the function defines a validity expression #8336

Open
joseph-isaacs wants to merge 2 commits into claude/cool-bardeen-l8jlsy-3-definitely-all-invalid from claude/cool-bardeen-l8jlsy-4-lazy-scalarfn-validity

@joseph-isaacs

Contributor

Summary

PR 4 of a 4-PR stack (stacked on #8335) — the payoff: lazy validity for ScalarFn arrays.

Previously ValidityVTable<ScalarFn> always eagerly executed the validity expression via the legacy session. Now, when the scalar function provides a validity expression over its inputs, the expression is converted into a lazy ScalarFn array DAG instead:

  • Literal nodes → ConstantArray
  • ArrayExpr leaves → unwrap to the child array they hold
  • interior nodes → lazy ScalarFn arrays via Array::<ScalarFn>::try_new
  • constant results are folded back into AllValid/AllInvalid via child_to_validity
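The conversion rules above can be sketched with a toy model. This is only an illustration under stated assumptions: `Expr`, `LazyArray`, `Validity`, and `to_lazy` are hypothetical stand-ins written for this sketch, not the Vortex types, and `child_to_validity` here merely borrows the name used in this PR without reproducing its real signature.

```rust
// Toy model only: these types are hypothetical stand-ins, not the Vortex API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(bool),                 // a constant validity value
    ArrayLeaf(Vec<bool>),          // a leaf that already holds a child array
    Call(&'static str, Vec<Expr>), // an interior node: function name + inputs
}

#[derive(Debug, Clone, PartialEq)]
enum LazyArray {
    Constant(bool),
    Child(Vec<bool>),
    ScalarFn(&'static str, Vec<LazyArray>), // not executed, only described
}

#[derive(Debug, Clone, PartialEq)]
enum Validity {
    AllValid,
    AllInvalid,
    Array(LazyArray),
}

/// Convert an expression tree into a lazy array DAG without executing it.
fn to_lazy(expr: &Expr) -> LazyArray {
    match expr {
        // Literal nodes become constant arrays.
        Expr::Literal(value) => LazyArray::Constant(*value),
        // Leaves unwrap to the child array they hold.
        Expr::ArrayLeaf(values) => LazyArray::Child(values.clone()),
        // Interior nodes become lazy function arrays over converted inputs.
        Expr::Call(name, args) => {
            LazyArray::ScalarFn(*name, args.iter().map(to_lazy).collect())
        }
    }
}

/// Fold constant results back into the cheap validity variants.
fn child_to_validity(array: LazyArray) -> Validity {
    match array {
        LazyArray::Constant(true) => Validity::AllValid,
        LazyArray::Constant(false) => Validity::AllInvalid,
        other => Validity::Array(other),
    }
}
```

In this sketch nothing is computed during conversion; only a constant root is collapsed, which mirrors the folding step in the last bullet.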

Why the eager path remains for some functions

Functions that don't define a validity expression (for example Kleene and/or, where validity depends on the computed values) keep the eager path. The erased fallback for these is is_not_null(expr), which is self-referential: lazily materializing it means resolving the validity of the inner node, which spawns another is_not_null DAG over a fresh copy of the same node, recursing without ever shrinking. This manifested as a stack overflow in element-wise execution paths (execute_scalar → is_invalid → validity()), caught by test_bool_consistency. The new ScalarFnRef::validity_opt exposes whether a function defines its own validity expression so the vtable can pick the right path.
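A minimal sketch of the path selection and of why the fallback never shrinks, again using a toy model. Everything here is hypothetical: `Expr`, `Path`, `pick_path`, and `fallback_expansions` exist only for this illustration, the free functions `validity_opt` and `validity` only imitate the shape of the methods named in this PR, and the rule that "not" is valid wherever its input is valid is an assumption made for the example.

```rust
// Toy model only: hypothetical stand-ins, not the Vortex API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Input(&'static str),
    IsNotNull(Box<Expr>),
    Call(&'static str, Vec<Expr>),
}

/// Which strategy the vtable would pick for a node in this sketch.
#[derive(Debug, PartialEq)]
enum Path {
    Lazy(Expr),
    Eager,
}

/// Some(expr) only when validity is defined purely over the inputs.
fn validity_opt(expr: &Expr) -> Option<Expr> {
    match expr {
        // Assumption for illustration: "not" is valid where its input is valid.
        Expr::Call(name, args) if *name == "not" && args.len() == 1 => {
            Some(Expr::IsNotNull(Box::new(args[0].clone())))
        }
        _ => None,
    }
}

/// Erased fallback: wraps the same node, so the result is never smaller.
fn validity(expr: &Expr) -> Expr {
    validity_opt(expr).unwrap_or_else(|| Expr::IsNotNull(Box::new(expr.clone())))
}

/// Lazy when a validity expression exists, eager otherwise.
fn pick_path(expr: &Expr) -> Path {
    match validity_opt(expr) {
        Some(validity_expr) => Path::Lazy(validity_expr),
        None => Path::Eager,
    }
}

/// Count how often resolving validity lands on the very same node again,
/// capped at `limit` because the uncapped recursion would not terminate.
fn fallback_expansions(node: &Expr, limit: usize) -> usize {
    let mut steps = 0;
    let mut current = node.clone();
    while steps < limit {
        match validity(&current) {
            // The validity expression still wraps the same node: no progress.
            Expr::IsNotNull(inner) if *inner == current => {
                steps += 1;
                current = *inner;
            }
            _ => break,
        }
    }
    steps
}
```

For a Kleene-style node the loop always runs to the cap, which is the toy analogue of the stack overflow; for a node with its own validity expression it stops immediately because the expression refers only to the inputs.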

Checks

  • cargo nextest run -p vortex-array (2962 passed)
  • cargo nextest run --workspace --exclude vortex-cuda --exclude vortex-nvcomp --exclude vortex-tensor --exclude vortex-duckdb (passed)
  • cargo nextest run -p vortex-duckdb (196 passed; 2 network-dependent tests excluded)
  • cargo clippy -p vortex-array --all-targets, cargo +nightly fmt --all

https://claude.ai/code/session_01VPQ7dfZtijfrsjAipwXvEj


Generated by Claude Code

…y expression

Previously ValidityVTable<ScalarFn> always eagerly executed the
validity expression via the legacy session. Now, when the scalar
function provides a validity expression over its inputs, the expression
is converted into a lazy ScalarFn array DAG instead: Literal nodes
become ConstantArrays, ArrayExpr leaves unwrap to the child arrays they
hold, and interior nodes become lazy ScalarFn arrays. Constant results
are folded back into AllValid/AllInvalid via child_to_validity.

Functions that do not define a validity expression (e.g. Kleene logic
and/or, where validity depends on the computed values) keep the eager
path. The erased fallback for these is is_not_null over the expression
itself, so a lazy representation would be self-referential: resolving
the validity of the inner node spawns another is_not_null DAG, which
recurses without ever shrinking (this manifested as a stack overflow in
element-wise execution paths). ScalarFnRef::validity_opt is added to
expose whether a function defines its own validity expression.

https://claude.ai/code/session_01VPQ7dfZtijfrsjAipwXvEj
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
joseph-isaacs added the changelog/feature (A new feature) label on Jun 10, 2026, with Claude
…to claude/cool-bardeen-l8jlsy-4-lazy-scalarfn-validity
@codspeed-hq

codspeed-hq Bot commented Jun 10, 2026


Merging this PR will degrade performance by 17.27%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 5 regressed benchmarks
✅ 1521 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | varbinview_zip_block_mask | 2.9 ms | 3.7 ms | -21.57% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 216.9 ns | 275.3 ns | -21.19% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 278.6 ns | 336.9 ns | -17.31% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 342.2 ns | 400.6 ns | -14.56% |
| Simulation | varbinview_zip_fragmented_mask | 6.2 ms | 6.9 ms | -11.27% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/cool-bardeen-l8jlsy-4-lazy-scalarfn-validity (d72800b) with claude/cool-bardeen-l8jlsy-3-definitely-all-invalid (365401e) [1]

Footnotes

  1. No successful run was found on claude/cool-bardeen-l8jlsy-3-definitely-all-invalid (97b60ac) during the generation of this report, so ff6018d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Contributor Author

Investigated the CodSpeed varbinview_zip_* regressions — they are a codegen-layout artifact, not a real cost of this change:

  1. The changed code never runs in these benchmarks. Instrumenting ScalarFn::validity with a counter shows 0 calls during varbinview_zip_block_mask/varbinview_zip_fragmented_mask (vs ~47k calls in the bool-consistency tests, confirming the instrumentation works). The benches go straight through the dedicated ZipKernel for VarBinView and never query a ScalarFn array's validity.

  2. Bisecting the two files in this PR reproduces the full ~19% delta locally only when arrays/scalar_fn/vtable/validity.rs changes — the file whose code is never executed here. The scalar_fn/erased.rs change alone shows no delta.

  3. codegen-units=1 erases the regression entirely (branch 3: 505µs median vs branch 3 + this PR's files: 494µs). With default codegen units, adding code to that module shifts LLVM's CGU partitioning and changes inlining in the unrelated hot zip loop, which CodSpeed's instruction-counting simulation faithfully reports.
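The call-counting check in point 1 can be sketched as follows. This is a hedged illustration rather than the instrumentation actually used: `VALIDITY_CALLS`, `instrumented_validity`, and `calls_during` are hypothetical names, and the function body is a stand-in for the real validity code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter; the name and placement are assumptions for illustration.
static VALIDITY_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the instrumented function: bump the counter, then do the work.
fn instrumented_validity(values: &[Option<i32>]) -> Vec<bool> {
    VALIDITY_CALLS.fetch_add(1, Ordering::Relaxed);
    values.iter().map(Option::is_some).collect()
}

/// Run a workload and report how many times the instrumented function was hit.
fn calls_during(workload: impl FnOnce()) -> usize {
    let before = VALIDITY_CALLS.load(Ordering::Relaxed);
    workload();
    VALIDITY_CALLS.load(Ordering::Relaxed) - before
}
```

A workload that never reaches the instrumented function reports zero, while a workload that does reach it reports a nonzero count; running both is what confirms the counter itself works.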

The same flavor of noise shows in the rest of the stack: the bitwise_not_vortex_buffer_mut and chunked_*_canonical_into benchmarks flip between ±20–47% on adjacent PRs that don't touch vortex-buffer or the chunked builders at all.

I can't acknowledge regressions on the CodSpeed dashboard from this session — that needs someone with dashboard access.

https://claude.ai/code/session_01VPQ7dfZtijfrsjAipwXvEj


Generated by Claude Code

      /// Transforms the expression into one representing the validity of this expression.
      pub fn validity(&self, expr: &Expression) -> VortexResult<Expression> {
    -     Ok(self.0.validity(expr)?.unwrap_or_else(|| {
    +     Ok(self.validity_opt(expr)?.unwrap_or_else(|| {

Contributor

do you want to remove the TODO?
