feat[gpu]: sliced validity in Arrow device export by 0ax1 · Pull Request #8318 · vortex-data/vortex

0ax1 · 2026-06-09T15:27:37Z

No description provided.

codspeed-hq · 2026-06-09T15:31:18Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
❌ 3 regressed benchmarks
✅ 1521 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

|  | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 216.9 ns | 275.3 ns | -21.19% |
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[1024]` | 278.6 ns | 336.9 ns | -17.31% |
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[2048]` | 342.2 ns | 400.6 ns | -14.56% |
| ⚡ | Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 46.8 µs | 31.9 µs | +46.83% |
| ⚡ | Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 198.2 µs | 162 µs | +22.31% |
| ⚡ | Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 213.4 µs | 177.2 µs | +20.41% |

Tip

Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or use the CodSpeed MCP directly with your agent.

Comparing `ad/sliced-varbinview-e2e` (f8d19e7) with `develop` (031fb76)

Add cuDF e2e coverage for sliced and multi-buffer Utf8View arrays, including non-ASCII values and sliced null validity. Keep bit-offset validity repacking on the CUDA stream for Arrow Device export, with focused tests and a CUDA benchmark for the repack path.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

Rebuild the validity bitmap 64 bits at a time with a funnel shift over two adjacent input words, masking the leading offset bits and the trailing length bits, instead of testing bits one by one. Launch one word per thread with a grid-stride loop so warp accesses coalesce. Repack of 100M bits on GH200 drops from 140us to 21us (6.7x).

Also derive the output size from len + arrow_offset instead of taking a redundant output_bytes parameter, drop the now-unneeded output memset (every word is written, edge masks zero the padding), bound the host-to-device copy to the slice's bytes via shrink_offset, and cover negative-shift and multi-word offsets in the repack tests.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
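The word-at-a-time repack described in this commit message can be sketched on the CPU as follows. This is a minimal Rust reference under stated assumptions, not the kernel from `arrow_validity.cu`: the names `repack_validity`, `funnel_shift_right`, and `word_at`, the `src_offset` parameter, and the LSB-first bit order (the Arrow convention) are illustrative choices. Only `len` and `arrow_offset` are names taken from the commit message. Out-of-range input words are treated as zero here; a real kernel would have to guard those reads itself.

```rust
/// Read input word `idx`, treating out-of-range words as zero (assumption).
fn word_at(words: &[u64], idx: i64) -> u64 {
    if idx < 0 {
        0
    } else {
        words.get(idx as usize).copied().unwrap_or(0)
    }
}

/// Low 64 bits of the 128-bit value `hi:lo` shifted right by `shift` (0..64).
fn funnel_shift_right(lo: u64, hi: u64, shift: u32) -> u64 {
    if shift == 0 {
        lo
    } else {
        (lo >> shift) | (hi << (64 - shift))
    }
}

/// Copy `len` bits starting at bit `src_offset` of `input` so that they start
/// at bit `arrow_offset` of the output. All other output bits are zero.
fn repack_validity(input: &[u64], src_offset: usize, len: usize, arrow_offset: usize) -> Vec<u64> {
    let end = arrow_offset + len;
    // Output size is derived from len + arrow_offset, rounded up to whole words.
    let n_words = (end + 63) / 64;
    // Signed distance from an output bit position to its input bit position.
    // Negative when the output offset exceeds the input offset.
    let delta = src_offset as i64 - arrow_offset as i64;

    (0..n_words)
        .map(|w| {
            let word_start = w * 64;
            let src_bit = word_start as i64 + delta;
            let idx = src_bit.div_euclid(64);
            let shift = src_bit.rem_euclid(64) as u32;
            let mut word = funnel_shift_right(word_at(input, idx), word_at(input, idx + 1), shift);

            // Leading mask: clear bits that lie before `arrow_offset`.
            if word_start < arrow_offset {
                let skip = (arrow_offset - word_start).min(64);
                word &= if skip == 64 { 0 } else { u64::MAX << skip };
            }
            // Trailing mask: clear bits at or past `arrow_offset + len`.
            if word_start + 64 > end {
                let keep = end.saturating_sub(word_start); // always < 64 here
                word &= if keep == 0 { 0 } else { u64::MAX >> (64 - keep) };
            }
            word
        })
        .collect()
}
```

Each output word depends on only two adjacent input words, which is what allows one word per thread on the GPU; the sequential `map` stands in for that loop. Because the edge masks zero the padding and every word is written, the output needs no prior zeroing.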

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 · 2026-06-10T14:11:25Z

GH200:

┌───────────────────────────────────────────────────────────────┬───────────────┬────────────┬─────────────┬─────────────┐
│                        Kernel variant                         │ Time (median) │ Throughput │ Step change │ vs baseline │
├───────────────────────────────────────────────────────────────┼───────────────┼────────────┼─────────────┼─────────────┤
│ Bit-by-bit, byte writes (PR baseline)                         │      140.3 µs │   178 GB/s │           — │        1.0× │
├───────────────────────────────────────────────────────────────┼───────────────┼────────────┼─────────────┼─────────────┤
│ u64 funnel-shift words, blocked ranges (start_elem/stop_elem) │       39.0 µs │   641 GB/s │      −72.2% │        3.6× │
├───────────────────────────────────────────────────────────────┼───────────────┼────────────┼─────────────┼─────────────┤
│ + grid-stride loop (coalesced warp accesses)                  │       26.9 µs │   929 GB/s │      −31.2% │        5.2× │
├───────────────────────────────────────────────────────────────┼───────────────┼────────────┼─────────────┼─────────────┤
│ + 256 threads/block, one word per thread                      │       21.0 µs │  1.19 TB/s │      −21.8% │        6.7× │
└───────────────────────────────────────────────────────────────┴───────────────┴────────────┴─────────────┴─────────────┘
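The throughput column appears consistent with counting every bitmap byte once as read and once as written for the 100M-bit repack quoted in the commit message. That accounting is an assumption, not something the comment states, and the function names below are hypothetical.

```rust
/// Effective throughput in GB/s (1 GB = 1e9 bytes), assuming each bitmap
/// byte is read once and written once.
fn throughput_gb_per_s(bits: u64, micros: f64) -> f64 {
    let bytes_moved = 2.0 * (bits as f64) / 8.0;
    bytes_moved / (micros * 1e-6) / 1e9
}

/// Speedup of a variant taking `micros` relative to `baseline_micros`.
fn speedup(baseline_micros: f64, micros: f64) -> f64 {
    baseline_micros / micros
}
```

Under that assumption, 100,000,000 bits in 140.3 µs works out to about 178 GB/s and 21.0 µs to about 1.19 TB/s, matching the first and last rows, and `speedup(140.3, 21.0)` is about 6.7.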

0ax1 force-pushed the ad/sliced-varbinview-e2e branch from cdee93f to da7278c Compare June 9, 2026 15:29

0ax1 force-pushed the ad/sliced-varbinview-e2e branch from da7278c to 795fe55 Compare June 9, 2026 15:31

0ax1 changed the title ~~test[gpu]: cover sliced utf8 Arrow device export~~ test[gpu]: sliced utf8 Arrow device export Jun 9, 2026

0ax1 changed the title ~~test[gpu]: sliced utf8 Arrow device export~~ feat[gpu]: cover sliced validity in Arrow device export Jun 9, 2026

0ax1 added the changelog/feature A new feature label Jun 9, 2026

0ax1 changed the title ~~feat[gpu]: cover sliced validity in Arrow device export~~ feat[gpu]: sliced validity in Arrow device export Jun 9, 2026

0ax1 marked this pull request as ready for review June 9, 2026 15:34

0ax1 requested a review from a team June 9, 2026 15:34

0ax1 force-pushed the ad/sliced-varbinview-e2e branch from 795fe55 to 6ce4bd7 Compare June 9, 2026 15:39

0ax1 enabled auto-merge (squash) June 9, 2026 15:41

0ax1 requested review from onursatici and robert3005 June 9, 2026 15:41

robert3005 reviewed Jun 9, 2026


Comment thread on `vortex-cuda/kernels/src/arrow_validity.cu` (outdated)

0ax1 and others added 2 commits June 10, 2026 14:08

style[gpu]: clang-format arrow_validity kernel

f8d19e7

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

0ax1 requested a review from robert3005 June 10, 2026 14:13

robert3005 approved these changes Jun 10, 2026


0ax1 merged commit f46621d into develop Jun 10, 2026
78 of 81 checks passed

0ax1 deleted the ad/sliced-varbinview-e2e branch June 10, 2026 14:30

0ax1 merged 3 commits into `develop` from `ad/sliced-varbinview-e2e`
