Add cross-segment Select with result_offset for wide table scans by krleonid · Pull Request #54 · krleonid/duckdb

krleonid · 2026-05-21T10:38:25Z

Summary

Adds a result_offset parameter to the compression_select_t API, enabling Select to write at arbitrary positions in the result vector.
ColumnData::SelectVector now handles cross-segment selection by splitting the selection vector across segment boundaries.
StandardColumnData::Select no longer falls back to a full Scan+Slice when a vector spans multiple segments.
StructColumnData::Select propagates the selection to child columns and short-circuits for constant-NULL structs.
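
To illustrate the struct case, here is a minimal sketch of propagating a selection to child columns with an early exit for an all-NULL struct. ToyStructColumn, ToyStructResult, SelectStruct and the all_null flag are hypothetical stand-ins invented for this example; they are not DuckDB's StructColumnData types or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical struct column: one decoded value array per child field.
struct ToyStructColumn {
    bool all_null;                              // stand-in for a "constant NULL" struct
    std::vector<std::vector<int64_t>> children; // child field values, indexed by row
};

struct ToyStructResult {
    std::vector<bool> valid;                    // validity of each selected struct row
    std::vector<std::vector<int64_t>> children; // selected values per child field
    std::size_t child_rows_read = 0;            // counts how many child rows were touched
};

// Read only the rows named in sel from every child. If the whole struct is
// NULL, mark every output row invalid and return without touching children.
ToyStructResult SelectStruct(const ToyStructColumn &column, const std::vector<std::size_t> &sel) {
    ToyStructResult result;
    if (column.all_null) {
        result.valid.assign(sel.size(), false);
        return result;
    }
    result.valid.assign(sel.size(), true);
    for (const auto &child : column.children) {
        std::vector<int64_t> out;
        out.reserve(sel.size());
        for (std::size_t row : sel) {
            assert(row < child.size());
            out.push_back(child[row]);
            result.child_rows_read++;
        }
        result.children.push_back(std::move(out));
    }
    return result;
}
```

In this toy model the work per child is proportional to the number of selected rows rather than the vector size, and the all-NULL path reads no child rows at all.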

Problem

With a 16KB block size, VARCHAR segments hold ~1279 rows, so a 2048-row vector spans ~2 segments. Previously, StandardColumnData::Select checked scan_entire_vector, which was always false for cross-segment vectors. That forced a fallback to Scan (all 2048 rows) + Slice (keep ~34), wasting about 98% of the decode work.
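
The arithmetic behind those figures can be checked with a small sketch. SegmentsSpanned and WastedFraction are hypothetical helper names used only here; the inputs (1279 rows per segment, 2048-row vectors, ~34 selected rows) come from the description above, and the model assumes every segment holds exactly the same number of rows.

```cpp
#include <cassert>
#include <cstddef>

// Number of fixed-size segments touched by rows [first_row, first_row + count).
constexpr std::size_t SegmentsSpanned(std::size_t first_row, std::size_t count,
                                      std::size_t rows_per_segment) {
    return count == 0 ? 0
                      : (first_row + count - 1) / rows_per_segment - first_row / rows_per_segment + 1;
}

// Fraction of decoded rows that are thrown away by Scan + Slice.
constexpr double WastedFraction(std::size_t scanned, std::size_t kept) {
    return scanned == 0 ? 0.0 : 1.0 - static_cast<double>(kept) / static_cast<double>(scanned);
}

// The first 2048-row vector crosses one segment boundary.
static_assert(SegmentsSpanned(0, 2048, 1279) == 2, "first vector spans two segments");
// Under this model, a later vector can touch three segments.
static_assert(SegmentsSpanned(2048, 2048, 1279) == 3, "second vector spans three segments");
```

Keeping 34 of 2048 decoded rows gives a wasted fraction of 1 - 34/2048, a little over 0.98, which matches the 98% figure.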

Solution

SelectVector now iterates over segments, calling segment.Select() with the appropriate result_offset for each segment's portion of selected rows. All compression Select implementations (RLE, uncompressed string, FSST, dict_fsst, validity, numeric constant, empty validity) accept the new offset parameter.
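
The splitting step can be sketched roughly as follows. This is a simplified model, not the code in this PR: ToySegment and SelectAcrossSegments are hypothetical names, segments hold plain decoded values instead of compressed data, and the selection is assumed to be sorted ascending over contiguous, ordered segments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a storage segment: a contiguous run of rows.
struct ToySegment {
    std::size_t start_row;       // absolute row index of the segment's first row
    std::vector<int64_t> values; // decoded values (real segments are compressed)

    // Copy the rows named by sel[sel_begin, sel_end) into result, starting at
    // result[result_offset]. All named rows must lie inside this segment.
    void Select(const std::vector<std::size_t> &sel, std::size_t sel_begin, std::size_t sel_end,
                std::vector<int64_t> &result, std::size_t result_offset) const {
        for (std::size_t i = sel_begin; i < sel_end; i++) {
            assert(sel[i] >= start_row && sel[i] - start_row < values.size());
            result[result_offset + (i - sel_begin)] = values[sel[i] - start_row];
        }
    }
};

// Walk the segments in order and hand each one only its slice of the
// selection. The slice's starting position doubles as the result offset.
std::vector<int64_t> SelectAcrossSegments(const std::vector<ToySegment> &segments,
                                          const std::vector<std::size_t> &sel) {
    std::vector<int64_t> result(sel.size());
    std::size_t sel_pos = 0;
    for (const auto &segment : segments) {
        const std::size_t segment_end = segment.start_row + segment.values.size();
        std::size_t sel_end = sel_pos;
        while (sel_end < sel.size() && sel[sel_end] < segment_end) {
            sel_end++;
        }
        if (sel_end > sel_pos) {
            segment.Select(sel, sel_pos, sel_end, result, sel_pos);
        }
        sel_pos = sel_end;
    }
    assert(sel_pos == sel.size()); // every selected row belonged to some segment
    return result;
}
```

Segments with no selected rows are skipped entirely, so the decode cost in this model tracks the number of selected rows rather than the vector size.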

Benchmarks

DB1: 559 columns, 135K rows, 16KB blocks, dynamic filter selects ~34 rows per 2048-row chunk:

Metric	Baseline	Optimized	Change
Scan	0.115-0.128s	0.103-0.123s	-15%
Latency	0.186s	0.171s	-10%

DB2: 982 columns, 21K rows, 16KB blocks:

Metric	Baseline	Optimized	Change
Scan	0.148s	0.132s	-11%
Latency	0.326s	0.259s	-21%

Test plan

test/sql/types/struct/* — all 44 tests pass
Correctness verified (count + sum of column lengths match)
Full CI suite

🤖 Generated with Claude Code

Two optimizations for wide tables with selective filters:

1. Cross-segment SelectVector: Previously, when a vector (2048 rows) spanned multiple storage segments (common with 16KB blocks where VARCHAR segments hold ~1279 rows), StandardColumnData::Select fell back to scanning all 2048 rows then slicing. Now SelectVector splits the selection across segments, calling segment.Select() on each with a result_offset, reading only the selected rows from each segment.

2. StructColumnData::Select: Propagates the selection vector to child columns instead of scanning all rows. Short-circuits immediately for constant-NULL structs.

The compression_select_t API gains a result_offset parameter, allowing Select to write at arbitrary positions in the result vector. All compression implementations updated: RLE, uncompressed string, FSST, dict_fsst, validity, numeric constant, and empty validity.

Benchmarks (559-col table, 135K rows, 16KB blocks, filter selects ~34/2048):
- Scan: 0.115-0.128s → 0.103-0.123s (15% faster)
- 982-col table: latency 0.326s → 0.259s (21% faster)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Several compression Select implementations ignored the result_offset parameter, writing results at index 0 instead of the correct offset when spanning multiple segments.

Also fix DictFSST dictionary mode to not mutate the result vector type (scan+slice made it non-flat, crashing subsequent segments), and guard StandardColumnData::Select against non-flat vectors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
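
The offset bug described in this commit can be illustrated with a toy run-length select. ToyRun and ToyRleSelect are hypothetical names for this sketch and do not reflect DuckDB's RLE implementation; the selection is assumed to hold ascending row offsets relative to the segment start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical run-length encoded data: `value` repeated `length` times.
struct ToyRun {
    int64_t value;
    std::size_t length;
};

// Write the selected rows into result, starting at result[result_offset].
void ToyRleSelect(const std::vector<ToyRun> &runs, const std::vector<std::size_t> &sel,
                  std::vector<int64_t> &result, std::size_t result_offset) {
    assert(result_offset + sel.size() <= result.size());
    std::size_t run_idx = 0;
    std::size_t run_start = 0; // row offset at which runs[run_idx] begins
    for (std::size_t i = 0; i < sel.size(); i++) {
        // Advance to the run that contains row sel[i].
        while (run_idx < runs.size() && sel[i] >= run_start + runs[run_idx].length) {
            run_start += runs[run_idx].length;
            run_idx++;
        }
        assert(run_idx < runs.size());
        // Writing to result[i] here would overwrite the rows that an earlier
        // segment already produced; the offset keeps them intact.
        result[result_offset + i] = runs[run_idx].value;
    }
}
```

With result_offset set to the number of rows already produced by earlier segments, each segment appends to the result instead of restarting at index 0.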

krleonid force-pushed the feature/cross-segment-select branch from c1220d1 to b2fbe7f on May 21, 2026 10:41

krleonid wants to merge 2 commits into main from feature/cross-segment-select
