
Add StructColumnData::Select to propagate selection to children #53

Open: krleonid wants to merge 1 commit into main from feature/struct-select-optimization


Summary

  • Implements StructColumnData::Select that propagates the selection vector directly to child columns instead of scanning all 2048 rows per vector and slicing
  • For all-NULL struct columns (constant validity), skips children immediately
  • Falls back to generic path for pushdown-extract mode
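The three behaviours listed above can be sketched with a toy model. This is not DuckDB code: `ToyStruct`, `ToyChild`, `ToyResult`, `SelVec`, and `MakeToyStruct` are hypothetical names, the generic path is modelled as "decode everything, then slice", and the order of the two early checks is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SelVec = std::vector<uint32_t>;  // indices of the selected rows

// Hypothetical stand-in for a child column; counts the rows it decodes.
struct ToyChild {
    std::vector<int32_t> values;
    std::size_t rows_decoded = 0;

    // Decode only the rows named by the selection.
    std::vector<int32_t> Select(const SelVec &sel) {
        std::vector<int32_t> out;
        out.reserve(sel.size());
        for (uint32_t idx : sel) {
            out.push_back(values[idx]);
            rows_decoded++;
        }
        return out;
    }

    // Decode every row, then slice down to the selection.
    std::vector<int32_t> ScanThenSlice(const SelVec &sel) {
        std::vector<int32_t> full(values.begin(), values.end());
        rows_decoded += full.size();
        std::vector<int32_t> out;
        out.reserve(sel.size());
        for (uint32_t idx : sel) {
            out.push_back(full[idx]);
        }
        return out;
    }
};

struct ToyResult {
    bool all_null = false;                           // every selected row is NULL
    std::vector<std::vector<int32_t>> child_values;  // one vector per child
};

struct ToyStruct {
    bool all_null = false;          // models constant (all-NULL) validity
    bool pushdown_extract = false;  // models the mode that keeps the generic path
    std::vector<ToyChild> children;

    ToyResult Select(const SelVec &sel) {
        ToyResult result;
        if (pushdown_extract) {
            // Fallback: scan all rows of every child, then slice.
            for (ToyChild &child : children) {
                result.child_values.push_back(child.ScanThenSlice(sel));
            }
            return result;
        }
        if (all_null) {
            // Constant validity: the children are never touched.
            result.all_null = true;
            return result;
        }
        // Propagate the selection so each child reads only the selected rows.
        for (ToyChild &child : children) {
            result.child_values.push_back(child.Select(sel));
        }
        return result;
    }
};

// Builds a struct whose child c holds the values i + 1000 * c for row i.
ToyStruct MakeToyStruct(std::size_t rows, std::size_t child_count) {
    ToyStruct s;
    s.children.resize(child_count);
    for (std::size_t c = 0; c < child_count; c++) {
        for (std::size_t i = 0; i < rows; i++) {
            s.children[c].values.push_back(static_cast<int32_t>(i + 1000 * c));
        }
    }
    return s;
}
```

With a 2048-row toy struct and a three-row selection, the propagated path decodes 3 rows per child, the fallback decodes 2048, and the all-NULL case decodes none.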

Motivation

On wide tables with many struct columns and selective filters (e.g., bloom filter selecting ~37 rows out of 2048), the previous code scanned all rows for every struct column then discarded 98% via slice. This change lets child columns use their compression-specific select functions to read only the needed rows.
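To illustrate what a compression-specific select function can save, here is a minimal sketch over a run-length-encoded column. The encoding is chosen purely for illustration (the description does not say which compression methods are involved), and `ToyRleColumn`, `Run`, and `MakeToyRleColumn` are hypothetical names, not DuckDB types. The sketch assumes the selection is sorted ascending and within range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of identical values.
struct Run {
    int32_t value;
    uint32_t length;
};

// Hypothetical run-length-encoded column; counts materialized values.
struct ToyRleColumn {
    std::vector<Run> runs;
    std::size_t values_materialized = 0;

    // Generic path: expand every run into a full vector, then slice.
    std::vector<int32_t> ScanThenSlice(const std::vector<uint32_t> &sel) {
        std::vector<int32_t> full;
        for (const Run &run : runs) {
            for (uint32_t i = 0; i < run.length; i++) {
                full.push_back(run.value);
                values_materialized++;
            }
        }
        std::vector<int32_t> out;
        out.reserve(sel.size());
        for (uint32_t idx : sel) {
            out.push_back(full[idx]);
        }
        return out;
    }

    // Select path: walk the runs once and emit only the selected rows.
    // Assumes `sel` is sorted ascending and every index is in range.
    std::vector<int32_t> Select(const std::vector<uint32_t> &sel) {
        std::vector<int32_t> out;
        out.reserve(sel.size());
        std::size_t run_idx = 0;
        uint32_t run_start = 0;  // row index where the current run begins
        for (uint32_t idx : sel) {
            // Skip whole runs that end before the selected row.
            while (idx >= run_start + runs[run_idx].length) {
                run_start += runs[run_idx].length;
                run_idx++;
            }
            out.push_back(runs[run_idx].value);
            values_materialized++;
        }
        return out;
    }
};

// Builds a 2048-row column from three runs: 1000 x 7, 1000 x 9, 48 x 11.
ToyRleColumn MakeToyRleColumn() {
    ToyRleColumn col;
    col.runs = {{7, 1000}, {9, 1000}, {11, 48}};
    return col;
}
```

Both paths return the same values; the difference is that the select path materializes one value per selected row, while the generic path expands all 2048 rows before discarding most of them.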

Benchmarks (559-column table, 135K rows, filter selects ~2400 rows)

| Metric | Baseline | Optimized | Change |
|---|---|---|---|
| SEQ_SCAN (release+ASan, hot) | 382ms | 324ms | -15% |
| CPU time | 725ms | 621ms | -14% |
| Query latency | 979ms | 865ms | -12% |

On reldebug (O2, single-threaded): 6.3s → 1.9s (-70%)

Test plan

  • test/sql/types/struct/* — all 44 tests pass (3575 assertions)
  • test/sql/types/* — all 418 tests pass
  • test/sql/storage/compression/* — all 168 tests pass
  • Full CI suite

🤖 Generated with Claude Code

Previously, struct columns had no Select override and fell back to
ColumnData::Select which scans all 2048 rows per vector then slices
to the selected rows. For wide tables with many struct columns and
selective filters, this caused 98% wasted work.

The new Select implementation:
- Scans struct validity (detects all-NULL instantly)
- Propagates Select to child columns, which use compression-specific
  select functions to read only selected rows
- Falls back to generic path for pushdown-extract mode

On a 559-column table with dynamic bloom filter selecting ~37 rows
per 2048-row chunk, this reduces scan time by 12-15% on release
builds and up to 70% on non-LTO builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>