Add StructColumnData::Select to propagate selection to children #53
Open
krleonid wants to merge 1 commit into
Previously, struct columns had no Select override and fell back to ColumnData::Select, which scans all 2048 rows per vector and then slices to the selected rows. For wide tables with many struct columns and selective filters, this wasted about 98% of the scan work.

The new Select implementation:
- Scans struct validity (detects all-NULL instantly)
- Propagates Select to child columns, which use compression-specific select functions to read only selected rows
- Falls back to the generic path for pushdown-extract mode

On a 559-column table with a dynamic bloom filter selecting ~37 rows per 2048-row chunk, this reduces scan time by 12-15% on release builds and up to 70% on non-LTO builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
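The difference between the two paths can be illustrated with a toy model. This is only a sketch: every type and function name below (ToyChildColumn, ToyStructColumn, ScanThenSlice, MakeToyStruct, TotalRowsDecoded) is hypothetical and does not mirror DuckDB's real classes or signatures. It also assumes that an all-NULL struct lets the children be skipped entirely, which the commit message implies but does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only. These types are hypothetical and do not mirror DuckDB's
// real classes, signatures, or storage layout.
constexpr std::size_t kVectorSize = 2048;

struct ToyChildColumn {
    std::vector<int64_t> stored;   // stands in for one compressed vector
    std::size_t rows_decoded = 0;  // how many rows this column materialized

    // Generic path: decode every row, then slice down to the selection.
    std::vector<int64_t> ScanThenSlice(const std::vector<std::size_t> &sel) {
        std::vector<int64_t> all(stored.begin(), stored.end());
        rows_decoded += stored.size();
        std::vector<int64_t> out;
        out.reserve(sel.size());
        for (std::size_t idx : sel) {
            assert(idx < all.size());
            out.push_back(all[idx]);
        }
        return out;
    }

    // Select path: decode only the rows named by the selection.
    std::vector<int64_t> Select(const std::vector<std::size_t> &sel) {
        std::vector<int64_t> out;
        out.reserve(sel.size());
        for (std::size_t idx : sel) {
            assert(idx < stored.size());
            out.push_back(stored[idx]);
            ++rows_decoded;
        }
        return out;
    }
};

struct ToyStructResult {
    std::vector<bool> valid;                         // struct validity per selected row
    std::vector<std::vector<int64_t>> child_values;  // one vector per child; empty if skipped
};

struct ToyStructColumn {
    std::vector<bool> validity;  // struct-level validity, one entry per row
    std::vector<ToyChildColumn> children;

    ToyStructResult Select(const std::vector<std::size_t> &sel, bool pushdown_extract) {
        ToyStructResult result;
        bool any_valid = false;
        for (std::size_t idx : sel) {
            assert(idx < validity.size());
            const bool is_valid = validity[idx];
            result.valid.push_back(is_valid);
            any_valid = any_valid || is_valid;
        }
        // Assumption: if every selected struct is NULL, children are not read.
        if (!any_valid) {
            return result;
        }
        for (ToyChildColumn &child : children) {
            // Pushdown-extract mode keeps the generic scan-then-slice path.
            result.child_values.push_back(pushdown_extract ? child.ScanThenSlice(sel)
                                                           : child.Select(sel));
        }
        return result;
    }
};

// Builds a fully valid struct column; child c, row r holds c * 10000 + r.
ToyStructColumn MakeToyStruct(std::size_t num_children) {
    ToyStructColumn col;
    col.validity.assign(kVectorSize, true);
    col.children.resize(num_children);
    for (std::size_t c = 0; c < num_children; ++c) {
        col.children[c].stored.resize(kVectorSize);
        for (std::size_t r = 0; r < kVectorSize; ++r) {
            col.children[c].stored[r] = static_cast<int64_t>(c * 10000 + r);
        }
    }
    return col;
}

// Sums the decode counters of all children.
std::size_t TotalRowsDecoded(const ToyStructColumn &col) {
    std::size_t total = 0;
    for (const ToyChildColumn &child : col.children) {
        total += child.rows_decoded;
    }
    return total;
}
```

With three children and a three-row selection, the select path in this model touches 9 rows in total, while the generic path touches 3 × 2048. The real savings depend on how cheaply each compression scheme can decode individual rows, which the toy model does not capture.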
Summary
- Add StructColumnData::Select that propagates the selection vector directly to child columns instead of scanning all 2048 rows per vector and slicing

Motivation
On wide tables with many struct columns and selective filters (e.g., bloom filter selecting ~37 rows out of 2048), the previous code scanned all rows for every struct column then discarded 98% via slice. This change lets child columns use their compression-specific select functions to read only the needed rows.

Benchmarks (559-column table, 135K rows, filter selects ~2400 rows)
On reldebug (O2, single-threaded): 6.3s → 1.9s (-70%)
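
The percentages quoted in this description can be checked with a few lines of arithmetic. The helper names below (DiscardedFraction, RelativeReduction) are made up for illustration; the inputs are the figures stated above (~37 selected rows per 2048-row chunk, 6.3s before, 1.9s after).

```cpp
#include <cassert>

// Fraction of rows in a chunk that are scanned and then thrown away
// when only `selected_rows` out of `chunk_rows` survive the filter.
double DiscardedFraction(double selected_rows, double chunk_rows) {
    assert(chunk_rows > 0);
    return 1.0 - selected_rows / chunk_rows;
}

// Relative runtime reduction when going from `before_s` to `after_s` seconds.
double RelativeReduction(double before_s, double after_s) {
    assert(before_s > 0);
    return (before_s - after_s) / before_s;
}
```

DiscardedFraction(37, 2048) comes out just above 0.98, matching the "98%" figure, and RelativeReduction(6.3, 1.9) is a little under 0.70, matching the reported -70%.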
Test plan
- test/sql/types/struct/*: all 44 tests pass (3575 assertions)
- test/sql/types/*: all 418 tests pass
- test/sql/storage/compression/*: all 168 tests pass

🤖 Generated with Claude Code