
go/writer: column-oriented parquet write buffer #4485

Draft
jacobmarble wants to merge 1 commit into main from jgm-parquet-columns-simple


@jacobmarble
Contributor

Description:

  1. Switch the Parquet write buffer from row-major [][]any to column-major []any, where each element is a pointer to a typed slice (*[]int64, *[]parquet.ByteArray, etc.) paired with a parallel []int16 slice of definition levels. Storing a pointer rather than the slice itself avoids boxing a 3-word slice header on every append.

  2. Values are now converted to their Parquet physical types in the Write method rather than during the flush loop. This eliminates the transposition step and reduces GC pressure. It also surfaces type conversion errors from getFooVal(val) at Write time instead of at flush time.

  3. Parquet Variant types require two physical columns. This change prepares the internal API to support that.
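
For reviewers who want a concrete picture of the layout described above, here is a minimal sketch. It is not the code in this PR. The names `column`, `toInt64`, and `toByteArray` are hypothetical, `ByteArray` is a local stand-in for `parquet.ByteArray` so the example compiles on its own, and it assumes flat optional columns with a maximum definition level of 1.

```go
package main

import "fmt"

// ByteArray is a local stand-in for parquet.ByteArray (assumption, so the
// sketch has no external dependencies).
type ByteArray []byte

// column is a hypothetical single column of the write buffer.
// values holds a pointer to a typed slice (*[]int64 or *[]ByteArray), so
// appending mutates the slice in place instead of re-boxing a slice header
// into the interface value on every append.
type column struct {
	values    any
	defLevels []int16 // one entry per written value, including nulls
}

// toInt64 converts a Go value to the INT64 physical type.
func toInt64(v any) (int64, error) {
	switch x := v.(type) {
	case int64:
		return x, nil
	case int32:
		return int64(x), nil
	case int:
		return int64(x), nil
	}
	return 0, fmt.Errorf("cannot convert %T to int64", v)
}

// toByteArray converts a Go value to the BYTE_ARRAY physical type.
func toByteArray(v any) (ByteArray, error) {
	switch x := v.(type) {
	case ByteArray:
		return x, nil
	case []byte:
		return ByteArray(x), nil
	case string:
		return ByteArray(x), nil
	}
	return nil, fmt.Errorf("cannot convert %T to ByteArray", v)
}

// Write converts v to the column's physical type and appends it right away,
// so a conversion error is returned to the caller at write time.
// A nil value records definition level 0 and stores no value.
func (c *column) Write(v any) error {
	if v == nil {
		c.defLevels = append(c.defLevels, 0)
		return nil
	}
	switch vals := c.values.(type) {
	case *[]int64:
		x, err := toInt64(v)
		if err != nil {
			return err
		}
		*vals = append(*vals, x)
	case *[]ByteArray:
		x, err := toByteArray(v)
		if err != nil {
			return err
		}
		*vals = append(*vals, x)
	default:
		return fmt.Errorf("unsupported column type %T", c.values)
	}
	c.defLevels = append(c.defLevels, 1)
	return nil
}
```

A column would be created with something like `&column{values: new([]int64)}`. Because each column already holds values of its physical type, a flush can hand the typed slice and its definition levels to the column writer without transposing rows first.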

Workflow steps:

n/a

Documentation links affected:

n/a

Notes for reviewers:

This is the simple part of #4471

@jacobmarble jacobmarble requested review from a team and removed request for a team May 15, 2026 16:53
@jacobmarble jacobmarble marked this pull request as draft May 15, 2026 17:05
