go/writer: improve parquet write performance#4471

Draft
jacobmarble wants to merge 6 commits into main from jgm-parquet-columns

@jacobmarble
Contributor

@jacobmarble jacobmarble commented May 14, 2026

Description:

Improves Parquet write performance, as measured by a new benchmark:

  • −15% CPU (sec/op)
  • −41% memory (B/op)
  • −75% memory allocations (allocs/op)

Two-part refactor of ParquetWriter's buffer-to-sink pipeline:

  1. Column-oriented buffer (b55c7158d): changes the write buffer from [][]any to []any of typed slice pointers (*[]int64, *[]parquet.ByteArray, etc.) with a parallel []int16 definition-level slice. Values are converted to their Parquet physical types at Write time instead of during the flush loop, eliminating the per-flush transposition pass and reducing GC scanning pressure.

  2. Page-copy (cf5e9e76d): transferColumnValues previously decoded every column value from the scratch file and re-encoded it into the sink. Now the scratch writer uses the same codec as the sink and disables dictionary encoding, so compressed data pages are copied byte-for-byte through a new mergeWriter type that accepts pre-encoded pages and tracks row counts explicitly.

Benchmark results (A = baseline, B = column-oriented buffer, C = page-copy):

Step 1 (A→B): cpu +7%, mem −25%, allocs ~unchanged.
Step 2 (B→C): cpu −9% to −25%, mem −6% to −33%, allocs −75%.

                                   │      A (sec/op)      │          C (sec/op)          │
                                   │        sec/op        │   sec/op     vs base         │
ParquetWriter/small/uncompressed-8            699.4m ± 2%   559.9m ± 2%  -19.95% (p=0.000 n=8+7)
ParquetWriter/small/snappy-8                  718.6m ± 3%   636.6m ± 6%  -11.41% (p=0.000 n=8+7)
ParquetWriter/large/uncompressed-8             3.271 ± 2%    2.653 ± 2%  -18.92% (p=0.000 n=8+7)
ParquetWriter/large/snappy-8                   3.318 ± 4%    3.036 ± 2%   -8.48% (p=0.000 n=8+7)
geomean                                        1.528         1.302       -14.83%

                                   │     A (B/op)     │          C (B/op)           │
                                   │      B/op        │    B/op      vs base        │
ParquetWriter/small/uncompressed-8          638.3Mi ± 0%   333.0Mi ± 0%  -47.82% (p=0.000 n=8+7)
ParquetWriter/small/snappy-8               651.8Mi ± 0%   444.2Mi ± 0%  -31.85% (p=0.000 n=8+7)
ParquetWriter/large/uncompressed-8         2.945Gi ± 0%   1.464Gi ± 0%  -50.29% (p=0.000 n=8+7)
ParquetWriter/large/snappy-8               2.967Gi ± 0%   1.990Gi ± 0%  -32.94% (p=0.000 n=8+7)
geomean                                    1.364Gi        819.8Mi       -41.32%

                                   │  A (allocs/op)  │       C (allocs/op)       │
                                   │   allocs/op     │ allocs/op    vs base      │
ParquetWriter/small/uncompressed-8       4.013M ± 0%   1.020M ± 0%  -74.58% (p=0.000 n=8+7)
ParquetWriter/small/snappy-8             4.013M ± 0%   1.021M ± 0%  -74.56% (p=0.000 n=8+7)
ParquetWriter/large/uncompressed-8      19.997M ± 0%   5.046M ± 0%  -74.77% (n=8+7)
ParquetWriter/large/snappy-8            19.997M ± 0%   5.048M ± 0%  -74.75% (p=0.000 n=8+7)
geomean                                  8.958M        2.269M       -74.67%

Net: −15% CPU, −41% memory, −75% allocations (all p=0.000).

Workflow steps:

No user-visible change.

Documentation links affected:

None.

Notes for reviewers:

  • makeColumnBuffer stores a pointer-to-slice in []any to avoid boxing a 3-word slice header on every append.
  • appendVal[T parquetValue] is a 5-line generic the compiler inlines at each call site, allowing devirtualization of the getValFn[T] parameter.
  • mergeWriter (merge_writer.go) is the new sink type; it accepts WriteDataPage calls and tracks row counts via SetNumRows.
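
The first two notes can be sketched roughly as follows. This is not the PR's code: the `parquetValue` constraint, the `appendVal` signature, and the `asInt64` converter are assumptions made for illustration, and treating a failed conversion as a null is a simplification. Whether the compiler actually inlines the generic depends on the toolchain and is not something this sketch demonstrates.

```go
package main

import "fmt"

// parquetValue is a hypothetical constraint; the PR's real definition is not shown.
type parquetValue interface {
	~int32 | ~int64 | ~float64 | ~[]byte
}

// appendVal converts raw with getVal and appends it to the typed column.
// A failed conversion is recorded as a null: definition level 0, no value.
func appendVal[T parquetValue](col *[]T, defs *[]int16, raw any, getVal func(any) (T, bool)) {
	v, ok := getVal(raw)
	if !ok {
		*defs = append(*defs, 0)
		return
	}
	*col = append(*col, v)
	*defs = append(*defs, 1)
}

// asInt64 is an illustrative converter for an int64 column.
func asInt64(raw any) (int64, bool) {
	v, ok := raw.(int64)
	return v, ok
}

func main() {
	// The interface value holds a pointer, so appends grow the slice in
	// place and the []any element never has to be re-assigned.
	cols := []any{new([]int64)}
	defs := make([]int16, 0, 3)

	ids := cols[0].(*[]int64)
	for _, raw := range []any{int64(10), nil, int64(30)} {
		appendVal(ids, &defs, raw, asInt64)
	}
	fmt.Println(*cols[0].(*[]int64), defs)
}
```

Storing `[]int64` directly in the `[]any` element would instead require writing the grown slice back into the interface after every append, which is the boxing cost the first note describes.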

@jacobmarble jacobmarble force-pushed the jgm-parquet-columns branch from 1050885 to cf5e9e7 Compare May 14, 2026 17:53
@jacobmarble jacobmarble marked this pull request as ready for review May 14, 2026 18:24
@jacobmarble jacobmarble marked this pull request as draft May 14, 2026 18:26
@jacobmarble jacobmarble force-pushed the jgm-parquet-columns branch from d76c2d0 to 3e7d697 Compare May 14, 2026 19:03
@jacobmarble jacobmarble changed the title from "go/writer: column-oriented parquet write buffer" to "go/writer: improve parquet write performance" May 14, 2026
@jacobmarble jacobmarble marked this pull request as ready for review May 14, 2026 19:17
@jacobmarble jacobmarble requested a review from a team May 14, 2026 19:17
Member

@mdibaiee mdibaiee left a comment

Amazing results!!

A few small comments, otherwise LGTM

One note: I think we can now remove WithDisableDictionaryEncoding here:

func WithDisableDictionaryEncoding() ParquetOption {
	return func(cfg *parquetConfig) {
		cfg.disableDictionaryEncoding = true
	}
}

Comment thread go/writer/merge_writer.go
}

// Close finalizes any open row group, writes the parquet footer (file metadata, footer length,
// and trailing magic), and marks the writer closed. The underlying sink is not itself closed.
Member

Why is the underlying sink not closed?

Comment thread go/writer/merge_writer.go
}

// SetNumRows records the row count for this row group; required before Close. The same value
// must also describe every column written into the row group.
Member

Not sure what this means: "The same value must also describe every column written into the row group".
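
One possible reading of the quoted sentence, offered as an interpretation rather than the author's answer: a Parquet row group declares a single row count, and every column chunk inside it has to carry values for exactly that many rows. A writer that copies pre-encoded pages cannot count rows by itself, so the caller supplies the number. The sketch below models that invariant with a hypothetical `rowGroup` type; none of these names come from `mergeWriter`.

```go
package main

import (
	"errors"
	"fmt"
)

// rowGroup is a hypothetical model of the invariant, not the PR's type.
type rowGroup struct {
	numRows    int64
	numRowsSet bool
	colRows    []int64 // rows carried by the pages copied into each column
}

// setNumRows records the row count the row group will declare.
func (rg *rowGroup) setNumRows(n int64) {
	rg.numRows, rg.numRowsSet = n, true
}

// addPage accounts for a pre-encoded page copied into column col.
func (rg *rowGroup) addPage(col int, rows int64) {
	rg.colRows[col] += rows
}

// close rejects a row group whose columns disagree with the declared count.
func (rg *rowGroup) close() error {
	if !rg.numRowsSet {
		return errors.New("row count not set")
	}
	for i, n := range rg.colRows {
		if n != rg.numRows {
			return fmt.Errorf("column %d has %d rows, row group declares %d", i, n, rg.numRows)
		}
	}
	return nil
}

func main() {
	rg := &rowGroup{colRows: make([]int64, 2)}
	rg.setNumRows(100)
	rg.addPage(0, 100)
	rg.addPage(1, 60) // column 1 is short by 40 rows
	fmt.Println(rg.close())
}
```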

Comment thread go/writer/merge_writer.go
parent *mergeWriter
metadata *metadata.RowGroupMetaDataBuilder
ordinal int16
nextCol int
Member

nit: The name of this property threw me off a bit... if I understand correctly this is essentially colOrdinal of the column being written by NextColumn

also maybe we can store it as int16 since it seems to be used as such

Comment thread go/writer/parquet.go
// at construction time will fail again immediately on the first real write.
sinkWriter, err := newMergeWriter(cwc, schemaRoot, props, kvmeta)
if err != nil {
panic(fmt.Sprintf("creating sink writer: %s", err))
Member

Would users get better error presentation if we bubbled up the error, or do we need the panic stack trace?

Comment thread go/writer/merge_writer.go
Comment on lines +229 to +232
compressed := cw.pageWriter.Compress(&buf, page.Data())
compressedData := make([]byte, len(compressed))
copy(compressedData, compressed)
newBuf := memory.NewBufferBytes(compressedData)
Member

Do you mind adding a comment as to why the copy is necessary?
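
One common reason for a defensive copy of this kind, stated here as an assumption since the PR does not give its reason, is that the compressor returns a slice aliasing a reusable scratch buffer. The sketch below reproduces that hazard with a hypothetical `compressor` type built on `bytes.Buffer`; `bytes.ToUpper` stands in for real compression.

```go
package main

import (
	"bytes"
	"fmt"
)

// compressor is a hypothetical type that reuses one scratch buffer per call.
type compressor struct{ scratch bytes.Buffer }

// compress returns a slice that aliases c.scratch. Per the bytes.Buffer
// documentation, it is only valid until the buffer is next modified.
func (c *compressor) compress(src []byte) []byte {
	c.scratch.Reset()
	c.scratch.Write(bytes.ToUpper(src)) // stand-in for real compression
	return c.scratch.Bytes()
}

func main() {
	c := &compressor{}

	first := c.compress([]byte("aaaa"))

	// Copy before the next call, mirroring the make+copy in the diff.
	kept := make([]byte, len(first))
	copy(kept, first)

	c.compress([]byte("bbbb")) // overwrites the memory behind first

	fmt.Printf("aliased=%s copied=%s\n", first, kept)
}
```

If pages are queued or handed to another owner after compression, the copy guarantees each page keeps its own bytes regardless of how the scratch buffer is reused.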

@mdibaiee
Member

@jacobmarble it seems like tests are timing out on this branch while they work on main... perhaps something is hanging with the new implementation?

@dyaffe
Member

dyaffe commented May 15, 2026

Just commenting directly here as per the Slack thread. Let's hold off on this change until after we switch to the Snowpipe streaming SDK.
