[Feature] Support spillable write buffer for PK tables #149

@lxy-9602

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, the primary key (PK) table write path in paimon-cpp does not support spilling write buffers to disk. Under memory pressure, the writer flushes its buffer directly into data files. This produces a large number of small files, which degrades performance and increases compaction overhead.

To align with the Java version and improve memory efficiency, we need to implement spillable write buffers for PK tables. When memory thresholds are exceeded, the in-memory Arrow buffers would be written to temporary spill files on other storage (e.g., local disk) instead of being flushed immediately as small data files.
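
To make the intended behavior concrete, here is a minimal sketch of the spill-or-flush decision. All names (`SpillPolicy`, `WriteBufferTracker`, `BufferAction`) are hypothetical and not part of paimon-cpp. It only tracks byte counts, ignores compression, and assumes that a flush merges the in-memory data and all spilled runs into data files.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is not the paimon-cpp API.
enum class BufferAction { kKeepInMemory, kSpillToDisk, kFlushDataFile };

struct SpillPolicy {
    bool spillable;            // assumed to mirror write-buffer-spillable
    int64_t max_memory_bytes;  // assumed in-memory write buffer budget
    int64_t max_disk_bytes;    // assumed to mirror write-buffer-spill.max-disk-size
};

class WriteBufferTracker {
public:
    explicit WriteBufferTracker(SpillPolicy policy) : policy_(policy) {}

    // Accounts for a newly appended batch and returns what the writer should do next.
    BufferAction Append(int64_t batch_bytes) {
        memory_bytes_ += batch_bytes;
        if (memory_bytes_ <= policy_.max_memory_bytes) {
            return BufferAction::kKeepInMemory;
        }
        // Over the memory budget: spill if allowed and the disk budget still has room.
        if (policy_.spillable &&
            disk_bytes_ + memory_bytes_ <= policy_.max_disk_bytes) {
            disk_bytes_ += memory_bytes_;  // uncompressed size, for simplicity
            memory_bytes_ = 0;
            ++spill_files_;
            return BufferAction::kSpillToDisk;
        }
        // Otherwise everything buffered so far (memory + spill files) is flushed.
        memory_bytes_ = 0;
        disk_bytes_ = 0;
        spill_files_ = 0;
        return BufferAction::kFlushDataFile;
    }

    int spill_files() const { return spill_files_; }
    int64_t disk_bytes() const { return disk_bytes_; }

private:
    SpillPolicy policy_;
    int64_t memory_bytes_ = 0;
    int64_t disk_bytes_ = 0;
    int spill_files_ = 0;
};
```

With spilling disabled, the tracker falls back to the current behavior: every time the memory budget is exceeded, a data file is produced.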

Solution

We may support options such as write-buffer-spillable, write-buffer-spill.max-disk-size, and spill-compression.
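
As a rough illustration of how these options could be read from a table's option map, consider the sketch below. The struct `SpillOptions` and the function `ParseSpillOptions` are hypothetical. The defaults are placeholders rather than confirmed values, and the size is assumed to be given in plain bytes (the real option may accept a size string with units).

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical holder for the proposed options; defaults are placeholders.
struct SpillOptions {
    bool spillable = false;
    int64_t max_disk_bytes = std::numeric_limits<int64_t>::max();
    std::string compression = "zstd";  // assumed default, not confirmed
};

// Reads the three option keys named in this issue from a string map.
// Assumption: booleans are "true"/"false" and the size is a byte count.
SpillOptions ParseSpillOptions(const std::map<std::string, std::string>& conf) {
    SpillOptions opts;
    auto it = conf.find("write-buffer-spillable");
    if (it != conf.end()) {
        opts.spillable = (it->second == "true");
    }
    it = conf.find("write-buffer-spill.max-disk-size");
    if (it != conf.end()) {
        opts.max_disk_bytes = std::stoll(it->second);  // throws on malformed input
    }
    it = conf.find("spill-compression");
    if (it != conf.end()) {
        opts.compression = it->second;
    }
    return opts;
}
```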

Because the memory representations differ (in particular, Java's spilling mechanism is tightly coupled with its internal row-oriented format), a direct port is not feasible. Instead, paimon-cpp will design and refactor this component around Apache Arrow's columnar memory format to:

  • Minimize row-column conversion overhead
  • Reuse the same PK merge logic for both reading and writing
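
To illustrate what sharing the PK merge logic could mean, here is a simplified sketch of a k-way merge over sorted runs (spilled runs plus the in-memory buffer) that keeps the row with the highest sequence number per key. `Row`, `SortedRun`, and `MergeRuns` are hypothetical names. The sketch uses plain structs instead of Arrow record batches and assumes a deduplicate-style merge; the actual design may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical row model; a real implementation would operate on Arrow batches.
struct Row {
    int64_t key;       // primary key
    int64_t sequence;  // larger means newer
    std::string value;
};
using SortedRun = std::vector<Row>;  // each run is sorted by (key, sequence)

// Merges sorted runs and keeps only the newest row for each primary key.
std::vector<Row> MergeRuns(const std::vector<SortedRun>& runs) {
    struct Cursor {
        std::size_t run;
        std::size_t pos;
    };
    // Min-heap ordered by (key, sequence) so older versions of a key pop first.
    auto greater = [&runs](const Cursor& a, const Cursor& b) {
        const Row& x = runs[a.run][a.pos];
        const Row& y = runs[b.run][b.pos];
        if (x.key != y.key) return x.key > y.key;
        return x.sequence > y.sequence;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)> heap(greater);
    for (std::size_t i = 0; i < runs.size(); ++i) {
        if (!runs[i].empty()) heap.push(Cursor{i, 0});
    }

    std::vector<Row> out;
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        const Row& row = runs[c.run][c.pos];
        if (!out.empty() && out.back().key == row.key) {
            out.back() = row;  // same key, newer sequence replaces the older row
        } else {
            out.push_back(row);
        }
        if (c.pos + 1 < runs[c.run].size()) heap.push(Cursor{c.run, c.pos + 1});
    }
    return out;
}
```

The same routine could in principle serve the read path (merging data files) and the write path (merging spill files before producing a data file), which is the reuse this proposal aims for.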

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels: enhancement (New feature or request)
