Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Currently, the primary key (PK) table write path in paimon-cpp does not support spilling write buffers to disk. Under high memory pressure, the writer flushes buffered data directly into data files. This produces many small files, which degrades performance and increases compaction overhead.
To align with the Java version and improve memory efficiency, we need to implement spillable write buffers for PK tables. When memory thresholds are exceeded, the in-memory Arrow buffers will be spilled to temporary spill files on other storage (e.g., local disk) instead of being flushed immediately as small data files.
Solution
We plan to support options such as write-buffer-spillable, write-buffer-spill.max-disk-size, and spill-compression.
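A minimal sketch of one plausible way the first two options could drive the write buffer once it fills up. SpillOptions, BufferAction and DecideOnWrite are hypothetical names, not existing paimon-cpp APIs; spill-compression is left out, and the on-disk size of a spill is approximated by the in-memory buffer size.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical mirror of the proposed options (illustrative only).
struct SpillOptions {
  bool write_buffer_spillable = false;  // write-buffer-spillable
  // write-buffer-spill.max-disk-size, in bytes; unlimited by default here.
  int64_t max_spill_disk_bytes = std::numeric_limits<int64_t>::max();
};

enum class BufferAction { kKeepBuffering, kSpillToDisk, kFlushDataFile };

// Decide what to do with the in-memory buffer after a write.
// Assumes buffered_bytes and spilled_bytes_so_far are non-negative.
BufferAction DecideOnWrite(const SpillOptions& opts, int64_t buffered_bytes,
                           int64_t memory_threshold_bytes,
                           int64_t spilled_bytes_so_far) {
  if (buffered_bytes < memory_threshold_bytes) {
    return BufferAction::kKeepBuffering;  // still under the memory threshold
  }
  // Memory is full: spill if enabled and the disk budget still has room.
  // Subtracting avoids overflow when the budget is "unlimited".
  if (opts.write_buffer_spillable &&
      buffered_bytes <= opts.max_spill_disk_bytes - spilled_bytes_so_far) {
    return BufferAction::kSpillToDisk;
  }
  // Otherwise fall back to the current behavior: flush a data file.
  return BufferAction::kFlushDataFile;
}
```

With spilling disabled, or once the disk budget is used up, the sketch degrades to today's flush-on-full behavior, so the options only change behavior when explicitly enabled.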
A direct port is not feasible because the two implementations represent data differently in memory: Java's spilling mechanism is tightly coupled to its internal row-oriented format. Instead, paimon-cpp will design and refactor this component around Apache Arrow's columnar memory format in order to:
- Minimize row-column conversion overhead
- Reuse the same PK merge logic for both reading and writing
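To illustrate what sharing merge logic between reading and writing could look like, here is a sketch assuming each spill file is written as a run sorted by (primary key, sequence number). Flushing then becomes a k-way merge of those runs, applying a per-key merge function; the example uses deduplicate semantics (keep the record with the highest sequence number). Record, MergeFn, DeduplicateMerge and MergeSortedRuns are hypothetical, and std::vector stands in for Arrow record batches.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical flattened record; a real run would hold Arrow record batches.
struct Record {
  int64_t key;
  int64_t seq;  // sequence number: higher means newer
  std::string value;
};

using Run = std::vector<Record>;  // sorted by (key, seq)
// Folds all records of one key, given in ascending seq order.
using MergeFn = std::function<Record(const std::vector<Record>&)>;

// Deduplicate semantics: keep the newest record for each key.
Record DeduplicateMerge(const std::vector<Record>& same_key) {
  return same_key.back();
}

// K-way merge of sorted runs (e.g. spill files plus the in-memory buffer).
// The same routine could equally merge sorted data files on the read path.
std::vector<Record> MergeSortedRuns(const std::vector<Run>& runs,
                                    const MergeFn& merge) {
  // Heap entry: (key, seq, run index, position in run); smallest on top.
  using Entry = std::tuple<int64_t, int64_t, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.emplace(runs[r][0].key, runs[r][0].seq, r, 0);
  }
  std::vector<Record> out;
  std::vector<Record> group;  // records sharing the current key
  while (!heap.empty()) {
    const Entry top = heap.top();
    heap.pop();
    const int64_t key = std::get<0>(top);
    const std::size_t r = std::get<2>(top);
    const std::size_t i = std::get<3>(top);
    // Key changed: emit the merged result for the previous key.
    if (!group.empty() && group.front().key != key) {
      out.push_back(merge(group));
      group.clear();
    }
    group.push_back(runs[r][i]);
    if (i + 1 < runs[r].size()) {
      heap.emplace(runs[r][i + 1].key, runs[r][i + 1].seq, r, i + 1);
    }
  }
  if (!group.empty()) out.push_back(merge(group));
  return out;
}
```

Because the merge function is a parameter, other merge engines could be plugged in without changing the k-way merge itself.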
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!