Description
`FastAppendAction::existing_manifest()` in `crates/iceberg/src/transaction/append.rs` filters manifest list entries with:

```rust
.filter(|entry| entry.has_added_files() || entry.has_existing_files())
```

This drops manifests that contain only `Deleted` entries (`has_deleted_files()` is true, but neither `has_added_files()` nor `has_existing_files()` is).
Impact
After a `rewrite_files` operation (or any other operation that writes a delete-only manifest to mark old files as removed), a subsequent `fast_append` drops that delete-only manifest from the new snapshot's manifest list. The old manifests still carry `Added` entries for the removed files, but there is no longer a manifest with `Deleted` entries to exclude them, so the deleted files reappear as live.
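To make the failure mode concrete, here is a minimal sketch of the liveness rule as this issue describes it: `Deleted` entries in a delete-only manifest hide matching `Added` entries in older manifests. The `Status` enum, the `Manifest` alias, `live_files`, and the file names are hypothetical illustrations, not iceberg-rust types, and the real scan path may resolve liveness differently.

```rust
use std::collections::HashSet;

// Hypothetical, simplified model; NOT the iceberg-rust API. Existing entries
// behave like Added for liveness and are omitted for brevity.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Status {
    Added,
    Deleted,
}

// A manifest is modeled as a list of (data file path, entry status) pairs.
type Manifest = Vec<(&'static str, Status)>;

/// A path is live if some manifest lists it as Added and no manifest in the
/// same snapshot lists it as Deleted (the model assumed by this issue).
fn live_files(manifests: &[Manifest]) -> HashSet<&'static str> {
    let deleted: HashSet<&'static str> = manifests
        .iter()
        .flatten()
        .filter(|(_, s)| *s == Status::Deleted)
        .map(|(p, _)| *p)
        .collect();
    manifests
        .iter()
        .flatten()
        .filter(|(p, s)| *s == Status::Added && !deleted.contains(p))
        .map(|(p, _)| *p)
        .collect()
}

// Sort for stable, readable output.
fn sorted(set: HashSet<&'static str>) -> Vec<&'static str> {
    let mut v: Vec<_> = set.into_iter().collect();
    v.sort();
    v
}

fn main() {
    // After rewrite_files: the old manifest still lists a/b as Added, a new
    // manifest adds the compacted file c, and a delete-only manifest marks a/b as Deleted.
    let old: Manifest = vec![("a.parquet", Status::Added), ("b.parquet", Status::Added)];
    let compacted: Manifest = vec![("c.parquet", Status::Added)];
    let delete_only: Manifest = vec![("a.parquet", Status::Deleted), ("b.parquet", Status::Deleted)];
    let appended: Manifest = vec![("d.parquet", Status::Added)];

    let after_rewrite = vec![old.clone(), compacted.clone(), delete_only.clone()];
    // fast_append with the current filter: the delete-only manifest is not carried over.
    let buggy_append = vec![old.clone(), compacted.clone(), appended.clone()];
    // fast_append with the proposed filter: the delete-only manifest is kept.
    let fixed_append = vec![old, compacted, delete_only, appended];

    // prints ["c.parquet"]
    println!("after rewrite:      {:?}", sorted(live_files(&after_rewrite)));
    // prints ["a.parquet", "b.parquet", "c.parquet", "d.parquet"]
    println!("after buggy append: {:?}", sorted(live_files(&buggy_append)));
    // prints ["c.parquet", "d.parquet"]
    println!("after fixed append: {:?}", sorted(live_files(&fixed_append)));
}
```

Under this model, `a.parquet` and `b.parquet` come back as soon as the delete-only manifest is left out of the new manifest list.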
This causes compounding data duplication. Each subsequent append or rewrite cycle adds another copy of the "ghost" files, producing exponential row growth:
Cycle 1: 72 rows
Cycle 2: 145 rows
Cycle 3: 297 rows
...
Cycle 12: 235,026 rows
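As a rough arithmetic check on the growth claim, the per-cycle factor between two reported points is the geometric mean (r1 / r0)^(1 / (c1 - c0)). Applied to the counts above it comes out a little above 2 for every interval, i.e. the row count roughly doubles each cycle. `growth_factor` is a hypothetical helper, not part of any codebase.

```rust
/// Per-cycle growth factor between two reported points: the geometric mean
/// (r1 / r0)^(1 / cycles). Hypothetical helper for checking the numbers above.
fn growth_factor(r0: f64, r1: f64, cycles: u32) -> f64 {
    (r1 / r0).powf(1.0 / f64::from(cycles))
}

fn main() {
    // (cycle, rows) as reported; cycles 4 through 11 were not listed.
    let reported: [(u32, f64); 4] = [(1, 72.0), (2, 145.0), (3, 297.0), (12, 235_026.0)];
    for w in reported.windows(2) {
        let (c0, r0) = w[0];
        let (c1, r1) = w[1];
        let factor = growth_factor(r0, r1, c1 - c0);
        println!("cycles {c0} -> {c1}: x{factor:.2} per cycle");
    }
}
```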
Root Cause
The filter in `existing_manifest()` was intended to skip empty manifests, but it inadvertently skips delete-only manifests as well. A delete-only manifest is not empty: it records which file paths were removed and must be preserved until `expire_snapshots` cleans it up.
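The effect of the filter can be checked in isolation. This sketch uses a hypothetical `ManifestSummary` struct with plain per-status file counts standing in for a manifest-list entry; it only mirrors the `has_*_files()` names from the issue and is not the iceberg-rust type.

```rust
// Hypothetical stand-in for a manifest-list entry; not the iceberg-rust type.
struct ManifestSummary {
    path: &'static str,
    added_files_count: u32,
    existing_files_count: u32,
    deleted_files_count: u32,
}

impl ManifestSummary {
    fn has_added_files(&self) -> bool { self.added_files_count > 0 }
    fn has_existing_files(&self) -> bool { self.existing_files_count > 0 }
    fn has_deleted_files(&self) -> bool { self.deleted_files_count > 0 }
}

// Predicate as it currently reads in existing_manifest(): delete-only manifests are dropped.
fn keep_buggy(entry: &ManifestSummary) -> bool {
    entry.has_added_files() || entry.has_existing_files()
}

// Proposed predicate: only manifests with no entries of any status are dropped.
fn keep_fixed(entry: &ManifestSummary) -> bool {
    entry.has_added_files() || entry.has_existing_files() || entry.has_deleted_files()
}

fn main() {
    let entries = vec![
        ManifestSummary { path: "data.avro", added_files_count: 2, existing_files_count: 0, deleted_files_count: 0 },
        ManifestSummary { path: "delete-only.avro", added_files_count: 0, existing_files_count: 0, deleted_files_count: 2 },
        ManifestSummary { path: "empty.avro", added_files_count: 0, existing_files_count: 0, deleted_files_count: 0 },
    ];
    let buggy: Vec<_> = entries.iter().filter(|e| keep_buggy(e)).map(|e| e.path).collect();
    let fixed: Vec<_> = entries.iter().filter(|e| keep_fixed(e)).map(|e| e.path).collect();
    println!("buggy filter keeps: {:?}", buggy); // ["data.avro"]
    println!("fixed filter keeps: {:?}", fixed); // ["data.avro", "delete-only.avro"]
}
```

Both predicates still discard a manifest whose three counts are all zero, so the original intent of skipping empty manifests is preserved.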
Fix
Add `|| entry.has_deleted_files()` to the filter:

```rust
.filter(|entry| {
    entry.has_added_files()
        || entry.has_existing_files()
        || entry.has_deleted_files()
})
```

Reproduction
1. Create a table and append data files.
2. Perform a `rewrite_files` operation (replaces old files with a compacted file).
3. Perform a `fast_append` with new data files.
4. Scan the table: the files deleted in step 2 reappear as live data.
5. Repeat steps 2-4: the duplication compounds exponentially.
Notes
- Currently, `rewrite_files` is not yet on `main`, so this bug is latent. It becomes immediately triggerable once any operation that produces delete-only manifests lands.
- The Iceberg spec requires delete manifests to persist across snapshots until they are cleaned up by `expire_snapshots`. Dropping them prematurely violates snapshot isolation guarantees.