FastAppendAction drops delete-only manifests, causing deleted files to reappear #2148

@drbothen

Description

FastAppendAction::existing_manifest() in crates/iceberg/src/transaction/append.rs filters manifest list entries with:

.filter(|entry| entry.has_added_files() || entry.has_existing_files())

This drops manifests that contain only Deleted entries (has_deleted_files() but neither has_added_files() nor has_existing_files()).
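The effect of that predicate can be seen in isolation with a minimal sketch. `ManifestEntryCounts` and `kept_by_current_filter` are hypothetical stand-ins invented for this illustration, not the iceberg-rust types; only the three `has_*` method names are taken from this report.

```rust
/// Hypothetical stand-in for a manifest list entry; only the counts matter here.
#[derive(Debug, Clone, PartialEq)]
struct ManifestEntryCounts {
    name: &'static str,
    added: u32,
    existing: u32,
    deleted: u32,
}

impl ManifestEntryCounts {
    fn has_added_files(&self) -> bool {
        self.added > 0
    }
    fn has_existing_files(&self) -> bool {
        self.existing > 0
    }
    fn has_deleted_files(&self) -> bool {
        self.deleted > 0
    }
}

/// Applies the predicate quoted above and returns the names of the surviving manifests.
fn kept_by_current_filter(entries: &[ManifestEntryCounts]) -> Vec<&'static str> {
    entries
        .iter()
        .filter(|entry| entry.has_added_files() || entry.has_existing_files())
        .map(|entry| entry.name)
        .collect()
}

fn main() {
    let entries = [
        ManifestEntryCounts { name: "data", added: 3, existing: 0, deleted: 0 },
        ManifestEntryCounts { name: "delete-only", added: 0, existing: 0, deleted: 3 },
        ManifestEntryCounts { name: "empty", added: 0, existing: 0, deleted: 0 },
    ];

    // The delete-only manifest is not empty, yet the predicate rejects it.
    assert!(entries[1].has_deleted_files());
    let kept = kept_by_current_filter(&entries);
    assert_eq!(kept, vec!["data"]);
    println!("kept: {:?}", kept);
}
```

The delete-only entry is treated the same as the truly empty one, which is the behaviour this report is about.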

Impact

After a rewrite_files operation (or any operation that creates a delete-only manifest to mark old files as removed), a subsequent fast_append drops the delete-only manifest from the new snapshot's manifest list. The old manifests still carry Added entries for the removed files, but there is no longer a manifest with Deleted entries to exclude them, so the deleted files reappear as live data.

This causes compounding data duplication: each subsequent append or rewrite cycle adds another copy of the "ghost" files, producing roughly exponential row growth:

Cycle 1: 72 rows
Cycle 2: 145 rows
Cycle 3: 297 rows
...
Cycle 12: 235,026 rows

Root Cause

The filter in existing_manifest() was intended to skip empty manifests, but it inadvertently skips delete-only manifests as well. A delete-only manifest is not empty: it records which file paths were removed and must be preserved until expire_snapshots cleans it up.

Fix

Add || entry.has_deleted_files() to the filter:

.filter(|entry| {
    entry.has_added_files()
        || entry.has_existing_files()
        || entry.has_deleted_files()
})
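As a quick check of the intent, the sketch below compares the two predicates on plain counts. `EntryCounts`, `keep_before_fix` and `keep_with_fix` are hypothetical names used only for this illustration; the real change is the one-line addition to the filter shown above.

```rust
/// Hypothetical per-manifest entry counts; not the iceberg-rust type.
#[derive(Debug, Clone, Copy)]
struct EntryCounts {
    added: u32,
    existing: u32,
    deleted: u32,
}

/// Mirrors the original filter: Deleted entries are ignored.
fn keep_before_fix(c: EntryCounts) -> bool {
    c.added > 0 || c.existing > 0
}

/// Mirrors the proposed filter: any entry at all keeps the manifest.
fn keep_with_fix(c: EntryCounts) -> bool {
    c.added > 0 || c.existing > 0 || c.deleted > 0
}

fn main() {
    let delete_only = EntryCounts { added: 0, existing: 0, deleted: 2 };
    let empty = EntryCounts { added: 0, existing: 0, deleted: 0 };

    // Before the fix the delete-only manifest is dropped; after it, it is kept.
    assert!(!keep_before_fix(delete_only));
    assert!(keep_with_fix(delete_only));

    // Truly empty manifests are still skipped, which was the filter's original purpose.
    assert!(!keep_with_fix(empty));
    println!("ok");
}
```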

Reproduction

  1. Create a table and append data files
  2. Perform a rewrite_files operation (replaces old files with a compacted file)
  3. Perform a fast_append with new data files
  4. Scan the table — deleted files from step 2 reappear as live data
  5. Repeat steps 2-4 — duplication compounds exponentially
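Steps 1 to 4 can be walked through with a toy in-memory model. This is only a sketch of the behaviour as this report describes it: it does not call iceberg-rust, and `Manifest`, `live_files` and `fast_append` (with its `keep_delete_only` switch) are hypothetical names made up for the illustration. The model assumes a path is live if some manifest adds it and no manifest in the same list marks it deleted.

```rust
use std::collections::BTreeSet;

/// Toy manifest: the paths it adds and the paths it marks as deleted.
#[derive(Clone)]
struct Manifest {
    added: Vec<&'static str>,
    deleted: Vec<&'static str>,
}

/// Assumed liveness rule: added somewhere and not marked deleted anywhere in the list.
fn live_files(manifests: &[Manifest]) -> BTreeSet<&'static str> {
    let deleted: BTreeSet<&'static str> = manifests
        .iter()
        .flat_map(|m| m.deleted.iter().copied())
        .collect();
    manifests
        .iter()
        .flat_map(|m| m.added.iter().copied())
        .filter(|path| !deleted.contains(path))
        .collect()
}

/// Carries manifests over to a new list and appends one manifest for the new files.
/// `keep_delete_only = false` models the filter before the fix.
fn fast_append(
    current: &[Manifest],
    new_files: Vec<&'static str>,
    keep_delete_only: bool,
) -> Vec<Manifest> {
    let mut next: Vec<Manifest> = current
        .iter()
        .filter(|m| !m.added.is_empty() || (keep_delete_only && !m.deleted.is_empty()))
        .cloned()
        .collect();
    next.push(Manifest { added: new_files, deleted: vec![] });
    next
}

fn main() {
    // Step 1: create a table and append two data files.
    let mut list = vec![Manifest { added: vec!["a.parquet", "b.parquet"], deleted: vec![] }];

    // Step 2: a rewrite adds a compacted file and a delete-only manifest for the old files.
    list.push(Manifest { added: vec!["ab-compacted.parquet"], deleted: vec![] });
    list.push(Manifest { added: vec![], deleted: vec!["a.parquet", "b.parquet"] });
    assert_eq!(live_files(&list).len(), 1);

    // Step 3: fast append, once without and once with the fix.
    let before_fix = fast_append(&list, vec!["c.parquet"], false);
    let after_fix = fast_append(&list, vec!["c.parquet"], true);

    // Step 4: without the delete-only manifest, a.parquet and b.parquet come back.
    assert_eq!(live_files(&before_fix).len(), 4);
    assert_eq!(live_files(&after_fix).len(), 2);
    println!("before fix: {:?}", live_files(&before_fix));
    println!("after fix:  {:?}", live_files(&after_fix));
}
```

The model deduplicates by path, so it shows the resurrected files but makes no attempt to reproduce the row counts listed under Impact.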

Notes

  • rewrite_files is not yet on main, so this bug is currently latent. It becomes triggerable as soon as any operation that produces delete-only manifests lands.
  • The Iceberg spec requires delete manifests to persist across snapshots until they are cleaned up by expire_snapshots. Dropping them prematurely violates snapshot isolation guarantees.
