Skip to content

refactor: Optimize SnapshotUtil ancestor methods with early return#560

Merged
wgtmac merged 2 commits intoapache:mainfrom
WZhuo:snapshot_util
Feb 27, 2026
Merged

refactor: Optimize SnapshotUtil ancestor methods with early return#560
wgtmac merged 2 commits intoapache:mainfrom
WZhuo:snapshot_util

Conversation

@WZhuo
Copy link
Contributor

@WZhuo WZhuo commented Feb 11, 2026

  • Optimize IsAncestorOf to return early instead of collecting all ancestors
  • Optimize IsParentAncestorOf to return early when matching parent is found
  • Add TableMetadata overloads for AncestorsOf, AncestorsBetween, OldestAncestorOf
  • Add IsParentAncestorOf overload with custom lookup function

- Optimize IsAncestorOf to return early instead of collecting all ancestors
- Optimize IsParentAncestorOf to return early when matching parent is found
- Add TableMetadata overloads for AncestorsOf, AncestorsBetween, OldestAncestorOf
- Add IsParentAncestorOf overload with custom lookup function
Copy link
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An early return should improve performance when there is a large snapshot history, so +1 from me. Thanks for working on this.

@wgtmac
Copy link
Member

wgtmac commented Feb 25, 2026

Review Report: PR #560

📄 File: src/iceberg/util/snapshot_util.cc & src/iceberg/util/snapshot_util_internal.h

Java Counterpart: core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java

  • Parity Check: ⚠️
    • Discrepancy in IsAncestorOf: The new implementation introduces an early return if (snapshot_id == ancestor_snapshot_id) return true; before verifying if snapshot_id actually exists.
    • In Java, isAncestorOf delegates to ancestorsOf(snapshotId, ...), which strictly evaluates Preconditions.checkArgument(start != null, "Cannot find snapshot: %s", snapshotId); before anything else.
    • Impact: In C++, if queried about a non-existent snapshot_id, IsAncestorOf(invalid_id, invalid_id, lookup) will silently return true, whereas Java throws an IllegalArgumentException.
    • Fix: Perform the lookup and validation of snapshot_id before the equality check.
  • Style Check:
    • The transition away from std::ranges::any_of on a fully constructed std::vector of ancestors to an iterative, early-exit while loop in IsAncestorOf and IsParentAncestorOf is excellent. It avoids unnecessary heap allocations and redundant O(N) snapshot lookups.
    • Adding overloads for TableMetadata cleans up the dependency chain nicely.
  • Logic Check:
    • Returning nullptr from the lambda within AncestorsBetween perfectly aligns with Java's Iterables.filter early-termination behavior and cleanly exits the AncestorsOf traversal loop without throwing errors.
    • Loop safety (current != nullptr) and NotFound error forwarding via ICEBERG_ACTION_FOR_NOT_FOUND are correctly implemented.

Summary & Recommendation

  • Request Changes: The PR is highly optimized and well-structured, but the missing validation for snapshot_id in IsAncestorOf breaks strict parity with Java and could mask invalid snapshot queries. Please move the lookup validation to occur before the snapshot_id == ancestor_snapshot_id check.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Report: PR #560

(Generated by gemini-cli)

📄 File: src/iceberg/util/snapshot_util.cc

Java Counterpart: org/apache/iceberg/util/SnapshotUtil.java

  • Parity Check: ✅ The new optimizations in IsAncestorOf and IsParentAncestorOf perfectly match the lazy evaluation logic of Java's ancestorsOf Iterable. It appropriately evaluates the snapshot and its parents iteratively without loading all ancestors at once. The newly added TableMetadata overloads correspond flawlessly to their Java equivalents.
  • Style & Comments: ✅ Code style adheres to conventions. The code uses std::function for lookups efficiently and utilizes std::move and std::optional effectively.
  • Logic Check: ✅ Loop logic is sound. Avoiding the allocation of a std::vector of ancestors inside the validation checks eliminates unnecessary memory consumption and overhead. Correct termination conditions are applied.
  • Design & Conciseness: ✅ Excellent structural enhancement. Decoupling the snapshot lookup via TableMetadata increases the utility of SnapshotUtil, permitting lookups where the full Table object is unavailable.
  • Test Quality: ✅ Existing SnapshotUtilTest cases continue to execute perfectly, as the original Table overloads correctly delegate to the new TableMetadata overloads. Existing test coverage accurately validates the optimized internal logic.

📄 File: src/iceberg/util/snapshot_util_internal.h

Java Counterpart: org/apache/iceberg/util/SnapshotUtil.java

  • Parity Check: ✅ Signatures faithfully represent the Java implementation scope.
  • Style & Comments: ✅ Documentation comments accurately reflect the behavior of the newly introduced and modified functions.
  • Logic Check: ✅ No logic issues in declarations.
  • Design & Conciseness: ✅ Well-organized headers.
  • Test Quality: ✅ Covered by integration with the .cc tests.

Summary & Recommendation

  • Comment: The code perfectly mirrors the Java implementation's iterative ancestor evaluation, correctly bypasses unnecessary vector allocations, and broadens reusability. All tests pass successfully. Great PR!

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving this! LGTM

@wgtmac wgtmac merged commit 349eb92 into apache:main Feb 27, 2026
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants