[FLINK-38909] Fix Unable to delete S3 checkpoint due to presence of default file #27423
+79
−1
What is the purpose of the change
This pull request fixes a critical bug (FLINK-38909) that causes checkpoint cleanup to fail with a PathIsNotEmptyDirectoryException. The root cause was an incorrect, non-recursive delete call on a checkpoint's storage location, which by design contains multiple files.
A completed Flink checkpoint always consists of multiple data files and a metadata file, grouped under a common path (exclusiveCheckpointDir). This logical location is never empty. Attempting to delete it with a non-recursive delete(path, false) command is fundamentally incorrect and guaranteed to fail on any compliant file system. This bug leads to orphaned checkpoint data and storage leaks.
This fix corrects the logic by using a recursive delete, ensuring that all files and objects associated with a checkpoint's location are properly removed, regardless of the underlying filesystem's architecture.
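The difference between the two delete modes can be illustrated with a minimal, self-contained sketch. It does not use Flink's FileSystem API or the code in this PR; it assumes a local directory and java.nio.file as a stand-in. The class name CheckpointDeleteSketch, the helper delete(dir, recursive), and the file names inside the fake checkpoint are all hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class CheckpointDeleteSketch {

    /** Hypothetical local-filesystem analogue of delete(path, recursive). */
    static void delete(Path dir, boolean recursive) throws IOException {
        if (!recursive) {
            // Throws DirectoryNotEmptyException if dir still has children.
            Files.delete(dir);
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Builds a fake, non-empty checkpoint directory (illustrative names only). */
    static Path createFakeCheckpoint() throws IOException {
        Path dir = Files.createTempDirectory("chk-sketch");
        Files.createFile(dir.resolve("_metadata"));
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.createFile(sub.resolve("state-file"));
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = createFakeCheckpoint();
        try {
            delete(dir, false);
        } catch (DirectoryNotEmptyException e) {
            System.out.println("non-recursive delete failed: directory is not empty");
        }
        delete(dir, true);
        System.out.println("exists after recursive delete: " + Files.exists(dir));
    }
}
```

The non-recursive call leaves the directory untouched and raises an exception, which mirrors the failure mode described above; the recursive call removes the nested file, the subdirectory, and finally the checkpoint directory itself.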
Brief change log
In FsCompletedCheckpointStorageLocation.disposeStorageLocation(), the filesystem call was changed to fs.delete(exclusiveCheckpointDir, true). This enables recursive deletion, ensuring the entire directory tree of a checkpoint is properly removed.
Verifying this change
This change added tests and can be verified as follows:
FsCompletedCheckpointStorageLocationTest specifically reproduces the bug and validates the fix. This test simulates a real, non-empty checkpoint by creating a storage location with subdirectories and files. It then calls the disposeStorageLocation() method and asserts that no exception is thrown and the location is completely removed.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation