feat: store state and combined snapshot+state in S3 with configurable retention #121
Merged
… retention
Currently only the raw incremental snapshot is uploaded to S3. That is not enough
to support rollback of nodes back to snapshot-streaming: both the GlobalSnapshotInfo
(state) and the combined snapshot+state bundle are required to validate and restore.
Changes:
- S3Config gains uploadStateEnabled, uploadCombinedEnabled and retentionCount.
- S3DAO: new uploadState / uploadCombined (gzipped JSON, user metadata carries
ordinal and hash) and matching download + prune operations.
- Key scheme states/{ordinal}-{hash} and combined/{ordinal}-{hash} preserves the
hash so forked snapshots at the same ordinal do not overwrite each other,
while the ordinal prefix keeps retention pruning O(1) per upload.
- SnapshotProcessor.storeInS3 runs the three uploads in parallel (each gated by
its own flag) and prunes all variants at ordinal - retentionCount after upload.
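The storeInS3 flow described in the last bullet (flag-gated uploads running in parallel, then a prune of every variant at ordinal - retentionCount) can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code: it uses scala.concurrent.Future in place of whatever effect type SnapshotProcessor actually uses, the uploads and the delete call are passed in as functions so there is no S3 dependency, and the names StoreInS3Sketch, expiredPrefixes and deletePrefix, as well as the exact S3Config shape, are hypothetical. Only the three field names, the states/ and combined/ prefixes and the 1000 default come from the PR.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical mirror of the new S3Config fields. Field names are from the PR;
// defaults follow its description (uploads off, retentionCount = 1000).
final case class S3Config(
  uploadStateEnabled: Boolean = false,
  uploadCombinedEnabled: Boolean = false,
  retentionCount: Long = 1000L
)

object StoreInS3Sketch {
  // Exact prefixes for every hash variant at the expired ordinal. The trailing '-'
  // matters: a bare "states/12" prefix would also match "states/120-..." and
  // "states/1234-...". Only states/ and combined/ are covered here; how the raw
  // snapshot objects are pruned is not shown.
  def expiredPrefixes(ordinal: Long, retentionCount: Long): List[String] = {
    val expired = ordinal - retentionCount
    if (expired < 0) Nil
    else List(s"states/$expired-", s"combined/$expired-")
  }

  // Starts every enabled upload concurrently, waits for all of them to succeed,
  // then deletes whatever sits under the expired prefixes. Uploads are thunks
  // because a Future starts running as soon as it is created, so a disabled
  // upload must never be constructed.
  def storeInS3(
    cfg: S3Config,
    ordinal: Long,
    uploads: List[(Boolean, () => Future[Unit])], // (enabled flag, upload)
    deletePrefix: String => Future[Unit]          // hypothetical list-and-delete by prefix
  )(implicit ec: ExecutionContext): Future[Unit] = {
    val running = uploads.collect { case (true, start) => start() }
    for {
      _ <- Future.sequence(running)
      _ <- Future.traverse(expiredPrefixes(ordinal, cfg.retentionCount))(deletePrefix)
    } yield ()
  }
}
```

One consequence of pruning exactly one ordinal per upload is worth keeping in mind: each call only ever removes ordinal - retentionCount, so if storeInS3 never runs for some ordinal, the objects retentionCount ordinals behind it are not removed by this path. That is the price of avoiding a full bucket scan.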
Contributor: LGTM
marcus-girolneto added a commit that referenced this pull request on Apr 23, 2026
Summary
Snapshot-streaming currently uploads only the raw incremental snapshot to S3. That is not sufficient to support node rollback: both the GlobalSnapshotInfo (state) and the combined snapshot+state bundle are required to validate and restore. Two production incidents (nodes missing snapshots, and the streaming DB getting stuck on inserts) both benefit from having the last N states reliably retrievable from S3.
- Uploads state (GlobalSnapshotInfo) and combined (SnapshotWithState) to S3 alongside the existing snapshot upload. Each upload is toggled independently.
- Retains the last retentionCount of each (default 1000) via exact-prefix pruning on the expired ordinal: O(1) per upload, no full bucket scan.
- Key scheme states/{ordinal}-{hash} and combined/{ordinal}-{hash}: the ordinal enables cheap retention, the hash prevents fork collisions. S3 object user-metadata carries both ordinal and hash for tooling.
- downloadState(ordinal, hash) / downloadCombined(ordinal, hash) are exposed on S3DAO for future rollback-verification callers.
Config
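New fields under snapshotStreaming.s3 (all default to off / safe values):
- uploadStateEnabled: upload GlobalSnapshotInfo to states/ (default off)
- uploadCombinedEnabled: upload SnapshotWithState to combined/ (default off)
- retentionCount: number of ordinals retained per variant (default 1000)
Files changed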
- src/main/scala/org/constellation/snapshotstreaming/Configuration.scala
- src/main/scala/org/constellation/snapshotstreaming/s3/S3DAO.scala
- src/main/scala/org/constellation/snapshotstreaming/SnapshotProcessor.scala
- src/main/resources/reference.conf
Out of scope / follow-ups
- downloadState / downloadCombined are exposed, but no consumer wires them in yet. The ticket's "ensure via S3 download that state/ordinal/hash match" is presumably handled by the network-monitoring service that triggers the rollback.
- … N - retentionCount is ever written. This matches "store last N going forward." Happy to add a one-shot backfill if needed.
Test plan
- Set uploadStateEnabled = true and uploadCombinedEnabled = true, with retentionCount = 100 for quick verification
- states/{ordinal}-{hash} and combined/{ordinal}-{hash} objects appear with correct user-metadata
- … SnapshotWithState and that ordinal/hash match the object key
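The last two checks can be scripted. The sketch below is a hedged illustration, not code from the PR: it rebuilds and parses the states/{ordinal}-{hash} and combined/{ordinal}-{hash} keys, compares them with the object's ordinal and hash user-metadata entries, and gunzips a payload. Decoding the JSON into SnapshotWithState is deliberately left out because it depends on the project's own codecs. The object and method names are hypothetical, and the metadata map is assumed to be keyed by plain "ordinal" and "hash" (i.e. without any x-amz-meta- header prefix).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object S3ObjectChecks {
  // Key builders following the scheme described in the PR.
  def stateKey(ordinal: Long, hash: String): String    = s"states/$ordinal-$hash"
  def combinedKey(ordinal: Long, hash: String): String = s"combined/$ordinal-$hash"

  // Splits "<prefix><ordinal>-<hash>" back into (ordinal, hash). Ordinals are
  // non-negative, so the first '-' after the prefix is the separator.
  def parseKey(prefix: String, key: String): Option[(Long, String)] =
    if (!key.startsWith(prefix)) None
    else {
      val rest = key.substring(prefix.length)
      val dash = rest.indexOf('-')
      if (dash <= 0) None
      else
        rest.substring(0, dash).toLongOption
          .map(ordinal => (ordinal, rest.substring(dash + 1)))
          .filter { case (_, hash) => hash.nonEmpty }
    }

  // True when the key and the object's "ordinal"/"hash" user metadata agree.
  def keyMatchesMetadata(prefix: String, key: String, metadata: Map[String, String]): Boolean =
    parseKey(prefix, key).exists { case (ordinal, hash) =>
      metadata.get("ordinal").contains(ordinal.toString) && metadata.get("hash").contains(hash)
    }

  // Gzip helpers for the payload body; readAllBytes requires Java 9+.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out    = new GZIPOutputStream(buffer)
    try out.write(bytes) finally out.close()
    buffer.toByteArray
  }

  def gunzip(bytes: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try in.readAllBytes() finally in.close()
  }
}
```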