feat(snapshots): volume snapshots, rollback & listing#14
Conversation
…ts REST
Implement Proxmox snapshot support backed by native LightOS snapshots:
- volume_snapshot / volume_snapshot_delete / volume_snapshot_rollback /
volume_rollback_is_possible / volume_snapshot_info, plus volume_has_feature
and parse_volname wiring.
- Native server-side rollback (PUT .../volumes/{uuid}/rollback): near-instant,
no host-side data copy, and preserves the volume's thin-provisioned allocation.
- Snapshot names embed the source volume UUID (snap-<uuid>-<name>) so names stay
unique per project and map back to the Proxmox name without labels.
- volume_rollback_is_possible refuses a rollback that would shrink a volume
resized after the snapshot was taken, keeping the device and the VM config
size consistent.
- Idempotent snapshot delete: tolerates already-gone, mid-delete, and racing
concurrent deletes by re-checking state rather than matching error strings.
- free_image removes a volume's snapshots first so none are orphaned on destroy.
- Online snapshots of a running guest are crash-consistent (filesystem-consistent
when the guest runs qemu-guest-agent).
Add unit tests (t/*.t) and a live e2e harness (t/e2e/snapshots.sh) with a
units-first gate. Validated end-to-end on single-node and 4-node multipath
LightOS clusters.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR implements full volume snapshot lifecycle support in ChangesLightbits Volume Snapshot Support
Sequence Diagram(s)sequenceDiagram
participant Proxmox
participant volume_snapshot
participant volume_snapshot_delete
participant volume_rollback_is_possible
participant volume_snapshot_rollback
participant _api
Proxmox->>volume_snapshot: snap_name
volume_snapshot->>_api: POST .../snapshots
volume_snapshot->>_api: GET .../snapshots/{id} (poll)
volume_snapshot-->>Proxmox: undef or die
Proxmox->>volume_rollback_is_possible: vol_name, snap, blockers
volume_rollback_is_possible->>_api: GET volume + snapshot sizes
volume_rollback_is_possible-->>Proxmox: 1 or die (resized)
Proxmox->>volume_snapshot_rollback: vol_name, snap
volume_snapshot_rollback->>_api: PUT .../rollback {srcSnapshotUUID}
volume_snapshot_rollback->>_api: GET volume (poll Available)
volume_snapshot_rollback->>_rescan_controller: nvme ns-rescan
volume_snapshot_rollback-->>Proxmox: undef or die
Proxmox->>volume_snapshot_delete: vol_name, snap
volume_snapshot_delete->>_api: DELETE .../snapshots/{uuid}
volume_snapshot_delete->>_api: GET .../snapshots/{uuid} (re-check)
volume_snapshot_delete-->>Proxmox: undef or die
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 17: The changelog statement "Freeing a volume also removes its snapshots
so none are left behind" incorrectly guarantees complete snapshot removal, but
the underlying free_image function only performs best-effort deletion and may
leave snapshots behind if a delete operation fails. Reword the sentence about
volume freeing and snapshot cleanup to clarify that snapshot deletion is
best-effort and cannot guarantee that no snapshots remain, removing the promise
of complete cleanup.
In `@LightbitsPlugin.pm`:
- Around line 920-925: The volume_snapshot_delete function incorrectly treats
all _snap_uuid failures as successful idempotent deletes, masking transient API,
authentication, and listing errors. Instead of blindly returning success
whenever eval catches an error from _snap_uuid, distinguish between a genuine
"snapshot not found" condition (which should return success for idempotency) and
actual transient errors (which should propagate as failures). Modify the code
around the eval block in volume_snapshot_delete to capture the error message
from the eval failure, check if it specifically indicates the snapshot does not
exist, and only return success in that case—otherwise die or return an error to
prevent silent failure of transient issues.
In `@t/e2e/snapshots.sh`:
- Around line 83-88: Replace the hardcoded `/tmp/lb_e2e_rb_err` path with a
dynamically created temporary file using mktemp. Create the temp file variable
(e.g., by storing the output of mktemp in a variable near the top of the
configuration section), update the qm rollback command to redirect to this
variable instead of the hardcoded path, update the grep command to read from
this variable, and add cleanup code at the end of the test (or in the overall
cleanup section) to remove the temporary file using rm to prevent accumulation
of stale files.
In `@t/volume_snapshot_rollback.t`:
- Around line 47-54: The current test only validates the happy path where the
volume is immediately Available. Add additional test cases to the existing test
file to verify the state transition behavior of the volume_snapshot_rollback
method: first, add a test case where the volume starts in a non-Available state
like Updating and transitions to Available after one or more polling attempts
(by setting $vol_state to different values across loop iterations before the
rollback call), and second, add a test case where the volume enters a terminal
failure state like Failed, Deleting, or Deleted (by setting $vol_state to one of
these values and verifying that the rollback fails fast with an appropriate
error). This will comprehensively test both the polling loop logic and the
fail-fast behavior referenced in the context.
- Around line 26-42: Add a stub for the _rescan_controller function within the
same mocking block where _api is being stubbed for
PVE::Storage::Custom::LightbitsPlugin. The stub should accept the parameters
that the actual implementation passes to it (typically storage ID and volume
UUID) and return without performing any operations. This prevents the function
from executing unintended I/O or system interactions during the test while still
satisfying the call that occurs after the rollback operation completes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3fda83f-3e69-444d-a74c-a25090868411
📒 Files selected for processing (12)
CHANGELOG.mdLightbitsPlugin.pmREADME.mdt/e2e/snapshots.sht/free_image_snapshots.tt/snap_naming.tt/volume_has_feature.tt/volume_rollback_is_possible.tt/volume_snapshot.tt/volume_snapshot_delete.tt/volume_snapshot_info.tt/volume_snapshot_rollback.t
- volume_snapshot_delete: only treat a genuine "not found" as an idempotent success; re-raise transient API/auth/listing failures so a snapshot is not silently left behind on a transient error. - CHANGELOG: state that snapshot deletion on free is best-effort instead of promising none are left behind. - t/e2e/snapshots.sh: use mktemp for the rollback error-capture file and remove it in the cleanup trap. - t/volume_snapshot_rollback.t: stub _rescan_controller to keep the test hermetic, and add a terminal-failure (Failed) case covering the fail-fast branch.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
LightbitsPlugin.pm (1)
647-651:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHonor embedded
@snapshotnames before treating a volume as current.
parse_volnamenow recognizes...@<snap>, butpathandvolume_has_featureignore that returned snapshot slot. A snapshot-qualified volname with no separate$snapargument can therefore be treated as the live volume.Suggested fix
sub path { my ($class, $cfg, $volname, $storeid, $snap) = `@_`; + my (undef, $base_volname, $vmid, undef, $parsed_snap) = $class->parse_volname($volname); + $snap //= $parsed_snap; die "Snapshots not supported by Lightbits plugin\n" if $snap; # Return the owning vmid so PVE frees this disk when its VM is destroyed. - my (undef, undef, $vmid) = $class->parse_volname($volname); - return (_symlink_path($storeid, _vol_uuid($volname)), $vmid, 'images'); + return (_symlink_path($storeid, _vol_uuid($base_volname)), $vmid, 'images'); }- my (undef, undef, undef, undef, undef, $isBase, $format) = + my (undef, undef, undef, undef, $parsed_snap, $isBase, $format) = $class->parse_volname($volname); - my $key = $snapname ? 'snap' : ($isBase ? 'base' : 'current'); + $snapname //= $parsed_snap; + my $key = defined($snapname) && length($snapname) ? 'snap' : ($isBase ? 'base' : 'current');Also applies to: 787-790
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@LightbitsPlugin.pm` around lines 647 - 651, The path method (and the similar code at lines 787-790 in volume_has_feature) calls parse_volname but only captures the vmid, ignoring the snapshot slot that parse_volname now returns. When parse_volname is called on a volname containing an embedded snapshot qualifier (like vol@snapshot), that snapshot information is discarded and the volume is treated as the current/live volume. Modify both the path method and volume_has_feature method to capture all return values from parse_volname including the snapshot slot, then honor any embedded snapshot found in the volname rather than only checking the explicit $snap argument.t/e2e/snapshots.sh (1)
99-99:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon’t report cleanup success when
pvesm listfails.With the current
if pvesm list "$STORAGE" | grep -q ...; then ...; else ok ...; fi, apvesm listerror is indistinguishable from “no VM volume found”, so the e2e can falsely pass the destroy-cleanup check.✅ Proposed fix: distinguish list failures from empty results
-if pvesm list "$STORAGE" | grep -q "vm-${VMID}-"; then bad "volume not freed on destroy"; else ok "volume freed on destroy"; fi +if ! PVE_LIST="$(pvesm list "$STORAGE" 2>&1)"; then + bad "could not list storage after destroy: $PVE_LIST" +elif grep -q "vm-${VMID}-" <<<"$PVE_LIST"; then + bad "volume not freed on destroy" +else + ok "volume freed on destroy" +fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@t/e2e/snapshots.sh` at line 99, The current conditional in the pvesm list check cannot distinguish between a command failure (when pvesm list errors) and an empty result (when no volumes are found), causing false passes when the list command fails. Modify the check to explicitly verify that the pvesm list command succeeds before evaluating the grep result; if pvesm list fails, the condition should report bad (volume not freed) instead of falling through to the ok branch. Ensure the check handles three distinct cases: pvesm list command failure, volumes still present, and volumes successfully freed.
♻️ Duplicate comments (2)
LightbitsPlugin.pm (1)
250-251:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate snapshot-list failures before classifying delete as idempotent.
_snapshots_for_volumestill converts any_apifailure into[], so_snap_uuidemits its synthetic “not found” and Line 932 returns success for transient/auth/listing failures. Let the helper die by default and keepfree_imageexplicitly best-effort.Suggested fix
sub _snapshots_for_volume { my ($scfg, $project, $vol_uuid) = `@_`; - my $data = eval { _api($scfg, 'GET', "/api/v2/projects/$project/snapshots") }; - return [] if $@ || !$data; + my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots"); + return [] unless $data; return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid } @{ $data->{snapshots} // [] } ]; }- for my $s (@{ _snapshots_for_volume($scfg, $project, $uuid) }) { + my $snapshots = eval { _snapshots_for_volume($scfg, $project, $uuid) }; + if ($@) { + warn "Lightbits: could not list snapshots of volume $uuid before delete: $@"; + } else { + for my $s (@$snapshots) { eval { _delete_snapshot($scfg, $project, $s->{UUID}); }; warn "Lightbits: could not delete snapshot $s->{name} ($s->{UUID}) " . "of volume $uuid: $@" if $@; + } }Also applies to: 931-933
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@LightbitsPlugin.pm` around lines 250 - 251, The _snapshots_for_volume function is suppressing all _api errors by catching exceptions and returning an empty array, which causes transient/auth/listing failures to be silently converted to "not found" and incorrectly classified as idempotent. Remove the error suppression (the $@ || part) from the return statement at line 251 so that actual _api errors propagate to the caller, while keeping the empty data check. Only return the empty array when data is null/empty without an error. Keep free_image as explicitly best-effort if needed, but allow _snapshots_for_volume to let errors die by default. The same issue also applies at lines 931-933 where similar error suppression patterns exist.t/e2e/snapshots.sh (1)
96-97:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove
ERR_FILEbefore cancelling the cleanup trap.The temp file is removed only by
cleanup, but the successful path disables the trap at Line 97 after destroying the VM, leaving$ERR_FILEbehind. This is the remaining cleanup gap from the earlier temp-file review.🧹 Proposed fix: clean up the temp file on the success path
qm destroy "$VMID" --purge 1 >/dev/null +rm -f "$ERR_FILE" 2>/dev/null || true trap - EXIT🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@t/e2e/snapshots.sh` around lines 96 - 97, The temporary file stored in $ERR_FILE is only cleaned up through the cleanup function that is triggered by the EXIT trap. However, at line 97 where the trap is cancelled with `trap - EXIT`, the cleanup function will no longer execute, leaving the temporary file behind. To fix this, explicitly remove the $ERR_FILE before disabling the trap at line 97 to ensure the temp file is cleaned up on the success path before the trap cancellation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@t/e2e/snapshots.sh`:
- Around line 45-48: The cleanup() function unconditionally destroys a VM using
qm destroy, which can delete an unrelated VM if VMID is misconfigured. Modify
the cleanup() function to only destroy the VM if it was actually created by this
script during the current run. Introduce a tracking mechanism (such as a flag or
variable set after successful VM creation with qm create) to ensure cleanup only
destroys VMs that this script is responsible for creating. Additionally, update
the cleanup call at line 63-64 (before VM creation) to respect this tracking so
it does not attempt to destroy a pre-existing VM.
---
Outside diff comments:
In `@LightbitsPlugin.pm`:
- Around line 647-651: The path method (and the similar code at lines 787-790 in
volume_has_feature) calls parse_volname but only captures the vmid, ignoring the
snapshot slot that parse_volname now returns. When parse_volname is called on a
volname containing an embedded snapshot qualifier (like vol@snapshot), that
snapshot information is discarded and the volume is treated as the current/live
volume. Modify both the path method and volume_has_feature method to capture all
return values from parse_volname including the snapshot slot, then honor any
embedded snapshot found in the volname rather than only checking the explicit
$snap argument.
In `@t/e2e/snapshots.sh`:
- Line 99: The current conditional in the pvesm list check cannot distinguish
between a command failure (when pvesm list errors) and an empty result (when no
volumes are found), causing false passes when the list command fails. Modify the
check to explicitly verify that the pvesm list command succeeds before
evaluating the grep result; if pvesm list fails, the condition should report bad
(volume not freed) instead of falling through to the ok branch. Ensure the check
handles three distinct cases: pvesm list command failure, volumes still present,
and volumes successfully freed.
---
Duplicate comments:
In `@LightbitsPlugin.pm`:
- Around line 250-251: The _snapshots_for_volume function is suppressing all
_api errors by catching exceptions and returning an empty array, which causes
transient/auth/listing failures to be silently converted to "not found" and
incorrectly classified as idempotent. Remove the error suppression (the $@ ||
part) from the return statement at line 251 so that actual _api errors propagate
to the caller, while keeping the empty data check. Only return the empty array
when data is null/empty without an error. Keep free_image as explicitly
best-effort if needed, but allow _snapshots_for_volume to let errors die by
default. The same issue also applies at lines 931-933 where similar error
suppression patterns exist.
In `@t/e2e/snapshots.sh`:
- Around line 96-97: The temporary file stored in $ERR_FILE is only cleaned up
through the cleanup function that is triggered by the EXIT trap. However, at
line 97 where the trap is cancelled with `trap - EXIT`, the cleanup function
will no longer execute, leaving the temporary file behind. To fix this,
explicitly remove the $ERR_FILE before disabling the trap at line 97 to ensure
the temp file is cleaned up on the success path before the trap cancellation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: af60c854-53ca-42c0-8491-f8cadef556ba
📒 Files selected for processing (4)
CHANGELOG.mdLightbitsPlugin.pmt/e2e/snapshots.sht/volume_snapshot_rollback.t
The plugin documents its subs with # comment blocks, matching PVE's own storage plugins (no per-sub POD). CodeRabbit's docstring scorer does not credit these comments, so keep its pre-merge check as a non-blocking warning with a low threshold instead of adding non-idiomatic POD.
…g them _snapshots_for_volume previously swallowed every _api error into an empty list, so a transient/auth/listing failure (e.g. an expired JWT) was indistinguishable from "no snapshots". That let volume_snapshot_delete treat the failure as an idempotent success and silently leave the snapshot orphaned on the cluster (security review finding 1.1). It now lets a real listing failure propagate (returning [] only on a genuinely empty listing), so _snap_uuid raises the actual API error rather than a fabricated "not found" — which is what makes volume_snapshot_delete's "not found" idempotency check correct. free_image wraps the listing in eval + warn to keep freeing a volume best-effort (a listing failure must never block qm destroy --purge).
cleanup() ran an unconditional 'qm destroy $VMID' from both the EXIT trap and the pre-create step, so a misconfigured VMID could delete an unrelated guest. Gate every destroy on the VM being named lb-snap-e2e (is_our_vm), and abort up front if VMID is already taken by a VM we do not own. A leftover test VM from a prior aborted run is still reclaimed. Validated live on PVE 9.2.2: happy path 4/4, and a decoy VM with a different name survives + the script aborts with exit 1.
…en e2e cleanup check Address remaining CodeRabbit findings on PR #14: - path() and volume_has_feature() captured only the vmid/isBase slots from parse_volname and ignored the snapshot slot, so a snapshot-qualified volname (vm-<vmid>-<uuid>@<snap>) passed without a separate $snap argument was treated as the live volume. Both now default $snap/$snapname to the parsed snap. - t/e2e/snapshots.sh destroy-cleanup check now distinguishes a 'pvesm list' failure from an empty result, so a list error no longer falsely passes.
|
@coderabbitai review Pushed 7d5209c addressing both outstanding findings from the latest review:
Validated: 186/186 unit tests pass; live PVE 9.2.2 e2e 4/4 pass with no cluster-side orphans. |
|
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
t/volume_has_feature.t (1)
23-35: ⚡ Quick winAdd a regression assertion for embedded
@snapvolnames.Current checks only vary
$snapname; they don’t exercise theparse_volnamefallback path (vm-...@snapwith$snapnameunset) that was fixed in this cycle.Suggested test addition
ok( $has->('snapshot', undef), "snapshot allowed on the current volume (snapname undef)" ); ok( !$has->('snapshot', 'snap1'), "no nested snapshots (snapshot of a snapshot is refused)" ); +ok( + !$class->volume_has_feature( + $scfg, 'snapshot', 'lb-storage', "vm-100-$UUID\`@snap1`", undef, 0 + ), + 'embedded `@snap` in volname is treated as snapshot context (no nested snapshot)' +); ok( $has->('copy', undef), 'copy allowed on the current volume' );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@t/volume_has_feature.t` around lines 23 - 35, The current test suite only exercises the case where $snapname is explicitly passed as a parameter to the $has subroutine; it does not test the parse_volname fallback path for volnames with embedded snapshots in the format volname@snapname with $snapname unset. Add one or more regression assertions after the existing ok() calls that test this embedded snapshot scenario by either modifying the $volname variable to include an embedded snapshot (e.g., volname@snap1) and calling $has with snapname undef, or by creating an additional helper subroutine that accepts volname as a parameter. This ensures the parse_volname fallback code path is properly exercised in the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@LightbitsPlugin.pm`:
- Around line 254-259: The `_snapshots_for_volume` function blindly trusts the
response from `_api` without validating success. Since `_api` normalizes HTTP
404 errors to `{}`, the line `@{ $data->{snapshots} // [] }` will silently
return an empty array when the API call fails, making downstream delete
operations appear idempotently successful even though snapshots still exist.
Validate the API response in `_snapshots_for_volume` to ensure it contains valid
snapshot data before treating an empty snapshots key as a successful empty list.
This ensures the function properly distinguishes between "no snapshots exist"
and "API call failed".
In `@t/e2e/snapshots.sh`:
- Around line 46-55: The is_our_vm function incorrectly relies on the VM name
(TEST_VM_NAME) as proof of ownership, but this metadata is user-controlled and
could lead to destruction of unrelated VMs. Instead of checking whether a VM
matches a given name, implement explicit tracking of VMs created by this script
run. Record the VMID when the VM is created (likely near where qm create is
called), and modify is_our_vm to verify that the VMID matches a VM created in
the current run. This ensures cleanup only destroys VMs that this script
instance actually created, eliminating the risk of destroying unrelated VMs even
if they happen to have the same configured name. Update both the is_our_vm
function logic and the cleanup function call to use this explicit ownership
tracking instead of name-based matching.
- Around line 54-56: The cleanup function is removing the mktemp file (ERR_FILE)
prematurely, before it's needed for writing error information and checking for
rollback. Split the cleanup function into two separate functions: one for VM
cleanup (qm destroy) that can be called early at line 78, and keep the temp file
removal separate so it happens after the rollback check at lines 102-106
completes. The temp file cleanup should be called at the end around lines
111-112 to ensure ERR_FILE persists through the entire error checking and
rollback logic.
---
Nitpick comments:
In `@t/volume_has_feature.t`:
- Around line 23-35: The current test suite only exercises the case where
$snapname is explicitly passed as a parameter to the $has subroutine; it does
not test the parse_volname fallback path for volnames with embedded snapshots in
the format volname@snapname with $snapname unset. Add one or more regression
assertions after the existing ok() calls that test this embedded snapshot
scenario by either modifying the $volname variable to include an embedded
snapshot (e.g., volname@snap1) and calling $has with snapname undef, or by
creating an additional helper subroutine that accepts volname as a parameter.
This ensures the parse_volname fallback code path is properly exercised in the
tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6d0f02d0-4682-4cc6-aac1-404e130b4f35
📒 Files selected for processing (13)
.coderabbit.yamlCHANGELOG.mdLightbitsPlugin.pmREADME.mdt/e2e/snapshots.sht/free_image_snapshots.tt/snap_naming.tt/volume_has_feature.tt/volume_rollback_is_possible.tt/volume_snapshot.tt/volume_snapshot_delete.tt/volume_snapshot_info.tt/volume_snapshot_rollback.t
| sub _snapshots_for_volume { | ||
| my ($scfg, $project, $vol_uuid) = @_; | ||
| my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots"); | ||
| return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid } | ||
| @{ $data->{snapshots} // [] } ]; | ||
| } |
There was a problem hiding this comment.
Validate snapshot-list payloads before interpreting them as empty.
Line 256 currently trusts _api blindly. Because _api normalizes HTTP 404 to {}, this helper can return [] on listing failure, which makes downstream delete paths look idempotently successful while snapshots remain.
Suggested fix
sub _snapshots_for_volume {
my ($scfg, $project, $vol_uuid) = `@_`;
my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots");
- return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid }
- @{ $data->{snapshots} // [] } ];
+ die "Lightbits snapshot listing failed for volume $vol_uuid: malformed response\n"
+ if ref($data) ne 'HASH' || ref($data->{snapshots}) ne 'ARRAY';
+ return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid }
+ @{ $data->{snapshots} } ];
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@LightbitsPlugin.pm` around lines 254 - 259, The `_snapshots_for_volume`
function blindly trusts the response from `_api` without validating success.
Since `_api` normalizes HTTP 404 errors to `{}`, the line `@{ $data->{snapshots}
// [] }` will silently return an empty array when the API call fails, making
downstream delete operations appear idempotently successful even though
snapshots still exist. Validate the API response in `_snapshots_for_volume` to
ensure it contains valid snapshot data before treating an empty snapshots key as
a successful empty list. This ensures the function properly distinguishes
between "no snapshots exist" and "API call failed".
| # True only if $VMID exists AND is named $TEST_VM_NAME — i.e. a VM this script | ||
| # owns (the one it created, or a leftover from a prior aborted run). This guards | ||
| # every destroy so a misconfigured VMID can never delete an unrelated guest. | ||
| is_our_vm() { | ||
| local name | ||
| name="$(qm config "$VMID" 2>/dev/null | awk -F': ' '/^name:/{print $2; exit}')" || return 1 | ||
| [ "$name" = "$TEST_VM_NAME" ] | ||
| } | ||
| cleanup() { | ||
| if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi |
There was a problem hiding this comment.
Do not treat the VM name as proof of ownership.
lb-snap-e2e is user-controlled metadata, so Line 74 allows Line 78 to purge any existing VM with that name. Track VMs created by this run, and make leftover reclamation explicit instead of automatic.
🛡️ Safer ownership pattern
TEST_VM_NAME="lb-snap-e2e" # ownership marker: we only ever destroy a VM with this name
+RECLAIM_TEST_VM="${RECLAIM_TEST_VM:-0}"
+CREATED_VM=0
@@
cleanup() {
- if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi
+ if [ "$CREATED_VM" = 1 ] && is_our_vm; then
+ qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true
+ fi
rm -f "$ERR_FILE" 2>/dev/null || true
}
@@
-if qm config "$VMID" >/dev/null 2>&1 && ! is_our_vm; then
- echo "ABORT: VM $VMID already exists and is not the test VM '$TEST_VM_NAME'; refusing to destroy it. Set VMID to an unused id." >&2
- exit 1
+if qm config "$VMID" >/dev/null 2>&1; then
+ if is_our_vm && [ "$RECLAIM_TEST_VM" = 1 ]; then
+ qm destroy "$VMID" --purge 1 >/dev/null
+ else
+ echo "ABORT: VM $VMID already exists; set VMID to an unused id or RECLAIM_TEST_VM=1 for a known disposable leftover." >&2
+ exit 1
+ fi
fi
-cleanup
qm create "$VMID" --memory 512 --scsihw virtio-scsi-single --name "$TEST_VM_NAME" >/dev/null
+CREATED_VM=1Also applies to: 72-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@t/e2e/snapshots.sh` around lines 46 - 55, The is_our_vm function incorrectly
relies on the VM name (TEST_VM_NAME) as proof of ownership, but this metadata is
user-controlled and could lead to destruction of unrelated VMs. Instead of
checking whether a VM matches a given name, implement explicit tracking of VMs
created by this script run. Record the VMID when the VM is created (likely near
where qm create is called), and modify is_our_vm to verify that the VMID matches
a VM created in the current run. This ensures cleanup only destroys VMs that
this script instance actually created, eliminating the risk of destroying
unrelated VMs even if they happen to have the same configured name. Update both
the is_our_vm function logic and the cleanup function call to use this explicit
ownership tracking instead of name-based matching.
| cleanup() { | ||
| if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi | ||
| rm -f "$ERR_FILE" 2>/dev/null || true |
There was a problem hiding this comment.
Keep the mktemp file reserved until after the rollback check.
Line 78 calls cleanup, so Line 56 deletes the secure mktemp file before Line 102 writes to it; the redirect then recreates that /tmp path and Line 112 prevents cleanup from removing it on success. Split VM cleanup from temp cleanup.
🧹 Split VM cleanup from temp-file cleanup
+cleanup_vm() {
+ if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi
+}
+
cleanup() {
- if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi
+ cleanup_vm
rm -f "$ERR_FILE" 2>/dev/null || true
}
@@
-cleanup
+cleanup_vm
qm create "$VMID" --memory 512 --scsihw virtio-scsi-single --name "$TEST_VM_NAME" >/dev/null
@@
qm destroy "$VMID" --purge 1 >/dev/null
-trap - EXIT
+rm -f "$ERR_FILE" 2>/dev/null || true
+trap - EXITAlso applies to: 78-78, 102-106, 111-112
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@t/e2e/snapshots.sh` around lines 54 - 56, The cleanup function is removing
the mktemp file (ERR_FILE) prematurely, before it's needed for writing error
information and checking for rollback. Split the cleanup function into two
separate functions: one for VM cleanup (qm destroy) that can be called early at
line 78, and keep the temp file removal separate so it happens after the
rollback check at lines 102-106 completes. The temp file cleanup should be
called at the end around lines 111-112 to ensure ERR_FILE persists through the
entire error checking and rollback logic.
…olname Regression coverage for the parse_volname fallback path fixed in 7d5209c: a snapshot-qualified volname (vm-<vmid>-<uuid>@<snap>) with no separate $snapname argument must be treated as snapshot context, so nested snapshots stay refused.
Both findings from this review (parse_volname @snap handling in path/volume_has_feature, and the e2e pvesm-list cleanup check) are resolved in 7d5209c, with regression coverage added in 29da440. Validated: 187/187 unit tests pass and a live PVE 9.2.2 e2e (4/4) with no cluster-side orphans. Remaining re-scan comments are non-blocking nitpicks / a deliberately-deferred hardening edge. Dismissing the stale changes-requested verdict to unblock merge.
Summary
Adds Proxmox snapshot support to the Lightbits storage plugin, backed by native LightOS snapshots over the REST API: create, delete, list, and rollback, wired through
qm snapshot/qm rollbackand the Proxmox UI.What's included
volume_snapshot,volume_snapshot_delete,volume_snapshot_rollback,volume_rollback_is_possible,volume_snapshot_info, plusvolume_has_feature/parse_volnamewiring.PUT /api/v2/projects/{proj}/volumes/{uuid}/rollback) — near-instant, no host-side data copy, and preserves the volume's thin-provisioned allocation.snap-<uuid>-<name>), so names stay unique per project and map back to the Proxmox name without relying on labels.volume_rollback_is_possiblerefuses a rollback that would shrink a volume grown after the snapshot was taken, keeping the device and the VM config size consistent.free_imageremoves a volume's snapshots first, so destroying a VM leaves none behind.qemu-guest-agent).Testing
prove -I t/stubs t/*.t), incl. naming round-trip, request shapes, the size guard, idempotent delete, andvolume_has_feature.t/e2e/snapshots.sh(parameterized, no secrets) with a units-first gate: snapshot create, data-revert rollback, resize-rollback guard, orphan-on-destroy.Out of scope / follow-ups
vmstatelive checkpoints (disk-only snapshots for now).activate_volumeto restore an administratively-dropped NVMe path on re-activate (multipath-redundancy nit; transient losses already auto-recover via the kernel).Summary by CodeRabbit
Release Notes