feat(snapshots): volume snapshots, rollback & listing by roiyz-lb · Pull Request #14 · LightBitsLabs/pve-lightbits

roiyz-lb · 2026-06-14T21:16:57Z

Summary

Adds Proxmox snapshot support to the Lightbits storage plugin, backed by native LightOS snapshots over the REST API: create, delete, list, and rollback, wired through qm snapshot / qm rollback and the Proxmox UI.

What's included

Snapshot lifecycle — volume_snapshot, volume_snapshot_delete, volume_snapshot_rollback, volume_rollback_is_possible, volume_snapshot_info, plus volume_has_feature / parse_volname wiring.
Native server-side rollback (PUT /api/v2/projects/{proj}/volumes/{uuid}/rollback) — near-instant, no host-side data copy, and preserves the volume's thin-provisioned allocation.
Collision-safe naming — snapshot names embed the source volume UUID (snap-<uuid>-<name>), so names stay unique per project and map back to the Proxmox name without relying on labels.
Resize-safety guard — volume_rollback_is_possible refuses a rollback that would shrink a volume grown after the snapshot was taken, keeping the device and the VM config size consistent.
Idempotent delete — tolerates already-gone, mid-delete, and racing concurrent deletes by re-checking state rather than matching error strings.
No orphaned snapshots — free_image removes a volume's snapshots first, so destroying a VM leaves none behind.
Online snapshots of a running guest are crash-consistent (filesystem-consistent when the guest runs qemu-guest-agent).

Testing

Unit suite: 178 tests pass (prove -I t/stubs t/*.t), incl. naming round-trip, request shapes, the size guard, idempotent delete, and volume_has_feature.
New live e2e harness t/e2e/snapshots.sh (parameterized, no secrets) with a units-first gate: snapshot create, data-revert rollback, resize-rollback guard, orphan-on-destroy.
Validated end-to-end on a single-node cluster and a 4-node multipath LightOS cluster (v3.19.1): online/offline snapshots, rollback-to-earlier with later snapshots surviving, real path-loss failover with auto-recovery, and multi-volume subsystem-disconnect safety — all with byte-exact data verification.

Out of scope / follow-ups

RAM/vmstate live checkpoints (disk-only snapshots for now).
File-level recovery by cloning a snapshot to a scratch volume.
A separate small PR will improve activate_volume to restore an administratively-dropped NVMe path on re-activate (multipath-redundancy nit; transient losses already auto-recover via the kernel).

Summary by CodeRabbit

Release Notes

New Features
- Added Lightbits-backed, point-in-time volume snapshots with project-scoped handling, and server-side rollback that preserves thin-provision allocation and supports fast restore via native rollback semantics.
- Improved online guest snapshot crash-consistency when the guest agent is available, and added safeguards that block rollback if the volume was resized after the snapshot.
Bug Fixes
- Improved snapshot-qualified volume name parsing/paths and enhanced snapshot cleanup behavior during volume freeing (best-effort; non-fatal).
Tests
- Added unit coverage for snapshot naming, lifecycle, rollback eligibility, snapshot info, and idempotent deletion, plus a new end-to-end snapshot/rollback workflow test.
Documentation
- Updated README to remove the “no snapshots” limitation and expanded the roadmap items.

…ts REST Implement Proxmox snapshot support backed by native LightOS snapshots: - volume_snapshot / volume_snapshot_delete / volume_snapshot_rollback / volume_rollback_is_possible / volume_snapshot_info, plus volume_has_feature and parse_volname wiring. - Native server-side rollback (PUT .../volumes/{uuid}/rollback): near-instant, no host-side data copy, and preserves the volume's thin-provisioned allocation. - Snapshot names embed the source volume UUID (snap-<uuid>-<name>) so names stay unique per project and map back to the Proxmox name without labels. - volume_rollback_is_possible refuses a rollback that would shrink a volume resized after the snapshot was taken, keeping the device and the VM config size consistent. - Idempotent snapshot delete: tolerates already-gone, mid-delete, and racing concurrent deletes by re-checking state rather than matching error strings. - free_image removes a volume's snapshots first so none are orphaned on destroy. - Online snapshots of a running guest are crash-consistent (filesystem-consistent when the guest runs qemu-guest-agent). Add unit tests (t/*.t) and a live e2e harness (t/e2e/snapshots.sh) with a units-first gate. Validated end-to-end on single-node and 4-node multipath LightOS clusters.

coderabbitai · 2026-06-14T21:17:09Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

The PR implements full volume snapshot lifecycle support in LightbitsPlugin.pm: snapshot naming helpers, _rescan_controller, extended parse_volname, updated free_image (snapshot cleanup before volume delete), a replaced volume_has_feature, and six new snapshot API methods. Unit tests and an end-to-end Bash test are added, and README/CHANGELOG are updated.

Changes

Lightbits Volume Snapshot Support

Layer / File(s)	Summary
Snapshot naming helpers, `_rescan_controller`, and `parse_volname` extension `LightbitsPlugin.pm`, `t/snap_naming.t`	Imports `timegm`; adds `_rescan_controller` for synchronous NVMe controller rescans, snapshot name encode/decode helpers, `_snapshots_for_volume`, `_snapshot_uuid_by_name`, and `_epoch_from_iso8601` for UTC timestamp parsing; extends `parse_volname` to handle `vm-<vmid>-<uuid>@<snap>` format. Tests cover round-trip encode/decode with hyphens and rejection of non-conforming names.
`volume_has_feature` replacement, `free_image` snapshot cleanup, `path` update, and resize rescan `LightbitsPlugin.pm`, `t/volume_has_feature.t`, `t/free_image_snapshots.t`	Replaces `volume_has_feature` with a nested feature-map keyed by snapshot/base/current state, changing its signature to accept `$scfg`, `$snapname`, and `$opts`; updates `free_image` to best-effort-delete all Lightbits snapshots tied to the volume UUID before the volume itself; updates `path` to extract embedded snapshot names from `volname`; changes `volume_resize` to call `_rescan_controller` for post-resize kernel sync. Tests cover all feature combinations, snapshot deletion ordering, and best-effort failure handling.
Snapshot lifecycle APIs and comprehensive unit tests `LightbitsPlugin.pm`, `t/volume_snapshot.t`, `t/volume_snapshot_delete.t`, `t/volume_rollback_is_possible.t`, `t/volume_snapshot_rollback.t`, `t/volume_snapshot_info.t`	Adds `volume_snapshot` (POST to project-scoped endpoint + poll for Available), idempotent `_delete_snapshot` and `volume_snapshot_delete` (with recheck-on-failure reconciliation), `volume_rollback_is_possible` (blocks if volume grew after snapshot), `volume_snapshot_rollback` (PUT + poll + post-rollback rescan), and `volume_snapshot_info` (client-side filters by naming scheme, sorts by timestamp, maps to Proxmox snapshot names). Each function is covered by a dedicated Perl unit test with stubbed `_api` covering success, failure, and edge cases.
End-to-end Bash test `t/e2e/snapshots.sh`	Adds environment-driven live e2e test covering snapshot creation, data-hash rollback verification (write A → snapshot → overwrite B → rollback → verify A), resize-rollback guard assertion (resized volumes block rollback), and destroy-cleanup validation (VM destroy removes storage entries). Includes unit-test gate via `prove` and safety cleanup trap.
README, CHANGELOG, and config updates `README.md`, `CHANGELOG.md`, `.coderabbit.yaml`	Removes the "No snapshots" limitation from README, adds Phase 1 roadmap entry for native server-side snapshot/rollback, narrows Phase 2 entry to focus on clones, adds CHANGELOG entry describing snapshot behavior (point-in-time, crash-consistency with `qemu-guest-agent`, server-side rollback, thin-provision preservation, no shrink-after-grow, best-effort cleanup), and configures CodeRabbit docstring checking in warning mode.

Sequence Diagram(s)

sequenceDiagram
  participant Proxmox
  participant volume_snapshot
  participant volume_snapshot_delete
  participant volume_rollback_is_possible
  participant volume_snapshot_rollback
  participant _api
  Proxmox->>volume_snapshot: snap_name
  volume_snapshot->>_api: POST .../snapshots
  volume_snapshot->>_api: GET .../snapshots/{id} (poll)
  volume_snapshot-->>Proxmox: undef or die
  Proxmox->>volume_rollback_is_possible: vol_name, snap, blockers
  volume_rollback_is_possible->>_api: GET volume + snapshot sizes
  volume_rollback_is_possible-->>Proxmox: 1 or die (resized)
  Proxmox->>volume_snapshot_rollback: vol_name, snap
  volume_snapshot_rollback->>_api: PUT .../rollback {srcSnapshotUUID}
  volume_snapshot_rollback->>_api: GET volume (poll Available)
  volume_snapshot_rollback->>_rescan_controller: nvme ns-rescan
  volume_snapshot_rollback-->>Proxmox: undef or die
  Proxmox->>volume_snapshot_delete: vol_name, snap
  volume_snapshot_delete->>_api: DELETE .../snapshots/{uuid}
  volume_snapshot_delete->>_api: GET .../snapshots/{uuid} (re-check)
  volume_snapshot_delete-->>Proxmox: undef or die

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

LightBitsLabs/pve-lightbits#9: Both PRs modify LightbitsPlugin.pm's resize flow; PR #9 adds the initial volume_resize method with nvme ns-rescan, while this PR refactors resizing to use the new _rescan_controller helper and updates volume_has_feature to support expanded snapshot-aware feature semantics.

🐰 Snapshots dance where data flows,
Rollback whispers, time rewinds,
Lightbits magic makes it so—
Point-in-time: the thread that binds,
Guard against the shrinking blow! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(snapshots): volume snapshots, rollback & listing' accurately and concisely describes the main feature additions in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 25.00% which is sufficient. The required threshold is 10.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/snapshots

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 17: The changelog statement "Freeing a volume also removes its snapshots
so none are left behind" incorrectly guarantees complete snapshot removal, but
the underlying free_image function only performs best-effort deletion and may
leave snapshots behind if a delete operation fails. Reword the sentence about
volume freeing and snapshot cleanup to clarify that snapshot deletion is
best-effort and cannot guarantee that no snapshots remain, removing the promise
of complete cleanup.

In `@LightbitsPlugin.pm`:
- Around line 920-925: The volume_snapshot_delete function incorrectly treats
all _snap_uuid failures as successful idempotent deletes, masking transient API,
authentication, and listing errors. Instead of blindly returning success
whenever eval catches an error from _snap_uuid, distinguish between a genuine
"snapshot not found" condition (which should return success for idempotency) and
actual transient errors (which should propagate as failures). Modify the code
around the eval block in volume_snapshot_delete to capture the error message
from the eval failure, check if it specifically indicates the snapshot does not
exist, and only return success in that case—otherwise die or return an error to
prevent silent failure of transient issues.

In `@t/e2e/snapshots.sh`:
- Around line 83-88: Replace the hardcoded `/tmp/lb_e2e_rb_err` path with a
dynamically created temporary file using mktemp. Create the temp file variable
(e.g., by storing the output of mktemp in a variable near the top of the
configuration section), update the qm rollback command to redirect to this
variable instead of the hardcoded path, update the grep command to read from
this variable, and add cleanup code at the end of the test (or in the overall
cleanup section) to remove the temporary file using rm to prevent accumulation
of stale files.

In `@t/volume_snapshot_rollback.t`:
- Around line 47-54: The current test only validates the happy path where the
volume is immediately Available. Add additional test cases to the existing test
file to verify the state transition behavior of the volume_snapshot_rollback
method: first, add a test case where the volume starts in a non-Available state
like Updating and transitions to Available after one or more polling attempts
(by setting $vol_state to different values across loop iterations before the
rollback call), and second, add a test case where the volume enters a terminal
failure state like Failed, Deleting, or Deleted (by setting $vol_state to one of
these values and verifying that the rollback fails fast with an appropriate
error). This will comprehensively test both the polling loop logic and the
fail-fast behavior referenced in the context.
- Around line 26-42: Add a stub for the _rescan_controller function within the
same mocking block where _api is being stubbed for
PVE::Storage::Custom::LightbitsPlugin. The stub should accept the parameters
that the actual implementation passes to it (typically storage ID and volume
UUID) and return without performing any operations. This prevents the function
from executing unintended I/O or system interactions during the test while still
satisfying the call that occurs after the rollback operation completes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3fda83f-3e69-444d-a74c-a25090868411

📥 Commits

Reviewing files that changed from the base of the PR and between 1da3a5f and bf331bc.

📒 Files selected for processing (12)

CHANGELOG.md
LightbitsPlugin.pm
README.md
t/e2e/snapshots.sh
t/free_image_snapshots.t
t/snap_naming.t
t/volume_has_feature.t
t/volume_rollback_is_possible.t
t/volume_snapshot.t
t/volume_snapshot_delete.t
t/volume_snapshot_info.t
t/volume_snapshot_rollback.t

- volume_snapshot_delete: only treat a genuine "not found" as an idempotent success; re-raise transient API/auth/listing failures so a snapshot is not silently left behind on a transient error. - CHANGELOG: state that snapshot deletion on free is best-effort instead of promising none are left behind. - t/e2e/snapshots.sh: use mktemp for the rollback error-capture file and remove it in the cleanup trap. - t/volume_snapshot_rollback.t: stub _rescan_controller to keep the test hermetic, and add a terminal-failure (Failed) case covering the fail-fast branch.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

LightbitsPlugin.pm (1)

647-651: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor embedded @snapshot names before treating a volume as current.

parse_volname now recognizes ...@<snap>, but path and volume_has_feature ignore that returned snapshot slot. A snapshot-qualified volname with no separate $snap argument can therefore be treated as the live volume.

Suggested fix

 sub path {
     my ($class, $cfg, $volname, $storeid, $snap) = `@_`;
+    my (undef, $base_volname, $vmid, undef, $parsed_snap) = $class->parse_volname($volname);
+    $snap //= $parsed_snap;
     die "Snapshots not supported by Lightbits plugin\n" if $snap;
     # Return the owning vmid so PVE frees this disk when its VM is destroyed.
-    my (undef, undef, $vmid) = $class->parse_volname($volname);
-    return (_symlink_path($storeid, _vol_uuid($volname)), $vmid, 'images');
+    return (_symlink_path($storeid, _vol_uuid($base_volname)), $vmid, 'images');
 }

-    my (undef, undef, undef, undef, undef, $isBase, $format) =
+    my (undef, undef, undef, undef, $parsed_snap, $isBase, $format) =
         $class->parse_volname($volname);
 
-    my $key = $snapname ? 'snap' : ($isBase ? 'base' : 'current');
+    $snapname //= $parsed_snap;
+    my $key = defined($snapname) && length($snapname) ? 'snap' : ($isBase ? 'base' : 'current');

Also applies to: 787-790

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@LightbitsPlugin.pm` around lines 647 - 651, The path method (and the similar
code at lines 787-790 in volume_has_feature) calls parse_volname but only
captures the vmid, ignoring the snapshot slot that parse_volname now returns.
When parse_volname is called on a volname containing an embedded snapshot
qualifier (like vol@snapshot), that snapshot information is discarded and the
volume is treated as the current/live volume. Modify both the path method and
volume_has_feature method to capture all return values from parse_volname
including the snapshot slot, then honor any embedded snapshot found in the
volname rather than only checking the explicit $snap argument.

t/e2e/snapshots.sh (1)

99-99: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t report cleanup success when pvesm list fails.

With the current if pvesm list "$STORAGE" | grep -q ...; then ...; else ok ...; fi, a pvesm list error is indistinguishable from “no VM volume found”, so the e2e can falsely pass the destroy-cleanup check.

✅ Proposed fix: distinguish list failures from empty results

-if pvesm list "$STORAGE" | grep -q "vm-${VMID}-"; then bad "volume not freed on destroy"; else ok "volume freed on destroy"; fi
+if ! PVE_LIST="$(pvesm list "$STORAGE" 2>&1)"; then
+    bad "could not list storage after destroy: $PVE_LIST"
+elif grep -q "vm-${VMID}-" <<<"$PVE_LIST"; then
+    bad "volume not freed on destroy"
+else
+    ok "volume freed on destroy"
+fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@t/e2e/snapshots.sh` at line 99, The current conditional in the pvesm list
check cannot distinguish between a command failure (when pvesm list errors) and
an empty result (when no volumes are found), causing false passes when the list
command fails. Modify the check to explicitly verify that the pvesm list command
succeeds before evaluating the grep result; if pvesm list fails, the condition
should report bad (volume not freed) instead of falling through to the ok
branch. Ensure the check handles three distinct cases: pvesm list command
failure, volumes still present, and volumes successfully freed.

♻️ Duplicate comments (2)

LightbitsPlugin.pm (1)

250-251: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate snapshot-list failures before classifying delete as idempotent.

_snapshots_for_volume still converts any _api failure into [], so _snap_uuid emits its synthetic “not found” and Line 932 returns success for transient/auth/listing failures. Let the helper die by default and keep free_image explicitly best-effort.

Suggested fix

 sub _snapshots_for_volume {
     my ($scfg, $project, $vol_uuid) = `@_`;
-    my $data = eval { _api($scfg, 'GET', "/api/v2/projects/$project/snapshots") };
-    return [] if $@ || !$data;
+    my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots");
+    return [] unless $data;
     return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid }
                 @{ $data->{snapshots} // [] } ];
 }

-    for my $s (@{ _snapshots_for_volume($scfg, $project, $uuid) }) {
+    my $snapshots = eval { _snapshots_for_volume($scfg, $project, $uuid) };
+    if ($@) {
+        warn "Lightbits: could not list snapshots of volume $uuid before delete: $@";
+    } else {
+        for my $s (@$snapshots) {
         eval { _delete_snapshot($scfg, $project, $s->{UUID}); };
         warn "Lightbits: could not delete snapshot $s->{name} ($s->{UUID}) "
             . "of volume $uuid: $@" if $@;
+        }
     }

Also applies to: 931-933

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@LightbitsPlugin.pm` around lines 250 - 251, The _snapshots_for_volume
function is suppressing all _api errors by catching exceptions and returning an
empty array, which causes transient/auth/listing failures to be silently
converted to "not found" and incorrectly classified as idempotent. Remove the
error suppression (the $@ || part) from the return statement at line 251 so that
actual _api errors propagate to the caller, while keeping the empty data check.
Only return the empty array when data is null/empty without an error. Keep
free_image as explicitly best-effort if needed, but allow _snapshots_for_volume
to let errors die by default. The same issue also applies at lines 931-933 where
similar error suppression patterns exist.

t/e2e/snapshots.sh (1)

96-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove ERR_FILE before cancelling the cleanup trap.

The temp file is removed only by cleanup, but the successful path disables the trap at Line 97 after destroying the VM, leaving $ERR_FILE behind. This is the remaining cleanup gap from the earlier temp-file review.
🧹 Proposed fix: clean up the temp file on the success path
 qm destroy "$VMID" --purge 1 >/dev/null
+rm -f "$ERR_FILE" 2>/dev/null || true
 trap - EXIT
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@t/e2e/snapshots.sh` around lines 96 - 97, The temporary file stored in
$ERR_FILE is only cleaned up through the cleanup function that is triggered by
the EXIT trap. However, at line 97 where the trap is cancelled with `trap -
EXIT`, the cleanup function will no longer execute, leaving the temporary file
behind. To fix this, explicitly remove the $ERR_FILE before disabling the trap
at line 97 to ensure the temp file is cleaned up on the success path before the
trap cancellation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@t/e2e/snapshots.sh`:
- Around line 45-48: The cleanup() function unconditionally destroys a VM using
qm destroy, which can delete an unrelated VM if VMID is misconfigured. Modify
the cleanup() function to only destroy the VM if it was actually created by this
script during the current run. Introduce a tracking mechanism (such as a flag or
variable set after successful VM creation with qm create) to ensure cleanup only
destroys VMs that this script is responsible for creating. Additionally, update
the cleanup call at line 63-64 (before VM creation) to respect this tracking so
it does not attempt to destroy a pre-existing VM.

---

Outside diff comments:
In `@LightbitsPlugin.pm`:
- Around line 647-651: The path method (and the similar code at lines 787-790 in
volume_has_feature) calls parse_volname but only captures the vmid, ignoring the
snapshot slot that parse_volname now returns. When parse_volname is called on a
volname containing an embedded snapshot qualifier (like vol@snapshot), that
snapshot information is discarded and the volume is treated as the current/live
volume. Modify both the path method and volume_has_feature method to capture all
return values from parse_volname including the snapshot slot, then honor any
embedded snapshot found in the volname rather than only checking the explicit
$snap argument.

In `@t/e2e/snapshots.sh`:
- Line 99: The current conditional in the pvesm list check cannot distinguish
between a command failure (when pvesm list errors) and an empty result (when no
volumes are found), causing false passes when the list command fails. Modify the
check to explicitly verify that the pvesm list command succeeds before
evaluating the grep result; if pvesm list fails, the condition should report bad
(volume not freed) instead of falling through to the ok branch. Ensure the check
handles three distinct cases: pvesm list command failure, volumes still present,
and volumes successfully freed.

---

Duplicate comments:
In `@LightbitsPlugin.pm`:
- Around line 250-251: The _snapshots_for_volume function is suppressing all
_api errors by catching exceptions and returning an empty array, which causes
transient/auth/listing failures to be silently converted to "not found" and
incorrectly classified as idempotent. Remove the error suppression (the $@ ||
part) from the return statement at line 251 so that actual _api errors propagate
to the caller, while keeping the empty data check. Only return the empty array
when data is null/empty without an error. Keep free_image as explicitly
best-effort if needed, but allow _snapshots_for_volume to let errors die by
default. The same issue also applies at lines 931-933 where similar error
suppression patterns exist.

In `@t/e2e/snapshots.sh`:
- Around line 96-97: The temporary file stored in $ERR_FILE is only cleaned up
through the cleanup function that is triggered by the EXIT trap. However, at
line 97 where the trap is cancelled with `trap - EXIT`, the cleanup function
will no longer execute, leaving the temporary file behind. To fix this,
explicitly remove the $ERR_FILE before disabling the trap at line 97 to ensure
the temp file is cleaned up on the success path before the trap cancellation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: af60c854-53ca-42c0-8491-f8cadef556ba

📥 Commits

Reviewing files that changed from the base of the PR and between bf331bc and b564376.

📒 Files selected for processing (4)

CHANGELOG.md
LightbitsPlugin.pm
t/e2e/snapshots.sh
t/volume_snapshot_rollback.t

The plugin documents its subs with # comment blocks, matching PVE's own storage plugins (no per-sub POD). CodeRabbit's docstring scorer does not credit these comments, so keep its pre-merge check as a non-blocking warning with a low threshold instead of adding non-idiomatic POD.

…g them _snapshots_for_volume previously swallowed every _api error into an empty list, so a transient/auth/listing failure (e.g. an expired JWT) was indistinguishable from "no snapshots". That let volume_snapshot_delete treat the failure as an idempotent success and silently leave the snapshot orphaned on the cluster (security review finding 1.1). It now lets a real listing failure propagate (returning [] only on a genuinely empty listing), so _snap_uuid raises the actual API error rather than a fabricated "not found" — which is what makes volume_snapshot_delete's "not found" idempotency check correct. free_image wraps the listing in eval + warn to keep freeing a volume best-effort (a listing failure must never block qm destroy --purge).

cleanup() ran an unconditional 'qm destroy $VMID' from both the EXIT trap and the pre-create step, so a misconfigured VMID could delete an unrelated guest. Gate every destroy on the VM being named lb-snap-e2e (is_our_vm), and abort up front if VMID is already taken by a VM we do not own. A leftover test VM from a prior aborted run is still reclaimed. Validated live on PVE 9.2.2: happy path 4/4, and a decoy VM with a different name survives + the script aborts with exit 1.

…en e2e cleanup check Address remaining CodeRabbit findings on PR #14: - path() and volume_has_feature() captured only the vmid/isBase slots from parse_volname and ignored the snapshot slot, so a snapshot-qualified volname (vm-<vmid>-<uuid>@<snap>) passed without a separate $snap argument was treated as the live volume. Both now default $snap/$snapname to the parsed snap. - t/e2e/snapshots.sh destroy-cleanup check now distinguishes a 'pvesm list' failure from an empty result, so a list error no longer falsely passes.

roiyz-lb · 2026-06-15T11:46:35Z

@coderabbitai review

Pushed 7d5209c addressing both outstanding findings from the latest review:

path and volume_has_feature now honor a snapshot embedded in the volname (slot 5 from parse_volname) instead of treating it as the live volume.
t/e2e/snapshots.sh destroy-cleanup check now distinguishes a pvesm list failure from an empty result.

Validated: 186/186 unit tests pass; live PVE 9.2.2 e2e 4/4 pass with no cluster-side orphans.

coderabbitai · 2026-06-15T11:46:52Z

@roiyz-lb Thanks for the detailed update on the fixes and the validation results (186/186 unit tests, 4/4 live e2e). Triggering a fresh review now to verify the changes in commit 7d5209c.

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

t/volume_has_feature.t (1)

23-35: ⚡ Quick win

Add a regression assertion for embedded @snap volnames.

Current checks only vary $snapname; they don’t exercise the parse_volname fallback path (vm-...@snap with $snapname unset) that was fixed in this cycle.

Suggested test addition

 ok(  $has->('snapshot', undef), "snapshot allowed on the current volume (snapname undef)" );
 ok( !$has->('snapshot', 'snap1'), "no nested snapshots (snapshot of a snapshot is refused)" );
+ok(
+    !$class->volume_has_feature(
+        $scfg, 'snapshot', 'lb-storage', "vm-100-$UUID\`@snap1`", undef, 0
+    ),
+    'embedded `@snap` in volname is treated as snapshot context (no nested snapshot)'
+);
 ok(  $has->('copy', undef),   'copy allowed on the current volume' );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@t/volume_has_feature.t` around lines 23 - 35, The current test suite only
exercises the case where $snapname is explicitly passed as a parameter to the
$has subroutine; it does not test the parse_volname fallback path for volnames
with embedded snapshots in the format volname@snapname with $snapname unset. Add
one or more regression assertions after the existing ok() calls that test this
embedded snapshot scenario by either modifying the $volname variable to include
an embedded snapshot (e.g., volname@snap1) and calling $has with snapname undef,
or by creating an additional helper subroutine that accepts volname as a
parameter. This ensures the parse_volname fallback code path is properly
exercised in the tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@LightbitsPlugin.pm`:
- Around line 254-259: The `_snapshots_for_volume` function blindly trusts the
response from `_api` without validating success. Since `_api` normalizes HTTP
404 errors to `{}`, the line `@{ $data->{snapshots} // [] }` will silently
return an empty array when the API call fails, making downstream delete
operations appear idempotently successful even though snapshots still exist.
Validate the API response in `_snapshots_for_volume` to ensure it contains valid
snapshot data before treating an empty snapshots key as a successful empty list.
This ensures the function properly distinguishes between "no snapshots exist"
and "API call failed".

In `@t/e2e/snapshots.sh`:
- Around line 46-55: The is_our_vm function incorrectly relies on the VM name
(TEST_VM_NAME) as proof of ownership, but this metadata is user-controlled and
could lead to destruction of unrelated VMs. Instead of checking whether a VM
matches a given name, implement explicit tracking of VMs created by this script
run. Record the VMID when the VM is created (likely near where qm create is
called), and modify is_our_vm to verify that the VMID matches a VM created in
the current run. This ensures cleanup only destroys VMs that this script
instance actually created, eliminating the risk of destroying unrelated VMs even
if they happen to have the same configured name. Update both the is_our_vm
function logic and the cleanup function call to use this explicit ownership
tracking instead of name-based matching.
- Around line 54-56: The cleanup function is removing the mktemp file (ERR_FILE)
prematurely, before it's needed for writing error information and checking for
rollback. Split the cleanup function into two separate functions: one for VM
cleanup (qm destroy) that can be called early at line 78, and keep the temp file
removal separate so it happens after the rollback check at lines 102-106
completes. The temp file cleanup should be called at the end around lines
111-112 to ensure ERR_FILE persists through the entire error checking and
rollback logic.

---

Nitpick comments:
In `@t/volume_has_feature.t`:
- Around line 23-35: The current test suite only exercises the case where
$snapname is explicitly passed as a parameter to the $has subroutine; it does
not test the parse_volname fallback path for volnames with embedded snapshots in
the format volname@snapname with $snapname unset. Add one or more regression
assertions after the existing ok() calls that test this embedded snapshot
scenario by either modifying the $volname variable to include an embedded
snapshot (e.g., volname@snap1) and calling $has with snapname undef, or by
creating an additional helper subroutine that accepts volname as a parameter.
This ensures the parse_volname fallback code path is properly exercised in the
tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d0f02d0-4682-4cc6-aac1-404e130b4f35

📥 Commits

Reviewing files that changed from the base of the PR and between 1da3a5f and 7d5209c.

📒 Files selected for processing (13)

.coderabbit.yaml
CHANGELOG.md
LightbitsPlugin.pm
README.md
t/e2e/snapshots.sh
t/free_image_snapshots.t
t/snap_naming.t
t/volume_has_feature.t
t/volume_rollback_is_possible.t
t/volume_snapshot.t
t/volume_snapshot_delete.t
t/volume_snapshot_info.t
t/volume_snapshot_rollback.t

coderabbitai · 2026-06-15T11:55:56Z

+sub _snapshots_for_volume {
+    my ($scfg, $project, $vol_uuid) = @_;
+    my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots");
+    return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid }
+                @{ $data->{snapshots} // [] } ];
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate snapshot-list payloads before interpreting them as empty.

Line 256 currently trusts _api blindly. Because _api normalizes HTTP 404 to {}, this helper can return [] on listing failure, which makes downstream delete paths look idempotently successful while snapshots remain.

Suggested fix

sub _snapshots_for_volume { my ($scfg, $project, $vol_uuid) = `@_`; my $data = _api($scfg, 'GET', "/api/v2/projects/$project/snapshots"); - return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid } - @{ $data->{snapshots} // [] } ]; + die "Lightbits snapshot listing failed for volume $vol_uuid: malformed response\n" + if ref($data) ne 'HASH' || ref($data->{snapshots}) ne 'ARRAY'; + return [ grep { ($_->{sourceVolumeUUID} // '') eq $vol_uuid } + @{ $data->{snapshots} } ]; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@LightbitsPlugin.pm` around lines 254 - 259, The `_snapshots_for_volume` function blindly trusts the response from `_api` without validating success. Since `_api` normalizes HTTP 404 errors to `{}`, the line `@{ $data->{snapshots} // [] }` will silently return an empty array when the API call fails, making downstream delete operations appear idempotently successful even though snapshots still exist. Validate the API response in `_snapshots_for_volume` to ensure it contains valid snapshot data before treating an empty snapshots key as a successful empty list. This ensures the function properly distinguishes between "no snapshots exist" and "API call failed".

coderabbitai · 2026-06-15T11:55:56Z

+# True only if $VMID exists AND is named $TEST_VM_NAME — i.e. a VM this script
+# owns (the one it created, or a leftover from a prior aborted run). This guards
+# every destroy so a misconfigured VMID can never delete an unrelated guest.
+is_our_vm() {
+    local name
+    name="$(qm config "$VMID" 2>/dev/null | awk -F': ' '/^name:/{print $2; exit}')" || return 1
+    [ "$name" = "$TEST_VM_NAME" ]
+}
+cleanup() {
+    if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not treat the VM name as proof of ownership.

lb-snap-e2e is user-controlled metadata, so Line 74 allows Line 78 to purge any existing VM with that name. Track VMs created by this run, and make leftover reclamation explicit instead of automatic.

🛡️ Safer ownership pattern

TEST_VM_NAME="lb-snap-e2e" # ownership marker: we only ever destroy a VM with this name +RECLAIM_TEST_VM="${RECLAIM_TEST_VM:-0}" +CREATED_VM=0 @@ cleanup() { - if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi + if [ "$CREATED_VM" = 1 ] && is_our_vm; then + qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true + fi rm -f "$ERR_FILE" 2>/dev/null || true } @@ -if qm config "$VMID" >/dev/null 2>&1 && ! is_our_vm; then - echo "ABORT: VM $VMID already exists and is not the test VM '$TEST_VM_NAME'; refusing to destroy it. Set VMID to an unused id." >&2 - exit 1 +if qm config "$VMID" >/dev/null 2>&1; then + if is_our_vm && [ "$RECLAIM_TEST_VM" = 1 ]; then + qm destroy "$VMID" --purge 1 >/dev/null + else + echo "ABORT: VM $VMID already exists; set VMID to an unused id or RECLAIM_TEST_VM=1 for a known disposable leftover." >&2 + exit 1 + fi fi -cleanup qm create "$VMID" --memory 512 --scsihw virtio-scsi-single --name "$TEST_VM_NAME" >/dev/null +CREATED_VM=1

Also applies to: 72-79

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@t/e2e/snapshots.sh` around lines 46 - 55, The is_our_vm function incorrectly relies on the VM name (TEST_VM_NAME) as proof of ownership, but this metadata is user-controlled and could lead to destruction of unrelated VMs. Instead of checking whether a VM matches a given name, implement explicit tracking of VMs created by this script run. Record the VMID when the VM is created (likely near where qm create is called), and modify is_our_vm to verify that the VMID matches a VM created in the current run. This ensures cleanup only destroys VMs that this script instance actually created, eliminating the risk of destroying unrelated VMs even if they happen to have the same configured name. Update both the is_our_vm function logic and the cleanup function call to use this explicit ownership tracking instead of name-based matching.

coderabbitai · 2026-06-15T11:55:56Z

+cleanup() {
+    if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi
+    rm -f "$ERR_FILE" 2>/dev/null || true


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the mktemp file reserved until after the rollback check.

Line 78 calls cleanup, so Line 56 deletes the secure mktemp file before Line 102 writes to it; the redirect then recreates that /tmp path and Line 112 prevents cleanup from removing it on success. Split VM cleanup from temp cleanup.

🧹 Split VM cleanup from temp-file cleanup

+cleanup_vm() { + if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi +} + cleanup() { - if is_our_vm; then qm destroy "$VMID" --purge 1 >/dev/null 2>&1 || true; fi + cleanup_vm rm -f "$ERR_FILE" 2>/dev/null || true } @@ -cleanup +cleanup_vm qm create "$VMID" --memory 512 --scsihw virtio-scsi-single --name "$TEST_VM_NAME" >/dev/null @@ qm destroy "$VMID" --purge 1 >/dev/null -trap - EXIT +rm -f "$ERR_FILE" 2>/dev/null || true +trap - EXIT

Also applies to: 78-78, 102-106, 111-112

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@t/e2e/snapshots.sh` around lines 54 - 56, The cleanup function is removing the mktemp file (ERR_FILE) prematurely, before it's needed for writing error information and checking for rollback. Split the cleanup function into two separate functions: one for VM cleanup (qm destroy) that can be called early at line 78, and keep the temp file removal separate so it happens after the rollback check at lines 102-106 completes. The temp file cleanup should be called at the end around lines 111-112 to ensure ERR_FILE persists through the entire error checking and rollback logic.

…olname Regression coverage for the parse_volname fallback path fixed in 7d5209c: a snapshot-qualified volname (vm-<vmid>-<uuid>@<snap>) with no separate $snapname argument must be treated as snapshot context, so nested snapshots stay refused.

@snap

Both findings from this review (parse_volname @snap handling in path/volume_has_feature, and the e2e pvesm-list cleanup check) are resolved in 7d5209c, with regression coverage added in 29da440. Validated: 187/187 unit tests pass and a live PVE 9.2.2 e2e (4/4) with no cluster-side orphans. Remaining re-scan comments are non-blocking nitpicks / a deliberately-deferred hardening edge. Dismissing the stale changes-requested verdict to unblock merge.

coderabbitai Bot previously requested changes Jun 14, 2026

View reviewed changes

Comment thread CHANGELOG.md Outdated

Comment thread LightbitsPlugin.pm Outdated

Comment thread t/e2e/snapshots.sh Outdated

Comment thread t/volume_snapshot_rollback.t

Comment thread t/volume_snapshot_rollback.t

coderabbitai Bot previously requested changes Jun 15, 2026

View reviewed changes

Comment thread t/e2e/snapshots.sh

roiyz-lb added 4 commits June 15, 2026 09:47

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

roiyz-lb merged commit e299cb6 into main Jun 29, 2026
2 checks passed

Uh oh!

Conversation

roiyz-lb commented Jun 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Testing

Out of scope / follow-ups

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

roiyz-lb commented Jun 15, 2026

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

roiyz-lb commented Jun 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading