Skip to content

feat: NVMe-oF multipath support (head-device discovery + multi-endpoint connect)#12

Merged
roiyz-lb merged 4 commits into
mainfrom
fix/nvme-multipath-discovery
Jun 11, 2026
Merged

feat: NVMe-oF multipath support (head-device discovery + multi-endpoint connect)#12
roiyz-lb merged 4 commits into
mainfrom
fix/nvme-multipath-discovery

Conversation

@roiyz-lb

@roiyz-lb roiyz-lb commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

Makes the plugin work correctly with native NVMe-oF multipath against a multi-node LightOS (ANA) cluster. Three related changes:

  1. Multipath head-device resolution (_find_nvme_device): resolve the namespace head device (/dev/nvme<C>n<N>) from /sys/block, skipping per-path nvme<C>c<P>n<N> entries (which have no /dev node). The previous /sys/class/nvme walk built a device name from a path controller, which only equals the head with a single path and returned a non-existent node under multipath.
  2. Multi-endpoint connect (activate_volume): lb_nvme_host accepts a comma-separated list of host:port data endpoints; the plugin connects to each (per-endpoint gating via _connected_endpoints, so a path that failed once is retried later instead of leaving the volume permanently single-path). On a multi-node ANA cluster this guarantees the volume's optimized path is present — a single connection can land on a non-optimized path and the namespace then never becomes accessible — and gives transparent failover when a node goes down. Reconnect tuning: --keep-alive-tmo=30 --reconnect-delay=10 --ctrl-loss-tmo=-1.
  3. Safe teardown (deactivate_volume): disconnect the subsystem only when no volume of any storage on the host still maps it, decided from local symlinks rather than the REST API (nvme disconnect -n is subsystem-wide, so the old per-storeid/API-derived check could drop in-use paths for a second storage sharing the cluster, or fire on a transient API error).

Testing

Unit: 114 tests pass (taint-clean, perl -T -c), including new t/find_nvme_device.t (head vs per-path selection, multi-controller→one head, non-multipath, older-kernel subsysnqn fallback, nsid/NQN/no-/dev-node negatives) and t/nvme_endpoints.t (comma split, whitespace, default port, bracketed IPv6).

Extensive end-to-end testing was performed on a real 3-node LightOS cluster (PVE 9.2.2 guest, 3 data paths), covering positive, negative, and simulated-failover scenarios:

  • Positive: Proxmox VM started with an lb-storage disk → plugin connected all 3 data nodes → single multipath head device resolved and used by QEMU; multipath I/O verified (read/write); clean disconnect on VM stop.
  • Negative: single-endpoint connect against a volume whose ANA-optimized path is a different node correctly reproduces the "device did not appear" failure that multi-endpoint connect fixes; per-path (cN) devices with no /dev node are never returned; lone per-path entries, nsid/NQN mismatches, and missing-/dev-node heads all resolve to "not found" rather than a wrong device.
  • Simulated failover: with a VM running on the multipath disk, the ANA-optimized node was rebooted (real node failure). I/O paused briefly (no error, ctrl-loss-tmo=-1), the cluster re-optimized ANA to another replica, the head device survived, and I/O resumed at full throughput; the node rejoined as a healthy path afterward. Replica-count and single-vs-multi-replica path behavior were also exercised.

Security review

Reviewed by the security agent with a specific focus on data-loss when multiple clusters/storages share one Proxmox host; all findings addressed (safe subsystem teardown, per-endpoint connect, IPv6 parsing, no PII in tests). Verified clean to merge.

Notes

  • For the multi-endpoint skip to match cleanly, configure lb_nvme_host with IP endpoints.
  • Follow-up (deferred): match the namespace NGUID to the volume UUID in _find_nvme_device to close a narrow nsid-reuse race.

Summary by CodeRabbit

  • New Features

    • NVMe-oF targets can be configured with multiple host:port endpoints for transparent failover and preserved optimized paths.
  • Bug Fixes

    • More reliable multipath device detection by resolving the correct namespace “head” device.
    • Prevented accidental subsystem disconnects when other volumes on the host still reference the subsystem.

roiyz-lb added 3 commits June 11, 2026 10:31
Under native NVMe multipath (CONFIG_NVME_MULTIPATH=Y, the kernel default)
each namespace appears in /sys/block both as the multipath head
(nvme<C>n<N>, which has a /dev node) and as one per-path device
(nvme<C>c<P>n<N>, which does not). The previous /sys/class/nvme walk built
the device name from a path controller's number, which only equals the head
when there is a single path; with multiple paths it produced a name with no
/dev node and discovery failed.

Rewrite the resolver to enumerate /sys/block, consider only head namespaces,
match by subsystem NQN and nsid, and return the head's /dev node. Factor the
sysfs/dev roots and the block-device test into overridable hooks so the logic
is unit-testable, and add t/find_nvme_device.t (head selection vs per-path,
multi-path collapse, non-multipath, older-kernel subsysnqn fallback, and the
nsid/NQN/no-dev-node negatives).

Verified on PVE 9.2.2 (multipath=Y): attach resolves to the head /dev/nvme0n1,
not the per-path nvme0c0n1. Multi-path positive/negative (multiple live paths,
failover) still needs a multi-node cluster to validate.
activate_volume now connects to every endpoint in lb_nvme_host (a
comma-separated host:port list) instead of just one. On a multi-node
LightOS (ANA) cluster a single connection can land on a non-optimized
path, so the volume's namespace never becomes accessible and attach
fails with 'block device did not appear'; connecting to all data nodes
guarantees the optimized path is present, and gives transparent failover
to another replica when a node goes down.

Factor endpoint parsing into _nvme_endpoints (comma split, whitespace
trim, default port 4420, rightmost-colon so bracketed IPv6 works) and add
t/nvme_endpoints.t. Pairs with the multipath head-device resolution fix.

Validated end-to-end on a 3-node cluster: VM on Proxmox over 3 paths,
head device resolved, and a live reboot of the optimized node failed over
to another replica with I/O resuming after a brief pause.
…nnect, IPv6, PII

From the security review of the multipath branch:

- deactivate_volume: disconnect the subsystem only when no volume of ANY
  storage on the host still maps it, decided from local symlinks instead of
  the per-storeid API listing. The old check could (a) fire on a transient
  API error and (b) ignore a second storage sharing the same cluster — and
  since 'nvme disconnect -n' is subsystem-wide (drops every path), that tore
  down in-flight I/O for other volumes/storages. Adds _nqn_still_in_use.

- activate_volume: gate the multi-endpoint connect per endpoint (via
  _connected_endpoints) instead of per subsystem. The old _is_connected gate
  meant that if some endpoints failed on first activation the volume stayed
  permanently single-path, so a later node failure (ctrl-loss-tmo=-1) would
  hang VMs instead of failing over.

- _nvme_endpoints: strip IPv6 brackets ([v6]:port / [v6]) and untaint all
  branches; update the test to expect the unbracketed host.

- t/nvme_endpoints.t: replace real lab data-NIC IPs with 10.0.0.x.

Re-validated on the 3-node cluster: VM start connects all 3 paths, head
device resolves, QEMU uses it; VM stop disconnects cleanly.
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4e89d9b0-174c-4b7c-a69a-c70d8119b846

📥 Commits

Reviewing files that changed from the base of the PR and between 484e2cd and 5286018.

📒 Files selected for processing (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

Adds multipath NVMe-oF support: parse comma-separated lb_nvme_host endpoints, ensure connection to all endpoints during activation, resolve multipath head namespace devices via /sys/block enumeration, and prevent unsafe subsystem disconnects using local symlink checks.

Changes

NVMe-oF Multipath and Safe Deactivation

Layer / File(s) Summary
Endpoint parsing and configuration
LightbitsPlugin.pm (62–84), t/nvme_endpoints.t
_nvme_endpoints parses lb_nvme_host into [host, port] pairs with whitespace trimming, default port 4420, and IPv6 bracket handling; tests validate parsing and edge cases.
Sysfs utilities and test infrastructure
LightbitsPlugin.pm (86–138)
Adds _read_sysfs and _connected_endpoints to inspect /sys/class/nvme, introduces overridable $SYS_BLOCK and $DEV_DIR, and provides _dev_path and _is_block helpers for testable filesystem interactions.
Multipath-safe device resolution
LightbitsPlugin.pm (139–182), t/find_nvme_device.t
Refactors _find_nvme_device to enumerate /sys/block head namespace entries, match on subsystem NQN and nsid, and return /dev block devices only when present; tests exercise multipath head selection, fallbacks, and negative cases.
Volume activation with multipath endpoints
LightbitsPlugin.pm (585–615)
activate_volume iterates parsed endpoints, skips already-connected endpoints, runs nvme connect per endpoint with reconnect options, and waits for subsystem connectivity before device lookup.
Safe volume deactivation with local checks
LightbitsPlugin.pm (633–667)
Adds _nqn_still_in_use scanning /dev/lightbits/*/* symlinks to detect local usage; deactivate_volume disconnects the subsystem only when no local symlink references the NQN, replacing the previous API-derived check.
Changelog documentation
CHANGELOG.md (18, 30–31)
Documents the added comma-separated multipath endpoint support and changed multipath-safe device discovery and conditional subsystem disconnection behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely captures the main change: introducing NVMe-oF multipath support through head-device discovery and multi-endpoint connection logic.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 18: Replace the inconsistent "Multi-path" phrasing with the standardized
lowercase, no-hyphen term "multipath" across the changelog; specifically update
the header text "Multi-path NVMe-oF: `lb_nvme_host` accepts..." to read
"multipath NVMe-oF" and verify other occurrences (e.g., "multipath head" and
"native NVMe multipath") match this same lowercase, no-hyphen style for
consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3e27d4be-fc7f-4efc-80fc-48a43aeda7ac

📥 Commits

Reviewing files that changed from the base of the PR and between 142774a and 484e2cd.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • LightbitsPlugin.pm
  • t/find_nvme_device.t
  • t/nvme_endpoints.t

Comment thread CHANGELOG.md Outdated
@roiyz-lb roiyz-lb merged commit d83e0b3 into main Jun 11, 2026
2 checks passed
@roiyz-lb roiyz-lb deleted the fix/nvme-multipath-discovery branch June 11, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant