Description
Hi, I have encountered an issue where the LED state becomes abnormal during NVMe hot-plugging.
During NVMe hot-plug operations on systems with native NVMe multipathing enabled (nvme_core.multipath=Y), ledmon fails to track device state changes correctly. Specifically, when a drive is removed and re-inserted, the LED state becomes stuck in FAILURE. This happens because ledmon stores the physical PCI sysfs path, but udev reports events using the virtual subsystem path (e.g., /sys/devices/virtual/nvme-subsystem/...). Because of this path mismatch, ledmon cannot find the matching block device in its internal list, so the state never resets to NORMAL.
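To illustrate the mismatch, here is a minimal Python sketch (not ledmon code, which is written in C) of one possible way the two paths could be reconciled. It relies on the kernel naming convention visible in the logs: with native multipathing, the per-controller namespace is named nvme<subsystem>c<controller>n<namespace> (nvme1c1n1), while the multipath head node that udev reports is nvme<subsystem>n<namespace> (nvme1n1). The function names head_node_name and same_namespace are hypothetical, and comparing only the last path component is a simplifying assumption.

```python
import re
from pathlib import PurePosixPath

# Per-controller namespace name under native multipath:
# nvme<subsystem>c<controller>n<namespace>, e.g. nvme1c1n1.
_PER_CONTROLLER_NODE = re.compile(r"^nvme(\d+)c\d+n(\d+)$")


def head_node_name(sysfs_path: str) -> str:
    """Map the last component of a sysfs path to its multipath head name.

    nvme1c1n1 becomes nvme1n1; any other name is returned unchanged.
    """
    name = PurePosixPath(sysfs_path).name
    match = _PER_CONTROLLER_NODE.match(name)
    if match:
        return f"nvme{match.group(1)}n{match.group(2)}"
    return name


def same_namespace(stored_path: str, udev_syspath: str) -> bool:
    """Compare a stored physical path with a udev syspath by head node name."""
    return head_node_name(stored_path) == head_node_name(udev_syspath)
```

With the paths from the logs, the stored path ending in nvme1c1n1 and the udev syspath ending in nvme1n1 both normalize to nvme1n1, so a lookup keyed this way would find the device where a plain string comparison of the full paths does not.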
Steps to reproduce bug
- Use an NVMe SSD that supports Multi-controller Capabilities (CMIC).
- Ensure the kernel is running with nvme_core.multipath=Y (default in many modern distributions).
- Start ledmon in debug mode to monitor state transitions.
- Perform a hot-remove of the NVMe drive.
- Re-insert (hot-plug) the same NVMe drive.
- Check the LED state using ledctl --list-slots --controller-type VMD
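Before reproducing, it may help to confirm that native multipathing is actually active. The sketch below reads the nvme_core module parameter from sysfs; the function name native_multipath_enabled is hypothetical, and the sketch assumes the parameter file reads "Y" when multipathing is on and is absent when the kernel was built without multipath support.

```python
from pathlib import Path

# Module parameter exposed by the nvme_core driver.
DEFAULT_PARAM_FILE = "/sys/module/nvme_core/parameters/multipath"


def native_multipath_enabled(param_file: str = DEFAULT_PARAM_FILE) -> bool:
    """Return True if the nvme_core multipath parameter reads 'Y'."""
    try:
        return Path(param_file).read_text().strip() == "Y"
    except OSError:
        # File missing or unreadable: treat as not enabled.
        return False


if __name__ == "__main__":
    print("native NVMe multipath enabled:", native_multipath_enabled())
```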
Expected behavior
When the drive is removed, the LED state should update accordingly (e.g., to an empty or neutral state). Upon re-insertion, the state should return to NORMAL.
Actual behavior
The LED state transitions to FAILURE after removal and remains in FAILURE even after the drive is plugged back in. The logs show "No matching block device found" for both the remove and the add events, because the stored physical path never matches the virtual path that udev reports.
Environment
OS: Linux 6.8.0-83-generic
Disks: NVMe SSD with Multi-controller support (CMIC enabled)
Kernel Parameters: nvme_core.multipath=Y
Ledmon version
Intel(R) Enclosure LED Monitor Service 1.1.0
Ledmon logs
Initial State (Drive Attached)
ledmon[4902]: NEW /sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1: state 'ONESHOT_NORMAL'.
ledmon[4902]: CHANGE /sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1: from 'ONESHOT_NORMAL' to 'UNKNOWN'
root@SKY-9236-efi-N1:~/ledmon/src/ledctl# ./ledctl --list-slots --controller-type VMD
slot: 19 led state: NORMAL device: /dev/nvme1n1
Drive Removal (Path Mismatch Occurs)
ledmon[4902]: === UDEV Event: action=remove, syspath=/sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1 ===
ledmon[4902]: Searching for matching block device in list...
ledmon[4902]: Comparing: stored sysfs_path=/sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1 with udev syspath=/sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1
ledmon[4902]: No matching block device found for syspath: /sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1
ledmon[4902]: CHANGE /sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1: from 'UNKNOWN' to 'FAILURE'.
root@SKY-9236-efi-N1:~/ledmon/src/ledctl# ./ledctl --list-slots --controller-type VMD
slot: 19 led state: FAILURE device: /dev/nvme1n1
Drive Re-insertion (State Remains FAILURE)
ledmon[4902]: CHANGE /sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1: from 'FAILURE' to 'UNKNOWN'
ledmon[4902]: === UDEV Event: action=add, syspath=/sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1 ===
ledmon[4902]: Searching for matching block device in list...
ledmon[4902]: Comparing: stored sysfs_path=/sys/devices/pci0000:d5/0000:d5:00.5/pci10003:00/10003:00:08.0/10003:0d:00.0/nvme/nvme1/nvme1c1n1 with udev syspath=/sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1
ledmon[4902]: No matching block device found for syspath: /sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1
ledmon[4902]: === UDEV Event: action=change, syspath=/sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n1 ===
root@SKY-9236-efi-N1:~/ledmon/src/ledctl# ./ledctl --list-slots --controller-type VMD
slot: 19 led state: FAILURE device: /dev/nvme1n1
Ledctl logs
No response
Ledmon supported controllers
VMD (Intel Volume Management Device)
Additional information
No response