fix(device-plugin): do not abort plugin start in case no device is found #16

Merged
dkeven merged 1 commit into feat/nvshare from device/fix/start_no_device
Mar 12, 2026

dkeven commented Mar 12, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:
In #13, support for hot plug/unplug was added to the device plugin.
In rare cases, the NVIDIA driver loses control of the only GPU and reinitializes it shortly afterward. If this happens while the device plugin is starting, before the reinitialization has finished, NVML may report that no devices are found, which causes the device plugin to abort its start. As a result, the GPU is still not reported by the device plugin after the reinitialization finishes. This PR makes the no-device case also eligible for restarting the device plugin after a delay.
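
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `errNoDevices`, `startWithRetry`, the `start` callback, and the `maxAttempts` cap are all hypothetical names and choices made for the example, and the real plugin may retry without any cap.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoDevices is a hypothetical sentinel meaning "device enumeration
// returned zero GPUs". The actual plugin may signal this differently.
var errNoDevices = errors.New("no devices found")

// startWithRetry calls start until it succeeds. A "no devices" result is
// treated as transient: the function sleeps for delay and tries again
// instead of aborting. Any other error, or running out of attempts,
// ends the loop. It returns the number of attempts made.
// The maxAttempts cap is an assumption added for this sketch.
func startWithRetry(start func() error, delay time.Duration, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	for attempt := 1; ; attempt++ {
		err := start()
		if err == nil {
			return attempt, nil
		}
		// Only the no-device case is retried here; other failures
		// are returned to the caller immediately.
		if !errors.Is(err, errNoDevices) || attempt == maxAttempts {
			return attempt, fmt.Errorf("start attempt %d: %w", attempt, err)
		}
		// Give the driver time to finish reinitializing the GPU.
		time.Sleep(delay)
	}
}
```

A caller would pass its own start routine, for example `startWithRetry(plugin.Start, 5*time.Second, 10)`, where `plugin.Start` and the chosen values are likewise placeholders.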

@dkeven dkeven merged commit 04c1579 into feat/nvshare Mar 12, 2026
1 check passed
@dkeven dkeven deleted the device/fix/start_no_device branch March 12, 2026 07:27