Add upgradePolicy to NVIDIADriver CRD #2582
LGTM. I am wondering about backward compatibility. Existing user-defined NVIDIADriver CRs won't have an upgrade configuration; previously they defaulted to the config in ClusterPolicy. If a custom upgrade config was set in ClusterPolicy and a user-created NVIDIADriver CR relied on it, then after upgrading to the newer version that CR will start using the default config we specify. Will we document this as a breaking change, or at least as something users of NVIDIADriver CRs should be aware of when they jump to v26.7.0?
rajathagasthya
left a comment
Just a couple of minor comments. Overall LGTM.
Good observation. If users depend on a custom upgradePolicy from ClusterPolicy, then yes, this is a breaking change for user-created NVIDIADriver CRs. We will need to call this out in our release notes.
The upgradePolicy influences how the driver-upgrade controller upgrades GPU driver daemonsets. Adding this field to the NVIDIADriver CRD allows users to define different upgrade policies for different NVIDIADriver CRs. If the field is nil or empty, the driver-upgrade controller falls back to a default upgradePolicy defined in code, which aligns with the defaults in our Helm values.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
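The nil-or-empty fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `UpgradePolicy` and `NVIDIADriverSpec` types are trimmed-down hypothetical stand-ins, `defaultUpgradePolicy` and `effectiveUpgradePolicy` are made-up helper names, and the default values shown are placeholders assumed for the example rather than confirmed Helm defaults.

```go
package main

import "fmt"

// UpgradePolicy is a hypothetical, trimmed-down stand-in for the real type.
// Field names and types are assumptions made for this sketch.
type UpgradePolicy struct {
	AutoUpgrade         bool
	MaxParallelUpgrades int
	MaxUnavailable      string
}

// NVIDIADriverSpec is a hypothetical subset of the NVIDIADriver CR spec.
type NVIDIADriverSpec struct {
	UpgradePolicy *UpgradePolicy
}

// defaultUpgradePolicy returns the code-defined default.
// The values are illustrative placeholders, not confirmed defaults.
func defaultUpgradePolicy() UpgradePolicy {
	return UpgradePolicy{AutoUpgrade: true, MaxParallelUpgrades: 1, MaxUnavailable: "25%"}
}

// effectiveUpgradePolicy returns the policy set on the CR, or the default
// when the field is nil or equal to the zero value ("empty").
func effectiveUpgradePolicy(spec NVIDIADriverSpec) UpgradePolicy {
	if spec.UpgradePolicy == nil || *spec.UpgradePolicy == (UpgradePolicy{}) {
		return defaultUpgradePolicy()
	}
	return *spec.UpgradePolicy
}

func main() {
	// A CR created before this change carries no upgradePolicy at all.
	legacy := NVIDIADriverSpec{}
	// A CR that opts into its own policy.
	custom := NVIDIADriverSpec{UpgradePolicy: &UpgradePolicy{
		AutoUpgrade: true, MaxParallelUpgrades: 2, MaxUnavailable: "50%",
	}}

	fmt.Printf("legacy CR: %+v\n", effectiveUpgradePolicy(legacy))
	fmt.Printf("custom CR: %+v\n", effectiveUpgradePolicy(custom))
}
```

In this sketch the legacy CR never consults ClusterPolicy, which is the backward-compatibility concern raised in review. Treating an all-zero struct as "empty" also means that a policy which only sets `AutoUpgrade: false` would be indistinguishable from an unset one; whether the real controller behaves that way is not stated here.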
Code changes in this PR were drafted with the assistance of Claude Code.
Testing
I tested the following scenarios on a k8s cluster with two GPUs:
1. a) Setting driver.useNvidiaDriverCRD=true in ClusterPolicy; b) creating a default NVIDIADriver CR. Verify that ClusterPolicy-managed pods get orphaned and then deleted by the upgrade controller in a rolling fashion.
2. Changing spec.version. Verify that the upgrade controller upgrades pods in a rolling fashion.