Version
v0.10.0-rc01-0-g7251824d
Describe the bug.
Summary
After an instance was deleted, the Machine returned to Ready and then started an automatic DPU firmware update. During that update, the Machine moved to Error.
The Machine health shows a HostUpdateInProgress alert for DpuFirmware with PreventAllocations. The current REST status derivation appears to treat any PreventAllocations alert as Error, but this specific alert represents an in-progress automatic DPU firmware update.
Observed Behavior
Machine status becomes:
Error
Expected Behavior
Machine should report Initializing while the automatic DPU firmware update is running, rather than Error.
The PreventAllocations classification should still prevent new allocations while the update is active.
Likely Cause
REST machine status derivation treats any health alert with PreventAllocations as Error.
This is valid for many failure cases, but HostUpdateInProgress with target DpuFirmware is an expected update workflow state.
Acceptance Criteria
- A Machine with HostUpdateInProgress / DpuFirmware / AutomaticDpuFirmwareUpdate health maps to Initializing, not Error.
- The Machine remains unavailable for new allocation while the PreventAllocations alert is active.
- Other health alerts with PreventAllocations still map to Error.
- Test coverage includes the observed payload, including the paired HeartbeatTimeout alert.
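
A rough sketch of the derivation these criteria seem to ask for, written in Python for illustration only. It is not the actual REST code: the `HealthAlert` shape, its field names, and the `derive_status` and `is_allocatable` helpers are all hypothetical. It matches on alert id and target only, and it assumes the paired HeartbeatTimeout alert does not itself carry PreventAllocations, which this report does not confirm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

PREVENT_ALLOCATIONS = "PreventAllocations"


@dataclass(frozen=True)
class HealthAlert:
    # Hypothetical shape; the real health payload may use different fields.
    id: str
    target: Optional[str] = None
    classifications: Tuple[str, ...] = ()


def is_dpu_firmware_update(alert: HealthAlert) -> bool:
    # The one alert treated as an expected workflow state rather than a failure.
    return alert.id == "HostUpdateInProgress" and alert.target == "DpuFirmware"


def derive_status(base_status: str, alerts: Sequence[HealthAlert]) -> str:
    # Only alerts classified PreventAllocations influence the derived status.
    blocking = [a for a in alerts if PREVENT_ALLOCATIONS in a.classifications]
    if not blocking:
        return base_status
    # Any blocking alert other than the firmware update still means Error.
    if any(not is_dpu_firmware_update(a) for a in blocking):
        return "Error"
    return "Initializing"


def is_allocatable(alerts: Sequence[HealthAlert]) -> bool:
    # Allocation gating is independent of the status shown to the user.
    return not any(PREVENT_ALLOCATIONS in a.classifications for a in alerts)


# Example: the firmware update alert plus a HeartbeatTimeout alert
# (modeled here without classifications, which is an assumption).
update = HealthAlert("HostUpdateInProgress", "DpuFirmware", (PREVENT_ALLOCATIONS,))
heartbeat = HealthAlert("HeartbeatTimeout")
status = derive_status("Ready", [update, heartbeat])
```

Keeping the allocation check separate from the status mapping lets the update report Initializing while still blocking new allocations. If HeartbeatTimeout does carry PreventAllocations in the observed payload, the rule above would return Error and would need an explicit decision for that pairing.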
Minimum reproducible example
Relevant log output
Other/Misc.
No response
Code of Conduct