fix(activation): don't retry one-shot ACM activation calls on timeout #2743
Open
madhavilosetty-intel wants to merge 2 commits into
Pull request overview
This PR prevents non-idempotent ACM activation operations from being retried when AMT drops the session on a successful activation/upgrade (resulting in an expected gateway timeout). It adds a oneShot option to the shared WSMAN invocation helper so that these specific calls are capped to a single attempt and the timeout can propagate to the activation state machine, which then re-checks device status.
Changes:
- Add an opt-in `oneShot` flag to `invokeWsmanCall` to cap retry attempts to 1 for one-shot operations.
- Use `oneShot` for `AdminSetup` (ACM activation) and `UpgradeClientToAdmin` (CCM→ACM upgrade) WSMAN calls.
- Update activation unit tests to assert the new `invokeWsmanCall(..., oneShot=true)` invocation for these operations.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/stateMachines/common.ts` | Adds `oneShot` flag to `invokeWsmanCall` to force a single attempt for non-idempotent activation/upgrade calls. |
| `src/stateMachines/activation.ts` | Marks `AdminSetup` and `UpgradeClientToAdmin` as one-shot WSMAN calls to avoid retries on expected timeouts. |
| `src/stateMachines/activation.test.ts` | Updates assertions to verify `invokeWsmanCall` is invoked with `oneShot=true` for AdminSetup/Upgrade. |
On a successful ACM activation (AdminSetup) or CCM->ACM upgrade (UpgradeClientToAdmin), AMT transitions to admin mode and drops the session without sending a WSMAN response, so a gateway timeout is the expected outcome. invokeWsmanCall was treating that timeout as retryable (wsman_max_attempts floor) and re-issuing the non-idempotent call against a device already in ACM, producing HTTP 401 / connection resets and eventually an uncaught exception.
Add an opt-in oneShot flag to invokeWsmanCall that caps the call at a single attempt, and use it for sendAdminSetup and sendUpgradeClientToAdmin so the timeout propagates to the state machine, which waits and re-queries device status via CHECK_ACTIVATION_ON_AMT.
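The recovery path described above (send once, treat a timeout as possibly successful, wait, then re-query the device) can be sketched roughly as follows. This is an illustration under assumptions, not the activation state machine itself: `activateOneShot`, its injected callbacks, and the outcome strings are hypothetical, and `checkActivation` merely stands in for whatever `CHECK_ACTIVATION_ON_AMT` does in the real code.

```typescript
type ActivationOutcome = 'activated' | 'not-activated'

// Assumption: timeouts are recognisable by error name in this sketch.
function isTimeoutError (err: unknown): boolean {
  return err instanceof Error && err.name === 'TimeoutError'
}

// sendOneShot:     issues AdminSetup or UpgradeClientToAdmin exactly once
// wait:            pauses to give the device time to finish its transition
// checkActivation: re-queries the device and reports whether it is in ACM
async function activateOneShot (
  sendOneShot: () => Promise<void>,
  wait: () => Promise<void>,
  checkActivation: () => Promise<boolean>
): Promise<ActivationOutcome> {
  try {
    await sendOneShot()
  } catch (err) {
    // A timeout is the expected outcome when AMT drops the session after a
    // successful activation, so it is not treated as a failure here.
    if (!isTimeoutError(err)) throw err
    await wait()
  }
  // Device status, not the WSMAN response, decides the outcome.
  return (await checkActivation()) ? 'activated' : 'not-activated'
}
```

Injecting `wait` and `checkActivation` keeps the sketch testable without real timers or a device; the actual state machine presumably models these as transitions rather than function calls.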
PR Checklist
What are you changing?
Anything the reviewer should know when reviewing this PR?
If there are associated PRs in other repositories, please link them here (e.g. device-management-toolkit/repo#365)