Add 5xx retry policy for transient server errors#26821
Open
andrewmathew1 wants to merge 3 commits into
Open
Conversation
Adds a retry policy for 500, 502, and 504 responses in the azcosmos client retry pipeline. Only read operations are retried, which is consistent with the .NET, Java, and Python Cosmos SDKs. The retry budget is one in-region retry followed by one cross-region retry. The cross-region retry only fires when cross-region retries are enabled and a preferred location is available to fail over to. Fixes Azure#25639. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds transient 5xx retry handling to the Cosmos client retry policy so read requests can recover from certain server-side failures while ensuring write requests are not retried.
Changes:
- Added retry handling for HTTP
500,502, and504inclientRetryPolicy, with an in-region retry followed by an optional cross-region retry. - Introduced
serverErrorRetryCountto track the 5xx retry budget independently of other retry counters. - Added unit tests for read/write behavior on these 5xx responses and documented the change in the package changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/data/azcosmos/cosmos_client_retry_policy.go | Implements 5xx retry logic and tracking in the client retry policy. |
| sdk/data/azcosmos/cosmos_client_retry_policy_test.go | Adds tests validating read retries and ensuring writes are not retried for 5xx. |
| sdk/data/azcosmos/CHANGELOG.md | Notes the new 5xx retry behavior in the unreleased changelog. |
Adds tests covering scenarios the original PR did not exercise: * In-region retry succeeds without a cross-region attempt. * Cross-region retries disabled - only the in-region retry runs. * Exhausted retries surface `*azcore.ResponseError` with the original status code. * 501 (Not Implemented) is not retried. * 500 interleaved with 503 retries correctly across regions. Together with the existing tests this brings `attemptRetryOnServerError` and `shouldRetryStatus` to 100% statement coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses review feedback on the 5xx retry policy. Previously the cross-region retry was gated only on `enableCrossRegionRetries` and the length of `preferredLocations`, so single-region accounts (or configurations where the only resolved read endpoint matches the current endpoint) would burn the cross-region retry on the same URL without actually failing over. The cross-region retry now also requires more than one resolved read endpoint in the location cache. The mock location-cache helper used in tests is updated to populate `readEndpoints` / `writeEndpoints` per available location so existing cross-region retry tests continue to exercise the failover path, and a new test pins the single-endpoint behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Adds a retry policy in the
clientRetryPolicyfor500,502, and504responses. Only read operations are retried; writes are surfaced to the caller immediately.The retry budget is one in-region retry followed by one cross-region retry. The cross-region retry only fires when cross-region retries are enabled and a preferred location is available to fail over to. After both attempts are exhausted (or skipped), the original
5xxresponse is returned wrapped as a non-retriable error, matching the pattern already used for503/404/403in this file.Fixes #25639.