fix(proxy): harden admission control and usage refresh#372
fix(proxy): harden admission control and usage refresh#372mhughdo wants to merge 9 commits intoSoju06:mainfrom
Conversation
9ff2355 to
2042759
Compare
There was a problem hiding this comment.
Pull request overview
This PR hardens the proxy’s local admission controls and refresh behavior to better isolate traffic classes, prevent expensive-work overload, and reduce retry storms from repeated refresh/usage failures.
Changes:
- Split downstream bulkhead admission into explicit lanes (proxy HTTP, proxy websocket, compact, dashboard) and return clearer local overload responses (HTTP + websocket handshake).
- Add in-process “expensive work” admission limits (token refresh, upstream websocket connect, response create) and avoid penalizing accounts for local overload/unavailable failures.
- Add usage refresh cooldown/deactivation improvements plus expanded unit/integration regression coverage and OpenSpec artifacts.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| app/core/config/settings.py | Adds new admission/cooldown settings and derives lane defaults from existing proxy limit. |
| app/core/resilience/overload.py | Introduces shared helpers for local overload envelopes and websocket HTTP-denial responses. |
| app/core/resilience/bulkhead.py | Implements split bulkhead lanes and explicit local overload/unavailable responses with structured logging. |
| app/core/resilience/backpressure.py | Updates backpressure rejection payloads and websocket handling to use HTTP denial responses. |
| app/main.py | Wires new bulkhead lane settings into app middleware creation. |
| app/modules/proxy/work_admission.py | Adds a second-stage work admission controller for expensive proxy operations. |
| app/modules/proxy/service.py | Integrates work admission (refresh/connect/create), improves cleanup/release paths, and treats local errors as account-neutral. |
| app/modules/proxy/api.py | Ensures Retry-After is set for locally-generated 429 overload responses. |
| app/modules/accounts/auth_manager.py | Adds per-account refresh singleflight + short failure caching and optional refresh admission leasing. |
| app/modules/usage/updater.py | Adds usage-refresh auth-failure cooldown + 401 deactivation-message detection + state reset helper for tests. |
| tests/unit/test_bulkhead.py | Expands regression coverage for lane isolation and websocket HTTP denial responses. |
| tests/unit/test_backpressure.py | Updates assertions for new overload payloads and websocket denial behavior. |
| tests/unit/test_proxy_utils.py | Adds proxy service regression tests for local overload surfacing and admission release/ordering. |
| tests/unit/test_auth_manager.py | Adds refresh singleflight/admission/failure-caching regression tests. |
| tests/unit/test_usage_updater.py | Adds usage refresh cooldown + 401 deactivation-message regression tests. |
| tests/unit/test_settings_multi_replica.py | Adds coverage for new lane/work-admission settings defaults and env overrides. |
| tests/integration/test_http_responses_bridge.py | Updates bridge request-state initialization to align with new response-create gate tracking. |
| openspec/changes/stabilize-proxy-admission-and-usage-refresh/* | Adds proposal/design/spec/task artifacts describing the new behaviors. |
Comments suppressed due to low confidence (1)
app/core/config/settings.py:276
_validate_http_bridge_instance_configuration()now also mutates bulkhead lane defaults (bulkhead_proxy_http_limit,bulkhead_proxy_websocket_limit,bulkhead_proxy_compact_limit). Since this validator name/scope is no longer specific to HTTP bridge configuration, it’s easy to miss these side effects when editing settings. Consider splitting the bulkhead defaulting into a separate@model_validator(or renaming this one) so the intent is discoverable and changes to bridge validation don’t accidentally impact admission defaults.
if isinstance(value, list):
normalized: list[str] = []
for entry in value:
if isinstance(entry, str):
instance_id = entry.strip()
if instance_id:
normalized.append(instance_id)
return normalized
raise TypeError("http_responses_session_bridge_instance_ring must be a list or comma-separated string")
@field_validator("model_context_window_overrides", mode="before")
@classmethod
def _parse_model_context_window_overrides(cls, value: object) -> dict[str, int]:
if value is None:
return {}
if isinstance(value, str):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 35 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem observed
503 Service Unavailableresponses from/backend-api/codex/responsesand/backend-api/codex/responses/compact/backend-api/codex/responsesthat made it unclear whether failures were coming from OpenAI or from local proxy admission / overload behavior401failures even after account cleanup, and deactivation-signaling usage failures were not being handled clearly enoughproxy_websocketcapacity, causing recurring local429admission failures until the client or container restartedWhat this changes
401clearly signals upstream deactivationproxy_websocketcapacity instead of monopolizing the laneValidation
openspec validate --specsuv run pytest tests/unit/test_auth_manager.py tests/unit/test_usage_updater.py tests/unit/test_settings_multi_replica.py tests/unit/test_bulkhead.py tests/unit/test_backpressure.py tests/unit/test_proxy_utils.py -quv run pytest tests/integration/test_http_responses_bridge.py -q -k "codex_session_prewarms_first_request or reconnect_uses_last_upstream_turn_state or session_id_reconnect_keeps_upstream_turn_state or does_not_evict_active_session_when_pool_is_full or retry_http_bridge_precreated_request_releases_pending_lock_before_reconnect"uv run pytest tests/integration/test_proxy_websocket_responses.py -q -k "reclaims_idle_downstream_session_and_upstream or does_not_expire_downstream_while_request_pending or keeps_downstream_open_after_clean_upstream_close or emits_timeout_failure_for_stalled_upstream"uv run pytest tests/unit/test_bulkhead.py -q -k "websocket_lane_recovers_after_active_session_exits or websocket_denies_with_http_response_when_lane_full"uv run pytest tests/unit/test_work_admission.py tests/unit/test_usage_updater.py tests/unit/test_settings_multi_replica.py -quvx ruff check app/core/config/settings.py app/core/resilience/backpressure.py app/core/resilience/bulkhead.py app/core/resilience/overload.py app/main.py app/modules/accounts/auth_manager.py app/modules/proxy/service.py app/modules/proxy/work_admission.py app/modules/usage/updater.py tests/unit/test_auth_manager.py tests/unit/test_backpressure.py tests/unit/test_bulkhead.py tests/unit/test_proxy_utils.py tests/unit/test_settings_multi_replica.py tests/unit/test_usage_updater.py tests/unit/test_work_admission.py tests/integration/test_http_responses_bridge.py tests/integration/test_proxy_websocket_responses.pyLocal soak test
503/ repeated usage-refresh error patterncodex-lb-localwith the idle-websocket follow-up and verified the local websocket lane recovered without fresh429admissions after restart