Skip to content

fix(proxy): harden admission control and usage refresh#372

Open
mhughdo wants to merge 9 commits intoSoju06:mainfrom
mhughdo:fix/proxy-admission-usage-hardening
Open

fix(proxy): harden admission control and usage refresh#372
mhughdo wants to merge 9 commits intoSoju06:mainfrom
mhughdo:fix/proxy-admission-usage-hardening

Conversation

@mhughdo
Copy link
Copy Markdown
Contributor

@mhughdo mhughdo commented Apr 8, 2026

Problem observed

  • repeated 503 Service Unavailable responses from /backend-api/codex/responses and /backend-api/codex/responses/compact
  • repeated websocket handshake failures around /backend-api/codex/responses that made it unclear whether failures were coming from OpenAI or from local proxy admission / overload behavior
  • usage refresh kept emitting repeated 401 failures even after account cleanup, and deactivation-signaling usage failures were not being handled clearly enough
  • abandoned downstream websocket sessions could keep holding proxy_websocket capacity, causing recurring local 429 admission failures until the client or container restarted

What this changes

  • split downstream admission handling into explicit HTTP, websocket, compact, and dashboard lanes with clearer local overload responses and retry hints
  • harden response-create and refresh admission so same-account refreshes singleflight correctly, work-admission stays fail-fast under contention, and response-create cleanup does not leak or over-release gates
  • improve usage-refresh handling for repeated auth failures, including clearing module-level singleflight state between resets and deactivating accounts when the usage 401 clearly signals upstream deactivation
  • reclaim idle downstream websocket sessions so abandoned clients release proxy_websocket capacity instead of monopolizing the lane
  • add OpenSpec change artifacts and regression coverage around the affected concurrency, bridge, and websocket recovery paths

Validation

  • openspec validate --specs
  • uv run pytest tests/unit/test_auth_manager.py tests/unit/test_usage_updater.py tests/unit/test_settings_multi_replica.py tests/unit/test_bulkhead.py tests/unit/test_backpressure.py tests/unit/test_proxy_utils.py -q
  • uv run pytest tests/integration/test_http_responses_bridge.py -q -k "codex_session_prewarms_first_request or reconnect_uses_last_upstream_turn_state or session_id_reconnect_keeps_upstream_turn_state or does_not_evict_active_session_when_pool_is_full or retry_http_bridge_precreated_request_releases_pending_lock_before_reconnect"
  • uv run pytest tests/integration/test_proxy_websocket_responses.py -q -k "reclaims_idle_downstream_session_and_upstream or does_not_expire_downstream_while_request_pending or keeps_downstream_open_after_clean_upstream_close or emits_timeout_failure_for_stalled_upstream"
  • uv run pytest tests/unit/test_bulkhead.py -q -k "websocket_lane_recovers_after_active_session_exits or websocket_denies_with_http_response_when_lane_full"
  • uv run pytest tests/unit/test_work_admission.py tests/unit/test_usage_updater.py tests/unit/test_settings_multi_replica.py -q
  • uvx ruff check app/core/config/settings.py app/core/resilience/backpressure.py app/core/resilience/bulkhead.py app/core/resilience/overload.py app/main.py app/modules/accounts/auth_manager.py app/modules/proxy/service.py app/modules/proxy/work_admission.py app/modules/usage/updater.py tests/unit/test_auth_manager.py tests/unit/test_backpressure.py tests/unit/test_bulkhead.py tests/unit/test_proxy_utils.py tests/unit/test_settings_multi_replica.py tests/unit/test_usage_updater.py tests/unit/test_work_admission.py tests/integration/test_http_responses_bridge.py tests/integration/test_proxy_websocket_responses.py

Local soak test

  • built and ran the branch locally in Docker and exercised it continuously for 2 days without reproducing the original 503 / repeated usage-refresh error pattern
  • rebuilt codex-lb-local with the idle-websocket follow-up and verified the local websocket lane recovered without fresh 429 admissions after restart

Copilot AI review requested due to automatic review settings April 8, 2026 13:49
@mhughdo mhughdo force-pushed the fix/proxy-admission-usage-hardening branch from 9ff2355 to 2042759 Compare April 8, 2026 13:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the proxy’s local admission controls and refresh behavior to better isolate traffic classes, prevent expensive-work overload, and reduce retry storms from repeated refresh/usage failures.

Changes:

  • Split downstream bulkhead admission into explicit lanes (proxy HTTP, proxy websocket, compact, dashboard) and return clearer local overload responses (HTTP + websocket handshake).
  • Add in-process “expensive work” admission limits (token refresh, upstream websocket connect, response create) and avoid penalizing accounts for local overload/unavailable failures.
  • Add usage refresh cooldown/deactivation improvements plus expanded unit/integration regression coverage and OpenSpec artifacts.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
app/core/config/settings.py Adds new admission/cooldown settings and derives lane defaults from existing proxy limit.
app/core/resilience/overload.py Introduces shared helpers for local overload envelopes and websocket HTTP-denial responses.
app/core/resilience/bulkhead.py Implements split bulkhead lanes and explicit local overload/unavailable responses with structured logging.
app/core/resilience/backpressure.py Updates backpressure rejection payloads and websocket handling to use HTTP denial responses.
app/main.py Wires new bulkhead lane settings into app middleware creation.
app/modules/proxy/work_admission.py Adds a second-stage work admission controller for expensive proxy operations.
app/modules/proxy/service.py Integrates work admission (refresh/connect/create), improves cleanup/release paths, and treats local errors as account-neutral.
app/modules/proxy/api.py Ensures Retry-After is set for locally-generated 429 overload responses.
app/modules/accounts/auth_manager.py Adds per-account refresh singleflight + short failure caching and optional refresh admission leasing.
app/modules/usage/updater.py Adds usage-refresh auth-failure cooldown + 401 deactivation-message detection + state reset helper for tests.
tests/unit/test_bulkhead.py Expands regression coverage for lane isolation and websocket HTTP denial responses.
tests/unit/test_backpressure.py Updates assertions for new overload payloads and websocket denial behavior.
tests/unit/test_proxy_utils.py Adds proxy service regression tests for local overload surfacing and admission release/ordering.
tests/unit/test_auth_manager.py Adds refresh singleflight/admission/failure-caching regression tests.
tests/unit/test_usage_updater.py Adds usage refresh cooldown + 401 deactivation-message regression tests.
tests/unit/test_settings_multi_replica.py Adds coverage for new lane/work-admission settings defaults and env overrides.
tests/integration/test_http_responses_bridge.py Updates bridge request-state initialization to align with new response-create gate tracking.
openspec/changes/stabilize-proxy-admission-and-usage-refresh/* Adds proposal/design/spec/task artifacts describing the new behaviors.
Comments suppressed due to low confidence (1)

app/core/config/settings.py:276

  • _validate_http_bridge_instance_configuration() now also mutates bulkhead lane defaults (bulkhead_proxy_http_limit, bulkhead_proxy_websocket_limit, bulkhead_proxy_compact_limit). Since this validator name/scope is no longer specific to HTTP bridge configuration, it’s easy to miss these side effects when editing settings. Consider splitting the bulkhead defaulting into a separate @model_validator (or renaming this one) so the intent is discoverable and changes to bridge validation don’t accidentally impact admission defaults.
        if isinstance(value, list):
            normalized: list[str] = []
            for entry in value:
                if isinstance(entry, str):
                    instance_id = entry.strip()
                    if instance_id:
                        normalized.append(instance_id)
            return normalized
        raise TypeError("http_responses_session_bridge_instance_ring must be a list or comma-separated string")

    @field_validator("model_context_window_overrides", mode="before")
    @classmethod
    def _parse_model_context_window_overrides(cls, value: object) -> dict[str, int]:
        if value is None:
            return {}
        if isinstance(value, str):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mhughdo mhughdo marked this pull request as draft April 8, 2026 15:26
@mhughdo mhughdo marked this pull request as ready for review April 9, 2026 02:55
@mhughdo mhughdo requested a review from Soju06 as a code owner April 9, 2026 02:55
@mhughdo mhughdo requested a review from Copilot April 9, 2026 02:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants