
revert #434

Merged
xzrderek merged 2 commits into main from derekx/revert-litellm-sdk
Mar 10, 2026

Conversation

xzrderek (Contributor) commented Mar 10, 2026

Note

Medium Risk
Changes request forwarding and credential/tag injection in the proxy path, which can break completion routing or tracing if headers/body are not shaped exactly as LiteLLM expects.

Overview
Switches the metadata gateway from making LLM calls via the LiteLLM SDK/OTEL to acting as an HTTP proxy in front of a separate LiteLLM backend. The gateway now requires LITELLM_URL, forwards /chat/completions with injected Langfuse keys and evaluation tags, and adds a catch-all route to proxy arbitrary paths (with API key injection).

Updates deployment/config to match the new topology: a new requirements.txt, a simplified Dockerfile.gateway, and an expanded docker-compose.yml that runs a litellm-backend container; the LiteLLM config moves from `callbacks: [langfuse_otel]` to `success_callback`/`failure_callback: langfuse`. Trace serialization/conversion is simplified by dropping observation metadata and OTEL attribute parsing (including finish_reason extraction), and tests are adjusted accordingly. Dependency constraints are updated (notably `litellm<1.75.0`, and `langchain-fireworks` is added).
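The forward-with-injection step described above can be sketched as follows. This is a hedged reconstruction from the PR summary only: `inject_metadata`, the Langfuse environment variable names, and the default backend address are assumptions, not the gateway's actual code.

```python
# Hypothetical sketch of the gateway's new topology: read LITELLM_URL at
# startup and inject Langfuse credentials plus evaluation tags into the
# request body before proxying it to the LiteLLM backend.
import os

# The PR makes LITELLM_URL required; a direct lookup fails fast if unset.
# (setdefault is only here so this sketch runs standalone.)
os.environ.setdefault("LITELLM_URL", "http://litellm-backend:4000")
LITELLM_URL = os.environ["LITELLM_URL"]


def inject_metadata(body: dict, tags: list) -> dict:
    """Return a copy of the request body with Langfuse keys and tags attached."""
    meta = dict(body.get("metadata") or {})
    meta.update(
        langfuse_public_key=os.environ.get("LANGFUSE_PUBLIC_KEY", ""),
        langfuse_secret_key=os.environ.get("LANGFUSE_SECRET_KEY", ""),
        tags=tags,
    )
    return {**body, "metadata": meta}


print(inject_metadata({"model": "gpt-4o"}, ["evaluation"])["metadata"]["tags"])
# → ['evaluation']
```

The original request body is left untouched; the gateway forwards the augmented copy, so retries or logging of the inbound payload see no injected secrets.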

Written by Cursor Bugbot for commit 95cf759.


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

```python
content=response.content,
status_code=response.status_code,
headers=dict(response.headers),
)
```


Streaming responses completely broken by buffered proxy

High Severity

The old code explicitly handled streaming (stream: true) requests using StreamingResponse with an async generator that yielded SSE chunks incrementally. The new code uses httpx.AsyncClient.post() which buffers the entire upstream response before returning it. When a client sends stream: true, the LiteLLM backend produces an SSE stream, but client.post() waits for all chunks, then returns them in a single blob via Response(content=response.content, ...). This completely breaks real-time streaming — clients receive no incremental updates and may timeout on long completions. The proxy_to_litellm catch-all has the same issue.

Additional Locations (1)

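The buffering failure mode can be illustrated with a stdlib-only simulation (all names hypothetical). A buffered proxy, like `client.post()`, produces one blob at the end; a streaming proxy relays each SSE chunk as it arrives:

```python
# Minimal simulation of buffered vs. streaming proxying; fake_upstream
# stands in for the SSE stream the LiteLLM backend would produce.
import asyncio
from typing import AsyncIterator, List


async def fake_upstream() -> AsyncIterator[bytes]:
    for i in range(3):
        await asyncio.sleep(0)  # pretend each chunk takes time to arrive
        yield f"data: chunk-{i}\n\n".encode()


async def buffered_proxy() -> List[bytes]:
    # What httpx.AsyncClient.post() effectively does: wait for all chunks,
    # then hand the client a single blob.
    return [b"".join([c async for c in fake_upstream()])]


async def streaming_proxy() -> List[bytes]:
    # What a StreamingResponse over an async generator does: one piece of
    # output per upstream chunk, delivered incrementally.
    return [c async for c in fake_upstream()]


print(len(asyncio.run(buffered_proxy())), len(asyncio.run(streaming_proxy())))
# → 1 3
```

A real fix along these lines would presumably open the upstream request with `httpx`'s streaming API (`client.send(request, stream=True)`) and return a `StreamingResponse` wrapping `upstream.aiter_bytes()` when the client sent `stream: true`, closing the upstream response when the generator finishes.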

```python
content=response.content,
status_code=response.status_code,
headers=dict(response.headers),
)
```


Upstream response headers blindly forwarded causing conflicts

Medium Severity

Both handle_chat_completion and proxy_to_litellm forward all upstream response headers verbatim via headers=dict(response.headers). This can include hop-by-hop headers like transfer-encoding: chunked and content-encoding from the LiteLLM backend. Since httpx decodes the body (dechunking, decompressing), forwarding these headers creates a mismatch — the client receives decoded content but headers claiming it's chunked or compressed, which can cause parsing errors.

Additional Locations (1)

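A typical remedy is to filter out the wire-encoding headers before forwarding. This is a sketch, not the PR's code: the hop-by-hop set follows RFC 7230 §6.1, and `content-encoding`/`content-length` are additionally dropped because httpx has already decoded the body, making them stale.

```python
# Strip headers that describe how the upstream response was framed on the
# wire; after httpx dechunks/decompresses, they no longer match the body.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
    # Not hop-by-hop per RFC 7230, but stale once the body is decoded:
    "content-encoding", "content-length",
}


def forwardable_headers(upstream_headers: dict) -> dict:
    """Keep only headers that still accurately describe the decoded body."""
    return {
        k: v for k, v in upstream_headers.items()
        if k.lower() not in HOP_BY_HOP
    }


print(forwardable_headers({
    "Content-Type": "application/json",
    "Transfer-Encoding": "chunked",
    "Content-Encoding": "gzip",
}))
# → {'Content-Type': 'application/json'}
```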

```python
from .models import EvaluateResult, MetricResult
from .typed_interface import reward_function

logging.basicConfig(level=logging.INFO)
```


Library module sets global logging config at import

Low Severity

logging.basicConfig(level=logging.INFO) was added at module level in eval_protocol/reward_function.py, a library module. This configures the root logger as a side effect of importing the module, which can silently override or pre-empt the logging configuration of any application that imports this module. Libraries are expected to only use logging.getLogger(__name__) without calling basicConfig.

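The conventional library-side pattern looks like this (a sketch of the standard idiom, not the PR's actual fix): obtain a module-level logger and attach a `NullHandler`, leaving root-logger configuration entirely to the importing application.

```python
import logging

# Library modules get their own logger and never touch the root logger.
logger = logging.getLogger("eval_protocol.reward_function")

# NullHandler suppresses the "No handlers could be found" warning without
# imposing any handler, level, or format on the application.
logger.addHandler(logging.NullHandler())

# Unless the application configures logging, this record is simply dropped;
# the library never overrides the app's logging policy.
logger.info("reward_function module imported")
```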

xzrderek merged commit 0f3c471 into main Mar 10, 2026
17 checks passed
xzrderek deleted the derekx/revert-litellm-sdk branch March 10, 2026 10:58
xzrderek added a commit that referenced this pull request Mar 10, 2026
The revert in #434 accidentally removed the `logging.basicConfig(level=logging.INFO)`
call from the test remote servers. Without it, the root logger defaults to WARNING,
which silently drops the `logger.info()` call that emits `Status.rollout_finished()`.
The FireworksTracingHttpHandler never fires, no POST /logs is made, and the
RemoteRolloutProcessor polling loop times out after 180 seconds.

This restores the original line (with its explanatory comment) in both
remote_server.py and remote_server_multi_turn.py.

Made-with: Cursor
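The restored line can be reconstructed roughly as follows (a hypothetical paraphrase of the commit message, not the verbatim source). Note the contrast with the library-module issue flagged in the review: the test remote servers are applications, so configuring the root logger there is appropriate.

```python
import logging

# Raise the root logger to INFO so that logger.info() records — including
# the one that emits Status.rollout_finished() — reach attached handlers
# (e.g. the tracing HTTP handler) instead of being dropped at the default
# WARNING level.
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger(__name__)
```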
xzrderek added a second commit referencing this pull request on Mar 10, 2026 with the same message.