
feat(gateway): Add Messages API to HTTP router#1521

Merged
lightseek-bot merged 5 commits into main from ekzhang/add-messages-api
May 24, 2026

Conversation

@ekzhang
Collaborator

@ekzhang ekzhang commented May 22, 2026

Some backends, such as SGLang, recently implemented the /v1/messages API: sgl-project/sglang#18630

This PR adds support for routing to the messages API over HTTP, similar to how /v1/chat/completions is handled currently via a GenerationRequest trait impl.

Description

Problem

When hitting /v1/messages in SMG:

  HTTP/1.1 501 Not Implemented

  Messages API not yet implemented for this router

Solution

Forward the API call directly to HTTP backends, using a text-based routing policy.

Changes

Changed the router to handle /v1/messages the same way as /v1/chat/completions.

Test Plan

Added tests for /v1/messages.

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Message-creation routing now supports per-request model selection, streaming responses, and builds a consolidated routing text from system + text message content.
  • Tests

    • Added integration and unit tests for non-streaming, streaming (SSE), error propagation, and routing-text extraction scenarios.
  • Mocks

    • Test mock now simulates the messages endpoint with streaming event sequences, configurable failures, delays, and realistic usage/token data.

Review Change Stack

@github-actions github-actions Bot added the protocols (Protocols crate changes) and model-gateway (Model gateway crate changes) labels May 22, 2026
@coderabbitai

coderabbitai Bot commented May 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Implements GenerationRequest for CreateMessageRequest (stream/model/routing text), wires RouterTrait::route_messages to forward typed requests to /v1/messages, adds a mock upstream messages handler, and introduces integration and unit tests for streaming, non-streaming, and error cases.
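
The trait implementation described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the trait and the request types below are simplified, hypothetical stand-ins for the real definitions in crates/protocols, and joining the pieces with a single space is an assumption about the separator.

```rust
// Hypothetical, simplified stand-ins for the real protocol types.
pub trait GenerationRequest {
    fn is_stream(&self) -> bool;
    fn get_model(&self) -> Option<&str>;
    fn extract_text_for_routing(&self) -> String;
}

pub enum InputContentBlock {
    Text { text: String },
    Image, // stands in for any non-text variant
}

pub enum InputContent {
    String(String),
    Blocks(Vec<InputContentBlock>),
}

pub struct InputMessage {
    pub content: InputContent,
}

pub struct CreateMessageRequest {
    pub model: String,
    pub stream: Option<bool>,
    pub system: Option<String>,
    pub messages: Vec<InputMessage>,
}

impl GenerationRequest for CreateMessageRequest {
    // A missing `stream` field is treated as non-streaming.
    fn is_stream(&self) -> bool {
        self.stream.unwrap_or(false)
    }

    fn get_model(&self) -> Option<&str> {
        Some(self.model.as_str())
    }

    // Concatenate the optional system prompt and every text piece,
    // skipping non-text blocks. The space separator is an assumption.
    fn extract_text_for_routing(&self) -> String {
        let mut parts: Vec<&str> = Vec::new();
        if let Some(system) = &self.system {
            parts.push(system.as_str());
        }
        for msg in &self.messages {
            match &msg.content {
                InputContent::String(s) => parts.push(s.as_str()),
                InputContent::Blocks(blocks) => {
                    for block in blocks {
                        if let InputContentBlock::Text { text } = block {
                            parts.push(text.as_str());
                        }
                    }
                }
            }
        }
        parts.join(" ")
    }
}
```

Under these assumptions, a request with a system prompt, one string message, and one block message containing an image and a text block yields a routing text built from the system prompt and the two text pieces only.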

Changes

Messages API Routing Support

| Layer / File(s) | Summary |
| --- | --- |
| GenerationRequest trait implementation (crates/protocols/src/messages.rs) | CreateMessageRequest implements GenerationRequest with is_stream() delegating to stream, get_model() exposing the model, and extract_text_for_routing() concatenating optional system content and message text blocks while ignoring non-text variants. |
| HTTP router integration (model_gateway/src/routers/http/router.rs) | RouterTrait gains route_messages() which forwards typed CreateMessageRequest bodies to the /v1/messages backend via route_typed_request(). |
| Mock worker messages endpoint (model_gateway/tests/common/mock_worker.rs) | Registers POST /v1/messages and implements messages_handler supporting failure injection, optional delay, SSE streaming events (message_start, content_block_* deltas, message_stop) and non-streaming JSON responses. |
| API integration tests (model_gateway/tests/api/messages_api_test.rs, model_gateway/tests/api/mod.rs) | Adds three integration tests exercising non-streaming success, streaming SSE event parsing, and upstream-error propagation against the HTTP proxy using AppTestContext and mock workers. |
| Unit tests for routing/extraction (model_gateway/tests/messages_test.rs) | Adds tests for CreateMessageRequest's GenerationRequest behavior: stream flag, model routing, and extract_text_for_routing() across string, text-block, and non-text-only inputs. |
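
For the streaming test, one way to check the proxied SSE body is to collect the `event:` names in order and compare them with the expected sequence (message_start, content_block_* events, message_stop). The helper below is a hypothetical sketch using only the standard library; it is not taken from the PR's test code, and it ignores `data:` payloads entirely.

```rust
/// Collects the `event:` names from a raw SSE body, in the order they appear.
/// Hypothetical helper: `data:` lines and blank separator lines are skipped.
pub fn sse_event_names(body: &str) -> Vec<String> {
    body.lines()
        .filter_map(|line| line.strip_prefix("event:"))
        .map(|name| name.trim().to_string())
        .collect()
}
```

A test could then assert that the first collected name is message_start and the last is message_stop, without depending on the exact JSON carried in each event.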

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Gateway as HTTP_Gateway_Router
  participant Worker as Upstream_Worker_v1_messages
  Client->>Gateway: POST /v1/messages (CreateMessageRequest, stream?, model, messages)
  Gateway->>Gateway: GenerationRequest::extract_text_for_routing()
  Gateway->>Worker: Forward typed request to /v1/messages
  Worker->>Gateway: SSE event stream OR JSON response OR error
  Gateway->>Client: Proxy SSE events or JSON response or propagate error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lightseekorg/smg#758: Related gRPC Messages streaming router that depends on GenerationRequest behavior for streaming decisions.

Suggested labels

anthropic

Suggested reviewers

  • key4ng
  • slin1237

Poem

🐰 I hop through lines of code and streams,

I gather system prompts and tiny dreams,
I skip the images, keep the text in sight,
The gateway routes the messages right,
A rabbit cheers as tests take flight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 65.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding Messages API routing support to the HTTP gateway router, which is the primary objective of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@mergify
Contributor

mergify Bot commented May 22, 2026

Hi @ekzhang, the DCO sign-off check has failed. All commits must include a Signed-off-by line.

To fix existing commits:

# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease

To sign off future commits automatically:

  • Use git commit -s every time, or
  • VSCode: enable Git: Always Sign Off in Settings
  • PyCharm: enable Sign-off commit in the Commit tool window

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements the GenerationRequest trait for CreateMessageRequest and adds a route_messages method to the HTTP router. The reviewer suggested enhancing the extract_text_for_routing logic to include ToolResult blocks while excluding control-plane items and internal reasoning blocks to provide better context for routing decisions without introducing noise.

Comment on lines +143 to +154
        for msg in &self.messages {
            match &msg.content {
                InputContent::String(s) => push(s, &mut has_content, &mut buffer),
                InputContent::Blocks(blocks) => {
                    for block in blocks {
                        if let InputContentBlock::Text(text_block) = block {
                            push(&text_block.text, &mut has_content, &mut buffer);
                        }
                    }
                }
            }
        }
Contributor


medium

The current implementation of extract_text_for_routing only extracts text from InputContentBlock::Text. We should extend this to include ToolResult blocks to provide more context for routing decisions. However, we must ensure that control-plane items (such as McpApprovalRequest and McpApprovalResponse) are excluded to avoid noise and instability in routing. Additionally, Thinking blocks should be excluded as they contain internal reasoning rather than user-facing content, keeping the analysis channel distinct from routing logic.

        for msg in &self.messages {
            match &msg.content {
                InputContent::String(s) => push(s, &mut has_content, &mut buffer),
                InputContent::Blocks(blocks) => {
                    for block in blocks {
                        match block {
                            InputContentBlock::Text(text_block) => {
                                push(&text_block.text, &mut has_content, &mut buffer);
                            }
                            InputContentBlock::ToolResult(tool_result) => {
                                if tool_result.is_control_plane() {
                                    continue;
                                }
                                if let Some(content) = &tool_result.content {
                                    match content {
                                        ToolResultContent::String(s) => {
                                            push(s, &mut has_content, &mut buffer);
                                        }
                                        ToolResultContent::Blocks(blocks) => {
                                            for b in blocks {
                                                if let ToolResultContentBlock::Text(t) = b {
                                                    push(&t.text, &mut has_content, &mut buffer);
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                            _ => {}
                        }
                    }
                }
            }
        }
References
  1. To avoid noise and instability in routing decisions, do not extract text from control-plane items such as McpApprovalRequest and McpApprovalResponse.
  2. Mixing user-facing content with internal chain-of-thought (CoT) in the analysis channel conflates the two distinct purposes; routing should focus on user-facing or environment state.

        body: &CreateMessageRequest,
        model_id: &str,
    ) -> Response {
        self.route_typed_request(headers, body, "/v1/messages", model_id)

🔴 Important: route_to_endpoint() in grpc/utils/metrics.rs has no match arm for "/v1/messages", so it falls through to "other". All messages-API metrics (request counts, durations, errors, retries) from the HTTP router will be bucketed under the "other" endpoint label instead of the existing ENDPOINT_MESSAGES ("messages").

The gRPC routers already use metrics_labels::ENDPOINT_MESSAGES directly, but the HTTP router relies on route_to_endpoint(route) to derive the label from the path string.

Fix: add "/v1/messages" => metrics_labels::ENDPOINT_MESSAGES to the match in route_to_endpoint.
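
A minimal sketch of the suggested fix, assuming route_to_endpoint is a plain match on the path string. The label values "messages" and "other" come from the comment above; the ENDPOINT_CHAT and ENDPOINT_OTHER constant names, the "chat" value, and the function signature are assumptions made for illustration and may not match grpc/utils/metrics.rs.

```rust
// Hypothetical label constants; the real ones live in `metrics_labels`.
pub mod metrics_labels {
    pub const ENDPOINT_CHAT: &str = "chat"; // assumed name and value
    pub const ENDPOINT_MESSAGES: &str = "messages";
    pub const ENDPOINT_OTHER: &str = "other"; // assumed constant name
}

/// Maps a route path to the endpoint label used for metrics.
pub fn route_to_endpoint(route: &str) -> &'static str {
    match route {
        "/v1/chat/completions" => metrics_labels::ENDPOINT_CHAT,
        // The arm the review asks for: without it, this path
        // falls through to the catch-all label below.
        "/v1/messages" => metrics_labels::ENDPOINT_MESSAGES,
        _ => metrics_labels::ENDPOINT_OTHER,
    }
}
```

With the extra arm, HTTP-routed messages traffic would share the "messages" label that the gRPC routers already emit, so dashboards keyed on that label see both transports.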


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e60a18714

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +148 to +149
                        if let InputContentBlock::Text(text_block) = block {
                            push(&text_block.text, &mut has_content, &mut buffer);

P2: Include tool_result content when extracting routing text

The new CreateMessageRequest::extract_text_for_routing only appends InputContentBlock::Text, so requests whose latest user turn is a tool_result (common in tool-calling loops) can produce an empty or incomplete routing key even though they contain substantial text. This degrades text-based worker selection and can misroute /v1/messages traffic compared with chat routing, which includes tool message content; consider extracting text from ToolResult payloads (string and text blocks) as well.


ekzhang and others added 2 commits May 22, 2026 21:19
Some backends like SGLang recently implement the `/v1/messages` API - sgl-project/sglang#18630

This PR adds support for routing to the messages API over HTTP, similar to how `/v1/chat/completions` is handled currently via a `GenerationRequest` trait impl.

Signed-off-by: Eric Zhang <eric@thinkingmachines.ai>
Adds integration tests for the new HTTP `/v1/messages` proxy:
- `/v1/messages` mock handler in the shared MockWorker (non-streaming
  + SSE streaming, Anthropic event sequence).
- End-to-end tests via `AppTestContext` for success, streaming, and
  upstream-error propagation.
- Unit tests for the new `GenerationRequest` impl on
  `CreateMessageRequest` (is_stream, routing-text extraction for
  string content, text blocks, and image-only / no-text edge case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric Zhang <eric@thinkingmachines.ai>
@ekzhang ekzhang force-pushed the ekzhang/add-messages-api branch from 1e60a18 to e18412e Compare May 22, 2026 21:20
@github-actions github-actions Bot added the tests Test changes label May 22, 2026
@lightseek-bot lightseek-bot added the priority:high High priority label May 22, 2026
Member

@CatherineSue CatherineSue left a comment


Seems lint is failing. @ekzhang can you take a look?

@mergify
Contributor

mergify Bot commented May 23, 2026

Hi @ekzhang, the DCO sign-off check has failed. All commits must include a Signed-off-by line.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72d525ce93


Comment thread model_gateway/tests/messages_test.rs Outdated
.unwrap();

assert!(req.is_stream());
assert_eq!(req.get_model(), Some("claude-sonnet-4-5-20250929"));

P1: Disambiguate trait call for get_model in test

CreateMessageRequest has an inherent get_model(&self) -> &str and also implements GenerationRequest::get_model(&self) -> Option<&str>. The assertion here calls req.get_model() and compares it to Some(...), which resolves to the inherent method and produces a type mismatch (&str vs Option<&str>), so this test target will not compile when dependencies are available. Use fully-qualified syntax (e.g. GenerationRequest::get_model(&req)) or compare against the inherent return type.
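
The resolution rule behind this comment can be reproduced in isolation. The types below are simplified, hypothetical stand-ins that keep only the two get_model methods; the two wrapper functions exist only to make the return types explicit.

```rust
// Simplified stand-ins: only the two conflicting methods are modelled.
pub trait GenerationRequest {
    fn get_model(&self) -> Option<&str>;
}

pub struct CreateMessageRequest {
    pub model: String,
}

impl CreateMessageRequest {
    // Inherent method: returns the model directly.
    pub fn get_model(&self) -> &str {
        &self.model
    }
}

impl GenerationRequest for CreateMessageRequest {
    // Trait method: same name, different return type.
    fn get_model(&self) -> Option<&str> {
        Some(self.model.as_str())
    }
}

/// Method-call syntax prefers the inherent method, so this yields `&str`.
pub fn model_via_inherent(req: &CreateMessageRequest) -> &str {
    req.get_model()
}

/// Fully-qualified syntax selects the trait method, so this yields `Option<&str>`.
pub fn model_via_trait(req: &CreateMessageRequest) -> Option<&str> {
    GenerationRequest::get_model(req)
}
```

In a test, `assert_eq!(GenerationRequest::get_model(&req), Some("..."))` compares like with like, whereas `assert_eq!(req.get_model(), Some("..."))` compares `&str` against `Option<&str>` and fails to compile.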


@lightseek-bot
Collaborator

lightseek-bot and others added 2 commits May 23, 2026 13:02
Signed-off-by: zhyncs <46627482+zhyncs@users.noreply.github.com>
@zhyncs zhyncs force-pushed the ekzhang/add-messages-api branch from e630d01 to cf1f2c3 Compare May 23, 2026 23:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf1f2c3cd2


Comment on lines +148 to +150
                        if let InputContentBlock::Text(text_block) = block {
                            push(&text_block.text, &mut has_content, &mut buffer);
                        }

P2: Extract document text when building routing key

CreateMessageRequest::extract_text_for_routing currently appends only InputContentBlock::Text, so requests whose prompt is carried in document blocks (for example DocumentSource::Text or document content blocks) produce an empty/partial routing key even though they contain substantial text. Because HTTP routing uses this extracted text for worker selection (route_typed_request), document-heavy /v1/messages traffic can be misrouted compared with equivalent chat payloads; include textual fields from document blocks when constructing the routing text.
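
One possible shape for this suggestion is sketched below. The enum layouts are hypothetical and heavily simplified: only DocumentSource::Text is named in the comment, while the `data` field, the Base64 variant, the free function routing_text, and the space separator are all assumptions made for illustration.

```rust
// Hypothetical, simplified block types; the real definitions may differ.
pub enum DocumentSource {
    Text { data: String },
    Base64 { data: String }, // binary payload, not useful as routing text
}

pub enum InputContentBlock {
    Text { text: String },
    Document { source: DocumentSource },
    Image,
}

/// Builds routing text from text blocks and plain-text documents,
/// skipping images and non-text document sources.
pub fn routing_text(blocks: &[InputContentBlock]) -> String {
    let mut parts: Vec<&str> = Vec::new();
    for block in blocks {
        match block {
            InputContentBlock::Text { text } => parts.push(text.as_str()),
            InputContentBlock::Document {
                source: DocumentSource::Text { data },
            } => parts.push(data.as_str()),
            _ => {}
        }
    }
    parts.join(" ")
}
```

Keeping the catch-all arm means encoded payloads never enter the routing key, which avoids inflating it with data that carries no prefix-sharing signal.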


@lightseek-bot
Collaborator

@lightseek-bot lightseek-bot merged commit 77c8be7 into main May 24, 2026
83 of 87 checks passed
@lightseek-bot lightseek-bot deleted the ekzhang/add-messages-api branch May 24, 2026 00:18

Labels

model-gateway (Model gateway crate changes), priority:high (High priority), protocols (Protocols crate changes), tests (Test changes)


5 participants