Add OpenAI-compatible /v1 API for Open WebUI integration #551
Conversation
Adds /v1/models and /v1/chat/completions endpoints that allow OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to use archi as a backend. Includes streaming SSE and non-streaming JSON responses, bearer token auth, conversation persistence via X-OpenWebUI-Chat-Id header mapping, multi-turn context via external_history, and citation formatting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
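For orientation, a minimal request an OpenAI-compatible client could send to this surface might look like the sketch below. The model name `archi-default` is a placeholder; real names come from `GET /v1/models`.

```python
import json

# Hypothetical /v1/chat/completions request body; "archi-default" is a
# placeholder -- actual model names are returned by GET /v1/models.
body = {
    "model": "archi-default",
    "messages": [
        {"role": "user", "content": "What does archi index?"},
    ],
    "stream": True,  # False returns a single chat.completion JSON object
}
payload = json.dumps(body)
print(payload)
```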
Pull request overview
Adds an OpenAI-compatible /v1 API surface to the chat app so Open WebUI (and other OpenAI-compatible clients) can use archi as a backend, including token-based auth, streaming (SSE), and conversation ID mapping.
Changes:
- Introduces a new Flask blueprint implementing `GET /v1/models` and `POST /v1/chat/completions` (streaming + non-streaming).
- Adds API token generation/validation in `UserService` plus REST endpoints for token management.
- Adds a shared citation formatter, schema updates (`api_token_hash`, `external_chat_id`), and Open WebUI deployment/docs/examples.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/interfaces/chat_app/openai_compat.py` | New `/v1` blueprint, auth middleware, SSE translation, conversation mapping/persistence. |
| `src/interfaces/chat_app/app.py` | Adds `external_history` support and includes source docs/scores in final stream events; registers `/v1` blueprint conditionally. |
| `src/interfaces/chat_app/api.py` | Adds `/api/users/me/api-token` endpoints for token management. |
| `src/utils/user_service.py` | Adds API token generation (hash-only storage), lookup, existence check, revocation. |
| `src/archi/utils/citation_formatter.py` | New utility to format deduped/sorted citation blocks. |
| `src/cli/templates/init.sql` | Schema updates for `users.api_token_hash` and `conversation_metadata.external_chat_id` + indexes. |
| `src/cli/templates/base-config.yaml` | Adds `services.chat_app.openai_compat.enabled` config knob. |
| `tests/unit/test_openai_compat_endpoints.py` | HTTP-layer tests for `/v1` routing/validation/auth/streaming/error paths. |
| `tests/unit/test_openai_compat_conversations.py` | Tests for external chat ID mapping logic and mocked SQL behavior. |
| `tests/unit/test_api_tokens.py` | Unit tests for token generation/hash lookup/revocation. |
| `tests/unit/test_citation_formatter.py` | Unit tests for citation formatting behavior (dedupe/sorting/labels). |
| `examples/deployments/openwebui/docker-compose.openwebui.yaml` | Example Open WebUI compose pointing to archi `/v1`. |
| `examples/deployments/openwebui/config.yaml` | Example archi config enabling `openai_compat`. |
| `docs/docs/api-reference-v1.md` | API reference for `/v1` endpoints and response formats. |
| `docs/docs/openwebui-integration.md` | Integration/setup guide for Open WebUI. |
| `docs/docs/proposals/openwebui-compat.md` | Design proposal for the `/v1` compatibility layer. |
| `docs/docs/proposals/multi-collection-routing.md` | Companion proposal including shared citations/multi-collection ideas. |
```python
response = event.get("response")
if response:
    final_content = response.answer
docs = event.get("source_documents", [])
```
Non-streaming path assumes the final event’s response has an .answer attribute, but ChatWrapper.stream() emits response as a formatted string (see src/interfaces/chat_app/app.py:2065-2070). This will raise AttributeError on every successful request. Treat response as a string (or handle both string + PipelineOutput) and/or accumulate from chunk events as a fallback.
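A defensive sketch of the reviewer's suggestion, tolerating both the plain string that `ChatWrapper.stream()` emits and a hypothetical object exposing `.answer` (the helper name here is illustrative, not from the PR):

```python
def extract_final_content(response):
    """Return answer text from a final stream event's "response" value.

    ChatWrapper.stream() emits a formatted string, but this also tolerates
    a PipelineOutput-like object exposing .answer.
    """
    if isinstance(response, str):
        return response
    answer = getattr(response, "answer", None)
    return answer if answer is not None else str(response)
```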
Already addressed in a prior commit — the code at this location treats response as a plain string (no .answer access). The comment on line 298-299 documents this. No change needed.
```python
_persist_messages(conversation_id, query, accumulated[0])
return
```
_persist_messages() inserts into the conversations table, but ChatWrapper.stream() already persists both user+assistant messages during _finalize_result() (see src/interfaces/chat_app/app.py:1510-1519 and 2065-2080). When conversation_id is provided (e.g., via X-OpenWebUI-Chat-Id mapping), this will duplicate messages for every /v1 request. Remove these extra inserts and rely on ChatWrapper persistence, or only persist on error in a way that cannot double-insert on successful runs.
Already addressed — _persist_messages() was removed in a prior commit. ChatWrapper.stream() handles persistence via _finalize_result(). No change needed.
```python
cursor.execute(
    "SELECT conversation_id FROM conversation_metadata WHERE external_chat_id = %s",
    (external_chat_id,)
)
row = cursor.fetchone()
if row:
    cursor.execute(
        "UPDATE conversation_metadata SET last_message_at = NOW() WHERE conversation_id = %s",
        (row[0],)
    )
    conn.commit()
    return row[0]

cursor.execute(
    """
    INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
    VALUES (%s, %s, %s, %s)
    RETURNING conversation_id
    """,
    (user_id, client_id, "Open WebUI Chat", external_chat_id)
)
conv_id = cursor.fetchone()[0]
conn.commit()
return conv_id
finally:
```
Conversation mapping has a race: two concurrent requests with the same external_chat_id can both miss the SELECT and then hit the INSERT, causing a unique-constraint violation on idx_conv_meta_external_chat and returning None (breaking multi-turn continuity). Use an atomic upsert (INSERT ... ON CONFLICT (external_chat_id) DO UPDATE ... RETURNING conversation_id) or catch psycopg2.IntegrityError and re-SELECT the existing row.
OK: replaced the SELECT-then-INSERT with an atomic `INSERT ... ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL DO UPDATE SET last_message_at = NOW() RETURNING conversation_id`. This eliminates the TOCTOU race entirely.
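For illustration, the upsert pattern described here can be sketched as below. The function and parameter names are hypothetical (not the exact ones in the PR), and it assumes a partial unique index on `conversation_metadata.external_chat_id`:

```python
def get_or_create_conversation(cursor, user_id, client_id, external_chat_id):
    # Atomic upsert sketch: insert-or-touch in one statement, so two
    # concurrent requests with the same external_chat_id both get the
    # same conversation_id back instead of racing SELECT vs INSERT.
    cursor.execute(
        """
        INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
        VALUES (%s, %s, %s, %s)
        ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL
        DO UPDATE SET last_message_at = NOW()
        RETURNING conversation_id
        """,
        (user_id, client_id, "Open WebUI Chat", external_chat_id),
    )
    return cursor.fetchone()[0]
```

Because the insert and the conflict resolution happen in a single statement, there is no window between check and write for another request to slip through.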
```python
except Exception as exc:
    logger.error(f"/v1 streaming error: {exc}", exc_info=True)
    yield _sse_chunk(request_id, model, content=f"\n\n[Error: {exc}]")
    yield _sse_chunk(request_id, model, finish_reason="stop")
    yield "data: [DONE]\n\n"
    _persist_messages(conversation_id, query, None)
```
The streaming exception handler returns the raw exception text to the client ([Error: {exc}]). This can leak internal details (DB info, stack-relevant messages) to untrusted callers. Prefer returning a generic message (and keep the full details only in logs), consistent with ChatWrapper.stream()'s default of "server error; see chat logs for message".
Fixed. Both catch-all except Exception handlers (streaming line 269 and non-streaming line 315) now return "server error; see chat logs for message" instead of str(exc), matching the pattern used by ChatWrapper.stream(). The full exception is still logged server-side. Note: the "error" event handlers were already safe since ChatWrapper sanitizes those messages upstream.
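As a sketch, the sanitized-handler pattern looks like this (the helper name is illustrative; only the generic message text comes from the PR):

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_ERROR = "server error; see chat logs for message"

def client_safe_error(exc):
    # Full exception details (with traceback) stay in server-side logs;
    # untrusted callers only ever see the generic message.
    logger.error("/v1 request failed: %s", exc, exc_info=True)
    return GENERIC_ERROR
```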
```python
@api.route('/users/me/api-token', methods=['GET'])
@require_client_id
def check_api_token():
    """
    Check whether the current user has an API token.

    Returns:
        {"has_token": true/false} — never returns the token itself.
    """
    try:
        services = get_services()
        has_token = services.user_service.has_api_token(g.client_id)
        return jsonify({'has_token': has_token}), 200
    except Exception as e:
        logger.error(f"Error checking API token: {e}")
        return jsonify({'error': str(e)}), 500


@api.route('/users/me/api-token', methods=['POST'])
@require_client_id
def generate_api_token():
    """
    Generate a new API token for the current user.

    Returns the plaintext token once. Replaces any existing token.

    Returns:
        {"token": "archi_..."}
    """
    try:
        services = get_services()
        # Ensure user exists
        services.user_service.get_or_create_user(g.client_id)
        token = services.user_service.generate_api_token(g.client_id)

        return jsonify({
            'token': token,
            'message': 'Save this token — it will not be shown again.',
        }), 201
    except Exception as e:
        logger.error(f"Error generating API token: {e}")
        return jsonify({'error': str(e)}), 500
```
These API-token endpoints are only guarded by @require_client_id, which does not authenticate the caller (it can generate an anon_... id). That allows unauthenticated clients to mint bearer tokens and then use /v1 even when auth_enabled is true, undermining the intended auth boundary. Require an authenticated session / RBAC permission for token issuance (and consider restricting to non-anonymous users), returning 401/403 when not logged in.
Investigated — the /api blueprint (register_api()) is never registered with the Flask app in production. The FlaskAppWrapper wires its /api/ routes individually via add_endpoint() with require_auth() wrapping. These token endpoints are dead code at runtime and not exploitable. Will address auth gating properly when the blueprint is wired in.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `model` | string | Yes | Config name from `/v1/models` |
| `messages` | array | Yes | Array of `{role, content}` objects |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | float | No | Override pipeline temperature |
| `max_tokens` | integer | No | Override max tokens |
The /v1 reference documents temperature and max_tokens overrides, but the implementation in src/interfaces/chat_app/openai_compat.py currently ignores these fields (they’re not forwarded into stream_kwargs). Either implement these request overrides or remove them from the public API docs to avoid misleading clients.
Fixed — removed temperature and max_tokens rows from the docs. Neither openai_compat.py, ChatWrapper.stream(), nor the pipeline layer supports runtime overrides for these. Can add support later as a separate feature if needed.
```python
{"type": "chunk", "content": "Test answer"},
{"type": "final", "response": SimpleNamespace(answer="Test answer"),
 "source_documents": [], "retriever_scores": []},
])
```
The mocked final event uses response=SimpleNamespace(answer=...), but the real ChatWrapper.stream() emits final with response as a plain string (see src/interfaces/chat_app/app.py:2065-2068). This mismatch will let non-streaming /v1 regressions slip through (the current implementation will AttributeError on .answer). Update the mock event shape to match production so tests exercise the real contract.
Not applicable — all mock final events in the test file use "response": "<plain string>" (lines 55, 223, 253, 311). There is no SimpleNamespace(answer=...) anywhere. The mocks match the real ChatWrapper.stream() behavior. No change needed.
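For illustration, a final-event mock matching the production contract (plain-string `response`, as the reply above describes) would be shaped like:

```python
# Mock stream events shaped like ChatWrapper.stream()'s real output:
# "response" on the final event is a plain string, never an object
# with an .answer attribute.
mock_events = [
    {"type": "chunk", "content": "Test answer"},
    {"type": "final", "response": "Test answer",
     "source_documents": [], "retriever_scores": []},
]
```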
```python
def chat_completions():
    """Handle OpenAI-compatible chat completion requests."""
    data = request.get_json(silent=True)
    if not data:
```
request.get_json(silent=True) returns {} for an empty-but-valid JSON body, which is falsy; the current if not data: treats that as “invalid JSON” and returns the wrong error. Use an explicit if data is None: check so valid empty JSON produces the expected missing-field validation errors instead.
```diff
-if not data:
+if data is None:
```
Technically correct, but not worth fixing — an empty {} body would pass through to the next validation and get rejected with "'model' is required" anyway. No real OpenAI-compatible client sends {}. The practical impact is zero.
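The distinction at issue can be shown with a small pure function (names are illustrative, not the PR's):

```python
def classify_request_body(data):
    # request.get_json(silent=True) yields None for unparseable JSON but {}
    # for a valid empty object -- only None should mean "invalid JSON".
    if data is None:
        return "invalid_json"
    if "model" not in data:
        return "missing_model"
    if not data.get("messages"):
        return "missing_messages"
    return "ok"
```

As the reply notes, an empty `{}` still ends up rejected, just with a missing-field error rather than an invalid-JSON one.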
- fix race condition in _get_or_create_conversation with atomic upsert
- stop leaking raw exception text to clients in error handlers
- remove undocumented temperature/max_tokens from API docs
- fix test mocks to use plain string response (matching ChatWrapper)
- remove _persist_messages (ChatWrapper handles persistence)
- fix non-streaming response.answer to use plain string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
```python
# Conditionally register OpenAI-compatible /v1 blueprint
openai_compat_config = self.chat_app_config.get("openai_compat", {})
if openai_compat_config.get("enabled", False):
    from src.interfaces.chat_app.openai_compat import register_openai_compat
    user_service = UserService(pg_config=self.pg_config)
    register_openai_compat(
        self.app,
        self.chat,
        user_service=user_service,
        auth_enabled=self.auth_enabled,
    )
```
Docs instruct users to generate bearer tokens via POST /api/users/me/api-token, but the chat app currently doesn’t register the src/interfaces/chat_app/api.py blueprint and FlaskAppWrapper doesn’t add an equivalent /api/users/me/api-token endpoint. As-is, users won’t have a supported way to mint tokens for /v1 when auth is enabled. Either wire these token routes into FlaskAppWrapper (e.g., via add_endpoint + require_auth/require_perm) or adjust the docs to match the actual token issuance mechanism.
````markdown
If authentication is enabled, generate a token via archi's API:

```bash
curl -X POST http://localhost:7861/api/users/me/api-token \
  -H "Cookie: session=<your-session-cookie>"
```

Save the returned `archi_...` token. It's shown once and cannot be retrieved later.
````
This guide references /api/users/me/api-token for token generation/management, but the chat app currently doesn’t expose these routes (the api.py blueprint isn’t registered and FlaskAppWrapper doesn’t add equivalent endpoints). Either wire the token endpoints into the running chat app or update the guide to the actual token issuance flow so users can complete the integration when auth is enabled.
````markdown
Authorization: Bearer archi_<token>
```

Generate tokens via `POST /api/users/me/api-token`.
````
This reference says tokens are generated via POST /api/users/me/api-token, but the chat app doesn’t currently expose that route (the api.py blueprint isn’t registered and there’s no add_endpoint for it). Either expose those endpoints or update this section to the supported token issuance path so /v1 auth can be configured reliably.
```diff
-Generate tokens via `POST /api/users/me/api-token`.
+Use a bearer token issued by your deployment's configured authentication flow.
+The chat app does not currently expose `POST /api/users/me/api-token`, so do not rely on that route for `/v1` authentication.
```
```python
except Exception as exc:
    logger.error(f"/v1 non-streaming error: {exc}", exc_info=True)
    _persist_messages(conversation_id, query, None)
    return _openai_error(str(exc), "server_error", 500)
```
The non-streaming exception handler returns str(exc) in the OpenAI error payload. This exposes internal exception messages to clients; align with the streaming behavior fix by returning a generic server error message and logging the exception details server-side.
```diff
-return _openai_error(str(exc), "server_error", 500)
+return _openai_error("Internal server error", "server_error", 500)
```
```python
def _sse_chunk(request_id, model, content=None, finish_reason=None):
    """Build a single SSE data line in OpenAI format."""
    delta = {}
    if content is not None:
        delta["content"] = content
    if finish_reason is not None:
        delta["role"] = "assistant"
```
_sse_chunk() sets delta.role = "assistant" whenever finish_reason is provided, which makes the final "stop" chunk carry a non-empty delta. This diverges from the documented example in docs/docs/api-reference-v1.md (final chunk has delta: {}) and from typical OpenAI streaming behavior. Only include role in the initial chunk (if at all), and keep the terminal chunk’s delta empty.
```diff
-def _sse_chunk(request_id, model, content=None, finish_reason=None):
-    """Build a single SSE data line in OpenAI format."""
-    delta = {}
-    if content is not None:
-        delta["content"] = content
-    if finish_reason is not None:
-        delta["role"] = "assistant"
+def _sse_chunk(request_id, model, content=None, finish_reason=None, include_role=False):
+    """Build a single SSE data line in OpenAI format.
+
+    `role` should only be included on the initial streamed chunk when
+    explicitly requested. Terminal chunks should keep `delta` empty and
+    communicate completion via `finish_reason`.
+    """
+    delta = {}
+    if include_role:
+        delta["role"] = "assistant"
+    if content is not None:
+        delta["content"] = content
```
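A self-contained version of that pattern, showing the resulting chunk shapes (the field layout follows the OpenAI streaming format; the payload details beyond the suggestion above are assumptions):

```python
import json
import time

def sse_chunk(request_id, model, content=None, finish_reason=None, include_role=False):
    # role appears only on the first chunk; the terminal chunk keeps delta
    # empty and signals completion solely via finish_reason.
    delta = {}
    if include_role:
        delta["role"] = "assistant"
    if content is not None:
        delta["content"] = content
    payload = {
        "id": request_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"

first = sse_chunk("req-1", "archi", content="Hello", include_role=True)
last = sse_chunk("req-1", "archi", finish_reason="stop")
```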
```yaml
# 2. Generate an API token (after deployment is running):
#    curl -X POST http://localhost:7861/api/users/me/api-token
#
# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
#    ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d
```
The example deployment instructions suggest generating a token via POST /api/users/me/api-token, but the chat app currently doesn’t expose these token endpoints (the blueprint isn’t registered / no add_endpoint wiring). Update the example to match the real token issuance mechanism or add the missing route wiring so the example is runnable as written.
```diff
-# 2. Generate an API token (after deployment is running):
-#    curl -X POST http://localhost:7861/api/users/me/api-token
-#
-# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
-#    ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d
+# 2. Start Open WebUI (see docker-compose.openwebui.yaml):
+#    docker compose -f docker-compose.openwebui.yaml up -d
+#
+# Notes:
+# - This example enables the OpenAI-compatible API on the chat app.
+# - The `/api/users/me/api-token` token-generation endpoint is not exposed by this deployment.
```
- Support anonymous access when registry.allow_anonymous is true
- Add token TTL expiry via api_token_created_at column
- Audit log all auth paths (success, failure, anonymous, revoke)
- Propagate is_admin from users table to User dataclass
- Add tests for admin/non-admin roles, token expiry, anonymous access, and audit event logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary

- `/v1/models` and `/v1/chat/completions` endpoints that allow OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to use archi as a backend
- `X-OpenWebUI-Chat-Id` header mapping, and multi-turn context via `external_history`

Changes

New files

- `src/interfaces/chat_app/openai_compat.py` — Flask blueprint with `/v1/models` and `/v1/chat/completions`
- `src/archi/utils/citation_formatter.py` — Formats source documents into citation text
- `src/utils/user_service.py` — User service with API token generation/validation
- `src/interfaces/chat_app/api.py` — REST API endpoints for token management
- `examples/deployments/openwebui/` — Example config and docker-compose for OpenWebUI
- `docs/docs/api-reference-v1.md` — API reference documentation
- `docs/docs/openwebui-integration.md` — Integration guide
- `docs/docs/proposals/openwebui-compat.md` — Design proposal

Modified files

- `src/interfaces/chat_app/app.py` — Conditionally registers the `/v1` blueprint
- `src/cli/templates/base-config.yaml` — Adds `openai_compat` config section
- `src/cli/templates/init.sql` — Adds `external_chat_id` column and `conversation_metadata` index

Tests

- `tests/unit/test_openai_compat_endpoints.py` — HTTP layer tests (routing, validation, auth, streaming)
- `tests/unit/test_openai_compat_conversations.py` — Conversation persistence tests
- `tests/unit/test_api_tokens.py` — API token generation/validation tests
- `tests/unit/test_citation_formatter.py` — Citation formatting tests

Test plan
🤖 Generated with Claude Code