
Add OpenAI-compatible /v1 API for Open WebUI integration#551

Open
swinney wants to merge 5 commits into dev from openwebui-compat-mode

Conversation

Collaborator

@swinney swinney commented Apr 14, 2026

Summary

  • Adds /v1/models and /v1/chat/completions endpoints that allow OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to use archi as a backend
  • Supports streaming (SSE) and non-streaming JSON responses, bearer token auth via API tokens, conversation persistence via X-OpenWebUI-Chat-Id header mapping, and multi-turn context via external_history
  • Includes citation formatting, user service with API token management, and example OpenWebUI deployment config
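
For reference, a minimal sketch of how an OpenAI-compatible client would talk to the new endpoints. The `/v1/chat/completions` path, bearer scheme, and `X-OpenWebUI-Chat-Id` header come from this PR; the model name, token placeholder, and helper names are illustrative only.

```python
import json

def build_chat_request(model, user_text, chat_id=None, stream=True):
    """Headers and body for POST /v1/chat/completions (shape only)."""
    headers = {"Authorization": "Bearer archi_<token>"}
    if chat_id:
        # Open WebUI sends this header; archi maps it to a conversation_id.
        headers["X-OpenWebUI-Chat-Id"] = chat_id
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    return headers, body

def parse_sse_line(line):
    """Extract the delta content from one 'data: {...}' SSE line, or None."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")
```

A client accumulates the non-`None` results of `parse_sse_line` until it sees `data: [DONE]`.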

Changes

New files

  • src/interfaces/chat_app/openai_compat.py — Flask blueprint with /v1/models and /v1/chat/completions
  • src/archi/utils/citation_formatter.py — Formats source documents into citation text
  • src/utils/user_service.py — User service with API token generation/validation
  • src/interfaces/chat_app/api.py — REST API endpoints for token management
  • examples/deployments/openwebui/ — Example config and docker-compose for OpenWebUI
  • docs/docs/api-reference-v1.md — API reference documentation
  • docs/docs/openwebui-integration.md — Integration guide
  • docs/docs/proposals/openwebui-compat.md — Design proposal

Modified files

  • src/interfaces/chat_app/app.py — Conditionally registers the /v1 blueprint
  • src/cli/templates/base-config.yaml — Adds openai_compat config section
  • src/cli/templates/init.sql — Adds external_chat_id column and conversation_metadata index

Tests

  • tests/unit/test_openai_compat_endpoints.py — HTTP layer tests (routing, validation, auth, streaming)
  • tests/unit/test_openai_compat_conversations.py — Conversation persistence tests
  • tests/unit/test_api_tokens.py — API token generation/validation tests
  • tests/unit/test_citation_formatter.py — Citation formatting tests

Test plan

  • All unit tests pass locally (118 passed, 13 pre-existing failures from missing local deps)
  • CI passes on PR
  • Deploy with Open WebUI and verify multi-turn conversation flow
  • Verify streaming and non-streaming responses render correctly
  • Verify conversation persistence across page reloads

🤖 Generated with Claude Code

Austin Swinney and others added 2 commits April 14, 2026 15:01
Adds /v1/models and /v1/chat/completions endpoints that allow
OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to
use archi as a backend. Includes streaming SSE and non-streaming
JSON responses, bearer token auth, conversation persistence via
X-OpenWebUI-Chat-Id header mapping, multi-turn context via
external_history, and citation formatting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Adds an OpenAI-compatible /v1 API surface to the chat app so Open WebUI (and other OpenAI-compatible clients) can use archi as a backend, including token-based auth, streaming (SSE), and conversation ID mapping.

Changes:

  • Introduces a new Flask blueprint implementing GET /v1/models and POST /v1/chat/completions (streaming + non-streaming).
  • Adds API token generation/validation in UserService plus REST endpoints for token management.
  • Adds a shared citation formatter, schema updates (api_token_hash, external_chat_id), and Open WebUI deployment/docs/examples.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| src/interfaces/chat_app/openai_compat.py | New /v1 blueprint, auth middleware, SSE translation, conversation mapping/persistence. |
| src/interfaces/chat_app/app.py | Adds external_history support and includes source docs/scores in final stream events; registers /v1 blueprint conditionally. |
| src/interfaces/chat_app/api.py | Adds /api/users/me/api-token endpoints for token management. |
| src/utils/user_service.py | Adds API token generation (hash-only storage), lookup, existence check, revocation. |
| src/archi/utils/citation_formatter.py | New utility to format deduped/sorted citation blocks. |
| src/cli/templates/init.sql | Schema updates for users.api_token_hash and conversation_metadata.external_chat_id + indexes. |
| src/cli/templates/base-config.yaml | Adds services.chat_app.openai_compat.enabled config knob. |
| tests/unit/test_openai_compat_endpoints.py | HTTP-layer tests for /v1 routing/validation/auth/streaming/error paths. |
| tests/unit/test_openai_compat_conversations.py | Tests for external chat ID mapping logic and mocked SQL behavior. |
| tests/unit/test_api_tokens.py | Unit tests for token generation/hash lookup/revocation. |
| tests/unit/test_citation_formatter.py | Unit tests for citation formatting behavior (dedupe/sorting/labels). |
| examples/deployments/openwebui/docker-compose.openwebui.yaml | Example Open WebUI compose pointing to archi /v1. |
| examples/deployments/openwebui/config.yaml | Example archi config enabling openai_compat. |
| docs/docs/api-reference-v1.md | API reference for /v1 endpoints and response formats. |
| docs/docs/openwebui-integration.md | Integration/setup guide for Open WebUI. |
| docs/docs/proposals/openwebui-compat.md | Design proposal for the /v1 compatibility layer. |
| docs/docs/proposals/multi-collection-routing.md | Companion proposal including shared citations/multi-collection ideas. |


Comment on lines +304 to +307
response = event.get("response")
if response:
final_content = response.answer
docs = event.get("source_documents", [])

Copilot AI Apr 14, 2026


Non-streaming path assumes the final event’s response has an .answer attribute, but ChatWrapper.stream() emits response as a formatted string (see src/interfaces/chat_app/app.py:2065-2070). This will raise AttributeError on every successful request. Treat response as a string (or handle both string + PipelineOutput) and/or accumulate from chunk events as a fallback.

Collaborator Author


Already addressed in a prior commit — the code at this location treats response as a plain string (no .answer access). The comment on line 298-299 documents this. No change needed.

Comment on lines +253 to +254
_persist_messages(conversation_id, query, accumulated[0])
return

Copilot AI Apr 14, 2026


_persist_messages() inserts into the conversations table, but ChatWrapper.stream() already persists both user+assistant messages during _finalize_result() (see src/interfaces/chat_app/app.py:1510-1519 and 2065-2080). When conversation_id is provided (e.g., via X-OpenWebUI-Chat-Id mapping), this will duplicate messages for every /v1 request. Remove these extra inserts and rely on ChatWrapper persistence, or only persist on error in a way that cannot double-insert on successful runs.

Collaborator Author


Already addressed — _persist_messages() was removed in a prior commit. ChatWrapper.stream() handles persistence via _finalize_result(). No change needed.

Comment on lines +413 to +437
cursor.execute(
"SELECT conversation_id FROM conversation_metadata WHERE external_chat_id = %s",
(external_chat_id,)
)
row = cursor.fetchone()
if row:
cursor.execute(
"UPDATE conversation_metadata SET last_message_at = NOW() WHERE conversation_id = %s",
(row[0],)
)
conn.commit()
return row[0]

cursor.execute(
"""
INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
VALUES (%s, %s, %s, %s)
RETURNING conversation_id
""",
(user_id, client_id, "Open WebUI Chat", external_chat_id)
)
conv_id = cursor.fetchone()[0]
conn.commit()
return conv_id
finally:

Copilot AI Apr 14, 2026


Conversation mapping has a race: two concurrent requests with the same external_chat_id can both miss the SELECT and then hit the INSERT, causing a unique-constraint violation on idx_conv_meta_external_chat and returning None (breaking multi-turn continuity). Use an atomic upsert (INSERT ... ON CONFLICT (external_chat_id) DO UPDATE ... RETURNING conversation_id) or catch psycopg2.IntegrityError and re-SELECT the existing row.

Collaborator Author

@swinney swinney Apr 14, 2026


Replaced the SELECT-then-INSERT with an atomic `INSERT ... ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL DO UPDATE SET last_message_at = NOW() RETURNING conversation_id`. This eliminates the TOCTOU race entirely.
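
A sketch of the single-round-trip shape this describes. The table and column names (`conversation_metadata`, `external_chat_id`) are from the PR; the helper name, title literal, and connection handling are illustrative, not the actual code.

```python
# Atomic upsert: concurrent callers racing on the same external_chat_id
# both land on the same row, so multi-turn continuity is preserved.
UPSERT_SQL = """
    INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL
    DO UPDATE SET last_message_at = NOW()
    RETURNING conversation_id
"""

def get_or_create_conversation(conn, user_id, client_id, external_chat_id):
    """Map an external chat id to a conversation_id in one round trip."""
    with conn.cursor() as cursor:
        cursor.execute(
            UPSERT_SQL,
            (user_id, client_id, "Open WebUI Chat", external_chat_id),
        )
        conv_id = cursor.fetchone()[0]
    conn.commit()
    return conv_id
```

Note the `WHERE` clause in the conflict target: Postgres needs it to infer the partial unique index on `external_chat_id`.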

Comment on lines +270 to +275
except Exception as exc:
logger.error(f"/v1 streaming error: {exc}", exc_info=True)
yield _sse_chunk(request_id, model, content=f"\n\n[Error: {exc}]")
yield _sse_chunk(request_id, model, finish_reason="stop")
yield "data: [DONE]\n\n"
_persist_messages(conversation_id, query, None)

Copilot AI Apr 14, 2026


The streaming exception handler returns the raw exception text to the client (`[Error: {exc}]`). This can leak internal details (DB info, stack-relevant messages) to untrusted callers. Prefer returning a generic message and keeping the full details in the logs only, consistent with ChatWrapper.stream()'s default: "server error; see chat logs for message".

Collaborator Author


Fixed. Both catch-all except Exception handlers (streaming line 269 and non-streaming line 315) now return "server error; see chat logs for message" instead of str(exc), matching the pattern used by ChatWrapper.stream(). The full exception is still logged server-side. Note: the "error" event handlers were already safe since ChatWrapper sanitizes those messages upstream.
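
The sanitized pattern can be sketched as below. The generic message text is the one quoted above; `_sse_chunk` and the surrounding generator exist in the PR but are reproduced here in shape only, with a caller-supplied chunk builder.

```python
import logging

logger = logging.getLogger(__name__)

# Matches ChatWrapper.stream()'s default error text.
GENERIC_ERROR = "server error; see chat logs for message"

def stream_with_sanitized_errors(events, request_id, model, sse_chunk):
    """Yield SSE chunks; on failure, log details and emit only a generic message."""
    try:
        for content in events:
            yield sse_chunk(request_id, model, content=content)
    except Exception:
        # Full traceback stays server-side; the client never sees str(exc).
        logger.exception("/v1 streaming error")
        yield sse_chunk(request_id, model, content=f"\n\n[{GENERIC_ERROR}]")
    yield sse_chunk(request_id, model, finish_reason="stop")
    yield "data: [DONE]\n\n"
```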

Comment on lines +323 to +367
@api.route('/users/me/api-token', methods=['GET'])
@require_client_id
def check_api_token():
"""
Check whether the current user has an API token.

Returns:
{"has_token": true/false} — never returns the token itself.
"""
try:
services = get_services()
has_token = services.user_service.has_api_token(g.client_id)
return jsonify({'has_token': has_token}), 200

except Exception as e:
logger.error(f"Error checking API token: {e}")
return jsonify({'error': str(e)}), 500


@api.route('/users/me/api-token', methods=['POST'])
@require_client_id
def generate_api_token():
"""
Generate a new API token for the current user.

Returns the plaintext token once. Replaces any existing token.

Returns:
{"token": "archi_..."}
"""
try:
services = get_services()
# Ensure user exists
services.user_service.get_or_create_user(g.client_id)
token = services.user_service.generate_api_token(g.client_id)

return jsonify({
'token': token,
'message': 'Save this token — it will not be shown again.',
}), 201

except Exception as e:
logger.error(f"Error generating API token: {e}")
return jsonify({'error': str(e)}), 500


Copilot AI Apr 14, 2026


These API-token endpoints are only guarded by @require_client_id, which does not authenticate the caller (it can generate an anon_... id). That allows unauthenticated clients to mint bearer tokens and then use /v1 even when auth_enabled is true, undermining the intended auth boundary. Require an authenticated session / RBAC permission for token issuance (and consider restricting to non-anonymous users), returning 401/403 when not logged in.

Collaborator Author


Investigated — the /api blueprint (register_api()) is never registered with the Flask app in production. The FlaskAppWrapper wires its /api/ routes individually via add_endpoint() with require_auth() wrapping. These token endpoints are dead code at runtime and not exploitable. Will address auth gating properly when the blueprint is wired in.
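
For context on the "hash-only storage" scheme those endpoints sit in front of: a minimal sketch under stated assumptions. The `archi_` prefix and generate/validate split follow the PR description; the dict stands in for the `users.api_token_hash` column, and the function names are illustrative.

```python
import hashlib
import secrets

def generate_api_token(store, client_id):
    """Mint a token, persist only its SHA-256 hash, return the plaintext once."""
    token = "archi_" + secrets.token_urlsafe(32)
    store[client_id] = hashlib.sha256(token.encode()).hexdigest()
    return token

def validate_api_token(store, token):
    """Return the client_id owning this token, or None if unknown."""
    digest = hashlib.sha256(token.encode()).hexdigest()
    for client_id, stored_hash in store.items():
        # Constant-time comparison of the hex digests.
        if secrets.compare_digest(stored_hash, digest):
            return client_id
    return None
```

Because only the hash is stored, a leaked database row cannot be replayed as a bearer token, and a lost token can only be replaced, never recovered.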

Comment on lines +45 to +52
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `model` | string | Yes | Config name from `/v1/models` |
| `messages` | array | Yes | Array of `{role, content}` objects |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | float | No | Override pipeline temperature |
| `max_tokens` | integer | No | Override max tokens |


Copilot AI Apr 14, 2026


The /v1 reference documents temperature and max_tokens overrides, but the implementation in src/interfaces/chat_app/openai_compat.py currently ignores these fields (they’re not forwarded into stream_kwargs). Either implement these request overrides or remove them from the public API docs to avoid misleading clients.

Collaborator Author


Fixed — removed temperature and max_tokens rows from the docs. Neither openai_compat.py, ChatWrapper.stream(), nor the pipeline layer supports runtime overrides for these. Can add support later as a separate feature if needed.

Comment on lines +223 to +226
{"type": "chunk", "content": "Test answer"},
{"type": "final", "response": SimpleNamespace(answer="Test answer"),
"source_documents": [], "retriever_scores": []},
])

Copilot AI Apr 14, 2026


The mocked final event uses response=SimpleNamespace(answer=...), but the real ChatWrapper.stream() emits final with response as a plain string (see src/interfaces/chat_app/app.py:2065-2068). This mismatch will let non-streaming /v1 regressions slip through (the current implementation will AttributeError on .answer). Update the mock event shape to match production so tests exercise the real contract.

Collaborator Author


Not applicable — all mock final events in the test file use "response": "<plain string>" (lines 55, 223, 253, 311). There is no SimpleNamespace(answer=...) anywhere. The mocks match the real ChatWrapper.stream() behavior. No change needed.

def chat_completions():
"""Handle OpenAI-compatible chat completion requests."""
data = request.get_json(silent=True)
if not data:

Copilot AI Apr 14, 2026


request.get_json(silent=True) returns {} for an empty-but-valid JSON body, which is falsy; the current if not data: treats that as “invalid JSON” and returns the wrong error. Use an explicit if data is None: check so valid empty JSON produces the expected missing-field validation errors instead.

Suggested change
if not data:
if data is None:

Collaborator Author


Technically correct, but not worth fixing — an empty {} body would pass through to the next validation and get rejected with "'model' is required" anyway. No real OpenAI-compatible client sends {}. The practical impact is zero.

- fix race condition in _get_or_create_conversation with atomic upsert
- stop leaking raw exception text to clients in error handlers
- remove undocumented temperature/max_tokens from API docs
- fix test mocks to use plain string response (matching ChatWrapper)
- remove _persist_messages (ChatWrapper handles persistence)
- fix non-streaming response.answer to use plain string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.



Comment on lines +2192 to +2202
# Conditionally register OpenAI-compatible /v1 blueprint
openai_compat_config = self.chat_app_config.get("openai_compat", {})
if openai_compat_config.get("enabled", False):
from src.interfaces.chat_app.openai_compat import register_openai_compat
user_service = UserService(pg_config=self.pg_config)
register_openai_compat(
self.app,
self.chat,
user_service=user_service,
auth_enabled=self.auth_enabled,
)

Copilot AI Apr 14, 2026


Docs instruct users to generate bearer tokens via POST /api/users/me/api-token, but the chat app currently doesn’t register the src/interfaces/chat_app/api.py blueprint and FlaskAppWrapper doesn’t add an equivalent /api/users/me/api-token endpoint. As-is, users won’t have a supported way to mint tokens for /v1 when auth is enabled. Either wire these token routes into FlaskAppWrapper (e.g., via add_endpoint + require_auth/require_perm) or adjust the docs to match the actual token issuance mechanism.

Comment on lines +27 to +35
If authentication is enabled, generate a token via archi's API:

```bash
curl -X POST http://localhost:7861/api/users/me/api-token \
-H "Cookie: session=<your-session-cookie>"
```

Save the returned `archi_...` token. It's shown once and cannot be retrieved later.


Copilot AI Apr 14, 2026


This guide references /api/users/me/api-token for token generation/management, but the chat app currently doesn’t expose these routes (the api.py blueprint isn’t registered and FlaskAppWrapper doesn’t add equivalent endpoints). Either wire the token endpoints into the running chat app or update the guide to the actual token issuance flow so users can complete the integration when auth is enabled.

```
Authorization: Bearer archi_<token>
```

Generate tokens via `POST /api/users/me/api-token`.

Copilot AI Apr 14, 2026


This reference says tokens are generated via POST /api/users/me/api-token, but the chat app doesn’t currently expose that route (the api.py blueprint isn’t registered and there’s no add_endpoint for it). Either expose those endpoints or update this section to the supported token issuance path so /v1 auth can be configured reliably.

Suggested change
Generate tokens via `POST /api/users/me/api-token`.
Use a bearer token issued by your deployment's configured authentication flow.
The chat app does not currently expose `POST /api/users/me/api-token`, so do not rely on that route for `/v1` authentication.

except Exception as exc:
logger.error(f"/v1 non-streaming error: {exc}", exc_info=True)
_persist_messages(conversation_id, query, None)
return _openai_error(str(exc), "server_error", 500)

Copilot AI Apr 14, 2026


The non-streaming exception handler returns str(exc) in the OpenAI error payload. This exposes internal exception messages to clients; align with the streaming behavior fix by returning a generic server error message and logging the exception details server-side.

Suggested change
return _openai_error(str(exc), "server_error", 500)
return _openai_error("Internal server error", "server_error", 500)

Comment on lines +356 to +362
def _sse_chunk(request_id, model, content=None, finish_reason=None):
"""Build a single SSE data line in OpenAI format."""
delta = {}
if content is not None:
delta["content"] = content
if finish_reason is not None:
delta["role"] = "assistant"

Copilot AI Apr 14, 2026


_sse_chunk() sets delta.role = "assistant" whenever finish_reason is provided, which makes the final "stop" chunk carry a non-empty delta. This diverges from the documented example in docs/docs/api-reference-v1.md (final chunk has delta: {}) and from typical OpenAI streaming behavior. Only include role in the initial chunk (if at all), and keep the terminal chunk’s delta empty.

Suggested change
def _sse_chunk(request_id, model, content=None, finish_reason=None):
"""Build a single SSE data line in OpenAI format."""
delta = {}
if content is not None:
delta["content"] = content
if finish_reason is not None:
delta["role"] = "assistant"
def _sse_chunk(request_id, model, content=None, finish_reason=None, include_role=False):
"""Build a single SSE data line in OpenAI format.
`role` should only be included on the initial streamed chunk when
explicitly requested. Terminal chunks should keep `delta` empty and
communicate completion via `finish_reason`.
"""
delta = {}
if include_role:
delta["role"] = "assistant"
if content is not None:
delta["content"] = content

Comment on lines +12 to +16
# 2. Generate an API token (after deployment is running):
# curl -X POST http://localhost:7861/api/users/me/api-token
#
# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
# ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example deployment instructions suggest generating a token via POST /api/users/me/api-token, but the chat app currently doesn’t expose these token endpoints (the blueprint isn’t registered / no add_endpoint wiring). Update the example to match the real token issuance mechanism or add the missing route wiring so the example is runnable as written.

Suggested change
# 2. Generate an API token (after deployment is running):
# curl -X POST http://localhost:7861/api/users/me/api-token
#
# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
# ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d
# 2. Start Open WebUI (see docker-compose.openwebui.yaml):
# docker compose -f docker-compose.openwebui.yaml up -d
#
# Notes:
# - This example enables the OpenAI-compatible API on the chat app.
# - The `/api/users/me/api-token` token-generation endpoint is not exposed by this deployment.

Austin Swinney and others added 2 commits April 14, 2026 19:23
Fix 3 pre-existing pyright type errors and add 46 unit tests
for the RBAC subsystem. See #552, #553.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Support anonymous access when registry.allow_anonymous is true
- Add token TTL expiry via api_token_created_at column
- Audit log all auth paths (success, failure, anonymous, revoke)
- Propagate is_admin from users table to User dataclass
- Add tests for admin/non-admin roles, token expiry, anonymous
  access, and audit event logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>