
Add OpenAI-compatible /v1 API for Open WebUI integration#551

Open
swinney wants to merge 5 commits into dev from openwebui-compat-mode

Conversation

Collaborator

@swinney swinney commented Apr 14, 2026

Summary

  • Adds /v1/models and /v1/chat/completions endpoints that allow OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to use archi as a backend
  • Supports streaming (SSE) and non-streaming JSON responses, bearer token auth via API tokens, conversation persistence via X-OpenWebUI-Chat-Id header mapping, and multi-turn context via external_history
  • Includes citation formatting, user service with API token management, and example OpenWebUI deployment config
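
For reference, a minimal sketch of how an OpenAI-compatible client would talk to the new endpoints. The `/v1/chat/completions` path, bearer scheme, and `X-OpenWebUI-Chat-Id` header come from this PR; the model name, token placeholder, and helper names are illustrative only.

```python
import json

def build_chat_request(model, user_text, chat_id=None, stream=True):
    """Headers and body for POST /v1/chat/completions (shape only)."""
    headers = {"Authorization": "Bearer archi_<token>"}
    if chat_id:
        # Open WebUI sends this header; archi maps it to a conversation_id.
        headers["X-OpenWebUI-Chat-Id"] = chat_id
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    return headers, body

def parse_sse_line(line):
    """Extract the delta content from one 'data: {...}' SSE line, or None."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")
```

A client accumulates the non-`None` results of `parse_sse_line` until it sees `data: [DONE]`.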

Changes

New files

  • src/interfaces/chat_app/openai_compat.py — Flask blueprint with /v1/models and /v1/chat/completions
  • src/archi/utils/citation_formatter.py — Formats source documents into citation text
  • src/utils/user_service.py — User service with API token generation/validation
  • src/interfaces/chat_app/api.py — REST API endpoints for token management
  • examples/deployments/openwebui/ — Example config and docker-compose for OpenWebUI
  • docs/docs/api-reference-v1.md — API reference documentation
  • docs/docs/openwebui-integration.md — Integration guide
  • docs/docs/proposals/openwebui-compat.md — Design proposal

Modified files

  • src/interfaces/chat_app/app.py — Conditionally registers the /v1 blueprint
  • src/cli/templates/base-config.yaml — Adds openai_compat config section
  • src/cli/templates/init.sql — Adds external_chat_id column and conversation_metadata index

Tests

  • tests/unit/test_openai_compat_endpoints.py — HTTP layer tests (routing, validation, auth, streaming)
  • tests/unit/test_openai_compat_conversations.py — Conversation persistence tests
  • tests/unit/test_api_tokens.py — API token generation/validation tests
  • tests/unit/test_citation_formatter.py — Citation formatting tests

Test plan

  • All unit tests pass locally (118 passed, 13 pre-existing failures from missing local deps)
  • CI passes on PR
  • Deploy with Open WebUI and verify multi-turn conversation flow
  • Verify streaming and non-streaming responses render correctly
  • Verify conversation persistence across page reloads

🤖 Generated with Claude Code

Austin Swinney and others added 2 commits April 14, 2026 15:01
Adds /v1/models and /v1/chat/completions endpoints that allow
OpenAI-compatible clients (Open WebUI, LiteLLM, Continue.dev) to
use archi as a backend. Includes streaming SSE and non-streaming
JSON responses, bearer token auth, conversation persistence via
X-OpenWebUI-Chat-Id header mapping, multi-turn context via
external_history, and citation formatting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Adds an OpenAI-compatible /v1 API surface to the chat app so Open WebUI (and other OpenAI-compatible clients) can use archi as a backend, including token-based auth, streaming (SSE), and conversation ID mapping.

Changes:

  • Introduces a new Flask blueprint implementing GET /v1/models and POST /v1/chat/completions (streaming + non-streaming).
  • Adds API token generation/validation in UserService plus REST endpoints for token management.
  • Adds a shared citation formatter, schema updates (api_token_hash, external_chat_id), and Open WebUI deployment/docs/examples.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| src/interfaces/chat_app/openai_compat.py | New /v1 blueprint, auth middleware, SSE translation, conversation mapping/persistence. |
| src/interfaces/chat_app/app.py | Adds external_history support and includes source docs/scores in final stream events; registers /v1 blueprint conditionally. |
| src/interfaces/chat_app/api.py | Adds /api/users/me/api-token endpoints for token management. |
| src/utils/user_service.py | Adds API token generation (hash-only storage), lookup, existence check, revocation. |
| src/archi/utils/citation_formatter.py | New utility to format deduped/sorted citation blocks. |
| src/cli/templates/init.sql | Schema updates for users.api_token_hash and conversation_metadata.external_chat_id + indexes. |
| src/cli/templates/base-config.yaml | Adds services.chat_app.openai_compat.enabled config knob. |
| tests/unit/test_openai_compat_endpoints.py | HTTP-layer tests for /v1 routing/validation/auth/streaming/error paths. |
| tests/unit/test_openai_compat_conversations.py | Tests for external chat ID mapping logic and mocked SQL behavior. |
| tests/unit/test_api_tokens.py | Unit tests for token generation/hash lookup/revocation. |
| tests/unit/test_citation_formatter.py | Unit tests for citation formatting behavior (dedupe/sorting/labels). |
| examples/deployments/openwebui/docker-compose.openwebui.yaml | Example Open WebUI compose pointing to archi /v1. |
| examples/deployments/openwebui/config.yaml | Example archi config enabling openai_compat. |
| docs/docs/api-reference-v1.md | API reference for /v1 endpoints and response formats. |
| docs/docs/openwebui-integration.md | Integration/setup guide for Open WebUI. |
| docs/docs/proposals/openwebui-compat.md | Design proposal for the /v1 compatibility layer. |
| docs/docs/proposals/multi-collection-routing.md | Companion proposal including shared citations/multi-collection ideas. |


Comment on lines +304 to +307
response = event.get("response")
if response:
final_content = response.answer
docs = event.get("source_documents", [])

Copilot AI Apr 14, 2026


Non-streaming path assumes the final event’s response has an .answer attribute, but ChatWrapper.stream() emits response as a formatted string (see src/interfaces/chat_app/app.py:2065-2070). This will raise AttributeError on every successful request. Treat response as a string (or handle both string + PipelineOutput) and/or accumulate from chunk events as a fallback.

Collaborator Author


Already addressed in a prior commit — the code at this location treats response as a plain string (no .answer access). The comment on line 298-299 documents this. No change needed.

Comment on lines +253 to +254
_persist_messages(conversation_id, query, accumulated[0])
return

Copilot AI Apr 14, 2026


_persist_messages() inserts into the conversations table, but ChatWrapper.stream() already persists both user+assistant messages during _finalize_result() (see src/interfaces/chat_app/app.py:1510-1519 and 2065-2080). When conversation_id is provided (e.g., via X-OpenWebUI-Chat-Id mapping), this will duplicate messages for every /v1 request. Remove these extra inserts and rely on ChatWrapper persistence, or only persist on error in a way that cannot double-insert on successful runs.

Collaborator Author


Already addressed — _persist_messages() was removed in a prior commit. ChatWrapper.stream() handles persistence via _finalize_result(). No change needed.

Comment on lines +413 to +437
cursor.execute(
"SELECT conversation_id FROM conversation_metadata WHERE external_chat_id = %s",
(external_chat_id,)
)
row = cursor.fetchone()
if row:
cursor.execute(
"UPDATE conversation_metadata SET last_message_at = NOW() WHERE conversation_id = %s",
(row[0],)
)
conn.commit()
return row[0]

cursor.execute(
"""
INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
VALUES (%s, %s, %s, %s)
RETURNING conversation_id
""",
(user_id, client_id, "Open WebUI Chat", external_chat_id)
)
conv_id = cursor.fetchone()[0]
conn.commit()
return conv_id
finally:

Copilot AI Apr 14, 2026


Conversation mapping has a race: two concurrent requests with the same external_chat_id can both miss the SELECT and then hit the INSERT, causing a unique-constraint violation on idx_conv_meta_external_chat and returning None (breaking multi-turn continuity). Use an atomic upsert (INSERT ... ON CONFLICT (external_chat_id) DO UPDATE ... RETURNING conversation_id) or catch psycopg2.IntegrityError and re-SELECT the existing row.

Collaborator Author

@swinney swinney Apr 14, 2026


Replaced the SELECT-then-INSERT with an atomic `INSERT ... ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL DO UPDATE SET last_message_at = NOW() RETURNING conversation_id`. This eliminates the TOCTOU race entirely.
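
A sketch of the single-round-trip shape this describes. The table and column names (`conversation_metadata`, `external_chat_id`) are from the PR; the helper name, title literal, and connection handling are illustrative, not the actual code.

```python
# Atomic upsert: concurrent callers racing on the same external_chat_id
# both land on the same row, so multi-turn continuity is preserved.
UPSERT_SQL = """
    INSERT INTO conversation_metadata (user_id, client_id, title, external_chat_id)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (external_chat_id) WHERE external_chat_id IS NOT NULL
    DO UPDATE SET last_message_at = NOW()
    RETURNING conversation_id
"""

def get_or_create_conversation(conn, user_id, client_id, external_chat_id):
    """Map an external chat id to a conversation_id in one round trip."""
    with conn.cursor() as cursor:
        cursor.execute(
            UPSERT_SQL,
            (user_id, client_id, "Open WebUI Chat", external_chat_id),
        )
        conv_id = cursor.fetchone()[0]
    conn.commit()
    return conv_id
```

Note the `WHERE` clause in the conflict target: Postgres needs it to infer the partial unique index on `external_chat_id`.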

Comment on lines +270 to +275
except Exception as exc:
logger.error(f"/v1 streaming error: {exc}", exc_info=True)
yield _sse_chunk(request_id, model, content=f"\n\n[Error: {exc}]")
yield _sse_chunk(request_id, model, finish_reason="stop")
yield "data: [DONE]\n\n"
_persist_messages(conversation_id, query, None)

Copilot AI Apr 14, 2026


The streaming exception handler returns the raw exception text to the client (`[Error: {exc}]`). This can leak internal details (DB info, stack-relevant messages) to untrusted callers. Prefer returning a generic message and keeping the full details in the logs only, consistent with ChatWrapper.stream()'s default: "server error; see chat logs for message".

Collaborator Author


Fixed. Both catch-all except Exception handlers (streaming line 269 and non-streaming line 315) now return "server error; see chat logs for message" instead of str(exc), matching the pattern used by ChatWrapper.stream(). The full exception is still logged server-side. Note: the "error" event handlers were already safe since ChatWrapper sanitizes those messages upstream.
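
The sanitized pattern can be sketched as below. The generic message text is the one quoted above; `_sse_chunk` and the surrounding generator exist in the PR but are reproduced here in shape only, with a caller-supplied chunk builder.

```python
import logging

logger = logging.getLogger(__name__)

# Matches ChatWrapper.stream()'s default error text.
GENERIC_ERROR = "server error; see chat logs for message"

def stream_with_sanitized_errors(events, request_id, model, sse_chunk):
    """Yield SSE chunks; on failure, log details and emit only a generic message."""
    try:
        for content in events:
            yield sse_chunk(request_id, model, content=content)
    except Exception:
        # Full traceback stays server-side; the client never sees str(exc).
        logger.exception("/v1 streaming error")
        yield sse_chunk(request_id, model, content=f"\n\n[{GENERIC_ERROR}]")
    yield sse_chunk(request_id, model, finish_reason="stop")
    yield "data: [DONE]\n\n"
```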

Comment on lines +323 to +367
@api.route('/users/me/api-token', methods=['GET'])
@require_client_id
def check_api_token():
"""
Check whether the current user has an API token.

Returns:
{"has_token": true/false} — never returns the token itself.
"""
try:
services = get_services()
has_token = services.user_service.has_api_token(g.client_id)
return jsonify({'has_token': has_token}), 200

except Exception as e:
logger.error(f"Error checking API token: {e}")
return jsonify({'error': str(e)}), 500


@api.route('/users/me/api-token', methods=['POST'])
@require_client_id
def generate_api_token():
"""
Generate a new API token for the current user.

Returns the plaintext token once. Replaces any existing token.

Returns:
{"token": "archi_..."}
"""
try:
services = get_services()
# Ensure user exists
services.user_service.get_or_create_user(g.client_id)
token = services.user_service.generate_api_token(g.client_id)

return jsonify({
'token': token,
'message': 'Save this token — it will not be shown again.',
}), 201

except Exception as e:
logger.error(f"Error generating API token: {e}")
return jsonify({'error': str(e)}), 500


Copilot AI Apr 14, 2026


These API-token endpoints are only guarded by @require_client_id, which does not authenticate the caller (it can generate an anon_... id). That allows unauthenticated clients to mint bearer tokens and then use /v1 even when auth_enabled is true, undermining the intended auth boundary. Require an authenticated session / RBAC permission for token issuance (and consider restricting to non-anonymous users), returning 401/403 when not logged in.

Collaborator Author


Investigated — the /api blueprint (register_api()) is never registered with the Flask app in production. The FlaskAppWrapper wires its /api/ routes individually via add_endpoint() with require_auth() wrapping. These token endpoints are dead code at runtime and not exploitable. Will address auth gating properly when the blueprint is wired in.
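
For context on the "hash-only storage" scheme those endpoints sit in front of: a minimal sketch under stated assumptions. The `archi_` prefix and generate/validate split follow the PR description; the dict stands in for the `users.api_token_hash` column, and the function names are illustrative.

```python
import hashlib
import secrets

def generate_api_token(store, client_id):
    """Mint a token, persist only its SHA-256 hash, return the plaintext once."""
    token = "archi_" + secrets.token_urlsafe(32)
    store[client_id] = hashlib.sha256(token.encode()).hexdigest()
    return token

def validate_api_token(store, token):
    """Return the client_id owning this token, or None if unknown."""
    digest = hashlib.sha256(token.encode()).hexdigest()
    for client_id, stored_hash in store.items():
        # Constant-time comparison of the hex digests.
        if secrets.compare_digest(stored_hash, digest):
            return client_id
    return None
```

Because only the hash is stored, a leaked database row cannot be replayed as a bearer token, and a lost token can only be replaced, never recovered.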

Comment on lines +45 to +52
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `model` | string | Yes | Config name from `/v1/models` |
| `messages` | array | Yes | Array of `{role, content}` objects |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | float | No | Override pipeline temperature |
| `max_tokens` | integer | No | Override max tokens |


Copilot AI Apr 14, 2026


The /v1 reference documents temperature and max_tokens overrides, but the implementation in src/interfaces/chat_app/openai_compat.py currently ignores these fields (they’re not forwarded into stream_kwargs). Either implement these request overrides or remove them from the public API docs to avoid misleading clients.

Collaborator Author


Fixed — removed temperature and max_tokens rows from the docs. Neither openai_compat.py, ChatWrapper.stream(), nor the pipeline layer supports runtime overrides for these. Can add support later as a separate feature if needed.

Comment on lines +223 to +226
{"type": "chunk", "content": "Test answer"},
{"type": "final", "response": SimpleNamespace(answer="Test answer"),
"source_documents": [], "retriever_scores": []},
])

Copilot AI Apr 14, 2026


The mocked final event uses response=SimpleNamespace(answer=...), but the real ChatWrapper.stream() emits final with response as a plain string (see src/interfaces/chat_app/app.py:2065-2068). This mismatch will let non-streaming /v1 regressions slip through (the current implementation will AttributeError on .answer). Update the mock event shape to match production so tests exercise the real contract.

Collaborator Author


Not applicable — all mock final events in the test file use "response": "<plain string>" (lines 55, 223, 253, 311). There is no SimpleNamespace(answer=...) anywhere. The mocks match the real ChatWrapper.stream() behavior. No change needed.

def chat_completions():
"""Handle OpenAI-compatible chat completion requests."""
data = request.get_json(silent=True)
if not data:

Copilot AI Apr 14, 2026


request.get_json(silent=True) returns {} for an empty-but-valid JSON body, which is falsy; the current if not data: treats that as “invalid JSON” and returns the wrong error. Use an explicit if data is None: check so valid empty JSON produces the expected missing-field validation errors instead.

Suggested change
if not data:
if data is None:

Collaborator Author


Technically correct, but not worth fixing — an empty {} body would pass through to the next validation and get rejected with "'model' is required" anyway. No real OpenAI-compatible client sends {}. The practical impact is zero.

- fix race condition in _get_or_create_conversation with atomic upsert
- stop leaking raw exception text to clients in error handlers
- remove undocumented temperature/max_tokens from API docs
- fix test mocks to use plain string response (matching ChatWrapper)
- remove _persist_messages (ChatWrapper handles persistence)
- fix non-streaming response.answer to use plain string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.



Comment on lines +2192 to +2202
# Conditionally register OpenAI-compatible /v1 blueprint
openai_compat_config = self.chat_app_config.get("openai_compat", {})
if openai_compat_config.get("enabled", False):
from src.interfaces.chat_app.openai_compat import register_openai_compat
user_service = UserService(pg_config=self.pg_config)
register_openai_compat(
self.app,
self.chat,
user_service=user_service,
auth_enabled=self.auth_enabled,
)

Copilot AI Apr 14, 2026


Docs instruct users to generate bearer tokens via POST /api/users/me/api-token, but the chat app currently doesn’t register the src/interfaces/chat_app/api.py blueprint and FlaskAppWrapper doesn’t add an equivalent /api/users/me/api-token endpoint. As-is, users won’t have a supported way to mint tokens for /v1 when auth is enabled. Either wire these token routes into FlaskAppWrapper (e.g., via add_endpoint + require_auth/require_perm) or adjust the docs to match the actual token issuance mechanism.

Comment on lines +27 to +35
If authentication is enabled, generate a token via archi's API:

```bash
curl -X POST http://localhost:7861/api/users/me/api-token \
-H "Cookie: session=<your-session-cookie>"
```

Save the returned `archi_...` token. It's shown once and cannot be retrieved later.


Copilot AI Apr 14, 2026


This guide references /api/users/me/api-token for token generation/management, but the chat app currently doesn’t expose these routes (the api.py blueprint isn’t registered and FlaskAppWrapper doesn’t add equivalent endpoints). Either wire the token endpoints into the running chat app or update the guide to the actual token issuance flow so users can complete the integration when auth is enabled.

```
Authorization: Bearer archi_<token>
```

Generate tokens via `POST /api/users/me/api-token`.

Copilot AI Apr 14, 2026


This reference says tokens are generated via POST /api/users/me/api-token, but the chat app doesn’t currently expose that route (the api.py blueprint isn’t registered and there’s no add_endpoint for it). Either expose those endpoints or update this section to the supported token issuance path so /v1 auth can be configured reliably.

Suggested change
Generate tokens via `POST /api/users/me/api-token`.
Use a bearer token issued by your deployment's configured authentication flow.
The chat app does not currently expose `POST /api/users/me/api-token`, so do not rely on that route for `/v1` authentication.

except Exception as exc:
logger.error(f"/v1 non-streaming error: {exc}", exc_info=True)
_persist_messages(conversation_id, query, None)
return _openai_error(str(exc), "server_error", 500)

Copilot AI Apr 14, 2026


The non-streaming exception handler returns str(exc) in the OpenAI error payload. This exposes internal exception messages to clients; align with the streaming behavior fix by returning a generic server error message and logging the exception details server-side.

Suggested change
return _openai_error(str(exc), "server_error", 500)
return _openai_error("Internal server error", "server_error", 500)

Comment on lines +356 to +362
def _sse_chunk(request_id, model, content=None, finish_reason=None):
"""Build a single SSE data line in OpenAI format."""
delta = {}
if content is not None:
delta["content"] = content
if finish_reason is not None:
delta["role"] = "assistant"

Copilot AI Apr 14, 2026


_sse_chunk() sets delta.role = "assistant" whenever finish_reason is provided, which makes the final "stop" chunk carry a non-empty delta. This diverges from the documented example in docs/docs/api-reference-v1.md (final chunk has delta: {}) and from typical OpenAI streaming behavior. Only include role in the initial chunk (if at all), and keep the terminal chunk’s delta empty.

Suggested change
def _sse_chunk(request_id, model, content=None, finish_reason=None):
"""Build a single SSE data line in OpenAI format."""
delta = {}
if content is not None:
delta["content"] = content
if finish_reason is not None:
delta["role"] = "assistant"
def _sse_chunk(request_id, model, content=None, finish_reason=None, include_role=False):
"""Build a single SSE data line in OpenAI format.
`role` should only be included on the initial streamed chunk when
explicitly requested. Terminal chunks should keep `delta` empty and
communicate completion via `finish_reason`.
"""
delta = {}
if include_role:
delta["role"] = "assistant"
if content is not None:
delta["content"] = content

Comment on lines +12 to +16
# 2. Generate an API token (after deployment is running):
# curl -X POST http://localhost:7861/api/users/me/api-token
#
# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
# ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example deployment instructions suggest generating a token via POST /api/users/me/api-token, but the chat app currently doesn’t expose these token endpoints (the blueprint isn’t registered / no add_endpoint wiring). Update the example to match the real token issuance mechanism or add the missing route wiring so the example is runnable as written.

Suggested change
# 2. Generate an API token (after deployment is running):
# curl -X POST http://localhost:7861/api/users/me/api-token
#
# 3. Start Open WebUI (see docker-compose.openwebui.yaml):
# ARCHI_TOKEN=archi_<your-token> docker compose -f docker-compose.openwebui.yaml up -d
# 2. Start Open WebUI (see docker-compose.openwebui.yaml):
# docker compose -f docker-compose.openwebui.yaml up -d
#
# Notes:
# - This example enables the OpenAI-compatible API on the chat app.
# - The `/api/users/me/api-token` token-generation endpoint is not exposed by this deployment.

Austin Swinney and others added 2 commits April 14, 2026 19:23
Fix 3 pre-existing pyright type errors and add 46 unit tests
for the RBAC subsystem. See #552, #553.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Support anonymous access when registry.allow_anonymous is true
- Add token TTL expiry via api_token_created_at column
- Audit log all auth paths (success, failure, anonymous, revoke)
- Propagate is_admin from users table to User dataclass
- Add tests for admin/non-admin roles, token expiry, anonymous
  access, and audit event logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>