FastAPI application that exposes POST /v1/chat/completions in an OpenAI-compatible shape, runs the same governance path as the SDK (OrchestrationController.process on a ProcessedRequest built from request messages), then either returns a synthetic chat.completion (REFUSE), forwards the original body (NORMAL_COMPLETE), or forwards with an appended synthetic user turn (SAFE_COMPLETE). Adds X-Moralstack-* response headers for audit.
Normative reference: multiturn design v1.3 section 4.
moralstack.server.create_app— factory:create_app(openai_client=..., orchestrator=..., config=..., session_store=...).moralstack.server.conversation_correlation.ConversationCorrelationStore— process-local lineage mapping for OpenAI-style full-history replays when no explicitconversation_idis provided.compute_conversation_fingerprint— deterministic diagnostic hash from the opening message stem (through the firstusermessage); not the authoritativeconversation_id(usemsconv-*from the correlation store or client headers).build_governance_headers— header dict fromOrchestratorResult.
build_governance_headers (moralstack/server/headers.py) attaches:
| Header | Description |
|---|---|
X-Moralstack-Decision |
final_action (NORMAL_COMPLETE, SAFE_COMPLETE, REFUSE, …) |
X-Moralstack-Risk-Score |
Normalized risk score |
X-Moralstack-Posture |
Conversation governance posture |
X-Moralstack-Path |
Processing path (includes COMPLIANCE_FAST_PATH on DCCL match) |
X-Moralstack-Conversation-Id |
Resolved conversation id |
X-Moralstack-Internal-Draft-Reused |
Whether an internal speculative draft was reused |
X-Moralstack-Cached-From |
Present when a ledger cache hit was applied |
X-Moralstack-Compliance-Decision |
DCCL verdict when a developer contract was evaluated (MATCH, NO_MATCH, SAFETY_OVERRIDE; omitted for NO_CONTRACT) |
X-Moralstack-Compliance-Rule |
Matched structured rule id when decision is MATCH |
- For multi-turn conversational clients (full history replay per request), run one uvicorn worker per process unless you provide a shared session store and distributed locking across workers. Each worker has its own
InMemorySessionStoreandConversationCorrelationStore. - Blocking orchestrator and upstream OpenAI SDK calls run in a Starlette threadpool so the ASGI loop can accept concurrent requests; per-
conversation_idlocks still serialize same-conversation turns. - Per-request controller state:
OrchestrationControlleris typically a process-wide singleton (for example one instance percreate_app). Multi-turn linkage and ledger intent fields for a singleprocess()call are held in a stack-localProcessCallContext(moralstack/orchestration/process_context.py) passed through internal helpers — not on the controller instance — so concurrent proxy requests on differentconversation_idvalues cannot cross-contaminate observability metadata.
The model field in the client JSON body is not forwarded to OpenAI for final
generation. The proxy always uses the resolved upstream model:
GovernanceConfig.model → OPENAI_MODEL → gpt-4o (same precedence as the SDK bootstrap).
Clients may send a virtual alias (for example a COMPL-AI benchmark model id); only
OPENAI_MODEL (or GovernanceConfig.model) is passed to chat.completions.create.
Synthetic REFUSE responses echo the same resolved model in the model field of the
JSON payload.
- Optional extras:
[ui]includes proxy-related deps;[server]is a lighter subset (fastapi,uvicorn,httpx). - Console script
moralstack-serverpoints atmoralstack.server.proxy:main, which intentionally raisesNotImplementedErroruntil a deployer launcher wires real clients (Step 12 examples).
tests/test_server_proxy.py— integration tests withTestClient; async overlap tests (httpx.AsyncClient+ASGITransport); JSONL alignment under concurrent distinctconversation_idwith a real orchestrator.tests/test_server_fingerprint.py— fingerprint unit tests.tests/test_conversation_correlation.py— lineage hash andConversationCorrelationStorebehaviour.