diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..a0ad756
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 nasr
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/PROGRESS.md b/PROGRESS.md
index 9053fee..1f353da 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -8,29 +8,25 @@
 
 ## Current state
 
-- **Active milestone:** M10 — Containerization + Terraform (AWS) + CD
-- **Status:** complete on branch (started 2026-05-29, completed 2026-05-29); awaiting CI green and human squash-merge. Per the locked constraints, **no `terraform apply` was run** — the PR ships infra-as-code only. Demo deployment + screenshots remain a manual operator action documented in `infra/README.md`.
-- **Active branch:** `feat/m10-deploy` (PR open — see Milestone status)
-- **Last completed milestone:** M9 — Evaluation harness (PR #12, merged 2026-05-29)
-- **`make check` passing:** baseline green from M9; M10 adds 8 request-id-middleware tests for a backend total of 195. Frontend tests unchanged (7).
-- **Last action:** committed M10 in 5 small Conventional Commits (housekeeping; backend structlog + request-id middleware + production Dockerfile + tests; frontend production Dockerfile + nginx.conf.template; Terraform stack with five modules; CD workflow + .dockerignore relocation + CI terraform job).
-- **Next action:** human squash-merges the M10 PR. After merge, follow `infra/README.md` to apply the stack, set the GitHub `AWS_ROLE_ARN` secret from the OIDC role output, write the API keys via `aws ssm put-parameter`, dispatch the CD workflow, capture demo screenshots, and `terraform destroy` immediately. Then `/start-milestone 11` for docs + diagram + demo.
+- **Active milestone:** M11 — Docs, architecture diagram, demo
+- **Status:** complete on branch (started 2026-05-29, completed 2026-05-29); awaiting CI green and human squash-merge.
+- **Active branch:** `feat/m11-docs-demo` (PR open — see Milestone status)
+- **Last completed milestone:** M10 — Containerization + Terraform (AWS) + CD (PR [#14](https://github.com/div0rce/sentinel/pull/14), merged 2026-05-29 at `b18112d`)
+- **`make check` passing:** baseline green from M10 (195 backend pytest, 7 frontend Vitest, ruff/mypy clean). Docs-only PR; no code surface changed.
+- **Last action:** committed M11 in 5 small Conventional Commits — PROGRESS.md housekeeping, `docs(architecture)` (write-up + Mermaid source + rendered PNG), `docs(demo)` (7-step script), `docs(readme)` (portfolio entry-point), `docs: add MIT LICENSE`.
+- **Next action:** human squash-merges the M11 PR. After merge, capture screenshots from a real demo run, drop them into `docs/screenshots/`, and tackle the post-M11 backlog (real-provider eval numbers per [#13](https://github.com/div0rce/sentinel/issues/13), eval set expansion, multi-tenant + RBAC, OTel traces, Multi-AZ + private subnets).
 - **Blockers:** none.
 
-### M10 DoD verification
+### M11 DoD verification
 
-- [ ] **`terraform plan` is clean; `apply` provisions the stack.** *Pending* — locally we have no `terraform` binary and the user has explicitly forbidden any `terraform plan`/`apply` or AWS API calls in this session. The infra is wired so a `terraform fmt -check` + `terraform validate` job runs in CI on every PR (no AWS creds needed); plan/apply remains a manual operator step. Confirming this DoD item requires the operator to run `terraform plan` against an AWS account, which is the M11 demo workflow.
-- [x] **CD workflow builds and deploys on manual dispatch.** `.github/workflows/cd.yml` is `workflow_dispatch`-only (no `push:`/`pull_request:` triggers — the M10 cost-control invariant), uses `aws-actions/configure-aws-credentials@v4` against an OIDC role written by `infra/modules/ci_oidc/`, builds backend and frontend images, pushes to ECR with the git SHA tag, and force-redeploys the ECS services.
-- [x] **App is reachable at a URL** — *infra-as-code complete*. The ALB DNS (`output "alb_dns_name"`) is the URL once `terraform apply` succeeds. Capturing screenshots is the M11 demo task; the operator runs `terraform destroy` immediately after.
+- [x] **README is complete and accurate; quickstart works from a clean clone.** README.md ships with the full problem → architecture → features → quickstart → evaluation → governance → deployment → limitations → roadmap → license sections, embeds the rendered architecture PNG, and links every sub-doc. Quickstart is the same flow `docs/demo.md` covers in detail; the test suite (`make check`) was re-verified green on this branch.
+- [x] **Architecture diagram committed (source + image).** `docs/architecture.mmd` (76 lines, LR layout) is the single source. `docs/architecture.png` (3168×2234, rendered via `mmdc 11.15.0`) is the committed image. Render command is documented in `docs/architecture.md` and `README.md` so a reviewer can regenerate the PNG from source without guessing.
+- [x] **Limitations + synthetic-data disclaimer present and honest.** README "Limitations & synthetic-data disclaimer" lists synthetic data, small eval set, pending real-provider numbers (#13), demo-only deploy posture, self-reported confidence (routing signal, not calibrated probability), citation-validity in-context check. Top-of-file callout reinforces the disclaimer.
 
-### M10 design lock-ins
+### Follow-ups tracked outside M11
 
-- **Code only.** No `terraform apply`. No AWS resource creation. No incurred costs in this PR.
-- **Cost posture.** Public-subnet + no-NAT-Gateway, single-AZ, Fargate `0.25 vCPU / 0.5 GB`, RDS `db.t4g.micro`. NAT Gateway idle cost (~$32/month) avoided. RDS **not publicly accessible** (security-group ingress keyed only to the backend task SG). Idle floor estimate ~$45/month, dominated by ALB + Fargate + RDS.
-- **CD trigger.** `workflow_dispatch` only. The trigger gate is the M10 cost-control mechanism.
-- **Region.** `us-east-1`. Pinned via `var.region` default.
-- **Secrets.** Runtime secrets in SSM Parameter Store (SecureString); written out-of-band so values stay out of Terraform state. CI identity via GitHub OIDC, not long-lived access keys.
-- **Demo-only.** `infra/README.md` documents the teardown recipe (`terraform destroy` immediately after demo screenshots) and every cost/security tradeoff (single-AZ, no Multi-AZ, no auto-scaling, no remote state, plain HTTP on the ALB).
+- **#13** — record real-provider eval numbers (M9 follow-up). Stays open until keys are wired and `make eval` is run for real.
+- **Backlog (MILESTONES.md):** multi-tenant + RBAC, eval set expansion, OTel traces, Multi-AZ + private subnets + ACM TLS + S3/DynamoDB Terraform backend.
 
 ---
 
@@ -48,8 +44,8 @@
 | M7 | Audit log + HITL | `feat/m07-audit-hitl` | ☑ merged | [#8](https://github.com/div0rce/sentinel/pull/8) | 2026-05-29 |
 | M8 | Frontend | `feat/m08-frontend` | ☑ merged | [#9](https://github.com/div0rce/sentinel/pull/9) | 2026-05-29; perf follow-up [#11](https://github.com/div0rce/sentinel/pull/11) |
 | M9 | Evaluation harness | `feat/m09-eval` | ☑ merged | [#12](https://github.com/div0rce/sentinel/pull/12) | 2026-05-29; real-provider numbers tracked in [#13](https://github.com/div0rce/sentinel/issues/13) |
-| M10 | Deploy (Docker/Terraform/CD) | `feat/m10-deploy` | ◐ complete on branch (PR open) | _filled in after `gh pr create`_ | 2026-05-29; code-only — no apply ran |
-| M11 | Docs + diagram + demo | `feat/m11-docs-demo` | ☐ | — | |
+| M10 | Deploy (Docker/Terraform/CD) | `feat/m10-deploy` | ☑ merged | [#14](https://github.com/div0rce/sentinel/pull/14) | 2026-05-29; code-only — apply remains a manual operator action |
+| M11 | Docs + diagram + demo | `feat/m11-docs-demo` | ◐ complete on branch (PR open) | _filled in after `gh pr create`_ | 2026-05-29; docs-only |
 
 Status key: ☐ not started · ◐ in progress · ☑ merged
 
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..650c3f2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,352 @@
+# Sentinel — governed document intelligence
+
+[![CI](https://github.com/div0rce/sentinel/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/div0rce/sentinel/actions/workflows/ci.yml)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+
+Sentinel is a portfolio-grade implementation of an **enterprise RAG + structured
+extraction platform with deterministic, auditable governance**. It turns an
+unstructured document corpus into two outputs:
+
+1. **Source-cited natural-language answers** (citation-or-refuse).
+2. **Schema-structured records with per-field confidence and provenance**.
+
+Both outputs run through a deterministic, idempotent, **human-in-the-loop**
+workflow with an **append-only audit log**. The full pipeline — ingestion,
+retrieval, RAG, extraction, guardrails, workflow engine, audit — is
+exercised end-to-end against a hand-labeled synthetic benchmark by an
+evaluation harness that **refuses to fabricate metric values** when the
+fakes are in play.
+
+> All sample data and benchmark labels are synthetic. The system has never
+> seen real customer data and is not intended for production use as-is.
+> See [Limitations & synthetic-data disclaimer](#limitations--synthetic-data-disclaimer).
+
+---
+
+## Table of contents
+
+- [Problem](#problem)
+- [Architecture](#architecture)
+- [Features](#features)
+- [Quickstart](#quickstart)
+- [Evaluation](#evaluation)
+- [Governance & guardrails](#governance--guardrails)
+- [Deployment](#deployment)
+- [Limitations & synthetic-data disclaimer](#limitations--synthetic-data-disclaimer)
+- [Roadmap](#roadmap)
+- [Project map](#project-map)
+- [License](#license)
+
+---
+
+## Problem
+
+Most enterprise RAG demos answer the question "can an LLM look something up
+in our docs?" Most enterprise extraction demos answer "can an LLM populate a
+JSON schema?" Both questions are easy. The hard questions are operational:
+
+- How do you know the answer is **grounded** in the corpus, not hallucinated?
+- How do you know which **fields are reliable** and which need a human?
+- How do you **route** ambiguous output to a reviewer and **prove**, after
+  the fact, who decided what and why?
+- How do you **measure** the system's quality on a labeled benchmark — without
+  fabricating numbers when the LLM isn't actually wired up?
+- How do you ship the whole thing as a **container that runs on AWS** with a
+  **non-publicly-accessible database**, **no long-lived CI keys**, and a
+  **manual-only** deployment trigger so the bill stays bounded?
+
+Sentinel is one opinionated answer to all of those. The architecture is built
+around a small set of deterministic invariants that are tested in code:
+
+- **Citation-or-refuse.** Every answer is supported by a retrieved chunk; if
+  not, the system refuses *before* calling the LLM. The same rule applies
+  field-by-field to extraction.
+- **Append-only audit.** Every model suggestion and every human decision
+  writes one row to `audit_events`. The repository layer has no update or
+  delete path. Reconstructing any workflow item's state by replay is a
+  tested property.
+- **Idempotent, deterministic workflow.** Routing or re-routing the same
+  extraction never creates a second `workflow_items` row. Same input → same
+  state.
+- **PII redaction is pre-LLM and pre-storage.** The LLM never sees raw
+  emails / phone numbers / SSNs / credit cards / IPs; the database never
+  stores them in chunk text.
+- **Honesty discipline.** The eval harness emits `n/a (...)` rather than a
+  fabricated number when a fake provider is in play. `eval/RESULTS.md` ships
+  in a methodology-only state until a real-provider run produces real
+  numbers.
+
+## Architecture
+
+![Sentinel architecture](docs/architecture.png)
+
+Headline shape: **Frontend (Vite + React + TypeScript)** behind nginx →
+**Backend (FastAPI on Python 3.12)** with a small set of pipeline modules
+(`retrieval`, `rag`, `extract`, `workflow`) and cross-cutting governance
+(`guardrails`, `audit`) → **Postgres 16 + pgvector** → **external LLM and
+embedding providers** (Anthropic Claude, OpenAI embeddings — both behind
+narrow interfaces and mocked in tests).
+
+The full architectural cross-reference, including end-to-end sequence
+diagrams for `/query`, `/extract`, and human review, an ER diagram, and the
+M10 deployment topology, is in [`docs/architecture.md`](docs/architecture.md).
+The diagram source is [`docs/architecture.mmd`](docs/architecture.mmd) — render
+with `npx -y --package=@mermaid-js/mermaid-cli mmdc -i docs/architecture.mmd -o docs/architecture.png --backgroundColor white --width 1600 --scale 2`.
+
+## Features
+
+| Capability | Where it lives | Tested by |
+| --- | --- | --- |
+| Idempotent ingestion + chunking + embedding | `backend/app/ingest.py`, `backend/app/embeddings/` | `test_ingest.py`, `test_chunking.py` |
+| pgvector cosine top-k retrieval | `backend/app/retrieval.py` | `test_retrieval.py` |
+| Citation-grounded RAG (`POST /query`) | `backend/app/rag.py`, `backend/app/routers/query.py` | `test_rag.py`, `test_query_router.py` |
+| Schema-constrained structured extraction (`POST /extract`) | `backend/app/extract.py`, `backend/app/extraction_schemas/` | `test_extract.py`, `test_extract_router.py` |
+| PII redaction + confidence gating | `backend/app/guardrails.py` | `test_guardrails.py` |
+| Deterministic, idempotent workflow FSM | `backend/app/workflow.py` | `test_workflow.py` |
+| Append-only audit log + replay | `backend/app/audit.py` | `test_audit_events_append_only.py` |
+| Human-in-the-loop review API + UI | `backend/app/routers/review.py`, `frontend/src/views/Review.tsx` | `test_audit_and_review.py`, `Review.test.tsx` |
+| KPI dashboard (volume, categories, confidence, SLA) | `backend/app/routers/dashboard.py`, `frontend/src/views/Dashboard.tsx` | `test_dashboard.py` |
+| Structured logging + request-id correlation | `backend/app/observability.py` | `test_request_id.py` |
+| Eval harness (extraction / retrieval / RAG) | `eval/` | `test_eval_harness.py` |
+| Containerized + Terraform demo deploy on AWS | `backend/Dockerfile`, `frontend/Dockerfile`, `infra/` | `terraform fmt+validate` in CI |
+| Manual-dispatch CD via GitHub OIDC | `.github/workflows/cd.yml`, `infra/modules/ci_oidc/` | review-tested |
+
+
+## Quickstart
+
+The full step-by-step is in [`docs/demo.md`](docs/demo.md). Short version
+(developer laptop, ~15 minutes):
+
+```bash
+# 1. clone
+git clone https://github.com/div0rce/sentinel.git
+cd sentinel
+cp .env.example .env   # set ANTHROPIC_API_KEY and OPENAI_API_KEY
+
+# 2. start Postgres + the API
+docker compose up -d db
+make dev                       # uvicorn on :8000
+
+# 3. start the frontend (second terminal)
+cd frontend && npm ci && npm run dev   # Vite on :5173
+
+# 4. migrate + seed the synthetic corpus
+make migrate
+make seed
+
+# 5. ask a question against the synthetic corpus
+curl -s http://localhost:8000/query \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"What is the total amount due on the Initech Components invoice issued on 2026-01-22?"}' | jq
+```
+
+Open <http://localhost:5173> for the SPA: **Query**, **Review**, and
+**Dashboard** views.
+
+### Run the test suite
+
+```bash
+make check        # ruff + mypy + 195 backend pytest + 7 frontend Vitest
+```
+
+CI runs the same matrix plus `terraform fmt -check && terraform validate`
+on every PR. None of these steps require API keys; the `fake` LLM and
+embedder run offline by default.
+
+## Evaluation
+
+The evaluation harness lives in `eval/`. Three evaluators run against a
+hand-labeled synthetic benchmark:
+
+| Evaluator | Metric | Where |
+| --- | --- | --- |
+| Extraction | per-field exact-match after typed normalization (trim+casefold for strings, ISO canonical for dates, ±0.01 for numbers); reports micro / macro / per-field accuracy + per-field precision/recall | `eval/labels/extraction_labels.json` |
+| Retrieval | precision@k, recall@k, MRR (k=5) | `eval/labels/retrieval_labels.json` |
+| RAG | citation-validity rate, answer-cites-relevant rate, expected-substring-match rate, refusal rate | `eval/labels/rag_labels.json` |
+
+**Honesty discipline.** Under either fake provider, the harness emits
+`n/a (...)` and refuses to write a numerical result for the affected metric;
+this is the n/a gate that keeps Golden Rule #5 ("never fabricate evaluation
+numbers") enforced in code. `eval/RESULTS.md` therefore ships in a
+**PENDING / methodology-only** state until a real-provider run produces real
+numbers — see [issue #13](https://github.com/div0rce/sentinel/issues/13).
+
+The full methodology defense (every metric choice, normalization rule, and
+honesty caveat) is in [`docs/evaluation.md`](docs/evaluation.md).
+
+Produce real-provider numbers locally (requires paid API keys):
+
+```bash
+export ANTHROPIC_API_KEY=...
+export OPENAI_API_KEY=...
+export LLM_PROVIDER=anthropic
+export EMBEDDINGS_PROVIDER=openai
+make migrate && make seed && make eval
+```
+
+## Governance & guardrails
+
+Three pillars, all deterministic and tested:
+
+1. **Citation-or-refuse.** `rag.answer_query` requires the LLM to emit
+   `[chunk:N]` markers and refuses if any cited id wasn't in the retrieval
+   set. The same rule applies field-by-field in extraction. Source:
+   `backend/app/rag.py`, `backend/app/extract.py`.
+
+2. **PII redaction.** A registry of named regex patterns
+   (`EMAIL`, `SSN`, `CREDIT_CARD`, `PHONE`, `IPV4`) replaces matches with
+   `[REDACTED:KIND]`. Idempotent: a second pass over redacted output is a
+   no-op. Runs **pre-storage** (chunks at ingest) and **pre-LLM** (the
+   prompt sent to Claude). Toggle via `PII_REDACTION_ENABLED` (default
+   `true`). Source: `backend/app/guardrails.py`. Specification:
+   [`docs/guardrails.md`](docs/guardrails.md).
+
+3. **Confidence gating + HITL routing.** Per-field confidence below
+   `CONFIDENCE_REVIEW_THRESHOLD` (default `0.75`) sets `requires_review=true`
+   on the extraction. The deterministic FSM in `backend/app/workflow.py`
+   routes to one of three states (`auto_approved`, `needs_review`,
+   `rejected`) and is idempotent: re-routing the same extraction never
+   creates a second `workflow_items` row. Specification:
+   [`docs/workflow.md`](docs/workflow.md).
+
+Every model suggestion and every human decision writes exactly one
+`audit_events` row in the same transaction as the state change. The
+repository layer has no `update` or `delete` path; replaying an item's
+events reproduces its current state. Specification:
+[`docs/audit-and-review.md`](docs/audit-and-review.md).
+
+## Deployment
+
+The M10 Terraform stack provisions an ephemeral demo deployment in `us-east-1`:
+
+- VPC with two public subnets, no NAT Gateway (cost posture; the avoided
+  NAT Gateway is the largest avoidable line item — ~$32/month idle).
+- ECS Fargate behind an ALB. Frontend (nginx serving the Vite SPA) is the
+  default target. Backend (FastAPI) receives `/health` directly from the
+  ALB; everything else under `/api/*` is reverse-proxied by nginx and the
+  `/api` prefix is stripped before reaching FastAPI.
+- RDS Postgres 16 (`db.t4g.micro`, single-AZ). Hard invariant:
+  `publicly_accessible = false`; the security group only permits ingress
+  from the backend task SG.
+- ECR for the two images, SSM Parameter Store for runtime secrets (API keys
+  and `DATABASE_URL`), and a tightly scoped GitHub Actions OIDC role for CI.
+
+Estimated idle cost: **~$45/month**, dominated by the ALB + Fargate + RDS.
+
+CD is **manual-dispatch only** via `.github/workflows/cd.yml`. There is no
+`push:` or `pull_request:` trigger; the trigger gate is the cost-control
+mechanism. The CD job assumes the OIDC role, builds and pushes images to
+ECR, and force-redeploys the ECS services.
+
+The full operator runbook (apply / write secrets / deploy / destroy) and
+the cost-and-security posture rationale live in
+[`infra/README.md`](infra/README.md). **`terraform destroy` immediately
+after capturing screenshots** is the documented contract.
+
+## Limitations & synthetic-data disclaimer
+
+This is a portfolio project. The honest limitations:
+
+- **All data is synthetic.** The corpus under `data/sample/` is generated
+  deterministically by `scripts/gen_synthetic_corpus.py` with a fixed seed.
+  No real customer documents have ever been ingested. Performance on real,
+  noisy production documents will differ.
+- **The eval set is small.** Five invoices for extraction, six retrieval
+  queries, five RAG questions. Numbers from this set should be treated as
+  **smoke-level signal**, not statistically significant accuracy claims.
+  Expanding the labeled set is on the post-M11 backlog; the current
+  pending/methodology-only state of `eval/RESULTS.md` is documented in
+  [`docs/evaluation.md`](docs/evaluation.md).
+- **No real-provider numbers committed yet.** `eval/RESULTS.md` ships in
+  PENDING state. Real-provider numbers depend on a one-time `make eval` run
+  with paid API keys, tracked in
+  [issue #13](https://github.com/div0rce/sentinel/issues/13).
+- **Demo-only deployment posture.** Single-AZ RDS, no auto-scaling, no
+  remote Terraform state, no TLS certificate by default
+  (the ALB SG already permits 443; attach an ACM cert and add a 443
+  listener to enable). See `infra/README.md` for the full list of
+  production-readiness gaps.
+- **Self-reported confidence is a routing signal, not a calibrated
+  probability.** The M4 extraction schema collects per-field confidence
+  from the LLM itself; it's used to route low-confidence fields to a human
+  reviewer (M5/M6) but is **not** reported as a calibrated probability in
+  the evaluation harness. Calibrating model self-assessment is its own
+  research surface.
+- **Citation-validity is an in-context check.** It verifies that a cited
+  chunk id is in the retrieval set, not that the cited chunk *actually
+  contains* the supporting fact. The `cites-relevant` evaluator is the
+  closest the harness gets to "the cited chunk is the right one"; an
+  LLM-judge faithfulness check is the natural next step and is out of M9
+  scope.
+
+## Roadmap
+
+Built to date (PRs in the GitHub history):
+
+- M0 — Scaffolding, tooling, CI
+- M1 — Data model + migrations (pgvector)
+- M2 — Ingestion + embeddings
+- M3 — Retrieval + citation-grounded RAG
+- M4 — Schema-constrained structured extraction
+- M5 — Guardrails (PII redaction, confidence gating)
+- M6 — Deterministic, idempotent workflow engine
+- M7 — Append-only audit log + HITL approval
+- M8 — Frontend (Query, Review, Dashboard)
+- M9 — Evaluation harness + methodology defense
+- M10 — Containerization + Terraform (AWS) + manual CD
+- M11 — Docs, architecture diagram, demo (this PR)
+
+Post-M11 backlog (tracked in `MILESTONES.md`):
+
+- Multi-tenant separation; role-based access on the review queue.
+- Eval expansion (larger labeled set, per-category breakdown,
+  LLM-judge faithfulness).
+- Observability: OpenTelemetry traces, dashboards.
+- Production-readiness for the AWS deploy: Multi-AZ RDS, private subnets +
+  NAT or VPC endpoints, ACM/ALB TLS, S3 + DynamoDB Terraform backend.
+
+## Project map
+
+```
+sentinel/
+├── README.md MILESTONES.md PROGRESS.md AGENTS.md
+├── Makefile pyproject.toml uv.lock .pre-commit-config.yaml .env.example
+├── docker-compose.yml .dockerignore
+├── .github/workflows/{ci.yml, cd.yml}
+├── backend/
+│   ├── app/
+│   │   ├── main.py config.py db.py models.py observability.py
+│   │   ├── embeddings/  # interface + OpenAI + Fake
+│   │   ├── llm/         # interface + Claude + Fake
+│   │   ├── ingest.py retrieval.py rag.py extract.py
+│   │   ├── guardrails.py workflow.py audit.py
+│   │   ├── extraction_schemas/  # Pydantic schemas registered with the extractor
+│   │   ├── repositories/        # documents, chunks, extractions, workflow_items, audit_events
+│   │   └── routers/             # query, extract, review, dashboard, health
+│   ├── alembic/  # migrations
+│   ├── tests/    # 195 pytest, runs against the CI Postgres+pgvector service
+│   └── Dockerfile
+├── frontend/
+│   ├── src/{App.tsx, api.ts, views/{Query,Review,Dashboard}.tsx, ...}
+│   ├── nginx.conf.template Dockerfile
+│   └── tests via Vitest under src/__tests__/ and src/views/__tests__/
+├── eval/                       # labels, harness, normalize, results, RESULTS.md
+├── data/sample/                # SYNTHETIC corpus + README marking it synthetic
+├── infra/                      # Terraform (network, ecr, rds, ecs, secrets, ci_oidc)
+├── docs/
+│   ├── architecture.md architecture.mmd architecture.png
+│   ├── demo.md
+│   ├── guardrails.md workflow.md audit-and-review.md evaluation.md
+│   └── adr/
+└── scripts/                    # gen_synthetic_corpus.py and friends
+```
+
+## License
+
+[MIT](LICENSE).
+
+---
+
+> Built as a portfolio project. Issues and PRs welcome; see
+> [`AGENTS.md`](AGENTS.md) and [`MILESTONES.md`](MILESTONES.md) for the
+> milestone-driven workflow that produced the codebase.
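
Review note on `README.md`: the citation-or-refuse pillar says the answer must carry `[chunk:N]` markers and is refused if any cited id was not in the retrieval set. A minimal sketch of that check follows. The function name, its return shape, and the choice to refuse an answer with no markers at all are assumptions for illustration; the real logic lives in `backend/app/rag.py` and may differ.

```python
import re

# Matches the [chunk:N] markers described in the README.
CHUNK_MARKER = re.compile(r"\[chunk:(\d+)\]")


def check_citations(answer: str, retrieved_ids: set[int]) -> tuple[bool, list[int]]:
    """Return (is_grounded, cited_ids) for an LLM answer.

    Hypothetical helper: refuses when no marker is present (an assumption)
    or when any cited id falls outside the retrieval set (from the README).
    """
    cited = [int(raw) for raw in CHUNK_MARKER.findall(answer)]
    if not cited:
        return False, []
    invalid = [chunk_id for chunk_id in cited if chunk_id not in retrieved_ids]
    return (not invalid), cited
```

As the README's limitations section points out, a check like this only proves the cited id was in context, not that the chunk actually supports the claim.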
diff --git a/backend/app/routers/extract.py b/backend/app/routers/extract.py
index 1e1cadb..c80f662 100644
--- a/backend/app/routers/extract.py
+++ b/backend/app/routers/extract.py
@@ -12,6 +12,7 @@
 from backend.app.extract import ExtractionResult, extract_document
 from backend.app.extraction_schemas import list_schemas
 from backend.app.llm import LLMClient, get_llm
+from backend.app.workflow import route_extraction
 
 router = APIRouter(prefix="/extract", tags=["extract"])
 
@@ -84,10 +85,9 @@ def post_extract(
 ) -> ExtractResponse:
     """Extract a structured record for an ingested document.
 
-    The handler delegates all business logic to :func:`extract_document` and only
-    converts the result to the API response shape. On a successful extraction the
-    session is committed so the new ``extractions`` row is durable; failures issue
-    no writes and so need no rollback.
+    The handler delegates extraction to :func:`extract_document`, then immediately
+    routes successful extractions through the deterministic workflow engine before
+    committing. Failures issue no writes and so need no rollback.
     """
     result = extract_document(
         session,
@@ -96,5 +96,8 @@ def post_extract(
         llm=llm,
     )
     if result.status == "ok":
+        if result.extraction_id is None:  # pragma: no cover - defensive invariant
+            raise RuntimeError("successful extraction did not return an extraction_id")
+        route_extraction(session, extraction_id=result.extraction_id)
         session.commit()
     return _to_response(result)
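
Review note on `backend/app/routers/extract.py`: the handler now calls `route_extraction` before `session.commit()`, so the extraction row and its workflow item land in the same transaction. The idempotence the README claims (re-routing never creates a second `workflow_items` row) can be sketched as a get-or-create keyed on the extraction id. Everything below is a hypothetical in-memory stand-in, not the real `route_extraction`; it models only two of the three documented states.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    AUTO_APPROVED = "auto_approved"
    NEEDS_REVIEW = "needs_review"


@dataclass
class InMemoryWorkflow:
    """Hypothetical stand-in for the workflow_items table, keyed by extraction id."""

    items: dict[int, Status] = field(default_factory=dict)

    def route(self, extraction_id: int, requires_review: bool) -> Status:
        # Get-or-create: a repeat call returns the existing item unchanged.
        if extraction_id in self.items:
            return self.items[extraction_id]
        status = Status.NEEDS_REVIEW if requires_review else Status.AUTO_APPROVED
        self.items[extraction_id] = status
        return status
```

In a real database the same property would usually be backed by a unique constraint on the extraction id; whether the project does so is not visible in this diff.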
diff --git a/backend/tests/test_extract_router.py b/backend/tests/test_extract_router.py
index 9cdc46d..c3ae98c 100644
--- a/backend/tests/test_extract_router.py
+++ b/backend/tests/test_extract_router.py
@@ -7,12 +7,13 @@
 
 import pytest
 from fastapi.testclient import TestClient
+from sqlalchemy import select
 from sqlalchemy.orm import Session
 
 from backend.app.db import get_session
 from backend.app.llm import FakeLLM, LLMClient
 from backend.app.main import app
-from backend.app.models import Chunk, Document
+from backend.app.models import Chunk, Document, WorkflowItem, WorkflowStatus
 from backend.app.routers.extract import _llm_dependency
 
 
@@ -26,17 +27,31 @@ def _seed_document(session: Session, *, hash_suffix: str, text: str) -> tuple[in
     return doc.id, chunk.id
 
 
-def _valid_invoice_json(*, chunk_id: int) -> str:
+def _valid_invoice_json(*, chunk_id: int, total_due_confidence: float = 0.9) -> str:
     return json.dumps(
         {
             "invoice_number": {"value": "R-1", "confidence": 0.9, "source_chunk_id": chunk_id},
             "vendor": {"value": "Acme", "confidence": 0.9, "source_chunk_id": chunk_id},
             "issue_date": {"value": "2026-01-22", "confidence": 0.9, "source_chunk_id": chunk_id},
-            "total_due": {"value": 100.0, "confidence": 0.9, "source_chunk_id": chunk_id},
+            "total_due": {
+                "value": 100.0,
+                "confidence": total_due_confidence,
+                "source_chunk_id": chunk_id,
+            },
         }
     )
 
 
+def _workflow_items_for_extraction(session: Session, extraction_id: int) -> list[WorkflowItem]:
+    return list(
+        session.scalars(
+            select(WorkflowItem)
+            .where(WorkflowItem.extraction_id == extraction_id)
+            .order_by(WorkflowItem.id)
+        )
+    )
+
+
 @pytest.fixture
 def client(session: Session) -> Iterator[TestClient]:
     """TestClient with session and llm overridden for isolation."""
@@ -87,6 +102,49 @@ def test_post_extract_happy_path(client: TestClient, session: Session) -> None:
     assert body["reason"] is None
 
 
+def test_post_extract_routes_low_confidence_result_to_review_queue(
+    client: TestClient, session: Session
+) -> None:
+    doc_id, chunk_id = _seed_document(session, hash_suffix="lo", text="invoice text")
+    client.canned_llm.response = _valid_invoice_json(  # type: ignore[attr-defined]
+        chunk_id=chunk_id,
+        total_due_confidence=0.4,
+    )
+
+    resp = client.post("/extract", json={"document_id": doc_id, "schema_name": "invoice"})
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["status"] == "ok"
+    assert body["requires_review"] is True
+    assert body["low_confidence_fields"] == ["total_due"]
+
+    items = _workflow_items_for_extraction(session, body["extraction_id"])
+    assert len(items) == 1
+    assert items[0].status is WorkflowStatus.NEEDS_REVIEW
+
+    queue = client.get("/review").json()["items"]
+    assert [item["id"] for item in queue] == [items[0].id]
+
+
+def test_post_extract_routes_high_confidence_result_out_of_review_queue(
+    client: TestClient, session: Session
+) -> None:
+    doc_id, chunk_id = _seed_document(session, hash_suffix="hi", text="invoice text")
+    client.canned_llm.response = _valid_invoice_json(chunk_id=chunk_id)  # type: ignore[attr-defined]
+
+    resp = client.post("/extract", json={"document_id": doc_id, "schema_name": "invoice"})
+    assert resp.status_code == 200
+    body = resp.json()
+    assert body["status"] == "ok"
+    assert body["requires_review"] is False
+
+    items = _workflow_items_for_extraction(session, body["extraction_id"])
+    assert len(items) == 1
+    assert items[0].status is WorkflowStatus.AUTO_APPROVED
+
+    assert client.get("/review").json() == {"items": []}
+
+
 def test_post_extract_returns_failed_on_malformed_llm_output(
     client: TestClient, session: Session
 ) -> None:
@@ -99,6 +157,7 @@ def test_post_extract_returns_failed_on_malformed_llm_output(
     assert body["status"] == "failed"
     assert body["reason"] == "parse_error"
     assert body["extraction_id"] is None
+    assert session.scalars(select(WorkflowItem)).all() == []
 
 
 def test_post_extract_returns_failed_for_unknown_document(client: TestClient) -> None:
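
Review note on `backend/tests/test_extract_router.py`: the new tests assert that a `total_due` confidence of 0.4 yields `low_confidence_fields == ["total_due"]` while 0.9 on every field yields no review. A minimal sketch of that gating rule, assuming the payload shape from `_valid_invoice_json` and the README's default threshold of 0.75. The function name is hypothetical; the real gate lives in `backend/app/guardrails.py`.

```python
import json

# README default for CONFIDENCE_REVIEW_THRESHOLD; hard-coded here as an assumption.
CONFIDENCE_REVIEW_THRESHOLD = 0.75


def low_confidence_fields(
    payload: str, threshold: float = CONFIDENCE_REVIEW_THRESHOLD
) -> list[str]:
    """Return field names whose self-reported confidence is below the threshold."""
    record = json.loads(payload)
    return [
        name
        for name, field_data in record.items()
        if field_data["confidence"] < threshold
    ]
```

A non-empty result corresponds to `requires_review` being true in the response body the tests inspect.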
diff --git a/docs/architecture.md b/docs/architecture.md
index 6656cb7..c3c2d4b 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,11 +1,490 @@
 # Architecture
 
-> The full architecture write-up and the Mermaid diagram land in Milestone M11.
-> This file exists from M0 so the docs structure is in place from the start.
+Sentinel is a governed document-intelligence platform. It turns an unstructured
+corpus into two outputs — **source-cited natural-language answers** and
+**schema-structured records with per-field confidence** — and runs both outputs
+through a **deterministic, idempotent, human-in-the-loop workflow** with an
+**immutable audit trail**.
 
-Sentinel is a governed document-intelligence platform. The pipeline is:
-ingestion → retrieval → citation-grounded RAG and schema-constrained extraction →
-guardrails → a deterministic, idempotent, human-in-the-loop workflow with an
-immutable audit trail.
+This doc is the architectural cross-reference: what the components are, how they
+fit together, where the source of truth for each invariant lives, and the
+deployment shape from M10.
 
-Architectural decisions are recorded as ADRs under [`docs/adr/`](adr/).
+> All sample data is synthetic. The system is a portfolio project; it is not
+> production and has never seen real customer data. See `data/sample/README.md`.
+
+---
+
+## High-level component diagram
+
+![Sentinel architecture](architecture.png)
+
+Source (regenerate the PNG with `npx -y @mermaid-js/mermaid-cli -i
+docs/architecture.mmd -o docs/architecture.png` or render any Mermaid block in
+this file):
+
+```mermaid
+flowchart LR
+    classDef ext fill:#eef,stroke:#557,color:#114
+    classDef api fill:#efe,stroke:#575,color:#141
+    classDef db  fill:#fee,stroke:#755,color:#411
+    classDef ui  fill:#ffe,stroke:#775,color:#441
+    classDef gov fill:#fef,stroke:#757,color:#414
+
+    user([User browser]):::ui
+    cli([CLI / curl]):::ui
+
+    subgraph Frontend["Frontend (Vite + React + TS)"]
+      spa[SPA: Query / Review / Dashboard]:::ui
+      nginx[nginx reverse proxy]:::ui
+    end
+
+    subgraph Backend["Backend (FastAPI, Python 3.12)"]
+      query[/POST /query<br/>RAG/]:::api
+      extract[/POST /extract<br/>structured extraction/]:::api
+      review[/GET/POST /review<br/>HITL queue/]:::api
+      dash[/GET /dashboard/*<br/>KPIs/]:::api
+      health[/GET /health/]:::api
+
+      subgraph Pipeline
+        retrieval[retrieval<br/>pgvector top-k]
+        rag[rag<br/>citation-or-refuse]
+        extr[extract<br/>schema-constrained]
+        wflow[workflow<br/>deterministic FSM]
+      end
+
+      subgraph Governance
+        guard[guardrails<br/>PII redact + conf gate]:::gov
+        audit[audit<br/>append-only events]:::gov
+      end
+    end
+
+    subgraph Data["Postgres 16 + pgvector"]
+      docs[(documents)]:::db
+      chunks[(chunks + embeddings)]:::db
+      extrs[(extractions)]:::db
+      wfit[(workflow_items)]:::db
+      audtbl[(audit_events)]:::db
+    end
+
+    subgraph Providers["External providers"]
+      claude[Anthropic Claude]:::ext
+      openai[OpenAI embeddings]:::ext
+    end
+
+    user --> spa
+    spa --> nginx
+    nginx -- /api/* --> query
+    nginx -- /api/* --> extract
+    nginx -- /api/* --> review
+    nginx -- /api/* --> dash
+    cli --> health
+
+    query --> retrieval --> chunks
+    query --> rag --> claude
+    rag --> guard
+    extract --> retrieval
+    extract --> extr --> claude
+    extract --> guard
+    extract --> wflow --> wfit
+    extract --> extrs
+    review --> wflow
+    review --> audit --> audtbl
+    wflow --> audit
+
+    retrieval --> openai
+
+    dash --> docs
+    dash --> wfit
+    dash --> extrs
+```
+
+Two invariants drive this shape:
+
+1. **Citation-or-refuse.** Every answer must be supported by a retrieved chunk.
+   `rag.answer_query` requires the LLM to emit `[chunk:N]` markers and refuses
+   if any cited id wasn't in the retrieval set.
+2. **Append-only audit.** Every model suggestion and every human decision
+   writes one row to `audit_events`. The repository layer has no update or
+   delete path. Reconstructing any workflow item's state by replay is a tested
+   property.
+
+---
+
+## Components and source-of-truth files
+
+### Ingestion (`backend/app/ingest.py`)
+
+Documents → SHA-256 hash → idempotent insert → token-based chunking with
+overlap → batched embeddings → bulk insert into `chunks`. The hash check on
+`documents.hash` short-circuits a re-ingest of identical content; chunk inserts
+use `ON CONFLICT DO NOTHING`. PII redaction runs *before* the chunk store so
+the database never sees raw emails / SSNs / phone numbers / IPs.
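+
+A minimal sketch of the two mechanical pieces above, assuming an in-memory
+dict in place of the `documents` table and a pre-tokenized input. The names
+`ingest_once`, `chunk_tokens`, and `_documents` are hypothetical and are not
+taken from `backend/app/ingest.py`; the real tokenizer and window sizes are
+not shown in this document.
+
+```python
+import hashlib
+
+# Hypothetical in-memory stand-in for the `documents` table, keyed by hash.
+_documents: dict[str, int] = {}
+
+
+def ingest_once(text: str) -> tuple[int, bool]:
+    """Return (document_id, created); identical content is never stored twice."""
+    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
+    if digest in _documents:
+        return _documents[digest], False  # short-circuit the re-ingest
+    _documents[digest] = len(_documents) + 1
+    return _documents[digest], True
+
+
+def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
+    """Split tokens into windows of `size` that share `overlap` tokens."""
+    if size <= 0 or not 0 <= overlap < size:
+        raise ValueError("need size > 0 and 0 <= overlap < size")
+    if not tokens:
+        return []
+    step = size - overlap
+    # Stop once the remaining tail is already covered by the previous window.
+    stop = max(len(tokens) - overlap, 1)
+    return [tokens[i : i + size] for i in range(0, stop, step)]
+```
+
+Calling `ingest_once` twice with the same text returns the same id and reports
+`created=False` the second time, which is the behaviour the hash check is
+described as providing.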
+
+### Embeddings (`backend/app/embeddings/`)
+
+Provider behind an interface. Two implementations: `OpenAIEmbedder`
+(`text-embedding-3-small`, 1536 dims) and `FakeEmbedder` (deterministic SHA-256
+projection used in CI and unit tests). Provider is selected by
+`EMBEDDINGS_PROVIDER`. CI runs offline with `EMBEDDINGS_PROVIDER=fake`.
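+
+One way a deterministic SHA-256 projection could look is sketched below. It
+is an illustration of the idea only: `fake_embed`, the counter-based hashing
+scheme, and the unit-length normalisation are assumptions, not the actual
+`FakeEmbedder` code.
+
+```python
+import hashlib
+import math
+
+
+def fake_embed(text: str, dims: int = 1536) -> list[float]:
+    """Deterministically map text to a unit-length vector of `dims` floats."""
+    values: list[float] = []
+    counter = 0
+    while len(values) < dims:
+        block = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
+        # Each digest byte becomes one component in [-1.0, 1.0].
+        values.extend(b / 127.5 - 1.0 for b in block)
+        counter += 1
+    values = values[:dims]
+    norm = math.sqrt(sum(v * v for v in values)) or 1.0
+    return [v / norm for v in values]
+```
+
+Because the output depends only on the input text, tests that use a function
+like this need no network access and produce the same vectors on every run.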
+
+### Retrieval (`backend/app/retrieval.py`)
+
+pgvector cosine top-k against `chunks.embedding`. The retrieval set is the
+sole grounding signal for the RAG layer above it; the RAG layer cannot answer
+without one.
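+
+For intuition, cosine top-k can be sketched in plain Python. In the real
+system the ranking is done by pgvector inside Postgres; `cosine_similarity`
+and `top_k` here are hypothetical helpers, and the tie-break on chunk id is an
+assumption.
+
+```python
+import math
+
+
+def cosine_similarity(a: list[float], b: list[float]) -> float:
+    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+
+def top_k(
+    query: list[float], chunks: dict[int, list[float]], k: int
+) -> list[tuple[int, float]]:
+    """Return the k (chunk_id, score) pairs with the highest similarity."""
+    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
+    # Highest score first; ties broken by chunk id so the order is stable.
+    scored.sort(key=lambda pair: (-pair[1], pair[0]))
+    return scored[:k]
+```
+
+The top score returned by a function like this is the value that the RAG layer
+compares against its similarity threshold.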
+
+### RAG (`backend/app/rag.py`)
+
+Citation-grounded question answering. Flow:
+
+1. Embed the query.
+2. Retrieve top-k chunks.
+3. Reject if the top score is below `RAG_SIMILARITY_THRESHOLD` (returns a
+   refusal with `reason="no_support"`).
+4. Apply pre-LLM PII redaction to the question and the chunk texts.
+5. Send a prompt asking for an answer that cites every claim with `[chunk:N]`.
+6. Parse the `[chunk:N]` markers; if any cited id is not in the retrieval
+   set, return a refusal with `reason="invalid_citation"`. This is the
+   citation-validity invariant tested in `test_rag.py`.
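+
+Steps 3 and 6 are the two refusal gates. A minimal sketch, assuming chunk ids
+are integers; `has_support` and `invalid_citation_ids` are hypothetical names
+and not the API of `rag.answer_query`:
+
+```python
+import re
+
+CITATION_RE = re.compile(r"\[chunk:(\d+)\]")
+
+
+def has_support(top_score: float, threshold: float) -> bool:
+    """Step 3: only call the LLM when the best chunk clears the threshold."""
+    return top_score >= threshold
+
+
+def invalid_citation_ids(answer: str, retrieved_ids: set[int]) -> list[int]:
+    """Step 6: cited chunk ids that were not part of the retrieval set."""
+    cited = {int(n) for n in CITATION_RE.findall(answer)}
+    return sorted(cited - retrieved_ids)
+```
+
+A caller would refuse with `reason="no_support"` when `has_support` is false,
+and with `reason="invalid_citation"` when `invalid_citation_ids` returns a
+non-empty list. How an answer with no markers at all is handled is not covered
+by this sketch.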
+
+### Extraction (`backend/app/extract.py`)
+
+Schema-constrained extraction. The LLM is given a Pydantic schema (e.g.
+`InvoicePayload`) and a chunk context; it must emit per-field
+`{value, confidence, source_chunk_id}` triples. The result is validated
+against the schema, the source-chunk ids are validated against the retrieval
+set (same citation-validity rule as RAG), per-field confidences are evaluated
+against the review threshold, and the row is persisted to `extractions` with
+`requires_review` and `low_confidence_fields` precomputed for the workflow
+engine.
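+
+The confidence gate can be sketched as follows. The function names mirror the
+helper names mentioned in this document, but the signatures, the sorted output
+order, and the strict `<` comparison are assumptions; the `0.75` default is
+the documented default of `CONFIDENCE_REVIEW_THRESHOLD`.
+
+```python
+def low_confidence_fields(
+    confidences: dict[str, float], threshold: float = 0.75
+) -> list[str]:
+    """Names of fields whose confidence is below the review threshold."""
+    return sorted(name for name, conf in confidences.items() if conf < threshold)
+
+
+def requires_review(confidences: dict[str, float], threshold: float = 0.75) -> bool:
+    """An extraction needs review when at least one field is low-confidence."""
+    return bool(low_confidence_fields(confidences, threshold))
+```
+
+With per-field confidences of `0.97`, `0.99`, `0.94`, and `0.71` for
+`total_due`, only `total_due` falls below the threshold, so the extraction is
+flagged for review.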
+
+### Guardrails (`backend/app/guardrails.py`)
+
+Deterministic safety layer. PII redaction is a registry of named regex
+patterns (`EMAIL`, `SSN`, `CREDIT_CARD`, `PHONE`, `IPV4`) replaced with
+`[REDACTED:KIND]`. Idempotent: a second pass over redacted output is a no-op.
+The `requires_review` / `low_confidence_fields` helpers consume the per-field
+confidence map from extraction. See `docs/guardrails.md` for the full
+specification.
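+
+A reduced sketch of the registry idea, covering three of the five kinds. The
+regular expressions are illustrative assumptions, not the patterns used in
+`backend/app/guardrails.py`, and `PATTERNS` is a hypothetical name.
+
+```python
+import re
+
+# Illustrative patterns only; the real registry also covers CREDIT_CARD and PHONE.
+PATTERNS: dict[str, re.Pattern[str]] = {
+    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
+    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
+    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
+}
+
+
+def redact_pii(text: str) -> str:
+    """Replace every match of every registered pattern with [REDACTED:KIND]."""
+    for kind, pattern in PATTERNS.items():
+        text = pattern.sub(f"[REDACTED:{kind}]", text)
+    return text
+```
+
+None of these patterns match the replacement tokens themselves, so running
+`redact_pii` on already-redacted text returns it unchanged, which is the
+idempotence property described above.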
+
+### Workflow engine (`backend/app/workflow.py`)
+
+Deterministic finite-state machine with three states: `auto_approved`,
+`needs_review`, `rejected`. Routing is based purely on
+`extraction.requires_review` and the per-field rules; the same extraction
+always routes to the same state. **Idempotent:** routing or re-routing the
+same extraction never creates a second `workflow_items` row. State changes
+emit one audit event each. See `docs/workflow.md` for the state diagram.
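+
+The idempotent routing rule can be illustrated with an in-memory sketch. The
+`Router` class, the `extraction:<id>` key format, and the event tuples are
+hypothetical stand-ins for the database-backed engine.
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+
+
+class Status(str, Enum):
+    AUTO_APPROVED = "auto_approved"
+    NEEDS_REVIEW = "needs_review"
+    REJECTED = "rejected"
+
+
+@dataclass
+class Item:
+    id: int
+    extraction_id: int
+    status: Status
+
+
+class Router:
+    def __init__(self) -> None:
+        self._by_key: dict[str, Item] = {}
+        self.events: list[tuple[str, str, int]] = []
+
+    def route(self, extraction_id: int, requires_review: bool) -> Item:
+        key = f"extraction:{extraction_id}"  # hypothetical idempotency key
+        existing = self._by_key.get(key)
+        if existing is not None:
+            return existing  # re-routing is a no-op: no new row, no new event
+        status = Status.NEEDS_REVIEW if requires_review else Status.AUTO_APPROVED
+        item = Item(len(self._by_key) + 1, extraction_id, status)
+        self._by_key[key] = item
+        self.events.append(("system", "workflow.routed", item.id))
+        return item
+```
+
+Routing the same extraction twice returns the same item and leaves exactly one
+`workflow.routed` event behind.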
+
+### Audit log (`backend/app/audit.py`)
+
+`audit_events` is append-only by construction: the repository exposes only
+`append`, never `update` or `delete`. Every workflow transition (model-driven
+or human-driven) writes exactly one row containing `actor`, `action`,
+`payload`, and a foreign key to the workflow item. Replaying the events for a
+workflow item must reproduce its current state — this is asserted in
+`test_audit_events_append_only.py`. See `docs/audit-and-review.md`.
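+
+The replay property can be sketched as below, under the assumption that each
+event's `payload` carries the resulting status under a `"status"` key. That
+payload shape, and the `AuditLog` and `replay` names, are hypothetical.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class AuditEvent:
+    actor: str
+    action: str
+    payload: dict
+
+
+class AuditLog:
+    """Exposes append and read only; there is no update or delete method."""
+
+    def __init__(self) -> None:
+        self._events: list[AuditEvent] = []
+
+    def append(self, event: AuditEvent) -> None:
+        self._events.append(event)
+
+    def events(self) -> tuple[AuditEvent, ...]:
+        return tuple(self._events)
+
+
+def replay(events: tuple[AuditEvent, ...]) -> str | None:
+    """Fold events in arrival order; the last recorded status wins."""
+    state: str | None = None
+    for event in events:
+        state = event.payload.get("status", state)
+    return state
+```
+
+Replaying a `workflow.routed` event followed by a `review.rejected` event
+yields `rejected`, matching what the workflow item would hold after the same
+two transitions.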
+
+### Review API + UI (`backend/app/routers/review.py`, `frontend/src/Review.tsx`)
+
+`GET /review` lists items in the `needs_review` state. `POST /review/{id}/approve`
+and `POST /review/{id}/reject` transition the item, write the audit event, and
+return the new state in one transaction. The React UI is a paginated queue
+with approve/reject actions and an actor field; the typed client in
+`frontend/src/api.ts` mirrors the Pydantic shapes and is the only place HTTP
+details live.
+
+### Dashboard API (`backend/app/routers/dashboard.py`)
+
+Four read-only KPI endpoints:
+- `/dashboard/volume` — daily ingestion counts over the last *N* days.
+- `/dashboard/categories` — extractions grouped by schema name.
+- `/dashboard/confidence` — histogram of per-field confidence.
+- `/dashboard/sla` — count of `needs_review` items older than *N* hours.
+
+The frontend Dashboard route is React-lazy-loaded (perf follow-up #11) so the
+Recharts vendor chunk does not block the initial bundle.
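+
+As an illustration of the SLA metric, the count can be computed like this.
+Measuring age from `created_at` and the 24-hour default are assumptions in
+this sketch, and `sla_breaches` is a hypothetical name; the real endpoint
+queries Postgres.
+
+```python
+from datetime import datetime, timedelta
+
+
+def sla_breaches(
+    items: list[tuple[str, datetime]], now: datetime, max_age_hours: int = 24
+) -> int:
+    """Count needs_review items created more than `max_age_hours` before `now`."""
+    cutoff = now - timedelta(hours=max_age_hours)
+    return sum(
+        1
+        for status, created_at in items
+        if status == "needs_review" and created_at < cutoff
+    )
+```
+
+Items in any other state are ignored, so approving or rejecting an item
+removes it from the count regardless of its age.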
+
+### Observability (`backend/app/observability.py`)
+
+`configure_logging()` wires structlog for JSON output
+(`SENTINEL_LOG_FORMAT=console` for local dev). `RequestIdMiddleware` assigns a
+stable id per request
+(an inbound `X-Request-Id` is sanitised and accepted; a generated `uuid4().hex`
+is used otherwise), binds it to the structlog contextvars for the request
+scope, and surfaces it on the response so end-to-end correlation across the
+nginx → FastAPI hop is one grep.
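+
+The accept-or-generate rule for request ids might look like the sketch below.
+The allowed character set and the 64-character cap are assumptions about what
+"sanitised" means here, and `resolve_request_id` is a hypothetical helper, not
+the middleware itself.
+
+```python
+import re
+import uuid
+
+# Hypothetical policy: short ids made of letters, digits, "_" and "-" only.
+_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")
+
+
+def resolve_request_id(inbound: str | None) -> str:
+    """Keep a well-formed inbound X-Request-Id; otherwise generate a new one."""
+    if inbound is not None:
+        candidate = inbound.strip()
+        if _SAFE_ID.fullmatch(candidate):
+            return candidate
+    return uuid.uuid4().hex
+```
+
+Rejecting ids with whitespace or control characters keeps untrusted header
+values from injecting extra fields or line breaks into the logs.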
+
+### Evaluation harness (`eval/`)
+
+Three evaluators (extraction accuracy, retrieval precision/recall/MRR, RAG
+citation-validity / cites-relevant / refusal rate) against a small hand-labeled
+synthetic benchmark. The harness emits `n/a (...)` rather than a number when a
+fake provider is in play — the n/a gate is what keeps Golden Rule #5 enforced
+in code, not just convention. See `docs/evaluation.md` for the methodology
+defense.
+
+---
+
+## End-to-end request flows
+
+### `POST /query` — citation-grounded RAG
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant Client
+    participant FastAPI as POST /query
+    participant Retr as retrieval
+    participant DB as Postgres+pgvector
+    participant Emb as Embeddings (OpenAI)
+    participant Guard as guardrails
+    participant LLM as Claude (Anthropic)
+
+    Client->>FastAPI: { query }
+    FastAPI->>Emb: embed(query)
+    Emb-->>FastAPI: query vector
+    FastAPI->>Retr: top-k(query vector)
+    Retr->>DB: cosine top-k on chunks.embedding
+    DB-->>Retr: [chunks…]
+    Retr-->>FastAPI: retrieved set
+    alt top-score < threshold
+        FastAPI-->>Client: { status: refused, reason: no_support }
+    else ok
+        FastAPI->>Guard: redact_pii(question, chunks)
+        Guard-->>FastAPI: redacted prompt
+        FastAPI->>LLM: prompt (must cite [chunk:N])
+        LLM-->>FastAPI: answer + citations
+        FastAPI->>FastAPI: validate citations vs retrieved set
+        alt any cited id not retrieved
+            FastAPI-->>Client: { status: refused, reason: invalid_citation }
+        else valid
+            FastAPI-->>Client: { status: answered, answer, citations }
+        end
+    end
+```
+
+### `POST /extract` — schema-constrained extraction with HITL routing
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant Client
+    participant FastAPI as POST /extract
+    participant Retr as retrieval
+    participant LLM as Claude (Anthropic)
+    participant Guard as guardrails
+    participant Wflow as workflow
+    participant Audit as audit
+    participant DB as Postgres
+
+    Client->>FastAPI: { document_id, schema_name }
+    FastAPI->>Retr: top-k(document chunks)
+    Retr-->>FastAPI: chunks
+    FastAPI->>Guard: redact_pii(chunks)
+    Guard-->>FastAPI: redacted context
+    FastAPI->>LLM: schema-constrained prompt
+    LLM-->>FastAPI: { fields[name]: { value, confidence, source_chunk_id } }
+    FastAPI->>FastAPI: validate schema + citations
+    FastAPI->>Guard: low_confidence_fields(...)
+    Guard-->>FastAPI: requires_review, low_fields
+    FastAPI->>DB: INSERT extractions
+    FastAPI->>Wflow: route_extraction(...)
+    Wflow->>DB: INSERT workflow_items (idempotent)
+    Wflow->>Audit: append("workflow.routed", actor=system)
+    Audit->>DB: INSERT audit_events (append-only)
+    FastAPI-->>Client: ExtractResponse incl. requires_review, low_confidence_fields
+```
+
+### Human review
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant Reviewer
+    participant UI as React /review
+    participant API as POST /review/{id}/approve|reject
+    participant Wflow as workflow
+    participant Audit as audit
+    participant DB as Postgres
+
+    Reviewer->>UI: open queue
+    UI->>API: GET /review
+    API->>DB: SELECT workflow_items WHERE status=needs_review
+    DB-->>API: items
+    API-->>UI: paginated list
+    Reviewer->>UI: click Approve / Reject
+    UI->>API: POST { actor, note }
+    API->>Wflow: set_status(...)
+    Wflow->>DB: UPDATE workflow_items.status (one row, transactional)
+    Wflow->>Audit: append("review.approved" | "review.rejected", actor=human)
+    Audit->>DB: INSERT audit_events
+    DB-->>API: new state
+    API-->>UI: { id, status, audit_event_id }
+```
+
+---
+
+## Data model
+
+```mermaid
+erDiagram
+    documents ||--o{ chunks : has
+    documents ||--o{ extractions : produces
+    extractions ||--|| workflow_items : routes_to
+    workflow_items ||--o{ audit_events : trail
+
+    documents {
+        bigint id PK
+        text title
+        text source
+        text hash UK "SHA-256 of original text; idempotency key"
+        timestamptz created_at
+    }
+    chunks {
+        bigint id PK
+        bigint document_id FK
+        int ordinal
+        text text "PII-redacted at ingest"
+        vector embedding "pgvector(1536)"
+    }
+    extractions {
+        bigint id PK
+        bigint document_id FK
+        text schema_name
+        jsonb payload "fields[name] = {value, confidence, source_chunk_id}"
+        boolean requires_review
+        jsonb low_confidence_fields
+        timestamptz created_at
+    }
+    workflow_items {
+        bigint id PK
+        bigint extraction_id FK
+        text status "auto_approved | needs_review | rejected"
+        text reason
+        text idempotency_key UK
+        timestamptz created_at
+        timestamptz updated_at
+    }
+    audit_events {
+        bigint id PK
+        bigint workflow_item_id FK
+        text actor "system or a human user id"
+        text action
+        jsonb payload
+        timestamptz created_at "INSERT-only"
+    }
+```
+
+The `idempotency_key` on `workflow_items` is what enforces "routing or
+re-routing the same extraction never creates a second row." `audit_events`
+intentionally has no unique constraint beyond its primary key: order of arrival
+is the only constraint, and it is preserved by `created_at` plus the row id.
+
+---
+
+## Deployment shape (M10)
+
+```mermaid
+flowchart TB
+    subgraph aws[AWS us-east-1]
+      subgraph vpc[VPC 10.0.0.0/16]
+        igw[Internet Gateway]
+
+        subgraph pub[Public subnets 10.0.0.0/24, 10.0.1.0/24]
+          alb[ALB :80<br/>frontend default; /health → backend]
+          subgraph fe[ECS Fargate]
+            nginx_task[frontend task<br/>nginx :8080]
+          end
+          subgraph be[ECS Fargate]
+            api_task[backend task<br/>FastAPI :8000]
+          end
+          rds[(RDS Postgres 16 db.t4g.micro<br/>publicly_accessible=false)]
+        end
+
+        sd[Service discovery<br/>backend.sentinel.local]
+      end
+
+      ecr_be[(ECR sentinel-backend)]
+      ecr_fe[(ECR sentinel-frontend)]
+      ssm[(SSM SecureString<br/>/sentinel/anthropic_api_key<br/>/sentinel/openai_api_key<br/>/sentinel/database_url)]
+      cwlogs[CloudWatch Logs<br/>retention 7d]
+
+      gha[GitHub Actions OIDC role<br/>scoped: ECR push + ECS update-service]
+    end
+
+    internet([Internet])
+    user2([Reviewer / browser])
+    cd([CD: workflow_dispatch])
+
+    internet --> igw --> alb
+    user2 -.HTTPS in real deploys.-> alb
+    alb -- default --> nginx_task
+    alb -- /health --> api_task
+    nginx_task -- /api/* (rewrite ^/api) --> sd --> api_task
+    api_task --> rds
+    api_task --> ssm
+    nginx_task --> cwlogs
+    api_task --> cwlogs
+
+    cd --> gha
+    gha --> ecr_be
+    gha --> ecr_fe
+    gha --> api_task
+    gha --> nginx_task
+```
+
+### Security invariants encoded in security groups
+
+```
+internet      ──→ alb_sg          (80, 443)
+alb_sg        ──→ frontend_sg     (8080)        ALB → nginx
+alb_sg        ──→ backend_sg      (8000)        ALB → FastAPI /health
+frontend_sg   ──→ backend_sg      (8000)        nginx /api proxy → FastAPI
+backend_sg    ──→ rds_sg          (5432)        FastAPI → Postgres
+```
+
+**RDS is not publicly accessible.** `aws_db_instance.publicly_accessible =
+false` and the `rds` security group ingress is keyed only to the backend SG.
+Even though RDS lives in the same public subnets as the tasks (no private
+subnets in the no-NAT design), the SG bars internet reach.
+
+### Cost posture (deliberate, demo-only)
+
+| Resource              | Approx idle cost | Notes                                       |
+| --------------------- | ---------------: | ------------------------------------------- |
+| ALB                   |          ~$16/mo | Largest line item; always on.               |
+| 2× Fargate (0.25 vCPU)|          ~$15/mo | 24/7. Stop the services to stop the bill.   |
+| RDS db.t4g.micro 20 GB|          ~$13/mo | Single-AZ. ~$2/mo storage + ~$11/mo compute.|
+| ECR storage           |           <$1/mo | 20-image cap on each repo.                  |
+| CloudWatch Logs       |           <$1/mo | 7-day retention, demo log volume is tiny.   |
+| **Total idle floor**  |     **~$45/mo**  | Plus per-second Fargate + traffic charges.  |
+
+The estimate excludes a NAT Gateway (~$32/mo idle) by design: ECS tasks live
+in public subnets with `assign_public_ip = true` so they can reach ECR,
+Anthropic, OpenAI, and CloudWatch without one. This is acceptable **only**
+because the security groups are tight (above) and the deployment is
+ephemeral. Run `terraform destroy` immediately after demo screenshots — the
+operator recipe lives in `infra/README.md`.
+
+### CD posture
+
+`.github/workflows/cd.yml` is `workflow_dispatch`-only. There is no `push:`
+or `pull_request:` trigger. The trigger gate is the cost-control mechanism
+for M10; deploys never happen on a code push by accident. The CD job assumes
+the OIDC role written by `infra/modules/ci_oidc/`, builds and pushes the
+images to ECR, and force-redeploys the ECS services.
+
+---
+
+## Cross-references
+
+| Concern | Where to look |
+| --- | --- |
+| Workflow state machine | `docs/workflow.md`, `backend/app/workflow.py` |
+| Audit invariants | `docs/audit-and-review.md`, `backend/app/audit.py` |
+| Guardrails (PII + confidence) | `docs/guardrails.md`, `backend/app/guardrails.py` |
+| Eval methodology | `docs/evaluation.md`, `eval/` |
+| Infra (cost, security, recipe) | `infra/README.md`, `infra/modules/*` |
+| ADRs | [`docs/adr/`](adr/) |
+| Demo runbook | `docs/demo.md` |
diff --git a/docs/architecture.mmd b/docs/architecture.mmd
new file mode 100644
index 0000000..ba2d525
--- /dev/null
+++ b/docs/architecture.mmd
@@ -0,0 +1,76 @@
+---
+title: Sentinel — high-level architecture
+---
+flowchart LR
+    classDef ext fill:#eef,stroke:#557,color:#114
+    classDef api fill:#efe,stroke:#575,color:#141
+    classDef db  fill:#fee,stroke:#755,color:#411
+    classDef ui  fill:#ffe,stroke:#775,color:#441
+    classDef gov fill:#fef,stroke:#757,color:#414
+
+    user([User browser]):::ui
+    cli([CLI / curl]):::ui
+
+    subgraph Frontend["Frontend (Vite + React + TS)"]
+      spa[SPA: Query / Review / Dashboard]:::ui
+      nginx[nginx reverse proxy]:::ui
+    end
+
+    subgraph Backend["Backend (FastAPI, Python 3.12)"]
+      query[/POST /query<br/>RAG/]:::api
+      extract[/POST /extract<br/>structured extraction/]:::api
+      review[/GET/POST /review<br/>HITL queue/]:::api
+      dash[/GET /dashboard/*<br/>KPIs/]:::api
+      health[/GET /health/]:::api
+
+      subgraph Pipeline
+        retrieval[retrieval<br/>pgvector top-k]
+        rag[rag<br/>citation-or-refuse]
+        extr[extract<br/>schema-constrained]
+        wflow[workflow<br/>deterministic FSM]
+      end
+
+      subgraph Governance
+        guard[guardrails<br/>PII redact + conf gate]:::gov
+        audit[audit<br/>append-only events]:::gov
+      end
+    end
+
+    subgraph Data["Postgres 16 + pgvector"]
+      docs[(documents)]:::db
+      chunks[(chunks + embeddings)]:::db
+      extrs[(extractions)]:::db
+      wfit[(workflow_items)]:::db
+      audtbl[(audit_events)]:::db
+    end
+
+    subgraph Providers["External providers"]
+      claude[Anthropic Claude]:::ext
+      openai[OpenAI embeddings]:::ext
+    end
+
+    user --> spa
+    spa --> nginx
+    nginx -- /api/* --> query
+    nginx -- /api/* --> extract
+    nginx -- /api/* --> review
+    nginx -- /api/* --> dash
+    cli --> health
+
+    query --> retrieval --> chunks
+    query --> rag --> claude
+    rag --> guard
+    extract --> retrieval
+    extract --> extr --> claude
+    extract --> guard
+    extract --> wflow --> wfit
+    extract --> extrs
+    review --> wflow
+    review --> audit --> audtbl
+    wflow --> audit
+
+    retrieval --> openai
+
+    dash --> docs
+    dash --> wfit
+    dash --> extrs
diff --git a/docs/architecture.png b/docs/architecture.png
new file mode 100644
index 0000000..31d3ff9
Binary files /dev/null and b/docs/architecture.png differ
diff --git a/docs/demo.md b/docs/demo.md
index 92540e4..ed4a696 100644
--- a/docs/demo.md
+++ b/docs/demo.md
@@ -1,4 +1,351 @@
 # Demo script
 
-> The 7-step demo script lands in Milestone M11. Placeholder for now so the
-> docs structure exists from M0 onward.
+A 7-step runbook that takes a reviewer from `git clone` to an end-to-end
+demonstration of every Sentinel capability — citation-grounded RAG, refusal,
+schema-constrained extraction, the human-in-the-loop review queue, and the
+dashboard — in roughly fifteen minutes on a developer laptop. An optional
+final section repeats the demo on AWS using the M10 Terraform stack.
+
+> All sample data is synthetic. The `data/sample/` corpus is generated by
+> `scripts/gen_synthetic_corpus.py` with a fixed seed (deterministic and
+> reproducible); see `data/sample/README.md`.
+
+## Prerequisites
+
+| Tool | Version | Why |
+| --- | --- | --- |
+| Docker + Docker Compose | recent | Postgres 16 + pgvector |
+| `uv` | 0.4+ | Python toolchain (installs the 3.12 venv on first run) |
+| Node | 20 LTS | Vite dev server for the frontend |
+| Anthropic API key | claude-sonnet-4-6 access | for `/query` and `/extract` |
+| OpenAI API key | text-embedding-3-small access | for embeddings at ingest time |
+
+Without API keys you can still run the test suite (it uses the deterministic
+fake LLM and embedder), but `/query` and `/extract` against the synthetic
+corpus need real keys.
+
+---
+
+## Step 1 — Clone and start the stack
+
+```bash
+git clone https://github.com/div0rce/sentinel.git
+cd sentinel
+
+# Copy the env template and fill in your keys.
+cp .env.example .env
+$EDITOR .env   # set ANTHROPIC_API_KEY and OPENAI_API_KEY
+
+# Bring up Postgres 16 + pgvector locally.
+docker compose up -d db
+
+# Install Python deps and run the dev API in one command.
+make dev   # uv sync + uvicorn backend.app.main:app --reload, port 8000
+```
+
+In a second terminal, start the frontend:
+
+```bash
+cd frontend
+npm ci
+npm run dev    # Vite dev server on :5173, proxies /query|/extract|/review|/dashboard|/health to :8000
+```
+
+Open <http://localhost:5173> in a browser. The SPA loads on the Query route.
+
+> `screenshot: docs/screenshots/01-query-empty.png` — empty Query page after first load.
+
+## Step 2 — Migrate and seed the synthetic corpus
+
+Apply migrations and ingest the committed sample documents:
+
+```bash
+make migrate    # alembic upgrade head — creates tables + enables the vector extension
+make seed       # python -m backend.app.ingest --path data/sample
+```
+
+The ingest pipeline is idempotent: re-running `make seed` is a no-op because
+the `documents.hash` check short-circuits identical content. The PII redaction
+pass runs **before** the chunk store, so the database never sees raw email
+addresses or phone numbers.
+
+Sanity-check the database (should report ~15 documents and a few hundred
+chunks, depending on the latest synthetic corpus size):
+
+```bash
+psql postgres://sentinel:sentinel@localhost:5432/sentinel \
+  -c "select count(*) as documents from documents;" \
+  -c "select count(*) as chunks from chunks;"
+```
+
+## Step 3 — Query the corpus, get a cited answer
+
+The flagship capability: ask a natural-language question, receive a
+source-cited answer.
+
+In the SPA, paste this question on the Query page and click **Ask**:
+
+> What is the total amount due on the Initech Components invoice issued on 2026-01-22?
+
+Or hit the API directly:
+
+```bash
+curl -s http://localhost:8000/query \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"What is the total amount due on the Initech Components invoice issued on 2026-01-22?"}' \
+  | jq
+```
+
+Expected response shape (the actual answer text is generated by Claude and may
+phrase the answer differently, but the citation invariants are deterministic):
+
+```json
+{
+  "status": "answered",
+  "answer": "The total amount due on the Initech Components invoice issued on 2026-01-22 is $90,006.92 [chunk:42].",
+  "citations": [{"chunk_id": 42, "document_id": 1, "score": 0.83, "text": "..."}],
+  "reason": null
+}
+```
+
+Two invariants you can verify by inspection:
+
+- **Citation-or-refuse.** Every claim is annotated `[chunk:N]`. The backend
+  parses those markers and refuses the response if any cited id was not in
+  the retrieved set (`reason: "invalid_citation"`).
+- **PII redaction is pre-LLM.** The prompt sent to Claude contains
+  `[REDACTED:EMAIL]` etc. in place of any matched PII; you can verify by
+  setting `SENTINEL_LOG_FORMAT=console` and tailing the API logs.
+
+> `screenshot: docs/screenshots/02-query-cited.png` — Query page rendering an
+> answered response with the citation chip showing source chunk text.
+
+## Step 4 — Query the corpus, observe a refusal
+
+Ask a question with no support in the synthetic corpus:
+
+> When did the first humans land on the moon?
+
+```bash
+curl -s http://localhost:8000/query \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"When did the first humans land on the moon?"}' | jq
+```
+
+Expected response:
+
+```json
+{
+  "status": "refused",
+  "answer": "",
+  "citations": [],
+  "reason": "no_support"
+}
+```
+
+The retrieval top score is below `RAG_SIMILARITY_THRESHOLD`, so the system
+refuses *before* calling the LLM: no answer is hallucinated and no LLM tokens
+are spent. This is the citation-or-refuse policy doing its job.
+
+> `screenshot: docs/screenshots/03-query-refusal.png` — refusal banner on the
+> Query page with `reason: no_support`.
+
+## Step 5 — Extract a structured record from a document
+
+Pick the document id of an ingested invoice (a fresh `make seed` typically
+makes the first invoice id `1`). Then call `/extract` with the registered
+`invoice` schema:
+
+```bash
+curl -s http://localhost:8000/extract \
+  -H 'Content-Type: application/json' \
+  -d '{"document_id":1, "schema_name":"invoice"}' | jq
+```
+
+Expected shape:
+
+```json
+{
+  "status": "ok",
+  "document_id": 1,
+  "schema_name": "invoice",
+  "extraction_id": 1,
+  "payload": {
+    "vendor": "Initech Components",
+    "invoice_number": "INV-2026000",
+    "issue_date": "2026-01-22",
+    "total_due": 90006.92
+  },
+  "field_confidence": {
+    "vendor": 0.97,
+    "invoice_number": 0.99,
+    "issue_date": 0.94,
+    "total_due": 0.71
+  },
+  "field_citations": {
+    "vendor": [10],
+    "invoice_number": [10],
+    "issue_date": [10],
+    "total_due": [12]
+  },
+  "requires_review": true,
+  "low_confidence_fields": ["total_due"],
+  "reason": null
+}
+```
+
+Three things to point out to a reviewer:
+
+- **Per-field provenance.** `field_citations` maps each extracted field to
+  the chunk id that supports it. The backend validates that each cited id
+  was in the retrieval set (the same citation-validity rule as `/query`).
+- **Per-field confidence.** The LLM emits a self-reported confidence per
+  field; we use it as a routing signal (M5 / M6) but do not interpret it as
+  calibrated probability — see `docs/evaluation.md`.
+- **Routing.** `requires_review` is `true` because at least one field's
+  confidence is below `CONFIDENCE_REVIEW_THRESHOLD` (default `0.75`). The
+  `/extract` handler has already routed the successful extraction through the
+  workflow engine, which inserted a `workflow_items` row in
+  `needs_review` and wrote one `audit_events` row tagged
+  `actor=system action=workflow.routed`.
+
+> `screenshot: docs/screenshots/04-extract-result.png` — JSON viewer in the
+> SPA Query/Extract panel showing the structured record with the
+> low-confidence field highlighted.
+
+## Step 6 — Approve in the human-in-the-loop review queue
+
+In the SPA, navigate to the **Review** tab. The queue lists every workflow
+item in `needs_review` state. The extraction from Step 5 should be at the top.
+
+Enter your name in the actor field (any string; it's recorded verbatim as the
+audit event's `actor`), optionally add a note, and click **Approve**.
+
+Behind the scenes:
+
+```
+POST /review/{id}/approve  body: {"actor": "Reviewer", "note": "verified against invoice PDF"}
+```
+
+The backend transitions the workflow item from `needs_review` to
+`auto_approved` (the terminal state) **and** writes one new `audit_events`
+row tagged `actor=Reviewer action=review.approved` — both in the same
+transaction.
+
+Confirm the audit trail:
+
+```bash
+psql postgres://sentinel:sentinel@localhost:5432/sentinel <<'SQL'
+WITH target_item AS (
+  SELECT wi.id
+  FROM workflow_items wi
+  JOIN extractions e ON e.id = wi.extraction_id
+  WHERE e.schema_name = 'invoice'
+  ORDER BY wi.updated_at DESC, wi.id DESC
+  LIMIT 1
+)
+SELECT ae.id, ae.actor, ae.action, ae.before, ae.after, ae.request_id, ae.ts
+FROM audit_events ae
+JOIN target_item ti
+  ON ae.target_type = 'workflow_item'
+ AND ae.target_id = ti.id
+ORDER BY ae.id;
+SQL
+```
+
+The CTE finds the most recently updated invoice workflow item, so the query
+does not depend on a particular local id sequence.
+
+You should see two rows: one `system / workflow.routed` (from Step 5) and
+one `Reviewer / review.approved` (from this step). Replaying these events
+in order reproduces the workflow item's current state — the property tested
+in `backend/tests/test_audit_events_append_only.py`.
+
+> `screenshot: docs/screenshots/05-review-queue.png` — Review queue with one
+> item highlighted, approve/reject buttons visible, actor field filled in.
+
+## Step 7 — Dashboard
+
+Click the **Dashboard** tab. The lazy-loaded route renders four KPIs:
+
+- **Volume** — daily ingestion counts over the last 30 days (synthetic
+  corpus → spike on the day of `make seed`).
+- **Categories** — extraction counts grouped by schema name (`invoice`,
+  potentially others as you call `/extract` more).
+- **Confidence histogram** — distribution of per-field confidence across
+  all extractions, bucketed.
+- **SLA** — count of `needs_review` items older than the configured
+  threshold (default 24 h). Useful to demo what happens when reviewers fall
+  behind.
+
+Each panel is a Recharts component fed by a single typed API call from
+`frontend/src/api.ts`; the underlying endpoints are read-only and live in
+`backend/app/routers/dashboard.py`.
+
+> `screenshot: docs/screenshots/06-dashboard.png` — Dashboard rendering with
+> the four panels populated. Capture this *after* Step 5 and Step 6 so the
+> categories panel and the SLA panel both have data.
+
+## Teardown (local)
+
+```bash
+docker compose down -v   # removes the Postgres volume so a fresh demo starts clean
+```
+
+---
+
+## Optional — repeat the demo on AWS
+
+The M10 Terraform stack provisions an ephemeral demo deployment in `us-east-1`.
+The full operator runbook (apply / write secrets / deploy / destroy) lives in
+[`infra/README.md`](../infra/README.md). Short version:
+
+```bash
+cd infra
+export TF_VAR_db_password="$(openssl rand -base64 24)"
+export TF_VAR_github_repository="OWNER/sentinel"   # for the OIDC role; optional
+terraform fmt -recursive -check
+terraform init
+terraform validate
+terraform plan -out=plan.tfplan
+terraform apply plan.tfplan
+
+# Write the API keys out-of-band (not in tfstate)
+aws ssm put-parameter --name /sentinel/anthropic_api_key \
+  --type SecureString --value "$ANTHROPIC_API_KEY" --overwrite
+aws ssm put-parameter --name /sentinel/openai_api_key \
+  --type SecureString --value "$OPENAI_API_KEY" --overwrite
+
+# Force the backend to pick up the new secret values
+aws ecs update-service --cluster sentinel-cluster \
+  --service sentinel-backend --force-new-deployment --no-cli-pager
+
+# The ALB DNS name is the demo URL.
+terraform output alb_dns_name
+```
+
+Migrations against RDS run as a one-off Fargate task; the recipe is in
+`infra/README.md`. After capturing screenshots, tear the stack down
+**immediately**:
+
+```bash
+terraform destroy
+```
+
+The cost posture (an idle floor of roughly $45/month, dominated by the ALB,
+Fargate, and RDS) is documented in `infra/README.md`. At that rate, every
+24 hours the stack stays up costs about $1.50, and a forgotten month costs
+the full ~$45.
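+
+The arithmetic behind those figures is simple proration of the idle floor.
+The sketch below assumes a flat 30-day month and no usage-based charges on
+top of the floor; `idle_cost_usd` is a hypothetical helper, not part of the
+repository.
+
+```python
+IDLE_FLOOR_USD_PER_MONTH = 45.0  # idle floor quoted above
+HOURS_PER_MONTH = 30 * 24        # assumption: flat 30-day month
+
+
+def idle_cost_usd(hours_running: float) -> float:
+    """Prorate the monthly idle floor over the hours the stack stayed up."""
+    return round(IDLE_FLOOR_USD_PER_MONTH * hours_running / HOURS_PER_MONTH, 2)
+
+
+one_day = idle_cost_usd(24)
+one_month = idle_cost_usd(HOURS_PER_MONTH)
+```
+
+Here `one_day` works out to 1.5 and `one_month` to 45.0, so the daily and
+monthly numbers are the same rate at two durations.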
+
+---
+
+## Cross-references
+
+- **Architecture diagram + write-up:** [`architecture.md`](architecture.md).
+- **Guardrail specifics (PII, confidence gating):** [`guardrails.md`](guardrails.md).
+- **Workflow state machine:** [`workflow.md`](workflow.md).
+- **Audit invariants & replay:** [`audit-and-review.md`](audit-and-review.md).
+- **Evaluation methodology:** [`evaluation.md`](evaluation.md).
+- **Infra recipe & cost:** [`../infra/README.md`](../infra/README.md).
+
+> All screenshots referenced above are placeholders. Capture them on a real
+> demo run; commit to `docs/screenshots/` (gitignored by default — flip the
+> rule when capturing for a portfolio review).