Merged
52 changes: 52 additions & 0 deletions .dockerignore
@@ -0,0 +1,52 @@
# Build context for the backend image is the repo root (so the Dockerfile can
# COPY pyproject.toml uv.lock alembic.ini). Trim everything that doesn't ship
# in the backend image so the context stays small and free of secrets.

# Repo metadata / VCS
.git/
.github/
.gitignore
.editorconfig
.dockerignore
**/.DS_Store

# Test trees
backend/tests/
backend/.pytest_cache/
backend/__pycache__/
**/__pycache__/
**/*.pyc
**/*.pyo

# Local Python state
.venv/
.mypy_cache/
.ruff_cache/
.pytest_cache/

# Local secrets / env
.env
.env.*

# Frontend tree (frontend image has its own context)
frontend/

# Eval, scripts, infra, docs — none of these ship in the backend image
eval/
scripts/
infra/
docs/

# IDE / agent state
.kiro/
.claude/
.agents/

# Local data
data/

# Misc
**/*.log
**/*.md
node_modules/
dist/
98 changes: 98 additions & 0 deletions .github/workflows/cd.yml
@@ -0,0 +1,98 @@
name: CD
on:
  workflow_dispatch:
    inputs:
      services:
        description: "Which services to deploy. backend|frontend|both"
        required: true
        default: both
        type: choice
        options:
          - both
          - backend
          - frontend

permissions:
  id-token: write # OIDC
  contents: read

env:
  AWS_REGION: us-east-1
  PROJECT_NAME: sentinel
  IMAGE_TAG: ${{ github.sha }}

jobs:
  deploy:
    runs-on: ubuntu-latest
    # M10 invariant: deploys are manual-dispatch only. Never push:, never pull_request:.
    # The trigger above is the cost-control gate; do not add other triggers.
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
          role-session-name: github-actions-${{ github.run_id }}

      - name: Login to Amazon ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2

      - name: Resolve repo URIs
        id: ecr-uri
        run: |
          set -eu
          backend_uri="${{ steps.ecr-login.outputs.registry }}/${{ env.PROJECT_NAME }}-backend"
          frontend_uri="${{ steps.ecr-login.outputs.registry }}/${{ env.PROJECT_NAME }}-frontend"
          echo "backend_uri=${backend_uri}" >> "$GITHUB_OUTPUT"
          echo "frontend_uri=${frontend_uri}" >> "$GITHUB_OUTPUT"

      - name: Build & push backend image
        if: ${{ inputs.services == 'backend' || inputs.services == 'both' }}
        run: |
          set -eu
          docker build \
            --platform linux/amd64 \
            -t "${{ steps.ecr-uri.outputs.backend_uri }}:${{ env.IMAGE_TAG }}" \
            -t "${{ steps.ecr-uri.outputs.backend_uri }}:latest" \
            -f backend/Dockerfile .
          docker push "${{ steps.ecr-uri.outputs.backend_uri }}:${{ env.IMAGE_TAG }}"
          docker push "${{ steps.ecr-uri.outputs.backend_uri }}:latest"

      - name: Build & push frontend image
        if: ${{ inputs.services == 'frontend' || inputs.services == 'both' }}
        run: |
          set -eu
          docker build \
            --platform linux/amd64 \
            -t "${{ steps.ecr-uri.outputs.frontend_uri }}:${{ env.IMAGE_TAG }}" \
            -t "${{ steps.ecr-uri.outputs.frontend_uri }}:latest" \
            ./frontend
          docker push "${{ steps.ecr-uri.outputs.frontend_uri }}:${{ env.IMAGE_TAG }}"
          docker push "${{ steps.ecr-uri.outputs.frontend_uri }}:latest"

      - name: Force ECS redeploy (backend)
        if: ${{ inputs.services == 'backend' || inputs.services == 'both' }}
        run: |
          aws ecs update-service \
            --cluster "${{ env.PROJECT_NAME }}-cluster" \
            --service "${{ env.PROJECT_NAME }}-backend" \
            --force-new-deployment \
            --no-cli-pager

      - name: Force ECS redeploy (frontend)
        if: ${{ inputs.services == 'frontend' || inputs.services == 'both' }}
        run: |
          aws ecs update-service \
            --cluster "${{ env.PROJECT_NAME }}-cluster" \
            --service "${{ env.PROJECT_NAME }}-frontend" \
            --force-new-deployment \
            --no-cli-pager

      - name: Summarise deployment
        run: |
          echo "Deployed image tag: ${{ env.IMAGE_TAG }}" >> "$GITHUB_STEP_SUMMARY"
          echo "Services: ${{ inputs.services }}" >> "$GITHUB_STEP_SUMMARY"
          echo "Region: ${{ env.AWS_REGION }}" >> "$GITHUB_STEP_SUMMARY"
15 changes: 15 additions & 0 deletions .github/workflows/ci.yml
@@ -53,3 +53,18 @@ jobs:
       - run: npm run lint
       - run: npm test
       - run: npm run build
+
+  terraform:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: infra
+    steps:
+      - uses: actions/checkout@v4
+      - uses: hashicorp/setup-terraform@v3
+        with:
+          terraform_version: 1.9.8
+      # No AWS credentials needed for fmt + validate.
+      - run: terraform fmt -recursive -check
+      - run: terraform init -backend=false
+      - run: terraform validate
38 changes: 20 additions & 18 deletions PROGRESS.md
@@ -8,27 +8,29 @@
 
 ## Current state
 
-- **Active milestone:** M9: Evaluation harness (résumé metrics)
-- **Status:** complete on branch (started 2026-05-29, completed 2026-05-29); awaiting CI green and human squash-merge
-- **Active branch:** `feat/m09-eval` (PR open — see Milestone status)
-- **Last completed milestone:** M8: Frontend (PR #9, merged 2026-05-29) + perf follow-up (PR #11, merged 2026-05-29)
-- **`make check` passing:** yes locally on a freshly migrated DB (187 backend tests + 7 frontend tests; tsc + vite build clean)
-- **Last action:** committed the M9 work in 3 small Conventional Commits (PROGRESS housekeeping; eval package + labels + RESULTS.md PENDING; tests + docs/evaluation.md + Settings model bump). Verified that `make eval` under fake providers prints `n/a` and refuses to publish numbers; 9 asserted-fixture tests prove the scorer + writer end-to-end.
-- **Next action:** human squash-merges the M9 PR. After merge, wire `ANTHROPIC_API_KEY` and `OPENAI_API_KEY`, run `make eval`, and overwrite `eval/RESULTS.md` with real numbers in the immediate follow-up commit. Then `/start-milestone 10` for containerization + Terraform + CD.
+- **Active milestone:** M10: Containerization + Terraform (AWS) + CD
+- **Status:** complete on branch (started 2026-05-29, completed 2026-05-29); awaiting CI green and human squash-merge. Per the locked constraints, **no `terraform apply` was run** — the PR ships infra-as-code only. Demo deployment + screenshots remain a manual operator action documented in `infra/README.md`.
+- **Active branch:** `feat/m10-deploy` (PR open — see Milestone status)
+- **Last completed milestone:** M9: Evaluation harness (PR #12, merged 2026-05-29)
+- **`make check` passing:** baseline green from M9; M10 adds 8 request-id-middleware tests for a backend total of 195. Frontend tests unchanged (7).
+- **Last action:** committed M10 in 5 small Conventional Commits (housekeeping; backend structlog + request-id middleware + production Dockerfile + tests; frontend production Dockerfile + nginx.conf.template; Terraform stack with five modules; CD workflow + .dockerignore relocation + CI terraform job).
+- **Next action:** human squash-merges the M10 PR. After merge, follow `infra/README.md` to apply the stack, set the GitHub `AWS_ROLE_ARN` secret from the OIDC role output, write the API keys via `aws ssm put-parameter`, dispatch the CD workflow, capture demo screenshots, and run `terraform destroy` immediately. Then `/start-milestone 11` for docs + diagram + demo.
 - **Blockers:** none.
 
-### M9 DoD verification
+### M10 DoD verification
 
-- [x] **`make eval` runs end-to-end and writes `eval/RESULTS.md` with metrics, k, dataset size, and method.** The CLI in `eval/run.py` prints a one-line summary per metric and writes `eval/RESULTS.md`. Under fake providers (verified locally) every metric prints `n/a (...)` and the file is left as the methodology-only PENDING document — no numbers ship in the tree until a real run.
-- [x] **Methodology is documented well enough to defend verbally in an interview.** `docs/evaluation.md` (224 lines) covers dataset shape, provider pinning, every metric definition (extraction normalization rules, precision@k denominator footnote, lite-faithfulness scope, refusal-rate non-interpretation), the n/a gate, the reproduction recipe, and explicit limits (small dataset, synthetic corpus caveat, no calibration claim, citation-validity vs. true faithfulness).
-- [ ] **Numbers are real (from this run). Record them in `PROGRESS.md` "Decision log" too.** *Pending* — no API keys wired in this session. The harness contract + asserted-fixture pytest is what merges; real numbers land in the immediate follow-up commit once keys are configured.
+- [ ] **`terraform plan` is clean; `apply` provisions the stack.** *Pending* — locally we have no `terraform` binary and the user has explicitly forbidden any `terraform plan`/`apply` or AWS API calls in this session. The infra is wired so a `terraform fmt -check` + `terraform validate` job runs in CI on every PR (no AWS creds needed); plan/apply remains a manual operator step. Confirming this DoD item requires the operator to run `terraform plan` against an AWS account, which is the M11 demo workflow.
+- [x] **CD workflow builds and deploys on manual dispatch.** `.github/workflows/cd.yml` is `workflow_dispatch`-only (no `push:`/`pull_request:` triggers — the M10 cost-control invariant), uses `aws-actions/configure-aws-credentials@v4` against an OIDC role written by `infra/modules/ci_oidc/`, builds backend and frontend images, pushes them to ECR tagged with the git SHA (and `latest`), and force-redeploys the ECS services.
+- [x] **App is reachable at a URL** — *infra-as-code complete*. The ALB DNS (`output "alb_dns_name"`) is the URL once `terraform apply` succeeds. Capturing screenshots is the M11 demo task; the operator runs `terraform destroy` immediately after.
 
-### M9 design lock-ins (per pre-flight review, all delivered)
+### M10 design lock-ins
 
-- **Metric set.** Extraction: normalized exact-match (trim + casefold strings, ISO date canonicalisation, 0.01 numeric tolerance), micro + macro accuracy, per-field precision/recall (column reported regardless so optional-field schemas later get the right reading without a code change). Retrieval: precision@k (headline) + recall@k + MRR with the precision-cap footnote. RAG: citation-validity rate + answer-cites-relevant rate + answer-substring rate; refusals counted but not interpreted as quality.
-- **Honesty discipline.** Under `EMBEDDINGS_PROVIDER=fake` retrieval and RAG go to `n/a`; under `LLM_PROVIDER=fake` extraction and RAG go to `n/a`. Counts are still emitted because they describe the dataset, not the system. Asserted-fixture pytest tests prove the scorer + writer; nothing in the test path produces a number that could be misread as a quality claim.
-- **What ships.** Harness + 5+6+5 hand-authored synthetic labels + asserted pytest fixtures + methodology-only PENDING `eval/RESULTS.md`. No fabricated numbers in the tree. Real numbers fill the file in the immediate follow-up.
-- **Provider pair.** `claude-sonnet-4-6` (verified against Anthropic docs 2026-05-29 — dateless 4.6-generation IDs are pinned snapshots, not evergreen pointers); `text-embedding-3-small` (1536-dim, schema-canonical); temperature 0.
+- **Code only.** No `terraform apply`. No AWS resource creation. No incurred costs in this PR.
+- **Cost posture.** Public-subnet + no-NAT-Gateway, single-AZ, Fargate `0.25 vCPU / 0.5 GB`, RDS `db.t4g.micro`. NAT Gateway idle cost (~$32/month) avoided. RDS **not publicly accessible** (security-group ingress keyed only to the backend task SG). Idle floor estimate ~$45/month, dominated by ALB + Fargate + RDS.
+- **CD trigger.** `workflow_dispatch` only. The trigger gate is the M10 cost-control mechanism.
+- **Region.** `us-east-1`. Pinned via `var.region` default.
+- **Secrets.** Runtime secrets in SSM Parameter Store (SecureString); written out-of-band so values stay out of Terraform state. CI identity via GitHub OIDC, not long-lived access keys.
+- **Demo-only.** `infra/README.md` documents the teardown recipe (`terraform destroy` immediately after demo screenshots) and every cost/security tradeoff (single-AZ, no Multi-AZ, no auto-scaling, no remote state, plain HTTP on the ALB).
 
 ---
 
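The cost-posture figures in the M10 lock-ins (~$32/month of NAT Gateway avoided, ~$45/month idle floor) can be sanity-checked with hourly-rate arithmetic. This is a minimal sketch, not the method used to produce those figures; it assumes us-east-1 on-demand list prices as I understand them (NAT Gateway $0.045/h, ALB $0.0225/h, Fargate $0.04048 per vCPU-hour and $0.004445 per GB-hour, RDS `db.t4g.micro` $0.016/h), a 730-hour month, and one running task each for backend and frontend. All of these are assumptions to re-check against current AWS pricing, and the sketch ignores ALB LCUs, public IPv4 charges, RDS storage, CloudWatch, and data transfer, so it is a floor rather than a bill.

```python
"""Rough idle-cost arithmetic for the demo stack (sketch with ASSUMED prices)."""

HOURS_PER_MONTH = 730  # AWS pricing convention: 24 * 365 / 12

# ASSUMED us-east-1 on-demand list prices in USD/hour; verify before relying on them.
NAT_GATEWAY_HOURLY = 0.045
ALB_HOURLY = 0.0225  # excludes LCU charges
FARGATE_VCPU_HOURLY = 0.04048  # Linux/x86, per vCPU-hour
FARGATE_GB_HOURLY = 0.004445  # per GB-hour of task memory
RDS_T4G_MICRO_HOURLY = 0.016  # Single-AZ instance only, excludes storage


def monthly(hourly: float) -> float:
    """Convert an hourly rate into a 730-hour monthly cost."""
    return hourly * HOURS_PER_MONTH


def fargate_task_hourly(vcpu: float, memory_gb: float) -> float:
    """Fargate bills vCPU and memory separately, then sums them."""
    return vcpu * FARGATE_VCPU_HOURLY + memory_gb * FARGATE_GB_HOURLY


def idle_floor(task_count: int = 2) -> dict[str, float]:
    """Idle monthly cost per component for the 0.25 vCPU / 0.5 GB task shape."""
    return {
        "alb": monthly(ALB_HOURLY),
        "fargate": task_count * monthly(fargate_task_hourly(0.25, 0.5)),
        "rds": monthly(RDS_T4G_MICRO_HOURLY),
    }


if __name__ == "__main__":
    print(f"NAT Gateway avoided: ${monthly(NAT_GATEWAY_HOURLY):.2f}/month")
    parts = idle_floor()
    for name, cost in parts.items():
        print(f"{name:8s} ${cost:6.2f}/month")
    print(f"total    ${sum(parts.values()):6.2f}/month")
```

Under these assumptions the NAT Gateway line works out to $32.85/month and the three idle components sum to roughly $46/month, consistent with the ~$45 estimate.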
@@ -45,8 +47,8 @@
 | M6 | Workflow engine | `feat/m06-workflow-engine` | ☑ merged | [#7](https://github.com/div0rce/sentinel/pull/7) | 2026-05-29 |
 | M7 | Audit log + HITL | `feat/m07-audit-hitl` | ☑ merged | [#8](https://github.com/div0rce/sentinel/pull/8) | 2026-05-29 |
 | M8 | Frontend | `feat/m08-frontend` | ☑ merged | [#9](https://github.com/div0rce/sentinel/pull/9) | 2026-05-29; perf follow-up [#11](https://github.com/div0rce/sentinel/pull/11) |
-| M9 | Evaluation harness | `feat/m09-eval` | ◐ complete on branch (PR open) | _filled in after `gh pr create`_ | 2026-05-29 |
-| M10 | Deploy (Docker/Terraform/CD) | `feat/m10-deploy` | ☐ | — | |
+| M9 | Evaluation harness | `feat/m09-eval` | ☑ merged | [#12](https://github.com/div0rce/sentinel/pull/12) | 2026-05-29; real-provider numbers tracked in [#13](https://github.com/div0rce/sentinel/issues/13) |
+| M10 | Deploy (Docker/Terraform/CD) | `feat/m10-deploy` | ◐ complete on branch (PR open) | _filled in after `gh pr create`_ | 2026-05-29; code-only — no apply ran |
 | M11 | Docs + diagram + demo | `feat/m11-docs-demo` | ☐ | — | |
 
 Status key: ☐ not started · ◐ in progress · ☑ merged
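The M9 "Metric set" and "Honesty discipline" lock-ins (replaced by the M10 lock-ins in this diff) name precision@k, recall@k and MRR, plus the rule that fake embeddings force retrieval metrics to `n/a`. A minimal sketch of those definitions follows, assuming each labelled query is a ranked list of chunk ids plus a set of relevant ids. The function names, the `(ranked, relevant)` tuple shape and the `"n/a (fake embeddings)"` string are hypothetical, not the `eval/run.py` API. Precision keeps k as its denominator, so a query with fewer than k relevant chunks cannot reach 1.0; that this is what the precision-cap footnote refers to is my assumption.

```python
"""Retrieval metrics plus the fake-provider n/a gate (hypothetical sketch)."""

from __future__ import annotations

from collections.abc import Sequence

# One labelled query: (ranked chunk ids from the retriever, relevant chunk ids).
Query = tuple[Sequence[str], set[str]]


def _hits(ranked: Sequence[str], relevant: set[str], k: int) -> int:
    """Count relevant ids among the top-k results."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant)


def precision_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    # Denominator is always k, so fewer than k relevant ids caps the score below 1.0.
    return _hits(ranked, relevant, k) / k


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    if not relevant:
        raise ValueError("recall@k is undefined when a query has no relevant ids")
    return _hits(ranked, relevant, k) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    # 1/rank of the first relevant id within the top k; 0.0 if none appears.
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def retrieval_summary(queries: list[Query], k: int, embeddings_provider: str) -> dict[str, str]:
    """Macro-average each metric over queries, or report n/a under fake embeddings."""
    names = (f"precision@{k}", f"recall@{k}", "mrr")
    if embeddings_provider == "fake":
        # Fake vectors say nothing about retrieval quality, so publish no numbers.
        return {name: "n/a (fake embeddings)" for name in names}
    if not queries:
        raise ValueError("need at least one labelled query")
    n = len(queries)
    scores = (
        sum(precision_at_k(r, rel, k) for r, rel in queries) / n,
        sum(recall_at_k(r, rel, k) for r, rel in queries) / n,
        sum(reciprocal_rank(r, rel, k) for r, rel in queries) / n,
    )
    return {name: f"{score:.3f}" for name, score in zip(names, scores)}
```

For example, the two queries `(["a", "b", "c"], {"a"})` and `(["x", "y", "z"], {"z", "w"})` at k=3 give precision@3 `0.333`, recall@3 `0.750` and MRR `0.667`; with `embeddings_provider="fake"` every entry is `n/a (fake embeddings)`.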
68 changes: 68 additions & 0 deletions backend/Dockerfile
@@ -0,0 +1,68 @@
# syntax=docker/dockerfile:1.7
# ---------- builder ----------
FROM python:3.12-slim AS builder

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    UV_LINK_MODE=copy \
    UV_PYTHON_DOWNLOADS=never

# Install only what the uv installer needs (ca-certificates, curl). psycopg[binary]
# ships its own libpq, so libpq-dev / build-essential are not needed in either stage.
RUN --mount=type=cache,target=/var/cache/apt \
    --mount=type=cache,target=/var/lib/apt \
    apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Pinned uv release (matches local toolchain; upgrade in lockstep with CI). uv < 0.5 installs to ~/.cargo/bin.
ADD https://astral.sh/uv/0.4.24/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin:/root/.cargo/bin:${PATH}"

WORKDIR /app

# Install locked dependencies into /app/.venv first; only pyproject.toml + uv.lock gate this layer.
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-install-project --no-dev

# Copy application source last so a code-only change does not invalidate the
# dependency layer.
COPY backend ./backend
COPY alembic.ini ./alembic.ini

# ---------- runtime ----------
FROM python:3.12-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PORT=8000 \
    SENTINEL_LOG_FORMAT=json

# Non-root user; matches "no root by default" container hygiene.
RUN groupadd --system --gid 1000 sentinel \
    && useradd --system --uid 1000 --gid sentinel --create-home --shell /usr/sbin/nologin sentinel

WORKDIR /app

# Bring in the resolved venv + source from the builder.
COPY --from=builder /app /app

# Drop privileges before any further setup.
USER sentinel

# Use the venv-managed python; honour $PORT for ECS service-port flexibility.
ENV PATH="/app/.venv/bin:${PATH}"

EXPOSE 8000

# Liveness probe matches the FastAPI /health endpoint shipped in M0.
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
    CMD python -c "import os, urllib.request; \
urllib.request.urlopen('http://127.0.0.1:' + os.environ.get('PORT', '8000') + '/health', timeout=3)" \
    || exit 1

# One uvicorn worker suffices for the demo (scale via ECS task count); `exec` makes uvicorn PID 1 so it receives SIGTERM.
CMD ["sh", "-c", "exec uvicorn backend.app.main:app --host 0.0.0.0 --port ${PORT:-8000}"]
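The inline `HEALTHCHECK` one-liner is hard to read and cannot be unit-tested. One alternative is a small standalone probe, sketched below under the assumption that it would live at a hypothetical `backend/healthcheck.py` and be wired as `HEALTHCHECK ... CMD ["python", "-m", "backend.healthcheck"]`; the module path, `check()` and its parameters are illustrative, not part of this PR. It keeps the same contract: GET `/health` on `127.0.0.1:$PORT` (default 8000) with a 3-second timeout, exit 0 on a 2xx and 1 otherwise.

```python
"""Standalone equivalent of the inline HEALTHCHECK one-liner (hypothetical module)."""

from __future__ import annotations

import os
import sys
import urllib.request


def check(port: str | None = None, path: str = "/health", timeout: float = 3.0) -> int:
    """Return 0 if GET http://127.0.0.1:<port><path> answers 2xx, else 1."""
    port = port or os.environ.get("PORT", "8000")
    url = f"http://127.0.0.1:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and socket
        # timeouts are all OSError subclasses, so one clause covers them.
        return 1


if __name__ == "__main__":
    sys.exit(check())
```

Note that ECS does not act on health checks embedded in an image; on Fargate it is the container definition's `healthCheck` and the ALB target-group check that gate the service, so the same command would need to be repeated in the task definition to matter there.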
15 changes: 12 additions & 3 deletions backend/app/main.py
@@ -1,21 +1,30 @@
 """FastAPI application entrypoint for Sentinel.
 
-M0 added the liveness probe. M3 wired in the citation-grounded RAG endpoint at
+M0 added the liveness probe. M3 wired the citation-grounded RAG endpoint at
 ``POST /query``. M4 added schema-constrained extraction at ``POST /extract``.
 M7 added the human-in-the-loop review queue at ``GET /review`` and
-``POST /review/{id}/approve|reject``. M8 adds dashboard KPI feeds at
-``GET /dashboard/{volume,categories,confidence,sla}``; the React UI consumes them.
+``POST /review/{id}/approve|reject``. M8 added dashboard KPI feeds at
+``GET /dashboard/{volume,categories,confidence,sla}``. M10 adds structured
+logging + the request-id middleware so every log line carries the request id
+and every response surfaces it on ``X-Request-Id``.
 """
 
 from fastapi import FastAPI
 
+from backend.app.observability import RequestIdMiddleware, configure_logging
 from backend.app.routers.dashboard import router as dashboard_router
 from backend.app.routers.extract import router as extract_router
 from backend.app.routers.query import router as query_router
 from backend.app.routers.review import router as review_router
 
+configure_logging()
+
 app = FastAPI(title="Sentinel", version="0.1.0")
 
+# Middleware wraps the whole ASGI app, so every route handler runs with the
+# structlog request-id context bound, whatever order routers are included in.
+app.add_middleware(RequestIdMiddleware)
+
 app.include_router(query_router)
 app.include_router(extract_router)
 app.include_router(review_router)
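`backend.app.observability` is not part of this diff, so the sketch below is a hypothetical stand-in showing one way to meet the docstring's contract (every log line carries the request id; every response carries `X-Request-Id`). It is a pure ASGI middleware built on a stdlib `contextvars.ContextVar` and a `logging.Filter` rather than structlog, so it runs without third-party packages; with structlog, the equivalent binding would be `structlog.contextvars.bind_contextvars(request_id=...)` plus the `merge_contextvars` processor. Reusing an inbound `X-Request-Id` (capped at an assumed 128 bytes) is also an assumption, not something the PR states.

```python
"""Hypothetical request-id middleware; not the real backend.app.observability."""

from __future__ import annotations

import contextvars
import logging
import uuid
from typing import Any, Awaitable, Callable

Scope = dict[str, Any]
Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

HEADER = b"x-request-id"  # ASGI header names are lowercase bytes
MAX_INBOUND_LEN = 128  # assumption: longer inbound ids are ignored
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request id onto every stdlib log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class RequestIdMiddleware:
    """Bind a request id for the duration of each HTTP request and echo it back."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":  # lifespan / websocket pass straight through
            await self.app(scope, receive, send)
            return

        inbound = dict(scope.get("headers", [])).get(HEADER)
        if inbound and len(inbound) <= MAX_INBOUND_LEN:
            request_id = inbound.decode("latin-1")  # reuse a proxy-supplied id
        else:
            request_id = uuid.uuid4().hex  # otherwise mint a fresh 32-char id
        token = request_id_var.set(request_id)

        async def send_with_header(message: Message) -> None:
            # Append X-Request-Id to the response headers as they go out.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((HEADER, request_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            request_id_var.reset(token)  # never leak the id into the next request
```

Because the constructor takes the wrapped app as `app`, `app.add_middleware(RequestIdMiddleware)` accepts a class like this unchanged; attaching `RequestIdFilter()` to a log handler and formatting with `%(request_id)s` then puts the id on every stdlib log line.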