From 2b636fbddc1589174c00c81fa940239c81211a75 Mon Sep 17 00:00:00 2001 From: Taleef Date: Wed, 3 Jun 2026 22:46:02 +0500 Subject: [PATCH 1/3] docs(journal): log 2026-06-03 deploy fix + CI 3.8x speedup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add JOURNAL entries for the two post-Sprint-8 efforts: - MIE Container Manager deploy fix (v1 API migration: /api/v1 base, {data} envelope, array-shaped create services) — PRs #55/#56, deploy green. - CI test suite 44min -> 11m30s via fixing per-test full-population reruns (@BeforeAll once-per-class) plus 8-way test sharding — PR #57, all 239 pass. Co-Authored-By: Claude Opus 4.8 --- docs/JOURNAL.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/docs/JOURNAL.md b/docs/JOURNAL.md index 2edb23c..830ceb7 100644 --- a/docs/JOURNAL.md +++ b/docs/JOURNAL.md @@ -1,5 +1,39 @@ # Journal +## 2026-06-03 — CI test suite 3.8x faster (test sharding + per-test population-run fix) + +### What changed + +- Root cause of the ~44 min CI: the backend `./gradlew test` step dominated wall-clock (frontend ~50s, E2E manual). Per-class timing showed a few integration tests re-ran a full-population CQL evaluation (~70s) in `@BeforeEach`, once per test method. + - `EvidenceAccessIntegrationTest` ran it 14x (~1022s); converted to one shared run via `@BeforeAll` + `@TestInstance(PER_CLASS)` — its tests are read-only on the population and filter audit by their own upload id → ~71s. + - `CaseFlowRerunIntegrationTest` ran it 5x (~422s); each test targets a distinct outcome-type case with non-overlapping mutations, so one shared run suffices → ~146s. + - `ScopedRunIntegrationTest`, `CaseUpsertIntegrationTest`, `Major1PopulationIntegrationTest` left as-is — their reruns are the behavior under test (idempotency, scoped-run parity, empty-table historical seed) and need per-test isolation. 
+- `.github/workflows/ci.yml`: backend job is now an 8-way matrix; only shard 0 writes the Gradle cache; added a per-class timing diagnostic step. +- `backend/build.gradle.kts`: `Test.include(Spec)` assigns each test class to a shard by stable path hash (`TEST_SHARD_TOTAL`/`TEST_SHARD_INDEX`); CI forks 4-wide with a 1.5g per-fork heap cap; `GRADLE_TEST_FORKS` override. Local runs (no shard env) unchanged. + +### Result / Verification + +- Wall-clock 44 min → 11m30s (~3.8x); CI green on `main`. +- All 239 backend tests pass; per-shard counts sum to 239 (no tests dropped). +- Remaining ceiling is `ScopedRunIntegrationTest` (~635s); a single class runs in one fork, so further gains require splitting it (deferred). +- Shipped in PR #57. + +## 2026-06-03 — MIE Container Manager deploy fix (v1 API migration) + +### What changed + +The MIE Create-a-Container manager API changed under us; the `deploy-twh-mieweb` backend-container job failed three times. + +- `.github/scripts/deploy-mieweb-container.sh`: + - API base normalized to `/api/v1` — the origin now serves the SPA web UI, `/api` serves Swagger, and the JSON REST API is at `/api/v1` (PR #55). + - Migrated to the v1 contract (PR #56): responses are wrapped in a `{"data": ...}` envelope (`.data[]`, `.data.externalDomains[]`); create body uses `template` (not `template_name`) with `services` as an array of flat objects; job polling reads `.data.status` (success value is `"success"`); create-response job id from `.data.jobId`; container URL from `.data[].httpEntries[0].externalUrl`. + - Shapes verified against the live manager API and the manager's own SPA client. + +### Verification + +- Post-merge `deploy-twh-mieweb` run green end-to-end (build + deploy backend + deploy frontend). +- Live: `GET https://twh-api.os.mieweb.org/actuator/health` → `200 {"status":"UP"}`; frontend → `200`. 
+ ## 2026-06-03 — Sprint 8 scoped run parity (SITE/EMPLOYEE end-to-end + rerun support) ### What changed From c824b467e8017781755ad932067e752aa8cf3d36 Mon Sep 17 00:00:00 2001 From: Taleef Date: Mon, 8 Jun 2026 09:37:53 -0400 Subject: [PATCH 2/3] docs: reconcile living docs to MIE-only deployment and post-Sprint-7 state Bring CLAUDE.md, AGENTS.md, README, DEPLOY, the sprint index, CHANGELOG, .env.example, JOURNAL, and the demo/MCP guides into agreement with the current codebase and the single live MIE TWH deployment. - Stack facts: Next.js 16 + React 19; Spring AI OpenAI starter (gpt-5.4-nano / gpt-4o-mini); measure catalog 60 total / 49 CMS eCQM; SendGrid env var name corrected. - Deployment: MIE Create-a-Container documented as the sole live stack; the decommissioned Vercel + Fly.io stack moved to a labeled historical appendix in DEPLOY.md and bannered in DEPLOY_OS_MIEWEB.md; README surfaces, .env.example CORS origin, and guide hostnames updated. - Status: Sprint 7 closed + Sprint 8 scoped-run parity, CI test-sharding speedup, and the MIE v1 deploy migration reflected across CLAUDE.md, README, sprint index, and CHANGELOG. - DEMO_RUNBOOK refactored from hardcoded (ephemeral) UUIDs to an auth + capture procedure verified against the live API response shapes. No code changes; documentation only. Co-Authored-By: Claude Opus 4.8 --- .env.example | 10 +- AGENTS.md | 34 ++-- CHANGELOG.md | 11 ++ CLAUDE.md | 49 +++--- README.md | 12 +- docs/DEMO_RUNBOOK.md | 72 +++++---- docs/DEPLOY.md | 328 +++++++++++++++++++------------------- docs/DEPLOY_OS_MIEWEB.md | 8 + docs/JOURNAL.md | 26 +++ docs/MCP.md | 2 +- docs/WALKTHROUGH_GUIDE.md | 32 ++-- docs/sprints/README.md | 2 +- 12 files changed, 323 insertions(+), 263 deletions(-) diff --git a/.env.example b/.env.example index 7c8850e..754cbe7 100644 --- a/.env.example +++ b/.env.example @@ -1,26 +1,26 @@ # WorkWell Measure Studio environment template # Copy to environment-specific secret stores; do not commit real values. 
-# Fly backend runtime +# Backend runtime (MIE container) DATABASE_URL= DATABASE_URL_DIRECT= OPENAI_API_KEY= SPRING_PROFILES_ACTIVE=prod WORKWELL_AUTH_ENABLED=true WORKWELL_AUTH_JWT_SECRET=replace-with-a-strong-random-secret-at-least-32-characters -# Frontend (Vercel) and backend (Fly) are different sites, so the refresh cookie -# must be SameSite=None + Secure or the browser never sends it on the cross-site +# The frontend and backend run on split origins (twh.os.mieweb.org / twh-api.os.mieweb.org), +# so the refresh cookie must be SameSite=None + Secure or the browser never sends it on the # /api/auth/refresh fetch (silent refresh fails, users get logged out on reload). # Production startup fails fast if these are not set to None/true. WORKWELL_AUTH_COOKIE_SAME_SITE=None WORKWELL_AUTH_COOKIE_SECURE=true -WORKWELL_CORS_ALLOWED_ORIGINS=https://frontend-seven-eta-24.vercel.app +WORKWELL_CORS_ALLOWED_ORIGINS=https://twh.os.mieweb.org WORKWELL_DEMO_ENABLED=false WORKWELL_DEMO_ALLOW_PUBLIC_DEMO=false # Optional safety override if a deployment does not use Spring profiles: # WORKWELL_ENVIRONMENT=production -# Vercel frontend +# Frontend (MIE container) NEXT_PUBLIC_API_BASE_URL= NEXT_PUBLIC_APP_NAME=WorkWell Measure Studio NEXT_PUBLIC_DEMO_MODE=false diff --git a/AGENTS.md b/AGENTS.md index 08fdc48..16b2fdd 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,39 +4,39 @@ Operating manual for any AI coding agent (Claude Code, Codex, Cursor, etc.) work ## What this project is - Single-developer Spring Boot + Next.js monorepo -- Goal: implement the gaps and improvements identified in `docs/sprints/` to showcase and overdeliver on the project's original vision -- Build phase: sprint-based feature implementation — see `docs/sprints/README.md` for the ordered work queue +- Goal: keep the merged WorkWell Measure Studio MVP stable, showcaseable, and easy to review +- Phase (as of 2026-06-08): all planned sprints (0–7) are merged to `main`; active work is post-merge closeout and polish. 
`docs/sprints/` is historical context now, not an active queue. ## Read before any task -1. `docs/sprints/README.md` — sprint index and critical path. This is your active work queue. -2. The specific sprint file for the issue you're working on (e.g., `docs/sprints/SPRINT_00_critical_demo_fixes.md`) -3. `docs/JOURNAL.md` — latest state of the project -4. `README.md` — public project overview and API surface +1. `docs/JOURNAL.md` — latest state of the project (newest entry on top). This is the current source of truth. +2. `CLAUDE.md` — current focus, hard rules, and build/verify commands +3. `README.md` — public project overview and API surface +4. `docs/sprints/README.md` — historical sprint index (all sprints merged; reference only) `docs/archive/SPIKE_PLAN.md` and `docs/archive/PROJECT_PLAN_v1.md` are historical only — do not act on them. -## Sprint execution protocol -- Work **one sprint at a time**, in the order defined in `docs/sprints/README.md` -- Within a sprint, work **one issue at a time** from top to bottom -- Every issue has an **Acceptance Criteria** checklist — every box must pass before the issue is done -- Create a feature branch per issue: `fix/sprint-0-` or `feat/sprint-1-` -- Open a PR for review after each issue — do not batch multiple issues into one PR unless they are tightly coupled (e.g., a migration + the service that uses it) -- **Stop and ask** before starting the next sprint — Taleef reviews before proceeding +## Feature work protocol +- Planned sprint work (0–7) is complete; new work is post-merge polish or follow-up features +- Work **one task at a time**; keep changes small and focused +- Where a sprint file defined acceptance criteria, every box must still pass before that work is considered done +- Create a feature branch per task: `fix/` or `feat/` +- Open a PR for review per task — do not batch unrelated changes; tightly coupled changes (e.g., a migration + the service that uses it) may share a PR +- **Stop and ask** before 
starting a new workstream — Taleef reviews before proceeding - Update `docs/JOURNAL.md` with a dated entry for everything that ships ## Tech stack (immutable without ADR in docs/DECISIONS.md) - Backend: Java 21, Spring Boot 3.x, Gradle Kotlin DSL, PostgreSQL 16, Flyway - CQL/FHIR: HAPI FHIR JPA + `org.opencds.cqf.fhir:cqf-fhir-cr` 3.26.0 (see CQF_FHIR_CR_REFERENCE.md) -- Frontend: Next.js 14+ App Router, TypeScript, Tailwind, shadcn/ui, Monaco -- AI: Spring AI (Anthropic), MCP via `io.modelcontextprotocol/java-sdk` -- Infra: Docker Compose local; Fly.io + Vercel + Neon prod; GitHub Actions; pnpm +- Frontend: Next.js 16 App Router + React 19, TypeScript, Tailwind, shadcn/ui, Monaco +- AI: Spring AI (OpenAI starter, `spring-ai-openai-spring-boot-starter`), MCP via `io.modelcontextprotocol/java-sdk` +- Infra: Docker Compose local; MIE Create-a-Container + Neon prod (Fly.io + Vercel preview decommissioned); GitHub Actions; pnpm ## Hard rules - Avoid new dependencies unless explicitly approved — if a sprint file calls for a dependency, it is pre-approved; anything else requires asking first - One Spring Boot app, modular packages — no microservices - Spring Application Events + DB audit log — no Kafka or external streaming - Auth: JWT refresh token flow (HttpOnly cookie, token rotation, `/api/auth/refresh`) is approved and specified in Sprint 4. User accounts remain hardcoded — no SSO, no real user directory. -- Email: `WORKWELL_EMAIL_PROVIDER=simulated` is the mandatory default on the demo stack. Do not set `SENDGRID_API_KEY` in any demo environment config. +- Email: `WORKWELL_EMAIL_PROVIDER=simulated` is the mandatory default on the demo stack. Do not set `WORKWELL_EMAIL_SENDGRID_API_KEY` in any demo environment config. - AI never decides compliance (docs/AI_GUARDRAILS.md). CQL engine is sole source of truth. 
- Every state change writes `audit_event` — no exceptions - No silent scope changes — if something in a sprint file doesn't match the codebase, stop and report before proceeding diff --git a/CHANGELOG.md b/CHANGELOG.md index 72442e6..dc06ba7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,9 +7,20 @@ and this project follows [Semantic Versioning](https://semver.org/) intent for r ## [Unreleased] +### Added +- Scoped-run parity (Sprint 8): `SITE` and `EMPLOYEE` manual runs and same-scope reruns now route through the async run-job path, matching `ALL_PROGRAMS`/`MEASURE`; the `/runs` UI exposes the new scopes. + ### Changed +- CI backend test suite ~3.8× faster (44m → 11m30s) via 8-way test sharding plus a per-class population-run fix (PR #57). +- Deployment consolidated onto MIE Create-a-Container; the Vercel + Fly.io public-preview stack is decommissioned. Living docs (README, DEPLOY, ARCHITECTURE, CLAUDE, AGENTS, sprint index) reconciled to the single live MIE TWH stack. - Repository standards polish: badges, contribution/security/support docs, community templates, and metadata alignment. +### Fixed +- MIE Container Manager deploy migrated to the v1 API contract (`/api/v1` base, `{"data": ...}` envelope, `template`/`services` create body, `.data.status` polling) after the manager API changed (PRs #55, #56). + +### Docs +- Synced CLAUDE.md, AGENTS.md, README, DEPLOY, and the sprint index to the post-Sprint-7 / Sprint-8 state (measure catalog 60/49, Next.js 16 + React 19, OpenAI Spring AI starter, MIE-only deployment). 
+ ## [2026-05-22] ### Added diff --git a/CLAUDE.md b/CLAUDE.md index 5d7e016..3709f30 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,16 +13,21 @@ ## Tech stack (immutable without ADR in docs/DECISIONS.md) - Backend: Java 21 + Spring Boot 3.x + Gradle Kotlin DSL + PostgreSQL 16 + Flyway - CQL/FHIR: HAPI FHIR JPA + `org.opencds.cqf.fhir:cqf-fhir-cr` 3.26.0 (see CQF_FHIR_CR_REFERENCE.md) -- Frontend: Next.js 14+ App Router + TypeScript + Tailwind + shadcn/ui + Monaco -- AI: Spring AI (Anthropic starter); MCP via `io.modelcontextprotocol/java-sdk` -- Infra: Docker Compose locally; Fly.io + Vercel + Neon for deploy; GitHub Actions CI; pnpm +- Frontend: Next.js 16 App Router + React 19 + TypeScript + Tailwind + shadcn/ui + Monaco +- AI: Spring AI (OpenAI starter, `spring-ai-openai-spring-boot-starter`); MCP via `io.modelcontextprotocol/java-sdk` +- Infra: Docker Compose locally; MIE Create-a-Container + Neon for deploy (Fly.io + Vercel public-preview stack decommissioned — MIE TWH is the sole live stack); GitHub Actions CI; pnpm + +## Build & verify +- Backend: `cd backend; .\gradlew.bat test` — 239 tests; CI shards 8-way. **Never run two backend `gradlew test` concurrently** (shared temp binary-results race). +- Frontend: `cd frontend; npm run lint; npm run build` +- Run the app: backend `.\gradlew.bat bootRun`; frontend `npm run dev` ## Hard rules - Avoid new dependencies unless they are explicitly approved and documented - One Spring Boot app, modular packages — no microservices - Spring Application Events + DB audit log — no Kafka or external streaming - Auth: user accounts remain hardcoded (no SSO, no real user directory). JWT refresh token flow (HttpOnly cookie, token rotation, `/api/auth/refresh`) is approved and implemented in Sprint 4 — this replaces the prior "stub auth only" constraint. -- Email: `WORKWELL_EMAIL_PROVIDER=simulated` is the default and must remain so on the demo stack. 
SendGrid wiring exists in the code (Sprint 6) but must not be activated unless `SENDGRID_API_KEY` is explicitly set in a non-demo environment. +- Email: `WORKWELL_EMAIL_PROVIDER=simulated` is the default and must remain so on the demo stack. SendGrid wiring exists in the code (Sprint 6) but must not be activated unless `WORKWELL_EMAIL_SENDGRID_API_KEY` is explicitly set (with `WORKWELL_EMAIL_PROVIDER=sendgrid`) in a non-demo environment. - AI never decides compliance (see docs/AI_GUARDRAILS.md). CQL engine is sole source of truth. - Every state change writes `audit_event` — no exceptions - No silent scope changes. If a stop condition triggers, document fallback in JOURNAL.md. @@ -71,40 +76,30 @@ ## Other docs to consult on demand - @docs/archive/SPIKE_PLAN.md — archived sprint context -- @docs/DEPLOY.md — Vercel + Fly + Neon setup, env vars, rollback -- @docs/MEASURES.md — the 4 demo measures in plain English +- @docs/DEPLOY.md — MIE Create-a-Container + Neon setup, env vars, rollback +- @docs/MEASURES.md — the TWH measure catalog (60 measures) in plain English - @docs/ARCHITECTURE.md — system architecture diagrams + boundaries - @docs/DATA_MODEL.md — schema invariants - @docs/AI_GUARDRAILS.md — AI usage policy - @docs/CQF_FHIR_CR_REFERENCE.md — proven library wiring from spike - @README.md — quickstart -## Current Focus (as of 2026-05-21) - -**All planned sprints merged. TWH consolidation complete. Sprint 7 (overdelivery) is next.** +## Current Focus (as of 2026-06-08) -Sprints merged (all into `main`): -- Sprint 0 (bugs) → PR #16 -- Sprint 2 (data) → PR #17 -- Sprint 1 (pipeline) → PR #18 -- Sprint 3 (employee/SLA) → PR #19 -- Sprint 4 (security) → PR #20 -- Sprint 6 (admin) → PR #21 -- Sprint 5 (tests/CI) → PR #22 -- eCQM + TWH instance support → PR #46 (merged to main) +**All sprints through Sprint 7 are merged and closed; Sprint 8 scoped-run parity has landed. 
The stack is in post-merge polish / showcase mode.** -Post-merge work completed (all on `main`): -- Real-time run progress (spinner, live timer, auto-reload) -- AI integration health check fix (GET /v1/models) -- TWH consolidation: single MIE container, 47 CMS eCQMs seeded in catalog -- Fly.io decommissioned; MIE TWH is sole deployment +History (all on `main`): +- Sprints 0–6 → PRs #16–#22; eCQM + TWH instance support → PR #46 +- Sprint 7 overdelivery (AI Draft CQL, AI Test Fixtures, Risk Scoring, MAT Export, Mobile Responsive) → issues #47–#51, closed +- Sprint 8 scoped-run parity: `SITE`/`EMPLOYEE` manual runs + rerun now route through the async run-job path +- CI test suite 3.8x faster via 8-way test sharding (44m → 11m30s) → PR #57 +- MIE Container Manager deploy migrated to the v1 API envelope → PRs #55, #56 Current posture: - **Live URL:** `https://twh.os.mieweb.org` — login: `admin@workwell.dev` / `Workwell123!` -- **Deployment:** MIE Create-a-Container only (`deploy-twh-mieweb.yml`); triggers on every push to `main` -- **Measure catalog:** 58 total — 4 OSHA active (CQL), 3 OSHA catalog, 4 HEDIS wellness active (CQL), 47 CMS eCQM Draft entries +- **Deployment:** MIE Create-a-Container only (`deploy-twh-mieweb.yml`); triggers on every push to `main`. The earlier Fly.io + Vercel public-preview stack is decommissioned; MIE TWH is the sole live stack. 
+- **Measure catalog:** 60 total — 4 OSHA active (CQL), 3 OSHA catalog, 4 HEDIS wellness active (CQL), 49 CMS eCQM Draft entries +- **Supported run scopes:** `ALL_PROGRAMS`, `MEASURE`, `SITE`, `EMPLOYEE`, `CASE` - `main` is fully up to date; no open feature branches - Schema migrations are owned by Taleef — stop and ask before writing any `V0xx__*.sql` file -- Sprint 7 spec is in `docs/sprints/SPRINT_07_overdelivery_features.md` — 5 issues (AI Draft CQL, AI Test Fixtures, Risk Scoring, MAT Export, Mobile Responsive) - Treat `docs/archive/SPIKE_PLAN.md` as historical context only - diff --git a/README.md b/README.md index 5ff5b27..3be33d7 100644 --- a/README.md +++ b/README.md @@ -21,16 +21,16 @@ WorkWell Measure Studio is a Spring Boot + Next.js monorepo for **Total Worker H ## Status -- Sprint queue through **Sprint 7** is implemented in the repo. -- Sprint 7 issues `#47`-`#51` are completed and closed. +- All planned sprints (**0–7**) are implemented and merged to `main`; Sprint 7 issues `#47`–`#51` are closed. +- Post-merge work continues on `main`: Sprint 8 scoped-run parity (`SITE`/`EMPLOYEE` manual runs + reruns), an 8-way CI test-sharding speedup (~3.8×), and the MIE Container Manager v1 API deploy migration. - Default branch: `main` only (stale sprint branches cleaned up). ## Production surfaces -- Primary demo frontend: `https://twh.os.mieweb.org` -- Primary demo backend API: `https://twh-api.os.mieweb.org` -- Public preview frontend: `https://workwell-measure-studio.vercel.app` -- Public preview backend API: `https://workwell-measure-studio-api.fly.dev` +- Live frontend: `https://twh.os.mieweb.org` +- Live backend API: `https://twh-api.os.mieweb.org` + +> The earlier Vercel + Fly.io public-preview stack (`workwell-measure-studio.vercel.app`, `workwell-measure-studio-api.fly.dev`) is **decommissioned**. MIE TWH is the sole live deployment. 
## Technology stack diff --git a/docs/DEMO_RUNBOOK.md b/docs/DEMO_RUNBOOK.md index 1032154..05a383e 100644 --- a/docs/DEMO_RUNBOOK.md +++ b/docs/DEMO_RUNBOOK.md @@ -1,37 +1,51 @@ -# Last verified: 2026-05-07 +# Last updated: 2026-06-08 (capture-based; originally verified 2026-05-07 on the legacy stack) # Demo Runbook (Production) +> **Stack note:** URLs point to the live MIE TWH stack. Run/case IDs are environment-specific and +> change every run, so this runbook captures them at demo time via the API (see "Capture current +> IDs") rather than hardcoding them. + ## Production Surfaces -- Frontend: `https://workwell-measure-studio.vercel.app` -- Backend API: `https://workwell-measure-studio-api.fly.dev` +- Frontend: `https://twh.os.mieweb.org` +- Backend API: `https://twh-api.os.mieweb.org` + +## Capture current IDs (run at demo time) -## Pinned Production IDs +Run and case IDs are environment-specific and change on every run, so capture them live rather +than relying on pinned values. All `/api/**` calls require a bearer token. 
-### Measures -- Audiogram: `4ae5d865-3d64-4a17-905d-f1b315a037e2` -- TB Surveillance: `8c9fda6f-b9bb-413a-be4d-8ce4faa72999` -- HAZWOPER Surveillance: `eaa81302-b6f6-4aba-a143-bb72941f9c00` -- Flu Vaccine: `9db33281-0933-4dd6-86e9-e4c6df2b9a94` +```bash +# 1) Mint an access token (admin or case-manager account) +TOKEN=$(curl -fsS -X POST https://twh-api.os.mieweb.org/api/auth/login \ + -H 'Content-Type: application/json' \ + -d '{"email":"admin@workwell.dev","password":"Workwell123!"}' | jq -r .token) -### Latest run IDs (per measure query, `limit=1`) -- Audiogram latest run: `3866d69a-2519-4051-bad0-98da9ea696bf` -- TB Surveillance latest run: `fba26713-92ff-49e3-84d0-fa8d137881f7` -- HAZWOPER Surveillance latest run: `3866d69a-2519-4051-bad0-98da9ea696bf` -- Flu Vaccine latest run: `3866d69a-2519-4051-bad0-98da9ea696bf` +# 2) Measure IDs (names are stable across reseeds; UUIDs differ per instance) +curl -fsS https://twh-api.os.mieweb.org/api/measures \ + -H "Authorization: Bearer $TOKEN" | jq -r '.[] | "\(.name): \(.id)"' -### Pinned Audiogram open case for MCP `explain_outcome` -- Case ID: `32fee6f4-6e69-4675-b44e-5f6392de7dbd` -- Employee: `emp-006` (Omar Siddiq) -- Outcome status: `OVERDUE` +# 3) Latest run ID for a measure (paste a measure ID from step 2) +MEASURE_ID= +curl -fsS "https://twh-api.os.mieweb.org/api/runs?measureId=$MEASURE_ID&limit=1" \ + -H "Authorization: Bearer $TOKEN" | jq -r '.[0].id' + +# 4) An open Audiogram case ID (for MCP explain_outcome) +curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureName=Audiogram" \ + -H "Authorization: Bearer $TOKEN" | jq -r '.[0].id' +``` + +Stable seeded reference: employee `emp-006` (Omar Siddiq) carries an Audiogram `OVERDUE` outcome +and is a reliable persona for a deterministic `explain_outcome` demo — external IDs survive +reseeds, the case UUID does not. 
## Pre-flight Smoke Check (curl) ```bash -curl -fsS https://workwell-measure-studio-api.fly.dev/actuator/health -curl -fsS https://workwell-measure-studio-api.fly.dev/api/measures -curl -fsS "https://workwell-measure-studio-api.fly.dev/api/runs?measureId=4ae5d865-3d64-4a17-905d-f1b315a037e2&limit=1" -curl -fsS "https://workwell-measure-studio-api.fly.dev/api/cases?status=open&measureName=Audiogram" +curl -fsS https://twh-api.os.mieweb.org/actuator/health +curl -fsS https://twh-api.os.mieweb.org/api/measures -H "Authorization: Bearer $TOKEN" +curl -fsS "https://twh-api.os.mieweb.org/api/runs?measureId=$MEASURE_ID&limit=1" -H "Authorization: Bearer $TOKEN" +curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureName=Audiogram" -H "Authorization: Bearer $TOKEN" ``` Expected: @@ -42,18 +56,18 @@ Expected: ## 30-Minute Pre-Demo Checklist - Verify backend is up (`/actuator/health` is UP). -- Verify frontend opens at `https://workwell-measure-studio.vercel.app/programs`. +- Verify frontend opens at `https://twh.os.mieweb.org/programs`. - Verify all 4 measures are Active in `GET /api/measures`. - Verify at least one open Audiogram case exists. -- Verify MCP server is running and can execute: - - `list_measures` - - `get_run_summary` - - `explain_outcome` with case ID `32fee6f4-6e69-4675-b44e-5f6392de7dbd` +- Capture a current run ID and open Audiogram case ID (see "Capture current IDs"). +- Verify MCP server is running and can execute `list_measures`, `get_run_summary`, `explain_outcome`. 
## Reference MCP Calls for Live Demo +Substitute the IDs captured above: + ```text list_measures -get_run_summary {"runId":"3866d69a-2519-4051-bad0-98da9ea696bf"} -explain_outcome {"caseId":"32fee6f4-6e69-4675-b44e-5f6392de7dbd"} +get_run_summary {"runId":""} +explain_outcome {"caseId":""} ``` diff --git a/docs/DEPLOY.md b/docs/DEPLOY.md index 572daa1..6b6d964 100644 --- a/docs/DEPLOY.md +++ b/docs/DEPLOY.md @@ -4,11 +4,15 @@ **Status:** Current deployment reference for the merged WorkWell Measure Studio stack. **Cost target:** keep the live stack under about $25/month. +> The MIE TWH stack below is the **sole live deployment**. The earlier Vercel + Fly.io +> public-preview stack is **decommissioned** — its setup is retained only as +> [Appendix A](#appendix-a--decommissioned-vercel--flyio-stack-historical-reference) for historical reference. + --- -## MIE Create-a-Container Deployment (Primary Demo Stack) +## MIE Create-a-Container Deployment (sole live stack) -The primary demo deployment runs on MIE's internal container platform (`os.mieweb.org`). +The deployment runs on MIE's internal container platform (`os.mieweb.org`). **One instance only: TWH** — Total Worker Health. Encompasses all OSHA safety + eCQM wellness measures. | Service | Hostname | Image | @@ -21,7 +25,12 @@ The primary demo deployment runs on MIE's internal container platform (`os.miewe Push to `main` triggers `.github/workflows/deploy-twh-mieweb.yml` which: 1. Builds the backend image tagged with `latest` + `sha-` 2. Builds the frontend image with TWH branding baked in via build-args -3. Deploys both containers to MIE via `deploy-mieweb-container.sh` +3. 
Deploys both containers to MIE via `.github/scripts/deploy-mieweb-container.sh` + +The deploy script talks to the MIE Container Manager **v1 API** (`/api/v1`): +responses are wrapped in a `{"data": ...}` envelope, the create body uses `template` with +`services` as an array of flat objects, and job polling reads `.data.status` (success value +`"success"`). See the 2026-06-03 JOURNAL entries for the v1 migration details (PRs #55, #56). ### Required GitHub Secrets for MIE deploy @@ -33,6 +42,26 @@ Push to `main` triggers `.github/workflows/deploy-twh-mieweb.yml` which: | `OPENAI_API_KEY` | AI services (Draft Spec, Explain Why Flagged) | | `WORKWELL_AUTH_JWT_SECRET_TWH` | JWT signing secret for TWH instance | +The deploy workflow maps these `*_TWH` GitHub secrets onto the backend container's runtime +environment variable names (e.g. `DATABASE_URL_TWH` → `DATABASE_URL`, +`WORKWELL_AUTH_JWT_SECRET_TWH` → `WORKWELL_AUTH_JWT_SECRET`) used in the +[environment variables reference](#environment-variables-reference) below. + +### Backend runtime configuration (set by the workflow / container) + +- `WORKWELL_INSTANCE=twh` — selects TWH seeding (see below) +- `SPRING_PROFILES_ACTIVE=prod` +- `WORKWELL_AUTH_ENABLED=true`, `WORKWELL_AUTH_JWT_SECRET=` +- `WORKWELL_AUTH_COOKIE_SAME_SITE=None`, `WORKWELL_AUTH_COOKIE_SECURE=true` +- `WORKWELL_EMAIL_PROVIDER=simulated` (must stay `simulated` on the demo stack) + +> **Refresh-cookie config:** the refresh-token cookie is set `SameSite=None; Secure`, and +> production startup **fails fast** if `WORKWELL_AUTH_COOKIE_SAME_SITE` is not `None` or +> `WORKWELL_AUTH_COOKIE_SECURE` is not `true`. With the frontend (`twh.os.mieweb.org`) and +> API (`twh-api.os.mieweb.org`) on split origins, this is what lets the browser send the +> cookie on the `POST /api/auth/refresh` fetch — otherwise silent token refresh fails and +> users are logged out on every reload. 
+ ### Instance seeding The backend detects `WORKWELL_INSTANCE=twh` (set in the workflow) and seeds: @@ -40,116 +69,39 @@ The backend detects `WORKWELL_INSTANCE=twh` (set in the workflow) and seeds: - 4 HEDIS wellness catalog measures (Cholesterol, BMI, Diabetes HbA1c, Hypertension) - All 49 CMS eCQM catalog entries (Draft, awaiting CQL authoring) +Total catalog: **60 measures** (see `docs/MEASURES.md` for the full breakdown). + ### Manual re-deploy (force update existing containers) Use `workflow_dispatch` with `replace_existing: true` from the GitHub Actions UI. --- -## Legacy/Public Preview Stack - -| Layer | Service | Tier | Cost | -|-------|---------|------|------| -| Frontend | Vercel | Hobby | $0 | -| Backend | Fly.io | shared-cpu-1x, 512MB | ~$2/mo | -| Postgres | Neon | Free | $0 (3GB cap) | -| AI | OpenAI API | direct, budget-capped | variable | -| Domain | Vercel subdomain | n/a | $0 | - -Fly 256MB free OOMs Spring Boot. Don't try. - -Fallback if Fly cost is a problem: Render free tier (cold-start tradeoff, ~30s first hit per inactive period). - -## Prerequisites - -- GitHub account, repo `workwell-measure-studio` -- Fly CLI: `iwr https://fly.io/install.ps1 -useb | iex` (Windows) or `curl -L https://fly.io/install.sh | sh` -- Vercel CLI: `pnpm i -g vercel` -- Neon account + project created -- OpenAI API key with a hard monthly budget cap set in console billing - -## One-time setup - -### Neon - -1. Create project `workwell-measure-studio`, region us-east, **Postgres 16** -2. Copy **pooled** connection string (for app) -3. Copy **direct** connection string (for Flyway migrations) -4. Save as repo secrets: `DATABASE_URL`, `DATABASE_URL_DIRECT` - -Do not use `neonctl projects create` unless it supports `pg_version=16`; the current CLI defaults to Postgres 17 and is not compliant with the locked stack. 
- -### Fly.io - -```bash -cd backend -fly launch --no-deploy -fly secrets set DATABASE_URL= -fly secrets set DATABASE_URL_DIRECT= -fly secrets set OPENAI_API_KEY= -fly secrets set SPRING_PROFILES_ACTIVE=prod -fly secrets set WORKWELL_AUTH_ENABLED=true -fly secrets set WORKWELL_AUTH_JWT_SECRET= -fly secrets set WORKWELL_AUTH_COOKIE_SAME_SITE=None -fly secrets set WORKWELL_AUTH_COOKIE_SECURE=true -``` - -> The frontend (Vercel) and backend (Fly) are different registrable domains, so -> every browser→API call is **cross-site**. The refresh-token cookie must be -> `SameSite=None; Secure` or the browser never sends it on the cross-site -> `POST /api/auth/refresh` fetch — silent token refresh fails and users are -> logged out on every page reload. Production startup now **fails fast** if -> `WORKWELL_AUTH_COOKIE_SAME_SITE` is not `None` or `WORKWELL_AUTH_COOKIE_SECURE` -> is not `true`. - -Edit `fly.toml`: `memory = "512mb"`, region = closest to you (e.g., `ord`, `iad`), and keep `min_machines_running = 1` if you need a stable remote MCP connection. - -Stop after wiring the secrets and project settings. Deploy only after the stack is provisioned and verified. - -First deploy verification: - -```bash -fly deploy -curl https://.fly.dev/actuator/health # expect {"status":"UP"} -``` - -### Vercel - -1. Import GitHub repo, root directory `frontend/` -2. Framework: Next.js (auto-detected) -3. Env vars: - - `NEXT_PUBLIC_API_BASE_URL` = Fly app URL (e.g., `https://workwell-measure-studio-api.fly.dev`) - - `NEXT_PUBLIC_APP_NAME` = `WorkWell Measure Studio` - - `NEXT_PUBLIC_DEMO_MODE` = `true` only for local/demo builds that should prefill the login form -4. Stop after project connection and env configuration. First deploy from `main` happens after the stack is provisioned and verified. - -### OpenAI - -1. Get API key from platform.openai.com -2. Set $20/mo hard usage limit in billing -3. 
Save as Fly secret only (never expose to frontend) - -## Env vars reference +## Environment variables reference | Var | Where | Purpose | |-----|-------|---------| -| `DATABASE_URL` | Fly | Pooled Neon connection for app runtime | -| `DATABASE_URL_DIRECT` | Fly | Direct Neon connection for Flyway migrations | -| `OPENAI_API_KEY` | Fly | AI calls (drafting and explanation surfaces) | -| `SPRING_PROFILES_ACTIVE` | Fly | Always `prod` in deployed env | -| `WORKWELL_AUTH_ENABLED` | Fly | Enable stub auth; set `true` in deployed env | -| `WORKWELL_AUTH_JWT_SECRET` | Fly | Required when auth is enabled; use a strong secret | -| `WORKWELL_AUTH_COOKIE_SAME_SITE` | Fly | Refresh-cookie SameSite. **Must be `None` in production** (cross-site Vercel↔Fly). Default `Lax` for local same-origin dev. | -| `WORKWELL_AUTH_COOKIE_SECURE` | Fly | Refresh-cookie Secure flag. **Must be `true` in production** (required for SameSite=None). Default `false` for local HTTP dev. | -| `NEXT_PUBLIC_API_BASE_URL` | Vercel | Backend URL for fetch calls | -| `NEXT_PUBLIC_APP_NAME` | Vercel | App display name | -| `NEXT_PUBLIC_DEMO_MODE` | Vercel | Prefill login form for local/demo builds only | -| `WORKWELL_EMAIL_PROVIDER` | Fly | Outreach email provider. **Stays `simulated` on the demo stack (default + CLAUDE.md hard rule).** | -| `WORKWELL_EMAIL_SENDGRID_API_KEY` | Fly | SendGrid API key. Wiring exists in code but **must remain unset on the demo stack**; only set in an explicit non-demo deployment alongside `WORKWELL_EMAIL_PROVIDER=sendgrid`. | -| `WORKWELL_EMAIL_FROM_ADDRESS` | Fly | From address for outreach (default `noreply@workwell-demo.dev`). | -| `WORKWELL_EMAIL_FROM_NAME` | Fly | From display name (default `WorkWell Measure Studio`). | - -`.env.example` at repo root mirrors this list (without values). At present, env vars must be verified manually before deploy; the existing CI workflow does not validate deployment secrets or Vercel env configuration. 
+| `DATABASE_URL` | Backend | Pooled Neon connection for app runtime |
+| `DATABASE_URL_DIRECT` | Backend | Direct Neon connection for Flyway migrations |
+| `OPENAI_API_KEY` | Backend | AI calls (drafting and explanation surfaces) |
+| `SPRING_PROFILES_ACTIVE` | Backend | Always `prod` in deployed env |
+| `WORKWELL_INSTANCE` | Backend | `twh` selects the TWH seed set |
+| `WORKWELL_AUTH_ENABLED` | Backend | Enable auth; set `true` in deployed env |
+| `WORKWELL_AUTH_JWT_SECRET` | Backend | Required when auth is enabled; use a strong secret |
+| `WORKWELL_AUTH_COOKIE_SAME_SITE` | Backend | Refresh-cookie SameSite. **Must be `None` in production** (split frontend/API origins). Default `Lax` for local same-origin dev. |
+| `WORKWELL_AUTH_COOKIE_SECURE` | Backend | Refresh-cookie Secure flag. **Must be `true` in production** (required for SameSite=None). Default `false` for local HTTP dev. |
+| `NEXT_PUBLIC_API_BASE_URL` | Frontend | Backend URL for fetch calls (origin-only, no `/api` suffix, no trailing whitespace) |
+| `NEXT_PUBLIC_APP_NAME` | Frontend | App display name |
+| `NEXT_PUBLIC_DEMO_MODE` | Frontend | Prefill login form for local/demo builds only; `true` **fails the production frontend build** |
+| `WORKWELL_EMAIL_PROVIDER` | Backend | Outreach email provider. **Stays `simulated` on the demo stack (default + CLAUDE.md hard rule).** |
+| `WORKWELL_EMAIL_SENDGRID_API_KEY` | Backend | SendGrid API key. Wiring exists in code but **must remain unset on the demo stack**; only set in an explicit non-demo deployment alongside `WORKWELL_EMAIL_PROVIDER=sendgrid`. |
+| `WORKWELL_EMAIL_FROM_ADDRESS` | Backend | From address for outreach (default `noreply@workwell-demo.dev`). |
+| `WORKWELL_EMAIL_FROM_NAME` | Backend | From display name (default `WorkWell Measure Studio`). |
+
+`Where = Backend` vars are container environment on the MIE backend container (mapped from the
+`*_TWH` GitHub secrets where applicable); `Where = Frontend` vars are build-args/env baked into
+the MIE frontend image. `.env.example` at repo root mirrors this list (without values). Env vars
+must be verified manually before deploy; the CI workflow does not validate deployment secrets.

 ### Email delivery (Sprint 6)

@@ -164,6 +116,21 @@ Do not set `WORKWELL_EMAIL_SENDGRID_API_KEY` on the demo stack.
 The non-prod `POST /api/admin/demo-reset` endpoint (admin-only, `@Profile("!prod")`) truncates
 volatile demo tables including `audit_events`; it returns 403 under the `prod` profile.

+## Neon (Postgres)
+
+1. Project `workwell-twh`, region us-east, **Postgres 16**
+2. **Pooled** connection string → `DATABASE_URL_TWH` GitHub secret (app runtime)
+3. **Direct** connection string → used for Flyway migrations (`DATABASE_URL_DIRECT`)
+
+Do not use `neonctl projects create` unless it supports `pg_version=16`; the CLI defaults to
+Postgres 17 and is not compliant with the locked stack.
+
+## OpenAI
+
+1. Get API key from platform.openai.com
+2. Set a hard monthly usage limit in billing
+3. Store as the `OPENAI_API_KEY` GitHub secret only (never expose to the frontend)
+
 ## CI/CD

 **Active deploy workflow:** `.github/workflows/deploy-twh-mieweb.yml`
@@ -171,15 +138,15 @@ volatile demo tables including `audit_events`; it returns 403 under the `prod` p
 - Builds backend + frontend Docker images, pushes to GHCR, deploys both containers to MIE

 **CI workflow:** `.github/workflows/ci.yml`
-- Runs backend build + tests
+- Runs backend build + tests (8-way test sharding; ~11m30s wall-clock)
 - Runs frontend lint
-- Does not deploy (deploy is separate workflow above)
+- Does not deploy (deploy is the separate workflow above)

 ## Health checks

-- Backend: `GET /actuator/health` → `{"status":"UP"}`
-- Frontend: `GET /` → 200 OK
-- DB: from Fly machine, `fly ssh console` → `psql $DATABASE_URL_DIRECT -c "SELECT 1"`
+- Backend: `GET https://twh-api.os.mieweb.org/actuator/health` → `{"status":"UP"}`
+- Frontend: `GET https://twh.os.mieweb.org/` → 200 OK
+- DB: `psql "$DATABASE_URL_DIRECT" -c "SELECT 1"` from any host with the Neon direct string

 Post-deploy smoke checklist (MVP complete surface):
 - `GET /actuator/health` -> `200`
@@ -194,53 +161,32 @@ Post-deploy smoke checklist (MVP complete surface):
 - `POST /api/cases/{id}/actions/outreach/delivery?deliveryStatus=SENT` -> `200`
 - `GET /api/cases/{id}` confirms `latestOutreachDeliveryStatus=SENT`

-Add Fly HTTP check every 30s on `/actuator/health`. Free, alerts on 3 failures.
-
 ## Rollback

-### Fly
-```bash
-fly releases list
-fly releases rollback
-```
-Or redeploy a previous SHA:
-```bash
-git checkout
-fly deploy
-```
-
-### Vercel
-Dashboard → Deployments → previous → Promote to Production.
+### MIE containers
+- Revert the offending commit on `main` (re-triggers `deploy-twh-mieweb.yml`), or
+- Re-run the deploy workflow via `workflow_dispatch` at an earlier SHA with `replace_existing: true`.
+  Each backend image is also tagged `sha-` in GHCR for pinning a known-good build.

 ### Neon
-Each schema migration creates a branch. Promote previous branch to main from Neon dashboard.
+Each schema migration creates a branch. Promote the previous branch to main from the Neon dashboard.

 ## Cost monitoring

 Daily check while the stack is live:

-- **Fly dashboard:** Usage tab, projected monthly
 - **Neon dashboard:** storage + compute consumed
 - **OpenAI usage dashboard:** today's spend
+- **MIE platform:** internal container hosting (no per-month cloud bill like the legacy Fly tier)

 If any approaches limit, fix that day. Don't wait.

 ## Troubleshooting

-**Fly deploy fails with OOM**
-- Verify `memory = "512mb"` in `fly.toml`
-- Reduce JVM heap: `JAVA_OPTS=-Xmx384m -Xss256k`
-- Check `fly logs` for OOMKilled
-
 **Neon connection limit hit**
-- Use pooled connection string (`DATABASE_URL`), not direct, in app
+- Use the pooled connection string (`DATABASE_URL`), not direct, in the app
 - HikariCP `maximum-pool-size: 10` in `application.yml`
-- Direct only for Flyway
-
-**Vercel build fails**
-- Check Node version: 20+
-- Verify `NEXT_PUBLIC_API_BASE_URL` is set in Vercel env
-- Clear build cache if backend types changed: Vercel dashboard → Settings → Clear Cache
+- Direct connection only for Flyway

 **OpenAI 429**
 - One retry with exponential backoff (1s, 2s)
@@ -250,33 +196,89 @@ If any approaches limit, fix that day. Don't wait.

 **Audit log missing entries after deploy**
 - Check Spring profile is `prod`, not `dev`
-- Verify migration ran: `fly ssh console`, then `psql $DATABASE_URL_DIRECT -c "\dt"`
-- Should see `audit_event` table
+- Verify migrations ran: `psql "$DATABASE_URL_DIRECT" -c "\dt"` — should list `audit_events`

 **Case detail or outreach delivery endpoint returns 500 after deploy**
 - Check for SQL operator compatibility in prepared statements.
-- PostgreSQL JSON existence should use `jsonb_exists(payload_json, 'key')` in JDBC query text rather than raw `?` operator when bind parameters are present.
+- PostgreSQL JSON existence should use `jsonb_exists(payload_json, 'key')` in JDBC query text rather than the raw `?` operator when bind parameters are present.

 **MCP server can't be reached**
-- MCP runs as separate process or endpoint (`/mcp`)
-- Check Fly machine has port exposed if using stdio over HTTP
-- Verify Claude Desktop config points to the deployed URL and sends an `Authorization` header with a valid WorkWell JWT
-- If the machine is scaling to zero, keep `min_machines_running = 1` so the SSE transport stays available for remote clients
-
-## Domain (optional)
-
-Vercel subdomain `workwell-measure-studio.vercel.app` is fine for the demo. If buying a real domain later:
-1. Buy on any registrar
-2. Vercel: Settings → Domains → add, follow DNS instructions
-3. Fly: `fly certs add api.`, follow DNS instructions
-4. Update `NEXT_PUBLIC_API_BASE_URL` to new backend domain
-
-## Initial deployment notes
-
-- Confirm the active Vercel project is `workwell-measure-studio`.
-- Confirm Vercel Root Directory is `frontend`.
-- For the S0 `/runs` probe, validate preflight before debugging POST:
-  - `OPTIONS https://workwell-measure-studio-api.fly.dev/api/eval`
-  - Expect `200` plus `Access-Control-Allow-Origin`.
-- If probe UI shows `404` while direct POST works, check CORS/security config and redeploy Fly backend.
-- Keep `NEXT_PUBLIC_API_BASE_URL` as origin-only (for example `https://workwell-measure-studio-api.fly.dev`), with no `/api` suffix and no trailing whitespace.
+- MCP is exposed at `/sse` + `/mcp/**` on the backend
+- Verify the Claude Desktop config points to the deployed URL and sends an `Authorization` header with a valid WorkWell JWT
+- Role gates apply: `/sse` and `/mcp/**` return 403 unauthenticated
+
+**Backend deploy job fails at the MIE manager API**
+- Confirm the API base resolves to `/api/v1` (the origin serves the SPA; `/api` serves Swagger)
+- Responses are `{"data": ...}` enveloped; the create body uses `template` + `services[]`; job polling reads `.data.status` (`"success"`)
+
+---
+
+## Appendix A — Decommissioned Vercel + Fly.io stack (historical reference)
+
+> **Decommissioned — do not use.** None of the resources below are deployed any more.
+> MIE TWH (above) is the sole live stack. This section is retained only so the earlier
+> public-preview setup remains documented. Environment variable *names* are unchanged;
+> on the current stack they are set on the MIE containers, not as Fly secrets or Vercel env.
+
+Legacy stack layout:
+
+| Layer | Service | Tier | Cost |
+|-------|---------|------|------|
+| Frontend | Vercel | Hobby | $0 |
+| Backend | Fly.io | shared-cpu-1x, 512MB | ~$2/mo |
+| Postgres | Neon | Free | $0 (3GB cap) |
+| AI | OpenAI API | direct, budget-capped | variable |
+| Domain | Vercel subdomain | n/a | $0 |
+
+Notes from that era: Fly 256MB free OOMs Spring Boot (use 512MB). Fallback if Fly cost was a
+problem: Render free tier (cold-start tradeoff, ~30s first hit per inactive period).
+
+### Legacy prerequisites
+- Fly CLI: `iwr https://fly.io/install.ps1 -useb | iex` (Windows) or `curl -L https://fly.io/install.sh | sh`
+- Vercel CLI: `pnpm i -g vercel`
+
+### Legacy Fly.io setup
+
+```bash
+cd backend
+fly launch --no-deploy
+fly secrets set DATABASE_URL=
+fly secrets set DATABASE_URL_DIRECT=
+fly secrets set OPENAI_API_KEY=
+fly secrets set SPRING_PROFILES_ACTIVE=prod
+fly secrets set WORKWELL_AUTH_ENABLED=true
+fly secrets set WORKWELL_AUTH_JWT_SECRET=
+fly secrets set WORKWELL_AUTH_COOKIE_SAME_SITE=None
+fly secrets set WORKWELL_AUTH_COOKIE_SECURE=true
+```
+
+> On the legacy stack the frontend (Vercel) and backend (Fly) were different registrable
+> domains, so every browser→API call was **cross-site** and the refresh-token cookie had to be
+> `SameSite=None; Secure`. (The same production fail-fast check applies on MIE today.)
+
+`fly.toml`: `memory = "512mb"`, region closest to you (e.g., `ord`, `iad`), and
+`min_machines_running = 1` for a stable remote MCP connection.
+
+```bash
+fly deploy
+curl https://.fly.dev/actuator/health # expect {"status":"UP"}
+```
+
+### Legacy Vercel setup
+
+1. Import GitHub repo, root directory `frontend/`
+2. Framework: Next.js (auto-detected)
+3. Env vars: `NEXT_PUBLIC_API_BASE_URL` = Fly app URL; `NEXT_PUBLIC_APP_NAME`; `NEXT_PUBLIC_DEMO_MODE` (local/demo only)
+
+### Legacy rollback
+- **Fly:** `fly releases list` then `fly releases rollback `, or `git checkout && fly deploy`
+- **Vercel:** Dashboard → Deployments → previous → Promote to Production
+
+### Legacy troubleshooting
+- **Fly OOM:** verify `memory = "512mb"`; reduce heap `JAVA_OPTS=-Xmx384m -Xss256k`; check `fly logs` for OOMKilled
+- **Vercel build fails:** Node 20+; verify `NEXT_PUBLIC_API_BASE_URL`; clear build cache if backend types changed
+- **DB from Fly machine:** `fly ssh console` then `psql $DATABASE_URL_DIRECT -c "SELECT 1"`
+
+### Legacy domain / probe notes
+- Vercel subdomain `workwell-measure-studio.vercel.app` was the demo frontend; Fly `workwell-measure-studio-api.fly.dev` the backend
+- S0 `/runs` probe: `OPTIONS https://workwell-measure-studio-api.fly.dev/api/eval` expecting `200` + `Access-Control-Allow-Origin`
diff --git a/docs/DEPLOY_OS_MIEWEB.md b/docs/DEPLOY_OS_MIEWEB.md
index 47c93aa..5b1cdc1 100644
--- a/docs/DEPLOY_OS_MIEWEB.md
+++ b/docs/DEPLOY_OS_MIEWEB.md
@@ -1,5 +1,13 @@
 # Deploying to MIE Open Source Proxmox

+> **Historical / superseded (2026-06-08).** This is the *initial* additive OS-MIEWeb rollout
+> playbook, written when the Vercel + Fly.io stack was still live and the MIE containers used the
+> earlier `workwell` / `workwell-api` naming. The deployment has since consolidated onto the single
+> **TWH** stack (`twh.os.mieweb.org` / `twh-api.os.mieweb.org`, workflow `deploy-twh-mieweb.yml`),
+> and the Vercel + Fly.io stack is **decommissioned** — so the Vercel/Fly fallback in the Rollback
+> section below no longer applies. See `docs/DEPLOY.md` for the current deployment reference.
+> Retained for historical context only.
+
 This runbook covers the additive OS MIEWeb deployment path for WorkWell Measure Studio. It does not replace the current Vercel frontend or Fly.io backend; those stay live during rollout and rollback.

 ## Target Architecture
diff --git a/docs/JOURNAL.md b/docs/JOURNAL.md
index 830ceb7..bfb8dbd 100644
--- a/docs/JOURNAL.md
+++ b/docs/JOURNAL.md
@@ -1,5 +1,31 @@
 # Journal

+## 2026-06-08 — Documentation sync: truth-up across the living docs
+
+### What changed
+
+Brought the living docs into agreement with the current codebase and the single MIE TWH
+deployment, removing facts that had drifted since the 2026-05-21 focus snapshot. No code changes.
+
+- **CLAUDE.md / AGENTS.md:** Frontend `Next.js 14+` → `Next.js 16 + React 19`; AI `Spring AI (Anthropic)` → `Spring AI (OpenAI starter, spring-ai-openai-spring-boot-starter)` (matches `application.yml` `gpt-5.4-nano` / `gpt-4o-mini`); infra `Fly.io + Vercel + Neon` → MIE Create-a-Container + Neon (Fly + Vercel preview decommissioned); SendGrid env var corrected to `WORKWELL_EMAIL_SENDGRID_API_KEY`. CLAUDE.md Current Focus re-dated 2026-06-08 (Sprint 7 closed, Sprint 8 scoped-run parity, CI 3.8× PR #57, MIE v1 deploy fix PRs #55/#56, catalog 60/49, run scopes) and gained a Build & verify section. AGENTS.md reframed from "sprint-based build phase" to post-merge polish; "active work queue" pointer updated.
+- **README.md:** Status now notes Sprint 8 parity, CI sharding, and the MIE v1 deploy migration; Production surfaces reduced to the live MIE TWH frontend/backend with an explicit note that the Vercel + Fly public-preview stack is decommissioned.
+- **docs/DEPLOY.md:** Rewritten so MIE Create-a-Container is the sole current deployment; all Fly.io/Vercel provisioning, rollback, and troubleshooting moved into a clearly-labeled decommissioned/historical appendix. Env-var names retained; the `Where` column relabeled Backend/Frontend; added the GitHub-secret → container-env mapping note and the v1 manager-API details.
+- **docs/sprints/README.md:** Rollout-status line updated to 2026-06-08; index marked historical (no active sprint queue).
+- **CHANGELOG.md:** `[Unreleased]` now records Sprint 8 parity, the CI sharding speedup, the MIE v1 deploy fix, and the deployment consolidation.
+- **.env.example:** Fly/Vercel framing replaced with MIE context; the stale `WORKWELL_CORS_ALLOWED_ORIGINS` value (`frontend-seven-eta-24.vercel.app`) corrected to `https://twh.os.mieweb.org`.
+- **Usage guides (DEMO_RUNBOOK, WALKTHROUGH_GUIDE, MCP):** dead `*.vercel.app` / `*.fly.dev` hostnames swapped to `twh.os.mieweb.org` / `twh-api.os.mieweb.org`, with stack-note banners flagging that embedded example IDs predate the MIE instance.
+
+### Ground truth verified
+
+- Measure catalog: 60 total (4 OSHA active CQL, 3 OSHA catalog, 4 HEDIS active CQL, 49 CMS eCQM Draft) — confirmed against `MeasureService` and MEASURES.md/DEPLOY.md.
+- AI provider: OpenAI via `spring-ai-openai-spring-boot-starter` (`build.gradle.kts`), models `gpt-5.4-nano` / `gpt-4o-mini` (`application.yml`).
+- Frontend: Next.js 16.2.4 / React 19.2.4 (`frontend/package.json`).
+- Deployment: only `deploy-twh-mieweb.yml` is active; Fly.io + Vercel preview decommissioned (confirmed with owner).
+
+### Left as historical (intentionally not rewritten)
+
+`docs/archive/**`, `docs/new instructions/**`, `docs/superpowers/**`, the per-sprint `SPRINT_0x_*` specs, the MIE migration-process docs (`DEPLOY_OS_MIEWEB.md`, `ECQM_TWH_DEPLOYMENT_PLAN.md`), QA reports (`LIVE_APP_QA_REPORT.md`), and `docs/POST_MERGE_STATUS.md` (a dated 2026-05-11 snapshot already annotated with later resolutions). Old JOURNAL entries that mention Anthropic/Fly are point-in-time records and were left intact.
+
 ## 2026-06-03 — CI test suite 3.8x faster (test sharding + per-test population-run fix)

 ### What changed
diff --git a/docs/MCP.md b/docs/MCP.md
index 3b0b0b5..0187e0c 100644
--- a/docs/MCP.md
+++ b/docs/MCP.md
@@ -31,7 +31,7 @@ Use a JWT minted from `/api/auth/login` for a `ROLE_ADMIN` or `ROLE_CASE_MANAGER
       "args": [
         "-y",
         "mcp-remote",
-        "https://workwell-measure-studio-api.fly.dev/sse",
+        "https://twh-api.os.mieweb.org/sse",
         "--transport",
         "sse-only",
         "--header",
diff --git a/docs/WALKTHROUGH_GUIDE.md b/docs/WALKTHROUGH_GUIDE.md
index d98f0ec..da8b70a 100644
--- a/docs/WALKTHROUGH_GUIDE.md
+++ b/docs/WALKTHROUGH_GUIDE.md
@@ -1,11 +1,15 @@
 # WorkWell Measure Studio — Complete Walkthrough & Functionality Guide

-**Version:** Sprint 6 (all sprints merged)
-**Last updated:** 2026-05-18
+**Version:** All sprints (0–7) merged
+**Last updated:** 2026-05-18 (URLs updated 2026-06-08 to the live MIE TWH stack)
 **Audience:** Anyone testing, evaluating, or demonstrating the platform — no technical background required
-**Production URL:** https://workwell-measure-studio.vercel.app
+**Production URL:** https://twh.os.mieweb.org
 **Issue tracker:** https://github.com/Taleef7/workwell-measure-studio/issues/23

+> **Stack note:** This walkthrough was authored against the now-decommissioned Vercel/Fly stack.
+> URLs point to the live MIE TWH stack (`twh.os.mieweb.org`); any embedded case/run/measure IDs
+> are illustrative and may differ on the current instance.
+
 ---

 ## What is WorkWell Measure Studio?
@@ -26,9 +30,9 @@ AI assists human reviewers with drafting and explanation, but **every compliance
 ### Production URLs
 | Surface | URL |
 |---------|-----|
-| Application | https://workwell-measure-studio.vercel.app |
-| Backend API | https://workwell-measure-studio-api.fly.dev |
-| Health check | https://workwell-measure-studio-api.fly.dev/actuator/health |
+| Application | https://twh.os.mieweb.org |
+| Backend API | https://twh-api.os.mieweb.org |
+| Health check | https://twh-api.os.mieweb.org/actuator/health |

 ### Demo Accounts
 All accounts use the same password: **`Workwell123!`**
@@ -58,7 +62,7 @@ The login screen is the entry point to the application. WorkWell uses demo accou

 ### Steps

-1. Open https://workwell-measure-studio.vercel.app in your browser.
+1. Open https://twh.os.mieweb.org in your browser.
 2. You will be redirected to the **Login** page at `/login`.
 3. Enter the following credentials to log in as an administrator (full access):
    - **Email:** `admin@workwell.dev`
@@ -230,7 +234,7 @@ The case detail page is the **single source of truth** for one employee's non-co
 ### Steps — opening a case

 1. From `/cases`, click **Omar Siddiq's** case row (search for `Omar` if needed). Or navigate directly to:
-   `https://workwell-measure-studio.vercel.app/cases/32fee6f4-6e69-4675-b44e-5f6392de7dbd`
+   `https://twh.os.mieweb.org/cases/32fee6f4-6e69-4675-b44e-5f6392de7dbd`
 2. The case detail page opens.

 ### Understanding the page layout
@@ -366,7 +370,7 @@ The Runs page shows every execution of every measure — a permanent history of

 ### Exporting runs data

-10. Click **Export Runs CSV** (or go directly to `https://workwell-measure-studio-api.fly.dev/api/exports/runs?format=csv` — you'll need to authenticate first, so use the app's export button).
+10. Click **Export Runs CSV** (or go directly to `https://twh-api.os.mieweb.org/api/exports/runs?format=csv` — you'll need to authenticate first, so use the app's export button).
 11. The CSV downloads with columns: runId, measureName, measureVersion, scopeType, triggerType, status, startedAt, completedAt, durationMs, totalEvaluated, compliant, dueSoon, overdue, missingData, excluded, passRate, dataFreshAsOf.

 12. To export outcomes for a specific run:
@@ -730,7 +734,7 @@ WorkWell exposes a **Machine-Callable Protocol (MCP) server** that allows AI age
 ### Prerequisites
 - **Claude Desktop is installed.** Download from https://claude.ai/download (macOS and Windows supported).
 - **A valid WorkWell JWT.** Log in to the WorkWell UI as any user with at least `ROLE_CASE_MANAGER` and copy the access token. In the dashboard the access token is held only in memory — easiest path is to sign in via the WorkWell auth API (`POST /api/auth/login` with `{ "email": "...", "password": "..." }`) and capture the `accessToken` from the JSON response.
-- **The WorkWell backend is reachable.** For the deployed demo this is `https://workwell-measure-studio-api.fly.dev`. For local development this is typically `http://localhost:8080`.
+- **The WorkWell backend is reachable.** For the deployed demo this is `https://twh-api.os.mieweb.org`. For local development this is typically `http://localhost:8080`.

 ### Claude Desktop config file

@@ -744,7 +748,7 @@ Add (or merge into) an `mcpServers` block pointing at the SSE endpoint with the
 {
   "mcpServers": {
     "workwell": {
-      "url": "https://workwell-measure-studio-api.fly.dev/sse",
+      "url": "https://twh-api.os.mieweb.org/sse",
       "transport": "sse",
       "headers": {
         "Authorization": "Bearer "
@@ -821,12 +825,12 @@ WorkWell enforces role-based access control throughout the application. This sec
 2. Navigate to Studio for any measure.
 3. The **Approve** and **Activate** buttons should be disabled or absent.
 4. Confirm by opening browser developer tools → Network tab, then manually calling:
-   `POST https://workwell-measure-studio-api.fly.dev/api/measures/{id}/approve`
+   `POST https://twh-api.os.mieweb.org/api/measures/{id}/approve`
    with your session JWT. Expected response: `403 Forbidden`.

 **Test 2 — Anonymous access rejected:**
 5. Open an incognito window.
-6. Try to access: `https://workwell-measure-studio-api.fly.dev/api/measures`
+6. Try to access: `https://twh-api.os.mieweb.org/api/measures`
 7. Expected: `403 Forbidden` (no cookie, no JWT).

 **Test 3 — Case Manager cannot access Admin:**
@@ -837,7 +841,7 @@ WorkWell enforces role-based access control throughout the application. This sec
 **Test 4 — MCP requires authentication:**
 11. In your terminal (if you have curl):
     ```
-    curl https://workwell-measure-studio-api.fly.dev/sse
+    curl https://twh-api.os.mieweb.org/sse
     ```
 12. Expected: `403 Forbidden`.

diff --git a/docs/sprints/README.md b/docs/sprints/README.md
index 80bf1b5..5573934 100644
--- a/docs/sprints/README.md
+++ b/docs/sprints/README.md
@@ -4,7 +4,7 @@

 Generated: 2026-05-14 from the full expert audit of the live application, codebase, v0 prototype screenshots, vision document, and competitive landscape research.

-Current rollout status (2026-05-22): Sprint 0 through Sprint 7 are implemented; Sprint 7 issues #47-#51 are closed and merged to `main`.
+Current rollout status (2026-06-08): Sprint 0 through Sprint 7 are implemented and merged to `main` (Sprint 7 issues #47–#51 closed). Post-Sprint-7 follow-up on `main` includes Sprint 8 scoped-run parity (SITE/EMPLOYEE manual runs + same-scope reruns, 2026-06-03), an 8-way CI test-sharding speedup, and the MIE Container Manager v1 API deploy migration. This index is now historical reference — there is no active sprint queue.

 ---

From c5578c5c82ff8f8beaa0ea081707c08abff2fca5 Mon Sep 17 00:00:00 2001
From: Taleef
Date: Mon, 8 Jun 2026 09:53:06 -0400
Subject: [PATCH 3/3] docs(runbook): use backend-supported filters for demo ID capture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Address Codex review on #59: the capture commands used query params the
backend silently ignores. /api/runs has no measureId filter (status,
scopeType, triggerType, site, from, to, limit only) and /api/cases filters
by measureId, not measureName — so a demo where the first row was not the
Audiogram scenario captured unrelated IDs and broke the deterministic
get_run_summary / explain_outcome evidence trail.

- runs: post-filter the returned JSON by measureName (Audiogram) instead of
  passing an unsupported measureId param.
- cases: filter by the supported measureId (reusing $MEASURE_ID from step 2)
  instead of the unsupported measureName.
- mirror both fixes in the pre-flight smoke check.

Co-Authored-By: Claude Opus 4.8
---
 docs/DEMO_RUNBOOK.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/docs/DEMO_RUNBOOK.md b/docs/DEMO_RUNBOOK.md
index 05a383e..e186b40 100644
--- a/docs/DEMO_RUNBOOK.md
+++ b/docs/DEMO_RUNBOOK.md
@@ -25,13 +25,17 @@ TOKEN=$(curl -fsS -X POST https://twh-api.os.mieweb.org/api/auth/login \
 curl -fsS https://twh-api.os.mieweb.org/api/measures \
   -H "Authorization: Bearer $TOKEN" | jq -r '.[] | "\(.name): \(.id)"'

-# 3) Latest run ID for a measure (paste a measure ID from step 2)
-MEASURE_ID=
-curl -fsS "https://twh-api.os.mieweb.org/api/runs?measureId=$MEASURE_ID&limit=1" \
-  -H "Authorization: Bearer $TOKEN" | jq -r '.[0].id'
-
-# 4) An open Audiogram case ID (for MCP explain_outcome)
-curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureName=Audiogram" \
+# 3) Latest Audiogram-scoped run ID. /api/runs has no measureId filter — it only
+# supports status/scopeType/triggerType/site/from/to/limit — so post-filter the
+# returned JSON by measure name. If no measure-scoped Audiogram run exists, drop
+# the map(...) and take .[0].id: the latest ALL_PROGRAMS run also covers Audiogram.
+MEASURE_ID= # from step 2; reused by the case filter below
+curl -fsS "https://twh-api.os.mieweb.org/api/runs?limit=50" \
+  -H "Authorization: Bearer $TOKEN" | jq -r 'map(select(.measureName | test("Audiogram"))) | .[0].id'
+
+# 4) An open Audiogram case ID (for MCP explain_outcome). /api/cases filters by
+# measureId (the logical measure UUID), not measureName — reuse $MEASURE_ID.
+curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureId=$MEASURE_ID" \
   -H "Authorization: Bearer $TOKEN" | jq -r '.[0].id'
 ```

@@ -44,8 +48,8 @@ reseeds, the case UUID does not.
 ```bash
 curl -fsS https://twh-api.os.mieweb.org/actuator/health
 curl -fsS https://twh-api.os.mieweb.org/api/measures -H "Authorization: Bearer $TOKEN"
-curl -fsS "https://twh-api.os.mieweb.org/api/runs?measureId=$MEASURE_ID&limit=1" -H "Authorization: Bearer $TOKEN"
-curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureName=Audiogram" -H "Authorization: Bearer $TOKEN"
+curl -fsS "https://twh-api.os.mieweb.org/api/runs?limit=50" -H "Authorization: Bearer $TOKEN" | jq 'map(select(.measureName | test("Audiogram")))'
+curl -fsS "https://twh-api.os.mieweb.org/api/cases?status=open&measureId=$MEASURE_ID" -H "Authorization: Bearer $TOKEN"
 ```

 Expected: