From b4673d14f4cd5df81f3297ffc14333b24852c8a8 Mon Sep 17 00:00:00 2001
From: Alfred Rivas
Date: Fri, 1 May 2026 23:48:51 +0200
Subject: [PATCH] docs(repo): BEE-1798 reorganize docs tree + add environments snapshot
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Group docs by intent: api/ (what the server exposes), ops/ (how it's
operated), observability/ (what's visible at runtime). Add docs/README.md
as index. Capture current dev↔prod state in docs/ops/environments.md
(Cloud Run revisions, service accounts, secrets, CORS, risks).

Also add .env, .env.local, .env.*.local to .gitignore; these were
missing entirely from the ignore list.

IDEAS.md and PENDING.md stay at docs/ root per global ecosystem
convention (skills /ideas and /pending look for them there).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/openapi-clients.yml |   6 +-
 .gitignore                            |   5 +
 docs/PENDING.md                       |   4 +-
 docs/README.md                        |  36 +++++
 docs/{ => api}/openapi.yaml           |   0
 docs/{api.md => api/reference.md}     |   2 +-
 docs/{ => observability}/metrics.md   |   0
 .../tracing.md}                       |   0
 docs/{ => ops}/deploy-runbook.md      |   0
 docs/ops/environments.md              | 124 ++++++++++++++++++
 docs/{ => ops}/infrastructure.md      |   0
 docs/{ => ops}/supply-chain.md        |   0
 scripts/validate-openapi.sh           |   2 +-
 13 files changed, 172 insertions(+), 7 deletions(-)
 create mode 100644 docs/README.md
 rename docs/{ => api}/openapi.yaml (100%)
 rename docs/{api.md => api/reference.md} (96%)
 rename docs/{ => observability}/metrics.md (100%)
 rename docs/{observability.md => observability/tracing.md} (100%)
 rename docs/{ => ops}/deploy-runbook.md (100%)
 create mode 100644 docs/ops/environments.md
 rename docs/{ => ops}/infrastructure.md (100%)
 rename docs/{ => ops}/supply-chain.md (100%)

diff --git a/.github/workflows/openapi-clients.yml b/.github/workflows/openapi-clients.yml
index ede707d..86b6661 100644
--- a/.github/workflows/openapi-clients.yml
+++ b/.github/workflows/openapi-clients.yml
@@ -3,7 +3,7 @@ name: OpenAPI Validate & Generate Clients
 on:
   push:
     paths:
-      - "docs/openapi.yaml"
+      - "docs/api/openapi.yaml"
     branches: [develop, main]
   release:
     types: [published]
@@ -21,7 +21,7 @@ jobs:
         with:
           python-version: "3.12"
       - run: pip install openapi-spec-validator
-      - run: openapi-spec-validator docs/openapi.yaml
+      - run: openapi-spec-validator docs/api/openapi.yaml
 
   generate-clients:
     name: Generate ${{ matrix.language }} client
@@ -52,7 +52,7 @@ jobs:
         uses: openapi-generators/openapitools-generator-action@v1
         with:
           generator: ${{ matrix.generator }}
-          openapi-file: docs/openapi.yaml
+          openapi-file: docs/api/openapi.yaml
           command-args: >-
             -o ${{ matrix.output }}
             --additional-properties=packageName=beepbox_client
diff --git a/.gitignore b/.gitignore
index f85ea4d..5a416fa 100755
--- a/.gitignore
+++ b/.gitignore
@@ -60,6 +60,11 @@ build/
 # Generated OpenAPI clients (artifacts only, not committed)
 clients/
 
+# Local secrets (synced via GCP Secret Manager; see ~/.claude/scripts/sync-env-to-gcp.sh)
+.env
+.env.local
+.env.*.local
+
 # Terraform
 infra/.terraform/
 infra/*.tfstate
diff --git a/docs/PENDING.md b/docs/PENDING.md
index cb77232..a376be6 100644
--- a/docs/PENDING.md
+++ b/docs/PENDING.md
@@ -71,7 +71,7 @@ Use the `/pending` skill (recommended). Or copy this block to the end of the file:
   `roles/run.admin` + `roles/iam.serviceAccountUser`, bind GitHub
   org/repo via principalSet, export provider name + SA email to
   GitHub Actions secrets `WIF_PROVIDER` + `WIF_SA`. Document the
-  setup in `docs/deploy-runbook.md`.
+  setup in `docs/ops/deploy-runbook.md`.
 - 🚧 **Blocked by**: nothing blocks this; the next release to Cloud Run
   will force the setup. Meanwhile, deploys remain manual via
   `gcloud` + local Docker.
@@ -93,7 +93,7 @@ Use the `/pending` skill (recommended). Or copy this block to the end of the file:
   use for external callers), but it confuses anyone debugging the
   service with curl.
 - ⚙️ **Action required**: investigate whether Cloud Run reserves
-  `/healthz`. If so, document in `docs/deploy-runbook.md` that
+  `/healthz`. If so, document in `docs/ops/deploy-runbook.md` that
   external callers must use `/readyz`. If not, open an issue
   with GCP support.
 - 🚧 **Blocked by**: nothing; cosmetic bug, does not affect
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..6b33f22
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,36 @@
+# beepbox docs
+
+Index of the repo documentation, organized by intended use.
+
+## API: what the server exposes
+
+- [`api/reference.md`](api/reference.md) - quick endpoint guide + examples.
+- [`api/openapi.yaml`](api/openapi.yaml) - OpenAPI 3.1, source of truth.
+  CI validates it and generates clients (Dart, Kotlin, Swift, TS, Python).
+
+## Ops: how the server is operated
+
+- [`ops/infrastructure.md`](ops/infrastructure.md) - GCP architecture +
+  Terraform "how to" (Cloud Run, Artifact Registry, Secret Manager,
+  Firebase Hosting).
+- [`ops/environments.md`](ops/environments.md) - actual dev↔prod
+  snapshot (URLs, SAs, secrets, CORS, detected risks).
+- [`ops/deploy-runbook.md`](ops/deploy-runbook.md) - manual deploy
+  steps + rollback + troubleshooting.
+- [`ops/supply-chain.md`](ops/supply-chain.md) - image signing,
+  SBOM, attestations.
+
+## Observability: what is visible at runtime
+
+- [`observability/metrics.md`](observability/metrics.md) - Prometheus
+  metrics exposed at `/metrics`.
+- [`observability/tracing.md`](observability/tracing.md) - W3C Trace
+  Context, OpenTelemetry, propagation.
+
+## Informal work capture
+
+- [`IDEAS.md`](IDEAS.md) - brainstorms / "we could do X someday".
+- [`PENDING.md`](PENDING.md) - known TODOs with no Linear task yet.
+
+> These two live at the docs root by global ecosystem convention (the
+> `/ideas` and `/pending` skills read them from there). Do not move.
diff --git a/docs/openapi.yaml b/docs/api/openapi.yaml
similarity index 100%
rename from docs/openapi.yaml
rename to docs/api/openapi.yaml
diff --git a/docs/api.md b/docs/api/reference.md
similarity index 96%
rename from docs/api.md
rename to docs/api/reference.md
index 2ec6e70..edba5dd 100644
--- a/docs/api.md
+++ b/docs/api/reference.md
@@ -6,7 +6,7 @@ single source of truth for all endpoints, schemas, and examples.
 ## Interactive documentation
 
 Open the spec in [Scalar](https://docs.scalar.com/swagger-editor) by
-pasting the raw URL or uploading `docs/openapi.yaml`.
+pasting the raw URL or uploading `docs/api/openapi.yaml`.
 
 To browse locally with the server running:
 
diff --git a/docs/metrics.md b/docs/observability/metrics.md
similarity index 100%
rename from docs/metrics.md
rename to docs/observability/metrics.md
diff --git a/docs/observability.md b/docs/observability/tracing.md
similarity index 100%
rename from docs/observability.md
rename to docs/observability/tracing.md
diff --git a/docs/deploy-runbook.md b/docs/ops/deploy-runbook.md
similarity index 100%
rename from docs/deploy-runbook.md
rename to docs/ops/deploy-runbook.md
diff --git a/docs/ops/environments.md b/docs/ops/environments.md
new file mode 100644
index 0000000..60c5822
--- /dev/null
+++ b/docs/ops/environments.md
@@ -0,0 +1,124 @@
+# Environments: beepbox-server
+
+Snapshot of the actual state of the GCP environments where `beepbox-server` runs.
+**It lives in git because it is code-adjacent.** This document is updated by
+hand when the bootstrap changes; it is not an automated source of truth.
+
+> Last updated: 2026-05-01, after BEE-1794 (CORS).
+
+## Executive summary
+
+There are **two distinct GCP environments** running the same code but with a
+**very asymmetric bootstrap**: dev is set up with Terraform and a dedicated
+service account, while prod was deployed by hand without Terraform and uses
+the Compute Engine default SA.
+
+| | **dev** (`beeping-platform-dev`) | **prod** (`beeping-platform-prod`) |
+|---|---|---|
+| Cloud Run URL | `beepbox-server-ai7n45q5lq-ew.a.run.app` | `beepbox-server-jlqkyqxtca-ew.a.run.app` |
+| Current image | `sha-96cf4a8` (BEE-1794, 2026-04-29) | `:latest` (revision 00001, 2026-04-23) |
+| Service Account | `beepbox-server@…dev`, dedicated, least-privilege | ⚠️ `compute default` (`638946150252-compute@…`) |
+| Secret Manager | `beepbox-api-keys` + `beepbox-rate-limit-rpm` | ⚠️ do not exist; only `RESEND_API_KEY` |
+| `BEEPBOX_AUTH_ENDPOINT` | `…dev.cloudfunctions.net/validateApiKey` | `…prod.cloudfunctions.net/validateApiKey` |
+| CORS | enabled (localhost:3000 + dev Firebase) | disabled |
+| Artifact Registry | `beepbox/` repo in `europe-west1` | `beepbox/` repo in `europe-west1` |
+| Firebase Hosting | `api` site with rewrite to Cloud Run | not applied |
+| Custom domain | `api.beeping.io` → CNAME pending | n/a |
+
+`/version` returns 200 in both. `api.beeping.io` does not resolve (DNS
+pending in GoDaddy).
+
+## Credential layers
+
+### Layer 1: client API keys (the ones the caller sends in `Authorization`)
+
+- They are **generated/validated/revoked in Cloud Functions**, not in this repo.
+  The `generateApiKey` / `validateApiKey` / `revokeApiKey` trio is deployed
+  in `europe-west1`, in dev and prod.
+- Those Functions **live in another repo** (probably
+  `beeping-functions`; TODO confirm and link here).
+- `beepbox-server` **has no local key store**: it validates them
+  remotely via `BEEPBOX_AUTH_ENDPOINT` (HttpKeyStore added in
+  BEE-1687). The `BEEPBOX_API_KEYS` env var that appears in dev is a legacy
+  CSV fallback; the real path is the Cloud Function.
+
+### Layer 2: infra credentials (runtime + deploys)
+
+- **Cloud Run runtime (dev)**: SA `beepbox-server@…dev` with
+  `artifactregistry.reader` + `secretmanager.secretAccessor` +
+  `logging.logWriter` + `cloudtrace.agent`. Defined in `infra/iam.tf`.
+  **It was not applied in prod**, which uses the Compute Engine default SA
+  (too broad).
+- **CI deploys**: `.github/workflows/deploy.yml` expects the secrets
+  `WIF_PROVIDER` + `WIF_SA` (Workload Identity Federation). **They have
+  never been created** (see `PENDING.md` → pending-001). That is why BEE-1794
+  was deployed by hand: local `docker build` + `gcloud run deploy`.
+- **Server secrets (dev)**: in Secret Manager
+  (`beepbox-api-keys`, `beepbox-rate-limit-rpm`). Injected as env
+  vars via `value_source.secret_key_ref` with `version=latest`. Cloud
+  Run picks them up on cold start.
+
+## Terraform
+
+`infra/` has everything (artifact-registry, cloud-run, firebase-hosting,
+iam, secrets) **but the defaults point only to `beeping-platform-dev`**.
+
+- There is no workspace or `tfvars` for prod → **prod was never applied with
+  Terraform** (manual deploy).
+- `terraform.tfstate` lives **locally in `infra/`**. The GCS backend is
+  commented out in `main.tf`: no remote state, no lock, no safe
+  collaboration.
+
+## Current deploy flow
+
+### Intended (CI)
+
+1. Release published or `workflow_dispatch` →
+2. `.github/workflows/deploy.yml` authenticates with WIF →
+3. build + push to AR →
+4. `google-github-actions/deploy-cloudrun` →
+5. `scripts/smoke.sh` →
+6. Automatic rollback if the smoke test fails.
+
+### Actual, today
+
+The pipeline is blocked because WIF was never wired up. Every release is done
+by hand:
+
+```bash
+docker buildx build --platform linux/amd64 \
+  -t europe-west1-docker.pkg.dev/beeping-platform-dev/beepbox/beepbox-server:vX.Y.Z \
+  --push .
+
+gcloud run deploy beepbox-server \
+  --image europe-west1-docker.pkg.dev/beeping-platform-dev/beepbox/beepbox-server:vX.Y.Z \
+  --region europe-west1 \
+  --project beeping-platform-dev
+```
+
+(this is what we did for BEE-1794, recorded in commit `19f13f2`.)
+
+## Recorded pending items (`docs/PENDING.md`)
+
+- **pending-001**: create WIF Pool/Provider + SA with
+  `artifactregistry.writer` + `run.admin` + `iam.serviceAccountUser`,
+  bind to the GitHub repo, export to `WIF_PROVIDER` / `WIF_SA`.
+  Unblocks `deploy.yml`.
+- **pending-002**: `/healthz` from outside returns an HTML 404 from Google
+  Frontend (it appears to be reserved by GFE). Cosmetic; we use `/readyz`
+  for external callers.
+
+## Risks / inconsistencies detected
+
+1. **Prod has no beepbox secrets and no dedicated SA.** If the prod server
+   tries to read `BEEPBOX_API_KEYS` or `BEEPBOX_RATE_LIMIT_RPM` from env,
+   it will fail silently or fall back to defaults. Confirm that prod relies
+   **only** on `BEEPBOX_AUTH_ENDPOINT` and does not need the CSV.
+2. **Prod is outside Terraform.** No reproducible IaC for prod. Drift is
+   guaranteed between what is deployed and what `infra/` defines.
+3. **WIF not wired up.** Every deploy is manual from the founder's
+   machine.
+4. **Custom domain `api.beeping.io` does not resolve.** A CNAME is missing in
+   GoDaddy pointing to `beeping-platform-dev-api.web.app`.
+5. **Local Terraform state.** `infra/terraform.tfstate` is not in GCS,
+   which risks loss and blocks collaboration.
diff --git a/docs/infrastructure.md b/docs/ops/infrastructure.md
similarity index 100%
rename from docs/infrastructure.md
rename to docs/ops/infrastructure.md
diff --git a/docs/supply-chain.md b/docs/ops/supply-chain.md
similarity index 100%
rename from docs/supply-chain.md
rename to docs/ops/supply-chain.md
diff --git a/scripts/validate-openapi.sh b/scripts/validate-openapi.sh
index 5b2c9c9..8d11d47 100755
--- a/scripts/validate-openapi.sh
+++ b/scripts/validate-openapi.sh
@@ -2,7 +2,7 @@
 set -euo pipefail
 
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-SPEC="$SCRIPT_DIR/../docs/openapi.yaml"
+SPEC="$SCRIPT_DIR/../docs/api/openapi.yaml"
 VENV="/tmp/openapi-venv"
 
 if [ ! -f "$VENV/bin/openapi-spec-validator" ]; then