Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ jobs:
NEXT_PUBLIC_APP_URL: https://example.com
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/callbackcloser?sslmode=require
DIRECT_DATABASE_URL: postgresql://postgres:postgres@localhost:5432/callbackcloser?sslmode=require
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY: pk_test_placeholder
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY: pk_test_Y2xlcmsuZXhhbXBsZS5jb20k
CLERK_SECRET_KEY: sk_test_placeholder
STRIPE_SECRET_KEY: sk_test_placeholder
STRIPE_WEBHOOK_SECRET: whsec_placeholder
Expand All @@ -28,7 +28,7 @@ jobs:

- uses: actions/setup-node@v4
with:
node-version: 20
node-version: 22
cache: npm

- run: npm ci
Expand Down
8 changes: 8 additions & 0 deletions RUNBOOK.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,14 @@
7. Verify Stripe webhook endpoint still points to the correct production URL.
8. Run a live Twilio smoke test (call + missed call + SMS reply + STOP/START).

## Backup + Restore

- Canonical procedure: `docs/BACKUP_RESTORE_RUNBOOK.md`
- Minimum policy:
- Neon PITR enabled for production.
- Logical backup artifacts retained for 30+ days.
- Restore drill executed monthly with recorded evidence.

## Rotate `TWILIO_WEBHOOK_AUTH_TOKEN` (shared webhook token)

1. Generate a new random token (do not reuse old values).
Expand Down
23 changes: 22 additions & 1 deletion app/layout.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ import { validateServerEnv } from '@/lib/env.server';

import './globals.css';

const CLERK_PREVIEW_FALLBACK_KEY = 'pk_test_Y2xlcmsuZXhhbXBsZS5jb20k';

const manrope = Manrope({
subsets: ['latin'],
variable: '--font-sans',
Expand All @@ -16,11 +18,30 @@ export const metadata: Metadata = {
description: 'Missed Call -> Booked Job SMS follow-up',
};

function isLikelyValidClerkPublishableKey(value: string) {
return /^pk_(test|live)_[A-Za-z0-9+/=_-]+$/.test(value);
}

function resolveClerkPublishableKey() {
const configured = process.env.NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY?.trim() ?? '';
if (configured && isLikelyValidClerkPublishableKey(configured)) {
return configured;
}

const allowPreviewFallback = process.env.NODE_ENV !== 'production' || process.env.VERCEL_ENV === 'preview';
if (allowPreviewFallback) {
return CLERK_PREVIEW_FALLBACK_KEY;
}

return configured;
}

export default function RootLayout({ children }: { children: React.ReactNode }) {
validateServerEnv();
const clerkPublishableKey = resolveClerkPublishableKey();

return (
<ClerkProvider>
<ClerkProvider publishableKey={clerkPublishableKey}>
<html lang="en">
<body className={`${manrope.variable} min-h-screen font-sans`}>{children}</body>
</html>
Expand Down
107 changes: 107 additions & 0 deletions docs/BACKUP_RESTORE_RUNBOOK.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# Backup + Restore Runbook (Neon + Prisma)

Date: 2026-03-02
Owner: Ops / Engineering

## Objectives

- Keep production customer data recoverable from accidental deletion, schema mistakes, and provider incidents.
- Define explicit recovery targets:
- **RPO**: <= 15 minutes (via Neon point-in-time recovery)
- **RTO**: <= 60 minutes for partial incident, <= 120 minutes for full environment recovery
- Run and record a restore drill at least **monthly**.

## Backup Policy

1. Primary protection: Neon managed backups / point-in-time recovery enabled on production project.
2. Secondary protection: periodic logical exports for independent restoreability checks.
3. Retention targets:
- Neon PITR window: keep provider default or higher, never below 7 days.
- Logical backup artifacts: retain at least 30 days in secure storage.

## Required Environment

- `DATABASE_URL` (pooled runtime)
- `DIRECT_DATABASE_URL` (direct connection for Prisma + admin tooling)
- PostgreSQL CLI tools installed locally/CI (`pg_dump`, `psql`, `pg_restore` if custom format is used)

## Logical Backup Procedure (Non-Destructive)

Use a direct Postgres connection for dump operations.

```bash
export BACKUP_TS=$(date -u +%Y%m%dT%H%M%SZ)
export BACKUP_FILE="outputs/backups/callbackcloser-${BACKUP_TS}.sql.gz"
mkdir -p outputs/backups

pg_dump "$DIRECT_DATABASE_URL" \
--no-owner \
--no-privileges \
--format=plain \
| gzip > "$BACKUP_FILE"

gzip -t "$BACKUP_FILE"
ls -lh "$BACKUP_FILE"
```

## Restore Drill Procedure (Monthly)

Run against a non-production restore target only.

1. Provision an empty restore target database (`RESTORE_DATABASE_URL`).
2. Restore the latest backup artifact.

```bash
gunzip -c "$BACKUP_FILE" | psql "$RESTORE_DATABASE_URL"
```

3. Run Prisma and app-level sanity checks against the restored DB:

```bash
DIRECT_DATABASE_URL="$RESTORE_DATABASE_URL" npx prisma validate
DATABASE_URL="$RESTORE_DATABASE_URL" npm run db:smoke
```

4. Validate key tables and counts manually:

```bash
psql "$RESTORE_DATABASE_URL" -c 'select count(*) as businesses from "Business";'
psql "$RESTORE_DATABASE_URL" -c 'select count(*) as leads from "Lead";'
psql "$RESTORE_DATABASE_URL" -c 'select count(*) as messages from "Message";'
psql "$RESTORE_DATABASE_URL" -c 'select count(*) as calls from "Call";'
```

5. Record outcome in drill log (template below).

## Incident Restore Procedure (Production Event)

1. Declare incident and freeze deploys/write traffic.
2. Pick restore point timestamp (UTC) based on incident timeline.
3. Restore using Neon PITR/branch restore into a clean recovery target.
4. Run Prisma validation + app smoke checks on recovery target.
5. Cut over app env vars (`DATABASE_URL`, `DIRECT_DATABASE_URL`) to recovered target.
6. Run post-cutover smoke:
- `npm run env:check`
- `npm run db:smoke`
- Twilio inbound/outbound smoke
- Stripe webhook smoke
7. Announce recovery and keep incident watch for at least 1 hour.

## Alerts + Evidence

- Track backup job success/failure in CI logs or scheduler logs.
- Alert on:
- failed backup run
- restore drill failure
- missing drill evidence older than 35 days
- Store drill artifacts:
- command transcript (or CI job URL)
- DB count snapshots
- elapsed restore time
- operator + reviewer sign-off

## Drill Log Template

| Date (UTC) | Operator | Backup Artifact | Restore Target | Result | Restore Duration | Notes / Follow-ups |
|---|---|---|---|---|---|---|
| YYYY-MM-DD | name | path or object key | env/db name | PASS/FAIL | Xm Ys | links to logs + remediation ticket |
5 changes: 5 additions & 0 deletions docs/DB_NEON_PRISMA.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,3 +73,8 @@ Recommended:

Using the pooled `-pooler` URL for Prisma migrations can cause migration problems or connection behavior issues. Keep migrations on `DIRECT_DATABASE_URL` (direct endpoint) and runtime on `DATABASE_URL` (pooled endpoint).

## Backup / Restore Operations

For production backup cadence, restore drills, and incident recovery workflow, use:

- `docs/BACKUP_RESTORE_RUNBOOK.md`
33 changes: 33 additions & 0 deletions docs/PRODUCTION_READINESS_GAPS.md
Original file line number Diff line number Diff line change
Expand Up @@ -344,3 +344,36 @@ Dependencies: G4 (recommended)
- `docs/PRODUCTION_READINESS_GAPS.md`
- Commit SHA:
- `119c217`

- 2026-03-02 - G9 (DONE)
- Branch: `hardening/g9-backup-restore-runbook`
- What changed:
- Added dedicated production backup/restore runbook:
- `docs/BACKUP_RESTORE_RUNBOOK.md`
- defines RPO/RTO targets, backup cadence, retention expectations, and incident restore sequence.
- Added explicit monthly restore drill procedure with command-level verification:
- logical backup command (`pg_dump` + gzip)
- restore command (`psql` replay into restore target)
- Prisma + app smoke checks (`prisma validate`, `npm run db:smoke`)
- key table-count verification commands for `Business`, `Lead`, `Message`, `Call`.
- Added drill evidence template (date/operator/artifact/result/duration/follow-up) to enforce auditable restore history.
- Linked backup/restore operations from existing ops docs:
- `RUNBOOK.md`
- `docs/DB_NEON_PRISMA.md`
- Ops notes:
- This closes the documentation + drill-checklist gap for data recovery readiness.
- Actual production drill execution remains an operational action and should be recorded using the included template.
- Commands run + results:
- `npm test` -> PASS (30/30)
- `npm run lint` -> PASS
- `npm run build` -> PASS
- `npm run typecheck` -> PASS
- `npm run env:check` -> PASS
- `npm run db:validate` -> PASS
- Files touched:
- `docs/BACKUP_RESTORE_RUNBOOK.md`
- `RUNBOOK.md`
- `docs/DB_NEON_PRISMA.md`
- `docs/PRODUCTION_READINESS_GAPS.md`
- Commit SHA:
- `2dc1d7c`