kolezka/facebook-eye

facebook-eye

Facebook crawler — three deployable units (api + web + crawler) backed by Postgres (operational state) and Neo4j (graph).

Layout

apps/
  api/        Hono on Bun — REST endpoints, webhook dispatcher, serves SPA
  web/        React + Vite — dashboard SPA
  crawler/    Node 24 LTS — Trigger.dev tasks running Playwright (Local | Browserbase)
packages/
  db/         Drizzle schemas + migrations (Postgres)
  graph/      Neo4j driver + Cypher schema + repository fns
  browser/    BrowserDriver interface + Local + Browserbase impls
  crypto/     AES-GCM session encryption
  shared/     Zod schemas, types, URL parsers

Tech

  • Runtime: Bun for api and web; Node 24 LTS for crawler (Playwright reliability)
  • API: Hono with structured logging (pino), request IDs, rate limiting, graceful shutdown
  • Frontend: React + Vite + TanStack Query + Tailwind, error boundary + toasts
  • Job orchestration: Trigger.dev v4 (self-hosted at trigger.raqz.link)
  • DBs: Postgres 16, Neo4j 5
  • ORM: Drizzle (Postgres) + neo4j-driver (Neo4j)

Quickstart (dev — apps on host)

cp .env.example .env
# Generate a 32-byte hex key for SESSION_ENCRYPTION_KEY:
openssl rand -hex 32
# Set APP_SECRET to anything reasonably long.

bun install
bun run dev:db                       # postgres + neo4j via docker compose
bun --filter @fbeye/db migrate       # apply schema
bun run dev:api                      # :3000
bun run dev:web                      # :5173 (proxies /api → :3000)
bun run dev:crawler                  # trigger.dev local dev

Postgres listens on :5434 (avoiding :5432, typically taken by a system Postgres, and :5433, used by mtcuteweb). Neo4j listens on :7474 (HTTP) and :7687 (Bolt).

Dev — fully containerized (parity with prod)

docker-compose.dev.yml has an app profile that builds Dockerfile.api/Dockerfile.crawler with their dev targets and bind-mounts source for hot reload (Bun --watch, Vite HMR, Trigger.dev dev).

# DBs only (default, identical to bun run dev:db):
docker compose -f docker-compose.dev.yml up -d

# Full stack with hot reload:
docker compose -f docker-compose.dev.yml --profile app up
# api  → http://localhost:3000
# web  → http://localhost:5173
# crawler logs run in the foreground; `trigger.dev dev` needs `npx trigger.dev login`
# to have been run at least once on the host (credentials are mounted from your home directory).

Each containerized workspace has its own *-node-modules named volume so the host's bind mount doesn't clobber the container's installed deps.

Production deployment

The repo ships three multi-target Dockerfiles and a production docker-compose.yml that orchestrates postgres → migrate (one-shot) → api → optional crawler-runner, all with health gates. CI pushes the api and migrate images to GHCR on every push to main; the crawler image is built locally and is optional in prod:

  • ghcr.io/<owner>/facebook-eye-api:{latest,<sha>} — Hono API + bundled SPA (Bun runtime, non-root)
  • ghcr.io/<owner>/facebook-eye-migrate:{latest,<sha>} — one-shot Drizzle migrator
  • ghcr.io/<owner>/facebook-eye-crawler:{latest,<sha>} — Node 24 + Playwright + Trigger.dev SDK (built locally, not pushed by CI)

1. Provision a host with docker compose available.

2. Generate secrets.

openssl rand -hex 32   # SESSION_ENCRYPTION_KEY (32 bytes = 64 hex chars)
openssl rand -hex 24   # APP_SECRET
openssl rand -hex 24   # INTERNAL_API_SECRET
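
For orientation, here is a minimal sketch of how a 64-character hex key like the one generated above can drive AES-256-GCM using Node's built-in crypto module. It is not the code in packages/crypto: the function names (parseKey, encryptSession, decryptSession) and the payload layout (IV, then auth tag, then ciphertext, base64-encoded) are assumptions made for illustration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Parse a hex key and reject anything that is not exactly 32 bytes (64 hex chars).
function parseKey(hex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("SESSION_ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hex, "hex");
}

// Encrypt with AES-256-GCM. Assumed layout: base64(12-byte IV | 16-byte tag | ciphertext).
function encryptSession(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce per message; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Decrypt and authenticate; throws if the payload was tampered with or the key is wrong.
function decryptSession(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  const plain = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
  return plain.toString("utf8");
}
```

A key of any other length is rejected up front, which is why the 64-character requirement matters: AES-256 accepts exactly 32 key bytes.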

3. Configure env.

cp .env.compose.example .env.compose
# Fill in everything; required vars use ${VAR:?required} so compose will fail
# fast if any are missing.
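
The same fail-fast idea can be mirrored inside an app at startup. The sketch below is hypothetical (requireEnv is not a function from this repo); it treats a variable that is unset or empty as missing, which matches the behaviour of compose's ${VAR:?required} form.

```typescript
// Environment shape compatible with process.env (values may be undefined).
type Env = Record<string, string | undefined>;

// Return the requested variables, or throw one error naming every missing one.
// Like ${VAR:?required}, an empty string counts as missing.
function requireEnv<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}

// Example call (left as a comment so the sketch has no side effects):
// const cfg = requireEnv(process.env, ["APP_SECRET", "INTERNAL_API_SECRET"]);
```

Reporting all missing names in one error avoids fixing them one restart at a time.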

4. Initialize Trigger.dev project.

cd apps/crawler
npx trigger.dev@latest login --api-url https://trigger.raqz.link
npx trigger.dev@latest init --project-ref <new-project-ref>
# Paste the ref into TRIGGER_PROJECT_ID in .env.compose, and the secret key
# from the Trigger.dev dashboard into TRIGGER_SECRET_KEY.

5. Bring up the stack.

# Pull pre-built images from GHCR (or omit `pull` to build locally):
docker compose --env-file .env.compose pull
docker compose --env-file .env.compose up -d
docker compose --env-file .env.compose logs -f api

The api will:

  • block on postgres healthcheck
  • wait for the one-shot migrate container to exit cleanly
  • expose /health (always 200) and /ready (200 only when DB is reachable)
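
The difference between the two probes can be sketched as two plain functions. This is an illustration, not the api's actual handlers: pingDb is a hypothetical callback (for example one that runs SELECT 1), and the 503 status and the one-second timeout are assumptions, since the README only states when /ready returns 200.

```typescript
// Result of a probe: HTTP status plus a small JSON body.
type ProbeResult = { status: 200 | 503; body: { ok: boolean; reason?: string } };

// Liveness: if this code runs at all the process is up, so it never fails.
function health(): ProbeResult {
  return { status: 200, body: { ok: true } };
}

// Readiness: 200 only if the DB ping resolves within the timeout, 503 otherwise.
async function ready(
  pingDb: () => Promise<unknown>,
  timeoutMs = 1000,
): Promise<ProbeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("db ping timed out")), timeoutMs);
  });
  try {
    await Promise.race([pingDb(), timeout]);
    return { status: 200, body: { ok: true } };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { status: 503, body: { ok: false, reason } };
  } finally {
    clearTimeout(timer); // do not leave the timer running after the probe settles
  }
}
```

Keeping /health independent of the database means an orchestrator can tell "process crashed" apart from "process up but dependency down".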

To run a self-hosted Trigger.dev runner on this same host (e.g. when this host has the residential IP you want crawler traffic to come from), enable the crawler-runner profile:

docker compose --env-file .env.compose --profile crawler-runner up -d

6. Deploy the crawler tasks.

cd apps/crawler
npx trigger.dev@latest deploy

Run a self-hosted Trigger.dev runner on whatever host has the residential IP you want the crawler to use; it'll pick up crawl-target jobs.


Operations

  • Health/readiness: GET /health (process up), GET /ready (DB reachable)
  • Logs: pino JSON; pipe through pino-pretty locally
  • Auth: dashboard sends x-app-secret header (stored in localStorage). Worker→api callbacks use x-internal-secret.
  • Rate limit: 120 req/min per IP on /api/* (in-memory; replace with Redis if scaled out)
  • Idempotency: only one pending/running crawl run per target; a second POST /api/crawl-runs returns 409 with the existing run id
  • Cancel a stuck run: POST /api/crawl-runs/:id/cancel
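
Two of the rules above, the per-IP rate limit and the one-active-run-per-target check, can be sketched in a few lines. Everything here is an assumption for illustration: the README does not say which limiting algorithm the api uses (a fixed window is shown), the status names other than pending and running are invented, and so are FixedWindowLimiter, createCrawlRun and the 201 success status.

```typescript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each key (client IP).
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 120, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for tests.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 }); // start a new window
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}

type RunStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";
type CrawlRun = { id: string; targetId: string; status: RunStatus };

// Returns 201 with a new run, or 409 with the id of the run already in flight.
function createCrawlRun(runs: CrawlRun[], targetId: string, newId: string) {
  const active = runs.find(
    (r) => r.targetId === targetId && (r.status === "pending" || r.status === "running"),
  );
  if (active) {
    return { status: 409 as const, runId: active.id };
  }
  runs.push({ id: newId, targetId, status: "pending" });
  return { status: 201 as const, runId: newId };
}
```

Because the limiter's state lives in one process, several api instances would each allow the full quota, which is the reason a shared store is suggested for scale-out. The in-memory run check is likewise only a model: against a real database the find-then-insert step would need a uniqueness constraint or a transaction to be race-free.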

CI

GitHub Actions on push/PR:

  1. verify: format check (Prettier), lint (ESLint), typecheck (tsc), tests (bun test), web build
  2. docker (main only): builds & pushes ghcr.io/<owner>/facebook-eye-{api,migrate}:{latest,<sha>}

Scripts

bun run typecheck       # tsc across all workspaces
bun run lint            # eslint
bun run format          # prettier --write
bun run format:check    # prettier --check (CI)
bun test                # bun:test, all *.test.ts files
bun run build           # build every workspace
bun run dev:db          # docker compose up -d  (postgres + neo4j)
bun run dev:db:down
bun run dev:api
bun run dev:web
bun run dev:crawler

About

Facebook OSINT crawler — Bun monorepo with Hono API, React dashboard, and a Playwright worker orchestrated by Trigger.dev. Postgres for state, Neo4j for the social graph.
