Local-hardening: encrypt vault, remove /audit reverse-lookup, loopback bind, outbound allowlist #7

Open
Arkessiah wants to merge 1 commit into zeroc00I:main from Arkessiah:feat/local-hardening

Summary

This PR hardens DontFeedTheAI for local, single-operator use: it removes the
HTTP surface that exposes the reverse-lookup table, encrypts the vault at rest,
binds to loopback by default, and restricts outbound traffic to the configured
LLM upstreams. Only masked surrogates ever cross the network boundary; the real
data never leaves the machine and is never readable over HTTP.

Two of these changes implement items already on your own roadmap (see
docs/threat-model.md: "Write-only vault — the reverse lookup is never exposed
over HTTP", and the README note that making /audit write-only is planned).

Motivation

  1. /audit exposes the full surrogate → original table over HTTP — and the
    middleware allows /audit even when PROXY_SECRET is set. Anyone who can
    reach the port can reverse the entire anonymization for the engagement. This
    is the single highest-impact risk and was already acknowledged as a roadmap
    item.
  2. The vault stores real client data in cleartext SQLite (IPs, credentials,
    hostnames). A stolen data/ directory means full exposure across every
    engagement — a poor fit for the NDAs most pentest work operates under.
  3. Default bind is 0.0.0.0, exposing the proxy (and the audit table) to the
    whole LAN.
  4. No outbound restriction — nothing prevents a future bug or tampered config
    from turning the data-holding proxy into an exfiltration channel.

What changed

Encrypted vault at rest (src/crypto.py, new)

  • Original values are encrypted with Fernet (AES-128-CBC + HMAC-SHA256); the key is derived
    from a VAULT_KEY passphrase via PBKDF2 (200k iterations). The passphrase is
    never written to disk — only a non-secret salt and an encrypted canary live in
    the DB.
  • A keyed HMAC blind index allows dedup/lookup without storing cleartext.
  • verify.db (background verifier) encrypts its original column the same way.
  • Fail-closed: missing VAULT_KEY → the proxy refuses to start; a wrong
    passphrase aborts on the canary check instead of silently diverging.
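
For reviewers who want the shape of the key handling without opening src/crypto.py, here is a minimal standard-library sketch of the derivation and blind-index steps described above. It is not the code in this PR: the function names, the SHA-256 choice for PBKDF2, and the way two subkeys are split out of one derived secret are all assumptions made for illustration. The encryption key is returned as 32 URL-safe base64 bytes, which is the key format Fernet accepts.

```python
import base64
import hashlib
import hmac

PBKDF2_ITERATIONS = 200_000  # iteration count quoted in the PR description


def load_passphrase(env: dict) -> str:
    """Fail closed: refuse to continue when VAULT_KEY is missing or empty."""
    passphrase = env.get("VAULT_KEY", "")
    if not passphrase:
        raise RuntimeError("VAULT_KEY is not set; refusing to start")
    return passphrase


def derive_keys(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    """Derive an encryption key and a blind-index key from one passphrase.

    Assumption: PBKDF2-HMAC-SHA256 yields a 32-byte master secret, and two
    labelled HMACs split it into independent subkeys (hypothetical labels).
    """
    master = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )
    enc_key = hmac.new(master, b"vault-encryption", hashlib.sha256).digest()
    index_key = hmac.new(master, b"vault-blind-index", hashlib.sha256).digest()
    # Fernet keys are 32 raw bytes encoded as URL-safe base64.
    return base64.urlsafe_b64encode(enc_key), index_key


def blind_index(index_key: bytes, value: str) -> str:
    """Keyed HMAC of the original value, usable for dedup and lookup.

    Equal values map to equal indexes under the same key, so the DB can
    match rows without ever storing the cleartext.
    """
    return hmac.new(index_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

In this sketch only the salt would be persisted; the passphrase and both derived keys stay in memory. A different passphrase produces a different index key, so blind indexes from one vault cannot be matched against another.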

Remove the HTTP reverse-lookup

  • Deletes the /audit dashboard and the transform_log table that backed it.
    The surrogate → original mapping is no longer served over HTTP at all.
  • /health no longer returns vault contents or counts.

Local-only networking

  • Default HOST=127.0.0.1; compose files publish ports on 127.0.0.1 only.
  • Outbound allowlist (src/netguard.py): the proxy only connects to the
    configured Anthropic/OpenAI/OpenRouter upstreams and local Ollama; any other
    destination is blocked. Extra hosts via UPSTREAM_ALLOWLIST.
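
The allowlist check can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of src/netguard.py: the default hostnames, the comma-separated format for UPSTREAM_ALLOWLIST, and the function names are all hypothetical.

```python
from urllib.parse import urlsplit

# Assumed defaults: public API hosts for the three providers plus local Ollama.
DEFAULT_ALLOWED_HOSTS = frozenset(
    {"api.anthropic.com", "api.openai.com", "openrouter.ai", "localhost", "127.0.0.1"}
)


def build_allowlist(extra: str = "") -> frozenset:
    """Merge the defaults with extra hosts (assumed comma-separated)."""
    extras = {h.strip().lower() for h in extra.split(",") if h.strip()}
    return DEFAULT_ALLOWED_HOSTS | extras


def is_allowed(url: str, allowlist: frozenset) -> bool:
    """Exact hostname match only; no suffix or wildcard matching."""
    host = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host in allowlist


def guard(url: str, allowlist: frozenset) -> str:
    """Return the URL unchanged if permitted, otherwise refuse the request."""
    if not is_allowed(url, allowlist):
        raise PermissionError(f"outbound destination not in allowlist: {url}")
    return url
```

Exact matching is deliberate here: a lookalike such as api.openai.com.evil.example, or a URL that hides the real host after an @, is rejected. Note that this sketch inspects only the hostname in the URL and does not validate the resolved IP address.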

Breaking changes — happy to gate these

I kept this branch fail-closed because it targets a strict local setup, but I
understand the VPS/tunnel flow is your recommended mode. I'm glad to rework any
of the breaking parts behind opt-in flags so existing users aren't affected:

  • Mandatory encryption could become opt-in (VAULT_ENCRYPTION=true) with a
    one-time migration for existing plaintext vaults.
  • Loopback default is overridable via HOST, but I can keep 0.0.0.0 as the
    default and just document the loopback recommendation.
  • /audit removal could instead become write-only (show entity types and
    counts, never the original values), matching the roadmap wording more directly.

I'm also happy to split this into smaller, focused PRs (e.g. audit-hardening
first, then encryption) if that's easier to review.

Testing

149 passed, 55 skipped (the skips are the Ollama integration tests). Added unit
tests covering encryption round-trip, no-cleartext-at-rest, and wrong-key
rejection. Mask → forward → unmask round-trips verified end to end.

…ound allowlist

Make DontFeedTheAI a purely local tool. Real data never leaves the machine and
is never exposed over HTTP; only masked surrogates cross the boundary.

- Encrypt the surrogate vault at rest (Fernet / AES-128 + HMAC), key derived
  from VAULT_KEY via PBKDF2. Keyed HMAC blind index dedups values without
  storing cleartext. verify.db encrypted the same way. Fail-closed: missing key
  → proxy refuses to start; wrong key → canary abort. (src/crypto.py)
- Remove the /audit dashboard and the transform_log table behind it — the
  surrogate→original reverse-lookup is no longer served over HTTP.
- Bind to loopback by default (HOST=127.0.0.1); compose publishes ports on
  127.0.0.1 only.
- Restrict outbound traffic to the configured LLM upstreams via an allowlist;
  every other destination is blocked. (src/netguard.py)
- Harden /health so it never exposes vault contents or counts.
- Update docs, .env.example and tests; add encryption tests.

Tests: 149 passed, 55 skipped (Ollama integration).