A three-stage CLI demo that shows an AI support agent leaking cross-tenant data because user identity never reaches its tools (the confused deputy problem), and then fixes it with OpenFGA — a CNCF, Zanzibar-inspired ReBAC engine — by authorizing every tool call against per-user relationship tuples.
The point of the demo is that prompt-level guardrails can be talked around, while a deterministic authorization check at the tool boundary does not depend on the prompt at all. After the fix, the same attack prompts return only the tickets Alice is allowed to see.
- Stage 1: Broken. The agent calls `search_tickets` with no notion of who's asking. Alice's prompts pull back Bob's billing info and Carol's API key.
- Stage 2: Fixed. Same agent loop, same tools, same prompts. OpenFGA filters rows at the tool layer before they reach the LLM.
- Stage 3: Attack replay. The same three prompts run as Alice against both stages, side by side.
- Python 3.11+
- Docker + Docker Compose
- An Anthropic API key (`claude-sonnet-4-6` access)
```bash
# 1. Start OpenFGA (HTTP on :8080, gRPC on :8081, playground on :3000).
docker compose up -d

# 2. Install Python deps.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Seed sqlite + the OpenFGA store, model, and tuples.
python seed.py
# The script prints lines like:
# FGA_STORE_ID=01J...
# FGA_MODEL_ID=01J...

# 4. Create your .env from the template and paste the IDs in.
cp .env.example .env
# Edit .env:
# ANTHROPIC_API_KEY=sk-ant-...
# FGA_STORE_ID=<paste from seed.py>
# FGA_MODEL_ID=<paste from seed.py>
```

The OpenFGA playground UI is at http://localhost:3000 if you want to poke at the model or tuples by hand.
```bash
# Stage 1: the broken agent, one prompt as Alice.
python stage1_broken.py

# Stage 2: the same agent, with ReBAC at the tool layer.
python stage2_fixed.py

# Stage 3: three attack prompts as Alice against both stages, side by side.
python stage3_attack.py
```

`stage3_attack.py` is the showpiece: it's the file you run on stage.
Re-running `python seed.py` at any time is safe; it drops and recreates both
`tickets.db` and the OpenFGA store. After re-seeding you'll need to update
`FGA_STORE_ID` and `FGA_MODEL_ID` in `.env` with the new values.
This demo is organized around Michael Grinich's framing of agent identity (WorkOS) — the same four ideas keep coming up whenever an LLM gets to call tools on behalf of a user: confused deputy, persona shadowing, capability scoping, and audit trail. The three stages each foreground one or two of them.
The support bot in `stage1_broken.py` is the canonical confused deputy.
It has a privileged capability (the tickets table) and it acts on behalf of
a user (Alice), but the user's identity stops at the frontend log line and
never propagates into `search_tickets`. The tool sees `query="billing"` and
nothing else, so it returns every matching row, Bob's $4,200 dispute
included. This is also persona shadowing in action: the LLM has Alice's
question in its context window and answers as if it's serving Alice, but the
side effects (the actual SQL) execute with the bot's full database
privileges, not Alice's. From the LLM's point of view, "Alice is asking" is
just text. From the database's point of view, nobody in particular is asking;
every query arrives as the service account. The two views are out of sync,
which is what makes the bug exploitable by a normal-looking question, no
jailbreak required.
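To make the failure concrete, here is a minimal sketch of a tool with the Stage 1 shape. The in-memory table and its columns (`id`, `owner`, `body`) are assumptions for illustration; the real schema created by `seed.py` may differ.

```python
import sqlite3

# Hypothetical three-row table standing in for tickets.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tickets (id, owner, body) VALUES (?, ?, ?)",
    [
        (1, "alice", "Question about my billing cycle"),
        (2, "bob", "Dispute a $4,200 billing charge"),
        (3, "carol", "Rotate my API key"),
    ],
)


def search_tickets(query: str) -> list[dict]:
    """Stage 1 shape: the only input is the search text."""
    rows = conn.execute(
        "SELECT id, owner, body FROM tickets WHERE body LIKE ? ORDER BY id",
        (f"%{query}%",),
    ).fetchall()
    # No caller identity exists here, so nothing can be filtered by it.
    return [{"id": r[0], "owner": r[1], "body": r[2]} for r in rows]


# Alice asks about billing; Bob's dispute comes back alongside her own ticket.
leaked = search_tickets("billing")
print([row["owner"] for row in leaked])
```

Nothing in the signature could carry Alice's identity, so no amount of prompt wording changes what the query returns.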
`stage2_fixed.py` collapses the gap by scoping capabilities per call.
Every invocation of `search_tickets` now carries a Python-level `calling_user`
argument that the agent runtime injects directly from the authenticated
session. It is not in the tool schema the LLM sees, not in the system
prompt, and not in the conversation history, so prompt injection cannot
reach it. After the SQL fetches candidate rows, each one is run through an
OpenFGA `check(user, viewer, ticket:n)` call, and rows the user has no
relationship to are dropped before the LLM ever sees them. The capability the
bot grants itself stops being "read the tickets table" and becomes "read the
specific tickets this specific user is a viewer on, for this specific call."
That's the principle of least privilege expressed in a way the runtime can
enforce instead of a way the LLM has to remember.
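The filter described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `stage2_fixed.py`: the table columns are hypothetical, and `fga_check` is a stand-in that looks up an in-memory set of tuples instead of calling OpenFGA.

```python
import sqlite3

# Hypothetical table standing in for tickets.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tickets (id, owner, body) VALUES (?, ?, ?)",
    [
        (1, "alice", "Question about my billing cycle"),
        (2, "bob", "Dispute a $4,200 billing charge"),
        (3, "carol", "Rotate my API key"),
    ],
)

# Hypothetical relationship tuples; in the demo these live in OpenFGA.
TUPLES = {
    ("user:alice", "viewer", "ticket:1"),
    ("user:bob", "viewer", "ticket:2"),
    ("user:carol", "viewer", "ticket:3"),
}


def fga_check(user: str, relation: str, obj: str) -> bool:
    """Stand-in for an OpenFGA check: true only if the exact tuple exists."""
    return (user, relation, obj) in TUPLES


def search_tickets(query: str, *, calling_user: str) -> list[dict]:
    """Stage 2 shape: SQL fetches candidates, then each row is authorized."""
    rows = conn.execute(
        "SELECT id, owner, body FROM tickets WHERE body LIKE ? ORDER BY id",
        (f"%{query}%",),
    ).fetchall()
    allowed = []
    for ticket_id, owner, body in rows:
        # Rows without a viewer relationship never reach the caller.
        if fga_check(f"user:{calling_user}", "viewer", f"ticket:{ticket_id}"):
            allowed.append({"id": ticket_id, "owner": owner, "body": body})
    return allowed


print(search_tickets("billing", calling_user="alice"))
```

Making `calling_user` keyword-only is a design choice in this sketch: the tool cannot be called without naming the user explicitly.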
`stage3_attack.py` runs the same three prompts as Alice (two probing other
tenants' data, one legitimate) against both stages. The Stage 1 transcript
leaks; the Stage 2 transcript doesn't, and crucially you can see why in
plain text: every denied row produces an
`[FGA] DENY user:alice → viewer → ticket:n (owner=bob)` line. That's the
audit trail Grinich talks about: not a vague service-level log saying "the
bot accessed the DB at 14:02:17", but a per-request, per-resource,
per-relation decision you can point at, hand to a compliance team, or replay
later. In production you'd ship these decisions to your SIEM and alert on
anomalies. The takeaway for the audience: the LLM can be tricked, the prompt
can be jailbroken, the system prompt can be leaked, but the authorization
layer is a deterministic check against explicit relationship tuples, and it
doesn't care how clever the attack phrasing is.
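A small sketch of how such per-decision lines might be produced. The DENY format follows the line quoted above; the `Decision` type, the `ALLOW` variant, and the in-memory `audit_log` list are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One authorization decision for one user, relation, and object."""
    user: str
    relation: str
    obj: str
    owner: str
    allowed: bool


def format_decision(d: Decision) -> str:
    # The ALLOW wording is assumed; only the DENY line appears in the demo text.
    verdict = "ALLOW" if d.allowed else "DENY"
    return f"[FGA] {verdict} {d.user} → {d.relation} → {d.obj} (owner={d.owner})"


# Hypothetical in-memory sink; a real deployment would forward these elsewhere.
audit_log: list[str] = []


def record(d: Decision) -> None:
    line = format_decision(d)
    audit_log.append(line)
    print(line)


record(Decision("user:alice", "viewer", "ticket:2", "bob", allowed=False))
```

Because each line names the user, relation, and object, a denied request can be traced to the exact tuple that was missing.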
```
user_id (from auth session)
           │
           ▼
┌──────────────────────┐                ┌───────────────────────┐
│ ask_bot              │  calling_user  │ search_tickets        │
│ (the agent loop)     │ ─────────────► │ query + FGA           │
└──────────────────────┘  Python-only   │ check loop            │
           │                            └───────────────────────┘
           │                                        │
           │                                        ▼
           │                            ┌───────────────────────┐
           │                            │ OpenFGA: viewer?      │
           │                            │ user:<id> →           │
           │                            │ ticket:<n>            │
           │ tool_result                └───────────────────────┘
           ▼ (filtered rows)
┌──────────────────────┐
│ Anthropic Messages   │
│ (LLM never sees      │
│  calling_user)       │
└──────────────────────┘
```
The two security-relevant rules:
- `calling_user` is a Python kwarg, not a tool input. The LLM has no way to set, override, or even reference it. The tool schema in `stage2_fixed.py:TOOLS` is byte-for-byte identical to the one in `stage1_broken.py`.
- The OpenFGA check runs after SQL but before return. Filtered rows never enter the LLM context, so even if the LLM were compromised downstream it can't disclose what it never received.
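The first rule can be sketched as a dispatch function in the agent runtime. Everything here is a hedged illustration: `dispatch_tool`, `REGISTRY`, and the echoing `search_tickets` body are hypothetical names, and dropping undeclared keys is an extra precaution of this sketch rather than something the demo is known to do.

```python
from typing import Any, Callable

# Tool schema shown to the LLM: only `query` is declared.
TOOLS = [
    {
        "name": "search_tickets",
        "description": "Search support tickets by keyword.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]


def search_tickets(query: str, *, calling_user: str) -> list[dict]:
    # Stand-in body: echoes its inputs so the injection point is visible.
    return [{"query": query, "served_for": calling_user}]


REGISTRY: dict[str, Callable[..., Any]] = {"search_tickets": search_tickets}


def dispatch_tool(name: str, llm_input: dict, *, session_user: str) -> Any:
    """Run a tool the LLM requested, adding identity from the session only."""
    schema = next(t for t in TOOLS if t["name"] == name)
    declared = schema["input_schema"]["properties"].keys()
    # Keep only declared fields, so a model-supplied "calling_user" is dropped.
    safe_input = {k: v for k, v in llm_input.items() if k in declared}
    return REGISTRY[name](**safe_input, calling_user=session_user)


# Even if the model is talked into sending calling_user="bob", Alice's session wins.
result = dispatch_tool(
    "search_tickets",
    {"query": "billing", "calling_user": "bob"},
    session_user="alice",
)
print(result)
```

The identity comes from `session_user`, which the runtime supplies, so the model's output has no path to it.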
| File | Purpose |
|---|---|
| `docker-compose.yml` | OpenFGA on :8080, playground on :3000 |
| `seed.py` | Creates `tickets.db` and the OpenFGA store/model/tuples (safe to re-run) |
| `stage1_broken.py` | Vulnerable agent |
| `stage2_fixed.py` | Same agent + OpenFGA filtering at the tool layer |
| `stage3_attack.py` | Runs attack prompts against both stages, side by side |
| `fga_client.py` | Thin async wrapper around `check(user, relation, object)` |
| `.env.example` | Template for `ANTHROPIC_API_KEY`, `FGA_STORE_ID`, `FGA_MODEL_ID` |
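For orientation, a thin async wrapper around a check could look roughly like the sketch below. It uses only the standard library and OpenFGA's HTTP check endpoint (`POST /stores/{store_id}/check`); the actual `fga_client.py` may well use the OpenFGA Python SDK instead, and the function names and parameters here are assumptions.

```python
import asyncio
import json
import urllib.request


def build_check_request(
    api_url: str, store_id: str, model_id: str, user: str, relation: str, obj: str
) -> urllib.request.Request:
    """Build the HTTP request for one check; sends nothing."""
    body = {
        "tuple_key": {"user": user, "relation": relation, "object": obj},
        "authorization_model_id": model_id,
    }
    return urllib.request.Request(
        f"{api_url}/stores/{store_id}/check",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _send(req: urllib.request.Request) -> bool:
    # Blocking call; a missing "allowed" field is treated as a denial.
    with urllib.request.urlopen(req, timeout=5) as resp:
        return bool(json.load(resp).get("allowed", False))


async def check(
    user: str,
    relation: str,
    obj: str,
    *,
    store_id: str,
    model_id: str,
    api_url: str = "http://localhost:8080",
) -> bool:
    """Async facade: runs the blocking request in a worker thread."""
    req = build_check_request(api_url, store_id, model_id, user, relation, obj)
    return await asyncio.to_thread(_send, req)
```

With OpenFGA running and the IDs from `seed.py`, a call would look like `asyncio.run(check("user:alice", "viewer", "ticket:1", store_id=..., model_id=...))`.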
- No FastAPI, no UI, no auth provider integration — Alice's "session" is a hardcoded string. The demo is about what the agent runtime does with that identity, not where it comes from.
- No real embeddings: `LIKE '%query%'` is fine for three rows.
- No retries, rate limiting, or production hardening.