
harshakshit/agent-iam-demo


Agent IAM Demo — Sensitive Information Disclosure in an AI Support Bot

A three-stage CLI demo that shows an AI support agent leaking cross-tenant data because user identity never reaches its tools (the confused deputy problem), and then fixes it with OpenFGA — a CNCF, Zanzibar-inspired ReBAC engine — by authorizing every tool call against per-user relationship tuples.

The point of the demo is that prompt-level guardrails can always be jailbroken, but a deterministic authorization check at the tool boundary cannot. The same attack prompts stop leaking other tenants' data after the fix.

What you'll see

  1. Stage 1 — Broken. Agent calls search_tickets with no notion of who's asking. Alice's prompts pull back Bob's billing info and Carol's API key.
  2. Stage 2 — Fixed. Same agent loop, same tools, same prompts. OpenFGA filters rows at the tool layer before they reach the LLM.
  3. Stage 3 — Attack replay. The same three prompts run as Alice against both stages, side by side.

Prerequisites

  • Python 3.11+
  • Docker + Docker Compose
  • An Anthropic API key (claude-sonnet-4-6 access)

Setup

# 1. Start OpenFGA (HTTP on :8080, gRPC on :8081, playground on :3000).
docker compose up -d

# 2. Install Python deps.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Seed sqlite + the OpenFGA store, model, and tuples.
python seed.py
# The script prints lines like:
#   FGA_STORE_ID=01J...
#   FGA_MODEL_ID=01J...

# 4. Create your .env from the template and paste the IDs in.
cp .env.example .env
# Edit .env:
#   ANTHROPIC_API_KEY=sk-ant-...
#   FGA_STORE_ID=<paste from seed.py>
#   FGA_MODEL_ID=<paste from seed.py>

The OpenFGA playground UI is at http://localhost:3000 if you want to poke at the model or tuples by hand.
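To make "store, model, and tuples" concrete, here is a minimal sketch of what seeding could look like, assuming a hypothetical tickets schema (id, owner, subject, body) and a model where each ticket has a directly assigned viewer relation. The rows, column names, and helper names are illustrative and not taken from seed.py. The model is written in OpenFGA's JSON form (schema 1.1), equivalent to the DSL `type ticket` / `define viewer: [user]`.

```python
import sqlite3

# Hypothetical sample rows: (id, owner, subject, body). The real seed.py data may differ.
TICKETS = [
    (1, "alice", "Login problem", "I can't reset my password."),
    (2, "bob", "Billing dispute", "I'm disputing a $4,200 charge."),
    (3, "carol", "API key", "Please rotate my key EXAMPLE-KEY-123."),
]

# A minimal OpenFGA authorization model in JSON form (schema 1.1):
# tickets have a `viewer` relation that can be assigned directly to users.
AUTH_MODEL = {
    "schema_version": "1.1",
    "type_definitions": [
        {"type": "user"},
        {
            "type": "ticket",
            "relations": {"viewer": {"this": {}}},
            "metadata": {
                "relations": {
                    "viewer": {"directly_related_user_types": [{"type": "user"}]}
                }
            },
        },
    ],
}


def seed_db(conn: sqlite3.Connection) -> None:
    """Drop and recreate the tickets table, then insert the sample rows."""
    conn.execute("DROP TABLE IF EXISTS tickets")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT NOT NULL, "
        "subject TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO tickets VALUES (?, ?, ?, ?)", TICKETS)
    conn.commit()


def viewer_tuples(rows) -> list[dict]:
    """One viewer tuple per row: each owner can view only their own ticket."""
    return [
        {"user": f"user:{owner}", "relation": "viewer", "object": f"ticket:{ticket_id}"}
        for ticket_id, owner, *_ in rows
    ]
```

The tuple dicts use the same user / relation / object fields as OpenFGA tuple keys, which is also what you will see when browsing tuples in the playground.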

Run the demo

# Stage 1 — the broken agent, one prompt as Alice.
python stage1_broken.py

# Stage 2 — the same agent, with ReBAC at the tool layer.
python stage2_fixed.py

# Stage 3 — three attack prompts as Alice against both stages, side by side.
python stage3_attack.py

stage3_attack.py is the showpiece — it's the file you run on stage.
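The shape of a side-by-side run is simple: the same prompts, sent as the same user, to two agents, with the answers laid out in two columns. A minimal sketch, assuming each stage exposes an ask_bot-like callable that takes a user ID and a prompt and returns text (that signature, and the function names below, are assumptions rather than the repo's actual code):

```python
from typing import Callable

# Hypothetical agent signature: (user_id, prompt) -> answer text.
Agent = Callable[[str, str], str]


def run_side_by_side(
    prompts: list[str], user_id: str, broken: Agent, fixed: Agent
) -> list[tuple[str, str, str]]:
    """Send every prompt, as the same user, to both agents."""
    return [(p, broken(user_id, p), fixed(user_id, p)) for p in prompts]


def render(results: list[tuple[str, str, str]], width: int = 36) -> str:
    """Two truncated columns: Stage 1 on the left, Stage 2 on the right."""
    lines = [f"{'STAGE 1 (broken)':<{width}} | STAGE 2 (fixed)"]
    for prompt, left, right in results:
        lines.append(f"> {prompt}")
        lines.append(f"{left[:width]:<{width}} | {right[:width]}")
    return "\n".join(lines)
```

Keeping the user ID fixed and varying only the stage is what makes the comparison fair: any difference in the output comes from the tool layer, not from the prompt.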

Re-running python seed.py at any time is safe; it drops and recreates both tickets.db and the OpenFGA store. After re-seeding you'll need to update FGA_STORE_ID and FGA_MODEL_ID in .env with the new values.
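Copying the new IDs by hand after every re-seed is easy to get wrong. A sketch of a helper that moves them from seed.py's output into the text of .env, assuming seed.py prints FGA_STORE_ID=... and FGA_MODEL_ID=... lines as shown in the setup step. The function names are hypothetical and not part of the repo.

```python
import re

# Matches the KEY=value lines seed.py is assumed to print.
_ID_LINE = re.compile(r"\s*(FGA_STORE_ID|FGA_MODEL_ID)=(\S+)")


def parse_seed_output(text: str) -> dict[str, str]:
    """Extract FGA_STORE_ID / FGA_MODEL_ID from seed.py's stdout."""
    ids = {}
    for line in text.splitlines():
        m = _ID_LINE.match(line)
        if m:
            ids[m.group(1)] = m.group(2)
    return ids


def update_env(env_text: str, ids: dict[str, str]) -> str:
    """Replace existing KEY=... lines for the given keys; append any missing ones."""
    out, seen = [], set()
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in ids:
            out.append(f"{key}={ids[key]}")
            seen.add(key)
        else:
            out.append(line)  # other settings (e.g. ANTHROPIC_API_KEY) are kept as-is
    for key, value in ids.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

You would feed it the captured output of python seed.py and the current contents of .env, then write the returned text back to .env.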

Demo talk track

This demo is organized around the framing of agent identity used by Michael Grinich (WorkOS). Four ideas keep coming up whenever an LLM gets to call tools on behalf of a user: confused deputy, persona shadowing, capability scoping, and audit trail. Each of the three stages foregrounds one or two of them.

Stage 1 — confused deputy and persona shadowing

The support bot in stage1_broken.py is the canonical confused deputy. It has a privileged capability (the tickets table) and it acts on behalf of a user (Alice), but the user's identity stops at the frontend log line and never propagates into search_tickets. The tool sees query="billing" and nothing else, so it returns every matching row, Bob's $4,200 dispute included.

This is also persona shadowing in action: the LLM has Alice's question in its context window and answers as if it's serving Alice, but the side effects (the actual SQL) execute with the bot's full database privileges, not Alice's. From the LLM's point of view, "Alice is asking" is just text. From the database's point of view, nothing is asking; it's just the service account. The two views are out of sync, which is what makes the bug exploitable by a normal-looking question, no jailbreak required.
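The bug is visible in the tool's signature alone. A minimal sketch of a Stage 1 style tool, assuming a hypothetical tickets schema in an in-memory SQLite database (the rows and column names are illustrative, and stage1_broken.py may differ): there is simply no parameter for who is asking.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory tickets table with hypothetical rows for three tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT, subject TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "Billing question", "Why was I charged twice?"),
            (2, "bob", "Billing dispute", "Disputing a $4,200 invoice."),
            (3, "carol", "API access", "Please rotate key EXAMPLE-KEY-123."),
        ],
    )
    return conn


def search_tickets(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """VULNERABLE: no caller identity, so every matching row comes back.

    The query runs with the bot's full database access, whoever is asking.
    """
    pattern = f"%{query}%"  # bound as a parameter, so this is not SQL injection
    return conn.execute(
        "SELECT id, owner, subject, body FROM tickets "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
```

Because SQLite's LIKE is case-insensitive for ASCII by default, search_tickets(make_db(), "billing") returns Alice's ticket and Bob's dispute alike. Parameter binding prevents SQL injection, but it does nothing for authorization: the query is well-formed and still answers for the wrong person.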

Stage 2 — capability scoping at the tool boundary

stage2_fixed.py closes the gap by scoping capabilities per call. Every invocation of search_tickets now carries a Python-level calling_user argument that the agent runtime injects directly from the authenticated session: it is not in the tool schema the LLM sees, not in the system prompt, and not in the conversation history. Prompt injection literally cannot reach it. After the SQL query fetches candidate rows, each one is run through an OpenFGA check(user, viewer, ticket:n) call, and any row the user is not a viewer on is dropped before the LLM ever sees it. The capability the bot grants itself stops being "read the tickets table" and becomes "read the specific tickets this specific user is a viewer on, for this specific call." That's the principle of least privilege, expressed in a way the runtime can enforce rather than something the LLM has to remember.
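The fetch-then-check pattern can be sketched as below. This is not the repo's code: fga_check is a hypothetical in-memory stand-in for the async check in fga_client.py (a real deployment would call OpenFGA's Check API), and the schema and rows are illustrative. The key details are that calling_user is keyword-only and supplied by the caller, and that denied rows are logged but never returned.

```python
import asyncio
import sqlite3

# Hypothetical relationship tuples standing in for the OpenFGA store.
TUPLES = {
    ("user:alice", "viewer", "ticket:1"),
    ("user:bob", "viewer", "ticket:2"),
    ("user:carol", "viewer", "ticket:3"),
}


async def fga_check(user: str, relation: str, obj: str) -> bool:
    """Stand-in for an OpenFGA check: True iff the exact tuple exists."""
    return (user, relation, obj) in TUPLES


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT, subject TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "Billing question", "Why was I charged twice?"),
            (2, "bob", "Billing dispute", "Disputing a $4,200 invoice."),
            (3, "carol", "API access", "Please rotate key EXAMPLE-KEY-123."),
        ],
    )
    return conn


async def search_tickets(conn: sqlite3.Connection, query: str, *, calling_user: str) -> list[tuple]:
    """Keyword search, then an authorization check on every candidate row.

    calling_user is keyword-only and injected by the runtime from the session;
    it is not part of the tool schema the LLM sees.
    """
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT id, owner, subject, body FROM tickets "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
    visible = []
    for row in rows:
        ticket_id, owner = row[0], row[1]
        if await fga_check(f"user:{calling_user}", "viewer", f"ticket:{ticket_id}"):
            visible.append(row)
        else:
            # Denied rows are logged for the audit trail and never reach the LLM.
            print(f"[FGA] DENY user:{calling_user} → viewer → ticket:{ticket_id} (owner={owner})")
    return visible
```

Running asyncio.run(search_tickets(make_db(), "billing", calling_user="alice")) keeps only ticket 1 and prints a DENY line for Bob's ticket 2. The checks run one at a time here for readability; with many candidate rows you might issue them concurrently (for example with asyncio.gather) or use a batch check if your OpenFGA version and SDK support one.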

Stage 3 — audit trail and the side-by-side proof

stage3_attack.py runs the same three prompts as Alice — two probing other tenants' data, one legitimate — against both stages. The Stage 1 transcript leaks; the Stage 2 transcript doesn't, and crucially you can see why in plain text: every denied row produces an [FGA] DENY user:alice → viewer → ticket:n (owner=bob) line. That's the audit trail Grinich talks about — not a vague service-level log saying "the bot accessed the DB at 14:02:17", but a per-request, per-resource, per-relation decision you can point at, hand to a compliance team, or replay later. In production you'd ship these decisions to your SIEM and alert on anomalies. The takeaway for the audience: the LLM can be tricked, the prompt can be jailbroken, the system prompt can be leaked — but the authorization layer is a deterministic check against explicit relationship tuples, and it doesn't care how clever the attack phrasing is.
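One way to make each decision both human-readable on stage and machine-readable for a SIEM is to build a structured record first and render it two ways. A minimal sketch, assuming JSON Lines as the shipping format; the field names and helper names are hypothetical, and only the [FGA] console format comes from the demo's output.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, relation: str, obj: str, allowed: bool, **context) -> dict:
    """Build one structured authorization decision (hypothetical field names)."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "decision": "ALLOW" if allowed else "DENY",
        "user": user,
        "relation": relation,
        "object": obj,
        **context,  # e.g. owner, session ID, request ID
    }


def console_line(record: dict) -> str:
    """Render a record in the demo's human-readable [FGA] format."""
    line = f"[FGA] {record['decision']} {record['user']} → {record['relation']} → {record['object']}"
    if "owner" in record:
        line += f" (owner={record['owner']})"
    return line


def siem_line(record: dict) -> str:
    """Serialize a record as a single JSON line for log shipping."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Because every record names the user, relation, and object, the same data answers both "what did the bot touch?" and "who was it acting for?", which a service-level access log cannot.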

Architecture

       user_id (from auth session)
              │
              ▼
   ┌──────────────────────┐                ┌───────────────────────┐
   │      ask_bot         │  calling_user  │     search_tickets    │
   │  (the agent loop)    │ ─────────────► │   query  +  FGA       │
   └──────────────────────┘  Python-only   │      check loop       │
              │                            └───────────────────────┘
              │                                       │
              │                                       ▼
              │                            ┌───────────────────────┐
              │                            │  OpenFGA: viewer?     │
              │                            │  user:<id> →          │
              │                            │      ticket:<n>       │
              │   tool_result              └───────────────────────┘
              ▼      (filtered rows)
   ┌──────────────────────┐
   │  Anthropic Messages  │
   │  (LLM never sees     │
   │  calling_user)       │
   └──────────────────────┘

The two security-relevant rules:

  1. calling_user is a Python kwarg, not a tool input. The LLM has no way to set, override, or even reference it. The tool schema in stage2_fixed.py:TOOLS is byte-for-byte identical to the one in stage1_broken.py.
  2. The OpenFGA check runs after the SQL query but before the tool returns. Filtered rows never enter the LLM context, so even if the LLM were compromised downstream it can't disclose what it never received.
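Rule 1 can be enforced mechanically in the dispatch step. A sketch, assuming the Anthropic Messages API tool format (each tool declares a name, description, and JSON Schema input_schema; the model's tool_use block supplies a name and an input dict). The registry and dispatch function are hypothetical, and search_tickets is a placeholder that just echoes its arguments.

```python
# Tool definition as the LLM sees it: only `query`, no identity field.
TOOLS = [
    {
        "name": "search_tickets",
        "description": "Search support tickets by keyword.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Keyword to search for."}},
            "required": ["query"],
        },
    }
]


def search_tickets(query: str, *, calling_user: str) -> dict:
    # Placeholder body: a real tool would query and run the FGA check loop.
    return {"query": query, "calling_user": calling_user}


REGISTRY = {"search_tickets": search_tickets}
SCHEMAS = {tool["name"]: tool["input_schema"] for tool in TOOLS}


def dispatch(tool_name: str, tool_input: dict, session_user: str):
    """Run a tool_use request with identity taken from the session, never the model."""
    declared = SCHEMAS[tool_name]["properties"].keys()
    # Drop any argument the schema doesn't declare, e.g. a smuggled "calling_user".
    args = {k: v for k, v in tool_input.items() if k in declared}
    return REGISTRY[tool_name](**args, calling_user=session_user)
```

Filtering to declared properties matters: without it, a model-supplied calling_user key would either collide with the injected one (a TypeError) or, in a sloppier design, win.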

Files

File                 Purpose
docker-compose.yml   OpenFGA (HTTP on :8080, gRPC on :8081), playground on :3000
seed.py              Creates tickets.db and the OpenFGA store/model/tuples (safe to re-run; recreates both)
stage1_broken.py     Vulnerable agent
stage2_fixed.py      Same agent + OpenFGA filtering at the tool layer
stage3_attack.py     Runs attack prompts against both stages, side by side
fga_client.py        Thin async wrapper around check(user, relation, object)
.env.example         Template for ANTHROPIC_API_KEY, FGA_STORE_ID, FGA_MODEL_ID

Non-goals

  • No FastAPI, no UI, no auth provider integration — Alice's "session" is a hardcoded string. The demo is about what the agent runtime does with that identity, not where it comes from.
  • No real embeddings — LIKE '%query%' is fine for three rows.
  • No retries, rate limiting, or production hardening.

About

CLI demo: confused deputy in AI support bot vs OpenFGA ReBAC fix
