This page describes Pragma's threat model and the limits of what it guarantees.
It is grounded in `src/pragma/coverage/runner.py`, `src/pragma/judge/client.py`,
and the plugin hooks under `plugin/hooks/`.
| Tier | Reads code? | Executes code? | Network? | Default in plugin |
|---|---|---|---|---|
| Tier 1 — AST classifier | yes (static parse) | no | no | always on |
| Tier 2 — coverage gate | yes | yes — runs the test under audit | no | off (opt-in via PRAGMA_COVERAGE=1) |
| Tier 3 — LLM judge | yes | no | yes — sends source to an endpoint | off (opt-in via PRAGMA_HOOK_WITH_LLM=1) |
The AST classifier reads the test source and walks its syntax tree. It never runs the code. Tier 1 is the only tier the plugin enables by default.
To measure coverage, tier 2 spawns a subprocess that runs the test file under
audit (`python -m coverage run -m pytest <test file>`, or
`npx vitest run` for Vitest). That file is the very artifact being checked. It
may be AI-written, gamed, or hostile. Executing it runs arbitrary code with the
privileges of whoever invoked Pragma.
For this reason:
- Tier 2 is opt-in. It is off by default in the plugin and requires
  `PRAGMA_COVERAGE=1` in the PostToolUse hook, or `--with-coverage` on the CLI.
- The subprocess environment is scrubbed. The child process does not inherit
  the parent environment. `runner.py` builds the child env from a minimal
  allowlist (`PATH`, `HOME`, `TMPDIR`/`TEMP`/`TMP`, `LANG`/`LC_ALL`/`LC_CTYPE`,
  `PYTHONHASHSEED`, `PYTHONDONTWRITEBYTECODE`, `PYTHONPATH`, `SYSTEMROOT`, plus
  the coverage DB pointer) and drops everything else. Secret-bearing variables
  (`PRAGMA_*_API_KEY`, cloud credentials, tokens, DB passwords) are therefore
  not visible to the test under audit.
- The subprocess is time-bounded. Python: a 5-second per-test pytest timeout
  inside a 10-second outer subprocess kill-switch. Vitest: an 8-second timeout.
The allowlist and timeouts limit blast radius; they are not a sandbox. Only enable tier 2 when you are comfortable executing the test files in question.
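The environment scrubbing and outer kill-switch described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `runner.py`: the helper names `scrubbed_env` and `run_bounded` are hypothetical, and the coverage DB pointer is left out because its variable name is not given here.

```python
import os
import subprocess
import sys
from typing import Optional

# Names copied from the allowlist described above. The coverage DB pointer is
# omitted because this page does not name that variable.
ALLOWED = (
    "PATH", "HOME", "TMPDIR", "TEMP", "TMP", "LANG", "LC_ALL", "LC_CTYPE",
    "PYTHONHASHSEED", "PYTHONDONTWRITEBYTECODE", "PYTHONPATH", "SYSTEMROOT",
)


def scrubbed_env(parent: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted variables; everything else is dropped."""
    return {name: value for name, value in parent.items() if name in ALLOWED}


def run_bounded(cmd: list[str], timeout_s: float = 10.0) -> Optional[int]:
    """Run cmd with a scrubbed environment and an outer timeout.

    Returns the child's exit code, or None if the timeout fired.
    """
    try:
        proc = subprocess.run(
            cmd,
            env=scrubbed_env(dict(os.environ)),  # replaces, not extends, the env
            timeout=timeout_s,                   # child is killed on expiry
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode
```

Passing `env=` to `subprocess.run` replaces the child's environment rather than extending it, so a variable is visible to the child only if it is allowlisted. The child can still read any file or open any socket the invoking user can, which is why this limits exposure without being a sandbox.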
Tier 3 sends the test source plus the resolved production source to whatever
OpenAI-compatible endpoint `PRAGMA_LLM_BASE_URL` points at (DeepSeek by
default). Implications:
- Treat that payload like any other code-review send-off. For proprietary code, point tier 3 at a self-hosted endpoint (Ollama, LM Studio, vLLM, or an internal gateway) instead of a third-party SaaS.
- Tier 3 is opt-in precisely for this reason: it is off by default and only runs
  when `PRAGMA_HOOK_WITH_LLM=1` (hook) or `--with-llm` (CLI) is set and an API
  key is present.
- Tier 3 fails open: if the key or the `openai` package is missing, or the API
  call fails, the judge is skipped silently and no `semantic_gaming` verdict is
  emitted.
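The fail-open behavior can be sketched as a small wrapper. This is a hedged illustration, not the code in `client.py`: `judge_or_skip` and its parameters are hypothetical names, and the actual API request is abstracted into a caller-supplied function so the sketch stays self-contained.

```python
import importlib.util
from typing import Callable, Optional


def judge_or_skip(
    api_key: Optional[str],
    call_judge: Callable[[], str],
    package: str = "openai",
) -> Optional[str]:
    """Return the judge's verdict, or None whenever the judge cannot run.

    Hypothetical helper: None means "skipped", not "clean".
    """
    if not api_key:
        return None  # no key configured: skip silently
    if importlib.util.find_spec(package) is None:
        return None  # client package not installed: skip silently
    try:
        return call_judge()
    except Exception:
        return None  # API call failed: skip silently
```

Because every failure path collapses to the same silent skip, the absence of a `semantic_gaming` verdict does not by itself show that the judge ran and found nothing.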
Pragma raises the cost of shipping a test that verifies nothing. It does not prove a test is correct, and it can be bypassed.
- The classifier is heuristic. It recognizes known gaming shapes. A novel
  evasion that none of the rules match will classify as `verified`. Tier 1
  catches structural patterns; tier 2 demands the production code actually run;
  tier 3 reads both files. None of them is a proof of correctness.
- The hooks fail open by design. If `pragma` cannot be invoked, the plugin hooks
  exit `0` (allow) rather than block. If the blocking-suffix list cannot be
  fetched (`pragma blocking`), the hook treats nothing as blocking. This keeps a
  broken environment from wedging all edits, but it means a determined actor who
  disables or hides `pragma` removes the gate entirely.
- The hooks only block new gaming. Pre-existing gaming in a file is not
  blocked; only verdicts the current edit introduces (relative to git `HEAD`)
  block. See `hooks.md`.
- Do not rely on Pragma as your only control. It is a guardrail against
  accidental and lazy test-gaming by AI assistants, layered on top of human
  review and CI, not a replacement for them.
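The hook's decision logic, as described in the list above, can be sketched like this. Several details are assumptions made for illustration: verdicts are modeled as plain strings, "blocking suffix" is read as a string-suffix match on the verdict name, and the non-zero `BLOCK` code is a placeholder, since only the allow code `0` is stated here.

```python
from typing import Iterable

ALLOW = 0  # documented above: hooks exit 0 to allow
BLOCK = 2  # assumption: a non-zero code that the hook runner treats as "block"


def new_blocking_verdicts(
    head: Iterable[str],
    current: Iterable[str],
    blocking_suffixes: Iterable[str],
) -> set[str]:
    """Verdicts introduced since HEAD that match a blocking suffix."""
    introduced = set(current) - set(head)  # pre-existing verdicts never block
    suffixes = tuple(blocking_suffixes)    # empty tuple matches nothing
    return {verdict for verdict in introduced if verdict.endswith(suffixes)}


def hook_exit_code(
    pragma_available: bool,
    head: Iterable[str],
    current: Iterable[str],
    blocking_suffixes: Iterable[str],
) -> int:
    if not pragma_available:
        return ALLOW  # fail open: a missing tool must not wedge all edits
    blocked = new_blocking_verdicts(head, current, blocking_suffixes)
    return BLOCK if blocked else ALLOW
```

In this sketch an empty suffix list, standing in for a failed `pragma blocking` fetch, yields `ALLOW` for every input. That mirrors the trade-off stated above: the gate cannot wedge a broken environment, and it can be removed by anyone able to break the environment on purpose.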
Pragma is MIT-licensed and provided as-is (see LICENSE). There is no warranty;
review the code before granting it the ability to execute test files in a
sensitive environment.