Prevents Claude from finishing tasks with speculation, unverified claims, or invented causality.
Claude regularly delivers responses that sound confident but contain:
- Speculation disguised as diagnosis ("This is probably caused by...")
- Invented causality without evidence ("The error occurs because...")
- Fake quantification without methodology ("This improves performance by 70%")
- Completeness overclaims without verification ("All files have been checked")
These patterns are dangerous precisely because they sound authoritative — indistinguishable from real analysis. This plugin forces Claude to ground every claim in actual observations before it can complete a task.
Use the Vercel Skills CLI to install across any supported agent:
npx skills add bitflight-devops/hallucination-detector
Target a specific agent:
npx skills add bitflight-devops/hallucination-detector -a claude-code
npx skills add bitflight-devops/hallucination-detector -a cursor
npx skills add bitflight-devops/hallucination-detector -a codex
npx skills add bitflight-devops/hallucination-detector -a opencode
To uninstall:
npx skills remove hallucination-detector
In Claude Code, register the marketplace first:
/plugin marketplace add bitflight-devops/hallucination-detector
Then install the plugin:
/plugin install hallucination-detector@hallucination-detector
In Cursor Agent chat, install from marketplace:
/plugin-add hallucination-detector
Tell Codex:
Fetch and follow instructions from https://raw.githubusercontent.com/bitflight-devops/hallucination-detector/refs/heads/main/.codex/INSTALL.md
Detailed docs: .codex/INSTALL.md
Tell OpenCode:
Fetch and follow instructions from https://raw.githubusercontent.com/bitflight-devops/hallucination-detector/refs/heads/main/.opencode/INSTALL.md
Detailed docs: .opencode/INSTALL.md
Start a new session on your chosen platform and prompt the agent so that its response contains speculation (e.g., "this is probably caused by..."). The plugin should block the response and require an evidence-first rewrite.
For detailed information about how the plugin works, its architecture, configuration, and internal detection mechanisms, see the Architecture Reference.
LLMs like Claude are optimized during training to produce responses that appear helpful and confident. This creates a systematic failure mode:
Speculation as diagnosis - When asked "why did X happen?", Claude draws on training patterns to generate plausible-sounding explanations. These explanations feel authoritative but have no connection to the actual state of your system. Claude hasn't checked logs, read config files, or verified anything — it's pattern-matching from training data.
Invented causality - Causal claims ("X because Y") require evidence showing the relationship. Claude often asserts causality based on what typically causes similar symptoms, not what actually caused this specific instance. The word "because" in Claude's output frequently signals unverified inference.
Fake rigor - Scores and percentages ("8/10 quality", "70% improvement") create an illusion of measurement. Without methodology, sample size, and reproducible criteria, these numbers are meaningless — yet they make responses feel more credible.
Completeness theater - Claims like "all files checked" or "comprehensive analysis" are rarely true. Claude may have checked some files, or the most likely files, but stating completeness without enumerating scope is misleading.
When Claude's unverified speculation matches reality by chance, the problem is invisible. When it doesn't match, you've wasted time pursuing a false lead — or worse, made changes based on incorrect diagnosis.
The cost compounds in agent workflows: sub-agents act on orchestrator hallucinations, hallucinated "facts" persist across sessions as assumed truth, and the original error becomes impossible to trace.
Instructions alone don't work. Here's why:
- Claude's training optimization overrides behavioral instructions under completion pressure
- Speculation patterns are deeply embedded in how Claude learned to be "helpful"
- Claude cannot reliably self-assess speculation — it doesn't recognize its own patterns
A Stop hook provides structural enforcement — Claude cannot complete a task while trigger language is present. This shifts from "please don't speculate" (ignorable) to "speculation blocks completion" (architectural constraint).
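As a rough illustration of what "speculation blocks completion" can look like in Claude Code, a Stop hook can print a JSON object with `decision` set to `"block"` and a `reason` that is fed back to Claude. The sketch below is not the plugin's source; the function name `buildStopHookOutput` and the shape of `findings` are assumptions made for the example.

```javascript
// Hypothetical sketch; not the plugin's actual code.
// `findings` is an assumed array of trigger phrases found in the last message.
function buildStopHookOutput(findings) {
  if (findings.length === 0) {
    return null; // nothing to report, so the agent is allowed to stop
  }
  return {
    decision: "block",
    reason:
      "Rewrite with evidence-first language. Trigger phrases found: " +
      findings.join(", "),
  };
}

// Example: two trigger phrases were found, so the hook asks for a rewrite.
const exampleOutput = buildStopHookOutput(["probably", "likely due to"]);
if (exampleOutput !== null) {
  console.log(JSON.stringify(exampleOutput));
}
```

A real hook script would write this JSON to stdout; when there are no findings it would simply exit without output so the task can complete.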
With this plugin installed, Claude will be blocked from finishing if its response contains:
Speculation language — "I think", "probably", "likely", "seems", "should be", "maybe", "might"
Ungrounded causality — "because", "due to", "caused by", "therefore", "this means" without cited evidence
Pseudo-quantification — Scores like "8.5/10" or percentages like "70% improvement" without methodology
Completeness overclaims — "all files checked", "comprehensive analysis", "fully resolved" without listing what was actually inspected
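The four categories above could be approximated with simple pattern matching. This is a minimal sketch under stated assumptions: the phrase lists are taken from this README, but the regular expressions, the `CATEGORIES` table, and the `findTriggers` function are illustrative and not the plugin's real detection rules. In particular, this sketch flags every causal phrase and does not attempt the "without cited evidence" check.

```javascript
// Hypothetical sketch of phrase-based detection.
// Category names and phrases come from the README; the regexes are assumptions.
const CATEGORIES = {
  speculation: /\b(I think|probably|likely|seems|should be|maybe|might)\b/gi,
  causality: /\b(because|due to|caused by|therefore|this means)\b/gi,
  pseudoQuantification: /\b\d+(\.\d+)?\s*\/\s*10\b|\b\d+(\.\d+)?%/g,
  completeness: /\b(all files checked|comprehensive analysis|fully resolved)\b/gi,
};

// Returns one finding per matched phrase, grouped in category order.
function findTriggers(text) {
  const findings = [];
  for (const [category, pattern] of Object.entries(CATEGORIES)) {
    for (const match of text.matchAll(pattern)) {
      findings.push({ category, phrase: match[0], index: match.index });
    }
  }
  return findings;
}

console.log(findTriggers("This is probably caused by a timeout."));
console.log(findTriggers("I observed X in file Y at line Z"));
```

Plain keyword matching will also flag legitimate uses of words such as "because", which is why a real detector needs suppression rules on top of a list like this.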
When blocked, Claude must rewrite using evidence-first language:
- "I observed X in file Y at line Z"
- "I don't know yet" / "I can check using my tools"
- "After running command X, the output showed Y"
For a complete list of detection categories and suppression rules, see the Architecture Reference.
Just install it — the plugin runs automatically as a Stop hook. Every time Claude attempts to finish a task, its last message is audited for hallucination triggers.
If triggers are found, Claude is blocked and must rewrite with proper evidence.
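To audit "the last message", a hook first has to find it. The sketch below assumes the session transcript is available as JSON Lines text and that assistant entries look like `{ "type": "assistant", "message": { "content": [{ "type": "text", "text": "..." }] } }`. That entry shape and the function name `lastAssistantText` are assumptions for illustration; check the transcript format of your platform before relying on it.

```javascript
// Hypothetical sketch: the transcript entry shape is an assumption, not a
// documented contract of this plugin.
function lastAssistantText(transcriptJsonl) {
  const lines = transcriptJsonl
    .split("\n")
    .filter((line) => line.trim() !== "");

  // Walk backwards so the most recent assistant message wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    let entry;
    try {
      entry = JSON.parse(lines[i]);
    } catch {
      continue; // skip malformed lines rather than failing the hook
    }
    if (!entry || entry.type !== "assistant") continue;

    const content = entry.message && entry.message.content;
    if (!Array.isArray(content)) continue;

    const text = content
      .filter((part) => part && part.type === "text")
      .map((part) => part.text)
      .join("\n");
    if (text !== "") return text;
  }
  return null; // no assistant text found
}

const sampleTranscript = [
  JSON.stringify({ type: "user", message: { content: "Why 500 errors?" } }),
  JSON.stringify({
    type: "assistant",
    message: { content: [{ type: "text", text: "This is probably a timeout." }] },
  }),
].join("\n");

console.log(lastAssistantText(sampleTranscript));
```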
To manually audit any text (useful for reviewing sub-agent output or draft responses):
/hallucination-detector:audit
Then paste the content to audit.
Without this plugin:
You: "Why is the API returning 500 errors?"
Claude: "This is probably caused by a database connection timeout.
The issue is likely due to connection pool exhaustion,
which typically happens under high load."
Claude confidently explains a cause without having verified anything.
With this plugin:
You: "Why is the API returning 500 errors?"
Claude: [attempts to respond with speculation]
[BLOCKED by hallucination-detector]
Claude: "I don't have enough information yet. Let me check:
1. Reading the error logs...
2. Checking database connection configuration...
I found in /var/log/app.log at line 1247:
'ConnectionError: max retries exceeded for host db.internal'
The logs show the API server cannot reach the database host.
Next step: verify network connectivity and database status."
These are the expected behaviors — the mechanism working as designed:
- Claude will be blocked when it falls into speculation patterns
- Responses will be more verbose (evidence must be cited)
- Claude may need 2-3 rewrites before a response passes
- After 2 blocks in the same response cycle, the plugin allows completion to prevent infinite loops
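The loop guard in the last bullet can be sketched as a counter checked before each block decision. The limit of 2 comes from this README; how the plugin actually stores the count between attempts is not described here, so `shouldBlock` and `simulateCycle` are hypothetical helpers that keep the count in memory.

```javascript
// From the README: completion is allowed after 2 blocks in one response cycle.
const MAX_BLOCKS_PER_CYCLE = 2;

// Hypothetical helper: block only if triggers exist and the limit is not reached.
function shouldBlock(triggerCount, blocksSoFar) {
  if (triggerCount === 0) return false;
  return blocksSoFar < MAX_BLOCKS_PER_CYCLE;
}

// Hypothetical helper: replays a cycle where triggerCounts[i] is the number of
// triggers found in attempt i + 1.
function simulateCycle(triggerCounts) {
  let blocks = 0;
  for (let attempt = 0; attempt < triggerCounts.length; attempt++) {
    if (!shouldBlock(triggerCounts[attempt], blocks)) {
      return { passedOnAttempt: attempt + 1, blocks };
    }
    blocks += 1;
  }
  return { passedOnAttempt: null, blocks };
}

// Triggers in every attempt: the third attempt goes through after two blocks.
console.log(simulateCycle([3, 2, 1]));
// A clean rewrite passes as soon as no triggers remain.
console.log(simulateCycle([2, 0]));
```

One consequence of this design is that a response can still complete with trigger language in it once the limit is hit, which trades strictness for guaranteed termination.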
/plugin update hallucination-detector
Or for manual installations:
cd <install-path> && git pull
- Node.js (for the stop-hook script)
- Claude Code v2.0+, Cursor, Codex, or OpenCode
MIT License — see LICENSE file for details.