Releases: vnmoorthy/groundtruth
v0.1.3 — academic-subject exclusions, calibrated to 0 false positives on 1,272 real turns
Third release. Driven entirely by an audit of 1,272 real assistant turns from one user's ~/.claude/projects/.
Result on that corpus:
| version | findings | quality |
|---|---|---|
| v0.1.0 | 30 | mostly false positives (academic prose) |
| v0.1.2 | 5 | better, but academic turns with code-language fenced blocks slipped through |
| v0.1.3 | 0–1 | one ambiguous "successfully added 20,000 more" survives |
Live demonstration of the gate firing in vivo, recorded against claude -p:
- Prompt: Create `hello.txt` and end your turn with 'Done.'
- Turn 1: agent says "Done." → Stop hook returns `{decision: block, reason}`
- Turn 2 (forced by the block): agent retracts with the prescribed phrasing word for word: "I attempted to create hello.txt. I have not verified it. To verify I would need to..."
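The two turns above can be read as a small decision function. The following is a minimal sketch, not groundtruth's actual hook: `decideStop`, `endsWithBareDone`, `RETRACTION_TEMPLATE`, and the `hasVerificationEvidence` flag are hypothetical names, and the rule "a trailing 'Done.' with no evidence is an unverified claim" is an assumption made for illustration. Only the `{decision: block, reason}` shape and the retraction wording come from the demo.

```javascript
// Hypothetical sketch of a Stop-hook decision; not the project's real code.
// Assumption: a turn is an unverified completion claim when it ends with
// "Done." and nothing in the turn shows that the result was checked.

// Retraction wording taken from the demo; <action> is an illustrative
// placeholder and the trailing "..." is left elided as in the source.
const RETRACTION_TEMPLATE =
  'I attempted to <action>. I have not verified it. To verify I would need to...';

// Hypothetical helper: true when the message ends with a bare "Done."
function endsWithBareDone(message) {
  return /\bdone\.\s*$/i.test(message);
}

// Returns the object the hook would print as JSON on stdout.
// An empty object lets the turn end (assumed); a block forces another turn.
function decideStop(lastAssistantMessage, hasVerificationEvidence) {
  if (endsWithBareDone(lastAssistantMessage) && !hasVerificationEvidence) {
    return {
      decision: 'block',
      reason:
        'Completion claimed without verification. Retract using: ' +
        RETRACTION_TEMPLATE,
    };
  }
  return {};
}

// Turn 1 from the demo: the claim is blocked.
console.log(JSON.stringify(decideStop('Created hello.txt. Done.', false)));
// Same message, but the caller reports that a check was run: allowed.
console.log(JSON.stringify(decideStop('Created hello.txt. Done.', true)));
```

Keeping the decision a pure function of the last message makes it easy to replay against a recorded corpus, which is how the audit numbers in the table above would be reproduced.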
What's in 0.1.3
Added
- Academic-subject exclusions in `src/detector.mjs`: paper / manuscript / submission / chapter / abstract / bibliography / figure subjects in is/are completion frames, with up to ~6 words of modifiers between noun and verb.
- Memory-observer XML tag exclusions: `<completed>`, `<fact>`, `<next_steps>`, `<achievement>` from agent observability tools.
- Paper-writing compound subject exclusions: `Paper editing`, `paper preparation`, etc.
- Pure-academic-flavor work modifier exclusions: `intellectual and technical work`, `scholarly work`.
- Citation / bibliography / footnote work exclusions.
- Word-count operations: `Added 54 words`, `Cut 200 words`.
- Author metadata operations.
- Paper venue exclusions: TMLR, NeurIPS, ICML, ICLR, CVPR, arXiv, OpenReview, etc., including underscore-joined forms like `PAVO_TMLR_submission`.
- Compiled / typeset PDF / LaTeX / TeX output exclusions.
- 9 new detector tests, each derived from a real false positive surfaced in audit-self.
- Comparison table and live-demo paragraph in the README.
- CONTRIBUTING.md, issue templates, PR template.
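To make a few of these exclusions concrete, here is a rough sketch of how such rules could be expressed as regular expressions. It is not the code in `src/detector.mjs`: every name (`exclusionReason`, `ACADEMIC_SUBJECT`, `WORD_COUNT_OP`, `VENUE`, `OBSERVER_TAG`) is hypothetical, the list of completion words (done, complete, finished, ready) is an assumption, and only four of the exclusion families are covered.

```javascript
// Hypothetical sketch of four exclusion rules; the real patterns may differ.

// Academic subject, up to 6 modifier words, then an is/are completion frame.
// The completion words at the end are assumed, not taken from the project.
const ACADEMIC_SUBJECT =
  /\b(?:paper|manuscript|submission|chapter|abstract|bibliography|figure)s?\b(?:\s+[\w'-]+){0,6}?\s+(?:is|are)\s+(?:now\s+)?(?:done|complete|completed|finished|ready)\b/i;

// Word-count operations such as "Added 54 words" or "Cut 200 words".
const WORD_COUNT_OP = /\b(?:added|cut)\s+[\d,]+\s+words?\b/i;

// Venue names. Underscore counts as a separator here, so that
// underscore-joined forms like PAVO_TMLR_submission still match
// (a plain \b would not, because "_" is a word character).
const VENUE =
  /(?:^|[^A-Za-z0-9])(?:TMLR|NeurIPS|ICML|ICLR|CVPR|arXiv|OpenReview)(?![A-Za-z0-9])/i;

// Memory-observer blocks, removed before any other rule is applied.
const OBSERVER_TAG = /<(completed|fact|next_steps|achievement)>[\s\S]*?<\/\1>/gi;

// Returns the name of the first matching exclusion, or null when the
// sentence should still be checked as a possible completion claim.
function exclusionReason(sentence) {
  const visible = sentence.replace(OBSERVER_TAG, ' ').trim();
  if (visible === '') return 'observer-tag';
  if (ACADEMIC_SUBJECT.test(visible)) return 'academic-subject';
  if (WORD_COUNT_OP.test(visible)) return 'word-count';
  if (VENUE.test(visible)) return 'venue';
  return null;
}

console.log(exclusionReason('The manuscript for the workshop track is now complete.'));
console.log(exclusionReason('Updated PAVO_TMLR_submission.'));
console.log(exclusionReason('successfully added 20,000 more'));
```

Under these assumed patterns the last call returns `null`: "successfully added 20,000 more" has no "words" after the number, so no exclusion applies, which fits the one ambiguous finding reported for v0.1.3.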
Verification
$ node --test 'test/*.test.mjs'
# tests 104
# pass 104
# fail 0
Install
curl -fsSL https://raw.githubusercontent.com/vnmoorthy/groundtruth/main/install.sh | bash
One paste, ~1 second after git clone finishes. Zero new dependencies. Node 18+.
Full release history in CHANGELOG.md.