Releases: DrDexter6000/talking-cli

v0.2.0 — Distributed Prompting, Hardened

27 Apr 16:22

What changed

This is the first public-quality release of talking-cli. The project has completed a full test-driven development cycle (Phase 0 through H), emerging with a hardened methodology, reproducible evidence, and a self-auditing tool.

Methodology

  • Distributed Prompting methodology formalized in PHILOSOPHY.md (four channels, four rules, budget, five anti-patterns)
  • Prompt-On-Call as the concrete implementation pattern
  • Adversarial Case Study documenting four known failure modes with mitigations

Evidence

  • 2x2 ablation benchmark on GLM-5.1 (15 curated tasks): -67% tokens, +26pp pass rate
  • MCP ecosystem audit: 4 Anthropic servers, 68 scenarios, 0/68 returned actionable guidance
  • MCP-Atlas public corpus: 10 tasks adapted from real sample_tasks.csv (CC-BY-4.0)
  • SkillsBench independent validation: comprehensive skills at P99.5 degrade pass rate by 2.9pp

Tool

  • audit: H1-H4 heuristics for skill directories (ruleset v1.0.0)
  • audit-mcp: M1-M4 heuristics for MCP servers (static + deep runtime mode)
  • init: Scaffold a new skill directory with passing templates
  • optimize: Generate optimization plans with --apply auto-fix
  • Self-audit score: 100/100 (CI-enforced, badge in README)
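
A typical workflow across the four subcommands above might look like the following sketch. The command and flag names are from this release; the target paths are placeholders:

```shell
# Illustrative workflow; ./my-skill and ./server are placeholder paths.
talking-cli init ./my-skill              # scaffold a skill directory with passing templates
talking-cli audit ./my-skill             # run H1-H4 heuristics (ruleset v1.0.0)
talking-cli optimize ./my-skill --apply  # generate an optimization plan and auto-fix
talking-cli audit-mcp ./server           # run M1-M4 heuristics against an MCP server
```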

Security

  • SECURITY.md with threat model documentation
  • --no-spawn flag for safe static-only MCP analysis
  • --deep warning in README
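
The two analysis modes above correspond to two invocations (flags are from this release; paths are placeholders):

```shell
# Safe default: static-only analysis, no server process is spawned.
talking-cli audit-mcp --no-spawn ./untrusted-server

# Deep runtime mode actually runs the server; per the README warning,
# use it only on servers you trust.
talking-cli audit-mcp --deep ./trusted-server
```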

Infrastructure

  • GitHub Actions CI workflow (self-audit on every PR, fails below 80)
  • Anti-bloat regression test (SKILL.md <=150 lines, persona count <=2)
  • Heuristic ruleset versioning (v1.0.0)
  • Publication-standard benchmark reports (PROVEN/SUCCESS/PARTIAL/FAILURE verdict tiers)
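
As a rough sketch, the CI gate described above could be wired up as follows. The workflow name, job layout, and the `--fail-below` flag are assumptions for illustration; the 80 threshold and the run-on-every-PR behavior are from this release:

```yaml
# Hypothetical workflow sketch; assumes the audit command exits non-zero
# when the self-audit score falls below the given threshold.
name: self-audit
on: [pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: talking-cli audit . --fail-below 80  # flag name is hypothetical
```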

Stats

  • 36 test files, 295 tests passing
  • Self-audit: 100/100 (H1=100, H2=100, H3=100, H4=100)
  • Ruleset version: 1.0.0