In 2025, the METR randomized controlled trial tested 16 experienced developers on 246 real tasks using Cursor Pro with Claude 3.5/3.7 Sonnet. Before starting, developers predicted AI would speed them up by 24%. After use, they believed they had been 20% faster.
Objective measurement: they were 19% slower.
A 39-percentage-point gap between perception and reality — not because of the tool, but because of how it was used.
The same year, 10.3% of applications on Lovable's showcase had critical security flaws. An engineer compromised multiple applications in 47 minutes. The cause: AI-generated code that looked correct but was not verified.
Every LLM — Claude, GPT, Gemini, Codestral — operates as a stochastic generator:
`P(tokenₙ | token₁, …, tokenₙ₋₁)`
Output is optimized for statistical plausibility, not correctness. The model has no internal mechanism to evaluate whether what it produces is semantically, logically, or functionally correct. This is not a limitation of any specific tool. It is a structural property of the architecture.
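The point can be made concrete with a toy sketch (not a real LLM): next-token generation is a draw from a learned conditional distribution, and nothing in the sampling step checks correctness. The distribution below is invented for illustration.

```python
import random

def sample_next_token(context: tuple[str, ...],
                      dist: dict[tuple, dict[str, float]]) -> str:
    # P(token_n | token_1, ..., token_{n-1}): a weighted draw, not a proof.
    probs = dist[context]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# A plausible-but-wrong continuation still carries probability mass;
# the sampler has no mechanism to rule it out.
dist = {("2", "+", "2", "="): {"4": 0.90, "5": 0.08, "22": 0.02}}
print(sample_next_token(("2", "+", "2", "="), dist))
```

Most draws return "4", but nothing structural prevents "5" or "22"; only an external check can.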
Without a verification layer:
prompt → AI assistant → suggestion → you accept → shipped
The verification gap is open. Errors are invisible until production.
The distinction is not between tools. It is between two modes of operation.
Versus implements a structured adversarial critique layer on top of AI coding assistant suggestions, based on the IACDM methodology.
Instead of asking "is this suggestion good?", Versus applies specialized critical lenses to each suggestion — systematically, before you accept.
With Versus:
prompt → AI assistant → suggestion → Versus critiques → you decide → shipped
The verification gap is closed at the source.
| Extension | AI Assistant | Marketplace |
|---|---|---|
| Versus · Copilot | GitHub Copilot | Install |
| Versus · Claude | Claude (Anthropic) | Install |
Same methodology. Same lenses. Same adversarial posture. Different integration layer.
```mermaid
flowchart LR
A[👤 Developer] -->|prompt| B["🤖 AI Assistant<br/>Generative Agent"]
B -->|suggestion| C{"⚔️ Versus<br/>Adversarial Layer"}
C -->|Assumptions lens| D["🔍 What does this<br/>assume without declaring?"]
C -->|Architectural lens| E["🏗️ Can each part be<br/>tested in isolation?"]
C -->|Implementability lens| F["⚙️ Is the scope<br/>explicit enough?"]
C -->|Contextual lenses| G["🔒 Security<br/>⚡ Performance<br/>📋 Compliance"]
D & E & F & G --> H[📊 Critique Report]
H --> A
```
The generative agent (AI assistant) and the verification agent (Versus) are structurally separated. This is not cosmetic — it is a fundamental property. An agent cannot reliably verify its own output. Verification must always be external.
Versus applies two categories of lenses: universal (every suggestion) and contextual (activated by project signals).
| Lens | Core question | Failure class it catches |
|---|---|---|
| Assumptions | What does this suggestion assume without declaring? | Hidden premises (Leveson, 2011) |
| Architectural | Can each part be replaced, removed, or tested in isolation? | Hidden coupling, circular dependencies, SRP violations |
| Implementability | Is this scope executable in a single, focused session? | Underspecified interfaces, implicit dependencies |
| Lens | Activated when | Core question |
|---|---|---|
| Security | Exposed APIs, sensitive data, auth flows | How would an attacker exploit this surface with minimal effort? |
| Performance | High-compute, real-time, large data volumes | Where are the bottlenecks and what is the asymptotic behavior? |
| Scientific | Specialized domain, numerical parameters | Does every value, formula, and algorithm have a verifiable source? |
| Regulatory | Applicable standards (ISO, GDPR, HIPAA, etc.) | Does every normative requirement trace to a module? |
A lens is legitimate only if removing it exposes a class of failure that no other lens would detect. Redundancy disguised as rigor is not rigor.
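The activation rule can be sketched as a simple signal match. This is an illustrative model only; the signal names and the activation API are assumptions, not Versus's actual implementation.

```python
# Hypothetical sketch: universal lenses always run; contextual lenses
# activate when any of their trigger signals is present in the project.
CONTEXTUAL_LENSES = {
    "Security":    {"exposed_api", "sensitive_data", "auth_flow"},
    "Performance": {"high_compute", "real_time", "large_data"},
    "Scientific":  {"specialized_domain", "numerical_parameters"},
    "Regulatory":  {"iso", "gdpr", "hipaa"},
}
UNIVERSAL_LENSES = ["Assumptions", "Architectural", "Implementability"]

def active_lenses(project_signals: set[str]) -> list[str]:
    contextual = [name for name, triggers in CONTEXTUAL_LENSES.items()
                  if triggers & project_signals]
    return UNIVERSAL_LENSES + contextual

print(active_lenses({"exposed_api", "real_time"}))
# -> ['Assumptions', 'Architectural', 'Implementability', 'Security', 'Performance']
```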
For each critique session, Versus produces a coverage matrix — a two-dimensional map (suggestions × lenses) showing how findings are distributed by severity.
```mermaid
flowchart LR
F["📋 Finding"] --> S1["🔴 Critical<br/>Zero tolerance<br/>Block acceptance"]
F --> S2["🟡 Important<br/>Explicit decision<br/>Resolve or accept risk"]
F --> S3["🟢 Suggestion<br/>Can be deferred<br/>Address in next iteration"]
style S1 fill:#7a1a1a,color:#fff,stroke:#c0392b
style S2 fill:#7a5c00,color:#fff,stroke:#f39c12
style S3 fill:#1a4a2a,color:#fff,stroke:#27ae60
```
| Severity | Color | Action |
|---|---|---|
| 🔴 Critical | Red | Block acceptance. Requires resolution. |
| 🟡 Important | Yellow | Explicit decision required: resolve or accept risk with justification. |
| 🟢 Suggestion | Green | Can be deferred without blocking. |
The matrix makes visible second-order information that generic review misses:
- Concentration by suggestion: a suggestion with findings across all lenses is likely conceptually flawed, not just syntactically.
- Concentration by lens: a lens finding issues everywhere indicates a systemic failure.
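Both concentration signals fall out of simple tallies over the matrix. A minimal sketch, with invented findings data:

```python
from collections import Counter

# Illustrative findings: (suggestion_id, lens, severity). Data is invented.
findings = [
    ("S1", "Assumptions",   "critical"),
    ("S1", "Architectural", "important"),
    ("S1", "Security",      "critical"),
    ("S2", "Assumptions",   "suggestion"),
    ("S3", "Assumptions",   "important"),
]

by_suggestion = Counter(s for s, _, _ in findings)
by_lens = Counter(l for _, l, _ in findings)

# S1 draws findings across three lenses: likely conceptually flawed.
# The Assumptions lens fires on every suggestion: a systemic gap in
# declared premises, not a local slip.
print(by_suggestion.most_common(1))  # [('S1', 3)]
print(by_lens.most_common(1))        # [('Assumptions', 3)]
```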
The formal model behind Versus defines two agent types:
```mermaid
flowchart TD
subgraph GA["Generative Agent (GA)"]
direction TB
G1[Receives prompt]
G2[Generates artifact]
G3["❌ Cannot verify<br/>own output"]
G1 --> G2 --> G3
end
subgraph VA["Verification Agents (VA)"]
direction TB
VA1["🤖 VA-Automatic<br/>linters · type checkers · tests"]
VA2["👤 VA-Human<br/>operator judgment"]
end
GA -->|artifact| VA
VA -->|concrete feedback| GA
VA -->|approved| SHIP[✅ Delivered]
VA -->|rejected| GA
```
Key property: When a VA rejects an artifact and provides concrete error feedback, the next generation by the GA is conditioned on empirical evidence of failure — qualitatively superior to the original prompt. This is the reconditioning mechanism.
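The reconditioning loop can be sketched as follows. `generate` and `verify` are stand-ins for an AI assistant and an external checker (linter, test suite, human review); the loop structure, not the toy agents, is the point.

```python
# Minimal sketch of the GA/VA loop. Verification is always external to
# the generator; rejection feedback reconditions the next generation.
def converge(prompt, generate, verify, max_rounds=5):
    context = prompt
    for _ in range(max_rounds):
        artifact = generate(context)
        ok, feedback = verify(artifact)
        if ok:
            return artifact
        # Next generation is conditioned on empirical evidence of failure,
        # not just the original prompt.
        context = prompt + "\nPrevious attempt failed: " + feedback
    raise RuntimeError("no convergence within budget")

# Toy agents: the 'model' only succeeds once told what went wrong.
gen = lambda ctx: "fixed" if "failed" in ctx else "buggy"
ver = lambda a: (True, "") if a == "fixed" else (False, "test X failed")
print(converge("write f()", gen, ver))  # fixed
```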
Versus is the practical implementation of the adversarial critique phase within IACDM — Interactive Adversarial Convergence Development Methodology — a structured framework for AI-assisted software development.
IACDM was developed and validated in production R&D projects (2025–2026) and rests on three central theses:
Thesis 1: The quality of AI-assisted software is determined by the quality of the design that precedes code generation, not by the model's generative capability.
Thesis 2: AI is most productive when operating on granular work units with explicit scope, documented assumptions, and well-defined interfaces.
Thesis 3: Deep understanding of the problem is a precondition for any technical decision. Understanding the wrong problem is orders of magnitude more expensive than understanding it slowly.
```mermaid
flowchart TD
subgraph row1["Design"]
P0["0 · Problem Discovery"] --> P1["1 · Architecture"]
P1 --> P2["⚔️ 2 · Adversarial Critique"]
P2 --> P3["3 · Simplification"]
P3 -->|"Δ ≥ 15% or critical"| P2
end
subgraph row2["Build"]
P4["4 · Convergence Gate"] --> P5["5 · Code Implementation"]
P5 --> P6["6 · Tests"]
P6 --> P7["7 · Post-Review"]
end
P3 -->|"Δ < 15%, zero critical"| P4
style P2 fill:#4a1942,color:#fff,stroke:#9b59b6
style row1 fill:#111,stroke:#444
style row2 fill:#111,stroke:#444
```
Versus operates primarily at Phase 2 — the adversarial critique layer that attacks each architectural decision with specialized lenses before any code is written or accepted.
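The exit condition from the design loop reduces to one check. A sketch under the assumption that Δ measures the design change between critique rounds (the document does not define it further):

```python
# Phase 3 -> Gate G4 decision: loop back to adversarial critique while the
# inter-round design delta is >= 15% or any critical finding remains open.
def gate_g4(delta_pct: float, critical_findings: int) -> str:
    if delta_pct < 15.0 and critical_findings == 0:
        return "advance to Phase 4 (Convergence Gate)"
    return "return to Phase 2 (Adversarial Critique)"

print(gate_g4(delta_pct=8.0, critical_findings=0))
print(gate_g4(delta_pct=8.0, critical_findings=1))
```

Note the asymmetry: a single critical finding blocks convergence regardless of how small the delta is.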
The full IACDM technical foundation is documented in:
📄 IACDM: Interactive Adversarial Convergence Development Methodology — Technical Foundation
📑 Preprint — TechRxiv (IEEE) · Jasmine Moreira, 2026
Why adversarial instead of collaborative?
LLMs trained with RLHF have a documented bias toward agreement. When asked "is this design good?", the model tends to say yes. This is what Argyris (1977) called single-loop learning — acting without questioning the action.
Versus enforces double-loop learning: it questions not just the solution, but the assumptions that sustain it. The prompt posture is explicitly adversarial: find what is wrong, not confirm what is right.
Why lenses instead of a single critique pass?
Generic critique ("attack this suggestion") produces superficial and uniform observations. The same LLM operates all rounds — without operationally distinct criteria, diversification is illusory. Specialized lenses with binary criteria constitute functionally distinct verification agents, even when operated by the same model.
This maps directly to ATAM (Kazman et al., 2000): different evaluators attack through distinct quality attributes, but trade-offs are resolved by a single entity maintaining systemic vision.
Why is the verification agent always external?
This is not a design choice — it is a structural constraint. A model cannot reliably verify its own output. Self-correction in LLMs has been shown to degrade without external feedback (Huang et al., 2023). The separation between GA and VA is the methodological core that makes verification meaningful.
Why can you switch models after Phase 4?
Gate G₄ marks a natural cost-optimization boundary. Phases 0–4 (design) require deep reasoning over ambiguous, incomplete information — use your most capable model. After G₄, the architecture is fully validated and all decisions are persisted in specs/. Phases 5–7 operate on well-specified, bounded tasks where a faster, cheaper model performs equivalently. No context is lost — it lives in specs/, not in the conversation.
Versus · Copilot — for GitHub Copilot users:
`ext install JasmineMoreira.versus-copilot`
Versus · Claude — for Claude (Anthropic) users:
`ext install JasmineMoreira.versus-claude`
Or search "Versus" in the VS Code Extensions panel.
Requirements: VS Code 1.85+, respective AI assistant extension active.
- Spec-Driven Test Protocol — tests against specs, not implementation. Test Map, negative tests, spec coverage report (v0.3.2)
- Custom lens definition (project-specific critique criteria)
- Coverage matrix export (JSON / Markdown)
- Gate history — track finding evolution across sessions
- Team mode — shared lens configuration via `.versus/config.json`
- Integration with the IACDM `specs/` knowledge repository
The lenses and critique model implemented in Versus are grounded in:
- Argyris, C. (1977). Double loop learning in organizations. Harvard Business Review.
- Brooks, F. (1987). No Silver Bullet. Computer, 20(4).
- Huang, J. et al. (2023). Large Language Models Cannot Self-Correct Reasoning Yet. arXiv:2310.01798.
- Kazman, R. et al. (2000). ATAM: Method for Architecture Evaluation. SEI/CMU.
- Leveson, N. (2011). Engineering a Safer World. MIT Press.
- METR (2025). Measuring the Impact of Early-2025 AI on Experienced Developer Productivity. arXiv:2507.09089.
- Popper, K. (1959). The Logic of Scientific Discovery. Hutchinson.
Full reference list in the IACDM technical document.
Jasmine Moreira · Developer · Researcher · IACDM creator
This software is proprietary. The code is not open source.
The IACDM methodology documentation is made available for research and educational purposes. See LICENSE for details.
"A converged design is not a finished design, but one that is certainly ready to evolve."
— IACDM Technical Foundation, Jasmine Moreira