Add expanded anti-rationalization patterns #4

@mickeylorenzini

Problem

Harden has anti-patterns that describe audit failures, but not patterns that detect agent self-deception during execution. Agents rationalize skipping steps with plausible-sounding excuses that slip past current checks.

Examples of agent rationalization:

  • "I already tested it manually" → skipped automated verification
  • "This is straightforward enough to not need tests" → skipped TDD
  • "The output looks reasonable" → no oracle verification
  • "I'll come back to this later" → deferred and forgotten

Proposed Solution

Add explicit anti-rationalization patterns to harden's audit that catch:

  • Claims of completed work without evidence (logs, test output, screenshots)
  • Substitution of easier tasks for harder required tasks
  • Deferral language that indicates skipped steps
  • "Looks good/reasonable" assessments without comparison to an independent source
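One way the four categories above could be turned into a text-level detection heuristic is sketched below. This is a minimal sketch, not harden's implementation: the category names, the trigger phrases, and the `find_rationalizations` helper are all hypothetical and would need tuning against real agent output.

```python
import re
from dataclasses import dataclass

# Hypothetical categories and trigger phrases (assumptions for illustration;
# none of these are taken from SKILL.md).
PATTERNS = {
    "unevidenced-claim": re.compile(
        r"\b(already|manually)\s+(tested|verified|checked)\b", re.I),
    "task-substitution": re.compile(
        r"\b(straightforward|simple|trivial)\s+enough\b", re.I),
    "deferral": re.compile(
        r"\b(come back to|revisit|later|for now|todo)\b", re.I),
    "unverified-assessment": re.compile(
        r"\b(looks|seems|appears)\s+(good|fine|reasonable|correct)\b", re.I),
}


@dataclass(frozen=True)
class Finding:
    category: str   # key from PATTERNS
    line_no: int    # 1-based line number in the audited text
    excerpt: str    # the phrase that triggered the match


def find_rationalizations(text: str) -> list[Finding]:
    """Return at most one finding per category per line of the text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append(Finding(category, line_no, match.group(0)))
    return findings
```

Phrase matching like this only surfaces candidates. Words such as "later" also appear in legitimate text, so each finding would still need a reviewer (or a later audit pass) to confirm that a step was actually skipped.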

Likely Affected Files/Modules

  • SKILL.md — new patterns in anti-pattern section or as a detection dimension within existing passes

Acceptance Criteria

  • At least 5 documented anti-rationalization patterns with examples
  • Each pattern has a detection heuristic (what text/behavior triggers it)
  • Patterns integrated into the appropriate audit pass
  • Tested: an agent-generated deliverable with a known rationalized skip is flagged
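The last criterion could be exercised with a seeded deliverable and a control. The sketch below assumes a hypothetical `unevidenced_claims` check that flags a completion claim when no evidence marker (a fenced block, a `.log` or `.png` reference, or an "N passed" line) appears on the same line or within the next few lines. The claim words, the evidence markers, and the window size are all assumptions for illustration.

```python
import re

# Assumed claim words and evidence markers; both lists are illustrative only.
CLAIM = re.compile(r"\b(tested|verified|passes|passing|works)\b", re.I)
EVIDENCE = re.compile(r"^`{3}|\.(log|png)\b|\b\d+ passed\b", re.I)


def unevidenced_claims(deliverable: str, window: int = 3) -> list[int]:
    """Return 1-based line numbers of claims with no evidence nearby.

    A claim counts as evidenced if the claim line itself or any of the
    next `window` lines contains an evidence marker.
    """
    lines = deliverable.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        if not CLAIM.search(line):
            continue
        nearby = lines[i : i + window + 1]
        if not any(EVIDENCE.search(candidate) for candidate in nearby):
            flagged.append(i + 1)
    return flagged
```

A test along these lines would feed the check one deliverable containing a known rationalized skip (for example, "I tested the parser manually and it works." with nothing attached) and one control where the same claim is followed by test output, then assert that only the first is flagged.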
