Problem
LLM outputs can exhibit sycophancy: agreeing with the user, giving overly positive assessments, and avoiding pushback. The behavior is driven by training that optimizes for user satisfaction. It degrades deliverable quality by:
- Affirming incorrect assumptions instead of challenging them
- Providing unwarranted positive assessments ("great approach!") without critical analysis
- Avoiding necessary pushback on flawed designs or specs
- Expressing confidence without supporting evidence
This is distinct from the problem that anti-rationalization targets (the agent deceiving itself about skipped work). Sycophancy is agent-to-user deception through excessive agreeableness.
Proposed Solution
Investigate and implement detection of sycophantic patterns in agent-generated deliverables:
- Research existing AI alignment literature on sycophancy detection
- Define detectable patterns: unqualified agreement, unwarranted positive assessments, avoidance of tradeoff discussion, confidence without evidence
- Design the detection as a harden audit pass or as a new detection dimension
- Consider: can harden detect sycophancy in its own output? (meta-concern)
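To make the pattern list above more concrete, here is a minimal sketch of what a first heuristic detector might look like. It is only an illustration under stated assumptions: the function name `detect_sycophancy`, the `Finding` type, and every phrase list are hypothetical and are not taken from harden or SKILL.md. A real pass would likely need model-based judgment rather than regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists, for illustration only (not from harden or SKILL.md).
PATTERNS = {
    "unqualified_agreement": re.compile(
        r"\b(you're|you are) (absolutely |completely )?right\b"
        r"|\bI (completely |totally )?agree\b",
        re.I,
    ),
    "false_positive": re.compile(
        r"\b(great|excellent|perfect|fantastic) (approach|idea|design|plan)\b",
        re.I,
    ),
    "confidence_without_evidence": re.compile(
        r"\b(definitely|certainly|guaranteed|without a doubt)\b", re.I
    ),
}
# Words that suggest the text discusses tradeoffs somewhere.
TRADEOFF = re.compile(
    r"\b(trade-?offs?|however|downsides?|risks?|drawbacks?|caveats?)\b", re.I
)
# Words that suggest a confident claim is backed by a stated reason.
EVIDENCE = re.compile(r"\b(because|since|measured|benchmarks?|according to)\b", re.I)


@dataclass
class Finding:
    pattern: str   # which pattern fired
    excerpt: str   # the offending sentence ("" for document-level findings)


def detect_sycophancy(text: str) -> list[Finding]:
    """Flag sentences that match one of the assumed sycophancy patterns."""
    findings: list[Finding] = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name, rx in PATTERNS.items():
            if not rx.search(sentence):
                continue
            # Confident wording is acceptable when the sentence gives a reason.
            if name == "confidence_without_evidence" and EVIDENCE.search(sentence):
                continue
            findings.append(Finding(name, sentence))
    # Document-level check: no tradeoff vocabulary anywhere in the text.
    if text.strip() and not TRADEOFF.search(text):
        findings.append(Finding("no_tradeoff_discussion", ""))
    return findings
```

Called on a reply such as "Great approach! You're absolutely right. This will definitely scale.", the sketch reports all four patterns. Keyword matching like this will misfire in both directions, which is one reason the research step comes first in the list above.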
Likely Affected Files/Modules
SKILL.md — new detection dimension or dedicated pass
Acceptance Criteria