API auth failure detection and auto-recovery

## Problem

Two `authentication_failed` errors hit on Mar 21, 10 seconds apart. Root cause was accepted as 'transient' without evidence. If auth fails mid-job, the job fails silently — no notification to Frederik, no retry mechanism.

Auth failures during running jobs → `status: failed`, no alert.

## Fix

1. **Pattern detection:** If 2+ auth failures occur within 1 hour, automatically:
   - Create a GH issue with diagnostics (time, job IDs, error text)
   - Send ntfy notification
   - Stop queueing new Claude jobs until manual clearance

2. **Auth preflight:** Before every job, verify Claude API auth:
   ```bash
   # Quick API health check
   curl -s -o /dev/null -w '%{http_code}'      -H 'x-api-key: $ANTHROPIC_API_KEY'      'https://api.anthropic.com/v1/models' | grep -q '200'
   ```

3. **Retry with backoff:** On single auth failure mid-job, wait 30 seconds and retry once before marking failed.

4. **Morning report:** Include API auth status (last successful call, any recent failures)

## Acceptance Criteria

- [ ] Auth failure count tracked per hour
- [ ] 2+ failures in 1h → GH issue auto-created
- [ ] Single failure → retry once with 30s backoff
- [ ] Morning report shows auth health
- [ ] Preflight check before every job

## Context

From Jensen limitation audit (2026-03-22). Workstream: WS-001.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API auth failure detection and auto-recovery #19

Problem

Fix

Acceptance Criteria

Context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

API auth failure detection and auto-recovery #19

Description

Problem

Fix

Acceptance Criteria

Context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions