Make outbound /send durable (enqueue + worker) so it survives restarts & crashes #327

## Problem

The outbound send path is **synchronous and not durable**: `a.sender.Send()` (`internal/agent/api.go:1185`) runs the SES SMTP transaction inline in the HTTP request, with only *in-process* retries (1s+5s+15s backoff for transient SES 4xx). If the process dies mid-send, nothing resumes it.

This bites on every deploy (and any crash):
- The server gets SIGTERM and drains via `http.Server.Shutdown` (~30s budget, `cmd/e2a/main.go`). A send that finishes in time is fine.
- A send still in-flight when the grace window expires is **SIGKILLed → lost**, with no server-side retry. Note that the relay's own backoff alone (1s+5s+15s ≈ 21s, before counting the SMTP round-trips) can exceed a tight grace window. (Mitigated, not fixed, by `stop_grace_period: 35s` in Mnexa-AI/e2a-ops#153.)
- On a hard SIGKILL the idempotency guard's **deferred `finalize()` never runs** (`internal/agent/idempotency_guard.go`), leaving the claim stuck "in-flight" → client retries hit 409 until the lock expires. And if the kill landed **after SES accepted but before finalize**, the original delivers, the server has no record, and the eventual client retry **re-sends → duplicate**.

Net: outbound send delivers once on the happy path, but a mid-flight kill opens a small **lost-or-duplicated** window that the server never recovers from on its own.

## Proposed solution

Make the send durable, mirroring the pattern the webhook delivery path already uses (transactional outbox + River worker):

1. **Enqueue, don't send inline.** In the send handler, write the composed message row **and** a `send` job in **one transaction** (durable outbox), then return `202 Accepted` with the message id. No SES call on the request path.
2. **Relay from a worker.** A River (or equivalent) worker picks up the job and does the SES SMTP relay, with bounded retries + backoff, updating the message's send status (`queued → sending → sent | failed`) and `provider_message_id`.
3. **Idempotent delivery.** Carry an idempotency token on the job so a worker retry after a crash can't double-send (dedup on the message id / a sent marker checked before the SES call).
4. **Reconcile** stuck `sending` rows on startup (re-enqueue), and surface terminal `failed` to the agent (webhook / status).

## Acceptance criteria
- A `kill -9` of the server mid-send results in the message being **delivered exactly once** after restart (no loss, no duplicate).
- A deploy never loses an accepted send.
- `/send` returns promptly (no longer blocks on the SES round-trip + backoff).
- HITL hold, screening, and the existing idempotency-key contract still hold.

## Context
Surfaced while reviewing deploy-time availability. Partial mitigation shipped: `stop_grace_period: 35s` (Mnexa-AI/e2a-ops#153) lets graceful drain + relay backoff finish, converting most in-flight sends from killed → completed. The durable path is still the real fix.
