Skip to content

Email system cleanup#713

Merged
xlcrr merged 13 commits into
masterfrom
fixes/2026/06/fix-email-addresses-cleanup
Jun 28, 2026
Merged

Email system cleanup#713
xlcrr merged 13 commits into
masterfrom
fixes/2026/06/fix-email-addresses-cleanup

Conversation

@xlcrr

@xlcrr xlcrr commented Jun 27, 2026

Copy link
Copy Markdown
Member

Email deliverability — safe resends, bounce handling, list hygiene (v5.12.8)

📣 What this delivers (product)

We send periodic updates to our mailing list. The last send ("Update 28")
stalled on a malformed address, and we had no record of who had already
received it
— so re-running would have re-mailed thousands. Separately, years
of dead addresses were quietly dragging down deliverability.

This PR fixes the foundation so that:

  • Resends are safe. Every recipient of a campaign is recorded; a re-run only
    emails people who haven't been sent yet — never twice.
  • The list cleans itself. Hard bounces and spam complaints from AWS SES now
    flow back automatically and permanently suppress those addresses.
  • Junk can't get in. Obvious bad addresses (e.g. name@8) are rejected at
    signup and again right before sending.
  • Deliverability is protected for the EU Presidency–window campaigns —
    healthier sender reputation, fewer spam-folder landings.

User-facing behaviour is otherwise unchanged. No new UI.

🔧 How we did it (technical)

The ledger is the centrepiece. A new email_sends table with a
UNIQUE(campaign, email) key. That unique key doubles as an atomic claim
lock
: the send command claims a row (insertOrIgnore / conditional UPDATE)
before dispatching, so concurrent workers can't double-send and a crash is
recoverable. Re-runs skip accepted / in-flight queued rows and re-evaluate
previously-skipped ones (an address can become sendable later).

Suppression is the source of truth for "don't email this."

  • email_suppressions — one row per normalized address. EmailSuppression::suppress()
    is the single authority for the precedence rule: a complaint outranks a
    bounce and is never downgraded.
  • email_events — raw SES payloads for audit + idempotency, keyed
    UNIQUE(sns_id, email) so duplicate SNS deliveries are harmless.

The SNS webhook (POST /webhooks/aws/ses/sns) verifies the AWS signature
and TopicArn before anything else, reads the raw text/plain body, and is
CSRF-exempt. Permanent bounces and complaints suppress; transient bounces are
ignored; undetermined ones are logged only. The MessageValidator is
container-injected so it can be faked in tests.

Validation lives in one place — EmailAddress::isSendable() (filter_var +
a dotted-domain check, no DNS lookups) — shared by the subscribe endpoint and
the send guard. EmailAddress::normalize() is the single normalization helper.

The job (DispatchEmail) carries an EmailRecipient value object (not a
User, so subscribers work too) and writes ledger accepted/failed only in
its terminal hooks.

Notable decisions

  • Time-based retries, not $tries. The RateLimited middleware releases
    throttled jobs back to the queue, and each release increments attempts — a
    fixed $tries would mark valid recipients failed mid-blast. Uses
    retryUntil(+6h) + maxExceptions=3 instead.
  • Send eligibility unchanged: emailsub = 1 (still includes unverified
    users). users.email_verified_at was added as a forward-compat column
    (dual-written by confirmEmail()), with a guarded email:backfill-verified-at.
  • --test--only-email (breaking flag rename); --dry-run writes
    nothing; --retry-stale-queued is validated (≥60s) to prevent reclaiming
    in-flight rows.
  • One-job-per-recipient is kept — blast radius of a bad address is exactly one job.

Schema

email_suppressions, email_events, email_sends, and a nullable
users.email_verified_at. (Prod is MySQL 8.4 — the composite unique on utf8mb4
is well within index limits.)

New artisan commands

  • olm:send-email-to-subscribed --campaign=<key> — ledger-tracked, resumable send
  • email:import-suppressions <file> — backfill the SES account suppression list
  • email:backfill-verified-at [--apply] — backfill email_verified_at

🚀 Deploy steps (after merge)

  1. php artisan migrate
  2. AWS: create an SNS topic in eu-west-1, point SES bounce+complaint
    notifications at it, subscribe the webhook URL, set SES_SNS_TOPIC_ARN
  3. Backfill suppressions:
    aws sesv2 list-suppressed-destinations --region eu-west-1 --page-size 1000 --output json > ses-suppressed.json
    php artisan email:import-suppressions ses-suppressed.json
  4. php artisan email:backfill-verified-at --apply
  5. Opt out the 10 malformed legacy users rows (SQL in the changelog)
  6. Resend Update 28: --campaign=update28 --dry-run to review, then the real run
    (ensure a queue worker is running)

✅ Testing

1340 passed, 1 skipped, 0 failures. New coverage under tests/Feature/Email/
(validation, ledger claim/resume/retry, webhook matrix, suppression import,
verification migration). Includes external code-review fixes and a multi-agent
simplification pass. Adds one dependency: aws/aws-php-sns-message-validator
(signature verification only).

xlcrr and others added 13 commits June 27, 2026 16:43
…fill (v5.12.8)

Make bulk sends safe to resend and stop mailing dead addresses.

- email_sends campaign ledger (atomic claim via uniq(campaign,email)) makes
  olm:send-email-to-subscribed runnable, interruptible, resumable, auditable;
  rewritten with --campaign, suppression/invalid/duplicate guards, re-run-safe
  reclaim of skipped rows, dry-run report, --limit/--retry-failed/
  --retry-stale-queued, --test renamed to --only-email. Ledger accepted/failed
  written in the job's terminal hooks.
- EmailRecipient DTO so DispatchEmail no longer assumes a User.
- App\Support\EmailAddress::isSendable() single source of truth; SubscribersController
  refactored onto it.
- email_suppressions (source of truth, complaint outranks bounce) +
  email_events (audit/idempotency).
- POST /webhooks/aws/ses/sns: SNS signature + TopicArn checks first, raw
  text/plain body, CSRF-exempt, container-injected validator. Permanent bounce
  & complaint suppress; transient ignored; undetermined logged only.
- email:import-suppressions backfills the SES account suppression list.
- email_verified_at column + confirmEmail() dual-write + email:backfill-verified-at
  (dry-run/--apply). Send eligibility unchanged (emailsub=1, includes verified=0).
- Adds aws/aws-php-sns-message-validator (signature verification only).

Tests: tests/Feature/Email/* — full suite 1335 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….4.8 (v5.12.8)

- CLAUDE.md: email deliverability invariants (isSendable single source of truth,
  suppression precedence, ledger atomic claim, SNS webhook, email_verified_at).
- Skills: api-endpoints (web mailing/webhook routes + invariant), testing-patterns
  (tests/Feature/Email + MessageValidator fake note).
- Profile.md: verified + email_verified_at columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…queued guard

- P1: DispatchEmail uses retryUntil(+6h) + maxExceptions=3 instead of $tries=3.
  RateLimited releases increment attempts, so a fixed $tries would mark valid
  recipients failed without sending during a throttled bulk run.
- P2: SNS SubscriptionConfirmation returns 502 when the SubscribeURL call is not
  successful, so SNS retries instead of leaving the subscription pending.
- P3: --retry-stale-queued rejects non-positive / <60s values (a =0 typo would
  reclaim in-flight queued rows and double-send).
- Perf: send command's user query drops the photosCount global scope + $with
  eager loads and selects only id,email,sub_token.
- 3 regression tests added. Full suite: 1338 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iber row

When a footer subscribe matches a registered account, set the user's emailsub=1
rather than creating a subscriber row the send command would exclude. Fixes the
"subscription created but never mailed" gap for opted-out registered users
(Codex P2). Case-insensitive match; 2 regression tests.

Full suite: 1340 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ocks

- EmailAddress::normalize() is now the single source of truth for
  strtolower(trim()); send command, SNS webhook, and EmailSuppression all use
  it (was duplicated as 2 private methods + 1 inline).
- Send command: verdict->skip-status collapsed to a match expression.
- Trimmed verbose migration docblocks (row-format math is moot on MySQL 8.4;
  fixed a stale readme/sql/ reference).

No behaviour change. Full suite: 1340 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revert the earlier "resubscribe a matched user" logic. The subscribe endpoint
is for anonymous visitors who want newsletter emails; it should not assume or
mutate user accounts. It now just creates a subscribers row (validated by
EmailAddress::isSendable + unique:subscribers). Removed the 2 user-mutation
tests.

Full suite: 1338 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ain regex

Laravel's default `email` rule is RFC-permissive and accepts the malformed
addresses that broke the Update 28 send (e.g. Estherbarriga@8). `email:filter`
(filter_var) rejects them with no DNS lookup, so the custom regex is redundant
on PHP 8.3. SubscribersController now uses `email:filter`; EmailAddress::isSendable()
slims to a filter_var check (kept for the send-time guard, a non-validation
context). normalize() unchanged.

Full suite: 1338 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex performance audit fixes for the send command:
- decide() no longer queries per candidate. runChunk() batch-loads each chunk's
  email_sends rows in one whereIn and passes the row in — kills the ~2N N+1
  (tally + execute passes) of single-row ledger lookups.
- claim() skips the no-op reclaim UPDATE when no row exists (the common
  first-run case); keeps insertOrIgnore + atomic re-check for safety.
- chunk() -> chunkById() (keyset paging) for users and subscribers: faster on
  large tables, stable if rows change mid-run.
- Added index on subscribers.email (anti-join in the send + unique check on
  every signup). Plain index, not unique — legacy dupes would fail a unique one.

Skipped (with reasons): users.emailsub index (low selectivity, hot-table write
cost, rare command); --limit quick mode (tally is cheap now); full batch-insert
claim (complexity > benefit on a 12/sec-throttled send).

Behaviour unchanged. Full suite: 1338 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ise docs

Per review: users.email_verified_at + email:backfill-verified-at are
verification (Phase-2-adjacent) and don't help the mass-email send failure.
Removed the migration, command, model dual-write/cast, and tests so the PR is
just the deliverability fix.

Also reverted the CLAUDE.md invariants block and the .ai/skills (api-endpoints,
testing-patterns) additions — premature to document invariants before the
feature settles, and they made the PR look larger than the product change.
readme/API.md + ArtisanCommands.md (the actual API/command docs) stay.

Full suite: 1334 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review: make the core behaviour read top-to-bottom instead of like a little
framework.
- handle() is now one streaming loop: stream recipient -> prepare ledger row ->
  dispatch or skip.
- recipients() is a generator yielding [EmailRecipient, ?ledgerRow], hiding the
  chunked, batch-loaded reads behind a plain foreach (replaces the
  eachCandidate callback + $stopped stop-state).
- prepare() is the single place a ledger row is classified and written
  (wraps classify/claim/logSkip).
- Dropped the separate read-only tally pass: a cheap preflight (eligible /
  duplicate / already-accepted counts) precedes the confirm; the run prints the
  dispatched/skipped tallies.

Kept (pushed back on dropping): the ledger, suppression checks,
one-job-per-recipient, and the --retry-failed / --retry-stale-queued recovery
valves — now contained inside prepare()/classify(). Kept the per-chunk batch
ledger load, encapsulated in the generator so control flow stays linear.

Behaviour unchanged. Full suite: 1334 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…odel

4-agent architecture/simplify audit. Applied the genuine wins:
- prepare() returns [outcome, sendId] instead of mutating $summary by reference;
  the caller tallies — side effects are explicit at the call site.
- Inlined the staleThreshold(): Carbon|false|null tri-state back into handle()
  (the false sentinel was an awkward error signal).
- Trimmed the run summary to the keys it actually prints.
- Moved the duplicated Carbon timestamp parsing into EmailSuppression::suppress()
  (now takes a raw ?string and parses once in the model) — removed the
  copies from the SNS webhook and the import command.
- streamPass() normalizes each email once per recipient.
- Trimmed a migration docblock.

Architecture confirmed sound by the audit — ledger / suppression / one-job-per-
recipient / webhook kept as-is; extracting a service would be over-engineering.

No behaviour change. Full suite: 1334 passed, 1 skipped, 0 failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xlcrr xlcrr merged commit 80af64f into master Jun 28, 2026
1 check passed
@xlcrr xlcrr deleted the fixes/2026/06/fix-email-addresses-cleanup branch June 28, 2026 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant