Remove canonical cache auto-refresh (sweep + on-read TTL) #83

Merged
windoze95 merged 1 commit into main from fix/remove-canonical-refresh-sweep
Jun 30, 2026

@windoze95

Owner

Problem

The app showed "Website blocked access" on recipe imports/previews. Root cause: our Firecrawl account was returning HTTP 402 — the free tier (1000 credits/mo) was fully exhausted.

The drain was self-inflicted, not abuse (the app is unreleased). The canonical-cache background refresh ran every 30 min, re-scraping entries stale for >7 days. For bot-blocked sites it routed to Firecrawl, but a failed refresh never advanced FetchedAt, so the same handful of blocked URLs retried every 30 min forever (~5 URLs × 48 cycles/day ≈ 240 Firecrawl calls/day → 1000 credits gone in ~5 days). Logs confirmed it: 124/124 Firecrawl calls in 14h were 402, across only 4–7 distinct URLs/day.

Change

Re-fetching imported recipes is wasteful — recipe pages are near-static — so this removes automatic re-fetch entirely:

  • Remove the 30-min background sweep (StartCanonicalBackgroundTasks / refreshStaleCanonicals) and its router wiring.
  • Remove the on-read canonicalTTL re-fetch from ImportFromURL, PreviewFromURL, and PreviewFromURLWithMultiCheck. A cached URL is now served permanently; re-extraction happens only on a true cache miss (or manually if a problem is reported).
  • Strip dead code: the canonicalTTL const + orphaned time import, and GetStaleEntries across the repo interface, implementation, and mock.

Behavior after

  • First time a URL is seen → fetched + extracted once, then cached (the only automatic fetch).
  • Every import/preview of that URL thereafter → served from cache regardless of age. Never re-fetched.
  • Opening an already-saved recipe → reads the user's own copy; never touches the source.

Tests

The two *_StaleCanonicalSkipped tests become *_OldCanonicalStillServed, asserting a year-old entry is served from cache without re-fetching. go test ./... -count=1 → green (11 packages, 0 failures).

Follow-up (not in this PR)

Firecrawl credits stay at 0 until the Jul 15 reset (or a manual top-up); with this loop gone the free tier is sufficient going forward. Optional: a CloudWatch alarm on Firecrawl 402 so exhaustion can't go unnoticed again.

🤖 Generated with Claude Code

https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19

The 30-min background sweep (StartCanonicalBackgroundTasks /
refreshStaleCanonicals) re-scraped >TTL-stale canonical entries every
cycle, and a failed refresh never advanced FetchedAt — so blocked-site
entries retried via Firecrawl forever, draining the Firecrawl free tier
(1000 credits) in ~5 days and surfacing as "Website blocked access"
(Firecrawl HTTP 402) across the app.

Canonical extractions are near-static, so re-fetching is wasteful.
Remove both the background sweep and the on-read canonicalTTL re-fetch:
a cached URL is served permanently and only extracted on a true cache
miss (or manually if a problem is reported). Strip the now-dead
GetStaleEntries from the repo interface, implementation, and mock.

Tests: the two *_StaleCanonicalSkipped tests become
*_OldCanonicalStillServed, asserting an old entry is served from cache
without re-fetching. Full suite green (go test ./... -count=1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@windoze95 windoze95 merged commit e7bdb85 into main Jun 30, 2026
1 check passed
@windoze95 windoze95 deleted the fix/remove-canonical-refresh-sweep branch June 30, 2026 00:25