
perf(#26): optimize ScrapingBee — render_js=false, session_id, ai_selector, usage monitoring #27

Merged
Jing-yilin merged 3 commits into develop from feature/26-scrapingbee-optimize on Feb 27, 2026

@Jing-yilin

Closes #26

Changes

render_js=false (5 → 1 credit per page, 80% saving)

Kickstarter's /discover/advanced is server-side rendered, so the HTML is complete without a headless browser. Switching to render_js=false drops the cron crawl's cost from 5 credits per page to 1 credit per page.

| Scope | Before | After |
| --- | --- | --- |
| 26 categories/month | 78,000 credits | 15,600 credits |
| 124 categories/month | 312,750 credits | 62,550 credits |
| AI search (per request) | 10 credits | 6 credits |
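The figures above follow from simple multiplication. As a minimal sketch (the helper names are hypothetical and not part of this PR; the page counts are derived from the table by dividing the "Before" column by 5):

```go
package main

import "fmt"

// Credit costs per request as stated in this PR: 5 with JS rendering, 1 without.
const (
	creditsWithJS    = 5
	creditsWithoutJS = 1
)

// monthlyCredits returns the credits used to fetch `pages` pages at `perPage` credits each.
func monthlyCredits(pages, perPage int) int {
	return pages * perPage
}

// savingPercent returns the relative saving, in percent, when going from `before` to `after`.
func savingPercent(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	// 15,600 pages/month is implied by the 26-category row (78,000 / 5).
	pages := 15600
	before := monthlyCredits(pages, creditsWithJS)
	after := monthlyCredits(pages, creditsWithoutJS)
	fmt.Printf("before=%d after=%d saving=%.0f%%\n", before, after, savingPercent(before, after))
}
```

The same arithmetic gives a 40% saving for the AI search row (10 to 6 credits), since only part of that request's cost comes from JS rendering.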

session_id per (sort × category) crawl

All pages for the same category crawl share a single proxy IP for up to 5 minutes, reducing the chance of Kickstarter rate-limiting between pages.

ai_selector="[data-project]" for AI search

Focuses AI extraction only on project card elements, reducing token processing overhead.
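For illustration, the AI search parameters might be assembled roughly like this. `ai_query` and `ai_selector` are ScrapingBee query parameters; the helper name, the query text, and the choice to also send `render_js=false` here are assumptions, not code taken from this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// projectCardSelector is the CSS selector named in this PR.
const projectCardSelector = "[data-project]"

// buildAISearchParams returns query parameters for an AI extraction request
// that is restricted to project card elements.
func buildAISearchParams(apiKey, target, query string) url.Values {
	q := url.Values{}
	q.Set("api_key", apiKey)
	q.Set("url", target)
	q.Set("render_js", "false") // assumed: AI search also skips JS rendering
	q.Set("ai_query", query)
	q.Set("ai_selector", projectCardSelector)
	return q
}

func main() {
	p := buildAISearchParams(
		"YOUR_API_KEY",
		"https://www.kickstarter.com/discover/advanced?term=keyboard",
		"List each project's name and funding percentage", // hypothetical prompt
	)
	fmt.Println("https://app.scrapingbee.com/api/v1/?" + p.Encode())
}
```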

Credit usage monitoring at startup

On service start, logs current monthly credit consumption and warns if ≥ 80% consumed.

ScrapingBee usage: 1250/250000 credits (0.5%), renews 2026-03-27

New public API

  • ScrapingBeeClient.FetchHTMLInSession(ctx, url, sessionID) — 1 credit, sticky IP
  • ScrapingBeeClient.FetchUsage(ctx) — returns UsageResult
  • KickstarterScrapingService.LogUsage() — logs usage with warning threshold
  • DiscoverCampaigns gains a sessionID int parameter (signature change for existing callers)

@Jing-yilin Jing-yilin merged commit 1f9f52b into develop Feb 27, 2026
2 checks passed
@Jing-yilin Jing-yilin deleted the feature/26-scrapingbee-optimize branch February 27, 2026 13:00