Skip to content

Serve all client API requests from DB instead of ScrapingBee #37

@Jing-yilin

Description

@Jing-yilin

Problem

Currently, all client requests for trending/newest/ending sorts call ScrapingBee API, consuming 1 credit per request. This is wasteful since:

  • DB already has 5,614+ campaigns updated daily via cron
  • Multiple users requesting the same data = duplicate API calls
  • Unnecessary cost (~/mo plan being exhausted quickly)

Current Flow

User → API → ScrapingBee (2-5s, 1 credit) → Response

Current Credit Usage

  • Every trending request: 1 credit
  • Every newest request: 1 credit
  • Every ending request: 1 credit
  • Hot sort: already uses DB ✓

Solution

Serve all client queries from database:

User → API → PostgreSQL (<5ms, 0 credit) → Response

DB Mapping

  • trending → ORDER BY velocity_24h DESC
  • newest → ORDER BY first_seen_at DESC
  • ending → ORDER BY deadline ASC
  • hot → ORDER BY velocity_24h DESC (already implemented)

Required Changes

  1. campaigns.go: Replace ScrapingBee calls with DB queries
  2. model.go: Add indexes for performance
  3. db.go: Add composite indexes migration

Benefits

  • Cost: ~90% reduction in ScrapingBee usage
  • Speed: 2-5s → <5ms response time
  • Scalability: Support unlimited concurrent users
  • Data freshness: Still updated daily via cron (2:00 UTC)

Implementation

Add DB Indexes

CREATE INDEX idx_campaigns_trending ON campaigns(state, velocity_24h DESC, percent_funded DESC);
CREATE INDEX idx_campaigns_newest ON campaigns(state, first_seen_at DESC);
CREATE INDEX idx_campaigns_ending ON campaigns(state, deadline ASC);
CREATE INDEX idx_campaigns_category_trending ON campaigns(state, category_id, velocity_24h DESC);

Update Handler

All sort modes served from DB with offset-based pagination.

Related

  • Nightly cron already populates DB with fresh data
  • ScrapingBee only used for:
    • Nightly crawl (cron job)
    • User search queries (manual /search endpoint)

Acceptance Criteria

  • All trending/newest/ending requests use DB
  • Response time < 10ms for first page
  • Pagination works correctly
  • Category filtering works
  • ScrapingBee fallback only if DB unavailable
  • Add indexes to optimize queries
  • Verify credit usage drops to near-zero for client requests

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions