
fix: Camoufox anti-detection upgrade + Docker scraping fix #1

Open
PHY041 wants to merge 4 commits into `main` from `feat/camoufox-upgrade`


PHY041 (owner) commented Mar 6, 2026

Summary

  • Fix Camoufox scraping in Docker: use `headless='virtual'` for full WebGL/GLX support, and pre-download the browser binary at build time with `python -m camoufox fetch`
  • Add residential proxy support via the `PROXY_URL` env var (geolocation is auto-detected)
  • Add a custom binary override via the `CAMOUFOX_PATH` env var (for a future FF146 upgrade)
  • Fix `photo_map` not being bound to `config.color_variants` (images never reached the generation pipeline)
  • Stream multipart uploads through the Next.js proxy (preserves the form boundary)
  • Add a listing import endpoint (`POST /listing/jobs/{job_id}/import`)
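The PR text doesn't include the launcher code, so the following is only a sketch of how the three settings above could be turned into Camoufox launch options, assuming `PROXY_URL` has the form `scheme://user:pass@host:port`. The helper names `proxy_from_url` and `camoufox_kwargs` are hypothetical. `headless="virtual"`, `proxy` (a Playwright-style dict) and `geoip=True` are Camoufox options; `executable_path` is assumed here to be the keyword for a custom binary and should be checked against the installed Camoufox version.

```python
import os
import sys
from typing import Mapping
from urllib.parse import unquote, urlsplit


def proxy_from_url(proxy_url: str) -> dict:
    """Split e.g. http://user:pass@host:8080 into a Playwright-style proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected scheme://[user:pass@]host[:port], got {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    # Credentials go in separate keys; undo any percent-encoding (e.g. %40 -> @).
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


def camoufox_kwargs(env: Mapping[str, str] = os.environ, platform: str = sys.platform) -> dict:
    """Build Camoufox launch options from PROXY_URL / CAMOUFOX_PATH (hypothetical helper)."""
    # The virtual display (Xvfb) only exists on Linux; fall back to plain headless elsewhere.
    kwargs = {"headless": "virtual" if platform.startswith("linux") else True}
    proxy_url = env.get("PROXY_URL")
    if proxy_url:
        kwargs["proxy"] = proxy_from_url(proxy_url)
        # Derive timezone/locale/geolocation from the proxy's exit IP (needs camoufox[geoip]).
        kwargs["geoip"] = True
    binary = env.get("CAMOUFOX_PATH")
    if binary:
        kwargs["executable_path"] = binary  # assumed keyword; verify for your Camoufox version
    return kwargs
```

Usage would then look something like `with Camoufox(**camoufox_kwargs()) as browser:` after `from camoufox.sync_api import Camoufox`. Keeping the env parsing separate from the launch call makes it testable without starting a browser.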

Test plan

  • Docker container: Amazon search returns 60 results (was 0 before)
  • `headless='virtual'` launches Xvfb with GLX automatically
  • Docker build completes with the Camoufox binary baked in (~707 MB, FF135)
  • Test with a residential proxy (`PROXY_URL`) to reach Amazon.com instead of being redirected to Amazon.sg
  • Test the listing import endpoint with an existing listing-gen project
  • Verify multipart upload works from the frontend

🤖 Generated with Claude Code

PHY041 and others added 4 commits March 3, 2026 17:55
…ligence

Loads a pre-analyzed dataset (4,119 products / 80,912 images) to provide
category baselines. Scrape results are now compared against the full
category, so GPT generates data-backed recommendations such as "87% of
competitors missing size chart" and "A+ adoption 65.4%, avg score 7.0/9".
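As an illustration of what comparing against the full category can mean numerically, here is a small sketch that computes the same kind of percentages quoted above for a scraped competitor set and subtracts the category baseline. The `Product` fields (`has_size_chart`, `has_aplus`, `score` out of 9) and the helper names are hypothetical; the PR doesn't show the dataset schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Product:
    # Hypothetical fields; the real dataset schema is not shown in the PR.
    has_size_chart: bool
    has_aplus: bool
    score: float  # listing score out of 9


def summarize(products: list) -> dict:
    """Percentages and averages of the kind quoted in the recommendations."""
    if not products:
        raise ValueError("need at least one product")
    n = len(products)
    return {
        "missing_size_chart_pct": 100 * sum(not p.has_size_chart for p in products) / n,
        "aplus_adoption_pct": 100 * sum(p.has_aplus for p in products) / n,
        "avg_score": mean(p.score for p in products),
    }


def deltas(scraped: list, category: list) -> dict:
    """Scraped competitors minus category baseline, per metric (positive = higher than category)."""
    ours, base = summarize(scraped), summarize(category)
    return {key: round(ours[key] - base[key], 1) for key in base}
```

A positive `missing_size_chart_pct` delta, for instance, means the scraped competitors skip size charts more often than the category as a whole; that gap is what a recommendation would point at.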

- New `benchmark_service.py`: lazy-loads the dataset and pre-aggregates it across 19 categories
- New `/benchmarks` API: categories, category detail, top-gaps, reverse-prompts
- Pipeline injects benchmark data into `strategy_gen`, `story_arc`, `listing_service`
- HTML report: new Category Benchmark section with comparison charts
- Frontend: new Benchmark tab with opportunity cards and delta bars
- ZIP export: includes `benchmark.json`
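`benchmark_service.py` itself isn't included in the PR. A minimal sketch of the lazy-load + pre-aggregate pattern, assuming rows are dicts with hypothetical `category`, `has_aplus` and `score` keys: nothing is read until the first query, the aggregation runs once, and later calls are dictionary lookups. The lock keeps concurrent first requests from loading the dataset twice. `BenchmarkService` and `jsonl_rows` are illustrative names.

```python
import json
import threading
from collections import defaultdict
from typing import Callable, Iterable


def jsonl_rows(path: str) -> Iterable[dict]:
    """One JSON object per line (hypothetical on-disk format for the dataset)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


class BenchmarkService:
    def __init__(self, load_rows: Callable[[], Iterable[dict]]):
        self._load_rows = load_rows
        self._by_category = None  # filled on first access
        self._lock = threading.Lock()

    def _aggregated(self) -> dict:
        with self._lock:
            if self._by_category is None:
                # Single pass over the dataset, accumulating per-category totals.
                totals = defaultdict(lambda: {"n": 0, "aplus": 0, "score_sum": 0.0})
                for row in self._load_rows():
                    t = totals[row["category"]]
                    t["n"] += 1
                    t["aplus"] += bool(row["has_aplus"])
                    t["score_sum"] += row["score"]
                self._by_category = {
                    cat: {
                        "products": t["n"],
                        "aplus_adoption_pct": round(100 * t["aplus"] / t["n"], 1),
                        "avg_score": round(t["score_sum"] / t["n"], 2),
                    }
                    for cat, t in totals.items()
                }
            return self._by_category

    def categories(self) -> list:
        return sorted(self._aggregated())

    def category(self, name: str) -> dict:
        return self._aggregated()[name]
```

In use this might be `BenchmarkService(lambda: jsonl_rows("benchmarks.jsonl"))`, where the file name is a placeholder. Passing a loader callable instead of a path keeps startup cheap and lets tests inject rows directly.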

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser upgrades:
- Add residential proxy support via the `PROXY_URL` env var
- Add a custom binary override via the `CAMOUFOX_PATH` env var
- Use `headless='virtual'` on Linux for full WebGL/GLX support (avoids Amazon's bot detection)
- `BrowserPool` singleton pattern with `atexit` cleanup
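A sketch of a `BrowserPool` singleton with `atexit` cleanup. `launch` is a hypothetical zero-argument callable returning any object with a `.close()` method; in this project it would presumably start Camoufox, but that wiring isn't shown in the PR. One caveat: Playwright, which Camoufox's Python API builds on, documents that its API is not thread-safe, so the locks here only guard creation and teardown; the browser itself should still be used from a single thread.

```python
import atexit
import threading


class BrowserPool:
    """Process-wide singleton that owns one browser and closes it at interpreter exit."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self, launch):
        self._launch = launch  # hypothetical: zero-arg callable returning an object with .close()
        self._browser = None
        self._browser_lock = threading.Lock()
        atexit.register(self.close)  # runs on normal interpreter shutdown

    @classmethod
    def get(cls, launch):
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls(launch)
            return cls._instance

    def browser(self):
        # Start the browser lazily and reuse it across calls.
        with self._browser_lock:
            if self._browser is None:
                self._browser = self._launch()
            return self._browser

    def close(self):
        with self._browser_lock:
            if self._browser is not None:
                self._browser.close()
                self._browser = None
```

Note that `atexit` hooks only run when the interpreter exits normally; a process killed by an unhandled SIGTERM or SIGKILL skips them.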

Pipeline fixes:
- Fix `photo_map` not being bound to `config.color_variants` (images never reached the pipeline)
- Stream multipart uploads through the Next.js proxy (preserves the form boundary)
- Add a listing import endpoint (`POST /listing/jobs/{job_id}/import`)
- Pass `PROXY_URL` and `CAMOUFOX_PATH` through docker-compose
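The Next.js change itself isn't shown in the PR, but the invariant it protects is language-independent: a `multipart/form-data` body is split on the `boundary` parameter of its `Content-Type` header, so a proxy has to forward the raw body together with the original header. It breaks if, for example, the form is re-serialized with a fresh boundary while the old header is forwarded, or the header is replaced by a bare `multipart/form-data`. A minimal Python sketch of that check, with hypothetical helper names; `email.message.Message.get_param` is used only to parse the header parameter.

```python
from email.message import Message
from typing import Mapping, Optional

# Standard hop-by-hop headers, plus Host, which the proxy normally sets itself.
DROP = {"connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
        "te", "trailer", "transfer-encoding", "upgrade", "host"}


def forward_headers(incoming: Mapping[str, str]) -> dict:
    """Copy request headers for the upstream call, keeping Content-Type untouched."""
    return {k.lower(): v for k, v in incoming.items() if k.lower() not in DROP}


def multipart_boundary(content_type: str) -> Optional[str]:
    msg = Message()
    msg["content-type"] = content_type
    return msg.get_param("boundary")


def boundary_preserved(headers: Mapping[str, str], body: bytes) -> bool:
    """True if the Content-Type sent upstream names a boundary that the body actually uses."""
    content_type = headers.get("content-type")
    if not content_type:
        return False
    boundary = multipart_boundary(content_type)
    # Each part is introduced by a delimiter line "--<boundary>".
    return boundary is not None and (b"--" + boundary.encode()) in body
```

Streaming the untouched body bytes, rather than parsing and rebuilding the form, is the simplest way to keep this check true.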

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use `python -m camoufox fetch` instead of importing the module, since
importing does not trigger the binary download. The 707 MB Firefox binary
is now baked into the image at build time.

Tested: Amazon search returns 60 results in the Docker container
with `headless='virtual'` (proper Xvfb + GLX support).
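In a Dockerfile this is a single `RUN python -m camoufox fetch` step. If the same step is driven from a Python build script instead, the important detail is to fail loudly. A sketch with a hypothetical `fetch_camoufox` helper that runs the CLI with `check=True`, so a failed download aborts the image build instead of silently producing an image with no browser binary:

```python
import subprocess
import sys
from typing import Callable


def fetch_camoufox(run: Callable = subprocess.run) -> None:
    """Download the Camoufox browser binary at build time (hypothetical helper).

    Importing the camoufox package does not download the binary;
    the `fetch` command of its CLI does.
    """
    # Use the same interpreter (and therefore the same site-packages) as the build.
    cmd = [sys.executable, "-m", "camoufox", "fetch"]
    # check=True raises CalledProcessError on a non-zero exit, failing the build.
    run(cmd, check=True)
```

The `run` parameter exists only so the command can be checked without downloading anything.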

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Install the `.[api]` extras (uvicorn, python-multipart, python-dotenv)
- Add `camoufox[geoip]` for auto-geolocation via IP
- Remove `xvfb-run` from `CMD`: `headless='virtual'` manages its own
  Xvfb internally, and running two Xvfb instances causes display conflicts
- Use the JSON (exec) form of `CMD` for proper signal handling

Tested: Amazon.sg search returns 48 results, Amazon.com search
returns 60 results (auto-redirected via geoip), API health OK.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>