feat(challenge): managed challenge solver, proxy burnout detection, CF fake-success handling #6

Open
Austex wants to merge 1 commit into DeepBlueDynamics:main from getproducthub:feat/nonobservability

Austex (Contributor) commented Mar 1, 2026

Summary

  • Managed Challenge solver: Add CapSolver AntiCloudflareTask for Cloudflare Managed Challenges (proxy-based, no sitekey needed). Supplements existing AntiTurnstileTaskProxyLess for standalone Turnstile widgets.
  • Interactive Turnstile click: Try clicking the Turnstile checkbox before falling back to CapSolver API — fast and free.
  • CapSolver UA-matched context: cf_clearance cookies are bound to the UA that solved them. Create a fresh browser context with matching UA + cookies so Cloudflare accepts them. Cache the UA per domain for reuse.
  • Proxy burnout detection: Track consecutive failures and restart browser with fresh proxy session after N timeouts (default 3). Detect NS_ERROR_PROXY (dead session) and restart immediately.
  • Cloudflare fake-success detection: Catch challenge pages served as HTTP 200 with JS bloat but no real markdown content. Also catch HTTP 403/503 + thin body as blocked.
  • Detection priority fix: DOM selectors now take priority over title matching for accurate challenge type classification (TURNSTILE vs MANAGED).
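
Because the Managed Challenge solver is proxy-based, the browser's proxy settings have to be handed to CapSolver in some string form. The following is a minimal sketch of such a conversion, not the PR's actual `_format_proxy_for_capsolver()`. The function name `format_proxy_for_capsolver` and the colon-separated output shape (`scheme:host:port:user:pass`) are assumptions for illustration; the input keys (`server`, `username`, `password`) follow Playwright's proxy option.

```python
from urllib.parse import urlparse


def format_proxy_for_capsolver(proxy: dict) -> str:
    """Convert a Playwright-style proxy dict into a colon-separated string.

    Assumed output shape: scheme:host:port[:username:password].
    """
    server = proxy["server"]
    # Playwright accepts "host:port" without a scheme; default to http here.
    if "://" not in server:
        server = "http://" + server
    parsed = urlparse(server)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"proxy server needs a host and a port: {proxy['server']!r}")

    parts = [parsed.scheme, parsed.hostname, str(parsed.port)]
    username = proxy.get("username")
    password = proxy.get("password")
    # Credentials are appended only when both are present.
    if username and password:
        parts += [username, password]
    return ":".join(parts)
```

A proxy without credentials simply yields the three-part form, so the same helper would cover authenticated and unauthenticated sessions.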

Changes

app/challenge_solver.py

  • New solve_managed_challenge_capsolver() — AntiCloudflareTask API integration
  • New _click_turnstile_checkbox() — interactive Turnstile widget click
  • New _format_proxy_for_capsolver() — Playwright proxy dict → CapSolver format
  • New _call_capsolver_managed() — CapSolver API create/poll for managed tasks
  • Expanded _extract_turnstile_sitekey() — JS widget, script URL path, HTML regex
  • Updated detect_challenge() — DOM selectors first, title-only fallback as MANAGED
  • Updated wait_for_challenge_resolution() — handle "Verification successful" state
  • Updated resolve_challenge() — 5-step pipeline: auto → click → managed → turnstile → fail
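
The 5-step pipeline in `resolve_challenge()` can be read as an ordered fallthrough: each strategy is tried in turn and the first success wins. The sketch below shows that shape only. `run_pipeline` and the step callables are hypothetical names, and the code is synchronous for brevity; the real implementation is presumably async and browser-driven.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step is a (name, attempt) pair; attempt returns True when the challenge clears.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: Sequence[Step]) -> Optional[str]:
    """Try each step in order; return the name of the first that succeeds.

    Returns None when every step fails, which corresponds to the final
    "fail" outcome of the pipeline.
    """
    for name, attempt in steps:
        try:
            if attempt():
                return name
        except Exception:
            # One broken strategy should not abort the pipeline; try the next.
            continue
    return None


if __name__ == "__main__":
    # Stand-in strategies: cheap and free steps first, paid API calls last.
    steps = [
        ("auto", lambda: False),
        ("click", lambda: False),
        ("managed", lambda: True),
        ("turnstile", lambda: True),
    ]
    print(run_pipeline(steps))
```

Ordering the steps from cheapest to most expensive means a paid CapSolver call only happens after the free options have been exhausted.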

app/browser.py

  • Consecutive failure tracking + _restart_with_fresh_proxy()
  • Proxy exit IP logging via _check_exit_ip()
  • CapSolver UA-matched context switching in crawl_with_context()
  • Proxy error (NS_ERROR_PROXY) detection and immediate restart
  • Human behavior simulation parameters (scroll_count, platform)

app/crawler.py

  • Auto-derive domain from URL for cookie store
  • Cloudflare challenge detection: HTML-heavy/no-markdown, 403/503 thin body, CF signature matching

app/config.py / app/cookie_store.py

  • New settings: challenge timing, proxy session duration, restart threshold
  • Default browser engine: camoufox (was chromium)
  • CapSolver UA cache in CookieStore with TTL expiry
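
A per-domain user-agent cache with TTL expiry can be sketched as below. This is an illustration under assumptions, not the PR's `CookieStore` code: the class name `UserAgentCache`, its methods, and the in-memory dict storage are all hypothetical. The clock is injectable so that expiry can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UserAgentCache:
    """Map domain -> user agent, forgetting entries after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # domain -> (user_agent, absolute expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def set(self, domain: str, user_agent: str) -> None:
        self._entries[domain] = (user_agent, self._clock() + self._ttl)

    def get(self, domain: str) -> Optional[str]:
        entry = self._entries.get(domain)
        if entry is None:
            return None
        user_agent, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-solves the challenge.
            del self._entries[domain]
            return None
        return user_agent
```

Returning `None` on expiry pushes the caller back to a fresh solve, which matters because a stale UA would no longer match a valid cf_clearance cookie.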

Test plan

  • test_challenge_solver.py — managed solver, click approach, proxy formatting, detection priority
  • test_block_detection.py — CF fake-success detection, HTML-heavy/no-markdown, 403/503 thin body
  • test_camoufox.py — default engine, proxy passthrough, context isolation
  • test_config.py — new settings defaults, sticky session duration
  • test_browser_failure_tracking.py — consecutive failures, proxy restart

🤖 Generated with Claude Code

…n, and Cloudflare fake-success handling

Challenge resolution pipeline:
- Add AntiCloudflareTask (CapSolver) for Managed Challenges that require
  proxy — supplements existing AntiTurnstileTaskProxyLess for standalone
  Turnstile widgets
- Add interactive Turnstile checkbox click as a fast, free step before
  falling back to CapSolver API
- Re-order detection: DOM selectors now take priority over title matching
  so challenge type classification is accurate (TURNSTILE vs MANAGED)
- Title-only matches now classify as MANAGED (not JS_CHALLENGE) since
  Cloudflare uses the same "Just a moment..." title for all challenge
  types — this ensures CapSolver eligibility
- Handle "Verification successful" state where cf_clearance is set but
  page doesn't auto-navigate — detect and re-navigate with cookie

CapSolver UA-matched context:
- cf_clearance cookies are bound to the user agent CapSolver used to
  solve the challenge — create a fresh browser context with the matching
  UA and inject cookies so Cloudflare accepts them
- Cache CapSolver UA per domain in CookieStore so subsequent crawls can
  reuse cf_clearance without re-solving
- Sitekey extraction expanded: JS widget instances, script URL path
  (/turnstile/v0/g/{sitekey}/api.js), HTML regex fallback
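
Two of the sitekey fallbacks listed above, the script URL path and the HTML regex, can be approximated on a raw HTML string. This sketch omits the JS widget lookup, which needs a live page. `extract_sitekey` and both patterns are assumptions; in particular, the `data-sitekey` attribute pattern and the permitted sitekey characters are guesses, not taken from the PR.

```python
import re
from typing import Optional

# Sitekey embedded in the script path: /turnstile/v0/g/{sitekey}/api.js
SCRIPT_PATH_RE = re.compile(r"/turnstile/v0/g/([A-Za-z0-9_-]+)/api\.js")
# Sitekey as a data attribute on the widget container (assumed fallback).
DATA_ATTR_RE = re.compile(r"""data-sitekey=["']([A-Za-z0-9_-]+)["']""")


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first sitekey found, trying the script path before the attribute."""
    for pattern in (SCRIPT_PATH_RE, DATA_ATTR_RE):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None
```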

Proxy resilience:
- Track consecutive navigation failures per BrowserEngine instance
- After N consecutive timeouts (configurable via PROXY_RESTART_AFTER_FAILURES,
  default 3), restart browser with a fresh proxy session
- Detect NS_ERROR_PROXY errors (dead proxy session) and restart immediately
- Log proxy exit IP on browser start via httpbin.org/ip check
- Shorten default sticky session duration from 30 to 10 minutes
  (configurable via PROXY_SESSION_DURATION_MINUTES)
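
The restart policy above boils down to a small counter. The sketch below assumes a hypothetical `FailureTracker` class that tells the caller when to restart; the actual PR keeps this state on the BrowserEngine instance and performs the restart in `_restart_with_fresh_proxy()`.

```python
class FailureTracker:
    """Decide when to restart the browser with a fresh proxy session."""

    def __init__(self, restart_after: int = 3):
        # Mirrors the PROXY_RESTART_AFTER_FAILURES default of 3.
        self.restart_after = restart_after
        self.consecutive_failures = 0

    def record_success(self) -> None:
        # Any successful navigation breaks the failure streak.
        self.consecutive_failures = 0

    def record_failure(self, error_message: str = "") -> bool:
        """Record a failed navigation; return True if a restart is due."""
        if "NS_ERROR_PROXY" in error_message:
            # Dead proxy session: restart immediately, no need to count.
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.restart_after:
            self.consecutive_failures = 0
            return True
        return False
```

Resetting the counter after signalling a restart gives the fresh proxy session a clean streak of its own.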

Cloudflare fake-success detection in content quality:
- HTML-heavy pages (>3K) with tiny markdown (<100 chars) = challenge page
  with JS bloat, not real content — override has_substantial_content guard
- HTTP 403/503 + thin body (<200 chars) = blocked
- HTTP 200 + 2+ Cloudflare signatures + thin body (<500 chars) = blocked
- Auto-derive domain from URL for cookie store when not explicitly passed
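
Taken together, the three blocking rules above amount to a short predicate. In this sketch `looks_blocked` and the `CF_SIGNATURES` list are hypothetical, "3K" is read as 3000 characters of HTML, and "thin body" is measured on the extracted markdown; the PR may measure these differently.

```python
# Hypothetical signature strings; the PR's actual list is not shown.
CF_SIGNATURES = ("just a moment", "challenge-platform", "cf-chl", "ray id")


def looks_blocked(status: int, html: str, markdown: str) -> bool:
    """Return True when a response looks like a Cloudflare block or challenge."""
    md_len = len(markdown.strip())

    # Rule 1: lots of HTML but almost no extracted text means JS bloat.
    if len(html) > 3000 and md_len < 100:
        return True

    # Rule 2: an explicit block status with a thin body.
    if status in (403, 503) and md_len < 200:
        return True

    # Rule 3: a "successful" response that carries two or more
    # Cloudflare signatures and little real content.
    if status == 200 and md_len < 500:
        lowered = html.lower()
        hits = sum(1 for signature in CF_SIGNATURES if signature in lowered)
        return hits >= 2

    return False
```

Requiring at least two signatures in the last rule reduces false positives on ordinary short pages that merely mention Cloudflare once.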

Config:
- Default browser engine changed from chromium to camoufox
- docker-compose.yml: proxy env vars now active (not commented out)
- New settings: challenge_auto_wait_ms, challenge_capsolver_timeout_ms,
  proxy_session_duration_minutes, proxy_restart_after_failures
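
For the two proxy settings whose environment variables and defaults are stated above, loading could look like this. It is a stdlib-only sketch with a hypothetical `ProxySettings` class; the project's real config layer may differ, and the two challenge timing settings are left out because their defaults are not given here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProxySettings:
    proxy_session_duration_minutes: int = 10  # shortened from 30
    proxy_restart_after_failures: int = 3

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ProxySettings":
        """Read overrides from the environment, falling back to the defaults."""
        return cls(
            proxy_session_duration_minutes=int(
                env.get("PROXY_SESSION_DURATION_MINUTES", 10)
            ),
            proxy_restart_after_failures=int(
                env.get("PROXY_RESTART_AFTER_FAILURES", 3)
            ),
        )
```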

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>