feat(challenge): managed challenge solver, proxy burnout detection, CF fake-success handling#6
Open
Austex wants to merge 1 commit into DeepBlueDynamics:main from
…n, and Cloudflare fake-success handling
Challenge resolution pipeline:
- Add AntiCloudflareTask (CapSolver) for Managed Challenges that require
a proxy; this supplements the existing AntiTurnstileTaskProxyLess used
for standalone Turnstile widgets
- Add interactive Turnstile checkbox click as a fast, free step before
falling back to CapSolver API
- Re-order detection: DOM selectors now take priority over title matching
so challenge type classification is accurate (TURNSTILE vs MANAGED)
- Title-only matches now classify as MANAGED (not JS_CHALLENGE) since
Cloudflare uses the same "Just a moment..." title for all challenge
types — this ensures CapSolver eligibility
- Handle the "Verification successful" state where cf_clearance is set but
the page doesn't auto-navigate: detect it and re-navigate with the cookie
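
A minimal sketch of the detection order described above, assuming plain substring checks stand in for the real DOM selectors. The marker strings, the `ChallengeType` enum, and `classify_challenge()` are hypothetical illustrations, not the code in `detect_challenge()`:

```python
from enum import Enum
from typing import Optional


class ChallengeType(Enum):
    TURNSTILE = "turnstile"
    MANAGED = "managed"


# Hypothetical markers; the selectors actually used by the PR are not shown here.
TURNSTILE_MARKERS = ("cf-turnstile", "challenges.cloudflare.com/turnstile")
MANAGED_MARKERS = ("challenge-form", "cf-chl-")
CHALLENGE_TITLES = ("just a moment",)


def classify_challenge(html: str, title: str) -> Optional[ChallengeType]:
    """Classify a page, checking DOM markers before the page title."""
    lowered = html.lower()
    # 1. DOM evidence wins, so a Turnstile widget is never mislabeled by its title.
    if any(marker in lowered for marker in TURNSTILE_MARKERS):
        return ChallengeType.TURNSTILE
    if any(marker in lowered for marker in MANAGED_MARKERS):
        return ChallengeType.MANAGED
    # 2. Title-only match: the title is shared by all challenge types,
    #    so fall back to MANAGED, which keeps the page eligible for CapSolver.
    if any(t in title.lower() for t in CHALLENGE_TITLES):
        return ChallengeType.MANAGED
    return None
```

With this ordering, a page that has both a Turnstile widget and the "Just a moment..." title classifies as TURNSTILE, while a page with only the title classifies as MANAGED.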
CapSolver UA-matched context:
- cf_clearance cookies are bound to the user agent CapSolver used to
solve the challenge — create a fresh browser context with the matching
UA and inject cookies so Cloudflare accepts them
- Cache CapSolver UA per domain in CookieStore so subsequent crawls can
reuse cf_clearance without re-solving
- Sitekey extraction expanded: JS widget instances, script URL path
(/turnstile/v0/g/{sitekey}/api.js), HTML regex fallback
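
The two static sitekey fallbacks can be sketched roughly as follows. This assumes the sitekey contains only letters, digits, `_` and `-`, and that the HTML fallback looks for a `data-sitekey` attribute; `extract_sitekey()` is a hypothetical name, and the JS widget lookup is omitted because it needs a live page:

```python
import re
from typing import Optional

# Script URL path form from the description: /turnstile/v0/g/{sitekey}/api.js
SCRIPT_PATH_RE = re.compile(r"/turnstile/v0/g/([A-Za-z0-9_-]+)/api\.js")
# Assumed HTML fallback: the data-sitekey attribute on the widget container.
DATA_ATTR_RE = re.compile(r"""data-sitekey=["']([A-Za-z0-9_-]+)["']""")


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first sitekey found, trying the script path before the attribute."""
    for pattern in (SCRIPT_PATH_RE, DATA_ATTR_RE):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None
```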
Proxy resilience:
- Track consecutive navigation failures per BrowserEngine instance
- After N consecutive timeouts (configurable via PROXY_RESTART_AFTER_FAILURES,
default 3), restart browser with a fresh proxy session
- Detect NS_ERROR_PROXY errors (dead proxy session) and restart immediately
- Log proxy exit IP on browser start via httpbin.org/ip check
- Shorten default sticky session duration from 30 to 10 minutes
(configurable via PROXY_SESSION_DURATION_MINUTES)
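
One way to picture the failure tracking is the sketch below. `ProxyFailureTracker` and its method names are hypothetical; it assumes the caller invokes `record_failure()` for each navigation timeout or error and restarts the browser whenever it returns `True`:

```python
from dataclasses import dataclass


@dataclass
class ProxyFailureTracker:
    """Counts consecutive navigation failures for one browser instance."""

    restart_after: int = 3  # mirrors the PROXY_RESTART_AFTER_FAILURES default
    consecutive_failures: int = 0

    def record_success(self) -> None:
        # Any successful navigation breaks the streak.
        self.consecutive_failures = 0

    def record_failure(self, error_message: str) -> bool:
        """Return True when the browser should restart with a fresh proxy session."""
        # A dead proxy session is not worth retrying: restart immediately.
        if "NS_ERROR_PROXY" in error_message:
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.restart_after:
            self.consecutive_failures = 0
            return True
        return False
```

Resetting the counter after a restart means the fresh proxy session gets its own full allowance of failures.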
Cloudflare fake-success detection in content quality:
- HTML-heavy pages (>3K) with tiny markdown (<100 chars) = challenge page
with JS bloat, not real content — override has_substantial_content guard
- HTTP 403/503 + thin body (<200 chars) = blocked
- HTTP 200 + 2+ Cloudflare signatures + thin body (<500 chars) = blocked
- Auto-derive domain from URL for cookie store when not explicitly passed
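
The three blocking rules above can be expressed as a small predicate. This is a sketch under stated assumptions: ">3K" is read as 3000 characters of HTML, "thin body" is measured on the extracted markdown, and the `CF_SIGNATURES` strings and the `looks_blocked()` name are hypothetical:

```python
# Hypothetical signature strings; the list used by the PR is not shown.
CF_SIGNATURES = (
    "just a moment",
    "cf-chl-",
    "challenges.cloudflare.com",
    "cf_clearance",
)


def looks_blocked(status: int, html: str, markdown: str) -> bool:
    """Heuristically decide whether a response is a challenge or block page."""
    body_len = len(markdown.strip())
    # Rule 1: lots of HTML but almost no extracted text (JS-heavy challenge page).
    if len(html) > 3000 and body_len < 100:
        return True
    # Rule 2: an error status with a thin body.
    if status in (403, 503) and body_len < 200:
        return True
    # Rule 3: a "successful" status whose thin body carries 2+ Cloudflare signatures.
    if status == 200 and body_len < 500:
        lowered = html.lower()
        hits = sum(1 for signature in CF_SIGNATURES if signature in lowered)
        return hits >= 2
    return False
```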
Config:
- Default browser engine changed from chromium to camoufox
- docker-compose.yml: proxy env vars now active (not commented out)
- New settings: challenge_auto_wait_ms, challenge_capsolver_timeout_ms,
proxy_session_duration_minutes, proxy_restart_after_failures
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Changes
app/challenge_solver.py
- solve_managed_challenge_capsolver(): AntiCloudflareTask API integration
- _click_turnstile_checkbox(): interactive Turnstile widget click
- _format_proxy_for_capsolver(): Playwright proxy dict → CapSolver format
- _call_capsolver_managed(): CapSolver API create/poll for managed tasks
- _extract_turnstile_sitekey(): JS widget, script URL path, HTML regex
- detect_challenge(): DOM selectors first, title-only fallback as MANAGED
- wait_for_challenge_resolution(): handle "Verification successful" state
- resolve_challenge(): 5-step pipeline (auto → click → managed → turnstile → fail)
app/browser.py
- _restart_with_fresh_proxy()
- _check_exit_ip()
- crawl_with_context()
app/crawler.py
app/config.py / app/cookie_store.py
Test plan
- test_challenge_solver.py: managed solver, click approach, proxy formatting, detection priority
- test_block_detection.py: CF fake-success detection, HTML-heavy/no-markdown, 403/503 thin body
- test_camoufox.py: default engine, proxy passthrough, context isolation
- test_config.py: new settings defaults, sticky session duration
- test_browser_failure_tracking.py: consecutive failures, proxy restart
🤖 Generated with Claude Code