Summary
PerplexityBot fetches time out against sites fronted by aggressive bot WAFs (Cloudflare Bot Fight Mode, DataDome, Akamai Bot Manager), even on canonical hosts that pass every other bot check. This is site-side behavior, not a crawl-sim defect. It is documented here so the tool's output isn't misread as a tool bug.
Observation
During a live audit of https://almostimpossible.agency on v1.5.0:
googlebot, gptbot, claudebot all returned HTTP 200 with identical server HTML
perplexitybot timed out twice at 30s — once on the bare domain, again on https://www.almostimpossible.agency/ directly
The canonical-host fix (#27's companion work in v1.5.0) did not cause this, and it's not a DNS or routing issue. It reproduces on the canonical URL with a normal curl:
curl -I -A "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://www.perplexity.ai/bot)" \
--max-time 30 https://www.almostimpossible.agency/
# hangs / times out
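To confirm that the timeout is tied to the user agent rather than the host, the same probe can be run once per user agent and the curl exit codes compared. The following is a minimal sketch, assuming `curl` is on `PATH`. The helper names (`build_probe`, `classify_exit`, `probe`) are illustrative and not part of crawl-sim, and the Googlebot string is the commonly published desktop UA, not necessarily what crawl-sim sends.

```python
import subprocess

# User agents to compare. The PerplexityBot string is the one from the curl
# command above; the Googlebot string is an assumption for illustration.
USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "perplexitybot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://www.perplexity.ai/bot)",
}


def build_probe(user_agent, url, max_time=30):
    """Build the argv for the HEAD request above (-sS only quiets progress output)."""
    return ["curl", "-sS", "-I", "-A", user_agent, "--max-time", str(max_time), url]


def classify_exit(returncode):
    """Map a curl exit code to a coarse outcome label.

    "ok" only means an HTTP response arrived; without -f, curl exits 0
    for any status code, including 403.
    """
    if returncode == 0:
        return "ok"
    if returncode == 28:  # curl's documented "operation timeout" exit code
        return "timeout"
    return "error"


def probe(url, max_time=30):
    """Run one probe per user agent and return {bot_id: outcome}."""
    outcomes = {}
    for bot_id, user_agent in USER_AGENTS.items():
        result = subprocess.run(
            build_probe(user_agent, url, max_time),
            capture_output=True,
            text=True,
        )
        outcomes[bot_id] = classify_exit(result.returncode)
    return outcomes
```

Calling `probe("https://www.almostimpossible.agency/")` performs live requests. If the behavior described in this issue holds, the `perplexitybot` entry would be the one that differs from the others.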
Why this happens
Bot classification at Cloudflare/Akamai/DataDome effectively sorts crawlers into three tiers, and PerplexityBot lands in a different tier than Googlebot:
Googlebot: verified via reverse-DNS, allowed through by default
GPTBot / ClaudeBot: OpenAI and Anthropic publish crawler IP ranges, which major WAFs recognize
PerplexityBot: published IP ranges and verification are newer and less widely recognized by WAF rulesets. Several providers hold the connection open (silent-drop) rather than 403ing
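For context on the first tier, the reverse-DNS check that Google documents for Googlebot is a forward-confirmed reverse lookup: resolve the client IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it includes the original IP. The sketch below illustrates that general technique only. It is not any WAF vendor's actual implementation, the function name `is_verified_googlebot` is hypothetical, and the resolvers are injectable so the logic can be exercised without network access.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse_lookup(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    """A-record lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse_lookup=_reverse_lookup, forward_lookup=_forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    Returns True only if the PTR hostname is under a Google crawler domain
    and that hostname resolves back to the same IP. IPv4 only in this sketch.
    """
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A crawler without a widely recognized equivalent of this check, or without IP ranges the WAF ruleset already knows, has no comparable way to prove its identity, which is consistent with the tiering described above.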
This is exactly why crawl-sim exists — to surface these gaps. But it also means PerplexityBot will legitimately time out on many large-agency sites.
What crawl-sim should do
v1.5.1 already does the right thing:
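bots.perplexitybot.fetchFailed: true with the curl timeout error text preserved
bots.perplexitybot.score: 0 / grade: F
warnings[] now surfaces a high-severity weighted_bot_fetch_failed entry (added in #27, "bug: overall composite ignores fetchFailed weighted bots")
overall.score correctly drops to reflect the failure (no more false 100/A)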
Suggested follow-up
Not a bug; consider these enhancements:
Add a WAF-detection hint when three bots succeed and one times out. Output something like warnings[].code = "likely_waf_bot_block" with the affected bot id.
Document the WAF tier-mismatch in README.md or a docs/waf-behavior.md, so users interpret PerplexityBot fetchFailed correctly without opening new bug reports.
Optional: add a retry-with-Googlebot-verification-hint phase that attempts one more fetch with the canonical PerplexityBot UA + longer timeout, so transient WAF-challenge gets distinguished from persistent blocking.
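The first suggestion could be sketched roughly as follows. This assumes the report exposes a mapping of bot id to a result object with a boolean `fetchFailed` field, as in `bots.perplexitybot.fetchFailed` above. The function name `detect_waf_hint`, the `bot` key on the warning, and the `min_successes` threshold are hypothetical and not existing crawl-sim API.

```python
def detect_waf_hint(bots, min_successes=3):
    """Return a likely_waf_bot_block warning when exactly one bot failed.

    `bots` maps bot id -> result dict with a boolean `fetchFailed` key
    (assumed shape). The heuristic fires only when a single bot failed
    and at least `min_successes` other bots fetched successfully, which
    points at per-bot blocking rather than a site-wide outage.
    """
    failed = sorted(bot_id for bot_id, result in bots.items() if result.get("fetchFailed"))
    succeeded = [bot_id for bot_id, result in bots.items() if not result.get("fetchFailed")]
    if len(failed) != 1 or len(succeeded) < min_successes:
        return []
    return [{"code": "likely_waf_bot_block", "bot": failed[0]}]
```

This version keys only on `fetchFailed`. A stricter variant could also require that the preserved error text indicates a timeout, so that DNS or TLS failures do not trigger the hint.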
References
https://almostimpossible.agency (Cloudflare-fronted) during v1.5.0 live audit, reproduced 2026-04-16