robots-txt-validator

Validates whether the major LLM crawlers (GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, PerplexityBot, Perplexity-User, Google-Extended, CCBot, Bytespider, Amazonbot, Applebot-Extended) are allowed by a robots.txt file.

Requires Python 3.9 or newer. Uses only the standard library, with no third-party dependencies.

Why this exists

A site's robots.txt is the single cheapest configuration that decides whether generative engines may include the site in their answer stream. A typo, a stale Disallow: / left over from a staging environment, or an opt-out token set without the operator's knowledge can make the site invisible to ChatGPT, Claude, Perplexity, Gemini and Copilot.

robots-txt-validator parses a robots.txt file and reports, for each of the major LLM crawlers, whether a given path is allowed. It exits with a non-zero status if any crawler is disallowed, so it can be wired into CI.
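The check described above can be sketched with the standard library's `urllib.robotparser`. This is a minimal illustration, not the code in `validator.py`: the names `LLM_CRAWLERS`, `check_crawlers`, `exit_status` and `SAMPLE` are assumptions introduced here. Note that `urllib.robotparser` applies the first matching rule in a group, which may differ from how individual crawlers resolve conflicting Allow/Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the "Crawlers covered" table; the tuple name is hypothetical.
LLM_CRAWLERS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
    "PerplexityBot", "Perplexity-User", "Google-Extended", "CCBot",
    "Bytespider", "Amazonbot", "Applebot-Extended",
)


def check_crawlers(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {token: allowed} for every crawler token at the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in LLM_CRAWLERS}


def exit_status(results: dict[str, bool]) -> int:
    """0 when every crawler is allowed, 1 otherwise (suitable for CI)."""
    return 0 if all(results.values()) else 1


# A sample file in which one crawler is blocked and everything else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

results = check_crawlers(SAMPLE, "/")
for token, allowed in results.items():
    print(f"{token:<18} {'allowed' if allowed else 'DISALLOWED'}")
print("exit status:", exit_status(results))  # exit status: 1
```

In a CI job, a wrapper would pass the value of `exit_status` to `sys.exit` so that a blocked crawler fails the build.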

Crawlers covered

| Token | Operator role |
| --- | --- |
| `GPTBot` | OpenAI model-training crawler |
| `ChatGPT-User` | OpenAI in-session retrieval agent |
| `OAI-SearchBot` | OpenAI search index crawler |
| `ClaudeBot` | Anthropic crawler |
| `Claude-Web` | Anthropic in-session retrieval agent |
| `PerplexityBot` | Perplexity index crawler |
| `Perplexity-User` | Perplexity in-session retrieval agent |
| `Google-Extended` | Google generative-AI training opt-out token |
| `CCBot` | Common Crawl, used by many model trainers |
| `Bytespider` | ByteDance / Doubao crawler |
| `Amazonbot` | Amazon crawler used by Alexa and Rufus |
| `Applebot-Extended` | Apple generative-AI training opt-out token |

User-Agent tokens reflect the documented public versions as of early 2026. When operators change their tokens, update `CRAWLERS` in `validator.py`.
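Why the token list needs maintenance can be shown with a short sketch. A robots.txt group only affects the tokens it names, so a block aimed at a token missing from the list is never reported. The list shape below, the helper `blocked_tokens`, and the token `ExampleBot-v2` are all hypothetical; the real `CRAWLERS` in `validator.py` may be structured differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical shape for the token list; the real CRAWLERS may differ.
CRAWLERS = ["GPTBot", "ClaudeBot"]

# A file that blocks only a made-up, newly introduced token.
ROBOTS_TXT = """\
User-agent: ExampleBot-v2
Disallow: /
"""


def blocked_tokens(robots_txt, tokens, path="/"):
    """Return the tokens from `tokens` that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [t for t in tokens if not parser.can_fetch(t, path)]


# With the stale list, the block goes unnoticed.
print(blocked_tokens(ROBOTS_TXT, CRAWLERS))                      # []
# After adding the new token, the block is reported.
print(blocked_tokens(ROBOTS_TXT, CRAWLERS + ["ExampleBot-v2"]))  # ['ExampleBot-v2']
```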
