feat(seo): Expand robots.txt to support 39 AI crawlers #245

Open
s123104 wants to merge 2 commits into main from feat/seo-robots-txt-ai-crawlers-complete

Conversation


@s123104 s123104 commented Apr 10, 2026

Summary

  • Expanded the AI_SEARCH_BOTS list from 17 to 39 AI crawlers
  • Added major search-engine and AI crawlers: Google-CloudVertexBot, Googlebot, Bingbot, PhindBot
  • Added Meta-family crawlers: Meta-ExternalFetcher, FacebookBot
  • Added other AI crawlers: Cloudflare-AutoRAG, archive.org_bot, Timpibot, ProRataInc, Novellum AI Crawl
  • Removed all comments from robots.txt to keep it lean
  • Updated the complete AI crawler list documented in SEO_MASTER_SSOT.md

Changed files

  • apps/ratewise/scripts/generate-robots-txt.mjs - expanded AI crawler list
  • apps/ratewise/public/robots.txt - auto-generated robots.txt
  • docs/SEO_MASTER_SSOT.md - updated AI crawler documentation
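For context, a generator like apps/ratewise/scripts/generate-robots-txt.mjs could be structured roughly as follows. This is a minimal sketch: only AI_SEARCH_BOTS and the output file are named in this PR, so the disallowed paths and function name here are illustrative assumptions.

```javascript
// Hypothetical sketch of generate-robots-txt.mjs.
// Only AI_SEARCH_BOTS is named in the PR; everything else is illustrative.
const AI_SEARCH_BOTS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
  'ClaudeBot', 'PerplexityBot', // ...abridged; the full list has 39 entries
];

// Example paths kept out of the index (taken from the review discussion below).
const DISALLOWED_PATHS = ['/ratewise/theme-showcase/', '/ratewise/ui-showcase/'];

function generateRobotsTxt() {
  const lines = [];
  // One explicit Allow group per AI crawler.
  for (const bot of AI_SEARCH_BOTS) {
    lines.push(`User-agent: ${bot}`, 'Allow: /', '');
  }
  // The wildcard group carries the Disallow rules for everyone else.
  lines.push('User-agent: *');
  for (const path of DISALLOWED_PATHS) {
    lines.push(`Disallow: ${path}`);
  }
  lines.push('Allow: /', '');
  return lines.join('\n');
}

console.log(generateRobotsTxt());
```

In a real script the result would be written to apps/ratewise/public/robots.txt rather than printed.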

Complete AI crawler list (39)

| Platform | Crawlers |
| --- | --- |
| OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User |
| Anthropic | ClaudeBot, Claude-User, Claude-SearchBot, anthropic-ai |
| Perplexity | PerplexityBot, Perplexity-User |
| Google | Google-Extended, Google-CloudVertexBot, Googlebot |
| Microsoft | Bingbot |
| xAI | GrokBot |
| Cohere | cohere-ai |
| You.com | YouBot |
| Phind | PhindBot |
| DuckDuckGo | DuckAssistBot |
| Amazon | Amazonbot |
| Apple | Applebot, Applebot-Extended |
| Common Crawl | CCBot |
| ByteDance | Bytespider |
| Huawei | PetalBot |
| Mistral | MistralAI-User |
| Manus | Manus-User |
| Meta | Meta-ExternalAgent, Meta-ExternalFetcher, FacebookBot, facebookexternalhit |
| X (Twitter) | Twitterbot |
| LinkedIn | LinkedInBot |
| Cloudflare | Cloudflare-AutoRAG |
| Anchor | Anchor Browser |
| Internet Archive | archive.org_bot |
| Ceramic | Terracotta Bot |
| Timpi | Timpibot |
| ProRata.ai | ProRataInc |
| Novellum | Novellum AI Crawl |

Test plan

  • pnpm typecheck passes
  • pnpm test -- --run: all 1957 tests pass
  • pnpm test -- --run seo: all 404 SEO tests pass
  • Pre-commit hooks pass
  • Pre-push hooks pass

References

Made with Cursor

- Expand AI_SEARCH_BOTS list to 39 crawlers
- Add Google-CloudVertexBot, Googlebot, Bingbot, PhindBot, etc.
- Add Meta family: Meta-ExternalFetcher, FacebookBot
- Add Cloudflare-AutoRAG, archive.org_bot, Timpibot, etc.
- Remove all comments from robots.txt for brevity
- Update the complete AI crawler list in SEO_MASTER_SSOT.md

Tests: pnpm test -- --run (all 1957 tests pass)

Made-with: Cursor

github-actions bot commented Apr 10, 2026

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 38030f5.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

@github-actions

✅ SEO audit passed! All 2025-standard checks meet requirements.

  • ✅ Sitemap 2025 standard
  • ✅ Breadcrumb Schema
  • ✅ JSON-LD structured data
  • ✅ Internal link structure


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b13216cef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +46 to +50

```
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```


P1: Preserve disallow rules for Googlebot and Bingbot

Adding dedicated groups User-agent: Googlebot and User-agent: Bingbot with only Allow: / makes these crawlers stop using the User-agent: * disallow set, so paths like /ratewise/?, /ratewise/theme-showcase/, and other blocked routes become crawlable again for major search engines. This reintroduces index pollution/crawl-budget regressions that the wildcard block was preventing; either remove these dedicated groups or replicate the same disallow rules inside them.


Contributor Author


Fixed (commit 38030f5)

Thanks for flagging this important issue. The dedicated User-agent blocks for Googlebot and Bingbot have been removed.

Fix details:

  • Removed Googlebot and Bingbot from the AI_SEARCH_BOTS list
  • These crawlers now follow the Disallow rules under User-agent: *
  • Dev pages (/ratewise/theme-showcase/, /ratewise/ui-showcase/, etc.) will not be indexed

Strategy rationale:

  • AI crawlers (e.g. GPTBot, ClaudeBot) need an explicit Allow to ensure AI-search visibility
  • Traditional search crawlers (Googlebot, Bingbot) should follow the generic rules to protect crawl budget

SEO_MASTER_SSOT.md has been updated to document this strategy.
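Under this strategy the generated robots.txt ends up shaped roughly like the fragment below. The comments are for explanation only (the PR ships the file without comments), and the disallowed paths shown are the ones mentioned in this thread, not the complete set.

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /ratewise/theme-showcase/
Disallow: /ratewise/ui-showcase/
Allow: /
```

Because Googlebot and Bingbot no longer have dedicated groups, they match `User-agent: *` and inherit its Disallow rules, while the listed AI crawlers get an unrestricted Allow.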

- Remove dedicated User-agent blocks for Googlebot and Bingbot
- These crawlers now follow the Disallow rules under User-agent: *
- Prevent dev pages (theme-showcase, ui-showcase, etc.) from being indexed
- Update SEO_MASTER_SSOT.md to document the strategy change
- Adjust the AI crawler list from 39 to 37

Tests: pnpm test -- --run seo (all 404 tests pass)

Made-with: Cursor
@github-actions

✅ SEO audit passed! All 2025-standard checks meet requirements.

  • ✅ Sitemap 2025 standard
  • ✅ Breadcrumb Schema
  • ✅ JSON-LD structured data
  • ✅ Internal link structure
