Which subscription, coding plan, API, router, or free tier should you put behind your AI coding agent? A curated, benchmarked, source-linked answer — ranked by 💵 price · 🧠 power · 🔢 models · 📊 limits · 🔌 integration.
English · 简体中文 · Español · Русский · 日本語 · Português · Français · Deutsch · 한국어 · हिन्दी
This list catalogs the plans you pay for — subscriptions, flat-rate coding plans, pay-as-you-go APIs, routers, and free tiers — not the coding tools themselves. The harness (Claude Code, Cline, Aider, Roo/Kilo, OpenCode) is free. What costs money is the model behind it, so that is what gets ranked here. The harness is only the integration target.
A $3–30/mo flat-rate plan from a Chinese open-weight lab (GLM, Kimi, DeepSeek, MiniMax, Qwen, Doubao) pointed at a free CLI harness gets you around 78–80% on SWE-bench for roughly a tenth of what a $200 frontier subscription costs. The frontier subs still win the hardest tasks. So most people in 2026 run both: a frontier sub for the hard reasoning, a cheap plan for everything else.
⚠️ Pricing in this space changes monthly. Numbers reflect ~June 2026. Always confirm on the official page before buying. Found a stale price? Open a PR — fixes are as valued as additions.
| Badge | Meaning |
|---|---|
| 💎 | Hidden gem — lesser-known, costs less than it should for what you get |
| 🆓 | Has a free tier you can actually run an agent on |
| ✅ | Pricing fact-checked against the official source (June 2026) |
| ⭐ | Value rating (1–5): price vs power vs limits vs integration |
| 🇨🇳 | China-hosted (data-residency / latency caveat for some) |
| Carries notable risk (ToS, reliability, longevity, reseller) |
Integration shorthand: CC-native = native Anthropic-compatible endpoint, a drop-in Claude Code backend via ANTHROPIC_BASE_URL. OpenAI-compat = works in Cline/Roo/Kilo/Aider/Continue/OpenCode by base-URL swap (Claude Code needs a shim/router). native-only = locked to the vendor's own editor/agent, not reusable as a backend.
- How to choose
- Pick by budget
- Pick by who you are
- TL;DR — top picks by use-case
- Master comparison table
- First-party frontier subscriptions
- Bundled tool subscriptions (editor + model)
- Flat-rate coding plans — the value champions 💎
- Pay-as-you-go value APIs
- Speed / fast-inference providers
- Routers & gateways
- More providers worth knowing (2026)
- Free tiers 🆓
- Free credits & student / startup programs
- Student & education plans 🎓
- Niche & specialty
- App builders & autonomous agents
- Hidden gems & reseller proxies
⚠️ - Setup recipes — wire a cheap plan into your harness
- Privacy & data-residency matrix
- Benchmark-per-dollar
- Money traps & common mistakes
- 2026 pricing timeline
- What the community actually says
- Self-host & hybrid (when a subscription isn't the answer)
- FAQ
- Glossary
- How this list is scored & maintained
- Caveats & disclaimer
- Contributing
- License
- ⭐ Star history
Score every plan on five axes:
- 💵 Price — sticker, and the real effective cost (credit ratios, peak multipliers, overage).
- 🧠 Power — model quality; the value tier clusters at ~78–80% SWE-bench Verified, frontier at 85–89%.
- 🔢 Model count — one plan that multiplexes many models (Qwen Coding Plan, OpenRouter) hedges churn.
- 📊 Limits — requests/tokens per 5h-window, weekly caps, concurrency. The hidden cost: one IDE "prompt" fans out to 5–30 model calls, so advertised "prompts/5h" are softer than they look.
- 🔌 Integration — does it expose a native Anthropic endpoint (clean Claude Code drop-in) or only OpenAI-compat (needs a router)? Or is it native-only (no reuse)?
Decision shortcut:
- Want the best agent, simplest path → Claude Pro $20 → Max 5x $100.
- Want most coding per dollar → a flat-rate plan (GLM / MiniMax / Qwen / Kimi) on Claude Code.
- Want $0 → Cerebras free + OpenRouter free (+$10 unlock) + NVIDIA NIM, escalate hard tasks to a paid model.
- Want one key for everything → OpenRouter.
- Want privacy (no China-host) → Synthetic.new (US, no-train, 14-day deletion) or first-party US subs.
flowchart TD
A[Need an AI coding backend] --> B{Budget?}
B -->|$0| C{Sensitive code?}
C -->|Yes| C1[Local: Ollama + Qwen3-Coder]
C -->|No| C2[Cerebras free + OpenRouter free<br/>+10 dollar unlock + ModelScope]
B -->|Under $30/mo| D{Which harness?}
D -->|Claude Code| D1[GLM / Kimi / DeepSeek / MiniMax / Qwen<br/>via ANTHROPIC_BASE_URL]
D -->|Cursor / Copilot| D2[Cursor Pro $20 / Copilot Pro $10]
D -->|Any / mixed| D3[OpenRouter one key]
B -->|$100-200/mo| E{All-day heavy use?}
E -->|Yes, need Opus| E1[Claude Max 5x/20x]
E -->|Yes, want speed| E2[Cerebras Code $50-200]
E -->|Team / governance| E3[Copilot Business / Claude Team]
Skip the analysis paralysis. Find your monthly number, grab the stack.
| Budget | Best pick | What you get | Smartest stack |
|---|---|---|---|
| $0 🆓 | GitHub Copilot Free + Gemini CLI | 2,000 completions + 50 premium reqs/mo from Copilot; a generous agentic CLI from Google | Copilot Free in the IDE for autocomplete, Gemini CLI in the terminal for agent runs, Cursor Hobby as a third bucket of free Tab completions |
| < $10/mo | GLM Coding Plan Lite 💎 ($30/qtr ≈ $10/mo) | ~3× Claude Pro usage; a native Anthropic-compatible endpoint — drop into Claude Code, Cline, or OpenCode | GLM Lite as your Claude Code driver + stack the free tier on top for overflow |
| ~$10/mo | GitHub Copilot Pro ($10) | Unlimited completions, $10 of AI Credits, agent mode, model picker — moved to usage-based credits June 2026 | Copilot Pro in-IDE + GLM Lite in-terminal — two frontier-ish drivers for ~$20 total |
| ~$20/mo | Claude Pro ($20) or Cursor Pro ($20) | Pro: Claude Code in terminal/web/desktop, Sonnet 4.6 + Opus 4.6. Cursor: unlimited Tab + $20 agent usage + Background Agents | Claude Pro (best raw agent) + Copilot Free for inline autocomplete; or Cursor Pro alone if you live in one editor |
| ~$50/mo | MiniMax Max ($50) or GLM Pro (~$72/mo) + Claude Pro ($20) | A high-volume flat plan (MiniMax ~1000 prompts/5h, or GLM Pro) plus native Anthropic quality for the hard stuff | Cheap plan for the grind, Claude Pro reserved for tricky reasoning — best $/throughput on the board |
| ~$100/mo | Claude Max 5x ($100) | 5× Pro usage, priority access to newest models — the sweet spot for devs who hit Pro limits daily (Max plan) | Max 5x as the workhorse + GLM Lite ($10) as a cheap overflow lane when you burn the 5x cap |
| ~$200/mo | Claude Max 20x ($200) or Cursor Ultra ($200) | Max 20x: 20× Pro, top individual tier. Cursor Ultra: 20× usage + priority features in a full IDE | Max 20x for terminal-first power users; add Copilot Pro ($10) only if you want a second vendor's models for variety/redundancy |
Rules of thumb
- Under $20 and price-sensitive? GLM Lite is the single best dollar in coding right now — it speaks Anthropic's API, so your Claude Code muscle memory transfers.
- One tool, all day? Pay for the native subscription (Claude Pro, Cursor Pro). Don't fragment.
- Heavy daily user? Jump straight to Max 5x — it's cheaper than stacking two $50 plans and far less fiddly.
- The pro move at every tier: one premium driver for hard problems + one cheap/free lane for bulk edits and autocomplete. You rarely need two $20+ subscriptions.
Prices verified June 2026. Quarterly-billed plans (GLM) shown as effective monthly. Copilot and GitHub plans moved to usage-based AI Credits on June 1, 2026 — your allotment scales with the base price.
Skip the matrix-staring. Find your row, copy the pick, move on. Prices are USD/mo, individual tiers unless noted (June 2026).
| You are… | Best pick | Why it fits you | ~Price |
|---|---|---|---|
| Solo indie hacker 💎 | Claude Pro + a Z.ai/DeepSeek API key as overflow | One $20 sub covers Claude Code in-terminal; when you hit the 5-hour cap mid-sprint, fall back to a cheap value-API instead of jumping to a |
$20 + pennies |
| Startup eng team (2–20) | GitHub Copilot Business | $19/seat gets org policy, public-code filter, IP indemnity, and central billing — the cheapest plan that's safe to put in front of investors/customers. Pairs with each dev's own Claude/Cursor sub for heavy lifting. pricing | $19/seat |
| Enterprise (governance / SSO / IP) | Copilot Enterprise or Claude Enterprise | Copilot Enterprise ($39/seat) adds SSO/SCIM, audit logs, codebase-indexed knowledge bases, and the same Microsoft IP-indemnity-with-filter. Claude Enterprise (sales-quoted) is the alt if you're Anthropic-first. Both clear procurement. Copilot Enterprise | $39/seat → custom |
| CS student 🆓 | GitHub Copilot (Student) + ChatGPT Free | Verified students get Copilot at Pro level for free (unlimited completions, premium models, monthly premium-request allowance). Zero spend, real tooling. Copilot plans | $0 |
| OSS maintainer 🆓 | Copilot Pro free for OSS + Claude Pro for deep work | Maintainers of popular repos qualify for free Copilot Pro; keep a $20 Claude Pro for the gnarly refactors. Best public-good-to-cost ratio. | $0–$20 |
| Privacy-first / regulated 🔒 | Local stack: Ollama + Qwen3-Coder + Continue.dev | Proprietary code never leaves the box — no API, no retention clause, no DPA to negotiate. Roughly 70–85% of cloud-Claude quality on single-file work. If you must use cloud, add a zero-retention API tier. setup | $0 (hardware) |
| Offline / air-gapped | Ollama + Qwen3-Coder-Next (Continue.dev or OpenCode) | Same local stack, but this is the only category that works with the network cable pulled. Qwen3-Coder-Next runs ~3B active params from an 80B MoE — fits real hardware, no internet ever. models | $0 |
| Vibe-coder / hobbyist 🆓 | Free tier sampler: ChatGPT Free or Copilot Free + Gemini free | Building for fun on weekends — don't pay anything. Copilot Free's 2,000 completions/mo plus a chat model covers casual side projects. Upgrade only when free limits actually bite. | $0 |
| Power-user running parallel agents 💎 | Claude Max 20x (or stack a value-API for fan-out) | If you orchestrate swarms / parallel Claude Code sessions, the 20x usage ceiling is what stops you hitting limits at 2pm. Cheaper than burning equivalent API tokens at this volume. Add a DeepSeek/Z.ai key for the throwaway worker agents. | $200 |
Two cross-cutting rules of thumb:
- The jump from $20 → $100/$200 only pays off if you personally hit usage caps more than ~twice a week. Most people don't — track it before upgrading.
- IP indemnity is a plan feature, not a model feature. It starts at Copilot Business and requires the public-code filter on — free and Pro tiers don't carry it. If a lawyer will ever read your repo, this is the line that matters. details
Sources:
- GitHub Copilot Plans & pricing
- Plans for GitHub Copilot — GitHub Docs
- AI Pricing Compared 2026 — AIViewer
- Build a Local AI Coding Assistant — Qwen3 + Ollama + Continue.dev
- Best Local AI Coding Models for Ollama (2026)
| Use-case | Pick | Why | ~Price |
|---|---|---|---|
| 🏆 Best overall value | GLM Coding Plan 💎🇨🇳 | GLM-5.1 ~94% of Opus coding; native Claude Code; cheapest serious entry | ~$10/mo (qtrly Lite) – $72 Pro |
| 🥇 Best raw frontier | Claude Max 5x | Unlocks Opus in Claude Code, the consensus #1 agent | $100/mo |
| 🪙 Cheapest serious entry | Trae Lite $3 / StepFun $6.99 / MiMo ~$5 / GLM Lite ~$10 💎 | Real coding backend for the price of a coffee | $3–10/mo |
| 💸 Cheapest per token | DeepSeek V4-Flash 💎 | $0.14/M in, $0.0028/M cache-hit, 1M ctx, CC-native | pay-go |
| 🧪 Best free | Cerebras free 🆓 + OpenRouter :free 🆓 | 1M tok/day (fast) + Qwen3-Coder-480B free | $0 |
| ⚡ Best fast+cheap | Groq 💎🆓 / Cerebras Code | Native Anthropic endpoint (Groq); ~2000 tok/s flat-rate (Cerebras) | free / $50/mo |
| 🔀 Best universal router | OpenRouter | 315+ models, one key, Anthropic skin, no token markup | pay-go +5.5% |
| 🔒 Best privacy (US-host) | Synthetic.new 💎 | US infra, no-train, 14-day deletion, dual OpenAI+Anthropic compat | $20–60/mo |
| 🧰 Best for big codebases | Augment Code ✅ | Best-in-class Context Engine for monorepos | $20+/mo |
| 🏢 Best team value | Claude Team Premium seat 💎 | ≈ Max-5x usage + SSO/admin | $100/seat |
Sorted roughly by value. Prices ~June 2026; verify before buying.
| Plan | Type | Price | Models | Limits (coding) | Integration | ⭐ | Notes |
|---|---|---|---|---|---|---|---|
| GLM Coding Plan | flat-rate | Lite $18 · Pro $72 · Max $160 /mo (qtrly Lite ~$10/mo) | GLM-5.1/5/4.7 | Lite ~80, Pro ~400 prompts/5h | CC-native | ⭐5 | 💎🇨🇳✅ |
| DeepSeek API | pay-go API | V4-Pro $0.435/$0.87; Flash $0.14/$0.28 | V4-Pro/Flash | 1M ctx, 500–2500 concur | CC-native | ⭐5 | 💎🇨🇳✅ |
| MiniMax Coding Plan | flat-rate | $10–50/mo | M2.7 (plan), M2.5/M3 (API) | Starter ~100, Max ~1000 prompts/5h | CC-native | ⭐5 | 💎🇨🇳 |
| Kimi Code | flat-rate+API | ~$19/mo + metered | K2.6 (1T) | ~300–1200 calls/5h, 30 concur | CC-native | ⭐5 | 💎🇨🇳 |
| Qwen Cloud Coding Plan | flat-rate | Pro $50/mo (Lite $10, closed) | Qwen3.5 + Kimi/GLM/MiniMax | Pro 6000 req/5h, 1M ctx | CC-native | ⭐4 | 💎🇨🇳✅ |
| OpenRouter | router | pay-go, +5.5% top-up | 315+ (all of them) | balance-bound; free models 50–1000/day | CC-native skin | ⭐5 | 🆓 |
| Claude Pro | first-party | $20/mo | Sonnet 4.6 (no Opus) | ~40–45 msg/5h + weekly | CC-native | ⭐5 | best entry |
| Claude Max 5x | first-party | $100/mo | + Opus 4.6/4.7 | ~50–225 prompts/5h | CC-native | ⭐5 | Opus unlocked |
| Cerebras Code | flat-rate speed | $50/$200 | GLM-4.7 (~2000 tok/s) | 24M–120M tok/day, 131k ctx | OpenAI-compat | ⭐5 | ✅ often sold out |
| Synthetic.new | flat-rate (US) | $20–60/mo | 16 open-weight (GLM/Kimi/Qwen/DS) | ~125–1250 req/5h | CC-native | ⭐5 | 💎🔒 |
| Chutes | flat-rate |
$3/$10/$20 | GLM-5/Kimi/DS/MiniMax/Qwen | 300/2000/5000 req/day | OpenAI-compat | ⭐5 | 💎 |
| Grok Code Fast 1 | pay-go API | $0.20/$1.50/M | grok-code-fast-1 | 256K ctx, ~92 tok/s | CC-native | ⭐5 | 💎 #1 on OpenRouter |
| ChatGPT Plus | first-party | $20/mo | GPT-5.x-Codex | token-credit metered | Codex-native | ⭐4 | Codex #2 agent |
| ChatGPT Pro | first-party | $100/$200 (5x/20x) | GPT-5.5-Codex | high; dedicated GPU | Codex-native | ⭐4 | |
| Claude Max 20x | first-party | $200/mo | Opus 4.6/4.7 | ~200–900 prompts/5h | CC-native | ⭐4 | power tier |
| Cursor Pro / Ultra | bundled | $20 / $200 | all frontier + Auto | $20 / $400 usage pool | Native-only | ⭐4 | Ultra = 2× credit ratio |
| GitHub Copilot Pro | bundled | $10/mo | GPT-5/Claude/Gemini | $10 AI-credits (usage) | Native-only (+ACP) | ⭐4 | free completions 🆓 |
| DeepInfra | speed/API | pay-go (cheapest OSS) | Kimi/DS/Qwen3-Coder/GLM | balance-bound | CC-native | ⭐5 | 💎✅ cheapest host |
| Groq | speed/API | pay-go + free | GPT-OSS/Qwen3/Kimi | free RPM/TPM caps | CC-native | ⭐4 | 💎🆓 |
| Vercel AI Gateway | router | $0 markup (even BYOK) | 100s incl. Claude | $5/mo free credits | CC-native | ⭐4 | 💎🆓✅ |
| Requesty | router | +5% flat | Claude/GPT/Gemini/DS/Qwen | semantic cache ~40% off | OpenAI-compat | ⭐4 | 💎 team governance |
| Mistral Le Chat Pro | first-party | $14.99/mo ($5.99 student) | Devstral 2 + Vibe CLI | ~25 free msg/day | Native-only | ⭐4 | 💎🆓🇪🇺 cheapest major sub |
| Augment Code | bundled | $20–200/mo | Claude/Gemini/GPT | 40k–450k credits/mo | Native-only | ⭐4 | ✅ best big-repo context |
| Zed Pro | bundled | $10/mo | any (BYO key/ACP) | $5 credits + usage | ACP + BYOK | ⭐4 | 💎 anti-lock-in |
| Cerebras free | free | $0 | Qwen3-Coder-480B, GPT-OSS-120B | 1M tok/day, 8K ctx cap | OpenAI-compat | ⭐5 | 💎🆓 fastest free |
| Google AI Studio | free | $0 | Gemini 2.5 Flash, Gemma 3 27B | Flash 250 RPD; Gemma 14.4k RPD | OpenAI-compat | ⭐4 | 🆓 biggest free ctx |
The direct-from-vendor plans. A subscription authenticates the vendor's own harness (Claude Code, Codex CLI, Antigravity, Grok Build) via login — it does not give you a generic API key for third-party OpenAI-compat tools (that's separate per-token billing). Exception: xAI Grok models are OpenAI/Anthropic-compatible.
Value pecking order for agentic coding (June 2026 consensus): Claude > OpenAI Codex > Google Gemini > xAI Grok. An independent 30-day test put Claude ~95% vs ChatGPT ~85% coding accuracy; vendor SWE-bench has GPT-5.5 (88.7%) ≈ Opus 4.7 (87.6%).
-
Claude Pro —
$20/mo($17 annual). Sonnet 4.6 in Claude Code (no Opus). ~40–45 msg/5h + weekly cap, shared with chat/Cowork. Best value entry point to the #1 coding agent. April 2026 doubled the 5h limits & removed peak throttling. ⭐5 -
Claude Max 5x —
$100/mo. Unlocks Opus 4.6/4.7 + 5× throughput (~50–225 prompts/5h). The pro sweet-spot; a $100 middle tier OpenAI/Google don't match as usefully. ⭐5 -
Claude Max 20x —
$200/mo. ~200–900 prompts/5h. For all-day parallel agents; the flat-rate-beats-API math is decisive (90%+ of Claude Code tokens are cache-reads, free on subscription, billed on API — one dev's peak month = $5,623 at API ≈ 4.5 years of Max 5x). ⭐4 -
Claude Team Premium seat 💎 —
$100/seat(annual). ≈ Max-5x usage plus SSO/admin/audit/enterprise-search. Quietly the best team coding value; Standard $20 seat also includes Claude Code. ⭐4
-
ChatGPT Plus —
$20/mo. Codex CLI/IDE bundled (GPT-5.5/5.4/5.3-Codex). Token-credit metered since April 2026 (confusing). Codex = consensus #2 agent. Plus caps run out fast on heavy agentic work. ⭐4 -
ChatGPT Pro —
$100(5x) /$200(20x). High throughput + dedicated GPU. Note the $100-tier "10x boost" promo expired May 31 2026 (now 5x). The classic "$200 plan worth it?" debate = Claude Max 20x vs ChatGPT Pro 20x. ⭐4
-
Google AI Pro —
$19.99/mo(often 50% off year one). Renamed from Google One AI Premium (Apr 2026). Gemini 3.x Pro, 5 TB storage, and enhanced Antigravity + Jules coding-agent access. ✅ -
Google AI Ultra —
$100/mo(5x, new dev tier) /$200/mo(20x, cut from $250) at I/O 2026. 5×/20× usage in the Gemini app and Antigravity; top tier adds Deep Think, Project Genie, 30 TB. ✅ ⚠️ Google is killing the open-source Gemini CLI June 18 2026, migrating users to the closed Antigravity CLI with far lower free quotas (~1000 → ~20 req/day) — the biggest community grievance of 2026.
-
SuperGrok —
$30/mo($300/yr) / Heavy$300/mo. (No standalone "Lite" tier on the current pricing page — it's a legacy/X-Premium-bundle artifact; ignore old $10 figures.) Grok Build CLI runs 8 parallel sub-agents in isolated git worktrees (novel) and is included for all SuperGrok subs.grok-code-fast-1has a cult following (cheap+fast). SWE-bench ~70.8% trails leaders. For coding, the xAI API is often the better buy. ⭐3 💎
Here the plan is the product — you buy into the vendor's editor/agent. The 2026 trend: nearly all moved from fixed request counts to credits / token-metering, making costs less predictable (loud backlash).
-
Cursor — Hobby free · Pro
$20· Pro+$60💎 · Ultra$200. Since June 2025, your plan price = a usage pool at API rates. Credit ratio improves up-tier: Pro $20/$20 (1×), Pro+ $60/$70 (1.17×), Ultra $200/$400 (2×, best).Automode is the value key — effectively unlimited, doesn't drain the pool like pinning Claude/MAX does.⚠️ The June-2025 switch caused a pricing disaster (HN user: "$350 overage in a week"); CEO apologized + refunded. Native-only — can't back Claude Code, and as of Jan 2026 you can't route a Claude sub into Cursor either. Best-in-class Tab/Apply. ⭐4
-
GitHub Copilot — Free 🆓 · Pro
$10· Pro+$39· Max$100· Business$19· Enterprise$39.⚠️ Moved to usage-based AI Credits June 1 2026 (1 credit = $0.01); each plan includes a credit pool (Pro=$15, Pro+=$70). Code completions stay unlimited & free — completion-only users unaffected. Backlash severe (TechTimes: agentic bills jumped 10×–50×). Best-in-class IDE completions + governance/IP-indemnity for orgs. Native-only (escape hatch: Copilot CLI speaks ACP). ⭐4
-
Augment Code ✅ — Indie
$20/40k credits · Standard$60· Max$200. Best-in-class Context Engine for large monorepos (tops context-recall comparisons). VS Code + JetBrains + Auggie CLI. Native-only. Credit burn on tool-heavy tasks is the gripe. ⭐4 💎 -
Zed Pro 💎 — Free · Pro
$10(only +10% markup) · Business$30. The anti-lock-in pick: open ACP drives external agents (Claude Code, Codex, OpenCode) and BYO keys for any provider. Fastest native editor. ⭐4 -
Kiro 💎 — Free · Pro
$20/1k credits · Pro+$40· Power$200. Best spec-driven agent (requirements→design→tasks), full Claude lineup incl. Opus 4.7, fractional 0.01-credit billing. AWS Startups = 1 free year of Pro+. ⭐4 -
Trae 💎🇨🇳 — Free · Lite
$3· Pro$10· Ultra$100. ByteDance VS Code fork; usage pool exceeds sticker (e.g. $20 usage for $10) = "the $3 Cursor alternative."⚠️ ByteDance telemetry shared with affiliates — enterprise dealbreaker. ⭐4 - Sourcegraph Amp — Free to start ($10 credit; $40 for ex-Cody). Pure consumption (no monthly floor), runs Opus 4.8 in "smart" mode. Great for light use, uncapped-burn risk for heavy. Cody Free/Pro were retired into Amp. ⭐3
- JetBrains AI / Junie — beloved IDE integration, but Junie burns credits fast (Ultimate's 35 credits gone in ~4–5 days). Only if you live in JetBrains.
- Overpriced / avoid: Tabnine ($39 floor, no free tier, annual lock-in — only for on-prem/air-gap needs); Windsurf Pro (March 2026 swap to daily/weekly quotas is the year's most-complained-about change, post-Cognition trust low). Supermaven is dead as a standalone (folded into Cursor Tab Nov 2025).
Fixed monthly or quarterly plans that put a frontier-ish open-weight model behind your harness, mostly from Chinese labs. Most expose a native Anthropic endpoint, so they drop into Claude Code via ANTHROPIC_BASE_URL. For the endpoint list, see Alorse/cc-compatible-models.
Consensus pecking order: GLM (cheapest entry, community default) · MiniMax (best price/volume) · Kimi (best long-horizon agent) · Qwen (>262K context, multi-model). Claude Pro $20 is the quality benchmark they undercut.
-
Overseas monthly (verified June 2026):
Lite $18/mo·Pro $72/mo·Max $160/mo— pricesdoubled on Apr 11 2026. Quarterly Lite is the cheap route ($30/qtr ≈ $10/mo). Domestic CN is much cheaper (~$7 / $21 / $68 per mo). The viral $3/mo promo ended Feb 11 2026. ✅ - Models: GLM-5.1 (~94% of Opus 4.6 coding) · GLM-5/5-Turbo · GLM-4.7 · GLM-4.5-Air. Every tier (incl. Lite) gets all models and the full 200K context (128K max output) — tiers differ only in quota, not models or context window. Suggested mapping: GLM-5.1 → Opus slot (hard tasks, frontend/UI), GLM-4.7 → Sonnet (the ×1-quota workhorse), GLM-4.5-Air → Haiku (fast background).
- Limits: Lite ~80, Pro ~400, Max ~1,600 prompts/5h + weekly (one IDE "prompt" = 5–30 model calls).
⚠️ Peak-hour 3× multiplier on GLM-5/5.1 only, 14:00–18:00 UTC+8 (≈08:00–12:00 Kaliningrad); 2× off-peak (1× off-peak via promo through end-June 2026). Run heavy GLM-5.1 off-peak. - Integration:
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic— official Claude Code support + Cline/Roo/Kilo/OpenCode (20+ tools). First-party = no reseller ban risk. -
The single most-recommended budget coding plan of 2026. "3× Claude Max usage for ~$30/mo." Backlash over the Feb price hike + ⅓ quota cut, still rated top value. ⭐5
- Sources: z.ai/subscribe · pricing · GLM-5.1 review
-
Starter $10/mo·Plus $20·Max $50(2 months free annual); High-Speed variants $40–150. - Models: M2.7 / M2.7-Highspeed on the plan; M2.5/M3 (1M ctx) via API.
⚠️ Plan often serves an older model (M2.1) than the benchmarked M2.5/M2.7. - Limits: Starter ~100 → Max ~1,000 prompts/5h; ~50 TPS (100 high-speed).
- Integration:
ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic+ OpenAI-compat. -
"Cut my Claude Code bill in half." Best raw price/volume in the flat-rate bucket; M2.7 ~94% of GLM-5.1 at ~1/5 the input cost. ⭐5
- Sources: coding plan · M2.5 pricing
-
~$19/mo membership+ metered API (K2.6 $0.60–0.95/M in, $2.50–4.00/M out, 75% cache discount). Tiered Moderato/Allegretto/Vivace. - Models: Kimi K2.6 (1T MoE, ~80.2% SWE-bench), K2.5.
- Limits: ~300–1,200 calls/5h, 30 concurrent (generous for parallel agents).
- Integration:
ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic— true Claude Code drop-in; ships own Kimi CLI (6.4k★). -
Best long-horizon agent stability (sustained 4,000+ tool calls over a 13-hour session). "Saving 88% coding costs." Priciest input-side of the open cohort. ⭐5
- Sources: agent support · Kimi Code guide
-
Pro $50/mo(Lite ~$10 closed to new subs since Mar 20 2026). - Models: Qwen3.5-Plus, Qwen3-Coder-Next/Plus/480B + cross-model Kimi/GLM/MiniMax under one key. 1M-token context (best in lane).
- Limits: Pro 6,000 req/5h + 45k/wk + 90k/mo (sliding window). Dedicated
sk-sp-key (not interchangeable with pay-go). - Integration:
ANTHROPIC_BASE_URL=https://coding-intl.dashscope.aliyuncs.com/apps/anthropic+ Qwen Code CLI. -
Standout = one plan multiplexes Qwen+Kimi+GLM+MiniMax and the only credible 1M-context flat plan. ⭐4
- Sources: Model Studio coding plan
-
Synthetic.new 💎🔒 —
$20–60/mo. ~16 always-on open-weight models (Kimi/GLM/Qwen3-Coder-480B/DeepSeek). US infra, no-training, 14-day deletion. Dual OpenAI + Anthropic compat = genuine Claude Code drop-in. The privacy-conscious alternative to China plans. ⭐5 -
Cerebras Code —
$50/$200, flat-rate speed (see Speed). -
OpenCode Go (Zen) 💎 —
$5first month then$10/moflat. ~12–14 Chinese open-weight models (GLM-5.1/Kimi/Qwen3.7/DeepSeek V4/MiniMax). First-class in OpenCode. No Claude/GPT. ⭐4
-
StepFun Step Plan —
$6.99–$99/mo, 100–5,000 prompts/5h, CC-native. Price undercutter, models less battle-tested. ⭐3 -
MiMo (Xiaomi) —
$6–$100/mocredit-based (60M–1.6B), CC-native (api.xiaomimimo.com), incl. multimodal Omni. Barely benchmarked. ⭐3 -
Atlas Cloud 💎 —
$10/$20, 800k–1.8M credits/day, OpenAI-compat (Claude Code/Codex/OpenCode). Daily-credit model for autonomous agents. ⭐4 -
Factory Droid 💎 — from
$20/motoken-based, frontier models (Claude/GPT/Gemini), rolling 5h/7d/30d windows. Notable "I canceled two $200 Max plans for Droid" story. ⭐4
Per-token access from the value labs. Cache pricing is the real cost driver for agent loops — design for cache hits over headline input price.
-
V4-Pro
$0.435/M in · $0.0036/M cache-hit · $0.87/M out(the 75% cut is now permanent). V4-Flash$0.14 / $0.0028 / $0.28. 1M context, 384K max output. - Integration: OpenAI-compat + native Anthropic (
https://api.deepseek.com/anthropic) — drop-in Claude Code (ANTHROPIC_MODEL=deepseek-v4-pro[1m]). -
The cost-per-token champion. V4-Pro ~80.6% SWE-bench / 93.5% LiveCodeBench at <$1/M out. V4-Flash's $0.0028/M cache-hit is unbeatable for bulk loops. ⭐5
- Sources: pricing · Claude Code setup
-
Alibaba Qwen3-Coder API 🇨🇳 — 480B
$0.22/$1.00, Flash$0.195/$0.975, 30B-A3B$0.07/$0.27. 1M free tokens / 90 days (Intl). CC-native. Strongest open-weight agentic coder. Use Singapore-region keys. ⭐4 -
Moonshot Kimi API 🇨🇳 — K2.6
$0.95/$4.00($0.16 cached), K2.5$0.60/$3.00. CC-native. Excellent tool-calling; output price is the grumble. $10 deposit removes daily cap. ⭐4 -
Zhipu GLM API 🇨🇳 — GLM-5.1
$1.40/$4.40, GLM-4.7$0.60/$2.20, FlashX$0.07/$0.40. CC-native. Most enthusiasts buy the cheaper Coding Plan instead. ⭐4 -
MiniMax API 🇨🇳 ✅ — M3
$0.30/$1.20($0.06 cache, free cache writes), M2.5 ~$0.15/$1.15. "20× cheaper than Opus." First-party Anthropic compat. ⭐4
Per-token hosts of open-weight models, optimized for throughput. Groq is the only one with a native Anthropic endpoint (cleanest Claude Code drop-in); the rest are OpenAI-compat (need a shim/router for CC, native in Cline/Roo/OpenCode).
- DeepInfra 💎 ✅ — the cheapest-per-token champion. DeepSeek V3.2 ~$0.26/$0.38, Kimi K2.6 $0.75/$3.50, Qwen3-Coder-480B $0.30/$1.00. 90+ models, cached discounts, native Anthropic endpoint, no upfront cost. Speed good-not-elite. ⭐5
-
Groq 💎🆓 — LPU speed (GPT-OSS-20B ~860 tok/s). GPT-OSS-120B
$0.15/$0.60, Kimi K2$1.00/$3.00. Native Anthropic + OpenAI compat + real free tier. Batch+cache stack to ~25%. No Qwen3-Coder-480B (ceiling is Qwen3-32B). ⭐4 -
Cerebras ✅ — fastest (~2,000–3,000 tok/s). Code Pro
$50(24M tok/day) / Max$200(120M tok/day), GLM-4.7, 131k ctx. Pay-go GPT-OSS-120B$0.35/$0.75.⚠️ Frequently sold out; 131k ctx (half native) + high TTFT blunt the speed in agent loops. ⭐5 flat-rate / ⭐4 pay-go - Together AI — broadest catalog (Qwen3-Coder-480B, Kimi, DeepSeek V4 Pro $2.10/$4.40 w/ $0.20 cached). Mid-pack price, ~89 tok/s. ⭐4
- Fireworks AI — production/enterprise lean, aggressive cache ($0.15/M), DeepSeek V4-Flash $0.14/$0.28, Azure Foundry path. ⭐4
- Novita 💎 — hosts the full Qwen3-Coder family at near-DeepInfra prices; under-the-radar OpenRouter route. ⭐4
-
Hyperbolic 💎 — GPT-OSS-20B
$0.10/Mblended (among cheapest anywhere); hosts Qwen3-Coder-480B (FP8). ~13 models. ⭐3 - SambaNova — uniquely fast on giant 671B/405B models; forever-free + $5 credit 🆓 but 50 req/day cap = eval-only.
One key across many providers. Pick a router as your default access layer.
-
OpenRouter 🆓 — the consensus default. 315+ models, one key, Anthropic-compat "skin" (
ANTHROPIC_BASE_URL=https://openrouter.ai/api= true Claude Code drop-in), no token-price markup (only +5.5% on top-ups), free ZDR + spend caps, generous BYOK (1M free req/mo). Free models (Qwen3-Coder-480B, DeepSeek, Llama 4): 50 RPD → 1000 RPD forever after a one-time $10 deposit. The 5.5% fee stings only above ~$5k/mo spend. ⭐5 - Requesty 💎 — flat 5% markup, all features incl. semantic caching (~40% savings, beats identical-only caches) + smart per-request routing + per-agent model policies (different model per classifier/synthesizer role) + SOC 2 Type II. The team-governance pick. OpenAI-compat. ⭐4
-
Vercel AI Gateway 💎🆓 ✅ — zero markup, even on BYOK. Native Anthropic-compat (
https://ai-gateway.vercel.sh) = direct Claude Code + Claude Agent SDK + "Claude Code Max via Gateway". $5/mo free credits refresh indefinitely (stops once you top up). Best pure-economics pick, esp. in the Vercel ecosystem. ⭐4 - Helicone Gateway 🆓 — observability-first (auto logging/tracing/cost), zero markup, free 10k req/mo; subs $79/$799. ⭐3
- CometAPI — 500+ models incl. latest proprietary, ~20–40% off official, dual OpenAI+Anthropic compat. Prepaid-credit middleman risk. ⭐4
- ElectronHub — 600+ models, weekly credits can exceed cash cost; tight 5–10 RPM on cheap tiers, reseller-trust caveat. ⭐3
- LiteLLM — the OSS self-host standard (free, no markup) — see integration tricks. DIY infra, not turnkey. ⭐4
Genuinely useful entries that don't headline the main sections but fill real gaps — extra Chinese labs and aggregators, Western coding tools, and routers beyond OpenRouter. Grouped and collapsed to keep the list scannable.
🇨🇳 Chinese aggregators & labs (cheap tokens, several with native Anthropic endpoints)
- SiliconFlow 💎 — one of China's largest independent MaaS routers, 200+ models, native Anthropic endpoint (rare) so Claude Code points straight at cheap DeepSeek/Qwen/GLM/Kimi. Intl (.com) + China (.cn) endpoints. DeepSeek-V4-Flash ~$0.14/$0.28.
- PPIO 💎 — CNY-priced router on its own GPU cloud; Qwen3-Coder-Next ≈¥1.4/¥10.5, DeepSeek-V4-Flash ¥1/¥2 — among the lowest token prices anywhere. OpenAI-compat (bridge for Claude Code).
-
Volcengine Ark / BytePlus 💎 (ByteDance Doubao) — flat Doubao Coding Plan: Lite $10/Pro $50 via BytePlus (the foreign-card-payable brand). Doubao-Seed-Code is natively Anthropic-compatible and approaches Claude Sonnet on coding; bundles an "ArkClaw" Claude-Code-style agent. Doubao API floor:
doubao-seed-1.6-flash$0.022/M in. -
Alibaba Bailian multi-model Coding Plan 💎 — $50/mo Pro that multiplexes Qwen3-Coder + Kimi-K2.5 + GLM-5 + MiniMax-M2.5 under one sub, with a native Anthropic endpoint + Singapore region (no Chinese ID).
⚠️ needs a dedicatedsk-sp-key — a normal key silently bills 5× PAYG. - ModelScope 🆓💎 (Alibaba) — 2,000 free API calls/day, no card, incl. Qwen3-Coder-480B. The de-facto $0 way to run a frontier Chinese coder in an agent loop after Qwen's OAuth free tier closed.
- AiHubMix 💎 — China-based unified router exposing OpenAI-, Gemini- and Anthropic-compatible endpoints with first-class Claude Code docs; single key across DeepSeek/Qwen/GLM/Kimi and relayed Claude.
- 302.AI 💎 — prepaid, no TPM throttling (good for bursty agents), one balance across Kimi/Qwen/DeepSeek + GPT/Claude, private-deploy option.
- Big-lab completeness: Baidu ERNIE (Qianfan; ERNIE 4.5 21B-A3B $0.07/$0.28), Tencent Hunyuan (HY3 Preview ~$0.063/$0.21 — but Tencent has raised some prices), iFlytek Spark (free Lite tier + dedicated Spark Code), SenseNova (cheap multimodal MoE). All OpenAI-compat; bridge needed for Claude Code; most need Chinese-ID for direct signup (reachable via relays/302.AI).
⚠️ China-direct relays (Yunwu, SSSAiCode-type) resell frontier Claude/GPT cheaply with no VPN — convenient inside China, but carry the standard reseller-proxy risk. Treat as a hot wallet.
🛠️ Western coding tools with a subscription
- Refact.ai 💎 — $10/mo, the cheapest agentic-coding sub; open-source, fully self-hostable autonomous agent with on-prem fine-tuning and zero telemetry. Free tier = 5,000 coins/mo + unlimited completions.
- Pieces for Developers 💎 — Pro $14.17/mo annual = unlimited Opus 4 / GPT-5 / Gemini 2.5 in-IDE (cheaper than a single Claude Pro seat). The differentiator is a long-term memory/context layer across all your tools, not codegen. Free tier runs local models unlimited.
- Continue 💎 — open-source IDE agent + the Continue Hub model storefront: frontier models at $3/M tokens, Team $20/seat (+$10 credits) with shared config/governance. BYOK too.
- Cline — reference OSS agent; zero-markup BYOK (30+ providers), typical real spend $25–70/mo. Teams plan: first 10 seats permanently free, then $20/seat.
- Kilo Code — the actively-maintained successor to Roo Code (archived May 15 2026). Zero-markup BYOK across 500+ models; optional Kilo Pass prepaid credits with a +50% annual bonus.
-
Goose 💎 (Block / Linux Foundation) — free OSS agent that can ride your existing Claude Max / ChatGPT / Copilot subscription via SDK providers for flat-rate inference — the same BYO-subscription bridge pattern as
copilot-api/claude-code-router. - Zencoder — SOC2 enterprise agent, multi-agent orchestration, "all features in every tier"; Pro $45/seat (30k credits) → Pro Max $195 (180k).
- Tabby 💎 — leading open-source self-hostable completion/chat server (free, ~$5–15/mo GPU); Cloud Team $24/seat; new Pochi autonomous agent. OpenAI-compat endpoint usable from any harness.
- Osaurus 💎 — native macOS harness that mixes local models (Gemma 4, Qwen3.6, Llama, DeepSeek V4) and cloud models (OpenAI, Anthropic, Gemini, xAI) in one sandboxed app, with 20+ native plugins. Open-source; ~112K downloads since its May 15 launch.
- AWS Kiro — Amazon's spec-first agentic IDE (VS Code base), GA'd May 7 2026 as the ground-up successor to Amazon Q Developer and spotlighted at AWS Summit New York (Jun 17). Preview was free; GA pricing not yet confirmed.
🔀 More routers & gateways
- Portkey 💎 — the most production-grade router missing from most lists: built-in guardrails, virtual keys, budget caps (marketed for capping runaway agentic spend), OpenAI and Anthropic compat, fully open-source self-hostable gateway. Free 10K logs/mo; Pro from $49.
- Cloudflare AI Gateway 💎 — near-zero-cost universal proxy (caching/analytics/fallback, no token markup); free 100K logs/mo. June-2026 xAI Grok partnership + Unified Billing make it a one-invoice control plane. Anthropic passthrough works for Claude Code.
- Poe API 💎 (Quora) — a consumer chat sub whose compute points double as a multi-provider coding API: one $19.99/mo plan spans Claude + GPT-5.x + Gemini, often 10–30% under direct. OpenAI- and Anthropic-compatible.
- Glama 💎 — OpenAI-compat gateway plus the largest MCP-server registry/host — uniquely relevant when MCP tool servers matter as much as model access. Credit-bundled subscription.
- Unify 💎 — a quality-predictive "Neural Router" that scores expected output quality before the call and hits cost/latency targets; $100 free credits; BYOK via virtual keys.
- Martian — dedicated per-request cost/quality router with max-cost and willingness-to-pay knobs (claims 20–97% savings); Free 2,500 req, Developer $20/mo.
- Braintrust Gateway 💎 — couples routing with eval + tracing + caching; OpenAI/Anthropic compat; generous free beta.
- APIpie 💎 — a meta-router (aggregates OpenRouter/EdenAI/DeepInfra) with one key, 148 coding models, plus bundled web search + chat memory.
- AIMLAPI — 500+ models, OpenAI + Anthropic compat, up to ~80% under direct. Eden AI — BYOK-friendly, ~5.5% platform fee, free sandbox. TrueFoundry (from $499/mo) and Kong AI Gateway (OSS free / Konnect cloud) — the self-hostable, on-prem-governance enterprise options.
- Atlas Cloud Coding Plan 💎 — unified LLM gateway with a flat coding sub: Starter $10/mo (800K credits/day), Lite $20/mo (1.8M/day), spanning DeepSeek V4 / Kimi K2 / GLM-5 / MiniMax M2 / Qwen3. Works with Claude Code, Codex, OpenCode, and any OpenAI-format tool; models priced up to ~50% under official API.
$0 access you can run a real agent loop on, ranked by what the community reports works (June 2026):
- Cerebras free 💎 — 1M tokens/day, no card, fastest (2000+ tok/s), Qwen3-Coder-480B + GPT-OSS-120B.
⚠️ 8K context cap kills whole-repo work. ⭐5 - Google AI Studio — biggest free context (Flash up to 1M) + Gemma 3 27B at 14,400 RPD.
⚠️ Gemini 2.5 Pro no longer free (~April 2026); limits slashed Dec 2025; free data used for training. ⭐4 - OpenRouter :free — best free coding model (Qwen3-Coder-480B) + DeepSeek/Llama/GLM, one key. Spend the one-time $10 → 1000 RPD forever (50 RPD otherwise). ⭐4
- Groq free 💎 — fastest small-prompt loops;
⚠️ 6,000 TPM cap = many small steps, not big context. ⭐4 - NVIDIA NIM 💎 — 1,000–5,000 credits, no card/no expiry, 40 RPM, frontier open models (MiniMax M2.x, Qwen3-Coder-480B, GLM-5, Kimi K2.5). Eval tier (credit-capped). ⭐4
- Mistral Experiment — 1B tokens/month (!), ~1 req/sec + training opt-in.
- Prototyping-only: GitHub Models (50 RPD), Cloudflare Workers AI, Together ($1 default).
-
Durable free strategy: route 60–80% of agent traffic to free Qwen3-Coder/GPT-OSS/DeepSeek (Cerebras + OpenRouter+$10 + NVIDIA NIM), then escalate the hard 20% to a paid frontier model.
⚠️ Free quotas tightened hard through 2025–2026, so assume any of them can shrink without notice.
Often the cheapest "plan" is one you qualify for. Students, OSS maintainers, and funded startups can get months-to-years of frontier access for $0 — credits that fund Claude Code, Codex, or any agent via the underlying API.
Students get the biggest free pool of all — it has its own deep section: Student & education plans 🎓 (the full table, verification mechanics, gotchas, and a $0 stack if you can't verify).
- OpenAI Codex for Open Source 💎 — 6 months of ChatGPT Pro + Codex free (~$1,200 value) + API credits, from a $1M fund. No minimum star count; open even to maintainers using OpenCode/Cline.
- GitHub Copilot Pro — free for OSS — maintainers of popular repos qualify for free Copilot Pro.
- JetBrains free for OSS — All Products Pack for established projects (renewable).
- Anthropic — Claude for Startups — $25K–$100K+ in Claude API credits (12 mo); funds Claude Code at API rates.
- Google for Startups — AI tier — up to $350K GCP/Vertex credits over 2 years; Vertex carries both Gemini and Claude.
- AWS Activate — up to $200K; now redeemable against Bedrock Claude, so it subsidizes Claude-Code-on-Bedrock.
- Microsoft for Startups Founders Hub — up to $150K Azure credits, with a no-VC entry tier (bootstrapped/solo welcome); GPT-5.x via Azure OpenAI.
- AWS Kiro Pro+ for Startups — a full free year of Kiro Pro+ (apply window reopened Apr 7 – Jun 30, 2026; excludes current Activate members).
- NVIDIA Inception — any stage, no deadline: GPU discounts, DGX Cloud time, up to $100K partner-cloud credits.
- Baseten AI Startup Program 💎 — up to $25K to self-host an open-weight coding model on dedicated inference.
- ModelScope — 2,000 free calls/day (Qwen3-Coder-480B), no card.
- NVIDIA Build — up to 5,000 free credits, 100+ models, OpenAI-compat.
- Cloudflare Workers AI — forever-free 10,000 Neurons/day of open-weight inference (Cloudflare-hosted models only).
- Plus the Free tiers section: Cerebras (1M tok/day), Google AI Studio, OpenRouter
:free, Groq.
Most startup credits require an application and (often) institutional funding. Read eligibility before counting on them — and remember credits expire (typically 12–24 months).
Students can unlock thousands of dollars of frontier coding access for free — but the map shifted hard in 2026 (GitHub paused signups, Google's free year ended, Cursor narrowed to North America). Here's the real, current state as of June 7, 2026 — what actually works, what's a trap, and what to do if you can't verify at all.
| Vendor | Offer | Value | Eligibility | Verify | Catch |
|---|---|---|---|---|---|
| GitHub Copilot Student 🆓 | Free Copilot Student plan: unlimited completions + 200 AI Credits/mo (source) | ~$120/yr vs Pro ($10/mo) | Enrolled 13+, degree/diploma program; re-checked monthly | GitHub Education (school email or dated enrollment proof) (source) | |
| Cursor 💎 | 1 free year of Cursor Pro ($20/mo of usage, frontier models, agent) (source) | ~$240 | University student, individual account, .edu email only | SheerID via dashboard; one-time per email (source) | |
| JetBrains Student Pack 🆓 | Free All Products Pack — every IDE (IntelliJ Ultimate, PyCharm, etc.) + .NET tools (source) | ~$289/yr | Accredited institution; program >1 year | School email, ISIC card, or GitHub Student Pack (auto-grant) | |
|
JetBrains AI (for students) |
AI Free ($0) + one 30-day AI Trial (Junie agent + cloud AI) (source) | ~$10/mo value for 30 days, then ~$0 | Any JetBrains edu license holder | Auto — click AI icon in IDE v2025.1+ | |
| Google AI Pro (Gemini) ❌ | CLOSED to new signups. Was free 12–15 months (Gemini Pro, NotebookLM Plus, 2TB→5TB, Antigravity, Jules) (source) | Was ~$240–$300; $0 now | N/A — ended Mar 11, 2026 globally (US final ~Apr 30) | Was SheerID | |
|
OpenAI / ChatGPT |
Codex $100 credits (2,500 credits) for students — agentic coding (source) | $100 in Codex usage | US/Canada uni students, resident in US/CA | SheerID on ChatGPT account | |
| OpenAI ChatGPT Edu 🆓 | Institution-provisioned ChatGPT (Codex included) at $0 to you (source) | $0 if your school has it | Only at contracting universities | School SSO — no personal apply | |
| Anthropic Claude for Education 🆓 | Campus-wide Pro-tier Claude (Opus/Sonnet, Projects, sometimes Claude Code) at $0 (source) | ~$240/yr value — if your school is a partner | Enrolled at partner uni (Northeastern, LSE, Syracuse, Columbia, etc.); sign in with institutional email | No self-serve — auto-granted when your .edu is recognized | |
| Anthropic Student Builders 🆓 | ~$50 in Claude API credits for a coding/research project (source) | ~$50 (+$5 default) | Any student, .edu email, academic project (no paid work) | Anthropic Console application (~5–7 days) | /for-student-builders URL now redirects — apply via Console |
| Anthropic Pro/Max direct ❌ | NONE. No individual student discount on Claude Pro/Max (source) | $0 student savings | N/A | N/A | |
| Mistral (Le Chat / Vibe) 💎 | Education plan $5.99/mo (vs $14.99 Pro) — incl. all-day CLI/IDE coding + Devstral agent (source) | ~60% off, ~$108 saved | Accredited higher-ed, worldwide; new accounts only | Institutional email auto-check (no SheerID) — manual fallback | |
| Perplexity 💎 | Education Pro $10/mo (50% off) + 1 free month; referrals stack to 24 months free (source) | Up to ~$480 via referrals | Students at SheerID-supported schools | SheerID | |
|
Replit |
Students: 50% off Core = $10/mo for first 6 months only. Educators: free + student credits (source) | Student ~$90; Educator ~$240/yr | Student: .edu email. Educator: verified instructor | .edu at checkout / educator application | |
|
Windsurf (→ Devin) |
Legacy free Pro for students — status uncertain. Brand folding into Devin/Cognition; student URLs redirect, Devin pricing shows no student tier (source) | "Possibly $0 if honored, else none" | Legacy: .edu, accredited | Legacy SheerID in-editor | |
| Tabnine ❌ | No student offer, no free tier. Paid only ($39–$59/mo) (source) | None | N/A | N/A | |
| Phind ❌ | Shut down Jan 16, 2026. Defunct (source) | None | N/A | N/A |
Quick read: GitHub Copilot, JetBrains, and Mistral are the most globally accessible (document/email-based). Cursor is the best single freebie (~$240) but North-America-first. Claude and OpenAI student access is institution-gated or needs a paid plan underneath.
Almost every offer above runs through one of four gatekeepers — learn these and you stop getting rejected:
- SheerID (Cursor, Perplexity, OpenAI Codex, Google's old offer, Windsurf legacy). Two stages: an instant check (name + school-from-dropdown + DOB + academic email matched against enrollment databases), and a document upload fallback if that fails. Critical gotcha: SheerID reads the enrollment date on the document, not the print date — an acceptance/admission letter (future term) is rejected; a transcript showing a past term is rejected. You need a current-term class schedule, tuition receipt, transcript-in-progress, or dated student ID, with your name + school name + a current-term date all visible in one image. Three failed attempts locks you into slow manual support (source).
- GitHub Student Developer Pack — the highest-leverage single verification: one approval cascades into free Copilot Student, JetBrains, and 100+ partner tools. School email or dated proof; approval ~5 days. Status lasts up to 2 years, then re-verify (the renew button unlocks only after expiry) (source).
- .edu / institutional email — the universal fast-path. When recognized, it turns a multi-day review into a few-second auto-approval. Generic Gmail never works. Cursor requires your account email and verification email to be identical. JetBrains/GitHub use the open-source
swotdomain list — you can submit your school's domain there if it's unrecognized. - ISIC card (~€4–25) and UNiDAYS — backups. Get an ISIC only if SheerID/GitHub won't recognize your school and you lack a usable institutional email; JetBrains accepts it directly.
Tips to get approved first try: use your institutional email and make your vendor account email identical to it; pick your school from the dropdown (don't free-type); enter name/DOB exactly as school records show; upload a clear, uncropped, un-edited image (manipulated-looking files are auto-rejected); one email = one offer.
Bootcamp / online / high-school caveats: SheerID and GitHub generally want degree- or diploma-granting accredited institutions. Bootcamps qualify for the GitHub Pack only if their school joined the GitHub Campus Program. JetBrains explicitly requires a program longer than one year, which excludes most short bootcamps. Cursor rejects high-school domains and non-.edu academic domains entirely.
No .edu? Wrong region? No credit card? You can still run a genuinely capable agentic-coding setup for $0 — by pointing a free harness at free model endpoints.
The backbone — free model providers (all OpenAI-compatible, no card to start):
- 🆓 OpenRouter — one API key, ~27 free models (Qwen3-Coder w/ 1M ctx, GLM-4.5-Air, gpt-oss-120b, Kimi K2.6). Free cap is 50 requests/day; a one-time $10 top-up permanently raises it to 1000/day (you keep the cap even after spending the $10). Global. (source)
- 🆓 ModelScope (Alibaba) — the volume king: 2,000 calls/day across 900+ models incl. Qwen3-Coder-480B. Catch: you must bind an Alibaba Cloud (Aliyun) account, and latency is China-CDN-optimized. (source)
- 🆓 Google AI Studio — Gemini 2.5 Flash free at ~1,500 RPD / up to 1M TPM — great free primary for large-context reads. Pro is capped to ~50/day. Free-tier prompts may train Google's models — never send secrets. (source)
- 🆓 Cerebras (fast: gpt-oss-120b, GLM-4.7, but ~5 RPM) and Groq (fast small models, but tiny 6–12K TPM) — use as speed/fallback, not primary. (Cerebras · Groq)
- 🆓 NVIDIA Build / NIM (~40 RPM, big models) and Cloudflare Workers AI (10k Neurons/day — best for free embeddings / codebase RAG). (NVIDIA · Cloudflare)
The harness (free, open-source):
- Claude Code + Claude-Code-Router (ccr) — get the Claude Code workflow at $0 by routing it through the free endpoints above. The centerpiece build.
- OpenCode — natively speaks 75+ providers (no router needed), lowest token consumption in agentic benchmarks. Best for cleanly mixing free providers.
- Cline / Roo (VS Code) and Aider (CLI, git-aware) — paste any free key and go.
- SoulForge (CLI) — edits AST symbols, not strings (LSP + persistent code-graph, 21 providers, MCP, headless CI); claims ~50% fewer tokens via structure-awareness. Free/OSS, BYO keys — pairs with any gateway above. Novel but niche.
Recommended $0 build: ModelScope Qwen3-Coder-480B (volume) as primary → OpenRouter GLM-4.5-Air / NVIDIA (fallback) → Cerebras/Groq (speed bursts) → Cloudflare for embeddings, all driven by OpenCode or Claude Code via ccr. For tool-calling reliability, prefer agent-tuned models (GLM-Air, Qwen3-Coder, gpt-oss-120b). The 50/day free tiers are enough to learn; ModelScope's 2000/day makes it a daily driver. Never send proprietary/secret code to
:freemodel variants — they may log or train on prompts.
- Signup pauses are real. GitHub Copilot paused all new Pro/Pro+/Max and Student signups on Apr 20, 2026 (agentic-compute costs). As of the June 1 changelog it's still paused — newly-verified students in June 2026 get the Pack but land on Copilot Free. Students who activated before Apr 20 keep access (source).
- Auto-renew traps. Cursor renews at $20/mo after the free year; Replit reverts to full Core after 6 months; Google's old offer auto-converts to $19.99/mo. Set a calendar reminder the day you activate.
- US-only / region-locked offers. Cursor is officially "North America" and dropped India from its dropdown; OpenAI's Codex $100 is US/Canada residents only; Claude for Education is partner-school-gated (heavily US/UK). Email/document-based offers (GitHub, JetBrains, Mistral) are far more reliable in India, SEA, LatAm, and Africa.
- Model downgrades on student tiers. Since Mar 12, 2026, GitHub Copilot Student can no longer self-select Claude Opus/Sonnet or GPT-5.x-Codex — you only reach them indirectly via Auto mode (Haiku is the default). The headline value is now the unlimited completions, not premium-model chat.
- "Free" often means "discount" or "credits." Mistral/Perplexity/Replit/Windsurf are discounts; OpenAI Codex and Anthropic Student Builders are credit grants (and Codex likely needs a paid Plus plan underneath to spend them). JetBrains' free pack covers the IDEs, not ongoing AI.
- Expiry & re-verification. GitHub re-verifies up to ~2 years; JetBrains and most SheerID offers are annual; Copilot Student re-checks monthly. A graduating .edu email that dies can silently break renewals — keep proof of enrollment current.
- Dead/defunct, ignore the SEO spam. Google's free year (ended Mar 11, 2026), the OpenAI free-Plus promo (ended May 2025), Tabnine's student deal, and Phind (shut down Jan 16, 2026) are all gone — many affiliate blogs still advertise them. There is no official individual Claude Pro/Max student discount; treat any "student code" for it as fake.
- Payment friction. No-card paths exist (GitHub, JetBrains, Mistral, Perplexity, Google's student rate) — best if you have no card. Watch for a temporary ~₹2 auth charge in India that refunds in 24–48h. Minimum age is usually 16 (18 in India).
-
xAI Grok Code Fast 1 (API) 💎 —
$0.20/$1.50/M($0.02 cached), 256K ctx, OpenAI + Anthropic compat. #1 by usage on OpenRouter. Fast and cheap enough for routine implementation work. $25 free signup credits; up to $175/mo via data-sharing.⚠️ Over-edits without tight scope, so escalate hard reasoning elsewhere. ⭐5 -
Mistral Le Chat Pro / Vibe 💎🆓🇪🇺 —
$14.99/mo($5.99 student). Cheapest major coding sub, includes the Vibe CLI terminal agent (Devstral 2). Free tier has real (limited) coding. ⭐4 -
Mistral Codestral / Devstral 2 (API) 🇪🇺 — Codestral
$0.30/$0.90(32K) with a free FIM endpoint (Continue.dev's go-to autocomplete); Devstral 2$0.40/$2.00, Devstral Small free. EU sovereignty. ⭐4 -
Inception Mercury 💎 — diffusion dLLM,
$0.25/$0.75–1/M, 128K, 5–10× faster than Haiku/GPT-4o-mini, #1 speed on Copilot Arena small-model tier. Latency-sensitive autocomplete buy, not a frontier reasoner. ⭐4 -
Morph Fast Apply 💎 — the "apply" layer: ~10,500 tok/s, ~98% merge accuracy, cuts token cost 50–60% / latency 90%+. Free 200 req/mo, $20 starter. MCP tool works in Claude Code & Cursor.
⚠️ openly transitional category ("Fast Apply Models are Already Dead"). ⭐4 - Relace 💎 — Morph peer with 256K apply context + bundled Search/Rank/Embed retrieval stack. Builder/infra buy. ⭐4
-
Cohere Command A —
$2.50/$10— the lane's weakest coding value (enterprise RAG/multilingual play, not an agentic-coding pick).
A different category from the plans above: here you pay for agent compute, not raw model access. Prompt-to-app builders generate (and often host) whole apps; autonomous "AI software engineers" take a ticket and open a PR. None of these is a backend you point Claude Code at — they're the product. Useful to know so you don't overpay for a metered builder when a $20 sub + a free harness would do.
- Devin (Cognition) — Core $20/mo (+ ~$2.25/ACU pay-as-you-go), Max $200/mo, Teams $80/mo + $40/seat. Fully-autonomous async agent with its own VM, browser and editor; runs its in-house SWE-1.6 model plus frontier models. Billed in ACUs (~15 min of work each). Devin 2.0 dropped the entry from $500 → $20. After absorbing Windsurf (June 2026) the IDE relaunched as Devin Desktop. native-only + API.
- Cosine Genie 💎 — Free (80 tasks) · Hobby $20/seat (5M credits) · Professional $200/seat (60M credits). Runs its own trained model (Genie 2.1), not a frontier wrapper; ingests a Jira ticket and opens a PR. Topped SWE-bench Verified. Generous free trial for an autonomous agent.
- Qodo (ex-CodiumAI) — Free (250 credits + 30 PR reviews/mo) · Teams $30/user (2,500 credits + unlimited PR review). Test-generation + autonomous PR-review bot (Qodo Merge) for GitHub/GitLab/Bitbucket — a category nothing else here covers as a flat sub.
- Replit — Core $20/mo ($25 usage credits, ≤5 collaborators) · Pro $100/mo (≤15 builders, credit rollover). Cloud IDE + Agent 4 (Claude Opus 4.7); credits cover AI and compute and deploy/hosting. Effort-metered — heavy users report $100–300/mo. native-only.
- Lovable 💎 — Free · Pro $25/mo · Business $50/mo. Prompt-to-fullstack (React + Supabase: auth, DB, hosting). Pro credits are shared across unlimited users (cheap for small teams); ~50% student discount; credit rollover. EU-built.
- Bolt.new (StackBlitz) — Free (1M tok/mo) · Pro $25/mo (10M tok, rollover) · Teams $30/seat. Runs the entire toolchain in-browser via WebContainers; Claude backend; deploy to Netlify. Token-metered.
- v0 (Vercel) — Free ($5 credits) · Premium $20/mo · Team $30/seat · Business $100/seat. The React + Tailwind + shadcn/ui UI specialist; explicit per-model menu (v0 Mini/Pro/Max). Tight Vercel-deploy coupling; has a models API.
- Emergent 💎 — Free · Standard $20/mo · Pro $200/mo. Multi-agent "engineer in a box" that ships backend, auth, DB, storage and Stripe (and mobile apps), not just frontend. Pro adds 1M context + custom agents.
- Tempo 💎 — Free · Pro $30/mo · Agent+ $4,500/mo (human-in-the-loop). Plan-before-code: generates flow diagrams + architecture before writing. React-first.
- Create.xyz / Anything 💎 — Free · Pro $19/mo annual. English-to-app; credits cover both build-time and your live app's runtime AI calls. Neon/Postgres backend.
-
Firebase Studio — Free preview · $24.99/mo (Google Developer Program, +$500/yr GCP credits). Gemini-powered cloud full-stack builder.
⚠️ Being wound down — migrate to Antigravity before 2027.
- Google Antigravity — Free preview · Pro $20/mo · Ultra $249.99/mo. Agent-first IDE + CLI that ships Gemini 3.x + Claude Sonnet/Opus 4.6 + gpt-oss-120b in one surface. The successor to Gemini CLI / Code Assist (both stop serving consumer requests June 18, 2026). Free tier trimmed to ~20 agent req/day.
- Google Jules 💎 — Free (15 tasks/day) · bundled into Google AI Pro $19.99 (~75–100 tasks/day) / Ultra $124.99. Async GitHub-PR agent (Gemini): clones your repo in a cloud VM and opens PRs while you work. No standalone sub — it stacks onto the same Google plan as Antigravity.
- Warp 💎 — Free (75 credits/mo) · Build $20/mo (1,500 credits + BYOK on all tiers) · Business $50/seat (mandatory ZDR). The terminal as an agent platform; can orchestrate Claude Code/Codex. Cloud-agent metering starts July 1, 2026.
- Qoder 💎 (Alibaba, ex-Tongyi Lingma) — Free · Pro $20/mo · Pro+ $60/mo. Alibaba's standalone Cursor-class agentic IDE; routes Qwen3-Coder + Claude via credits. The first-party IDE route into the Qwen ecosystem.
- Amazon Q Developer → Kiro — Q Developer Pro ($19/seat, Claude via Bedrock) is being retired (new signups closed May 15 2026); AWS funnels users to Kiro (Pro $20/1k credits · Pro+ $40 · Power $200), the spec-driven agent. A rare case of a hyperscaler killing one coding sub and replacing it with another.
Hidden gems & reseller proxies ⚠️
Squeezing frontier-ish coding out of <$10–30/mo. Genuine bargains exist, but the reseller-proxy corner is risky and rising.
Genuine bargains the community recommends: Chutes ($3/$10 for huge open-weight variety, decentralized) ✅ · OpenCode Go ($10 flat) · Synthetic ($20–30, reliable+private+CC-native) · Z.ai GLM (first-party). Best neutral diaries: patshead.com + InfoWorld's "vibe code for free."
-
Chutes 💎
⚠️ ✅ — Base$3(300 req/day) · Plus$10(2,000/day) · Pro$20(5,000/day). GLM-5/Kimi/DeepSeek/MiniMax/Qwen, OpenAI-compat, TEE privacy.⚠️ Decentralized (Bittensor) = variable latency/quality between nodes, no SLA, quantization drift, frontier models gated to $10+. Treat as hobby/non-critical, keep a fallback. ⭐5 -
NanoGPT 💎 — true pay-per-prompt ($0.10 min, crypto-friendly), proprietary + open models.
⚠️ tool-call failures reported in coding agents (OpenCode). Better as chat/API than a hardcore coding backend. ⭐3 -
AgentRouter
⚠️ — ~$200 free credits, routes Claude/GPT-5/DeepSeek/Zhipu, works as a Claude Code backend. A real free-credit on-ramp, but a non-profit with opaque long-term policy. Trials only, not proprietary code. ⭐3 -
DevPass (LLMGateway.io) 💎
⚠️ — flat-rate gateway:$29→$87 ·$79→$237 ·$179→$537/mo of usage (~3× value). 200+ models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, GLM-4.7/Qwen3/Kimi K2.6), OpenAI and Anthropic-compat → Claude Code / OpenCode / SoulForge backend; metered in $, no hard request caps (premium models carry a weekly fair-use cap $10–140).⚠️ A 3× discount on frontier access = it is reselling first-party models — same ToS/ban exposure and relay-death risk as the proxies below; deposit only what you'll spend. ⭐3
Relays like PackyCode, YesCode, AnyRouter, EasyClaude, IKunCode, Cubence reverse-proxy official Claude Max/Pro accounts (ToS violation) or aggregate keys. Hard data: Anthropic's 2025–2026 crackdown forced simultaneous price hikes across these, and >60% of 2025 reverse-engineering relays died within 3 months. AnyRouter is Scamadviser-flagged. The universal community rule: only deposit what you need, never large sums — balances evaporate when a relay dies, and Anthropic also bans the underlying-account users. Aggregator-routers (CometAPI, ElectronHub) are the safer middle (legitimately metered) but you still trust a middleman with your prompts.
Most "open-weight" labs now ship an Anthropic-compatible endpoint, so you can keep Claude Code (or any Anthropic-SDK tool) and just repoint the base URL. Below are copy-paste configs that worked as of June 2026. Verify model names against each provider's docs — they rev fast.
Tip
ANTHROPIC_AUTH_TOKEN (not ANTHROPIC_API_KEY) is the variable Claude Code reads for third-party keys. If both are set, AUTH_TOKEN wins. Bump API_TIMEOUT_MS — open models can be slower to first token.
These five expose a native /anthropic route, so no proxy needed. Pick one, drop it into ~/.claude/settings.json:
| Provider | ANTHROPIC_BASE_URL |
Default model var | Source |
|---|---|---|---|
| Z.ai (GLM) 💎 | https://api.z.ai/api/anthropic |
GLM-5.1 |
docs |
| Moonshot (Kimi) | https://api.moonshot.ai/anthropic |
kimi-k2.6 |
docs |
| DeepSeek | https://api.deepseek.com/anthropic |
deepseek-v4-pro |
docs |
| MiniMax | https://api.minimax.io/anthropic |
MiniMax-M2.7 |
docs |
| Qwen (DashScope-intl) | https://dashscope-intl.aliyuncs.com/apps/anthropic |
qwen3.5-plus |
docs |
~/.claude/settings.json (example: GLM):
{
"env": {
"ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
"ANTHROPIC_AUTH_TOKEN": "sk-your-zai-key",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "GLM-5.1",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "GLM-5.1",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "GLM-4.5-Air",
"API_TIMEOUT_MS": "3000000"
}
}Prefer not to touch the file? Export env vars per-shell instead (handy for a throwaway cc-glm alias):
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-deepseek-key"
export ANTHROPIC_MODEL="deepseek-v4-pro" # claude-opus-* → v4-pro
export ANTHROPIC_SMALL_FAST_MODEL="deepseek-v4-flash" # haiku/sonnet → v4-flash
claudeWarning
Gotchas worth knowing. Moonshot's Anthropic shim scales temperature (real = requested × 0.6) (docs). MiniMax M2.x ignores thinking: disabled — reasoning always runs (docs). The CC status line may still say "Sonnet" while a GLM/Qwen model answers — the mapping is silent.
When you want one model per job type (cheap background, big-context, vision), use claude-code-router as a local proxy:
npm i -g @musistudio/claude-code-router
ccr code # launches Claude Code pointed at the local router~/.claude-code-router/config.json — default work on DeepSeek, long context on Qwen, background grunt on Kimi:
{
"Providers": [
{
"name": "deepseek",
"api_base_url": "https://api.deepseek.com/chat/completions",
"api_key": "sk-deepseek-key",
"models": ["deepseek-v4-pro", "deepseek-v4-flash"]
},
{
"name": "dashscope",
"api_base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
"api_key": "sk-dashscope-key",
"models": ["qwen3-coder-plus"]
},
{
"name": "moonshot",
"api_base_url": "https://api.moonshot.ai/v1/chat/completions",
"api_key": "sk-moonshot-key",
"models": ["kimi-k2.6"]
}
],
"Router": {
"default": "deepseek,deepseek-v4-pro",
"background": "moonshot,kimi-k2.6",
"think": "deepseek,deepseek-v4-pro",
"longContext": "dashscope,qwen3-coder-plus",
"longContextThreshold": 60000
}
}Switch models live from inside Claude Code with /model deepseek,deepseek-v4-flash. The longContextThreshold (default 60k tokens) auto-routes oversized prompts to the longContext model (docs).
These extensions speak OpenAI Chat Completions, so use each provider's /v1 route, not /anthropic. In the extension settings pick API Provider → OpenAI Compatible and fill:
| Field | Value (example: DeepSeek) |
|---|---|
| Base URL | https://api.deepseek.com/v1 |
| API Key | sk-deepseek-key |
| Model ID | deepseek-v4-pro |
Other base URLs: GLM https://api.z.ai/api/paas/v4, Kimi https://api.moonshot.ai/v1, MiniMax https://api.minimax.io/v1, Qwen https://dashscope-intl.aliyuncs.com/compatible-mode/v1. Cline/Roo/Kilo share the same config shape; set a separate cheaper model in the extension's "Fast"/background slot if it exposes one.
Aider routes through LiteLLM, so any OpenAI-compatible endpoint works via --openai-api-base:
export OPENAI_API_KEY="sk-deepseek-key"
export OPENAI_API_BASE="https://api.deepseek.com/v1"
aider --model openai/deepseek-v4-proDeepSeek is built in, so you can skip the env dance entirely:
export DEEPSEEK_API_KEY="sk-deepseek-key"
aider --model deepseek/deepseek-v4-proSave it in ~/.aider.conf.yml so every project inherits it:
model: deepseek/deepseek-v4-pro
weak-model: deepseek/deepseek-v4-flash # commit msgs, summaries → cheaperOpenCode takes any OpenAI-compatible provider via opencode.json. Define several, then Tab//models to swap mid-session:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"deepseek": {
"npm": "@ai-sdk/openai-compatible",
"options": { "baseURL": "https://api.deepseek.com/v1" },
"models": { "deepseek-v4-pro": {}, "deepseek-v4-flash": {} }
},
"zai": {
"npm": "@ai-sdk/openai-compatible",
"options": { "baseURL": "https://api.z.ai/api/paas/v4" },
"models": { "GLM-5.1": {} }
},
"moonshot": {
"npm": "@ai-sdk/openai-compatible",
"options": { "baseURL": "https://api.moonshot.ai/v1" },
"models": { "kimi-k2.6": {} }
}
},
"model": "deepseek/deepseek-v4-pro",
"small_model": "deepseek/deepseek-v4-flash"
}Keys go in env (DEEPSEEK_API_KEY, ZAI_API_KEY, MOONSHOT_API_KEY) or opencode auth login. small_model handles titles/summaries so the cheap tier soaks up the chatter.
Sanity check any of these with a one-liner before trusting the routing:
curl -s $ANTHROPIC_BASE_URL/v1/messages \
-H "x-api-key: $ANTHROPIC_AUTH_TOKEN" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"'$ANTHROPIC_MODEL'","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}'A clean JSON reply means your plan is wired in. A 401 means wrong key var; a 404 means you used the OpenAI /v1 path where an /anthropic one was needed (or vice-versa).
Where your prompts physically land, who can read them, and whether they feed a training set. Default behavior matters more than the marketing page — most providers offer Zero-Data-Retention (ZDR) only on request, and "we don't train on you" often hides a 7–30 day abuse-monitoring window. Verified June 2026; always confirm against the provider's current DPA before shipping regulated code.
| Provider / plan | Hosting region | Trains on your data? | ZDR available? | Compliance | Sensitive code? |
|---|---|---|---|---|---|
| Anthropic (API / Claude Code, commercial) | US (+ EU/Vertex/Bedrock options) | No — never on API/commercial (src) | ✅ Enterprise ZDR by agreement; else 7-day delete (30 opt-in) (src) | SOC 2 Type II, ISO 27001, HIPAA (BAA) | ✅ Best-in-class — consumer plans now opt-out of training (src), so use API/Work tiers |
| OpenAI (API / Platform) | US (EU/JP/global data-residency for biz) (src) | No on API by default (since 2023) (src) | ✅ Enterprise ZDR per endpoint, not self-serve; else ≤30-day (src) | SOC 2 Type II, ISO 27001/27017/27018/27701, CSA STAR | ✅ Strong — note NYT litigation hold has tested "deletion" claims (src) |
| Google (Gemini API / Vertex) | US + EU + global (Vertex region pinning) | No on paid API / Vertex; free AI Studio tier can be used | ✅ Vertex enterprise controls + region lock | SOC 2/3, ISO 27001 family, HIPAA, FedRAMP | ✅ via Vertex (region-pinned); 🚫 avoid free AI Studio for secrets |
| Cursor (Privacy Mode) | US (routes to OpenAI/Anthropic/Google/xAI under ZDR contracts) | No when Privacy Mode on (src) | ✅ ZDR with all model providers; on by default for Teams/Enterprise (src) | SOC 2 Type II | ✅ if Privacy Mode confirmed; |
| GitHub Copilot (Business/Enterprise) | US + EU data residency (GA 2026; JP/AU roadmap) (src) | No — Business/Enterprise excluded from training | SOC 2 Type II, ISO 27001, FedRAMP (select models) (src) | ✅ Enterprise + residency enabled | |
| 🆓 GLM / Zhipu (Z.ai) | 🇨🇳 China DCs (intl endpoint exists) (src) | Policy: not without consent — verify per contract | Limited public attestations; not GDPR-ready without DPA | ||
| Kimi / Moonshot | 🇸🇬 Singapore servers (src) | Ambiguous — ToS "improve the services" reads as training-permissive (src) | ❌ No public ZDR tier | Minimal public attestations | 🚫 Not for sensitive code without a signed carve-out |
| DeepSeek | 🇨🇳 China (data collected & stored in PRC) (src) | Yes by default — ToS permits training on submissions (src) | ❌ None on first-party API | None relevant; subject to PRC security law | 🚫 Worst choice for IP — run the open weights locally instead |
| MiniMax | 🇨🇳 Mainland China (entity in 🇸🇬) (src) | Claims GDPR/regional compliance; scope unclear | ❌ No public ZDR tier | Self-asserted GDPR alignment, no major attestation | 🚫 PRC jurisdiction — avoid for sensitive code |
| Qwen (Alibaba Model Studio) | 🇸🇬 Singapore (intl) / 🇨🇳 Beijing (CN) — keys not interchangeable (src) | No — Alibaba Cloud states it won't train on your data | Alibaba Cloud SOC/ISO (cloud-level) | ||
| 💎 Synthetic | US (routes to open-weight model hosts) | No first-party training claim — verify downstream hosts | Limited public attestations | ||
| OpenRouter | Pass-through (provider-dependent) | Only if you enable prompt logging; off by default (src) | ✅ One-click "ZDR-only" routing filter (src) | Inherits downstream provider posture | ✅ if you lock to ZDR endpoints — otherwise risk = whatever it routed to |
| Vercel AI Gateway | US/global (pass-through to chosen models) | No first-party training; inherits provider | SOC 2 Type II (Vercel platform) | ||
| Groq | US (GCP buckets, US) (src) | No — contractually barred from training on I/O | ✅ Self-serve ZDR toggle in Data Controls | SOC 2 Type II | ✅ Strong US-only story; speed + privacy |
| Cerebras | US data centers only (src) | No — I/O discarded after response | ✅ ZDR effectively default (in-memory, no retention) | SOC 2 (see policies page) | ✅ Good for US-resident sensitive workloads |
- "No ZDR available" + Chinese hosting (DeepSeek, MiniMax, Kimi, GLM) = treat as public. If you love the models, run the open weights on your own hardware — that sidesteps the jurisdiction and retention questions entirely.
- Aggregators (OpenRouter, Vercel, Synthetic) are only as private as the endpoint they forward to. OpenRouter's ZDR-only filter is the cleanest guardrail; without it you inherit the weakest downstream provider.
- "Doesn't train" ≠ "doesn't store." Default abuse-monitoring windows (7–30 days at OpenAI/Anthropic) still mean your prompts sit on a disk somewhere unless you hold a ZDR agreement.
- For regulated/IP-sensitive code, the safe tier is: Anthropic/OpenAI/Google enterprise with signed ZDR + region pin, GitHub Copilot Enterprise with data residency, Cursor with Privacy Mode verified, or US-only inference (Groq/Cerebras).
- Default vs. configured is the whole game — Copilot residency and OpenRouter ZDR are off until you opt in; Cursor Privacy Mode and Anthropic consumer training flipped toward privacy but only on the right tier.
Compliance badges reflect provider self-attestation; request the current SOC 2 report and DPA before relying on any cell. China-hosted providers are subject to PRC data and national-security law regardless of stated policy.
SWE-bench Verified (mostly vendor-reported; treat as directional — contamination concerns exist, SWE-bench Pro is the cleaner successor):
| Tier | Model | SWE-bench Verified | ~Cost behind it |
|---|---|---|---|
| Frontier | Claude Opus 4.8 | 88.6% | Max $100–200/mo |
| Frontier | GPT-5.3-Codex | 85% | ChatGPT Pro $100–200 |
| Frontier | GPT-5.2 | 80% | — |
| Value 💎 | DeepSeek V4-Pro | 80.6% (LiveCodeBench 93.5%) | $0.435/$0.87 per M |
| Value 💎 | MiniMax M2.5 | 80.2% | $0.15/$1.15 or $10/mo |
| Value 💎 | Kimi K2.6 | 80.2% | $0.95/$4.00 or $19/mo |
| Frontier | Claude Sonnet 4.6 | 79.6% | Pro $20 |
| Value 💎 | GLM-5.1 | 77.8% | $10–30/mo plan |
| Speed/cheap | Grok Code Fast 1 | 70.8% | $0.20/$1.50 per M |
A $10–30/mo flat plan gets you to about 78–80%. The last 5–10 benchmark points cost $100–200/mo. Pay for them only when a task actually needs them.
The subscriptions above are cheap if you read the fine print. These are the gotchas that quietly drain prepaid balances, burn quota 3x faster than the marketing implies, or get your account banned. Each one is a real, documented pattern — not hypotheticals.
| Trap | What it costs you | How to avoid |
|---|---|---|
| No spend cap on usage-based billing | A runaway agent loop bills overage in arrears with no ceiling | Set it before first run |
| Peak-hour quota multipliers | Your "400 prompts" becomes ~133 | Schedule heavy work off-peak |
| Plan serves an older model | Paying flagship price for last-gen quality | Verify the served model, not the brand |
| Tool-heavy agents burn credits | Every tool round-trip re-bills full context | Cache + trim context |
Proxy strips cache_control |
100% input tokens billed when you think caching is on | Check for actual cache hits |
| Quarterly/annual auto-renew | A surprise yearly charge for a tier you outgrew | Calendar the renewal date |
| Reseller relay dies | Prepaid balance vanishes overnight | Don't prepay relays |
| Free-tier rug-pull | Workflow breaks when the freebie ends | Have a paid fallback ready |
| Sub-in-third-party-tool ToS ban | Account terminated, balance gone | Use official endpoints |
| Wrong regional Qwen key | Key silently rejected / wrong billing entity | Match key region to endpoint |
1. Not setting a spend cap (Cursor & every usage-based plan). Without a configured limit in Settings → Billing, on-demand usage bills automatically in arrears — there is no default ceiling, so an agent stuck in a loop on a MAX-mode model can run up a large bill before you notice. Set a team-level (and per-member, on Enterprise) spend limit before your first agentic run. ✅ Cursor spend-limit docs · overage billing
2. Peak-hour quota multipliers (GLM 3x). Zhipu's GLM-5 consumes 3x quota per request from 14:00–18:00 UTC+8 and 2x off-peak. So a plan you think gives you ~400 prompts effectively gives you ~133 during peak hours. The flagship models (GLM-5 / 5.1) are also Pro-tier-and-up only — Lite subscribers silently get GLM-4.7. Plan intensive sessions outside the peak window. Z.AI FAQ · China coding-plan pricing breakdown
3. The plan serves an older model than the brand (MiniMax M2.1). MiniMax markets M2.5/M2.7, but the Coding Plan subscription is powered by M2.1 — the older model — while pay-as-you-go gets the newer ones. For automated agent work, PAYG on the current model can beat the plan on both cost and capability. Always confirm which model version the subscription serves, not what the homepage advertises. Verdent: which MiniMax model · refund complaint #11
4. Credit burn on tool-heavy agents. Agentic loops re-send the entire conversation + tool results on every step. A 20-step task with a 30k-token context can bill 600k+ input tokens — most of it the same text re-read 20 times. On value-API plans this is where budgets evaporate. Trim context aggressively and lean on prompt caching for the static system/tool-definition prefix.
5. Cache-control stripped by proxies. Anthropic honors cache_control only on the native Messages wire format. Route Claude through a proxy that uses the OpenAI-compat path (e.g. OpenRouter's default chat-completions mode) and the cache markers are dropped during serialization — every request bills full input tokens while your code believes caching is active. Verify with an actual cache-hit metric, not by assuming the SDK flag worked. OpenRouter prompt-caching docs · bug report: caching not applied via OpenRouter
6. Quarterly/annual billing surprise. Several "cheap monthly" plans are cheapest only on annual/quarterly commit, and they auto-renew. The yearly charge lands long after you've moved to a better tool. Set a reminder ~1 week before any renewal date and re-evaluate.
7. Reseller relay dying with your prepaid balance. Gray-market relays that resell flagship access at a discount take prepaid top-ups, then disappear (or get their upstream key revoked) — and your balance goes with them. Treat any non-official relay as a hot wallet: never prepay more than you'd lose, and keep an official fallback configured. (See the reseller/hidden-gem section for which are reputable.)
8. Free-tier rug-pulls. 🆓 Generous free tiers exist to acquire you; terms change with little notice (rate limits tighten, the free model gets swapped for a weaker one, or the tier is killed). Don't build a production workflow whose economics only work on a freebie — keep a paid path one config change away.
9. ToS bans for using a sub inside a third-party tool. First-party subscriptions (Claude Pro/Max, ChatGPT Plus, etc.) are licensed for the vendor's own client. Piping that subscription's session into a third-party IDE/agent via a token-extraction relay violates ToS and gets accounts terminated — taking any prepaid value with them. If you want a sub usable in arbitrary tools, buy an API plan with a real key, not a consumer chat sub.
10. Buying the wrong Qwen key. Alibaba's DashScope has separate, non-interchangeable regions — Singapore (dashscope-intl), US-Virginia (dashscope-us), and China-Beijing (dashscope). A key minted in one region fails against another region's endpoint, and the China vs. international platforms are distinct billing entities entirely. Pick the region matching your account/users and pin both the key and base URL to it. Alibaba region/endpoint reference · DashScope setup guide
Rule of thumb: before paying, ask three questions — which exact model does this tier serve, what's the real per-day quota after multipliers, and what happens to my balance if the provider vanishes? If you can't answer all three, you're not buying a plan, you're buying a surprise.
The year the "unlimited" era ended. Every major coding subscription either re-priced, re-metered, or got killed — usually mid-cycle, usually with the existing crowd grandfathered while new subs paid more. Skim this before you commit to any annual plan.
| Date | Event | Verdict |
|---|---|---|
| Jan 23, 2026 | Z.ai cuts daily GLM Coding Plan sales volume to 20% of prior level to protect existing users — early sign the cheap-China-coding-plan party was ending. | |
| Feb 11, 2026 | GLM Coding Plan price ~doubled — first-purchase discounts killed, overseas Lite moved to ~$10/mo. New subs only; existing rates held. (source) | |
| Mar 19, 2026 | Windsurf scraps its credit pool for daily/weekly quotas, Pro $15→$20, adds a $200 Max tier. Existing Pro/Teams grandfathered on price but migrated to rate-limits — you can no longer sprint a month's pool in one project. (source) | 🔄 re-meter |
| Mar 20, 2026 | Alibaba closes Qwen Coding Plan Lite ($3/mo) to new subs; Pro ($50/mo) becomes the only tier. Existing Lite subs keep renewing. (source) | 🔻 budget tier gone |
| Apr 2, 2026 | OpenAI moves Codex to per-token credits (1 credit = $0.01) for Plus/Pro/Business, replacing per-message estimates. A typical task now runs 5–45 credits. (source) | 🔄 re-meter |
| Apr 9, 2026 | OpenAI launches ChatGPT Pro $100 (vs Claude Max) with a launch promo: 10× Plus Codex usage through May 31. (source) | 🎁 promo window |
| Apr 15, 2026 | Alibaba kills the Qwen Code free OAuth tier (the 2,000 req/day freebie). The free-CLI loophole closes. (source) | 🔻 free tier gone |
| May 6, 2026 | Anthropic permanently doubles Claude Code 5-hour limits (Pro/Max/Team/Enterprise) and drops peak-hour throttling — funded by the SpaceX Colossus compute deal. Weekly caps unchanged at this point. (source) | 🟢 more for same |
| May 13, 2026 | Anthropic follows up with a +50% bump to weekly limits — but this one expires Jul 13, 2026 unless extended. (source) | 🟢 temporary boost |
| May 22, 2026 | DeepSeek makes its 75% V4-Pro discount permanent — input ~$1.74→$0.435, output ~$3.48→$0.87 per M tokens. Sets the API price floor for the year. (source) | 🟢🆓-ish floor |
| May 31, 2026 | ChatGPT Pro $100 10× Codex promo expires — settles to 5× Plus. If you subbed for the multiplier, this is the cliff. (source) | ⏳ promo ends |
| Jun 1, 2026 | GitHub Copilot moves all plans to usage-based AI Credits (1 credit = $0.01, billed on tokens). Monthly plans get a credit allotment matching the price; annual subs stayed on legacy PRU billing but saw model multipliers rise. Power users reported agentic bills jumping 10×–50×. (source) | 🔄 re-meter (annual safe) |
| Jun 9, 2026 | Anthropic launches Fable 5, a new Claude-5-family model, available via the Claude API, Claude Code, and GitHub Copilot Pro+/Max/Business/Enterprise. (source) | 🆕 new frontier model |
| Jun 12, 2026 | Moonshot ships Kimi K2.7-Code — 1T MoE (32B active), 256K ctx, Modified-MIT — +21.8% over K2.6 on Kimi Code Bench v2 at ~30% fewer reasoning tokens; API $0.95/$4.00 per M, Kimi Code Beta sub still ~$19/mo. A HighSpeed mode (~180 tok/s) followed Jun 15. (source) | 🟢 model bump |
| Jun 13, 2026 | Z.ai GLM-5.2 — 744B MoE, 1M-token context, MIT-licensed weights — ships on every GLM Coding Plan tier at no extra cost. Some sources report tiers re-cut to ~$10/$30/$80 (from $18/$72/$160); verify before buying. (source) | 🟢 bump / |
| Jun 15, 2026 | Anthropic retires the legacy claude-opus-4-20250514 and claude-sonnet-4-20250514 snapshots (404 after this date) — move to claude-opus-4-8 / claude-sonnet-4-6; Opus 4.1 is deprecated, API retirement Aug 5, 2026. (source) |
🔻 model lifecycle |
| Jun 18, 2026 | Google shuts down Gemini CLI for free/Pro/Ultra users — no grace period; any script calling gemini breaks. Replacement is the closed-source Antigravity CLI (no day-one feature parity). Enterprise Code Assist licenses unaffected. (source) |
☠️ killed |
| Jun 18, 2026 | Google Antigravity CLI goes live as the closed-source Gemini-CLI replacement — free tier (all models, ~5h quota refresh, no card), AI Pro $20/mo, AI Ultra $249.99/mo, PAYG $25 / 2,500 credits; roster spans Gemini 3.1 Pro / 3 Flash + Claude Sonnet 4.6 / Opus 4.6 + GPT-OSS 120B. (source) | 🆕 replacement live |
Patterns worth internalizing:
- Grandfathering is the rule, not the exception. GLM, Qwen, Windsurf, and Copilot-annual all protected existing subscribers. Locking in before a hike is a real strategy.
- "Promo" = the new floor, sometimes. DeepSeek made its discount permanent; OpenAI let its 10× promo lapse. Read which one you're betting on.
- Quotas replaced pools everywhere (Cursor Jun 2025, Windsurf, Copilot, Codex). Daily/weekly rate-limits mean you can't front-load a month of work in a weekend anymore — budget for the cadence, not the total.
Aggregated from r/LocalLLaMA, r/ChatGPTCoding, r/ClaudeAI, r/cursor, r/Anthropic, Hacker News, and neutral blogs (patshead, InfoWorld, serenitiesai, vibecoding, verdent, every.to).
- The most-recommended budget pick is the GLM Coding Plan, usually framed as the cheapest way to run Claude Code. The line people keep quoting: "GLM-4.6 is about 80% as good as Claude Code for a third of the price."
- For Claude itself, a plan beats the API at any real volume, because most Claude Code tokens are cache-reads (free on a subscription, billed on the API). One often-cited month would have cost $5,623 on the API, which is 4.5 years of Max 5x.
- The loudest running complaint is metering. Cursor (June 2025), GitHub Copilot (June 2026), and Windsurf all swapped request caps for usage credits, and Copilot's agentic bills jumped 10–50× for heavy users.
- The common setup is to pair a frontier sub for hard work with a cheap open-weight plan for the overflow. The pair people name most often is Claude Pro at $20 plus GLM Lite at $10.
- When someone posts "I canceled my $200 sub," they have usually moved to Factory's Droid.
- On the skeptical side: Cerebras Code drew fire for advertising "2000 TPS / no weekly limits" while enforcing hidden daily token caps. People warn off sketchy reseller-proxy Claude keys, flag China-hosted plans on privacy, and get caught out by GLM's quarterly billing. OpenRouter stays the default "one key for everything," but flat-rate plans beat it for heavy daily use.
- Windsurf is effectively gone as a brand: Cognition pushed a June 2 OTA update rebranding it to Devin Desktop (same editor, same Free / Pro $20 / Max $200 plans), and the community reads it as "Windsurf sold for pieces." Cursor — fresh off a ~$900M raise — remains the default IDE for people who don't want a terminal, while Claude Code owns the terminal/agentic lane.
- The "$20 plan is a delusion" argument got loud in June: a widely-shared Hacker News thread ("The $15,000 AI Bill") pegs Claude Max at roughly $8K of tokens a month and argues heavy agentic users should stop pretending a flat $20 sub covers real agent loops.
- The local-model story firmed up on r/LocalLLaMA around Qwen 3.6-27B (~24GB VRAM) and Qwen 3.6-35B-A3B (runs on less), with DeepSeek V4-Flash repeatedly cited as matching Claude Haiku on multi-file refactors at 1M context.
Sometimes the right answer to "which sub?" is "none." If you have a spare GPU, work under NDA/air-gapped, or just resent paying rent on tokens, the open-weight tier in 2026 is genuinely good enough for day-to-day coding. This isn't a subscription — it's the exit ramp from one.
| Model | Total / active params | Realistic local home | Coding niche |
|---|---|---|---|
| Qwen3-Coder 30B-A3B | 30B / 3B (MoE) | ~17 GB @ Q4 → fits a single 24 GB GPU (Unsloth) | 💎 best quality-per-VRAM; the default "it just runs" pick |
| Devstral Small 2 (24B) | 24B dense | ~14 GB → RTX 4090 or 32 GB Mac (Mistral) | agentic / SWE-bench, OpenHands & SWE-agent scaffolds |
| gpt-oss-20b | 20B / ~3.6B (MoE) | ~12–16 GB w/ MXFP4 (Unsloth) | 🆓 Apache-2.0, lowest barrier; runs on a 16 GB laptop |
| gpt-oss-120b | 120B / ~5B (MoE) | single 80 GB GPU, or ~64 GB unified w/ offload (blog) | strong reasoning on one card |
| GLM-4.6 | 357B / 32B (MoE) | 135 GB @ dynamic 2-bit; needs RAM+VRAM ≥ file size (Unsloth) | near-Sonnet-4 coding, workstation/Mac Studio territory |
| Devstral 2 (123B) | 123B dense | ~65 GB @ Q4 → H100 / 192 GB Mac (Mistral) | 72% SWE-bench Verified, Opus-class agentic |
| Qwen3-Coder 480B-A35B | 480B / 35B (MoE) | ~276 GB @ UD-Q4 (≈BF16 quality) (Unsloth) | frontier-open; multi-GPU / cloud only |
| DeepSeek V3.2 | 671B / 37B (MoE) | 350–400 GB @ 4-bit; 2–4+ high-end cards (apxml) | the big one; cluster, not desktop |
Rule of thumb: available memory (VRAM + system RAM) ≥ quantized file size. MoE models (the
A3B/A35Bsuffix = active params) punch far above their VRAM weight — that's why a 30B beats a 30B-dense on the same card.
| Runtime | Best for | Note |
|---|---|---|
| Ollama | "make it work in 2 minutes" | one-line pulls, now with a cloud tier |
| LM Studio | GUI, model browsing, MLX on Mac | nicest on-ramp for non-CLI folks |
| llama.cpp | squeezing CPU+GPU offload, GGUF tuning | the engine under most of the above; SSD offload when you're short on RAM |
| vLLM | throughput / multi-GPU / serving a team | the production choice for 120B+ and sharded MoE |
Run the same models on hosted NVIDIA GPUs, same CLI, zero data retention from partners (Ollama): Free ($0, 5-hour + weekly caps), Pro $20/mo (3 concurrent cloud models, ~50× free usage), Max $100/mo for sustained agent loads. Billed on GPU-time, not tokens — short, cache-friendly prompts stretch your quota. Good bridge while you decide whether to buy silicon.
The honest math, not a sales pitch:
- You already own a 24 GB GPU → Qwen3-Coder 30B or Devstral Small is effectively $0/mo (plus electricity). Crossover is immediate — skip the sub.
- Buying hardware for it → a used 24 GB card (~$700–900) pays back a $50/mo sub in ~14–18 months, a $200/mo plan in ~4–5 months. A 96 GB+ rig for GLM-4.6/120B-class work only pencils out against the $100–200/mo tiers, and even then over a year-plus horizon.
- Pure cloud-hosted open weights (Ollama Pro $20, OpenRouter, etc.) → cheaper than a flagship sub, but you're renting again — it competes with the subscription tier, it doesn't escape it.
When self-host wins: privacy/compliance, offline work, high-volume batch/agent loops, or you've already sunk the hardware cost. When it doesn't: you want frontier quality (480B/V3.2 are cluster-class), your time is worth more than the ops overhead, or your usage is bursty enough that a flat sub is just less hassle. For most people the sweet spot is hybrid — a local 30B for the 80% of routine edits, a paid sub or value-API kept on standby for the hard 20%.
Real questions from devs trying to run agentic coding without overpaying. Answers reflect rules as of June 2026 — this space moves fast, so verify links before you bet money on them.
Q: Is it legal/within ToS to point Claude Code at GLM, Kimi, or DeepSeek?
Yes. Claude Code reads ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN, and any Anthropic-API-compatible endpoint is a drop-in. Z.ai even ships an official Claude Code guide and an /api/anthropic endpoint for exactly this (Z.ai docs, cc-compatible-models). You're running Anthropic's open-source CLI against someone else's paid model — Anthropic doesn't police which model the binary talks to. The thing that's banned is the opposite direction (below).
Q: Will I get banned for any of this? Not for pointing the CLI at a non-Anthropic model. You will get cut off if you use a Claude Pro/Max/Free OAuth token inside a third-party tool (Cursor, Cline, OpenCode, OpenClaw, etc.). Anthropic silently blocked this on Jan 9, 2026 and formalized it in Feb 2026 docs (The Register, VentureBeat). OAuth subscriptions are for Anthropic's own apps only.
Q: So can I use my Claude Max sub inside Cursor or Cline? No — not via your subscription login. That's the banned OAuth path. Your options:
- Use an Anthropic API key (pay-as-you-go) in those tools — fully allowed.
- Use Cursor's own $20/mo plan (Cursor pays Anthropic, you pay Cursor).
- Keep your Max sub inside Claude Code / Claude Desktop / claude.ai where it belongs.
Claude Code legal docs · VentureBeat
Q: Is ANTHROPIC_BASE_URL "officially" sanctioned for swapping models?
It's documented for enterprise LLM gateways, not advertised as "route to a competitor." But it's a standard env var on the official binary, and providers build around it openly. No OAuth token is involved, so there's nothing to revoke. Pragmatically: safe and common.
Q: Is my code used for training on cheap consumer plans?
On Anthropic Free/Pro/Max: yes by default since the Aug 28, 2025 terms update — including Claude Code from those accounts — unless you opt out at claude.ai/settings/data-privacy-controls. Opted-in data has a 5-year retention; opted-out is 30 days (Anthropic consumer terms, Anthropic privacy). API / Team / Enterprise are NOT trained on — they fall under Commercial Terms (Claude Code data usage).
| Path | Trained by default? | Opt-out? |
|---|---|---|
| Free / Pro / Max | ✅ yes | toggle in settings |
| API (1st-party) | ❌ no | n/a |
| Team / Enterprise | ❌ no | n/a |
| Z.ai / Kimi / DeepSeek consumer | assume ✅ | read their policy |
Q: Are Chinese plans (GLM, Kimi, Qwen, DeepSeek) "safe"? Safe to run — they're standard HTTPS API calls. The real question is data handling: assume prompts may be logged/used for training, data sits on PRC servers, and content filtering applies. Fine for OSS, hobby, and throwaway code. For proprietary/regulated/client code, don't — use a 1st-party API with a no-train commitment, or self-host. Treat it like any third-party SaaS you didn't sign a DPA with.
Q: What's the cheapest way to run Claude Code? Cheapest non-free answer right now is a third-party coding plan behind the CLI — e.g. the GLM Coding Plan at ~$18/mo gives quota-based access to GLM-5.1/Turbo with full Anthropic-API compatibility (Truescho, Z.ai). Cheaper still: a local model via Ollama (Anthropic-API compatible since v0.14.0, $0 in tokens) (Ollama blog). Cheapest of all: free tiers (below) — at the cost of quota and quality.
Q: Do I even need a subscription? No. You need the CLI (free, open source) plus a token source. That source can be: a subscription, a pay-as-you-go API key, a third-party coding plan, a free tier, or a local model. Pick per workload.
Q: Subscription vs API — which is cheaper? Depends on volume and how steadily you work:
- Heavy daily driver (hours/day, long sessions): a flat subscription (Anthropic Max, or a $18 GLM plan) wins — you'd blow past it on metered API.
- Bursty / occasional (a few sessions a week): API pay-as-you-go is cheaper — you pay only for what you burn, no idle months.
- Rule of thumb: if you'd hit the sub's quota >~60% of days, the sub pays off. Otherwise meter it.
Q: How do I avoid surprise overage bills?
- API: set hard spend limits / budget alerts in the provider console — this is the single most important step. Add a low monthly cap.
- Prefer flat-rate subscriptions when usage is steady — they can't overage, they just throttle.
- Watch context size: agentic loops re-send the whole context every turn, so a bloated session quietly multiplies token cost. Use
/compact, scope the repo, and kill runaway loops. - Use a cheaper model for the bulk, expensive for the hard parts (model routing) instead of one premium model for everything.
Q: What's a "prompt" vs a "request" vs a "token"?
- Token = the billing/measurement atom (~¾ of a word). You're charged per input + output token. Everything ladders up from this.
- Request = one API call (one round-trip to the model). One request carries many tokens.
- Prompt = fuzzy marketing word. On consumer chat plans it often means "one message you send" (≈ one request). Watch out: agentic tools fire many requests per "prompt" — a single instruction in Claude Code can spawn dozens of tool-call round-trips, each metered. A plan advertising "N prompts/day" is not the same as N requests.
Q: Best free option? 🆓 For local/private: Ollama + a coding model (Qwen, DeepSeek-Coder, GLM-Air) behind Claude Code — unlimited, offline, your hardware is the only cost (Ollama blog). For cloud free tiers: rotating promo credits and free quota from Z.ai/Kimi/Qwen/Google AI Studio — generous but rate-limited and usually trained-on. Use free tiers for learning and side projects; don't build a business on a tier that can vanish overnight.
Q: I just want the best bang-for-buck, one pick. Pick by spend level: $0 → Ollama local. ~$18/mo flat, heavy use → a GLM-class coding plan behind Claude Code. Bursty pro work on sensitive code → 1st-party API key with budget caps. Everything else is tuning around those three.
Q: Can a provider just block the trick I'm relying on? Yes — that's the structural risk. The Jan 2026 OAuth ban erased a whole category of "use your sub everywhere" tools overnight, with no notice (Hacker News thread). API-key and base-URL setups are durable (they're paid, sanctioned interfaces). OAuth-token reuse and undocumented loopholes are not — never make one load-bearing.
Q: Will swapping in a cheaper model tank quality? Less than you'd think for routine work. On SWE-Bench Pro, GLM-5.1 reportedly edges out Claude Opus 4.6 (Truescho) — vendor-flavored, so trust your own eval over any leaderboard. Frontier reasoning, tricky debugging, and long-horizon agentic tasks still favor top-tier Anthropic/OpenAI models. Best practice: route cheap-model for grunt work, premium for the hard 10%.
Sources: Anthropic consumer terms · Claude Code data usage · Claude Code legal · The Register — third-party ban · VentureBeat · Z.ai Claude Code docs · cc-compatible-models · Ollama + Anthropic API
New to AI coding subscriptions? Here are the terms that show up everywhere in this list, one line each. ✅ = factually verified against a primary source.
| Term | What it means |
|---|---|
| Agentic coding | The model doesn't just autocomplete — it plans, reads/edits files, runs commands, and loops on the result until a task is done. |
| Harness | The app wrapping the model that gives it tools, file access, and the agent loop (e.g. Claude Code, Cursor, Cline, Aider). Same model, different harness = very different results. |
| Anthropic-compat endpoint | An API that speaks Claude's /v1/messages wire format, so tools built for Claude (like Claude Code) accept it as a drop-in backend. |
| OpenAI-compat endpoint | An API that speaks OpenAI's /v1/chat/completions format — the de facto standard most third-party providers and routers expose. |
ANTHROPIC_BASE_URL |
The env var that re-points Claude Code (or any Anthropic SDK client) at a different Anthropic-compat backend — the core trick for using a subscription/proxy in place of the official API. |
| BYOK | "Bring Your Own Key" — the tool is free or cheap, but you plug in your own provider API key and pay that provider directly for usage. |
| Prompt caching / cache-hit | Reusing the static prefix of a prompt (system, repo context) across calls. A cache hit bills those tokens at ~0.1× input price — a 90% discount — vs a one-time 1.25× write cost. ✅ source |
| Context window | Max tokens the model can "see" at once (input + output). Bigger window = more code/history in scope, but cost and latency scale with what you actually fill. |
| Tokens vs requests vs prompts | Token = sub-word billing unit (~4 chars). Request/prompt = one API call (may contain thousands of tokens). Plans cap one, the other, or both — read which. |
| Flat-rate vs pay-as-you-go | Flat-rate = fixed monthly fee, usage bounded by limits. PAYG = metered per token/request, bill scales with usage. Hybrids cap PAYG with a monthly ceiling. |
| Rate limit (RPM/TPM/RPD) | Throughput ceilings: Requests / Tokens Per Minute, Requests Per Day. The real-world bottleneck on heavy agentic loops, often more than price. |
| 5-hour rolling window | Anthropic's usage cap style on Claude paid plans: quota resets on a sliding 5-hour clock rather than per-calendar-day, so a burst now eats into your next few hours. |
| MoE (Mixture-of-Experts) | Architecture where only a few "expert" sub-networks fire per token, so a model can be huge in total parameters yet cheap/fast to run (e.g. DeepSeek, Qwen, Kimi families). |
| SWE-bench Verified | A 500-task, human-validated subset of SWE-bench: real GitHub issues from 12 Python repos, vetted by 93 developers. The standard "can it actually fix bugs" score. ✅ source |
| Fast-apply model | A small, cheap model that turns a big model's described edit into an exact file patch. Lets a harness use one model to think and another to apply — faster and cheaper. |
| Router / gateway | A layer that sits in front of many models/providers and picks (or lets you pick) where each request goes — for cost, speed, or fallback (e.g. OpenRouter, LiteLLM). |
| Reseller-proxy | A third party reselling upstream API access (often pooled or subscription-backed) through their own endpoint. Cheap, but watch ToS, uptime, and data handling. |
| ZDR (Zero Data Retention) | Provider contractually doesn't store your prompts/outputs after the request completes — the bar to look for if your code is sensitive. |
Quick mental model: you pick a model (the brain), run it inside a harness (the hands), and reach it through an endpoint — official, a router, or a reseller-proxy — billed either flat-rate or pay-as-you-go, bounded by rate limits.
Sources: Anthropic prompt caching docs, OpenAI — Introducing SWE-bench Verified
A short, honest account of where these numbers come from and how to trust them.
Each plan's ⭐ is a single blended score, not an average of equal parts. It weighs the five axes from How to choose — 💵 price, 🧠 power, 🔢 model count, 📊 limits, 🔌 integration — but integration and real effective cost carry the most weight, because a great model behind a clumsy backend or a deceptive credit ratio is worth less in practice.
| ⭐ | Read it as |
|---|---|
| ⭐5 | Best-in-class for its lane. Buy with confidence; few caveats. |
| ⭐4 | Strong pick with one real trade-off (price tier, lock-in, a metering quirk). |
| ⭐3 | Situational — good only if its niche matches you, or trust/reliability is unproven. |
| ⭐1–2 | Listed for completeness or as a warning; not recommended (e.g. overpriced, dying). |
Ratings are relative within a lane, not across the whole list — a ⭐5 free tier and a ⭐5 frontier sub are not the same money. The lane (section) is the context.
✅ marks a plan whose headline price and core limits were cross-checked against the vendor's own pricing page (linked inline) at the date in the section header. It is a pricing check, not a quality endorsement and not a benchmark audit — SWE-bench figures stay vendor-reported and directional. No ✅ means the numbers are from secondary sources (community reports, reviews) and are more likely to drift. Absence of ✅ is not a red flag; it just means "trust but verify harder."
Prices in this space churn monthly. Every figure is dated ~June 2026 (see the banner and section headers).
- Plans get re-checked when a section is touched; the date stamp tells you how stale a number may be.
- We log known churn explicitly so you can pattern-match the risk: GLM doubled prices Feb 2026, Qwen Lite closed to new subs Mar 2026, Copilot moved to credits Jun 2026, Gemini CLI sunsets Jun 18 2026. These are examples of the rate of change, not exceptions.
- Always confirm on the official page before paying. This list is a shortlist, not a price oracle.
Two streams, deliberately kept separate:
- Facts (price, limits, endpoints, context windows) — from official pricing/docs pages, linked inline as source so you can re-verify in one click.
- Sentiment (what's actually good, what burns people) — aggregated from Reddit (r/LocalLLaMA, r/ChatGPTCoding, r/ClaudeAI, r/cursor), Hacker News, and neutral independent blogs/comparison sites (patshead, InfoWorld, serenitiesai, codingplan.fyi). Sentiment shapes the ⭐ and the prose; it never silently overrides a vendor's stated number.
Where the two disagree (e.g. advertised "no weekly limits" vs. reported hidden daily caps), both are stated and the gap is flagged with
- No referral links. No affiliate codes. No sponsored placements. Every URL is a plain link to the vendor or source.
- Not affiliated with, paid by, or endorsed by any listed vendor. Inclusion and ⭐ rank are independent of any vendor relationship — because there are none.
- Reseller-proxy entries are listed with risk warnings, not as endorsements (details).
Pricing fixes are as valuable as new entries. To report drift:
- Open an issue or PR (CONTRIBUTING.md) — include the plan name, the wrong value, the correct value, and a vendor-page link with the date you saw it.
- Keep entries in the right lane, sorted by value, with a source link and concrete numbers.
- A dated official-page link is the fastest path to merge — it lets a maintainer re-verify the ✅ in seconds.
- Pricing volatility: every number here can change within weeks. GLM doubled prices Feb 2026; Qwen Lite closed to new subs Mar 2026; Cerebras is perpetually sold out; Gemini 2.5 Pro stopped being free April 2026; models EOL constantly. Confirm on the official page before buying.
- Vendor benchmarks: SWE-bench numbers are largely self-reported and contamination-prone. Treat as directional.
- Same model ≠ same quality: an open-weight model performs differently across hosts (quantization + serving config). Test with short commitments; hedge across 2–3 plans.
- China-hosting: GLM/Kimi/DeepSeek/MiniMax/Qwen are China-hosted, which is a data-residency problem for sensitive or enterprise code. US-host alternatives: Synthetic.new, first-party US subs.
- ToS: routing a consumer Claude/Copilot subscription into third-party tools, or using reseller relays, can violate provider ToS and risk an account ban. This list documents what exists; it doesn't endorse ToS violations.
- Not affiliated with or endorsed by any listed vendor. No referral links.
Corrections and additions welcome — pricing changes monthly, so fixes are as valuable as new entries. See CONTRIBUTING.md. Keep entries in the right section, sorted by value, with a source link and concrete numbers.
To the extent possible under law, the contributors have waived all copyright and related rights to this work (CC0 1.0).
